Learning Representation For Automatic Colorization


Prerequisite Knowledge

Colorization: The inverse of desaturation (grayscaling).
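Desaturation maps three channels down to one, so colorization must invert a many-to-one mapping. A minimal sketch of the forward direction, assuming the common Rec. 601 luma weights (one of several conventions):

```python
import numpy as np

def desaturate(rgb):
    """Convert an RGB image (H, W, 3) with values in [0, 1] to grayscale.

    Uses the Rec. 601 luma weights (an assumed, common convention);
    colorization is the task of inverting this many-to-one mapping.
    """
    return rgb @ np.array([0.299, 0.587, 0.114])
```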

Abstract

  • Train model to predict per-pixel color histograms

    • Used to automatically generate a coloured image

Introduction

This paper combines ideas from image classification and object detection.

Problems in past works that this paper tries to solve

  1. Colourization usually requires some user input (not fully automatic)

  2. Promising results on landscapes, but past methods struggle with complex images containing foreground objects.

  3. Requires processing of a large dataset (a past approach is to find a reference image and transfer its colour onto the grayscale image)

Technical System/Model Overview

Design principles:

  • Semantic knowledge → Leverage ImageNet-based classifier

  • Low-level/high-level features → Zoom-out/Hypercolumn architecture

  • Colorization not unique → Predict histograms

  1. Process grayscale image through VGG and take spatially localized multilayer slices (hypercolumn) as per-pixel descriptors
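The hypercolumn step above can be sketched as upsampling each layer's feature map to the input resolution and concatenating along the channel axis. Nearest-neighbour upsampling is assumed here for brevity; the paper's exact interpolation may differ:

```python
import numpy as np

def hypercolumn(feature_maps, out_hw):
    """Stack spatially aligned slices of several layers into per-pixel
    descriptors.

    feature_maps: list of (h, w, c) arrays at coarser resolutions.
    out_hw: (H, W) spatial size of the input image.
    Nearest-neighbour upsampling is used here for simplicity.
    """
    H, W = out_hw
    columns = []
    for fm in feature_maps:
        h, w, _ = fm.shape
        rows = np.round(np.linspace(0, h - 1, H)).astype(int)
        cols = np.round(np.linspace(0, w - 1, W)).astype(int)
        columns.append(fm[rows][:, cols])
    # Concatenate channels: each pixel now has one long descriptor.
    return np.concatenate(columns, axis=-1)
```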

Going from histogram prediction to RGB image

  • Sample

  • Mode

  • Median ← Chroma

  • Expectation ← Hue
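These decoding options can be sketched for a single pixel's histogram (`decode` and its argument names are illustrative, not from the paper):

```python
import numpy as np

def decode(hist, centroids, how="expectation", rng=None):
    """Turn a per-pixel histogram over colour bins into a single value.

    hist: (K,) probabilities summing to 1; centroids: (K,) bin centres.
    """
    if how == "sample":
        rng = rng or np.random.default_rng()
        return centroids[rng.choice(len(hist), p=hist)]
    if how == "mode":
        return centroids[np.argmax(hist)]
    if how == "median":
        # First bin where the cumulative mass reaches 0.5.
        return centroids[np.searchsorted(np.cumsum(hist), 0.5)]
    # Expectation: centroids weighted by histogram mass.
    return hist @ centroids
```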

The Paper's Approach

  1. Semantic composition and object localization are important.

    • What is in the image, and where things are in the image

    • Use CNNs to achieve these things

  2. Some image elements can be assigned one colour with high confidence (e.g. clothes, car), others could be multiple colours. To solve this, we predict a colour histogram instead of a single colour at every pixel.

Previous colorization methods fall into the following 3 categories.

Scribble-based Methods

This method requires the user to manually specify desired colours in regions of the image. Pixels adjacent to these regions are then assumed to have similar colours and brightness. The user can further refine the result with additional scribbles.

Transfer-based Methods

This method relies on the availability of reference images, as it transfers colour from a reference to the grayscale image. This makes it partially manual.

Automatic Direct Prediction Methods (What this paper is aiming for)

More in Method.

Method

    • Last layer is always softmax for histogram predictions

This task can be viewed as an image-to-image prediction problem: a value is predicted for each input pixel. Such problems are usually tackled with pretrained classification networks, whose fully connected layers can be converted to convolutional layers so that the output has the same spatial shape as the input, using the shift-and-stitch method or the à trous algorithm.
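The fully-convolutional conversion can be illustrated with a toy sketch: a fully connected layer trained on k×k feature patches is reapplied at every spatial position, yielding a map of predictions instead of a single one. This is a naive loop implementation for clarity only, not the paper's code:

```python
import numpy as np

def fc_as_conv(x, W, k):
    """Apply a fully connected layer as a sliding convolution.

    x: (H, W, c) feature map; W: (out, c*k*k) fully connected weights
    originally trained on k x k patches. Returns one output vector per
    valid spatial position instead of a single prediction.
    """
    H, Wd, _ = x.shape
    out = np.empty((H - k + 1, Wd - k + 1, W.shape[0]))
    for i in range(H - k + 1):
        for j in range(Wd - k + 1):
            # The same weights are reused at every spatial location.
            out[i, j] = W @ x[i:i + k, j:j + k].ravel()
    return out
```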

Skip-layer Connections

These connections link low- and mid-level features to the prediction/classifier layers. This paper implements this by extracting per-pixel descriptors by reading localized slices of multiple layers via hypercolumns.

How do we generate training data (3.1 Colour Spaces)?

Hue/Chroma

Problem with HSL: the values of S and H are unstable near the top (white) and bottom (black) of the colour cylinder.

    • Euclidean distance between this vector and the origin determines chroma.
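Assuming each pixel yields a 2-D chromatic vector as described, hue and chroma can be recovered as the vector's angle and magnitude (an illustrative sketch):

```python
import numpy as np

def hue_chroma(v):
    """Recover hue and chroma from a 2-D chromatic vector.

    v: (..., 2) array. Chroma is the Euclidean distance from the
    origin; hue is the angle of the vector, in radians.
    """
    chroma = np.linalg.norm(v, axis=-1)
    hue = np.arctan2(v[..., 1], v[..., 0])
    return hue, chroma
```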

3.2 Loss

Histogram Loss

At first, a mean squared error loss function was considered for measuring prediction errors. However, regression targets do not handle multimodal colour distributions well. Instead, the model predicts a distribution over a set of colour bins.
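A natural way to train such per-bin distributions, given the softmax output noted above, is a KL-divergence (cross-entropy-style) loss between the target and predicted histograms. This is a hedged sketch; the paper's exact formulation and weighting may differ:

```python
import numpy as np

def histogram_loss(pred, target, eps=1e-8):
    """KL divergence D_KL(target || pred), summed over colour bins.

    pred: (K,) predicted (softmax) histogram; target: (K,) target
    histogram. eps guards against log(0).
    """
    return np.sum(target * (np.log(target + eps) - np.log(pred + eps)))
```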

Binning Colour-Space
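One simple binning scheme, assumed here purely for illustration (the paper's exact binning may differ), partitions each channel's range uniformly and builds a one-hot target histogram per pixel:

```python
import numpy as np

def make_bins(lo, hi, k):
    """Uniformly partition [lo, hi] into k bins (an assumed scheme).

    Returns the k+1 bin edges and the k bin centroids.
    """
    edges = np.linspace(lo, hi, k + 1)
    centroids = 0.5 * (edges[:-1] + edges[1:])
    return edges, centroids

def hard_target(value, edges):
    """One-hot target histogram for a scalar channel value."""
    k = len(edges) - 1
    idx = np.clip(np.searchsorted(edges, value, side="right") - 1, 0, k - 1)
    t = np.zeros(k)
    t[idx] = 1.0
    return t
```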

Hue/Chroma Loss

3.3 Inference

With histogram predictions, we have the following options:

  • Sample: Draw a sample from the histogram. If sampling per pixel, this may create high-frequency colour changes in areas where histograms have high entropy.

  • Expectation/Mean: Sum over colour bin centroids weighted by histogram.

For Lab output, expectation produces the best results. For hue/chroma, median works best for chroma and expectation for hue.

For hue, we compute the complex expectation, which averages hues on the unit circle so that the wrap-around at 0/2π is handled correctly.
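Because hue is circular (0 and 2π are the same colour), a plain arithmetic mean is wrong; the circular mean places each bin centre on the unit circle, averages with the histogram weights, and takes the angle of the result. A sketch:

```python
import numpy as np

def hue_expectation(hist, bin_centers):
    """Circular (complex) expectation of hue.

    hist: (K,) histogram weights; bin_centers: (K,) hue bin centres
    in radians. Each centre is placed on the unit circle, averaged
    with the histogram weights, and the angle of the sum is returned.
    """
    z = np.sum(hist * np.exp(1j * bin_centers))
    return np.angle(z) % (2 * np.pi)
```

For two equal-mass bins at 0.1 and 2π − 0.1, the arithmetic mean would wrongly give π (the opposite hue), while the circular mean correctly gives 0.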

3.5 Neural Network Architecture

Base network: VGG-16

Two changes to the network:

  1. Classification layer (fc8) is discarded

  2. The first filter layer is changed to operate on a single-channel grayscale input rather than RGB
