Learning Representation For Automatic Colorization
Refer to the paper here. The slideshow can be found here.
Prerequisite Knowledge
Colorization: The inverse of desaturation (grayscaling).
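Desaturation maps each RGB pixel to a single gray value, so colorization must invert a many-to-one mapping. A minimal sketch using the common Rec. 601 luma weights (my choice of weights for illustration; the paper itself works in Lab and hue/chroma spaces):

```python
def desaturate(r, g, b):
    """Rec. 601 luma: weighted sum of RGB channels (values in [0, 1])."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Many colours map to the same gray value, so the inverse is ambiguous:
gray_red = desaturate(1.0, 0.0, 0.0)          # 0.299
gray_gray = desaturate(0.299, 0.299, 0.299)   # also 0.299
```

This ambiguity is exactly why the colorization direction needs semantic knowledge rather than a pixel-wise formula.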
Abstract
Train model to predict per-pixel color histograms
Used to automatically generate a coloured image
Introduction
This paper combines ideas from image classification and object detection.
Problems in past works that this paper tries to solve
Colourization usually requires some user input (not fully automatic)
Promising results on landscapes, but trouble with complex images containing foreground objects.
Requires processing a large dataset (a past approach finds a reference image and transfers its colours onto the grayscale input)
Technical System/Model Overview
Design principles:
Semantic knowledge → Leverage ImageNet-based classifier
Low-level/high-level features → Zoom-out/Hypercolumn architecture
Colorization not unique → Predict histograms
Process grayscale image through VGG and take spatially localized multilayer slices (hypercolumn) as per-pixel descriptors
Going from histogram prediction to RGB image
Sample
Mode
Median ← Chroma
Expectation ← Hue
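The four decoding strategies above can be sketched for a single pixel's predicted histogram (NumPy; the bin layout here is a hypothetical example, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(hist, centers, mode="expectation"):
    """Turn one pixel's colour histogram into a single value.
    hist: probabilities over bins (sums to 1); centers: bin centroids."""
    if mode == "sample":       # draw a bin at random (noisy per pixel)
        return centers[rng.choice(len(hist), p=hist)]
    if mode == "mode":         # most probable bin
        return centers[np.argmax(hist)]
    if mode == "median":       # first bin where the CDF crosses 0.5
        return centers[np.searchsorted(np.cumsum(hist), 0.5)]
    if mode == "expectation":  # probability-weighted mean of centroids
        return float(np.dot(hist, centers))
    raise ValueError(mode)

centers = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
hist = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
decode(hist, centers, "mode")         # 0.5
decode(hist, centers, "expectation")  # 0.5
```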
The Paper's Approach
Semantic composition and object localization are important.
What is in the image, and where things are in the image
Use CNNs to achieve these things
Some image elements can be assigned one colour with high confidence (e.g. clothes, car), others could be multiple colours. To solve this, we predict a colour histogram instead of a single colour at every pixel.
Related Work
Previous colorization methods fall into the following 3 categories.
Scribble-based Methods
This method required manually specifying desired colours in regions of the image. Then, it would be assumed that pixels adjacent to these regions would have similar colours and brightness. The user can also further refine with additional scribbles.
Transfer-based Methods
This method relies on the availability of reference images, as it transfers their colour to grayscale images. This makes it only partially automatic.
Automatic Direct Prediction Methods (What this paper is aiming for)
More in Method.
Method
Last layer is always softmax for histogram predictions
This task can be viewed as an image-to-image prediction problem: a value is predicted for each input pixel. Such problems are usually tackled with pretrained classification networks, which can be converted to fully convolutional networks, so that the output has the same spatial dimensions as the input, using the shift-and-stitch method or the à trous (dilated convolution) algorithm.
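The à trous idea can be illustrated with a 1-D dilated convolution: spreading the filter taps apart enlarges the receptive field without downsampling, so the output keeps the input's resolution (a toy NumPy sketch, not the paper's implementation):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1-D convolution whose filter taps are `dilation` apart."""
    pad = dilation * (len(w) - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        for k, wk in enumerate(w):
            out[i] += wk * xp[i + k * dilation]
    return out

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y = dilated_conv1d(x, w, dilation=2)
assert y.shape == x.shape  # resolution preserved despite the larger receptive field
```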
Skip-layer Connections
These connections link low- and mid-level features to the prediction/classifier layers. This paper implements this by extracting per-pixel descriptors by reading localized slices of multiple layers via hypercolumns.
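A hypercolumn descriptor concatenates, for each pixel, the activations of several layers, upsampling coarser maps back to the target resolution. A minimal NumPy sketch with nearest-neighbour upsampling (an assumption for simplicity; smoother interpolation such as bilinear is typically used):

```python
import numpy as np

def hypercolumn(feature_maps, out_hw):
    """Stack per-pixel descriptors from layers of different resolutions.
    feature_maps: list of (C_i, H_i, W_i) arrays; out_hw: (H, W) target."""
    H, W = out_hw
    cols = []
    for f in feature_maps:
        c, h, w = f.shape
        # nearest-neighbour upsample each channel to the target resolution
        ri = np.arange(H) * h // H
        ci = np.arange(W) * w // W
        cols.append(f[:, ri][:, :, ci])
    return np.concatenate(cols, axis=0)  # shape (sum C_i, H, W)

f1 = np.random.rand(16, 8, 8)   # fine, low-level layer
f2 = np.random.rand(32, 2, 2)   # coarse, high-level layer
desc = hypercolumn([f1, f2], (8, 8))
assert desc.shape == (48, 8, 8)  # one 48-dim descriptor per pixel
```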
How do we generate training data (3.1 Colour Spaces)?
Hue/Chroma
Problem with HSL (1st image): the values of S and H become unstable near the top (white) and bottom (black) of the cylinder.
The Euclidean distance between this colour vector and the origin determines chroma.
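Treating the two colour components as a 2-D vector, chroma is its distance from the origin and hue its angle (a sketch of the geometry, with hue normalised to [0, 1) via atan2):

```python
import math

def hue_chroma(x, y):
    """Hue = angle of the (x, y) colour vector in [0, 1); chroma = its length."""
    chroma = math.hypot(x, y)
    hue = (math.atan2(y, x) / (2 * math.pi)) % 1.0
    return hue, chroma

hue, chroma = hue_chroma(0.0, 0.5)
# vector pointing straight "up": a quarter turn, length 0.5
assert abs(hue - 0.25) < 1e-9 and abs(chroma - 0.5) < 1e-9
```

Note that the angle becomes unstable as chroma approaches zero, which mirrors the HSL instability described above.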
3.2 Loss
Histogram Loss
At first, a mean squared error loss function was considered for measuring prediction errors. However, regression targets do not handle multimodal color distributions well. Instead, we predict distributions over a set of colour bins:
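Predicting a distribution over bins turns the per-pixel task into classification, where a cross-entropy loss against a (possibly soft) target histogram is a natural fit. A sketch, assuming a softmax output over hypothetical bins:

```python
import numpy as np

def histogram_loss(pred_logits, target_hist):
    """Cross-entropy between a softmax over colour bins and a target histogram."""
    z = pred_logits - pred_logits.max()   # numerical stability
    log_p = z - np.log(np.exp(z).sum())   # log-softmax
    return float(-(target_hist * log_p).sum())

# multimodal target: an object could plausibly be bin 1 or bin 3
target = np.array([0.0, 0.5, 0.0, 0.5, 0.0])
good = histogram_loss(np.array([0.0, 5.0, 0.0, 5.0, 0.0]), target)
bad = histogram_loss(np.array([5.0, 0.0, 0.0, 0.0, 0.0]), target)
assert good < bad  # covering both modes beats committing to the wrong colour
```

This is where MSE regression fails: its optimum for the target above is the average of the two modes, a colour that is itself implausible.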
Binning Colour-Space
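One simple binning scheme (an assumed form for illustration; see the paper for its exact bin layout): partition the colour axis into uniform bins and soft-assign each ground-truth value to nearby bin centers with a Gaussian kernel:

```python
import numpy as np

def soft_bin(value, n_bins=32, sigma=0.05):
    """Soft-assign a value in [0, 1] to uniform bins via a Gaussian kernel
    (hypothetical target encoding, not the paper's exact scheme)."""
    centers = (np.arange(n_bins) + 0.5) / n_bins
    weights = np.exp(-0.5 * ((value - centers) / sigma) ** 2)
    return weights / weights.sum()  # a valid histogram target

target = soft_bin(0.5)
assert abs(target.sum() - 1.0) < 1e-9  # normalised histogram
```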
Hue/Chroma Loss
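Since hue is undefined as chroma approaches zero, a natural hue/chroma loss down-weights the hue term by the ground-truth chroma. A hedged sketch of this idea (the weighting form here is my assumption, not the paper's exact loss):

```python
import numpy as np

def xent(logits, target):
    """Cross-entropy between softmax(logits) and a target histogram."""
    z = logits - logits.max()
    return float(-(target * (z - np.log(np.exp(z).sum()))).sum())

def hue_chroma_loss(hue_logits, chroma_logits, hue_t, chroma_t, chroma_gt):
    """Chroma loss plus a hue loss scaled by ground-truth chroma, so the
    meaningless hue signal on near-gray pixels contributes no gradient."""
    return xent(chroma_logits, chroma_t) + chroma_gt * xent(hue_logits, hue_t)

hue_t = np.array([0.0, 1.0, 0.0])
chroma_t = np.array([1.0, 0.0, 0.0])
logits = np.zeros(3)
# at zero ground-truth chroma the hue term vanishes entirely
gray = hue_chroma_loss(logits, logits, hue_t, chroma_t, chroma_gt=0.0)
vivid = hue_chroma_loss(logits, logits, hue_t, chroma_t, chroma_gt=1.0)
assert vivid > gray
```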
3.3 Inference
With histogram predictions, we have the following options:
Sample: Draw sample from histogram. If you are drawing per pixel, this may create high-frequency colour changes in areas of high-entropy histograms.
Expectation/Mean: Sum over colour bin centroids weighted by histogram.
For Lab output, the expectation produces the best results. For hue/chroma, the median (for chroma) and expectation (for hue) produce the best results.
For hue, we compute the complex expectation:
3.5 Neural Network Architecture
Base network: VGG-16
Two changes to the network:
Classification layer (fc8) is discarded
The first filter layer (conv1_1) is changed to operate on a single grayscale channel