Final Project: Image Analogies

Author: Stephen Tawa (stawa)

Overview

For my final project I implemented the Image Analogies paper (Hertzmann et al., SIGGRAPH 2001), available here. We have images A and A' that are related in some way, as well as a source image B. The goal of the algorithm is to complete the analogy A : A' :: B : B'. This is applicable to several problems, such as artistic filtering, texture synthesis, texture transfer, and texture-by-numbers.

The Algorithm

The main idea of the algorithm is that to determine the value of pixel q in B', we find a pixel p in A that is "similar" to pixel q in B; the value of A' at p should then be an appropriate value for B' at q. The success of the algorithm hinges on the features and matching functions we use to define "similarity".
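To make this concrete, here is a minimal single-scale sketch of the main loop in Python; the helper names (build_search_index, best_match, feature_vector) are my own, and they are sketched in the snippets that follow.

```python
import numpy as np

def create_analogy(A, Ap, B, kappa=5.0):
    """Single-scale sketch: fill B' in scan-line order by copying,
    for each pixel q, the value of A' at the best-matching pixel p.
    A, Ap, B are 2-D luminance arrays."""
    Bp = np.zeros_like(B)
    src = {}                              # pixel of B' -> source pixel in A
    pts, feats, tree = build_search_index(A, Ap)
    H, W = B.shape
    for r in range(2, H - 2):             # interior only; boundary pixels
        for c in range(2, W - 2):         # are brute-forced (see below)
            q = (r, c)
            p = best_match(tree, pts, feats, src, B, Bp, q, kappa)
            Bp[q] = Ap[p]
            src[q] = p                    # remembered for coherence matching
    return Bp
```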

My implementation does some precomputation before filling in B'. First, it calculates features for each pixel. As suggested by the paper, I use the luminance (Y) component of the YIQ color space, because human perception is more sensitive to luminance than to the individual color channels. Additional information such as filter responses can be added as features for specific applications, but I found luminance sufficient. Next, we set up the index structures used by the approximate nearest neighbor (ANN) search. In a multiscale approach, we would also build image pyramids and repeat this setup at each level; I switched to single-scale when my multiscale approach had trouble interfacing with the FLANN library (described below).
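The luminance conversion itself is just a fixed linear transform. For reference, a minimal sketch using the standard NTSC RGB-to-YIQ matrix (the function name is my own):

```python
import numpy as np

# Rows: Y (luminance), then I and Q (chrominance).  Standard NTSC matrix.
YIQ = np.array([[0.299,  0.587,  0.114],
                [0.596, -0.274, -0.322],
                [0.211, -0.523,  0.312]])

def rgb_to_yiq(img):
    """Convert an H x W x 3 RGB array (floats in [0, 1]) to YIQ.
    The result's [..., 0] channel is the Y used as the feature."""
    return img @ YIQ.T
```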

We fill in B' pixel by pixel, in scan-line order, copying the value from A' at a "similar" pixel. We want pixels that blend well with the surrounding area, so the feature vector for pixel q of B' is the concatenation of features from many pixels: the previously synthesized (causal) neighborhood of q in B' and the 5x5 neighborhood of q in B. Thus, our feature vectors have up to 37 dimensions: up to 12 from B', and up to 25 from B.
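A sketch of this construction for interior pixels (the 12 causal pixels are the ones that precede q in scan-line order within its 5x5 window; boundary pixels, whose windows are truncated, are handled separately as described below):

```python
import numpy as np

def feature_vector(B, Bp, q):
    """Feature vector for interior pixel q = (row, col): the 12
    already-synthesized pixels of the 5x5 causal (L-shaped)
    neighborhood in B', then the full 5x5 neighborhood in B."""
    r, c = q
    causal_Bp = Bp[r-2:r+3, c-2:c+3].ravel()[:12]  # 12 values from B'
    window_B  = B[r-2:r+3, c-2:c+3].ravel()        # 25 values from B
    return np.concatenate([causal_Bp, window_B])   # 37 dimensions
```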

Given a feature vector for a pixel in B', we need to find the best match. There are two candidates for the best match.

First, we find the nearest neighbor among all of the feature vectors for A. I used the FLANN library to quickly find approximate nearest neighbors. However, FLANN's index structures require all feature vectors to have the same length (37 dimensions), so I had to brute-force the image boundaries, where neighborhoods are truncated. This brute-force pass was the bottleneck of the algorithm, so I attempted to speed it up by creating additional FLANN indices for each edge, which would make the number of brute-forced pixels independent of image size (only the corners would remain). Doing so resulted in mysterious segfaults inside FLANN, however, so I never found out whether it would have been faster.
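A sketch of the index construction and query. I use scipy's cKDTree as a stand-in for the FLANN index here (the query pattern is the same, but cKDTree is exact where FLANN is approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_search_index(A, Ap):
    """One 37-D feature vector per interior pixel of A, indexed with
    a KD-tree.  A plays the role of B and A' the role of B' in
    feature_vector() from the previous snippet."""
    pts = [(r, c) for r in range(2, A.shape[0] - 2)
                  for c in range(2, A.shape[1] - 2)]
    feats = np.array([feature_vector(A, Ap, p) for p in pts])
    return pts, feats, cKDTree(feats)

def approximate_match(tree, pts, fq):
    """Nearest feature vector in A for query fq: returns the source
    pixel coordinate and its feature distance."""
    dist, i = tree.query(fq)
    return pts[i], dist
```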

Second, we find a coherence match: the best pixel that is coherent with the already-synthesized portion of B' adjacent to q. To do this, we maintain a data structure mapping each synthesized pixel of B' back to its source pixel in A; for each synthesized neighbor of q, we look up its source, shift it to align with q, and compute the distance between the corresponding feature vectors. We use a constant, kappa, to favor the coherence matches, because they tend to look better even when their feature distance is greater.
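Putting the two candidates together, the selection looks roughly like this single-scale sketch (coherence_match is not shown; it scans the already-synthesized neighbors of q, looks up their sources in src, shifts each to align with q, and returns the candidate with the smallest feature distance):

```python
def best_match(tree, pts, feats, src, B, Bp, q, kappa):
    """Choose between the approximate match and the coherence match.
    kappa > 0 biases the choice toward coherence, which tends to look
    better even at a somewhat larger feature distance."""
    fq = feature_vector(B, Bp, q)
    p_app, d_app = approximate_match(tree, pts, fq)
    p_coh, d_coh = coherence_match(feats, src, B, Bp, q)  # not shown
    # Single-scale version of the paper's test (the multiscale form
    # scales kappa by 2**(-level)).
    return p_coh if d_coh <= d_app * (1 + kappa) else p_app
```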

At this point we have filled in B' with luminance information from corresponding pixels in A'. Depending on the application, we obtain the remaining color information from either B or A'.
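A sketch of the recombination for the case where color comes from B, reusing the YIQ matrix from the earlier snippet:

```python
import numpy as np

def recombine_color(Bp_Y, B_rgb):
    """Attach B's chrominance (I, Q) to the synthesized luminance
    and convert back to RGB.  YIQ is the matrix defined earlier."""
    yiq = B_rgb @ YIQ.T
    yiq[..., 0] = Bp_Y                              # swap in synthesized Y
    return np.clip(yiq @ np.linalg.inv(YIQ).T, 0.0, 1.0)
```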

Results

I was able to get the algorithm working for basic filters, artistic renderings, and texture-by-numbers. I do not have results for texture synthesis or texture transfer. I originally thought that texture synthesis would follow directly from the image analogies algorithm, but after investigating the authors' web site, it seems they incorporated ideas from other papers and only meant that it is conceptually similar. I tried some ideas of my own, but none were successful, so I declared it outside the scope of this project :)

The paper gives a high-level overview of the algorithm, leaving plenty of details for me to fill in. As a result, my images look somewhat different from those online. Important parameters I varied include how pixels in each feature vector are weighted (sketched below) and how kappa is chosen. Also, for the sake of efficiency, I downsized A and A' to reduce the nearest-neighbor search space, which may have made my images coarser than those in the paper. On the other hand, all images here were generated in 1 to 5 minutes, while the paper reports that some of theirs took several hours (in 2001).
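By pixel weighting I mean a per-element scale on the feature vectors, so that pixels near the center of the window dominate the distance. A sketch of one way to build such weights, assuming a Gaussian falloff (sigma is one of the knobs I varied):

```python
import numpy as np

def gaussian_weights(size=5, sigma=1.0):
    """Flattened size x size Gaussian, peaked at the center.
    Multiplying feature vectors elementwise by the square root of
    these weights turns the plain L2 distance into a weighted one."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return (kernel / kernel.sum()).ravel()
```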

Basic Filters

Here is a blur filter learned through an analogy, with kappa=5:


Here is an interesting effect of learning the same filter with kappa=25 (strongly favoring coherence over accuracy):

Next is an emboss filter. Here is the luminance information (as before, the flower is the training pair and the shoreline displays the learned filter):

This shows how we combine the learned luminance with color from the original image:

Artistic Filters

You can compare these to the authors' results.

Pastel painting:


Watercolor painting:


My Halloween trollface... in watercolor:

Texture-by-Numbers

Again, the source images are from the paper's website. My results have a few more artifacts, including some water in places where it shouldn't be.


A couple more results with different kappa/Gaussian parameters:

Texture Synthesis Fail

Conceptually, texture synthesis is a special case of image analogies in which A and B are ignored: we have the texture as A' and build up B' from it. However, the manner in which my algorithm compares A' and B' is not sufficient for good texture synthesis. At first, my B' ended up as a copy of A'. I attempted to add some randomness, such as sampling from a probability distribution over the k nearest neighbors instead of always picking the closest match.
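A sketch of that sampling, reusing the KD-tree from the earlier snippets (temperature is my own knob, not something from the paper):

```python
import numpy as np

def sample_match(tree, pts, fq, k=5, temperature=1.0):
    """Instead of always taking the single nearest neighbor, draw one
    of the k nearest with probability decreasing in distance.  Larger
    temperature flattens the distribution, adding more randomness."""
    dists, idxs = tree.query(fq, k=k)
    w = np.exp(-dists / (temperature * dists.mean() + 1e-8))
    w /= w.sum()
    return pts[np.random.choice(idxs, p=w)]
```

Here's what I got as I varied some probabilities: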

Rope texture:

Rope textures???