Sergei Mikhailovich Prokudin-Gorskii (1863 - 1944) was a Russian photographer who won the Tsar's special permission to travel across the vast Russian Empire and take color photographs of the various facets of daily life. However, since there was no ready-made equipment for displaying color photographs, he recorded three exposures of every scene through three filters: red, green, and blue. These exposures were recorded on glass plates, which eventually found their way to the Library of Congress. This project takes the separate exposures and attempts to combine them into the color photographs they were intended to be.
The main problem to overcome was aligning the exposures correctly. This could be done by hand, but we want to automate the process. Two approaches were considered for measuring the "closeness" of the separate channels: the sum of squared differences (SSD) and normalized cross-correlation (NCC). In theory, NCC should perform better than SSD because the color channels have different lighting levels. After trial and error, however, there was little discernible difference between the two; SSD ran faster than NCC, so SSD was the algorithm of choice. Essentially, using the blue channel as the base, the red and green exposures were shifted within a window of some width and height and scored with SSD. The shift giving the smallest score was kept, and the three exposures were stacked together using the blue and the now-shifted red and green channels.
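The exhaustive SSD search described above can be sketched as follows. This is a minimal Python/NumPy analogue of the original (which appears to use MATLAB's circshift); np.roll plays the role of circshift, and the window radius here is an illustrative placeholder, not the value used in the project.

```python
import numpy as np

def ssd_align(base, channel, window=15):
    """Find the (dy, dx) shift of `channel` that minimizes the sum of
    squared differences (SSD) against `base`.

    `window` is a hypothetical search radius; the write-up uses a window
    whose size depends on the image.
    """
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Circular shift, analogous to MATLAB's circshift.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((base - shifted) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned shift is then applied to the red and green channels before stacking them with the blue base.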
The above was the naive solution for small images. It took too long to run on larger images when the search window size was proportional to the image size, as the run time of circshift is proportional to the total size of the matrix. Reducing the search space by even a single pixel on a high-resolution picture therefore gave a large boost in performance, but it also limited how far the channels could be shifted. Thus, an image pyramid was used: pictures were scaled by half recursively until an image smaller than 64-by-64 was obtained. At every stage, a Gaussian filter was applied to the image to reduce aliasing. A coarse-to-fine alignment algorithm was then used: starting from the smallest image, find the displacement with the lowest score, then move up the image pyramid. The displacement is scaled by two, and the red and green exposures are shifted by this initial displacement. The search is then run again from this starting point, and the new displacement is propagated and scaled up the pyramid until the original image has been aligned. This algorithm removes the need for a large search space at the higher levels of the pyramid, which speeds up the process tremendously.
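The coarse-to-fine scheme can be sketched as a recursive function. This is a hedged Python/NumPy sketch, not the project's MATLAB code: the blur sigma, base-case window, and refinement window are illustrative choices, and mode='wrap' is used only so the blur stays consistent with np.roll's circular shifting.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ssd_align(base, channel, window):
    # Exhaustive SSD search over shifts in [-window, window]^2.
    best, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            score = np.sum((base - np.roll(channel, (dy, dx), axis=(0, 1))) ** 2)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(base, channel, min_size=64):
    # Base case: the image is small enough for an exhaustive search.
    if max(base.shape) <= min_size:
        return ssd_align(base, channel, window=8)
    # Blur before halving to reduce aliasing, as described above.
    b = gaussian_filter(base, 1, mode='wrap')[::2, ::2]
    c = gaussian_filter(channel, 1, mode='wrap')[::2, ::2]
    # Align the coarse level and scale its displacement by two...
    dy, dx = pyramid_align(b, c, min_size)
    dy, dx = 2 * dy, 2 * dx
    # ...pre-shift by that displacement, then refine with a small window.
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    rdy, rdx = ssd_align(base, shifted, window=2)
    return dy + rdy, dx + rdx
```

Because each finer level only refines a coarse estimate, the search window at the expensive high-resolution levels stays tiny.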
At first the algorithm worked fine with the smaller images, but it did not work as well for the larger ones. Median filtering was tried first, since some spots and blemishes were observed in the pictures and a median filter can remove such outliers. However, there was no visible improvement in quality after applying it. Another approach was then taken: since SSD scores "closeness", was there a feature in the pictures that was artificially raising the closeness value, i.e. a feature common to all three channels but not part of the subject? It turned out that the borders on the edges of the plates are common to all the channels but are not part of the subject; they encouraged the channels to align according to the borders, which have no bearing on the alignment of the subjects themselves.
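The median-filtering attempt mentioned above can be illustrated as follows, with SciPy's median_filter standing in for whatever function the original used; the 3-by-3 neighborhood size is an assumption.

```python
import numpy as np
from scipy.ndimage import median_filter

# A 3x3 median filter removes isolated bright specks (salt noise)
# while preserving edges -- the kind of blemish removal tried above.
img = np.zeros((9, 9))
img[4, 4] = 1.0                    # a single bright speck
clean = median_filter(img, size=3)  # the speck's neighborhood median is 0
```

As the write-up notes, this cleaned up blemishes but did not improve the alignment, which pointed to the borders as the real culprit.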
Two approaches were considered for removing the borders. One was to compute some threshold values and crop out the border areas that did not meet the threshold. The other was to crop some arbitrary fraction of the picture. The first approach turned out to be ineffective because in some pictures the borders had intensities similar to parts of the photos, which made it impossible to set a threshold that removed the borders without also removing areas of the picture: either the borders were ignored, or parts of the picture were removed along with them. It was then noticed that the size of the vertical borders was roughly constant across all the images, so the second approach was used. After testing values from 5% to 25%, removing 10% of the picture from every edge returned the best results.
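The fixed-fraction crop is straightforward; a minimal sketch, assuming the channels are 2D arrays and the crop is applied before scoring:

```python
import numpy as np

def crop_borders(img, frac=0.10):
    """Crop `frac` of the image from every edge so the plate borders
    don't dominate the SSD score (10% worked best per the write-up)."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```

Scoring on crop_borders(base) against crop_borders(shifted) keeps the alignment driven by the subject rather than the borders.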
The picture on the left is the initial result before the removal of the borders; the picture on the right is the result after. The displacements of the red and green channels are shown below.
According to the algorithm, we apply a low-pass filter to one image and a high-pass filter to the other, then combine the two linearly. The low-pass filter was a Gaussian filter of size 17 with a sigma of 20, and the high-pass image was the original minus its low-pass version. The first two images on each row are the source images, and the next two are the possible permutations of high- and low-pass images, tagged with their respective 2D Fourier transforms.
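The combination step can be sketched as below, assuming grayscale float images. Note that SciPy's gaussian_filter is parameterized by sigma alone, so the kernel size of 17 mentioned above has no direct counterpart here, and a plain sum stands in for the linear combination.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(im_low, im_high, sigma=20):
    # Low-pass one image with a Gaussian...
    low = gaussian_filter(im_low, sigma)
    # ...high-pass the other by subtracting its own low-pass version...
    high = im_high - gaussian_filter(im_high, sigma)
    # ...and combine the two linearly (here, a simple sum).
    return low + high
```

Viewed up close, the high-pass content dominates; from a distance, only the low-pass image remains visible.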