Introduction
For the first checkpoint of project 4, we were supposed to:
- take pictures that are suitable for merging into a panorama
- estimate a homography in the least-squares sense
- warp the images together using the recovered homography
- perform image rectification
- blend the images into a mosaic
Then, in the second part of the project, the whole process of feature selection and correspondence matching was done automatically. I have also implemented some extras, like panorama discovery and stitching on a cylinder.
Picture Taking
The pictures were taken around campus using a
Pentax K-x.
Homography Estimation
Given two sets of corresponding points x1, x2, ..., xn and X1, X2, ..., Xn, we are interested in finding a homography that maps each xi to the corresponding Xi. But as the system we get is usually overconstrained, we are rather interested in the least-squares solution. With the bottom-right entry of the homography fixed to 1, each correspondence contributes two linear equations, so a set of n correspondences results in a system of 2n equations in 8 unknowns. This can be easily approximated in MATLAB using the backslash operator (i.e. multiplying by the pseudoinverse).
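Since the project code is in MATLAB, the following is only an illustrative NumPy sketch of the same least-squares setup (function and variable names are my own); np.linalg.lstsq plays the role of the backslash operator:

```python
import numpy as np

def estimate_homography(src, dst):
    """Least-squares homography H (3x3, with H[2,2] fixed to 1)
    mapping src points to dst points.  src, dst: (n, 2) arrays, n >= 4."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        # Each correspondence gives two linear equations in the eight
        # unknown entries of H:
        #   X = (h11*x + h12*y + h13) / (h31*x + h32*y + 1)
        #   Y = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.extend([X, Y])
    # Overconstrained 2n x 8 system, solved in the least-squares sense.
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exact correspondences the least-squares solution recovers the homography exactly; with noisy clicks it minimizes the squared algebraic error.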
Warping
I implemented inverse warping: we go over the resulting picture and map each of its pixels through the inverse of the homography. All of these coordinates are then fed to MATLAB's interp2, which interpolates the intensities from the original image.
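A minimal NumPy sketch of this inverse-warping loop, assuming a single-channel image; the hand-rolled bilinear helper stands in for MATLAB's interp2, and all names are my own:

```python
import numpy as np

def bilinear(img, sx, sy):
    """Bilinear sampling of img at float coordinates (sx, sy);
    points outside the image return 0."""
    h, w = img.shape
    valid = (sx >= 0) & (sy >= 0) & (sx <= w - 1) & (sy <= h - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    wx = sx - np.floor(sx)  # fractional parts = interpolation weights
    wy = sy - np.floor(sy)
    val = (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
           + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)
    return np.where(valid, val, 0.0)

def inverse_warp(img, H, out_shape):
    """Warp img by homography H: every output pixel is mapped through
    H^-1 and sampled from the source image."""
    Hinv = np.linalg.inv(H)
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    return bilinear(img, sx, sy).reshape(rows, cols)
```

Going backwards from the output avoids the holes that forward warping would leave.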
Image Rectification
Once we have the above algorithms implemented, we can perform image rectification. We pick four points such that they will become a square when seen from the perspective we are interested in. Then, we compute a homography to a square with known coordinates and simply warp.
Here are some results with this technique:
[Image: Original (scaled down to 500px width) | Rectified]
[Image: Original (scaled down to 500px width) | Rectified]
[Image: Original (scaled down to 500px width) | Rectified]
Creating Mosaics
The mosaics are created iteratively. If the images are 1, 2, ..., N, then at the i-th iteration we merge the result of stitching images 1 through i with image i+1. At each step we define a set of correspondence points from which we estimate the homography.
As the boundaries where the images meet are in general not going to be smooth, I have implemented feathering: for one of the pictures being merged we define a radial feathering mask that is 0 at the centre of the image and grows to 1 towards the edges. This makes the transitions smoother.
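The mask described above can be sketched in NumPy as follows; how the two images are combined with the mask is my assumption (the picture that owns the mask dominates at its own centre), and all names are hypothetical:

```python
import numpy as np

def radial_feather_mask(rows, cols):
    """Radial feathering mask: 0 at the image centre, growing to 1 at
    the corners (distance normalized by the centre-to-corner distance)."""
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r = np.hypot(ys - cy, xs - cx)
    return r / r.max()

def feather_blend(own, other, mask):
    """Blend with the mask of `own`: where the mask is 0 (own centre)
    the owning image dominates; towards the edges `other` takes over."""
    return (1 - mask) * own + mask * other
```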
Here are some results.
[Images: Original Images (click for larger version) | Stitched]
[Images: Original Images (click for larger version) | Stitched]
[Images: Original Images (click for larger version) | Stitched]
Automatic Stitching
First, the features are selected using the Harris corner detector. I have
used the implementation provided with the homework. This is an example
of the output from the Harris detector.
[Image: Harris detector output]
As you can see, far too many points are selected in the picture. To make the correspondence process easier, we select the strongest 500 points (or fewer), but we also take their distribution into consideration - we want to have them spread all over the picture. For this, we use Adaptive Non-Maximal Suppression (ANMS), as described in the provided paper by Brown et al. This is the result of applying this algorithm to the picture above.
[Image: Output of ANMS]
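A NumPy sketch of ANMS as I understand it from Brown et al.: each corner gets a suppression radius, the distance to the nearest sufficiently stronger corner, and we keep the corners with the largest radii (the robustness constant c_robust = 0.9 follows the paper; names are mine):

```python
import numpy as np

def anms(points, strengths, n_keep, c_robust=0.9):
    """Adaptive Non-Maximal Suppression.  radius[i] = distance from
    corner i to the nearest corner j with strengths[i] < c_robust *
    strengths[j]; keeping the n_keep largest radii yields strong corners
    that are also well spread over the image."""
    points = np.asarray(points, float)
    strengths = np.asarray(strengths, float)
    n = len(points)
    radii = np.full(n, np.inf)  # the global maximum keeps radius = inf
    for i in range(n):
        stronger = c_robust * strengths > strengths[i]
        if stronger.any():
            d = np.linalg.norm(points[stronger] - points[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(-radii)[:n_keep]
    return points[keep], radii[keep]
```

A weak corner sitting next to a strong one gets a tiny radius and is suppressed, even if its absolute strength would have made the top-500 cut.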
Once we have these points, we are ready to extract the actual feature descriptors. At each point we centre a 40x40 window (or smaller, if it is near the borders of the picture). Then we resize this window to 8x8 and put all of the entries in a 64-dimensional vector. This vector is bias/gain normalized by subtracting the mean and dividing by the standard deviation. These are some pictures of example extracted windows (they are shown before reduction to 8x8, so they are easier to identify).
[Image: Extracted features]
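The descriptor extraction can be sketched like this; the strided downsampling stands in for a proper blur-and-resize, the sketch assumes the point lies well inside the image (no border handling), and all names are mine:

```python
import numpy as np

def extract_descriptor(img, y, x, win=40, out=8):
    """One descriptor: a win x win window centred on (y, x), downsampled
    to out x out by striding, flattened to a 64-vector, then bias/gain
    normalized (zero mean, unit standard deviation).
    Assumes the window fits entirely inside img."""
    half = win // 2
    patch = img[y - half:y + half, x - half:x + half]
    step = win // out
    # Take one sample per step x step cell (crude stand-in for resizing).
    small = patch[step // 2::step, step // 2::step][:out, :out]
    v = small.astype(float).ravel()
    v -= v.mean()                  # remove bias
    std = v.std()
    return v / std if std > 0 else v  # remove gain
```

Normalizing away bias and gain makes the descriptors robust to brightness and contrast differences between the two photographs.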
Now that we have the 64-dimensional features, we have to match the features in one picture with the features in the other. This is done as described in the paper: we iterate over the feature points and greedily choose the nearest neighbour. If the distance ratio best_neighbor/second_best is below some threshold, then we add that pair.
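A NumPy sketch of this ratio-test matching (the threshold value is illustrative, names are mine):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Greedy nearest-neighbour matching with a ratio test: accept a
    pair only if the best match is clearly better than the second best,
    i.e. dist(best) < ratio * dist(second best)."""
    desc1 = np.asarray(desc1, float)
    desc2 = np.asarray(desc2, float)
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

The ratio test discards ambiguous features (e.g. repeated texture) for which several candidates look almost equally good.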
Unfortunately, some of the pairs will be outliers. Simply using a least-squares approximation will not work, as the outliers will have a big effect on the estimated parameters. Thus, we use RANSAC, which maximizes the number of inliers rather than minimizing a global error. In a loop of 10 000 iterations (a constant chosen empirically), we randomly choose 4 corresponding pairs, estimate a homography from them, and count how many points are inliers under that homography. This turned out to be very robust and gave some really good results. Below some examples are shown.
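The RANSAC loop can be sketched as below; the least-squares fit is the one from the homography section, and the pixel tolerance and smaller iteration count are illustrative choices of mine:

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography with H[2,2] fixed to 1."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        b.extend([X, Y])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iter=2000, tol=3.0, rng=None):
    """Repeatedly fit a homography to 4 random pairs and keep the model
    with the most inliers (points reprojected within tol pixels), then
    refit on all inliers of the best model."""
    rng = np.random.default_rng(rng)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    best_inliers = np.zeros(n, bool)
    for _ in range(n_iter):
        idx = rng.choice(n, 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        p = np.hstack([src, np.ones((n, 1))]) @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = p[:, :2] / p[:, 2:3]
            inliers = np.linalg.norm(proj - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Because each model is scored by inlier count, a handful of grossly wrong matches cannot drag the estimate the way they would in a single global least-squares fit.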
[Image: Ran on the same data as the one obtained manually]
[Image: Ran on the same data as the one obtained manually]
[Image: Ran on the same data as the one obtained manually]
[Images: Original Images (click for larger version) | Stitched]
Bells & Whistles
Panorama Discovery
Sometimes, when we have a large dataset of pictures, we would like to automate the whole process even further, so that panoramas are discovered and stitched automatically - the user has to do zero work. I do this as follows. I create an undirected weighted graph where each node corresponds to a different picture, and the weight of an edge is the fraction of points that were inliers when running RANSAC on that pair. Weak edges are discarded, as they probably connect unrelated pictures. Then, the connected components are extracted; each component should (hopefully) correspond to a different panorama. Now, for each component I do a breadth-first search and join the images with the current panorama as they are discovered. I put all the files from the above panoramas in the same folder and got the same panoramas as above.
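The grouping step can be sketched as follows; the edge threshold, the data layout (a dict of pairwise inlier fractions), and all names are my own assumptions:

```python
def panorama_groups(n_images, inlier_fractions, threshold=0.3):
    """Group images into panoramas: one graph node per image, an edge
    for every pair whose RANSAC inlier fraction clears the threshold,
    then return the connected components via breadth-first search.
    inlier_fractions maps (i, j) pairs to inlier fractions."""
    adj = {i: [] for i in range(n_images)}
    for (i, j), frac in inlier_fractions.items():
        if frac >= threshold:  # weak edges: probably unrelated pictures
            adj[i].append(j)
            adj[j].append(i)
    seen, groups = set(), []
    for start in range(n_images):
        if start in seen:
            continue
        # BFS from each unvisited node; the visit order is also the
        # order in which images would be joined to the growing panorama.
        queue, comp = [start], []
        seen.add(start)
        while queue:
            u = queue.pop(0)
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        groups.append(comp)
    return groups
```

Images that match nothing end up as singleton components and can simply be ignored.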
Blending & Compositing
We can also easily play with pictures, composing them in various ways.
These are some results.
Cylindrical Mapping
Instead of doing planar mapping, we can first map the images to a cylinder and
then estimate a translation between them. For cylinder mapping, I have used the
equations in Szeliski's book. The map estimation was done using a very simple
algorithm (similar to RANSAC), where we go over all feature points, we
compute the translation between the point and its 'image' (two points define
a translation) and we choose the translation with most inliers.
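The cylindrical projection can be sketched as an inverse mapping, following the form of the equations in Szeliski's book; the image centre as projection centre, the focal length f in pixels, and all names are my assumptions:

```python
import numpy as np

def cylindrical_coords(rows, cols, f):
    """For every pixel (x', y') of the cylindrical image, return the
    (x, y) coordinates to sample in the planar source image:
        theta = (x' - xc) / f      x = f * tan(theta) + xc
        h     = (y' - yc) / f      y = h * f / cos(theta) + yc
    (xc, yc) is the image centre; f is the focal length in pixels."""
    yc, xc = (rows - 1) / 2.0, (cols - 1) / 2.0
    yp, xp = np.mgrid[0:rows, 0:cols].astype(float)
    theta = (xp - xc) / f
    h = (yp - yc) / f
    x = f * np.tan(theta) + xc
    y = h * f / np.cos(theta) + yc
    return x, y
```

These source coordinates can then be fed to the same bilinear sampler used for inverse warping; after this projection, rotating the camera about its vertical axis shows up as (approximately) a pure horizontal translation, which is why a translation-only model suffices.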
I did not manage to get the applet to work - it either hangs on 'Loading Image' or is not able to find the IVR file; I tried both relative and absolute addressing. Below I show a result of this procedure, but note that it does not cover a full 360 degrees and was not taken with a tripod (unfortunately, I don't have one :( ). Also, I picked a value for the focal length that made the result look good.