Automatic Generation of An Infinite Panorama

By: Lisa H. Chan
Advised by: Alexei A. Efros and James Hays

Page Creation: November 26, 2007.
Updated: December 20, 2007.



Initial Proposal

InfPan_Proposal.pdf, 550KB


Paper

AutoGen_InfinitePan.pdf, 2.47MB


Abstract

This study explores the possibility of using image completion combined with a large image database to create an infinite panorama. The algorithm performs scene matching, using a portion of the original input image to find the best-matching neighboring scenes, and then composites these images seamlessly. Even though the automatically generated panoramas may not be convincingly realistic, they are nevertheless aesthetically pleasing and artistic.


Introduction

With the traditional methods of creating a panorama, considerable time and effort are needed to capture the necessary images on location and then to composite them into a panorama in a photo-editing program. Another key requirement of this approach is that the panorama must have been premeditated: if the desire for a panorama arises only after image capture, the only option is to travel back to the original location and capture the necessary images. This option is not only costly in time and money, but often effectively impossible, since the mise en scène and plenoptic function will have changed. A possible solution to this problem is image completion using a large image database.


Methods

The scene matching and image compositing methods are largely based on previous work by Hays and Efros [1] and are therefore only briefly described here. The overall algorithm used in this study is summarized below:

Given an input image, 1/3 of the image is removed and preserved for later compositing, while a gist descriptor is computed from the remaining 2/3. Scene matching is then performed using a gist distance, the sum of squared differences (SSD) between the gist of the input image and each gist descriptor in the database, combined with a color distance computed in the L*a*b* color space. Among the top 200 nearest-neighbor scenes, SSD is used again to find the best match along the overlap region between the input image and each candidate scene. After a neighboring image is selected, graph cut seam finding [2] combined with standard Poisson blending [3] is used to composite the two images. Once the images are blended, the output image can be fed back into the algorithm as a new input image to begin the search for the next neighboring scene.

The result of the overall algorithm is many smaller pieces of a large panorama. Because Poisson blending changes the coloring of the images, a simple cut and paste cannot assemble these pieces; instead, a two-level Laplacian blend is used to merge all the pieces into the final panoramic image.
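The scene-matching distance described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the gist dimensionality, thumbnail size, and the weight balancing the color term against the gist term are all assumptions, and the descriptors here are random stand-ins for a real database.

```python
import numpy as np

def scene_match_distances(query_gist, query_lab, db_gists, db_labs,
                          color_weight=0.1):
    """Combined scene-matching distance: SSD between gist descriptors
    plus an SSD color distance on small L*a*b* thumbnails.
    The color weight is illustrative."""
    # SSD between the query gist and every database gist (flattened vectors)
    gist_d = np.sum((db_gists - query_gist) ** 2, axis=1)
    # SSD color distance on downsampled L*a*b* images
    color_d = np.sum((db_labs - query_lab) ** 2, axis=(1, 2, 3))
    return gist_d + color_weight * color_d

# Toy stand-in database: 500 scenes with 512-D gists and 8x8 Lab thumbnails
rng = np.random.default_rng(0)
db_gists = rng.random((500, 512))
db_labs = rng.random((500, 8, 8, 3))
query_gist, query_lab = db_gists[42], db_labs[42]

dist = scene_match_distances(query_gist, query_lab, db_gists, db_labs)
top200 = np.argsort(dist)[:200]   # nearest-neighbor candidates for seam search
```

In the full pipeline, the 200 candidates returned here would then be re-ranked by SSD over the overlap region before seam finding and blending.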


Results

The results from this study are shown below. Overall, the images show seamless transitions between the added images, even though the scene changes are at times rather obvious. On the right side of each panorama, the scene has drifted from the original high-frequency input image on the left into a region of low-frequency image additions. Once low-frequency images are added to the panorama, any further image search tends to remain in the low-frequency region and never recovers a scene with high frequencies. Some possible solutions have already been attempted and are presented in the Solutions section, while others are discussed in the Future Work section.



Solutions

  1. Stitching within one search:
    In this large panorama, only 80 of the top 200 nearest-neighbor scenes were actually added onto the original input image. The large reduction is due either to the results being entirely black or white, or to the images failing to fit into the "hole". As can be seen, the scenes stitched together do not fit well as neighbors, and at times half an animal or person is cut off.

  2. Maintaining total energy of the gist descriptor:
    In the following panoramas, an additional term was added to the distance calculation during the scene-matching search: the total energy of the input gist descriptor was compared to the total energy of each gist descriptor in the database, in order to favor images with similar energies. Both the energy of the gist computed from the entire original image and from the 2/3 input image were tried. The results show that the panoramas look worse than without the energy term.
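    The energy variant above amounts to one extra term in the distance. The sketch below is illustrative only: the energy weight is an arbitrary choice, not a value from the experiments, and "total energy" is taken here to mean the sum of squared descriptor responses.

```python
import numpy as np

def distance_with_energy(query_gist, db_gists, energy_weight=1.0):
    """Gist SSD augmented with a term penalizing differences in total
    descriptor energy (sum of squared responses); weight is illustrative."""
    gist_d = np.sum((db_gists - query_gist) ** 2, axis=1)
    q_energy = np.sum(query_gist ** 2)
    db_energy = np.sum(db_gists ** 2, axis=1)
    return gist_d + energy_weight * (db_energy - q_energy) ** 2

# Toy stand-in descriptors: the identical descriptor has zero distance
rng = np.random.default_rng(1)
db_gists = rng.random((100, 512))
d = distance_with_energy(db_gists[7], db_gists)
best = int(np.argmin(d))
```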
    Using energy of the gist calculated from 2/3 original image

    Using energy of the gist calculated from entire original image



Future Work



Just for Fun

Here is a fun comparison between manually selecting the image to composite and the automatic selection. I truly feel that the manual selection looks more aesthetically pleasing, but it simply requires too much time to generate a panorama of considerable length.
Human Selection
Automatic Selection



Examples of Failures:

Below are manually selected examples of failures that occurred when all of the top 200 images were composited onto the original image. Once the fully automatic pipeline was finalized, the SSD over the overlap region was able to rule out these failures when compositing the panoramas.

[Image grids: for inputs consisting of 1/2, 2/3, and 3/4 of the original image, each row shows the input image followed by matches #1 through #5.]



References:

[1] J. Hays, A. A. Efros, Scene Completion Using Millions of Photographs, SIGGRAPH 2007, Los Angeles, 2007.
[2] V. Kwatra, A. Schödl, I. Essa, G. Turk, A. Bobick, Graphcut Textures: Image and Video Synthesis Using Graph Cuts, ACM Trans. Graph. 22 (2003) 277-286.
[3] P. Pérez, M. Gangnet, A. Blake, Poisson Image Editing, ACM Trans. Graph. 22 (2003) 313-318.


Comments, questions to lisachan@andrew.cmu.edu.