Programming Assignment 1: IMAGE MOSAICING

15-869, Image-Based Modeling and Rendering

Due: Sunday, September 19, at midnight
Revision 2, Sept. 13, 1999

In this assignment you will take two or more photographs and create an image mosaic by registering, projectively warping, resampling, and compositing them. As background for this assignment, read "Projective Mappings for Image Warping" by Paul Heckbert and "Image Mosaicing for Tele-Reality Applications" by Richard Szeliski, Digital CRL 94/2, 1994 (a revised version appeared in IEEE Computer Graphics & Applications, Mar. 1996).

I'd prefer that you work individually, but if you would like to work in a team of two, speak to me.

The steps of the assignment are:

shoot and digitize pictures
register them
warp and mosaic them
submit your results

Shoot and Digitize Pictures

Shoot two or more photographs so that the transforms between them are projective (a.k.a. perspective). One way to do this is to shoot from the same point of view but with different view directions, and with overlapping fields of view. Another way to do this is to shoot pictures of a planar surface (e.g. a wall) from different points of view.

The options for digitization may dictate your choice of camera:

Digital camera: use yours, or borrow one from us or a friend. A digital camera may be harder to get hold of, and its spatial resolution is typically lower than that of photographic film, but once you have one you get instant feedback on the pictures you've taken and getting them on-line is easy, so this is the recommended method. We recommend you use a resolution of at least 640x480 pixels, and capture and save the pictures in the highest quality mode available. For the cameras we can loan, a machine running Windows 9x/NT is required to transfer the pictures after shooting.
Photo CD from slide or print film. You can shoot on standard photographic film and most photo labs can have your images digitized onto a Photo CD or Picture CD. These are two different CD ROM formats for color images. Either is quite easy to manipulate with image processing programs such as Photoshop. Call a photo lab to get prices. This is a good route if you want top quality and you're going to be shooting more than 10 images, say, but it's more expensive than the options below.
Scan color prints. You could shoot standard color print film, get it developed, and then scan the images using the scanners in Wean 3501. For CS grad students, your NC 58 key will get you in. When I last used the scanner there, the procedure was: put the photo in the Silverscan flatbed scanner; on the Mac, run Photoshop; on the menu, choose Acquire, then Silverscan; scan at 100-200 pixels per inch (recommended for 4"x6" prints; any more and the files will be huge); crop if desired; write the picture to a file (TIFF format is good). It is important that you transfer all of your picture files to another machine (e.g. by Appleshare) and clean up after yourself, since these are public machines.
Scan color slides. This is probably less desirable. There is a slide scanner in the Robotics Graphics Deli in Smith Hall. Ask Debra Tobin if you can use it.

We're not particular about how you take your pictures or get them into the computer, but we recommend:

Avoid fisheye lenses or lenses with significant barrel distortion (do straight lines come out straight?). Any focal length is ok in principle, but wide angle lenses often make more interesting mosaics.
Shoot as close together in time as possible, so your subjects don't move on you, and lighting doesn't change too much.
Use identical aperture & exposure time, if possible. On most "idiot cameras" you don't have control of this, unfortunately. It's nice to use identical exposures so that the images will have identical brightness in the overlap region.
Overlap the fields of view significantly. 20% to 50% overlap is recommended. Too little overlap makes registration harder.
It's OK to vary the zoom between pictures.
You can use pictures that you shot specially for this class, or pictures you shot previously. If you can't use your own pictures, talk to Paul.

If you're shooting a non-planar scene, then shoot all the pictures from the same position (turn the camera, but don't translate it). A tripod can help with this, particularly if objects are close.

Good scenes are: building interiors with lots of detail, inside a canyon or forest, tall waterfalls, panoramas. The mosaic can extend horizontally, vertically, or can tile a sphere. You might want to shoot several such image sets and choose the best.

Shoot & digitize your pictures early - leave time to re-shoot in case they don't come out! Print and lay out your photos on a table to see approximately what the mosaic will look like.

Register Images

To register the images, for each pair of overlapping images, find four or more corresponding points, get their coordinates, and compute the transformation from one image to the other. We recommend that you use the starter code provided in the class's pub/src/asst1 directory, in particular, mosaic.cxx. Currently, this program loads a single image, lets you click on points, and prints their pixel coordinates. It's written in C++, and it uses OpenGL for graphics and FLTK for the user interface. We suggest that you modify it to create a program that loads multiple pictures (side-by-side or sequentially), lets you click on four or more corresponding points for each pair, and solves the linear system of equations for the coefficients of the projective transform. Tips: If you let the user zoom in on the image, points can be digitized with fractional pixel precision, which improves the registration. Placing the digitization points as far apart as possible will also improve accuracy.

If you digitize n points, you'll be solving a system of 2n equations in 8 unknowns. See the Projective Mappings paper mentioned earlier. If n=4 then you can turn it into an 8x8 linear system of equations. If n>4 then the system is overdetermined, and requires more work to solve. (In the overdetermined case, depending on whether you work with the rational or linearized versions of the equations given in the above paper, you get either a nonlinear or a linear overdetermined system of equations, either of which can be solved by least squares methods. The nonlinear approach would probably give more accurate registration, but the linear approach is far easier to implement using the "normal equations" described in any linear algebra textbook.) You can find code to solve a linear system in the VL library (discussed on the course software web page), in the Numerical Recipes book, or in numerous other sources. Any reasonably accurate method should suffice here. The math for projective image warps will be covered in lecture.
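As a rough illustration of the linear approach, here is a minimal sketch, not the starter code. It assumes the transform is written x = (a*u + b*v + c)/(g*u + h*v + 1), y = (d*u + e*v + f)/(g*u + h*v + 1), which may not match the notation of the paper, and all names in it (Point, Homography, solve8, fitHomography, applyHomography) are made up for this example. Each correspondence contributes two linearized equations; the normal equations reduce any n >= 4 to an 8x8 system, solved here by Gaussian elimination with partial pivoting.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Hypothetical storage: row-major 3x3 matrix [a b c; d e f; g h 1].
using Homography = std::array<double, 9>;

// Solve the 8x8 system M p = r by Gaussian elimination with partial
// pivoting (M and r are overwritten). Returns false if M looks singular.
bool solve8(double M[8][8], double r[8], double p[8]) {
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int row = col + 1; row < 8; ++row)
            if (std::fabs(M[row][col]) > std::fabs(M[pivot][col])) pivot = row;
        if (std::fabs(M[pivot][col]) < 1e-12) return false;
        for (int k = 0; k < 8; ++k) std::swap(M[col][k], M[pivot][k]);
        std::swap(r[col], r[pivot]);
        for (int row = col + 1; row < 8; ++row) {
            double factor = M[row][col] / M[col][col];
            for (int k = col; k < 8; ++k) M[row][k] -= factor * M[col][k];
            r[row] -= factor * r[col];
        }
    }
    for (int row = 7; row >= 0; --row) {  // back substitution
        double s = r[row];
        for (int k = row + 1; k < 8; ++k) s -= M[row][k] * p[k];
        p[row] = s / M[row][row];
    }
    return true;
}

// Least-squares fit of the transform taking src[i] to dst[i], n >= 4.
bool fitHomography(const std::vector<Point>& src,
                   const std::vector<Point>& dst, Homography& H) {
    if (src.size() < 4 || src.size() != dst.size()) return false;
    double AtA[8][8] = {}, Atb[8] = {}, p[8] = {};
    for (std::size_t i = 0; i < src.size(); ++i) {
        double u = src[i].x, v = src[i].y, x = dst[i].x, y = dst[i].y;
        // Two linearized equations per correspondence.
        double rows[2][8] = {{u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x},
                             {0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y}};
        double rhs[2] = {x, y};
        for (int e = 0; e < 2; ++e)
            for (int j = 0; j < 8; ++j) {
                Atb[j] += rows[e][j] * rhs[e];       // accumulate A^T b
                for (int k = 0; k < 8; ++k)
                    AtA[j][k] += rows[e][j] * rows[e][k];  // accumulate A^T A
            }
    }
    if (!solve8(AtA, Atb, p)) return false;
    for (int j = 0; j < 8; ++j) H[j] = p[j];
    H[8] = 1.0;
    return true;
}

// Apply H to a point, including the projective divide.
Point applyHomography(const Homography& H, Point q) {
    double w = H[6] * q.x + H[7] * q.y + H[8];
    return {(H[0] * q.x + H[1] * q.y + H[2]) / w,
            (H[3] * q.x + H[4] * q.y + H[5]) / w};
}
```

Calling fitHomography(src, dst, H) with the clicked points of one image as src and the matching points of the other as dst fills in H. Forming the normal equations squares the condition number of the system, so with raw pixel coordinates in the hundreds it may help to rescale the coordinates first (for example to the range 0 to 1) and undo the scaling afterwards.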

Warp and Mosaic the Pictures

Warp the images so they're registered and create an image mosaic. Instead of having one picture overwrite the other, which would lead to strong mosaic artifacts, use weighted averaging. You can leave one image unwarped and warp the other image(s) into its projection, or you can warp all images into a new projection; it's up to you.

You'll probably need to transform some picture corners and find a bounding box to determine how big your output image will be. From there, the recommended algorithm is to scan out those pixels in scanline order. For each output pixel, loop over all input pictures and

projectively transform that point into the coordinate system of that input picture
compute a weight for that picture (zero if outside the picture or at edge, peak weight at center)
if the weight is positive, interpolate a color from the input picture using bilinear interpolation (if you use point sampling instead of some form of interpolation, your warped pictures will look jaggy)

From these weights and colors, compute the weighted average at each output pixel and write it out (if the registration is perfect, the image exposures are identical, there is no vignetting, and the scene was static, the colors being averaged here will be equal, but life is rarely that simple).
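The pieces of the loop above can be sketched as follows. This is only one possible arrangement, under these assumptions: pixel centers sit at integer coordinates, the weight is a separable "tent" function, and the names (Box, warpedBounds, tentWeight, Accumulator) are invented for this example. Note that the bounding box uses the transform from an input picture to the output, while the per-pixel loop needs its inverse.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

struct Point { double x, y; };
struct Color { double r, g, b; };
using Homography = std::array<double, 9>;  // row-major 3x3 (hypothetical layout)

Point applyHomography(const Homography& H, Point q) {
    double w = H[6] * q.x + H[7] * q.y + H[8];
    return {(H[0] * q.x + H[1] * q.y + H[2]) / w,
            (H[3] * q.x + H[4] * q.y + H[5]) / w};
}

struct Box { double xmin, ymin, xmax, ymax; };

// Bounding box of the four corners of a w-by-h picture after mapping by H,
// where H takes input-picture coordinates to output coordinates.
Box warpedBounds(const Homography& H, int w, int h) {
    const double inf = std::numeric_limits<double>::infinity();
    const Point corners[4] = {{0.0, 0.0}, {w - 1.0, 0.0},
                              {w - 1.0, h - 1.0}, {0.0, h - 1.0}};
    Box b = {inf, inf, -inf, -inf};
    for (const Point& c : corners) {
        Point q = applyHomography(H, c);
        b.xmin = std::min(b.xmin, q.x);
        b.ymin = std::min(b.ymin, q.y);
        b.xmax = std::max(b.xmax, q.x);
        b.ymax = std::max(b.ymax, q.y);
    }
    return b;
}

// Tent weight: zero at or beyond the picture edge, rising to 1 at the center.
double tentWeight(Point q, int w, int h) {
    double wx = 1.0 - std::fabs(2.0 * q.x / (w - 1.0) - 1.0);
    double wy = 1.0 - std::fabs(2.0 * q.y / (h - 1.0) - 1.0);
    return (wx > 0.0 && wy > 0.0) ? wx * wy : 0.0;
}

// Running weighted average of colors for one output pixel.
struct Accumulator {
    double r = 0.0, g = 0.0, b = 0.0, wsum = 0.0;
    void add(Color c, double weight) {
        r += weight * c.r;
        g += weight * c.g;
        b += weight * c.b;
        wsum += weight;
    }
    // Black where no input picture contributes.
    Color result() const {
        if (wsum <= 0.0) return {0.0, 0.0, 0.0};
        return {r / wsum, g / wsum, b / wsum};
    }
};
```

For each output pixel you would create an Accumulator, map the pixel into each input picture with the inverse transform, call tentWeight on the result, and add the interpolated color whenever the weight is positive. The union of the warpedBounds boxes gives the output size; this is only reliable when the denominator of the transform keeps the same sign at all four corners.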

Bugs in the bilinear interpolation code are common because they're easily overlooked. (It's recommended that you test this code by doing a special test where you zoom up a picture by a factor of 10. The resulting picture should have no step discontinuities in intensity.)
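A minimal sketch of bilinear interpolation and of the zoom test, assuming a single-channel image with pixel centers at integer coordinates. GrayImage, sampleBilinear, and maxStepWhenZoomed are hypothetical names standing in for whatever image class and routines your program uses; a color version would repeat the same arithmetic per channel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Single-channel, row-major image (hypothetical stand-in type).
struct GrayImage {
    int w, h;
    std::vector<double> pix;
    double at(int x, int y) const { return pix[y * w + x]; }
};

// Bilinearly interpolate at (x, y). Coordinates outside the picture are
// clamped to its border.
double sampleBilinear(const GrayImage& im, double x, double y) {
    x = std::min(std::max(x, 0.0), im.w - 1.0);
    y = std::min(std::max(y, 0.0), im.h - 1.0);
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    int x1 = std::min(x0 + 1, im.w - 1);  // stay inside on the last column
    int y1 = std::min(y0 + 1, im.h - 1);  // stay inside on the last row
    double fx = x - x0, fy = y - y0;
    double top = (1.0 - fx) * im.at(x0, y0) + fx * im.at(x1, y0);
    double bottom = (1.0 - fx) * im.at(x0, y1) + fx * im.at(x1, y1);
    return (1.0 - fy) * top + fy * bottom;
}

// Largest jump between neighboring samples along the top row when the
// picture is magnified by `zoom`. With correct interpolation this is about
// 1/zoom of the largest difference between adjacent input pixels.
double maxStepWhenZoomed(const GrayImage& im, int zoom) {
    double worst = 0.0;
    double prev = sampleBilinear(im, 0.0, 0.0);
    for (int i = 1; i <= (im.w - 1) * zoom; ++i) {
        double cur = sampleBilinear(im, static_cast<double>(i) / zoom, 0.0);
        worst = std::max(worst, std::fabs(cur - prev));
        prev = cur;
    }
    return worst;
}
```

On a row that alternates between 0 and 10, a zoom factor of 10 should give steps of about 1; point sampling would instead show steps of 10.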

Try to get the registration and compositing working well enough that the seams become invisible. If the exposures or processing of your pictures differ markedly, you may find it helpful to do some color correction (use Photoshop, say). Vignetting (darkening on edges of image due to the lens optics) can also be a slight problem.

Although you could get results close to this using OpenGL texture mapping, we want you to write code to do the warping and compositing yourself for this assignment.

If your mosaic spans more than 180 degrees, you'll need to break your mosaic into pieces (as in cubical environment maps), or else use non-projective mappings, e.g. spherical or cylindrical projection.
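For the cylindrical option, one common form of the mapping is sketched below. It assumes things this handout does not specify: that you know the focal length f in pixels, that (x, y) is measured from the image center, and that the cylinder's axis is vertical. The function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Map an offset (x, y) from the image center, in pixels, onto a cylinder of
// radius f (the focal length in pixels). The result is (f * angle, f * height).
Point planarToCylindrical(Point p, double f) {
    double theta = std::atan2(p.x, f);                        // angle around the axis
    double height = p.y / std::sqrt(p.x * p.x + f * f);       // height on a unit cylinder
    return {f * theta, f * height};
}

// Inverse mapping, valid while the angle stays within +/- 90 degrees.
Point cylindricalToPlanar(Point c, double f) {
    double theta = c.x / f;
    double x = f * std::tan(theta);
    double y = (c.y / f) * std::sqrt(x * x + f * f);
    return {x, y};
}
```

When scanning out a cylindrical output image, you would apply cylindricalToPlanar to each output pixel and then sample the input picture there, just as in the projective case.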

Submit Your Results

Put your code and executable in /afs/cs/project/classes-ph/869/students/yourname/asst1, and put your best pictures and a web page explaining and displaying them in a subdirectory asst1/www. (As of 9/15, the student directories have been created.) If you didn't get a directory but need one, send email to ph@cs. The asst1 directory will be private, while the asst1/www directory will be public (to permit students to view each others' results), so the latter should not contain code, and it should not contain links to private files. In asst1/www/index.html, put a web page (HTML) containing your original images, your final, best mosaic, and some explanation.

If you don't know HTML, examine the source to this web page, for example, to see how to do simple formatting and picture display.

High output resolution is desirable, but for the web page, please scale your images down to no more than 1200x1000 pixels. Converting your picture files from TIFF to JPEG will permit Netscape or Explorer to display them directly. We recommend Photoshop or the UNIX program convert for this.

On this web page, mark the exact correspondence points in the original images somehow (e.g. bright red pixels, or little X's). Include a few paragraphs explaining what you did: what the pictured object is, briefly how you shot the pictures and digitized them, a sentence or two on what the user does to specify the correspondence, how long the program took to run, the best and worst aspects of your results, and any unusual aspects of your solution (e.g. "My program is written in MATLAB" or "My output projection is cylindrical").

Extra Credit

Implement semi-automatic registration. A good balance of interaction and automation would be to have the user indicate approximate correspondence points (within a few pixels) and then have the system search locally, at sub-pixel precision, for the point that minimizes the sum of squared differences over a small window. Or go further and try the method from section 4.1 of Szeliski's paper. If your image warping code is fast enough, you could redisplay as the user "drags" and stretches the images.
Generate a panorama and put it in QuickTime VR file format so it can be displayed using one of the optimized QuickTime VR viewers.
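For the semi-automatic registration idea, a simplified sketch of the local search is given below. It searches whole-pixel offsets only; one way to reach sub-pixel precision, not shown here, would be to resample the window at fractional offsets with bilinear interpolation. GrayImage, ssd, Match, and refineMatch are names invented for this example, and single-channel images are assumed.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Single-channel, row-major image (hypothetical stand-in type).
struct GrayImage {
    int w, h;
    std::vector<double> pix;
    double at(int x, int y) const { return pix[y * w + x]; }
};

// True if the (2*half+1)-square window centered at (x, y) lies inside im.
bool windowFits(const GrayImage& im, int x, int y, int half) {
    return x - half >= 0 && y - half >= 0 && x + half < im.w && y + half < im.h;
}

// Sum of squared differences between the window centered at (ax, ay) in a
// and the window centered at (bx, by) in b. Both windows must fit.
double ssd(const GrayImage& a, int ax, int ay,
           const GrayImage& b, int bx, int by, int half) {
    double sum = 0.0;
    for (int dy = -half; dy <= half; ++dy)
        for (int dx = -half; dx <= half; ++dx) {
            double d = a.at(ax + dx, ay + dy) - b.at(bx + dx, by + dy);
            sum += d * d;
        }
    return sum;
}

struct Match { int x, y; double score; };

// Refine the user's guess (gx, gy) in b for the point (ax, ay) in a by
// exhaustive search over offsets of up to `radius` pixels. If nothing can be
// evaluated, the guess is returned with an infinite score.
Match refineMatch(const GrayImage& a, int ax, int ay,
                  const GrayImage& b, int gx, int gy, int half, int radius) {
    Match best = {gx, gy, std::numeric_limits<double>::infinity()};
    if (!windowFits(a, ax, ay, half)) return best;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int cx = gx + dx, cy = gy + dy;
            if (!windowFits(b, cx, cy, half)) continue;
            double s = ssd(a, ax, ay, b, cx, cy, half);
            if (s < best.score) best = Match{cx, cy, s};
        }
    return best;
}
```

A window half-width of 2 to 5 pixels and a search radius matching how accurately the user clicks are plausible starting values, but both are guesses to tune on your own pictures.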

Change log: 9/14: corrected section on overdetermined systems, added link to my Projective Mappings paper.

Paul Heckbert