Programming Assignment 3: Single View Modeling

Due: Tue Oct 19, midnight

Revision 2. Oct 13.

In this assignment you will create 3D texture-mapped models from a single image using the method described in "Single View Metrology," by Criminisi, Reid, and Zisserman, ICCV 99.

The steps of the assignment are:

1. Image acquisition
2. Calculate vanishing points
3. Choose reference points
4. Compute textures and 3-D positions and create a VRML model
5. Submit results

Image Acquisition

For this assignment you should take high-resolution (preferably at least 800x800) images or scans of at least two different scenes. One of your images should be a sketch or painting. For instance, a photo of the Cathedral of Learning and a reproduction of Leonardo da Vinci's "The Last Supper" might be interesting choices. (We don't want everyone in the class to do these objects, however.) Note also that the object you digitize need not be monumental, or be a building exterior. An office interior or desk is also a possibility. At the other extreme, aerial photographs of a section of a city could also be good source material (you might have more occlusion in this case, necessitating some manual fabrication of textures for occluded surfaces). The images need not be in color. Be sure to choose images that accurately model perspective projection without fisheye distortions. You'll want to choose images that are complex enough to create an interesting model with at least ten textured polygons, yet not so complex that the resulting model is hard to digitize or approximate. See below regarding scanners and other resources.

Calculating Vanishing Points

Choose a scene coordinate frame by defining lines in the scene that are parallel to the X, Y, and Z axes. For each axis, digitize more than two lines parallel to that axis. The intersection of these lines in the image defines the corresponding vanishing point (with noisy measurements the lines will not meet exactly, so you will estimate a best-fit intersection). Since the accuracy of your model depends on the precision of the vanishing points, implement a robust technique for computing vanishing points that uses more than two lines. We recommend the method described by Collins (see links at bottom). The technique described in class will also work but may give less accurate results.

To compute vanishing points, choose line segments that are as long as possible and far apart in the image. Use high resolution images, and implement a zoom feature to specify line endpoints with sub-pixel accuracy. A small number of "good" lines is probably better than many inaccurate lines. You will save quite a bit of time by adding a "save" feature to your program so that you don't have to recalculate vanishing points every time you load an image. You could write them out in a simple ASCII file format of your own design, for example.
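As a concrete (though simplified) sketch of the multi-line fit: each segment's endpoints, written in homogeneous coordinates, define a line via the cross product, and the vanishing point that best satisfies all the line equations in the least-squares sense is the smallest right singular vector of the stacked line matrix. Collins' method adds proper statistical weighting on top of this idea; the function name below is just illustrative.

```python
import numpy as np

def vanishing_point(segments):
    """Estimate a vanishing point from two or more line segments.

    segments: list of ((x1, y1), (x2, y2)) image endpoints.
    Each segment gives a homogeneous line l = p1 x p2; the vanishing
    point v minimizes sum_i (l_i . v)^2 subject to ||v|| = 1, i.e. it
    is the smallest right singular vector of the stacked lines.
    """
    lines = []
    for (x1, y1), (x2, y2) in segments:
        l = np.cross([x1, y1, 1.0], [x2, y2, 1.0])
        lines.append(l / np.linalg.norm(l))  # normalize for conditioning
    _, _, vt = np.linalg.svd(np.array(lines))
    v = vt[-1]
    # Return inhomogeneous (x, y) when the point is finite.
    return v / v[2] if abs(v[2]) > 1e-12 else v
```

Note that a vanishing point can legitimately lie at infinity (parallel image lines), which is why the sketch only dehomogenizes when the third coordinate is safely nonzero.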

Choose Reference Points

To avoid affine distortions in your model, you will need to set the scale parameters as described in lecture and in the paper. One way of doing this is to measure, at the time you shoot the picture, the 3-D positions of four points on the reference plane and one point off of that plane. The four reference-plane points and their image projections define a 3x3 matrix H that maps u-v points to X-Y positions on the plane (using the same method for calculating H that you used for assignment 1). The fifth point determines the scale factor alpha off of the plane, as described in lecture and in the paper. Alternatively, you can specify H and alpha without physical measurement by identifying a regular structure such as a cube and choosing its dimensions to be unit lengths. This latter approach is necessary for paintings and other scenes in which physical measurements are not feasible.
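For reference, a minimal sketch of the standard four-point direct linear transform (the same H computation as in assignment 1): each correspondence contributes two rows to a homogeneous linear system, and H is the smallest right singular vector, reshaped to 3x3. Function names here are illustrative, not prescribed.

```python
import numpy as np

def homography(uv, xy):
    """H mapping image (u, v) points to plane (X, Y) positions.

    uv, xy: lists of four corresponding 2-D points.
    Each pair (u, v) <-> (x, y) gives two rows of the 8x9 system
    A h = 0; h is the smallest right singular vector of A.
    """
    A = []
    for (u, v), (x, y) in zip(uv, xy):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, u, v):
    """Map an image point through H and dehomogenize."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```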

Compute 3D Positions

The paper provides two different approaches for computing distances: in-plane measurements and out-of-plane measurements. You can combine the two to make even more measurements. For instance, once you have computed the height of a point X off of the reference plane P, you can compute the coordinates of any other point lying on the plane that passes through X parallel to P. By choosing more than one reference plane, you can extend your measurements further still. Be creative, and describe on your web page what you did to make measurements.
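The in-plane case reduces to applying H: map each image point to its (X, Y) position on the reference plane and measure in plane coordinates. A minimal sketch, assuming H has already been computed from your reference points (the out-of-plane case additionally needs alpha and the vanishing line, per the paper):

```python
import numpy as np

def plane_point(H, u, v):
    """Map image pixel (u, v) to (X, Y) on the reference plane via H."""
    p = H @ np.array([u, v, 1.0])
    return np.array([p[0] / p[2], p[1] / p[2]])

def plane_distance(H, uv_a, uv_b):
    """In-plane distance between two image points, in plane units."""
    return float(np.linalg.norm(plane_point(H, *uv_a) - plane_point(H, *uv_b)))
```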

Compute Texture Maps

Use the points you have measured to define several planar patches in the scene. Note that even though your measurements may be in horizontal or vertical directions, you can include planes that are slanted, such as a roof.

The last step is to compute texture maps for each of these patches. If the patch is a rectangle in the scene, e.g., a wall or door, all that is needed is to warp the quadrilateral image region into a rectangular texture image, using the code you wrote for assignment 1. It is best to choose the width and height of the texture image to be about the same as that of the original quadrilateral, to avoid loss of resolution. If the warp you perform scales down the image significantly along any direction, then you might find that bilinear interpolation does not filter sufficiently, and aliasing results. There are more elegant solutions, but a simple fix is to warp to a larger rectangle using a bilinear filter, and then filter that down to the desired size.
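The warp itself is an inverse mapping: for each texture pixel, compute the homography from the rectangle to the image quadrilateral, map back into the image, and sample bilinearly. A minimal sketch (slow pixel loop, border pixels clamped, no extra antialiasing; names are illustrative):

```python
import numpy as np

def quad_to_rect(img, quad, w, h):
    """Inverse-warp an image quadrilateral to a w-by-h texture image.

    quad: four (x, y) image corners, ordered to match the texture
    corners (0,0), (w-1,0), (w-1,h-1), (0,h-1).
    img: float array of shape (H, W) or (H, W, 3).
    """
    # Homography from texture rectangle to image quad (4-point DLT).
    rect = [(0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1)]
    A = []
    for (u, v), (x, y) in zip(rect, quad):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    H = np.linalg.svd(np.array(A, dtype=float))[2][-1].reshape(3, 3)

    tex = np.zeros((h, w) + img.shape[2:], dtype=float)
    for ty in range(h):
        for tx in range(w):
            p = H @ np.array([tx, ty, 1.0])
            # Clamp to the image so border samples stay in bounds.
            x = float(np.clip(p[0] / p[2], 0, img.shape[1] - 1))
            y = float(np.clip(p[1] / p[2], 0, img.shape[0] - 1))
            x0 = min(int(x), img.shape[1] - 2)
            y0 = min(int(y), img.shape[0] - 2)
            fx, fy = x - x0, y - y0  # bilinear weights
            tex[ty, tx] = ((1 - fx) * (1 - fy) * img[y0, x0]
                           + fx * (1 - fy) * img[y0, x0 + 1]
                           + (1 - fx) * fy * img[y0 + 1, x0]
                           + fx * fy * img[y0 + 1, x0 + 1])
    return tex
```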

If the patch is a non-rectangular region such as the outline of a person, you will need to perform the following steps: (1) define a quadrilateral in the image containing the region you want, (2) warp this into a rectangular texture image, as before, and (3) edit the texture image and mark out "transparent" pixels by hand using image editing software. You could choose a distinctive color or pixel value as a flag to indicate transparency.
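Once the flag color is painted in, converting it to transparency is a simple mask operation. A sketch, assuming magenta as the (arbitrary) flag color and an RGBA output; for VRML you would instead save the result as a transparent gif with the flag color as the transparent index:

```python
import numpy as np

def alpha_from_flag(tex, flag=(255, 0, 255)):
    """Build an RGBA texture from an RGB one.

    Pixels exactly matching the flag color (magenta here, an arbitrary
    choice) become fully transparent; all others are fully opaque.
    """
    mask = np.all(tex == np.array(flag), axis=-1)
    alpha = np.where(mask, 0, 255).astype(np.uint8)
    return np.dstack([tex.astype(np.uint8), alpha])
```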

Create a VRML model

For each image you work from, create a VRML model (see documentation below) with at least 10 texture-mapped polygonal faces. You should include two versions of the VRML model, one with the camera position shown and one without. The version without the camera will be easier to browse with the VRML viewer (because you can rotate about the center of the scene, not the scene + camera). You should also translate/rotate the model so that the initial view is similar to the input image, based on your knowledge of the camera position.
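Since your program already knows each patch's 3-D corners and texture file, it is convenient to have it write the VRML directly. A hedged sketch of a helper that emits one texture-mapped quadrilateral as a VRML 2.0 Shape node (see the sample commented VRML file linked below for the full file structure):

```python
def vrml_quad(corners, tex_url):
    """Return a VRML 2.0 string for one texture-mapped quadrilateral.

    corners: four (x, y, z) tuples in scene coordinates, ordered to
    match the texture corners; tex_url: texture image filename.
    """
    coords = ", ".join("%g %g %g" % c for c in corners)
    return """#VRML V2.0 utf8
Shape {
  appearance Appearance {
    texture ImageTexture { url "%s" }
  }
  geometry IndexedFaceSet {
    coord Coordinate { point [ %s ] }
    coordIndex [ 0 1 2 3 -1 ]
    texCoord TextureCoordinate { point [ 0 0, 1 0, 1 1, 0 1 ] }
    texCoordIndex [ 0 1 2 3 -1 ]
    solid FALSE
  }
}
""" % (tex_url, coords)
```

Setting `solid FALSE` makes the patch visible from both sides, which is usually what you want for cardboard-cutout style models.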

Submit Results

Put your code and executable in /afs/cs/project/classes-ph/869/students/yourname/asst3, and create a web page in asst3/www/index.html that contains:

• Source images: show them both in their original form and annotated with the points and lines you digitized. Give details on where you got each image (name of building, book and page number, artist, etc.).
• A still image of a new view of the reconstructed scene, fairly far from the input view.
• Some of your texture maps: show a few of the more interesting ones, commenting on any hand retouching you did (perhaps show before and after retouching, if it was significant).
• Include at least one non-quadrilateral object to make the scene more interesting.
• VRML files--for each input image, include one with the camera position marked and one without.
• A picture (screen snapshot) of your user interface.
• A description of your approach and analysis of the results. Comment on your design choices, what worked, what didn't. What hardware, operating system, and support libraries (e.g. fltk, OpenGL) did you use?
• Describe extensions that would be nice to include if you had more time.
The asst3 and asst3/www directories have been set up for you so that the former is private and the latter is public (the latter is a symbolic link to classdir/pub/results/yourname/asst3). If you put your web page in the wrong place then we might not find it.

Extra Credit

• Measurements from mosaics. Create an image mosaic, and then use the mosaic as your source image. You'll need to be extra-careful during the mosaic registration step to avoid introducing distortion.
• Merging models from multiple images. For instance, create a complete model of the Cathedral of Learning exterior from a few photographs that capture all four sides.

Resources

• Projective Geometry: The class web page has several links related to Criminisi's work that you should definitely look at. To brush up on projective geometry, you may want to take a look at the lecture notes, where Steve Seitz's slides and Bob Collins' writeup on computing vanishing points are available. There are also some projective geometry tutorials online.
• Source images: Browse the art and architecture books in Hunt Library.
• VRML: The Virtual Reality Modeling Language, a file format for interactive 3-D models (a.k.a. virtual worlds) on the Internet. The VRML repository has specifications for the file format and information on free VRML plugins that permit a web browser to display VRML models. See this on choosing a VRML viewer. If you have trouble getting the Cosmoplayer browser, try this. We have put a sample commented VRML file (it is a text file) online. If your browser has a VRML plugin (you can download these freely over the web), you should see a guy with sunglasses standing on a plank. The guy should look like a cardboard cutout (not a rectangle) if transparency is working. The two texture gif files are in the same directory. More on the VRML file format. Note that we'll only be using a fraction of VRML's capabilities.
• Image Editing Tools: We recommend Photoshop on Macintosh and PC, and gimp on Unix. Be sure to choose an image editor that supports transparent gifs.
• Scanners: See the scanner tips from assignment 1.
Steve Seitz