Programming Assignment 2: View Transformation

15-869, Image-Based Modeling and Rendering

Due: October 3, 1999 at midnight

Version 3: Sept. 28.

In this assignment you will acquire a range scan of your face or some other object and write image-warping code to render it from different camera viewpoints using McMillan's image warping and visibility technique.

Reading for this assignment:

View Interpolation for Image Synthesis, Eric Chen and Lance Williams, SIGGRAPH '93 (PDF without color figures).
Head-Tracked Stereoscopic Display Using Image Warping, Leonard McMillan and Gary Bishop, Stereoscopic Displays and Virtual Reality Systems II, SPIE Proceedings 2409, Feb. 1995.
Optional: see also chapters 3 and 6 of McMillan's PhD Thesis, if the transformation derivation or algorithm from his SPIE paper is unclear.

The first paper is for background. The second paper describes the algorithm you should use. McMillan has written other papers on extending this technique to cylindrical images, but we'll stick with the simpler planar perspective case discussed in this paper.

The steps of the assignment are:

Scan an object
Convert the data
Write code to warp the image to create new views
Submit result images

Scanning

We recommend you use the Minolta Vivid 700 scanner in Martial Hebert's lab. This is a 3D laser range finder that projects a laser stripe into the scene and measures the deformation of this stripe as viewed from a slightly different camera viewpoint to infer the surface of the scene under the stripe. By quickly scanning the stripe across the scene, an entire range surface can be extracted. This technique is called optical triangulation. For more information on this and other scanning techniques, see the notes from Prof. Seitz's recent SIGGRAPH course on 3D Photography: http://www.cs.cmu.edu/~seitz/3DPhoto.html

The Vivid scanner produces a 3D model (represented as either a mesh or a cloud of points) and a registered color image (texture). Maximum resolution is 400x400 pixels for the image and about 200x200 for the point set. Here are some instructions on using the Vivid scanner. You are free to use a different scanner, if you like.

You may scan your face or some other object. Only one scan is needed for this assignment. Chen and Williams usually used two or more input views, but we're doing McMillan's algorithm, which is a bit different. That's the reason we're calling this assignment "view transformation", not "view interpolation".

Data Conversion

Code to read the Inventor format and convert it into a more useful format is available. See ivgrid.cxx, grid.h, and printgrid.cxx. The last of these contains main(). Together they resample the points in the file written by the Vivid software and create a rectangular array of delta, r, g, b values, which is the form best suited to the implementation of McMillan's algorithm. Hopefully you won't need to modify ivgrid.cxx or grid.h. Please put most of your new code in one or more separate source files. There are working Makefiles for Linux, Sun, and SGI machines in the pub/src/asst2 directory.

It was necessary to resample because the (s,t) texture coordinate grid and the (x/z,y/z) projected 3-D point grid from the Vivid are not rectangular, but are slightly warped, probably intentionally so to correct for lens distortions and laser properties. But you shouldn't have to worry about that.

An older, simpler program, ivpoints.cxx, which reads the Inventor file and prints out the data as a list of x, y, z, r, g, b points, is also available. This is less useful, however, because the output points usually do not form a complete grid (points of unknown depth are skipped).

Generate New Views

Implement the algorithm described in the paper to warp each pixel in the reference image (the scan) to a new camera viewpoint. You should provide an interface for allowing the user to interactively rotate the camera and change its position (or rotate and translate the object in front of the camera). A basic (and sufficient) interface would be to provide sliders for X/Y/Z rotation, translation and zoom. A more natural interface would allow the user to directly change these parameters by clicking on the object and moving the mouse.
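As a starting point, here is a minimal sketch of the per-pixel planar warp in the form usually given for McMillan's method: the desired-image coordinates are proportional to H x + delta e, where x = (u, v, 1) is the reference pixel and delta is its generalized disparity. The names Mat3, Vec3, Pixel2D, and warpPixel are hypothetical and are not taken from grid.h. The sketch assumes you have already computed H (the 3x3 matrix relating the two pinhole cameras) and e (the vector that scales with disparity) as derived in the paper; check the paper for the exact definitions and sign conventions.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical types for illustration; not taken from grid.h.
using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

struct Pixel2D {
    double u, v;   // coordinates in the desired image
    bool valid;    // false if the homogeneous divide is degenerate
};

// Warp reference pixel (u, v) with generalized disparity delta:
//   (u', v', w') = H * (u, v, 1) + delta * e,  then divide by w'.
// H and e are assumed to be precomputed from the two camera models.
Pixel2D warpPixel(const Mat3& H, const Vec3& e,
                  double u, double v, double delta) {
    assert(std::isfinite(delta));
    const Vec3 x = {u, v, 1.0};
    Vec3 r;
    for (int i = 0; i < 3; ++i) {
        r[i] = H[i][0] * x[0] + H[i][1] * x[1] + H[i][2] * x[2]
             + delta * e[i];
    }
    // Only guard against division by (near) zero here. Culling points
    // behind the desired camera depends on your sign convention.
    if (std::fabs(r[2]) < 1e-12) return {0.0, 0.0, false};
    return {r[0] / r[2], r[1] / r[2], true};
}
```

With H set to the identity and e = (1, 0, 0), a pixel simply shifts horizontally by its disparity, which is a handy sanity check before plugging in real camera matrices.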

You have a choice of techniques for performing the warping. One option is to take each 2x2 block of adjacent points in the data and connect them with a quadrilateral (or two triangles), thereby creating a surface model. Each quadrilateral gets warped into a new quadrilateral in the destination image. This approach requires that you implement code to draw filled quadrilaterals or triangles, but the advantage is that you can avoid holes (unseemly gaps between pixels in the destination image). It has the disadvantage that it creates long, bogus polygons at silhouettes. A second option is to approximate the pixel's shape in the new image as an axis-aligned square, or even a single pixel. This method will run faster and is easier to implement, but can generate holes. If you choose this method, you should implement a reasonable method to fill the holes.
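If you take the second option, one simple way to fill holes is sketched below: every destination pixel the warp did not write receives the average color of its written 8-neighbors. This is only one possible approach, not a method prescribed by the paper. The RGB struct, the written mask, and fillHoles are hypothetical names, and the sketch assumes your warp records which destination pixels it wrote.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB { std::uint8_t r, g, b; };

// Fill each unwritten pixel with the mean of its written 8-neighbours.
// written[i] is nonzero where the warp stored a colour (an assumption
// about how your warp tracks coverage). Returns the number of pixels
// filled. One pass closes small holes; repeat for larger ones.
int fillHoles(std::vector<RGB>& img, std::vector<std::uint8_t>& written,
              int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    assert(written.size() == img.size());
    const std::vector<RGB> src = img;              // read from copies so
    const std::vector<std::uint8_t> mask = written; // fills don't cascade
    int filled = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (mask[y * w + x]) continue;
            int n = 0, sr = 0, sg = 0, sb = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (!mask[ny * w + nx]) continue;
                    const RGB& c = src[ny * w + nx];
                    sr += c.r; sg += c.g; sb += c.b; ++n;
                }
            }
            if (n > 0) {
                img[y * w + x] = RGB{static_cast<std::uint8_t>(sr / n),
                                     static_cast<std::uint8_t>(sg / n),
                                     static_cast<std::uint8_t>(sb / n)};
                written[y * w + x] = 1;
                ++filled;
            }
        }
    }
    return filled;
}
```

Note that averaging across a silhouette blends foreground and background colors, so you may prefer a scheme that favors one side; describe whatever you choose on your web page.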

You should write the rendering code for this assignment yourself. While it would be possible (trivial) to do view transformation of a textured 3-D point cloud using OpenGL's z-buffering and 3-D polygon rendering, that would defeat the purpose of this assignment, which is to demonstrate an alternative method that requires no z-buffer. You can use OpenGL's raster graphics features (e.g. glDrawPixels), but not OpenGL's scan conversion (e.g. glBegin(GL_POLYGON)).

Since the 3-D point grid you'll start with will typically be only 200x200 points, you might want to zoom up your pictures on your screen by a factor of 2 or 3 during testing so they won't be so tiny. This will help you to see holes and other minor flaws better. One way to do this is with glPixelZoom and glDrawPixels.

Make sure that your code works for all 9 cases (actually 3 for x and 3 for y) of visibility orders discussed in McMillan's paper.
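The three cases per axis can be sketched as follows, assuming you have already projected the new camera position into the reference image to get the epipole, and decided from the paper whether pixels must be drawn toward or away from it. The function axisOrder and its parameters are hypothetical names. For one axis it returns the sample indices in drawing order: a single sweep when the epipole lies off either end, and two sweeps split at the epipole when it lies inside the image.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Drawing order of indices 0..n-1 along one image axis.
//   e             : epipole coordinate along this axis (may be off-image)
//   towardEpipole : true to draw far-from-epipole samples first,
//                   false to draw near-epipole samples first
// Which direction is required depends on the sign test in the paper.
std::vector<int> axisOrder(int n, double e, bool towardEpipole) {
    assert(n >= 0);
    std::vector<int> order;
    order.reserve(n);
    // Number of samples at or before the epipole, clamped to [0, n].
    int split = 0;
    if (e >= n)      split = n;                              // beyond the far end
    else if (e >= 0) split = static_cast<int>(std::floor(e)) + 1;  // inside
    // e < 0 leaves split == 0 (epipole before the near end).
    if (towardEpipole) {
        for (int i = 0; i < split; ++i)      order.push_back(i);
        for (int i = n - 1; i >= split; --i) order.push_back(i);
    } else {
        for (int i = split - 1; i >= 0; --i) order.push_back(i);
        for (int i = split; i < n; ++i)      order.push_back(i);
    }
    return order;
}
```

Looping over the rows in the order axisOrder(height, ey, toward) and, within each row, over the columns in the order axisOrder(width, ex, toward) then covers all 3 x 3 = 9 cases with one code path. When pixels are simply overwritten in this order, no z-buffer should be needed.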

Submit Results

Put your code and executable in /afs/cs/project/classes-ph/869/students/yourname/asst2, and create a web page in asst2/www/index.html that contains your images and describes your approach and results. Comment on your design choices, the speed of your program, what worked, what didn't, and describe extensions that would be nice to include if you had more time. Include one image for each of the 9 cases of visibility orders, demonstrating your implementation of the visibility algorithm. Be sure to include an interesting range of different camera viewpoints, including rotations of 90 degrees to get a profile view.

Minor points:

Put your name on your web page, since it is public.
Put your name on your source files -- a good habit.
If you use Microsoft Photo Editor, beware its Image Resize feature: the current version (3.0) yields unacceptable results when zooming down a picture. See examples. It seems to do unfiltered resampling (point sampling), even in so-called Smooth mode, resulting in severe block artifacts. This bug already damaged the pictures of one of our students in assignment 1.
When saving pictures related to this (or any other) assignment, if using a lossy compression scheme such as JPEG, use a high quality setting; otherwise you might corrupt your hopefully flawless pictures with JPEG 8x8 pixel block artifacts. E.g. for "convert" or "cjpeg" use "-quality 95".

Extra Credit

Modify your warping code to produce stereo pairs, as described in the paper. If you have the ability to fuse two images viewed side-by-side by crossing your eyes, you can create a cross-eyed 3D display. By moving the camera viewpoint, you may be able to see a cross-eyed 3D animation. Increasing or decreasing the eye-separation will give you different depth effects. You should have a slider to experiment with different eye-separations and tune it to each viewer.
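One simple way to set up the two views, sketched below, is to offset the camera center by half the eye separation along the camera's right vector in each direction and warp the reference image once per eye. This is an assumption about the setup, not necessarily the paper's exact formulation; stereoCenters, StereoCenters, and the parameter names are hypothetical, and the right vector is assumed to be unit length.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;

struct StereoCenters { Vec3 leftEye, rightEye; };

// Offset the camera centre by +/- half the eye separation along the
// camera's (unit-length) right vector. 'separation' would come from
// the user's slider.
StereoCenters stereoCenters(const Vec3& center, const Vec3& right,
                            double separation) {
    assert(separation >= 0.0);
    StereoCenters s;
    for (int i = 0; i < 3; ++i) {
        s.leftEye[i]  = center[i] - 0.5 * separation * right[i];
        s.rightEye[i] = center[i] + 0.5 * separation * right[i];
    }
    return s;
}
```

For cross-eyed viewing, display the right-eye image on the left side of the screen and the left-eye image on the right.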

Extend your approach to interpolate mosaics. One approach would be to use your mosaic code from assignment 1 to mosaic two range scans together. Since this will involve acquiring two or more range scans, you should let us know early if you want to do this and capture the scans at the initial scanning appointment. A more ambitious extension would be to interpolate cylindrical mosaic images, as described in McMillan's 1995 SIGGRAPH paper entitled "Plenoptic Modeling: An Image-Based Rendering System." There is a scanner on campus manufactured by K²T (a CMU spin-off) that can capture panoramic depth images directly. It may be possible to get previously acquired data from this scanner or access to the scanner itself (but no guarantees). Check Takeo Kanade's web site at the Robotics Institute for information about this scanner.

From two or more range images of the same object at different poses, compute a Layered Depth Image or LDI (Shade et al., SIGGRAPH 98) and adapt your warping code to work with LDIs. To do this you will need to either acquire multiple scans and register them, find registered scans on the net (Owen Carmichael may have some), or write code to convert a 3D model into an LDI. If you want to acquire multiple scans, you should let us know right away and capture these scans at your initial scanning appointment.

Change log: 9/28: added links to ivgrid.cxx etc.

Steve Seitz and Paul Heckbert