15-869, Image-Based Modeling and Rendering,
Due: October 3, 1999 at midnight
Version 3: Sept. 28.
In this assignment you will acquire a range scan of your face or some other object and write code to render it from new camera viewpoints using McMillan's image-warping and visibility technique.
Reading for this assignment:
The steps of the assignment are:
We recommend you use the Minolta Vivid 700 scanner in Martial Hebert's lab. This is a 3D laser range finder that projects a laser stripe into the scene and measures the deformation of this stripe as viewed from a slightly different camera viewpoint to infer the surface of the scene under the stripe. By quickly scanning the stripe across the scene, an entire range surface can be extracted. This technique is called optical triangulation. For more information on this and other scanning techniques, see the notes from Prof. Seitz's recent SIGGRAPH course on 3D Photography: http://www.cs.cmu.edu/~seitz/3DPhoto.html .
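The geometry behind optical triangulation can be sketched in a few lines. The setup below is a generic, illustrative 2D model (not the Vivid's actual calibration math): the laser sits at the origin and fires a ray at angle theta from the baseline, and a camera a known baseline distance away sees the illuminated point at angle phi; the law of sines then gives the point's position.

```cpp
#include <cmath>

// Hypothetical 2D optical-triangulation sketch (not the Vivid's actual math).
// Laser at the origin fires a ray at angle `theta` from the baseline; a camera
// at (baseline, 0) sees the lit surface point at angle `phi` from the baseline.
// By the law of sines, the distance along the laser ray to the surface is
//   t = baseline * sin(phi) / sin(theta + phi),
// so the point is at (t*cos(theta), t*sin(theta)), the second coordinate being
// its perpendicular distance ("depth") from the baseline.
struct Point2 { double x, z; };

Point2 triangulate(double baseline, double theta, double phi) {
    double t = baseline * std::sin(phi) / std::sin(theta + phi);
    return { t * std::cos(theta), t * std::sin(theta) };
}
```

For example, with a unit baseline and both angles at 45 degrees, the surface point lands at (0.5, 0.5), midway along the baseline at equal depth.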
The Vivid scanner produces a 3D model (represented as either a mesh or a cloud of points) and a registered color image (texture). Maximum resolution is 400x400 pixels for the image and about 200x200 for the point set. Here are some instructions on using the Vivid scanner. You are free to use a different scanner, if you like.
You can scan your face or some other object. Only one scan is needed for this assignment. (Chen and Williams usually used two or more input views, but we're doing McMillan's algorithm, which is a bit different. That's the reason we're calling this assignment "view transformation", not "view interpolation".)
Code to read the Inventor format and convert it into a more useful format is available. See ivgrid.cxx , grid.h , and printgrid.cxx . The latter contains main(). These resample the points in the file written by the Vivid software and create a rectangular array of delta, r, g, b values, which is the form best suited to the implementation of McMillan's algorithm. You shouldn't need to modify ivgrid.cxx or grid.h. Please put most of your new code in separate source file(s). There are working Makefiles for Linux, Sun, and SGI machines in the pub/src/asst2 directory.
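To make the "rectangular array of delta, r, g, b values" concrete, here is a minimal sketch of such a grid. The names and layout are illustrative only — the actual definitions live in grid.h, which may differ.

```cpp
#include <vector>

// Illustrative sketch of a rectangular range-image grid like the one
// printgrid.cxx produces: one sample per (column, row), holding a
// disparity-like depth value `delta` plus the registered texture color.
// Names here are made up for illustration, not taken from grid.h.
struct Sample {
    float delta;           // generalized disparity; negative marks unknown depth
    unsigned char r, g, b; // registered texture color
};

class Grid {
public:
    Grid(int w, int h) : w_(w), h_(h), data_(w * h, Sample{-1.f, 0, 0, 0}) {}
    Sample&       at(int x, int y)       { return data_[y * w_ + x]; }
    const Sample& at(int x, int y) const { return data_[y * w_ + x]; }
    int width()  const { return w_; }
    int height() const { return h_; }
private:
    int w_, h_;
    std::vector<Sample> data_;
};
```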
It was necessary to resample because the (s,t) texture coordinate grid and the (x/z,y/z) projected 3-D point grid from the Vivid are not rectangular, but are slightly warped, probably intentionally, to correct for lens distortion and laser properties. But you shouldn't have to worry about that.
An older, simpler program, ivpoints.cxx , which reads the Inventor file and prints out the data as a list of x, y, z, r, g, b points is also available. This is less useful, however, because the output points usually do not form a complete grid (points of unknown depth are skipped).
Implement the algorithm described in the paper to warp each pixel in the reference image (the scan) to a new camera viewpoint. You should provide an interface that allows the user to interactively rotate the camera and change its position (or rotate and translate the object in front of the camera). A basic (and sufficient) interface would be to provide sliders for X/Y/Z rotation, translation and zoom. A more natural interface would allow the user to directly change these parameters by clicking on the object and moving the mouse.
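The per-pixel warp, for a planar reference image, boils down to a homogeneous sum: reference pixel (u, v) with generalized disparity delta maps to H·(u, v, 1)^T + delta·e, where the 3x3 matrix H is determined by the two cameras and e is the epipole term (the reference center of projection as seen from the new camera). A hedged sketch of just that inner step, with H and e supplied by the caller (deriving them from your camera parameters is a separate step):

```cpp
// Sketch of the planar-to-planar warp at the heart of McMillan's method:
// destination position is the homogeneous sum  H * [u, v, 1]^T + delta * e,
// followed by the perspective divide. H is 3x3, row-major; `e` is the
// epipole term. Both are assumed to be precomputed from the two cameras.
void warpPixel(const double H[9], const double e[3],
               double u, double v, double delta,
               double& u2, double& v2) {
    double x = H[0]*u + H[1]*v + H[2] + delta * e[0];
    double y = H[3]*u + H[4]*v + H[5] + delta * e[1];
    double w = H[6]*u + H[7]*v + H[8] + delta * e[2];
    u2 = x / w;
    v2 = y / w;
}
```

Note the convenient special case: with delta = 0 (a point at infinity) the warp reduces to a pure homography, which is a useful sanity check while debugging.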
You have a choice of techniques for performing the warping. One option is to take each 2x2 block of adjacent points in the data, connect it with a quadrilateral (or two triangles), and thereby create a surface model; each quadrilateral would then get warped into a new quadrilateral in the destination image. This approach requires that you implement code to draw filled quadrilaterals or triangles, but the advantage is that you can avoid holes -- unseemly gaps between pixels in the destination image. It has the disadvantage that it creates long, bogus polygons at silhouettes. A second option is to approximate each pixel's shape in the new image as an axis-aligned square, or even a single pixel. This method will run faster and is easier to implement, but can generate holes. If you choose this method, you should implement a reasonable method to fill the holes.
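The second option (single-pixel splats plus a hole-filling pass) might look like the sketch below. The one-pass neighbor-average filler is just one reasonable choice; a real implementation would iterate the pass, or use a larger splat kernel, for bigger holes.

```cpp
#include <cmath>
#include <vector>

// Sketch of the "single pixel" option: forward-map each source sample to the
// nearest destination pixel, then fill each remaining hole with the average of
// its filled 4-neighbors (one pass, so holes wider than one pixel remain).
struct Buffer {
    int w, h;
    std::vector<float> val;
    std::vector<bool>  filled;
    Buffer(int w_, int h_) : w(w_), h(h_), val(w_ * h_, 0.f), filled(w_ * h_, false) {}
};

void splat(Buffer& buf, double x, double y, float v) {
    int ix = (int)std::lround(x), iy = (int)std::lround(y);
    if (ix < 0 || ix >= buf.w || iy < 0 || iy >= buf.h) return;
    buf.val[iy * buf.w + ix] = v;       // later splats overwrite earlier ones,
    buf.filled[iy * buf.w + ix] = true; // which is what the visibility order relies on
}

void fillHoles(Buffer& buf) {
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    for (int y = 0; y < buf.h; y++)
        for (int x = 0; x < buf.w; x++) {
            if (buf.filled[y * buf.w + x]) continue;
            float sum = 0.f; int n = 0;
            for (int k = 0; k < 4; k++) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx >= 0 && nx < buf.w && ny >= 0 && ny < buf.h &&
                    buf.filled[ny * buf.w + nx]) { sum += buf.val[ny * buf.w + nx]; n++; }
            }
            if (n > 0) buf.val[y * buf.w + x] = sum / n;
        }
}
```

The overwrite-on-splat behavior is deliberate: it is exactly what lets McMillan's back-to-front traversal resolve visibility without a z-buffer.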
You should write the rendering code for this assignment yourself. While it would be possible (trivial) to do view transformation of a textured 3-D point cloud using OpenGL's z-buffering and 3-D polygon rendering, that would defeat the purpose of this assignment, which is to demonstrate an alternative method that requires no z-buffer. You can use OpenGL's raster graphics features (e.g. glDrawPixels), but not OpenGL's scan conversion (e.g. glBegin(GL_POLYGON)).
Since the 3-D point grid you'll start with will typically be only 200x200 points, you might want to zoom up your pictures on your screen by a factor of 2 or 3 during testing so they won't be so tiny. This will help you to see holes and other minor flaws better. One way to do this is with glPixelZoom and glDrawPixels.
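On the GL side this is just glPixelZoom(zoom, zoom) before glDrawPixels; for an integer factor the effect is plain pixel replication, sketched here in software in case you want to zoom without touching GL state:

```cpp
#include <vector>

// Software equivalent of an integer glPixelZoom factor: replicate each source
// pixel of a w-by-h single-channel image into a zoom-by-zoom block.
std::vector<unsigned char> zoomImage(const std::vector<unsigned char>& src,
                                     int w, int h, int zoom) {
    std::vector<unsigned char> dst(w * zoom * h * zoom);
    for (int y = 0; y < h * zoom; y++)
        for (int x = 0; x < w * zoom; x++)
            dst[y * w * zoom + x] = src[(y / zoom) * w + (x / zoom)];
    return dst;
}
```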
Make sure that your code works for all 9 cases of visibility order discussed in McMillan's paper (3 cases for x times 3 cases for y).
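The 3 cases per axis come from where the epipole (the new center of projection, reprojected into the reference image) falls: off the low side, inside the image, or off the high side. Here is a hedged sketch of computing the occlusion-compatible traversal for one axis; for a positive epipole the image is painted toward the epipole, so splats nearer the epipole land last and correctly overwrite, and a negative epipole reverses every run. Names and the Run representation are illustrative.

```cpp
#include <vector>

// Sketch of occlusion-compatible traversal along one axis of the reference
// image. `e` is the epipole coordinate on that axis, `n` the image size in
// pixels. Apply the same logic independently to x and y to get the 3 x 3 = 9
// cases; within each run, pixels are emitted in the order they should be drawn.
struct Run { int start, end, step; }; // inclusive endpoints

std::vector<Run> axisOrder(double e, int n, bool positiveEpipole) {
    std::vector<Run> runs;
    if (e <= 0.0) {                    // epipole off the low side: sweep downward
        runs.push_back({n - 1, 0, -1});
    } else if (e >= n - 1.0) {         // epipole off the high side: sweep upward
        runs.push_back({0, n - 1, 1});
    } else {                           // epipole inside: two sweeps meeting at it
        int split = (int)e;
        runs.push_back({0, split, 1});
        runs.push_back({n - 1, split + 1, -1});
    }
    if (!positiveEpipole)              // negative epipole: paint away from it instead
        for (Run& r : runs) { int s = r.start; r.start = r.end; r.end = s; r.step = -r.step; }
    return runs;
}
```

A good way to exercise all 9 cases for your writeup is to move the new camera so the epipole crosses each image border in turn.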
Put your code and executable in /afs/cs/project/classes-ph/869/students/yourname/asst2, and create a web page in asst2/www/index.html that contains your images and describes your approach and results. Comment on your design choices, the speed of your program, what worked, what didn't, and describe extensions that would be nice to include if you had more time. Include one image for each of the 9 cases of visibility orders, demonstrating your implementation of the visibility algorithm. Be sure to include an interesting range of different camera viewpoints, including rotations of 90 degrees to get a profile view.
Change log: 9/28: added links to ivgrid.cxx etc.
Steve Seitz and Paul Heckbert