Programming Assignment #2: View Transformation
Terence Sim

Oct 3, 1999


For this assignment, I scanned a picture of a Snoopy toy with a laser scanner.  This produced a 200x200 color image with a depth map.

The original picture and depth map are shown here:

I created a simple GUI to display the image and to provide controls to "fly" around the scene by adjusting three values: Alpha, Beta, and Zoom.  Imagine a 3D coordinate frame centered on the object (in this case, roughly at Snoopy's nose), with the x-, y- and z-axes pointing right, up and outwards, respectively.  Alpha is the azimuth, i.e. the horizontal angle; beta is the elevation, i.e. the angle from the y-axis towards the horizontal plane; and zoom scales the distance from the viewpoint to the object center.  Thus alpha = -90 degrees corresponds to the left profile view, and beta = 90 degrees gives a view from the bottom.  A zoom value > 1 moves the viewpoint farther away, so the image becomes smaller (zooming out), while a zoom value < 1 zooms into the object.
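The viewpoint parameterization above can be sketched as follows.  This is a Python/NumPy sketch rather than the original Matlab, and the exact sign conventions (which direction positive alpha and beta turn) are my assumptions, chosen to match the description: alpha = -90 gives the left profile, beta = 90 the view from below, and zoom scales the viewing distance.

```python
import numpy as np

def camera_position(alpha_deg, beta_deg, zoom, radius=1.0, center=np.zeros(3)):
    """Viewpoint on a sphere around the object center.

    alpha: azimuth about the y-axis (0 = front view, -90 = left profile)
    beta:  elevation measured down from the horizontal (90 = view from below)
    zoom:  scales the viewing distance (>1 zooms out, <1 zooms in)
    """
    a = np.radians(alpha_deg)
    b = np.radians(beta_deg)
    d = zoom * radius
    # At alpha = beta = 0 the camera sits on the +z axis, looking at the center.
    offset = d * np.array([np.sin(a) * np.cos(b),
                           -np.sin(b),
                           np.cos(a) * np.cos(b)])
    return center + offset
```

For example, zooming from 1 to 2 at the front view simply doubles the distance along the z-axis.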

The GUI is shown here:

The algorithm is based on McMillan's paper, but working out the geometry transformations was tricky.  In the end, I derived my own view transformation, which computes the new view for any alpha, beta, and zoom setting.  I first applied it without regard to scan order or hole-filling, just to verify that the transformation works; of course, the view may then be rendered incorrectly.  I then applied McMillan's visibility algorithm to correct this problem.  The results are as follows:
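The naive version of the warp (before visibility is handled) can be sketched like this, in Python/NumPy rather than the original Matlab.  The pinhole camera model and the focal length f are my assumptions, not the original derivation; the point is that each source pixel is unprojected with its depth, rigidly transformed, and reprojected, with later pixels simply overwriting earlier ones.

```python
import numpy as np

def forward_warp(color, depth, R, t, f=200.0):
    """Naive forward warp with no visibility handling.

    Unprojects each source pixel using its depth, applies a rigid
    transform (R, t), and reprojects into the new view.  Pixels that
    land on the same target overwrite each other in scan order, which
    is exactly the artifact McMillan's visibility algorithm fixes.
    """
    h, w = depth.shape
    out = np.zeros_like(color)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:          # unknown depth: skip this pixel
                continue
            # back-project to a 3D point in the source camera frame
            p = np.array([(u - cx) * z / f, (v - cy) * z / f, z])
            q = R @ p + t       # point in the new camera frame
            if q[2] <= 0:       # behind the new camera
                continue
            u2 = int(round(f * q[0] / q[2] + cx))
            v2 = int(round(f * q[1] / q[2] + cy))
            if 0 <= u2 < w and 0 <= v2 < h:
                out[v2, u2] = color[v, u]
    return out
```

With the identity transform the warp reproduces the input image; with a real rotation it produces the uncorrected views shown on the left below.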

The left image is generated without handling visibility: the left foot is rendered incorrectly because part of the body is overlaid on it, and the face has a large cleft in it.  The middle image uses McMillan's algorithm, which corrects the visibility problem but not the holes.  The right image adds a hole-filling step, which applies a 9x9 Gaussian mask (sigma=3) 3 times to fill the holes.  This removes many small holes, but has the effect of blurring the image.  The large holes remain because there is no data available to fill them.
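The hole-filling step can be sketched as a normalized Gaussian average over the known neighbours of each hole pixel, repeated so that filled values propagate into larger gaps.  This is a Python/NumPy sketch under my own assumptions about the details (in particular, that the mask is normalized over non-hole pixels only); the 9x9 window, sigma=3, and 3 passes match the text.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=3.0):
    """Separable 2D Gaussian weights, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def fill_holes(img, hole_mask, size=9, sigma=3.0, passes=3):
    """Fill holes with a Gaussian-weighted average of known neighbours.

    hole_mask is True where no source pixel was warped.  Each pass fills
    holes that have at least one known neighbour in the window; repeating
    lets values creep into larger gaps.  Holes with no known neighbour at
    all (the 'large holes') stay unfilled.
    """
    k = gaussian_kernel(size, sigma)
    r = size // 2
    img = img.astype(float).copy()
    mask = hole_mask.copy()
    h, w = img.shape
    for _ in range(passes):
        filled = img.copy()
        new_mask = mask.copy()
        for v, u in zip(*np.nonzero(mask)):
            v0, v1 = max(0, v - r), min(h, v + r + 1)
            u0, u1 = max(0, u - r), min(w, u + r + 1)
            win = k[v0 - v + r:v1 - v + r, u0 - u + r:u1 - u + r]
            known = ~mask[v0:v1, u0:u1]
            wsum = win[known].sum()
            if wsum > 0:
                filled[v, u] = (win[known] * img[v0:v1, u0:u1][known]).sum() / wsum
                new_mask[v, u] = False
        img, mask = filled, new_mask
    return img
```

Because the weighted average mixes neighbouring colors, the same step that closes small holes also accounts for the blurring visible in the right image.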

There is significant ghosting in the right image: a second outline of the feet is visible, and the logo on the body is rendered twice.  This could be due to the rounding of pixel coordinates to integer values in the source-scan method, or to the unknown depth values around parts of the object (see depth map above).

I also implemented the Z-buffer algorithm.  Interestingly, its result is almost identical to that of McMillan's algorithm: only a few pixels differ.  This shows that McMillan's algorithm can perform the same task with considerably less memory (no z-buffer needed).
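For contrast, the Z-buffer variant of the warp keeps a per-pixel depth record and lets a pixel overwrite the target only when it is closer to the new camera.  This Python/NumPy sketch uses the same assumed pinhole model and focal length f as above; McMillan's occlusion-compatible scan order achieves the same result without the zbuf array.

```python
import numpy as np

def zbuffer_warp(color, depth, R, t, f=200.0):
    """Forward warp with per-pixel Z-buffering.

    A source pixel overwrites the target pixel only if its depth in the
    new camera frame is smaller than anything written there so far, so
    visibility is resolved correctly regardless of scan order.
    """
    h, w = depth.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)   # the extra memory McMillan avoids
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:
                continue
            p = np.array([(u - cx) * z / f, (v - cy) * z / f, z])
            q = R @ p + t
            if q[2] <= 0:
                continue
            u2 = int(round(f * q[0] / q[2] + cx))
            v2 = int(round(f * q[1] / q[2] + cy))
            if 0 <= u2 < w and 0 <= v2 < h and q[2] < zbuf[v2, u2]:
                zbuf[v2, u2] = q[2]
                out[v2, u2] = color[v, u]
    return out
```

When two source pixels land on the same target, the one nearer to the new camera wins, which is exactly the visibility behaviour McMillan's scan order reproduces without the buffer.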

More results are shown below:

Clearly, much more can be done to fill the holes.  In the zoomed-in image (top left), many holes are still visible, and they cause the image to darken.  The bottom and left profiles show large holes that cannot be reasonably filled with any interpolation scheme.  The only solution is to acquire more images from different viewpoints and use these to fill in the holes.

If I had more time, I would consider the following improvements:

1. Use multiple images to eliminate the large holes
2. Improve the interpolation, to fill the small holes
3. Use a combination of source-image and destination image-scan to prevent ghosting
4. Implement a web-based applet so that users can interactively fly around the scene
5. Implement more general transformations, including translation

The program, including the GUI, was written entirely in Matlab.  I had considered using C to speed up the critical routines (McMillan's algorithm), but Matlab took about 1-2 minutes, which is still reasonable.  For a web-based implementation, all the code could be rewritten in Java.

More details are available here.

--- END ---