15-869 Image based Modeling and Rendering
Final Project
3D-Mosaicing from Stereo video
Kiran S. Bhat
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213

We describe an algorithm to register images obtained from multiple viewpoints that are translated and rotated with respect to each other. Range information at these viewpoints is obtained from a stereo pair mounted on a mobile cart. We recover the rotation and translation parameters by minimizing the sum of squared differences of the image intensities, using Levenberg-Marquardt optimization. We register all the images into a single viewpoint, resulting in a 3D mosaic. Using the registration information, we can render images from novel camera viewpoints by applying view interpolation techniques.

Image mosaicing techniques have been used widely in tele-reality applications such as virtual walkthroughs of real building interiors [Szeliski1,2, Irani3]. Current techniques for creating a virtual walkthrough rely on building a panoramic mosaic of the office interior at several key locations in the room, called hot-spots [Eric Chen4, Quicktime website5]. The user can pan and tilt a virtual camera at these hot-spots, and the system renders an appropriate view of the environment. The user can also jump between the hot-spots to get a global view of the environment. To realistically model a large environment using this technique, one needs to create a large number of mosaics from several hot-spots. This process can be extremely long and involves a lot of human intervention. Moreover, since the transition between hot-spots is discrete, the walkthrough is not smooth. The fundamental reason for these drawbacks is that standard image mosaicing techniques register only images that are obtained from the same center of projection.
In this work, we address the problem of registering images obtained from camera viewpoints that are translated and rotated relative to each other. Our technique requires depth information from each viewpoint, which we obtain from stereo. After registration, we build a composite image from a single viewpoint, which we call a 3D mosaic. The registration can also be used to render views from novel camera viewpoints using standard view interpolation techniques [Chen and Williams6]. Our technique can be used both for building a panoramic mosaic at hot-spots and for transitioning smoothly between them by controlling the new viewpoint. Note that our technique does not need a 3D model for rendering from a novel viewpoint. The performance of our registration and rendering algorithms depends critically on the stereo disparity results.
We use the cooperative stereo matching algorithm developed by Zitnick and Kanade [Zitnick7] to obtain disparities from stereo. The algorithm for registering images from multiple viewpoints is discussed in section 2. Sections 3, 4 and 5 show results of 3D mosaicing and rendering from novel viewpoints for a stereo sequence obtained from an office interior. Finally, we discuss some limitations of our work and possible future work to improve the performance of the algorithm.


We obtain the registration between two images using an 11-parameter formulation. Details of the formulation can be found in this report. We have 8 parameters {m0..m7} that describe the homography between the images and 3 parameters {tx,ty,tz} that account for the offset in the direction of the epipole. Note that we need the generalised disparity for each pixel in the images, which is available from stereo. We pose the parameter estimation as a minimization problem, where the objective function is the sum of squared differences in the image intensities over the overlapping regions. The Levenberg-Marquardt optimization technique is used to perform this minimization. We implemented the registration algorithm in a coarse-to-fine scheme using a Gaussian pyramid. At the coarsest level, we solve for the translation parameters {tx,ty,tz} alone. At the middle level, we solve for the eight homography parameters {m0..m7}, keeping the translation parameters fixed at the values obtained from the previous level. At the finest level, we solve for all eleven parameters. We find that the coarse-to-fine scheme helps considerably in converging to the correct minimum. At each level, the algorithm reaches a local minimum in at most 10 steps.
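The objective described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the exact warp used in the referenced report is not reproduced here, so the code assumes the common plane-plus-parallax form x' = (m0 x + m1 y + m2 + d tx) / (m6 x + m7 y + 1 + d tz), with the analogous expression for y'. The names `warp_coords`, `bilinear`, `ssd`, `IDENTITY` and `STAGES` are hypothetical, and the bilinear sampler is a choice made for this sketch.

```python
import numpy as np

# Parameter ordering assumed here: [m0..m7, tx, ty, tz].
IDENTITY = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)

# Which parameters are free at each pyramid level, following the schedule
# in the text: translation only, homography only, then all eleven.
STAGES = {
    "coarse": np.r_[np.zeros(8, bool), np.ones(3, bool)],
    "middle": np.r_[np.ones(8, bool), np.zeros(3, bool)],
    "fine": np.ones(11, bool),
}

def warp_coords(p, x, y, d):
    """Map pixel (x, y) with generalised disparity d into the second image
    (assumed plane-plus-parallax form, not taken from the report)."""
    m = p[:8]
    tx, ty, tz = p[8:11]
    w = m[6] * x + m[7] * y + 1.0 + d * tz
    xp = (m[0] * x + m[1] * y + m[2] + d * tx) / w
    yp = (m[3] * x + m[4] * y + m[5] + d * ty) / w
    return xp, yp

def bilinear(img, x, y):
    """Sample img at float coordinates; also return an in-bounds mask."""
    h, w = img.shape
    ok = (x >= 0) & (x <= w - 1) & (y >= 0) & (y <= h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy, ok

def ssd(p, img1, img2, disp):
    """Sum of squared intensity differences over the overlapping region."""
    h, w = img1.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xp, yp = warp_coords(p, x, y, disp)
    vals, ok = bilinear(img2, xp, yp)
    diff = np.where(ok, vals - img1, 0.0)  # pixels outside the overlap add 0
    return float(np.sum(diff ** 2))
```

A Levenberg-Marquardt routine would minimize `ssd` (or its per-pixel residuals) over the parameters selected by the mask for the current pyramid level, starting from `IDENTITY` at the coarsest level and carrying the estimate to the next finer level.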

Figure 1 shows the registration results obtained with the 11-parameter algorithm for a pair of consecutive images taken from the stereo cart. The images in the first row are the inputs to the registration algorithm, and the image in the bottom row shows Img2 (the right image in the first row) warped to Img1 (the left image in the first row). The registration is accurate in the regions of overlap, where the warped image matches Img1 almost perfectly.

                                                            Figure 1: Registration results

3D Mosaic Generation
Given that we can register two images taken at different centers of projection and with different orientations, we can warp one of the images into the viewpoint of the other and obtain a composite image. We call this composite image a 3D mosaic. Figure 2 shows an example of a 3D mosaic created from a sequence of images, two of which are also shown.
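The compositing step can be illustrated with a minimal Python sketch. The text does not say how overlapping pixels are combined, so the averaging rule here is an assumption, as are the function name `composite` and the convention that both images have already been resampled onto a shared canvas with boolean validity masks.

```python
import numpy as np

def composite(ref, ref_valid, warped, warped_valid):
    """Merge a reference image and a warped image on a shared canvas.

    Assumed rule (not from the report): average where both images are
    valid, otherwise copy whichever image covers the pixel.
    """
    both = ref_valid & warped_valid
    only_warped = warped_valid & ~ref_valid
    out = np.zeros(ref.shape, dtype=float)
    out[ref_valid] = ref[ref_valid]
    out[only_warped] = warped[only_warped]
    out[both] = 0.5 * (ref[both] + warped[both])
    return out, ref_valid | warped_valid
```

Applying this repeatedly, one warped image at a time, would grow the mosaic across a sequence; a real implementation would probably use a smoother blend near the seams.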


                                           Figure 2: 3D mosaic from two registered images

Rendering from Novel Viewpoint

Figure 3 shows two images obtained from different viewpoints along with their corresponding disparity maps. Figure 4 shows images rendered at novel viewpoints using view interpolation techniques. Since we perform forward warping, some pixels in the new view receive no source pixel, which creates holes. We fill the holes using a combination of median filtering and image sharpening, which causes the rendered images to look somewhat blurry.
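The following Python sketch illustrates why forward warping leaves holes and how a median filter can fill them. It is a simplified illustration, not the implementation used for Figure 4: it assumes the novel viewpoint differs only by a horizontal shift, so each pixel moves by `alpha * disp` columns (the sign convention is also an assumption), and it omits the sharpening step. The names `forward_warp` and `fill_holes_median` are hypothetical.

```python
import numpy as np

def forward_warp(img, disp, alpha):
    """Push each source pixel to its new column; nearer pixels win.

    alpha in [0, 1] is the fraction of the way to the other viewpoint.
    Returns the warped image and a mask of pixels that received no source.
    """
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    zbuf = np.full((h, w), -np.inf)  # largest disparity seen so far
    for y in range(h):
        for x in range(w):
            xn = int(np.floor(x + alpha * disp[y, x] + 0.5))
            if 0 <= xn < w and disp[y, x] > zbuf[y, xn]:
                out[y, xn] = img[y, x]
                zbuf[y, xn] = disp[y, x]
    return out, np.isinf(zbuf)

def fill_holes_median(img, holes):
    """Replace each hole by the median of the valid pixels in its 3x3 window."""
    out = img.copy()
    h, w = img.shape
    for y, x in zip(*np.nonzero(holes)):
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        valid = ~holes[y0:y1, x0:x1]
        if valid.any():
            out[y, x] = np.median(img[y0:y1, x0:x1][valid])
    return out
```

In a one-row example, a foreground pixel with disparity 2 jumps two columns, overwrites the background pixel already there, and leaves a hole at its original column, which the median filter then fills from its neighbours.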

                                    Figure 3: Two camera images and the corresponding disparity maps

                                                Figure 4: Rendered images from novel viewpoints

Conclusions and future work
We present a technique to register images obtained from multiple viewpoints. We can build a composite 3D image mosaic by warping the multiple views into a single viewpoint. We also demonstrate some rendering results from novel viewpoints. Since the input images are obtained from cameras that rotate and translate relative to each other, six parameters (rigid body motion) suffice to characterize the registration. We nevertheless use 11 parameters, five of which are redundant. Formulating the problem with 6 parameters could improve the registration performance, since we would then be optimizing over a lower-dimensional space (6 as opposed to 11 dimensions). Another limitation of our approach is its dependence on stereo: the quality of the rendered scene depends directly on the accuracy of the stereo results.
Future work: We would like to implement the 6-parameter formulation for registration and compare the results with the current formulation. Since we have multiple registered views of a scene, we can correct errors in the stereo disparity maps caused by occlusions. Currently, our 3D mosaic is obtained by projecting all images onto a single viewpoint. We can extend this to an orthographic projection. We can also construct a layered depth image (LDI) of the 3D mosaic, that is, a layered depth mosaic, by incorporating multiple depths at each pixel.
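To make the redundancy concrete, the sketch below shows one way the 11 parameters could be generated from 6 rigid-body parameters. It rests on assumptions that are not stated in this report: a known 3x3 intrinsic matrix `K` shared by both views, generalised disparity proportional to inverse depth, and the plane-plus-parallax warp x' ~ K R K^-1 x + d K t normalised so that the ninth homography entry is 1. The function name `rigid_to_11` and the axis-angle rotation input are hypothetical choices.

```python
import numpy as np

def rigid_to_11(K, rvec, t):
    """Build [m0..m7, tx, ty, tz] from a rotation vector and a translation.

    rvec is an axis-angle vector (axis times angle in radians); the
    rotation matrix comes from Rodrigues' formula.
    """
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        S = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)
    H = K @ R @ np.linalg.inv(K)       # homography induced by the rotation
    e = K @ np.asarray(t, dtype=float)  # epipole direction scaled by t
    s = H[2, 2]                         # normalise so the last entry is 1
    return np.concatenate([(H / s).ravel()[:8], e / s])
```

Under these assumptions, an optimizer would search over the six entries of `rvec` and `t` and regenerate the 11 warp parameters at each step. The normalisation divides by `H[2, 2]`, so the sketch is only meaningful for moderate rotations where that entry stays away from zero.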

1. R. Szeliski. Image mosaicing for tele-reality applications. In IEEE Workshop on Applications of Computer Vision, pp. 44-53, December 1994.
2. R. Szeliski and H. Shum. Creating full view panoramic image mosaics and environment maps. Computer Graphics (SIGGRAPH '97), pp. 251-258, 1997.
3. M. Irani, P. Anandan, J. Bergen, R. Kumar and S. Hsu. Mosaic representations of video sequences and their applications. Signal Processing: Image Communication, special issue on Image and Video semantics: Processes, Analysis and Applications, Vol. 8, No. 4, pp. 327-351, May 1996.
4. S.E. Chen. Quicktime VR - an image based approach to virtual environment navigation. Computer Graphics (SIGGRAPH '95), pp. 29-38, August 1995.
5. Quicktime VR demos: http://www.apple.com/quicktime/qtvr/
6. S. Chen and L. Williams. View interpolation for image synthesis. Computer Graphics (SIGGRAPH '93), pp. 279-288, August 1993.
7. C. Zitnick and T. Kanade. A Cooperative Algorithm for Stereo Matching and Occlusion Detection. Tech. report CMU-RI-TR-99-35, Robotics Institute, Carnegie Mellon University, October 1999.