15-869 Image based Modeling and Rendering
Final Project
3D-Mosaicing from Stereo video
Kiran S. Bhat
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
kiranb@cs.cmu.edu
Abstract
We describe an algorithm to register images obtained from multiple
viewpoints that are translated and rotated with respect to each other.
Range information at these viewpoints is obtained from a stereo pair
mounted on a mobile cart. We recover the rotation and translation parameters
by minimizing the sum of squared differences of the image intensities, using
the Levenberg-Marquardt optimization technique.
We register all the images into a single viewpoint, resulting in a 3D mosaic.
Using the registration information, we can render images from novel camera
viewpoints by applying view interpolation techniques.
Introduction
Image mosaicing techniques have been used widely in tele-reality applications
like virtual walkthrough of real building interiors [Szeliski1,2,Irani3].
Current techniques for creating a virtual walkthrough rely on building
a panoramic mosaic of the office interior at several key locations in the
room, called hot-spots [Chen4, QuickTime website5]. The user
can pan and tilt a virtual camera at these hot-spots, and the system
renders an appropriate view of the environment. Moreover, the user can
jump between the hot-spots to get a global view of the environment. To
realistically model a large environment using this technique, one needs
to create a large number of mosaics from several hot-spots. This process
is extremely time-consuming and requires substantial human intervention. Moreover,
since the transitions between hot-spots are discrete, the walkthrough is
not smooth. The fundamental cause of these drawbacks is that standard
image mosaicing techniques register images obtained
from the same center of projection. In this work, we address the problem
of registering images obtained from camera viewpoints that are translated
and rotated relative to each other. Our technique requires the depth information
from each viewpoint, which we obtain from stereo. After registration, we
build a composite image from a single viewpoint, which we call a 3D
mosaic. This information can also be used to render views from novel
camera viewpoints using standard view interpolation techniques[Chen and
Williams6]. Our technique can be used both for building a panoramic mosaic
at the hot-spots and for transitioning smoothly between them by controlling the new
viewpoint. Note that our technique does not need a 3D model for rendering
from a novel viewpoint. The performance of our registration and rendering
algorithms depends critically on the stereo disparity results. We use the
cooperative stereo matching algorithm developed by Zitnick and Kanade [Zitnick7]
to obtain disparities from stereo. The algorithm for registering images
from multiple viewpoints is discussed in Section 2. Sections 3, 4 and 5 show
results of 3D mosaicing and rendering from novel viewpoints of a stereo
sequence obtained from an office interior. Finally, we discuss some limitations
of our work, and possible future work to improve the performance of the
algorithm.
Registration
We obtain the registration between two images using an 11-parameter formulation; details of the formulation can be found in this report. Eight parameters {m0..m7} describe the homography between the images, and three parameters {tx, ty, tz} account for the offset in the direction of the epipole. Note that we need the generalized disparity for each pixel in the images, which is available from stereo. We pose the parameter estimation as a minimization problem, where the objective function is the sum of squared differences in image intensities over the overlapping regions. The Levenberg-Marquardt optimization technique is used to perform this minimization.

We implemented the registration algorithm in a coarse-to-fine scheme using a Gaussian pyramid. At the coarsest level, we solve for the translation parameters {tx, ty, tz} alone. At the middle level, we solve for the eight homography parameters {m0..m7}, with the translation parameters fixed at the values obtained from the previous level. At the finest level, we solve for all eleven parameters. We find that the coarse-to-fine scheme greatly helps convergence to the correct local minimum. At each level, the algorithm attains a local minimum in at most 10 steps.
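The warp and per-level optimization above can be sketched as follows. This is a minimal illustration rather than our actual implementation: it assumes grayscale floating-point images, a generalized-disparity map given in the reference frame, and uses SciPy's Levenberg-Marquardt solver in place of our own; the function names are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.optimize import least_squares

def warp_to_ref(img2, disp, p):
    """Warp img2 toward the reference view under the 11-parameter model:
    (x', y') ~ H * (x, y, 1) + d * (tx, ty, tz), where d is the per-pixel
    generalized disparity (a plane-plus-parallax style formulation)."""
    m0, m1, m2, m3, m4, m5, m6, m7, tx, ty, tz = p
    h, w = img2.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    denom = m6 * x + m7 * y + 1.0 + tz * disp
    xw = (m0 * x + m1 * y + m2 + tx * disp) / denom
    yw = (m3 * x + m4 * y + m5 + ty * disp) / denom
    # bilinear resampling of img2 at the warped coordinates
    return map_coordinates(img2, [yw, xw], order=1, mode='nearest')

IDENTITY = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)

def register(img1, img2, disp, p0, free):
    """Minimize the SSD of intensities with Levenberg-Marquardt over a
    chosen subset of parameters; `free` lists the indices being optimized
    (e.g. [8, 9, 10] for translation only, as at the coarsest level)."""
    def residuals(q):
        p = p0.copy()
        p[free] = q
        return (warp_to_ref(img2, disp, p) - img1).ravel()
    fit = least_squares(residuals, p0[free], method='lm')
    p = p0.copy()
    p[free] = fit.x
    return p
```

In the full coarse-to-fine scheme, one would downsample the images and disparities into a pyramid, call register with free=[8, 9, 10] at the coarsest level, free=range(8) at the middle level, and free=range(11) at the finest, rescaling the parameters between levels.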
Figure 1 shows the registration results obtained for a pair of consecutive images taken from the stereo cart using the 11-parameter algorithm. The images in the first row are the input images to the registration algorithm, and the image in the bottom row shows the result of warping Img2 (the right image in the first row) to Img1 (the left image in the first row). The registration is quite accurate in the regions of overlap, and the warped image matches Img1 almost perfectly.
Figure 1: Registration results
3-D Mosaic Generation
Given that we can register two images taken at different centers of
projection and having different orientations with respect to each other,
we can warp one of the images into the viewpoint of the other image and
obtain a composite image. We call this composite image a 3D mosaic.
Figure 2 shows an example of a 3D mosaic created from a sequence of images,
two of which are also shown.
Figure 2: 3D mosaic from two registered images
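The compositing step can be sketched as follows; this is a minimal illustration, assuming the reference image, the warped second image, and boolean validity masks (marking the pixels each view actually covers) are already available. The function and argument names are illustrative.

```python
import numpy as np

def composite_mosaic(ref, ref_valid, warped, warped_valid):
    """Merge a reference view with a second view warped into the same
    viewpoint. Pixels covered by only one view are copied directly;
    overlapping pixels are averaged (feathered blending would reduce
    seams but is omitted for brevity)."""
    out = np.zeros_like(ref, dtype=float)
    only_ref = ref_valid & ~warped_valid
    only_warped = warped_valid & ~ref_valid
    both = ref_valid & warped_valid
    out[only_ref] = ref[only_ref]
    out[only_warped] = warped[only_warped]
    out[both] = 0.5 * (ref[both] + warped[both])
    return out
```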
Rendering from Novel Viewpoint
Figure 3 shows two images obtained from different viewpoints along with their corresponding disparity maps. Figure 4 shows results of rendered images at novel viewpoints using view interpolation techniques. Since we perform forward warping, holes are created in the new views. We fill the holes using a combination of median filtering and image sharpening, which causes the rendered images to look somewhat blurry.
Figure 3: Two camera images and the corresponding disparity maps
Figure 4: Rendered images from novel viewpoints
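The forward-warping and hole-filling steps described above can be sketched as follows. This is a simplified illustration: it splats to the nearest pixel without a visibility z-buffer, and it uses iterated median filtering alone, whereas we use a combination of median filtering and image sharpening.

```python
import numpy as np
from scipy.ndimage import generic_filter

def forward_warp(src, xw, yw):
    """Splat each source pixel to its (rounded) target location.
    Target pixels that receive no splat remain NaN, i.e. holes.
    Later splats overwrite earlier ones; a disparity z-buffer
    would resolve visibility correctly."""
    h, w = src.shape
    out = np.full((h, w), np.nan)
    xi = np.rint(xw).astype(int)
    yi = np.rint(yw).astype(int)
    ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out[yi[ok], xi[ok]] = src[ok]
    return out

def fill_holes(img, size=3, max_iters=10):
    """Fill NaN holes with the median of their valid neighbors,
    iterating so that larger holes fill in from their borders."""
    filled = img.copy()
    for _ in range(max_iters):
        holes = np.isnan(filled)
        if not holes.any():
            break
        med = generic_filter(filled, np.nanmedian, size=size, mode='nearest')
        filled[holes] = med[holes]
    return filled
```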
Conclusions and future work
We present a technique to register images obtained from multiple viewpoints.
We can build a composite 3D image mosaic by warping the multiple views
into a single viewpoint. We also demonstrate some rendering results from
novel viewpoints. Since the input images are obtained from cameras that
can rotate and translate relative to each other, we need just six parameters
(rigid body motion) to characterize the registration. However, we use 11
parameters for this registration, five of which are redundant. Formulating
the problem with 6 parameters could improve the registration performance,
since we are now optimizing over a lower dimensional space (6 as opposed
to 11 dimensions). Another limitation of our approach is its reliance on stereo. The quality
of the rendered scene depends directly on the accuracy of the stereo results.
Future work: We would like to implement the 6 parameter formulation
for registration, and compare the results with the current formulation.
Since we have multiple registered views of a scene, we can correct for
errors in the stereo disparity maps caused by occlusions. Currently,
our 3D mosaic is obtained by projecting all images onto a single viewpoint.
We can extend this to an orthographic projection. We can also construct
an LDI of the 3D mosaic (Layered depth mosaic!) by incorporating multiple
depths at each pixel.
References
1. R.Szeliski. Image mosaicing for tele-reality applications. In IEEE
Workshop on Applications of Computer Vision, pp. 44-53, December 1994.
2. R. Szeliski and H. Shum. Creating full view panoramic image mosaics
and environment maps. Computer Graphics (SIGGRAPH’97), pp. 251-258, 1997.
3. M. Irani, P. Anandan, J. Bergen, R. Kumar and S. Hsu. Mosaic representations
of video sequences and their applications. Signal Processing: Image Communication,
special issue on Image and Video semantics: Processes, Analysis and Applications,
Vol. 8, No. 4, pp. 327-351, May 1996.
4. S.E. Chen. Quicktime VR - an image based approach to virtual environment
navigation. Computer Graphics (SIGGRAPH ‘95), pp. 29-38, August 1995.
5. Quicktime VR demos- http://www.apple.com/quicktime/qtvr/
6. S. Chen and L. Williams. View interpolation for image synthesis.
Computer Graphics (SIGGRAPH ‘93), pp. 279-288, August 1993.
7. C. Zitnick and T. Kanade. A Cooperative Algorithm for Stereo Matching
and Occlusion Detection. tech. report CMU-RI-TR-99-35, Robotics Institute,
Carnegie Mellon University, October, 1999.