A Bayesian Approach to Image Mosaicking

Frank Dellaert

Problem and Impact:

Many applications require stitching many images together into a single panoramic image called a 'mosaic', a technique often referred to as 'image mosaicking'. The prototypical application is surveying a scene with a camcorder and then mosaicking the video sequence into a panoramic picture of the whole scene. The technique is also used extensively in remote-sensing applications. In our group, we use mosaicking to build a visual map of the environment in which a robot operates. In particular, we equipped the robotic museum tour guide Minerva with a camera pointed towards the ceiling. The idea is to build a large mosaic of the ceiling, which the robot can then use during operation to determine its position. Since only a camera is needed during operation to globally localize the robot and track its position, this idea has great potential in the emerging market of low-cost commercial mobile robots, where expensive sensors such as laser range finders are often unaffordable.

State of the Art:

[Figure 1: ...n by Minerva]

It soon became apparent that the video sequence recorded in the museum posed new challenges that existing mosaicking techniques are unable to handle. In particular, as illustrated in Figure 1, the images are noisy and defocused, and they are contaminated by light-reflection effects whenever the camera looks into a bright light. In addition, most state-of-the-art mosaicking techniques cannot deal with the significant parallax that results from the 3D structure of the ceiling. Finally, most mosaicking techniques only handle rotation of the camera around a fixed point, as translation again gives rise to significant parallax effects.


Our approach is to pose the problem of mosaicking within the framework of Bayesian estimation, and to apply the tools developed within the image restoration community for maximum a posteriori (MAP) image restoration to image mosaicking. In particular, we use robust estimators to deal with the noise and light contamination, rather than the sum-of-squared-differences (SSD) criterion typically used in the literature. Robust error measures are more tolerant of outliers, and will effectively discard an apparent light as a reflection if other images contain conflicting information. In addition, we use edge-preserving Markov Random Field (MRF) image priors to achieve high-resolution estimates of the mosaic. MRF priors convey knowledge about what the mosaic is expected to look like, e.g. that it is smooth most of the time. Edge-preserving MRFs allow piecewise-smooth images, so that actual edges are not smoothed out in the final mosaic. Finally, the image registration problem is posed in a novel way and solved with tools previously used for global position estimation in the mobile robot community.
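To make the contrast between the SSD criterion and a robust error measure concrete, the following is a minimal sketch, not the method used for the Minerva mosaic. It fuses several observations of a single mosaic pixel in two ways: with the least-squares (SSD) estimate, which is simply the mean, and with a Huber M-estimate computed by iteratively reweighted least squares. The choice of the Huber estimator, the function names, the threshold `delta`, and the intensity values are all assumptions made for illustration; the text above does not say which robust estimator is used.

```python
import numpy as np

def ssd_estimate(samples):
    """Minimizer of the sum of squared differences: the plain mean."""
    return float(np.mean(samples))

def huber_estimate(samples, delta=10.0, iters=50):
    """Huber M-estimate of location via iteratively reweighted least squares.

    delta is a hypothetical threshold (in intensity units): residuals
    larger than delta are down-weighted instead of counted quadratically.
    """
    samples = np.asarray(samples, dtype=float)
    mu = float(np.median(samples))  # robust starting point
    for _ in range(iters):
        r = np.abs(samples - mu)
        # Huber weights: 1 inside the threshold, delta/|r| outside it.
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        mu = float(np.sum(w * samples) / np.sum(w))
    return mu

# Hypothetical intensities of one mosaic pixel as seen in six registered
# frames; the last frame caught a light reflection.
obs = [100, 102, 98, 101, 99, 250]
print("SSD estimate:  ", ssd_estimate(obs))
print("Huber estimate:", huber_estimate(obs))
```

With these made-up values the SSD estimate is pulled up to 125 by the single reflection, while the Huber estimate stays close to the five consistent readings. The Huber estimator only limits the influence of an outlier; an estimator that discards it entirely would need a redescending error function. A penalty of the same bounded-influence kind, applied to differences between neighboring mosaic pixels rather than to observations, is one common way to obtain an edge-preserving smoothness prior.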

[Figure 2: ...nting camera.]

A typical result is shown in Figure 2: a mosaic obtained by stitching together 150 ceiling images into one composite image. The picture shows the 60-meter-wide by 40-meter-deep first-floor entrance area of the National Museum of American History (MAH), one of the Smithsonian Institution's museums on the National Mall in Washington, DC. In the figure, the entrance is at the top, and the large octagonal feature in the middle is the opening through which visitors can watch Foucault's pendulum swing. The map covers the entire area where Minerva spent two weeks giving tours to visitors in the summer of 1998.

Future Work:

Although we were quite successful in constructing a mosaic of the MAH, doing so still required a lot of manual intervention from us, the robot operators. In future work, we would like to automate image gathering and mosaic construction so that they can be done online and on the fly, as a robot wanders into new territory. This simultaneous localization and mapping problem has already been solved for other sensor modalities, but many real-time issues need to be addressed when dealing with the large image data sets obtained by a video camera.
