Characterizing Stereo Matching Problems using Local Spatial Frequency

Ph.D. Thesis
May 1996

Mark W. Maimone

School of Computer Science
Carnegie Mellon University


The model of local spatial frequency provides a powerful analytical tool for image analysis. In this thesis we explore the application of this representation to long-standing problems in stereo vision. As the basis for this analysis, we develop a phase-based algorithm for stereo matching that uses an adaptive scale selection process. Our approach demonstrates a novel solution to the phase-wraparound problem that has limited the applicability of other phase-based methods.
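As a rough illustration of the general idea behind phase-based matching and the wraparound problem, the following minimal sketch estimates a 1-D shift from the phase difference of complex Gabor filter responses. It is not the thesis's algorithm and has no adaptive scale selection. The function names, the fixed filter parameters, and the convention that `right[x] = left[x - d]` are all assumptions made for this example.

```python
import numpy as np

def gabor_response(signal, omega, sigma):
    """Convolve a 1-D signal with a complex Gabor kernel tuned to omega (rad/pixel)."""
    half = int(3 * sigma)                      # truncate the Gaussian at 3 sigma
    u = np.arange(-half, half + 1)
    kernel = np.exp(-u**2 / (2 * sigma**2)) * np.exp(1j * omega * u)
    return np.convolve(signal, kernel, mode="same")

def phase_disparity(left, right, omega, sigma):
    """Estimate d per pixel, assuming right[x] = left[x - d] (example convention).

    The phase difference is only known modulo 2*pi, so the estimate is
    unambiguous only when |d| < pi / omega (half the filter wavelength).
    """
    rl = gabor_response(left, omega, sigma)
    rr = gabor_response(right, omega, sigma)
    dphi = np.angle(rl * np.conj(rr))          # wrapped phase difference in (-pi, pi]
    return dphi / omega

# Example: a sinusoid with a 16-pixel wavelength, shifted by 3 pixels.
x = np.arange(256)
omega = 2 * np.pi / 16
left = np.cos(omega * x)
right = np.roll(left, 3)                       # right[x] = left[x - 3]
estimate = phase_disparity(left, right, omega, sigma=8.0)
```

Away from the image borders, `estimate` is close to 3. With this single fixed filter, a shift of 11 pixels exceeds half the 16-pixel wavelength and is reported as about -5, which is the wraparound ambiguity described above.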

The problem of ambiguous matches, or false targets, can greatly reduce the accuracy of a stereo vision system. A common approach to alleviating the problem is the use of a coarse-to-fine refinement strategy, but we show that this approach imposes some (perhaps overly) strong requirements on the stereo images. Our phase-based method relaxes those requirements, and is therefore able to handle a wider variety of otherwise ambiguous images. Sometimes, however, ambiguity is inherent in the images, so we propose a generalized disparity model to explicitly represent multiple candidates.

Perspective foreshortening, an effect that occurs when a surface is viewed at a sharp angle, can reduce the precision of stereo methods. Many methods tacitly assume that the projection of an object will have the same area in both images, but this condition is violated by perspective foreshortening. We show how to overcome this problem using a local spatial frequency representation. A simple geometric analysis leads to an elegant solution in the frequency domain which, when applied to our phase-based system, increases the system's maximum matchable surface angle from 30 degrees to over 75 degrees.
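The thesis's geometric analysis is not reproduced in this abstract. The sketch below illustrates only the general Fourier scaling property that makes a frequency-domain treatment natural: if a surface texture appears compressed by a factor `a` in one image relative to the other, its local spatial frequency is multiplied by `a`. The compression factor and the helper `dominant_frequency` are hypothetical and chosen for this example.

```python
import numpy as np

def dominant_frequency(signal):
    """Return the strongest nonzero frequency (cycles/pixel) from the FFT magnitude."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal))       # bin k corresponds to k / n cycles/pixel
    return freqs[np.argmax(spectrum)]

n = 512
x = np.arange(n)
f0 = 16 / n        # texture frequency seen in the left image (cycles/pixel)
a = 1.5            # hypothetical foreshortening (compression) factor

left = np.cos(2 * np.pi * f0 * x)
right = np.cos(2 * np.pi * f0 * a * x)         # same texture, compressed by a

ratio = dominant_frequency(right) / dominant_frequency(left)
```

Here `ratio` recovers the compression factor. A matcher that compares responses at the same frequency in both images implicitly assumes `a = 1`, which is the equal-area assumption that foreshortening violates.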

The analysis of stereo vision algorithms can be greatly enhanced through the use of datasets with ground truth. We outline a taxonomy of datasets with ground truth that use varying degrees of realism to characterize particular aspects of stereo vision systems, and show that each component of this taxonomy can be effectively realized with current technology. We propose that datasets generated in this way be used as the foundation for a suite of statistical analyses to effectively characterize the performance of stereo vision systems.
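As one possible starting point for such analyses, the following sketch compares an estimated disparity map against ground truth. The abstract does not specify which statistics the thesis uses, so the metrics chosen here, the function name, and the `bad_threshold` parameter are assumptions for illustration only.

```python
import numpy as np

def disparity_error_stats(estimated, ground_truth, valid=None, bad_threshold=1.0):
    """Summarize disparity errors over the pixels where ground truth is valid."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)  # assume every pixel has ground truth
    err = (est - gt)[valid]
    abs_err = np.abs(err)
    return {
        "mean_error": float(err.mean()),                      # signed bias
        "rms_error": float(np.sqrt(np.mean(err**2))),
        "max_abs_error": float(abs_err.max()),
        "bad_fraction": float(np.mean(abs_err > bad_threshold)),
    }

# Example with one outlier among four pixels.
stats = disparity_error_stats([1.0, 2.0, 3.0, 6.0], [1.0, 2.0, 3.0, 4.0])
```

The optional `valid` mask lets occluded or unmeasured pixels be excluded, which matters when the ground truth itself has gaps.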

A PostScript copy of this thesis is available from the CMU SCS Computer Science Technical Report Collection as Tech Report CMU-CS-96-125 in three parts:
  • Part I - Chapters 1, 2, and 3 [0.9Meg]: Introduction, Taxonomy, Phase-based Stereo
  • Part II - Chapter 4 [1.3Meg]: Ambiguous Matches
  • Part III - Chapters 5, 6 and Appendices [1.9Meg]: Foreshortening, Conclusions, Appendices

Much of the data used in this thesis is available:
  • Some Stereo Datasets with Ground Truth
  • Matlab source code, available as a gzipped tar file (54K); but see the disclaimer
  • Images (forthcoming)

This research was sponsored in part by the Department of the Army, Army Research Office under grant number DAAH04-94-G-0006, and the NASA Ames Graduate Student Researchers Program NGT 51026. The views and conclusions contained in this document are those of the author and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of the Department of the Army or the United States Government.

    Mark Maimone