Geometrically Coherent Image Interpretation

Graduate Student Researcher: Derek Hoiem
Faculty Advisors: Alexei A. Efros (Principal Investigator), Martial Hebert
Project Sponsor: National Science Foundation CAREER award IIS-0546547

Image interpretation, the ability to see and understand the three-dimensional world behind a two-dimensional image, goes to the very heart of the computer vision problem. The overall objective of this research project is, given a single image, to automatically produce a coherent interpretation of the depicted scene. On one level, such an interpretation should include opportunistically recognizing known objects (e.g. people, houses, cars, trees) and known materials (e.g. grass, sand, rock, foliage) as well as their rough positions and orientations within the scene. But more than that, the goal is to capture the overall sense of the scene even if we do not recognize some of its constituent parts.

To address this extremely difficult task, we propose a novel framework that aims to jointly model the elements that make up a scene within the geometric context of the 3D space that they occupy. Because none of the measured quantities in the image -- geometry, materials, objects and object parts, scene classes, camera pose, etc. -- are reliable in isolation, they must all be considered together, in a coherent way. Having the geometric context representation will allow all the elements of the image to be physically "placed" within this contextual frame and will permit reasoning between them and their 3D environment in a joint optimization framework. During the timeframe of this project, we will develop such a framework, which will allow a geometrically coherent semantic interpretation of an image to emerge.

Intellectual Merit: At the core of the project is an effort to unify two disjoint philosophies -- the traditional "Geometry" school that deals with 3D quantities like points and surfaces, and the newer "Appearance" school that operates in terms of 2D pixel patterns. These two views are here combined into one coherent framework, where appearance and geometry co-exist and rely on each other to jointly produce an interpretation of an image.

Broader Impact: There are a number of important real-world problems that will benefit from the proposed research even during its development. Direct applications of this work include: developing navigation assistant technology for the visually impaired, scene awareness for mobile robots and car safety, and creating graphical 3D walk-through environments from a single image.

Publications

Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding
Derek Hoiem
PhD Thesis, Robotics Institute, Carnegie Mellon University, August 2007

Closing the Loop on Scene Interpretation
Derek Hoiem, Alexei A. Efros, Martial Hebert
In CVPR 2008
See 3D reconstruction compared to Photo Pop-up and Make3D

Recovering Occlusion Boundaries from a Single Image
Derek Hoiem, Andrew Stein, Alexei A. Efros, Martial Hebert
In ICCV 2007

Opportunistic use of vision to push back the path-planning horizon
Bart Nabbe, Derek Hoiem, Alexei A. Efros, Martial Hebert
In IROS 2006

Putting Objects in Perspective
Derek Hoiem, Alexei A. Efros, Martial Hebert
In CVPR 2006
Best Paper Award

Geometric Context from a Single Image
Derek Hoiem, Alexei A. Efros, Martial Hebert
In ICCV 2005
(journal version available, accepted to IJCV)

Automatic Photo Pop-up
Derek Hoiem, Alexei A. Efros, Martial Hebert
In SIGGRAPH 2005

Software

The executables for a number of algorithms in this project are collected on the software page.
This software is made available for academic use only.

Datasets

A number of datasets have been gathered to evaluate our algorithms. They can be accessed from the dataset page.
These datasets are made available for academic use only.

Some of this material is based upon work supported by the National Science Foundation under CAREER Grant No. IIS-0546547. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.