3D Scene Analysis


David C. Lee
Martial Hebert
Takeo Kanade


We study the problem of generating plausible interpretations of a scene from a collection of line segments automatically extracted from a single indoor image. We show that we can recognize the three-dimensional structure of the interior of a building, even in the presence of occluding objects. Several physically valid structure hypotheses are proposed by geometric reasoning and verified to find the model that best fits the line segments, which is then converted to a full 3D model. Our experiments demonstrate that our structure recovery from line segments is comparable with methods using full image appearance. Our approach shows how a set of rules describing geometric constraints between groups of segments can be used to prune scene interpretation hypotheses and to generate the most plausible interpretation.

It is easy for us to recognize the structure from a collection of line segments in the image at top right, as well as to locate a few doors. However, automatic recognition of structure from a collection of line segments is challenging, as not all lines defining the building structure are perfectly detected by low-level image processing. To further complicate the problem, extra edges may lie on surfaces of walls or even on objects that are not part of the target structure. We can still interpret the collection of line segments because 1) we perform geometric reasoning and only consider physically plausible interpretations, 2) we have the ability to look globally at the overall structure, and 3) we have prior knowledge of how the world, in our case the interior of a building, is structured.
Geometric Rules on Corners | Sample Building Models
The left figure shows the corners allowed by geometric reasoning. Only four types of physically valid corners can exist in a given region. The right figure shows example building models that we consider for indoor scenes.

Sample of Structure Hypotheses
Finding the building structure is done in three steps: line segments and vanishing points are found; many plausible building model hypotheses are created; and each hypothesis is tested against an orientation map, which is a map of local belief of region orientations, to find the best matching hypothesis.
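The final testing step can be illustrated with a minimal sketch. It is not the method from the paper: it assumes the orientation map and each rendered hypothesis are small grids of labels (0, 1, 2 for the three orthogonal surface orientations, None where the map holds no belief) and that a hypothesis is scored by the fraction of labelled pixels it agrees with. The names `agreement` and `best_hypothesis`, and the scoring rule itself, are hypothetical.

```python
# Hypothetical sketch: score building-model hypotheses against an orientation map.
# Labels 0, 1, 2 stand for the three orthogonal surface orientations;
# None marks pixels where the orientation map has no local belief.

def agreement(hypothesis, orientation_map):
    """Fraction of labelled orientation-map pixels that the hypothesis matches."""
    matched = 0
    labelled = 0
    for hyp_row, map_row in zip(hypothesis, orientation_map):
        for hyp_label, map_label in zip(hyp_row, map_row):
            if map_label is None:
                continue  # no belief at this pixel, so it cannot vote
            labelled += 1
            if hyp_label == map_label:
                matched += 1
    return matched / labelled if labelled else 0.0


def best_hypothesis(hypotheses, orientation_map):
    """Return (index, score) of the hypothesis that best fits the orientation map."""
    scores = [agreement(h, orientation_map) for h in hypotheses]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data (assumed): a 3x4 orientation map and two candidate structures.
orientation_map = [
    [0, 0, None, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
hypotheses = [
    [[0, 0, 0, 0], [0, 0, 0, 0], [2, 2, 2, 2]],  # one wall above the floor
    [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 2, 2]],  # two walls above the floor
]
index, score = best_hypothesis(hypotheses, orientation_map)
```

In this toy case the second hypothesis agrees with every labelled pixel and is selected. A real system would render each hypothesis at image resolution and would likely weight pixels by confidence.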

Recovered 3D Building Structure
Now that we have the scene structure, we would like to use it as a "frame" that defines the scene, and populate it with objects in the scene. Recovering the "scene frame" is a stepping stone toward a more complete scene understanding, as it provides a global geometric context of the scene. Our ultimate goal is to recognize all the objects in a scene. Most objects of interest fall into one of two categories: objects that lie on the floor, and objects that are attached to a wall. An object that lies on the floor interacts with the scene frame by being supported at the point where it contacts the floor of the frame, which determines its 3D location. These objects need to be in an empty space of the frame, and not inside walls. Locations of objects attached to walls are also constrained by the scene frame. These constraints allow us to find objects more robustly in 3D space. Results are shown for doors and people.
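To see why a floor contact point fixes an object's 3D location, consider the following sketch. It is an illustration under stated assumptions, not the paper's implementation: a pinhole camera with known focal length and principal point, an optical axis parallel to the floor, a y axis pointing down, and a known camera height above the floor. The function name `floor_contact_to_3d` and all parameter names are hypothetical.

```python
# Hypothetical sketch: back-project a floor contact pixel onto the floor plane.
# Assumptions: pinhole camera at the origin, y axis pointing down, optical axis
# (z) parallel to the floor, and the floor at y = camera_height.

def floor_contact_to_3d(u, v, focal_px, cx, cy, camera_height):
    """Return the (x, y, z) point where the viewing ray through (u, v) meets the floor."""
    # Direction of the viewing ray, normalised so that its z component is 1.
    x_dir = (u - cx) / focal_px
    y_dir = (v - cy) / focal_px
    if y_dir <= 0:
        # Rays at or above the horizon line never reach the floor.
        raise ValueError("pixel is on or above the horizon")
    # Scale the ray until its y coordinate equals the camera height.
    scale = camera_height / y_dir
    return (scale * x_dir, camera_height, scale)


# Assumed example: 640x480 image, 500 px focal length, camera 1.5 m above the floor.
point = floor_contact_to_3d(u=420, v=490, focal_px=500.0, cx=320.0, cy=240.0,
                            camera_height=1.5)
```

With these assumed values the contact point lies about 0.6 m to the right of the camera and 3 m in front of it. Checking that such a point falls in the empty space of the recovered frame, rather than inside a wall, would then be a test against the frame's wall planes.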

Objects in Scene

Video of Results

Scene Analysis. [WMV 5MB]


D. C. Lee, M. Hebert, and T. Kanade. Geometric Reasoning for Single Image Structure Recovery. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2009.


