Structure Discovery in Multi-modal Data: a Region-based Approach


Alvaro Collet Romea
Siddhartha Srinivasa
Martial Hebert


The ability of a perception system to discern what is important in a scene and what is not is an invaluable asset, with applications in object recognition, people detection, and SLAM, among others. In this paper, we aim to analyze all available sensory data to separate a scene into a few physically meaningful parts, which we term structure, while discarding background clutter. In particular, we consider the combination of image and range data, and base our decision on both appearance and 3D shape. Our main contribution is a framework for scene segmentation that preserves physical objects using multi-modal data. We combine image and range data using a novel mid-level fusion technique, based on the concept of regions, that avoids any pixel-level correspondences between data sources. We associate groups of pixels with 3D points to form multi-modal regions that we term regionlets, and measure the structure-ness of each regionlet using simple, bottom-up cues from image and range features. We show that the highest-ranked regionlets correspond to the most prominent objects in the scene. We verify the validity of our approach on 105 scenes of household environments.

In this work, our goal is to generate a scene segmentation, together with a ranking mechanism, such that the highest-ranking segments correspond to physical entities in the scene. We call this process structure discovery. We combine range and image data to compute perceptual cues such as concavities and discontinuities. These cues are then used to generate scene segmentations that preserve physical entities.
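The region-level association and ranking described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a precomputed image segmentation (`labels`), 3D points in the camera frame, and known pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and it groups points by the region they project into. All function names are hypothetical, and `toy_structureness` is a placeholder cue (many points, small depth spread) rather than the concavity and discontinuity cues used in the paper.

```python
from collections import defaultdict
from statistics import pstdev


def build_regionlets(labels, points, fx, fy, cx, cy):
    """Group 3D points by the image region they project into.

    labels: 2D list of region ids (rows of ints) from any image segmentation.
    points: iterable of (x, y, z) tuples in the camera frame.
    fx, fy, cx, cy: pinhole intrinsics (assumed known from calibration).
    """
    height, width = len(labels), len(labels[0])
    regionlets = defaultdict(list)
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # image column
        v = int(round(fy * y / z + cy))  # image row
        if 0 <= u < width and 0 <= v < height:
            regionlets[labels[v][u]].append((x, y, z))
    return dict(regionlets)


def toy_structureness(region_points):
    """Placeholder score: favors many points with a small depth spread."""
    depths = [p[2] for p in region_points]
    return len(region_points) / (1.0 + pstdev(depths))


def rank_regionlets(regionlets):
    """Return region ids sorted from highest to lowest placeholder score."""
    return sorted(
        regionlets,
        key=lambda rid: toy_structureness(regionlets[rid]),
        reverse=True,
    )
```

In this sketch the fusion happens at the level of regions: each 3D point is only assigned to a segment, and every cue is then computed per regionlet rather than per pixel.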

(Top-left) Input image (range data is also an input, not shown). (Top-right) Highest-ranked structures according to our algorithm. (Bottom-left) All structures found by our algorithm. (Bottom-right) 3D point cloud of all structures found by our algorithm, color-coded from white to red for easier visibility, with white indicating the best score.
Examples of test images and their structure segmentation


Alvaro Collet, Siddhartha Srinivasa, and Martial Hebert. Structure Discovery in Multi-modal Data: a Region-based Approach. IEEE International Conference on Robotics and Automation (ICRA'11), May 2011.


This material is based upon work partially supported by the National Science Foundation under Grant No. EEC-0540865. Alvaro Collet is partially supported by a Caja Madrid fellowship.
