In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.
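The sequential training with stacking described above can be illustrated with a minimal sketch. It is not the authors' implementation: the least-squares base learner, the function names (`fit_linear`, `predict_linear`, `stacked_predictions`, `train_sequence`, `predict_sequence`), the fold count, and the number of levels are all assumptions chosen for illustration, and the sketch works on a flat set of samples rather than a coarse-to-fine region hierarchy. It shows only the core idea: each level is trained on predictions that the previous level made on data it did not see, so later levels learn to cope with realistic upstream errors.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares map from features to label distributions (stand-in base learner)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    """Predict a normalized, non-negative label distribution for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    P = np.clip(Xb @ W, 1e-6, None)
    return P / P.sum(axis=1, keepdims=True)

def stacked_predictions(X, Y, n_folds=5, seed=0):
    """Return held-out predictions for every training row and a model fit on all rows."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    held_out = np.zeros_like(Y, dtype=float)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        W_k = fit_linear(X[train_idx], Y[train_idx])
        # Each row is predicted by a model that never saw it during training.
        held_out[test_idx] = predict_linear(W_k, X[test_idx])
    return held_out, fit_linear(X, Y)

def train_sequence(X, Y, n_levels=3):
    """Train a sequence of predictors; each level sees the previous level's held-out output."""
    models, feats = [], X
    for _ in range(n_levels):
        held_out, W = stacked_predictions(feats, Y)
        models.append(W)
        # The next level's input is the original features plus held-out predictions.
        feats = np.hstack([X, held_out])
    return models

def predict_sequence(models, X):
    """Run the trained levels in order, feeding each level's output to the next."""
    feats, P = X, None
    for W in models:
        P = predict_linear(W, feats)
        feats = np.hstack([X, P])
    return P
```

Here `Y` is an array of per-sample label distributions (one-hot rows for pure samples, fractional rows for samples that mix several labels), so the same code accepts the label mixtures that arise from imperfect segmentation. At test time, `predict_sequence(models, X_test)` applies the models trained on all data, while training only ever used the held-out predictions.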

Updated Results (As of April 26, 2013)

The performance on the Stanford Background dataset is:

Image classifications are available [here].

Differences from the ECCV 2010 publication are:

Computation time breakdown per image (seconds):




The original naive MATLAB implementation of the ECCV 2010 paper: [code]

Creative Commons License



ECCV 2010 Stacked Hierarchical Labeling
D. Munoz, J. A. Bagnell, M. Hebert
ECCV 2010 Oral Presentation
[pdf] [project page] [bibtex]
See the project page for updated results!

CVPR 2011 Learning Message-Passing Inference Machines for Structured Prediction
S. Ross, D. Munoz, M. Hebert, J. A. Bagnell
CVPR 2011
[pdf] [project page] [bibtex]

ICRA 2011 3-D Scene Analysis via Sequenced Predictions over Points and Regions
X. Xiong, D. Munoz, J. A. Bagnell, M. Hebert
ICRA 2011 Best Vision Paper Award Finalist
[pdf] [project page] [bibtex]

ECCV 2012 Co-inference for Multi-modal Scene Analysis
D. Munoz, J. A. Bagnell, M. Hebert
ECCV 2012
[pdf] [project page] [bibtex]

Inference Machines: Parsing Scenes via Iterated Predictions
D. Munoz
PhD Thesis, Carnegie Mellon University 2013
[pdf] [bibtex]