Vision and Mobile Robotics Laboratory | Publications

Publications of year 2005
Conference articles
  1. Raghavendra Rao Donamukkala, Daniel Huber, Anuj Kapuria, and Martial Hebert. Automatic class selection and prototyping for 3-D object classification. In Proceedings of the 5th International Conference on 3-D Digital Imaging and Modeling (3DIM '05), pages 64-71, June 2005. IEEE Computer Society Press. (url) (pdf)
    Abstract: "Most research on 3-D object classification and recognition focuses on recognition of objects in 3-D scenes from a small database of known 3-D models. Such an approach does not scale well to large databases of objects and does not generalize well to unknown (but similar) object classification. This paper presents two ideas to address these problems: (i) class selection, i.e., grouping similar objects into classes, and (ii) class prototyping, i.e., exploiting common structure within classes to represent the classes. At run time, matching a query against the prototypes is sufficient for classification. This approach will not only reduce the retrieval time but also will help increase the generalizing power of the classification algorithm. Objects are segmented into classes automatically using an agglomerative clustering algorithm. Prototypes from these classes are extracted using one of three class prototyping algorithms. Experimental results demonstrate the effectiveness of the two steps in speeding up the classification process without sacrificing accuracy."
    @inproceedings{donamukkala-3dim-05,
    author = "Raghavendra Rao Donamukkala and Daniel Huber and Anuj Kapuria and Martial Hebert",
    title = "Automatic class selection and prototyping for 3-D object classification",
    booktitle = "Proceedings of the 5th International Conference on 3-D Digital Imaging and Modeling (3DIM '05)",
    month = "June",
    year = "2005",
    pages = "64-71",
    publisher = "IEEE Computer Society Press",
    url="http://www.ri.cmu.edu/pubs/pub_5228.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/donamukkala_raghavendra_rao_2005_1/donamukkala_raghavendra_rao_2005_1.pdf",
    abstract="Most research on 3-D object classification and recognition focuses on recognition of objects in 3-D scenes from a small database of known 3-D models. Such an approach does not scale well to large databases of objects and does not generalize well to unknown (but similar) object classification. This paper presents two ideas to address these problems: (i) class selection, i.e., grouping similar objects into classes, and (ii) class prototyping, i.e., exploiting common structure within classes to represent the classes. At run time, matching a query against the prototypes is sufficient for classification. This approach will not only reduce the retrieval time but also will help increase the generalizing power of the classification algorithm. Objects are segmented into classes automatically using an agglomerative clustering algorithm. Prototypes from these classes are extracted using one of three class prototyping algorithms. Experimental results demonstrate the effectiveness of the two steps in speeding up the classification process without sacrificing accuracy."
    }
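
    The class-selection step described above amounts to agglomerative clustering over a matrix of pairwise model-match scores. A minimal sketch of that grouping step (illustrative only: the average linkage and the cut threshold are assumptions, and the paper derives the similarities from 3-D surface matching, which is not reproduced here):

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import squareform

        def select_classes(similarity, cut=0.5):
            """Group objects into classes by agglomerative clustering of a
            symmetric pairwise similarity matrix with entries in [0, 1]."""
            dist = 1.0 - np.asarray(similarity, dtype=float)
            np.fill_diagonal(dist, 0.0)                 # zero self-distance
            condensed = squareform(dist, checks=False)  # upper-triangle vector
            merges = linkage(condensed, method='average')
            return fcluster(merges, t=cut, criterion='distance')

    A prototyping step would then pick or build one representative per returned class label, so queries are matched against prototypes rather than the full database.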

  2. Chris Gordon, Burcu Akinci, Frank Boukamp, and Daniel Huber. Assessment of visualization software for support of construction site inspection tasks using data collected from reality capture technologies. In Proceedings of the 2005 International Conference on Computing in Civil Engineering, July 2005. American Society of Civil Engineers. (url) (pdf)
    Abstract: "Emerging reality capture technologies, such as LADAR and embedded sensing, have potential to increase the efficiency of inspectors by generating detailed data about as-built conditions that can be analyzed in real time and at a later time at an office. The data collected using these systems provide an opportunity to visualize and analyze as-built conditions on construction sites in a more comprehensive way. At the same time, some characteristics of the data collected, such as its size and level of detail, provide unique visualization challenges. Currently available software systems deliver some functionalities to support construction inspection tasks by enabling the visualization and manipulation of the data captured using these reality capture technologies. In this paper, we provide an assessment of some functionalities provided by a set of selected visualization software tools based on the characteristics of the data collected and to be visualized, and on the construction inspection processes that need to be supported."
    @inproceedings{Gordon_2005_5229,
    author = "Chris Gordon and Burcu Akinci and Frank Boukamp and Daniel Huber",
    title = "Assessment of visualization software for support of construction site inspection tasks using data collected from reality capture technologies",
    booktitle = "Proceedings of the 2005 International Conference on Computing in Civil Engineering",
    month = "July",
    year = "2005",
    publisher = "American Society of Civil Engineers",
    abstract="Emerging reality capture technologies, such as LADAR and embedded sensing, have potential to increase the efficiency of inspectors by generating detailed data about as-built conditions that can be analyzed in real time and at a later time at an office. The data collected using these systems provide an opportunity to visualize and analyze as-built conditions on construction sites in a more comprehensive way. At the same time, some characteristics of the data collected, such as its size and level of detail, provide unique visualization challenges. Currently available software systems deliver some functionalities to support construction inspection tasks by enabling the visualization and manipulation of the data captured using these reality capture technologies. In this paper, we provide an assessment of some functionalities provided by a set of selected visualization software tools based on the characteristics of the data collected and to be visualized, and on the construction inspection processes that need to be supported.",
    url="http://www.ri.cmu.edu/pubs/pub_5229.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/gordon_chris_2005_1/gordon_chris_2005_1.pdf" 
    }

  3. Derek Hoiem, Alexei A. Efros, and Martial Hebert. Geometric Context from a Single Image. In International Conference on Computer Vision (ICCV), October 2005. IEEE. (url) (pdf)
    Abstract: "Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We show that we can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then be used to improve the performance of many other applications. We provide a thorough quantitative evaluation of our algorithm on a set of outdoor images and demonstrate its usefulness in two applications: object detection and automatic single-view reconstruction."
    @inproceedings{hoiem-iccv-2005,
    author = "Derek Hoiem and Alexei A. Efros and Martial Hebert",
    title = "Geometric Context from a Single Image",
    booktitle = "International Conference on Computer Vision (ICCV)",
    month = "October",
    year = "2005",
    publisher = "IEEE",
    url="http://www.ri.cmu.edu/pubs/pub_5164.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/hoiem_derek_2005_3/hoiem_derek_2005_3.pdf",
    abstract="Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We show that we can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then be used to improve the performance of many other applications. We provide a thorough quantitative evaluation of our algorithm on a set of outdoor images and demonstrate its usefulness in two applications: object detection and automatic single-view reconstruction.",
    }

  4. Yan Ke, Rahul Sukthankar, and Martial Hebert. Efficient Temporal Mean Shift for Activity Recognition in Video. In NIPS Workshop on Activity Recognition and Discovery, 2005.
    @InProceedings{ke-nips-05,
    author = {Yan Ke and Rahul Sukthankar and Martial Hebert},
    title = {Efficient Temporal Mean Shift for Activity Recognition in Video},
    booktitle = {NIPS Workshop on Activity Recognition and Discovery},
    year = 2005
    }

  5. Yan Ke, Rahul Sukthankar, and Martial Hebert. Efficient Visual Event Detection using Volumetric Features. In International Conference on Computer Vision, October 2005. (url) (pdf)
    Abstract: "This paper studies the use of volumetric features as an alternative to popular local descriptor approaches for event detection in video sequences. Motivated by the recent success of similar ideas in object detection on static images, we generalize the notion of 2D box features to 3D spatio-temporal volumetric features. This general framework enables us to do real-time video analysis. We construct a real-time event detector for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time. This event detector recognizes actions that are traditionally problematic for interest point methods -- such as smooth motions where insufficient space-time interest points are available. Our experiments demonstrate that the technique accurately detects actions on real-world sequences and is robust to changes in viewpoint, scale and action speed. We also adapt our technique to the related task of human action classification and confirm that it achieves performance comparable to a current interest point based human activity recognizer on a standard database of human activities. "
    @inproceedings{ke-iccv-05,
    author = "Yan Ke and Rahul Sukthankar and Martial Hebert",
    title = "Efficient Visual Event Detection using Volumetric Features",
    booktitle = "International Conference on Computer Vision",
    month = "October",
    year = "2005",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/ke_yan_2005_2/ke_yan_2005_2.pdf",
    url="http://www.ri.cmu.edu/pubs/pub_5165.html",
    abstract="This paper studies the use of volumetric features as an alternative to popular local descriptor approaches for event detection in video sequences. Motivated by the recent success of similar ideas in object detection on static images, we generalize the notion of 2D box features to 3D spatio-temporal volumetric features. This general framework enables us to do real-time video analysis. We construct a real-time event detector for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time. This event detector recognizes actions that are traditionally problematic for interest point methods -- such as smooth motions where insufficient space-time interest points are available. Our experiments demonstrate that the technique accurately detects actions on real-world sequences and is robust to changes in viewpoint, scale and action speed. We also adapt our technique to the related task of human action classification and confirm that it achieves performance comparable to a current interest point based human activity recognizer on a standard database of human activities. " 
    }
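
    The constant-time evaluation of 3-D box features that makes the cascade real-time rests on an integral video, the spatio-temporal analogue of the integral image: any cuboid sum is read off from eight corner lookups. A minimal NumPy sketch of that mechanism (not the authors' implementation):

        import numpy as np

        def integral_video(v):
            """3-D summed-area table of a (T, H, W) volume, zero-padded at the
            front of each axis so box sums need no boundary special cases."""
            iv = v.cumsum(0).cumsum(1).cumsum(2)
            return np.pad(iv, ((1, 0), (1, 0), (1, 0)))

        def box_sum(iv, t0, t1, y0, y1, x0, x1):
            """Sum of v[t0:t1, y0:y1, x0:x1] in eight lookups (inclusion-exclusion)."""
            return (iv[t1, y1, x1] - iv[t0, y1, x1] - iv[t1, y0, x1] - iv[t1, y1, x0]
                    + iv[t0, y0, x1] + iv[t0, y1, x0] + iv[t1, y0, x0] - iv[t0, y0, x0])

    A volumetric feature is then a weighted combination of a few such box sums over the video (or a derived quantity such as optical flow), which is what allows the learned cascade to scan all positions and scales quickly.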

  6. Sanjiv Kumar and Martial Hebert. A Hierarchical Field Framework for Unified Context-Based Classification. In International Conference on Computer Vision (ICCV), October 2005. (url)
    @inproceedings{Kumar_2005_5291,
    author = "Sanjiv Kumar and Martial Hebert",
    title = "A Hierarchical Field Framework for Unified Context-Based Classification",
    booktitle = "International Conference on Computer Vision (ICCV)",
    month = "October",
    year = "2005",
    url = "http://www.ri.cmu.edu/pubs/pub_5291.html" 
    }

  7. Jean-Francois Lalonde, Ranjith Unnikrishnan, Nicolas Vandapel, and Martial Hebert. Scale Selection for Classification of Point-sampled 3-D Surfaces. In Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM 2005), June 2005. (url) (pdf)
    Keywords: scale selection, 3-d data, classification, ladar.
    Annotation: "Three-dimensional ladar data are commonly used to perform scene understanding for outdoor mobile robots, specifically in natural terrain. One effective method is to classify points using features based on local point cloud distribution into surfaces, linear structures or clutter volumes. But the local features are computed using 3-D points within a support-volume. Local and global point density variations and the presence of multiple manifolds make the problem of selecting the size of this support volume, or scale, challenging. In this paper we adopt an approach inspired by recent developments in computational geometry and investigate the problem of automatic data-driven scale selection to improve point cloud classification. The approach is validated with results using data from different sensors in various environments classified into different terrain types (vegetation, solid surface and linear structure). " .

    @inproceedings{Lalonde_2005_5070,
    author = "Jean-Francois Lalonde and Ranjith Unnikrishnan and Nicolas Vandapel and Martial Hebert",
    title = "Scale Selection for Classification of Point-sampled 3-D Surfaces",
    booktitle = "Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM 2005)",
    month = "June",
    year = "2005",
    keywords ="scale selection, 3-d data, classification, ladar",
    url="http://www.ri.cmu.edu/pubs/pub_5070.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/lalonde_jean_francois_2005_2/lalonde_jean_francois_2005_2.pdf",
    annote="Three-dimensional ladar data are commonly used to perform scene understanding for outdoor mobile robots, specifically in natural terrain. One effective method is to classify points using features based on local point cloud distribution into surfaces, linear structures or clutter volumes. But the local features are computed using 3-D points within a support-volume. Local and global point density variations and the presence of multiple manifolds make the problem of selecting the size of this support volume, or scale, challenging. In this paper we adopt an approach inspired by recent developments in computational geometry and investigate the problem of automatic data-driven scale selection to improve point cloud classification. The approach is validated with results using data from different sensors in various environments classified into different terrain types (vegetation, solid surface and linear structure). " 
    }
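
    The local features referred to above are commonly the eigenvalues of the point scatter matrix inside the support volume, so the scale-selection question is how large to make that volume. A fixed-radius sketch of the feature computation (the paper's contribution, the data-driven choice of radius, is not reproduced here):

        import numpy as np
        from scipy.spatial import cKDTree

        def scatter_eigenvalues(points, radius):
            """Eigenvalues of the local covariance of the points inside a
            fixed-radius support volume, sorted descending. Their relative
            magnitudes separate surfaces (l0 ~ l1 >> l2), linear structures
            (l0 >> l1 ~ l2), and scattered clutter (l0 ~ l1 ~ l2)."""
            points = np.asarray(points, dtype=float)
            tree = cKDTree(points)
            out = np.zeros((len(points), 3))
            for i, nbrs in enumerate(tree.query_ball_point(points, r=radius)):
                local = points[nbrs] - points[nbrs].mean(axis=0)
                cov = local.T @ local / len(nbrs)
                out[i] = np.linalg.eigvalsh(cov)[::-1]
            return out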

  8. Jean-Francois Lalonde, Nicolas Vandapel, and Martial Hebert. Data Structure for Efficient Processing in 3-D. In Robotics: Science and Systems 1, June 2005. MIT Press. (url) (pdf)
    Keywords: data structure, classification, 3-d data.
    Annotation: "Autonomous navigation in natural environments requires three-dimensional (3-D) scene representation and interpretation. High density laser-based sensing is commonly used to capture the geometry of the scene, producing large amounts of 3-D points with variable spatial density. We propose a terrain classification method using such data. The approach relies on the computation of local features in 3-D using a support volume and belongs, as such, to a larger class of computational problems where range searches are necessary. This operation on traditional data structures is very expensive and, in this paper, we present an approach to address this issue. The method relies on reusing already computed data as the terrain classification process progresses over the environment representation. We present results that show significant speed improvement using ladar data collected in various environments with a ground mobile robot.".

    @inproceedings{lalonde-rss-05,
    author = "Jean-Francois Lalonde and Nicolas Vandapel and Martial Hebert",
    title = "Data Structure for Efficient Processing in 3-D",
    booktitle = "Robotics: Science and Systems 1",
    month = "June",
    year = "2005",
    publisher = "MIT Press",
    keywords="data structure, classification, 3-d data",
    pdf = "http://www.ri.cmu.edu/pub_files/pub4/lalonde_jean_francois_2005_3/lalonde_jean_francois_2005_3.pdf",
    url = "http://www.ri.cmu.edu/pubs/pub_5080.html",
    annote = "Autonomous navigation in natural environments requires three-dimensional (3-D) scene representation and interpretation. High density laser-based sensing is commonly used to capture the geometry of the scene, producing large amounts of 3-D points with variable spatial density. We propose a terrain classification method using such data. The approach relies on the computation of local features in 3-D using a support volume and belongs, as such, to a larger class of computational problems where range searches are necessary. This operation on traditional data structures is very expensive and, in this paper, we present an approach to address this issue. The method relies on reusing already computed data as the terrain classification process progresses over the environment representation. We present results that show significant speed improvement using ladar data collected in various environments with a ground mobile robot."
    }

  9. Marius Leordeanu and Martial Hebert. A Spectral Technique for Correspondence Problems using Pairwise Constraints. In International Conference on Computer Vision, October 2005. (url) (pdf)
    Abstract: "We present an efficient spectral method for finding consistent correspondences between two sets of features. We build the adjacency matrix M of a graph whose nodes represent the potential correspondences and the weights on the links represent pairwise agreements between potential correspondences. Correct assignments are likely to establish links among each other and thus form a strongly connected cluster. Incorrect correspondences establish links with the other correspondences only accidentally, so they are unlikely to belong to strongly connected clusters. We recover the correct assignments based on how strongly they belong to the main cluster of M, by using the principal eigenvector of M and imposing the mapping constraints required by the overall correspondence mapping (one-to-one or one-to-many). The experimental evaluation shows that our method is robust to outliers, accurate in terms of matching rate, while being several orders of magnitude faster than existing methods."
    @inproceedings{leordeanu-iccv-05,
    author = "Marius Leordeanu and Martial Hebert",
    title = "A Spectral Technique for Correspondence Problems using Pairwise Constraints",
    month = "October",
    year = "2005",
    booktitle = "International Conference on Computer Vision",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/leordeanu_marius_2005_1/leordeanu_marius_2005_1.pdf",
    url="http://www.ri.cmu.edu/pubs/pub_5161.html",
    abstract="We present an efficient spectral method for finding consistent correspondences between two sets of features. We build the adjacency matrix M of a graph whose nodes represent the potential correspondences and the weights on the links represent pairwise agreements between potential correspondences. Correct assignments are likely to establish links among each other and thus form a strongly connected cluster. Incorrect correspondences establish links with the other correspondences only accidentally, so they are unlikely to belong to strongly connected clusters. We recover the correct assignments based on how strongly they belong to the main cluster of M, by using the principal eigenvector of M and imposing the mapping constraints required by the overall correspondence mapping (one-to-one or one-to-many). The experimental evaluation shows that our method is robust to outliers, accurate in terms of matching rate, while being several orders of magnitude faster than existing methods." 
    }
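
    The abstract fully specifies the matching pipeline, which fits in a few lines: take the principal eigenvector of the affinity matrix M, then discretize it greedily under the mapping constraints. A sketch assuming a symmetric, non-negative M and one-to-one constraints (the paper's setting):

        import numpy as np

        def spectral_match(M, candidates):
            """M: n x n symmetric, non-negative affinity between candidate
            assignments; candidates: list of (i, j) feature-index pairs.
            Returns a conflict-free (one-to-one) subset of assignments."""
            # Principal eigenvector of M by power iteration.
            x = np.ones(len(M))
            for _ in range(100):
                x = M @ x
                x /= np.linalg.norm(x)
            # Greedy discretization: accept assignments in decreasing order of
            # eigenvector value, skipping any that reuse a matched feature.
            chosen, used_i, used_j = [], set(), set()
            for k in np.argsort(-x):
                i, j = candidates[k]
                if i not in used_i and j not in used_j:
                    chosen.append((i, j))
                    used_i.add(i)
                    used_j.add(j)
            return chosen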

  10. Charles Rosenberg, Martial Hebert, and Henry Schneiderman. Semi-Supervised Self-Training of Object Detection Models. In Seventh IEEE Workshop on Applications of Computer Vision, January 2005. (pdf)
    Abstract: "The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means for reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing detector and present empirical results. The key contribution of this empirical study is to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the confidence generated by the detector."
    @inproceedings{Rosenberg_2005_4875,
    author = "Charles Rosenberg and Martial Hebert and Henry Schneiderman",
    title = "Semi-Supervised Self-Training of Object Detection Models",
    booktitle = "Seventh IEEE Workshop on Applications of Computer Vision",
    month = "January",
    year = "2005",
    abstract="The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means for reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing detector and present empirical results. The key contribution of this empirical study is to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the confidence generated by the detector.",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/rosenberg_charles_2005_1/rosenberg_charles_2005_1.pdf",
    }
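
    The self-training wrapper described above is detector-agnostic and can be stated schematically. In this sketch every callable is a placeholder rather than the paper's API, and the selection metric is passed in separately because the paper's key finding is that a detector-independent metric outperforms the detector's own confidence:

        def self_train(train_fn, predict_fn, score_fn, labeled, unlabeled,
                       rounds=5, per_round=20):
            """Schematic self-training wrapper: train on the labeled set, score
            the unlabeled pool with the selection metric score_fn, promote the
            top-scoring examples with the current model's predicted labels,
            and retrain. All callables are illustrative placeholders."""
            model = train_fn(labeled)
            pool = list(unlabeled)
            for _ in range(rounds):
                if not pool:
                    break
                pool.sort(key=lambda x: score_fn(model, x), reverse=True)
                promoted, pool = pool[:per_round], pool[per_round:]
                labeled = labeled + [(x, predict_fn(model, x)) for x in promoted]
                model = train_fn(labeled)
            return model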

  11. A. Stein and M. Hebert. Incorporating Background Invariance into Feature-Based Object Recognition. In Workshop on Applications of Computer Vision, 2005. (url) (pdf)
    Keywords: SIFT, object recognition, BSIFT, invariant features, recognition.
    Annotation: Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular Scale Invariant Feature Transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the Background and Scale Invariant Feature Transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images.

    @InProceedings{stein-vacv-05,
    author = "A. Stein and M. Hebert",
    title = "Incorporating Background Invariance into Feature-Based Object Recognition",
    booktitle = {Workshop on Applications of Computer Vision},
    year = 2005,
    url="http://www.ri.cmu.edu/pubs/pub_4781.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/stein_andrew_2005_1/stein_andrew_2005_1.pdf",
    annote = {Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular Scale Invariant Feature Transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the Background and Scale Invariant Feature Transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images.},
    keywords="SIFT, object recognition, BSIFT, invariant features, recognition" 
    }

  12. John Tuley, Nicolas Vandapel, and Martial Hebert. Analysis and Removal of Artifacts in 3-D LADAR Data. In IEEE International Conference on Robotics and Automation, 2005. (pdf)
    Annotation: "Errors in laser based range measurements can be divided into two categories: intrinsic sensor errors (range drift with temperature, systematic and random errors), and errors due to the interaction of the laser beam with the environment. The former have traditionally received attention and can be modeled. The latter in contrast have long been observed but not well characterized. We propose to do so in this paper. In addition, we present a sensor independent method to remove such artifacts. The objective is to improve the overall quality of 3-D scene reconstruction to perform terrain classification of scenes with vegetation.".

    @inproceedings{tuley-icra-05,
    author = "John Tuley and Nicolas Vandapel and Martial Hebert",
    title = "Analysis and Removal of Artifacts in 3-D LADAR Data",
    booktitle = "IEEE International Conference on Robotics and Automation",
    year = "2005",
    annote ="Errors in laser based range measurements can be divided into two categories: intrinsic sensor errors (range drift with temperature, systematic and random errors), and errors due to the interaction of the laser beam with the environment. The former have traditionally received attention and can be modeled. The latter in contrast have long been observed but not well characterized. We propose to do so in this paper. In addition, we present a sensor independent method to remove such artifacts. The objective is to improve the overall quality of 3-D scene reconstruction to perform terrain classification of scenes with vegetation.",
    pdf ="http://www.ri.cmu.edu/pub_files/pub4/tuley_john_2005_1/tuley_john_2005_1.pdf" 
    }

  13. Ranjith Unnikrishnan and Martial Hebert. Measures of Similarity. In Seventh IEEE Workshop on Applications of Computer Vision, pages 394-400, January 2005. (pdf)
    Abstract: "Quantitative evaluation and comparison of image segmentation algorithms is now feasible owing to the recent availability of collections of hand-labeled images. However, little attention has been paid to the design of measures to compare one segmentation result to one or more manual segmentations of the same image. Existing measures in statistics and computer vision literature suffer either from intolerance to labeling refinement, making them unsuitable for image segmentation, or from the existence of degenerate cases, making the process of training algorithms using the measures prone to failure. This paper surveys previous work on measures of similarity and illustrates scenarios where they are applicable for performance evaluation in computer vision. For the image segmentation problem, we propose a measure that addresses the above concerns and has desirable properties such as accommodation of labeling errors at segment boundaries, region sensitive refinement, and compensation for differences in segment ambiguity between images."
    @inproceedings{Unnikrishnan_2005_4874,
    author = "Ranjith Unnikrishnan and Martial Hebert",
    title = "Measures of Similarity",
    booktitle = "Seventh IEEE Workshop on Applications of Computer Vision",
    month = "January",
    year = "2005",
    pages = "394-400",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/unnikrishnan_ranjith_2005_1/unnikrishnan_ranjith_2005_1.pdf",
    abstract="Quantitative evaluation and comparison of image segmentation algorithms is now feasible owing to the recent availability of collections of hand-labeled images. However, little attention has been paid to the design of measures to compare one segmentation result to one or more manual segmentations of the same image. Existing measures in statistics and computer vision literature suffer either from intolerance to labeling refinement, making them unsuitable for image segmentation, or from the existence of degenerate cases, making the process of training algorithms using the measures prone to failure. This paper surveys previous work on measures of similarity and illustrates scenarios where they are applicable for performance evaluation in computer vision. For the image segmentation problem, we propose a measure that addresses the above concerns and has desirable properties such as accommodation of labeling errors at segment boundaries, region sensitive refinement, and compensation for differences in segment ambiguity between images."
    }

  14. Ranjith Unnikrishnan, Caroline Pantofaru, and Martial Hebert. A Measure for Objective Evaluation of Image Segmentation Algorithms. In Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '05), Workshop on Empirical Evaluation Methods in Computer Vision, June 2005. (url) (pdf)
    Annotation: "Despite significant advances in image segmentation techniques, evaluation of these techniques thus far has been largely subjective. Typically, the effectiveness of a new algorithm is demonstrated only by the presentation of a few segmented images and is otherwise left to subjective evaluation by the reader. Little effort has been spent on the design of perceptually correct measures to compare an automatic segmentation of an image to a set of hand-segmented examples of the same image. This paper demonstrates how a modification of the Rand index, the Normalized Probabilistic Rand (NPR) index, meets the requirements of large-scale performance evaluation of image segmentation. We show that the measure has a clear probabilistic interpretation as the maximum likelihood estimator of an underlying Gibbs model, can be correctly normalized to account for the inherent similarity in a set of ground truth images, and can be computed efficiently for large datasets. Results are presented on images from the publicly available Berkeley Segmentation dataset.".

    @inproceedings{unnikrishnan-cvpr-05,
    author = "Ranjith Unnikrishnan and Caroline Pantofaru and Martial Hebert",
    title = "A Measure for Objective Evaluation of Image Segmentation Algorithms",
    booktitle = "Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '05), Workshop on Empirical Evaluation Methods in Computer Vision",
    month = "June",
    year = "2005",
    annote ="Despite significant advances in image segmentation techniques, evaluation of these techniques thus far has been largely subjective. Typically, the effectiveness of a new algorithm is demonstrated only by the presentation of a few segmented images and is otherwise left to subjective evaluation by the reader. Little effort has been spent on the design of perceptually correct measures to compare an automatic segmentation of an image to a set of hand-segmented examples of the same image. This paper demonstrates how a modification of the Rand index, the Normalized Probabilistic Rand (NPR) index, meets the requirements of large-scale performance evaluation of image segmentation. We show that the measure has a clear probabilistic interpretation as the maximum likelihood estimator of an underlying Gibbs model, can be correctly normalized to account for the inherent similarity in a set of ground truth images, and can be computed efficiently for large datasets. Results are presented on images from the publicly available Berkeley Segmentation dataset.",
    url="http://www.ri.cmu.edu/pubs/pub_5083.html",
    pdf = "http://www.ri.cmu.edu/pub_files/pub4/unnikrishnan_ranjith_2005_2/unnikrishnan_ranjith_2005_2.pdf" 
    }
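
    Before normalization, the Probabilistic Rand index compares, for every pixel pair, whether the test segmentation groups the pair against the fraction of ground-truth segmentations that do. A direct O(N^2) sketch for small label maps (the NPR's normalization by the expected index is omitted here):

        import numpy as np
        from itertools import combinations

        def probabilistic_rand(test, ground_truths):
            """test: length-N label array; ground_truths: list of length-N
            label arrays. Returns the Probabilistic Rand index in [0, 1]."""
            test = np.asarray(test)
            gts = np.asarray(ground_truths)
            n = len(test)
            total = 0.0
            for i, j in combinations(range(n), 2):
                c = float(test[i] == test[j])          # test groups the pair?
                p = np.mean(gts[:, i] == gts[:, j])    # fraction of GTs that do
                total += c * p + (1 - c) * (1 - p)
            return total / (n * (n - 1) / 2)

    The NPR index rescales this value so that 0 corresponds to the expected index over the ground-truth set, which is what makes scores comparable across images and algorithms.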

  15. Nicolas Vandapel, James Kuffner, and Omead Amidi. Planning 3-D Path Networks in Unstructured Environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2005. (url)
    Annotation: "In this paper, we explore the problem of three-dimensional motion planning in highly cluttered and unstructured outdoor environments. Because accurate sensing and modeling of obstacles is notoriously difficult in such environments, we aim to build computational tools that can handle large point data sets (e.g. LADAR data). Using a priori aerial data scans of forested environments, we compute a network of free space bubbles forming safe paths within environments cluttered with tree trunks, branches and dense foliage. The network (roadmap) of paths is used for efficiently planning paths that consider obstacle clearance information. We present experimental results on large point data sets typical of those faced by Unmanned Aerial Vehicles, but also applicable to ground-based robots navigating through forested environments." .

    @inproceedings{vandapel-icra-05,
    author = "Nicolas Vandapel and James Kuffner and Omead Amidi",
    title = "Planning 3-D Path Networks in Unstructured Environments",
    booktitle = "Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)",
    year = "2005",
    url="http://www.ri.cmu.edu/pubs/pub_4962.html",
    annote ="In this paper, we explore the problem of three-dimensional motion planning in highly cluttered and unstructured outdoor environments. Because accurate sensing and modeling of obstacles is notoriously difficult in such environments, we aim to build computational tools that can handle large point data sets (e.g. LADAR data). Using a priori aerial data scans of forested environments, we compute a network of free space bubbles forming safe paths within environments cluttered with tree trunks, branches and dense foliage. The network (roadmap) of paths is used for efficiently planning paths that consider obstacle clearance information. We present experimental results on large point data sets typical of those faced by Unmanned Aerial Vehicles, but also applicable to ground-based robots navigating through forested environments." 
    }
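
    The free-space-bubble construction in the annotation admits a compact sketch: every sample point becomes a sphere whose radius is its clearance to the nearest obstacle point, and overlapping spheres are linked into a roadmap. Illustrative only; the sampling strategy and the clearance threshold are assumptions, not the paper's parameters:

        import numpy as np
        from scipy.spatial import cKDTree

        def bubble_roadmap(samples, obstacle_points, min_clearance):
            """Free-space-bubble roadmap sketch: each sample becomes a sphere
            whose radius is its clearance to the nearest obstacle point (e.g.
            a LADAR return); two bubbles are linked when they overlap, so
            every edge stays inside known free space."""
            samples = np.asarray(samples, dtype=float)
            clearance, _ = cKDTree(obstacle_points).query(samples)
            keep = clearance >= min_clearance
            centers, radii = samples[keep], clearance[keep]
            edges = [(a, b)
                     for a in range(len(centers))
                     for b in range(a + 1, len(centers))
                     if np.linalg.norm(centers[a] - centers[b]) < radii[a] + radii[b]]
            return centers, radii, edges

    A graph search over the returned edges, weighted by length and clearance, then yields paths that trade off distance against obstacle proximity.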

  16. Hulya Yalcin, Robert Collins, Martial Hebert, and Michael J. Black. A Flow-Based Approach to Vehicle Detection and Background Mosaicking in Airborne Video. In Video Proceedings in conjunction with CVPR'05, June 2005. (url) (pdf)
    Annotation: "In this work, we address the detection of vehicles in a video stream obtained from a moving airborne platform. We propose a Bayesian framework for estimating dense optical flow over time that explicitly estimates a persistent model of background appearance. The approach assumes that the scene can be described by background and occlusion layers, estimated within an Expectation-Maximization framework. The mathematical formulation of the paper is an extension of our previous work where motion and appearance models for foreground and background layers are estimated simultaneously in a Bayesian framework.".

    @inproceedings{yalcin-cvpr-05,
    author = "Hulya Yalcin and Robert Collins and Martial Hebert and Michael J. Black",
    title = "A Flow-Based Approach to Vehicle Detection and Background Mosaicking in Airborne Video",
    booktitle = "Video Proceedings in conjunction with CVPR'05",
    month = "June",
    year = "2005",
    url = "http://www.ri.cmu.edu/pubs/pub_5018.html",
    pdf = "http://www.ri.cmu.edu/pub_files/pub4/yalcin_hulya_2005_3/yalcin_hulya_2005_3.pdf",
    annote = "In this work, we address the detection of vehicles in a video stream obtained from a moving airborne platform. We propose a Bayesian framework for estimating dense optical flow over time that explicitly estimates a persistent model of background appearance. The approach assumes that the scene can be described by background and occlusion layers, estimated within an Expectation-Maximization framework. The mathematical formulation of the paper is an extension of our previous work where motion and appearance models for foreground and background layers are estimated simultaneously in a Bayesian framework."
    }

  17. H. Yalcin, R. Collins, and M. Hebert. Background Estimation under Rapid Gain Change in Thermal Imagery. In IEEE Workshop on Object Tracking and Classification in and Beyond the Visible Spectrum (OTCBVS'05), 2005.
    @InProceedings{yalcin-otcbvs-05,
    author = {H. Yalcin and R. Collins and M. Hebert},
    title = {Background Estimation under Rapid Gain Change in Thermal Imagery},
    booktitle = {IEEE Workshop on Object Tracking and Classification in and Beyond the Visible Spectrum (OTCBVS'05)},
    year = 2005 
    }

  18. Derek Hoiem, Alexei A. Efros, and Martial Hebert. Automatic Photo Pop-up. In ACM SIGGRAPH, August 2005. (url) (pdf)
    Abstract: "This paper presents a fully automatic method for creating a 3D model from a single photograph. The model is made up of several texture-mapped planar billboards and has the complexity of a typical children's pop-up book illustration. Our main insight is that instead of attempting to recover precise geometry, we statistically model geometric classes defined by their orientations in the scene. Our algorithm labels regions of the input image into coarse categories: 'ground', 'sky', and 'vertical'. These labels are then used to 'cut and fold' the image into a pop-up model using a set of simple assumptions. Because of the inherent ambiguity of the problem and the statistical nature of the approach, the algorithm is not expected to work on every image. However, it performs surprisingly well for a wide range of scenes taken from a typical person's photo album."
    Annotation: "AVI Video available at: http://www.cs.cmu.edu/~dhoiem/projects/popup/popup_movie_912_500_DivX.avi".

    @inproceedings{hoiem-siggraph-05,
    author = "Derek Hoiem and Alexei A. Efros and Martial Hebert",
    title = "Automatic Photo Pop-up",
    booktitle = "ACM SIGGRAPH",
    month = "August",
    year = "2005",
    annote = "AVI Video available at: http://www.cs.cmu.edu/~dhoiem/projects/popup/popup_movie_912_500_DivX.avi",
    abstract ="This paper presents a fully automatic method for creating a 3D model from a single photograph. The model is made up of several texture-mapped planar billboards and has the complexity of a typical children's pop-up book illustration. Our main insight is that instead of attempting to recover precise geometry, we statistically model geometric classes defined by their orientations in the scene. Our algorithm labels regions of the input image into coarse categories: 'ground', 'sky', and 'vertical'. These labels are then used to 'cut and fold' the image into a pop-up model using a set of simple assumptions. Because of the inherent ambiguity of the problem and the statistical nature of the approach, the algorithm is not expected to work on every image. However, it performs surprisingly well for a wide range of scenes taken from a typical person's photo album.",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/hoiem_derek_2005_2/hoiem_derek_2005_2.pdf",
    url="http://www.ri.cmu.edu/pubs/pub_5125.html",
    }

Internal reports
  1. Sanjiv Kumar. Models for Learning Spatial Interactions in Natural Images for Context-Based Classification. Technical report CMU-CS-05-28, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, August 2005. (url) (pdf)
    Annotation: "Classification of various image components (pixels, regions and objects) in meaningful categories is a challenging task due to ambiguities inherent to visual data. Natural images exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different parts of an object are related through geometric constraints. Going beyond these, different regions, e.g., sky and water, or objects, e.g., monitor and keyboard, appear in restricted spatial configurations. Modeling these interactions is crucial to achieve good classification accuracy. In this thesis, we present discriminative field models that capture spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. The discriminative fields offer several advantages over the Markov Random Fields (MRFs) popularly used in computer vision. First, they allow capturing arbitrary dependencies in the observed data by relaxing the restrictive assumption of conditional independence generally made in MRFs for tractability. Second, the interaction in labels in discriminative fields is based on the observed data, instead of being fixed a priori as in MRFs. This is critical to incorporate different types of context in images within a single framework. Finally, the discriminative fields derive their classification power by exploiting probabilistic discriminative models instead of the generative models used in MRFs. Since the graphs induced by the discriminative fields may have arbitrary topology, exact maximum likelihood parameter learning may not be feasible. We present an approach which approximates the gradients of the likelihood with simple piecewise constant functions constructed using inference techniques. To exploit different levels of contextual information in images, a two-layer hierarchical formulation is also described. It encodes both short-range interactions (e.g., pixelwise label smoothing) and long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. The models proposed in this thesis are general enough to be applied to several challenging computer vision tasks such as contextual object detection, semantic scene segmentation, texture recognition, and image denoising seamlessly within a single framework.".

    @techreport{Kumar_2005_5143,
    author = "Sanjiv Kumar",
    title = "Models for Learning Spatial Interactions in Natural Images for Context-Based Classification",
    institution = "Robotics Institute, Carnegie Mellon University",
    month = "August",
    year = "2005",
    number = "CMU-CS-05-28",
    address = "Pittsburgh, PA",
    url = "http://www.ri.cmu.edu/pubs/pub_5143.html",
    pdf = "http://www.ri.cmu.edu/pub_files/pub4/kumar_sanjiv_2005_1/kumar_sanjiv_2005_1.pdf",
    annote = "Classification of various image components (pixels, regions and objects) in meaningful categories is a challenging task due to ambiguities inherent to visual data. Natural images exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different parts of an object are related through geometric constraints. Going beyond these, different regions, e.g., sky and water, or objects, e.g., monitor and keyboard, appear in restricted spatial configurations. Modeling these interactions is crucial to achieve good classification accuracy. In this thesis, we present discriminative field models that capture spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. The discriminative fields offer several advantages over the Markov Random Fields (MRFs) popularly used in computer vision. First, they allow capturing arbitrary dependencies in the observed data by relaxing the restrictive assumption of conditional independence generally made in MRFs for tractability. Second, the interaction in labels in discriminative fields is based on the observed data, instead of being fixed a priori as in MRFs. This is critical to incorporate different types of context in images within a single framework. Finally, the discriminative fields derive their classification power by exploiting probabilistic discriminative models instead of the generative models used in MRFs. Since the graphs induced by the discriminative fields may have arbitrary topology, exact maximum likelihood parameter learning may not be feasible. We present an approach which approximates the gradients of the likelihood with simple piecewise constant functions constructed using inference techniques. To exploit different levels of contextual information in images, a two-layer hierarchical formulation is also described. It encodes both short-range interactions (e.g., pixelwise label smoothing) and long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. The models proposed in this thesis are general enough to be applied to several challenging computer vision tasks such as contextual object detection, semantic scene segmentation, texture recognition, and image denoising seamlessly within a single framework."
    }

  2. Jean-Francois Lalonde, Ranjith Unnikrishnan, Nicolas Vandapel, and Martial Hebert. Scale Selection for Classification of Point-sampled 3-D Surfaces. Technical report CMU-RI-TR-05-01, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2005. (pdf)
    Keywords: scale selection, classification, 3-d data, ladar, data structure.
    Abstract: "Laser-based range sensors are commonly used on-board autonomous mobile robots for obstacle detection and scene understanding. A popular methodology for analyzing point cloud data from these sensors is to train Bayesian classifiers using locally computed features on labeled data and use them to compute class posteriors on-line at testing time. However, data from range sensors present a unique challenge for feature computation in the form of significant variation in spatial density of points, both across the field-of-view as well as within structures of interest. In particular, this poses the problem of choosing a scale for analysis and a support-region size for computing meaningful features reliably. While scale theory has been rigorously developed for 2-D images, no equivalent exists for unorganized 3-D point data. Choosing a satisfactory fixed scale over the entire dataset makes feature extraction sensitive to the presence of different manifolds in the data and varying data density. We adopt an approach inspired by recent developments in computational geometry and investigate the problem of automatic data-driven scale selection to improve point cloud classification. The approach is validated with results using real data from different sensors in various environments (indoor, urban outdoor and natural outdoor) classified into different terrain types (vegetation, solid surface and linear structure)."
    @techreport{lalonde-cmu-tr-05,
    author = "Jean-Francois Lalonde and Ranjith Unnikrishnan and Nicolas Vandapel and Martial Hebert",
    title = "Scale Selection for Classification of Point-sampled 3-D Surfaces",
    institution = "Robotics Institute, Carnegie Mellon University",
    month = "January",
    year = "2005",
    number = "CMU-RI-TR-05-01",
    address = "Pittsburgh, PA",
    keywords="scale selection, classification, 3-d data, ladar, data structure",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/lalonde_jean_francois_2005_1/lalonde_jean_francois_2005_1.pdf",
    abstract="Laser-based range sensors are commonly used on-board autonomous mobile robots for obstacle detection and scene understanding. A popular methodology for analyzing point cloud data from these sensors is to train Bayesian classifiers using locally computed features on labeled data and use them to compute class posteriors on-line at testing time. However, data from range sensors present a unique challenge for feature computation in the form of significant variation in spatial density of points, both across the field-of-view as well as within structures of interest. In particular, this poses the problem of choosing a scale for analysis and a support-region size for computing meaningful features reliably. While scale theory has been rigorously developed for 2-D images, no equivalent exists for unorganized 3-D point data. Choosing a satisfactory fixed scale over the entire dataset makes feature extraction sensitive to the presence of different manifolds in the data and varying data density. We adopt an approach inspired by recent developments in computational geometry and investigate the problem of automatic data-driven scale selection to improve point cloud classification. The approach is validated with results using real data from different sensors in various environments (indoor, urban outdoor and natural outdoor) classified into different terrain types (vegetation, solid surface and linear structure)."
    }

  3. Caroline Pantofaru and Martial Hebert. A Comparison of Image Segmentation Algorithms. Technical report CMU-RI-TR-05-40, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, September 2005. (url) (pdf)
    Abstract: "Unsupervised image segmentation algorithms have matured to the point where they generate reasonable segmentations, and thus can begin to be incorporated into larger systems. A system designer now has an array of available algorithm choices, however, few objective numerical evaluations exist of these segmentation algorithms. As a first step towards filling this gap, this paper presents an evaluation of two popular segmentation algorithms, the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method which combines the other two methods. This quantitative evaluation is made possible by the recently proposed measure of segmentation correctness, the Normalized Probabilistic Rand (NPR) index, which allows a principled comparison between segmentations created by different algorithms, as well as segmentations on different images. For each algorithm, we consider its correctness as measured by the NPR index, as well as its stability with respect to changes in parameter settings and with respect to different images. An algorithm which produces correct segmentation results with a wide array of parameters on any one image, as well as correct segmentation results on multiple images with the same parameters, will be a useful, predictable and easily adjustable preprocessing step in a larger system. Our results are presented on the Berkeley image segmentation database, which contains 300 natural images along with several ground truth hand segmentations for each image. As opposed to previous results presented on this database, the algorithms we compare all use the same image features (position and colour) for segmentation, thereby making their outputs directly comparable. "
    @techreport{pantaforu-tr-05,
    author = "Caroline Pantofaru and Martial Hebert",
    title = "A Comparison of Image Segmentation Algorithms",
    institution = "Robotics Institute, Carnegie Mellon University",
    month = "September",
    year = "2005",
    number = "CMU-RI-TR-05-40",
    address = "Pittsburgh, PA",
    url="http://www.ri.cmu.edu/pubs/pub_5135.html",
    pdf="http://www.ri.cmu.edu/pub_files/pub4/pantofaru_caroline_2005_1/pantofaru_caroline_2005_1.pdf",
    abstract="Unsupervised image segmentation algorithms have matured to the point where they generate reasonable segmentations, and thus can begin to be incorporated into larger systems. A system designer now has an array of available algorithm choices, however, few objective numerical evaluations exist of these segmentation algorithms. As a first step towards filling this gap, this paper presents an evaluation of two popular segmentation algorithms, the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method which combines the other two methods. This quantitative evaluation is made possible by the recently proposed measure of segmentation correctness, the Normalized Probabilistic Rand (NPR) index, which allows a principled comparison between segmentations created by different algorithms, as well as segmentations on different images. For each algorithm, we consider its correctness as measured by the NPR index, as well as its stability with respect to changes in parameter settings and with respect to different images. An algorithm which produces correct segmentation results with a wide array of parameters on any one image, as well as correct segmentation results on multiple images with the same parameters, will be a useful, predictable and easily adjustable preprocessing step in a larger system. Our results are presented on the Berkeley image segmentation database, which contains 300 natural images along with several ground truth hand segmentations for each image. As opposed to previous results presented on this database, the algorithms we compare all use the same image features (position and colour) for segmentation, thereby making their outputs directly comparable. ",
    }

The VMR Lab is part of the Vision and Autonomous Systems Center within the Robotics Institute in the School of Computer Science, Carnegie Mellon University.
This page was generated by a modified version of bibtex2html written by Gregoire Malandain