===========================================================================

Semi-Supervised Training of Models for Appearance-Based Statistical Object Detection Methods

Charles Joseph Rosenberg

CMU-CS-04-150
May 2004

Ph.D. Thesis

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee
Martial Hebert, Co-Chair
Sebastian Thrun, Co-Chair
Henry Schneiderman
Avrim Blum
Tom Minka, Microsoft Research

Copyright 2004 Charles Rosenberg

This research was supported in part by a fellowship from the Eastman Kodak Company. The views and conclusions in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of Carnegie Mellon University or the Eastman Kodak Company.

===========================================================================

Abstract

Appearance-based object detection systems using statistical models have proven quite successful. They can reliably detect textured, rigid objects in a variety of poses, lighting conditions and scales. However, the construction of these systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Typically, this requires indicating which regions of the image correspond to the object to be detected and which belong to background clutter, as well as marking key landmark locations on the object. The goal of this work is to pursue and evaluate approaches that reduce the number of fully labeled examples needed by training these models in a semi-supervised manner. To this end, we develop approaches based on Expectation-Maximization and self-training that utilize a small number of fully labeled training examples in combination with a set of weakly labeled examples.
This is advantageous in that weakly labeled data are inherently less costly to generate, since the label information is specified in an uncertain or incomplete fashion. For example, a weakly labeled image might be labeled as containing the training object, with the object location and scale left unspecified. We analyze the performance of the techniques we develop through a comprehensive empirical investigation. We find that supplementing a small fully labeled training set with weakly labeled data in the training process reliably improves detector performance for a variety of detection approaches. The outcome is the identification of successful approaches and of the key issues that are central to achieving good performance in the semi-supervised training of object detection systems.

===========================================================================