Discriminative Distance Measures for Object Detection


Abstract

The reliable detection of an object of interest in an input image with arbitrary background clutter and occlusion has to a large extent remained an elusive goal in computer vision. Traditional model-based approaches are inappropriate for a multi-class object detection task primarily due to difficulties in modeling arbitrary object classes. Instead, we develop a detection framework whose core component is a nearest neighbor search over object parts. The performance of the overall system is critically dependent on the distance measure used in the nearest neighbor search.

A distance measure that minimizes the misclassification risk for the 1-nearest neighbor search can be shown to be the probability that a pair of input measurements belong to different classes. This pair-wise probability is not in general a metric distance measure. Furthermore, it can outperform any metric distance, approaching even the Bayes optimal performance.
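To make the pair-wise probability concrete, here is a minimal sketch, not the thesis implementation. It assumes that class posteriors are known for each measurement and that the two labels are independent given the measurements, so the probability that two measurements belong to different classes is one minus the dot product of their posteriors. The function names and the toy posteriors are hypothetical.

```python
import numpy as np

def pairwise_different_class_prob(post_x, post_y):
    """Probability that two measurements belong to different classes.

    Assumption (not from the source): the labels are independent given the
    measurements, so P(same class) = sum_c p(c|x) * p(c|y).
    """
    post_x = np.asarray(post_x, dtype=float)
    post_y = np.asarray(post_y, dtype=float)
    return 1.0 - float(np.dot(post_x, post_y))

def nearest_neighbor(query_post, train_posts):
    """Index of the training item with the smallest pair-wise probability."""
    dists = [pairwise_different_class_prob(query_post, p) for p in train_posts]
    return int(np.argmin(dists))

# Hypothetical two-class posteriors for a query and three training items.
query = [0.9, 0.1]
train = [[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]]

best = nearest_neighbor(query, train)

# The "distance" from a measurement to itself is generally nonzero,
# which is one way to see that this measure is not a metric.
self_dist = pairwise_different_class_prob(query, query)  # about 0.18
```

In this toy setup the self-distance only vanishes when the posterior puts all its mass on a single class, which illustrates why the measure is not in general a metric.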

In practice, we seek a model for the optimal distance measure that combines the discriminative powers of more elementary distance measures associated with a collection of simple feature spaces that are easy and efficient to implement; in our work, we use histograms of various feature types such as color, texture and local shape properties. We use a linear model for combining such elementary distance measures, a choice supported by observations of actual data for a representative discrimination task. To perform efficient nearest neighbor search over large training sets, the linear model was extended to discretized distance measures that combine distance measures associated with discriminators organized in a tree-like structure. The discrete model was combined with the continuous model to yield a hierarchical distance model that is both fast and accurate.
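The linear combination of elementary distances can be sketched as follows. This is an illustration under stated assumptions rather than the model used in the thesis: the chi-square histogram distance, the feature names, the weights and the bias are all hypothetical choices, and in practice the weights would be learned from training data.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (an assumed elementary distance)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def combined_distance(feats_a, feats_b, weights, bias=0.0):
    """Linear model: bias plus a weighted sum of per-feature distances."""
    return bias + sum(
        w * chi_square(feats_a[name], feats_b[name])
        for name, w in weights.items()
    )

# Hypothetical normalized histograms for two object parts.
part_a = {"color": [0.5, 0.5], "texture": [1.0, 0.0], "shape": [0.2, 0.8]}
part_b = {"color": [0.5, 0.5], "texture": [0.0, 1.0], "shape": [0.2, 0.8]}

# Hypothetical weights; a learned model would estimate these from data.
weights = {"color": 0.5, "texture": 0.3, "shape": 0.2}

d_ab = combined_distance(part_a, part_b, weights)  # only texture differs
d_aa = combined_distance(part_a, part_a, weights)
```

Because each elementary distance is computed independently per feature space, a feature type can be added or dropped by changing only the histogram dictionaries and the weight table.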

Finally, the nearest neighbor search over object parts was integrated into a whole-object detection system and evaluated on both an indoor detection task and a face recognition task, yielding promising results.


Overview of the Detection Scheme

(1) Detect the top few nearest neighbor parts from the training set using an optimal distance measure combining measurements in color, shape and texture.
(2) For each part detected, hypothesize the presence of a whole object in the input image.
(3) Verify the hypothesis by combining the NN scores of all parts predicted by the whole object hypothesis in the input image, and then thresholding. Not shown is the subsequent non-maximal suppression.
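
The verification step and the non-maximal suppression that follows it can be sketched as below. This is a hedged illustration only: it assumes that higher NN scores are better, that part scores are combined by summation, and that suppression is a greedy pass over hypothesis centers with a minimum separation. None of these details are specified above, and the function and parameter names are hypothetical.

```python
def verify_and_suppress(hypotheses, threshold, min_separation):
    """Threshold whole-object hypotheses, then apply greedy non-maximal suppression.

    hypotheses: list of (x, y, part_scores), where part_scores holds the NN
    scores of the parts predicted by that hypothesis (assumed: higher is better).
    """
    # Combine the part scores of each hypothesis (summation is an assumption).
    scored = [(x, y, sum(scores)) for x, y, scores in hypotheses]

    # Keep only hypotheses whose combined score passes the threshold.
    kept = [h for h in scored if h[2] >= threshold]

    # Greedy suppression: visit the strongest first and drop any hypothesis
    # that lies too close to one that has already been accepted.
    kept.sort(key=lambda h: h[2], reverse=True)
    accepted = []
    for x, y, score in kept:
        if all((x - ax) ** 2 + (y - ay) ** 2 >= min_separation ** 2
               for ax, ay, _ in accepted):
            accepted.append((x, y, score))
    return accepted

# Hypothetical hypotheses: (center x, center y, NN scores of predicted parts).
hypotheses = [
    (10, 10, [0.9, 0.8]),
    (12, 11, [0.7, 0.6]),   # close to the first, weaker: suppressed
    (50, 50, [0.2, 0.1]),   # fails the threshold
    (80, 20, [0.6, 0.7]),
]

detections = verify_and_suppress(hypotheses, threshold=1.0, min_separation=5)
centers = [(x, y) for x, y, _ in detections]
```

With these toy inputs, the two accepted centers are the strongest hypothesis and the one far enough away from it.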

Some results


Slides for my Ph.D. Oral Defense
Ph.D. Document