Video Verification of Identity (VIVID)

Principal Investigators:
Martial Hebert (CMU, Robotics)
Bob Collins (CMU, Robotics)
Daniel Morris (Northrop Grumman)

In this project, we address the problem of tracking objects in video sequences and of recognizing the tracked objects in subsequent sequences. Recognition is used to verify the "identity" of an object that was tracked in an initial sequence and then recovered in another sequence. We are also interested in detecting moving objects (people, vehicles) in wide-field imagery.

Technical Approach

The approach to tracking combines mean-shift tracking with feature selection. Mean-shift tracking locates the most likely position of the tracked object in each frame, given a likelihood image whose peak marks that position. The likelihood image is generated by computing, at every pixel, the probability that the pixel belongs to the object, based on the distribution of feature values learned from the previous frames. The features may be combinations of color channels or other combinations of pixel values. A key issue in this approach is selecting the "right" features for tracking, and an approach for feature selection is being developed. In this approach, a collection of candidate features is evaluated for each tracked object, and the feature with the highest score is selected for tracking in the next frame. The score is a statistical measure of how well the feature separates the object from the background locally. Current research involves refining the feature selection algorithm, including a measure of object shape in the tracker, and developing robust methods for dealing with occlusions and sudden changes in appearance.
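
The tracking loop described above can be illustrated with a minimal sketch. It is not the project's implementation: the separability score (one minus the Bhattacharyya coefficient of the object and background histograms), the flat square mean-shift window, the equal-prior likelihood, the 8-bin histograms, the synthetic frames, and all function names are assumptions chosen for illustration.

```python
import math

def bin_index(value, bins=8):
    """Histogram bin for a feature value assumed to lie in [0, 1]."""
    return max(0, min(int(value * bins), bins - 1))

def histogram(values, bins=8):
    """Normalized histogram of feature values."""
    counts = [0.0] * bins
    for v in values:
        counts[bin_index(v, bins)] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def separability(p_obj, p_bg):
    """Assumed score: 1 - Bhattacharyya coefficient (0 = identical, 1 = disjoint)."""
    return 1.0 - sum(math.sqrt(p * q) for p, q in zip(p_obj, p_bg))

def split_samples(frame, feature, box, margin=4):
    """Feature values inside the object box and in a local ring around it."""
    top, left, size = box
    obj, bg = [], []
    for y in range(max(0, top - margin), min(len(frame), top + size + margin)):
        for x in range(max(0, left - margin), min(len(frame[0]), left + size + margin)):
            inside = top <= y < top + size and left <= x < left + size
            (obj if inside else bg).append(feature(frame[y][x]))
    return obj, bg

def select_feature(frame, features, box):
    """Return (name, p_obj, p_bg) for the candidate feature with the highest score."""
    best = None
    for name, feature in features.items():
        obj, bg = split_samples(frame, feature, box)
        p_obj, p_bg = histogram(obj), histogram(bg)
        score = separability(p_obj, p_bg)
        if best is None or score > best[0]:
            best = (score, name, p_obj, p_bg)
    return best[1], best[2], best[3]

def likelihood_image(frame, feature, p_obj, p_bg):
    """Per-pixel probability that the pixel is 'object', assuming equal priors."""
    image = []
    for row in frame:
        out = []
        for pixel in row:
            b = bin_index(feature(pixel))
            denom = p_obj[b] + p_bg[b]
            out.append(p_obj[b] / denom if denom > 0 else 0.0)
        image.append(out)
    return image

def mean_shift(like, cy, cx, half=4, max_iter=20, tol=0.01):
    """Move the window center to the likelihood-weighted centroid until it is stable."""
    for _ in range(max_iter):
        iy, ix = int(round(cy)), int(round(cx))
        sw = sy = sx = 0.0
        for y in range(max(0, iy - half), min(len(like), iy + half + 1)):
            for x in range(max(0, ix - half), min(len(like[0]), ix + half + 1)):
                sw += like[y][x]
                sy += like[y][x] * y
                sx += like[y][x] * x
        if sw == 0.0:
            break  # no object evidence inside the window
        ny, nx = sy / sw, sx / sw
        moved = abs(ny - cy) + abs(nx - cx)
        cy, cx = ny, nx
        if moved < tol:
            break
    return cy, cx

def make_frame(top, left, size=24, obj=5):
    """Synthetic RGB frame: the object differs from the background only in green."""
    return [[(((x * 7 + y * 3) % 10) / 10.0,
              0.9 if (top <= y < top + obj and left <= x < left + obj) else 0.2,
              0.5) for x in range(size)] for y in range(size)]

# Hypothetical candidate features: single channels and one channel combination.
features = {
    "R": lambda p: p[0],
    "G": lambda p: p[1],
    "B": lambda p: p[2],
    "(G-R+1)/2": lambda p: (p[1] - p[0] + 1.0) / 2.0,
}
previous = make_frame(10, 10)  # object centered at row 12, column 12
current = make_frame(12, 14)   # object has moved to row 14, column 16
name, p_obj, p_bg = select_feature(previous, features, (10, 10, 5))
like = likelihood_image(current, features[name], p_obj, p_bg)
position = mean_shift(like, 12.0, 12.0)
print(name, position)
```

On this synthetic data the green channel is the only feature that cleanly separates object from background, so it is selected, and the mean-shift iteration started at the old position settles on the new object center. A practical tracker would typically use a smooth kernel rather than a flat window and would re-run the feature selection on every frame.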

The approach to identification involves extracting local regions from the image of the tracked object in each frame of the "training" sequence. The regions are extracted using detectors that are robust to changes in viewpoint. Each region is represented by a collection of features computed from the distribution of color, intensity, and gradient within the region. A model is constructed from this set of regions to enable fast indexing into the feature set. Given a "test" tracking sequence, the features computed at candidate object locations in the new frames are matched against the model built from the training sequence. An overall score is computed in order to select the best matching object. Challenges include large variations in pose between training and test sequences, little variation within the training sequence, low density of pixels on the object, and variations in scale and lighting between training and test sequences. Current research involves adapting general work in object recognition to the problem of recognition in video.
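
The build-index, match, and score steps can be sketched as follows. This is only an illustration under stated assumptions: region descriptors are taken to be short vectors of values in [0, 1], the "fast indexing" is approximated by coarse quantization into hash keys, and the overall score is the fraction of a candidate's descriptors that land in an occupied bucket. The function names, the quantization step, and the sample descriptors are all hypothetical.

```python
def quantize(descriptor, step=0.25):
    """Map a descriptor (values assumed in [0, 1]) to a coarse, hashable key."""
    return tuple(int(v / step) for v in descriptor)

def build_model(training_descriptors, step=0.25):
    """Index the training regions: quantized key -> number of regions in that bucket."""
    model = {}
    for d in training_descriptors:
        key = quantize(d, step)
        model[key] = model.get(key, 0) + 1
    return model

def match_score(model, descriptors, step=0.25):
    """Fraction of a candidate's descriptors that fall in a bucket seen in training."""
    if not descriptors:
        return 0.0
    hits = sum(1 for d in descriptors if quantize(d, step) in model)
    return hits / len(descriptors)

# Made-up descriptors for the regions of the tracked object in the training sequence.
training = [(0.10, 0.80, 0.30), (0.60, 0.20, 0.90), (0.40, 0.40, 0.10)]

# Made-up descriptors computed at two candidate locations in the test sequence.
candidates = {
    "candidate_a": [(0.12, 0.78, 0.33), (0.62, 0.22, 0.88), (0.90, 0.90, 0.90)],
    "candidate_b": [(0.90, 0.10, 0.10), (0.30, 0.90, 0.60), (0.62, 0.21, 0.90)],
}

model = build_model(training)
scores = {name: match_score(model, descs) for name, descs in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Hashing gives constant-time lookup per descriptor, but descriptors near a bucket boundary can miss their match; a nearest-neighbor search over the descriptors would trade speed for tolerance to such small changes.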

Additionally, we are developing techniques for detecting moving objects in wide-frame imagery. The challenges here are detecting objects with very low pixel coverage while maintaining a low false-positive rate, and filtering out motion artifacts such as parallax effects.
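
The detection techniques themselves are not described here, so the following is only a generic baseline that illustrates the trade-off between sensitivity to tiny objects and false positives. It assumes a static camera, a per-pixel median background, and a persistence test that keeps a detection only if another detection lies nearby in an adjacent frame. It does not address parallax, and all names, thresholds, and data are hypothetical.

```python
from statistics import median

def background_model(frames):
    """Per-pixel median over all frames (assumes a static camera)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]

def detect(frame, background, threshold):
    """Pixels whose intensity differs from the background by more than threshold."""
    return {(y, x)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if abs(value - background[y][x]) > threshold}

def near(point, others, max_step):
    """True if any point in others is within max_step pixels in both axes."""
    return any(abs(point[0] - q[0]) <= max_step and abs(point[1] - q[1]) <= max_step
               for q in others)

def filter_persistent(detections, max_step=1):
    """Keep a detection only if an adjacent frame has a detection close to it."""
    kept = []
    for t, current in enumerate(detections):
        before = detections[t - 1] if t > 0 else set()
        after = detections[t + 1] if t + 1 < len(detections) else set()
        kept.append({p for p in current
                     if near(p, before, max_step) or near(p, after, max_step)})
    return kept

def make_frames(count=5, height=6, width=8):
    """Synthetic sequence: a one-pixel object moving right, plus one noise flicker."""
    frames = []
    for t in range(count):
        frame = [[100] * width for _ in range(height)]
        frame[2][t + 1] = 160      # one-pixel object moving along row 2
        if t == 2:
            frame[5][7] = 150      # single-frame flicker, a would-be false positive
        frames.append(frame)
    return frames

frames = make_frames()
background = background_model(frames)
raw = [detect(f, background, threshold=30) for f in frames]
kept = filter_persistent(raw)
print(raw[2], kept[2])
```

In this example the raw detections for the third frame contain both the object pixel and the flicker, while the persistence filter keeps only the object pixel because the flicker has no neighbor in the adjacent frames.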