Two-Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions
1 CIS, UPenn
2 Xi'an Jiaotong University
Abstract
We aim to segment and track objects that occlude one another in crowded scenes.
We propose a tracking framework that mediates grouping cues from two tracking granularities: coarse-grained detection tracklets and fine-grained point trajectories.
Each granularity contributes its own grouping cues: trajectories with similar long-term motion and disparity attract each other, while detections that overlap in time repel each other.
Tracking is formulated as selection-clustering in the joint detection and trajectory space.
Affinities of trajectories and detections can be contradictory, for example under false-alarm detections or accidental motion similarity between trajectories.
We resolve such contradictions in a steering-clustering framework
in which confident detections alter trajectory affinities by inducing repulsions between the trajectories claimed by mutually repulsive detection tracklets. Two-granularity tracking offers a unified representation for object segmentation and tracking, independent of which objects are tracked, how occluded they are, whether the input is monocular or binocular, and whether the camera is moving or static.
Detection tracklets and point trajectories are complementary cues for tracking and segmentation.
Two-Granularity joint graph
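The two kinds of grouping cues can be sketched as a toy joint-graph construction. The Gaussian kernel on long-term velocities, the fixed repulsion weight, and all numbers below are illustrative assumptions, not the paper's exact affinities:

```python
import numpy as np

# Toy joint graph: trajectories with similar long-term motion attract;
# detection tracklets that overlap in time repel. All values are made up.

# long-term mean velocity (vx, vy) of four point trajectories
velocities = np.array([[1.0, 0.0],    # trajectory 0
                       [1.1, 0.1],    # trajectory 1: similar motion to 0
                       [-1.0, 0.2],   # trajectory 2
                       [-0.9, 0.1]])  # trajectory 3

sigma = 0.5
diff = velocities[:, None, :] - velocities[None, :, :]
A_traj = np.exp(-(diff ** 2).sum(axis=2) / (2 * sigma ** 2))  # attractions in (0, 1]

# temporal spans (first frame, last frame) of two detection tracklets
spans = [(0, 10), (5, 15)]
A_det = np.zeros((2, 2))
if spans[0][0] <= spans[1][1] and spans[1][0] <= spans[0][1]:
    A_det[0, 1] = A_det[1, 0] = -1.0   # temporal overlap -> repulsion
```

Trajectories 0 and 1 end up strongly attracted while 0 and 2 barely interact, and the two time-overlapping tracklets receive a repulsive edge, since two detections visible at the same time cannot be the same object.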
The resulting joint graph suffers from: 1) false-alarm detection tracklets that erroneously claim trajectories, and 2) contradictions between trajectory affinities and detection tracklet repulsions under accidental motion similarity, which confuse the co-clustering.
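Such a contradiction, and how induced repulsions resolve it, can be sketched in a few lines. The toy numbers, the claim sets, and the leading-eigenvector split are illustrative assumptions, not the paper's exact steering cut:

```python
import numpy as np

# Trajectory affinities: trajectory 0 is accidentally similar to trajectory 2,
# although they belong to different objects.
A = np.array([[1.0, 0.9, 0.8, 0.1],
              [0.9, 1.0, 0.2, 0.1],
              [0.8, 0.2, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])

claims = {0: [0, 1], 1: [2, 3]}   # detection tracklet -> claimed trajectories
repulsive_pairs = [(0, 1)]        # tracklets overlapping in time repel

# Steering: repulsive tracklets induce repulsion between claimed trajectories.
A_steered = A.copy()
for d1, d2 in repulsive_pairs:
    for i in claims[d1]:
        for j in claims[d2]:
            A_steered[i, j] = A_steered[j, i] = -1.0

def split(W):
    """2-way cluster by the sign of the leading eigenvector of W."""
    _, vecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    return (vecs[:, -1] > 0).astype(int)

labels = split(A_steered)   # e.g. [1 1 0 0]: the two objects separate
```

Without steering, the all-positive affinities in `A` pull everything into a single cluster; the induced repulsions in `A_steered` make the split recover the two objects despite the accidental similarity between trajectories 0 and 2.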
Steering Cut
Clustering in the steered graph provides the space-time object clusters.

Results - Code
The latest version of the source code can be downloaded here.
Please report comments/bugs to katef at seas.upenn.edu .
Dataset

The UrbanStreet dataset used in the paper can be downloaded here [188M]. It contains 18 stereo sequences of pedestrians captured from a stereo rig mounted on a car driving through the streets of Philadelphia during rush hour. The image resolution is 516x1024. Ground truth is provided as pedestrian segmentation masks for the left view. All pedestrians taller than 100 pixels are labelled every 4 frames (0.6 seconds) in each sequence. The video below shows ground-truth label samples.
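A hedged sketch of how such ground-truth masks could be scored against predictions with per-pedestrian intersection-over-union. The label-image format (one integer label per pixel, 0 = background) is an assumption for illustration, not the dataset's documented layout:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union of two boolean masks of equal shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

# toy 516x1024 label images; label 1 marks one pedestrian
gt = np.zeros((516, 1024), dtype=np.uint8)
gt[100:300, 200:260] = 1
pred = np.zeros_like(gt)
pred[120:300, 200:260] = 1   # slightly shifted prediction

print(mask_iou(pred == 1, gt == 1))  # -> 0.9
```

Comparing boolean masks per label id (`pred == 1`, `gt == 1`) extends directly to many pedestrians per frame.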
Paper

Two-Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions. Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang, and Jianbo Shi. In ECCV 2012. paper | poster
Last update: December 2012.