Video Dataset for Occlusion/Object Boundary Detection

This dataset of short video clips was developed and used for the following publications, as part of our continued research on detecting boundaries for segmentation and recognition. Please cite one or more of them (at least the IJCV article) if you use this dataset.

Combining Local Appearance and Motion Cues for Occlusion Boundary Detection
A. Stein and M. Hebert, British Machine Vision Conference (BMVC), 2007.

Learning to Find Object Boundaries Using Motion Cues
A. Stein, D. Hoiem, and M. Hebert, IEEE International Conference on Computer Vision (ICCV), 2007.

Occlusion Boundaries: Low-Level Detection to High-Level Reasoning
A. Stein, Doctoral Dissertation, Technical Report CMU-RI-TR-08-06, Robotics Institute, Carnegie Mellon University, 2008.

Towards Unsupervised Whole-Object Segmentation: Combining Automated Matting with Boundary Detection
A. Stein, T. Stepleton, and M. Hebert, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning
A. Stein and M. Hebert, International Journal of Computer Vision (IJCV), Vol. 82(3), May 2009.

Example clips and labels from this dataset:

Download the dataset.
Note: This gzipped tar file is nearly 450MB!

The dataset contains three directories:


Contains subdirectories for each clip. Each clip subdirectory contains the individual frames in TIFF format, a ground-truth binary occlusion boundary map for the reference frame (used for the BMVC 2007 paper, also in TIFF format), and for many clips, an MPEG movie of the sequence going forward and then backward (allowing a "ping-pong" style viewing if the movie is played in a continuous loop).

There is also a subdirectory for each clip called 'stabilized' which contains stabilized versions of the frames, where each frame is registered to the middle "reference" frame by a simple global translation. The stabilized sequences have been cropped slightly to exclude border effects. The cropping rectangle is stored in the simple text file "crop-rect" containing the upper-left and lower-right coordinates:

[x_left y_top x_right y_bottom]

The same cropping should therefore be applied to any ground truth before registering it to the stabilized data.
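
As a rough illustration of applying the crop rectangle to a ground-truth image, here is a minimal Python sketch. The function names are hypothetical, and several details are assumptions not stated above: that "crop-rect" holds four integers (with or without the brackets), that the corners are inclusive, and that they may be 1-based (MATLAB style). Check one clip by eye before relying on it.

```python
import re

def parse_crop_rect(text):
    """Parse '[x_left y_top x_right y_bottom]' into a tuple of four ints.

    ASSUMPTION: the file holds exactly four integer coordinates.
    """
    numbers = re.findall(r"-?\d+", text)
    if len(numbers) != 4:
        raise ValueError(f"expected 4 coordinates, found {len(numbers)}")
    x_left, y_top, x_right, y_bottom = (int(n) for n in numbers)
    return x_left, y_top, x_right, y_bottom

def crop_rows(image, rect, one_based=True):
    """Crop a 2-D image given as a sequence of rows.

    ASSUMPTION: both corners are inclusive; `one_based` says whether the
    coordinates start at 1 (MATLAB style) or at 0.
    """
    x_left, y_top, x_right, y_bottom = rect
    offset = 1 if one_based else 0
    rows = image[y_top - offset : y_bottom - offset + 1]
    return [list(row[x_left - offset : x_right - offset + 1]) for row in rows]

# Tiny synthetic "image" whose pixel value encodes (row, column).
image = [[r * 10 + c for c in range(6)] for r in range(5)]
rect = parse_crop_rect("[2 1 4 3]")
cropped = crop_rows(image, rect, one_based=True)
```

The same rectangle would be applied to the ground-truth boundary map and to each object mask so that they line up with the stabilized frames.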


Contains a Matlab MAT file for each clip. This file contains a single variable called 'objects' which is a "stack" of UINT8 images, each containing a binary mask for a closed object or piece of an object, ordered from back to front. This provides the closed boundaries and foreground/background layering which were used in the ICCV 2007 paper.


For convenience, contains the "reference" or middle frame of each sequence. Note that the middle frame is determined by N, the number of frames in the sequence. For example, a 20-frame sequence's reference frame is img_010.tif, i.e. the 11th frame (note that numbering starts at 000).
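
The worked example can be reproduced with a small helper. This is a sketch under two assumptions: that the zero-based index of the reference frame is floor(N/2), which matches the 20-frame example but is not confirmed for other sequence lengths, and that frames follow the img_NNN.tif naming seen in that example. The function name is hypothetical.

```python
def reference_frame_name(num_frames):
    """Return the assumed file name of the reference frame.

    ASSUMPTION: zero-based index floor(N / 2), file names 'img_NNN.tif'.
    """
    index = num_frames // 2
    return f"img_{index:03d}.tif"

name_for_20_frames = reference_frame_name(20)
```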

Our Results for Comparison [New!]

For use in comparing to our results in your own publications, there is now an additional download available, which contains the following for each of the 30 clips in the dataset. All the results and ground truth images described below (provided as PNG files, named as indicated) and a .mat file containing the raw data for each are in this results ZIP file (5 MB).

Occlusion Boundary Probability

A pixel-wise occlusion boundary "probability" map, which transfers the probabilities computed on our extracted boundary fragments (as shown and reported in our publications) onto a more typical pixel map.


Fragment Ground Truth

The corresponding ground truth map, where each fragment's pixels have been assigned a label based on the "ground_truth_objects" data (as provided in the original dataset download and described above). In other words, this is the "optimal" labeling of the fragments that were actually extracted. (This is the ground truth labeling we used to create our precision and recall graphs.) Since the fragments don't necessarily correspond exactly to the boundaries of the human-labeled object regions, we've done our best to map from one to the other (as discussed in our papers). This may or may not be the "right" sort of labeling for your evaluation: e.g. there can be situations where this may not completely capture false negatives, when the original fragments totally miss a labeled boundary.


Object Ground Truth

The ground truth boundary maps extracted directly from the labeled "object maps", i.e. just the edges of those labeled object regions.
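
For readers who want to derive comparable edge maps from their own object masks, one simple way to take "just the edges" of a labeled region is to keep every mask pixel that has a 4-connected neighbor outside the mask. The sketch below does that. It is not necessarily the procedure used to produce the provided maps; the function name is hypothetical, and ignoring neighbors that fall outside the image (so the image border alone never creates an edge) is an assumption.

```python
def mask_boundary(mask):
    """Mark mask pixels that have a 4-connected neighbor outside the mask.

    ASSUMPTION: neighbors beyond the image border are ignored, so a region
    cut off by the image edge gets no boundary along that edge.
    """
    height, width = len(mask), len(mask[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width and not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges

# A 3x3 square object in the middle of a 5x5 image.
square = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
          for y in range(5)]
square_edges = mask_boundary(square)
```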


Summary Figure

A summary figure which shows all of the above, plus the original fragment-based results plotted on the original images (note that the corresponding clip name is shown at the bottom of each figure). An example is provided below.


Please contact Andrew Stein (steinX[AT]XriX[DOT]XcmuX[DOT]Xedu) with questions regarding this dataset.

Last updated: July 16, 2010