Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos


We present a semi-supervised approach that localizes multiple unknown objects in videos. We address the problem of training object detectors from sparsely labeled video data. Existing approaches either do not consider multiple objects, or make strong assumptions about dominant motion of the objects present. In contrast, our approach works in a generic setting of sparse labels, and lack of explicit negative data. We show a way to constrain semi-supervised learning by combining multiple weak cues in videos and exploiting decorrelated errors in a multi-feature modeling of data. Our experiments demonstrate the effectiveness of our approach by evaluating our automatically labeled data on a variety of metrics including quality, coverage (recall), diversity, and relevance to training a object detector.

People

Ishan Misra, Abhinav Shrivastava, Martial Hebert

Paper

Ishan Misra, Abhinav Shrivastava and Martial Hebert. Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos, Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), 2015. [PDF] [BibTeX]

Acknowledgements

This work was supported in part by NSF Grant IIS1065336, the Siebel Scholarship (for IM), Microsoft Research PhD Fellowship (for AS) and a Google Faculty Research Award.

Copyright notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder, except when identified by Creative Commons License 2.0, in which case the license applies to both the original and modified versions of the images.