Robotics Thesis Proposal
- Newell-Simon Hall
- ISHAN MISRA
- Ph.D. Student
- Robotics Institute
Visual Learning without Exhaustive Supervision
Machine learning models have led to remarkable progress in visual recognition. A key driving factor for this progress is the abundance of labeled data. Over the years, researchers have spent considerable effort curating visual data and carefully labeling it. However, moving forward, it seems impossible to annotate the vast amounts of visual data with everything we wish to learn from it. This reliance on exhaustive labeling is a key limitation in the rapid development and deployment of computer vision systems in the real world. Our current systems also scale poorly to large numbers of concepts and are passively spoon-fed supervision and data.
In this thesis, we explore methods that enable visual learning without exhaustive supervision. Our core idea is to build the natural regularity and repetition of the visual world into our learning algorithms as an inductive bias. We observe recurring patterns in the visual world - a person always lifts their foot before taking a step, dogs are more similar to other furry creatures than to furniture, etc. This natural regularity in visual data also imposes regularities on the semantic tasks and models that operate on it - a dog classifier must be more similar to classifiers of furry animals than to furniture classifiers. We exploit this abundant natural structure, or 'supervision', in the visual world in the form of self-supervision for our models, modeling relationships between tasks and labels, and modeling similarities in the space of classifiers. We show the effectiveness of these methods on both static images and videos across varied tasks such as image classification, object detection, action recognition, and human pose estimation. However, all these methods are still passively fed supervision and thus lack agency: the ability to decide what information they need and how to get it. To this end, we propose interactive learners that ask for supervision when needed and can also decide which samples they want to learn from.
Martial Hebert (Co-chair)
Abhinav Gupta (Co-chair)
Alexei A. Efros (University of California, Berkeley)
Andrew Zisserman (University of Oxford)