Robotics Thesis Defense

  • Newell-Simon Hall
  • Mauldin Auditorium 1305
  • Ph.D. Student
  • Robotics Institute
  • Carnegie Mellon University
Thesis Orals

Visual Learning with Minimal Human Supervision

Machine learning models have led to remarkable progress in visual recognition. A key factor driving this progress is the abundance of labeled data. Unfortunately, this reliance on large amounts of labeled data also limits how quickly vision systems can be developed and deployed. Such recognition systems perform poorly on concepts with limited data. Moreover, because these models are passive and are simply “fed” supervision, they cannot actively seek supervision to improve their own performance, which hurts their adaptability and generalization to new environments.
To tackle these challenges, this thesis explores methods that enable visual learning with minimal supervision. The core idea is to build the natural regularity and repetition of the visual world into our learning algorithms as an inductive bias. This regularity can be exploited directly, through similarities in the visual data, or indirectly, through the structure of the semantic tasks and models that operate on this data. We use this abundant natural structure, or “supervision”, in three forms: temporal structure in videos, relationships between tasks and labels, and similarities in the space of classifiers. We show the effectiveness of these methods on both static images and videos across tasks such as image classification, object detection, action recognition, and human pose estimation. However, all these methods are still passively fed supervision and thus cannot decide what information they need or how to get it. To this end, we propose interactive learners that ask for supervision when needed and can also decide which samples they want to learn from.

Thesis Committee:
Martial Hebert (Co-chair)
Abhinav Gupta (Co-chair)
Deva Ramanan
Alexei A. Efros (University of California, Berkeley)
Andrew Zisserman (University of Oxford)
