Learning by Asking Questions


Ishan Misra     Ross Girshick     Rob Fergus     Martial Hebert     Abhinav Gupta     Laurens van der Maaten    

Moving away from Passive Learning

While recognition systems have shown great promise in the passively supervised setting, it is unclear how to extend them to interactive agents that gather their own supervision. In this paper, we propose an interactive Visual Question Answering (VQA) setting in which agents must ask questions about images in order to learn. As an added bonus, our test-time setup is exactly that of standard VQA, which means we can use well-understood metrics for evaluation.

Our model consists of three modules: (1) a question proposal module that generates diverse and relevant questions; (2) a question answering module that tries to answer these questions and judge their difficulty; and (3) a question selection module that selects the most useful questions to ask the oracle.
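The interaction between these three modules can be sketched as a simple propose → answer → select loop. The sketch below is a minimal illustration only: the function bodies (`propose_questions`, `answer_and_score`, the uncertainty-based selection rule, and the `oracle` callback) are hypothetical placeholders, not the paper's actual implementations.

```python
def propose_questions(image, num_proposals=8):
    # Hypothetical stand-in for the question proposal module:
    # return a diverse set of candidate questions about the image.
    return [f"question_{i} about {image}" for i in range(num_proposals)]

def answer_and_score(question):
    # Hypothetical stand-in for the answering module: return the model's
    # answer together with a difficulty score (higher = harder question).
    return "model_answer", (hash(question) % 100) / 100.0

def select_question(candidates, scores):
    # Pick the candidate the model finds hardest -- a generic
    # informativeness heuristic, assumed here for illustration.
    return max(zip(candidates, scores), key=lambda qs: qs[1])[0]

def lba_step(image, oracle):
    # One interactive learning step: propose, score, select, then
    # query the oracle for an answer to train on.
    candidates = propose_questions(image)
    scores = [answer_and_score(q)[1] for q in candidates]
    question = select_question(candidates, scores)
    answer = oracle(image, question)
    return question, answer  # (q, a) pair used as a training example

# Usage: one interactive step with a dummy oracle.
q, a = lba_step("img_001", oracle=lambda img, q: "oracle_answer")
```

The selected (question, answer) pairs then serve as the supervision for updating the answering module, closing the loop.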

Some Results

Example questions asked by our model: as training progresses, it asks increasingly complex questions.
Our model is more sample-efficient than training on the standard CLEVR data, and it generalizes better to other question distributions (CLEVR-Humans).


Learning by Asking Questions.
Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten.
[pdf] [bib]


The authors would like to thank Arthur Szlam, Jason Weston, Saloni Potdar, and Abhinav Shrivastava for helpful discussions and feedback on the manuscript, and Soumith Chintala and Adam Paszke for their help with PyTorch.