Aayush Bansal

I received a PhD in Robotics for my work on the unsupervised learning of the 4D audio-visual world from sparse, unconstrained real-world samples, under the supervision of Prof. Deva Ramanan and Prof. Yaser Sheikh. During my graduate studies, I was named a Presidential Fellow at CMU with an Uber Presidential Fellowship (2016-17), and received a Qualcomm Fellowship (2017-18) and a Snap Fellowship (2019). I have also been fortunate to collaborate with many production houses over the last few years.

Email  /  CV  /  Google Scholar  /  GitHub  /  Thesis Talk

There is poetry in our existence. I want to dance to every rhythm and syllable of this poem.


Imagine you want to take your partner down memory lane and show them the place where you grew up -- the lake, the greenery, and the mountains -- and let them hear the melodies of chirping birds, dancing trees, and rustling leaves. What would the place look like on a sunny day? How mesmerizing would it become with clouds and rain? How important were the mountains and forests to the beauty of the area? And how gloomy did it become when the lake once dried up in a drought? What if I gave you a tool to do that?

My research is primarily about building the Computational Studio: computational machinery that continually learns the 4D audio-visual world from sparse real-world samples in an unsupervised manner, and that enables audio-visual social communication for non-experts on their everyday computational devices. There are three essential aspects of the Computational Studio: (1) capturing the 4D visual world along with audio; (2) example-based audio-visual synthesis; and (3) interactively synthesizing the audio-visual world.

My work on the Computational Studio lies at the intersection of Computer Vision and Graphics, Machine Learning, Robotics, Human-Computer Interaction, and Psychology.

Streaming Self-Training via
Domain-Agnostic Unlabeled Images

Zhiqiu Lin, Deva Ramanan, Aayush Bansal

project page / paper / arXiv


Video Exploration via Video-Specific Autoencoders
Kevin Wang, Deva Ramanan, Aayush Bansal

project page / long paper / short paper / code / arXiv


Stereo Radiance Fields:
Learning View Synthesis for Sparse Views of Novel Scenes

Julian Chibane, Aayush Bansal, Verica Lazova, Gerard Pons-Moll
Computer Vision and Pattern Recognition (CVPR), 2021
project page / paper / code

Unsupervised Audiovisual Synthesis via Exemplar Autoencoders
Kangle Deng, Aayush Bansal, Deva Ramanan
International Conference on Learning Representations (ICLR), 2021
project page / paper / open-review / code

4D Visualization of Dynamic Events from
Unconstrained Multi-View Videos

Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, Srinivasa Narasimhan
Computer Vision and Pattern Recognition (CVPR), 2020
project page / paper / arXiv / summary / multi-view sequences
CMU Press Coverage

Shapes and Context: In-the-wild Image Synthesis & Manipulation
Aayush Bansal, Yaser Sheikh, Deva Ramanan
Computer Vision and Pattern Recognition (CVPR), 2019
(Oral Presentation, Best Paper Award Finalist)
project page / paper / arXiv / five minutes / CVPR Talk
web-app (beta version) / code / demo video

Recycle-GAN: Unsupervised Video Retargeting
Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh
European Conference on Computer Vision (ECCV), 2018
project page / pre-print / arXiv / code / one minute
CMU Press Coverage


PixelNN: Example-based Image Synthesis
Aayush Bansal, Yaser Sheikh, Deva Ramanan
International Conference on Learning Representations (ICLR), 2018
project page / arXiv / paper / code


PixelNet: Representation of the pixels, by the pixels, and for the pixels.
Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
project page / arXiv / code


Marr Revisited: 2D-3D Alignment via Surface Normal Prediction
Aayush Bansal, Bryan Russell, Abhinav Gupta
Computer Vision and Pattern Recognition (CVPR), 2016
project page / arXiv preprint / code


Patch Correspondences for Interpreting Pixel-level CNNs
Victor Fragoso, Chunhui Liu, Aayush Bansal, Deva Ramanan


Mid-level Elements for Object Detection
Aayush Bansal, Abhinav Shrivastava, Carl Doersch, Abhinav Gupta
arXiv / analysis


Towards Transparent Systems: Semantic Characterization of Failure Modes
Aayush Bansal, Ali Farhadi, Devi Parikh
European Conference on Computer Vision (ECCV), 2014
project page / supplementary material
VT NEWS / Virginia Center for Autonomous Systems


Understanding How Camera Configuration and Environmental Conditions Affect Appearance-based Localization
Aayush Bansal, Hernan Badino, Daniel Huber
IEEE Intelligent Vehicles (IV), 2014
project page


Which Edges Matter?
Aayush Bansal, Adarsh Kowdle, Devi Parikh, Andrew Gallagher, Larry Zitnick
Workshop on 3D Representation and Recognition (3dRR) at ICCV, 2013
project page / presentation


CANINE: A robotic mine dog
B. A. Stancil, J. Hyams, J. Shelly, K. Babu, H. Badino, A. Bansal, D. Huber, P. Batavia
IS&T Conference on Electronic Imaging (SPIE), 2013
project page / video / competition rules

I wrote the object detection module for this robot.


Geometry-based Methods in Computer Vision (16-822), CMU
Teaching Assistant (TA) with Prof. Martial Hebert
Fall 2017

Computer Vision (16-720), CMU
Teaching Assistant (TA) with Prof. Srinivasa Narasimhan
Spring 2015


CMU AI Seminar Series
Smith Hall Messiest Desk Award
Unsung heroes in science: the ones I have read about so far.

Thanks to Jon Barron for the webpage design!