Marr Revisited: 2D-3D Alignment via
Surface Normal Prediction

Aayush Bansal, Bryan Russell, Abhinav Gupta



We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library. Critical to the success of our approach is the ability to recover accurate surface normals for objects in the depicted scene. We introduce a skip-network model built on the pre-trained Oxford VGG convolutional neural network (CNN) for surface normal prediction. Our model achieves state-of-the-art accuracy on the NYUv2 RGB-D dataset for surface normal prediction, and recovers fine object detail compared to previous methods. Furthermore, we develop a two-stream network over the input image and predicted surface normals that jointly learns pose and style for CAD model retrieval. When using the predicted surface normals, our two-stream network matches prior work using surface normals computed from RGB-D images on the task of pose prediction, and achieves state of the art when using RGB-D input. Finally, our two-stream network allows us to retrieve CAD models that better match the style and pose of a depicted object compared with baseline approaches.


Marr Revisited: 2D-3D Model Alignment via Surface Normal Prediction

A. Bansal, B. Russell, and A. Gupta.
CVPR, 2016



Code is available on GitHub.

Surface Normal Estimation

Here are qualitative results of surface normal estimation on the NYUD test set. The network is trained on 750 trainval images from NYU (with standard augmentation techniques) and ~220K video frames (with no data augmentation, unlike previous work). Note that training on the 750 images alone already gave results better than previous state-of-the-art approaches; adding the video data improved them further. More aggressive data augmentation may improve results further still.
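For readers who want to compare predicted normals against ground truth numerically, here is a minimal sketch of per-pixel angular error statistics. This page does not state which metrics were used; the mean, the median, and the percentage of pixels within 11.25, 22.5, and 30 degrees are assumed here because they are commonly reported for this task. The function names and the list-of-tuples input format are hypothetical and are not taken from the released code.

```python
import math
from statistics import mean, median


def angular_error_deg(pred, gt):
    """Angle in degrees between two 3D vectors (each is normalized internally)."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_g = math.sqrt(sum(g * g for g in gt))
    cos = dot / (norm_p * norm_g)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


def summarize_errors(pred_normals, gt_normals, thresholds=(11.25, 22.5, 30.0)):
    """Summarize angular errors over paired lists of valid (non-zero) normals.

    Assumption: pixels without ground truth have already been filtered out.
    """
    errors = [angular_error_deg(p, g) for p, g in zip(pred_normals, gt_normals)]
    summary = {"mean": mean(errors), "median": median(errors)}
    for t in thresholds:
        # Percentage of pixels whose error is at most t degrees.
        summary[f"within_{t}"] = 100.0 * sum(e <= t for e in errors) / len(errors)
    return summary


if __name__ == "__main__":
    pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    gt = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 2.0, 0.0)]
    print(summarize_errors(pred, gt))
```

Lower mean and median values and higher within-threshold percentages indicate better predictions. In practice the same computation would be vectorized over whole normal maps.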

Friends at CMU have been giving us random images to generate surface normals for, probably to test whether the model works outside a standard dataset. Here are some of them!

Related Paper

A. Bansal*, X. Chen*, B. Russell, A. Gupta, and D. Ramanan. PixelNet: Towards a General Pixel Level Architecture, arXiv 2016.


This research was partially supported by grants from NSF IIS-1320083 and ONR MURI N000141612007.

Comments, questions to Aayush Bansal