The proliferation of images and 3D models on the Internet, together with the ubiquity of photo-editing tools and CAD software, has enabled millions of consumers to creatively manipulate 2D and 3D content. However, there still exists a disconnect between the 2D and 3D domains: photo-editing software provides realistic edits yet is largely restricted to 2D, while CAD software provides 3D control but puts realistic rendered images far beyond the reach of everyday users. Successfully coupling 3D models with images would greatly expand the repertoire of creative edits that users can perform on images. While attempts have been made to automatically align 3D models of objects to images for the purpose of manipulating image content, the alignments these methods provide are, in general, only approximate. For tasks such as manipulating photographed objects in 3D, changing viewpoint or illumination, or transferring 3D content across images, the viewpoint and deformation of the 3D model must exactly match the object in the photograph.
In this talk, I shall present our first steps towards automatically estimating the exact viewpoint and deformation that align 3D models of objects to their photographed instances. This task is challenging for two reasons: 1) objects in the real world deviate significantly from their 3D models in both geometry and appearance, and 2) photographed instances of objects exhibit large variation in viewpoint and scale. To estimate viewpoint and scale, we first create a database of rendered images per model from several camera views and distances. We then infer the image location of the object, along with the best-matching viewpoint and scale, by integrating the location estimates provided by patches of various sizes distributed across each rendered example. To estimate geometric deformation, we trace back the locations provided by individual patches given the viewpoint and scale, while imposing a deformation model on the 3D mesh as a prior. To maintain invariance to appearance changes, we operate in the Laplacian of Gaussian space of the images. We show estimated viewpoints and deformations for several objects given their 3D models and photographs.
Natasha Kholgade is a fifth-year Ph.D. student at the Robotics Institute at Carnegie Mellon University. Her research lies at the intersection of computer graphics and computer vision. She is especially interested in the extraction, manipulation, and editing of three-dimensional information in images. Prior to attending Carnegie Mellon, she received her B.S. and M.S. degrees in Computer Engineering in 2009 from Rochester Institute of Technology in Rochester, New York. Her honors include the Google Anita Borg Scholarship (finalist), the Outstanding Undergraduate Award, and the Norman E. Miles Scholarship.