16-721 Learning-Based Methods in Vision
Spring 2009
"The aim of computer vision is to overfit to our visual world"
     -- remark by Antonio Torralba (after his third beer)


Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements, our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.

The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will have an emphasis on using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.

The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.

Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)

We will meet on Mondays and Wednesdays Noon-1:20pm in Wean 5409.

Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
TA: Tomasz Malisiewicz, Smith Hall 232.


Check out this list of data sources for some ideas on where to get images to work with.

Challenges: Each project team will have regular meetings to discuss the progress of their course project.
Meeting times are listed on the project meeting schedule.

Paper Discussion

Leave your comments about papers on the Class Blog.

Paper List

The paper list contains papers that will be discussed in class.



Date Presenter Paper title Slides
Jan. 12 Alyosha Efros Introduction, Vision: Measurement vs. Perception
Administrative stuff, overview of the course, datasets
Intro ppt
Jan. 14 Alyosha Efros Overview lecture on theories of Visual Perception
Cavanagh, P. (1995) Vision is getting easier every day
Optional reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century?
Theories ppt
Jan. 19 MLK Jr. Day -- no class
Jan. 21 Alyosha Efros Overview lecture on the physiology of vision
Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision
Physiology ppt
Jan. 26 Alyosha Efros What should be done at the Low level? Low Level ppt
Jan. 28 Varun Probability of Boundary
D. Martin, C. Fowlkes, and J. Malik. PAMI May 2004.
Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues

M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008.
Using Contours to Detect and Localize Junctions in Natural Images
Global Pb pdf
Feb. 2 Varun/Alyosha Probability of Boundary Continued
When is object/scene recognition just texture recognition?
Feb. 4 Alyosha Efros When is object/scene recognition just texture recognition?
Renninger, L.W. & Malik, J. Vision Research 2004. When is scene recognition just texture recognition?
Csurka, G., Bray, C., Dance, C., and Fan, L. ECCV 2004. Visual categorization with bags of keypoints
Winn, J., Criminisi, A. and Minka, T. ICCV 2005. Object Categorization by Learned Universal Visual Dictionary
Bag of Words ppt
Feb. 9 Dan TextonBoost Day
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.
J. Shotton, J. Winn, C. Rother, A. Criminisi. In Proc. ECCV 2006.

(optional) Journal version of TextonBoost

TextonBoost Code
TextonBoost+STF pdf
TextonBoost+STF ppt
Feb. 11 Dan/Alyosha Semantic Texton Forests
Semantic Texton Forests for Image Categorization and Segmentation.
J. Shotton, M. Johnson, R. Cipolla. In Proc. IEEE CVPR 2008.
Semantic Texton Forests implementation

Intro to objects: Geometry vs. Appearance
Object Recognition in the Geometric Era: a Retrospective. J. Mundy. 2006.
(link is above)
Feb. 16 James Hays Large Scale Scene Matching for Graphics and Vision
Feb. 18 Alyosha Appearance makes an appearance: Sliding windows, constellations models, pictorial structures, and more. Objects and Parts ppt
Feb. 23 Edward Parts-Based Object Recognition
A Discriminatively Trained, Multiscale, Deformable Part Model
P. Felzenszwalb, D. McAllester, D. Ramanan, In Proc. IEEE CVPR 2008.

Latent pdf
Feb. 25 Alyosha Introduction to Context Context
March 2 Michael Tarr Uncovering the Fundamental Principles of Visual Cortex
March 4 Brian Object Recognition by Scene Alignment
B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman In NIPS, 2007.

code for gist descriptor

SIFT flow: dense correspondence across different scenes
C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman. ECCV, 2008.
project page
Stealing Objects with Computer Vision
March 16 Ekaterina Contextual priming for object detection
A. Torralba. IJCV, Vol. 53(2), 169-191, 2003.

Object detection and localization using local and global features
K. Murphy, A. Torralba, D. Eaton, W. T. Freeman. Sicily workshop on object recognition, 2005.
(see also The context challenge)
Context Challenge slides
March 18 Alyosha / Utsav Introduction to Segmentation

Objects in Context
Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora and Serge Belongie. ICCV 2007.

Context Based Object Categorization: A Critical Survey
Carolina Galleguillos and Serge Belongie
Technical Report UCSD CS2008-0928, 2008.
Friday, March 20 (NSH 1109) Utsav / Alyosha Context Continued...
Object Categorization using Co-Occurrence, Location and Appearance
Carolina Galleguillos, Andrew Rabinovich and Serge Belongie. CVPR 2008.

Segmentation Continued...
Recovering Human Body Configurations: Combining Segmentation and Recognition
G. Mori, X. Ren, A. Efros, and J. Malik. CVPR 2004.
Objects in Context
March 23 Pyry Learning a Classification Model for Segmentation.
Xiaofeng Ren and Jitendra Malik. In ICCV 2003.

project page

Image Segmentation by Data-Driven Markov Chain Monte Carlo.
Z. Tu and S. C. Zhu, PAMI, vol.24, no.5, pp. 657-673, May, 2002.

project page
Segmentation Through Optimization
March 25 Alyosha Surfaces
On the semantics of a glance at a scene. Biederman, I. 1981
Recovering Surface Layout from an Image. D. Hoiem, A.A. Efros, and M. Hebert. IJCV, Vol. 75, No. 1, October 2007.
See also classic papers: Yakimovsky and Feldman (1973), Ohta, Kanade, Sakai (1978), Barrow and Tenenbaum (1978).
It's a 3D world, after all!
March 30 Alyosha Occlusion and Figure/Ground Reasoning

Figure/Ground Assignment in Natural Images.
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, ECCV 2006.

Project Page
Recovering Occlusion Boundaries from a Single Image.
D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert. ICCV 2007
April 1 Jiyan Depth estimation from image structure
A. Torralba, A. Oliva. PAMI Vol. 24(9): 1226-1238. 2002.

Depth Information by Stage Classification.
Vladimir Nedovic, Arnold W.M. Smeulders, Andre Redert and Jan-Mark Geusebroek. ICCV 2007.

Learning Depth from Single Monocular Images
Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005.
Learning Depth
April 6 Mark Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Chum, O. , Philbin, J. , Sivic, J. , Isard, M. and Zisserman, A. In ICCV 2007.
Content Based Image Search
April 8 Alyosha Categorization
Principles of Categorization. Eleanor Rosch
Big Book of Concepts, Chapter 3. Gregory L. Murphy.
(just focus on "Exemplar View" section)
Concepts: from Instances to Meaning
April 10: 3:30pm NSH 1305 Derek Hoiem Inferring Object Attributes
April 13 Yuandong Sharing visual features for multiclass and multiview object detection
A. Torralba, K. P. Murphy and W. T. Freeman PAMI. vol. 29, no. 5, pp. 854-869, May, 2007.

Sharing Features Code
Sharing Slides
April 15 Zhaoyin Learning compositional models for object categories from small sample sets
J. Porway, B. Yao, and S.C. Zhu Book Chapter in Sven Dickinson et al (eds.)
Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press. 2009

A Stochastic Grammar of Images
Song-Chun Zhu and David Mumford
Foundations and Trends in Computer Graphics and Vision Vol. 2, No 4. 2007.
Grammar Slides
April 20 Alyosha and Scott Learning Realistic Human Actions from Movies.
Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld. In Proc. CVPR 2008.

project page
Action Slides
April 22 Alyosha The Unreasonable Effectiveness of Data and the Wisdom of Crowds data
April 27 Alyosha + everyone How do we know that we have solved vision? Solving Vision
April 29 Project Presentations (1-4)
April 30, 6-8pm in NSH 3002 Project Presentations (5-10)

Similar Courses

This course has been inspired by those offered by several of my colleagues. Here is a partial list:

Some tutorials, workshops and seminars:

Page maintained by Tomasz Malisiewicz (email: tmalisie at cs dot cmu dot edu)