"The aim of computer vision is to overfit to our visual world"
-- remark by Antonio Torralba (after his third beer)

Overview

Human vision is one of the most remarkable machines that has ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements, our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.

The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will have an emphasis on using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.

The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.

Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)

We will meet on Mondays and Wednesdays Noon-1:20pm in Wean 5409.

Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
TA: Tomasz Malisiewicz, Smith Hall 232.

Projects

Check out this list of data sources for some ideas on where to get images to work with.

Challenges:

Each project team will have regular meetings to discuss the progress of their course project.
Meeting times are listed on the project meeting schedule.

Paper Discussion

Leave your comments about papers on the Class Blog.

Paper List

The paper list contains papers that will be discussed in class.

Schedule

Introduction

Date	Presenter	Paper title	Slides
Jan. 12	Alyosha Efros	Introduction, Vision: Measurement vs. Perception Administrative stuff, overview of the course, datasets	Intro ppt
Jan. 14	Alyosha Efros	Overview lecture on theories of Visual Perception Cavanagh, P. (1995) Vision is getting easier every day Optional reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century?	Theories ppt
Jan. 19		MLK Jr. Day -- no class
Jan. 21	Alyosha Efros	Overview lecture on the physiology of vision Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision	Physiology ppt
Jan. 26	Alyosha Efros	What should be done at the Low level?	Low Level ppt
Jan. 28	Varun	Probability of Boundary D. Martin, C. Fowlkes, and J. Malik. PAMI May 2004. Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008. Using Contours to Detect and Localize Junctions in Natural Images	Global Pb pdf
Feb. 2	Varun/Alyosha	Probability of Boundary Continued When is object/scene recognition just texture recognition?
Feb. 4	Alyosha Efros	When is object/scene recognition just texture recognition? Renninger, L.W. & Malik, J. Vision Research 2004. When is scene recognition just texture recognition? Csurka, G., Bray, C., Dance, C., and Fan, L. ECCV 2004. Visual categorization with bags of keypoints Winn, J., Criminisi, A. and Minka, T. ICCV 2005. Object Categorization by Learned Universal Visual Dictionary	Bag of Words ppt
Feb. 9	Dan	TextonBoost Day TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. J. Shotton, J. Winn, C. Rother, A. Criminisi. In Proc. ECCV 2006. (optional) Journal version of TextonBoost TextonBoost Code	TextonBoost+STF pdf TextonBoost+STF ppt
Feb. 11	Dan/Alyosha	Semantic Texton Forests Semantic Texton Forests for Image Categorization and Segmentation. J. Shotton, M. Johnson, R. Cipolla. In Proc. IEEE CVPR 2008. Semantic Texton Forests implementation Intro to objects: Geometry vs. Appearance Object Recognition in the Geometric Era: a Retrospective. J. Mundy. 2006.	(link is above)
Feb. 16	James Hays	Large Scale Scene Matching for Graphics and Vision
Feb. 18	Alyosha	Appearance makes an appearance: Sliding windows, constellation models, pictorial structures, and more.	Objects and Parts ppt
Feb. 23	Edward	Parts-Based Object Recognition A Discriminatively Trained, Multiscale, Deformable Part Model P. Felzenszwalb, D. McAllester, D. Ramanan, In Proc. IEEE CVPR 2008. code	Latent pdf
Feb. 25	Alyosha	Introduction to Context	Context
March 2	Michael Tarr	Uncovering the Fundamental Principles of Visual Cortex
March 4	Brian	Object Recognition by Scene Alignment B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman In NIPS, 2007. code for gist descriptor SIFT flow: dense correspondence across different scenes C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman. ECCV, 2008. project page	Stealing Objects with Computer Vision
March 16	Ekaterina	Contextual priming for object detection A. Torralba. IJCV, Vol. 53(2), 169-191, 2003. Object detection and localization using local and global features K. Murphy, A. Torralba, D. Eaton, W. T. Freeman. Sicily workshop on object recognition, 2005. (see also The context challenge)	Context Challenge slides
March 18	Alyosha / Utsav	Introduction to Segmentation Objects in Context Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora and Serge Belongie. ICCV 2007. Context Based Object Categorization: A Critical Survey Carolina Galleguillos and Serge Belongie Technical Report UCSD CS2008-0928, 2008.	Segmentation
Friday March 20 NSH 1109	Utsav / Alyosha	Context Continued... Object Categorization using Co-Occurrence, Location and Appearance Carolina Galleguillos, Andrew Rabinovich and Serge Belongie. CVPR 2008. Segmentation Continued... Recovering Human Body Configurations: Combining Segmentation and Recognition G. Mori, X. Ren, A. Efros, and J. Malik. CVPR 2004.	Objects in Context
March 23	Pyry	Learning a Classification Model for Segmentation. Xiaofeng Ren and Jitendra Malik. in ICCV 2003. project page Image Segmentation by Data-Driven Markov Chain Monte Carlo. Z. Tu and S. C. Zhu, PAMI, vol.24, no.5, pp. 657-673, May, 2002. project page	Segmentation Through Optimization
March 25	Alyosha	Surfaces On the semantics of a glance at a scene. Biederman, I. 1981 Recovering Surface Layout from an Image. D. Hoiem, A.A. Efros, and M. Hebert. IJCV, Vol. 75, No. 1, October 2007. See also classic papers: Yakimovsky and Feldman (1973), Ohta, Kanade, Sakai (1978), Barrow and Tenenbaum (1978).	It's a 3D world, after all!
March 30	Alyosha	Occlusion and Figure/Ground Reasoning Figure/Ground Assignment in Natural Images. Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, ECCV 2006. Project Page Recovering Occlusion Boundaries from a Single Image. D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert. ICCV 2007	Occlusions
April 1st	Jiyan	Depth estimation from image structure A. Torralba, A. Oliva. PAMI Vol. 24(9): 1226-1238. 2002. Depth Information by Stage Classification. Vladimir Nedovic, Arnold W.M. Smeulders, Andre Redert and Jan-Mark Geusebroek. ICCV 2007. Learning Depth from Single Monocular Images Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005.	Learning Depth
April 6	Mark	Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval Chum, O., Philbin, J., Sivic, J., Isard, M. and Zisserman, A. In ICCV 2007.	Content Based Image Search
April 8	Alyosha	Categorization Principles of Categorization. Eleanor Rosch Big Book of Concepts, Chapter 3. Gregory L. Murphy. (just focus on "Exemplar View" section)	Concepts: from Instances to Meaning
April 10: 3:30pm NSH 1305	Derek Hoiem	Inferring Object Attributes
April 13	Yuandong	Sharing visual features for multiclass and multiview object detection A. Torralba, K. P. Murphy and W. T. Freeman PAMI. vol. 29, no. 5, pp. 854-869, May, 2007. Sharing Features Code	Sharing Slides
April 15	Zhaoyin	Learning compositional models for object categories from small sample sets J. Porway, B. Yao, and S.C. Zhu Book Chapter in Sven Dickinson et al (eds.) Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press. 2009 A Stochastic Grammar of Images Song-Chun Zhu and David Mumford Foundations and Trends in Computer Graphics and Vision Vol. 2, No 4. 2007.	Grammar Slides
April 20	Alyosha and Scott	Learning Realistic Human Actions from Movies. Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld. in Proc. CVPR'08 project page	video Action Slides
April 22	Alyosha	The Unreasonable Effectiveness of Data and the Wisdom of Crowds	data
April 27	Alyosha + everyone	How do we know that we have solved vision?	Solving Vision
April 29		Project Presentations (1-4)
April 30, 6-8pm in NSH 3002		Project Presentations (5-10)

Similar Courses

This course has been inspired by those offered by several of my colleagues. Here is a partial list:

Visual Recognition and Search (Kristen Grauman, Texas-Austin, Spring 2009)
Visual Scene Understanding (Derek Hoiem, UIUC, Spring 2009)
Statistical Models for Visual Recognition (Deva Ramanan, UCI, Winter 2009)
Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2008)
Scene Understanding Seminar (Aude Oliva, MIT, Fall 2008)
Selected Topics in Vision & Learning (Serge Belongie, UCSD, Fall 2006)
Learning and Inference in Vision (Bill Freeman, MIT)
High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
Recognition Problems in Computer Vision (Greg Mori, SFU, Fall 2007)
Visual Recognition (Pietro Perona, Caltech)
Vision and Learning (Jianbo Shi, UPenn)

Some tutorials, workshops and seminars:

CMU VASC Seminar
Como Workshop on Category-level Object Recognition (2008)
IMA Visual Learning and Recognition Workshop (2006)
MSRI Visual Recognition Workshop (2006)
Scene Understanding Symposium SUnS
Recognizing and Learning Object Categories (ICCV'05/CVPR'07 Tutorial)

Page maintained by Tomasz Malisiewicz (email: tmalisie at cs dot cmu dot edu)