-- remark by Antonio Torralba (after his third beer)
Overview
Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.
The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about ways data and learning could be used to tackle it. The central focus will be on representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing) -- a crucial glue tying vision together that has been largely neglected. The course will emphasize using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.
The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.
Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)
We will meet on Mondays and Wednesdays, noon-1:20pm, in Wean 5409.
Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
TA: Tomasz Malisiewicz, Smith Hall 232.
Projects
Check out this list of data sources for some ideas on where to get images to work with.
Meeting times are listed on the project meeting schedule.
Paper Discussion
Leave your comments about papers on the Class Blog.
Paper List
The paper list contains papers that will be discussed in class.
|Jan. 12||Alyosha Efros||Introduction, Vision: Measurement vs. Perception
Administrative stuff, overview of the course, datasets
|Jan. 14||Alyosha Efros||Overview lecture on theories of Visual Perception
Cavanagh, P. (1995) Vision is getting easier every day
Optional reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century?
|Jan. 19||MLK Jr. Day -- no class|
|Jan. 21||Alyosha Efros||Overview lecture on the physiology of vision
Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision
|Jan. 26||Alyosha Efros||What should be done at the Low level?||Low Level ppt|
|Jan. 28||Varun||Probability of Boundary
D. Martin, C. Fowlkes, and J. Malik. PAMI May 2004.
Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues
M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008.
Using Contours to Detect and Localize Junctions in Natural Images
|Global Pb pdf|
|Feb. 2||Varun/Alyosha||Probability of Boundary Continued
When is object/scene recognition just texture recognition?
|Feb. 4||Alyosha Efros||When is object/scene recognition just texture recognition?
Renninger, L.W. & Malik, J. Vision Research 2004. When is scene recognition just texture recognition?
Csurka, G., Bray, C., Dance, C., and Fan, L. ECCV 2004. Visual categorization with bags of keypoints
Winn, J., Criminisi, A. and Minka, T. ICCV 2005. Object Categorization by Learned Universal Visual Dictionary
|Bag of Words ppt|
|Feb. 9||Dan||TextonBoost Day
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.
J. Shotton, J. Winn, C. Rother, A. Criminisi. In Proc. ECCV 2006.
(optional) Journal version of TextonBoost
|Feb. 11||Dan/Alyosha||Semantic Texton Forests
Semantic Texton Forests for Image Categorization and Segmentation.
J. Shotton, M. Johnson, R. Cipolla. In Proc. IEEE CVPR 2008.
Semantic Texton Forests implementation
Intro to objects: Geometry vs. Appearance
Object Recognition in the Geometric Era: a Retrospective. J. Mundy. 2006.
|(link is above)|
|Feb. 16||James Hays||Large Scale Scene Matching for Graphics and Vision|
|Feb. 18||Alyosha||Appearance makes an appearance: Sliding windows, constellation models, pictorial structures, and more.||Objects and Parts ppt|
|Feb. 23||Edward||Parts-Based Object Recognition
A Discriminatively Trained, Multiscale, Deformable Part Model
P. Felzenszwalb, D. McAllester, D. Ramanan, In Proc. IEEE CVPR 2008.
|Feb. 25||Alyosha||Introduction to Context||Context|
|March 2||Michael Tarr||Uncovering the Fundamental Principles of Visual Cortex|
Object Recognition by Scene Alignment
B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman In NIPS, 2007.
code for gist descriptor
SIFT flow: dense correspondence across different scenes
C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman. ECCV, 2008.
|Stealing Objects with Computer Vision|
|March 16||Ekaterina||Contextual priming for object detection
A. Torralba. IJCV, Vol. 53(2), 169-191, 2003.
Object detection and localization using local and global features
K. Murphy, A. Torralba, D. Eaton, W. T. Freeman. Sicily workshop on object recognition, 2005.
(see also The context challenge)
|Context Challenge slides|
|March 18||Alyosha / Utsav||Introduction to Segmentation
Objects in Context
Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora and Serge Belongie. ICCV 2007.
Context Based Object Categorization: A Critical Survey
Carolina Galleguillos and Serge Belongie
Technical Report UCSD CS2008-0928, 2008.
|Friday March 20||Utsav / Alyosha||Context Continued...
Object Categorization using Co-Occurrence, Location and Appearance
Carolina Galleguillos, Andrew Rabinovich and Serge Belongie. CVPR 2008.
Recovering Human Body Configurations: Combining Segmentation and Recognition
G. Mori, X. Ren, A. Efros, and J. Malik. CVPR 2004.
|Objects in Context|
Learning a Classification Model for Segmentation.
Xiaofeng Ren and Jitendra Malik. In ICCV 2003.
Image Segmentation by Data-Driven Markov Chain Monte Carlo.
Z. Tu and S. C. Zhu, PAMI, vol.24, no.5, pp. 657-673, May, 2002.
|Segmentation Through Optimization|
On the semantics of a glance at a scene. Biederman, I. 1981
Recovering Surface Layout from an Image. D. Hoiem, A.A. Efros, and M. Hebert. IJCV, Vol. 75, No. 1, October 2007.
See also classic papers: Yakimovsky and Feldman (1973), Ohta, Kanade, Sakai (1978), Barrow and Tenenbaum (1978).
|It's a 3D world, after all!|
|March 30||Alyosha||Occlusion and Figure/Ground Reasoning
Figure/Ground Assignment in Natural Images.
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, ECCV 2006.
Recovering Occlusion Boundaries from a Single Image.
D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert. ICCV 2007
|April 1st||Jiyan||Depth estimation from image structure
A. Torralba, A. Oliva. PAMI Vol. 24(9): 1226-1238. 2002.
Depth Information by Stage Classification.
Vladimir Nedovic, Arnold W.M. Smeulders, Andre Redert and Jan-Mark Geusebroek. ICCV 2007.
Learning Depth from Single Monocular Images
Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Chum, O., Philbin, J., Sivic, J., Isard, M. and Zisserman, A. In ICCV 2007.
|Content Based Image Search|
Principles of Categorization. Eleanor Rosch
Big Book of Concepts, Chapter 3. Gregory L. Murphy.
(just focus on "Exemplar View" section)
|Concepts: from Instances to Meaning|
|April 10: 3:30pm NSH 1305||Derek Hoiem||Inferring Object Attributes
|April 13||Yuandong||Sharing visual features for multiclass and multiview object detection
A. Torralba, K. P. Murphy and W. T. Freeman PAMI. vol. 29, no. 5, pp. 854-869, May, 2007.
Sharing Features Code
|April 15||Zhaoyin||Learning compositional models for object categories from small sample sets
J. Porway, B. Yao, and S.C. Zhu. Book chapter in Sven Dickinson et al. (eds.),
Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press. 2009.
A Stochastic Grammar of Images
Song-Chun Zhu and David Mumford
Foundations and Trends in Computer Graphics and Vision Vol. 2, No 4. 2007.
|April 20||Alyosha and Scott||Learning Realistic Human Actions from Movies.
Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld. In Proc. CVPR'08.
|April 22||Alyosha||The Unreasonable Effectiveness of Data and the Wisdom of Crowds||data|
|April 27||Alyosha + everyone||How do we know that we have solved vision?||Solving Vision|
|April 29||Project Presentations (1-4)|
|April 30, 6-8pm in NSH 3002||Project Presentations (5-10)|
Similar Courses
This course has been inspired by courses offered by several of my colleagues. Here is a partial list:
- Visual Recognition and Search (Kristen Grauman, Texas-Austin, Spring 2009)
- Visual Scene Understanding (Derek Hoiem, UIUC, Spring 2009)
- Statistical Models for Visual Recognition (Deva Ramanan, UCI, Winter 2009)
- Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2008)
- Scene Understanding Seminar (Aude Oliva, MIT, Fall 2008)
- Selected Topics in Vision & Learning (Serge Belongie, UCSD, Fall 2006)
- Learning and Inference in Vision (Bill Freeman, MIT)
- High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
- Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
- Recognition Problems in Computer Vision (Greg Mori, SFU, Fall 2007)
- Visual Recognition (Pietro Perona, Caltech)
- Vision and Learning (Jianbo Shi, UPenn)