16-721 Learning-Based Methods in Vision
Spring 2007
"The aim of computer vision is to overfit to our visual world"
     -- remark by Antonio Torralba (after his third beer)


A graduate seminar course in Computer Vision with emphasis on using large amounts of real data (images, video, textual annotations, user preferences, etc.) to learn the structure of our visual world toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including: theories of perception, low-level vision (color, texture), mid-level vision (grouping and segmentation), object and scene recognition, image parsing, words and pictures models, image manifolds, etc.

Prerequisite: 16-720 or similar Computer Vision course

We will meet on Tuesdays and Thursdays from 10:30am-11:50am in NSH 3002.

Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
Office Hours: Tuesdays at Noon, Thursdays at 1:30pm

TA: Jean-Francois Lalonde, A521 Newell-Simon Hall.
Office Hours: Monday 1:30pm and Wednesday 1:30pm (also by appointment if you can't make it: jlalonde at cs)


Check out this list of data sources for some ideas on where to get images to work with.


Class Schedule

A list of suggested papers to present is available here.

If you want to change your presentation date, please arrange a swap with another student and notify the instructor and the TA at least two weeks in advance.


Date Presenter Paper title Slides
Jan. 16 Alyosha Efros Introduction, Vision: Measurement vs. Perception
Administrative stuff, overview of the course, datasets
Intro ppt
Jan. 18 Alyosha Efros Overview lecture on theories of Visual Perception
1. Cavanagh, P. (1995) Vision is getting easier every day
2. Cavanagh, P. (1991) What's up in top-down processing?
3. Cavanagh, P. (2005) The Artist as Neuroscientist
Suggested reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century?
Theories ppt
Jan. 23 Alyosha Efros Overview lecture on the physiology of vision
4. Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision
Physiology ppt

Part 1: Images

Learning Features from Data

Date Presenter Paper title Slides
Jan. 25 Byron
Evaluator: Eakta
5. Olshausen, B. & Field, D. (1996) Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, Nature (Byron)
Coming soon...
Jan. 30 Byron
Evaluator: Eakta
We will first finish the Olshausen & Field paper from last class.
6. Serre, T., Wolf, L. and Poggio, T. (2005) Object recognition with features inspired by visual cortex, CVPR (Andrew)
Serre ppt

Distributions of Features

Date Presenter Paper title Slides
Feb. 1 Frederik
7. Rubner, Y., Tomasi, C. and Guibas, L.J. (2000) The Earth Mover's Distance as a Metric for Image Retrieval, IJCV (Frederik)
8. Martin, Fowlkes and Malik (2004) Learning to Detect Natural Image Boundaries Using Local Brightness, Color and Texture Cues, PAMI (Jean-Francois)
Rubner ppt
Martin pdf

Images as Texture ("Bag of Words" models)

Date Presenter Paper title Slides
Feb. 6 Alyosha
9. Renninger, L.W. & Malik, J. (2004) When is scene recognition just texture recognition?, Vision Research (Alyosha)
10. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004) Visual categorization with bags of keypoints (Alyosha)
11. Winn, J., Criminisi, A. and Minka, T. (2005) Object Categorization by Learned Universal Visual Dictionary (Alyosha)
Coming soon...

Images as Scenes

Date Presenter Paper title Slides
Feb. 8 Sebastian
12. Torralba, A. and Oliva, A. (2003) Statistics of Natural Image Categories, Network: Computation in Neural Systems (Sebastian)
13. Torralba, A. and Oliva, A. (2002) Depth estimation from image structure, PAMI (Sebastian)
14. Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV (Sebastian)
Gist pdf

Images as Feature Vectors

Date Presenter Paper title Slides
Feb. 13 Google talk! (Henry Rowley)
Feb. 15 Alyosha
15. Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding, Science (Presenter: Alyosha, Evaluator: Ankur)
16. Tenenbaum, J.B., De Silva, V. and Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction, Science (Presenter: Alyosha, Evaluator: Ankur)
Manifolds ppt
Feb. 20 Devi
Evaluator: Ankur
Ankur will evaluate papers 15 and 16
Additional applications
Isomap applications ppt
Feb. 22 Ralph
17. Tenenbaum & Freeman (2000) Separating Style and Content with Bilinear Models, Neural Computation (Ralph)
Coming soon...

Image matching (Distance Transforms)

Date Presenter Paper title Slides
Feb. 27 Alyosha
Evaluator: Minh
18. Learned-Miller, E. (2005) Data Driven Image Models through Continuous Joint Alignment, PAMI (Alyosha)
Registration ppt
Mar. 1 Ankur
Evaluator: Byron
19. Huttenlocher, D., Klanderman, G. and Rucklidge, W. (1993) Comparing Images Using the Hausdorff Distance, PAMI (Ankur)
20. Borgefors, G. (1988) Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, PAMI (Ankur)
Comparison ppt

Image Correspondence (Caltech-101-fest!)

Date Presenter Paper title Slides
Mar. 6 Ross
21. Zhang, H., Berg, A., Maire, M. and Malik, J. (2006) SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, CVPR (Ross)
22. Frome, A., Singer, Y. and Malik, J. (2006) Image Retrieval and Recognition Using Local Distance Functions, NIPS (to appear) (Ross)
23. Berg, A., Berg, T. and Malik, J. (2005) Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR (Alyosha)
Mar. 8 Special lecture by Andrew Zisserman!

Lots of Data is Fun!

Date Presenter Paper title Slides
Mar. 13 No class: Spring break!
Mar. 15 No class: Spring break!
Mar. 20 Hongwen
24. Lazebnik, S., Schmid, C. and Ponce, J. (2006) Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR (Ross)
27. Sivic, J. and Zisserman, A. (2003) Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV (webpage) (Presented by A.Z. last class)
28. Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary Tree (Hongwen)
Coming soon...
Mar. 22 Alyosha
25. Zitnick & Kanade (2003) Content-free image retrieval, unpublished (Alyosha)
26. Berg, T., Berg, A., Edwards, J., Maire, M., White, R., Teh, R.Y., Learned-Miller, E. and Forsyth, D.A. (in submission) Names and Faces (Ralph)
Coming soon...
Mar. 27 Devi
27. Dalal and Triggs (2005) Histograms of Oriented Gradients for Human Detection, CVPR (Devi)
  • Data available
28. Marszalek, M. and Schmid, C. (2006) Spatial weighting for bag-of-features, CVPR (Devi)
29. Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo tourism: Exploring photo collections in 3D, SIGGRAPH (webpage) (Jean-Francois)
Coming soon...

Boosting Background

Date Presenter Paper title Slides
Mar. 29 Sebastian
30. AdaBoost background (Sebastian)
31. Friedman, J. H., Hastie, T. and Tibshirani, R. (1998) Additive Logistic Regression: a Statistical View of Boosting (Sebastian)
32. Schneiderman, H. and Kanade, T. (2004) Object Detection Using the Statistics of Parts, IJCV (Presenter: Minh, Evaluator: Andrew)
33. Viola, P. and Jones, M. (2001) Robust Real-time Object Detection, Second International Workshop on Statistical and Computational Theories of Vision (Presenter: Minh, Evaluator: Andrew)
Obj. detection ppt
Evaluation ppt

Part 2: Objects and Parts


Date Presenter Paper title Slides
Apr. 3-5 Alyosha
34. Wertheimer, M. (1923) Laws of Organization in Perceptual Forms (Alyosha)
35. Weiss, Y. (1999) Segmentation using eigenvectors: a unifying view, ICCV (Fred)
36. Ng, A.Y., Jordan, M.I. and Weiss, Y. (2001) On Spectral Clustering: Analysis and an algorithm, NIPS (Fred)
Coming soon...
Apr. 10 Ross
Evaluator: Hongwen
37. Tu and Zhu (2002) Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (Ross)
Coming soon...
Apr. 12 Jean-Francois
Evaluator: Hongwen
38. Boykov and Jolly (2001) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in ND Images, ICCV (Jean-Francois)
  • Application: Li, Y., Sun, J., Tang, C.K. and Shum, H. (2004) Lazy Snapping, SIGGRAPH (Jean-Francois, Evaluator: Hongwen)
Coming soon...

Grouping Repeated Structures

Date Presenter Paper title Slides
Apr. 17 Ankur
39. Boiman, O. and Irani, M. (2006) Similarity by Composition, NIPS (Ankur)
Coming soon...
Apr. 19 No classes (from academic calendar)
Apr. 24 Eakta
Evaluator: Fred
40. Kannan, A., Winn, J. and Rother, C. (2006) Clustering appearance and shape by learning jigsaws, NIPS (Eakta)
41. Ren, X. and Malik, J. (2003) Learning a Classification Model for Segmentation, ICCV
42. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T. and Zisserman, A. (2006) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, CVPR (Alyosha)
Coming soon...

From Features to Objects

Date Presenter Paper title Slides
Apr. 26 Hongwen
44. Torralba, A., Murphy, K.P. and Freeman, W.T. (in press) Sharing visual features for multiclass and multiview object detection, PAMI (Hongwen)
45. Opelt, A., Pinz, A. and Zisserman, A. (2006) Incremental learning of object detectors using a visual shape alphabet, CVPR
46. Ferrari, V., Fevrier, L., Jurie, F. and Schmid, C. (2006) Groups of Adjacent Contour Segments for Object Detection, INRIA Technical Report
47. Leibe, B., Leonardis, A. and Schiele, B. (2004) Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV'04 Workshop on Statistical Learning in Computer Vision (Hongwen)
48. Leibe, B., Seemann, E. and Schiele, B. (2005) Pedestrian Detection in Crowded Scenes, CVPR
Coming soon...

Scenes, Context, and Image Parsing

Date Presenter Paper title Slides
May 1 Byron
66. Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS (Byron)
64. Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV (Alyosha)
67. Tu, Z., Chen, X., Yuille, A. and Zhu, S.C. (2005) Image Parsing: Unifying Segmentation, Detection, and Recognition, IJCV
68. Ren, X., Fowlkes, C. and Malik, J. (2006) Figure/Ground Assignment in Natural Images, ECCV
69. Cornelis, N., Leibe, B., Cornelis, K. and Van Gool, L. (2006) 3D City Modeling Using Cognitive Loops, 3DPVT
Coming soon...

Face Modeling / Recognition

Date Presenter Paper title Slides
May 3 Andrew
Evaluator: Ralph
70. Sinha, P., Balas, B.J., Ostrovsky, Y., and Russell, R. (under review) Face recognition by humans: 20 results all computer vision researchers should know about (Andrew)
71. Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active Appearance Models, ECCV (Minh)
  • (Evaluator: Ralph)
Coming soon...

Final project presentations

Date Information
May 7 The presentations will be from 1:00 to 4:00 pm. The location is PH226A (that's Porter Hall). See here for updated information from the HUB (search for 16721).

Similar Courses

This course has been inspired by those offered by several of my colleagues. Here is a partial list:

Some tutorials, workshops and seminars:
Page created and maintained by Jean-Francois Lalonde (email: jlalonde at cs dot cmu dot edu)