16-721: Advanced Perception

ROBOTICS INSTITUTE
CARNEGIE MELLON UNIVERSITY

Suggested Papers for 16-721: Learning-based Methods in Vision

Spring 2007

SUGGESTED PAPERS

These are just suggestions; the aim is to cover at least some papers in each category. Sometimes you may be asked to present two related papers together, or to present some background for a given paper. If you would like to present a paper that is not on the list, please talk to the instructor.

Part I: Images

Learning Features from Data

Olshausen & Field, Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images. (1996) Nature, 381: 607-609. (code available)

T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. In: Computer Vision and Pattern Recognition (CVPR 2005), San Diego, USA, June 2005. (code available)

Images as Texture (“Bag of Words” models)

Renninger, L.W. & Malik, J. (2004). When is scene recognition just texture recognition? Vision Research, 44, 2301-2311 (data available)

G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, 2004.

J. Winn, A. Criminisi and T. Minka. Object Categorization by Learned Universal Visual Dictionary. Proc. IEEE Intl. Conf. on Computer Vision (ICCV), Beijing 2005.

All to be briefly covered by Alyosha

Distributions of Features

Y. Rubner, J. Puzicha, C. Tomasi, and J. M. Buhmann. Empirical Evaluation of Dissimilarity Measures for Color and Texture. Computer Vision and Image Understanding Journal, 84(1):25-43, October 2001.

Y. Rubner, C. Tomasi, and L. J. Guibas. The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2), November 2000, pages 99-121. (code available)

Follow-up: E. Levina and P. J. Bickel (2001). "The Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics." In Proceedings of ICCV 2001, Vancouver, Canada, pp. 251-256.

Martin, Fowlkes, Malik, Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):530-549, May 2004. (short version) (code and data available)

Images as Scenes

A. Torralba and A. Oliva. Statistics of Natural Image Categories (2003) Network: Computation in Neural Systems. Vol. 14, 391-412.

A. Torralba, A. Oliva. Depth estimation from image structure (2002) IEEE Transactions on Pattern Analysis and Machine Intelligence. 24(9): 1226-1238.

A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. (2001) International Journal of Computer Vision, Vol. 42(3): 145-175.

Images as Feature Vectors

Sam Roweis & Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science v.290 no.5500, Dec.22, 2000. (code available)

J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500): 22 December 2000. (code available)

Applications:

Robert Pless. Using isomap to explore video sequences. In Proc. International Conference on Computer Vision (ICCV), pages 1433-1440, 2003.

Robert Pless and Ian Simon. Using Thousands of Images of an Object. Computer Vision, Pattern Recognition and Image Processing, 2002.

Mohan, A., Winnemöller, H., Tumblin, J. and Gooch, B., "Light Waving: Light Position Estimates from Photos Alone", Vol. 24, issue 3, pp. 433-438, EUROGRAPHICS 2005.

Tenenbaum, & Freeman. Separating Style and Content with Bilinear Models. Neural Computation, 2000.

Image Matching (Distance Transforms)

D. Huttenlocher, G. Klanderman, and W. Rucklidge. Comparing Images Using the Hausdorff Distance. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 850-863, 1993.

G. Borgefors, "Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm", PAMI, 1988.

P. Felzenszwalb and D. Huttenlocher. Distance Transforms of Sampled Functions. Cornell Computing and Information Science Technical Report TR2004-1963, September 2004. (code available)

Erik Learned-Miller, (2005) Data Driven Image Models through Continuous Joint Alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). (code available)

Image Correspondence (Caltech-101-fest!)

SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik. CVPR, 2006.

A. Frome, Y. Singer, J. Malik. "Image Retrieval and Recognition Using Local Distance Functions". Proceedings of Neural Information Processing Systems (NIPS) 2006 (to appear).

A. Berg, T. Berg, J. Malik, Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR 2005

Alternative approach: M. Leordeanu and M. Hebert, A Spectral Technique for Correspondence Problems using Pairwise Constraints, ICCV 2005

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. CVPR, 2006

Background: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. K. Grauman and T. Darrell. International Conference on Computer Vision (ICCV), 2005.

Lots of Data is Fun!

Zitnick & Kanade, Content-Free Image Retrieval

Tamara L. Berg, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White, Yee Whye Teh, Erik Learned-Miller, David A. Forsyth. Names and Faces. in submission

Sivic, J. and Zisserman, A., Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the International Conference on Computer Vision (2003).

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, accepted for oral presentation at CVPR 2006.

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings), 25(3), 2006, 835-846.

Part II: Objects and Parts

Segmentation

Max Wertheimer, Laws of Organization in Perceptual Forms (1923)

Weiss, Y. Segmentation using eigenvectors: a unifying view. Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20-27 Sept. 1999.

Andrew Y. Ng, Michael I. Jordan, Yair Weiss, On Spectral Clustering: Analysis and an algorithm (2001) NIPS

Xiaofeng Ren and Jitendra Malik, Learning a Classification Model for Segmentation. In ICCV 2003 (superpixel code available)

Tu and Zhu, Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (2002)

Boykov & Jolly, Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images. ICCV 2001

Application: Yin Li; Jian Sun; Chi-Keung Tang; Heung-Yeung Shum, Lazy Snapping, SIGGRAPH 04

Grouping Repeated Structures

O. Boiman and M. Irani, Similarity by Composition. Neural Information Processing Systems (NIPS), Vancouver, December 2006.

A. Kannan, J. Winn, and C. Rother. Clustering appearance and shape by learning jigsaws. NIPS 2006

Russell, B. C. , Efros, A. A. , Sivic, J. , Freeman, W. T. and Zisserman, A. Using Multiple Segmentations to Discover Objects and their Extent in Image Collections. CVPR 2006

K. Grauman and T. Darrell. Unsupervised Learning of Categories from Sets of Partially Matching Image Features. CVPR 2006

Automatic Ranking of Iconic Images, Tamara L. Berg, David A. Forsyth, U.C. Berkeley Technical Report, Jan. 2007

D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788-791 (1999). (code available)

Boosting Background

AdaBoost background

Friedman, J. H., Hastie, T. and Tibshirani, R., "Additive Logistic Regression: a Statistical View of Boosting." (Aug. 1998)

From Features to Objects

A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), In press.

A. Opelt, A. Pinz, and A. Zisserman. Incremental learning of object detectors using a visual shape alphabet. CVPR 2006.

V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid "Groups of Adjacent Contour Segments for Object Detection", INRIA Technical Report, Grenoble, September 2006.

Combined Object Categorization and Segmentation with an Implicit Shape Model, B. Leibe, A. Leonardis, and B. Schiele. in ECCV'04 Workshop on Statistical Learning in Computer Vision, Prague, May 2004.

B Leibe, E Seemann, B Schiele, Pedestrian Detection in Crowded Scenes, CVPR 2005

H. Schneiderman and T. Kanade. Object Detection Using the Statistics of Parts. International Journal of Computer Vision, 2004 (demo available)

Viola, Jones, Robust Real-time Object Detection (2001) Second International Workshop on Statistical and Computational Theories of Vision (short version)

Dalal, Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005 (data available)

M. Marszalek and C. Schmid. Spatial weighting for bag-of-features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.

PASCAL-fest

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. In Proceedings of the Beyond Patches workshop, in conjunction with CVPR 2006, 2006.

PASCAL Challenge 2006 Tech Report

Pick any papers that you feel did (or might do) well in the PASCAL challenge.

Recognition with Segmentation

Eran Borenstein, Shimon Ullman. Class-Specific, Top-Down Segmentation. ECCV 2002

Eran Borenstein, Shimon Ullman: Learning to Segment. ECCV 2004

E. Borenstein, E. Sharon, S. Ullman, Combining Top-Down and Bottom-Up Segmentation, Proceedings IEEE workshop on Perceptual Organization in Computer Vision, IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, June 2004.

Borenstein and Malik, Shape Guided Object Segmentation, CVPR 2006

Stella X. Yu and Jianbo Shi, Object-Specific Figure-Ground Segregation, CVPR 2003

Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. ECCV 2002.

Scenes, Context, and Image Parsing

A. Torralba, K. P. Murphy, W. T. Freeman and M. A. Rubin, Context-based vision system for place and object recognition, ICCV 2003

A. Torralba, K. P. Murphy and W. T. Freeman (2004), Contextual Models for Object Detection using Boosted Random Fields. To appear in Adv. in Neural Information Processing Systems (NIPS)

Hoiem, Efros, Hebert, Geometric Context from a Single Image, ICCV 2005 (code available)

X. He, R. Zemel, and M. Carreira-Perpinan. Multiscale Conditional Random Fields for Image Labeling. CVPR 2004.

Ashutosh Saxena, Sung Chung, and Andrew Y. Ng. Learning Depth from Single Monocular Images. NIPS 2005.

Z Tu, X Chen, AL Yuille, SC Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. International Journal of Computer Vision, 2005

X. Ren, C. Fowlkes, J. Malik. "Figure/Ground Assignment in Natural Images", ECCV, Graz, Austria, (May 2006).

3D City Modeling Using Cognitive Loops, N. Cornelis, B. Leibe, K. Cornelis, L. Van Gool. in Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06), Chapel Hill, USA, June 2006.

Face Modeling / Recognition

T. F. Cootes, G. J. Edwards, C. J. Taylor, Active Appearance Models, ECCV 1998

Sinha, P., Balas, B.J., Ostrovsky, Y., & Russell, R. Face recognition by humans: 20 results all computer vision researchers should know about. (under review)

A face recognition paper of your choice (Ralph?)

Most recently updated on January 16, 2007 by Alyosha Efros