Computer Vision Misc Reading Group
2007 Archived Schedule

Date Presenter Description
1/3/2007 David Bradley Today I will be presenting a short overview of several papers from NIPS 06. They are:

Randomized Clustering Forests for Building Fast and Discriminative Visual Vocabularies
Frank Moosmann, Bill Triggs, Frederic Jurie

Image Retrieval and Recognition Using Local Distance Functions
A. Frome, Y. Singer, J. Malik.

Multi-Task Feature Learning
Andreas Argyriou, Theos Evgeniou, Massimiliano Pontil You can also watch Andreas Argyriou present his paper here.

Boosting Structured Prediction for Imitation Learning
Nathan Ratliff, David Bradley, Drew Bagnell, Joel Chestnutt

1/10/2007 Caroline Pantofaru NIPS Overview
1/17/2007 Marius Leordeanu Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
from NIPS 06, by Chi-Hoon Lee et al.

with the background paper:

Semi-supervised conditional random fields for improved sequence segmentation and labeling
by F. Jiao, S. Wang, C. Lee, R. Greiner and D. Schuurmans

and another background paper:

Semi-Supervised Learning by Entropy Minimization
by Y. Grandvalet and Y. Bengio, from NIPS 2004

Andrew Stein I'll focus on the following paper, which won an Outstanding Student Paper Award at the most recent NIPS conference:

Analysis of Contour Motions,
Ce Liu, William T. Freeman, and Edward H. Adelson

In addition, I will likely draw on the following papers, which are also related to motion of boundaries and occlusion events:

  • Disambiguating Visual Motion by Form-Motion Interaction -- a Computational Model,
    by Bayerl, P. & Neumann, H., in IJCV, April 2007.
  • Beyond junctions: nonlocal form constraints on motion interpretation,
    by Josh McDermott, Yair Weiss, and Edward H. Adelson, in Perception 2001.
    1/31/2007 Tom Stepleton This Wednesday I'll be presenting a tutorial introduction to Dirichlet Process Mixture Models (DPMs), the flexible, nonparametric Bayesian method for clustering with variable numbers of clusters (among other things). I'll introduce a number of terms and metaphors people use to discuss DPMs, and I'll derive the so-called Chinese Restaurant Process, which underlies the Markov chain Monte Carlo methods for DPM inference. Finally, I'll describe one or two applications to computer vision, including recent work that integrates DPMs and MRFs for smooth image segmentation.

    This talk comes with a guarantee: once it's done, you'll be able to go back to your office or cube and implement a Dirichlet Process Mixture Model on your own---or your money back!

    I will cover topics from some of the following papers---the first is a terrific reference, and the rest can serve as a "seed bibliography" on the subject:

  • R. Neal, "Markov chain sampling methods for Dirichlet process mixture models." J. Computational and Graphical Statistics, 2000.
  • Orbanz and Buhmann, "Smooth Image Segmentation by Nonparametric Bayesian Inference." ECCV 2006.
  • Zhu, Ghahramani, and Lafferty, "Time-Sensitive Dirichlet Process Mixture Models." CMU-CALD-05-104, May 2005.
  • Beal, Ghahramani, and Rasmussen, "The Infinite Hidden Markov Model." NIPS 2001.
  • E.B. Sudderth, A. Torralba, W.T. Freeman, A.S. Willsky, "Describing Visual Scenes using Transformed Dirichlet Processes." NIPS 2005.
  • D.M. Blei, M.I. Jordan, "Variational methods for the Dirichlet Process." ICML 2004.
  • (and for background) D.M. Blei, A.Y. Ng, M.I. Jordan, "Latent Dirichlet Allocation." J. Machine Learning Research 3:993-1022, 2003.
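Since the Chinese Restaurant Process is central to the tutorial above, here is a minimal sketch of CRP sampling in plain Python. The seating rule follows the standard metaphor (join a table with probability proportional to its occupancy, open a new one with probability proportional to the concentration parameter `alpha`); the function name and interface are my own, not any paper's.

```python
import random
from collections import Counter

def chinese_restaurant_process(n_customers, alpha, rng=None):
    """Sample a partition of n_customers via the Chinese Restaurant Process.

    Customer i joins an existing table with probability proportional to its
    occupancy, or opens a new table with probability proportional to alpha.
    Returns a list of table assignments (integers starting at 0).
    """
    rng = rng or random.Random()
    assignments = []
    table_counts = []  # table_counts[k] = number of customers at table k
    for i in range(n_customers):
        # Total weight: i customers seated so far, plus alpha for a new table.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, count in enumerate(table_counts):
            cumulative += count
            if r < cumulative:
                assignments.append(k)
                table_counts[k] += 1
                break
        else:
            # r fell in the alpha-weighted region: open a new table.
            assignments.append(len(table_counts))
            table_counts.append(1)
    return assignments

partition = chinese_restaurant_process(100, alpha=2.0, rng=random.Random(0))
print(Counter(partition))
```

Larger `alpha` yields more tables on average, which is exactly the "variable number of clusters" behavior the talk describes.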
    2/7/2007 Ranjith Unnikrishnan This week, I'll attempt a tutorial on Fields of Experts (Roth & Black), a framework for learning expressive image priors through large-neighborhood MRFs. I'll describe a recent (ICML '06) extension to multi-channel images, and briefly cover their applications in image denoising, inpainting, and optical flow computation.

    The talk will draw from the following papers (listed in reverse chronological order):

  • Learning High-Order MRF Priors of Color Images, J. McAuley et al., ICML '06
  • Denoising archival films using a learned Bayesian model, T. Moldovan et al., ICIP '06
  • On the Spatial Statistics of Optical Flow, S. Roth & M. Black, ICCV '05 [Marr Prize honorable mention]
  • Fields of Experts: A Framework for Learning Image Priors, S. Roth and M. Black, CVPR '05
    2/14/2007 Jiang Ni Learning Bayesian networks from data: an information-theory based approach, Jie Cheng, Russell Greiner, Jonathan Kelly, David Bell, Weiru Liu, The Artificial Intelligence Journal, Volume 137, Pages 43-90, 2002.

    This paper provides constraint-based algorithms for learning Bayesian network structures from data that require only a polynomial number of conditional independence (CI) tests. Exponential growth in the number of CI tests is avoided through some nice heuristics.
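As a concrete illustration of the CI tests these algorithms are built on, here is a sketch of an empirical conditional mutual information estimate in plain Python. The function name, the noisy-copy example data, and the idea of simply thresholding the value are my own simplifications; a real implementation would calibrate the threshold (e.g. via a chi-square approximation).

```python
import random
from collections import Counter
from math import log

def conditional_mutual_information(samples):
    """Estimate I(X; Y | Z) in nats from a list of (x, y, z) observations.

    A small empirical value suggests X and Y are conditionally independent
    given Z; thresholding quantities like this is the kind of CI test that
    information-theoretic structure-learning algorithms rely on.
    """
    n = len(samples)
    xyz = Counter(samples)
    xz = Counter((x, z) for x, y, z in samples)
    yz = Counter((y, z) for x, y, z in samples)
    z_counts = Counter(z for x, y, z in samples)
    cmi = 0.0
    for (x, y, z), c in xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], all empirical;
        # the n's cancel, leaving raw counts inside the log.
        cmi += (c / n) * log(c * z_counts[z] / (xz[(x, z)] * yz[(y, z)]))
    return cmi

# X and Y are independently noisy copies of Z, so I(X; Y | Z) should be ~0.
rng = random.Random(1)
data = []
for _ in range(5000):
    z = rng.randint(0, 1)
    x = z ^ int(rng.random() < 0.1)
    y = z ^ int(rng.random() < 0.1)
    data.append((x, y, z))
print(conditional_mutual_information(data))
```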

    2/21/2007 James Hays (non-public paper -- see email)
    2/28/2007 Fernando de la Torre CVPR Submissions -- see email.
    3/7/2007 Yan Ke I'll be presenting some recent work in shape classification. In particular, I'll be presenting the following three papers:
  • Integral Invariants for Shape Matching
    Siddharth Manay, Daniel Cremers, Byung-Woo Hong, Anthony J. Yezzi Jr., and Stefano Soatto. PAMI Oct '06.
  • Shape Representation and Classification Using the Poisson Equation
    Lena Gorelick, Meirav Galun, Eitan Sharon, Ronen Basri, and Achi Brandt. PAMI Dec '06.
  • Shape Classification Using the Inner-Distance
    Haibin Ling, and David W. Jacobs. PAMI Feb '07.
    3/14/2007 Derek Hoiem I will summarize the recent computational theories of human attentional vision in static scenes. Topics covered: basic physiology and visual mechanisms, bottom-up attention (saliency), and top-down attention (purposeful search). I will also show a couple of attention/change-blindness video demos. Links to two representative papers are below:

  • Itti-Koch-Niebur 1998
  • Torralba Eye Movements paper
    3/21/2007 Brian Potetz Paper will be sent via email this week. Please do not redistribute!

    Abstract: Belief propagation over pairwise connected Markov Random Fields has become a widely used approach, and has been successfully applied to several important computer vision problems. However, pairwise interactions are often insufficient to capture the full statistics of the problem. Higher-order interactions are sometimes required. Unfortunately, the complexity of belief propagation is exponential in the size of the largest clique. In this paper, we introduce a new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables.

    We demonstrate this technique in two applications. First, we perform efficient inference in graphical models where the spatial prior of natural images is captured by 2x2 cliques. This approach shows significant improvement over the commonly used pairwise-connected models, and may benefit a variety of applications using belief propagation to infer images or range images. Finally, we apply these techniques to shape-from-shading and demonstrate significant improvement over previous methods, both in quality and in flexibility.
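For background, the standard pairwise sum-product algorithm that the paper generalizes can be sketched on a chain MRF, where it is exact. This is a minimal illustration with made-up potentials, not the paper's higher-order method; on large cliques the analogous messages are exponential to compute naively, which is the problem the paper addresses.

```python
def sum_product_chain(unaries, pairwise):
    """Exact marginals on a chain MRF via forward/backward message passing.

    unaries:  list of length-K lists, unaries[i][s]     = psi_i(s)
    pairwise: list of K x K lists,   pairwise[i][s][t]  = psi_{i,i+1}(s, t)
    Returns normalized per-node marginals. On a chain (a tree), belief
    propagation is exact.
    """
    n = len(unaries)
    K = len(unaries[0])
    fwd = [[1.0] * K for _ in range(n)]  # message arriving at node i from the left
    bwd = [[1.0] * K for _ in range(n)]  # message arriving at node i from the right
    for i in range(1, n):
        for t in range(K):
            fwd[i][t] = sum(fwd[i - 1][s] * unaries[i - 1][s] * pairwise[i - 1][s][t]
                            for s in range(K))
    for i in range(n - 2, -1, -1):
        for s in range(K):
            bwd[i][s] = sum(bwd[i + 1][t] * unaries[i + 1][t] * pairwise[i][s][t]
                            for t in range(K))
    marginals = []
    for i in range(n):
        b = [fwd[i][s] * unaries[i][s] * bwd[i][s] for s in range(K)]
        z = sum(b)
        marginals.append([v / z for v in b])
    return marginals

# Toy 3-node binary chain: confident ends, smoothness-favoring pairwise terms.
unaries = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
smooth = [[0.8, 0.2], [0.2, 0.8]]
print(sum_product_chain(unaries, [smooth, smooth]))
```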

    3/28/2007 Alyosha Efros See email for details and papers. Please do not post or link to the distributed papers.
    4/4/2007 Jon Huang I'll do an introduction to Gaussian Processes and go over the following papers which use the Gaussian Process Latent Variable Model:

    Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
    Neil D. Lawrence

    WiFi-SLAM Using Gaussian Process Latent Variable Models
    Brian Ferris, Dieter Fox, Neil Lawrence

    Gaussian Process Dynamical Models
    Jack M. Wang, David J. Fleet, Aaron Hertzmann

    3D People Tracking with Gaussian Process Dynamical Models
    Raquel Urtasun, David J. Fleet, Pascal Fua

    4/11/2007 Henry Kang See email for details. Please do not distribute the paper.
    4/18/2007 Sanjeev Koppal I will be presenting Image-based Material Editing
    4/25/2007 Ankur Datta I will be presenting the following CVPR'07 oral paper:

    Inferring Temporal Order of Images From 3D Structure
    Grant Schindler, Sing Bing Kang, Frank Dellaert

    In this paper, we describe a technique to temporally sort a collection of photos that span many years. By reasoning about persistence of visible structures, we show how this sorting task can be formulated as a constraint satisfaction problem (CSP). Casting this problem as a CSP allows us to efficiently find a suitable ordering of the images despite the large size of the solution space (factorial in the number of images) and the presence of occlusions. We present experimental results for photographs of a city acquired over a one hundred year period.

    Slides (PDF)
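To make the constraint-satisfaction formulation concrete, here is a toy sketch (my own simplification, not the paper's algorithm): if each structure must exist over one contiguous span of time, then an image ordering is feasible only when every structure's visibility pattern is consecutive. For tiny instances this can be checked by brute force over permutations.

```python
from itertools import permutations

def consistent_orderings(visibility):
    """Brute-force sketch of temporal ordering as constraint satisfaction.

    visibility[i][p] is True if structure p is visible in image i. The
    constraint (a simplification of the paper's persistence reasoning) is
    that every structure exists over a contiguous span of time, so in a
    correct ordering each structure's True entries must be consecutive.
    Returns all image orderings satisfying every constraint.
    """
    n_images = len(visibility)
    n_points = len(visibility[0])

    def contiguous(order, p):
        seen = [visibility[i][p] for i in order]
        if True not in seen:
            return True
        first = seen.index(True)
        last = n_images - 1 - seen[::-1].index(True)
        return all(seen[first:last + 1])

    return [order for order in permutations(range(n_images))
            if all(contiguous(order, p) for p in range(n_points))]

# Three images, two structures: structure 0 exists early, structure 1 late.
vis = [
    [False, True],   # image 0: only the late structure -> must come last
    [True,  False],  # image 1: only the early structure -> must come first
    [True,  True],   # image 2: both visible -> taken during the overlap
]
print(consistent_orderings(vis))  # -> [(0, 2, 1), (1, 2, 0)]
```

The two surviving orderings are time-reversals of each other, which matches the intuition that persistence constraints alone cannot fix the arrow of time; the paper's full CSP machinery is what makes this scale past toy sizes.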
    Jean-Francois Lalonde I'll be talking about a paper accepted at CVPR 2007, titled "Learning Color Names from Real-World Images", by Joost van de Weijer, Cordelia Schmid and Jakob Verbeek at INRIA Rhône-Alpes.

    The paper will be distributed by email. Please do not redistribute!

    Within a computer vision context color naming is the action of assigning linguistic color labels to image pixels. In general, research on color naming applies the following paradigm: a collection of color chips is labelled with color names within a well-defined experimental setup by multiple test subjects. The collected data set is subsequently used to label RGB values in real-world images with a color name. Apart from the fact that this collection process is time consuming, it is unclear to what extent color naming within a controlled setup is representative for color naming in real-world images. In this paper, we propose to learn color names from real-world images. We avoid test subjects by using Google Image to collect a data set. From the data set color names can be learned using a PLSA model tailored to this task. Experimental results show that color names learned from real-world images significantly outperform color names learned from labelled color chips on retrieval and classification.

    Tomasz Malisiewicz I will be giving a short overview of Supervised Distance Metric Learning techniques as well as discussing the following paper,
    Image Retrieval and Recognition Using Local Distance Functions.
    A. Frome, Y. Singer, J. Malik.
    Proceedings of Neural Information Processing Systems (NIPS) 2006

    In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang et al.
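The per-exemplar distance combination can be sketched in a few lines. The weights are assumed given here (the paper learns them with a max-margin procedure), and the image names and feature values below are hypothetical.

```python
def local_distance(weights, elementary_distances):
    """Combined distance from one focal exemplar to another image.

    elementary_distances[j] is the distance from the focal image's j-th
    patch feature to its best match in the other image; weights[j] is that
    patch's learned non-negative weight. The paper learns one weight
    vector per training image; here the weights are simply assumed given.
    """
    return sum(w * d for w, d in zip(weights, elementary_distances))

def rank_by_distance(weights, candidates):
    """Rank candidate images by the focal exemplar's local distance (retrieval)."""
    return sorted(candidates,
                  key=lambda name: local_distance(weights, candidates[name]))

# Hypothetical focal image with 3 patch features; the learned weights have
# zeroed out patch 2 (e.g. a background patch that proved uninformative).
weights = [1.0, 0.5, 0.0]
candidates = {
    "img_a": [0.2, 0.4, 9.0],  # far away only on the ignored patch
    "img_b": [1.0, 1.0, 0.0],
}
print(rank_by_distance(weights, candidates))  # -> ['img_a', 'img_b']
```

Because each training image carries its own weights, the resulting "distance" is asymmetric and exemplar-specific, which is the point of the local-distance framework.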

    Stano Funiak I will discuss the following paper from the upcoming CVPR:

    Approximate Nearest Subspace Search with Applications to Pattern Recognition
    by Ronen Basri, Tal Hassner, and Lihi Zelnik-Manor

    Linear and affine subspaces are commonly used to describe appearance of objects under different lighting, viewpoint, articulation, and identity. A natural problem arising from their use is: given a query image portion represented as a point in some high dimensional space, find a subspace near to the query. This paper presents an efficient solution to the approximate nearest subspace problem for both linear and affine subspaces. Our method is based on a simple reduction to the problem of nearest point search, and can thus employ tree based search or locality sensitive hashing to find a near subspace. Further speedup may be achieved by using random projections to lower the dimensionality of the problem. We provide theoretical proofs of correctness and error bounds of our construction and demonstrate its capabilities on synthetic and real data. Our experiments demonstrate that an approximate nearest subspace can be located significantly faster than the exact nearest subspace, while at the same time it can find better matches compared to a similar search on points, in the presence of variations due to viewpoint, lighting etc.
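For reference, the exact problem the paper accelerates can be stated in a few lines: scan all candidate subspaces and pick the one with the smallest residual. This is the brute-force baseline, not the paper's reduction to nearest-point search; the variable names and synthetic data are my own.

```python
import numpy as np

def nearest_subspace(query, bases):
    """Exact nearest linear subspace by residual norm.

    query: (d,) vector; bases: list of (d, k) matrices with orthonormal
    columns, each spanning one candidate subspace. The distance from the
    query q to span(U) is || q - U U^T q ||. The paper's contribution is
    avoiding this linear scan via a reduction to nearest-point search.
    """
    residuals = [np.linalg.norm(query - U @ (U.T @ query)) for U in bases]
    best = int(np.argmin(residuals))
    return best, residuals[best]

rng = np.random.default_rng(0)
d = 10
# Two random 2-dimensional subspaces, orthonormalized via reduced QR.
bases = [np.linalg.qr(rng.standard_normal((d, 2)))[0] for _ in range(2)]
# A query lying (almost) inside the second subspace.
q = bases[1] @ np.array([1.0, 2.0]) + 1e-6 * rng.standard_normal(d)
idx, dist = nearest_subspace(q, bases)
print(idx)  # -> 1
```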

    5/23/2007 No Meeting Meeting cancelled for Black Friday. Mohit will give this talk on July 11 instead.
    No Meeting Due to so many people being away, plus CVPR, there will be no meetings for a few weeks.
    6/27/2007 Everyone Special CVPR Overview Meeting organized by Alyosha.
    7/4/2007 July 4th Holiday No meeting.
    7/11/2007 Postponed Postponed til next week.
    7/18/2007 Pete Barnum In many cases, computer vision focuses on pictures taken with unknown camera parameters. We've also seen cases where the camera is modified to capture additional information, such as with a modulated shutter. In my talk, I will discuss a few papers that go even further, and attempt to use feedback loops to optimize the camera settings during acquisition.

    I'll discuss parts of these three papers (not the hardware, except for the basic theory):

  • Adaptive flat multiresolution multiplexed computational imaging architecture utilizing micromirror arrays to steer subimager fields of view
    Marc P. Christensen et al.
    Applied Optics 45(13) May 2006
  • Programmable Imaging: Towards a Flexible Camera
    Shree K. Nayar, Vlad Branzoi, and Terry E. Boult
    CVPR 2004
  • View-dependent Non-uniform Sampling for Image-Based Rendering
    Cha Zhang and Tsuhan Chen
    ICIP 2004

    And if you're especially interested, you can also look at:

  • "DMD-based Bloom Control for Intensified Imaging Systems"
    J. Castracane and M. Gutin
    Diffractive and Holographic Technologies, Systems, and Spatial Light Modulators IV 1999
  • "ACTIVE-EYES: an adaptive pixel-by-pixel image-segmentation sensor architecture for high-dynamic-range hyperspectral imaging"
    Marc P. Christensen, et al.
    Applied Optics 41(29) October 2002
    7/25/2007 Mohit Gupta I will be presenting the following paper from ECCV 2006:

    Confocal Stereo (Project Page)
    by Samuel W. Hasinoff and Kiriakos N. Kutulakos

    This paper got the Longuet-Higgins Best Paper Award, Honorable Mention.

    We present confocal stereo, a new method for computing 3D shape by controlling the focus and aperture of a lens. The method is specifically designed for reconstructing scenes with high geometric complexity or fine-scale texture. To achieve this, we introduce the confocal constancy property, which states that as the lens aperture varies, the pixel intensity of a visible in-focus scene point will vary in a scene-independent way, that can be predicted by prior radiometric lens calibration. The only requirement is that incoming radiance within the cone subtended by the largest aperture is nearly constant. First, we develop a detailed lens model that factors out the distortions in high resolution SLR cameras (12MP or more) with large-aperture lenses (e.g., f1.2). This allows us to assemble an AxF aperture-focus image (AFI) for each pixel, that collects the undistorted measurements over all A apertures and F focus settings. In the AFI representation, confocal constancy reduces to color comparisons within regions of the AFI, and leads to focus metrics that can be evaluated separately for each pixel. We propose two such metrics and present initial reconstruction results for complex scenes, as well as for a scene with known ground-truth shape.
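A toy version of the resulting per-pixel focus metric: under the simplifying assumption that the radiometric calibration has already normalized away the predictable aperture variation, confocal constancy says the in-focus row of a pixel's aperture-focus image should be (nearly) constant, so a natural focus score is the spread across apertures at each focus setting. The numbers below are invented for illustration.

```python
def best_focus(afi):
    """Pick the in-focus setting for one pixel from its aperture-focus image.

    afi[f][a] is the (radiometrically corrected) pixel intensity at focus
    setting f and aperture a. Per confocal constancy, the correct focus
    setting is the one whose intensities vary least across apertures.
    """
    def spread(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    scores = [spread(row) for row in afi]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical 3 focus settings x 4 apertures: focus 1 is nearly constant.
afi = [
    [0.30, 0.55, 0.70, 0.20],  # defocused: aperture mixes nearby scene points
    [0.50, 0.51, 0.50, 0.49],  # in focus: near-constant across apertures
    [0.10, 0.80, 0.40, 0.60],  # defocused the other way
]
print(best_focus(afi))  # -> 1
```

Note this is evaluated independently per pixel, which is what lets the method handle the fine-scale geometry the abstract emphasizes.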

    8/1/2007 David Bradley One of the great strengths of the human visual system is its ability to share common hardware across many different tasks, and to learn from just a few labeled examples for each task. I will present a survey of multi-task learning algorithms that attempt to replicate that ability to share computation and labeled data among a group of tasks, and describe how they are being used in vision applications. The papers I will talk about include:

  • Self-taught learning: Transfer learning from unlabeled data, Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer and Andrew Y. Ng. To appear in Proceedings of the Twenty-fourth International Conference on Machine Learning, 2007.
  • Learning Visual Representations using Images with Captions, A. Quattoni, M. Collins, T. Darrell, CVPR 2007.
  • Sharing visual features for multiclass and multiview object detection, A. Torralba, K. P. Murphy and W. T. Freeman, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, pp. 854-869, May, 2007.
  • A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, Rie K. Ando and Tong Zhang. JMLR, 6:1817-1853, 2005.
  • A fast learning algorithm for deep belief nets. Hinton, G. E., Osindero, S. and Teh, Y. W. Neural Computation, 2006.
    8/8/2007 Christopher Geyer I am interested in large-scale data association problems, and on Wednesday I will talk about the following papers:

    The Identity Management Kalman Filter (IMKF)
    B. Schumitsch, S. Thrun, L. Guibas, K. Olukotun

    Abstract: Tracking posterior estimates for problems with data association uncertainty is one of the big open problems in the literature on filtering and tracking. This paper presents a new filter for online tracking of many individual objects with data association ambiguities. It tightly integrates the continuous aspects of the problem -- locating the objects -- with the discrete aspects -- the data association ambiguity. The key innovation is a probabilistic information matrix that efficiently does identity management, that is, it links entities with internal tracks of the filter, enabling it to maintain a full posterior over the system amid data association uncertainties. The filter scales quadratically in complexity, just like a conventional Kalman filter. We derive the algorithm formally and present large-scale results.

    Second, if I get to it:
    Multi-object tracking with representations of the symmetric group.
    R. Kondor, A. Howard and T. Jebara: AISTATS 2007.

    8/15/2007 Marius Leordeanu I am thinking of talking about the following papers, not sure which one will be the main focus:

    1. A. Hoogs, R. Collins, B. Kaucic and J. Mundy. A Common Set of Perceptual Observables for Grouping, Figure-Ground Discrimination and Texture Classification. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Section on Perceptual Organization in Computer Vision, 25(4)

    2. Learning to segment images using region-based perceptual features, Kaufhold, J. Hoogs, A., CVPR 2004

    3. Supervised learning of large perceptual organization: graph spectral partitioning and learning automata, Sarkar, S.; Soundararajan, P., PAMI 2000, vol 22, no 5.

    8/22/2007 Caroline Pantofaru Using High-Level Visual Information for Color Constancy
    Joost van de Weijer, Cordelia Schmid, Jakob Verbeek

    We propose to use high-level visual information to improve illuminant estimation. Several illuminant estimation approaches are applied to compute a set of possible illuminants. For each of them an illuminant color corrected image is evaluated on the likelihood of its semantic content: is the grass green, the road grey, and the sky blue, in correspondence with our prior knowledge of the world. The illuminant resulting in the most likely semantic composition of the image is selected as the illuminant color. To evaluate the likelihood of the semantic content, we apply probabilistic latent semantic analysis. The image is modelled as a mixture of semantic classes, such as sky, grass, road, and building. The class description is based on texture, position and color information. Experiments show that the use of high-level information improves illuminant estimation over a purely bottom-up approach. Furthermore, the proposed method is shown to significantly improve semantic class recognition performance.

    8/29/2007 Ranjith Unnikrishnan This week, I'll survey the Perspective n-Point (PnP) problem, stated as: Given n correspondences between 3D points in world coordinates and their 2D projections in a calibrated camera, find the rigid transform relating the world and camera frames.

    The highlight will be an ICCV '07 paper [1] that uses a clever and simple math trick to give a non-iterative O(n) solution to the problem, as accurate as or more accurate than state-of-the-art methods that are O(n^5) or worse! This should be of particular interest to people doing sensor calibration, model-based pose estimation, and tracking.

    [1] Accurate Non-iterative O(n) Solution to the PnP Problem, F. Moreno-Noguer, V. Lepetit and P. Fua, ICCV '07 preprint

    and its closest competitor:
    [2] Fast and Globally Convergent Pose Estimation from Video Images, C. P. Lu, G. Hager and E. Mjolsness, PAMI 2000
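As a baseline for comparison, the classical linear (DLT) solution to the calibrated pose problem can be sketched as follows. This is the textbook direct linear transform, not the O(n) algorithm of [1]; the synthetic ground-truth pose and point cloud are invented for the demo.

```python
import numpy as np

def dlt_pose(world_pts, image_pts):
    """Recover camera pose [R|t] from n >= 6 3D-2D correspondences via DLT.

    world_pts: (n, 3) points in world coordinates; image_pts: (n, 2)
    normalized (calibrated) image coordinates. Each correspondence gives
    two linear equations in the 12 entries of P = [R|t]; the null vector
    of the stacked system recovers P up to scale.
    """
    n = len(world_pts)
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(world_pts, image_pts)):
        row = [X, Y, Z, 1.0]
        A[2 * i, 0:4] = row
        A[2 * i, 8:12] = [-u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1, 4:8] = row
        A[2 * i + 1, 8:12] = [-v * X, -v * Y, -v * Z, -v]
    # Null vector of A (smallest singular vector) gives P up to scale.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Fix scale and sign so the rotation block has determinant +1.
    P /= np.cbrt(np.linalg.det(P[:, :3]))
    # Project the 3x3 block onto the nearest rotation matrix.
    U, _, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3]

# Synthetic demo: known pose, exact projections, then recovery.
rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 4.0])
world = rng.uniform(-1, 1, size=(8, 3))
cam = world @ R_true.T + t_true          # points in camera coordinates
image = cam[:, :2] / cam[:, 2:3]         # normalized image coordinates
R, t = dlt_pose(world, image)
print(np.allclose(R, R_true, atol=1e-6), np.allclose(t, t_true, atol=1e-6))
```

With noisy data the DLT estimate degrades, which is where iterative refinement ([2]) or the O(n) method of [1] come in.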

    9/5/2007 Cancelled Cancelled in favor of Special VASC Seminar.
    9/12/2007 Jiang Ni I will talk about my research: Face View Synthesis Using A Single Image. Face view synthesis involves using one view of a face to artificially render another view. The fact that the input is only a single image makes the problem very difficult. We observe that the statistical dependency varies among different groupings of pixels in the 2D images and use a Bayesian network to represent such a sparse structure. This is ongoing research, so feedback and discussion are welcome. Here is a related work by Vetter if you are interested.
    9/19/2007 Gunhee Kim I'm thinking of presenting two papers from ETH Zurich that apply data mining techniques to computer vision problems: recognition and video mining.

    One is an ICCV 2007 paper:
    Till Quack, Vittorio Ferrari, Bastian Leibe and Luc Van Gool, Efficient Mining of Frequent and Distinctive Feature Configurations (to appear) ICCV 2007, Rio de Janeiro, Brazil

    The other is:
    Till Quack, Vittorio Ferrari, Luc Van Gool, Video Mining with Frequent itemset Configurations. CIVR 2006, Tempe, AZ, USA, July 2006

    NSH 1507
    Fernando de la Torre ***Note special location this week: NSH 1507 (still 4:00)***

    Learning Graph Matching, Tiberio Caetano, Li Cheng, Quoc Le, Alex Smola, ICCV 2007.

    AT 4:30!
    Yan Ke *** Note that we're switching to a 4:30 start time! ***

    I'll be giving a practice job talk about my work in using volumetric features for event detection. It's based on the following papers:

  • Event Detection in Crowded Videos
  • Spatio-temporal Shape and Flow Correlation for Action Recognition
  • Efficient Visual Event Detection using Volumetric Features

    Real-world actions often occur in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter. Our approach is based on three key ideas: (1) we efficiently match the volumetric representation of an event against over-segmented spatio-temporal video volumes; (2) we augment our shape-based features using flow; (3) rather than treating an event template as an atomic entity, we separately match by parts (both in space and time), enabling robustness against occlusions and actor variability. Our experiments on human actions, such as picking up a dropped object or waving in a crowd, show reliable detection with few false positives.

    10/10/2007 James Hays TBA
    10/17/2007 No Meeting ICCV - Meeting Cancelled.
    10/24/2007 No Meeting Intel Open House. ICCV Overview postponed to next week.
    10/31/2007 Alyosha Efros ICCV Overview
    11/7/2007 Pete Barnum Inverse Shade Trees for Non-Parametric Material Representation and Editing, by Jason Lawrence, Aner Ben-Artzi, Christopher DeCoro, Wojciech Matusik, Hanspeter Pfister, Ravi Ramamoorthi, and Szymon Rusinkiewicz. SIGGRAPH 2006

    AppWand: Editing Measured Materials using Appearance-Driven Optimization, by Fabio Pellacini and Jason Lawrence. SIGGRAPH 2007

    11/14/2007 Tom Stepleton This meeting has been cancelled. Tom will give his talk on a later date.

    3D generic object categorization, localization, and pose estimation
    S. Savarese and L. Fei-Fei

    This paper is about building 3D part-based models of object categories. An object is modeled as a collection of 2D "canonical part" images (e.g. a car's bumper) linked to each other through coordinate transforms: imagine arranging Polaroids of object parts on a sphere surrounding the object. The technique learns the model of a class in an unsupervised way from still images of instances of the class. I like this model because I think the brain represents objects in a similar way, and if I have time to prepare the information I might say something about that.

    11/21/2007 Thanksgiving No meeting.
    11/28/2007 Yaser Sheikh I'll be presenting a recent paper of mine from CVPR 2007:
    Spacetime Geometry of Galilean Cameras

    A projection model is presented for cameras moving at constant velocity (which we refer to as Galilean cameras). We introduce the concept of spacetime projection and show that perspective imaging and linear pushbroom imaging are specializations of the proposed model. The epipolar geometry between two such cameras is developed and we derive the Galilean fundamental matrix. We show how six different "fundamental" matrices can be directly recovered from the Galilean fundamental matrix including the classic fundamental matrix, the Linear Pushbroom fundamental matrix and a fundamental matrix relating Epipolar Plane Images. To estimate the parameters of this fundamental matrix and the mapping between videos in the case of planar scenes we describe linear algorithms and report experimental performance of these algorithms.

    12/5/2007 Andrew Stein Due to the CVPR extension, there will be no meeting today.
    12/12/2007 Everyone "CVPR Decompression Fest"