Today I will be presenting a short overview of several papers from NIPS
06. They are:
Randomized Clustering Forests for Building Fast and Discriminative
Image Retrieval and Recognition Using Local Distance Functions
Boosting Structured Prediction for Imitation Learning
|1/10/2007||Caroline Pantofaru||NIPS Overview|
Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
from NIPS 06, by Chi-Hoon Lee et al.
with the background paper:
Semi-Supervised Conditional Random Fields for Improved Sequence
Segmentation and Labeling
and another background paper:
Semi-Supervised Learning by Entropy Minimization
I'll focus on the following paper, which won an Outstanding Student Paper Award at the most recent NIPS conference:
Analysis of Contour Motions
In addition, I will likely draw on the following papers, which are also related to motion of boundaries and occlusion events:
by Bayerl, P. & Neumann, H., in IJCV, April 2007.
by Josh McDermott, Yair Weiss, and Edward H. Adelson, in Perception 2001.
This Wednesday I'll be presenting a tutorial introduction to Dirichlet
Process Mixture Models (DPMs), the flexible, nonparametric Bayesian method
for clustering with variable numbers of clusters (among other things).
I'll introduce a number of terms and metaphors people use to discuss DPMs,
and I'll derive the so-called Chinese Restaurant Process and the Markov Chain
Monte Carlo (Gibbs sampling) method it yields for DPM inference. Finally, I'll describe one or two
applications to computer vision, including recent work that integrates
DPMs and MRFs for smooth image segmentation.
This talk comes with a guarantee: once it's done, you'll be able to go back to your office or cube and implement a Dirichlet Process Mixture Model on your own---or your money back!
I will cover topics from some of the following papers---the first is a terrific reference, and the rest can serve as a "seed bibliography" on the subject:
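As a preview of the restaurant metaphor, here is a minimal sampler (my own illustration, not part of the talk materials): customer i sits at an occupied table with probability proportional to its size, or at a new table with probability proportional to the concentration parameter alpha.

```python
import random

def chinese_restaurant_process(n, alpha, rng=None):
    """Sample table assignments for n customers under a CRP with
    concentration alpha: customer i joins an existing table with
    probability proportional to its size, or opens a new table with
    probability proportional to alpha."""
    rng = rng or random.Random(0)
    tables = []        # tables[k] = number of customers at table k
    assignments = []
    for i in range(n):
        r = rng.random() * (i + alpha)   # total unnormalized mass
        acc = 0.0
        for k, size in enumerate(tables):
            acc += size
            if r < acc:
                tables[k] += 1
                assignments.append(k)
                break
        else:                            # no existing table chosen
            assignments.append(len(tables))
            tables.append(1)
    return assignments
```

Larger alpha tends to produce more tables (clusters), which is exactly the "variable number of clusters" behavior mentioned above.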
This week, I'll attempt a tutorial on Fields of Experts (Roth & Black), a framework for learning expressive image priors through large-neighborhood MRFs. I'll describe a recent (ICML '06) extension to multi-channel images, and briefly cover their applications in image denoising, in-painting, and optical flow computation.
The talk will draw from the following papers (listed in reverse chronological order):
Learning Bayesian networks from data: an information-theory based approach,
Jie Cheng, Russell Greiner, Jonathan Kelly, David Bell, Weiru
Liu, The Artificial Intelligence Journal, Volume 137, Pages 43-90, 2002.
This paper provides constraint-based algorithms for learning Bayesian network structures from data that require only a polynomial number of conditional independence (CI) tests. Exponential growth in the number of CI tests is avoided through effective heuristics.
|2/21/2007||James Hays||(non-public paper -- see email)|
|2/28/2007||Fernando de la Torre||CVPR Submissions -- see email.|
I'll be presenting some recent work in shape classification. In particular,
I'll be presenting the following three papers:
Siddharth Manay, Daniel Cremers, Byung-Woo Hong, Anthony J. Yezzi Jr., and Stefano Soatto. PAMI Oct '06.
Lena Gorelick, Meirav Galun, Eitan Sharon, Ronen Basri, and Achi Brandt. PAMI Dec '06.
Haibin Ling, and David W. Jacobs. PAMI Feb '07.
I will summarize recent computational theories of human visual
attention in static scenes. Topics covered: basic physiology and visual
mechanisms, bottom-up attention (saliency), and top-down attention
(purposeful search). I will also show a couple of attention/change-blindness
video demos. Links to two representative papers are below:
Paper will be sent via email this week. Please do not re-distribute!
Abstract: Belief propagation over pairwise connected Markov Random Fields has become a widely used approach, and has been successfully applied to several important computer vision problems. However, pairwise interactions are often insufficient to capture the full statistics of the problem. Higher-order interactions are sometimes required. Unfortunately, the complexity of belief propagation is exponential in the size of the largest clique. In this paper, we introduce a new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables.
We demonstrate this technique in two applications. First, we perform efficient inference in graphical models where the spatial prior of natural images is captured by 2x2 cliques. This approach shows significant improvement over the commonly used pairwise-connected models, and may benefit a variety of applications using belief propagation to infer images or range images. Finally, we apply these techniques to shape-from-shading and demonstrate significant improvement over previous methods, both in quality and in flexibility.
|3/28/2007||Alyosha Efros||See email for details and papers. Please do not post or link to the distributed papers.|
I'll give an introduction to Gaussian Processes and go over
the following papers, which use the Gaussian Process Latent
Variable Model:
WiFi-SLAM Using Gaussian Process Latent Variable Models
Gaussian Process Dynamical Models
3D People Tracking with Gaussian Process Dynamical Models
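As background for these papers, the Gaussian Process regression core that the GPLVM builds on fits in a few lines (a generic textbook sketch with a squared-exponential kernel, not code from any of the papers above):

```python
import numpy as np

def gp_regress(X, y, Xstar, length_scale=1.0, noise=1e-6):
    """GP regression posterior mean and variance at test inputs Xstar.

    X: (n, d) training inputs; y: (n,) targets; Xstar: (m, d) test inputs.
    Uses a squared-exponential (RBF) kernel; `noise` is observation
    noise variance, which also serves as jitter for stability.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)

    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xstar, X)                         # test/train cross-covariance
    Kss = k(Xstar, Xstar)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```

The GPLVM papers above run this machinery "in reverse," treating the inputs X as unobserved latent coordinates to be optimized.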
|4/11/2007||Henry Kang||See email for details. Please do not distribute the paper.|
|4/18/2007||Sanjeev Koppal||I will be presenting Image-based Material Editing|
I will be presenting the following CVPR'07 oral paper:
Inferring Temporal Order of Images From 3D Structure
I'll be talking about a paper accepted at CVPR 2007, titled "Learning
Color Names from Real-World Images", by Joost van de Weijer, Cordelia
Schmid, and Jakob Verbeek at INRIA Rhône-Alpes.
The paper will be distributed by email. Please do not redistribute!
I will be giving a short overview of Supervised Distance Metric Learning techniques, as well as discussing the following paper:
Image Retrieval and Recognition Using Local Distance Functions.
A. Frome, Y. Singer, J. Malik.
Proceedings of Neural Information Processing Systems (NIPS) 2006
I will discuss the following paper from the upcoming CVPR:
Approximate Nearest Subspace Search with Applications to Pattern Recognition
|5/23/2007||No Meeting||Meeting cancelled for Black Friday. Mohit will give this talk on July 11 instead.|
|No Meeting||Due to so many people being away, plus CVPR, there will be no meetings for a few weeks.|
|6/27/2007||Everyone||Special CVPR Overview Meeting organized by Alyosha.|
|7/4/2007||July 4th Holiday||No meeting.|
|7/11/2007||Postponed||Postponed til next week.|
In many cases, computer vision focuses on pictures taken with unknown camera parameters.
We've also seen cases where the camera is modified to capture additional information,
such as with a modulated shutter. In my talk, I will discuss a few papers that go even further,
and attempt to use feedback loops to optimize the camera settings during acquisition.
I'll discuss parts of these three papers (not the hardware, except for the basic theory):
Marc P. Christensen et al.
Applied Optics 45(13) May 2006
Shree K. Nayar, Vlad Branzoi, and Terry E. Boult
Cha Zhang and Tsuhan Chen
And if you're especially interested, you can also look at:
J. Castracane and M. Gutin
Diffractive and Holographic Technologies, Systems, and Spatial Light Modulators IV 1999
Marc P. Christensen, et al.
Applied Optics 41(29) October 2002
I will be presenting the following paper from ECCV 2006:
This paper received an Honorable Mention for the Longuet-Higgins Best Paper Award.
One of the great strengths of the human visual system is
its ability to share common hardware across many different tasks,
and to learn from just a few labeled examples for each task.
I will present a survey of multi-task learning algorithms that attempt
to replicate that ability by sharing computation and labeled data among a
group of tasks, and describe how they are being used in vision applications.
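One simple way to share labeled data among tasks is mean-regularized multi-task regression, in the spirit of Evgeniou & Pontil: each task's weights are a shared component plus a heavily regularized task-specific offset, so tasks with few labels borrow strength from the pool. A hedged sketch of my own, not taken from any of the surveyed papers:

```python
import numpy as np

def multitask_ridge(tasks, lam_shared=0.1, lam_task=10.0):
    """Mean-regularized multi-task linear regression.

    tasks: list of (X, y) pairs, one per task.
    A large lam_task keeps task-specific offsets small, tying the
    tasks together around the shared solution.
    """
    # Shared component: ridge regression on the pooled data of all tasks.
    Xall = np.vstack([X for X, _ in tasks])
    yall = np.concatenate([y for _, y in tasks])
    d = Xall.shape[1]
    w0 = np.linalg.solve(Xall.T @ Xall + lam_shared * np.eye(d),
                         Xall.T @ yall)
    # Task-specific offsets: ridge regression on each task's residuals.
    weights = []
    for X, y in tasks:
        v = np.linalg.solve(X.T @ X + lam_task * np.eye(d),
                            X.T @ (y - X @ w0))
        weights.append(w0 + v)
    return weights
```

A task with only a handful of examples still gets a sensible model, because most of its weight vector comes from the pooled fit.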
The papers I will talk about include:
I am interested in large-scale data association problems, and on Wednesday I will talk about the following papers:
Abstract: Tracking posterior estimates for problems with data association uncertainty is one of the big open problems in the literature on filtering and tracking. This paper presents a new filter for online tracking of many individual objects with data association ambiguities. It tightly integrates the continuous aspects of the problem -- locating the objects -- with the discrete aspects -- the data association ambiguity. The key innovation is a probabilistic information matrix that efficiently does identity management, that is, it links entities with internal tracks of the filter, enabling it to maintain a full posterior over the system amid data association uncertainties. The filter scales quadratically in complexity, just like a conventional Kalman filter. We derive the algorithm formally and present large-scale results.
Second, if I get to it:
I am thinking of talking about the following papers, not sure which one
will be the main focus:
1. A. Hoogs, R. Collins, B. Kaucic and J. Mundy. A Common Set of Perceptual Observables for Grouping, Figure-Ground Discrimination and Texture Classification. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Section on Perceptual Organization in Computer Vision, 25(4)
2. Learning to segment images using region-based perceptual features, Kaufhold, J. Hoogs, A., CVPR 2004
3. Supervised learning of large perceptual organization: graph spectral partitioning and learning automata, Sarkar, S.; Soundararajan, P., PAMI 2000, vol 22, no 5.
Using High-Level Visual Information for Color Constancy
Joost van de Weijer, Cordelia Schmid, Jakob Verbeek
We propose to use high-level visual information to improve illuminant estimation. Several illuminant estimation approaches are applied to compute a set of possible illuminants. For each of them an illuminant color corrected image is evaluated on the likelihood of its semantic content: is the grass green, the road grey, and the sky blue, in correspondence with our prior knowledge of the world. The illuminant resulting in the most likely semantic composition of the image is selected as the illuminant color. To evaluate the likelihood of the semantic content, we apply probabilistic latent semantic analysis. The image is modelled as a mixture of semantic classes, such as sky, grass, road, and building. The class description is based on texture, position and color information. Experiments show that the use of high-level information improves illuminant estimation over a purely bottom-up approach. Furthermore, the proposed method is shown to significantly improve semantic class recognition performance.
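For reference, the "purely bottom-up approach" family the paper improves on includes the classic gray-world estimator, which applies exactly the kind of diagonal (von Kries-style) illuminant correction described above. A minimal sketch of my own, not the paper's method:

```python
import numpy as np

def gray_world_correct(image):
    """Gray-world color constancy: assume the average scene reflectance
    is achromatic, estimate the illuminant as the per-channel mean, and
    divide it out (a diagonal, von Kries-style correction).

    image: (h, w, 3) float array. Returns (corrected, illuminant),
    with the illuminant normalized to unit sum.
    """
    illuminant = image.reshape(-1, 3).mean(axis=0)
    illuminant = illuminant / illuminant.sum()
    corrected = image / (3.0 * illuminant)   # neutral light = (1/3, 1/3, 1/3)
    return corrected, illuminant
```

The paper's contribution is to replace this single hard-coded assumption with a semantic likelihood that scores each candidate illuminant's corrected image.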
This week, I'll survey the Perspective n-Point (PnP) problem, stated as:
Given n correspondences between 3D points in world coordinates and their
2D projections in a calibrated camera, find the rigid transform relating
the world and camera frames.
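A classical linear baseline for this problem is camera resectioning by the Direct Linear Transform: stack two homogeneous equations per 2D-3D correspondence and take the null space via SVD. This is a generic textbook sketch, not the EPnP method from the paper below:

```python
import numpy as np

def dlt_projection_matrix(pts3d, pts2d):
    """Estimate a 3x4 projection matrix P (up to scale) from n >= 6
    correspondences x_i ~ P X_i via the Direct Linear Transform.

    pts3d: (n, 3) world points; pts2d: (n, 2) image points.
    Each correspondence contributes two rows of the homogeneous system.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        rows.append([0, 0, 0, 0] + [-x for x in Xh] + [v * x for x in Xh])
        rows.append(Xh + [0, 0, 0, 0] + [-u * x for x in Xh])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)    # smallest singular vector = flattened P

def reproject(P, pts3d):
    """Project 3D points with P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```

Unlike calibrated PnP, this recovers a full projective P rather than a rigid [R|t], and it degrades badly with noise; the appeal of the paper below is getting calibrated accuracy at O(n) cost.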
The highlight will be an ICCV '07 paper that uses a clever and simple mathematical trick to give a non-iterative O(n) solution to the problem, as accurate as or more accurate than state-of-the-art methods that are O(n^5) or worse! This should be of particular interest to people doing sensor calibration, model-based pose estimation, and tracking.
Accurate Non-Iterative O(n) Solution to the PnP Problem, F. Moreno-Noguer, V. Lepetit, and P. Fua, ICCV '07 preprint
and its closest competitor:
|9/5/2007||Cancelled||Cancelled in favor of Special VASC Seminar.|
|9/12/2007||Jiang Ni||I will talk about my research: Face View Synthesis Using A Single Image. Face view synthesis involves using one view of a face to artificially render another view. The fact that the input is only a single image makes the problem very difficult. We observe that the statistical dependency varies among different groupings of pixels in the 2D images, and we use a Bayesian Network to represent such a sparse structure. This is ongoing research, so your feedback and discussion are welcome. Here is related work by Vetter if you are interested.|
I'm thinking of presenting two papers from ETH Zurich which apply data
mining techniques to computer vision problems - Recognition and Video
One is an ICCV 2007 paper:
The other is:
||Fernando de la Torre||
***Note special location this week: NSH 1507 (still 4:00)***
Learning Graph Matching, Tiberio Caetano, Li Cheng, Quoc Le, Alex Smola, ICCV 2007.
*** Note that we're switching to a 4:30 start time! ***
I'll be giving a practice job talk about my work in using volumetric features for event detection. It's based on the following papers:
|10/17/2007||No Meeting||ICCV - Meeting Cancelled.|
|10/24/2007||No Meeting||Intel Open House. ICCV Overview postponed to next week.|
|10/31/2007||Alyosha Efros||ICCV Overview|
Inverse Shade Trees for Non-Parametric Material Representation and
Editing, by Jason Lawrence, Aner Ben-Artzi, Christopher DeCoro, Wojciech
Matusik, Hanspeter Pfister, Ravi Ramamoorthi, and Szymon Rusinkiewicz.
AppWand: Editing Measured Materials using Appearance-Driven Optimization, by Fabio Pellacini and Jason Lawrence. SIGGRAPH 2007
This meeting has been cancelled. Tom will give his talk on a later date.
3D generic object categorization, localization, and pose estimation
This paper is about building 3D part-based models of object categories. An object is composed of a collection of 2D "canonical part" images (e.g. a car's bumper) linked to each other through coordinate transforms: imagine arranging Polaroids of object parts on a sphere surrounding the object. The technique learns the model of a class in an unsupervised way from still images of instances of the class. I like this model because I think the brain represents objects in a similar way, and if I have time to prepare the information I might say something about that.
I'll be presenting a recent paper of mine from CVPR 2007:
Spacetime Geometry of Galilean Cameras
A projection model is presented for cameras moving at constant velocity (which we refer to as Galilean cameras). We introduce the concept of spacetime projection and show that perspective imaging and linear pushbroom imaging are specializations of the proposed model. The epipolar geometry between two such cameras is developed and we derive the Galilean fundamental matrix. We show how six different "fundamental" matrices can be directly recovered from the Galilean fundamental matrix including the classic fundamental matrix, the Linear Pushbroom fundamental matrix and a fundamental matrix relating Epipolar Plane Images. To estimate the parameters of this fundamental matrix and the mapping between videos in the case of planar scenes we describe linear algorithms and report experimental performance of these algorithms.
|12/5/2007||Andrew Stein||Due to the CVPR extension, there will be no meeting today.|
|12/12/2007||Everyone||"CVPR Decompression Fest"|