|1/4/2006||Winter Break||No Meeting|
|1/11/2006||Yan Ke||NIPS Overview|
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features
by Kristen Grauman & Trevor Darrell.
I'll spend a few minutes discussing the paper that a few of us were mumbling
about last week that uses the pyramid histogram kernel to model spatial
I'll spend the bulk of the time discussing the paper:
Hyperfeatures -- Multilevel Local Coding for Visual Recognition
It's been accepted as an oral for ECCV 2006. The tech report relating to the paper is here. The conference submission will be made available via email, but should not be distributed, as requested by the author.
This week, I'll present two recent papers related to "Dynamic Graph Cuts" by Pushmeet Kohli and Phil Torr from across the pond:
In (1) they exploit a simple idea to quickly compute optimal graph cuts in slowly changing energy functions for figure-ground segmentation in video. In (2) they show how to compute min-marginals associated with the label assignments for any latent variable in an MRF, and subsequently compute a useful confidence measure for label assignments in image segmentation.
I will present Kumar, Torr, and Zisserman's ICCV 2005 paper,
Learning Layered Motion Segmentations of Video
Abstract: We present an unsupervised approach for learning a generative layered representation of a scene from a video for motion segmentation. The learnt model is a composition of layers, which consist of one or more segments. Included in the model are the effects of image projection, lighting, and motion blur. The two main contributions of our method are: (i) A novel algorithm for obtaining the initial estimate of the model using efficient loopy belief propagation; (ii) Using alpha-swap and alpha-beta-expansion algorithms, which guarantee a strong local minima, for refining the initial estimate. Results are presented on several classes of objects with different types of camera motion. We compare our method with the state of the art and demonstrate significant improvements.
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class
Object Recognition and Segmentation
J. Shotton, J. Winn, C. Rother, and A. Criminisi
A link for the paper will be distributed via the email list. Please do not distribute.
|2/22/2006||Goksel Dedeoglu||"Learning Depth from Single Monocular Images", Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng, NIPS 18|
NOTE: Meeting in NSH 3001 this week!
I will present the following paper:
Parameter-Free Radial Distortion Correction with Centre of Distortion Estimation
In my presentation I am going to discuss a recent (CVPR'05) paper
"Hallucinating Faces: TensorPatch Super-Resolution and Coupled Residue Compensation".
If I have time I will also discuss some of my more recent work involving patches and faces from this year's upcoming CVPR.
|3/15/2006||Spring Break||Srinivas will be out of town, and it's spring break, so this meeting is cancelled.|
Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition
David Crandall and Daniel Huttenlocher (Cornell University)
ECCV 2006 Oral
Abstract: In this paper we investigate a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration). This method learns both a model of local part appearance and a model of the spatial relations between those parts. In contrast, other work using such a weakly supervised learning paradigm has not considered the problem of simultaneously learning appearance and spatial models. Some of these methods use a “bag” model where only part appearance is considered whereas other methods learn spatial models but only given the output of a particular feature detector. Previous techniques for learning both part appearance and spatial relations have instead used a highly supervised learning process that provides substantial information about object part location. We show that our weakly supervised technique produces better results than these previous highly supervised methods. Moreover, we investigate the degree to which both richer spatial models and richer appearance models are helpful in improving recognition performance. Our results show that while both spatial and appearance information can be useful, the effect on performance depends substantially on the particular object class and on the difficulty of the test dataset.
Stable Real-Time 3D Tracking Using Online and Offline Information
L. Vacchetti, V. Lepetit and P. Fua
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 26, Nr. 10, pp. 1391-1391, 2004.
Abstract: We propose an efficient real-time solution for tracking rigid objects in 3D using a single camera that can handle large camera displacements, drastic aspect changes, and partial occlusions. While commercial products are already available for offline camera registration, robust online tracking remains an open issue because many real-time algorithms described in the literature still lack robustness and are prone to drift and jitter.
To address these problems, we have formulated the tracking problem in terms of local bundle adjustment and have developed a method for establishing image correspondences that can equally well handle short and wide-baseline matching. We then can merge the information from preceding frames with that provided by a very limited number of keyframes created during a training stage, which results in a real-time tracker that does not jitter or drift and can deal with significant aspect changes.
Distance Metric Learning for Large Margin Nearest Neighbor Classification
by Kilian Q. Weinberger, John Blitzer and Lawrence K. Saul
Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
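For anyone who wants a concrete picture before the meeting, here is a rough sketch of the kind of objective the abstract describes: a learned linear map defines the distance, same-class "target neighbors" are pulled together, and differently labeled points are pushed away with a hinge penalty. This only evaluates such a loss for a fixed map; it does not solve the semidefinite program, and the function names, the `margin` and `c` parameters, and the toy data are assumptions made for illustration rather than details taken from the paper.

```python
def squared_mahalanobis(L, x, y):
    """Squared distance ||L (x - y)||^2, i.e. (x - y)^T M (x - y) with M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return sum(p * p for p in projected)


def large_margin_knn_loss(L, points, labels, target_neighbors, margin=1.0, c=1.0):
    """Evaluate a pull + hinge-push loss for a fixed linear map L (illustrative only).

    target_neighbors is a list of (i, j) index pairs where j is a same-class
    neighbor that point i should end up close to.
    """
    pull = 0.0
    push = 0.0
    for i, j in target_neighbors:
        d_ij = squared_mahalanobis(L, points[i], points[j])
        pull += d_ij
        for k, label in enumerate(labels):
            if label != labels[i]:
                d_ik = squared_mahalanobis(L, points[i], points[k])
                # Hinge: penalize only if the differently labeled point k is
                # not at least `margin` farther from i than the neighbor j.
                push += max(0.0, margin + d_ij - d_ik)
    return pull + c * push


# Toy usage with the identity map (plain squared Euclidean distance).
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_points = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
toy_labels = [0, 0, 1]
toy_loss = large_margin_knn_loss(identity, toy_points, toy_labels, [(0, 1)])
```

In the toy data the differently labeled point sits between the two same-class points, so the hinge term is active; moving it far away leaves only the pull term.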
I'll be talking about the NIPS '05 paper:
Describing Visual Scenes using Transformed Dirichlet Processes
Abstract: Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object–centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images.
I'll be presenting the CVPR 2006 paper titled
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects, by John Winn and Jamie Shotton.
Abstract: This paper addresses the problem of detecting and segmenting partially occluded objects of a known category. We first define a part labelling which densely covers the object. Our Layout Consistent Random Field (LayoutCRF) model then imposes asymmetric local spatial constraints on these labels to ensure the consistent layout of parts whilst allowing for object deformation. Arbitrary occlusions of the object are handled by avoiding the assumption that the whole object is visible. The resulting system is both efficient to train and to apply to novel images, due to a novel annealed layout-consistent expansion move algorithm paired with a randomised decision tree classifier. We apply our technique to images of cars and faces and demonstrate state-of-the-art detection and segmentation performance even in the presence of partial occlusion.
Data-driven scale selection and data structure for mobile robot perception
This talk presents research work I have been involved in during my master's, and serves as the speaking requirement for the degree.
Autonomous robot navigation in terrain containing vegetation remains a considerable challenge because of the difficulty in capturing the variability of such complex environments. Usual perception techniques that rely on a 2-D map of the terrain fail to capture three-dimensional details, such as overhanging obstacles. In this talk, we will present an approach that enables robotic navigation in complex, 3-D environments.
This presentation will be divided into three sections. First, we present an overview of our approach, which generates a detailed 3-D semantic representation of the environment using only 3-D data from a laser range sensor. The approach relies on point-wise classification based on the extraction of local geometric features taken over a region of interest around each point. This is subject to two main problems: the approach is computationally expensive, and the size of the region of interest is determined manually.
In the two following sections, we propose solutions to each of these problems. First, we present an efficient data structure and algorithm that allow a 4x speedup of range search, a critical operation that lies at the core of our approach but can also be used in other applications. Second, we introduce an automatic scale selection technique that improves classification accuracy for point-sampled surfaces.
Here are some related papers:
I will be presenting the following CVPR'06 paper:
Abstract: Dimensionality reduction involves mapping a set of high dimensional input points onto a low dimensional manifold so that “similar” points in input space are mapped to nearby points on the manifold. Most existing techniques for solving the problem suffer from two drawbacks. First, most of them depend on a meaningful and computable distance metric in input space. Second, they do not compute a “function” that can accurately map new input samples whose relationship to the training data is unknown. We present a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent non-linear function that maps the data evenly to the output manifold. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space. The method can learn mappings that are invariant to certain transformations of the inputs, as is demonstrated with a number of experiments. Comparisons are made to other techniques, in particular LLE.
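As a concrete handle on "learning relies solely on neighborhood relationships", here is a small sketch of a contrastive-style pairwise loss on output-space points: pairs flagged as neighbors are penalized for being far apart, and non-neighbor pairs are penalized only when they fall inside a margin. The exact functional form, the `margin` parameter, and the function names are assumptions for illustration; check the paper for the loss the authors actually use. The learned mapping itself (the function that produces these outputs) is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two output-space points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_pair_loss(out_a, out_b, is_neighbor, margin=1.0):
    """Loss for one pair of mapped points (illustrative form, not from the paper).

    Neighbor pairs pay a quadratic cost in their distance; non-neighbor pairs
    pay a cost only while they are closer than `margin`.
    """
    d = euclidean(out_a, out_b)
    if is_neighbor:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2


def total_loss(outputs, pairs, margin=1.0):
    """Sum the pairwise loss over (i, j, is_neighbor) triples."""
    return sum(
        contrastive_pair_loss(outputs[i], outputs[j], flag, margin)
        for i, j, flag in pairs
    )
```

Note that only the neighbor flags enter the loss, so no distance measure in the input space is needed to evaluate it.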
I'm going to talk primarily about the CVPR 2006 paper:
Incremental learning of object detectors using a visual shape alphabet, by Opelt, Pinz, and Zisserman.
I will also be describing the boundary fragment model that they use
which is outlined in the ECCV 2006 paper:
Here is the abstract from the CVPR paper:
|5/24/2006||Black Friday||No meeting.|
|5/31/2006||Sanjeev Koppal||I'm presenting the CVPR 2006 paper, A planar light probe|
I am in the process of preparing an invited survey paper on the topic of "Computational Symmetry" for the new journal "Foundations and Trends in Computer Graphics and Vision". I am going to give a summary talk on the formal, mathematical definition of types of symmetry, symmetry groups and computational symmetry, their relevance to CV and CG, a sample of previous work, current challenges and future directions. I am looking for your feedback on clarity and completeness. Here's a quote from an upcoming SIGGRAPH paper on the relevance of symmetry to get you started:
Symmetry is an essential and ubiquitous concept in nature, science and art. For example, in geometry, the Erlanger program of Felix Klein has fueled for over a century mathematicians' interest in invariance under certain group actions as a key principle for understanding geometric spaces. Numerous biological, physical, or man-made structures exhibit symmetries as a fundamental design principle or as an essential aspect of their function. Whether by evolution or by design, symmetry implies certain economies and efficiencies of structure that make it universally appealing. Symmetry also plays an important role in human visual perception and aesthetics. Arguably much of the understanding of the world around us is based on the perception and recognition of shared or repeated structures, and so is our sense of beauty. [Mitra, Guibas, Pauly]
We haven't been reading many mid-level vision papers lately. So, I will
expose my West Coast bias and present:
Figure/Ground Assignment in Natural Images.
|6/21/2006||CVPR 2006||No meeting.|
CVPR Overview, Part I
Srinivas will cover a few papers that caught his eye at CVPR 2006 and we will organize other overview presenters for the following week.
CVPR Overview, Part II
Various presenters will give short overviews of papers from CVPR 2006:
For my part of the CVPR'06 overview, I'll cover:
Discriminative Object Class Models of Appearance and Shape by Correlatons
Caroline will cover the paper she didn't get to last week:
Shape Guided Object Segmentation
and Dave will briefly go over the multiscale aggregation paper of Galun et al. referenced as  in the Borenstein paper. It provides the initial low-level segmentation and the multiscale segmentation prior.
Noise Estimation from a Single Image by Ce Liu,
William T. Freeman, Richard Szeliski and Sing Bing Kang, since it's a
good paper and it seems that not too many misc-readers were able to
attend its oral presentation at CVPR.
Abstract: In order to work consistently across images, many computer vision algorithms require that their parameters be adjusted according to the image noise level, making it an important quantity to estimate. We show how to estimate an upper bound on the noise level from a single image based on a piecewise smooth image prior model and measured CCD camera response functions. We illustrate the utility of this noise estimation for two algorithms: edge detection and feature preserving smoothing through bilateral filtering. For a variety of different noise levels, we obtain good results for both these algorithms with no user-specified inputs.
I'll be presenting the following CVPR'06 paper, which uses a Bayes Net to
label identities in multi-target tracking problems, where targets may
interact or occlude one another.
Multi-Target Tracking -- Linking Identities using Bayesian Network Inference
|8/9/2006||Fernando de la Torre||
The papers we will enjoy on Wednesday will be:
Happy reading and understanding.
Keypoint Recognition Using Randomized Trees
Lepetit, V. and Fua, P.
PAMI, Sept. 2006
They formulate the matching task between keypoints in the training and testing images as a classification problem, using randomized trees.
Abstract: In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the resulting algorithm is robust, accurate, and fast-enough for frame-rate performance. This reduction in runtime computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results, in general, which usually is very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, nonplanar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.
I am presenting the following two BMVC 2006 papers, time permitting:
Finding people in repeated shots of the same scene
The goal of this work is to find all occurrences of a particular person in a sequence of photographs taken over a short period of time. For identification, we assume each individual’s hair and clothing stays the same throughout the sequence. Even with these assumptions, the task remains challenging as people can move around, change their pose and scale, and partially occlude each other.
We propose a two stage method. First, individuals are identified by clustering frontal face detections using color clothing information. Second, a color based pictorial structure model is used to find occurrences of each person in images where their frontal face detection was missed. Two extensions improving the pictorial structure detections are also described. In the first extension, we obtain a better clothing segmentation to improve the accuracy of the clothing color model. In the second extension, we simultaneously consider multiple detection hypotheses of all people potentially present in the shot.
Our results show that people can be re-detected in images where they do not face the camera. Results are presented on several sequences from a personal photo collection.
2. Patch-based Object Recognition Using Discriminatively Trained Gaussian Mixtures
Hegerath, Deselaers, Ney
We present an approach using Gaussian mixture models for part-based object recognition where spatial relationships of the parts are explicitly modeled and parameters of the generative model are tuned discriminatively. These extensions lead to great improvements of the classification accuracy. Furthermore we evaluate several improvements over our baseline system which incrementally improve the obtained results, which compare favorably to other published results for the three Caltech tasks and the PASCAL evaluation 05 tasks.
I'll be presenting
Object Categorization by Learned Universal Visual Dictionary
from ICCV 2005. I'll try to relate this work to others from
the same authors, such as
which have been previously presented in the MISC reading group by Derek and Caroline respectively.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei.
Rethinking the Prior Model for Stereo
Hiroshi Ishikawa and Davi Geiger, ECCV 2006.
|9/20/2006||Andrew Stein||BMVC Overview|
I have decided to change the presented
subject to what is apparently a very hot paper in segmentation:
Boundary Extraction in Natural Images Using Ultrametric Contour Maps
with additional results here.
I will be presenting a method for fully automated
calibration of lens distortion and camera intrinsics.
We use structured-light patterns displayed on an LCD to generate a
dense map between the display and the image coordinate systems.
This approach allows us to easily correct the distortion
even around the edge of a camera image with sub-pixel accuracy,
without assuming any model of lens distortion.
I haven't published this work at any conference or in any journal, and I am still wondering whether I can claim it as novel. Recently I found several works that are closely related (or fundamentally equivalent) to ours. One of them is here.
***Note: we'll be in the Clemente room at Intel this week!***
I'll go into detail on Particle Video: Long-Range Motion Estimation using Point Trajectories, by Peter Sand and Seth Teller from CVPR'06. Dave Tolliver presented a quick overview in the CVPR review meeting.
This paper describes a new approach to motion estimation in video. We represent video motion using a set of particles. Each particle is an image point sample with a long-duration trajectory and other properties. To optimize these particles, we measure point-based matching along the particle trajectories and distortion between the particles. The resulting motion representation is useful for a variety of applications and cannot be directly obtained using existing methods such as optical flow or feature tracking. We demonstrate the algorithm on challenging real-world videos that include complex scene geometry, multiple types of occlusion, regions with low texture, and non-rigid deformations.
*** NOTE: This week's meeting will be in NSH 3001! ***
Here is the paper I'm talking about: Photometric Stereo with Nearby Planar Distributed Illuminants
*** NOTE: This week's meeting will be in NSH 3001! ***
"About natural color statistics"
Alyosha and I recently became interested in natural color statistics, that is: can we find a distribution of the colors we expect to see in natural images? If there is such a distribution, can we then parameterize it to obtain a compact representation? We could not find a paper on that exact topic, but instead found a lot of papers covering a wide range of color-related topics. In the upcoming misc-read meeting, I will present a high-level overview of the literature on natural color statistics. Topics and papers covered will be:
- D.A. Forsyth's gamut mapping
- Kobus Barnard's tutorial
- Some of Graham Finlayson's recent work at CVPR 2005
- Erik Miller's color flow
- Cohen-Or, Siggraph 2006
- Aude Oliva's color for scene recognition
You can read whichever's closest to your interests!
*** NOTE: This week's meeting will be in NSH 3001! ***
Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng. International Conference on Computer Vision and Pattern Recognition (CVPR05), San Diego, CA, June 2005.
Background reading with proofs:
*** NOTE: This week's meeting will be in NSH 4201! ***
Details sent via email.
I'll provide an overview of several recent manifold learning papers
which emphasize the role of topology rather than geometry. They are
Chan-Su Lee and Ahmed Elgammal, The 18th International Conference on Pattern Recognition (ICPR), Hong Kong, August 21-24, 2006
P. Niyogi, S. Smale, and S. Weinberger, to appear, Discrete and Computational Geometry, 2006.
G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, preprint, May 31, 2006.
A. Zomorodian and G. Carlsson, Discrete and Computational Geometry, 33 (2), pp. 247–274
|12/6/2006||All CVPR Submitters||
Anyone who submitted to CVPR, please send a link to the PDF of your submission to Andrew. We'll put them up on the projector to have a look at the figures, and everyone will have a chance to give the group a low-key, quick talk on their work. No slides necessary.
I will talk about some work that I did while at Berkeley on
structure-from-motion without reliable correspondences, or with correspondence
inlier rates which may be smaller than 1%. With Ameesh Makadia and
Kostas Daniilidis, we proposed a method which uses a Radon transform to
compute cost functions in the full five-dimensional space of
relative motions between two cameras (up to scale). There was a
beautiful underlying theory relating the idea that the manifold of
essential matrices is a so-called homogeneous space, which admits a
Fourier transform, thereby allowing for efficient computation relative
to a brute force implementation of a Radon transform.
For more information see:
|12/20/2006||Black "Friday"||No Meeting|
|12/27/2006||Winter Break||No Meeting|