|1/5/2005||Winter Break & WACV||No Meeting|
I'll be presenting
Simultaneous Object Recognition and Segmentation by Image Exploration by Ferrari,
Tuytelaars, and Van Gool from ECCV 2004.
Daniel briefly mentioned this paper in his ECCV 2004 overview, but I think it deserves a little more attention. Also, Martial has been trying to get someone to present it for a while.
Abstract: Methods based on local, viewpoint invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel Object Recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. The experimental results demonstrate the stronger power of the presented method in dealing with extensive clutter, dominant occlusion, large scale and viewpoint changes. Moreover non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. The approach can extend any viewpoint invariant feature extractor.
|We will give a brief, high-level overview of some papers from WACV 2005 (and MOTION).|
The paper I will present is an IJCV January 2005 paper by
Pedro F. Felzenszwalb and Daniel Huttenlocher called:
Pictorial Structures for Object Recognition
They present a computationally efficient framework for part-based modeling and recognition of objects, where objects are represented as a collection of parts arranged in a deformable configuration. Their work is motivated by the pictorial structure models introduced by Fischler and Elschlager.
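For intuition, here is what the pictorial-structures objective looks like for a toy two-part model. This is only a hedged sketch under my own assumptions: the function names are mine, and I use brute-force search, whereas the paper's contribution is computing exactly this kind of minimization in linear time per part using generalized distance transforms.

```python
import numpy as np

def best_two_part_placement(cost_a, cost_b, ideal_offset, k=1.0):
    """Brute-force match of a toy two-part pictorial structure on a grid.

    Minimizes cost_a[la] + cost_b[lb] + k * ||lb - la - ideal_offset||^2
    over all part placements la, lb: appearance cost per part plus a
    spring-like deformation cost on their relative position.
    """
    H, W = cost_a.shape
    best_la, best_lb, best_c = None, None, np.inf
    for la in np.ndindex(H, W):
        for lb in np.ndindex(H, W):
            d = np.subtract(lb, la) - ideal_offset   # deformation from ideal
            c = cost_a[la] + cost_b[lb] + k * float(d @ d)
            if c < best_c:
                best_la, best_lb, best_c = la, lb, c
    return best_la, best_lb, best_c
```

The brute-force double loop is O((HW)^2); the paper's distance-transform trick is what makes the same objective practical on real images.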
I'll talk about the Improved Fast Gauss Transform (IFGT), an approximation technique for computing the popular "summation of a mixture of M Gaussians at N evaluation points" in O(M+N) time instead of the O(MN) of naive evaluation, trading a controlled approximation error for lower complexity. It's a technique I'd viewed with much scepticism when it first appeared [CVPR01]. But after a battery of papers on applications in everything related to mean-shift and a demo at the last CVPR, it seems to show some promise in time-sensitive applications.
I'll try to present the main ideas of the math and collate results from some representative papers. Recommended reading is CVPR04 and/or NIPS04 (for something that's not mean-shift):
Efficient Kernel Machines Using the Improved Fast Gauss Transform, C. Yang, R. Duraiswami and Larry Davis, NIPS 2004.
Real-Time Kernel-Based Tracking in Joint Feature-Spatial Spaces, C. Yang, R. Duraiswami, A. Elgammal and L. Davis, CVPR 2004 (Demo)
The Fast Gauss Transform for efficient kernel density evaluation with applications in computer vision, Ahmed Elgammal, Ramani Duraiswami and Larry Davis, PAMI 25(11):1499-1504, Nov 2003.
Efficient Non-parametric Adaptive Color Modeling Using Fast Gauss Transform, A.Elgammal, R.Duraiswami and L. S. Davis, CVPR 2001.
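For reference, the quantity all of these papers accelerate is the weighted Gaussian sum below. This sketch is only the naive O(MN) evaluation (the function name and toy data are mine); the (I)FGT gets to O(M+N) by clustering the sources and truncating a series expansion of the kernel.

```python
import numpy as np

def gauss_sum_naive(sources, weights, targets, h):
    """Naive O(M*N) evaluation of G(y_j) = sum_i w_i exp(-||y_j - x_i||^2 / h^2)."""
    # pairwise squared distances between N targets and M sources
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=2)
    return (weights[None, :] * np.exp(-d2 / h ** 2)).sum(axis=1)

# toy 1-D example: a 100-component mixture evaluated at 50 points
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))           # M = 100 sources
w = np.full(100, 1.0 / 100)             # uniform mixture weights
y = np.linspace(-3, 3, 50)[:, None]     # N = 50 evaluation points
g = gauss_sum_naive(x, w, y, h=0.5)
```

Every kernel-density, mean-shift, and kernel-machine application in the list above reduces to many evaluations of this sum, which is why the speedup matters.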
Shared Features for Scalable Appearance-Based Object Recognition
E. Murphy-Chutorian, J. Triesch
I'll be presenting the above paper on using shared features for object recognition. Objects are characterized as sets of discrete features (previously extracted from a continuous feature space) with locations relative to the object's center. Multiple object models can share the same features or, individually, use the same feature in multiple locations. By using the same lexicon across the object models, the task of detecting model components is shared across the object detectors, for considerable time savings.
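A minimal sketch of the amortization idea (the data layout and names here are my own, not the paper's): each shared codeword is detected once in the image, and every model that uses it then casts Hough-style votes for its object center.

```python
from collections import defaultdict

def vote_for_centers(detections, models):
    """Toy shared-feature voting for object centers.

    detections: list of (feature_id, (x, y)) found once in the image.
    models: dict model_name -> list of (feature_id, (dx, dy)), the offset
            from the feature to that model's object center.
    """
    # invert the models once: feature_id -> [(model, offset), ...]
    index = defaultdict(list)
    for name, feats in models.items():
        for fid, off in feats:
            index[fid].append((name, off))
    # each detection votes in every model that shares its feature,
    # so detection cost is amortized across all object detectors
    votes = defaultdict(int)  # (model, center) -> vote count
    for fid, (x, y) in detections:
        for name, (dx, dy) in index[fid]:
            votes[(name, (x + dx, y + dy))] += 1
    return votes
```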
I will talk about the following paper from CVPR 04. It is about
matching point sets without explicit correspondences; its nearest
relative among prior approaches is probably robust point matching (RPM).
Joan Glaunes, Alain Trouve, Laurent Younes. Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching.
I will present an IJCV 2004 paper:
On Symmetry and Multiple-View Geometry: Structure, Pose and Calibration from a Single Image, Wei Hong, Allen Yang Yang, Kun Huang and Yi Ma, IJCV 60(3), 241-265, 2004.
Usually a single image provides insufficient information to reconstruct 3-D structure. Symmetric objects, however, contain strong cues to their orientation and position. This paper discusses how symmetry, including reflective, rotational, and translational symmetry, helps in recovering 3-D pose and structure.
|3/2/2005||Fernando de la Torre||
I will present:
and if there is time and I can understand it:
Enjoy the reading...
I want to discuss a NIPS04 paper by Max Welling et al.:
They propose an undirected two-layer graphical model as an alternative to widely used directed two-layer models such as probabilistic PCA (pPCA), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). They also study a specific instance of the model, similar to LSI, for dimensionality reduction. The model is applicable to dimensionality-reduction tasks other than document retrieval.
Detecting, Localizing, and Recovering Kinematics of Textured Animals
Deva Ramanan and D. A. Forsyth and Kobus Barnard
Abstract: We develop and demonstrate an object recognition system capable of accurately detecting, localizing, and recovering the kinematic configuration of textured animals in real images. We build a deformation model of shape automatically from videos of animals and an appearance model of texture from a labeled collection of animal images, and combine the two models automatically. We develop a simple texture descriptor that outperforms the state of the art. We test our animal models on two datasets; images taken by professional photographers from the Corel collection, and assorted images from the web returned by Google. We demonstrate quite good performance on both datasets. Comparing our results with simple baselines, we show that for the Google set, we can recognize objects from a collection demonstrably hard for object recognition.
|3/23/2005||Cancelled||This week's meeting has been cancelled in favor of attending the Stuart Russell lecture in NSH 3305, also at 3:00pm.|
|3/30/2005||Cancelled||Once again, this week's meeting has been cancelled in favor of attending Michael Cohen's presentation at the Graphics seminar in NSH 3305, starting at 3:30pm.|
I'll be presenting
An MCMC-based Particle Filter for Tracking Multiple Interacting Targets
by Zia Khan, Tucker Balch, and Frank Dellaert, from ECCV'04. Fernando
presented their CVPR'04 work on particle filters for eigentracking of bees
last fall; this paper takes a new approach to motion models for
interacting targets (ants).
Also, here's an earlier version from IROS'03 if you're interested.
If I have time, I'll also try to give some background on multiple target tracking.
I'll discuss recent work from Anthony Lobay and David Forsyth on "shape from
texture." I'll relate this to work I've done with Yanxi Liu and Steve Lin
which can also be seen as a "shape from texture" method. I'll then discuss
our attempts to automate our methods and show some preliminary results.
Our prior work, probably already familiar to most of you:
|4/20/2005||Yan Ke||I'll be presenting a video retrieval paper by DeMenthon and Doermann. The paper looks long, but the reading is light. They use hierarchical mean shift to extract "video strands" in short segments of video. They then build and index descriptors for the video strands to facilitate efficient retrieval of near-duplicate video.|
|4/27/2005||Derek Hoiem||I will present our work on automatic single-view reconstruction of outdoor scenes. The paper is here. We are in the final stages of revision for the camera-ready, so any comments on how to improve the paper would be welcome. I will also present results from our ICCV submission, including quantitative results and an application to object detection.|
I'll present Smith, Drummond, and Cipolla's work on segmentation of video into motion layers by tracking edges between frames. The main idea is to find and track Canny edges between frames and then use EM to assign each edge to a motion model. The edges are also used to segment the image into color regions and then those regions are assigned to layers based on the probabilities of their associated edges' belonging to each motion model. Integral to the entire process is also the determination of the relative depth ordering of the various motion layers.
While the majority of the discussion will center on two-frame, two-layer segmentation, the paper does extend to multiple frames and multiple layers (results are shown on up to 3 layers). Computationally, however, the cost grows exponentially with the number of layers.
I may also run a demo of their code (which is conveniently available online).
Layered Motion Segmentation and Depth Ordering by Tracking Edges
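The edge-to-layer assignment step can be caricatured in one dimension: given a displacement measurement per tracked edge, EM alternates between soft-assigning edges to two translational motion models and refitting each model from its edges. This is a toy sketch under my own simplifications; the paper tracks Canny edges across frames and uses richer motion models plus depth ordering.

```python
import numpy as np

def em_two_motions(disp, iters=20, sigma=1.0):
    """Soft-assign 1-D edge displacements to two translational motions via EM."""
    disp = np.asarray(disp, dtype=float)
    m = np.array([disp.min(), disp.max()])    # crude initial motion estimates
    for _ in range(iters):
        # E-step: responsibility of each motion model for each edge
        lik = np.exp(-(disp[:, None] - m[None, :]) ** 2 / (2 * sigma ** 2))
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: refit each motion as the responsibility-weighted mean
        m = (r * disp[:, None]).sum(axis=0) / r.sum(axis=0)
    return m, r
```

In the paper the same alternation runs over edge tracks rather than scalars, and the resulting soft edge assignments are what propagate to the color regions.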
|5/11/2005||Goksel Dedeoglu||Links/details will be provided via email.|
|5/18/2005||Black Friday||Meeting Cancelled|
I will go through the following paper (from ECCV 2004):
What Do Four Points in Two Calibrated Images Tell Us About the Epipoles?
Abstract: Suppose that two perspective views of four world points are given, that the intrinsic parameters are known, but the camera poses and the world point positions are not. We prove that the epipole in each view is then constrained to lie on a curve of degree ten. We give the equation for the curve and establish many of the curve’s properties. For example, we show that the curve has four branches through each of the image points and that it has four additional points on each conic of the pencil of conics through the four image points. We show how to compute the four curve points on each conic in closed form. We show that orientation constraints allow only parts of the curve and find that there are impossible configurations of four corresponding point pairs. We give a novel algorithm that solves for the essential matrix given three corresponding points and one epipole. We then use the theory to describe a solution, using a 1-parameter search, to the notoriously difficult problem of solving for the pose of three views given four corresponding points.
|6/1/2005||VASC Job Talk||
The misc-read meeting today is CANCELLED. Go to the VASC Job Talk for Vassilis Athitsos
in NSH 1305 at 3:30pm instead.
Similarity Measures and Indexing Methods for Gesture Recognition
This talk presents nearest neighbor-based methods for static and dynamic gesture recognition. Nearest neighbor classifiers are appealing in many pattern recognition domains because of their simplicity and their ability to model data that follows complex, non-parametric distributions. A fundamental choice in designing a nearest neighbor classifier is the selection of a similarity measure. A frequent problem in similarity measures used for gesture recognition is that they operate on features that are hard to extract reliably in uncontrolled environments, even using state-of-the-art algorithms. We present similarity measures that can tolerate a certain amount of inaccuracy in the feature extraction process, and we demonstrate that these similarity measures are more appropriate in realistic situations where traditional methods are not applicable.
In order to design a practical nearest neighbor classifier, we need a retrieval algorithm that can efficiently identify the nearest neighbors of test objects. The similarity measures used in gesture recognition are often non-Euclidean and even non-metric, thus preventing the use of traditional database techniques. We introduce an embedding-based indexing method that can significantly improve retrieval efficiency in the case of static gesture recognition. The key novelty of our method is that it poses embedding construction as a machine learning task, and uses a formulation that does not assume Euclidean or metric properties. Experiments with real hand images demonstrate the advantages of our method over alternative indexing approaches.
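The filter-and-refine pattern behind embedding-based indexing can be sketched generically. Note the hedge: this is a plain reference-object (Lipschitz-style) embedding with names of my own invention; the talk's contribution is learning the embedding as a machine-learning task, without assuming Euclidean or metric properties of `dist`.

```python
def embed(obj, references, dist):
    """Map an object into R^k via its distances to k reference objects."""
    return [dist(obj, r) for r in references]

def filter_and_refine(query, db, references, dist, k=5):
    """Filter candidates cheaply in the embedded (Euclidean) space,
    then refine the short list with the exact, possibly non-metric,
    similarity measure."""
    qe = embed(query, references, dist)
    def edist(e):  # Euclidean distance in the embedded space
        return sum((a - b) ** 2 for a, b in zip(qe, e)) ** 0.5
    shortlist = sorted(db, key=lambda o: edist(embed(o, references, dist)))[:k]
    return min(shortlist, key=lambda o: dist(query, o))
```

The payoff is that the expensive measure is evaluated only k times per query instead of once per database object.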
NOTE: This talk will start at 3:30pm to allow for people to attend
Taku Osada's informal talk on "The Future and Design of New Asimo" (in NSH 1507 at 3:00pm)
if they want.
I will discuss a couple of simple but cute papers in CVPR 2005 dealing with processing spatio-temporal volumes. Here they are:
Space-Time Behavior Based Correlation
Learning spatiotemporal T-junctions for occlusion detection
(I will mainly concentrate on the first paper, the second is just for dessert :)
|6/15/2005||VASC Talk||Meeting cancelled for VASC Seminar instead.|
I will be presenting the paper
Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance
by Ikeuchi et al from PAMI 2004.
|July - August||No Meetings||Since so many people will be away during the summer months, meetings will be suspended for July and August until Fall semester.|
I am presenting the following paper:
Detecting Irregularities in Images and Video
The abstract is as follows:
We address the problem of detecting irregularities in visual data, e.g., detecting suspicious behaviors in video sequences, or identifying salient patterns in images. The term "irregular" depends on the context in which the "regular" or "valid" are defined. Yet, it is not realistic to expect explicit definition of all possible valid configurations for a given context. We pose the problem of determining the validity of visual data as a process of constructing a puzzle: We try to compose a new observed image region or a new video segment ("the query") using chunks of data ("pieces of puzzle") extracted from previous visual examples ("the database"). Regions in the observed data which can be composed using large contiguous chunks of data from the database are considered very likely, whereas regions in the observed data which cannot be composed from the database (or can be composed, but only using small fragmented pieces) are regarded as unlikely/suspicious. The problem is posed as an inference process in a probabilistic graphical model. We show applications of this approach to identifying saliency in images and video, and for suspicious behavior recognition.
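To make the "compose the query from database pieces" idea concrete, here is a stripped-down patch-level version. The names and the nearest-neighbor shortcut are mine; the actual paper rewards large contiguous chunks and poses the composition as inference in a probabilistic graphical model.

```python
import numpy as np

def patch_saliency(query, database, p=3):
    """Score each query patch by its distance to the nearest database patch.

    Regions that the database explains well get low scores; regions with
    no good database explanation come out as salient/suspicious.
    """
    def patches(img):
        H, W = img.shape
        return np.array([img[i:i + p, j:j + p].ravel()
                         for i in range(H - p + 1) for j in range(W - p + 1)])
    q, db = patches(query), patches(database)
    # squared distance from each query patch to every database patch
    d2 = ((q[:, None, :] - db[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1)
```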
I will be presenting the following paper, which Sanjeev was unable to present over the summer:
Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance
by Ikeuchi et al from PAMI 2004.
Learning Appearance Manifolds from Video by Rahimi, Recht and Darrell,
presented at CVPR this year.
Abstract: The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic process. We show how to learn a mapping from video frames to this low-dimensional representation by exploiting the temporal coherence between frames and supervision from a user. This function maps the frames of the video to a low-dimensional sequence that evolves according to Markovian dynamics. This ensures that the recovered low-dimensional sequence represents a physically meaningful process. We relate our algorithm to manifold learning, semi-supervised learning, and system identification, and demonstrate it on the tasks of tracking 3D rigid objects, deformable bodies, and articulated bodies. We also show how to use the inverse of this mapping to manipulate video.
I will try to relate their work to another paper also published at CVPR05:
I will talk about our paper accepted by ICCV workshop on Analysis and Modeling of Faces and Gestures 2005:
Face View Synthesis Across Large Angles
This paper is about synthesizing new views of human faces given only a single input image. Large out-of-plane rotations are involved.
|10/12/2005||Intel Open House||No regular meeting, go to Intel's Open House instead. (CIC Building, 4th Floor)|
I will present
Tracking Loose Limbed People
by Leonid Sigal,
Sidharth Bhatia, Stefan Roth, Michael Black, and Michael Isard from
CVPR'04, in which our heroes combine particle filtering with belief
propagation and eigenspace part detection to detect and track a human.
|10/26/2005||ICCV Attendees||ICCV 2005 Overview & Highlights|
We will have a bit more ICCV discussion overflowing from last week before Marek's talk.
Specular reflections and the perception of shape
R.W. Fleming, A. Torralba, E.H. Adelson
Journal of Vision 4:798-820 (2004) (Local Copy)
Abstract: Many materials, including leaves, water, plastic, and chrome exhibit specular reflections. It seems reasonable that the visual system can somehow exploit specular reflections to recover three-dimensional (3D) shape. Previous studies (e.g., J. T. Todd & E. Mingolla, 1983; J. F. Norman, J. T. Todd, & G. A. Orban, 2004) have shown that specular reflections aid shape estimation, but the relevant image information has not yet been isolated. Here we explain how specular reflections can provide reliable and accurate constraints on 3D shape. We argue that the visual system can treat specularities somewhat like textures, by using the systematic patterns of distortion across the image of a specular surface to recover 3D shape. However, there is a crucial difference between textures and specularities: In the case of textures, the image compressions depend on the first derivative of the surface depth (i.e., surface orientation), whereas in the case of specularities, the image compressions depend on the second derivative (i.e., surface curvatures). We suggest that this difference provides a cue that can help the visual system distinguish between textures and specularities, even when present simultaneously. More importantly, we show that the dependency of specular distortions on the second derivative of the surface leads to distinctive fields of image orientation as the reflected world is warped across the surface. We find that these “orientation fields” are (i) diagnostic of 3D shape, (ii) remain surprisingly stable when the world reflected in the surface is changed, and (iii) can be extracted from the image by populations of simple oriented filters. Thus the use of specular reflections for 3D shape perception is both easier and more reliable than previous computational work would suggest.
Globally optimal solutions for energy minimization in stereo vision
using reweighted belief propagation.
Talya Meltzer, Chen Yanover, Yair Weiss
|11/16/2005||Fernando de la Torre||
I will be presenting these two papers:
TemporalBoost for Event Recognition
Learning Effective Image Metrics from Few Pairwise Examples
Enjoy the reading...
|11/30/2005||James Hays||I'll present recent work with Marius, Alyosha, and Yanxi on the discovery of texture regularity.|
Probabilistic Parameter-Free Motion Detection
by Veit, Cao, Bouthemy.
I will present our recent work on distributed camera localization.
The talk title is:
Distributed localization of networked cameras
and it's joint work with Carlos Guestrin, Mark Paskin, and Rahul Sukthankar. It's based on our recent submission to IPSN (Information Processing in Sensor Networks) 2005. The abstract is attached below. Unfortunately, I don't have a link/paper for people to read, because I need to fix some issues in the paper before I make it available. So you can just relax and enjoy the show. :-)
|12/21/2005||Winter Break||No Meeting|
|12/28/2005||Winter Break||No Meeting|