Computer Vision Misc Reading Group
2005 Archived Schedule

Date Presenter Description
1/5/2005 Winter Break & WACV No Meeting
1/12/2005 Caroline Pantofaru I'll be presenting Simultaneous Object Recognition and Segmentation by Image Exploration by Ferrari, Tuytelaars, and Van Gool from ECCV 2004.

Daniel briefly mentioned this paper in his ECCV 2004 overview; however, I think it deserves a little more attention. Also, Martial has been trying to get someone to present this for a while.

Abstract: Methods based on local, viewpoint invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, due to the limited repeatability and discriminative power of the features. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel Object Recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Only very few correct initial matches suffice for reliable recognition. The experimental results demonstrate the stronger power of the presented method in dealing with extensive clutter, dominant occlusion, large scale and viewpoint changes. Moreover, non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. The approach can extend any viewpoint invariant feature extractor.

1/19/2005 Ranjith Unnikrishnan
Tom Stepleton
Andrew Stein
We will give a brief, high-level overview of some papers from WACV 2005 (and MOTION).
1/26/2005 Marius Leordeanu The paper I will present is an IJCV January 2005 paper by Pedro F. Felzenszwalb and Daniel Huttenlocher called:

Pictorial Structures for Object Recognition

They present a computationally efficient framework for part-based modeling and recognition of objects, where objects are represented as a collection of parts arranged in a deformable configuration. Their work is motivated by the pictorial structure models introduced by Fischler and Elschlager.

2/2/2005 Ranjith Unnikrishnan I'll talk about the Improved Fast Gauss Transform (IFGT), an approximation technique for computing the popular "summation of a mixture of M Gaussians at N evaluation points" in O(M+N) as opposed to O(MN) in a naive evaluation, trading off exactness for complexity. It's a technique I'd viewed with much scepticism when it first appeared [CVPR01]. But after a battery of papers on applications in everything related to mean-shift and a demo at the last CVPR, it seems to show some promise in time-sensitive applications.
I'll try to present the main ideas of the math, and collate their results from some representative papers. Recommended reading is CVPR04 and/or NIPS04 (for something that's not mean-shift):

Efficient Kernel Machines Using the Improved Fast Gauss Transform, C. Yang, R. Duraiswami and Larry Davis, NIPS 2004.

Real-Time Kernel-Based Tracking in Joint Feature-Spatial Spaces, C. Yang, R. Duraiswami, A. Elgammal and L. Davis, CVPR 2004 (Demo)

The Fast Gauss Transform for efficient kernel density evaluation with applications in computer vision, Ahmed Elgammal, Ramani Duraiswami and Larry Davis, PAMI, vol. 25, no. 11, pp. 1499-1504, Nov 2003.

Efficient Non-parametric Adaptive Color Modeling Using Fast Gauss Transform, A. Elgammal, R. Duraiswami and L. S. Davis, CVPR 2001.
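For reference, the O(MN) direct evaluation that the IFGT approximates in O(M+N) is straightforward to write down. A minimal sketch in plain Python (the function name and toy data are mine, not from any of the papers above):

```python
import math

def naive_gauss_sum(sources, weights, targets, h):
    """Direct O(M*N) evaluation of G(y_j) = sum_i w_i * exp(-||y_j - x_i||^2 / h^2).

    This is the quantity the (Improved) Fast Gauss Transform approximates
    in O(M + N) time by expanding the kernel about cluster centers,
    trading exactness for complexity.
    """
    results = []
    for y in targets:
        total = 0.0
        for x, w in zip(sources, weights):
            # Squared Euclidean distance between target y and source x.
            d2 = sum((yc - xc) ** 2 for yc, xc in zip(y, x))
            total += w * math.exp(-d2 / h ** 2)
        results.append(total)
    return results

# Toy 1-D example: two unit-weight sources at 0 and 1, evaluated at y = 0.
vals = naive_gauss_sum(sources=[(0.0,), (1.0,)], weights=[1.0, 1.0],
                       targets=[(0.0,)], h=1.0)
# vals[0] = exp(0) + exp(-1)
```

This is what makes the trade-off concrete: for kernel density estimates and mean-shift, M and N are both the number of pixels or samples, so the quadratic cost is exactly what the IFGT papers attack.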

2/9/2005 Tom Stepleton Shared Features for Scalable Appearance-Based Object Recognition
E. Murphy-Chutorian, J. Triesch
WACV 2005

I'll be presenting the above paper on using shared features for object recognition. Objects are characterized as sets of discrete features (previously extracted from a continuous feature space) with locations relative to the object's center. Multiple object models can share the same features or, individually, use the same feature in multiple locations. Because the object models share the same lexicon, the task of detecting model components is shared across the object detectors, yielding considerable time savings.

2/16/2005 Owen Carmichael I will talk about the following paper from CVPR 04. It is about matching point sets without explicit correspondences; its closest relative among existing approaches is probably robust point matching (RPM) by Rangarajan et al.

Joan Glaunes, Alain Trouve, Laurent Younes. Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching.

2/23/2005 Jiang Ni I will present an IJCV 2004 paper:
On Symmetry and Multiple-View Geometry: Structure, Pose and Calibration from a Single Image, Wei Hong, Allen Yang Yang, Kun Huang and Yi Ma, IJCV 60(3), 241-265, 2004.

Usually a single image provides insufficient information to reconstruct the 3-D structure. However, symmetric objects contain overwhelming clues to their orientation and position. This paper talks about how symmetry, including reflective, rotational and translational symmetry, helps in recovering the 3-D pose and structure.

3/2/2005 Fernando de la Torre I will present:

A direct formulation for sparse PCA using semidefinite programming

and if there is time and I can understand it:

Algebraic Set Kernels with Application to Inference Over Local Image Representations.

Enjoy the reading...

3/9/2005 Rong Yan I want to discuss a NIPS04 paper written by Max Welling et al.:

Exponential Family Harmoniums with an Application to Information Retrieval

They propose an undirected two-layer graphical model as an alternative to widely used directed two-layer graphical models such as probabilistic PCA (pPCA), Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). They also study a specific model setting, similar to LSI, for the purpose of dimensionality reduction. The model is applicable to dimensionality reduction tasks beyond document retrieval.

3/16/2005 Dave Tolliver Detecting, Localizing, and Recovering Kinematics of Textured Animals
Deva Ramanan and D. A. Forsyth and Kobus Barnard
CVPR 2005

Abstract: We develop and demonstrate an object recognition system capable of accurately detecting, localizing, and recovering the kinematic configuration of textured animals in real images. We build a deformation model of shape automatically from videos of animals and an appearance model of texture from a labeled collection of animal images, and combine the two models automatically. We develop a simple texture descriptor that outperforms the state of the art. We test our animal models on two datasets; images taken by professional photographers from the Corel collection, and assorted images from the web returned by Google. We demonstrate quite good performance on both datasets. Comparing our results with simple baselines, we show that for the Google set, we can recognize objects from a collection demonstrably hard for object recognition.

3/23/2005 Cancelled This week's meeting has been cancelled in favor of attending the Stuart Russell lecture in NSH 3305, also at 3:00pm.
3/30/2005 Cancelled Once again, this week's meeting has been cancelled in favor of attending Michael Cohen's presentation at the Graphics seminar in NSH 3305, starting at 3:30pm.
4/6/2005 Jake Sprouse I'll be presenting An MCMC-based Particle Filter for Tracking Multiple Interacting Targets by Zia Khan, Tucker Balch, and Frank Dellaert, from ECCV'04. Fernando presented their CVPR'04 work on particle filters for eigentracking (of bees) last fall; this paper has a new take on motion models for interacting targets (ants).

Also, here's an earlier version from IROS'03 if you're interested.

If I have time, I'll also try to give some background on multiple target tracking.

4/13/2005 James Hays I'll discuss recent work from Anthony Lobay and David Forsyth on "shape from texture." I'll relate this to work I've done with Yanxi Liu and Steve Lin which can also be seen as a "shape from texture" method. I'll then discuss our attempts to automate our methods and show some preliminary results.

Recovering shape and irradiance maps from rich dense texton fields
Anthony Lobay and D.A. Forsyth. Proceedings of Computer Vision and Pattern Recognition (CVPR), Washington DC, June 2004.
(Journal version of the same work)

Our prior work, probably already familiar to most of you:
Near-Regular Texture Analysis and Manipulation
Yanxi Liu, Wen-Chieh Lin, and James H. Hays. ACM Transactions on Graphics (SIGGRAPH 2004), 23(3), August 2004.

4/20/2005 Yan Ke I'll be presenting a video retrieval paper by DeMenthon and Doermann. The paper looks long, but the reading is light. They use hierarchical mean shift to extract "video strands" in short segments of video. They then build and index descriptors for the video strands to facilitate efficient retrieval of near-duplicate video.

Paper here

4/27/2005 Derek Hoiem I will present our work on automatic single-view reconstruction of outdoor scenes. The paper is here. We are in the final stages of revision for the camera-ready, so any comments on how to improve the paper would be welcome. I will also present results from our ICCV submission, including quantitative results and an application to object detection.
5/4/2005 Andrew Stein I'll present Smith, Drummond, and Cipolla's work on segmentation of video into motion layers by tracking edges between frames. The main idea is to find and track Canny edges between frames and then use EM to assign each edge to a motion model. The edges are also used to segment the image into color regions and then those regions are assigned to layers based on the probabilities of their associated edges' belonging to each motion model. Integral to the entire process is also the determination of the relative depth ordering of the various motion layers.

While the majority of the discussion will center on two-frame, two-layer segmentation, the paper does provide for extension to multiple frames and multiple layers (results are shown on up to 3 layers). Computationally, however, things get exponentially more complex as the number of layers increases.

I may also run a demo of their code (which is conveniently available online).

Layered Motion Segmentation and Depth Ordering by Tracking Edges
P. Smith, T. Drummond, and R. Cipolla, PAMI 2004
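To make the EM assignment step above concrete, here is a toy sketch. Everything in it is a deliberate simplification of my own: Smith, Drummond, and Cipolla use full image motion models and edge-based likelihoods, whereas this reduces each tracked edge to a 2-D displacement and each layer to a pure translation.

```python
import math

def em_two_layer(displacements, iters=20, sigma=0.5):
    """Toy EM: assign tracked edge displacements (dx, dy) to one of two
    translational motion models (hypothetical simplification of the paper)."""
    # Initialize the two translations from the first and last samples.
    mus = [displacements[0], displacements[-1]]
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each motion model for each edge.
        resp = []
        for d in displacements:
            lik = [math.exp(-((d[0] - m[0]) ** 2 + (d[1] - m[1]) ** 2)
                            / (2 * sigma ** 2)) for m in mus]
            z = sum(lik) or 1e-12
            resp.append([l / z for l in lik])
        # M-step: re-estimate each translation as a responsibility-weighted mean.
        for k in range(2):
            w = sum(r[k] for r in resp) or 1e-12
            mus[k] = (sum(r[k] * d[0] for r, d in zip(resp, displacements)) / w,
                      sum(r[k] * d[1] for r, d in zip(resp, displacements)) / w)
    return mus, resp

# Edges moving right vs. edges moving up separate cleanly into two layers.
mus, resp = em_two_layer([(1.0, 0.0), (1.1, 0.0), (0.0, 1.0), (0.0, 0.9)])
```

The soft responsibilities are exactly what the paper then feeds forward: color regions inherit layer assignments from the probabilities of their associated edges.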

5/11/2005 Goksel Dedeoglu Links/details will be provided via email.
5/18/2005 Black Friday Meeting Cancelled
5/25/2005 Qifa Ke I will go through the following paper (from ECCV 2004):

What Do Four Points in Two Calibrated Images Tell Us About the Epipoles?
David Nister and Frederik Schaffalitzky

Abstract: Suppose that two perspective views of four world points are given, that the intrinsic parameters are known, but the camera poses and the world point positions are not. We prove that the epipole in each view is then constrained to lie on a curve of degree ten. We give the equation for the curve and establish many of the curve’s properties. For example, we show that the curve has four branches through each of the image points and that it has four additional points on each conic of the pencil of conics through the four image points. We show how to compute the four curve points on each conic in closed form. We show that orientation constraints allow only parts of the curve and find that there are impossible configurations of four corresponding point pairs. We give a novel algorithm that solves for the essential matrix given three corresponding points and one epipole. We then use the theory to describe a solution, using a 1-parameter search, to the notoriously difficult problem of solving for the pose of three views given four corresponding points.

6/1/2005 VASC Job Talk The misc-read meeting today is CANCELLED. Go to the VASC Job Talk for Vassilis Athitsos in NSH 1305 at 3:30pm instead.

Similarity Measures and Indexing Methods for Gesture Recognition

This talk presents nearest neighbor-based methods for static and dynamic gesture recognition. Nearest neighbor classifiers are appealing in many pattern recognition domains because of their simplicity and their ability to model data that follows complex, non-parametric distributions. A fundamental choice in designing a nearest neighbor classifier is the selection of a similarity measure. A frequent problem in similarity measures used for gesture recognition is that they operate on features that are hard to extract reliably in uncontrolled environments, even using state-of-the-art algorithms. We present similarity measures that can tolerate a certain amount of inaccuracy in the feature extraction process, and we demonstrate that these similarity measures are more appropriate in realistic situations where traditional methods are not applicable.

In order to design a practical nearest neighbor classifier, we need a retrieval algorithm that can efficiently identify the nearest neighbors of test objects. The similarity measures used in gesture recognition are often non-Euclidean and even non-metric, thus preventing the use of traditional database techniques. We introduce an embedding-based indexing method that can significantly improve retrieval efficiency in the case of static gesture recognition. The key novelty of our method is that it poses embedding construction as a machine learning task, and uses a formulation that does not assume Euclidean or metric properties. Experiments with real hand images demonstrate the advantages of our method over alternative indexing approaches.

6/8/2005 Alyosha Efros NOTE: This talk will start at 3:30pm to allow people to attend Taku Osada's informal talk on "The Future and Design of New Asimo" (in NSH 1507 at 3:00pm) if they want.

I will discuss a couple of simple but cute papers in CVPR 2005 dealing with processing spatio-temporal volumes. Here they are:

Space-Time Behavior Based Correlation
Eli Shechtman and Michal Irani

Learning spatiotemporal T-junctions for occlusion detection
Nicholas Apostoloff and Andrew Fitzgibbon

(I will mainly concentrate on the first paper, the second is just for dessert :)

6/15/2005 VASC Talk Meeting cancelled for VASC Seminar instead.
6/22/2005 CVPR Meeting Cancelled
6/29/2005 Sanjeev Koppal I will be presenting the paper
Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance
by Ikeuchi et al from PAMI 2004.
July - August No Meetings Since so many people will be away during the summer months, meetings will be suspended for July and August until Fall semester.
9/7/2005 Ankur Datta I am presenting the following paper:

Detecting Irregularities in Images and Video
by Oren Boiman and Michal Irani. It was accepted at ICCV 2005 for Oral Presentation.

The abstract is as follows:

We address the problem of detecting irregularities in visual data, e.g., detecting suspicious behaviors in video sequences, or identifying salient patterns in images. The term "irregular" depends on the context in which the "regular" or "valid" are defined. Yet, it is not realistic to expect explicit definition of all possible valid configurations for a given context. We pose the problem of determining the validity of visual data as a process of constructing a puzzle: We try to compose a new observed image region or a new video segment ("the query") using chunks of data ("pieces of puzzle") extracted from previous visual examples ("the database"). Regions in the observed data which can be composed using large contiguous chunks of data from the database are considered very likely, whereas regions in the observed data which cannot be composed from the database (or can be composed, but only using small fragmented pieces) are regarded as unlikely/suspicious. The problem is posed as an inference process in a probabilistic graphical model. We show applications of this approach to identifying saliency in images and video, and for suspicious behavior recognition.
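The composition idea in the abstract can be illustrated on strings. The sketch below is my own toy reduction, not the authors' graphical-model inference: greedily cover a query with the longest chunks that also appear in a database, so that regions covered by large contiguous chunks score as "regular" and regions needing small fragments score as "irregular".

```python
def composition_score(query, database, min_chunk=1):
    """Toy sketch of composition-based irregularity scoring on strings
    (hypothetical stand-in for the paper's image/video chunk matching).
    Returns the list of chunk lengths used to cover the query:
    fewer, larger chunks => more 'regular'.
    """
    chunks = []
    i = 0
    while i < len(query):
        # Find the longest query substring starting at i that occurs
        # anywhere in the database.
        best = 0
        for j in range(len(query), i, -1):
            if query[i:j] in database:
                best = j - i
                break
        if best == 0:
            best = min_chunk  # unexplainable symbol: smallest possible piece
        chunks.append(best)
        i += best
    return chunks

# "abcabc" composes from "abcd" in large chunks; "azbz" only in fragments.
regular = composition_score("abcabc", "abcd")
irregular = composition_score("azbz", "abcd")
```

The paper does the analogous thing with spatio-temporal patches and makes the likelihood of a composition explicit in a probabilistic graphical model.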

9/14/2005 No Meeting CANCELLED
9/21/2005 Philipp Michel I will be presenting the following paper, which Sanjeev was unable to present over the summer:
Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance
by Ikeuchi et al from PAMI 2004.
Slides (PDF)
9/28/2005 Jean-Francois Lalonde Learning Appearance Manifolds from Video by Rahimi, Recht and Darrell, presented at CVPR this year.

Abstract: The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic process. We show how to learn a mapping from video frames to this low-dimensional representation by exploiting the temporal coherence between frames and supervision from a user. This function maps the frames of the video to a low-dimensional sequence that evolves according to Markovian dynamics. This ensures that the recovered low-dimensional sequence represents a physically meaningful process. We relate our algorithm to manifold learning, semi-supervised learning, and system identification, and demonstrate it on the tasks of tracking 3D rigid objects, deformable bodies, and articulated bodies. We also show how to use the inverse of this mapping to manipulate video.

I will try to relate their work to another paper also published at CVPR 2005:

Online Learning of Probabilistic Appearance Manifolds for Video-based Recognition and Tracking, by Lee and Kriegman

10/5/2005 Jiang Ni I will talk about our paper, accepted to the ICCV 2005 Workshop on Analysis and Modeling of Faces and Gestures:

Face View Synthesis Across Large Angles
Jiang Ni and Henry Schneiderman

This paper is about synthesizing new views of human faces given only one single input image. Large out-of-plane rotation is involved.

10/12/2005 Intel Open House No regular meeting, go to Intel's Open House instead. (CIC Building, 4th Floor)
10/19/2005 Jake Sprouse I will present Tracking Loose Limbed People by Leonid Sigal, Sidharth Bhatia, Stefan Roth, Michael Black, and Michael Isard from CVPR'04, in which our heroes combine particle filtering with belief propagation and eigenspace part detection to detect and track a human.

Background papers:

  • Particle Filtering
  • Nonparametric BP
10/26/2005 ICCV Attendees ICCV 2005 Overview & Highlights
11/2/2005 Marek Michalowski We will have a bit more ICCV discussion overflowing from last week before Marek's talk.
Specular reflections and the perception of shape
R.W. Fleming, A. Torralba, E.H. Adelson
Journal of Vision 4:798-820 (2004) (Local Copy)

Abstract: Many materials, including leaves, water, plastic, and chrome exhibit specular reflections. It seems reasonable that the visual system can somehow exploit specular reflections to recover three-dimensional (3D) shape. Previous studies (e.g., J. T. Todd & E. Mingolla, 1983; J. F. Norman, J. T. Todd, & G. A. Orban, 2004) have shown that specular reflections aid shape estimation, but the relevant image information has not yet been isolated. Here we explain how specular reflections can provide reliable and accurate constraints on 3D shape. We argue that the visual system can treat specularities somewhat like textures, by using the systematic patterns of distortion across the image of a specular surface to recover 3D shape. However, there is a crucial difference between textures and specularities: In the case of textures, the image compressions depend on the first derivative of the surface depth (i.e., surface orientation), whereas in the case of specularities, the image compressions depend on the second derivative (i.e., surface curvatures). We suggest that this difference provides a cue that can help the visual system distinguish between textures and specularities, even when present simultaneously. More importantly, we show that the dependency of specular distortions on the second derivative of the surface leads to distinctive fields of image orientation as the reflected world is warped across the surface. We find that these “orientation fields” are (i) diagnostic of 3D shape, (ii) remain surprisingly stable when the world reflected in the surface is changed, and (iii) can be extracted from the image by populations of simple oriented filters. Thus the use of specular reflections for 3D shape perception is both easier and more reliable than previous computational work would suggest.

11/9/2005 Dave Tolliver Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation.
Talya Meltzer, Chen Yanover, Yair Weiss
ICCV 2005
11/16/2005 Fernando de la Torre I will be presenting these two papers:

TemporalBoost for Event Recognition
Paul Smith, Niels da Vitoria Lobo, and Mubarak Shah
ICCV 2005

Learning Effective Image Metrics from Few Pairwise Examples
Hwann-Tzong Chen, Tyng-Luh Liu, Chiou-Shann Fuh
Tenth IEEE International Conference on Computer Vision, Volume 2, pp. 1371-1378, 2005.

Enjoy the reading...

11/23/2005 Thanksgiving No meeting.
11/30/2005 James Hays I'll present recent work with Marius, Alyosha, and Yanxi on the discovery of texture regularity.
12/7/2005 Martial Hebert I'll discuss: Probabilistic Parameter-Free Motion Detection
by Veit, Cao, and Bouthemy.

With a background paper on grouping.

12/14/2005 Stano Funiak I will present our recent work on distributed camera localization. The talk title is:
Distributed Localization of Networked Cameras
and it's joint work with Carlos Guestrin, Mark Paskin, and Rahul Sukthankar. It's based on our recent submission to IPSN (Information Processing in Sensor Networks) 2005. The abstract is attached below. Unfortunately, I don't have a link/paper for people to read, because I need to fix some issues in the paper before I make it available. So you can just relax and enjoy the show. :-)

Camera networks are perhaps the most common type of sensor network and are deployed in a variety of real-world applications including surveillance, intelligent environments and scientific remote monitoring. A key problem in deploying a network of cameras is calibration, i.e., determining the location and orientation of each sensor so that observations in an image can be mapped to locations in the real world. This paper proposes a fully distributed approach for camera network calibration. The cameras collaborate to track an object that moves through the environment and reason probabilistically about which camera poses are consistent with the observed images. This reasoning employs sophisticated techniques for handling the difficult nonlinearities imposed by projective transformations, as well as the dense correlations that arise between distant cameras. Our method requires minimal overlap of the cameras' fields of view and makes very few assumptions about the motion of the object. In contrast to existing approaches, which are centralized, our distributed algorithm scales easily to very large camera networks. We evaluate the system on a real camera network with five nodes as well as simulated camera networks of up to 50 cameras and demonstrate that our approach performs well even when communication is lossy.

12/21/2005 Winter Break No Meeting
12/28/2005 Winter Break No Meeting