Computer Vision Misc Reading Group
Wednesdays, 4:30 - 6:00, Intel Lab (Top Floor, Collaborative Innovation Center)

Mailing List Subscription | Presenter List | Slides | Paper Suggestions | Previous Years | Related Links | FAQ

Announcements:

2009 Schedule


Date Presenter Description
1/7/2009
Slides
David Bradley
NIPS overview:

1. Supervised Dictionary Learning
Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, Andrew Zisserman

It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple discriminative class models. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks.
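To make the setup concrete, here is a minimal Python/numpy sketch of plain reconstructive dictionary learning (ISTA sparse coding alternated with a least-squares dictionary update). It is an illustrative baseline only; the paper's supervised variant augments this objective with per-class discriminative terms, which the sketch omits.

    import numpy as np

    def ista(D, x, lam=0.1, n_iter=50):
        """Sparse-code x over dictionary D by iterative soft thresholding."""
        L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of D^T D
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            g = a - D.T @ (D @ a - x) / L      # gradient step on 0.5*||x - Da||^2
            a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
        return a

    def learn_dictionary(X, n_atoms=64, n_epochs=10, lam=0.1):
        """Alternate sparse coding and a least-squares dictionary update (MOD).
        X holds one signal per column."""
        rng = np.random.default_rng(0)
        D = rng.standard_normal((X.shape[0], n_atoms))
        D /= np.linalg.norm(D, axis=0)
        for _ in range(n_epochs):
            A = np.stack([ista(D, x, lam) for x in X.T], axis=1)   # codes
            D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-6 * np.eye(n_atoms))
            D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)      # renormalize atoms
        return D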

2. Transfer Learning by Distribution Matching for Targeted Advertising
Steffen Bickel, Christoph Sawade, and Tobias Scheffer

We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small -- possibly even empty -- labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal's users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy.
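The resampling-weight idea can be sketched with a discriminative density-ratio estimate: train a probabilistic classifier to separate the pool from the (unlabeled) target sample and weight each pool example by the resulting odds. This mirrors the spirit of the paper, not its exact derivation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def matching_weights(X_pool, X_target):
        """Resampling weights that reweight the pool toward the target."""
        X = np.vstack([X_pool, X_target])
        y = np.concatenate([np.zeros(len(X_pool)), np.ones(len(X_target))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        p = clf.predict_proba(X_pool)[:, 1]            # p(target | x)
        w = p / np.clip(1.0 - p, 1e-6, None)           # odds ratio ~ density ratio
        return w * (len(X_pool) / len(X_target))       # correct for sample sizes

    # The weights can then be passed to any learner that accepts per-example
    # weights, e.g. learner.fit(X_pool, y_pool, sample_weight=w).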

I will also briefly discuss a purely theoretical learning paper:
Mind the Duality Gap: Logarithmic regret algorithms for online optimization
Sham M. Kakade and Shai Shalev-Shwartz

We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for Follow-The-Leader and for the gradient descent algorithm proposed in Hazan et al. [2006]. We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework for generalized strongly convex functions.
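For reference, the gradient descent algorithm of Hazan et al. mentioned in the abstract is simple to state: for lambda-strongly convex losses, a step size of 1/(lambda*t) at round t yields O(log T) regret. A minimal sketch, where grad_fn is a hypothetical callback returning the subgradient of the round-t loss:

    import numpy as np

    def online_gd(grad_fn, dim, T, lam=1.0, radius=10.0):
        """Online gradient descent for lambda-strongly convex losses."""
        w = np.zeros(dim)
        for t in range(1, T + 1):
            g = grad_fn(t, w)                 # subgradient of f_t at w
            w = w - g / (lam * t)             # step size 1/(lam * t)
            norm = np.linalg.norm(w)
            if norm > radius:                 # project back onto the feasible ball
                w *= radius / norm
        return w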

1/14/2009 Jonathan Huang
NIPS overview:

Sparse Signal Recovery Using Markov Random Fields
Volkan Cevher, Marco Duarte, Chinmay Hegde, Richard Baraniuk

Compressive Sensing (CS) combines sampling and compression into a single sub-Nyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based reconstruction algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
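For orientation, here is plain orthogonal matching pursuit, the pursuit skeleton that LaMP builds on. LaMP replaces the greedy per-iteration support selection below with a MAP estimate under an MRF prior (computed, e.g., via graph cuts), which this sketch omits:

    import numpy as np

    def omp(Phi, y, k):
        """Recover a k-sparse signal x from measurements y = Phi @ x."""
        residual, support = y.copy(), []
        for _ in range(k):
            j = int(np.argmax(np.abs(Phi.T @ residual)))   # best-correlated atom
            support.append(j)
            sub = Phi[:, support]
            coef, *_ = np.linalg.lstsq(sub, y, rcond=None) # refit on support
            residual = y - sub @ coef
        x = np.zeros(Phi.shape[1])
        x[support] = coef
        return x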

Measures of Clustering Quality: A Working Set of Axioms for Clustering
Shai Ben-David, Margareta Ackerman

Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg's axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time.
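To illustrate the interface being axiomatized -- a function from (data set, partition) to a non-negative score -- here is a toy measure in Python. It is not one of the paper's proposed CQMs, just a minimal example of the kind of object under discussion:

    import numpy as np

    def toy_cqm(X, labels):
        """Toy clustering-quality measure: between-cluster spread over
        within-cluster spread (higher = more conclusive clustering)."""
        labels = np.asarray(labels)
        centroids = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
        within = np.mean([np.linalg.norm(x - centroids[c])
                          for x, c in zip(X, labels)])
        overall = X.mean(axis=0)
        between = np.mean([np.linalg.norm(mu - overall)
                           for mu in centroids.values()])
        return between / max(within, 1e-12)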

Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
Ali Rahimi, Ben Recht

Randomized neural networks are immortalized in this well-known AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play tic-tac-toe," Sussman replied. "Why is the net wired randomly?" asked Minsky. Sussman replied, "I do not want it to have any preconceptions of how to play." Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty," replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities.
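The recipe itself is short: draw the random nonlinearities once, then fit only the linear output weights. A minimal sketch using random cosine features, which approximate an RBF kernel:

    import numpy as np
    from sklearn.linear_model import RidgeClassifier

    def random_features(X, n_features=500, gamma=1.0, seed=0):
        """Fixed random cosine features; only the output layer gets trained."""
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
        b = rng.uniform(0, 2 * np.pi, n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    # Training touches only the output weights; the "wiring" W, b stays random.
    # (The fixed seed reproduces the same W, b for train and test data.)
    # clf = RidgeClassifier().fit(random_features(X_train), y_train)
    # acc = clf.score(random_features(X_test), y_test)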

Shape-Based Object Localization for Descriptive Classification
Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller

Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering.

1/21/2009 Minh Hoai Nguyen

Title: Unlabeled data: Now it helps, now it doesn't
Authors: Aarti Singh, Robert D. Nowak, Xiaojin Zhu

Abstract:
Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundance of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing SSL gains provide only partial and sometimes apparently conflicting explanations of whether, and to what extent, unlabeled data can help. In this paper, we attempt to bridge the gap between the practice and theory of semi-supervised learning. We develop a finite sample analysis that characterizes the value of unlabeled data and quantifies the performance improvement of SSL compared to supervised learning. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.

1/28/2009 Sanjeev Koppal

Title: Interactive Cutaway Illustrations of Complex 3D Models

Abstract: From the authors: "We present a system for authoring and viewing interactive cutaway illustrations of complex 3D models using conventions of traditional scientific and technical illustration. Our approach is based on the two key ideas that 1) cuts should respect the geometry of the parts being cut, and 2) cutaway illustrations should support interactive exploration. In our approach, an author instruments a 3D model with auxiliary parameters, which we call "rigging," that define how cutaways of that structure are formed. We provide an authoring interface that automates most of the rigging process. We also provide a viewing interface that allows viewers to explore rigged models using high-level interactions. In particular, the viewer can just select a set of target structures, and the system will automatically generate a cutaway illustration that exposes those parts. We have tested our system on a variety of CAD and anatomical models, and our results demonstrate that our approach can be used to create and view effective interactive cutaway illustrations for a variety of complex objects with little user effort."

Link: http://grail.cs.washington.edu/pub/papers/li2007.pdf

2/4/2009 Mohit Gupta I will present the following paper in this week's misc-reading group:

Flexible Depth of Field Photography by H. Nagahara, S. Kuthirummal, C. Zhou, and S.K. Nayar. ECCV'2008

Webpage: http://www1.cs.columbia.edu/CAVE/projects/flexible_dof/

Abstract:
The range of scene depths that appear focused in an image is known as the depth of field (DOF). Conventional cameras are limited by a fundamental trade-off between DOF and signal-to-noise ratio (SNR). For a dark scene, the aperture of the lens must be opened up to maintain SNR, which causes the DOF to reduce. Also, today's cameras have DOFs that correspond to a single slab that is perpendicular to the optical axis. In this project, we developed an imaging system that enables one to manipulate the DOF in new and powerful ways. We propose to vary the position and/or orientation of the image detector, during the integration time of a single photograph. Even when the detector motion is very small (tens of microns), a large range of scene depths (several meters) is captured both in and out of focus.
We demonstrate four applications of flexible DOF. First, extended DOF, where a large depth range is captured with a very wide aperture (low noise) but with nearly depth-independent defocus blur. Applying deconvolution to a captured image gives an image with extended DOF and yet high SNR. We also show that we can capture scenes with discontinuous DOFs. For instance, near and far objects can be imaged with sharpness while objects in between are severely blurred. We show that our camera can also capture images with tilted DOFs (Scheimpflug imaging) without tilting the image detector. Finally, we show how our camera can capture scenes with non-planar DOFs.
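The extended-DOF application hinges on a standard deconvolution step: divide by the (nearly depth-independent) integrated blur kernel in the Fourier domain, regularized against noise. A generic Wiener deconvolution sketch, not the authors' exact pipeline:

    import numpy as np

    def wiener_deconvolve(image, psf, snr=100.0):
        """Deconvolve `image` by `psf` (same shape, centered) in Fourier space."""
        H = np.fft.fft2(np.fft.ifftshift(psf))          # move kernel center to origin
        G = np.fft.fft2(image)
        W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)   # Wiener filter
        return np.real(np.fft.ifft2(W * G))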

2/11/2009 Yaser Sheikh

I'll be discussing: Pinto N, Cox DD, and DiCarlo JJ. Why is Real-World Visual Object Recognition Hard? PLoS Computational Biology, 4(1):e27 (2008).

In the incendiary language of the authors themselves: "Recent computational models have sought to match humans' remarkable visual abilities, and, using large databases of ''natural'' images, have shown apparently impressive progress. Here we show that caution is warranted. In particular, we found that a very simple neuroscience ''toy'' model, capable only of extracting trivial regularities from a set of images, is able to outperform most state-of-the-art object recognition systems on a standard ''natural'' test of object recognition. At the same time, we found that this same toy model is easily defeated by a simple recognition test that we generated to better span the range of image variation observed in the real world. Together these results suggest that current ''natural'' tests are inadequate for judging success or driving forward progress. In addition to tempering claims of success in the machine vision literature, these results point the way forward and call for renewed focus on image variation as a central challenge in object recognition."
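The flavor of such a V1-like "toy" model can be sketched as a fixed Gabor filter bank followed by a linear classifier; the parameters below are illustrative, not the paper's:

    import cv2
    import numpy as np

    def gabor_features(gray_img, n_orient=8, ksize=15):
        """Fixed Gabor bank responses with crude pooling and normalization."""
        feats = []
        for i in range(n_orient):
            kern = cv2.getGaborKernel((ksize, ksize), sigma=3.0,
                                      theta=np.pi * i / n_orient,
                                      lambd=8.0, gamma=0.5)
            resp = cv2.filter2D(gray_img.astype(np.float32), -1, kern)
            feats.append(cv2.resize(np.abs(resp), (16, 16)).ravel())
        f = np.concatenate(feats)
        return f / (np.linalg.norm(f) + 1e-6)

    # A linear classifier on these features completes the "toy" pipeline,
    # e.g. sklearn.svm.LinearSVC().fit(np.stack(feature_list), labels).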

2/18/2009 Stephen T. Nuske

Title: Visual Localisation in Dynamic Non-uniform Lighting

Abstract:

For vision to succeed as a perceptual mechanism in general field robotic applications, vision systems must overcome the difficult challenges presented by varying lighting conditions. Many current approaches rely on decoupling the effects of lighting, which is not possible in many situations -- not surprising considering an image is fundamentally an array of light measurements. This talk will describe two visual localisation systems designed for two different field robot applications, each built to address the lighting challenges of its application environment.

The first visual localisation system discussed is for industrial ground vehicles operating outdoors. The system employs an invariant map combined with a robust localisation algorithm and an intelligent exposure control algorithm which together permit reliable localisation in a wide range of outdoor lighting conditions.

The second system discussed is for submarines navigating underwater structures, where the only light source is a spotlight mounted onboard the vehicle. The proposed system explicitly models the light source within the localisation framework, which serves to predict the changing appearance of the structure. Experiments reveal that modelling the effects of the lighting helps solve this difficult visual localisation scenario.

The results of the two systems are encouraging given the extremely challenging dynamic non-uniform lighting in each environment, and both systems are being developed further with industry partners.

Bio:

Stephen's research is in vision systems for mobile robots, focusing on the creation of practical systems that can deal with the problems arising from dynamic non-uniform lighting conditions. Stephen began his undergraduate studies at the University of Queensland, Australia, in Software Engineering. His undergraduate thesis was on the vision system for the university's robot soccer team, which placed second at the RoboCup in Portugal. During his undergraduate years he gained work experience at BSD Robotics, a company that develops equipment for automated medical laboratories. After receiving his undergraduate degree Stephen began a PhD based at the Autonomous Systems Laboratory at CSIRO in Australia. During his PhD he spent three months at INRIA in Grenoble, a French national institute for computer science. Stephen has now started a PostDoc position here at CMU in the Field Robotics Center under Sanjiv Singh.

2/25/2009 Marius Leordeanu Cancelled
3/4/2009 Moritz Tenorth

I will present the work of the Intelligent Autonomous Systems Group at the TU Munich. The work in our group deals with perceiving, interpreting and executing everyday manipulation tasks in a household environment, including the vision system for object recognition and human pose tracking, 3D environment maps created from laser scans, grounded knowledge processing, multi-level action understanding, and robot task and manipulation planning. Our goal is to cover the whole Perception-Cognition-Action loop and build robots that can act autonomously in human environments.

Website: http://ias.cs.tum.edu

3/11/2009 Spring Break No meeting
3/18/2009 All ICCV decompression
3/25/2009 Cancelled Cancelled
4/1/2009 David Lee

I will talk about my work appearing in this year's CVPR. It is titled Geometric Reasoning for Single Image Structure Recovery, and here is the abstract:

We study the problem of generating plausible interpretations of a scene from a collection of line segments automatically extracted from a single indoor image. We show that we can recognize the three dimensional structure of the interior of a building, even in the presence of occluding objects. Several physically valid structure hypotheses are proposed by geometric reasoning and verified to find the best fitting model to line segments, which is then converted to a full 3D model. Our experiments demonstrate that our structure recovery from line segments is comparable with methods using full image appearance. Our approach shows how a set of rules describing geometric constraints between groups of segments can be used to prune scene interpretation hypotheses and to generate the most plausible interpretation.
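One building block of this kind of geometric reasoning is estimating a vanishing point for a group of line segments; in homogeneous coordinates it reduces to a small least-squares SVD problem. A sketch of just that step (the paper's grouping and hypothesis machinery is not shown):

    import numpy as np

    def vanishing_point(segments):
        """Least-squares vanishing point of segments given as (x1, y1, x2, y2)."""
        lines = []
        for x1, y1, x2, y2 in segments:
            l = np.cross([x1, y1, 1.0], [x2, y2, 1.0])    # line through the segment
            lines.append(l / np.linalg.norm(l[:2]))        # normalize direction part
        L = np.array(lines)
        _, _, Vt = np.linalg.svd(L)                        # v minimizing ||L v||
        v = Vt[-1]
        return v[:2] / v[2] if abs(v[2]) > 1e-9 else v[:2]  # may lie at infinity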

4/8/2009
Slides
Scott Satkin

I will be presenting the work of:

Ivan Laptev, Marcin Marszałek, Cordelia Schmid & Benjamin Rozenfeld

Learning realistic human actions from movies

The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
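The multi-channel non-linear SVM ingredient can be sketched directly: compute an exponentiated chi-square kernel per feature channel and combine the channels into one Gram matrix for an SVM with a precomputed kernel. This is the generic form, not the authors' exact channel weighting:

    import numpy as np

    def chi2_kernel(A, B, gamma=1.0):
        """A: (n, d) and B: (m, d) histograms; returns exp(-gamma * chi2) Gram."""
        num = (A[:, None, :] - B[None, :, :]) ** 2
        den = A[:, None, :] + B[None, :, :] + 1e-10
        return np.exp(-gamma * 0.5 * (num / den).sum(axis=2))

    def multi_channel_kernel(channels_A, channels_B, gammas):
        """Average per-channel kernels (one histogram matrix per channel)."""
        Ks = [chi2_kernel(A, B, g)
              for (A, B, g) in zip(channels_A, channels_B, gammas)]
        return np.mean(Ks, axis=0)

    # K = multi_channel_kernel(...) can be fed to
    # sklearn.svm.SVC(kernel='precomputed').fit(K, labels).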

4/15/2009
Slides
Edward Hsiao

I will talk about the following journal paper:

J.M. Morel and G.Yu, ASIFT: A New Framework for Fully Affine Invariant Image Comparison, to appear in SIAM Journal on Imaging Sciences, 2009.

Abstract. If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence the solid object recognition problem has often been led back to the computation of affine invariant image local features. Such invariant features could be obtained by normalization methods, but no fully affine normalization method exists for the time being. Even scale invariance is only dealt with rigorously by the SIFT method. By simulating zooms out and normalizing translation and rotation, SIFT is invariant to four out of the six parameters of an affine transform. The method proposed in this paper, Affine-SIFT (ASIFT), simulates all image views obtainable by varying the two camera axis orientation parameters, namely the latitude and the longitude angles, left over by the SIFT method. Then it covers the other four parameters by using the SIFT method itself. The resulting method will be mathematically proved to be fully affine invariant. Against any prognosis, simulating all views depending on the two camera orientation parameters is feasible with no dramatic computational load. A two-resolution scheme further reduces the ASIFT complexity to about twice that of SIFT. A new notion, the transition tilt, measuring the amount of distortion from one view to another, is introduced. While an absolute tilt from a frontal to a slanted view exceeding 6 is rare, much higher transition tilts are common when two slanted views of an object are compared (see Fig. 1.1). The attainable transition tilt is measured for each affine image comparison method. The new method makes it possible to reliably identify features that have undergone transition tilts of large magnitude, up to 36 and higher. This fact is substantiated by many experiments which show that ASIFT significantly outperforms the state-of-the-art methods SIFT, MSER, Harris-Affine, and Hessian-Affine.

- Ed
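The core ASIFT recipe is easy to sketch: sample the two missing camera-orientation parameters, simulate each view with an affine warp, and run plain SIFT on every simulated view. The sampling grid below is illustrative, and unlike real ASIFT this sketch neither anti-alias blurs before tilting nor maps keypoints back into the original frame:

    import cv2
    import numpy as np

    def asift_keypoints(img):
        """Detect SIFT features over a grid of simulated latitude/longitude views."""
        sift = cv2.SIFT_create()
        all_kps, all_desc = [], []
        h, w = img.shape[:2]
        for tilt in [1.0, np.sqrt(2), 2.0, 2 * np.sqrt(2)]:      # latitude
            for phi in np.arange(0, 180, 72.0 / tilt):           # longitude (deg)
                R = cv2.getRotationMatrix2D((w / 2, h / 2), phi, 1.0)
                view = cv2.warpAffine(img, R, (w, h))
                if tilt > 1.0:                                   # simulate tilt by
                    view = cv2.resize(view, None, fx=1.0, fy=1.0 / tilt)  # y-squash
                kps, desc = sift.detectAndCompute(view, None)
                if desc is not None:
                    all_kps += list(kps)    # note: coords are in the simulated frame
                    all_desc.append(desc)
        return all_kps, np.vstack(all_desc)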

4/22/2009 Ekaterina Spriggs I will present:

Lampert, C. H., M. B. Blaschko and T. Hofmann: Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, CVPR 2008 (best paper)

Abstract:

Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.
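For a linear bag-of-words score (a sum of per-pixel feature weights inside a box), the branch-and-bound scheme can be sketched compactly: a rectangle set is an interval per side, and its score is upper-bounded by placing all positive mass in the largest member box and all negative mass in the smallest. A simplified sketch of this efficient subwindow search:

    import heapq
    import numpy as np

    def box_sum(I, t, b, l, r):
        """Sum over rows t..b, cols l..r using zero-padded integral image I."""
        return I[b + 1, r + 1] - I[t, r + 1] - I[b + 1, l] + I[t, l]

    def ess(weights):
        """Return (top, bottom, left, right) of the max-scoring box."""
        H, W = weights.shape
        Ip = np.pad(np.maximum(weights, 0), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
        In = np.pad(np.minimum(weights, 0), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

        def upper(s):   # s = (t_lo,t_hi, b_lo,b_hi, l_lo,l_hi, r_lo,r_hi)
            t0, t1, b0, b1, l0, l1, r0, r1 = s
            if t0 > b1 or l0 > r1:                         # only empty boxes
                return -np.inf
            big = box_sum(Ip, t0, b1, l0, r1)              # largest member box
            small = box_sum(In, t1, b0, l1, r0) if t1 <= b0 and l1 <= r0 else 0.0
            return big + small

        state = (0, H - 1, 0, H - 1, 0, W - 1, 0, W - 1)
        heap = [(-upper(state), state)]
        while heap:
            _, s = heapq.heappop(heap)
            widths = [s[2 * i + 1] - s[2 * i] for i in range(4)]
            if max(widths) == 0:                           # single box: optimal
                return s[0], s[2], s[4], s[6]
            i = int(np.argmax(widths))                     # split widest interval
            lo, hi = s[2 * i], s[2 * i + 1]
            mid = (lo + hi) // 2
            for half in [(lo, mid), (mid + 1, hi)]:
                ns = list(s)
                ns[2 * i], ns[2 * i + 1] = half
                heapq.heappush(heap, (-upper(tuple(ns)), tuple(ns)))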

4/29/2009 Marius Leordeanu Canceled.
5/6/2009 Yuandong Tian Internal only, see email for details.
5/13/2009 Fernando de la Torre Canceled due to schedule conflict.
5/20/2009 James Hays Nonparametric Scene Parsing: Label Transfer via Dense Scene Alignment
Ce Liu, Jenny Yuen, Antonio Torralba

http://people.csail.mit.edu/torralba/publications/siftFlowCVPR.pdf

Abstract:
In this paper we propose a novel nonparametric approach for object recognition and scene parsing using dense scene alignment. Given an input image, we retrieve its best matches from a large database with annotated images using our modified, coarse-to-fine SIFT flow algorithm that aligns the structures within two images. Based on the dense scene correspondence obtained from the SIFT flow, our system warps the existing annotations, and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Promising experimental results have been achieved by our nonparametric scene parsing system on a challenging database. Compared to existing object recognition approaches that require training for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
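The label-transfer step itself is simple once a dense flow field aligns the annotated match to the query: warp the annotation through the flow. A sketch of just that step (the SIFT-flow matching and MRF fusion are not shown):

    import numpy as np

    def warp_annotation(ann, flow_u, flow_v):
        """ann: (H, W) label map of the retrieved image; flow maps query -> match."""
        H, W = flow_u.shape
        ys, xs = np.mgrid[0:H, 0:W]
        src_x = np.clip((xs + flow_u).round().astype(int), 0, W - 1)
        src_y = np.clip((ys + flow_v).round().astype(int), 0, H - 1)
        return ann[src_y, src_x]       # label hypothesis for each query pixel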

5/27/2009 Pete Barnum

Light-Efficient Photography
Samuel W. Hasinoff and Kiriakos N. Kutulakos
http://www.cs.toronto.edu/~hasinoff/pubs/hasinoff-lightefficient-2008.pdf

We consider the problem of imaging a scene with a given depth of field at a given exposure level in the shortest amount of time possible. We show that by (1) collecting a sequence of photos and (2) controlling the aperture, focus and exposure time of each photo individually, we can span the given depth of field in less total time than it takes to expose a single narrower-aperture photo. Using this as a starting point, we obtain two key results. First, for lenses with continuously-variable apertures, we derive a closed-form solution for the globally optimal capture sequence, i.e., that collects light from the specified depth of field in the most efficient way possible. Second, for lenses with discrete apertures, we derive an integer programming problem whose solution is the optimal sequence. Our results are applicable to off-the-shelf cameras and typical photography conditions, and advocate the use of dense, wide-aperture photo sequences as a light-efficient alternative to single-shot, narrow-aperture photography.
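A toy version of the discrete-aperture case conveys the trade-off: under a simplified model where per-shot DOF grows linearly with f-number N while per-shot exposure time grows as N^2 (to hold the exposure level fixed), total capture time scales like N, so dense wide-aperture sequences win. This is a caricature of the paper's integer program, with made-up parameter names:

    import math

    def best_single_aperture(f_numbers, dof_span, dof_per_stop, t_unit):
        """dof_per_stop: DOF per shot at N=1; t_unit: exposure time at N=1."""
        best = None
        for N in f_numbers:
            shots = math.ceil(dof_span / (dof_per_stop * N))   # tiles needed
            total = shots * t_unit * N ** 2                    # time per tile ~ N^2
            if best is None or total < best[0]:
                best = (total, N, shots)
        return best   # (total_time, f_number, number_of_shots)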

9/16/2009 ALL ICCV09 preview
9/23/2009 ALL ICCV09 preview
9/30/2009 --- ICCV09 no meeting
10/7/2009 ALL ICCV09 review
10/14/2009 ALL ICCV09 review
10/21/2009 ALL ICCV09 review
10/28/2009 Ekaterina Spriggs TBA
11/4/2009 Tomasz Malisiewicz Title: Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object's relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba's proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems.

The paper can be found here:
http://www.cs.cmu.edu/~tmalisie/projects/nips09/

11/11/2009 Mohit Gupta I plan to talk about Optical Illusions for this week's misc-group meeting. I will show a few kinds of illusions (brightness and contrast illusions, relative motion illusions, shadow illusions etc.) along with explanations for some of them.
11/18/2009 --- CVPR10 deadline, no meeting
11/25/2009 ALL CVPR decompression
12/2/2009 Minh Hoai Nguyen TBA
12/9/2009 Pyry Matikainen TBA
12/16/2009 Daniel Munoz TBA

Meetings in Previous Years

Paper Lists from Previous Years

Related Links

FAQ

1. How is the presenters' order generated?
The presenters' order is generated from the presenters' list in a FIFO manner.

2. Who is responsible if I can not present at the scheduled time?
Yourself.

3. What should I do if I can not present at the scheduled time?
First, let the organizer know your situation, as early as possible. Second, contact other presenters on the list and see if they are willing to swap with you.

4. What happens if a new event takes place and we have to change the schedule?
To minimize disturbance, the conflicted slot will be moved to the end of the list after confirmation with the originally scheduled presenter, while all other slots remain unchanged.

5. I have a question not listed here...
Ask.

This file is located at: /afs/cs/project/vmr/www/misc_read/