1. Supervised Dictionary Learning
It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple discriminative class models. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks.
2. Transfer Learning by Distribution Matching for Targeted Advertising
We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small (possibly even empty) labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal's users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy.
I will also briefly discuss a purely theoretical learning paper:
We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for Follow-The-Leader and for the gradient descent algorithm proposed in Hazan et al. We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework for generalized strongly convex functions.
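To make the gradient descent baseline mentioned in this abstract concrete, here is a minimal sketch of online gradient descent on strongly convex losses. It is not the paper's primal-dual algorithm. The step size 1/(sigma*t), the one-dimensional quadratic losses, and the names `online_gradient_descent` and `quadratic_grad` are all assumptions made for illustration.

```python
def online_gradient_descent(grads, sigma, w0=0.0):
    """Run online gradient descent with step size 1/(sigma * t).

    grads[t-1](w) returns the gradient of the t-th loss at w.
    sigma is the assumed strong-convexity constant of every loss.
    """
    w = w0
    iterates = [w]
    for t, grad in enumerate(grads, start=1):
        eta = 1.0 / (sigma * t)      # decaying step size (assumption)
        w = w - eta * grad(w)        # move against the gradient of loss t
        iterates.append(w)
    return iterates


def quadratic_grad(z, sigma):
    """Gradient of the illustrative loss f(w) = (sigma / 2) * (w - z)**2."""
    return lambda w: sigma * (w - z)


targets = [2.0, 4.0, 9.0]            # hypothetical data revealed one per round
sigma = 0.5
iterates = online_gradient_descent(
    [quadratic_grad(z, sigma) for z in targets], sigma
)
```

For these particular quadratic losses, each iterate after the first equals the running mean of the targets seen so far (2, 3, 5), which is also what Follow-The-Leader would play. The two methods only differ on less symmetric losses.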
Sparse Signal Recovery Using Markov Random Fields
Compressive Sensing (CS) combines sampling and compression into a single sub-Nyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based reconstruction algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
Measures of Clustering Quality: A Working Set of Axioms for Clustering
Shai Ben-David, Margareta Ackerman
Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, which showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in that work. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg's axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time.
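As a rough illustration of what a clustering-quality measure looks like in code, the sketch below scores a partition by the ratio of the smallest between-cluster distance to the largest within-cluster distance. This particular ratio and the name `separation_ratio` are assumptions chosen for illustration, not one of the measures proposed in the paper. The example also checks scale invariance, one of the principles in Kleinberg's axioms.

```python
import math
from itertools import combinations


def separation_ratio(points, labels):
    """Illustrative quality score: min between-cluster distance divided by
    max within-cluster distance. Larger values mean a cleaner partition.

    Assumes at least two clusters, and at least one cluster containing
    two distinct points (otherwise the ratio is undefined).
    """
    within, between = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        (within if a == b else between).append(math.dist(p, q))
    return min(between) / max(within)


points = [(0, 0), (0, 1), (5, 0), (5, 1)]   # hypothetical data set
good = [0, 0, 1, 1]                         # groups nearby points together
bad = [0, 1, 0, 1]                          # splits nearby points apart
scaled = [(3 * x, 3 * y) for x, y in points]

good_score = separation_ratio(points, good)     # about 5.0
bad_score = separation_ratio(points, bad)       # about 0.2
scaled_score = separation_ratio(scaled, good)   # unchanged by uniform scaling
```

Both distance lists are computed over all pairs of points, so evaluating this measure takes time quadratic in the size of the data set.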
Randomized neural networks are immortalized in this well-known AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play tic-tac-toe," Sussman replied. "Why is the net wired randomly?" asked Minsky. Sussman replied, "I do not want it to have any preconceptions of how to play." Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty," replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities.
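The architecture described in this abstract, a weighted sum over a bank of fixed random nonlinearities, can be sketched in a few lines. The cosine nonlinearity, the Gaussian parameter draws, and the perceptron-style fitting of the output weights are all assumptions made here for illustration. The abstract does not specify them, and the perceptron update in particular is a stand-in rather than the paper's training procedure.

```python
import math
import random


def make_random_features(dim, num_features, rng):
    """Draw fixed random parameters (w_k, b_k) for features cos(w_k . x + b_k)."""
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)],
         rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(num_features)
    ]


def featurize(x, params):
    """Pass input x through the bank of random nonlinearities."""
    return [math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in params]


def train_output_weights(data, params, epochs=50, lr=0.1):
    """Fit only the output weights; the random features are never updated."""
    alpha = [0.0] * len(params)
    for _ in range(epochs):
        for x, y in data:
            phi = featurize(x, params)
            score = sum(a * p for a, p in zip(alpha, phi))
            if y * score <= 0:                      # misclassified: nudge weights
                alpha = [a + lr * y * p for a, p in zip(alpha, phi)]
    return alpha


def predict(x, params, alpha):
    """Sign of the weighted sum of random features."""
    score = sum(a * p for a, p in zip(alpha, featurize(x, params)))
    return 1 if score > 0 else -1


rng = random.Random(0)                              # fixed seed for repeatability
params = make_random_features(dim=2, num_features=20, rng=rng)
data = [((0.0, 0.0), -1), ((1.0, 1.0), -1),         # hypothetical XOR-style labels
        ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
alpha = train_output_weights(data, params)
predictions = [predict(x, params, alpha) for x, _ in data]
```

Only `alpha` is learned. Increasing `num_features` enlarges the bank of random nonlinearities, which is the quantity the abstract's test-error bound depends on alongside the dataset size.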
Shape-Based Object Localization for Descriptive Classification
Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering.
|1/21/2009||Minh Hoai Nguyen||
Title: Unlabeled data: Now it helps, now it doesn't
Title: Interactive Cutaway Illustrations of Complex 3D Models
Abstract: From the authors: "We present a system for authoring and viewing interactive cutaway illustrations of complex 3D models using conventions of traditional scientific and technical illustration. Our approach is based on the two key ideas that 1) cuts should respect the geometry of the parts being cut, and 2) cutaway illustrations should support interactive exploration. In our approach, an author instruments a 3D model with auxiliary parameters, which we call 'rigging,' that define how cutaways of that structure are formed. We provide an authoring interface that automates most of the rigging process. We also provide a viewing interface that allows viewers to explore rigged models using high-level interactions. In particular, the viewer can just select a set of target structures, and the system will automatically generate a cutaway illustration that exposes those parts. We have tested our system on a variety of CAD and anatomical models, and our results demonstrate that our approach can be used to create and view effective interactive cutaway illustrations for a variety of complex objects with little user effort."
I will present the following paper in this week's misc-reading group:
Flexible Depth of Field Photography by H. Nagahara, S. Kuthirummal, C. Zhou, and S.K. Nayar. ECCV'2008
I'll be discussing: Pinto N, Cox DD, and DiCarlo JJ. Why is Real-World Visual Object Recognition Hard? PLoS Computational Biology, 4(1):e27 (2008).
In the incendiary language of the authors themselves: "Recent computational models have sought to match humans' remarkable visual abilities, and, using large databases of 'natural' images, have shown apparently impressive progress. Here we show that caution is warranted. In particular, we found that a very simple neuroscience 'toy' model, capable only of extracting trivial regularities from a set of images, is able to outperform most state-of-the-art object recognition systems on a standard 'natural' test of object recognition. At the same time, we found that this same toy model is easily defeated by a simple recognition test that we generated to better span the range of image variation observed in the real world. Together these results suggest that current 'natural' tests are inadequate for judging success or driving forward progress. In addition to tempering claims of success in the machine vision literature, these results point the way forward and call for renewed focus on image variation as a central challenge in object recognition."
|2/18/2009||Stephen T. Nuske||
Title: Visual Localisation in Dynamic Non-uniform Lighting
For vision to succeed as a perceptual mechanism in general field robotic applications, vision systems must overcome the difficult challenges presented by varying lighting conditions. Many current approaches rely on decoupling the effects of lighting, which is not possible in many situations; this is not surprising, considering an image is fundamentally an array of light measurements. This talk will describe two visual localisation systems, each built for a different field robot application and designed to address the lighting challenges of its application environment.
The first visual localisation system discussed is for industrial ground vehicles operating outdoors. The system employs an invariant map combined with a robust localisation algorithm and an intelligent exposure control algorithm which together permit reliable localisation in a wide range of outdoor lighting conditions.
The second system discussed is for submarines navigating underwater structures, where the only light source is a spotlight mounted onboard the vehicle. The proposed system explicitly models the light source within the localisation framework, which serves to predict the changing appearance of the structure. Experiments reveal that understanding the effects of the lighting can help solve this difficult visual localisation scenario.
The results of the two systems are encouraging, given the extremely challenging dynamic non-uniform lighting in each environment, and both systems are being developed further with industry partners.
Stephen's research is in vision systems for mobile robots, focusing on the creation of practical systems that can deal with the problems arising from dynamic non-uniform lighting conditions. Stephen began his undergraduate studies at the University of Queensland, Australia, in Software Engineering. His undergraduate thesis was on the vision system for the university's robot soccer team that placed second at the RoboCup in Portugal. During his undergraduate years he gained work experience at BSD Robotics, a company that develops equipment for automated medical laboratories. After receiving his undergraduate degree, Stephen began a PhD based at the Autonomous Systems Laboratory at CSIRO in Australia. He spent three months during his PhD at INRIA, a French national institute for computer science, in Grenoble. Stephen has now started a PostDoc position here at CMU in the Field Robotics Center under Sanjiv Singh.
I will present the work of the Intelligent Autonomous Systems Group at the TU Munich. The work in our group deals with perceiving, interpreting and executing everyday manipulation tasks in a household environment, including the vision system for object recognition and human pose tracking, 3D environment maps created from laser scans, grounded knowledge processing, multi-level action understanding, and robot task and manipulation planning. Our goal is to cover the whole Perception-Cognition-Action loop and build robots that can act autonomously in human environments.
|3/11/2009||Spring Break||No meeting|
I will talk about my work that I contributed to this year's CVPR. It is titled Geometric Reasoning for Single Image Structure Recovery, and here is the abstract:
We study the problem of generating plausible interpretations of a scene from a collection of line segments automatically extracted from a single indoor image. We show that we can recognize the three dimensional structure of the interior of a building, even in the presence of occluding objects. Several physically valid structure hypotheses are proposed by geometric reasoning and verified to find the best fitting model to line segments, which is then converted to a full 3D model. Our experiments demonstrate that our structure recovery from line segments is comparable with methods using full image appearance. Our approach shows how a set of rules describing geometric constraints between groups of segments can be used to prune scene interpretation hypotheses and to generate the most plausible interpretation.
I will be presenting the work of:
Ivan Laptev, Marcin Marszałek, Cordelia Schmid & Benjamin Rozenfeld
The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
I will talk about the following journal paper:
J.M. Morel and G.Yu, ASIFT: A New Framework for Fully Affine Invariant Image Comparison, to appear in SIAM Journal on Imaging Sciences, 2009.
Abstract. If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence the solid object recognition problem has often been led back to the computation of affine invariant image local features. Such invariant features could be obtained by normalization methods, but no fully affine normalization method exists for the time being. Even scale invariance is only dealt with rigorously by the SIFT method. By simulating zooms out and normalizing translation and rotation, SIFT is invariant to four out of the six parameters of an affine transform. The method proposed in this paper, Affine-SIFT (ASIFT), simulates all image views obtainable by varying the two camera axis orientation parameters, namely the latitude and the longitude angles, left over by the SIFT method. Then it covers the other four parameters by using the SIFT method itself. The resulting method will be mathematically proved to be fully affine invariant. Against any prognosis, simulating all views depending on the two camera orientation parameters is feasible with no dramatic computational load. A two-resolution scheme further reduces the ASIFT complexity to about twice that of SIFT. A new notion, the transition tilt, measuring the amount of distortion from one view to another is introduced. While an absolute tilt from a frontal to a slanted view exceeding 6 is rare, much higher transition tilts are common when two slanted views of an object are compared (see Fig. 1.1). The attainable transition tilt is measured for each affine image comparison method. The new method permits to reliably identify features that have undergone transition tilts of large magnitude, up to 36 and higher. This fact is substantiated by many experiments which show that ASIFT outperforms significantly the state-of-the-art methods SIFT, MSER, Harris-Affine, and Hessian-Affine.
I will present:
Lampert, C. H., M. B. Blaschko and T. Hofmann: Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, CVPR 2008 (best paper)
Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.
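The branch-and-bound idea in this abstract can be sketched for the simplest case, where the classifier score of a window is the sum of per-pixel scores. The abstract does not state the bounding function. The bound used below (positive scores summed over the largest rectangle in a candidate set, plus negative scores summed over the smallest one) is an assumption that is valid for such additive scores. All function names are hypothetical.

```python
import heapq


def integral(grid):
    """Summed-area table with a zero border: S[i][j] sums grid[:i][:j]."""
    h, w = len(grid), len(grid[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            S[i + 1][j + 1] = grid[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S


def box_sum(S, t, b, l, r):
    """Sum over rows t..b and columns l..r (inclusive); 0 for an empty box."""
    if t > b or l > r:
        return 0.0
    return S[b + 1][r + 1] - S[t][r + 1] - S[b + 1][l] + S[t][l]


def best_subwindow(scores):
    """Return (best score, (top, bottom, left, right)) over non-empty windows."""
    h, w = len(scores), len(scores[0])
    pos = integral([[max(v, 0.0) for v in row] for row in scores])
    neg = integral([[min(v, 0.0) for v in row] for row in scores])

    def upper_bound(state):
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = state
        # Positives of the largest box plus negatives of the smallest box.
        return box_sum(pos, t0, b1, l0, r1) + box_sum(neg, t1, b0, l1, r0)

    # A state holds one interval each for the top, bottom, left and right edge.
    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-upper_bound(start), start)]
    while heap:
        neg_bound, state = heapq.heappop(heap)      # most promising set first
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:                        # a single rectangle: done
            t, b, l, r = (lo for lo, _ in state)
            return -neg_bound, (t, b, l, r)
        k = widths.index(max(widths))               # split the widest interval
        lo, hi = state[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:k] + (part,) + state[k + 1:]
            (t0, _), (_, b1), (l0, _), (_, r1) = child
            if t0 <= b1 and l0 <= r1:               # keep sets with a valid box
                heapq.heappush(heap, (-upper_bound(child), child))
    raise ValueError("scores must be a non-empty grid")


scores = [[-1, 2, -1],
          [-1, 3, 1],
          [-2, -1, -4]]                              # hypothetical pixel scores
best_score, best_box = best_subwindow(scores)
```

Because the bound is exact for a single rectangle and never underestimates the best rectangle in a set, the first single-rectangle state popped from the priority queue is globally optimal. On this small grid the best score is 5.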
|5/6/2009||Yuandong Tian||Internal only, see email for details.|
|5/13/2009||Fernando de la Torre||Canceled due to schedule conflict.|
Nonparametric Scene Parsing: Label Transfer via Dense Scene Alignment
Ce Liu, Jenny Yuen, Antonio Torralba
We consider the problem of imaging a scene with a given depth of field at a given exposure level in the shortest amount of time possible. We show that by (1) collecting a sequence of photos and (2) controlling the aperture, focus and exposure time of each photo individually, we can span the given depth of field in less total time than it takes to expose a single narrower-aperture photo. Using this as a starting point, we obtain two key results. First, for lenses with continuously-variable apertures, we derive a closed-form solution for the globally optimal capture sequence, i.e., that collects light from the specified depth of field in the most efficient way possible. Second, for lenses with discrete apertures, we derive an integer programming problem whose solution is the optimal sequence. Our results are applicable to off-the-shelf cameras and typical photography conditions, and advocate the use of dense, wide-aperture photo sequences as a light-efficient alternative to single-shot, narrow-aperture photography.
|9/30/2009||---||ICCV09 no meeting|
Title: Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object's relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba's proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems.
|11/11/2009||Mohit Gupta||I plan to talk about Optical Illusions for this week's misc-group meeting. I will show a few kinds of illusions (brightness and contrast illusions, relative motion illusions, shadow illusions etc.) along with explanations for some of them.|
|11/18/2009||---||CVPR10 deadline, no meeting|
|12/2/2009||Minh Hoai Nguyen||TBA|