Computer Vision Misc Reading Group
Wednesdays, 4:30 - 6:00, Intel Lab (Top Floor, Collaborative Innovation Center)



2008 Schedule


Date Presenter Description
1/9/2008 TBA TBA
1/16/2008 Cancelled Scheduling mix-up. This meeting is cancelled.
1/23/2008 Dhruv Batra I'm going to talk about this recent paper from ICCV '07 -- Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik. Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification.

This is an improvement over their previous work -- Andrea Frome, Yoram Singer, Jitendra Malik. Image Retrieval and Recognition Using Local Distance Functions. NIPS 2006. A good background reference (which I won't necessarily talk about) is -- M. Schultz and T. Joachims, Learning a Distance Metric from Relative Comparisons, NIPS 2003.

1/30/2008 Dave Bradley I will be talking about deep learning in general and Restricted Boltzmann Machines (RBMs) in particular. Deep learning refers to learning machines that have several intermediate representation layers. Traditionally, with the exception of convolutional neural networks, these deep machines have been hard to train. Recently, however, greedy layer-wise pretraining with unsupervised data has been found to produce good results for a variety of deep architectures. I will be covering some recent work presented at NIPS that uses RBMs in deep architectures to model natural images (Osindero and Hinton 2007) and recognize the orientation of faces (Salakhutdinov and Hinton 2007). I will also cover an interesting empirical evaluation of RBMs that was presented at ICML (Larochelle et al. 2007) and takes a more critical view of the challenges remaining in deep learning.


Osindero, S. and Hinton, G. E. Modeling image patches with a directed hierarchy of Markov random fields. Advances in Neural Information Processing Systems 20, 2007.

Salakhutdinov, R. R. and Hinton, G. E. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes. Advances in Neural Information Processing Systems 20, 2007.

Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra and Yoshua Bengio, An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation, International Conference on Machine Learning proceedings, 2007

2/6/2008 Henry Kang I will talk about learning a novel kernel from multiple kernels and its application in recognition tasks. In the computer vision community, Support Vector Machines have been widely used. Recent research has focused on engineering good kernels for each recognition task, such as the Pyramid Matching Kernel and the Spatial Pyramid Matching Kernel. At ICCV 2007, two accepted papers discuss generating a novel kernel by learning from multiple kernels. The key idea is that different kernels may strike different trade-offs between discriminative power and invariance, so it is advantageous to combine them linearly to form a new kernel.

Application Papers:

M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. International Conference on Computer Vision, October 2007.

Ankita Kumar, Cristian Sminchisescu. Support Kernel Machines for Object Recognition. ICCV 2007

Background in Machine Learning community:

O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for Support Vector Machines. Machine Learning, 46:131-159, 2002.

A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML, 2007.

F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In NIPS, 2004.
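
As a rough illustration of the key idea in this talk (combining several base kernels linearly into one new kernel), the sketch below evaluates K(x, y) = sum_k d_k K_k(x, y) for fixed non-negative weights d_k. The choice of base kernels, the weight values, and all function names are assumptions made for illustration; the papers listed above learn the weights from data, which this sketch does not attempt.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, y, kernels, weights):
    """Weighted sum of base kernels: sum_k weights[k] * kernels[k](x, y).

    Non-negative weights keep the combination a valid (positive
    semi-definite) kernel whenever each base kernel is valid.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


def gram_matrix(points, kernels, weights):
    """Gram matrix of the combined kernel over a list of points."""
    return [[combined_kernel(p, q, kernels, weights) for q in points]
            for p in points]
```

The resulting Gram matrix could be handed to any SVM solver that accepts a precomputed kernel; choosing the weights well is the part the listed papers address.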

2/13/2008 Sanjeev Koppal The paper I am presenting is: Multi-View Stereo for Community Photo Collections

It can be found at:

2/20/2008 Ankur Datta I am presenting the following cute ICCV 2007 paper:

Synthetic Aperture Tracking: Tracking through Occlusions

Abstract: Occlusion is a significant challenge for many tracking algorithms. Most current methods can track through transient occlusion, but cannot handle significant extended occlusion when the object's trajectory may change significantly. We present a method to track a 2D object through significant occlusion using multiple nearby cameras (e.g., a camera array). When an occluder and object are at different depths, different parts of the object are visible or occluded in each view due to parallax. By aggregating across these views, the method can track even when any individual camera observes very little of the target object. Implementation-wise, the methods are straightforward and build upon established single-camera algorithms. They do not require explicit modeling or reconstruction of the scene and enable tracking in complex, dynamic scenes with moving cameras. Analysis of accuracy and robustness shows that these methods are successful when upwards of 70% of the object is occluded in every camera view. To the best of our knowledge, this system is the first capable of tracking in the presence of such significant occlusion.

Jean-Francois Lalonde Continuing the ICCV'07 review, I will be talking about the following paper from Yael Pritch et al.:

Webcam synopsis: peeking around the world

Here is the abstract:

The world is covered with millions of webcams. Some are private, but many transmit everything in their field of view over the internet 24 hours a day. A web search finds public webcams in airports, intersections, classrooms, parks, shops, ski resorts, and more. These public webcams are an endless resource, and some sites are already mapping them by location or by functionality. But when a webcam is selected - most of the video broadcast will be of no interest due to lack of activity. We propose to generate a short video that will be a synopsis of an infinite video stream, such as generated by a webcam. We would like to address queries like "I would like to watch in one minute the highlights of this camera broadcast during the past day". The two major phases are: (i) An online conversion of the video stream into a searchable structure based on objects and activities (rather than frames). (ii) A response phase, generating the video synopsis as a response to the user's query. To include maximum information in a short synopsis we simultaneously show activities that may have happened at different times. The synopsis video can also be used as an index into the original video stream, restoring the chronological order.

3/5/2008 Stano Funiak In my talk, I will cover two papers. First, I will briefly review the Maximum Variance Unfolding (MVU) method for dimensionality reduction:

Learning a kernel matrix for nonlinear dimensionality reduction. K. Q. Weinberger, F. Sha, and L. K. Saul. ICML 2004.

Then I will talk about a recent paper that allows one to integrate side information (such as class labels) into the embedding and provides a theoretical justification for MVU:

Colored Maximum Variance Unfolding. Le Song, Alex Smola, Karsten Borgwardt, Arthur Gretton. NIPS 2007.

3/19/2008 Jonathan Huang I'll present: A novel set of rotationally and translationally invariant features for images based on the non-commutative bispectrum by Risi Kondor

Abstract: We propose a new set of rotationally and translationally invariant features for image or pattern recognition and classification. The new features are cubic polynomials in the pixel intensities and provide a richer representation of the original image than most existing systems of invariants. Our construction is based on the generalization of the concept of bispectrum to the three-dimensional rotation group SO(3), and a projection of the image onto the sphere.

3/26/2008 Yaser Sheikh I'll be presenting Lee and Elgammal, from ICCV 2007.
4/2/2008 Christopher Geyer Here's the paper and abstract:

Deformable Template As Active Basis by Ying Nian Wu, Zhangzhang Si, Chuck Fleming, and Song-Chun Zhu in ICCV 2007 (this paper won honorable mention at ICCV '07)

Research homepage:

This article proposes an active basis model and a shared pursuit algorithm for learning deformable templates from image patches of various object categories. In our generative model, a deformable template is in the form of an active basis, which consists of a small number of Gabor wavelet elements at different locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate each individual training or testing example. The active basis model can be learned from training image patches by the shared pursuit algorithm. The algorithm selects the elements of the active basis sequentially from a dictionary of Gabor wavelets. When an element is selected at each step, the element is shared by all the training examples, in the sense that a perturbed version of this element is added to improve the encoding of each example. Our model and algorithm are developed within a probabilistic framework that naturally embraces wavelet sparse coding and random field.

4/9/2008 Minh Hoai Nguyen Title: Image-based Shaving

Authors: Minh Hoai Nguyen, Jean-Francois Lalonde, Alexei Efros, Fernando De la Torre.

Abstract: Many categories of objects, such as human faces, can be naturally viewed as a composition of several different layers. For example, a bearded face with glasses can be decomposed into three layers: a layer for glasses, a layer for the beard and a layer for other permanent facial features. While modeling such a face with a linear subspace model could be very difficult, layer separation allows for easy modeling and modification of certain structures while leaving others unchanged. In this paper, we present a method for automatic layer extraction and its applications to face synthesis and editing. Layers are automatically extracted by utilizing the differences between subspaces and modeled separately. We show that our method can be used for tasks such as beard removal (virtual shaving), beard synthesis, and beard transfer, among others.


4/16/2008 Gunhee Kim This talk serves as my speaking requirement for the MS degree.

Thesis: Link analysis techniques for object modeling and recognition

A copy of the thesis is available: see email for details. (The link is available from Tuesday. Since the ECCV paper is still under review, *please do not redistribute!*)


This paper proposes a novel approach for unsupervised modeling and recognition of object categories in which we first build a large-scale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in which categories, directly from the graph by using link analysis techniques. The link analysis techniques are based on well-established graph mining techniques used in diverse applications such as WWW, bioinformatics, and social networks. The techniques operate directly on the patterns of connections between features in the graph rather than on statistical properties, e.g., from clustering in feature space. We argue that the resulting techniques are simpler, and we show that they perform similarly or better compared to state-of-the-art techniques on both common and more challenging data sets. Also, we extend this link analysis idea to combine it with the statistical framework of topic contents. By doing so, our approach not only dramatically increases performance but also provides feasible solutions to some persistent problems of statistical topic models based on bag-of-words representation such as modeling of geometric information, computational complexity, and the inherent ambiguity of visual words. Our approach can be incorporated into any generative model, but here we consider two popular models, pLSA and LDA. Experimental results show that the topic models updated by adding link analysis terms significantly outperform the standard pLSA and LDA models. Furthermore, we present competitive performance on unsupervised modeling, classification, and localization tasks with datasets such as MSRC and PASCAL2005.

Thesis Committee Members: Martial Hebert (Chair), Christos Faloutsos (CSD), Marius Leordeanu

4/23/2008 Santosh Kumar Divvala I will be presenting a small paper that we contributed to the CVPR 2008 workshop.

"Using lots of unlabelled data to help single-view geometry estimation" Abstract: We describe a preliminary investigation of utilising large amounts of unlabelled image data to help in the estimation of rough scene layout. We take the single-view geometry estimation system of Hoiem et al. as the baseline and see if it is possible to improve its performance by considering a set of similar scenes gathered from the web. The two complementary approaches being considered are 1) improving surface classification by using average geometry estimated from the matches, and 2) improving surface segmentation by injecting segments generated from the average of the matched images. The system is evaluated using the labelled 300-image dataset of Hoiem et al. and shows promising results.

I am still in the process of updating the camera-ready copy of the paper. However, I have put up a version exclusively for tomorrow's Misc-read audience (see email for details).

4/30/2008 meeting postponed ---
5/7/2008 Pyry Matikainen I'll be briefly presenting some of my own research. Abstract: Determining the motion consistency between two video clips is a key component for many applications such as video event detection and human pose estimation. Shechtman and Irani recently proposed a method for measuring the motion consistency between two videos by representing the motion about each point with a space-time Harris matrix of spatial and temporal derivatives. A motion-consistency measure can be accurately estimated without explicitly calculating the optical flow from the videos, which could be noisy. However, the motion consistency calculation is computationally expensive and it must be evaluated between all possible pairs of points between the two videos. We propose a novel quantization method for the space-time Harris matrices that reduces the consistency calculation to a fast table lookup for any arbitrary consistency measure. We demonstrate that for the continuous rank drop consistency measure used by Shechtman and Irani, our quantization method is much faster and achieves the same accuracy as the existing approximation.
5/14/2008 David Lee I will summarize Stanford's effort on 3D reconstruction from a single image.

A dynamic Bayesian network model for autonomous 3d reconstruction from a single indoor image. Delage E., Lee H., Ng A. Y. CVPR 2006.

Automatic Single-Image 3d Reconstructions of Indoor Manhattan World Scenes. Delage E., Lee H., Ng A. Y. ISRR 2005.

Learning Depth from Single Monocular Images, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In Neural Information Processing Systems (NIPS) 18, 2005

Depth Estimation using Monocular and Stereo Cues, Ashutosh Saxena, Jamie Schulte, Andrew Y. Ng. In International Joint Conference on Artificial Intelligence (IJCAI), 2007

3-D Depth Reconstruction from a Single Still Image, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. International Journal of Computer Vision (IJCV), Aug 2007

Learning 3-D Scene Structure from a Single Still Image, Ashutosh Saxena, Min Sun, Andrew Y. Ng, In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007. (Best paper award.)

3-D Reconstruction from Sparse Views using Monocular Vision, Ashutosh Saxena, Min Sun, Andrew Y. Ng, In ICCV workshop on Virtual Representations and Modeling of Large-scale environments (VRML), 2007

5/21/2008 Santosh Kumar Divvala I will be talking about "Co-Training". I will be presenting the general idea behind this interesting concept and then talk about a few example works that have used this idea. Main Paper: "Combining Labeled and Unlabeled Data with Co-Training" by Blum and Mitchell in COLT'98

A few example papers:

"Unsupervised Improvement of Visual Detectors using Co-Training" in ICCV'03

"Online Detection and Classification of moving objects using progressively improving detectors" in CVPR'05

"Co-Tracking using semi-supervised support vector machines" in ICCV'07

5/28/2008 Daniel Munoz I will be presenting a synthesis of recent work done by the Oxford Brookes Vision Group on minimizing submodular energy functions for inference in Markov Random Fields.

Topics covered include:

- approximating optimal solutions for certain high-order, multi-label energy functions

- finding optimal solutions for certain multi-label energy functions

from the following papers:

- Pushmeet Kohli, M. Pawan Kumar, Philip Torr. P3 & Beyond: Solving Energies with Higher Order Cliques. CVPR 2007.

- Pushmeet Kohli, Lubor Ladicky, Philip Torr. Graph Cuts for Minimizing Robust Higher Order Potentials. Technical report, Oxford Brookes University, 2008.

- Pushmeet Kohli, Lubor Ladicky, Philip Torr. Robust Higher Order Potentials for Enforcing Label Consistency. CVPR 2008.

- Srikumar Ramalingam, Pushmeet Kohli, Karteek Alahari, Philip Torr. Exact Inference in Multi-label CRFs with Higher Order Cliques. CVPR 2008.

These papers can be found at Kohli's website:

7/15/2008 All CVPR overview.

Check the papers discussed and suggested at:

8/20/2008 Bryan Russell SIFT flow: dense correspondence across different scenes. C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman

To appear in European Conference on Computer Vision (ECCV) 2008, Oral presentation

Abstract: While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes. For a query image, histogram intersection on a bag-of-visual-words representation is used to find the set of nearest neighbors in the database. The SIFT flow algorithm then consists of matching densely sampled SIFT features between the two images, while preserving spatial discontinuities. The use of SIFT features allows robust matching across different scene/object appearances and the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach is able to robustly align complicated scenes with large spatial distortions. We collect a large database of videos and apply the SIFT flow algorithm to two applications: (i) motion field prediction from a single static image and (ii) motion synthesis via transfer of moving objects.
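
The retrieval step mentioned in the abstract (histogram intersection on a bag-of-visual-words representation) can be sketched as follows. This is only a minimal illustration under stated assumptions: histograms are plain lists of word counts, they are L1-normalized before comparison, and the function names are hypothetical. It does not cover the SIFT flow matching itself.

```python
def normalize(hist):
    """L1-normalize a histogram; an all-zero histogram stays all zero."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [v / total for v in hist]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def nearest_neighbors(query, database, k):
    """Indices of the k database histograms most similar to the query.

    Ties are broken by the lower index so the result is deterministic.
    """
    q = normalize(query)
    scored = [(histogram_intersection(q, normalize(h)), i)
              for i, h in enumerate(database)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

For example, with the query `[2, 0, 2]` and a database of `[[1, 0, 1], [0, 4, 0], [1, 1, 2]]`, the first entry has the same normalized histogram as the query and is ranked first.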

8/27/2008 Tom Stepleton My topic will be modeling time series with the help of Dirichlet processes, with excursions introducing (1) a really easy way to think about Dirichlet distributions and (hierarchical) Dirichlet processes and (2) a framework for partitioning integer-weighted graphs with Pólya processes (which are very similar to Dirichlet processes). Demonstrations will show some results on clustering motions from video, but bear in mind, these are machine learning results, not computer vision results!

M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The Infinite Hidden Markov Model. NIPS 2001.

Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet Processes. J. Am. Stat. Assoc., 101(476), 2006.

9/3/2008 Fernando de la Torre I will lead the presentation of this paper:



From the recovery of structure from motion to the separation of style and content, many problems in computer vision have been successfully approached by using bilinear models. The reason for the success of these models is that a globally optimal decomposition is easily obtained from the Singular Value Decomposition (SVD) of the observation matrix. However, in practice, the observation matrix is often incomplete, the SVD can not be used, and only suboptimal solutions are available. The majority of these solutions are based on iterative local refinements of a given cost function, and lack any guarantee of convergence to the global optimum. In this paper, we propose a globally optimal solution, for particular patterns of missing entries. To achieve this goal, we re-formulate the problem as the minimization of the spectral norm of the matrix of residuals, i.e., we seek the completion of the observation matrix such that the largest singular value of its difference to a low rank matrix is the smallest possible. The class of patterns of missing entries we deal with is known as the Young diagram, which includes, as particular cases, many relevant situations, such as the missing of an entire submatrix. We describe experiments that illustrate how our globally optimal solution has impact in practice.
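
To make the cost function in the abstract concrete, the sketch below computes the spectral norm (largest singular value) of a residual matrix, i.e. the difference between a completed observation matrix and a low-rank approximation. It uses power iteration on A^T A in plain Python. The helper names, the fixed iteration count, and the start vector are assumptions for illustration, and this is not the paper's optimization method, only the quantity it minimizes.

```python
import math


def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def residual(m, low_rank):
    """Entry-wise difference between two matrices of equal shape."""
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(m, low_rank)]


def spectral_norm(a, iterations=200):
    """Largest singular value of a, via power iteration on A^T A.

    Assumes the start vector is not orthogonal to the top right
    singular vector; a non-uniform start makes that unlikely.
    """
    at = transpose(a)
    n = len(a[0])
    v = [float(i + 1) for i in range(n)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    for _ in range(iterations):
        w = mat_vec(at, mat_vec(a, v))  # (A^T A) v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # v lies in the null space (e.g. zero matrix)
        v = [x / norm for x in w]
    av = mat_vec(a, v)
    return math.sqrt(sum(x * x for x in av))
```

A completion of the missing entries is better, in the sense described above, when `spectral_norm(residual(m, low_rank))` is smaller.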

9/10/2008 Marius Leordeanu Canceled
9/17/2008 James Hays Here's a list of papers I'll mention, although I won't spend much time on some of them -- Motion Invariant Photography. Anat Levin, Peter Sand, Taeg Sang Cho, Frédo Durand, William T. Freeman

Unwrap Mosaics: A new representation for video editing (Project). Alex Rav-Acha (Weizmann Institute of Science), Pushmeet Kohli, Andrew Fitzgibbon, Carsten Rother (Microsoft Research Cambridge)

High-Quality Motion Deblurring From a Single Image. Qi Shan, Jiaya Jia (Chinese University of Hong Kong), Aseem Agarwala (Adobe Systems, Inc.)

Progressive Inter-scale and intra-scale Non-blind Image Deconvolution. Lu Yuan (Hong Kong University of Science & Technology), Jian Sun (Microsoft Research Asia), Long Quan (Hong Kong University of Science & Technology), Heung-Yeung Shum (Microsoft Research Asia)

Factoring Repeated Content Within and Among Images. Huamin Wang (Georgia Institute of Technology), Yonatan Wexler, Eyal Ofek (Microsoft), Hugues Hoppe (Microsoft Research)

Finding Paths through the World's Photos. Noah Snavely, Rahul Garg, Steven M. Seitz (University of Washington), Richard Szeliski (Microsoft Research)

Improved Seam Carving for Video Retargeting. Michael Rubinstein (Tel-Aviv University), Ariel Shamir (The Interdisciplinary Center), Shai Avidan (Adobe Systems Inc.)

Face Swapping: Automatic Face Replacement in Photographs. Dmitri Bitouk, Neeraj Kumar, Samreen Dhillon, Peter Belhumeur, Shree Nayar

Data-driven enhancement of facial attractiveness. Tommer Leyvand, Daniel Cohen-Or, Gideon Dror, Dani Lischinski

Self-Animating Images: Illusory Motion Using Repeated Asymmetric Patterns. Ming-Te Chi, Tong-Yee Lee (National Cheng-Kung University, Taiwan), Yingge Qu, Tien-Tsin Wong (Chinese University of Hong Kong)

9/24/2008 Alyosha Efros I will briefly talk about the main ideas in the following two upcoming ECCV papers:

* Histogram-based Image Segmentation in the Presence of Shadows and Highlights

* Priors for Large Photo Collections and What they Reveal about Cameras

10/1/2008 Pete Barnum I'll be discussing two CVPR 2008 papers on tracking.

1. "Drift-free Tracking of Rigid and Articulated Objects" by Juergen Gall, Bodo Rosenhahn, and Hans-Peter Seidel

2. "What Can Missing Correspondences Tell Us About 3D Structure and Motion?" by Christopher Zach, Arnold Irschara, and Horst Bischof

I'm going for a mix of solid results and interesting new ideas. The first is a system paper on rigid and articulated objects. The second is a fun take on outlier detection.

10/8/2008 Dhruv Batra I'm going to give an overview of a few interesting papers from BMVC this year. Here's the list:

Efficiently Combining Contour and Texture Cues for Object Recognition, J. Shotton, A. Blake and R. Cipolla

TVSeg - Interactive Total Variation Based Image Segmentation, M. Unger, T. Pock, W. Trobin, D. Cremers and H. Bischof

Foreground Focus: Finding Meaningful Features in Unlabeled Images, Y.J. Lee and K. Grauman

Combining High-Resolution Images With Low-Quality Videos, F. Schubert and K. Mikolajczyk

From Visual Query to Visual Portrayal, A. Shahrokni, C. Mei, P. Torr and I. Reid

Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts, P. Buehler, M. Everingham, D. Huttenlocher and A. Zisserman

10/15/2008 Jason Saragih I will give an overview of my previous work on discriminative linear face model fitting using a method called Iterative Error Bound Minimization. Some references:

Iterative Error Bound Minimisation for AAM Alignment, J. Saragih and R. Goecke

A Nonlinear Discriminative Approach to AAM Fitting, J. Saragih and R. Goecke

10/22/2008 Hong-Wen (Henry) Kang I will first give a brief overview of recent works in image indexing and fast matching, following the Bag-Of-Words framework.

Next I'll use the following two papers as examples.

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.

O. Chum, J. Philbin, and A. Zisserman, Near duplicate image detection: min-hash and tf-idf weighting, in Proceedings of the British Machine Vision Conference, 2008.

10/29/2008 Sanjeev Koppal Canceled
11/5/2008 Ankur Datta Articulated Multibody Tracking Under Egomotion. S. Gammeter, A. Ess, T. Jaeggli, K. Schindler, B. Leibe, and L. van Gool. European Conference on Computer Vision (ECCV'08)


In this paper, we address the problem of 3D articulated multi-person tracking in busy street scenes from a moving, human-level observer. In order to handle the complexity of multi-person interactions, we propose to pursue a two-stage strategy. A multi-body detection-based tracker first analyzes the scene and recovers individual pedestrian trajectories, bridging sensor gaps and resolving temporary occlusions. A specialized articulated tracker is then applied to each recovered pedestrian trajectory in parallel to estimate the tracked person's precise body pose over time. This articulated tracker is implemented in a Gaussian Process framework and operates on global pedestrian silhouettes using a learned statistical representation of human body dynamics. We interface the two tracking levels through a guided segmentation stage, which combines traditional bottom-up cues with top-down information from a human detector and the articulated tracker's shape prediction. We show the proposed approach's viability and demonstrate its performance for articulated multi-person tracking on several challenging video sequences of a busy inner-city scenario.

11/12/2008 Jean-Francois Lalonde Understanding camera trade-offs through a Bayesian analysis of light field projections by A. Levin, W. T. Freeman, F. Durand, from ECCV 2008


Computer vision has traditionally focused on extracting structure, such as depth, from images acquired using thin-lens or pinhole optics. The development of computational imaging is broadening this scope; a variety of unconventional cameras do not directly capture a traditional image anymore, but instead require the joint reconstruction of structure and image information. For example, recent coded aperture designs have been optimized to facilitate the joint reconstruction of depth and intensity. The breadth of imaging designs requires new tools to understand the tradeoffs implied by different strategies. This paper introduces a unified framework for analyzing computational imaging approaches. Each sensor element is modeled as an inner product over the 4D light field. The imaging task is then posed as Bayesian inference: given the observed noisy light field projections and a prior on light field signals, estimate the original light field. Under common imaging conditions, we compare the performance of various camera designs using 2D light field simulations. This framework allows us to better understand the tradeoffs of each camera type and analyze their limitations.

11/19/2008 Stano Funiak I will present two papers on optimizing a set of rotational constraints among a large number of rigid bodies:

A Tree Parameterization for Efficiently Computing Maximum Likelihood Maps using Gradient Descent. Giorgio Grisetti, Cyrill Stachniss, Slawomir Grzonka, Wolfram Burgard

Efficient Estimation of Accurate Maximum Likelihood Maps in 3D. Giorgio Grisetti, Slawomir Grzonka, Cyrill Stachniss, Patrick Pfaff, Wolfram Burgard

The papers use an idea of parameterizing the poses with a tree structure and distributing the error in a constraint using spherical linear interpolation. Even though the papers are set in the context of SLAM, their treatment is sufficiently general and may be of interest to the misc-reading audience. In particular, their algorithm can handle very poor initializations and converges quickly even on large problems.
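
For readers unfamiliar with spherical linear interpolation (slerp), here is a minimal sketch between two rotations. It assumes rotations are stored as unit quaternions in (w, x, y, z) order, which is a choice made for this example and not necessarily the papers' representation. It shows only the interpolation primitive; how the papers distribute a constraint's error along the tree is not reproduced here.

```python
import math


def slerp(q0, q1, t):
    """Interpolate from unit quaternion q0 (t=0) to q1 (t=1)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip to take the shorter arc.
    if dot < 0.0:
        q1 = [-c for c in q1]
        dot = -dot
    dot = min(1.0, dot)  # guard acos against rounding slightly above 1
    theta = math.acos(dot)
    if theta < 1e-8:
        # Nearly identical rotations: plain linear blend is accurate.
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
    else:
        s = math.sin(theta)
        w0 = math.sin((1.0 - t) * theta) / s
        w1 = math.sin(t * theta) / s
        out = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]
```

Interpolating halfway between the identity and a 90-degree rotation about the z axis gives a 45-degree rotation about z, so a fraction t of a rotational residual can be applied by evaluating `slerp(identity, residual, t)`.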

11/26/2008 --- Thanksgiving break. No meeting.
12/3/2008 All CVPR decompression.
12/10/2008 Marius Leordeanu Unsupervised Learning for Graph Matching

Marius Leordeanu


Graph matching is an important problem in computer vision. It is used in 2D and 3D object matching and recognition. Despite its importance, there is little literature in the field on learning the parameters that define the graph matching problem, even though learning is important for improving the matching rate, as shown by this and other work. In this paper we show for the first time how to perform learning in an unsupervised fashion, that is when no correct correspondences between graphs are given during training. Our algorithm is based on gradient ascent that uses an efficient method for computing the partial derivatives of the principal eigenvector of large symmetric matrices with large eigengap. We also show empirically that unsupervised learning is comparable in efficiency and quality with the supervised one, while avoiding the tedious manual labeling of ground truth correspondences.

12/17/2008 Tomasz Malisiewicz Paper:
Spectral Hashing. Y. Weiss, A. Torralba, R. Fergus. NIPS, 2008.

Semantic hashing seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data-point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the-art.
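
The two ingredients the abstract relies on, thresholding real values into a binary code and comparing codes by Hamming distance, can be sketched as below. The input values stand in for eigenvector (or eigenfunction) responses, which are assumed to be computed elsewhere; the zero threshold and the function names are illustrative assumptions, and nothing here computes a graph Laplacian.

```python
def threshold_code(values, threshold=0.0):
    """Pack one bit per value (1 if value > threshold) into an integer.

    The first value becomes the most significant bit.
    """
    code = 0
    for v in values:
        code = (code << 1) | (1 if v > threshold else 0)
    return code


def hamming_distance(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def nearest_by_hamming(query, codes):
    """Index of the stored code closest to the query (lowest index on ties)."""
    return min(range(len(codes)),
               key=lambda i: (hamming_distance(query, codes[i]), i))
```

Because codes are plain integers, comparing two of them costs one XOR and a bit count, which is what makes lookups over large collections cheap.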

Meetings in Previous Years

Paper Lists from Previous Years

Related Links

This file is located at: /afs/cs/project/vmr/www/misc_read/