Kernelized Sorting

Abstract

Matching pairs of objects is a fundamental operation of unsupervised learning. For instance, we might want to match a photo with a textual description of a person. In those cases it is desirable to have a compatibility function which determines how one set may be translated into the other. For many such instances we may be able to design a compatibility score based on prior knowledge or to observe one based on the co-occurrence of such objects.

In some cases, however, such a match may not exist or it may not be given to us beforehand. That is, while we may have a good understanding of two sources, we may not understand the mapping between the two spaces. For instance, we might have two collections of documents purportedly covering the same content, written in two different languages. Can we determine the correspondence between these two sets of documents without using a dictionary?

We will present a method which is able to perform such matching WITHOUT the need of a cross-domain similarity measure and we shall show that if such measures exist it generalizes normal sorting. Our method relies on the fact that one may estimate the dependence between sets of random variables even without knowing the cross-domain mapping. Various criteria are available. We choose the Hilbert Schmidt Independence Criterion between two sets and we maximize over the permutation group to find a good match. As a side-effect we obtain an explicit representation of the covariance.

We will demonstrate this kernelized sorting using various examples, including image layout, image matching, data attribute matching and multilingual document matching.

Bio

Venue, Date, and Time

Venue: NSH 1507

Date: Monday, September 22, 2008

Time: 12:00 noon

Slides