Who? Sam T. Roweis
Why? Faculty Candidate
Where from? Gatsby Unit, University College London
What? Scalable Learning of Nonlinear Manifolds from Data
When? Wednesday, January 24, 10:00am
Where? Wean Hall 4623
Web? http://www.gatsby.ucl.ac.uk/~roweis/
How many numbers does it take to represent a complex object such as an
image of a face? Obviously, one number per pixel is enough, but many
fewer are actually needed. In fact, there is a thin "submanifold" of
faces hiding in the very high dimensional image space. Learning the
structure of such manifolds is the problem of nonlinear dimensionality
reduction. Its solution allows compression, generation, interpolation
and classification of complex objects.
A first step is "embedding": given some high dimensional
training data, find a low dimensional representation of each point
that preserves desired relationships. I will introduce locally
linear embedding (LLE), a new unsupervised learning algorithm I
developed with Lawrence Saul (AT&T Labs), which uses local symmetries
and linear reconstructions to compute low dimensional, neighborhood
preserving embeddings of multivariate data. The embeddings of LLE,
unlike those generated by multidimensional scaling (MDS) or principal
components analysis (PCA), are able to capture the global structure of
nonlinear manifolds. In particular, when applied to images of faces,
LLE discovers a coordinate representation of facial attributes;
applied to documents of text, it colocates---in a continuous semantic
space---words with similar contexts.
But we are *more ambitious* than just embedding. We want an explicit
mapping between the data and embedding spaces that is valid both on
and off the training data. In effect, we want a magic box that has a
few knobs which, when turned, generate all variations of the objects
in question (e.g. poses and expressions of faces); but no setting of
knobs should generate an image that is not a face. We also want to be
able to show the box an image and have it recommend knob settings
which would generate that object. I will describe how, starting only
with a large database of examples, we might build such a box by first
applying LLE to the data. Finally, I will discuss some of the work I
have done to scale up the algorithm so it works on very large
datasets.
Host: S. Thrun; contact Sharon Woodside for appointments.