Who?        Sam T. Roweis
Why?        Faculty Candidate
Where from? Gatsby Unit, University College London
What?       Scalable Learning of Nonlinear Manifolds from Data
When?       Wednesday, January 24, 10:00am
Where?      Wean Hall 4623
Web?        http://www.gatsby.ucl.ac.uk/~roweis/

How many numbers does it take to represent a complex object such as an image of a face? Obviously, one number per pixel is enough, but many fewer are actually needed. In fact, there is a thin "submanifold" of faces hiding in the very high dimensional image space. Learning the structure of such manifolds is the problem of nonlinear dimensionality reduction. Its solution enables compression, generation, interpolation, and classification of complex objects.

A first step is to do "embedding": given some high dimensional training data, find a low dimensional representation of each point that preserves the desired relationships. I will introduce locally linear embedding (LLE), a new unsupervised learning algorithm I developed with Lawrence Saul (AT&T Labs), which uses local symmetries and linear reconstructions to compute low dimensional, neighborhood preserving embeddings of multivariate data. The embeddings produced by LLE, unlike those generated by multidimensional scaling (MDS) or principal components analysis (PCA), capture the global structure of nonlinear manifolds. In particular, when applied to images of faces, LLE discovers a coordinate representation of facial attributes; applied to documents of text, it colocates words with similar contexts in a continuous semantic space.

But we are *more ambitious* than just embedding. We want an explicit mapping between the data and embedding spaces that is valid both on and off the training data. In effect, we want a magic box with a few knobs which, when turned, generate all variations of the objects in question (e.g. poses and expressions of faces); but no setting of the knobs should generate an image that is not a face. We also want to be able to show the box an image and have it recommend knob settings that would generate that object. I will describe how, starting only with a large database of examples, we might build such a box by first applying LLE to the data. Finally, I will discuss some of the work I have done to scale up the algorithm so that it works on very large datasets. A minimal sketch of the basic LLE computation appears below.

Host: S. Thrun; contact Sharon Woodside for appointments.
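
For the curious, here is a minimal sketch of the three steps LLE performs (find neighbors, solve for local reconstruction weights, embed via the bottom eigenvectors of the resulting cost matrix). It is an illustration in Python/numpy under standard assumptions, not the speaker's own code, and the parameter names (n_neighbors, n_components, reg) are illustrative.

    import numpy as np

    def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
        # Illustrative sketch of locally linear embedding (LLE);
        # not the speaker's implementation. X: (n_samples, n_features).
        n = X.shape[0]
        # Step 1: each point's nearest neighbors by Euclidean distance.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)          # exclude the point itself
        nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
        # Step 2: weights that best reconstruct each point from its
        # neighbors, constrained to sum to one (the local linear structure).
        W = np.zeros((n, n))
        for i in range(n):
            Z = X[nbrs[i]] - X[i]             # neighbors centered on x_i
            C = Z @ Z.T                       # local Gram matrix
            C += reg * np.trace(C) * np.eye(n_neighbors)  # regularization
            w = np.linalg.solve(C, np.ones(n_neighbors))
            W[i, nbrs[i]] = w / w.sum()
        # Step 3: the embedding minimizes the same reconstruction cost;
        # take the bottom eigenvectors of M = (I - W)^T (I - W),
        # discarding the constant eigenvector whose eigenvalue is ~0.
        M = (np.eye(n) - W).T @ (np.eye(n) - W)
        vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
        return vecs[:, 1:n_components + 1]

    # Example usage on synthetic data:
    X = np.random.randn(500, 10)              # stand-in for real high-D data
    Y = lle(X, n_neighbors=12, n_components=2) # Y has shape (500, 2)

Note that the weight matrix W is sparse (each row has only n_neighbors nonzero entries), which is what makes the eigenvector problem tractable on large datasets when sparse solvers are used.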