Subject: CALD Seminar Nov. 14, 2005

 

Refreshments will be served at 4:15p outside of A53 Baker Hall. 


Where:  A53 Baker Hall

Speaker: Ann Lee

Title:  Geometric tools for high-dimensional data analysis
 
Abstract: 

In high-dimensional data analysis, one is often faced with the problem that real data is noisy and in many cases given in coordinates that are not informative for understanding the data structure itself or for performing later tasks, such as clustering, classification and regression. The combination of noise and very high dimensions (such as >1000) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account.  In this talk, I will first describe a data-driven multi-scale basis that can be used for feature extraction of smooth data as well as data where the coordinates may be randomly ordered. I will then, in the second half of my talk, describe a general framework for dimensionality reduction, data set parameterization and clustering that combines many ideas from eigenmaps, spectral graph theory and harmonic analysis. Our construction is based on a Markov random walk on the data, and allows one to define a system of coordinates that is robust to noise, and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. Examples will be taken from image analysis, word-document clustering and spectroscopy. (Part of this work is joint with R.R. Coifman and S. Lafon)