Projection Pursuit, Gaussian Scale Mixtures, and the EM Algorithm
The EM algorithm for fitting Gaussian mixture models is one of the
most widely-used clustering methods. Yet, surprisingly little is known
about its behavior. Under what conditions does it converge to a good
local optimum? What are good ways to initialize it, and to merge/remove
intermediate clusters? Such questions are difficult to answer with the
mathematical tools that have traditionally been applied to EM.
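To make the setting concrete, here is a minimal sketch (not from the talk) of EM for a spherical Gaussian mixture with fixed unit variances, using a common "pick random data points as means" initialization; all names and the synthetic data are illustrative.

```python
# Minimal EM sketch for a k-component spherical Gaussian mixture
# (unit covariances held fixed). Data and initialization are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two well-separated spherical Gaussian clusters in R^2.
X = np.vstack([
    rng.normal(loc=[-4.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[+4.0, 0.0], scale=1.0, size=(200, 2)),
])

def em_spherical_gmm(X, k, n_iter=50):
    """EM for a k-component spherical Gaussian mixture with unit variances."""
    n, d = X.shape
    # A common (and, per the talk, sometimes suboptimal) initialization:
    # choose k random data points as the initial means.
    means = X[rng.choice(n, size=k, replace=False)]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] proportional to w_j * N(x_i; mu_j, I).
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = np.log(weights) - 0.5 * sq
        log_r -= log_r.max(axis=1, keepdims=True)  # for numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and means from soft assignments.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
    return weights, means

weights, means = em_spherical_gmm(X, k=2)
print(np.round(sorted(means[:, 0]), 1))  # with this separation, means land near -4 and +4
```

With clusters this well separated, EM typically recovers the true centers; the talk's analysis concerns exactly when such convergence can be guaranteed and when common initializations fail.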
I will describe an alternative, probabilistic way of analyzing EM. This reveals, first of all, that many common methods of initializing EM produce highly suboptimal results even in ideal clustering scenarios. On the other hand, I'll show that a particular variant of EM will provably recover a near-optimal clustering, provided that the clusters are adequately separated and that their distributions are of a certain fairly general form.
The type of cluster distributions allowed is motivated by a new result in projection pursuit, along the lines of the (somewhat mistaken) folklore that "almost all projections of high-dimensional data look Gaussian".
Specifically, I'll show that for any D-dimensional distribution with finite second moments, there is a precise sense in which almost all of its linear projections into d < D dimensions look like a scale-mixture of spherical Gaussians (a mixture of concentric spherical Gaussians sharing a common center but with differing variances).
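To illustrate what such a scale mixture is, here is a hypothetical sampling sketch (not from the talk): draw a scale from a mixing distribution, then draw from a centered spherical Gaussian at that scale.

```python
# Hypothetical sketch of sampling from a scale mixture of spherical
# Gaussians in d dimensions: all components share the same center (0),
# differing only in their scales. Names and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def sample_scale_mixture(n, d, scales, probs):
    """Draw n points from sum_j probs[j] * N(0, scales[j]^2 * I_d)."""
    idx = rng.choice(len(scales), size=n, p=probs)
    return rng.normal(size=(n, d)) * np.asarray(scales)[idx, None]

pts = sample_scale_mixture(10000, 3, scales=[1.0, 3.0], probs=[0.7, 0.3])
# Mean is 0; per-coordinate variance is 0.7 * 1 + 0.3 * 9 = 3.4.
print(pts.shape)
```

The resulting density is spherically symmetric but heavier-tailed than any single Gaussian, which is what distinguishes it from the plain Gaussian of the folklore statement.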
The extent of this effect depends upon the ratio of d to D, and upon the "eccentricity" of the high-dimensional distribution.
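The classical folklore version of this phenomenon is easy to see empirically. The sketch below (my illustration, not from the talk) projects a decidedly non-Gaussian, isotropic D-dimensional distribution onto a random direction and checks that the projection's skewness and excess kurtosis are near zero, as for a Gaussian; the talk's refinement replaces "Gaussian" with "scale mixture of Gaussians" for general distributions.

```python
# Illustration of the folklore that random low-dimensional projections of
# high-dimensional data look Gaussian: project a Rademacher cube (each
# coordinate is +/-1) onto a random unit direction and inspect moments.
import numpy as np

rng = np.random.default_rng(1)
D = 500
n = 20000

# Non-Gaussian source: independent +/-1 coordinates.
X = rng.choice([-1.0, 1.0], size=(n, D))

# Random unit direction in R^D.
u = rng.normal(size=D)
u /= np.linalg.norm(u)

proj = X @ u  # 1-D projection; by CLT-style reasoning it looks Gaussian

# Standardize and measure skewness and excess kurtosis (both 0 for a Gaussian).
z = (proj - proj.mean()) / proj.std()
skew = (z ** 3).mean()
kurt = (z ** 4).mean() - 3.0
print(round(skew, 2), round(kurt, 2))  # both should be close to 0
```

Shrinking D (or making the source distribution more eccentric) makes the projection visibly less Gaussian, matching the dependence described above.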
Dasgupta is an Assistant Professor in the Department of
Computer Science and Engineering at UC San Diego. He received his PhD
from Berkeley, and spent two years at AT&T Research Labs before joining UCSD.
His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He has recently completed a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani).