Marina Meila

Andrew Moore

Carnegie Mellon University

Chow and Liu introduced an algorithm for fitting a multivariate distribution with a tree graphical model. The original algorithm is quadratic in the domain dimension n and linear in the number of data points N that define the target distribution. We present two algorithms that aim to accelerate the Chow and Liu algorithm for high dimensional domains and large data sets. The first algorithm constructs a Gaussian tree over a continuous domain using sampling to direct the search towards the most promising edges. The second algorithm is taylored for discrete domains with sparse variables. We show that in the case of sparse data, a tree distribution can be fit exactly in time and memory that is typically subquadratic in both n and N. The algorithm exhibits speed ups of up to three orders of magnitude in experiments.

Back to Abstracts Last modified: Sun Nov 21 08:27:03 EST 1999