Accelerating EM for large high-dimensional databases

Bo Thiesson
Christopher Meek
David Heckerman
Microsoft Research

The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data or incomplete information. The EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present an approach for significantly reducing the computational cost of applying the EM algorithm: A version of the incremental EM algorithm described by Neal and Hinton (1998) which iterates through blocks of data in a cyclic way. The number of cases in each block dramatically effects the efficiency of the algorithm. We show how to easily select a (near) optimal block size. We also demonstrate that the method significantly reduces computational costs for two high-dimensional real-world problems with large databases.

Back to Abstracts Last modified: Fri Nov 19 22:43:16 EST 1999