Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

Alex Gray - joint work with Bernd Fischer and Johann Schumann, NASA Ames, and Wray Buntine, Helsinki Institute for IT.

Abstract

  Machine learning has reached a point where most probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm for different Bayesian network structures. This enables the systematic derivation of algorithms customized for different models. Here, we demonstrate the AutoBayes system, which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable C or Matlab code implementing that algorithm.

This capability is far beyond that of code collections such as Matlab toolboxes, or even tools for model-independent inference such as BUGS, which performs Gibbs sampling: complex new algorithms can be generated without new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient, commented code can be generated for different languages or systems. We present a number of examples of automatically-derived algorithms, ranging from closed-form solutions of Bayesian textbook problems to recently-proposed EM algorithms for clustering, regression, and a multinomial form of PCA.
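The clustering case mentioned above is the classic mixture-of-Gaussians instance of EM. As a point of reference for the kind of specialized algorithm AutoBayes derives, here is a minimal hand-written sketch of 1-D Gaussian-mixture EM (this is illustrative code, not AutoBayes output; the function name and initialization scheme are my own choices):

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """EM for a 1-D Gaussian mixture: alternate the E-step
    (posterior responsibilities) with the closed-form M-step."""
    n = len(x)
    # Initialization (illustrative choice): uniform weights,
    # means at evenly spaced quantiles, shared data variance.
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, [(j + 1) / (k + 1) for j in range(k)])
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibility r[i, j] of component j for point i.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates,
        # which are closed-form for Gaussian components.
        nj = r.sum(axis=0)
        w = nj / n
        mu = (r * x[:, None]).sum(axis=0) / nj
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nj
    return w, mu, var
```

For a different model structure (e.g., regression mixtures or multinomial PCA), the E- and M-step formulas change, and deriving them by hand is exactly the symbolic work that AutoBayes automates.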

The AutoBayes project was begun in 1995 by Dr. Wray Buntine (one of the pioneers of this general perspective in machine learning); the team also includes Dr. Bernd Fischer and Dr. Johann Schumann (experts in automated software engineering) at NASA Ames, and myself. I joined the project in 2001 with the goal of pushing AutoBayes to the point of becoming a serious tool for the machine learning research community. To this end, I have been working to extend the system's capabilities with algorithmic schemas of recent interest in machine learning research, including particle filters, Kalman filter variants, and fast tree-based learning algorithms.


Charles Rosenberg
Last modified: Mon Dec 9 10:46:55 EST 2002