Thu May 5 1994, WeH 4601, 1:30

Hierarchical Mixtures for Non-Experts
Geoff Gordon

Jordan and Jacobs published a tech report (MIT Cog Sci 9301) titled "Hierarchical Mixtures of Experts and the EM Algorithm". It describes a learning scheme which is a cross between ID3, neural nets, and generalized linear regression. Unfortunately, their discussion is highly dense and mathematical. I've spent some time trying to decipher it, because the algorithm looked cool, and I thought I'd try to pass on the fruits of my labor.

I'll start by defining maximum likelihood estimation and giving a few examples. One of the examples will be linear regression; I'll give a quick derivation to show what the necessary assumptions are. These assumptions are restrictive, so I will introduce generalized linear models (GLIMs). Applying maximum likelihood to GLIMs, along with a numerical approximation, gives a fitting algorithm called iteratively reweighted least squares. Finally, I'll describe Jordan & Jacobs' HMEs and give (without proof) a fitting procedure based on the EM algorithm. This part of the talk will include a small performance comparison with neural nets. If time permits, I'll describe a pruning algorithm for HMEs that I've been working on.
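
For the linear-regression example, the derivation in question is presumably the standard one: assume each target is a linear function of the inputs plus i.i.d. Gaussian noise of fixed variance, and maximizing the likelihood reduces to minimizing squared error. A minimal sketch under those assumptions:

\[
y_i = \beta^\top x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.}
\]
\[
\log L(\beta) = \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(y_i - \beta^\top x_i)^2}{2\sigma^2} \right)
  = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta^\top x_i)^2
\]

so maximizing \(\log L\) over \(\beta\) is the same as minimizing \(\sum_i (y_i - \beta^\top x_i)^2\), i.e. ordinary least squares. The restrictive assumptions are the linear mean, the Gaussian noise, and the constant variance; GLIMs relax the first two.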
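To make the iteratively-reweighted-least-squares step concrete, here is a rough sketch (mine, not taken from the talk or the tech report) of IRLS for one common GLIM, logistic regression. Every iteration solves a weighted least-squares problem whose weights and "working responses" come from the current parameter estimate; the function name and the toy data are made up for illustration.

    import numpy as np

    def irls_logistic(X, y, n_iter=25, tol=1e-8):
        """Fit a logistic-regression GLIM by iteratively reweighted least squares.

        Each iteration is a Newton-Raphson / Fisher-scoring step for the
        binomial log-likelihood, expressed as a weighted least-squares solve.
        """
        n, d = X.shape
        beta = np.zeros(d)
        for _ in range(n_iter):
            eta = X @ beta                      # linear predictor
            mu = 1.0 / (1.0 + np.exp(-eta))     # logistic mean function
            w = np.clip(mu * (1.0 - mu), 1e-10, None)   # variance-based weights
            z = eta + (y - mu) / w              # working response
            # Weighted least squares: solve (X' W X) beta = X' W z
            WX = X * w[:, None]
            beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
            if np.max(np.abs(beta_new - beta)) < tol:
                beta = beta_new
                break
            beta = beta_new
        return beta

    # Tiny made-up example: an intercept column plus one feature.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=100)
        p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
        y = (rng.random(100) < p).astype(float)
        X = np.column_stack([np.ones(100), x])
        print(irls_logistic(X, y))

In an HME the same kind of GLIM fit shows up as the inner loop: the EM algorithm reweights the data for each expert and gating network, and each of those weighted fits can be done with a step like the one above.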