Thu May 5 1994, WeH 4601, 1:30
Hierarchical Mixtures for Non-Experts
Geoff Gordon
Jordan and Jacobs published a tech report (MIT Cog Sci 9301) titled
"Hierarchical Mixtures of Experts and the EM Algorithm". It describes
a learning scheme that is a cross between ID3, neural nets, and
generalized linear regression. Unfortunately, their discussion is
quite dense and mathematical. I've spent some time trying to decipher
it, because the algorithm looked cool, and I thought I'd try to pass
on the fruits of my labor.
I'll start by defining maximum likelihood estimation and giving a few
examples. One of the examples will be linear regression; I'll give a
quick derivation to show what the necessary assumptions are.
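As a taste of that derivation: under the assumption of additive Gaussian noise, maximizing the likelihood of a linear model reduces exactly to ordinary least squares. Here's a minimal numerical sketch of that correspondence (the data, variable names, and noise level are mine, chosen just for illustration):

```python
import numpy as np

# Sketch: maximum likelihood for linear regression.
# Assuming y = X w + eps with eps ~ N(0, sigma^2), the log-likelihood is
#   -(n/2) log(2 pi sigma^2) - ||y - X w||^2 / (2 sigma^2),
# so maximizing over w is exactly least squares.

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])
w_true = np.array([1.0, 2.0])
y = X @ w_true + 0.1 * rng.standard_normal(50)

# ML estimate of w = the ordinary least-squares solution
w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)

# ML estimate of the noise variance (note: divides by n, not n - p)
sigma2_ml = np.mean((y - X @ w_ml) ** 2)
```

The Gaussian-noise assumption is exactly the restrictive part; dropping it is what motivates the generalized models below.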
These assumptions are restrictive, so I will introduce generalized
linear models (GLIMs). Applying maximum likelihood to GLIMs, along with a
numerical approximation, gives a fitting algorithm called iteratively
reweighted least squares.
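To make "iteratively reweighted least squares" concrete, here is a sketch for the most familiar GLIM, logistic regression. Each iteration is a Newton step on the log-likelihood, written as a weighted least-squares solve on a "working response". (The function name and loop count are my choices, not from the tech report.)

```python
import numpy as np

def irls_logistic(X, y, iters=20):
    """Fit logistic regression by iteratively reweighted least squares.

    Each iteration solves a weighted least-squares problem on the
    working response z -- equivalent to a Newton step on the
    log-likelihood.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted means via logistic link
        s = p * (1.0 - p)                  # IRLS weights (per-point variances)
        z = X @ w + (y - p) / s            # working response
        # weighted least squares: solve (X^T S X) w = X^T S z
        WX = X * s[:, None]
        w = np.linalg.solve(X.T @ WX, X.T @ (s * z))
    return w
```

For the ordinary linear/Gaussian case the weights are constant, so IRLS collapses back to a single least-squares solve.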
Finally, I'll describe Jordan & Jacobs' HMEs and give (without proof)
a fitting procedure based on the EM algorithm. This part of the talk
will include a small performance comparison with neural nets.
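To give a flavor of the fitting procedure before the talk: for a one-level (non-hierarchical) mixture of linear experts with a softmax gate, the EM-style updates look roughly like the sketch below. This is my own simplified rendering under Gaussian-expert assumptions, not the full HME recursion from the tech report (which nests gates within gates).

```python
import numpy as np

def e_step(X, y, gate_w, expert_ws, sigma2=1.0):
    """E-step for a one-level mixture of linear experts.

    gate_w:    (k, d) softmax gating weights
    expert_ws: (k, d) per-expert linear regression weights
    Returns h, the (n, k) posterior responsibility of each expert for
    each point: h_ij is proportional to g_j(x_i) * N(y_i | x_i.w_j, sigma2).
    """
    logits = X @ gate_w.T
    g = np.exp(logits - logits.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)        # gating probabilities
    resid = y[:, None] - X @ expert_ws.T     # per-expert residuals
    lik = np.exp(-resid**2 / (2 * sigma2))   # Gaussian expert likelihoods
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)
    return h

def m_step(X, y, h):
    """M-step for the experts: weighted least squares per expert,
    with each point weighted by its responsibility."""
    ws = []
    for j in range(h.shape[1]):
        s = h[:, j]
        WX = X * s[:, None]
        ws.append(np.linalg.solve(X.T @ WX, X.T @ (s * y)))
    return np.array(ws)
```

Note that the expert M-step is itself a (weighted) least-squares problem, which is why the IRLS machinery above slots in directly when the experts are GLIMs rather than Gaussian regressions.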
If time permits, I'll describe a pruning algorithm for HMEs that I've
been working on.