12:00, Wed 18 Sep 1996, WeH 7220
Learning to Emulate an Unknown Probability Distribution
Leemon Baird
If a learning system is repeatedly shown random vectors that are
generated independently according to some unknown distribution, it
should be able to learn that distribution. This is simply modeling a
stochastic system. After learning, it should be able to report the
probability density pdf(x) or the cumulative probability cdf(x) for
any vector x. Even more importantly,
it should be able to emulate that distribution by generating random
vectors on its own according to the same distribution. Ideally, this
could be done with an arbitrary function approximator, not just
special-purpose ones (like a sum of Gaussians).
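As a point of reference for the pdf/cdf and emulation goals above, here is a
minimal one-dimensional sketch using an empirical CDF and inverse-transform
sampling. This is a standard illustration, not the algorithm the talk will
derive (which works with arbitrary function approximators); the two-mode
mixture and all names here are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training vectors from an "unknown" distribution (a two-mode mixture here).
data = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                       rng.normal(1.0, 1.0, 5000)])

# Learn cdf(x) as the empirical fraction of training vectors <= x.
sorted_data = np.sort(data)

def cdf(x):
    return np.searchsorted(sorted_data, x, side="right") / len(sorted_data)

# Emulate the distribution: draw uniform noise and push it through the
# inverse of the learned CDF (the empirical quantile function stands in
# for a trained function approximator).
u = rng.uniform(0.0, 1.0, 10000)
samples = np.quantile(sorted_data, u)
```

The generated `samples` then follow (approximately) the same two-mode
distribution as the training data, which is the emulation behavior described
above.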
This talk will derive a very simple algorithm that does just that. It
can also learn conditional probabilities of the form "given I'm in
this state and do this action, what state might I end up in on the
next time step?" That's needed to do reinforcement learning with
arbitrary neural networks if you want guaranteed convergence. This
algorithm can also be used for inverse control for noninvertible
systems where you say "given where I am and where I want to go, give
me one of the multiple actions that will get me there". This
algorithm can also be used for unsupervised learning, where it learns
a function f such that f(x) is uniformly distributed even when x is
not. The talk will end with a discussion of other areas where this
algorithm might be useful.
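The unsupervised-learning goal has a classical one-dimensional instance: the
probability integral transform, where taking f to be the CDF of x's
distribution makes f(x) uniform on [0, 1]. A small sketch, assuming x is
exponentially distributed and using the exact CDF in place of a learned f:

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-uniform input: exponential with rate 1.
x = rng.exponential(1.0, 20000)

# The CDF of that distribution; applying it makes the output uniform
# (probability integral transform). A learned approximation of the CDF
# would play the same role.
def f(x):
    return 1.0 - np.exp(-x)

y = f(x)

# y should look uniform on [0, 1]: mean near 1/2, variance near 1/12.
mean, var = float(y.mean()), float(y.var())
```

This is only the known-distribution special case; the interest of the talk's
algorithm is learning such an f from samples alone.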