Thu 31 Mar, 1:30, WeH 4601

How to Approximate a Value Function
An IPC* by Justin Boyan+

Reinforcement learning is based on dynamic programming; DP is based on lookup tables; and sadly, lookup tables for the high-dimensional domains that reinforcement learning aims to conquer--such as robot control and game-playing--are impossibly big. An obvious approach to this "curse of dimensionality" is to replace the DP lookup table with a smoothing function approximator ("fitter") such as a neural net. Although doing this works brilliantly in the domain of backgammon, it more often works poorly, and there are few theoretical guarantees.

Over the past several months, we've studied the interaction between a simple, model-based DP algorithm and several fitters (polynomial regression, neural nets, memory-based methods) on a variety of simulated domains. We have identified a particularly egregious form of misbehavior, called "hallucination", in which fitter error compounds itself over time to produce ever more atrocious policies. We have also developed a new algorithm which promises to be immune to hallucination yet still able to reap the immense benefits of successful smoothing.

* Incredibly Preliminary Chat
+ joint work with Andrew Moore
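To make the idea concrete, here is a minimal sketch of the general technique the abstract describes: model-based DP backups over a sampled set of states, with a smoothing fitter standing in for the lookup table. The toy 1-D domain, discount factor, and choice of polynomial regression are illustrative assumptions, not details of the speaker's algorithm or experiments.

```python
# Sketch of fitted value iteration: DP backups bootstrapped from a fitted
# value function instead of a lookup table. All domain details are assumed
# for illustration only.
import numpy as np

GAMMA = 0.95            # discount factor (assumed)
ACTIONS = [-0.1, 0.1]   # move left or right along the unit interval (assumed)

def step(s, a):
    """Known model: deterministic move clipped to [0, 1]; reward near the goal at 1.0."""
    s2 = np.clip(s + a, 0.0, 1.0)
    r = 1.0 if s2 >= 0.95 else 0.0
    return s2, r

def fit(states, targets, degree=3):
    """The 'fitter': smooth the backed-up values with a low-degree polynomial."""
    return np.polyfit(states, targets, degree)

def value(coeffs, s):
    return np.polyval(coeffs, s)

# Sample a coarse grid of states instead of enumerating a full table.
states = np.linspace(0.0, 1.0, 21)
coeffs = np.zeros(4)    # start with V(s) = 0 everywhere

for sweep in range(50):
    # One-step DP backup at each sampled state, bootstrapping from the current fit.
    targets = []
    for s in states:
        backups = []
        for a in ACTIONS:
            s2, r = step(s, a)
            backups.append(r + GAMMA * value(coeffs, s2))
        targets.append(max(backups))
    coeffs = fit(states, np.array(targets))

print("approximate V(0.0) =", value(coeffs, 0.0))
print("approximate V(1.0) =", value(coeffs, 1.0))
```

Because each sweep fits a smoother to its own bootstrapped targets, any systematic fitting error feeds back into the next round of backups; this feedback loop is the kind of compounding error the abstract refers to as "hallucination".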