Thu 31 Mar, 1:30, WeH 4601

How to Approximate a Value Function
An IPC* by Justin Boyan+

Reinforcement learning is based on dynamic programming; DP is based on lookup tables; and sadly, lookup tables for the high-dimensional domains that reinforcement learning aims to conquer--such as robot control and game-playing--are impossibly big. An obvious approach to this "curse of dimensionality" is to replace the DP lookup table with a smoothing function approximator ("fitter") such as a neural net. Although doing this works brilliantly in the domain of backgammon, it more often works poorly, and there are few theoretical guarantees.

Over the past several months, we've studied the interaction between a simple, model-based DP algorithm and several fitters (polynomial regression, neural nets, memory-based methods) on a variety of simulated domains. We have identified a particularly egregious form of misbehavior, called "hallucination", in which fitter error compounds itself over time to produce ever more atrocious policies. We have also developed a new algorithm which promises to be immune to hallucination yet still able to reap the immense benefits of successful smoothing.

* Incredibly Preliminary Chat
+ joint work with Andrew Moore
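To make the idea concrete, here is a minimal sketch of the general technique the abstract describes: model-based DP backups over a sampled set of states, with a smoothing fitter standing in for the lookup table. The toy 1-D domain, discount factor, and choice of polynomial regression are illustrative assumptions, not details of the speaker's algorithm or experiments.

```python
# Sketch of fitted value iteration: DP backups bootstrapped from a fitted
# value function instead of a lookup table. All domain details are assumed
# for illustration only.
import numpy as np

GAMMA = 0.95            # discount factor (assumed)
ACTIONS = [-0.1, 0.1]   # move left or right along the unit interval (assumed)

def step(s, a):
    """Known model: deterministic move clipped to [0, 1]; reward near the goal at 1.0."""
    s2 = np.clip(s + a, 0.0, 1.0)
    r = 1.0 if s2 >= 0.95 else 0.0
    return s2, r

def fit(states, targets, degree=3):
    """The 'fitter': smooth the backed-up values with a low-degree polynomial."""
    return np.polyfit(states, targets, degree)

def value(coeffs, s):
    return np.polyval(coeffs, s)

# Sample a coarse grid of states instead of enumerating a full table.
states = np.linspace(0.0, 1.0, 21)
coeffs = np.zeros(4)    # start with V(s) = 0 everywhere

for sweep in range(50):
    # One-step DP backup at each sampled state, bootstrapping from the current fit.
    targets = []
    for s in states:
        backups = []
        for a in ACTIONS:
            s2, r = step(s, a)
            backups.append(r + GAMMA * value(coeffs, s2))
        targets.append(max(backups))
    coeffs = fit(states, np.array(targets))

print("approximate V(0.0) =", value(coeffs, 0.0))
print("approximate V(1.0) =", value(coeffs, 1.0))
```

Because each sweep fits a smoother to its own bootstrapped targets, any systematic fitting error feeds back into the next round of backups; this feedback loop is the kind of compounding error the abstract refers to as "hallucination".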