Stable Function Approximation in Dynamic Programming
Geoff Gordon
The success of reinforcement learning in practical problems depends on
the ability to combine function approximation with temporal difference
methods such as value iteration. Experiments in this area have
produced mixed results; there have been both notable successes and
notable disappointments. Theory has been scarce, mostly due to the
difficulty of reasoning about function approximators that generalize
beyond the observed data. We provide a proof of convergence for a wide
class of temporal difference methods involving function approximators
such as k-nearest-neighbor, and show experimentally that these methods
can be useful. The proof is based on a view of function approximators
as expansion or contraction mappings. In addition, we present a novel
view of approximate value iteration: an approximate algorithm for one
environment turns out to be an exact algorithm for a different
environment.
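
The central claim (convergence of value iteration combined with an averaging approximator such as k-nearest-neighbor) can be illustrated numerically. The sketch below is not from the paper; it is a minimal assumed setup: a small chain MDP with two deterministic actions, and an "averager" whose rows are convex combinations of neighboring values. Such an averager is a nonexpansion in the max norm, so its composition with the Bellman backup remains a gamma-contraction and the fitted iteration converges to a fixed point.

```python
import numpy as np

# Hedged sketch (assumed toy problem, not from the paper): a 6-state
# chain MDP with rewards R, two deterministic actions (stay, or move
# right with the last state absorbing), and discount gamma.
np.random.seed(0)
n_states, gamma = 6, 0.9
R = np.random.rand(n_states)

def bellman(V):
    # Exact Bellman backup: max over the two deterministic actions.
    stay = R + gamma * V
    right = R + gamma * V[np.minimum(np.arange(n_states) + 1, n_states - 1)]
    return np.maximum(stay, right)

# Averaging approximator: each state's fitted value is a convex
# combination of its neighbors' target values (rows are nonnegative and
# sum to 1), mimicking k-nearest-neighbor averaging. Any such averager
# is a nonexpansion in the max norm.
M = np.zeros((n_states, n_states))
for i in range(n_states):
    nbrs = [j for j in (i - 1, i, i + 1) if 0 <= j < n_states]
    M[i, nbrs] = 1.0 / len(nbrs)

# Fitted value iteration: apply the backup, then refit with the
# averager. M o bellman is a gamma-contraction, so this converges.
V = np.zeros(n_states)
for _ in range(500):
    V_new = M @ bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

This also hints at the paper's second observation: the fixed point of `M @ bellman` is the exact value function of a *different* MDP, one whose transitions are blurred by the averaging weights in `M`.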