Wed 7 Feb 1996, WeH 7220
Spurious Solutions to the Bellman Equation
Mance Harmon, Wright-Patterson AFB
Reinforcement learning algorithms often work by finding functions that
satisfy the Bellman equation. This approach is optimal for controlling
a Markov Decision Process (MDP) with a finite number of states and
actions. Unfortunately, it is also frequently applied to MDPs with
infinitely many states or actions. This talk shows that, in such
cases, the Bellman equation may have multiple solutions, many of which
lead to very bad predictions and policies. Algorithms and conditions
will also be presented that guarantee a unique, optimal solution to
the Bellman equation.
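
As a minimal illustration of the phenomenon described above (this
example is not from the announcement itself), consider a deterministic
MDP on the real line with zero reward and transition x -> x+1. The
Bellman equation V(x) = 0 + gamma*V(x+1) is satisfied by the true
value function V(x) = 0, but also by the spurious, unbounded solution
V(x) = gamma^(-x). The sketch below checks both numerically; the
function and variable names are my own.

```python
GAMMA = 0.9  # discount factor

def bellman_residual(V, x):
    # Residual of the Bellman equation V(x) = r(x) + gamma*V(x+1)
    # for the zero-reward chain x -> x+1.
    return V(x) - GAMMA * V(x + 1.0)

def true_V(x):
    return 0.0  # the genuine value function: all rewards are zero

def spurious_V(x):
    # Unbounded function that nevertheless satisfies the equation:
    # gamma * gamma^-(x+1) = gamma^-x.
    return GAMMA ** (-x)

for x in [0.0, 1.5, 10.0]:
    assert abs(bellman_residual(true_V, x)) < 1e-9
    assert abs(bellman_residual(spurious_V, x)) < 1e-6
```

With finitely many states, the Bellman operator is a contraction and
the fixed point is unique; on an infinite state space, uniqueness can
fail unless extra conditions (such as boundedness of V) rule out
solutions like the one above.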