Wed 7 Feb 1996, WeH 7220

Spurious Solutions to the Bellman Equation
Mance Harmon, Wright-Patterson AFB

Reinforcement learning algorithms often work by finding functions that satisfy the Bellman equation. For a Markov Decision Process (MDP) with a finite number of states and actions, any such solution yields an optimal policy. Unfortunately, the same approach is frequently applied to MDPs with infinitely many states or actions. This discussion shows that, in that case, the Bellman equation may have multiple solutions, many of which lead to very bad predictions and policies. Algorithms and conditions will also be presented that guarantee a single, optimal solution to the Bellman equation.
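
As a minimal sketch of how a spurious solution can arise (an illustrative example, not necessarily one from the talk): consider a deterministic chain MDP on the integers with dynamics x' = x + 1, zero reward everywhere, and discount factor gamma < 1. The true value function is identically zero, yet V(x) = c * gamma^(-x) also satisfies the Bellman equation V(x) = r + gamma * V(x + 1) exactly, for any constant c. The Python snippet below checks both solutions numerically; the names and the choice of gamma are assumptions made for illustration.

    # Illustrative sketch: on an infinite state space, the Bellman
    # equation for this chain MDP (x' = x + 1, zero reward) admits
    # spurious solutions in addition to the true value function.
    GAMMA = 0.9  # assumed discount factor

    def true_value(x):
        # The actual value function: all rewards are zero.
        return 0.0

    def spurious_value(x, c=1.0):
        # Also satisfies the Bellman equation, since
        # GAMMA * c * GAMMA**(-(x + 1)) == c * GAMMA**(-x),
        # but it is not the true value function.
        return c * GAMMA ** (-x)

    def bellman_residual(V, x, reward=0.0):
        # Residual of V(x) = reward + GAMMA * V(x + 1);
        # zero for any solution of the Bellman equation.
        return V(x) - (reward + GAMMA * V(x + 1))

    for x in range(-3, 4):
        print(x,
              bellman_residual(true_value, x),      # exactly 0
              bellman_residual(spurious_value, x))  # ~0, yet V is wrong

Both functions drive the Bellman residual to (numerically) zero at every state, but the spurious one grows without bound and would produce arbitrarily bad predictions. With finitely many states, the discounted Bellman operator is a contraction with a unique fixed point, which is why no such extra solutions appear in the finite case.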