Programmatic Reinforcement Learning

Dave Andre


  This talk will examine the use of partial programming as a means of designing agents for large Markov Decision Problems. In this approach, a programmer specifies only that which they know to be correct and the system then learns the rest from experience using reinforcement learning.

In contrast to previous low-level languages for partial programming, I will present ALisp, a Lisp-based high-level partial programming language. ALisp allows the programmer to constrain the policies considered by a learning process and to express his or her prior knowledge in a concise manner. Optimally completing a partial ALisp program is shown to be equivalent to solving a Semi-Markov Decision Problem (SMDP). Under a finite memory-use condition, online learning algorithms for ALisp are proved to converge to an optimal solution of the SMDP and thus to an optimal completion of the partial program.

The talk will then discuss method for exploiting the modularity of ALisp to speed the optimal completion of partial programs. Safe state abstraction allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up reinforcement learning. By decomposing representations of the value of actions along subroutine boundaries, greater state abstraction can be obtained. Unlike previous methods for state abstraction in reinforcement learning, the methods presented in this research yield significant state abstraction while maintaining hierarchical optimality , i.e., optimality among all policies consistent with the partial program. These methods are demonstrated on two simulated taxi tasks.

Function approximation, a method for representing the value of actions, allows reinforcement learning to be applied to problems where exact methods are intractable. Soft shaping is a method for guiding an agent toward a solution without constraining the search space. Both can be integrated with ALisp. ALisp with function approximation and reward shaping is successfully applied on a difficult continuous variant of the simulated taxi task.

Together, the methods presented in this work comprise a system for agent design that allows the programmer to specify what they know, hint at what they suspect using soft shaping, and leave unspecified that which they don't know; the system then optimally completes the program through experience and takes advantage of the hierarchical structure of the specified program to speed learning.

Back to the Main Page

Charles Rosenberg
Last modified: Thu Oct 16 12:55:20 EDT 2003