Approximately Optimal Approximate Reinforcement Learning

John Langford

Abstract

  Conservative Policy Iteration is a general algorithm for approximate dynamic programming that comes with three important guarantees:

    1) Each iteration improves the policy according to a performance metric.
    2) The algorithm terminates in a finite number of steps.
    3) Upon termination, it returns a policy that is near-optimal.
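To make the three guarantees concrete, here is a minimal Python sketch of a CPI-style loop. It is not the paper's algorithm. It assumes a hypothetical two-state, two-action MDP, replaces sampled advantage estimates with exact policy evaluation, and uses a fixed, hypothetical step size ALPHA instead of one derived from a bound. What it keeps is the conservative mixture update pi_new = (1 - alpha) * pi + alpha * pi', where pi' is greedy with respect to the current Q-values, and a stopping rule that ends the loop once the policy advantage of pi' (weighted by the discounted state distribution d) falls below a threshold eps.

```python
# Minimal CPI-style sketch on a hypothetical 2-state, 2-action MDP.
# Assumptions: the MDP, exact policy evaluation, and the fixed step size
# ALPHA are illustrative choices, not taken from the paper.

GAMMA = 0.9
ALPHA = 0.2          # hypothetical fixed mixing step size
S, A = 2, 2
MU = [1.0, 0.0]      # start-state distribution
# P[s][a][t]: probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[0.0, 0.0], [0.5, 1.0]]  # R[s][a]: immediate reward


def q_values(v):
    """One-step lookahead Q(s, a) = R(s, a) + gamma * E[V(s')]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def evaluate(pi, tol=1e-12):
    """Iterative policy evaluation: V(s) = sum_a pi(a|s) Q(s, a)."""
    v = [0.0] * S
    while True:
        q = q_values(v)
        new_v = [sum(pi[s][a] * q[s][a] for a in range(A)) for s in range(S)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def state_distribution(pi, tol=1e-12):
    """Discounted visitation d(t) = (1 - gamma) * sum_k gamma^k Pr(s_k = t)."""
    d = MU[:]
    while True:
        new_d = [(1 - GAMMA) * MU[t] + GAMMA * sum(
            d[s] * pi[s][a] * P[s][a][t] for s in range(S) for a in range(A))
            for t in range(S)]
        if max(abs(x - y) for x, y in zip(new_d, d)) < tol:
            return new_d
        d = new_d


def performance(pi):
    """eta(pi) = E_{s ~ MU}[V_pi(s)] (unnormalized discounted return)."""
    return sum(m * v for m, v in zip(MU, evaluate(pi)))


def cpi(pi, eps=1e-6, max_iters=10_000):
    history = [performance(pi)]
    for _ in range(max_iters):  # safety cap; the eps test normally stops it
        v = evaluate(pi)
        q = q_values(v)
        d = state_distribution(pi)
        # The greedy pi' maximizes the policy advantage E_{s~d}[Q(s, pi') - V(s)].
        greedy = [max(range(A), key=lambda a: q[s][a]) for s in range(S)]
        adv = sum(d[s] * (q[s][greedy[s]] - v[s]) for s in range(S))
        if adv < eps:  # no worthwhile improvement left: terminate
            break
        # Conservative update: pi_new = (1 - ALPHA) * pi + ALPHA * pi'.
        pi = [[(1 - ALPHA) * pi[s][a] + ALPHA * (1.0 if a == greedy[s] else 0.0)
               for a in range(A)] for s in range(S)]
        history.append(performance(pi))
    return pi, history


def optimal_performance(tol=1e-12):
    """Value iteration, used only to check how close CPI ends up."""
    v = [0.0] * S
    while True:
        new_v = [max(row) for row in q_values(v)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return sum(m * x for m, x in zip(MU, new_v))
        v = new_v


pi, history = cpi([[0.5, 0.5], [0.5, 0.5]])
print(f"{len(history) - 1} updates, eta = {history[-1]:.4f}, "
      f"optimal = {optimal_performance():.4f}")
```

With exact Q-values, every mixture step toward the greedy policy is an improvement by the policy improvement theorem, so the recorded `history` is non-decreasing, the advantage shrinks until the loop stops, and the final value can be compared against value iteration. In the paper's setting the advantages are only estimated, which is where a conservatively chosen step size matters.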



Charles Rosenberg
Last modified: Thu May 9 23:39:10 EDT 2002