Approximately Optimal Approximate Reinforcement Learning

John Langford


  Conservative Policy Iteration is a general algorithm for approximate dynamic programming that comes with three important guarantees:

    1) Each iteration improves the policy according to a performance metric.
    2) The algorithm terminates in a finite number of steps.
    3) Upon termination, it returns a policy which is near-optimal.
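
  The first guarantee can be illustrated with a minimal sketch. Conservative Policy Iteration is commonly described as replacing the current policy with a mixture of itself and a greedy policy, rather than switching to the greedy policy outright. The code below makes several assumptions that are not part of the text above: a tabular MDP, exact policy evaluation, a fixed mixing weight `alpha` standing in for the algorithm's own step-size rule, and a hypothetical two-state toy problem. All function names are illustrative.

```python
from typing import List

Policy = List[List[float]]  # pi[s][a] = probability of action a in state s


def mix_policies(pi: Policy, pi_prime: Policy, alpha: float) -> Policy:
    """Conservative update: (1 - alpha) * pi + alpha * pi_prime, state by state."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * p + alpha * q for p, q in zip(row, row_prime)]
        for row, row_prime in zip(pi, pi_prime)
    ]


def q_value(V: List[float], P, R, gamma: float, s: int, a: int) -> float:
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(len(V)))


def evaluate(pi: Policy, P, R, gamma: float, sweeps: int = 2000) -> List[float]:
    """Iterative policy evaluation (assumed exact enough after many sweeps)."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        V = [
            sum(pi[s][a] * q_value(V, P, R, gamma, s, a) for a in range(len(P[s])))
            for s in range(len(P))
        ]
    return V


def greedy(V: List[float], P, R, gamma: float) -> Policy:
    """Deterministic policy that picks the best one-step lookahead action."""
    policy = []
    for s in range(len(P)):
        actions = range(len(P[s]))
        best = max(actions, key=lambda a: q_value(V, P, R, gamma, s, a))
        policy.append([1.0 if a == best else 0.0 for a in actions])
    return policy


def run_demo(iterations: int = 5, alpha: float = 0.2) -> List[float]:
    """Return the value of state 0 before each conservative update."""
    # Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1;
    # only taking action 1 in state 1 yields reward.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    R = [[0.0, 0.0], [0.0, 1.0]]
    gamma = 0.9
    pi = [[0.5, 0.5], [0.5, 0.5]]  # start from the uniform policy
    history = []
    for _ in range(iterations):
        V = evaluate(pi, P, R, gamma)
        history.append(V[0])
        pi = mix_policies(pi, greedy(V, P, R, gamma), alpha)
    return history


if __name__ == "__main__":
    print(run_demo())
```

  In this exact tabular setting, each mixture step cannot decrease the value of any state, so the recorded values are non-decreasing. The sketch does not model approximate greedy policies or the termination test, which are what the second and third guarantees concern.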

