Maximum Entropy Inverse Reinforcement Learning

Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey
AAAI Conference on Artificial Intelligence (AAAI 2008).
[pdf]

Abstract: Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
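
For orientation, a compact sketch of the quantities the abstract refers to (notation here is ours; the stochastic-dynamics case and terminal-state handling are treated in the paper). The maximum entropy distribution over paths and the resulting log-likelihood gradient, for the deterministic case, take the form

    P(\zeta \mid \theta) = \frac{\exp(\theta^\top f_\zeta)}{Z(\theta)},
        \qquad f_\zeta = \sum_{s_t \in \zeta} f_{s_t}

    \nabla_\theta \log L(\theta)
        = \tilde{f} - \sum_{\zeta} P(\zeta \mid \theta)\, f_\zeta
        = \tilde{f} - \sum_{s} D_s\, f_s

where \tilde{f} is the empirical feature expectation of the demonstrated paths and D_s denotes the expected state visitation frequencies under the model.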

Related Research Areas: Inverse reinforcement learning, inverse optimal control, imitation learning, apprenticeship learning, structured prediction, conditional random fields.

Bibtex:
@inproceedings{ziebart2008maximum,
   author = {Brian D. Ziebart and Andrew Maas 
            and J. Andrew Bagnell and Anind K. Dey},
   title = {Maximum Entropy Inverse Reinforcement Learning},
   year = {2008},
   booktitle = {Proc. AAAI},
   pages = {1433--1438}
}

Additional notes:
An "off by one" error in the algorithm originally published is corrected in this version.