Maximum Entropy Inverse Reinforcement Learning
Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey
AAAI Conference on Artificial Intelligence (AAAI 2008).
[pdf]
Abstract:
Recent research has shown the benefit of framing problems
of imitation learning as solutions to Markov Decision Problems.
This approach reduces learning to the problem of recovering a
utility function that makes the behavior induced by a near-optimal
policy closely mimic demonstrated behavior. In this work, we
develop a probabilistic approach based on the principle of maximum
entropy. Our approach provides a well-defined, globally normalized
distribution over decision sequences, while providing the same
performance guarantees as existing methods.
We develop our technique in the context of modeling real-world
navigation and driving behaviors where collected data is inherently
noisy and imperfect. Our probabilistic approach enables modeling of
route preferences as well as a powerful new approach to inferring
destinations and routes based on partial trajectories.
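To make the abstract's central idea concrete, here is a minimal sketch of the maximum entropy objective it describes: a globally normalized distribution over whole decision sequences, P(path) proportional to exp(theta . f_path), trained so model feature expectations match the demonstrations' empirical feature counts. This toy version enumerates all paths explicitly (feasible only for tiny problems; the paper uses a dynamic program instead), and every name here (trajectory_features, maxent_gradient, the 3-state example) is an illustrative assumption, not the authors' reference implementation.

```python
import numpy as np

def trajectory_features(traj, features):
    # Sum the feature vectors of the states visited along one path.
    return sum(features[s] for s in traj)

def maxent_gradient(theta, demos, all_paths, features):
    # Empirical feature counts from the demonstrated trajectories.
    f_emp = np.mean([trajectory_features(t, features) for t in demos], axis=0)
    # Globally normalized distribution over whole decision sequences:
    # P(path) proportional to exp(theta . f_path).
    scores = np.array([theta @ trajectory_features(t, features) for t in all_paths])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Expected feature counts under the current model.
    f_exp = sum(p * trajectory_features(t, features) for p, t in zip(probs, all_paths))
    # Gradient of the log-likelihood: match feature expectations.
    return f_emp - f_exp

# Tiny illustration: 3 states, 2-dimensional features, two possible paths.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
all_paths = [[0, 2], [1, 2]]
demos = [[0, 2], [0, 2], [1, 2]]  # demonstrations favor the path through state 0
theta = np.zeros(2)
for _ in range(200):
    theta += 0.1 * maxent_gradient(theta, demos, all_paths, features)
```

After training, the learned weights make the path through state 0 roughly twice as likely as the alternative, mirroring the demonstration frequencies; this feature-expectation matching is what yields the performance guarantees mentioned above.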
Related Research Areas: Inverse reinforcement learning,
inverse optimal control, imitation learning, apprenticeship learning,
structured prediction, conditional random fields.
Bibtex:
@inproceedings{ziebart2008maximum,
author = {Brian D. Ziebart and Andrew Maas
and J. Andrew Bagnell and Anind K. Dey},
title = {Maximum Entropy Inverse Reinforcement Learning},
year = {2008},
booktitle = {Proc. AAAI},
pages = {1433--1438}
}
Additional notes:
An "off-by-one" error in the algorithm as originally published is
corrected in this version.
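The note refers to the paper's forward/backward dynamic program for expected state visitation frequencies (Algorithm 1), which replaces the brute-force path enumeration in the sketch above. As a hedged illustration of that style of computation, and not the authors' corrected code, here is a minimal version assuming deterministic transitions given as successor lists and state-based rewards; expected_svf and the chain example below are illustrative names only.

```python
import numpy as np

def expected_svf(n_states, successors, reward, terminal, p0, horizon):
    # Backward pass: soft "partition" values Z[s], seeded at the goal state.
    z = np.zeros(n_states)
    z[terminal] = 1.0
    for _ in range(horizon):
        z_new = np.zeros(n_states)
        z_new[terminal] = 1.0
        for s in range(n_states):
            if s == terminal:
                continue
            for s2 in successors[s]:
                z_new[s] += np.exp(reward[s]) * z[s2]
        z = z_new
    # Forward pass: push the start distribution through the stochastic
    # policy P(s -> s2) = exp(reward[s]) * Z[s2] / Z[s].
    d = np.zeros((horizon + 1, n_states))
    d[0] = p0
    for t in range(horizon):
        for s in range(n_states):
            if s == terminal or z[s] == 0.0:
                continue
            for s2 in successors[s]:
                d[t + 1, s2] += d[t, s] * np.exp(reward[s]) * z[s2] / z[s]
    # Expected state visitation frequencies, summed over time steps.
    return d.sum(axis=0)

# Example: a 3-state chain 0 -> 1 -> 2 with state 2 as the goal;
# each state is visited exactly once, so the result is [1, 1, 1].
successors = {0: [1], 1: [2], 2: []}
svf = expected_svf(3, successors, np.array([-1.0, -1.0, 0.0]),
                   terminal=2, p0=np.array([1.0, 0.0, 0.0]), horizon=3)
```

With these visitation frequencies in hand, the gradient from the earlier sketch becomes the empirical feature counts minus the feature vectors of all states weighted by their expected visitations, with no path enumeration required.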