Inverse Optimal Heuristic Control for Imitation Learning
Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew Bagnell,
Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa.
Conference on Artificial Intelligence and Statistics (AISTATS 2009).
Abstract:
One common approach to imitation learning is
behavioral cloning (BC), which employs straightforward
supervised learning (i.e., classification)
to directly map observations to controls. A second approach
is inverse optimal control (IOC),
which formalizes the problem of learning sequential
decision-making behavior over long horizons as
a problem of recovering a utility function that
explains observed behavior. This paper presents
inverse optimal heuristic control (IOHC), a novel
approach to imitation learning that capitalizes on
the strengths of both
paradigms. It employs long-horizon IOC-style
modeling in a low-dimensional space where inference
remains tractable, while incorporating an
additional descriptive set of BC-style features to
guide a higher-dimensional overall action selection.
We provide experimental results demonstrating the capabilities
of our model on a simple illustrative problem as well as on two
real-world problems: turn-prediction for taxi drivers,
and pedestrian prediction within an office environment.
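
Illustration (not from the paper): the abstract's central idea, a long-horizon cost-to-go computed in a small planning space and combined with BC-style features at action-selection time, can be sketched on a toy problem. Everything below is an assumption made for illustration: the 1-D corridor, the uniform step cost, the single "reversal" feature, the weight values, and the softmax form of the action distribution. It should not be read as the authors' formulation or training procedure.

```python
import math

# Hypothetical toy problem: a 1-D corridor of 5 cells with the goal at cell 4.
N_CELLS, GOAL = 5, 4
ACTIONS = (-1, +1)   # step left or step right
STEP_COST = 1.0      # assumed uniform cost per move


def next_cell(s, a):
    """Deterministic transition, clamped to the corridor."""
    return min(max(s + a, 0), N_CELLS - 1)


def cost_to_go():
    """IOC-style part: value iteration over the low-dimensional state (cell index)."""
    V = [0.0 if s == GOAL else float("inf") for s in range(N_CELLS)]
    for _ in range(N_CELLS):  # enough sweeps for this tiny chain
        for s in range(N_CELLS):
            if s != GOAL:
                V[s] = min(STEP_COST + V[next_cell(s, a)] for a in ACTIONS)
    return V


def bc_features(a, heading):
    """BC-style part: one hypothetical feature, 1.0 if the action reverses heading."""
    return [1.0 if a != heading else 0.0]


def action_probs(s, heading, V, w):
    """Softmax over negative energies: feature score plus planned cost-to-go."""
    energies = []
    for a in ACTIONS:
        score = sum(wi * fi for wi, fi in zip(w, bc_features(a, heading)))
        energies.append(score + STEP_COST + V[next_cell(s, a)])
    weights = [math.exp(-e) for e in energies]
    total = sum(weights)
    return {a: wt / total for a, wt in zip(ACTIONS, weights)}


V = cost_to_go()
# With zero feature weight, the planner alone favors moving toward the goal.
planner_only = action_probs(2, heading=-1, V=V, w=[0.0])
# With a large reversal penalty, the BC-style feature overrides the planner.
habit_heavy = action_probs(2, heading=-1, V=V, w=[5.0])
```

In this sketch the planner works only on the cell index, so it stays cheap, while the heading-dependent feature carries context the planner never sees. In a learned model the weights would be fit to demonstrations rather than set by hand.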
Bibtex:
@inproceedings{ratliff2009inverse,
  author    = {Nathan Ratliff and Brian Ziebart and Kevin Peterson and
               J. Andrew Bagnell and Martial Hebert and Anind K. Dey and
               Siddhartha Srinivasa},
  title     = {Inverse Optimal Heuristic Control for Imitation Learning},
  year      = {2009},
  booktitle = {Proc. AISTATS},
  pages     = {424--431}
}