Inverse Optimal Heuristic Control

Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa.
Conference on Artificial Intelligence and Statistics (AISTATS 2009).

Abstract: One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn prediction for taxi drivers and pedestrian prediction within an office environment.

Bibtex:
@inproceedings{ratliff2009inverse,
   author = {Nathan Ratliff and Brian Ziebart and Kevin Peterson and 
             J. Andrew Bagnell and Martial Hebert and Anind K. Dey and
             Siddhartha Srinivasa},
   title = {Inverse Optimal Heuristic Control for Imitation Learning},
   year = {2009},
   booktitle = {Proc. AISTATS},
   pages = {424--431}
}
