Recently, researchers have demonstrated state-of-the-art performance on sequential decision-making problems (e.g., robotics control, sequential prediction) with deep neural networks and Reinforcement Learning (RL). However, for some of these problems, oracles that can demonstrate good performance are available during training. In this work, we propose AggreVaTeD, a policy gradient extension of the Imitation Learning (IL) approach of Ross & Bagnell (2014) that can leverage oracles to achieve faster and more accurate solutions with less training data than less-informed RL approaches. Specifically, we provide a comprehensive theoretical study of IL that demonstrates that we can expect up to exponentially lower sample complexity for learning with AggreVaTeD than with RL algorithms. Finally, we present two stochastic gradient procedures that learn neural network policies for several problems, including a sequential prediction task as well as various high-dimensional robotics control problems. Our results and theory indicate that the proposed approach can achieve performance superior to the oracle's when the demonstrator is sub-optimal.
This is joint work with Arun Venkatraman, Geoff Gordon, Byron Boots, and Drew Bagnell.
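To make the core idea concrete: AggreVaTeD-style methods take policy gradient steps that weight the policy's score function by an oracle's cost-to-go, rather than by a return estimated from the learner's own rollouts. The sketch below is an illustrative toy, not the paper's implementation: it assumes a linear softmax policy over discrete actions, and the oracle cost-to-go values are synthetic placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def expected_oracle_cost(theta, states, oracle_q):
    """Average oracle cost-to-go under the current policy:
    E_s E_{a~pi}[ Q_oracle(s, a) ]."""
    return np.mean([softmax(s @ theta) @ q for s, q in zip(states, oracle_q)])

def oracle_weighted_gradient(theta, states, oracle_q):
    """One gradient of the expected oracle cost-to-go, in the spirit
    of a policy gradient weighted by oracle values.

    theta:    (n_features, n_actions) linear softmax parameters.
    states:   (n_samples, n_features) states visited by the learner.
    oracle_q: (n_samples, n_actions) oracle cost-to-go estimates
              (synthetic placeholders here, not real oracle values).
    """
    n_actions = theta.shape[1]
    eye = np.eye(n_actions)
    grad = np.zeros_like(theta)
    for s, q in zip(states, oracle_q):
        probs = softmax(s @ theta)
        # For a softmax policy, grad_theta log pi(a|s) = outer(s, e_a - probs),
        # so the gradient is E_a[ grad log pi(a|s) * Q_oracle(s, a) ].
        for a in range(n_actions):
            grad += probs[a] * q[a] * np.outer(s, eye[a] - probs)
    return grad / len(states)

# Toy usage: descend the oracle cost on synthetic data.
rng = np.random.default_rng(0)
states = rng.normal(size=(50, 4))
oracle_q = rng.normal(size=(50, 3))  # placeholder "oracle" costs
theta = np.zeros((4, 3))
theta_new = theta - 0.01 * oracle_weighted_gradient(theta, states, oracle_q)
```

Because the update descends the oracle's cost-to-go directly (rather than a high-variance learner-estimated return), a small step should already lower the expected oracle cost on the sampled states.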