Neural Dynamic Policies
for End-to-End Sensorimotor Learning

Shikhar Bahl
Mustafa Mukadam
Abhinav Gupta
Deepak Pathak
CMU
FAIR
CMU/FAIR
CMU
Published at NeurIPS, 2020 (Spotlight)

[Paper]
[Slides]
[Poster]
[GitHub Code]


The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make a decision at each point of a trajectory, and hence limits scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has long exploited dynamical systems as a policy representation for learning robot behaviors from demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning and deep reinforcement learning, and have remained under-explored in such settings. In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space, in contrast to prior policy learning methods where actions represent the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups. We show that NDPs achieve better or comparable performance to state-of-the-art approaches on many robotic control tasks using reward-based training, as well as on digit writing using demonstrations.
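The core idea above, a policy that outputs the parameters of a differential equation, which is then integrated forward to produce a trajectory, can be sketched with a minimal 1-D dynamic movement primitive (DMP) rollout. This is an illustrative sketch in NumPy, not the released implementation; the function name `dmp_rollout`, the gain values, and the basis-function layout are our own assumptions.

```python
import numpy as np

def dmp_rollout(w, g, y0, T=100, dt=0.01, alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """Integrate a 1-D dynamic movement primitive.

    w  : forcing-function basis weights (e.g. predicted by a policy network)
    g  : goal position
    y0 : start position
    Returns the resulting position trajectory of length T.
    """
    n_basis = len(w)
    # Basis centers spread along the canonical phase x as it decays from 1 to 0
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    h = n_basis / c  # basis widths

    y, yd = float(y0), 0.0
    x = 1.0  # canonical phase variable
    traj = []
    for _ in range(T):
        # Radial-basis forcing term, gated by phase x and scaled by (g - y0)
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) * x * (g - y0) / (psi.sum() + 1e-10)
        # Critically damped second-order attractor toward the goal g
        ydd = alpha_z * (beta_z * (g - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x += -alpha_x * x * dt  # canonical system: dx/dt = -alpha_x * x
        traj.append(y)
    return np.array(traj)
```

In an NDP, the weights `w` (and goal `g`) would be the output of a neural network conditioned on the observation, and the integration above is differentiable, so gradients from imitation or reinforcement objectives can flow back into the network. With zero weights the rollout reduces to a smooth attractor that converges to the goal.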


Source Code and Environment

We have released the PyTorch-based implementation and the environments on the GitHub page. Try our code!
[GitHub]


Paper and Bibtex

[Paper] [ArXiv] [Slides] [Poster]

Citation
 
Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak. Neural Dynamic Policies for End-to-End Sensorimotor Learning.
In NeurIPS, 2020.

[Bibtex]
@inproceedings{bahl2020ndps,
  Author = {Bahl, Shikhar and Mukadam, Mustafa and
            Gupta, Abhinav and Pathak, Deepak},
  Title = {Neural Dynamic Policies for End-to-End Sensorimotor Learning},
  Booktitle = {NeurIPS},
  Year = {2020}
}


Acknowledgements

We thank Giovanni Sutanto, Stas Tiomkin and Adithya Murali for fruitful discussions. We also thank Franziska Meier, Akshara Rai, David Held, Mengtian Li, George Cazenavette, and Wen-Hsuan Chu for comments on early drafts of this paper. This work was supported in part by a DARPA Machine Common Sense grant and a Google Faculty Award to DP.