Shikhar Bahl

Hi there! I am a third year PhD student at the Robotics Institute, within the School of Computer Science at Carnegie Mellon University. I am interested in the intersection of robot control and machine learning. My goal is to build robotic agents that can operate in the wild. I am advised by Deepak Pathak and Abhinav Gupta. I am also a visiting researcher at FAIR, working with Aravind Rajeswaran. I have spent time at NVIDIA Robotics as a research intern.

Prior to CMU, I did my undergrad at UC Berkeley in Applied Math and Computer Science, where I was affiliated with Berkeley Artificial Intelligence Research (BAIR) and worked under Sergey Levine on problems in deep reinforcement learning and robotics.

Feel free to contact me via email! You can reach me at sbahl2 -at- cs dot cmu dot edu

email  /  CV  /  Google Scholar  /  Twitter  /  GitHub


I am broadly interested in creating robust autonomous agents that operate with minimal or no human supervision, in the wild. My research focuses on combining machine learning, perception and reinforcement learning for robotic control. Here is some of my work:


Human-to-Robot Imitation in the Wild
Shikhar Bahl, Abhinav Gupta*, Deepak Pathak*
RSS 2022

webpage | pdf | abstract | bibtex | arXiv | videos | talk

We approach the problem of learning by watching humans in the wild. While traditional approaches in Imitation and Reinforcement Learning are promising for learning in the real world, they are either sample inefficient or constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human-Imitated Robot Learning. In WHIRL, we aim to use human videos to extract a prior over the intent of the demonstrator, and use this to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves over the human prior using interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, as well as an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including on 20 different manipulation tasks in the wild.
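The sampling-based improvement step can be sketched as a generic cross-entropy-method loop over policy parameters seeded by the human prior (an illustrative sketch, not the exact WHIRL optimizer; `score_fn` stands in for the task objective):

```python
import numpy as np

def improve_over_prior(prior_params, score_fn, iters=10, pop=64, elite=8, sigma=1.0):
    """Sample parameter perturbations around a prior (e.g. extracted from
    a human video), keep the highest-scoring elites, and refit the
    sampling distribution. A generic CEM sketch under assumed settings."""
    mu = np.asarray(prior_params, dtype=float)
    std = np.full_like(mu, sigma)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        cands = mu + std * rng.standard_normal((pop, mu.size))
        scores = np.array([score_fn(c) for c in cands])
        elites = cands[np.argsort(scores)[-elite:]]   # best candidates
        mu, std = elites.mean(0), elites.std(0) + 1e-6
    return mu
```

Starting from the prior rather than from scratch is what keeps the real-world sample count small.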

@article{bahl2022human,
  author  = {Bahl, Shikhar and Gupta, Abhinav and Pathak, Deepak},
  title   = {Human-to-Robot Imitation in the Wild},
  journal = {RSS},
  year    = {2022}
}

RB2: Robotic Manipulation Benchmarking with a Twist
Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Abitha Thankaraj, Karanbir Chahal, Berk Calli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, Abhinav Gupta
NeurIPS 2021
(Datasets and Benchmark)

webpage | pdf | abstract | bibtex | code

Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups, and (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. the YCB object set), but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations will provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these 'local rankings' could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus we establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called the Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks inspired by the clinically validated Southampton Hand Assessment Procedure. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning outperform more complicated models (e.g. closed loop, RNN, Offline-RL, etc.) that are preferred by the field. We hope our fellow researchers will use RB2 to improve their research's quality and rigor.

@inproceedings{dasari2021rb2,
  title     = {RB2: Robotic Manipulation Benchmarking with a Twist},
  author    = {Dasari, Sudeep and Wang, Jianren and Hong, Joyce and Bahl, Shikhar and Lin, Yixin and Wang, Austin S and Thankaraj, Abitha and Chahal, Karanbir Singh and Calli, Berk and Gupta, Saurabh and others},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year      = {2021}
}

Hierarchical Neural Dynamic Policies
Shikhar Bahl, Abhinav Gupta, Deepak Pathak
RSS 2021  (Invited to Autonomous Robots Special Issue)

webpage | pdf | abstract | bibtex | arXiv | talk video

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods has successfully demonstrated dynamic robot behaviors, but these methods have difficulty generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterizing actions to embed the structure of dynamical systems, but they still struggle in domains with diverse configurations of image goals and hence find it difficult to generalize. In this paper, we address this dichotomy by embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamic Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results.
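The local-to-global curriculum can be sketched as supervised distillation: query each local policy on states from its own region, then regress a single global policy onto the pooled data (a simplified sketch; names like `fit_global` are illustrative placeholders):

```python
import numpy as np

def distill_global(local_policies, regions, fit_global, n_samples=256):
    """Collect (state, action) pairs from each local policy on its own
    region of state-space, then fit one global policy on the pooled
    data by supervised regression."""
    rng = np.random.default_rng(0)
    X, Y = [], []
    for pi, (lo, hi) in zip(local_policies, regions):
        s = rng.uniform(lo, hi, size=(n_samples, np.size(lo)))
        X.append(s)
        Y.append(np.array([pi(x) for x in s]))   # local policy's actions
    return fit_global(np.vstack(X), np.vstack(Y))
```

Each local fit is an easier problem than fitting diverse data directly, which is what makes the curriculum effective.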

@article{bahl2021hierarchical,
  author  = {Bahl, Shikhar and Gupta, Abhinav and Pathak, Deepak},
  title   = {Hierarchical Neural Dynamic Policies},
  journal = {RSS},
  year    = {2021}
}

Neural Dynamic Policies for End-to-End Sensorimotor Learning
Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
NeurIPS 2020  (Spotlight)

webpage | pdf | abstract | bibtex | arXiv | code | demo | spotlight talk

The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make decisions at each point in training and hence limits the scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has, for a long time, exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning or deep reinforcement learning and have remained under-explored in such settings. In this work, we begin to close this gap and embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space, as opposed to prior policy learning methods where the action represents the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups. We show that NDPs achieve better or comparable performance to state-of-the-art approaches on many robotic control tasks using both reward-based training and demonstrations.
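The reparameterization can be sketched with a DMP-style second-order dynamical system whose basis weights and goal a policy network would predict; integrating the system yields a trajectory instead of raw actions (a minimal sketch with assumed gains, not the paper's exact formulation):

```python
import numpy as np

def ndp_rollout(weights, goal, y0, n_steps=100, alpha=25.0, beta=6.25, tau=1.0):
    """Roll out a DMP-style second-order system whose parameters
    (basis weights and goal) a policy network would predict, so the
    policy outputs a trajectory rather than per-step raw controls."""
    centers = np.linspace(0, 1, len(weights))       # RBF centers in phase space
    widths = np.full(len(weights), len(weights) ** 1.5)
    dt = 1.0 / n_steps
    y, yd, x = y0, 0.0, 1.0                         # position, velocity, phase
    traj = [y]
    for _ in range(n_steps):
        psi = np.exp(-widths * (x - centers) ** 2)  # radial basis activations
        f = (psi @ weights) / (psi.sum() + 1e-10) * x
        ydd = alpha * (beta * (goal - y) - yd) + f  # spring-damper + forcing term
        yd += ydd * dt / tau
        y += yd * dt / tau
        x += -tau * x * dt                          # canonical system decay
        traj.append(y)
    return np.array(traj)
```

With zero forcing weights the system reduces to a critically damped spring that converges smoothly to the goal, which is where the smooth-trajectory property comes from.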

@inproceedings{bahl2020neural,
  author    = {Bahl, Shikhar and Mukadam, Mustafa and Gupta, Abhinav and Pathak, Deepak},
  title     = {Neural Dynamic Policies for End-to-End Sensorimotor Learning},
  booktitle = {NeurIPS},
  year      = {2020}
}

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine
ICML 2020

webpage | pdf | abstract | bibtex | arXiv | code | video

Reinforcement learning can enable an agent to acquire a large repertoire of skills. However, each new skill requires a manually-designed reward function, which typically requires considerable manual effort and engineering. Self-supervised goal setting has the potential to automate this process, enabling an agent to propose its own goals and acquire skills that achieve these goals. However, such methods typically rely on manually-designed goal distributions or heuristics to encourage the agent to explore a wide range of states. In this work, we propose a formal objective for exploration when training an autonomous goal-reaching policy that maximizes state coverage, and show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal reaching performance. We present an algorithm called Skew-Fit for learning such a maximum-entropy goal distribution, and show that our method converges to a uniform distribution over the set of possible states, even when we do not know this set beforehand. When combined with existing goal-conditioned reinforcement learning algorithms, we show that Skew-Fit allows self-supervised agents to autonomously explore their entire state space faster than prior work, across a variety of simulated and real robotic tasks.
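The goal-skewing step can be sketched as importance resampling with weights p(g)^alpha for alpha < 0, which up-weights rarely visited states (a simplified sketch; in the method the density comes from a learned generative model):

```python
import numpy as np

def skew_fit_resample(goals, density, alpha=-1.0, n=None, rng=None):
    """Resample candidate goals with weights p(g)^alpha (alpha < 0),
    so rare states are proposed more often and the entropy of the
    goal distribution grows. `density` is an assumed stand-in for a
    learned density model over states."""
    rng = rng or np.random.default_rng(0)
    w = density(goals) ** alpha
    w = w / w.sum()                         # normalize into a distribution
    idx = rng.choice(len(goals), size=n or len(goals), p=w)
    return goals[idx]
```

With alpha = -1, a state visited 9x more often than another ends up sampled with roughly equal probability, pushing the goal distribution toward uniform over visited states.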

@inproceedings{pong2020skewfit,
  title     = {Skew-Fit: State-Covering Self-Supervised Reinforcement Learning},
  author    = {Pong, Vitchyr H and Dalal, Murtaza and Lin, Steven and Nair, Ashvin and Bahl, Shikhar and Levine, Sergey},
  booktitle = {ICML},
  year      = {2020}
}

Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards
Gerrit Schoettler*, Ashvin Nair*, Jianlan Luo, Shikhar Bahl, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
IROS 2020

webpage | pdf | abstract | bibtex | arXiv | video

We consider a variety of difficult industrial insertion tasks with visual inputs and different natural reward specifications, namely sparse rewards and goal images. We show that methods that combine RL with prior information, such as classical controllers or demonstrations, can solve these tasks from a reasonable amount of real-world interaction.

@inproceedings{schoettler2020deep,
  title     = {Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards},
  author    = {Schoettler, Gerrit and Nair, Ashvin and Luo, Jianlan and Bahl, Shikhar and Ojea, Juan Aparicio and Solowjow, Eugen and Levine, Sergey},
  booktitle = {IROS},
  year      = {2020}
}

Contextual Imagined Goals for Self-Supervised Robotic Learning
Ashvin Nair*, Shikhar Bahl*, Alexander Khazatsky*, Vitchyr H. Pong, Glen Berseth, Sergey Levine
CoRL 2019

webpage | pdf | abstract | bibtex | arXiv | code | data | video

We propose a conditional goal-setting model that aims to only propose goals that are feasibly reachable from the robot's current state, and demonstrate that this enables self-supervised goal-conditioned learning with raw image observations both in varied simulated environments and a real-world pushing task.

@inproceedings{nair2019contextual,
  title     = {Contextual Imagined Goals for Self-Supervised Robotic Learning},
  author    = {Nair, Ashvin and Bahl, Shikhar and Khazatsky, Alexander and Pong, Vitchyr and Berseth, Glen and Levine, Sergey},
  booktitle = {CoRL},
  year      = {2019}
}

Residual Reinforcement Learning for Robot Control
Tobias Johannink*, Shikhar Bahl*, Ashvin Nair*, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
ICRA 2019

webpage | pdf | abstract | bibtex | arXiv | video

Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.
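The superposition itself is simple to sketch (the controller and network names below are illustrative placeholders):

```python
import numpy as np

def residual_policy(state, hand_controller, residual_net):
    """Final command is the hand-designed controller's output plus a
    learned residual correction trained with RL on the task reward."""
    return hand_controller(state) + residual_net(state)
```

Before training, a zero residual leaves the conventional controller's behavior intact, so exploration starts from a sensible baseline rather than from scratch.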

@inproceedings{johannink2019residual,
  title     = {Residual Reinforcement Learning for Robot Control},
  author    = {Johannink, Tobias and Bahl, Shikhar and Nair, Ashvin and Luo, Jianlan and Kumar, Avinash and Loskyll, Matthias and Ojea, Juan Aparicio and Solowjow, Eugen and Levine, Sergey},
  booktitle = {ICRA},
  year      = {2019}
}

Visual Reinforcement Learning with Imagined Goals
Ashvin Nair*, Vitchyr H. Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
NeurIPS 2018  (Spotlight)

webpage | pdf | abstract | bibtex | arXiv | code | blog | videos

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
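The retroactive relabeling step can be sketched in a HER-style fashion: swap each transition's goal for a latent state achieved later in the same trajectory and recompute the reward as negative latent distance (the transition format and names here are illustrative):

```python
import numpy as np

def relabel_with_achieved_goals(traj, rng=None):
    """Retroactively relabel transitions (z, action, z_next, goal) in a
    latent goal space: each goal is replaced by a state achieved later
    in the same trajectory, with the reward recomputed as negative
    latent distance to that goal."""
    rng = rng or np.random.default_rng(0)
    relabeled = []
    for t, (z, a, z_next, _) in enumerate(traj):
        k = rng.integers(t, len(traj))       # pick a step from t onward
        g = traj[k][2]                       # its achieved latent state
        r = -np.linalg.norm(z_next - g)      # goal-reaching reward
        relabeled.append((z, a, z_next, g, r))
    return relabeled
```

Every relabeled transition is a guaranteed success or near-success for some goal, which is what improves sample efficiency for off-policy goal-conditioned learning.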

@inproceedings{nair2018visual,
  title     = {Visual Reinforcement Learning with Imagined Goals},
  author    = {Nair, Ashvin V and Pong, Vitchyr and Dalal, Murtaza and Bahl, Shikhar and Lin, Steven and Levine, Sergey},
  booktitle = {NeurIPS},
  year      = {2018}
}

EECS127 - Fall 2018 (uGSI)

Website template from this repo and this webpage!