Robotics poses many challenges for RL. Most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. In this work, we exploit the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data.
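To make the asymmetric setup concrete, the following is a minimal sketch of a one-step actor-critic update in which the critic reads the full simulator state and the actor reads only the partial observation. It is an illustration under stated assumptions, not the training code used in this work: the linear function approximators, the Gaussian policy, the class name AsymmetricActorCritic, and all hyperparameters are hypothetical, and short feature vectors stand in for rendered RGBD images.

```python
import random


def dot(weights, features):
    """Inner product of two equal-length lists."""
    return sum(w * f for w, f in zip(weights, features))


class AsymmetricActorCritic:
    """Hypothetical linear actor-critic with asymmetric inputs.

    The critic is a linear value function of the FULL state (simulator only).
    The actor is a Gaussian policy whose mean is linear in the PARTIAL observation.
    """

    def __init__(self, obs_dim, state_dim, actor_lr=0.01, critic_lr=0.05,
                 gamma=0.99, sigma=0.5):
        self.theta = [0.0] * obs_dim   # actor weights (observation features)
        self.w = [0.0] * state_dim     # critic weights (full-state features)
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma
        self.sigma = sigma             # fixed exploration noise (assumption)

    def act(self, obs, rng):
        """Sample an action using the partial observation only."""
        return rng.gauss(dot(self.theta, obs), self.sigma)

    def update(self, state, obs, action, reward, next_state, done):
        """One TD(0) critic step on full states, one policy-gradient actor step."""
        value = dot(self.w, state)
        next_value = 0.0 if done else dot(self.w, next_state)
        td_error = reward + self.gamma * next_value - value

        # Critic update: uses the full state.
        self.w = [w + self.critic_lr * td_error * s
                  for w, s in zip(self.w, state)]

        # Actor update: gradient of the Gaussian log-probability with respect
        # to theta, scaled by the critic's TD error used as the advantage.
        mean = dot(self.theta, obs)
        scale = (action - mean) / (self.sigma ** 2)
        self.theta = [t + self.actor_lr * td_error * scale * o
                      for t, o in zip(self.theta, obs)]
        return td_error
```

Only act is needed at deployment time, so in this sketch the full state never has to be available outside the simulator.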
Scaling the self-supervised learning framework for high-dimensional control requires either scaling up the data collection effort or using a clever sampling strategy for training. We present a novel approach to train policies that map visual information to high-level, higher-dimensional action spaces and apply it to grasping using an adaptive underactuated gripper.
How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid obstacles? In this paper, we propose to bite the bullet and collect a dataset of crashes itself! We build a drone whose sole purpose is to crash into objects: it samples naive trajectories and crashes into random objects. We crash our drone 11,500 times to create one of the biggest UAV crash datasets. We show that simple self-supervised models learnt on this data are quite effective in navigating the UAV even in extremely cluttered environments with dynamic obstacles including humans.
This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary.
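One way to picture training against an adversary is as a two-player zero-sum game optimized by alternating updates: the adversary is frozen while the protagonist improves its reward, then the protagonist is frozen while the adversary reduces that same reward. The sketch below illustrates this on a hypothetical scalar game rather than a physics environment. The reward function, the parameter names p and a, and the step sizes are assumptions chosen so the result can be checked by hand, and plain gradient steps stand in for policy optimization.

```python
def reward(p, a):
    """Hypothetical protagonist reward: concave in p, convex in a."""
    return -(p - 1.0) ** 2 + p * a + a ** 2


def grad_p(p, a):
    """Partial derivative of the reward with respect to the protagonist."""
    return -2.0 * (p - 1.0) + a


def grad_a(p, a):
    """Partial derivative of the reward with respect to the adversary."""
    return p + 2.0 * a


def train_alternating(n_rounds=200, inner_steps=10, lr=0.1):
    """Alternate protagonist ascent and adversary descent on the same reward."""
    p, a = 0.0, 0.0
    for _ in range(n_rounds):
        # Phase 1: adversary frozen, protagonist maximizes its reward.
        for _ in range(inner_steps):
            p += lr * grad_p(p, a)
        # Phase 2: protagonist frozen, adversary minimizes that reward.
        for _ in range(inner_steps):
            a -= lr * grad_a(p, a)
    return p, a
```

For this toy reward both gradients vanish at p = 0.8, a = -0.4, and the alternating updates approach that saddle point.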
Due to the large number of experiences required for robot learning, recent approaches use a self-supervised paradigm: using sensors to measure success/failure. However, in most cases, these sensors provide weak supervision at best. In this work, we propose an adversarial learning framework that pits an adversary against the robot learning the task.
In an effort to defeat the adversary, the original robot learns to perform the task with more robustness, leading to overall improved performance. By grasping 82% of presented novel objects compared to 68% without an adversary, we demonstrate the utility of creating adversaries. We also demonstrate that having robots in an adversarial setting might be a better learning strategy than having multiple collaborative robots.
End-to-end learning approaches are often critiqued due to their huge data requirements for learning a task. But do end-to-end approaches need to learn a unique model for every task?
In this paper, we attempt to take the next step in data-driven end-to-end learning frameworks: move from the realm of task-specific models to joint learning of multiple robot tasks. In a surprising result, we show that models with multi-task learning tend to perform better than task-specific models trained with the same amount of data.
What is the right supervisory signal to train visual representations? In the case of biological agents, visual representation learning does not require semantic labels. In fact, we argue that biological agents use active exploration and physical interactions with the world to learn visual representations, unlike current vision systems which just use passive observations (images and videos downloaded from the web). Towards this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps and actively observes objects in a tabletop environment.
We use these physical interactions to collect more than 130K datapoints, with each datapoint providing gradients to a shared ConvNet architecture, allowing us to learn visual representations. We show the quality of the learned representations by observing neuron activations and performing nearest neighbor retrieval on this learned representation. Finally, we evaluate our learned ConvNet on different image classification tasks.
In this project I attempt to learn open-loop grasping parameters by implementing a two-stage Gaussian Process (GP) based Upper Confidence Bound (UCB) bandit-solving algorithm. Apart from showing successful grasp policies learnt from trial-and-error interactions, I present a method of motor grasp verification using soft fingers via 'tandem grasp'. I also present a brief analysis of the proposed bandit algorithm.
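As a rough illustration of the GP-UCB idea, the sketch below runs a single-stage bandit over a one-dimensional grid of candidate parameters: a zero-mean GP with a squared-exponential kernel is fit to the outcomes seen so far, and the next trial is the candidate with the highest posterior mean plus beta times the posterior standard deviation. It is not the two-stage algorithm of this project. The kernel, its length scale, beta, the noise level, and the function names are all assumptions, and a synthetic objective stands in for grasp success.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    solved = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.sum(k_qt * solved.T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def gp_ucb(objective, candidates, n_rounds=25, beta=2.0):
    """Each round, try the candidate with the highest mean + beta * std."""
    xs = [candidates[0]]                 # arbitrary first trial (assumption)
    ys = [objective(candidates[0])]
    for _ in range(n_rounds - 1):
        mean, std = gp_posterior(np.array(xs), np.array(ys), candidates)
        x_next = candidates[int(np.argmax(mean + beta * std))]
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmax(ys))
    return xs[best], ys[best]
```

A call such as gp_ucb(lambda x: -(x - 0.7) ** 2, np.linspace(0.0, 1.0, 21)) returns the best parameter tried and its observed outcome. A larger beta favors exploring untested parameters over repeating ones that already look good.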
In this project, we learn a joint word and image space embedding. This would enable us to both predict image descriptions given an image and predict an image given an image description. These models are solved using block coordinate descent, stochastic gradient descent and orthogonal matching pursuit. We further show image summarization results using weak tags from Flickr.
Given an end effector start and goal position, I find a trajectory that requires zero energy/torque to execute. All of the energy is supplied in the form of an initial angular velocity. This is done via a greedy search through a database of zero-energy trajectories generated by exploiting the chaotic dynamics of the manipulator.
Research on time-delayed control of robotic arms and studies on the effects of varying time delays on the effectiveness of such robots. Designed and developed a system to control robots at IITG's Virtual Robotics Lab remotely via the internet.
Designed, fabricated and controlled an autonomous mid-sized all-terrain quadruped. Novelty in cost-effectiveness (under $1000) and energy efficiency (20W). Further contributions in the passive dynamics and in developing energy-efficient swing leg control for quadrupedal robots.