I'm a postdoc at Carnegie Mellon University's Robotics Institute,
working with David Held and the Robots Perceiving and Doing lab.
My research is at the intersection of robotics, computer vision, and machine learning, with a focus on manipulation of complex objects such as deformables.
These days, I'm interested in understanding how multimodal observation and action representations can lead to more sample-efficient and reliable learning.
Ultimately, I hope that this research can help open the door to deploying robots in messy and unstructured environments.
Beyond computer science and robotics, I'm interested in a variety of other technical fields, as well as
history, law, political science, and international affairs.
I am originally from Albany, New York.
04/30/2021: The New York Times featured our work on surgical robotics!
04/xx/2021: Invited research talks at Williams, Stanford, CMU, and Berkeley (BAIR).
03/19/2021: Invited research talk at the University of Toronto. (Video)
02/28/2021: Four research papers were accepted to ICRA 2021. We will be presenting virtually.
02/17/2021: Invited research talk at Siemens' robotics team.
12/31/2020: Multiple preprints are available on visual servoing, imitation learning, and deformable manipulation.
07/18/2020: Papers on fabric smoothing and surgical calibration were accepted to IROS 2020 and RA-Letters 2020.
05/05/2020: Our work on VisuoSpatial Foresight was accepted to RSS 2020! We will be presenting virtually.
05/05/2020: We released a new BAIR Blog post about some of our work in robot fabric manipulation.
03/23/2020: I was interviewed by Sayak Paul of PyImageSearch. You can read it here on Medium.
03/20/2020: We have three new preprints available, and an update to the 2019 preprint on fabric smoothing.
03/20/2020: Our paper on surgical peg transfer was accepted to ISMR 2020 (postponed to November).
03/05/2020: I'll be interning at Google Brain this summer (remotely), working with the robotics team!
10/08/2019: I attended ISRR 2019 in Vietnam and presented our paper on robot bed-making.
10/01/2019: We have a new preprint on fabric smoothing, and a paper at the Deep RL workshop at NeurIPS 2019.
07/31/2019: Our paper on robotic cloth manipulation and bed-making has been accepted to ISRR 2019.
10/23/2018: We have a new BAIR Blog post about work in the AUTOLAB related to depth sensing.
04/24/2018: I passed my PhD qualifying exam. Please see the bottom of this website for a transcript.
01/11/2018: Our paper on surgical debridement and calibration has been accepted to ICRA 2018.
08/02/2017: We wrote a BAIR Blog post about our work on minibatch Metropolis-Hastings.
Recent Talk
Here is a talk I gave at Cornell University in October 2022, which provides a representative overview of my research.
Research Publications
Below, you can find my publications, as well as links to code, relevant blog posts,
and paper reviews. I strongly believe that researchers should make code publicly
available. Our code is usually on GitHub where you can file issue reports with questions.
I generally list papers under review (i.e., "preprints") first, followed by papers at accepted conferences, journals, or other venues in reverse chronological order.
If a paper is on arXiv, that's where you can find the latest version.
As is standard in our field, authors are ordered by contribution level; asterisks (*) indicate equal contribution, and we sometimes use a dagger (†) to indicate equal non-first-author contribution.
If you only have time to read one or two of the papers below, I recommend the CoRL 2022 paper on ToolFlowNet, the ICRA 2021 paper "Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks," or the RSS 2020 paper "VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation" (or its journal extension).
We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. Using interactive perception, we can singulate layers of a bag with higher success rates than prior work.
We show how to manipulate tools from demonstrations by tracking where different points on a tool move over time. We can predict where each tool point "flows" in 3D space, and then convert this to a rotation and a translation for 3D manipulation.
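As a rough illustration (with generic names, and not the paper's actual implementation), per-point 3D flow can be converted into a rotation and translation with a standard SVD-based least-squares fit:

```python
import numpy as np

def flow_to_rigid_transform(points, flow):
    """Least-squares rigid transform (Kabsch/SVD) from per-point 3D flow.

    points: (N, 3) tool points; flow: (N, 3) predicted per-point displacement.
    Returns R (3, 3) and t (3,) such that R @ p + t best matches p + flow.
    """
    targets = points + flow
    p_mean, q_mean = points.mean(axis=0), targets.mean(axis=0)
    P, Q = points - p_mean, targets - q_mean
    U, _, Vt = np.linalg.svd(P.T @ Q)          # 3x3 cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```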
We use tactile data from the ReSkin sensor to classify the number of layers of fabric between two grippers. We use this information for singulating fabric layers amidst a pile of fabrics.
We study how to efficiently learn to fling (and smooth) fabrics with a single UR5 robot arm, using a multi-armed bandit framework to find promising fling actions.
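For intuition, here is a generic UCB1-style sketch of selecting among candidate fling actions; it is only an illustrative example, not necessarily the exact bandit algorithm from the paper:

```python
import numpy as np

def ucb1_select(counts, mean_rewards, t, c=2.0):
    """Pick a candidate fling action by the UCB1 rule.

    counts: (K,) number of times each candidate action was tried.
    mean_rewards: (K,) empirical mean reward (e.g., fabric coverage gain).
    t: total number of trials so far; c: exploration weight.
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(mean_rewards, dtype=float)
    untried = np.where(counts == 0)[0]
    if untried.size > 0:                 # try every candidate at least once
        return int(untried[0])
    bonus = np.sqrt(c * np.log(t) / counts)
    return int(np.argmax(means + bonus))
```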
We use depth sensing, recurrent neural networks, and a new trajectory optimizer to get an automated surgical robot to outperform a human surgical resident on the peg transfer task.
This is an extension of our ISMR 2020 and IEEE RA-Letters 2020 papers on surgical peg transfer.
We study Planar Robot Casting, where the goal is for a robot to dynamically manipulate a free-end cable (in contrast to our prior work on fixed-end cables) so that the end reaches a target.
This is an extension of our RSS 2020 conference paper which presented VisuoSpatial Foresight (VSF). Here, we systematically explore different ways to improve different stages of the VSF pipeline, and find that adjusting the data generation enables better physical fabric folding.
We propose an approach to interactive imitation learning that minimizes the amount of context switching that occurs when an agent and a supervisor swap control.
We design a suite of tasks for benchmarking deformable object manipulation, including 1D cables, 2D fabrics, and 3D bags. We use Transporter Networks for learning how to manipulate some of these tasks, and for others, we design goal-conditioned variants.
We propose a method to enable a UR5 arm to perform high speed dynamic rope manipulation tasks. We use a parabolic action motion and predict the single apex point of this motion.
We train dense object nets on simulated data and apply them to fabric manipulation tasks.
Since we train correspondences, we can take an action applied on a fabric, and "map" the corresponding action to a new fabric setup.
We have an IROS 2020 workshop paper that extends this idea to multi-modal distributions. [arXiv]
We propose a framework that uses a coarse controller in free space and imitation learning to learn precise actions in the regions that demand the most accuracy. We test on the peg transfer task and show high success rates and transferability of the learned model across multiple surgical arms.
We design a custom fabric simulator, and script a corner-pulling demonstrator to train a fabric smoothing policy entirely in simulation using imitation learning. We transfer the policy to a physical da Vinci surgical robot.
We propose VisuoSpatial Foresight, an extension of visual foresight that additionally uses depth information, and use it for predicting what fabric observations (i.e., images) will look like given a series of actions.
We have since extended this paper into a journal submission (noted above).
We propose a method for automating the surgical peg transfer task. The method uses depth sensing and block detection algorithms to determine where to pick and place blocks.
We propose a system for robotic bed-making using a quarter-scale bed, which involves collecting real data and using color and depth information to detect blanket corners for pulling. This is applied on two mobile robots: the HSR and the Fetch.
We show how an ensemble of Q-networks can improve robustness of reinforcement learning. We use the ensemble to estimate variance. In simulated autonomous driving using TORCS, robust policies can better handle an adversary.
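As a hypothetical illustration of using ensemble disagreement (not the paper's exact policy), candidate actions can be scored by the ensemble's mean Q-value minus a variance penalty:

```python
import numpy as np

def ensemble_q_action(q_functions, state, candidate_actions, risk_weight=1.0):
    """Pick an action using an ensemble of Q-functions.

    q_functions: list of callables q(state, action) -> float (the ensemble).
    candidate_actions: list of discrete or sampled continuous actions.
    Each action is scored by the ensemble mean minus a penalty on the
    ensemble's standard deviation, so high-uncertainty actions are avoided.
    """
    scores = []
    for a in candidate_actions:
        qs = np.array([q(state, a) for q in q_functions])
        scores.append(qs.mean() - risk_weight * qs.std())
    return candidate_actions[int(np.argmax(scores))]
```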
We show how to use the da Vinci Research Kit at rapid speeds while maintaining reliability, and apply this for a model of the surgical debridement task.
We propose a conceptually simple way to manage a data curriculum to provide samples from a teacher to a student, and show that this facilitates learning in offline and mostly-offline RL.
We investigate whether it makes sense to provide samples that are at a reasonable level of "difficulty" for a learner agent, and empirically test on the standard Atari 2600 benchmark.
Coursework, Teaching, and Oral Exams
I have taken many graduate courses as part of the PhD program at UC Berkeley, typically in computer science (CS) but also in electrical engineering (EE) and statistics (STAT).
Some courses were new when I took them and had a "294-XYZ" number, before they took on a "regular" three-digit number.
You can find my thoughts and reviews of these classes on my personal blog.
I was also the GSI (i.e., Teaching Assistant) for the Deep Learning class in Fall 2016 and Spring 2019.
The course is now numbered CS 182/282A, where the 182 is for undergrads and the 282A is for graduate students.
CS 267, Applications of Parallel Computing
CS 280, Computer Vision
CS 281A, Statistical Learning Theory
CS 182/282A, Deep Neural Networks (GSI/TA twice)
CS 287, Advanced Robotics
CS 288, Natural Language Processing
CS 294-112, Deep Reinforcement Learning (now CS 285)
At the time I took it, UC Berkeley had an oral preliminary exam requirement for PhD students.
Here's the transcript of my prelims.
Nowadays, things might have changed since the number of AI PhD students has skyrocketed.
There is also a second oral exam, called the qualifying exam.
Here's the transcript of my qualifying exam.
Miscellaneous
I frequently blog about (mostly) technical topics.
This blog is not affiliated with my employer, and I deliberately do not use an academic domain to reinforce this separation.
I also recommend checking the Berkeley AI Research blog.
I was one of its maintainers for its first four years.
My "Information Diet:"
I have a list of about 40-45 news sources that I try to read regularly.
Here is the list.
Reading a news source does not imply that I agree with its content.
Also, here are some books I have read: 2016, 2017, 2018, 2019, 2020, 2021, 2022.
(Once again, reading a book does not imply that I agree with its content.)
Twitter can also be a source of information.