I'm a postdoc at Carnegie Mellon University's Robotics Institute,
working with David Held and the Robots Perceiving and Doing lab.
My research is at the intersection of robotics, computer vision, and machine learning, with a focus on manipulation of complex objects such as deformables.
These days, I'm interested in understanding how good action representations can lead to more sample-efficient and reliable learning.
Ultimately, I hope that this research can help open the doors to deploy robots in messy and unstructured environments.
Beyond my scientific interests in computer science and robotics,
I follow a variety of other technical fields, such as economics, psychology, biology, and physics, and
I try to keep reasonably up to date on the major high-level advances in those areas.
I'm also into history, law, politics, and international affairs.
I am originally from Albany, New York.
04/24/2018: I passed my PhD qualifying exam. Please see the bottom of this website for a transcript.
01/11/2018: Our paper on surgical debridement and calibration has been accepted to ICRA 2018.
08/02/2017: We wrote a BAIR Blog post about our work on minibatch Metropolis-Hastings.
Here is a talk I gave at Cornell University in October 2022, which provides a representative overview of my research.
Below, you can find my publications, as well as links to code, relevant blog posts,
and paper reviews. I strongly believe that researchers should make code publicly
available. Our code is usually on GitHub, where you can file issues with questions.
I generally list papers under review (i.e., "preprints") first, followed by papers at accepted conferences, journals, or other venues in reverse chronological order.
If a paper is on arXiv, that's where you can find the latest version.
As is standard in our field, authors are ordered by contribution level, and asterisks (*) indicate equal contribution. We sometimes use the dagger (†) to indicate equal non-first author contribution.
If you only have time to read one or two of the papers below, I recommend the
CoRL 2022 paper on ToolFlowNet, the ICRA 2021 paper "Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks,"
or the RSS 2020 paper "VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation" (or its journal extension).
We show how to manipulate tools from demonstrations by tracking where different points on a tool move over time. We can predict where each tool point "flows" in 3D space, and then convert this to a rotation and a translation for 3D manipulation.
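The last step described above, turning per-point flow into a rotation and translation, can be illustrated by the classic least-squares rigid alignment (Kabsch) solution. This is only a minimal sketch of that general technique, not the paper's implementation; the function name and interface are my own.

```python
import numpy as np

def flow_to_rigid_transform(points, flow):
    """Fit a rotation R and translation t so that R @ p + t ~= p + flow(p).

    points: (N, 3) array of tool points.
    flow:   (N, 3) array of predicted per-point 3D motion.
    Uses the SVD-based Kabsch algorithm for least-squares rigid alignment.
    """
    src = points
    dst = points + flow
    # Center both point sets.
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Cross-covariance and its SVD.
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections (ensure det(R) = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

For a pure-translation flow this recovers the identity rotation and the translation vector; for flow generated by a rigid rotation it recovers that rotation.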
This is an extension of our RSS 2020 conference paper which presented VisuoSpatial Foresight (VSF). Here, we systematically explore different ways to improve different stages of the VSF pipeline, and find that adjusting the data generation enables better physical fabric folding.
We design a suite of tasks for benchmarking deformable object manipulation, including 1D cables, 2D fabrics, and 3D bags. We use Transporter Networks for learning how to manipulate some of these tasks, and for others, we design goal-conditioned variants.
We use dense object nets trained on simulated data and apply them to fabric manipulation tasks.
Since we learn correspondences, we can take an action applied on a fabric and "map" the corresponding action to a new fabric setup.
We have an IROS 2020 workshop paper that extends this idea to multi-modal distributions. [arXiv]
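The "mapping" step above can be illustrated with a generic nearest-neighbor lookup in descriptor space: given dense descriptor images for a source and target fabric configuration, a pixel acted on in the source is matched to the target pixel with the closest descriptor. This is a hedged sketch of the general idea, with made-up names, not the paper's code.

```python
import numpy as np

def transfer_action(src_desc, tgt_desc, src_pixel):
    """Map a pixel-level action from a source image to a target image.

    src_desc, tgt_desc: (H, W, D) dense descriptor images (e.g., from a
        trained dense object net); src_pixel: (row, col) of the action.
    Returns the (row, col) in the target whose descriptor is nearest.
    """
    d = src_desc[src_pixel[0], src_pixel[1]]       # (D,) descriptor
    dists = np.linalg.norm(tgt_desc - d, axis=-1)  # (H, W) distances
    return np.unravel_index(np.argmin(dists), dists.shape)
```

In practice the descriptors come from a learned network; here any (H, W, D) arrays suffice to demonstrate the lookup.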
We propose a framework that uses a coarse controller in free space and imitation learning for precise actions in the regions that demand the most accuracy. We test on the peg transfer task and show high success rates, as well as transferability of the learned model across multiple surgical arms.
We design a custom fabric simulator, and script a corner-pulling demonstrator to train a fabric smoothing policy entirely in simulation using imitation learning. We transfer the policy to a physical da Vinci surgical robot.
We propose VisuoSpatial Foresight, an extension of visual foresight that additionally uses depth information, and use it for predicting what fabric observations (i.e., images) will look like given a series of actions.
We have since extended this paper into a journal submission (noted above).
We propose a system for robotic bed-making using a quarter-scale bed, which involves collecting real data and using color and depth information to detect blanket corners for pulling. This is applied on two mobile robots: the HSR and the Fetch.
We show how an ensemble of Q-networks can improve robustness of reinforcement learning. We use the ensemble to estimate variance. In simulated autonomous driving using TORCS, robust policies can better handle an adversary.
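The ensemble-variance idea above can be sketched in a few lines: disagreement across ensemble members' Q-values serves as an uncertainty estimate, which a risk-averse rule can penalize. This is an illustrative sketch of the general technique, with hypothetical names and a simple mean-minus-std rule, not the paper's method.

```python
import numpy as np

def ensemble_q_stats(q_values):
    """Per-action mean and variance of Q-values across an ensemble.

    q_values: (num_models, num_actions) array, one row per Q-network.
    High variance flags actions the ensemble disagrees on.
    """
    return q_values.mean(axis=0), q_values.var(axis=0)

def risk_averse_action(q_values, kappa=1.0):
    """Pick the action maximizing mean - kappa * std (illustrative rule)."""
    mean, var = ensemble_q_stats(q_values)
    return int(np.argmax(mean - kappa * np.sqrt(var)))
```

With kappa = 0 this reduces to the usual greedy action; larger kappa trades expected value for agreement among ensemble members.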
We investigate whether it makes sense to provide samples that are at a reasonable level of "difficulty" for a learner agent, and empirically test on the standard Atari 2600 benchmark.
Coursework, Teaching, and Oral Exams
I have taken many graduate courses as part of the PhD program at UC Berkeley, typically in computer science (CS) but also in electrical engineering (EE) and statistics (STAT).
Some courses were new when I took them and had a "294-XYZ" number, before they took on a "regular" three-digit number.
You can find my thoughts and reviews of these classes on my personal blog.
I was also the GSI (i.e., Teaching Assistant) for the Deep Learning class in Fall 2016 and Spring 2019.
The course is now numbered CS 182/282A, where the 182 is for undergrads and the 282A is for graduate students.
CS 267, Applications of Parallel Computing
CS 280, Computer Vision
CS 281A, Statistical Learning Theory
CS 182/282A, Deep Neural Networks (GSI/TA twice)
CS 287, Advanced Robotics
CS 288, Natural Language Processing
CS 294-112, Deep Reinforcement Learning (now CS 285)
At the time I took it, UC Berkeley had an oral preliminary exam requirement for PhD students.
Here's the transcript of my prelims.
Things may have changed since then, as the number of AI PhD students has skyrocketed.
There is also a second oral exam, called the qualifying exam.
Here's the transcript of my qualifying exam.
I frequently blog about (mostly) technical topics.
This blog is not affiliated with my employer, and I deliberately do not use an academic domain to reinforce this separation.
I also recommend checking the Berkeley AI Research blog.
I was one of its maintainers for its first four years, and it's been great to see how much the blog has grown since then.
My "Information Diet":
I have a list of about 40 news sources that I try to read regularly.
Here is the list.
Reading a news source does not imply that I agree with its content.
Also, here are some books I have read:
(Once again, reading a book does not imply that I agree with its content.)
Twitter can also be a source of information.