Humans effortlessly manipulate objects in cluttered and uncertain environments. In contrast, most robotic manipulators are limited to carefully engineered environments to circumvent the difficulty of manipulation under uncertainty. Contact sensors can provide robots with the feedback vital to addressing this limitation.

This thesis proposes a framework for using feedback from contact sensors to reliably manipulate objects under uncertainty. We formalize manipulation as a partially observable Markov decision process that includes object pose uncertainty, proprioceptive error, and kinematic constraints. Our algorithms exploit the structure of contact to efficiently estimate state and plan with this model.

First, we introduce the manifold particle filter as a principled method of estimating object pose and robot configuration. This algorithm avoids degeneracy by drawing samples from the lower-dimensional manifold of states induced by contact. Next, we introduce two belief space planning algorithms that seek out contact with sensors when doing so is necessary to achieve the goal. One algorithm harnesses the decoupling effect of contact to share computation between problem instances. The second leverages lower-dimensional structure to plan around kinematic constraints.
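As a toy illustration of the manifold idea, consider a 1D object of known width tracked with a binary contact sensor at a known finger position. The sketch below samples particles directly from the set of poses consistent with contact instead of reweighting a degenerate prior; the function name, the weighting scheme, and the 1D setting are illustrative assumptions of this sketch, not the thesis's algorithm:

```python
import random

def manifold_particle_filter_step(particles, finger, width, contact, n=100):
    """One measurement update. On contact, resample from the contact
    manifold (poses consistent with touching the finger) instead of
    reweighting the prior particles, avoiding degeneracy."""
    if contact:
        # Contact constrains the object pose to [finger - width, finger]:
        # sample directly from that set, weighted by a crude proxy for
        # the motion prior (proximity to the prior particle cloud mean).
        mean = sum(particles) / len(particles)
        candidates = [random.uniform(finger - width, finger) for _ in range(n)]
        weights = [1.0 / (1e-6 + abs(c - mean)) for c in candidates]
        return random.choices(candidates, weights=weights, k=n)
    # No contact: keep only particles outside the contact set.
    survivors = [p for p in particles if not (p <= finger <= p + width)] or particles
    return random.choices(survivors, k=n)
```

A conventional filter would assign near-zero weight to almost every prior particle here, because the contact observation has support only on a measure-zero slice of the state space; sampling on the manifold sidesteps that collapse.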

Finally, we evaluate the efficacy of our approach in real-robot and simulation experiments. The results show that our state estimation and planning algorithms consistently outperform those that are not tailored to manipulation or contact sensing.

Thesis Committee:
Siddhartha Srinivasa (Co-chair)
Nancy Pollard (Co-chair)
Geoff Gordon
Tomas Lozano-Perez (Massachusetts Institute of Technology)


In many application domains, robots co-exist in the same physical space with humans and aim to become trustworthy partners. We particularly envision personal robots arranging furniture with a human partner, manufacturing robots performing spar assembly with human co-workers, or rehabilitation robots assisting spinal cord injury patients. In such collaborative settings, humans often have inaccurate models of the robot's capabilities, which leads the team towards suboptimal strategies. On the other hand, the robot frequently knows the optimal way of executing the task based on some objective performance metric.

This thesis proposes a set of decision-theoretic models of human teammates that allow the robot to reason in a principled way about the effects of its actions on future human behavior, and to guide the human towards new, optimal strategies unknown to them in advance. We formalize human adaptability, that is, a person's willingness to adapt to a robot strategy, and propose a human-robot mutual adaptation formalism based on a bounded-memory model. We evaluate the impact of adaptability on two collaboration paradigms: a shared-location collaborative task and a shared-autonomy setting. We show that the formalism significantly improves team performance when the human's starting preference for executing the task is suboptimal. We expect the proposed models to increase task performance, human trust in the robot, and perceived collaboration on a variety of joint-action collaborative tasks.
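The bounded-memory idea can be caricatured in a few lines: the human switches to the robot's demonstrated strategy with probability equal to their adaptability, conditioned only on the robot's last k actions. Everything below (the function names, the single scalar alpha, the mode labels) is an illustrative assumption of this sketch rather than the thesis's actual formalism:

```python
import random

def bam_human_step(history, human_mode, robot_mode, alpha, k=1):
    """Bounded-memory adaptation sketch: with probability alpha (the
    human's adaptability) the human adopts the strategy the robot has
    consistently demonstrated over the last k steps; otherwise they
    keep their own mode. `history` accumulates the robot's modes."""
    history.append(robot_mode)
    recent = history[-k:]
    # The human adapts only if the robot demonstrated one consistent
    # mode within the memory window, and it differs from their own.
    if len(set(recent)) == 1 and recent[0] != human_mode:
        if random.random() < alpha:
            return recent[0]
    return human_mode
```

A highly adaptable human (alpha near 1) converges to the robot's optimal strategy; a non-adaptable one (alpha near 0) keeps their suboptimal preference, which is precisely when the robot should instead comply.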

Thesis Committee:
Siddhartha Srinivasa (Chair)
Jodi Forlizzi
Emma Brunskill
Ariel Procaccia
David Hsu (National University of Singapore)


While the applications for robotics are plentiful in theory, matching technical capabilities to real world customer needs at high reliability and practical price points is incredibly difficult, leaving behind large numbers of ambitious, but ultimately failed, attempts to apply robotics to consumer applications. In this talk we will share a bit of our journey with Anki, a company we started working on in 2008 with the goal of identifying and entering markets where robotics and AI can have a real, measurable impact in a short time frame, and then using the technologies and learnings developed for one product as building blocks for the next.

We enjoyed an eventful path from our early days as three Robotics Institute PhD students working out of a Pittsburgh living room to a 150 person company (with over a dozen CMU RI grads!) with offices in San Francisco, London, Munich and Shenzhen. We will share a few of the stories and learnings along the journey through multiple product releases, four rounds of venture funding, challenges at the overlap of many disciplines, large scale mass production, and seemingly endless strings of highs and lows.

Finally, we are excited to share our next product, Cozmo, a robot character that uses a deep combination of robotics, AI, game design, and animated film-style animation with the aim of bringing a physical character to life with a level of personality, emotion and interaction that has never been possible outside of a screen. This interdisciplinary approach has led us to build a small animation studio within a robotics company with a novel approach to animating physical characters, showing intense levels of attachment and emotional response in all of our early testing. Along with a look at the many years of research and development leading to this product, we will discuss why the SDK that will be released with the launch in October could unlock one of the most capable and affordable robotic platforms for research and education.

Boris Sofman is co-founder and CEO of Anki, an artificial intelligence and robotics company focused on using these technologies to reinvent everyday consumer experiences. With an initial focus on entertainment, Anki's first product line, Overdrive, is a battle-racing game that allowed a level of physical gameplay and interaction previously not possible outside of video games and was one of the top selling toys of the 2015 holiday season. Anki is releasing its next product line, Cozmo, this fall. Boris has a background in building diverse robotic systems from consumer products to off-road autonomous vehicles and bomb-disposal robots. He earned a B.S., M.S. and Ph.D. from the Robotics Institute of Carnegie Mellon University.

Hanns Tappeiner is co-founder and President of Anki, an artificial intelligence and robotics company focused on creating groundbreaking consumer products. Anki's first product line, Overdrive, is a battle-racing game that allowed a level of physical gameplay and interaction previously not possible outside of video games and was one of the top selling toys of the 2015 holiday season. Anki is releasing its next product line, Cozmo, this fall. Before moving to the US for his MS and PhD in Robotics at Carnegie Mellon, Hanns earned a Dipl. Ing. in Computer Science in Europe with minors in Mechanical and Electrical Engineering. He is currently on leave from the PhD program at CMU and hopes to find the time to finish his thesis in the not-too-distant future. He is mainly interested in the application of robotics and AI in real-world consumer products.

Faculty Host: Martial Hebert

A first person video records not only what is out in the environment but also, through social and physical interactions, what is in our head (intention and attention) at the time. This internal state is invisible, but it can be revealed by fixation, camera motion, and visual semantics. In this talk, I will present a computational model to decode our intention and attention from first person cameras when interacting with (1) scenes and (2) people.

A person exerts his/her intention by applying physical force and torque to scenes and objects, which results in visual sensation. We leverage this first person visual sensation to precisely compute the force and torque the first person experienced by integrating visual semantics, 3D reconstruction, and inverse optimal control. Such visual sensation also allows us to associate the present with past experiences, which provides a strong cue for predicting future activities. When interacting with other people, social attention is a medium that controls group behaviors, e.g., how they form a group and move. We learn the geometric and visual relationship between group behaviors and social attention measured from first person cameras. Based on the learned relationship, we derive a predictive model to localize social attention from a third person view.

Hyun Soo Park is an Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota. He is interested in understanding human visual sensorimotor behaviors from first person cameras. Prior to joining UMN, he was a Postdoctoral Fellow working with Jianbo Shi at the University of Pennsylvania. He earned his Ph.D. from Carnegie Mellon University under the supervision of Yaser Sheikh.

Sponsored in part by Disney Research.

People with upper extremity disabilities are gaining increased independence through the use of assistive devices such as wheelchair-mounted robotic arms. However, the increased capability and dexterity of these robotic arms also makes them challenging to control through accessible interfaces like joysticks, sip-and-puff, and buttons, which are lower-dimensional than the control space of the robot. The potential for robot autonomy to ease control burden within assistive domains has been recognized for decades. While full autonomy is an option, it removes all control from the user. When the human does not want this, the assistive technology has in fact made them less able, and it discards useful input the human might provide (for example, their superior situational awareness) that would add to system robustness.

This thesis takes an in-depth dive into how to add autonomy to an assistive robot arm in the specific application of eating, to make it faster and more enjoyable for people with disabilities to feed themselves. While we focus on this specific application, the tools and insights we gain can generalize to the fields of deformable object manipulation, selection from behavior libraries, intent prediction, robot teleoperation, and human-robot interaction. The physical proximity and heavy dependence on the robot arm for daily tasks make this a very high-stakes human-robot interaction.

We propose a system that is capable of fully autonomous feeding by (1) predicting bite timing based on social cues, (2) detecting relevant features of the food using RGBD sensor data, and (3) automatically selecting a goal and a food-collection motion primitive to bring a bite from the plate to the operator's mouth. We propose investigating the desired level of autonomy through user studies with an assistive robot in which users have varying degrees of control over bite timing, bite selection, action selection, control mode-switching, and direct teleoperation of the robot, to determine the effect on cognitive load, acceptance, trust, and task performance.

Thesis Committee:
Siddhartha Srinivasa (Chair)
Christopher Atkeson
Jodi Forlizzi
Leila Takayama (University of California, Santa Cruz)


An effective autonomous robot performing dangerous or menial tasks will need to act under significant time and energy constraints. At task time, the amount of effort a robot spends planning its motion directly detracts from its total performance. Manipulation tasks, however, present challenges to efficient motion planning. Tightly coupled steps (e.g. choices of object grasps or placements) allow poor early decisions to render subsequent steps difficult, which encourages longer planning horizons. However, an articulated robot situated within a geometrically complex and dynamic environment induces a high-dimensional configuration space in which it is expensive to test for valid paths. And since multi-step plans require paths in changing valid subsets of configuration space, it is difficult to reuse computation across steps.

This thesis proposes an approach to motion planning well-suited to articulated robots performing recurring multi-step manipulation tasks in complex, semi-structured environments. The high cost of edge validation in roadmap methods motivates us to study a lazy approach to pathfinding on graphs which decouples constructing and searching the graph from validating its edges. This decoupling raises two immediate questions which we address: (a) how to allocate precious validation computation among the unevaluated edges on the graph, and (b) how to efficiently solve the resulting dynamic pathfinding problem which arises as edges are validated. We next consider the inherent tradeoff between planning and execution cost, and show that an objective based on utility functions is able to effectively balance these competing goals during lazy pathfinding. Lastly, we define the family motion planning problem which captures the structure of multi-step manipulation tasks, and propose a related utility function which allows our motion planner to quickly find efficient solutions for such tasks.
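The lazy decoupling described above can be sketched in a few dozen lines: search the graph with every edge optimistically assumed valid, validate only the edges on the resulting candidate path, and repeat. This is a simplified LazySP-style loop; the graph representation, validation order, and tie-breaking are assumptions of this sketch, not the thesis's implementation:

```python
import heapq

def lazy_shortest_path(graph, start, goal, is_valid):
    """Lazy pathfinding sketch: expensive edge validation (collision
    checking, in manipulation) runs only on edges of candidate shortest
    paths. `graph` maps node -> {neighbor: cost}."""
    known_invalid = set()
    while True:
        # Dijkstra over edges not yet known to be invalid.
        dist, prev = {start: 0.0}, {}
        pq = [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, c in graph.get(u, {}).items():
                if (u, v) in known_invalid:
                    continue
                if d + c < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + c, u
                    heapq.heappush(pq, (d + c, v))
        if goal not in dist:
            return None  # no valid path remains
        # Reconstruct the candidate path, then validate its edges lazily.
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        path.reverse()
        bad = [(u, v) for u, v in zip(path, path[1:]) if not is_valid(u, v)]
        if not bad:
            return path
        known_invalid.update(bad)
```

Because validation is deferred, edges far from any shortest path are never checked, which is where the savings come from when each check requires a costly collision query.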

We assemble our algorithms into an integrated manipulation planning system, and demonstrate its effectiveness on motion and manipulation tasks on several robotic platforms. We also provide open-source implementations of our algorithms for contemporary motion planning frameworks. While the motivation for this thesis originally derived from manipulation, our pathfinding algorithms are broadly applicable to problem domains in which edge validation is expensive. Furthermore, the underlying similarity between lazy and dynamic settings also renders our incremental algorithms applicable to conventional dynamic problems such as traffic routing.

Thesis Committee:
Siddhartha Srinivasa (Chair)
Anthony Stentz
Maxim Likhachev
Lydia Kavraki (Rice University)

As social and collaborative robots move into everyday life, the need for algorithms enabling their acceptance becomes critical. People parse non-verbal communications intuitively, even from machines that do not look like people, thus, expressive motion is a natural and efficient way to communicate with people. This work presents a computational Expressive Motion framework allowing simple robots to modify task motions to communicate varying internal states, such as task status, social relationships, mood (e.g. emotive) and/or attitude (e.g. rushed, confident). By training robot motion features with humans in the loop, future robot designers can use this approach to parametrize how a robot generates its task motions.

The hypothesis of this thesis is that robots can modify the motion features of their task behaviors so as to legibly communicate a variety of states. Typically, researchers build instances of expressive motion into individual robot behaviors (which is not scalable), or use an independent channel such as lights or facial expressions that does not interfere with the robot's task. What is unique about this work is that we use the same modality for both task and expression: the robot's joint and whole-body motions. While this is not the only way for a robot to communicate expression, Expressive Motion is a channel available to all moving machines, and it can work in tandem with additional communication modalities. Our methodological approach is to operationalize the Laban Effort System, a well-known technique from acting training that describes a four-dimensional state space of Time, Weight, Space and Flow. Thus, our Computational Laban Effort (CLE) framework uses four values, the Laban Effort Setting, to represent a robot's current state. Each value is reflected in the motion characteristics of the robot's movements. For example, a Laban Time Effort of ‘sudden’ might have more abrupt accelerations and fast velocity, while a Laban Time Effort value of ‘sustained’ could have slower acceleration and low velocity. In our experiments, we find that varying these four Effort values results in complex communications of robot state to the people around it, even for robots with low degrees of freedom.
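To make one such mapping concrete, the sketch below caricatures a single CLE dimension: a Laban Time Effort value shapes a motion profile so that 'sudden' front-loads the displacement while 'sustained' eases through it. The exponent-based mapping and function name are hypothetical choices for illustration, not the framework's actual parametrization:

```python
def laban_time_trajectory(distance, time_effort, steps=50):
    """Map a Laban Time Effort value in [-1, 1] ('sustained' = -1,
    'sudden' = +1) to a 1-DOF position profile. Sudden motion rises
    sharply early; sustained motion builds gradually."""
    # Exponent < 1 rises quickly (sudden); exponent > 1 rises slowly
    # (sustained). time_effort = +1 -> 0.25, time_effort = -1 -> 4.
    exponent = 2.0 ** (-2.0 * time_effort)
    return [distance * (i / (steps - 1)) ** exponent for i in range(steps)]
```

Both profiles reach the same goal, so the task is unaffected; only the motion's character changes, which is the core idea of layering expression onto task behavior.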

The technical contributions of this work include:

  1. A Computational Laban Effort framework for layering Expressive Motion features onto robot task behaviors, fully specified for low degree of freedom robots.
  2. Specifications for selecting, exploring and making generalizations about how to map these motion features to particular robot state communications.
  3. Experimental studies of human-robot interaction to evaluate the legibility, attributions and impact of these technical components.
  4. Sample evaluations of approaches to establish mappings between CLE features and state communications.

Thesis Committee:
Reid Simmons (Chair)
Manuela Veloso
Aaron Steinfeld
Guy Hoffman (Cornell University)


As autonomous systems are deployed in increasingly complex and uncertain environments, safe, accurate, and robust feedback control techniques are required to ensure reliable operation. Accurate trajectory tracking is essential to complete a variety of tasks, but this may be difficult if the system’s dynamics change online, e.g., due to environmental effects or hardware degradation. As a result, uncertainty mitigation techniques are also necessary to ensure safety and accuracy.

This problem is well suited to a receding-horizon optimal control formulation via Nonlinear Model Predictive Control (NMPC). NMPC employs a nonlinear model of the plant dynamics to compute non-myopic control policies, thereby improving tracking accuracy relative to reactive approaches. This formulation ensures constraints on the dynamics are satisfied and can compensate for plant model uncertainty via robust and adaptive extensions. However, existing NMPC techniques are computationally expensive, and many operating domains preclude reliable, high-rate communication with a base station. This is particularly difficult for small, agile systems, such as micro aerial vehicles, which have severely limited computation due to size, weight, and power restrictions but require high-rate feedback control to maintain stability. Therefore, the system must be able to operate safely and reliably with typically limited onboard computational resources.
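The receding-horizon loop itself is simple to sketch. The toy below replaces the nonlinear program at the heart of NMPC with a brute-force search over a short discrete control horizon for a scalar plant; the plant model, cost weights, and control set are illustrative assumptions, and real NMPC handles constraints and nonlinear dynamics that this sketch omits:

```python
from itertools import product

def receding_horizon_control(x0, target, a, horizon=3, steps=10):
    """Receding-horizon sketch on a toy scalar plant x' = a*x + u:
    at every step, search a short control horizon for the sequence
    minimizing tracking cost, apply only its first action, replan."""
    controls = [-1.0, -0.5, 0.0, 0.5, 1.0]
    x, applied = x0, []
    for _ in range(steps):
        best_u, best_cost = 0.0, float("inf")
        for seq in product(controls, repeat=horizon):
            xi, cost = x, 0.0
            for u in seq:                 # simulate the candidate sequence
                xi = a * xi + u
                cost += (xi - target) ** 2 + 0.01 * u ** 2
            if cost < best_cost:
                best_cost, best_u = cost, seq[0]
        x = a * x + best_u                # apply first action, then replan
        applied.append(best_u)
    return x, applied
```

The non-myopic lookahead is what distinguishes this from a reactive controller: each action is chosen for its consequences over the whole horizon, yet only the first action is ever executed before replanning.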

In this thesis, we propose a series of non-myopic, computationally-efficient, feedback control strategies that enable accurate and reliable operation in the presence of unmodeled system dynamics. The key concept underlying these techniques is the reuse of past experiences to reduce online computation and enhance control performance in novel scenarios. The work completed thus far demonstrates high-rate, constrained, adaptive control of agile systems through the use of experience to inform an online-updated estimate of the system dynamics model and the choice of controller for a given scenario. The proposed work aims to enhance robustness to uncertainty, improve computational efficiency, and inform motion planning to facilitate tracking. We also propose two case studies to demonstrate the performance of these techniques.

Thesis Committee:
Nathan Michael (Chair)
Maxim Likhachev
Koushil Sreenath
Nicholas Roy (Massachusetts Institute of Technology)


We study a fundamental question in pose estimation from vision-only video data: should the pose of a camera be determined from fixed and known correspondences? Or should correspondences be simultaneously estimated alongside the pose?

Determining pose from fixed correspondences is known as the feature-based approach, where well-established tools from projective geometry are used to formulate and solve a plethora of pose estimation problems. Nonetheless, in degraded imaging conditions such as low light and blur, reliably detecting and precisely localizing interest points becomes challenging.

Conversely, estimating correspondences alongside motion is known as the direct approach, where image data are used directly to determine geometric quantities without relying on sparse interest points as an intermediate representation. The approach is in general more precise by virtue of redundancy as many measurements are used to estimate a few degrees-of-freedom. However, direct methods are more sensitive to changes in illumination.

In this work, we combine the robustness of feature-based approaches with the precision of direct methods. Namely, we make use of densely and sparsely evaluated local feature descriptors in a direct image alignment framework to address pose estimation in challenging conditions. Applications include tracking planar targets under sudden and drastic changes in illumination as well as visual odometry in poorly lit subterranean mines.
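The photometric principle behind direct alignment can be shown in one dimension: choose the shift that minimizes a sum-of-squared-differences error over all pixels, rather than matching sparse interest points. The sketch below uses raw intensities for simplicity; a descriptor-based variant in the spirit of this work would substitute local feature values for the intensities to gain robustness to illumination change:

```python
def direct_align(template, image, max_shift=5):
    """Direct 1D image alignment sketch: brute-force the integer shift
    of `template` within `image` minimizing the photometric (SSD)
    error, using every pixel as a measurement."""
    best_shift, best_err = 0, float("inf")
    for s in range(max_shift + 1):
        window = image[s:s + len(template)]
        if len(window) < len(template):
            break  # template no longer fits inside the image
        err = sum((a - b) ** 2 for a, b in zip(template, window))
        if err < best_err:
            best_err, best_shift = err, s
    return best_shift
```

Because every pixel contributes a measurement, the estimate benefits from redundancy: many observations constrain a few degrees of freedom, which is the precision argument made for direct methods above.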

Motivated by the success of the approach, we introduce a novel formulation for the joint refinement of pose and structure across multiple views akin to feature-based bundle adjustment (BA). In contrast to minimizing the reprojection error using BA, initial estimates are refined such that the photometric consistency of their image projections is maximized without the need for correspondences. The technique is evaluated on a range of datasets and is shown to improve upon the accuracy of the current state-of-the-art in vision-based simultaneous localization and mapping (VSLAM).

Thesis Committee:
Brett Browning (Co-chair)
Simon Lucey (Co-chair)
Michael Kaess
Martial Hebert
Ian D. Reid (The University of Adelaide)

Perception and state estimation are critical robot competencies that remain difficult to harden and generalize. This is due in part to the incredible complexity of modern perception systems, which are commonly composed of dozens of components with hundreds of parameters overall. Selecting a configuration of parameters relies on a human's understanding of how the parameters interact with the environment and the robot's behavior, which we refer to as the "context." Furthermore, evaluating the performance of the system entails multiple empirical trials, which often poorly predict the generality of the system.

We depart from the conventional wisdom that perception systems must generalize to be successful and instead suggest that a perception system need only do well in situations it encounters over the course of its deployment. This thesis proposes that greater overall perceptual generality can be achieved by designing perception systems that adapt to their local contexts by re-selecting perception system parameters. Towards this end, we have completed work on improving stochastic model fidelity and discuss our proposed work on applying reinforcement learning techniques to learn parameter selection policies from perceptual experience.
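One way to caricature parameter re-selection from experience is as a contextual bandit with epsilon-greedy exploration: each perception-parameter configuration is an action, the context conditions its value, and empirical trials supply rewards. The function names, reward model, and discrete contexts below are illustrative assumptions of this sketch, not the proposed method:

```python
import random

def select_parameters(q_values, context, configs, epsilon=0.1):
    """Pick a perception-parameter configuration epsilon-greedily from
    learned value estimates. `q_values` maps (context, config) ->
    estimated perception performance."""
    if random.random() < epsilon:
        return random.choice(configs)          # explore a configuration
    return max(configs, key=lambda c: q_values.get((context, c), 0.0))

def update_q(q_values, context, config, reward, lr=0.5):
    """Incrementally update a value estimate from one empirical trial."""
    key = (context, config)
    old = q_values.get(key, 0.0)
    q_values[key] = old + lr * (reward - old)
```

The appeal of framing it this way is that the system need not generalize globally: it only has to learn which configurations work in the contexts it actually encounters during deployment.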

Thesis Committee:
George Kantor (Chair)
Sebastian Scherer
Katharina Muelling
Ingmar Posner (University of Oxford)


