
While the applications for robotics are plentiful in theory, matching technical capabilities to real-world customer needs at high reliability and practical price points is incredibly difficult, leaving behind large numbers of ambitious, but ultimately failed, attempts to apply robotics to consumer applications. In this talk we will share a bit of our journey with Anki, a company we started working on in 2008 with the goal of identifying and entering markets where robotics and AI can have a real, measurable impact in a short time frame, and then using the technologies and lessons developed for one product as building blocks for the next.

We enjoyed an eventful path from our early days as three Robotics Institute PhD students working out of a Pittsburgh living room to a 150-person company (with over a dozen CMU RI grads!) with offices in San Francisco, London, Munich, and Shenzhen. We will share a few of the stories and lessons from the journey through multiple product releases, four rounds of venture funding, challenges at the overlap of many disciplines, large-scale mass production, and seemingly endless strings of highs and lows.

Finally, we are excited to share our next product, Cozmo, a robot character that uses a deep combination of robotics, AI, game design, and animated-film-style animation with the aim of bringing a physical character to life with a level of personality, emotion, and interaction that has never been possible outside of a screen. This interdisciplinary approach has led us to build a small animation studio within a robotics company, with a novel approach to animating physical characters that has produced intense levels of attachment and emotional response in all of our early testing. Along with a look at the many years of research and development leading to this product, we will discuss why the SDK to be released with the launch in October could unlock one of the most capable and affordable robotic platforms for research and education.

Boris Sofman is co-founder and CEO of Anki, an artificial intelligence and robotics company focused on using these technologies to reinvent everyday consumer experiences. With an initial focus on entertainment, Anki's first product line, Overdrive, is a battle-racing game that allows a level of physical gameplay and interaction previously not possible outside of video games; it was one of the top-selling toys of the 2015 holiday season. Anki is releasing its next product line, Cozmo, this fall. Boris has a background in building diverse robotic systems, from consumer products to off-road autonomous vehicles and bomb-disposal robots. He earned his B.S., M.S., and Ph.D. from the Robotics Institute of Carnegie Mellon University.

Hanns Tappeiner is co-founder and President of Anki, an artificial intelligence and robotics company focused on creating groundbreaking consumer products. Anki's first product line, Overdrive, is a battle-racing game that allows a level of physical gameplay and interaction previously not possible outside of video games; it was one of the top-selling toys of the 2015 holiday season. Anki is releasing its next product line, Cozmo, this fall. Before moving to the US for his MS and PhD in Robotics at Carnegie Mellon, Hanns earned a Dipl. Ing. in Computer Science in Europe with minors in Mechanical and Electrical Engineering. He is currently on leave of absence from the PhD program at CMU and hopes to find the time to finish his thesis in the not-too-distant future. He is mainly interested in the application of robotics and AI in real-world consumer products.

Faculty Host: Martial Hebert

A first-person video records not only what is out in the environment but also, through social and physical interactions, what is in our head (intention and attention) at the time. This is invisible, but it can be revealed by fixation, camera motion, and visual semantics. In this talk, I will present a computational model to decode our intention and attention from first-person cameras when interacting with (1) scenes and (2) people.

A person exerts his/her intention by applying physical force and torque to scenes and objects, which results in visual sensation. We leverage this first-person visual sensation to precisely compute the force and torque the first person experienced, by integrating visual semantics, 3D reconstruction, and inverse optimal control. Such visual sensation also allows us to associate the present with past experiences, which in turn provides a strong cue for predicting future activities. When interacting with other people, social attention is a medium that controls group behaviors, e.g., how they form a group and move. We learn the geometric and visual relationship between group behaviors and social attention measured from first-person cameras. Based on the learned relationship, we derive a predictive model to localize social attention from a third-person view.
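As a rough, hypothetical illustration of the force computation described above, the sketch below recovers the net force a person applies to a grasped object from its reconstructed 3D trajectory using only a point-mass model and Newton's second law. All names are placeholders; the actual method integrates visual semantics, full 3D reconstruction, and inverse optimal control rather than raw finite differences.

```python
import numpy as np

def applied_force(positions, dt, mass=0.5):
    """Net force the hand must apply to a grasped object (point-mass simplification).

    positions: (T, 3) object positions in meters, from 3D reconstruction
    dt:        time step between frames in seconds
    mass:      assumed object mass in kilograms
    """
    velocity = np.gradient(positions, dt, axis=0)
    accel = np.gradient(velocity, dt, axis=0)
    gravity = np.array([0.0, 0.0, -9.81])
    # Newton's second law: F_hand + m*g = m*a  =>  F_hand = m*a - m*g
    return mass * accel - mass * gravity

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 100)
    lift = np.stack([np.zeros_like(t), np.zeros_like(t), 0.5 * t ** 2], axis=1)
    # Constant 1 m/s^2 upward acceleration of a 0.5 kg object: roughly 5.4 N upward.
    print(applied_force(lift, dt=t[1] - t[0])[50])
```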

Hyun Soo Park is an Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota. He is interested in understanding human visual sensorimotor behaviors from first-person cameras. Prior to joining UMN, he was a Postdoctoral Fellow working with Jianbo Shi at the University of Pennsylvania. He earned his Ph.D. from Carnegie Mellon University under the supervision of Yaser Sheikh.

Sponsored in part by Disney Research.

People with upper extremity disabilities are gaining increased independence through the use of assistive devices such as wheelchair-mounted robotic arms. However, the increased capability and dexterity of these robotic arms also makes them challenging to control through accessible interfaces like joysticks, sip-and-puff devices, and buttons, which are lower-dimensional than the control space of the robot. The potential for robot autonomy to ease control burden within assistive domains has been recognized for decades. While full autonomy is an option, it removes all control from the user. When that is not what the user desires, the assistive technology has in fact made them less able, and it discards useful input the human could provide (for example, their superior situational awareness) that would add to system robustness.

This thesis takes an in-depth look at how to add autonomy to an assistive robot arm in the specific application of eating, to make it faster and more enjoyable for people with disabilities to feed themselves. While we are focused on this specific application, the tools and insights we gain can generalize to the fields of deformable object manipulation, selection from behavior libraries, intent prediction, robot teleoperation, and human-robot interaction. The physical proximity and heavy dependence on the robot arm for daily tasks make this a very high-stakes human-robot interaction.

We propose a system that is capable of fully autonomous feeding by (1) predicting bite timing based on social cues, (2) detecting relevant features of the food using RGBD sensor data, and (3) automatically selecting a goal and a food-collection motion primitive to bring a bite from the plate to the operator's mouth. We propose investigating the desired level of autonomy through user studies with an assistive robot in which users have varying degrees of control over bite timing, bite selection, action selection, control mode-switching, and direct teleoperation of the robot, to determine the effect on cognitive load, acceptance, trust, and task performance.
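A minimal, hypothetical sketch of the proposed three-stage feeding loop follows. Every class and function name below is an illustrative placeholder rather than the thesis implementation: bite-timing prediction from social cues, RGBD food perception, and selection of a food-collection motion primitive.

```python
from dataclasses import dataclass

@dataclass
class BiteCandidate:
    position: tuple      # 3D location of the food item on the plate
    food_class: str      # e.g. "carrot"
    primitive: str       # food-collection primitive, e.g. "skewer" or "scoop"
    score: float         # suitability of this primitive for this item

def predict_bite_timing(social_cues: dict) -> bool:
    """Stage 1: decide whether to offer a bite now, from social cues such as
    whether the user is speaking (placeholder logic)."""
    return (not social_cues.get("speaking", False)) and social_cues.get("mouth_empty", True)

def detect_food(rgbd_frame) -> list:
    """Stage 2: segment the plate in RGBD data and propose candidate bites.
    Here we return a canned candidate instead of running real perception."""
    return [BiteCandidate((0.4, 0.1, 0.02), "carrot", "skewer", 0.9)]

def select_bite(candidates: list) -> BiteCandidate:
    """Stage 3: choose a goal and motion primitive, e.g. the highest-scoring pair."""
    return max(candidates, key=lambda c: c.score)

def feeding_step(rgbd_frame, social_cues):
    if predict_bite_timing(social_cues):
        bite = select_bite(detect_food(rgbd_frame))
        print(f"Executing '{bite.primitive}' on {bite.food_class} at {bite.position}")

feeding_step(rgbd_frame=None, social_cues={"speaking": False, "mouth_empty": True})
```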

Thesis Committee:
Siddhartha Srinivasa (Chair)
Christopher Atkeson
Jodi Forlizzi
Leila Takayama (University of California, Santa Cruz)

Copy of Proposal Document

An effective autonomous robot performing dangerous or menial tasks will need to act under significant time and energy constraints. At task time, the amount of effort a robot spends planning its motion directly detracts from its total performance. Manipulation tasks, however, present challenges to efficient motion planning. Tightly coupled steps (e.g. choices of object grasps or placements) allow poor early decisions to render subsequent steps difficult, which encourages longer planning horizons. However, an articulated robot situated within a geometrically complex and dynamic environment induces a high-dimensional configuration space in which it is expensive to test for valid paths. And since multi-step plans require paths in changing valid subsets of configuration space, it is difficult to reuse computation across steps.

This thesis proposes an approach to motion planning well-suited to articulated robots performing recurring multi-step manipulation tasks in complex, semi-structured environments. The high cost of edge validation in roadmap methods motivates us to study a lazy approach to pathfinding on graphs which decouples constructing and searching the graph from validating its edges. This decoupling raises two immediate questions which we address: (a) how to allocate precious validation computation among the unevaluated edges on the graph, and (b) how to efficiently solve the resulting dynamic pathfinding problem which arises as edges are validated. We next consider the inherent tradeoff between planning and execution cost, and show that an objective based on utility functions is able to effectively balance these competing goals during lazy pathfinding. Lastly, we define the family motion planning problem which captures the structure of multi-step manipulation tasks, and propose a related utility function which allows our motion planner to quickly find efficient solutions for such tasks.
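To make the decoupling concrete, here is a minimal lazy shortest-path sketch (assuming the networkx library): the graph is built and searched as if every edge were valid, and expensive validation is spent only on edges that appear on candidate shortest paths. The edge-selection and incremental-search strategies studied in this thesis are replaced here by the simplest possible choices, namely validating the whole candidate path and replanning from scratch.

```python
import networkx as nx

def lazy_shortest_path(graph, source, target, is_edge_valid):
    """Find a shortest path whose edges all pass is_edge_valid, evaluating
    edges lazily: only edges on candidate shortest paths are ever checked."""
    while True:
        try:
            path = nx.shortest_path(graph, source, target, weight="weight")
        except nx.NetworkXNoPath:
            return None
        invalid = next(((u, v) for u, v in zip(path, path[1:])
                        if not is_edge_valid(u, v)), None)
        if invalid is None:
            return path                  # every edge on this path checked out
        graph.remove_edge(*invalid)      # discovered a blocked edge; replan

# Toy usage: a 4x4 grid with one blocked edge standing in for a collision check.
G = nx.grid_2d_graph(4, 4)
nx.set_edge_attributes(G, 1.0, "weight")
blocked = {((1, 1), (1, 2)), ((1, 2), (1, 1))}
print(lazy_shortest_path(G, (0, 0), (3, 3), lambda u, v: (u, v) not in blocked))
```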

We assemble our algorithms into an integrated manipulation planning system, and demonstrate its effectiveness on motion and manipulation tasks on several robotic platforms. We also provide open-source implementations of our algorithms for contemporary motion planning frameworks. While the motivation for this thesis originally derived from manipulation, our pathfinding algorithms are broadly applicable to problem domains in which edge validation is expensive. Furthermore, the underlying similarity between lazy and dynamic settings also renders our incremental algorithms applicable to conventional dynamic problems such as traffic routing.

Thesis Committee:
Siddhartha Srinivasa (Chair)
Anthony Stentz
Maxim Likhachev
Lydia Kavraki (Rice University)

As social and collaborative robots move into everyday life, the need for algorithms enabling their acceptance becomes critical. People parse non-verbal communication intuitively, even from machines that do not look like people; thus, expressive motion is a natural and efficient way to communicate with people. This work presents a computational Expressive Motion framework allowing simple robots to modify task motions to communicate varying internal states, such as task status, social relationships, mood (e.g., emotions), and/or attitude (e.g., rushed, confident). By training robot motion features with humans in the loop, future robot designers can use this approach to parameterize how a robot generates its task motions.

The hypothesis of this thesis is that robots can modify the motion features of their task behaviors such that they legibly communicate a variety of states. Typically, researchers build instances of expressive motion into individual robot behaviors (which is not scalable), or use an independent channel such as lights or facial expressions that does not interfere with the robot's task. What is unique about this work is that we use the same modality for both task and expression: the robot's joint and whole-body motions. While this is not the only way for a robot to communicate expression, Expressive Motion is a channel available to all moving machines, and it can work in tandem with additional communication modalities. Our methodological approach is to operationalize the Laban Effort System, a well-known technique from acting training, which describes a four-dimensional state space of Time, Weight, Space, and Flow. Thus, our Computational Laban Effort (CLE) framework can use four values, the Laban Effort Setting, to represent a robot's current state. Each value is reflected in the motion characteristics of the robot's movements. For example, a Laban Time Effort of ‘sudden’ might have more abrupt accelerations and fast velocity, while a Laban Time Effort of ‘sustained’ could have slower acceleration and low velocity. In our experiments, we find that varying these four Effort values results in complex communications of robot state to the people around it, even for robots with low degrees of freedom.
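As a purely illustrative sketch of how such a framework might be parameterized, the snippet below maps a four-valued Laban Effort Setting onto low-level motion parameters of a simple robot. The specific features and mappings are hypothetical stand-ins; the thesis selects and validates these mappings with humans in the loop.

```python
from dataclasses import dataclass

@dataclass
class LabanEffortSetting:
    time: float    # -1 = sustained ... +1 = sudden
    weight: float  # -1 = light     ... +1 = strong
    space: float   # -1 = indirect  ... +1 = direct
    flow: float    # -1 = free      ... +1 = bound

def lerp(lo, hi, u):
    """Map u in [-1, 1] linearly onto the range [lo, hi]."""
    return lo + (hi - lo) * (u + 1.0) / 2.0

def motion_parameters(effort: LabanEffortSetting) -> dict:
    """Translate an Effort setting into parameters a motion generator can use
    (illustrative mappings only)."""
    return {
        "max_velocity_mps":   lerp(0.10, 1.00, effort.time),    # sudden -> fast
        "max_accel_mps2":     lerp(0.20, 3.00, effort.time),    # sudden -> abrupt
        "gesture_amplitude":  lerp(0.20, 1.00, effort.weight),  # strong -> large, forceful
        "path_wander":        lerp(0.50, 0.00, effort.space),   # direct -> straight to goal
        "smoothing_window_s": lerp(0.05, 0.60, effort.flow),    # bound -> heavily damped
    }

# 'Rushed' setting: sudden, fairly strong, direct, free.
print(motion_parameters(LabanEffortSetting(time=1, weight=0.5, space=1, flow=-1)))
```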

The technical contributions of this work include:

  1. A Computational Laban Effort framework for layering Expressive Motion features onto robot task behaviors, fully specified for low degree of freedom robots.
  2. Specifications for selecting, exploring and making generalizations about how to map these motion features to particular robot state communications.
  3. Experimental studies of human-robot interaction to evaluate the legibility, attributions and impact of these technical components.
  4. Sample evaluations of approaches to establish mappings between CLE features and state communications.

Thesis Committee:
Reid Simmons (Chair)
Manuela Veloso
Aaron Steinfeld
Guy Hoffman (Cornell University)

Copy of Thesis Document

As autonomous systems are deployed in increasingly complex and uncertain environments, safe, accurate, and robust feedback control techniques are required to ensure reliable operation. Accurate trajectory tracking is essential to complete a variety of tasks, but this may be difficult if the system’s dynamics change online, e.g., due to environmental effects or hardware degradation. As a result, uncertainty mitigation techniques are also necessary to ensure safety and accuracy.

This problem is well suited to a receding-horizon optimal control formulation via Nonlinear Model Predictive Control (NMPC). NMPC employs a nonlinear model of the plant dynamics to compute non-myopic control policies, thereby improving tracking accuracy relative to reactive approaches. This formulation ensures constraints on the dynamics are satisfied and can compensate for plant model uncertainty via robust and adaptive extensions. However, existing NMPC techniques are computationally expensive, and many operating domains preclude reliable, high-rate communication with a base station. This is particularly difficult for small, agile systems, such as micro aerial vehicles, which have severely limited computation due to size, weight, and power restrictions but require high-rate feedback control to maintain stability. Therefore, the system must be able to operate safely and reliably with typically limited onboard computational resources.

In this thesis, we propose a series of non-myopic, computationally-efficient, feedback control strategies that enable accurate and reliable operation in the presence of unmodeled system dynamics. The key concept underlying these techniques is the reuse of past experiences to reduce online computation and enhance control performance in novel scenarios. The work completed thus far demonstrates high-rate, constrained, adaptive control of agile systems through the use of experience to inform an online-updated estimate of the system dynamics model and the choice of controller for a given scenario. The proposed work aims to enhance robustness to uncertainty, improve computational efficiency, and inform motion planning to facilitate tracking. We also propose two case studies to demonstrate the performance of these techniques.
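The snippet below is a toy illustration of the receding-horizon, adaptive idea, not the proposed controllers: a 1D double integrator tracks a setpoint under an unmodeled constant disturbance. Each step re-solves a short-horizon problem by brute-force shooting over candidate inputs (standing in for a real NMPC solver), and past experience is reused through an online estimate of the model residual that corrects the prediction model.

```python
import numpy as np

dt, horizon = 0.05, 10
candidates = np.linspace(-2.0, 2.0, 21)      # candidate accelerations, held over the horizon

def rollout_cost(x, v, u, ref, d_hat):
    """Predicted tracking cost of holding input u, using the learned residual d_hat."""
    cost = 0.0
    for _ in range(horizon):
        v += (u + d_hat) * dt
        x += v * dt
        cost += (x - ref) ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2
    return cost

x = v = 0.0
d_true, d_hat, ref = -0.8, 0.0, 1.0          # true disturbance is unknown to the controller
for _ in range(400):
    u = min(candidates, key=lambda c: rollout_cost(x, v, c, ref, d_hat))
    v_pred = v + (u + d_hat) * dt            # what the current model predicts
    v += (u + d_true) * dt                   # what the "real" plant actually does
    x += v * dt
    d_hat += 0.2 * (v - v_pred) / dt         # adapt the residual from experience
print(f"position {x:.2f} (target {ref}), learned disturbance {d_hat:.2f} (true {d_true})")
```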

Thesis Committee:
Nathan Michael (Chair)
Maxim Likhachev
Koushil Sreenath
Nicholas Roy (Massachusetts Institute of Technology)

Copy of Proposal Document

We study a fundamental question in pose estimation from vision-only video data: should the pose of a camera be determined from fixed and known correspondences? Or should correspondences be simultaneously estimated alongside the pose?

Determining pose from fixed correspondences is known as the feature-based approach, where well-established tools from projective geometry are used to formulate and solve a plethora of pose estimation problems. Nonetheless, in degraded imaging conditions such as low light and blur, reliably detecting and precisely localizing interest points becomes challenging.

Conversely, estimating correspondences alongside motion is known as the direct approach, where image data are used directly to determine geometric quantities without relying on sparse interest points as an intermediate representation. The approach is in general more precise by virtue of redundancy, as many measurements are used to estimate a few degrees of freedom. However, direct methods are more sensitive to changes in illumination.

In this work, we combine the robustness of feature-based approaches with the precision of direct methods. Namely, we make use of densely and sparsely evaluated local feature descriptors in a direct image alignment framework to address pose estimation in challenging conditions. Applications include tracking planar targets under sudden and drastic changes in illumination, as well as visual odometry in poorly lit subterranean mines.
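As a compact, assumption-laden sketch of descriptor-based direct alignment, the snippet below estimates a 2-DoF translation by Gauss-Newton over a dense "descriptor" channel. Image gradient magnitude stands in for the richer local feature descriptors used in this work, and the warp is a translation rather than a full SE(3) motion; a real system would also use sub-pixel interpolation and robust losses, but the structure of the optimization is the same.

```python
import numpy as np

def gradients(img):
    gy, gx = np.gradient(img.astype(float))
    return gx, gy

def descriptor(img):
    """A 1-channel stand-in for a dense local descriptor: gradient magnitude,
    which is already insensitive to additive illumination changes."""
    gx, gy = gradients(img)
    return np.hypot(gx, gy)

def align_translation(template, image, iters=30):
    """Estimate p = (tx, ty) minimizing sum ||D_image(x + p) - D_template(x)||^2."""
    T, D = descriptor(template), descriptor(image)
    Dx, Dy = gradients(D)
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.zeros(2)
    for _ in range(iters):
        xw = np.clip(np.rint(xs + p[0]).astype(int), 0, D.shape[1] - 1)
        yw = np.clip(np.rint(ys + p[1]).astype(int), 0, D.shape[0] - 1)
        r = (D[yw, xw] - T).ravel()                   # residuals at the current warp
        J = np.stack([Dx[yw, xw].ravel(), Dy[yw, xw].ravel()], axis=1)
        p -= np.linalg.lstsq(J, r, rcond=None)[0]     # Gauss-Newton update
    return p
```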

Motivated by the success of the approach, we introduce a novel formulation for the joint refinement of pose and structure across multiple views akin to feature-based bundle adjustment (BA). In contrast to minimizing the reprojection error using BA, initial estimates are refined such that the photometric consistency of their image projections is maximized without the need for correspondences. The technique is evaluated on a range of datasets and is shown to improve upon the accuracy of the current state-of-the-art in vision-based simultaneous localization and mapping (VSLAM).
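The multi-view refinement objective can be sketched in the same spirit: instead of a reprojection error against fixed 2D correspondences, each 3D point is scored by how consistent its projected intensities are across the views that see it. The names below are placeholders, and a real system would optimize this residual jointly over poses and points with a nonlinear least-squares solver.

```python
import numpy as np

def project(K, T_wc, p_world):
    """Pinhole projection of a world point into a camera with world-to-camera pose T_wc (4x4)."""
    p_cam = T_wc[:3, :3] @ p_world + T_wc[:3, 3]
    u, v, w = K @ p_cam
    return u / w, v / w

def photometric_residuals(points, ref_ids, poses, images, K):
    """Intensity differences between each point's reference view and every
    other view it projects into; no explicit correspondences are needed."""
    residuals = []
    for p, ref in zip(points, ref_ids):
        u0, v0 = project(K, poses[ref], p)
        i_ref = float(images[ref][int(round(v0)), int(round(u0))])
        for j, (T, img) in enumerate(zip(poses, images)):
            if j == ref:
                continue
            u, v = project(K, T, p)
            r, c = int(round(v)), int(round(u))
            if 0 <= r < img.shape[0] and 0 <= c < img.shape[1]:
                residuals.append(float(img[r, c]) - i_ref)
    return np.array(residuals)

# Joint refinement would minimize the sum of squared residuals over both the
# camera poses and the point positions, e.g. with scipy.optimize.least_squares.
```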

Thesis Committee:
Brett Browning (Co-chair)
Simon Lucey (Co-chair)
Michael Kaess
Martial Hebert
Ian D. Reid (The University of Adelaide)

Perception and state estimation are critical robot competencies that remain difficult to harden and generalize. This is due in part to the incredible complexity of modern perception systems, which commonly comprise dozens of components with hundreds of parameters overall. Selecting a configuration of parameters relies on a human's understanding of how the parameters interact with the environment and the robot's behavior, which we refer to as the "context." Furthermore, evaluating the performance of the system entails multiple empirical trials, which often poorly predict the generality of the system.

We depart from the conventional wisdom that perception systems must generalize to be successful and instead suggest that a perception system need only do well in the situations it encounters over the course of its deployment. This thesis proposes that greater overall perceptual generality can be achieved by designing perception systems that adapt to their local contexts by re-selecting perception system parameters. Towards this end, we have completed work on improving stochastic model fidelity, and we discuss our proposed work on applying reinforcement learning techniques to learn parameter-selection policies from perceptual experience.
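A toy sketch of the parameter-adaptation idea: treat the choice of a perception-parameter configuration as a contextual bandit, where the context summarizes the current environment and behavior and the reward is an online measure of perception quality. The linear value model, epsilon-greedy rule, and feature names below are illustrative stand-ins for the reinforcement-learning policies proposed in the thesis.

```python
import numpy as np

class ParameterSelector:
    def __init__(self, n_configs, context_dim, epsilon=0.1, lr=0.05):
        self.W = np.zeros((n_configs, context_dim))   # per-config value weights
        self.epsilon, self.lr = epsilon, lr

    def select(self, context):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.W))     # explore
        return int(np.argmax(self.W @ context))       # exploit predicted quality

    def update(self, config, context, reward):
        error = reward - self.W[config] @ context
        self.W[config] += self.lr * error * context   # stochastic gradient step

# Hypothetical usage: two stereo-matching configurations, context = [bias, indoor, speed].
selector = ParameterSelector(n_configs=2, context_dim=3)
for _ in range(2000):
    indoor = float(np.random.rand() > 0.5)
    context = np.array([1.0, indoor, np.random.rand()])
    config = selector.select(context)
    # Simulated quality signal: configuration 1 happens to work better indoors.
    reward = 1.0 if config == int(indoor) else 0.0
    selector.update(config, context, reward)
print(selector.select(np.array([1.0, 1.0, 0.3])))     # most likely config 1 (up to exploration)
```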

Thesis Committee:
George Kantor (Chair)
Sebastian Scherer
Katharina Muelling
Ingmar Posner (University of Oxford)

Copy of Proposal Document

We aim to build robots that can interact with people in public spaces. Such a robot receives various sounds, including surrounding noise and users' voices. In this talk I will present a machine learning-based method to estimate response obligation, i.e., whether or not the robot needs to respond to a given input sound. This enables the robot to reject not only noise but also monologues and user utterances directed toward other users. Our method uses not only acoustic information but also users' motions and postures during a sound segment as features. In addition, user behaviors after a sound segment are taken into account to exploit typical user behaviors in human-robot interaction; for example, a user often stands still when he/she speaks to a robot. Experimental results showed that our proposed model significantly outperforms a baseline. We found that user behaviors both during and after sound segments are helpful for estimating response obligation.
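A hypothetical sketch of the classification setup described above: a binary "respond / don't respond" decision from features describing the sound segment and the user's behavior both during and after it. The feature names, synthetic data, and logistic-regression learner are illustrative; the talk's system uses its own feature set and model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = [
    "duration_s",            # acoustic: segment length
    "energy",                # acoustic: mean energy
    "facing_robot_during",   # motion/posture during the segment
    "standing_still_after",  # behavior after the segment (e.g. waiting for a reply)
]

def train_response_obligation(X, y):
    """X: (N, len(FEATURES)) feature matrix; y: 1 if the robot should respond."""
    return LogisticRegression(max_iter=1000).fit(X, y)

# Toy synthetic data standing in for annotated field recordings.
rng = np.random.default_rng(0)
X = rng.random((200, len(FEATURES)))
y = ((X[:, 2] > 0.5) & (X[:, 3] > 0.5)).astype(int)   # addressed + waiting -> respond
model = train_response_obligation(X, y)
print(model.predict_proba([[1.0, 0.6, 0.9, 0.8]])[0, 1])  # high response obligation
```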

Mikio Nakano is a principal researcher at Honda Research Institute Japan Co., Ltd. (HRI-JP). He received his M.S. degree in Coordinated Sciences and Sc.D. degree in Information Science from the University of Tokyo in 1990 and 1998, respectively. From 1990 to 2004, he worked for NTT (Nippon Telegraph and Telephone Corporation). He was a visiting scientist at the MIT Laboratory for Computer Science from 2000 to 2002. He joined HRI-JP in 2004. He has been studying various types of dialogue systems, including conversational robots and text-based chatbots.

He was a science advisory committee member of SIGDIAL from 2007 to 2012. He also served as a general chair for SIGDIAL 2010 and an area chair for ACL 2012. He was a visiting professor at Waseda University from 2011 to 2016.

The Honda Research Institute Japan is an affiliated company of Honda Motor Company and, like its sister companies Honda Research Institute USA and Honda Research Institute Europe, is dedicated to fundamental research. HRI-JP focuses on "the intelligence supporting human and machine" by conducting research that takes a unique approach not bounded by conventional concepts. Current research areas of HRI-JP include dialogue systems, robot audition, psychology of vision, and machine learning.
