We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving real-time performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.

Committee:
Yaser Sheikh (Advisor)
Deva Ramanan
Aayush Bansal

Humans use subtle and elaborate body signals to convey their thoughts, emotions, and intentions. "Kinesics" is a term that refers to the study of such body movements used in social communication, including facial expressions and hand gestures. Understanding kinesic signals is fundamental to understanding human communication; it is among the key technical barriers to making machines that can genuinely communicate with humans. Yet, the encoding of conveyed information by body movement is still poorly understood.

This thesis proposal is focused on two major challenges in building a computational understanding of kinesic communication: (1) measuring full body motion as a continuous high bandwidth kinesic signal; and (2) modeling kinesic communication as information flow between coupled agents that continuously predict each other's response signals. To measure kinesic signals between multiple interacting people, we first develop the Panoptic Studio, a massively multiview system composed of more than five hundred camera sensors. The large number of views allows us to develop a method to robustly measure subtle 3D motions of bodies, hands, and faces of all individuals in a large group of people. To this end, a dataset containing 3D kinesic signals of more than two hundred sequences from hundreds of participants is collected and shared publicly.

The Panoptic Studio allows us to measure kinesic signals of a large group of interacting people for the first time. We propose to model these signals as information flow in a communication system. The core thesis of our approach is that a meaningful encoding of body movement will emerge from representations that are optimized for efficient prediction of these kinesic communication signals. We hope to see this approach inspire continuous and quantitative models in the future study of social behavior.

Thesis Committee:
Yaser Sheikh (Chair)
Takeo Kanade
Louis-Philippe Morency
Mina Cikara (Harvard University)
David Forsyth (University of Illinois at Urbana-Champaign)

Copy of Proposal Document

Robots manipulate with super-human speed and dexterity on factory floors. Yet they fail even under moderate amounts of clutter or uncertainty, while human teleoperators perform remarkable acts of manipulation with the same hardware. My research goal is to bridge the gap between what robotic manipulators can do now and what they are capable of doing.

What human operators intuitively possess that robots lack are models of interaction between the manipulator and the world that go beyond pick-and-place. I will describe our work on nonprehensile physics-based manipulation, which has produced simple but effective models, integrated with proprioception and perception, that have enabled robots to fearlessly push, pull, and slide objects, and reconfigure clutter that gets in the way of their primary task.

But human environments are also filled with humans. Collaborative manipulation is a dance, demanding the sharing of intentions, inferences, and forces between the robot and the human. I will also describe our work on the mathematics of human-robot interaction that has produced a framework for collaboration using Bayesian inference to model the human collaborator, and trajectory optimization to generate fluent collaborative plans.

Finally, I will talk about our new initiative on assistive care that focuses on marrying physics, human-robot collaboration, control theory, and rehabilitation engineering to build and deploy caregiving systems.

Siddhartha Srinivasa is the Finmeccanica Associate Professor at The Robotics Institute at Carnegie Mellon University. He works on robotic manipulation, with the goal of enabling robots to perform complex manipulation tasks under uncertainty and clutter, with and around people. To this end, he founded and directs the Personal Robotics Lab, and co-directs the Manipulation Lab. He has been a PI on the Quality of Life Technologies NSF ERC, DARPA ARM-S and the CMU CHIMP team on the DARPA DRC. Sidd is also passionate about building end-to-end systems (HERB, ADA, HRP3, CHIMP, Andy, among others) that integrate perception, planning, and control in the real world. Understanding the interplay between system components has helped produce state of the art algorithms for object recognition and pose estimation (MOPED), and dense 3D modeling (CHISEL, now used by Google Project Tango).

Sidd received a B.Tech in Mechanical Engineering from the Indian Institute of Technology Madras in 1999, an MS in 2001 and a PhD in 2005 from the Robotics Institute at Carnegie Mellon University. He played badminton and tennis for IIT Madras, captained the CMU squash team, and lately runs competitively.

Faculty Host: Martial Hebert

Robotic swarms are multi-robot systems whose global behavior emerges from local interactions between individual robots and spatially proximal neighboring robots. Each robot can be programmed with several local control laws that can be activated depending on an operator's choice of global swarm behavior (e.g. flocking, aggregation, formation control, area coverage). In contrast to other multi-robot systems, robotic swarms are inherently scalable since they are robust to addition and removal of members with minimal system reconfiguration. This makes them ideal for applications such as search and rescue, environmental exploration and surveillance.

For practical missions, which may require a combination of swarm behaviors and have dynamically changing mission goals, human interaction with the robotic swarm is necessary. However, human-swarm interaction is complicated by the fact that a robotic swarm is a complex distributed dynamical system, so its state evolution depends on the sequence as well as timing of the supervisory inputs. Thus, it is difficult to predict the effects of an input on the state evolution of the swarm. More specifically, after becoming aware of a change in mission goals, it is unclear at what point the operator must convey this information to the swarm or which combination of behaviors to use to accomplish the new goals.

The main challenges we seek to address in this thesis are characterizing the effects of input timing on swarm performance and using this theory to inform automated composition of swarm behaviors to accomplish updated mission goals. 

We begin by formalizing the notion of Neglect Benevolence --- the idea that delaying the application of an input can sometimes be beneficial to overall swarm performance --- and using the developed theory to demonstrate experimentally that humans can learn to approximate optimal input timing. By restricting our behavior library to consensus-based swarm behaviors, we then apply results from control theory to present an algorithm for automated scheduling of swarm behaviors to time-optimally accomplish multiple unordered goals. We also present an algorithm that solves the swarm behavior composition problem when our library contains general swarm behaviors, but the switch times are known.
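
For readers unfamiliar with consensus-based behaviors, the following is a minimal sketch of the kind of local control law involved. It is an illustration only and not the scheduling algorithm of the thesis. It assumes a fixed, undirected communication graph, scalar robot states (for example, headings), and a hand-picked step size; the names `consensus_step`, `run_consensus`, and `adjacency` are hypothetical.

```python
import numpy as np


def consensus_step(x, adjacency, step=0.1):
    """One synchronous update: each robot moves toward its neighbors' states."""
    degree = adjacency.sum(axis=1)
    laplacian = np.diag(degree) - adjacency  # graph Laplacian L = D - A
    return x - step * (laplacian @ x)


def run_consensus(x0, adjacency, step=0.1, iters=200):
    """Iterate the local rule; only neighbor information is ever used."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = consensus_step(x, adjacency, step)
    return x


# Assumed example: four robots connected in a ring (0-1-2-3-0).
adjacency = np.array(
    [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]],
    dtype=float,
)
x0 = np.array([0.0, 1.0, 4.0, 7.0])
final = run_consensus(x0, adjacency)
```

On a connected undirected graph with a sufficiently small step size, this rule drives every state to the average of the initial states, which is the global behavior that emerges from the purely local interactions.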

In our completed work, we have made significant progress towards the swarm behavior composition problem from the perspective of scheduling. In our proposed future work, we plan to (1) extend our work on behavior scheduling by simultaneously relaxing assumptions on switch times and the types of behaviors in the library and (2) study behavior composition from the perspective of synthesis. In this context, synthesis describes the act of appropriately instantiating from a set of swarm meta-behaviors, the necessary concrete swarm behaviors to complete a desired task.

Thesis Committee:
Katia Sycara (Chair)
Howie Choset
Maxim Likhachev
Nilanjan Chakraborty (Stony Brook University)

To make intelligent decisions, robots often use models of the stochastic effects of their actions on the world. Unfortunately, in complex environments, it is often infeasible to create models that are accurate in every plausible situation, which can lead to suboptimal performance. This thesis enables robots to reason about model inaccuracies to improve their performance. The thesis focuses on model inaccuracies that are subtle --i.e., they cannot be detected from a single observation-- and context-dependent --i.e., they affect particular regions of the robot's state-action space. Furthermore, this work enables robots to react to model inaccuracies from sparse execution data.

Our approach consists of enabling robots to explicitly reason about parametric Regions of Inaccurate Modeling (RIMs) in their state-action space. We enable robots to detect these RIMs from sparse execution data, to correct their models given these detections, and to plan accounting for uncertainty with respect to these RIMs. To detect and correct RIMs, we first develop optimization-based algorithms that work effectively online in low-dimensional domains. To extend this approach to high-dimensional domains, we develop a search-based Feature Selection algorithm, which relies on the assumption that RIMs are intrinsically low-dimensional but embedded in a high-dimensional space. Finally, we enable robots to make plans that account for their uncertainty about the accuracy of their models.

We evaluate our approach on various complex robot domains. Our approach enables the CoBot mobile service robots to autonomously detect inaccuracies in their motion models, despite their high-dimensional state-action space: the CoBots detect that they are not moving correctly in particular areas of the building, and that their wheels are starting to fail when making turns. Our approach enables the CMDragons soccer robots to improve their passing and shooting models online in the presence of opponents with unknown weaknesses and strengths. Finally, our approach enables a NASA spacecraft landing simulator to detect subtle anomalies, unknown to us beforehand, in their streams of high-dimensional sensor-output and actuator-input data.

Thesis Committee:
Reid Simmons (Co-chair)
Manuela Veloso (Co-chair)
Jeff Schneider
Brian Williams (Massachusetts Institute of Technology)

Copy of Thesis Document

The neural control of human locomotion is not fully understood. As current experimental techniques provide only partial and indirect access to the neural control network, our understanding remains fragmentary with large gaps between detectable neural circuits and measurable behavioral data. Neuromechanical simulation studies can help bridge these gaps. By testing a hypothesized controller in neuromechanical simulations, one can evaluate the plausibility of the controller and propose experimental studies which can further investigate the hypothesis. Better understanding the control of human locomotion will change the way we design rehabilitation treatment and engineer assistive devices.

This thesis first investigates how much of human locomotion control can be explained by spinal reflexes using neuromechanical simulations. It is known that spinal control is essential in generating locomotion behaviors in humans, which leads to two central questions: “how does the lower layer controller in the spinal cord generate the motor patterns?” and “how is this lower layer controller modulated by the higher layer brain control to realize different locomotion tasks?” To investigate these questions, we propose a hierarchical control model with two layers, where the lower-layer control consists of spinal reflexes, and the higher layer sends a few commands to modulate this lower-layer control. In neuromechanical simulations, this model can generate diverse human locomotion behaviors, including walking and running, adapting to slopes and stairs, and changing locomotion directions and speeds. Furthermore, its reactions to a range of unexpected disturbances during normal walking are remarkably similar to those observed in human experiments. The simulation results suggest the following answers to the central questions: “the motor patterns of many human locomotion behaviors can be generated by chains of reflexes” and “different locomotion behaviors can be realized by a reflex-based unified controller that is modulated by the higher-layer control.”

The latter part of this thesis presents three studies that use the neuromechanical control model either as a simulation testbed for studying human locomotion or as a robotic controller for legged machines. First, the neuromechanical model is used to study human foot biomechanics. The walking simulations with different foot designs suggest that the windlass mechanism in human feet saves metabolic cost during walking, and that this saving does not come from the compliance of the feet, which is one component of this mechanism. Second, age-related skeletal, muscular, and neural changes are applied to the model to investigate why metabolic cost increases and regular walking speed decreases in elderly people. The increase in metabolic cost of the elderly model is mostly attributed to weakened muscles, and we identify muscle fatigue as a plausible performance criterion that suggests a slower walking speed for the elderly model. In the last study, we adapt the neuromechanical model for the bipedal robot ATRIAS. With the controller, ATRIAS could walk on rough terrain with unknown height changes of ± 20 cm in a sagittal plane physics simulation.

Thesis Committee:
Hartmut Geyer (Chair)
Christopher G. Atkeson
Stelian Coros
Auke J. Ijspeert (EPFL)

Copy of Draft Document

Reliable and efficient acquisition of data from physical spaces will have countless applications in industry, policy and defense. The capability of gaining information at different scales makes Micro-Aerial Vehicles (MAVs) excellent for the aforementioned applications. However, reasoning about information gathering at multiple resolutions is NP-hard, and state-of-the-art methods are too slow to present an approximate solution online. Moreover, a robust data gathering system needs to reason about the multi-resolution nature of information gathering while being safe and cognizant of its sensory and battery limitations.

This thesis addresses three key aspects of enabling safe, efficient, multi-resolution data gathering: online budgeted multi-resolution informative path planning (IPP), guaranteeing safety, and optimizing sensing bandwidth for implicit and explicit data gathering requirements.

First, we present an online navigation algorithm to guarantee the safety of the robot through an Emergency Maneuver Library (EML). We discuss an efficient method to construct the EML while exploiting the vehicle's dynamics capabilities. We then present an information gathering approach that optimizes the sensory actions to ensure vehicle safety and gain information relevant for mission progress. We validate these methods by deploying them on board a full-scale helicopter, demonstrating a significant performance increase. We address the IPP problem through Randomized Anytime Orienteering (RAOr), an anytime, asymptotically near-optimal algorithm that enables planning for information gathering online.

We will focus our future work on three sub-problems that will lead to a safe, efficient data gathering framework. The first is developing a receding horizon planner that enables the vehicle to stay safe while maximizing the information gathered, by embedding safety constraints and information-theoretic reward functions in a sampling-based planning framework. The second is learning a set of heuristics to enable faster multi-resolution informative path planning through RAOr. The third is to use the safe data gathering framework to improve the vehicle's long-term performance by improving its assumptions about the environment.

We will evaluate the performance of our information gathering framework on an autonomous MAV. We expect that our framework will enable long term deployment of autonomous multi-resolution data gathering systems, while guaranteeing their safety, enabling MAVs to realize their potential as efficient data gatherers.

Thesis Committee:
Sebastian Scherer (Chair)
William (Red) L. Whittaker
David Wettergreen
Kostas Alexis (University of Nevada, Reno)

Copy of Proposal Document

Achieving human-level visual understanding requires extracting rich geometric information from images. In particular, it entails moving beyond 2D bounding boxes to more detailed geometric representations. In this talk I will present recent work in this direction, specifically keypoint localization and single-image depth estimation. For keypoint localization, I will present new neural architectures and representations. For single-image depth estimation, I will show how to use crowdsourcing to improve depth estimation for images in the wild.

Jia Deng is an Assistant Professor of Computer Science and Engineering at the University of Michigan. His research focus is on computer vision and machine learning, in particular, achieving human-level visual understanding by integrating perception, cognition, and learning. He received his Ph.D. from Princeton University and his B.Eng. from Tsinghua University, both in computer science. He is a recipient of the PAMI Everingham Prize, the Yahoo ACE Award, a Google Faculty Research Award, the ICCV Marr Prize, and the ECCV Best Paper Award.

A high-fidelity and tractable mechanics model of the physical interaction is essential for autonomous robotic manipulation in complex and uncertain environments. Nonetheless, task mechanics are often ignored or nullified in most robotic manipulation systems. This thesis proposal addresses three aspects of harnessing task mechanics: mechanics model learning, uncertainty reduction and control synthesis.

We first study a large class of manipulation problems where surface-to-surface planar sliding motion occurs. An efficiently identifiable convex polynomial force-motion model is proposed. We derive the kinematic contact model that resolves the contact modes and instantaneous object motion given a position controlled manipulator action. This enables generic quasi-static planar contact simulation, which is validated with extensive robotic grasping and pushing experiments. We then generate tree-structured sequential grasping plans, both sensored and sensorless, that will succeed in localizing the post-action object pose to a singleton (subject to symmetry) despite the presence of bounded initial state uncertainty.

We show some preliminary work on the differential flatness property of the pusher-slider system that leads to trajectory planning with Dubins curves and stable tracking with dynamic feedback linearization. Future work focuses on 1) manipulation in the gravity plane with external contacts including the ground and walls; 2) extensions of developed models to contend with clutter.

Thesis Committee:
Matthew T. Mason (Co-chair)
J. Andrew Bagnell (Co-chair)
Christopher G. Atkeson
Russ Tedrake (MIT CSAIL)

Speech-based AI assistants such as Alexa and Google Now are becoming increasingly popular as a convenient way for people to interact with machines. However, users find interactions with their assistants more natural if conducted in a conversational manner, with multiple requests made and responses provided in a given dialog session. Creating robust dialog policies for conversational bots is challenging. This talk presents a data driven approach for dialog management through reinforcement learning. We first introduce a framework for building conversational bots and describe MovieBot as an implementation of the framework that was launched as an Alexa skill. We then describe approaches to creating the reward function based on sentiment analysis on text, using various techniques including Long Short Term Memory networks (LSTMs). The talk will end by discussing potential directions, and how all pieces of the puzzle can fit together.

Alborz Geramifard is currently a Machine Learning Manager at Amazon working on conversation AI for Alexa. Before joining Amazon, he was a postdoctoral associate at MIT's Laboratory for Information and Decision Systems. Alborz received his PhD from MIT in 2012, working on representation learning and safe exploration in large scale sensitive sequential decision-making problems. He finished his MSc at the University of Alberta in 2008, where he worked on data efficient online reinforcement learning techniques. His research interests lie in machine learning, with a focus on reinforcement learning, natural language understanding, planning, and brain and cognitive sciences. Alborz was the recipient of the NSERC postgraduate scholarships 2010-2012 program.

Faculty Host: Chris Atkeson

