A recurrent and elementary machine perception task is to localize objects of interest in the physical world, be it objects on a warehouse shelf or cars on a road. In many real-world settings, this task entails localizing specific object instances with known 3D models. For example, a warehouse robot equipped with a depth sensor must recognize and localize objects on a shelf with known inventory, while a low-cost industrial robot might need to localize parts on an assembly line.
Most modern methods for the 3D multi-object localization task employ scene-to-model feature matching or regression/classification by learners trained on synthetic or real scenes. While these methods are typically fast in producing a result, they are often brittle, sensitive to occlusions, and dependent on the right choice of features and/or training data. This thesis introduces and advocates a deliberative approach, in which the multi-object localization task is framed as an optimization over the space of hypothesized scenes. We conjecture that deliberative reasoning, such as understanding inter-object occlusions, is essential to robust perception, and that the role of discriminative algorithms should mainly be to guide this process.
As part of this thesis work so far, we have developed two methods towards this objective: PErception via SeaRCH (PERCH) and Discriminatively-guided Deliberative Perception (D2P). PERCH exploits structure in the optimization over hypothesized scenes to cast it as a tree search over individual object poses, thereby overcoming the computational intractability of joint optimization. D2P extends PERCH by allowing modern statistical learners such as deep neural networks to guide the global search. This is made possible by Multi-Heuristic A* (MHA*) and its extensions, graph search algorithms that we developed to handle multiple, possibly "inadmissible" heuristics. These algorithms allow us to leverage arbitrary learning-based algorithms as heuristics to accelerate search, without compromising solution quality.
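To make the MHA* idea concrete, here is a minimal, hypothetical sketch of searching with one consistent "anchor" heuristic plus arbitrary inadmissible ones; the grid domain, weights, and function names below are illustrative assumptions, not the implementation used in this work. Inadmissible queues (which, as in D2P, could be learned predictors) are allowed to drive expansions only while their best key stays within a factor `w2` of the anchor's best key, which is what preserves a bounded-suboptimality guarantee.

```python
import heapq

def mha_star(start, goal, neighbors, cost, h_anchor, h_inad, w1=2.0, w2=2.0):
    """Simplified Multi-Heuristic A*: one anchor queue ordered by a
    consistent heuristic, plus one queue per arbitrary (possibly
    inadmissible) heuristic. Inadmissible queues may expand states only
    while their best key stays within a factor w2 of the anchor's."""
    hs = [h_anchor] + list(h_inad)
    g, parent = {start: 0.0}, {start: None}
    queues = [[(w1 * h(start), start)] for h in hs]
    closed = set()

    def top(q):  # discard stale entries for already-expanded states
        while q and q[0][1] in closed:
            heapq.heappop(q)
        return q[0] if q else None

    while top(queues[0]):
        anchor_key = queues[0][0][0]
        chosen = queues[0]
        for q in queues[1:]:  # prefer inadmissible queues when allowed
            t = top(q)
            if t and t[0] <= w2 * anchor_key:
                chosen = q
                break
        s = heapq.heappop(chosen)[1]
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        closed.add(s)
        for n in neighbors(s):
            new_g = g[s] + cost(s, n)
            if new_g < g.get(n, float("inf")):
                g[n], parent[n] = new_g, s
                if n not in closed:
                    for q, h in zip(queues, hs):
                        heapq.heappush(q, (new_g + w1 * h(n), n))
    return None  # anchor queue exhausted: no solution
```

On a 4-connected grid, for instance, `h_anchor` could be Manhattan distance and `h_inad` could contain an inflated or learned estimate; the anchor keeps the returned path within a factor of roughly `w1 * w2` of optimal.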
Our experiments with D2P indicate that we can leverage the complementary strengths of fast learning-based methods and deliberative classical search to handle both "hard" (severely occluded) and "easy" portions of a scene by automatically adjusting the amount of deliberation required. For easy scenes, the algorithm relies mostly on learning-based methods to save computation, while for harder scenes it injects more deliberation to gain robustness at the expense of computation time. In addition, to demonstrate the applicability of D2P to real-world perception tasks, we have integrated our method with the Human-Assisted Robotic Picker (HARP), the system that represented CMU at the 2016 Amazon Picking Challenge. For the remainder of this thesis work, we first propose to study whether D2P can achieve real-time performance independently of scene complexity. Further, our existing approach assumes that there is no extraneous clutter and that the objects have only 3 degrees of freedom; we aim to relax these assumptions to permit broader applicability of Deliberative Perception.
Maxim Likhachev (Chair)
Siddhartha S. Srinivasa
Dieter Fox (University of Washington)
The advent of robotic systems in medicine has revolutionized the practice of surgery. Most recently, several novel robotic surgical systems have been developed and are entering the operative theater. This lecture will describe the current state of the art in robotic surgery, survey some of the newer systems currently in use, and conclude with the future of robotic surgery in the context of clinical development and ease of use in the operating theaters of the future.
Umamaheswar Duvvuri, MD, PhD, is a graduate of the University of Pennsylvania, obtaining his Medical Degree in 2000 and his PhD in Biophysics in 2002. He completed an internship in General Surgery in 2003 and residency training in Otolaryngology in 2007 at the University of Pittsburgh Medical Center. He completed fellowship training in Head and Neck Surgery in 2008 at the University of Texas MD Anderson Cancer Center. He joined the University of Pittsburgh in August 2008 as an Assistant Professor in the Department of Otolaryngology, Head and Neck Surgery Division, and is also a staff physician in the VA Pittsburgh Healthcare System. He serves as the Director of Robotic Surgery, Division of Head and Neck Surgery, at the University of Pittsburgh School of Medicine and is the current Director of the Center for Advanced Robotics Training (CART) at the University of Pittsburgh Medical Center. He directs the CART training courses, which provide technical and circumstantial resources to initiate and optimize robotic surgery programs. He has authored numerous research publications and book chapters and is an invited guest lecturer/speaker on the subject of robotic surgery both nationally and internationally. A Fulbright scholar, his research interests include minimally invasive endoscopic and robotic surgery of the head and neck, tumors of the thyroid and parathyroid glands, and molecular oncology of head and neck cancer. He is a leader in his field and has proctored Transoral Robotic Surgery cases at numerous medical educational facilities throughout the United States and Europe. He directs a federally funded laboratory that studies the biology of head and neck cancer. He holds funding from the National Institutes of Health, the Department of Veterans Affairs and the "V" Foundation.
Faculty Host: Howie Choset
Appointments: Peggy Martin (email@example.com)
As we work to move robots out of factories and into human environments, we must empower robots to interact freely in unstructured, cluttered spaces. Humans do this easily, using diverse, whole-arm, nonprehensile actions such as pushing or pulling in everyday tasks. These interaction strategies make difficult tasks easier and impossible tasks possible.
In this thesis, we aim to enable robots with similar capabilities. In particular, we formulate methods for planning robust open-loop trajectories that solve the rearrangement planning problem using nonprehensile interactions. In these problems, a robot must plan in a cluttered environment, reasoning about moving multiple objects in order to achieve a goal.
The problem is difficult because we must plan in continuous, high-dimensional state and action spaces. Additionally, during planning we must respect the physical constraints induced by the nonprehensile interaction between the robot and the objects in the scene.
Our key insight is that by embedding physics models directly into our planners we can naturally produce solutions that use nonprehensile interactions such as pushing. This also allows us to easily generate plans that exhibit full arm manipulation and simultaneous object interaction without the need for programmer defined high-level primitives that specifically encode this interaction. We show that by generating these diverse actions, we are able to find solutions for motion planning problems in highly cluttered, unstructured environments.
In the first part of this thesis we formulate the rearrangement planning problem as a classical motion planning problem. We show that we can embed physics simulators into randomized planners. We propose methods for reducing the search space and speeding planning time in order to make the planners useful in real-world scenarios.
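As a hedged illustration of embedding a physics model inside a randomized planner, the sketch below grows an RRT-style tree whose edges are forward rollouts of a toy one-dimensional push-dynamics model. Here `propagate` stands in for a full rigid-body simulator, and the state space, dynamics, and names are illustrative assumptions rather than the planners developed in this thesis; the point is that pushing behavior emerges from the dynamics without any hand-coded push primitive.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance for nearest-neighbor lookup.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def propagate(state, dx):
    """Toy 1-D push dynamics standing in for a rigid-body simulator: the
    robot moves by dx and sweeps the object along whenever its motion
    crosses the object's position."""
    rx, ox = state
    new_rx = rx + dx
    if min(rx, new_rx) <= ox <= max(rx, new_rx):
        ox = new_rx  # nonprehensile push emerges from the dynamics
    return (new_rx, ox)

def rrt_with_physics(start, goal_test, sample_state, sample_action, step,
                     iters=2000):
    """Kinodynamic RRT: tree edges are forward rollouts of the physics
    model `step`, so object interaction needs no high-level primitive."""
    nodes, parents = [start], {0: (None, None)}
    for _ in range(iters):
        target = sample_state()
        near = min(range(len(nodes)), key=lambda i: dist2(nodes[i], target))
        action = sample_action()
        nodes.append(step(nodes[near], action))
        parents[len(nodes) - 1] = (near, action)
        if goal_test(nodes[-1]):
            plan, i = [], len(nodes) - 1
            while parents[i][0] is not None:  # reconstruct the action sequence
                plan.append(parents[i][1])
                i = parents[i][0]
            return plan[::-1]
    return None
```

A usage example: starting from robot at 0 and object at 1, `rrt_with_physics((0.0, 1.0), lambda s: s[1] >= 3.0, ...)` searches for an action sequence that pushes the object past 3; because `propagate` is deterministic, replaying the returned plan reproduces the goal state.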
The second part of the thesis tackles the imperfect and imprecise worlds that reflect the true reality for robots working in human environments. We pose the rearrangement planning under uncertainty problem as an instance of conformant probabilistic planning and offer methods for solving the problem. We demonstrate the effectiveness of our algorithms on two platforms: the home care robot HERB and the NASA rover K-Rex.
We demonstrate expanded autonomous capability on HERB, allowing him to work better in high clutter, completing previously infeasible tasks and speeding feasible task execution. In addition, we show these planners increase autonomy for the NASA rover K-Rex by allowing the rover to actively interact with the environment.
Siddhartha S. Srinivasa (Chair)
Matthew T. Mason
David Hsu (National University of Singapore)
Terrence W. Fong (NASA Ames Research Center)
Data-driven approaches to modeling time series are important in a variety of applications, from market prediction in economics to the simulation of robotic systems. However, traditional supervised machine learning techniques designed for i.i.d. data often perform poorly on these sequential problems. This thesis proposes that time-series and sequential prediction, whether for forecasting, filtering, or reinforcement learning, can be effectively achieved by directly training recurrent prediction procedures rather than building generative probabilistic models.
To this end, we introduce a new training algorithm for learned time-series models, Data as Demonstrator (DaD), that theoretically and empirically improves multi-step prediction performance on model classes such as recurrent neural networks, kernel regressors, and random forests. Additionally, experimental results indicate that DaD can accelerate model-based reinforcement learning. We next show that latent-state time-series models, where a sufficient state parametrization may be unknown, can be learned effectively in a supervised way. Our approach, Predictive State Inference Machines (PSIMs), directly optimizes inference performance, through a DaD-style training procedure, without local optima, by identifying the recurrent hidden state as a predictive belief over statistics of future observations. Fundamental to our learning framework is the idea that the prediction of observable quantities is a lingua franca for building AI systems. We propose three extensions that leverage this general idea and adapt it to a variety of problems. The first aims to improve the training time and performance of more sophisticated recurrent neural networks. The second extends the PSIM framework to controlled dynamical systems. The third looks to train recurrent architectures for reinforcement learning problems.
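The DaD-style training loop can be sketched on a scalar linear time series: after fitting ordinary one-step pairs, the model is rolled out and each of its own predictions is paired with the ground-truth next observation, so the learner is explicitly trained to recover from its own multi-step errors. This is a simplified illustration under strong assumptions (scalar state, linear least-squares learner), not the DaD implementation itself.

```python
def fit_linear(pairs):
    """Ordinary least squares for x' = a*x + b on scalar (x, x') pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def dad_train(traj, n_iters=5):
    """DaD-style training: fit one-step pairs, then repeatedly roll the
    model out and pair each of its *predictions* with the ground-truth
    next observation, turning multi-step errors into training data."""
    data = list(zip(traj[:-1], traj[1:]))  # standard one-step pairs
    a, b = fit_linear(data)
    for _ in range(n_iters):
        x = traj[0]
        for t in range(len(traj) - 2):
            x = a * x + b                  # model's own rollout prediction
            data.append((x, traj[t + 2]))  # corrective pair toward the truth
        a, b = fit_linear(data)            # refit on the aggregated data
    return a, b
```

The same aggregation scheme applies unchanged to richer model classes (recurrent networks, kernel regressors): only `fit_linear` and the rollout step would be swapped out.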
J. Andrew Bagnell (Co-chair)
Martial Hebert (Co-chair)
Byron Boots (Georgia Institute of Technology)
It is a paradox that often the more severe a person's motor impairment, the more challenging it is for them to operate the very assistive machines which might enhance their quality of life. A primary aim of my lab is to address this confound by incorporating robotics autonomy and intelligence into assistive machines, to offload some of the control burden from the user. Robots already synthetically sense, act in and reason about the world, and these technologies can be leveraged to help bridge the gap left by sensory, motor or cognitive impairments in the users of assistive machines. However, here the human-robot team is a very particular one: the robot is physically supporting or attached to the human, replacing or enhancing lost or diminished function. In this case, getting the allocation of control between the human and the robot right is absolutely essential, and will be critical for the adoption of physically assistive robots within larger society. This talk will overview some of the ongoing projects and studies in my lab, whose research lies at the intersection of artificial intelligence, rehabilitation robotics and machine learning. We are working with a range of hardware platforms, including smart wheelchairs and assistive robotic arms. A distinguishing theme present within many of our projects is that the machine automation is customizable: to a user's unique and changing physical abilities, personal preferences or even financial means.
Brenna Argall is the June and Donald Brewer Junior Professor of Electrical Engineering & Computer Science at Northwestern University, and also an assistant professor in the Department of Mechanical Engineering and the Department of Physical Medicine & Rehabilitation. Her research lies at the intersection of robotics, machine learning and human rehabilitation. She is director of the assistive & rehabilitation robotics laboratory (argallab) at the Rehabilitation Institute of Chicago (RIC), the premier rehabilitation hospital in the United States, and her lab's mission is to advance human ability through robotics autonomy. Argall is a 2016 recipient of the NSF CAREER award. She received her Ph.D. in Robotics (2009) and M.S. in Robotics (2006) from the Robotics Institute at Carnegie Mellon University, where she also earned her B.S. in Mathematics (2002). Prior to joining Northwestern, she was a postdoctoral fellow (2009-2011) at the École Polytechnique Fédérale de Lausanne (EPFL), and prior to graduate school she held a Computational Biology position at the National Institutes of Health (NIH).
Faculty Host: Stephen Nuske
Achieving optimality while staying safe is one of the key problems that arise when planning under uncertainty. We specifically focus on path planning for aerial vehicles, where the uncertainties arise due to unobserved winds and other air traffic. A flight plan or policy that does not take such uncertainties into account can not only result in highly inefficient flight paths but can also jeopardize safety. In this talk, we will first focus on how to reduce uncertainty in wind predictions by using airplanes in flight as a large-scale sensor network. In particular, we explore how information from existing commercial aircraft flying their normal routes can be harnessed to observe and predict weather phenomena at a continental scale in greater detail than currently available. In the second part of the talk, we consider the problem of path planning under uncertain winds and traffic conditions. Specifically, we propose planning algorithms that trade off exploration and exploitation in a near-optimal manner and have appealing no-regret properties. Further, we will discuss how Probabilistic Signal Temporal Logic (PrSTL) can be adapted to robotic path planning problems in order to guarantee safety. We will present results from longitudinal real-world studies that demonstrate the effectiveness of the framework.
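One standard way to obtain the kind of no-regret exploration/exploitation trade-off described above is a bandit-style rule over candidate routes. The sketch below uses UCB1 with a lower confidence bound on cost; it is a stand-in illustration of the general principle, not necessarily the algorithm proposed in the talk, and the route model is a made-up assumption.

```python
import math
import random

def ucb_route_selection(routes, horizon):
    """UCB1-style route selection under stochastic costs (e.g., winds):
    repeatedly pick the route minimizing a lower confidence bound on its
    mean cost, trading off exploring uncertain routes against exploiting
    the best-known one. UCB1 enjoys the classic no-regret guarantee."""
    k = len(routes)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(horizon):
        if t < k:
            i = t  # initialization: try every route once
        else:
            # Lower confidence bound = empirical mean minus exploration bonus.
            i = min(range(k),
                    key=lambda j: means[j] - math.sqrt(2 * math.log(t) / counts[j]))
        c = routes[i]()  # fly route i, observe a noisy realized cost
        counts[i] += 1
        means[i] += (c - means[i]) / counts[i]  # running-mean update
        total += c
    return counts, total
```

Over time the selection concentrates on the cheapest route while still occasionally sampling the others, so the average regret per flight vanishes.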
Ashish Kapoor is a senior researcher at Microsoft Research, Redmond. Currently, his research focuses on Aerial Informatics and Robotics, with an emphasis on building intelligent and autonomous flying agents that are safe and enable applications that can positively influence our society. The research builds upon cutting-edge work in machine intelligence, robotics and human-centered computation in order to enable an entire fleet of flying robots that range from micro-UAVs to commercial jetliners. Application scenarios include Weather Sensing, Monitoring for Precision Agriculture, and Safe Cyber-Physical Systems. Ashish received his PhD from the MIT Media Laboratory in 2006. He also holds an FAA Commercial Pilot certificate (SEL) and an FAA Flight Instructor certificate (Airplane Single Engine and Instrument Airplane), and is an avid amateur aircraft builder (see build blog).
Faculty Host: Louis-Philippe Morency
Reception follows at 5:00 pm in Newell-Simon 1513
We describe the development and testing of the Optical Coherence Tomography Microsurgical Augmented Reality System (OCT-MARS). This system allows surgeons to view real-time medical image data as an in-situ overlay within the surgical field. There are a number of clinical applications for which real-time, in-situ visualization of otherwise transparent structures of the eye would be beneficial to surgeons. The primary motivating application for this project is the surgical treatment of glaucoma. We have built a projection system capable of producing flat and tilted images in the normal field of view of the microscope with sufficient brightness and resolution to be viewed under magnification. We have also studied the perception of tilted surfaces under magnification and found that OCT images provide sufficient stereo information to be correctly perceived. Finally, we have tested stereo perception under magnification using surgically relevant tasks to evaluate the effectiveness of the system.
George Stetten (Co-chair)
John Galeotti (Co-chair)
Thomas Furness (University of Washington)
Humans effortlessly manipulate objects in cluttered and uncertain environments. In contrast, most robotic manipulators are limited to carefully engineered environments to circumvent the difficulty of manipulation under uncertainty. Contact sensors can provide robots with the feedback vital to addressing this limitation.
This thesis proposes a framework for using feedback from contact sensors to reliably manipulate objects under uncertainty. We formalize manipulation as a partially observable Markov decision process that includes object pose uncertainty, proprioceptive error, and kinematic constraints. Our algorithms exploit the structure of contact to efficiently estimate state and plan with this model.
First, we introduce the manifold particle filter as a principled method of estimating object pose and robot configuration. This algorithm avoids degeneracy by drawing samples from the lower-dimensional manifold of states induced by contact. Next, we introduce two belief space planning algorithms that seek out contact with sensors when doing so is necessary to achieve the goal. One algorithm harnesses the decoupling effect of contact to share computation between problem instances. The second leverages lower-dimensional structure to plan around kinematic constraints.
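A toy sketch conveys why sampling from the contact manifold avoids particle degeneracy: under a contact observation, almost no particle diffused in the ambient space lies on the lower-dimensional set of states consistent with contact, so importance weights collapse; the manifold particle filter instead draws samples directly from that set. The one-dimensional world, parameter values, and function name below are illustrative assumptions, not the thesis's implementation.

```python
import random

def manifold_particle_filter(particles, finger, contact, eps=0.01, noise=0.05):
    """One update of a toy 1-D manifold particle filter estimating an
    object's position given a binary contact sensor on a finger at a
    known position. Contact holds iff |x - finger| <= eps."""
    # Motion update: diffuse particles in the ambient space.
    moved = [x + random.gauss(0.0, noise) for x in particles]
    if contact:
        # Measurement update by sampling *from* the contact manifold:
        # naive reweighting would leave (almost) every particle with
        # zero weight, since the manifold has measure zero.
        return [finger + random.uniform(-eps, eps) for _ in moved]
    # No contact observed: keep only particles consistent with free space
    # (fall back to the diffused set if none survive).
    survivors = [x for x in moved if abs(x - finger) > eps]
    return survivors or moved
```

In higher dimensions the same idea applies with the manifold defined by object-hand contact constraints rather than an interval around a point.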
Finally, we evaluate the efficacy of our approach in real-robot and simulation experiments. The results show that our state estimation and planning algorithms consistently outperform those that are not tailored to manipulation or contact sensing.
Siddhartha Srinivasa (Co-chair)
Nancy Pollard (Co-chair)
Tomas Lozano-Perez (Massachusetts Institute of Technology)
Improving robotic manipulation is critical for robots to be actively useful in real-world factories and homes. While some success has been shown in simulation and controlled environments, robots are slow, clumsy, and not general or robust enough when interacting with their environment. By contrast, humans effortlessly manipulate objects. One possible reason for this discrepancy is that humans have had years of experience collecting data, giving them good internal models of what happens when they manipulate objects. If robots could learn models from a large amount of real data, they could become more capable manipulators. In this thesis, we propose to improve robotic manipulation by solving two problems. First, we look at how robots can collect a large amount of manipulation data without human intervention. Second, we study how to build statistical models of robotic manipulation from the collected data. These data-driven models can then be used for planning more robust manipulation actions.
To solve the first problem of enabling large data collection, we perform several different robotic manipulation experiments and use these as case studies. We study bin-picking, post-grasp manipulation, pushing, regrasping, and planar grasping. These case studies allow us to gain insights on how robots can collect a large amount of accurate data with minimal human intervention.
To solve the second problem of statistically modeling manipulation actions, we propose models for different parts of various manipulation actions. First, we look at how to model post-grasp manipulation actions by modeling the probability distribution of where an object ends up in a robot's hand, and how this affects its success rate at various tasks such as placing or insertion. Second, we model how robots can change the pose of an object in their hand with regrasp actions. These learned data-driven models can then be used for planning more robust and accurate manipulation actions.
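As a hedged sketch of how a learned in-hand pose distribution could feed back into planning, the snippet below fits a Gaussian to hypothetical scalar offset samples (where the object settled in the hand after each action) and picks the action whose model predicts the highest success probability for an insertion with a given tolerance. The function and action names are illustrative assumptions, not the models actually built in this thesis.

```python
import math

def fit_gaussian(samples):
    """Fit mean and sample standard deviation to observed in-hand offsets."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / (len(samples) - 1)
    return mu, math.sqrt(var)

def success_prob(mu, sigma, tol):
    """P(|offset| <= tol) under the fitted Gaussian pose model."""
    def cdf(x):  # normal CDF via the error function
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(tol) - cdf(-tol)

def best_action(offsets_by_action, tol):
    """Plan with the learned models: choose the action whose predicted
    insertion success probability is highest."""
    return max(offsets_by_action,
               key=lambda a: success_prob(*fit_gaussian(offsets_by_action[a]), tol))
```

For example, a grasp whose offsets cluster tightly around zero would be preferred over one with a wide spread whenever the task tolerance is small.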
Matthew T. Mason (Chair)
Nancy S. Pollard
Geoffrey J. Gordon
Paul G. Backes (Jet Propulsion Laboratory)