In recent years, the U.S. educational system has fallen short in training the technology innovators of the future. To do better, we must give students the experience of designing and creating technological artifacts, rather than relegating them to the role of technology consumers, and we must provide educators with opportunities and professional development for identifying and supporting their students’ talents. This is especially important for identifying student talent in computational thinking and engineering design, domains in which schools commonly lack well-versed educators. Educational robotics systems are one possible means of providing educators and students with these opportunities.
Our creative robotics program, Arts & Bots, combines craft materials with robotic construction and programming tasks in a manner that encourages complexity, allowing a wide variety of student talents to surface while permitting integration with non-technical disciplines. This thesis describes our process of developing Arts & Bots as a tool for talent-based learning, which we define as leveraging an understanding of a student’s talent areas to encourage and motivate learning. We examine this process and the outcomes of two multi-year Arts & Bots studies: the three-year Arts & Bots Pioneers study, in which we integrated Arts & Bots into non-technical classes, and the four-year Arts & Bots Math-Science Partnership, in which we further refined Arts & Bots as a tool for talent identification.
This thesis outlines our development of a teacher training model and presents case studies of two teacher-designed Arts & Bots classroom projects. We present a taxonomy for novice-built robots, along with other tools that support the identification of engineering design and computational thinking talent by non-technical teachers. Finally, we describe our development of a suite of evaluation tools for assessing the outcomes of the Arts & Bots program, along with our findings from that evaluation.
Illah Nourbakhsh (Chair)
Mitchel Resnick (MIT Media Lab)
Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, only relatively recently have researchers focused on visual forecasting: getting machines to anticipate events in the visual world before they actually happen. This aspect of vision has many practical implications in tasks ranging from human-computer interaction to anomaly detection. In addition, temporal prediction can serve as a task for representation learning, useful for various other recognition problems.
In this thesis, we focus on visual forecasting that is data-driven, self-supervised, and relies on little to no explicit semantic information. Towards this goal, we explore prediction at different timeframes. We first consider predicting instantaneous pixel motion---optical flow. We apply convolutional neural networks to predict optical flow in static images. We then extend this idea to a longer timeframe, generalizing to pixel trajectory prediction in space-time. We incorporate models such as Variational Autoencoders to generate possible future motions in the scene. After this, we consider a mid-level element approach to forecasting. By combining a Markovian reasoning framework with an intermediate representation, we are able to forecast events over longer timescales.
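To illustrate the regression framing behind predicting motion from a static image, consider a toy stand-in: the thesis uses convolutional networks on real images, but the same idea (map per-pixel features to a two-channel flow vector) can be sketched with an invented linear model fit by least squares. All names and data below are illustrative, not from the thesis.

```python
import numpy as np

# Toy stand-in for learned flow prediction: recover a per-pixel linear
# map from local image features to a 2-channel flow vector (dx, dy).
rng = np.random.default_rng(0)

H, W, F = 8, 8, 5                            # image size, feature channels
feats = rng.normal(size=(H * W, F))          # per-pixel features (synthetic)
true_map = rng.normal(size=(F, 2))           # made-up feature-to-flow mapping
flow = feats @ true_map                      # synthetic target flow field

# Least-squares "training" recovers the mapping from (feature, flow) pairs
learned_map, *_ = np.linalg.lstsq(feats, flow, rcond=None)
pred = feats @ learned_map                   # predicted flow for the image

print(np.allclose(pred, flow, atol=1e-8))
```

A CNN replaces the linear map in practice, but the supervision signal (flow computed from video used as a regression target) has the same shape.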
In proposed work, we aim to create a model of visual forecasting that utilizes a structured representation of an image for reasoning. Specifically, instead of directly predicting events in a low-level feature space such as pixels or motion, we forecast events in a higher-level representation that is still visually meaningful. This approach confers a number of advantages: it is not restricted by explicit timescales like motion-based approaches, and, unlike direct pixel-based approaches, its predictions are less likely to "fall off" the manifold of the true visual world.
Martial Hebert (Co-chair)
Abhinav Gupta (Co-chair)
David Forsyth (University of Illinois at Urbana-Champaign)
We expect legged robots to be highly mobile. Human walking and running can execute quick changes in speed and direction, even on non-flat ground. Indeed, analysis of simplified models shows that these quantities can be tightly controlled by adjusting the leg placement between steps, and that leg placement can also compensate for disturbances including changes in the ground height. However, to date, legged robots do not exhibit this level of agility or robustness, nor is it well understood what prevents them from attaining this performance. This thesis begins to bridge the gap between the theoretical motions of simplified models and the implementation of agile behaviors on legged robots.
The state of the art leaves room for improvement at the level of the simplified model, at the level of hardware demonstration, and at the level of theoretical understanding of how the simplified model applies to a real system. We make progress on each of these facets of the problem as we work towards leveraging theory from the simplified model to generate effective control for locomotion on robots.

In particular, spring-mass theory has identified deadbeat stability for planar running, but it must be formulated in 3D to be applicable to a real system. We extend this behavior to 3D, adding deadbeat steering to the tracking of apex height on unobserved terrain. Running robots have yet to demonstrate the agile and robust behavior that the spring-mass model describes; existing implementations do not target the deadbeat behavior. We apply state-of-the-art control techniques to map the deadbeat-stabilized planar running onto our robot ATRIAS, and we successfully demonstrate tight tracking of commanded velocities and robustness to unobserved changes in ground height.

Despite this empirical proof of concept, it remains unclear how exactly the targeted behavior of the simplified model affects the closed-loop behavior of the full-order system. There are additional degrees of freedom which affect the tracking of the original goals, and additional layers of control which may offer other sources of stability. Furthermore, the hardware introduces perturbations and uncertainties which detract from the nominal performance of the full-order model. To answer these questions, we formulate a framework founded on linear theory, and we use it to examine the contributions of each component of the control and to quantify the expected effects of the disturbances we encounter. This analysis reveals insights into effective control strategies for legged locomotion and provides a tool for scientific iteration between theory-based control design and evidence-based revision of the underlying theory.
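What "deadbeat" means can be illustrated on a one-dimensional linearized apex-to-apex return map. The coefficients below are invented, and the thesis works with the full nonlinear spring-mass model rather than this toy; the sketch only shows the defining property, that the right between-step adjustment cancels the error in a single stride.

```python
# Deadbeat control on a toy linearized apex-to-apex return map
# e_{k+1} = a*e_k + b*u_k, with made-up coefficients a, b.
a, b = 1.5, 0.5            # open-loop map is unstable (|a| > 1)
gain = a / b               # deadbeat gain: u_k = -gain * e_k gives a - b*gain = 0

e = 0.25                   # initial apex-state error
errors = [e]
for _ in range(3):
    u = -gain * e          # leg-placement adjustment between steps
    e = a * e + b * u      # next apex error vanishes after one step
    errors.append(e)

print(errors)
```

With any stabilizing but non-deadbeat gain, the error would instead decay geometrically over several strides.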
Hartmut Geyer (Chair)
Christopher G. Atkeson
Jerry Pratt (Institute for Human and Machine Cognition)
Autonomous quadrotors will soon play a major role in search-and-rescue and remote-inspection missions, where a fast response is crucial. Quadrotors have the potential to navigate quickly through unstructured environments, enter and exit buildings through narrow gaps, and fly through collapsed buildings. However, their speed and maneuverability are still far from those of birds. Indeed, agile navigation through unknown, indoor environments poses a number of challenges for robotics research in terms of perception, state estimation, planning, and control. In this talk, I will give an overview of my research activities on visual navigation of quadrotors, from slow navigation (using standard frame-based cameras) to agile flight (using active vision and event-based cameras). Topics covered will include visual-inertial state estimation, monocular dense reconstruction, active vision and control, and event-based vision.
Davide Scaramuzza (born in 1980, Italian) is Assistant Professor of Robotics at the University of Zurich, where he does research at the intersection of robotics and computer vision. He did his PhD in robotics and computer vision at ETH Zurich (with Roland Siegwart) and a postdoc at the University of Pennsylvania (with Vijay Kumar and Kostas Daniilidis). From 2009 to 2012, he led the European project “sFly”, which introduced the world’s first autonomous navigation of micro drones using visual-inertial sensors and onboard computing. For his research contributions, he was awarded the IEEE Robotics and Automation Society Early Career Award, an SNSF-ERC Starting Grant ($1.5M, the equivalent of an NSF CAREER Award), and a Google Faculty Research Award.
In 2015, his lab received funding from the DARPA FLA Program, a three-year project dedicated to agile navigation of vision-controlled drones in unstructured and cluttered environments. He coauthored the book “Introduction to Autonomous Mobile Robots” (published by MIT Press) and more than 80 papers on robotics and perception. In 2015, he co-founded a venture, called Zurich-Eye, dedicated to the commercialization of visual-inertial navigation solutions for mobile robots. In September 2016, this became Facebook-Oculus VR Switzerland.
Faculty Host: Michael Kaess
Tangible heritage, such as temples and statues, is disappearing day by day due to human and natural disasters. Intangible heritage, such as folk dances, local songs, and dialects, faces the same fate due to a lack of inheritors and the mixing of cultures. We have been developing methods to preserve such tangible and intangible heritage in digital form. This project, which we refer to as e-Heritage, aims not only to record heritage, but also to analyze the recorded data for better understanding and to display the data in new forms for promotion and education.
This talk consists of three parts. The first part briefly covers tangible heritage, in particular our projects in Cambodia and Kyushu. Here I emphasize not only the challenges in data acquisition but also the importance of creating a new branch of science, cyber-archaeology, which allows us to make new findings in archaeology based on the obtained digital data. The second part covers how to display a Japanese folk dance through the performance of a humanoid robot. Here we follow the learning-from-observation paradigm, in which a robot learns how to perform a dance by observing a human dance performance. Due to the physical differences between a human and a robot, the robot cannot exactly mimic the human actions. Instead, the robot first extracts the important actions of the dance, referred to as key poses, and then describes them symbolically using Labanotation, which the dance community has long used for recording dances. Finally, this Labanotation is mapped to each robot's particular hardware to reconstruct the original dance performance.
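A minimal sketch of key-pose extraction, under the common assumption that key poses occur where joint velocity dips to zero (motion "stops"). The synthetic one-joint signal, threshold, and selection rule below are illustrative only, not the actual method described in the talk.

```python
import numpy as np

# Synthetic 1-D joint angle: one full swing back and forth.
t = np.linspace(0.0, 2.0 * np.pi, 101)
angle = np.cos(t)
vel = np.gradient(angle, t)        # numerical joint velocity

# Key poses: interior frames where |velocity| is small and locally minimal.
key_frames = [i for i in range(1, len(t) - 1)
              if abs(vel[i]) < 0.05
              and abs(vel[i]) <= abs(vel[i - 1])
              and abs(vel[i]) <= abs(vel[i + 1])]

print(key_frames)                   # frame(s) at the swing's turning point
```

In the full pipeline, such extracted poses would then be transcribed into Labanotation symbols rather than kept as raw joint angles.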
The third part addresses the question of what is gained by preserving folk dances through robot performance. Our answer is that the symbolic representations required for robot performance provide new understandings of the dances themselves. To demonstrate this point, we focus on the folk dances of the indigenous peoples of Taiwan, who comprise 14 different tribes. We have converted these folk dances into Labanotation for robot performance. Further, by analyzing the resulting Labanotation scores, we can clarify the social relations among the 14 tribes.
Dr. Katsushi Ikeuchi is a Principal Researcher at Microsoft Research Asia, stationed at the Microsoft Redmond campus. He received a Ph.D. in Information Engineering from the University of Tokyo in 1978. After working at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology as a postdoctoral fellow for three years, at the Electrotechnical Laboratory of the Japanese government as a researcher for five years, at the Robotics Institute of Carnegie Mellon University as a faculty member for ten years, and at the Institute of Industrial Science of the University of Tokyo as a faculty member for nineteen years, he joined Microsoft Research Asia in 2015. His research interests span computer vision, robotics, and computer graphics. He has received several awards, including the IEEE PAMI Distinguished Researcher Award, the Okawa Prize from the Okawa Foundation, and the Si-Ju-Ho-Sho (Medal of Honor with Purple Ribbon) from the Emperor of Japan. He is a fellow of IEEE, IEICE, IPSJ, and RSJ.
Faculty Host: Martial Hebert
In many real-world applications, ego-motion estimation and mapping must be conducted online. In robotics especially, real-time motion estimates are important for the control of autonomous vehicles, while online generated maps are crucial for obstacle avoidance and path planning. Further, the complete map of a traversed environment can serve as input for further processing such as scene segmentation, 3D reasoning, and virtual reality.
To date, fusing large amounts of data from a variety of sensors in real time remains a nontrivial problem. The problem is particularly hard if it is to be solved in 3D, accurately, robustly, and in a small form factor. This thesis proposes to tackle the problem by leveraging range, vision, and inertial sensing in a coarse-to-fine manner, through multi-layer processing. In a modularized processing pipeline, modules requiring light computation execute at high frequency, providing robustness to high-rate, rapid motion. Modules requiring heavy computation run at low frequency to ensure accuracy in the resulting motion estimates and maps.
Further, the modularized processing pipeline is capable of handling sensor degradation by automatically reconfiguring itself to bypass failed modules. Vision-based methods typically fail in low-light or texture-less scenes. Likewise, lidar-based methods are problematic in symmetric or extruded environments such as a long, straight corridor. When such degradation occurs, the proposed pipeline automatically determines the degraded subspace of the problem state space and solves the problem partially, in the well-conditioned subspace. Consequently, the final solution is formed by combining the “healthy” parts from each module.
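The degraded-subspace idea can be sketched on a tiny linear least-squares problem: eigenvectors of the normal matrix with small eigenvalues mark poorly observed state directions, and there the estimate falls back on a prior from another module. The matrices, threshold, and prior below are invented for illustration and are not the thesis implementation.

```python
import numpy as np

# A tiny observation model A @ x = b whose second state direction is
# barely observed (e.g. motion along a featureless corridor).
A = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 1e-6]])
x_true = np.array([0.5, 3.0])
b = A @ x_true

H = A.T @ A
eigval, eigvec = np.linalg.eigh(H)     # ascending eigenvalues
good = eigval > 1e-6                   # well-conditioned directions only

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # naive full solution
prior = np.array([0.0, 0.0])                  # prediction from another module

# Combine: least-squares answer in good directions, prior in degraded ones.
V = eigvec[:, good]
x = V @ (V.T @ x_ls) + (np.eye(2) - V @ V.T) @ prior
print(np.round(x, 6))
```

The combined estimate keeps the well-observed component (0.5) and discards the ill-conditioned one, which is exactly the "solve partially in the well-conditioned subspace" behavior described above.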
The proposed ego-motion estimation and mapping methods have been validated in extensive experiments with car-mounted, hand-carried, and drone-attached setups, conducted in environments ranging from structured urban areas to unstructured natural scenes. The results indicate that the methods can carry out high-precision estimation over long distances of travel, with robustness to high-speed, aggressive motion and to environmental degradation.
Sanjiv Singh (Chair)
Larry Matthies (Jet Propulsion Laboratory)
We present work towards developing a control method for powered knee and ankle prostheses based on a neuromuscular model of human locomotion. Previous research applying neuromuscular control to simulated biped models and to powered ankle prostheses suggests that this approach can adapt to changes in speed, incline, and rough ground. The improved robustness and generalizability of the approach may arise from its modeling of various physical and neural components of the human neuromuscular system. For example, research has shown that muscular reflexes, such as positive force feedback, can generate human-like compliant leg behavior; that muscle properties such as the force-velocity relationship are important for regulating energy in simplified gait models; and that biarticular structures play an important role in preventing joint overextension during compression of multi-segmented legs. While research has demonstrated that these components individually contribute to the robustness of simplified legged systems, it is unclear whether their combined effect, when applied to prostheses, will improve the robustness of amputee gait. Therefore, the goal of this thesis is to investigate how to apply neuromuscular control to a powered knee and ankle prosthesis and to quantify the robustness of amputee gait under this control strategy.
To further motivate our use of neuromuscular control, we first model and simulate an amputee walking with a powered prosthesis and perform optimizations to obtain parameters for the proposed neuromuscular control and for the established impedance control method for prostheses. We find that neuromuscular control significantly improves the simulated amputee's gait robustness on uneven ground. To confirm that this improved robustness carries over to a real system, we design and build a powered knee and ankle prosthesis that features powerful actuators, capable of producing sufficient torque and speed for trip recovery, and series elasticity, to enable accurate reproduction of the neuromuscular model torques. In parallel, we have investigated methods to optimize prosthesis control parameters for specific subjects via qualitative feedback. In completed work, we present and evaluate the performance of a Bayesian optimization method that learns from a user's preferences between pairs of parameter settings.
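Learning from pairwise preferences can be sketched with a Bradley-Terry model: each parameter setting gets a latent utility, and a user's stated preference between two settings is treated as a noisy comparison of their utilities. This is only a stand-in for the thesis's Bayesian optimization method; the comparison data, regularization, and step size below are all invented.

```python
import numpy as np

# (winner, loser) preferences among three hypothetical parameter settings.
comparisons = [(0, 1), (1, 2), (0, 2)]
u = np.zeros(3)                      # latent utility per setting

# Gradient ascent on the L2-regularized Bradley-Terry log-likelihood.
for _ in range(500):
    grad = np.zeros(3)
    for w, l in comparisons:
        p = 1.0 / (1.0 + np.exp(u[l] - u[w]))   # P(winner preferred)
        grad[w] += 1.0 - p
        grad[l] -= 1.0 - p
    grad -= 0.01 * u                 # small prior keeps utilities bounded
    u += 0.5 * grad

ranking = np.argsort(-u)             # best setting first
print(ranking)
```

A preference-based optimizer would use such a utility model to decide which new pair of settings to show the user next, rather than just ranking a fixed set.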
In our proposed work we intend to implement the neuromuscular control on the completed prosthesis hardware and evaluate its robustness properties. We hypothesize that the proposed control will allow amputees to more quickly recover from disturbances. Furthermore, we will extend our method for optimizing control parameters to include more forms of qualitative feedback and explicit consideration of user adaptation over time. Finally, we propose to improve the neuromuscular control's response to disturbances during swing via explicit detection, classification, and execution of recovery strategies.
Hartmut Geyer (Chair)
Elliott Rouse (Northwestern University)
This thesis develops methods for social signal reconstruction---in particular, we measure human motion during social interactions. Compared to other work in this space, we aim to measure the entire body, from the overall body pose to subtle hand gestures and facial expressions. The key to achieving this without placing markers, instrumentation, or other restrictions on participants is the Panoptic Studio, a massively multi-view capture system which allows us to obtain 3D reconstructions of room-sized scenes.
To measure the position of joints and other landmarks on the human body, we combine the output of 2D keypoint detectors across multiple views and triangulate them in 3D. We develop a semi-supervised training procedure, multi-view bootstrapping, which uses 3D triangulation to generate training data for keypoint detectors. We use this technique to train fine-grained 2D keypoint detectors for landmarks on the hands and face, allowing us to measure these two important sources of social signals.
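The geometric core of this procedure, triangulating 2D detections into 3D and reprojecting the point to label new views, can be sketched with linear (DLT) triangulation. The two toy cameras and the single keypoint below are made-up examples; real captures involve many views and learned detectors.

```python
import numpy as np

def triangulate(Ps, pts2d):
    """Linear (DLT) triangulation from 3x4 projection matrices and pixels."""
    rows = []
    for P, (u, v) in zip(Ps, pts2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                       # null vector of the stacked constraints
    return X[:3] / X[3]              # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy calibrated cameras: identity view, and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])              # a 3D keypoint
detections = [project(P1, X_true), project(P2, X_true)]

X = triangulate([P1, P2], detections)
label_view1 = project(P1, X)          # reprojection = pseudo-label for retraining
print(np.allclose(X, X_true))
```

In multi-view bootstrapping, such reprojections of robustly triangulated points become new training labels for the 2D detectors, which are then retrained and re-triangulated.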
To model human motion data, we present the Kronecker Markov Random Field (KMRF) model for keypoint representations of the face and body. We show that most of the covariance in natural body motions corresponds to a specific set of spatiotemporal dependencies which result in a Kronecker or matrix normal distribution over spatiotemporal data, and we derive associated inference procedures that do not require training sequences. This statistical model can be used to infer complete sequences from partial observations and unifies linear shape and trajectory models of prior art into a probabilistic shape-trajectory distribution that has the individual models as its marginals.
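The Kronecker structure underlying such a model can be sketched numerically: if Y = A Z B^T with Z an i.i.d. unit-normal matrix, then vec(Y) = kron(B, A) vec(Z), so the covariance of vec(Y) factors as kron(B B^T, A A^T). The small random factors below are arbitrary stand-ins for the shape (spatial) and trajectory (temporal) components, not the model's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))      # stand-in spatial (shape) factor
B = rng.normal(size=(4, 4))      # stand-in temporal (trajectory) factor
Z = rng.normal(size=(3, 4))      # i.i.d. unit-normal "innovations"

# Vec identity: vec(A @ Z @ B.T) == kron(B, A) @ vec(Z), with column-major vec.
lhs = (A @ Z @ B.T).flatten(order="F")
rhs = np.kron(B, A) @ Z.flatten(order="F")
print(np.allclose(lhs, rhs))

# Hence Cov(vec(Y)) has Kronecker structure: (B B^T) kron (A A^T).
cov = np.kron(B @ B.T, A @ A.T)
print(cov.shape)
```

This factorization is what lets a spatiotemporal covariance over all joints and all frames be stored and inverted via two small factors instead of one large dense matrix.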
Finally, we demonstrate full-body motion reconstructions by using the KMRF model to combine the various measurements obtained from the Panoptic Studio. We capture a dataset of groups of people engaged in social games and fit mesh models of the body, face, and hands---a representation that encodes many of the social signals that characterize an interaction and can be used for analysis, modeling, and animation.
Yaser Sheikh (Co-chair)
Iain Matthews (Co-chair)
Fernando De la Torre
David Fleet (University of Toronto)
Automatic analysis of facial actions (AFA) can reveal a person's emotion, intention, and physical state, and make possible a wide range of applications. To enable reliable, valid, and efficient AFA, this thesis investigates both supervised and unsupervised learning.
Supervised learning for AFA is challenging, in part, because of individual differences among persons in face shape and appearance and variation in video acquisition and context. To improve generalizability across persons, we propose a transductive framework, Selective Transfer Machine (STM), which personalizes generic classifiers through joint sample reweighting and classifier learning. By personalizing classifiers, STM offers improved generalization to unknown persons. As an extension, we develop a variant of STM for use when partially labeled data are available.
Additional challenges for supervised learning include learning an optimal representation for classification, variation in the base rates of action units (AUs), correlation between AUs, and temporal consistency. While these challenges could be partly accommodated with an SVM or STM, a more powerful alternative is afforded by an end-to-end supervised framework (i.e., deep learning). We propose a convolutional network with long short-term memory (LSTM) and multi-label sampling strategies. We compare the SVM, STM, and deep learning approaches with respect to AU occurrence and intensity within and between the BP4D+ and GFT databases (approximately 0.6 million annotated frames).
Annotated video is not always possible or desirable to obtain. We introduce an unsupervised branch-and-bound framework to discover correlated facial actions in unannotated video, an approach we term Common Event Discovery (CED). We evaluate CED on video and motion capture data. CED achieves moderate convergent validity with supervised approaches and enables the discovery of novel patterns hidden from supervised approaches.
Fernando De la Torre (Co-chair)
Jeffrey F. Cohn (Co-chair)
Vladimir Pavlovic (Rutgers University)