An iterative model-based method is proposed for improving linguistic structure, segmentation, and prosodic annotations that correspond to the delivery of each utterance as regularized across the data. For each iteration, the training utterances are resynthesized according to the existing symbolic annotation. Values of various features and subgraph structures are "twiddled": each is perturbed based on the features and constraints of the model. Twiddled utterances are evaluated using an objective function appropriate to the type of perturbation and compared with the unmodified, resynthesized utterance. The instance with least error is assigned as the current annotation, and the entire process is repeated. At each iteration, the model is re-estimated, and the distributions and annotations regularize across the corpus. As a result, the annotations have more accurate and effective distributions, which leads to improved control and expressiveness given the features of the model.
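
As a rough illustration of the loop described above, the following Python sketch keeps, for each utterance, whichever of the current annotation and several perturbed variants has the least error. The names refine_annotations, perturb, and error, the scalar "annotation" values, and the absolute-difference objective are all assumptions made for this example; the actual features, objective functions, and model re-estimation step of the thesis are not reproduced here.

```python
import random

def refine_annotations(annotations, error_fn, perturb_fn,
                       iterations=5, candidates=8, seed=0):
    """Iteratively replace each annotation with its least-error variant."""
    rng = random.Random(seed)
    current = list(annotations)
    for _ in range(iterations):
        for i, ann in enumerate(current):
            # The unmodified annotation competes with its "twiddled" variants.
            pool = [ann] + [perturb_fn(ann, rng) for _ in range(candidates)]
            current[i] = min(pool, key=lambda a: error_fn(i, a))
        # A full system would re-estimate the synthesis model at this point.
    return current

# Toy stand-ins (hypothetical): one scalar feature per utterance.
targets = [0.2, 0.8, 0.5]
initial = [0.0, 0.0, 0.0]

def error(i, value):
    return abs(targets[i] - value)

def perturb(value, rng):
    return value + rng.gauss(0.0, 0.1)

refined = refine_annotations(initial, error, perturb)
```

Because the unmodified annotation is always among the candidates, the error for each utterance can never increase from one iteration to the next.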

Thesis Committee:
Alan W. Black (Chair)
Jack Mostow
Alex Rudnicky
Julia Hirschberg (Columbia University)

Copy of Draft Thesis Document

Air quality has long been a major health concern for citizens around the world, and increased levels of exposure to fine particulate matter (PM2.5) have been definitively linked to serious health effects such as cardiovascular disease, respiratory illness, and increased mortality. PM2.5 is one of six attainment criteria pollutants used by the EPA, and is similarly regulated by many other governments worldwide. Unfortunately, the high cost and complexity of most current PM2.5 monitors result in a lack of detailed spatial and temporal resolution, which means that concerned individuals have little insight into their personal exposure levels. This is especially true regarding hyper-local variations and short-term pollution events associated with industrial activity, heavy fossil fuel use, or indoor activity such as cooking.

Advances in sensor miniaturization, decreased fabrication costs, and rapidly expanding data connectivity have encouraged the development of small, inexpensive devices capable of estimating PM2.5 concentrations. This new class of sensors opens up new possibilities for personal exposure monitoring. It also creates new challenges related to calibrating and characterizing inexpensively manufactured sensors to provide the level of precision and accuracy needed to yield actionable information without significantly increasing device cost.

This thesis addresses the following two primary questions:

  1. Can an inexpensive air quality monitor based on mass-manufactured dust sensors be calibrated efficiently in order to achieve inter-device agreement in addition to agreement with professional and federally-endorsed particle monitors?
  2. Can an inexpensive air quality monitor increase the confidence and capacity of individuals to understand and control their indoor air quality?

In the following thesis, we describe the development of the Speck fine particulate monitor. The Speck processes data from a low-cost dust sensor using a Kalman filter with a piecewise sensing model. We have optimized the parameters for the algorithm through short-term co-location tests with professional HHPC-6 particle counters, and verified typical correlations between the Speck and HHPC-6 units of r² > 0.90. To account for variations in sensitivity, we have developed a calibration procedure whereby fine particles are aerosolized within an open room or closed calibration chamber. This allows us to produce Specks for commercial distribution as well as the experiments presented herein.
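
To make the idea of a Kalman filter with a piecewise sensing model more concrete, here is a minimal scalar sketch in Python. It assumes a random-walk concentration state and a two-segment piecewise-linear map from concentration to raw sensor reading, linearized on the active segment at each update. The knee, slopes, and noise variances are hypothetical values chosen for illustration and are not the Speck's calibrated parameters.

```python
def sensing_model(x, knee=50.0, slope_low=2.0, slope_high=1.0):
    """Hypothetical piecewise-linear map from concentration to raw reading.

    Returns the expected reading and the local slope of the active segment.
    """
    if x < knee:
        return slope_low * x, slope_low
    return slope_low * knee + slope_high * (x - knee), slope_high

def kalman_step(x, p, z, q=1.0, r=25.0):
    """One predict/update cycle for a scalar state x with variance p."""
    # Predict: the concentration is modeled as a random walk.
    p = p + q
    # Update: linearize the sensing model around the current estimate.
    z_hat, h = sensing_model(x)
    k = p * h / (h * p * h + r)
    x = x + k * (z - z_hat)
    p = (1.0 - k * h) * p
    return x, p

def filter_readings(readings, x0=0.0, p0=100.0):
    """Run the filter over a sequence of raw readings."""
    x, p = x0, p0
    estimates = []
    for z in readings:
        x, p = kalman_step(x, p, z)
        estimates.append(x)
    return estimates

# A constant raw reading of 40 corresponds to a concentration of 20
# on the low segment of the assumed sensing model.
estimates = filter_readings([40.0] * 50)
```

With noise-free constant input, the estimate settles near the concentration implied by the assumed sensing model; real sensor data would require tuning q and r against a reference instrument.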

Drawing from previous pilot studies, we have distributed low-cost monitors through local library systems and community groups. Pre-deployment and post-deployment surveys characterize user perception of personal exposure and the effect of a low-cost fine particulate monitor on empowerment.

Thesis Committee:
Illah Nourbakhsh (Chair)
Aaron Steinfeld
Albert Presto
James Longhurst (University of West England, Bristol)

Copy of Draft Thesis Document

Visual Recognition has seen tremendous advances in the last decade. This progress is primarily due to learning algorithms trained with two key ingredients: large amounts of data and extensive supervision. While acquiring visual data is cheap, getting it labeled is far more expensive. So how do we enable learning algorithms to harness the sea of visual data available freely, without worrying about costly supervision?

Interestingly, our visual world is extraordinarily varied and complex, but despite its richness, the space of visual data may not be that astronomically large. We live in a well-structured, predictable world, where cars almost always drive on roads, sky is always above the ground, and so on; and these regularities can provide the missing ingredients required to scale up our visual learning algorithms. This thesis aims to develop algorithms that: 1) discover this implicit and explicit structure in visual data, and 2) leverage the regularities to provide necessary constraints that facilitate large-scale visual learning. In particular, we propose a two-pronged strategy to enable large-scale recognition.

In Part I, we present algorithms for training better and more reliable supervised recognition models that exploit structure in various flavors of labeled data and target tasks. In Part II, we leverage these visual models and large amounts of unlabeled data to discover constraints, and use these constraints in a semi-supervised learning framework to improve visual recognition.

Thesis Committee:
Abhinav Gupta (Chair)
Alexei A. Efros (University of California, Berkeley)
Martial Hebert
Deva Ramanan
Jitendra Malik (University of California, Berkeley)

Copy of Proposal Document

Real-world sensor data in robotics and related domains is sequential, high-dimensional, noisy, and collected in a raw and unstructured form. In order to interpret, track, predict, or plan with such data, we often assume that it is generated by some underlying dynamical system model. Although we can sometimes use extensive domain knowledge to write down a dynamical system, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning the model directly from sensor data. This is a formidable problem. Revealing the dynamical system that governs a complex time series is often not just difficult, but provably intractable. Popular maximum likelihood strategies for learning dynamical system models are slow in practice and often get stuck at poor local optima, problems that greatly limit the utility of these techniques when learning from real-world data. Although these drawbacks were long thought to be unavoidable, recent work has shown that progress can be made by shifting the focus of learning to realistic instances that rule out the intractable cases.

In this talk, I will present an overview of my work on modeling a range of robotics problems as dynamical systems. I will then focus on several related computational approaches for learning dynamical system models directly from high-dimensional sensor data. The key insight is that low-order moments of observed data often possess structure that can be revealed by powerful spectral decomposition methods, and, from this structure, model parameters can be recovered. Based on this insight, we design highly effective algorithms for learning parametric models like Kalman Filters and Hidden Markov Models, as well as nonparametric models via reproducing kernels, and new models based on predictive state representations. Unlike maximum likelihood-based approaches, these new learning algorithms are statistically consistent, computationally efficient, and easy to implement using established linear-algebra techniques. The result is a set of tools for learning dynamical system models with state-of-the-art performance on video, robotics, and biological modeling problems.
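
The central idea, that a spectral decomposition of simple statistics of the observations exposes the latent dynamics, can be illustrated with a small, noise-free example in the spirit of classical subspace identification. The sketch below is not one of the algorithms from the talk: it assumes a deterministic two-state linear system, builds a Hankel matrix directly from its scalar outputs, and uses an SVD to recover the state dimension and the eigenvalues of the transition matrix. All names and numeric values are hypothetical.

```python
import numpy as np

def hankel_from_outputs(y, rows, cols):
    """Hankel matrix with entry (i, j) equal to y[i + j]."""
    return np.array([[y[i + j] for j in range(cols)] for i in range(rows)])

def spectral_dynamics(y, rows=10, cols=10, tol=1e-8):
    """Estimate the state dimension and a similar transition matrix."""
    H = hankel_from_outputs(y, rows, cols)
    H_shift = hankel_from_outputs(y[1:], rows, cols)
    U, s, Vt = np.linalg.svd(H)
    # The numerical rank of H is the dimension of the latent state.
    n = int(np.sum(s > tol * s[0]))
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    root_inv = np.diag(1.0 / np.sqrt(s))
    # A_hat equals the true transition matrix up to a change of basis.
    A_hat = root_inv @ U.T @ H_shift @ Vt.T @ root_inv
    return n, A_hat

# Hypothetical ground-truth system, used only to generate observations.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([1.0, 0.0])
x = np.array([1.0, 1.0])
y = []
for _ in range(20):
    y.append(float(C @ x))
    x = A @ x

n, A_hat = spectral_dynamics(y)
eigenvalues = np.sort(np.real(np.linalg.eigvals(A_hat)))
```

The recovered matrix matches the true one only up to a similarity transform, so the comparison is made on eigenvalues. With noisy data, the hard rank threshold would be replaced by a choice of model order.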

Byron Boots is an Assistant Professor in the School of Interactive Computing and the College of Computing at Georgia Tech. He directs the Georgia Tech Robot Learning Lab, which is affiliated with the Center for Machine Learning, the Institute for Data Engineering and Science, and the Institute for Robotics and Intelligent Machines. His research focuses on developing theory and systems that integrate perception, learning, and decision making. His work on learning models of dynamical systems received the Best Paper award at the International Conference on Machine Learning (ICML) in 2010. His research is supported by an NSF CISE Research Initiation Initiative award (CRII), an NSF National Robotics Initiative award (NRI), and BMW Manufacturing. Prior to joining Georgia Tech, Dr. Boots was a postdoctoral researcher working with Dieter Fox in the Robotics and State Estimation Lab at the University of Washington. He received his Ph.D. in Machine Learning from Carnegie Mellon University, advised by Geoff Gordon.

Faculty Host: Sidd Srinivasa

Citizen science forges partnerships between experts and citizens through collaboration and has become a trend in public participation in scientific research over the past decade. While public participation has been applied to science education, researchers recently noticed that this strategy can contribute to participatory democracy, which empowers citizens to advocate for their local problems. Such a strategy helps citizens form a community, increase environmental monitoring, gather scientific evidence, and tell convincing stories. Researchers believe that this community-based citizen science strategy can contribute to the wellbeing of communities by giving them power to influence the general public and decision makers.

Community-based citizen science requires collecting, curating, visualizing, analyzing, and interpreting multiple types of data over a large spacetime scale. This is highly dependent on community engagement (i.e., the involvement of citizens in local neighborhoods). Such large-scale tasks require the assistance of innovative computational tools to give technology affordance to communities. However, existing tools often focus on only one type of data, and thus researchers need to develop tools from scratch. Moreover, there is a lack of design patterns for researchers to reference when developing such tools. Furthermore, existing tools are typically treated as products rather than ongoing infrastructures that sustain community engagement. This research studies the methodology of developing computational tools by using visualization and crowdsourcing to support the entire community engagement life cycle, from initiation through maintenance and tracking to evaluation.

This research will make methodological and empirical contributions to community-based citizen science and human-computer interaction. Methodological contributions include detailed case studies with applied methodologies of information technology systems that are deployed in real-world contexts. Empirical contributions include generalizable empirical insights for developing visualization and crowdsourcing techniques that integrate multiple types of scientific data.

In this proposal, I first define community-based citizen science and explain corresponding design challenges. Then, I review existing computational tools related to this research. Next, I present two completed works: Time Engine and AirWatch Pittsburgh. Time Engine is a web-based timelapse editor for creating guided video tours and interactive slideshows from large-scale imagery datasets. AirWatch Pittsburgh is an air quality monitoring system which integrates heterogeneous data and computer vision to support forming scientific knowledge. In addition, I propose two works: Environmental Health Engine and Smell Pittsburgh. Environmental Health Engine is a visualization and exploratory analysis platform for creating environmental sensing and health data narratives. Smell Pittsburgh is a mobile crowdsourced application for reporting and visualizing pollution odors. I also propose conducting case studies to derive a typology of how tools are used to support community engagement. Finally, I propose organizing insights from all four works and case studies into design patterns, which serve as rubrics for future researchers.

Thesis Committee:
Illah Nourbakhsh (Chair)
Aaron Steinfeld
Jeffrey Bigham
Eric Paulos (University of California, Berkeley)

Copy of Proposal Document

Mobile robots equipped with various sensors can gather data at unprecedented spatial and temporal scales and resolution. Over the years, our group has developed autonomous surface, ground and aerial vehicles for finding and localizing radio tagged animals. Recently, we've also developed robotic systems for yield estimation and farm monitoring. I will give an overview of these projects as well as some of the fundamental algorithmic problems we studied along the way.

Volkan Isler is an Associate Professor in the Computer Science Department at the University of Minnesota. He is a 2009-2012 resident fellow at the Institute on Environment and 2010-2012 McKnight Land-Grant Professor. Previously, he was an Assistant Professor at Rensselaer Polytechnic Institute, and a post-doctoral researcher at CITRIS at UC Berkeley. He obtained his MSE (2000) and PhD (2004) degrees in Computer and Information Science from the University of Pennsylvania. While at Penn, he was a member of the GRASP Lab and the Theory Group. He obtained his BS degree (1999) in Computer Engineering from Bogazici University, Istanbul, Turkey. In 2008, he received the National Science Foundation's Young Investigator Award (CAREER). From 2009 to 2015, he chaired IEEE Society of Robotics and Automation's Technical Committee on Networked Robots. He also served as an Associate Editor for IEEE Transactions on Robotics and IEEE Transactions on Automation Science and Engineering. His research interests are primarily in robotics, computer vision, sensor networks and geometric algorithms, and their applications in agriculture and environmental monitoring.

Faculty Host: Matt Mason

The rapid progress of AI in the last few years is largely the result of advances in deep learning and neural nets, combined with the availability of large datasets and fast GPUs. We now have systems that can recognize images with an accuracy that rivals that of humans. This will lead to revolutions in several domains such as autonomous transportation and medical image understanding. But all of these systems currently use supervised learning in which the machine is trained with inputs labeled by humans. The challenge of the next several years is to let machines learn from raw, unlabeled data, such as video or text. This is known as unsupervised learning. AI systems today do not possess "common sense", which humans and animals acquire by observing the world, acting in it, and understanding the physical constraints of it. Some of us see unsupervised learning as the key towards machines with common sense. Approaches to unsupervised learning will be reviewed. This presentation assumes some familiarity with the basic concepts of deep learning.

Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department. He received the Electrical Engineer Diploma from Ecole Superieure d'Ingenieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Universite Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton.

From 2012 to 2014 he directed NYU's initiative in data science and became the founding director of the NYU Center for Data Science. He was named Director of AI Research at Facebook in late 2013 and retains a part-time position on the NYU faculty. His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and on dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10 and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web. Since the late 80's he has been working on deep learning methods, particularly the convolutional network model, which is the basis of many products and services deployed by companies such as Facebook, Google, Microsoft, Baidu, IBM, NEC, AT&T and others for image and video understanding, document recognition, human-computer interaction, and speech recognition.

LeCun has been on the editorial board of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR'06, and is chair of ICLR 2013 and 2014. He is on the science advisory board of Institute for Pure and Applied Mathematics, and has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and University of Washington to develop data-driven methods in the sciences. He is the recipient of the 2014 IEEE Neural Network Pioneer Award.

Faculty Host: Martial Hebert

The capacity of aerial robots to autonomously explore, inspect and map their environment is key to many applications. This talk will present and discuss a set of new sampling-based strategies, most of them open-sourced and experimentally verified, that break new ground on how a robot can efficiently inspect a structure for which a prior model exists, how to explore unknown environments, and how to actively combine the planning and perception loops to achieve autonomous exploration with maintained levels of 3D mapping fidelity. In particular, we will detail recent developments in the field of active perception and belief-space planning for autonomous exploration. Finally, an overview of further research activities on aerial robotics, including solar-powered unmanned aerial vehicles and aerial manipulators, will be provided.

Kostas Alexis obtained his Ph.D. in the field of aerial robotics control and collaboration from the University of Patras, Greece in 2011. His Ph.D. research was supported by the Greek national-European Commission Excellence scholarship. After successfully defending his Ph.D. thesis, he was awarded a Swiss Government fellowship and moved to Switzerland and ETH Zurich. From 2011 to June 2015 he held the position of senior researcher at the Autonomous Systems Lab, ETH Zurich, leading the lab efforts in the fields of control and path planning for advanced navigational and operational autonomy. His research interests lie in the fields of control, navigation, optimization and path-planning focusing on aerial robotic systems with multiple and hybrid configurations.

He is the author or co-author of more than 50 scientific publications and has received several best paper awards and distinctions, including the IET Control Theory & Applications Premium Award 2014. Furthermore, he and his collaborators have achieved world records in the field of solar-powered flight endurance. Kostas Alexis has participated in and organized several large-scale multi-million dollar research projects with broad international involvement and collaboration. In July 2015, Kostas moved to the University of Nevada, Reno with the goal of dedicating his efforts to establishing true autonomy for aerial and other kinds of robotics.

Faculty Host: Sebastian Scherer

This work describes methods for advancing the state of the art in mobile robot navigation and physical Human-Robot Interaction (pHRI). An enabling technology in this effort is the ballbot, a person-sized mobile robot that balances on a ball. This underactuated robot presents unique challenges in planning, navigation, and control; however, it also has significant advantages over conventional mobile robots. The ballbot is omnidirectional and physically compliant. Moving requires the ballbot to lean, but this also gives it the ability both to achieve soft, compliant physical interaction and to apply large forces.

The work presented in this thesis demonstrates the ability to navigate cluttered environments with the ballbot. Formulating the system as differentially flat enables fast, analytic trajectory planning. These trajectories are used to plan in the space of static and dynamic obstacles. Leveraging the ballbot’s navigational capabilities, this thesis also presents a method of physically leading people by the hand. A human subject trial was conducted to assess the feasibility, safety, and comfort of this method. This study was successful, with the ballbot leading participants to multiple goals utilizing an amount of force that users found comfortable.

Another area of pHRI explored in this thesis is assisting people in transition from a seated position to standing. Another user study was conducted to discover how humans help each other out of chairs and how much force they apply. These data were used to design an impedance controller for the ballbot, and this controller was tested and found to deliver equivalent forces to those generated by people.

Lastly, this work explores capabilities that could enable the ballbot to navigate through dense crowds of people. A method for detecting collision and estimating external forces was explored. This method was tested and used to modify a costmap. Iteratively updating this costmap and using it to plan trajectories enabled the robot to discover obstacles through collision. Because the ballbot is inherently compliant, these collisions resulted in safe interactions with small forces.

Thesis Committee:
Ralph Hollis (Chair)
George Kantor
Jodi Forlizzi
Bill Smart (Oregon State University)

Copy of Thesis Draft Document

Robots today have the capability to collect terabytes of data about their environment and travel kilometers in a single day, yet they are still constrained by one fundamental resource: time. Time limits the number of samples a robot can collect, sites it can analyze, and data it can return for review, so it is imperative that the rover choose intelligently when, where, and what to sample, a process known as adaptive sampling.

In this work we propose an approach to modeling and adaptive sampling. We consider a scenario in which a rover collects a set of noisy, quickly-gathered measurements and analyzes it to select targets for longer follow-up operations. These two types of measurements might come from different devices, such as in an agricultural scenario where images of leaves are quickly analyzed to decide whether or not to follow up with a drawn-out yet informative spectral measurement, or they might come from the same device, such as evaluating a quick sensor measurement and increasing the integration time of a follow-up measurement as desired. Our goal is to vastly decrease the amount of time needed to understand a scene by minimizing the number of samples we have to collect and the time spent collecting each sample.

We propose an approach in which we make estimates of models of materials within a scene, then perform follow-up measurements that validate these models and explore areas in which they do not generalize well. We have no prior knowledge about the number of distinct materials within scenes or how similar they are to materials we’ve observed before. In addition to choosing where to sample, we also propose methods for deciding how long to collect measurements or how many measurements to take. Using the models we have estimated in our first step, we decide how intensively we analyze further samples from those models and whether or not it is profitable to continue sampling at that location. We test this work on three main scenarios at vastly different scales, although the approach generalizes well to a number of other domains. First, we consider a rover analyzing low-resolution orbital data and selecting a path that maximizes the diversity of sampling locations. Second, we consider the scenario in which a rover collects an image and chooses objects within that image for further sampling. Finally, we consider the scenario in which a sensor, such as a spectrometer, collects quick and noisy measurements of a small patch of material, then selects a subset of sampled points for further analysis.
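
One simple way to picture the selection of diverse follow-up targets is greedy farthest-point selection over feature vectors from the quick measurements. The Python sketch below is a generic stand-in and not the method developed in the thesis; the function name select_diverse, the two-dimensional feature points, and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def select_diverse(points, k):
    """Greedily pick up to k indices whose points are far from one another."""
    chosen = [0]  # Start from the first point (an arbitrary choice).
    while len(chosen) < min(k, len(points)):
        remaining = (i for i in range(len(points)) if i not in chosen)
        # Pick the candidate whose nearest already-chosen point is farthest.
        best = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical quick-measurement features: two tight clusters and an outlier.
features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 5.0)]
targets = select_diverse(features, 3)
```

In this toy example the three selected indices cover both clusters and the outlier, so the follow-up budget is not spent on near-duplicate samples.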

Thesis Committee:
David Wettergreen (Chair)
Jeff Schneider
George Kantor
David R. Thompson (Jet Propulsion Laboratory)

Copy of Thesis Document

