LTI

Scientific discovery was long viewed as a uniquely human creative activity, but digital computers can now reproduce many facets of this process. In this talk, I review the history of research on computational systems that discover scientific knowledge. The general framework posits that discovery involves search through a space of hypotheses, laws, or models, and that this search is guided both by domain knowledge and by regularities in data. Next I turn to one paradigm -- inductive process modeling -- that encodes models as sets of processes incorporating differential equations, induces these models from observational data, and uses background knowledge to aid in their construction. I illustrate the operation of implemented systems on data sets from ecology and environmental science, showing they produce accurate and interpretable models. I also report an improved framework that, by adopting a few simplifying assumptions, reliably produces more accurate fits and scales far better to complex models, along with recent work on adapting models to altered settings. In closing, I discuss challenges for research on scientific discovery and their role in the e-science movement, which uses computational methods to understand and support the scientific enterprise.
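To make this concrete, here is a minimal sketch, assuming Python with NumPy and SciPy, of the kind of model such systems search over: a two-species predator-prey system encoded as interacting processes with differential equations, whose rate constants are fit to observed trajectories. The process structure, parameters, and data are illustrative assumptions, not the ecological models or data sets from the talk.

    # Candidate "processes" (growth, predation, conversion, death), each
    # contributing one term to the system of differential equations.
    import numpy as np
    from scipy.integrate import odeint
    from scipy.optimize import minimize

    def model(state, t, growth, predation, conversion, death):
        prey, pred = state
        d_prey = growth * prey - predation * prey * pred
        d_pred = conversion * prey * pred - death * pred
        return [d_prey, d_pred]

    def sse(params, times, observed):
        # Squared error between simulated and observed trajectories.
        sim = odeint(model, observed[0], times, args=tuple(params))
        return np.sum((sim - observed) ** 2)

    times = np.linspace(0, 20, 50)
    observed = odeint(model, [10.0, 5.0], times, args=(1.0, 0.1, 0.05, 0.5))
    fit = minimize(sse, x0=[0.5, 0.5, 0.5, 0.5], args=(times, observed),
                   method="Nelder-Mead")
    print(fit.x)  # approximately recovered rate constants

A full inductive process modeler would additionally search over which candidate processes to include, using background knowledge to constrain that structural search.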

This talk describes joint work with Kevin Arrigo, Adam Arvay, Stuart Borrett, Will Bridewell, Ljupco Todorovski, and others. Papers are available.

Dr. Pat Langley serves as Director of the Institute for the Study of Learning and Expertise and as Professor of Computer Science at the University of Auckland. He has contributed to artificial intelligence and cognitive science for 35 years; he was founding Executive Editor of Machine Learning and is currently Editor for Advances in Cognitive Systems. His current research focuses on induction of explanatory scientific models and on architectures for interactive intelligent agents.

The successes of information retrieval in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words provides only a shallow understanding of text, and the word space carries limited information for document ranking. This dissertation goes beyond words and builds knowledge-based text representations, which embed carefully curated information from knowledge bases and provide richer, structured evidence for more advanced information retrieval systems.

This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with expansion terms. Then we present a framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and the internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor in our methods, we apply learning to rank to entity search and retrieve better entities from knowledge bases. On the document representation side, this thesis research moves one step further with a bag-of-entities model, in which documents are represented by their automatic entity annotations and ranking is performed in the entity space.
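As a minimal sketch of ranking in the entity space, assuming plain Python: under a bag-of-entities representation, a document can be scored by how often it mentions the query's entities. The coordinate-match-style scorer and the toy annotations below are illustrative assumptions rather than the dissertation's exact model.

    from collections import Counter

    def score(query_entities, doc_entities):
        # Sum the document's mention frequencies of the query's entities.
        doc = Counter(doc_entities)
        return sum(doc[e] for e in set(query_entities))

    docs = {
        "d1": ["Carnegie_Mellon_University", "Pittsburgh", "Pittsburgh"],
        "d2": ["Stanford_University", "Palo_Alto"],
    }
    query = ["Carnegie_Mellon_University", "Pittsburgh"]
    print(sorted(docs, key=lambda d: score(query, docs[d]), reverse=True))
    # ['d1', 'd2']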
 
This proposal includes plans to improve the quality of relevant entities with a co-learning framework that learns from both entity labels and document labels. We also plan to develop a hybrid ranking system that combines words and entities while accounting for their uncertainties. Finally, we plan to enrich the text representations with connections between entities: we propose several ways to infer entity graph representations for texts and to rank documents using these structural representations.

Thesis Committee:
Jamie Callan (Chair)
William Cohen
Tie-Yan Liu (CMU/Microsoft Research)
Bruce Croft (University of Massachusetts, Amherst)

Copy of Proposal Document

The automated analysis of video data becomes ever more important as we are inundated by the ocean of videos generated every day. Current state-of-the-art algorithms for such analysis tasks are mainly supervised, i.e., they learn models from manually labeled training data. However, large quantities of high-quality labeled data are very difficult to collect manually. Therefore, in this thesis, we propose to circumvent this problem by automatically harvesting and exploiting useful information from unlabeled videos based on 1) out-of-domain external knowledge sources and 2) internal constraints in video.

Two surveillance video analysis tasks were targeted: multi-object tracking and pose estimation. For multi-object tracking, we leveraged an external knowledge source, face recognition, to perform identity-aware person localization and tracking. We also utilized an internal constraint, spatial-temporal smoothness, to automatically collect person re-identification training data and learn deep appearance features, which further enhanced tracking performance. For pose estimation, we exploited the spatial-temporal smoothness constraint in a self-training framework to perform unsupervised domain adaptation. Finally, the proposed algorithms were used to analyze a nursing home data set consisting of thousands of hours of surveillance video.
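As a minimal sketch, in plain Python, of how the spatial-temporal smoothness constraint can harvest re-identification training data from unlabeled video: detections within one smooth tracklet are treated as the same person (positive pairs), while detections from tracklets that overlap in time, and therefore cannot be the same person, are treated as negatives. The data structures here are illustrative assumptions, not the thesis implementation.

    def mine_pairs(tracklets):
        positives, negatives = [], []
        for t in tracklets:
            dets = t["detections"]
            # Detections along one smooth track: same identity.
            positives += [(a, b) for i, a in enumerate(dets)
                                 for b in dets[i + 1:]]
        for i, t1 in enumerate(tracklets):
            for t2 in tracklets[i + 1:]:
                # Temporal overlap: two distinct people on screen at once.
                if t1["start"] <= t2["end"] and t2["start"] <= t1["end"]:
                    negatives += [(a, b) for a in t1["detections"]
                                         for b in t2["detections"]]
        return positives, negatives

    tracklets = [
        {"start": 0, "end": 10, "detections": ["p1_f0", "p1_f5"]},
        {"start": 3, "end": 12, "detections": ["p2_f3", "p2_f9"]},
    ]
    pos, neg = mine_pairs(tracklets)
    print(len(pos), len(neg))  # 2 positive pairs, 4 negative pairs

Pairs mined this way can then train the deep appearance features without any manual labels.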

Our experimental results showed that the external knowledge and internal constraints utilized were effective in collecting useful information from unlabeled videos to enhance multi-object tracking and pose estimation performance. Based on the promising experimental results, we believe that other video analysis problems could also benefit from utilizing external knowledge or internal constraints in an unsupervised manner, thus reducing the need to manually label data. Furthermore, our proposed methods potentially open the door to automated analysis of the ocean of surveillance video generated every day.

Thesis Committee:
Alexander Hauptmann (Chair)
Abhinav Gupta
Yaser Sheikh
Rahul Sukthankar (Google Research)

Copy of Thesis Document

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. In this talk, I will present an approach that drastically simplifies building acoustic models for the existing weighted finite-state transducer (WFST) based decoding approach, and lends itself to end-to-end speech recognition, allowing optimization for arbitrary criteria. Acoustic modeling now involves learning a single recurrent neural network (RNN), which predicts context-independent targets (e.g., syllables, phonemes or characters). The connectionist temporal classification (CTC) objective function marginalizes over all possible alignments between speech frames and label sequences, removing the need for a separate alignment of the training data. We present a generalized WFST-based decoding approach, which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that this approach achieves state-of-the-art word error rates, while drastically reducing complexity and speeding up decoding when compared to standard hybrid DNN systems.
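As a minimal sketch of the training setup, assuming PyTorch: a single recurrent network predicts per-frame label posteriors, and the CTC objective marginalizes over all frame-to-label alignments, so no precomputed alignment of the training data is needed. The sizes are toy assumptions, and the WFST-based decoding step lies outside the snippet.

    import torch
    import torch.nn as nn

    num_labels, feat_dim, hidden = 30, 40, 128  # label 0 is the CTC blank
    rnn = nn.LSTM(feat_dim, hidden)
    proj = nn.Linear(hidden, num_labels)
    ctc = nn.CTCLoss(blank=0)

    frames = torch.randn(100, 4, feat_dim)           # (time, batch, features)
    targets = torch.randint(1, num_labels, (4, 12))  # label sequences, no blanks
    input_lengths = torch.full((4,), 100, dtype=torch.long)
    target_lengths = torch.full((4,), 12, dtype=torch.long)

    states, _ = rnn(frames)
    log_probs = proj(states).log_softmax(dim=-1)     # (time, batch, labels)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()  # gradients flow through the alignment marginalization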

Florian Metze is an Associate Research Professor at Carnegie Mellon University's Language Technologies Institute. His work centers on speech and multi-media processing, with a focus on low-resource and multi-lingual speech processing, large-scale multi-media retrieval and summarization, and the recognition of personality and similar meta-data from speech.

Social media such as Twitter and Facebook provide insights into the personalities and concerns of people and communities.  We analyze tens of millions of Facebook posts and billions of tweets to study variation in language use with age, gender, personality, and mental and physical well-being.  Word clouds visually illustrate the big five personality traits (e.g., "What is it like to be neurotic?"), while correlations between language use and health data at both the individual and county level suggest connections between health, happiness, personality and culture, including potential psychological causes of heart disease.     


Dr. Lyle Ungar is a Professor of Computer and Information Science at the University of Pennsylvania, where he also holds appointments in multiple departments in the Schools of Business, Medicine, Arts and Sciences, and Engineering and Applied Science. Lyle received a B.S. from Stanford University and a Ph.D. from M.I.T. He has published over 200 articles and is co-inventor on eleven patents. His current research focuses on developing scalable machine learning methods for data mining and text mining, including spectral methods for natural language processing, statistical models for aggregating crowd-sourced predictions, and techniques to analyze social media to better understand the drivers of physical and mental well-being.

Faculty Host: Jaime Carbonell


Learning to reason and understand the world’s knowledge is a fundamental problem in Artificial Intelligence (AI). While it has long been hypothesized that both symbolic and statistical approaches are necessary to tackle complex problems in AI, in practice bridging the two in a combined framework risks intractability: most probabilistic first-order logics are simply not efficient enough for real-world-sized tasks.

With the vast amount of relational data available in digital form, now is a good opportunity to close the gap between these two paradigms. The core research question that I will address in this thesis defense is the following: how can we design scalable statistical learning and inference methods that operate over rich knowledge representations? I will describe some examples of my work in advancing the state of the art in the theory and practice of statistical relational learning, including: 1) ProPPR, a scalable learning and reasoning framework whose inference time does not depend on the size of the knowledge graph; 2) an efficient structural-gradient-based meta-reasoning approach that learns formulas from relational data, together with a soft version of predicate invention and representation learning for logic; and 3) an application of joint information extraction and relational reasoning in NLP. I will conclude this talk by describing my future research plans.
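For intuition about why inference time need not depend on knowledge-graph size, here is a minimal sketch, in plain Python, of a local push-style approximation to personalized PageRank in the spirit of ProPPR's inference: only nodes near the query node ever accumulate mass, so the work is bounded by the approximation parameters (alpha, eps) rather than by the graph. ProPPR's feature-weighted transition probabilities are omitted, and the tiny graph is an illustrative assumption.

    from collections import defaultdict, deque

    def approx_ppr(graph, seed, alpha=0.15, eps=1e-4):
        p, r = defaultdict(float), defaultdict(float)  # scores, residuals
        r[seed] = 1.0
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            if r[u] <= eps * max(len(graph[u]), 1):
                continue
            p[u] += alpha * r[u]                  # keep some mass locally
            share = (1 - alpha) * r[u] / len(graph[u])
            r[u] = 0.0
            for v in graph[u]:                    # push the rest to neighbors
                r[v] += share
                if r[v] > eps * max(len(graph[v]), 1):
                    queue.append(v)
        return p  # approximate scores, supported on a local neighborhood

    graph = {"query": ["a", "b"], "a": ["answer"], "b": ["answer"],
             "answer": ["query"]}
    print(approx_ppr(graph, "query"))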

Thesis Committee:
William Cohen (Chair)
Tom Mitchell
Christos Faloutsos
Eric Horvitz (Microsoft Research)

Copy of Thesis Document

Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many AI-related tasks, including visual object recognition, information retrieval, speech perception, and language understanding.

In this talk I will first introduce a broad class of deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data, with applications in information retrieval, object recognition, and speech perception. I will discuss deep models that are capable of extracting a unified representation that fuses together multiple data modalities. In particular, I will introduce models that can generate natural language descriptions (captions) of images, as well as generate images from captions, using an attention mechanism. Finally, I will discuss an approach for unsupervised learning of a generic, distributed sentence encoder. I will show that on several tasks, including modelling images and text, these models significantly improve over many of the existing techniques.
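As a minimal sketch of the attention step in caption generation, assuming PyTorch: at each output word, the decoder state scores every spatial image feature, and a convex combination of those features becomes the context for predicting the next word. The dimensions and scoring form are illustrative assumptions, not the exact published models.

    import torch
    import torch.nn as nn

    class SoftAttention(nn.Module):
        def __init__(self, feat_dim, state_dim, attn_dim):
            super().__init__()
            self.feat = nn.Linear(feat_dim, attn_dim)
            self.state = nn.Linear(state_dim, attn_dim)
            self.score = nn.Linear(attn_dim, 1)

        def forward(self, features, state):
            # features: (batch, regions, feat_dim); state: (batch, state_dim)
            e = self.score(torch.tanh(self.feat(features)
                                      + self.state(state).unsqueeze(1)))
            weights = torch.softmax(e.squeeze(-1), dim=1)  # over regions
            context = (weights.unsqueeze(-1) * features).sum(dim=1)
            return context, weights

    attn = SoftAttention(feat_dim=512, state_dim=256, attn_dim=128)
    features = torch.randn(2, 49, 512)  # e.g., a 7x7 grid of CNN features
    context, weights = attn(features, torch.randn(2, 256))
    print(context.shape, weights.shape)  # (2, 512) and (2, 49)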

Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he moved to the Machine Learning Department at Carnegie Mellon University. His primary interests lie in artificial intelligence, machine learning, deep learning, and large-scale optimization. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Connaught New Researcher Award, Google Faculty Award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

A key task in intelligent language processing is obtaining semantic representations that abstract away from surface lexical and syntactic decisions. The Abstract Meaning Representation (AMR) is one such representation; it represents the meaning of a sentence as labeled nodes in a graph (concepts) and labeled, directed edges between them (relations). A long-standing challenge for such semantic representations is producing them from natural language, as well as producing natural language from them; in other words, mapping into and out of the representation. In this thesis proposal, I discuss methods and algorithms for mapping into and out of AMR, as well as an application of these techniques to the multi-lingual setting.
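For concreteness, below is the standard textbook AMR for "The boy wants to go", decoded with the penman Python library (an assumption of this sketch; the proposal does not name a specific toolkit). Note how the reentrancy for "boy" is captured by reusing the variable b.

    import penman

    amr = """
    (w / want-01
       :ARG0 (b / boy)
       :ARG1 (g / go-01
                :ARG0 b))
    """
    graph = penman.decode(amr)
    print(graph.triples)
    # triples such as ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'),
    # ('b', ':instance', 'boy'), ('g', ':ARG0', 'b'), ...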

Thesis Committee:
Jaime Carbonell (Chair)
Chris Dyer (Chair)
Noah Smith (Chair)
Daniel Gildea (University of Rochester)

Copy of Proposal Document

Spoken dialog systems have been widely used across many domains. For example, voice applications are popular these days in environments such as smart phones and cars. Such speech systems are built using the developers' understanding of the application domain and of the potential users in the field. This understanding may be driven by observations collected from a sampled population at a given time. However, the deployed models may not perfectly fit real-life usage, or may no longer be valid as the domain and its users change over time. Therefore, an agent that automatically adapts to the domain and users after deployment is clearly desirable.

In this thesis, we focus on realistic problems in human-machine communication via natural language where adaptation can contribute to the quality of interaction. We mainly focus on speech understanding in a spoken dialog system, which enables the agent to extract meaning from speech input. To do so, the system needs to recognize sentences (sequences of words) and interpret, among other semantics, the user's intention. We discuss these two aspects in detail. In short, our system can 1) adjust its vocabulary to improve speech recognition and understanding performance, and 2) infer a user's high-level intention to better assist the user in the interaction. The former enables the system to accommodate the user's (and the domain's) language. The latter provides smooth and personalized interaction across multiple existing domains and enables the system to communicate at the task level, in addition to the level of individual domains.

Thesis Committee:
Alexander Rudnicky (Chair)
Alan Black
Roni Rosenfeld
Amanda Stent (Yahoo! Research)

Copy of Thesis Document

This talk explores rich yet practical graphical models for structured prediction in natural language processing (NLP). Our architecture stems from the recent observation that approximate inference in graphical models – for some algorithms – is a differentiable feed-forward computation, akin to a deep neural network. As we point out, this connection allows the construction of domain-specific neural networks. These trainable prediction functions inherit a graphical model's expert reasoning about the joint distribution of latent variables in the domain, using hand-crafted feature functions. At the same time, they inherit deep learning's abilities to discover new useful features and to train parameters via back-propagation to minimize prediction error. Such a network should generalize appropriately (like other hand-crafted graphical models) while also fitting the training data well (like other neural networks).
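As a minimal sketch of this observation, assuming PyTorch: a few iterations of mean-field updates on a linear chain with a learned pairwise potential unroll into a feed-forward computation, and back-propagating through those iterations trains the potential to minimize prediction error. The model, sizes, and update scheme are illustrative assumptions rather than the talk's exact networks.

    import torch
    import torch.nn as nn

    class UnrolledMeanField(nn.Module):
        def __init__(self, num_labels, iterations=5):
            super().__init__()
            # Pairwise log-potential between adjacent labels, learned by backprop.
            self.pairwise = nn.Parameter(torch.zeros(num_labels, num_labels))
            self.iterations = iterations

        def forward(self, unaries):
            # unaries: (length, num_labels) log-potentials from any feature model
            q = torch.softmax(unaries, dim=-1)
            pad = unaries.new_zeros(1, unaries.size(1))
            for _ in range(self.iterations):
                left = torch.cat([pad, q[:-1]]) @ self.pairwise
                right = torch.cat([q[1:] @ self.pairwise.t(), pad])
                q = torch.softmax(unaries + left + right, dim=-1)
            return q  # approximate marginals; every step is differentiable

    model = UnrolledMeanField(num_labels=4)
    unaries = torch.randn(7, 4)
    marginals = model(unaries)
    gold = torch.randint(0, 4, (7,))
    loss = -marginals[torch.arange(7), gold].log().mean()
    loss.backward()  # trains the pairwise potential through inference itself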

We apply the method to three tasks: syntactic dependency parsing, relation extraction, and low-resource semantic role labeling. In contrast to the deep learning literature that eschews the incorporation of domain knowledge, our deep architecture explicitly capitalizes on our linguistic insights about the task. In this way, our networks are domain-specific. The proposed models perform at high accuracy even with (fast) approximate inference and degrade gracefully in the absence of (expensive) annotated datasets.

Matt Gormley is an assistant teaching professor in the Machine Learning Department at Carnegie Mellon University. He obtained his Ph.D. in Computer Science at Johns Hopkins University (2015), where he was co-advised by Jason Eisner and Mark Dredze. Matt holds a Bachelor's in Computer Science from CMU (2006) – as well as a Certificate in Spanish Culinary Arts from El Centro Superior de Hosteleria y Turismo de Valencia (2007). Matt's research focuses on machine learning for natural language processing. His interests include global optimization, learning under approximations, hybrids of graphical models and neural networks, and applications where supervised resources are scarce.
