A verb is the organizational core of a sentence. Understanding the meaning of the verb is therefore key to understanding the meaning of the sentence. Natural language understanding is the problem of mapping natural language text to its meaning representation: entities and relations anchored to the world. Since verbs express relations over their arguments, a lexical resource about verbs can facilitate natural language understanding by mapping verbs to relations over entities expressed by their arguments in the world. In this thesis, we semi-automatically construct a verb resource called VerbKB that contains important semantics for natural language understanding. We present algorithms behind VerbKB that learn two kinds of verb semantics that complement existing resources on verbs such as WordNet and VerbNet and existing knowledge bases about entities such as NELL. The two kinds of semantics are (1) the mapping of verbs to relations in knowledge bases (e.g., the mapping of the verb “marry” to the relation hasSpouse) and (2) the mapping of verbs to changes in relations in knowledge bases (e.g., the mapping of the verb “divorce” to the termination of the relation hasSpouse). The mapping of verbs to relations in knowledge bases such as NELL, YAGO, or Freebase can provide a direct link between the text and the background knowledge about the world contained in these knowledge bases, enabling inferences over the world knowledge to better understand the text. The mapping of verbs to changes in relations in knowledge bases can facilitate automatic updates of relations and temporal scoping of relations in the knowledge bases.
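The two mappings described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than VerbKB's actual data model: the `VerbEffect` class, the `VERB_EFFECTS` table, the `apply_verb` helper, and the choice to treat "marry" as initiating hasSpouse (the abstract only states that "marry" maps to hasSpouse and that "divorce" maps to its termination).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEffect:
    relation: str  # knowledge-base relation the verb maps to
    change: str    # "initiate" or "terminate" (hypothetical labels)


# Hypothetical entries for illustration only; not taken from VerbKB.
VERB_EFFECTS = {
    "marry": VerbEffect("hasSpouse", "initiate"),
    "divorce": VerbEffect("hasSpouse", "terminate"),
}


def apply_verb(kb, subject, verb, obj):
    """Return a new set of (subject, relation, object) triples after the verb."""
    effect = VERB_EFFECTS.get(verb)
    if effect is None:
        # Unknown verb: leave the knowledge base unchanged.
        return set(kb)
    triple = (subject, effect.relation, obj)
    updated = set(kb)
    if effect.change == "initiate":
        updated.add(triple)
    else:
        updated.discard(triple)
    return updated
```

Under these assumptions, reading "A married B" adds the triple `("A", "hasSpouse", "B")`, and a later "A divorced B" removes it, which is the kind of automatic update the abstract refers to.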
Tom Mitchell (Chair)
Martha Palmer (University of Colorado at Boulder)
To help learners foster the competencies of collaboration and communication in practice, there has been interest in incorporating a collaborative team-based learning component in Massive Open Online Courses (MOOCs) ever since the beginning. Most researchers agree that simply placing students in small groups does not guarantee that learning will occur. In previous work, team formation in MOOCs occurs through personal messaging early in a course and is typically based on scant learner profiles, e.g., demographics and prior knowledge. Since MOOC students have diverse backgrounds and motivations, self-selected or randomly assigned MOOC teams have had limited success. Being part of an ineffective or dysfunctional team may well be inferior to independent study in promoting learning and can lead to frustration. This dissertation studies how to coordinate team-based learning in MOOCs with a learning science concept, namely Transactivity. A transactive discussion is one where participants elaborate, build upon, question or argue against previously presented ideas. It has long been established that transactive discussion is an important process that reflects good social dynamics in a group, correlates with students' increased learning, and results in collaborative knowledge integration. Building on this foundation, we design a deliberation-based team formation process in which students hold a course community deliberation before small group collaboration.
The centerpiece of this dissertation is a process for introducing online students into teams for effective group work. The key idea is that students should have the opportunity to interact meaningfully with the community before assignment into teams. That discussion not only provides evidence of which students would work well together, but it also provides students with a wealth of insight into alternative task-relevant perspectives to take with them into the collaboration.
The team formation process begins with individual work. The students post their individual work to a discussion forum for a community-wide deliberation over the work produced by each individual. The resulting data trace informs automated guidance for team formation. The automated team assignment process groups students who display successful team processes, i.e., where transactive reasoning has been exchanged during the deliberation. Our experimental results indicate that teams formed on the basis of students' transactive discussion during the community deliberation produce better collaborative products than randomly formed teams. As a grand finale to the dissertation, the paradigm for team formation validated in Amazon's Mechanical Turk is tested for external validity within two real MOOCs with different team-based learning settings. The results demonstrated the effectiveness of our team formation process.
This thesis provides a theoretical foundation, a hypothesis-driven investigation in the form of both corpus studies and controlled experiments, and finally a demonstration of external validity. Its contribution to MOOC practitioners includes both practical design advice and coordinating tools for team-based MOOCs.
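The grouping step can be sketched as follows. The abstract does not specify the assignment algorithm, so this uses a simple greedy heuristic as a stand-in. The function name `form_teams`, the `exchanges` input (a count of transactive exchanges per pair of students), and the fixed `team_size` are all hypothetical.

```python
def form_teams(students, exchanges, team_size):
    """Greedily group students so teammates share many transactive exchanges.

    `exchanges` maps frozenset({a, b}) to the number of transactive exchanges
    observed between students a and b during the deliberation (assumed input).
    """
    def score(a, b):
        return exchanges.get(frozenset((a, b)), 0)

    unassigned = list(students)
    teams = []
    while unassigned:
        # Seed a new team with the first remaining student.
        team = [unassigned.pop(0)]
        while len(team) < team_size and unassigned:
            # Add the student with the most exchanges with the current team.
            best = max(unassigned, key=lambda s: sum(score(s, m) for m in team))
            unassigned.remove(best)
            team.append(best)
        teams.append(team)
    return teams
```

A greedy pass is easy to follow but does not guarantee the best overall assignment; a global optimization over all teams would be a natural alternative.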
Carolyn P. Rosé (Chair)
Steve Dow (University of California, San Diego)
Anne Trumbore (Wharton Online Learning Initiatives)
Candace Thille (Stanford University)
The introduction of deep neural networks (DNNs) has advanced the performance of automatic speech recognition (ASR) tremendously. On a wide range of ASR tasks, DNN models outperform the traditional Gaussian mixture models (GMMs). Despite these significant advances, DNN models still suffer from data scarcity, speaker mismatch and environment variability. This thesis addresses these challenges by fully exploiting DNNs' ability to integrate heterogeneous features under the same optimization objective. We propose to improve DNN models under these challenging conditions by incorporating context information into DNN training.
For a new language, the amount of training data may be highly limited. This data scarcity degrades the recognition accuracy of DNN models. A solution is to transfer knowledge from other languages to the low-resource condition. This thesis proposes a framework to build cross-language DNNs via language-universal feature extractors (LUFEs). Convolutional neural networks (CNNs) and deep maxout networks (DMNs) are employed to improve the quality of LUFEs, which enables the generation of invariant and sparse feature representations. This framework notably improves the recognition accuracy on a wide range of low-resource languages.
The performance of DNNs degrades when there is a mismatch between acoustic models and testing speakers. One form of context information that encapsulates speaker characteristics is the i-vector. This thesis proposes a novel framework to perform feature-space speaker adaptive training (SAT) for DNN models. A key component of this approach is an adaptation network which takes i-vectors as inputs and projects DNN inputs into a normalized feature space. The DNN model fine-tuned in this new feature space rules out non-speech variability and becomes more independent of specific speakers. This SAT method is applicable to different feature types and model architectures.
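The role of the adaptation network can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's architecture: it assumes a single tanh hidden layer and assumes the network's output is an additive shift applied to every frame of a speaker's features. The helper names `affine` and `adapt_frames` are hypothetical, and all weights would be learned during training.

```python
import math


def affine(weights, bias, vector):
    """Compute weights @ vector + bias using plain Python lists."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def adapt_frames(frames, ivector, hidden_w, hidden_b, out_w, out_b):
    """Shift each acoustic frame by a speaker-dependent offset.

    The offset is produced by a small network that takes the speaker's
    i-vector as input (assumed form: one tanh hidden layer, linear output).
    """
    hidden = [math.tanh(h) for h in affine(hidden_w, hidden_b, ivector)]
    shift = affine(out_w, out_b, hidden)
    return [[x + s for x, s in zip(frame, shift)] for frame in frames]
```

The adapted frames would then be fed to the DNN acoustic model in place of the original features.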
The proposed adaptive training framework is further extended to incorporate distance- and video-related context information. The distance descriptors are extracted from deep learning models which are trained to distinguish distance types on the frame level. Distance adaptive training (DAT) using these descriptors captures speaker-microphone distance dynamically on the frame level. When performing ASR on video data, we naturally have access to both the speech and the video modality. Video- and segment-level visual features are extracted from the video stream. Video adaptive training (VAT) with these visual features results in more robust acoustic models that are agnostic to environment variability. Moreover, the proposed VAT approach removes the need for frame-level visual features and thus achieves audio-visual ASR on truly open-domain videos.
Florian Metze (Chair)
Jinyu Li (Microsoft)
People write every day (articles, blogs, emails) with a purpose and an audience in mind. Politicians adapt their speeches to convince their audience, news media slant stories for their market, and teenagers on social media seek social status among their peers through their posts. Hence, language is purposeful and strategic. In this thesis, we introduce a framework for text analysis that makes explicit the purposefulness of the author and develop methods that consider the interaction between the author, her text, and her audience's responses. We frame the authoring process as a decision-theoretic problem: the observed text is the result of an author maximizing her utility.
We will explore this perspective by developing a set of novel statistical models that characterize authors' strategic behaviors through their utility functions. We consider three particular domains — political campaigns, the scientific community, and the judiciary — using our models and develop the necessary tools to evaluate our assumptions and hypotheses. In each of these domains, our models yield better response prediction accuracy and provide an interpretable means of investigating the underlying processes. Together, they exemplify our approach to text modeling and data exploration.
Throughout this thesis, we will illustrate how our models can be used as tools for in-depth exploration of text data and hypothesis generation.
Noah Smith (Chair)
Daniel Neill (Heinz/CMU)
Jing Jiang (Singapore Management University)
Philip Resnik (University of Maryland, College Park)
Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle word similarities, computed via IBM Model 1.
To implement this idea, we first develop an extendible framework (NMSLIB) for searching in generic, not necessarily metric, spaces. Second, we implement new and existing search methods within NMSLIB to carry out an extensive evaluation. Third, we employ k-NN methods as a replacement for term-based retrieval in an ad hoc retrieval task.
While an exact brute-force k-NN search using the similarity function incorporating IBM Model 1 is slow, we demonstrate that an approximate algorithm is nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline relying on an approximate k-NN search can be more effective than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines.
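For reference, an exact brute-force k-NN search over an arbitrary similarity function can be sketched as below. This is the slow baseline, not the approximate method, and it does not use the NMSLIB API. The `overlap_similarity` function is a toy stand-in for the thesis's similarity incorporating IBM Model 1, and both function names are hypothetical.

```python
import heapq


def knn_search(query, records, similarity, k):
    """Exact brute-force k-NN: score every record, keep the k most similar."""
    return heapq.nlargest(k, records, key=lambda r: similarity(query, r))


def overlap_similarity(query, record):
    """Toy similarity: the fraction of query terms that appear in the record."""
    query_terms = set(query.split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(record.split())) / len(query_terms)
```

Because every record is scored for every query, the cost grows linearly with the collection size, which is the motivation for approximate search methods.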
Eric Nyberg (Chair)
James Allan (University of Massachusetts, Amherst)
Selective search is a modern distributed search architecture designed to reduce the computational cost of large-scale search. Prior work has shown selective search to be effective in smaller, single query-at-a-time environments. This thesis aims to bring selective search to wider adoption by addressing the questions related to efficiency and effectiveness in a practical implementation.
The first set of investigations relates to efficiency. A practical implementation of selective search would use a parallel query processing environment to maximize throughput, but it is unknown whether selective search would retain its efficiency advantage over exhaustive search in this environment. This proposal presents load-balancing solutions to manage unequal popularity of shards and balance the cost of resource selection. Overall, with proper load management, the efficiency claims of prior work remain relevant in a parallel processing setting and larger computational environments. Production systems also use query optimization techniques such as dynamic pruning. In this work, selective search was combined with WAND, a common dynamic pruning algorithm, and it was shown that selective search and WAND had better-than-additive gains due to the long posting lists of the topically focused shards.
The second half of the proposal research addresses effectiveness. Prior work has shown that a single instance of selective search has reasonable accuracy. However, this selective search system contained non-deterministic steps. This thesis investigated the effects of random decisions on the accuracy of selective search and found the variance across system instances is acceptable.
During the course of the thesis research, it was found that resource selection remains a major source of errors in a selective search system. Thus, a new resource selection algorithm was explored, using the pre-existing statistics collected by Block-Max WAND. We found that in a single-term query, the new method proved more effective than existing methods and comparable to exhaustive search.
Jamie Callan (Chair)
Alistair Moffat (University of Melbourne)
Developing NLP tools for many languages involves unique challenges not typically encountered in English NLP work (e.g., limited annotations, unscalable architectures, code switching). Although each language is unique, different languages often exhibit similar characteristics (e.g., phonetic, morphological, lexical, syntactic) which can be exploited to synergistically train analyzers for multiple languages. In this thesis, we advocate for a novel language-universal approach to multilingual NLP in which one statistical model trained on multilingual, homogeneous annotations is used to process natural language input in multiple languages.
To empirically show the merits of the proposed approach, we develop MALOPA, a language-universal dependency parser which outperforms monolingually-trained parsers in several low-resource and high-resource scenarios. MALOPA is a greedy transition-based parser which uses multilingual word embeddings and other language-universal features as a homogeneous representation of the input across all languages. To address the syntactic differences between languages, MALOPA makes use of token-level language information as well as language-specific representations such as fine-grained part-of-speech tags. MALOPA uses a recurrent neural network architecture and multitask learning to jointly predict POS tags and labeled dependency parses.
Focusing on homogeneous input representations, we propose novel methods for estimating multilingual word embeddings and for predicting word alignments. We develop two methods for estimating multilingual word embeddings from bilingual dictionaries and monolingual corpora. The first estimation method, multiCluster, learns embeddings of word clusters which may contain words from different languages, and learns distributional similarities by pooling the contexts of all words in the same cluster in multiple monolingual corpora. The second estimation method, multiCCA, learns a linear projection of monolingually trained embeddings in each language to one vector space, extending the work of Faruqui and Dyer (2014) to more than two languages. To show the scalability of our methods, we train multilingual embeddings in 59 languages. We also develop an extensible, easy-to-use web-based evaluation portal for evaluating arbitrary multilingual word embeddings on several intrinsic and extrinsic tasks. We develop the conditional random field autoencoder (CRF autoencoder) model for unsupervised learning of structured predictors, and use it to predict word alignments in parallel corpora. We use a feature-rich CRF model to predict the latent representation conditional on the observed input, then reconstruct the input conditional on the latent representation using a generative model which factorizes similarly to the CRF model. To reconstruct the observations, we experiment with a categorical distribution over word types (or word clusters), and a multivariate Gaussian distribution that generates pretrained word embeddings.
Chris Dyer (Co-Chair)
Noah Smith (Co-Chair, University of Washington)
Kuzman Ganchev (Google Research)
Scientific discovery was long viewed as a uniquely human creative activity, but digital computers can now reproduce many facets of this process. In this talk, I review the history of research on computational systems that discover scientific knowledge. The general framework posits that discovery involves search through a space of hypotheses, laws, or models, and that this search is guided both by domain knowledge and by regularities in data. Next I turn to one paradigm -- inductive process modeling -- that encodes models as sets of processes incorporating differential equations, induces these models from observational data, and uses background knowledge to aid in their construction. I illustrate the operation of implemented systems on data sets from ecology and environmental science, showing they produce accurate and interpretable models. I also report an improved framework that, by adopting a few simplifying assumptions, reliably produces more accurate fits and scales far better to complex models, along with recent work on adapting models to altered settings. In closing, I discuss challenges for research on scientific discovery and their role in the e-science movement, which uses computational methods to understand and support the scientific enterprise.
This talk describes joint work with Kevin Arrigo, Adam Arvay, Stuart Borrett, Will Bridewell, Ljupco Todorovski, and others. Papers are available.
Dr. Pat Langley serves as Director of the Institute for the Study of Learning and Expertise and as Professor of Computer Science at the University of Auckland. He has contributed to artificial intelligence and cognitive science for 35 years, was founding Executive Editor of Machine Learning, and is currently Editor for Advances in Cognitive Systems. His current research focuses on induction of explanatory scientific models and on architectures for interactive intelligent agents.
The successes of information retrieval in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words offers only a shallow understanding of text; the word space holds a limited amount of information for document ranking. This dissertation goes beyond words and builds knowledge-based text representations, which embed the carefully curated information from knowledge bases and provide richer and more structured evidence for more advanced information retrieval systems.
This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with explanation terms. Then we present a framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor of our methods, we introduce learning to rank to entity search and retrieve better entities from knowledge bases. In the document representation part, this thesis research also moves one step forward with a bag-of-entities model, in which documents are represented by their automatic entity annotations, and the ranking is performed in the entity
This proposal includes plans to improve the quality of relevant entities with a co-learning framework that learns from both entity labels and document labels. We also plan to develop a hybrid ranking system that combines words and entities while taking their uncertainties into account. Finally, we plan to enrich the text representations with connections between entities. We propose several ways to infer entity graph representations for texts, and to rank documents using their structured representations.
Jamie Callan (Chair)
Tie-Yan Liu (CMU/Microsoft Research)
Bruce Croft (University of Massachusetts, Amherst)
The automated analysis of video data becomes ever more important as we are inundated by the ocean of videos generated every day. Current state-of-the-art algorithms for these tasks are mainly supervised, i.e., the algorithms learn models based on manually labeled training data. However, large quantities of high-quality labeled data are very difficult to collect manually. Therefore, in this thesis, we propose to circumvent this problem by automatically harvesting and exploiting useful information from unlabeled videos based on 1) out-of-domain external knowledge sources and 2) internal constraints in video.
Two surveillance video analysis tasks were targeted: multi-object tracking and pose estimation. For multi-object tracking, we leveraged an external knowledge source: face recognition, to perform identity-aware person localization and tracking. We also utilized an internal constraint: spatial-temporal smoothness, to automatically collect person re-identification training data and learn deep appearance features, which further enhanced tracking performance. For pose estimation, we exploited the spatial-temporal smoothness constraint in a self-training framework to perform unsupervised domain adaptation. Finally, the proposed algorithms were used to analyze a nursing home data set which consisted of thousands of hours of surveillance video.
Our experimental results showed that the external knowledge and internal constraints utilized were effective in collecting useful information from unlabeled videos to enhance multi-object tracking and pose estimation performance. Based on the promising experimental results, we believe that other video analysis problems could also benefit from utilizing external knowledge or internal constraints in an unsupervised manner, thus reducing the need to manually label data. Furthermore, our proposed methods potentially open the door to automated analysis on the ocean of surveillance video generated every day.
Alexander Hauptmann (Chair)
Rahul Sukthankar (Google Research)