
Smart objects are ordinary objects that have been augmented with computation and communication, as well as abilities for perception, action and/or interaction. To be accepted as intelligent, a smart object should behave in a manner that is appropriate for its role and its environment. Such appropriate behavior is commonly referred to as situated. In this talk we will discuss the use of Situation Models as a theory for constructing smart objects capable of situated interaction.

We describe a layered architecture in which the situation model is used to coordinate perception, action and interaction. We describe the components of a situation model, and discuss the construction of situation models using both GUI-based tools and machine learning. We present a cyclic process, inspired by models from ergonomics, for maintaining a situation model, and discuss the use of probabilistic predicates for reasoning with uncertain information. We present examples of applications from the EU projects FAME and CHIL, as well as situated interaction with robots, and recent commercial applications proposed by the INRIA startups Rocamroll and Situ8ed.

James Crowley holds the post of Professor, Classe Exceptionelle 2, at the Institut National Polytechnique de Grenoble (INPG), where he teaches courses in Computer Vision, Signal Processing, Pattern Recognition, Machine Learning and Artificial Intelligence at ENSIMAG. He performs his research at the INRIA Grenoble Rhône-Alpes Research Center in Montbonnot, where he directs the INRIA Pervasive Interaction Project-Group.

Over the last 30 years, Professor Crowley has made a number of fundamental contributions to computer vision and mobile robotics. In September 2011, James Crowley was appointed a Senior Member of the Institut Universitaire de France (IUF). In March 2014, James Crowley was named Chevalier de l'Ordre National du Mérite. During his career, Professor Crowley has edited two books and five special issues of journals, and authored over 220 peer-reviewed scientific articles on computer vision, mobile robotics, human-computer interaction and ambient intelligence. His publications have received over 11,000 citations, with an h-index of 51.

Ph.D., CMU, 1981; M.Sc., CMU, 1977; B.Sc., SMU, 1975

Faculty Host: Florian Metze
Instructor: Alex Hauptmann


One of the characteristics of spontaneous speech that distinguishes it from written text is the presence of disfluencies, including filled pauses (um, uh), repetitions, and self-corrections. In spoken language processing applications, disfluencies are typically thought of as "noise" in the speech signal. However, there are several systematic patterns associated with where disfluencies occur that can be leveraged to automatically detect them and to improve natural language processing. Further, rates of different types of disfluencies appear to depend on multiple levels of speech production planning and to vary depending on the individual speaker and the social context. Thus, detecting different disfluency types provides additional information about spoken interactions -- beyond the literal meaning of the words. In this talk, we describe both computational models for multi-domain disfluency detection and analyses of different corpora that provide insights into what disfluencies can tell us about the speaker in both high-stakes and casual contexts.
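To make the idea of leveraging surface patterns concrete, here is a deliberately simple sketch in Python (not the computational models from the talk): it flags only filled pauses and immediate one-word repetitions, two of the patterns mentioned above. Real detectors treat this as a sequence labeling problem with much richer lexical, prosodic, and contextual features.

```python
# Toy illustration only: flag filled pauses and immediate one-word repetitions.
FILLED_PAUSES = {"um", "uh", "er", "ah"}

def flag_disfluencies(tokens):
    """Return (index, token, label) tuples for suspected disfluencies."""
    flags = []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in FILLED_PAUSES:
            flags.append((i, tok, "filled_pause"))
        elif i + 1 < len(tokens) and word == tokens[i + 1].lower():
            flags.append((i, tok, "repetition"))
    return flags

print(flag_disfluencies("I um I want to to go home".split()))
# [(1, 'um', 'filled_pause'), (4, 'to', 'repetition')]
```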

Mari Ostendorf, an alumna of the Stanford Signal Compression and Classification Group, joined the University of Washington in September 1999. Previously, she was in the Speech Signal Processing Group at BBN Laboratories (1985-1986) and on the faculty of the Electrical and Computer Engineering Department at Boston University (1987-1999). In 1995, she was a visiting researcher at the ATR Interpreting Telecommunications Laboratory in Japan, and in 2005-2006 she was a Visiting Professor at the University of Karlsruhe. She teaches undergraduate courses in circuits and in signals and systems, and graduate courses on various topics related to statistical signal processing. Professor Ostendorf is a Fellow of the IEEE and a member of SWE, ASA and Sigma Xi. She has served on numerous technical and advisory committees.

Prof. Ostendorf's research interests include data compression and statistical pattern recognition, particularly in speech processing applications. Her recent work includes segment-based acoustic modeling for spontaneous speech recognition, dynamic pronunciation modeling, dependence modeling for adaptation, the use of out-of-domain data and discourse structure in language modeling, and stochastic models of prosody for both recognition and synthesis. She has published over 200 papers on various problems in speech and language processing. She works in the Signal, Speech and Language Interpretation Laboratory, where both undergraduate and graduate students are involved in a variety of research projects related to these problems.

Faculty Host: Carolyn Rose
Instructor: Alex Hauptmann

Many modern IR systems and data sets exhibit new characteristics that are largely ignored by conventional techniques. What is missing is the ability of the model to change over time and to respond to stimuli. Documents, relevance, users and tasks all exhibit dynamic behavior that is captured in big data sets (typically collected over long time spans), and models need to respond to these changes. This talk provides an up-to-date introduction to Dynamic Information Retrieval Modeling. In particular, I will talk about how we model information seeking as a partially observable Markov decision process and achieve high accuracy in the TREC Session Tracks. I will also talk about evaluation in dynamic IR and the TREC Dynamic Domain Track.
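As a rough illustration of the partially observable Markov decision process framing (my sketch, not the formulation used in the talk), the user's information need can be treated as a hidden state, queries and clicks as observations, and ranking or reformulation choices as actions; the system maintains a belief over possible needs and acts on it.

```python
# Illustrative sketch only: one way to phrase session search as a POMDP.
# The state (the user's information need) is hidden; the system observes
# queries and clicks and keeps a belief distribution over candidate needs.
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    query: str
    clicked_docs: tuple  # document ids clicked in the previous result page

def update_belief(belief, observation, obs_likelihood):
    """One Bayesian belief update: P(need | obs) is proportional to P(obs | need) * P(need)."""
    posterior = {need: p * obs_likelihood(observation, need)
                 for need, p in belief.items()}
    z = sum(posterior.values()) or 1.0
    return {need: p / z for need, p in posterior.items()}

def choose_action(belief, actions, expected_reward):
    """Greedy (myopic) policy: pick the action with the highest expected reward."""
    return max(actions, key=lambda a: sum(p * expected_reward(a, need)
                                          for need, p in belief.items()))
```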

Grace Hui Yang is an Assistant Professor in the Department of Computer Science at Georgetown University. Grace obtained her Ph.D. from the Language Technologies Institute, Carnegie Mellon University in 2011. Grace's current research interests include dynamic search, search engine evaluation, privacy-preserving information retrieval, and information organization. Prior to this, she conducted research on question answering, ontology construction, near-duplicate detection, multimedia information retrieval, and opinion and sentiment detection. Grace is a recipient of the National Science Foundation (NSF) Faculty Early Career Development Program (CAREER) Award. Grace co-chaired the SIGIR 2013-2014 Doctoral Consortium and workshops at SIGIR 2017 and WSDM 2017. She served as an area chair for SIGIR 2014-2016 and ACL 2016. Grace has also co-organized the TREC Dynamic Domain Track since 2015.

Faculty Host: Jamie Callan
Instructor: Alex Hauptmann

The LTI Student Research Symposium (SRS) is a one-day series of talks and poster presentations designed both to increase awareness of the diverse aspects of language technologies research conducted by students within the LTI and to introduce incoming LTI students to the work of current students.

One of the talks will be selected for a Best Presentation Award, which will be announced at the end of the Symposium. The winner of the Best Presentation Award will receive a cash prize of $500. Additionally, two Honorable Mentions will be selected, each receiving a cash prize of $100, and two $100 Best Poster Awards will also be presented.

The central goal of this thesis is to bridge the divide between theoretical/cognitive linguistics—the scientific inquiry of language—and applied data-driven statistical language processing, to provide deeper insight into data and to build more powerful, robust models. To corroborate the practical importance of the synergy between linguistics and NLP, I present model-based approaches that incorporate linguistic knowledge in novel ways and improve over strong linguistically-uninformed statistical baselines. In the first part of the thesis, I show how linguistic knowledge comes to the rescue in processing languages which lack large data resources. I introduce two new approaches to cross-lingual knowledge transfer from resource-rich to resource-constrained, typologically diverse languages: (i) morpho-phonological knowledge transfer that models the historical process of lexical borrowing between languages, and its utility in improving machine translation, and (ii) semantic transfer that makes it possible to identify metaphors—ultimately, a language- and culture-salient phenomenon—across languages. In the second part, I argue that integrating explicit linguistic knowledge and guiding models towards linguistically-informed generalizations also improve learning in resource-rich conditions. I present first steps towards interpreting and integrating linguistic knowledge in neural NLP: (i) structuring training data using knowledge from language acquisition and thereby learning the curriculum for better task-specific distributed representations, (ii) using linguistic typology to train hybrid linguistically-informed multilingual deep learning models, and (iii) leveraging linguistic knowledge for evaluation and interpretation of learned distributed representations. The scientific contributions of this thesis include a range of new research questions and new statistical models; the practical contributions are new tools and data resources, and several quantitatively and qualitatively improved NLP applications.
 
Thesis Committee:

Chris Dyer (Chair)
Alan Black
Noah Smith
Jacob Eisenstein (Georgia Institute of Technology)

Copy of Thesis Document

A verb is the organizational core of a sentence. Understanding the meaning of the verb is therefore key to understanding the meaning of the sentence. Natural language understanding is the problem of mapping natural language text to its meaning representation: entities and relations anchored to the world. Since verbs express relations over their arguments, a lexical resource about verbs can facilitate natural language understanding by mapping verbs to relations over entities expressed by their arguments in the world. In this thesis, we semi-automatically construct a verb resource called VerbKB that contains important semantics for natural language understanding. We present the algorithms behind VerbKB, which learn two types of verb semantics that complement existing verb resources such as WordNet and VerbNet and existing knowledge bases about entities such as NELL. The two semantics are (1) the mapping of verbs to relations in knowledge bases (e.g., the mapping of the verb “marry” to the relation hasSpouse) and (2) the mapping of verbs to changes in relations in knowledge bases (e.g., the mapping of the verb “divorce” to the termination of the relation hasSpouse). The mapping of verbs to relations in knowledge bases such as NELL, YAGO, or Freebase can provide a direct link between the text and the background knowledge about the world contained in these knowledge bases, enabling inferences over world knowledge to better understand the text. The mapping of verbs to changes in relations in knowledge bases can facilitate automatic updates of relations and temporal scoping of relations in the knowledge bases.
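The two mappings can be pictured with a small, hypothetical data layout built from the abstract's own examples (this is an illustration, not VerbKB's actual schema or API):

```python
# Hypothetical layout (not VerbKB's actual schema), using the two examples
# from the abstract: each verb maps to a KB relation and to the change it
# implies in that relation.
VERB_SEMANTICS = {
    "marry":   {"relation": "hasSpouse", "change": "initiates"},
    "divorce": {"relation": "hasSpouse", "change": "terminates"},
}

def interpret(subj, verb, obj):
    """Map a (subject, verb, object) triple to a KB assertion or update."""
    entry = VERB_SEMANTICS.get(verb)
    if entry is None:
        return None
    return (entry["change"], entry["relation"], subj, obj)

print(interpret("Ann", "marry", "Bob"))    # ('initiates', 'hasSpouse', 'Ann', 'Bob')
print(interpret("Ann", "divorce", "Bob"))  # ('terminates', 'hasSpouse', 'Ann', 'Bob')
```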

Thesis Committee:
Tom Mitchell (Chair)
William Cohen
Ed Hovy
Martha Palmer (University of Colorado at Boulder)

Copy of Thesis Document

To help learners develop the competencies of collaboration and communication in practice, there has been interest in incorporating a collaborative, team-based learning component in Massive Open Online Courses (MOOCs) since their inception. Most researchers agree that simply placing students in small groups does not guarantee that learning will occur. In previous work, team formation in MOOCs occurs through personal messaging early in a course and is typically based on scant learner profiles, e.g., demographics and prior knowledge. Since MOOC students have diverse backgrounds and motivations, self-selected or randomly assigned MOOC teams have had limited success. Being part of an ineffective or dysfunctional team may well be inferior to independent study in promoting learning and can lead to frustration. This dissertation studies how to coordinate team-based learning in MOOCs using a learning-science concept, namely Transactivity. A transactive discussion is one where participants elaborate, build upon, question, or argue against previously presented ideas. It has long been established that transactive discussion is an important process that reflects good social dynamics in a group, correlates with students' increased learning, and results in collaborative knowledge integration. Building on this foundation, we design a deliberation-based team formation process in which students hold a course-wide community deliberation before small-group collaboration.
 
The centerpiece of this dissertation is a process for introducing online students into teams for effective group work. The key idea is that students should have the opportunity to interact meaningfully with the community before assignment to teams. That discussion not only provides evidence of which students would work well together, but it also provides students with a wealth of insight into alternative task-relevant perspectives to take with them into the collaboration.

The team formation process begins with individual work. The students post their individual work to a discussion forum for a community-wide deliberation over the work produced by each individual. The resulting data trace informs automated guidance for team formation. The automated team assignment process groups students who display successful team processes, i.e., students who have exchanged transactive reasoning during the deliberation. Our experimental results indicate that teams formed on the basis of students' transactive discussion after the community deliberation produce better collaboration products than randomly formed teams. As a grand finale to the dissertation, the team formation paradigm validated on Amazon's Mechanical Turk is tested for external validity within two real MOOCs with different team-based learning settings. The results demonstrate the effectiveness of our team formation process.
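As a rough sketch of the kind of grouping step described above (hypothetical, not the dissertation's actual algorithm), suppose the deliberation yields pairwise counts of transactive exchanges between students; teams can then be grown greedily around the strongest pairs:

```python
# Rough sketch, not the dissertation's algorithm: given pairwise counts of
# transactive exchanges observed during the deliberation, greedily grow teams
# around the strongest exchange pairs.
from itertools import combinations

def form_teams(students, exchanges, team_size):
    """exchanges maps frozenset({a, b}) -> number of transactive exchanges between a and b."""
    unassigned = set(students)
    teams = []
    while len(unassigned) >= team_size:
        # Seed a team with the most transactive remaining pair.
        seed = max((frozenset(p) for p in combinations(unassigned, 2)),
                   key=lambda pair: exchanges.get(pair, 0))
        team = set(seed)
        # Add the student with the strongest ties to the current team.
        while len(team) < team_size:
            best = max(unassigned - team,
                       key=lambda s: sum(exchanges.get(frozenset({s, m}), 0)
                                         for m in team))
            team.add(best)
        teams.append(team)
        unassigned -= team
    if unassigned:
        teams.append(unassigned)  # leftover students form a smaller team
    return teams
```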

This thesis provides a theoretical foundation, a hypothesis-driven investigation in the form of both corpus studies and controlled experiments, and finally a demonstration of external validation. Its contributions to MOOC practitioners include both practical design advice and coordination tools for team-based MOOCs.

Thesis Committee:
Carolyn P. Rosé (Chair)
James Herbsleb
Steve Dow (University of California, San Diego)
Anne Trumbore (Wharton Online Learning Initiatives)
Candace Thille (Stanford University)

Copy of Thesis Document

The introduction of deep neural networks (DNNs) has advanced the performance of automatic speech recognition (ASR) tremendously. On a wide range of ASR tasks, DNN models show performance superior to that of traditional Gaussian mixture models (GMMs). Despite these significant advances, DNN models still suffer from data scarcity, speaker mismatch and environment variability. This thesis addresses these challenges by fully exploiting DNNs' ability to integrate heterogeneous features under the same optimization objective. We propose to improve DNN models under these challenging conditions by incorporating context information into DNN training.

On a new language, the amount of training data may be highly limited. This data scarcity degrades the recognition accuracy of DNN models. A solution is to transfer knowledge from other languages to the low-resource condition. This thesis proposes a framework for building cross-language DNNs via language-universal feature extractors (LUFEs). Convolutional neural networks (CNNs) and deep maxout networks (DMNs) are employed to improve the quality of LUFEs, enabling the generation of invariant and sparse feature representations. This framework notably improves recognition accuracy on a wide range of low-resource languages.
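A minimal PyTorch sketch of the language-universal feature extractor idea follows; it uses plain feed-forward layers for brevity (the thesis builds the extractor from CNN and deep maxout layers), and all dimensions and names are illustrative rather than the thesis configuration.

```python
# Minimal sketch: a shared ("language-universal") feature extractor with one
# output layer per language. Sizes are illustrative only.
import torch.nn as nn

class LanguageUniversalExtractor(nn.Module):
    def __init__(self, input_dim=440, feat_dim=1024,
                 languages=("lang_a", "lang_b"), senones_per_language=3000):
        super().__init__()
        self.shared = nn.Sequential(           # shared across all training languages
            nn.Linear(input_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({           # language-specific output layers
            lang: nn.Linear(feat_dim, senones_per_language) for lang in languages
        })

    def forward(self, acoustic_frames, language):
        features = self.shared(acoustic_frames)   # language-universal features
        return self.heads[language](features)     # senone logits for that language

# After multilingual training, self.shared can serve as a feature extractor
# for a new low-resource language, with only a new head trained on top.
```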

The performance of DNNs degrades when there is a mismatch between the acoustic models and the testing speakers. I-vectors are a form of context information that encapsulates speaker characteristics. This thesis proposes a novel framework to perform feature-space speaker adaptive training (SAT) for DNN models. A key component of this approach is an adaptation network which takes i-vectors as inputs and projects DNN inputs into a normalized feature space. The DNN model fine-tuned in this new feature space rules out non-speech variability and becomes more independent of specific speakers. This SAT method is applicable to different feature types and model architectures.

The proposed adaptive training framework is further extended to incorporate distance- and video-related context information. The distance descriptors are extracted from deep learning models trained to distinguish distance types at the frame level. Distance adaptive training (DAT) using these descriptors captures speaker-microphone distance dynamically at the frame level. When performing ASR on video data, we naturally have access to both the speech and the video modality. Video- and segment-level visual features are extracted from the video stream. Video adaptive training (VAT) with these visual features results in more robust acoustic models that are agnostic to environment variability. Moreover, the proposed VAT approach removes the need for frame-level visual features and thus achieves audio-visual ASR on truly open-domain videos.
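The adaptation-network idea running through the last two paragraphs can be sketched as follows (a simplified illustration, not the thesis configuration): a small network maps a context descriptor, whether an i-vector, a distance descriptor, or a video feature, to a shift that normalizes the acoustic features before the main acoustic model.

```python
# Simplified sketch of context adaptive training. Dimensions and layer choices
# are illustrative, not the thesis configuration.
import torch.nn as nn

class AdaptiveAcousticModel(nn.Module):
    def __init__(self, feat_dim=440, context_dim=100, hidden=1024, n_senones=3000):
        super().__init__()
        self.adapt_net = nn.Sequential(        # context descriptor -> feature shift
            nn.Linear(context_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )
        self.acoustic_model = nn.Sequential(   # main DNN, trained in the normalized space
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_senones),
        )

    def forward(self, frames, context_descriptor):
        # Shift the input features into a speaker/context-normalized space,
        # then compute senone logits with the shared acoustic model.
        normalized = frames + self.adapt_net(context_descriptor)
        return self.acoustic_model(normalized)
```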

Thesis Committee:
Florian Metze (Chair)
Alan Black
Alex Waibel
Jinyu Li (Microsoft)

Copy of Thesis Document

People write every day — articles, blogs, emails — with a purpose and an audience in mind. Politicians adapt their speeches to convince their audience, news media slant stories for their market, and teenagers on social media seek social status among their peers through their posts. Hence, language is purposeful and strategic. In this thesis, we introduce a framework for text analysis that makes explicit the purposefulness of the author, and we develop methods that consider the interaction between the author, her text, and her audience's responses. We frame the authoring process as a decision-theoretic problem — the observed text is the result of an author maximizing her utility.
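One illustrative way to write down this framing (notation mine, not taken from the thesis): the observed text is the choice that maximizes the author's expected utility, given the anticipated audience response.

```latex
% Illustrative notation only: the observed text w* is modeled as the author's
% utility-maximizing choice over candidate texts W, where r(w) denotes the
% anticipated audience response to w.
\[
  w^{*} \;=\; \arg\max_{w \in \mathcal{W}} \; \mathbb{E}\big[\, U_{\mathrm{author}}\big(w,\, r(w)\big) \,\big]
\]
```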

We explore this perspective by developing a set of novel statistical models that characterize authors' strategic behaviors through their utility functions. We apply our models in three particular domains — political campaigns, the scientific community, and the judiciary — and develop the necessary tools to evaluate our assumptions and hypotheses. In each of these domains, our models yield better response prediction accuracy and provide an interpretable means of investigating the underlying processes. Together, they exemplify our approach to text modeling and data exploration.

Throughout this thesis, we will illustrate how our models can be used as tools for in-depth exploration of text data and hypothesis generation.

Thesis Committee:
Noah Smith (Chair)
Eduard Hovy
Daniel Neill (Heinz/CMU)
Jing Jiang (Singapore Management University)
Philip Resnik (University of Maryland, College Park)

Copy of Thesis Document

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle word similarities, computed via IBM Model 1.

To implement this idea, we first develop an extendible framework (NMSLIB) for searching in generic, not necessarily metric, spaces. Second, we implement new and existing search methods within NMSLIB to carry out an extensive evaluation. Third, we employ k-NN methods as a replacement for term-based retrieval in an ad hoc retrieval task.

While an exact brute-force k-NN search using the similarity function incorporating IBM Model 1 is slow, we demonstrate that an approximate algorithm is nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline relying on an approximate k-NN search can be more effective than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines.
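For readers unfamiliar with NMSLIB, the following is a minimal sketch of approximate k-NN retrieval with its Python bindings. For brevity it uses a built-in space (cosine similarity over dense vectors) with random stand-in data; the IBM Model 1 similarity described above is a custom, non-metric space implemented inside the C++ library, and all parameter values here are illustrative.

```python
# Minimal sketch of approximate k-NN retrieval with the NMSLIB Python bindings.
import numpy as np
import nmslib

doc_vectors = np.random.rand(10000, 128).astype(np.float32)   # stand-in document vectors
query_vectors = np.random.rand(5, 128).astype(np.float32)     # stand-in query vectors

index = nmslib.init(method="hnsw", space="cosinesimil")       # HNSW graph index
index.addDataPointBatch(doc_vectors)
index.createIndex({"M": 16, "efConstruction": 200}, print_progress=False)

index.setQueryTimeParams({"efSearch": 100})                   # accuracy/speed trade-off
neighbors = index.knnQueryBatch(query_vectors, k=10, num_threads=4)
for ids, dists in neighbors:
    print(ids[:3], dists[:3])                                 # top candidates for re-ranking
```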

Thesis Committee:
Eric Nyberg (Chair)
Jamie Callan
Alex Hauptmann
James Allan (University of Massachusetts, Amherst)

Copy of Proposal Document

