LTI

Recurrent neural networks such as LSTMs have become an indispensable tool for building probabilistic sequence models.  After discussing the statistical motivations, I'll present some not-so-obvious ways that expressive LSTMs can be harnessed to help model sequential data:

1. To score chunks of candidate latent structures in their fully observed context.  The chunks can be assembled by dynamic programming, which preserves tractable marginal inference.  (Applications: string transduction, parsing, ...; a rough sketch of this idea follows the list.)
2. To predict sequences of events in real time.  This resembles neural language modeling, but the real-time setting means that you are predicting each event jointly with the entire preceding interval of non-events.  (Applications: social media, patient histories, consumer actions, ...)
3. To classify latent syntactic properties of a language from its observed surface ordering.  This essentially converts a hard and misspecified unsupervised learning problem to a simpler supervised one.  To deal with the shortage of supervised languages to train on, we manufacture new synthetic languages.  (Applications: grammar induction, etc.)
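As a rough illustration of the first idea above (my own sketch under simplifying assumptions, not the speaker's actual models): an LSTM assigns a score to each candidate chunk of the input, and a dynamic program assembles the best-scoring segmentation from those chunk scores.  Replacing the max with a logsumexp would yield marginals instead of a single best analysis.

```python
# Minimal sketch: LSTM chunk scoring + dynamic programming over segmentations.
# All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

class ChunkScorer(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):
        # token_ids: (chunk_len,) -> scalar score for this candidate chunk
        emb = self.embed(token_ids).unsqueeze(0)   # (1, chunk_len, dim)
        _, (h, _) = self.lstm(emb)                 # final hidden state
        return self.score(h[-1, 0])                # shape (1,)

def best_segmentation(tokens, scorer, max_len=4):
    # Viterbi-style DP over all segmentations into chunks of length <= max_len.
    # Swapping max for logsumexp here would give tractable marginal inference.
    n = len(tokens)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + scorer(tokens[i:j]).item()
            if s > best[j]:
                best[j], back[j] = s, i
    # Recover the chunk boundaries by backtracking.
    cuts, j = [], n
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    return list(reversed(cuts)), best[n]

tokens = torch.tensor([3, 7, 7, 2, 9, 1])
scorer = ChunkScorer(vocab_size=10)
print(best_segmentation(tokens, scorer))
```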

Jason Eisner is Professor of Computer Science at Johns Hopkins University, where he is also affiliated with the Center for Language and Speech Processing, the Machine Learning Group, the Cognitive Science Department, and the national Center of Excellence in Human Language Technology.  His goal is to develop the probabilistic modeling, inference, and learning techniques needed for a unified model of all kinds of linguistic structure.  His 100+ papers have presented various algorithms for parsing, machine translation, and weighted finite-state machines; formalizations, algorithms, theorems, and empirical results in computational phonology; and unsupervised or semi-supervised learning methods for syntax, morphology, and word-sense disambiguation.  He is also the lead designer of Dyna, a new declarative programming language that provides an infrastructure for AI research. He has received two school-wide awards for excellence in teaching.

To effectively sort and present relevant information pieces (e.g., answers, passages, documents) to human users, information systems rely on ranking models. Existing ranking models are typically designed for a specific task and therefore are not effective for complex information systems that require component changes or domain adaptations. For example, in the final stage of question answering, information systems such as IBM Watson DeepQA rank all results according to their evidence scores and judge the likelihood that each is correct or relevant. However, as information systems become more complex, determining effective ranking approaches becomes much more challenging.

Prior work includes heuristic ranking models that focus on a particular type of information object (e.g. a retrieved document, a factoid answer) using manually designed features specific to that information type. These models, however, do not use other, non-local features (e.g. features of the upstream/downstream information source) to locate relevant information. To address this gap, my research seeks to define a ranking approach that can easily and rapidly adapt to any version of a system pipeline with an arbitrary number of phases.

We describe a general ranking approach for multi-phase, multi-strategy information systems, which produce and rank significantly more candidate results than single-phase, single-strategy systems in order to achieve acceptable robustness and overall performance. Our approach allows each phase in a system to leverage information propagated from preceding phases to inform its ranking decisions. By collecting ranking features from the derivation paths that generate candidate results, the particular derivation path chosen can be used to predict result correctness or relevance. These ranking features are extracted from an abstracted system object graph that represents all of the objects created during system execution (i.e. provenance) and their dependencies. The approach has been applied to different domains, including question answering and biomedical information retrieval, and experimental results show that it significantly outperforms comparable answer ranking models in both domains.
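To make the idea concrete, here is a small, hypothetical sketch (component names and features are mine, not from the thesis): each candidate result is featurized by the derivation path that produced it, and a simple learned ranker predicts correctness from those path features.

```python
# Illustrative sketch: rank candidates by features of their derivation paths.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def path_features(derivation_path):
    # derivation_path: ordered list of component names that produced the result,
    # e.g. ["query_expansion", "passage_retrieval", "answer_extraction"]
    feats = {"path=" + ">".join(derivation_path): 1.0,
             "length": float(len(derivation_path))}
    for a, b in zip(derivation_path, derivation_path[1:]):
        feats["edge=%s->%s" % (a, b)] = 1.0   # upstream/downstream dependency
    return feats

# Toy training data: (derivation path, was the candidate correct?)
train = [
    (["qe", "retrieval", "extract"], 1),
    (["retrieval", "extract"], 0),
    (["qe", "retrieval", "rerank", "extract"], 1),
    (["retrieval", "rerank", "extract"], 0),
]
vec = DictVectorizer()
X = vec.fit_transform([path_features(p) for p, _ in train])
y = [label for _, label in train]
ranker = LogisticRegression().fit(X, y)

# Rank new candidates by predicted probability of correctness.
cands = [["qe", "retrieval", "extract"], ["retrieval", "extract"]]
scores = ranker.predict_proba(vec.transform([path_features(p) for p in cands]))[:, 1]
print(sorted(zip(scores, cands), reverse=True))
```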

Thesis Committee:
Eric Nyberg (Chair)
Teruko Mitamura
Jaime Carbonell
Bowen Zhou (IBM T.J. Watson Research Center)

Copy of Thesis Document

In this talk, I will give an overview of several research projects at MSR aimed at building an open-domain neural dialogue system. We group dialogue bots by users' goals into three categories: task-completion bots, information-access bots, and social bots. We explore different neural network models and deep reinforcement learning techniques to build response generation engines for all three kinds of bots. We will review our experimental settings and recent results with both simulated and real users, share the lessons we learned, and discuss future work.

Jianfeng Gao is a Partner Research Manager in the Deep Learning Technology Center (DLTC) at Microsoft Research, Redmond.  He works on deep learning for text and image processing and leads the development of AI systems for dialogue, machine reading comprehension, question answering, and enterprise applications.  He and his colleagues have developed a series of deep semantic similarity models (DSSM, a.k.a. Sent2Vec), which have been used for a wide range of text and image processing tasks.

From 2006 to 2014, he was a Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation.  From 2005 to 2006, he was a research lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows.  From 1999 to 2005, he was a Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where, together with his colleagues, he developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IME) that were the leading products in the market, and the natural language platform for Windows Vista.

While acoustic signals are continuous in nature, the ways that humans generate pitch in speech and music involve important discrete decisions.  As a result, models of pitch must resolve a tension between continuous and combinatorial structure.  Similarly, interpreting images of printed documents requires reasoning about both continuous pixels and discrete characters.  Focusing on several different tasks that involve human artifacts, I'll present probabilistic models with this goal in mind. 

First, I'll describe an approach to historical document recognition that uses a statistical model of the historical printing press to reason about images, and, as a result, is able to decipher historical documents in an unsupervised fashion.  Based on this approach, I'll also demonstrate a related model that accurately predicts compositor attribution in the First Folio of Shakespeare.  Next, I'll present an unsupervised system that transcribes acoustic piano music into a symbolic representation by jointly describing the discrete structure of sheet music and the continuous structure of piano sounds.  Finally, I'll present a supervised method for predicting prosodic intonation from text that treats discrete prosodic decisions as latent variables, but directly models pitch in a continuous fashion.

Taylor Berg-Kirkpatrick joined the Language Technologies Institute at Carnegie Mellon University as an Assistant Professor in Fall 2016.  Previously, he was a Research Scientist at Semantic Machines Inc and, before that, completed his Ph.D. in computer science at the University of California, Berkeley. Taylor's research focuses on using machine learning to understand structured human data, including language but also sources like music, document images, and other complex artifacts.

Faculty Host/Instructor: Alex Hauptmann

Information retrieval and machine learning approaches run in the background of most of the applications we use in our daily digital lives.  The assistance they provide is manifold, but relies on a set of core content processing tasks that require compatible representation formalisms.  However, this is rarely the case in real-world scenarios.  This talk is concerned with shared representation formalisms for information encoded in heterogeneous modalities.  The heterogeneity may result from intra-modal variety, such as text in different languages within the modality of natural language, or from the different modalities themselves, as when relating text to images or to knowledge graphs.  I will discuss three ways to obtain a joint representation of heterogeneously represented content.  The first is based on explicit semantics as encoded in knowledge graphs, the second extends this approach by adding implicit semantics extracted from large data sets, and the final one relies on joint learning without explicit semantics.  The presented approaches contribute to the long-standing challenge of breaking the language and modality barriers in order to enable joint semantic processing of content in originally incompatible representation formalisms.

Achim Rettinger is a KIT Junior Research Group Leader at AIFB where he is heading the Adaptive Data Analytics team.  His research areas include Data Mining, Information Extraction, Knowledge Discovery, Ontology Learning, Machine Learning, Human Computer Systems, and Text Mining.

Joint video-language modeling has been attracting increasing attention in recent years, signifying a return to early AI goals of cooperative cognitive systems.  However, many approaches fail to leverage the complementarity and structure across vision and language.  For example, they may rely on a fixed visual model or fail to exploit the compositional semantics inherent in language.  In this talk, I will discuss work that seeks to explicitly capture joint structure across modalities, and to capture this structure at a low level.  The work explores sparse modeling as a means of bridging vision and language: low-level models that capture a joint, generative embedding using paired and compositional dictionary learning.  We also overcome a historical limitation of such sparse models by showing how they can be embedded directly within a deep artificial neural network.  Results for both lines of work will be presented and discussed in detail.
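As a rough sketch of paired dictionary learning on synthetic data (my illustration under simplifying assumptions, not the speaker's code): learn one sparse dictionary over concatenated visual and text features so the two modalities share sparse codes, then infer codes from the visual half alone and reconstruct the text half.

```python
# Illustrative paired dictionary learning with a shared sparse code.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
n, dv, dt = 200, 50, 30                       # samples, visual dim, text dim
X_vis, X_txt = rng.normal(size=(n, dv)), rng.normal(size=(n, dt))

# Learn a joint dictionary over the concatenated modalities.
joint = DictionaryLearning(n_components=20, alpha=1.0, max_iter=100, random_state=0)
codes = joint.fit_transform(np.hstack([X_vis, X_txt]))
D = joint.components_                          # shape (20, dv + dt)
D_vis, D_txt = D[:, :dv], D[:, dv:]            # per-modality dictionaries

# Cross-modal inference: encode a new image with the visual dictionary,
# then use the shared sparse code to predict its text-side representation.
x_new_vis = rng.normal(size=(1, dv))
code = sparse_encode(x_new_vis, D_vis, algorithm="lasso_lars", alpha=1.0)
x_pred_txt = code @ D_txt
print(x_pred_txt.shape)                        # (1, dt)
```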

Jason Corso is an associate professor of Electrical Engineering and Computer Science at the University of Michigan.  He received his PhD and MSE degrees at The Johns Hopkins University in 2005 and 2002, respectively, and the BS Degree with honors from Loyola College in Maryland in 2000, all in Computer Science.  He spent two years as a post-doctoral fellow at the University of California, Los Angeles. 

From 2007 to 2014 he was a member of the Computer Science and Engineering faculty at SUNY Buffalo.  He is the recipient of a Google Faculty Research Award (2015), the Army Research Office Young Investigator Award (2010), an NSF CAREER Award (2009), a SUNY Buffalo Young Investigator Award (2011), and the Link Foundation Fellowship in Advanced Simulation and Training (2003), and was a member of the 2009 DARPA Computer Science Study Group.  Corso has authored more than one hundred peer-reviewed papers on topics including computer vision, robot perception, data science, and medical imaging.  He is a member of the AAAI, ACM, and MAA, and a senior member of the IEEE.

Faculty Host: Alexander Hauptmann

Interaction in rich natural language enables people to exchange thoughts efficiently and come to a shared understanding quickly. Modern personal intelligent assistants such as Apple's Siri and Amazon's Echo all use conversational interfaces as their primary communication channels, and illustrate a future in which getting help from a computer is as easy as asking a friend. However, despite decades of research, modern conversational assistants are still limited in domain, expressiveness, and robustness. In this thesis, we take an alternative approach that blends real-time human computation with artificial intelligence to reliably engage in conversations. Instead of bootstrapping automation from the bottom up with only automatic components, we start with our crowd-powered conversational assistant, Chorus, and create a framework that enables Chorus to automate itself over time. Each of Chorus's responses is proposed and voted on by a group of crowd workers in real time.

Toward the goal of full automation, we (i) augmented Chorus's capabilities by connecting it with sensors and effectors on smartphones so that users can safely control them via conversation, and (ii) deployed Chorus to the public as a Google Hangouts chatbot to collect a large corpus of conversations to help speed automation. The deployed Chorus also provides a working system for experimenting with automated approaches. In the future, we will (iii) create a framework that enables Chorus to automate itself over time by automatically obtaining response candidates from multiple dialog systems and selecting appropriate responses based on the current conversation. Over time, the automated components will take over more responsibility in Chorus, not only helping us to deploy robust conversational assistants before we know how to automate everything, but also allowing us to drive down costs and gradually reduce reliance on the crowd.
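A toy sketch of the propose-and-vote loop described above (hypothetical names and thresholds, not the actual Chorus implementation): candidate responses come from workers or from automated dialog systems, workers vote, and the top candidate is sent once it clears an agreement threshold.

```python
# Illustrative response selection by crowd voting.
from collections import Counter

def select_response(proposals, votes, min_votes=3, min_agreement=0.5):
    # proposals: candidate response strings from workers or dialog systems
    # votes: list of indices into `proposals`, one vote per worker
    if len(votes) < min_votes:
        return None                      # keep waiting for more votes
    counts = Counter(votes)
    best_idx, best_count = counts.most_common(1)[0]
    if best_count / len(votes) >= min_agreement:
        return proposals[best_idx]
    return None                          # no sufficiently agreed-upon response yet

proposals = ["Try restarting the app.", "What phone do you have?", "Hi!"]
print(select_response(proposals, votes=[1, 1, 0, 1]))   # -> "What phone do you have?"
```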

Thesis Committee:
Jeffrey P. Bigham (Chair)
Alexander Rudnicky
Niki Kittur
Walter S. Lasecki (University of Michigan)
Chris Callison-Burch (University of Pennsylvania)

Copy of Proposal Document

Adam Berger is an accomplished technology executive and team leader in the software arena. He founded and grew two companies, both of which extend the capabilities of mobile devices past voice into compelling, usable new data services.  He has also worked within two world-leading information-technology firms, Nokia and IBM.

Along with three other Ph.D. students from Carnegie Mellon, Berger co-founded Eizel Technologies Inc. in 2000.  Eizel was a venture-backed software firm that developed a corporate mobile email system.  The mobile phone company Nokia purchased Eizel in 2003 and made it a component of its newly-formed Enterprise Systems division.  Within Nokia, Berger and the team grew the Eizel product line into the Nokia One Business Server™ product.  At Nokia, Berger led an innovation team that interfaced between U.S. and European offices to bring product and intellectual property concepts from the research and venturing wing into the product units.  Berger left Nokia to co-found Penthera Technologies in 2005, where he served as chief technology officer, vice president, and director.  In 2007 Berger co-founded Penthera Partners and participated in the management buyout of the assets of Penthera Technologies.

During his years in the software industry, Berger has served as a public speaker at major venues and technology advisor and board member to startups and technology investment firms.  He holds degrees in physics and computer science from Harvard University, and a Ph.D. in computer science from Carnegie Mellon University.  He has been a recipient of an IBM Graduate Fellowship, a Harvard College Scholarship and a Thomas J. Watson Fellowship.  Berger has published more than 20 refereed papers and holds 10 U.S. patents.

Duolingo is a language education platform that teaches 20 languages to more than 150 million students worldwide.  Our free flagship learning app is the #1 way to learn a language online, and is the most-downloaded education app for both Android and iOS devices.  In this talk, I will describe the Duolingo system and several of our empirical research projects to date, which combine machine learning with computational linguistics and psychometrics to improve learning, engagement, and even language proficiency assessment through our products.

Burr Settles develops and studies statistical machine learning systems with applications in human language, biology, and social science. Currently, he is most excited about using these technologies to help people learn languages and make music.

Faculty Host: Alex Hauptmann

Language is socially situated:  both what we say and what we mean depend on our identities, our interlocutors, and the communicative setting.  The first generation of research in computational sociolinguistics focused on large-scale social categories, such as gender.  However, many of the most socially salient distinctions are locally defined.  Rather than attempt to annotate these social properties or extract them from metadata, we turn to social network analysis, which has been only lightly explored in traditional sociolinguistics.  I will describe three projects at the intersection of language and social networks.  

First, I will show how unsupervised learning over social network labelings and text enables the induction of social meanings for address terms, such as "Ms" and "dude".  Next, I will describe recent research that uses social network embeddings to induce personalized natural language processing systems for individual authors, improving performance on sentiment analysis and entity linking even for authors for whom no labeled data is available.  Finally, I will describe how the spread of linguistic innovations can serve as evidence for sociocultural influence, using a parametric Hawkes process to model the features that make dyads especially likely or unlikely to be conduits for language change.
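For readers unfamiliar with Hawkes processes, here is a hedged sketch of a parametric intensity function in the spirit of the last project (my own illustration with made-up features, not the exact model from the talk): the strength with which user u's past adoption of a linguistic innovation excites user v depends on features of the dyad (u, v).

```python
# Illustrative parametric Hawkes intensity with dyad-level excitation weights.
import numpy as np

def dyad_influence(theta, f_uv):
    # exp(theta . f(u, v)); features might encode e.g. mutual follows or geography.
    return np.exp(theta @ f_uv)

def intensity(t, v, history, mu, theta, features, decay=1.0):
    # history: list of (user u, event time t_i) adoptions observed before time t.
    lam = mu[v]                                        # base rate for user v
    for u, t_i in history:
        if t_i < t:
            lam += dyad_influence(theta, features[(u, v)]) * np.exp(-decay * (t - t_i))
    return lam

# Toy example with one influencing user and hypothetical dyad features.
mu = {"v": 0.1}
theta = np.array([0.8, -0.3])
features = {("u", "v"): np.array([1.0, 0.0])}          # e.g., [mutual_follow, geo_distance]
print(intensity(2.0, "v", [("u", 0.5)], mu, theta, features))
```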

Jacob Eisenstein is an Assistant Professor in the School of Interactive Computing at Georgia Tech.   He works on statistical natural language processing, focusing on computational sociolinguistics, social media analysis, discourse, and machine learning.  He is a recipient of the NSF CAREER Award, a member of the Air Force Office of Scientific Research (AFOSR) Young Investigator Program, and was a SICSA Distinguished Visiting Fellow at the University of Edinburgh.  His work has also been supported by the National Institutes of Health, the National Endowment for the Humanities, and Google.  Jacob was a postdoctoral researcher at Carnegie Mellon and the University of Illinois.  He completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award.  Jacob's research has been featured in the New York Times, National Public Radio, and the BBC. Thanks to his brief appearance in If These Knishes Could Talk, Jacob has a Bacon number of 2.

Faculty Host: Alex Hauptmann
