Language Technologies Institute
Carnegie Mellon University
School of Computer Science

Aug 31

Chris Dyer

CMU

Feature-Rich Latent Variable Models for Statistical Machine Translation

In this talk, I argue that correct translations adhere to three kinds of constraints: lexical, configurational, and constraints enforcing target-language wellformedness. Lexical constraints ensure that the lexical choices are meaning-preserving; configurational constraints ensure that the relationships between source words and phrases (e.g., semantic roles and modifier-head relationships) are properly transformed in translation; and target-language wellformedness constraints ensure the grammaticality of the output. The constraint-based framework suggests a generate-and-test (discriminative) model of translation in which features sensitive to input and output structures are engineered by language and translation experts, and the feature weights are trained to maximize the conditional likelihood of a corpus of example translations. The specified features represent empirical hypotheses about what correlates (but not why) and thus encode domain-specific knowledge; the learned weights indicate to what extent these hypotheses are confirmed or refuted.
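As a rough sketch of the kind of model proposed here (our notation; the talk's exact formulation may differ), a globally normalized conditional model with latent derivations a scores a candidate translation e of a source sentence f as

    \[
    p(e \mid f) \;=\; \frac{\sum_{a} \exp\big(\mathbf{w} \cdot \boldsymbol{\Phi}(e, a, f)\big)}{\sum_{e', a'} \exp\big(\mathbf{w} \cdot \boldsymbol{\Phi}(e', a', f)\big)},
    \]

where the feature vector \Phi collects the expert-engineered lexical, configurational, and wellformedness features, and the weights \mathbf{w} are trained to maximize the conditional likelihood \sum_i \log p(e_i \mid f_i) of the example translations.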

To demonstrate the usefulness of the feature-based approach, I discuss the performance of two models. The first is a lexical translation model evaluated by the word alignments it learns. Unlike previous unsupervised alignment models, the new model utilizes features that capture diverse lexical and alignment relationships, including morphological relatedness, orthographic similarity, and conventional co-occurrence statistics. Results from typologically diverse language pairs demonstrate that the generate-and-test model provides substantial performance benefits compared to state-of-the-art generative baselines. The second is an end-to-end translation model in which lexical, configurational, and wellformedness constraints are modeled explicitly. This model is substantially more compact than state-of-the-art translation models, yet performs significantly better on language pairs where source-target word order differences are substantial.

Bio: Chris Dyer is an assistant professor in the Language Technologies Institute, and affiliated faculty in the Machine Learning Department, at Carnegie Mellon University's School of Computer Science. Prior to that, he was a postdoctoral researcher in Prof. Noah Smith's lab at CMU. He completed his PhD on statistical machine translation with Philip Resnik at the University of Maryland in 2010. Together with Jimmy Lin, he is the author of "Data-Intensive Text Processing with MapReduce", published by Morgan & Claypool in 2010. His current research interests include machine translation, unsupervised learning, Bayesian techniques, and "big data" problems in NLP.

Sep 7

Alan Black

CMU

Statistical Parametric Speech Synthesis

This talk will give an introduction to the development of statistical parametric speech synthesis and contrast these techniques with more classical unit selection speech synthesis. It will describe both the tools and the theoretical models behind them. In addition to describing the current techniques, their strengths, and their limitations, we will also discuss possible directions in statistical parametric speech synthesis, specifically the use of non-standard parameterizations (articulatory features as opposed to more standard spectral features, as developed at a JHU Workshop in 2011).

The growth of statistical parametric speech synthesis reflects both its success in producing higher-quality speech and the demand to move beyond the goals of "natural" and "understandable" synthesis. As speech technology systems improve, there is a desire for expressive and conversational speech, and statistical parametric speech synthesis must start addressing these styles too. Initial projects in expressive and conversational synthesis will also be presented.

Bio: Alan W Black is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University. He previously worked in the Centre for Speech Technology Research at the University of Edinburgh, and before that at ATR in Japan. He is one of the principal authors of the free software Festival Speech Synthesis System, the FestVox voice building tools and CMU Flite, a small footprint speech synthesis engine. He received his PhD in Computational Linguistics from Edinburgh University in 1993, his MSc in Knowledge Based Systems also from Edinburgh in 1986, and a BSc (Hons) in Computer Science from Coventry University in 1984.

Although much of his core research focuses on speech synthesis, he also works on real-time hands-free speech-to-speech translation systems (Croatian, Arabic, and Thai), spoken dialog systems, and rapid language adaptation for the support of new languages. Alan W Black was an elected member of the IEEE Speech Technical Committee (2003-2007). He is currently on the board of ISCA. He was program chair of the ISCA Speech Synthesis Workshop 2004 and general co-chair of Interspeech 2006 -- ICSLP. In 2004, with Prof Keiichi Tokuda, he initiated the now annual Blizzard Challenge, the largest multi-site evaluation of corpus-based speech synthesis techniques.

Sep 14

Eduard Hovy

CMU

NLP: Its Past and 3.5 Possible Futures

Natural Language text and speech processing (Computational Linguistics) is just over 50 years old, and is still continuously evolving not only in its technical subject matter, but in the basic questions being asked and the style and methodology being adopted to answer them. As unification followed finite-state technology in the 1980s, statistical processing followed that in the 1990s, and large-scale processing is increasingly being adopted (especially for commercial NLP) in this decade, a new and quite interesting trend is emerging: a split of the field into three somewhat complementary and rather different directions, each with its own goals, evaluation paradigms, and methodology. The resource creators focus on language and the representations required for language processing; the learning researchers focus on algorithms to effect the transformation of representation required in NLP; and the large-scale hackers produce engines that win the NLP competitions. But where the latter two trends have a fairly well-established methodology for research and papers, the first doesn't, and consequently suffers in recognition and funding. In the talk, I describe each trend, provide some examples of the first, and conclude with a few general questions, including: Where is the heart of NLP? What is the nature of the theories developed in each stream (if any)? What kind of work should one choose to do if one is a grad student today?

Bio: Eduard Hovy is a member of the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He holds adjunct professorships at universities in China, Korea, and Canada, and is co-Director of Research for the DHS Center for Command, Control, and Interoperability Data Analytics. He previously directed the Human Language Technology Group at the Information Sciences Institute of the University of Southern California. Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987. His research addresses areas in Natural Language Processing, including machine reading of text, question answering, information extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, and machine translation. Dr. Hovy is the author or co-editor of six books and over 300 technical articles and is a popular invited speaker. In 2001 he served as President of the Association for Computational Linguistics (ACL), and in 2001-03 as President of the International Association of Machine Translation (IAMT). He regularly co-teaches courses and serves on advisory boards for institutes and funding organizations in Germany, Italy, the Netherlands, and the USA.

Sep 21

Joakim Nivre

Uppsala University / Google

Beyond MaltParser -- Advances in Transition-Based Dependency Parsing

The transition-based approach to dependency parsing has become popular thanks to its simplicity and efficiency. Systems like MaltParser achieve linear-time parsing with projective dependency trees using locally trained classifiers to predict the next parsing action and greedy best-first search to construct a parse tree, assuming that the input sentence has been morphologically disambiguated using a part-of-speech tagger. In this talk, I survey recent developments in transition-based dependency parsing that address some of the limitations of the basic transition-based approach. First, I show how globally trained classifiers and beam search can be used to mitigate error propagation and enable richer feature representations. Second, I discuss different methods for extending the coverage to non-projective trees, which are required for linguistic adequacy in many languages. Finally, I present a model for joint tagging and parsing that improves both tagging and parsing accuracy compared to the standard pipeline approach.
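To make the basic approach concrete, here is a minimal sketch (ours, not MaltParser's actual code) of the greedy, classifier-driven parsing loop described above; the predict() function stands in for a trained classifier, and the toy classifier at the bottom exists only so the sketch runs:

    # Minimal arc-standard transition-based parser skeleton (illustrative).
    def greedy_parse(words, predict):
        """Greedily parse `words`; predict(stack, buffer) is an assumed
        trained classifier returning "SHIFT", "LEFT-ARC", or "RIGHT-ARC".
        Returns a dict mapping dependent index -> head index (0 = ROOT)."""
        stack = [0]                              # ROOT starts on the stack
        buffer = list(range(1, len(words) + 1))  # word indices, left to right
        heads = {}
        while buffer or len(stack) > 1:
            action = predict(stack, buffer)
            if action == "SHIFT" and buffer:
                stack.append(buffer.pop(0))
            elif action in ("LEFT-ARC", "RIGHT-ARC") and len(stack) >= 2:
                # LEFT-ARC: second-from-top attaches to top;
                # RIGHT-ARC: top attaches to second-from-top.
                dep = stack.pop(-2) if action == "LEFT-ARC" else stack.pop()
                heads[dep] = stack[-1]
            elif buffer:                         # fall back to a legal action
                stack.append(buffer.pop(0))
            else:
                dep = stack.pop()
                heads[dep] = stack[-1]
        return heads

    # Toy stand-in classifier, just so the sketch runs end to end:
    toy = lambda stack, buffer: "SHIFT" if buffer else "RIGHT-ARC"
    print(greedy_parse(["Economic", "news", "had", "little", "effect"], toy))
    # -> {5: 4, 4: 3, 3: 2, 2: 1, 1: 0} (each word headed by its left neighbor)

A real system extracts rich features from the stack/buffer configuration (word forms, tags, existing arcs), and the extensions surveyed in the talk replace the local classifier and greedy loop with globally trained models and beam search.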

Bio: Joakim Nivre is Professor of Computational Linguistics at Uppsala University and currently visiting scientist at Google, New York. He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University. Joakim's research focuses on data-driven methods for natural language processing, in particular for syntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the MaltParser system. Joakim's current research interests include the analysis of mildly non-projective dependency structures, the integration of morphological and syntactic processing for richly inflected languages, and methods for cross-framework parser evaluation. He has produced over 150 scientific publications, including 3 books, and has given nearly 70 invited talks at conferences and institutions around the world. He is the current secretary of the European Chapter of the Association for Computational Linguistics.

Sep 28

Mirella Lapata

University of Edinburgh

Talk to me in plain English, please! Explorations in Data-driven Text Simplification

Recent years have witnessed increased interest in data-driven methods for text rewriting, e.g., reformulating a query into alternative queries, rendering a document in a simpler style, or expressing a sentence in a more concise manner. In this talk I will focus on text simplification, one of the oldest and best-studied rewriting problems. The popularity of the simplification task stems from its potential relevance to various applications. Examples include the development of reading aids for people with aphasia, non-native speakers, and, more generally, individuals with low literacy.

In this talk I will discuss the challenges involved in text simplification and describe recent progress in leveraging large-scale corpora for model training. I will then present a model that simplifies documents automatically based on a synchronous grammar. I will explain how such a grammar can be induced from Wikipedia and introduce an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. Finally, I will present experimental results on simplifying Wikipedia articles showing that this approach significantly reduces reading difficulty, while producing grammatical and meaningful output.
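To make the selection step concrete, here is a minimal sketch of an ILP of the kind mentioned above (our own notation, not necessarily the talk's formulation). Let the binary variable x_{ij} indicate that candidate rewrite j, licensed by the synchronous grammar with score s_{ij}, is chosen for source region i:

    \[
    \max_{x} \sum_{i} \sum_{j} s_{ij} x_{ij}
    \quad \text{subject to} \quad
    \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad x_{ij} \in \{0, 1\},
    \]

optionally with additional global constraints, e.g. a reading-difficulty or length budget \sum_{i,j} c_{ij} x_{ij} \le B, so that rewrites are selected jointly over the whole document rather than one sentence at a time.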

Joint work with Kristian Woodsend

Bio: Mirella Lapata is a Professor in the School of Informatics at the University of Edinburgh. She holds an MLT from the Language Technologies Institute at Carnegie Mellon University and a PhD in Natural Language Processing from the University of Edinburgh. Her research interests include statistical natural language processing, with an emphasis on unsupervised methods, mathematical programming, and generation applications. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR) and is an action editor for Transactions of the Association for Computational Linguistics (TACL). She is the first recipient (2009) of the British Computer Society and Information Retrieval Specialist Group (BCS/IRSG) Karen Spärck Jones award. She has also received best paper awards at leading NLP conferences and financial support from the EPSRC (the UK Engineering and Physical Sciences Research Council).

Oct 5

David Blei

Princeton University

Probabilistic Topic Models of Text and Users

Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms can discover the latent themes that underlie the documents, and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents.
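As background (the canonical formulation, e.g. latent Dirichlet allocation, rather than anything specific to this talk): each of K topics \beta_k is a distribution over the vocabulary, and each document d is assumed to be generated as

    \[
    \theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
    z_{dn} \sim \mathrm{Categorical}(\theta_d), \qquad
    w_{dn} \sim \mathrm{Categorical}(\beta_{z_{dn}}),
    \]

so the per-document proportions \theta_d describe how document d exhibits the themes, and posterior inference recovers both \theta and \beta from the observed words alone.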

Traditional topic modeling algorithms take a document collection as input and analyze the texts to estimate its latent thematic structure. But for many collections, we have an additional kind of data: how people use the documents. (As examples, consider weblog data or purchase histories.) In this talk, I will describe our recent research on simultaneously analyzing texts and the corresponding user data.

First I will describe collaborative topic models for document recommendation. Unlike classical matrix factorization, these models give interpretable dimensions to user interests and can form recommendations about sparsely rated or previously unrated items.

Then I will describe two models of legislative history. (In this data we consider lawmakers' votes on bills as a kind of "user data.") Ideal point topic models predict how lawmakers will vote on new bills, using the text of the bill to estimate its location in a political spectrum. Issue-adjusted ideal point models capture how a lawmaker's vote can deviate from her usual voting pattern, using the text of the bill to encode the issue under discussion.

With these three models I will demonstrate how texts can help us make better predictions of what users will do and how user data can give us information about what the texts are about.

This is joint work with Chong Wang and Sean Gerrish.
Video of David Blei's Talk

Bio: David Blei is an associate professor of Computer Science at Princeton University. He received his PhD in 2004 at U.C. Berkeley and was a postdoctoral fellow at Carnegie Mellon University. His research focuses on probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, and scientific data.

Oct 12

Daniel Povey

Johns Hopkins University

Subspace Gaussian Mixture Models for Speech Recognition

In speech recognition, the Subspace Gaussian Mixture Model is a modeling framework built on the conventional HMM-GMM framework that reliably gives better results, especially when the amount of training data is limited. I will describe this framework and talk about how these models are trained and how they are efficiently evaluated. The talk will also address some of the practical issues encountered in training them, and their advantages and disadvantages compared to the conventional Gaussian Mixture Model.
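To give a flavor of the framework (following the published SGMM formulation; the talk's details may differ): each HMM state j is represented by a low-dimensional vector \mathbf{v}_j, and the state's Gaussian means and mixture weights are derived from globally shared parameters,

    \[
    \boldsymbol{\mu}_{ji} = \mathbf{M}_i \mathbf{v}_j, \qquad
    w_{ji} = \frac{\exp(\mathbf{w}_i^{\top} \mathbf{v}_j)}{\sum_{i'} \exp(\mathbf{w}_{i'}^{\top} \mathbf{v}_j)},
    \]

with the projection matrices \mathbf{M}_i, weight vectors \mathbf{w}_i, and covariances shared across all states. Since each state contributes only the vector \mathbf{v}_j, far fewer parameters must be estimated per state, which is what helps when training data is limited.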

Bio: Daniel Povey completed his PhD at Cambridge University in 2003 and, after spending just under ten years working in industry research labs (IBM Research and then Microsoft Research), joined Johns Hopkins University in 2012. His thesis work introduced several practical innovations for discriminative training of models for speech recognition and made those techniques widely popular. At IBM Research he introduced feature-space discriminative training, which has become a common feature of state-of-the-art systems. He also devised the Subspace Gaussian Mixture Model -- a modeling technique which enhances the Gaussian Mixture Model framework by using subspace ideas similar to those used in speaker identification. At Microsoft Research and then at Johns Hopkins University, he has been creating the "Kaldi" speech recognition toolkit, which aims to make state-of-the-art speech recognition techniques widely accessible.

Oct 26

Donald Metzler

Google

Sanitizing, Searching, and Summarizing Microblog Streams

Microblog services, such as Tumblr and Twitter, provide users with the ability to broadcast short messages on a wide range of topics in real-time. Microblog streams often contain a considerable amount of information about local, regional, national, and global events. Most microblog search capabilities are simple, providing a (temporally) ordered list of results in response to a user’s query. This talk will describe a line of research aimed at developing advanced search techniques in the microblog domain. The work encompasses aspects of information retrieval, natural language processing, and data mining. Topics covered will include an analysis of microblog language usage, automatically decoding "Tweetspeak", effective microblog post search using machine learned ranking functions, and automatic timespan retrieval/timeline construction using novel temporal query expansion approaches.

Bio: Donald Metzler is a Senior Software Engineer at Google. Prior to that, he was a Senior Research Scientist at Yahoo! and a Research Assistant Professor of Computer Science at the University of Southern California (USC). He obtained his Ph.D. from the University of Massachusetts. As an active member of the information retrieval, Web search, and natural language processing research communities, he has served on the senior program committees of SIGIR, CIKM, and WWW, and sits on the editorial boards of the Information Retrieval Journal and ACM TOIS. He has published over 50 research papers, has 2 patents granted and 14 pending, and is the author of "A Feature-Centric View of Information Retrieval" and co-author of "Search Engines: Information Retrieval in Practice".

Nov 2

Bhiksha Raj

CMU

Hearing Without Listening

Speech is one of the most private forms of communication. People do not like to be eavesdropped on. They will frequently even object to being recorded; in fact, in many places it is illegal to record people speaking in public, even when it is acceptable to capture their images on video. Yet, when a person uses a speech-based service such as Siri, they must grant the service complete access to their voice recordings, implicitly trusting that the service will not abuse the recordings to identify, track, or even impersonate the user.
Privacy concerns also arise in other situations. For instance, a doctor cannot just transmit a dictated medical record to a generic voice-recognition service for fear of violating HIPAA requirements; the service provider requires various clearances first. Surveillance agencies must have access to all recordings by all callers on a telephone line, just to determine if a specific person of interest has spoken over that line. Thus, in searching for Jack Terrorist, they also end up being able to listen to and thereby violate the privacy of John and Jane Doe.
In this talk we will briefly discuss two *privacy-preserving* paradigms that enable voice-based services to be performed securely. The goal is to enable the performance of voice-processing tasks while ensuring that no party, including the user, the system, or a snooper, can derive unintended information from the transaction.
In the first paradigm, conventional voice-processing algorithms are rendered secure by employing cryptographic tools and interactive "secure multi-party computation" mechanisms to ensure that no undesired information is leaked by any party. In this paradigm the accuracy of the basic voice-processing algorithm remains essentially unchanged with respect to the non-private version; however, the privacy requirements introduce large computational and communication overheads. Moreover, assumptions must be made about the honesty of the parties.
The second paradigm, which applies specifically to the problem of voice *authentication* with privacy, converts the problem of matching voice patterns to a string-comparison operation. Using a combination of appropriate data representations and locality-sensitive hashing schemes, both the data to be matched and the patterns they must match are converted to bit strings, and pattern classification is performed by counting exact matches. The computational overhead of this string-comparison framework is minimal, and no assumptions need be made about the honesty of the participants. However, this comes at the price of restrictions on the classification tasks that may be performed and the classification mechanisms that may be employed.
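As a toy illustration of the second paradigm's central idea (random-hyperplane locality-sensitive hashing; the actual system's representations and hashing schemes are more elaborate), similar feature vectors hash to bit strings that agree in most positions, so matching reduces to counting exact bit agreements:

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_BITS, DIM = 256, 40                 # hash length, feature dimension

    # Random hyperplanes define the hash: each bit records which side of a
    # hyperplane the feature vector falls on.
    hyperplanes = rng.standard_normal((NUM_BITS, DIM))

    def lsh_bits(x):
        """Hash a real-valued feature vector to a 0/1 bit string."""
        return (hyperplanes @ x > 0).astype(np.uint8)

    def match_score(a, b):
        """Count exact bit matches; high counts suggest the same speaker."""
        return int(np.sum(a == b))

    enrolled = rng.standard_normal(DIM)                 # stand-in voiceprint
    same = enrolled + 0.1 * rng.standard_normal(DIM)    # same speaker, noisy
    other = rng.standard_normal(DIM)                    # different speaker

    print(match_score(lsh_bits(enrolled), lsh_bits(same)))   # close to 256
    print(match_score(lsh_bits(enrolled), lsh_bits(other)))  # near chance (~128)

In the full scheme it is the bit strings, rather than the underlying voice features, that the parties exchange and compare.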

Bio: Bhiksha Raj is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University, with additional affiliations to the Electrical and Computer Engineering and Machine Learning departments. Dr. Raj obtained his PhD from CMU in 2000 and was at Mitsubishi Electric Research Laboratories from 2001 to 2008. His chief research interests lie in automatic speech recognition, computer audition, machine learning, and data privacy. His latest work is in the newly emerging field of privacy-preserving speech processing, in which his research group has made several contributions.

Nov 9

Micha Elsner

Ohio State University

Bridging the gap: from sounds to words

During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation -- for instance, "you" might be realized with a full vowel or reduced to 'yeh' with a schwa. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. I will present ongoing research on constructing a Bayesian model which can simultaneously group together phonetic variants of the same lexical item, learn a probabilistic language model predicting the next word in an utterance from its context, and learn a model of pronunciation variability based on articulatory features.
I will discuss a model which takes word boundaries as given and focuses on clustering the lexical items (published at ACL 2012). I will also give preliminary results for a model which searches for word boundaries at the same time as performing the clustering.
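In rough noisy-channel terms (our simplification, not necessarily the talk's exact formulation), the learner seeks the lexical analysis w of an observed phone sequence s that maximizes

    \[
    \hat{w} = \arg\max_{w} \; P(w) \, P(s \mid w),
    \]

where P(w) is the language model over the jointly learned lexicon and P(s \mid w) is the articulatory-feature model of pronunciation variability; grouping variants such as 'you' and 'yeh' under a single lexical item is favored when it improves this joint score without fragmenting the lexicon.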

Bio: Micha Elsner is an Assistant Professor of Linguistics at the Ohio State University, where he started in August. He completed his PhD in 2011 at Brown University, working on models of local coherence. He then worked on Bayesian models of language acquisition as a postdoctoral researcher at the University of Edinburgh.

Nov 16

Antoine Raux

Honda Research Institute USA

Understanding User Intention in Context for Robust Human-Machine Interaction

With its widely publicized release of the Siri virtual assistant, Apple revived the age-old dream of artificial agents able to communicate with people through natural language. However, such current systems depend heavily on highly accurate speech recognition and focus on one-shot question answering and constrained tasks such as email dictation. In order to conduct longer dialogs that solve potentially complex issues under conditions where speech understanding is less than perfect, new technologies for inferring the user's intention in context are needed. To address these issues, probabilistic belief tracking models integrate multiple hypotheses of the user's spoken (and ultimately multimodal) meaning across several turns of dialog to estimate the most likely user intention.
Since 2009, the Dialog Human Machine Interaction group at Honda Research Institute USA has been developing technologies to enhance and scale up probabilistic belief tracking. In this talk, I will present our work on three fundamental aspects of belief tracking: domain knowledge representation, evidence integration, and real-time inference. A good domain representation needs to capture prior knowledge about the domain in a compact yet informative way. Our model achieves this in two ways: through tree-shaped Bayesian networks that capture is-a/has-a relations between concepts, and through 2D probabilistic kernels that capture spatial relations between physical entities. Noisy evidence from speech understanding is incorporated into the model by augmenting the domain Bayesian network with utterance-level dialog acts and concept-specific evidence nodes, allowing the system to effectively exploit potentially long (and noisy) n-best lists of understanding results and to track the whole dialog history. In order to achieve the efficient inference critical for interactive systems, we developed several algorithms that allow our model to run in real time even for domains with tens of thousands of entities (e.g., a destination-setting voice interface for a car navigation system).
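Schematically (our simplification, not HRI's exact model), belief tracking maintains a distribution b_t over user goals g and, after each user turn, folds in the n-best list of understanding hypotheses u with confidence scores P(u):

    \[
    b_{t+1}(g) \;\propto\; b_t(g) \sum_{u \in \text{n-best}} P(u) \, P(u \mid g),
    \]

so that no single, possibly misrecognized, hypothesis is trusted outright; the Bayesian-network structure described above determines how evidence about one concept propagates to related concepts.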
Finally, to illustrate the potential of belief tracking as a general way of approaching natural human-machine interaction, I will describe our current work, in collaboration with CMU Silicon Valley, to extend our approach to multimodal in-car systems.

Bio: Antoine Raux is a senior scientist at the Honda Research Institute USA. His research focuses on interactive systems, particularly those involving spoken dialog, with an emphasis on understanding user intention from noisy input. Prior to joining HRI, Antoine received his PhD from the Language Technologies Institute at Carnegie Mellon University, where his research dealt with many aspects of spoken dialog systems and resulted in the development of the Let's Go Bus Information System. He also holds a master's degree from Kyoto University and an engineering degree from Ecole Polytechnique in Paris.

Nov 30

Dan Gildea

University of Rochester

Models and Algorithms for Machine Translation

The first part of the talk will examine the theoretical properties of Multi Bottom-Up Tree Transducers, which have been proposed as a general model for syntax-based machine translation with the desirable property of being closed under composition. Tree transducers are defined as relations between trees, but in machine translation, we are ultimately concerned with the relations between the strings at the yields of the input and output trees. We focus on the formal power of Multi Bottom-Up Tree Transducers from this point of view.
The second, more applied, part of the talk will present an algorithm for sampling trees from forests, in the setting where probabilities for each tree may be a function of arbitrarily large tree fragments. This setting extends recent work on sampling to learn Tree Substitution Grammars to the case where the tree structure (TSG derived tree) is not fixed. We will present experiments using the algorithm to learn Hiero-style rules for machine translation from forests that represent the set of possible rules that are consistent with fixed input word-level alignments.
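As an illustration of the basic forest-sampling step (our sketch; the talk's setting additionally handles weights over arbitrarily large fragments, which this simple top-down sampler does not), one computes inside weights bottom-up and then samples a derivation top-down, picking each hyperedge with probability proportional to its weight times the inside weights of its children:

    import random

    # A packed forest: node -> list of incoming edges (weight, child_nodes).
    # Leaves have a single edge with no children.
    forest = {
        "S":  [(1.0, ("NP", "VP"))],
        "NP": [(0.7, ("D", "N")), (0.3, ("N",))],   # two competing analyses
        "VP": [(1.0, ("V", "NP"))],
        "D":  [(1.0, ())], "N": [(1.0, ())], "V": [(1.0, ())],
    }

    def prod(xs):
        r = 1.0
        for x in xs:
            r *= x
        return r

    def inside(node, memo={}):
        """Total weight of all derivations rooted at `node` (memoized)."""
        if node not in memo:
            memo[node] = sum(w * prod(inside(c) for c in children)
                             for w, children in forest[node])
        return memo[node]

    def sample(node):
        """Sample one derivation, with probability proportional to its weight."""
        edges = forest[node]
        scores = [w * prod(inside(c) for c in children) for w, children in edges]
        r = random.uniform(0, sum(scores))
        for (w, children), s in zip(edges, scores):
            r -= s
            if r <= 0:
                return (node, [sample(c) for c in children])
        return (node, [sample(c) for c in edges[-1][1]])   # numeric edge case

    print(sample("S"))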

Bio: Dan Gildea received his Ph.D. in computer science from UC Berkeley and was a postdoctoral scholar at Penn before joining the University of Rochester in 2003. He is an associate professor of computer science with research interests in machine translation and language understanding.

Dec 7

Anoop Sarkar

Simon Fraser University

Ensemble Decoding for Statistical Machine Translation

Statistical machine translation often faces the problem of combining data from many diverse sources into a single translation model. In this talk we introduce a novel approach called ensemble decoding that combines multiple translation models during the process of translation (a schematic form of the combination is sketched after the list below). We show that this technique is applicable in many diverse areas of machine translation:
(a) Domain adaptation is needed when the training data is from a different domain than the test data. We show that ensemble decoding can effectively combine out-of-domain and in-domain translation models.
(b) Multi-metric optimization modifies discriminative training for machine translation to prefer Pareto-optimal points with respect to multiple evaluation measures. We use ensemble decoding to combine the Pareto-optimal weight vectors obtained in multi-metric optimization. Furthermore, the ensemble weights are tuned to prefer Pareto-optimal solutions.
(c) In translation out of resource-poor languages, a pivot language is often used to augment the translation model from source to target. Ensemble models provide a novel way to combine the direct translation model (from source to target) and the pivot model (from source to pivot to target).
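Schematically (our notation; the talk explores several mixture operations), ensemble decoding scores each hypothesis e for source sentence f during search by mixing the component models' scores, e.g. as a weighted combination

    \[
    \mathrm{score}(e, f) \;=\; \sum_{m} \lambda_m \, \big( \mathbf{w}_m^{\top} \mathbf{f}_m(e, f) \big),
    \]

where model m has its own features \mathbf{f}_m and tuned weights \mathbf{w}_m, and the ensemble weights \lambda_m control the mixture; variants replace the weighted sum with, for example, a maximum over component scores.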

Bio: Anoop Sarkar is an Associate Professor at Simon Fraser University in British Columbia, Canada where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars.
His research is focused on statistical parsing and machine translation: in the areas of syntax and morphology in MT, semi-supervised learning, and domain adaptation. His interests also include formal language theory and stochastic grammars, in particular tree automata and tree-adjoining grammars.
