Feature-Rich Latent Variable Models for Statistical Machine Translation
In this talk, I argue that correct translations adhere to three kinds of constraints:
lexical, configurational, and constraints enforcing target-language wellformedness. Lexical constraints ensure that the
lexical choices are meaning-preserving; configurational constraints ensure that the relationships between source words
and phrases (e.g., semantic roles and modifier-head relationships) are properly transformed in translation; and
target-language wellformedness constraints ensure the grammaticality of the output. The constraint-based framework
suggests a generate-and-test (discriminative) model of translation in which features sensitive to input and output
structures are engineered by language and translation experts, and the feature weights are trained to maximize the
conditional likelihood of a corpus of example translations. The specified features represent empirical hypotheses
about what correlates (but not why) and thus encode domain-specific knowledge; the learned weights indicate to what
extent these hypotheses are confirmed or refuted.
To demonstrate the usefulness of the feature-based approach, I discuss the performance of
two models: first, a lexical translation model evaluated by the word alignments it learns. Unlike previous unsupervised
alignment models, the new model utilizes features that capture diverse lexical and alignment relationships, including
morphological relatedness, orthographic similarity, and conventional co-occurrence statistics. Results from typologically
diverse language pairs demonstrate that the generate-and-test model provides substantial performance benefits compared
to state-of-the-art generative baselines. Second, I discuss the results of an end-to-end translation model in which
lexical, configurational, and wellformedness constraints are modeled explicitly. This model is substantially more compact
than state-of-the-art translation models, but still performs significantly better on languages where source-target word
order differences are substantial.
Bio: Chris Dyer is an assistant professor in the Language Technologies Institute,
and affiliated faculty in the Machine Learning Department, at Carnegie Mellon University School of Computer Science.
Prior to that, he was a postdoctoral researcher in Prof. Noah Smith's lab at CMU. He completed his PhD on statistical
machine translation with Philip Resnik at the University of Maryland in 2010. Together with Jimmy Lin, he is author of
"Data-Intensive Text Processing with MapReduce", published by Morgan & Claypool in 2010. Current research interests
include machine translation, unsupervised learning, Bayesian techniques, and "big data" problems in NLP.
Statistical Parametric Speech Synthesis
This talk will give an introduction to the development of statistical parametric
speech synthesis. It will contrast the techniques with more classical unit selection speech synthesis. The talk
will describe both the tools and theoretical models behind them. In addition to describing the current techniques,
their strengths, and their limitations, we will also discuss possible directions in statistical parametric speech
synthesis, specifically the use of non-standard parameterization (using articulatory features as
opposed to more standard spectral features, as developed at a JHU Workshop in 2011).
The growth of statistical parametric speech synthesis is related to the success in higher quality speech synthesis
and the demand to move beyond the goals of ''natural'' and ''understandable'' synthesis. As speech technology systems
improve there is the desire for expressive and conversational speech and statistical parametric speech synthesis
must start addressing these styles too. Initial projects in expressiveness and conversational synthesis will also
Bio: Alan W Black is an Associate Professor in the Language Technologies
Institute at Carnegie Mellon University. He previously worked in the Centre for Speech Technology Research at the
University of Edinburgh, and before that at ATR in Japan. He is one of the principal authors of the free software
Festival Speech Synthesis System, the FestVox voice building tools and CMU Flite, a small footprint speech synthesis
engine. He received his PhD in Computational Linguistics from Edinburgh University in 1993, his MSc in Knowledge
Based Systems also from Edinburgh in 1986, and a BSc (Hons) in Computer Science from Coventry University in 1984.
Although much of his core research focuses on speech synthesis, he also works in real-time hands-free speech-to-speech
translation systems (Croatian, Arabic and Thai), spoken dialog systems, and rapid language adaptation for support of
new languages. Alan W Black was an elected member of the IEEE Speech Technical Committee (2003-2007). He is currently
on the board of ISCA. He was program chair of the ISCA Speech Synthesis Workshop 2004, and was general co-chair of
Interspeech 2006 -- ICSLP. In 2004, with Prof Keiichi Tokuda, he initiated the now annual Blizzard Challenge, the
largest multi-site evaluation of corpus-based speech synthesis techniques.
NLP: Its Past and 3.5 Possible Futures
Natural Language text and speech processing (Computational Linguistics) is
just over 50 years old, and is still continuously evolving not only in its technical subject matter, but in the
basic questions being asked and the style and methodology being adopted to answer them. As unification followed
finite-state technology in the 1980s, statistical processing followed that in the 1990s, and large-scale processing
is increasingly being adopted (especially for commercial NLP) in this decade, a new and quite interesting trend is
emerging: a split of the field into three somewhat complementary and rather different directions, each with its own
goals, evaluation paradigms, and methodology. The resource creators focus on language and the representations required
for language processing; the learning researchers focus on algorithms to effect the transformation of representation
required in NLP; and the large-scale hackers produce engines that win the NLP competitions. But where the latter two
trends have a fairly well-established methodology for research and papers, the first doesn't, and consequently suffers
in recognition and funding. In the talk, I describe each trend, provide some examples of the first, and conclude with
a few general questions, including: Where is the heart of NLP? What is the nature of the theories developed in each
stream (if any)? What kind of work should one choose to do if one is a grad student today?
Bio: Eduard Hovy is a member of the Language Technologies Institute in the
School of Computer Science at Carnegie Mellon University. He holds adjunct professorships at universities in China,
Korea, and Canada, and is co-Director of Research for the DHS Center for Command, Control, and Interoperability Data
Analytics. He used to direct the Human Language Technology Group at the Information Sciences Institute of the University
of Southern California. Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987.
His research addresses areas in Natural Language Processing, including machine reading of text, question answering, information
extraction, automated text summarization, the semi-automated construction of large lexicons and ontologies, and machine
translation. Dr. Hovy is the author or co-editor of six books and over 300 technical articles and is a popular invited speaker.
In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL) and in 2001-03 as President of the
International Association of Machine Translation (IAMT). Dr. Hovy regularly co-teaches courses and serves on Advisory Boards
for institutes and funding organizations in Germany, Italy, Netherlands, and the USA.
Uppsala University / Google
Beyond MaltParser -- Advances in Transition-Based Dependency Parsing
The transition-based approach to dependency parsing has become
popular thanks to its simplicity and efficiency. Systems like MaltParser
achieve linear-time parsing with projective dependency trees using locally
trained classifiers to predict the next parsing action and greedy best-first
search to retrieve the optimal parse tree, assuming that the input sentence has
been morphologically disambiguated using a part-of-speech tagger. In this talk,
I survey recent developments in transition-based dependency parsing that address
some of the limitations of the basic transition-based approach. First, I show
how globally trained classifiers and beam search can be used to mitigate error
propagation and enable richer feature representations. Secondly, I discuss
different methods for extending the coverage to non-projective trees, which are
required for linguistic adequacy in many languages. Finally, I present a
model for joint tagging and parsing that leads to improvements in both tagging
and parsing accuracy as compared to the standard pipeline approach.
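The basic transition-based loop described above can be illustrated with a minimal sketch. It assumes the arc-standard transition system (one common choice; the abstract does not say which system is used) and replaces the trained classifier with a hypothetical oracle policy that reads the next action off a known gold tree. The names `parse`, `make_oracle`, and the example sentence indices are illustrative only and are not taken from MaltParser.

```python
from collections import Counter

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def parse(n_tokens, policy):
    """Greedy arc-standard parsing; tokens are 1..n_tokens, 0 is the root."""
    stack, buffer = [0], list(range(1, n_tokens + 1))
    heads = {}      # dependent -> head, filled in as arcs are added
    actions = []
    while buffer or len(stack) > 1:
        action = policy(stack, buffer, heads)
        if action == SHIFT:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            dep = stack.pop(-2)      # second-from-top becomes a dependent
            heads[dep] = stack[-1]   # of the top of the stack
        elif action == RIGHT_ARC:
            dep = stack.pop()        # top becomes a dependent
            heads[dep] = stack[-1]   # of the new top of the stack
        else:
            raise ValueError(f"unknown action: {action}")
        actions.append(action)
    return heads, actions


def make_oracle(gold):
    """Stand-in for a trained classifier: picks actions from a gold tree."""
    n_children = Counter(gold.values())

    def policy(stack, buffer, heads):
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if s1 != 0 and gold[s1] == s0:
                return LEFT_ARC
            attached = sum(1 for h in heads.values() if h == s0)
            # Attach s0 only after it has collected all of its own dependents.
            if gold[s0] == s1 and attached == n_children[s0]:
                return RIGHT_ARC
        if not buffer:
            raise ValueError("gold tree is not projective")
        return SHIFT

    return policy


# Hypothetical three-token sentence: token 2 is the root, 1 and 3 depend on it.
gold = {1: 2, 2: 0, 3: 2}
heads, actions = parse(3, make_oracle(gold))
```

Each token is shifted once and attached once, so the loop takes 2n transitions for n tokens, which is where the linear-time behavior comes from. In a real parser the policy would be a classifier over features of the stack and buffer, and a wrong early decision cannot be undone, which is the error propagation that beam search is meant to mitigate.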
Bio: Joakim Nivre is Professor of Computational Linguistics at Uppsala
University and currently visiting scientist at Google, New York. He holds a
Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in
Computer Science from Vaxjo University. Joakim's research focuses on data-driven
methods for natural language processing, in particular for syntactic and semantic analysis. He is one of the main
developers of the transition-based
approach to syntactic dependency parsing, described in his 2006 book Inductive
Dependency Parsing and implemented in the MaltParser system. Joakim's current
research interests include the analysis of mildly non-projective dependency
structures, the integration of morphological and syntactic processing for richly
inflected languages, and methods for cross-framework parser evaluation. He has
produced over 150 scientific publications, including 3 books, and has given
nearly 70 invited talks at conferences and institutions around the world. He is
the current secretary of the European Chapter of the Association for Computational Linguistics.
University of Edinburgh
Talk to me in plain English, please! Explorations in Data-driven Text Simplification
Recent years have witnessed increased interest in data-driven methods
for text rewriting, e.g., the problem of reformulating a query to
alternative queries, a document in a simpler style, or a sentence in
a more concise manner. In this talk I will focus on text
simplification, one of the oldest and best-studied rewriting
problems. The popularity of the simplification task stems from its
potential relevance to various applications. Examples include the
development of reading aids for people with aphasia, non-native
speakers and more generally individuals with low literacy.
In this talk I will discuss the challenges involved in text
simplification and describe recent progress in leveraging large-scale
corpora for model training. I will then present a model that
simplifies documents automatically based on a synchronous grammar. I
will explain how such a grammar can be induced from Wikipedia and
introduce an integer linear programming model for selecting the most
appropriate simplification from the space of possible rewrites
generated by the grammar. Finally, I will present experimental
results on simplifying Wikipedia articles showing that this approach
significantly reduces reading difficulty, while producing grammatical
and meaningful output.
Joint work with Kristian Woodsend
Bio: Mirella Lapata is a Professor at the School of Informatics at the
University of Edinburgh. She holds an MLT from the Language Technologies Institute at Carnegie Mellon University,
and a PhD in Natural Language Processing from the University of Edinburgh. Her research interests include statistical
natural language processing, with an emphasis on unsupervised methods,
mathematical programming, and generation applications. She serves as
an associate editor of the Journal of Artificial Intelligence Research
(JAIR) and is an action editor for Transactions of the Association for
Computational Linguistics (TACL). She is the first recipient (2009) of
the British Computer Society and Information Retrieval Specialist
Group (BCS/IRSG) Karen Sparck Jones award. She has also received best
paper awards in leading NLP conferences and financial support from the
EPSRC (the UK Engineering and Physical Sciences Research Council).
Probabilistic Topic Models of Text and Users
Probabilistic topic models provide a suite of tools for analyzing
large document collections. Topic modeling algorithms can discover
the latent themes that underlie the documents, and identify how each
document exhibits those themes. Topic modeling can be used to help
explore, summarize, and form predictions about documents.
Traditional topic modeling algorithms take a document collection as
input and analyze the texts to estimate its latent thematic structure.
But for many collections, we have an additional kind of data: how
people use the documents. (As examples, consider weblog data or
purchase histories.) In this talk, I will describe our recent
research on simultaneously analyzing texts and the corresponding user data.
First I will describe collaborative topic models for document
recommendation. Unlike classical matrix factorization, these models
give interpretable dimensions to user interests and can form
recommendations about sparsely rated or previously unrated items.
Then I will describe two models of legislative history. (In this data
we consider lawmakers' votes on bills as a kind of "user data.")
Ideal point topic models predict how lawmakers will vote on new bills,
using the text of the bill to estimate its location in a political
spectrum. Issue-adjusted ideal point models capture how a lawmaker's
vote can deviate from her usual voting pattern, using the text of the
bill to encode the issue under discussion.
With these three models I will demonstrate how texts can help us make
better predictions of what users will do and how user data can give us
information about what the texts are about.
This is joint work with Chong Wang and Sean Gerrish.
Bio: David Blei is an associate professor of Computer Science at
Princeton University. He received his PhD in 2004 at U.C. Berkeley
and was a postdoctoral fellow at Carnegie Mellon University. His
research focuses on probabilistic topic models, Bayesian nonparametric
methods, and approximate posterior inference. He works on a variety
of applications, including text, images, music, social networks, and
Johns Hopkins University
Subspace Gaussian Mixture Models for Speech Recognition
In speech recognition, the Subspace Gaussian Mixture Model is a modeling framework based on the conventional HMM-GMM framework which reliably gives better results, especially when the amount of training data is limited. I will describe this framework, and talk about how these models are trained and how they are efficiently evaluated. The talk will also address some of the practical issues encountered in training them, and their advantages and disadvantages compared to the conventional Gaussian Mixture Model.
Bio: Daniel Povey completed his PhD at Cambridge University in 2003, and after spending just under ten years working for industry research labs (IBM Research and then Microsoft Research), joined Johns Hopkins University in 2012. His thesis work introduced several practical innovations for discriminative training of models for speech recognition, and made those techniques widely popular. At IBM Research he introduced feature-space discriminative training, which has become a common feature of state-of-the-art systems. He also devised the Subspace Gaussian Mixture Model, a modeling technique which enhances the Gaussian Mixture Model framework by using subspace ideas similar to those used in speaker identification. At Microsoft Research and then at Johns Hopkins University, he has been creating a speech recognition toolkit, "Kaldi", which aims to make state-of-the-art speech recognition techniques widely accessible.
Sanitizing, Searching, and Summarizing Microblog Streams
Microblog services, such as Tumblr and Twitter, provide users with the ability to broadcast short messages on a wide range of topics in real-time. Microblog streams often contain a considerable amount of information about local, regional, national, and global events. Most microblog search capabilities are simple, providing a (temporally) ordered list of results in response to a user's query. This talk will describe a line of research aimed at developing advanced search techniques in the microblog domain. The work encompasses aspects of information retrieval, natural language processing, and data mining. Topics covered will include an analysis of microblog language usage, automatically decoding "Tweetspeak", effective microblog post search using machine learned ranking functions, and automatic timespan retrieval/timeline construction using novel temporal query expansion approaches.
Bio: Donald Metzler is a Senior Software Engineer at Google. Prior to that he was a Senior Research Scientist at Yahoo! and a Research Assistant Professor of Computer Science at the University of Southern California (USC). He obtained his Ph.D. from the University of Massachusetts. As an active member of the information retrieval, Web search, and natural language processing research communities, he has served on the senior program committees of SIGIR, CIKM, and WWW, and is a member of the Information Retrieval Journal and ACM TOIS' editorial boards. He has published over 50 research papers, has 2 patents granted, 14 patents pending, and is the author of "A Feature-Centric View of Information Retrieval" and co-author of "Search Engines: Information Retrieval in Practice".
Hearing Without Listening
Speech is one of the most private forms of communication. People do not like to be eavesdropped on. They will frequently even object to being recorded; in fact in many places it is illegal to record people speaking in public, even when it is acceptable to capture their images on video. Yet, when a person uses a speech-based service such as Siri, they must grant the service complete access to their voice recordings, implicitly trusting that the service will not abuse the recordings to identify, track, or even impersonate the user.
Privacy concerns also arise in other situations. For instance, a doctor cannot just transmit a dictated medical record to a generic voice-recognition service for fear of violating HIPAA requirements; the service provider requires various clearances first. Surveillance agencies must have access to all recordings by all callers on a telephone line, just to determine if a specific person of interest has spoken over that line. Thus, in searching for Jack Terrorist, they also end up being able to listen to and thereby violate the privacy of John and Jane Doe.
In this talk we will briefly discuss two *privacy-preserving* paradigms that enable voice-based services to be performed securely. The goal is to enable the performance of voice-processing tasks while ensuring that no party, including the user, the system, or a snooper, can derive unintended information from the transaction.
In the first paradigm, conventional voice-processing algorithms are rendered secure by employing cryptographic tools and interactive "secure multi-party computation" mechanisms to ensure that no undesired information is leaked by any party. In this paradigm the accuracy of the basic voice-processing algorithm remains essentially unchanged with respect to the non-private version; however, the privacy requirements introduce large computational and communication overhead. Moreover, assumptions must be made about the honesty of the parties.
The second paradigm, which applies specifically to the problem of voice *authentication* with privacy, converts the problem of matching voice patterns to a string-comparison operation. Using a combination of appropriate data representation and locality sensitive hashing schemes, both the data to be matched and the patterns they must match are converted to bit strings, and pattern classification is performed by counting exact matches. The computational overhead of this string-comparison framework is minimal, and no assumptions need be made about the honesty of the participants. However, this comes at the price of restrictions on the classification tasks that may be performed and the classification mechanisms that may be employed.
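The string-comparison idea in the second paradigm can be sketched as follows. This is only an illustration under stated assumptions: it uses random-hyperplane (sign) hashing as the locality sensitive hash, which the abstract does not specify, the feature vector standing in for a voice pattern is made up, and the cryptographic protection of the keys that a private system would need is omitted. The function names are hypothetical.

```python
import random


def make_hash_family(dim, n_tables, bits_per_key, seed=0):
    """Draw random hyperplanes: n_tables keys, each made of bits_per_key bits."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_key)]
            for _ in range(n_tables)]


def hash_keys(vector, family):
    """Convert a real-valued vector into a list of short bit strings."""
    keys = []
    for table in family:
        bits = "".join(
            "1" if sum(w * x for w, x in zip(plane, vector)) >= 0 else "0"
            for plane in table
        )
        keys.append(bits)
    return keys


def count_matches(keys_a, keys_b):
    """Classification signal: the number of keys that agree exactly."""
    return sum(a == b for a, b in zip(keys_a, keys_b))


family = make_hash_family(dim=8, n_tables=32, bits_per_key=4)
enrolled = [0.9, -0.2, 0.4, 0.1, -0.7, 0.3, 0.5, -0.1]  # made-up voice features
same = hash_keys(enrolled, family)
opposite = hash_keys([-x for x in enrolled], family)
```

An identical vector agrees on all 32 keys and its negation agrees on none; vectors at a small angle to the enrolled one tend to agree on many keys, so a threshold on `count_matches` acts as the accept/reject decision. Only exact key equality is tested, which is what allows the comparison to be carried out on obscured strings rather than on the voice data itself.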
Bio: Bhiksha Raj is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University, with additional affiliations to the Electrical and Computer Engineering and Machine Learning departments. Dr. Raj obtained his PhD from CMU in 2000 and was at Mitsubishi Electric Research Laboratories from 2001-2008. Dr. Raj's chief research interests lie in automatic speech recognition, computer audition, machine learning and data privacy. Dr. Raj's latest research interests lie in the newly emerging field of privacy-preserving speech processing, in which his research group has made several contributions.
Ohio State University
Bridging the gap: from sounds to words
During early language acquisition, infants must learn both a lexicon
and a model of phonetics that explains how lexical items can vary in
pronunciation: for instance, "you" might be realized as 'you' with a
full vowel or reduced to 'yeh' with a schwa. Previous models of
acquisition have generally tackled these problems in isolation, yet
behavioral evidence suggests infants
acquire lexical and phonetic knowledge simultaneously. I will present
ongoing research on constructing a Bayesian model which can
simultaneously group together phonetic variants of the same lexical
item, learn a probabilistic language model predicting the next word in
an utterance from its context, and learn a model of pronunciation
variability based on articulatory features.
I will discuss a model which takes word boundaries as given and
focuses on clustering the lexical items (published at ACL 2012).
I will also give preliminary
results for a model which searches for word boundaries at the same
time as performing the clustering.
Bio: Micha Elsner is an Assistant Professor of Linguistics at the Ohio
State University, where he started in August. He completed his PhD in
2011 at Brown University, working on models of local coherence. He
then worked on Bayesian models of language acquisition as a
postdoctoral researcher at the University of Edinburgh.
Honda Research Institute USA
Understanding User Intention in Context for Robust Human-Machine Interaction
With its widely publicized release of the Siri virtual assistant, Apple revived the age-old dream of artificial agents able to communicate with people through natural language. However, current such systems depend heavily on highly accurate speech recognition and focus on one-shot question answering and constrained tasks such as email dictation. In order to conduct longer dialogs to solve potentially complex issues in conditions where speech understanding is less than perfect, new technologies for inferring the user's intention in context are needed. To address these issues, probabilistic belief tracking models integrate multiple hypotheses of the user's spoken (and ultimately multimodal) meaning across several turns of dialog to estimate the most likely user intention.
Since 2009, the Dialog Human Machine Interaction group at Honda Research Institute USA has been developing technologies to enhance and scale up probabilistic belief tracking. In this talk, I will present our work on three fundamental aspects of belief tracking: domain knowledge representation, evidence integration, and real-time inference. A good domain representation needs to capture prior knowledge about the domain in a compact yet informative way. Our model achieves this in two ways: through tree-shaped Bayesian Networks to capture is-a/has-a relations between concepts and 2D probabilistic kernels to capture spatial relations between physical entities. Noisy evidence from speech understanding is incorporated into the model by augmenting the domain Bayesian Network with utterance-level dialog acts and concept-specific evidence nodes, allowing it to effectively exploit potentially long (and noisy) n-best lists of understanding results and track the whole dialog history. In order to achieve the efficient inference critical for interactive systems, we developed several algorithms that allow our model to run in real time even for domains with tens of thousands of entities (e.g. a destination setting voice interface for a car navigation system).
Finally, to illustrate the potential of belief tracking as a general way of approaching natural human-machine interaction, I will describe our current work, in collaboration with CMU Silicon Valley, to extend our approach to multimodal in-car systems.
Bio: Antoine Raux is a senior scientist at the Honda Research Institute USA. His research focuses on interactive systems, particularly those involving spoken dialog, with an emphasis on understanding user intention from noisy input. Prior to joining HRI, Antoine received his PhD from the Language Technologies Institute at Carnegie Mellon University, where his research dealt with many aspects of spoken dialog systems and resulted in the development of the Let's Go Bus Information System. He also has a Master's degree from Kyoto University and an engineering degree from Ecole Polytechnique in Paris.
University of Rochester
Models and Algorithms for Machine Translation
The first part of the talk will examine the theoretical properties of
Multi Bottom-Up Tree Transducers, which have been proposed as a
general model for syntax-based machine translation with the desirable
property of being closed under composition. Tree transducers are
defined as relations between trees, but in machine translation, we are
ultimately concerned with the relations between the strings at the
yields of the input and output trees. We focus on the formal power of
Multi Bottom-Up Tree Transducers from this point of view.
The second, more applied, part of the talk will present an algorithm
for sampling trees from forests, in the setting where probabilities
for each tree may be a function of arbitrarily large tree
fragments. This setting extends recent work on sampling to learn
Tree Substitution Grammars to the case where the tree structure (TSG
derived tree) is not fixed. We will present experiments using the
algorithm to learn Hiero-style rules for machine translation from
forests that represent the set of possible rules that are consistent
with fixed input word-level alignments.
Bio: Dan Gildea received his Ph.D. in computer science from UC Berkeley and
was a postdoctoral scholar at Penn before joining the University of
Rochester in 2003. He is an associate professor of computer science
with research interests in machine translation and language
Simon Fraser University
Ensemble Decoding for Statistical Machine Translation
Statistical machine translation is often faced with the problem of combining data from many diverse sources into a single translation model. In this talk we introduce a novel approach called ensemble decoding that combines multiple translation models during the process of translation. We show that this technique is applicable in many diverse areas in machine translation:
(a) Domain adaptation is needed when the training data is from a different domain than the test data. We show that ensemble decoding can effectively combine out-of-domain and in-domain translation models.
(b) Multi-metric optimization modifies discriminative training for machine translation to prefer Pareto-optimal points with respect to multiple evaluation measures. We use ensemble decoding to combine the Pareto-optimal weight vectors obtained in multi-metric optimization. Furthermore, the ensemble weights are tuned to prefer Pareto-optimal solutions.
(c) In translation out of resource-poor languages, a pivot language is often used to augment the translation model from source to target. Ensemble models provide a novel way to combine the direct translation model (from source to target) and the pivot model (from source to pivot to target).
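As a rough illustration of combining several translation models at decoding time, the sketch below scores each candidate translation with a weighted sum of the scores assigned by the component models. The weighted sum is only one possible combination rule and is an assumption here, as are the toy phrase tables, the weights, and the function names; the abstract does not specify how the ensemble combines scores.

```python
def ensemble_score(models, weights, src, tgt):
    """Weighted sum of each model's score for translating src as tgt."""
    return sum(w * m.get(src, {}).get(tgt, 0.0)
               for m, w in zip(models, weights))


def best_translation(models, weights, src):
    """Pick the candidate with the highest ensemble score, or None."""
    candidates = set()
    for m in models:
        candidates.update(m.get(src, {}))
    if not candidates:
        return None
    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates),
               key=lambda t: ensemble_score(models, weights, src, t))


# Toy phrase tables (made-up probabilities) for an English-to-French example.
in_domain = {"bank": {"rive": 0.7, "banque": 0.3}}
out_of_domain = {"bank": {"banque": 0.9, "rive": 0.1}}
models = [in_domain, out_of_domain]

favor_in_domain = best_translation(models, [0.8, 0.2], "bank")
favor_out_of_domain = best_translation(models, [0.2, 0.8], "bank")
```

With most of the weight on the in-domain table the ensemble prefers "rive" (0.58 against 0.42), and with the weights reversed it prefers "banque" (0.78 against 0.22). The same pattern would apply to combining a direct model with a pivot model, with the weights tuned on held-out data.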
Bio: Anoop Sarkar is an Associate Professor at Simon Fraser University in British Columbia, Canada where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars.
His research is focused on statistical parsing and machine translation: in the areas of syntax and morphology in MT, semi-supervised learning, and domain adaptation. His interests also include formal language theory and stochastic grammars, in particular tree automata and tree-adjoining grammars.