Multilingual Guidance for Unsupervised Linguistic Structure Prediction
Learning linguistic analyzers from unannotated data remains a major challenge; can multilingual
text help? In this talk, I will describe learning methods that use unannotated data in a target language along with annotated data
in more resource-rich "helper" languages. I will focus on two lines of work. First, I will describe a graph-based semi-supervised
learning approach that uses parallel data to learn part-of-speech tag sequences through type-level lexical transfer from a helper
language. Second, I will examine a more ambitious goal of learning part-of-speech sequences and dependency trees from raw text,
leveraging parameter-level transfer from helper languages, but without any parallel data. Both approaches result in significant
improvements over strong state-of-the-art monolingual unsupervised baselines.
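The graph-based approach belongs to the family of label propagation over a word-similarity graph. As a toy illustration of that general technique (the graph, seed tags, and update rule below are simplified stand-ins, not the talk's actual model):

```python
# Minimal sketch of graph-based label propagation, the general technique
# behind type-level tag transfer. Graph, seeds, and tags are toy
# stand-ins, not the talk's actual data or algorithm.

def propagate(edges, seeds, tags, iters=30):
    """edges: {node: {neighbor: weight}}; seeds: {node: tag} fixed labels."""
    # Initialize every node with a uniform distribution over tags,
    # except seed nodes, which are clamped to their known tag.
    uniform = {t: 1.0 / len(tags) for t in tags}
    q = {n: dict(uniform) for n in edges}
    for n, t in seeds.items():
        q[n] = {u: float(u == t) for u in tags}
    for _ in range(iters):
        for n, nbrs in edges.items():
            if n in seeds:
                continue  # keep seed labels clamped
            # New distribution = weighted average of neighbors' distributions.
            total = sum(nbrs.values())
            q[n] = {t: sum(w * q[m][t] for m, w in nbrs.items()) / total
                    for t in tags}
    return {n: max(d, key=d.get) for n, d in q.items()}

# Toy graph: word types connected by distributional similarity.
edges = {
    "dog":  {"cat": 1.0, "runs": 0.1},
    "cat":  {"dog": 1.0, "runs": 0.1},
    "runs": {"dog": 0.1, "cat": 0.1, "eats": 1.0},
    "eats": {"runs": 1.0},
}
labels = propagate(edges, seeds={"dog": "NOUN", "eats": "VERB"},
                   tags=["NOUN", "VERB"])
print(labels)
```

In the multilingual setting, the seed labels come from the annotated helper language via word alignments; here the unlabeled words "cat" and "runs" inherit tags from their strongly connected seed neighbors.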
Bio: Dipanjan Das is a Ph.D. student at the Language Technologies Institute, School of
Computer Science at Carnegie Mellon University. He works on statistical natural language processing under the mentorship of Noah
Smith. He finished his M.S. at the same institute in 2008, conducting research on language generation with Alexander Rudnicky.
Das completed his undergraduate degree in 2005 from the Indian Institute of Technology, Kharagpur, where he received the best
undergraduate thesis award in Computer Science and Engineering and the Dr. B.C. Roy Memorial Gold Medal for best all-round
performance in academics and co-curricular activities. He worked at Google Research, New York as an intern in 2010 and received
the best paper award at the ACL 2011 conference.
Google's Speech Internationalization Project: From 1 to 300 Languages and Beyond
The speech team at Google has built speech recognition systems in more than 30 languages
in little more than 2 years. In this talk we will describe the history of this project and, more interestingly, the
technologies that have been developed to achieve this goal. I'll explore some of the acoustic modeling, lexicon, language
modeling, and infrastructure techniques, and even the social engineering, used to achieve our ultimate goal: building
speech recognition systems in the top 300 languages of the planet.
Bio: Dr. Pedro J. Moreno leads the speech global engineering group at the Android
division of Google. His team is in charge of deploying speech recognition services in as many languages as possible. He
joined Google 7 years ago after working as a research scientist at HP Labs, where he worked mostly
on audio indexing systems. Dr. Moreno completed his Ph.D. studies at Carnegie Mellon University under the direction of Prof. Richard
Stern. His work there was focused on noise robustness in speech recognition systems. His Ph.D. studies were sponsored by a
Fulbright scholarship. Before that, he completed an Electrical Engineering degree at Universidad Politecnica de Madrid.
Fast Effective Clustering for Graphs and Documents
We describe two new methods for clustering nodes in a graph. The
first method is simple to implement, easily parallelized, and very
fast: on a single machine, it runs in time linear in the number of
edges in the graph. Experimentally, the method leads to clusterings
that are comparable in quality to those produced by widely used
spectral methods (e.g., the Normalized Cut algorithm), even though it
is much faster. The second method is based on building a
probabilistic model of the graph, and has a complementary set of
advantages: while not as amenable to parallelization, it also
typically runs in time linear in the number of graph edges, and is
well-suited to extensions that incorporate differing clustering
criteria or outside information about node similarities. We also
discuss extensions to the methods for graphs associated with text
corpora. This is joint work with Frank Lin and Ramnath Balasubramanyan.
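The description of the first method matches a power-iteration style of clustering. As an illustrative sketch of that idea (not the authors' code): repeatedly multiplying a vector by the row-normalized affinity matrix costs time linear in the number of edges per step, and the intermediate iterates form a one-dimensional embedding whose near-constant plateaus mark clusters.

```python
# Illustrative sketch of clustering by truncated power iteration on the
# row-normalized affinity matrix; each iteration costs O(edges).
# A toy stand-in for the fast method described, not the authors' code.
import random

def power_iteration_embedding(edges, n_iters=15, seed=0):
    """edges: {node: {neighbor: weight}}. Returns a 1-D embedding."""
    random.seed(seed)
    v = {n: random.random() for n in edges}
    for _ in range(n_iters):
        new_v = {}
        for n, nbrs in edges.items():
            total = sum(nbrs.values())
            # One synchronous step of v <- W v with W row-normalized.
            new_v[n] = sum(w * v[m] for m, w in nbrs.items()) / total
        # Rescale to [0, 1] so the vector doesn't collapse to a constant.
        lo, hi = min(new_v.values()), max(new_v.values())
        span = (hi - lo) or 1.0
        v = {n: (x - lo) / span for n, x in new_v.items()}
    return v

# Two cliques joined by one weak edge; the embedding separates them,
# and a simple mean threshold recovers the two clusters.
edges = {
    "a": {"b": 1, "c": 1}, "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 0.05},
    "d": {"c": 0.05, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1}, "f": {"d": 1, "e": 1},
}
emb = power_iteration_embedding(edges)
cut = sum(emb.values()) / len(emb)
clusters = {n: int(x > cut) for n, x in emb.items()}
print(clusters)
```

Within each clique the random walk mixes quickly, while the weak cross edge keeps the two plateaus apart for many iterations, which is what makes the truncated iteration informative.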
Bio: William Cohen received his bachelor's degree in Computer Science from
Duke University in 1984, and a PhD in Computer Science from Rutgers
University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell
Labs and later AT&T Labs-Research, and from April 2000 to May 2002 Dr.
Cohen worked at Whizbang Labs, a company specializing in extracting
information from the web. Dr. Cohen is President of the International
Machine Learning Society, an Action Editor for the Journal of Machine
Learning Research, and an Action Editor for the journal ACM
Transactions on Knowledge Discovery from Data. He is also an editor,
with Ron Brachman, of the AI and Machine Learning series of books
published by Morgan Claypool. In the past he has also served as an
action editor for the journal Machine Learning, the journal Artificial
Intelligence, and the Journal of Artificial Intelligence Research. He
was General Chair for the 2008 International Machine Learning
Conference, held July 6-9 at the University of Helsinki, in Finland;
Program Co-Chair of the 2006 International Machine Learning
Conference; and Co-Chair of the 1994 International Machine Learning
Conference. Dr. Cohen was also the co-Chair for the 3rd Int'l AAAI
Conference on Weblogs and Social Media, which was held May 17-20, 2009
in San Jose, and was the co-Program Chair for the 4th Int'l AAAI
Conference on Weblogs and Social Media, which will be held May 23-26
at George Washington University in Washington, D.C. He is an AAAI
Fellow, and in 2008, he won the SIGMOD "Test of Time" Award for the
most influential SIGMOD paper of 1998.
Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization and learning from large datasets. He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 180 publications.
University of Wisconsin-Madison
Harnessing Dozens of Languages for Robust Language Technology
The written word plays a greater role in human communication than at any point
in world history. As modern technology infrastructure spreads throughout the
world, the quantity of electronic text, written in hundreds of different
languages, continues to grow in size and diversity. While language processing
technologies have been steadily maturing for English, progress on most languages
has been slow, due to the paucity of data and research.
In this talk I will present my work on multilingual NLP. The key idea is that by
jointly modeling a broad array of languages, apparent ambiguities can be
resolved by building generic and universally plausible models of human language.
I will talk about the application of this idea to several longstanding problems
in NLP, including part-of-speech induction, computational decipherment of lost
languages, and morphological induction.
I will also present the new task of unsupervised grapheme-to-phoneme prediction
(as a first step towards robust and general decipherment methods). In this task,
we are given an unknown language written using a Latin alphabet, and must
predict the set of phonemes associated with each letter. By harnessing data from
over a hundred languages, we build a model which relates patterns of symbols in
text to plausible phonetic interpretations with high accuracy.
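To make the grapheme-to-phoneme idea concrete, here is a deliberately tiny sketch of one ingredient: describing each letter by simple distributional context features and borrowing the phoneme set of the most similar letter in a language whose mapping is known. The data and the nearest-neighbor rule are hypothetical simplifications; the model in the talk is a structured probabilistic one built over a hundred languages.

```python
# Toy sketch of the intuition: letters that occur in similar contexts
# tend to have similar phonetic roles, so an unknown language's letters
# can borrow phoneme sets from distributionally similar letters in
# languages where the mapping is known. Hypothetical data throughout.
from collections import Counter

def letter_profiles(corpus):
    """Map each letter to a context-feature vector (neighbor counts)."""
    profiles = {}
    for word in corpus:
        padded = "#" + word + "#"
        for i, ch in enumerate(word, start=1):
            feats = profiles.setdefault(ch, Counter())
            feats[("left", padded[i - 1])] += 1
            feats[("right", padded[i + 1])] += 1
    return profiles

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / ((norm(a) * norm(b)) or 1.0)

def predict_phonemes(unknown_corpus, known_corpus, known_phonemes):
    unk, known = letter_profiles(unknown_corpus), letter_profiles(known_corpus)
    prediction = {}
    for letter, prof in unk.items():
        best = max(known, key=lambda l: cosine(prof, known[l]))
        prediction[letter] = known_phonemes[best]
    return prediction

pred = predict_phonemes(["nana"], ["mama", "papa"],
                        {"m": {"m"}, "a": {"a"}, "p": {"p"}})
print(pred)  # 'n' borrows a known consonant's phonemes; 'a' the vowel's
```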
If time permits, I will describe some current work on childhood grammar and
language development using tools from machine translation.
Bio: Benjamin Snyder is an Assistant Professor at the University of
Wisconsin-Madison in the Department of Computer Sciences. His research interests include natural language
processing, machine learning, and cognitive science. Ben received a B.A. in philosophy from the University
of Pennsylvania in 2003, and a Ph.D. in computer science from MIT in 2010. His dissertation, which focuses
on multilingual statistical models and the computational decipherment of lost languages, received the ACM
2010 Dissertation Award honorable mention.
Microsoft Research Asia
Computational Advertising: Challenges and Opportunities
Computational advertising is a newly emerged research discipline
that studies the algorithms and theories of online advertising. It lies
at the intersection of information retrieval, machine learning, and game theory. However, due to
its unique properties, the conventional technologies in the aforementioned areas might not be
sufficient to handle the new problems in computational advertising. New principles, models, and
theories need to be developed. In this talk, I will first give a brief introduction to online
advertising (mainly from a business perspective) and computational advertising (from a research
perspective). Then I will discuss the key differences between computational advertising and
information retrieval, machine learning, as well as game theory, followed by the proposal of
several new research directions, like game-theoretic machine learning and statistical game
theory. After that, I will introduce several on-going projects in my group along these directions,
including attractiveness-based ad click prediction, learning to auction, and data-driven
advertiser modeling. At the end of the talk, I will discuss the future evolution of computational
advertising as a research discipline, and online advertising as a business model.
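For readers new to the area, the game-theoretic core of sponsored search is the generalized second-price (GSP) auction that most search engines build on. A minimal sketch of this standard textbook mechanism (not the group's research models):

```python
# Minimal generalized second-price (GSP) auction: rank ads by
# bid * quality score, and charge each winner the minimum bid that
# would have kept its slot. Illustrative textbook mechanism only.

def gsp_auction(bids, quality, n_slots):
    """bids/quality: {ad: value}. Returns [(ad, price), ...] per slot."""
    ranked = sorted(bids, key=lambda ad: bids[ad] * quality[ad], reverse=True)
    results = []
    for i, ad in enumerate(ranked[:n_slots]):
        if i + 1 < len(ranked):
            # Price = next-ranked score divided by this ad's quality.
            runner_up = ranked[i + 1]
            price = bids[runner_up] * quality[runner_up] / quality[ad]
        else:
            price = 0.0  # no competitor below; reserve price omitted
        results.append((ad, round(price, 2)))
    return results

bids = {"A": 4.0, "B": 3.0, "C": 1.0}
quality = {"A": 0.5, "B": 1.0, "C": 1.0}   # e.g., predicted click rates
allocation = gsp_auction(bids, quality, n_slots=2)
print(allocation)  # [('B', 2.0), ('A', 2.0)]
```

Note how B wins the top slot despite the lower bid, because both ranking and pricing are quality-weighted; the strategic behavior this induces is exactly where machine learning meets game theory.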
Bio: Tie-Yan Liu is a lead researcher at Microsoft Research
Asia, leading the Internet Economics & Computational Advertising group. His research interests
include learning to rank, large-scale graph ranking, and Internet economics. So far, he has
authored two books, more than 70 journal and conference papers, and nearly 30 granted US /
international patents. He is the co-author of the best student paper for SIGIR (2008) and the most
cited paper for the Journal of Visual Communication and Image Representation (2004~2006). He is a
program committee co-chair of RIAO (2010), a demo/exhibit co-chair of KDD (2012), a track chair of
WWW (2011), an area chair of SIGIR (2008~2011) and AIRS (2009-2011), and a co-chair of several
workshops at SIGIR, ICML, and NIPS. He is an associate editor of ACM Transactions on Information
Systems (TOIS) and an editorial board member of several other journals including Information Retrieval
and ISRN Artificial Intelligence. He is a keynote speaker at PCM (2010) and CCIR (2011), a plenary
panelist of KDD (2011), and a tutorial speaker at several conferences including SIGIR and WWW. Prior
to joining Microsoft, he obtained his Ph.D. in electronic engineering from Tsinghua University. He is
a senior member of the IEEE and a member of the ACM.
German Research Ctr. for AI
Learning Relation Extraction Rules from Massive Data
The talk will report on an information extraction platform that
combines named-entity detection, generic parsing and statistical
confidence estimation for learning large sets of rules that can
extract instances of given n-ary relations from free texts.
For precision-critical applications that do not need to recognize all
mentions, supervised learning approaches often suffice.
For recall-critical applications, supervised learning usually misses
most of the notorious long tail of patterns. In order to improve
recall, two methods have been employed. One of them is minimally
supervised learning starting with a small set of examples as
semantic seed. More instances and rules are then learned by
bootstrapping. The other method is distantly supervised learning,
starting with a large set of examples serving as a massive seed. In my
talk I want to compare the two methods as alternatives on
the same relation extraction platform. On the basis of our
empirical findings, I will argue that at least for some relations,
distantly supervised learning on the Web provides a better basis for
attacking the long tail. Both methods are faced with the problem of
learning many wrong rules, seriously damaging precision. I will
present three approaches to filtering incorrect rules: regular
confidence estimation, implicit negative information through
closed-world seed knowledge and negative information obtained
by the parallel rule-learning for multiple relations.
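The minimally supervised method can be illustrated with a classic DIPRE/Snowball-style bootstrapping loop. The toy corpus and naive string patterns below are illustrative stand-ins; the platform described above uses generic parsing and statistical confidence estimation instead:

```python
# Sketch of seed-based bootstrapping for relation extraction (in the
# spirit of DIPRE/Snowball): induce patterns from seed pairs, then
# apply the patterns to harvest new pairs. Toy corpus, naive patterns.

def bootstrap(corpus, seeds, rounds=2):
    pairs, patterns = set(seeds), set()
    for _ in range(rounds):
        # 1. Induce patterns: the text between a known pair's arguments.
        for sent in corpus:
            for x, y in pairs:
                if x in sent and y in sent and sent.index(x) < sent.index(y):
                    between = sent[sent.index(x) + len(x):sent.index(y)]
                    patterns.add(between)
        # 2. Apply patterns to extract new pairs (naively: the single
        # tokens immediately left and right of the pattern).
        for sent in corpus:
            for pat in patterns:
                if pat and pat in sent:
                    left, _, right = sent.partition(pat)
                    pairs.add((left.split()[-1], right.split()[0]))
    return pairs

corpus = [
    "Paris is the capital of France .",
    "Berlin is the capital of Germany .",
    "Rome is the capital of Italy .",
]
extracted = bootstrap(corpus, seeds={("Paris", "France")})
print(sorted(extracted))
```

Even this toy version shows where precision erodes: any spurious pattern learned in one round extracts wrong pairs in the next, which is exactly why the confidence-estimation and filtering methods of the talk are needed.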
Bio: Hans Uszkoreit is Scientific Director and Head of the Language
Technology Lab at DFKI, and has also been Professor of
Computational Linguistics and Computer Science at Saarland University since
1988. He received his PhD in 1984 from the University of Texas at Austin.
As a student he worked two years for the machine translation project
METAL. He later held research positions at Stanford U., SRI in
Menlo Park, and IBM Germany in Stuttgart. He is Past President of the
European Association of Logic, Language and Information,
Member of the European Academy of Sciences, the International
Committee for Computational Linguistics, the ELRA Board and various
advisory and editorial boards. Uszkoreit is also co-founder and board
member of several LT spin-off companies. Since 2009, he has been
Coordinator of the European Network of Excellence META-NET with
currently 57 European research centers in 33 countries. His research
is documented in more than 150 publications in computational
linguistics, language technology and related fields. His current research
interests are information extraction, machine translation and other
language technology applications.
University of Texas at Austin
Latent Variable Models of Distributional Lexical Semantics
To respond to the increasing demand for
natural language interfaces, and to provide meaningful insight into user query
intent, fast, scalable lexical semantic models with flexible representations
are needed. Human concept organization is a rich phenomenon that has yet to be
accounted for by a single coherent psychological framework: Concept generalization
is captured by a mixture of prototype and exemplar models, and local taxonomic
information is available through multiple overlapping organizational systems.
Previous work in computational linguistics on extracting lexical semantic
information from the Web does not provide adequate representational flexibility
and hence fails to capture the full extent of human conceptual knowledge. In this
talk I will outline two probabilistic models that can account for some of the rich
organizational structure found in human language: (1) a background clustering model
of polysemy and (2) a hierarchical LDA-based approach to modeling concept organization.
These models can be used to predict contextual variation, selectional preference and
feature-saliency norms to a much higher degree of accuracy than previous approaches,
and have the potential for improving question answering, text classification, machine
translation, and information retrieval.
Bio: Joe Reisinger is a PhD candidate in the
Department of Computer Science at the University of Texas at Austin. His research interests include
large-scale latent variable modeling, structured information extraction, lexical
semantics and econometric modeling. Joe was the recipient of the 2010 Google Research
Fellowship in NLP and previously held an NSF Graduate Research Fellowship. Prior to
joining UT, he worked at IBM T.J. Watson Research Center and IBM Yamato, and more
recently has completed several internships at Google Research in Mountain View.
LTI and Voci Technologies, Inc.
Distant Speech Recognition: No Black Boxes Allowed
A complete system for distant speech recognition (DSR) typically
consists of several distinct components. While it is tempting to isolate and optimize each component
individually, experience has proven that such an approach cannot lead to optimal performance.
In this talk, I will discuss several examples of the interactions between the individual components
of a DSR system. In addition, I will describe the synergies that become possible as soon as each
component is no longer treated as a "black box". To wit, instead of treating each component as
having solely an input and an output, it is necessary to "peel back the lid" and look inside. It
is only then that it becomes apparent how the individual components of a DSR system can be jointly
optimized to obtain the best possible performance.
Among the components I will discuss are:
1. The speaker tracking system used to estimate speakers' physical locations;
2. Beamforming required to combine several signals from a microphone array to emphasize desired
speech while suppressing noise and interference;
3. Postfiltering applied to the output of the beamformer for further enhancement;
4. The recognition engine, which turns an enhanced signal into a set of word hypotheses;
5. The speaker adaptation component for adapting to the individual characteristics of a given speaker.
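As a concrete instance of component 2, the simplest beamformer is delay-and-sum: delay each microphone's signal so that sound from the target direction aligns across channels, then average, reinforcing the target while averaging down interference. A toy sketch with integer-sample delays (real systems use fractional delays, adaptive weights, and the postfilter of component 3):

```python
# Minimal delay-and-sum beamformer: advance each channel by its steering
# delay so the desired source adds coherently, then average the channels.
# Toy integer-sample delays; illustrative only.

def delay_and_sum(signals, delays):
    """signals: list of equal-length sample lists; delays: samples to
    advance each channel so the target source aligns across channels."""
    n = len(signals[0])
    out = []
    for t in range(n):
        acc, count = 0.0, 0
        for sig, d in zip(signals, delays):
            if 0 <= t + d < n:
                acc += sig[t + d]
                count += 1
        out.append(acc / count if count else 0.0)
    return out

# Two channels observing the same pulse, offset by one sample of travel time.
ch0 = [0.0, 1.0, 0.0, 0.0]
ch1 = [0.0, 0.0, 1.0, 0.0]   # the pulse arrives one sample later here
aligned = delay_and_sum([ch0, ch1], delays=[0, 1])
print(aligned)  # pulse reinforced at t=1 after alignment
```

The steering delays come from component 1, the speaker tracker, which is one of the cross-component interactions the talk highlights: a tracking error degrades the beamformer, which in turn degrades recognition.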
I will also briefly discuss other necessary components, such as those required for detecting focus
of attention, and voice prompt suppression. All of these technologies will grow in importance as
DSR systems are deployed in automotive, robotics, and manufacturing applications, where automation
will be used to achieve cooperative, synergistic man-machine interactions.
Bio: John McDonough has been doing research in automatic speech
recognition since 1993 when he joined BBN after completing his Master's at Rensselaer Polytechnic Institute
in 1992. In 1997 he returned to graduate school, and received his PhD under Fred Jelinek at Johns Hopkins
University in 2000. He then worked at the University of Karlsruhe and Saarland University, where he
established courses on distant speech recognition. John supervised all speech and audio technologies
research for the EU project CHIL, Computers in the Human Interaction Loop, and co-wrote a book on Distant
Speech Recognition during that time. Beginning in February 2010, John spent a year at Disney Research Pittsburgh
founding a research effort in distant speech recognition. Since January 2011, he has been a visiting scientist
at the Language Technologies Institute at Carnegie Mellon University. He also works with Voci Technologies,
Inc., a local CMU startup, where he applies finite-state transducer techniques to hardware-accelerated speech recognition.
Cybercasing the Joint: Language Technologies, Multimedia Retrieval, and Online Privacy.
In this talk, I present recent case studies that highlight
the potential for (multimedia) retrieval of online (social network)
data to support real-world attacks. Both language-based and
multimedia-based retrieval have rapidly emerged as fields with
highly useful applications in many different domains. Researchers from
different areas in signal processing and computer science have
invested significant effort into the development of convenient and
efficient retrieval mechanisms. While retrieval speed, flexibility,
and accuracy are still research
problems, this talk will demonstrate that they are not the only ones.
This talk aims to raise awareness for a rapidly emerging privacy
threat that we termed "cybercasing": leveraging information available
online to mount real-world attacks. Based on the initial example of
geo-tagging, I will show that while users typically realize that
sharing information, e.g., on social networks, has some implications
for their privacy, many users 1) are unaware of the full scope of the
threat they face when doing so, and 2) often do not even realize when
they publish such information. The threat is elevated by recent
developments that make systematic search for information (either
posted by humans or by sensors) and inference from multiple sources
easier than ever before. However, even with relatively high error
rates, retrieval techniques can be used effectively for different
real-world attacks by using "lop-sided" tuning; for example by
favoring low false alarm rates over high hit rates when scanning for
potential victims to attack. This talk presents a set of scenarios
demonstrating how easy it is to correlate data, especially those
based on location information, with corresponding publicly available
information for compromising a victim's privacy.
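The "lop-sided" tuning argument can be made concrete: an attacker scanning many accounts needs only a few confident hits, so the operating threshold is chosen to cap false alarms rather than to maximize recall. A toy sketch with hypothetical classifier scores:

```python
# Sketch of "lop-sided" tuning: cap the false-alarm rate and accept a
# low hit rate, since any single confident hit suffices for an attack.
# Scores below are hypothetical.

def threshold_for_max_far(neg_scores, max_far):
    """Smallest threshold whose false-alarm rate on negatives <= max_far."""
    for thr in sorted(neg_scores):
        far = sum(s > thr for s in neg_scores) / len(neg_scores)
        if far <= max_far:
            return thr
    return max(neg_scores)

# Classifier scores on known non-targets (negatives) and targets (positives).
negatives = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
positives = [0.3, 0.6, 0.8, 0.95, 0.97]

thr = threshold_for_max_far(negatives, max_far=0.1)
hits = sum(s > thr for s in positives) / len(positives)
print(thr, hits)  # high threshold: few false alarms, modest hit rate
```

A 40% hit rate would be useless for a conventional retrieval system, but applied over millions of public accounts it still yields a large pool of confidently identified victims.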
 G. Friedland, O. Vinyals, T. Darrell: "Multimodal Location
Estimation", Proceedings of ACM Multimedia 2010, pp. 1245-1251,
Florence, Italy, October 2010.
 H. Lei, J. Choi, A. Janin, and G. Friedland: "Persona Linking:
Matching Uploaders of Videos Across Accounts", IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP),
Prague, May 2011.
 G. Friedland, R. Sommer: "Cybercasing the Joint: On the Privacy
Implications of Geotagging", Usenix HotSec 2010 at the Usenix Security
Conference, Washington DC, August 2010.
 Gerald Friedland, Gregor Maier, Robin Sommer, Nicholas Weaver:
Sherlock Holmes's Evil Twin: On The Impact of Global Inference for
Online Privacy, New Security Paradigms Workshop, Marin County, CA.
Bio: Dr. Gerald Friedland is a senior research scientist at the
International Computer Science Institute, a private lab affiliated
with the University of California, Berkeley, where he leads multimedia
content analysis research, mostly focusing on "non-speech,
non-music" acoustic techniques as an aid for video analysis.
He is currently leading a group of 6 multimedia researchers supported
by NSF, DARPA, IARPA, and industry grants. Gerald has published more
than 100 peer-reviewed articles in conferences, journals, and books
and is currently authoring a new textbook on multimedia computing
together with Dr. Ramesh Jain. Gerald co-founded the IEEE
International Conference on Semantic Computing and is a proud founder
and program director of the IEEE International Summer School on
Semantic Computing at UC Berkeley. He is associate editor for ACM
Transactions on Multimedia Computing, Communications, and
Applications, is on the organizing committee of ACM Multimedia 2011,
2012, and 2014, and serves as TPC Co-Chair of IEEE ICME 2012.
He is the recipient of several research and industry recognitions,
among them the European Academic Software Award and the Multimedia
Entrepreneur Award by the German Federal Department of Economics. Most
recently, he led the team that won the ACM Multimedia Grand Challenge
in 2009. Gerald received his master's degree and doctorate (summa cum
laude) in computer science from Freie Universitaet Berlin, Germany, in
2002 and 2006, respectively.
Microsoft Research Redmond
Not Just for Kids: Enriching Information Retrieval with Reading Level Metadata
A document isn't relevant - at least, not immediately - if you can't
understand it, yet search engines have traditionally ignored the problem of
finding content at the right level of difficulty as an aspect of relevance.
Moreover, little is currently known about the nature of the Web, its users,
and how users interact with content when seen through the lens of reading
difficulty. I'll present our recent research progress in combining reading
difficulty prediction with information retrieval, including models, algorithms
and large-scale data analysis. Our results show how the availability of
reading level metadata - especially in combination with topic metadata - opens
up new and sometimes surprising possibilities for enriching search systems,
from personalizing Web search results by reading level to predicting user and
site expertise and improving result caption quality.
This talk includes joint work with Paul N. Bennett, Ryen White, Susan Dumais,
Jin Young Kim, Sebastian de la Chica, and David Sontag.
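As a point of contrast with the statistical models in the talk, the simplest readability estimates are surface formulas such as the Flesch-Kincaid grade level, which uses only sentence length and syllable counts:

```python
# The classic Flesch-Kincaid grade-level formula, the simplest kind of
# readability estimate. A useful baseline and contrast: the models in
# the talk predict difficulty statistically from word usage rather
# than from surface counts like these.
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

simple = "The cat sat. The dog ran. They play all day."
dense = ("Spectral clustering algorithms approximate combinatorial "
         "partitioning objectives via eigendecomposition.")
print(fk_grade(simple) < fk_grade(dense))  # True
```

Formulas like this break down on short, noisy Web text (queries, captions, snippets), which is one motivation for the language-model-based difficulty predictors the talk describes.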
Bio: Kevyn Collins-Thompson is a Researcher in the Context, Learning and User
Experience for Search (CLUES) group at Microsoft Research (Redmond). His
research lies in an area combining information retrieval, machine learning,
and computational linguistics, and focuses on models, algorithms, and
evaluation methods for making search technology more reliable and effective.
His recent work has explored algorithms and Web search applications for
reading level prediction; optimization strategies that reduce the risk of
applying risky retrieval algorithms like personalization and automatic query
rewriting; and educational applications of IR such as intelligent tutoring
systems. Kevyn received his Ph.D. and M.Sc. from the Language Technologies
Institute at Carnegie Mellon University, and his B.Math from the University of Waterloo.
Jointly Maximum Margin and Maximum Entropy Learning of Graphical Models
Graphical models (GMs) offer a powerful language to elegantly define expressive distributions,
and a generic computational framework to support reasoning under uncertainty in a wide range of problems. Popular paradigms for
training GMs include maximum likelihood estimation and, more recently, max-margin learning, each of which enjoys certain advantages
as well as weaknesses. For example, a maximum margin structured prediction model such as M3N lacks a straightforward
probabilistic interpretation of the learning scheme and the prediction rule. Therefore its unique advantages, such as support
vector sparsity and the kernel trick, cannot be easily conjoined with the merits of a probabilistic model, such as Bayesian
regularization, model averaging, and the ability to model hidden variables.
In this talk, I present a new general framework called Maximum Entropy Discrimination Markov
Networks (MEDN), which integrates the margin-based and likelihood-based approaches and combines and extends their merits. This
new learning paradigm naturally facilitates integration of the generative and discriminative principles under a unified
framework, and the basic strategies can be generalized to learn arbitrary GMs, such as the generative Bayesian networks, models
with structured hidden variables, and even nonparametric Bayesian models, with a desirable maximum margin effect on structured
or unstructured predictions. I will discuss a number of theoretical properties of this approach, and show applications of MEDN
to learning a wide range of GMs including: fully supervised structured i/o model, max-margin structured i/o models with hidden
variables, a max-margin LDA-style model for jointly discovering 'discriminative' latent topics and predicting document
label/score of text documents, or total scene and object categories in natural images, etc. Our empirical results strongly
suggest that, for any GM with structured or unstructured labels, MEDN always leads to a more accurate predictive GM than the
one trained under either MLE or Max Margin.
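In symbols, the maximum entropy discrimination idea underlying MEDN can be sketched as follows (my paraphrase of the standard formulation, not necessarily the talk's exact notation): instead of a single weight vector, learn a distribution q(w) over weights by solving

```latex
\min_{q(w),\, \xi \ge 0} \; \mathrm{KL}\big(q(w) \,\|\, p_0(w)\big) + C \sum_i \xi_i
\quad \text{s.t.} \quad
\mathbb{E}_q\big[F(x_i, y_i; w) - F(x_i, y; w)\big] \;\ge\; \Delta\ell_i(y) - \xi_i
\qquad \forall i,\; \forall y \ne y_i,
```

where p_0(w) is a prior, F a feature-based scoring function, and Δℓ_i(y) the structured loss; prediction is h(x) = argmax_y E_q[F(x, y; w)]. Averaging over q(w) supplies the Bayesian-style benefits (regularization, model averaging, hidden variables), while the expected-margin constraints preserve the max-margin effect.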
Joint work with Jun Zhu.
Bio: Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon
University. His principal research interests lie in the development of machine learning and statistical methodology; especially
for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible
worlds; and for building quantitative models and predictive understandings of biological systems. Professor Xing received a
Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley. His current work
involves: 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient
models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of
gene regulation, genetic variation, and disease associations; and 3) application of statistical learning in social networks,
computer vision, and natural language processing. Professor Xing has published over 140 peer-reviewed papers, and is an associate
editor of the Annals of Applied Statistics, the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), the PLoS
Journal of Computational Biology, an Action Editor of the Machine Learning journal, and a member of the DARPA Information Science
and Technology (ISAT) Advisory Group. He is a recipient of the NSF Career Award, the Alfred P. Sloan Research Fellowship in Computer
Science, and the United States Air Force Young Investigator Award, and best paper awards in a number of premier conferences including
UAI, ACL, SDM, and ISMB.