Statistical machine translation has been very successful, giving rise to a thriving industry exemplified by products like Google Translate. Yet translation systems still often fail to capture many linguistic phenomena because they model translation as simple substitution and permutation of word tokens, sometimes informed by syntax. Formally, these models are weighted relations on regular or context-free sets, which are a poor fit for many languages. But over the last several decades, computational linguists have developed more expressive mathematical models of language that exhibit high empirical coverage of annotated language data, correctly predict a variety of important linguistic phenomena in many languages, explicitly model semantics, and can be processed with efficient algorithms. I will discuss formal problems that arise in the application of these models to translation, and their solutions. I will focus on combinatory categorial grammar (CCG), but don't worry if you don't know what that is -- I will tell you everything you need to know.
Adam Lopez works on problems in computational linguistics, algorithms, formal language theory, and machine learning, with applications to problems in natural language processing, particularly machine translation. He is currently an assistant research professor at Johns Hopkins University. This fall he will join the faculty at the University of Edinburgh.
Faculty Host: Chris Dyer
jlentz [atsymbol] cs ~replace-with-a-dot~ cmu ~replace-with-a-dot~ edu