Language Technologies Ph.D. Thesis Defense
- Gates Hillman Centers
- Traffic21 Classroom 6501
- AUSTIN MATTHEWS
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Linguistic Knowledge for Neural Language Generation and Machine Translation
Recurrent neural networks are exceptionally good models of distributions over natural language sentences, and are deployed in a wide range of applications. However, RNNs are general-purpose function learners, capable of representing any distribution, whereas the space of possible natural languages is narrowly constrained. This thesis uses insights from linguistic theories that characterize these constraints to inform the neural architectures used for language modelling, seeking models that make more effective use of limited amounts of data. Since linguistic theories are incomplete, we develop models that are able to exploit linguistic knowledge while still retaining the generality of the neural models they augment.
First, we introduce a language model that captures sub-word morphological processes via analyzers hand-crafted by linguistic experts. Our model operates at the raw word, character, and morpheme levels, both to encode and condition on previous words and to construct its predicted next word. It is thus fully open-vocabulary, capable of producing any token admitted by a language's alphabet. These properties make it well suited to modelling languages with potentially unbounded vocabulary size, such as Turkish and Finnish.
Second, we present a pair of dependency-based language models, leveraging the hierarchical processes that construct sentences from words. Our models construct syntax trees either top-down or bottom-up, jointly learning language modelling and parsing. We find that these models make good parsers, but that dependencies are less effective than phrase-structure trees for modelling language.
Third, we apply syntax to machine translation, where data scarcity necessitates sample-efficient models. We develop a fully neural tree-to-tree translation system and ablate the model, demonstrating the effects of a syntax-based encoder and decoder separately. We find source-side syntax promising, and show that inference under neural models is trapped in a local optimum wherein biased models perversely synergize with poor inference procedures.
Thesis Committee:
Chris Dyer (Chair)
Alon Lavie (Unbabel)
Jonathan May (University of Southern California, Information Sciences Institute)