Language Technologies Thesis Proposal
- Gates Hillman Centers
- AUSTIN MATTHEWS
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Linguistic Knowledge for Neural Language Generation and Machine Translation
Recurrent neural networks (RNNs) are exceptionally good models of distributions over natural language sentences, and they are deployed in a wide range of applications that require the generation of natural language outputs. However, RNNs are general-purpose function learners that, given sufficient capacity, are capable of representing any distribution, whereas the space of possible natural languages is narrowly constrained. Linguistic theory has been concerned with characterizing these constraints, with a particular eye toward explaining the uniformity with which children acquire their first languages, despite receiving relatively little linguistic input. This thesis uses insights from linguistic theory to inform the neural architectures and generation processes used to model natural language, seeking models that make more effective use of limited amounts of training data. Since linguistic theories are incomplete, a central goal is developing models that are able to exploit explicit linguistic knowledge while still retaining the generality and flexibility of the neural network models they augment.
In particular, this thesis focuses on modeling word formation using linguistic knowledge about morphological processes, encoded as finite-state transducers, and on syntactic theories that construct sequences of words as the outputs of hierarchical branching processes. We present a model capable of conditioning on and emitting words at several levels of granularity, including the raw word, character, and morpheme levels. We further present a model that generates sentences using hierarchical structure, jointly learning language modeling and parsing. We evaluate each model on several NLP tasks, then combine them and condition on a foreign-language input sentence to create a linguistically aware neural machine translation system that excels at translating into traditionally difficult languages with complex word formation paradigms and syntax very different from that of English.
The major contributions of this thesis are as follows: (i) a morphologically aware open-vocabulary language model; (ii) a dependency-based language model for generation and parsing; (iii) a syntax-aware attention mechanism for machine translation; and (iv) an MT system incorporating all of the above.
Chris Dyer (Chair)
Jonathan May (ISI)