Recent natural language processing (NLP) research has increasingly focused on deep learning methods, which have produced superior results on various NLP tasks. Deep NLP models are usually based on dense vector representations of the input and can automatically extract multi-scale features from human-annotated data. However, human annotations are expensive and often unevenly distributed across languages, domains, genres, and styles.
This thesis focuses on multiple aspects of cross-language and cross-style mapping in text, addressing the limitations of existing methods and improving on state-of-the-art results when sufficient labeled data is not available. By developing both task-oriented transfer learning models (e.g., for cross-language classification) and generic methods for mapping among embedded words or sentences, the key contribution of this thesis is a set of novel approaches to leveraging unlabeled text data for effective and efficient mapping across languages or styles.
Yiming Yan (Chair)
Ming Zhou (Microsoft Research Asia)