The ultimate goal of generative modeling is to model the probability of the world, either implicitly or explicitly. In practice, researchers devise models to estimate the probability of data, which is often unlabeled in its natural form, such as text corpora and images. Generative modeling not only serves as a bridge towards characterizing and understanding the world from a probabilistic perspective, but also has the benefit of learning transferable features from unlabeled data. This thesis proposes novel deep learning architectures for generative modeling, along with semi-supervised learning algorithms that leverage generative modeling on unlabeled data to improve performance on downstream tasks.
Specifically, the thesis consists of two parts: better architectures to improve generative modeling, and applications of generative modeling in semi-supervised learning. In the first part, we identify an expressiveness bottleneck of prior neural language models and propose a high-rank language model, the Mixture of Softmaxes (MoS), to break through this bottleneck. We then propose a second high-rank language model that trains faster than MoS while retaining the capacity to break the bottleneck. In the second part, we present four semi-supervised learning algorithms based on generative approaches: generating low-density adversarial samples, generating natural language questions given the context, generating random walk paths on a graph, and language modeling.
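To make the MoS idea concrete: a single softmax over a $d$-dimensional hidden state limits the rank of the log-probability matrix, whereas mixing $K$ softmaxes with context-dependent weights can produce distributions outside that low-rank family. Below is a minimal NumPy sketch of the mixture computation; the function and variable names are illustrative, not taken from the thesis's actual code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mos_probs(h, Ws, prior_W):
    """Mixture of Softmaxes: p(x | c) = sum_k pi_k(c) * softmax(h W_k).

    h:       (d,) context vector
    Ws:      list of K (d, V) output projection matrices
    prior_W: (d, K) matrix producing the mixture weights pi(c)
    (Illustrative sketch; real implementations also project h into
    per-component context vectors before the softmax.)
    """
    pi = softmax(h @ prior_W)                        # (K,) mixture weights
    comps = np.stack([softmax(h @ W) for W in Ws])   # (K, V) component dists
    return pi @ comps                                # (V,) mixed distribution
```

The key point is that the mixture is taken in probability space, after the softmax nonlinearity, so the resulting log-probability matrix over contexts is no longer constrained to rank roughly $d$.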
Ruslan Salakhutdinov (Chair)
William W. Cohen (Co-Chair)
Jason Weston (Facebook AI Research)
Copy of Thesis Proposal Document