Language Technologies Thesis Defense

  • Ph.D. Student
  • Language Technologies Institute
  • Carnegie Mellon University
Thesis Orals

Empowering Probabilistic Inference with Stochastic Deep Neural Networks

Probabilistic models are powerful tools for understanding real-world data from various domains, including images, natural language text, audio, and time series. More flexible and expressive probabilistic models are often preferred for accurate modeling; however, as model architectures grow more complex, effective learning and inference become correspondingly more difficult. Meanwhile, recent advances in deep neural networks for both supervised and unsupervised learning have shown clear advantages over traditional shallow models in learning flexible deterministic mappings. Integrating deep neural networks into probabilistic modeling has thus become an important research direction. Although existing work has opened the door to modeling stochasticity in data with deep neural networks, it may still suffer from limitations: a) the family of distributions that can be captured for both modeling and inference is limited; b) probabilistic models for some important discrete structures, such as permutations, have not yet been extensively studied; and c) applications to discrete and continuous sequential data, such as natural language and time series, leave significant room for improvement.

In this thesis, we propose simple yet effective methods to address the above limitations in incorporating stochastic deep neural networks into probabilistic modeling. Specifically, we propose: a) to enrich the family of distributions used for probabilistic modeling and inference; b) to define probabilistic models over certain important discrete structures and to demonstrate how learning and inference can be performed over them; and c) to develop significantly better probabilistic models for both discrete and continuous sequential data, such as natural language and continuous time series. Experimental results demonstrate the effectiveness of the proposed approaches.

Thesis Committee:
Yiming Yang (Co-Chair)
Jaime Carbonell (Co-Chair)
Pradeep Ravikumar
John Paisley (Columbia University)
