Language Technologies Thesis Proposal
- Newell-Simon Hall
- SUBHODEEP MOITRA
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Feature Learning and Graphical Models for Protein Sequences
Machine learning methods rely heavily on good features, whether hand-engineered or learned. We study three problems in the context of protein sequences:
(1) drug cocktail design
(2) studying allostery in GPCRs and
(3) generative modeling of protein families.
We show that the core challenges underlying these tasks relate to effective feature selection, feature interpretation, and feature learning, respectively.
We address the drug cocktail design problem by providing solutions for feature selection in the context of large-scale data. To study allostery in GPCRs, we employ structure learning in Markov random fields and interpret the learned features from a biological perspective. We investigate deep architectures for unsupervised learning of latent representations of protein families, and show preliminary results using Restricted Boltzmann Machines (RBMs). We propose to build a deep architecture from RBMs using Deep Boltzmann Machines (DBMs). Additionally, we propose Locally Connected Deep Boltzmann Machines (LC-DBMs), which employ sparse structure learning to trade off model agnosticism against prior knowledge.
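As a minimal sketch of the RBM building block mentioned above, the following trains a binary RBM with one-step contrastive divergence (CD-1) on toy binary data. All names, hyperparameters, and the toy dataset are illustrative assumptions, not the proposal's actual model; real protein sequences would first be one-hot encoded over the amino-acid alphabet.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary-binary Restricted Boltzmann Machine (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        ph0 = sigmoid(v0 @ self.W + self.c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to visibles, then hiddens.
        pv1 = sigmoid(h0 @ self.W.T + self.b)
        ph1 = sigmoid(pv1 @ self.W + self.c)
        # Update from the difference of data and model correlations.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error

# Toy usage: 100 binary "sequences" of length 12, 8 hidden features.
data = (rng.random((100, 12)) < 0.3).astype(float)
rbm = RBM(n_visible=12, n_hidden=8)
errors = [rbm.cd1_step(data) for _ in range(50)]
```

Stacking such RBMs and then jointly fine-tuning all layers is, roughly, what distinguishes a DBM from greedy layer-wise pretraining alone; the LC-DBM variant would additionally restrict `W` to a sparse connectivity pattern learned from data.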
Chris Langmead (Chair)
Hetunandan Kamisetty (Facebook)