
Language Technologies Thesis Proposal

Ph.D. Student
Language Technologies Institute
Carnegie Mellon University
Feature Learning and Graphical Models for Protein Sequences
Tuesday, April 29, 2014 - 12:30pm
1507 Newell-Simon Hall
Abstract:

Machine learning methods rely heavily on using and learning good features. We study three problems in the context of protein sequences:

(1) drug cocktail design,
(2) studying allostery in GPCRs, and
(3) generative modeling of protein families.

We show that the core challenges underlying these tasks relate to effective feature selection, feature interpretation, and feature learning, respectively.

We address the drug cocktail design problem by providing solutions for feature selection on large-scale data. To study allostery in GPCRs, we employ structure learning in Markov random fields and interpret the learned features from a biological perspective. For protein families, we investigate deep architectures for unsupervised learning of latent feature representations. We show preliminary results using Restricted Boltzmann Machines (RBMs) and propose to build a deeper architecture from RBMs in the form of Deep Boltzmann Machines (DBMs). Additionally, we propose Locally Connected Deep Boltzmann Machines (LC-DBMs), which employ sparse structure learning to trade off model agnosticism against prior knowledge.

Thesis Committee:
Chris Langmead (Chair)
Jaime Carbonell
Bhiksha Raj
Hetunandan Kamisetty (Facebook)

Thesis Proposal Document

For More Information, Please Contact:

staceyy [atsymbol] cs ~replace-with-a-dot~ cmu ~replace-with-a-dot~ edu