LTI Thesis Proposal
- Gates & Hillman Centers
- Reddy Conference Room 4405
- MEGHANA KSHIRSAGAR
- Language Technologies Institute
- Carnegie Mellon University
Combine and Conquer: Transfer Learning Approaches for Modeling Diseases
Proteins are the workhorses of the cellular machinery. Disease-causing pathogens such as bacteria and viruses introduce their proteins into host cells, where they interact with the host's proteins and enable the pathogen to obtain nutrients, replicate, and survive inside the host. Systems-biology approaches to studying such infectious diseases use biochemical experiments to analyze these molecular-level interactions. Often, multiple diseases involve organisms that are phylogenetically related or share some biological properties. For instance, viruses that cause similar diseases will employ similar strategies to infect the host cells. Knowledge can therefore be shared across these experimental studies to better understand various biological phenomena.
From a computational perspective, the performance of the algorithmic and statistical methods that model these data can be improved if they are made "aware" of the underlying thread of similarity that runs across multiple studies. This transfer of knowledge will not only overcome data scarcity for sparsely studied diseases but also help analyze commonalities and differences between diseases at a macro level. More importantly, newly arising diseases for which no data are available can be scrutinized in this joint framework to derive an important initial understanding.
The proposed dissertation develops and applies transfer-learning approaches that combine knowledge from experimental studies of several diseases to build stronger predictive models. To integrate data across diseases, we develop a task-regularization-based method in which the regularizer is derived from biological hypotheses relating the diseases. For diseases with zero available data, we build models based on instance re-weighting and kernels. We also explore a feature-imputation method that infers missing values by sampling feature functions from other, similar diseases. We demonstrate the utility of these methods in generating better host-pathogen protein-protein interaction predictions.
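As a rough illustration of the task-regularization idea, the sketch below trains one logistic-regression model per disease (task) and penalizes each task's weight vector for straying from the mean of all task weights. This is a generic mean-regularized multi-task formulation and an assumption on our part, not the regularizer of the proposal, which is derived from biological hypotheses that the abstract does not spell out. The function name `fit_multitask` and the parameters `lam`, `lr`, and `steps` are hypothetical.

```python
import numpy as np


def fit_multitask(tasks, lam=1.0, lr=0.1, steps=500):
    """Jointly fit one logistic-regression weight vector per task.

    tasks: list of (X, y) pairs, X of shape (n_t, d), y in {0, 1}.
    lam:   strength of the coupling penalty (assumed form, see below).

    Objective (an illustrative stand-in, not the proposal's regularizer):
        sum_t logloss_t(w_t) + (lam / 2) * sum_t ||w_t - mean(w)||^2
    """
    n_tasks = len(tasks)
    n_features = tasks[0][0].shape[1]
    W = np.zeros((n_tasks, n_features))

    for _ in range(steps):
        w_mean = W.mean(axis=0)
        grad = np.zeros_like(W)
        for t, (X, y) in enumerate(tasks):
            p = 1.0 / (1.0 + np.exp(-(X @ W[t])))  # predicted probabilities
            # Average log-loss gradient plus pull toward the shared mean.
            grad[t] = X.T @ (p - y) / len(y) + lam * (W[t] - w_mean)
        W -= lr * grad
    return W
```

With `lam=0` the tasks are fit independently; increasing `lam` pulls a sparsely studied task's weights toward those of better-studied tasks, which is the sense in which knowledge is shared.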
Jaime G. Carbonell (Chair)
Gunnar Raetsch (Sloan-Kettering)