Time series expression data offer an opportunity to watch, and analyze, gene regulatory programs as they unfold. Here we address three problems in modeling dynamic gene regulation. We develop a novel set of modeling algorithms that use an Input-Output Hidden Markov Model (IOHMM) framework to build models of regulatory activity.
The first problem we address is combinatorial regulation. Genes are often regulated by combinations of transcription factors (TFs). Such combinatorial regulation plays an important role in development and enables cells to respond to different stresses. We present a new method, cDREM, that reconstructs dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and uses the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement.
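To illustrate why the sparse group Lasso yields small subsets of active TFs, the sketch below applies its standard proximal (shrinkage) step to a vector of coefficients: an element-wise soft threshold followed by a group-wise shrinkage. This is a minimal illustration of the general penalty, not cDREM's implementation. The function names, the grouping of coefficients, and the penalty weights `lam_l1` and `lam_group` are assumptions introduced here for the example.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly 0."""
    if abs(x) <= t:
        return 0.0
    return x - t if x > 0 else x + t


def sparse_group_lasso_prox(weights, groups, lam_l1, lam_group):
    """Proximal step for the penalty
        lam_l1 * sum_i |w_i| + lam_group * sum_g sqrt(|g|) * ||w_g||_2.

    weights: list of floats (hypothetical TF coefficients)
    groups:  list of index lists partitioning the weights (assumed grouping)
    """
    # Step 1: element-wise soft threshold gives sparsity within groups.
    out = [soft_threshold(w, lam_l1) for w in weights]
    # Step 2: group-wise shrinkage can zero out an entire group at once.
    for idx in groups:
        norm = math.sqrt(sum(out[i] ** 2 for i in idx))
        thresh = lam_group * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        for i in idx:
            out[i] *= scale
    return out


# Illustrative input: two groups of three coefficients each (made-up values).
example_weights = [0.9, 0.05, -0.7, 0.1, -0.08, 0.02]
example_groups = [[0, 1, 2], [3, 4, 5]]
shrunk = sparse_group_lasso_prox(example_weights, example_groups, 0.1, 0.1)
print(shrunk)
```

In this toy input the second group contains only weak coefficients and is removed entirely, while the first group keeps its two strong coefficients and drops the weak one. That combination of group-level and within-group sparsity is the property the penalty is chosen for.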
The second problem is the modeling of multiple dynamic regulatory networks from multiple time series expression experiments. It is now possible to measure a patient's gene expression during the course of a treatment. We wish to identify groups of patients with similar regulatory activity, with the expectation that these groups will relate to disease progression and treatment outcome. We present a method called SMARTS that clusters patients based on the similarity of the regulatory programs they express and then identifies TFs that may be differentially active between the groups.
In SMARTS, each dynamic regulatory model is built from a set of individual time series. Our third aim is to extend this technique to use sets of single cell gene expression experiments as the input to a regulatory model. We present a novel technique, SCAREDY-CAT, that creates such models. We use it to analyze the differentiation of lung epithelial cells and show that we can reconstruct the structure of lung epithelium differentiation in an unsupervised manner.
We tie these methods together with the release of a software package that makes them accessible to interested non-technical users. By developing methods for understanding the regulatory dynamics present in time series data, we enable the discovery of regulatory relationships that help us understand biological systems and the mechanisms underlying disease.
Ziv Bar-Joseph (Advisor)
Zoltan Oltvai (University of Pittsburgh)
Naftali Kaminski (Yale School of Medicine)