Although all cells in our body contain the same genetic material in their DNA, they can perform vastly different functions by selectively expressing subsets of their genes. Cell-type-specific gene regulation is achieved through an interplay between regulatory proteins, such as transcription factors, and epigenetic mechanisms, which affect the higher-level organization of the genome. The interactions between these regulatory components and their dependence on DNA sequence information are only partially characterized. A better understanding of condition-specific regulatory mechanisms is important for uncovering the causes of genetic diseases and for identifying potential targets for intervention.
Using computational techniques to analyze the variability in epigenetic state across genomic contexts and individuals, we highlighted, to our knowledge for the first time, the extensive plasticity of the epigenetic landscape. This work demonstrated the link between genetic and epigenetic variability, and also showed that the effects of this variation on gene regulation are highly combinatorial. Motivated by these results, we developed a novel machine learning framework for discovering cell-type-specific rules of regulation based on both the expression patterns of regulators and DNA sequence information. Unlike previous work in this field, our method incorporates the cell-type-specific activity of distal regulatory elements, such as enhancers, and takes advantage of prior knowledge regarding protein interactions. Using large-scale datasets from the Roadmap Epigenomics and ENCODE Projects, we constructed a regulatory map of a large number of human tissues. Our model achieves high predictive power and discovers both known and novel cell-type-specific regulators, as well as context-specific interactions between them.