Understanding the human genome sequence, and in particular its vast non-coding regions, is a central challenge for modern molecular biology, with profound implications for understanding the genetic basis of disease. In this talk I will survey several computational approaches that I have developed for better understanding the non-coding genome. I will first describe a method, ChromHMM, that learns de novo combinatorial and spatial patterns from maps of multiple epigenetic marks using a multivariate hidden Markov model (HMM). These patterns correspond to different classes of genomic elements, which I have then used to provide cell-type-specific annotations of the human genome. I will then describe a method, ChromImpute, for imputing maps of epigenetic marks, which I have applied in the context of the Roadmap Epigenomics project to computationally predict over 4,000 epigenomic datasets, vastly expanding the coverage of the human epigenome while providing maps that are overall more robust than those obtained experimentally. I will then describe a combined computational modeling and experimental approach, Sharpr-MPRA, that can test in high throughput putative regulatory elements of interest identified from epigenomic patterns and identify within them, at high resolution, the bases that activate or repress gene expression. Finally, I will describe a new method, ConsHMM, also based on a multivariate HMM, that annotates the human genome at single-nucleotide resolution into a large number of different conservation states based on the combinatorial patterns of which species align to, and which match, the human reference genome within a multi-species sequence alignment.
Faculty Host: Russell Schwartz