During wound healing and cancer metastasis, cells are frequently observed to migrate together in collective groups. Collective cell migration involves a complex interplay between growth factors and cell-cell interactions. The mechanisms that govern the collectiveness of epithelial cell migration remain poorly defined. We are interested in how different cell migration guiding signals are produced and sensed to yield distinct cell movement behavior. Using an array of live-cell fluorescent biosensors and mathematical modeling, we are identifying the molecular mechanisms that may explain the collectiveness of epithelial sheet movement in response to growth factor stimulation. We showed that spatially constrained growth factor signaling at the leading edge of an epithelial sheet enables wound-directed collective migration. Cell adherens junctions restrain individual cell migration but promote growth factor-induced collective migration. Disruption of the actin cytoskeleton or loss of adherens junctions activates autocrine EGFR signaling via TACE, and elevated autocrine signaling increases cell proliferation and motility. The relevance of these findings for designing novel therapeutic approaches will be discussed.


Understanding the human genome sequence, and in particular its vast non-coding regions, is a central challenge for modern molecular biology with profound implications for understanding the genetic basis of disease. In this talk I will survey several computational approaches that I have developed for better understanding the non-coding genome. I will first describe a method, ChromHMM, that learns de novo combinatorial and spatial patterns from maps of multiple epigenetic marks using a multivariate hidden Markov model (HMM). These patterns correspond to different classes of genomic elements, which I have then used to provide cell type specific annotations of the human genome. I will then describe a method, ChromImpute, to impute maps of epigenetic marks. I have applied it in the context of the Roadmap Epigenomics project to computationally predict over 4000 epigenomic datasets, vastly accelerating the coverage of the human epigenome while providing overall more robust maps than have been obtained experimentally. I will then describe a combined computational modeling and experimental approach, Sharpr-MPRA, that can test, in high throughput, putative regulatory elements of interest identified from epigenomic patterns and pinpoint, at high resolution, the bases within them that activate or repress gene expression. Finally, I will describe a new method, ConsHMM, also based on a multivariate HMM, that annotates the human genome at single nucleotide resolution into a large number of different conservation states based on the combinatorial patterns of which species align to and which match the human reference genome within a multi-species sequence alignment.

Faculty Host: Russell Schwartz

Prions are a paradigm-shifting mechanism of inheritance in which phenotypes are encoded by self-templating protein conformations rather than nucleic acids. We examined the breadth of protein-based inheritance across the yeast proteome by assessing the ability of nearly every open reading frame to induce heritable traits. Transient overexpression of nearly 50 proteins created traits that remained heritable long after their expression returned to normal. These traits were beneficial, had prion-like patterns of inheritance, were common in wild yeasts, and could be transmitted to naive cells with protein alone. Most inducing proteins were not known prions and did not form amyloid. Instead, they are highly enriched in nucleic acid binding proteins with large intrinsically disordered domains that have been widely conserved across evolution. Thus, our data establish a common type of protein-based inheritance through which intrinsically disordered proteins can drive the emergence of new traits and adaptive opportunities.



Using genome-wide data for phylogenetic inference and analysis has become commonplace in the post-genomic era, giving rise to the field of phylogenomics. The multispecies coalescent (MSC) model has emerged as the main stochastic process that helps capture the intricate relationship between species trees and gene trees.  Combined with models of sequence evolution, the MSC can be viewed as a generative model of genomic sequence data in the context of a (species) phylogenetic tree.

A significant outcome of the use of genome-wide data has been the increasing evidence, or hypotheses, of reticulation (e.g., hybridization) during the evolution of various groups of eukaryotic species. Reticulate evolutionary histories are best represented as phylogenetic networks, which extend the tree model to allow for admixtures of genetic material. In this talk, I will describe the multispecies network coalescent (MSNC) model, which extends the MSC model so that it operates within the branches of a phylogenetic network. This extended model naturally allows for modeling vertical and horizontal evolutionary processes acting within and across species boundaries. In particular, it simultaneously accounts for gene tree incongruence across loci due to both hybridization and incomplete lineage sorting. I will then describe a likelihood function for this model, as well as a method for Bayesian sampling of phylogenetic networks and their parameters using reversible-jump Markov chain Monte Carlo (RJMCMC). All the methods I describe have been implemented in our open-source software package, PhyloNet, which is publicly available at http://bioinfo.cs.rice.edu/phylonet.

Luay Nakhleh is Professor and Chair of the Computer Science Department at Rice University. He received the B.Sc. degree in Computer Science from the Technion (Israel), the Master’s degree in Computer Science from Texas A&M University, and the Ph.D. degree in Computer Science from The University of Texas at Austin. He conducts research in the areas of bioinformatics and computational biology, focusing mainly on questions in evolutionary biology. Luay is a recipient of the DoE CAREER award, the NSF CAREER award, the Sloan Fellowship, and the Guggenheim Fellowship.

We present a generative probabilistic approach to the discovery of disease subtypes determined by genetic variants. In many diseases, multiple types of pathology may present simultaneously in a patient, making quantification of the disease challenging. Our method seeks common co-occurring image and genetic patterns in a population as a way to model these two different data types jointly. We assume that each patient is a mixture of multiple disease subtypes and use the joint generative model of image and genetic markers to identify disease subtypes guided by known genetic influences. Our model is based on a variant of the so-called topic models that uncover the latent structure in a collection of data. We derive an efficient variational inference algorithm to extract patterns of co-occurrence and to quantify the presence of heterogeneous disease processes in each patient. We evaluate the method on simulated data and illustrate its use in the context of Chronic Obstructive Pulmonary Disease (COPD) to characterize the relationship between image and genetic signatures of COPD subtypes in a large patient cohort.



The bulk of translational cancer research to date has focused on somatic mutations in protein coding regions to identify putative oncogenic drivers. However, recent studies have shown that enhancer activity plays an important role in specifying and maintaining oncogenic cell state. Here, we present a mapping and analysis of the transcriptional cell state of acute myeloid leukemia (AML) via the H3K27Ac landscape, gene expression, and somatic mutations from 62 AML patients. The goal of this work is to identify the recurrent enhancer drivers of oncogenic cell states and translate that knowledge of the epigenome to discover novel therapeutic opportunities.

Through a computational deconvolution of enhancer maps, we identify 6 epigenomically defined patient subtypes of AML. We demonstrate that while certain genetic lesions, such as MLL translocations and NPM1 mutations, do correlate with these subtypes, the epigenome provides a novel stratification of patients that is not fully specified by combinations of mutations. We develop a novel scoring of myeloid differentiation based on the enhancer landscape of healthy cells and use this score to show that enhancer subtypes are associated with the differentiation state of the underlying AML blasts. Enhancer subtypes are also clinically relevant as they are predictive of divergent overall survival, varying from a median overall survival of 9.2 months to a median overall survival that was not reached in our cohort. By using individual enhancer activity as a novel biomarker, we are able to predict the effect of existing therapies on cell line models. Finally, a network analysis of the super-enhancers underlying the patient subtypes suggests that one subtype of AML is specified in part by enhancer activation of the retinoic acid receptor alpha gene (RARA), and we demonstrate that RARA enhancer strength in cell-line and patient-derived xenograft models is predictive of response to a first-in-class selective RARα agonist, SY-1425 (tamibarotene). Taken together, these findings highlight the importance and utility of understanding the enhancer landscape for patient stratification and the development of novel therapies.

In addition to presenting his research on cancer epigenomics, Matthew will provide his perspective on careers in Comp Bio research in the pharmaceutical industry.

Dr. Matthew Eaton is a principal scientist at Syros Pharmaceuticals, where he manages a computational biology translational research team. He earned his undergraduate degree in Computer Science and Philosophy from Wesleyan University. He completed a PhD in Computational Biology at Duke University working in the laboratory of Dr. David MacAlpine. He conducted postdoctoral research at MIT under Dr. Manolis Kellis developing methods to integrate diverse types of epigenomic data.


Despite rapid progress in the understanding and treatment of disease over the past 100 years, the diagnosis and treatment of cancer have become a focal point for basic science research. As a result, advances have been made in quantifying the myriad changes in tumor genomes, transcriptomes, epigenomes, and metagenomes as compared to healthy tissue.

Specific to the work of this thesis, technical advances have led to more robust quantification of RNA expression states via RNA-seq and of DNA copy number via DNA-seq. These approaches allow for the measurement of the state of tens of thousands of genes in a sample. Moreover, the enhanced quantification has revealed heterogeneity among tumors.

Thesis Committee:
Russell Schwartz (Advisor)
Adrian Lee
Robin Lee
Jessica Zhang
Jian Ma

Guardant Health is a late-stage startup in Redwood City focusing on analysis of circulating tumor DNA (ctDNA) in blood, using optimized laboratory assays, vast data sets, and advanced analytics. The Guardant360 assay covers 73 cancer genes and is now the most widely ordered comprehensive liquid biopsy test used to assess treatment options for patients with advanced stage solid tumors. It utilizes a proprietary digital sequencing approach with molecular barcoding for error correction, resulting in highly sensitive and specific detection of mutations down to one or two molecules. An in-house bioinformatics pipeline was developed to detect and report somatic point mutations, short insertions/deletions, copy number amplifications and gene rearrangements.

To date, Guardant360 has been run on over 35,000 patient samples, enabling ongoing research and development using machine learning and other mining techniques to enhance detection capabilities and understand cancer genomics. An initial study of 15,000 samples revealed that the landscape of somatic mutations in ctDNA is concordant with that of large tissue sequencing studies such as TCGA. However, ctDNA data is representative of later stage cancer than TCGA and, as such, reveals insights into drug resistance and mutational heterogeneity.

Additional assays are currently being developed at Guardant Health, including Project LUNAR – an effort to apply the technology to early cancer detection and recurrence monitoring, as well as GuardantOMNI™ – a 500-gene panel developed in partnership with large pharma for immuno-oncology applications. These assays will broaden the gene content and tumor representation in Guardant’s database and allow for further large-scale analyses.

Using these data, Guardant is actively researching ctDNA fragmentation patterns, which exhibit a non-random distribution of length and placement due to the nature of cleavage around nucleosomes. Additional signal can be gained from mining these fragmentation patterns across tumor types in the largest available cohort of cancer patients to uncover chromatin dynamics and enhance detection sensitivity.

Guardant Health is tackling some of the most impactful problems across cancer care and genomics using advanced technology and analytics. By continuing to leverage the data collected, Guardant aims to innovate methodologies and discoveries in this field.


What makes each species unique? My research aims to understand how changes in DNA sequences translate into the evolution of phenotypes. My approach consists in modeling the evolution of molecular networks that mediate genotype-phenotype relationships. I will describe findings related to the evolution of mutations in protein-coding genes and in non-genic sequences, as well as a special case where mutations in non-genic sequences give rise to novel, species-specific, protein-coding genes.


Genome-wide association studies (GWAS) have linked hundreds of common germline variants to inherited predisposition for specific cancers. However, determining the precise biological mechanism by which these loci lead to cancer susceptibility has proven challenging. More recently, there have been reports of specific germline haplotypes that increase the probability that a tumor acquires a specific mutation, but few cancer GWAS thus far have collected both germline and tumor genomes. Using matched germline and tumor genomic data for nearly 6000 patients from The Cancer Genome Atlas (TCGA), it was possible to systematically screen for and validate 412 associations between germline loci and tumor site as well as for a subset of common tumor genotypes involving known cancer genes. By this approach, we sought to evaluate the extent to which the germline influences where and how tumors develop. Among germline-somatic interactions, we found germline variants in RBFOX1 that increase incidence of SF3B1 somatic mutation by eight-fold via functional alterations in RNA splicing. Similarly, 19p13.3 variants were associated with a four-fold increased likelihood of somatic mutations in PTEN. In support of this association, we found that PTEN knock-down sensitized the MTOR pathway to high expression of the 19p13.3 gene GNA11. Finally, we observed that stratifying patients by germline polymorphisms exposes distinct somatic mutation landscapes, implicating new cancer genes. These associations, obtained by comparing similar tumors with distinct genomic characteristics, provide a new perspective on cancer risk by tying the germline locus to a specific event in the tumor. The identified interactions suggest much more specific hypotheses about how a particular germline locus contributes to disease, thereby providing new clues to unravel the biology underlying inherited cancer risk.
Our work contributes to accumulating evidence that the germline biases the emergence of specific tumor genotypes, and it suggests that it may be possible to predict how an individual’s tumor will develop, potentially allowing a shift from reactive approaches toward more proactive approaches for planning therapeutic strategies.

The Carter Lab is a bioinformatics and computational biology lab focused on developing strategies to 1) model the impact of somatic mutations on intracellular biological processes, 2) identify genetic variants that contribute to disease predisposition, 3) quantify the influence of germline polymorphism on somatic tumor phenotypes, 4) investigate the biological networks by which cancer cells transduce information about their environment, and 5) inform precision cancer therapy from genomic data.

Host: Anne-Ruxandra Carvunis

