

We develop machine learning algorithms to understand human genome structure and function as well as their changes in different cellular conditions, during disease development, and across mammalian species. Our recent work includes machine learning architectures for integrative analysis and representation learning of 3D genome organization, single-cell epigenomics, gene regulation, and complex molecular interactomes. More recently, we have been exploring machine learning algorithms to integrate data from multiple modalities and methods for biomedical data that are interpretable and generalizable.


Current Projects and Interests
Genome Organization

Interphase chromosomes in higher eukaryotic cells are folded in the nucleus, leading to 3D genome organization. However, the principles of such complex organization and its functional impacts are poorly understood. Leveraging new genome-wide mapping technologies, we are developing new machine learning algorithms to probe 3D genome organization: (1) Revealing genome-wide compartmentalization patterns relative to multiple nuclear bodies. (2) Understanding the principles of spatial genome organization and its impact on gene regulation. (3) Probing chromatin interaction patterns at single-cell and single-nucleus resolution.
[See selected publications in 3D genome]

Single-cell Epigenomics

The compositions of different cells in various human tissues remain poorly understood. New single-cell assays such as single-cell Hi-C and spatial transcriptomics based on multiplexed imaging and sequencing can reveal different aspects of individual cells within the tissue context. However, the computational methods to fully utilize such new data are lagging behind. We are developing new machine learning frameworks for: (1) Modeling spatial transcriptome data to express the interplay of spatial and intrinsic factors that comprise cell identity. (2) Integrating multimodal single-cell assays to reveal cellular heterogeneity.
[See selected publications in single-cell epigenomics]

Comparative Genomics and Cancer Genomics

Ultimately, human biology must be understood in the context of evolution. The whole genome sequences of different species provide us with unprecedented opportunities to elucidate the trajectory of genome evolution and the gene regulation variations that result in phenotypic diversity. We are working on new algorithms to advance a few key areas in comparative genomics and cancer genomics: (1) Discovering gene regulatory elements and their functional roles in the human genome. (2) Understanding 3D genome evolution in mammals. (3) Discovering key perturbations in transcriptional regulation and the epigenome in cancer.
[See selected publications in comparative genomics] [cancer genomics]

Generalizable Machine Learning Methods for Biomedical Data

There is a major gap between the cohesive integration of multimodal biomedical datasets and utilization of the full potential of machine learning technologies. We are exploring novel and generic machine learning algorithms for various biomedical data. We have a particular interest in high-dimensional graphical models and new representation learning architectures. We recently developed rotation-equivariant and rotation-invariant neural networks and state-of-the-art architectures for hypergraph representation learning. Importantly, we are developing methods that are interpretable, generalizable, and ethics-aware.
[See selected publications in machine learning]

Research Support