We develop algorithms, especially machine learning methods, to study the structure and function of the human genome and cellular organization and their implications for evolution, health, and disease. We have made contributions to the development of methods for genome sequence comparisons. Our recent work has contributed new high-dimensional probabilistic models and new representation learning architectures for studying nuclear genome organization and its evolution, single-cell epigenomics, and spatial transcriptomics. We have also explored interpretable and generalizable machine learning algorithms for integrating multimodal biomedical data. We have a keen interest in developing large language models (LLMs) for genomes and cells.


Current Projects and Interests
Genome Organization

Interphase chromosomes in higher eukaryotic cells are organized in a complex 3D structure in the nucleus, but the principles behind this organization and its functional impacts are not well understood. Our research leverages new genome-wide mapping technologies and machine learning algorithms to study 3D genome organization and its impact on gene regulation. We aim to: (1) Uncover genome-wide compartmentalization patterns relative to multiple nuclear bodies. (2) Investigate the principles of spatial genome organization and its effects on gene regulation. (3) Explore chromatin interaction patterns at single-cell resolution and their effect on cell type-specific function.
[See selected publications in 3D epigenome]

Single-cell Epigenomics

The composition of different cell types across human tissues remains poorly understood. Emerging single-cell epigenomic assays such as single-cell Hi-C, together with spatial transcriptomics based on multiplexed imaging and sequencing, can provide insight into individual cells within their tissue context. However, computational methods that fully utilize these new data lag behind. Our research focuses on developing machine learning frameworks for: (1) Modeling spatial transcriptome data to understand the interplay of intrinsic and spatial factors that contribute to cell identity. (2) Integrating multimodal single-cell assays to uncover cellular heterogeneity.
[See selected publications in single-cell biology]

Comparative Genomics

Ultimately, human biology must be understood in the context of evolution. Our research focuses on utilizing whole genome sequences of various species to study genome and epigenome evolution, gene regulation, and their impact on phenotypic diversity in the context of human biology. We aim to advance comparative genomics and cancer genomics through the development of new algorithms to: (1) Uncover gene regulatory elements and their functions. (2) Study 3D epigenome evolution in mammals. (3) Discover crucial disruptions in transcriptional regulation and the epigenome in human diseases such as cancer, informed by insights from genome evolution.
[See selected publications in comparative genomics]

Generalizable Machine Learning Methods for Biomedical Data

We aim to close the gap between the integration of multimodal biomedical datasets and the full potential of machine learning by exploring novel algorithms for various biomedical contexts. Our focus is on high-dimensional probabilistic models and new representation learning architectures. Our methods prioritize interpretability, generalizability, and ethical considerations. We are also working on large language models (LLMs) for studying genomes and cells.
[See selected publications in machine learning]

Research Support