About

My current website is here.

Hello! I received my Ph.D. in Computer Science from Carnegie Mellon University. My research focuses on machine learning in cancer genomics and computational healthcare. I was fortunate to be advised by Prof. Russell S. Schwartz. In my first two years at CMU, I collaborated with Dr. William W. Cohen and Prof. Xinghua Lu. Before that, I worked with Prof. Jianyang Zeng as an undergraduate. I hold a master's degree in machine learning from CMU and a bachelor's degree in automation (with a double major in economics) from Tsinghua University.

My Chinese name is 陶一锋.

Research: Machine Learning in Cancer Genomics

Cancer arises from the accumulation of genomic alterations and develops into heterogeneous cell populations through an evolutionary process. The prognoses of cancer patients, such as survival, metastasis, and drug response, are therefore encoded in large-volume genomic data. Our research focuses on personalized cancer medicine using machine learning and phylogenetic models (Thesis):
  • Reliable phenotype inference of cancer through well-designed, interpretable machine learning models. Leveraging large-scale genomic data and external biomedical knowledge bases, we have been developing deep learning models for the accurate inference of cancer phenotypes, including transcriptome expression levels (Genomic Impact Transformer; GIT), transcription factor activities (Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism; CITRUS), and drug resistance (Contextual Attention-based Drug REsponse; CADRE). We addressed model interpretability through techniques such as attention mechanisms, which help identify driver mutations and critical biomarkers.
  • Revealing intra- and inter-tumor heterogeneity and the mechanisms of tumor progression via robust deconvolution and phylogenetic algorithms. We formulated the deconvolution of bulk tumor molecular data as a biologically inspired matrix factorization problem, and proposed a neural network (Neural Network Deconvolution; NND) and then an improved hybrid optimizer (Robust and Accurate Deconvolution; RAD) to solve it robustly and accurately. We developed and applied a Minimum Elastic Potential (MEP) algorithm to reconstruct the evolutionary trajectory from the unmixed clones. Our ongoing projects focus on integrating single-cell data for finer-resolution clone deconvolution and phylogeny inference (FISH-Deconv).
  • Improving prognostic prediction of cancer by combining machine learning and evolutionary methods. Clinicians have traditionally relied on pathological features and driver-level genomic profiles to guide treatment. However, critical clones, rather than the bulk tumor as a whole, may determine the prognosis. We explored this question by integrating evolutionary mutational features, driver-level features, and clinical features to improve the prognostic prediction of cancer. We developed an L0-regularized Cox regression model (Phylo-Risk) and found that evolutionary features account for roughly one third of all the available features, depending on cancer type and sequencing technique.
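
To make the matrix factorization view of deconvolution concrete, here is a minimal sketch in Python. It is not the NND or RAD method: it uses generic multiplicative-update non-negative matrix factorization with a simple renormalization heuristic, and it assumes a non-negative bulk matrix B (features x samples) that is approximated as C @ F, where C holds per-clone profiles and each column of F holds mixture fractions summing to one. The function name `deconvolve_bulk` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def deconvolve_bulk(B, n_clones, n_iter=300, seed=0, eps=1e-12):
    """Approximate a non-negative bulk matrix B (features x samples) as C @ F.

    C (features x n_clones) holds hypothetical per-clone profiles.
    F (n_clones x samples) holds mixture fractions; each column sums to 1.
    Generic multiplicative-update NMF plus renormalization (illustrative only).
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = B.shape
    C = rng.random((n_features, n_clones)) + eps
    F = rng.random((n_clones, n_samples)) + eps
    F /= F.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Multiplicative update for the fractions; keeps F non-negative.
        F *= (C.T @ B) / (C.T @ C @ F + eps)
        # Heuristic: rescale each sample's fractions so they sum to 1.
        F /= np.maximum(F.sum(axis=0, keepdims=True), eps)
        # Multiplicative update for the clone profiles; keeps C non-negative.
        C *= (B @ F.T) / (C @ (F @ F.T) + eps)
    return C, F


# Synthetic example: 3 clones, 50 features, 8 bulk samples (all made up).
rng = np.random.default_rng(1)
C_true = rng.random((50, 3))
F_true = rng.dirichlet(np.ones(3), size=8).T  # shape (3, 8), columns sum to 1
B = C_true @ F_true

C_hat, F_hat = deconvolve_bulk(B, n_clones=3)
rel_err = np.linalg.norm(B - C_hat @ F_hat) / np.linalg.norm(B)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The factorization is only identifiable up to permutation of the clones, so the recovered columns of C need not line up with the true ones in order. The sum-to-one rescaling is a simple heuristic rather than an exact constrained solver.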

Publications

Note: * indicates equal contribution, indicates co-corresponding author.

Genome-Driven Personalized Medicine of Cancer via Machine Learning and Phylogenetic Models
Carnegie Mellon University Ph.D. Thesis. 2021.

Prediction of Cell-Drug Sensitivities Using Deep Learning-based Graph Regularized Matrix Factorization
Proceedings of the Pacific Symposium on Biocomputing (PSB). 2022. Oral

Interpretable Deep Learning for Chromatin-Informed Inference of Transcriptional Programs Driven by Somatic Alterations Across Cancers
bioRxiv 2021.09.07.459263. 2021.

Joint Clustering of Single Cell Sequencing and Fluorescence in situ Hybridization Data for Reconstructing Clonal Heterogeneity in Cancers

Tumor Heterogeneity Assessed by Sequencing and Fluorescence in situ Hybridization (FISH) Data
Bioinformatics. 2021. Impact Factor=6.9

Assessing the Contribution of Tumor Mutational Phenotypes to Cancer Progression Risk
PLOS Computational Biology 17(3):e1008777. 2021. Impact Factor=4.4

Neural Network Deconvolution Method for Resolving Pathway-Level Progression of Tumor Clonal Expression Programs with Application to Breast Cancer Brain Metastases
Frontiers in Physiology 11:1055. 2020. Impact Factor=4.1

Predicting Drug Sensitivity of Cancer Cell Lines via Collaborative Filtering with Contextual Attention
Proceedings of the Machine Learning for Healthcare Conference (MLHC). 2020.
Proceedings of Machine Learning Research (PMLR). 126:660-684. 2020.

Robust and Accurate Deconvolution of Tumor Populations Uncovers Evolutionary Mechanisms of Breast Cancer Metastasis
Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB). 2020. Oral
Bioinformatics 36:i407-i416. 2020. Impact Factor=6.9

From Genome to Phenome: Predicting Multiple Cancer Phenotypes based on Somatic Genomic Alterations via the Genomic Impact Transformer
Proceedings of the Pacific Symposium on Biocomputing (PSB) 25:79-90. 2020. Oral

Improving Personalized Prediction of Cancer Prognoses with Clonal Evolution Models
bioRxiv 761510. 2019.

Phylogenies Derived from Matched Transcriptome Reveal the Evolution of Cell Populations and Temporal Order of Perturbed Pathways in Breast Cancer Brain Metastases
Proceedings of the International Symposium on Mathematical and Computational Oncology (ISMCO) 3-28. 2019. Oral

Effective Feature Representation for Clinical Text Concept Extraction
Proceedings of the Clinical Natural Language Processing Workshop (NAACL-ClinicalNLP) 1-14. 2019. Oral

Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning
Proceedings of the Pacific Symposium on Biocomputing (PSB) 24:112-123. 2019.

Misc

Reviewer
Teaching
Teaching Assistant
Courses