I am a third-year Computer Science Ph.D. student at Carnegie Mellon University, advised by Eric Xing. I am interested in statistical machine learning for healthcare and the theoretical problems that arise from the constraints of real-world data. These interests include building interpretable, robust prediction systems for structured genomic, medical, and other types of data.

I am currently co-founding the AI+ club at CMU. If you are interested in sponsoring talks, contact me by email.

Contact me on Twitter, by email, or in person (GHC 8127).


You can find a list of my publications on Google Scholar, Semantic Scholar, or DBLP.

Papers in Review

Pre-prints available on request.
Hybrid Subspace Learning for High-Dimensional Genomic Data
Micol Marchetti-Bowick, Benjamin J. Lengerich, Ankur Parikh, Eric P. Xing
Precision Lasso: Accounting for Correlations and Linear Dependencies in High-Dimensional Genomic Data
Haohan Wang, Benjamin J. Lengerich, Bryon Aragam, Eric P. Xing
Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations
Benjamin J. Lengerich, Andrew L. Maas, Christopher Potts

Publications and Presentations

Personalized Regression Enables Sample-Specific Pan-Cancer Analysis
Benjamin J. Lengerich, Bryon Aragam, Eric P. Xing
Opportunities and Obstacles for Deep Learning in Biology and Medicine
Author order was determined by a randomized algorithm.
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H.S. Segler, Anthony Gitter, Casey S. Greene

Towards Visual Explanations for Convolutional Neural Networks via Input Resampling
Benjamin J. Lengerich*, Sandeep Konam*, Eric P. Xing, Stephanie Rosenthal, Manuela Veloso
ICML Workshop on Visualization for Deep Learning
GenAMap on the Web: Intuitive and Scalable Machine Learning for Structured Association Mapping
Benjamin J. Lengerich*, Haohan Wang*, Min Kyung Lee, Eric P. Xing
Experimental and Computational Mutagenesis To Investigate the Positioning of a General Base within an Enzyme Active Site
Jason P. Schwans, Philip Hanoian, Benjamin J. Lengerich, Fanny Sunden, Ana Gonzalez, Yingssu Tsai, Sharon Hammes-Schiffer, Daniel Herschlag


In Spring 2017, I was a TA for Prof. Jian Ma and Prof. Maria Chikina's course 02-410/710 Computational Genomics. I gave a lecture on statistical methods for discovering genetic associations [pdf].

In Fall 2016, I was a TA for Prof. Chris Langmead's course 02-450/750 Automation of Biological Research: Robotics and Active Learning. I gave a lecture on active learning for Bayesian Networks [pdf].


GenAMap is an open-source platform for visual machine learning of structured association mappings between genotypes and phenotypes.

My GitHub page.