I am a rising third-year Ph.D. student at Carnegie Mellon University, advised by Eric Xing. I am interested in statistical machine learning for healthcare and the theoretical problems that arise from the constraints of real-world data. These include building interpretable, robust systems for prediction on structured genomic, medical, and other types of data.

Previously, I studied math and computer science as a Braddock Scholar in the Schreyer Honors College at Penn State University. I did research with Sharon Hammes-Schiffer's Theoretical Chemistry Lab and Raj Acharya's ALISA lab. In my free time, I enjoy backpacking and composing for piano.

In Fall 2017, I will be co-founding the AI+ club at CMU. If you are interested in sponsoring talks, contact me by email.

Contact me on Twitter, by email, or in person.


August 2017 Our preprint "Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations" is available on arXiv.
July 2017 "Visual Explanations for Convolutional Neural Networks via Input Resampling" has been accepted to the ICML Workshop on Visualization for Deep Learning.
May 2017 Starting my internship at Roam in San Mateo, CA.
March 2017 Attending ENAR in Washington, DC.
February 2017 Presenting "Improving the Accuracy of GWAS" at the Pittsburgh Center for Drug Abuse Research.


My heart is motivated by the promise of precision medicine, and my mind is captivated by the puzzles of statistical machine learning. A list of my publications on Google Scholar is available here.

Working Papers

Drafts available on request.
Personalized Network Inference by Neighborhood Mixture Regression
Benjamin J. Lengerich, Bryon Aragam, Eric P. Xing
GenAMap on the Web: Visual Machine Learning for Next-Generation Genome Wide Association Studies
Haohan Wang*, Benjamin J. Lengerich*, Min Kyung Lee, Eric P. Xing

Papers in Review

Pre-prints available on request.
Hybrid Subspace Learning for High-Dimensional Data
Micol Marchetti-Bowick, Benjamin J. Lengerich, Ankur Parikh, Eric P. Xing
Precision Lasso: Accounting for Correlations and Linear Dependencies in High-Dimensional Genomic Data
Haohan Wang, Benjamin J. Lengerich, Bryon Aragam, Eric P. Xing
Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations
Benjamin J. Lengerich, Andrew L. Maas, Christopher Potts

Publications and Presentations

Visual Explanations for Convolutional Neural Networks via Input Resampling
Benjamin J. Lengerich*, Sandeep Konam, Eric P. Xing, Stephanie Rosenthal, Manuela Veloso
ICML Workshop on Visualization for Deep Learning
Opportunities and Obstacles for Deep Learning in Biology and Medicine
Author order was determined by a randomized algorithm.
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H.S. Segler, Anthony Gitter, Casey S. Greene

GenAMap on the Web: Intuitive and Scalable Machine Learning for Structured Association Mapping
Benjamin J. Lengerich*, Haohan Wang*, Min Kyung Lee, Eric P. Xing
The 66th Annual Meeting of the American Society of Human Genetics, October 19, 2016, Vancouver, BC.
Experimental and Computational Mutagenesis To Investigate the Positioning of a General Base within an Enzyme Active Site
Jason P. Schwans, Philip Hanoian, Benjamin J. Lengerich, Fanny Sunden, Ana Gonzalez, Yingssu Tsai, Sharon Hammes-Schiffer, Daniel Herschlag


On the Origin of Sequences: Computational Analysis of Somatic Hypermutation for Probabilistic Immunoglobulin Predecessor Identification
Benjamin J. Lengerich
Adviser: Raj Acharya; Supervisor: Jesse Barlow.

Undergraduate thesis completed in fulfillment of the Schreyer Honors College requirements for honors in computer science.


In Spring 2017, I was a TA for Prof. Jian Ma and Prof. Maria Chikina's course 02-410/710 Computational Genomics. I gave a lecture on statistical methods for discovering genetic associations [pdf].

In Fall 2016, I was a TA for Prof. Chris Langmead's course 02-450/750 Automation of Biological Research: Robotics and Active Learning. I gave a lecture on active learning for Bayesian Networks [pdf].


GenAMap is an open source platform for visual machine learning of structured association mappings between genotypes and phenotypes.

My GitHub page.



  • Roam Analytics

    San Mateo, CA.    Summer 2017

    Creating machine learning methods that can organize information from a 1-billion-edge knowledge graph.

  • Carnegie Mellon University

    Pittsburgh, PA.   September 2015 - Present

    Ph.D. student at CMU, working with Prof. Eric Xing to design and implement systems and methods for statistical machine learning.

  • Google AdWords Express Team

    Mountain View, CA.    Summer 2014

    Implemented a new targeting method for Google AdWords Express based on machine learning of document localization tendencies.

  • Pennsylvania State University

    University Park, PA.    2011 - 2015

    Undergraduate degrees in computer science and mathematics. Research in theoretical chemistry.