Previously, I studied math and computer science as a Braddock Scholar in the Schreyer Honors College at Penn State University. I did research with Sharon Hammes-Schiffer's Theoretical Chemistry Lab and Raj Acharya's ALISA lab. In my free time, I enjoy backpacking and composing for piano.
In Fall 2017, I will be co-founding the AI+ club at CMU. If you are interested in sponsoring talks, contact me by email.
Contact me on Twitter, by email, or in person.
|August 2017||Our preprint "Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations" is available on arXiv.|
|July 2017||"Towards Visual Explanations for Convolutional Neural Networks via Input Resampling" has been accepted to the ICML Workshop on Visualization for Deep Learning.|
|May 2017||Starting my internship at Roam in San Mateo, CA.|
|March 2017||Attending ENAR in Washington, DC.|
|February 2017||Presenting "Improving the Accuracy of GWAS" at the Pittsburgh Center for Drug Abuse Research.|
Publications and Presentations
Towards Visual Explanations for Convolutional Neural Networks via Input Resampling
Benjamin J. Lengerich*, Sandeep Konam, Eric P. Xing, Stephanie Rosenthal, Manuela Veloso
ICML Workshop on Visualization for Deep Learning
The predictive power of neural networks often comes at the cost of model interpretability. Several techniques have been developed for explaining model outputs in terms of input features; however, it is difficult to translate such interpretations into actionable insight. Here, we propose a framework to analyze predictions in terms of the model's internal features by inspecting information flow through the network. Given a trained network and a test image, we select neurons by two metrics, both measured over a set of images created by perturbations to the input image: (1) magnitude of the correlation between the neuron activation and the network output and (2) precision of the neuron activation. We show that the former metric selects neurons that exert large influence over the network output while the latter metric selects neurons that activate on generalizable features. By comparing the sets of neurons selected by these two metrics, our framework offers a way to investigate the internal attention mechanisms of convolutional neural networks.
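As a rough illustration of the two selection metrics described in this abstract, here is a minimal sketch that scores neurons over a batch of perturbed copies of one test image. It is not the code from the paper. The array layout, the function name `select_neurons`, the `k`, `min_mean`, and `eps` parameters, and the reading of "precision" as inverse variance are all assumptions made for this example.

```python
import numpy as np


def select_neurons(activations, outputs, k=5, min_mean=0.0, eps=1e-12):
    """Hypothetical helper: rank neurons over perturbed copies of one image.

    activations: (n_perturbations, n_neurons) array (assumed layout).
    outputs: (n_perturbations,) network output, e.g. the predicted class score.
    Returns (influential, generalizable): sets of neuron indices.
    """
    acts = np.asarray(activations, dtype=float)
    out = np.asarray(outputs, dtype=float)

    # Metric (1): magnitude of the Pearson correlation between each
    # neuron's activation and the network output across perturbations.
    a_c = acts - acts.mean(axis=0)
    o_c = out - out.mean()
    denom = np.sqrt((a_c ** 2).sum(axis=0) * (o_c ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (a_c * o_c[:, None]).sum(axis=0) / denom
    corr_mag = np.abs(np.nan_to_num(corr))  # constant neurons get 0

    # Metric (2): "precision", ASSUMED here to mean inverse variance.
    # Neurons whose mean activation does not exceed min_mean are excluded
    # so that dead (never-active) units do not trivially rank first.
    precision = 1.0 / (acts.var(axis=0) + eps)
    precision = np.where(acts.mean(axis=0) > min_mean, precision, -np.inf)

    influential = {int(i) for i in np.argsort(-corr_mag)[:k]}
    generalizable = {int(i) for i in np.argsort(-precision)[:k]}
    return influential, generalizable


# Usage: 4 perturbations, 3 neurons.
acts = [[1, 5, 0.3], [2, 5, -0.7], [3, 5, 0.9], [4, 5, -0.1]]
influential, generalizable = select_neurons(acts, [10, 20, 30, 40], k=2)
print(influential & generalizable)  # {2}
```

Comparing the two returned sets (here, their intersection) mirrors the abstract's idea of contrasting neurons that drive the output with neurons that respond stably; other set comparisons would work just as well under these assumptions.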
Opportunities and Obstacles for Deep Learning in Biology and Medicine
Author order was determined by a randomized algorithm.
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, Benjamin J. Lengerich, Johnny Israeli, Jack Lanchantin, Stephen Woloszynek, Anne E. Carpenter, Avanti Shrikumar, Jinbo Xu, Evan M. Cofer, David J. Harris, Dave DeCaprio, Yanjun Qi, Anshul Kundaje, Yifan Peng, Laura K. Wiley, Marwin H.S. Segler, Anthony Gitter, Casey S. Greene
Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature may be particularly well-suited to deep learning techniques. We examine applications of deep learning to a variety of biomedical problems -- patient classification, fundamental biological processes, and treatment of patients -- to predict whether deep learning will transform these tasks or if the biomedical sphere poses unique challenges. We find that deep learning has yet to revolutionize or definitively resolve any of these problems, but promising advances have been made on the prior state of the art. Even when improvement over a previous baseline has been modest, we have seen signs that deep learning methods may speed or aid human investigation. More work is needed to address concerns related to interpretability and how to best model each problem. Furthermore, the limited amount of labeled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning powering changes at the bench and bedside with the potential to transform several areas of biology and medicine.
GenAMap on the Web: Intuitive and Scalable Machine Learning for Structured Association Mapping
Benjamin J. Lengerich*, Haohan Wang*, Min Kyung Lee, Eric P. Xing
The 66th Annual Meeting of the American Society of Human Genetics, October 19, 2016, Vancouver, BC.
Current methods of structured association mapping can effectively relate genetic polymorphisms with phenotypes, but correct use requires algorithmic expertise to run code and domain expertise to analyze results. To overcome these challenges, the GenAMap software platform was developed and released in 2010. Here, GenAMap is redesigned for scalability and updated with state-of-the-art methods.
Experimental and Computational Mutagenesis To Investigate the Positioning of a General Base within an Enzyme Active Site
Jason P. Schwans, Philip Hanoian, Benjamin J. Lengerich, Fanny Sunden, Ana Gonzalez, Yingssu Tsai, Sharon Hammes-Schiffer, and Daniel Herschlag
The positioning of catalytic groups within proteins plays an important role in enzyme catalysis, and here we investigate the positioning of the general base in the enzyme ketosteroid isomerase (KSI). The oxygen atoms of Asp38, the general base in KSI, were previously shown to be involved in anion–aromatic interactions with two neighboring Phe residues. Here we ask whether those interactions are sufficient, within the overall protein architecture, to position Asp38 for catalysis or whether the side chains that pack against Asp38 and/or the residues of the structured loop that is capped by Asp38 are necessary to achieve optimal positioning for catalysis. Our results indicate that structural features in addition to the overall protein architecture and the Phe residues neighboring the carboxylate oxygen atoms play roles in positioning. X-ray crystallography and molecular dynamics simulations suggest that the functional effects arise from both restricting dynamic fluctuations and disfavoring potential mispositioned states. Recognizing the extent, type, and energetic interconnectivity of interactions that contribute to positioning catalytic groups has implications for enzyme evolution and may help reveal the nature and extent of interactions required to design enzymes that rival those found in biology.
On the Origin of Sequences: Computational Analysis of Somatic Hypermutation for Probabilistic Immunoglobulin Predecessor Identification
Benjamin J. Lengerich
Adviser: Raj Acharya; Supervisor: Jesse Barlow.
Undergraduate thesis submitted in fulfillment of the Schreyer Honors College requirements for honors in computer science.
The human immune system uses complex mechanisms to generate enough antibody diversity to protect effectively against a wide array of potential antigens. These mechanisms obscure the germline predecessors of mature antibodies, making it difficult to produce a comprehensive model of the immune system. Interestingly, this problem also arises in next-generation antivirus software that uses biomimicry to model malicious software as a recombination of attack patterns. Current methods fail to solve this problem in biocomputation because they ignore somatic hypermutation, one of the key mechanisms of diversity generation. In this thesis, computational analysis is performed to develop a better model of somatic hypermutation, which is then used to improve antibody predecessor identification.
In Fall 2016, I was a TA for Prof. Chris Langmead's course 02-450/750 Automation of Biological Research: Robotics and Active Learning. I gave a lecture on active learning for Bayesian Networks [pdf].
Roam
San Mateo, CA. Summer 2017
Creating machine learning methods that can organize information from a 1-billion-edge knowledge graph.
Carnegie Mellon University
Pittsburgh, PA. 09/2015 - Present
PhD student at CMU, working with Prof. Eric Xing to design and implement systems and methods for statistical machine learning.
Google AdWords Express Team
Mountain View, CA. Summer 2014
Implemented a new targeting method for Google AdWords Express based on machine learning of document localization tendencies.
Pennsylvania State University
University Park, PA. 2011 - 2015
Undergraduate degrees in computer science and mathematics. Research in theoretical chemistry.