Carnegie Mellon University researchers have created the first robotically driven experimentation system to determine the effects of a large number of drugs on many proteins, reducing the number of necessary experiments by 70 percent.
The model, presented in the journal eLife, uses an approach that could lead to accurate predictions of the interactions between novel drugs and their targets, helping reduce the cost of drug discovery.
"Biomedical scientists have invested a lot of effort in making it easier to perform numerous experiments quickly and cheaply," said lead author Armaghan Naik, a Lane Fellow in CMU's Computational Biology Department.
"However, we simply cannot perform an experiment for every possible combination of biological conditions, such as genetic mutation and cell type. Researchers have therefore had to choose a few conditions or targets to test exhaustively, or pick experiments themselves. The question is, which experiments do you pick?"
Naik says that striking a careful balance between performing experiments whose outcomes can be predicted confidently and those whose outcomes cannot is a challenge for humans, because it requires reasoning about an enormous number of hypothetical outcomes at the same time.
To address this problem, the research team previously described applying a machine learning approach called "active learning" that involves a computer repeatedly choosing which experiments to do in order to learn efficiently from the patterns it observes in the data. The team is led by senior author Robert F. Murphy, professor and head of the Computational Biology Department.
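The paper's learner is far more sophisticated, but the core idea of active learning, that choosing queries intelligently needs far fewer measurements than testing everything, can be shown with a deliberately simplified toy problem. Here the unknown "system" is a hidden threshold on a 0–99 scale, and the function and variable names are invented for illustration:

```python
# Toy active-learning illustration (not the paper's method): the learner picks
# the single most informative experiment at each step, halving its hypothesis
# space, so ~7 queries replace 100 exhaustive measurements.

def run_experiment(x, true_threshold=37):
    """Stand-in for a real lab measurement: responds 1 at or above a threshold."""
    return 1 if x >= true_threshold else 0

lo, hi = 0, 100   # the threshold is known only to lie in [lo, hi)
queries = 0
while lo < hi:
    mid = (lo + hi) // 2   # most informative query: splits the remaining range
    queries += 1
    if run_experiment(mid):
        hi = mid           # threshold is at or below mid
    else:
        lo = mid + 1       # threshold is above mid

print(f"found threshold {lo} after {queries} experiments instead of 100")
```

A passive strategy would have to measure each of the 100 candidate points; the active strategy converges in a logarithmic number of queries, which is the intuition behind letting the learner pick its own experiments.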
That earlier approach, however, had only been tested on synthetic or previously acquired data. The current work builds on it by having the computer itself choose which experiments to do; those experiments were then carried out using liquid-handling robots and an automated microscope.
The learner studied the possible interactions between 96 drugs and 96 cultured mammalian cell clones with different, fluorescently tagged proteins. A total of 9,216 experiments were possible, each consisting of acquiring images for a given cell clone in the presence of a given drug. The challenge for the algorithm was to learn how proteins were affected in each of these experiments, without performing all of them.
The first round of experiments began by collecting images of each clone for one of the drugs, totaling 96 experiments. Images were represented by numerical features that captured the protein's location in the cell.
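The paper's actual image features are more sophisticated than anything shown here, but the idea of summarizing a fluorescence image as a small numerical vector describing where the protein sits can be sketched with two invented stand-in features, the intensity-weighted centroid offset and the fraction of intensity at the image border:

```python
# Hypothetical sketch of turning a fluorescence image into numerical features
# that capture a protein's subcellular location. Both features are stand-ins,
# not the features used in the paper.

def image_features(img):
    """img: 2-D list of pixel intensities; returns a small feature vector."""
    h, w = len(img), len(img[0])
    total = sum(sum(row) for row in img) or 1
    # Feature 1: intensity-weighted centroid, measured as offset from image center.
    cy = sum(y * sum(row) for y, row in enumerate(img)) / total
    cx = sum(x * img[y][x] for y in range(h) for x in range(w)) / total
    offset = ((cy - (h - 1) / 2) ** 2 + (cx - (w - 1) / 2) ** 2) ** 0.5
    # Feature 2: fraction of total intensity lying on the image border.
    border = sum(img[y][x] for y in range(h) for x in range(w)
                 if y in (0, h - 1) or x in (0, w - 1))
    return [offset, border / total]

# A protein concentrated centrally vs. one at the periphery:
central = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
edge    = [[9, 0, 0], [0, 0, 0], [0, 0, 0]]
print(image_features(central))  # -> [0.0, 0.0]
print(image_features(edge))     # -> [1.414..., 1.0]
```

Vectors like these let downstream steps compare images numerically, so that clustering can group images showing the same localization pattern.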
At the end of each round, all experiments that passed quality control were used to identify phenotypes (patterns in the location of a protein) that might or might not be related to a previously characterized drug effect.
A novelty of this work was for the learner to identify potentially new phenotypes on its own as part of the learning process. To do this, it clustered the images to form phenotypes. The phenotypes were then used to form a predictive model, so the learner could guess the outcomes of unmeasured experiments. The basis of the model was to identify sets of proteins that responded similarly to sets of drugs, so that it could predict the same prevailing trend in the unmeasured experiments.
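The prediction idea, generalizing from sets of drugs that behave alike, can be sketched with invented data: if two drugs produce the same phenotype on every protein where both were measured, the model treats them as belonging to the same set and fills each one's gaps from the other. All names and the grouping rule below are illustrative, not the paper's actual model:

```python
# Hypothetical sketch of predicting unmeasured (drug, protein) outcomes from
# drugs that responded identically on the proteins measured for both.

observed = {                     # (drug, protein) -> phenotype cluster id
    ("A", "p1"): 0, ("A", "p2"): 1,
    ("B", "p1"): 0, ("B", "p3"): 2,
    ("C", "p1"): 3,
}

def compatible(d1, d2):
    """True if the two drugs agree on every protein measured for both."""
    shared = [p for (d, p) in observed if d == d1 and (d2, p) in observed]
    return bool(shared) and all(observed[(d1, p)] == observed[(d2, p)]
                                for p in shared)

def predict(drug, protein):
    """Guess an unmeasured outcome by borrowing from a like-behaving drug."""
    if (drug, protein) in observed:
        return observed[(drug, protein)]
    for (d, p), phen in observed.items():
        if p == protein and d != drug and compatible(drug, d):
            return phen
    return None                  # no basis for a prediction

print(predict("A", "p3"))  # A and B agree on p1, so borrow B's result -> 2
print(predict("C", "p2"))  # C disagrees with A on p1 -> None
```

In the actual study this kind of shared-response structure is what lets a 29 percent sample of the experiment grid support accurate predictions for the rest.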
The learner repeated the process for a total of 30 rounds, completing 2,697 out of the 9,216 possible experiments. As it progressively performed the experiments, it identified more phenotypes and more patterns in how sets of proteins were affected by sets of drugs.
Using a variety of evaluation measures, the team determined that the algorithm learned a model that was 92 percent accurate in predicting how the 96 drugs affected the 96 proteins, while conducting only 29 percent of the possible experiments.
"Our work has shown that doing a series of experiments under the control of a machine learner is feasible even when the set of outcomes is unknown. We also demonstrated the possibility of active learning when the robot is unable to follow a decision tree," Murphy said.
"The immediate challenge will be to use these methods to reduce the cost of achieving the goals of major, multisite projects, such as The Cancer Genome Atlas, which aims to accelerate understanding of the molecular basis of cancer with genome analysis technologies."
Reference: The paper "Active Machine Learning-Driven Experimentation To Determine Compound Effects on Protein Patterns" can be freely accessed on the eLife website. Contents, including text, figures and data, are free to reuse under a CC BY 4.0 license.
About eLife: eLife is a unique collaboration between the funders and practitioners of research to improve the way important research is selected, presented and shared. eLife publishes outstanding works across the life sciences and biomedicine — from basic biological research to applied, translational and clinical studies. All papers are selected by active scientists in the research community. Decisions and responses are agreed upon by the reviewers and consolidated by the reviewing editor into a single, clear set of instructions for authors, removing the need for laborious cycles of revision and allowing authors to publish their findings quickly. eLife is supported by the Howard Hughes Medical Institute, the Max Planck Society and the Wellcome Trust.
About Carnegie Mellon University: Carnegie Mellon is a private, internationally ranked research university with programs in areas ranging from science, technology and business, to public policy, the humanities and the arts. More than 13,000 students in the university's seven schools and colleges benefit from a small student-to-faculty ratio and an education characterized by its focus on creating and implementing solutions for real problems, interdisciplinary collaboration and innovation. The Computational Biology Department is one of seven academic departments in the School of Computer Science. Its focus is on developing rigorous computational solutions to biomedical problems.