Beginning in the mid-1990s, Murphy and his team pioneered the use of machine learning methods to analyze microscope images of cellular structures.
At CMU, he developed the world's first formal undergraduate program in computational biology in 1987 and served as founding director (with Jelena Kovacevic) of CMU's Center for Bioimage Informatics, as well as founding director (with Ivet Bahar) of the joint Ph.D. program in computational biology offered by CMU and the University of Pittsburgh.
Murphy's honors include his election in 2007 as a senior member of the Institute of Electrical and Electronics Engineers and the Alexander von Humboldt Foundation Research Award in 2009.
He's also an avid amateur basketball player and a former youth basketball coach. He spoke to Link Managing Editor Jason Togyer.
How did you come to the field of computational biology?
When I was 13, I read a book called "The Genetic Code" by Isaac Asimov, and from that time, I knew I wanted to do something related to research in genetics. But I didn't know how I wanted to get there.
I ended up majoring in biochemistry. When I went to Caltech for my doctoral degree, I was looking for a way to analyze results, and I learned about the world of computing. I was hooked.
I spent a good fraction of my graduate school time doing data analysis, which in those days included actually figuring out how to get the data into the computer.
About 50 percent of my time was spent doing experiments, and 50 percent was spent writing code. Since then, I have followed and learned from the amazing developments in computer science and machine learning over the past 36 years.
What shaped your research interests?
I started having these odd experiences at conferences. My own work was very quantitative, but I would often present in a session with people who would be describing their results in very qualitative terms--and the images they were displaying, which were generated through microscopy, were supposed to be accepted in support of the models they were describing.
But I had a hard time making the connection between the images and the hypotheses they were attempting to prove. There was nothing that I would see that allowed me to use the word "proof."
Then I would see very similar images used to "prove" very different hypotheses. There was no attempt to deceive people, but the value of these images was woefully inadequate.
I said to myself, "Somebody has to tackle the method of drawing statistically verifiable conclusions from these images," and at a certain point, it became apparent that that person was going to have to be me.
I began working to develop computer models that would recognize patterns in the images, which in turn would enable researchers to say, "This particular protein is in this particular location."
How do you describe computational biology to a layperson?
Most of the time, I describe it as using computers to solve biological problems, and I say "computers" rather than "computer science" because people understand the concept of "computers." I also say I'm trying to change the way that biology is done.
Using computers to solve biological problems is consistent with a traditional scientific discovery model--you have some data source, you analyze it, and you report your results.
But the way that biological research and ultimately clinical practice will have to be changed is by having machine learning techniques take a role not just in analyzing data, but in collecting it as well.
The mission of the Lane Center is to enable--to catalyze--a transition to where robotics and machine learning are at the center of how biological research is done.
You use the term "active learning" to describe some of the work being done at the Lane Center. What does that mean?
Active learning describes an iterative process where a computer analyzes the elements of a dataset, tries to build a model to predict the results of experiments that haven't been done, and chooses key experiments to generate new data with the goal of improving the model until it can accurately predict all results without doing all possible experiments. In some sense, it removes traditional hypothesizing from the mix.
This is a very informative point to make--in traditional methods, you want to pick a hypothesis and prove that hypothesis is right. In active learning, you don't want to verify the hypothesis that you're already pretty sure is right--you want to test the hypotheses that aren't right, because those are the ones that will help you improve the model. Verifying hypotheses in which you already have high confidence isn't going to help you improve the model.
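(Editor's illustration: the loop described above can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the Lane Center's software. It assumes a drug response that switches on at a single hidden dose threshold; the names run_experiment, active_learning and hidden_threshold are hypothetical stand-ins. At each step the model picks the experiment whose outcome it is least sure of, the middle of the still-unknown range, rather than re-testing doses it can already predict.)

```python
def run_experiment(dose, hidden_threshold=37):
    """Hypothetical stand-in for a lab experiment: True if the dose has an effect."""
    return dose >= hidden_threshold


def active_learning(doses, oracle):
    """Learn a response threshold over sorted doses while running few experiments.

    Assumes the response is monotonic: no effect below some threshold, effect above it.
    """
    performed = {}            # experiments actually run: dose -> outcome
    lo, hi = -1, len(doses)   # index of last known "no effect", first known "effect"
    while hi - lo > 1:
        # The model is least certain about the middle of the unexplored gap,
        # so that is the most informative experiment to run next.
        mid = (lo + hi) // 2
        outcome = oracle(doses[mid])
        performed[doses[mid]] = outcome
        if outcome:
            hi = mid
        else:
            lo = mid
    # Final model: predict an effect for every dose at or beyond index hi.
    predictions = {dose: i >= hi for i, dose in enumerate(doses)}
    return predictions, performed


doses = list(range(100))
predictions, performed = active_learning(doses, run_experiment)
accuracy = sum(predictions[d] == run_experiment(d) for d in doses) / len(doses)
print(f"experiments run: {len(performed)} of {len(doses)}, accuracy: {accuracy:.2f}")
```

In this toy case the model predicts all 100 outcomes after running only a handful of experiments. Real biological responses are rarely this simple, so the sketch illustrates the selection principle only.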
Why are these models more useful than, say, performing clinical trials on real patients?
Well, you might run clinical trials to see, for example, whether you have a statistically significant difference between "drug" and "no drug." But sometimes the effect is very small--maybe a 3 percent change. And the issue of side effects isn't examined until after a drug is approved.
Or, sometimes you have studies on one drug in one biological pathway that are run in parallel to studies of other drugs in other biological pathways. But the two studies don't inform each other, and those two drugs in combination create side effects. There are too many possible experiments to do.
That's where we see the need for ways to design the appropriate experiments to collect enough data to support much more thorough avenues to questions of whether a particular drug is an appropriate treatment for a particular disease. One of our goals now is to create detailed models of tissues that can predict effects, without necessarily measuring them.
Why is this work important to a doctor--or to a patient?
I'll give you the standard answer a basic scientist will give you--we're trying to learn the ways that clinical practice can be improved.
The FDA right now will not approve a drug without receiving a clear understanding of how the mechanism works. That concept comes from a time when we thought biology was much simpler than it actually is. Biological systems are incredibly complex, and therefore being able to make statements about "why something works" from a mechanistic standpoint is extremely difficult to do.
With machine learning techniques, we can create and evaluate models of drug efficacy from a sound statistical basis, without having to reduce it to a simple statement of "this drug affects the catalysis of A to B," because the model may determine the drug actually has 13 different effects.
What motivates today's students to pursue computational biology?
A lot of students are motivated because of things they see or read in the news.
We're in an era when the opportunities for biomedical research--while also tackling significant computer science challenges--are enormous. And let's face it--the way that we humans work is a fascinating subject for us.
So we've been looking at the educational offerings of the Lane Center, and we already have two new initiatives there. The first is a master of science in biotechnology, innovation and computing, which we're offering jointly with the Language Technologies Institute.
And we've initiated a new minor for undergraduates in computational biology, which is currently working its way through the review process. We're planning to bring that online in fall 2011.
Jason Togyer | 412-268-8721 | email@example.com