By Kenneth Chiacchia
When Ziv Bar-Joseph talks about his research, he's precise, but rapid-fire. It's as if language can't keep up with him; as if the ideas have to come out more quickly than verbal communication can allow.
That's not surprising, perhaps, given the nature of his work at Carnegie Mellon University: bridging the biological and computational worlds in a way that allows us to finally understand a tremendous volume of built-up data.
Thanks to technologies that allow genetic information to be rapidly decoded, vanishingly small traces of DNA to be expanded to any quantity needed, and thousands of proteins to be identified and analyzed in parallel, biologists have recently been able to amass a vast amount of information about how living creatures develop, live, grow sick and even die.
Yet the complexity of interactions between biomolecules--coupled with the very volume of the data--has threatened to overwhelm the painstaking, point-by-point methods of wet-lab analysis and interpretation.
Enter Bar-Joseph, who came to CMU in 2003 and is now an associate professor with joint appointments in machine learning and computational biology. Bar-Joseph carried out his Ph.D. studies at the Massachusetts Institute of Technology after earning a master's degree in computer science at Hebrew University in Jerusalem in 1999. He brought to the United States a fascination with the biological world, and the conviction that computer science could make it possible to understand the mountain of data then beginning to emerge from biological laboratories.
"The end of my master's and beginning of my Ph.D. ... really coincided with the rapid advances that were taking place in biological sequencing capacity," he says. "I [watched] some lectures, and it just seemed like there were a lot of things going on. I thought it would require a lot of computational ideas ... to understand it."
Both the problem and the opportunity stemmed in part from the successful first sequencing of the human genome. The Human Genome Project, an international collaboration, cost billions of dollars to produce the first sequence of human DNA. Along the way, the effort spurred more efficient methods of sequencing.
But if gene sequencing was like writing, then it was producing many, many books in a strange language that biologists could only read on a word-by-word basis. They couldn't understand the syntax, the stories or any higher themes.
"[The genome project] actually did pay off in terms of research, but it's clear now that it's not enough to have the sequence," Bar-Joseph says. "Heart and lung cells all have the same sequence information, but they do different things because they're using different parts of that sequence."
Understanding which part of the genetic information in a cell is being used at any given time is only one part of the puzzle. Equally important is understanding how the proteins that are expressed are used by cells, and how those uses change as the proteins interact with one another.
One example of his work at MIT, in the laboratory of David Gifford, was a 2003 article in the journal Nature Biotechnology. Bar-Joseph and his collaborators used a novel computer algorithm to identify groups of genes that work together to perform biologically relevant tasks, such as respiration, protein synthesis and response to external stress.
In many ways, the work set the tone for his later career, in that it explained a biological system using a new computer tool that iteratively analyzed large amounts of data--but, importantly, also pulled in data sources not utilized by previous researchers.
In the 2003 paper, those new data consisted of measurements of the ability of proteins synthesized by given genes to bind to the DNA sequences of possible target genes. It's a mainstay of genetic interaction: regulatory genes direct the production of proteins that in turn modify the activity of other genes, which can include both functional and other regulatory genes.
Previous work had consisted largely of measuring statistical correlation between the expression of genes, assuming that those expressed contemporaneously are likely to be interacting. By adding the binding data, Bar-Joseph and his colleagues were able to discriminate between genes involved in the same genetic module, and those involved in different modules that just happen to be activated at the same time.
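To make the distinction concrete, here is a minimal sketch in Python. It is not the algorithm from the 2003 paper; it only illustrates the idea of requiring two kinds of evidence at once. The gene names, regulator names, expression values and the 0.8 correlation threshold are all hypothetical, chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def same_module(gene_a, gene_b, expression, bound_by, min_corr=0.8):
    """Group two genes only if they are co-expressed AND share a regulator.

    expression: gene -> list of expression measurements (hypothetical data)
    bound_by:   gene -> set of regulators whose proteins bind near that gene
    min_corr:   illustrative threshold, not a value from the paper
    """
    coexpressed = pearson(expression[gene_a], expression[gene_b]) >= min_corr
    shared_regulators = bound_by[gene_a] & bound_by[gene_b]
    return coexpressed and bool(shared_regulators)


# Toy data: all three genes rise together, but only g1 and g2 share a regulator.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.1, 2.1, 2.9, 4.2],
}
bound_by = {"g1": {"TF_A"}, "g2": {"TF_A"}, "g3": {"TF_B"}}

print(same_module("g1", "g2", expression, bound_by))
print(same_module("g1", "g3", expression, bound_by))
```

In this toy data, correlation alone would lump all three genes together; the binding evidence is what separates g3 from the other two.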
Bar-Joseph is a "prototypical example" of someone who has the skill set of developing new tools while examining novel problems, says Tom Mitchell, CMU's Fredkin Professor of Computer Science and Machine Learning and head of the Machine Learning Department.
One example Mitchell gives concerns Bar-Joseph's work on understanding the cycle of cell division--the process that transforms a resting cell into two daughter cells containing identical copies of the genetic information.
In order to divide successfully, a cell must duplicate its DNA, segregate the two copies perfectly, and then divvy up the other cell components so that, when the two daughter cells split, they each have a fully functional genetic and cellular complement. The process takes precise coordination, synchronization and communication between thousands of genes.
The problem, says Mitchell, is that the cells in a given experiment "are not dividing at exactly the same time. They're dividing on their own asynchronous schedules. Now, the question is, how can you deal with the different timing of those different cells?"
Bar-Joseph solved the problem by building a new algorithm, which he designed to accommodate the asynchrony-generated noise in the data.
What makes Bar-Joseph really special, Mitchell says, is something else, though--it's his ability to bridge two vital specialties via a deep focus on both.
"We wrote a paper together that had to do, again, with time series of gene expression data," Mitchell says. "What was fun was just working with Ziv" to frame the problem as a machine learning problem, he says: "It became very clear to me in that process just what an advantage Ziv had over me and most people ... he really understood the biology and really understood the machine learning."
"Many people have diverse interests, and they dabble in things," Mitchell adds. "Ziv does more than dabble--he goes to work in a wet lab," referring to a yearlong sabbatical Bar-Joseph took in a laboratory studying the development of the fly nervous system. "He doesn't just jog, he runs marathons."
That last bit is not a metaphor: Bar-Joseph is a running enthusiast, and by any measure but his own, is a formidable competitor.
"In Boston, I ran with a group," Bar-Joseph says. "Here, I run on my own." Initially, when he moved to Pittsburgh in 2003 to take up his duties at CMU, he did run with a group: "But they were much too strong for me."
One tends to take that statement with a grain of salt upon hearing of a milestone he recently achieved: a three-hour marathon, something relatively few amateurs manage. (Of more than 4,000 competitors in the 2010 Pittsburgh Marathon, for instance, only 59 clocked in at less than three hours.)
"Since I came here, I've improved dramatically," he admits.
Bar-Joseph the runner has fallen in love with Pittsburgh's rolling hills and river valleys, which give him varying challenges.
"I run a lot in Schenley Park," he says. "Once a week I run between 15 and 20 miles. The nice thing about Pittsburgh is you can find 15 or 20 miles that are flat next to the rivers--or, if you want hills, you have that."
The driven nature of a marathon runner came through in Bar-Joseph's recent project with colleagues including Yehuda Afek at Tel Aviv University and Naama Barkai, a molecular geneticist at the Weizmann Institute of Science in Israel. The work resulted in a high-profile paper in the journal Science in January 2011, on how the fly's nervous system self-organizes as it develops; it was the project for which he took his wet-lab sabbatical.
The work was a little atypical for Bar-Joseph, as it took a biological observation and used it to derive lessons for organizing distributed networks, rather than using computer technology to derive lessons for understanding a biological system.
The insight that the investigators gleaned came from the simplicity of the fly system. In order for a network to form, some nodes in the network must become leaders; but leaders must be spaced apart from each other in the network for efficiency. In traditional self-organization schemes derived from computer science, a node deciding whether to become a leader must know how many direct neighbors it has, and must receive information from those neighbors that scales with the network's size. What the collaborators discovered was that the fly's neurons could decide whether to become leaders even without knowing the size of their neighborhood, and using only one-bit messages from their immediate neighbors. The lesson from this carbon-based network immediately suggested better ways of forming silicon networks.
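The flavor of such a scheme can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm as published: the function name `select_leaders`, the starting probability of 1/64, the doubling schedule and the cap of 0.5 are all hypothetical choices. What it preserves from the description above is that no node ever counts its neighbors, and that the only thing a node learns from a neighbor is a single bit: whether that neighbor signaled in the current round.

```python
import random


def select_leaders(adjacency, seed=0, max_rounds=10_000):
    """Pick spaced-apart leaders using only one-bit signals between neighbors.

    adjacency: node -> set of neighboring nodes (must be symmetric).
    Returns a set of leaders in which no two are neighbors and every
    non-leader has at least one leader as a neighbor.
    """
    rng = random.Random(seed)
    active = set(adjacency)   # nodes that have not yet decided
    leaders = set()
    p = 1.0 / 64              # illustrative starting signal probability

    for round_no in range(max_rounds):
        if not active:
            return leaders
        # Exchange 1: each undecided node signals with probability p.
        signaling = {v for v in sorted(active) if rng.random() < p}
        # A node that signaled and heard no neighbor signal becomes a leader.
        new_leaders = {
            v for v in signaling
            if not any(u in signaling for u in adjacency[v])
        }
        # Exchange 2: new leaders announce themselves (again one bit);
        # they and their neighbors stop competing.
        leaders |= new_leaders
        for v in new_leaders:
            active.discard(v)
            active.difference_update(adjacency[v])
        # Raise the signal probability slowly, so that no node needs to
        # know how crowded its neighborhood is.
        if round_no % 4 == 3:
            p = min(0.5, 2 * p)

    if active:
        raise RuntimeError("did not converge within max_rounds")
    return leaders


# Usage: a ring of 12 nodes, each connected to its two neighbors.
ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
print(sorted(select_leaders(ring, seed=1)))
```

Two adjacent nodes can never become leaders in the same round, because each would hear the other's signal; and once a node becomes a leader, its neighbors drop out for good.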
"The one thing for which the credit goes entirely to Ziv Bar-Joseph is that he realized [the fly system] was behaving as a distributed algorithm," Afek says. But as with the cell-cycle work with Mitchell, his greatest value to the work possibly lay in his ability to form connections.
"He's very energetic; he was really the leader," Afek says. "His mind keeps working all the time. He would send an email in the middle of the night with a new idea: 'Let's try to analyze this idea, how about trying this kind of algorithm.'"
With such an intense focus on work, it may come as a surprise how much time Bar-Joseph spends with his family. A man of deep faith who's involved in Pittsburgh's Orthodox Jewish community, he spends Saturdays with his wife, daughter, and two sons in strict religious observance, refraining from all work--including driving--in honor of the Sabbath. At other times, they'll take extended trips to places near Pittsburgh, farther afield, or to Israel to visit family.
As his children grow up--his oldest, a boy, is 11--they're also developing individual interests that he's had fun helping to shape and join. Hiking and camping, in places like Ohiopyle State Park, appeal to him and his kids--though "my wife doesn't really like it," he says, a little sheepishly. He plays basketball with his 11-year-old; his eight-year-old daughter has gotten interested in running.
"My hope is to run a half-marathon with her some day," he says, smiling. "By that time, she'll beat me."
Today, Bar-Joseph's research group pursues three main avenues of research. One explores computational methods to understand the dynamics of systems that change over time--for example, the cell cycle, or epidemics. Another is cross-species analysis; though the biological literature brims with experiments performed on hundreds of species of biomedical and wider biological interest, those experiments take place in organisms and cells in different life stages, health conditions, and environments. The complex comparisons lend themselves to Bar-Joseph's machine-learning approach. The third direction for the lab uses biology to understand and build better computer systems and networks, like the fly nervous system development work with Afek.
While Bar-Joseph's work spans varied topics and disciplines, it's not exactly surprising, according to his other boss--Robert Murphy, the Ray and Stephanie Lane Professor and director of the Lane Center for Computational Biology. It's more or less a job description.
"That's what we exist for," Murphy says about the Lane Center. "The key is combining expertises to solve problems." He particularly cites "Ziv's work within system biology ... trying to integrate information on a genomic scale, to be able to build models of cell behavior or of biological behavior ... It's precisely the scale of the problem that makes it the most interesting computationally."
Murphy has published two papers with Bar-Joseph and together they supervised a Ph.D. student. "Ziv's a great practitioner of computational biology and a tremendous contributor to our department," he says. "We're very pleased and lucky to have him here at Carnegie Mellon."
"He's a no-nonsense guy," Mitchell agrees. "Once he decides to solve a problem, he's going to solve the problem. Every time we met [to discuss the cell-cycle work], he was on top of the question ... Over the next decade, the leaders in the field are going to be people like Ziv."
For his part, Ziv Bar-Joseph loves both his job and his adopted home. "In Israel, they're much more focused on the theory. Here, application is at least as important. Since I work a lot on applications, I like it."
Coming to Pittsburgh "proved to be the right decision," he adds. "The students that I was fortunate to have since coming here were great. If I've had any success in my career, I owe it to my students and the post-docs ... The fact that [at CMU] you can attract the best students from around the world is really key."