
Current Research in Computational Biology

The goal of my thesis work is to develop a formal statistical framework for analyzing the spatial organization of genes within and across genomes. One main component of this work is to identify modules of genes whose spatial proximity is significantly conserved across the genomes of several species, for either functional or historical reasons. This task is often approached by identifying chromosomal regions that have arisen from a single region in a common ancestor. In closely related genomes, these regions are characterized by identical gene content and order. In more distantly related genomes, however, homologous regions must be detected by searching for gene clusters: pairs of regions with similar, but not identical, gene content and scrambled gene order. Identification of gene clusters is an essential prerequisite for many types of comparative genomics analyses. Applications of this work include operon prediction, identification of horizontal transfer events, discovery and analysis of large-scale or whole-genome duplications, reconstruction of ancestral gene order, ortholog detection, and the generation of novel features for distance-based phylogeny reconstruction. Summer 2004 - Present.

Advisor: Dannie Durand, Departments of Biological Sciences and Computer Science, Carnegie Mellon University
Collaborator: David Sankoff, Department of Mathematics and Statistics, University of Ottawa, Ontario, Canada.

Relevant Publications:

  • Diagnosing Gene Duplications: Can it be Done?, with Dannie Durand. Trends in Genetics, 2006; 22(3), 156-164. [pdf]

  • The Incompatible Desiderata of Gene Cluster Properties, with Dannie Durand. "Proceedings of the 3rd RECOMB Workshop on Comparative Genomics." McLysaght and Huson, eds., Lecture Notes in Bioinformatics, Springer-Verlag, September 2005. [pdf]

  • The Statistical Analysis of Spatially Clustered Genes under the Maximum Gap Criterion, with David Sankoff and Dannie Durand. Journal of Computational Biology, 2005; 12(8), 1083-1101. [pdf]

  • The Statistical Significance of Max-Gap Clusters, with David Sankoff and Dannie Durand. "Proceedings of the 2nd RECOMB Workshop on Comparative Genomics." Lagergren, ed., Lecture Notes in Bioinformatics, Springer-Verlag, October 2004. [pdf]


Related Presentations:

  • Jan 2006. Conserved spatial patterns in genomes: signal or noise?, Carnegie Mellon Student Seminar Series [slides.pdf]

  • Sept 2005. The Incompatible Desiderata of Gene Cluster Properties. RECOMB Workshop on Comparative Genomics. [slides.pdf]

  • Aug 2005. A Statistical Framework for Spatial Comparative Genomics. Thesis Proposal, CMU. [slides.pdf]

  • June 2005. Significance Tests for Max-Gap Gene Clusters. Mathematics of Evolution and Phylogeny Conference. [slides.pdf]

  • Nov 2004. The Statistical Significance of Max-Gap Gene Clusters. Biological Language Conference. [ppt]

  • Oct 2004. The Statistical Significance of Max-Gap Gene Clusters. RECOMB Workshop on Comparative Genomics. [ppt]

  • Sept 2004. Operons: A Microbial Odyssey. Durand lab seminar. [ppt]

  • Mar 2003. Molecular dating, molecular clocks, and the entrails of chickens. Durand lab seminar. [ppt]

Previous Research in Computational Biology

Research Projects

  • Protein Evolution. Designed a method to detect site-specific conservation of physical and chemical amino acid properties within a protein family, even when little sequence similarity is observed. Fall 2003 - Spring 2004.

    Advisor: Roni Rosenfeld, Computer Science Department, Carnegie Mellon University.
    Collaborator: Judith Klein-Seetharaman, Department of Pharmacology, University of Pittsburgh School of Medicine

  • Simulating the Immune System. Developed a prototype discrete-state simulation system to model the innate and adaptive mammalian immune systems, and applied it to the cytokine signalling network. Summer 2003.

    Supervisor: Shlomo Ta'asan, Department of Mathematics, Carnegie Mellon University

  • Epitope Prediction. Conducted a large-scale statistical comparison of human and pathogen proteomes in order to better understand the adaptive immune system's ability to discriminate self from non-self. By identifying statistical differences between self and non-self proteins, we can more accurately identify antigenic proteins or epitopes and improve our understanding of the mechanisms of self-tolerance and autoimmunity. Fall 2002 - Summer 2003.

    Advisor: Roni Rosenfeld, Computer Science Department, Carnegie Mellon University.

  • Alternative Splicing. For a course project, I investigated the relationship between system complexity and rates of alternative splicing in different organisms and biological systems. I tested the hypothesis that although overall alternative splicing rates are not statistically different between mammals and other organisms like fly and worm, rates of alternative splicing will vary significantly in complex systems like the adaptive immune system and the nervous system. Fall 2002.

    Supervisor: Javier Lopez, Department of Biological Sciences, Carnegie Mellon University.

Publications and Reports:

  • Inferring Property Selection Pressure from Positional Residue Conservation, with Judith Klein-Seetharaman and Roni Rosenfeld. Appl. Bioinformatics, 2004; 3(2-3), 167-179. [pdf]

  • The Relationship Between Alternative Splicing Rates and Complexity of Biological Systems. Computational Molecular Biology and Genomics class project final report. Dec 2002. [ps]



Presentations:

  • July 2004. Inferring property selection pressure from positional residue conservation. ISMB. [ppt]


  • June 2004. Using physical-chemical properties of amino acids to model site-specific substitution propensities. Genomes and Evolution, the joint meeting of SMBE and AGA.

  • June 2004. Review: First RECOMB Satellite Workshop on Regulatory Genomics. Biological Language Modeling Seminar.

  • Feb 2004. Protein evolution and property conservation: a review of three papers. Biological Language Modeling Seminar.

  • Nov 2003. Inferring property selection pressure from positional residue conservation. Biological Language Conference.

  • Oct 2003. Conference Review: ECCB 2003. Biological Language Modeling Seminar.

  • May 2003. Telling self from non-self: learning the language of the immune system. Biological Language Modeling Workshop.

  • Apr 2003. Computational immunology: an introduction. Biological Language Modeling Seminar.

  • May 2002. Review: Building a dictionary for genomes, Bussemaker et al. Biological Language Modeling Seminar.


Rose Amanda Hoberman
Last updated Dec 20 2002