Center For Biological Language Modeling
Carnegie Mellon University | University of Pittsburgh | Massachusetts Institute of Technology
Boston University Medical School | National Research Council Canada

New and Past Events

Biological Language Conference 2005  * New

Advisory Board Meeting
Restricted Access

Biological Language Conference 2004 

NSF VISIT July 2004 * Restricted Access

Workshop on Ambient Intelligence for Scientific Discovery

Biological Language Conference 2003

Biological Language Modeling Workshop May 13 & 14, 2003

Biological Language Modeling Workshop Nov 7 & 8, 2002  





Overview of the Biological Language Modeling Project

The Biological Language Modeling project is based on the assumption that protein sequences from different organisms may be viewed as texts written in different languages. The mapping of protein sequences to their structure, dynamics and function then becomes analogous to the mapping of words to meaning in natural languages. This analogy can be exploited by applying statistical language modeling and text classification techniques to biological sequences, thereby generating testable hypotheses regarding the fundamental building blocks of the "protein sequence language". The biology-language analogy enables novel applications of language technologies to the biology domain, although it overlaps to a great extent with other existing computational biology and bioinformatics applications.
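As a rough illustration of what "statistical language modeling" can mean for sequences, the sketch below treats each amino acid as a word and trains a bigram model with add-one smoothing. It is a minimal example under stated assumptions, not the project's actual method: the training sequences are invented toy strings, and the function names `train_bigram_model` and `perplexity` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def train_bigram_model(sequences):
    """Count adjacent residue pairs and return a smoothed P(next | previous)."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1

    def prob(prev, nxt):
        # Add-one (Laplace) smoothing over the 20-letter alphabet,
        # so unseen pairs still receive a non-zero probability.
        return (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + len(AMINO_ACIDS))

    return prob


def perplexity(prob, seq):
    """Per-transition perplexity of seq under the model (lower = more familiar)."""
    pairs = list(zip(seq, seq[1:]))
    log_sum = sum(math.log2(prob(prev, nxt)) for prev, nxt in pairs)
    return 2 ** (-log_sum / len(pairs))


# Toy "corpus": invented strings, not real protein sequences.
training = ["MKTLLVAGLL", "MKTLLIAGLL", "MKSLLVAGIL"]
model = train_bigram_model(training)

familiar = perplexity(model, "MKTLLVAGLL")    # resembles the training data
unfamiliar = perplexity(model, "WWHHCCPPDD")  # residues never seen in training
print(familiar < unfamiliar)
```

A sequence that resembles the training "language" scores a lower perplexity than one built from unseen residues, which is the same signal text classification uses to tell one language from another.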

Goal 1. Integration of results from linguistic analyses and structural dynamics to characterize key residues that control enzymatic functions
Principal Investigators Ivet Bahar, Michele Loewen, Cathy Costello, Jaime Carbonell, Roni Rosenfeld
Students Alpay Temiz, Lee-Wei Yang
Postdocs Dror Tobi, Shyamasri Biswas

i. Computational assessment of the hinge-bending and other key mechanical sites that coordinate the functional dynamics of representative sets of enzymes, using the enzyme classifications recently compiled by Thornton and coworkers and network models of protein structure and dynamics.

ii. Systematic analysis of the structural regions/motifs that are implicated in key mechanical roles, using graph theoretical, computer vision and linguistic tools 

iii. Experimental characterization of the mechanism of action of particular classes of enzymes as case studies (e.g. dioxygenase enzymes in collaboration with NRC) 

iv. IMPACT: Discover new potential ligand-binding sites 

v. EVALUATION: Test predictions of key mechanical sites against known inhibitor binding sites to elucidate the possible coupling between mechanical and chemical activities, and hypothesize potential new binding sites to be tested experimentally.


Goal 2. Deep understanding of transmembrane protein folding and function
Principal Investigators Judith Klein-Seetharaman, Gobind Khorana, Ivet Bahar, Raj Reddy, Hagai Meirovitch
Students Basak Isin, Madhavi Ganapathiraju, Jiangbo Miao, Naveena Yanamala
Postdocs AJ Rader, Harpreet Kaur Dhiman, David Man

i. Discern property conservation of amino acids (e.g. discriminating hydrophobic amino acids facing outside vs. inside of transmembrane helices)

ii. Identify the most salient motifs for structural stability and for governing conformational changes, building on our existing results for rhodopsin and leveraging context-dependent statistical grammars or similar approaches.

iii. Discriminate between functionally relevant and structurally relevant motifs.

iv. IMPACT: New strategies to help diagnose and eventually treat conformational diseases associated with transmembrane proteins.

v. EVALUATION: site-directed mutagenesis experiments.
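One simple way to picture item i (which side of a helix its hydrophobic residues face) is the helical hydrophobic moment. The sketch below is a swapped-in textbook technique, not the project's method: it assumes an ideal alpha-helix (100 degrees per residue) and the Kyte-Doolittle hydropathy scale, and the function names and the test sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

DEGREES_PER_RESIDUE = 100.0  # assumption: ideal alpha-helix, 3.6 residues per turn


def hydrophobic_moment(segment):
    """Return (magnitude, direction in degrees) of the helical hydrophobic moment."""
    x = y = 0.0
    for i, residue in enumerate(segment):
        angle = math.radians(i * DEGREES_PER_RESIDUE)
        hydropathy = KYTE_DOOLITTLE[residue]
        x += hydropathy * math.cos(angle)
        y += hydropathy * math.sin(angle)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0


def residues_facing(segment, direction, half_width=60.0):
    """List (index, residue) pairs whose helical-wheel angle lies near `direction`."""
    facing = []
    for i, residue in enumerate(segment):
        angle = (i * DEGREES_PER_RESIDUE) % 360.0
        # Smallest angular distance between the residue and the target direction.
        diff = abs((angle - direction + 180.0) % 360.0 - 180.0)
        if diff <= half_width:
            facing.append((i, residue))
    return facing


# Invented test segments: one uniformly hydrophobic, one with a single hydrophobic face.
uniform = "L" * 18
amphipathic = "".join(
    "L" if math.cos(math.radians(i * DEGREES_PER_RESIDUE)) > 0 else "K"
    for i in range(18)
)

uniform_magnitude, _ = hydrophobic_moment(uniform)
amph_magnitude, amph_direction = hydrophobic_moment(amphipathic)
hydrophobic_face = residues_facing(amphipathic, amph_direction)
print(uniform_magnitude < amph_magnitude, hydrophobic_face)
```

A uniformly hydrophobic segment has a moment near zero, while a segment with one hydrophobic face has a large moment pointing toward that face. Whether that face points toward lipid or toward the protein interior cannot be read from sequence alone, which is why the goal pairs such analyses with experiments.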


Goal 3. Deep understanding of the relation between β-sheet formation and underlying primary sequences
Principal Investigators Jaime Carbonell, Jonathan King, Vanathi Gopalakrishnan, Judith Klein-Seetharaman
Students Yan Liu, Welkin Pope, Ryan Simkovsky
Postdocs Peter Weigele

i. Extension to supersecondary structures via long-range probabilistic linguistic models

ii. Explore β-sheet transmembrane proteins (including β-barrels)

iii. BIOLOGICAL IMPACT: Predictive models for β-sheets and selected β-sheet based supersecondary structures

iv. BIOLOGICAL EVALUATION: Reduce prediction errors by up to 50% for existing prediction tasks, and predict structures for which there are no present prediction results.

v. COMPUTATIONAL IMPACT: New predictive algorithms and techniques such as multi-layer conditional random fields (CRFs)

vi. COMPUTATIONAL EVALUATION: Performance measured on new problems in relation extraction from text and text understanding

Goal 4. Discovery of vocabulary for conservation pressure in protein evolution
Principal Investigators Roni Rosenfeld, Judith Klein-Seetharaman, Vanathi Gopalakrishnan, Michele Loewen, Jonathan King
Students Jerry Zhu, Yong Lu, Oznur Tastan

i. Based on multi-dimensional position-conditional properties

ii. Expansion from HIV and GPCR families to other families including kinases and nuclear receptors

iii. Further expansion to broader datasets including all possible protein families, and structurally homologous protein families

iv. IMPACT: Detect very remote homologs.

v. EVALUATION: Identification of heretofore unknown homologs, to be validated via biological experimentation.
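To make "position-conditional properties" more concrete, the sketch below scores each column of a multiple sequence alignment by Shannon entropy, once over residue identities and once over coarse property classes. It is only an illustration under assumptions: the three-way property grouping is a hypothetical simplification, the alignment is invented, and the function names are not from the project.

```python
import math
from collections import Counter

# Hypothetical three-way property grouping, chosen only for illustration.
PROPERTY_CLASS = {}
for residues, label in (("AVLIMFWC", "hydrophobic"),
                        ("STNQYGPH", "polar"),
                        ("DEKR", "charged")):
    for residue in residues:
        PROPERTY_CLASS[residue] = label


def column_entropy(column, mapping=None):
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    symbols = [mapping[r] if mapping else r for r in column if r != "-"]
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment, mapping=None):
    """Entropy per alignment position; low entropy suggests conservation pressure."""
    return [column_entropy(column, mapping) for column in zip(*alignment)]


# Invented toy alignment: four aligned sequences of length four.
alignment = ["LKDA", "IKEA", "VRDS", "LKEG"]
by_residue = conservation_profile(alignment)
by_property = conservation_profile(alignment, PROPERTY_CLASS)
print(by_residue)
print(by_property)
```

In the toy alignment the first position varies in residue identity but is always hydrophobic, so it looks variable in one view and fully conserved in the other. That gap is the kind of signal a property-based vocabulary could capture where plain identity does not.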


Goal 5. Algorithmic solutions for the discovery of large-scale gene regulatory networks and for function prediction
Principal Investigators Yiming Yang, Eric Xing
Students Fan Li

i. Developing new algorithms to extract multi-type expressions of genes from microarray data, DNA sequences, protein-protein interaction databases and gene ontology, and to induce regulatory networks based on these multiple types of evidence

ii. Adapting hierarchical classification techniques to motif/gene identification and function analysis based on multi-abstraction-level representations of protein sequences

iii. Improving the efficiency of machine learning algorithms for automated induction of very large regulatory networks, e.g., with thousands of genes

iv. IMPACT: The resulting regulatory networks and gene/motif classes would help biologists to discover new and interesting patterns, e.g. leading to deeper understanding of the mechanisms regulating oncogenes and their potential disruption.

v. EVALUATION: Identification of regulatory networks and gene classes (functions), to be validated first via biology databases (TRANSFAC, GO, SCPD...), and later possibly via collaboration with personnel in the U Pitt Cancer Center.

Goal 6. Protein-protein interaction prediction
Principal Investigators Ziv Bar-Joseph, Judith Klein-Seetharaman, Michele Loewen
Students Yanjun Qi

i. Develop new approaches and explore all available features for protein-protein interaction prediction

ii. IMPACT: comprehensive prediction of protein pairs in entire organisms, starting with yeast, but eventually for human

iii. EVALUATION: precision and recall optimization on existing datasets, small-scale validation with specific pathways and wet-lab experiments
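The precision and recall named in the evaluation item can be computed over unordered protein pairs, as in the minimal sketch below. The protein identifiers are placeholders and the helper names `as_pairs` and `precision_recall` are hypothetical. A real evaluation would use a curated reference set.

```python
def as_pairs(interactions):
    """Normalize (a, b) tuples into unordered pairs so that A-B equals B-A."""
    return {frozenset(pair) for pair in interactions}


def precision_recall(predicted, reference):
    """Precision and recall of predicted interactions against a reference set."""
    predicted_pairs = as_pairs(predicted)
    reference_pairs = as_pairs(reference)
    true_positives = len(predicted_pairs & reference_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(reference_pairs) if reference_pairs else 0.0
    return precision, recall


# Placeholder identifiers, not real proteins.
reference = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P1", "P5")]
predicted = [("P2", "P1"), ("P3", "P2"), ("P1", "P4")]

precision, recall = precision_recall(predicted, reference)
print(precision, recall)
```

Treating pairs as unordered avoids counting the same interaction twice when two data sources list the partners in opposite order.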

Goal 7. Development of user-friendly BLM service hub with website interface
Principal Investigators Alain Rappaport, Judith Klein-Seetharaman, Hassan Karimi
System Administrator Mark Holliman
Students Madhavi Ganapathiraju, Yanjun Qi, Mitch Saltykov

i. Develop a consistent interface for publishing and invoking services

ii. Develop tutorials for adding, publishing and invoking services in specific computational biology contexts

iii. Organize workshops to train users internally and externally

iv. IMPACT: effective dissemination and integration into computational biology community

v. EVALUATION: Integration of BLM tools in external projects


For questions or comments please contact

Judith Klein-Seetharaman
Assistant Professor


Department of Pharmacology
University of Pittsburgh Medical School
Biomedical Science Tower E1355
Pittsburgh, PA 15261
Tel: 412-383-7325
Fax: 412-648-1945


Language Technologies Institute
Carnegie Mellon University
School of Computer Science
Smith Hall 225
Pittsburgh, PA 15213
Tel: 412-268-8249




Last updated June 08, 2005