
Algorithms for Computational and Predictive Biomedicine

Madhavi K. Ganapathiraju

Associate Professor
Department of Biomedical Informatics
School of Medicine
& Intelligent Systems Program
School of Arts and Sciences
University of Pittsburgh


I am an associate professor in the Department of Biomedical Informatics and the Intelligent Systems Program at the University of Pittsburgh. I hold a Master's degree in Electrical and Communications Engineering from the Indian Institute of Science and a Ph.D. in Language and Information Technologies from the School of Computer Science, Carnegie Mellon University. My current research interests include machine learning and the development of multi-disciplinary approaches to computational and predictive biomedicine.

Open Positions for Graduate Students:

I am looking for students with a background in mathematics, engineering sciences or machine learning to participate in research pertaining to predictive / computational medicine (bioinformatics and biomedical informatics). Students with other backgrounds are also welcome to explore complementary opportunities.

Computational Areas                Application Areas
Machine Learning                   Genome-Wide Association Studies
Signal Processing                  Genome Sequence Analysis
Statistical Language Processing    Membrane Protein Structure Prediction

Students from the Biomedical Informatics Training Program, the Intelligent Systems Program, and the Joint CMU-Pitt PhD Program may apply!




Proteome and Genome Wide Analysis

Machine Learning for Transmembrane helix prediction
TMpro is an algorithm for transmembrane helix prediction, built by analogy to the latent semantic analysis model. A web server makes this algorithm available to the scientific community, allowing up to 4000 sequences to be analyzed at a time. Current and future work involves designing learning algorithms that improve TMpro by taking into account additional sources of information (some of which may be partial or unreliable).
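As a rough illustration of the latent semantic analysis analogy, the sketch below treats sliding windows of a protein sequence as "documents" and residues as "words", then reduces the window-by-residue count matrix with a truncated SVD. It is not the TMpro implementation: the raw residue vocabulary, the window length of 10, the choice of k = 2, and the example sequence are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues act as the "vocabulary"


def window_term_matrix(seq, window=10):
    """Count residues per sliding window: rows are residues, columns are windows."""
    row = {aa: r for r, aa in enumerate(AMINO_ACIDS)}
    starts = range(len(seq) - window + 1)
    matrix = np.zeros((len(AMINO_ACIDS), len(starts)))
    for col, start in enumerate(starts):
        for aa in seq[start:start + window]:
            if aa in row:  # skip non-standard symbols such as X
                matrix[row[aa], col] += 1
    return matrix


def lsa_embed(matrix, k=2):
    """Project each window (column) onto the top-k singular directions."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k]).T  # shape: (number of windows, k)


seq = "MKTLLLVLAVLLAIVGAASGKRDEEQNSTP"  # made-up 30-residue sequence (assumption)
matrix = window_term_matrix(seq)
embedding = lsa_embed(matrix)
print(matrix.shape, embedding.shape)
```

Each row of `embedding` is a low-dimensional feature vector for one window; in an LSA-style predictor such vectors would be passed to a classifier that labels windows as transmembrane or not.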

Sequence based prediction of genes that escape inactivation in the DNA

Biological Language Modeling Toolkit (BLMT):

A toolkit to compute n-gram frequencies (n-mer / k-mer / oligomer frequencies) from protein or nucleotide sequence data has been built previously. It processes protein or genome sequence data into suffix arrays and computes a variety of sequence features such as n-grams and Yule values. The source code is in C and may be installed on any standard computer. The system has been tested on up to 25 MB of data at a time. The web interface provides an interactive mechanism to compute these features without installing the software locally. A number of applications have been built on the toolkit, e.g. comparison of Yule values of hydrophobic segments in transmembrane and globular proteins, n-gram comparison between human and mouse genomes, and a scalable algorithm for variable number tandem repeats (VNTRs).
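The idea of counting n-grams from a suffix array can be sketched in a few lines. This is a minimal Python illustration, not the toolkit's C code: the function names are hypothetical, and the suffix array is built with a naive sort that would not scale to megabyte-sized inputs.

```python
from itertools import groupby


def build_suffix_array(seq):
    """Return suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def ngram_counts(seq, suffix_array, n):
    """Count n-grams by grouping adjacent suffixes that share their first n symbols."""
    # Suffixes shorter than n cannot contribute an n-gram, so they are skipped.
    prefixes = (seq[i:i + n] for i in suffix_array if i + n <= len(seq))
    # In sorted suffix order, equal n-symbol prefixes are always contiguous.
    return {gram: sum(1 for _ in group) for gram, group in groupby(prefixes)}


seq = "MKTAYIAKQRMKTA"  # made-up example sequence (assumption)
sa = build_suffix_array(seq)
trigram_counts = ngram_counts(seq, sa, 3)
print(trigram_counts["MKT"], trigram_counts["KTA"])
```

Once the suffix array exists, counts for any n can be read off the same sorted order, which is why one preprocessing pass can serve many n-gram queries.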

Current and future work involves advancing the scalability of the algorithms as well as development of novel applications.

Genome Sequence Analysis with BLM toolkit
Treating protein sequences as if they were natural language texts allows sequence analyses analogous to "topic segmentation" and "document classification". We computed the n-gram frequencies of 44 different organisms using the n-gram comparison functions provided by the Biological Language Modeling Toolkit and performed Markovian n-gram analysis, Zipf analysis and n-gram phrase analysis, leading to the identification of genome signatures of organisms.
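A Zipf analysis ranks n-grams by frequency and examines how frequency falls off with rank on a log-log scale. The sketch below is one simple way to do this, assuming a plain least-squares fit; the function names and the toy sequence are hypothetical and are not taken from the toolkit.

```python
import math
from collections import Counter


def rank_frequency(sequence, n):
    """Return n-gram frequencies sorted from most to least frequent."""
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(frequencies) < 2:
        raise ValueError("need at least two ranked frequencies")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


frequencies = rank_frequency("ACGTACGTAC", 2)  # toy nucleotide string (assumption)
print(frequencies, zipf_slope(frequencies))
```

A slope close to -1 corresponds to the classic Zipf distribution seen in natural language text; comparing slopes across organisms is one way such an analysis could contribute to a genome signature.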

Comparison of transmembrane and soluble-hydrophobic helices
Transmembrane (TM) helix prediction algorithms often incorrectly predict globular helices and signal peptide sequences to be of TM type. The goal of this project was to determine whether correlations between amino acids differ among globular helices, signal peptide sequences and actual transmembrane regions. Yule’s Q-statistic was computed using the BLM Toolkit for the three data sets. The results show that Yule values vary between the three data sets and may prove to be useful features for TM prediction algorithms.
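Yule's Q for a 2x2 contingency table with cells n11, n10, n01, n00 is Q = (n11·n00 − n10·n01) / (n11·n00 + n10·n01), ranging from -1 to +1. The sketch below applies it to a pair of residues separated by a fixed gap. How the BLM Toolkit actually builds its tables is not described here, so the pairing scheme, the `gap` parameter and the function name are assumptions.

```python
def yules_q(sequences, x, y, gap=1):
    """Yule's Q for residue x at position i co-occurring with residue y at i + gap."""
    n11 = n10 = n01 = n00 = 0
    for seq in sequences:
        for i in range(len(seq) - gap):
            first = seq[i] == x
            second = seq[i + gap] == y
            if first and second:
                n11 += 1  # x followed by y
            elif first:
                n10 += 1  # x followed by something else
            elif second:
                n01 += 1  # something else followed by y
            else:
                n00 += 1  # neither
    concordant = n11 * n00
    discordant = n10 * n01
    if concordant + discordant == 0:
        return None  # Q is undefined for this table
    return (concordant - discordant) / (concordant + discordant)


q_value = yules_q(["LALALA"], "L", "A")  # toy sequence (assumption)
print(q_value)
```

Computing Q for every residue pair on each data set gives one table per data set, and those tables can then be compared entry by entry.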

Universal Digital Library, Language Technologies

Om Transliteration Editor
A large number of different languages are spoken in India. The languages and scripts are distinct from each other, but all Indian languages are phonetic in nature. We developed a transliteration scheme, Om, which exploits this phonetic nature of the alphabet. Om uses ASCII characters to represent Indian language alphabets and can be read directly in English by the large number of users who cannot read Indian-language scripts other than that of their mother tongue. It is also useful in computer applications where local language tools are not yet available, such as email and chat. We also developed a text editor for Indian languages that integrates Om input for many Indian languages into a word processor such as Microsoft Winword®. The text editor has also been developed on the Java® platform, so it can run on UNIX machines as well. This transliteration scheme is proposed as a possible standard for Indian language transliteration and keyboard entry.

Multilingual Book Reader: Transliteration, Word-to-Word and Full-text Translation
India, a multilingual nation with 22 recognised official languages, has literature in all these languages; they are represented in the Digital Library of India (DLI), which holds over 120,000 books. DLI has driven the creation of a large number of applications to process and present Indian language content. In this paper, we present a multilingual book reader interface for DLI that supports transliteration and “good enough translation” features, making it possible for readers to read a book that is written in another language.

Telugu Morphological Generator
TelMore is a morphological generator tool for Telugu nouns and verbs.  Nouns generator: For nouns, it takes a word and its "class" as input, and generates morphological forms as output. There are 17 noun morphological forms in total, covering the nominative, genitive, accusative, dative, locative, instrumental and vocative cases, masculine, feminine or neutral gender, and number.  Verbs generator: For verbs, it takes a word in infinitive t'a form (ichchut'a, geluchut'a, raayut'a) and generates its morphological forms as output. The output has 130 forms: by 2 numbers (singular, plural), 3 genders (male, female, neutral), 3 persons (1st, 2nd and 3rd person), and 7 tenses/moods (present, past, future, aorist affirmative, aorist negative, imperative and prohibitive), and 4 independent participles.  Input and output of Telugu text is in Om transliteration.
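The verb form count can be checked by enumeration: 2 numbers × 3 genders × 3 persons × 7 tenses/moods gives 126 finite slots, and the 4 independent participles bring the total to 130. The sketch below enumerates those slots. Reading the count as a full cross product is an assumption, and the generator itself may organise its paradigm differently.

```python
from itertools import product

NUMBERS = ("singular", "plural")
GENDERS = ("male", "female", "neutral")
PERSONS = ("1st", "2nd", "3rd")
TENSES_MOODS = (
    "present", "past", "future",
    "aorist affirmative", "aorist negative",
    "imperative", "prohibitive",
)
INDEPENDENT_PARTICIPLES = 4  # counted separately from the finite forms

# One slot per (number, gender, person, tense/mood) combination.
finite_slots = list(product(NUMBERS, GENDERS, PERSONS, TENSES_MOODS))
total_forms = len(finite_slots) + INDEPENDENT_PARTICIPLES
print(len(finite_slots), total_forms)  # prints: 126 130
```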





Spring 2009:

AI: Knowledge Representation and Problem Solving
3 credits, MW 5-6:30PM, Wean Hall

Algorithms for Computational & Predictive Biomedicine
3 credits, TTh 10-11:30, VALE M-184

This course teaches widely used computational approaches from disparate fields, specifically machine learning, signal and image processing, natural language processing and graph theory. Each algorithm will be presented with an application to a specific problem in computational biomedicine or predictive medicine. By presenting the most fundamental concepts and algorithms from each of these fields, this course gives students the ability to identify the best algorithm or field of approach for the biomedical question at hand.

Prerequisites: Working knowledge of Calculus, Probability Theory and Linear Algebra
Course goals:

  • Gain working knowledge of computational algorithms from multidisciplinary fields
  • Uncover the analogy between different areas so as to apply algorithms of one field to another






Computational Biology

  1. "TMpro: Transmembrane Helix Prediction Using Amino Acid Properties and Latent Semantic Analysis",
    Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman,
    BMC Bioinformatics, vol 8, issue 10, 2007.

  2. "TMpro: Webserver and Webservice for Transmembrane Helix Prediction Using Amino Acid Properties",
    Madhavi Ganapathiraju, Christopher Jon Jursa, Hassan A. Karimi, and Judith Klein-Seetharaman,
    Bioinformatics, vol 23, issue 20, 2007.

  3. "Evolutionary insights from suffix array based genome sequence analysis",
    A. Poddar, N. Chandra, Madhavi Ganapathiraju, K. Sekar, J. Klein-Seetharaman, R. Reddy and N. Balakrishnan,
    J. Biosciences, in print, 2007.

  4. "Collaborative Discovery and Biological Language Modeling Interface",
    Madhavi Ganapathiraju, Vijayalaxmi Manoharan, Raj Reddy and Judith Klein-Seetharaman,
    Lecture Notes in Artificial Intelligence, LNCS/LNAI 3864, 2006.

  5. "Retinitis pigmentosa associated with rhodopsin mutations: Correlation between phenotypic variability and molecular effects",
    Iannaccone A, D. Man, N. Waseem, BJ. Jennings, Madhavi Ganapathiraju, K. Gallaher, E. Reese, SS. Bhattacharya, J. Klein-Seetharaman,
    Vision Research, vol 46, issue 27, pp 4556-67, 2006.

  6. "Comparison of stability predictions and simulated unfolding of rhodopsin structures",
    Oznur Tastan, Esther Yu, Madhavi Ganapathiraju, Anes Aref, AJ Rader and Judith Klein-Seetharaman,
    Photochemistry and Photobiology, vol 63, issue 2, pp 351-363, 2006.

  7. "Computational Biology and Language",
    Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman,
    Lecture Notes in Artificial Intelligence, LNCS/LNAI 3345, 2004.

  8. "BLMT: Statistical sequence analysis using n-grams",
    Madhavi Ganapathiraju, Vijayalaxmi Manoharan and Judith Klein-Seetharaman,
    Applied Bioinformatics, vol. 3, issue 2, November 2004.

  9. "Characterization of Protein Secondary Structure using Latent Semantic Analysis",
    Madhavi Ganapathiraju, Judith Klein-Seetharaman, N. Balakrishnan and Raj Reddy,
    IEEE Signal Processing Magazine, vol. 21, issue 3, May 2004.

Digital Library and Indian Language Processing

  1. "Digital Library of India: a testbed for Indian language research",
    N. Balakrishnan, Raj Reddy, Madhavi Ganapathiraju, Vamshi Ambati,
    IEEE Technical Committee on Digital Libraries (TCDL) Bulletin, vol 3, issue 1, 2006.

  2. "Om: One tool for many (Indian) languages",
    Ganapathiraju Madhavi, Balakrishnan Mini, Balakrishnan N., Reddy Raj,
    Journal of Zhejiang University SCIENCE, Vol 6A, No. 11, pp 1348-1353, Oct 2005.

  3. "Improving Recognition Accuracy on CVSD speech in mismatched conditions",
    Madhavi Ganapathiraju, N. Balakrishnan and Raj Reddy,
    WSEAS Transactions on Computers, Vol 2, Issue 4, October 2003.

Conference Proceedings

Computational Biology

  1. "BLMT Web Server: Interactive Language Technologies for Analogous Biological Data",
    Vijayalaxmi Manoharan, Madhavi Ganapathiraju and Judith Klein-Seetharaman,
    Workshop on Ambient Intelligence and (Everyday) Life, Donostia, San-Sebastian, Spain, 2005.

  2. "Yule value tables from Protein Datasets of different categories: emphasis on membrane proteins" [Invited Talk],
    Madhavi Ganapathiraju, Deborah Weisser, Raj Reddy and Judith Klein-Seetharaman,
    Proc. SCI2004, Florida, USA, 2004.

  3. "Extensions to Biological Language Modeling Toolkit (BLMT)",
    Balakrishnan Sivaraman, Madhavi Ganapathiraju, Judith Klein-Seetharaman, N. Balakrishnan and Raj Reddy,
    BLC2003: Biological Language Conference, Pittsburgh USA, November 2003.

  4. "Comparative n-gram analysis of whole-genome sequences",
    Madhavi Ganapathiraju, Deborah Weisser, Judith Klein-Seetharaman, Roni Rosenfeld, Jaime Carbonell and Raj Reddy,
    HLT'02: Human Language Technologies Conference, San Diego, March, 2002.

  5. "Rare and frequent amino acid n-grams in whole-genome protein sequences",
    Madhavi Ganapathiraju, Judith Klein-Seetharaman, Roni Rosenfeld, Jaime Carbonell and Raj Reddy,
    RECOMB'02: The Sixth Annual International Conference on Research in Computational Molecular Biology, Washington DC, USA, April, 2002.

Digital Library and Indian Language Processing

  1. "TelMore: Morphological Generator for Telugu Nouns and Verbs",
    Madhavi Ganapathiraju and Lori Levin,
    Proc. Second International Conference on Universal Digital Library, Vol Alexandria, Egypt, Nov 17-19, 2006

  2. "Multilingual Book Reader: Transliteration, Word-to-Word Translation and Full-text Translation",
    Prashanth Balajapally, Phanindra Pydimarri, Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy,
    VALA 2006: 13th Biennial Conference and Exhibition Conference of Victorian Association for Library Automation, Melbourne, Australia, February 8-10, 2006.

  3. "Million Books to Web: Technological Challenges and Research Issues",
    N. Balakrishnan, Raj Reddy, Madhavi Ganapathiraju and Hemant Gogineni,
    Proc. Tamil Internet conference, pp 227-242, December 2004.

  4. "OmSE: Tamil Search Engine",
    Anandh Jayaraman, Srinivas Sangani, Madhavi Ganapathiraju and N. Balakrishnan,
    Proc. Tamil Internet conference, pp 23-29, December 2004.


Theses and Technical Reports

  1. "Relevance of Cluster Size in MMR Summarization",
    Madhavi Ganapathiraju (Advisor: Jaime G. Carbonell),
    Report for 11-745, 2002.

  2. Ph.D. Thesis: "Application of Language Technologies to Computational Biology: Transmembrane Helix Prediction and Characterization",
    Madhavi Ganapathiraju. Thesis Supervisors: Raj Reddy and Judith Klein-Seetharaman,
    Language Technologies Institute, Carnegie Mellon University, Pittsburgh PA 15213 USA.

  3. M.Engg. Thesis: "Speech Recognition on MPEG/Audio",
    Madhavi Ganapathiraju. Advisors: Prof. N. Balakrishnan and Prof. P. S. Naidu,
    Department of Electrical and Communications Engineering, Indian Institute of Science, Bangalore, 560012, India





People of This Lab


Madhavi Ganapathiraju

Assistant Professor

Department of Biomedical Informatics (DBMI)
Intelligent Systems Program (ISP)
Bioinformatics and Bioengineering Summer Institute (BBSI)
University of Pittsburgh

Yolanda Dibucci
Administrative Support

Asia Mitchell
NLM Funded Summer Research Fellow, DBMI
Research focus: Genome Sequence Pattern Discovery using BSAT

Adam Handen
Summer Student, DBMI
Research focus: Visualization Interface Development for BSAT

Thahir P. Mohamed
Graduate Research Assistant, DBMI & ISP
Research focus: BSAT

Other Affiliated Students


Heather Piwowar
Ph.D. Student, DBMI
Thesis Advisor: Dr. Wendy Chapman

Chad Kimmel
M.S. Student, DBMI
Supervisor: Dr. James Lyons Weiler

Past Students of this Lab


Jessica A. Wehner
May-July 2008
Summer Research Fellow, BBSI
Research focus: Active Learning (Machine Learning) for Membrane Protein Structure Prediction

She moved to the University of North Carolina at Chapel Hill for an M.S. in Applied Mathematics.




Dr. Deepti Deobagkar
University of Pune, India
Student: Ashwin Kelkar (Ph.D. Candidate)

Dr. Judith Klein-Seetharaman
University of Pittsburgh School of Medicine





Mailing Address
200 Meyran Ave
Room M-189
Pittsburgh PA 15260 USA



Corner of Forbes and Meyran Ave. Opposite Pitt Kiva-han, near Pitt Starbucks.
It is the nice building with pillars and lions and arched windows.


Phone +1-412-647-7113 or +1-412-647-0624



Administrative Contact
Ms. Yolanda Dibucci
Room M-189