Advanced Algorithms and Models for Computational Biology

10-810, Spring 2006

School of Computer Science, Carnegie Mellon University

Syllabus and Course Schedule


Material covered

Online material and links

Dates and Instructor

A primer of molecular biology, cell biology and genetics
  • Lecture 1 (1/23):
Jan 18: Ziv Bar-Joseph and Eric Xing

Jan 23: Eric Xing
Biological Sequence Analysis
Basic probability

Gene finding
  • gene scan via HMM
  • coupled HMM
  • Hierarchical HMM
  • phylogenetic HMM

Motif/CRM finding
  • The MEME model
  • Bayesian motif models
  • Dictionary models
  • CRM structural models
  • Micro-RNA identification
  • Lecture 3 (1/25):
  • Lecture 4 (1/30):
  • Lecture 5 (2/1):
  • Lecture 6 (2/6): 
  • Lecture 7 (2/8): 
  • Lecture 8 (2/13): 
Jan 25: Eric Xing

Jan 30: Eric Xing

Feb 1: Eric Xing

Feb 6: Eric Xing

Feb 8: Ziv Bar-Joseph

Feb 13: Eric Xing
Gene Expression Analysis
Normalization and differentially expressed genes
Gene expression dynamics
  • Lecture 9 (2/15):
  • Lecture 10 (2/20):
  • Lecture 11 (2/22):
  • Lecture 12 (2/27):
  • Lecture 13 (3/1):
  • Lecture 14 (3/6):
Feb 15: Ziv Bar-Joseph
Feb 20: Ziv Bar-Joseph
Feb 22: Ziv Bar-Joseph
Feb 27: Ziv Bar-Joseph
Mar 1: Ziv Bar-Joseph
Mar 6: Ziv Bar-Joseph
Guest lecture

  • Guest Lecture (3/22):
Mar 22: Takis Benos
Population Genetics
Meiosis and recombination
Linkage analysis
QTL mapping
SNPs and haplotype inference
Pedigree and population inference
The coalescent process
  • Lecture 15 (3/8): 
  • Lecture 16 (3/20):
  • Lecture 17 (3/22 & 3/27):
  • Lecture 18 (3/27):
  • Lecture 19 (3/29):
Mar 8: Eric Xing
Mar 20: Eric Xing
Mar 27: Eric Xing
Mar 29: Eric Xing
Evolution and Phylogeny
Molecular evolution
  • Nucleotide substitution models, continuous-time Markov model
  • Phylogenetic tree building
  • Ancestral inference

Network evolution
  • scale-free network
  • network dynamics
  • Lecture 20 (4/3):
  • Lecture 21 (4/5):
  • Lecture 22 (4/10):

Apr 3: Eric Xing
Apr 5: Eric Xing
Apr 10: Eric Xing
Systems Biology
Bayesian Networks
MRF and active learning
Topology and network motifs
Cross-species network alignment
Protein-protein interactions
  • Lecture 24 (4/17):
  • Lecture 25 (4/19):
  • Lecture 26 (4/24):
  • Lecture 27 (4/26):
  • Lecture 28 (5/1):
  • Lecture 29 (5/3):
Apr 12: Eric Xing
Apr 17: Eric Xing
Apr 19: Eric Xing
Apr 24: Ziv Bar-Joseph
Apr 26: Ziv Bar-Joseph
May 1: Ziv Bar-Joseph
May 3: Ziv Bar-Joseph
May 8: Ziv Bar-Joseph
Project presentation

May 10

Recitation Schedule

Date Time Place Topic

Additional Readings:

Jan 30: Additional readings for lectures 4-5
Prediction of Complete Gene Structures in Human Genomic DNA

Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems

Feb 7: Additional readings for lectures 6-8
Fitting a mixture model by expectation maximization to discover motifs in biopolymers

Bayesian models for multiple local sequence alignment and Gibbs sampling strategies

Feb 7: Additional readings for lecture 7
Computational identification of Drosophila microRNA genes

Feb 13: Additional readings for lecture 8
Modeling Dependencies in Protein-DNA Binding Sites

LOGOS: A modular Bayesian model for de novo motif detection

De novo cis-regulatory module elicitation for eukaryotic genomes

CisModule: A Bayesian module sampler by hierarchical mixture modeling

Feb 15: Additional readings for lecture 9
Navigating gene expression using microarrays, a technology review

Microarray data normalization and transformation

Feb 20: Additional readings for lecture 10
Significance analysis of microarrays applied to the ionizing radiation response

Feb 27: Additional readings for lecture 12
Cluster analysis and display of genome-wide expression patterns

Mar 1: Additional readings for lecture 13
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

Mar 3: Additional readings for lecture 14
Analyzing time series gene expression data

Mar 23: Additional readings for lectures 16-17

Review of statistical methods for QTL mapping in experimental crosses, Broman KW.


Multiple Interval Mapping for Quantitative Trait Loci, Kao, et al.


General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm, Kao CH, Zeng ZB


Multiple regression approach to mapping of quantitative trait loci (QTL) based on sib-pair data: a theoretical analysis, Sunwei Guo and Momiao Xiong


Interval Mapping of Multiple Quantitative Trait Loci (1993), Ritsert C. Jansen

Mar 27: Additional readings for lecture 18
Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978-989.

T. Niu, Z.S. Qin, X. Xu, and J. Liu (2002). Bayesian Haplotype Inference for Multiple Linked Single Nucleotide Polymorphisms. Am. J. Hum. Genet.

Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.

Apr 10: Additional readings for lecture 22
Network biology: understanding the cell's functional organization. Barabasi AL, Oltvai ZN.

Apr 17: Additional readings for lectures 24-25

Using Bayesian Networks to Analyze Expression Data. N. Friedman, M. Linial, I. Nachman, and D. Pe'er. In Proc. Fourth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), 2000.

Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data.
Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N.

Adrian Dobra, Beatrix Jones, Chris Hans, Joseph R. Nevins and Mike West, Sparse graphical models for exploring gene expression data, 2004, J. Mult. Analysis, 90, 196-212.

More additional readings:

Class Monday, April 24:
Physical network models

Class Wednesday, April 26
Computational discovery of gene modules and regulatory networks

Class Monday, May 1
Rich Probabilistic Models for Gene Expression

Class Wednesday, May 3

Comparative assessment of large-scale data sets of protein-protein interactions