Computational Genomics

02-710/MSCBIO207, Spring 2007

Eric Xing, Ziv Bar-Joseph, Takis Benos

School of Computer Science, Carnegie Mellon University

Syllabus and Course Schedule


Material covered

Online material and links

Dates and Instructor

A primer of molecular biology, cell biology and genetics
  • Lecture 1 (1/16):
Jan 16: Ziv Bar-Joseph
Population Genetics
Meiosis and recombination
Linkage analysis
QTL mapping
SNPs and haplotype inference
Pedigree and population inference
The coalescent process
  • Lecture 2 (1/18):
  • Lecture 3 (1/23):
  • Lecture 4 (1/25):
  • Lecture 5 (1/30):

Jan 18: Eric Xing

Jan 23: Eric Xing

Jan 25: Eric Xing

Jan 30: Eric Xing

Biological Sequence Analysis
Elements of molecular biology and statistics
Sequence analysis - heuristic algorithms
Profile HMMs
Gene finding
Motif finding
microRNA genes

  • Lecture 6 (2/1):
  • Lecture 7 (2/6): 
  • Lecture 8 (2/8):
Feb 1: Takis Benos

Problem set 1


Feb 6: Takis Benos

Feb 8: Takis Benos

Feb 13: Takis Benos

Feb 15: Takis Benos

Feb 20: Takis Benos
Problem set 1 due
Evolution and Phylogeny
Molecular evolution
  • Nucleotide substitution models, continuous-time Markov model
  • Phylogenetic tree building
  • Ancestral inference

  • Lecture 12 (2/22):
  • Lecture 13 (2/27):
  • Lecture 15 (3/6):
Feb 22: Eric Xing

Problem set 2

Data: genomic (problem 3), promoters (problem 5)

Feb 27: Eric Xing

Mar 6: Eric Xing

Gene Expression Analysis
Normalization and differentially expressed genes
Gene expression dynamics
  • Lecture 14 (3/1):
  • Lecture 16 (3/8):

  • Lecture 17 (3/20): 
  • Lecture 18 (3/22): 
  • Lecture 19 (3/27): 
  • Lecture 20 (3/29):
Mar 1: Ziv Bar-Joseph

Mar 8: Ziv Bar-Joseph

Problem set 2 due

Problem set 3
GO enrichment analysis

Mar 20: Ziv Bar-Joseph

Mar 22: Ziv Bar-Joseph

Project proposals due

Mar 27: Ziv Bar-Joseph

Mar 29: Ziv Bar-Joseph

Problem set 3 due


Apr 5:
Systems Biology
Network evolution
  • scale-free network
  • network dynamics

Network algorithms
  • Topology and network motifs
  • Cross-species network alignment

Bayesian Networks
  • regulatory networks

Module networks

Dynamic models

Physical networks

Protein-protein interactions
  • Lecture 21 (4/3):
  • Lecture 22 (4/10):
Attending Alan Qi's seminar
  • Lecture 23 (4/12):
  • Lecture 24 (4/17):
  • Lecture 25 (4/24):
  • Lecture 26 (4/26): 
  • Lecture 27 (5/01): 
  • Lecture 28 (5/3):
Apr 3: Eric Xing

Apr 10: Eric Xing

Apr 12: Eric Xing

Problem set 4 out

Apr 17: Takis Benos

Apr 24: Takis Benos

Apr 26: Ziv Bar-Joseph

Problem set 4 due

May 01: Ziv Bar-Joseph

May 03: Ziv Bar-Joseph
Project presentation

May 10

Recitation Schedule

Date Time Place Topic

Additional Readings:

Jan 23-25: Additional readings for lectures 3-4

Review of statistical methods for QTL mapping in experimental crosses, Broman KW.


Multiple Interval Mapping for Quantitative Trait Loci, Kao, et al.


General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm, Kao CH, Zeng ZB


Multiple regression approach to mapping of quantitative trait loci (QTL) based on sib-pair data: a theoretical analysis, Sunwei Guo and Momiao Xiong


Interval Mapping of Multiple Quantitative Trait Loci (1993), Ritsert C. Jansen

Jan 30: Additional readings for lecture 5

Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978--989.

T. Niu, Z.S. Qin, X. Xu, and J. Liu (2002) Bayesian Haplotype Inference for Multiple Linked Single Nucleotide Polymorphisms. Am. J. Hum. Genet

Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.

Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR, Donnelly P (2006). A comparison of phasing algorithms for trios and unrelated individuals. American Journal of Human Genetics, 78:437-450.

E.P. Xing, R. Sharan and M.I. Jordan, Bayesian Haplotype Inference via the Dirichlet Process. Proceedings of the 21st International Conference on Machine Learning (ICML 2004).

E.P. Xing, K. Sohn, M.I. Jordan and Y.W. Teh, Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture. Proceedings of the 23rd International Conference on Machine Learning (ICML 2006).

Feb 2: Additional readings for lecture 8

C. Burge, S. Karlin (1997). Prediction of complete gene structures in human genomic DNA. J Mol Biol, 268, 78--94.

Korf I, Flicek P, Duan D, Brent MR (2001). Integrating genomic homology into gene structure prediction. Bioinformatics, 17 Suppl 1: S140--148.

Gross SS, Brent MR (2006). Using multiple alignments to improve gene prediction. J Comput Biol, 13, 379--393.

Rogic S, Mackworth AK, Ouellette FB (2001). Evaluation of Gene-Finding Programs on Mammalian Sequences. Genome Res, 11, 817--832.

Feb 20: Additional readings for lecture 11 (motif finding)

Hertz GZ, Stormo GD (1999). Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 563--577.

Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993). Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 208--214.

Bailey TL, Elkan C (1995). The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol, 3, 21--29.

Mahony S, Golden A, Smith TJ, Benos PV (2005). Improved detection of DNA motifs using a self-organized clustering of familial binding profiles. Bioinformatics, 21 Suppl 1, i283--i291.

M Tompa et al. (2005). Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology, 23, 137--144.

Feb 20: Additional readings for lecture 11 (microRNA)

Miranda KC, Huynh T, Tay Y, Ang YS, Tam WL, Thomson AM, Lim B, Rigoutsos I (2006). A Pattern-Based Method for the Identification of MicroRNA Binding Sites and Their Corresponding Heteroduplexes. Cell, 126, 1203--1217.

Mar 8: Additional readings for lecture 16 (Normalization)
Microarray data normalization and transformation

Maximum Likelihood Estimation of Optimal Scaling Factors for Expression Array Normalization

Mar 20: Additional readings for lecture 17 (Differentially expressed genes)
Significance analysis of microarrays applied to the ionizing radiation response

Mar 22: Additional readings for lecture 18 (Clustering)
Cluster analysis and display of genome-wide expression patterns

Mar 27: Additional readings for lecture 19 (Classification)
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

Mar 29: Additional readings for lecture 20 (Time series)
Analyzing time series gene expression data

May 01: Additional readings for lecture 27 (Physical networks)
Physical networks models

May 03: Additional readings for lecture 28 (Protein interactions)
Comparative assessment of large-scale data sets of protein-protein interactions