10-810 -- Computational Molecular Biology: a machine learning approach

This course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, pattern recognition, data fusion, and time series analysis. We will discuss the following biological problems: 1) Analysis of high-throughput biological data, such as gene expression data, focusing on issues ranging from data acquisition to pattern recognition and classification. 2) Computational genomics, focusing on gene finding, motif detection and sequence evolution. 3) Statistical genetics, focusing on the statistical properties of the relationships between traits (phenotypes) and genetic polymorphisms (genotypes), and on the analysis of pedigree, SNP and CGH data. 4) Systems biology, concerning how to combine sequence, expression and other biological data sources (protein-protein interaction, protein-DNA binding and more) to infer the structure and function of different systems in the cell.


Ziv Bar-Joseph, WeH 4107, zivbj at cs.cmu.edu
Eric Xing, WeH 4127, epxing at cs.cmu.edu

Course Information

Course structure
Lectures: Monday & Wednesday 15:00-16:20, WeH 4615A
Office Hours, Bar-Joseph: Monday 14:00-15:00, WeH 4107
Office Hours, Xing: Wednesday 16:20-17:20, WeH 4127
Course Material: See assigned reading
Class list: The class list is available for forming project groups.

Problem Sets

Problem set 1: Due on Wednesday, 2/09, in class.
Problem set 2: Due on Wednesday, 3/02, in class.
Problem set 3: Due on Monday, 3/28, in class.
Problem set 4: Due on Wednesday, 4/06, in class.


01/10 Lecture 1: Introduction to molecular biology
01/12 Lecture 2: Statistical modeling of biopolymer sequences
01/19 Lecture 3: Hidden Markov Models for sequence parsing
01/24 Lecture 4: HMM variants
01/26 Lecture 5: Molecular Evolution and Comparative Genomics
01/31 Lecture 6: Motif Detection
02/09 Lecture 7: Meiosis and recombination
02/14 Lecture 8: 2-point linkage analysis
02/16 Lecture 9: Quantitative trait locus (QTL) mapping
02/21 Lecture 10: SNPs and Haplotype Inference
02/23 Lecture 11: Array CGH
02/28 Lecture 12: Microarrays
03/02 Lecture 13: Normalization
03/14 Lecture 14: Differentially Expressed Genes
03/16 Lecture 15: Clustering expression data
03/21 Lecture 16: Bi-Clustering and Optimal leaf ordering
03/23 Lecture 17: Classification
03/23 Lecture 17a: Classification (cont.)
03/28 Lecture 18: Time series analysis
04/04 Lecture 20: Systems biology
04/06 Lecture 21: Bayesian Networks
04/11 Lecture 22: Graphical models
04/13 Lecture 23: Probabilistic inference in graphical models
04/18 Lecture 24: Physical networks and network motifs
04/20 Lecture 25: Network motifs and protein interactions