Computational Molecular Biology and Genomics Syllabus - Fall 2004


CLASS   DATE   TOPICS   ASSIGNED READING   ADDITIONAL TOPICS
1.  Aug. 31
  • Course overview
  • Introduction to computational biology and genomics I

        PS0 handed out. Due Sept 9.
        PS0 solution set.
  •   Review biology and algorithms background  
    2.  Sept. 1
  • Introduction to sequencing,  [Slides]  D. Durand
  • Genome Assemblies and Interval Graphs  [Slides]  M. Farach-Colton, Rutgers Univ.

       You can also view these lectures online in QuickTime format.

      PS1 handed out. Due Sept 9.
    3.  Sept. 7
  • Introduction to computational biology and genomics II
    4.  Sept. 9
  • Global pairwise sequence alignment
    Lecture outline
    Alignment examples

    PS0 and PS1 due in class.
  • Global sequence alignment notes,
      courtesy Dr. M. Singh, Princeton University
  • Setubal and Meidanis, 47-55, 89-92, 96-98 (electronic reserve)
  • Durbin, pp. 17-22 (electronic reserves)
  • Saving space: Setubal and Meidanis, 58-60 (physical reserve)
  • General gap penalty functions: Setubal and Meidanis, 60-64 (physical reserve)
    5.   Sept. 14   Local pairwise sequence alignment.
    Semiglobal alignment. Affine gap penalties.
    Lecture outline
    Alignment examples
  • Local sequence alignment notes,
      courtesy Dr. M. Singh, Princeton University
  • Setubal and Meidanis, 55-57, 64-66 (electronic reserve)
  • Durbin, pp. 23-24, 29-30 (electronic reserves)
    6.   Sept. 16   Global Multiple Sequence Alignment
    Lecture outline

  • Setubal and Meidanis, 69-72 (electronic reserve)
  • Multiple sequence alignment notes, I
  • Durbin, 6.1-6.4 (electronic reserves)
  • On the Design of Optimization Criteria for MSA, Durand and Farach-Colton, in Biological Evolution and Statistical Physics, M. Laessig and A. Valleriani, Eds., Springer Verlag, 2002
    7.  Sept. 21 Global MSA summary, Introduction to class projects
    PS2 handed out. Due Sept 30.
  • Multiple sequence alignment notes, II,
      courtesy Dr. M. Singh, Princeton University
  • Strategies for multiple sequence alignment, Nicholas HB Jr, Ropelewski AJ, Deerfield DW 2nd, Biotechniques 2002 Mar;32(3):572-4 (electronic reserve)
    8.   Sept. 23 Introduction to Phylogeny reconstruction, Parsimony
    Newick tree format
    Durbin, et al: (electronic reserves)
    7.1, 7.2:  Background on trees
    7.4:  Parsimony
      Parsimony, nice examples
  • Mount, pp. 248-254 (physical reserve)

    9.   Sept. 28 Phylogeny Reconstruction
     Distance-based methods.
      Distance-based methods
  • Durbin, et al: 7.3 (electronic reserves)
  • Phylogeny notes,
      courtesy Dr. M. Singh, Princeton University
    10. Sept. 30 Phylogeny Reconstruction
     Distance-based methods.
    Lecture outline
     UPGMA algorithm
     NJ algorithm
  • PS2 due.
  • PS3 handed out. Due Oct. 7.
  • PS3 solutions
    11. Oct. 5 Phylogeny Reconstruction
     Probabilistic models of evolution (Jukes-Cantor);  Correcting for multiple substitutions.
    Lecture outline
    Markov Chain background
    Ewens and Grant, 4.4-4.8
    Durbin et al., 3.1 (electronic reserves)
    Probabilistic models of evolution
    Durbin, et al: 8.1, 8.2 (electronic reserves)
    Phylogeny notes,  courtesy Dr. M. Singh, Princeton University
     
    12. Oct. 7 Phylogeny Reconstruction
     Maximum Likelihood;  Comparison of methods, Evaluation of results

    Lecture outline

  • PS3 due.
  • Durbin, et al: (electronic reserves)
    8.3, 8.4:  Maximum Likelihood
    Complexity results:
  • On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics), Agarwala et al. (SODA '96) (electronic reserve)
  • Efficient Algorithms for Inverting Evolution, Farach and Kannan, (STOC '96)
    13. Oct. 12 Local multiple sequence alignment
    Online protein domain databases:
      CDD: Conserved Domain Database
      CDART: Conserved Domain Architecture Retrieval Tool

     One paragraph project description due.
       
    14. Oct. 14 Local MSA:
  • PSSMs: PSSM example,   A PSSM with pseudocounts
  • Gibbs sampler
  • Motifs and Profile Analysis,
      courtesy Dr. M. Singh, Princeton University
  • Durbin, et al: p. 102 (electronic reserves)
  • Pseudocounts:
  • Durbin, et al: 5.6 (electronic reserves)
    15. Oct. 19 Midterm Exam
    This exam is closed book. You may bring two pages (or one page, front and back) of your own notes.
       
    16. Oct. 21   No class    
    17. Oct. 26   Project proposals due.
      Hidden Markov Models I
    Guest lecturer: Rose Hoberman.
      Lecture notes for HMMs I and II
    Introduction to Markov models
    Durbin, pp 46-55.
    Ewens and Grant, pp. 327-329 Electronic reserves.
    Viterbi, Forward, Backward algorithms
    Durbin, pp 55 - 61.
    Ewens and Grant, pp. 329-332 Electronic reserves.
      Hidden Markov Models in Computational Biology: Applications to Protein Modeling,
    Krogh et al., JMB 235, pp. 1501-1531 (1994).
    Available through electronic reserves.
    18. Oct. 28  Hidden Markov Models II
    Guest lecturer: Rose Hoberman.   Lecture notes for HMMs I and II
    Profile HMMs
    Durbin, pp 100 - 113.
    Ewens and Grant, pp. 335-337 Electronic reserves.
     
    19. Nov. 2 Profile HMMs   Lecture notes
    HMM topology: Durbin, pp 61-71 Electronic reserves.  
    20. Nov. 4
  • HMMs: Parameter estimation, MSA. (Lecture notes)
  • Introduction to scoring matrices (Lecture notes)
  • Parameter estimation, Baum-Welch algorithm
    Durbin, pp 61-71
    Ewens and Grant, pp. 329-332 Electronic reserves.
    Multiple alignment using HMMs
    Ewens and Grant, pp. 337 - 339 Electronic reserves.
     
    21. Nov. 9 Substitution Matrices
      PAM matrices, BLOSUM matrices
     Lecture notes,   Scoring systems

    PS4 handed out. Due Nov 18.
    PS4 solutions
    Substitution matrices:
    Setubal and Meidanis, 80-84 (electronic reserve)
    Mount, pp. 76-89 (electronic reserve)
    Durbin et al, pp 14-16 (electronic reserves)
    BLOSUM Matrices:
    Ewens and Grant, 6.5.2.
    Amino acid substitution matrices from protein blocks, Henikoff S, Henikoff JG., PNAS 89(22):10915-9, 1992 (electronic reserve)

     
    22. Nov. 11   Substitution matrices cont'd
     Lecture notes

       
    23. Nov. 16   Database searching; BLAST
     Lecture notes

      BLAST home page

      BLAST Tutorial page  Recommended for students unfamiliar with BLAST
    Data Base Searching
    Mount, pp. 282-291 (electronic reserve)

    BLAST
    Setubal and Meidanis, 84-87 (electronic reserve)
    Basic local alignment search tool, Altschul et al., J. Mol. Biol., 1990 (electronic reserve)
     
    24. Nov. 18 PS4 due in class.
    BLAST; statistics of local, ungapped alignments.
     Lecture notes

    PS5 handed out. Due Dec 2nd.
    PAM 30,   PAM 250

    PS5 solutions
    The statistics of sequence similarity scores, S. F. Altschul

    Strategies for searching sequence databases, Nicholas HB Jr, Ropelewski AJ, Deerfield DW 2nd, Biotechniques 2000 Jun;28(6):1174-8 (electronic reserve)
    Blast statistics:
    Amino acid substitution matrices from an information theoretic perspective, S. F. Altschul, J. Mol. Biol., 219:555-565, 1991 (electronic reserve)
    A protein alignment scoring system sensitive at all evolutionary distances, S. F. Altschul, J. Mol. Evol., 36:290-300, 1993 (electronic reserve)
    Statistical Methods in Bioinformatics, W. Ewens and G. Grant (Physical reserves)

    Other BLAST references
    25. Nov. 23 Gapped BLAST
     Lecture notes

      Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Altschul et al., Nucleic Acids Research, 1997, pp. 3389-3402 (electronic reserve)

     
      Nov. 25 No class (Thanksgiving Holiday)    
    26. Nov. 30 Prokaryotic Gene Finding
      Lecture notes
     
  • Defining genes in the genomics era.
    Snyder and Gerstein, Science (2003) 300(5617):258-60.
  • Gene Discovery in DNA Sequences
    S. Salzberg, IEEE 1999 (electronic reserve)
  • A hidden Markov model that finds genes in E. coli DNA, A. Krogh et al., NAR 1994 (electronic reserve)
  • Assessment of protein coding measures
    J.W. Fickett and C.S. Tung, NAR 1992 (electronic reserve)
  • Distinctive sequence features in protein coding genic non-coding, and intergenic human DNA, R. Guigo and J.W. Fickett, JMB 1995 (electronic reserve)
    27. Dec. 2  Eukaryotic Gene Finding
    Lecture notes

    PS5 due in class.
    Yeast rises again.
    S. Salzberg, Nature (2003) 423, 233-234
  • Prediction of Complete Gene Structures in Human Genomic DNA, C. Burge and S. Karlin, JMB 1997 (electronic reserve)
  • Ewens and Grant, pp. 340-346.

  • Evaluation of Gene Structure Prediction Programs, M. Burset and R. Guigo, Genomics 1996 (electronic reserve)
    28. Dec. 7 Project presentations
    29. Dec. 9 Project presentations

       Project final papers due.
       
    30. Friday, Dec. 17
    Final Exam: 8:30 - 11:30, Porter Hall A18C
    This exam is closed book. You may use two 8.5x11 pages of your own notes. Bring a calculator.

    Gene finding study questions

       
    To view the online lectures, you will need the QuickTime plug-in installed in your browser and selected as the player for all media files. The QuickTime player for PC or Mac can be downloaded free of charge at http://www.apple.com/quicktime/download/index.html.



    Last modified: September 7, 2004.
    Maintained by Dannie Durand (durand@cs.cmu.edu).