Course Description: Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No prior knowledge of biology is assumed; programming proficiency is required. Computational Genomics (02-710) is not a prerequisite.
Instructor: Carl Kingsford
Associate Professor, Ray and Stephanie Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University.
Office: GHC 7705
Office Hours: TBD
TA: TBD, TBD@cs
Office Hours: TBD
Syllabus: Here is the syllabus.
- Please read Trapnell et al., "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation," by Tuesday, Nov 19. Read the main paper and Section 4 of the supplementary material (linked at the bottom of the page).
An optional LaTeX template for homework assignments.
Additional resources will be linked as appropriate from the course schedule.
Online lecture notes:
- Carl's Algorithms Slides
- Carl's Data Structures Slides (scroll down)
- Dave Mount's Data Structures Lecture Notes
- Dave Mount's Algorithm Lecture Notes
- CMU 15-210 lecture notes
- CMU 15-451 lecture notes
- Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein
Other web resources: