Course Information:

Course Description: Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed; programming proficiency is required.

Computational Genomics (02-710) is not a prerequisite.

Instructor: Carl Kingsford
Associate Professor, Ray and Stephanie Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University.
carlk@cs.cmu.edu
Office: GHC 7705
Office Hours: TBD

TA: TBD, TBD@cs
Office Hours: TBD

Syllabus: Here is the syllabus.

Announcements

Please read Trapnell et al., "Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms," by Tuesday, Nov 19. Read the main paper and Section 4 of the supplementary material (linked at the bottom of the page).

Resources

An optional LaTeX template for homeworks.

Additional resources will be linked as appropriate from the course schedule.

Online lecture notes:

Other textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein

Computer documentation:

Other web resources: