02-714: String Algorithms (Fall 2013)

Course Information:

Course Description: Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed; programming proficiency is required.

Computational Genomics (02-710) is not a prerequisite.

Instructor: Carl Kingsford
Associate Professor, Ray and Stephanie Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University.

Office: GHC 7705
Office Hours: TBD

Syllabus: Here is the syllabus.



An optional LaTeX template for homeworks.

Additional resources will be linked as appropriate from the course schedule.

Online lecture notes:

Other textbooks:

Computer documentation:

Other web resources: