02-[46]14: String Algorithms (Fall 2020)

Course Information:

Course Description: Provides an in-depth look at modern algorithms used to process string data, particularly those relevant to genomics. The course will cover the design and analysis of efficient algorithms for processing enormous collections of strings. Topics will include string search; inexact matching; string compression; string data structures such as suffix trees, suffix arrays, and searchable compressed indices; and the Burrows-Wheeler transform. Applications of these techniques in genomics will be presented, including genome assembly, transcript assembly, whole-genome alignment, gene expression quantification, read mapping, and search of large sequence databases. No knowledge of biology is assumed; programming proficiency is required.

Pre-requisites:

Computational Genomics (02-710) is not a pre-requisite. No prior knowledge of biology is assumed.

Instructor: Carl Kingsford
Professor, Computational Biology Department, School of Computer Science, Carnegie Mellon University.

Office: GHC 7719
Office Hours: TBD

TA: TBD, TBD@cs
Office Hours: TBD

Syllabus: Here is the syllabus.

Resources

Piazza for course discussions.

Gradescope for submitting written homeworks.

An optional LaTeX template for homeworks.

Additional resources will be linked as appropriate from the course schedule.

Useful textbooks: