Course Description

Dramatic advances in experimental technology and computational analysis are fundamentally transforming the nature and goals of biological research. The emergence of new frontiers in biology, such as evolutionary genomics and systems biology, demands new methodologies that can confront quantitative issues of substantial computational and mathematical sophistication. In this course we will discuss classical approaches and the latest methodological advances in the context of the following biological problems: 1) computational genomics, focusing on gene finding, motif detection, and sequence evolution; 2) analysis of high-throughput biological data, such as gene expression data, focusing on issues ranging from data acquisition to pattern recognition and classification; 3) molecular and regulatory evolution, focusing on phylogenetic inference and regulatory network evolution; and 4) systems biology, concerning how to combine sequence, expression, and other biological data sources to infer the structure and function of different systems in the cell. On the computational side, this course focuses on modern machine learning methodologies for computational problems in molecular biology and genetics, including probabilistic modeling, inference and learning algorithms, pattern recognition, data integration, time series analysis, and active learning.

Students are expected to have successfully completed 10-701 (Machine Learning), or an equivalent class.

Grading

Homework resources and collaboration policy

Homeworks and the exam may contain material that has been covered by papers and webpages. Since this is a graduate class, we expect students to want to learn rather than search online for answers. Homeworks must be done individually: each student must hand in their own answers. It is acceptable, however, for students to collaborate in figuring out answers and to help each other solve the problems. We assume that, as participants in a graduate course, you will take responsibility for making sure you personally understand the solution to any work arising from such collaboration. You must also indicate on each homework the names of those with whom you collaborated.

Late homework policy

Homework regrades policy

If you feel that we have made an error in grading your homework, please return it to the TA in charge with a written explanation, and we will consider your request. Please note that regrading a homework may cause your grade to go up or down.

Homework assignments

We will have four problem sets. Problem sets will consist of both theoretical and programming problems. This is not a computer systems class, so the programming load will be small. Still, we think it is essential to work with real data, since computational biology is an applied field. We will use MATLAB for the programming part.