Lectures: M, W 10:00 - 11:20 (room A165)
Recitations: T 10:30 - 11:20 (room A156B)
Class Webpage: http://qatar.cmu.edu/course/15-212
Bulletin Boards:
This course introduces students who have experience with basic data structures and algorithms to more advanced skills, concepts, and techniques in programming and computer science in general. This will be accomplished along three dimensions.
The course relies extensively on the programming language Standard ML (SML). The particular implementation we will be working with is Standard ML of New Jersey (SML/NJ), version 110.59.
A reference build has been made available on the Unix cluster. To run it, you need to log in to your Unix account. On Windows, you do this by launching PuTTY and specifying unix.qatar.cmu.edu as the machine name. When the PuTTY window comes up, type sml, do your work, and hit CTRL-D when you are done.
You can edit your files directly under Unix (the easiest way is to use Emacs - see this tutorial), or you can edit them on a campus machine and put them on your "I" drive, or you can edit them on your local machine and transfer them to the Unix servers.
Useful documentation can be found on the SML/NJ web site. The following two files will be particularly useful:
If you want, you can install a personal copy of SML/NJ on your laptop. To do this, download this file and follow these instructions. Personal copies are for your convenience: all software will be evaluated on the reference environment on unix.qatar.cmu.edu. You need to make sure that your homework assignments work there before submitting them.
No textbook is required, and just a few lectures have handouts: what a great reason to come to class! It is in your interest to read the handouts before class. Most lectures in the class schedule below reference parts of Professor Harper's forthcoming book: they are relevant to the topic of the class, but we will not necessarily follow them strictly, or at all.
The code presented in each class is available electronically by following the "code" links in the class schedule below.
| Mon 15 Jan. Lecture 1 |
Welcome and Course Introduction
Evaluation and Typing
We outline the course, its goals, and various administrative issues. We also introduce the language ML, which is used throughout the course.
|
| Tue 16 Jan. Recitation |
Practice, Style, Hints
|
| Wed 17 Jan. Lecture 2 |
Binding, Scope, and Functions
We introduce declarations which evaluate to environments. An environment collects a set of bindings of variables to values which can be used in subsequent declarations or expressions. We also discuss the rules of scope which explain how references to identifiers are resolved. This is somewhat tricky for recursive function declarations.
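For instance (a minimal sketch in SML; the identifiers are ours, not code from class):

    (* A declaration evaluates to an environment: x is bound to 3,
       and the binding is visible in later declarations. *)
    val x = 3
    val y = x + 1                        (* y = 4 *)

    (* An inner binding shadows the outer one inside the let. *)
    val z = let val x = 10 in x + y end  (* z = 14; outer x unchanged *)

    (* "fun" makes the function's name visible in its own body,
       which is what permits the recursive call. *)
    fun fact n = if n = 0 then 1 else n * fact (n - 1)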
|
| Mon 22 Jan. Lecture 3 |
Recursion and Induction
We review the methods of mathematical and complete induction and show how they can be applied to prove the correctness of ML functions. Key is an understanding of the operational semantics of ML. Induction can be a difficult proof technique to apply, since we often need to generalize the theorem we want to prove, before the proof by induction goes through. Sometimes, this requires considerable ingenuity. We also introduce clausal function definitions based on pattern matching.
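As an illustration (our example, not necessarily the one used in class), here is a clausal definition whose correctness is proved by induction:

    (* pow b n computes b^n for n >= 0; the two clauses are
       selected by pattern matching on the second argument. *)
    fun pow (b : int) 0 = 1
      | pow b n = b * pow b (n - 1)

The induction mirrors the clauses: the base case handles n = 0, and the inductive step assumes the claim for n - 1.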
|
| Tue 23 Jan. Recitation |
Scoping in recursive functions; Complete induction |
| Wed 24 Jan. Lecture 4 |
Datatypes, Patterns, and Lists
One of the most important features of ML is that it allows the definition of new types with so-called datatype declarations. This means that programs can be written to manipulate the data in a natural representation rather than in complex encodings. This goes hand-in-hand with clausal function definitions using pattern matching on the given data types. We introduce lists and polymorphic datatypes and functions.
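A sketch of both ideas (the constructor and function names are ours):

    (* A new type of binary trees, defined once for all element
       types 'a: the datatype is polymorphic. *)
    datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

    (* A polymorphic, clausal function over the built-in lists. *)
    fun len [] = 0
      | len (_ :: xs) = 1 + len xs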
|
| Mon 29 Jan. Lecture 5 |
Structural Induction and Tail Recursion
We discuss the method of structural induction on recursively defined types. This technique parallels standard induction on predicates, but has a unique character of its own, and arises often in programming. We also discuss tail recursion, a form of recursion that is somewhat like the use of loops in imperative programming. This form of recursion is often especially efficient and easy to analyze. Accumulator arguments play an important role in tail recursion. As examples we consider recursively defined lists and trees.
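For example (a sketch; compare the two versions of summing a list):

    (* Not tail recursive: the addition happens after the
       recursive call returns. *)
    fun sum [] = 0
      | sum (x :: xs) = x + sum xs

    (* Tail recursive: the partial result travels in the
       accumulator argument, and the recursive call is last. *)
    fun sum' ([], acc) = acc
      | sum' (x :: xs, acc) = sum' (xs, acc + x)

    fun sumTail l = sum' (l, 0)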
Tonight 2:12am Doha time: Homework 1 due |
| Tue 30 Jan. Recitation |
Lists; Equality types |
| Wed 31 Jan. Lecture 6 |
Higher Order Functions and Staged Computation
We discuss higher order functions, specifically, passing functions as arguments, returning functions as values, and mapping functions over recursive data structures. Key to understanding functions as first-class values is understanding the lexical scoping rules. We discuss staged computation based on function currying.
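A small sketch of both themes (names are ours):

    (* A function passed as an argument and mapped over a list. *)
    fun map f [] = []
      | map f (x :: xs) = f x :: map f xs

    (* A curried function: applying "times" to one argument
       returns a function, so part of the work can be staged. *)
    fun times (x : int) y = x * y
    val double = times 2       (* the first stage, fixed once *)
    val six = double 3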
|
| Mon 05 Feb. Lecture 7 |
Data Structures
|
| Tue 06 Feb. Recitation |
Currying, folding, and mapping |
| Wed 07 Feb. Lecture 8 |
Representation Invariants
We demonstrate a complicated representation invariant using Red/Black Trees. The main lesson is to understand the subtle interactions of invariants, data structures, and reliable code production. In order to write code satisfying a strong invariant, it is useful to proceed in stages. Each stage satisfies a simple invariant, and is provably correct. Together the stages satisfy the strong invariant.
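As a sketch of the representation (not necessarily the exact code from class):

    (* Colors record the invariant, not the data.  Invariant:
       no Red node has a Red child, and every path from the root
       to an Empty leaf passes through the same number of Black
       nodes; together these bound the depth to O(log n). *)
    datatype color = Red | Black
    datatype 'a rbt = Empty
                    | Node of color * 'a rbt * 'a * 'a rbt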
|
| Mon 12 Feb. Lecture 9 |
Continuations
Continuations act as "functional accumulators." The basic idea of the technique is to implement a function f by defining a tail-recursive function f' that takes an additional argument, called the continuation. This continuation is a function; it encapsulates the computation that should be done on the result of f. In the base case, instead of returning a result, we call the continuation. In the recursive case we augment the given continuation with whatever computation should be done on the result. Continuations can be used to advantage for programming solutions to a variety of problems. In today's lecture we'll look at a simple example where continuations are used to efficiently manage a certain pattern of control. We'll see a related and more significant example in an upcoming lecture when we look at regular expressions.
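As a minimal sketch (our names), summing a list in continuation-passing style:

    (* The continuation k encapsulates what remains to be done
       with the result; the helper is tail recursive. *)
    fun sumk ([], k) = k 0
      | sumk (x :: xs, k) = sumk (xs, fn r => k (x + r))

    fun sum l = sumk (l, fn r => r)   (* initial continuation *)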
Tonight 2:12am Doha time: Homework 2 due |
| Tue 13 Feb. Recitation |
Review |
| Wed 14 Feb. Lecture 10 |
Regular Expressions
Regular expressions, and their underlying finite-state automata, are useful in many different applications, and are central to text processing languages and tools such as awk, Perl, emacs, and grep. Regular expression pattern matching has a simple and elegant implementation in SML using continuation passing.
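The core of such a matcher might look like this (a sketch; the constructor names are ours):

    datatype regexp = Zero | One | Char of char
                    | Plus of regexp * regexp
                    | Times of regexp * regexp
                    | Star of regexp

    (* match r cs k: does a prefix of cs match r, such that the
       continuation k succeeds on the remaining characters?
       (For Star to terminate, r should not accept the empty
       string.) *)
    fun match Zero _ _ = false
      | match One cs k = k cs
      | match (Char _) [] _ = false
      | match (Char c) (c' :: cs) k = c = c' andalso k cs
      | match (Plus (r1, r2)) cs k =
          match r1 cs k orelse match r2 cs k
      | match (Times (r1, r2)) cs k =
          match r1 cs (fn cs' => match r2 cs' k)
      | match (Star r) cs k =
          k cs orelse match r cs (fn cs' => match (Star r) cs' k)

    fun accepts r s = match r (String.explode s) List.null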
|
| Mon 19 Feb. Lecture 11 |
Review |
| Tue 20 Feb. Recitation |
Tail Recursion vs Continuations |
| Wed 21 Feb. | Midterm |
| Mon 26 Feb. Lecture 12 |
Combinators
Combinators are functions of functions, that is, higher-order functions used to combine functions. One example is ML's composition operator o. The basic idea is to think at the level of functions, rather than at the level of values returned by those functions. Combinators are defined using the pointwise principle. Currying makes this easy in ML. We first discuss combinators of functions of type int -> int. Then we discuss rewriting our regular expression matcher using combinators. We use staging: the regular expression pattern matching is in one stage, the character functions are in another.
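For example (a sketch with our names):

    (* Built-in composition is the archetypal combinator:
       (f o g) x = f (g x). *)
    val addOne = fn x => x + 1
    val square = fn (x : int) => x * x
    val h = square o addOne            (* h 2 = 9 *)

    (* Pointwise addition of functions int -> int: we combine
       the functions themselves, not particular results. *)
    fun plus (f : int -> int) g = fn x => f x + g x
    val p = plus square addOne         (* p 2 = 4 + 3 = 7 *)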
|
| Tue 27 Feb. Recitation |
Searching, failing, and stopping early |
| Wed 28 Feb. Lecture 13 |
Exceptions, n-Queens
Exceptions play an important role in the system of static and dynamic checks that make SML a safe language. Exceptions are the first type of effect that we will encounter; they may cause an evaluation to be interrupted or aborted. We have already seen simple uses of exceptions in the course, primarily to signal that invariants are violated or exceptional boundary cases are encountered. We now look a little more closely at what exceptions are and how they can be used. In addition to signaling error conditions, exceptions can sometimes also be used in backtracking search procedures or other patterns of control where a computation needs to be partially undone.
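A small sketch of both uses (our names):

    exception NotFound

    (* Signal an exceptional case... *)
    fun first p [] = raise NotFound
      | first p (x :: xs) = if p x then x else first p xs

    (* ...and undo a failed branch by handling the exception,
       the basic move in backtracking search. *)
    fun firstOr p l default =
        first p l handle NotFound => default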
|
| Mon 05 Mar. Lecture 14 |
Functors and Substructures
A functor is a parameterized module that acts as a kind of function which takes zero or more structures as arguments and returns a new structure as a result. Functors greatly facilitate hierarchical organization in large programs. In particular, as discussed in the next few lectures, they can enable a clean separation between the details of particular definitions and the higher-level structure, allowing the implementation of "generic" algorithms that are easier to debug and maintain, and that maximize code reuse.
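A sketch of the idea (our names, not the class code):

    signature ORDERED =
    sig
      type t
      val compare : t * t -> order
    end

    (* A functor: a structure parameterized by any ordered type. *)
    functor SortedInsert (O : ORDERED) =
    struct
      fun insert (x, []) = [x]
        | insert (x, y :: ys) =
            case O.compare (x, y) of
                GREATER => y :: insert (x, ys)
              | _ => x :: y :: ys
    end

    structure IntSorted =
      SortedInsert (struct type t = int
                           val compare = Int.compare
                    end)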
Tonight 2:12am Doha time: Homework 3 due |
| Tue 06 Mar. Recitation |
Ascription, where, and functors |
| Wed 07 Mar. Lecture 15 |
Game Tree Search
In this lecture we give an example of modularity and code reuse by illustrating a generic game tree search algorithm. By carefully specifying the interface between the game and the search procedure, the code can be written very generally, yet still applied to a wide variety of games. We illustrate this through a very simple minimax game tree search algorithm, but the underlying concepts and techniques become even more important as the sophistication of the search algorithm increases.
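The interface might be organized along these lines (a sketch; the names are ours):

    signature GAME =
    sig
      type state
      type move
      val moves : state -> move list        (* legal moves *)
      val play : state * move -> state
      val estimate : state -> int           (* static evaluation
                                               for the maximizer *)
    end

    (* A minimax search functor would then take a GAME as its
       argument and work unchanged for any game implementing it. *)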
|
| Mon 12 Mar. Lecture 16 |
Mutation and State
The programming techniques used so far in the course have, for the most part, been "purely functional". Some problems, however, are more naturally addressed by keeping track of the "state" of an internal machine. Typically this requires the use of mutable storage. ML supports mutable cells, or references, that store values of a fixed type. The value in a mutable cell can be initialized, read, and changed (mutated), and these operations result in effects that change the store. Programming with references is often carried out with the help of imperative techniques. Imperative functions are used primarily for the way they change storage, rather than for their return values.
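For instance (a minimal sketch):

    (* A mutable cell holding an int: created with ref, read
       with !, and changed with := . *)
    val counter : int ref = ref 0

    (* An imperative function, used for its effect on the store
       rather than for its return value (which is just unit). *)
    fun tick () = counter := !counter + 1

    val () = (tick (); tick ())
    val two = !counter                     (* 2 *)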
|
| Tue 13 Mar. Recitation |
Arrays and mutable state |
| Wed 14 Mar. Lecture 17 |
Ephemeral Data Structures
Previously, within the purely functional part of ML, we saw that all values were persistent. At worst, a binding might shadow a previous binding. As a result, our queues and dictionaries were persistent data structures: adding an element to a queue did not change the old queue; instead it created a new queue, possibly sharing values with the old queue, but not modifying it in any way.
Now that we are able to create cells and modify their contents, we can create ephemeral data structures: data structures that change over time. The main advantage of such data structures is their ability to maintain state as a shared resource among many routines. Another advantage, in some cases, is the ability to write code that is more time-efficient than purely functional code. The disadvantages are error and complexity: our routines may accidentally and irreversibly change the contents of a data structure, and variables may be aliases for each other. As a result it is much more difficult to prove the correctness of code involving ephemeral data structures. As always, it is a good idea to keep mutation to a minimum and to be careful about enforcing invariants.
We present two examples. First, we consider a standard implementation of hash tables, using arrays to implement generic hash tables as a functor parameterized by an abstract hashable equality type. Second, we revisit the queue data structure, now defining an ephemeral queue. The queue signature clearly indicates that internal state is maintained. Our implementation uses a pair of reference cells containing mutable lists, and highlights some of the subtleties involved when reasoning about references.
We end the lecture with a few words about ML's value restriction, which is enforced by the ML compiler in order to avoid runtime type errors: all expressions must have well-defined, lexically determined static types.
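The ephemeral queue might look like this (a simplified sketch, ours, not necessarily the class code):

    structure EphQueue =
    struct
      (* Two cells: the front list, and the back list in
         reverse order.  Operations update the queue in place. *)
      type 'a queue = 'a list ref * 'a list ref

      fun new () : 'a queue = (ref [], ref [])

      fun enq ((_, back) : 'a queue, x) = back := x :: !back

      fun deq ((front, back) : 'a queue) =
          case !front of
              x :: xs => (front := xs; SOME x)
            | [] =>
                (case rev (!back) of
                     [] => NONE
                   | x :: xs =>
                       (front := xs; back := []; SOME x))
    end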
|
| Mon 19 Mar. Lecture 18 |
Streams, Demand-Driven Computation
Functions in ML are evaluated eagerly, meaning that the arguments are reduced before the function is applied. An alternative is for function applications and constructors to be evaluated in a lazy manner, meaning that expressions are evaluated only when their values are needed in a further computation. Lazy evaluation can be implemented by "suspending" computations in function values. This style of evaluation is essential when working with potentially infinite data structures, such as streams, which arise naturally in many applications. Streams are lazy lists whose values are determined by suspended computations that generate the next element of the stream only when forced to do so.
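A sketch of streams as suspended computations (our names):

    (* A stream is a suspended computation that, when forced,
       exposes either the end or an element and the rest. *)
    datatype 'a stream = Stream of unit -> 'a front
    and 'a front = Empty | Cons of 'a * 'a stream

    fun force (Stream f) = f ()

    (* The infinite stream n, n+1, n+2, ...: nothing is computed
       until some element is actually demanded. *)
    fun from n = Stream (fn () => Cons (n, from (n + 1)))

    fun take (_, 0) = []
      | take (s, n) =
          case force s of
              Empty => []
            | Cons (x, s') => x :: take (s', n - 1)

    val firstFive = take (from 0, 5)    (* [0, 1, 2, 3, 4] *)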
Tonight 2:12am Doha time: Homework 4 due |
| Tue 20 Mar. Recitation |
Operations on streams; Sequences and flip-flops |
| Wed 21 Mar. Lecture 19 |
Streams, Laziness and Memoization
We continue with streams, and complete our implementation by introducing a memoizing delay function. Memoization ensures that a suspended expression is evaluated at most once. When a suspension is forced for the first time, its value is stored in a reference cell and simply returned when the suspension is forced again. The implementation that we present makes a subtle and elegant use of a "self-modifying" code technique with circular references.
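One way to write such a memoizing delay (a sketch of the standard trick, not necessarily the exact code from class):

    exception Circular

    (* The returned suspension overwrites its own cell: first
       with a guard against circular forcing, then with the
       computed value, so f runs at most once. *)
    fun delay (f : unit -> 'a) : unit -> 'a =
        let
          val cell = ref (fn () => raise Circular)
          fun compute () =
              let
                val () = cell := (fn () => raise Circular)
                val v = f ()
              in
                cell := (fn () => v); v
              end
          val () = cell := compute
        in
          fn () => !cell ()
        end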
|
| Mon 26 Mar | No class (Spring Break) |
| Tue 27 Mar | |
| Wed 28 Mar | |
| Mon 2 Apr. Lecture 20 |
Lexical Analysis and Grammars
Many applications require some form of tokenization or lexical analysis to be carried out as a preprocessing step. Examples include compiling programming languages, processing natural languages, or manipulating HTML pages to extract structure. As an example, we study a lexical analyzer for a simple language of arithmetic expressions.
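The lexer might be organized along these lines (a sketch; the token names are ours):

    datatype token = NUM of int | PLUS | TIMES | LPAREN | RPAREN

    fun lex [] = []
      | lex (#"+" :: cs) = PLUS :: lex cs
      | lex (#"*" :: cs) = TIMES :: lex cs
      | lex (#"(" :: cs) = LPAREN :: lex cs
      | lex (#")" :: cs) = RPAREN :: lex cs
      | lex (c :: cs) =
          if Char.isSpace c then lex cs
          else if Char.isDigit c then lexNum (0, c :: cs)
          else raise Fail ("unexpected character: " ^ String.str c)

    (* Accumulate a run of digits into a single NUM token. *)
    and lexNum (n, c :: cs) =
          if Char.isDigit c
          then lexNum (10 * n + (Char.ord c - Char.ord #"0"), cs)
          else NUM n :: lex (c :: cs)
      | lexNum (n, []) = [NUM n]

    val toks = lex (String.explode "12 + 3 * (4)")
    (* [NUM 12, PLUS, NUM 3, TIMES, LPAREN, NUM 4, RPAREN] *)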
|
| Tue 03 Apr. Recitation |
Languages |
| Wed 04 Apr. Lecture 21 |
Grammars and Parsing
Context-free grammars arise naturally in a variety of applications. The "Abstract Syntax Charts" in programming language manuals are one instance. The underlying machine for a context-free language is a pushdown automaton, which maintains a read-write stack that allows the machine to "count".
|
| Mon 09 Apr. Lecture 22 |
More Parsing and Evaluation (maybe)
In this lecture we continue our discussion of context-free grammars, and demonstrate their role in parsing. Shift-reduce parsing uses a stack to delay application of rewrite rules, enabling operator precedence to be enforced. Recursive descent parsing is another style that uses recursion in a way that mirrors the grammar productions. Although parser generator tools exist for restricted classes of grammars, a direct implementation can allow greater flexibility and better error handling. We present an example of a shift-reduce parser for a grammar of arithmetic expressions.
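For concreteness, here is a recursive descent sketch for a small expression grammar (our code; the lecture's worked example is a shift-reduce parser):

    datatype token = NUM of int | PLUS | TIMES | LPAREN | RPAREN
    exception Parse

    (* Grammar (with * binding tighter than +):
         exp    ::= term   [ "+" exp ]
         term   ::= factor [ "*" term ]
         factor ::= NUM | "(" exp ")"
       Each nonterminal becomes a function that returns the value
       of the phrase it consumed, plus the leftover tokens. *)
    fun exp ts =
        let val (v, ts') = term ts
        in
          case ts' of
              PLUS :: ts'' =>
                let val (w, rest) = exp ts'' in (v + w, rest) end
            | _ => (v, ts')
        end

    and term ts =
        let val (v, ts') = factor ts
        in
          case ts' of
              TIMES :: ts'' =>
                let val (w, rest) = term ts'' in (v * w, rest) end
            | _ => (v, ts')
        end

    and factor (NUM n :: ts) = (n, ts)
      | factor (LPAREN :: ts) =
          (case exp ts of
               (v, RPAREN :: ts') => (v, ts')
             | _ => raise Parse)
      | factor _ = raise Parse

    fun parse ts = case exp ts of (v, []) => v | _ => raise Parse

    val seven = parse [NUM 1, PLUS, NUM 2, TIMES, NUM 3]   (* 7 *)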
|
| Tue 10 Apr. Recitation |
TBA |
| Wed 11 Apr. Lecture 23 |
Evaluation
We now put together lexical analysis and parsing with evaluation. The result is an interpreter that evaluates arithmetic expressions directly, rather than by constructing an explicit translation of the code into an intermediate language, and then into machine language, as a compiler does. Our first example uses the basic grammar of arithmetic expressions, interpreting them in terms of operations over the rational numbers. In this and the next lecture we extend this simple language to include conditional statements, variable bindings, function definitions, and recursive functions.
Tonight 2:12am Doha time: Homework 5 due |
| Mon 16 Apr. Lecture 24 |
Interpreters and Recursion
We introduce declaration environments, type environments, and value environments, to distinguish between static declarations and runtime evaluations. The parser produces declaration environments. The type-checker uses the declaration environments to build type environments, and thus perform compile-time type-checking. The evaluator uses the declaration environments to build value environments, and thus perform execution-time evaluation. We extend our set of values to include functions. In order to do this properly, we introduce the notion of a closure, which encapsulates the function definition as an expression together with the necessary variable bindings in the value environment.
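A sketch of the closure idea in miniature (our datatypes; the class interpreter is richer and keeps the environments separate):

    datatype exp = Var of string
                 | Num of int
                 | Plus of exp * exp
                 | Fn of string * exp              (* fn x => e *)
                 | App of exp * exp

    datatype value = Int of int
                   | Closure of string * exp * (string * value) list

    fun lookup (x, []) = raise Fail ("unbound variable " ^ x)
      | lookup (x, (y, v) :: env) = if x = y then v else lookup (x, env)

    fun eval (env, Var x) = lookup (x, env)
      | eval (env, Num n) = Int n
      | eval (env, Plus (e1, e2)) =
          (case (eval (env, e1), eval (env, e2)) of
               (Int m, Int n) => Int (m + n)
             | _ => raise Fail "type error")
      | eval (env, Fn (x, e)) = Closure (x, e, env)  (* capture env *)
      | eval (env, App (e1, e2)) =
          (case eval (env, e1) of
               Closure (x, body, env') =>
                 eval ((x, eval (env, e2)) :: env', body)
             | _ => raise Fail "not a function")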
|
| Tue 17 Apr. Recitation |
Decidability, tractability, and tiling |
| Wed 18 Apr. Lecture 25 |
Computability, Part I
In this and the next lecture we discuss the computability of functions in ML. By the Church-Turing thesis this is the same notion of computability as we have in recursion theory, with Turing machines, etc. There are two main ideas to show that certain functions are not computable: diagonalization (which is a direct argument), and problem reduction (which shows that a problem is undecidable by giving a reduction from another undecidable problem).
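A finite analogue of the diagonal argument can even be run (a sketch; the real argument applies the idea to an enumeration of all programs):

    (* Given any list of functions, diag produces a function
       guaranteed to differ from the n-th one at input n. *)
    fun diag (fs : (int -> int) list) : int -> int =
        fn n => (List.nth (fs, n) n + 1)
                handle Subscript => 0

Applied to a purported effective enumeration of all the total ML-definable functions, the same construction would yield a definable function differing from every enumerated one, which is the contradiction at the heart of the diagonal argument.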
|
| Mon 23 Apr. Lecture 26 |
Computability, Part II
Tonight 2:12am Doha time: Homework 6 due |
| Tue 24 Apr. Recitation |
Review for the FINAL |
| Wed 25 Apr. Lecture 27 |
Review for the FINAL |
| TBA | Final exam |
The provided code and the notes were authored by Michael Erdmann (CMU).
| Iliano Cervesato |