Lecture 14: Regular Expressions

Regular expressions, and the finite-state automata that underlie them, are useful in many applications and are central to text-processing languages and tools such as `awk`, `Perl`, `emacs`, and `grep`.

Regular expression pattern matching has a simple and elegant implementation in SML using continuation passing.
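As a rough illustration of the continuation-passing idea, here is a minimal sketch of such a matcher. The constructor names (`Char`, `One`, `Zero`, `Times`, `Plus`, `Star`), the helper `accepts`, and the `cs <> cs'` guard in the `Star` case are assumptions made for this sketch; the code in the course notes may differ in its details.

```sml
(* Hypothetical datatype for regular expressions; names are assumptions. *)
datatype regexp =
    Char of char               (* a single character *)
  | One                        (* the empty string *)
  | Zero                       (* the empty language *)
  | Times of regexp * regexp   (* concatenation *)
  | Plus of regexp * regexp    (* alternation *)
  | Star of regexp             (* zero or more repetitions *)

(* match : regexp -> char list -> (char list -> bool) -> bool
   match r cs k tries to split cs into a prefix in the language of r
   and a suffix that the continuation k accepts. *)
fun match (Char a) cs k =
      (case cs of
           [] => false
         | c :: cs' => a = c andalso k cs')
  | match One cs k = k cs
  | match Zero _ _ = false
  | match (Times (r1, r2)) cs k =
      (* match r1 first, then hand the remaining input to r2 *)
      match r1 cs (fn cs' => match r2 cs' k)
  | match (Plus (r1, r2)) cs k =
      match r1 cs k orelse match r2 cs k
  | match (Star r) cs k =
      (* either stop repeating, or match r once more; the guard requires
         that r consumed some input before repeating *)
      k cs orelse
      match r cs (fn cs' => cs <> cs' andalso match (Star r) cs' k)

(* accepts r s is true when the whole string s matches r:
   the initial continuation succeeds only on empty leftover input. *)
fun accepts r s = match r (String.explode s) List.null
```

For example, `accepts (Times (Char #"a", Star (Char #"b"))) "abbb"` evaluates to `true`, while the same expression applied to `"abba"` evaluates to `false`. The guard in the `Star` case is one way to keep the matcher from looping on expressions such as `Star One`, where the inner expression can match without consuming any input.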

Key Concepts

• Formal language
• Finite-state automaton
• Regular expression
• Continuation passing
• Proof-directed debugging

Example evaluation of the matcher

The notes linked below discuss regular expressions.

The notes also discuss proofs of correctness, a topic we will examine during the next lecture. The two sets of notes approach proofs of correctness for our regular expression matcher in slightly different ways:

• The first set of notes proves that the matcher returns true if and only if it is given 'good' input. Here 'good' means that the input string can be split into a prefix and a suffix, such that the prefix is in the language of the given regular expression and the given continuation returns true when called on the suffix. (See the specs for match. Also note that the actual code converts strings to lists of characters, for simplicity.)
• The second set of notes shows that the matcher returns true if it is given 'good' input and returns false otherwise.
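The two statements can be written side by side. The notation here is an assumption made for illustration, not taken from either set of notes: `Good` is a name for the 'good' input condition, `L(r)` is the language of `r`, and `@` is list append.

```latex
\[
\mathit{Good}(r, cs, k) \;:\Longleftrightarrow\;
\exists\, p, s.\;\; cs = p \mathbin{@} s \;\wedge\; p \in L(r) \;\wedge\; k\,s = \mathtt{true}
\]
\[
\mathrm{(1)}\quad \mathtt{match}\; r\; cs\; k = \mathtt{true} \iff \mathit{Good}(r, cs, k)
\]
\[
\mathrm{(2)}\quad \mathit{Good}(r, cs, k) \implies \mathtt{match}\; r\; cs\; k = \mathtt{true},
\qquad
\neg\,\mathit{Good}(r, cs, k) \implies \mathtt{match}\; r\; cs\; k = \mathtt{false}
\]
```

Read this way, (1) rules out returning true on input that is not good, whereas (2) additionally requires the matcher to terminate with false on such input.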

These perspectives differ slightly and lead to slightly different proof techniques. Suppose that the matcher and all continuations involved are total, i.e., always return either true or false. (This requires proof, but let's assume it for now.) In that case, the two perspectives are logically equivalent, and which one to pick is largely a matter of taste and convenience. Previous experience in 15-150 suggests that the first perspective, namely "matcher returns true iff 'good' input," is conceptually simpler.

The first set of notes works out a correctness proof in detail, using the first, conceptually simpler perspective. It is a long proof, but an excellent template for how to prove facts about the regular expression matcher, and a useful reference when doing a homework assignment.

The second set of notes is useful in part because of its brevity. It is a good way to get a concise overall perspective on the key issues involved in regular expression matching. These notes only outline a proof, so do not use them as a template for 15-150 assignments.

The second set of notes also discusses standardization of regular expressions.