# 15-150: Principles of Functional Programming

# Lecture 14: Regular Expressions

Regular expressions, and their underlying finite-state automata, are useful in many different applications and are central to text processing languages and tools such as `awk`, `Perl`, `emacs`, and `grep`.

Regular expression pattern matching has a simple and elegant
implementation in SML using continuation passing.
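The continuation-passing idea can be sketched in SML as follows. This is a minimal illustration, not the notes' code. The constructor names (`Zero`, `One`, `Char`, `Plus`, `Times`, `Star`) and the helper `accept` are assumptions made for this sketch. `match r cs k` succeeds when some prefix of `cs` is in the language of `r` and the continuation `k` accepts the rest. The `cs' <> cs` test in the `Star` clause is one common way to force each iteration to consume input; it is assumed here and is not necessarily the notes' choice.

```sml
(* Hypothetical datatype of regular expressions. *)
datatype regexp =
    Zero                      (* matches no string *)
  | One                       (* matches only the empty string *)
  | Char of char              (* matches exactly that one-character string *)
  | Plus of regexp * regexp   (* union *)
  | Times of regexp * regexp  (* concatenation *)
  | Star of regexp            (* Kleene star *)

(* match r cs k: is there a prefix p of cs with p in L(r)
   such that k returns true on the remaining suffix? *)
fun match Zero _ _ = false
  | match One cs k = k cs
  | match (Char a) cs k =
      (case cs of
           [] => false
         | c :: cs' => a = c andalso k cs')
  | match (Plus (r1, r2)) cs k =
      match r1 cs k orelse match r2 cs k
  | match (Times (r1, r2)) cs k =
      (* match r1 on a prefix, then r2 on whatever r1 leaves behind *)
      match r1 cs (fn cs' => match r2 cs' k)
  | match (Star r) cs k =
      (* zero iterations, or one non-empty iteration of r followed by r* *)
      k cs orelse
      match r cs (fn cs' => cs' <> cs andalso match (Star r) cs' k)

(* accept r s: is the whole string s in L(r)?  The initial
   continuation succeeds only on the empty suffix. *)
fun accept r s = match r (String.explode s) List.null

val t1 = accept (Times (Char #"a", Star (Char #"b"))) "abbb"  (* true *)
val t2 = accept (Times (Char #"a", Star (Char #"b"))) "ba"    (* false *)
val t3 = accept (Star (Plus (One, Char #"a"))) "aaa"          (* true *)
```

Dropping the `cs' <> cs` test gives a simpler `Star` clause. That version can fail to terminate when `r` accepts the empty string, for example on `Star One` with a continuation that rejects the input. The guard is one way around this. Restricting attention to regular expressions in a suitable standard form is another.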

### Key Concepts

- Formal language
- Finite-state automaton
- Regular expression
- Continuation passing
- Proof-directed debugging

The notes linked below discuss regular expressions.

The notes also discuss proofs of correctness, a topic we will
examine during the next lecture. The two sets of notes approach
proofs of correctness for our regular expression matcher in slightly
different ways:

- The first set of notes proves that the matcher returns
`true` if and only if it is given 'good' input. Here 'good'
means that the input string can be split into a prefix and a suffix,
such that the prefix is in the language of the given regular
expression and the given continuation returns `true` when
called on the suffix. (See the specs for `match`. Also note
that the actual code converts strings to lists of characters, for
simplicity.)
- The second set of notes shows that the matcher returns
`true` if it is given 'good' input and returns `false`
otherwise.
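The 'good input' condition can be made concrete with a brute-force SML sketch that checks it directly, without continuations, by trying every split of the character list into a prefix and a suffix. The names `splits`, `inL`, and `good` are hypothetical, and the `regexp` datatype is an assumed stand-in for the notes' definition. The check takes exponential time, so it is only a reference for spot-checking a matcher on small inputs. Under the first perspective, `match r cs k` should return `true` exactly when `good r cs k` does.

```sml
(* Assumed regular expression datatype. *)
datatype regexp = Zero | One | Char of char
                | Plus of regexp * regexp | Times of regexp * regexp | Star of regexp

(* splits cs: every (p, s) with p @ s = cs, shortest prefix first *)
fun splits [] = [([], [])]
  | splits (c :: cs) = ([], c :: cs) :: List.map (fn (p, s) => (c :: p, s)) (splits cs)

(* inL r cs: is cs in the language L(r)?  Brute force. *)
fun inL Zero _ = false
  | inL One cs = List.null cs
  | inL (Char a) cs = (cs = [a])
  | inL (Plus (r1, r2)) cs = inL r1 cs orelse inL r2 cs
  | inL (Times (r1, r2)) cs =
      List.exists (fn (p, s) => inL r1 p andalso inL r2 s) (splits cs)
  | inL (Star r) cs =
      (* empty, or a non-empty piece in L(r) followed by more of L(r*) *)
      List.null cs orelse
      List.exists (fn (p, s) => not (List.null p) andalso inL r p andalso inL (Star r) s)
                  (splits cs)

(* good r cs k: cs splits into p and s with p in L(r) and k s = true *)
fun good r cs k = List.exists (fn (p, s) => inL r p andalso k s) (splits cs)

val g1 = good (Char #"a") (String.explode "ab") (fn s => s = [#"b"])          (* true: "a" | "b" *)
val g2 = good (Char #"a") (String.explode "ab") List.null                     (* false *)
val g3 = good (Star (Char #"a")) (String.explode "aab") (fn s => s = [#"b"])  (* true: "aa" | "b" *)
```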

These are slightly different perspectives, and lead to slightly
different proof techniques. Let's suppose that the matcher and all
continuations involved are total, i.e., always return either
`true` or `false`. This requires proof, but let's
suppose we know it. In that case, the two perspectives on how to
prove correctness are logically equivalent. It is largely a matter of
taste and convenience which one to pick. Previous experience in
15-150 suggests that the first perspective, namely "matcher returns
`true` iff 'good' input", is conceptually simpler.
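The equivalence claimed above can be spelled out in a few lines. The notation is introduced here for illustration. $G(r, cs, k)$ abbreviates the 'good input' condition, and $e \hookrightarrow v$ means that $e$ evaluates to $v$.

$$
\begin{aligned}
G(r, cs, k) \;&:\iff\; \exists\, p, s.\; cs = p \mathbin{@} s \;\wedge\; p \in L(r) \;\wedge\; k\ s \hookrightarrow \mathtt{true} \\[4pt]
\text{(P1)}\quad & \mathtt{match}\ r\ cs\ k \hookrightarrow \mathtt{true} \iff G(r, cs, k) \\
\text{(P2)}\quad & G(r, cs, k) \implies \mathtt{match}\ r\ cs\ k \hookrightarrow \mathtt{true}
\quad\text{and}\quad
\neg G(r, cs, k) \implies \mathtt{match}\ r\ cs\ k \hookrightarrow \mathtt{false}
\end{aligned}
$$

From P2 to P1: the "if" half of P1 is the first half of P2. For the "only if" half, suppose `match r cs k` returned `true` while $G$ failed. Then P2 would force it to return `false` as well, which is impossible because evaluation is deterministic.

From P1 to P2: the first half of P2 is the "if" half of P1. If $G$ fails, P1 rules out `true`, and totality leaves only `false`.

In this argument, totality is needed only in the direction from P1 to P2.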

The first set of notes works out a correctness proof in detail,
using the simpler-to-follow proof technique we just mentioned. It is
a long proof, but it is an excellent template for proving facts about
the regular expression matcher and a useful reference when doing
homework assignments. The second set of notes is useful in part
because of its brevity. These notes are a
good way to get a concise overall perspective on the key issues
involved in regular expression matching. The notes only outline a
proof, so do not use them as a template for doing 15-150
assignments.

The second set of notes also discusses standardization of regular
expressions.
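A standardness condition might look like the following SML sketch. One common formulation requires that, for every subexpression `Star r`, the language of `r` does not contain the empty string. That is enough to keep the simple clause `match (Star r) cs k = k cs orelse match r cs (fn cs' => match (Star r) cs' k)` from looping. The notes' exact definition may differ. The names `nullable` and `standard`, and the `regexp` datatype, are hypothetical.

```sml
(* Assumed regular expression datatype. *)
datatype regexp = Zero | One | Char of char
                | Plus of regexp * regexp | Times of regexp * regexp | Star of regexp

(* nullable r: does L(r) contain the empty string? *)
fun nullable Zero = false
  | nullable One = true
  | nullable (Char _) = false
  | nullable (Plus (r1, r2)) = nullable r1 orelse nullable r2
  | nullable (Times (r1, r2)) = nullable r1 andalso nullable r2
  | nullable (Star _) = true

(* standard r: no starred subexpression of r accepts the empty string
   (one possible reading of "standard form") *)
fun standard Zero = true
  | standard One = true
  | standard (Char _) = true
  | standard (Plus (r1, r2)) = standard r1 andalso standard r2
  | standard (Times (r1, r2)) = standard r1 andalso standard r2
  | standard (Star r) = not (nullable r) andalso standard r

val s1 = standard (Star (Char #"a"))                            (* true *)
val s2 = standard (Star (Plus (One, Char #"a")))                (* false: One is nullable *)
val s3 = standard (Times (Star (Char #"a"), Star (Char #"b")))  (* true *)
```

This only checks the condition. Converting an arbitrary regular expression into an equivalent one that passes such a check is the more substantial part and is not shown here.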