\documentclass[11pt,twoside]{article}

\usepackage{lecnotes}
\input{lmacros}

\newcommand{\course}{15-814: Types and Programming Languages}
\newcommand{\lecturer}{Frank Pfenning}
\newcommand{\lecdate}{Thursday, September 5, 2019}
\newcommand{\lecnum}{2}
\newcommand{\lectitle}{Recursion}

\begin{document}

\maketitle

\section{Introduction}

In this lecture we continue our exploration of the $\lambda$-calculus
and the representation of data and functions on them.  We give
schematic forms to define functions on natural numbers and give
uniform ways to represent them in the $\lambda$-calculus.  We begin
with the \emph{schema of iteration}, then proceed to the more complex
\emph{schema of primitive recursion}, and finally to plain
\emph{recursion}.

\section{The Schema of Iteration}

As we saw in the first lecture, a natural number $n$ is represented by
a function $\overline{n}$ that iterates its first argument $n$ times
applied to the second:
$\overline{n}\, g\, c = \underbrace{g\, (\ldots (g}_{\mbox{$n$
    times}}\, c))$.  Another way to specify such a function
schematically is
\[
  \begin{array}{lcl}
    f\, 0 & = & c \\
    f\, (n+1) & = & g\, (f\, n)
  \end{array}
\]
If a function satisfies such a \emph{schema of iteration} then it can
be defined in the $\lambda$-calculus on Church numerals as
\[
  f = \lambda n.\, n\, g\, c
\]
which is easy to verify.  The class of functions definable this way is
\emph{total} (that is, defined on all natural numbers if $c$ and $g$
are), which can easily be proved by induction on $n$.  Returning
to examples from the last lecture, let's consider multiplication again.
\[
  \begin{array}{lcl}
    \mi{times}\; 0\, k & = & 0 \\
    \mi{times}\; (n+1)\, k & = & k + \mi{times}\, n\, k
  \end{array}
\]
This doesn't exactly fit our schema because $k$ is an additional
parameter.  That's usually allowed for iteration, but to avoid
generalizing our schema the $\mi{times}$ function can just return a
\emph{function} by abstracting over $k$.
\[
  \begin{array}{lcl}
    \mi{times}\; 0 & = & \lambda k.\, 0 \\
    \mi{times}\; (n+1) & = & \lambda k.\, k + \mi{times}\, n\, k
  \end{array}
\]
We can read off the constant $c$ and the function $g$ from this schema
\[
  \begin{array}{lcl}
    c & = & \lambda k.\, \mi{zero} \\
    g & = & \lambda r.\, \lambda k.\, \mi{plus}\, k\, (r\, k)
  \end{array}
\]
and we obtain
\[
  \mi{times} = \lambda n.\, n\, (\lambda r.\, \lambda k.\, \mi{plus}\, k\, (r\, k))\,
  (\lambda k.\, \mi{zero})
\]
which is more complicated than the solution we constructed by hand
\[
  \begin{array}{lcl}
    \mi{plus} & = & \lambda n.\, \lambda k.\, n\; \mi{succ}\; k \\
    \mi{times}' & = & \lambda n.\, \lambda k.\, n\; (\mi{plus}\; k)\; \mi{zero}
  \end{array}
\]  
The difference in the latter solution is that it takes advantage of
the fact that $k$ (the second argument to $\mi{times}$) never changes
during the iteration.  We have repeated here the definition of
$\mi{plus}$, for which there is a similar choice between two versions
as for $\mi{times}$.

In this latter solution, we exploit that $(\mi{plus}\; k)$ is a function
because $\mi{plus}$ starts with two $\lambda$-abstractions.  We could also
make the second argument to $\mi{plus}$ explicit:
\[
  \begin{array}{lcl}
    \mi{times}'' & = & \lambda n.\, \lambda k.\, n\; (\lambda u.\, \mi{plus}\; k\; u)\; \mi{zero}
  \end{array}
\]  
We observe that $\mi{plus}\; k$ and $\lambda u.\, \mi{plus}\; k\; u$ always
behave the same when applied to any argument $e$ because
\[
  \mi{plus}\; k\; e =_\beta (\lambda u.\, \mi{plus}\; k\; u)\; e
\]
More generally, the behavior of $e$ and $\lambda u.\, e\, u$ is the
same when applied to any argument $e'$ as long as
$u \not\in \m{FV}(e)$ (that is, $u$ is not among the free variables of
$e$, see Section~\ref{sec:def-subst}).  This law is called
\emph{$\eta$-conversion} and represents a weak form of an
\emph{extensionality principle}: two functions should be equal if they
return equal results when applied to equal arguments.

\section{The Schema of Primitive Recursion}

It is easy to define very fast-growing functions by iteration, such as
the exponential function, or the ``stack'' function iterating the
exponential.
\[
  \begin{array}{lcl}
    \mi{exp} & = & \lambda b.\, \lambda e.\, e\; (\mi{times}\; b)\; (\mi{succ}\; \mi{zero}) \\
    \mi{stack} & = & \lambda b.\, \lambda n.\, n\; (\mi{exp}\; b)\; (\mi{succ}\; \mi{zero})
  \end{array}
\]
Everything appears to be going swimmingly until we think of a very
simple function, namely the predecessor function defined by
\[
  \begin{array}{lcl}
    \mi{pred}\; 0 & = & 0 \\
    \mi{pred}\; (n+1) & = & n
  \end{array}
\]
You may try for a while to see if you can define the predecessor
function, but it is difficult.  The problem is that we have to go from
$\lambda s.\, \lambda z.\, s\, (\ldots (s\, z))$ to
$\lambda s.\, \lambda z.\, s\, (\ldots z)$, that is, we have to
\emph{remove} an $s$ rather than add an $s$ as was required for the
successor.  One possible way out is to change representation and
define $\overline{n}$ differently so that predecessor becomes easy
(see Exercise~\ref{exc:primrec}).  We run the risk that other
functions then become more difficult to define, or that the
representation is larger than the already inefficient unary
representation.  We follow a different path, keeping the
representation the same and defining the function directly.

We can start by assessing why the schema of iteration does not
immediately apply.  The problem is that in
\[
  \begin{array}{lcl}
    f\; 0 & = & c \\
    f\; (n+1) & = & g\; (f\; n)
  \end{array}
\]
the function $g$ only has access to the result of the recursive call
of $f$ on $n$, but not to the number $n$ itself.  What we would
need is the \emph{schema of primitive recursion}:
\[
  \begin{array}{lcl}
    f\; 0 & = & c \\
    f\; (n+1) & = & h\; n\; (f\; n)
  \end{array}
\]
where $n$ is passed to $h$.  For example, for the predecessor
function we have $c = 0$ and $h = \lambda x.\, \lambda y.\, x$ (we do
not need the result of the recursive call, just $n$ which is the first
argument to $h$).

At first glance it seems at least plausible that we should be able to
define any primitive recursive function using only the schema of
iteration.  Certainly, all functions in these two classes are total,
as long as the component functions $c$, $g$, and $h$ are total.

The basic idea in the representation of primitive recursion by
iteration is that we need the recursive call to return not only
$f\, n$ but also $n$ itself!  In other words, in order to eventually
get $f$, we first define a function $f'$ satisfying
\[
  f'\, n = \langle n, f\, n\rangle
\]
where $\langle {-},{-}\rangle$ forms a pair.  From such a pair
we can extract both $n$ and $f\, n$ in order to pass them to $h$.
In more detail:
\[
  \begin{array}{lcl}
    f'\, 0 & = & \langle 0, c\rangle \\
    f'\, (n+1) & = & \mi{letpair}\, (f'\, n)\, (\lambda x.\, \lambda r.\, \langle x+1, h\, x\, r\rangle) \\[1ex]
    f\, n & = & \mi{letpair}\, (f'\, n)\, (\lambda x.\, \lambda r.\, r)
  \end{array}
\]
What is $\mi{letpair}$\footnote{We called it $\mi{case}$ in lecture,
  but in hindsight that seems like a poor choice.} supposed to do?  We
specify that
\[
  \mi{letpair}\, \langle e_1, e_2\rangle\, k =_\beta k\, e_1\, e_2
\]
In other words, $\mi{letpair}$ applies the continuation $k$ to the
components of its first argument (which should be a
pair).  If we can define pairs and $\mi{letpair}$ then
$f'$ is correctly defined.  Formally, we would prove this by induction
on $n$.  First, if $n = 0$ then
\[
  \begin{array}{lcl}
    f'\, 0 & = & \langle 0,c\rangle = \langle 0, f\, 0\rangle
  \end{array}
\]
Second, for $n+1$,
\[
  \begin{array}{lcll}
    f'\, (n+1) & = & \mi{letpair}\, (f'\, n)\, (\lambda x.\, \lambda r.\,
                     \langle x+1, h\, x\, r\rangle) \\
               & =_\beta & \mi{letpair}\, \langle n, f\, n\rangle\, (\lambda x.\, \lambda r.\,
                     \langle x+1, h\, x\, r\rangle)
               & \mbox{by induction hypothesis} \\
               & =_\beta & \langle n+1, h\, n\, (f\, n)\rangle
               & \mbox{by reduction for $\mi{letpair}$} \\
               & = & \langle n+1, f\, (n+1)\rangle
               & \mbox{by definition of $f\, (n+1)$}
  \end{array}
\]
It remains to give the definitions of $\mi{pair}$ (implementing
$\langle {-},{-}\rangle$) and $\mi{letpair}$.  Actually, we will do a little
more, also providing explicit projections onto the first and second components
of a pair.  But first, we form a pair by abstracting over a function $g$
applied to both components.
\[
  \mi{pair} = \lambda x.\, \lambda y.\, \lambda g.\, g\, x\, y
\]
which means that $\mi{pair}\, e_1\, e_2 =_\beta \lambda g.\, g\, e_1\, e_2$.
To extract the first component of the pair, we simply apply it to the
first projection function!  And for the second component we project onto
the second argument.
\[
  \begin{array}{lcl}
    \mi{fst} & = & \lambda p.\, p\, (\lambda x.\, \lambda y.\, x) = \lambda p.\, p\; \mi{true} \\
    \mi{snd} & = & \lambda p.\, p\, (\lambda x.\, \lambda y.\, y) = \lambda p.\, p\; \mi{false}
  \end{array}
\]
The $\mi{letpair}$ function is interesting.  Recall that we want
\[
  \mi{letpair}\, \langle e_1, e_2\rangle\, k =_\beta k\, e_1\, e_2
\]
We define
\[
  \begin{array}{lcl}
    \mi{letpair} & = & \lambda p.\, \lambda k.\, p\, k
  \end{array}
\]
which works because
\[
  \begin{array}{lcl}
    \mi{letpair}\; (\mi{pair}\; e_1\, e_2)\, k
    & =_\beta & \mi{letpair}\, (\lambda g.\, g\, e_1\, e_2)\, k \\
    & =_\beta & (\lambda k.\, (\lambda g.\, g\, e_1\, e_2)\, k)\, k \\
    & =_\beta & k\, e_1\, e_2
  \end{array}
\]
One further remark here: the right-hand side in the definition of
$\mi{letpair}$ can be simplified further using $\eta$-conversion.
\[
  \mi{letpair} = \lambda p.\, \lambda k.\, p\, k =_\eta \lambda p.\, p
\]
so it is the identity function!  Intuitively, a pair is represented by
its own destructor function, so this destructor (here $\mi{letpair}$)
is just the identity.  Similarly, if we wanted to define an iterator
function
\[
  \mi{iter}\; \overline{n}\, f\, c = \underbrace{f\, (f \ldots (f}_{\mbox{$n$ times}}\, c))
\]
then $\mi{iter} = \lambda m.\, m$ will satisfy this equation for Church
numerals.

To put this all together, we implement a function specified with
\[
  \begin{array}{lcl}
    f\; 0 & = & c \\
    f\; (n+1) & = & h\; n\; (f\; n)
  \end{array}
\]
with the following definition in terms of $c$ and $h$:
\[
  \begin{array}{lcl}
    \mi{pair} & = & \lambda x.\, \lambda y.\, \lambda g.\, g\, x\, y \\
    \mi{letpair} & = & \lambda p.\, p \\[1ex]
    f' & = & \lambda n.\, n\, (\lambda r.\, \mi{letpair}\; r\; (\lambda x.\, \lambda y.\,
             \mi{pair}\; (\mi{succ}\; x)\; (h\; x\; y)))\, (\mi{pair}\; \mi{zero}\; c) \\
    f & = & \lambda n.\, f'\, n\; (\lambda x.\, \lambda y.\, y)
  \end{array}
\]
Eliminating $\mi{letpair} = \lambda p.\, p$ we obtain the slightly shorter
version
\[
  \begin{array}{lcl}
    \mi{pair} & = & \lambda x.\, \lambda y.\, \lambda g.\, g\, x\, y \\[1ex]
    f' & = & \lambda n.\, n\, (\lambda r.\, r\; (\lambda x.\, \lambda y.\,
             \mi{pair}\; (\mi{succ}\; x)\; (h\; x\; y)))\, (\mi{pair}\; \mi{zero}\; c) \\
    f & = & \lambda n.\, f'\, n\; (\lambda x.\, \lambda y.\, y)
  \end{array}
\]

Recall that for the concrete case of the predecessor function we have
$c = 0$ and $h = \lambda x.\, \lambda y.\, x$.
% \[
%   \begin{array}{lcl}
%     \mi{pair} & = & \lambda x.\, \lambda y.\, \lambda g.\, g\, x\, y \\
%     \mi{letpair} & = & \lambda p.\, p \\[1ex]
%     \mi{pred}' & = & \lambda n.\, n\, (\mi{pair}\; \mi{zero}\; \mi{zero})\; 
%                      (\lambda r.\, \mi{letpair}\; r\; (\lambda x.\, \lambda y.\, 
%                      \mi{pair}\; (\mi{succ}\; x)\; x)) \\[1ex]
%     \mi{pred} & = & \lambda n.\, \mi{letpair}\; (\mi{pred}'\; n)\; (\lambda x.\, \lambda y.\, y) 
%   \end{array}
% \]
% Eliminating the trivial destructor $\mi{letpair}$: 
We obtain
\[
  \begin{array}{lcl}
%    \mi{zero} & = & \lambda s.\, \lambda z.\, z \\
%    \mi{succ} & = & \lambda n.\, \lambda s.\, \lambda z.\, s\, (n\, s\, z) \\[1ex]
%    \mi{pair} & = & \lambda x.\, \lambda y.\, \lambda g.\, g\, x\, y \\[1ex]
    \mi{pred}' & = & \lambda n.\, n\,
                     (\lambda r.\, r\; (\lambda x.\, \lambda y.\,
                     \mi{pair}\; (\mi{succ}\; x)\; x))\, (\mi{pair}\; \mi{zero}\; \mi{zero}) \\[1ex]
    \mi{pred} & = & \lambda n.\, \mi{pred}'\; n\; (\lambda x.\, \lambda y.\, y)
  \end{array}
\]

\section{General Recursion}

Schematic function definitions (even at the generality of primitive
recursion) can be restrictive.  Let's consider the subtraction-based
specification of a $\mi{gcd}$ function for the greatest common divisor
of strictly positive natural numbers $a,b > 0$.
\[
  \begin{array}{lcll}
    \mi{gcd}\; a\; a & = & a \\
    \mi{gcd}\; a\; b & = & \mi{gcd}\; (a-b)\; b & \mbox{if $a > b$} \\
    \mi{gcd}\; a\; b & = & \mi{gcd}\; a\; (b-a) & \mbox{if $b > a$}
  \end{array}
\]
Why is this correct?  First, the result of $\mi{gcd}\; a\; b$ is a
divisor of both $a$ and $b$.  This is clearly true in the first
clause.  For the second clause, assume $c$ is a common divisor of $a$
and $b$.  Then there are $n$ and $k$ such that $a = n \times c$ and
$b = k \times c$.  Then $a - b = (n-k) \times c$ (defined because
$a > b$ and therefore $n > k$) so $c$ still divides both $a-b$ and
$b$.  In the last clause the argument is symmetric.  It remains to
show that the function terminates, but this holds because the sum of
the arguments to $\mi{gcd}$ becomes strictly smaller in each recursive
call because $a,b > 0$.

While this function looks simple and elegant, it does not fit the
schema of iteration or primitive recursion.  The problem is that the
recursive calls are not just on the immediate predecessor of an
argument, but on the results of subtraction.  So it might look like
\[
  f\, n = h\, n\, (f\, (g\, n))
\]
but that doesn't fit exactly, either, because the recursive calls to
$\mi{gcd}$ are on different functions in the second and third clauses.

So, let's be bold!  The most general schema we might think of is
\[
  f = h\, f
\]
which means that in the right-hand side we can make arbitrary recursive
calls to $f$.  For the $\mi{gcd}$, the function $h$ might look something
like this:
\begin{tabbing}
  $h = \lambda g.\, \lambda a.\, \lambda b.\, $ \=
  $\mi{if}\;$ \= $(a = b)\; a$ \\
  \>\> $(\mi{if}\;$ \= $(a > b)\; (g\, (a-b)\, b)$ \\
  \>\>\> $(g\, a\, (b-a)))$
\end{tabbing}
Here, we assume functions for testing $x = y$ and $x > y$ on natural
numbers, for subtraction $x - y$ (assuming $x > y$) and for
conditionals $\mi{if}\; b\; e_1\; e_2$ where
$\mi{if}\; \mi{true}\; e_1\; e_2 =_\beta e_1$ and
$\mi{if}\; \mi{false}\; e_1\; e_2 =_\beta e_2$ (see
Exercise~\ref{exc:if}).

The interesting question now is whether we can in fact define an $f$
explicitly when given $h$ so that it satisfies $f = h\, f$.  We say
that $f$ is a \emph{fixed point} of $h$, because when we apply $h$ to
$f$ we get $f$ back.  Since our solution should be in the
$\lambda$-calculus, it would be $f =_\beta h\, f$.  A function $f$
satisfying such an equation may \emph{not} be uniquely determined.
For example, the equation $f = f$ (so, $h = \lambda x. x$) is
satisfied by every function $f$. For the purpose of this lecture, any
function that satisfies the given equation is acceptable.

If we believe in the Church-Turing thesis, then any partial recursive
function should be representable on Church numerals in the
$\lambda$-calculus, so there is reason to hope there are explicit
representations for such $f$.  The answer is given by the so-called
$Y$ combinator.\footnote{For our purposes, a \emph{combinator} is
  simply a $\lambda$-expression without any free variables.}  Before
we write it out, let's reflect on which laws $Y$ should satisfy.  We
want $f = Y\, h$, and we specified that $f = h\, f$, so we get
$Y\, h = h\, (Y\, h)$. We can iterate this reasoning indefinitely:
\[
  Y\, h = h\, (Y\, h) = h\, (h\, (Y\, h)) = h\, (h\, (h\, (Y\, h))) = \ldots
\]
In other words, $Y$ must iterate its argument arbitrarily many times.

The ingenious solution deposits one copy of $h$ and then replicates
$Y\, h$.
\[
  Y = \lambda h.\, (\lambda x.\, h\, (x\, x))\, (\lambda x.\, h\, (x\, x))
\]
Here, the application $x\, x$ takes care of replicating $Y\, h$, and
the outer application of $h$ in $h\, (x\, x)$ leaves a copy of $h$
behind.  Formally, we calculate
\[
  \begin{array}{lcl}
    Y\, h & =_\beta & (\lambda x.\, h\, (x\, x))\, (\lambda x.\, h\, (x\, x)) \\
          & =_\beta & h\, ((\lambda x.\, h\, (x\, x))\, (\lambda x.\, h\, (x\, x))) \\
          & =_\beta & h\, (Y\, h)
  \end{array}
\]
In the first step, we just unwrap the definition of $Y$.  In the
second step we perform a $\beta$-reduction, substituting
$[(\lambda x.\, h\, (x\, x))/x]\, h\, (x\, x)$.  In the third step we
recognize that this substitution recreated a copy of $Y\, h$.

You might wonder how we could ever get an answer since
\[
  Y\, h =_\beta h\, (Y\, h) =_\beta h\, (h\, (Y\, h)) =_\beta h\, (h\, (h\, (Y\, h))) = \ldots
\]
Well, we sometimes don't!  Actually, this is important if we are to
represent \emph{partial recursive functions} which include functions
that are undefined (have no normal form) on some arguments.
Reconsider the specification $f = f$ as a recursion schema.  Then
$h = \lambda g.\, g$ and
\[
  Y\, h = Y\, (\lambda g.\, g) =_\beta (\lambda x.\, (\lambda g.\, g)\, (x\, x))\, (\lambda x.\, (\lambda g.\, g)\, (x\, x)) =_\beta (\lambda x.\, x\, x)\, (\lambda x.\, x\, x)
\]
The term on the right-hand side here (called $\Omega$) has the
remarkable property that it only reduces to itself!  It therefore does
not have a normal form.  In other words, the function
$f = Y\, (\lambda g.\, g) = \Omega$ solves the equation $f = f$ by
giving us a result which always diverges.

We do, however, sometimes get an answer.  Consider, for example,
a case where $f$ does not call itself recursively at all:
$f = \lambda n.\, \mi{succ}\; n$.  Then $h_0 = \lambda g.\, \lambda n.\, \mi{succ}\; n$.
And we calculate further
\[
  \begin{array}{lcl}
    Y\, h_0 & = & Y\, (\lambda g.\, \lambda n.\, \mi{succ}\; n) \\
          & =_\beta & (\lambda x.\, (\lambda g.\, \lambda n.\, \mi{succ}\; n)\, (x\, x))\,
                      (\lambda x.\, (\lambda g.\, \lambda n.\, \mi{succ}\; n)\, (x\, x)) \\
          & =_\beta & (\lambda x.\, (\lambda n.\, \mi{succ}\; n))\, (\lambda x.\, (\lambda n.\, \mi{succ}\; n)) \\
          & =_\beta & \lambda n.\, \mi{succ}\; n
  \end{array}
\]
So, fortunately, we obtain just the successor function \emph{if we
  apply $\beta$-reduction from the outside in}.  It is however also
the case that there is an infinite reduction sequence starting at
$Y\, h_0$.  By the Church-Rosser Theorem~\ref{thm:church-rosser} this means
that at any point during such an infinite reduction sequence we could
still also reduce to $\lambda n.\, \mi{succ}\; n$.  A remarkable and
nontrivial theorem about the $\lambda$-calculus is that if we always
reduce the left-most/outer-most redex (which is the first expression
of the form $(\lambda x.\, e_1)\, e_2$ we come to when reading an
expression from left to right) then we will definitely arrive at a
normal form when one exists.  And by the Church-Rosser theorem such a
normal form is unique (up to renaming of bound variables, as usual).

\section{A Few Somewhat More Rigorous Definitions}
\label{sec:def-subst}

We write out some definitions for notions from the first two lectures
a little more rigorously.

\paragraph{$\lambda$-Expressions.}  First, the abstract syntax.
\[
  \begin{array}{llcl}
    \mbox{Variables} & x \\
    \mbox{Expressions} & e & ::= & \lambda x.\, e \mid e_1\, e_2 \mid x
  \end{array}
\]
$\lambda x.\, e$ \emph{binds} $x$ with scope $e$.  In the concrete
syntax, the scope of a binder $\lambda x$ is as large as possible
while remaining consistent with the given parentheses so
$y\, (\lambda x.\, x\, x)$ stands for $y\, (\lambda x.\, (x\, x))$.
Juxtaposition $e_1\,e_2$ is left-associative so $e_1\, e_2\, e_3$
stands for $(e_1\, e_2)\, e_3$.

We define $\m{FV}(e)$, the \emph{free variables} of $e$ with
\[
  \begin{array}{lcl}
    \m{FV}(x) & = & \{x\} \\
    \m{FV}(\lambda x.\, e) & = & \m{FV}(e) \backslash \{x\} \\
    \m{FV}(e_1\, e_2) & = & \m{FV}(e_1) \cup \m{FV}(e_2)
  \end{array}
\]

\paragraph{Renaming.}
Proper treatment of names in the $\lambda$-calculus is notoriously
difficult to get right, and even more difficult when one \emph{reasons
  about} the $\lambda$-calculus.  A key convention is that
``\emph{variable names do not matter}'', that is, we actually
\emph{identify expressions that differ only in the names of their
  bound variables}.  So, for example,
$\lambda x.\, \lambda y.\, x\, z = \lambda y.\, \lambda x.\, y\, z =
\lambda u.\, \lambda w.\, u\, z$.  The textbook defines \emph{fresh
  renamings}~\cite[pp. 8--9]{Harper16book} as bijections between
sequences of variables and then $\alpha$-conversion based on fresh
renamings.  Let's take this notion for granted right now and write
$e =_\alpha e'$ if $e$ and $e'$ differ only in the choice of names for
their bound variables.  From now on
we identify $e$ and $e'$ if they differ only in the names of their
bound variables, which means that other operations such as
substitution and $\beta$-conversion are defined on
$\alpha$-equivalence classes of expressions.

\paragraph{Substitution.}
We can now define \emph{substitution of $e'$ for $x$ in $e$}, written
$[e'/x]e$, following the structure of $e$.
\[
  \begin{array}{lcll}
    [e'/x]x & = & e' \\ \relax
    [e'/x]y & = & y & \mbox{for $y \not= x$} \\ \relax
    [e'/x](\lambda y.\, e) & = & \lambda y. [e'/x]e
            & \mbox{provided $y \not\in \m{FV}(e')$} \\ \relax
    [e'/x](e_1\, e_2) & = & ([e'/x]e_1)\, ([e'/x]e_2)
  \end{array}
\]
This looks like a partial operation, but since we identify terms up to
$\alpha$-conversion we can always rename the bound variable $y$ in
$[e'/x](\lambda y.\, e)$ to another variable that is not free in $e'$
or $e$.  Therefore, substitution is a \emph{total function} on
$\alpha$-equivalence classes of expressions.

Now that we have substitution, we can also characterize
$\alpha$-conversion as $\lambda x.\, e =_\alpha \lambda y.\, [y/x]e$
provided $y \not\in \m{FV}(e)$, but as a definition this would be
circular because we already required renaming to define substitution.

\paragraph{Equality.}
We can now define $\beta$- and $\eta$-conversion.  We understand these
conversion rules as defining a \emph{congruence}, that is, we can
apply an equation anywhere in an expression that matches the left-hand
side of the equality.  Moreover, we extend them to be reflexive,
symmetric, and transitive so we can write $e =_\beta e'$ if we can go
between $e$ and $e'$ by multiple steps of $\beta$-conversion.
\[
  \begin{array}{llcll}
    % \mbox{$\alpha$-conversion} & \lambda x.\, e & =_\alpha & \lambda y. [y/x] e
    % & \mbox{provided $y$ not free in $e$} \\
    \mbox{$\beta$-conversion} & (\lambda x.\, e)\, e' & =_\beta & [e'/x]e \\
    \mbox{$\eta$-conversion} & \lambda x.\, e\, x & =_\eta & e
    & \mbox{provided $x \not\in \m{FV}(e)$}
  \end{array}
\]

\paragraph{Reduction.}
Computation is based on reduction, which applies $\beta$-conversion in
the left-to-right direction.  In the pure calculus we also treat it as
a congruence, that is, it can be applied anywhere in an expression.
\[
  \begin{array}{llcll}
    \mbox{$\beta$-reduction} & (\lambda x.\, e)\, e' & \longrightarrow_\beta & [e'/x]e \\
  \end{array}
\]
Sometimes we like to keep track of the length of reduction sequences so we
write $e \longrightarrow_\beta^n e'$ if we can go from $e$ to $e'$
with $n$ steps of $\beta$-reduction, and
$e \longrightarrow_\beta^* e'$ for an arbitrary $n$ (including $0$).

\paragraph{Confluence.}
The Church-Rosser property (also called confluence) guarantees that
the normal form of a $\lambda$-expression is unique, if it exists.

\begin{theorem}[Church-Rosser~\cite{Church36}]
  \label{thm:church-rosser}
  If $e \longrightarrow_\beta^* e_1$ and $e \longrightarrow_\beta^* e_2$
  then there exists an $e'$ such that $e_1 \longrightarrow_\beta^* e'$
  and $e_2 \longrightarrow_\beta^* e'$.
\end{theorem}

\section*{Exercises}

\begin{exercise}\rm
  \label{exc:primrec}
  One approach to representing functions defined by the schema of
  primitive recursion is to change the representation so that
  $\overline{n}$ is not an iterator but a \emph{primitive recursor}.
  \[
    \begin{array}{lcl}
      \overline{0} & = & \lambda s.\, \lambda z.\, z \\
      \overline{n+1} & = & \lambda s.\, \lambda z.\, s\, \overline{n}\, (\overline{n}\, s\, z)
    \end{array}
  \]
  % \[ \overline{n} = \lambda s.\, \lambda z.\, s\, \overline{n-1}\,
  %   (s\, \overline{n-2}\, (\ldots (s\, \overline{0}\, z)))
  % \]
  \begin{enumerate}\setlength{\itemsep}{0pt}
  \item Define the successor function $\mi{succ}$ (if possible) and show its correctness.
  \item Define the predecessor function $\mi{pred}$ (if possible) and show its correctness.
  \item Explore if it is possible to directly represent any
    function $f$ specified by a schema of primitive recursion, ideally
    without constructing and destructing pairs.
  \end{enumerate}
\end{exercise}

\begin{exercise}\rm
  \label{exc:if}
  We know we can represent all functions on Booleans returning
  Booleans once we have exclusive or.  But we can also represent the
  more general conditional $\mi{if}$ with the requirements
\[
  \begin{array}{lcl}
    \mi{if}\; \mi{true}\; e_1\; e_2 & = & e_1 \\
    \mi{if}\; \mi{false}\; e_1\; e_2 & = & e_2
  \end{array}
\]
Give a definition of $\mi{if}$ in the $\lambda$-calculus and verify
(showing each step) that the equations above are satisfied using
$\beta$-conversion.
\end{exercise}

\begin{exercise}\rm
  Recall the specification of the greatest common divisor
  ($\mi{gcd}$) from this lecture for natural numbers $a,b > 0$:
  \[
    \begin{array}{lcll}
      \mi{gcd}\; a\; a & = & a \\
      \mi{gcd}\; a\; b & = & \mi{gcd}\; (a-b)\; b & \mbox{if $a > b$} \\
      \mi{gcd}\; a\; b & = & \mi{gcd}\; a\; (b-a) & \mbox{if $b > a$}
    \end{array}
  \]
  We don't care how the function behaves if $a = 0$ or $b = 0$.

  Define $\mi{gcd}$ as a closed expression in the $\lambda$-calculus
  over Church numerals.  You may use the $Y$ combinator we defined,
  and any other functions like $\mi{succ}$, $\mi{pred}$, etc.\ from
  this lecture and $\mi{if}$ from Exercise~\ref{exc:if}, but you have
  to define other functions you may need such as subtraction or
  arithmetic comparisons.

  Analyze how your function behaves when one or both of the arguments
  $a$ and $b$ are $\overline{0}$.
\end{exercise}

\bibliographystyle{alpha}
\bibliography{fp,lfs}

\end{document}
