Lecture 4: Tools for Formal Definitions and Reasoning
=====================================================

One of the goals of this course is to teach formal (i.e. mathematical)
reasoning about programming languages and compilers. In fact, here at CMU
that is an explicit goal of the Logic and Languages constrained elective
category.

Today we'll learn some basic intellectual tools that we'll use to formally
define aspects of a programming language or compiler, and to reason about
those formal definitions. Tomorrow in recitation, you'll learn about a proof
assistant tool, SASyLF, that will help you write down those definitions and
proofs in a rigorous way and can automatically check them for mistakes.

You might ask about the title of this course: what's pragmatic about
formalism and proof? Perhaps surprisingly, a lot! Little formalisms like the
ones we'll learn in this class were used to figure out how to correctly add
generics to Java---types like List<T>. More recently, the WebAssembly
platform, which for example allows safely and efficiently running C programs
in browsers, was defined entirely using a formal specification. We can be
certain that if we have formalized a design and proved it correct, it won't
have certain flaws--like security vulnerabilities in the case of mobile code
in Java or WebAssembly. That's a very pragmatic thing for end users to care
about! Based on this recent history, it's likely that future programming
languages will be defined in terms of formalisms like the ones you'll be
learning.

More broadly, learning how to formalize definitions and do proofs is a skill
that can help you think more precisely about the semantics of the languages
you work in and the programs you write.

Formalizing Numbers
-------------------

Before tackling programming languages, let's look at something simpler:
formalizing natural numbers. The first thing we need to do is define natural
numbers: to reason about numbers, we need to represent them somehow. We will
use syntax to do this---defined using the same tools (context-free grammars)
that we just talked about using to define programming languages.

Natural numbers can be defined inductively: a number is either zero or the
successor of some other number. For example, the number 1 is the successor
of zero, and the number 2 is the successor of 1. We can define this
syntactically as follows. Let z represent the number zero. And if n is a
number, then s n is the successor of that number. Using a context-free
grammar, this can be written as:

    n ::= z | s n

Now we can represent the number 3 with the string "s s s z". But instead of
thinking about strings, we'd like to think of this as an abstract syntax
tree, with the s elements forming the root and (single) branch, and z at the
leaf. Graphically this is:

    s
    |
    s
    |
    s
    |
    z

Of course, programs have more interesting structure than numbers, and their
abstract syntax trees will have more than one branch, and more kinds of
things at the leaves. For example, here's a syntax for a very simple subset
of mathematical expressions:

    e ::= n | x | e + e | e * e

These simple expressions are made up of numbers n, variables x, addition
expressions e1 + e2, and multiplication expressions e1 * e2. We can then
build abstract syntax trees for expressions like 2 * (x + 1) as follows:

         *
        / \
       2   +
          / \
         x   1
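Inductive syntax definitions like these translate directly into inductive
datatype declarations in a proof assistant. As a minimal sketch---in Lean 4
rather than the SASyLF you'll see in recitation, and with all names being my
own choices---we might write:

    -- Natural numbers: n ::= z | s n
    inductive N where
      | z : N          -- zero
      | s : N → N      -- successor of another number

    -- Expressions: e ::= n | x | e + e | e * e
    -- (representing variables by strings is just an encoding choice)
    inductive Expr where
      | num : N → Expr            -- a number literal n
      | var : String → Expr       -- a variable x
      | add : Expr → Expr → Expr  -- e1 + e2
      | mul : Expr → Expr → Expr  -- e1 * e2

    -- The abstract syntax tree for 2 * (x + 1), written as a term:
    def twoTimesXPlusOne : Expr :=
      .mul (.num (.s (.s .z))) (.add (.var "x") (.num (.s .z)))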
Formalizing Addition
--------------------

Now that we have formalized natural numbers, we'd like to reason about them.
Most interesting properties of numbers rely on operators such as addition,
so let's formalize that.

Let's first write down some formal syntax for relating an addition
expression to its result. We'll do that with a judgment of the form
n1 + n2 = n3, which means the obvious thing: that when you add the number n1
to the number n2, you get the number n3. Let's call this judgment "sum".

In the sum judgment, n1, n2, and n3 are metavariables: we will replace them
with actual numbers when we instantiate the judgment. For example, if n1 is
0 (or "z") and n2 is 1 (or "s z") then n3 will be 1 (again, "s z" in our
formalization). Note that we are using a convention that metavariables are
named after the nonterminal representing their syntactic category: n1 is a
number (the subscript 1 distinguishes it from other numbers in the same
judgment).

Of course, the judgment n1 + n2 = n3 is true if we instantiate it as
z + (s z) = (s z), but it's not true if we instantiate it in other ways. For
example, z + z = s z is not true.

We can define when a judgment is true using inference rules. An inference
rule is written as follows:

    P1   P2   ...   Pn
    ------------------ rule-name
            C

where P1...Pn are judgments called premises and C is a judgment called the
conclusion. We write the name of the rule to the right of the line. The
inference rule means that if all the premises are true, then the conclusion
is true. A special case of an inference rule is an axiom, which is a rule
that has no premises.

Let's write an axiom for adding zero to a number:

    --------- sum-z
    z + n = n

This rule states that if you add z (zero) to any number n, the result is n.
We name the rule sum-z, which helps us remember that it defines the sum
judgment for the case where we are adding zero to a number.

Of course, we also need to define addition when we are adding numbers other
than zero. Let's therefore define another rule:

    n1 + n2 = n3
    -------------------- sum-s
    (s n1) + n2 = (s n3)

We'll call this rule sum-s, because it's the successor (s) case of sum. If
we have established that n1 + n2 = n3, then we know that we can add 1 to the
first operand and to the result, giving (s n1) + n2 = (s n3).

You might think that we need more rules--what if the second number is zero?
But in fact these are all the rules we need to define addition for natural
numbers. As we will see, we can use inductive reasoning to show other
interesting properties of addition, such as: for all n1, n1 + z = n1.

Derivations and Provability
---------------------------

How can we prove concrete facts like 1 + 2 = 3 using this system? First of
all, let's encode the numbers in our system: 1 + 2 = 3 can be written as
(s z) + (s s z) = (s s s z).

Now we can use inference rules to conclude what we need. We'll build a
derivation tree, which has the thing we want to prove at the bottom, and
applies rules to each judgment until we get to axioms at the leaves of the
tree. For our example fact, the derivation will look like this:

    --------------------- sum-z
    z + (s s z) = (s s z)
    --------------------------- sum-s
    (s z) + (s s z) = (s s s z)

You can read the reasoning from the top down. We can apply the axiom sum-z,
instantiating the number n with s s z, to conclude that z + s s z = s s z.
We can then use that as a premise of the rule sum-s: n1 will be z, n2 will
be s s z, and n3 will be s s z. If we plug n1, n2, and n3 into the
conclusion of the sum-s rule, we get the desired result:
(s z) + (s s z) = (s s s z).

We say that a judgment J is provable if there exists a well-formed
derivation that has J as its conclusion. Well-formed means that every step
in the derivation is a valid instance of one of the inference rules in our
formal system.
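In a proof assistant, a judgment defined by inference rules becomes an
inductive definition with one constructor per rule, and a derivation becomes
a term built from those constructors. Continuing the Lean 4 sketch from
above (I call the judgment Plus to avoid clashing with Lean's built-in Sum
type; the names are again my own):

    -- The sum judgment n1 + n2 = n3, one constructor per inference rule.
    inductive Plus : N → N → N → Prop where
      | sum_z : ∀ n, Plus .z n n                                     -- rule sum-z
      | sum_s : ∀ n1 n2 n3, Plus n1 n2 n3 → Plus (.s n1) n2 (.s n3)  -- rule sum-s

    -- The derivation of 1 + 2 = 3: rule sum-s applied to the axiom sum-z,
    -- with n1 = z, n2 = s s z, and n3 = s s z, exactly as in the text.
    example : Plus (.s .z) (.s (.s .z)) (.s (.s (.s .z))) :=
      .sum_s .z (.s (.s .z)) (.s (.s .z)) (.sum_z (.s (.s .z)))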
Structural Induction
--------------------

Now, we'd like to prove some properties! Let's start with the property we
mentioned earlier: for all n, n + z = n. This is "obviously" true in
mathematics, but is it true in our formalization of addition? Let's find
out!

We'll use a technique called mathematical induction to do this.
Mathematical induction is a technique for proving properties of natural
numbers. One such property is the one above: that adding zero to any number
n yields that same number, n.

In a proof by induction, we show that some property P is true in two parts.
In the first part, called the base case, we show that the property is true
for the number 0---which we can write as P(0). In the second part, called
the inductive case, we show that when the property is true for some number
k, then it must be true for the number k+1. More formally, we show that P(k)
implies P(k+1). Together, these show that the property is true for every
natural number n. We know this must be true because for a given n we can
apply the base case plus n instances of the inductive case to show that the
property is true. The nice thing is that we do not have to actually
construct the concrete proofs for each individual n (which is good because
there are infinitely many such n's, and the concrete proofs get larger with
each n). One generic proof suffices for all numbers.

For reasoning about programming languages---as well as the simpler case of
addition for natural numbers---we'll use a variant of induction called
structural induction. Structural induction works over some inductively
defined structure, like our natural number syntax. The base cases are the
base cases of our syntax: for natural numbers, that's z. So to prove some
property P(n) for all n, the base case will be to show that P(z) holds.
Then, for the inductive case, we show that P(n) holds if we assume P(n')
holds for all n' that are smaller than n.

What does it mean for n' to be smaller than n? In structural induction, n'
is smaller than n if n' is a substructure of n. In the case of natural
numbers, if n = s n', then n' is clearly a substructure of n, because we
applied the production n ::= s n with n' on the right and n on the left.
Just as in mathematical induction we reason from smaller numbers to larger
ones, in structural induction we reason from smaller structures to larger
structures.

Another way to look at this is that we are doing induction over trees. Our
tree for the number s s s z (which is how we represent 3) was:

    s
    |
    s
    |
    s
    |
    z

and so our base case is z, whereas our inductive case (for s) moves us one
step at a time up the tree until we've proved the property for the entire
number.

We can do this for expressions also: in the tree

         *
        / \
       2   +
          / \
         x   1

we can define base cases for different kinds of leaves--one base case for
numbers like 1 and 2, and one base case for variables like x. Then we have
inductive cases in the proof that rely on the fact that the property has
been proved for subtrees, and show that the property is true when we combine
those subtrees with the operators + and *. By applying the base cases 3
times and the inductive cases twice, we can show that the given property
holds for the tree above.
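This pattern is exactly what a proof assistant derives mechanically from an
inductive definition. For the N type in our running Lean 4 sketch, asking
for the automatically generated induction principle prints (up to minor
renaming) something like:

    -- To prove a property `motive n` for every n, supply the base case
    -- (motive holds of z) and the inductive case (motive of n' implies
    -- motive of s n').
    #check @N.rec
    -- @N.rec : {motive : N → Sort u} →
    --          motive N.z →                                -- base case
    --          ((n' : N) → motive n' → motive (N.s n')) →  -- inductive case
    --          (n : N) → motive n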
Let's take the simple case of zero/successor numbers first, and prove the
property that for all n, n + z = n. We can give this property a name,
sum-z-rh, since it is a property of the sum judgment when you add z on the
right-hand side of +. The proof is by structural induction on n.

Base case (n = z): we need to show that z + z = z. We can prove this by
applying the sum-z rule where n = z:

    --------- sum-z
    z + z = z

Inductive case (n = s n'): we need to show that n + z = n. Rewriting in
terms of n', we have s n' + z = s n'. Now, we are allowed to assume that the
property we are proving is true for substructures of n. We call this
assumption the induction hypothesis. One such substructure is n'. Thus we
have:

    n' + z = n'

by applying the induction hypothesis to n'. Now we can finish the proof by
applying the rule sum-s:

    n' + z = n'
    --------------- sum-s
    s n' + z = s n'

Of course, this is not a complete derivation, but that's OK. When we assume
the induction hypothesis, we are really assuming there is some derivation D
that can be used to prove that n' + z = n'. What we did in the last step is
apply the rule sum-s using the entire derivation D as the premise, giving us
the desired conclusion.

Induction Over Derivations
--------------------------

Syntax definitions are inductive structures--but so are derivations. That
means we can do induction over them, which is useful for proving many
properties. For example, consider the property that's symmetric to sum-s:
for all n1, n2, and n3 such that n1 + n2 = n3, we have n1 + s n2 = s n3.
Let's call this sum-s-rh (it's a property of the sum judgment when you add
an s to the right-hand side of the +).

You could actually prove this by induction on n1, but it's a bit complicated
to do so. Let's instead assume there is some derivation D of n1 + n2 = n3,
and do induction over that derivation. The derivation D must end with the
application of some rule: either sum-z or sum-s, since those are the only
two rules that can be used to derive a sum judgment. We'll finish the proof
by considering each rule as a case.

The base case is sum-z:

    --------- sum-z
    z + n = n

If we are applying rule sum-z, then n1 must be z, and n2 and n3 must be the
same number n, because otherwise the judgment doesn't match the rule.
Plugging the substitution [z/n1, n/n2, n/n3] into the thing we have to show,
we get z + s n = s n as the desired result. (Here the notation
[z/n1, n/n2, ...] means substituting z for n1, n for n2, etc.) But we can
just use the sum-z rule to show this:

    ------------- sum-z
    z + s n = s n

which finishes our case.

The inductive case is the rule sum-s:

    n1' + n2 = n3'
    ------------------ sum-s
    s n1' + n2 = s n3'

Once again, we have a substitution: if we are using the sum-s rule to derive
n1 + n2 = n3, then n1 must be s n1' (for some number n1') and similarly n3
must be s n3'. Now, notice that if the derivation D ended with the above
application of sum-s, there must be some derivation D' of the premise
n1' + n2 = n3'. D' is a subderivation of D: it's a part of the derivation D.
We are doing induction on the derivation D, so we can assume the induction
hypothesis about any subderivation, in particular the subderivation D'. Thus
we have:

    n1' + s n2 = s n3'

by applying the induction hypothesis to D'. Notice that now we can use rule
sum-s as follows:

    n1' + s n2 = s n3'
    ---------------------- sum-s
    s n1' + s n2 = s s n3'

But notice that this result, s n1' + s n2 = s s n3', is exactly what you get
if you apply the substitution [s n1'/n1, s n3'/n3] to the thing we were
trying to prove, which was n1 + s n2 = s n3. Thus we are done!

Once we have proved a property like sum-s-rh, we can use it just like a rule
to prove other theorems. For example, we might want to prove that + is
commutative. We can do so using structural induction, the rules sum-z and
sum-s, and the theorems above: sum-z-rh and sum-s-rh. In fact, theorems like
"+ is commutative" are the interesting ones; properties like sum-z-rh are
mostly useful for proving commutativity, and so we call them lemmas:
properties that are useful in proving a more interesting theorem.
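Here is roughly how these proofs look in the running Lean 4 sketch (again a
sketch with my own names, not the SASyLF you'll use in recitation).
Structural induction on n and induction on the derivation d are both written
with the induction tactic, with one case per production or rule; the last
theorem shows the two lemmas being used just like extra rules to prove
commutativity:

    -- sum-z-rh: for all n, n + z = n, by structural induction on n.
    theorem sum_z_rh : ∀ n, Plus n .z n := by
      intro n
      induction n with
      | z => exact .sum_z .z                 -- base case: rule sum-z
      | s n' ih => exact .sum_s n' .z n' ih  -- inductive case: rule sum-s on the IH

    -- sum-s-rh: if n1 + n2 = n3 then n1 + s n2 = s n3,
    -- by induction on the derivation of n1 + n2 = n3.
    theorem sum_s_rh : ∀ n1 n2 n3, Plus n1 n2 n3 → Plus n1 (.s n2) (.s n3) := by
      intro n1 n2 n3 d
      induction d with
      | sum_z n => exact .sum_z (.s n)          -- case: D ends with sum-z
      | sum_s n1' n2 n3' d' ih =>               -- case: D ends with sum-s;
          exact .sum_s n1' (.s n2) (.s n3') ih  -- ih plays the role of D'

    -- Commutativity of +, by induction on the derivation, using both lemmas.
    theorem sum_comm : ∀ n1 n2 n3, Plus n1 n2 n3 → Plus n2 n1 n3 := by
      intro n1 n2 n3 d
      induction d with
      | sum_z n => exact sum_z_rh n
      | sum_s n1' n2 n3' d' ih => exact sum_s_rh n2 n1' n3' ih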
Proofs by Induction Over Syntax
-------------------------------

Proofs about numbers are fun, but can we prove things about programs?
Consider the grammar for expressions that we defined before. Let's define
the literals of an expression to be all the n's and x's within it, and let's
define the operators to be all the +'s and *'s within it. An interesting
property is that the number of literals is always the number of operators
plus 1. Can we prove it?

First, let's define some rules. We have a judgment Lit(e) = n for counting
the literals of an expression e. The rules are:

    ---------- Lit-n
    Lit(n) = 1

    ---------- Lit-x
    Lit(x) = 1

    Lit(e1) = n1   Lit(e2) = n2   n1 + n2 = n3
    ------------------------------------------ Lit+
    Lit(e1 + e2) = n3

    Lit(e1) = n1   Lit(e2) = n2   n1 + n2 = n3
    ------------------------------------------ Lit*
    Lit(e1 * e2) = n3

We can also define rules for an operator judgment, Ops(e) = n:

    ---------- Ops-n
    Ops(n) = 0

    ---------- Ops-x
    Ops(x) = 0

    Ops(e1) = n1   Ops(e2) = n2   n1 + n2 + 1 = n3
    ---------------------------------------------- Ops+
    Ops(e1 + e2) = n3

    Ops(e1) = n1   Ops(e2) = n2   n1 + n2 + 1 = n3
    ---------------------------------------------- Ops*
    Ops(e1 * e2) = n3

If we assume that numbers n are defined as before, we can take 0 as an
abbreviation for z and 1 as an abbreviation for s z. In the rest of this
section, I'll use numbers as in math, but remember that we could define and
reason about them entirely using rules like sum-s.

Now let's prove the property: for all expressions e, Lit(e) = Ops(e) + 1.
We'll prove it by induction on e. This induction is a bit more interesting
because e is tree-structured: the syntax e1 + e2 has two smaller bits of
syntax within it, e1 and e2. So when we do the inductive step for the case
of e1 + e2, we can apply the induction hypothesis twice: once for e1 and
once for e2. This is OK because both e1 and e2 are subtrees of e1 + e2.

The proof goes by case analysis on the last syntactic production used to
construct e:

    case e = x:
        Lit(x) = 1 by rule Lit-x
        Ops(x) = 0 by rule Ops-x
        So Lit(x) = 1 = 0 + 1 = Ops(x) + 1
        (as mentioned above, we are doing the math in one step, rather
        than appealing to the judgments defining +)

    case e = n:
        Lit(n) = 1 by rule Lit-n
        Ops(n) = 0 by rule Ops-n
        So Lit(n) = 1 = 0 + 1 = Ops(n) + 1
        (note: this case is analogous to that for e = x)

    case e = e1 + e2:
        Lit(e1) = Ops(e1) + 1 by the induction hypothesis applied to e1
        Lit(e2) = Ops(e2) + 1 by the induction hypothesis applied to e2
        Lit(e1 + e2) = Lit(e1) + Lit(e2) by the rule Lit+
        Ops(e1 + e2) = Ops(e1) + Ops(e2) + 1 by the rule Ops+
        So Lit(e1 + e2) = Lit(e1) + Lit(e2)
                        = Ops(e1) + 1 + Ops(e2) + 1
                        = Ops(e1 + e2) + 1
        which is the result we needed to prove.

    case e = e1 * e2:
        (similar to e1 + e2, above)
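To close the loop, here is a sketch of this last proof in the Lean 4
development we've been building. For brevity I define the literal and
operator counts as functions into Lean's built-in Nat (doing "the math in
one step", as in the proof above) rather than as relational judgments; the
induction has exactly the four cases of the hand proof, with two induction
hypotheses in the + and * cases:

    -- Count the literals (n's and x's) in an expression.
    def lit : Expr → Nat
      | .num _ => 1
      | .var _ => 1
      | .add e1 e2 => lit e1 + lit e2
      | .mul e1 e2 => lit e1 + lit e2

    -- Count the operators (+'s and *'s) in an expression.
    def ops : Expr → Nat
      | .num _ => 0
      | .var _ => 0
      | .add e1 e2 => ops e1 + ops e2 + 1
      | .mul e1 e2 => ops e1 + ops e2 + 1

    -- For all e, Lit(e) = Ops(e) + 1, by structural induction on e.
    theorem lit_eq_ops_plus_one : ∀ e, lit e = ops e + 1 := by
      intro e
      induction e with
      | num n => rfl          -- base case for number literals
      | var x => rfl          -- base case for variables
      | add e1 e2 ih1 ih2 =>  -- two induction hypotheses, one per subtree
          simp only [lit, ops]; omega
      | mul e1 e2 ih1 ih2 =>
          simp only [lit, ops]; omega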