Lecture 16: Intermediate Representation and IR Generation

Practical Hints for HW7

    typing: the code generator can rely on the program being typed (you can assume you'll always find a variable, use the right number of fn params, etc.) but doesn't need type info (everything is an i32). Note that if we added floats this would change!

    variables: you do need to know what variables are in scope, because you need to index them. You will need to track the list of variables for each scope as you go in, so you can look outward (and follow static links).

    Need to track some other things: e.g. functions generated (they all go to the top level). Other things (notably instructions) you can create as you go. Single-pass codegen is possible.

Generating IRs

closure conversion: closing and hoisting

    why? because platforms don't have built-in support for closures

closing

    source language

        e ::= x | x1..xn => e | e0(x1=e1...xn=en) | n | e1 + e2
            | let x = e1 in e2 | x := e1; e2 | if e0 then e1 else e2

    destination language

        p ::= $fp | p.f
        e ::= p | ($fp => e, e) | e0(e1) | n | new { f1 = e1...fn = en } | e1 + e2
            | p.f := e1; e2 | if e0 then e1 else e2

        Gamma ::= * | Gamma, x1..xm    // Gamma is a list of lists of variables

    closing rules

        Gamma, x1..xn xn+1..xm |- e ~> e'
        ------------------------------------------
        Gamma |- [x1..xn => e] ~> ($fp => e', $fp)
        // xn+1..xm are bound in let statements that are in e but not nested in an inner function

        x is in the nth list in Gamma      // starting from the last list in Gamma as 0
        -------------------------------
        Gamma |- [x] ~> $fp.(link.^n).x    // (link.^n) means repeat "link."
                                           // n times; n may be zero

        Gamma |- [ei] ~> ei'    // for all i in 0..n
        ------------------------------------------------------------------------------------
        Gamma |- [e0(x1=e1...xn=en)] ~> e0'.fst(new { link = e0'.snd, x1 = e1'...xn = en' })

        Gamma |- [x] ~> p    Gamma |- [ei] ~> ei'    // i in {1, 2}
        -----------------------------------------
        Gamma |- [x := e1; e2] ~> p.x := e1'; e2'    // let x = e1 in e2 is similar

    Note: the strategy above is simple and very flexible, as it works for closures that capture mutable variables. If your language avoids capturing mutable variables, or marks some variables as immutable, there are strategies that can increase efficiency and preserve more reasoning for later optimization passes.

hoisting

    source language is the destination language of closing

    destination language

        program ::= fd*    [last function is main]
        p  ::= $fp | p.f
        fd ::= fun x($fp) => e
        e  ::= p.f | e1 e2 | n | e1 + e2 | new { f1 = e1...fn = en }
             | p.f := e1; e2 | if e0 then e1 else e2 | closure(x,y)

    hoisting rules

        x fresh    [e] ~> e'; f
        ---------------------------------------------------------
        [$fp => e, e2] ~> closure(x,e2); {(fun x($fp) => e')} U f

        [ei] ~> ei'; fi
        -------------------------------
        [e1 + e2] ~> e1' + e2'; f1 U f2    // other rules are similar

IR generation

    source language is the destination language of hoisting

    destination language

        program ::= fd*    [last function is main]
        fd ::= fun x($fp) => map[l, i]
        i  ::= nop
             | x := n
             | x := y
             | x := y op z    // op is +, -, *, /, ...
             | x := call y(z)
             | return x
             | x := y.f
             | x.f := y
             | goto l
             | if x goto l
             | x := new {f1 = x1...fn = xn}

        [While3Addr, with dynamic calls and record allocation/dereference/assignment]

    why?
        (1) it's a useful step on the path to assembly language
        (2) it's a good format for optimization
        (3) it's still machine-independent, so you can write optimizations that apply to all machines

    typical representation: the graph of instructions implied by sequencing and goto labels

    optimized representation: basic blocks encapsulate linear sequences of instructions

    semantics for a similar destination language, While3Addr, are available in chapter 3 of my program analysis book: https://cmu-program-analysis.github.io/2021/resources/program-analysis.pdf

    IR generation as rules: is0 stands for an instruction sequence

        ------------------
        [n], x ~> [x := n]

        ------------------
        [y], x ~> [x := y]

        [p], x ~> [is0]    x fresh
        ---------------------------
        [p.f], y ~> [is0, y := x.f]

        [many other rules similar to the above 3]

        [e0], x ~> [is0]    [e1], y ~> [is1]    [e2], y ~> [is2]    x,l1,l2 fresh
        ---------------------------------------------------------------------------------
        [if e0 then e1 else e2], y ~> [is0, if x goto l1, is2, goto l2, l1: is1, l2: nop]

In case you're interested: IR generation from a stack machine

    S ::= * | S, x

    y fresh
    ----------------------------------
    S |- i32.const n ~> S, y |- y := n

    ----------------------------------------
    S, x, y |- i32.add ~> S, z |- z := x + y
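
The two stack-machine rules above can be read as running the stack machine symbolically: S holds the names of the variables that contain the operands, not the operand values. Below is a minimal sketch of that reading in Python. The tuple encoding of instructions, the function name stack_to_3addr, and the fresh names t0, t1, ... are all hypothetical choices made for illustration. The sketch also assumes that z in the i32.add rule is fresh, which the rule as written does not state. Only i32.const and i32.add are handled, since those are the only rules given.

```python
from itertools import count


def stack_to_3addr(instrs):
    """Translate stack-machine instructions to three-address instructions.

    instrs uses a hypothetical encoding: ("i32.const", n) or ("i32.add",).
    Returns (stack, out): the final symbolic stack S and the emitted code.
    """
    fresh = (f"t{i}" for i in count())  # hypothetical supply of fresh names
    stack = []                          # S: names of variables holding operands
    out = []                            # emitted three-address instructions
    for ins in instrs:
        op = ins[0]
        if op == "i32.const":
            # S |- i32.const n ~> S, y |- y := n    (y fresh)
            y = next(fresh)
            out.append(f"{y} := {ins[1]}")
            stack.append(y)
        elif op == "i32.add":
            # S, x, y |- i32.add ~> S, z |- z := x + y    (z assumed fresh)
            y = stack.pop()
            x = stack.pop()
            z = next(fresh)
            out.append(f"{z} := {x} + {y}")
            stack.append(z)
        else:
            raise NotImplementedError(op)
    return stack, out


# (1 + 2) + 3 as stack code
stack, code = stack_to_3addr([
    ("i32.const", 1), ("i32.const", 2), ("i32.add",),
    ("i32.const", 3), ("i32.add",),
])
print("\n".join(code))
```

After the run, the symbolic stack holds a single name: the variable containing the result of the whole expression.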
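
Returning to the expression rules for IR generation (the judgment [e], x ~> [is]), here is a rough sketch of how they could be turned into a recursive function. It is an illustration under stated assumptions, not the HW7 design: the tuple AST, the IRGen class, and the name scheme (t for temporaries, L for labels, one shared counter) are hypothetical. The "add" case is a guess at one of the "many other rules", and labels are emitted as strings in a flat list instead of the map[l, i] used in the grammar.

```python
from itertools import count


class IRGen:
    """Hypothetical generator; AST nodes are tuples such as ("num", 3)."""

    def __init__(self):
        self._ids = count()  # shared counter for temporaries and labels

    def fresh(self, prefix):
        return f"{prefix}{next(self._ids)}"

    def gen(self, e, target):
        """Return instructions that leave the value of e in `target`."""
        kind = e[0]
        if kind == "num":   # [n], x ~> [x := n]
            return [f"{target} := {e[1]}"]
        if kind == "var":   # [y], x ~> [x := y]
            return [f"{target} := {e[1]}"]
        if kind == "add":   # assumed shape of one of the "similar" rules
            a, b = self.fresh("t"), self.fresh("t")
            return (self.gen(e[1], a) + self.gen(e[2], b)
                    + [f"{target} := {a} + {b}"])
        if kind == "if":    # ("if", e0, e1, e2)
            x = self.fresh("t")
            l1, l2 = self.fresh("L"), self.fresh("L")
            return (self.gen(e[1], x)            # is0: condition into x
                    + [f"if {x} goto {l1}"]
                    + self.gen(e[3], target)     # is2: else branch falls through
                    + [f"goto {l2}", f"{l1}:"]   # l1 labels the start of is1
                    + self.gen(e[2], target)     # is1: then branch
                    + [f"{l2}: nop"])
        raise NotImplementedError(kind)


# if c then 1 else a + 2, with the result in r
expr = ("if", ("var", "c"), ("num", 1), ("add", ("var", "a"), ("num", 2)))
print("\n".join(IRGen().gen(expr, "r")))
```

Both branches write to the same target variable, which mirrors how the if rule passes y to both e1 and e2. The else branch is placed first so that the conditional jump only needs to test x directly.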