Lecture 15: code generation to WebAssembly
==========================================

WebAssembly
    portable binary format (.wasm)
    textual equivalent that you will use (.wat, WebAssembly text)
        based on S-expressions
            S ::= n | string | symbol | ( S* )
    memory safe (all errors are caught and result in exceptions; code can't go outside the process)
    semantics specified using inference rules
    typed (but low-level; almost everything is an i32)
    no garbage collection
    interop with JavaScript, other languages
    shares some things with "real" assembly languages, but also has some differences
        (memory safety, typing, abstraction from registers)
        (we'll also look at some of these issues later in the class)

basics
    stack-based bytecode - examples of constants and operations
        i32.const 1
        i32.const 2
        i32.add
        -> 3
    local variables
        - like "registers" except unlimited (some will spill to memory in practice)
        - can't take their address
        let x = 5 in (let x = 1 in x + x) + x
        ->
        i32.const 5
        local.set $x_outer
        i32.const 1
        local.set $x
        local.get $x
        local.get $x
        i32.add
        local.get $x_outer
        i32.add
    produce stack bytecode from a tree traversal (walk through this)
    drop - removes the element at the top of the stack

control flow - examples
    functions
        (func $add_one (param $x i32) (result i32)
            i32.const 1
            local.get $x
            i32.add)
    call
        i32.const 2
        call $add_one
        -> 3
    if
        if c then e1 else e2
        ->
        [code for c]
        (if (then [code for e1]) (else [code for e2]))

        This can also be written in WebAssembly in a flatter style as:
        [code for c] if [code for e1] else [code for e2] end

        If the "if" is an expression that evaluates to a value, put a (result i32) after the if,
        indicating that the then and else parts of the if should each leave an i32 on the stack
        (i32 can be replaced with other WebAssembly primitive types)

EXERCISE 1
----------
        if x > 0 then x else -x
        ->
        local.get $x
        i32.const 0
        i32.gt_s
        (if (result i32)
            (then local.get $x)
            (else i32.const 0
                  local.get $x
                  i32.sub))

    loops
        do sum = sum + x; x = x - 1 while x > 0
        ->
        (loop $my_loop
            local.get $sum
            local.get $x
            i32.add
            local.set $sum
            local.get $x
            i32.const 1
            i32.sub
            local.set $x
            local.get $x
            i32.const 0
            i32.gt_s
            br_if $my_loop   ;; conditional branch
        )
        Here the parentheses can be omitted if you add an "end" to match the "loop" starting point.
    "structured control flow" - follows source-level constructs
        easy to generate, understand, optimize

module things
    global variables
        (global $tmp (mut i32) (i32.const 0)) - mutable 32-bit integer, initialized to zero
        i32.const 0
        global.set $tmp
        global.get $tmp
    imports and exports
        (module
            (import "console" "log_int" (func $log_int (param i32)))
            ... - see run.js
            (func (export "main") (local $fp i32)
                i32.const 0
                ...
            )
    memories
        (import "js" "mem" (memory 1)) - 1 means the memory has a size of one 64 KiB page - see run.js

tables, types, and indirect function calls
    motivation
        WebAssembly has a very simple type system
            values can only be 32- or 64-bit integers or floating point values
        want calls to functions to be safe! even if we don't know which function is being called!
    solution: functions represented as integers indexing into a table
        call_indirect operation takes an integer argument and a concrete type
            (type $fntype (func (param i32) (result i32)))
            ;; push arguments onto stack...
            local.get $func_ptr   ;; push the function index onto the stack
            call_indirect (type $fntype)
        run-time check verifies the function actually has that type - like a cast
        (table 2 funcref)
        (elem (i32.const 0) $foo $bar)

Homework Practicalities
    bump allocation for memory
        conceptually: $next_alloc = $next_alloc + bytes
        in WASM, for 2 words/8 bytes:
            global.get $next_alloc
            i32.const 8
            i32.add
            global.set $next_alloc
        use for objects, closures, and stack frames
        we allocate stack frames in the heap instead of using a stack. **why**?
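The bump-allocation sequence above works for any block size. Here is a minimal sketch of a code-generator helper in TypeScript, assuming the generator collects instructions as strings. `emitAlloc` and `WORD_SIZE` are hypothetical names, not part of the homework starter code. The sketch also leaves the address of the new block (the old value of `$next_alloc`) on the stack, which is an assumption about what the caller wants.

```typescript
// Hypothetical helper: emit WebAssembly text that bump-allocates `words` words.
// Assumes a mutable global $next_alloc (name taken from the notes) that holds
// the next free byte address, and 4-byte (i32) words.
const WORD_SIZE = 4;

function emitAlloc(words: number): string[] {
  if (!Number.isInteger(words) || words <= 0) {
    throw new Error(`cannot allocate ${words} words`);
  }
  const bytes = words * WORD_SIZE;
  return [
    "global.get $next_alloc", // address of the new block; stays on the stack
    "global.get $next_alloc",
    `i32.const ${bytes}`,
    "i32.add",
    "global.set $next_alloc", // bump the pointer past the block
  ];
}

// Example: the 2-word case from the notes.
console.log(emitAlloc(2).join("\n"));
```

This sketch never checks whether the bumped pointer still fits in the memory's pages; a fuller allocator would likely need to handle that.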
    calling convention we suggest
        EXAMPLE (draw picture)
            let w = 3;
            function bar(y: number): number {
                let x = 0;
                function foo(): number {
                    let z = 2;
                    return x;
                }
                return foo();
            }
            function baz(): number {
                let x = 4;
                return bar(x+1);
            }
            baz();
        what's implicit: return location, local variable/register storage
        EXAMPLE: access x from within baz
        EXERCISE 2: generate code to access variable x from within foo()
    object layout we suggest (draw picture)
        how to access a variable
    typing: the code generator can rely on the program being typed
        (can assume you'll always find a variable, use the right number of fn params, etc.)
        but doesn't need type info (everything is an i32). Note that if we added floats this would change!
    variables: do need to know what variables are in scope, because you need to index them.
        Will need to track the list of variables for each scope as you go in, so you can look outward (and follow static links).
    Need to track some other things: e.g. functions generated (they all go to the top level).
        Other things (notably instructions) you can create as you go. Single-pass codegen is possible.

look again at run.js driver

walk through some simple .ts programs and the corresponding WebAssembly
    function.ts
    static_scope.ts
    arithmetic.ts

extras
    show WebAssembly semantic rules
        https://webassembly.github.io/spec/core/exec/instructions.html - see for example local.set and local.tee
    string constants - example in
        https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format (see WebAssembly Memory)
    how arrays would work - draw picture

ANSWERS
    why are stack frames allocated on the heap?
        because of closures, a stack frame might need to outlive the call that created it!
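The scope tracking described under "variables" (keep a list of variables per scope, look outward, follow static links) can be sketched in TypeScript. Everything here is an assumption made for illustration, not the layout the homework requires: each frame is assumed to hold its static link in word 0 and its variables in the words after it, parameters are assumed to live in the frame too, and the current frame pointer is assumed to be in a local `$fp`. `Scope`, `lookup`, and `emitVarLoad` are hypothetical names.

```typescript
// A compile-time scope: the variables of one function plus its enclosing scope.
interface Scope {
  vars: string[];       // position in this array = slot number in the frame
  parent: Scope | null; // lexically enclosing scope (reached via static links)
}

// Look outward for a variable; report how many static links must be followed.
function lookup(scope: Scope, name: string): { hops: number; index: number } {
  let hops = 0;
  let s: Scope | null = scope;
  while (s !== null) {
    const index = s.vars.lastIndexOf(name);
    if (index >= 0) return { hops, index };
    s = s.parent;
    hops++;
  }
  // The notes say a typed program always finds its variables.
  throw new Error(`unbound variable ${name}`);
}

// Emit code that leaves the variable's value on the stack.
// ASSUMED frame layout: word 0 = static link, word 1 + i = variable i.
function emitVarLoad(scope: Scope, name: string): string[] {
  const { hops, index } = lookup(scope, name);
  const code = ["local.get $fp"];
  for (let i = 0; i < hops; i++) code.push("i32.load"); // follow one static link
  code.push(`i32.load offset=${4 * (1 + index)}`);
  return code;
}

// Scopes for the bar/foo example in the notes: foo is nested inside bar.
const globalScope: Scope = { vars: ["w"], parent: null };
const barScope: Scope = { vars: ["y", "x"], parent: globalScope };
const fooScope: Scope = { vars: ["z"], parent: barScope };

console.log(emitVarLoad(fooScope, "x").join("\n"));
```

Under a different frame layout only the offset arithmetic in `emitVarLoad` would change; the outward lookup stays the same.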