In today's lecture we worked on the design of L3. Thanks to everyone for their active participation! Here is a transcription based on my recollection, with some editor's notes. I would appreciate it if we could discuss the remaining issues, which I list at the bottom, in the Sakai discussion forum.

Design criteria:

1. Want structs and arrays
2. Should be "C-like" and compatible with L2
3. Doesn't need to be safe, but it should be possible to implement it safely (with some run-time checks; this will be L4)

Please keep these points in mind for the discussion, especially point 3.

Structs
=======

Declare with

    struct s { f1:t1; ... fn:tn; };

where s is the name of the type (an identifier), f1,...,fn are field names (which must be distinct from each other, but may be the same as field names in other structs), and t1,...,tn are the types of the fields. Note that field types ti may again be struct types.

Structs must be declared at the top level, like functions. Also like functions, they can be (mutually) recursive, and they don't need to be in order, but on every recursive reference of a struct to itself there must be a pointer interposed to avoid infinitely sized objects (or lazy structs, ouch, which is definitely not C-like).

The language of types is then

    t ::= int | s | t*

where s is the name of a struct type (and t should be pronounced "tau" :-)

We can (implicitly) allocate a struct simply by declaring a variable of struct type, as in

    var x : s;

Structs declared in this way need not be initialized (at least for this lab; we'll reconsider this issue when we want to make the language safe).

Fields of a struct are accessed with an expression

    e.f

where e is an expression and f is a field name (identifier). We use the typing rule:

    e.f : t   if e : s and s was declared with struct s {... f:t; ...};

We can assign to the fields of a struct by using an "lval" (left value) of the form v.f, as in

    struct s {f:t;};
    var x : s;
    x.f = e;

We write "v" for lvals, which are defined by

    v ::= x | v.f | *v

[editor's note: I am wondering about *v. Perhaps it should be *e so we can compute an address and then write to it? But I guess one could rewrite *e1 = e2 as x = e1; *x = e2;]

As a derived expression [editor's note: and perhaps an lval?] we can define

    e->f   as   (*e).f

without complicating the compiler.

Due to complications with calling conventions, we decided not to allow structs as function arguments or function results. So instead of passing a struct, we need to pass a pointer to a struct; instead of returning a struct, we need to return a pointer to a struct.

Assignment now takes the more general form

    v = e

with the rule that

    v = e stmt   if v : t and e : t (for some t)

There is no "deep" assignment for structs, so for x : s and y : s, the assignment x = y and even the comparisons x == y and x != y are illegal. This needs to be caught by the type-checker. This decision simplifies the compiler, but occasionally complicates programs.

Pointers
========

We use the C-like expression

    *e

to dereference a pointer. The typing rule for this says:

    *e : t   if e : t*

We can declare a variable of pointer type as in

    var x : int*;

but this doesn't actually reserve any space to store an integer. If we were requiring initialization (which we don't, for now), then x would be initialized to the null pointer:

    NULL

We were unsure about the typing rules for the NULL pointer. Three possibilities were offered (see the open questions below). The sentiment seemed to be to distinguish NULL (a pointer) from 0 (an int).

We create a pointer with the expression

    new t[e]

with the rule that

    new t[e] : t*   if e : int

and (potentially) a run-time check that e > 0. This would allocate an array of "k" elements, where "k" is the value of e.
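
The typing rules stated so far (field access, dereference, "new", and assignment) are small enough to try out as a toy checker. The following is only a sketch in Python, not the lab's compiler: the tuple encoding of expressions, the names STRUCTS, ENV, typeof, check_assign, arrow and is_lval, and the two example structs are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Types: t ::= int | s | t*
@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Struct:
    name: str


@dataclass(frozen=True)
class Ptr:
    target: object


class L3TypeError(Exception):
    pass


# Hypothetical declarations, for illustration only:
#   struct point { x:int; y:int; };
#   struct node  { val:int; next:node*; };
STRUCTS = {
    "point": {"x": Int(), "y": Int()},
    "node": {"val": Int(), "next": Ptr(Struct("node"))},
}

# Hypothetical variables: var a : point; var b : point; var n : node*;
ENV = {"a": Struct("point"), "b": Struct("point"), "n": Ptr(Struct("node"))}


def arrow(e, f):
    """Derived form: e->f is (*e).f."""
    return ("field", ("deref", e), f)


def is_lval(e):
    """v ::= x | v.f | *v"""
    if e[0] == "var":
        return True
    return e[0] in ("field", "deref") and is_lval(e[1])


def typeof(e, env):
    """Expressions: ("num", n), ("var", x), ("field", e, f), ("deref", e), ("new", t, e)."""
    tag = e[0]
    if tag == "num":
        return Int()
    if tag == "var":
        if e[1] not in env:
            raise L3TypeError(f"undeclared variable {e[1]}")
        return env[e[1]]
    if tag == "field":  # e.f : t  if e : s and s declares f:t
        t = typeof(e[1], env)
        if not isinstance(t, Struct):
            raise L3TypeError("field access on a non-struct")
        fields = STRUCTS[t.name]
        if e[2] not in fields:
            raise L3TypeError(f"struct {t.name} has no field {e[2]}")
        return fields[e[2]]
    if tag == "deref":  # *e : t  if e : t*
        t = typeof(e[1], env)
        if not isinstance(t, Ptr):
            raise L3TypeError("dereference of a non-pointer")
        return t.target
    if tag == "new":  # new t[e] : t*  if e : int
        if typeof(e[2], env) != Int():
            raise L3TypeError("array size must be an int")
        return Ptr(e[1])
    raise L3TypeError(f"unknown expression form {tag!r}")


def check_assign(v, e, env):
    """v = e stmt  if v : t and e : t, where t is not a struct type."""
    if not is_lval(v):
        raise L3TypeError("left-hand side is not an lval")
    tv, te = typeof(v, env), typeof(e, env)
    if tv != te:
        raise L3TypeError("assignment between different types")
    if isinstance(tv, Struct):
        raise L3TypeError("no deep assignment for structs")
```

With these definitions, `check_assign(("field", ("var", "a"), "x"), ("num", 1), ENV)` is accepted, while `check_assign(("var", "a"), ("var", "b"), ENV)` raises L3TypeError because both sides have struct type. NULL, comparisons and array subscripts are deliberately left out.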

As a special case we might support

    new t

which should be equivalent to new t[1].

We write to the destination of a pointer by using it as an lval. For example,

    *p = *q;

copies the contents of the location q points to into the location p points to, as in C.

Also, it was suggested that equality (==) and disequality (!=) work on pointers.

Arrays
======

This was the least developed feature of the language since we ran out of time.

The "new" construct that creates pointers is also used to create arrays. We access an array element with subscripting, writing

    e1[e2]

with the rule

    e1[e2] : t   if e1 : t* and e2 : int

The result of this is only well-defined if e2 is "in bounds". This was not discussed, but I take this to mean that if we create x = new t[k] then x[0],...,x[k-1] should all be ok, but not x[-1] or x[k]. Because the language is unsafe, it is "undefined" what an out-of-bounds access returns.

The idea to expand e1[e2] as *(e1 + e2) with pointer arithmetic was shouted down, so I assume we won't do this. Nevertheless, at least

    *x === x[0]

seems inevitable, unless we distinguish pointer types from array types (which nobody suggested). But see point D below.

Presumably (although not discussed), there should also be an lval e1[e2] or v1[e2] so one can write

    x[2] = 4;

Implementation
==============

While some arrays and structs can be allocated on the stack, we decided a simple and uniform strategy is to allocate them all on the heap. The L3 programmer can access the heap only through the "new" constructs, by dereferencing pointers, and by array subscript operations.

I declared categorically that there is no way to free memory (because this makes Lab 4 well nigh impossible). The run-time system will provide a garbage collector, or all our programs will be small enough so that garbage collection isn't necessary. Since I am providing the run-time system (and the so-called conservative garbage collector), this is easy for you.
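
To make the "in bounds" condition from the Arrays section concrete, here is a small Python model of a pointer that carries its bounds and checks them on every access, in the spirit of design criterion 3 (safe implementation with run-time checks). It is a sketch only: the class name CheckedPtr, its methods, and the zero-filled initial contents are assumptions, and L3 itself performs no such checks.

```python
class CheckedPtr:
    """Toy model of the result of `new t[k]` with bounds attached."""

    def __init__(self, k):
        # The (potential) run-time check on new t[e]: e > 0.
        if k <= 0:
            raise ValueError("new t[e] requires e > 0")
        # Assumption: cells start as 0; the notes leave contents uninitialized.
        self.cells = [0] * k
        self.lower = 0   # first valid index
        self.upper = k   # one past the last valid index

    def _check(self, i):
        # Valid indices are 0 .. k-1. The explicit test matters here because
        # a Python list would silently accept -1 as "last element".
        if not (self.lower <= i < self.upper):
            raise IndexError(f"index {i} out of bounds [0, {self.upper})")

    def load(self, i=0):
        """x[i]; with the default i = 0 this is *x, so *x === x[0]."""
        self._check(i)
        return self.cells[i]

    def store(self, i, value):
        """x[i] = value"""
        self._check(i)
        self.cells[i] = value
```

For example, after `x = CheckedPtr(3)` the statement `x.store(2, 4)` models `x[2] = 4;`, while `x.load(-1)` and `x.load(3)` raise IndexError instead of returning an undefined value.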

Note that your compiler will have to generate the right calls to malloc() in order to implement "new" and implicit allocation of structs.

For the safe implementation in Lab 4 I imagine a Cyclone-like solution where a pointer is given an upper and lower bound when "new" is called. Before an array access, the pointer is checked to see whether it is "in bounds".

Open Questions
==============

A. Type checking with NULL.
Without NULL, it seems pretty straightforward to type-check this language using the informally presented rules above. For NULL, there were three suggestions:

    NULL : alpha*
        for some type variable alpha [editor's note: scary]

    NULL : t*
        where we locally somehow determine what t must be without introducing full-fledged polymorphism. Perhaps this is just implicitly converting some special type such as void* to t* as needed.

    NULL(t) : t*
        that is, the type is given in the syntax.

[editor's note: perhaps we could literally write NULL : t* or (t*)NULL for those cases where it would be potentially ambiguous in case of the third option. Of course, we would have to define precisely when this would be the case.]

B. Type casting.
Do we allow type-casting, and which forms? What for?

C. Alignment.
Do we impose alignment constraints on the implementation of structs or not? During the discussion, I said this would have an impact if we wanted to be able to call/be-called-from C libraries. But, actually, it is a bit worse because the architecture manual makes it very clear that all pointers (64 bit ints) should be aligned at 0 mod 8, and ints should be aligned at 0 mod 4. In fact, in some modes the processor will raise an exception if that is not the case. So I don't believe we really have a choice here.

D. Pointers vs arrays.
This is perhaps too much unlike C, but what about the idea of t* (for a pointer, not subject to array indexing or pointer arithmetic) and t[] (subject to array indexing and, if available, pointer arithmetic)?

E. Address-of operator (&v).
There was the issue of whether the address-of operator should be in the language or not. Presumably, its typing rule would be

    &v : t*   if v : t

but this creates implementation problems if v is a variable x which may be held in a register. For example, the compiler would have to make sure that in

    var x : int;
    var p : int*;
    p = &x;

the variable x is not held in a register but put on the heap (or at the very least spilled onto the stack).
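
If the address-of operator were adopted, one way a compiler could find the affected variables is a simple pass that collects every variable x occurring as &x; those variables would then be kept out of registers. The Python sketch below illustrates the idea under assumptions: the tuple encoding of expressions, the statement form (lhs, rhs), and the function name address_taken are all hypothetical.

```python
def address_taken(stmts):
    """Return the set of variable names x that occur as &x.

    stmts is a list of assignments (lhs, rhs). Expressions are tuples such as
    ("var", "x"), ("num", 3), ("deref", e), ("field", e, "f"), ("addr", v).
    """
    found = set()

    def visit(e):
        if not isinstance(e, tuple):
            return  # names and numbers inside a node carry no subexpressions
        if e[0] == "addr" and e[1][0] == "var":
            found.add(e[1][1])
        for child in e[1:]:
            visit(child)

    for lhs, rhs in stmts:
        visit(lhs)
        visit(rhs)
    return found


# The example from the notes: var x : int; var p : int*; p = &x;
PROGRAM = [(("var", "p"), ("addr", ("var", "x")))]
```

Here `address_taken(PROGRAM)` contains "x" but not "p", so only x would need to live in memory; p could still be held in a register.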