Summary

A common strategy used by compilers to optimize programs involves factoring computations out of repeatedly executed procedures, statements, and expressions. For example, some compilers (and some programmers) will `hoist' so-called `loop-invariant computations' out of loops [Dragon]. This basic idea also exists in the area of program transformation, for example in the use of `staging transformations' [JoSche86] that move computations to earlier stages in the execution of the program whenever possible.

Run time code generation (RTCG) is another technique for factoring invariant computations out of repeatedly evaluated expressions. In principle, RTCG can lead to better results than purely static approaches because the `invariants' do not need to be established until run time, when we might expect more useful aspects of the computation to become invariant.

My thesis is that an approach to RTCG exists which will allow one to build software using at a high level of abstraction (by composing many `little languages') while at the same time achieving high performance. I plan to demonstrate this thesis by implementing a prototype system and using it to build several novel graphics applications.

The approach I take to this research involves factoring computations into procedures called `generating extensions' [JoGoSe93]. The classic example of a generating extension is a compiler. For example, consider an interpreter I. This is essentially a procedure that takes a program to interpret and that program's input; the result is the program's output.

I(p, i) --> o

Typically, the interpreter will be repeatedly applied with the same program but varying inputs:

I(p, i1) --> o1   I(p, i2) --> o2   I(p, i3) --> o3

and so it often becomes profitable to factor out the computations that are specific to the interpretation of the particular program p and perform them once only, leaving behind a program that performs only that part of the computation that depends on the program's input. The program that performs the factoring and executes the `static semantics' is called a compiler:

comp(p) --> aout

aout(i1) --> o1   aout(i2) --> o2   aout(i3) --> o3

Aout implements the `factored out' computations of I applied to p.

However, to quote [Abelson92], ``The interpreter for a computer language is just another program.'' Conversely, any program can be viewed as implementing an interpreter for a language (often a very simple language!), and so this idea of factoring out computations can be applied in all sorts of situations, for example when

f(s, d1); f(s, d2); f(s, d3);

is replaced with

f_s = f_gen(s)
f_s(d1); f_s(d2); f_s(d3);

Functional programmers may think of f_gen as a `smart' currying function that performs as much of the computation of f as is possible when the S argument is supplied, then returns a `partially applied' function. If f has type S * D -> T then f_gen would have type S -> (D -> T).

With today's popular programming systems [CodeWarrior][GCC] many practical differences arise between compilation and RTCG. In the most obvious one, the argument s will not be known until run time, and hence the specialized procedure f_s cannot be computed until run time. Hence, run time code generation, or RTCG. This also means that the procedures to be factored will often be much simpler than typical interpreters, and so we might hope that the generating extensions might be simpler to implement and also less expensive to execute than the typical compiler.

Still, compilers such as f_gen are in practice much harder to port, write, debug, and change than interpreters. And since the cost of compilation has not been a primary concern, it has typically been very expensive to execute a compiler. The cost of compilation is particularly important in situations in which the specialized procedures such as f_s might be used only a few times. Reducing the cost of the compilers only exacerbates their implementation difficulties. Thus there have been limited opportunities for RTCG in production code.

Note that several procedures might be composed together (e.g., to implement `layered' system architectures), and so several stages of `compilation' might be required in general.

The above motivates a system with the following properties:

fast compilers: the generating extensions must run fast.
(semi)automatic: make it easy to build generating extensions (compilers) from procedures (interpreters).
composable: handle multistage and multilayer systems.

To address these problems, I take an approach based on ideas and techniques developed for self-applicable partial evaluation (PE) [JoGoSe93] and apply them to a compiler intermediate representation. PE is a transformation technique that traditionally has been applied to the generation of compilers from interpreters. In other words, PE is an approach for automatically deriving efficient generating extensions. Partial evaluation of mostly-pure Scheme has made great advances in practicality recently, and now it is possible to convert even a higher-order interpreter into a good compiler (whose target language is Scheme) [Jorgensen92].

There's nothing magic about this kind of cogen (compiler generator). It doesn't write any new code, it merely reorganizes the procedures given to it. However, easing the creation of compilers from interpreters makes languages lightweight. Such a cogen promises to (and in fact was specifically designed to) alleviate the implementation difficulties of interactive graphics. Thus my title: Lightweight Languages for Interactive Graphics.

My system is called nitrous. It uses a simple, untyped compiler intermediate representation (IR) as input to and output from a directly implemented compiler generator. Because the input is low-level, few invariants are apparent in the input code. Hints are required to avoid excess specialization and generate good compilers; these hints sometimes take the form of lifting operators.

Since the input and output languages are the same, cogen may make several passes over code. This can be used to support layered languages, one form of language composition. However, the lifting operators must be generalized to handle multiple passes. Directly implementing cogen instead of creating it by self-applying a specializer has proven to free the system from some peculiar constraints [BiWe92], and obviates the bootstrapping problems of self-application.

There are a number of questions I hope to answer. For example, what are the analytical properties of the transformed code and its execution? How practical is cogen compared to existing systems such as lisp macros? Is the system powerful enough to handle a real language without depending on brittle analyses?

I plan to evaluate the system by implementing several examples, including vector and 2D graphics toolkits. The vector toolkit uses RTCG to convert scalar procedures into unrolled, pipelined loops. In the graphics toolkit, the representation of bitmaps is decoupled from the operations on them. Eg, I use a compiled little language to describe the memory layout of bitmaps. Perhaps the most interesting example is the lift compiler, wherein cogen is used to implement itself.

In conclusion, we can summarize this thesis as follows: run time code generation and compilation are two ends of the same underlying optimization. Traditional compilation is great, but fast compilers open up many new and interesting opportunities. Writing compilers manually is hard; we can automate much of it, giving us lightweight languages. My approach is to apply partial evaluation techniques to a compiler IR, yielding cogen, a compiler generator designed for RTCG. The approach is proven by the development of vector and 2D graphics toolkits.