Bcopy and The Trick

This section explores how a typical bcopy optimization can be implemented with The Trick. The functioning of The Trick is explained in detail (the details are peculiar to my system, the concepts are not). Bcopy (copy a byte-addressed block of memory) is a simple kind of bitblit. Efficient implementations copy the memory word-at-a-time, but if the blocks' alignments differ then the data must be masked and shuffled in the registers. Roughly (ignoring the boundaries):

Cogen applied to this code with binding time pattern (D D D) does the right thing: even though all the parameters are dynamic the trick-n procedure produces four copies of the main loop, each specialized to a different alignment. The bit-shuffling is static. If needed one could also specialize to the length (eg when it's small).

The trick-n procedure implements the classic `dynamic choice of finitely static values' PE trick [JoGoSe93]. This implementation produces a dynamic, linear chain of if-tests. A fancier version could produce a log-tree or a jump table. If the alignment is guaranteed, one may avoid the tests completely by specializing bcopy1.

Consider how a purely dynamic compiler functions like a macro. This works because of The Trick: converting a dynamic value into a finitely static one via case analysis. The finite (aka terminating) static computation is executed by the compiler (analogously to macro execution).

trick-n in root is

Now instead of bcopy, consider the simpler trick-n-demo procedure. It retains the essential features: static computation depending on a dynamic parameter of statically known range. Note that in the bcopy example max is not a parameter, thus this example is somewhat more general.

Even though actual is dynamic we can eliminate the inner multiplication provided that it is less than max (in bcopy, this corresponds to the offsets for the masking and shuffling in the loop). In root this is

We can generate versions of demo-cont specialized to values for actual from 0 to 2 and also to the rest of the static environment (s --> 7).

The code for fast-demo includes 3 copies of demo-cont. In each, the first multiplication (arg with s) has been performed. See how the loop in the compiler tndc expands in its output fast-demo.



fast-demo also includes a general version of demo-cont, in case the actual value were greater than max. If this is not needed, then a version of trick-n that produces an error instead of returning x could be used. Even in cases where the test against zero will never return true, the test cannot be eliminated because that produces a non-terminating compiler. Though this is a good example of how the programmer must understand and account for PE in their code, note that the accounting is limited to the trick-n procedure, rather than demo-cont.