This section explores how a typical bcopy optimization can be implemented with The Trick. The functioning of The Trick is explained in detail (the details are peculiar to my system, the concepts are not). Bcopy (copy a byte-addressed block of memory) is a simple kind of bitblit. Efficient implementations copy the memory word-at-a-time, but if the blocks' alignments differ then the data must be masked and shuffled in the registers. Roughly (ignoring the boundaries):

(define (bcopy from to nbytes) (let* ((diff (mod (- from to) 4)) (diff (trick-n diff 4))) (bcopy1 from to nbytes diff)))(define (bcopy1 from to nbytes diff) (let ((nwords (truncate (/ nbytes 4)))) (define (shuffle src prev) (let* ((bits (* 8 diff)) (mask (- (shift-left 1 (- 32 bits)) 1))) (bitwise-or (shift-left (bitwise-and mask src) bits) prev))) (let loop ((i 0) (prev 0)) (let* ((s (vector-ref from i)) (d (shuffle s p))) (vector-set! to i s) (loop (+ 1 i) (hibits s diff)))))) (define (trick-n x n) (cond ((< 0 n) x) ((= x n) n) (else (trick-n x (- n 1)))))

Cogen applied to this code with binding time pattern `(`

does the right thing: even though all the parameters are dynamic the
`D`

`D`

`D`

)`trick-n`

procedure produces four copies of the main loop, each
specialized to a different alignment. The bit-shuffling is static.
If needed one could also specialize to the length (eg when it's
small).

The `trick-n`

procedure implements the classic `dynamic choice of
finitely static values' PE trick [JoGoSe93]. This implementation
produces a dynamic, linear chain of if-tests. A fancier version could
produce a log-tree or a jump table. If the alignment is guaranteed,
one may avoid the tests completely by specializing `bcopy1`

.

Consider how a purely dynamic compiler functions like a macro. This works because of The Trick: converting a dynamic value into a finitely static one via case analysis. The finite (aka terminating) static computation is executed by the compiler (analogously to macro execution).

`trick-n`

in `root`

is

(code trick-n (x n k) ((const zero 0) (prim z>n ,> zero n) (if z>n (,@(call 'k '(x)))) (prim nn ,identity n) ; prevent n from being lifted (prim n=x ,= nn x) (if n=x (,@(call 'k '(n)))) (const one 1) (prim n-1 ,- n one) (const trick-n trick-n) (jump trick-n x n-1 k)))

Now instead of `bcopy`

, consider the simpler `trick-n-demo`

procedure. It retains the essential features: static computation
depending on a dynamic parameter of statically known range. Note that
in the bcopy example `max`

is not a parameter, thus this example is
somewhat more general.

(define (trick-n-demo actual max s d) (let ((a (trick actual max))) (* d (* s a))))

Even though `actual`

is dynamic we can eliminate the inner
multiplication provided that it is less than `max`

(in bcopy, this
corresponds to the offsets for the masking and shuffling in the loop).
In `root`

this is

(code trick-n-demo (actual max s d k) (,@(make-closure 'demo-cont '(s d k) 'cl) (const trick-n ,trick-n) (jump trick-n actual max cl)))(code demo-cont (cl arg) (,@(unmake-closure '(s d k) 'cl) (prim r3 ,* arg s) (prim r4 ,* r3 d) ,@(call 'k '(r4))))

We can generate versions of `demo-cont`

specialized to
values for `actual`

from 0 to 2 and also to the rest of the
static environment (`s`

--> 7).

(define tndc (cogen trick-n-demo '(dynamic static static dynamic dynamic))) (define fast-demo (eval-root tndc (list top-cont 2 7)))

The code for `fast-demo`

includes 3 copies of `demo-cont`

. In
each, the first multiplication (`arg`

with `s`

) has been
performed. See how the loop in the compiler `tndc`

expands in its
output `fast-demo`

.

`fast-demo`

also includes a general version of `demo-cont`

, in
case the actual value were greater than `max`

. If this is not
needed, then a version of `trick-n`

that produces an error instead
of returning `x`

could be used. Even in cases where the test
against zero will never return true, the test cannot be eliminated
because that produces a non-terminating compiler. Though this is a
good example of how the programmer must understand and account for PE
in their code, note that the accounting is limited to the `trick-n`

procedure, rather than `demo-cont`

.