Bcopy and The Trick

This section explores how a typical bcopy optimization can be implemented with The Trick. The functioning of The Trick is explained in detail (the details are peculiar to my system, the concepts are not). Bcopy (copy a byte-addressed block of memory) is a simple kind of bitblit. Efficient implementations copy the memory word-at-a-time, but if the blocks' alignments differ then the data must be masked and shuffled in the registers. Roughly (ignoring the boundaries):

(define (bcopy from to nbytes)
  (let* ((diff (mod (- from to) 4))
         (diff (trick-n diff 4)))
    (bcopy1 from to nbytes diff)))

(define (bcopy1 from to nbytes diff)
  (let ((nwords (truncate (/ nbytes 4))))
    (define (shuffle src prev)
      (let* ((bits (* 8 diff))
             (mask (- (shift-left 1 (- 32 bits)) 1)))
        (bitwise-or (shift-left (bitwise-and mask src) bits) prev)))
    (let loop ((i 0) (prev 0))
         (let* ((s (vector-ref from i))
                (d (shuffle s p)))
           (vector-set! to i s)
           (loop (+ 1 i) (hibits s diff))))))
(define (trick-n x n)
  (cond ((< 0 n) x)
        ((= x n) n)
        (else (trick-n x (- n 1)))))

Cogen applied to this code with binding time pattern (D D D) does the right thing: even though all the parameters are dynamic the trick-n procedure produces four copies of the main loop, each specialized to a different alignment. The bit-shuffling is static. If needed one could also specialize to the length (eg when it's small).

The trick-n procedure implements the classic `dynamic choice of finitely static values' PE trick [JoGoSe93]. This implementation produces a dynamic, linear chain of if-tests. A fancier version could produce a log-tree or a jump table. If the alignment is guaranteed, one may avoid the tests completely by specializing bcopy1.

Consider how a purely dynamic compiler functions like a macro. This works because of The Trick: converting a dynamic value into a finitely static one via case analysis. The finite (aka terminating) static computation is executed by the compiler (analogously to macro execution).

trick-n in root is

(code trick-n
      (x n k)
      ((const zero 0)
       (prim z>n ,> zero n)
       (if z>n (,@(call 'k '(x))))
       (prim nn ,identity n) ; prevent n from being lifted
       (prim n=x ,= nn x)
       (if n=x (,@(call 'k '(n))))
       (const one 1)
       (prim n-1 ,- n one)
       (const trick-n trick-n)
       (jump trick-n x n-1 k)))

Now instead of bcopy, consider the simpler trick-n-demo procedure. It retains the essential features: static computation depending on a dynamic parameter of statically known range. Note that in the bcopy example max is not a parameter, thus this example is somewhat more general.

(define (trick-n-demo actual max s d)
  (let ((a (trick actual max)))
    (* d (* s a))))

Even though actual is dynamic we can eliminate the inner multiplication provided that it is less than max (in bcopy, this corresponds to the offsets for the masking and shuffling in the loop). In root this is

(code trick-n-demo
      (actual max s d k)
      (,@(make-closure 'demo-cont '(s d k) 'cl)
       (const trick-n ,trick-n)
       (jump trick-n actual max cl)))

(code demo-cont
      (cl arg)
      (,@(unmake-closure '(s d k) 'cl)
       (prim r3 ,* arg s)
       (prim r4 ,* r3 d)
       ,@(call 'k '(r4))))

We can generate versions of demo-cont specialized to values for actual from 0 to 2 and also to the rest of the static environment (s --> 7).

(define tndc (cogen trick-n-demo '(dynamic static static dynamic dynamic)))
(define fast-demo (eval-root tndc (list top-cont 2 7)))

The code for fast-demo includes 3 copies of demo-cont. In each, the first multiplication (arg with s) has been performed. See how the loop in the compiler tndc expands in its output fast-demo.

fast-demo also includes a general version of demo-cont, in case the actual value were greater than max. If this is not needed, then a version of trick-n that produces an error instead of returning x could be used. Even in cases where the test against zero will never return true, the test cannot be eliminated because that produces a non-terminating compiler. Though this is a good example of how the programmer must understand and account for PE in their code, note that the accounting is limited to the trick-n procedure, rather than demo-cont.