Solutions to Problem Set 9.

Problem 1.

If we take the definition of the SUM machine as our template example 
(see the notes, page 310), then a straightforward implementation of the
F register machine is the following:

(define-machine f
  (registers val cont n)
  (operations
   (branch (zero? (fetch n)) zero-case) 
   (save n) 
   (save cont) 
   (restore n)
   (restore cont) 
   (assign val (1+ (fetch val)))
   (assign val (* (fetch n) (fetch val))) 
   (assign val 0) 
   (assign n (-1+ (fetch n))) 
   (assign cont done) 
   (assign cont return-from-recursion) 
   (goto top) 
   (goto (fetch cont)))
  (controller
   (assign cont done)
   top
   (branch (zero? (fetch n)) zero-case)
   (save cont)
   (assign cont return-from-recursion)
   (save n)
   (assign n (-1+ (fetch n)))
   (goto top)
   return-from-recursion
   (restore n)
   (restore cont)
   (assign val (1+ (fetch val)))
   (assign val (* (fetch n) (fetch val)))
   (goto (fetch cont))
   zero-case
   (assign val 0)
   (goto (fetch cont))
   done))

The following is a trivial but useful function to load up the appropriate
registers of the F machine, run the machine, and return the result (which
is left in the VAL register).

(define (try-f n)
  (remote-assign f 'n n)
  (start f)
  (remote-fetch f 'val))

If we try it, we see that indeed (f 3) yields 15, (f 5) yields 325, etc.
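For reference, the function the machine computes is f(0) = 0 and
f(n) = n * (f(n-1) + 1), which follows directly from the zero-case and
return-from-recursion code above.  A quick check of the quoted values (in
Python, purely for illustration; the machine itself is written in the
register language):

```python
def f(n):
    """f(0) = 0; f(n) = n * (f(n-1) + 1), as computed by the F machine."""
    if n == 0:
        return 0
    return n * (f(n - 1) + 1)

print(f(3))  # 15
print(f(5))  # 325
```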

To analyze stack usage, we now add the initialization and printing code to our
machine and investigate its behavior:

(define-machine f
  (registers val cont n)
  (operations
   (branch (zero? (fetch n)) zero-case) 
   (save n) 
   (save cont) 
   (restore n)
   (restore cont) 
   (assign val (1+ (fetch val)))
   (assign val (* (fetch n) (fetch val))) 
   (assign val 0) 
   (assign n (-1+ (fetch n))) 
   (assign cont done) 
   (assign cont return-from-recursion) 
   (goto top) 
   (goto (fetch cont))
   (perform (the-stack 'initialize))
   (perform (the-stack 'print-statistics)))
  (controller
   (perform (the-stack 'initialize))
   (assign cont done)
   top
   (branch (zero? (fetch n)) zero-case)
   (save cont)
   (assign cont return-from-recursion)
   (save n)
   (assign n (-1+ (fetch n)))
   (goto top)
   return-from-recursion
   (restore n)
   (restore cont)
   (assign val (1+ (fetch val)))
   (assign val (* (fetch n) (fetch val)))
   (goto (fetch cont))
   zero-case
   (assign val 0)
   (goto (fetch cont))
   done
   (perform (the-stack 'print-statistics))))

==> (try-f 3)
(TOTAL-PUSHES: 6 MAXIMUM-DEPTH: 6)
15

==> (try-f 5)
(TOTAL-PUSHES: 10 MAXIMUM-DEPTH: 10)
325

...

By the argument for linearity in the problem set statement, this convinces
us that the formula for both the number of pushes and the maximum stack
depth is just 2 * n.  The reason is simple: every time around the loop we
push both cont and n, and we go through the loop n times.  Notice that the
(save n) and (restore n) operations cannot be omitted, since the value of
n is needed after the recursive call.
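The controller's stack discipline can also be modeled directly.  The sketch
below (Python, for illustration only) mirrors the machine's descend and
unwind phases and reproduces the 2 * n counts:

```python
def f_machine(n):
    """Model of the problem-1 controller, counting stack traffic.

    Descending, each trip around the loop saves cont and n (two pushes);
    unwinding, return-from-recursion restores them and combines results.
    """
    stack, pushes, max_depth = [], 0, 0
    cont = 'done'
    while n != 0:                       # top: branch unless n is zero
        stack.extend([cont, n])         # (save cont) (save n)
        pushes += 2
        max_depth = max(max_depth, len(stack))
        cont = 'return-from-recursion'
        n -= 1                          # (assign n (-1+ (fetch n)))
    val = 0                             # zero-case
    while stack:                        # return-from-recursion
        n = stack.pop()                 # (restore n)
        cont = stack.pop()              # (restore cont)
        val = n * (val + 1)             # the two val assignments
    return val, pushes, max_depth

print(f_machine(3))  # (15, 6, 6)
print(f_machine(5))  # (325, 10, 10)
```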



Problem 2.

As you see by doing the assignment, the explicit-control evaluator's running
of the Scheme procedure f is quite a bit less efficient than the
hand-crafted register-transfer machine that you created for this function in
problem 1.  My results for number of pushes and maximum stack depth were as
follows:

n	pushes	max-depth
0	16	8
1	56	14
2	96	20
3	136	26
4	176	32
5	216	38
...

Obviously, the number of pushes is 16 + (40 * n), and the maximum stack
depth is 8 + (6 * n).  The basic behavior of f as evaluated by the
explicit-control evaluator is just the same as the behavior of the register
machine we built in problem 1, only the constants are much larger.  This is
true for three reasons: (1) the interpreter must do some work to figure out
what our program is telling it to do, (2) the interpreter performs a
considerable amount of useless saving and restoring, e.g., saving argl and
so on around the evaluation of a self-evaluating expression like 4, and
(3) the interpreter does no "open coding" of primitive operations like
multiplication and decrementing.
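The claimed formulas can be checked against the measured table; a trivial
Python check, included only as arithmetic confirmation:

```python
# Measured (pushes, max-depth) pairs from the table above.
measured = {0: (16, 8), 1: (56, 14), 2: (96, 20),
            3: (136, 26), 4: (176, 32), 5: (216, 38)}
for n, (pushes, depth) in measured.items():
    assert pushes == 16 + 40 * n
    assert depth == 8 + 6 * n
print("formulas match")
```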



Problem 3.

n	pushes	max-depth
0	16	8
1	56	16
2	96	24
3	136	32
...

The number of pushes is the same as before, but the maximum stack depth is
now 8 + (8 * n).  It seems that on each recursive call to f1, two more
registers are being saved on the stack.  If you look very closely at the
code of the explicit-control evaluator, you can see that these must be the
registers env and unev, and that the reason is the tail-recursion special
case in the evaluator.  In running f1, when the recursive expression
(f1 (- n 1)) is being evaluated, the second argument to * has yet to be
processed, and therefore the code at eval-arg-loop saves env and unev around
the recursive evaluation.  In problem 2, where the recursive call was the
second (and last) argument to *, the evaluator runs through the
tail-recursive code at eval-last-arg, which knows that it need not save
either env or unev, and does not.  This behavior is by no means obvious, but
can be deduced from a close examination of the evaluator code.



Problem 4.

For the iterative version of f, the table of numbers follows:

n	pushes	max-depth
0	72	11
1	115	11
2	158	11
3	201	11
...

The number of pushes is 72 + (43 * n), but now the maximum stack depth
is 11, a constant independent of n, just as we would have expected
from a tail-recursive interpreter.  
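Since the actual Scheme source of the iterative version is not reproduced
here, the following Python loop is only a sketch of one way to compute f
iteratively; it shows why constant stack space suffices: all the state is
carried in the loop variables.

```python
def f_iter(n):
    """Compute f bottom-up: f(i) = i * (f(i-1) + 1), starting from f(0) = 0.

    Carrying the state in loop variables is the analogue of a
    tail-recursive procedure, so the stack never grows with n.
    """
    val = 0
    for i in range(1, n + 1):
        val = i * (val + 1)
    return val

print(f_iter(5))  # 325
```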



Problem 5.

The compiler generates the following controller code:

For (+ x (* y z)):

((ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
 (SAVE FUN) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'X (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) ()))
 (SAVE ARGL) 
 (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV))) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Y (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) ())) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Z (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (ASSIGN CONTINUE AFTER-CALL0)
 (SAVE CONTINUE)
 (GOTO APPLY-DISPATCH) 
 AFTER-CALL0
 (RESTORE ARGL)
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (RESTORE FUN)
 (GOTO APPLY-DISPATCH))

For (+ (* y z) x):

((ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
 (SAVE FUN)
 (SAVE ENV) 
 (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV))) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Y (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) ())) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Z (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (ASSIGN CONTINUE AFTER-CALL1)
 (SAVE CONTINUE)
 (GOTO APPLY-DISPATCH) 
 AFTER-CALL1
 (ASSIGN ARGL (CONS (FETCH VAL) ()))
 (RESTORE ENV) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'X (FETCH ENV)))
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
 (RESTORE FUN) 
 (GOTO APPLY-DISPATCH))

It turns out that both expressions compile to 18 lines of code (one label +
17 control operations).  The principal difference is that in the first
case the register argl must be saved and restored, because the value of x
has already been computed before the inner multiply is done.  In the
second case argl need not be saved, but the env register must be: if the
inner expression (the first argument to the +) did some arbitrary
computation that changed env, env would have to be restored before x could
be looked up.
The env register is saved here for the same reasons that the env and unev
registers were saved in problem 3, but not problem 2.

In comparing this difference with the one between f and f1 of problems 2 and
3, note that the compiler does much more optimization than the evaluator
did, but not as much as it could.  In particular, notice that in the case
of (+ x (* y z)) the compiler realized that it does not have to save and
restore env around the evaluation of x, because it can tell that that
evaluation will not alter env.  In the other case, it saves env, because
the evaluation of the first argument might change env.  In fact, because
that argument is just the application of a primitive arithmetic operator,
we know that it cannot change env, and if the compiler knew this too, it
could generate better code.



Problem 6.

A summary of experimental runs of the three functions follows:

	cf		cf1		cf-iter
n	push	max	push	max	push	max
0	 7	3	7	3	17	6
1	17	8	17	8	27	6
2	27	13	27	13	37	6
3	37	18	37	18	47	6

The formulas for cf and cf1 are identical, as we might have expected
after considering the results of problem 5:  push(cf) = 7 + (10 * n), and
max(cf) = 3 + (5 * n).  The iterative version has a constant stack depth,
and its number of pushes is 17 + (10 * n).
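As a sanity check, the linear coefficients can be read off any two rows of
the table and verified against the rest; a trivial Python fit (assuming
exact linearity, as the data suggests):

```python
def fit(rows):
    """Fit value = a + b*n from the first two rows, then check all rows."""
    (n0, v0), (n1, v1) = rows[0], rows[1]
    b = (v1 - v0) // (n1 - n0)
    a = v0 - b * n0
    assert all(v == a + b * n for n, v in rows)
    return a, b

cf_pushes      = [(0, 7), (1, 17), (2, 27), (3, 37)]
cf_depth       = [(0, 3), (1, 8), (2, 13), (3, 18)]
cf_iter_pushes = [(0, 17), (1, 27), (2, 37), (3, 47)]

print(fit(cf_pushes))       # (7, 10)
print(fit(cf_depth))        # (3, 5)
print(fit(cf_iter_pushes))  # (17, 10)
```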



Problem 7.

Comparing the ratio of run-time and space between the compiled and interpreted
versions of the three functions we have considered (for large n):

		speed-up	space-saving
f		10/40 (0.25)	5/6  (0.83)
f1		10/40 (0.25)	5/8  (0.63)
f-iter  	10/43 (0.23)	6/11 (0.55)

For f, we can also compare performance of both the interpreted and compiled
versions with the hand-coded version of problem 1.

		speed-up	space-saving
		  over		over
		interpreted	interpreted
compiled	0.25		0.83
hand-crafted	0.05		0.33

The speed-up given by the hand-crafted code (a factor of 20) is quite
impressive.  Our simple compiler produces code that is slower by a factor
of 5 and less space-efficient by a factor of about 2.5 than the best
hand-produced code.
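These ratios come directly from the per-n push coefficients and depth
figures in the earlier tables (for f-iter, the depth figures are the
constant maximum depths); the following Python fragment just redoes that
arithmetic:

```python
from fractions import Fraction

# Per-n push coefficients and depth figures from the earlier tables;
# for f-iter the depth entries are the constant maximum depths.
interpreted = {'f': (40, 6), 'f1': (40, 8), 'f-iter': (43, 11)}
compiled    = {'f': (10, 5), 'f1': (10, 5), 'f-iter': (10, 6)}

for name, (ip, idp) in interpreted.items():
    cp, cdp = compiled[name]
    speedup = Fraction(cp, ip)     # compiled pushes / interpreted pushes
    space = Fraction(cdp, idp)     # compiled depth / interpreted depth
    print(name, speedup, space)
# f 1/4 5/6
# f1 1/4 5/8
# f-iter 10/43 6/11
```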


Here is the code the compiler gave me for f:

 ((ASSIGN VAL (MAKE-COMPILED-PROCEDURE ENTRY30 (FETCH ENV)))
  (GOTO AFTER-LAMBDA31)
  ENTRY30
  (ASSIGN ENV (ENV-OF-COMPILED-PROCEDURE (FETCH FUN)))
  (ASSIGN ENV (EXTEND-ENVIRONMENT '(N) (FETCH ARGL) (FETCH ENV)))
  (SAVE ENV)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '= (FETCH ENV)))
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'N (FETCH ENV)))
  (ASSIGN ARGL (CONS (FETCH VAL) ()))
  (ASSIGN VAL '0)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (ASSIGN CONTINUE AFTER-CALL36)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH)
  AFTER-CALL36
  (RESTORE ENV)
  (BRANCH (FETCH VAL) TRUE-BRANCH32)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV)))
  (SAVE FUN)
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'N (FETCH ENV)))
  (ASSIGN ARGL (CONS (FETCH VAL) ()))
  (SAVE ARGL)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
  (SAVE FUN)
  (ASSIGN VAL '1)
  (ASSIGN ARGL (CONS (FETCH VAL) ()))
  (SAVE ARGL)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'F (FETCH ENV)))
  (SAVE FUN)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '- (FETCH ENV)))
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'N (FETCH ENV)))
  (ASSIGN ARGL (CONS (FETCH VAL) ()))
  (ASSIGN VAL '1)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (ASSIGN CONTINUE AFTER-CALL35)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH)
  AFTER-CALL35
  (ASSIGN ARGL (CONS (FETCH VAL) ()))
  (RESTORE FUN)
  (ASSIGN CONTINUE AFTER-CALL34)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH)
  AFTER-CALL34
  (RESTORE ARGL)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (RESTORE FUN)
  (ASSIGN CONTINUE AFTER-CALL33)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH)
  AFTER-CALL33
  (RESTORE ARGL)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (RESTORE FUN)
  (GOTO APPLY-DISPATCH)
  TRUE-BRANCH32
  (ASSIGN VAL '0)
  (RESTORE CONTINUE)
  (GOTO (FETCH CONTINUE))
  AFTER-LAMBDA31
  (PERFORM (DEFINE-VARIABLE! 'F (FETCH VAL) (FETCH ENV)))
  (RESTORE CONTINUE)
  (GOTO (FETCH CONTINUE))))

Two possible major optimizations that one could add to come closer to the
hand-crafted code's performance are:

1.  Open-coding of primitive functions.  To do a multiply on our machine, we
need only use a single instruction like
(assign val (* (fetch n) (fetch val)))
as we used in problem 1.  The compiler, however, must instead set up an
argument list in the argl register, put the function (primitive-*, in this
case) in the fun register, and then call apply-dispatch.  This is much slower, of
course.  Note too that the compiled code looks up * in the environment,
whereas the hand-coded routine "knows" that its value is the multiply
primitive.  One danger of such knowledge, of course, is that if we were to
redefine * to mean addition, the hand-coded (or open-code optimized) code
would never notice.

2.  Using registers to hold important values.  In the hand-coded machine, we
did not need to manipulate the environment or to look up values in it,
because the value of n, our variable, was in a register.  In many actual
computers, the hardware provides some small number (usually between 4 and
32) of extra registers, not needed by the operation of the machine itself,
that can be used by programmers or compilers to hold the most important and
often-used values in a computation.  If we imagined extending the
explicit-control evaluator of Scheme by adding a set of extra registers, say
r0, r1, ... r7, then the compiler could use, say, r0 to hold n and avoid
having to look up its value in the environment except when the function was
first called.  This optimization is especially important when combined with
the open-coding of primitives, as suggested above.

In addition, one can apply optimizations based on knowing special
properties of certain functions, propagate constant values at compile
time, move code segments to reduce the number of needed goto's, and so
on.  Some state-of-the-art compilers produce code
that is closely comparable with the best efforts of human coders, although
of course such compilers are much larger and more complex than what we have
presented.

