	          MASSACHUSETTS INSTITUTE OF TECHNOLOGY
        Department of Electrical Engineering and Computer Science
         6.001 Structure and Interpretation of Computer Programs
                             Fall Semester 1984

                        Solutions to Problem Set 10.

Problem 1.

A straightforward implementation of the REXPT register machine is the
following:

(define-machine rexpt
  (registers val cont b n)
  (controller
   (assign cont done)
  loop
   (branch (zero? (fetch n)) zero-case)
   (save cont)
   (assign cont return-from-recursion)
   (save n)
   (assign n (-1+ (fetch n)))
   (goto loop)
  return-from-recursion
   (restore n)
   (restore cont)
   (assign val (* (fetch b) (fetch val)))
   (goto (fetch cont))
  zero-case
   (assign val 1)
   (goto (fetch cont))
  done))

The following is a trivial but useful function to load up the appropriate
registers of the REXPT machine, run the machine, and return the result (which
is left in the VAL register).

(define (try-rexpt b n)
  (remote-assign rexpt 'b b)
  (remote-assign rexpt 'n n)
  (start rexpt)
  (remote-fetch rexpt 'val))

If we try it, we see that indeed (rexpt 2 6) yields 64, (rexpt 3 4) yields 81,
and so on.
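
The controller above can be simulated directly.  Here is a sketch in Python
(the function name and representation are mine, not part of the
register-machine language) that mirrors the controller, with a Python list
playing the role of the machine's stack:

```python
def rexpt(b, n):
    """Simulate the REXPT controller: an explicit stack replaces recursion."""
    stack = []
    # Descending phase: each trip around `loop` saves cont and n.
    while n != 0:
        stack.append('return-from-recursion')  # (save cont)
        stack.append(n)                        # (save n)
        n = n - 1                              # (assign n (-1+ (fetch n)))
    val = 1                                    # zero-case
    # Unwinding phase: each return multiplies val by b.
    while stack:
        n = stack.pop()                        # (restore n)
        stack.pop()                            # (restore cont)
        val = b * val                          # (assign val (* b val))
    return val

# rexpt(2, 6) yields 64; rexpt(3, 4) yields 81.
```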

To analyze stack usage, we now add the initialization and printing code to our
machine and investigate its behavior:

(define-machine rexpt
  (registers val cont b n)
  (controller
   (perform (initialize-stack))
   (assign cont done)
  loop
   (branch (zero? (fetch n)) zero-case)
   (save cont)
   (assign cont return-from-recursion)
   (save n)                                            ; Note A.
   (assign n (-1+ (fetch n)))
   (goto loop)
  return-from-recursion
   (restore n)                                         ; Note A.
   (restore cont)
   (assign val (* (fetch b) (fetch val)))
   (goto (fetch cont))
  zero-case
   (assign val 1)
   (goto (fetch cont))
  done
   (perform (*the-stack* 'print-statistics))
   ))

==> (try-rexpt 3 3)
(TOTAL-PUSHES: 6 MAXIMUM-DEPTH: 6)
27

==> (try-rexpt 3 4)
(TOTAL-PUSHES: 8 MAXIMUM-DEPTH: 8)
81

...

By the argument for linearity in the problem set statement, this convinces us
that the formula for both number of pushes and maximum stack depth is just
2 * n.  But why, you might ask, should it be twice n?  Clearly,
each time around the loop, we push both cont and n.  Cont turns out to be
necessary, because we must know where to continue after calculating a new val,
but we may notice that n is needed only so we can count down to 0, and never
again afterwards.  Thus, we could eliminate the lines in the above code that
are marked with the comment "Note A." without harming the machine.  In that
case, it would run with half as many stack pushes, a maximum stack depth that
is half as large, and the only "cost" would be that after running, the n
register would no longer have its original value restored (which was not part
of the specification of the problem, anyway).
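
The push-count argument can be made concrete with a small Python sketch (the
function name and the save_n flag are mine).  Since every save precedes every
restore in this machine, the maximum depth equals the total number of pushes:

```python
def rexpt_pushes(n, save_n=True):
    """Count the stack pushes the REXPT controller makes for a given n."""
    pushes = 0
    while n != 0:
        pushes += 1          # (save cont)
        if save_n:
            pushes += 1      # (save n) -- the line marked "Note A."
        n -= 1
    return pushes

# With the Note A lines, 2 * n pushes; without them, only n.
```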

A very straightforward implementation of IEXPT is:

(define-machine iexpt
  (registers b n val)
  (controller
   (assign val 1)
  loop
   (branch (zero? (fetch n)) done)
   (assign n (-1+ (fetch n)))
   (assign val (* (fetch b) (fetch val)))
   (goto loop)
  done))

Since IEXPT does not use the stack, both pushes and depth are zero.
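
The same point shows up in a direct Python transcription of the IEXPT
controller (the function name is mine): the loop keeps no pending work, so no
stack is needed at all:

```python
def iexpt(b, n):
    """Iterative exponentiation: constant space, no stack."""
    val = 1                 # (assign val 1)
    while n != 0:           # (branch (zero? (fetch n)) done)
        n = n - 1           # (assign n (-1+ (fetch n)))
        val = b * val       # (assign val (* (fetch b) (fetch val)))
    return val
```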

Problem 2.

As you see by doing the assignment, the explicit-control evaluator's running
of the Scheme procedure rexpt is quite a bit less efficient than the
hand-crafted register-transfer machine that you created for this function in
problem 1.  My results for number of pushes and maximum stack depth were as
follows:

n	pushes	max-depth
0	19	8
1	54	11
2	89	14
3	124	17
4	159	20
5	194	23
...

Note that the value of b cannot matter in this case.  Just to be sure, I also
tried (rexpt 2 4), and got the same values as above: 159 pushes and 20 depth.

Obviously, the number of pushes is 19 + (35 * n), and the maximum stack depth
is 8 + (3 * n).  The basic behavior of rexpt as evaluated by the
explicit-control evaluator is just the same as the behavior of the register
machine we built in problem 1, only the constants are much larger.  This is
true for two reasons:  (1) the interpreter must do some work to figure out
what our program is telling it to do, and (2) "clever" optimizations, like
noticing that the value of n is not needed after we return from the recursive
call, are not done by this rather simple-minded interpreter.   (Anyway, it
might take a more clever interpreter more time to figure out the optimization
than it would save.)
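
As a sanity check, the measured numbers fit the claimed formulas exactly; a
quick Python sketch (the variable names are mine):

```python
# (n, pushes, max-depth) rows copied from the table above.
data = [(0, 19, 8), (1, 54, 11), (2, 89, 14),
        (3, 124, 17), (4, 159, 20), (5, 194, 23)]
for n, pushes, depth in data:
    assert pushes == 19 + 35 * n
    assert depth == 8 + 3 * n
```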

Problem 3.

n	pushes	max-depth
0	19	8
1	54	13
2	89	18
3	124	23
...

The number of pushes is the same as before, but the maximum stack
depth is now 8 + (5 * n).  It seems that on each recursive call to
rexpt1, two more registers are being saved on the stack.  If you look
very closely at the code of the explicit-control evaluator, you can
see that these must be the registers env and unev, and that the reason
is the tail-recursion special case in the evaluator.  In
running rexpt1, when the recursive expression (rexpt1 b (- n 1)) is being
evaluated, the second argument to * is yet to be processed and therefore
the code at eval-arg-loop saves env and unev around the recursive evaluation.
In problem 2, where the recursive call was the second (and last) argument
to *, the evaluator runs through the tail-recursive code at eval-last-arg,
which knows that it need not save either env or unev, and does not.  This
behavior is by no means obvious, but can be deduced from a close examination
of the evaluator code.
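
Again the table fits the formulas exactly, and the two-extra-saves-per-call
account shows up as the difference in slopes (5 - 3 = 2); a quick Python
sketch (variable names mine):

```python
# (n, pushes, max-depth) rows copied from the table above.
data = [(0, 19, 8), (1, 54, 13), (2, 89, 18), (3, 124, 23)]
for n, pushes, depth in data:
    assert pushes == 19 + 35 * n     # same push count as problem 2
    assert depth == 8 + 5 * n        # two extra saves (env, unev) per call
```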

Problem 4.

For the iterative version, iexpt, the table of numbers follows:

n	pushes	max-depth
0	32	8
1	67	10
2	102	10
3	137	10
4	172	10
5	207	10
...

The number of pushes is 32 + (35 * n), but now the maximum stack depth
is 10, a constant independent of n, just as we would have expected
from a tail-recursive interpreter.  Note that relying on the n = 0
case when examining the performance of an algorithm may be
instructive but can also mislead: here, the stack depth used
when n = 0 is smaller than for all other values of
n.  It is interesting that the coefficient of n in the formula here
happens to be exactly the same as in problem 2.  This is likely
because the two algorithms are very similar in what they cause the
interpreter to do, though you cannot generally anticipate getting the
same constant.
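
One more mechanical check against the table, which also records the n = 0
anomaly just mentioned (variable names mine):

```python
# (n, pushes, max-depth) rows copied from the table above.
data = [(0, 32, 8), (1, 67, 10), (2, 102, 10),
        (3, 137, 10), (4, 172, 10), (5, 207, 10)]
for n, pushes, depth in data:
    assert pushes == 32 + 35 * n
    if n >= 1:
        assert depth == 10           # constant; n = 0 is smaller, as noted
```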

Problem 5.

The compiler generates the following controller code:

For (+ x (* y z)):

((ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
 (SAVE FUN) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'X (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) '()))
 (SAVE ARGL) 
 (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV))) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Y (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) '())) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Z (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (ASSIGN CONTINUE AFTER-CALL0)
 (SAVE CONTINUE)
 (GOTO APPLY-DISPATCH) 
 AFTER-CALL0
 (RESTORE ARGL)
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (RESTORE FUN)
 (GOTO APPLY-DISPATCH))

For (+ (* y z) x):

((ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
 (SAVE FUN)
 (SAVE ENV) 
 (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV))) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Y (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) '())) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'Z (FETCH ENV))) 
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
 (ASSIGN CONTINUE AFTER-CALL1)
 (SAVE CONTINUE)
 (GOTO APPLY-DISPATCH) 
 AFTER-CALL1
 (ASSIGN ARGL (CONS (FETCH VAL) '()))
 (RESTORE ENV) 
 (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'X (FETCH ENV)))
 (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
 (RESTORE FUN) 
 (GOTO APPLY-DISPATCH))

It turns out that both expressions compile to 18 lines of code (one label +
17 control operations).  The principal difference is that in the first case,
the register argl must be saved and restored, because the value of x is already
computed before the inner multiply is done, whereas in the second case, argl
is not saved, but the env register must be, because in case the inner
expression (first argument to the +) does some arbitrary computation that
changes the env register, it must be restored to allow evaluation of x.
The env register is saved here for the same reasons that the env and unev
registers were saved in problem 3, but not problem 2.

In comparing this difference with the one between rexpt and rexpt1 of
problems 2 and 3, note that the compiler does much more optimization than
the evaluator did, but not as much as it could.  In particular, notice that
in the case of (+ x (* y z)) the compiler noticed that it does not have to
save and restore env around the evaluation of x, because it can tell that
that evaluation will not alter env.  In the other case, it saves env, because
the evaluation of the first argument might change env.  In fact, because that
argument is just the application of a primitive arithmetic operator, we know
that it cannot change env, and if the compiler also knew this, it could
generate better code.
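
The analysis the compiler does perform can be sketched as a predicate; this is
my own simplified version, not the course compiler's code.  Representing an
expression as a Python string (variable) or list (application), an operand
that is merely a constant or a variable reference cannot disturb env, so no
save/restore is needed around it:

```python
def preserves_env(expr):
    """Conservative test: constants and variable references leave env alone;
    any application (a list here) is assumed capable of changing env."""
    return not isinstance(expr, list)

# In (+ x (* y z)) the first operand, x, needs no env save ...
assert preserves_env('x')
# ... while in (+ (* y z) x) the first operand is an application, so env
# must be saved -- even though this particular application is harmless.
assert not preserves_env(['*', 'y', 'z'])
```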

Problem 6.

A summary of experimental runs of the three functions follows:

	crexpt		crexpt1		ciexpt
n	push	max	push	max	push	max
0	10	5	10	5	10	5
1	18	6	18	6	17	5
2	26	9	26	9	24	5
3	34	12	34	12	31	5
4	42	15	42	15	38	5

The formulas for crexpt and crexpt1 are identical, as we might have expected
after considering the results of problem 5:  push(crexpt) = 10 + (8 * n), and,
for n >= 1, max(crexpt) = 3 + (3 * n).  (The base case n = 0 has depth 5,
echoing the n = 0 caution from problem 4.)  The iterative version has a
constant stack depth, and its number of pushes is 10 + (7 * n).
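
Checking the formulas against the measurements, in a Python sketch with names
of my own choosing (note that the crexpt depth formula fits only for n >= 1;
n = 0 is the base case):

```python
# Rows (n, crexpt-pushes, crexpt-depth, ciexpt-pushes, ciexpt-depth)
# copied from the table above; the crexpt1 columns match crexpt exactly.
data = [(0, 10, 5, 10, 5), (1, 18, 6, 17, 5), (2, 26, 9, 24, 5),
        (3, 34, 12, 31, 5), (4, 42, 15, 38, 5)]
for n, rp, rd, ip, idepth in data:
    assert rp == 10 + 8 * n          # crexpt / crexpt1 pushes
    assert ip == 10 + 7 * n          # ciexpt pushes
    assert idepth == 5               # ciexpt: constant depth
    if n >= 1:
        assert rd == 3 + 3 * n       # crexpt depth (n = 0 is the base case)
```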

Problem 7.

Comparing the ratio of run-time and space between the compiled and interpreted
versions of the three functions we have considered (for large n):

		speed-up	space-saving
rexpt		8/35 (0.23)	3/3  (1.0)
rexpt1		8/35 (0.23)	3/5  (0.6)
iexpt		7/35 (0.2)	5/10 (0.5)
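
These ratios are just the limits, for large n, of the push-count formulas
from problems 2 and 6; a quick Python sketch (function name mine):

```python
def rexpt_push_ratio(n):
    """Compiled pushes over interpreted pushes for rexpt at a given n."""
    return (10 + 8 * n) / (19 + 35 * n)

# The constants wash out: the ratio tends to 8/35, about 0.23.
assert abs(rexpt_push_ratio(10**6) - 8 / 35) < 1e-4
```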

For rexpt, we can also compare performance of both the interpreted and compiled
versions with the hand-coded version of problem 1.  (I am making the comparison
with the best hand-coded version, which eliminates the save and restore of n,
as described above in the solution to problem 1.)

		speed-up	space-saving
		  over		over
		interpreted	interpreted
compiled	0.23		1.0
hand-crafted	0.03		0.33

The speed-up given by the hand-crafted code (a factor of 35) is quite
impressive.  Our simple compiler produces code that is slower by a factor of 8
and less space-efficient by a factor of 3 than the best hand-produced code.
Here is a listing of the code for rexpt produced by our compiler:
 (...
  ENTRY20 
  (ASSIGN ENV (ENV-OF-COMPILED-PROCEDURE (FETCH FUN))) 
  (ASSIGN ENV (EXTEND-ENVIRONMENT '(N B) (FETCH ARGL) (FETCH ENV))) 
  (SAVE ENV)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '= (FETCH ENV))) 
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'N (FETCH ENV))) 
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (ASSIGN VAL '0) 
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
  (ASSIGN CONTINUE AFTER-CALL25)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH) 
  AFTER-CALL25
  (RESTORE ENV)
  (BRANCH (FETCH VAL) TRUE-BRANCH22) 
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '* (FETCH ENV)))
  (SAVE FUN) 
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'B (FETCH ENV))) 
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (SAVE ARGL) 
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'EXPT (FETCH ENV)))
  (SAVE FUN) 
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'B (FETCH ENV))) 
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (SAVE ARGL) 
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '- (FETCH ENV))) 
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'N (FETCH ENV))) 
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (ASSIGN VAL '1) 
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
  (ASSIGN CONTINUE AFTER-CALL24)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH) 
  AFTER-CALL24
  (RESTORE ARGL) 
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL))) 
  (RESTORE FUN)
  (ASSIGN CONTINUE AFTER-CALL23)
  (SAVE CONTINUE) 
  (GOTO APPLY-DISPATCH)
  AFTER-CALL23
  (RESTORE ARGL)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (RESTORE FUN) 
  (GOTO APPLY-DISPATCH)
  TRUE-BRANCH22
  (ASSIGN VAL '1)
  (RESTORE CONTINUE) 
  (GOTO (FETCH CONTINUE))
  )

Two possible major optimizations that one could add to come closer to the
hand-crafted code's performance are:

1.  Open-coding of primitive functions.  To do a multiply on our machine, we
need only use a single instruction like
(assign val (* (fetch b) (fetch val)))
as we used in problem 1.  The compiler, however, must instead set up an
argument list in the argl register, the function (primitive-*, in this case)
in the fun register, and then call apply-dispatch.  This is much slower, of
course.  Note too that the compiled code looks up * in the environment, whereas
the hand-coded routine "knows" that its value is the multiply primitive.  One
danger of such knowledge, of course, is that if we were to redefine * to mean
addition, the hand-coded (or open-code optimized) code would never notice.
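
The contrast, and the danger, can both be sketched in Python.  This is a
hypothetical analogy of my own (the names general_apply and open_coded are not
from the compiler), with the environment modeled as a dictionary:

```python
# What the compiled code does for (* b val) versus what an open-coding
# compiler could emit instead.

def general_apply(env):
    """Compiled pattern: look up *, build an argument list, dispatch."""
    fun = env['*']                    # lookup-variable-value
    argl = [env['b'], env['val']]     # two conses into argl
    return fun(*argl)                 # goto apply-dispatch

def open_coded(env):
    """Open-coded pattern: one multiply instruction, no lookup, no argl."""
    return env['b'] * env['val']

env = {'*': lambda x, y: x * y, 'b': 3, 'val': 9}
assert general_apply(env) == open_coded(env) == 27

env['*'] = lambda x, y: x + y         # redefine * to mean addition
assert general_apply(env) == 12       # the general code notices ...
assert open_coded(env) == 27          # ... the open-coded version does not
```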

2.  Using registers to hold important values.  In the hand-coded machine, we
did not need to manipulate the environment or to look up values in it, because
the values of b and n, our two variables, were in registers.  In many actual
computers, the hardware provides some small number (usually between 4 and 32)
of extra registers, not needed by the operation of the machine itself, that can
be used by programmers or compilers to hold the most important and often-used
values in a computation.  If we imagined extending the explicit-control
evaluator of Scheme by adding a set of extra registers, say r0, r1, ... r7,
then the compiler could use, say, r0 to hold n and r1 to hold b, and avoid
having to look up the values of these variables in the environment except when
the function was first called.  This optimization is especially important when
combined with the open-coding of primitives, as suggested above.

In addition, optimizations based on knowing special properties of certain
functions, the propagation of constant values at compile time, moving of
code segments to reduce the number of needed goto's, and many other possible
optimizations can be applied.  Some state-of-the-art compilers produce code
that is closely comparable with the best efforts of human coders, although
of course such compilers are much larger and more complex than what we have
presented.
