% Solutions for Problem Set 10 6.001 Fall 1988.
% Prepared by Jacob Katzenelson, formatted by Nikhil, December 12, 1988

\documentstyle[11pt,twoside]{article}

% HORIZONTAL MARGINS
% Left margin 1 inch (0 + 1)
\setlength{\oddsidemargin}{0in}
% Right margin 1 inch (0 + 1)
\setlength{\evensidemargin}{0in}
% Text width 6.5 inch (so right margin 1 inch).
\setlength{\textwidth}{6.5in}

% VERTICAL MARGINS
% Top margin 0.5 inch (-0.5 + 1)
\setlength{\topmargin}{-0.5in}
% Head height 0.25 inch (where page headers go)
\setlength{\headheight}{0.25in}
% Head separation 0.25 inch (between header and top line of text)
\setlength{\headsep}{0.25in}
% Text height 9 inch (so bottom margin 1 in)
\setlength{\textheight}{9in}

% PARAGRAPH INDENTATION
\setlength{\parindent}{0in}
% SPACE BETWEEN PARAGRAPHS
\setlength{\parskip}{\medskipamount}

% SHORT FORMS FOR ITALICIZED ``i.e.'', ``e.g.'', ``etc.''
\newcommand{\ie}{{\em i.e.,\/}}
\newcommand{\eg}{{\em e.g.,\/}}
\newcommand{\etc}{{\em etc.\/}}

% HORIZONTAL STRUT.  One argument (width).
\newcommand{\hstrut}[1]{\hspace*{#1}}
% VERTICAL STRUT. Two arguments (offset from baseline, height).
\newcommand{\vstrut}[2]{\rule[#1]{0in}{#2}}

% EMPTY BOXES OF VARIOUS WIDTHS, FOR INDENTATION
\newcommand{\hmm}{\hspace*{2em}}
\newcommand{\hmmmm}{\hspace*{4em}}

% VARIOUS CONVENIENT WIDTHS RELATIVE TO THE TEXT WIDTH, FOR BOXES.
\newlength{\hlessmm}
\setlength{\hlessmm}{\textwidth}
\addtolength{\hlessmm}{-2em}

\newlength{\hlessmmmm}
\setlength{\hlessmmmm}{\textwidth}
\addtolength{\hlessmmmm}{-4em}

% ----------------------------------------------------------------
% CODE FONT (e.g. {\cf x := 0}).
\newcommand{\cf}{\footnotesize\tt}
% ----------------------------------------------------------------
% LISP CODE DISPLAYS.
% Lisp code displays are enclosed between \beginlisp and \endlisp.
% Most characters are taken verbatim, in typewriter font,
% Except:
%  Commands are still available (beginning with \)
%  Math mode is still available (beginning with $)

\outer\def\beginlisp{%
  \begin{list}{$\bullet$}{%
    \setlength{\topsep}{0in}
    \setlength{\partopsep}{0in}
    \setlength{\itemsep}{0in}
    \setlength{\parsep}{0in}
    \setlength{\leftmargin}{1.5em}
    \setlength{\rightmargin}{0in}
    \setlength{\itemindent}{0in}
  }\item[]
  \obeyspaces
  \obeylines \footnotesize\tt}

\outer\def\endlisp{%
  \end{list}
  }

{\obeyspaces\gdef {\ }}

% ----------------------------------------------------------------

\markboth{PS10 Solutions}{PS10 Solutions}
\pagestyle{myheadings}

\begin{document}

\begin{center}
{\Large\bf MASSACHUSETTS INSTITUTE OF TECHNOLOGY} \\
{\large\bf Department of Electrical Engineering and Computer Science} \\
{\bf 6.001: Structure and Interpretation of Computer Programs} \\
{\bf Fall Semester, 1988}

{\bf Solutions for Problem Set 10} \\

\end{center}

% ----------------------------------------------------------------
\mbox{}\hrulefill\mbox{}

{\large\bf Part I --- The Explicit Control Evaluator}

{\bf Problem 1 -- Running Interpreted Code}

{\tt sum-rec}:

\begin{tabular}[t]{|l|l|r|r|r|}
\hline
$n$    & \tt  lst & opers & pushes & max depth \\
\hline
\hline
 0 & \tt ()                  &   134    &     13   &     6    \\
\hline
 1 & \tt (5)                 &   398    &     44   &     9    \\
\hline
 2 & \tt (5 7)               &   662    &     75   &    12    \\
\hline
 3 & \tt (5 7 2)             &   926    &    106   &    15    \\
\hline
 4 & \tt (5 7 2 3)           &  1190    &    137   &    18    \\
\hline
\end{tabular}
\hfill
\begin{tabular}[t]{|lcl|}
\hline
Operations & = & $134 + 264 n $\\
Pushes     & = & $ 13 +  31 n $\\
Depth      & = & $  6 +   3 n $\\
\hline
\end{tabular}
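The linear fits can be checked mechanically against the measurements; a small
Python check, with each tuple $(n, \hbox{opers}, \hbox{pushes}, \hbox{depth})$
transcribed from the {\tt sum-rec} table above:

```python
# Each tuple is (n, operations, pushes, max depth), copied from the table.
measurements = [
    (0,  134,  13,  6),
    (1,  398,  44,  9),
    (2,  662,  75, 12),
    (3,  926, 106, 15),
    (4, 1190, 137, 18),
]

# Verify every row against the fitted linear formulas.
for n, opers, pushes, depth in measurements:
    assert opers  == 134 + 264 * n
    assert pushes ==  13 +  31 * n
    assert depth  ==   6 +   3 * n
```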

\bigskip

{\tt sum-iter}:

\begin{tabular}[t]{|l|l|r|r|r|}
\hline
$n$    & \tt  lst & opers & pushes & max depth \\
\hline
\hline
 0 & \tt ()                  &   237    &     26   &     6    \\
\hline
 1 & \tt (5)                 &   519    &     60   &     8    \\
\hline
 2 & \tt (5 7)               &   801    &     94   &     8    \\
\hline
 3 & \tt (5 7 2)             &   1083    &    128   &     8    \\
\hline
 4 & \tt (5 7 2 3)           &  1365    &    162   &     8    \\
\hline
\end{tabular}
\hfill
\begin{tabular}[t]{|lcl|}
\hline
Operations & = & $ 237 + 282 n $ \\
Pushes     & = & $  26 +  34 n $ \\
Depth      & = & $ 6 $ if $ n=0 $, $ 8 $ if $n > 0 $ \\
\hline
\end{tabular}

\bigskip

{\em Explanation\/}

The major difference arises in the usage of the stack.  The size of the stack
grows linearly with $n$ for {\tt sum-rec}, a linear recursive process.
Because of the tail-recursion optimization in the explicit-control evaluator,
{\tt sum-iter}, an iterative process, runs in constant space--- the stack
does not grow with $n$.  Since the recursive call to {\tt loop} is the last
activity in {\tt loop}, the evaluator does not have to save anything on the
stack before making the recursive call.
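For reference, the two processes can be sketched in Python (the Scheme
originals are not reproduced in this section; {\tt sum-rec} appears later, in
Problem 4).  Python does not perform the tail-call optimization discussed
above, so the iterative version is written as an explicit loop:

```python
def sum_rec(lst):
    # Linear recursion: the pending "+" happens after the recursive call
    # returns, so a stack frame must survive every call.
    if not lst:
        return 0
    return lst[0] + sum_rec(lst[1:])

def sum_iter(lst):
    # Iterative process: the recursive step would be the last action, so a
    # tail-call-optimizing evaluator needs no growing stack; here the same
    # state (running total, rest of list) is carried by a loop.
    total = 0
    while lst:
        total, lst = total + lst[0], lst[1:]
    return total

assert sum_rec([5, 7, 2, 3]) == sum_iter([5, 7, 2, 3]) == 17
```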

\bigskip

% ----------------------------------------------------------------

\mbox{}\hrulefill\mbox{}

{\large\bf Part II --- The Compiler}

{\bf Problem 2 -- Compiled Runs of {\tt sum-rec} and {\tt sum-iter}}

{\tt sum-rec} (compiled):

\begin{tabular}[t]{|l|l|r|r|r|}
\hline
$n$    & \tt  lst & opers & pushes & max depth \\
\hline
\hline
 0 & \tt ()                  &    73    &      7   &     3    \\
\hline
 1 & \tt (5)                 &   134    &     16   &     6    \\
\hline
 2 & \tt (5 7)               &   195    &     25   &     9    \\
\hline
 3 & \tt (5 7 2)             &   256    &     34   &    12    \\
\hline
 4 & \tt (5 7 2 3)            &   317    &     43   &    15    \\
\hline
\end{tabular}
\hfill
\begin{tabular}[t]{|lcl|}
\hline
Operations & = & $ 73 + 61 n $ \\
Pushes     & = & $  7 +  9 n $ \\
Depth      & = & $  3 +  3 n $ \\
\hline
\end{tabular}
 
\bigskip

{\tt sum-iter} (compiled):

\begin{tabular}[t]{|l|l|r|r|r|}
\hline
$n$    & \tt  lst & opers & pushes & max depth \\
\hline
\hline
 0 & \tt ()                  &    89    &      7   &     3    \\
\hline
 1 & \tt (5)                 &   154    &     17   &     4    \\
\hline
 2 & \tt (5 7)               &   219    &     27   &     4    \\
\hline
 3 & \tt (5 7 2)             &   284    &     37   &     4    \\
\hline
 4 & \tt (5 7 2 3)           &   349    &     47   &     4    \\
\hline
\end{tabular}
\hfill
\begin{tabular}[t]{|lcl|}
\hline
Operations & = & $ 89 + 65 n $ \\
Pushes     & = & $  7 +  10 n $ \\
Depth      & = & $ 3 $ if $ n = 0$ , $ 4 $ if $ n >  0 $ \\
\hline
\end{tabular}

\bigskip

% ----------------------------------------------------------------

\mbox{}\hrulefill\mbox{}

{\bf Problem 3 -- Comparing Compiled and Interpreted Traces}

In both the compiled and interpreted versions, stack space grows linearly
with $n$ for {\tt sum-rec} and stays at constant size for {\tt
sum-iter}.  The compiled versions use far fewer operations and pushes, and
much less stack space, than the interpreted versions.  Some reasons for this are:
\begin{itemize}
\item In interpreted code, we have to go repeatedly through the
general ``classification'' code in {\tt eval-dispatch} to figure out what
kind of expression we are evaluating.  In compiled code, that analysis has
already been done by the compiler, and the code is set up directly to do the
right thing.  For example, the compiled code for {\tt sum-rec} directly
starts by evaluating the {\tt cond} predicate, whereas the interpreted
version must first discover that there is a {\tt cond}, then pull out the
first predicate, then start evaluating it.

\item In interpreted code, when sequences of expressions are evaluated (\eg\
operand lists, bodies of procedures, clauses of conditionals, \etc), the
machine does not know the length of the sequence.  Thus, it must loop,
repeatedly testing for the end of the sequence.  The compiler, on the other
hand, knows the length of the sequence, and it can generate sequential code
that directly executes each component, with no tests or loops.

\item In interpreted code, constants like {\tt 1} and {\tt ()} must be
classified and evaluated like any other ({\em via\/} {\tt ev-self-eval}).  In
compiled code, such constants are wired directly into the code.

\item In interpreted code, looking up a variable involves searching the
environment frame by frame, variable by variable, for the required binding.
In compiled code, the compiler knows the exact location of the variable in
the environment--- how many frames away from the current frame, and how many
variables down that frame.  So, the compiled code does no searching--- it
goes directly to the required binding.

\item In interpreted code, many registers are saved and restored according to
general rules.  On the other hand, the compiler can figure out that for
particular expressions, certain registers are not disturbed, certain
registers are needed, \etc\   Thus, it can generate code that does not do
unnecessary saves and restores.

\end{itemize}
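The variable-lookup point can be made concrete with a sketch.  The frame
layout and names below are invented for illustration; the idea is that the
compiler replaces a search by name with a fixed (frame, offset) address
computed once, at compile time:

```python
# Interpreted lookup: search every frame, variable by variable, by name.
# "Compiled" lookup: go straight to a (frame, offset) address.
# (The frames and names here are made up for illustration.)
names  = [["lst"], ["sum-rec", "x"]]       # frame 0 is the innermost frame
values = [[(5, 7, 2, 3)], [None, 42]]      # parallel frames of values

def lookup_by_name(name):
    for frame_names, frame_values in zip(names, values):
        if name in frame_names:            # a linear search per frame
            return frame_values[frame_names.index(name)]
    raise NameError(name)

def lookup_by_address(frame, offset):
    return values[frame][offset]           # no searching at all

assert lookup_by_name("x") == lookup_by_address(1, 1) == 42
```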

% ----------------------------------------------------------------

\mbox{}\hrulefill\mbox{}

{\bf Problem 4 -- Understanding Compiled Code}

A file named ``rec'' was loaded and compiled.
The file contents and the call to the compiler
are:

\beginlisp
;;; The file "rec"
(define rec
'(define (sum-rec lst)
  (cond ((null? lst) 0)
	(else (+ (car lst) (sum-rec (cdr lst)))))))
\null
;;; Loading:
  1 ]=> (load "rec")
\null
;;; The call:
  2 ]=> (pp (compile rec))
\endlisp

The transcript of the call to the compiler
is shown below,  interspersed with explanatory text.

The compiled code  can be viewed as consisting of two parts:
\begin{itemize}
\item Code that generates a procedure object: the first two lines
of the code and the last five lines, from {\cf after-lambda0} to the end.
\item Code that implements the procedure body: starting from
{\cf entry1}.
\end{itemize}

The execution of the first part takes place when
the code is evaluated.
This evaluation is equivalent to the evaluation of
the definition {\cf (define (sum-rec lst) ...)}.
It starts at the first instruction and it does the following:
\begin{itemize}
\item Assumes that when executed, there will be
an environment in the {\cf env} register, and a continuation
at the top of the stack.

\item When executed, it will
\begin{itemize}
\item Insert a binding in that environment, associating the name {\cf
sum-rec} with a procedure object,

\item Leave the symbol {\cf sum-rec} in the {\cf val} register,

\item Pop the continuation and go to it.
\end{itemize}
\end{itemize}

At the entry point, the code constructs a procedure object corresponding to the {\cf
lambda} and puts it in {\cf val}; it then skips over the code for the procedure body and executes the
code that performs the definition.  The procedure object is a closure
containing the code pointer {\cf ENTRY1} and the current environment.

\beginlisp
((ASSIGN VAL (MAKE-COMPILED-PROCEDURE ENTRY1 (FETCH ENV)))
 (GOTO AFTER-LAMBDA0)
\endlisp

Whenever {\cf sum-rec} is called, we will enter at {\cf ENTRY1}.  At that
time, {\cf fun} will have the procedure object itself, {\cf argl} will have
the list of argument values (evaluated), and {\cf continue} will contain
the continuation.  In this case, {\cf argl} contains a list of one value,
which is the list to be summed.  The code pulls the environment out of the
procedure object and extends it with a binding for the symbol {\cf lst}.

\beginlisp
  ENTRY1
    (ASSIGN ENV (COMPILED-PROCEDURE-ENV (FETCH FUN)))
    (ASSIGN ENV (EXTEND-BINDING-ENVIRONMENT '(LST) (FETCH ARGL) (FETCH ENV)))
\endlisp

Now the actual procedure body begins. This is the compilation of \mbox{\cf
(null? lst)}, the predicate in the {\cf cond}.

\beginlisp
    (SAVE ENV)                 ; for expression in else clause
    (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'NULL? (FETCH ENV)))
    (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'LST (FETCH ENV)))
    (ASSIGN ARGL (CONS (FETCH VAL) '()))
    (ASSIGN CONTINUE AFTER-CALL3)
    (SAVE CONTINUE)
    (GOTO APPLY-DISPATCH)      ; to apply null?
\null
  AFTER-CALL3                 ; val now contains (null? lst)
    (RESTORE ENV)
    (BRANCH (TRUE? (FETCH VAL)) TRUE-BRANCH2)
\endlisp

If the predicate was false, we drop into the {\cf else} clause, which
directly begins evaluating \\
\mbox{\cf (+ (car lst) (sum-rec (cdr lst)))}, by saving
the {\cf +} procedure and partial argument list, and going to
evaluate \mbox{\cf (sum-rec (cdr lst))}:

\beginlisp
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE '+ (FETCH ENV)))
  (SAVE FUN)
  (SAVE ENV)
;; evaluate (car lst)
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'CAR (FETCH ENV)))
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'LST (FETCH ENV)))
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (ASSIGN CONTINUE AFTER-CALL4)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH) ;; to compute (car lst)
AFTER-CALL4           ;; val has (car lst)
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (RESTORE ENV)
  (SAVE ARGL)
;; prepare for evaluation of sum-rec
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'SUM-REC (FETCH ENV)))
  (SAVE FUN)
\null
  (ASSIGN FUN (LOOKUP-VARIABLE-VALUE 'CDR (FETCH ENV)))
  (ASSIGN VAL (LOOKUP-VARIABLE-VALUE 'LST (FETCH ENV)))
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (ASSIGN CONTINUE AFTER-CALL6)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH) ;;to compute (cdr lst)
AFTER-CALL6           ;; val has  (cdr lst)
  (ASSIGN ARGL (CONS (FETCH VAL) '()))
  (RESTORE FUN)         ;; sum-rec is restored
  (ASSIGN CONTINUE AFTER-CALL5)
  (SAVE CONTINUE)
  (GOTO APPLY-DISPATCH) ;; to compute (sum-rec (cdr lst))
AFTER-CALL5  ;; (sum-rec (cdr lst)) in val
  (RESTORE ARGL)
  (ASSIGN ARGL (CONS (FETCH VAL) (FETCH ARGL)))
  (RESTORE FUN)
  (GOTO APPLY-DISPATCH) ;; to compute (+ ...)
;;; This apply-dispatch returns to the continuation on top of stack;
;;; this is the continuation put on the stack before sum-rec was called!
\null
TRUE-BRANCH2 ;; consequent of the first cond clause
  (ASSIGN VAL '0)
  (RESTORE CONTINUE)
;;; this is the continuation put on the stack before sum-rec was called!
  (GOTO (FETCH CONTINUE))
\null
\null
AFTER-LAMBDA0
  (PERFORM (DEFINE-VARIABLE! 'SUM-REC (FETCH VAL) (FETCH ENV)))
  (ASSIGN VAL 'SUM-REC)
  (RESTORE CONTINUE)
  (GOTO (FETCH CONTINUE)))
\endlisp
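The procedure-object operations in the listing ({\cf MAKE-COMPILED-PROCEDURE},
{\cf COMPILED-PROCEDURE-ENV}) amount to pairing a code entry point with the
defining environment.  A minimal Python sketch of that representation (the
tuple encoding and the dictionary environment are assumptions for
illustration):

```python
# A compiled procedure object is a tag, a code entry point, and the
# environment that was current when the lambda was evaluated: a closure.
def make_compiled_procedure(entry, env):
    return ("compiled-procedure", entry, env)

def compiled_procedure_entry(proc):
    return proc[1]

def compiled_procedure_env(proc):
    return proc[2]

defining_env = {"sum-rec": None}          # stand-in for the real environment
proc = make_compiled_procedure("entry1", defining_env)

assert compiled_procedure_entry(proc) == "entry1"
assert compiled_procedure_env(proc) is defining_env
```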

\bigskip

% ----------------------------------------------------------------

\mbox{}\hrulefill\mbox{}

\newpage

{\large\bf Part III --- Parallel Scheme}

\bigskip

{\bf Problem III.1}

\beginlisp
1 ]=> (load-and-go SMP gen-tree-form)
...Value: GEN-TREE; sweep: 31
((VALUE . GEN-TREE) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\null
1 ]=> (load-and-go SMP 'gen-tree)
.Value:
 (PROCEDURE
     (LAMBDA (IF (= (LOOKUP 0 0) 0)
                 23
                 ((LAMBDA (CONS (GEN-TREE (LOOKUP 0 0)) (GEN-TREE (LOOKUP 0 0))))
                  (- (LOOKUP 0 0) 1))))
     <PROCEDURE-ENV>); sweep: 10
((VALUE PROCEDURE (LAMBDA (IF (= (LOOKUP 0 0) 0) 23 ((LAMBDA (CONS
 (GEN-TREE (LOOKUP 0 0)) (GEN-TREE (LOOKUP 0 0)))) (- (LOOKUP 0 0) 1)))) ())
 (SWEEPS . 13) (TOT-INSTRUCTIONS . 13) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

There are two differences:
\begin{itemize}

\item All references to formal parameters appear in ``index'' form, \ie\ direct
indexes into the environment.  Advantages:
\begin{itemize}

\item Looking up a lambda-bound variable is much faster.  Instead of a
sequential search through all intervening frames, the evaluator can go
directly to the binding.  This is useful even in a sequential evaluator.

\item Instead of first building an argument list and then building a frame
from the argument list, the argument list itself can be used as the frame.

\item The frame can be built before knowing the value of the procedure object
in the application, since we don't need to know the formal parameter names.

\end{itemize}

\item The {\cf let} was changed to a lambda application. Advantage: the
evaluator has one construct fewer to worry about.
\end{itemize}

\bigskip

{\bf Problem III.2}

Step 1:
\beginlisp
1 ]=> (load-and-go SMP '(gen-tree 2))
................................................................
..............................Value: ((FULL (FULL . 23) FULL . 23)
 FULL (FULL . 23) FULL . 23); sweep: 944
((VALUE (FULL (FULL . 23) FULL . 23) FULL (FULL . 23) FULL . 23)
 (SWEEPS . 947) (TOT-INSTRUCTIONS . 2562) (PROCESSORS . 37)
 (MAX-PARALLELISM . 10) (AVG-PARALLELISM . 2.70539))
\endlisp

Step 2:
\beginlisp
1 ]=> (show-value SMP)
((23 . 23) 23 . 23)
\endlisp

Step 3:
\beginlisp
1 ]=> (show-pp SMP 60)Parallelism profile : 
60 : *
120 : *
180 : *
240 : *
300 : **
360 : *
420 : ***
480 : ***
540 : ***
600 : ***
660 : ***
720 : ***
780 : ******
840 : ******
900 : ****
DONE
\endlisp

Step 4:
\beginlisp
1 ]=> (load-and-go NSMP '(gen-tree 2))
.........................Value: ((EMPTY) EMPTY); sweep: 253.....
...............................
((VALUE (FULL (FULL . 23) FULL . 23) FULL (FULL . 23) FULL . 23)
 (SWEEPS . 613) (TOT-INSTRUCTIONS . 2044) (PROCESSORS . 37)
 (MAX-PARALLELISM . 10) (AVG-PARALLELISM . 3.33442))
\endlisp

Step 5:
\beginlisp
1 ]=> (show-value NSMP)
((23 . 23) 23 . 23)
\endlisp

Step 6:
\beginlisp
1 ]=> (show-pp NSMP 40)Parallelism profile : 
40 : *
80 : *
120 : **
160 : *
200 : **
240 : ***
280 : ***
320 : **
360 : ***
400 : **
440 : *****
480 : *******
520 : *****
560 : ******
600 : ****
DONE
\endlisp

$\bullet$ The resulting values (steps 2 and 5) are the same.  This is not
surprising--- we are running the same program, just changing the evaluation
model.

$\bullet$ Shape of profiles: As we go down the tree, the parallelism grows
exponentially.  As we return up the tree, the parallelism shrinks again,
exponentially.

$\bullet$ {\cf SMP} (Step 1) returns its result at sweep 944, at the end.
{\cf NSMP} (Step 4) returns its result at sweep 253, very much before the end
(613 sweeps).  The reason is that {\cf SMP} does not allocate or return the
topmost cons cell until both sub-trees are entirely constructed, whereas {\cf
NSMP} allocates and returns the topmost cons cell as soon as possible, even
though recursive sub-processes are still busy filling out the lower parts of
the tree.  The trace also shows that when {\cf NSMP} returns its topmost cons
cell, its car and cdr slots are still empty.
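The difference can be imitated with threads.  In the sketch below (an analogy
for the two models, not the simulator's mechanism), the strict version
completes both sub-trees before consing, while the non-strict version returns
a pair of futures at once and lets worker threads fill in the sub-trees, like
the {\cf EMPTY} slots in the trace:

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def gen_tree_strict(n):
    # SMP-like: the cons happens only after both sub-trees are complete.
    if n == 0:
        return 23
    return (gen_tree_strict(n - 1), gen_tree_strict(n - 1))

def gen_tree_nonstrict(n):
    # NSMP-like: the pair exists immediately; its halves are futures that
    # sub-processes are still busy filling in.
    if n == 0:
        return 23
    return (pool.submit(gen_tree_nonstrict, n - 1),
            pool.submit(gen_tree_nonstrict, n - 1))

def force(tree):
    # Wait for any outstanding futures and rebuild the finished tree.
    if isinstance(tree, Future):
        tree = tree.result()
    if isinstance(tree, tuple):
        return (force(tree[0]), force(tree[1]))
    return tree

assert force(gen_tree_nonstrict(2)) == gen_tree_strict(2)
```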

\bigskip

{\bf Problem III.3}

In this problem, we fix the evaluation model ({\cf SMP}) and change the
algorithm, in order to demonstrate how the choice of algorithm affects the
available parallelism.

\beginlisp
]=> (load-and-go SMP count-atoms-r-form)
...Value: COUNT-ATOMS-R; sweep: 31
((VALUE . COUNT-ATOMS-R) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

\beginlisp
1 ]=> (load-and-go SMP count-atoms-i-form)
...Value: COUNT-ATOMS-I; sweep: 31
((VALUE . COUNT-ATOMS-I) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

\beginlisp
1 ]=> (load-and-go SMP count-atoms-i-loop-form)
...Value: COUNT-ATOMS-I-LOOP; sweep: 31
((VALUE . COUNT-ATOMS-I-LOOP) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

\beginlisp
1 ]=> (load-and-go NSMP '(define tree2 (gen-tree 2)))
...........................Value: TREE2; sweep: 270.............
......................
((VALUE . TREE2) (SWEEPS . 625) (TOT-INSTRUCTIONS . 2061) (PROCESSORS . 37)
 (MAX-PARALLELISM . 10) (AVG-PARALLELISM . 3.2976))
\endlisp

\beginlisp
1 ]=> (load-and-go SMP '(count-atoms-r tree2))
................................................................
...............................Value: 4; sweep: 956
((VALUE . 4) (SWEEPS . 959) (TOT-INSTRUCTIONS . 2852) (PROCESSORS . 34)
 (MAX-PARALLELISM . 8) (AVG-PARALLELISM . 2.97393))
\endlisp

\beginlisp
1 ]=> (show-pp smp 40)Parallelism profile : 
40 : *
80 : *
120 : *
160 : *
200 : *
240 : *
280 : *
320 : ***
360 : ***
400 : **
440 : **
480 : ***
520 : **
560 : ***
600 : ***
640 : ******
680 : *******
720 : ****
760 : *****
800 : *****
840 : *****
880 : *****
920 : **
DONE
\endlisp

\beginlisp
1 ]=>(load-and-go SMP '(count-atoms-i tree2))
................................................................
................................................................
................................................................
................................................................
.......................Value: 4; sweep: 2790
((VALUE . 4) (SWEEPS . 2793) (TOT-INSTRUCTIONS . 4713) (PROCESSORS . 67)
 (MAX-PARALLELISM . 6) (AVG-PARALLELISM . 1.68743))
\endlisp

\beginlisp
1 ]=> (show-pp smp 150)Parallelism profile : 
150 : *
300 : *
450 : ***
600 : *
750 : **
900 : **
1050 : *
1200 : **
1350 : **
1500 : *
1650 : ***
1800 : *
1950 : **
2100 : *
2250 : *
2400 : ***
2550 : *
2700 : *
DONE
\endlisp

$\bullet$
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
                & {\cf count-atoms-r} & {\cf count-atoms-i} \\
\hline
Sweeps          & 959                 & 2793 \\
\hline
Max parallelism & 8                   & 6    \\
\hline
\end{tabular}
\end{center}

$\bullet$ The difference in the sweep count has to do with the difference in
the algorithm.   The recursive form is a ``divide-and-conquer'' algorithm.
It spawns two sub-processes that independently count the atoms in the
sub-trees and then adds them up, so that the ``{\cf +}'' operations in the
algorithm form the same tree structure as the input tree.  Thus, the program can
complete in $O(\log n)$ time (i.e., proportional to the depth of the tree),
where $n$ is the number of nodes in the tree.

In the iterative version, the ``{\cf +}'' operations are strung out
sequentially;  thus, the algorithm takes $O(n)$ time.

The parallelism profiles reflect the fact that in the recursive version, the
parallelism grows exponentially, whereas in the iterative version, there is
essentially no parallelism.
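The two algorithms can be sketched in Python (assumed shapes; the Scheme
sources are not reproduced here).  Trees are pairs; anything else counts as an
atom:

```python
def count_atoms_r(tree):
    # Divide and conquer: the two sub-counts are independent, so they can
    # proceed in parallel; the +'s mirror the shape of the tree.
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return count_atoms_r(left) + count_atoms_r(right)

def count_atoms_i(tree, count=0):
    # Iterative: one running count is threaded through every node, so each
    # + depends on the previous one and nothing can overlap.
    if not isinstance(tree, tuple):
        return count + 1
    left, right = tree
    return count_atoms_i(right, count_atoms_i(left, count))

tree2 = ((23, 23), (23, 23))          # the value of (gen-tree 2)
assert count_atoms_r(tree2) == count_atoms_i(tree2) == 4
```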

\bigskip

{\bf Problem III.4}

In this problem, we fix the algorithm and change the evaluation model, in
order to demonstrate how the choice of evaluation model affects the available
parallelism.

\beginlisp
1 ]=> (load-and-go smp fringe-form)
...Value: FRINGE; sweep: 31
((VALUE . FRINGE) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

\beginlisp
1 ]=> (load-and-go smp fringe-aux-form)
...Value: FRINGE-AUX; sweep: 31
((VALUE . FRINGE-AUX) (SWEEPS . 34) (TOT-INSTRUCTIONS . 34) (PROCESSORS . 1)
 (MAX-PARALLELISM . 1) (AVG-PARALLELISM . 1))
\endlisp

\beginlisp
1 ]=> (load-and-go smp '(fringe tree2))
................................................................
................................................................
................................................................
........................Value: ((FULL . 23) FULL (FULL . 23) FULL
 (FULL . 23) FULL (FULL . 23) FULL); sweep: 2164
((VALUE (FULL . 23) FULL (FULL . 23) FULL (FULL . 23) FULL (FULL . 23) FULL)
 (SWEEPS . 2167) (TOT-INSTRUCTIONS . 3269) (PROCESSORS . 44)
 (MAX-PARALLELISM . 4) (AVG-PARALLELISM . 1.50854))
\endlisp

\beginlisp
1 ]=> (show-value smp)
(23 23 23 23)
\endlisp

\beginlisp
1 ]=> (show-pp smp 80)Parallelism profile : 
80 : *
160 : *
240 : *
320 : *
400 : ***
480 : **
560 : *
640 : *
720 : **
800 : **
880 : *
960 : *
1040 : **
1120 : *
1200 : *
1280 : **
1360 : *
1440 : *
1520 : *
1600 : ***
1680 : *
1760 : *
1840 : *
1920 : **
2000 : *
2080 : *
2160 : **
DONE
\endlisp

\beginlisp
1 ]=> (load-and-go nsmp '(fringe tree2))
................................................................
....Value: ((FULL . 23) EMPTY); sweep: 689.........
((VALUE (FULL . 23) FULL (FULL . 23) FULL (FULL . 23) FULL (FULL . 23) FULL)
 (SWEEPS . 775) (TOT-INSTRUCTIONS . 2592) (PROCESSORS . 44)
 (MAX-PARALLELISM . 10) (AVG-PARALLELISM . 3.34452))
\endlisp

\beginlisp
1 ]=> (show-value nsmp)
(23 23 23 23)
\endlisp

\beginlisp
1 ]=> (show-pp nsmp 40)Parallelism profile : 
40 : *
80 : *
120 : **
160 : *
200 : *
240 : *
280 : *
320 : ****
360 : ****
400 : **
440 : **
480 : ***
520 : ******
560 : ********
600 : ******
640 : *****
680 : ******
720 : *****
760 : ***
DONE
\endlisp

$\bullet$ \begin{center}
\begin{tabular}{|l|l|l|}
\hline
                          & {\cf SMP} & {\cf NSMP} \\
\hline
Result returned at sweep  & 2164      & 689 \\
\hline
Sweeps                    & 2167      & 775 \\
\hline
Max parallelism           & 4         & 10  \\
\hline
Processors                & 44        & 44  \\
\hline
\end{tabular}
\end{center}

$\bullet$  The key to the difference lies in the fragment:
\beginlisp
...
        ...
                (fringe-aux$_l$ (car tree)
                            (fringe-aux$_r$ (cdr tree) lst)))))))
\endlisp
In the strict multiprocessor, the {\cf fringe-aux$_r$} call must finish,
returning a value, {\em before\/} it even begins {\cf fringe-aux$_l$}.  Thus,
the traversal of the tree is essentially sequential, from right to left.
There is essentially no parallelism,  and {\cf SMP} will take $O(n)$ time,
where $n$ is the number of nodes in the tree.

In the non-strict multiprocessor, the {\cf fringe-aux$_l$} call can be
started right away, even before the {\cf fringe-aux$_r$} call has returned a
value, i.e., the two calls to {\cf fringe-aux} can proceed concurrently.
Thus, the parallelism grows exponentially, and {\cf NSMP} can finish in
$O(\log n)$ time, where $n$ is the number of nodes in the tree.

This accounts for the shape of the parallelism profiles, and the dramatically
fewer sweeps for {\cf NSMP}.

$\bullet$ In both the non-strict and strict multiprocessors,
new processors are spawned in exactly the same situation.  Whenever a
processor encounters an application:
\beginlisp
(e$_1$ ... e$_n$)
\endlisp
it spawns exactly $n-1$ processors to evaluate {\cf e$_2$} through {\cf
e$_n$}, and itself goes on to evaluate {\cf e$_1$}.  Thus, the total
processor count in both models is the same, for a given program.
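The spawning rule can be tallied for a given expression tree.  The nested
tuple encoding below is an illustration, not the simulator's representation:

```python
# Each application (e1 ... en) spawns n-1 processors for e2..en while the
# current processor evaluates e1; constants and variables spawn none.
def processors_spawned(expr):
    if not isinstance(expr, tuple):
        return 0
    return (len(expr) - 1) + sum(processors_spawned(sub) for sub in expr)

# (cons (gen-tree x) (gen-tree x)) as nested tuples:
app = ("cons", ("gen-tree", "x"), ("gen-tree", "x"))
assert processors_spawned(app) == 4   # 2 at the top, 1 inside each operand
```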

\end{document}
