From Programming Parallel Algorithms.
Communications of the ACM, 39(3), March, 1996.


next up previous
Next: Examples of Parallel Algorithms Up: Nested Data-Parallelism and NESL Previous: NESL

The Performance Model 

We now return to the issue of performance models, this time in the context of NESL. As mentioned earlier, NESL defines work and depth in terms of the work and depth of the primitive operations and rules for composing the measures across expressions. We will use tex2html_wrap_inline1697 and tex2html_wrap_inline1699 to refer to the work and depth of evaluating an expression e. In most cases the work and depth of an expression are the sums of the work and depth of the subexpressions. So, for example, if we have an expression tex2html_wrap_inline1703 where tex2html_wrap_inline1705 and tex2html_wrap_inline1707 are subexpressions then the work of the expression is

displaymath1693

where the 1 is the cost of the add. A similar rule is used for depth. The interesting rules concerning parallelism are the rules for an apply-to-each expression:

   eqnarray355

The first rule specifies that the work is the sum of the work of each of the applications of tex2html_wrap_inline1711 to tex2html_wrap_inline1713 , plus the work of tex2html_wrap_inline1715 , plus 1 to account for overheads. The rule for depth is similar, but takes the maximum of the depth of each application of tex2html_wrap_inline1717 . This supports our intuition that the applications are executed in parallel and the evaluation of the apply-to-each completes when the last call completes. The other interesting rules are the rules for an if expression, which for work is

  eqnarray364

with a similar rule for depth. The work and depth for a function call and for scalar primitives are 1 each. The costs of the NESL functions on sequences are summarized in Figure 6. We note that the performance rules can be more precisely defined using an operational semantics [7].

   figure379
Figure 6: List of some of the sequence functions supplied by NESL. The work required by each function is given in the Work column: L(v) refers to the length of the sequence v. The work of the write(d,a) function actually depends on whether the argument d needs to be copied or not, but in the examples in this paper the difference has no effect.

   figure445
Figure 7: Calculating the work and depth of {factorial(n) : n in [3,1,5,2]}.

As an example of composing work and depth, consider evaluating the expression

  {factorial(n) : n in a}

where a = [3,1,5,2]. Using the rules for work and the code for factorial given earlier, we can write the following equation for work:

eqnarray468

where tex2html_wrap_inline1721 , tex2html_wrap_inline1723 , and tex2html_wrap_inline1725 are the work for ==, *, and -, and are all 1. The two unit constants come from the cost of the function call and the if-then-else rule. Adding up the terms and solving the recurrence gives tex2html_wrap_inline1727 . Since there is no parallelism in the factorial function, the depth is the same as the work. To calculate work and depth for the full expression {factorial(n) : n in a} we can use equations 1 and 2. This calculation is shown in Figure 7.



next up previous
Next: Examples of Parallel Algorithms Up: Nested Data-Parallelism and NESL Previous: NESL



Guy Blelloch, blelloch@cs.cmu.edu