NESL

From Programming Parallel Algorithms.
Communications of the ACM, 39(3), March, 1996.

Next: The Performance Model Up: Nested Data-Parallelism and NESL Previous: Nested Data-Parallelism and NESL

NESL

This article uses NESL [5] as an example of a nested data-parallel language. This section gives an overview of the language and Section 3 gives several examples of parallel algorithms described and analyzed with NESL. NESL was designed to express nested parallelism in a simple way with a minimum set of structures and was therefore designed as a language on its own rather than as an extension of an existing sequential language. The ideas, however, can clearly be used in other languages. NESL is loosely based on ML [19], a language with a powerful type system, and on SETL [22], a language designed for concisely expressing sequential algorithms. As with ML, NESL is mostly functional (has only limited forms of side effects), but this feature is tangential to the points made in this article.

NESL supports data-parallelism by means of operations on sequences--one dimensional arrays. All elements of a sequence must be of the same type, and sequence indices are zero based (a[0] extracts the first element of the sequence a). The main data-parallel construct is apply-to-each, which uses a set-like notation. For example, the expression

  {a * a : a in [3, -4, -9, 5]};

squares each elements of the sequence [3, -4, -9, 5] returning the sequence [9, 16, 81, 25]. This can be read: ``in parallel, for each a in the sequence , square a''. The apply-to-each can be used over multiple sequences. The expression

  {a + b : a in [3, -4, -9, 5]; b in [1, 2, 3, 4]};

adds the two sequences elementwise returning [4, -2, -6, 9]. The apply-to-each construct also provides the ability to subselect elements of a sequence based on a filter. For example

  {a * a : a in [3, -4, -9, 5] | a > 0};

can be read: ``in parallel, for each a in the sequence such that a is greater than 0, square a''. It returns the sequence [9, 25]. The elements that remain maintain their relative order. This filtering was used in the Quicksort example.

Any function, whether primitive or user defined, may be applied to each element of a sequence. So, for example, we could define

  function factorial(n) =
    if (n == 1) then 1
    else n*factorial(n-1);

and then apply it over the elements of a sequence as in

  {factorial(i) : i in [3, 1, 7]};

which returns the sequence [6, 1, 5040].

In addition to the parallelism supplied by apply-to-each, NESL provides a set of functions on sequences, each of which can be implemented in parallel. For example the function sum adds the elements of a sequence, and the function reverse reverses the elements of a sequence. Perhaps the most important function on sequences is write, which supplies the only mechanism to modify multiple values of a sequence in parallel. write takes two arguments: the first is the sequence to modify and the second is a sequence of integer-value pairs that specify what to modify. For each pair (i,v) the value v is inserted into position i of the destination sequence. For example

  write([0, 0, 0, 0, 0, 0, 0, 0],
        [(4,-2),(2,5),(5,9)]);

inserts the -2, 5 and 9 into the sequence at locations 4, 2 and 5, respectively, returning

  [0, 0, 5, 0, -2, 9, 0, 0] .

If an index is repeated, then one value is written non-deterministically. For readers familiar with the variants of the PRAM model, we note that the write function is analogous to an ``arbitrary'' concurrent write. NESL also includes a function e_write that does not allow repeated indices, and is analogous to an exclusive write. If repeated indices are used with e_write, the current implementation reports an error.

Nested parallelism is supplied in NESL by allowing sequences to be nested and allowing parallel functions to be used in an apply-to-each. For example, we could apply the sum function in parallel over a nested sequence, as in

  {sum(a) : a in [[2,3], [8,3,9], [7]]};

which would return [5, 20, 7]. Here there is parallelism both within each sum and across the sums. The Quicksort algorithm showed another example of nested calls--the algorithm is itself used in an apply-to-each to invoke two recursive calls in parallel.

Next: The Performance Model Up: Nested Data-Parallelism and NESL Previous: Nested Data-Parallelism and NESL

Guy Blelloch, blelloch@cs.cmu.edu