Communications of the ACM, 39(3), March, 1996.

We conclude our examples with brief discussions of three other
algorithms: the fast Fourier transform (FFT), the scan operation (all
prefix sums), and an algorithm for finding the *k*th smallest
element of a set. All the code is shown in Figure 11.
These algorithms give more demonstration of the conciseness of
nested data parallel constructs.

**Figure 11:** Code for the fast Fourier
transforms, the scan operation, and for finding the smallest
element of a set.

We use the standard recursive version for the fast Fourier transform
(FFT) [11]. The second argument `w` is a sequence
of the same length as `a` containing all the complex *n*th roots
of unity. The FFT is called recursively on the odd and even elements
of `a`. The results are then combined using `cadd` and `
cmult` (complex addition and multiplication).
Assuming that `cadd` and `cmult` take constant work and depth,
then the recursion gives us the costs:

The plus-scan operation (also called all-prefix-sums) takes a sequence
of values and returns a sequence of equal length for which each
element is the sum of all previous elements in the original sequence.
For example, executing a plus-scan on the sequence `[3, 5, 3, 1, 6]` returns `[0, 3, 8, 11, 12]`.
This can be implemented as shown in Figure 11.
The algorithm works by elementwise adding the odd and even elements and
recursively solving the problem on these sums. The result of the
recursive call is then used to generate all the prefix sums.
The costs are:

The particular code shown only works on sequences that have a length equal to a power of two, but it is not hard to generalize it to work on sequences of any length.

A variation of Quicksort can be used to find the `k`th
smallest element of a sequence [11]. This algorithm
only calls itself recursively on the set of elements containing the
result. Here we consider a parallel version of this algorithm. After
selecting the `lesser` elements, if `#lesser` is greater than
`k`, then the `k`th smallest element must belong to that
set. In this case, the algorithm calls `kth_smallest`
recursively on `lesser` using the same `k`. Otherwise, the
algorithm selects the elements that are greater than the pivot, and
can similarly find if the `k`th element belongs in `
greater`. If it does belong in `greater`, the algorithm calls
itself recursively, but must now readjust `k` by subtracting the
number of elements less than or equal to the pivot. If the `
k`th element belongs in neither `lesser` nor `greater`,
then it must be the pivot, and the algorithm returns this value. For
sequences of length *n* the expected work of this algorithm is *O(n)*,
which is the same as the time of the serial version. The expected
depth is *O(log n)* since the expected depth of recursion is *O(log n)*.

Guy Blelloch, blelloch@cs.cmu.edu