In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures. June 1997.

**Abstract:**
Pipelining has been used in the design of many PRAM algorithms to reduce
their asymptotic running time. Paul, Vishkin and Wagener (PVW) used the
approach in a parallel implementation of 2-3 trees. The approach was later
used by Cole in the first *O(log n)* time sorting algorithm on the PRAM
not based on the AKS sorting network, and has since been used to improve
the time of several other algorithms. Although the approach has improved
the asymptotic time of many algorithms, there are two practical
problems: maintaining the pipeline is quite complicated for the
programmer, and it forces the code to be executed in a highly synchronous
manner, making it harder to schedule for locality or space efficiency.

In this paper we show how futures (a parallel language construct) can
be used to implement pipelining without requiring the user to code it
explicitly, allowing for much simpler code and more asynchronous
execution. A runtime system then manages the pipelining implicitly.
As with user-managed pipelining, we show how the technique reduces the
depth of many algorithms by a logarithmic factor over the nonpipelined
version. We describe and analyze four algorithms for which this is
the case: a parallel merging algorithm on trees, a parallel version of
insertion into and deletion from randomized balanced trees (treaps),
and insertion into a variant of the PVW 2-3 trees. To determine the
runtime of algorithms we first analyze algorithms in a language-based
cost model in terms of the work *w* and depth *d* of
computations, and then show universal bounds for implementing the
language on various machine models.

@inproceedings{BGMZ97, author = "Guy Blelloch and Margaret Reid-Miller", title = "Pipelining with Futures", booktitle = "Proceedings of the 9th Annual {ACM} Symposium on Parallel Algorithms and Architectures", pages = "249--259", address = "Newport, RI", year = 1997, month = jun}