List Ranking and List Scan

Background: Theory indicates that the use of list, tree, and graph data structures can significantly reduce the representation of large problems and, therefore, can improve asymptotic performance. Many Parallel Random Access Machine (PRAM) algorithms for such data structures have been developed. Our interest is to develop and implement PRAM algorithm primitives to show that they can be used as building blocks for more complex algorithms. For example, scan has shown to be a very powerful primitive that can be applied to arrays linearly ordered in consecutive locations. List ranking and list scan are similar operations applied to linked lists. Applications include finding the Euler tour of a tree, load balancing, breaking symmetry, parallel tree contraction, expression evaluation, graph 3-connectivity, and planar graph embedding. We implemented a new vector parallel algorithm on the Cray Y-MP/C90 at the Pittsburgh Supercomputing Center. It is, as far as we know, the fasted implementation to date. In addition, it is the first implementation of which we are aware that significantly out performs fast workstations. Eight processors on the C90 is more than 200 times faster than a high-end workstation, a Dec 3000/600 Alpha.

Definitions: List ranking finds the distance of each vertex from the head of a linked list. List scan (parallel prefix) is similar. It compute for each vertex the "sum" of the weights of prior vertices in the linked list, where "sum" is a binary associative operator.

Logo

Performance: We implemented and compared four parallel list ranking algorithms, and also compared them with a serial version. The serial algorithm is trivial. Simply traverse each link in the linked list counting the links. Wyllie's algorithm is the simplest parallel algorithm. But it is not work efficient, and so its performance degrades for longer linked lists. The Miller/Reif algorithm is the simplest randomized algorithm but requires load balancing. The Anderson/Miller algorithm is also randomized but does not require any load balancing. We did not spend much time optimizing the code for the above algorithms. But we expect their performance to be within a factor of 2 or 3 of optimized code. Finally, the Blelloch/Reid-Miller algorithm is the algorithm we developed. The figure below shows the results of the five algorithms on one processor of the Cray Y-MP/C90. The parallel algorithms are vectorized and scale almost linearly on multiple processors. On eight processors our algorithm can process one link of a linked list every 4.6 nsec.

Graph

Papers:Margaret Reid-Miller and Guy Blelloch wrote an article List Ranking and List Scan on the Cray C-90 that describes the five list ranking algorithms, the vector multiprocessor implementation and performance of the algorithms on the Cray C-90, and the performance analysis of their algorithm. Margaret Reid-Miller gave a presentation of the work at SPAA'94.

Up to the Irregular Algorithms page.


mrmiller@cs.cmu.edu Last updated 16 June 1998