function MxV(Mval, Midx, Mlen, Vect) = let v = Vect -> Midx; p = {a * b: a in Mval; b in v} in {sum(row) : row in nest(p, Mlen)};In this example,

**Sparse Matrix Multiplication on Vector Multiprocessors:**
The NAS
benchmark report includes timings of our Cray Y-MP/C90
implementation of the Conjugate Gradient benchmark. Our implementation
is currently the fastest reported for any machine. In our paper,
Segmented
Operations for Sparse Matrix Computation on Vector Multiprocessors
by Guy Blelloch, Mike Heroux (at Cray
Research), and Marco Zagha, we present a new technique for sparse matrix
multiplication on vector multiprocessors based on a fully vectorized and
parallelized *segmented sum* algorithm. Because of our method's
insensitivity to relative row size, it is better suited than the
Ellpack/Itpack or the Jagged Diagonal algorithms for matrices which have a
varying number of non-zero elements in each row. Furthermore, our approach
requires less preprocessing, less auxiliary storage, and uses a more
convenient data representation (an augmented form of the standard compressed
sparse row format). On a test suite of sparse matrices from the
Harwell-Boeing collection and industrial application codes, our Cray Y-MP C90
implementation performs up to 3 times faster than the Jagged Diagonal
algorithm and up to 5 times faster than Ellpack/Itpack method. Here is a
summary of our results for 5 test matrices on a 16-processor C90 comparing our
algorithm (SEG) to several other algorithms:

Our
sparse
matrix multiplication code for the Cray Y-MP family (written in Cray
assembly language) is publicly available.

**Related Work:**
Also see our Graph Separators page
for more information about implementing finite element computation on
distributed memory machines.

Up to the Irregular Algorithms page.

marcoz@cs.cmu.edu. Last updated 16 July 1994