Scans on Vector Multiprocessors: Guy Blelloch, Sid Chatterjee, and Marco Zagha wrote a paper for Supercomputing'90 entitled Scan Primitives for Vector Computers that describes an optimized implementation of a set of scan primitives for a CRAY Y-MP. The algorithm used to implement the scans is based on an algorithm for parallel computers and is applicable with minor modifications to any register-based vector computer. An important aspect of our implementation is that a set of segmented versions of these scans are only marginally more expensive than the unsegmented versions. These segmented versions can be used to execute a scan on multiple data sets without having to pay the vector startup cost (n_1/2) for each set. We have used the techniques described in the paper for implementing sorting and sparse matrix multiplication algorithms, and for implementing a C Vector Library (CVL) which is used in the current NESL implementation.
Linear Recurrences on Vector Multiprocessors: Guy Blelloch, Sid Chatterjee, and Marco Zagha wrote a paper for IPPS 92 entitled Solving Linear Recurrences with Loop Raking (A revised version will appear in JPDC). The paper presents a variation of the partition method for solving linear recurrences that is well-suited to vector multiprocessors. The algorithm reduces the number of memory accesses and temporary memory requirements as compared to the more commonly used version of the partition method. Our variation uses a general loop restructuring technique called loop raking. Our implementations of this technique on a single processor of the Cray C90 (for first- and second-order linear recurrences) are up to 7.3 times faster than the corresponding optimized library routines in SCILIB. On 4 processors, we gain an additional speedup of at least 3.7.
Related work: