A New Approach for Automatic Parallelization of Blocked Linear Algebra Computations

H. T. Kung and Jaspal Subhlok
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213

Abstract

This paper describes a new approach for automatic generation of efficient parallel programs from sequential blocked linear algebra programs. By exploiting recent progress in fine-grain parallel architectures such as iWarp, and in libraries based on matrix-matrix block operations such as LAPACK, the approach is expected to be effective in parallelizing a large class of linear algebra computations. An implementation of LAPACK on iWarp is under development. In the implementation, block routines are executed on the iWarp processor array using highly parallel systolic algorithms. Matrices are distributed over the array in a way that allows parallel block routines to be used wherever the original program calls a sequential block routine. This data distribution scheme significantly simplifies the process of parallelization, and as a result, efficient parallel versions of programs can be generated automatically. We discuss experiences and performance results from our preliminary implementation, and present the design of a fully automatic system.

Note: reprint from proceedings of Supercomputing '91, Albuquerque, NM, Nov 1991