         A New Approach for Automatic Parallelization
            of Blocked Linear Algebra Computations


                  H. T. Kung and Jaspal Subhlok
                   School of Computer Science
                   Carnegie Mellon University
                 Pittsburgh, Pennsylvania 15213


                           Abstract
                                                                     
   This paper describes a new approach for automatic generation
of efficient parallel programs from sequential blocked linear
algebra programs. By exploiting recent progress in fine-grain
parallel architectures such as iWarp, and in libraries based on
matrix-matrix block operations such as LAPACK, the approach is
expected to be effective in parallelizing a large class of linear
algebra computations. An implementation of LAPACK on iWarp is
under development. In the implementation, block routines are
executed on the iWarp processor array using highly parallel
systolic algorithms. Matrices are distributed over the array in a
way that allows parallel block routines to be used wherever the
original program calls a sequential block routine. This data
distribution scheme significantly simplifies the process of
parallelization, and as a result, efficient parallel versions of
programs can be generated automatically. We discuss experiences
and performance results from our preliminary implementation, and
present the design of a fully automatic system.
                                                                     

Note: Reprinted from the Proceedings of Supercomputing '91,
Albuquerque, NM, November 1991.