transposes(3)          C LIBRARY FUNCTIONS          transposes(3)

NAME
     transpose routines in tplib.a - fast transpose primitive for
     distributed 2D matrices on the T3D. Based on the CMU direct
     deposit communication model.

C SPECIFICATION
     int tp_transpose_init()

     int tp_transpose_complex(b,a,LN,NM)
     doublecomplex b[],a[];
     int LN,NM;

     int tp_transpose_double(b,a,LN,NM)
     double b[],a[];
     int LN,NM;

     int tp_lb(x)
     int x;

     int tp_transpose_mode(mode)
     int mode;

PARAMETERS
     a,b  Distributed two-dimensional N by N arrays to be
          transposed as b = Transpose(a).

     LN   Base-2 logarithm of the size N of the transpose area
          (i.e. N=2^LN).

     MN   Size of the declared array dimension as an integer
          (written NM in the specification above).

     x    Integer power of two whose base-2 logarithm is computed.

     mode Predefined constant for congestion control mode
          {tp_auto, tp_plain, tp_controlled, tp_random}.

DESCRIPTION
     tp_transpose_init()
          Generates optimal communication schedules for the
          particular partition and initializes the transpose
          routines.

     tp_transpose_complex()
          Transposes an N by N matrix of complex numbers, each
          represented as a pair of 64-bit doubles.

     tp_transpose_double()
          Transposes an N by N matrix of double-precision
          floating-point numbers.

     Both transposes move and transpose the values as
     b=Transpose(a); a and b must point to non-overlapping
     two-dimensional arrays, block distributed by row across all
     P processors of the partition.

     The actual size of the N by N matrix is given by the
     parameter LN, the base-2 logarithm of N. The size of the
     matrix can differ from the declared lower dimension of the
     arrays, which must be given separately as the integer MN. In
     most cases arrays are declared a[N/P][N], except when
     declared as a[N/P][MN] with MN rounded up to the next
     "cache line" prime for better cache performance.

     Transpose calls return 0 if successful, -1 if a parameter
     error was detected.

     tp_transpose_mode()
          Optional call to influence congestion control. Defaults
          to tp_auto.
          For large arrays and machines, tp_auto uses additional
          barriers to control congestion in the network. This can
          be overridden: tp_plain forces the optimal schedule
          without additional synchronization, tp_random forces a
          random schedule without synchronization, and
          tp_controlled forces the optimal schedule with barriers.

EXAMPLES
     fft2d.c: transposes are the heart of large single- and
     multi-dimensional FFTs on distributed memory machines. The
     transpose is required to restore locality for the second
     half of the algorithm, the row FFTs.

     tptest.c: a simple test program verifying the transpose.

SEE ALSO
     HTML page "transposes.html" in the "doc" directory with
     performance characterization of the routines and performance
     comparison to CRAFT and PVM based primitives. The page also
     contains references to our papers on the CMU deposit
     synchronization model (ICS95), the direct deposit memory
     performance model for chained data transfers (ISCA95) and
     the AAPC algorithm that was used (SPAA94).

     Credits to the staff of the Pittsburgh Supercomputing Center
     and in particular to John Kyle, who endured my endless
     questions about the T3D and my request to run and test on
     the full size machine.

BUGS
     Routines are limited to arrays with base type double or
     doublecomplex, machine partitions with a power-of-two number
     of processors, and 2D matrices with power-of-two sizes.
     More advanced algorithms for address generation are
     available and will be incorporated to cover all but very
     unusual cases. We would like to hear from users whether the
     power-of-two limitation is a problem and which other cases
     are encountered in their applications.

Sun Release 4.1