transposes(3) C LIBRARY FUNCTIONS transposes(3)
NAME
transpose routines in tplib.a - fast transpose primitives for
distributed 2D matrices on the T3D. Based on the CMU direct
deposit communication model.
C SPECIFICATION
int tp_transpose_init()
int tp_transpose_complex(b,a,LN,MN)
doublecomplex b[],a[];
int LN,MN;
int tp_transpose_double(b,a,LN,MN)
double b[],a[];
int LN,MN;
int tp_lb(x)
int x;
int tp_transpose_mode(mode)
int mode;
PARAMETERS
a, b Distributed two-dimensional N by N arrays, transposed
as b = Transpose(a).
LN Base-2 logarithm of the size N of the array or transpose
area (i.e. N = 2^LN).
MN Size of the declared array dimension, as an integer.
x Integer power of two whose base-2 logarithm is computed.
mode Predefined constant for the congestion control mode
{tp_auto, tp_plain, tp_controlled, tp_random}.
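The value that tp_lb() is described as computing can be sketched
as follows. This is a minimal illustration, not the library source:
the name lb_sketch is hypothetical, and returning -1 for an argument
that is not a power of two is an assumption of this sketch, since
the page does not say what tp_lb() does in that case.

```c
/* Hypothetical stand-in for tp_lb(): base-2 logarithm of a power of two.
 * Returning -1 for any other input is an assumption of this sketch. */
int lb_sketch(int x)
{
    int lb = 0;

    if (x <= 0 || (x & (x - 1)) != 0)
        return -1;              /* not a power of two */
    while (x > 1) {             /* count how often x can be halved */
        x >>= 1;
        lb++;
    }
    return lb;
}
```

With this helper, a caller holding N could obtain the LN argument
as lb_sketch(N).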
DESCRIPTION
tp_transpose_init() Generates optimal communication
schedules for the particular partition and initializes the
transpose routines.
tp_transpose_complex() Transposes an N by N matrix of complex
numbers, each represented as a pair of 64-bit doubles.
tp_transpose_double() Transposes an N by N matrix of double
precision floating-point numbers.
Both routines move and transpose the values as
b = Transpose(a); a and b must point to non-overlapping two
dimensional arrays, block distributed by row across all P
processors of the partition.
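A row-block distribution is usually taken to mean that each of the
P processors holds N/P consecutive rows. The sketch below shows
that mapping under this assumption; the helper names row_owner and
row_local are hypothetical and are not part of tplib.a.

```c
/* Sketch of the usual row-block distribution; not code from tplib.a.
 * Assumes P divides N, so every processor holds N/P consecutive rows. */

/* Processor that owns global row r of an N by N matrix. */
int row_owner(int r, int N, int P)
{
    return r / (N / P);
}

/* Index of global row r inside its owner's local block. */
int row_local(int r, int N, int P)
{
    return r % (N / P);
}
```

For N = 8 and P = 4, each processor holds two rows, so global row 5
is local row 1 on processor 2.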
Sun Release 4.1 Last change: 1
The actual size of the N by N matrix is given by the parameter
LN, the base-2 logarithm of N. The size of the matrix can
differ from the declared lower dimension of the arrays,
which must be given separately as the integer MN.
In most cases arrays are declared as a[N/P][N]; the exception
is a declaration a[N/P][MN], with MN rounded up to the next
"cache line" prime for better cache performance.
Transpose calls return 0 if successful, or -1 if a parameter
error was detected.
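The roles of LN and MN and the 0 / -1 return convention can be
illustrated with a serial reference for a single processor (P = 1).
This is only a sketch of the semantics described above, not the
distributed routine: ref_transpose_double is a hypothetical name,
and the particular parameter checks are assumptions.

```c
#include <stddef.h>

/* Serial reference for b = Transpose(a) on one processor (P = 1).
 * Hypothetical helper, not part of tplib.a. Each row is stored with a
 * declared length of MN elements, of which the first N = 2^LN are used. */
int ref_transpose_double(double b[], const double a[], int LN, int MN)
{
    int N, i, j;

    if (b == NULL || a == NULL || LN < 0 || LN > 30)
        return -1;              /* parameter error (assumed checks) */
    N = 1 << LN;
    if (MN < N)
        return -1;              /* declared dimension too small */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            b[(size_t)j * MN + i] = a[(size_t)i * MN + j];
    return 0;
}
```

Note that element (i,j) is addressed with the declared dimension MN
rather than N, so padding columns beyond N are left untouched.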
tp_transpose_mode() Optional call to influence congestion
control. The mode defaults to tp_auto.
For large arrays and machines, tp_auto uses additional
barriers to control congestion in the network. This can be
overridden: tp_plain forces the optimal schedule without
additional synchronization, tp_random forces a random schedule
without synchronization, and tp_controlled forces the optimal
schedule with barriers.
EXAMPLES
fft2d.c: transposes are the heart of large single and
multidimensional FFTs on distributed memory machines. The
transpose is required to restore locality for the second half
of the algorithm, the row FFTs.
tptest.c: a simple test-program verifying the transpose.
SEE ALSO
HTML page "transposes.html" in the "doc" directory, with a
performance characterization of the routines and a performance
comparison to CRAFT and PVM based primitives. The page also
contains references to our papers on the CMU deposit
synchronization model (ICS95), the direct deposit memory
performance model for chained data transfers (ISCA95), and the
AAPC algorithm that was used (SPAA94).
Credits to the staff of the Pittsburgh Supercomputing Center,
and in particular to John Kyle, who endured my endless
questions about the T3D and my requests to run and test on the
full size machine.
BUG
Routines are limited to arrays with base type double or
doublecomplex, machine partitions with a power-of-two number
of processors, and 2D matrices whose size is a power of two.
More advanced algorithms for address generation are
available and will be incorporated to cover all but very
unusual cases. We would like to hear from users whether the
power of two limitation is a problem and which other cases
are encountered in their applications.