\section{Run-time Support for Tasking}

\subsection{Motivation}

In a Net-Fx program, data parallel tasks running in submodules produce
and consume distributed data structures such as HPF arrays or DOME
load-balanced vectors.  Communicating distributed data structures
between submodules can involve quite complex {\em distribution
computation} performed by code generated by a compiler or built into a
run-time system.  Traditionally, this code has used low level
communication primitives supplied by an oblivious communication system
to actually transport data.  This works well when the network is
simple and highly regular, such as the network of a parallel
supercomputer.  However, the network that Net-Fx-orchestrated tasks
will use to communicate may have an arbitrary topology, links with
different speeds, and many different protocol stacks.  Further, the
network may have other, unrelated traffic.  A common operation may be
sending the data in a DOME vector, distributed over several
workstations, across a general-use HIPPI network to an HPF array
distributed over the nodes of an Intel Paragon.  This added complexity
forces a rethinking of how to achieve high performance communication
between submodules.

\subsection{Overview}

The Net-Fx run-time system is primarily responsible for transporting
distributed data structures between submodules.  This communication
must avoid synchronizing the source and destination submodules, but at
the same time, it must support changing data distribution and
submodule bindings.  The details of these bindings, of distribution
computation, and of actual data transport over a likely complex and
irregular network are hidden beneath simple, high-level, non-blocking
collective send and receive calls for whole distributed data
structures.

Within the run-time system itself, the tasks of distribution
computation, maintaining bindings, and communication are segregated
into subsystems with well-defined interfaces between them.  This makes
it possible to extend each subsystem independently.  Although we expect
that only a small set of binding semantics will have to be supported,
the number of distribution types needed will grow as Net-Fx is used to
coordinate non-Fx submodules.  Further, to achieve high performance,
it will be necessary to specialize the communication system for
different networks.  By clearly defining the interfaces between
subsystems, we make this specialization possible and easy.

The communication subsystem is the heart of the run-time system and
exports high-level send and receive calls to the Net-Fx program.  Well-defined
interfaces connect it to the binding subsystem and the
distribution help subsystem.  The binding subsystem maps abstract
submodule and data structure identifiers into their current physical
realization and distributions, respectively.  The distribution help
subsystem performs distribution computation to convert between the
source and destination distributions and distribution types.  Because
some distribution types may not be supported directly by the
distribution help subsystem, or because a faster support function
exists, the Net-Fx program can register support functions for use by
the distribution help subsystem.  Both the registration interface and
the calling convention for such support functions are standardized so
that they are not tied to any implementation of the run-time system.

\subsection{Bindings}

In order to support the abstraction of collective intermodule communication 
and of a submodule as a set of consecutive processes, the run-time system 
must be able to bind submodule numbers and process numbers to their current 
physical realization.  Similarly, in order to perform distribution 
computation, the run-time system must be able to bind a pair of distribution 
types to the appropriate distribution computation routine.  Both 
kinds of bindings can take one of three forms: simple, static, or dynamic.  
The form a binding takes is determined at compile time.  {\em 
Simple bindings} are canonical and represent a ``standard interface'' for 
legacy codes being used as Net-Fx tasks.  {\em Static bindings} are fully 
specified at compile time and do not change over the course of executing 
the Net-Fx program.  {\em Dynamic bindings} are likewise fully specified 
at compile time, but that initial specification may change during 
the course of execution, at the granularity of the task procedure call.

A Net-Fx program consists of some number of submodules, numbered $0
\dots N_s-1$.  Submodule $m$ is composed of some number of processes
numbered $0 \dots N_m-1$ which execute on the processors of a
processor group.  A processor group is a set of processors that share
common attributes.  For example, an Intel Paragon or a workstation
cluster may be a processor group.  The pair of functions
\begin{equation}
\mathit{submodid} \rightarrow \{\mathit{simple}, \mathit{static}, \mathit{dynamic}\}
\end{equation}
\begin{equation}
\mathit{submodid} \times \mathit{process} \rightarrow \mathit{procgrp} \times \mathit{processor} \times \mathit{localid}
\end{equation}
tell us, respectively, what form the submodule binding takes and how a 
given process of a submodule is currently bound to a processor group, 
processor, and local process id.  The first function 
is fixed at compile time, while the second is only initialized at compile 
time.  The communication subsystem uses the submodule bindings to decide 
how to communicate between the processes in the source and target 
submodules.  Further, the bindings are used to present the ``pool of 
consecutively numbered processes'' abstraction to the distribution help 
subsystem.
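
The two submodule binding functions above can be pictured as lookups in
a per-submodule table.  The following is only a minimal sketch under
stated assumptions: every type and function name in it is hypothetical,
and the rule that only dynamic bindings may be rebound is our reading of
the binding forms described earlier, not a documented property of the
run-time system.

```c
#include <stddef.h>

/* All names below are hypothetical; the text does not define them. */
typedef enum { BIND_SIMPLE, BIND_STATIC, BIND_DYNAMIC } BindForm;

typedef struct {
    unsigned procgrp;    /* processor group, e.g. a Paragon or a cluster */
    unsigned processor;  /* processor within that group */
    unsigned localid;    /* process id local to that processor */
} PhysProc;

typedef struct {
    BindForm  form;      /* fixed at compile time */
    unsigned  nprocs;    /* N_m: processes are numbered 0 .. N_m-1 */
    PhysProc *procs;     /* current binding; only initialized at compile time */
} SubmodBinding;

/* submodid -> {simple, static, dynamic} */
BindForm binding_form(const SubmodBinding *tab, unsigned submodid)
{
    return tab[submodid].form;
}

/* submodid x process -> procgrp x processor x localid.
   Returns NULL when either index is out of range. */
const PhysProc *lookup_process(const SubmodBinding *tab, unsigned nmods,
                               unsigned submodid, unsigned process)
{
    if (submodid >= nmods || process >= tab[submodid].nprocs)
        return NULL;
    return &tab[submodid].procs[process];
}

/* Rebind one process.  Assumption: only dynamic bindings may change.
   Returns 0 on success and -1 otherwise. */
int rebind_process(SubmodBinding *tab, unsigned nmods,
                   unsigned submodid, unsigned process, PhysProc where)
{
    if (submodid >= nmods || process >= tab[submodid].nprocs)
        return -1;
    if (tab[submodid].form != BIND_DYNAMIC)
        return -1;
    tab[submodid].procs[process] = where;
    return 0;
}
```

Because callers see only submodule and process numbers, a rebinding
changes what \verb.lookup_process. returns without changing how the
distribution help subsystem addresses processes.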

Before distribution computation can be performed, it is necessary to bind 
the source and target distribution types to the appropriate distribution 
computation routine to invoke.  This is made possible by the functions
\begin{equation}
	\mathit{disttype} \rightarrow \{\mathit{simple}, \mathit{static}, \mathit{dynamic}\} \times \mathit{disttype}
\end{equation}
\begin{equation}
	\mathit{disttype} \times \mathit{disttype} \rightarrow \mathit{routine}
\end{equation}
The first function tells us the form of the distribution (not distribution 
type) binding and what distribution type it is an instance of, if any.  The 
second function maps a pair of source and target distribution types to the 
distribution computation routine to invoke.  If a distribution is bound
dynamically, the run-time system will determine the current distribution
before performing distribution computation.  
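
One way to realize the mapping from a pair of distribution types to a
routine is a two-dimensional dispatch table indexed by the source and
target types.  The sketch below assumes small integer type numbers, a
fixed table size, and a routine signature of our own choosing; none of
these names or conventions come from the run-time system itself.

```c
#include <stddef.h>

#define MAX_DISTTYPES 8                     /* hypothetical upper bound */
enum { DIST_BLOCK = 0, DIST_CYCLIC = 1 };   /* hypothetical type numbers */

/* Hypothetical calling convention for a distribution computation routine. */
typedef int (*DistRoutine)(const void *src_dist, const void *dst_dist);

/* disttype x disttype -> routine; NULL means no routine is bound. */
static DistRoutine routine_table[MAX_DISTTYPES][MAX_DISTTYPES];

/* Bind a routine to a (source, target) pair.  Returns 0 or -1. */
int bind_routine(unsigned src_type, unsigned dst_type, DistRoutine r)
{
    if (src_type >= MAX_DISTTYPES || dst_type >= MAX_DISTTYPES)
        return -1;
    routine_table[src_type][dst_type] = r;
    return 0;
}

/* Look up the routine for a (source, target) pair, or NULL if none. */
DistRoutine find_routine(unsigned src_type, unsigned dst_type)
{
    if (src_type >= MAX_DISTTYPES || dst_type >= MAX_DISTTYPES)
        return NULL;
    return routine_table[src_type][dst_type];
}

/* Placeholder routine; a real one would compute which elements move where. */
int block_to_cyclic(const void *src_dist, const void *dst_dist)
{
    (void)src_dist;
    (void)dst_dist;
    return 0;
}
```

The table is directional: binding a routine for block-to-cyclic says
nothing about cyclic-to-block, which matches the ordered pair in the
second function.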

Maintaining these bindings in the run-time system serves three functions.  
First, it permits the run-time communications interface to present an 
abstract view of communication of distributed data structures between 
submodules which may be changing.  Second, because of the submodule 
bindings, distribution computations are independent of the actual machine 
configuration.  Finally, support for changing dynamic bindings can be made 
implementation dependent.


\subsection{Programming Interface}

In this section, we examine the programming interface as it is used
during the execution of a Net-Fx program.  The interface exported to
the Net-Fx node program from the communication system is discussed in
sections \ref{init}, \ref{comm}, and \ref{bindchg}.  Section
\ref{disthelp} discusses the interface the distribution help system
provides to the communication system.  Finally, sections \ref{custfunc}
and \ref{custreg} discuss how custom distribution help functions are called
and registered.

\subsubsection{Initialization}
\label{init}
When a Net-Fx program begins, it specifies all initial bindings to the 
run-time system.  Notice that for each form of binding, initial values are 
available at compile time.  Each process of the program initializes the 
run-time system by supplying the initial bindings for {\em all} submodules 
and identifying its own submodule and process number:
\begin{verbatim}
int init_modules(unsigned    nummods,
                 unsigned    numprocs[],
                 ArchDepInfo processor_group[],
                 unsigned    thismod,
                 unsigned    thisproc);
\end{verbatim}
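
As an illustration of how a process might make this call, the sketch
below starts a two-submodule program: four processes on a workstation
cluster and sixteen on an Intel Paragon.  Only the signature of
\verb.init_modules. is taken from the prototype above.  The layout of
\verb.ArchDepInfo., the stand-in body, the return convention (0 on
success, $-1$ on bad arguments), and the helpers
\verb.procs_in_module. and \verb.start_example. are all assumptions
made for the example.

```c
#include <stddef.h>

/* ArchDepInfo is not defined in the text; this layout is a placeholder. */
typedef struct {
    const char *name;   /* hypothetical label for the processor group */
} ArchDepInfo;

#define MAX_MODS 16     /* hypothetical limit on submodules */

static unsigned g_nummods;
static unsigned g_numprocs[MAX_MODS];

/* Stand-in with the prototype's signature.  Assumed convention:
   returns 0 on success and -1 on invalid arguments. */
int init_modules(unsigned nummods, unsigned numprocs[],
                 ArchDepInfo processor_group[],
                 unsigned thismod, unsigned thisproc)
{
    unsigned m;
    if (numprocs == NULL || processor_group == NULL)
        return -1;
    if (nummods == 0 || nummods > MAX_MODS)
        return -1;
    if (thismod >= nummods || thisproc >= numprocs[thismod])
        return -1;
    for (m = 0; m < nummods; m++)
        g_numprocs[m] = numprocs[m];   /* record bindings for all submodules */
    g_nummods = nummods;
    return 0;
}

/* Hypothetical accessor: number of processes in submodule m, or 0. */
unsigned procs_in_module(unsigned m)
{
    return m < g_nummods ? g_numprocs[m] : 0;
}

/* Every process passes the same tables and differs only in the last
   two arguments, which identify the caller. */
int start_example(unsigned thismod, unsigned thisproc)
{
    unsigned    numprocs[2] = { 4, 16 };
    ArchDepInfo groups[2]   = { { "workstation cluster" },
                                { "Intel Paragon" } };
    return init_modules(2, numprocs, groups, thismod, thisproc);
}
```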


\subsubsection{Communication}
\label{comm}
Communication between submodules is accomplished via collective, 
non-blocking \verb.t_send.  and \verb.t_recv.  calls.  These functions are 
called by every process of a submodule, except in the case of a simply 
bound submodule, in which case only the proxy process invokes them.  The
\verb.t_send.  and \verb.t_recv. functions have the following prototypes:
\begin{center}
\begin{verbatim}
int t_send(unsigned target_submodule, 
           void     *local_chunk,
           unsigned data_type,
           void     *sender_distribution,
           unsigned sender_distribution_type,
           void     *receiver_distribution,
           unsigned receiver_distribution_type,
           void     *hint,
           unsigned hint_type);
 
int t_recv(unsigned source_submodule, 
           void     *local_chunk,
           unsigned data_type,
           void     *sender_distribution,
           unsigned sender_distribution_type,
           void     *receiver_distribution,
           unsigned receiver_distribution_type,
           void     *hint,
           unsigned hint_type);
\end{verbatim}
\end{center}
The target and source submodule numbers signify which submodule will
be sent to and received from, respectively.  The \verb.local_chunk.
argument points to the first data item of the distributed data structure
owned by the calling process.  The elements are of type \verb.data_type..
Some data type numbers are reserved.  

Notice that both calls include the source and target distributions.
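
A sender-side call might look like the following sketch, in which each
process of the sending submodule passes its own chunk together with
both distributions.  Only the \verb.t_send. signature comes from the
prototype above.  The constants, the \verb.Dist1D. descriptor, the
stand-in body, and the wrapper \verb.send_vector. are hypothetical, and
the stand-in merely records the request instead of transporting data.

```c
#include <stddef.h>

/* Hypothetical numbers; the text reserves some data type numbers
   but does not list them. */
enum { DT_DOUBLE = 1 };
enum { DIST_BLOCK = 0, DIST_CYCLIC = 1 };
enum { HINT_NONE = 0 };

/* Hypothetical descriptor for a one-dimensional distribution. */
typedef struct {
    unsigned global_len;  /* total number of elements */
    unsigned nprocs;      /* number of processes sharing them */
} Dist1D;

/* Last request seen by the stand-in, kept for inspection. */
unsigned last_target;
unsigned last_src_type;
unsigned last_dst_type;

/* Stand-in with the prototype's signature.  Assumed convention:
   returns 0 once the request is accepted and -1 on a null argument. */
int t_send(unsigned target_submodule, void *local_chunk, unsigned data_type,
           void *sender_distribution, unsigned sender_distribution_type,
           void *receiver_distribution, unsigned receiver_distribution_type,
           void *hint, unsigned hint_type)
{
    (void)data_type;
    (void)hint;
    (void)hint_type;
    if (local_chunk == NULL || sender_distribution == NULL ||
        receiver_distribution == NULL)
        return -1;
    last_target   = target_submodule;
    last_src_type = sender_distribution_type;
    last_dst_type = receiver_distribution_type;
    return 0;
}

/* Collective call made by every process of the sending submodule.
   The caller owns the descriptors; a real non-blocking call may need
   them to remain valid until the transfer completes. */
int send_vector(unsigned target, double *my_chunk, Dist1D *src, Dist1D *dst)
{
    return t_send(target, my_chunk, DT_DOUBLE,
                  src, DIST_BLOCK,
                  dst, DIST_CYCLIC,
                  NULL, HINT_NONE);
}
```

Passing both distributions on each side means neither submodule has to
query the other before the call, which is consistent with the goal of
not synchronizing the source and destination.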


The communication system




\subsubsection{Registering Custom Distribution Help Functions}
\label{custreg}

\subsection{Implementation of Network Communication}
