
\section{Introduction}

As the performance of local area networks grows, it is increasingly
tempting to use a cluster of workstations as a parallel computer.  At
the same time, presentation layer APIs such as PVM~\cite{PVM} and
MPI~\cite{MP-STANDARD}, and parallel languages such as High
Performance Fortran~\cite{HPF}, are being standardized, greatly
enhancing the portability of parallel programs to workstation
clusters.  Further, the parallel computing community has developed
highly efficient implementations of these APIs and
languages~\cite{STICHNOTH-COMM-ARRAY-STATEMENTS-JOURNAL,AGRAWAL-CHAOS-BLOCK-STRUCT-JOURNAL,FAST-ASSEMBLY-ADDRESS-RELATIONS-INPROCEEDINGS}.

As implementations continue to become more efficient, the performance
of the network will become increasingly important.  In addition to
significantly higher connection and aggregate bandwidths, next-generation
LANs such as
ATM~\cite{ATM-BISDN-OSI,ATM-DESIGN-PRACTICAL-FORE} will provide
quality of service (QoS) guarantees for connections.  Parallel
programs may be able to benefit from such guarantees.  However, to
extract a QoS guarantee from a network, an application must supply a
characterization of its traffic~\cite{FERRARI-REQUIREMENTS-RT-COMM}.
Much of the work in traffic characterization has concentrated on media
streams~\cite{FERRARI-MM-NET,GARRETT-VBR-VIDEO-MODELLING}, although
some work on ATM call admission for parallel applications has assumed
correlated bursty
traffic~\cite{CALL-ADMISSION-ATM-CORRELATED-BURSTY-SC95}.  In this
paper, we detail measurements of the traffic of dense matrix parallel
programs written in a dialect of High Performance Fortran and compiled
with the Fx parallelizing
compiler~\cite{TASK-PARALLELISM-HPF-JOURNAL}. 

In all, we measured the network behavior of six Fx parallel programs
on an Ethernet.  Five of these programs are kernels that exhibit
global communication patterns common to Fx programs.  Like many other
parallelizing compilers, Fx targets the SPMD machine model.  The sixth
program is a large-scale Fx application: an air quality modeling
application that is being parallelized at CMU in a project related to
Fx~\cite{AIRSHED-DESCRIPTION}.

These measurements lead to the observation that the traffic of Fx
parallel programs differs fundamentally from that of media streams.
Specifically, parallel programs exhibit
\begin{itemize}
\item Global collective communication patterns
\item Correlated traffic along many connections
\item Constant burst sizes
\item Periodic burstiness
\item Bandwidth-dependent periodicity
\end{itemize}

We characterize the programs' bandwidth demands by the
power spectra of their instantaneous average bandwidths.  These
spectra are given by the squared magnitudes of the Fourier series
coefficients needed to reconstruct the instantaneous average bandwidth
at any point in time.  Interestingly, these spectra are sparse and
``spiky'', so the Fourier expansion can be limited to the few
dominant spikes, forming a simple analytic model that approximates
the instantaneous average bandwidth.
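
To make the truncated expansion concrete, consider a minimal sketch
under simplifying assumptions: the instantaneous average bandwidth,
written here as $b(t)$, is observed over a window of length $T$ and
treated as periodic with period $T$.  The symbols $b(t)$, $T$, $c_k$,
$\phi_k$, and the spike set $S$ are illustrative notation rather than
quantities defined elsewhere in this paper.  The Fourier series and its
coefficients are
\[
b(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{2\pi i k t/T},
\qquad
c_k = \frac{1}{T}\int_{0}^{T} b(t)\, e^{-2\pi i k t/T}\, dt ,
\]
and the power spectrum at frequency $k/T$ is $|c_k|^2$.  Because $b(t)$
is real, $c_{-k} = \overline{c_k}$.  Writing $c_k = |c_k|\,e^{i\phi_k}$
and keeping only the harmonics $k$ in a small set
$S \subset \{1, 2, \ldots\}$ of dominant spikes gives the approximation
\[
\hat{b}(t) = c_0 + 2\sum_{k \in S} |c_k| \cos\!\left(\frac{2\pi k t}{T} + \phi_k\right),
\]
where $c_0$ is the mean bandwidth.  The spike magnitudes $|c_k|$ follow
directly from the power spectrum, whereas the phases $\phi_k$ are not
recoverable from the power spectrum alone and must be taken from the
transform itself.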


The paper begins by describing common communication patterns exhibited
by Fx parallel programs.  The next section describes each of the six
programs we measured, in particular explaining how its communication
pattern arises.  Following this, we describe the PVM communications
library used by the Fx run-time system.  Next, we describe our
methodology in considerable detail.  The main part of the paper
presents our measurements, including the power spectrum of the
instantaneous bandwidth for each of the programs.  These power spectra
make the programs' periodicity clear.  Following the
measurements, we discuss the results and comment on how the power
spectra can be used to build simple analytical models of the bandwidth
requirements of the programs.  We also discuss a QoS negotiation
scheme that is more amenable to parallel programs.  Finally, we
summarize our conclusions.

