\section{Results}

We describe the traffic characteristics for each of the three classes
of programs.

\subsection{Fx kernels}

For each of the kernels, we examined its aggregate traffic and the
traffic of a representative connection, if there was one.  We define
a connection to be a kernel-specific simplex channel between a source
machine and a destination machine.  Thus for $P=4$, each of the kernels
exhibits 12 connections.  Notice that by considering a connection between
{\em machines} as opposed to between machine-port pairs, we capture all
kernel-specific traffic between a source and destination machine.  This
includes TCP traffic for message passing, UDP traffic between the PVM
daemons, and TCP ACKs for the reverse-direction channel.  The communication
patterns of HIST and SEQ are not symmetric, so we only examine the
aggregate traffic of these kernels.  T2DFFT's pattern is symmetric about the
partition, so we consider a connection from a machine in the sending half to
a machine in the receiving half.  The other kernels have symmetric
communication patterns, so we choose the connection between
two arbitrary machines.
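As an illustration, grouping a trace into such machine-to-machine connections takes only a few lines.  The following Python sketch is hypothetical: the trace layout (timestamp, source, destination, size) and function name are our assumptions, not artifacts of the measurement tools.

```python
from collections import defaultdict

def split_connections(trace):
    """Group a packet trace into simplex machine-to-machine connections.

    `trace` is a list of (timestamp, src, dst, size) tuples; each ordered
    (src, dst) pair of machines defines one connection, regardless of
    ports or protocol (TCP data, PVM daemon UDP, and ACKs all count).
    """
    conns = defaultdict(list)
    for ts, src, dst, size in trace:
        conns[(src, dst)].append((ts, size))
    return conns

# With P = 4 machines and all-to-all traffic, P * (P - 1) = 12
# simplex connections appear.
machines = ["m0", "m1", "m2", "m3"]
trace = [(0.0, a, b, 58) for a in machines for b in machines if a != b]
assert len(split_connections(trace)) == 12
```

Because grouping is by machine pair rather than port pair, daemon traffic and ACKs between the same two machines land in the same connection, as intended by the definition above.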

The traffic of each of the kernels is characterized by its packet
sizes, interarrival times for packets, and bandwidth, both for the
aggregate traffic and the traffic over the representative connection.
We concentrate on characterizing the bandwidth, since this appears to
be the most interesting.

We note here that the graphs presented are not all drawn to the same
scale.  The intention is to better highlight the features of each graph;
however, this does make quick comparisons between graphs more difficult.

\subsubsection{Packet size statistics}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
SOR    & 58 & 1518 & 473 & 568 \\
2DFFT  & 58 & 1518 & 969 & 678 \\
T2DFFT & 58 & 1518 & 912 & 663 \\
SEQ    & 58 & 90   & 75  & 14  \\
HIST   & 58 & 1518 & 499 & 575 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
SOR    & 58 & 1518 & 577 & 591 \\
2DFFT  & 58 & 1518 & 977 & 667 \\
T2DFFT &134 & 1518 & 1442 & 158 \\
SEQ    & - & -   & -  & -  \\
HIST   & - & - & - & - \\
\hline
\end{tabular}\\
(aggregate) &
(connection)\\
\end{tabular}
\end{center}
\caption{Packet size statistics for Fx kernels}
\label{fig:fxpacketstat}
\end{figure}

Figure~\ref{fig:fxpacketstat} shows the minimum, maximum, average and
standard deviation of packet sizes for each of the five kernels.
The first table covers all the connections while the second includes
only packets in a single representative connection.  Although we do
not present histograms here, it is important to remark that for several
of the kernels (2DFFT, HIST, SOR), the distribution of packet sizes is
{\em trimodal}.  This is because these programs send large messages
which are split over several maximum-size packets and a
single smaller packet for the remainder.  Further, because TCP is used
for the data transfer, there are a significant number of ACK packets.
One would expect T2DFFT to also send large messages and therefore
exhibit a trimodal distribution of packet sizes.  However, a different
PVM mechanism is used to assemble messages in T2DFFT.  As described in
section~\ref{sec:pvm}, PVM internally stores messages as a fragment
list and generates packets for each fragment separately.  Because
of the way messages are assembled in T2DFFT, many fragments result, 
explaining the variety of packet sizes.
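The source of the three modes can be made concrete with a toy calculation.  The 58-byte minimum and 1518-byte maximum frame sizes appear in figure~\ref{fig:fxpacketstat}; the 1460-byte MSS and the use of the minimum frame size to stand in for header overhead and ACKs are illustrative assumptions, so the exact location of the remainder mode depends on them.

```python
# Hypothetical sketch: why one large TCP message yields a trimodal
# packet-size distribution on Ethernet.  Constants are assumptions.
MAX_FRAME, MIN_FRAME, MSS, HEADERS = 1518, 58, 1460, 58

def frame_sizes(message_bytes):
    """Ethernet frame sizes generated by one large TCP message."""
    full, rem = divmod(message_bytes, MSS)
    frames = [MAX_FRAME] * full          # maximum-size data packets
    if rem:
        frames.append(rem + HEADERS)     # one smaller remainder packet
    frames.append(MIN_FRAME)             # a representative TCP ACK
    return frames

# Three modes: many maximal frames, one remainder frame, small ACKs.
sizes = frame_sizes(100_000)
```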

\subsubsection{Interarrival time statistics}
\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
SOR    & 0.0 & 1728.7 & 82.1 & 234.9 \\
2DFFT  & 0.0 & 1395.8 & 1.3  & 10.8    \\
T2DFFT & 0.0 & 1301.6 & 1.5  & 14.3   \\
SEQ    & 0.0 & 218.6  & 1.3  & 8.6  \\
HIST   & 0.0 & 449.9  & 16.5 & 45.5 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
SOR    & 0.0 & 1797.0 & 614.2& 590.8 \\
2DFFT  & 0.0 & 2732.6 & 15.1 & 120.5    \\
T2DFFT & 0.0 & 4216.7 & 9.5  & 127.3 \\
SEQ    & -   & -      & -    & -  \\
HIST   & -   & -      & -    & - \\
\hline
\end{tabular}\\
(aggregate) &
(connection)\\
\end{tabular}
\end{center}
\caption{Packet interarrival time statistics for Fx kernels}
\label{fig:fxintstat}
\end{figure}

Figure~\ref{fig:fxintstat} shows the minimum, maximum, average, and
standard deviation of the packet interarrival times for each of the
five kernels.  The first table shows the statistics for all the
connections, while the second concentrates on a single representative
connection.  Notice that the ratio of maximum to average interarrival
time for each program is quite high.  This is due to the bursty
nature of the traffic, as we discuss below.
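The interarrival statistics in figure~\ref{fig:fxintstat} can be reproduced from a trace with a short sketch (timestamps assumed to be in milliseconds; NumPy is used for convenience, and the function name is ours):

```python
import numpy as np

def interarrival_stats(timestamps_ms):
    """Min/max/avg/SD of packet interarrival times, as in the tables.

    `timestamps_ms` are packet arrival times in milliseconds; a high
    max/avg ratio is the burstiness indicator discussed in the text.
    """
    gaps = np.diff(np.sort(np.asarray(timestamps_ms, dtype=float)))
    return gaps.min(), gaps.max(), gaps.mean(), gaps.std()

# A bursty toy trace: tight clusters separated by long silences give
# a max/avg ratio well above one.
mn, mx, avg, sd = interarrival_stats([0, 1, 2, 1000, 1001, 1002, 2000])
```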


\subsubsection{Bandwidth}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|}
\hline
{\bf Program} & {\bf KB/s} \\
\hline
SOR    & 5.6 \\  
2DFFT  & 754.8 \\
T2DFFT & 607.1 \\    
SEQ    & 58.3 \\ 
HIST   & 29.6 \\ 
\hline
\end{tabular}
&
\begin{tabular}{|l|c|}
\hline
{\bf Program} & {\bf KB/s} \\
\hline
SOR    & 0.9 \\  
2DFFT  & 63.2 \\
T2DFFT & 148.6 \\    
SEQ    & - \\ 
HIST   & - \\ 
\hline
\end{tabular} \\
(aggregate) & (connection) \\
\end{tabular}
\end{center}
\caption{Average bandwidth for Fx kernels}
\label{fig:fxavgbw}
\end{figure}

Figure~\ref{fig:fxavgbw} shows the aggregate and per-connection
average bandwidth used over the lifetime of each of the five
kernels.  It is somewhat counterintuitive (and quite
promising!) that even the most communication-intensive Fx programs
such as 2DFFT do not consume all the available bandwidth.  However,
recall that SPMD programs synchronize via their global communication
phases, so there are stretches of time where every processor is
computing.  Each of these periods is followed by an intense burst
of traffic, as every processor tries to communicate.  

In many newer SPMD communication strategies, the global
synchronization is {\em enforced} by a separate barrier
synchronization before each communication
phase~\cite{DEPOSIT-ATM,STRICKER-TP}.  The Fx/PVM implementation does
not do this, however.  It relies on PVM's per-message synchronization
and proper message ordering to achieve correct message delivery.
Still, this ultimately synchronizes the processors even for simple
patterns such as neighbor, though not as tightly as would be the case
with a barrier.

The effect of this synchronization is made clear by examining
figure~\ref{fig:fxwinbw}, which plots the instantaneous bandwidth
averaged over a 10 ms window for each of the kernels.  This was
computed using a sliding 10 ms averaging window which advances a
single packet at a time.  For the SOR, 2DFFT, and T2DFFT kernels, the
aggregate bandwidth is plotted on the left and the bandwidth of the
representative connection on the right.  Since HIST and SEQ have no
representative connection, only their aggregate bandwidths are
plotted.  In each case, we show a ten second span of time, enough to
include several iterations of the kernel.  The complete traces are
between 50 and several hundred seconds long.
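The sliding-window computation described above can be sketched in a few lines.  This is a minimal Python sketch under assumptions of ours: the trace is a time-sorted list of (timestamp, size) pairs, and the function name is hypothetical.

```python
from collections import deque

def sliding_window_bandwidth(packets, window_s=0.010):
    """Instantaneous bandwidth via a sliding window that advances one
    packet at a time, as described in the text.

    `packets` is a list of (timestamp_s, size_bytes) pairs sorted by
    time; returns (timestamp, bytes_per_second) samples.
    """
    win = deque()
    in_win = 0
    samples = []
    for ts, size in packets:
        win.append((ts, size))
        in_win += size
        # Drop packets that have aged out of the window.
        while win and win[0][0] <= ts - window_s:
            in_win -= win.popleft()[1]
        samples.append((ts, in_win / window_s))
    return samples

# Two back-to-back 1500-byte packets inside one window, then a gap:
pkts = [(0.000, 1500), (0.005, 1500), (1.000, 1500)]
bw = sliding_window_bandwidth(pkts)
```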

The most remarkable attribute of each of the kernels is that the
bandwidth demand is highly periodic.  Consider the 2DFFT.  Both plots
show about five iterations of the kernel.  Notice that there are
substantial portions of time where virtually no bandwidth is used (all
the processors are in a compute phase).  The third and
fourth bursts are short because they are, in fact, a single
communication phase during which the program was descheduled on some
processor.
Because the all-to-all communication schedule is fixed and synchronous,
the communication phase stalled until that processor was able to send
again.

Figure~\ref{fig:fxpsd} shows the corresponding power spectra
(periodograms) of the instantaneous average bandwidth.  For these
calculations, the entire trace of each kernel was used, not just the
first 10 seconds displayed in figure~\ref{fig:fxwinbw}.  Because a
power spectrum computation requires evenly spaced input data,
the input bandwidth was computed over fixed 10 ms intervals by
including all packets that arrived during each interval.  This is a
close approximation to the sliding window bandwidth, and more feasible
than correctly sampling the sliding window bandwidth data, which would
require a curve fit over a massive amount of data.
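This fixed-interval binning and periodogram computation can be sketched as follows with NumPy.  The binning, mean removal, and normalization here are our own choices for illustration, not necessarily those used to produce figure~\ref{fig:fxpsd}.

```python
import numpy as np

def bandwidth_periodogram(packets, bin_s=0.010, duration_s=None):
    """Periodogram of the fixed-bin bandwidth signal.

    Packets (timestamp_s, size_bytes) are accumulated into evenly
    spaced `bin_s` intervals -- the approximation described in the
    text -- and a simple periodogram is computed with the FFT.
    """
    ts = np.array([t for t, _ in packets], dtype=float)
    sz = np.array([s for _, s in packets], dtype=float)
    if duration_s is None:
        duration_s = float(ts.max()) + bin_s
    nbins = int(np.ceil(duration_s / bin_s))
    idx = np.rint(ts / bin_s).astype(int)
    bw = np.bincount(idx, weights=sz, minlength=nbins) / bin_s
    # Remove the DC component, then take squared FFT magnitudes.
    spec = np.abs(np.fft.rfft(bw - bw.mean())) ** 2 / len(bw)
    freqs = np.fft.rfftfreq(len(bw), d=bin_s)
    return freqs, spec

# A burst every 200 ms concentrates power at 5 Hz and its harmonics;
# a non-harmonic frequency such as 3 Hz gets essentially no power.
pkts = [(k * 0.2, 1500) for k in range(50)]
freqs, spec = bandwidth_periodogram(pkts, duration_s=10.0)
```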

Not surprisingly, SEQ, in which processor 0 repeatedly broadcasts a
single word, is extremely periodic, with the 4 Hz harmonic being
the most important.  HIST has a 5 Hz fundamental with linearly
declining harmonics at 10 Hz, 15 Hz, and so on.

SOR and 2DFFT display opposite relationships between the connection
and aggregate power spectra.  For SOR, the connection power spectrum
shows great periodicity, with a fundamental of about 5 Hz and
interestingly modulated harmonics, but the aggregate power spectrum
shows far less clear periodicity.  For 2DFFT, the relationship is the
reverse, although less strong, with a clear fundamental of $1/2$ Hz
and exponentially declining harmonics.  There are two explanations for
this.  First, 2DFFT transfers more data per message than SOR
($O\!\left(\left(\frac{N}{P}\right)^2\right)$ versus $O(N)$, with
$N=512$, $P=4$), so it has a better
chance of being descheduled (as discussed above).  Second, 2DFFT's
communication pattern more closely synchronizes all the processors
than SOR's.  Thus a single SOR connection has a better chance of being
periodic because the sending processor is less likely to be
descheduled.  On the other hand, SOR's aggregate traffic will be less
periodic because the processors are less tightly synchronized.  Notice,
however, that in both cases, the representative connection's power
spectrum {\em does} show considerable periodicity. 

T2DFFT's power spectra have the least clear periodicity of all the Fx
kernels.  However, the aggregate spectrum seems slightly cleaner than
the spectrum of the representative connection.  The fact that neither
spectrum is very clean is surprising given the synchronizing nature of
this pattern, the balanced message sizes, and the communication
schedule (shift) used for it.  We believe the problem arises from
PVM's handling of the message as a cluster of fragments.  

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{SOR.all.patch.time.winbw.chop.10.ps} &
\hardypsfig{SOR.ba.patch.time.winbw.chop.10.ps} \\
(SOR - aggregate) &
(SOR - connection) \\
\hardypsfig{FFT.all.patch.time.winbw.chop.10.ps} &
\hardypsfig{FFT.ba.patch.time.winbw.chop.10.ps} \\
(2DFFT - aggregate) &
(2DFFT - connection) \\
\hardypsfig{TFFT.all.patch.time.winbw.chop.10.ps} &
\hardypsfig{TFFT.ba.patch.time.winbw.chop.10.ps} \\
(T2DFFT - aggregate) &
(T2DFFT - connection) \\
\hardypsfig{SEQ.all.patch.time.winbw.chop.10.ps} &
\hardypsfig{HIST.all.patch.time.winbw.chop.10.ps} \\
(SEQ - aggregate) &
(HIST - aggregate) \\
\end{tabular}
\end{center}
\caption{Instantaneous bandwidth of Fx kernels (10ms averaging interval)}
\label{fig:fxwinbw}
\end{figure}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{SOR.all.patch.time.swb.chop.psd.all.ps} &
\hardypsfig{SOR.ba.patch.time.swb.chop.psd.all.ps} \\
(SOR - aggregate) & 
(SOR - connection) \\
\hardypsfig{FFT.all.patch.time.swb.chop.psd.5.ps} &
\hardypsfig{FFT.ba.patch.time.swb.chop.psd.10.ps} \\
(2DFFT - aggregate) &
(2DFFT - connection)\\
\hardypsfig{TFFT.all.patch.time.swb.psd.5.ps} &
\hardypsfig{TFFT.ba.patch.time.swb.chop.psd.5.ps} \\
(T2DFFT - aggregate) &
(T2DFFT - connection)\\
\hardypsfig{SEQ.all.patch.time.swb.psd.10.ps} &
\hardypsfig{HIST.all.patch.time.swb.chop.psd.all.ps} \\
(SEQ - aggregate) &
(HIST - aggregate)\\
\end{tabular}
\end{center}
\caption{Power spectrum of bandwidth of Fx kernels (10ms averaging interval)}
\label{fig:fxpsd}
\end{figure}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{SOR.all.patch.time.bwp.chop.ps} &
\hardypsfig{SOR.ba.patch.time.bwp.chop.ps} \\
(SOR - aggregate) & 
(SOR - connection) \\
\hardypsfig{FFT.all.patch.time.bwp.chop.ps} &
\hardypsfig{FFT.ba.patch.time.bwp.chop.ps} \\
(2DFFT - aggregate) &
(2DFFT - connection)\\
\hardypsfig{TFFT.all.patch.time.bwp.chop.ps} &
\hardypsfig{TFFT.ba.patch.time.bwp.chop.ps} \\
(T2DFFT - aggregate) &
(T2DFFT - connection)\\
\hardypsfig{SEQ.all.patch.time.bwp.chop.ps} &
\hardypsfig{HIST.all.patch.time.bwp.chop.ps} \\
(SEQ - aggregate) &
(HIST - aggregate)\\
\end{tabular}
\end{center}
\caption{D-BIND characterization of bandwidth of Fx kernels}
\label{fig:fxdbind}
\end{figure}


Figure~\ref{fig:fxdbind} shows the Deterministic-Bounding Interval
Dependent (D-BIND) traffic model characterization~\cite{D-BIND} of
each of the kernels.  The D-BIND model characterizes traffic by
plotting the maximum bandwidth used over any interval of a particular
length versus the interval length.  For the Fx kernels, we examined every
interval from 10 ms to 10 seconds, with a granularity of 10 ms.  Each
graph corresponds to the instantaneous average bandwidth graph in
figure~\ref{fig:fxwinbw}, although it represents the D-BIND
characterization for the entire trace, not just ten seconds of it.
Notice that the periodicity made evident by these graphs matches
precisely that discovered via the power spectra of
figure~\ref{fig:fxpsd}, further bolstering the claim that the
approximation to the sampled instantaneous average bandwidth used to
compute the spectra is reasonable.  For example, the $1/2$ Hz
fundamental of 2DFFT is clearly visible.
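The D-BIND computation just described can be sketched as follows (Python/NumPy).  We assume the same fixed 10 ms binning used for the spectra, so this is an approximation of a true per-packet computation; the exact method behind the figures may differ.

```python
import numpy as np

def dbind_curve(packets, bin_s=0.010, max_interval_s=10.0, duration_s=None):
    """D-BIND characterization: for each interval length, the maximum
    bandwidth observed over any window of that length.

    Packets (timestamp_s, size_bytes) are first binned into `bin_s`
    buckets; windows then slide in `bin_s` steps, matching the 10 ms
    granularity used in the text.
    """
    ts = np.array([t for t, _ in packets], dtype=float)
    sz = np.array([s for _, s in packets], dtype=float)
    if duration_s is None:
        duration_s = float(ts.max()) + bin_s
    nbins = int(np.ceil(duration_s / bin_s))
    bins = np.bincount(np.rint(ts / bin_s).astype(int),
                       weights=sz, minlength=nbins)
    cum = np.concatenate(([0.0], np.cumsum(bins)))
    curve = []
    for k in range(1, round(max_interval_s / bin_s) + 1):
        window_bytes = cum[k:] - cum[:-k]   # bytes inside each k-bin window
        curve.append((k * bin_s, window_bytes.max() / (k * bin_s)))
    return curve

# A single 3000-byte burst: the bound falls off as 1/interval-length.
curve = dbind_curve([(0.000, 1500), (0.001, 1500)],
                    duration_s=1.0, max_interval_s=0.05)
```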


\subsection{AIRSHED Simulation}

For AIRSHED,  we examined both the aggregate traffic as well as the traffic
of one connection.

\subsubsection{Packet size statistics}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
AIRSHED & 58 & 1518 & 899 & 693 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
AIRSHED & 58 & 1518 & 889 & 688 \\
\hline
\end{tabular}\\
(aggregate) &
(connection)\\
\end{tabular}
\end{center}
\caption{Packet size statistics for AIRSHED}
\label{fig:airshedpacketstat}
\end{figure}

Figure~\ref{fig:airshedpacketstat} shows the minimum, maximum, average,
and standard deviation of packet sizes for the AIRSHED application
(for all connections and for the representative connection).  We observe
that the packet size distribution for the single connection is very
similar to the aggregate packet distribution, which supports the argument
that the traffic from the single connection
is representative of the aggregate traffic.

\subsubsection{Interarrival time statistics}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
AIRSHED & 0.0 & 23448.6 & 26.8 & 513.3 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
AIRSHED & 0.0 & 37018.5 & 317.4 & 2353.6 \\
\hline
\end{tabular}\\
(aggregate) &
(connection)\\
\end{tabular}
\end{center}
\caption{Packet interarrival time statistics for AIRSHED}
\label{fig:airshedintstat}
\end{figure}

Figure~\ref{fig:airshedintstat} shows the minimum, maximum, average,
and standard deviation of packet interarrival times.  Note that
both the maximum and average interarrival times are an order of
magnitude greater than those of the kernel applications.  As in the
case of the kernel applications, the
ratio of maximum to average interarrival time is quite high, which is
characteristic of bursty traffic.

\subsubsection{Bandwidth}

The average aggregate and per-connection bandwidths for the AIRSHED
application are 32.7 KB/s and 2.7 KB/s, respectively.
Figure~\ref{fig:airshedwinbw}
shows the instantaneous bandwidth averaged over a 10 ms window (over
a 500 sec interval, and a 60 sec interval).  It
is clear that the bandwidth demand is highly periodic, and is periodic
over {\em three} different time scales.
The simulation is divided into a sequence
of $h$ simulation-hours ($h = 100$ in the simulation),
each of which involves a sequence of $k$ simulation steps ($k = 5$).
Each simulation hour starts with a preprocessing stage, where the
stiffness matrix is computed.  Once the stiffness matrix is computed,
the program moves on to the simulation steps.  Each simulation step is
characterized by (1) a local horizontal transport computation phase,
(2) a subsequent global {\em all-to-all} transpose traffic,
(3) a local chemical/vertical transport
computation phase, and finally (4) a global {\em all-to-all} transpose
traffic in the reverse direction.

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.winbw.chop.1000.1500.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.winbw.chop.1000.1500.ps} \\
(aggregate, 500 seconds) &
(connection, 500 seconds) \\
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.winbw.chop.1000.1060.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.winbw.chop.1000.1060.ps} \\
(aggregate, 60 seconds) &
(connection, 60 seconds) \\
\end{tabular}
\end{center}
\caption{Instantaneous bandwidth of AIRSHED (10ms averaging interval)}
\label{fig:airshedwinbw}
\end{figure}


A total of
100 bursty periods are observed, corresponding to the 100 simulation hours.
The bandwidth utilization between each bursty period is very low because no
communication is involved during the preprocessing stage at the beginning
of each simulation hour.  Each bursty period can be further divided into
5 pairs of peaks, with each pair of peaks corresponding to one simulation
step.  The time between each pair of peaks reflects the time spent in the
chemical/vertical transport computation stage, whereas the time interval
between adjacent pairs -- which is slightly shorter -- corresponds to the
time spent in the horizontal transport computation.  
Such periodicity becomes very clear when we observe the
power spectra for the AIRSHED simulation (figure~\ref{fig:airshedpsd}).
There are three peaks (plus their harmonics)
in the power spectrum at approximately
0.015 Hz (66 sec, corresponding to a simulation hour),
0.2 Hz (5 sec, corresponding to the length of
the chemical/vertical transport phase),
and 5 Hz (200 ms, corresponding to that of the horizontal transport phase),
respectively.

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.psd.0.01.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.psd.0.01.ps} \\
(aggregate, 0 -- 0.1 Hz) &
(connection, 0 -- 0.1 Hz) \\
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.psd.0.1.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.psd.0.1.ps} \\
(aggregate, 0 -- 1 Hz) &
(connection, 0 -- 1 Hz) \\
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.psd.0.20.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.psd.0.20.ps} \\
(aggregate, 0 -- 20 Hz) &
(connection, 0 -- 20 Hz) \\
\end{tabular}
\end{center}
\caption{Power spectrum of bandwidth of AIRSHED (10ms averaging interval)}
\label{fig:airshedpsd}
\end{figure}

Finally, figure~\ref{fig:airsheddbind}
shows the D-BIND characterization of AIRSHED for
both the aggregate traffic and the single-connection traffic.

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.all.patch.time.bwp.chop.ps} &
\hardypsfig{/afs/cs/usr/ksleung/NS/AIRSHED.ba.patch.time.bwp.chop.ps} \\
(AIRSHED - aggregate) &
(AIRSHED - connection) \\
\end{tabular}
\end{center}
\caption{D-BIND characterization of bandwidth of AIRSHED}
\label{fig:airsheddbind}
\end{figure}

\subsection{DSD}

For DSD, we examined both the aggregate traffic of all nine nodes, and
the traffic generated by only one of the nodes.  All message passing done
by DSD uses UDP packets.  As expected, DSD uses very little bandwidth in
normal functioning.  Average bandwidth utilization was found to be
126 bps, or about 1/80,000 of the capacity of an Ethernet, for each
machine running DSD.

\subsubsection{Packet size statistics}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
DSD    & 70 &  246 & 79  & 13.1 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Packet Size (Bytes)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
DSD    & 70 &  166 &  79 & 13.1 \\
\hline
\end{tabular}\\
(aggregate) &
(single source)\\
\end{tabular}
\end{center}
\caption{Packet size statistics for DSD}
\label{fig:dsdpacketstat}
\end{figure}

Figure~\ref{fig:dsdpacketstat} shows the minimum, maximum, average, and
standard deviation of packet sizes for all machines and for a single 
machine.  Notice that the largest packet sizes are still fairly small
compared to an Ethernet packet, although this would grow as more machines
are added to run the application.  Also notice that the average packet
size is closest to the minimum packet size.  This is due to the fact that
DSD does not need to pass much information around unless an event occurs.
If events are rare, then the steady-state behavior of DSD uses very little
network overhead.

\subsubsection{Interarrival time statistics}

\begin{figure}
\begin{center}
\begin{tabular}{cc}
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
DSD    & 0.0 & 53010  & 822  & 3829 \\
\hline
\end{tabular} &
\begin{tabular}{|l|c|c|c|c|}
\hline
 & \multicolumn{4}{c|}{\bf Interarrival Time (ms)} \\
\cline{2-5}
{\bf Program} & Min & Max & Avg & SD \\
\hline
DSD    & 0.976 & 60159 & 4914 & 13459 \\
\hline
\end{tabular}\\
(aggregate) &
(single source)\\
\end{tabular}
\end{center}
\caption{Packet interarrival time statistics for DSD}
\label{fig:dsdintstat}
\end{figure}

Figure~\ref{fig:dsdintstat} shows the minimum, maximum, average, and
standard deviation of packet interarrival times.  Notice that the minimum
is quite small, corresponding to the 12 messages required during each
testing round.  The maximum is close to the expected value of about 60
seconds, which is the length of a testing round.  The maximum interval
for the aggregate case is also close to 60 seconds, but only because
the programs were started on every machine at about the same time;
had the starting times been staggered, this would not have
been the case.

\begin{figure}
\centerline{
\begin{tabular}{cc}
\hardypsfig{all9.time.patch.winbw2.ps,width=2.5in} &
\hardypsfig{all9.time.patch.winbw2.115-130.ps,width=2.5in} \\
(whole trace) & (115-130 seconds) \\
\end{tabular}
}
\caption{DSD instantaneous bandwidth (10 ms averaging interval)}
\label{fig:dsdwinbw1}
\end{figure}

Figure~\ref{fig:dsdwinbw1} shows the bandwidth averaged over 0.01 second
intervals for the traffic generated by all nine machines.  Notice that
the peak bandwidth is only about 66 KB/s!  Although it is not obvious from
this diagram, the bandwidth is zero over most intervals.  A blowup of the
interval from 115 to 130 seconds makes this fact more easily visible.

\begin{figure}
\centerline{
\begin{tabular}{cc}
\hardypsfig{all9.time.patch.bwp.ps,width=2.5in} &
\hardypsfig{all9.pandora.time.patch.bwp2.ps,width=2.5in} \\
(aggregate) & (single source) \\
\end{tabular}
}
\caption{DSD D-BIND characterization, 0-150 seconds}
\label{fig:dsddbind1}
\end{figure}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\hardypsfig{all9.time.patch.bwp2.ps,width=2.5in} &
\hardypsfig{all9.pandora.time.patch.bwp.ps,width=2.5in} \\
(aggregate) & (single source) \\
\end{tabular}
}
\caption{DSD D-BIND characterization, 0-1 second}
\label{fig:dsddbind2}
\end{figure}


Figures~\ref{fig:dsddbind1} and~\ref{fig:dsddbind2} show the
D-BIND characterization of DSD for both the single-source and the 
aggregate cases.  Notice that these graphs look very similar to the
ones for the SOR application.  This was expected since both rely on 
neighbor-to-neighbor communication.  More testing should probably be done
to see if all programs that use neighbor communication exhibit the same
D-BIND characterization.
