\comment{
Chapter~\ref{chap:statprop} motivates the choice of host load as our
resource signal, shows how to appropriately sample it, and describes
the statistical characteristics of this resource signal.  This
knowledge forms the basis for the host load measurement system in
Figure~\ref{fig:intro.rtsa_structure}.  The interesting and new
statistical results are a strong autocorrelation structure,
self-similarity, and epochal behavior.  These findings suggest that
some form of linear model should be appropriate for prediction, but
that more complex models that capture long-range dependence may be
necessary.  Furthermore, they suggest that such models may need to be
refitted at epoch boundaries.
}


\chapter{Statistical Properties of Host Load}
\label{chap:statprop}

This dissertation argues for basing real-time scheduling advisors on
explicit resource-oriented prediction, specifically on the prediction
of resource signals.  In the last chapter, we presented a methodology
for understanding and predicting such resource signals and described
our toolkit for carrying out the methodology.  This chapter applies
the first four steps of our resource signal methodology, as described
in Section~\ref{sec:rps.newpredsys}, to {\em host load}, a signal that
tracks CPU availability.  The essence of these steps is to choose an
appropriate resource signal, determine how to sample it, collect
representative traces of the signal to form a signal analysis and
prediction problem, and then analyze the traces to find appropriate
predictive models.  The next chapter carries out the final two steps
of the methodology, resulting in an appropriate predictive model for
host load, and an on-line host load prediction system.

We ultimately want to predict the running time of tasks, which varies
primarily as a result of changing CPU availability.  CPU availability
is well measured by the host load signal we study in this chapter.
The running time of a compute-bound task is directly related to the
average load it encounters during execution.  What, then, are the
qualitative and quantitative properties of load on real systems, and
what are the implications of these properties for host load
prediction?  Since we are interested in scheduling short tasks as well
as long ones, the answers to these questions should extend to
correspondingly short timescales.  Unfortunately, to date there has
been little work on characterizing the properties of load at fine
resolutions.  The available studies concentrate on understanding
functions of load, such as
availability~\cite{WORKSTATION-AVAIL-STATS-CONDOR} or job
durations~\cite{LIMITED-BENEFIT-MIGRATION-EAGER,LOAD-BAL-HEURISTICS-PROCESS-BEHAVIOR-LELAND-OTT,EXPLOIT-PROC-LIFETIME-DIST-DYN-LB-SIGMETRICS}.
Furthermore, they deal with the coarse grain behavior of load---how it
changes over minutes, hours and days.

To understand the properties of host load on real systems at fine
resolutions, we collected week-long, 1 Hz resolution traces of the
Digital Unix load average (specifically, an exponential average with a
five-second time constant) on over 35 different machines that we
classify as production and research cluster machines, compute servers,
or desktop workstations.  We collected two sets of such traces at
different times of the year.  The 1 Hz sample rate is sufficient to
capture all of the dynamic load information that is available to
user-level programs running on these machines.  This chapter presents
a detailed statistical analysis of both sets of traces and their
implications for prediction.  The subsequent chapters use these traces
to evaluate the host load prediction system, the running time advisor,
and the real-time scheduling advisor.

The basic question is whether load traces that might seem at first
glance to be random and unpredictable might have structure that could
be exploited for prediction.  Our results suggest that host load
signals do indeed have some structure in the form of clearly
identifiable properties. In essence, our results characterize how load
varies, which should be of interest not only to developers of
prediction algorithms, but also to those who need to generate
realistic synthetic loads in simulators or to those doing analytic
work.  The traces can also be used to reconstruct real background
workloads using host load trace playback, as described in
Chapter~\ref{chap:execpred}.

We found that host load is a resource signal with high absolute and
relative variability, leading to widely varying running times.  While
this is distressing, host load signals are not random.  Time series
analysis shows that there is strong correlation over time.  This
suggests that linear time series models, which attempt to capture this
autocorrelation parsimoniously, may be appropriate for predicting host
load.  However, host load signals are also self-similar, which
suggests that complex predictive models, such as the ARFIMA models
described in the previous chapter, may be necessary to predict them.
Finally, host load exhibits what we term epochal behavior---the signal
remains stationary for an extended period of time, and then abruptly
transitions to another stationary regime.  Linear time series models,
even those that explicitly model nonstationarity, cannot model such
abrupt transitions.  The implication is that a linear model would have
to be refit whenever such a transition occurred.


\section{Host load and running time}
\label{sec:loadpred.load_running_time}

\begin{figure}[tbp]
\centerline{
\colfigsize
\epsfbox{eps/load2time.eps}
}
\caption
[Relationship between host load and running time]
{Relationship between average load during execution and running time}
\label{fig:statprop.load2time}
\end{figure}

We chose to study the host load signal because, for compute-bound tasks,
there is an intuitive relationship between host load and running time.
Consider Figure~\ref{fig:statprop.load2time}, which plots running time
versus average load experienced during execution for tasks consisting
of simple wait loops.  The data was generated by running variable
numbers of these tasks together, at identical priority levels, on an
otherwise unloaded Digital Unix machine.  Each of these tasks sampled
the Digital Unix five-second load average at roughly one-second
intervals during their execution and at termination printed the
average of these samples as well as their running time.  It is these
pairs that make up the 42,000 points in the figure.  Notice that the
relationship between the measured load during execution and the
running time is almost perfectly linear ($R^2>0.99$).

If load were presented as a continuous signal, we would summarize this
relationship between running time and load as 
\begin{equation}
\frac{t_{exec}}{1+\frac{1}{t_{exec}}\int_{0}^{t_{exec}} z(t) dt}
=
t_{nom}
\end{equation}
where $t_{nom}$ is the running time of the task on a completely
unloaded machine, $1 + \frac{1}{t_{exec}}\int_{0}^{t_{exec}} z(t) dt$
is the average load experienced during execution ($z(t)$ is the
continuous ``background'' load), and $t_{exec}$ is the task's running
time.  In practice, we can only sample the load with some
noninfinitesimal sample period $\Delta$ so we can only approximate the
integral by summing over the values in the sample sequence.  For these
machines, $\Delta=1$ second is appropriate, as we show in the next
section.  We will write a sequence of load samples, which we will also
refer to as a {\em load signal}, as $\langle z_t \rangle
=\ldots,z_{t-1},z_t,z_{t+1},\ldots$
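With load sampled at $\Delta=1$ second, the integral reduces to the mean of the load samples. A minimal Python sketch of the resulting estimate (the function name is ours, purely for illustration):

```python
def nominal_time(t_exec, load_samples):
    """Estimate t_nom, the task's running time on an unloaded machine,
    from its measured running time t_exec and the load samples
    observed during execution.  The integral of z(t) in the equation
    above is approximated by the mean of the samples."""
    avg_load = sum(load_samples) / len(load_samples)
    return t_exec / (1.0 + avg_load)
```

For example, a task that ran for 20 seconds against a constant background load of 1.0 would be estimated to need about 10 seconds on an idle machine.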

\section{Measurement methodology}
\label{sec:statprop.load_trace_measurement}
Having decided that host load is a useful signal, we next developed a
sensor for it, and determined how quickly to sample it.  The load on a
Unix system at any given instant is the number of processes that are
running or are ready to run, which is the length of the ready queue
maintained by the scheduler.  The kernel samples the length of the
ready queue at some rate and exponentially averages some number of
previous samples to produce a load average which can be accessed from
a user program.  The specific Unix system we used was Digital Unix
(DUX).

Unlike many Unix implementations, which exponentially average with a
time constant of one minute at the finest, DUX uses a time constant of
five seconds. This small time constant allows us to capture
considerably more of the dynamics of load than would have been
possible on other Unix implementations, and it minimizes the effect of
phantom correlations due to the exponential filter.  Interestingly,
directly sampling the length of the ready queue, which we tried on
Windows NT, does not provide much useful information because it is
impossible to sample the queue fast enough from a user process.
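As a concrete illustration, the following Python sketch applies a discrete exponential average with a five-second time constant to a sequence of ready-queue samples. The update rule is our idealization of such a kernel filter, not code taken from DUX or any other kernel:

```python
import math

def exp_load_average(queue_lengths, dt=1.0, tau=5.0):
    """Exponentially average a sequence of ready-queue lengths.

    tau is the time constant (5 seconds for DUX; one minute or more
    on many other Unix implementations) and dt is the period at which
    the kernel samples the queue.  Returns the averaged load after
    each sample."""
    decay = math.exp(-dt / tau)
    load, history = 0.0, []
    for n in queue_lengths:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history
```

With a five-second time constant, a change in queue length shows up in the average within a few seconds; with a one-minute time constant the same change would take minutes to appear, which is why the DUX average preserves so much more of the load dynamics.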

We developed a small tool to sample the DUX load average at one-second
intervals and log the resulting time series to a data file.  The tool
is based on the host load sensor described in the previous chapter.
The 1 Hz sample rate was arrived at by subjecting DUX systems to
varying loads and sampling at progressively higher rates to determine
the rate at which DUX actually updated the value.  DUX updates the
value at a rate of $1/2$ Hz, so we chose a 1 Hz sample rate by the
Nyquist criterion.  This choice of sample rate means we capture all of
the dynamic load information the operating system makes available to
user programs.  
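The sampling loop itself is simple. The sketch below illustrates it with Python's standard os.getloadavg(), which exposes only the 1-, 5-, and 15-minute averages; it stands in for, but is not, our actual DUX sensor:

```python
import os
import time

def sample_load(n_samples, interval=1.0):
    """Log the load average at a fixed interval, as our trace
    collection tool does.  Note that os.getloadavg() returns the
    1/5/15-minute averages, so on most systems this loop cannot see
    the fine-grained dynamics that the DUX 5-second average exposes."""
    trace = []
    for _ in range(n_samples):
        trace.append(os.getloadavg()[0])  # 1-minute average
        if interval > 0:
            time.sleep(interval)
    return trace
```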

\section{Description of traces}
\label{sec:statprop.trace_desc}

Having chosen an appropriate resource signal and sampling rate, we
next carried out the second step of the resource signal methodology:
collecting traces.  We ran our trace collection tool on 39 hosts
belonging to the Computing, Media, and Communication Laboratory (CMCL)
at CMU and the Pittsburgh Supercomputing Center (PSC) for slightly
more than one week in late August, 1997.  A second set of week-long
traces was acquired on almost exactly the same set of machines (35
machines total) in late February and early March, 1998.  The results
of the statistical analysis were similar for the two sets of traces.

All of the hosts in the August, 1997 set were DEC Alpha DUX machines,
running either DUX 3.2 or 4.0, and they form four classes:
\begin{itemize}
\item{{\em Production Cluster}: 13 hosts of the PSC's ``Supercluster'',
including two front-end machines (axpfea, axpfeb), four interactive
machines (axp0 through axp3), and seven batch machines scheduled by a
DQS~\cite{KAPLAN-COMPARE-CLUSTER-SW} variant (axp4 through axp10.)}
\item{{\em Research Cluster}: eight machines in an experimental
cluster in the CMCL \\
(manchester-1 through manchester-8.)} 
\item{{\em Compute servers}: two high performance large memory
machines used by the CMCL group as compute servers for simulations and
the like (mojave and sahara.)}
\item {\em Desktops}: 16 desktop workstations owned by members of the
CMCL (aphrodite through zeno.)
\end{itemize}
The same hosts were used for the March, 1998 traces, with the following
exceptions:
\begin{itemize}
\item{{\em Production Cluster}: axp9 was replaced by axp11 due to
hardware failures.}
\item{{\em Desktops}: argus, asclepius, bruce, cobain, darryl, and
hestia were replaced by belushi and loman due to hardware upgrades.}
\end{itemize}

\tinyspacing
\begin{table}
\small
\centerline{
\begin{tabular}{|llrr|}
\hline
Hostname & Start Time & Days & Samples \\
\hline
\multicolumn{4}{c}{Production Cluster} \\
\hline
axp0.psc & Tue Aug 12 21:29:12 EDT 1997 & 15.00 & 1296000 \\
axp1.psc & Tue Aug 12 21:30:09 EDT 1997 & 14.00 & 1209600 \\
axp2.psc & Tue Aug 12 21:30:53 EDT 1997 & 14.00 & 1209600 \\
axp3.psc & Tue Aug 12 21:31:13 EDT 1997 & 14.00 & 1209600 \\
axp4.psc & Tue Aug 12 21:31:12 EDT 1997 & 14.00 & 1209600 \\
axp5.psc & Tue Aug 12 21:31:47 EDT 1997 & 14.00 & 1209600 \\
axp6.psc & Tue Aug 12 21:31:15 EDT 1997 & 15.00 & 1296000 \\
axp7.psc & Tue Aug 12 20:51:19 EDT 1997 & 13.00 & 1123200 \\
axp8.psc & Tue Aug 12 21:31:19 EDT 1997 & 14.00 & 1209600 \\
axp9.psc & Tue Aug 12 21:31:45 EDT 1997 & 14.00 & 1209600 \\
axp10.psc & Tue Aug 12 21:31:21 EDT 1997 & 14.00 & 1209600 \\
axpfea.psc & Sat Aug 16 14:44:29 EDT 1997 & 13.00 & 1123200 \\
axpfeb.psc & Sat Aug 16 14:44:55 EDT 1997 & 12.00 & 1036800 \\
\hline
\multicolumn{4}{c}{Research Cluster} \\
\hline
manchester-1.cmcl & Sun Aug 17 19:41:10 EDT 1997 & 3.92 & 338400 \\
manchester-2.cmcl & Sun Aug 17 19:41:09 EDT 1997 & 4.00 & 345600 \\
manchester-3.cmcl & Sun Aug 17 19:41:13 EDT 1997 & 3.96 & 342000 \\
manchester-4.cmcl & Sun Aug 17 19:41:10 EDT 1997 & 4.00 & 345600 \\
manchester-5.cmcl & Sun Aug 17 19:41:09 EDT 1997 & 4.04 & 349200 \\
manchester-6.cmcl & Sun Aug 17 19:41:09 EDT 1997 & 4.08 & 352800 \\
manchester-7.cmcl & Sun Aug 17 19:41:10 EDT 1997 & 4.00 & 345600 \\
manchester-8.cmcl & Sun Aug 17 19:41:10 EDT 1997 & 4.00 & 345600 \\
\hline
\multicolumn{4}{c}{Compute Servers} \\
\hline
mojave.cmcl & Sun Aug 17 19:41:11 EDT 1997 & 4.04 & 349200 \\
sahara.cmcl & Sun Aug 17 19:41:11 EDT 1997 & 4.00 & 345600 \\
\hline
\multicolumn{4}{c}{Desktops} \\
\hline
aphrodite.nectar & Sun Aug 17 19:41:12 EDT 1997 & 4.00 & 345600 \\
argus.nectar & Sun Aug 17 19:41:17 EDT 1997 & 4.04 & 349200 \\
asbury-park.nectar & Sun Aug 17 19:41:11 EDT 1997 & 4.00 & 345600 \\
asclepius.nectar & Sun Aug 17 19:41:07 EDT 1997 & 4.08 & 352800 \\
bruce.nectar & Sun Aug 17 19:41:10 EDT 1997 & 3.92 & 338400 \\
cobain.nectar & Sun Aug 17 19:41:12 EDT 1997 & 4.04 & 349200 \\
darryl.nectar & Sun Aug 17 19:41:32 EDT 1997 & 1.71 & 147600 \\
hawaii.cmcl & Sun Aug 17 19:41:11 EDT 1997 & 2.63 & 226800 \\
hestia.nectar & Sun Aug 17 19:41:12 EDT 1997 & 4.00 & 345600 \\
newark.cmcl & Sun Aug 17 19:41:13 EDT 1997 & 4.00 & 345600 \\
pryor.nectar & Sun Aug 17 19:41:13 EDT 1997 & 1.71 & 147600 \\
rhea.nectar & Sun Aug 17 19:41:11 EDT 1997 & 4.00 & 345600 \\
rubix.mc & Sun Aug 17 19:41:13 EDT 1997 & 4.00 & 345600 \\
themis.nectar & Sun Aug 17 19:41:09 EDT 1997 & 4.00 & 345600 \\
uranus.nectar & Sun Aug 17 19:41:13 EDT 1997 & 4.00 & 345600 \\
zeno.nectar & Sun Aug 17 19:41:13 EDT 1997 & 4.08 & 352800 \\
\hline
\end{tabular}
}
\caption
[Details of the August, 1997 traces]
{Details of the August, 1997 traces.}
\label{tab:statprop.tracedetailsaug97}
\normalsize
\end{table}
\normspacing

\tinyspacing
\begin{table}
\small
\centerline{
\begin{tabular}{|llrr|}
\hline
Hostname & Start Time & Days & Samples \\
\hline
\multicolumn{4}{c}{Production Cluster} \\
\hline
axp0.psc & Wed Feb 25 17:34:26 EST 1998 & 12.08 & 1043400 \\
axp1.psc & Wed Feb 25 17:34:25 EST 1998 & 12.08 & 1043400 \\
axp2.psc & Wed Feb 25 17:34:25 EST 1998 & 12.08 & 1043400 \\
axp3.psc & Wed Feb 25 17:34:26 EST 1998 & 12.03 & 1039400 \\
axp4.psc & Wed Feb 25 17:34:27 EST 1998 & 12.06 & 1041900 \\
axp5.psc & Wed Feb 25 17:34:27 EST 1998 & 12.03 & 1039700 \\
axp6.psc & Wed Feb 25 17:34:27 EST 1998 & 12.08 & 1043400 \\
axp7.psc & Wed Feb 25 17:34:27 EST 1998 & 12.01 & 1037700 \\
axp8.psc & Wed Feb 25 17:34:28 EST 1998 & 12.06 & 1041800 \\
axp10.psc & Wed Feb 25 17:34:28 EST 1998 & 12.03 & 1039700 \\
axp11.psc & Wed Feb 25 17:16:20 EST 1998 & 12.03 & 1039800 \\
axpfea.psc & Wed Feb 25 17:18:01 EST 1998 & 12.08 & 1043800 \\
axpfeb.psc & Wed Feb 25 17:23:34 EST 1998 & 12.0 & 1043400 \\
\hline
\multicolumn{4}{c}{Research Cluster} \\
\hline
manchester-1.cmcl & Wed Feb 25 20:42:30 EST 1998 & 8.36 & 721900 \\
manchester-2.cmcl & Wed Feb 25 20:42:23 EST 1998 & 8.36 & 721900 \\
manchester-3.cmcl & Wed Feb 25 20:42:23 EST 1998 & 8.36 & 721900 \\
manchester-4.cmcl & Wed Feb 25 20:42:29 EST 1998 & 8.35 & 721300 \\
manchester-5.cmcl & Wed Feb 25 20:42:27 EST 1998 & 8.36 & 721900 \\
manchester-6.cmcl & Wed Feb 25 20:42:30 EST 1998 & 8.36 & 721900 \\
manchester-7.cmcl & Wed Feb 25 20:42:24 EST 1998 & 8.35 & 721800 \\
manchester-8.cmcl & Wed Feb 25 20:42:26 EST 1998 & 8.35 & 721800 \\
\hline
\multicolumn{4}{c}{Compute Servers} \\
\hline
mojave.cmcl & Wed Feb 25 20:42:31 EST 1998 & 5.31 & 458800 \\
sahara.cmcl & Wed Feb 25 20:42:34 EST 1998 & 8.34 & 721300 \\
\hline
\multicolumn{4}{c}{Desktops} \\
\hline
aphrodite & Wed Feb 25 20:42:17 EST 1998 & 8.36 & 722000 \\
asbury-park & Wed Feb 25 20:42:23 EST 1998 & 5.90 & 509400 \\
belushi & Wed Feb 25 20:42:28 EST 1998 & 7.77 & 671400 \\
hawaii & Wed Feb 25 20:42:18 EST 1998 & 8.36 & 722000 \\
loman & Wed Feb 25 20:42:34 EST 1998 & 8.36 & 721900 \\
newark.cmcl & Wed Feb 25 20:42:29 EST 1998 & 8.36 & 722200 \\
pryor.nectar & Wed Feb 25 20:42:24 EST 1998 & 8.36 & 722100 \\
rhea.nectar & Wed Feb 25 21:12:17 EST 1998 & 10.19 & 880600 \\
rubix.mc & Wed Feb 25 20:42:24 EST 1998 & 1.81 & 156400 \\
themis.nectar & Wed Feb 25 20:42:29 EST 1998 & 4.94 & 426900 \\
uranus.nectar & Wed Feb 25 20:42:33 EST 1998 & 8.36 & 722000 \\
zeno.nectar & Wed Feb 25 20:42:26 EST 1998 & 6.23 & 537900 \\
\hline
\end{tabular}
}
\caption
[Details of the March, 1998 traces]
{Details of the March, 1998 traces.}
\label{tab:statprop.tracedetailsmar98}
\normalsize
\end{table}
\normspacing

Tables~\ref{tab:statprop.tracedetailsaug97} and~\ref{tab:statprop.tracedetailsmar98}
provide additional details of the individual August, 1997 and March,
1998 traces.  The author will be happy to provide the traces to any
interested readers.


\section{Statistical analysis}
\label{sec:statprop.stats}

\def\correlfigsize{\epsfxsize=5in}
\begin{figure}

\centerline{
\begin{tabular}{cc}
\correlfigsize
\epsfbox{eps/correl_table_aug97.eps} \\
(a) Correlations for August, 1997 traces\\
\correlfigsize
\epsfbox{eps/correl_table_mar98.eps} \\
(b) Correlations for March, 1998 traces\\
\end{tabular}
}
\caption
[Correlations between statistical properties]
{Correlation coefficients (CCs) between all of the discussed
statistical properties. }
\label{fig:statprop.cross_correl}
\end{figure}

We analyzed the individual load traces using summary statistics,
histograms, fitting of analytic distributions, and time series
analysis.  The picture that emerges is that load varies over a wide
range in very complex ways.  Load distributions are rough and
frequently multi-modal.  Even traces with unimodal histograms are not
well fitted by common analytic distributions, which have tails that
are either too short or too long.  Time series analysis shows that
load is strongly correlated over time, but also has complex, almost
noise-like frequency domain behavior.


We summarized each of our load traces in terms of our statistical
measures and computed their correlations to determine how the measures
are related.  Figure~\ref{fig:statprop.cross_correl}(a) contains the
correlations for the August, 1997 set while
Figure~\ref{fig:statprop.cross_correl}(b) contains the correlations for the
March, 1998 set.  Unless otherwise noted, the remaining figures in the
chapter similarly present independent results for the two sets of
traces.  Each cell of the tables in Figure~\ref{fig:statprop.cross_correl} is
the correlation coefficient (CC) between the row measure and the
column measure, computed over the load traces in the set.  We will
refer back to the highlighted cells (where the absolute correlations
are greater than 0.3) throughout the chapter.  It is important to note
that these cross correlations can serve as a basis for clustering load
traces into rough equivalence classes.


\subsection{Summary statistics}

Summarizing each load trace in terms of its mean, standard deviation,
and maximum and minimum illustrates the extent to which load varies.
Figure~\ref{fig:statprop.sdev_mean_load} shows the mean load and the +/- one
standard deviation points for each of the traces.  As we might expect,
the mean load on desktop machines is significantly lower than on other
machines.  However, we can also see a lack of uniformity within each
class, despite the long duration of the traces.  This is most clear
among the Production Cluster machines, where fewer than half of the
machines seem to be doing most of the work.  This lack of uniformity
even over long time scales shows clear opportunity for load balancing
or resource management systems.


\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/load_sdev_mean_aug97.eps} &
\colfigsize
\epsfbox{eps/load_sdev_mean_march98.eps} \\
(a) &
(b) \\
\end{tabular}
}
\caption
[Mean load +/- one standard deviation, all traces]
{Mean load +/- one standard deviation: (a) August, 1997
traces, (b) March, 1998 traces.}  
\label{fig:statprop.sdev_mean_load}
\end{figure}


\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/load_cov_mean_aug97.eps} &
\colfigsize
\epsfbox{eps/load_cov_mean_mar98.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption
[COV of load and mean load, all traces]
{COV of load and mean load: (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.cov_mean_load}
\end{figure}

From Figure~\ref{fig:statprop.sdev_mean_load} we can also see that desktop
machines have smaller standard deviations than the other machines.
Indeed, the standard deviation, which shows how much load varies in
{\em absolute} terms, grows with increasing mean load
(Figure~\ref{fig:statprop.cross_correl} shows CC=0.53 for the 1997 traces and
CC=0.72 for the 1998 traces.)  However, in {\em relative} terms,
variance shrinks with increasing mean load.  This can be seen in
Figure~\ref{fig:statprop.cov_mean_load}, which plots the coefficient of
variation (the standard deviation divided by the mean, abbreviated as
the COV) and the mean load for each of the load traces.  Here we can
see that desktop machines, with their smaller mean loads, have large
COVs compared to the other classes of machines.  The CC between mean
load and the COV of load is -0.49 for the 1997 traces and -0.64 for
the 1998 traces.  It is clear that as load increases, it varies {\em
less} in relative terms and {\em more} in absolute terms.
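These summary measures are easy to reproduce from a trace. A sketch using only the Python standard library (the function and field names are ours):

```python
import statistics

def summarize(trace):
    """Mean, standard deviation, and coefficient of variation (COV)
    of a load trace.  The sdev measures absolute variability; the
    COV (sdev divided by the mean) measures relative variability."""
    mean = statistics.fmean(trace)
    sdev = statistics.pstdev(trace)
    return {"mean": mean, "sdev": sdev,
            "cov": sdev / mean if mean > 0 else float("inf")}
```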


\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/load_max_min_aug97.eps} &
\colfigsize
\epsfbox{eps/load_max_min_mar98.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption
[Minimum, maximum, and mean load, all traces]
{Minimum, maximum, and mean load: (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.min_max_mean_load}
\end{figure}


This difference between absolute and relative behavior also holds true
for the maximum load.  Figure~\ref{fig:statprop.min_max_mean_load} shows the
minimum, maximum, and mean load for each of the traces.  The minimum
load is, not surprisingly, zero in almost every case.  The maximum
load is positively correlated with the mean load (CC=0.60 for the 1997
traces and CC=0.43 for the 1998 traces in
Figure~\ref{fig:statprop.cross_correl}.)  Figure~\ref{fig:statprop.max_to_mean_load}
plots the ratio max/mean and the mean load for each of the traces.  It
is clear that this relative measure is inversely related to mean load,
and Figure~\ref{fig:statprop.cross_correl} shows that the CC is -0.36 for the
1997 traces and -0.48 for the 1998 traces.  It is also important to
notice that while the differences in maximum load between the hosts
are rather small (Figure~\ref{fig:statprop.min_max_mean_load}), the differences
in the max/mean ratio can be quite large
(Figure~\ref{fig:statprop.max_to_mean_load}.)  In relative terms, then,
desktop machines are the most surprising.

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/load_max_to_mean_aug97.eps} &
\colfigsize
\epsfbox{eps/load_max_to_mean_mar98.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption
[Maximum to mean load ratios and mean load, all traces]
{Maximum to mean load ratios and mean load: (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.max_to_mean_load}
\end{figure}



\begin{figure}[tbp]
\centerline{
\begin{tabular}{cc}
\epsfxsize=3.25in
\epsfbox{eps/Aug97BoxPlot.eps}
&
\epsfxsize=3.25in
\epsfbox{eps/Mar98BoxPlot.eps}
\\
(a) & (b)
\end{tabular}
}

\caption
[Load trace Box plots]
{Load trace Box plots: (a) August, 1997 traces, (b) March, 1998 traces.}
\label{fig:statprop.trace_stats}
\end{figure}

Figure~\ref{fig:statprop.trace_stats} presents another way of looking
at the traces' absolute variability.  The figure summarizes each
trace in a Box plot variant.  The central line in each box marks the
median load, while the lower and upper lines mark the 25th and 75th
percentiles.  The lower and upper ``whiskers'' extending from the box
mark the actual 2.5th and 97.5th percentiles.  The circle marks the
mean and the triangles mark the 2.5th and 97.5th percentiles assuming
a normal distribution with the trace's mean and variance.  This
representation gives some idea of the overall distribution of the host
load signal.
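The quantities plotted in these Box plots can be computed directly. In the sketch below, the normal-assumption tail points use the familiar $\pm 1.96$ standard deviation rule for the 2.5th and 97.5th percentiles; the helper name is ours:

```python
import statistics

def box_summary(trace):
    """Quantities shown in the Box plot variant: median, quartiles,
    the actual 2.5th/97.5th percentiles, and the same tail points
    assuming a normal distribution with the trace's mean/variance."""
    pct = statistics.quantiles(trace, n=40)   # 2.5%, 5%, ..., 97.5%
    quart = statistics.quantiles(trace, n=4)
    mean = statistics.fmean(trace)
    sdev = statistics.pstdev(trace)
    return {"median": statistics.median(trace),
            "q25": quart[0], "q75": quart[2],
            "p2.5": pct[0], "p97.5": pct[-1],
            "normal_p2.5": mean - 1.96 * sdev,
            "normal_p97.5": mean + 1.96 * sdev}
```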

With respect to scheduling tasks, the implication of the differences
between relative and absolute measures of variability is that lightly
loaded (low mean load) hosts are not always preferable over heavily
loaded hosts.  For example, if the performance metric is itself a
relative one (that the running time not vary much relative to the mean
running time, say), then a more heavily loaded host may be preferable.



\subsection{Distributions}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{eps/axp0day.hist.eps} &
\epsfxsize=3in
\epsfbox{eps/axp7day.hist.eps} \\
(a) axp0 & (b) axp7\\
\end{tabular}
}
\caption
[Distribution of host load]
{Histograms for load on axp0 and axp7 on August 19, 1997.}
\label{fig:statprop.axphist}
\end{figure}

We next treated each trace as a realization of an independent,
identically distributed (IID) stochastic process.  Such a process is
completely described by its probability density function (pdf),
which does not change over time.  Since we have a vast number of data
points for each trace, histograms closely approximate this underlying
pdf.  We examined the histograms of each of our load traces and fitted
normal and exponential distributions to them.  To illustrate the
following discussion, Figure~\ref{fig:statprop.axphist} shows the histograms of
load measurements on (a) axp0 and (b) axp7 on August 19, 1997 (86400
samples each.)  Axp0 has a high mean load, while axp7 is much more
lightly loaded.

Some of the traces, especially those with high mean loads, have
multi-modal histograms.  Figure~\ref{fig:statprop.axphist}(a) is an
example of such a multi-modal distribution while
Figure~\ref{fig:statprop.axphist}(b) shows a unimodal distribution.
Typically, the modes are integer multiples of 1.0 (and occasionally
0.5.)  One explanation for this behavior is that jobs on these
machines are for the most part compute bound and thus the ready queue
length corresponds to the number of jobs.  This seems plausible for
the cluster machines, which run scientific workloads for the most
part.  However, such multi-modal distributions were also observed on
some of the other machines.

The rough appearance of the histograms (consider
Figure~\ref{fig:statprop.axphist}(b)) is due to the fact that the underlying
quantity being measured (ready queue length) is discrete.  Load
typically takes on 600--3000 unique values in these traces.  Shannon's
entropy measure~\cite{SHANNON-COMM-THEORY} indicates that the load
traces can be encoded in 1.4 to 8.5 bits per value, depending on the
trace.  These observations and the histograms suggest that load spends
most of its time in one of a small number of levels.
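The entropy figure comes from the empirical distribution of distinct load values. A self-contained sketch of the computation:

```python
import math
from collections import Counter

def entropy_bits(trace):
    """Shannon entropy of the empirical distribution of load values,
    in bits per sample.  A trace that sits at a single level has zero
    entropy; one spread uniformly over 2^b levels has b bits."""
    n = len(trace)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(trace).values())
```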

%
% This figure includes the fits for the multimodal distributions
%\begin{figure}%[t]
%\centerline{
%\begin{tabular}{cc}
%\epsfxsize=2.5in
%\epsfbox{eps/axp0day.N.qq.eps} &
%\epsfxsize=2.5in
%\epsfbox{eps/axp7day.N.qq.eps} \\
%(a) axp0 versus normal &
%(c) axp7 versus normal \\
%\epsfxsize=2.5in
%\epsfbox{eps/axp0day.Exp.qq.eps} &
%\epsfxsize=2.5in  
%\epsfbox{eps/axp7day.Exp.qq.eps} \\
%(b) axp0 versus exponential &
%(d) axp7 versus exponential \\
%\end{tabular}
%}
%\label{fig:statprop.axpqqplot}
%\caption{Quantile-quantile plots}
%\end{figure}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{eps/axp7day.N.qq.eps} &
\epsfxsize=3in  
\epsfbox{eps/axp7day.Exp.qq.eps} \\
(a) axp7 versus normal &
(b) axp7 versus exponential \\
\end{tabular}
}
\caption
[Quantile-quantile plots, axp7 trace]
{Quantile-quantile plots for axp7 trace of August 19, 1997.}
\label{fig:statprop.axpqqplot}
\end{figure}

The histograms share very few common characteristics and did not
conform well to the analytic distributions we fit to them.
Quantile-quantile plots are a powerful way to assess how well a
distribution fits data (cf.~\cite[pp. 196--200]{JAIN-PERF-ANALYSIS}).
The quantiles of the data set are plotted against the quantiles of
the hypothetical analytic distribution, where the $\alpha$ quantile
of a pdf (or histogram) is the value $x$ at which $100\alpha$\% of
the probability (or data) falls to the left of $x$.  Regardless of
the choice of parameters, the plot will be linear if the data fits
the distribution.
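Constructing the points of such a plot is mechanical once the hypothetical distribution's inverse CDF is available. A Python sketch (the helper and its arguments are ours):

```python
import statistics

def qq_points(data, inv_cdf, n=99):
    """Pair the k/(n+1) quantiles of the data with the matching
    quantiles of a hypothetical distribution given by its inverse
    CDF.  If the data follows the distribution, the points lie on a
    straight line regardless of the distribution's parameters."""
    data_q = statistics.quantiles(data, n=n + 1)      # n interior quantiles
    model_q = [inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
    return list(zip(model_q, data_q))
```

For an exponential fit with rate $\lambda$, for instance, one would pass inv_cdf = lambda p: -math.log(1 - p) / lam.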

We fitted normal and exponential distributions to each of the load
traces. The fits are atrocious for the multimodal traces, and we do
not discuss them here.  For the unimodal traces, the fits are slightly
better.  Figure~\ref{fig:statprop.axpqqplot} shows quantile-quantile
plots for (a) normal and (b) exponential distributions fitted to the
unimodal axp7 load trace of Figure~\ref{fig:statprop.axphist}(b).
Neither the normal or exponential distribution correctly captures the
tails of the load traces.  This can be seen in the figure.  The
quantiles of the data grow faster than those of the normal
distribution toward the right sides of
Figure~\ref{fig:statprop.axpqqplot}(a).  This indicates that the data
has a longer or heavier tail than the normal distribution.
Conversely, the quantiles of the data grow more slowly than those of
the exponential distribution, as can be seen in
Figures~\ref{fig:statprop.axpqqplot}(b).  This indicates that the data
has a shorter tail than the exponential distribution.  Notice that the
exponential distribution goes as $e^{-x}$ while the normal
distribution goes as $e^{-x^2}$.

There are two implications of these complex distributions.  First,
simulation studies and analytic results predicated on simple, analytic
distributions may produce erroneous results.  Clearly, trace-driven
simulation studies are to be preferred.  The second implication is
that prediction algorithms should not only reduce the overall variance
of the load signal, but also produce errors that are better fit by an
analytic distribution.  One reason for this is to make confidence
intervals easier to compute.

\subsection{Time series analysis}

\def\tsfigsize{\epsfxsize=4in}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/axp7_19_97_time.eps} &
\colfigsize
\epsfbox{eps/axp7_19_97_acf.eps} \\
(a)&(b)\\
\colfigsize
\epsfbox{eps/axp7_19_97_fft.eps}  &
\colfigsize
\epsfbox{eps/axp7_19_97_pacf.eps} \\
(d)&(c)\\
\end{tabular}
}
\caption
[Time series analysis, axp7 trace]
{Time series analysis of axp7 load trace collected on August
19, 1997: (a) Load trace,  (b) Autocorrelation
function to lag 600 (10 minutes), (c) Partial autocorrelation function
to lag 600 (10 minutes), (d) Periodogram.}
\label{fig:statprop.axp7ts}
\end{figure}

We examined the autocorrelation function, partial autocorrelation
function, and periodogram of each of the load traces.  These time
series analysis tools show that past load values have a strong
influence on future load values.  For illustration,
Figure~\ref{fig:statprop.axp7ts} shows (a) the axp7 load trace
collected on August 19, 1997, (b) its autocorrelation function to a
lag of 600, (c) its partial autocorrelation function to a lag of 600,
and (d) its periodogram.  The analysis of this trace is representative
of our results.

The autocorrelation function, which ranges from -1 to 1, shows how
well a load value at time $t$ is linearly correlated with its
corresponding load value at time $t+\Delta$---in effect, how well
the value at time $t$ linearly predicts the value at time $t+\Delta$.
Autocorrelation is a function of $\Delta$, and in
Figure~\ref{fig:statprop.axp7ts}(b) we show the results for $0 \le
\Delta \le 600$.  Notice that even at $\Delta=600$ seconds, values are
still strongly correlated.  This very strong, long range correlation
is common to each of the load traces.  
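As an aside, the sample autocorrelation behind plots like
Figure~\ref{fig:statprop.axp7ts}(b) is straightforward to compute.  The
following sketch is purely illustrative (the function name is ours, and
this is not the toolchain used to produce the figures):

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation r(k) for k = 0..max_lag.

    r(k) is the covariance between the series and a copy of itself
    shifted by k samples, normalized by the variance, so r(0) = 1
    and -1 <= r(k) <= 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # work with deviations from the mean
    denom = np.dot(x, x)                  # n times the sample variance
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Example: an alternating series is almost perfectly negatively
# correlated with itself at lag 1 and positively at lag 2.
r = autocorrelation([1.0, -1.0] * 50, max_lag=2)
```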

The partial autocorrelation function shows how well purely
autoregressive linear models capture the correlation structure of a
sequence~\cite[pp. 64--69]{BOX-JENKINS-TS}.  The square of the value
of the function at a lag $k$ indicates the benefit of advancing from a
$(k-1)$-th order model to a $k$-th order model.  Intuitively, if a
$k$-th order purely autoregressive model were sufficient, then the
partial autocorrelation function would be zero beyond a lag of $k$
while the autocorrelation function would tail off with infinite
extent.  In a dual manner, if a $k$-th order purely moving average
model were sufficient, then the autocorrelation function would be zero
beyond a lag of $k$ while the partial autocorrelation function would
tail off with infinite extent.  As we can see from
Figure~\ref{fig:statprop.axp7ts}(b) and (c), both functions have
extremely large extents.  This suggests that mixed models, which
combine autoregressive and moving average components, are appropriate
for modelling host load.
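The partial autocorrelation at lag $k$ is the lag-$k$ coefficient of the
best-fitting purely autoregressive model of order $k$, and it can be
obtained from the autocorrelations by the Durbin-Levinson recursion.  A
minimal sketch, again with hypothetical names and not the code behind
the figures:

```python
import numpy as np

def pacf_from_acf(r):
    """Partial autocorrelations phi(k,k) for k = 1..len(r)-1, given
    autocorrelations r[0..K] with r[0] = 1, via the Durbin-Levinson
    recursion.  phi(k,k) is the lag-k coefficient of the best AR(k) fit."""
    K = len(r) - 1
    pacf = np.zeros(K + 1)
    phi = np.zeros(K + 1)          # AR coefficients at the current order
    prev = np.zeros(K + 1)
    pacf[1] = phi[1] = r[1]
    for k in range(2, K + 1):
        prev[1:k] = phi[1:k]
        num = r[k] - np.dot(prev[1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(prev[1:k], r[1:k])
        pacf[k] = phi[k] = num / den
        for j in range(1, k):      # update the lower-lag coefficients
            phi[j] = prev[j] - phi[k] * prev[k - j]
    return pacf[1:]

# For an AR(1) process with coefficient 0.5, r(k) = 0.5**k, and the
# partial autocorrelation function cuts off after lag 1.
p = pacf_from_acf(np.array([1.0, 0.5, 0.25, 0.125]))
```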

The periodogram of a load trace is the magnitude of the Fourier
transform of the load data, which we plot on a log scale
(Figure~\ref{fig:statprop.axp7ts}(d).)  The periodogram shows the contribution
of different frequencies (horizontal axis) to the signal.  What is
clear in the figure, and is true of all of the load traces, is that
there are significant contributions from all frequencies---the
signal looks much like noise.  We believe the two noticeable peaks to
be artifacts of the kernel sample rate---the kernel is not sampling
the length of the ready queue frequently enough to avoid aliasing.
Only a few of the other traces exhibit the smaller peaks, but they all
share the broad noise-like appearance of this trace.
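A periodogram like that of Figure~\ref{fig:statprop.axp7ts}(d) can be
produced with a discrete Fourier transform.  The following sketch uses
the magnitude convention described above; it is an illustration (the
scaling and plotting details of the actual figures may differ):

```python
import numpy as np

def periodogram(x):
    """Magnitude of the DFT of a real-valued series (the quantity
    plotted in the text); entry j corresponds to frequency
    j / len(x) cycles per sample."""
    return np.abs(np.fft.rfft(np.asarray(x, dtype=float)))

# A sinusoid completing 8 cycles in 128 samples puts all of its
# (half-spectrum) energy in frequency bin 8, with magnitude 128/2 = 64.
n = np.arange(128)
pgram = periodogram(np.sin(2 * np.pi * 8 * n / 128))
```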

There are several implications of this time series analysis.  First,
the existence of such strong autocorrelation implies that load
prediction based on past load values is feasible.  Furthermore, it
suggests that linear time series models may be appropriate for this
prediction.  Such models attempt to parsimoniously capture non-zero
autocorrelation functions.  The existence of the strong
autocorrelation also suggests that simulation models and analytical
work that eschews this very clear dependence may be in error.
Finally, the almost noise-like periodograms suggest that quite
complex, possibly nonlinear models will be necessary to produce or
predict load.


\section{Self-similarity}
\label{sec:statprop.self-sim}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\epsfxsize=2.35in
\epsfbox{eps/axp7.1.eps} &
\epsfxsize=2.35in
\epsfbox{eps/axp7.5.eps} \\
10 days &
64 minutes \\
\epsfxsize=2.35in
\epsfbox{eps/axp7.2.eps} &
\epsfxsize=2.35in
\epsfbox{eps/axp7.6.eps} \\
2.5 days &
16 minutes \\
\epsfxsize=2.35in
\epsfbox{eps/axp7.3.eps} &
\epsfxsize=2.35in
\epsfbox{eps/axp7.7.eps} \\
16 hours &
4 minutes \\
\epsfxsize=2.35in
\epsfbox{eps/axp7.4.eps} &
\epsfxsize=2.35in
\epsfbox{eps/axp7.8.eps} \\
4 hours &
60 seconds \\
\end{tabular}
}
\caption
[Visual representation of self-similarity]
{Visual representation of self-similarity.  Each graph plots
load versus time and ``zooms in'' on the middle quarter of the
previous graph.}
\label{fig:statprop.visual_self_sim}
\end{figure}

The key observation of this section is that each of the load traces
exhibits a high degree of self-similarity.  This is significant for
two reasons.  First, it means that load varies significantly across
all time-scales---it is not the case that increasing smoothing of
the load quickly tames its variance. A job will have a great deal of
variance in its running time regardless of how long it is.  Second, it
suggests that load is difficult to model and predict well.  In
particular, self-similarity is indicative of long memory, possibly
non-stationary stochastic processes such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS},
and fitting such models to data and evaluating them can be quite
expensive.

Figure~\ref{fig:statprop.visual_self_sim} visually demonstrates the
self-similarity of the axp7 load trace.  The top left graph in the figure
plots the load on this machine versus time for 10 days.  Each
subsequent graph ``zooms in'' on the highlighted central 25\% of the
previous graph, until we reach the bottom right graph, which shows the
central 60 seconds of the trace.  The plots are scaled to make the
behavior on each time scale obvious.  In particular, over longer time
scales, wider vertical (load) scales are necessary.  Intuitively, a self-similar
signal is one that looks similar on different time scales given this
rescaling.  Although the behavior on the different graphs is not
identical, we can clearly see that there is significant variation on
all time scales.

An important point is that as we smooth the signal (as we do visually
as we ``zoom out'' toward the top of the page in
Figure~\ref{fig:statprop.visual_self_sim}), the load signal strongly
resists becoming uniform.  This suggests that low frequency components
are significant in the overall mix of the signal, or, equivalently,
that there is significant long range dependence.  It is this property
of self-similar signals that most strongly differentiates them and
causes significant modeling difficulty.   

Self-similarity is more than intuition---it is a well defined
mathematical statement about the relationship of the autocorrelation
functions of increasingly smoothed versions of certain kinds of
long-memory stochastic processes.  These stochastic processes model
the sort of mechanisms that give rise to self-similar signals.  We
shall avoid a mathematical treatment here, but interested readers may
want to consult~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93}
or~\cite{IMPACT-SELF-SIM-NETWORK-PERF-ANALYSIS-MORIN} for a treatment
in the context of networking
or~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED} for its connection to
fractal geometry, or~\cite{BERAN-STAT-LONG-RANGE-METHODS} for a
treatment from a linear time series point of view.   Interestingly,
self-similarity has revolutionized network traffic modelling in the
1990s~\cite{GARRETT-VBR-VIDEO-MODELLING,SELF-SIM-ETHERNET-LELAND-SIGCOMM93,IMPACT-SELF-SIM-NETWORK-PERF-ANALYSIS-MORIN,SELF-SIM-HIGH-VAR-ETHERNET-WILLINGER-SIGCOMM95}.


\begin{figure}
\centerline{
\begin{tabular}{cc}
\epsfxsize=3.1in
\epsfbox{eps/self_sim_psd_example.eps} &
\epsfxsize=3in
\epsfbox{eps/self_sim_rel_disp_example.eps} \\
(a) & (b) 
\end{tabular}
}
\caption
[Understanding the Hurst parameter]
{Meaning of the Hurst parameter: (a) Frequency domain
interpretation using power spectral analysis, (b) Effect of increased
smoothing using dispersional analysis. }
\label{fig:statprop.explain_hurst}
\end{figure}

The degree and nature of the self-similarity of a sequence is
summarized by the Hurst parameter, $H$~\cite{HURST51}.  Intuitively,
$H$ describes the relative contribution of low and high frequency
components to the signal.  Consider Figure~\ref{fig:statprop.explain_hurst}(a),
which plots the periodogram (the magnitude of the Fourier transform)
of the axp7 load trace of August 19, 1997 on a log-log scale.  In this
transformed form, we can describe the trend with a line of slope
$-\beta$ (meaning that the periodogram decays hyperbolically with
frequency $\omega$ as $\omega^{-\beta}$).  The Hurst parameter $H$ is
then defined as $H=(1+\beta)/2$.  As we can see in
Figure~\ref{fig:statprop.explain_hurst}(a), $H=0.5$ corresponds to a line of zero
slope.  This is the uncorrelated noise case, where all frequencies
make a roughly equal contribution.  As $H$ increases beyond $0.5$, we
see that low frequencies make more of a contribution.  Similarly, as
$H$ decreases below $0.5$, low frequencies make less of a
contribution.  $H>0.5$ indicates self-similarity with positive near
neighbor correlation, while $H<0.5$ indicates self-similarity with
negative near neighbor correlation.  Figure~\ref{fig:statprop.explain_hurst}(a)
shows that the axp7 trace is indeed self-similar with $H=0.875$.  This
method of determining the Hurst parameter is known as power spectral
analysis.
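Power spectral analysis can be sketched directly from this definition:
fit a line to the periodogram on log-log axes and convert the slope
$-\beta$ to $H=(1+\beta)/2$.  The sketch below is an illustrative
reimplementation, not the validated estimator used for our results:

```python
import numpy as np

def hurst_spectral(x):
    """Estimate the Hurst parameter by power spectral analysis: fit a
    line to the log-log periodogram; if power decays as omega**(-beta),
    then H = (1 + beta) / 2."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x))
    # Skip the zero-frequency bin, which has no defined log frequency.
    slope, _ = np.polyfit(np.log10(freqs[1:]), np.log10(power[1:]), 1)
    beta = -slope
    return (1.0 + beta) / 2.0

# Uncorrelated noise has a flat spectrum (beta near 0), so the
# estimate should come out near H = 0.5.
rng = np.random.default_rng(0)
h = hurst_spectral(rng.standard_normal(1 << 14))
```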

Figure~\ref{fig:statprop.explain_hurst}(b) illustrates another way to think
about the Hurst parameter $H$.  To create the figure, we binned the
load trace with increasingly larger, non-overlapping bins and then
plotted the relative dispersion (the standard deviation normalized by
the mean) of the binned series versus the size of the bins.  For
example, at a bin size of 8 we averaged the first 8 samples of the
original series to form the first sample of the binned series, the
next 8 to form the second sample, and so on.  The figure shows that
the relative dispersion of this new 8-binned series is slightly less
than one.  If the original load trace is self-similar, the relative
dispersion should decline hyperbolically with increasing bin size.  On
a log-log scale such as we use in the figure, this relationship would
appear to be linear with a slope of $H-1$.  We see that this is indeed
the case for the axp7 trace.  Notice that this method for estimating
$H$, which is called dispersional analysis, gives a slightly different
$H$ (0.95) than power spectral analysis.  What is important is that in
both figures we see a hyperbolic relationship, and that both estimates
for $H$ are much larger than 0.5.
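Dispersional analysis follows the same recipe in the time domain.  A
hedged sketch (illustrative names; not the Matlab implementation we
describe below, which was validated against series of known $H$):

```python
import numpy as np

def hurst_dispersional(x, bin_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter by dispersional analysis: bin the
    series into non-overlapping bins of increasing size, compute the
    relative dispersion (std / mean) of each binned series, and fit a
    line on log-log axes; the slope is H - 1."""
    x = np.asarray(x, dtype=float)
    rd = []
    for m in bin_sizes:
        n = (len(x) // m) * m              # whole number of bins
        binned = x[:n].reshape(-1, m).mean(axis=1)
        rd.append(binned.std() / binned.mean())
    slope, _ = np.polyfit(np.log10(bin_sizes), np.log10(rd), 1)
    return 1.0 + slope

# For uncorrelated samples the dispersion falls as m**-0.5, giving H
# near 0.5 (recall the method is weak for small H, so expect only
# rough agreement).  A positive mean, as load has, is required.
rng = np.random.default_rng(1)
h = hurst_dispersional(rng.standard_normal(1 << 16) + 10.0)
```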

%The idea of self-similarity and the statistic used to characterize it
%arose not in theory, but from the experimental physical sciences and
%engineering.  Hurst~\cite{HURST51} invented the notion and the
%statistic (now called the ``Hurst parameter'') in order to
%characterize how the amount of water in reservoirs changes over time.
%He then applied it to summarize many other data series arising from
%natural processes.  The Hurst parameter $H$ ranges from zero to one.
%$H=0.5$ indicates no self-similarity and is expected of a series
%generated from a stationary probability distribution where a sample is
%completely independent of all previous samples.  $H>0.5$ indicates
%self-similarity with positive near neighbor correlation, while $H<0.5$
%indicates self-similarity with negative near neighbor correlation.  



\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize 
\epsfbox{eps/hurst_mean_sdev_automated_aug97.eps}  &
\colfigsize
\epsfbox{eps/hurst_mean_sdev_automated_mar98.eps} \\
(a)& (b) \\
\end{tabular}
}
\caption
[Hurst parameter estimates, all traces]
{Hurst parameter estimates (mean of four point estimates and
standard deviation) : (a) August, 1997 traces, (b) March, 1998
traces.}
\label{fig:statprop.load_trace_self_sim_mean}
\end{figure}

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize 
\epsfbox{eps/hurst_all_automated_aug97.eps}  &
\colfigsize
\epsfbox{eps/hurst_all_automated_mar98.eps} \\
(a)& (b) \\
\end{tabular}
}
\caption
[Hurst parameter point estimates, all traces]
{Hurst parameter point estimates:  (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.load_trace_self_sim_all}
\end{figure}

We examined each of the load traces for self-similarity and estimated
each one's Hurst parameter.  There are many different estimators for
the Hurst parameter~\cite{TAQQU-HURST-ESTIMATORS-COMPARE}, but there
is no consensus on how to best estimate the Hurst parameter of a
measured series.  The most common technique is to use several Hurst
parameter estimators and try to find agreement among them.  The four
Hurst parameter estimators we used were R/S analysis, the
variance-time method, dispersional analysis, and power spectral
analysis.  A description of these estimators as well as several others
may be found in~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED}.  The
advantage of these estimators is that they make no assumptions about
the stochastic process that generated the sequence.  However, they
also cannot provide confidence intervals for their estimates.
Estimators such as the Whittle
estimator~\cite{BERAN-STAT-LONG-RANGE-METHODS} can provide confidence
intervals, but a stochastic process model must be assumed over which
an $H$ can be found that maximizes its likelihood.

We implemented the estimators using Matlab and validated each one by
examining degenerate series with known $H$ and series with specific
$H$ generated using the random midpoint displacement method.  The
dispersional analysis method was found to be rather weak for $H$
values less than about $0.8$ and the power spectral analysis method
gave the most consistent results.

Figure~\ref{fig:statprop.load_trace_self_sim_mean} presents our estimates of
the Hurst parameters of each of the load traces.  In the graph, each
central point is the mean of the four estimates, while the outlying
points are at +/- one standard deviation.
Figure~\ref{fig:statprop.load_trace_self_sim_all} shows the four actual point
estimates for each trace.  Notice that for small $H$ values,
the standard deviation is high due to the inaccuracy of dispersional
analysis.  The important point is that the mean Hurst estimates are
all significantly above the $H=0.5$ line and their +/- one standard
deviation points are also above the line.

The traces exhibit self-similarity with Hurst parameters ranging from
0.73 to 0.99, with a strong bias toward the top of that range.  An
examination of the correlation coefficients (CCs) in
Figure~\ref{fig:statprop.cross_correl} shows some surprising results.  For the
August, 1997 traces, the Hurst parameter is positively correlated with
mean load (CC=0.45) and the standard deviation of load (CC=0.58), but
is negatively correlated with the max/mean load ratio (CC=-0.49).  On
the other hand, for the March, 1998 traces, the Hurst parameter is
negatively correlated with mean load (CC=-0.30) and the standard
deviation of load (CC=-0.41), but is positively correlated with the
max/mean ratio (CC=0.36).  Furthermore, in the March, 1998 traces, we
find the Hurst parameter is strongly positively correlated with the
epoch statistics, which is not the case at all for the August, 1997
traces.  It is not clear what the cause for this difference between
the traces is.

As we discussed above, self-similarity has implications for load
modeling and for load smoothing.  The long memory stochastic process
models that can capture self-similarity tend to be expensive to fit to
data and evaluate.  Smoothing the load (by mapping large units of
computations instead of small units, for example) may be misguided
since variance may not decline with increasing smoothing intervals as
quickly as expected.  Consider smoothing load by averaging
over an interval of length $m$.  Without long range dependence
($H=0.5$), variance would decay with $m$ as $m^{-1.0}$; with
long range dependence, it decays as $m^{2H-2}$, which ranges from
$m^{-0.54}$ to $m^{-0.02}$ for the Hurst parameters we measured.
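To make the arithmetic concrete, the following sketch compares the
variance reduction from averaging over $m=60$ samples (one minute of
1 Hz samples, an assumed example) with and without long range
dependence:

```python
# Variance reduction from averaging load over m samples: without long
# range dependence, variance shrinks by a factor of m**-1; with long
# range dependence, only by m**(2H - 2).
def variance_decay_factor(m, H):
    return m ** (2 * H - 2)

m = 60  # e.g., one-minute smoothing of 1 Hz samples
no_lrd = variance_decay_factor(m, 0.5)   # m**-1: variance drops 60x
h_low  = variance_decay_factor(m, 0.73)  # m**-0.54: only about 9x
h_high = variance_decay_factor(m, 0.99)  # m**-0.02: barely any reduction
```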

\section{Epochal behavior}
\label{sec:statprop.epochal_behavior}

The key observation in this section is that while load varies
in complex ways, the manner in which it changes remains relatively
constant for relatively long periods of time.  We refer to a period of
time in which this stability holds true as an epoch.  For example, the
load signal could be a 0.25 Hz sine wave for a minute and a 0.125 Hz
sawtooth wave the next minute---each minute is an epoch.  That these
epochs exist and are long is significant because it suggests that
modeling load can be simplified by modeling epochs separately from
modeling the behavior within an epoch.  Similarly, it suggests a two
stage prediction process.

The spectrogram representation of a load trace immediately highlights
the epochal behavior we discuss in this section. A spectrogram
combines the frequency domain and time domain representations of a
time series.  It shows how the frequency domain changes locally (for a
small segment of the signal) over time.  For our purposes, this local
frequency domain information is the ``manner in which [the load]
changes'' to which we referred earlier.  To form a spectrogram, we
slide a window of length $w$ over the series, and at each offset $k$,
we Fourier-transform the $w$ elements in the window to give us $w$
complex Fourier coefficients.  Since our load series is real-valued,
only the first $w/2$ of these coefficients are needed.  We form a plot
where the $x$ coordinate is the offset $k$, the $y$ coordinate is the
coefficient number, $1,2,\dots,w/2$ and the $z$ coordinate is the
magnitude of the coefficient.  To simplify presentation, we collapse
to two dimensions by mapping the logarithm of the $z$ coordinate (the
magnitude of the coefficient) to color.  The spectrogram is basically
a midpoint in the tradeoff between purely time-domain and
frequency-domain representations.  Along the $x$ axis we see the
effects of time and along the $y$ axis we see the effects of
frequency.
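The construction just described can be sketched as follows (an
illustration with hypothetical names, not the code behind the figures;
color mapping is omitted):

```python
import numpy as np

def spectrogram(x, w, hop=1):
    """Sliding-window spectrogram: at each offset, Fourier-transform w
    samples and keep the magnitudes of the first w//2 + 1 coefficients
    (all that are needed for a real-valued signal).  Rows are offsets,
    columns are frequency bins; a plot would map log magnitude to color."""
    x = np.asarray(x, dtype=float)
    offsets = range(0, len(x) - w + 1, hop)
    return np.array([np.abs(np.fft.rfft(x[k:k + w])) for k in offsets])

# Shape check on a short synthetic signal: 13 offsets, 33 bins.
s = spectrogram(np.sin(np.arange(256) * 0.3), w=64, hop=16)
```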

\begin{figure}
\centerline{
\begin{tabular}{c}
\tsfigsize
\hspace{0.1in} \epsfbox{eps/axp7_19_97_time.eps}
\\
\tsfigsize
\epsfbox{eps/axp7_19_97_spec.eps}
\end{tabular}
}
\caption
[Epochal behavior]
{Time domain and spectrogram representations of load for host axp7 on August 19, 1997. }
\label{fig:statprop.specgrams}
\end{figure}


Figure~\ref{fig:statprop.specgrams} shows a representative case, a 24 hour
trace from the PSC host axp7, taken on August 19, 1997.  The top graph
shows the time domain representation, while the bottom graph shows the
corresponding spectrogram representation.
%~\footnote{Notice that there
%is evidence of foldover from a too-low sampling rate.  We are
%convinced that the problem lies in how often DUX actually updates its
%load value and not in our sampling.  One explanation is that DUX does
%not update the load average whenever the length of the ready queue
%changes.  Another is that it does not update it when running device
%drivers or other in-kernel tasks.} 
What is important to note (and which occurs in all the spectrograms of
all the traces) are the relatively wide vertical bands.  These
indicate that the frequency domain of the underlying signal stays
relatively stable for long periods of time.  We refer to the width of
a band as the duration of that frequency epoch.

That these epochs exist can be explained by programs executing
different phases, programs being started and shut down, and the like.
The frequency content within an epoch contains energy at higher
frequencies because of events that happen on smaller time-frames, such
as user input, device driver execution, and daemon execution.  

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/epoch_sdev_mean_aug97.eps} &
\colfigsize
\epsfbox{eps/epoch_sdev_mean_march98.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption
[Mean epoch length +/- one standard deviation, all traces]
{Mean epoch length +/- one standard deviation:  (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.sdev_mean_epoch}
\end{figure}


We can algorithmically find the edges of these epochs by computing the
difference in adjacent spectra in the spectrogram and then looking for
those offsets where the differences exceed a threshold.  Specifically,
we compute the sum of mean squared errors for each pair of adjacent
spectra.  The elements of this error vector are compared to an epsilon
(5\% here) times the mean of the vector.  Where this threshold is
exceeded, a new epoch is considered to begin.  Having found the
epochs, we can examine their statistics.
Figure~\ref{fig:statprop.sdev_mean_epoch} shows the mean epoch length and the
+/- one standard deviation levels for each of the load traces.  The
mean epoch length ranges from about 150 seconds to over 450 seconds,
depending on which trace.  The standard deviations are also relatively
high (80 seconds to over 600 seconds.)  It is the Production Cluster
class which is clearly different when it comes to epoch length.  The
machines in this class tend to have much higher means and standard
deviations than the other machines.  One explanation might be that
most of the machines run batch-scheduled scientific jobs which may
well have longer computation phases and running times.  However, two
of the interactive machines also exhibit high means and standard
deviations. Interestingly, there is no correlation of the mean epoch
length and standard deviation to the mean load for either set of
traces (Figure~\ref{fig:statprop.cross_correl}.)  However, for the March, 1998
traces, we find correlations between the epoch length statistics and
the Hurst parameter that are particularly strong.  It is not clear why
this difference exists between the traces.
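The edge-finding procedure above can be sketched as follows.  This is a
simplified reimplementation using non-overlapping windows, with the
thresholding details as we described them; the names and exact details
here are assumptions, not the code used for the figures:

```python
import numpy as np

def epoch_boundaries(x, w, epsilon=0.05):
    """Simplified epoch finder: take the spectrum of each
    non-overlapping window of length w, compute the mean squared
    difference between adjacent spectra, and mark a new epoch wherever
    that difference exceeds epsilon times the mean of the differences.
    Returns the window indices at which new epochs begin."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // w) * w                   # whole number of windows
    spectra = np.abs(np.fft.rfft(x[:n].reshape(-1, w), axis=1))
    diffs = np.mean((spectra[1:] - spectra[:-1]) ** 2, axis=1)
    threshold = epsilon * diffs.mean()
    return [k + 1 for k, d in enumerate(diffs) if d > threshold]

# Synthetic trace: 8 windows of one bin-aligned sine followed by 8
# windows of another; the only spectral change is at window 8.
w, t = 64, np.arange(512)
x = np.concatenate([np.sin(2 * np.pi * 4 * t / w),
                    np.sin(2 * np.pi * 16 * t / w)])
b = epoch_boundaries(x, w)
```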

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/epoch_cov_mean_aug97.eps} &
\colfigsize
\epsfbox{eps/epoch_cov_mean_mar98.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption
[COV of epoch length and mean epoch length, all traces]
{COV of epoch length and mean epoch length:  (a) August, 1997
traces, (b) March, 1998 traces.}
\label{fig:statprop.cov_mean_epoch}
\end{figure}

The standard deviations of epoch length in
Figure~\ref{fig:statprop.sdev_mean_epoch} give us an absolute measure of the
variance of epoch length.  Figure~\ref{fig:statprop.cov_mean_epoch} shows the
coefficient of variance (COV) of epoch length and mean epoch length
for each trace.  The COV is our relative measure of epoch length
variance.  Unlike with load (Section~\ref{sec:statprop.stats}), these absolute
and relative measures of epoch length variance are {\em both}
positively correlated with the mean epoch length.  In addition, the
correlation is especially strong (the CCs are at least 0.88.)  As
epoch length increases, it varies more in both absolute and relative
terms.  The statistical properties of epoch lengths are independent of
the statistical properties of load.

The implication of long epoch lengths is that the problem of
predicting load may be decomposable into a segmentation problem
(finding the epochs) and a sequence of smaller prediction subproblems
(predicting load within each epoch.)

Strictly speaking, epochal behavior means that load is not stationary.
However, it is also not free to wander at will---clearly load cannot
rise to infinite levels or fall below zero.  This is compatible with
the ``borderline stationarity'' implied by self-similarity.  It is
also important to note that the nonstationarity is not ``smooth'' and
therefore is difficult to model with an integration process, such as
in an ARIMA or ARFIMA model.

\subsection{Lack of seasonality}

The epochal behavior that the load traces exhibit is not the same
thing as seasonality in the time series analysis
sense~\cite{INTRO-TIME-SERIES-FORECASTING,BOX-JENKINS-TS}.
Seasonality means that there are dominant (or at least visible)
underlying periodic signals on top of which are layered other signals.
It is not unreasonable to expect seasonality given that other
studies~\cite{WORKSTATION-AVAIL-STATS-CONDOR} have found that the
availability of compute resources changes regularly over the hours
of the working day and the days of the working week.  However,
examination of the power spectra and autocorrelations of the load
traces suggests that load does {\em not} exhibit seasonality.  We feel
this does not contradict the earlier results---the fluctuation of
resources simply is not sufficiently periodic to qualify as
seasonality in the strict time series sense.


\section{Conclusions}

This chapter has applied the first four steps of the resource signal
methodology to host load signals.  We identified host load as a
powerful way to measure CPU availability, developed a sensor for this
signal, and determined an appropriate sample rate.  We collected long,
fine grain load measurements on a wide variety of machines at two
different times of the year.  The results of an extensive statistical
analysis of these traces and their implications are the following:

(1) The traces exhibit low means but very high standard deviations and
maximums.  Relatively few of the traces had mean loads of 1.0 or more.
The standard deviation is typically at least as large as the mean,
while the maximums can be as much as two orders of magnitude larger.
The implication is that these machines have plenty of cycles to spare
to execute jobs, but the running time of these jobs will vary
drastically.

(2) Standard deviation and maximum, which are absolute measures of
variation, are positively correlated with the mean, so a machine with
a high mean load will also tend to have a large standard deviation and
maximum.  However, these measures do not grow as quickly as the mean,
so their corresponding relative measures actually {\em shrink} as the
mean increases.  The implication is that if the goal in scheduling a
task is to optimize a relative performance metric, it may not be
unreasonable to use the host with higher mean load.

(3) The traces have complex, rough, and often multi-modal distributions
that are not well fitted by analytic distributions such as the normal
or exponential distributions.  Even for the traces which exhibit
unimodally distributed load, the normal distribution's tail is too
short while the exponential distribution's tail is too long.  The
implication is that modeling and simulation that assumes convenient
analytical load distributions may be flawed.  

(4) Time series analysis of the traces shows that load is strongly
correlated over time.  The autocorrelation function typically decays
very slowly while the periodogram shows a broad, almost noise-like
combination of all frequency components.  An important implication is
that history-based load prediction schemes, such as linear time series
models, seem very feasible.  However, the complex frequency domain
behavior suggests that linear modeling schemes may have difficulty.
From a modeling point of view, it is clearly important that these
dependencies between successive load measurements are captured.

(5) The traces are self-similar.  Their Hurst parameters range from
0.73 to 0.99, with a strong bias toward the top of that range.  This
tells us that load varies in complex ways on all time scales and is
long term dependent.  This has several important implications.  First,
smoothing load by averaging over an interval results in much smaller
decreases in variance than if load were not long range dependent.
This suggests that task migration in the face of adverse load
conditions may be preferable to waiting for the adversity to be
ameliorated over the long term.  The self-similarity result also
suggests certain modeling approaches, such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}
which can capture this property.

(6) The traces display epochal behavior.  The local frequency content
of the load signal remains quite stable for long periods of time
(150--450 seconds mean) and changes abruptly at the boundaries of such
epochs.  This suggests that the problem of predicting load may be
decomposable into a sequence of smaller subproblems.  In
particular, when using linear models, none of which can capture this
sort of ``rough'' nonstationarity, it may be necessary to refit the
models at epoch boundaries. 

Given these properties, we decided to study the performance of
Box-Jenkins linear time series models~\cite{BOX-JENKINS-TS} and
ARFIMA
models~\cite{GRANGER-JOYEUX-FRACT-DIFF,HOSKING-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}
for short range prediction of host load.  The next chapter describes
how we applied the fifth step of the resource signal methodology to
find which of these models was most appropriate.  The final step,
implementing an on-line host load prediction system based on the
appropriate model was easy using the RPS Toolkit.


\comment{
The more complex models do
indeed tend to {\em fit} the data of a load trace better than simple
models.  For example, fractional ARFIMA models had as much as 40\%
lower mean squared error in some cases.  Surprisingly, however, we
found that simple autoregressive models performed sufficiently well
for short range {\em prediction}~\cite{DINDA-LOAD-PRED-TR-98}.  While
there were statistically significant differences between the models we
studied, they were not sufficiently large to warrant the use of more
complex models.  Our recommendation is to use autoregressive models of
order 16 or higher for prediction horizons of up to 30 seconds.

%The context of our work is dynamically mapping real-time tasks to
%hosts in a distributed environment. In this context, we are interested
%in load models for classifying environments and hosts, synthesizing
%realistic running times in a simulator, and for predicting running
%times on remote hosts.  We will gladly share the load traces we have
%collected with members of the research community.
}

\comment{
Epochal behavior is different from
seasonality: epoch lengths have significant variation and there is
significant and complex behavior within each epoch.  In fact, our
traces do not appear to exhibit seasonality at all.  Surprisingly, the
statistical properties of epoch lengths are independent of the
statistical properties of load and the Hurst parameter.  Both absolute
and relative measures of the variance in epoch length are strongly
correlated to the mean epoch length, however.  
}

\comment{
\begin{enumerate}

\item{The traces exhibit low means but very high
standard deviations and maximums.  This implies that these machines
have plenty of cycles to spare to execute jobs, but the running time
of these jobs will vary drastically.}

\item{Absolute measures of variation are positively correlated with
the mean while relative measures are negatively correlated.  This
suggests that it may not be unreasonable to map tasks to heavily
loaded machines under some performance metrics.}

\item{The traces have complex, rough, and often multimodal distributions that
are not well fitted by analytic distributions such as the normal or
exponential distributions, which are particularly inept at capturing
the tail of the distribution.  This implies that modeling and
simulation that assumes convenient analytical load distributions may
be flawed.  Trace-driven simulation may be preferable.}

\item{Load is strongly correlated over time, but has a
broad, almost noise-like frequency spectrum.  This implies that
history-based load prediction schemes are feasible, but that linear
methods may have difficulty.  Realistic load models should capture
this dependence, or trace-driven simulation should be used.}

\item{The traces are self-similar with relatively high Hurst
parameters. This means that load smoothing will decrease variance much
more slowly than expected.  It may be preferable to migrate tasks in
the face of adverse load conditions instead of waiting for the
adversity to be ameliorated over the long term.  Self-similarity also
suggests certain modeling approaches such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}
and non-linear models which can capture the self-similarity property.}

\item{The traces display epochal behavior in that the local frequency
content of the load signal remains quite stable for long periods of time
and changes abruptly at the boundaries of such epochs.  This suggests
that the problem of predicting load may be able to be decomposed into
a sequence of smaller subproblems.}

\end{enumerate}
}






