% most useful Latex command
\newcommand{\comment}[1]{}

\documentclass[12pt,twoside]{report}
\usepackage{fullpage}
\usepackage{cmu-titlepage}
\usepackage{epsf}
\usepackage{times}
%\usepackage{changebar}
%\usepackage{lscape}

%\input{restore}

%\renewcommand{\dbltopfraction}{.9}
\renewcommand{\topfraction}{.9}
\renewcommand{\textfraction}{.1}
\renewcommand{\floatpagefraction}{.9}
%\renewcommand{\dbltopnumber}{1}
%\renewcommand{\dblfloatpagefraction}{.9}
% \def\normspacing {\renewcommand{\baselinestretch}{0.9}}
% \def\tinyspacing {\renewcommand{\baselinestretch}{0.7}}

\def\normspacing {}
\def\tinyspacing {}

\normspacing

\def\thesection       {\arabic{section}}
\def\thefigure{\arabic{figure}}

%\newcounter {subsubsubsection}[subsubsection]
%\def\thesubsubsubsection {\thesubsubsection.\arabic{subsubsubsection}}
%\def\subsubsubsection {\subsubsection*}

\makeatletter\long\def\unmarkedfootnote#1{{\long\def\@makefntext##1{##1}\footnotetext{#1}
}}

\def\figsize {\epsfxsize=5.0in} % \epsfysize=2.5in}
\def\absfigsize {\epsfxsize=5.1in}
\def\relfigsize {\epsfxsize=5.1in} 

\begin{document}

\bibliographystyle{acm}

\title{The Statistical Properties of Host Load}

\author{Peter A. Dinda \and David R. O'Hallaron}

\citationinfo{
A version of this paper will appear in the Proceedings of the Fourth
Workshop on Languages, Compilers, and Run-time Systems for Scalable
Computers (LCR98). 
}

\date{July 1998}

\trnumber{CMU-CS-98-143}

\keywords{load traces, load statistics, statistical analysis, 
time-series analysis, self-similarity, epochal behavior}


\support{
%\small
%\tinyspacing
Effort sponsored in part by the Advanced Research Projects Agency and
Rome Laboratory, Air Force Materiel Command, USAF, under agreement
number F30602-96-1-0287, in part by the National Science Foundation
under Grant CMS-9318163, and in part by a grant from the Intel
Corporation. The U.S. Government is authorized to reproduce and
distribute reprints for Governmental purposes notwithstanding any
copyright annotation thereon.  The views and conclusions contained
herein are those of the authors and should not be interpreted as
necessarily representing the official policies or endorsements, either
expressed or implied, of the Advanced Research Projects Agency, Rome
Laboratory, or the U.S. Government.
%\normalsize
%\normspacing
}

\abstract{
Understanding how host load changes over time is instrumental in
predicting the execution time of tasks or jobs, such as in dynamic
load balancing and distributed soft real-time systems.  To improve
this understanding, we collected week-long, 1 Hz resolution Unix load
average traces on 38 different machines including production and
research cluster machines, compute servers, and desktop workstations
Separate sets of traces were collected at two different times of the
year. The traces capture all of the dynamic load information available
to user-level programs on these machines.  We present a detailed
statistical analysis of these traces here, including summary
statistics, distributions, and time series analysis results. Two
significant new results are that load is self-similar and that it
displays epochal behavior.  All of the traces exhibit a high degree of
self similarity with Hurst parameters ranging from .63 to .97,
strongly biased toward the top of that range.  The traces also display
epochal behavior in that the local frequency content of the load
signal remains quite stable for long periods of time (150-450 seconds
mean) and changes abruptly at epoch boundaries.
}

\maketitle

\cleardoublepage


\section{Introduction}
\label{sec:intro}
The distributed computing environments to which most users have access
consist of a collection of loosely interconnected hosts running
vendor operating systems.  Tasks are initiated independently by users
and are scheduled locally by a vendor supplied operating system; there
is no global scheduler that controls access to the hosts. As users run
their jobs the computational load on the individual hosts changes over
time.

Deciding how to map computations to hosts in systems with such
dynamically changing loads (what we will call the {\em mapping
problem}) is a basic problem that arises in a number of important
contexts, such as dynamically load-balancing the tasks in a parallel
program ~\cite{JADE-LANGUAGE,DOME-TECHREPORT,SIEGELL-DYNAMIC-LB-94},
and scheduling tasks to meet deadlines in a distributed soft real-time
system~\cite{REAL-TIME-MANIFESTO-JENSEN,REALTIME-CORBA-WHITEPAPER,PREDICTABLE-NETWORK-COMPUTING-POLZE-ICDCS97}.

Host load has a significant effect on running time.  Indeed, the
running time of a compute bound task is directly related to the
average load it encounters during execution.  Determining a good
mapping of a task requires a prediction, either implicit or explicit,
of the load on the prospective remote hosts to which the task could be
mapped.  Making such predictions demands an understanding of the
qualitative and quantitative properties of load on real systems.  If
the tasks to be mapped are short, this understanding of load should
extend to correspondingly fine resolutions.  Unfortunately, to date
there has been little work on characterizing the properties of load at
fine resolutions.  The available studies concentrate on understanding
functions of load, such as
availability~\cite{WORKSTATION-AVAIL-STATS-CONDOR} or job
durations~\cite{LIMITED-BENEFIT-MIGRATION-EAGER,LOAD-BAL-HEURISTICS-PROCESS-BEHAVIOR-LELAND-OTT,EXPLOIT-PROC-LIFETIME-DIST-DYN-LB-SIGMETRICS}.
Furthermore, they deal with the coarse grain behavior of load --- how
it changes over minutes, hours and days.

This paper is a first step to a better understanding the properties of
load on real systems at fine resolutions.  We collected week-long, 1
Hz resolution traces of the Unix load average on 38 different machines
that we classify as production and research cluster machines, compute
servers, or desktop workstations. We collected two sets of such traces
at different times of the year.  The 1 Hz sample rate is sufficient to
capture all of the dynamic load information that is available to
user-level programs running on these machines.  In this paper, we
present a detailed statistical analysis of the first set of traces,
taken in August, 1997.  The results are similar for the second set, so
we have omitted them to save space.  We also contemplate the
implications of those properties for the mapping problem.

The basic question is whether load traces that might seem at first
glance to be random and unpredictable might have structure that could
be exploited by a mapping algorithm.  Our results suggest that load
traces do indeed have some structure in the form of clearly
identifiable properties. In essence, our results characterize how load
varies, which should be of interest not only to developers of mapping
and prediction algorithms, but also to those who need to generate
realistic synthetic loads in simulators or to those doing analytic
work.  Here is a summary of our results and their implications:

(1) The traces exhibit low means but very high standard deviations and
maximums.  Only four traces had mean loads near 1.0.  The standard
deviation is typically at least as large as the mean, while the
maximums can be as much as two orders of magnitude larger.  The
implication is that these machines have plenty of cycles to spare to
execute jobs, but the execution time of these jobs will vary
drastically.

(2) Standard deviation and maximum, which are absolute measures of
variation, are positively correlated with the mean, so a machine with
a high mean load will also tend to have a large standard deviation and
maximum.  However, these measures do not grow as quickly as the mean,
so their corresponding relative measures actually {\em shrink} as
the mean increases.  The implication is that if the mapping problem
assumes a relative metric, it may not be unreasonable to use the host
with higher mean load.

(3) The traces have complex, rough, and often multimodal distributions
that are not well fitted by analytic distributions such as the normal
or exponential distributions.  Even for the traces which exhibit
unimodally distributed load, the normal distribution's tail is too
short while the exponential distribution's tail is too long.  The
implication is that modeling and simulation that assumes convenient
analytical load distributions may be flawed.  

(4) Time series analysis of the traces shows that load is strongly
correlated over time.  The autocorrelation function typically decays
very slowly while the periodogram shows a broad, almost noise-like
combination of all frequency components.  An important implication is
that history-based load prediction schemes seem very feasible.
However, the complex frequency domain behavior suggests that linear
modeling schemes may have difficulty.  From a modeling point of view,
it is clearly important that these dependencies between successive
load measurements are captured.  

(5) The traces are self-similar.  Their Hurst parameters range from
.63 to .97, with a strong bias toward the top of that range.  This
tells us that load varies in complex ways on all time scales and is
long term dependent.  This has several important implications.  First,
smoothing load by averaging over an interval results in much smaller
decreases in variance than if load were not long range dependent.
Variance decays with increasing interval length $m$ and Hurst
parameter $H$ as $m^{2H-2}$.  This is $m^{-1.0}$ for signals without
long range dependence and $m^{-0.74}$ to $m^{-0.06}$ for the range of
$H$ we measured.  This suggests that task migration in the face of
adverse load conditions may be preferable to waiting for the adversity
to be ameliorated over the long term.  The self-similarity result also
suggests certain modeling approaches, such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}
which can capture this property.

(6) The traces display epochal behavior.  The local frequency content
of the load signal remains quite stable for long periods of time
(150-450 seconds mean) and changes abruptly at the boundaries of such
epochs.  This suggests that the problem of predicting load may be
able to be decomposed into a sequence of smaller subproblems.  

\comment{
Epochal behavior is different
from seasonality: epoch lengths have significant variation and there
is significant and complex behavior within each epoch.  In fact, our
traces do not appear to exhibit seasonality at all.  Surprisingly, the
statistical properties of epoch lengths are independent of the
statistical properties of load and the Hurst parameter.  Both absolute
and relative measures of the variance in epoch length are strongly
correlated to the mean epoch length, however.
}

\section{Measurement methodology}
\label{sec:load_trace_measurement}
The load on a Unix system at any given instant is the number of
processes that are running or are ready to run, which is the length of
the ready queue maintained by the scheduler.  The kernel samples the
length of the the ready queue at some rate and exponentially averages
some number of previous samples to produce a load average which can be
accessed from a user program.

We developed a small tool to sample the Digital Unix (DUX) one minute
load average at one second intervals and log the resulting time series
to a data file.  The 1 Hz sample rate was arrived at by subjecting DUX
systems to varying loads and sampling at progressively higher rates to
determine the rate at which DUX actually updated the value.  DUX
updates the value at a rate of $1/2$ Hz, thus we chose a 1 Hz sample
rate by the Nyquist criterion.  This choice of sample rate means we
capture all of the dynamic load information the operating system makes
available to user programs.  We ran this trace collection tool on 38
hosts belonging to the Computing, Media, and Communication Laboratory
(CMCL) at CMU and the Pittsburgh Supercomputing Center (PSC) for
slightly more than one week in late August, 1997.  A second set of
week-long traces was acquired on almost exactly the same set of
machines in late February and early March, 1998.  The results of the
statistical analysis were similar for the two sets of traces.  In this
paper, we concentrate on the August 1997 set.

All of the hosts in the August 1997 set were DEC Alpha DUX machines,
and they form four classes:
\begin{itemize}
\item{{\em Production Cluster}: 13 hosts of the PSC's ``Supercluster'',
including two front-end machines (axpfea, axpfeb), four interactive
machines (axp0 through axp3), and seven batch machines scheduled by a
DQS~\cite{KAPLAN-COMPARE-CLUSTER-SW} variant (axp4 through axp10.)}
\item{{\em Research Cluster}: eight machines in an experimental
cluster in the CMCL \\
(manchester-1 through manchester-8.)} 
\item{{\em Compute servers}: two high performance large memory
machines used by the CMCL group as compute servers for simulations and
the like (mojave and sahara.)}
\item{{\em Desktops}: 15 desktop workstations owned by members of the
CMCL (aphrodite through zeno.)} 
\end{itemize}

\section{Statistical analysis}
\label{sec:stats}

\begin{figure}[t]
\centerline{
\figsize
\epsfbox{correl_table.eps}
}
\caption{Correlation coefficients (CCs) between all of the discussed
statistical properties. }
\label{fig:cross_correl}
\end{figure}

We analyzed the individual load traces using summary statistics,
histograms, fitting of analytic distributions, and time series
analysis.  The picture that emerges is that load varies over a wide
range in very complex ways.  Load distributions are rough and
frequently multimodal.  Even traces with unimodal histograms are not
well fitted by common analytic distributions, which have tails that
are either too short or too long.  Time series analysis shows that
load is strongly correlated over time, but also has complex, almost
noise-like frequency domain behavior.


We summarized each of our load traces in terms of our statistical
measures and computed their correlations to determine how the measures
are related.  Figure~\ref{fig:cross_correl} contains the results. Each
cell of the table is the correlation coefficient (CC) between the row
measure and the column measure, computed over the 38 load traces.  We
will refer back to the highlighted cells throughout the paper.  It is
important to note that these cross correlations can serve as a basis
for clustering load traces into rough equivalence classes.


\subsubsection*{Summary statistics:}

Summarizing each load trace in terms of its mean, standard deviation, and
maximum and minimum illustrates the extent to which load
varies.  Figure~\ref{fig:sdev_mean_load} shows the mean load and the
+/- one standard deviation points for each of the traces.  As we might
expect, the mean load on desktop machines is significantly lower than
on other machines.  However, we can also see a lack of uniformity
within each class, despite the long duration of the traces.  This is
most clear among the Production Cluster machines, where four machines
seem to be doing most of the work.  This lack of uniformity even over
long time scales shows clear opportunity for load balancing or
resource management systems.


\begin{figure}[t]
\centerline{
\absfigsize
\epsfbox{sdev_mean_load.epsf} \hspace{0.15in}
}
\caption{Mean load +/- one standard deviation.}
\label{fig:sdev_mean_load}
\end{figure}


\begin{figure}[t]
\centerline{
\relfigsize
\epsfbox{cov_mean_load.epsf}
}
\caption{COV of load and mean load.}
\label{fig:cov_mean_load}
\end{figure}

From Figure~\ref{fig:sdev_mean_load} we can also see that desktop
machines have smaller standard deviations than the other machines.
Indeed, the standard deviation, which shows how much load varies in
{\em absolute} terms, grows with increasing mean load (CC=0.53 from
Figure~\ref{fig:cross_correl}.)  However, in {\em relative} terms,
variance shrinks with increasing mean load.  This can be seen in
Figure~\ref{fig:cov_mean_load}, which plots the coefficient of
variation (the standard deviation divided by the mean, abbreviated as
the COV) and the mean load for each of the load traces.  Here we can
see that desktop machines, with their smaller mean loads, have large
COVs compared to the other classes of machines.  The CC between mean
load and the COV of load is -0.49.  It is clear that as load
increases, it varies {\em less} in relative terms and {\em more} in
absolute terms.


\begin{figure}[t]
\centerline{
\absfigsize
\epsfbox{min_max_mean_load.epsf} \hspace{0.15in}
}
\caption{Minimum, maximum, and mean load.}
\label{fig:min_max_mean_load}
\end{figure}


This difference between absolute and relative behavior also holds true
for the maximum load.  Figure~\ref{fig:min_max_mean_load} shows the
minimum, maximum, and mean load for each of the traces.  The minimum
load is, not surprisingly, zero in every case.  The maximum load is
positively correlated with the mean load (CC=0.60 in
Figure~\ref{fig:cross_correl}.)  Figure ~\ref{fig:max_to_mean_load}
plots the ratio max/mean and the mean load for each of the traces.  It
is clear that this relative measure is inversely related to mean load,
and Figure~\ref{fig:cross_correl} shows that the CC is -0.36.  It is
also important to notice that while the differences in maximum load
between the hosts are rather small
(Figure~\ref{fig:min_max_mean_load}), the differences in the max/mean
ratio can be quite large (Figure~\ref{fig:max_to_mean_load}.)
Desktops are clearly more surprising machines in relative terms.

\begin{figure}[t]
\centerline{
\relfigsize
\epsfbox{max_to_mean_load.epsf}
}
\caption{Maximum to mean load ratios and mean load.}
\label{fig:max_to_mean_load}
\end{figure}

With respect to the mapping problem, the implication of the
differences between relative and absolute measures of variability is
that lightly loaded (low mean load) hosts are not always preferable
over heavily loaded hosts.  For example, if the performance metric is
itself a relative one (that the execution time not vary much relative
to the mean execution time, say), then a more heavily loaded host may
be preferable.

\subsubsection*{Distributions:}

\begin{figure}[t]
\centerline{
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{axp0day.hist.eps} &
\epsfxsize=3in
\epsfbox{axp7day.hist.eps} \\
(a) axp0 & (b) axp7\\
\end{tabular}
}
\caption{Histograms for load on axp0 and axp7 on August 19, 1997.}
\label{fig:axphist}
\end{figure}

We next treated each trace as a realization of an independent,
identically distributed (IID) stochastic process.  Such a process is
completely described by its probability distribution function (pdf),
which does not change over time.  Since we have a vast number of data
points for each trace, histograms closely approximate this underlying
pdf.  We examined the histograms of each of our load traces and fitted
normal and exponential distributions to them.  To illustrate the
following discussion, Figure~\ref{fig:axphist} shows the histograms of
load measurements on (a) axp0 and (b) axp7 on August 19, 1997 (86400
samples each.)  Axp0 has a high mean load, while axp7 is much more
lightly loaded.

Some of the traces, especially those with high mean loads, have
multimodal histograms.  Figure~\ref{fig:axphist}(a) is an example of
such a multimodal distribution while Figure~\ref{fig:axphist}(b) shows
a unimodal distribution.  Typically, the modes are integer multiples
of 1.0 (and occasionally 0.5.)  One explanation for this behavior is
that jobs on these machines are for the most part compute bound and
thus the ready queue length corresponds to the number of jobs.  This
seems plausible for the cluster machines, which run scientific
workloads for the most part.  However, such multimodal distributions
were also noticed on the some of the other machines.

The rough appearance of the histograms (consider
Figure~\ref{fig:axphist}(b)) is due to the fact that the underlying
quantity being measured (ready queue length) is discrete.  Load
typically takes on 600-3000 unique values in these traces.  Shannon's
entropy measure~\cite{SHANNON-COMM-THEORY} indicates that the load
traces can be encoded in 1.4 to 8.48 bits per value, depending on the
trace.  These observations and the histograms suggest that load spends
most of its time in one of a small number of levels.

%
% This figure includes the fits for the multimodal distributions
%\begin{figure}%[t]
%\centerline{
%\begin{tabular}{cc}
%\epsfxsize=2.5in
%\epsfbox{axp0day.N.qq.eps} &
%\epsfxsize=2.5in
%\epsfbox{axp7day.N.qq.eps} \\
%(a) axp0 versus normal &
%(c) axp7 versus normal \\
%\epsfxsize=2.5in
%\epsfbox{axp0day.Exp.qq.eps} &
%\epsfxsize=2.5in  
%\epsfbox{axp7day.Exp.qq.eps} \\
%(b) axp0 versus exponential &
%(d) axp7 versus exponential \\
%\end{tabular}
%}
%\label{fig:axpqqplot}
%\caption{Quantile-quantile plots}
%\end{figure}

\begin{figure}[t]
\centerline{
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{axp7day.N.qq.eps} &
\epsfxsize=3in  
\epsfbox{axp7day.Exp.qq.eps} \\
(a) axp7 versus normal &
(b) axp7 versus exponential \\
\end{tabular}
}
\caption{Quantile-quantile plots for axp7 trace of August 19, 1997.}
\label{fig:axpqqplot}
\end{figure}

The histograms share very few common characteristics and did not
conform well to the analytic distributions we fit to them.
Quantile-quantile plots are a powerful way to assess how a
distribution fits data (cf.~\cite{JAIN-PERF-ANALYSIS}, pp. 196--200.)
The quantiles (the $\alpha$ quantile of a pdf (or histogram) is the
value $x$ at which $100\alpha$ \% of the probability (or data) falls
to the left of $x$) of the data set are plotted against the quantiles
of the hypothetical analytic distribution.  Regardless of the choice
of parameters, the plot will be linear if the data fits the
distribution.

We fitted normal and exponential distributions to each of the load
traces. The fits are atrocious for the multimodal traces, and we do
not discuss them here.  For the unimodal traces, the fits are slightly
better.  Figure~\ref{fig:axpqqplot} shows quantile-quantile plots for
(a) normal and (b) exponential distributions fitted to the unimodal
axp7 load trace of Figure~\ref{fig:axphist}(b).  Neither the normal or
exponential distribution correctly captures the tails of the load
traces.  This can be seen in the figure.  The quantiles of the data
grow faster than those of the normal distribution toward the right
sides of Figures~\ref{fig:axpqqplot}(a).  This indicates that the data
has a longer or heavier tail than the normal distribution.
Conversely, the quantiles of the data grow more slowly than those of
the exponential distribution, as can be seen in
Figures~\ref{fig:axpqqplot}(b).  This indicates that the data has a
shorter tail than the exponential distribution.  Notice that the
exponential distribution goes as $e^{-x}$ while the normal
distribution goes as $e^{-x^2}$.

There are two implications of these complex distributions.  First,
simulation studies and analytic results predicated on simple, analytic
distributions may produce erroneous results.  Clearly, trace-driven
simulation studies are to be preferred.   The second implication is
that prediction algorithms should not only reduce the overall variance
of the load signal, but also produce errors that are better fit an
analytic distribution.  One reason for this is to make confidence
intervals easier to compute.

\subsubsection*{Time series analysis:}

\begin{figure}[t]
\centerline{
\begin{tabular}{c}
\epsfxsize=6in \epsfysize=1in
\epsfbox{axp7-ts.eps} \\
(a) Load trace for axp7 on August 19, 1997 \\
\epsfxsize=6in \epsfysize=1in
\epsfbox{axp7-autocorr.eps} \\
(b) Autocorrelation function of (a) to lag 600 (10 minutes) \\
%\epsfxsize=5in \epsfysize=.75in
%\epsfbox{axp7-pacf.eps} \\
%(c) Partial autocorrelation function of (a) to lag 600 (10 minutes) \\
\epsfxsize=6in \epsfysize=1in 
\epsfbox{axp7-periodogram.eps}  \\
(c) Periodogram (log scale) of (a) \\
\end{tabular}
}
\caption{Time series analysis of axp7 load trace collected on August
19, 1997.}
\label{fig:axp7ts}
\end{figure}

We examined the autocorrelation function, partial autocorrelation
function, and periodogram of each of the load traces.  These time
series analysis tools show that past load values have a strong
influence on future load values.  For illustration,
Figure~\ref{fig:axp7ts} shows (a) the axp7 load trace collected on
August 19, 1997, (b) its autocorrelation function to a lag of 600, and
(c) its periodogram.  The autocorrelation function, which ranges from
-1 to 1, shows how well a load value at time $t$ is linearly
correlated with its corresponding load value at time $t+\Delta$ --- in
effect, how well the value at time $t$ predicts the value at time
$t+\Delta$.  Autocorrelation is a function of $\Delta$, and in the
figure we show the results for $0 \le
\Delta \le 600$.  Notice that even at $\Delta=600$ seconds, values are
still strongly correlated.  This very strong, long range correlation
is common to each of the load traces.  For space reasons, we do not
discuss the partial autocorrelation results here.  However, it is
important to note that the behavior of the autocorrelation and partial
autocorrelation functions is instrumental to the Box-Jenkins linear
time series model identification process~\cite{BOX-JENKINS-TS}.

The periodogram of a load trace is the magnitude of the Fourier
transform of the load data, which we plot on a log scale
(Figure~\ref{fig:axp7ts}(c).)  The periodogram shows the contribution
of different frequencies (horizontal axis) to the signal.  What is
clear in the figure, and is true of all of the load traces, is that
there are significant contributions from all frequencies --- the
signal looks much like noise.  We believe the two noticeable peaks to
be artifacts of the kernel sample rate --- the kernel is not sampling
the length of the ready queue frequently enough to avoid aliasing.
Only a few of the other traces exhibit the smaller peaks, but they all
share the broad noise-like appearance of this trace.

There are several implications of this time series analysis.  First,
the existence of such strong autocorrelation implies that load
prediction based on past load values is feasible.  It also suggests
that simulation models and analytical work that eschews this very
clear dependence may be in error.  Finally, the almost noise-like
periodograms suggest that quite complex, possibly nonlinear models
will be necessary to produce or predict load.  



\section{Self-similarity}
\label{sec:self-sim}

\begin{figure}%[t]
\centerline{
\begin{tabular}{c}
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.1.eps} \\
10 days \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.2.eps} \\
2.5 days \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.3.eps} \\
16 hours \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.4.eps} \\
4 hours \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.5.eps} \\
64 minutes \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.6.eps} \\
16 minutes \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.7.eps} \\
4 minutes \\
\epsfxsize=5in \epsfysize=.75in
\epsfbox{axp7.8.eps} \\
60 seconds 
\end{tabular}
}
\caption{Visual representation of self-similarity.  Each graph plots
load versus time and ``zooms in'' on the middle quarter}
\label{fig:visual_self_sim}
\end{figure}

The key observation of this section is that each of the load traces
exhibits a high degree of self-similarity.  This is significant for
two reasons.  First, it means that load varies significantly across
all time-scales --- it is not the case that increasing smoothing of
the load quickly tames its variance. A job will have a great deal of
variance in its running time regardless of how long it is.  Second, it
suggests that load is difficult to model and predict well.  In
particular, self-similarity is indicative of long memory, possibly
non-stationary stochastic processes such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS},
and fitting such models to data and evaluating them can be quite
expensive.

Figure~\ref{fig:visual_self_sim} visually demonstrates the self
similarity of the axp7 load trace.  The top graph in the figure plots
the load on this machine versus time for 10 days.  Each subsequent
graph ``zooms in'' on the highlighted central 25\% of the previous
graph, until we reach the bottom graph, which shows the central 60
seconds of the trace.  The plots are scaled to make the behavior on
each time scale obvious.  In particular, over longer time scales,
wider scales are necessary.  Intuitively, a self-similar signal is one
that looks similar on different time scales given this rescaling.
Although the behavior on the different graphs is not identical, we can
clearly see that there is significant variation on all time scales.

An important point is that as we smooth the signal (as we do visually
as we ``zoom out'' toward the top of the page in
Figure~\ref{fig:visual_self_sim}), the load signal strongly
resists becoming uniform.  This suggests that low frequency components
are significant in the overall mix of the signal, or, equivalently,
that there is significant long range dependence.  It is this property
of self-similar signals that most strongly differentiates them and
causes significant modeling difficulty.   

Self-similarity is more than intuition --- it is a well defined
mathematical statement about the relationship of the autocorrelation
functions of increasingly smoothed versions of certain kinds of
long-memory stochastic processes.  These stochastic processes model
the sort of the mechanisms that give rise to self-similar signals.  We
shall avoid a mathematical treatment here, but interested readers may
want to consult~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93}
or~\cite{IMPACT-SELF-SIM-NETWORK-PERF-ANALYSIS-MORIN} for a treatment
in the context of networking
or~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED} for its connection to
fractal geometry, or~\cite{BERAN-STAT-LONG-RANGE-METHODS} for a
treatment from a linear time series point of view.


\begin{figure}[t]
\centerline{
\epsfxsize=5in
\epsfbox{self-sim-explain.epsf}
}
\caption{Meaning of the Hurst parameter in frequency domain. }
\label{fig:explain_hurst}
\end{figure}

The degree and nature of the self-similarity of a sequence is
summarized by the Hurst parameter, $H$~\cite{HURST51}.  Intuitively,
$H$ describes the relative contribution of low and high frequency
components to the signal.  Consider Figure~\ref{fig:explain_hurst},
which plots the periodogram (the magnitude of the Fourier transform)
of the axp7 load trace of August 19, 1997 on a log-log scale.  In this
transformed form, we can describe the trend with a line of slope
$-\beta$ (meaning that the periodogram decays hyperbolically with
frequency $\omega$ as $\omega^{-\beta}$.  The Hurst parameter $H$ is
then defined as $H=(1+\beta)/2$.  As we can see in
Figure~\ref{fig:explain_hurst}, $H=0.5$ corresponds to a line of zero
slope.  This is the uncorrelated noise case, where all frequencies
make a roughly equal contribution.  As $H$ increases beyond $0.5$, we
see that low frequencies make more of a contribution.  Similarly, as
$H$ decreases below $0.5$, low frequencies make less of a
contribution.  $H>0.5$ indicates self-similarity with positive near
neighbor correlation, while $H<0.5$ indicates self-similarity with
negative near neighbor correlation.


%The idea of self-similarity and the statistic used to characterize it
%arose not in theory, but from the experimental physical sciences and
%engineering.  Hurst~\cite{HURST51} invented the notion and the
%statistic (now called the ``Hurst parameter'') in order to
%characterize how the amount of water in reservoirs changes over time.
%He then applied it to summarize many other data series arising from
%natural processes.  The Hurst parameter $H$ ranges from zero to one.
%$H=0.5$ indicates no self-similarity and is expected of a series
%generated from a stationary probability distribution where a sample is
%completely independent of all previous samples.  $H>0.5$ indicates
%self-similarity with positive near neighbor correlation, while $H<0.5$
%indicates self-similarity with negative near neighbor correlation.  



\begin{figure}[t]
\centerline{
\absfigsize 
\epsfbox{hurst_mean.epsf}  \hspace{0.15in}
}
\caption{Mean Hurst parameter estimates for traces +/- one standard
deviation. }
\label{fig:load_trace_self_sim_mean}
\end{figure}


We examined each of the 38 load traces for self-similarity and
estimated each one's Hurst parameter.  There are many different
estimators for the Hurst
parameter~\cite{TAQQU-HURST-ESTIMATORS-COMPARE}, but there is no
consensus on how to best estimate the Hurst parameter of a measured
series.  The most common technique is to use several Hurst parameter
estimators and try to find agreement among them.  The four Hurst
parameter estimators we used were R/S analysis, the variance-time
method, dispersional analysis, and power spectral analysis.  A
description of these estimators as well as several others may be found
in~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED}.  The advantage of these
estimators is that they make no assumptions about the stochastic
process that generated the sequence.  However, they also cannot
provide confidence intervals for their estimates.  Estimators such as
the Whittle estimator~\cite{BERAN-STAT-LONG-RANGE-METHODS} can provide
confidence intervals, but a stochastic process model must be assumed
over which an $H$ can be found that maximizes its likelihood.

We implemented R/S analysis and the variance-time method using Matlab
and performed dispersional analysis and power spectral analysis by
hand on graphs prepared via Matlab.  We validated each method by
examining degenerate series with known $H$ and series with specific
$H$ generated using the random midpoint displacement method.  The
dispersional analysis method was found to be rather weak for $H$
values less than about $0.8$ and the power spectral analysis method
gave the most consistent results.

Figure~\ref{fig:load_trace_self_sim_mean} presents our estimates of
the Hurst parameters of each of the 38 load traces.  In the graph,
each central point is the mean of the four estimates, while the
outlying points are at +/- one standard deviation.  Notice that for
small $H$ values, the standard deviation is high due to the inaccuracy
of dispersional analysis.  The important point is that the mean Hurst
estimates are all significantly above the $H=0.5$ line and except for
three traces with low $H$ values, their +/- one standard deviation
points are also above the line.  

The traces exhibit self-similarity with Hurst parameters ranging from
0.63 to 0.97, with a strong bias toward the top of that range.  On
examination of the correlation coefficients (CCs) of
Figure~\ref{fig:cross_correl}, we can see that the Hurst parameter has
some correlation with mean load (CC=0.45), standard deviation of load
(CC=0.58), and is inversely correlated with the max/mean load ratio
(CC=-0.49).  The latter seems somewhat surprising.

As we discussed above, self-similarity has implications for load
modeling and for load smoothing.  The long memory stochastic process
models that can capture self-similarity tend to be expensive to fit to
data and evaluate.  Smoothing the load (by mapping large units of
computations instead of small units, for example) may be misguided
since variance may not decline with increasing smoothing intervals as
quickly as quickly as expected.  Consider smoothing load by averaging
over an interval of length $m$.  Without long range dependence
($H=0.5$), variance would decay with $m$ as $m^{-1.0}$, while with
long range dependence, as $m^{2H-2}$ or $m^{-0.74}$ and $m^{-0.06}$
for the range of Hurst parameters we measured.

\section{Epochal behavior}
\label{sec:epochal_behavior}

The key observation in this section is that while load changes
in complex ways, the manner in which it changes remains relatively
constant for relatively long periods of time.  We refer to a period of
time in which this stability holds true as an epoch.  For example, the
load signal could be a 0.25 Hz Sin wave for a minute and a 0.125 Hz
sawtooth wave the next minute --- each minute is an epoch.  That these
epochs exist and are long is significant because it suggests that
modeling load can be simplified by modeling epochs separately from
modeling the behavior within an epoch.  Similarly, it suggests a two
stage prediction process.

The spectrogram representation of a load trace immediately highlights
the epochal behavior we discuss in this section. A spectrogram
combines the frequency domain and time domain representations of a
time series.  It shows how the frequency domain changes locally (for a
small segment of the signal) over time.  For our purposes, this local
frequency domain information is the ``manner in which [the load]
changes'' to which we referred earlier.  To form a spectrogram, we
slide a window of length $w$ over the series, and at each offset $k$,
we Fourier-transform the $w$ elements in the window to give us $w$
complex Fourier coefficients.  Since our load series is real-valued,
only the first $w/2$ of these coefficients are needed.  We form a plot
where the $x$ coordinate is the offset $k$, the $y$ coordinate is the
coefficient number, $1,2,\dots,w/2$ and the $z$ coordinate is the
magnitude of the coefficient.  To simplify presentation, we collapse
to two dimensions by mapping the logarithm of the $z$ coordinate (the
magnitude of the coefficient) to color.  The spectrogram is basically
a midpoint in the tradeoff between purely time-domain or
frequency-domain representations.  Along the $x$ axis we see the
effects of time and along the $y$ axis we see the effects of
frequency.

\begin{figure}[t]
\centerline{
\begin{tabular}{c}
\epsfxsize=5in \epsfysize=1.5in
\hspace{0.1in} \epsfbox{axp7_19_day_time.eps}
\\
\epsfxsize=5in \epsfysize=1.5in
\epsfbox{axp7_19_day_spec.eps}
\end{tabular}
}
\caption{Time domain and spectrogram representations of load for host axp7 on August 19, 1997. }
\label{fig:specgrams}
\end{figure}


Figure~\ref{fig:specgrams} shows a representative case, a 24 hour
trace from the PSC host axp7, taken on August 19, 1997.  The top graph
shows the time domain representation, while the bottom graph shows the
corresponding spectrogram representation.
%~\footnote{Notice that there
%is evidence of foldover from a too-low sampling rate.  We are
%convinced that the problem lies in how often DUX actually updates its
%load value and not in our sampling.  One explanation is that DUX does
%not update the load average whenever the length of the ready queue
%changes.  Another is that it does not update it when running device
%drivers or other in-kernel tasks.} 
What is important to note (and which occurs in all the spectrograms of
all the traces) are the relatively wide vertical bands.  These
indicate that the frequency domain of the underlying signal stays
relatively stable for long periods of time.  We refer to the width of
a band as the duration of that frequency epoch.

That these epochs exist can be explained by programs executing
different phases, programs being started and shut down, and the like.
The frequency content within an epoch contains energy at higher
frequencies because of events that happen on smaller time-frames, such
as user input, device driver execution, and daemon execution.  

\begin{figure}[t]
\centerline{
\absfigsize
\epsfbox{sdev_mean_epoch.epsf}  \hspace{0.15in}
}
\caption{Mean epoch length +/- one standard deviation.}
\label{fig:sdev_mean_epoch}
\end{figure}


We can algorithmically find the edges of these epochs by computing the
difference in adjacent spectra in the spectrogram and then looking for
those offsets where the differences exceed a threshold.  Specifically,
we compute the sum of mean squared errors for each pair of adjacent
spectra.  The elements of this error vector are compared to an epsilon
(5\% here) times the mean of the vector.  Where this threshold is
exceeded, a new epoch is considered to begin.  Having found the
epochs, we can examine their statistics.
Figure~\ref{fig:sdev_mean_epoch} shows the mean epoch length and the
+/- one standard deviation levels for each of the load traces.  The
mean epoch length ranges from about 150 seconds to over 450 seconds,
depending on which trace.  The standard deviations are also relatively
high (80 seconds to over 600 seconds.)  It is the Production Cluster
class which is clearly different when it comes to epoch length.  The
machines in this class tend to have much higher means and standard
deviations than the other machines.  One explanation might be that
most of the machines run batch-scheduled scientific jobs which may
well have longer computation phases and running times.  However, two
of the interactive machines also exhibit high means and standard
deviations. Interestingly, there is no correlation of the mean epoch
length and standard deviation to the mean load or to the Hurst
parameter, as can be seen in Figure~\ref{fig:cross_correl}.

\begin{figure}[t]
\centerline{
\relfigsize
\epsfbox{cov_mean_epoch.epsf}
}
\caption{COV of epoch length and mean epoch length.}
\label{fig:cov_mean_epoch}
\end{figure}

The standard deviations of epoch length in
Figure~\ref{fig:sdev_mean_epoch} give us an absolute measure of the
variance of epoch length.  Figure~\ref{fig:cov_mean_epoch} shows the
coefficient of variance (COV) of epoch length and mean epoch length
for each trace.  The COV is our relative measure of epoch length
variance.  Unlike with load (Section~\ref{sec:stats}), these absolute
and relative measures of epoch length variance are {\em both}
positively correlated with the mean epoch length.  In addition, the
correlation is especially strong (CC=0.99 for standard deviation and
CC=0.95 for COV).  As epoch length increases, it varies more in both
absolute and relative terms.  The statistical properties of epoch
lengths are independent of the statistical properties of load.

The implication of long epoch lengths is that the problem of
predicting load may be able to be decomposed into a segmentation problem
(finding the epochs) and a sequence of smaller prediction subproblems
(predicting load within each epoch.)


\subsubsection*{Lack of seasonality:}

It is important to note that the epochal behavior of the load traces
is not the same thing as seasonality in the time series analysis
sense~\cite{INTRO-TIME-SERIES-FORECASTING,BOX-JENKINS-TS}.
Seasonality means that there are dominant (or at least visible)
underlying periodic signals on top of which are layered other signals.
It is not unreasonable to expect seasonality given that other
studies~\cite{WORKSTATION-AVAIL-STATS-CONDOR} have found that
availability of compute resources to change regularly over the hours
of the working day and the days of the working week.  However,
examination of the power spectrums and autocorrelations of the load
traces suggests that load does {\em not} exhibit seasonality.  We feel
this does not contradict the earlier results --- the fluctuation of
resources simply is not sufficiently periodic to qualify as
seasonality in the strict time series sense.


\section{Conclusions and future work}

We collected long, fine grain load measurements on a wide variety of
machines at two different times of the year.  The results of an
extensive statistical analysis of these traces and their implications
are the following: 
\begin{enumerate}

\item{The traces exhibit low means but very high
standard deviations and maximums.  This implies that these machines
have plenty of cycles to spare to execute jobs, but the execution time
of these jobs will vary drastically.}

\item{Absolute measures of variation are positively correlated with
the mean while relative measures are negatively correlated.  This
suggests that it may not be unreasonable to map tasks to heavily
loaded machines under some performance metrics.}

\item{The traces have complex, rough, and often multimodal distributions that
are not well fitted by analytic distributions such as the normal or
exponential distributions, which are particularly inept at capturing
the tail of the distribution.  This implies that modeling and
simulation that assumes convenient analytical load distributions may
be flawed.  Trace-driven simulation may be preferable.}

\item{Load is strongly correlated over time, but has a
broad, almost noise-like frequency spectrum.  This implies that
history-based load prediction schemes are feasible, but that linear
methods may have difficulty.  Realistic load models should capture
this dependence, or trace-driven simulation should be used.}

\item{The traces are self-similar with relatively high Hurst
parameters. This means that load smoothing will decrease variance much
more slowly than expected.  It may be preferable to migrate tasks in
the face of adverse load conditions instead of waiting for the
adversity to be ameliorated over the long term.  Self-similarity also
suggests certain modeling approaches such as fractional ARIMA
models~\cite{HOSKING-FRACT-DIFF,GRANGER-JOYEUX-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}
and non-linear models which can capture the self similarity property.}

\item{The traces display epochal behavior in that the local frequency
content of the load signal remains quite stable for long periods of time
and changes abruptly at the boundaries of such epochs.  This suggests
that the problem of predicting load may be able to be decomposed into
a sequence of smaller subproblems.}

\end{enumerate}

We are currently exploring how well the hierarchy of linear time series
models~\cite{BOX-JENKINS-TS} perform for short term prediction of
load.  Part of this work involves quantifying the benefit of capturing
the self-similarity of load using fractional ARIMA
models~\cite{GRANGER-JOYEUX-FRACT-DIFF,HOSKING-FRACT-DIFF,BERAN-STAT-LONG-RANGE-METHODS}.
 Initial results show that fractional models provide as much as a 40\%
improvement in prediction error and rarely perform worse than their
non-fractional counterparts.  

The context of our work is dynamically mapping real-time tasks to
hosts in a distributed environment. In this context, we are interested
in load models for classifying environments and hosts, synthesizing
realistic execution times in a simulator, and for predicting execution
times on remote hosts.  We will gladly share the load traces we have
collected with members of the research community.


%\small 
%\tinyspacing
\bibliography{texbib}
%\normalsize
%\normspacing

\end{document}



