\documentclass[twocolumn]{article}
%\documentclass{article}
\usepackage{epsf}
\usepackage{times}
%\usepackage{changebar}
%\usepackage{lscape}

\input{restore}

\renewcommand{\dbltopfraction}{.9}
\renewcommand{\topfraction}{.9}
\renewcommand{\textfraction}{.1}
\renewcommand{\floatpagefraction}{.9}
%\renewcommand{\dbltopnumber}{1}
\renewcommand{\dblfloatpagefraction}{.9}

%\renewcommand{\baselinestretch}{1.37}

\newcounter {subsubsubsection}[subsubsection]
\def\thesubsubsubsection {\thesubsubsection.\arabic{subsubsubsection}}
\def\subsubsubsection {\subsubsection*}






\begin{document}

\bibliographystyle{acm}

\title{Statistical Behavior of Host Load in a Distributed Environment\\
{\em Extended Abstract}
}

\author{
Peter A. Dinda \hspace{.5in} David R. O'Hallaron \\
Carnegie Mellon University\\
5000 Forbes Avenue \\
Pittsburgh, PA 15213 \\
(pdinda,droh)@cs.cmu.edu \\
}


\maketitle

%\begin{abstract}
%Understanding how host load changes over time is instrumental in
%forecasting the execution time of tasks or jobs, such as in dynamic
%load balancing and distributed soft real-time systems.  To improve
%this understanding, we collected week-long, 1 Hz resolution Unix load
%average traces on 38 different machines including production and
%research cluster machines, compute servers, and desktop workstations.
%We present a detailed statistical analysis of these traces here. Two
%significant new results are that load is self-similar and that it
%displays epochal behavior.  All of the traces exhibit a high degree of
%self similarity with Hurst parameters ranging from .63 to .97,
%strongly biased toward the top of that range.  The traces also display
%epochal behavior in that the local frequency content of the load
%signal remains quite stable for long periods of time (150-450 seconds
%mean) and changes abruptly at epoch boundaries.
%\end{abstract}

\section{Introduction}
\label{sec:intro}

Deciding which host in a distributed environment to map a computation
to is at the heart of dynamic load balancing of parallel
programs~\cite{JADE-LANGUAGE, DOME-TECHREPORT, SIEGELL-DYNAMIC-LB-94},
distributed soft real-time
systems~\cite{REAL-TIME-MANIFESTO-JENSEN,REALTIME-CORBA-WHITEPAPER,PREDICTABLE-NETWORK-COMPUTING},
and other systems.  Making this decision entails forecasting, either
implicitly or explicitly, the behavior of load on the remote hosts for
the duration of the computation.  Because the behavior of host load
has such a profound influence on execution time, it is important to
understand its behavior in real systems.  Unfortunately, there is
little extant work on the statistical properties of load and there are
no available load traces for practitioners to work from.  This paper
attempts to remedy that situation.  We collected week-long, 1 Hz
resolution traces of the Unix load average on 38 different machines
that we classify as production and research cluster machines, compute
servers, or desktop workstations.  The paper presents a detailed
statistical analysis of these traces.

The traces exhibit remarkable statistical behavior.  Except for four
machines, all of the systems are quite underutilized over the long
run, and thus have very low mean loads.  However, load can vary
drastically.  The standard deviation of load (an absolute measure of
variation) is typically quite high and grows with increasing mean
load.  However, the coefficient of variation (a relative measure of
variation) actually shrinks with increasing mean load.  Further, while
desktop machines tend to have lower absolute variation, they have
significantly higher relative variation than the other classes.
Similarly, maximum load increases with mean load, while maximum load
relative to mean load decreases with mean load.  Finally, while load
maximums don't vary much across machines, desktop machines have much
higher maximums relative to their mean loads.

All of the traces exhibit a high degree of self-similarity with Hurst
parameters ranging from .63 to .97, strongly biased toward the top of
that range.  This tells us that load varies in complex ways on all
time scales.  As a result, lessening the effects of load behavior by
smoothing (not migrating long-running tasks, for example) may not be
very effective.  Another implication is that load may be difficult to
model well without many parameters.  For our traces, the Hurst
parameter measure correlates with the mean and standard deviation of
load, but is anti-correlated with maximum load relative to mean load.

The traces also display what we call epochal behavior: the local
frequency content of the load signal remains quite stable for long
periods of time (150-450 seconds mean) and changes abruptly at epoch
boundaries.  This suggests that the problem of forecasting load may be
partitionable into a sectioning problem (finding the epochs) and a
hopefully simpler time series forecasting problem (modeling the
behavior of load within each epoch.)  Epochal behavior is different
from seasonality: epoch lengths have significant variation and there
is significant and complex behavior within each epoch.  In fact, our
traces do not appear to exhibit seasonality at all.  Surprisingly, the
statistical properties of epoch lengths are independent of the
statistical properties of load and the Hurst parameter.  Both absolute
and relative measures of the variance in epoch length are strongly
correlated to the mean epoch length, however.



\section{Measurement methodology}
\label{sec:load_trace_measurement}

%
% It is important that we make it clear that these systems were all
% running the same OS (DUX) 
%

The load on a Unix system at a given instant is the number of
processes that are running or are ready to run.  This number is
sampled at some rate and the samples over some interval are
exponentially averaged to produce a load average which can be accessed
from a user program.
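As a purely illustrative sketch of this mechanism, the following Python fragment exponentially averages instantaneous run-queue samples into a load average; the decay constant is a plausible choice for a one minute average, not the actual DUX kernel value:

```python
import math

def load_average(samples, interval=60.0, sample_period=5.0):
    """Exponentially average instantaneous run-queue lengths into a
    one minute style load average.  The decay constant below is
    illustrative; real kernels use their own fixed constants."""
    decay = math.exp(-sample_period / interval)
    avg = 0.0
    for runnable in samples:  # runnable = processes running or ready to run
        avg = decay * avg + (1.0 - decay) * runnable
    return avg
```

Under a constant load of two runnable processes the average converges to 2; when a load spike ends, the average decays toward zero rather than dropping immediately, which is why the reported figure lags the instantaneous run-queue length.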

We developed a small tool to sample the Digital Unix (DUX) one minute
load average at one second intervals and log the resulting time series
to a data file.  The 1 Hz sample rate was arrived at by subjecting DUX
systems to varying loads and sampling at progressively higher rates to
determine the rate at which DUX actually updated the value.  DUX
updates the value at a rate of $1/2$ Hz; thus we chose a 1 Hz sample
rate.  We ran this trace collection tool on 38 hosts belonging to the
Computing, Media, and Communication Laboratory (CMCL) at CMU and the
Pittsburgh Supercomputing Center (PSC) for slightly more than one week
in late August, 1997.  All of the hosts were DEC Alpha machines and
they form four classes:
\begin{itemize}
\item{{\em Production Cluster}: 13 hosts of the PSC's ``Supercluster'',
including two front-end machines (axpfea, axpfeb), four interactive
machines (axp0 through axp3), and seven batch machines scheduled by a
DQS~\cite{KAPLAN-COMPARE-CLUSTER-SW} variant (axp4 through axp10.)}
\item{{\em Research Cluster}: eight machines in an experimental
cluster in the CMCL (manchester-1 through manchester-8.)} 
\item{{\em Compute servers}: two high performance large memory
machines used by the CMCL group as compute servers for simulations and
the like (mojave and sahara.)}
\item{{\em Desktops}: 15 desktop workstations owned by members of the
CMCL (aphrodite through zeno.)} 
\end{itemize}


\section{Statistics}
\label{sec:stats}

\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{correl_table.eps}
}
\caption{Correlation coefficients (CCs) between all of the discussed
statistical properties.  The highlighted cells are particularly
interesting.}
\label{fig:cross_correl}
\end{figure}

This section focuses on several statistical properties of the load
traces.  Figure~\ref{fig:cross_correl}, which shows the correlation
coefficients (CCs) between each of the properties discussed in the
paper, may be helpful.  Each cell of the table is the CC between the
row property and the column property, computed over the 38 load
traces.  The highlighted cells are of particular interest.
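For reference, each cell of such a table is an ordinary Pearson correlation coefficient computed over the 38 per-trace values; a minimal sketch (using hypothetical numbers, not our data) is:

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length
    sequences of per-trace statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical per-trace mean loads and standard deviations:
cc = pearson([0.1, 0.5, 1.2, 2.0], [0.3, 0.6, 1.0, 1.5])
```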

One would expect that the mean load on clusters and servers is
typically higher than that on workstation machines, and our
measurements agree.  Figure~\ref{fig:sdev_mean_load} shows the mean
load and the +/- one standard deviation points for each of the traces.
Clearly, our desktop machines are significantly less loaded than the
other machines.  Interestingly, we also see a lack of uniformity
within each class, despite the long duration of the traces.  This is
most clear among the Production Cluster machines, where four machines
seem to be doing most of the work.  This lack of uniformity even over
long time scales shows clear opportunity for load balancing or
resource management systems.


\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{sdev_mean_load.epsf}
}
\caption{Mean load +/- one standard deviation.}
\label{fig:sdev_mean_load}
\end{figure}

\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{cov_mean_load.epsf}
}
\caption{COV of load and mean load.}
\label{fig:cov_mean_load}
\end{figure}

From Figure~\ref{fig:sdev_mean_load} we can also see that desktop
machines have smaller standard deviations than the other machines.
What is actually happening is that the standard deviation, which shows
how much load varies in {\em absolute} terms, is reasonably strongly
correlated with mean load (CC=0.53 from
Figure~\ref{fig:cross_correl}.)  However, this is not true in {\em
relative} terms as measured by the coefficient of variation (the
standard deviation divided by the mean, abbreviated as the COV.) This
can be seen in Figure~\ref{fig:cov_mean_load}, which plots the COV and
the mean load for each of the load traces.  Here we can see that
desktop machines, with their smaller mean loads, have large COVs
compared to the other classes of machines.  The CC between mean load
and the COV of load is -0.49.  As load increases, it varies {\em less}
in relative terms and {\em more} in absolute terms.


\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{min_max_mean_load.epsf}
}
\caption{Minimum, maximum, and mean load.}
\label{fig:min_max_mean_load}
\end{figure}

\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{max_to_mean_load.epsf}
}
\caption{Maximum to mean load ratios and mean load.}
\label{fig:max_to_mean_load}
\end{figure}

This difference between absolute and relative behavior also holds true
for the maximum load.  Figure~\ref{fig:min_max_mean_load} shows the
minimum, maximum, and mean load for each of the traces.  The minimum
load is, not surprisingly, zero in every case.  Although it is hard to
tell from the graph, maximum load is actually correlated to mean load
(CC=0.60.)  Figure~\ref{fig:max_to_mean_load} plots the ratio
max/mean and the mean load for each of the traces.  It is clear that
this relative measure is inversely related to mean load, and
Figure~\ref{fig:cross_correl} shows that the CC is -0.36.  It is also
important to notice that while the differences in maximum load between
the hosts are rather small (Figure~\ref{fig:min_max_mean_load}), the
differences in the max/mean ratio can be quite large
(Figure~\ref{fig:max_to_mean_load}.)  In relative terms, then, desktop
machines clearly exhibit the most surprising load behavior.

The implication of these differences between relative and absolute
measures of variability is that lightly loaded (low mean load) hosts
are not always preferable over heavily loaded hosts.  For example, if
the performance metric is itself a relative one (that the execution
time not vary much relative to the mean execution time, say), then a more
heavily loaded host may be preferable.

\section{Self-similarity}
\label{sec:self-sim}

Each of the load traces exhibits a high degree of self-similarity.
This is significant for two reasons.  First, it means that load varies
significantly across all time-scales --- it is not the case that
increasing smoothing of the load quickly tames its variance. A job
will have a great deal of variance in its running time regardless of
how long it is.  Second, it suggests that load is difficult to model
and forecast well.  In particular, self-similarity is indicative of
long memory, possibly non-stationary stochastic processes, and fitting
such models to data typically requires a large number of parameters.

Intuitively, a self-similar signal is one that looks similar on
different time scales.  If a self-similar signal looks jagged over a
10 second time scale, it will also look jagged over a 1 second or 100
second time scale.  Another way of visualizing this is to imagine
smoothing the signal with an increasingly long window.  While the
smoothing will quickly blur small variations in the signal, the signal
will strongly resist becoming uniform (constant) with increasing
window size.  Statistically, the variance in increasingly smoothed
versions of a series resulting from sampling the signal will only
slowly decline.  Contrast this with a series of independent samples
drawn from a typical stationary probability distribution: the variance
of such a series will decline quickly with smoothing.
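This contrast is easy to demonstrate numerically.  In the sketch below (ours, for illustration), aggregating an independent Gaussian series over windows of length $m$ shrinks its variance roughly as $1/m$; a self-similar series with $H>0.5$ would instead decay only as $m^{2H-2}$:

```python
import random

def aggregate(series, m):
    """Means over non-overlapping windows of length m (the 'smoothed' series)."""
    return [sum(series[i:i + m]) / m
            for i in range(0, len(series) - m + 1, m)]

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

random.seed(0)
iid = [random.gauss(0.0, 1.0) for _ in range(100000)]
v1 = variance(iid)                    # roughly 1
v100 = variance(aggregate(iid, 100))  # roughly 1/100: fast decline
```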

Self-similarity is more than intuition --- it is a well-defined
mathematical statement about the relationship of the autocorrelation
functions of increasingly smoothed versions of certain kinds of
long-memory stochastic processes.  These stochastic processes model
the sort of mechanisms that give rise to self-similar signals.  We
shall avoid a mathematical treatment here, but interested readers may
want to consult~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93}
or~\cite{IMPACT-SELF-SIM-NETWORK-PERF-ANALYSIS-MORIN} for a treatment
in the context of networking
or~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED} for its connection to
fractal geometry.


The degree and nature of the self-similarity of a sequence is
summarized by the Hurst parameter, $H$, which ranges from zero to
one~\cite{HURST51}. $H=0.5$ indicates no self-similarity and is
expected of a series generated from a stationary probability
distribution where a sample is completely independent of all previous
samples.  $H>0.5$ indicates self-similarity with positive near
neighbor correlation, while $H<0.5$ indicates self-similarity with
negative near neighbor correlation. 



%The idea of self-similarity and the statistic used to characterize it
%arose not in theory, but from the experimental physical sciences and
%engineering.  Hurst~\cite{HURST51} invented the notion and the
%statistic (now called the ``Hurst parameter'') in order to
%characterize how the amount of water in reservoirs changes over time.
%He then applied it to summarize many other data series arising from
%natural processes.  The Hurst parameter $H$ ranges from zero to one.
%$H=0.5$ indicates no self-similarity and is expected of a series
%generated from a stationary probability distribution where a sample is
%completely independent of all previous samples.  $H>0.5$ indicates
%self-similarity with positive near neighbor correlation, while $H<0.5$
%indicates self-similarity with negative near neighbor correlation.  


\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{hurst_mean.epsf}
}
\caption{Mean Hurst parameter estimates for traces +/- one standard
deviation. }
\label{fig:load_trace_self_sim_mean}
\end{figure}


We examined each of the 38 load traces for self-similarity and
estimated each one's Hurst parameter.  Unfortunately, there is no
consensus on how to best estimate the Hurst parameter of a series.
The most common technique is to use several Hurst parameter estimators
and try to find agreement among them.
%~\footnote{Another approach is to
%use a maximum likelihood estimator which makes assumptions about the
%underlying process, but can then provide confidence intervals.}  
%This is the approach we took in our analysis of the load traces.  
The four Hurst parameter estimators we used were R/S analysis, the
variance-time method, dispersional analysis, and power spectral
analysis.  A description of these estimators as well as several others
may be found in~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED}.  
{\em The full paper will describe these and MLE-based estimators here.}

%We
%implemented R/S analysis and the variance-time method using Matlab and
%performed dispersional analysis and power spectral analysis by hand on
%graphs prepared via Matlab.  We validated each method by examining
%degenerate series with known $H$ and series with specific $H$
%generated using the random midpoint displacement method.  The
%dispersional analysis method was found to be rather weak for $H$
%values less than about $0.8$ and the power spectral analysis method
%gave the most consistent results.


Figure~\ref{fig:load_trace_self_sim_mean} presents our estimates of
the Hurst parameters of each of the 38 load traces.  In the graph,
each central point is the mean of the four estimates, while
the outlying points are at +/- one standard deviation.  Notice that
for small $H$ values, the standard deviation is high due to the
inaccuracy of dispersional analysis.  The mean Hurst estimates are all
significantly above the $H=0.5$ line and except for three traces with
low $H$ values, their +/- one standard deviation points are also above
the line.

The traces exhibit self-similarity with Hurst parameters ranging from
0.63 to 0.97, with a strong bias toward the top of that range.  On
examination of the correlation coefficients (CCs) of
Figure~\ref{fig:cross_correl}, we can see that the Hurst parameter is
somewhat correlated with mean load (CC=0.45) and with the standard
deviation of load (CC=0.58), and is inversely correlated with the
max/mean load ratio (CC=-0.49).  The latter seems somewhat surprising.

As discussed above, self-similarity has implications for load modeling
and for load smoothing.  The long memory stochastic process models
that can capture self-similarity tend to have many parameters and are
difficult to fit to data.  Smoothing the load (mapping large units of
computation instead of small units) may be misguided since variance
may not decline with increasing smoothing intervals as quickly as
expected.

\section{Epochal behavior}
\label{sec:epochal_behavior}

The observation we make in this section is that while load changes
in complex ways, the manner in which it changes remains relatively
constant for relatively long periods of time.  We refer to a period of
time in which this stability holds true as an epoch.  For example, the
load signal could be a 0.25 Hz sine wave for one minute and a 0.125 Hz
sawtooth wave the next minute --- each minute is an epoch.  That these
epochs exist and are long is significant because it suggests that
modeling load can be simplified by modeling epochs separately from
modeling the behavior within an epoch.  Similarly, it suggests a two
stage forecasting process.

The spectrogram representation of a load trace immediately highlights
the epochal behavior we discuss in this section. A spectrogram
combines the frequency domain and time domain representations of a
time series.  It shows how the frequency domain changes locally (for a
small segment of the signal) over time.  For our purposes, this local
frequency domain information is the ``manner in which [the load]
changes'' to which we referred earlier.  To form a spectrogram, we
slide a window of length $w$ over the series, and at each offset $k$,
we Fourier-transform the $w$ elements in the window to give us $w$
complex Fourier coefficients.  Since our load series is real-valued,
only the first $w/2$ of these coefficients are needed.  We form a plot
where the $x$ coordinate is the offset $k$, the $y$ coordinate is the
coefficient number, $1,2,\dots,w/2$ and the $z$ coordinate is the
magnitude of the coefficient.  To simplify presentation, we collapse
to two dimensions by mapping the logarithm of the $z$ coordinate (the
magnitude of the coefficient) to color.  The spectrogram is basically
a midpoint in the tradeoff between purely time-domain or
frequency-domain representations.  Along the $x$ axis we see the
effects of time and along the $y$ axis we see the effects of
frequency.
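Concretely, the construction just described can be sketched as follows (our illustration; the figures in the paper were produced with different tooling and a window length appropriate to 1 Hz data):

```python
import cmath
import math

def spectrogram(series, w=64, step=None):
    """Slide a length-w window over the series and return, for each
    offset, the magnitudes of the first w/2 DFT coefficients (for a
    real-valued series the remaining coefficients are redundant).
    Plotting log-magnitude as color against offset and coefficient
    number gives the spectrogram."""
    step = step or w  # non-overlapping windows by default
    spectra = []
    for k in range(0, len(series) - w + 1, step):
        window = series[k:k + w]
        row = [abs(sum(window[n] * cmath.exp(-2j * math.pi * f * n / w)
                       for n in range(w)))
               for f in range(w // 2)]
        spectra.append(row)
    return spectra
```

A pure sine completing four cycles per window produces a single bright horizontal line at coefficient 4; stable local frequency content shows up as the wide vertical bands discussed below.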

\begin{figure}
\begin{tabular}{c}
\epsfxsize=3.1in
\epsfbox{axp7_19_day_time.eps}
\\
\epsfxsize=3in
\epsfbox{axp7_19_day_spec.eps}
\end{tabular}
\caption{Time domain and spectrogram representations of load for host axp7 on August 19, 1997. }
\label{fig:specgrams}
\end{figure}


Figure~\ref{fig:specgrams} shows a representative case, a 24 hour
trace from the PSC host axp7, taken on August 19, 1997.  The top graph
shows the time domain representation, while the bottom graph shows the
corresponding spectrogram representation.\footnote{Notice that there
is evidence of foldover from a too-low sampling rate.  We are
convinced that the problem lies in how often DUX actually updates its
load value and not in our sampling.  One explanation is that DUX does
not update the load average whenever the length of the ready queue
changes.  Another is that it does not update it when running device
drivers or other in-kernel tasks.} What is important to note (and
which occurs in all the spectrograms of all the traces) are the
relatively wide vertical bands.  These indicate that the frequency
domain of the underlying signal stays relatively stable for long
periods of time.  We refer to the width of a band as the duration of
that frequency epoch.

That these epochs exist can be explained by programs executing
different phases, programs being started and shut down, and the like.
The frequency content within an epoch contains energy at higher
frequencies because of events that happen on smaller time-frames, such
as user input, device driver execution, and daemon execution.  

\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{sdev_mean_epoch.epsf}
}
\caption{Mean epoch length +/- one standard deviation.}
\label{fig:sdev_mean_epoch}
\end{figure}


We can algorithmically find the edges of these epochs by computing the
difference in adjacent spectra in the spectrogram and then looking for
those offsets where the differences exceed a threshold.  Specifically,
we compute the sum of mean squared errors for each pair of adjacent
spectra.  The elements of this vector are compared to an epsilon (5\%
here) times the mean of the vector.  Where this threshold is exceeded,
a new epoch is considered to begin.  Having found the epochs, we can
examine their statistics.  Figure~\ref{fig:sdev_mean_epoch} shows the
mean epoch length and the +/- one standard deviation levels for each
of the load traces.  The mean epoch length ranges from about 150
seconds to over 450 seconds, depending on the trace.  The standard
deviations are also relatively high (80 seconds to over 600 seconds.)
It is the Production Cluster class which is clearly different when it
comes to epoch length.  The machines in this class tend to have much
higher means and standard deviations than the other machines.  One
explanation might be that most of the machines run batch-scheduled
scientific jobs which may well have longer computation phases and
running times.  However, two of the interactive machines also exhibit
high means and standard deviations. Interestingly, there is no
correlation of the mean epoch length and standard deviation to the
mean load or to the Hurst parameter, as can be seen in
Figure~\ref{fig:cross_correl}.
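The epoch-finding step described at the start of this section can be sketched as follows (the 5\% threshold follows the text; everything else is our illustrative reconstruction):

```python
def epoch_boundaries(spectra, eps=0.05):
    """Given the successive local spectra of a spectrogram, start a new
    epoch wherever the mean squared difference between adjacent spectra
    exceeds eps times the mean of all such differences."""
    diffs = [sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
             for s, t in zip(spectra, spectra[1:])]
    if not diffs:
        return []
    threshold = eps * (sum(diffs) / len(diffs))
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]
```

Epoch lengths are then simply the gaps between successive boundaries, plus the segments at either end of the trace.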

\begin{figure}
\centerline{
\epsfxsize=3.1in
\epsfbox{cov_mean_epoch.epsf}
}
\caption{COV of epoch length and mean epoch length.}
\label{fig:cov_mean_epoch}
\end{figure}

The standard deviations of epoch length in
Figure~\ref{fig:sdev_mean_epoch} give us an absolute measure of the
variance of epoch length.  Figure~\ref{fig:cov_mean_epoch} shows the
coefficient of variation (COV) of epoch length and mean epoch length
for each trace.  The COV is our relative measure of epoch length
variance.  Unlike with load (Section~\ref{sec:stats}), these absolute
and relative measures of epoch length variance are {\em both}
positively correlated with the mean epoch length.  In addition, the
correlation is especially strong (CC=0.99 for standard deviation and
CC=0.95 for COV).  As epoch length increases, it varies more in both
absolute and relative terms.  The statistical properties of epoch
lengths are independent of the statistical properties of load.

The implication of long epoch lengths is that forecasting load may be
partitionable into the segmentation problem of finding the epochs and a
hopefully simpler time series forecasting problem within each epoch.


\subsubsection*{Lack of seasonality}

It is important to note that the epochal behavior of the load traces
is not the same thing as seasonality.  Seasonality means that there is
a dominant (or at least visible) underlying periodic signal on top of
which is layered other signals
(See~\cite{INTRO-TIME-SERIES-FORECASTING}.)  It is certainly plausible
that load could have a seasonal component.  Possible sources of
seasonality include the daily behavior of human users and program
structures such as loops.  However, examination of the power spectra
and autocorrelations of the load traces suggests that load does {\em
not} exhibit seasonality.  The extreme variability of epoch lengths
also attests to the lack of seasonality.  It is perhaps not so
surprising that this is so in light of the self-similarity
results of Section~\ref{sec:self-sim}.  


\section{Conclusions}

Relative and absolute measures of the variability of load are related
differently with mean load.  This implies that lightly loaded (low
mean load) hosts are not always preferable over heavily loaded hosts.
For example, if the performance metric is itself a relative one (that
the execution time not vary much relative to the mean execution time,
say), then a more heavily loaded host may be preferable.  

Load is self-similar, which means that it behaves in complex ways on
all timescales.  The long-memory stochastic process models that can
capture this property are difficult to fit to data and tend to have
many parameters.  
%Further, the structure of many such models has
%little obvious connection to the physical processes which result in
%the load signal. Models which aggregate ``traffic'' from sources with
%heavy tailed distributions, which seem physically intuitive for
%modeling self-similar Ethernet
%traces~\cite{SELF-SIM-HIGH-VAR-ETHERNET-WILLINGER-SIGCOMM95}, may also
%more satisfying in the context of host load.  
Self-similarity also implies that load smoothing by mapping large
units of computation may be misguided.  The variance in load may not
decline as quickly as expected with increasing scheduling granularity.

The local frequency content of the load signal remains quite stable
for long periods of time (epochs) and changes abruptly at epoch
boundaries.  This suggests that the problem of forecasting or
modeling load may be partitionable into a sectioning problem (finding
the epochs) and a hopefully simpler time series forecasting problem
(modeling the behavior of load within each epoch.)

%We are looking at modeling host load in these and other ways.  Our
%interest is not only in the models' ability to generate load traces
%that exhibit the statistical properties we have found, but also in
%their predictive power and how they can provide a framework for
%classifying existing load traces.  The context of our work is
%dynamically mapping real-time tasks to hosts in a distributed
%environment so that they meet their deadlines.  In this context, we
%are interested in load models for classifying environments and hosts,
%synthesizing realistic execution times in a simulator, and for
%predicting execution times on remote hosts.  We will gladly share the
%load traces we have collected with members of the research community.


\small 

\bibliography{texbib}

\normalsize

\end{document}



