\section{Preliminary results: load trace analysis}
\label{sec:load_traces}

This section presents an extensive analysis of a large number of
fine-grain host load traces, which exhibit several properties that
strongly suggest that history-based load prediction is feasible.
These properties include self-similarity, epochal behavior of local
frequency content, and relatively simple phase space behavior.
Self-similarity implies a long-memory process where the future depends
(in part) on a long stretch of the past in a possibly complex way.
This dependence provides hope that history-based prediction can work,
but instills the fear that the relationship between the past and the
future may be too complex to glean.  However, the epochal behavior of
the local frequency content of the load signal suggests that the
relationship may be found tractably.  By epochal behavior of the local
frequency content, we mean that the manner in which the load varies
(the frequency content of the load signal) is relatively stable for
long periods of time (epochs).  Because each epoch is long, we have an
extended opportunity to understand the stable behavior of the load
within it.  That the phase space behavior of load is relatively simple
further suggests that it can be predicted.  In particular, load
appears to spend much of its time in a small number of well
differentiated states, and the manner in which it switches between
states seems to be qualitatively simple.  Other researchers have found
some of these properties in network
measurements~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93,
SELF-SIM-HIGH-VAR-ETHERNET-WILLINGER-SIGCOMM95,GARRETT-VBR-VIDEO-MODELLING}.

We begin by describing our measurement methodology.  Next, we give an
overview of our analysis, and then we discuss in depth the
self-similarity, epochal behavior of local frequency content, and the
phase space behavior of the load traces.

\subsection{Measurement methodology}
\label{sec:load_trace_measurement}

We developed a small tool to sample the Digital Unix one minute load
average at one second intervals and log the time series to a data
file.  The 1 Hz sample rate was arrived at by subjecting DUX
systems to varying loads and sampling at progressively higher rates to
determine the rate at which DUX actually updated the value.  Since DUX
updates the value at a rate of $1/2$ Hz, a 1 Hz sample rate captures
every update.  We ran this trace collection tool on 38 hosts in the
CMCL and PSC environments for slightly more than one week in late
August, 1997.  All of the hosts were DEC Alpha machines and they form
four groups:
\begin{itemize}
\item{{\em PSC Hosts}: 13 hosts of the PSC's ``Supercluster'',
including two front-end machines, four interactive machines, and eight
batch machines scheduled by a DQS~\cite{KAPLAN-COMPARE-CLUSTER-SW}
variant.}
\item{{\em Manchester testbed}: eight machines in an experimental
cluster in the CMCL.} 
\item{{\em Compute servers}: two high performance large memory
machines used by the CMCL group as compute servers for simulations and
the like.}
\item{{\em Desktops}: 15 desktop workstations owned by members of the
CMCL.} 
\end{itemize}
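The collection tool itself is simple.  The original ran on Digital
Unix; the following Python sketch illustrates the same idea (it is not
the actual tool), using \texttt{os.getloadavg()} as a stand-in for the
DUX one-minute load average:

```python
import os
import time

def sample_load(n_samples, interval=1.0):
    """Sample the host's one-minute load average n_samples times,
    every `interval` seconds, returning the time series."""
    series = []
    for _ in range(n_samples):
        # getloadavg() returns the (1, 5, 15)-minute load averages
        series.append(os.getloadavg()[0])
        time.sleep(interval)
    return series
```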

\subsection{Overview of analysis}

The purpose of our analysis of these traces was to determine whether
history-based prediction was feasible for load.  We began by computing
gross statistical properties for each trace, including mean load,
variance, minimum, and maximum.  We do not report on these properties
here due to space constraints.  However, one of these, mean load, is used, along
with mean epoch length (Section~\ref{sec:epochal_behavior}), to
partition the machines into useful groups in
Figure~\ref{fig:host_groups}.  Next we examined the overall power
spectra (periodograms) and autocorrelation functions of the
traces. Each trace did show strong close-range autocorrelation, which
bodes well for prediction, and led us to the more fruitful analyses.
In the remaining sections, we will concentrate on these, namely
self-similarity, spectrogram analysis, and chaotic dynamics.

\subsection{Self-similarity and Hurst parameter estimation}
\label{sec:self-sim}

Intuitively, a self-similar signal is one that looks similar on
different time scales.  If a self-similar signal looks jagged over a
10 second time scale, it will also look jagged over a 1 second or 100
second time scale.  Another way of visualizing this is to imagine
smoothing the signal with an increasingly long window.  While the
smoothing will quickly blur small variations in the signal, the signal
will strongly resist becoming uniform (constant) with increasing window
size.  Statistically, the variance of increasingly smoothed versions
of a series sampled from the signal will decline only slowly.
Contrast this with a series drawn from a typical stationary
probability distribution, whose variance declines quickly with
smoothing.
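This contrast is easy to check numerically.  A small Python sketch
(illustrative only): for an i.i.d. series, the variance of the window
means falls off in proportion to $1/m$ as the window length $m$ grows,
which is precisely the fast decline that a self-similar series
resists:

```python
import random
import statistics

def aggregated_variance(series, m):
    """Variance of the series averaged over non-overlapping
    windows of length m."""
    means = [sum(series[i:i + m]) / m
             for i in range(0, len(series) - m + 1, m)]
    return statistics.pvariance(means)

# an i.i.d. series loses variance quickly under smoothing:
# the ratio v16/v1 is close to 1/16
random.seed(0)
iid = [random.gauss(0, 1) for _ in range(100_000)]
v1 = aggregated_variance(iid, 1)
v16 = aggregated_variance(iid, 16)
```

A self-similar series with $H > 0.5$ would show a markedly larger
ratio, declining as $m^{2H-2}$ rather than as $m^{-1}$.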

Self-similarity is more than intuition --- it is a well defined
mathematical statement about the relationship of the autocorrelation
functions of increasingly smoothed versions of certain kinds of
long-memory stochastic processes.  These stochastic processes model
the sort of mechanisms that give rise to self-similar signals.  We
shall avoid a mathematical treatment here, but interested readers
may want to consult~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93}
or~\cite{IMPACT-SELF-SIM-NETWORK-PERF-ANALYSIS-MORIN} for a treatment
in the context of networking
or~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED} for its connection to
fractal geometry.  What is important is that self-similarity can arise
from a long-memory process.

The idea of self-similarity and the statistic used to characterize it
arose not in theory, but from the experimental physical sciences and
engineering.  Hurst~\cite{HURST51} invented the notion and the
statistic (now called the ``Hurst parameter'') in order to
characterize how the amount of water in reservoirs changes over time.
He then applied it to summarize many other data series arising from
natural processes.  Many natural processes can be modelled in terms of
the long-memory stochastic processes mentioned earlier, with the Hurst
parameter describing the degree of dependence on the past.  The Hurst
parameter $H$ ranges from zero to one.  $H=0.5$ indicates no
self-similarity and is expected of a series generated from a
stationary probability distribution where a sample is completely
independent of all previous samples.  $H>0.5$ indicates
self-similarity with positive near neighbor correlation, while $H<0.5$
indicates self-similarity with negative near neighbor correlation.  It
is important to note that $H \neq 0.5$ does not imply that the future
depends entirely on the past, but rather that it has some dependence
on the past.


\begin{figure}
\centerline{
\epsfxsize=4in
\epsfbox{load_trace_self_sim_mean.epsf}
}
\caption{Mean Hurst parameter estimates for traces $\pm$ one standard
deviation.}
\label{fig:load_trace_self_sim_mean}
\end{figure}

We examined each of the 38 load traces for self-similarity and
estimated each one's Hurst parameter.  Unfortunately, there is no
consensus on how to best estimate the Hurst parameter of a series.
The most common technique is to use several Hurst parameter estimators
and try to find agreement among them.  This is the approach we took in
our analysis of the host load series.  The four Hurst parameter
estimators we used were R/S analysis, the variance-time method,
dispersional analysis, and power spectral analysis.  A description of
these estimators as well as several others may be found
in~\cite{FRACTAL-STRUCTURES-CHAOS-SCI-MED}.  We implemented R/S
analysis and the variance-time method using Matlab and performed
dispersional analysis and power spectral analysis by hand on graphs
prepared via Matlab.  We validated each method by examining degenerate
series with known $H$ and series with specific $H$ generated using the
random midpoint displacement method.  The dispersional analysis method
was found to be rather weak for $H$ values less than about $0.8$ and
the power spectral analysis method gave the most consistent results.
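As an illustration of the simplest of these, the variance-time method
exploits the fact that for a self-similar series the variance of the
$m$-aggregated series scales as $m^{2H-2}$, so fitting
$\log \mathrm{var}$ against $\log m$ yields $H$ from the slope.  A
Python sketch (our actual implementation was in Matlab):

```python
import math
import random

def hurst_variance_time(series, window_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter by the variance-time method:
    fit log(var of the m-aggregated series) against log(m);
    the slope s gives H = 1 + s/2."""
    xs, ys = [], []
    for m in window_sizes:
        # aggregate over non-overlapping windows of length m
        means = [sum(series[i:i + m]) / m
                 for i in range(0, len(series) - m + 1, m)]
        mu = sum(means) / len(means)
        var = sum((x - mu) ** 2 for x in means) / len(means)
        xs.append(math.log(m))
        ys.append(math.log(var))
    # ordinary least-squares slope of the log-log plot
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
             sum((x - mx) ** 2 for x in xs))
    return 1.0 + slope / 2.0

# sanity check: i.i.d. Gaussian noise should give H near 0.5
random.seed(1)
white = [random.gauss(0, 1) for _ in range(65536)]
h_iid = hurst_variance_time(white)
```

This is also how we validated the estimators: degenerate series with
known $H$ should be recovered, as white noise is here.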

Figure~\ref{fig:load_trace_self_sim_mean} presents our estimates of
the Hurst parameters of each of the 38 load traces.  In the graph,
each central point is the mean of the four estimators' values, while
the outlying points lie at $\pm$ one standard deviation.  Notice that
for small $H$ values, the standard deviation is high due to the
inaccuracy of dispersional analysis.

The traces exhibit self-similarity with Hurst parameters ranging from
0.63 to 0.97, with a strong bias toward the top of that range.  This
is a promising result for prediction because it strongly suggests that
future load values have a strong, albeit complex, dependence on past
load values.  This is not surprising for relatively quiescent
machines, but it also makes sense for heavily loaded machines such as
the PSC hosts, which spend long periods of time executing particular
program phases, then different programs, and so on at coarser and
coarser timescales.


%\begin{figure}
%\centerline{
%\epsfxsize=4in
%\epsfbox{load_trace_self_sim_all.epsf}
%}
%\caption{Hurst parameter estimates for traces.}
%\label{fig:load_trace_self_sim_all}
%\end{figure}



%In the following, we have adapted the standard mathematical treatment
%of self-similarity given in~\cite{SELF-SIM-ETHERNET-LELAND-SIGCOMM93}
%in the context of formal stochastic processes to the context of a load
%time series, $\langle l_i \rangle$, albeit one with infinite extent.

%The load time series $\langle l_i \rangle$'s corresponding autocorrelation series
%$\langle r_k \rangle$ is given by
%\begin{displaymath}
%r_k = \sum_{j=-\infty,\dots,-1,0,+1,\dots,+\infty} l_j l_{j+k}
%\end{displaymath}
%For many series, including our load signals, $\langle r_k \rangle$ has roughly the
%form
%\begin{displaymath}
%r_k \propto a_1 k^{-\beta}, \mathrm{as} k \rightarrow \infty
%\end{displaymath}
%where $\beta > 0$ and $a_1$ is a positive constant.  

%Now form the aggregated series $\langle l_i \rangle^{(m)}$ where $\langle l_i \rangle^{(1)}$ is
%the same as the original series and 
%\begin{displaymath}
%l_i^{(m)} = 1/m \sum_{k=im}^{im+m-1} l_i^{(1)} \mathrm{for} m=2,3,\dots
%\end{displaymath}
%$\langle l_i \rangle^{(m)}$ is
%simply the original series averaged over non-overlapping windows of
%length $m$.  For each $m$, $\langle l_i \rangle^{(m)}$ has a corresponding
%autocorrelation series $\langle r_k \rangle^{(m)}$ which takes the form $r_k^{(m)} \propto
%a_m k^{-\beta_m}$ as $k \rightarrow \infty$.  

%The series $\langle l_i \rangle$ is called exactly second-order self-similar if
%$r_k^{(m)} = r_k$ for all $k$ and $m$ and asymptotically second-order
%self-similar if $r_k^{(m)}$ agrees with $r_k$ for large $m$ and $k$.
%In either case, we will see $\beta_m = \beta$ in our power-law
%approximation to the form of the autocorrelation series. 



%\begin{figure}
%\centerline{
%\epsfxsize=4in
%\epsfbox{machine_mean_load_vs_mean_epoch.epsf}
%}
%\label{fig:machine_mean_load_vs_mean_epoch}
%\caption{Mean load vs mean epoch length}
%\end{figure}

\subsection{Epochal behavior of local frequency content}
\label{sec:epochal_behavior}

The observation we make in this section is that while the load changes
in complex ways, the manner in which it changes remains relatively
constant for relatively long periods of time.  We refer to a period of
time in which this stability holds true as an epoch.  For example, the
load signal could be a 0.25 Hz sine wave for a minute and a 0.125 Hz
sawtooth wave the next minute --- each minute is an epoch.  That these
epochs are long is what is important.  Long epochs are preferable to
short epochs since there is more time in a long epoch to learn its
behavior and use that knowledge to predict its future.  The mean epoch
lengths of our load series range from 150 to over 450 seconds, which
means we may be able to take thousands of samples of a particular epoch
given the response times we require of our interactive applications
(Section~\ref{sec:app_model}).

This observation ties in nicely with the self-similarity results of the
previous section.  Those results support the feasibility of
history-based prediction, but also raise concerns about its
practicality.  Could the long-memory process behavior implied by the
result in fact be too complex to usefully predict?  Could long past
values affect current values so strongly that the history we would
have to keep would be so long and the computation from it so expensive
that prediction would become impractical?  That the epochs we find are
so long suggests that this is not the case and that history-based
prediction is tractable.

A spectrogram representation of a load trace immediately highlights
the epochal behavior we discuss in this section. A spectrogram
combines the frequency domain and time domain representations of a
time series.  It shows how the frequency domain changes locally (for a
small segment of the signal) over time.  For our purposes, this local
frequency domain information is the ``manner in which [the load]
changes'' to which we referred earlier.  To form a spectrogram, we
slide a window of length $w$ over the series, and at each offset $k$,
we Fourier-transform the $w$ elements in the window to give us $w$
complex Fourier coefficients.  Since our load series is real-valued,
only the first $w/2$ of these coefficients are needed.  We form a plot
where the $x$ coordinate is the offset $k$, the $y$ coordinate is the
coefficient number, $1,2,\dots,w/2$ and the $z$ coordinate is the
magnitude of the coefficient.  To simplify presentation, we collapse
to two dimensions by mapping the logarithm of the $z$ coordinate (the
magnitude of the coefficient) to color.  The spectrogram is basically
a midpoint in the tradeoff between purely time-domain or
frequency-domain representations.  Along the $x$ axis we see the
effects of time and along the $y$ axis we see the effects of
frequency.
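The construction just described can be sketched directly (a
pure-Python DFT for clarity; any FFT routine would serve, and our own
spectrograms were produced in Matlab):

```python
import cmath

def spectrogram(series, w, step=1):
    """Slide a window of length w across the series; at each offset k,
    take the magnitudes of the first w//2 DFT coefficients of the
    windowed segment (enough for a real-valued signal)."""
    columns = []
    for k in range(0, len(series) - w + 1, step):
        seg = series[k:k + w]
        mags = [abs(sum(seg[n] * cmath.exp(-2j * cmath.pi * f * n / w)
                        for n in range(w)))
                for f in range(w // 2)]
        columns.append(mags)
    return columns  # columns[k][f] = |coefficient f| at offset k
```

In the plots, offset maps to the $x$ axis, coefficient number to the
$y$ axis, and the logarithm of the magnitude to color, as described
above.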

\begin{figure}
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{axp7_19_day_time.eps}
&
\epsfxsize=3in
\epsfbox{themis_20_2pm_time.eps} \\
\epsfxsize=3in
\epsfbox{axp7_19_day_spec.eps}
&
\epsfxsize=3in
\epsfbox{themis_20_2pm_spec.eps} \\
axp7, August 19, 1997 (24 hours)  &  themis, August 20, 1997 2pm (one hour)
\end{tabular}
\caption{Time domain and spectrogram representations of load.}
\label{fig:specgrams}
\end{figure}

Figure~\ref{fig:specgrams} shows the time domains (top) and
spectrograms (bottom) of two series.  The series on the left is a 24
hour trace on a PSC host, axp7, taken on August 19th and the series on
the right is a one hour trace on a desktop, themis, taken from 2-3pm
on August 20th.  We examined the spectrograms of all the load traces
both visually and analytically.  These two are presented as
representative cases.  Also note that there is evidence of foldover
(aliasing) from an insufficient sampling rate in these signals.  We are convinced that
the problem lies in how often DUX actually updates its load value and
not in our sampling.  One explanation is that DUX does not update the
load average whenever the length of the ready queue changes.  Another
is that it does not update it when running device drivers or other
in-kernel tasks.

What is important to note about all of the spectrograms, and is
readily apparent in each of these examples, is the relatively wide
vertical bands.  These indicate that the frequency domain of the
underlying signal stays relatively the same for long periods of time.
We refer to the width of a band as the duration of that frequency
epoch. 

That these epochs exist can be explained by programs executing
different phases, programs being started and shut down, and the like.
The frequency content within an epoch contains energy at higher
frequencies because of events that happen on smaller time-frames, such
as user input, device driver execution, and daemon execution.  

We can algorithmically find the edges of these epochs by computing the
difference in adjacent spectra in the spectrogram and then looking for
those offsets where the differences exceed a threshold.  Having found
the epochs, we can examine their statistics.  The mean epoch length
ranges from about 150 seconds to over 450 seconds, depending on the
trace.  The standard deviations are also relatively high (80 to over
600 seconds).
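The edge-finding procedure can be sketched as follows (illustrative:
the Euclidean distance metric and the threshold value here are
assumptions, not necessarily the choices used in our analysis):

```python
import cmath

def local_spectrum(seg):
    """Magnitudes of the first len(seg)//2 DFT coefficients."""
    w = len(seg)
    return [abs(sum(seg[n] * cmath.exp(-2j * cmath.pi * f * n / w)
                    for n in range(w)))
            for f in range(w // 2)]

def epoch_boundaries(series, w, step, threshold):
    """Offsets at which adjacent window spectra differ (in Euclidean
    distance) by more than the threshold: candidate epoch edges."""
    offsets = list(range(0, len(series) - w + 1, step))
    spectra = [local_spectrum(series[k:k + w]) for k in offsets]
    edges = []
    for i in range(1, len(spectra)):
        d = sum((a - b) ** 2
                for a, b in zip(spectra[i], spectra[i - 1])) ** 0.5
        if d > threshold:
            edges.append(offsets[i])
    return edges
```

Epoch lengths are then simply the gaps between successive edges, from
which the means and standard deviations above follow.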

These long periods of relative stability are periods in which it is
likely that a history-based prediction algorithm can collect enough
information to characterize the stable frequency domain and make
useful predictions.  Also, because the epochs are long, we expect that
after the prediction algorithm has converged, there will remain enough
time to make use of its predictions before it needs to be retrained
for the next epoch.  Detecting the start of a new epoch quickly is an
important problem.  Querying a companion management information
database or registry that keeps track of the coarse grain behavior of
programs, such as when new programs enter the environment and when new
program phases begin, may be one way to address it.

%The behavior also suggests that a meta algorithm
%may be necessary, which can detect the start of a new epoch quickly.


\subsection{Phase space behavior}
\label{sec:phase_space}

\begin{figure}
\begin{tabular}{cc}
\epsfxsize=3in
\epsfbox{axp7.phase.epsf}
&
\epsfxsize=3in
\epsfbox{themis.phase.epsf}
\end{tabular}
\caption{Phase space behavior.}
\label{fig:phase_space}
\end{figure}

An initial examination of the phase space behavior~\footnote{Here we
adopt the terminology of chaotic dynamics, where phase is the
continuous equivalent to state.  This should not be confused with the
phase component of the Fourier transform of a signal.} of several of
the load traces we have collected suggests that they arise from
relatively simple chaotic systems.  Plots of the systems' trajectories
through their phase space suggest that the systems spend most of their
time in a small number of states.  Animations show that the patterns
of transitions between states are often relatively simple.  While the
analysis presented here is far less detailed than for the previous
sections, it nonetheless strongly supports that load can be predicted
using history-based methods.  Intuitively, the fewer the number of
states the system has and the simpler its state transitions, the
easier it is to predict.

In chaotic dynamics, a basic tool is a plot of the trajectory of the
system through its phase space~\cite{CHAOTIC-FRACTAL-DYNAMICS-MOON}.
The dimensions of a phase space are the values in the system that
change over time. If the system being examined is chaotic, meaning
that small differences in the initial position in the phase space
result in large differences in the trajectory, then the trajectory
will tend to be quite complex and never exactly repeat itself,
although it may be self-similar.  A nonchaotic system will present a far
simpler trajectory that repeats itself.  Although the trajectories of
chaotic systems are inherently more complex, they can often be
characterized by ``strange attractors'' about which the trajectory
orbits.

Trajectories are easier to see in a multidimensional space, but our
datasets are unfortunately one dimensional.  To form a two dimensional
space, we plot the load time series against itself with a single unit
delay; that is, we plot $l_2,l_3,\dots$ versus $l_1, l_2,\dots$.  The
series $\langle (l_i,l_{i+1}) \rangle$ forms the trajectory through
the space.  This technique is known as the pseudo-phase space
method~\cite{CHAOTIC-FRACTAL-DYNAMICS-MOON}.  Phase space plots for
two load traces are given in Figure~\ref{fig:phase_space}.
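Constructing the embedding is a one-liner; a sketch:

```python
def pseudo_phase_space(series, delay=1):
    """Two-dimensional delay embedding: pair each sample with the
    sample `delay` steps later, forming the trajectory
    <(l_i, l_{i+delay})>."""
    return list(zip(series, series[delay:]))
```

Plotting the resulting pairs gives figures like those in
Figure~\ref{fig:phase_space}; larger delays or higher-dimensional
embeddings are straightforward variations.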

Another way of looking at these plots is in terms of how well the load
at time $t$ predicts the load at time $t+1$.  For a perfect prediction,
we would expect to see a single line of unity slope and zero
intercept.  For a linear relationship, we would expect to see
a single line of some other slope and intercept.  Instead, we see
several lines and a number of scattered points.  The vast majority of
the data is in the lines.  Clearly, if we knew each line's slope and
intercept, which line we were on, and that we would remain on it
during the measurement of the next load value, we could make a perfect
prediction of the next load value from the current one.  Prediction
would then involve finding the lines, which does not appear to be
especially difficult, and determining when the system moves from one
line to another.
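Such a line-based predictor might be sketched as follows (the line set
and the classification rule are illustrative assumptions, not a fitted
model; in particular, it assumes the system stays on the same line for
one more step):

```python
def predict_next(prev, cur, lines):
    """Classify the current state as the line (slope, intercept) that
    best explains the observed step prev -> cur, then extrapolate one
    step along that line, assuming the system remains on it."""
    slope, intercept = min(
        lines, key=lambda ab: abs(cur - (ab[0] * prev + ab[1])))
    return slope * cur + intercept
```

Detecting when the system leaves one line for another, as noted above,
is the part this sketch does not address.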

The lines represent the few states the system spends the majority of
its time in. The state transitions the system makes are quite clear
when we animate its trajectory through the phase space.\footnote{To
see an example animation, cd to /afs/cs/usr/pdinda/public and run
demo.csh.  You will need matlab. }  The trajectory forms clear
geometric figures, mostly triangles, whose vertices are on the lines
we noted earlier.  The state transitions, and the movement of the
geometric figures into which they aggregate, appear eminently predictable.
The centers of mass of these figures could be interpreted to be
strange attractors.  These features suggest strongly that
history-based prediction is feasible for load.







