

\chapter{Introduction}
\label{chap:intro}

Users demand responsiveness from interactive applications such as
scientific visualization tools, image editors, modeling tools based on
physical simulations, and games.  Such applications react to
aperiodically arriving messages that arise from user actions.  In
response to each message, the application program executes a task
whose computation creates visual and aural feedback for the user.
This feedback helps determine the subsequent actions of the user, and
thus subsequent tasks.  The aperiodicity in message arrival is a
result of having a ``human in the control loop.''  To be responsive,
the application must execute the task induced by each message as
quickly as the user reasonably expects.  This timeliness requirement
can be easily expressed as a deadline for the task.  We assume that
the application is resilient in the face of missed deadlines.

Creating applications that were responsive in this sense was once
relatively easy, since the user's machine was highly predictable from
the programmer's point of view.  Today, however, the user's machine is
becoming increasingly less predictable due to operating system jobs,
daemons, other users' jobs, and the like.  These external factors
cause the computation rate the programmer can expect to vary in
complex ways.  Further, the user's machine may simply not be fast
enough or have enough memory to perform some computations
responsively.

On the other hand, the user's machine is no longer alone---there are
many other hosts on the local area network which it can talk to.  As
the overheads of remote execution facilities such as remote procedure
call (RPC) systems, object request brokers (ORBs), and distributed
shared memory (DSM) systems decline, it becomes more appealing to
execute tasks remotely to achieve the responsiveness that interactive
applications require.  Interactive applications are thus becoming
distributed interactive applications.  Unfortunately, the overwhelming
majority of networks and hosts do not provide any sort of resource
reservations, or even priority-based scheduling that a programmer
could build upon to make a group of hosts behave in a more predictable
fashion than an individual host.  However, these shared, unreserved
environments can be measured with increasingly sophisticated tools.

The ability to run a task on any host in the environment greatly
increases the {\em opportunity} for the task to meet its deadline.
However, to exploit this opportunity, the application must {\em
choose} an appropriate host, which can be difficult.  A real-time
scheduling advisor is a middleware service that advises the application
as to the host where the task's deadline is most likely to be met.  It
may also provide additional information, such as the predicted running
time of the task on the proffered host, which the application can use
to adapt in other ways.  

A real-time scheduling advisor bases its advice on resource
measurements, the application's characterization of the task's
resource demands, and the required deadline.  Because reservations are
unavailable in the computing environment, this advice comes with no
guarantees---a real-time scheduling advisor operates on a best-effort
basis.  The usefulness of this service then depends on its measured
performance.  This dissertation shows that the measured performance of
a real-time scheduling advisor running in real computing environments
can be quite impressive.  The real-time scheduling advisors we
developed can greatly increase the probability that a task's deadline
is met.  Furthermore, they can accurately predict the performance of
tasks {\em before} they are run, thus giving the application a chance
to adapt in different ways when insufficient resources are available
to meet the original deadline.  Finally, they can introduce
appropriate randomness into their scheduling decisions, thus allowing
advisors to operate obliviously of each other with a low probability
of disastrous interaction due to unforeseen feedback loops.

The design space for real-time scheduling advisors is vast, but most
designs involve predicting the performance, either explicitly or
implicitly, of the task on each of the prospective hosts.  The
performance is determined predominantly by resource availability.
Designs that use explicit prediction are sub-divided into
resource-oriented prediction approaches and application-oriented
prediction approaches.  Resource-oriented approaches predict future
resource availability using information available about the resource.
These predictions of resource availability, and the task's resource
demands are then supplied to a model that estimates the task's
performance.  Application-oriented approaches predict task performance
directly using application information such as the performance of
previous tasks.

This dissertation argues for basing real-time scheduling advisors on
explicit resource-oriented prediction, specifically on the prediction
of resource signals.  This approach has much to recommend it over the
application-oriented prediction approach: 
\begin{itemize}
\item It scales better.
\item Multiple applications or nodes of a single application can easily share the same predictions.
\item It operates independently of application execution
and thus can always provide the latest information about any resource.
\item It can provide the basis for other kinds of scheduling advisors and
quality of service predictors.
\item It can more easily leverage advances in the statistical signal processing and time series analysis communities.
\end{itemize}

In contrast to the application-oriented approach, the
resource-oriented approach predicts quantities that are at some remove
from those that concern the application.  This dissertation
demonstrates that it is possible to span this gap between resource
predictions and task performance.  This enables effective real-time
scheduling advisors based on explicit resource-oriented prediction.

The core of the dissertation describes the design, implementation, and
evaluation of a real-time scheduling advisor for compute-bound tasks.
The advisor is based on explicit resource-oriented prediction using
the techniques of linear time series analysis to predict available CPU
time.  The resource signal is host load (specifically, the Digital
Unix five second load average), which we found to correlate very well
with available CPU time.  

Our explicit resource-oriented prediction approach is based on
statistical signal processing of resource signals.  Resource signals
are sequences of periodic measurements that are strongly correlated
with the availability of some underlying resource.  We exploit linear
time series analysis (for the most part) to characterize resource
signals and find appropriate predictors for them.  We developed a
methodology for the process and a toolkit that facilitates carrying
out the methodology and implementing on-line resource prediction
systems for new resource signals.
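To make the flavor of this approach concrete, the following sketch fits a
linear autoregressive predictor to a resource signal by least squares and
uses it for one-step-ahead prediction.  The function names and the pure
AR form are assumptions for illustration only; they are not the RPS
toolkit's interface, which supports a broader family of models:

```python
import numpy as np

def fit_ar(signal, p):
    """Fit an AR(p) model x[t] = a1*x[t-1] + ... + ap*x[t-p]
    to a resource signal by least squares."""
    # Each row holds the p most recent past values for one target sample.
    X = np.column_stack(
        [signal[p - i - 1 : len(signal) - i - 1] for i in range(p)])
    y = signal[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(signal, coeffs):
    """One-step-ahead prediction from the fitted coefficients."""
    p = len(coeffs)
    recent = signal[-1 : -p - 1 : -1]  # x[t-1], x[t-2], ..., x[t-p]
    return float(np.dot(coeffs, recent))
```

In an on-line prediction system, the fit would be refreshed periodically
as new measurements of the signal arrive.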

The main contribution of this dissertation is to show that the
resource-oriented prediction approach can work---that the application
of the resource signal methodology to host load results in useful
predictions that can be projected up to the application in a manner
that is sufficient to drive adaptation decisions.  The projection
takes the form of a query interface through which an application can
request a qualified prediction (in the form of a confidence interval)
for the running time of a task.  This fundamental information is
useful for controlling many different adaptation mechanisms to pursue
many different goals.  We show that it is sufficient to control one
particular mechanism (choice of host) to achieve one particular goal
(meeting a deadline).  Effectively, we show that real-time scheduling
advisors based on explicit resource-oriented prediction using
statistical signal processing are both feasible and powerful.
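The advisor's decision rule can be sketched as follows.  The dictionary
interface, the fallback rule, and the function name are assumptions for
illustration; they are not the advisor's actual API.  Note how the
random choice among qualifying hosts supplies the appropriate randomness
discussed above:

```python
import random

def choose_host(predictions, t_nom, slack):
    """predictions maps each host to a (low, high) confidence interval
    for the task's running time there.  Pick at random among hosts whose
    upper bound meets the deadline; the randomness keeps independent
    advisors from piling onto the same host."""
    bound = (1 + slack) * t_nom
    candidates = [h for h, (low, high) in predictions.items()
                  if high <= bound]
    if candidates:
        return random.choice(candidates)
    # No host is predicted to meet the deadline; fall back to the host
    # with the smallest interval midpoint so the application can adapt.
    return min(predictions, key=lambda h: sum(predictions[h]) / 2)
```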

In the remainder of this chapter, we first describe the application
domain for which real-time scheduling advisors are intended and
provide examples of applications from this domain.  Next, we describe
the characteristics of the computing environment we target, and the
scheduling problem induced by running our applications in such
environments.  After this, we outline the design space for real-time
scheduling advisors, illustrating the advantages and disadvantages of
both the resource-oriented and application-oriented approaches.  The
main observation is that the resource-oriented
approach---specifically, an approach based on resource signal
prediction---is preferable to the application-oriented approach
provided that the gap from resource signal predictions to
application-level predictions can be spanned.  We then outline the
prototype real-time scheduling advisor that forms the core of the
dissertation, and describe the resource signal methodology we used to
develop it, and which we shall apply to signals other than host load
in the future.  Finally, we outline the remaining chapters of the
dissertation.

\section{Applications}
\label{sec:into.apps}

A real-time scheduling advisor operates on the behalf of a distributed
interactive application.  In such applications, computation takes the
form of tasks that are initiated by aperiodic user actions.  Each task
produces feedback for the user which helps determine his next action.
For this reason, the task must be completed in a timely manner soon
after it has been initiated.  The application expresses this
timeliness requirement in the form of a deadline for each task.  The
application is resilient in the face of a missed deadline.  The
real-time scheduling advisor suggests which of a set of hosts is most
appropriate to run the task.  In addition to this required form of
adaptation, the application may also be able to adapt by changing the
compute requirements of the task or the required deadline.

The remainder of this section describes the characteristics of the
applications that this thesis targets and their execution model in
more detail.  In addition, we present four applications that conform
to the characteristics and the model.

\subsection{Characteristics}
\label{sec:intro.app_chars}

The applications we are interested in supporting have the following
characteristics.

\subsubsection{Interactivity} 
The application is interactive---computation takes the form of tasks
that are initiated or guided by a human being who desires
responsiveness.  Achieving responsiveness amounts to providing timely,
consistent, and predictable feedback to individual user actions.  If
the feedback arrives too late or there is too much jitter for a series
of similar actions, the utility of the program is degraded, perhaps
severely.  Research has shown that people have difficulty using an
interactive application that does not respond in this
manner~\cite{EDITOR-RESPONSE-VARIATIONS-EMBLY-81,PSYCH-LIMITS-ON-SYS-RESPONSE-TIME-KOMATSUBARA-97}.
Our mechanism for specifying timely, consistent, predictable feedback
is the task deadline.

\subsubsection{Aperiodicity}
The application's tasks arise from aperiodic user actions.  The
aperiodicity is due to the variable ``think time'' humans need to
decide their next action~\cite{ENDO-LATENCY-OSDI96}.  Aperiodicity
precludes such traditional real-time approaches as rate monotonic
algorithms~\cite{RMS-EDF-SCHEDULING-LIU73}.

\subsubsection{Sequentiality}
The application has only a single task outstanding at any time.  The
user needs the feedback produced by that task to determine his next
action and its resulting task.  We discuss ways of loosening this
restriction in the concluding chapter.

\subsubsection{Resilience} 
The application is resilient in the face of missed deadlines to the
degree that it does not require either statistical or deterministic
guarantees from the real-time system.  The inability to meet a
deadline does not make the application unusable, but merely results in
lowered utility.  For example, occasional missing frames in playing
back video do not make the video performance unacceptable.
Consistently missing or irregular frames, however, result in
unacceptable playback.  Resilience is the characteristic that enables
the best-effort semantics of real-time scheduling advisors as opposed
to traditional ``soft'' (statistically
guaranteed)~\cite{LOTTERY-OSDI,PROB-JOB-SCHED-DIST-RT-BESTAVROS,TIME-DRIVEN-SCHED-MODEL-RTS-JENSEN-85}
semantics and ``hard'' (deterministically guaranteed) real-time
semantics~\cite{HARD-RTS-STANKOVIC-BOOK-88,MARS-SURVEY}.

\subsubsection{Distributability} 
The application has been developed with distributed, possibly
parallel, operation in mind.  We assume that it is possible to execute
its tasks on any of the available hosts using, for example, mechanisms
such as CORBA~\cite{CORBA-23-ARCH-SPEC-TR} or Java
RMI~\cite{JAVA-RMI-SPEC}.  Tasks need not be replicable (stateless),
but any data movement required to execute a task on a particular host
must be exposed, for example, through CORBA's Object By Value
mechanism~\cite[Chapter 5]{CORBA-23-ARCH-SPEC-TR}, or by Java's
serialization and reflection mechanisms.  The core of this
dissertation concentrates on compute-bound tasks.

\subsubsection{Adaptability} 
In addition to the adaptability provided by being able to choose on
which of a set of hosts a task will execute, our target applications
may also be able to indirectly adjust the amount of computation and
communication resources a task requires.  Adjustments such as changing
resolution, image quality, frame rate, and deadline may be needed in
order to deal with situations where a task's deadline cannot be met by
choosing the appropriate host to run it, or when longer term changes
in the available resources result in many tasks missing their
deadlines.

\subsubsection{Compute-bound tasks}
In the core of this dissertation we restrict ourselves to compute-bound
tasks.  This is because the proof-of-concept system that we develop
focuses only on CPU time.  The restriction is not inherent to
real-time scheduling advisors based on explicit resource-oriented
prediction using statistical signal processing.  Our RPS toolkit
already includes facilities for network resource prediction based on
Remos~\cite{REMOS-HPDC98} network measurements, and in the concluding
chapter we present our current work in evaluating the prospects for
network prediction.


\subsection{Execution model}
\label{sec:into.exec_model}

\begin{table}
\centerline{
\begin{tabular}{|l|l|}
\hline
Symbol     & Explanation \\
\hline
$t_{now}$  & Arrival time of the task \\
$t_{nom}$  & Compute requirements of the task (nominal running time) \\
$slack$    & Permitted expansion factor of the task \\
$t_{now}+(1+slack)t_{nom}$ & Deadline of the task \\
$t_{act}$  & The actual running time of the task \\
$t_{now}+t_{act}$ & The actual completion time of the task \\
\hline
\end{tabular}
}
\caption[Elements of the task execution model]
{Elements of the task execution model.}
\label{tab:into.exec_model_elements}
\end{table}

Our model interactive application has a very simple main loop that
waits for aperiodically arriving user input and then issues an
appropriate task.  The task runs to completion, producing feedback to
the user.  After the task completes and the feedback is delivered, the
user may produce other input, resulting in a new task being
initiated.   Table~\ref{tab:into.exec_model_elements} summarizes the
symbols we use in this section to describe the execution model.

From the perspective of the real-time scheduling advisor, a task
arrives at the current time, $t_{now}$.  The application specifies the
compute requirements of the task in terms of its nominal running time,
$t_{nom}$.  The nominal running time is the time the task would take
to run on a host with no other extant work.  We assume that the task
is compute-bound and that the communication time required to start the
task running on a particular host is negligible compared to the
nominal time of the task.  In the concluding chapter, we consider how
to extend the resource-oriented approach to include communication
costs where this is not the case.  The application expresses the
deadline of the task in the form of the $slack$, or the maximum
additional expansion of the running time of the task.  Because the
task must be immediately executed, the deadline is then
$t_{now}+(1+slack)t_{nom}$.  The actual running time of the task
is $t_{act}$, so its completion time is $t_{now}+t_{act}$.  Thus, the
deadline is met if $t_{act} \leq (1+slack)t_{nom}$.  
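The arithmetic of the execution model is simple enough to restate
directly; this sketch merely encodes the definitions in
Table~\ref{tab:into.exec_model_elements} (the function names are ours):

```python
def deadline(t_now, t_nom, slack):
    """Deadline of a task arriving at t_now with nominal running time
    t_nom and permitted expansion factor slack."""
    return t_now + (1 + slack) * t_nom

def deadline_met(t_act, t_nom, slack):
    """The deadline is met iff t_act <= (1 + slack) * t_nom."""
    return t_act <= (1 + slack) * t_nom
```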


\subsection{Examples}

In this section, we introduce four examples of applications which
exhibit the characteristics we described earlier and which could be
executed according to the execution model.  The examples include
QuakeViz, a scientific visualization tool, OpenMap, a geographic
information services tool, an acoustic modeling tool, and an image
editor.  Of these, we consider QuakeViz in the most detail, including
measurements of the compute requirements of two QuakeViz applications
and a description of the active frame execution environment which we
have helped to develop for such applications.  OpenMap is
representative of applications which use replicated servers.  Other
examples include the mirroring and anycast of web
content~\cite{PERF-CHAR-MIRRORS-INFOCOM99}, and distributed database
systems~\cite{GIFFORD-QUORUM-CONSENSUS-79,GRAY-TP-BOOK}. The acoustic
modeling application is a computer aided design application based on a
physical simulation and can also be seen as a computational steering
application.  Image editing is typical of the large document editing
applications common on personal computers.

The purpose of this section is to illustrate applications in which a
real-time scheduling advisor can provide benefits.  The focus of the
thesis, however, is not on any particular application, but rather
using an explicit resource-oriented prediction-based approach to
solving the general scheduling problem posed by real-time scheduling
advisors.  Our evaluation does not focus on any particular
application, but is based on randomized tasks whose nominal compute
times are chosen to be within the range of interest for the QuakeViz
tasks we describe here.

\subsubsection{QuakeViz}
\label{sec:intro.quakeviz}


The Quake project developed tools to perform detailed simulations of
large geographic areas during strong
earthquakes~\cite{QUAKE-JOURNAL-98} and then applied those tools to
simulate earthquakes in various earthquake prone areas.  These
simulations produce vast amounts of output data.  For example, to
simulate the response of the San Fernando Valley to an aftershock of
the 1994 Northridge Earthquake required a data representation
containing 77 million tetrahedrons, produced over 40 million unknowns
per time step, and resulted in over 6 TB of output data.

Interactive scientific
visualization~\cite{FOCUS-ON-SCIENTIFIC-VISUALIZATION-BOOK} is
necessary for humans to make use of these colossal datasets.
Unfortunately, current visualization systems for
such datasets either require extremely expensive hardware, or are
batch-oriented.  To remedy this situation, the Dv group, a part of the
Quake project which includes this author, is designing and building a
framework for constructing interactive scientific visualizations of
large datasets that can run on shared, unreserved distributed
computing environments~\cite{DV-PRELIM-REPORT-PDPTA99}.
Visualizations of datasets produced by the Quake Project, or QuakeViz
applications, are the first target of the Dv framework.

The Dv framework is based on the active frame model.  Active frames
are a form of active messages~\cite{ACTIVE-MESSAGES} in that they
contain both data and a program for transforming that data.  A
QuakeViz application can be expressed as a flowgraph whose nodes
represent computationally expensive data transformations and whose
edges represent communication.  The flowgraph source is the server
which provides the Quake dataset and its sink is the user's
workstation.  Typically, the flowgraph is linear.  The user initiates
computation by sending an active frame to the server.  The active
frame contains the flowgraph of the computation the user requires, a
specification of the region of the dataset the user is interested in,
and a deadline for when the result must be displayed.  The first node
of the flowgraph executes on the server and copies the necessary data
into the active frame.  The frame then sends itself to the most
appropriate host to execute the next node in the flowgraph.  The frame
uses the real-time scheduling advisor to decide which host is the most
appropriate.  This continues until the last node of the flowgraph has
been executed and the result has been displayed on the user's
workstation. 
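The frame's traversal of a linear flowgraph can be sketched as follows.
The function name, the host-pinning convention, and the callback
interface are invented for illustration and do not reflect the Dv
implementation:

```python
def run_flowgraph(stages, data, pick_host, hosts):
    """stages: list of (name, transform) pairs forming a linear
    flowgraph.  The first stage is pinned to the server holding the
    dataset and the last to the user's workstation; intermediate
    stages go wherever the scheduling advisor (pick_host) suggests."""
    placement = []
    for i, (name, transform) in enumerate(stages):
        if i == 0:
            host = "server"
        elif i == len(stages) - 1:
            host = "workstation"
        else:
            host = pick_host(hosts)  # consult the scheduling advisor
        placement.append((name, host))
        data = transform(data)       # the frame executes this stage
    return data, placement
```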

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/quakeviz_volviz.eps} &
\colfigsize
\epsfbox{eps/quakeviz_iso.eps} \\
(a) Volume visualization & (b) Isosurface visualization
\end{tabular}
}
\caption[Example QuakeViz applications]
{Example QuakeViz applications: (a) Volume visualization using a
structured grid.  (b) Isosurface visualization using an unstructured grid.}
\label{fig:intro.quake}
\end{figure}

Figure~\ref{fig:intro.quake} shows the flowgraphs of two simple
QuakeViz applications.  More complex flowgraphs are discussed
elsewhere~\cite{DV-PRELIM-REPORT-PDPTA99,DINDA-CASE-BERT-WPDRTS-99}.
In both applications, the dataset produced by the simulation contains
values whose coordinates are based on an irregular mesh.  A QuakeViz
application can maintain this unstructured representation throughout
its flowgraph, or it can interpolate the data onto a regular grid in
order to make downstream processing faster and reduce the volume of
communication.  Figure~\ref{fig:intro.quake}(a) shows a simple volume
visualization that shows the data in the region of interest as a
3-dimensional image.  The data is interpolated to a regular grid in
order to make the rendering computation faster.
Figure~\ref{fig:intro.quake}(b) shows a visualization in which
isosurfaces corresponding to various response intensities in the region
of interest are displayed.  In this case, the data is left in its
unstructured form.

\begin{figure}
\centerline{
\colfigsize
\epsfbox{eps/volumeviz_nsf2_usertime.epsf}
}
\caption[Compute requirements of volume visualization]
{Compute requirements of the volume visualization code of 
Figure~\ref{fig:intro.quake}(a).} 
\label{fig:intro.quakeviz_volume}
\end{figure}

Figure~\ref{fig:intro.quakeviz_volume} shows the compute
requirements for the flowgraph shown in
Figure~\ref{fig:intro.quake}(a) as a function of the size of the
structured grid.  The computation here is measured as the user time
for a sequential version of the visualization written in
vtk~\cite{VTK-BOOK} and running on a 500 MHz Alpha 21164 machine under
Digital Unix 4.0D.  As we can see, the computation involved at each
stage of the flowgraph is significant and is usually dominated by the
interpolation step, which the active frame model permits us to locate
on any host.  The read and display steps must be sited at the
server machine and the user's machine, respectively.

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/isotime.eps}
&
\colfigsize
\epsfbox{eps/rendertime.eps}
\\
(a) isosurface extraction &
(b) rendering \\
\end{tabular}
}
\caption[Compute requirements of isosurface visualization]
{Compute requirements versus timestep of the isosurface visualization
code of Figure~\ref{fig:intro.quake}(b).}
\label{fig:intro.quakeviz_iso}
\end{figure}

The amount of computation performed by the isosurface visualization
code depends not only on the spatial region of interest, but also on
the time step.  Figure~\ref{fig:intro.quakeviz_iso} shows how the
compute requirements of the (a) isosurface extraction and (b)
rendering steps of the isosurface visualization code in
Figure~\ref{fig:intro.quake}(b) vary with the time step for
different problem sizes.  The measured code is a sequential version
written in vtk and running on a 200 MHz Pentium Pro machine under
Microsoft Windows NT 4.0.  Measurements are of the user time.  Again,
we can see a computation (isosurface extraction) which requires
significant CPU time and which can potentially be run on any host in
the environment.  It is also interesting to note that the compute
requirements are quite predictable from time step to time step.


\subsubsection{OpenMap}
\label{sec:intro.openmap}

\begin{figure*}
\centerline{
\epsfxsize=6.5in
\epsfbox{eps/openmap.eps}
}
\caption[Structure of an OpenMap application]
{Structure of an OpenMap application.  Solid arrows represent
the flow of data, while dotted arrows represent user requests.}
\label{fig:intro.openmap}
\end{figure*}

BBN's OpenMap is an architecture for combining geographical information
from a variety of different, separately developed sources in order to
present a unified coherent visual representation, in the form of a
multi-layered map, to the end
user~\cite{OPENMAP-WEB-PAGE,QOS-GROUPWARE-MUTT-OPENMAP-99}.
OpenMap-based applications have been used to help coordinate
U.S. military actions in the former Yugoslavia.

OpenMap consists of four different kinds of components. Geographical
information is provided by third party {\em data sources}, which have
unique interfaces.  A {\em specialist} encapsulates a specific data
source, hiding the details of accessing it behind a uniform
CORBA interface.  The interface is based on sequences of objects
to be drawn.  A specialist has a corresponding {\em layer} that draws
an individual map based on the drawing objects.  Finally, a {\em map
bean} manages a group of layers, overlaying their maps to produce a
single combined map for the user.  Map beans and layers are Java
Beans, which can be conveniently embedded into Java applications. 

In Figure~\ref{fig:intro.openmap}, we show the structure of an example
OpenMap application where information from separate terrain and
political boundary data sources are combined to present the user with
a map of the Boston area. While the structure shown in
Figure~\ref{fig:intro.openmap} appears at first glance to be a pipeline, it
is important to note that it actually operates in a request-response
manner.  Computation happens only when the user decides to change the
{\em projection} of the map (the set of layers and the region of the
planet that is being viewed).  

OpenMap is thus interactive---computation happens as a direct result of a
projection change.  To provide a good user experience, the time from a
projection change to the resulting map display should be short,
consistent, and predictable.  A good abstraction for this requirement
is a deadline placed on the computation initiated by a projection
change.  Achieving such deadlines is challenging because specialists
and data sources may be located at distant sites and run on shared,
unreserved hosts communicating via the Internet.  However, missing
OpenMap deadlines only degrades the user's experience---OpenMap is
resilient.
 
The components of OpenMap were designed from the start to be
physically distributed using CORBA communication mechanisms.  We can
use this enabler to build replicated specialists and data sources, as
we highlight in gray in Figure~\ref{fig:intro.openmap}.  This provides
a choice of which specialist is used to satisfy a projection change
for a given layer.  The real-time scheduling advisor can be used to
decide which replica should be used.  This functionality can even be
hidden from the application by incorporating it into an object quality
of service framework such as BBN's QuO~\cite{QUO-JOURNAL-98}.  In
fact, as a proof of concept, we incorporated the host load prediction
system described in this thesis into QuO as a system condition object
and then developed QuO contracts that effectively represent a
real-time scheduling advisor.  This was then used to select the
appropriate replica of an image server.


\subsubsection{Acoustic CAD}

Acoustic CAD involves designing a space (a room, or a loudspeaker
enclosure, for example) using a CAD tool and being able to hear what
such a space will sound like from different positions within it.
For a given room configuration (room geometry and material properties,
listener positions, and loudspeaker positions), we compute an impulse
response function for each listener/loudspeaker pair.  For a
particular listener/loudspeaker pair, we convolve the music signal
coming from the loudspeaker with the impulse response function, giving
the room-filtered signal the listener would hear from that
loudspeaker.  By doing this for each loudspeaker and summing the
resulting room-filtered signals, we simulate what the listener would
hear given the room configuration~\footnote{Ignoring the listener's
Head Related Transfer Function, Doppler effects during movement, and
other issues that are beyond the scope of this document.  Interested
parties should look at~\cite{BEGAULT-3D-SOUND}.}.
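The steady-state convolve-and-sum step described above can be sketched
directly.  For simplicity this assumes one music signal per loudspeaker
and equal-length convolution results; the helper name is ours:

```python
import numpy as np

def listener_signal(music, impulse_responses):
    """Sum, over loudspeakers, of each loudspeaker's music signal
    convolved with the impulse response from that loudspeaker to the
    listener.  Assumes all (signal, response) pairs produce
    convolutions of equal length."""
    out = None
    for signal, h in zip(music, impulse_responses):
        filtered = np.convolve(signal, h)   # room-filtered signal
        out = filtered if out is None else out + filtered
    return out
```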

The impulse responses are computed by simulating the wave equation
using a finite difference
method~\cite{SMITH-FINITE-DIFF}. In steady
state, periodic convolution and summation occurs to compute the sound
output.  When the user changes the room configuration by moving a
loudspeaker, wall, or himself, the (expensive) computation of the
impulse responses is repeated.  It is these expensive user-initiated
physical simulations that form the tasks for the real-time scheduling
advisor.

As an alternative to sound, the user can view the impulse response
functions directly, or can view the frequency response characteristics
of the room computed from them.  The user repeatedly adjusts the model
parameters (furniture position and composition), simulates the
physical system, and views the results.  The goal is to find a set of
parameters that result in a flat response.


\subsubsection{Image editor}

Image editing gives the user tools to manipulate an in-memory visual
image as a whole and in part.  Some tools involve image-processing
operations such as boxcar convolution on large regions of the image,
while others involve emulating real-world tools such as pens, brushes,
or spray paint cans.  It is important to point out that image sizes
are rapidly growing, and the limits imposed by photographic film, drum
scanners, and digital cameras imply that truly vast (hundreds of
megabytes to gigabytes) images will need to be edited.  At the same
time, the functionality of image editing software, in terms of the
sophistication of image filters and how they can be applied is rapidly
advancing.

The typical resolution of color reversal film is 100 lines per
millimeter, which corresponds to 200 pixels per millimeter.  With this
information density, a 35mm slide contains 34.6 million pixels, or
about 138 megabytes of information at 32 bits per pixel.  A medium
format 6 cm by 7 cm slide contains 168 million pixels or 672
megabytes, and the smallest large format slide, 4 inches by 5 inches,
contains 516 million pixels or two gigabytes.  These vast image sizes
mean that even simple transformations result in large amounts of
computation.  However, the resolution at the user's workstation is
limited by the screen resolution, which is much lower.  This results
in large computational requirements combined with potentially low
communication requirements, which encourages a distributed
implementation of image editing.  
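The arithmetic behind these figures is easy to reproduce; the helper
below is purely illustrative:

```python
def image_size(width_mm, height_mm, px_per_mm=200, bits_per_px=32):
    """Pixel count and byte count for a film frame scanned at the
    resolution limit of color reversal film (100 lines/mm, i.e.
    200 pixels/mm)."""
    pixels = (width_mm * px_per_mm) * (height_mm * px_per_mm)
    return pixels, pixels * bits_per_px // 8

# A 35mm frame measures 36mm x 24mm:
pixels, nbytes = image_size(36, 24)  # 34.56 million pixels, ~138 MB
```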


\section{Shared, unreserved computing environment}
\label{sec:intro.mach_model}

Our machine model corresponds to the real world computing environments
that most people have access to.  In particular, we assume host
computers interconnected by a local area network.  The host computers
have no centralized scheduler or coordinated scheduling mechanism and
their local schedulers do not support reservations or real-time
scheduling.  Similarly, the network does not support any sort of
reservation scheme.  The hosts execute independent tasks that generate
traffic on the network.  This is, of course, a description of any
modern group of workstations or PCs.  The specific environments we
studied are Digital Alpha-based workstations running Digital Unix 3.2
and 4.0. The software we developed has been ported to a variety of
other Unix systems and Microsoft Windows NT.

In addition to these commonplace features, we also assume that it is
possible to measure the system.  In particular, it must be possible to
acquire all the necessary permissions to record resource signals.  In
the case of the host load signal we use in this work, the baseline
permission required is to run the Unix uptime command.  On the Digital
Unix machines we use, the permissions of a typical user permit the use
of a much faster system call to measure the load, however.  It is also
necessary to have access to a real-time clock, through the Unix
gettimeofday call, for example.


\section{Scheduling problem}
\label{sec:intro.sched_prob}

The scheduling problem that a real-time scheduling advisor attempts to
solve is stated as follows.  The real-time scheduling advisor operates
on the behalf of a single application.  The application needs to run
tasks in response to aperiodically arriving user input.  A task must
run to completion before another task arrives, and the application can
run the task on any of a set of shared, unreserved hosts.  Suppose
that a task arrives at the current time, $t_{now}$, and its compute
requirement, expressed as a nominal running time on a quiescent host,
is $t_{nom}$.  The application wants the task to finish before the
deadline $t_{now}+(1+slack)t_{nom}$, where the $slack$ is the maximum
expansion factor that the application can allow.  The problem the
real-time scheduling advisor must solve is to choose the host from
among the set of available hosts where the deadline is most likely to
be met.  
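In code, the timeliness requirement is a one-liner; the sketch below
simply restates the deadline formula from the problem statement:

```python
def deadline(t_now, t_nom, slack):
    """Latest acceptable completion time: t_now + (1 + slack) * t_nom."""
    return t_now + (1.0 + slack) * t_nom

# A 10-second task arriving at t = 100 with 50% slack must finish by 115.
print(deadline(100.0, 10.0, 0.5))  # 115.0
```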

Ideally, the advisor will also inform the application whether it
believes the deadline can be met on the chosen host.  It may be the
case that there are insufficient resources available on any host to
meet the deadline.  If this is the case, a prediction of the task's running
time or whether it will meet its deadline or not enables the
application to try a different form of adaptation or to change the
deadline.  One of the powerful aspects of the resource-oriented
prediction approach described here is that it can provide this
additional feedback.

It is also important to note that the predictions of running time that
underlie this approach are useful in achieving goals other than
meeting deadlines.  This is also an important feature of the
resource-oriented approach.


\section{Design space}
\label{sec:intro.design}

The design space for real-time scheduling advisors that address the
scheduling problem posed in Section~\ref{sec:intro.sched_prob} is
vast.  However, one characteristic that most designs necessarily share
is the use of prediction, in that the advisor picks a host for the
task based on some prediction of what the task's performance {\em will
be} on each of the prospective hosts.  This prediction can either be
implicit or explicit.  Explicit approaches attempt to directly predict
some task performance metric---the task's running time, for
example---for each of the hosts and then choose a host whose predicted
performance is appropriate.  In contrast, implicit approaches simply
assume that the task's performance on each of the prospective hosts
will be ordered according to some task-independent metric on the
hosts, and then choose a host from early in the ordering.

Explicit prediction approaches are generally preferable to implicit
approaches for a number of reasons.  For one, they can provide
additional value to the application.  In particular, some explicit
approaches, such as the one that forms the core of this dissertation,
can inform the application as to whether the deadline is likely to be
met, which provides the application with a chance to modify the task's
requirements.  Another advantage of explicit prediction  is that
it makes it possible to apply the prediction technology developed in
the statistics, signal processing, and artificial intelligence
communities to the problem.  The drawback is that the explicit
prediction approach is potentially much more complex than the implicit
prediction approach.

Within the explicit prediction approach, there remain a vast number of
design choices: What quantity will be predicted?  What measurements
will be used as the basis of the predictions?  What prediction
algorithm will be used?  When will predictions happen?  When will
measurements be made?  How will scheduling decisions be made using the
predictions?  

Although there are many possible answers to these questions, the
primary distinction among explicit approaches is whether they are
application-oriented or resource-oriented.  In the
application-oriented approach, the application measures the
performance of each task it runs and provides this performance data to
the advisor.  The advisor predicts the performance of the next task on
each of the hosts based on this history and then chooses an
appropriate host based on the predictions.  In the resource-oriented
approach, each resource (e.g., host) is conceptually responsible for
measuring and predicting its own availability.  When asked to schedule
a task, the advisor collects the latest predictions of resource
availability, uses them to compute predictions of some task
performance metric (e.g., running time) for each host, and then chooses
the appropriate host based on those performance predictions.

The explicit application-oriented and resource-oriented prediction
approaches are complementary in their potential advantages and
disadvantages.  The application-oriented approach has the advantage of
operating directly on the performance metrics (whether deadlines are
met, running time) that the application (and advisor) ultimately care
about.  However, the measurements and predictions made using this
approach are entangled with application specifics and so are not
useful to other applications.  This leads to a duplication of effort,
where each application is predicting, in part, the availability of the
same resources.  Furthermore, because a measurement corresponds to a
task, the advisor ``sees'' only a small subset of the computing
environment.  If the task requires multiple resources, the measurement
conflates their individual availabilities, making it even harder to
share measurements.  Even when it is possible to untangle the effects
of application specifics and the availability of other resources, the
resulting measurements of an individual resource are aperiodic and
perhaps infrequent.  In contrast, the resource-oriented approach
measures and predicts resources independently of the application and
measures each resource periodically.  A resource-oriented advisor can
thus make decisions based on up-to-the-minute predictions for all of
the available resources.  Furthermore, different applications can share
these resource predictions just as they share the resources.  However,
unlike in the application-oriented approach, the advisor must
transform these resource predictions into task performance
predictions, and this increases the chance of error.

In the explicit resource-oriented prediction approach it is easy and
powerful to base prediction on (discrete-time) resource signals,
periodic measurements of resource availability.  Periodic measurement
is easy because measurement is decoupled from the application and
instead coupled to the resource.  Prediction of a resource signal
involves mapping from past values of the signal to future values.
This general prediction problem has been and continues to be
extensively studied in a number of different fields including
statistical signal processing, time series analysis, and chaotic
dynamics.  By casting the core of the explicit resource-oriented
prediction approach as a signal prediction problem, we can bring all
of this powerful existing and future machinery to bear on our
scheduling problem.  In addition to helping us predict resource
signals, these tools also provide a framework for understanding
resource availability and for generating meaningful workloads.

Resource signal predictions are not, in themselves, sufficient to
solve the scheduling problem posed by real-time scheduling advisors.
Such predictions must be reconciled with the resource demands of the
task in order to compute a prediction of the running time on which to
base scheduling decisions.  Our work shows not only that at least one
resource signal (CPU availability as measured by host load) can be
usefully predicted using statistical signal processing, but also that
the gap between these predictions and useful scheduling can indeed be
spanned.


\subsection{Implicit versus explicit prediction}

In a design that uses implicit prediction, past measurements of some
quantity on each of the prospective hosts are used to order the hosts.
The ordering of the hosts with respect to the future performance of
the task is assumed to be the same.  For example, consider a real-time
scheduling advisor based purely on the latest measurement of host
load, which we use for comparison purposes in
Chapter~\ref{chap:rtsched}.  When the task arrives, the advisor
measures the current load on each of the hosts, orders the hosts
according to their load, and then assigns the task to the host with
the least load, assuming that its running time will be minimized on
that host and thus most likely to meet the deadline.
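As a sketch, this measurement-based advisor reduces to a single
ordering step; `measure_load` here is a hypothetical callback returning
the latest load measurement for a host:

```python
def least_loaded_host(hosts, measure_load):
    """Implicit prediction: assume the least-loaded host runs the task fastest."""
    return min(hosts, key=measure_load)

loads = {"host-a": 1.7, "host-b": 0.2, "host-c": 0.9}
print(least_loaded_host(loads, loads.get))  # host-b
```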

In a design that uses explicit prediction, past measurements are
explicitly transformed into predictions of the task's performance and
then scheduling decisions are made on the basis of these predictions.
The design that forms the core of this dissertation is based on
explicit prediction.  The system uses statistical signal processing to
continuously predict future CPU availability on each of the hosts.  The
advisor uses these predictions to form statistical estimates of the
running time of the task on each of the hosts, and then chooses one of
the hosts where the task is likely to meet its deadline with high
probability.  The measure of task performance need not be the running
time.  In Appendix~\ref{chap:app_pred}, for example, we look at
approaches that use a history of previous successes and failures or a
history of previous running times and deadlines to predict whether a
deadline can be met on a particular host.

The advantage of implicit approaches is an intrinsic simplicity, since
the hosts are merely ordered.  In contrast, explicit approaches
require computing a prediction of some metric of task performance.
However, this value is not only useful to the advisor, but can also be
of use to the application, especially if it indicates that the
deadline can not be met because insufficient resources are available.
In such a case, the application can adjust the resource demands of
the task or its deadline to a more realistic level.  Another advantage
of explicit approaches is that by clearly specifying a prediction
problem, the vast machinery of the statistics, signal processing, and
artificial intelligence communities becomes available to answer it.


\subsection{Application-oriented versus resource-oriented prediction}
\label{sec:intro.orient}

\begin{figure}
\centerline{
\colfigsize
\epsfbox{eps/dependencies_text_more.eps}
}
\caption[Dependencies]{Dependencies in a real-time scheduling advisor}
\label{fig:intro.dependencies}
\end{figure}

To understand the difference between the explicit application-oriented
and resource-oriented prediction approaches, it is useful to consider
the dependencies involved in choosing the appropriate host to run a
task, as shown in Figure~\ref{fig:intro.dependencies}.  The
appropriate host depends on the deadline ($t_{now}+(1+slack)t_{nom}$)
and the running time of the task on each of the hosts.  The running
time of a task on a particular host depends in turn on the resource
demand of the task ($t_{nom}$) and the availability of the resources
needed to run the task on that host (the predicted host load).

Conceptually, the real-time scheduling advisor wants to compute the
``host selection'' node of the tree before the ``resource
availability'' node is available.  In order to do so, it introduces
prediction into some node of the dependence tree.  The predictive node
uses its previous values to predict its current value.  This predicted
value is then propagated to all its dependencies, and so on.  For
example, we could introduce prediction at the ``resource
availability'' node by keeping a history of the availability of some
host.  We would then propagate this prediction upwards to compute a
predicted running time on that host, and then use that running time
prediction to choose an appropriate host.  The other extreme
possibility would be to predict at the ``host selection'' node,
basing our choices on how previous host selections fared.

Introducing prediction becomes increasingly difficult the further down
the tree we go.  The problem is that to propagate a predicted value
upwards through the tree requires that we be able to model each of the
transformations along the way.  For example, if we predict at the
``host selection'' level, we can use our predictions directly.  On the
other hand, if we predict at the ``resource availability'' level, we
must transform these resource predictions into running time
predictions, and then use the predicted running times to predict which
host is the most appropriate.

There are several advantages to predicting lower in the tree.  First
of all, the prediction provides more detail to the application.  For
example, predicting at the ``running time'' level or below tells us
not only which host is most appropriate for the task, but also what
the running time will likely be.  This gives the application the
opportunity to modify the task's resource demand or deadline before
the task is even started.  Another advantage is that as we go lower in
the tree, the predictions become useful to an increasingly broad range
of tools.  Predictions at the ``host selection'' level are only
interesting to real-time scheduling advisors.  On the other hand,
predictions of running time are interesting to scheduling advisors
with other goals than real-time.  Of course, one could imagine
transforming predictions high in the tree into values lower in the
tree in order to provide such information.  However, notice that each
transformation as we move upwards in the tree reduces the amount of
information.  This means that the transformations cannot be uniquely
reversed, and so information gained by reversing them cannot be as
accurate as information measured or predicted directly.  Avoiding
entanglements by predicting deeper in the tree extends down to
individual resources.  Predictions of individual resources are easiest
to share among applications.  Two applications are much more likely to
be interested in the same individual resource than in the same group
of resources.  A third
advantage is that measurements deeper in the tree have the potential
to be fresher.

The most important distinction to be made on the basis of
Figure~\ref{fig:intro.dependencies} is between application-oriented
and resource-oriented prediction.  In application-oriented prediction,
which corresponds to the host selection and running time levels, each
task execution contributes a single measurement about a single set of
resources to the prediction system.  This means that the number of
sets of resources that are measured, and the frequency with which they
are measured is limited by the application and the user.  Furthermore,
unless each task uses only a single resource, the measurement
entangles the availability of multiple resources.  Even if each task
only uses a single resource or if it is possible to untangle multiple
resources, from the point of view of a single resource, measurements
would not be periodic.  This considerably complicates the use of
statistical machinery such as linear time series models.  The primary
advantage of the application-oriented approach is that the
measurements are of quantities that are closer to the metrics that the
application is actually concerned about.

In contrast, in resource-oriented prediction, prediction happens at a
considerable remove from the application, requiring the development of
substantial transformations to predict application-level quantities.
In return, however, measurement and prediction can happen
independently of applications and the results can be easily shared by
multiple applications.  Furthermore, resources can be measured and
predicted periodically, which easily permits the use of most
prediction techniques.  Finally, the resource-oriented real-time
scheduling advisor has current information available for each of the
resources it considers using.  

Although this dissertation focuses on the resource-oriented approach,
we started our research focused on the application-oriented approach.
The results of some of that work are presented in
Appendix~\ref{chap:app_pred}.  With the appropriate prediction
algorithm, we found that application-oriented prediction can be very
effective in limited cases, namely those where the nominal time of the
tasks is fixed and thus implicitly untangled from CPU availability.

Several issues induced our switch to the resource-oriented approach.
Consider the compute-bound case, where a task corresponds to the
measurement of a single resource, CPU availability on one host.  The
first issue was scalability in terms of measurement histories.
Consider $N$ applications running on an $M$ host environment.  In the
application-oriented approach, each application would maintain its own
shared measurement history for every host.  This means that the amount
of measurement history in the system scales as $O(NM)$.  When the
advisor is run in a system with such a shared measurement history, it
needs to collect these measurement histories, resulting in a large
amount of communication.  Of course, each node could maintain its own
local set of histories to avoid the communication, but then the amount
of history scales as $O(NM^2)$, which is even worse.  In contrast,
in the resource-oriented approach the measurement history in the
system scales as $O(M)$.
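The scaling argument can be made concrete with a toy count of
measurement histories (the scheme names below are ours, introduced only
for the sketch):

```python
def histories(n_apps, m_hosts, scheme):
    """Count of measurement histories under the schemes discussed above."""
    if scheme == "app-shared":   # each app keeps one history per host: O(NM)
        return n_apps * m_hosts
    if scheme == "app-local":    # every host replicates every app's histories: O(NM^2)
        return n_apps * m_hosts ** 2
    if scheme == "resource":     # one history per host, shared by all apps: O(M)
        return m_hosts
    raise ValueError(scheme)

print(histories(10, 20, "app-shared"),
      histories(10, 20, "app-local"),
      histories(10, 20, "resource"))   # 200 4000 20
```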

A second issue that argues against the application-oriented approach
is entanglement.  As we noted earlier, information is reduced as we
climb the dependence tree.  Consider the running time level.  The
running time entangles the resource availability (eg, host load),
which is a quantity we could share between applications, with the
resource demand (eg, nominal time).  The result is that if we measure
the running time of a task which has some nominal time, this value is
only directly applicable to other tasks with the same nominal time.
To make it applicable to other tasks requires that we ``factor out''
the effect of the nominal time.  We found this to be non-trivial in
practice.  Entanglement also makes the measurement history grow once
again, since each distinct nominal time requires its own history.  In
contrast, the resource-oriented approach never requires
this sort of reverse computation, and we found that the forward
computations were possible to do accurately.

A third issue is that the lack of periodicity in the measurements of
the application-oriented approach severely restricted the kinds of
statistical machinery that we could apply to prediction.  At the same
time, we discovered that host load, measured as a periodically sampled
signal, exhibited properties that strongly suggested the use of
techniques that rely on periodic measurements.  By assuming the
periodic measurements possible in the resource-oriented approach, we
were able to develop a methodology, called the resource signal
methodology (Section~\ref{sec:intro.method}), that we were able to
apply not only to host load, but also to network flow bandwidth
(Section~\ref{sec:conc.future}).  The resource signal abstraction lets
us leverage old and new work in the fields of time series
analysis~\cite{BOX-JENKINS-TS}, statistical signal
processing~\cite{SIGNAL-ANALYSIS-AND-PRED-BOOK,SIGNAL-PROCESSING-FRACTALS-WAVELET},
chaotic dynamics~\cite{ABARBANEL-CHAOTIC-DATA-BOOK}, artificial
intelligence~\cite{MACHINE-LEARNING-MITCHELL}, and others.
 
Of course, signal-based resource prediction is not in itself
sufficient to implement a real-time scheduling advisor, because it is
necessary to model the running time and host selection portions of
Figure~\ref{fig:intro.dependencies}.  We found that these elements of
the explicit resource-oriented prediction approach were indeed
feasible and can perform well.

\comment{
\subsection{Resource signals}

The resource-oriented approach easily permits periodic measurements of
resource availability because measurement is decoupled from
application execution.  We refer to a sequence of periodic
measurements as a discrete-time {\em resource signal}.  We measure the
resource signal using a sampling process that operates on some
underlying discrete-time or continuous-time signal.  
}

\section{Prototype real-time scheduling advisor}

\begin{figure}
\centerline{
\colfigsize
\epsfbox{eps/rtsa_structure_with_notes.eps}
}
\caption[Structure of resource-prediction-based real-time scheduling advisor]
{The structure of the resource-prediction-based real-time scheduling advisor} 
\label{fig:intro.rtsa_structure}
\end{figure}

The core of this dissertation describes the design, implementation,
and evaluation of a prototype real-time scheduling advisor.  The
prototype advisor schedules compute-bound tasks using explicit
prediction of {\em host load signals}, specifically, the Digital Unix
five second load average.  The low overhead and high performance of
this system demonstrate the power of the explicit resource-oriented
prediction approach.

Figure~\ref{fig:intro.rtsa_structure} shows the architecture of the
system.  At the highest level, the system is divided into two parts: a
library, which is bound to a single application, and a daemon, one of
which runs independently on each host and can serve multiple
applications.  The library and the daemon can be further decomposed
into independent components that can communicate in a number of ways.
In the figure, a dashed arrow represents stream-oriented communication
between components, while a symmetric pair of arrows represents
request-response communication.

The daemon consists of a host load measurement system, which
periodically measures host load, and a host load prediction system,
which transforms each new measurement into a qualified prediction of
future measurements.  Each of the values in the prediction is
qualified by an estimate of its error and how it correlates with the
error of the other values.  The prediction system uses linear time
series models to predict host load.  It continuously monitors the
prediction accuracy, refitting the model when the accuracy drops below
a threshold.  The daemon stores a short history of the predictions and
makes them available via a request-response protocol.  This matches
the periodic nature of the measurement and prediction systems with
the aperiodic nature of application requests.
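A minimal sketch of this predict-and-refit idea, assuming a plain
AR($p$) model fit by ordinary least squares (the real daemon uses the
RPS model classes and richer error statistics, including error
correlations), might look like:

```python
import numpy as np

def fit_ar(history, p):
    """Least-squares fit of AR(p) coefficients to a load history."""
    X = np.array([history[i:i + p] for i in range(len(history) - p)])
    y = np.array(history[p:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(history, coef):
    """One-step-ahead prediction from the last p observations."""
    p = len(coef)
    return float(np.dot(history[-p:], coef))

# In the daemon, the mean squared one-step error would be monitored and
# fit_ar rerun whenever the error exceeds a threshold (refitting).

# Demo on a synthetic decaying load series: the fitted AR(1)
# coefficient recovers the decay factor, and the predictor extrapolates it.
load = [0.5 ** i for i in range(20)]
coef = fit_ar(load, 1)
print(round(float(coef[0]), 3))  # 0.5
print(predict_next(load, coef))
```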

The library is easiest to describe from the perspective of an
application request.  A request consists of a nominal running time
($t_{nom}$), a maximum slack ($slack$), a confidence level, and a
list of hosts.  The confidence level, which ranges from zero to one,
tells the real-time scheduling advisor the minimum probability of
meeting the deadline that the application requires.  The real-time
scheduling advisor transforms this single application request into
requests for predictions of the task's running time on each of the
hosts.  Each of these requests consists of the nominal time and the
confidence level.  To answer a request, the running time advisor
acquires the latest host load prediction from the host's daemon.  It
then uses a statistical model of the host's scheduler to transform the
host load prediction, the nominal time of the task, and the confidence
level into a prediction of the running time of the task on the host.
The prediction contains both an expected running time and a confidence
interval for the running time.  After acquiring running time
predictions for each of the hosts, the real-time scheduling advisor
chooses a host at random from among those hosts whose running time
predictions are less than the deadline.  If no such host exists, it
chooses the host with the minimum expected running time.  The chosen
host and the prediction of the running time of the task on that host
are returned to the application.  The application can then choose
whether to accept the solution or to pose a different request.
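The selection rule just described can be sketched as follows.  The
`predicted` dictionary stands in for the running time advisor's output,
with each prediction reduced to an expected time and an upper
confidence bound (a simplification of the system's confidence
intervals):

```python
import random

def choose_host(predicted, deadline):
    """predicted maps host -> (expected_time, upper_confidence_bound)."""
    ok = [h for h, (_, upper) in predicted.items() if upper <= deadline]
    if ok:
        return random.choice(ok)  # randomize among qualifying hosts
    # Fallback: no host is predicted to meet the deadline.
    return min(predicted, key=lambda h: predicted[h][0])

predicted = {"a": (4.0, 12.0), "b": (5.0, 9.0), "c": (20.0, 30.0)}
print(choose_host(predicted, 10.0))  # b  (only host whose bound meets it)
print(choose_host(predicted, 1.0))   # a  (fallback: minimum expected time)
```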

The application or other middleware services can also use the system at
lower levels.  For example, other kinds of schedulers can be based on
the running time advisor, or can use host load predictions directly.
The system is also extensible in a number of ways.   It is easy to
construct prediction systems for other kinds of resources, and it is
easy to add support for new predictive models.  The components of the
system can also be arranged differently, using various transports to
communicate over the network.


\section{Resource signal methodology}
\label{sec:intro.method}

In addition to arguing for basing real-time scheduling advisors on
explicit resource-oriented prediction, this dissertation also
recommends a methodology for investigating and implementing such
prediction.  The methodology, which is described in more detail in
Chapter~\ref{chap:rps}, and used in
Chapters~\ref{chap:statprop}--\ref{chap:loadpred}, is essentially to
transform a specific resource prediction problem into a general time
series prediction problem as early as possible, and then to apply the
substantial statistical machinery that already exists to address such
problems.

The resource signal methodology consists of six steps.  First, the
investigator finds an easily measured resource signal that correlates
with the availability of the resource in question.  Second, he uses
sampling theory to determine how often the signal needs to be sampled
to capture its behavior.  Third, he collects representative traces of
the sampled signal.  These steps require the expertise of a domain
expert---a systems researcher.  However, the collected traces
represent a general time series analysis and prediction problem, for
which a large base of expertise and numerous experts already exist.
The fourth step is to use the traces to determine the salient
statistical properties of the resource signal.  This leads to the
selection of a set of prospective modeling and prediction techniques.
A large number of tools are commercially available to assist with this
step.  In the fifth step, the investigator performs a randomized
evaluation of the prospective models on his traces to determine which
model is indeed the most appropriate.  In the sixth step, the
appropriate model is incorporated into an on-line prediction system
for the resource.  Few tools are available to help with these latter
two steps.  We contribute the RPS Toolkit (Chapter~\ref{chap:rps}) to
facilitate them.  RPS provides tools for carrying out the randomized
evaluation, and for rapidly implementing a prediction system based on
the appropriate predictive model.

%
% May want to extend this and talk about the resource signal
% methodlogy here
%

\section{Outline of dissertation}

The flow of the dissertation essentially follows the architecture
shown in Figure~\ref{fig:intro.rtsa_structure}, from bottom to top.

%
%
% Change as appropriate for moving rps stuff to appendix
%
%

Chapter~\ref{chap:rps} describes the design, implementation, and
performance evaluation of the RPS Toolkit, which forms the basis of
the host load measurement and prediction systems.  RPS provides
extensible sensor, prediction, and communication libraries for
building resource prediction systems, and a set of components that can
be composed at run-time to form resource prediction systems.  For the
predictive models that we later find appropriate for host load
prediction, RPS has extremely low overhead.  We also describe an
RPS-based parallel evaluation system that we use later to determine
the appropriate models for host load prediction.  The combination of a
powerful off-line evaluation tool and tools for quickly constructing
on-line prediction systems based on the evaluation results helps to
carry out the resource signal methodology.

Chapter~\ref{chap:statprop} motivates the choice of host load as our
resource signal, shows how to appropriately sample it, and describes
the statistical characteristics of this resource signal.  This
knowledge forms the basis for the host load measurement system in
Figure~\ref{fig:intro.rtsa_structure}.  The interesting and new
statistical results are a strong autocorrelation structure,
self-similarity, and epochal behavior.  These findings suggest that
some form of linear model should be appropriate for prediction, but
that more complex models that capture long-range dependence may be
necessary.  Furthermore, they suggest that such models may need to be
refitted at epoch boundaries.

Chapter~\ref{chap:loadpred} describes a large scale study that
evaluated linear models to determine which are most appropriate
for host load prediction.  The study was based on running randomized
testcases using the load traces described in
Chapter~\ref{chap:statprop}, and data-mining the results.  We found
that despite the complex behavior of host load signals, relatively
simple and computationally inexpensive autoregressive models, of
sufficiently high order, are appropriate for host load prediction.
This new knowledge forms the basis for the host load prediction system
in Figure~\ref{fig:intro.rtsa_structure}.  It was simple to construct
this component using RPS once the choice of model became clear.

Chapter~\ref{chap:execpred} describes the design, implementation, and
evaluation of the running time advisor component of
Figure~\ref{fig:intro.rtsa_structure}.  Surprisingly, this component
is non-trivial, but we were able to develop an algorithm for computing
confidence intervals for the running time.  The algorithm relies on
two new techniques: accounting for the correlation of prediction
errors, and load discounting.  We evaluate an implementation of the
algorithm by running randomized testcases on real hosts whose
workloads are provided by our load traces using a new technique called
load trace playback.  Essentially, this evaluation tests the bottom
three stages shown in Figure~\ref{fig:intro.rtsa_structure}.  The
results are that the algorithm performs quite well using the
autoregressive model we found appropriate for host load prediction.
The confidence intervals computed using that model have nearly the
desired coverage and are usually far narrower than those computed
using other predictive models.

Chapter~\ref{chap:rtsched} describes the design, implementation, and
evaluation of the real-time scheduling advisor component in
Figure~\ref{fig:intro.rtsa_structure}.  This component was relatively
easy to implement given a functional running time advisor.  We
evaluate it by running randomized testcases on real hosts whose
workloads are generated using load trace playback.  We compare our
system with simple approaches such as random scheduling and scheduling
a task on the host with minimum measured load.  Both our system and
the measurement approach are vastly superior to random scheduling in
terms of the fraction of deadlines that are met.  Our system always
performs at least as well as the measurement approach and
significantly outperforms it in several important regions of
operation.  Furthermore, unlike the measurement approach, our system
is able to tell the application, with very high accuracy, whether the
deadline can actually be met on the selected host.  This makes it
possible for the application to modify the task's requirements until
its deadline can be met.  Finally, our system is able to introduce
appropriate randomness into its scheduling decisions, reducing the
chance of synchronization among multiple independent scheduling
advisors.  Given that these advantages come at very little additional
cost over the measurement approach, the superiority of our system is
clear.
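The decision procedure just described can be sketched as follows. This is a simplification with hypothetical names: it assumes each candidate host already carries a predicted running-time confidence interval $(lo, hi)$ from the running time advisor, and it illustrates both the deliberate randomness among feasible hosts and the accurate yes/no answer reported to the application.

```python
import random

def choose_host(hosts, deadline, rng=random):
    """Pick a host for the task and report deadline feasibility.

    hosts maps host name -> (lo, hi), the predicted running-time
    confidence interval on that host.  Choosing at random among the
    feasible hosts reduces the chance that multiple independent
    scheduling advisors synchronize on the same host.
    """
    feasible = [h for h, (lo, hi) in hosts.items() if hi <= deadline]
    if feasible:
        return rng.choice(feasible), True
    # No host is predicted to meet the deadline: fall back to the
    # host with the smallest predicted upper bound, and tell the
    # application the deadline is unlikely to be met so it can
    # relax the task's requirements.
    best = min(hosts, key=lambda h: hosts[h][1])
    return best, False
```

An application receiving `False` can shrink the task (for example, reduce the resolution of a visualization) and ask again, which is the adaptation loop the advisor is designed to support.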

Chapter~\ref{chap:conc} concludes the dissertation by describing how
our work relates to other work in the area and by describing future
directions.  One direction is to statistically
characterize and predict other resources.  To this end, we describe an
RPS-based prediction system we have developed for network bandwidth,
measured using the Remos system, and we present some initial results
on evaluating linear models for network bandwidth prediction.  Another
direction is to develop and incorporate more sophisticated predictive
models, which seem to be needed for resource signals such as network
bandwidth.  We present some initial results on applying a non-linear
modeling technique to network bandwidth prediction.  Improved modeling
of different resource schedulers is another direction we contemplate.

Appendix~\ref{chap:app_pred} describes an evaluation of
application-oriented prediction approaches.

%
% RPS Appendix here?
%

\comment{

Because the prediction errors for linear models are
necessarily correlated, it is important to account for this
correlation whenever a second moment is estimated, such as in
computing a confidence interval.  Load discounting models the effect
of the priority boost that Unix schedulers give processes that have
just finished an I/O operation.  Without load discounting, the running
time of short tasks is severely underestimated.  

resource signal and characterized its behavior, discovering new
properties, including strong correlation over time, self-similarity,
and epochal behavior.  This suggested the use of linear time series
models.  We developed a toolkit to simplify carrying out off-line
evaluations of predictors on resource signal traces and to simplify
implementing on-line resource prediction systems for the signals.  We
used the toolkit to carry out a large scale, randomized study of
linear time series models for host load prediction and found that a
relatively simple and efficient model was appropriate.  The toolkit
then enabled us to easily implement an on-line host load prediction
system.  We then developed a statistical model to transform the
predictions of this system to confidence intervals for task running
time, and implemented it as a tool called the running time advisor.  We
characterized the quality of the running time predictions in a large
scale randomized evaluation based on host load trace playback, a new
technique for reconstructing real background workloads.  Next, we
developed a real-time scheduling advisor that makes its decisions
based on the predictions of the running time advisor.  We evaluated
the real-time scheduling advisor using a randomized approach in a load
trace playback-based environment.
}



\comment{
\subsection{Resource-oriented prediction}
\label{sec:intro.resource-oriented}

The performance is determined predominantly by resource
availability.  Of the designs that use explicit prediction, there are
resource-oriented prediction approaches and application-oriented
prediction approaches.  Resource-oriented approaches predict future
resource availability using information available about the resource.
These predictions of resource availability, and the task's resource
demands are then supplied to a model that estimates the task's
performance.  Application-oriented approaches predict task performance
directly using application information such as the performance of
previous tasks.




\section{Adaptation stages}

This is probably a junk section

\begin{figure}
\centerline{
\begin{tabular}{cc}
\colfigsize
\epsfbox{eps/adaptation_layers_general.eps}
&
\colfigsize
\epsfbox{eps/adaptation_layers_rtsa.eps}
\\
(a) General &
(b) Real-time scheduling advisor \\
\end{tabular}
}
\caption[Adaptation stack]{Adaptation stages}
\label{fig:intro.adaptation stages}
\end{figure}


\section{Prediction stages}
\label{sec:into.pred_stages}


The design space for a prediction-based real-time scheduling advisor
is vast.  

\begin{enumerate}
\item what is predicted
\item who does prediction
\item where does prediction happen
\item how is prediction done
\end{enumerate}







\section{Application-oriented prediction}

\section{Resource-oriented prediction}

}




