% most useful Latex command
\newcommand{\comment}[1]{}

\documentclass{llncs}
%\usepackage{fullpage}
\usepackage{epsf}
\usepackage{times}
%\usepackage{changebar}
%\usepackage{lscape}

%\input{restore}

%\makeatletter
%\renewcommand{\section}
%  {\@startsection{section}{1}{\z@}{-3ex plus -1ex}{2ex minus 1.0ex}{\normalfont\Large\bfseries}}
%\makeatother
% 2ex minus 0.5ex means post handling space that latex is allowed to
% remove..  2ex is the default.
%\makeatletter
%\renewcommand{\subsection}
%  {\@startsection{subsection}{2}{\z@}{-3ex plus -1ex}{2ex minus 1.0ex}{\normalfont\normalsize\bfseries}}
%\makeatother

\renewcommand{\dbltopfraction}{.9}
\renewcommand{\topfraction}{.9}
\renewcommand{\textfraction}{.1}
\renewcommand{\floatpagefraction}{.9}
%\renewcommand{\dbltopnumber}{1}
\renewcommand{\dblfloatpagefraction}{.9}

\def\normspacing {\renewcommand{\baselinestretch}{0.88}}
\def\tinyspacing {\renewcommand{\baselinestretch}{0.7}}
\def\bibspacing {\renewcommand{\baselinestretch}{0.7}} 
%\def\normspacing {}
%\def\tinyspacing {}

\normspacing

\def\thesection{\arabic{section}}
\def\thefigure{\arabic{figure}}

%\newcounter {subsubsubsection}[subsubsection]
%\def\thesubsubsubsection {\thesubsubsection.\arabic{subsubsubsection}}
%\def\subsubsubsection {\subsubsection*}

\makeatletter\long\def\unmarkedfootnote#1{{\long\def\@makefntext##1{##1}\footnotetext{#1}
}}

\def\pagefigwidth {\epsfxsize=2.5in}
\def\colfigwidth {\epsfxsize=2.5in}
\def\figfontsize {\tiny}
\def\leadfigwidth{\epsfxsize=2in}
%\def\absfigsize {\epsfxsize=3.0in}
%\def\relfigsize {\epsfxsize=3.0in} 

\newcounter{ecount}
\newenvironment{ecompact}{
  \begin{list}%
    {\arabic{ecount}}{\usecounter{ecount}
    \parsep 0pt plus 1pt
    \partopsep 0pt plus 1pt
    \topsep 2pt plus 2pt minus 1pt
    \itemsep 0pt plus 1pt}}%
  {\end{list}}

\newenvironment{icompact}{
  \begin{list}{$\bullet$}{
    \parsep 0pt plus 1pt
    \partopsep 0pt plus 1pt
    \topsep 2pt plus 2pt minus 1pt
    \itemsep 0pt plus 1pt}}%
  {\end{list}}


\begin{document}


\bibliographystyle{acm}

\title{The Case For \\
Prediction-based 
Best-effort Real-time Systems}
\titlerunning{Prediction-based Best-effort Real-time}

\author{Peter A. Dinda \hspace{.5cm} Bruce Lowekamp \\
Loukas F. Kallivokas \hspace{.5cm} David R. O'Hallaron}
\tocauthor{Peter A. Dinda, Bruce Lowekamp, Loukas F. Kallivokas, and David R. O'Hallaron}


\institute {
Carnegie Mellon University\\ 5000 Forbes Avenue, Pittsburgh, PA 15213, USA \\ 
\email{(pdinda,lowekamp,loukas,droh)@cs.cmu.edu}
}

\maketitle

\unmarkedfootnote{
\small
\tinyspacing
Effort sponsored in part by the Advanced Research Projects Agency and
Rome Laboratory, Air Force Materiel Command, USAF, under agreement
number F30602-96-1-0287, in part by the National Science Foundation
under Grant CMS-9318163, and in part by a grant from the Intel
Corporation.
\normalsize
\normspacing
}


\begin{abstract}

We propose a prediction-based best-effort real-time service to support
distributed,  interactive applications in shared, unreserved computing
environments.  These applications have timing requirements, but can
continue to function when deadlines are missed.  In addition, they
expose two kinds of adaptability: tasks can be run on any host, and
their resource demands can be adjusted based on user-perceived
quality.  After defining this class of applications, we describe a
significant example, an earthquake visualization tool, and show how
it could benefit from the service.  Finally, we present evidence
that the service is feasible in the form of two studies of algorithms
for host load prediction and for predictive task mapping.

\end{abstract}


\section{Introduction}


There is an interesting class of interactive applications that could
benefit from a real-time service, but which must run on conventional
reservation-less networks and hosts, where traditional forms of
real-time service are difficult or impossible to implement.
Fortunately, these applications do not require either deterministic or
statistical guarantees about the number of tasks that will meet their
deadlines.  Further, they are adaptable in two ways: their components
are distributed, so a task can effectively be run on any available
host, and they can adjust the computation tasks perform and the
communication between tasks, trading user-perceived quality against
the chances of meeting deadlines.

We propose a best-effort real-time service that uses history-based
prediction to choose how to exploit these two levels of adaptation so
that most tasks meet their deadlines and the application presents
reasonable quality to the user.  We believe that such a service is
feasible and, while providing no guarantees, would nonetheless
simplify building responsive interactive applications and greatly
improve the user's experience.

The paper has two main thrusts.  The first is to define the class of
resilient, adaptable, interactive applications we have described above
and to show that it contains significant real applications.  As an
example of typical user demands and application adaptivity we analyze
QuakeViz, a distributed visualization system for earthquake
simulations being developed at CMU.  We are also studying OpenMap, a
framework for interactively presenting map-based information,
developed at BBN~\cite{OPENMAP-WEB-PAGE}.  The extended version of
this paper~\cite{WPDRTS99-EXTENDED-TR} covers OpenMap in more detail.

The second main thrust of the paper is to support the claim that
history-based prediction is an effective way to implement a
best-effort real-time service for this class of applications.  We
present two pieces of supporting evidence here. The first is a
trace-based simulation study of simple algorithms for predicting on
which host a task is most likely to meet its deadline based on a
history of past running times.  One of the algorithms provides near
optimal performance in the environments we simulated.  The second
piece of supporting evidence is a study of linear time series models
for predicting host load that shows that simple, practical models can
provide very good load predictions.  Because running time is strongly
correlated with the load experienced during execution, the fact that
load is predictable suggests that information that can be collected in
a scalable way can be used to make decisions about where to execute
tasks.  Together, these two results suggest that prediction is a
feasible approach to implementing a best-effort real-time service.
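To make the flavor of history-based prediction concrete, here is a minimal sketch in Python. The exponential-smoothing predictor, the host names, and the histories are illustrative assumptions for exposition; they are not the algorithms evaluated in the two studies.

```python
# Illustrative sketch (not the algorithms from the studies): each host
# keeps a history of recent task running times; a simple exponential
# smoothing of that history serves as the prediction, and the mapper
# sends the task to the host with the smallest predicted running time.

def predict_running_time(history, alpha=0.5):
    """Exponentially smoothed estimate from past running times."""
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate

def choose_host(histories, deadline):
    """Pick the host with the lowest predicted running time.

    histories maps host name -> list of past running times (seconds).
    Returns (host, predicted_time, meets_deadline); the caller can fall
    back to quality adjustment when no prediction meets the deadline.
    """
    best_host, best_time = None, float("inf")
    for host, history in histories.items():
        predicted = predict_running_time(history)
        if predicted < best_time:
            best_host, best_time = host, predicted
    return best_host, best_time, best_time <= deadline

# Example: two steadily loaded hosts and one whose load just rose.
histories = {
    "hostA": [1.2, 1.1, 1.3, 1.2],
    "hostB": [0.6, 0.7, 2.5, 2.6],   # load recently increased
    "hostC": [0.9, 0.9, 0.8, 0.9],
}
host, t, ok = choose_host(histories, deadline=1.0)
```

Note that the smoothing weights recent observations more heavily, so hostB's recent load increase dominates its older, faster history.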

\section{Application characteristics}
\label{sec:app_chars}

The applications we are interested in supporting have the following
four characteristics.  First, they exhibit {\bf interactivity} ---
computation takes the form of tasks that are initiated or guided by a
human being who desires responsiveness and predictable behavior.
Research has shown that people have difficulty using an interactive
application that does not provide timely, consistent, and predictable
feedback~\cite{EDITOR-RESPONSE-VARIATIONS-EMBLY-81,PSYCH-LIMITS-ON-SYS-RESPONSE-TIME-KOMATSUBARA-97}.
Our mechanism for specifying interactive performance is the task
deadline.  The second application characteristic is {\bf resilience}
in the face of missed deadlines --- such failures do not make these
applications unusable, but merely result in lowered quality.
Resilience is the characteristic that suggests a best-effort real-time
approach, instead of traditional ``soft'' (statistically
guaranteed)~\cite{LOTTERY-OSDI,PROB-JOB-SCHED-DIST-RT-BESTAVROS,TIME-DRIVEN-SCHED-MODEL-RTS-JENSEN-85}
and ``hard'' (deterministically guaranteed) real-time
approaches~\cite{HARD-RTS-STANKOVIC-BOOK-88}. The third
characteristic is {\bf distributability} --- we assume that the
applications are implemented in a distributable manner, with data
movement exposed for scheduling purposes.  Finally, our applications
are characterized by {\bf adaptability} --- they expose controls,
called {\em application-specific quality parameters}, that can be
adjusted to change the amount of computation and communication
resources a task requires.  



\section{QuakeViz}

The Quake project developed a toolchain capable of detailed simulation
of large geographic areas during strong
earthquakes~\cite{QUAKE-JOURNAL-98}.  Thorough assessment of the
seismicity associated with a geographic region requires accurate,
interactive visualization of the data produced by the simulation.
Visualizing this data is challenging because the full output for even
a small region such as the San Fernando Valley amounts to
approximately 6TB; even selective output runs to tens of gigabytes.
Nevertheless, with proper management of the visualization data, it is
possible to achieve reasonable quality in a resource-limited
environment.


\begin{figure}[t]
\epsfxsize=4.5in
\centerline{\epsfbox{quake_combo.eps}}
\caption{QuakeViz application. (a) Stages of the Quake visualization
toolchain. (b)-(d) Different placements of toolchain tasks
corresponding to different resource situations: (b) a high-performance
network and workstation are available for the visualization; (c) a
low-bandwidth network and a PC with a 3D graphics accelerator are
used; (d) a low-end PC acts as a framebuffer, displaying the result of
a completely remote visualization process.}
\label{fig:quake}
\end{figure}
The visualization toolchain is shown in Figure~\ref{fig:quake}(a).
The raw simulation data is typically read from storage, rather than
from the running application, because of the simulation's high cost,
scheduling complexities, and the lack of any runtime tunable
parameters in the simulation.  The raw irregular mesh data is first
interpolated onto a regular grid to facilitate processing and
downsampled to a more manageable size.  The resulting grid is then
used to calculate isosurfaces corresponding to various response
intensities.  The displacement and density isosurfaces are combined
with topology information to produce a 3D image, which is then drawn
on the user's desktop display.


Because the Quake visualization is done offline, the user's input
consists of controlling the parameters of the visualization.  These
operations may include rotating the display, removing surface layers,
zooming in and out, and adjusting the parameters for isosurface
calculation.


In the toolchain shown in Figure~\ref{fig:quake}(a), we generally
expect the resources needed to retrieve the data and perform the
initial interpolation to be available only on a few high-performance
machines, such as those found at remote supercomputing facilities.
Because the output device is determined by the user's location, the
isosurface and scene calculations are the two components that can be
freely distributed to optimize performance.


\subsection{Task location}
The location of these two phases is governed by the network bandwidth
and the computing and graphics resources available to the user.  In
fact, there are reasons to divide the pipeline at any of the three
remaining edges, depending on predicted processor and network
capacity.  Three examples are shown in
Figure~\ref{fig:quake}(b)-(d).

Figure~\ref{fig:quake}(b) is the ideal case, in which resources allow
the full regular mesh produced by the interpolation phase to be sent
directly to the user's desktop.  This situation is unlikely in all but
the most powerful environments, but may offer the best opportunity for
interactive use.  In Figure~\ref{fig:quake}(c), the isosurface
calculation is performed on the supercomputer doing the interpolation,
while the scene calculation is done on the user's desktop machine.
This placement suits lower-bandwidth links: the isosurfaces are
significantly smaller than the grid data, the isosurface calculation
is fairly expensive, and the scene calculation can be handled quite
effectively by a variety of currently available commodity graphics
cards.  A very limited case is shown in Figure~\ref{fig:quake}(d),
where the user's desktop acts only as a framebuffer for the images.
This setup may be useful if the final pixmap is smaller than the 3D
representation, which depends on the complexity of the scene, or if
the user's workstation lacks 3D hardware.
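The choice among these placements can be framed as picking the pipeline cut that minimizes predicted end-to-end time. The sketch below is an assumption-laden illustration: the stage costs, edge sizes, and predicted capacities are made-up placeholders that, in the proposed system, would come from the prediction service rather than constants.

```python
# Hedged sketch of choosing where to cut the visualization pipeline.
# Stages before the cut run on the supercomputer, stages after it on
# the desktop, and the data crossing the cut traverses the network.
# All numbers below are invented for illustration.

STAGES = ["isosurface", "scene"]          # the movable stages
EDGE_BYTES = {0: 4e9, 1: 2e8, 2: 8e6}     # grid, surfaces, pixmap

def total_time(cut, server_ops, desktop_ops,
               server_flops, desktop_flops, bandwidth):
    """End-to-end time if STAGES[:cut] run remotely and STAGES[cut:]
    run locally, with EDGE_BYTES[cut] crossing the network."""
    remote = sum(server_ops[s] for s in STAGES[:cut]) / server_flops
    local = sum(desktop_ops[s] for s in STAGES[cut:]) / desktop_flops
    return remote + EDGE_BYTES[cut] / bandwidth + local

def best_cut(server_ops, desktop_ops,
             server_flops, desktop_flops, bandwidth):
    return min(range(len(STAGES) + 1),
               key=lambda c: total_time(c, server_ops, desktop_ops,
                                        server_flops, desktop_flops,
                                        bandwidth))

server_ops = {"isosurface": 5e9, "scene": 5e10}  # software rendering
desktop_ops = {"isosurface": 5e9, "scene": 1e9}  # 3D accelerator
# With a moderate link and a desktop accelerator, cutting after the
# isosurface stage (the placement of panel (c)) wins in this example.
cut = best_cut(server_ops, desktop_ops, server_flops=1e10,
               desktop_flops=1e9, bandwidth=1e8)
```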




\subsection{Quality parameters}
Frame rate, resolution, and interactive response time are the primary
quality parameters which can be adjusted to meet the user's
requirements.  Because it may take some time for adjustments to these
parameters to be propagated through the pipeline, other mechanisms may
be used to maintain responsiveness.  For example, resolution can be
lowered along any edge in the toolchain.  Similarly, zooming in on the
image can be accomplished along edges, while waiting for the greater
detail to be propagated from the beginning of the pipeline.  Finally,
even if only a pixmap of the image is available on the user's
workstation, that image can be used for temporary adjustments,
providing the effects of zooming and rotation, while waiting for
actual data to fill in the unavailable information.
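One hypothetical way to drive such adjustments automatically is a small feedback controller that degrades resolution when deadline misses persist and restores it when deadlines are consistently met. The thresholds, window size, and single resolution knob below are illustrative assumptions, not a mechanism the paper specifies.

```python
# Illustrative feedback controller for one quality parameter.  It
# tracks recent deadline outcomes and halves the resolution when the
# miss rate stays high, doubling it back once deadlines are met.

class QualityController:
    def __init__(self, resolution=1.0):
        self.resolution = resolution
        self.recent = []              # 1 = missed deadline, 0 = met

    def report(self, missed):
        """Record one task's outcome and adapt the resolution."""
        self.recent = (self.recent + [int(missed)])[-10:]
        miss_rate = sum(self.recent) / len(self.recent)
        if miss_rate > 0.3:           # persistent misses: degrade
            self.resolution = max(0.25, self.resolution / 2)
            self.recent = []          # give the new setting a chance
        elif miss_rate == 0 and len(self.recent) == 10:
            self.resolution = min(1.0, self.resolution * 2)
        return self.resolution

qc = QualityController()
for _ in range(4):                    # a burst of missed deadlines
    qc.report(missed=True)            # resolution ratchets down
```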






\section{Structure of a best-effort real-time system}

\begin{figure}[t]
\epsfxsize=3in
\centerline{\epsfbox{system_model.eps}}
\caption{Structure of a best-effort real-time system.  The shaded
portions are those we discuss in this paper.}
\label{fig:struct}
\end{figure}

Figure~\ref{fig:struct} illustrates the structure of our proposed
best-effort real-time system.  The boxes represent system components.
Thin arrows represent the flow of measured and predicted information,
thick arrows represent the control the system exerts over the
application, and the dotted arrow represents the flow of application
requests.  We have shaded the parts of the design that we discuss
in this paper.

The system accepts requests to run tasks from the application and then
uses a mapping algorithm to assign these tasks to the hosts where they
are most likely to meet their deadlines.  A task mapping request
includes a description of the data the task needs, its resource
requirements (either supplied by the application programmer or
predicted based on past executions of the task), and the required
deadline.  The mapping algorithm uses predictions of the load on each
of the available hosts and on the network to determine the expected
running time of the task on each of the hosts, and then runs the task
on one of the hosts (the target host) where it is likely to meet its
deadline with high confidence.  If no such host exists, the system can
either map the task suboptimally, hoping that the resource crunch is
transient and will soon disappear, or adjust the quality parameters of
the application to try to reduce the task's computation and/or
communication requirements or to add more slack to the deadline.
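The host-selection step just described can be sketched in a few lines.  This is a minimal illustration, assuming each candidate host supplies a normally distributed running-time estimate derived from its load predictions; the interface names and the confidence threshold are our own, not part of the system:

```python
import math

def choose_target_host(hosts, deadline, confidence=0.95):
    """Pick a host where the task is likely to meet its deadline.

    `hosts` maps a host name to a (mean, stddev) running-time
    estimate for the task on that host.  Returns the host with the
    highest probability of meeting the deadline, or None if no host
    reaches the confidence threshold (the caller can then fall back
    to a suboptimal mapping or adjust quality parameters).
    """
    def prob_meets_deadline(mean, stddev):
        # Normal CDF evaluated at the deadline.
        if stddev == 0:
            return 1.0 if mean <= deadline else 0.0
        z = (deadline - mean) / (stddev * math.sqrt(2))
        return 0.5 * (1.0 + math.erf(z))

    best_host, best_prob = None, 0.0
    for host, (mean, stddev) in hosts.items():
        p = prob_meets_deadline(mean, stddev)
        if p > best_prob:
            best_host, best_prob = host, p
    return best_host if best_prob >= confidence else None
```

A `None` return corresponds to the fallback case above, where the system must either map suboptimally or adjust the application's quality parameters.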

The choice of target host and the choice of application-specific
quality parameters are adaptation mechanisms that are used at
different time scales.  A target host is chosen for every task, but
quality parameters are only changed when many tasks have failed to
meet their deadlines, or when it becomes clear that sufficient
resources exist to improve quality while still ensuring that most
deadlines can be met by task mapping.  Intuitively, the system tries
to meet deadlines by using adaptation mechanisms that do not affect
the user's experience, falling back on mechanisms that do only on
failure.  Of course, the system can also make these adjustments
pre-emptively based on predictions.

The performance of the system largely reflects the quality of its
measurements and predictions.  The measurement and prediction stages
run continuously, taking periodic measurements and fitting predictive
models to series of these measurements.  For example, each host runs a
daemon that measures the load with some periodicity.  As the daemon
produces measurements, they are passed to a prediction stage which
uses them to improve its predictive model.  When servicing a mapping
request, the mapping algorithm requests predictions from these models.
The predictions are in the form of a series of estimated future values
of the sequence of measurements, annotated with estimates of their
quality and measures of the past performance of the predictive model.
Given the predictions, the mapping algorithm computes the running time
of the task as a confidence interval whose length is determined by the
quality measures of the prediction.  Higher quality predictions lead
to shorter confidence intervals, which makes it easier for the mapping
algorithm to choose between its various options.


\section{Evidence for a history-based prediction approach}

We present two pieces of evidence that suggest history-based
prediction is a feasible approach to implementing a best-effort
real-time service for distributed, interactive applications. The first
is a trace-based simulation study of simple mapping algorithms that
use past task running times to predict on which host a task is most
likely to meet its deadline.  The study shows that being able to map a
task to any host in the system exposes significant opportunities to
meet its deadline.  Further, one of the algorithms provides near
optimal performance in the environments we simulated.  The second
piece of evidence is a study of linear time series models for
predicting host load.  We found that simple, practical models can
provide very good load predictions, and these good predictions lead to
short confidence intervals on the expected running times of tasks,
making it easier for a mapping algorithm to choose between the hosts.
Network prediction is of considerable current interest, so we conclude
with a short discussion of representative results from the literature.


\subsection{Relationship of load and running time}
\label{sec:relationship}

The results in this section depend on the strong relationship that
exists between host load and running time for CPU-bound tasks.  We
measure host load as the Unix five-second load average, sampled every
second.  We collected week-long traces of such measurements on 38
different machines in August 1997 and March 1998.  We use these
extensive traces to compute realistic running times for the simulation
experiments of Section~\ref{sec:simple_prob}.  In
Section~\ref{sec:load_pred} we predict load directly, with an eye to
using the predictions to estimate running times.  A detailed
statistical analysis of the traces is available
in~\cite{DINDA-STAT-PROPS-HOST-LOAD-LCR98}.  Considering load as a
continuous signal $z(t)$, the relationship between load and running
time $t_{running}$ is
$\int_{0}^{t_{running}}\frac{1}{1+z(t)}dt=t_{nominal}$ where
$t_{nominal}$ is the running time of the task on a completely unloaded
host.  Intuitively, $1/(1+z(t))$ is the fraction of the CPU the task
receives at time $t$, so the integral accumulates the task's completed
nominal work.  Further details on how this
relationship was derived and verified can be found
in~\cite{DINDA-LOAD-PRED-TR-98}.
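A step-function approximation of this relationship makes it straightforward to derive a running time from a sampled load trace, in the spirit of the trace-driven simulation of Section~\ref{sec:simple_prob}.  The sketch below is illustrative only; the function name and sampling interface are ours.

```python
def running_time(load_samples, t_nominal, dt=1.0):
    """Compute a task's running time from a sampled load signal z(t).

    Accumulates dt / (1 + z) until the accumulated nominal work
    reaches t_nominal, i.e. it solves
        integral_0^t_running  1 / (1 + z(t)) dt = t_nominal
    using a step-function approximation of z(t).
    """
    work, elapsed = 0.0, 0.0
    for z in load_samples:
        share = dt / (1.0 + z)  # CPU share over this interval
        if work + share >= t_nominal:
            # The task finishes partway through this interval.
            elapsed += (t_nominal - work) * (1.0 + z)
            return elapsed
        work += share
        elapsed += dt
    raise ValueError("trace too short to complete the task")
```

For example, a task with one second of nominal work takes two seconds on a host with constant load one, since it receives only half of the CPU.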

\subsection{Simple prediction-based mapping algorithms}
\label{sec:simple_prob}
\newcommand{\RangeCounter}[0]{{\em RangeCounter(W)}}

\comment{
Intelligently exploiting the freedom of which host to map a task to
can significantly increase the number of tasks that meet their
deadlines compared to strategies such as always mapping to the same
host or mapping to random hosts.  We reached this conclusion by
developing and evaluating a variety of simple mapping algorithms for
mapping compute-bound tasks.
}

We developed and evaluated nine different mapping algorithms that use
a past history of task running times on the different hosts to predict
the host on which the current task's deadline is most likely to be
met.  These algorithms included several different window-average
schemes, schemes based on confidence intervals computed using
stationary IID stochastic models of running times, and simple neural
networks.  Due to space limits, we will focus on the performance of
\RangeCounter, the most successful of the algorithms, and only describe the
evaluation procedure at a high level.  For more details on the
algorithms and how we evaluated them, consult the extended version of
this paper~\cite{WPDRTS99-EXTENDED-TR}.
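To convey the flavor of \RangeCounter, the sketch below implements the quality-value bookkeeping at its core: each host carries a quality value that is aged on every mapping, boosted on success in inverse proportion to the estimated confidence in the host, and halved on failure.  The constants are illustrative only; the precise algorithm appears in the extended version~\cite{WPDRTS99-EXTENDED-TR}.

```python
import random

class RangeCounterSketch:
    """Sketch of a RangeCounter-style mapper.  Constants are
    illustrative, not the published ones."""

    def __init__(self, hosts, aging=0.01, threshold=0.05):
        self.quality = {h: 0.0 for h in hosts}
        self.aging = aging
        self.threshold = threshold

    def choose(self):
        # Exploit the best host if any stands out; otherwise explore.
        best = max(self.quality, key=self.quality.get)
        if self.quality[best] <= self.threshold:
            best = random.choice(list(self.quality))
        for h in self.quality:  # age all hosts to discourage complacency
            self.quality[h] = max(0.0, self.quality[h] - self.aging)
        return best

    def report(self, host, met_deadline, confidence):
        # `confidence` is the estimated probability that the deadline
        # would be met on this host; unlikely successes earn more.
        if met_deadline:
            self.quality[host] += 1.0 / max(confidence, 1e-6)
        else:
            self.quality[host] /= 2.0
```

The reward rule encourages exploration and risk taking, while the halving on failure reacts quickly and decisively to misses.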

\comment{
 That algorithm, which we call \RangeCounter,
associates a quality value, initially zero, with each host.  To choose
a host, the following steps are taken: If no host has a quality value
significantly higher than zero, a random host is chosen, otherwise the
host with the highest quality value is chosen.  Next, the quality
values of all hosts are reduced (aged) by a small amount.  If the
mapping results in the deadline being met, the chosen host's quality
value is increased by the inverse of our statistical confidence in it.
The statistical confidence is the cumulative probability over the
deadline range of a Normal distribution parameterized by the sample
mean and variance of previous task running times on the host.  If the
mapping is unsuccessful, we divide the chosen host's quality value by
two.
\RangeCounter\ rewards good choices based on how unlikely they appear
to be able to bear fruit and punishes bad choices equally.  The idea
is to encourage exploration and risk taking, but to react quickly and
decisively to failure.  The aging process discourages complacency.
}

We evaluated the mapping algorithms with a trace-driven simulator that
uses the load traces described in Section~\ref{sec:relationship} to
synthesize highly realistic execution times.  The simulator repeatedly
requested that a task with a nominal execution time $t_{nominal}$
be completed in $t_{max}$ seconds and counted the number of successful
mapping requests.  Communication costs were ignored---the tasks are
entirely compute-bound and require no inputs or outputs to be
communicated.  In addition to simulating the mappings chosen by the
algorithm being tested, the simulator also simultaneously simulated a
random mapping, the best static mapping to an individual host, and the
optimal mapping.
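The core of this evaluation loop can be sketched as follows.  For brevity, the sketch approximates each running time using only the load sample at submission time rather than integrating over the trace as our simulator does; the names are ours.

```python
def simulate(load_traces, choose_host, t_nominal, t_max, n_tasks):
    """Count how many of n_tasks meet deadline t_max when each task
    is placed by choose_host.

    `load_traces` maps each host to its load trace.  Running time is
    approximated as t_nominal * (1 + z), where z is the load sample
    at submission time (a crude stand-in for trace integration).
    """
    met = 0
    for i in range(n_tasks):
        host = choose_host(load_traces)
        trace = load_traces[host]
        z = trace[i % len(trace)]
        if t_nominal * (1.0 + z) <= t_max:
            met += 1
    return met
```

Running the same request stream through a candidate algorithm, a random mapper, the best static mapper, and an oracle gives the four curves compared in our figures.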

\comment{
Given a test configuration of a group of hosts (represented by their
load traces), the simulator determines how many tasks meet their
deadlines for each algorithm.  In addition, the simulator also
evaluates a random mapping and determines the maximum (optimal) number
of tasks that could have met their deadlines using all of the hosts,
and using only the best individual host.

We studied six different configurations of hosts, and for each one
we used eight different values of the nominal execution time
$t_{nominal}$, and twelve different deadlines $t_{max}$, ranging from
$t_{nominal}$ to three times $t_{nominal}$.  We simulated $N=100,000$
tasks for each of these 576 different combinations for a total of 57.6
million simulated tasks.  An additional 72 combinations (7.2 million
calls) representing varying $t_{nominal}$ were also simulated.
}

\def\mapfigsize{\epsfxsize=2.4in}

\begin{figure*}[t]
\centerline{
\begin{tabular}{cc}
\mapfigsize
\epsfbox{nom_8mm.eps} &
\mapfigsize
\epsfbox{const_8mm.eps} \\
(a) & (b) \\
\end{tabular}
}
\caption{Representative data from our study of simple prediction-based
mapping algorithms:  (a) the effect of varying nominal execution time,
 (b) the effect of increasing deadline slack.}
\label{fig:predmap}
\end{figure*}

Figure~\ref{fig:predmap}(a), which is representative of our
overall results, illustrates the effect of varying the nominal time
$t_{nominal}$ with a tight deadline of $t_{max}=t_{nominal}$.  
\comment{The
configuration being studied contains 8 hosts with a wide variety of
different load behaviors.  The x-axis in the figure is the nominal
time, $t_{nominal}$, and ranges from 50 milliseconds to 60 seconds,
while the y-axis is the percentage of tasks that meet their deadlines.
In addition to the performance of the \RangeCounter\ algorithm, the
performance of random mapping, always mapping to the best individual
host, and mapping optimally to all of the hosts are also plotted.  }
Notice that there is a substantial gap between the performance of the
optimal mapping algorithm and the performance of random mappings.
Further, it is clear that always mapping to a single host does not
result in an impressive number of tasks meeting their deadlines.
These differences suggest that a good mapping algorithm can make a
large difference.  We can also see that
\RangeCounter\ performs quite well, even for fairly long tasks.

Even with relaxed deadlines, a good mapping algorithm can make a
significant difference. Figure~\ref{fig:predmap}(b), which
is representative of our results, illustrates the effect of relaxing
the deadline $t_{max}$ (normalized to $t_{nominal}$ on the x-axis) on
the percentage of tasks that meet their deadlines.  
\comment{
As before, data is
plotted for \RangeCounter, random mapping, always mapping to the best
individual host, and optimal mapping. 
}
Notice that there is a large,
persistent difference between random mapping and optimal mapping even
with a great deal of slack.  Further, using the best single host is
suboptimal until the deadline is almost doubled.  On the other hand,
\RangeCounter\ presents nearly optimal performance even with the
tightest possible deadline.

In a real system, the cost of communication will make fewer deadlines
attainable and will tend to encourage placing tasks near the data they
consume, possibly reducing the benefits of prediction.  Still, this
study shows that it is possible to meet many deadlines by using
prediction to exploit the degree of freedom of running a task on any
host.

\subsection{Linear time series models for host load prediction}
\label{sec:load_pred}

Implicitly predicting the current task's running time on a particular
host based on previous task running times on that host, as we did in
the previous section, limits scalability because each application must
keep track of running times for every task on every node, and the
predictions are all made on a single host.

Because running time is so strongly related to load, as we discussed
in Section~\ref{sec:relationship}, another possibility is to have each
host directly measure and predict its own load and then make these
predictions available to other hosts.  When mapping a task submitted
on some host, the best-effort real-time system can use the load
predictions and the resource requirements of the task to estimate its
running time on each of the other hosts and then choose one where the
task is likely to meet its deadline with high confidence.
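A rough version of this estimation, assuming the load/running-time relationship of Section~\ref{sec:relationship} and normally distributed prediction errors, can be sketched as below; the function names and the exact form of the interval are our own simplifications.

```python
def estimate_running_time(t_nominal, predicted_loads):
    """Point estimate: with average predicted load z_avg over the
    execution, running time is roughly t_nominal * (1 + z_avg)."""
    z_avg = sum(predicted_loads) / len(predicted_loads)
    return t_nominal * (1.0 + z_avg)

def running_time_interval(t_nominal, predicted_loads, pred_stddev,
                          z_crit=1.96):
    """95 percent confidence interval on the running time, treating
    the load prediction error as Gaussian with standard deviation
    pred_stddev.  Better predictions (smaller pred_stddev) yield a
    tighter interval."""
    t = estimate_running_time(t_nominal, predicted_loads)
    half_width = z_crit * pred_stddev * t_nominal
    return (t - half_width, t + half_width)
```

The mapping algorithm can then compare these intervals across hosts and choose one whose upper endpoint falls below the deadline.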

\def\benefitfigwidth{\epsfxsize=2.3in}

\begin{figure}[t]
\centerline{
\begin{tabular}{cc}
\benefitfigwidth
\epsfbox{unix16-simple.epsf} &
\benefitfigwidth
\epsfbox{axpfea-simple.epsf} \\
(a) & (b) \\
\end{tabular}
}
\caption{Benefits of prediction: (a) server machine with 40 users and
long term load average of 10, (b) interactive cluster machine with long term
load average of 0.17.}
\label{fig:benefits}
\end{figure}


Load predictions are unlikely to be perfect, so the estimates of
running time are actually confidence intervals.  Better load
predictions lead to smaller confidence intervals, which makes it
easier for the mapping algorithm to decide between the available
hosts.  We have found that very good load predictions that lead to
acceptably small confidence intervals can be made using relatively
simple linear time series models.  Due to space limits, we do not
discuss these models here, but interested readers will find Box et
al.~\cite{BOX-JENKINS-TS} a worthy introduction.
Figure~\ref{fig:benefits} illustrates the benefits of such models on
(a) a heavily loaded server and (b) a lightly loaded interactive
cluster machine.  In each of the graphs we plot the length of the
confidence interval for the running time of a one second task as a
function of how far ahead the load predictions are made.
Figure~\ref{fig:benefits}(a) compares the confidence intervals for a
predictive AR($9$) model and for the raw variance of the load.  Notice
that for short-term predictions, the AR($9$) model provides confidence
intervals that are almost an order of magnitude smaller than the raw
signal.  For example, when predicting 5 seconds ahead, the confidence
interval for the AR($9$) model is less than 1 second while the
confidence interval for the raw load signal is about 4 seconds.
Figure~\ref{fig:benefits}(b) shows that such benefits are also
possible on a very lightly loaded machine.

The prediction process begins by fitting a {\em model} to a history of
periodically sampled load measurements.  As new measurements become
available, they are fed to the fitted model in order to refine its
predictions.  At any point after the model is fitted, we can request
predictions for an arbitrary number of samples into the future.  The
quality of predictions for $k$ samples into the future is measured by
the {\em $k$-step ahead mean squared error}, which is the average of
the squares of the differences between the $k$-step ahead predictions
and their corresponding actual measurements.  After some number of new
samples have been incorporated, a decision is made to refit the model
and the process starts again.  We refer to one iteration of this
process as a {\em testcase.}
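The $k$-step-ahead mean squared error of a testcase can be computed as below.  The predictor interface is our own; the last-value predictor shown is just a simple baseline, not one of the models from our study.

```python
def k_step_mse(trace, predictor, k):
    """k-step-ahead mean squared error of `predictor` over `trace`.

    `predictor(history, k)` returns a prediction of the sample k
    steps past the end of `history`.  Each prediction is scored
    against the measurement that actually arrives k steps later.
    """
    errors = []
    for t in range(1, len(trace) - k + 1):
        pred = predictor(trace[:t], k)
        actual = trace[t + k - 1]
        errors.append((pred - actual) ** 2)
    return sum(errors) / len(errors)

def last_value(history, k):
    """Trivial baseline: predict that load stays at its last value."""
    return history[-1]
```

Comparing a model's $k$-step MSE against the raw variance of the trace is exactly the first requirement for consistent predictability below.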

A good model provides {\em consistent predictability} of load, by
which we mean it satisfies two requirements.  First, for the average
testcase, the model must have a considerably lower expected mean
squared error than the expected raw variance of the load.  Second,
there must be little variability in mean squared error from testcase
to testcase, so that the expectation is representative.
Intuitively, the first requirement says that the model provides good
predictions on average, while the second says that most predictions
are close to that average.

\def\loadpredfigwidth{\epsfxsize=2.2in}

\begin{figure}[t]
\centerline {
\begin{tabular}{cc}
\loadpredfigwidth
\epsfbox{all_lead15_8to8.eps} &
\loadpredfigwidth
\epsfbox{axp0_lead15_8to8.eps} \\
(a) & (b) 
\end{tabular}
}
\caption{Performance of 8 parameter host load prediction models for 15
second ahead predictions: (a) All traces, (b) Moderately loaded
interactive host.}
\label{fig:lead15_8to8}
\end{figure}

In a previous paper~\cite{DINDA-LOAD-PRED-TR-98}, we evaluated the
performance of linear time series models for predicting our load
traces using the criterion of consistent predictability discussed
above.  Although load exhibits statistical properties such as
self-similarity (Beran~\cite{BERAN-STAT-LONG-RANGE-METHODS} provides
a good introduction to self-similarity and long-range dependence) and
epochal behavior~\cite{DINDA-STAT-PROPS-HOST-LOAD-LCR98} that suggest
that complex, expensive models such as
ARFIMA~\cite{HOSKING-FRACT-DIFF} models might be necessary to ensure
consistent predictability, we found that relatively simple models
performed nearly as well.

Figure~\ref{fig:lead15_8to8}(a) shows the performance of 8 parameter
versions of the models we studied for 15 second ahead predictions,
aggregated over all of our traces, while
Figure~\ref{fig:lead15_8to8}(b) shows the performance on the trace of
a single, moderately loaded interactive host. Each of the graphs in
Figure~\ref{fig:lead15_8to8} is a Box plot that shows the distribution
of 15-step-ahead (predicting load 15 seconds into the future) mean
squared error.  The data for the figure comes from running a large
number of randomized testcases.  Each testcase fits a model to a
random section of a trace and then tests the model on a consecutive
section of random length.  In the figure, each category is a specific
model and is annotated with the number of testcases used.  For each
model, the circle indicates the expected mean squared error, while the
triangles indicate the 2.5th and 97.5th percentiles assuming a normal
distribution.  The center line of each box shows the median while the
lower and upper limits of the box show the 25th and 75th percentiles
and the lower and upper whiskers show the actual 2.5th and 97.5th
percentiles.

Each of the predictive models has a significantly lower expected mean
squared error than the expected raw variance of the load (measured by
the MEAN model), and there is also far less variation in mean squared
error from testcase to testcase.  These are the criteria for
consistent predictability that we outlined earlier.  Another important
point is that there is little variation in the performance across the
predictive models, other than that the MA model does not perform well.
This is interesting because the ARMA, ARIMA, and especially the
long-range dependence capturing ARFIMA models are vastly more
expensive to fit and use than the simpler AR and BM models.  An
important conclusion of our study is that reasonably high order AR
models are sufficient for predicting host load.  We recommend AR(16)
models or better.
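Part of the appeal of AR models is that fitting one is cheap: the Yule-Walker equations can be solved with the Levinson-Durbin recursion in $O(p^2)$ time.  The sketch below is a textbook implementation with our own function names, not the code used in our study.

```python
def autocovariance(x, max_lag):
    """Sample autocovariance of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t + lag] - mean)
                for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]

def fit_ar(x, p):
    """Fit an AR(p) model by solving the Yule-Walker equations with
    the Levinson-Durbin recursion.  Returns coefficients phi such
    that x[t] - mean is predicted by
        sum(phi[i] * (x[t - 1 - i] - mean) for i in range(p)).
    """
    r = autocovariance(x, p)
    phi = [0.0] * p
    err = r[0]
    for k in range(p):
        # Reflection coefficient for order k + 1.
        acc = r[k + 1] - sum(phi[j] * r[k - j] for j in range(k))
        reflect = acc / err
        new_phi = phi[:]
        new_phi[k] = reflect
        for j in range(k):
            new_phi[j] = phi[j] - reflect * phi[k - 1 - j]
        phi = new_phi
        err *= (1.0 - reflect ** 2)
    return phi

def predict_next(x, phi):
    """One-step-ahead prediction using the fitted coefficients."""
    mean = sum(x) / len(x)
    return mean + sum(phi[i] * (x[-1 - i] - mean)
                      for i in range(len(phi)))
```

Refitting such a model every few hundred samples, as in our testcases, imposes negligible overhead on a measurement daemon.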

\subsection{Network prediction}
Predicting network traffic levels is challenging because of the
large number of machines that can generate traffic on a shared
network.  Statistical models are providing a better understanding of
how both wide-area~\cite{wan-olympics,poisson-failure} and
local-area~\cite{SELF-SIM-HIGH-VAR-ETHERNET-WILLINGER-SIGCOMM95}
network traffic behave.  Successes have been reported in using linear
time series models to predict both long
term~\cite{TIME-SERIES-MODEL-LONG-TERM-NSFNET-GROSCHWITZ-ICC94} and
short term~\cite{TIME-SERIES-MODELS-INTERNET-TRAFFIC-BASU-GT-TR-95}
Internet traffic.  Building on these results, systems such as
NWS~\cite{nws} and Remos~\cite{remos} are being developed to provide
the network performance predictions that best-effort real-time
service requires.  For applications interested in predicting the
behavior of a link on which they are already communicating, passive
monitoring may be appropriate~\cite{bolliger-passive,spand}.



\section{Conclusion}

We have identified two applications (QuakeViz, which we analyzed here,
and OpenMap, analyzed in~\cite{WPDRTS99-EXTENDED-TR}) which can
benefit from a best-effort real-time service.  Without such a service,
users of these interactive applications would face a choice between
acquiring reserved or dedicated resources and running the application
at degraded quality.  
\comment{QuakeViz and OpenMap represent
an interesting class of applications which people such as engineering
researchers, and military and civilian GIS users can benefit from
being able to use in their existing environments.
}

The success of best-effort real-time depends on the accuracy of the
predictions of resource availability.  We have shown that CPU load can
be predicted with a high degree of accuracy using simple history-based
time series models.  When combined with currently available network
information systems, these predictions support the task placement and
application parameter decisions needed to build a usable system.

\comment{
We are currently integrating the CPU load prediction and network
prediction into the Remos architecture.  The development of this
system and integration of QuakeViz and OpenMap to make use of such a
system will allow us to test the success of best-effort real-time at
meeting real-world quality demands in a variety of environments
similar to how these applications are used by end-users.
}

\small 
\bibspacing
\enlargethispage{0.25in}
\bibliography{texbib,bruce}
\normalsize
\normspacing

\end{document}

