\section{Approach: History-based prediction with adaptation}
\label{sec:hist_pred}


The challenge in the dynamic mapping problem is to match future
application behavior (the unexecuted portion of the activation tree)
to future system behavior (host and network availability).  Recall
that the purpose of mapping an individual node is to help achieve the
bounds placed on the tree of which it is a part.  Unfortunately, when
the node is mapped, only the part of the tree preceding it in the
depth-first traversal is known.  However, we cannot be oblivious to
the remainder of the tree, since it determines in part how quickly we
should try to execute the node.  If we execute the node too quickly,
we may well waste resources with no benefit to the user.  If we
execute it too slowly, we may fail to meet the bounds on the tree.
The future system behavior is important because it determines the
execution time of the node on each of the hosts.  Consider a leaf
node: its execution time on a particular host depends on the bandwidth and
latency the network will be able to provide and on the load on the
host during the node's execution.

Our approach to this challenge is to use history-based prediction and
to adapt our mapping strategy as we traverse the activation tree.
History-based prediction involves predicting the future values of a
series using a window of previous values.  In essence, we monitor the
system and the application and use our measurements to predict how
their behaviors will evolve.  Each node is then mapped to the host
which best serves to match the predicted future application behavior
to the predicted future system behavior.  As the tree is executed, we
adapt both the degree to which the mapping strategy explores the
system and the prediction algorithms we use, according to how deep we
are in the tree and how far along we are in the traversal.
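As a minimal sketch of the mapping rule this suggests (the function
and host names below are illustrative, not part of our system): given
predicted execution times for a node on each candidate host and the
bound allotted to the node, we prefer the slowest host that still
meets the bound, so that executing faster than necessary does not
waste resources.

```python
def choose_host(predicted_times, bound):
    """Map a node to a host using predicted execution times.

    Prefer the slowest host that still fits the allotted bound:
    finishing faster than necessary wastes resources, while
    missing the bound violates the guarantee on the tree.

    predicted_times: dict of host name -> predicted execution time (s)
    bound: time allotted to this node (s)
    """
    feasible = {h: t for h, t in predicted_times.items() if t <= bound}
    if not feasible:
        # No host is predicted to meet the bound; fall back to the
        # fastest host to minimize how badly the bound is missed.
        return min(predicted_times, key=predicted_times.get)
    return max(feasible, key=feasible.get)
```

For example, with predicted times of 2, 4, and 9 seconds and a bound
of 5 seconds, the rule selects the 4-second host.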

Predicting future system behavior according to a history of
measurements is relatively easy to think about.  Consider host load,
for example.  Host load is a scalar quantity and the history is simply
a series of load measurements.  The prediction algorithm is a function
that maps these measurements of the past to a prediction of the
future.  As we shall see in Section~\ref{sec:load_traces}, this
function may be quite complex, but the concept is simple.  Similarly,
we can imagine mapping a history of bandwidth measurements into a
prediction of the bandwidth at some point in the future.  The
measurements of the system need not be at this low a level.  For
example, in Section~\ref{sec:simple_prob}, we collect a history of the
execution times of leaf nodes on a host and use this history to
predict future execution times on that host.  It is this approach that
we plan to extend.
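The simplest instance of such a predictor, shown below purely as an
illustration (Section~\ref{sec:load_traces} considers far more
sophisticated algorithms), forecasts the next value of a series, such
as host load, as the mean of a window of recent measurements:

```python
from collections import deque


class WindowPredictor:
    """History-based predictor for a scalar time series (e.g. host
    load): the prediction is the mean of the last `window`
    measurements.  This is only the simplest possible instance of
    the idea, not the algorithm used in our system."""

    def __init__(self, window=8):
        # deque with maxlen keeps only the most recent measurements
        self.history = deque(maxlen=window)

    def observe(self, measurement):
        self.history.append(measurement)

    def predict(self):
        if not self.history:
            raise ValueError("no measurements yet")
        return sum(self.history) / len(self.history)
```

A window of size 3 fed the measurements 1, 2, 3, 4 retains only the
last three and predicts their mean, 3.0.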

Predicting future application behavior according to a history is a
more difficult concept.  What does it mean to predict the remainder of
the tree when we map a node?  What level of detail is required?  How
do we represent this history and these predictions?  Our approach is to
recursively divide the bounds placed on the root of the tree among the
children according to the predicted compute times of the nodes.  Given
this, we can visualize the prediction problem with only three levels
of the tree, as in Figure~\ref{fig:tree_example}.  In the figure, we
are executing node 1 (the root node) when it makes a call, which
makes node 2 visible.  We must map node 2, but at this point in time,
we don't know the execution time of node 2 or about the existence of
nodes 3 and 4 (or the subtrees of which they may be the roots).  What we
want to do is map 2 such that after 2 and 3 have been executed, there
is sufficient time remaining in the bound on 1 to execute 4 and the
remainder of 1.  The question then is: what portion of the bound on 1
should be allotted to 2?  A good choice for this bound embodies a
sufficiently detailed prediction of the remainder of the tree.
Furthermore, this representation of the remainder of the tree as a
2-tuple is easy to understand and work with.  The prediction algorithm
can predict what proportion of the bound on a parent node to allot to
a child node based on a history of such bounds.
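A greatly simplified sketch of this recursive division (the names
below are illustrative): predict the fraction of the parent's bound
that the child's subtree will need from a history of such fractions
observed for similar calls, and allot that share of the remaining
bound to the child.

```python
def allot_child_bound(parent_bound_remaining, fraction_history,
                      default_fraction=0.5):
    """Allot a portion of a parent node's remaining bound to a
    newly visible child node.

    fraction_history: fractions of the parent bound that comparable
    child subtrees consumed in the past; with no history we fall
    back to an illustrative default.  The mean is the simplest
    possible predictor over this history.
    """
    if fraction_history:
        fraction = sum(fraction_history) / len(fraction_history)
    else:
        fraction = default_fraction
    return fraction * parent_bound_remaining
```

With 10 seconds of the parent's bound remaining and past fractions of
0.2 and 0.4, the child would be allotted about 3 seconds.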

\begin{figure}
\epsfxsize=1in
\centerline{\epsfbox{tree_test.eps}}
\caption{Tree example.}
\label{fig:tree_example}
\end{figure}

Although our current results (Sections~\ref{sec:load_traces}
and~\ref{sec:simple_prob}) suggest that history-based prediction is an
effective approach, it is important to note that it has its limits.
We simply cannot measure any system precisely enough to make our
predictions always agree with reality.  To smooth the differences, we
exploit the structure of the activation tree.  In particular, we can
tune our approach to where we are in the tree and what the overall
form of the tree is.  Early in the traversal of the tree, we can be
more exploratory in our mapping decisions, since if the mapping of a
particular node proves disappointing, there presumably remains
plenty of time to meet the bounds on the tree.  Deep in the tree, as
the compute time of a node becomes small and leaf nodes more common,
we can switch to a different mapping strategy or prediction algorithm,
or simply remain on the current host.  For very large trees with high
bounds, we may use a different algorithm altogether.
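One sketch of such adaptation, with an illustrative exploration
schedule rather than a tuned one: explore alternate hosts with a
probability that shrinks as we descend the tree (so their histories
get populated while slack remains), and otherwise exploit the host
with the best prediction.

```python
import random


def pick_host(hosts, predicted_times, depth, rng=None):
    """Depth-adaptive mapping: explore early, exploit deep.

    The exploration probability decays linearly with depth; the
    constants 0.5 and 0.1 are illustrative, not tuned values from
    our system.  At depth >= 5 the choice is purely exploitative.
    """
    rng = rng or random.Random()
    explore_prob = max(0.0, 0.5 - 0.1 * depth)
    if rng.random() < explore_prob:
        return rng.choice(hosts)          # exploratory choice
    # exploitative choice: host with the smallest predicted time
    return min(hosts, key=lambda h: predicted_times[h])
```

Deep in the tree the exploration probability reaches zero and the
choice degenerates to picking the host with the best prediction.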

\subsubsection*{Discussion and comparison to other approaches}

The design space of a best effort real-time service is vast, but we
believe that our approach represents a very interesting and defensible
part of the space.  Our approach is based on the belief that
prediction algorithms can be developed that perform sufficiently well
given only limited, local, application-level measurements.  This
simplicity would greatly facilitate incorporation into systems such as
CORBA.  For example, our approach could be incorporated into object
references or similar proxies in other systems.  No external agent or
daemon would be necessary and the service would be available wherever
the program could be run.  However, there are two important criticisms
that can be made.  The first is information redundancy.  In our
approach, each host collects a history of execution times for each
procedure on each of the other hosts.  This becomes impractical with a
large number of procedures or hosts.  However, we expect that only a
limited number of hosts will be necessary to achieve responsiveness.
Similarly, we don't believe that the amount of history we will need to
keep for each procedure will be especially large.  In the work we
describe in Section~\ref{sec:simple_prob}, it is on par with the
amount of code in an object reference.  The second important criticism
is that chaotic behavior could result if several programs using our
algorithm were run simultaneously, since they would be oblivious to
each other.  While a particular algorithm may suffer from this
problem, it is not implicit in our approach.  Further, since such
interaction will likely be incidental instead of purposeful, damping
an algorithm's reaction or using randomization would probably be
sufficient to reduce the effects of such interaction.

While we compare our approach with dynamic load balancing systems,
distributed soft real-time systems, and quality of service systems in
Section~\ref{sec:related_work}, we will next examine two obvious
approaches that are probably on the reader's mind at this point.  The
first approach, a resource reservation system, is entirely opposite in
character from ours.  In such a system, bounds would be translated
into demands for processor and communication resources (scheduler
slots/priority, bandwidth, etc.), which the program would then attempt
to reserve before executing a node or a whole subtree.  If all the
reservations were successful, the bounds would be guaranteed to be
met.  Clearly this determinism has much appeal, but the translation
between bounds and resource demands is likely to be quite complex and
may require considerable information (hints) from the programmer to
get right.  Further, over-reserving resources, which is probably a
natural reaction, would result in inefficient use of the resources.
Finally, to implement a resource reservation system in most computing
environments would require operating system and network modifications,
which would severely limit the sites and systems where it would be
possible. 

A middle-ground approach is one based on a shared measurement system.
In such a system, daemons running on each host would regularly
exchange measurements of low level quantities such as load and network
conditions, reaching a consensus about how the system is behaving as a
whole.  When mapping a node or a subtree, the local daemon would be
consulted and a mapping decision made on the basis of the consensus.
For example, the node could be mapped to the host with the lowest
load.  Such a shared system can be made more scalable than our
application-centric approach, but the low-level measurements it
provides may not be as useful as the application-level measurements
our approach uses.  Further, the measurement granularity is determined
by the system and may not be right for the application.  In contrast,
our approach can concentrate its measurements where they are most
appropriate.  Another point is that communication resources which
could otherwise be put to use by the application will instead be used
to maintain the consensus.   Finally, to keep the amount of communication
manageable, measurements will take longer to propagate to all hosts as
the number of hosts grows.

It is important to note that mixed approaches are possible.  For
example, a shared measurement system could provide input to our system
for hosts that we have not run on yet.  Alternatively, one could build a
system that shared application-level information.  Finally, a partial
reservation scheme (supporting, say, reservations of network
bandwidth) could also be used in conjunction with our system.



%We believe that the prediction of application and the prediction of
%system behavior should be interdependent.  One reason is that system
%behavior affects how important application behavior is.  For example,
%under conditions of low load, the proper decomposition of bounds is
%less important.  Another reason is 

% For example,
%one could force the selection of a bound for node 2 in
%Figure~\ref{fig:tree_example} to be a prediction only of the nominal
%compute time (under conditions of no load) of the remainder of the
%tree, but the tree is executing on a real system and the load
%conditions when node 3 is executed will likely be different from when
%node 4 is executed.


%For this dynamic mapping problem, one could imagine predicting and
%combining a number different quantities to predict whether executing
%the node on a particular host will meet the bounds.  Essentially,
%there are several ``stages'' at which prediction can be done, as is
%illustrated in Figure~\ref{fig:pred_levels}.  For example, the mapping
%algorithm could use a history of execution times on each host to
%predict future execution times, and then test those against the
%deadlines.  This scheme is marked as Stage 3 in the figure.  Another
%possibility would be to use past load measurements on each host to
%predict future load, and combine this with a similar scheme for
%predicting future network bandwidth from past measurements of network
%bandwidth.  This is prediction at Stage 1.  These quantities form
%diffrent prediction ``stages,'' a concept which is illustratated in
%Figure~\ref{fig:pred_levels}.  There is good reason to believe that
%history-based prediction is a good approach to solving the dynamic
%mapping problem given our results at two of these stages, as we
%present in sections~\ref{sec:load_traces} and~\ref{sec:simple_prob}.





%When we map the node, we need to answer two questions:
%what bounds should we assign to the node, and which host is most
%likely to meet those bounds?


%In this section, we begin by defining what we mean by a history-based
%predictor for a time series and then we describe the different stages
%at which prediction can be done to solve our dynamic mapping problem.

%\subsection{Definitions}


%\subsection{Prediction stages}


%\begin{figure}
%\epsfxsize=4in
%\centerline{\epsfbox{stages_test.eps}}
%\caption{Prediction Stages}
%\label{fig:pred_levels}
%\end{figure}

%Suppose we are trying to determine whether mapping a node of a tree
%onto a particular host will allow that node to meet its bounds.  Let's
%use the model of node execution time given in
%Section~\ref{sec:node_exec_model} and let us assume that the node is a
%leaf node so $t_{compute}=t_{computelocal}$.  Further, let us assume
%$t_{map}$ and $t_{update}$ are constant and ignore them.  With these
%simplifications, whether mapping will meet its deadline or not depends
%on the execution time, the sum of local computation time and the
%communication time to move data.  The local computation time depends
%on how the load on the host will fluctuate and the nominal amount of
%computation that needs to be done.  The latter is a function of the
%activation tree structure.  Similarly, the communication time depends
%on the future fluctuations of network properties such as bandwidth and
%latency as well as on the activation tree structure, which determines
%the amount of data that needs to be communicated.
%Figure~\ref{fig:pred_levels} illustrates these dependencies in the
%form of a graph.


%The solid edges of the graph in Figure~\ref{fig:pred_levels} represent
%values which we might predict while the nodes represent models that
%combine values to generate a new value.  For example, by using the
%appropriate model $f$, we can combine a host's load with the nominal
%compute time of a node to determine the actual compute time of the
%node on that host. $g$ is a similar model for computing communication
%time from network properties and tree properties. 

%It is important to note that there are four stages at which one might
%do prediction.  The first stage is that of the host, network, and
%program properties.  Good predictors for host load, network bandwidth
%and latency, and activation tree structure would provide us with what
%we need to know to determine whether this host is a reasonable place
%to map the node given the models $f$ and $g$.  At the second stage, we
%could predict the compute and communicate times directly.  Continuing
%the simplification, we could predict execution times directly in the
%third stage, or, in the final stage, whether or not deadlines are met.

%\subsection{Activation tree benefit}








