\section{Preliminary results: algorithms and evaluation for simplified problem}
\label{sec:simple_prob}

We developed history-based prediction algorithms to attack a
simplified version of the dynamic mapping problem.  In the simplified
problem, we map only leaf nodes and ignore communication costs.  Our
measure of algorithm performance is the percentage of a large number
of calls that execute within their bounds.  The most significant
result is that one of the algorithms we developed is both tractable
and near-optimal under most conditions.  This is strong evidence that
the full problem can also be addressed using history-based prediction.
In this section, we present extensive simulation results showing the
performance of this and other algorithms in different instances of the
simplified problem.

There are several other lessons to draw from our results: (1) Dynamic
mapping can clearly make a significant difference in the percentage of
mappings that meet their bounds;  (2) Simple statistical algorithms,
even with randomization, are insufficient and behave inconsistently;  (3)
Complex algorithms such as neural networks work reasonably well but
are clearly too expensive to use in practice.  Our near-optimal
algorithm is simple enough to be practical, yet sophisticated enough
to be effective.

Our approach to developing and evaluating algorithms was to build a
trace-driven simulator that uses the load traces described in
Section~\ref{sec:load_traces} to synthesize highly realistic execution
times.  Each algorithm predicts which host will most likely meet the
required time bounds based on a past history of execution times.
After the call is executed in simulation, the algorithm is informed of
the actual execution time.  The simulator also tries all possible
mappings in order to determine how well an optimal mapping strategy
can do.  In addition, it computes the performance of a random mapping
approach and of always mapping to the same individual host.

In the following, we first describe the simplified mapping problem in
more detail. Then we describe the simulator and how execution time is
determined from a load trace. This is followed by a description of the
algorithms we implemented and tested.  Finally, we present results for
how the performance of the different algorithms depends on nominal
execution time, relaxation of the upper time bound, and varying
nominal execution times.  We summarize the results for six different
simulated computing environments that have different combinations of
mean load and mean epoch as described in
Section~\ref{sec:load_traces}.  We conclude by interpreting the
results in light of the load trace analysis of the previous section.


\subsection{Simplified mapping problem}
\label{sec:simp_prob_desc}

We simplify the dynamic mapping problem posed in
Section~\ref{sec:dyn_map_prob} by mapping only leaf nodes and ignoring
communication costs and the costs of running the prediction
algorithm.  Referring to the activation tree execution model of
Section~\ref{sec:act_tree_model}, we have
$t_{map},t_{incomm},t_{outcomm},t_{update}=0$ and
$t_{compute}=t_{computelocal}$.   The thesis work will relax these
simplifications and address the full dynamic mapping problem.

\subsection{Load trace-based simulation}
\label{sec:simulator}

To develop and evaluate our algorithms, we built a simulator that
allows us to simulate the execution of the loop
\begin{center}
\verb.for i=1 to N do.
\MAP \verb.leaf_procedure(). \IN $[t_{min},t_{max}]$
\verb.endfor.
\end{center}
where the nominal execution time, $t_{nominal}$, of
\verb.leaf_procedure. is either constant or chosen from an
enumerated distribution.  The symbols we use in this section are
summarized in Figure~\ref{fig:symbols}.  The nominal execution time is
the time the procedure would take to execute under zero-load
conditions on the slowest host.  Since we are ignoring communication
and mapping overheads, the procedure begins executing at $t_{now}$,
the point in time the \MAP call was executed.  For the duration of the
loop, the mapping algorithm is the same, so time advances according to
the mapping decisions made by the algorithm under test, just as it
would in the real world.

Given the load trace of the host the mapping algorithm has chosen,
$t_{now}$, and $t_{nominal}$, we use a simple but very accurate model
to determine the actual compute time, $t_{compute}$, of the node on
that host at that time.  First, consider a functional form of the load
trace $\langle l_i \rangle$ so that $load(t) = l_i$ for $i \Delta \leq
t < (i+1) \Delta$, where
$\Delta$ is the sampling interval of the load trace.  The actual
execution time, $t_{compute}$ is the value which satisfies
\begin{displaymath}
\int_{t_{now}}^{t_{now}+t_{compute}} \frac{\beta}{1+load(t)} dt = t_{nominal}
\end{displaymath}
where $\beta$ is the raw speedup of the host over the slowest host.
We have found that the compute time estimates given by this model
correlate strongly ($>0.99$ coefficient of correlation) with the
actual computation time.  
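This model can be made concrete with a short sketch.  The following
Python fragment is our own illustration; the function name and
structure are assumptions, not taken from the simulator.  Because
$load(t)$ is piecewise constant, the integral accumulates linearly
within each sampling interval, so the equation can be solved by
stepping through the trace rather than by numerical root-finding:

```python
def compute_time(trace, delta, beta, t_now, t_nominal):
    # Solve  integral from t_now to t_now + t_compute of
    # beta / (1 + load(t)) dt = t_nominal  for t_compute, where load(t)
    # is piecewise constant over intervals of width delta.
    t = t_now
    remaining = t_nominal                # nominal work still to be done
    while remaining > 0:
        i = int(t // delta)              # index of the current load sample
        seg_end = (i + 1) * delta        # wall-clock end of this interval
        rate = beta / (1.0 + trace[i % len(trace)])  # work per wall-second
        seg_work = (seg_end - t) * rate  # work achievable in the rest of it
        if seg_work >= remaining:
            t += remaining / rate        # call finishes inside this interval
            remaining = 0.0
        else:
            remaining -= seg_work        # consume the interval and move on
            t = seg_end
    return t - t_now                     # actual wall-clock compute time
```

With a zero-load trace and $\beta=1$ the compute time equals the
nominal time, and with a constant load of one it doubles, as the model
requires.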

For each simulated \MAP, the simulator requests a host choice from the
mapping algorithm based on the timing constraints, executes the call,
and then informs the mapping algorithm of the actual execution time
given its mapping.
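The per-call protocol can be sketched as follows.  Here
\verb.exec_time. stands in for the load-trace model just described,
and \verb.AlwaysFirst. is a hypothetical stand-in for the \Constant\
strategy; all names in this sketch are our own:

```python
def run_simulation(algorithm, exec_time, t_min, t_max, n_calls):
    # The simulator's per-call protocol: ask the mapper for a host,
    # "execute" the call, and feed the actual time back to the mapper.
    t_now, met = 0.0, 0
    for _ in range(n_calls):
        host = algorithm.choose()
        t = exec_time(host, t_now)       # load-trace model supplies the time
        algorithm.report(host, t)
        if t_min <= t <= t_max:
            met += 1                     # call met its bounds
        t_now += t                       # time advances per the tested mapper
    return 100.0 * met / n_calls         # performance: % of calls in bounds

class AlwaysFirst:
    # hypothetical stand-in for the Constant strategy: always host 0
    def choose(self):
        return 0
    def report(self, host, t_actual):
        pass

# toy model: host 0 always takes 80 ms, host 1 always 120 ms
pct = run_simulation(AlwaysFirst(), lambda h, t: 0.08 if h == 0 else 0.12,
                     0.0, 0.1, 1000)
```

Because simulated time advances by the actual execution time of each
mapped call, the algorithm under test shapes the load each host sees,
just as it would in the real world.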

\subsubsection*{Optimal mappings}

Using the model for computing execution times from load traces, the
simulator not only executes the procedure on the host the algorithm
chooses, but on every host.  This lets the simulator compute what an
optimal mapping algorithm would do if invoked at this point: it picks
a host whose execution time falls within the bounds, if one is
available.  It is important to note that simulated time evolves
according to the algorithm under test, thus the measured performance of
the optimal algorithm depends slightly on the algorithm under test.
In practice, this varies only marginally across the algorithms we
test.  We present the performance of the optimal algorithm with
respect to our near-optimal algorithm, \RangeCounter.  In this
incarnation, the optimal algorithm is referred to as \OptimalRC.




\subsection{Prediction algorithms}
\label{sec:pred_algs}

\begin{figure}
\small
\centerline{
\begin{tabular}{|l|c|c|c|c|c|}
\hline
	        & \multicolumn{2}{c|}{Complexity}& \multicolumn{3}{c|}{Timings ($\mu$sec)} \\
\hline
Name            & Time & Space & (4,50,0.1) & (8,50,0.1) & (16,50,0.1) \\
\hline
\hline
\OptimalRC        & - & - & - & - & - \\
\hline
\Constant\	& $O(1)$ & $O(1)$ & 0.244 & 0.244 & 0.244 \\
\Random\	        & $O(1)$ & $O(1)$ & 0.498 & 0.537 & 0.546 \\
\Mean\	        & $O(N)$ & $O(N)$ & 0.957 & 1.532 & 2.710  \\
\WinMean\	& $O(N)$ & $O(NW)$ & 3.318 & 4.460 & 7.005 \\
\WinVar\		& $O(NW)$ & $O(NW)$ & 7.039 & 9.580 & 14.73 \\
\RandWinMean\ & $O(NW)$ & $O(NW)$ & 7.223 & 13.166 & 24.534 \\
\RandWinVar\	& $O(NW)$ & $O(NW)$ & 17.408 & 33.58 & 65.866 \\
\Confidence\	& $O(NW)$ & $O(NW)$ & 13.674 & 21.360 & 36.775 \\
\RandConfidence\ & $O(NW)$ & $O(NW)$ & 26.951 & 52.869& 102.423 \\
\RangeCounter\	& $O(N) + O(W)$  & $O(NW)$   & 8.500 & 9.551 & 11.960 \\
\NeuralNet\   & $O(N^2W^2)$ & $O(N^2W^2)$ & 41807.630 & 175543.700 & 773379.700 \\
\hline
\end{tabular}
}
\normalsize
\caption{Algorithm complexity and timings.  The numbers $(N,W,P)$
refer to the number of hosts, the window size, and the randomization
probability.  Timings were collected on a DEC 3000/400.}
\label{fig:alg_timings}
\end{figure}

We designed and implemented nine different history-based prediction
algorithms for the simplified mapping problem.  Most of the
history-based algorithms are the simple statistical algorithms that
one is led to when initially thinking about the problem.  A common
failing of these is that they fall into modes where they refuse to
``explore'' the space of possible mappings and so can get ``stuck''
giving unfortunate mappings repeatedly.  We added randomization to
these initial algorithms to attempt to avoid this.  The two remaining
algorithms include a neural network algorithm and the \RangeCounter\
algorithm, which is nearly optimal most of the time.  This section
describes each of these algorithms.  Figure~\ref{fig:alg_timings}
shows the asymptotic complexity and actual running times of all the
algorithms.  It also includes three other algorithms we use for
comparison purposes: \OptimalRC, which is described in
Section~\ref{sec:simulator}; \Random, which always picks a host at
random; and \Constant, which always picks the same host.

The simplest algorithms include \Mean, \WinMean, and
\WinVar.  For each host \Mean\ computes the mean
execution time over the entire history of executions on that host and
chooses the host with the lowest mean.  \WinMean\ is similar,
except the mean is computed over the last $W$ executions on the host.
\WinVar\ chooses the host with the lowest variance in
execution time over the last $W$ executions.  \RandWinMean\
and \RandWinVar\ select a random host with probability $P$
and act like their non-randomized counterparts with probability $1-P$.
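A minimal sketch of \WinMean\ with the optional exploration of
\RandWinMean\ might look like the following.  The class and method
names are our own, and the rule of trying hosts with no history first
is an assumption:

```python
import random
from collections import deque

class RandWinMean:
    """Windowed-mean predictor with probability-P random exploration.
    With P=0 it reduces to the plain WinMean rule."""
    def __init__(self, n_hosts, W=50, P=0.1):
        # one bounded history window per host
        self.history = [deque(maxlen=W) for _ in range(n_hosts)]
        self.P = P
    def choose(self):
        if random.random() < self.P:
            return random.randrange(len(self.history))   # explore
        # otherwise pick the host with the lowest windowed mean;
        # hosts with no samples yet are tried first (an assumption)
        means = [sum(h) / len(h) if h else float('-inf')
                 for h in self.history]
        return means.index(min(means))
    def report(self, host, t_actual):
        self.history[host].append(t_actual)  # fed back by the simulator
```

\Mean\ is the same rule without the \verb.maxlen. bound, and
\WinVar\ substitutes the windowed variance for the windowed mean.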

\Confidence\ assumes that execution times are picked from a
normal distribution.\footnote{A variety of other distributions were
also examined.}  It computes the sample mean and variance over the
last $W$ executions on the host and parameterizes a normal
distribution with these values.  The host whose normal has the highest
cumulative probability over the $[t_{min},t_{max}]$ range is chosen.
The \RandConfidence\ algorithm chooses a random host with probability
$P$ and performs like \Confidence\ with probability $1-P$.

The \RangeCounter\ algorithm can be thought of as a
non-linearly filtered version of the \Confidence\ algorithm.
Each host is assigned a quality value, initially zero.  To choose a
host, the following steps are taken: If no host has a quality value
significantly higher than zero, a random host is chosen, otherwise the
host with the highest quality value is chosen.  Next, the quality
values of all hosts are reduced (aged) by a small amount.  If the
mapping is successful, then the chosen host's quality value is
increased by the inverse of our statistical confidence in it.  As with
the \Confidence\ algorithm, we compute the sample mean and
variance of the past $W$ executions, parameterize a normal
distribution with them, and compute the cumulative probability of that
distribution over $[t_{min},t_{max}]$.  We increase the host's quality
value by the {\em inverse} of the cumulative probability.  If the
mapping is unsuccessful, we divide the chosen host's quality value by
two.  \RangeCounter\ rewards good choices in proportion to how
unlikely they appeared to bear fruit and punishes all bad choices
equally.  The idea is to encourage exploration and risk taking, but to
react quickly and decisively to failure.  The aging process
discourages complacency.
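The steps above can be sketched as follows.  The aging amount and the
threshold for ``significantly higher than zero'' are not specified in
the text, so the values here, like all the names, are assumptions:

```python
import math
import random
from collections import deque

class RangeCounter:
    def __init__(self, n_hosts, W=50, age=0.01, threshold=1e-3):
        self.quality = [0.0] * n_hosts   # per-host quality, initially zero
        self.history = [deque(maxlen=W) for _ in range(n_hosts)]
        self.age = age                   # aging amount (assumed value)
        self.threshold = threshold       # "significantly above zero" (assumed)
    def choose(self):
        if max(self.quality) <= self.threshold:
            host = random.randrange(len(self.quality))    # explore
        else:
            host = self.quality.index(max(self.quality))  # exploit
        # age every host's quality by a small amount
        self.quality = [max(0.0, q - self.age) for q in self.quality]
        return host
    def _prob_in_bounds(self, host, t_min, t_max):
        # Confidence-style fit: normal over the last W execution times
        h = self.history[host]
        if len(h) < 2:
            return 0.5                   # neutral prior (assumption)
        mu = sum(h) / len(h)
        var = sum((x - mu) ** 2 for x in h) / (len(h) - 1)
        sigma = math.sqrt(var) or 1e-9
        cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
        return max(cdf(t_max) - cdf(t_min), 1e-6)
    def report(self, host, t_actual, t_min, t_max):
        self.history[host].append(t_actual)
        if t_min <= t_actual <= t_max:
            # reward: the less likely success looked, the bigger the boost
            self.quality[host] += 1.0 / self._prob_in_bounds(host, t_min, t_max)
        else:
            self.quality[host] /= 2.0    # punish failure sharply
```

Note the asymmetry: rewards can be arbitrarily large when success
looked improbable, while every failure halves the accumulated quality.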


The \NeuralNet\ algorithm consists of a multilayer
feed-forward neural network~\cite{MACHINE-LEARNING-MITCHELL} where
each host contributes $W$ inputs, the durations of the last $W$
executions on that host.  For $N$ hosts, the intermediate layer has
$NW/2$ nodes and the output layer has $N$ nodes.  The network is
trained to drive the output representing the best host to the highest
output value.  To choose a host, the inputs are propagated forward
through the network and the host with the highest corresponding output
value is chosen.  If the choice is successful, we reinforce the chosen
output to unity and the remaining outputs to their present values and
propagate these outputs back through the network.  Similarly, if the
choice is unsuccessful, we reinforce the chosen output to zero and the
remaining outputs to their present values and propagate back.
Essentially, each choice/response pair is an iteration of the BACKPROP
neural network training algorithm.  We continuously train the network
in operation.
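A compact, pure-Python sketch of this scheme with one hidden sigmoid
layer follows.  The learning rate, weight initialization, and
zero-filled initial history windows are our own assumptions:

```python
import math
import random

class NeuralNetMapper:
    def __init__(self, n_hosts, W=10, lr=0.5):
        self.N, self.W, self.lr = n_hosts, W, lr
        n_in, n_hid = n_hosts * W, max(1, n_hosts * W // 2)
        rnd = random.Random(0)           # fixed seed for reproducibility
        self.w1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hid)]
        self.w2 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_hid)]
                   for _ in range(n_hosts)]
        # last W execution times per host, zero-filled to start
        self.history = [[0.0] * W for _ in range(n_hosts)]
    @staticmethod
    def _sig(x):
        return 1.0 / (1.0 + math.exp(-x))
    def _forward(self, x):
        h = [self._sig(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        o = [self._sig(sum(w * hi for w, hi in zip(row, h))) for row in self.w2]
        return h, o
    def choose(self):
        x = [t for host in self.history for t in host]
        _, o = self._forward(x)
        return o.index(max(o))           # host with the highest output wins
    def report(self, host, t_actual, success):
        # slide the new execution time into that host's input window
        self.history[host] = self.history[host][1:] + [t_actual]
        # one backpropagation step: drive the chosen output toward 1
        # (success) or 0 (failure), leaving the others at their values
        x = [t for hst in self.history for t in hst]
        h, o = self._forward(x)
        target = list(o)
        target[host] = 1.0 if success else 0.0
        d_out = [(t - oi) * oi * (1 - oi) for t, oi in zip(target, o)]
        d_hid = [hi * (1 - hi) *
                 sum(self.w2[k][j] * d_out[k] for k in range(len(d_out)))
                 for j, hi in enumerate(h)]
        for k in range(len(self.w2)):
            for j in range(len(h)):
                self.w2[k][j] += self.lr * d_out[k] * h[j]
        for j in range(len(self.w1)):
            for i in range(len(x)):
                self.w1[j][i] += self.lr * d_hid[j] * x[i]
```

Even in this toy form, each call costs a full forward and backward
pass over $O(N^2W^2)$ weights, which is why \NeuralNet\ is orders of
magnitude slower than the other algorithms in
Figure~\ref{fig:alg_timings}.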

\subsection{Simulated environments}

%\begin{figure}
%\centerline{
%\epsfxsize=4in
%\epsfbox{machine_groups.epsf}
%}
%\caption{Test configurations for PSC hosts}
%\label{fig:host_groups}
%\end{figure}

\begin{figure}
\centerline{
\begin{tabular}{c}
\epsfxsize=6in
\epsfbox{machine_groups.epsf}
\\
\begin{tabular}[b]{|c|c|c|c|}
\hline
 & \multicolumn{3}{c|}{Load Category} \\
\hline
Epoch Category & Small & Mixed & Large \\
\hline
Short          &  5SS  &  9MS  & 4LS \\
Mixed          &  9SM  &  8MM  &    \\  
Long           &  4SL  &       &    \\
\hline
\end{tabular}
\end{tabular}
}
\caption{Test configurations for PSC hosts}
\label{fig:host_groups}
\end{figure}


We constructed six different configurations of hosts from the load
traces discussed in Section~\ref{sec:load_traces}.  The configurations
consist of traces of hosts with different mean loads and different
mean epoch lengths.  In Figure~\ref{fig:host_groups} we plot each load
trace according to its mean load and its mean epoch length.  The six
configurations we discuss here are then highlighted, labeled, and
tabulated.  We categorize the load of a configuration by whether it
contains only machines with low ($\approx 0.2$) mean loads, only
machines with high mean loads ($\approx 1.1$), or a mixed combination
of machines in the former categories.  Similarly, we categorize the
epoch length of a configuration by whether all of its machines have
short (150--300 seconds) epochs, long (400--500 seconds) epochs, or a
combination of the former.  The labels consist of a number and two
letters.  The number represents the number of hosts in the
configuration, the first letter represents whether the load category
is (S)mall, (M)ixed, or (L)arge, and the second letter indicates
whether the epoch category is (S)hort, (M)ixed, or (L)ong.  Notice
that there are nine combinations, but our data limits us to exploring
only six of the combinations.

The configurations presented here contain only the PSC hosts.  We have
also done limited simulations using other configurations involving the
remaining machines, but there is no space to present them here.  The
advantage of using the PSC hosts is that they nicely cover most of the
space formed by the load and epoch categories. 

For each of the six configurations, we looked at eight different
values of the nominal execution time $t_{nominal}$, and twelve
different values of the upper timing bound $t_{max}$, ranging for each
case from $t_{nominal}$ to three times $t_{nominal}$.  $t_{min}$ was
set at zero.  This forms 576 different cases and we simulated 100,000
consecutive calls for each case, for a total of 57.6 million simulated
calls.  In addition, for each configuration, we simulated what happens
when $t_{nominal}$ is picked randomly from a set of four values
(25,50,75,100 ms) on each iteration, and $t_{max}$ is varied as
before.  This forms another 72 cases and 7.2 million calls.  


\subsection{Algorithm performance}
\label{sec:alg_perf}

In this section, we present three slices through our simulation
results to make presentation tractable.  The first two slices show how
the algorithms' performance varies with nominal execution time and the
upper time bound.  The third slice shows how the performance varies
with the upper time bound when the nominal execution time is chosen
randomly from four different values.  The performance of an algorithm
is the percentage of calls that meet the time bounds. 

Our discussion covers all of the simulations.  However, in order to
simplify our presentation, we show graphs for only two representative
configurations, 8MM and 4LS.  As can be seen in
Figure~\ref{fig:host_groups}, 8MM contains eight machines with mixed
load and epoch lengths, while 4LS contains four machines with high
loads and short epoch lengths.  To show the effect of nominal
execution time, only 8MM is shown since 4LS does not provide any
additional insight.  The other configurations show similar results.

For the algorithms that have parameters, namely $W$, the window of
previous execution times recorded for a host, and $P$, the probability
of a random mapping (for the randomized algorithms), we have fixed the
parameters to $W=50$ (10 for \NeuralNet) and $P=0.1$.  The performance
of the algorithms given these values is representative of how they
perform with the other, non-extreme, values we studied.  The relative
performance of the algorithms is not very sensitive to the choice of
these values.

\subsubsection*{Effect of nominal execution time}

\begin{figure}
\epsfxsize=5in
\centerline{\epsfbox{effect_of_nom_time_8MM_legend.epsf} }
\caption{The effect of varying nominal execution time for the 8MM case. The bounds are $[0,t_{nominal}]$. }
\label{fig:effect_of_nom_time}
\end{figure}

Figure~\ref{fig:effect_of_nom_time} shows how the different algorithms
perform when mapping nodes with different nominal execution times in a
representative configuration, 8MM.  The other configurations yield
similar results.  The x-axis is the nominal time, $t_{nominal}$,
ranging from 50 milliseconds to 60 seconds, and the y-axis is the
percentage of calls that meet the bounds.  The bounds are set to
$[0,t_{nominal}]$.

Notice that there is a substantial gap between the performance of the
optimal mapping algorithm (\OptimalRC) and the performance of random
mappings (\Random).  Although we have elided the results for clarity,
it is also never the case that always choosing the same host works
well.  We find similar results for the other five configurations, with
differences ranging from 40 to 80 percent.  This suggests that a good
mapping algorithm can make a large difference.

Interestingly, the simple statistics-based mapping algorithms and
their randomized variants perform rather badly for the most part, some
worse than even a random mapping.  Further, they tend to behave
inconsistently.  For example, \WinVar\ is about 35\% better than
\Random\ at 50 milliseconds, then abruptly declines with increasing
$t_{nominal}$ only to improve later and finally become completely
ineffectual at 60 seconds.  In another configuration, we find it near
optimal for some nominal times and yet collapsing to worse than
\Random\ for nearby times.  On the other hand, \NeuralNet\ and \RangeCounter\ behave
consistently well and near-optimally for short nominal times less than
one second.  In fact, \RangeCounter\ is the best algorithm for all
configurations and nominal times except for the 60 second time in the
4LS configuration.  

\subsubsection*{Effect of bounds}

\begin{figure}
\epsfxsize=5in
\centerline{\epsfbox{effect_of_constraint_100ms_8MM_legend.epsf} }
\caption{The effect of different timing constraints for the 8MM case. 
$t_{nominal}=100ms$ and the constraint is $[0,t_{max}]$ where
$t_{max}$ varies from 100ms to 300ms.}
\label{fig:effect_of_constraint_100ms_8MM}
\end{figure}

\begin{figure}
\epsfxsize=5in
\centerline{\epsfbox{effect_of_constraint_100ms_4LS_legend.epsf} }
\caption{The effect of different timing constraints for the 4LS case. 
$t_{nominal}=100ms$ and the constraint is $[0,t_{max}]$ where
$t_{max}$ varies from 100ms to 300ms.}
\label{fig:effect_of_constraint_100ms_4LS}
\end{figure}


The effect of loosening the upper bounds on the performance of the
algorithms is quite interesting.  In
Figure~\ref{fig:effect_of_constraint_100ms_8MM} we show the effect in
the 8MM configuration and in
Figure~\ref{fig:effect_of_constraint_100ms_4LS} we show the effect in
the 4LS configuration. The x-axis in each graph is the upper time
bound and the y-axis is the percentage of calls that meet the bound.
The nominal execution time is fixed at 100 milliseconds and the upper
bound varies from 100 milliseconds to 300 milliseconds.  The remaining
four configurations show effects similar to the 8MM configuration
(Figure~\ref{fig:effect_of_constraint_100ms_8MM}).  If we scale the
nominal time and bounds, we see similar graphs in each of the
configurations.  For example, the graph for the 8MM configuration with
a 10 second nominal time whose x-axis ranges from 10 seconds to 30
seconds looks similar to
Figure~\ref{fig:effect_of_constraint_100ms_8MM}.  The choice of a 100
ms time for this set of graphs was no accident: 100 ms marks the
transition between imperceptible and perceptible delay.  Placing a
tight bound on a 100 ms task is therefore important, because even
small extensions of the delay have large consequences.

In Figure~\ref{fig:effect_of_constraint_100ms_8MM}, notice that there
is great differentiation between the optimal algorithm and a random
mapping as the upper bound gets tighter.  The same holds true in
Figure~\ref{fig:effect_of_constraint_100ms_4LS} for bounds greater
than 200 ms.  For this configuration, there are so few machines that
can meet the tighter bounds that the gap between \OptimalRC\ and
\Random\ declines severely with bounds tighter than 200 ms.  

In both figures, the simpler algorithms require a reasonably large
amount of additional slack in the upper bound before their performance
catches up to \RangeCounter\ and \NeuralNet.  This is also the case
for the other configurations, although the difference tends to be
slightly less for configurations with only unloaded machines.  In
Figure~\ref{fig:effect_of_constraint_100ms_8MM}, the simpler
algorithms require between 25 and 100 milliseconds of extra slack to
get to within an absolute 10\% of the performance of \RangeCounter\
and
\NeuralNet.  This is a significant amount of time.  For
Figure~\ref{fig:effect_of_constraint_100ms_4LS}, we see a great
upsurge in performance as the bound is relaxed to about double the
nominal execution time.  Before this point, there are very few
opportunities in which it is possible to meet the bound.  However,
\RangeCounter\ manages to ferret out these opportunities and closely
approximates the optimal algorithm's performance across the
board.  It behaves well even when choices are very constrained.


\subsubsection*{Effect of varying nominal execution time}

\begin{figure}
\epsfxsize=5in
\centerline{\epsfbox{effect_of_varying_tnominal_8MM_legend.epsf}}
\caption{The effect of relaxing the upper bound for the 8MM case when
the nominal execution time is picked randomly from 25, 50, 75, and 100
ms on each call.  The constraint is $[0,t_{max}]$ where $t_{max}$
varies from 25ms to 300ms.}
\label{fig:effect_of_varying_constraint_100ms_8MM}
\end{figure}

\begin{figure}
\epsfxsize=5in
\centerline{\epsfbox{effect_of_varying_tnominal_4LS_legend.epsf}}
\caption{The effect of relaxing the upper bound for the 4LS case when
the nominal execution time is picked randomly from 25, 50, 75, and 100
ms on each call.  The constraint is $[0,t_{max}]$ where $t_{max}$
varies from 25ms to 300ms.}
\label{fig:effect_of_varying_constraint_100ms_4LS}
\end{figure}



Figures~\ref{fig:effect_of_varying_constraint_100ms_8MM} (8MM
configuration) and~\ref{fig:effect_of_varying_constraint_100ms_4LS}
(4LS configuration) show the effect of loosening the upper bound when
the nominal execution time is picked randomly from a set of values.
In each graph, the nominal time is randomly picked from 25, 50, 75,
and 100 milliseconds for each call.  Intuitively, this can be seen as
following different control paths through the procedure.  The x-axis
is the upper time bound and the y-axis is the performance.  We sweep
the upper bound from 25 ms to 300 ms.  The linear upward trend on the
first part of each graph is due to the varying nominal time.  At 50
milliseconds, only about 50\% of the calls have a nominal time of 50
milliseconds or less, so even the optimal algorithm can only manage to
map 50\% of the calls successfully, even given totally idle machines.

Notice the separation between the optimal algorithm and random
mapping, suggesting opportunity for a mapping algorithm.  The
separation is smaller than in the previous cases, since at any
particular time bound except for 25 ms, there are some number of calls
that have significant slack.  For example, at 50 ms, 25\% of the calls
have a nominal time of 25 ms and thus have a 2x slack.  This also
benefits the simple algorithms and decreases the spread in
performance.  However, the spread can still be significant.  For
example, in the 8MM configuration
(Figure~\ref{fig:effect_of_varying_constraint_100ms_8MM}), the spread
is as high as an absolute 40\% of the calls.

An important point here is that \RangeCounter, although it
does well, behaves somewhat erratically in the face of the varying
nominal times and is not clearly superior to the other algorithms
until the bound has been relaxed to encompass the longest of the
different nominal times.  That is, when some varying nominal times
exceed the bound, \RangeCounter\ can behave badly.  In some
cases, it is as much as 20\% down from the optimal algorithm and 
\NeuralNet.   This presents an opportunity for improvement.  Indeed,
this behavior will become more important when we begin to map
activation trees, since we cannot expect all the nodes to require the
same nominal time to execute.  Far from being discouraging, this is
actually an opportunity for the thesis research.

\subsubsection*{Conclusions}

There are several conclusions to be drawn from our results.  The first
is that a mapping algorithm can make a large difference.  There is a
large spread between the performance of an optimal mapping algorithm
and mapping randomly or always mapping to the same host.  The
difference can be as high as 80\%. The second conclusion is that
simple statistics-based algorithms are insufficient.  In some cases,
simple algorithms such as
\Mean, \WinMean, \WinVar, and \Confidence\ can actually perform worse than
a random mapping.  Worse, they can behave erratically in the face of
different nominal execution times.  Adding randomization to these
algorithms smoothes them and improves their overall performance,
but it still does not make them competitive with the best
algorithms, \NeuralNet\ and \RangeCounter. 

The load trace results we presented in Section~\ref{sec:load_traces}
can shed some light on why the simple statistical algorithms perform
so badly.  Recall that the load traces exhibit self-similarity
(Section~\ref{sec:self-sim}) and that their phase space behavior is
relatively simple and ordered (Section~\ref{sec:phase_space}).
Clearly, consecutive load values are {\em not} chosen independently
according to some stationary probability distribution function.
However, this is precisely the underlying assumption made by the
simple statistical algorithms!  Further, even if, in the long term,
some pdf does emerge, it is highly unlikely that this pdf is
applicable over the short term, given the epochal behavior discussed
in Section~\ref{sec:epochal_behavior}.  Of course, we might expect
that with a sufficiently long nominal time, the simple statistical
algorithms might work better, but even there, there is some doubt.
Unfortunately, the histograms of the load traces (which we elide for
space reasons) do not appear to fit any analytic distribution we
are aware of.

\NeuralNet\ and \RangeCounter\ appear to be better able to deal with
the properties exhibited by the load traces because they behave
non-linearly.  This is probably a great benefit when transitioning
from one epoch to the next because it suggests that the internal state
of the algorithms will change drastically at these points, which is,
of course, what we desire.  Another interesting point is that as the
nominal execution time grows, there are fewer and fewer training
examples per epoch.  This offers a possible explanation for why
\NeuralNet, which behaves near optimally for short nominal times,
quickly declines as the nominal times grow.  Often, as neural networks
are trained, they tend to converge only slowly to a good set of
weights.  After some large number of training samples, there is an abrupt
convergence, beyond which any additional training tends to make the
neural network slowly worse.  We suspect that what we are seeing is
that with a long enough nominal time there is simply an insufficient
number of training samples in an epoch for \NeuralNet\ to make that
abrupt convergence.

The most important conclusion is that \RangeCounter\ exhibits the best
performance in nearly every case, and behaves relatively smoothly.
For nominal times less than one second, it behaves almost optimally,
while its performance degrades gradually for longer nominal times.
Because it is near optimal even with the tightest bound, it is no
surprise that it remains so as we relax the bound.  Further, while
\NeuralNet\ is also near optimal under conditions of tight
bounds and short nominal times, none of the other tractable algorithms are
especially close, and some require considerable loosening of the upper
bound before they become anywhere near optimal. The failings of
\RangeCounter\ are that it behaves suboptimally in the presence of
varying nominal times and very long nominal times.  In some sense,
\RangeCounter\ tends to become specialized too quickly, which is the
reverse of \NeuralNet's problem.  \NeuralNet\ seems to remain
sufficiently general to do well with varying nominal times.  These
imperfections present opportunity for future research.


%\landscape
%\begin{figure}
%\begin{tabular}{ccc}
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_5SS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_9MS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_4LS.epsf}
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_9SM.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_8MM.epsf}
%&
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_nom_time_4SL.epsf}
%&
%&
%\tiny
%\begin{tabular}[b]{c}
%\epsfxsize=2.6in
%\epsfbox{legend.epsf}
%\\
%\begin{tabular}[b]{|c|c|c|c|}
%\hline
% & \multicolumn{3}{c|}{Load Category} \\
%\hline
%Epoch Category & Small & Mixed & Large \\
%\hline
%Short          &  5SS  &  9MS  & 4LS \\
%Mixed          &  9SM  &  8MM  &    \\  
%Long           &  4SL  &       &    \\
%\hline
%\end{tabular}
%\end{tabular}
%\normalsize
%\\
%\end{tabular}
%\caption{The effect of varying nominal execution time.  Except for the 4LS case, the bounds are $[0,t_{nominal}]$.  In the 4LS case, they are $[0,2t_{nominal}]$}
%\label{fig:effect_of_nom_time}
%\end{figure}
%\endlandscape




%\landscape
%\begin{figure}
%\begin{tabular}{ccc}
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_5SS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_9MS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_4LS.epsf}
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_9SM.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_8MM.epsf}
%&
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_constraint_100ms_4SL.epsf}
%&
%&
%\tiny
%\begin{tabular}[b]{c}
%\epsfxsize=2.6in
%\epsfbox{legend.epsf}
%\\
%\begin{tabular}[b]{|c|c|c|c|}
%\hline
% & \multicolumn{3}{c|}{Load Category} \\
%\hline
%Epoch Category & Small & Mixed & Large \\
%\hline
%Short          &  5SS  &  9MS  & 4LS \\
%Mixed          &  9SM  &  8MM  &    \\  
%Long           &  4SL  &       &    \\
%\hline
%\end{tabular}
%\end{tabular}
%\normalsize
%\\
%\end{tabular}
%\caption{The effect of different timing constraints.  In each case,
%$t_{nominal}=100ms$ and the constraint is $[0,t_{max}]$ where
%$t_{max}$ varies from 100ms to 300ms.}
%\label{fig:effect_of_constraint_100ms}
%\end{figure}
%\endlandscape


%\landscape
%\begin{figure}
%\begin{tabular}{ccc}
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_5SS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_9MS.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_4LS.epsf}
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_9SM.epsf}
%&
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_8MM.epsf}
%&
%\\
%\epsfxsize=2.6in
%\epsfbox{effect_of_varying_tnominal_4SL.epsf}
%&
%&
%\tiny
%\begin{tabular}[b]{c}
%\epsfxsize=2.6in
%\epsfbox{legend.epsf}
%\\
%\begin{tabular}[b]{|c|c|c|c|}
%\hline
% & \multicolumn{3}{c|}{Load Catagory} \\
%\hline
%Epoch Category & Small & Mixed & Large \\
%\hline
%Short          &  5SS  &  9MS  & 4LS \\
%Mixed          &  9SM  &  8MM  &    \\  
%Long           &  4SL  &       &    \\
%\hline
%\end{tabular}
%\end{tabular}
%\normalsize
%\\
%\end{tabular}
%\caption{The effect of different timing constraints.  In each case,
%$t_{nominal}=100ms$ and the constraint is $[0,t_{max}]$ where
%$t_{max}$ varies from 100ms to 300ms.}
%\label{fig:effect_of_varying_constraint_100ms}
%\end{figure}
%\endlandscape





