\section{Conclusions}

The first conclusion to draw is that using a selector to map leaf
calls makes a difference.  There is a large spread between the
performance of the optimal selector and the random selector or
single-node selector.  We have shown spreads as high as 80\% between
random and optimal mappings.  

The second conclusion to draw is that simple selectors can perform
significantly better than a random or single-node selector.

The third conclusion is that selectors cannot be too simple.  Obvious
selectors such as Mean, WinMean, WinVar, and Confidence are very
sensitive to the computing environment, nominal execution times, and
timing constraints.  In fact, they act erratically and can get
``stuck'' such that they consistently make the same wrong decisions.
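The ``stuck'' behavior follows from pure greed: once a host's recent
samples look bad, a greedy windowed-mean selector never revisits it,
so its stale estimate is never refreshed.  The following is a minimal
sketch of such a selector; the class name, method names, and structure
are illustrative assumptions, not the implementation evaluated in this
paper.

```python
from collections import deque

class WinMeanSelector:
    """Hypothetical sketch of a greedy windowed-mean selector: each
    call is mapped to the host with the lowest mean of its last W
    observed execution times.  Illustrative only."""

    def __init__(self, hosts, window):
        self.hosts = list(hosts)
        # One sliding window of recent execution times per host.
        self.history = {h: deque(maxlen=window) for h in self.hosts}

    def select(self):
        # Hosts with no samples yet are tried first, so every window
        # receives at least one observation.
        unsampled = [h for h in self.hosts if not self.history[h]]
        if unsampled:
            return unsampled[0]
        # Greedy choice: lowest windowed mean wins, always.
        return min(self.hosts,
                   key=lambda h: sum(self.history[h]) / len(self.history[h]))

    def record(self, host, exec_time):
        self.history[host].append(exec_time)
```

Note that after one slow sample, a host's window can only be refreshed
if the host is selected again, which the greedy rule never does while
another host looks better: the same wrong decision repeats
indefinitely.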

The fourth conclusion is that adding an element of randomness to these
selectors smooths out their performance, because it forces them to
explore to a greater extent.  However, this smoothing also applies to
their peak performance: while performance improves for some operating
regimes, it declines for others.
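The randomization described above can be illustrated with an
epsilon-greedy rule: with small probability the selector ignores its
estimates and picks a host at random, guaranteeing that stale
estimates are eventually refreshed.  The function name and the
\texttt{epsilon} parameter are assumptions for illustration, not the
paper's selectors.

```python
import random

def randomized_select(hosts, mean_time, epsilon=0.1):
    """Hypothetical epsilon-greedy wrapper: with probability epsilon,
    pick a host uniformly at random (exploration); otherwise pick the
    host with the lowest estimated mean execution time (exploitation).
    Illustrative sketch, not the paper's implementation."""
    if random.random() < epsilon:
        return random.choice(hosts)   # forced exploration
    return min(hosts, key=lambda h: mean_time[h])
```

The same mechanism explains the trade-off in the text: exploration
refreshes bad estimates (smoothing out the erratic regimes), but the
random picks occasionally land on slow hosts even when the estimates
are already correct, which is why peak performance declines.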

The fifth conclusion is that we have developed a selector,
RangeCounter(W), which exhibits the best performance in
nearly all operating regimes.  Further, RangeCounter(W) behaves
smoothly.  Finally, RangeCounter(W) has near optimal performance for
short, sub-second, nominal execution times in almost all cases.  For
longer nominal execution times, performance degrades smoothly with
time and RangeCounter(W) remains the best selector in almost all
cases.

The sixth conclusion is that there is room for improvement with longer
($>$1 sec) nominal execution times.  For one-minute nominal execution
times, even the best selectors are consistently 10 to 30 percent worse
than optimal.

The seventh conclusion is that a Neural Network approach is
competitive for short ($<$1 sec) nominal execution times and displays
the smoothness we desire.  However, its time and space complexity
effectively rule it out: it simply takes too long to run the selector.

All of the selectors we examined, except for NeuralNet(W), have low
asymptotic time and space complexity and very low (sub-millisecond)
running times.  This means that they can be used to map relatively
fine-grained procedure calls.  Typical fast null RPCs on modern
systems take about one millisecond.  Running RangeCounter(W) takes
less than 20 microseconds, even for relatively large numbers of hosts
and a large history window, making its cost negligible relative to the
overall execution time.  We conclude that the benefit is worth the
marginal cost.
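A back-of-the-envelope check of the negligibility claim, using only
the two figures quoted above (a $\sim$1 ms null RPC and a
$<$20 $\mu$s selector invocation):

```python
# Figures quoted in the text: a fast null RPC costs ~1 ms and running
# RangeCounter(W) costs less than 20 microseconds.
rpc_time_us = 1000.0       # ~1 ms null RPC
selector_time_us = 20.0    # upper bound on RangeCounter(W)

# Selector overhead as a fraction of a single null RPC.
overhead = selector_time_us / rpc_time_us
print(f"selector overhead per call: {overhead:.0%}")  # prints "selector overhead per call: 2%"
```

A 2\% per-call ceiling, against mapping improvements measured in tens
of percent, is the sense in which the benefit outweighs the marginal
cost.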



\begin{figure}
\centering
\begin{tabular}{|l|l|}
\hline
Call Type & Time ($\mu$sec) \\
\hline
Compiler                           & 0.032  \\
StreamCall (optimized in-thread)   & 0.050  \\
StreamCall (unoptimized in-thread) & 4.95 \\
StreamCall (process-to-process)    & 90.09 \\
CORBA     & 1000 (typ) \cite{foo} \\
DCE       & 1000 (typ) \cite{foo} \\
\hline
\end{tabular}
\caption{Typical call times for various call mechanisms.}
\end{figure}