\section{Thesis contributions}
\label{sec:contrib}

The contributions of this thesis will include algorithms, models,
evaluation, and artifacts.  The contributions of my work to date
include the following:
\begin{itemize}

\item{{\bf LDOS:} I developed a lightweight distributed object system to
better understand modern remote execution mechanisms and procedure
execution times on remote systems. }

\item{{\bf Load traces:} I built tools to collect host
load traces and collected over a week of fine-grain host load data on
a wide variety of machines (Section~\ref{sec:load_trace_measurement}.)
As far as I am aware, there is no such family of traces available to
the research community. }

\item{{\bf Load trace analysis:} I performed an in-depth analysis of the
collected load data, presented in Section~\ref{sec:load_traces}.  As
far as I am aware, this is the first analysis to identify the
following important properties of host load:
\begin{itemize}
\item{{\bf Self-similarity} (Section~\ref{sec:self-sim})}
\item{{\bf Epochal behavior of local frequency content}
(Section~\ref{sec:epochal_behavior})}
\item{{\bf Simple phase space behavior} (Section~\ref{sec:phase_space})}
\end{itemize}
}

\item{{\bf Compute time model:}  I created and validated a model for 
determining compute time from a load trace (Section~\ref{sec:simulator}) }

\item{{\bf Simulation infrastructure:} Using the compute time model,
I developed a trace-based simulator for a simplified version of the
dynamic mapping problem which ignores communication costs and maps
only leaf nodes (Section~\ref{sec:simulator}.) }

\item{{\bf Algorithms:} I developed nine different mapping algorithms
for the simplified mapping problem (Section~\ref{sec:pred_algs}.) }

\item{{\bf Evaluation:} Using the simulator and six representative
groups of hosts, I vigorously evaluated the mapping algorithms for the
simplified dynamic mapping problem (Section~\ref{sec:alg_perf}) and
found the following results:
\begin{itemize}
\item{{\bf Mapping algorithms matter.}  Mapping calls to randomly selected
hosts or always to the same host results in significantly fewer of the
calls executing within their time bounds than is optimal. }
\item{{\bf Simple statistical algorithms are insufficient.} The
performance of such algorithms is suboptimal and highly dependent on
the tightness of the time bounds, the nominal execution time, and
other factors.  Adding randomization to these algorithms does not
significantly improve their performance. }
\item{{\bf Neural networks perform well but are too expensive.}  For
short nominal times and even tight bounds, a neural network approach was
consistently near optimal.  However, the compute costs are clearly too
high for this approach to be practical.}
\end{itemize}
}

\item{{\bf Simple and effective mapping algorithm:} I developed a
mapping algorithm that is practical and yet gives near optimal
performance for short nominal times and tight bounds.  Even for long
nominal times, it performs better than any other algorithm we tested.
However, it is sensitive to varying nominal times.}
\end{itemize}

The thesis work will be focused on extending my methods and results
to the full dynamic mapping problem.  The contributions I expect from
this work include:
\begin{itemize}
\item{{\bf Methodology for collecting activation tree traces from real
applications:} As far as I know, no such methodology exists, and yet
such traces are clearly needed for analysis and simulation.  The
overall structure of this methodology is already in place, as
discussed in Section~\ref{sec:outline}. }

\item{{\bf Methodology for collecting network traces:} Although
networking researchers regularly collect traces to analyze, simulate,
and improve {\em networks}, I want to collect traces that are useful
for simulating the communication of applications on networks in order
to improve the {\em applications}. }

\item{{\bf Activation tree traces:} These traces
will let myself and others simulate and analyze real interactive
applications.  No collection of such traces exists at the present
time, so such a collection may also be quite useful to other
researchers. }

\item{{\bf Network traces:} These traces will let myself and others simulate
realistic communication times. It may also be possible to use traces
collected by networking researchers. }

\item{{\bf Analysis of activation tree traces:} I intend to analyze
the activation tree traces that I collect to the same level of detail
as I analyzed load traces in my current work. My analysis will be
geared toward history-based prediction.  I expect that one of the
results of this analysis will be a model and methodology for
generating activation trees with similar properties.  }

\item{{\bf Analysis of network traces:} My analysis of the network
traces will also be geared toward history-based prediction.  I am not
interested in long term aggregate properties of the network, but in
short term properties on the scale of the computation and
communication times of remote execution facilities.}

\item{{\bf Benchmarks:} A benchmark is a combination of an
activation tree trace and some number of host load and network traces
that can be used to simulate running a particular interactive
application in a particular distributed computing environment. A
family of such benchmarks would be very useful to researchers in
comparing different dynamic mapping algorithms. }

\item{{\bf Communication time model:} The extended simulator will have
to determine communication times given data sizes and network traces.
I will develop and validate a model for this purpose.  I am not sure
yet whether I will limit myself to client/server communication
(request-response between a pair of hosts) or attempt the
many-to-one/one-to-many parallel communication one would expect in a
optimized shared memory environment.  }

\item{{\bf Extended simulation infrastructure:} Using the
communication time model I will extend my simulator to support the
full dynamic mapping problem.  I will keep the simulator as simple and
realistic as possible by directly using the traces I will have
collected earlier.  I intend to maintain the current
``plug-and-play'' nature of the simulator so that will be useful for
future researchers as well as myself.  }

\item{{\bf Mapping algorithms for full dynamic mapping problem:} The
core contribution of this thesis will be mapping algorithms for the
full dynamic mapping problem.  I expect that this will involve a
prediction algorithm for estimating the number of children of a given
activation tree node and the distribution of nominal time among them,
and an extension of the algorithm I developed for the simplified
problem for mapping individual nodes.}

\item{{\bf Extensive evaluation of algorithms in simulator:} I will
evaluate the algorithms I develop using my simulator on the benchmarks
and other test sets.   The goal is to do as much evaluation as
possible in simulation.  This seems reasonable given the simulator's
trace-based nature, although it will be necessary for the evaluation
to be extensive in order to avoid dependencies on particular traces.
This extensive nature will also help to illustrate how each algorithm
depends on the properties of the environment and program.}

\item{{\bf Validation of mapping algorithm in LDOS:} As a proof of
concept, I will incorporate the best mapping algorithm I develop into
the LDOS system.  The goal here is to show that my approach is not
limited to a simulation environment, but can be incorporated into a
real system.  }
\end{itemize}


%I expect the primary contributio n to be a history-based prediction
%algorithm for the dynamic mapping problem posed in
%Section~\ref{sec:dyn_map_prob}.  Other contributions will include
%models for host load, network bandwidth and latency, and activation
%tree structure that are geared toward history-based prediction.  The
%artifacts that will be generated will include an infrastructure for
%collecting host, network, and actication tree traces, a large database
%of these traces, a simulation environment, and a prototype
%implementation in the LDOS system.





