\section{Outline of thesis}
\label{sec:outline}

In my thesis, I will extend the work I have done on the simplified
dynamic mapping problem to attack the full dynamic mapping problem.
There will be six stages to this work.  The first stage will involve
collecting extensive activation tree and network traces for use in
trace-driven simulation.  In the second stage, I will analyze and
characterize these traces, looking for properties that may be useful
for history-based prediction.  The third stage will involve extending
the trace-driven simulator I developed for my earlier work to use the
traces.  At this point, I will have a simulation environment in which
host computation times, communication times, and activation tree
structure will have high fidelity to the real world.  In the fourth
stage, I will use the simulator to develop a history-based dynamic
mapping algorithm to solve the full problem.  In the fifth stage, I
will evaluate the algorithm in simulation and I will incorporate it
into the LDOS system to show it works in a real system.  In the final
stage, I will write the dissertation document.  A timeline for these
stages is given in Figure~\ref{fig:timeline}, and the remainder of
this section describes each stage in more detail.

\begin{figure}
\centerline{
\begin{tabular}{|l|l|c|}
\hline
Activity & Artifacts & Duration \\
\hline
Trace collection & measurement tools, traces & 3 Months \\
Trace characterization & models, trace families & 3 Months \\
Simulator extension & simulator for full problem & 2 Months \\
Algorithm development & mapping algorithm & 5 Months \\
Evaluation & benchmarking results, LDOS implementation & 2 Months \\
Writing/Defense & defended thesis & 3 Months \\
\hline
\multicolumn{2}{|l|}{Total Time:} & 18 Months \\
\hline
\end{tabular}
}
\caption{Thesis timeline}
\label{fig:timeline}
\end{figure}

\subsubsection*{Trace collection}
%\label{sec:trace_collection}

In order to simulate realistic communication times and activation tree
structures, I will first collect extensive traces of real networks and
real applications.  These traces will complement the host load traces
I have already collected and allow me to simulate realistic environments.

I have already devised a methodology for collecting traces of
activation trees in Microsoft Windows applications.  I have written a
tool based on the DyninstAPI run-time dynamic instrumentation
library~\cite{DYN-INST-SHPCC,DYNINSTAPI-MANUAL} that lets me
instrument the entry point and exit points of any procedure in a
Windows COFF binary.  An input file lets me specify which procedures
to instrument.  By adding calls to a dynamically linked logging
library at entry and exit points, I can log activation trees to a
file.  The nodes of these trees are annotated with the time they
started executing and the duration of execution.  In addition to
capturing the tree structures I require, by running these instrumented
applications on a host with controlled (zero) load, the durations
stamped on each node will reflect the nominal execution times that the
simulator needs to determine actual running time given different load
conditions.
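The entry/exit logging scheme can be sketched as follows.  This is a
minimal illustration in Python, not the tool itself: the names
\texttt{Activation}, \texttt{enter}, and \texttt{exit\_} are my own, and
the real tool injects the equivalent calls into a running Windows binary
via DyninstAPI rather than requiring source changes.

```python
import time

class Activation:
    """A node in the activation tree: one procedure call."""
    def __init__(self, name, parent=None):
        self.name = name
        self.start = None      # wall-clock start time
        self.duration = None   # nominal execution time (zero-load host)
        self.children = []
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

def enter(name, current):
    """Logged at an instrumented procedure entry point."""
    node = Activation(name, parent=current)
    node.start = time.perf_counter()
    return node            # the new node becomes the current activation

def exit_(current):
    """Logged at an instrumented procedure exit point."""
    current.duration = time.perf_counter() - current.start
    return current.parent  # pop back to the caller's activation

# Example run: main calls f, which calls g.
root = enter("main", None)
cur = enter("f", root)
cur = enter("g", cur)
cur = exit_(cur)   # leave g
cur = exit_(cur)   # leave f
exit_(root)        # leave main; root now holds the full tree
```

Because the call stack discipline matches the tree structure, a single
"current activation" pointer suffices to reconstruct the whole tree.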

In addition to the activation trees, it is also necessary to capture
what data each procedure call references, since it will be necessary
to account for communicating this data when simulating running these
calls remotely.  One way of collecting this information is to limit
ourselves to codes whose interfaces are specified via an interface
definition language or an interface repository such as in CORBA or
DCOM.  A more practical approach is probably to use virtual memory
mechanisms to determine which pages each procedure references.  I am
currently looking at the user-level guard page mechanism in Windows NT
for this purpose.  Intuitively, on entry to a procedure, we can mark
all the process's pages with the guard bit.  Then each page the
procedure references will cause an exception that we can catch.  The
set of referenced pages is a conservative estimate of the data the
procedure references.  Another approach to collecting this data is to
instrument every load and store in the program.  Whichever approach is
used, it will be important to map the logged data reference behavior
of a procedure call to the actual data movements that the remote
execution facility would perform in order to execute the call
remotely.  For example, a trace of guard page faults corresponds
directly to the data movements a simple page-based DSM system would
have to undertake.  However, an RPC-based system could move less data
(because not all data on a page may be referenced) or more data
(because data for {\em all} possible execution paths must be moved)
than the trace suggests.
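As a rough sketch of how a guard-page fault trace translates into a
conservative data-movement estimate (the page size and function names
here are illustrative assumptions, not part of the actual design):

```python
PAGE_SIZE = 4096  # assumed page size; the real value comes from the OS

def touched_pages(addresses, page_size=PAGE_SIZE):
    """Conservative data-reference estimate: the set of page numbers
    containing any referenced address, which is exactly what a trace
    of guard page faults reports."""
    return {addr // page_size for addr in addresses}

def dsm_pages_moved(addresses, page_size=PAGE_SIZE):
    """A simple page-based DSM moves each faulted page once, so the
    guard page trace maps directly onto its data movements."""
    return len(touched_pages(addresses, page_size))

# Two references fall on page 1 and one on page 5, so a page-based DSM
# moves two pages; an RPC-based system might move less (unused parts of
# the pages) or more (arguments for all possible execution paths).
refs = [0x1000, 0x1abc, 0x5000]
pages = touched_pages(refs)
```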

Extensive instrumentation will slow down the program considerably,
which is a concern because the program is interactive, and the user
may behave differently if it runs slowly.  However, we can collect
our measurements in a two-stage process.  In the first
stage, the user would execute an uninstrumented program with a
monitoring tool which would only collect the messages being sent to
the program.  This would only marginally affect the performance of the
program.  In the second stage, the log of messages would be fed to the
instrumented program, which could run as slowly as necessary to
collect the activation trees and data reference behavior.

I am still undecided as to how to collect network traces.  The purpose
of these traces is to enable the simulator to answer the question ``if
I move $N$ bytes of data from host $i$ to host $j$ at time $t$, how
long will it take?''  One possibility is to collect packet traces
(records of the form (time,sender,receiver,size)) on a broadcast
Ethernet network using a packet sniffer.  The duration of additional
traffic can then be computed either by simulation at the packet level
or by estimating the residual bandwidth.  I have experience using
this methodology in prior
work characterizing network traffic of Fx
programs~\cite{DINDA-TRAFFIC-CHAR-PARALLEL-PGMS}.  Another
possibility is to use the Remos system~\cite{REMOS-TR}, which is
currently being developed by the Remulac group, or to perform
in-switch monitoring using the active networking facilities being
developed by the Darwin group~\cite{DARWIN-TR}.  While these
approaches would allow me to collect data on more sophisticated ATM
and switched Ethernet networks, it is unclear when they will be
available and in what form.  Another issue is the granularity at
which these systems can make measurements.  Since I am concerned with
relatively fine-grain communication, a measurement system that
provides only coarse information would be non-ideal.
existing network trace databases or to generate a trace from an
existing model.
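To make the question the simulator must answer concrete, here is a
minimal residual-bandwidth sketch.  The fixed window size, the link
\texttt{capacity} parameter, and the function names are illustrative
assumptions rather than a committed design.

```python
def residual_bandwidth(trace, t, window, capacity):
    """Bytes/second left on the link in [t, t + window), given a
    packet trace of (time, sender, receiver, size) records and the
    link capacity in bytes/second."""
    used = sum(size for (time, _snd, _rcv, size) in trace
               if t <= time < t + window)
    return max(capacity - used / window, 0.0)

def transfer_time(n_bytes, start, trace, capacity, window=1.0):
    """Answer ``if I move n_bytes starting at time start, how long
    will it take?'' by walking forward window by window and sending
    at the residual rate in each window."""
    t, remaining = start, float(n_bytes)
    while remaining > 0:
        rate = residual_bandwidth(trace, t, window, capacity)
        sent = min(remaining, rate * window)
        remaining -= sent
        if remaining > 0:
            t += window       # window fully used (or link saturated)
        else:
            t += sent / rate  # finished partway through the window
    return t - start
```

A packet-level simulation would interleave the new bytes with the
traced packets instead of averaging over windows; this sketch shows
only the coarser residual-bandwidth alternative.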

The artifacts produced at this stage will consist of an infrastructure
for collecting host, network, and activation tree traces, as well as a
large collection of traces.

\subsubsection*{Trace characterization}

In the next stage of my thesis work, I will characterize the network
and activation tree traces I have collected looking for properties
potentially beneficial to history-based prediction.  

The way I analyze the network traces will mirror the approach I have
taken in examining host load traces in my current work.  In
particular, I will use the tools of statistics, time series analysis,
spectral analysis, self-similarity, and chaotic dynamics.  I expect
to be able to leverage the fairly extensive work that I and others
have previously done on characterizing aggregate traffic.
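As one example of the kind of time-series tool involved, a sample
autocorrelation function exposes the temporal dependence that
history-based prediction exploits.  This is a plain-Python sketch for
illustration, not the analysis code itself:

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag; values near +1
    indicate strong temporal dependence that a history-based
    predictor can exploit."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    acf = []
    for lag in range(1, max_lag + 1):
        cov = sum((series[i] - mean) * (series[i + lag] - mean)
                  for i in range(n - lag)) / n
        acf.append(cov / var)
    return acf
```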

Activation tree traces are more complex than a simple time series and
I am not sure exactly how to approach them at this point.  In terms of
solving the dynamic mapping problem, what is important is to know how
to divide the bounds placed on a parent node among its children.  A
useful characterization will facilitate predicting how many children a
node has and how large the subtree rooted at each of those children
is.  
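One plausible policy, sketched below, is to split a parent's bound in
proportion to the predicted subtree sizes; whether prediction can
support this, or whether a different split is better, is exactly what
the characterization work must determine.

```python
def divide_bound(parent_bound, predicted_subtree_sizes):
    """Split a parent node's time bound among its predicted children
    in proportion to the predicted size of each child's subtree.
    Proportional splitting is just one candidate policy."""
    total = sum(predicted_subtree_sizes)
    return [parent_bound * s / total for s in predicted_subtree_sizes]

# A 10-second bound split over children whose subtrees are predicted
# to contain 1, 3, and 1 nodes:
bounds = divide_bound(10.0, [1, 3, 1])
```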

The artifacts produced at this stage will be parameterizable models
that can be fit to trace data, and classification of the traces into
families based on the parameters.  Members from the families can then
be selected to form benchmarks.

\subsubsection*{Simulator extension}

With a useful family of load, network, and activation tree traces in
hand, I will extend the simulator I have built to support them.  I
will develop a simple method to accurately compute communication time
given the size of the message and a trace of the traffic on the
network.  The part of the simulator that produces nodes to be mapped
will be replaced with one that can use the activation tree traces.  I
may also rewrite other portions of the simulator to make it more
modular and adaptable to other related simulation studies, such as
adding utility function support.  The result of this work will be a
simulator well suited to attacking the full dynamic mapping problem.

\subsubsection*{Algorithm development}

The heart of the thesis will be in developing an algorithm to solve
the dynamic mapping problem I posed in Section~\ref{sec:dyn_map_prob}.
In my current work, I have developed a good algorithm for a simplified
version of the problem where communication is free and only leaf nodes
are to be mapped.  I hope to complement this algorithm with an
algorithm that uses history-based prediction to divide the bounds
placed on a parent node among its children.  These predicted bounds
will represent the unexecuted part of the activation tree.  I am not
firmly tied to the algorithm I have developed and it may be necessary
or desirable to take a different approach.  The figure of merit for a
mapping algorithm will be the percentage of bounds it meets for a
family of activation tree traces on several different simulated
environments.  A good algorithm will score close to the optimal as
defined by simulating each possible mapping for sufficiently small
activation trees, and each possible node choice given prior choices
made by the algorithm under test for larger trees.
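The exhaustive optimality baseline for small trees can be sketched as
follows.  The flat node list and the \texttt{run\_time} function are
simplifying assumptions for illustration; the real simulator evaluates
full activation trees under traced load and traffic.

```python
from itertools import product

def optimal_fraction_met(nodes, hosts, run_time, bound):
    """Simulate every possible mapping of nodes to hosts and return
    the best achievable fraction of bounds met, the yardstick
    against which the mapping algorithm under test is scored."""
    best = 0.0
    for mapping in product(hosts, repeat=len(nodes)):
        met = sum(run_time(node, host) <= bound[node]
                  for node, host in zip(nodes, mapping))
        best = max(best, met / len(nodes))
    return best

# Toy instance: host 0 is fast (1 s per call), host 1 is slow (3 s),
# and every node has a 2-second bound, so mapping everything to host 0
# meets all bounds.
nodes = ["a", "b"]
fastest = optimal_fraction_met(
    nodes, [0, 1],
    run_time=lambda node, host: 1.0 if host == 0 else 3.0,
    bound={"a": 2.0, "b": 2.0})
```

The enumeration is exponential in the number of nodes, which is why
it serves only as an optimality yardstick for sufficiently small
trees, with the per-choice enumeration used for larger ones.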

The questions I want to answer include
\begin{itemize}
\item{How can we predict host and network properties, execution times,
and whether bounds will be met?}
\item{Which history is most fruitful to collect: host and network
properties, execution times, or whether bounds are met?}
\item{Is predicting activation tree structure useful and how can we do
it?}
\item{Can we exploit the activation tree's structure by being more
exploratory early in the tree?}  
\item{How deep in the tree should we go before we stop mapping and
leave the subtree to execute entirely on the current host?}
\item{Which single-node mapping algorithm should be used for a
particular node?}
\end{itemize}
Although they are interesting issues, I do not intend to look at
caching strategies to make expensive algorithms feasible or at how to
incorporate input from external monitoring tools.  I also will not
look at how a reservation system changes matters.  Another issue I
will set aside is procedure calls, such as calls to a windowing API,
that must execute on one particular host; I will assume that all
calls can execute on any host.

\subsubsection*{Evaluation}

Since my simulation approach is so close to a real environment, I
intend to do as much of the evaluation as possible using the
simulator.  I will develop benchmarks from the families of activation
tree, network and host traces I will have collected and then test the
algorithm's performance on these benchmarks.  I hope to be able to
draw connections between benchmark properties and the performance of
my algorithm, at least to the extent that I can provide insight as to
how to set any parameters the algorithm may have.   The hope is to 
generalize the algorithm to the greatest extent possible.

I intend to compare this algorithm with other approaches.  I believe
that comparing to an approach based on a shared measurement system and
simple heuristics (e.g., choosing the least loaded host) should be
feasible.  Certainly it will be possible to evaluate the effect of
sharing information between hosts.

In addition, I will incorporate the algorithm into my LDOS distributed
object system to show that my approach can work in a real system.  I
may also do some limited evaluation using this system, but at this
point I am not sure what form this will take.

\subsubsection*{Writing and defense}

In the final stage, I will write up my results and defend them. 
