\section{The time scale of rank changes}
\label{sec:ranktime}

Once a client has found a highly ranked server, the client is
interested in how long that server is likely to maintain a high rank
among the servers the client could fetch from.  Fundamentally, it is
the time scale over which significant rank changes occur that is
important.  In this section, we show that most rank changes are small,
even over relatively long time scales.  Good servers remain good for
long periods of time.  Indeed, the probability of rank changes depends
very little on the time scale over which they occur.

Given periodically sampled ranks, the natural way to study change on
different time scales would be via frequency domain or time series
analysis~\cite{BOX-JENKINS-TS}.  However, as we discussed in
Section~\ref{sec:method}, our data was collected at exponentially
distributed intervals, making this difficult.  The transfer time data
could be resampled periodically and new rankings computed, but such
resampling is complex and since signal reconstruction from non-periodic
samples is an area of current research, such an approach would be
questionable as well as difficult to understand.  We did informally
try this method and found results similar to those presented here.

Our approach was to estimate the cumulative probability of rank
changes over increasing time scales.  Consider a single mirror server.
From the point of view of a single client using the set of mirrors, we
have a sequence of time-stamped samples of that server's rank (as well
as transfer times.)  Now extract all the pairs of rank samples that
are four or fewer hours apart.  For each pair, subtract the earlier
rank from the later rank and take the absolute value.  Count the
number of occurrences of each of the possible rank changes.
Accumulating these in the appropriate order gives an estimate of the
cumulative probability of rank changes given measurements four or
fewer hours apart.  

We may find that rank changes of three or fewer are 80\% probable
given time scales of four or fewer hours.  Notationally, we express
this as $P[|r_{t+w}-r_{t}| \leq R \mid sample\ period \leq w\leq
W]=0.8$, where $R=3$ is the rank change, $W=4$ hours is the maximum
time scale and the $r$s are our rank samples.  For each combination of
$W$ and $R$ examined, we use a randomly selected 10,000 samples to
assure a tight estimate of the probability.  Further, we aggregate the
probabilities across all clients for each dataset and document to
obtain the point of view of a random client interacting with a random
server within the group.  Finally, it is important to note that we are
limited by our average sampling interval of one hour --- we cannot
discern behavior for $W<1$ hour.

\begin{figure}
\centerline{
\begin{tabular}{c}
\epsfxsize=3in
\epsfbox{rankchangetime.cumprobgiventime.mars1.epsf}  \\
\end{tabular}
}
\caption{$P[|r_{t+w}-r_{t}| \leq R \mid sample\ period \leq w\leq W]$ for Mars/1 dataset, plotted for several different values of $W$.  $R$ is
the rank change and $W$ is the maximum time period.  The other Mars plots
and the News/0 plot are similar.}
\label{fig:ranktimesane}
\end{figure}

Figure~\ref{fig:ranktimesane} shows a representative plot of the
cumulative probability for the Mars/1 dataset.  The way to read the
plot is to pick a time scale, follow the corresponding curve
horizontally to the maximum rank change that is of interest, and then
read the cumulative probability from the vertical axis.  For example,
we see that for time scales of two (or fewer) hours, rank changes of
four (or fewer) occur with probability 0.85.  The plot also includes
the curve that would result if rankings were simply uniformly
distributed random permutations.

It is clear from the graph that most rank changes are small.  The 10
day curves cover the vast majority of the data, and we can see that
the smallest 25\% of possible rank changes account for about 90\% of
rank changes.

The graph also shows that rank changes depend very little on the time
scales over which they occur.  If there was a strong dependency, the
curves for the different time scales would be more widely separated.
We can see that the curves for increasingly longer time scales do
indeed slowly move to the right (toward larger rank changes), but the
effect is very marginal.  This is a very promising result.  If a
client can find a good server, it is highly likely that it will remain
good for quite some time.

\begin{figure}
\centerline{
\epsfxsize=3in
\epsfbox{rankchangetime.cumprobgiventime.apache6.epsf}
}
\caption{$P[|r_{t+w}-r_{t}| \leq R \mid  sampleperiod \leq w \leq W]$ for Apache/1, plotted for several different values of $W$.  
Other Apache plots are roughly similar, and all Apache plots differ
significantly from Mars or News plots.}
\label{fig:ranktimeinsane}
\end{figure}

The graph of Figure~\ref{fig:ranktimesane} is representative for the
Mars and News datasets.  Unfortunately, the Apache data shows very
different behavior, as can be seen in Figure~\ref{fig:ranktimeinsane},
which shows a cumulative probability plot for a representative,
Apache/1.  Here, we don't see the quick rise of the curves, so large
rank changes are relatively much more probable than with the Mars and
News data.  Further, since the curves do not hug each other very
closely, there is more dependence on the time scale.  At this point,
we do not understand why the Apache data is so different.  The
clearest distinguishing characteristic of the Apache sites is that
they tend to run non-commercial web servers (the Apache web server)
while the Mars and News sites tend to run commercial web servers
(Netscape and Microsoft servers.)  We have no evidence that this
difference causes the discrepancy, however.
