\section{Data characteristics}
\label{sec:data}

All clients began fetching documents on the afternoon of Thursday,
April 23, 1998, and continued until the morning of Thursday, May 14,
1998.  During this three-week period, the clients made a total of
490843 fetches: 287209 to Mars servers, 157762 to Apache servers, and
45872 to News servers.  The News count is much lower mainly because we
fetched only one document from each News site, compared with five from
each Mars and Apache site.  We can estimate the number of times each
set of servers was visited by dividing its number of fetches by its
number of server--document combinations.  For Mars, dividing 287209 by
100 (20~servers $\times$ 5~documents) shows that the Mars servers were
visited 2872 times.  Similarly, the Apache servers were visited 2868
times and the News servers 2867 times.
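The visit estimate is a single division per server set, which the
Python sketch below reproduces.  The Mars layout of 20 servers with 5
documents each comes from the text.  The Apache layout (11 servers, 5
documents each) and the News layout (16 servers, 1 document each) are
assumptions: they are consistent with the reported visit counts but
are not stated in this section.  The helper \texttt{estimate\_visits}
is hypothetical.

```python
# Fetch counts reported for each server set over the three-week run.
FETCHES = {"Mars": 287209, "Apache": 157762, "News": 45872}

# (servers, documents per server) for each set. The Mars figures come from
# the text; the Apache and News figures are assumptions inferred from the
# reported visit counts, not stated in this section.
COMBINATIONS = {"Mars": (20, 5), "Apache": (11, 5), "News": (16, 1)}


def estimate_visits(fetches: int, servers: int, docs_per_server: int) -> int:
    """Estimate visits to a server set: fetches per server/document pair.

    Integer division truncates, matching the whole-number figures in the text.
    """
    return fetches // (servers * docs_per_server)


visits = {name: estimate_visits(FETCHES[name], *COMBINATIONS[name])
          for name in FETCHES}
print(sum(FETCHES.values()))  # 490843
print(visits)                 # {'Mars': 2872, 'Apache': 2868, 'News': 2867}
```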

The slightly lower numbers of visits to Apache and News sites result
from the way the client fetch script handled crashes.  When a client
was restarted, it began a new group of fetches at the first server on
its list rather than resuming where the interrupted group had left
off.  We designed the script this way because we assumed that any
machine crash and reboot would take a significant amount of time;
starting a new group prevented a single group's fetches from
stretching over too long a period.  Since clients visited Mars sites
first, then Apache sites, and finally News sites, an interrupted group
was more likely to have reached the Mars sites than the Apache sites,
and the Apache sites than the News sites.  It is therefore not
surprising that the Mars servers received slightly more visits than
the Apache servers, and the Apache servers slightly more than the News
servers.
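The effect of restarting at the first server can be illustrated with a
small, hypothetical simulation.  The sketch assumes 20 Mars servers
(from the text) and 11 Apache and 16 News servers (assumed), treats
each server as one visit, and lets a crash cut a group short after a
given number of server visits; the restarted client then begins a
fresh group.  The names \texttt{count\_visits} and
\texttt{average\_visits}, and the crash points in the example, are
illustrative only and are not taken from the study's fetch script.

```python
from collections import Counter

# Server sets in the order each client visited them, with server counts.
# 20 Mars servers comes from the text; 11 Apache and 16 News are assumptions.
ORDER = [("Mars", 20), ("Apache", 11), ("News", 16)]


def group_schedule():
    """One entry per server visit in a complete group, in visiting order."""
    return [name for name, n_servers in ORDER for _ in range(n_servers)]


def count_visits(n_groups, crash_points):
    """Count server visits per set when some groups are cut short by crashes.

    crash_points maps a group index to the number of server visits completed
    before the crash. The restarted client starts a new group at the first
    server rather than resuming the interrupted one.
    """
    schedule = group_schedule()
    counts = Counter()
    for g in range(n_groups):
        completed = crash_points.get(g, len(schedule))
        counts.update(schedule[:completed])
    return counts


def average_visits(counts):
    """Average number of visits per server in each set."""
    return {name: counts[name] / n_servers for name, n_servers in ORDER}


# Ten groups; group 3 crashes after all Mars and 5 Apache servers,
# group 7 crashes halfway through the Mars servers.
avg = average_visits(count_visits(10, {3: 25, 7: 10}))
print({name: round(v, 2) for name, v in avg.items()})
# {'Mars': 9.5, 'Apache': 8.45, 'News': 8.0}
```

With two of ten groups interrupted, the average number of visits per
server falls off from Mars to Apache to News, the same ordering as the
measured visit counts.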

Figure~\ref{client-sites} shows, for each client site, the number of
fetches performed and the average time taken to complete one group of
fetches.  As expected, sites with longer group fetch times completed
fewer fetches.  We believe these differences across clients reflect
variation in available bandwidth and machine speed at each client
site.

Figure~\ref{client-sites} also shows the percentage of fetches that
were classified as failures because of timeouts or an improper amount
of returned data.  By client, the proportion of failures ranged from
1.96\% to 22.13\% of fetches.  By server set, Mars servers failed
5.85\% of the time, News servers 9.49\% of the time, and Apache
servers 24.23\% of the time.  As far as we can tell, these differences
in failure rates across types of mirrors do not stem from the brand of
web server software in use.  However, we did notice that three Apache
servers consistently timed out for some clients while succeeding a
reasonable fraction of the time for others.  These three servers
account for most of the Apache servers' comparatively high failure
rate.
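The failure classification and the per-set failure rates can be
sketched as follows.  A fetch counts as a failure if it timed out or
returned the wrong number of bytes.  The 60-second
\texttt{TIMEOUT\_SECONDS} threshold, the \texttt{Fetch} record, and
the comparison against an expected document size are assumptions for
illustration; this section does not specify how the study's scripts
detected either condition.  Grouping by \texttt{(server, client)}
pairs rather than by server set is one way to expose servers that,
like the three Apache servers noted above, fail for some clients but
not for others.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the study's actual timeout is not given here.
TIMEOUT_SECONDS = 60.0


@dataclass
class Fetch:
    client: str
    server_set: str           # "Mars", "Apache", or "News"
    server: str
    elapsed: Optional[float]  # seconds; None if the fetch never completed
    bytes_received: int
    expected_bytes: int       # assumed known size of the document


def is_failure(f: Fetch) -> bool:
    """Classify a fetch as failed if it timed out or returned the wrong amount of data."""
    timed_out = f.elapsed is None or f.elapsed > TIMEOUT_SECONDS
    wrong_size = f.bytes_received != f.expected_bytes
    return timed_out or wrong_size


def failure_rates(fetches, key):
    """Percentage of failed fetches in each group defined by key(fetch)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for f in fetches:
        k = key(f)
        totals[k] += 1
        if is_failure(f):
            failures[k] += 1
    return {k: 100.0 * failures[k] / totals[k] for k in totals}


fetches = [
    Fetch("c1", "Apache", "a1", None, 0, 1000),    # timed out
    Fetch("c2", "Apache", "a1", 2.0, 1000, 1000),  # succeeded
    Fetch("c1", "Mars", "m1", 1.5, 1000, 1000),    # succeeded
    Fetch("c1", "Mars", "m1", 3.0, 400, 1000),     # returned 400 of 1000 bytes
    Fetch("c2", "Mars", "m1", 1.2, 1000, 1000),    # succeeded
    Fetch("c2", "Mars", "m1", 1.4, 1000, 1000),    # succeeded
]
print(failure_rates(fetches, key=lambda f: f.server_set))
# {'Apache': 50.0, 'Mars': 25.0}
print(failure_rates(fetches, key=lambda f: (f.server, f.client)))
# {('a1', 'c1'): 100.0, ('a1', 'c2'): 0.0, ('m1', 'c1'): 50.0, ('m1', 'c2'): 0.0}
```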

