\section{Server choice and document choice}
\label{section:document}

\begin{figure}[t]
\begin{center}
{\small
\begin{tabular}{|l|c|c|c|c|c|}\hline
% & \multicolumn{5}{|c|}{Server Set of Document $j$} \\\cline{2-6}
%Document $i$  & 0       & 1      & 2      & 3      & 4 \\\hline
%0     & 0.55\%  & 4.11\% & 4.11\% & 6.85\% & 6.85\% \\
%1     & 9.32\%  & 0.00\% & 0.00\% & 0.82\% & 0.82\% \\
%2     & 9.86\%  & 0.00\% & 0.00\% & 0.27\% & 0.27\%  \\
%3     & 10.68\% & 0.55\% & 0.55\% & 0.00\% & 0.00\%  \\
%4     & 8.77\%  & 0.27\% & 0.27\% & 0.00\% & 0.00\%  \\\hline
%
% Transposed this - PAD 12/20
%
 & \multicolumn{5}{|c|}{Document $j$} \\\cline{2-6}
SS $i$
      & 0       & 1      & 2      & 3      & 4 \\\hline
0     & 0.55\%  & 9.32\% & 9.86\% &10.68\% & 8.77\% \\
1     & 4.11\%  & 0.00\% & 0.00\% & 0.55\% & 0.27\% \\
2     & 4.11\%  & 0.00\% & 0.00\% & 0.55\% & 0.27\%  \\
3     & 6.85\% & 0.82\% & 0.27\% & 0.00\% & 0.00\%  \\
4     & 6.85\%  & 0.82\% & 0.27\% & 0.00\% & 0.00\%  \\\hline
\end{tabular}
} %\small
\end{center}
\caption{
Percentage of time that good performance is not achieved using the top
5 servers from the server set of document $i$ (SS $i$) to fetch
document $j$.}
\label{doc-uncommon}
\end{figure}

In Figure~\ref{working-set}, the composition of server sets varies
from document to document.  This seems to suggest that a server that
provides good performance for one document does not always provide
good performance for another document.  However, further examination
reveals that document choice has at most a weak effect on server
choice.

Recall that a server set is the {\em smallest} set of servers that
provide good performance for a given client.  Servers outside the
server set may also provide good performance at any given moment, and
more than one collection of servers can qualify as a server set.  For
example, if two servers, A and B, provide good performance at exactly
the same moments, then two server sets are possible: one using A and
the other using B.  Thus, it is unwise to rely on apparent differences
in server sets as an indicator of differences in server performance.
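
The non-uniqueness can be stated more formally.  The following is only
a sketch, and the notation is introduced here for illustration rather
than taken from the definition of server sets: let $G(n)$ denote the
set of servers that provide good performance at measurement $n$.  A
server set is then read as a smallest set $S$ that contains a good
server at every measurement:
\[
S \in \arg\min_{S'} \left\{\, |S'| \;:\; S' \cap G(n) \neq \emptyset
\mbox{ for every } n \,\right\}.
\]
Under this reading, if A and B belong to exactly the same sets $G(n)$,
then replacing A by B in a minimal $S$ yields another set of the same
size that satisfies the same condition, so the minimizer is not
unique.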

Figure~\ref{doc-uncommon} shows how using one document's server set to
fetch another document affects performance.  The table was built by
counting, for every $i,j \in \{0,\ldots,4\}$, how often the top 5
servers from document $i$'s server set (SS $i$) are able to offer good
performance for document $j$.  Although the table is generated from
the Mars data at client site U. Va, all other combinations of clients
and web sites produced similar results.  The entry at $(i,j)$ in the
table is the percentage of fetches for which the server set for
document $i$ was {\bf not} able to provide good performance for
document $j$.  For example, the entry at $(4,1)$ is 0.82\%, so using
the server set for document 4 to fetch document 1 would lead to good
performance in over 99\% of fetches.
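
One way to write the table entry explicitly is sketched below.  The
notation is introduced only for illustration and is an assumption
about how the count is organized: $N$ is the number of fetches
considered, $T_i$ is the set of the top 5 servers of SS $i$, and
$G_j(n)$ is the set of servers that offer good performance for
document $j$ at fetch $n$.
\[
E(i,j) = 100\% \times
\frac{\left|\{\, n : T_i \cap G_j(n) = \emptyset \,\}\right|}{N}
\]
Under this reading, $E(4,1) = 0.82\%$ means that in 0.82\% of fetches
none of the top 5 servers of SS 4 offered good performance for
document 1.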

We used only the top 5 servers from each server set so that all sets
of servers considered would be the same size.  Server sets for
documents 2 through 4 contained only 5 servers, so they were
unaffected.  Document 0's server set, however, contained 7 servers.
The most immediate effect of this truncation is visible in
Figure~\ref{doc-uncommon}: the (truncated) server set for document 0
failed to provide good performance for document 0 itself 0.55\% of the
time.

Measuring how well one document's server set performs when fetching
another document is a more reasonable way to judge the differences in
server performance among documents, because it directly shows how
often a server identified as good for one document is not actually
good for another document.  In Figure~\ref{doc-uncommon}, performance
usually remains good across server sets.  Ignoring the first row and
first column, cases in which one document's server set does not offer
good performance for another document are very rare: no entry exceeds
0.82\%.

In the table's first row and first column, which correspond to server
set 0 and document 0 respectively, good performance is achieved less
frequently: the off-diagonal failure rates range from 4.11\% to
10.68\%.  The servers that offer good performance for document 0 are
at least partially different from the servers that offer good
performance for the other documents.  This indicates that there might
be some link between document choice and server choice.  In all
client-site combinations, we observed that the first document had a
noticeably different set of good servers than the other documents.

% Figure~\ref{doc-uncommon} shows how frequently the set of good servers
% for one document does not intersect with the set of good servers for
% another document in the Apache data set.  The number at box $i,j$ is
% calculated as the percentage of groups in which there was no
% possibility of achieving good performance fetching document $j$ with
% any of the servers that would have given good performance in fetching
% document $i$.  As in Section~\ref{section:working-sets}, we consider a
% ``good'' server to be capable of delivering a document within 10\% of
% the fastest transfer time.

% From Figure~\ref{doc-uncommon}, we see that for any pair of documents,
% there were a significant number of groups in which no single server
% provided good performance.  For instance, in less than half of all
% fetches, a server provided good performance for both documents 0 and
% 3.  The same trends are present in the Mars set, though the actual
% numbers are slightly lower.

In both the Apache and Mars data, the first document is also the
smallest (about 2 KB).  We believe the dependence is more a function
of document size than of the specific documents being fetched, but
further study using a larger variety of documents is required to
verify this.  We can explain the effect of document size on server
choice if we assume that the network (and not the server) is the
bottleneck.  For smaller documents, the transfer time depends more on
the round trip time between the client and server.  The smallest
documents fit in one or two packets, so the client-server conversation
lasts only a few round trip times.  For larger documents, the amount
of bandwidth available on the path between the client and server
becomes the important factor as the network ``pipe'' is packed with as
much data as possible.  In this scenario, one property of a server
(the round trip time between it and the client) would dominate for
small documents and another property (the throughput between the
client and server) would dominate for larger documents.
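
A first-order model makes this argument concrete.  It is a sketch
under simplifying assumptions (it ignores, for example, the gradual
growth of the sending rate at the start of a connection) and is not
fitted to the measurements.  Here $k$ is an assumed small number of
round trips needed to set up the connection and issue the request, $R$
is the round trip time, $S$ is the document size, and $B$ is the
bandwidth available on the path:
\[
T(S) \approx kR + \frac{S}{B}
\]
When $S$ is small, $S/B \ll kR$ and $T$ is determined mainly by $R$;
when $S$ is large, $S/B \gg kR$ and $T$ is determined mainly by $B$.
As a purely hypothetical illustration, with $k = 2$, $R = 100$\,ms,
and $B = 1$\,Mbit/s, a 2 KB document has $kR = 200$\,ms but
$S/B \approx 16$\,ms, whereas a 200 KB document has
$S/B \approx 1.6$\,s.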

Regardless of the cause, the effect is small.  First, at most 11\% of
fetches were adversely affected by the difference in server sets.  In
these fetches, the increase in transfer time was less than 25\% above
optimal on average.  Also note that these performance penalties are on
top of a rather small transfer time (about 1 second), so the actual
penalties are on the order of hundreds of milliseconds.  Thus there is
little cause for concern that using only one server set for all
document sizes will lead to bad performance.
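
The arithmetic behind this estimate can be made explicit using the
figures quoted in the preceding paragraph (at most 11\% of fetches
affected, an average increase of less than 25\%, and a transfer time
of about 1 second):
\[
\mbox{penalty per affected fetch} < 0.25 \times 1\,\mbox{s}
= 250\,\mbox{ms},
\]
\[
\mbox{penalty averaged over all fetches} < 0.11 \times 250\,\mbox{ms}
\approx 28\,\mbox{ms}.
\]
The second figure is only a rough bound; it assumes that the 25\%
increase applies to a transfer time of about 1 second in every
affected fetch.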


