Motivation

When we design a multiserver system (or even a single-server system), an important step is to analyze and understand their performance. The study of system performance analysis dates back to the seminal work of A. K. Erlang in 1917 [50], where he provided formulas that can be used to evaluate the performance at a telephone exchange. Erlang's formulas were heavily used by telephone companies to provide necessary and sufficient resources, which led to the successful and rapid growth of telephone networks. Erlang's work was followed by many researchers and developed into a theory known today as queueing theory.

Queueing theory was introduced to computer scientists in 1960s by L. Kleinrock and others, and contributed to the successful growth of computer systems such as the Internet and time-sharing systems. For example, J. R. Jackson's theory for networks of queues [83,84] was used to study the performance of the ARPANET, a precursor of the Internet. Understanding how the performance is affected by various parameters was an important necessary step for optimizing the network topology, capacity assignment, and flow control of the ARPANET [101,102].

Although Erlang's formulas and Jackson's theory are defined for (simple) multiserver systems, much of queueing theory has been developed ironically for single-server systems. In earlier days (prior to the 1970s), researchers in queueing theory provided beautiful and simple formulas that can be used directly to evaluate the performance of single-server systems and some of the simplest models of multiserver systems such as the ones studied by Erlang and Jackson. These explicit formulas not only can be evaluated numerically but also can give us good intuition on how the system parameters affect the performance. However, as researchers moved on to more complex models, the formulas that they obtained became more complex. This was particularly true for multiserver systems. As a result, although classical queueing theory provides beautiful formulas that allow us to evaluate the performance of complicated single-server systems, it provides much less for multiserver systems. (See [189] for more details of a history of queueing theory.)

Today's multiserver systems can employ more complex configurations and/or more complex resource allocation towards more efficient and more profitable operations. For example, advancement of operating systems and communication technology allows sharing resources among tasks or users for utilizing otherwise idle CPU cycles. Also, advancement of technology such as information retrieval and data mining enables one to identify customers (jobs, or tasks) who are likely to bring more revenue to a company, and today's routers allow a sophisticated assignment of customers to servers based on the available information on the customers (e.g. based on the priority of the customers).

For the performance evaluation of these multiserver systems with resource sharing or prioritization, analytical tools are largely unavailable. The lack of analytical tools is unfortunate, since analytical tools have been proved to be effective in allocating necessary and sufficient resources, for example, at telephone exchanges and the ARPANET. Such capacity planning remains difficult for multiserver systems with resource sharing or prioritization.

Also, due to the lack of analytical tools, it is not well understood what is a good configuration and what is a good resource allocation policy for a multiserver system with resource sharing or prioritization. It is also not well understood how different configurations and different resource allocation policies compare with respect to their performance in these multiserver systems. In fact, many fundamental questions regarding the performance of multiserver systems, of importance to system designers and service providers, are largely unanswered.

For example, when a high volume web site (web server farm) receives requests of web pages or files, each request must be dispatched to exactly one of the web servers to be served. Similarly, when a supercomputing center receives jobs (tasks), each job must be dispatched to exactly one of the host machines for processing (see Figure 1.1(a)). The rule for assigning requests/jobs to host servers/machines is known as the task assignment policy. The choice of the task assignment policy can have a significant effect on the performance perceived by users of web sites and supercomputing centers. Designing a distributed server system such as web server farms and supercomputing centers thus often comes down to choosing the ``best'' possible task assignment policy for the given system and user requirements.

However, analyzing even the simplest task assignment policies can prove to be difficult. As a result, fundamental questions regarding the optimal task assignment policy and relative performance among various task assignment policies are largely open.

. Which job should be assigned to which server? What is the best task assignment policy, and how does its performance compare to other task assignment policies?

Networks of workstations can benefit tremendously from allowing a user to take advantage of machines other than her own, at times when those machines are idle (see Figure 1.1(b)). This notion is often referred to as cycle stealing. Cycle stealing has been implemented in various workload management systems for networks of workstations such as Butler [139], Condor [115,116], Benevolent Bandit [54], Sprite [48], Stealth Distributed Scheduler [109], and Linger-Longer [170]. Cycle stealing is also popular in Internet computing, which allows one to steal idle cycles of personal computers at home [41]. Internet computing has been used for the purpose of searching for extraterrestrial intelligence, protein folding research, cancer research, AIDS research, finding large prime numbers, etc. More generally, cycle stealing enables one to use multiple servers as a single resource, and has been receiving attention in the context of grid computing [14,40,56].

Despite the popularity of cycle stealing, the performance benefit or penalty of cycle stealing is not well understood, largely due to the lack of analytical tools for multiserver systems with cycle stealing. For example, when is cycle stealing a good idea?

. When does cycle stealing pay, and how much? Should we always steal idle cycles? Should we steal only idle cycles?

Analytical tools can also be used to gain insights into the physical nature of a system such as how the performance of the system is affected by the number of servers, system loads, job size distributions, and arrival processes of jobs. Such insights into the physical nature of a system are useful in designing good resource allocation policies for the system and in configuring the system effectively. However, due to the lack of analytical tools for multiserver systems with resource sharing and job prioritization, the physical nature of these multiserver systems is not well understood. For example, a single fast computer of speed

is more expensive than

slow computers, each of speed

, but does this mean that a single fast computer is better (see Figure 1.1(c))?

. How many servers are best? Does the answer change depending on system loads, job size distributions, and how the jobs are processed (e.g., first-come-first-served or with priority)? How much is the performance different between a single fast computer and slow computers? What is the optimal ?

Figure 1.1: Fundamental questions in multiserver systems.

$\includegraphics[width=.3\linewidth]{fig/model/dispatcher.eps}$

(a)

$\includegraphics[width=.9\linewidth]{fig/model/cycle_stealing.eps}$

(b)

$\includegraphics[width=.9\linewidth]{fig/model/single_vs_multi.eps}$

(c)

The goal of this thesis is thus twofold. First, we will develop new analytical tools for analyzing the performance of multiserver systems with resource sharing or job prioritization. Second, we will apply the new analytical tools to analyze various multiserver systems, addressing the fundamental questions regarding their performance. Our analysis leads to many lessons and guidelines useful in configuring multiserver systems and designing good resource allocation policies for multiserver systems.