Configuring multiserver systems with multiple priority classes (Chapter 4)

Next: Improving traditional task assignment Up: Synopsis of Part II: Previous: Synopsis of Part II: Contents

Configuring multiserver systems with multiple priority classes (Chapter 4)

In Chapter 4, we address a fundamental question in designing multiserver systems, namely ``How many servers are best to minimize mean response time?'' As may be expected, this question has a long history in the literature; however, it has not been fully resolved. As early as 1958, Morse observed that for an M/M//FCFS system^1.3, the optimal number of servers is one () [131]. In 1970, Stidham formalized and generalized the observation by Morse [188]; Stidham showed that a single server is optimal in a G/Er//FCFS system, namely when the arrival process is a general process and the service demand has an Erlang distribution (i.e., when the service demand is a sum of i.i.d., or independent identically distributed, exponential random variables). Stidham's result is extended by Brumelle to the system where the service demand is at most as variable as the exponential random variable; i.e., a single server is optimal in a G/G//FCFS system where the service demand distribution has the squared coefficient variation at most one ( $C^2\leq 1$ ) [31]. For general service demand distributions, only results for limiting cases are known. Specifically, the work by Reiman and Simon [161] can be used to show that a single server () is optimal in the light traffic limit (when servers are almost always idle), and the work by Iglehart and Whitt [79] implies that a multiserver system (GI/GI//FCFS) and its single server counterpart (GI/GI/1/FCFS) coincide in the heavy traffic limit (when servers are almost always busy). In fact, based on these results in the light and heavy traffic limits, Mandelbaum and Reiman state ``One expects, therefore, that the difference in performance between the systems^1.4is maximal in light traffic, and it diminishes as utilization increases to heavy traffic.'' [120].

Contrary to the prior work that suggests the optimality of single server systems in special/limiting cases, we will see that a multiserver system often provides lower mean response time than its single server counterpart. Specifically, we will see that the optimal number of servers increases as the load and/or the service demand variability increases. Intuition behind this finding is that multiple servers allow short jobs to avoid queueing behind long jobs; by contrast, a long job blocks all the other jobs in a single server system, while being processed.

Above we assume that jobs are served in the order of their arrivals (FCFS); however, real world systems are not always FCFS. Rather, there is often inherent scheduling in the system to allow high priority jobs to move ahead of low priority jobs (priority scheduling). For example, the high priority jobs may carry more importance, e.g., representing users who pay more. Alternatively, high priority jobs may simply consist of those jobs with small service demand, since giving priority to short jobs reduces mean response time overall.

The prevalence of priority scheduling motivates us to study the question ``How many servers are best?'' under priority scheduling. In particular, how does the optimal number of servers under priority scheduling compare to that under FCFS, and how is the answer affected by system parameters such as loads and service demand distributions? To answer these questions, we analyze the mean response time in a multiserver systems with multiple priority classes via DR, as its analysis involves a multidimensional Markov chain.

Our analysis shows that the optimal number of servers can be quite different between FCFS and priority scheduling. Intuition behind this finding is seen by considering an example where shorter jobs have higher priority: in this case, the benefit of multiple servers (i.e., allowing short jobs to avoid queueing behind long jobs) is smaller under priority scheduling, since priority scheduling by itself allows small jobs to jump ahead of long jobs. We will also study how the optimal number of servers differs for different priority classes, and how the mean response time is affected by choosing a non-optimal number of servers. We will also evaluate the accuracy of various existing approximations for multiserver systems with multiple priority classes, and see that the error in the mean response time can be as high as 50% or more.

Next: Improving traditional task assignment Up: Synopsis of Part II: Previous: Synopsis of Part II: Contents

Takayuki Osogami 2005-07-19