Thesis: Stochastic Models and Analysis for Resource Management in Server Farms

Server farms are popular architectures for computing infrastructures such as supercomputing centers, data centers and web server farms. As server farms become larger and their workloads more complex, designing efficient policies for managing their resources via trial-and-error becomes intractable. Stochastic modeling and analysis techniques are powerful tools which have been successfully employed to understand the performance of such complex systems and to guide the design of policies that optimize performance. However, as computing paradigms and applications have evolved at a rapid pace, the assumptions of existing server farm models, which were motivated by telephone networks, inventory management systems, and call centers, have become inadequate for compute server farms.

There are numerous disconnects between traditional models of multi-server systems and how today's server farms operate. To cite a few:
   (i) Unlike call durations, the service requirements of supercomputing jobs and file transfers have high variance, and this critically affects the optimality and performance of scheduling policies.
   (ii) Most existing analysis of server farms focuses on the First-Come-First-Served (FCFS) scheduling discipline, while time sharing servers (e.g., web and database servers) are better modeled by the Processor-Sharing (PS) scheduling discipline.
   (iii) Time sharing systems typically exhibit thrashing (resource contention) which limits the achievable concurrency level, but traditional models of time sharing systems ignore this fundamental phenomenon.
   (iv) Recently, energy consumption has become an important metric in managing server farms. State-of-the-art servers come with multiple knobs to control energy consumption, but traditional queueing models do not take either the energy metric or these control knobs into account.
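
As a rough illustration of points (i) and (ii), the sketch below compares mean response time under FCFS and PS for a single M/G/1 queue, using the standard Pollaczek-Khinchine formula for FCFS and the insensitivity result for PS. It is a textbook single-server simplification, not one of the multi-server models developed in the thesis; the function names and the parameter values (`lam`, `mean_s`, `scv`) are hypothetical and chosen only for illustration.

```python
def mg1_fcfs_mean_response(lam: float, mean_s: float, scv: float) -> float:
    """Mean response time of an M/G/1/FCFS queue (Pollaczek-Khinchine).

    lam    : Poisson arrival rate
    mean_s : mean service requirement E[S]
    scv    : squared coefficient of variation of S, Var(S) / E[S]^2
    """
    rho = lam * mean_s
    if not 0 <= rho < 1:
        raise ValueError("load rho = lam * mean_s must be in [0, 1)")
    # E[S^2] = (scv + 1) * E[S]^2, so the mean wait is lam * E[S^2] / (2 * (1 - rho)).
    second_moment = (scv + 1.0) * mean_s ** 2
    return mean_s + lam * second_moment / (2.0 * (1.0 - rho))


def mg1_ps_mean_response(lam: float, mean_s: float) -> float:
    """Mean response time of an M/G/1/PS queue.

    Under PS the mean depends on the service distribution only through E[S].
    """
    rho = lam * mean_s
    if not 0 <= rho < 1:
        raise ValueError("load rho = lam * mean_s must be in [0, 1)")
    return mean_s / (1.0 - rho)


if __name__ == "__main__":
    lam, mean_s = 0.5, 1.0  # assumed example values: load rho = 0.5
    for scv in (1.0, 10.0, 99.0):
        fcfs = mg1_fcfs_mean_response(lam, mean_s, scv)
        ps = mg1_ps_mean_response(lam, mean_s)
        print(f"scv={scv:5.1f}  FCFS E[T]={fcfs:7.2f}  PS E[T]={ps:5.2f}")
```

With exponential service requirements (`scv = 1`) the two disciplines have the same mean response time, while under FCFS the mean grows linearly in `scv`. This is one simple way to see why models calibrated to low-variance call durations can mislead when applied to computing workloads.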

In this thesis we attempt to bridge some of these disconnects by bringing the stochastic modeling and analysis literature closer to the realities of today's compute server farms. We introduce new queueing models for computing server farms, develop stochastic analysis techniques to evaluate and understand these queueing models, and use the analysis to propose resource management policies to optimize their performance.

Thesis Draft: [pdf]
Defense Slides: [pdf]

Thesis Committee:
Dave Andersen
Computer Science Department
Carnegie Mellon University
Anupam Gupta
Computer Science Department
Carnegie Mellon University
Mor Harchol-Balter
Computer Science Department
Carnegie Mellon University
Alan Scheller-Wolf
Tepper School of Business
Carnegie Mellon University
Devavrat Shah
Don Towsley
Department of Computer Science
UMass. (Amherst)