15-857A: Performance Modeling & Design of Computer Systems
Classes: M,W 1:30 - 2:50, Room: GHC 4307 (new location)
Recitation: F 1:30 - 2:50, Room: GHC 4307 (new location)
(cross-listed with Tepper: 47-774 and 47-775)
Starts Wednesday September 9, 2009
Description:
In designing computer systems one is usually constrained by certain
performance requirements. For example, certain response times or
throughput might be required of the system. On the other hand, one
often has many choices: One fast disk, or two slow ones? What speed
CPU will suffice? Should we invest our money in more buffer space, or
a faster processor? Which migration policy will work best? Which
task assignment policy will work best? How can we redesign the
scheduling policy to improve the system performance?
Often answers to these questions are counter-intuitive. Ideally, one
would like to have answers to these questions before investing the
time and money to build a system. This class will introduce students
to analytic stochastic modeling with the aim of answering
questions such as those above.
Topics covered include:
- Operational Laws: Little's Law, response-time law, asymptotic bounds,
modification analysis, performance metrics;
- Markov Chain Theory: discrete-time Markov chains,
continuous-time Markov chains, renewal theory, time-reversibility;
- Poisson Process: memorylessness, Bernoulli splitting, uniformity,
PASTA;
- Queueing Theory: open networks, closed networks, M/M/1, M/M/k,
M/M/k/k, Burke's theorem, Jackson networks, classed networks,
load-dependent servers, BCMP result and proof, M/G/1 full analysis,
M/G/k, G/G/1;
- Simulations: time averages versus ensemble averages, confidence
intervals, generating random variables for simulation, Inspection
Paradox;
- Empirical Workload Measurements: heavy-tailed property,
Pareto distributions, self-similarity, heavy-tailed distributions;
- Analysis of Scheduling: FCFS, non-preemptive
priorities, preemptive priorities, PS, LCFS, FB, SJF, SRPT;
- Applications: Web servers, database management systems,
supercomputing server farms, call centers, web server farms, disks.
The techniques studied in this class are useful to students in
Computer Science, ECE, Mathematics, ACO, Tepper, Statistics, and Engineering.
This course is packed with open problems -- problems which if solved
are not just interesting theoretically, but which have huge
applicability to the design of computer systems today.
Class Reviews
PREREQUISITES:
Recommended for those with strong background in probability.
Assumes knowledge of continuous and discrete
distributions, conditional probability, conditional expectation,
moments, and some previous exposure to Markov Chains. Assumed material can be found in: "Introduction to
Probability Models" by Sheldon M. Ross, Chapters 1-3. You can borrow
this book from my office.
Highly recommended for CS, ECE, ACO, Tepper, and Mathematics students.
Note: There is an entrance exam for this
class. So make sure you've got the prerequisites.
TEACHING STAFF:
- Instructor: Prof. Mor Harchol-Balter. OFFICE HOURS: Wednesday 3-4 p.m. and Thursday 4-5 p.m. in Gates 7207, Phone: x8-7893. Most Wednesdays I'll be able to stay later.
- TA: Emre Nadar. OFFICE HOURS: TUESDAYS 4:30 - 5:30 p.m. and FRIDAYS 3:30 - 4:30 p.m. in GSIA (Tepper) A19B, Phone: x8-9871, Email: enadar@andrew.cmu.edu.
TEXTBOOK:
I will pass out my own course notes and some supplementary handouts and papers at the end of each class.
Some good reference texts are listed
here: BOOK LIST.
You can borrow most of these books from my office.
You should be comfortable with Undergraduate Probability before taking
this class. Please have the first 3 chapters of Sheldon Ross' Introduction
to Probability Models, or the equivalent, memorized. If you do not have this book,
I will be happy to loan it to you.
GRADING:
- 7 or 8 homeworks -- worth 40% total.
- Midterm 1 -- 20%.
- Midterm 2 -- 20%.
- One grading meeting during semester, includes problem design -- 10%
- Class participation -- 10%.
COLLABORATING vs. CHEATING:
You will receive a homework every week or week-and-a-half.
These will be difficult. Start immediately
so that you can take full advantage of office hours. You will find
office hours very helpful!
Some of these homework problems will
be repeated from previous years. The reason is that I have made up
all the problems myself and it takes a very long time to think up good
problems. Do not ask people who took this course in previous years to
help you with the homeworks. This is considered cheating and will be reported to the dean.
On the other hand, I strongly encourage you to collaborate with your
current classmates to solve the homework problems after you have tried
solving them by yourself. Each person must turn in a separate writeup. You should
note on your homework specifically which problems were a collaborative
effort and with whom.
ANTICIPATED DETAILED OUTLINE OF TOPICS FOR THIS CLASS:
PART I: Introduction, Operational Laws (laws that hold independent of any assumptions), Back-of-the-Envelope Bounds, and Modification Analysis.
- Overview + Motivating examples of power of queueing theory
- Queueing Terminology and Applications
- Introduce queueing theory terminology
- Define open networks with examples
- Define performance metrics for open networks: response time, throughput, utilization
- Define closed networks with examples: batch network, terminal-driven network
- Define performance metrics for closed networks: response time, throughput, utilization
- Hopefully start Little's Law!
- Little's Law
- Time averages vs. Ensemble Averages with applications to Simulations.
- Little's Law for Open system.
- Little's Law for Closed system.
- Full Proof of Little's Thm for open system.
- Full Proof of Little's Thm for closed system.
- Examples.
- Operational Law: Response Time Law.
- Asymptotic Bounds and Modification Analysis
- Another Operational Law: Forced Flow Law
- Another Operational Law: Bottleneck Law.
- Practice using Operational Laws in Combination.
- Asymptotic Bounds for Closed Systems.
- Modification Analysis for Closed Systems.
- Examples.
- Differences between open and closed networks.
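As a rough illustration of the kind of back-of-the-envelope reasoning Part I covers, here is a minimal Python sketch of the standard asymptotic bounds for a closed interactive system. It is not taken from the course notes. The function name, the argument names, and the example numbers are all made up for illustration; `demands` is assumed to hold the total service demand per job at each device, and `think_time` the mean think time.

```python
def closed_system_bounds(n_users, demands, think_time=0.0):
    """Asymptotic bounds for a closed interactive system (illustrative sketch).

    n_users    -- number of jobs/users circulating in the closed system (N)
    demands    -- total service demand per job at each device (D_i), in seconds
    think_time -- mean think time (E[Z]), in seconds; 0 for a batch system

    Returns (upper bound on throughput X, lower bound on mean response time E[R]).
    """
    d_total = sum(demands)   # D = sum of all device demands
    d_max = max(demands)     # demand at the bottleneck device

    # X <= min( N / (D + E[Z]), 1 / D_max )
    throughput_upper = min(n_users / (d_total + think_time), 1.0 / d_max)

    # E[R] >= max( D, N * D_max - E[Z] )
    response_lower = max(d_total, n_users * d_max - think_time)
    return throughput_upper, response_lower


def mean_number_in_system(arrival_rate, mean_time_in_system):
    """Little's Law for an open system: E[N] = lambda * E[T]."""
    return arrival_rate * mean_time_in_system


if __name__ == "__main__":
    # Hypothetical system: three devices, 5 seconds of think time.
    demands = [0.1, 0.2, 0.3]
    for n in (10, 50):
        x_max, r_min = closed_system_bounds(n, demands, think_time=5.0)
        print(f"N={n}: X <= {x_max:.3f} jobs/sec, E[R] >= {r_min:.3f} sec")
```

With few users the bounds are set by the total demand; with many users they are set by the bottleneck device alone, which is why speeding up a non-bottleneck device does little at high load.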
PART II: Traditional Queueing Theory (with all the usual assumptions:
Exponential Service, Poisson Arrivals, FCFS Scheduling). Strong
emphasis on applications/case studies.
- Discrete-Time Markov Chains
- Markov property: examples
- Limiting probabilities
- Definitions
- Ergodicity theorems
- Method 1 for solving for limiting probabilities: Stationary Eqns
- Examples using stationary eqns + Solving via Mathematica
- Discrete-Time Markov Chains Continued
- More on Ensemble Averages vs. Time Averages
- Limiting probability as rate
- Time-reversibility
- Method 2 for solving for limiting probabilities: Time-reversibility
- Practical applications of time-reversibility
- Examples -- how google.com uses DTMCs
- Exponential Distribution and Poisson Process
- Exponential Distribution
- Memorylessness
- Relationship between Exponential and Geometric
- Sum of Exponentials
- Probability Event 1 occurs before Event 2
- Poisson Process
- 3 Definitions and proof of equivalence
- Merging Poisson Processes
- Bernoulli splitting
- Uniformity
- Continuous Time Markov Chains (CTMC)
- Transition from Discrete Time Markov Chains to CTMCs.
- Balance Equations
- M/M/1 and Performance Metrics
- M/M/1 and variations
- Applications of M/M/1
- PASTA
- Full distribution of Time in System for M/M/1
- Finite buffers
- M/M/m/m
- M/M/m
- Applications
- All this is under exponential context!
- Comparison of distributed server configurations.
- Many hosts or single host?
- To balance load or not to balance load?
- To migrate or not to migrate?
- Setup cost for power management in M/M/1 and M/M/m
- Solving finite state CTMCs
- Examples -- including evaluating Ethernet efficiency.
- Buildup to Open Networks of EXP/FCFS queues
- Time-reversibility
- Burke's Thm
- Tandem queues
- Jackson Networks
- Jackson networks of queues
- Full Soln via local balance approach
- Applications
- Generalizations of Open Networks of EXP/FCFS queues
- Motivating examples
- Classed networks
- Full proof via local balance approach
- Closed networks
- Full derivation via local balance
- Applications
- BCMP
- BCMP theorems and proof
- What queueing theory can't model ...
- Matrix-Geometric Techniques
- Motivating examples
- Solution method
- Recent results
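To make the "many hosts or single host?" question from Part II concrete, here is a small Python sketch, under the usual Part II assumptions (Poisson arrivals, Exponential service, FCFS). It compares mean response time for one fast M/M/1 server, an M/M/k system with a shared queue, and k separate M/M/1 queues with arrivals split evenly. The function names and the example rates are illustrative assumptions, not part of the course material; the formulas are the standard M/M/1 and Erlang-C results.

```python
from math import factorial


def mm1_mean_response(lam, mu):
    """Mean response time E[T] = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if lam >= mu:
        raise ValueError("unstable: need lambda < mu")
    return 1.0 / (mu - lam)


def erlang_c(lam, mu, k):
    """Probability that an arrival must queue in an M/M/k system (Erlang-C)."""
    a = lam / mu          # offered load
    rho = a / k           # per-server utilization
    if rho >= 1.0:
        raise ValueError("unstable: need lambda < k * mu")
    tail = (a ** k / factorial(k)) / (1.0 - rho)
    head = sum(a ** i / factorial(i) for i in range(k))
    return tail / (head + tail)


def mmk_mean_response(lam, mu, k):
    """Mean response time for M/M/k: E[T] = P_Q / (k*mu - lambda) + 1/mu."""
    return erlang_c(lam, mu, k) / (k * mu - lam) + 1.0 / mu


if __name__ == "__main__":
    lam, mu, k = 1.5, 1.0, 2   # hypothetical rates (jobs/sec)
    print("one fast server (rate k*mu):", mm1_mean_response(lam, k * mu))
    print("k slow servers, shared queue:", mmk_mean_response(lam, mu, k))
    print("k slow servers, split arrivals:", mm1_mean_response(lam / k, mu))
```

In this Exponential setting the single fast server comes out ahead on mean response time; Part III revisits the same question when job sizes are highly variable, where the answer can change.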
PART III: "MODERN" Applied Queueing Theory: Measured
Heavy-tailed Workloads, Correlated Arrivals, Preemptive Service
Disciplines. (Applications include: Load Balancing in NOWs, Task
Assignment in Super-Computing Centers, Scheduling in a Web Server,
Scheduling in a Distributed Web Server, Power Management in Data Centers.)
- Empirical Measurements of Job sizes
- Pareto distribution measurements
- Application: Load balancing in Network of Workstations
- M/G/1
- Tagged Job -- first moment
- M/D/1
- M/Ek/1
- M/H2/1
- M/G/1
- Renewal Theory
- More topics on Simulations
- Inspection Paradox
- Transforms
- Laplace Transforms
- Moments
- Linearity properties
- Z-Transforms
- Moments
- Linearity properties
- Theorems on Laplace and z-transforms
- Examples
- Application of transforms to determining busy periods.
- M/G/1
- All moments via Laplace Transforms
- Supplementary Random Variables -- LIKELY SKIP
- Technique
- Examples of use.
- Fluid Approximations -- LIKELY SKIP
- Technique
- Examples of use.
- Applications of M/G/1
- M/G/1 with setup cost.
- To balance load or not to balance?
- Many hosts or single host?
- Task assignment in a distributed server of FCFS hosts.
- M/G/k
- Confidence Intervals
- Definition
- Proofs
- Examples
- Why confidence intervals don't work
- Measured Arrival Processes -- Self-similarity -- LIKELY SKIP
- Effect of Variability in Arrival Process:
M/M/1 vs. Ek/M/1 vs. H2/M/1 vs. Ph/M/1 vs. G/M/1
- Why Poisson Process is Used -- Model of Inf # cust. each with GI/G/1
- When the Poisson Process approximation doesn't make sense
- Illustrations of Measured Arrival Processes
- Short-range autocorrelations --
The case for compositions of Poisson Processes
- Long-range autocorrelations -- Self-Similarity -- alpha parameter
- Model: Superposition of processes with heavy-tailed on-times
- How to measure your arrival process.
- How to tell if it's Poisson.
- How to tell if there's correlation.
- Estimating the alpha parameter.
- Examples and Homework Examples -- go out and get measurements.
- Open research
- What is the effect of getting the arrival
process wrong?
- E.g., In deriving good load balancing algorithm,
getting the workload right is crucial -- how about getting
the arrival process right?
- Scheduling (with full proofs using Laplace transforms):
- Scheduling: Part I
- Performance metrics
- Non-preemptive scheduling policies
- Scheduling: Part II
- Processor-Sharing
- Task assignment in Processor-Sharing distributed server
- Preemptive-LCFS
- Scheduling: Part III
- Comparison of Scheduling algorithms
- Priority Queueing
- Non-preemptive priority
- Preemptive priority
- Scheduling: Part IV
- SJF
- SRPT
- Application: Scheduling in Web servers.
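As a pointer to why Part III cares so much about job-size variability, here is a minimal Python sketch of the Pollaczek-Khinchine formula for mean delay in an M/G/1/FCFS queue, E[T_Q] = lambda * E[S^2] / (2 * (1 - rho)) with rho = lambda * E[S]. The function names and the example parameters are illustrative assumptions. The squared coefficient of variation C^2 = Var(S) / E[S]^2 is used to build E[S^2] for the comparison.

```python
def mg1_mean_delay(lam, mean_s, second_moment_s):
    """Mean time in queue for M/G/1/FCFS (Pollaczek-Khinchine formula).

    lam             -- Poisson arrival rate (lambda)
    mean_s          -- mean job size E[S]
    second_moment_s -- second moment of job size E[S^2]
    """
    rho = lam * mean_s
    if rho >= 1.0:
        raise ValueError("unstable: need lambda * E[S] < 1")
    return lam * second_moment_s / (2.0 * (1.0 - rho))


def second_moment_from_scv(mean_s, scv):
    """E[S^2] = (C^2 + 1) * E[S]^2, where C^2 is the squared coeff. of variation."""
    return (scv + 1.0) * mean_s ** 2


if __name__ == "__main__":
    lam, mean_s = 0.5, 1.0   # hypothetical load rho = 0.5
    # C^2 = 0 is Deterministic (M/D/1), C^2 = 1 is Exponential (M/M/1);
    # C^2 = 25 stands in for a highly variable, heavy-tailed-like workload.
    for label, scv in (("M/D/1", 0.0), ("M/M/1", 1.0), ("high variability", 25.0)):
        delay = mg1_mean_delay(lam, mean_s, second_moment_from_scv(mean_s, scv))
        print(f"{label}: E[T_Q] = {delay:.2f}, E[T] = {delay + mean_s:.2f}")
```

At the same load, mean delay under FCFS grows linearly in C^2 + 1, which is one motivation for the task assignment and scheduling policies (PS, SJF, SRPT) listed above.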