|
Mor Harchol-Balter Bruce J. Nelson Professor
of Computer Science
-
Dept. of Computer Science
- Carnegie Mellon University
- Pittsburgh, PA 15213
- Office: Gates Hall 7207
- Phone: (412) 268-7893
- Fax: (412) 268-5576
- harchol@cs.cmu.edu
- Exec. Assistant:
- Patricia Loring (GHC 7025)
-
-
|
|
-
-
|
Introduction for students:
Link to my upcoming IC talk slides 2021
I am interested in the performance analysis and design of computer
systems, particularly distributed systems. I use analytical models to capture the important characteristics of a
computer system, and then I prove theorems about these models that
allow me to redesign the system to improve its
performance. Here "performance" might denote response time, energy use, throughput, capacity, etc. Most of my research involves inventing new analytical techniques in the area of performance analysis, as well as new algorithms for resource allocation.
Unlike many theoretical computer scientists, my analysis is based on stochastic (probabilistic) models of computer systems. There is no "adversary" sending us worst-case inputs. By contrast, there is a stream of requests, whose arrival times and service demands come from empirically fitted distributions. These distributions might be correlated, and they often exhibit heavy tailed service demands and high variability.
I believe that many conventional wisdoms on which we base computer system
designs are not well understood and sometimes false, leading to
inferior designs. My research revisits very classic questions in system design.
Here are a few examples of commonly-held beliefs that my research challenges:
- Thousands of "load balancing" heuristics do exactly that -- they
aim to balance the load among the existing hosts. But is load balancing
necessarily a good thing?
- Ever notice that the optimal scheduling policy,
Shortest-Remaining-Processing-Time-First (SRPT), is rarely used in
practice? There's a fear that the big jobs will "starve", or be
treated unfairly as compared with Processor-Sharing (PS). But is
this a reality?
- To minimize mean waiting time in a server farm, research suggests that each job should be sent to the server at which it will experience the least wait. That seems good from the job's perspective, but is the greedy strategy best for the system as a whole?
- Given a choice between a single machine with speed s, or n
identical machines each with speed s/n, which would you choose?
Think again ...
- Migrating active jobs is generally considered too expensive.
Killing jobs midway through execution and restarting them from scratch
later is even worse! Says who?
- Cycle stealing (using another machine's idle cycles to do your work) is
a central theme in distributed systems and has been the topic of thousands
of papers. But can we quantify when cycle stealing pays off, as a function of switching costs and threshold parameters?
- Redundancy is the idea of making multiple copies of the same job, where each copy is dispatched to a different queue, and where one only needs to wait for the first copy to complete. When does redundancy help?
Research Specifics:
I work on the design and performance analysis of computer systems including both theory
and implementation:
Algorithmic Work:
Designing and analyzing algorithms for:
resource allocation, task/job scheduling, load sharing, routing, cycle stealing, replication, power-management, multi-class/multi-server scheduling, multi-core parallel scheduling, fairness.
Correlated arrival processes, and analysis under
high-variability workloads. ``All-Can-Win Theorem'', demonstrating that scheduling policies which are biased towards favoring small jobs can also be preferable to large jobs.
Stochastic Analysis & Queueing Techniques:
Developing new methods for stochastic
analysis. Examples include: (i) Recursive Dimensionality Reduction (RDR), a technique
to reduce a multi-dimensional Markov chain to a 1-dimensional chain, by using busy period
transitions; (ii) Recursive Renewal Reward (RRR) and Clearing Analysis on Phases (CAP), techniques that allow one to
obtain closed-form solutions for many one-dimensional infinite repeating Markov chains, including the M/M/k with setup chain; (iii) Exact analysis of Replication Systems, the first exact solution (in product-form) for replication systems, involving any number of servers, and number of classes, and any degree of job replication;
(iv) SOAP Analysis of Scheduling Policies, the first exact response time analysis of a huge class of scheduling policies with no prior analysis, including the Gittins Index; (v) Optimal Scheduling for Multi-server systems, the first algorithms and analysis for optimal scheduling in the M/G/k, both with known and unknown job sizes.
Computer systems implementation:
Resource management/scheduling in data
centers and supercomputing centers. Implementation of QoS tail
latency guarantees for flows in networked storage systems (PriorityMeister and SNC-Meister).
Online scheduling of jobs with heterogeneous preferences (TetriSched). Dynamic power management in multi-tier data centers
(AutoScale), adopted by Facebook. Scheduling of Memory Controllers (ATLAS).
Pricing and queueing optimizations for cloud services. Kernel-level SRPT
connection scheduling in Web servers (SyNC). Scheduling to cope with
transient overload in Web servers.
Modeling and Workload characterization:
Known for the discovery of Pareto heavy-tailed distribution of UNIX process CPU lifetimes. Statistical characterization of workloads including UNIX processes, web, OLTP, supercomputing, memory workloads, datacenter workloads, and parallel multi-core jobs.
My TENURE RESEARCH STATEMENT from May 2007:
SOME TALKS
- (2024) ValueTools: Can Increasing the Hit Ratio Hurt Cache Throughput?"
talk slides .
- (2017) CanQueue Keynote & MIT-LIDS Keynote:
Queueing With Redundant Requests:
talk slides and
abstract .
- (2015) ISCA Tutorial & ICDCS Keynote:
What Queueing Theory Teaches Us about Computer Systems Design:
talk slides and
abstract .
- (2014) GreenMetrics Keynote:
Power Management in Data Centers: Theory & Practice:
talk slides and
abstract .
- (2009) Surprising Results on Task Assignment for High-Variability Workloads:
talk slides and abstract .
- (2007) ORSIS '07 Keynote, Lunteren '07 Keynote, MASCOTS '08 Keynote, SIPEW '08 Keynote:
Scheduling in Server Farms:
talk slides and abstract .
- (2006) Analysis of Cycle Stealing and Priority Queueing in Multi-Server Systems via Dimensionality Reduction Technique:
talk slides and abstract .
- (2005) Scheduling Your Network Connections: the SYNC project
talk slides and abstract .
- (2006) Open Workload Generators versus Closed: A cautionary tale
Adam's talk slides and abstract .
- (2004) What Analytical Performance Modeling Teaches Us About Computer System Design
talk slides and abstract .
Graduated PhD STUDENTS
- Ben Berg . Assistant Professor at UNC, Chapel-Hill, Computer Science Department. Link to Ben's thesis .
- Sherwin Doroudi .
Assistant Professor at U. Minnesota, ISyE.
Link to Sherwin's thesis . Winner
2016 William W. Cooper Doctoral Dissertation Award in Management or Management Science.
- Anshul Gandhi .
Associate Professor at SUNY Stony Brook, Computer Science Department.
Link to Anshul's thesis . Winner SPEC Dissertation Award. Winner SIGMETRICS Rising Star Award.
- Kristy Gardner .
Assistant Professor at Amherst College, Computer Science Department.
Link to Kristy's thesis .
- Isaac Grosof . Assistant Professor at Northwestern, ISyE. Co-winner INFORMS 2022 George Nicholson Prize. Winner of 2023 SIGMETRICS Doctoral Dissertation Award. Link to Isaac's thesis .
- Varun Gupta
Associate Professor at University of Chicago, Booth School of Business.
Link to Varun's thesis .
- Takayuki Osogami
Researcher at IBM-TRL.
Link to Taka's thesis .
- David McWherter
Researcher at Google, Seattle.
Link to David's thesis .
- Bianca Schroeder
Associate Professor at University of Toronto, Computer Science.
Link to Bianca's thesis .
- Ziv Scully
Assistant Professor at Cornell, Operations Research and Industrial Engineering Department.
Link to Ziv's thesis .
Co-winner INFORMS 2022 George Nicholson Prize. Also winner 2022 SIGMETRICS Doctoral Dissertation Award.
- Adam Wierman
Full Professor at Caltech, Computer Science. Department Head of Computer Science.
Link to Adam's thesis . Winner of 2007 SCS Distinguished Dissertation Award. Winner SIGMETRICS Rising Star Award.
- Timothy Zhu
Assistant Professor at Penn State University, Computer Science Department. Link to Timmy's thesis .
Current PhD STUDENTS
KEYNOTE PRESENTATIONS
CONFEST 2024, QTECQS 2023, ITC 2022, MAM 2022, UEMCON 2021, DDQC 2021, QTNA 2019, YEQT 2018, 40th Dutch Queueing Colloquium 2018, MIT LIDS Student Conference 2017, CanQueue 2016, ACM SIGMETRICS 2016, ICDCS 2015, GreenMetrics 2014, MASCOTS 2008, SIPEW 2008, Lunteren Conference 2007, ORSIS 2007.
ANNUAL TALK on Applying to Ph.D. Programs in Computer Science (pdf) .
TEACHING
|