Timothy Zhu

Graduate Student

Computer Science Department
Carnegie Mellon University

Office: Gates Hillman Center (GHC) 7010
Email: timothyz (at) cs (dot) cmu (dot) edu

Advisor: Mor Harchol-Balter

Thesis proposal

I am graduating this year and am on the job market!

I am looking for a tenure-track faculty position in systems. Here are my Research Statement, Teaching Statement, and CV.

Research summary:

I am interested in designing and implementing computer systems that use novel resource management and scheduling techniques to meet performance goals. I believe that efficiently utilizing resources will require building automated performance management tools based on mathematically sound performance analysis models. Below are examples of automated performance management systems I have built.
  • Quality of Service (QoS) support for tail latency SLOs [details]
    My main research focus is on how to meet tail latency Service Level Objectives (SLOs) in shared storage and networks. The problem of long tail latencies is pervasive in datacenter environments, and many companies and researchers are trying to better control latency. Congestion is one of the main sources of tail latency in shared environments. Our IOFlow paper (SOSP 2013) introduces a QoS architecture for controlling congestion via rate limiting and prioritization of storage and network I/O. Our PriorityMeister paper (SoCC 2014) addresses how to automatically configure priorities and rate limits to meet tail latency SLOs using a Deterministic Network Calculus (DNC) analysis. Our SNC-Meister paper (SoCC 2016) shows significant improvements in admission control when using a probabilistic analysis called Stochastic Network Calculus (SNC) instead of the worst-case DNC analysis. We are the first to build a computer system based on SNC, and our code is publicly available at: https://github.com/timmyzhu/SNC-Meister.
  • Cluster scheduling on heterogeneous resources [details]
    I have also worked on cluster scheduling problems and am interested in ways of better scheduling jobs to take advantage of specialized resources. With heterogeneous resources come new questions in scheduling. For example, is it beneficial to statically partition specialized resources, or to dynamically schedule a large pool of heterogeneous resources? When dynamically scheduling a large pool of heterogeneous resources, should the scheduler wait for specialized resources to become available in the future or use slower alternative resources that are immediately available? Our TetriSched paper (EuroSys 2016) introduces a new cluster scheduler that optimizes when and where to run jobs so as to improve performance in heterogeneous clusters.
  • Autoscaling [details]
    Autoscaling is a useful technique for adapting resource utilization to load. In my CacheScale (HotCloud 2012) work, I investigate techniques for elastically scaling memcached resources to reduce costs of cloud web services. As an alternative to autoscaling memcached servers, our SOFTScale (Middleware 2012) work performs cycle-stealing on memcached servers to help deal with bursts of work during periods of low load. I have also investigated autoscaling resources to meet deadlines for multi-phase batch jobs during an internship at Google.
I believe that underlying all of these resource management problems is a need for automated configuration of resources. It is too difficult and cumbersome for IT operators to constantly optimize system parameters. Furthermore, it is possible to embed the knowledge and expertise from better performance analysis techniques into systems that automatically tune parameters. I am excited about research problems in automatically managing resources to meet performance goals, and I hope to continue working on these types of problems.
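As a rough illustration of the worst-case style of analysis mentioned in the QoS item above, the sketch below computes the textbook single-hop delay bound from deterministic network calculus: a flow rate-limited by a token bucket (rate r, burst b) passing through a server that guarantees rate R after latency T sees a delay of at most T + b/R, provided r <= R. This is only a minimal sketch of the general idea, not the analysis used in PriorityMeister or SNC-Meister; the names `TokenBucket`, `RateLatencyServer`, `worst_case_delay`, and `admit` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Rate limit (assumed model): at most burst + rate * t bytes in any t seconds."""
    rate: float   # sustained rate, bytes per second
    burst: float  # maximum burst size, bytes


@dataclass(frozen=True)
class RateLatencyServer:
    """Service guarantee (assumed model): at least rate * max(0, t - latency) bytes in t seconds."""
    rate: float     # guaranteed service rate, bytes per second
    latency: float  # worst-case initial delay, seconds


def worst_case_delay(flow: TokenBucket, server: RateLatencyServer) -> float:
    """Textbook single-hop bound: latency + burst / service rate."""
    if flow.rate > server.rate:
        # The flow can send faster than it is served, so no finite bound exists.
        return float("inf")
    return server.latency + flow.burst / server.rate


def admit(flow: TokenBucket, server: RateLatencyServer, slo_seconds: float) -> bool:
    """Admit the flow only if its worst-case delay fits within the latency SLO."""
    return worst_case_delay(flow, server) <= slo_seconds


if __name__ == "__main__":
    flow = TokenBucket(rate=50e6, burst=1e6)
    server = RateLatencyServer(rate=100e6, latency=0.002)
    print(worst_case_delay(flow, server), admit(flow, server, slo_seconds=0.02))
```

Because the bound assumes the full burst always arrives at the worst possible moment, admission decisions based on it tend to be conservative, which is the motivation for a probabilistic analysis when the SLO targets a tail percentile rather than the absolute worst case.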


Publications:

  • SNC-Meister: Admitting More Tenants with Tail Latency SLOs
    Timothy Zhu, Daniel S. Berger, Mor Harchol-Balter
    SoCC 2016 [To appear]
  • TetriSched: Global Rescheduling with Adaptive Plan-ahead in Dynamic Heterogeneous Clusters
    Alexey Tumanov, Timothy Zhu, Jun Woo Park, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    Best student paper award at EuroSys 2016 [pdf]
  • PriorityMeister: Tail Latency QoS for Shared Networked Storage
    Timothy Zhu, Alexey Tumanov, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    SoCC 2014 [pdf]
  • TetriSched: Space-Time Scheduling for Heterogeneous Datacenters
    Alexey Tumanov, Timothy Zhu, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    CMU PDL Technical Report CMU-PDL-13-112, Dec 2013 [pdf]
  • IOFlow: A Software-Defined Storage Architecture
    Eno Thereska, Hitesh Ballani, Greg O'Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, Richard Black, Timothy Zhu
    SOSP 2013 [pdf]
  • SOFTScale: Stealing Opportunistically For Transient Scaling
    Anshul Gandhi, Timothy Zhu, Mor Harchol-Balter, Michael A. Kozuch
    Middleware 2012
    CMU Technical Report CMU-CS-12-111 [pdf] (extended version)
  • Saving Cash by Using Less Cache
    Timothy Zhu, Anshul Gandhi, Mor Harchol-Balter, Michael A. Kozuch
    HotCloud 2012 [pdf]