Timothy Zhu

Graduate Student

Computer Science Department
Carnegie Mellon University

Office: Gates Hillman Center (GHC) 7010
Email: timothyz (at) cs (dot) cmu (dot) edu

Advisor: Mor Harchol-Balter

CV

Thesis proposal


Research:

I am interested in the performance analysis and design of computer systems. I enjoy building systems and finding practical ways of solving resource management and scheduling problems using mathematically sound techniques.

My main research focus is on how to meet tail latency Service Level Objectives (SLOs) in shared storage and networks. Long tail latencies are pervasive in datacenter environments, and many companies and researchers are working to control latency better. Congestion is one of the main sources of tail latency, and I believe that analysis techniques such as Stochastic Network Calculus (SNC) and Deterministic Network Calculus (DNC) are useful tools for determining how to control congestion. Our IOFlow paper (SOSP 2013) introduces a QoS architecture that provides rate control and prioritization of storage and network I/O. Our PriorityMeister paper (SoCC 2014) addresses how to automatically configure priorities and rate limits to meet tail latency SLOs using DNC. I'm continuing this line of work in my thesis, and I look forward to exploring other ways of using resources more efficiently to meet performance goals such as tail latency SLOs.

I have also worked on cluster scheduling problems and am interested in better ways of scheduling jobs to take advantage of specialized resources. Heterogeneous resources raise new scheduling questions. For example, is it beneficial to statically partition specialized resources, or to dynamically schedule a large pool of heterogeneous resources? When dynamically scheduling such a pool, should the scheduler wait for specialized resources to become available in the future, or use slower alternative resources that are available immediately? We have started to investigate some of these questions in our TetriSched work, but more research remains to be done in this area.

I believe that underlying all of these resource management problems is a need for automated configuration of resources. It is too difficult and cumbersome for operators to constantly optimize system parameters. Furthermore, the knowledge and expertise behind better performance analysis techniques can be embedded into systems that tune parameters automatically. In some of my earlier work, I looked at tuning the number of VMs. In our HotCloud 2012 work, I investigated techniques for elastically scaling memcached resources to reduce the costs of cloud web services. During an internship at Google, I also investigated auto-scaling resources to meet deadlines for multi-phase batch jobs. More recently, I have been looking at tuning QoS parameters to meet tail latency SLOs. I am excited about research problems in automatically managing resources to meet performance goals, and I hope to continue working on these types of problems.

Publications:

  • TetriSched: Global Rescheduling with Adaptive Plan-ahead in Dynamic Heterogeneous Clusters
    Alexey Tumanov, Timothy Zhu, Jun Woo Park, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    EuroSys 2016, Best Student Paper Award [pdf]
  • PriorityMeister: Tail Latency QoS for Shared Networked Storage
    Timothy Zhu, Alexey Tumanov, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    SoCC 2014 [pdf]
  • TetriSched: Space-Time Scheduling for Heterogeneous Datacenters
    Alexey Tumanov, Timothy Zhu, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger
    CMU PDL Technical Report CMU-PDL-13-112, Dec 2013 [pdf]
  • IOFlow: A Software-Defined Storage Architecture
    Eno Thereska, Hitesh Ballani, Greg O'Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, Richard Black, Timothy Zhu
    SOSP 2013 [pdf]
  • SOFTScale: Stealing Opportunistically For Transient Scaling
    Anshul Gandhi, Timothy Zhu, Mor Harchol-Balter, Michael A. Kozuch
    Middleware 2012
    CMU Technical Report CMU-CS-12-111 [pdf] (extended version)
  • Saving Cash by Using Less Cache
    Timothy Zhu, Anshul Gandhi, Mor Harchol-Balter, Michael A. Kozuch
    HotCloud 2012 [pdf]