15-858 Section C
Algorithms and Analysis for Large-Scale Cloud Computing Systems
Time: Tuesday and Thursday 10:30 AM - 11:50 AM
Location: GHC 4303
Instructor: Weina Wang weinaw “at” cs “dot” cmu “dot” edu
Office hours: anytime, please send an email first
Prerequisite: undergraduate probability, mathematical maturity in general
Large-scale cloud computing systems are at the heart of today’s emerging technologies. A key challenge in advancing the state-of-the-art is to design systems with stringent performance guarantees such as high-throughput and low-delay. To tackle this challenge, it is important to understand how algorithms in various components of the system have an impact on the performance. This course introduces theories and techniques from applied probability and stochastic processes for building models and analyzing algorithms. The goal is to address questions such as: How should computation tasks be assigned to the large number of servers in a cloud computing system? How should the bandwidth in a datacenter network be allocated to achieve fast data transfer? How should data be distributed and how does that affect the efficiency of computation?
Topics covered include:
- Load-Balancing: Join-the-shortest-queue, power-of-d-choices
- Data transfer in datacenter networks: flow-level modeling, multi-commodity flow algorithms
- Datacenter switch: data packet dynamics, matching algorithms
- Data storage and caching: distributed storage, data locality
- Markov chains, Foster-Lyapunov theorem, queueing theory, mean-field analysis
After successfully completing this course, you will
- gain or improve your skills in stochastic modeling of modern computing systems;
- learn advanced tools for designing and analyzing algorithms for stochastic systems;
- have opportunities to identify and collaborate on new research problems.
- R. Srikant and Lei Ying. Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press, 2013.
- Bruce Hajek. Random Processes for Engineers, Cambridge University Press, 2015. (Downloadable from Hajek’s website)
- Robert Gallager’s course notes. (Downloadable on MIT OpenCourseWare. Putting these notes together we actually get his book.)
- Mor Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.
[Sep 4] Introduction: A datacenter from the modeling perspective
[Sep 6] Discrete-time Markov chain (DTMC) (References: Chapter 3.3 of Srikant and Ying’s book, Chapter 2 & 6 of Hajek’s book, Chapter 5 of Gallager’s book/notes)
[Sep 11] Foster-Lyapunov stability criterion with application in load-balancing
[Sep 13] Applying Foster-Lyapunov to the giant switch abstraction of datacenter networks
[Sep 18] Moment bounds given by the drift
[Sep 20] The drift method (I)
[Sep 25] The drift method (II)
[Sep 27] The drift method (III)
[Oct 2] Continuous-time Markov chain (CTMC)
[Oct 4] No class
[Oct 9] Extending Foster-Lyapunov and the drift method to CTMC
[Oct 11] Course project presentation
[Oct 16] Course project presentation
[Oct 18] Mean-field analysis: Cavity method and differential equations method
[Oct 23] Mean-field analysis: Kurtz’s theorem example
[Oct 25] Mean-field analysis: Kurtz’s theorem proof and more examples
[Oct 30] Mean-field analysis: Interchange of limits, asymptotic independence
[Nov 1] Mean-field analysis: asymptotic independence, neural network example
[Nov 6] No class
[Nov 8] Data management in datacenters: Data locality and data transfer
[Nov 13] A connection-level model for data transfer, multi-commodity flow algorithms
[Nov 14] (Makeup class) Open problems in modern large-scale computing systems
[Nov 15] Course project presentation
[Nov 20] Course project presentation
[Nov 27] Course project presentation
[Nov 29] Course project presentation
You will have homework every other week.
Course project: 40%
You will choose a topic of your interest and read up papers in the literature. I will suggest some topics and papers. But you are free to choose any topic that you find particularly interesting as long as it is related to this course. You can work by yourself or together with another student. Your team will submit a brief report every four weeks to document your progress, and a final report at the end of the semester. Of course you are also welcome to have discussions with me anytime. Your team will also give a 40 min presentation to show your findings. You are encouraged to identify and solve problems beyond the existing work in the literature – there will be bonus in your grade and I am happy work with you if desired to help you submit your work to a conference.
- Homework: feel free to discuss with your fellow classmates. But you should write your own answers.
- Project: if you work together as a team of two, please indicate the contribution of each of you in the reports.
- CMU academic integrity policy
Accommodations for students with disabilities: If you have a disability and require accommodations, please contact Catherine Getchell, Director of Disability Resources, 412-268-6121, email@example.com. If you have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate.
Student wellness: As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may diminish your academic performance and/or reduce your ability to participate in daily activities. CMU services are available, and treatment does work. You can learn more about confidential mental health services available on campus at: http://www.cmu.edu/counseling/. Support is always available (24⁄7) from Counseling and Psychological Services: 412-268-2922.