Improving Systems Management Policies Using Hybrid Reinforcement Learning
Reinforcement Learning (RL) provides a promising new approach to systems performance management that differs radically from standard queuing-theoretic approaches, which rely on explicit system performance models. In principle, RL can automatically learn high-quality management policies without explicit performance models or traffic models, and with little or no built-in system-specific knowledge. Previously we showed that online RL can learn to make high-quality server allocation decisions in a multi-application prototype data center scenario. The present work shows how to combine the strengths of both RL and queuing models in a hybrid approach, in which RL trains offline on data collected while a queuing model policy controls the system. Training offline avoids the potentially poor performance that live online training can incur. Our latest results show that, in both open-loop and closed-loop traffic, hybrid RL training can achieve significant performance improvements over a variety of initial model-based policies. We also offer several insights into how RL can, as expected, deal effectively with both transients and switching delays, which lie outside the scope of traditional steady-state queuing theory.
Gerry Tesauro received a PhD in theoretical physics from Princeton University in 1986, and owes his subsequent conversion to machine learning research in no small part to the first Connectionist Models Summer School, held at Carnegie Mellon in 1986. Since then he has worked on a variety of ML applications, including computer virus recognition, intelligent e-commerce agents, and most notoriously, TD-Gammon, a self-teaching program that learned to play backgammon at human world championship level. He has also been heavily involved for many years in the annual NIPS conference, and was NIPS Program Chair in 1993 and General Chair in 1994. He is currently interested in applying the latest and greatest ML approaches to a huge emerging application domain of self-managing computing systems, where he foresees great opportunities for improvements over current state-of-the-art approaches.