RainMon: An Integrated Approach to Mining Bursty Timeseries Monitoring Data
Ilari Shafer, Kai Ren, Vishnu Naresh Boddeti, Yoshihisa Abe, Gregory R. Ganger and Christos Faloutsos
Timeseries data are prevalent in large-scale computing centers. Systems often capture sampled metrics of performance and utilization, and even sensor readings such as temperature. These streams are used for monitoring, placement, optimization, and more. RainMon is a framework for managing massive data-center timeseries streams that are long and bursty. It takes a multi-stage modeling approach. In the first phase, the incoming streams are decomposed into “smooth” and “spiky” components. In the second phase, the streams are summarized into a compact set of trends that can be visualized and understood. In the third phase, predictions are made about the future state of the system. Such a framework has the potential to enable a number of practical advances in data center efficiency.
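To make the first phase concrete, the sketch below splits one stream into smooth, spiky, and leftover noise parts. It is a simplified stand-in, not Cypress itself: it assumes a centered moving average as the low-pass filter and a k-sigma threshold on the residual to decide what counts as a spike. The function names `moving_average` and `decompose` and the parameters `window` and `k` are hypothetical.

```python
from statistics import pstdev


def moving_average(xs, window):
    """Centered moving average, used here as a simple low-pass filter (assumption)."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        lo = max(0, i - half)
        hi = min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


def decompose(xs, window=5, k=3.0):
    """Split xs into (smooth, spiky, noise) lists that sum back to xs element-wise.

    smooth: low-pass component
    spiky:  residual values whose magnitude exceeds k standard deviations
    noise:  whatever residual is left after removing the spikes
    """
    smooth = moving_average(xs, window)
    resid = [x - s for x, s in zip(xs, smooth)]
    sigma = pstdev(resid) if resid else 0.0
    spiky = [r if abs(r) > k * sigma else 0.0 for r in resid]
    noise = [r - s for r, s in zip(resid, spiky)]
    return smooth, spiky, noise
```

For a flat series of 10s with a single burst of 100 at index 10, `decompose` keeps a spike of 72 at that index and leaves the rest of the residual as noise. Note that a plain moving average lets part of the burst bleed into the smooth component; a more robust filter would reduce this.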
The framework incorporates several existing algorithms from the literature, including Cypress, SPIRIT, and Kalman filters. RainMon has been applied to large data streams collected from production clusters to detect real anomalies.
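The summarization phase can be illustrated with a minimal SPIRIT-style tracking step, written from the published SPIRIT update rule as we understand it: each incoming vector is projected onto k tracked directions, each direction is nudged toward the projection error with a step scaled by its exponentially discounted energy, and the vector is deflated before the next direction is updated. This is a sketch under stated assumptions, not RainMon's code. It keeps k fixed (SPIRIT can also grow or shrink k based on energy thresholds), and the names `spirit_update`, `weights`, `energies`, and `lam` are hypothetical.

```python
import math


def spirit_update(weights, energies, x, lam=0.96):
    """One SPIRIT-style step over a single multi-stream sample x.

    weights:  list of k direction vectors (updated in place)
    energies: list of k discounted energies; initialize to small positive values
    lam:      forgetting factor in (0, 1]
    Returns the k hidden variables (projections) for this sample.
    """
    x = list(x)
    hidden = []
    for i, w in enumerate(weights):
        y = sum(wj * xj for wj, xj in zip(w, x))            # project onto direction i
        energies[i] = lam * energies[i] + y * y              # discounted energy
        err = [xj - y * wj for xj, wj in zip(x, w)]          # reconstruction error
        weights[i] = [wj + (y / energies[i]) * ej for wj, ej in zip(w, err)]
        x = [xj - y * wj for xj, wj in zip(x, weights[i])]   # deflate before next direction
        hidden.append(y)
    return hidden
```

Usage: if three streams all follow one underlying signal in the ratio 1 : 2 : -1, repeatedly calling `spirit_update` with a single tracked direction drives that direction toward the unit vector along (1, 2, -1), and each returned hidden variable summarizes all three streams with one number.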