This dissertation proposes a fundamentally different way of monitoring persistent storage. It introduces a monitoring platform built on the modern reality of software-defined storage, which decouples policy from mechanism. The proposed platform is both agentless, meaning it operates externally to and independently of the entities it monitors, and scalable, meaning it is designed to monitor many systems at once across a mixture of operating systems and applications. Concretely, this dissertation focuses on virtualized clouds, but the proposed monitoring platform generalizes to any form of persistent storage.
The core mechanism this dissertation introduces is called Distributed Streaming Virtual Machine Introspection (DS-VMI). It leverages two properties of modern clouds: virtualized servers managed by hypervisors, which enable efficient introspection, and file-level duplication of data within cloud instances. We explore a new class of agentless monitoring applications via three interfaces with two different consistency models: cloud-inotify (strong consistency), /cloud (eventual consistency), and /cloud-history (strong consistency). cloud-inotify is a publish-subscribe interface to cloud-wide file-level updates, and it supports event-based monitoring applications. /cloud is designed to support batch-based and legacy monitoring applications by providing a file system interface to cloud-wide file-level state. /cloud-history is designed to support efficient search and management of historical virtual disk state. It leverages new fast-to-access archival storage systems and achieves tractable indexing of file-level history via whole-file deduplication using a novel application of an incremental hashing construction.
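The abstract does not specify which incremental hashing construction /cloud-history uses. The following is a minimal sketch under the assumption of an additive construction in the style of Bellare and Micciancio's AdHash: a file's digest is the sum, modulo a large number, of per-block hashes that each include the block's position. Because the sum can be updated term by term, a block write observed through introspection can refresh the whole-file digest without rereading the file, and identical files in different instances collapse to one index entry. BLOCK_SIZE, MODULUS, the SHA-256 per-block hash, and every function name below are hypothetical illustrations, not the dissertation's implementation.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

# Illustrative parameters; both are assumptions, not values from the dissertation.
BLOCK_SIZE = 4096
MODULUS = 2 ** 256


def _block_term(index: int, block: bytes) -> int:
    """Hash a block together with its position, so moving data changes the digest."""
    h = hashlib.sha256(index.to_bytes(8, "big") + block).digest()
    return int.from_bytes(h, "big")


def split_blocks(data: bytes) -> list[bytes]:
    """Cut file contents into fixed-size blocks (the last one may be short)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def whole_file_digest(data: bytes) -> int:
    """Additive incremental hash: sum of per-block terms modulo MODULUS."""
    return sum(_block_term(i, b) for i, b in enumerate(split_blocks(data))) % MODULUS


def update_digest(digest: int, index: int,
                  old_block: bytes | None, new_block: bytes | None) -> int:
    """Refresh a digest after one block-aligned write, without rereading the file.

    old_block=None means the block is newly appended; new_block=None means
    the block was truncated away.
    """
    if old_block is not None:
        digest -= _block_term(index, old_block)
    if new_block is not None:
        digest += _block_term(index, new_block)
    return digest % MODULUS


# Hypothetical deduplication index: one entry per distinct file content,
# listing every (instance, path) observed holding that content.
DedupIndex = dict[int, set[tuple[str, str]]]


def record_file(index: DedupIndex, digest: int, instance_id: str, path: str) -> None:
    """Add an (instance, path) location under the digest of its contents."""
    index[digest].add((instance_id, path))


if __name__ == "__main__":
    idx: DedupIndex = defaultdict(set)
    contents = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
    d = whole_file_digest(contents)
    # The same file in two hypothetical instances maps to a single index entry.
    record_file(idx, d, "vm-1", "/etc/hosts")
    record_file(idx, d, "vm-2", "/etc/hosts")
    # Overwrite block 1 and update the digest incrementally.
    d2 = update_digest(d, 1, contents[BLOCK_SIZE:], b"C" * BLOCK_SIZE)
    print(d2 == whole_file_digest(b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE))
    print(len(idx))
```

The sketch assumes writes are block-aligned; a write that straddles blocks would be handled as one update per affected block. The modulus and per-block hash are illustrative only, and a production design would need to choose the combining operation with its collision resistance in mind.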