This dissertation proposes a fundamentally different way of monitoring virtual disk state in the cloud. The proposed platform is both agentless, meaning it operates external to and independent of the virtual servers it monitors, and scalable, meaning it is designed to efficiently handle collections of virtual servers numbering in the thousands. The core technology used to create this platform is called Distributed Streaming Virtual Machine Introspection (DS-VMI). It leverages two properties of modern clouds: virtualized servers managed by Virtual Machine Monitors (VMMs), which enable efficient introspection, and file-level duplication of data within cloud instances.
We explore a new class of agentless monitoring
applications via three interfaces with two different consistency models:
cloud-inotify (strong consistency), /cloud (eventual consistency), and
/cloud-history (strong consistency). cloud-inotify is a publish-subscribe
interface to cloud-wide file-level updates that supports event-based monitoring
applications. /cloud is designed to support batch-based and legacy monitoring
applications by providing a file system interface to cloud-wide file-level state.
/cloud-history is designed to support
efficient search and management of historical virtual disk state. Achieving
distributed, near-real-time, file-level deduplication, which is key to
scalability, leads to a novel application of an incremental hashing
construction. We also describe a novel snapshotting method that combines
properties of both black-box and white-box methods to create near-real-time,
file-level deduplicated snapshots of virtual disks.