Exploiting Weak Connectivity in a Distributed File System

Lily Mummert
December 1996


Weak connectivity, in the form of intermittent, low-bandwidth, or expensive networks, is a fact of life in mobile computing. For the foreseeable future, access to cheap, high-performance, reliable networks, or strong connectivity, will be limited to a few oases, such as work or home, in a vast desert of weak connectivity. The design of distributed file systems has traditionally assumed strong connectivity. Yet, to provide ubiquitous data access, distributed file systems must make effective use of weak connectivity.

This dissertation describes the design, implementation, and evaluation of weakly connected operation in the Coda File System. The starting point of this work is disconnected operation, in which a file system client operates using data in its cache during server or network failures. Disconnected clients suffer from many limitations: updates are not visible to other clients, cache misses may impede progress, updates are at risk from client loss or damage, and the danger of update conflicts increases as disconnections are prolonged. Weak connectivity provides an opportunity to alleviate these limitations.

Coda's strategy for weakly connected operation is best characterized as application-transparent adaptation: the system bears full responsibility for coping with the demands of weak connectivity. This approach preserves upward compatibility by allowing applications to run unchanged. Coda provides several mechanisms for weakly connected operation, each motivated by actual usage experience. The foundation of adaptivity in this system is the communications layer, which derives information on network conditions and supplies it to higher system layers. The rapid cache validation mechanism enables the system to recover quickly from intermittent connectivity. The trickle reintegration mechanism insulates the user from poor network performance by propagating updates to servers asynchronously. The cache miss handling mechanism alerts the user to potentially lengthy service times and provides opportunities for intervention.
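To make the interplay between the communications layer and cache miss handling concrete, the following is a minimal sketch, not Coda's implementation. It assumes the communications layer reports a bandwidth estimate in bytes per second, that a miss's service time can be approximated as file size divided by that bandwidth, and that a single fixed patience threshold decides whether to fetch transparently or consult the user. The names `NetworkEstimate`, `estimated_fetch_seconds`, `handle_cache_miss`, and `PATIENCE_SECONDS` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical threshold: seconds a user is assumed willing to wait on a
# cache miss before being consulted. Illustrative value, not taken from Coda.
PATIENCE_SECONDS = 10.0


@dataclass
class NetworkEstimate:
    """Bandwidth estimate (bytes per second) as a communications layer
    might report it to higher layers. Hypothetical structure."""
    bytes_per_sec: float


def estimated_fetch_seconds(file_size_bytes: int, net: NetworkEstimate) -> float:
    """Approximate service time for a cache miss as size / bandwidth."""
    if net.bytes_per_sec <= 0:
        return float("inf")  # no usable connectivity: treat as unbounded
    return file_size_bytes / net.bytes_per_sec


def handle_cache_miss(file_size_bytes: int, net: NetworkEstimate,
                      patience: float = PATIENCE_SECONDS) -> str:
    """Fetch transparently if the miss looks cheap; otherwise defer to the user."""
    t = estimated_fetch_seconds(file_size_bytes, net)
    return "fetch" if t <= patience else "ask_user"


if __name__ == "__main__":
    lan = NetworkEstimate(bytes_per_sec=1_250_000)  # roughly 10 Mb/s
    modem = NetworkEstimate(bytes_per_sec=1_200)    # roughly 9.6 Kb/s
    print(handle_cache_miss(1_000_000, lan))
    print(handle_cache_miss(1_000_000, modem))
```

Under these assumed numbers, a 1 MB file takes about 0.8 s over the LAN estimate and is fetched directly, while over the modem estimate it takes roughly 830 s, so the user would be consulted. The same request yields different behavior purely because of the network information supplied from below, which is the sense in which the adaptation is transparent to the application issuing the request.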

A quantitative evaluation of these mechanisms, based on controlled experimentation and on empirical data gathered from the deployed system in everyday use, shows that Coda provides good performance even when network bandwidth varies over four orders of magnitude, from modem speeds to LAN speeds.