MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 24-Nov-96 22:05:18 GMT
Content-Type: text/html
Content-Length: 1826
Last-Modified: Saturday, 17-Feb-96 00:50:09 GMT
Unreliable Failure Detectors for Reliable Distributed Systems
Unreliable Failure Detectors for Reliable Distributed Systems
by
Tushar Deepak Chandra and
Sam Toueg.
This paper has 51 pages. To get a postscript copy of this paper, click
here
(or here
to get it from the mirror site). To appear in JACM.
Abstract
We introduce the concept of unreliable failure detectors
and study how they can be used to solve Consensus
in asynchronous systems with crash failures.
We characterise unreliable failure detectors
in terms of two properties ---
completeness and accuracy.
We show that Consensus can be solved even with unreliable failure detectors
that make an infinite number of mistakes,
and determine which ones
can be used to solve Consensus
despite any number of crashes,
and which ones require a majority of correct processes.
We prove that Consensus and Atomic Broadcast are reducible to
each other in asynchronous systems with crash failures;
thus the above results also apply to Atomic Broadcast.
A companion paper shows that one of the failure detectors introduced here
is the weakest failure detector for solving Consensus [CHT92].
Research supported by an IBM graduate fellowship, NSF grants CCR-8901780 and
CCR-9102231, DARPA/NASA Ames Grant NAG-2-593, and in part by Grants
from IBM and Siemens Corp.
Maintained by tushar@watson.ibm.com