The Process Group Approach to Reliable Distributed Computing Kenneth P. Birman CACM v36 n12 1993 (Summary by Ted Wong) This paper presents the Isis protocols for supporting distributed groups of cooperating programs. Its key design features include: - Explicit process group abstractions - Virtual synchrony - Reliable ordered messaging with CBAST and ABCAST primitives. Process Groups -------------- Most distributed applications have some implicit notion of process groups, i.e. a set of processes that exchange messages and perform some collective computation based on those messages and other shared state. Isis provides explicit primitives for naming groups, managing membership, and transferring shared state to new memebers. Group names then provide a convenient destination for group-wide message multicasts; instead of explicitly sending a message to each member, a process multicasts to the group name and lets Isis handle the membership expansion under the covers. Group membership comes in two general flavors: anonymous and explicit. With anonymous groups, some members may multicast streams of messages, and other members may listen passively (cf. Usenet news messages). With explicit groups, members may be executing a distributed algorithm, and use their knowledge of the membership to assign work. Isis refers to the current list of group members as a 'view'. When processes join and leave (explicitly or through failure) the group, the remaining members run a group membership protocol to install a new view. As part of the protocol, members must ensure that message operation that began during the previous view complete before installing the new view. In the event that a group member fails, the surviving members 'shun' it by removing it from the group and excluding it from further message operations. Isis detects failures through multiple mechanisms: timeouts on missing ACKs for multicast messages, timeouts on 'keep-alive' pings, broken TCP connections to the group membership server, etc. When a process fails and is shunned, it must rejoin the group as a new member. Virtual Synchrony ----------------- We generally find it easier to reason about distributed algorithms that execute synchronously - that is, the execution environment schedules distributed events in some serial order. In general, though, many of these events may not have any temporal dependencies on each other, and may execute concurrently. Isis introduces the concept of virtual synchrony for message events. If two multicast messages are related by Lamport's 'happens before' relationship, then they will delivered in that order at all members. If not, then in general they are delivered in some arbitrary order. Ordered Messaging ----------------- Isis provides two types of mesage ordering: causally-ordered messages, and causal-and-totally-orders mesages, using the CBCAST and ABCAST primitives respectively. CBCAST messaging enforces 'happens before' relationship on messages. If m happens before m', then Isis delivers m before m' at all operational members. For the 'happens before' relationship to exist between m and m', either the same process must send both m and m', or it must receive m and subsequently send m'. m and m' sent by independent processes are not ordered. To enforce causal ordering, Isis uses a Lamport logical clock called a vector clock. Each member pi keeps a vector time VTi that has a message count entry for each member. Whenever pi sends a message m, it increments VTi[i] and timestamps m with the whole vector. Then, each receiver pj waits until it has: a) received all other messages from pi (checked by ensuring that VTj[i] is one less than VT[i] on the message). b) received as least as many messages as pi has from other pk (checked by ensuring that VTj[k] is equal to or greater than VT[k] on the message). We see that these ordering constraints correspond to the resource request ordering constraints given by Lamport [Lamport78]. ABCAST messaging adds total ordering on top of CBCAST. In addition to enforcing 'happens before' on causally-related messages, Isis imposes a total ordering on concurrent messages. That is, if Isis delivers concurrent m and m' in that order at any operational member, it delivers them in that order at all members. Discussion ---------- Isis attempts to incorporate strong message ordering guarantees into the communication layer of a distributed system, and thus make it easier for the application programmer to reason about the behaviour of their distributed application. However, Isis cannot model events that are temporally ordered by a channel outside of the message layer (e.g. threads communicating through shared memory variables), and the programmer may need to introduce additional end-to-end checks to account for such channels and their effects. With this caveat in mind, it is possible to build reliable distributed applications with high message throughput.