Client Library Interface

The client library will be a thin layer (i.e., one with very little internal state) containing an independent thread. This thread listens for machine-list updates sent by the local daemon; when an update arrives, the thread calls every installed "watch" function, passing the content of the update as a parameter.

A watch function is the client application's way of subscribing to machine-list update notifications. The application installs a watch by passing a pointer to the function to the client library, and it may install more than one.
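The listener thread and watch mechanism described above can be sketched as follows. This is a minimal illustration, not the actual library: the class name `ClientLibrary`, the method `install_watch`, the representation of an update as a `(type, address)` pair of integers, and the use of network byte order on the wire are all assumptions. The TCP connection to the local daemon is assumed to be already open and is passed in.

```python
import socket
import struct
import threading
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (update_type, ipv4_address_as_int).
Update = Tuple[int, int]
Watch = Callable[[Update], None]


class ClientLibrary:
    """Hypothetical thin client: one listener thread plus a list of watches."""

    def __init__(self, conn: socket.socket) -> None:
        self._conn = conn                      # TCP link to the local daemon (assumed open)
        self._watches: List[Watch] = []
        self._lock = threading.Lock()          # guards _watches against concurrent installs
        self._thread = threading.Thread(target=self._listen, daemon=True)

    def install_watch(self, fn: Watch) -> None:
        """Register a callback to be invoked on every machine-list update."""
        with self._lock:
            self._watches.append(fn)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _recv_exact(self, n: int) -> bytes:
        """Read exactly n bytes; return b"" if the daemon closes the link first."""
        buf = b""
        while len(buf) < n:
            chunk = self._conn.recv(n - len(buf))
            if not chunk:
                return b""
            buf += chunk
        return buf

    def _listen(self) -> None:
        while True:
            raw = self._recv_exact(8)          # one 64-bit update
            if not raw:
                break                          # connection closed; thread exits
            update = struct.unpack("!II", raw) # two 32-bit words (byte order assumed)
            with self._lock:
                watches = list(self._watches)  # snapshot, then call without holding the lock
            for fn in watches:
                fn(update)
```

Calling the watches from a snapshot taken under the lock, rather than while holding it, means a watch function can itself call `install_watch` without deadlocking on the non-reentrant lock.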

Each update will be stored as a 64-bit quantity. The first 32-bit word indicates the type of update (adding a machine to the list, deleting a machine from the list, a "goodbye" before changing leaders, etc.), and the second contains an IP address when the update type requires one. The daemon will send these updates over the established TCP connection to the client library, and the library will call the installed watches with an update structure as a parameter.
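The 64-bit update layout can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the numeric type codes in `UpdateType`, the use of network byte order, and the rule that only a goodbye omits the address are illustrative choices, since the design does not fix them. An IPv4 address fits exactly in the second 32-bit word.

```python
import ipaddress
import struct
from enum import IntEnum
from typing import NamedTuple, Optional


class UpdateType(IntEnum):
    # Hypothetical codes; the design does not specify numeric values.
    ADD_MACHINE = 1
    DELETE_MACHINE = 2
    GOODBYE = 3


class Update(NamedTuple):
    kind: UpdateType
    addr: Optional[ipaddress.IPv4Address]  # None when the update type carries no address


def encode(u: Update) -> bytes:
    """Pack an update into 8 bytes: 32-bit type word, then 32-bit IPv4 address word."""
    ip_word = int(u.addr) if u.addr is not None else 0
    return struct.pack("!II", u.kind, ip_word)  # "!" = network (big-endian) byte order


def decode(raw: bytes) -> Update:
    """Unpack 8 bytes into an Update; assumes only GOODBYE lacks an address."""
    kind_word, ip_word = struct.unpack("!II", raw)
    kind = UpdateType(kind_word)
    addr = None if kind is UpdateType.GOODBYE else ipaddress.IPv4Address(ip_word)
    return Update(kind, addr)
```

Fixing the byte order explicitly matters because the daemon and the client application may not share the host's native endianness assumptions once updates cross a socket.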

Leadership Consolidation

When more than one machine assumes the role of leader, the machines are effectively partitioned into separate groups, each following its own leader. If the system can consolidate leadership within a short period after the separate leaders discover each other, high availability is maintained.

In our system, leader machines periodically multicast UDP packets on a designated "leader port" to announce their status. When a leader (call it Machine A) learns of another leader with a lower IP address (Machine B), it stops listening to other leaders and attempts a TCP connection to Machine B in order to merge with it. If the connection is refused or broken, the merge fails and Machine A reverts to the leader state.

Otherwise, Machine A transfers its list of machines to Machine B, sends a goodbye message over the tether connections to the machines under it, and then closes the tethers. Machine A and the machines that received the goodbye message then enter a special wait state, in which they expect to be contacted by another leader within a fixed time interval. Machine B contacts every machine in Machine A's set, including Machine A. Any machine that is not contacted within the interval reverts to the startup state, our default fault model.
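The merge decision and the post-goodbye wait state can be sketched as below. This is an illustration only: the names `should_merge`, `Node`, `State`, `on_goodbye`, `on_contact`, and `tick`, as well as the injectable clock, are hypothetical, and the value of the "fixed time interval" (`wait_interval`) is left to the caller because the design does not specify it. Networking (multicast announcements, tethers, list transfer) is omitted.

```python
import ipaddress
import time
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    STARTUP = auto()
    MEMBER = auto()   # tethered to a leader
    LEADER = auto()
    WAITING = auto()  # goodbye received; expecting contact from a new leader


def should_merge(my_ip: str, announced_ip: str) -> bool:
    """A leader merges into another leader only if that leader's IP is numerically lower."""
    return ipaddress.IPv4Address(announced_ip) < ipaddress.IPv4Address(my_ip)


class Node:
    """Hypothetical machine state for the consolidation handoff."""

    def __init__(self, wait_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.state = State.MEMBER             # assume the machine starts tethered to a leader
        self.leader: Optional[str] = None
        self.wait_interval = wait_interval    # the "fixed time interval" (value assumed)
        self._clock = clock
        self._deadline: Optional[float] = None

    def on_goodbye(self) -> None:
        """Old leader said goodbye: wait for a new leader until the deadline."""
        self.state = State.WAITING
        self.leader = None
        self._deadline = self._clock() + self.wait_interval

    def on_contact(self, leader_ip: str) -> None:
        """New leader made contact; accepted only while waiting and before the deadline."""
        if self.state is State.WAITING and self._clock() <= self._deadline:
            self.state = State.MEMBER
            self.leader = leader_ip
            self._deadline = None

    def tick(self) -> None:
        """Called periodically; falls back to startup if no leader arrived in time."""
        if self.state is State.WAITING and self._clock() > self._deadline:
            self.state = State.STARTUP
            self._deadline = None
```

Comparing addresses through `ipaddress.IPv4Address` rather than as strings is deliberate: lexically, "10.0.0.9" sorts after "10.0.0.10", which would invert the merge direction between those two leaders.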