Next: 5.3.3 Establishing edge-disjoint paths Up: 5.3 Removing the distinction Previous: 5.3.1 The network

### 5.3.2 Constructing node-disjoint paths

If the paths are to be node-disjoint, then each path must avoid the sources and sinks in the second and third parts as it passes from the first part to the fourth part. To avoid these sources and sinks, we declare them to be blocked. We then apply the technique of [17] for tolerating faults in multibutterfly networks to the second and third parts, treating blocked nodes as if they were faulty. The technique of [17] can be summarized as follows. First, any splitter (and all nodes that can be reached from that splitter) that contains more than a fraction of blocked inputs is erased, meaning that its nodes cannot be used for routing, where . Next, working backwards from the outputs to the inputs, a node is declared to be blocked if more than of its up or down neighbors at the next level are blocked (and not erased). (Note that it is not possible for all of a node's up and down neighbors to be erased unless that node is also erased.) Upon reaching the inputs of the network, all the blocked nodes are erased. The switches that are not erased are said to be working. The expansion property of the network of working switches is reduced from to .

The following lemmas bound the number of inputs (on levels and ) and outputs (on level 0) that are erased in the second and third parts. Note that Lemma 5.1 bounds the number of inputs that are erased, but are not themselves blocked. (All the blocked inputs are erased.) Note also that since the two parts share level 0, the number of erased nodes on that level may be as large as twice the bound given in Lemma 5.2.

In both networks at least of the inputs and of the outputs are left working, where K is the number of sources (and sinks). Suppose that , where is some constant. By choosing to be small, we can ensure that at least K of the nodes on level 0 are not erased in either the second or third parts. We call these K nodes the rendezvous points. By making (and hence d) large, we can also ensure that the number of nodes on levels and that are erased, but are not themselves sources or sinks, is , where can be made to be an arbitrarily small constant.

The reconfiguration technique described in [17] requires off-line computation to count the number of blocked inputs in each splitter. In another paper, Goldberg, Maggs, and Plotkin [12] describe a technique for reconfiguring a multibutterfly on-line in word steps.

The next step is to mark some of the nodes in the first part as blocked. We begin by declaring any node in the first part to be reserved if it is a neighbor of a source in the second, third, or fourth part via a cross edge. Now, working backwards from level to , a node is declared blocked if at least of its 2d neighbors at the next level are either sources, sinks, blocked, reserved, or erased. We call a node that is not a source or a sink, and is not reserved, blocked, or erased, a working node.

Where did the bound on non-working neighbors come from? In order to be apply the routing algorithm of Section 4.3, the subnetwork of working nodes must have an unshared neighbor property. Let be the largest value such that the subnetwork of working nodes has an expansion property (where is the original expansion property of the first part). To show that the subnetwork of working nodes has an unique neighbors property, we need . If every working node has at most non-working neighbors, then the subnetwork of working nodes has expansion property . (Recall that we multiply the parameter by 2 to get the actual expansion in each merger.) Thus . If , then . By restricting a working switch to have fewer non-working neighbors, we could have reduced the required expansion from down to nearly . As the following lemma shows, however, if a working switch can have non-working neighbors, then we also need in order to ensure that there aren't too many blocked nodes. If we were to allow a working switch to have fewer (or more) than non-working neighbors, then one of the two `` '' lower bounds would increase, and the network would require more expansion.

An identical process is applied to the fourth part, with blocked nodes propagating from level to level , and a lemma analogous to Lemma 5.3 can be proven, showing that there are at most blocked nodes in this part.

Because each node in the first part may be reserved by one source in each of the second, third, and fourth parts, it may not be possible for all the sources to establish their paths. If several sources wish to begin their paths at the same node, then one is locally and arbitrarily selected to do so, and the others give up. Since at most four paths start at any node in the first section, at least of the sources are able to begin their paths. Each source then sends a message to the corresponding sink. A message first routes across the row of its source to level (recall that in every merger there is an edge from each input to the output in the same row), then uses the multibutterfly store-and-forward packet routing algorithm from [17, 37] to route to the row of its sink on level 0, then routes across that row in the third and fourth parts until it either reaches its sinks or reaches the cross edge to its sink. The entire routing can be performed in word steps. Note that we can't use the circuit-switching algorithm of Section 4.3 here because there may be as many as sinks in a single row. The or more sinks that receive messages then each pick one of these messages (there are at most 4), and send an acknowledgement to the source of that message. At least sources receive acknowledgements, and these sources are the ones that will establish paths. A source that doesn't receive an acknowledgement gives up on routing its path.

Some of the nodes at which the remaining sources and sinks wish to begin or end their paths may have been declared blocked. None of these nodes will be used. By making (and hence d) large, however, the number of blocked nodes in the first and fourth parts, , can be made small relative to . Thus, we are left with source-sink pairs.

The paths from the sources and the paths from the sinks are routed independently through the first two and last two parts, respectively. The path from a source then meets the path from the corresponding sink at a rendezvous point on level 0.

The rendezvous points are selected as follows. First, the sources route their paths to distinct nodes on level in time using the algorithm from Section 4 on the working switches. Then the sources are numbered according to the order in which they appear on that level using a parallel prefix computation. A parallel prefix computation can be performed in word (or even bit) steps on an N-leaf complete binary tree, and hence also on a butterfly. For a proof, see [16, Section 1.2,]. (Note that although we are treating some of the nodes as if they were faulty, there are no actual faults in the network, so non-working nodes can assist in performing prefix computations.) The rendezvous points are also numbered according to the order in which they appear on level 0 using another prefix computation.

Next, using a packing operation, a packet representing the ith rendezvous point is routed from to the node in the ith row of level 0. At the same time, a packet representing the ith source is routed from level , where 's path has reached so far, to the ith node of level 0. These two routings can be implemented in word (or bit) steps on a butterfly [16, Section 3.4.3,].

Once the packets for and are paired up, a packet is sent back to 's node on level informing it of the position of on level 0. (This is an unpacking operation.) The path for source is then extended from level to level 0 using the algorithm from Section 4.3 on the working switches. Then a packet containing the location of is sent from 's node on level to the node on level 0 that is in the same row that lies on in the fourth part. This routing can be performed in word steps using the store-and-forward multibutterfly routing algorithm of [17]. (We can't use the circuit switching algorithm because there may be as many as sinks in the same row.)

In time, the packet works it way across the row from level 0 to , which lies somewhere between levels and . (Note that although there may be as many as in the same row, the total time is still at most .)

Finally, a path is extended from to any working node on level and from there to using the algorithm of Section 4.3 on the working switches.

Next: 5.3.3 Establishing edge-disjoint paths Up: 5.3 Removing the distinction Previous: 5.3.1 The network

Bruce Maggs
Mon Jul 22 21:19:59 EDT 1996