If the paths are to be node-disjoint, then each path must avoid the
sources and sinks in the second and third parts as it passes from the
first part to the fourth part. To avoid these sources and sinks, we
declare them to be *blocked*. We then apply the technique of
[17] for tolerating faults in multibutterfly networks to
the second and third parts, treating blocked nodes as if they were
faulty. The technique of [17] can be summarized as
follows. First, any splitter that contains more than a fraction
of blocked inputs is *erased*, along with all of the nodes that can
be reached from that splitter, meaning that these nodes cannot be used
for routing, where . Next, working backwards
from the outputs to the inputs, a node is declared to be blocked if
more than of its up or down neighbors at the next level are
blocked (and not erased). (Note that it is not possible for all of a
node's up and down neighbors to be erased unless that node is also
erased.) Upon reaching the inputs of the network, all the blocked
nodes are erased. The switches that are not erased are said to be
*working*. The expansion property of the network of working switches is
reduced from to .
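The erase-and-block propagation can be sketched as follows. This is a minimal sequential simulation, not the distributed procedure of [17]; the function name `mark_blocked`, the graph representation, and the threshold parameter `beta` (which stands in for the fixed constant elided above) are assumptions made for illustration.

```python
def mark_blocked(levels, up, down, blocked, erased, beta):
    """Work backwards from the outputs (levels[0]) toward the inputs.
    up[v] and down[v] are v's up- and down-neighbors one level closer to
    the outputs.  A node is declared blocked when more than a beta
    fraction of its up neighbors, or of its down neighbors, are blocked
    and not erased.  beta stands in for the paper's fixed constant."""
    for level in levels[1:]:          # levels[0] is already labelled
        for v in level:
            if v in erased or v in blocked:
                continue
            for nbrs in (up[v], down[v]):
                # count neighbors that are blocked but not erased
                bad = sum(w in blocked and w not in erased for w in nbrs)
                if nbrs and bad > beta * len(nbrs):
                    blocked.add(v)
                    break
    return blocked
```

A final pass over the inputs would then erase every node left in `blocked`, as described above.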

The following lemmas bound the number of inputs (on levels
and ) and outputs (on level *0*) that are erased in the second
and third parts. Note that Lemma 5.1 bounds the
number of inputs that are erased, but are not themselves blocked.
(All the blocked inputs are erased.) Note also that since the two
parts share level *0*, the number of erased nodes on that level may be
as large as twice the bound given in Lemma 5.2.

In both networks at least of the inputs and of the
outputs are left working, where *K* is the number of sources (and
sinks). Suppose that , where is some
constant. By choosing to be small, we can ensure that at
least *K* of the nodes on level *0* are not erased in either the
second or third parts. We call these *K* nodes the *rendezvous
points*. By making (and hence *d*) large, we can also ensure
that the number of nodes on levels and that are
erased, but are not themselves sources or sinks, is ,
where can be made to be an arbitrarily small constant.

The reconfiguration technique described in [17] requires off-line computation to count the number of blocked inputs in each splitter. In another paper, Goldberg, Maggs, and Plotkin [12] describe a technique for reconfiguring a multibutterfly on-line in word steps.

The next step is to mark some of the nodes in the first part as
blocked. We begin by declaring any node in the first part to be
*reserved* if it is a neighbor of a source in the second, third,
or fourth part via a cross edge. Now, working backwards from level
to , a node is declared *blocked* if at
least of its *2d* neighbors at the next level are either
sources, sinks, blocked, reserved, or erased. We call a node that
is not a source or a sink, and is not reserved, blocked, or erased, a
*working* node.
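The backward marking in the first part can be sketched in the same style. The representation and the count threshold `threshold` (standing in for the constant fraction of the *2d* neighbors fixed by the analysis) are illustrative assumptions.

```python
def mark_first_part(levels, neighbors, status, threshold):
    """levels: node lists ordered from the level nearest the second part
    back toward the inputs; neighbors[v]: v's 2d neighbors at the next
    level (the previous entry in `levels`); status[v]: one of 'working',
    'source', 'sink', 'reserved', 'blocked', 'erased'.  A working node is
    marked 'blocked' if at least `threshold` of its neighbors are
    non-working."""
    bad = {'source', 'sink', 'reserved', 'blocked', 'erased'}
    for level in levels[1:]:          # the first level is already labelled
        for v in level:
            if status[v] == 'working':
                if sum(status[w] in bad for w in neighbors[v]) >= threshold:
                    status[v] = 'blocked'
    return status
```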

Where did the bound on non-working neighbors come from? In
order to apply the routing algorithm of Section 4.3,
the subnetwork of working nodes must have an
unshared neighbor property. Let be the largest value such
that the subnetwork of working nodes has an
expansion property (where is the original expansion
property of the first part). To show that the subnetwork of working
nodes has a unique neighbors property, we need
. If every working node has at most non-working
neighbors, then the subnetwork of working nodes has expansion property
. (Recall that we multiply the parameter
by *2* to get the actual expansion in each merger.) Thus . If , then .
By restricting a working switch to have fewer non-working neighbors,
we could have reduced the required expansion from down to
nearly . As the following lemma shows, however, if a working
switch can have non-working neighbors, then we also need in order to ensure that there aren't too many blocked nodes.
If we were to allow a working switch to have fewer (or more) than
non-working neighbors, then one of the two
lower bounds would increase, and the network would require more
expansion.

An identical process is applied to the fourth part, with blocked nodes propagating from level to level , and a lemma analogous to Lemma 5.3 can be proven, showing that there are at most blocked nodes in this part.

Because each node in the first part may be reserved by one source in
each of the second, third, and fourth parts, it may not be possible
for all the sources to establish their paths. If several sources wish
to begin their paths at the same node, then one is locally and
arbitrarily selected to do so, and the others give up. Since at most
four paths start at any node in the first part, at least of
the sources are able to begin their paths. Each source then sends a
message to the corresponding sink. A message first routes across the
row of its source to level (recall that in every merger
there is an edge from each input to the output in the same row), then
uses the multibutterfly store-and-forward packet routing algorithm
from
[17, 37] to route to the row of its sink on level *0*, then
routes across that row in the third and fourth parts until it either
reaches its sink or reaches the cross edge to its sink. The entire
routing can be performed in word steps. Note that we
can't use the circuit-switching algorithm of Section 4.3
here because there may be as many as sinks in a single row.
The or more sinks that receive messages then each pick one of
these messages (there are at most *4*), and send an acknowledgement to
the source of that message. At least sources receive
acknowledgements, and these sources are the ones that will establish
paths. A source that doesn't receive an acknowledgement gives up on
routing its path.
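The local choice among contending sources can be sketched as follows; taking the minimum id is just a stand-in for an arbitrary local selection, and `resolve_contention` is a hypothetical name.

```python
def resolve_contention(requests):
    """requests: dict mapping a node to the list of sources contending
    for it (at most four per node in the setting above).  Returns the set
    of winning sources, one per node; the others give up.  min() stands
    in for an arbitrary local choice."""
    return {min(srcs) for srcs in requests.values() if srcs}
```

The same local rule is reused at the sinks: each sink that receives several messages picks one and acknowledges only that one.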

Some of the nodes at which the remaining sources and sinks wish to
begin or end their paths may have been declared blocked. None of
these nodes will be used. By making (and hence *d*) large,
however, the number of blocked nodes in the first and fourth parts,
, can be made small relative to
. Thus, we are left with source-sink pairs.

The paths from the sources and the paths from the sinks are routed
independently through the first two and last two parts, respectively.
The path from a source then meets the path from the
corresponding sink at a rendezvous point on level *0*.

The rendezvous points are selected as follows. First, the sources
route their paths to distinct nodes on level in
time using the algorithm from Section 4 on the
working switches. Then the sources are numbered according to the
order in which they appear on that level using a parallel prefix
computation. A parallel prefix computation can be performed in
word (or even bit) steps on an *N*-leaf complete binary
tree, and hence also on a butterfly. For a proof, see
[16, Section 1.2]. (Note that although we are treating
some of the nodes as if they were faulty, there are no actual faults
in the network, so non-working nodes can assist in performing prefix
computations.) The rendezvous points are also numbered according to
the order in which they appear on level *0* using another prefix
computation.
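On a butterfly the numbering is computed by a parallel prefix circuit, but its net effect is just an exclusive prefix sum over the 0/1 indicators of which rows hold sources. A sketch of that effect (the function name is assumed):

```python
def number_by_prefix(is_source):
    """is_source: list of 0/1 flags, one per row, in row order.  Returns
    a dict row -> rank, numbering the flagged rows in the order in which
    they appear (an exclusive prefix sum over the flags)."""
    ranks, running = {}, 0
    for row, flag in enumerate(is_source):
        if flag:
            ranks[row] = running
            running += 1
    return ranks
```

The rendezvous points on level *0* are numbered by a second prefix computation of exactly the same form.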

Next, using a packing operation, a packet representing the *i*th
rendezvous point is routed from to the node in the *i*th
row of level *0*. At the same time, a packet representing the *i*th
source is routed from level , where 's path has
reached so far, to the *i*th node of level *0*. These two routings
can be implemented in word (or bit) steps on a butterfly
[16, Section 3.4.3].
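The net effect of the two packing routings is to bring the *i*th source packet and the *i*th rendezvous-point packet together at the same row. A schematic of the resulting pairing (the butterfly routing itself is not shown, and the function name is assumed):

```python
def pair_up(source_rows, rendezvous_rows):
    """source_rows: rows holding the numbered sources; rendezvous_rows:
    rows of the numbered rendezvous points.  Sorting each list recovers
    the order of appearance, so the i-th source is paired with the i-th
    rendezvous point, as the two packing routings arrange."""
    return dict(zip(sorted(source_rows), sorted(rendezvous_rows)))
```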

Once the packets for and are paired up, a packet is sent
back to 's node on level informing it of the position
of on level *0*. (This is an unpacking operation.) The path
for source is then extended from level to level *0*
using the algorithm from Section 4.3 on the working
switches. Then a packet containing the location of is sent from
's node on level to the node on level *0* that is in
the same row that lies on in the fourth part. This routing can
be performed in word steps using the store-and-forward
multibutterfly routing algorithm of [17]. (We can't use
the circuit-switching algorithm because there may be as many as sinks in the same row.)

In time, the packet works its way across the row from level
*0* to , which lies somewhere between levels and . (Note that although there may be as many as sinks in
the same row, the total time is still at most .)

Finally, a path is extended from to any working node on level and from there to using the algorithm of Section 4.3 on the working switches.
