In this section, we describe an algorithm for routing *N* packets of
length *M* in bit steps on an *N*-node -dilated butterfly network. In Section 3, we
show that the algorithm can be emulated by an *N*-node hypercube with
constant slowdown.

An 8-input *butterfly network* is illustrated in
Figure 1. Each node has a distinct label ,
where *r* is the row, and *l* is the level. On a butterfly with
inputs, the row is an *n*-bit binary number and the level is an
integer between *0* and *n*. The nodes on level *0* and *n* are
called the *inputs* and *outputs*, respectively. We will
assume that the input and output nodes in each row are identified as
the same node, so that the total number of nodes is . For *l
< n*, a node labeled is connected to nodes and
, where denotes *r* with bit *l+1*
complemented (bit *1* is the least significant, bit *n* the most). We
will assume that the edges are directed in the order of increasing
level (from left to right in Figure 1). We call the
edges into a switch its *input edges*, and those out of the switch
its *output edges*.

**Figure 1:** An *8*-input butterfly
network.

Between any input and output of the butterfly there is a unique path,
and there is a simple greedy algorithm for finding the path. Upon
reaching switch , *l < n*, a packet with destination
compares with . If they are equal, the packet
takes the edge to . If not, it takes the edge to
.

A * b-dilated* butterfly is derived from a butterfly by replacing
each edge in the butterfly with a bundle of

There are two models for the switches. In the *strong switch
model*, a switch can examine all of its edges at each step. In the
*weak switch model*, a switch can examine only one edge from each
input bundle at each step. In both models a switch can shunt an input
edge to an output edge so that, in the future, a bit received on the
input edge is transmitted on the output edge one bit step later,
without being examined by the switch. In the strong model, a switch
can shunt any number of input-output pairs together in one step; in
the weak model, only one. In either model, an edge can transmit at
most one bit at each step.

Our goal is to route permutations on a fully loaded butterfly. It is
easy to reduce this to the case where each input is the origin of *n*
packets and each output is the destination of *n* packets. Paths for
the packets are selected using Valiant's paradigm
[26, 27]; each packet is first routed to a random
intermediate destination, and then routed to its true destination.
The routing is performed in two phases. In the first phase, each
packet is routed from level *0* to a random output on level *n*. In
the second phase, each packet is routed from level *0* to its true
destination on level *n*. (Recall that the inputs and outputs in each
row are identified as the same node.) As we shall see, routing
through random intermediate destinations ensures that with high
probability at most packets pass through any switch in
each phase. The network is dilated by this same factor of
so that at most one packet ever uses any edge. As a consequence, the
routing algorithm can be used for circuit switching as well.

Packets are routed through the network in a *wormhole* fashion
[11]. At the end of each edge is a queue that can hold a
small number of bits (typically *2*). Since a packet carries at least
*n* bits of addressing information, it cannot be stored entirely on
one edge, but must be spread out over many edges. A packet can be
thought of as a worm working its way head first through the network.
Behind the head, each bit of the worm can advance only if there is
space in the queue at the end of the next edge. When the head moves,
the queue space it frees up trickles back to the tail, allowing the
entire worm to move. When we speak of the location of a packet we
refer to its head. Our algorithms also work for arbitrary *
cut-through* routing [16] where the entire packet can
pile up at one node, provided that there is sufficient queueing at the
edges.

At the head of each packet is a *2n*-bit destination header consisting
of the row number of its intermediate destination followed by the row
number of its final destination. Each time a packet passes through a
switch, the lead bit in its header is stripped off and examined to
determine which output bundle to send the packet through. The switch
then shunts the edge on which the packet has just arrived to a
previously unused edge in the outgoing bundle specified by the bit.
Such an unused edge can always be found, provided that the number of
wires in each outgoing bundle is at least as large as the number of
packets that pass through the bundle.

Mon Jul 22 17:37:10 EDT 1996