In this section, we describe an algorithm for routing N packets of length M in bit steps on an N-node -dilated butterfly network. In Section 3, we show that the algorithm can be emulated by an N-node hypercube with constant slowdown.
An 8-input butterfly network is illustrated in Figure 1. Each node has a distinct label , where r is the row, and l is the level. On a butterfly with inputs, the row is an n-bit binary number and the level is an integer between 0 and n. The nodes on level 0 and n are called the inputs and outputs, respectively. We will assume that the input and output nodes in each row are identified as the same node, so that the total number of nodes is . For l < n, a node labeled is connected to nodes and , where denotes r with bit l+1 complemented (bit 1 is the least significant, bit n the most). We will assume that the edges are directed in the order of increasing level (from left to right in Figure 1). We call the edges into a switch its input edges, and those out of the switch its output edges.
Figure 1: An 8-input butterfly network.
Between any input and output of the butterfly there is a unique path, and there is a simple greedy algorithm for finding the path. Upon reaching switch , l < n, a packet with destination compares with . If they are equal, the packet takes the edge to . If not, it takes the edge to .
A b-dilated butterfly is derived from a butterfly by replacing each edge in the butterfly with a bundle of b edges. A switch in a b-dilated butterfly has 2 input bundles of b input edges each, and 2 output bundles of b output edges each.
There are two models for the switches. In the strong switch model, a switch can examine all of its edges at each step. In the weak switch model, a switch can examine only one edge from each input bundle at each step. In both models a switch can shunt an input edge to an output edge so that, in the future, a bit received on the input edge is transmitted on the output edge one bit step later, without being examined by the switch. In the strong model, a switch can shunt any number of input-output pairs together in one step; in the weak model, only one. In either model, an edge can transmit at most one bit at each step.
Our goal is to route permutations on a fully loaded butterfly. It is easy to reduce this to the case where each input is the origin of n packets and each output is the destination of n packets. Paths for the packets are selected using Valiant's paradigm [26, 27]; each packet is first routed to a random intermediate destination, and then routed to its true destination. The routing is performed in two phases. In the first phase, each packet is routed from level 0 to a random output on level n. In the second phase, each packet is routed from level 0 to its true destination on level n. (Recall that the inputs and outputs in each row are identified as the same node.) As we shall see, routing through random intermediate destinations ensures that with high probability at most packets pass through any switch in each phase. The network is dilated by this same factor of so that at most one packet ever uses any edge. As a consequence, the routing algorithm can be used for circuit switching as well.
Packets are routed through the network in a wormhole fashion . At the end of each edge is a queue that can hold a small number of bits (typically 2). Since a packet carries at least n bits of addressing information, it cannot be stored entirely on one edge, but must be spread out over many edges. A packet can be thought of as a worm working its way head first through the network. Behind the head, each bit of the worm can advance only if there is space in the queue at the end of the next edge. When the head moves, the queue space it frees up trickles back to the tail, allowing the entire worm to move. When we speak of the location of a packet we refer to its head. Our algorithms also work for arbitrary cut-through routing  where the entire packet can pile up at one node, provided that there is sufficient queueing at the edges.
At the head of each packet is a 2n-bit destination header consisting of the row number of its intermediate destination followed by the row number of its final destination. Each time a packet passes through a switch, the lead bit in its header is stripped off and examined to determine which output bundle to send the packet through. The switch then shunts the edge on which the packet has just arrived to a previously unused edge in the outgoing bundle specified by the bit. Such an unused edge can always be found, provided that the number of wires in each outgoing bundle is at least as large as the number of packets that pass through the bundle.