Substantial effort has been devoted to the study of store-and-forward packet routing algorithms for hypercubic networks. The fastest algorithms are randomized, and can solve any one-to-one packet routing problem on an N-node hypercube in packet steps with high probability [3, 14, 18, 19, 22, 23, 24, 26, 27]. Of the deterministic algorithms [5, 9, 10], the fastest requires packet steps . One drawback to these algorithms, however, is that they all require bit steps when implemented on real parallel machines such as the Connection Machine. The reason is that most parallel machines operate in a bit-serial or byte-serial fashion because of pin limitations on switches. Since packets always contain at least bits of addressing information, a typical packet step really consists of bit steps.
In general, fast bit-serial algorithms are harder to design than store-and-forward algorithms. This is particularly true on a hypercube since the head of a -bit packet should be close to its destination before the tail of the packet leaves the origin. Moreover, while the packet is spread out and moving through the network, it prevents up to wires from being used by other packets. This phenomenon raises issues such as deadlock and livelock  that do not arise in the simpler store-and-forward model.
In this paper, we describe an asymptotically optimal randomized algorithm for packet routing on the hypercube. The algorithm solves any permutation routing problem for packets that are M bits long (not including addressing information) in bit steps with high probability. Moreover, the algorithm requires each node of the hypercube to deal with only new packets at each step. The algorithm works in either the wormhole  or cut-through  models of routing.
The algorithm also solves the problem of on-line circuit switching in an -dilated hypercube, where a b-dilated hypercube is one in which each edge is replaced by a bundle of b edges. Previously, it was known that for any one-to-one routing problem, edge-disjoint paths could be established from the inputs to the outputs of a Benes network . The Benes network can be embedded in a 4-dilated hypercube. However, the only on-line algorithms previously known for finding the paths in a Benes network or hypercube require steps [8, 21]. Our algorithm finds a set of edge-disjoint paths on the -dilated hypercube, on-line, in bit steps.
The algorithm has three essential features: (1) it uses Valiant's paradigm of routing to random intermediate destinations, (2) it uses a new embedding of a -dilated butterfly into the hypercube that is based on 1-error correcting codes, and (3) it implements a bit-serial version of Ranade's algorithm [18, 19, 23] that eliminates the ghost packets and the need for packets to carry -bit ranks.
Most hypercube routing algorithms [15, 18, 19, 23, 26, 27] are oblivious. A deterministic oblivious routing algorithm is one in which the path taken by a packet through the network is a function of the origin and destination of the packet. Deterministic oblivious algorithms have previously been called oblivious algorithms . A randomized oblivious algorithm is one in which each packet independently chooses its path according to a probability distribution which is a function of its origin and destination.
The algorithms described in this paper are not oblivious. In fact, we show that any randomized oblivious algorithm on any N-node network with maximum node degree d requires bit steps, with high probability, for almost all permutations, where M is the packet size (not including addressing information). For routing -bit messages on the hypercube, the lower bound is bit steps. This result extends the Borodin-Hopcroft -packet-step lower bound for deterministic oblivious algorithms  to randomized oblivious algorithms. The lower bound for deterministic oblivious algorithms has recently been improved to by Kaklamanis, Krizanc, and Tsantilas , who also provide a deterministic oblivious store-and-forward algorithm for routing on the hypercube in steps. Further work on bounds for oblivious routing and trade-offs between time and randomness can be found in .
Although our algorithm is the first route permutations on the hypercube in bit steps with high probability, Leighton and Plaxton  have subsequently achieved a stronger result. They show how to sort on bounded-degree hypercubic networks in steps (with the caveat that with very small probability the output will not be in sorted order). Their algorithm can be pipelined and hence can achieve permutation routing as well as many-to-one and one-to-many routing on hypercubic networks in bit steps with high probability, using standard reductions of routing to sorting (plus a minor fix to take care of the case when the output is not sorted). Nonetheless, techniques used in this paper have recently been extended by Aiello and Leighton  to construct improved embeddings of trees and other structures in a hypercube and to design more efficient and robust algorithms for reconfiguring a hypercube around random or worst-case faults.
A natural open problem is to find a deterministic algorithm for routing (or sorting) in bit steps on a hypercubic network. The only known bounded-degree networks capable of deterministic -bit-step routing are the AKS sorting circuit  and the multibutterfly [4, 25]. However, these sorting and routing algorithms rely on expansion properties that the hypercube and its derivatives do not possess.
The remainder of the paper is divided into sections as follows. We start by describing two bit-step algorithms for routing on a -dilated butterfly in Section 2. The first is quite simple, but needs a switch to handle up to new packets at once. The second is more efficient, allowing a switch to handle at most 2 new packets at each step. In Section 3, we show how to implement the algorithms on a hypercube by embedding the -dilated butterfly in the hypercube. The lower bound for randomized oblivious routing is described in Section 4.