Substantial effort has been devoted to the study of store-and-forward
packet routing algorithms for hypercubic networks. The fastest
algorithms are randomized, and can solve any one-to-one packet routing
problem on an *N*-node hypercube in packet steps with high
probability
[3, 14, 18, 19, 22, 23, 24, 26, 27].
Of the deterministic algorithms
[5, 9, 10], the fastest requires packet steps [9]. One drawback to these
algorithms, however, is that they all require bit
steps when implemented on real parallel machines such as the
Connection Machine. The reason is that most parallel machines operate
in a bit-serial or byte-serial fashion because of pin limitations on
switches. Since packets always contain at least bits of
addressing information, a typical packet step really consists of
bit steps.

In general, fast bit-serial algorithms are harder to design than store-and-forward algorithms. This is particularly true on a hypercube since the head of a -bit packet should be close to its destination before the tail of the packet leaves the origin. Moreover, while the packet is spread out and moving through the network, it prevents up to wires from being used by other packets. This phenomenon raises issues such as deadlock and livelock [11] that do not arise in the simpler store-and-forward model.

In this paper, we describe an asymptotically optimal randomized
algorithm for packet routing on the hypercube. The algorithm solves
any permutation routing problem for packets that are *M* bits long
(not including addressing information) in bit steps with
high probability. Moreover, the algorithm requires each node of the
hypercube to deal with only new packets at each step. The
algorithm works in either the wormhole [11] or cut-through
[16] models of routing.

The algorithm also solves the problem of on-line circuit switching in
an -dilated hypercube, where a *b*-*dilated* hypercube is
one in which each edge is replaced by a bundle of *b* edges.
Previously, it was known that for any one-to-one routing problem,
edge-disjoint paths could be established from the inputs to the
outputs of a Benes network [6]. The Benes network can be
embedded in a *4*-dilated hypercube. However, the only on-line
algorithms previously known for finding the paths in a Benes network
or hypercube require steps
[8, 21]. Our algorithm finds a set of
edge-disjoint paths on the -dilated hypercube, on-line, in
bit steps.

The algorithm has three essential features: (1) it uses Valiant's
paradigm of routing to random intermediate destinations, (2) it uses a
new embedding of a -dilated butterfly into the hypercube that
is based on *1*-error correcting codes, and (3) it implements a
bit-serial version of Ranade's algorithm
[18, 19, 23] that eliminates the ghost
packets and the need for packets to carry -bit ranks.

Most hypercube routing algorithms
[15, 18, 19, 23, 26, 27]
are oblivious. A *deterministic oblivious* routing algorithm is
one in which the path taken by a packet through the network is a
function of the origin and destination of the packet. Deterministic
oblivious algorithms have previously been called oblivious algorithms
[7]. A *randomized oblivious*
algorithm is one in which each packet independently chooses its path
according to a probability distribution which is a function of its
origin and destination.

The algorithms described in this paper are not oblivious. In fact, we
show that any randomized oblivious algorithm on any *N*-node network
with maximum node degree *d* requires bit steps, with high probability, for almost
all permutations, where *M* is the packet size (not including
addressing information). For routing -bit messages on the
hypercube, the lower bound is bit steps. This result extends the Borodin-Hopcroft
-packet-step lower bound for deterministic oblivious
algorithms [7] to randomized oblivious algorithms. The
lower bound for deterministic oblivious algorithms has recently been
improved to by Kaklamanis, Krizanc, and Tsantilas
[15], who also provide a deterministic oblivious
store-and-forward algorithm for routing on the hypercube in
steps. Further work on bounds for oblivious
routing and trade-offs between time and randomness can be found in
[17].

Although our algorithm is the first route permutations on the hypercube in bit steps with high probability, Leighton and Plaxton [20] have subsequently achieved a stronger result. They show how to sort on bounded-degree hypercubic networks in steps (with the caveat that with very small probability the output will not be in sorted order). Their algorithm can be pipelined and hence can achieve permutation routing as well as many-to-one and one-to-many routing on hypercubic networks in bit steps with high probability, using standard reductions of routing to sorting (plus a minor fix to take care of the case when the output is not sorted). Nonetheless, techniques used in this paper have recently been extended by Aiello and Leighton [1] to construct improved embeddings of trees and other structures in a hypercube and to design more efficient and robust algorithms for reconfiguring a hypercube around random or worst-case faults.

A natural open problem is to find a deterministic algorithm for routing (or sorting) in bit steps on a hypercubic network. The only known bounded-degree networks capable of deterministic -bit-step routing are the AKS sorting circuit [2] and the multibutterfly [4, 25]. However, these sorting and routing algorithms rely on expansion properties that the hypercube and its derivatives do not possess.

The remainder of the paper is divided into sections as follows. We
start by describing two bit-step algorithms for routing on
a -dilated butterfly in Section 2. The first is
quite simple, but needs a switch to handle up to new packets
at once. The second is more efficient, allowing a switch to handle at
most *2* new packets at each step. In Section 3, we
show how to implement the algorithms on a hypercube by embedding the
-dilated butterfly in the hypercube. The lower bound for
randomized oblivious routing is described in
Section 4.

Mon Jul 22 17:37:10 EDT 1996