next up previous
Next: 8 Sorting on butterflies Up: Randomized Routing and Sorting Previous: 6 Routing on multidimensional

7 Construction of area and volume-universal networks


In this section we construct a class of networks that are area-universal in the sense that a network in the class with N nodes has area tex2html_wrap_inline3533 and can, with high probability, simulate in tex2html_wrap_inline3535 steps each step of any network of area tex2html_wrap_inline3537 . The networks are based on the fat-trees of Greenberg and Leiserson [10] and the simulation uses the packet routing algorithm from Section 2.

Leiserson was the first to display a class of networks that could efficiently simulate any other network of the same area or volume. In [21] he showed that a fat-tree of area tex2html_wrap_inline3539 can simulate in tex2html_wrap_inline3541 bit-steps each bit-step of any network of area tex2html_wrap_inline3543 . (In the bit model an edge can transmit a single bit in each time step. All of the algorithms in this paper are described in terms of the packet model, in which an edge can transmit a packet of at least tex2html_wrap_inline3545 bits in one step.) Leiserson's simulation used an off-line routing algorithm for fat-trees. On-line routing algorithms were later developed by Greenberg and Leiserson [10].

A fat-tree network is shown in Figure 5. Its underlying structure is a complete 4-ary tree. Each edge in the 4-ary tree corresponds to a pair of oppositely directed groups of edges called channels. The channel directed from the leaves to the root is called an up channel; the other is called a down channel. The capacity of a channel C, tex2html_wrap_inline3549 , is the number of edges In the channel. We call the tree ``fat'' because the capacities of the channels grow by a factor of 2 at every level. A fat-tree of height m has tex2html_wrap_inline3555 nodes at the root and tex2html_wrap_inline3557 leaves.

Figure 5: A fat-tree.

It will prove useful to label the nodes at the top and bottom of each channel. Let the level of a node be its distance from the leaves. Suppose a channel C connects tex2html_wrap_inline3561 nodes at level l with tex2html_wrap_inline3565 nodes at level l+1. Give the nodes at level l labels 0 through tex2html_wrap_inline3573 and the nodes at level l+1 labels 0 through tex2html_wrap_inline3579 . Then node k at level l is connected to nodes k and tex2html_wrap_inline3587 at level l+1. The following lemma relates the labels of the nodes on a packet's path from a leaf to the root.


The simplest way to route packets in a leveled fashion on a fat-tree is to route every packet all of the way up to the root before routing it down to its destination. The problem with this strategy is that it may cause unnecessary congestion at the root. For example, suppose that every leaf wants to send a single packet to its sibling. If these packets are sent to the root, then tex2html_wrap_inline3601 packets will pass through each channel at the root. On the other hand, if each packet goes up just one level before turning around, then at most one packet will pass through any channel in the entire network. Thus, we will route every packet from its origin to its destination along a path that passes through as few channels as possible.

For a set Q of packets to be delivered between the leaves of the fat-tree, we define the load of Q on a channel C, tex2html_wrap_inline3609 , to be the number of different destinations of packets in Q for which at least one packet must pass through C. (A packet must pass through C only if every path from the packet's origin to its destination passes through the C.) Note that even if many packets with the same destination must pass through a channel, that destination contributes at most one to the load of the channel. The routing algorithm will combine all packets with the same destination that attempt to pass through the channel. We define the load factor of Q on C, tex2html_wrap_inline3623 , to be the ratio of the load of Q on C to the capacity of C, tex2html_wrap_inline3631 The load factor tex2html_wrap_inline3633 on the entire network is simply the maximum load factor on any channel tex2html_wrap_inline3635 The load factor is a lower bound on the the number of steps required to deliver Q. We shall sometimes write tex2html_wrap_inline3639 to denote tex2html_wrap_inline3641 when the set of packets to be delivered is clear from the context.

In a leveled fat-tree a node at the top of an up channel at level l is connected to itself at the top of the corresponding down channel by a linear chain of nodes of length tex2html_wrap_inline3645 . A packet may only make a transition from an up channel to a down channel by traversing a chain. Thus all shortest paths between leaves in a leveled fat-tree have length 2m. Note that the load of a set of packets on a channel of the leveled fat-tree is identical to the load on the corresponding channel in the fat-tree.

The path that a packet for destination x takes through a leveled fat-tree is determined by the m-universal hash function [7]


where P is a prime number larger than the number of possible different destinations, and the tex2html_wrap_inline3657 are chosen at random off-line. A packet with destination x follows up channels along the unique shortest path to the node labeled tex2html_wrap_inline3661 at the root until it can reach x without using any more up channels. It then crosses over to a down channel via a chain, and follows down channels to x. Note that a packet only passes through a channel if all paths from its origin to its destination pass through that channel. Also, all packets with destination x that pass through channel C pass through node tex2html_wrap_inline3671 at the top of C and through node tex2html_wrap_inline3675 at the bottom of C.

The following lemma shows that we can use the scheduling algorithm from Section 2 to route packets in a fat-tree.


Proof: The paths of the packets are first randomized using the universal hash function tex2html_wrap_inline3697 . With high probability, the resulting congestion is tex2html_wrap_inline3699 . Each packet travels a distance of tex2html_wrap_inline3701 . The packets are then scheduled using the algorithm from Section 2.

to .667emto .667em

Let us now consider the VLSI area requirements [34] of fat-trees. A fat-tree with root capacity M and tex2html_wrap_inline3705 nodes has a layout with area tex2html_wrap_inline3707 that is obtained by embedding the fat-tree in the tree of meshes [15]. The nodes of the tree of meshes in this layout are separated by a distance of tex2html_wrap_inline3709 in both the horizontal and vertical directions. Thus, the tex2html_wrap_inline3711 space for the chain associated with each node in the leveled fat-tree can be allocated without increasing the asymptotic area of the layout. (In fact, it is possible to attach a chain of size tex2html_wrap_inline3713 to each fat-tree node without increasing the area by more than a constant factor.) The leaves of the fat-tree are separated in the layout from each other by a distance of tex2html_wrap_inline3715 in each direction. We can improve the density of nodes without increasing the asymptotic area of the layout by connecting a tex2html_wrap_inline3717 mesh of nodes to each leaf. The resulting network has tex2html_wrap_inline3719 nodes and area tex2html_wrap_inline3721 . The N-node network in this class has root capacity tex2html_wrap_inline3725 , tex2html_wrap_inline3727 leaves, and area tex2html_wrap_inline3729 .

The following theorem shows that this class of networks is area-universal.


Proof: The nodes of network B are mapped to the nodes of the area-universal network U off-line using a recursive decomposition technique as in [21]. In each step, an edge of B is simulated by routing packets between the nodes that it connects. At each level of the recursion at most tex2html_wrap_inline3749 edges connect the nodes mapped below a channel C with the rest of the network. This property of the mapping ensures that the load factor of each set of packets used in the simulation of B is at most tex2html_wrap_inline3755 . At the bottom of the decomposition tree, a tex2html_wrap_inline3757 region of the layout of B is mapped to each leaf of the fat-tree. The tex2html_wrap_inline3761 mesh connected to the leaf in U simulates this region of B with tex2html_wrap_inline3767 slowdown using a mesh routing algorithm such as the one in Section 3.

to .667emto .667em

The study of fat-tree routing algorithms that perform combining was motivated in part by an abstraction of the volume and area-universal networks called the distributed random-access machine (DRAM). A host of conservative algorithms for tree and graph problems for the exclusive-read exclusive-write (EREW) DRAM are presented in [22]. Recently we discovered conservative concurrent-read concurrent-write (CRCW) algorithms that require fewer steps for some of these problems [23]. Until now, however, no efficient fat-tree routing algorithms that perform combining were known. The tex2html_wrap_inline3769 step routing algorithm presented here fills the void.

Only slight modifications to the area-universal fat-tree are necessary to make it volume universal [10]. The underlying structure of the volume-universal fat-tree is a complete 8-ary tree. Instead of doubling at each level, the channel capacities increase by a factor of 4. The tree has m levels, root capacity tex2html_wrap_inline3775 , and tex2html_wrap_inline3777 leaves. The nodes at the top of a channel at level l are labeled 0 through tex2html_wrap_inline3783 . Node k at level l is connected to nodes k, tex2html_wrap_inline3791 , tex2html_wrap_inline3793 , and tex2html_wrap_inline3795 at level l+1. A layout with volume tex2html_wrap_inline3799 for the fat-tree can be obtained by embedding it in the three-dimensional tree of meshes. As before, a chain of size tex2html_wrap_inline3801 can be attached to each node of the fat-tree without increasing the asymptotic layout area and the density of nodes can be improved by connecting a tex2html_wrap_inline3803 mesh to each leaf.

The simulation scheme in this section can also be used to simulate shared-bus networks. In a shared-bus network, an edge is allowed to connect more than two nodes. If several nodes attempt to send packets on the same edge in the same step, then the packets are combined using some simple rule to form a single packet. Combining is assumed to require a single step, regardless of the number of packets combined or the rule used. Because the fat-tree routing algorithm is capable of combining, it is no more difficult for the fat-tree to simulate shared-bus networks than to simulate point-to-point networks (i.e., networks in which each edge connects a pair of nodes). The simulation is optimal because a point-to-point network may require tex2html_wrap_inline3805 steps to simulate one step of a shared-bus network. Previously, no fat-tree routing algorithms were capable of combining packets to the same destination. As a consequence, no scheme for simulating shared-bus networks was known. A network that can simulate in tex2html_wrap_inline3807 steps each step of any shared-bus network area of equal area was presented in [24]. However, the connections in that network are not fixed, but instead nodes communicate via reconfigurable buses.

next up previous
Next: 8 Sorting on butterflies Up: Randomized Routing and Sorting Previous: 6 Routing on multidimensional

Bruce Maggs
Mon Jul 22 22:57:44 EDT 1996