next up previous
Next: 3 Emulating the routing Up: 2 Routing on a Previous: 2.1 Routing in the

2.2 Routing in the weak switch model

Routing is more complicated in the weak switch model. A problem arises when several packets arrive on the same input bundle at the same step. Since a switch can examine only one edge in the bundle, it can forward only one of them, while the others must wait. The routing algorithm must guarantee that the total delay experienced by each packet is at most tex2html_wrap_inline1023 bit steps.

Our routing algorithm is a variant of Ranade's algorithm for routing on the butterfly [18, 19, 23]. In Ranade's algorithm, each packet is initially assigned an tex2html_wrap_inline1025 -bit random rank, which it carries in front of its destination header. The most important invariant maintained by the algorithm is that the stream of packets leaving each switch is arranged in order of non-decreasing rank. In order to keep the stream sorted, a switch never sends a packet until it is sure that no other packets with smaller rank will arrive later.

The problem with implementing Ranade's algorithm in the bit model is that it takes too long to compare the ranks of two packets at the same switch. In our algorithm, each packet is also assigned a random rank initially, and that rank does not change throughout the course of the routing. However, a packet does not carry its rank along with it. Instead, we introduce a new type of packet called a token. A token carries no information other than its type, and as a consequence consists of only tex2html_wrap_inline1027 bits. The algorithm uses these tokens to represent the ranks of the other packets. It maintains the invariant that a packet of rank r passes through a bundle only after exactly r tokens have passed through the bundle. This invariant guarantees that the packets pass through the bundle in order sorted by rank. A token can be considered to have a rank as well; a token of rank r passes through a bundle only after exactly r tokens have passed through the bundle. Note that the tokens are not the same as the ghost packets used in Ranade's algorithm, which carry a rank. Our algorithm does not use ghost packets.

The mechanics of the algorithm are fairly simple. A switch maintains a pointer to an edge in each of its two input and two output bundles. We call these edges the current input and output edges, respectively. The behavior of the switch depends on whether the current input queues contain packets or tokens. There are four cases. If both of the current input edges have tokens in their queues, then the switch sends tokens on both of the current output edges and advances all four pointers. If one of the current input edges has a token in its queue and the other a packet, then the switch strips off the first bit of the packet's header, shunts the packet to the current edge of the output bundle specified by that bit, and advances the pointers for the input and output bundles used by the packet. If both of the current input edges have packets in their queues, then the switch forwards one of them, and advances the pointers for the bundles used by that one. If either of the current input edges has an empty queue, then the switch does nothing.

Initially, each packet p is assigned a random rank tex2html_wrap_inline1039 in the range tex2html_wrap_inline1041 , where tex2html_wrap_inline1043 , and will be specified later. The input switches on level 0 send out a stream of packets interspersed with tokens so that r tokens precede each packet with rank r. A total of w tokens are sent on each bundle. The tokens with rank w-1 follow all of the other packets and tokens in the network. A routing phase terminates when every output has received a rank w-1 token on both of its input bundles. Rank w-1 tokens can be specially marked so that the outputs do not have to count the number of tokens that they have received.

During the course of the algorithm, each edge is used by either one packet, one token, or not at all. Because w tokens pass through each bundle, we further dilate the butterfly by adding tex2html_wrap_inline1061 edges to each bundle. By using the same edges to send both packets and tokens, it is possible to remove these extra w edges. However, the algorithm and its analysis become slightly more complicated because several tokens may be sent across the same edge, and each one must wait to be sent until a space frees up in the queue at the end of the edge.

The proof that the heads of the packets quickly reach their destinations uses a delay sequence argument like that in [18, 19, 23]. A k-delay sequence consists of four components: a path backwards from an output to an input, a sequence of (not necessarily distinct) switches appearing on the path in the order tex2html_wrap_inline1067 , a sequence tex2html_wrap_inline1069 of distinct packets, and a sequence tex2html_wrap_inline1071 of non-increasing ranks. We say that a k-delay sequence has occurred if packet tex2html_wrap_inline1075 passes through tex2html_wrap_inline1077 for tex2html_wrap_inline1079 , and tex2html_wrap_inline1081 for tex2html_wrap_inline1083 .

The following lemma is the crux of the delay sequence argument. It shows that if some packet is delayed, then a delay sequence must have occurred. For simplicity we analyze the first routing phase only.


Proof: We will construct a delay sequence by spending lag points. The lag of a packet on level l at time tex2html_wrap_inline1093 is tex2html_wrap_inline1095 . If some packet arrives at its destination at step tex2html_wrap_inline1097 , then it has lag tex2html_wrap_inline1099 . We use these tex2html_wrap_inline1101 points to build the sequence. For each point spent, we will either insert a packet into the sequence, or decrease the rank of the next packet in the sequence. Since there can be at most w decreases in the rank, the sequence must contain at least tex2html_wrap_inline1105 packets.

The construction of the delay sequence begins with the last packet to arrive at its destination. Suppose a packet tex2html_wrap_inline1107 arrives at its destination tex2html_wrap_inline1109 at time tex2html_wrap_inline1111 . First, we insert tex2html_wrap_inline1113 , tex2html_wrap_inline1115 , and tex2html_wrap_inline1117 into the delay sequence. Then we follow tex2html_wrap_inline1119 back in time until it was last delayed. The lag remains tex2html_wrap_inline1121 at each step on the path, until the step at which tex2html_wrap_inline1123 failed to move forward, where the lag drops to tex2html_wrap_inline1125 .

If tex2html_wrap_inline1127 was delayed at switch tex2html_wrap_inline1129 because another packet tex2html_wrap_inline1131 was sent instead, then tex2html_wrap_inline1133 , and we insert tex2html_wrap_inline1135 , tex2html_wrap_inline1137 , and tex2html_wrap_inline1139 into the sequence, and follow tex2html_wrap_inline1141 back in time. In this case, we have inserted one packet into the sequence at a cost of one lag point. In the butterfly network, once the paths of two packets diverge, they do not meet again. Thus, if tex2html_wrap_inline1143 delays tex2html_wrap_inline1145 , then tex2html_wrap_inline1147 remains ahead of tex2html_wrap_inline1149 as long as their paths intersect. As a consequence, no packet is inserted into the sequence more than once.

If tex2html_wrap_inline1151 was delayed because a token tex2html_wrap_inline1153 was sent instead, then tex2html_wrap_inline1155 . In this case we do not insert anything into the sequence, but instead follow tex2html_wrap_inline1157 back in time until it was last delayed. In this case, we have lost one point in lag, but have also decreased the rank of the next packet in the sequence by at least one point.

The third possibility is that tex2html_wrap_inline1159 was delayed because the current edge on the other input bundle, c, had nothing to send. In this case we do not insert anything into the sequence, but go back through c to a switch s at the previous level at the previous time step. At the previous time step, switch s must not have sent anything through bundle c. This can happen for one of two reasons. Either s sent a packet tex2html_wrap_inline1173 on its other bundle, or one of the input bundles tex2html_wrap_inline1175 to s had nothing to send. In the former case, we insert tex2html_wrap_inline1179 , tex2html_wrap_inline1181 , and tex2html_wrap_inline1183 into the sequence, and follow tex2html_wrap_inline1185 back until it was last delayed. In the latter case, we go back through tex2html_wrap_inline1187 to a switch tex2html_wrap_inline1189 at the previous level at the previous time step and continue. Continuing in this fashion, we must eventually reach a switch that sent a packet.

Now that we have analyzed the ways in a which a packet can be delayed, we must consider the ways in which a token can be delayed. Suppose we are following a token tex2html_wrap_inline1191 back in time until it was last delayed at a switch s.

If tex2html_wrap_inline1195 was delayed because s sent a packet p instead, then we insert tex2html_wrap_inline1201 , tex2html_wrap_inline1203 , and tex2html_wrap_inline1205 into the sequence.

Otherwise, if tex2html_wrap_inline1207 was delayed because s sent another token tex2html_wrap_inline1211 instead, then tex2html_wrap_inline1213 , and we follow tex2html_wrap_inline1215 back until it was last delayed.

The last possibility is that tex2html_wrap_inline1217 was delayed because one of the input bundles c to s had nothing to send. In this case, we go back through c just as if a packet had been delayed for the same reason.

The preceding case analysis shows that for each point of lag spent, either a packet is inserted into the delay sequence, or the rank of the next packet in the sequence decreases. Thus, the sequence must contain at least tex2html_wrap_inline1225 packets.

to .667emto .667em


Proof: To bound the probability that some packet is delayed for tex2html_wrap_inline1235 steps, we now need only to bound the probability that some tex2html_wrap_inline1237 -delay sequence occurs. To do so, we enumerate all possible delay sequences, and sum the probabilities that each occurs. The argument is similar to the ones in [18, 19, 23]. There are tex2html_wrap_inline1239 ways to choose the path, since there are tex2html_wrap_inline1241 choices for the output at which it starts, tex2html_wrap_inline1243 choices for the input at which it ends, and only one path between them. Once the path is fixed, there are at most tex2html_wrap_inline1245 ways of choosing switches tex2html_wrap_inline1247 on the path. Let tex2html_wrap_inline1249 denote the level of switch tex2html_wrap_inline1251 . Then there are at most tex2html_wrap_inline1253 ways to choose tex2html_wrap_inline1255 . Finally, there are at most tex2html_wrap_inline1257 ways of choosing a sequence tex2html_wrap_inline1259 of non-increasing ranks in the range tex2html_wrap_inline1261 . A delay sequence occurs only if tex2html_wrap_inline1263 passes through tex2html_wrap_inline1265 and tex2html_wrap_inline1267 , for tex2html_wrap_inline1269 . For the delay sequence constructed thus far, the probabilities that these two events occur are tex2html_wrap_inline1271 , and tex2html_wrap_inline1273 , respectively. Thus, the probability that any delay sequence occurs is at most


Using the inequality tex2html_wrap_inline1277 for tex2html_wrap_inline1279 , it is straightforward to show that for any tex2html_wrap_inline1281 , there are constants tex2html_wrap_inline1283 and tex2html_wrap_inline1285 , such that for tex2html_wrap_inline1287 , and tex2html_wrap_inline1289 , tex2html_wrap_inline1291 , the product is at most tex2html_wrap_inline1293 .

to .667emto .667em

The analysis for the second phase of routing is similar. If we require the first phase to be completed before the second phase begins, then each takes tex2html_wrap_inline1295 bit steps, and the total time is tex2html_wrap_inline1297 bit steps. In fact, it is possible to pipeline the two phases so that some packets begin the second phase before the first is finished. The resulting algorithm is simpler and still requires only tex2html_wrap_inline1299 bit steps. The analysis is a little more complicated because a packet tex2html_wrap_inline1301 can delay another packet tex2html_wrap_inline1303 in the first phase, and tex2html_wrap_inline1305 can delay tex2html_wrap_inline1307 in the second.

Once the head of a packet reaches its true destination, the M bits of data following behind it trickle into the destination in tex2html_wrap_inline1311 bit steps. Since, with high probability, all of the heads reach their destinations in tex2html_wrap_inline1313 bit steps, the total time is also tex2html_wrap_inline1315 with high probability.

next up previous
Next: 3 Emulating the routing Up: 2 Routing on a Previous: 2.1 Routing in the

Bruce Maggs
Mon Jul 22 17:37:10 EDT 1996