The first step in the analysis is to show that, with high probability,
the number of splitter candidates chosen by each butterfly is
within a constant factor of the expectation. We say that an M-input
butterfly is well-partitioned if the number of splitter
candidates chosen is between
and
. The
upper bound ensures that the candidates can be sorted
deterministically by the butterfly in
time and the
lower bound implies that the subbutterflies at the next
level of recursion will have at most
inputs. If
all of the butterflies are well-partitioned, then the algorithm
terminates after
levels of recursion. (The choice of
and
as the coefficients of
are not particularly
important. Other constants would serve equally well.)
Proof:
We begin by considering a single M-input butterfly that
is to sort n packets. Since each packet chooses independently to be
a candidate, the number of candidates has a binomial distribution.
Let S be the number of successes in r independent Bernoulli trials
where each trial has probability p of success. Then we have
. We estimate the area under
the tails of this binomial distribution using a Chernoff-type bound
[8]. Following Angluin and Valiant [3]
we have
for
and
.
In our application r = n,
,
, and
. For any fixed constant
, there is a constant
such that the right-hand sides of the two inequalities sum
to at most
for
.
To bound the probability that any butterfly is not well-partitioned,
we sum the probabilities for all of the individual butterflies. Over
the course of the algorithm, the algorithm is invoked on at most
individual butterflies. Thus, the sum is at most
. For any
, there is a
such that this sum is
at most
.
The next lemma shows that, with high probability, the number of
packets in each interval is at most a constant factor times its
expectation. We say that an M-input butterfly that is assigned
n packets to sort is
-split if every interval has size
at most
. As we shall see, if every
butterfly is
-split and there are
levels of
recursion, then by lightly loading the butterfly we can ensure that no
butterfly is assigned too many packets to sort.
Proof:
We begin by examining a single packet in a single
M-input butterfly that is to sort n-packets. To show that a
packet lies in an interval of size at most
it
is sufficient to show that both following and preceding it in the
sorted order at least
of the next
packets are candidates.
First we consider the packets that follow in the sorted order. The
number of candidates in a sequence of
packets
has a binomial distribution. For
,
,
,
, and
, we have
. For any
we can make the right-hand side smaller than
by choosing
large enough.
The calculation for the packets that precede in the sorted order are
identical. The probability that fewer
of the preceding
packets are candidates is at most
.
Thus, the probability that an individual packet lies in an interval of
size greater than
is at most
.
To bound the probability that any interval in the butterfly is too
large we sum the probabilities that each individual packet lies in an
interval that is too large. Since there are at most
packets, this sum is at most
.
To bound the probability that any butterfly is not
-split, we sum
the probabilities that each individual butterfly is not. Over the
course of the algorithm, the algorithm is invoked on at most
butterflies. The sum of the probabilities is at most
. For any constant
, we can make this sum at most
by making
large enough.
The remainder of the analysis is conditioned on the event that every
subbutterfly is well-partitioned and
-split, which occurs with
high probability. Two technical points bear mentioning. First,
Lemma 8.1 requires that the number of inputs
to every subbutterfly be at least
, where
is some
constant. Since the recursion terminates when the number of inputs is
, N must be large enough that
. Second, both Lemmas 8.1
and 8.2 hold independent of the number of packets to be
sorted by each subbutterfly. Thus, as the following lemmas show, we
can adjust the load on the butterfly in order to ensure that each
M-input subbutterfly receives at most M packets to sort.
Proof:
At each level of recursion the number of inputs drops from M
to at most
, until the number of inputs reaches
.
Proof:
Since the ratio of packets to inputs is
at
the top level of the recursion, and increases by at most a constant
factor at each of
levels, it is possible to choose
such that at the bottom level it will be at most one.