15-750 Graduate Algorithms                                January 25, 2001
Daniel Sleator

* splay trees continued
--------------------------------------------------------------------------

In this lecture we'll see how the access lemma can be applied to prove
more advanced properties of splay trees. We take advantage of the fact
that we can assign arbitrary positive values to the weights of the nodes.

In the last lecture, we proved the following:

Access Lemma: The amortized number of splaying steps done when splaying
node x in a tree with root t is at most 3(r(t)-r(x))+1.

Actually, something a bit stronger is true. By defining rank to be the
log of the size of a node (instead of the floor of the log), we can prove
a stronger lemma:

Strong Access Lemma: The amortized number of rotations done when splaying
node x in a tree with root t is at most 3(r(t)-r(x))+1.

Here r(t) and r(x) denote the continuous definition of rank. The bound in
this lemma is approximately a factor of two tighter than the previous
one. Throughout these notes, we'll use the strong access lemma and the
continuous definition of rank. The main advantage here is simply that we
can stop writing floors, because in these notes we're only trying to
prove big-oh bounds.

Let's rewrite the access lemma a bit. Let w(x) be the weight of node x.
Then r(x) = log(w(x) + total weight of the other descendants of x).
Since the total weight of the other descendants of x is non-negative, we
have:

    r(x) >= log(w(x))

So the amortized cost of a splay is bounded by 3(r(t) - log(w(x))) + 1.

The theorems below concern bounds on the running times of sequences of
accesses in splay trees. (The set of nodes in these trees is static.)
We'll use the rotation bound of the strong access lemma as our measure of
the running time of a sequence of operations.

Recall the following consequence of the definition of amortized cost:

    total real cost = total amortized cost
                        + initial potential - final potential

To bound the cost of a sequence, we'll bound the total amortized cost
(using the access lemma) and bound the total decrease in potential that
could result from the sequence.

NOTATION: There will be n items in the tree, numbered 1...n. The letter i
will indicate an index into this list of items. We'll be considering a
sequence of m accesses, and the index j will refer to one of these
accesses. Let i[j] denote the item accessed by the jth access.

BALANCE THEOREM: The total access time is O(m + (n+m) log(n)).

PROOF: Assign a weight of 1/n to each item. Then the total weight is 1
and the rank of the root is 0. So the access lemma bounds the amortized
cost of an access by 3 log(n) + 1. Summing this over all accesses gives
O(m + m log(n)).

For any node i:

    0 >= r(i) >= log(1/n)

Write this as:

    0 <= -r(i) <= log(n)

Thus, the net decrease in the potential is at most log(n) per node, and
totals at most n log(n). QED.

STATIC OPTIMALITY THEOREM: Let q(i) be the number of times item i is
accessed, and let m = Sum_i q(i) be the total number of accesses. Assume
that q(i) > 0 for each i. Then the total cost of the sequence is:

    O(m + Sum_j log(m/q(i[j])))

PROOF: Choose w(i) (the weight of item i) to be q(i)/m. The total weight
of all items is 1. So the amortized cost of an access to element i is at
most:

    3(r(root) - log(w(i))) + 1 = 3(-log(q(i)/m)) + 1 = 3 log(m/q(i)) + 1

Summing this, we see that the total amortized cost satisfies the bound
stated in the theorem. It remains to bound the decrease in potential.
Each item contributes its rank to the potential.
We have the following bounds on ranks:

    0 >= r(i) = log(size(i)) >= log(w(i))

Rewriting this:

    0 <= -r(i) <= log(1/w(i))

So the maximum amount that a node's potential can decrease is just
log(1/w(i)), which is log(m/q(i)). Summing this we see:

    total potential decrease <= Sum_i log(m/q(i))

Since each element is accessed at least once, the terms in this summation
are a subset of the terms in the bound we're trying to prove in the
theorem. So this additional cost can be absorbed into the big-oh. QED.

NOTE 1: The reason we need to assume that each element is accessed at
least once is to guard against the possibility that we start out with a
very bad tree which contains millions of elements near the top of the
tree that are never accessed, and we are forced to incur a cost to get by
them.

NOTE 2: This is called the static optimality theorem because the static
tree that is constructed to be optimal for this specific access sequence
will incur a cost of O(log(m/q(i))) to access item i.

STATIC FINGER THEOREM: Let f be a specific element called the finger.
Then the following is a bound on the cost of splaying a sequence:

    O(m + n log(n) + Sum_j log(|f - i[j]| + 1))

NOTE: |f-i| is the distance in the symmetric ordering of the items
between the finger and item i.

PROOF: Assign the weights w(i) as follows:

    w(i) = 1/(|f-i| + 1)^2

Since Sum 1/n^2 (n from 1 to infinity) is a constant (pi^2/6), the sum of
all the weights is bounded by a constant (at most twice this).

Let's look at the amortized cost of an access. Recall that it is at most
3(r(t) - log(w(x))) + 1. The term r(t), the rank of the root, is bounded
by a constant, and sums to O(m). The 1 term also sums to O(m). The
-log(w(x)) term can be written:

    -log(1/(|f-i| + 1)^2) = 2 log(|f-i| + 1)

Combining these estimates shows that the total amortized cost is bounded
by the summation stated in the theorem.

It remains to bound the decrease in potential. What bound can we put on
the rank of an item i?

    log(2 pi^2/6) >= r(i) >= 2 log(1/(|f-i| + 1))

Negating this we get:

    -log(2 pi^2/6) <= -r(i) <= 2 log(|f-i| + 1)

So the potential decrease of a node is bounded by O(log n). This
contributes O(n log n) to the total real cost. QED.

Some other theorems that we're not going to prove:

WORKING SET THEOREM: Let t(j) be the number of accesses of different
items that occurred between access j and the previous access of the same
item. Then the cost of the sequence is bounded by:

    O(m + n log(n) + Sum_j log(t(j) + 1))

NOTE: This is proven by using a weight assignment very similar to the one
used in the proof of the static finger theorem. Imagine that we're
maintaining a move-to-front list (the accessed element moves to the
front). When an item is accessed, the number of elements it passes as it
moves to the front is t(j). If an item is in position p (positions
numbered 1, 2, 3, ...) in the list, then it has a weight of 1/p^2. (A
small sketch of computing t(j) this way is given after the sequential
access theorem below.)

The following theorems do not follow from the access lemma.

SEQUENTIAL ACCESS THEOREM: The cost of the access sequence that accesses
each of the n items in the tree in symmetric (left-to-right) order is
O(n).
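Aside (not from the original notes): to make the working-set quantity
t(j) concrete, here is a minimal Python sketch that computes t(j) for an
access sequence by maintaining the move-to-front list described in the
note above. The function name working_set_numbers and the convention that
a first access counts all distinct items seen so far are my own choices.

    def working_set_numbers(accesses):
        """For each access j, return t(j): the number of distinct items
        accessed between access j and the previous access of the same
        item."""
        mtf = []                    # move-to-front list, most recent first
        ts = []
        for x in accesses:
            if x in mtf:
                pos = mtf.index(x)  # items ahead of x are exactly the
                ts.append(pos)      # distinct items accessed since x was
                mtf.pop(pos)        # last accessed
            else:
                ts.append(len(mtf)) # first access of x: count everything
                                    # seen so far (one possible convention)
            mtf.insert(0, x)        # move x to the front
        return ts

    # Example: working_set_numbers([1, 2, 3, 1, 1, 2]) == [0, 1, 2, 2, 0, 2]

In the weight assignment sketched in the note, the item in position p of
this list would get weight 1/p^2, mirroring the static finger weights.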
The sequential access theorem is generalized by the following theorem,
the proof of which is very difficult:

DYNAMIC FINGER THEOREM: Using the notation of the static finger theorem,
a bound on the splaying cost is:

    O(m + n + Sum_j log(|i[j-1] - i[j]| + 1))

The two theorems above are special cases of the following conjecture:

DYNAMIC OPTIMALITY CONJECTURE: Consider the following class of
binary-tree based algorithms for processing a sequence of requests: on
each access we pay the distance from the root to the accessed node.
Arbitrary rotations are also allowed, at a cost of 1 per rotation. For a
sequence Sigma, let C(T,Sigma) be the cost of the optimal algorithm in
this class for processing Sigma starting from tree T. Then the cost of
splaying the sequence (starting from any tree) is O(C(T,Sigma) + n).

What about dynamic operations that change the set of elements in the
tree? Here's how we can do Insertion, Join, and Deletion.

Insertion: Say we want to insert a new item e. Proceed like ordinary
binary tree insertion, but stop short of linking the new item e into the
tree. Let the node you were about to link e to be called p. We splay p to
the root. Now construct a tree according to one of the following
pictures:

          e              e
         / \            / \
        p                  p
       / \                / \

depending on whether the item e comes before or after p.

Let's analyze this when all the weights are 1. The amortized cost of the
splay is O(log n) according to the access lemma. Then we have to pay an
additional amortized cost equal to the potential of the new root, which
is log(n+1). So the amortized cost of the whole thing is O(log n).

An alternative insertion algorithm is to do ordinary binary tree
insertion, and then simply splay the new item to the root. This also has
efficient amortized cost, but the analysis is a tiny bit more
complicated.

Join: Here we have two trees, say A and B. We want to put them together
with all the items in A to the left of all the items in B. This is called
the "Join" operation. It is done as follows. Splay the rightmost node of
A to the root; it now has no right child, so make B its right child. The
amortized cost of the operation is easily seen to be O(log n), where n is
the number of nodes in the tree after joining. (Just assign uniform
weights to all the nodes, as we did in the insertion analysis.)

Deletion: To delete a node x, simply splay it to the root and then Join
its left and right subtrees together as described above. The amortized
cost is easily seen to be O(log n), where n is the size of the tree
before the deletion.
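Aside (not from the original notes): to make these dynamic operations
concrete, here is a minimal Python sketch of splay-based insertion, join,
and deletion along the lines described above. It uses a simple recursive
splay routine, which pairs its zig-zig/zig-zag steps from the top of the
search path rather than from the bottom, so it is a common variant rather
than the exact bottom-up procedure analyzed by the access lemma. The
names Node, splay, insert, join, and delete are my own, keys are assumed
to be numbers, and nothing here is tuned for performance.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_right(t):
        l = t.left
        t.left, l.right = l.right, t
        return l

    def rotate_left(t):
        r = t.right
        t.right, r.left = r.left, t
        return r

    def splay(t, key):
        # Bring the node with `key` (or the last node on its search path)
        # to the root, using zig, zig-zig, and zig-zag steps.
        if t is None or key == t.key:
            return t
        if key < t.key:
            if t.left is None:
                return t
            if key < t.left.key:                      # zig-zig
                t.left.left = splay(t.left.left, key)
                t = rotate_right(t)
            elif key > t.left.key:                    # zig-zag
                t.left.right = splay(t.left.right, key)
                if t.left.right is not None:
                    t.left = rotate_left(t.left)
            return t if t.left is None else rotate_right(t)   # final zig
        else:
            if t.right is None:
                return t
            if key > t.right.key:                     # zig-zig
                t.right.right = splay(t.right.right, key)
                t = rotate_left(t)
            elif key < t.right.key:                   # zig-zag
                t.right.left = splay(t.right.left, key)
                if t.right.left is not None:
                    t.right = rotate_right(t.right)
            return t if t.right is None else rotate_left(t)   # final zig

    def insert(t, key):
        # Splay the would-be parent p of the new key to the root, then make
        # the new key the root, with p on one side and p's far subtree on
        # the other.
        if t is None:
            return Node(key)
        t = splay(t, key)
        if key == t.key:
            return t                                  # already present
        if key < t.key:
            new_root = Node(key, left=t.left, right=t)
            t.left = None
        else:
            new_root = Node(key, left=t, right=t.right)
            t.right = None
        return new_root

    def join(a, b):
        # Assumes every key in `a` is smaller than every key in `b`.
        if a is None:
            return b
        a = splay(a, float('inf'))    # splay the rightmost node of `a`
        a.right = b                   # its right child is now empty
        return a

    def delete(t, key):
        # Splay the item to the root, then join its two subtrees.
        t = splay(t, key)
        if t is None or key != t.key:
            return t                  # key not in the tree
        return join(t.left, t.right)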