15-750 Graduate Algorithms                                  January 23, 2001

* Amortized analysis
  (Lecture notes by D. Sleator, adapted from those of A. Blum)

----------------------------------------------------------------------------

Amortized analysis
------------------

 -- definition and motivation
 -- example of counting
 -- example of implementing a FIFO queue with two stacks
 -- example of doubling-array for stack
 -- potential functions

Amortized analysis means analyzing time-averaged cost for a sequence of
operations.  The motivation is that traditional worst-case-per-operation
analysis can give an overly pessimistic bound if the only way of having an
expensive operation is to have a lot of cheap ones before it.

NOTE: this is DIFFERENT from our usual notion of "average case analysis" --
we're not making any assumptions about inputs being chosen at random --
we're just averaging over time.

The approach is going to be to somehow assign an artificial cost to each
operation in the sequence.  This artificial cost is called the _amortized
cost_ of an operation.  The key property required of amortized cost is that
the total real cost of the sequence must be bounded by the total of the
amortized costs of all the operations.  Then, for purposes of analyzing an
algorithm that, say, accesses a data structure, it is okay to just use the
amortized cost instead of the actual cost of each operation.  This will
give you correct results.

Note: there is sometimes flexibility in the assignment of amortized costs.

There are going to be three approaches that we call:

    The aggregate method
    The banker's method (tokens in the data structure)
    The physicist's method (potential functions)

(The physicist's method is just a slightly more formal version of the
banker's method, as we'll see.)  We'll illustrate these methods through
some examples.


Example 1: a binary counter
---------------------------

Say we want to store a big binary counter in an array A.  Start with all
entries at 0.  The operation we are going to implement and analyze is that
of counting.  The algorithm we'll use for incrementing this counter is the
usual one: we toggle bit A[0]; if it changed from 1 to 0 (a carry), then we
toggle bit A[1], etc.  We stop when a bit changes from 0 to 1.  The cost of
an increment is the number of bits that change.

     A[m]  A[m-1] ...... A[3]  A[2]  A[1]  A[0]      cost
    --------------------------------------------     ----
       0      0            0     0     0     0
       0      0            0     0     0     1          1
       0      0            0     0     1     0          2
       0      0            0     0     1     1          1
       0      0            0     1     0     0          3
       0      0            0     1     0     1          1
       0      0            0     1     1     0          2

(Each row shows the counter after an increment; the cost column is the
number of bits that increment flipped.)

The number of bits that change when the increment produces a number n is at
most 1+floor(lg n) -- that's just the number of bits in the binary
representation of n.  Thus the worst case for a single increment is
1+floor(lg n), so a sequence of n increments has worst-case total cost
bounded by n(1+floor(lg n)) = O(n log n).

But what is our *amortized* cost per increment?  Answer: 2.

Proof 1 (aggregate method):

How often do we flip A[0]?  Answer: every time.  How often do we flip A[1]?
Answer: every other time.  How often do we flip A[2]?  Answer: every 4th
time.  Etc.  So the total cost spent on flipping A[0] is n, the total cost
on A[1] is floor(n/2), the total cost on A[2] is floor(n/4), etc.  So

    total cost  =  n + floor(n/2) + floor(n/4) + ...
               <=  n + n/2 + n/4 + n/8 + ...
               <=  2n

So the total cost is at most 2n, which means the amortized cost of an
increment is 2.
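Before moving on to the banker's-method proof, here is a small Python
sketch of the increment algorithm above, instrumented to report its actual
cost (the number of bits toggled).  The function name, the 20-bit array
size, and the driver loop are our own illustrative choices, not part of the
original notes; running it just confirms the aggregate bound that n
increments cost at most 2n in total.

    def increment(A):
        """Increment the counter in list A (A[0] is the low-order bit).
        Return the actual cost, i.e. the number of bits toggled."""
        cost = 0
        i = 0
        while i < len(A) and A[i] == 1:   # carries: toggle trailing 1s to 0
            A[i] = 0
            cost += 1
            i += 1
        if i < len(A):                    # toggle the first 0 to a 1 and stop
            A[i] = 1
            cost += 1
        return cost

    if __name__ == "__main__":
        A = [0] * 20                      # 20 bits: plenty for 1000 increments
        n = 1000
        total = sum(increment(A) for _ in range(n))
        print(total, "<=", 2 * n)         # total actual cost never exceeds 2n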
Proof 2 (banker's method):

Let's use a kind of accounting trick.  On every bit that is a 1, let's keep
a dollar on that bit.  So for example, if the current count is 6, we'd
have:

                      $   $
    array:  0  ....   1   1   0

We'll use the convention that whenever we toggle a bit, we must pay a
dollar to do that.  Let's say we allocate $2 to do an increment, and see
how much the increment actually costs.  In general, a bunch of low-order
bits change from 1 to 0, and then one bit changes from a 0 to a 1, and the
process terminates.  For each of the bits that changes from 1 to 0, we have
a dollar sitting on the bit to pay for toggling it.  For the bit that
changes from a 0 to a 1, we have to pay a dollar to toggle the bit, then
put a dollar on that bit (for future use).  Thus, having allocated $2 for
the increment always guarantees that we will have enough money to pay for
the work, no matter how costly the increment actually is.  This completes
proof 2 that the amortized cost of the increment is 2.


Example 2: implementing a FIFO queue with two stacks
----------------------------------------------------

Say you have a stack data type, and you need to implement a FIFO queue.
The stack has the usual POP and PUSH operations, and the cost of each
operation is 1.  We can implement a FIFO queue using two stacks as follows.
The FIFO has two operations, ENQUEUE and DEQUEUE:

    ENQUEUE(x):  PUSH x onto stack1.

    DEQUEUE():   If stack2 is empty, then POP the entire contents of stack1
                 and PUSH them onto stack2.  Now simply POP from stack2 and
                 return the result.

It's easy to see that this algorithm is correct.  Here we'll just worry
about the running time.  (I'm going to ignore the cost of checking whether
stack2 is empty, and only measure the cost in terms of the number of PUSHes
and POPs that are done.)

Claim: the amortized cost of ENQUEUE is 3 and the amortized cost of DEQUEUE
is 1.

Proof 1 (aggregate method):

As an element flows through the two-stack data structure, it is PUSHed at
most twice and POPped at most twice.  This already shows that we can assign
an amortized cost of 4 to ENQUEUE and 0 to DEQUEUE.  To get the desired
result, note that of these (at most) four operations, only the final POP
from stack2 is done by a DEQUEUE; the other three -- the PUSH onto stack1,
the POP from stack1, and the PUSH onto stack2 -- can be charged to the
element's ENQUEUE.  So an amortized cost of 3 per ENQUEUE pays for
everything except that last POP, which is paid for by the DEQUEUE (if it
happens).

Proof 2 (banker's method):

Maintain a collection of tokens on stack1; in fact, keep 2 tokens on each
element in that stack.  When we ENQUEUE, we have three tokens to work with:
we use one to PUSH the element onto stack1, and the other two are put on
the element for future use.  When we have to move stack1 into stack2, we
have enough tokens in the stack to pay for the move (one POP and one PUSH
for each element).  Finally, the last POP done by the DEQUEUE is paid for
by the 1 token we allocated for it.  QED.


Example 3: doubling array
-------------------------

We'll use an array to implement a stack that allows push and pop
operations.  We represent the stack as an array A[] and an integer variable
top that points to the top of the stack.  Here is a naive implementation of
push and pop:

    push(x):  top++; A[top] = x;
    pop():    top--; return A[top+1];

These operations are both constant time (cost = 1).  The problem is: what
happens when the array gets full?  To deal with this, we keep another
variable L storing the size of the array, and a variable k keeping count of
the number of elements of the array that are in use.  When we are about to
overflow the available space (k = L), we allocate an array twice as large,
move the data over to the new array, free the old array, and double L.
This operation is called "doubling".  It will be convenient to analyze it
as though it were a separate operation on the data structure, one that
occurs only when k = L.
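Here is a minimal Python sketch of the stack just described.  The class and
method names (DoublingStack, _double) and the starting array size of 1 are
our own illustrative choices, not something the notes specify; the
variables L and k play exactly the roles described above.

    class DoublingStack:
        def __init__(self):
            self.L = 1                   # current array size
            self.k = 0                   # number of slots in use
            self.A = [None] * self.L

        def push(self, x):
            if self.k == self.L:         # about to overflow: double first
                self._double()
            self.A[self.k] = x
            self.k += 1

        def pop(self):
            self.k -= 1
            return self.A[self.k]

        def _double(self):
            # The "doubling" operation: cost L, since we move L items
            # into an array twice as large.
            new = [None] * (2 * self.L)
            for i in range(self.L):
                new[i] = self.A[i]
            self.A = new
            self.L *= 2

Note that push calls _double before writing, so the expensive copy happens
exactly when k = L, matching the analysis that follows.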
If the stack is full (it has size L) and we apply the doubling operation,
the cost of the operation is L, because we have to move L items into the
new array.  (The cost of allocating and freeing the arrays is O(L).)

Starting from an empty stack, how much does any sequence of n pushes and
pops cost?

First analysis:

    Total cost for the pushes and pops is n.
    Total cost for doubling is at worst  1 + 2 + 4 + ... + n/2 + n  <  2n.

So the total cost is at most 3n.  Therefore we can say that the amortized
cost of an operation is 3.

Second analysis:

Again, we use the financial approach.  We'll keep money lying around the
data structure in the following way: we keep $2 on each element stored in
the stack that is BEYOND THE MIDPOINT of the current array.

                         $ $ $ $
                         $ $ $ $
    +-------------------------------+
    |x|x|x|x|x|x|x|x|x|x|x|x| | | | |
    +-------------------------------+
                    ^
                    |
                 midpoint

In the normal case when we do a push (the array is not full), we need $1 to
do the work, and then at most $2 to put onto the new item we just pushed.
Thus the cost is $3.  (A pop is even cheaper.)  Now what happens if we have
to do a doubling operation?

    Before doubling:

                         $ $ $ $ $ $ $ $
                         $ $ $ $ $ $ $ $
    +-------------------------------+
    |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|
    +-------------------------------+
                    ^
                    |
                 midpoint
    <---------------L--------------->

    After doubling:

    +-------------------------------+-------------------------------+
    |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x| | | | | | | | | | | | | | | | |
    +-------------------------------+-------------------------------+
                                    ^
                                    |
                                 midpoint

You can see that the money we had on the items ($L in total) was sufficient
to pay for all the work of doubling the array (moving L items).  We
conclude that if we allocate $3 for each operation, we'll never run out of
money.  Thus the amortized cost of an operation is 3.

In these examples the financial model was not really necessary to get the
desired results.  Later, however, we'll see examples where this approach is
essential.


Potential Functions (The Physicist's Method)
--------------------------------------------

Let's say that instead of distributing our money all over the data
structure (as we did above), we keep it all in a piggy bank.  What's really
important is how much money is in the piggy bank.  We'll call the amount of
money in the bank the "potential function" (Phi) for the problem.  So for
the problems above, we used the following potential functions:

    Phi(counter) = the number of one bits in the counter

    Phi(queue)   = 2 * (the size of stack1)

                   [ 2(k - L/2)   if k >= L/2
    Phi(stack)   = [
                   [ 0            otherwise

(Here L is the current array size, and k is the number of elements
currently in the stack.)

Using this formalism we can define the amortized cost of an operation.  Say
the system changes from state S to state S' as a result of doing some
operation.  We define the amortized cost of the operation as follows:

    amortized cost = actual cost + Delta(Phi)
                   = actual cost + Phi(S') - Phi(S)

This is simply the amount of additional money that we need to maintain our
piggy bank and to pay for the work.

For the counter, with the potential function given above:

    amortized cost of an increment  <=  2

For the stack, with the potential function given above:

    amortized cost of a push  <=  3

(A push costs 1 and raises the potential by at most 2; a pop is even
cheaper; and the doubling operation, which costs L, is paid for exactly by
the potential dropping from L to 0, so its amortized cost is 0.)

How is this amortized cost related to actual cost?
Let's sum the above definition of amortized cost over all the operations:

    Sigma(amortized cost) = Sigma(actual cost) + Phi(final) - Phi(initial)

or

    Sigma(actual cost) = Sigma(amortized cost) + Phi(initial) - Phi(final)

If the potential is always non-negative and starts at zero (as it does in
our examples), then

    Sigma(actual cost) <= Sigma(amortized cost)

In this more general framework the potential can be negative and may not
start at 0, so in general we have to worry about the initial and final
potentials.

Summary of using potential functions to do amortized analysis:

    (1) Pick a potential function that's going to work (this is art).

    (2) Using your potential function, bound the amortized cost of the
        operations you're interested in.

    (3) Bound Phi(initial) - Phi(final).

In terms of (1), one obvious point is that if the actual cost of an
operation is HIGH, and you want its amortized cost to be LOW, then the
potential must DECREASE by a lot during that operation to pay for it.  This
is illustrated by both of the examples in this lecture.

We'll see in the next lecture just how essential this formalism is for
analyzing splay trees.
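As a concrete (and entirely optional) check of this formalism, here is a
small Python sketch that instruments the two-stack queue of Example 2 with
the potential Phi = 2 * (size of stack1) and verifies, operation by
operation, that actual cost + Delta(Phi) never exceeds the claimed
amortized costs of 3 per ENQUEUE and 1 per DEQUEUE.  The function names and
the random workload are our own illustrative choices, not part of the
original notes.

    import random

    def phi(stack1):
        # Potential function for the two-stack queue, as defined above.
        return 2 * len(stack1)

    def enqueue(s1, s2, x):
        s1.append(x)                  # one PUSH onto stack1
        return 1                      # actual cost

    def dequeue(s1, s2):
        cost = 0
        if not s2:
            while s1:                 # move stack1 over: one POP + one PUSH each
                s2.append(s1.pop())
                cost += 2
        s2.pop()                      # the POP that answers the DEQUEUE
        return cost + 1               # actual cost

    if __name__ == "__main__":
        s1, s2, size = [], [], 0
        for _ in range(10000):
            before = phi(s1)
            if size == 0 or random.random() < 0.5:
                actual, size, bound = enqueue(s1, s2, 0), size + 1, 3
            else:
                actual, size, bound = dequeue(s1, s2), size - 1, 1
            amortized = actual + phi(s1) - before
            assert amortized <= bound   # 3 for ENQUEUE, 1 for DEQUEUE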