15-451 Algorithms 11/13/13 recitation notes

* Hand back quiz
* A few problems from old hwks/minis/tests related to approx algs and
  online algs
=================================================================
- Go over quiz.

- Here are some problems from prior years' minis and hwks related to
  approximation algorithms and online algorithms.

Problem 1. Strange SAT

We know the problem "Given a CNF formula, does it have a satisfying
assignment?" is NP-complete.  In fact, the problem "Given a CNF
formula, does there exist an assignment that satisfies exactly one
literal per clause?" is also NP-complete.  However, the problem "Given
a CNF formula, does there exist an assignment that satisfies an odd
number of literals in each clause?" *can* be solved in polynomial
time.  Why/how?

Answer: convert each clause into a linear equation mod 2: a clause
with literals l_1, ..., l_k becomes the equation l_1 + ... + l_k = 1
(mod 2), and a negated literal NOT(x) is just 1 + x (mod 2), so
negations only flip the constant on the right-hand side.  Then solve
the resulting system using Gaussian elimination.  Notice that Gaussian
elimination is fine mod 2 (unlike linear programming).  Here is an
example if you want to get a feel for it:

    x + y = 1          (mod 2)
    y + z = 0          (mod 2)
    y + w = 1          (mod 2)
    x + z + w = 0      (mod 2)

Comment: Notice that you've shown the following strange fact.  Say A
is the set of CNF formulas that have a satisfying assignment, B is the
set of CNF formulas that have an assignment satisfying an odd number
of literals per clause, and C is the set of CNF formulas that have an
assignment satisfying exactly one literal per clause.  Notice that C
is a subset of B, and B is a subset of A.  This implies that even
though it is NP-complete to determine whether a given formula belongs
to A, and NP-complete to determine whether a given formula belongs to
C, membership in B can be decided in polynomial time.  This is
interesting because given some formula F, if your algorithm says NO,
then we know that F is not in C, and if your algorithm says YES, then
we know that F *is* in A.
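For concreteness, here is a small sketch (not from the original notes)
of how the mod-2 Gaussian elimination might be coded up; the function
name solve_gf2 and the bitmask row representation are just
illustrative choices.  On the example system above it finds x=1, y=0,
z=0, w=1.

    def solve_gf2(equations, num_vars):
        """Solve a system of linear equations over GF(2), i.e., mod 2.

        Each equation is a pair (coeffs, rhs): coeffs is a 0/1 list of
        length num_vars, rhs is 0 or 1.  Returns one satisfying 0/1
        assignment, or None if the system is inconsistent.
        """
        pivots = {}  # pivot variable index -> (coefficient bitmask, rhs bit)
        for coeffs, rhs in equations:
            mask = sum(c << j for j, c in enumerate(coeffs))
            # Eliminate every existing pivot variable from the new row.
            for var, (pmask, prhs) in pivots.items():
                if mask >> var & 1:
                    mask ^= pmask
                    rhs ^= prhs
            if mask == 0:
                if rhs == 1:
                    return None   # reduced to 0 = 1: no assignment exists
                continue          # reduced to 0 = 0: redundant equation
            var = mask.bit_length() - 1   # highest variable becomes the pivot
            # Eliminate the new pivot from all earlier pivot rows.
            for v, (pmask, prhs) in list(pivots.items()):
                if pmask >> var & 1:
                    pivots[v] = (pmask ^ mask, prhs ^ rhs)
            pivots[var] = (mask, rhs)

        # Free variables default to 0; each remaining pivot row involves
        # only its own pivot variable and free variables.
        assignment = [0] * num_vars
        for var, (mask, rhs) in pivots.items():
            val = rhs
            for j in range(num_vars):
                if j != var and (mask >> j & 1):
                    val ^= assignment[j]
            assignment[var] = val
        return assignment

    # The example system, with variables x, y, z, w as indices 0..3:
    eqs = [([1, 1, 0, 0], 1),   # x + y         = 1
           ([0, 1, 1, 0], 0),   #     y + z     = 0
           ([0, 1, 0, 1], 1),   #     y     + w = 1
           ([1, 0, 1, 1], 0)]   # x     + z + w = 0
    print(solve_gf2(eqs, 4))    # -> [1, 0, 0, 1]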
Problem 2: The List-Update Problem.

Suppose we have n data items x_1, x_2, ..., x_n that we wish to store
in a linked list in some order.  Let's say the cost for performing a
lookup(x) operation is $1 if x is at the head of the list, $2 if x is
the second element in the list, and so on.  For instance, say there
are 4 items and it turns out that we end up accessing x_1 3 times, x_2
5 times, x_3 once, and x_4 twice.  In this case, in hindsight, the
best ordering for a linked list would have been (x_2, x_1, x_4, x_3),
with a total cost of $21.

The {\em Move-to-Front} (MTF) strategy is the following algorithm for
organizing the list if we don't know in advance how many times we will
access each element.  We begin with the elements in their initial
order (x_1, x_2, \ldots, x_n).  Then, whenever we perform a lookup(x)
operation, we move the item accessed to the front of the list.  Let us
say that performing the movement is free.  For instance, if the first
operation was lookup(x_3), then we pay $3, and afterwards the list
will look like (x_3, x_1, x_2, x_4, ...).

Your job is to prove that the total cost of the MTF algorithm on a
sequence of m operations (think of m as much larger than n) is at most
2C_{static} + n^2, where C_{static} is the cost of the best static
list in hindsight for those m operations (like in our first example).
We will prove this in two steps.

a. First prove the somewhat easier statement that the cost of
   Move-to-Front is at most 2C_{initial}, where C_{initial} is the
   cost of the original ordering (x_1, x_2, \ldots, x_n).

Answer: Use an amortized (potential function / piggy bank) argument.
Say that x_j is "cutting in line" in front of x_i if x_j is ahead of
x_i in MTF's current list even though j > i, i.e., x_j is behind x_i
in the original ordering.  Let the potential be the number of such
pairs; initially it is 0.  Now consider a lookup(x_i) that costs d,
i.e., x_i is currently in position d.  If d > i, then there must be at
least (d-i) elements cutting in line in front of it (of the d-1
elements ahead of x_i, at most i-1 can come before it in the original
ordering).  Notice also that by moving x_i to the front, these are no
longer cutting in front of x_i, and x_i is now cutting in line in
front of at most i-1 new elements (because by definition, there are at
most i-1 elements that x_i can possibly cut in line in front of).
Finally, no other pairs (x_j, x_k) have their ordering changed by our
movement of x_i.  So the cost plus the change in potential is at most
d - max(0, d-i) + (i-1), which is at most 2i-1.  Since the same lookup
costs i in the original ordering, the amortized cost is less than
twice what the original ordering pays; summing over all m operations
(the potential starts at 0 and never goes negative) gives a total MTF
cost of at most 2C_{initial}.

b. Now prove the 2C_{static} + n^2 bound.

Answer: Do exactly as in part (a), except define "cutting in line"
with respect to the optimal static list.  Of course the algorithm
doesn't know what that optimal list is, but that's fine since this is
just for the analysis.  Note also that we need to "seed" our piggy
banks with some initial cash, because the initial ordering may already
have pairs out of order relative to the optimal static list; there are
at most n^2 such pairs, so this costs us at most n^2.  (A small
simulation of this bound appears at the end of these notes.)

Note: one nice use of this is for data compression.  You store each
ascii character in a list in this way, and then when reading a string
of text, for each character you output its index i in the list before
moving the character to the front.  This requires only O(log i) bits,
which will be small if the item was close to the front of the list.
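To make the compression idea concrete, here is a tiny sketch of such a
move-to-front encoder (again not from the original notes; mtf_encode
and the toy alphabet are just for illustration).  Characters that
repeat soon after their last use get small indices, which a
variable-length code could then store in few bits.

    def mtf_encode(text, alphabet):
        """Move-to-front transform: output each character's current
        1-based position in the list, then move it to the front."""
        lst = list(alphabet)
        out = []
        for ch in text:
            i = lst.index(ch)
            out.append(i + 1)           # small index if ch was used recently
            lst.insert(0, lst.pop(i))   # the move itself is free
        return out

    print(mtf_encode("aabbbbc", "abcdefgh"))   # -> [1, 1, 2, 1, 1, 1, 3]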
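Finally, here is the small simulation sketch promised in part (b): it
runs MTF on the frequency profile from the opening example and checks
the total cost against 2C_{static} + n^2 (again just an illustrative
sketch; mtf_cost and best_static_cost are made-up names).

    from collections import Counter
    import random

    def mtf_cost(access_sequence, initial_order):
        """Total lookup cost paid by Move-to-Front."""
        lst = list(initial_order)
        total = 0
        for x in access_sequence:
            pos = lst.index(x)            # 0-based position of x
            total += pos + 1              # $1 for the head, $2 for second, ...
            lst.insert(0, lst.pop(pos))   # move the accessed item to the front
        return total

    def best_static_cost(access_sequence, items):
        """Cost of the best static list in hindsight: most frequent first."""
        counts = Counter(access_sequence)
        order = sorted(items, key=lambda x: -counts[x])
        return sum(counts[x] * (i + 1) for i, x in enumerate(order))

    items = ['x1', 'x2', 'x3', 'x4']
    # Frequencies from the example: x1 3 times, x2 5 times, x3 once, x4 twice.
    seq = ['x1'] * 3 + ['x2'] * 5 + ['x3'] + ['x4'] * 2
    random.shuffle(seq)                      # any access order will do
    c_static = best_static_cost(seq, items)  # = 21 for this profile
    print(mtf_cost(seq, items), "<=", 2 * c_static + len(items) ** 2)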