15-451 Algorithms 10/05/04
* Dynamic Programming
=====================================================================
Dynamic Programming: Powerful technique that often allows you to solve
a problem in time O(n^2) or O(n^3) where a naive approach would take
exponential time. Usually, to get speed down below that (if
possible), need to add other ideas. Today: 2 or 3 examples.
There are several ways of thinking about the basic idea.
Basic Idea (version #1): What we want to do is take our problem and
somehow break it down into a reasonable number of subproblems (where
"reasonable" might be something like n^2) in such a way that we can
use solutions to the smaller subproblems to solve the bigger ones.
Unlike "divide and conquer" (like mergesort or quicksort) it is OK if
our subproblems overlap, so long as there aren't too many of them.
===========================================
Example #1: Longest Common Subsequence
2 strings: S of length n, T of length m. E.g.,
S = BACBAD
T = ABAZDC
LCS is longest sequence of characters that appear left-to-right (but
not necessarily in a contiguous block) in both strings.
E.g., here, LCS has length 4: ABAD.
For instance, use in genetics: given two DNA fragments, the LCS gives
info about what they have in common and the best way to line them up.
Equivalent problem: "minimum edit distance", where the legal
operations are insert and delete. (E.g., unix "diff", where S and T
are files, and the elements of S and T are lines of text). The
minimum edit distance to transform S into T is achieved by doing
|S| - LCS(S,T) deletes and |T| - LCS(S,T) inserts.
Let's solve the LCS problem using Dynamic Programming. Subproblems:
look at LCS of prefix of S and prefix of T. For simplicity, we'll
worry first about finding *length* of longest and then modify to
produce the sequence.
So, here's the question: say LCS[i,j] is the LCS of S[1..i] with
T[1..j]. How can we solve for LCS[i,j] in terms of the LCS for
the smaller problems?
Case 1: what if S[i]!=T[j]? Then, the answer LCS has to ignore one
of S[i] or T[j] so the answer is the max of LCS[i-1,j]
and LCS[i,j-1].
Case 2: what if S[i]==T[j]? Then the LCS of S[1..i] and T[1..j]
might as well match them up. If I gave you a common subsequence that matched
S[i] to an earlier location in T, for instance, you could
always match it to T[j] instead. So, the solution is 1 +
LCS[i-1,j-1].
So, we can just do two loops, filling in the LCS using the rules above.
Here's what it looks like on the strings above (note that in this
picture the roles of S and T are switched: rows are indexed by T and
columns by S):
| B A C B A D
--+------------
A | 0 1 1 1 1 1
B | 1 1 1 2 2 2
A | 1 2 2 2 3 3
Z | 1 2 2 2 3 3
D | 1 2 2 2 3 4
C | 1 2 3 3 3 4
To find the sequence itself: walk backwards through the matrix, moving to
whichever of the cell to the left or the cell above is larger. If both are
smaller than the current cell, move diagonally and output the character.
(Or, could modify the algorithm to fill the matrix with sequences instead
of numbers.)
Total time: O(mn)
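As a concrete sketch of the table-filling and the backwards walk (in Python; the function name and details are our own, not from the original notes):

```python
# Bottom-up LCS: fill the table using the two cases above, then walk
# backwards through it to recover one longest common subsequence.

def lcs(S, T):
    n, m = len(S), len(T)
    # (n+1) x (m+1) table; row 0 and column 0 are the empty-prefix base cases
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if S[i - 1] == T[j - 1]:                   # Case 2: match them up
                table[i][j] = 1 + table[i - 1][j - 1]
            else:                                      # Case 1: ignore one char
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk backwards to extract the actual sequence
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if S[i - 1] == T[j - 1]:                       # diagonal step: output char
            out.append(S[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return table[n][m], "".join(reversed(out))
```

On the strings above, `lcs("BACBAD", "ABAZDC")` returns length 4 with sequence ABAD.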
We've been looking at "bottom-up Dynamic Programming". Here's another
way of thinking about DP, that also leads to basically the same
algorithm, but viewed from the other direction. Sometimes this is
called "top-down Dynamic Programming".
Basic Idea (version #2): Suppose you have a recursive algorithm for
some problem that gives you a really bad recurrence like
T(n)=2T(n-1)+n. But, suppose that many of the subproblems you reach
as you go down the recursion tree are the *same*. I.e., the "tree" is
really a "DAG". Then you can hope to get a big savings if you store
your computations so that you only compute each one once. Can store
these in an array or hash table. Called "memoizing".
E.g., for LCS, using our analysis we had at the beginning, we might
have produced the following exponential-time recursive program (arrays
start at 1):
LCS(S,n,T,m)
{
    if (n==0 || m==0) return 0;
    if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);   // no harm in matching up
    else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
    return result;
}
Memoized version: start by initializing arr[i][j]=unknown for all i,j.
LCS(S,n,T,m)
{
    if (n==0 || m==0) return 0;
    if (arr[n][m] != unknown) return arr[n][m];        // <- added this line
    if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
    else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
    arr[n][m] = result;                                // <- and this line
    return result;
}
Running time is O(nm). Why?
-> we get to the second-to-last line at most n*m times
=> at most 2nm recursive calls.
=> O(nm) running time.
Comparing bottom-up and top-down: top-down (memoized) pays a penalty
in recursion overhead, but can be faster for problems where a
reasonable fraction of the subproblems never get examined at all, so
we can avoid computing them.
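As a runnable sketch of the memoizing idea (Python, with our own function names; the built-in cache decorator plays the role of the arr table):

```python
# Top-down (memoized) LCS: each distinct (n, m) pair is computed once,
# which is what gives the O(nm) bound.

from functools import lru_cache

def lcs_topdown(S, T):
    @lru_cache(maxsize=None)
    def rec(n, m):                         # LCS length of S[:n] and T[:m]
        if n == 0 or m == 0:
            return 0
        if S[n - 1] == T[m - 1]:
            return 1 + rec(n - 1, m - 1)
        return max(rec(n - 1, m), rec(n, m - 1))
    return rec(len(S), len(T))
```

On the strings above this returns 4; only the distinct (n,m) pairs are ever evaluated, everything else is a cache hit.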
Extensions
==========
In computational biology applications, often one has a more general
notion of sequence alignment.
Given two sequences S and T, say we are allowed to extend them to S'
and T' by adding in spaces, and we have a position-by-position scoring
function "score(x,y)" where x and y are letters or spaces. Our goal
is to figure out how to add in the spaces to maximize
SUM_k score(S'(k), T'(k))
For instance, say we define
score(x,y) = 1 if x = y and they are not spaces, and
score(x,y) = 0 otherwise
Then this is the same as LCS. E.g., for above, we'd get an optimal
alignment of:
B A C B A - D -
- A - B A Z D C
What about for more general scoring functions? We can still solve
with basically the same algorithm:
LCS[i][j] = MAX( score(S[i], T[j]) + LCS[i-1][j-1],
                 score(S[i], "-")  + LCS[i-1][j],
                 score("-", T[j])  + LCS[i][j-1] )
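Here is a sketch of that generalized recurrence in Python (the base cases, which align a prefix against all spaces, are spelled out; the names are ours):

```python
# Generalized alignment value: A[i][j] = best score for aligning S[:i]
# with T[:j]; '-' denotes a space (gap).

def align_value(S, T, score):
    n, m = len(S), len(T)
    A = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):              # prefix of S against all spaces
        A[i][0] = A[i - 1][0] + score(S[i - 1], '-')
    for j in range(1, m + 1):              # prefix of T against all spaces
        A[0][j] = A[0][j - 1] + score('-', T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(score(S[i - 1], T[j - 1]) + A[i - 1][j - 1],
                          score(S[i - 1], '-') + A[i - 1][j],
                          score('-', T[j - 1]) + A[i][j - 1])
    return A[n][m]
```

With score(x,y) = 1 if x == y and neither is a space, 0 otherwise, this recovers the LCS length (4 on the strings above).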
=============================================================================
Example #2: knapsack problem
Imagine you have a homework assignment with different parts
A,B,C,D,E,F,G. Each part has a "value" (in points) and a "size"
(time in hours) to do.
EG: A B C D E F G
value: 7 9 5 12 14 6 12
time: 3 4 2 6 7 3 5
You have: 15 hours. Which parts should you do? What's the best total
value possible? (Assume no partial credit on parts) (A: 34 -- ABFG)
This is called the "knapsack problem":
Given n items. Each has size s_i and value v_i. Knapsack of size S.
Goal: find subset of items of max possible total value such that sum
of sizes is <= S.
Can solve in time O(2^n) by trying all possible subsets. Want something
better. We'll use DP to solve in time O(n*S).
Let's do this top down by starting with a simple recursive solution
and trying to memoize. Start by just computing the best possible value
- then we can see how to actually extract the items needed.
recursive algorithm: either we use the last element or we don't.
Value(n,S)   // S = space left, n = # items still to choose from
{
    if (S <= 0 || n == 0) return 0;
    if (s_n > S) result = Value(n-1, S);   // can't use nth item
    else result = max( v_n + Value(n-1, S-s_n), Value(n-1, S) );
    return result;
}
Right now, time is O(2^n). But, how can we speed up?
Use a 2-d array to store results so we don't end up computing the same
thing over and over again. Initialize arr[i][j] to "unknown"
Value(n,S)
{
    if (S <= 0 || n == 0) return 0;
    if (arr[n][S] != unknown) return arr[n][S];    // <- added this
    if (s_n > S) result = Value(n-1, S);
    else result = max( v_n + Value(n-1, S-s_n), Value(n-1, S) );
    arr[n][S] = result;                            // <- and this
    return result;
}
Running time:
same analysis as for LCS: O(nS).
How to get items? Work backwards: if arr[n][S] = arr[n-1][S] then we
*didn't* use the nth item so recursively work backwards from
arr[n-1][S]. Otherwise, we *did*, so output nth item and recursively
work backwards from arr[n-1][S-s_n].
Can also do bottom up.
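A bottom-up sketch in Python, including the backwards walk to get the items (the function name and layout are ours):

```python
# arr[i][c] = best total value using only the first i items with capacity c.

def knapsack(sizes, values, S):
    n = len(sizes)
    arr = [[0] * (S + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(S + 1):
            arr[i][c] = arr[i - 1][c]                  # skip item i
            if sizes[i - 1] <= c:                      # or take it, if it fits
                arr[i][c] = max(arr[i][c],
                                values[i - 1] + arr[i - 1][c - sizes[i - 1]])
    # Work backwards: if the value changed, item i was used
    items, c = [], S
    for i in range(n, 0, -1):
        if arr[i][c] != arr[i - 1][c]:
            items.append(i - 1)
            c -= sizes[i - 1]
    return arr[n][S], sorted(items)
```

On the homework example above (sizes 3,4,2,6,7,3,5 and values 7,9,5,12,14,6,12 with S = 15), this returns value 34 and selects parts A, B, F, G.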
==============================================================================
If we have time:
Example #3: Matrix product parenthesization
Say we want to multiply 3 matrices X, Y, and Z. And we're going to
use the usual algorithm, not Strassen. We could do it like
this: (XY)Z or like this X(YZ). Which way doesn't affect final
outcome but *does* affect running time to compute it.
E.g., say X is 100x20, Y is 20x100, and Z is 100x20. So, end result
will be a 100x20 matrix. If we multiply using usual algorithm, then
to multiply lxm matrix by an mxn matrix takes time O(lmn). So in this
case, which is better, doing (XY)Z or X(YZ)?
Answer: X(YZ) is better, because computing (YZ) takes 20x100x20 = 40,000
steps and produces a 20x20 matrix, and then multiplying X by this takes
another 100x20x20 = 40,000 steps, for a total of 80,000. But doing it the
other way, (XY) takes 100x20x100 = 200,000 steps, so already right there
you've spent more, and then multiplying this with Z takes another
100x100x20 = 200,000 steps.
Question: suppose we need to multiply a series of matrices:
A_1 x A_2 x A_3 x ... x A_n.
What is the best way to parenthesize them?
There are an exponential number of different possible
parenthesizations: {2(n-1) choose n-1}/n of them (the (n-1)st Catalan
number). So we don't want to search through all of them; DP gives us a
better way.
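As a quick sanity check on that count, the natural split recurrence matches the closed form (a small Python sketch of our own):

```python
# Number of ways to fully parenthesize A_1 x ... x A_n: pick the top-level
# split point k, then count each side independently.

from math import comb

def parenthesizations(n):
    if n == 1:
        return 1
    return sum(parenthesizations(k) * parenthesizations(n - k)
               for k in range(1, n))

# Closed form: {2(n-1) choose n-1} / n, the (n-1)st Catalan number
def catalan_form(n):
    return comb(2 * (n - 1), n - 1) // n
```

For n = 4 both give 5, and the values grow exponentially in n, which is why searching all parenthesizations is hopeless.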
Idea: if you were going to do it recursively, what would the
subproblems be?
-> For each i,j, what is the best way to multiply A_i x ... x A_j.
So, let's start by solving the small subproblems and saving our
results, and then using those results to help us solve the bigger
subproblems.
How can you solve one subproblem given that you've already solved
all the smaller subproblems?
-> To figure out how to multiply A_i x ... x A_j, consider all
possible middle points k.
-> For each such k, our cost if we decided to split there would be:
cost-to-multiply(A_i x ... x A_k) <- already computed
+ cost-to-multiply(A_{k+1} x ... x A_j) <- already computed
+ cost to multiply the results. <- get right from the dimensions.
-> So, just choose the k that minimizes this sum.
What is the time?
-> O(n^2) subproblems.
-> O(n) time per subproblem (because you need to consider O(n) choices of k).
=> O(n^3) time total.
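Putting the pieces together, here is a bottom-up sketch in Python (dims[i] and dims[i+1] are the dimensions of A_i; the names are ours):

```python
# cost[i][j] = min number of scalar multiplications to compute A_i x ... x A_j,
# where A_i is a dims[i] x dims[i+1] matrix. Solve subproblems in order of
# increasing chain length, trying every split point k.

def matrix_chain(dims):
    n = len(dims) - 1                      # number of matrices
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            cost[i][j] = min(cost[i][k] + cost[k + 1][j]
                             + dims[i] * dims[k + 1] * dims[j + 1]
                             for k in range(i, j))
    return cost[0][n - 1]
```

On the X, Y, Z example above, `matrix_chain([100, 20, 100, 20])` returns 80000, the cost of the X(YZ) order.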
==============================================================================
High-level discussion of Dynamic Programming
What kinds of problems can be solved using DP? One property these
problems have is that if the optimal solution to the problem includes a
solution to a subproblem, then it includes the optimal solution to
that subproblem. For instance, say we want to find the shortest path
from A to B in a graph. And say this shortest path goes through C.
Then it must be using the shortest path from A to C. Or, in the
knapsack example, if the optimal solution doesn't use item n, then it
is the optimal solution for the problem in which item n doesn't
exist. The book calls this the "optimal substructure" property.
You have to be a little careful, though. For instance, suppose we're
trying to find paths between locations in Pittsburgh, and some
intersections have no-left-turn rules (this is even worse in San
Francisco). Then, just because the fastest way from A to B goes
through intersection C, it doesn't necessarily use the fastest way to
C because you might need to be coming into C in the correct direction.
In fact, the right way to model that problem as a graph is not to have
one node per intersection, but to have one node per
(intersection,direction) pair. That way you recover the optimal
substructure property.