15-451 Algorithms 12/01/2010
recitation notes
* FFT
* Answer questions, go over topics as needed, general review
====================================================================
FFT
===
FFT allows us to do a convolution of two vectors of length n in time
O(n log n). Def of convolution of A and B is vector C such that
C[i] = A[0]*B[i] + A[1]*B[i-1] + ... + A[i]*B[0],
where entries past the end of a vector are treated as 0.
E.g., if A and B are the vectors of coefficients of polynomials A(x)
and B(x), then C gives the coefficients for the polynomial A(x)*B(x).
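
As a rough illustration of the definition and the O(n log n) claim, here
is a minimal Python sketch. The function names (convolve_direct, fft,
convolve_fft) are made up for these notes, the FFT is a plain recursive
radix-2 version, and integer inputs are assumed so the result can be
rounded back to exact values.

```python
import cmath

def convolve_direct(a, b):
    """Convolution straight from the definition: C[i+j] += A[i] * B[j]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def convolve_fft(a, b):
    """Same answer as convolve_direct for integer inputs, via the FFT."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:
        size *= 2  # pad up to a power of two
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    # Pointwise product of the transforms = transform of the convolution.
    c = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse needs a 1/size scaling; round to undo floating-point error.
    return [round((v / size).real) for v in c[:result_len]]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(convolve_direct([1, 2], [3, 4]))
print(convolve_fft([1, 2], [3, 4]))
```

Both calls should give the coefficient vector of the product polynomial.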
This has a bunch of different uses. Here is a neat one.
String-matching with don't-cares
================================
In this problem you are trying to look up all instances of something
like "s**rch" in a file, where the *'s can match any character
(search, starch). We'll call the thing we are looking up the "pattern"
P and we'll call the file the "text" T.
Say P has length n and T has length m, where m>n. There is
a simple O(mn)-time algorithm to solve this problem: try all
O(m) possible starting positions, and for each one, check in time
O(n) to see if P matches there. Can we do it faster?
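
The simple O(mn) algorithm can be sketched as follows. The name
naive_match is hypothetical, and positions are 0-based here (these notes
count positions from 1).

```python
def naive_match(pattern, text, wildcard="*"):
    """Return the 0-based start positions where pattern matches text."""
    n, m = len(pattern), len(text)
    matches = []
    for start in range(m - n + 1):            # O(m) starting positions
        if all(p == wildcard or p == text[start + k]
               for k, p in enumerate(pattern)):  # O(n) check per position
            matches.append(start)
    return matches

print(naive_match("s**rch", "search starch"))
```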
Let's simplify the problem to assume that T is a string of 0s and 1s
and P is a string of 1s and *s. (It will be a little easier to not
have zeros in P.) We want to find all locations of P in T. For
instance, if P = 11*1 and T = 10111101, then P appears twice in T:
once starting at the 3rd position in T and once starting at the 5th
position in T.
How can we use the FFT to solve this problem? All we need to do is
reverse P, change the *'s to zeroes (so in the above example, this
would give us 1011), and then do a convolution with T. We can then
scan the result to see the positions where the value of C[i] equals
the number of 1s in P. (Try on the above example).
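
A minimal sketch of this reduction, using the example above. The names
convolve and match_ones_and_stars are made up for illustration, and
convolve is a direct O(mn) stand-in for brevity; replacing it with an
FFT-based convolution is what would give the speedup. Positions are
0-based, so the 3rd and 5th positions show up as 2 and 4.

```python
def convolve(a, b):
    """Direct convolution (stand-in; an FFT-based version would be faster)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def match_ones_and_stars(pattern, text):
    """pattern uses '1' and '*', text uses '0' and '1'; returns 0-based starts."""
    n, m = len(pattern), len(text)
    # Reverse P and turn each '*' into 0.
    rev = [1 if ch == "1" else 0 for ch in reversed(pattern)]
    t = [int(ch) for ch in text]
    c = convolve(rev, t)
    ones = sum(rev)
    # C[i] counts the 1s of P that sit over 1s of T when P starts at i-(n-1).
    return [i - (n - 1) for i in range(n - 1, m) if c[i] == ones]

print(match_ones_and_stars("11*1", "10111101"))
```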
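If we wanted to handle zeroes in P, one thing we could do is a little
"padding". Go through T and replace each "1" with "10", and replace
each "0" with "01". Do the same with P, and replace each star with "11".
Then reverse P and do the convolution, and scan to see positions where
the value of C[i] equals n. (Each pattern character then contributes
exactly 1 when it is compatible with the text character under it and 0
otherwise. Only look at alignments where the two-character blocks of P
line up with those of T.)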
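
The padding trick can be sketched the same way. The names below are
hypothetical, convolve is again a direct stand-in for an FFT-based
convolution, and only alignments at even offsets in the padded text are
examined, since those are the ones that correspond to start positions
in the original T.

```python
ENCODE_TEXT = {"1": (1, 0), "0": (0, 1)}
ENCODE_PATTERN = {"1": (1, 0), "0": (0, 1), "*": (1, 1)}

def convolve(a, b):
    """Direct convolution (stand-in; an FFT-based version would be faster)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def match_with_zeros(pattern, text):
    """pattern uses '0', '1', '*'; text uses '0', '1'; returns 0-based starts."""
    n, m = len(pattern), len(text)
    p = [bit for ch in pattern for bit in ENCODE_PATTERN[ch]]
    t = [bit for ch in text for bit in ENCODE_TEXT[ch]]
    c = convolve(p[::-1], t)
    # Start s in T is offset 2s in padded T, i.e. index 2s + (2n - 1) in C.
    # Each compatible character contributes exactly 1, so a match scores n.
    return [s for s in range(m - n + 1) if c[2 * s + 2 * n - 1] == n]

print(match_with_zeros("1*0", "110100"))
```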
===================================================
General Review
--------------
What the course has been about:
1. techniques for developing algorithms
2. techniques for analysis
Break down (1) into
(a) fast subroutines: useful tools that go *inside* an algorithm.
E.g., data structures like B-trees, hashing, union-find. Things
like sorting, DFS/BFS.
(b) problems that are important because you can reduce a lot
of *other* problems to them: network flow, linear programming.
Notion of reduction is important in algorithms and in complexity
theory:
- solve a problem by reducing it to something you know how to do.
- show a problem is hard by reducing a known hard problem to it.
Let's organize the algorithms and problems we've discussed in class
along a "running time" line.
sublinear --- linear --- near-linear --- low-degree poly --- general poly(n) --- hard but probably not NP-complete --- NP-complete
sublinear:
- data structures: hashing, balanced search trees, heaps, union-find.
linear:
- depth-first search, breadth-first search, topological sorting
- selection/median-finding
near-linear:
- greedy algs with good data structures
- Prim, Kruskal, Dijkstra
- divide-and-conquer algorithms
- sorting, FFTs
low-degree & general poly(n):
- dynamic programming: Bellman-Ford, LCS, etc.
- network flow, matchings, min-cost flow
- linear programming
- Graph matrix algorithms
- primality testing (we didn't really worry about exact running time)
Hard-but-probably-not-NP-complete:
- factoring
NP-complete:
- 3-SAT, Vertex cover, Clique, TSP, etc.
----------------------------------------------------------------
Algorithm tools and where they are typically used:
* Dynamic programming: typically for improving exponential time down to
polynomial.
* Reducing to LP, or reducing to network flow: likewise, typically for
improving exponential time down to polynomial.
* Divide-and-conquer: typically for getting from O(n^2) to O(n log n).
Sometimes just for reducing exponent (like Karatsuba or Strassen).
* Data structures: likewise, typically for getting from O(n^2) to
O(n log n).
* Randomization: everywhere.
* Approximation algorithms: typically for NP-complete problems, but
sometimes they also make sense for easier problems.
----------------------------------------------------------------
Analysis tools:
* Amortized analysis, Potential functions, piggy banks:
Typically for showing O(1) or O(log n) amortized cost per operation
* Reductions: proving NP-completeness by reducing a known NP-complete
problem to it, or giving a poly-time algorithm by reducing to a
problem like LP or network flow.
* Recurrences, esp. with divide-and-conquer
* Linearity of expectation
* Group properties for number-theory algs.
=====================================