15-451 Algorithms                                        12/01/2010

Recitation notes:
* FFT
* Answer questions, go over topics as needed, general review
====================================================================

FFT
===
The FFT lets us compute the convolution of two vectors of length n in
time O(n log n).  The convolution of A and B is defined as the vector
C such that

    C[i] = A[0]*B[i] + A[1]*B[i-1] + ... + A[i]*B[0].

E.g., if A and B are the vectors of coefficients of polynomials A(x)
and B(x), then C gives the coefficients of the polynomial A(x)*B(x).
This has a bunch of different uses.  Here is a neat one.

String matching with don't-cares
================================
In this problem we want to look up all instances of something like
"s**rch" in a file, where the *'s can match any character (search,
starch).  We'll call the thing we are looking up the "pattern" P and
the file the "text" T.  Say P has length n and T has length m, where
m > n.

There is a simple O(mn)-time algorithm for this problem: try all O(m)
possible starting positions, and for each one check in O(n) time
whether P matches there.  Can we do it faster?

Let's simplify the problem by assuming that T is a string of 0s and 1s
and that P is a string of 1s and *s (it will be a little easier not to
have zeros in P).  We want to find all occurrences of P in T.  For
instance, if P = 11*1 and T = 10111101, then P appears twice in T:
once starting at the 3rd position of T and once starting at the 5th.

How can we use the FFT to solve this problem?  All we need to do is
reverse P and change the *'s to zeros (in the above example, this
gives 1011), and then convolve with T.  We can then scan the result
for the positions where C[i] equals the number of 1s in P.  (Try it
on the above example.)

If we wanted to handle zeros in P, one thing we could do is a little
"padding": go through T and replace each "1" with "10" and each "0"
with "01".  Do the same with P, replacing each * with "11".
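The zero-free version of this reduction can be sketched in code.  This
is a minimal pure-Python sketch (the function names are ours, and a
textbook recursive radix-2 FFT stands in for what would normally be an
optimized library routine); it runs the P = 11*1, T = 10111101 example
from above:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    result = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
    return result

def convolve(a, b):
    """C[i] = sum_j a[j]*b[i-j], computed in O(n log n) via the FFT."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2                      # pad up to a power of two
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    c = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # divide by size for the inverse transform; round back to ints
    return [round(x.real / size) for x in c[:len(a) + len(b) - 1]]

# P = 11*1 over T = 10111101: reverse P and turn *'s into 0s.
T = [1, 0, 1, 1, 1, 1, 0, 1]
P_rev = [1, 0, 1, 1]        # reverse of 11*1 with * -> 0
ones = 3                    # number of 1s in P
C = convolve(T, P_rev)
# P matches at (0-indexed) position i of T exactly when
# C[i + len(P) - 1] == ones.
matches = [i for i in range(len(T) - len(P_rev) + 1)
           if C[i + len(P_rev) - 1] == ones]
print(matches)              # [2, 4], i.e. the 3rd and 5th positions
```

Reversing P is what turns the convolution's "backwards" index B[i-j]
into a forward sliding dot product of P against T.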
Then reverse the padded P, do the convolution, and scan for the
positions where C[i] equals n (the length of the original, unpadded
P): with this padding, each aligned character pair contributes exactly
1 to the sum when it matches and 0 when it doesn't.

====================================================================

General Review
--------------
What the course has been about:
 1. techniques for developing algorithms
 2. techniques for analysis

Break down (1) into:
 (a) fast subroutines: useful tools that go *inside* an algorithm.
     E.g., data structures like B-trees, hashing, union-find; things
     like sorting and DFS/BFS.
 (b) problems that are important because you can reduce a lot of
     *other* problems to them: network flow, linear programming.

The notion of reduction is important both in algorithms and in
complexity theory:
 - solve a problem by reducing it to something you know how to do.
 - show a problem is hard by reducing a known hard problem to it.

Let's organize the algorithms and problems we've discussed in class
along a "running time" line:

  sublinear --- linear --- near-linear --- low-degree poly ---
  general poly(n) --- hard but probably not NP-complete --- NP-complete

sublinear:
 - data structures: hashing, balanced search trees, heaps, union-find.

linear:
 - depth-first search, breadth-first search, topological sorting
 - selection/median-finding

near-linear:
 - greedy algorithms with good data structures: Prim, Kruskal, Dijkstra
 - divide-and-conquer algorithms: sorting, FFTs

low-degree & general poly(n):
 - dynamic programming: Bellman-Ford, LCS, etc.
 - network flow, matchings, min-cost flow
 - linear programming
 - graph matrix algorithms
 - primality testing (we didn't really worry about the exact running
   time)

hard but probably not NP-complete:
 - factoring

NP-complete:
 - 3-SAT, vertex cover, clique, TSP, etc.

----------------------------------------------------------------
Algorithm tools and where they are typically used:

* Dynamic programming: typically for improving exponential time down
  to polynomial.

* Reducing to LP, or reducing to network flow: likewise, typically
  for improving exponential time down to polynomial.
* Divide-and-conquer: typically for getting from O(n^2) to
  O(n log n); sometimes just for reducing the exponent (as in
  Karatsuba or Strassen).

* Data structures: ditto.

* Randomization: everywhere.

* Approximation algorithms: typically for NP-complete problems, but
  sometimes also makes sense for easier problems.

----------------------------------------------------------------
Analysis tools:

* Amortized analysis, potential functions, piggy banks: typically
  for showing O(1) or O(log n) amortized cost per operation.

* Reductions: proving NP-completeness by reducing a known NP-complete
  problem to the problem at hand, or giving a poly-time algorithm by
  reducing to a problem like LP or network flow.

* Recurrences, especially with divide-and-conquer.

* Linearity of expectation.

* Group properties for number-theory algorithms.

=====================================
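As a quick refresher on the "reducing the exponent" use of
divide-and-conquer mentioned above, here is a minimal sketch of
Karatsuba multiplication (our own illustration, not code from the
course): three recursive multiplications instead of four give
O(n^{log2 3}) ~ O(n^1.585) rather than O(n^2).

```python
def karatsuba(x, y):
    """Multiply nonnegative integers x and y."""
    if x < 10 or y < 10:              # base case: single digits
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    p = 10 ** m
    xh, xl = divmod(x, p)             # x = xh*p + xl
    yh, yl = divmod(y, p)             # y = yh*p + yl
    a = karatsuba(xh, yh)             # high * high
    b = karatsuba(xl, yl)             # low * low
    # one multiplication recovers both cross terms:
    c = karatsuba(xh + xl, yh + yl) - a - b
    return a * p * p + c * p + b

print(karatsuba(1234, 5678))          # 7006652
```

The trick is purely in the recurrence: T(n) = 3 T(n/2) + O(n) solves
to O(n^{log2 3}), whereas the naive four-subproblem split stays at
O(n^2).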