15-750 Graduate Algorithms 04/24/01
* Multiplying polynomials
* Fast Fourier Transform (FFT)
===========================================================================
The fast Fourier transform is one of the most widely used nontrivial
algorithms, with applications in a variety of areas like signal
processing, speech understanding, digital audio, and radar.
We'll develop it in the course of trying to solve the problem of "how
to multiply quickly" that we talked about on the first day of class.
This is not how it was invented historically, and it obscures the
connection to the usual kind of Fourier transform, but I think it's
cleanest from the algorithmic perspective.
-- simpler problem: polynomial multiplication.
E.g., multiply (2x^2 + 5)(x^2 + 2x + 3)
       2x^2 + 0x + 5
     x 1x^2 + 2x + 3
     ---------------
           6   0  15
       4   0  10
   2   0   5
   -----------------
   2   4  11  10  15

   = 2x^4 + 4x^3 + 11x^2 + 10x + 15
- basically a simpler version of integer multiplication, without carries.
- we'll view each individual small multiplication as a unit cost operation.
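The schoolbook method above can be sketched in a few lines of Python
(a baseline sketch; the function name is ours, not from the notes).
Coefficients are stored lowest-degree first, so [5, 0, 2] means 2x^2 + 5.

```python
def poly_mult_naive(a, b):
    # c has degree deg(a) + deg(b), so len(a) + len(b) - 1 coefficients
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj   # x^i * x^j contributes to the x^(i+j) term
    return c

# The example from above: (2x^2 + 5)(x^2 + 2x + 3)
print(poly_mult_naive([5, 0, 2], [3, 2, 1]))   # [15, 10, 11, 4, 2]
```

Two nested loops over the coefficients: this is exactly the O(n^2)
bound mentioned below.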
More generally, given A(x) = a_0 + a_1 x + ... + a_{n-1}x^{n-1},
B(x) = b_0 + b_1 x + ... + b_{n-1}x^{n-1}
Our goal is to compute C(x) = A(x)*B(x).
- Straightforward computation is O(n^2) time. Karatsuba is O(n^{1.58...}).
- we'll use FFTs to do in O(n log n) time. This is then used in
Schonhage-Strassen integer multiplication algorithm that multiplies
two n-bit integers in O(n log n loglog n) time. We're only going to do
polynomial multiplication.
High Level Idea of Algorithm
============================
Let m = 2n-1. [so degree of C is less than m]
1. Pick m points x_0, x_1, ..., x_{m-1} according to a secret formula.
2. Evaluate A at each of the points: A(x_0),..., A(x_{m-1}).
3. Same for B.
4. Now compute C(x_0),..., C(x_{m-1}), where C(x) = A(x)*B(x).
5. Interpolate to get the coefficients of C.
This approach is based on the fact that a polynomial of degree < m is
uniquely specified by its value on m points. The FFT will allow us
to quickly move from "coefficient representation" of polynomial to the
"value on m points" representation, and back, for our special set of m
points. (Doesn't work for *arbitrary* m points. The special points
will turn out to be roots of unity).
The reason we like this is that multiplying is easy in the "value on m
points" representation. We just do: C(x_i) = A(x_i)*B(x_i). So, only
O(m) time for step 4.
Let's just focus on forward direction for now. In that case, we've
reduced our problem to the following: Given a polynomial A of degree <
m, evaluate A at m points of our choosing in total time O(m log m).
Let's assume WLOG that m is a power of 2 (we can always pad A and B
with zero coefficients so that m rounds up to the next power of 2).
The FFT:
=======
Let's first develop it through an example. Say we have a polynomial
A(x) = 2 + 3x + 4x^2 + 5x^3 +6x^4 + 7x^5 + 8x^6 + 9x^7.
And we want to evaluate at eight points. Here is an idea. Say we
split A into the part with even degree and the part with odd degree.
Can write as
A(x) = (2 + 4x^2 + 6x^4 + 8x^6) + x(3 + 5x^2 + 7x^4 + 9x^6)
let
A_even(x) = 2 + 4x + 6x^2 + 8x^3
A_odd(x) = 3 + 5x + 7x^2 + 9x^3
So, A(x) = A_even(x^2) + x A_odd(x^2).
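The identity A(x) = A_even(x^2) + x A_odd(x^2) is easy to sanity-check
numerically; here is a quick sketch for the example polynomial (the
helper eval_poly is ours, using Horner's rule):

```python
def eval_poly(coeffs, x):
    # Horner's rule: coefficients are lowest-degree first
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

A      = [2, 3, 4, 5, 6, 7, 8, 9]   # 2 + 3x + 4x^2 + ... + 9x^7
A_even = A[0::2]                    # [2, 4, 6, 8]
A_odd  = A[1::2]                    # [3, 5, 7, 9]

x = 1.5
lhs = eval_poly(A, x)
rhs = eval_poly(A_even, x * x) + x * eval_poly(A_odd, x * x)
print(abs(lhs - rhs) < 1e-9)        # True
```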
What's nice is that the effort spent computing A(x) will give us A(-x)
almost for free. So, let's say our special set of m points will have
the property:
The 2nd half of points are the negative of the 1st half (*)
E.g., {1,2,3,4,-1,-2,-3,-4}.
Now, things look good: Let T(m) = time to evaluate a polynomial of
degree < m at our special set of m points. We're doing this by
evaluating two polynomials of degree < m/2 at m/2 points each (the
squares), and then doing O(m) work to combine the results. This is
great because the recurrence T(m) = 2T(m/2) + O(m) solves to
O(m log m).
But, we're deluding ourselves by saying "just do it recursively".
Why is that? The problem is that recursively, our special points (now
{1, 4, 9, 16}) have to satisfy property (*). E.g., they should really
look like {1, 4, -1, -4}. But then what should original points look like?
{1, 2, i, 2i, -1, -2, -i, -2i}
so then their squares are:
1, 4, -1, -4
and their squares are
1, 16
But, at the next level we again need property (*). So, we want to
have {1, -1} there. Working back, here's what we'll *really* do. Let
w = sqrt(i) = 0.707 + 0.707i.
Then our original set of points will be:
1, w, w^2 (= i), w^3, w^4 (= -1), w^5, w^6 (= -i), w^7
so that the squares are:
1, i, -1, -i
and *their* squares are:
1, -1
and *their* squares are:
1
The "w" we are using is called a "primitive eighth root of unity"
(since w^8 = 1 and w^k != 1 for 0 < k < 8).
In general, the primitive mth root of unity we use is the point on
the unit circle in the complex plane at angle theta = 2*pi/m:
w = cos(2*pi/m) + i*sin(2*pi/m)
To multiply two complex numbers you multiply the lengths and add the
angles. Here all of our lengths are 1, so we just need to add the
angles.
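We can watch the squaring process collapse the point set level by
level (a small sketch using Python's cmath for complex arithmetic):

```python
import cmath

# w = e^(2*pi*i/8), a primitive 8th root of unity
m = 8
w = cmath.exp(2j * cmath.pi / m)

points = [w**k for k in range(m)]
while len(points) > 1:
    print([f"{p:.2f}" for p in points])
    # property (*): the 2nd half is the negative of the 1st half,
    # so squaring keeps only the first half's squares
    points = [p * p for p in points[:len(points) // 2]]
```

At each level the set halves: 8 points, then {1, i, -1, -i}, then
{1, -1}, then {1}.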
Since m is a power of 2, when we start with
{1, w, w^2, ..., w^{m-1}}
then recursively we'll have
{1, w^2, w^4, ..., w^{m-2}}
where w^2 is a primitive (m/2)-th root of unity.
THE FFT ALGORITHM
=================
Here is the general algorithm in pseudo-C:
Let A be array of length m, w be primitive mth root of unity.
Goal: produce DFT F(A): evaluation of A at 1, w, w^2,...,w^{m-1}.
FFT(A, m, w)
{
    if (m == 1) return vector V = (a_0)
    else {
        allocate space for output vector V of length m
        A_even = (a_0, a_2, ..., a_{m-2})
        A_odd  = (a_1, a_3, ..., a_{m-1})
        V_even = FFT(A_even, m/2, w^2)  // w^2 is a primitive (m/2)-th root of unity
        V_odd  = FFT(A_odd, m/2, w^2)
        for (j = 0; j < m/2; ++j) {
            V[j]       = V_even[j] + w^j * V_odd[j]
            V[j + m/2] = V_even[j] - w^j * V_odd[j]
        }
        return V
    }
}
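The same algorithm is directly runnable in Python (a sketch: list
slicing replaces the explicit even/odd copies, and w^j is accumulated
incrementally instead of recomputed each iteration):

```python
import cmath

def fft(A, w):
    # Evaluate the polynomial with coefficient list A (len a power of 2)
    # at 1, w, w^2, ..., w^{m-1}, where w is a primitive mth root of unity.
    m = len(A)
    if m == 1:
        return [A[0]]
    V_even = fft(A[0::2], w * w)   # w^2 is a primitive (m/2)-th root of unity
    V_odd  = fft(A[1::2], w * w)
    V = [0] * m
    wj = 1                         # running power w^j
    for j in range(m // 2):
        V[j]          = V_even[j] + wj * V_odd[j]
        V[j + m // 2] = V_even[j] - wj * V_odd[j]
        wj *= w
    return V

A = [2, 3, 4, 5, 6, 7, 8, 9]           # the running example polynomial
w = cmath.exp(2j * cmath.pi / len(A))
values = fft(A, w)                      # A at 1, w, w^2, ..., w^7
print(abs(values[0] - 44) < 1e-9)       # A(1) = 2+3+...+9 = 44 -> True
```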
MODULAR FFT
===========
What else has interesting roots of unity besides complex numbers? How
about modular arithmetic. E.g., 2 is a primitive 8th root of unity
mod 17. {2^0, 2^1, 2^2, ..., 2^7} = {1, 2, 4, 8, -1, -2, -4, -8} (mod 17).
Then when you square them, you get {1,4,-1,-4}, etc.
This is nice because we don't need to deal with messy floating-point
numbers.
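We can verify the mod-17 claim directly with Python's three-argument
pow (a quick sanity-check sketch):

```python
# 2 is a primitive 8th root of unity mod 17:
# 2^8 = 1 (mod 17), and 2^k != 1 for 0 < k < 8.
p, w, m = 17, 2, 8
powers = [pow(w, k, p) for k in range(m)]
print(powers)   # [1, 2, 4, 8, 16, 15, 13, 9]

# the same values written as signed residues in (-p/2, p/2]:
print([x if x <= p // 2 else x - p for x in powers])
# [1, 2, 4, 8, -1, -2, -4, -8]

assert pow(w, m, p) == 1 and all(pow(w, k, p) != 1 for k in range(1, m))
```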
THE INVERSE OF THE FFT
======================
Remember, we started all this by saying that we were going to multiply
two polynomials A and B by evaluating each at a special set of m
points (which we can now do in time O(m log m)), then multiply the
values point-wise to get C evaluated at all these points (in O(m)
time) but then we need to interpolate back to get the coefficients.
In other words, we're doing F^{-1}(F(A) \cdot F(B)).
So, we need to compute F^{-1}. Here's how.
First, we can look at the forward computation (computing A(x) at 1, w,
w^2, ..., w^{m-1}) as an implicit matrix-vector product:
+---------------------------------------+ +-----+ +-----+
A(1) | 1 1 1 1 ... 1 | | a_0 | | y_0 |
| | | | | |
A(w) | 1 w w^2 w^3 ... w^{m-1} | | a_1 | | y_1 |
| | | | | |
A(w^2) | 1 w^2 w^4 w^6 ... w^{2(m-1)} | | a_2 | | y_2 |
| | | | = | |
A(w^3) | 1 w^3 w^6 w^9 ... w^{3(m-1)} | | a_3 | | y_3 |
| | | | | |
| ..... ..... | | ... | | ... |
| | | | | |
| 1 w^{-1} w^{-2} w^{-3} ... w | |a_m-1| |y_m-1|
+---------------------------------------+ +-----+ +-----+
(Note w^{m-1} = w^{-1} since w^m = 1)
(We're doing this "implicitly" in the sense that we don't even have
time to write down the matrix.)
To do the inverse transform, what we want is to multiply by the
inverse of the F matrix. As it turns out, this inverse looks very
much like F itself. In particular, notice that w^{-1} is also a
primitive mth root of unity. Let's define \bar{F} to be the Fourier
transform matrix using w^{-1} instead of w. Then,
Claim: F^{-1} = (1/m) * \bar{F}
Proof: What is the i,j entry of \bar{F} * F? It is:
1 + w^{j-i} + w^{2j-2i} + w^{3j-3i} + ... + w^{(m-1)j - (m-1)i}
If i=j, then these are all = 1, so the sum is m. Then when we
divide by m we get 1.
If i!=j, then the claim is these all cancel out and we get zero.
Maybe easier to see if we let z = w^{j-i}, so then the sum is:
1 + z + z^2 + z^3 + z^4 + ... + z^{m-1}.
Then can see these cancel by picture.
Or can use the formula for summations: (1 - z^m)/(1-z) = 0/(1-z) = 0.
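The cancellation is easy to check numerically (a quick sketch; any
z = w^{j-i} with j != i mod m works):

```python
import cmath

m = 8
w = cmath.exp(2j * cmath.pi / m)
z = w**3                           # z = w^{j-i} for some i != j (mod m)
s = sum(z**k for k in range(m))    # 1 + z + z^2 + ... + z^{m-1}
print(abs(s) < 1e-9)               # True: the terms cancel
```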
Since F^{-1} has the same form as F, we can also compute it in time
O(m log m) with the same algorithm, just feeding it w^{-1} as the
root of unity to use (and dividing the result by m).
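Putting the pieces together, here is a minimal end-to-end sketch of
F^{-1}(F(A) \cdot F(B)): the same FFT routine is called with w^{-1}
for the inverse, and the result is divided by m. (The function names
fft and poly_mult are ours.)

```python
import cmath

def fft(A, w):
    # Evaluate polynomial with coefficients A (len a power of 2) at powers of w.
    m = len(A)
    if m == 1:
        return [A[0]]
    V_even = fft(A[0::2], w * w)
    V_odd  = fft(A[1::2], w * w)
    V = [0] * m
    wj = 1
    for j in range(m // 2):
        V[j]          = V_even[j] + wj * V_odd[j]
        V[j + m // 2] = V_even[j] - wj * V_odd[j]
        wj *= w
    return V

def poly_mult(a, b):
    m = 1
    while m < len(a) + len(b) - 1:        # pad up to a power of 2 >= 2n-1
        m *= 2
    w = cmath.exp(2j * cmath.pi / m)
    fa = fft(a + [0] * (m - len(a)), w)   # F(A)
    fb = fft(b + [0] * (m - len(b)), w)   # F(B)
    fc = [x * y for x, y in zip(fa, fb)]  # pointwise multiply: O(m)
    c = fft(fc, 1 / w)                    # \bar{F}: same FFT with w^{-1}
    # divide by m, round off floating-point noise, drop the padding
    return [round((v / m).real) for v in c[:len(a) + len(b) - 1]]

# (2x^2 + 5)(x^2 + 2x + 3), coefficients lowest-degree first:
print(poly_mult([5, 0, 2], [3, 2, 1]))    # [15, 10, 11, 4, 2]
```

This recovers the coefficients of 2x^4 + 4x^3 + 11x^2 + 10x + 15 from
the opening example, now in O(m log m) time.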
CONNECTION TO USUAL FOURIER TRANSFORM
-------------------------------------
We've built up the FFT without saying in what sense this is a "Fourier
transform".
The usual Fourier transform works like this: You're given some
function f -- so the meaning of f(x) is the value of function f at
point x. The Fourier transform is a new function F, where F(z) is the
correlation of f with the sine wave of frequency z.
F(z) = Integral of f(x)e^{-2*pi*i*zx}dx
In the discrete version, say we have sampled f at points j=0,1,2,3,...,m-1.
So, we are representing f as a vector of length m.
If we think of the a_i's as representing these values, then the FFT is
computing the correlations with these different frequencies. So this
is sort of backwards compared to how we've been thinking about it.
In fact, some definitions call the DFT what we've been calling
the *inverse* DFT. In other words,
F(z) = (1/m) SUM_{j=0}^{m-1} f(j) e^{-(2*pi*i/m)jz} = (1/m) SUM_{j=0}^{m-1} f(j) w^{-jz}
This makes more sense in a way since to get the correlation we should
be dividing by m anyway.