15-451 Algorithms 11/28/06 * Number-theoretic algorithms II - an important property of groups - Fermat's little thm, Euler's thm - primality testing - complexity classes RP, co-RP, BPP ============================================================================== We ended class last time with Fermat's little theorem: if N is prime and A is between 1 and N-1, then A^{N-1} = 1 mod N. So, if we pick some A (like A=2) and compute A^{N-1} mod N (which we now know how to do efficiently), and find the result is not equal to 1, this is a PROOF that N is not prime, even though it gives us no information about how to factor N. Today we'll see a proof of FlT, and a generalization called Euler's theorem, and then we'll extend this further to get a randomized polynomial-time algorithm for testing if a number is prime or composite. DEFN: Z_N^* = {A in 1..N such that GCD(A,N)=1} E.g., Z_15^* = {1,2,4,7,8,11,13,14} Recall, Z_N^* is a group under multiplication mod N. That means it's closed under the group operation (e.g., 7*8=11 mod 15), and also every A in Z_N^* has an inverse B in Z_N^* (AB=1 mod N). E.g., 2^{-1} = 8 mod 15. Here is a REALLY IMPORTANT PROPERTY of groups: say G is a group and H is a subgroup of G (if A,B are in H then A*B is in H; if A is in H then A^{-1} is in H). Then THE SIZE OF H DIVIDES THE SIZE OF G. Proof: Say H is a subgroup of G and y is not in H. Then the coset yH = {yh : h in H} is a set of size |H| (if y*h1 = y*h2 then h1 = h2) and is disjoint from H (if y*h1 = h2 then y = h2*h1^{-1}, which is in H by H's group closure properties). Furthermore, all cosets are disjoint (if z*h1 = y*h2 then z = y*h3 for some h3 in H). E.g., G = Z_15^*, H = {1,2,4,8}. DEFN: N is a Carmichael number if N is composite but A^{N-1}=1 for all A in Z_N^*. THEOREM: if N is composite but not a Carmichael number, then A^{N-1}!=1 for *at least half* of the A in Z_N^*. Proof: Let H = {A in Z_N^* : A^{N-1} = 1 mod N}. Then H is a subgroup of Z_N^* because it's closed under multiplication (if A^{N-1} = 1 and B^{N-1} = 1 then (AB)^{N-1}=1) and inverses (if AB = 1 then (AB)^{N-1} = 1, so if A^{N-1} = 1 then B^{N-1}=1). This means that if there is even a single element of G that is not in H (namely if N is not Carmichael) then |H| is at most |G|/2. So, right away we get that if there is even a single witness to N being composite, there must be a *lot* of witnesses. So, if you pick 100 random A's and you always get 1 you can be pretty confident that A is either prime or a Carmichael number. OK, now let's go ahead and prove Fermat's little theorem. DEFN: Euler phi function: phi(N) = |Z_N^*|. E.g., if N is prime, then phi(N) = N-1. DEFN: for A in Z_N^*, order(A) = smallest t such that A^t = 1 (mod N). (e.g., in Z_15^*, order(2)=4, order(14)=2) THEOREM: for all A in Z_N^*, order(A) divides phi(N). PROOF: {1, A, A^2, ..., A^{t-1}} is a subgroup of Z_N^*: it's closed under multiplication mod N and taking inverses. So we just use our Really Important Property. COROLLARY 1: Euler's Theorem: for any A in Z_N^*, A^{phi(N)} = 1 (mod N). Proof: if t is the order of A, then phi(N) = B*t for some B by our theorem, and A^{phi(N)} = (A^t)^B = 1 (mod N). COROLLARY 2: Fermat's little theorem: If N is prime, then for any A in Z_N^*, A^{N-1} = 1 mod N. PRIMALITY TESTING ================= We're now going to give a fast randomized algorithm for testing if a number is prime, with the following properties: if N is prime, then it outputs YES if N is composite, then it outputs NO with probability at least 1 - 1/2^100. So, if it says YES, you don't have a 100% proof that the number is prime, but you can be pretty confident. [Note, it was only very recently that a *deterministic* poly-time algorithm for this problem was developed] We actually already almost have the algorithm. If we ignore the Carmichael numbers, then the algorithm is just this: - pick 100 random values A between 1 and N. - If all have A^{N-1} = 1 mod N then output YES (Probably Prime) - Else output NO (Definitely Composite) [don't even need to test GCD since if GCD is not 1 then for sure A^{N-1} != 1 mod N] The trick for Carmichael numbers is it turns out they are easy to factor. (They are also very rare. Smallest is 561.) Combining these two gives us the Miller-Rabin primality test. More on Carmichael numbers: We're going to be able to factor Carmichael numbers using the following idea. Suppose we have some number x that's not 1 or -1 mod N, such that x^2 = 1 mod N. E.g., 11^2 = 1 mod 15. This means that (x-1)*(x+1) is a multiple of N, even though neither x-1 nor x+1 is. So, GCD(x-1,N) gives us a factor of N (as does GCD(x+1,N)), and GCD is something WE CAN COMPUTE EFFICIENTLY! The way we will find such an x is via the following key lemma: KEY LEMMA: Suppose N is odd, not a prime power or perfect square, and composite. Let t < N. If there exists x in Z_N^* such that x^t != 1 (mod N), then at least half of x in Z_N^* have x^t != {-1,+1} mod N. Furthermore, if t is ODD, then such an x exists. Proof of KEY LEMMA: Don't have time to go through details but one step is you show the set of x such that x^t *is* in {-1,1} is a subgroup, and then you also use something called the Chinese Remainder Theorem. Proof of factoring Carmichael given our key lemma: First, we can trivially handle Ns that are even, prime-powers, or perfect squares, so we can assume the lemma above holds. Now, take N-1 and pull out all powers of 2, so that we have N-1 = B * 2^k, where B is an odd number. Now, consider exponents t = B, 2B, 4B, 8B, ..., N-1. The lemma says that we can put them into two categories: (1) all A in Z_N^* have A^t = 1 mod N (2) at least half of A have A^t != 1 or -1 (mod N) The lemma tells us that t=B is in category (2), and the fact that N is Carmichael tells us that N-1 is in category 1. So the world looks something like this: t = B 2B 4B 8B ... N-1 --------------------------------------------------- category: 2 2 1 1 ... 1 ^ call this point t_critical Now, pick random A and compute A^B, A^{2B},..., A^{N-1}. Define t_critical as largest exponent in category (2). By definition, there is at least 1/2 chance that A^{t_critical} != {1,-1}. Call this x. So, x is not 1 or -1, but x^2 = 1 mod N. So, we use this to factor! A little more complexity theory =============================== Turns out there are a number of interesting complexity classes that you can define in terms of a polynomial-time algorithm V that takes in two inputs: an instance I, and an auxiliary string w, where we assume that the length of w is polynomial in the size of the instance. NP: A is in NP if there is a polynomial-time algorithm V(I,w) such that: If "I" is in YES_A, then there exists w such that V(I,w) = YES. If "I" is in NO_A, then for all w, V(I,w) = NO. Co-NP: other way around from NP: swap YES and NO. RP (randomized polynomial time): A is in RP if exists poly-time V s.t.: If "I" is in YES_A, then for at least half of w, V(I,w) = YES. If "I" is in NO_A, then for all w, V(I,w) = NO. Co-RP: the other way around. BPP: two-sided error If "I" is in YES_A, then for at least 2/3 of w, V(I,w) = YES. If "I" is in NO_A, then for at least 2/3 of w, V(I,w) = NO. Can boost up the 1/2, 2/3 by repetition. We showed primality in Co-RP. RSA PUBLIC-KEY CRYPTOGRAPHY =========================== RSA is an answer to the question of "how can two people communicate securely if they have never met to exchange secret keys before?". Answer is to somehow separate encryption and decryption. Each person has a public encryption key that's published in the phone book, and then their own private decryption key that's kept secret. To send a msg to A, you look them up in the phone book and use their public key. The point about public-key crypto is that just because you can encrypt a message to A doesn't mean you can decrypt anyone else's msgs sent to A. RSA: Person A (Alice) subscribes by (1) picking two large primes p and q (say 100 decimal digits long) and computing N = p*q. (2) Picking a small odd number e relatively prime to (p-1)*(q-1). E.g., e=3. (3) Computing d = e^{-1} mod phi(N), where phi(N) = (p-1)*(q-1). (4) Publishing the pair (e,N) as her PUBLIC KEY in a global phone book, and keeping the pair (d,N) secret as her SECRET KEY. The primes p and q can now be thrown away. Person B(Bob) sends msg M to Alice by computing x = M^e mod N, and sending x. (5) To decrypt, Alice computes x^d mod N, which is M. [Ignoring various issues like: might want to prepend garbage onto M so that evesdropper can't guess-and-check] Let's now look at details of how/why all this works: STEP 1: a reasonable proportion of random numbers are prime. So can just pick random numbers and test, until we get two primes. STEP 3: just requires computing inverse which we know how to do. STEP (5): Why do we get back M? Answer is that this comes from Euler's theorem. x^d = M^{de} mod N. By definition, de = 1 + k*phi(N). So, M^{1 + k*phi(N)} = M*M^{phi(N)^k} = M*1^k = M (mod N). Also: use fast exponentiation here since d might be a large number. ====================================================================== Why might we expect RSA to be secure? Here is one fact: given N and e, finding the decryption key d is as hard as factoring. (Though this doesn't say there might not be some other way of decrypting a message that people haven't thought of).