15-451 Algorithms       11/17/09
* Number-theoretic algorithms II
  - Recap
  - primality testing (+ dealing with Carmichael #s)
  - complexity classes RP, co-RP, BPP
  - RSA
==============================================================================
We ended class last time with Fermat's little theorem: if N is prime
and A is between 1 and N-1, then A^{N-1} = 1 mod N.

So, if we pick some A (like A=2) and compute A^{N-1} mod N (which we
now know how to do efficiently), and find the result is not equal to
1, this is a PROOF that N is not prime, even though it gives us no
information about how to factor N.  Today we'll extend this further to
get a randomized polynomial-time algorithm for testing if a number is
prime or composite. 

RECAP
=====

DEFN: Z_N^* = {A in 1..N such that GCD(A,N)=1}

E.g., Z_15^* = {1,2,4,7,8,11,13,14}
Recall, Z_N^* is a group under multiplication mod N.  That means it's
closed under the group operation (e.g., 7*8=11 mod 15), and also every
A in Z_N^* has an inverse B in Z_N^* (AB=1 mod N).  E.g., 2^{-1} = 8
mod 15.

A REALLY IMPORTANT PROPERTY of groups: say G is a group and H
is a subgroup of G (if A,B are in H then A*B is in H; if A is in H
then A^{-1} is in H).  Then THE SIZE OF H DIVIDES THE SIZE OF G.

        E.g., G = Z_15^*, H = {1,2,4,8}. (H is the powers of 2)

PROOF: Say H is a subgroup of G and y is not in H. Then the coset yH =
{yh : h in H} is a set of size |H| (if y*h1 = y*h2 then h1 = h2) and
is disjoint from H (if y*h1 = h2 then y = h2*h1^{-1}, which is in H by
H's group closure properties).  Furthermore, all cosets are disjoint
(if z*h1 = y*h2 then z = y*h3 for some h3 in H).


THEOREM: for all A in Z_N^*, order(A) divides phi(N).

[Recall, order(A) = smallest t>0 s.t. A^t = 1 (mod N). phi(N) = |Z_N^*|. ]

PROOF: {1, A, A^2, ..., A^{t-1}} is a subgroup of Z_N^*:
  it's closed under multiplication mod N and taking inverses.  So we
  just use our Really Important Property.

COROLLARY 1: Euler's Theorem: for any A in Z_N^*, A^{phi(N)} = 1 (mod N).

COROLLARY 2: Fermat's little theorem: If N is prime, then for any A in
Z_N^*, A^{N-1} = 1 mod N.

DEFN: N is a Carmichael number if N is composite but A^{N-1}=1 for all
A in Z_N^*.


PRIMALITY TESTING
=================
We're now going to give a fast randomized algorithm for testing if a
number is prime, with the following properties:

      if N is prime, then it outputs YES
      if N is composite, then it outputs NO with probability at least
             1 - 1/2^100.

So, if it says YES, you don't have a 100% proof that the number is
prime, but you can be pretty confident.  [Note, it was only very
recently that a *deterministic* poly-time algorithm for this problem
was developed] 


THEOREM: if N is composite but not a Carmichael number, then A^{N-1}!=1 
for *at least half* of the A in Z_N^*.

Proof: Let H = {A in Z_N^* : A^{N-1} = 1 mod N}.  Then H is a subgroup
of Z_N^* because it's closed under multiplication (if A^{N-1} = 1 and
B^{N-1} = 1 then (AB)^{N-1}=1) and inverses (if AB = 1 then
(AB)^{N-1} = 1, so if A^{N-1} = 1 then B^{N-1}=1).  This means that if
there is even a single element of G that is not in H (namely if N is
not Carmichael) then |H| is at most |G|/2.   So, right away we get
that if there is even a single witness to N being composite, there
must be a *lot* of witnesses. QED

So, if we ignore the Carmichael numbers, then the algorithm we want is
just this:
  - pick 100 random values A between 1 and N.
  - If all have A^{N-1} = 1 mod N then output YES (Probably Prime)
  - Else output NO (Definitely Composite)

[don't even need to test GCD since if GCD is not 1 then for sure
A^{N-1} != 1 mod N]


The trick for Carmichael numbers is it turns out they are easy to
factor.  (They are also pretty rare.  Smallest is 561.)
Combining these two gives us the Miller-Rabin primality test.

More on Carmichael numbers: We're going to be able to factor
Carmichael numbers using the following idea.  Suppose we have some
number x that's not 1 or -1 mod N but nonetheless x^2 = 1 mod N.
Weird: wouldn't happen modulo a prime but can happen modulo a
composite.  [Like a bug in the Matrix...]  E.g., 11^2 = 1 mod 15.
This means that (x-1)*(x+1) is a multiple of N, even though neither
x-1 nor x+1 is.  So, only some of the factors of N must go into x-1
and the others into x+1 (like 10,12 for N=15).  This means GCD(x-1,N)
gives us a factor of N (as does GCD(x+1,N)), and recall that GCD is
something WE CAN COMPUTE EFFICIENTLY!

The way we will find such an x is via the following key lemma:

KEY LEMMA: Suppose N is odd, not a prime power or perfect square, and
let t < N.  If N is composite and there exists x in Z_N^* such that
x^t != 1 (mod N), then at least half of x in Z_N^* have x^t != {-1,+1}
mod N.  Furthermore, if t is ODD, then such an x exists.

Proof of KEY LEMMA: Don't have time to go through details but one step
is you show the set of x such that x^t *is* in {-1,1} is a subgroup,
and then you also use something called the Chinese Remainder Theorem.

Proof of factoring Carmichael given our key lemma: First, we can
trivially handle Ns that are even, prime-powers, or perfect squares,
so we can assume the lemma above applies.  Now, take N-1 and pull out
all powers of 2, so that we have N-1 = B * 2^k, where B is an odd number.
Now, consider exponents t = B, 2B, 4B, 8B, ..., N-1.  The lemma says
that we can put them into two categories:
        (1) all A in Z_N^* have A^t = 1 mod N
        (2) at least half of A have A^t != 1 or -1 (mod N)
The lemma tells us that t=B is in category (2), and the fact that N is
Carmichael tells us that N-1 is in category 1.  So the world
looks something like this: 

        t =     B       2B      4B      8B      ...     N-1
        ---------------------------------------------------
   category:    2       2       1       1       ...     1
                        ^
                        call this point t_critical

Now, pick random A and compute A^B, A^{2B},..., A^{N-1}.  Define
t_critical as largest exponent in category (2).  By definition, 
there is at least 1/2 chance that A^{t_critical} != {1,-1}.  Call this
x.  So, x is not 1 or -1, but x^2 = 1 mod N.  So, we use this to factor!

A little more complexity theory
===============================
Turns out there are a number of interesting complexity classes that
you can define in terms of a polynomial-time algorithm V that takes in
two inputs: an instance I, and an auxiliary string w, where we assume
that the length of w is polynomial in the size of the instance.

NP: A is in NP if there is a polynomial-time algorithm V(I,w) such that:
  If "I" is in YES(A), then there exists w such that V(I,w) = YES.
  If "I" is in NO(A), then for all w, V(I,w) = NO.

Co-NP: other way around from NP: swap YES and NO.

RP (randomized polynomial time): A is in RP if exists poly-time V s.t.:
  If "I" is in YES(A), then for at least half of w, V(I,w) = YES.
  If "I" is in NO(A), then for all w, V(I,w) = NO.

Co-RP: the other way around.

BPP: two-sided error 
  If "I" is in YES(A), then for at least 2/3 of w, V(I,w) = YES.
  If "I" is in NO(A), then for at least 2/3 of w, V(I,w) = NO.

Can boost up the 1/2, 2/3 by repetition.  We showed primality in Co-RP.

RSA PUBLIC-KEY CRYPTOGRAPHY
===========================
RSA is an answer to the question of "how can two people communicate
securely if they have never met to exchange secret keys before?".
Answer is to somehow separate encryption and decryption.  Each person
has a public encryption key that's published in the phone book, and
then their own private decryption key that's kept secret.  To send a
msg to A, you look them up in the phone book and use their public key.
The point about public-key crypto is that just because you can encrypt
a message to A doesn't mean you can decrypt anyone else's msgs sent to A. 

RSA: Person A (Alice) subscribes by
(1) picking two large primes p and q (say 100 decimal digits long) and
        computing N = p*q.  
(2) Picking a small odd number e relatively prime to (p-1)*(q-1). E.g., e=3. 
(3) Computing d = e^{-1} mod phi(N), where phi(N) = (p-1)*(q-1).
(4) Publishing the pair (e,N) as her PUBLIC KEY in a global phone
        book, and keeping the pair (d,N) secret as her SECRET KEY.  
        The primes p and q can now be thrown away.

Person B(Bob) sends msg M to Alice by computing x = M^e mod N, and sending x.

(5) To decrypt, Alice computes x^d mod N, which is M.

[Ignoring various issues like: might want to prepend garbage onto M so
that eavesdropper can't guess-and-check] 

Let's now look at details of how/why all this works:
 STEP 1: a reasonable proportion of random numbers are prime.  So can
   just pick random numbers and test, until we get two primes.

 STEP 3:  just requires computing inverse which we know how to do.

 STEP (5): Why do we get back M?
   Answer is that this comes from Euler's theorem.  
   x^d = M^{de} mod N.  By definition, de = 1 + k*phi(N).  So,
   M^{1 + k*phi(N)} = M*M^{phi(N)^k} = M*1^k = M (mod N).
   Also: use fast exponentiation here since d might be a large number.
======================================================================
Why might we expect RSA to be secure?  Here is one fact: given N and
e, finding the decryption key d is as hard as factoring.  (Though this
doesn't say there might not be some other way of decrypting a message
that people haven't thought of).