15-451 Algorithms: Lecture 5/2/00

Announcements:

- No recitation Thurs 5/4
- Final exam review Sun 5/7 1-4 pm, 1112 Doherty

**********

Topic:  Cryptography and Number-Theoretic Algorithms

  I. RSA Cryptosystem
 II. GCD
III. INVERSE mod n
 IV. EXPONENTIATION mod n

**********

Today we consider Cryptographic contexts for algorithm design.
Suppose Alice and Bob wish to communicate some information,
 compute a function, enter a contract, etc.
What are examples of cryptographic concerns???
[- Want their communication secure from an eavesdropper Eve
 - Alice wants to be sure Bob is really Bob, and vice-versa
 - Alice wants proof that message from Bob has not been tampered, and
   that Bob really sent the message
 - Alice may want to prove to Bob that she has some info without revealing it]

We will focus on one aspect of this, namely the use of a public-key
cryptosystem.  In particular, the RSA public-key cryptosystem.

**********

Usually, cryptography works by having the two people who want to
communicate first get together and decide on some shared secret
that is then used for encoding and decoding messages.  But what if the
two parties never get a chance to get together or talk in private: can
they still communicate securely?  Answer is Yes. 

Basic idea is to somehow separate encryption and decryption 
- Each person has an encryption key they make public 
  so anyone can encrypt a msg to send to them
- Each person has a private decryption key that only they know
- Just because you know how to encrypt a msg to person A, 
  doesn't mean you can decrypt anyone else's msgs sent to A.

What makes this possible: 
- Finding two large prime numbers and multiplying them is easy
- Factoring the product of two large prime numbers is difficult

FACTORING: Given number n, find a number d (1 < d < n) that divides n if
one exists.  No known fast algorithm.

Note: Goal is polynomial time in lg n  
      I.e., in the number of bits it takes to write down the input

Getting polynomial in n, in fact O(sqrt(n)) is easy.  How???
[For j=2..sqrt(n), see if j divides n]

Easy problems:
- PRIMALITY TESTING: Given number n, is it prime or composite?  This
  *can* be done quickly by a randomized algorithm: if n is prime then
  for sure will say prime, but if n is composite, it is only guaranteed
  to say composite with prob 1 - 1/2^{100}. 

- GCD: given a,b  compute GCD(a,b)
  What is GCD??? 
  [largest k such that k divides a and k divides b, 
   i.e., a and b are both multiples of k]

- INVERSE mod n:  given a,n such that gcd(a,n)=1,
                  compute b such that a*b = 1 mod n

- EXPONENTIATION mod n: given a,b,n  compute a^b mod n.

Will look at each of these, as time permits.

(Aside: if want to send a message to Alice, need to ensure that you
 know Alice's public key, i.e., the key has not been tampered with.
 Two ways to accomplish this: 
 - Disseminate keys so widely that tamperer can only modify a small
   fraction of the copies: will be able to detect tampering
 - Have a trusted key keeper.  Everyone knows the public key of the
   key keeper.  Exchange secure messages with the key keeper to get
   public key for Alice.)

**********

I. RSA Cryptosystem

Creating a (public key, secret key) pair for Alice:

(1) Select two large primes p and q (say 200 decimal digits long) and
        compute n = p*q  

(2) Select a small odd integer e relatively prime to (p-1)*(q-1). E.g., e=3. 
    What does relatively prime mean??? [gcd(a,b) = 1]

(3) Compute d such that  d*e = 1 mod (p-1)*(q-1)
 
(4) Publish the pair (e,n) as her PUBLIC KEY in a global phone
    book, and keep the pair (d,n) secret as her SECRET KEY
  
The primes p and q can now be thrown away

Bob sends a message M to Alice:

(a) Compute x = M^e mod n, and send x to Alice.  (M should be smaller than n
but still > n/1000000, say.  If it's larger, then you can break M up
into pieces and encrypt them separately.) 

(b) Alice decrypts by computing M' = x^d mod n (we show below that M' = M)

**********

Why is step 1 fast??? 
[starting at a random large number, test each number for primality
 until find a prime: primality testing]

Why is step 2 fast??? 
[GCD or simply see if 3 divides (p-1)*(q-1), then try 5, etc]

Why is step 3 fast???
[INVERSE mod n]

Why are steps (a) and (b) fast???
[EXPONENTIATION mod n]

**********

Why does Alice get M?
- Let x = M^e - n*z for some nonnegative integer z.
- Then x^d = (M^e - n*z)^d = M^{ed} + (a number of terms all having n as a factor)
- Thus M'= M^{ed} mod n
- Now ed = 1 + k(p-1)(q-1) for some integer k

- We will apply Fermat's little theorem: 
  For all a in [1..p-1]: a^{p-1} = 1 mod p, when p is prime

- Thus we have
  M^{ed} = M * (M^{p-1})^{k(q-1)} mod p
         = M * 1^{k(q-1)} mod p
         = M  mod p
(For special case of M = 0 mod p, we still have M^{ed} = M mod p)

Likewise, for all M, M^{ed} = M mod q

Our goal is to show that for all M, M^{ed} = M mod p*q

Thrm:  Let p and q be primes.  If X = Y mod p and X = Y mod q, then X = Y mod p*q.
Proof: Suppose instead that X = Y + Z mod p*q.
       Then Z = 0 mod p and Z = 0 mod q, and since p and q are prime, then 
       Z = j p q for some j.
       Hence Z = 0 mod p*q, implying X = Y mod p*q.
QED

Thus we have that regardless of the message M, M' = M, i.e., Alice receives M.

**********

II. GCD

GCD(a,b) = largest d such that d divides a and d divides b, 
i.e., a and b are both multiples of d

E.g., GCD(12,30) = 6

Note: GCD(a,0)=a

Claim: If a > b > 0, then GCD(a,b) = GCD(b, a mod b)

Proof: If d|a (so a = a'*d) and d|b (so b = b'*d) 
       then a mod b = a-kb = d(a' - kb'),
       so d|(a mod b) too.  
       Similarly, if d|b and d|(a-kb) then d|a.
       QED

How can we use this fact to compute a GCD???
[
EUCLID_GCD(a,b) {
  if (b=0) return a;
  else return EUCLID_GCD(b, a mod b);
}
]

E.g., EGCD(30,21) = EGCD(21,9) = EGCD(9,3) = EGCD(3,0) = 3.

Can anyone see a quick argument that running time is O(log a)?  E.g.,
looking 2 steps at a time???  
[If b < a/2 then after just 1 step the largest is cut in half. 
 If b > a/2 then a mod b < a/2 so in 2 steps the largest is cut in half.]

**********

Extended GCD: Also computes integers x and y such that d = ax + by.
- x and y may be negative or zero

The following algorithm returns a triple of the form (d,x,y)
such that d = GCD(a,b) = ax + by:

EXTENDED_EUCLID_GCD(a,b) {
  if (b=0) return (a, 1, 0);

  (d,x',y') = EXTENDED_EUCLID_GCD(b, a mod b);
  return (d,y',x'-floor(a/b)*y');
}

E.g., EEGCD(30,21)

     Calls EEGCD(21,9)
       Calls EEGCD(9,3)
         Calls EEGCD(3,0) returns (3,1,0)   Note: 3 = GCD(3,0) = 3*1 + 0*0
       returns (3,0,1-3*0) = (3,0,1)        Note: 3 = GCD(9,3) = 9*0 + 3*1
     returns (3,1,0-2*1) = (3,1,-2)         Note: 3 = GCD(21,9) = 21*1 + 9*(-2)
   returns (3,-2,1-1*(-2)) = (3,-2,3)       Note: 3 = GCD(30,21) = 30*(-2) + 21*(3)

Why does this work?
- Note: a mod b = a - floor(a/b)*b
- Recursively, running on b and a mod b, we compute d, x', y'
  such that d = b*x' + (a-floor(a/b)*b)*y' 
- Thus d = ay' + b(x'- floor(a/b)*y')

**********

III. INVERSE mod n

Given n, and a in [1..n-1] such that gcd(a,n)=1, 
would like to find b such that a*b=1 mod n,
i.e., b is the multiplicative inverse of a, mod n.

Note that for RSA step 3, we need inverse for e such that gcd(e,(p-1)*(q-1))=1

Algorithm:
- Compute extended GCD of n,a.  Get x,y such that nx+ay=gcd(n,a)=1

So, what is the inverse of a mod n???
[a*y = 1 mod n, so y is the inverse of a, mod n]

E.g., EEGCD(17,5) calls EEGCD(5,2) calls EEGCD(2,1) calls EEGCD(1,0)
      which returns (1,1,0) returns (1,0,1) returns (1,1,-2) 
      returns (1,-2,1-3(-2)) = (1,-2,7)
      Note: 17*(-2) + 5*7 = 1
      So y=7 is the inverse of 5 mod 17

**********

IV. EXPONENTIATION mod n

Given a, d, n, can we compute a^d mod n in time poly(lg n, lg d)?

Answer: yes, by repeated doubling
Nice way:  Call the routine below with x=1

POWER(a,d,x,n)  // invariant is result = (a^d)*x mod n
{
  if (d==0) return x;
  if d is odd, return POWER(a, d-1, x*a mod n, n);
  if d is even, return POWER(a^2 mod n, d/2, x, n);
}

Every two steps, d is cut by factor of 2, so # iterations is O(log d).

E.g., POWER(2,11,1,100) calls POWER(2,10,2,100) calls POWER(4,5,2,100)
      calls POWER(4,4,8,100) calls POWER(16,2,8,100)
      calls POWER(56,1,8,100) because 16^2 mod 100 = 56
      calls POWER(56,0,48,100) because 56*8 mod 100 = 48
      returns 48

      Correct, because 2^11 mod 100 = 2048 mod 100 = 48

**********

[FYI only. Did not have time to cover in class]

How to do primality testing:

Intuition for why primality testing might be easy is Fermat's little
theorem (given above): For all a in [1..p-1]: a^{p-1} = 1 mod p, when p is prime. 

How might we use this theorem as a primality test such that if a given number
n is prime, it must pass the test???
[Can try all a in [1..p-1] and see if a^{p-1} = 1 mod p.
 But takes too long to try out all a's.
 So instead try a bunch of random a's.
 If ever fails, then know number is not prime.]

Could it be that n is composite and yet all or most of a's 
have a^{n-1} = 1 mod n?  
It turns out unfortunately the answer is YES, but the good news is that 
those special numbers (called Carmichael numbers) are few and they can 
be handled using additional techniques (details are in the book).

The end result is an algorithm that 
- if n is prime then for sure will say prime, but
- if n is composite, it will say composite with prob 1 - 1/2^s, 
  where s is the number of random a's selected.