SECURITY and CRYPTOGRAPHY 15-827 25 SEP 01
Lecture #4 M.B.
4615 Wean
On account of the YOM KIPPUR holiday, this class will NOT meet this
Thursday 27 September.
To get on the cs827 mailing list, send email to
hopper@cs.cmu.edu
indicating your desire to participate in this course.
The list will be used to address comments to the entire class.
The course web page is at
http://www.cs.cmu.edu/~hopper/cs827-f01/
Leonid LEVIN's OPTIMAL NUMBER-SPLITTING (FACTORING) ALGORITHM.
Let SPLIT denote any algorithm that computes
INPUT: a positive composite (i.e. not prime) integer n.
OUTPUT: a nontrivial factor of n.
THEOREM: There exists an "optimal" number-splitting algorithm, which we
call OPTIMAL-SPLIT. This algorithm is OPTIMAL in the sense that:
for every number-splitting algorithm SPLIT
there is a (quite large but fixed) constant C such that
for every positive composite integer input n,
the "running time" of OPTIMAL-SPLIT on input n is at most C times the
running time of SPLIT on input n.
RUNNING TIME of an algorithm on input n is the number of steps taken by
that algorithm on that input, using any "reasonable model" of computation,
and any "reasonable definition of step."
REASONABLE MODELS of computation include BASIC, C, PASCAL, FORTRAN, LISP
augmented with the PROG feature, and multi-tape multi-symbol Turing machines.
REASONABLE DEFINITION of a STEP includes ... the time to execute a single
"atomic" instruction, or a single machine cycle.
The OPTIMAL-SPLIT ALGORITHM:
BEGIN
Enumerate all algorithms in order of size, lexicographically within each size.
Run all algorithms in parallel so that, at any moment in time t, the ith
algorithm has received a [1/(2^i)] fraction of the time to execute.
Whenever an algorithm halts with some output integer m in the range 1 < m
< n, check if m divides n (i.e. if n mod m = 0).
If so, return m.
END
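
The scheduling idea above can be sketched in Python. This is only an illustration under loud assumptions: the infinite enumeration of all algorithms is replaced by a short hand-written list of Python generators (trial_division, spin and bogus are hypothetical stand-ins, not part of the lecture), each "yield" counts as one step, and max_ticks is an artificial cutoff. Slot i is served on the ticks t whose lowest set bit is 2^(i-1), which is exactly a 1/(2^i) fraction of all ticks.

```python
def trial_division(n):
    """Stand-in splitter: returns the smallest divisor d with d*d <= n."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
        yield  # one "step" of computation

def spin(n):
    """Stand-in for an algorithm that never halts."""
    while True:
        yield

def bogus(n):
    """Stand-in for an algorithm that halts with a useless answer."""
    yield
    return 1

def optimal_split(n, algorithms, max_ticks=10**6):
    # Slot i (1-based) holds the ith algorithm of the (here finite) list.
    running = {i: alg(n) for i, alg in enumerate(algorithms, start=1)}
    for t in range(1, max_ticks + 1):
        i = (t & -t).bit_length()  # slot i is chosen on a 1/2^i fraction of ticks
        gen = running.get(i)
        if gen is None:
            continue  # empty or already halted slot: this tick is idle
        try:
            next(gen)  # run one step of algorithm i
        except StopIteration as halt:
            del running[i]
            m = halt.value
            # Accept the output only if it really is a nontrivial factor.
            if isinstance(m, int) and 1 < m < n and n % m == 0:
                return m
    return None  # cutoff reached; the idealized algorithm would keep going
```

For example, optimal_split(91, [spin, bogus, trial_division]) returns 7: the non-halting and the wrong algorithm cost time but cannot cause a wrong answer, because every output is checked by division.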
IDEA of PROOF:
Now let SPLIT be any number-splitting algorithm, say the ith in our list.
Then OPTIMAL-SPLIT runs SPLIT for a fraction [1/(2^i)] of its running time.
Hence the running time of OPTIMAL-SPLIT on input n is at most
(2^i) x (running time of SPLIT on input n).
QED
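
The counting step behind this bound can be written out as follows. The sketch assumes the idealized schedule in which the ith algorithm has received exactly t/2^i steps after t total steps, and it ignores the overhead of simulation and of the divisibility checks; T_A(n) is shorthand, introduced here, for the running time of algorithm A on input n.

```latex
\[
  \frac{t}{2^{i}} \ge T_{\mathrm{SPLIT}}(n)
  \quad\Longleftrightarrow\quad
  t \ge 2^{i}\, T_{\mathrm{SPLIT}}(n),
\]
\[
  \text{hence}\qquad
  T_{\mathrm{OPTIMAL\text{-}SPLIT}}(n) \;\le\; 2^{i}\, T_{\mathrm{SPLIT}}(n)
  \;=\; C \cdot T_{\mathrm{SPLIT}}(n),
  \qquad C = 2^{i}.
\]
```

The constant C depends only on the position i of SPLIT in the enumeration, not on n. OPTIMAL-SPLIT may halt earlier if some other algorithm outputs a valid factor first.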
Challenge-Response protocols can be built on passwords, each of which is a
random randomly-accessible mapping from characters to digits.
RANDOM means that all 10^26 mappings are equally probable.
RANDOM-ACCESS means that one can "instantly" map a character x to its
corresponding digit f(x).
How easy is it to memorize such a random mapping?
The HOROSCOPE EFFECT: Look up your horoscope in the newspaper. Think about
it. Observe that it applies to you. Now look up the horoscope for any other
date and see that it does NOT apply to you.
The idea is to give a person a random mapping, and then give that person a
reason why that mapping is the unique obvious correct mapping.
What are the possible reasons to map X to a particular digit in {0...9}?
X -> 0 (X is Roman 10. Tic-tac-toe uses X's and 0's.)
1 (eXcellence is 1st class. X marks the (unique) spot. some people
write a slim X that looks like 1; only 1 X in a scrabble set.)
2 (X is written with *2* slashes)
3 (XYZ are the 3 last characters. Xmas has 3 wise men. Xavier's Feast
Day is Dec 3. X is 3rd from the end.)
4 (X is a 4-leaf clover. X has 4 arms.)
5 (X has 4 arms plus a central point. The indeX FInger is one of 5
FIngers. X-ray is a 5-character word.)
6 (on account of the X in siX. No other digit has an X.)
7 (Latin 7 and X are both "crossed")
8 (center of 8 looks like an X. X looks like 8 if you put upper &
lower horizontal bars on it)
9 (NIX = NIne X. Red SoX has 9 players.)
What are the pros and cons of the following challenge-response protocol:
1. User's password is a random map f from the 26 characters to the 10 digits.
2. For a d=1 digit response: for any 3-character challenge, the user
responds with the sum of the three digits mod 10.
More generally,
2' For a d-digit response: for any 3*d character challenge, the user
responds with d digits, where the first digit is the sum mod 10 of the digits
assigned to the first 3 characters, the second digit is the sum mod 10 of the
digits assigned to the next 3 characters, and so on.
If the user need only get 5 of the 6 digits of a 6-digit response correct,
then the probability that an imposter succeeds by simply choosing the 6
digits at random is 1/10^6 (all 6 correct) + (9*6)/10^6 (exactly one of the
6 wrong, in any of 9 ways) = 55/10^6.
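
The arithmetic can be checked with a short binomial sum. This is a sketch, not part of the lecture; the function name imposter_success and its parameters are introduced here for illustration, and it assumes each guessed digit is correct independently with probability 1/10.

```python
from fractions import Fraction
from math import comb

def imposter_success(d=6, need=5, base=10):
    """Probability that d uniformly random digits contain at least `need` correct ones."""
    p = Fraction(1, base)  # chance that a single guessed digit is right
    return sum(comb(d, k) * p**k * (1 - p)**(d - k) for k in range(need, d + 1))

# Exact value for the 5-of-6 rule discussed above.
assert imposter_success(6, 5) == Fraction(55, 10**6)
```

Requiring all 6 digits instead (need=6) gives 1/10^6, so tolerating one wrong digit makes random guessing 55 times more likely to succeed.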
EXAMPLE : CHALLENGE = xcv sdf xdk yui ert fds
RESPONSE = 0 9 6 1 5 9
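
A minimal sketch of the protocol in Python follows. The names random_password, respond and demo_map are introduced here for illustration. The lecture does not give the map behind the example above, so demo_map is a hypothetical and deliberately non-random map (a->0, b->1, ..., j->9, k->0, ...); its response therefore differs from the one shown in the example.

```python
import secrets
import string

def random_password():
    """Draw one of the 10^26 letter-to-digit maps uniformly at random."""
    return {ch: secrets.randbelow(10) for ch in string.ascii_lowercase}

def respond(f, challenge):
    """Answer a challenge given as space-separated 3-character groups.

    Each group contributes one digit: the sum mod 10 of the digits
    that the map f assigns to its three characters.
    """
    return [sum(f[ch] for ch in group) % 10 for group in challenge.split()]

# Hypothetical map used only so that the output can be checked by hand.
demo_map = {ch: i % 10 for i, ch in enumerate(string.ascii_lowercase)}
demo_response = respond(demo_map, "xcv sdf xdk yui ert fds")
# With demo_map: x=3, c=2, v=1, so the first digit is (3 + 2 + 1) mod 10 = 6.
```

Note that the groups sdf and fds always receive the same digit, since addition does not depend on the order of the characters.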
What are the pros (positives) and cons (negatives)?
PROS = The horoscope effect can make it relatively easy to learn a random
mapping. Once the map is burned into the brain's random access memory, and
once one can do addition mod 10 without stumbling, one can get d digits at 5
seconds per digit: 30 seconds for 6 digits.
CONS = Have to learn a random map.
Each challenge-response pair is an equation.
Let f(a)=A, f(b)=B, etc. Then the above 6 challenge-response pairs yield
the following 6 equations (all sums taken mod 10):
X+C+V=0 S+D+F=9 X+D+K=6 ... E+R+T=5 ...
26 independent equations suffice to determine the mapping f, using Gaussian
elimination.
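
The attack can be sketched in Python as follows. One caveat that the notes leave implicit: the integers mod 10 do not form a field, so this sketch does not run Gaussian elimination mod 10 directly. Instead it solves the system mod 2 and mod 5 separately and combines the two solutions by the Chinese Remainder Theorem. That is a choice made here, not necessarily the method intended in the lecture. The names solve_mod_prime and recover_map, the 4-letter alphabet and the toy challenges (which repeat letters) are assumptions for illustration.

```python
def solve_mod_prime(rows, rhs, p):
    """Gauss-Jordan elimination over GF(p); returns the unique solution or None."""
    n = len(rows[0])
    aug = [[c % p for c in row] + [r % p] for row, r in zip(rows, rhs)]
    pivot_row = 0
    for col in range(n):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pr = next((i for i in range(pivot_row, len(aug)) if aug[i][col]), None)
        if pr is None:
            return None  # no pivot: the equations do not pin down this unknown
        aug[pivot_row], aug[pr] = aug[pr], aug[pivot_row]
        inv = pow(aug[pivot_row][col], -1, p)  # modular inverse (Python 3.8+)
        aug[pivot_row] = [(v * inv) % p for v in aug[pivot_row]]
        for i in range(len(aug)):
            if i != pivot_row and aug[i][col]:
                f = aug[i][col]
                aug[i] = [(a - f * b) % p for a, b in zip(aug[i], aug[pivot_row])]
        pivot_row += 1
    return [aug[i][n] for i in range(n)]

def recover_map(alphabet, pairs):
    """pairs: list of (3-character challenge, response digit) seen by an eavesdropper."""
    index = {ch: k for k, ch in enumerate(alphabet)}
    rows, rhs = [], []
    for challenge, digit in pairs:
        row = [0] * len(alphabet)
        for ch in challenge:
            row[index[ch]] += 1  # coefficient = number of times ch occurs
        rows.append(row)
        rhs.append(digit)
    sol2 = solve_mod_prime(rows, rhs, 2)
    sol5 = solve_mod_prime(rows, rhs, 5)
    if sol2 is None or sol5 is None:
        return None
    # CRT: the unique x mod 10 with x = a (mod 2) and x = b (mod 5) is 5a + 6b.
    return {ch: (5 * a + 6 * b) % 10 for ch, a, b in zip(alphabet, sol2, sol5)}

# Toy run: hidden map a->7, b->3, c->0, d->9 over a 4-letter alphabet.
observed = [("aaa", 1), ("aab", 7), ("abc", 0), ("bcd", 2)]
recovered = recover_map("abcd", observed)
```

Here recovered equals {'a': 7, 'b': 3, 'c': 0, 'd': 9}. With the full 26-letter alphabet the same code applies once the observed equations have full rank both mod 2 and mod 5.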
Suppose you want 3-digit responses. Then you need three 3-character
challenges for each authentication. After 9 independent authentications
(27 equations), the eavesdropper knows the map or at least a great deal
about it.
IMPORTANT: While Gaussian elimination is easy, Gaussian Elimination with
10% of equations in error is hard(er)!!
For example, suppose you have 36 equations and 10 are wrong, leaving 26
correct. One method is to try to solve each of the
36 choose 26 = about 254x10^6 different systems of 26 equations.
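
The count is easy to confirm. The snippet below is only a check of the arithmetic; the helper name subsystems is introduced here for illustration.

```python
from math import comb

def subsystems(total, correct):
    """Number of ways to pick `correct` equations out of `total`."""
    return comb(total, correct)

count = subsystems(36, 26)  # equals comb(36, 10) by symmetry
assert count == 254_186_856
```

The count grows quickly with the number of errors, which is why trying every subsystem is only feasible for small instances.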
------------------------------------------------------------
NEXT TIME: Some ideas from Learning Theory.
GAUSSIAN ELIMINATION WITH NOISE
INPUT: m linear equations in n unknowns (say over GF(2)).
A positive integer k.
QUESTION: Is there an assignment of values to the variables
that solves at least k of the m equations?
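
For very small instances the question can be answered by exhaustive search. The sketch below works over GF(2) and tries all 2^n assignments, so it only illustrates the problem statement and is in no way an efficient algorithm. The names max_satisfied and solvable_at_least and the input encoding (a 0/1 coefficient tuple plus a right-hand side per equation) are assumptions made here.

```python
from itertools import product

def max_satisfied(equations, n):
    """equations: list of (coeffs, rhs), coeffs a 0/1 tuple of length n, rhs in {0, 1}."""
    best = 0
    for x in product((0, 1), repeat=n):  # every assignment to the n unknowns
        satisfied = sum(
            1 for coeffs, rhs in equations
            if sum(c * v for c, v in zip(coeffs, x)) % 2 == rhs
        )
        best = max(best, satisfied)
    return best

def solvable_at_least(equations, n, k):
    """Decision version: can some assignment solve at least k equations?"""
    return max_satisfied(equations, n) >= k

# x0 + x1 = 1 and x0 + x1 = 0 contradict each other, so at most 3 of 4 can hold.
demo = [((1, 1), 1), ((1, 1), 0), ((1, 0), 1), ((0, 1), 1)]
assert max_satisfied(demo, 2) == 3
```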
Gaussian Elimination with Noise is NP-hard (e.g. by reducing MAX 2-SAT to
it). It's even hard to approximate:
Johan Håstad, "Some optimal inapproximability results," in Proc. 29th STOC,
pp. 1-10 (May 1997), proves in a very strong sense that it is NP-hard to
approximate the number of equations that can be solved. A random assignment
solves half the equations in expectation, i.e. m/2. One cannot even do
slightly better: for any fixed epsilon > 0, solving (1/2 + epsilon)m of the
equations is NP-hard.
MAX 2-SAT:
Instance: A set of m clauses, each of size 2, in n variables. An
integer k.
Question: Is there a truth assignment that satisfies at least k
of the clauses?
MAX 2-SAT is poly-time reducible to Gaussian Elimination with Noise.