SECURITY and CRYPTOGRAPHY 15-827
25 SEP 01
Lecture #4
M.B. 4615 Wean

On account of the YOM KIPPUR holiday, this class will NOT meet this Thursday, 27 September.

To get on the cs827 mailing list, send email to hopper@cs.cmu.edu indicating your desire to participate in this course. The list will be used to address comments to the entire class.

The course web page is at http://www.cs.cmu.edu/~hopper/cs827-f01/

Leonid LEVIN's OPTIMAL NUMBER-SPLITTING (FACTORING) ALGORITHM.

Let SPLIT denote any algorithm that computes
  INPUT: a positive composite (i.e. not prime) integer n.
  OUTPUT: a nontrivial factor of n.

THEOREM: There exists an "optimal" number-splitting algorithm, which we call OPTIMAL-SPLIT. This algorithm is OPTIMAL in the sense that: for every number-splitting algorithm SPLIT there is a (quite large but fixed) constant C, depending on SPLIT but not on n, such that for every positive composite integer input n, the "running time" of OPTIMAL-SPLIT on input n is at most C times the running time of SPLIT on input n.

RUNNING TIME of an algorithm on input n is the number of steps taken by that algorithm on that input, using any "reasonable model" of computation, and any "reasonable definition of step."

REASONABLE MODELS of computation include BASIC, C, PASCAL, FORTRAN, LISP augmented with the PROG feature, and multi-tape multi-symbol Turing machines.

REASONABLE DEFINITION of a STEP includes ... the time to execute a single "atomic" instruction, or a single machine cycle.

The OPTIMAL-SPLIT ALGORITHM:
BEGIN
  Enumerate all algorithms in order of size, lexicographically within each size.
  Run all algorithms on input n, interleaved so that at any moment in time, t, the ith algorithm has received a [1/(2^i)] fraction of the time to execute.
  Whenever an algorithm halts with some output integer m in the range 1 < m < n, check if m divides n (i.e. if n mod m = 0). If so, return m.
END

IDEA of PROOF: Now let SPLIT be any number-splitting algorithm, say the ith in our list. Then OPTIMAL-SPLIT runs SPLIT for a fraction [1/(2^i)] of its running time.
Hence the running time of OPTIMAL-SPLIT on input n is at most (2^i) x (running time of SPLIT on input n). QED

Challenge-Response protocols can be built on passwords, each of which is a random, random-access mapping from characters to digits.

RANDOM means that all 10^26 mappings (10 choices of digit for each of the 26 characters) are equally probable.

RANDOM-ACCESS means that one can "instantly" map a character x to its corresponding digit f(x).

How easy is it to memorize such a random mapping?

The HOROSCOPE EFFECT: Look up your horoscope in the newspaper. Think about it. Observe that it applies to you. Now look up the horoscope for any other date and see that it does NOT apply to you.

The idea is to give a person a random mapping, and then give that person a reason why that mapping is the unique obvious correct mapping.

What are the possible reasons to map X to a particular digit in {0...9}?

X -> 0 (X is Roman 10. Tic-tac-toe uses X's and 0's.)
     1 (eXcellence is 1st class. X marks the (unique) spot. Some people write a slim X that looks like 1. Only 1 X in a Scrabble set.)
     2 (X is written with *2* slashes.)
     3 (XYZ are the 3 last characters. Xmas has 3 wise men. Xavier's Feast Day is Dec 3. X is 3rd from the end.)
     4 (X is a 4-leaf clover. X has 4 arms.)
     5 (X has 4 arms plus a central point. The indeX FInger is one of 5 FIngers. X-ray is a 5-character word.)
     6 (On account of the X in siX. No other digit has an X.)
     7 (Latin 7 and X are both "crossed".)
     8 (Center of 8 looks like an X. X looks like 8 if you put upper & lower horizontal bars on it.)
     9 (NIX = NIne X. Red SoX has 9 players.)

What are the pros and cons of the following challenge-response protocol:

1. User's password is a random map f from the 26 characters to the 10 digits.
2. For a d=1 digit response: for any 3-character challenge, the user responds with the sum, mod 10, of the three digits that f assigns to those characters.
More generally,
2' For a d digit response: for any 3*d character challenge, the user responds with d digits, where the first digit is the sum (mod 10) of the digits for the first 3 characters, the second digit is the sum (mod 10) of the digits for the next 3 characters, and so on.

If, with d = 6, the user need only respond correctly to 5 of the 6 three-character challenges, then the probability that an imposter succeeds by simply choosing the 6 digits at random is
  1/10^6 + (9*6)/10^6 = 55/10^6.
The first term counts the one response with all 6 digits right; the second counts the responses with exactly one wrong digit (6 positions for the error, 9 wrong values each).

EXAMPLE:
  CHALLENGE = xcv sdf xdk yui ert fds
  RESPONSE  = 0   9   6   1   5   9

What are the pros (positives) and cons (negatives)?

PROS = The horoscope effect can make it relatively easy to learn a random mapping. Once the map is burned into the brain's random access memory, and once one can do addition mod 10 without stumbling, one can get d digits at 5 seconds per digit: 30 seconds for 6 digits.

CONS = Have to learn a random map. Each challenge-response pair is an equation. Let f(a)=A, f(b)=B, etc. Then the above 6 challenge-response pairs yield the following 6 equations (mod 10):
  X+C+V=0
  S+D+F=9
  X+D+K=6
  ...
  E+R+T=5
  ...
26 independent equations suffice to determine the mapping f, using Gaussian elimination.

Suppose you want 3-digit responses. Then you need 3 3-character challenges for each authentication. After 9 independent authentications (9 x 3 = 27 equations), the eavesdropper knows the map or at least a great deal about it.

IMPORTANT: While Gaussian elimination is easy, Gaussian Elimination with 10% of equations in error is hard(er)!! For example, suppose you have 36 equations and 10 are wrong, leaving 26 correct. One method to solve is to try to solve each of the (36 choose 26) = 254x10^6 (approximately) different systems of 26 equations.

------------------------------------------------------------

NEXT TIME: Some ideas from Learning Theory.

GAUSSIAN ELIMINATION WITH NOISE
  INPUT: m linear equations in n unknowns (say over GF(2)). A positive integer k.
  QUESTION: Is there an assignment of values to the variables that solves at least k of the m equations?
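
As a small illustration of the GAUSSIAN ELIMINATION WITH NOISE question just stated, here is a minimal brute-force sketch in Python. It is not from the lecture: the function names (max_satisfied, solvable_at_least) and the toy instance EQS are made up for illustration. It simply tries all 2^n assignments over GF(2), so it is only usable for very small n.

```python
from itertools import product

def max_satisfied(equations, n):
    """Try every assignment in GF(2)^n; return (best count, one best assignment).

    equations: list of (coeffs, rhs), where coeffs is a length-n tuple of 0/1
    and rhs is 0 or 1, meaning sum(coeffs[j] * x[j]) = rhs (mod 2).
    """
    best_count, best_x = -1, None
    for x in product((0, 1), repeat=n):
        count = sum(
            1
            for coeffs, rhs in equations
            if sum(c * v for c, v in zip(coeffs, x)) % 2 == rhs
        )
        if count > best_count:
            best_count, best_x = count, x
    return best_count, best_x

def solvable_at_least(equations, n, k):
    """Decision version: can at least k of the equations hold simultaneously?"""
    return max_satisfied(equations, n)[0] >= k

# Toy instance (hypothetical): 4 equations in 3 unknowns.
#   x0 + x1 = 1,  x1 + x2 = 1,  x0 + x2 = 1,  x0 = 0
# The first three are mutually inconsistent, so not all 4 can hold.
EQS = [((1, 1, 0), 1), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 0, 0), 0)]

if __name__ == "__main__":
    print(max_satisfied(EQS, 3)[0])       # 3
    print(solvable_at_least(EQS, 3, 4))   # False
```

The exhaustive loop takes time exponential in n, which is why this is only a way to see the problem statement in action, not an attack.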
Gaussian Elimination with Noise is NP-hard (e.g. by reducing MAX 2-SAT to it).

It's even hard to approximate: Johan Hastad, "Some optimal inapproximability results," in Proc. 29th STOC, pp. 1-10 (May 1997), proves in a very strong sense that it is NP-hard to approximate the number of equations that can be solved. A random assignment solves half the equations, i.e. m/2, on average. One cannot even guarantee slightly more than that: to solve a (1/2 + epsilon) fraction of the equations, for any fixed epsilon > 0, is NP-hard.

MAX 2-SAT:
  INSTANCE: A set of m clauses, each of size 2, in n variables. An integer k.
  QUESTION: Is there a truth assignment that satisfies at least k of the clauses?

MAX 2-SAT is poly-time reducible to Gaussian Elimination with Noise.
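
Returning to the challenge-response protocol earlier in these notes, the arithmetic there can be checked with a short Python sketch. Nothing below is from the lecture itself: the names respond and imposter_success are made up, and the partial map ASSUMED is a hypothetical choice of digits picked only because it happens to reproduce the example response 0 9 6 1 5 9. It is not the actual map behind that example, and letters not listed are arbitrarily set to 0.

```python
from math import comb
import string

# Hypothetical digits, chosen only to be consistent with the example in the
# notes; a real password would be a uniformly random map on all 26 letters.
ASSUMED = {"x": 1, "c": 2, "v": 7, "s": 2, "d": 3, "f": 4, "k": 2,
           "y": 0, "u": 0, "i": 1, "e": 1, "r": 2, "t": 2}
f = {ch: ASSUMED.get(ch, 0) for ch in string.ascii_lowercase}

def respond(challenge, mapping):
    """One response digit per 3-character group: sum of mapped digits mod 10."""
    return [sum(mapping[ch] for ch in group) % 10 for group in challenge.split()]

def imposter_success(d, min_correct):
    """Count the d-digit guesses that match at least min_correct positions.

    Returns (favorable guesses, total guesses) for a uniformly random guess.
    Exactly j correct positions: choose the j positions, 9 wrong values elsewhere.
    """
    favorable = sum(comb(d, j) * 9 ** (d - j) for j in range(min_correct, d + 1))
    return favorable, 10 ** d

if __name__ == "__main__":
    print(respond("xcv sdf xdk yui ert fds", f))  # [0, 9, 6, 1, 5, 9]
    print(imposter_success(6, 5))                 # (55, 1000000)
    print(comb(36, 26))                           # 254186856
```

The last line is the number of 26-equation subsystems an eavesdropper would face when 10 of 36 observed equations are wrong, matching the 254x10^6 figure quoted above.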