SECURITY and CRYPTOGRAPHY 15-827
27 NOV 01
Lecture #18
M.B. 4615 Wean

1. Ask students how they did on analyzing the palindrome-restricted #82. This is protocol #82 with challenges restricted to being palindromes having a1, a2, ... all pairwise distinct. Does this restriction make #82 stronger? In particular, ask Abie to explain his attack on the above protocol.

2. On to CAPTCHAs.

3. Ask John to show Luis's FORKS CAPTCHA:

   Q: What is common to these 5 pictures?
   A: fork

   This CAPTCHA randomly selects a "picturable noun" from BASIC ENGLISH and searches the web for 5 random images indexed by that noun.

4. Suggest that someone write their own CAPTCHA, perhaps based on:

   Q: Find the correspondence between labelled points in this picture and labelled points in this other picture.
   A: A-1. B-5. C-4. D-2. E-3.

   Given a picture and a slightly distorted variant of the picture, find the points in the distorted picture that correspond to given points in the original picture. Or, given two distinct views of a picture, find the point in one that corresponds to a given point in the other. The pictures could be the two images of a 3-D picture, or they could be two slightly different views of a face.

   The fundamental AI problem on which these are based is the problem of finding a picture in a database. It would be valuable to be able to find an image in a database, given some slightly distorted portion of the picture. For example, you take a photograph of a picture to a museum or library and ask: Who was the artist? How can I find more such pictures?

5. Is the image search problem hard? If not, how would you solve it?

6. Is it possible to have a text-based (ASCII) CAPTCHA? It's not even clear how to define it. ASCII art draws pictures using only ASCII characters. We don't want that to count as an ASCII CAPTCHA.

   DEFINITION: An ASCII CAPTCHA is a CAPTCHA composed of ASCII characters, typically "English words", that can be understood about as well when heard (spoken in a normal clear voice, line after line, from left to right) as when seen.

   An ASCII CAPTCHA is NOT a sound-oriented CAPTCHA: while the ASCII CAPTCHA can be spoken in a normal voice and understood by almost any English-speaking, English-literate person, it can equally well be displayed and read visually.

   Q: What makes an ASCII CAPTCHA so difficult (if not impossible) to construct?
   A: The source of semantically meaningful sentences is public. GOOGLE can find any public sentence from just half a dozen words.

   Replacing words by synonyms is not a big impediment, since GOOGLE can try synonyms for every set of half a dozen words. For example, choosing 6 words out of a 20-word sentence and trying 6 alternatives for each gives (6^6)*(20 choose 6) = 1.8x10^9 queries. Replacing words by synonyms can also make a sentence harder for humans to understand.

   The usual cryptographic technique for checking that a proof is correct is to check for consistency, but the notion of consistency in text is unclear. Where is the consistency in:

      "The deaf dumb and blind kid plays a mean pinball."

   This is a semantically meaningful, syntactically correct sentence that is, all the same, highly paradoxical.

7. Ke Yang points out that alice.org is a pretty good "language understanding" program. For another program that understands something, see
      http://test.thespark.com/genertest/
   It's an imbecilic test that nevertheless guessed my gender correctly (but just barely).

8. Suggestions for proving that a text-based (ASCII) CAPTCHA is impossible to construct:

   Show that for any proposed text-based (ASCII) CAPTCHA, one can construct a bot that passes the test. As usual, the bot knows how the CAPTCHA works.

   Let's suppose that the CAPTCHA (tester) works in 1 round: it generates a random phrase, to which the subject (human or bot) responds. That's the entirety of the conversation. The CAPTCHA then either accepts (HUMAN) or rejects (NOT HUMAN).
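
   The one-round model just described, and a bot that knows how the CAPTCHA works, can be sketched as follows. This is only a toy under stated assumptions: the vocabulary `WORDS`, the class `ToyCaptcha`, its acceptance rule (reply with the alphabetically first word of the phrase), and the helpers `bot_reply` and `run_round` are all hypothetical names invented for illustration, not anything from the lecture.

```python
import random

# Toy vocabulary; an assumption made purely for illustration.
WORDS = ["apple", "river", "stone", "cloud", "tiger"]


class ToyCaptcha:
    """One-round tester: one challenge out, one response in, then a verdict."""

    def __init__(self, rng):
        self.rng = rng

    def challenge(self):
        """Generate a random three-word phrase as the challenge."""
        return " ".join(self.rng.sample(WORDS, 3))

    def accepts(self, phrase, response):
        """Toy rule (hypothetical): accept the alphabetically first word."""
        return response == min(phrase.split())


def bot_reply(private_copy, phrase, candidates):
    """Return the first candidate that the bot's own copy would accept."""
    for response in candidates:
        if private_copy.accepts(phrase, response):
            return response
    return None


def run_round(captcha, respond):
    """Play the single round and report the tester's verdict."""
    phrase = captcha.challenge()
    return "HUMAN" if captcha.accepts(phrase, respond(phrase)) else "NOT HUMAN"


tester = ToyCaptcha(random.Random(1))
private = ToyCaptcha(random.Random(2))  # the bot's own copy of the public algorithm
verdict = run_round(tester, lambda phrase: bot_reply(private, phrase, WORDS))
print(verdict)
```

   In this toy the search space is five words, so the bot always finds an accepted reply; the open question for a real text-based CAPTCHA is whether the corresponding search stays small.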

   To fool the CAPTCHA, a bot need only look for a response that causes its own internal private copy of the CAPTCHA to accept. This is possible if:

      1. the initial challenge has a sufficiently large probability of being generated (reasonable for a text-based CAPTCHA), and
      2. the time to find an acceptable response is sufficiently small (again reasonable for a text-based CAPTCHA).

   Now suppose the conversation were required to have a large number of rounds, say 10 rounds. The bot can be helped enormously if, at each round, the CAPTCHA can correctly decide how promising each possible response is at that point.

9. Suggestions for proving that a text-based (ASCII) CAPTCHA is possible to construct:

   IDEA: CAPTCHA supplies a current conversation between ALICE and a human. The respondent is to say which side is human. Note that the CAPTCHA knows the correct answer, as it took the conversation off the web.

   Unfortunately, since ALICE is public and uses almost no state info, a bot could determine which responses are hers and therefore which side is ALICE. This approach has potential if ALICE is expanded so that its part in a conversation is (more) state dependent, the state space is large, and the state is randomly chosen or determined by some undivulged earlier conversation. But then ALICE's part in the conversation might(?) become more human?

   IDEA: CAPTCHA supplies the original and final English versions of a paragraph taken from the web that has been translated from English to German and back to English. The respondent is to modify the original English, making as few modifications as possible, so that the final English translation is close to, if not identical to, the original English.

   The hard AI problem on which this is based is the problem of translating from one language into another. Does "understanding" text help one to make minimal changes? Giving a collection of translations, only one of which must be improved, should be helpful to the human.
   Do few changes guarantee an improved translation? This can be checked experimentally using the web translator "babelfish".

   IDEA: CAPTCHA is to algorithmically generate a 10-second sequence of integers, one that most arithmetically literate humans recognize in just 10 seconds. For example,

      211, 3111, 41111, ...
   or
      1 2 1 2 3 2 1 2 3 4 3 2 1 2 3 4 5 4 3 ...
   or
      0 1 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 1 0 0 0 ... (too hard).

   Present the sequence as the challenge, and request a short C program that generates that sequence of integers and its continuation. (This is a good place to introduce Sloane's book of sequences.)

   PROBLEM: Any catalog of these easy-to-understand sequences could be used to create a catalog of C programs for the same sequences.
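
   The first two example sequences, and the catalog problem just noted, can be sketched as follows. The sketch is in Python rather than the C the challenge would request, for brevity. The rules read into the two sequences (the integer n followed by n ones; a zigzag whose peak rises by one each time) are one plausible interpretation of the terms shown, and the names `digit_then_ones`, `rising_zigzag`, `CATALOG`, and `identify` are hypothetical.

```python
from itertools import islice


def digit_then_ones():
    """211, 3111, 41111, ...: the integer n followed by n ones, n = 2, 3, ..."""
    n = 2
    while True:
        yield int(str(n) + "1" * n)
        n += 1


def rising_zigzag():
    """1 2 1 2 3 2 1 2 3 4 3 2 1 ...: climb to a peak, return to 1, raise the peak."""
    yield 1
    peak = 2
    while True:
        for value in range(2, peak + 1):      # climb from 2 up to the peak
            yield value
        for value in range(peak - 1, 0, -1):  # descend from peak - 1 down to 1
            yield value
        peak += 1


# A bot's catalog of known "easy" sequences (hypothetical).
CATALOG = {
    "digit_then_ones": digit_then_ones,
    "rising_zigzag": rising_zigzag,
}


def identify(challenge):
    """Return the names of catalog entries whose first terms equal the challenge."""
    wanted = list(challenge)
    return [
        name
        for name, generator in CATALOG.items()
        if list(islice(generator(), len(wanted))) == wanted
    ]


print(identify([1, 2, 1, 2, 3, 2, 1]))
```

   A bot holding such a catalog answers by looking up the challenge and returning the stored program, which is exactly the weakness the PROBLEM describes.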