Notes for 3/11/98 Note: today will look at learning finite state environment. Next week, finish up and look at HMMs and also relations between machine learning and cryptography. Then Spring break. Then your presentations. Web page now has pointers to some specific papers that I think would make good presentations. ============================================================================= Learning deterministic finite state devices. -------------------------------------------- The problem we're going to look as is the following. We have a box with lights and buttons. We can press the buttons, and see the lights change. The box has internal state so, e.g., you might press the same button several times and yet have different things happen to the lights. But, we'll assume it's deterministic. Goal: figure out what's going on --- in particular, make a predictive model. - Do some examples. First example: no hidden state (internal state completely determined by color of light right now). Second example: 3-state environment with hidden state. We'll use an "Equivalence and Membership query" type of model: allow our algorithm to do 2 things. 1: Probe the box (press buttons and see lights - like a membership query) 2: ask if we've got it right, and if not get a counterexample. A counterexample is a sequence of button pushes showing why our model isn't right (e.g., if we are using our model to make predictions, then this is like the mistake-bound model where you only get to see the examples you make a mistake on). why allow equiv. queries? --> combination lock automata. E.g., it looks like every room is White, but there is some sequence of moves you can take to see Black (but it will take you exponential time to find it). why allow memb. queries (as opposed to just watching a distribution) --> crypto hardness results. This is a good opportunity to think again about the notion of the "expressivity" of a language in terms of what we can describe efficiently in it. E.g., Decision Trees can represent any function. So can DNF. So can truth tables. Real issue is: what can you express efficiently? (In particular since the bounds we're shooting for are poly in size of target function). For instance, think of a robot wandering around a maze in which each location has a color. (We have buttons for moving north, east, west, south, and the color of the room is the "light"). We could describe this nicely with a finite state environment. But what if there are doors that can open and close? Then DFA rep can get large. Another example is what if the box has a rubik's cube inside. You have 3 actions for manipulating the cube (sufficient to reach all configurations) and your lights are the colors of 3 squares on the front face. Here the DFA representation is really bad, but there are others, like one called the "diversity" representation, that are very efficient here. Today: talk about learning in time polynomial in the DFA-size of the box = # states in smallest DFA. (and in the length of the counterexamples) =========================================================================== Start by assuming there is a reset. Resets you back to the start state. (then the membership queries are like asking whether a string is in the Regular Language defined by the automaton) Some preliminary facts: What does it mean for 2 states to be different in a minimal DFA? q and q' are different if there exists a string (action sequence) s such that obs(qs) != obs(q's). qs = state you get to when you're at q and then do s. obs(q) = observation at q. Idea of algorithm: build DFA, where each state is "named" redundantly by: (1) a string that it takes to get there, and (2) the results of doing several experiments from that state. The reason for (2) is we can use it to determine where the edges go by taking and edge and then conducting the experiments. If we're wrong, then get a counterexample, and use it to get a new experiment that allows you to increase # states in your representation. --> Go through 4-state example. How to use the counterexample to figure out where we went wrong? Key trick: look at all ways to split the counterexample into a (prefix, suffix). For each split, replace prefix with our predicted string in S and make a membership query. We are guaranteed to find two adjacent pairs (pref_i, suff_{n-i}), (pref_{i+1}, suff_{n-i-1}) whose outputs differ, and this gives us a new experiment (specifically, suff_{n-i-1}) to use. Learning algorithm creates up to n rows in top part of table and n*|A| rows in the bottom part. # experiments is at most n since each one distinguishes a new state. -> # EQ's is at most n -> # MQ's is at most n(|A| + 1)*n + n*log(m) (first term is the number of rows times the number of experiments (i.e., the table size), and the second is how many queries it takes to find the next experiment given a counterexample * number of counterexamples.) (m = max length of counterexample) What if there is no reset? -> next time.