15-859(B) Machine Learning Theory                               03/21/02

* Learning finite state environments, contd: what if no reset button?
* Hidden Markov Models
===========================================================================
Last time: an algorithm for learning a finite state environment from
experimentation and counterexamples when we have a reset button. We needed
the reset button to be able to run different experiments from the same
known start state. Now: what if you don't have a reset?

To deal with this, we use the notion of a Homing Sequence. For a given
sequence of actions h and a given state q, let obs(q,h) be the sequence of
observations you see starting at state q and performing h. Look at
{obs(q,h) : q in Q}. Think of this as a partition of the state space Q,
where two states are in the same class if they produce the same observation
sequence. h is a Homing Sequence if each class has a unique *ending* state.
I.e., if obs(q,h) = obs(q',h) then it must be that qh = q'h. [Using "qh" to
mean the final state after running h from q.]

Claim: every DFA has a homing sequence. In fact, every n-state DFA has a
homing sequence of length at most O(n^2). [Start with h = lambda. If there
exist two states q, q' that violate the property, append a distinguishing
sequence for (qh, q'h) onto h, i.e., a sequence A s.t.
obs(qh,A) != obs(q'h,A). One of length at most n must exist. Each append
increases the number of classes in the partition implied by h, so this
happens at most n-1 times.]

Say you are given a homing sequence. How do we use it? Run h and record the
observation. Instantiate a new copy of Learn-DFA, indexed by the
observation seen. When you need to reset, run h instead. If you see a new
observation, start up a new copy of the algorithm; if you see an old
observation, go to the associated copy. There will be at most n algorithms
running (interleaved) in total, and every time through we make one unit of
progress on one of them. So this just blows up the running time by a factor
of n.

How do we learn a homing sequence as we learn the environment?
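First, the definition and the claim's construction can be sketched in code.
The 3-state machine below, its transitions, and its per-state outputs are all
made up for illustration (a Moore-style machine where each state emits one
observation); this is a toy sketch, not the lecture's algorithm itself.

```python
# Sketch: checking and building a homing sequence for a toy automaton.
# delta[state][action] -> next state; out[state] is the observation
# emitted on arriving at that state.
from itertools import product

delta = {
    0: {'a': 1, 'b': 0},
    1: {'a': 2, 'b': 0},
    2: {'a': 2, 'b': 1},
}
out = {0: 0, 1: 0, 2: 1}
states = list(delta)
actions = ['a', 'b']

def run(q, h):
    """Final state after running action sequence h from q (the notes' 'qh')."""
    for a in h:
        q = delta[q][a]
    return q

def obs(q, h):
    """Observation sequence seen while running h from state q."""
    seq = []
    for a in h:
        q = delta[q][a]
        seq.append(out[q])
    return tuple(seq)

def is_homing(h):
    """h is homing iff obs(q,h) == obs(q',h) implies run(q,h) == run(q',h)."""
    ending = {}
    for q in states:
        o = obs(q, h)
        if o in ending and ending[o] != run(q, h):
            return False
        ending[o] = run(q, h)
    return True

def distinguishing(p, q, max_len):
    """Brute-force a sequence A with obs(p,A) != obs(q,A); length <= n suffices."""
    for length in range(1, max_len + 1):
        for A in product(actions, repeat=length):
            if obs(p, A) != obs(q, A):
                return list(A)
    return None

def build_homing():
    """The claim's construction: start with h = lambda; while some pair
    violates the homing property, append a distinguishing sequence for
    their ending states. Each append refines the partition, so at most
    n-1 appends of length <= n each: |h| = O(n^2)."""
    h = []
    while not is_homing(h):
        for p, q in product(states, states):
            if obs(p, h) == obs(q, h) and run(p, h) != run(q, h):
                h += distinguishing(run(p, h), run(q, h), len(states))
                break
    return h

h = build_homing()
assert is_homing(h)
```

On this toy machine a single action already homes it; in general the loop may
append up to n-1 distinguishing sequences.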
Start off with the guess h = lambda and run the above algorithm. Two things
might happen.

1) Some copy (Learn-DFA)_sigma might discover *inconsistent behavior*.
I.e., it might notice that two runs of an action sequence A have produced
two different outputs. In this case, A is a distinguishing sequence for two
states we *thought* were the same. So, append A onto h and start over.
Actually, for better performance, we can build a "homing tree" (also called
an "adaptive homing sequence"), which is a sequence where the next action
depends on the observations seen so far. In particular, we do h, and then
add A on the end only if we saw sigma as our observation. This means we
don't need to discard ALL of the Learn-DFAs, just (Learn-DFA)_sigma.

2) Some copy grows a model with more than n states. Here is a probabilistic
way to improve h, based on the following "thought experiment" (we're not
going to actually run this). Say we are in state q, we run the homing
sequence h, and we enter this copy of Learn-DFA. This copy has its states
labeled with strings s_1, s_2, ..., s_{n+1} (remember how the algorithm
worked). Since there are really only n states, it must be that for some
i != j,

    q h s_i = q h s_j      <- final states after these action sequences

On the other hand, row(s_i) != row(s_j) in the algorithm's table. Let e be
the experiment responsible for that (if there are several, just pick one).
This means that for some starting states q' and q'' (these are before the
homing sequence was run),

    obs(q',h) = obs(q'',h) = obs(q,h)
    but final-obs(q'h, s_i e) != final-obs(q''h, s_j e)

(final-obs means just the last observation). But

    final-obs(qh, s_i e) = final-obs(qh, s_j e),

so it can't possibly be that the three states qh, q'h, q''h are all the
same. In fact, it must be the case that either
final-obs(q'h, s_i e) != final-obs(qh, s_i e), or
final-obs(q''h, s_j e) != final-obs(qh, s_j e). That means that either
"s_i e" or "s_j e" is a good string to append onto h.
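Both cases above lean on the same interleaving scheme: one copy of Learn-DFA
per observation of h, with h played in place of the reset button. The point
is that a homing sequence's observation pins down the ending state, so each
(Learn-DFA)_sigma always resumes from the same state. A runnable toy check
(the machine, its outputs, and h are the same hypothetical example as before;
the "learners" are elided and only the dispatch invariant is tested):

```python
# Sketch: why running a homing sequence can substitute for a reset.
import random

delta = {0: {'a': 1, 'b': 0}, 1: {'a': 2, 'b': 0}, 2: {'a': 2, 'b': 1}}
out = {0: 0, 1: 0, 2: 1}
h = ['a']            # a homing sequence for this toy machine

class Environment:
    """No reset button: the only way to move is to perform actions."""
    def __init__(self):
        self.state = random.choice(list(delta))
    def act(self, a):
        self.state = delta[self.state][a]
        return out[self.state]

env = Environment()
copies = {}          # observation of h -> set of true ending states seen
for _ in range(50):
    sigma = tuple(env.act(a) for a in h)   # "reset" by running h
    copies.setdefault(sigma, set()).add(env.state)
    env.act(random.choice(['a', 'b']))     # wander, i.e., experiment

# Homing property: each observation of h determines a unique ending state,
# so the copy indexed by sigma always starts from the same state, and there
# are at most n copies in total.
assert all(len(s) == 1 for s in copies.values())
assert len(copies) <= len(delta)
```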
The problem is we don't know what i and j are. Even if we ran the
experiment for real, we couldn't run both s_i and s_j from the state qh.
But we can guess and pick one. Alternatively, we can guess a random entry
in the table and use that as "s e". In either case we have at least a
1/(n+1)^2 chance of guessing right. Now rerun the entire algorithm. Since
we can enlarge the homing sequence at most n times, we will get here at
most O(n^3) times. This gives a crazy n^6-ish bound, but Schapire's
experiments point out that this is really a big overcount and in practice
it is not so bad. In fact, you can usually get close to a homing sequence
with just a random string.
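The O(n^3) count comes from simple expected-value arithmetic; a sketch under
the notes' assumptions (the n here is arbitrary, just for illustration):

```python
# Back-of-envelope for the bound above: each random table entry is a good
# string to append with probability at least 1/(n+1)^2, so in expectation
# O(n^2) full reruns are spent per genuine improvement of h, and h can be
# enlarged at most n times.
n = 10                              # hypothetical number of states
p = 1.0 / (n + 1) ** 2              # chance a random guess works
expected_reruns = n / p             # n * (n+1)^2, i.e., O(n^3) reruns
```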