15-859(B) Machine Learning Theory 04/03/07
* Learning finite state environments w/o a reset
=========================================================================
Now: what if you don't have a reset? Extension by Rivest & Schapire.
To deal with this, use notion of a Homing sequence:
For a given sequence of actions h, and a given state q, let obs(q,h)
be the sequence of observations you see, starting at state q and
performing h. Look at {obs(q,h) : q in Q}. Let's think of this as a
partition of the state space Q, where two states are in the same
partition if they have the same observation sequence. h is a Homing
Sequence if each partition has a unique *ending* state. I.e., if
obs(q,h) = obs(q',h) then it must be that qh = q'h.
[using "qh" to mean the final state after running h from q]
E.g., "aa" in the example in the slides is a homing sequence.
Claim 1: for any two states q,q' there must exist a sequence A of
length at most n such that obs(q,A) != obs(q',A).
Proof: Think of the following Dynamic Programming algorithm. For
each pair of states, say they are k-equivalent if they look
equivalent for all strings A of length <= k. So, at k=0 we have 2
equivalence classes: white states and black states (or more if we
have more observations). To go from k to k+1, we just see, for
each pair of k-equivalent states (q,q'), whether there exists an
action a such that qa and q'a are not k-equivalent. If so, then
q,q' are not k+1-equivalent; else they are. So, notice that if the
number of equivalence classes does not increase going from k to
k+1, then it *never* will increase, so that k-equivalent states are
truly equivalent. So, that means we are done by the time we reach
k=n-2. BTW, this is also an algorithm for DFA minimization.
[but there might not be a single sequence, of any length, that works
to distinguish all pairs q,q'. Such a sequence would be called a
"distinguishing sequence"].
Claim 2: every DFA has a homing sequence. In fact, every n-state DFA
has a homing sequence of size at most O(n^2).
Proof: Start with h=lambda. If there exists two states q, q' that
violate property, append a sequence A s.t. obs(qh,A) != obs(q'h,A).
Must exist one of length at most n. This increases # elements in
the partition implied by h.
Can we hope to get O(n)? o(n^2)?
No: consider an ``L'' shaped environment with 2n states. actions
right, up. topmost state is black and rest are white. If you do
``up'' from a state with nothing above, then you stay put. If you
do ``right'' from bottom-right, it takes you to bottom-left. If
you go right from the vertical part of the L, then you go down to
the bottom of the L and go right by 1. (so in terms of (x,y)
coordinates, going right sets x = x+1 mod n and sets y=0; going up
sets y = y+1 mod n if x=0, and does nothing if x != 0).
Note: existence of homing sequence implies existence of distinguishing
sequence if each action induces a permutation among the states. Why?
Say you are given a homing sequence. How to use?
Run h and record the observation. Instantiate a new copy of
Learn-DFA, indexed by the observation seen. When you need to reset,
run h instead. If you see a new observation, start up a new copy of
the algorithm. If you see an old observation, go to the associated
copy. There will be at most n algorithms running (interleaved) in
total and every time through we make one unit of progress on one of
them. So, just blows up running time by factor of n.
How to learn a homing sequence as we learn the environment? Start off
with guess of h = lambda and run the above algorithm. Two things
might happen.
1) Some copy (learn-DFA)_sigma might discover *inconsistent behavior*.
I.e., it might notice that two runs of action sequence A have produced
two different outputs. In this case, A is a distinguishing sequence
for two states we *thought* were the same. So, append A onto h and
start over.
Actually, for better performance, can build a "homing tree" (also
called an "adaptive homing sequence") which is a sequence where the
next action depends on the observations you've seen. In particular,
we'll do h, and then only add A on the end if we saw sigma as our
observation. This means we don't need to discard ALL of the
learn-DFAs, just (Learn-DFA)_sigma.
2) Some copy grows a model with > n states.
Here is a probabilistic way to improve h, based on the
following "thought experiment". (We're not going to actually
run this). Let's say we are in state q, and we run homing
sequence h, and we enter this copy of Learn-DFA. This copy
has its states labeled with strings s_1, s_2,..., s_{n+1}
(remember how the algorithm worked). Since there are
really only n states, it must be that for some i != j,
q h s_i = q h s_j <- final states after these action seqs
On the other hand, row(s_i) != row(s_j) in the algorithm's
table. Let's let e be the experiment responsible for that (if there
are several, just pick one). This means that for some starting states
q' and q'' (these are before the homing sequence was run)
obs(q', h) = obs(q'', h) = obs(q, h)
but
final-obs(q'h, s_i e) != final-obs(q''h, s_j e)
(final-obs means just the last observation)
But,
final-obs(qh, s_i e) = final-obs(qh, s_j e)
So, it can't possibly be that the three states qh, q'h, q''h
are all the same. In fact, it must be the case that either
final-obs(q'h, s_i e) != final-obs(qh, s_i e), or final-obs(q''h,
s_j e) != final-obs(qh, s_j e). That means that either "s_i e" or
"s_j e" is a good string to add on to h.
The problem is we don't know what i and j are. Even if we ran the
experiment for real, we couldn't run both s_i and s_j from the state
qh. But, we can guess and pick one. Alternatively, we can guess a
random entry in the table and use that as "s e". In either case we
have at least a 1/(n+1)^2 chance of guessing right. Now rerun the
entire algorithm. Since can improve the homing sequence at most n
times, we will get here at most O(n^3) times in expectation.
This gives a crazy n^6-ish bound, but Schapire's experiments point out
that this is really a big overcount and in practice it is not so bad.
If fact, can usually get close to a homing sequence with just a random
string.