Notes for 3/11/98

Note: today will look at learning finite state environment.  Next week,
finish up and look at HMMs and also relations between machine learning
and cryptography.  Then Spring break.  Then your presentations.  Web
page now has pointers to some specific papers that I think would make
good presentations. 
=============================================================================

Learning deterministic finite state devices.
--------------------------------------------
The problem we're going to look as is the following.  We have a 
box with lights and buttons.  We can press the buttons, and see the
lights change.  The box has internal state so, e.g., you might press
the same button several times and yet have different things happen to
the lights.  But, we'll assume it's deterministic.  Goal: figure out
what's going on --- in particular, make a predictive model.

- Do some examples.  First example: no hidden state (internal state
completely determined by color of light right now).  Second example:
3-state environment with hidden state.

We'll use an "Equivalence and Membership query" type of model: allow our
algorithm to do 2 things.  1: Probe the box (press buttons and see
lights - like a membership query) 2: ask if we've got it
right, and if not get a counterexample.  A counterexample is a
sequence of button pushes showing why our model isn't right (e.g., if
we are using our model to make predictions, then this is like the
mistake-bound model where you only get to see the examples you make a
mistake on).  

       why allow equiv. queries?  --> combination lock automata.
	E.g., it looks like every room is White, but there is some 
	sequence of moves you can take to see Black (but it will take you
	exponential time to find it).

       why allow memb. queries (as opposed to just watching a
	distribution) --> crypto hardness results.

This is a good opportunity to think again about the notion of the
"expressivity" of a language in terms of what we can describe
efficiently in it.  E.g., Decision Trees can represent any function.
So can DNF. So can truth tables.  Real issue is: what can you express
efficiently?  (In particular since the bounds we're shooting for are
poly in size of target function).  For instance, think of a robot
wandering around a maze in which each location has a color.  (We have
buttons for moving north, east, west, south, and the color of the room
is the "light").  We could describe this nicely with a finite state
environment.  But what if there are doors that can open and close?
Then DFA rep can get large.  Another example is what if the box has a
rubik's cube inside. You have 3 actions for manipulating the cube
(sufficient to reach all configurations) and your lights are the
colors of 3 squares on the front face.  Here the DFA representation is
really bad, but there are others, like one called the "diversity"
representation, that are very efficient here.

Today: talk about learning in time polynomial in the DFA-size of the
box = # states in smallest DFA.  (and in the length of the counterexamples)


===========================================================================

Start by assuming there is a reset.  Resets you back to the start state.
(then the membership queries are like asking whether a string is in
the Regular Language defined by the automaton)

Some preliminary facts:

What does it mean for 2 states to be different in a minimal DFA?  q
and  q' are different if there exists a string (action sequence) s
such that obs(qs) != obs(q's).
	qs = state you get to when you're at q and then do s.
	obs(q) = observation at q.

Idea of algorithm: build DFA, where each state is "named" redundantly by: (1) a string
that it takes to get there, and (2) the results of doing several
experiments from that state.  The reason for (2) is we can use it to
determine where the edges go by taking and edge and then conducting
the experiments.  

If we're wrong, then get a counterexample, and use it to get a new
experiment that allows you to increase # states in your representation.

--> Go through 4-state example.

How to use the counterexample to figure out where we went wrong?  Key
trick: look at all ways to split the counterexample into a (prefix,
suffix).  For each split, replace prefix with our predicted string in
S and make a membership query.  We are guaranteed to find two adjacent
pairs (pref_i, suff_{n-i}), (pref_{i+1}, suff_{n-i-1}) whose outputs
differ, and this gives us a new experiment (specifically,
suff_{n-i-1}) to use.


Learning algorithm creates up to n rows in top part of table and n*|A|
rows in the bottom part.  # experiments is at most n since each one
distinguishes a new state. 
	-> # EQ's is at most n
	-> # MQ's is at most n(|A| + 1)*n + n*log(m)
         	(first term is the number of rows times the number of
		experiments (i.e., the table size), and the second is how many
		queries it takes to find the next experiment given a
		counterexample * number of counterexamples.)
		(m = max length of counterexample)

What if there is no reset?  -> next time.