15-859(B) Machine Learning Theory 03/24/10
DFAs
* Can learn in Mistake-bound + Membership Queries
* But hard to learn in PAC
======================
Learning in MB+MQ -> see slides.
Hardness in PAC: based on hardness of learning NC^1 (logarithmic
depth) circuits.
Idea: given an NC^1 circuit f over {0,1}^n, will create DFA g over
{0,1}^m where m = n*2^{depth(f)}. Also will define a simple mapping
r: x -> xxx...x. The DFA g will have the property that g(r(x)) =
f(x). Also the number of states in g will be polynomial in m.
-> first, how would you create a DFA for the function x_2 OR not(x_4)?
This is easy. In fact, notice we can do this so that the
DFA always ends up in either a specific accept-state A or a specific
reject-state R,and we can get rid of self-loops if we only care about
inputs of size n.
-> Given a DFA for the function f1 and a DFA for the function f2, how to
create a DFA for "f1 AND f2", assuming we're allowed to replicate the
input?
Just make f1's A state be f2's start state, and have f1's R state go
through a chain of dummy states to f2's R state. Similarly for OR.
So we double the length of the input needed each time we go up 1 level
in the circuit.
That's it. Notice that even though NC^1 circuits are hard to learn
even with queries, we have *not* shown that DFA's are hard to learn
with membership queries (good thing too, since we just gave an
algorithm that learns DFAs using membership queries). The reason is
that only queries of the form "xxx...x" actually map back to legal
queries for the original NC^1 circuit. Also, even though NC^1 is hard
to learn even over the uniform distribution, we also have *not* shown
that DFA's are hard to learn over the uniform distribution on n-bit
inputs. (This is a nice open problem).