.po 1i
.ll 6.5i
.EQ
delim $$
gsize 12
.EN
.ce 10
\fB\s16Homework 3: Formal Learnability\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Thursday, March 7
.sp
.ce 0
.nr pp 12
.nr sp 12

.np
Show that the following hypothesis-space sizes are correct, where \fIn\fP is
the number of binary features. Recall that $left ( pile {n above k} right ) ~ =
~ O( n sup k )$.
.(l
k-term-DNF: |H| = $2 sup {O(kn)}$
k-DNF: |H| = $2 sup {O( n sup k )}$
k-CNF: |H| = $2 sup {O( n sup k )}$
DNF: |H| = $2 sup {2 sup n}$
.)l
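The first two bounds can be sanity-checked numerically. The sketch below
(Python; all function names are ours, and it assumes a conjunctive term may
use each binary feature positively, negatively, or not at all) enumerates the
term counts behind them:

```python
from math import comb, log2

def num_conjunctions(n):
    # A conjunction over n binary features: each feature appears
    # positively, negatively, or not at all.
    return 3 ** n

def k_term_dnf_size(n, k):
    # A k-term-DNF is a choice of k conjunctive terms, so
    # |H| <= (3^n)^k = 2^(O(kn)).
    return num_conjunctions(n) ** k

def num_k_literal_terms(n, k):
    # Conjunctions of at most k literals: choose the features, then a
    # polarity for each; this sum is O(n^k).
    return sum(comb(n, i) * 2 ** i for i in range(k + 1))

def k_dnf_size(n, k):
    # A k-DNF is any subset of the candidate terms, so |H| = 2^(O(n^k)).
    return 2 ** num_k_literal_terms(n, k)

# log2|H| for 3-term-DNF grows linearly in n, as 2^{O(kn)} predicts:
print([round(log2(k_term_dnf_size(n, 3))) for n in (5, 10, 20)])
```

Printing log2 of \fIk\fP-DNF counts instead shows the $O( n sup k )$ growth
in the exponent.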

.np
Consider the hypothesis space consisting of descriptions of \fIk\fP separate
objects, each described by a purely conjunctive description over \fIn\fP
nominal features, each feature having \fIv\fP values (like the two-object
description language used as an example in the version-space paper, [(large
red triangle) (small blue circle)]).  Note that the order of the object
descriptions is irrelevant and that duplicate object descriptions are
allowed.

i) Determine an upper bound on the sample complexity of any consistent
learning algorithm that uses this hypothesis space (i.e., a number of
examples sufficient to guarantee, with probability at least 1-\(*d, that the
version space is \(*e-exhausted).  You may need to refer to a basic text on
combinatorics.

ii) Using this upper bound, calculate the number of examples needed when
\(*e = 0.1, \(*d = 0.1, \fIn\fP = 20, \fIk\fP = 3, and \fIv\fP = 5.

iii) Simplify your upper bound by restating it in order notation
(big-\fIO\fP) rather than as an exact numerical bound.
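For parts i and ii, the standard \(*e-exhaustion bound for a finite
hypothesis space, m >= (1/\(*e)(ln |H| + ln(1/\(*d)), can be wrapped in a
small calculator once |H| has been worked out. A sketch (the function name
is ours, and the example |H| is made up; it is not the |H| you are asked to
derive in part i):

```python
from math import ceil, log

def sample_bound(h_size, eps, delta):
    # Sufficient number of examples to epsilon-exhaust the version
    # space of a finite hypothesis space H (Haussler's bound):
    #   m >= (1/eps) * (ln|H| + ln(1/delta))
    return ceil((log(h_size) + log(1.0 / delta)) / eps)

# Illustration only, with a hypothetical |H| of one million:
print(sample_bound(10 ** 6, eps=0.1, delta=0.1))
```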

.np
Consider the hypothesis space consisting of disjunctions of two intervals on
a single real-valued attribute (e.g., (5 < \fIX\fP < 7) or (10 < \fIX\fP <
20)).

i) Determine an upper bound on the sample complexity of any consistent
learning algorithm that uses this hypothesis space (i.e., a number of
examples sufficient to guarantee, with probability at least 1-\(*d, that the
version space is \(*e-exhausted).

ii) Using this upper bound, calculate the number of examples needed when
\(*e = 0.1 and \(*d = 0.1.
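Note that this hypothesis space is infinite (interval endpoints range over
the reals), so the finite-|H| bound from the previous problem does not apply
directly. One standard route is the VC-dimension bound of Blumer et al.
(1989): m >= (1/\(*e)(4 log2(2/\(*d) + 8 VC(H) log2(13/\(*e)). A sketch
(the function name is ours, and the d = 4 in the example is a placeholder;
determining the VC dimension of two-interval disjunctions is part of the
exercise):

```python
from math import ceil, log2

def vc_sample_bound(d, eps, delta):
    # Blumer et al. (1989) sufficient sample size for a consistent
    # learner over a hypothesis space of VC dimension d:
    #   m >= (1/eps) * (4*log2(2/delta) + 8*d*log2(13/eps))
    return ceil((4 * log2(2 / delta) + 8 * d * log2(13 / eps)) / eps)

# Illustration with placeholder values (not the eps and delta of
# part ii): d = 4, eps = delta = 0.05.
print(vc_sample_bound(4, eps=0.05, delta=0.05))
```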



