.po 1i
.ll 6.5i
.nr ps 11
.nr pp 11
.ce 10
\fB\s16Homework 3: Disjoint Covers AQ\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Thursday, March 3
.sp
.ce 0
.pp
A version of the AQ algorithm for learning from examples which uses the
VERSION-SPACE system to compute stars is in the file AQ-VS.  It can be tested
on the examples in FIGURE-DOMAIN or WEATHER-DOMAIN. However, it is too
inefficient to run even on the reduced soybean dataset since star generation
in this case is equivalent to the slow trial in part one of Homework 1.  A
version of AQ which uses beam search to compute bounded stars is in the file
AQ; this system can also be tested on the soybean data if the beam width
(*max-star*) is set low enough.
.uh "Part 1: Constructing AQ-DISJOINT-CATEGORIES"
.pp
The function AQ-CATEGORIES in AQ, like the previous analogous functions for
VERSION-SPACE and ID3, runs multiple learning trials to learn possibly
overlapping concepts for multiple categories. Your assignment is to write a
similar function AQ-DISJOINT-CATEGORIES which uses multiple trials of AQ to
learn disjoint covers for each category.  The trial for category C\*<i\*>
should use instances of the \fIi\fPth category as positive examples, instances
of categories C\*<j\*>, j>i as negative examples, and each of the complexes in
the already learned covers for C\*<j\*>, j<i as negative examples. Since the
current system assumes that negative examples are always instances and not
complexes (i.e. it assumes that negative examples cannot contain "?"s), a
couple of the functions in AQ used in the generation of stars will also have
to be changed to allow for "generalized negative examples" (i.e.  complexes
that cover multiple negative examples).  Only a small number of minor changes
are needed. Hand in a copy of your commented code. Test your function on the
reduced soybean dataset (SOYBEAN-RDOMAIN) with the default 8 training
instances and beam-width 1 and hand in a printout of the rules produced (leave
*TRACE-AQ* off).  You can check to make sure these rules are disjoint.
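.pp
As a rough illustration only (it is not the code in AQ, and it is written in
Python rather than Lisp), the sketch below shows one way to set up the trials
and to handle generalized negative examples. It assumes a complex is a tuple
with one value per attribute and "?" for an unconstrained attribute. The names
intersects, trial_examples, learn_disjoint_covers, covers_are_disjoint, and
the learn_cover argument (a stand-in for a single AQ trial) are all
hypothetical. The point to notice is that a candidate complex must not
intersect a generalized negative example; for a negative example with no "?"s,
intersecting it is the same as covering it.

```python
from itertools import combinations

WILDCARD = "?"  # assumed marker for an unconstrained attribute


def intersects(c1, c2):
    """True if at least one instance is covered by both complexes."""
    return all(a == b or a == WILDCARD or b == WILDCARD
               for a, b in zip(c1, c2))


def trial_examples(categories, examples, covers, i):
    """Positive and negative examples for the trial on categories[i]."""
    positives = list(examples[categories[i]])
    # Instances of the categories not yet learned (j > i).
    negatives = [x for c in categories[i + 1:] for x in examples[c]]
    # Complexes from the covers already learned (j < i).
    negatives += [cx for c in categories[:i] for cx in covers[c]]
    return positives, negatives


def learn_disjoint_covers(categories, examples, learn_cover):
    """Run one trial per category, in order, reusing earlier covers."""
    covers = {}
    for i, category in enumerate(categories):
        positives, negatives = trial_examples(categories, examples, covers, i)
        covers[category] = learn_cover(positives, negatives)
    return covers


def covers_are_disjoint(covers):
    """True if no two complexes from different covers intersect."""
    for a, b in combinations(covers, 2):
        for c1 in covers[a]:
            for c2 in covers[b]:
                if intersects(c1, c2):
                    return False
    return True
```

A check like covers_are_disjoint can be run on the rules from the reduced
soybean data to confirm that no two categories share an instance.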
.uh "Part 2: Experiment on Overlapping vs. Disjoint Categories"
.pp
First write a function RULE-COMPLEXITY which takes a list of categories for
which covers have been generated (using AQ-CATEGORIES or
AQ-DISJOINT-CATEGORIES) and computes the average number of disjuncts (i.e.
complexes) per cover and the average number of conjuncts (i.e. selectors) per
complex. Next use AQ-CATEGORIES to learn covers for the full soybean dataset
(SOYBEAN-DOMAIN) and test them. For each of 4, 6, 8, 10, 12, and 14 training
instances, generate rules using beam-width 1, test them on the test set, and
measure their complexity using RULE-COMPLEXITY. For each run, record the run
time, overall % correct, average number of disjuncts, and average number of
conjuncts. Do the
same thing again for AQ-DISJOINT-CATEGORIES.  Hand in a table of the results and
one or two paragraphs explaining these results as best you can.
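.pp
The two averages can be sketched as follows. This is a Python illustration
under stated assumptions, not the required Lisp function: covers are assumed
to be given as a mapping from category name to a list of complexes, each
complex is a tuple with "?" for an unconstrained attribute, and every
attribute that is not "?" counts as one selector. The name rule_complexity
and this data layout are hypothetical.

```python
WILDCARD = "?"  # assumed marker for an unconstrained attribute


def rule_complexity(covers):
    """Return (average disjuncts per cover, average conjuncts per complex)."""
    complexes = [cx for cover in covers.values() for cx in cover]
    if not covers or not complexes:
        # Nothing to average over.
        return 0.0, 0.0
    # Count only the attributes that a complex actually constrains.
    selectors = sum(1 for cx in complexes for v in cx if v != WILDCARD)
    return len(complexes) / len(covers), selectors / len(complexes)
```

For example, two covers holding three complexes with five selectors in total
give 1.5 disjuncts per cover and 5/3 conjuncts per complex.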


