.po 1i
.ll 6.5i
.ce 10
\fB\s16Homework 1: Version Space Algorithm\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Tuesday, February 13
.sp
.ce 0
.pp
A Common Lisp implementation of the version space algorithm is in the file
VERSION-SPACE.  The top-level function, also called VERSION-SPACE, takes a list
of examples.  A very simple data file for this system is FIGURE-DATA, which
contains several examples discussed in class.  The reduced soybean data in
SOYBEAN-RDATA can also be used with VERSION-SPACE; the full soybean data
involves disjunctive concepts and takes too long to run.  The code and data are
commented, and the manual in your packet describes the basics of using the
system.  This assignment has two parts: in the first, you will run the system
on your personal concept, and in the second, you will add a new feature to the
system.
.uh "Part 1: Personal Concept"
.pp
Run the system on your personal dataset.  Use the function VS-TEST to train and
test on various subsets of the entire dataset.  Turning on the flag
*print-with-feature-names* should help you interpret the resulting
generalizations.  Try to find a relatively large subset of the examples on
which training still lets VS produce a non-null result.  Hand in a dribble file
with the output of training and testing for such a "successful" run (leave
*trace-vs* off for this run), and include a brief evaluation of the results.
.uh "Part 2: Adding Two Object Descriptions"
.pp
The current system supports only simple nominal feature vectors describing a
single object.  Change the description language to handle instances consisting
of two feature-vector descriptions of two unordered objects, like the language
used as an example in Mitchell's paper.  An example instance description in
this language is ((large red circle)(small blue triangle)), and a sample
generalization that matches this instance is ((? blue ?)(? ? circle)).
Remember that the order of the two object descriptions is irrelevant: this
generalization matches because (? blue ?) covers the second object and
(? ? circle) covers the first.  Additional examples of this language
which you will use to test your system are in the file TWO-FIGURE-DATA.  You
will need to redefine the following functions: EQUAL-GENERALIZATIONS, MATCH,
MORE-GENERAL?, INITIALIZE-G, GENERALIZATIONS-TO, and SPECIALIZATIONS-AGAINST.
Additional functions may also be necessary.  The existing functions for single
feature vectors should be useful as subroutines for the new language.
Formulate the generalization and specialization functions carefully, and be
aware that, unlike in the existing language, S may contain more than one
generalization.  Getting SPECIALIZATIONS-AGAINST to compute
\fIall\fP possible least specializations is quite tricky.  Test the system on
the provided examples in the file TWO-FIGURE-DATA and hand in your commented
code and a dribble file for the examples.  For your dribble file, make sure you
set the variable *TRACE-VS* to T so that the current S and G sets are printed
out after processing each example.
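.pp
To illustrate what order independence means for matching, here is a minimal
sketch of a two-object match test.  It assumes that a generalization is a list
of two feature vectors whose elements are either a specific value or the symbol
?, and that an instance is a list of two fully specified feature vectors.  The
names VECTOR-MATCH-P and TWO-OBJECT-MATCH-P are hypothetical; they are not
part of VERSION-SPACE and are not a drop-in replacement for its MATCH.  The
test simply tries both pairings of generalization objects with instance
objects, so ((? blue ?)(? ? circle)) matches
((large red circle)(small blue triangle)) through the swapped pairing.
.nf
```lisp
;;; Hypothetical sketch, not part of VERSION-SPACE: matching for the
;;; two-object description language.  Assumes each description is a list
;;; of exactly two feature vectors, and that the symbol ? in a
;;; generalization matches any value.

(defun vector-match-p (gen inst)
  "True if single feature vector GEN covers feature vector INST.
A ? in GEN matches any value; otherwise the values must be EQ."
  (every #'(lambda (g i) (or (eq g '?) (eq g i)))
         gen inst))

(defun two-object-match-p (gen inst)
  "True if two-object generalization GEN covers two-object instance INST.
Because the objects are unordered and there are only two of them, it is
enough to try the identity pairing and the swapped pairing."
  (destructuring-bind (g1 g2) gen
    (destructuring-bind (i1 i2) inst
      (or (and (vector-match-p g1 i1) (vector-match-p g2 i2))
          (and (vector-match-p g1 i2) (vector-match-p g2 i1))))))
```
.fi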
.pp
Also test your specialization procedure directly with the following
examples:
.sp
.nf
> (clean-g (specializations-against '((? ? ?)(? ? ?)) '((small green square)(small blue triangle))))
.sp .5
> (clean-g (specializations-against '((? ? ?)(? ? triangle)) '((small green square)(small blue triangle))))
.fi

