.po 1i
.ll 6.5i
.nr ps 11
.nr pp 11
.ce 10
\fB\s16Homework 1: Version Space Algorithm\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Thursday, February 7
.sp
.ce 0
.pp
All code for the semester will be in the directory \fI/u/mooney/ml-code/\fP.  A
Common Lisp implementation of the version space algorithm is in the file
\fIversion-space\fP.  The file \fIml-utilities\fP has various helper functions
and should be loaded with all programs.  For efficiency, be sure to compile all
code.  The basic top-level function, VERSION-SPACE, takes a list of
examples.  A very simple data file for this system is \fIfigure-data\fP, which
contains several examples discussed in class.  The code and data are commented
and provide the information needed to use them.  This assignment
has two parts.  In the first part, you will perform a simple experiment using
the system; in the second part, you will add a feature to the
system.
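.pp
As a rough sketch of getting started, the forms below compile and then load
the two files in order.  The sketch assumes that you have copied the files into
your current working directory (compiling normally writes the compiled file
next to the source) and that your Lisp can find the source files without an
explicit file type; both are assumptions, so adjust the names as needed.
.(l
;; Sketch only: compile each file, then load the compiled version.
;; COMPILE-FILE returns the pathname of the compiled file as its
;; first value, and LOAD accepts that pathname.
;; ml-utilities goes first because the other code depends on it.
(dolist (file (list "ml-utilities" "version-space"))
  (load (compile-file file)))
.)l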
.uh "Part 1: Experiment with Example Ordering"
.pp
The complete soybean disease dataset (17 diseases, 50 features, 289 examples)
is in the file \fIsoybean-data\fP and is too large to run on this system.  The
file \fIsoybean-reduced-data\fP is a dataset containing descriptions of 17
examples for each of four soybean diseases using only 32 features.  The
function TRAIN-MULTI-VERSION-SPACE can be used to learn concepts for multiple
categories.  The function TRAIN-AND-TEST in \fIml-utilities\fP can be used to
run a standard train and test cycle.
.pp
First, train and test the system on the reduced soybean data (using 30 training
examples) and note the run time printed out.  The system currently gives the
examples to VERSION-SPACE as a list containing all of the positive examples
followed by all of the negative examples.  Next, change the system so that
it instead presents all of the negative examples first, followed by all of the
positive examples.  Run the soybean examples again and note the run time.  Finally,
change it so that examples are randomly ordered.  Write a paragraph or two
reporting the results, explaining them in terms of how the algorithm works, and
commenting on the qualities of the representation language which cause these
results.  \fBWarning\fP: One of the above trials may take a \fIlong\fP time;
feel free to terminate that run once it is clear that it takes much longer than
the other two.
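.pp
The two reorderings can be sketched independently of the system.  The
functions below are illustrative only: their names are hypothetical, and they
assume that each example is a list whose first element is the symbol + or -,
which may not match the format actually used in the data files.
.(l
;; Assumption: an example is a list whose first element is + or -.
(defun positive-example-p (example)
  (eq (first example) '+))

(defun negatives-first (examples)
  ;; All negatives, then all positives; order within each group is kept.
  (append (remove-if #'positive-example-p examples)
          (remove-if-not #'positive-example-p examples)))

(defun shuffle-examples (examples)
  ;; Fisher-Yates shuffle on a fresh vector copy, returned as a list.
  (let ((v (coerce examples 'simple-vector)))
    (loop for i from (1- (length v)) downto 1
          do (rotatef (svref v i) (svref v (random (1+ i)))))
    (coerce v 'list)))
.)l
.pp
For instance, under these assumptions,
.(l
(negatives-first '((+ a) (- b) (+ c) (- d)))
.)l
returns ((- B) (- D) (+ A) (+ C)).  Neither function modifies its argument.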
.uh "Part 2: Adding Structured Features"
.pp
The current system only supports simple nominal feature vectors. Add the
ability to support structured features (while maintaining all of the current
abilities).  To get you started, some initial functions for creating and
traversing simple hierarchies as well as a simple test domain and test examples
are in the file \fIversion-space-hw\fP.  In addition, you will need to edit and
redefine the following functions in \fIversion-space\fP: MATCH, MORE-GENERAL?,
INITIALIZE-G, GENERALIZATIONS-TO, and SPECIALIZATIONS-AGAINST.  Writing a
couple of additional functions may also be necessary (my solution has just two
small additional functions). Run the new system on the provided test examples
and hand in your commented code and a dribble file for the examples.  For your
dribble file, make sure you set the variable *TRACE-VS* to T so that the
current S and G sets are printed out after processing each example.
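.pp
To illustrate what a structured feature involves, here is a minimal sketch of
a tree-structured value hierarchy and two operations on it.  Everything in it
is hypothetical: the association-list representation, the shape domain, and
the function names are assumptions made for illustration, and they need not
match the functions supplied in \fIversion-space-hw\fP.
.(l
;; Hypothetical representation: a list of (child . parent) pairs.
(defparameter *shape-hierarchy*
  (list '(circle . curved) '(ellipse . curved)
        '(square . polygon) '(triangle . polygon)
        '(curved . any-shape) '(polygon . any-shape)))

(defun parent (value hierarchy)
  ;; Returns NIL for the root or for an unknown value.
  (cdr (assoc value hierarchy)))

(defun covers-p (general specific hierarchy)
  ;; True if GENERAL equals SPECIFIC or is one of its ancestors.
  (loop for node = specific then (parent node hierarchy)
        while node
        thereis (eq node general)))

(defun least-common-ancestor (a b hierarchy)
  ;; Walk up from A until reaching a node that also covers B.
  (loop for node = a then (parent node hierarchy)
        while node
        when (covers-p node b hierarchy) return node))
.)l
.pp
With these definitions,
.(l
(covers-p 'polygon 'square *shape-hierarchy*)
.)l
is true, and
.(l
(least-common-ancestor 'circle 'square *shape-hierarchy*)
.)l
returns ANY-SHAPE.  A test like the first is the kind of check a matching
function needs for a structured feature, and the second gives the minimal
generalization of two values.
.pp
A dribble session might look like the following sketch.  DRIBBLE is a
standard Common Lisp function; the file name is only a placeholder.
.(l
(dribble "hw1-trace.text")   ; start recording (placeholder file name)
(setf *trace-vs* t)          ; print S and G after each example
;; ... run the new system on the provided test examples here ...
(dribble)                    ; stop recording and close the file
.)l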
