.po 1i
.ll 6.5i
.ce 10
\fB\s16Homework 2: ID3\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Tuesday, February 27
.sp
.ce 0
.pp
A version of the ID3 decision tree learning system is in the file ID3.
It can be tested on the examples in FIGURE-DATA or the weather examples from
Quinlan's article (in the file WEATHER-DATA).  The function ID3-TEST is
analogous to VS-TEST and can be used to train on a subset of examples and test
on the rest.  ID3-CATEGORIES handles training and testing on multi-category
data such as SOYBEAN-DATA.
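.pp
For reference, ID3 grows a tree by repeatedly splitting on the attribute with
the highest information gain.  The following minimal Python sketch illustrates
that idea.  It is not the Lisp code in the file ID3, and its representation is
an assumption: examples are (features, class) pairs, and trees are nested
dictionaries whose internal nodes also record the majority class of the
examples reaching them.  The names \fIid3\fP, \fIinfo_gain\fP, and
\fIclassify\fP are hypothetical.
.nf
```python
import math
from collections import Counter

# Hypothetical representation (not the one used in the file ID3):
# an example is (features, label), where features maps attribute to value.
# A leaf is {"label": c}; an internal node is
# {"attr": a, "majority": c, "branches": {value: subtree}}.

def entropy(examples):
    """Shannon entropy, in bits, of the class distribution of examples."""
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def info_gain(examples, attr):
    """Expected reduction in entropy from splitting examples on attr."""
    n = len(examples)
    remainder = 0.0
    for value in {f[attr] for f, _ in examples}:
        subset = [(f, c) for f, c in examples if f[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

def majority(examples):
    """Most common class among examples (ties broken arbitrarily)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def id3(examples, attrs):
    """Build a tree from a non-empty list of examples using the given attributes."""
    labels = {c for _, c in examples}
    if len(labels) == 1:
        return {"label": labels.pop()}
    if not attrs:
        # Inconsistent data: nothing left to split on, so use the majority class.
        return {"label": majority(examples)}
    best = max(attrs, key=lambda a: info_gain(examples, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {f[best] for f, _ in examples}:
        subset = [(f, c) for f, c in examples if f[best] == value]
        branches[value] = id3(subset, rest)
    return {"attr": best, "majority": majority(examples), "branches": branches}

def classify(tree, features):
    """Follow branches to a leaf; an unseen value falls back to the node's majority."""
    while "attr" in tree:
        subtree = tree["branches"].get(features.get(tree["attr"]))
        if subtree is None:
            return tree["majority"]
        tree = subtree
    return tree["label"]
```
.fi
.pp
Each attribute is dropped once it has been used, so the recursion always
terminates; a node whose examples still disagree when no attributes remain
becomes a majority-class leaf.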
.uh "Part 1: Personal Concept"
.pp
Run the system on your personal dataset.  Use the function ID3-TEST to train and
test on various subsets of the entire dataset; you might also try reordering
your examples.  Hand in a dribble file showing the output of training and
testing for a "successful" run (leave *trace-vs* off for this run), along with
brief comments evaluating the results.
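.pp
Training on one subset and testing on the rest is ordinary holdout evaluation.
As a rough illustration (in Python, not the Lisp of ID3-TEST, whose actual
arguments may differ), the sketch below shuffles a copy of the examples with a
fixed seed, splits it at a chosen fraction, and reports accuracy on the
held-out part.  Changing the seed plays the role of reordering the examples.
The names \fIholdout_accuracy\fP, \fItrain_majority\fP, and
\fIpredict_majority\fP are hypothetical, and the majority-class learner is only
a stand-in for ID3.
.nf
```python
import random
from collections import Counter

def holdout_accuracy(examples, train, predict, train_fraction=0.7, seed=0):
    """Shuffle a copy of examples, train on the first part, test on the rest.

    examples: list of (features, label) pairs.
    train(train_set) returns a model; predict(model, features) returns a label.
    Returns the fraction of held-out examples classified correctly.
    """
    shuffled = list(examples)  # copy, so the caller's ordering is untouched
    random.Random(seed).shuffle(shuffled)  # a different seed gives a different ordering
    cut = int(len(shuffled) * train_fraction)
    train_set, test_set = shuffled[:cut], shuffled[cut:]
    if not train_set or not test_set:
        raise ValueError("train_fraction must leave both a training and a test set")
    model = train(train_set)
    correct = sum(predict(model, f) == c for f, c in test_set)
    return correct / len(test_set)

# Stand-in learner for illustration only: always predicts the most common class.
def train_majority(train_set):
    return Counter(c for _, c in train_set).most_common(1)[0][0]

def predict_majority(model, features):
    return model
```
.fi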
.uh "Part 2: Adding Post Pruning"
.pp
Add "Reduced-Error Post-Pruning" to ID3 as described on the following sheet,
taken from a paper comparing methods for pruning decision trees to handle noisy
data (\fIMachine Learning\fP 4(2), 1989).  This method uses a separate test set
to prune an existing tree.  You should define a function (PRUNE-DECISION-TREE
<decision-tree> <test-set>) that returns the pruned tree.  ID3 will have to be
changed so that, when inconsistent data are encountered, the leaf is labeled
with the majority class of the examples reaching that leaf.  ID3 must also
store in every node the majority class of the training examples reaching that
node; this class becomes the label of the resulting leaf if the subtree under
that node is pruned.  If *TRACE-ID3* is set, your pruning process should print,
for each subtree it examines, the leaf label if pruned, the error if pruned,
the error if kept, and the error-improvement score (error-if-kept \(mi
error-if-pruned); it should also explicitly report each subtree it prunes.  In
case of a tie, don't worry about pruning the largest subtree; just pick one.
Hand in a commented version of the code you write, along with a trace of your
pruner processing the tree generated for the weather data using the pruning
test examples given in WEATHER-DATA.
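.pp
The pruning loop can be sketched as below.  This is a minimal Python
illustration, not a solution in the Lisp of the file ID3; it assumes a
hypothetical tree representation (nested dictionaries in which every internal
node stores its training-set majority class) and hypothetical names such as
\fIprune_decision_tree\fP.  Each pass scores every remaining internal node by
error-if-kept \(mi error-if-pruned, counted over the pruning examples that
reach that node (this difference equals the change in whole-tree error),
prunes one best-scoring node, and stops once every candidate would increase the
error.  Following the usual reduced-error rule, a score of zero still prunes;
check this against the sheet.
.nf
```python
import copy

# Hypothetical tree representation: a leaf is {"label": c}; an internal node is
# {"attr": a, "majority": c, "branches": {value: subtree}}, where "majority" is
# the majority class of the training examples that reached the node.

def classify(tree, features):
    """Follow branches to a leaf; an unseen value falls back to the node's majority."""
    while "attr" in tree:
        subtree = tree["branches"].get(features.get(tree["attr"]))
        if subtree is None:
            return tree["majority"]
        tree = subtree
    return tree["label"]

def errors(tree, examples):
    """Number of (features, label) examples that tree misclassifies."""
    return sum(classify(tree, f) != c for f, c in examples)

def internal_nodes(tree, examples, path=()):
    """Yield (node, pruning examples reaching it, path) for each internal node, top-down."""
    if "attr" not in tree:
        return
    yield tree, examples, path
    for value, subtree in tree["branches"].items():
        reaching = [(f, c) for f, c in examples if f.get(tree["attr"]) == value]
        yield from internal_nodes(subtree, reaching, path + ((tree["attr"], value),))

def prune_decision_tree(tree, test_set, trace=False):
    """Reduced-error pruning: repeatedly prune a subtree with the best score >= 0."""
    tree = copy.deepcopy(tree)  # leave the caller's tree unchanged
    while True:
        best_node, best_score, best_name = None, None, None
        for node, reaching, path in internal_nodes(tree, test_set):
            kept = errors(node, reaching)
            pruned = sum(c != node["majority"] for _, c in reaching)
            score = kept - pruned  # error improvement if this subtree is pruned
            name = " & ".join(f"{a}={v}" for a, v in path) or "root"
            if trace:
                print(f"[{name}] leaf label if pruned={node['majority']} "
                      f"error if pruned={pruned} error if kept={kept} improvement={score}")
            if score >= 0 and (best_score is None or score > best_score):
                best_node, best_score, best_name = node, score, name
        if best_node is None:
            return tree  # every remaining prune would increase the error
        label = best_node["majority"]
        if trace:
            print(f"Pruning [{best_name}] to leaf {label}")
        best_node.clear()  # mutate in place so the parent's reference sees the leaf
        best_node["label"] = label
```
.fi
.pp
Because candidates are visited top-down and only a strictly better score
replaces the current choice, ties go to the first subtree found, which is one
of the arbitrary choices the assignment permits.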
.pp
Functions for automatically adding noise to data and for testing both pruned
and unpruned trees are in the file ID3-NOISE-TEST.  Use this code to obtain
results on noisy versions of your data.  Hand in noise-level vs. correctness
curves for pruned and unpruned trees at various levels of feature and category
noise.  Pruning should prove beneficial, at least beyond some level of noise.
If you can't get good results with your data, try SOYBEAN-2CLASS-DATA, which
generates a good two-class version of the reduced soybean data.  You may want
to average each point over a number of random runs to get smoother curves.
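.pp
ID3-NOISE-TEST already generates noise, so the following is only a sketch of
one common approach, under assumptions that may not match that file: at
feature-noise level \fIp\fP, each feature value is replaced with probability
\fIp\fP by a value drawn uniformly from that feature's observed values, and
category noise does the same to the class label.  Under this definition a
replaced value can coincide with the original.  A helper that averages accuracy
over several seeded runs smooths each point on a curve.  All names
(\fIadd_noise\fP, \fIaverage_accuracy\fP) are hypothetical.
.nf
```python
import random

def add_noise(examples, feature_noise=0.0, category_noise=0.0, rng=None):
    """Return a noisy copy of (features, label) examples.

    Assumed noise model: with probability feature_noise, each feature value is
    replaced by a value drawn uniformly from that feature's observed values;
    with probability category_noise, the label is replaced by a class drawn
    uniformly from the observed classes.  A replacement may equal the original.
    """
    if rng is None:
        rng = random.Random()
    domains = {}
    for features, _ in examples:
        for attr, value in features.items():
            domains.setdefault(attr, set()).add(value)
    domains = {attr: sorted(values) for attr, values in domains.items()}
    classes = sorted({label for _, label in examples})
    noisy = []
    for features, label in examples:
        new_features = {
            attr: rng.choice(domains[attr]) if rng.random() < feature_noise else value
            for attr, value in features.items()
        }
        new_label = rng.choice(classes) if rng.random() < category_noise else label
        noisy.append((new_features, new_label))
    return noisy

def average_accuracy(run_once, runs=10, seed=0):
    """Mean of run_once(rng) over several runs, each with its own seeded generator."""
    return sum(run_once(random.Random(seed + i)) for i in range(runs)) / runs
```
.fi
.pp
A point on a curve could then come from calling \fIaverage_accuracy\fP at one
noise level with a function that corrupts the data, trains pruned and unpruned
trees, and returns the accuracy of interest.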
