.po 1i
.ll 6.5i
.nr ps 11
.nr pp 11
.ce 10
\fB\s16Homework 6: PERCEPTRON and PDP\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Tuesday, April 25
.sp
.ce 0
.pp
Common Lisp implementations of the perceptron and error back-propagation
(generalized delta rule) connectionist learning algorithms are in the files
PERCEPTRON and PDP, respectively. Both use code in the file BINARY-ENCODER (for
encoding feature vectors into bit vectors) and in TESTER (for testing
multi-category data).  Both can be run on learning-from-examples data in
FIGURE-DATA, WEATHER-DATA, SOYBEAN-RDATA, SOYBEAN-DATA, and LOCATION-DATA.
Some additional examples for PDP are in the file PDP-DATA. (Time constraints
prevent running PDP on the full soybean data, but try it if you like.) There is
an XOR example in the files PERCEPTRON and PDP-DATA.
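.pp
To fix ideas, here is a minimal sketch of the perceptron update rule. The
course code is Common Lisp; this is a Python stand-in, and the function name
and data layout are illustrative assumptions, not the course's API:

```python
# Minimal sketch of perceptron learning on binary inputs (illustrative
# Python stand-in for the course's Common Lisp PERCEPTRON code).

def train_perceptron(examples, epochs=20, lr=1.0):
    """examples: list of (bit_vector, label) pairs with label in {0, 1}."""
    n = len(examples[0][0])
    w = [0.0] * n          # one weight per input bit
    b = 0.0                # bias (negated threshold)
    for _ in range(epochs):
        for x, y in examples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - out  # 0 if correct, +1 or -1 otherwise
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so the perceptron converges on it;
# XOR (as in the PERCEPTRON and PDP-DATA files) is not, which is why
# PDP's hidden units are needed there.
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(examples)
```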
.uh "Part 1. Binary Encoding of Examples"
.pp
PERCEPTRON and PDP support two ways of encoding feature vectors into bit
vectors.  One uses one bit for every value of every feature.  The other
encodes the \fIn\fP values of a given feature into \(lclog\d2\u(\fIn\fP)\(rc
bits.  Compare these two approaches on both your location data and the
\fIreduced\fP soybean data with 4 diseases.  (You can run the full soybean
data on PERCEPTRON, but it takes quite a long time on PDP.)  Make sure you
reload the initial soybean data files each time before encoding them with
ENCODE-CATEGORY-INSTANCES, since this function cannot re-encode data that has
already been encoded into binary.  For each dataset, run each system on the
same set of training instances using both ways of encoding the inputs.  For
each case, record the run time for training and the accuracy on the test set.
Hand in a table of your results and a discussion of which encoding is better,
and try to explain why.  Also discuss the relative performance of PERCEPTRON
and PDP, both with each other and with Version Space, ID3, and AQ.
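.pp
The two encodings can be sketched as follows. This is a Python illustration of
the idea only, not the actual BINARY-ENCODER code, and the function names are
hypothetical:

```python
import math

def one_bit_per_value(value, values):
    """One bit for every possible value of the feature (one-hot)."""
    return [1 if v == value else 0 for v in values]

def log_bits(value, values):
    """Encode the index of the value in ceil(log2(n)) bits (compact)."""
    n_bits = max(1, math.ceil(math.log2(len(values))))
    i = values.index(value)
    return [(i >> b) & 1 for b in reversed(range(n_bits))]

colors = ["red", "green", "blue", "yellow"]
one_bit_per_value("blue", colors)   # -> [0, 0, 1, 0]  (4 bits)
log_bits("blue", colors)            # -> [1, 0]        (2 bits)
```

The compact encoding uses exponentially fewer input units, but distinct
feature values then share bits, which can make the target concept harder
for a linear unit to separate.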
.uh "Part 2. Better Ways to Handle Multiple Categories"
.pp
In PERCEPTRON, multiple categories are currently handled by building a
separate perceptron to distinguish members of each category.  A new example is
assigned to all categories whose perceptron output exceeds its threshold.  A
better way is to assign a new example to the single category whose output
exceeds its threshold by the largest amount (or comes closest to reaching its
threshold if no category's output actually exceeds it).  This margin serves as
a rough measure of confidence in the category.  Similarly, in PDP, when
learning one network per category (using pdp-categories), a new example is
assigned to all categories whose output is within *output-epsilon* of 1.  A
better approach is simply to assign it to the category with the highest
output.  Change the testing functions of both systems to work in these ways.
Test the new versions on the reduced soybean data and compare to the results
of the old versions (from the previous section).  Also compare the PDP result
to that of building a single network with one output unit per category (using
pdp-categories1 with a "best guess" interpretation of the outputs).  Turn in
your commented code, a table of results, and a discussion of the results.
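.pp
The "best guess" rule for PERCEPTRON described above can be sketched as
follows (a Python illustration; the actual testing functions are in Common
Lisp, and the names here are hypothetical):

```python
def best_guess_category(outputs, thresholds):
    """Assign to the single category whose output exceeds its threshold
    by the largest amount.  outputs and thresholds are dicts keyed by
    category name."""
    margins = {c: outputs[c] - thresholds[c] for c in outputs}
    return max(margins, key=margins.get)

outs = {"a": 0.2, "b": 0.9, "c": 0.4}
ths  = {"a": 0.5, "b": 0.5, "c": 0.5}
best_guess_category(outs, ths)   # -> "b"
```

Note that one function covers both cases in the assignment: when no output
exceeds its threshold, every margin is negative, and the maximum (least
negative) margin is exactly the category closest to reaching its threshold.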


