.po 1i
.ll 6.5i
.ce 10
\fB\s16Homework 4: COBWEB\fP\s0
.sp
.sz 14
CS 395T: Machine Learning
.sp .5
Due: Tuesday, April 3
.sp
.ce 0
.nr pp 11
.nr sp 11
.pp
Code for a system called CLASSWEB, a successor of COBWEB that can handle both
numeric and nominal data, is in my ai directory ml-code/classweb/.  There are a
number of files.  The file "classweb" is a top-level file that loads all of
the others.  It uses the LOOP macro defined in the file "loop."  I have run it
on my EXPLORER, but there may be unanticipated problems on the HPs (let's hope
these are limited).  The function "run" is the top-level execution function.
The data files "sample-data," "animal-data," and "soybean-data" can be used to
test the system.  The file "multi-test" allows one to train incrementally on
one set of data and periodically test on a separate set of data stored in
another file.  The files "soybean-train-data" and "soybean-test-data" can be
used with the function "run-and-test" defined in "multi-test."  See the file
"soybean-out" for sample runs.
.uh "Part 1: Personal Concept"
.pp
Format your personal concept data, without class information, into a form
suitable for CLASSWEB.  Run the system on this data and comment on the
resulting hierarchy.  Separate your data into reasonable training and test
sets, adding class information as an extra feature (as in the soybean data).
Use "run-and-test" to test CLASSWEB's ability to predict missing features for
various specified lists of PRED-ATTS, including the class feature.  Hand in a
trace of one of your runs.  Draw some predictive accuracy graphs and comment on
the results.
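.pp
The points on a predictive accuracy graph come from an incremental
train-then-test loop like the one "multi-test" performs.  The Python sketch
below is only an illustration of that protocol.  It does not call CLASSWEB or
"run-and-test."  The names MajorityLearner and learning_curve are hypothetical,
and the majority-value learner is a placeholder that makes the loop runnable.
To get real curves, substitute CLASSWEB's own incremental training and
prediction.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Instance = Dict[str, str]  # attribute name -> value


class MajorityLearner:
    """Hypothetical placeholder for CLASSWEB: predicts the most frequent
    value seen so far for an attribute, ignoring the other attributes."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}

    def train(self, inst: Instance) -> None:
        # Incremental update: one instance at a time, as in COBWEB.
        for att, val in inst.items():
            self.counts.setdefault(att, Counter())[val] += 1

    def predict(self, inst: Instance, att: str) -> Optional[str]:
        counts = self.counts.get(att)
        return counts.most_common(1)[0][0] if counts else None


def learning_curve(train: List[Instance], test: List[Instance],
                   pred_atts: List[str], every: int = 1
                   ) -> List[Tuple[int, Dict[str, float]]]:
    """Train incrementally.  After every `every` instances (and at the end),
    hide each attribute in pred_atts from each test instance in turn and
    record the fraction of test instances predicted correctly."""
    learner = MajorityLearner()
    curve = []
    for i, inst in enumerate(train, start=1):
        learner.train(inst)
        if i % every == 0 or i == len(train):
            point = {}
            for att in pred_atts:
                correct = 0
                for t in test:
                    hidden = {a: v for a, v in t.items() if a != att}
                    if learner.predict(hidden, att) == t[att]:
                        correct += 1
                point[att] = correct / len(test)
            curve.append((i, point))
    return curve


train = [{"color": "red", "size": "big", "class": "A"},
         {"color": "red", "size": "small", "class": "A"},
         {"color": "blue", "size": "big", "class": "B"}]
test = [{"color": "red", "size": "big", "class": "A"},
        {"color": "blue", "size": "small", "class": "B"}]
for n, accuracy in learning_curve(train, test, ["class", "color"]):
    print(n, accuracy)
```

.pp
Each entry in the returned list is one x-position (number of training
instances seen) with one accuracy value per attribute in pred_atts.  Plotting
each attribute as a separate line gives one curve per PRED-ATT.  Because the
placeholder ignores the observed attributes, its curve can also serve as a
simple baseline on the same graph.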
.uh "Part 2: Creating ISA Hierarchies With Defaults"
.pp
Write a post-processor for the hierarchies created by CLASSWEB that determines
norms for the various concepts in the hierarchy.  A norm for a node is an
attribute value present in at least a certain fraction of the instances that
have a value for that attribute (use 0.8 as the default threshold for norms,
P(A=V | C) \(>= 0.8).  Knowledge of the defstructs defined in the file "struct"
should be sufficient to perform this task.  Add an extra field called "norms"
to the "node" structure to hold a list of norms, where a norm is a structure
containing the fields "feature," "value," and "probability" (P(A=V | C)).  If a
norm can be inherited from an ancestor of a concept in the hierarchy, then it
should not be considered a norm for that concept.  However, norms for
lower-level classes that override the inherited value should be specified
(e.g., the "bird" concept has a norm flying=true, but this is overridden by a
norm flying=false for penguins).  Write a function that neatly prints
hierarchies with their norms.  Run your code on the "animal-data" and show the
hierarchy with norms (see the file ml-code/classweb/animal-hierarchy for sample
output).  Also hand in your commented code.
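.pp
One possible way to organize the norm computation is sketched below in
Python, purely as an illustration of the rules above.  The assignment's code is
Lisp, and the Node layout here is a hypothetical simplification, not the actual
defstructs in "struct."  It has a name, a table of attribute-value counts,
children, and a norms list.  Norms are assigned top-down.  A node's candidate
norms are the values whose probability is at least 0.8 among instances that
have a value for the attribute.  A candidate is dropped if the nearest ancestor
norm for the same attribute has the same value.  It is kept as an override if
the value differs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

THRESHOLD = 0.8  # default norm threshold: P(A=V | C) >= 0.8


@dataclass
class Norm:
    feature: str
    value: str
    probability: float  # P(A=V | C)


@dataclass
class Node:
    # Hypothetical stand-in for the "node" defstruct in "struct":
    # av_counts maps attribute -> value -> number of instances.
    name: str
    av_counts: Dict[str, Dict[str, int]]
    children: List["Node"] = field(default_factory=list)
    norms: List[Norm] = field(default_factory=list)


def local_norms(node: Node, threshold: float = THRESHOLD) -> List[Norm]:
    """Attribute values with P(A=V | C) >= threshold, where the
    denominator counts only instances that have a value for A."""
    norms = []
    for feature, counts in node.av_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue
        for value, n in counts.items():
            if n / total >= threshold:
                norms.append(Norm(feature, value, n / total))
    return norms


def assign_norms(node: Node, inherited: Optional[Dict[str, str]] = None,
                 threshold: float = THRESHOLD) -> None:
    """Keep only norms not inherited unchanged from an ancestor;
    a norm whose value differs from the inherited one is an override."""
    inherited = dict(inherited or {})  # copy so siblings do not share updates
    node.norms = [nm for nm in local_norms(node, threshold)
                  if inherited.get(nm.feature) != nm.value]
    for nm in node.norms:
        inherited[nm.feature] = nm.value
    for child in node.children:
        assign_norms(child, inherited, threshold)


def print_hierarchy(node: Node, depth: int = 0) -> None:
    # Indent two spaces per level and list only the node's own norms.
    shown = ", ".join(f"{nm.feature}={nm.value} ({nm.probability:.2f})"
                      for nm in node.norms)
    print("  " * depth + f"{node.name}: {shown or '(no new norms)'}")
    for child in node.children:
        print_hierarchy(child, depth + 1)


penguin = Node("penguin", {"legs": {"2": 1}, "flying": {"false": 1},
                           "covering": {"feathers": 1}})
bird = Node("bird", {"legs": {"2": 5}, "flying": {"true": 4, "false": 1},
                     "covering": {"feathers": 5}}, [penguin])
mammal = Node("mammal", {"legs": {"4": 5}, "covering": {"fur": 5}})
animal = Node("animal", {"legs": {"2": 5, "4": 5},
                         "covering": {"feathers": 5, "fur": 5}},
              [bird, mammal])
assign_norms(animal)
print_hierarchy(animal)
```

.pp
With a threshold of 0.8, at most one value per attribute can qualify, so each
node lists at most one norm per attribute.  In the toy hierarchy at the end,
which uses made-up counts rather than "animal-data," the flying example plays
out as described.  The penguin keeps flying=false as an override of bird's
flying=true, while its legs and covering norms are inherited from bird and
therefore omitted.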




