Newsgroups: comp.ai.neural-nets,comp.ai
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!godot.cc.duq.edu!news.duke.edu!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!math.ohio-state.edu!darwin.sura.net!news.Vanderbilt.Edu!dfisher
From: dfisher@vuse.vanderbilt.edu (Douglas H. Fisher)
Subject: Re: Explanation & ANNs
Message-ID: <1994Oct1.184128.4271@news.vanderbilt.edu>
Sender: news@news.vanderbilt.edu
Nntp-Posting-Host: aim.vuse.vanderbilt.edu
Organization: Department of Computer Science, Vanderbilt University, Nashville, TN, USA
References: <369o2m$qtn@post.its.mcw.edu>
Date: Sat, 1 Oct 1994 18:41:28 GMT
Lines: 49
Xref: glinda.oz.cs.cmu.edu comp.ai.neural-nets:19207 comp.ai:24478

The comprehensibility of NNs has been addressed in work
on finding symbolic rules that approximate the
I/O behavior of neural nets, particularly feed-forward nets. A strategy
implemented by one of my Master's students, Kun Huang, for her
degree in 1993 might be termed `symbolic backpropagation'.
If one has a trained feedforward NN to perform classification,
then one classifies available data using the NN and creates 
an augmented data set, in which each datum is the set of
activations for all neurons at all levels: input (0 level), 
hidden (levels 1 through N-1), and output (level N).
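
A minimal sketch of building that augmented data set, assuming a
small hand-initialized feed-forward net with sigmoid units (the
function and variable names here are mine, purely illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_all_levels(x, weights):
    """Return the activations at every level, 0 through N, for input x."""
    acts = [x]                        # level 0: the input activations
    for W in weights:                 # levels 1 .. N
        acts.append(sigmoid(acts[-1] @ W))
    return acts

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)),   # 4 inputs -> 3 hidden units
           rng.normal(size=(3, 2))]   # 3 hidden -> 2 output units
data = rng.normal(size=(5, 4))        # five available data

# each augmented datum is the full list of activations, all levels
augmented = [forward_all_levels(x, weights) for x in data]
print([a.shape for a in augmented[0]])   # [(4,), (3,), (2,)]
```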

One then uses decision tree induction to build a DT classifier
that classifies as level N does (assuming a winner-take-all
strategy) based on activations at level N-1; the result is a
DT with arcs labeled by <= or > threshold values for level
N-1 activations (DT induction systems such as C4.5 typically
find appropriate threshold values automatically). Suffice it 
to say that these thresholds discretize level
N-1 activations, which then become target categories
for decision tree induction -- DT induction is performed to
predict the distinguished discrete ranges of level N-1 from level N-2
activations. The two trees -- one predicting level N from level N-1,
the other predicting level N-1 from level N-2 -- are then `merged',
resulting in a tree that directly maps level N-2 activations
onto level N (without N-1 as an intermediary). A DT that maps
level N-3 activations to distinguished ranges of level N-2
is built, merged, etc., until one is left with a tree that
directly maps inputs (level 0) to outputs (level N).
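
For a one-hidden-layer net (so level N-2 is already the input level),
the layer-by-layer scheme might look roughly as below. Caveats: a
fixed 0.5 threshold stands in for the thresholds a DT inducer would
find automatically, scikit-learn trees stand in for C4.5, and the
`merge' is shown only as functional composition of the two trees at
prediction time; all names are illustrative, not from the Master's work.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                  # level 0: inputs
W = rng.normal(size=(4, 3))
H = 1.0 / (1.0 + np.exp(-(X @ W)))             # level 1: hidden activations
y = (H @ rng.normal(size=3) > 0).astype(int)   # level 2: winner-take-all class

# discretize the hidden activations into distinguished ranges
H_disc = (H > 0.5).astype(int)

# tree mapping level 1 (discretized) activations to the level 2 class
top = DecisionTreeClassifier(max_depth=3).fit(H_disc, y)

# one tree per hidden unit, mapping level 0 to that unit's discrete range
bottom = [DecisionTreeClassifier(max_depth=4).fit(X, H_disc[:, j])
          for j in range(H_disc.shape[1])]

def merged_predict(x):
    """Compose the two levels of trees: inputs straight to the class."""
    h_hat = np.array([[t.predict(x.reshape(1, -1))[0] for t in bottom]])
    return int(top.predict(h_hat)[0])
```

In the scheme described above the two trees are actually merged into a
single tree rather than chained at prediction time, so the final
structure itself reads as rules from inputs to outputs.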

One could perform DT induction to map level 0 to level N
directly -- the rationale for the `sym. backprop' approach is that,
particularly with very `simple' data, there may be many ways
to do the mapping, and not all of them reflect the
NN's mapping equally well (i.e., the behaviors may be similar on
training data, but not necessarily on test data). I have reservations about
using DT induction as the component symbolic learning strategy,
but the general idea seems easily adapted to include
an alternate symbolic strategy. One day maybe we'll get
around to publishing it beyond the Master's paper.

You are at UWisc? Craven and Shavlik are in your Computer Science
department. They published a recent paper at the Machine Learning
conference that also exploits activations rather than weights and
has good pointers to other work, and they are among the experts
in the area -- craven/shavlik @cs.wisc.edu 

Cheers, Doug Fisher (dfisher@vuse.vanderbilt.edu)

P.S. send me a US mail address.
