Genetic Algorithms Digest    Monday, 9 January 1989    Volume 3 : Issue 2

 - Send submissions to GA-List@AIC.NRL.NAVY.MIL
 - Send administrative requests to GA-List-Request@AIC.NRL.NAVY.MIL

Today's Topics:
	- Neural Networks and sparse payoffs
	- Re: Classifier System Problems
	- Re: Constrained Functional Optimization

--------------------------------

From: merrill@bucasb.BU.EDU (John Merrill)
Date:  Wed, 4 Jan 89 13:24:19 EST
Subject: Neural Networks and sparse payoffs

In GA-List v3n1, John Holland (or someone using an account with the
same name) writes:

> 3) In applications with sparse payoff, neural nets are a poor
> alternative to classifier systems.  In their "real" incarnations neural
> nets require detailed error feedback on every step.  Classifier
> systems can work with payoff (an occasional signal that simply
> ranks the outcome of a SEQUENCE of actions), and they can
> implement Samuel's method of prediction and revision.

This claim, at least as it concerns neural networks, is false.  Not
even the most limited learning algorithms, such as back-propagation,
require feedback after every learning cycle; in some cases, in fact,
such feedback is injurious to performance (see, e.g., Jordan, 1988).

A weaker claim, sometimes heard and also false, is that neural
networks require feedback which specifies what the values of the
outputs of the network should be---that is, direct control of the
outputs of the network.  This, if it were true, would be a grave
weakness, since that direct control is impossible for many tasks, and,
for many others, requires the user to make an arbitrary, possibly
suboptimal, selection of values.

In fact, there are modifications of backprop which avoid this
requirement.  For instance, Jordan (1988) has described a mechanism by
which the control of the outputs of a network which is operating in a
temporal framework can be specified as 

"x_1 must be between 0.25 and 0.5; x_4 must be greater than .5; and
x_5 must be greater than .8"

with added provisos such as smoothness of trajectory.
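As a rough illustration (the function names and dictionary keys below are my own, not Jordan's notation), such interval constraints can be turned into a differentiable penalty: an output already inside its allowed range contributes zero error, so the network is never pushed toward an arbitrary target value.

```python
def interval_penalty(x, lo=None, hi=None):
    """Squared distance from x to the allowed interval; zero inside it,
    so a satisfied constraint generates no error signal at all."""
    if lo is not None and x < lo:
        return (lo - x) ** 2
    if hi is not None and x > hi:
        return (x - hi) ** 2
    return 0.0

# The constraint set quoted above: x_1 in [0.25, 0.5],
# x_4 > 0.5, x_5 > 0.8 (dict keys stand in for output indices).
constraints = {1: (0.25, 0.5), 4: (0.5, None), 5: (0.8, None)}

def total_penalty(outputs):
    """Sum the penalties over all constrained outputs."""
    return sum(interval_penalty(outputs[i], lo, hi)
               for i, (lo, hi) in constraints.items())
```

The gradient of such a penalty is zero wherever the constraint is satisfied, which is exactly the freedom a supervised target value would take away.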

More extreme still, if performance can be evaluated by a single
number, then descendants of the various derivative-estimating
procedures (based on the original work of Rescorla and Wagner) can be
employed.  (See, for instance, Sutton and Barto, 1981.)  These allow
very sparse feedback with nothing more than a "better/worse" ranking.
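To make the "better/worse" point concrete, here is a toy sketch (my own example, not code from Sutton and Barto) of a two-action unit that learns from a bare scalar reward compared against a running baseline prediction -- no output values are ever specified by a teacher.

```python
import random

def train_bandit(probs, steps=2000, alpha=0.1, seed=0):
    """Learn action preferences from a bare scalar reward compared to
    a running baseline ('better/worse than expected'), in the spirit
    of reinforcement-comparison schemes.  probs gives each action's
    payoff probability; no target outputs are ever supplied."""
    rng = random.Random(seed)
    pref = [0.0, 0.0]      # action preferences
    baseline = 0.0         # running estimate of reward
    for _ in range(steps):
        # noisy choice: pick the higher-preference action most of the time
        a = 0 if pref[0] - pref[1] > rng.gauss(0, 1) else 1
        reward = 1.0 if rng.random() < probs[a] else 0.0
        pref[a] += alpha * (reward - baseline)   # better than expected? raise it
        baseline += alpha * (reward - baseline)  # update the prediction itself
    return pref
```

The only feedback is the sign and size of `reward - baseline`, which is exactly the prediction-and-revision flavor of credit assignment at issue here.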

--- Bibliography

Jordan, M. I. (1988) "Supervised learning and systems with excess
degrees of freedom", COINS Tech. Rept. 88-27, Department of Computer
and Information Science, Univ. of Mass. at Amherst.

Sutton, R. S., and Barto, A. G. (1981) "Toward a modern theory of
adaptive networks: Expectation and prediction". Psych. Review, 88,
135-170. 

(I'm sure that the Jordan paper has appeared in some other form; if
you're interested, e-mail me and I'll find out.)
--
John Merrill			|	ARPA:	merrill@bucasb.bu.edu
Center for Adaptive Systems	|	
111 Cummington Street		|	
Boston, Mass. 02215		|	Phone:	(617) 353-5765

--------------------------------

From: Rick_Riolo@um.cc.umich.edu
Date: Fri, 6 Jan 89 14:50:26 EST
Subject: Re: Classifier System Problems
 
 Michael Hall's comments and John Holland's response
 have prompted me to make a few comments, too.
 
 There were several issues discussed, and I think it's
 important to keep them as distinct as possible.
 I saw (at least) these issues raised:
 
 1. The Michigan vs. Pitt Approach to Classifier Systems.
 
 2. The Bucket Brigade algorithm.
 
 3. Emergence and Stability of Coupled Chains of Classifiers.
 
 4. Parasites in Classifier Systems.
 
 5. The GA in an "Ecological" setting.
 
 Lots has been said about (1), in this conference and in
 articles, so I will restrict my comments to 2 through 5.
 
 2. The Bucket Brigade Algorithm.
 
    The BBA is a type of "temporal difference method" described by
    Sutton (Machine Learning 3:9-44, 1988).  As he points out,
    TDM's have some advantages and some disadvantages.
    In some cases TDM's lead to *faster* learning!
    I highly recommend this paper as food for thought.
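    To make the TDM connection concrete, here is a sketch (mine, not
    from Sutton's paper) of TD(0) prediction on a small random walk of
    the kind Sutton (1988) analyzes: each estimate is pulled toward the
    next state's estimate, so credit trickles backward one step per
    visit, much as strength does under the BBA.

```python
import random

def td0_random_walk(episodes=200, alpha=0.1, seed=0):
    """TD(0) on a 5-state random walk (states 1..5, absorbing ends,
    reward 1.0 only on the right exit).  Each value estimate is pulled
    toward the next state's estimate; no per-step error targets are
    supplied by a teacher."""
    rng = random.Random(seed)
    v = [0.0] * 7                       # v[0] and v[6] are terminal
    for _ in range(episodes):
        s = 3                           # start in the middle
        while 1 <= s <= 5:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            target = r + (v[s2] if 1 <= s2 <= 5 else 0.0)
            v[s] += alpha * (target - v[s])   # the TD(0) update
            s = s2
    return v[1:6]                       # true values are 1/6 .. 5/6
```

    The estimates end up ordered left-to-right toward the rewarded
    exit, which is the qualitative behavior one hopes for from BBA
    strengths along a chain.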
 
    Grefenstette, Westerdale and others (including me!)
    have pointed out problems that arise when classifiers
    are shared across different chains.  John Holland has
    argued that "bridging classifiers" are one way to
    avoid this problem (and to speed up BBA strength allocation).
    Bridges do indeed help.  As I see it the task now is to get bridges
    to emerge.  To my knowledge no one has explicitly tried
    to figure out how to get bridges to form.  I think the hope
    was that the GA would discover them, but I think that
    is not likely (just as it could not discover chains themselves--see below).
    I believe a new rule generation operator is required.
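    For readers who haven't implemented one, the core of the BBA is
    just a bid-and-pay cycle.  This toy sketch (my own simplification,
    with a fixed firing order and no conflict resolution or taxes)
    shows external payoff propagating backward along a chain over
    repeated trials.

```python
def bucket_brigade(strengths, chain, payoff, bid_ratio=0.1, cycles=50):
    """Run the bucket-brigade bid/payment cycle along a fixed chain of
    classifier indices.  Each firing classifier pays a fraction of its
    strength (its bid) to the classifier that fired just before it;
    the last one in the chain receives the external payoff.  Over
    repeated cycles, strength propagates backward down the chain."""
    s = list(strengths)
    for _ in range(cycles):
        for i, c in enumerate(chain):
            bid = bid_ratio * s[c]
            s[c] -= bid                    # pay the bid...
            if i > 0:
                s[chain[i - 1]] += bid     # ...to the previous stage
            # (the first classifier's bid goes to the environment)
        s[chain[-1]] += payoff             # environment rewards the last act
    return s
```

    The strengths converge only as fast as payoff can flow back one
    link per activation -- which is precisely why long chains are slow
    and why bridging classifiers, which short-circuit that flow, are
    attractive.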
 
    Another issue is what the BBA should be allocating.
    To date classifier systems have had just one measure of merit,
    strength.  I (and others) have argued that strength is
    burdened with too many jobs, e.g., conflict resolution (via bids),
    biasing the search for new rules, and serving as "capital"; and with
    too many roles, e.g., as a measure of prediction (what does
    the system think will happen next) and of prescription
    (what does the system think it should do next).
    But these are issues no matter how "credit" is allocated.
 
 
 3. Emergence and Stability of Coupled Chains of Classifiers.
 
    Robertson and I, Holland, and others have been able to get
    short chains to form using various "triggered chaining operators"
    (see Machine Learning 3, issues 2 and 3, or the Induction book
    by Holland et al., which argues for using such special operators).
 
    Have other approaches been used?  I'd like to hear about them.
 
    At any rate, I don't think the GA alone can do it,
    which explains Hall's problems with his two-step task.
 
    A number of problems have surfaced, most boiling down to
    variants or offshoots of the "premature convergence" problem,
    i.e., the spread of some high-strength rules to the detriment
    of weaker rules that are still required to solve the task,
    and (as with classical GA optimization problems) to the detriment
    of the crossover operator.
 
    Some of these problems can be attributed to different decisions
    about how strength is allocated by the BBA and its helpers.
    (Examples: For two-condition rules, how should the bid be
    allocated to the rules supplying messages?  What taxes should
    be used, and how?)  Use of reward sharing (as suggested
    by Booker and Wilson's work) and other techniques has
    ameliorated these problems, but in the long run I think
    some modifications to the "classic" classifier systems are
    probably required to get longer chains to form and be stable.
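    Reward sharing itself is simple to state.  In the spirit of the
    Booker and Wilson proposals (the function below is my own minimal
    rendering, not their exact scheme), the external payoff is divided
    among all matching rules rather than handed to a single winner.

```python
def share_reward(strengths, active, payoff):
    """Divide an external payoff equally among all classifiers that
    were active on the rewarded step, instead of paying one winner.
    Splitting the reward caps the advantage any single rule can
    accumulate, which helps keep alive the weaker rules a chain
    still needs."""
    share = payoff / len(active)
    return [s + share if i in active else s
            for i, s in enumerate(strengths)]
```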
 
    Bridging classifiers should also help, if we can get them to form
    (yes, I invoke "bridges" for every unsolved problem, including
    when my furnace stops working, a major problem here in Ann Arbor!).
 
 
 4. Parasites in Classifier Systems.
 
    As Hall points out, parasites can lead to instabilities
    in performance.  I think parasites will always be found in
    classifier systems (at least the ones I want to work with---the ones
    that are good models of natural systems).  Thus the task is to
    control them, so that they do not spread so much that they cause
    (a) performance degradation, and (b) the loss of the useful
    rules they are living off.
 
    I (and others) have implemented ways to control some parasites (e.g.,
    hallucination producers, rules that try to fill the message list, etc.).
    I think Stewart Wilson's (inadvertent) solution, i.e., not having
    rules post messages to other rules, is also appropriate for many
    applications (i.e., those where chains are not needed).
 
    However, I don't have a general answer about how to control them.
    I do think we should look to natural systems for clues.
    After all, parasites are *everywhere* in nature,
    but they only rarely cause the demise of a species.
 
 
 5. The GA in an "Ecological" setting.
 
    I think John Holland's comments about classifier systems
    as an ecology are right on the mark:  the issue is not
    how to find the best individual (or even the best few),
    but instead how to get a set of individuals that cover all
    (well, many) of the available niches and that form a
    stable (non-trivial) system.
 
    The "optimization" verus "adaption" arguments are very
    deep and interesting.  I would suggest a book edited by
    John Dupre, The Latest on the Best (MIT Press, 1987) for
    a good discussion of many of the issues involved.
 
 
 There are lots of other interesting things to think about
 when working with classifier systems, e.g., how to implement
 the interface to the world, whether the message list should be
 limited, whether classifiers should produce more than one message
 per step, and so on.  But I will leave those for the future...
 
 
 In conclusion, my attitude toward classifier systems is similar
 to John Holland's, I think:  if you just want to get the
 best learning on a specific task right away, classifier systems
 might not be the way to go.  However, I do think classifier
 systems offer a good way to study very complex systems
 (like ecologies, economies, CNS's, etc.), and I think
 they are also an interesting approach to machine learning
 that may in the longer run offer systems that can
 solve "real" tasks.
 
   - r

--------------------------------

From: WSIEDLECKI@VMSA.CF.UCI.EDU
Date: Fri, 6 Jan 89 16:08 PST
Subject: Re: Constrained Functional Optimization

This is my reply to Dave Powell's questions on constrained GA's [v3n1].

You have probably already looked into some suggestions in the recent
book by D. Goldberg (Genetic Algorithms in Search, Optimization, and
Machine Learning, Addison-Wesley, 1989), pages 85-86.  It is somewhat
surprising that so little attention has been given to the constrained
optimization problem (COP) in general.  Although Goldberg faced COP
in his work on pipeline optimization, he offers very little information
on how he did it (at least in that book or any other publication of
his on the subject).  His approach is simply to use a penalty function
multiplied by a scale factor.

I myself faced the same problem while trying to make feature selection
for statistical pattern classifiers.  In my case the goal is to reduce the
number of features (i.e., the dimensionality of the pattern space) 
without exceeding a certain threshold on the error rate of the classifier.
The main difficulty here is that computing the error rate takes a
tremendous amount of time, so Goldberg's remark that we could just skip
infeasible (i.e., constraint-violating) solutions seems like a waste of
computer resources.

The penalty function approach is sound, but you have to know how to
set the scale factor that weights the influence of the primary
optimization criterion against that of the penalty function in guiding
the search.  I seem to have solved this problem, at least for my feature
selection task, though my solution may not easily extend to other
types of applications.  I am going to submit a paper on constrained
optimization to the conference on GA's, but if you need details right
away you may contact me on BITNET at WSIEDLECKI@UCIVMS.
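In rough outline (the form below is only an illustration of the general
approach, not the particular solution I mention above), a penalized
fitness for feature selection rewards dropping features while charging
a scaled penalty only when the error-rate constraint is violated, so an
evaluated-but-infeasible individual still contributes information:

```python
def penalized_fitness(mask, error_rate, error_threshold, scale):
    """Illustrative GA fitness for feature selection: fewer selected
    features is better, but exceeding the allowed classifier error
    rate subtracts a scaled penalty.  Infeasible individuals are
    penalized rather than discarded, so the (expensive) error-rate
    evaluation spent on them is not wasted."""
    n_selected = sum(mask)                          # mask is a 0/1 chromosome
    violation = max(0.0, error_rate - error_threshold)
    return -n_selected - scale * violation
```

Everything then hinges on `scale`: too small and the GA happily strips
features past the error threshold, too large and the search reduces to
rejecting infeasible points.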

Best Regards,

Wojciech Siedlecki
U.C. Irvine

--------------------------------

End of Genetic Algorithms Digest
********************************
