Newsgroups: comp.ai.neural-nets,sci.bio.systematics
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!gatech!usenet.eel.ufl.edu!news.mathworks.com!uunet!in1.uu.net!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: neural nets in taxonomy
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DFwCu4.1vC@unx.sas.com>
Date: Tue, 3 Oct 1995 23:49:16 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4332rp$fl0@net.bio.net> <DFH95w.1Mp@unx.sas.com> <rrg.43.0E43C340@aber.ac.uk>
Organization: SAS Institute Inc.
Lines: 101
Xref: glinda.oz.cs.cmu.edu comp.ai.neural-nets:27224 sci.bio.systematics:309


In article <4332rp$fl0@net.bio.net>, minch@lotka.stanford.edu inquired:
|> I'm about to start some work applying neural network models to problems in
|> taxonomy and systematics. The last time I played with NNs was over five
|> years ago, and I haven't really kept up. If anyone knows of work of this
|> type--even in progress, even classification rather than phylogeny--I'd be
|> grateful for a tip.

In article <DFH95w.1Mp@unx.sas.com> saswss@hotellng.unx.sas.com (Warren Sarle) replied:
>Neural network methods for taxonomy are generally slow and ineffective
>compared to methods in the numerical taxonomy and statistical clustering
>literature. See, for example:
>   Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994)
>   "A study of the classification capabilities of neural networks using
>   unsupervised learning: A comparison with k-means clustering",
>   Psychometrika, 59, 509-525.

In article <rrg.43.0E43C340@aber.ac.uk>, rrg@aber.ac.uk (Roy Goodacre) riposted:
|> I'm glad Warren used the word 'generally' in his rather inaccurate statement
|> above.  As he knows myself and colleagues have been applying both
|> supervised and unsupervised neural networks to the analysis of the pyrolysis
|> mass spectra of bacteria and foodstuffs.  This approach has been very
|> successful for their classification and identification.

I'm not sure we're talking about the same thing. Numerical taxonomy is
primarily concerned with unsupervised learning of classifications and
phylogenies. What I have seen of Roy's work is primarily concerned with
supervised learning of classifications. Here are some references on
numerical taxonomy:

   Cavalli-Sforza, L.L., Menozzi, P. and Piazza, A. (1994), The History
   and Geography of Human Genes, Princeton, NJ: Princeton Univ Press.

   Cole, A.J., ed. (1969), Numerical Taxonomy, London: Academic Press.  
   
   Jardine, N. and Sibson, R. (1971), Mathematical Taxonomy, New York:
   John Wiley & Sons.
   
   Mezzich, J.E. and Solomon, H. (1980), Taxonomy and Behavioral
   Science, New York: Academic Press.
   
   Sneath, P.H.A. and Sokal, R.R. (1973), Numerical Taxonomy, San
   Francisco: W.H. Freeman.

Cavalli-Sforza et al. is accessible and entertaining and involves
organisms with which everyone is familiar, although it is quite
heavy reading (literally).

|> One thing that we have done which we strongly encourage others to do (and what
|> I hope Warren was implying 8-)) is to compare NN based classifications/
|> identifications with other statistical cluster analysis.

There isn't very much in the NN literature to compare with numerical
taxonomy. The usual adaptive vector quantization (AVQ) methods such as
Kohonen nets are demonstrably inferior to standard clustering algorithms
(see Balakrishnan et al., cited above). The NN literature concentrates
on scalar-product
similarity measures and occasionally Euclidean distance, while the
numerical taxonomy literature contains dozens of general-purpose
similarity measures and numerous special-purpose ones. The numerical
taxonomy literature contains systematic studies of various clustering
criteria, while the NN literature just hooks networks together to see
what they do. The few NN methods that have received any serious study,
such as Kohonen nets, are intended for data reduction, not for
identifying natural groups; the latter problem is vastly more difficult
and has been the subject of extensive research in the numerical taxonomy
literature. I have seen no NN methods for estimating phylogenies, which
is one of the fundamental tasks of numerical taxonomy.  I have seen no
methods for validating unsupervised learning in the NN literature (can
anybody give me a reference?), whereas this is a critical issue in
numerical taxonomy. So when it comes to comparing NN methods with
numerical taxonomy, the NN literature has very little to offer.
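
To make the point about similarity measures concrete, here is a minimal
Python sketch that contrasts the two measures common in the NN
literature (scalar product and Euclidean distance) with two of the
simplest coefficients for binary characters, the simple matching and
Jaccard coefficients. The taxa, the character scores, and the function
names are made up for illustration; they are not taken from any of the
references above.

```python
import math


def dot(x, y):
    """Scalar (inner) product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def simple_matching(x, y):
    """Fraction of binary characters on which two taxa agree.

    Both shared presences (1-1) and shared absences (0-0) count.
    """
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def jaccard(x, y):
    """Shared presences divided by characters present in either taxon.

    Shared absences (0-0) are ignored. Returns 0.0 if neither taxon
    has any character present.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0


# Hypothetical presence/absence scores for three taxa on six characters.
taxa = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 1, 1, 0, 1],
}

for p, q in [("A", "B"), ("A", "C"), ("B", "C")]:
    x, y = taxa[p], taxa[q]
    print(
        p, q,
        "dot=%d" % dot(x, y),
        "euclid=%.3f" % euclidean(x, y),
        "matching=%.3f" % simple_matching(x, y),
        "jaccard=%.3f" % jaccard(x, y),
    )
```

On these made-up data, taxa B and C agree only on a shared absence, so
they get a nonzero simple matching coefficient but a Jaccard coefficient
of zero. Whether shared absences should count as evidence of similarity
is the kind of question that choosing a measure forces one to answer.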

The one significant contribution of the NN literature is Kohonen's
self-organizing maps, but I'm still trying to figure out what they're
good for. ART is useless for taxonomy or statistical applications.  I
have seen some interesting unsupervised NN methods such as Jonathan
Marshall's paper at Interface '94 (which I just looked for in the
proceedings but can't seem to find), but they are not taxonomical
methods. However, if anyone can show me an NN paper providing a reliable
method for identifying natural clusters (including a method for
validating the results) that is not a simple adaptation of some
well-known statistical method (such as normal mixture estimation or
nonparametric density estimation), I will be delighted to read it.
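
As a rough illustration of what "validating the results" can mean in
practice, the following Python sketch computes the average silhouette
width of a given partition. This is only one of many internal validity
indices and is not one discussed in this thread; the data points, the
two labelings, and the function names are assumptions made up for the
example. It scores a partition that some other method has already
produced; it does not find clusters itself.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_silhouette(points, labels, dist=euclidean):
    """Average silhouette width of a partition.

    For each point, a is its mean distance to the other members of its
    own cluster and b is the smallest mean distance to the members of
    any other cluster; the point's width is (b - a) / max(a, b).
    Values near 1 suggest compact, well-separated clusters; values
    near 0 or below suggest the partition does not fit the data.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical data: two visibly separate groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # arbitrary split across the groups

good = mean_silhouette(points, good_labels)
bad = mean_silhouette(points, bad_labels)
print("good partition: %.3f" % good)
print("bad partition:  %.3f" % bad)
```

The partition that matches the visible grouping scores much higher than
the arbitrary one. An index like this says nothing about whether the
groups are "natural" in the taxonomic sense; it only measures how well
a partition fits the chosen distance.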

Neural networks are good for a wide variety of applications, but
taxonomy isn't one of them.

|> Comparisons are fairly congruent, although NN have often given slightly better
|> results.  For more information see WWW on
|> http://gepasi.dbs.aber.ac.uk/roy/pymshome.htm

While such studies are of interest to people involved with pyrolysis
mass spectra and related fields, they do not provide the broader
generalizations of systematic investigations such as the Balakrishnan
et al. paper.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
