Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!newsfeed.internetmci.com!in1.uu.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Probabilistic Neural Networks
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DnnsxD.2B4@unx.sas.com>
Date: Sat, 2 Mar 1996 21:27:13 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <4h1h5a$8sl@panoramix.fi.upm.es>
Organization: SAS Institute Inc.
Lines: 62


In article <4h1h5a$8sl@panoramix.fi.upm.es>, galleske@usera.dia.fi.upm.es (Ingo Galleske) writes:
|> does anyone know, what Probabilistic Neural Networks are?
|> Is there any literature/articles about this topic.

OK already! I'll put it in the FAQ!!

PNN is Donald Specht's term for kernel discriminant analysis. You can
think of it as a normalized RBF network in which there is a hidden unit
centered at every training case. These RBF units are called "kernels"
and are usually probability density functions.  The hidden-to-output
weights are usually 1 or 0; for each hidden unit, a weight of 1 is used
for the connection going to the output that the case belongs to, while
all other connections are given weights of 0.  Alternatively, you can
adjust these weights for the prior probabilities of each class.  So the
only weights that need to be learned are the widths of the RBF units.
These widths (often a single width is used) are called "smoothing
parameters" or "bandwidths" and are usually chosen by cross-validation
or by more esoteric methods that are not well-known in the neural net
literature; gradient descent is _not_ used.
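In other words, the whole classifier is just a kernel density estimate
per class followed by an argmax. Here is a rough sketch in Python/NumPy
(my own code, not Specht's; it assumes Gaussian kernels, a single shared
bandwidth, and the 0/1 hidden-to-output weights described above):

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, bandwidth, priors=None):
    """Classify each row of X_new by averaging a Gaussian kernel
    centered at every training case, separately for each class."""
    classes = np.unique(y_train)
    scores = np.empty((len(X_new), len(classes)))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]          # kernel centers for class c
        # squared distance from each new point to each kernel center
        d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        # mean of the kernels = estimated class-conditional density
        scores[:, j] = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
        if priors is not None:              # optional prior adjustment
            scores[:, j] *= priors[j]
    return classes[scores.argmax(axis=1)]
```

Choosing `bandwidth` is the real work; leave-one-out cross-validation
over the training set is the usual approach.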

Specht's claim that a PNN trains 100,000 times faster than backprop is
at best misleading.  While kernel methods are not iterative in the same
sense that backprop is, they require you to estimate the kernel
bandwidth, and this requires accessing the data many times. Furthermore, computing
a single output value with kernel methods requires either accessing the
entire training data or clever programming, and either way is much
slower than computing an output with a feedforward net.  And there are a
variety of methods for training feedforward nets that are much faster
than standard backprop.  So depending on what you are doing and how you
do it, PNN may be either faster or slower than a feedforward net.

PNN is a universal approximator for smooth class-conditional densities,
so it should be able to solve any smooth classification problem given
enough data. The main drawback of PNN is that, like kernel methods in
general, it suffers badly from the curse of dimensionality. PNN cannot
ignore irrelevant inputs without major modifications to the basic
algorithm. So PNN is not likely to be the top choice if you have
more than 5 or 6 inputs.
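To see the irrelevant-input problem concretely, here is a deterministic
toy case (again my own Python sketch, a bare-bones Gaussian-kernel
scorer rather than a full PNN): one relevant input separates the classes
cleanly, but appending a single irrelevant input with large values flips
the classification.

```python
import numpy as np

def kernel_scores(X, y, x_new, h=1.0):
    """Per-class mean of Gaussian kernels centered at each training case."""
    return {c: np.exp(-((X[y == c] - x_new) ** 2).sum(axis=1)
                      / (2 * h * h)).mean()
            for c in np.unique(y)}

# One relevant input: class "a" lives near 0, class "b" near 10.
X_rel = np.array([[0.0], [0.0], [10.0]])
y = np.array(["a", "a", "b"])
x = np.array([1.0])          # clearly class "a" on the relevant input

# Append one irrelevant input whose values happen to put the new case
# far from both "a" cases and right on top of the "b" case.
X_full = np.hstack([X_rel, np.array([[100.0], [120.0], [1.0]])])
x_full = np.array([1.0, 1.0])
# Distances are now dominated by the irrelevant input, so "b" wins.
```

Nothing in the basic algorithm downweights the useless input; every
input enters the Euclidean distance with equal force.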

References:

   Hand, D.J. (1982) Kernel Discriminant Analysis, Research Studies Press.

   McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern
   Recognition, Wiley.

   Masters, T. (199?) Advanced Algorithms for Neural Networks.

   Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994) Machine
   Learning, Neural and Statistical Classification, Ellis Horwood.

   Scott, D.W. (1992) Multivariate Density Estimation, Wiley.

   Specht, D.F. (1990) "Probabilistic neural networks," Neural Networks,
   3, 110-118.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
