Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.mathworks.com!newsfeed.internetmci.com!in1.uu.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: What is a probabilistic Neural Network?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DnKspJ.A89@unx.sas.com>
Date: Fri, 1 Mar 1996 06:29:43 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <dmost-2202962308280001@dmost.magna.com.au> <4gkc14$kmt@news.iastate.edu>
Organization: SAS Institute Inc.
Lines: 85


In article <dmost-2202962308280001@dmost.magna.com.au>, dmost@magna.com.au (Damien Morton) writes:
|>     I've got a book by Timothy Masters which begins by comparing feed-
|> forward neural networks and probabilistic neural networks, but then goes on
|> without discussing the probabilistic networks any further. Masters says
|> PNNs are slow but fast to train. This sounds fine to me; could someone
|> briefly describe how a PNN works?

PNN is Donald Specht's term for kernel discriminant analysis. You can
think of it as a normalized RBF network in which there is a hidden unit
centered at every training case. The hidden-layer activation function is
usually Gaussian or some other probability density function.  The
hidden-to-output weights are usually 1 or 0; for each hidden unit, a
weight of 1 is used for the connection going to the output that the case
belongs to, while all other connections are given weights of 0.
Alternatively, you can adjust these weights for the prior probabilities
of each class.  So the only weights that need to be learned are the
widths of the RBF units.  These widths (often a single width is used)
are called "smoothing parameters" or "bandwidths" and are usually chosen
by cross-validation or by more esoteric methods that are not well-known
in the neural net literature; gradient descent is _not_ used.
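The architecture described above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel, a single bandwidth, and equal class priors; the function name `pnn_classify` is just for the example.

```python
import numpy as np

def pnn_classify(X_train, y_train, x, bandwidth=0.5):
    """Classify point x with a Gaussian-kernel PNN.

    One Gaussian 'hidden unit' is centered at every training case;
    each unit connects with weight 1 only to the output for its own
    class, so classification just sums kernel activations per class.
    """
    # Gaussian kernel of the Euclidean distance to every training case
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Sum activations per class and pick the class with the largest sum
    classes = np.unique(y_train)
    scores = np.array([k[y_train == c].sum() for c in classes])
    return classes[np.argmax(scores)]
```

Note that "training" amounts to storing the data; the only free parameter left is the bandwidth.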
References:

   Hand, D.J. (1982) Kernel Discriminant Analysis, Research Studies Press.

   McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern
   Recognition, Wiley.

   Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994) Machine
   Learning, Neural and Statistical Classification, Ellis Horwood.

   Scott, D.W. (1992) Multivariate Density Estimation, Wiley.

   Specht, D.F. (1990) "Probabilistic neural networks," Neural Networks,
   3, 110-118.

In article <4gkc14$kmt@news.iastate.edu>, dhanwada@iastate.edu (C Dhanwada) writes:
|> A PNN layout looks like a typical neural network, but it is
|> a statistical method based on the Parzen window technique for
|> pattern classification. I guess the basic difference is that a PNN
|> is a parametric pattern classifier.

No, PNN is most definitely nonparametric. The distributions of each
class need only be smooth; there is no parametric model involved.

|> ... A test pattern propagates forward
|> to perform a dot product with the pattern in each hidden node.

It's not a dot product but a distance, usually Euclidean.

|> The book is probably referring to the situation when we have a
|> large number of training patterns. Then forward propagation
|> requires computation involving a large number of hidden nodes,
|> making the PNN slow to generate an output. The training
|> procedure is trivial. Once we have a training set, we just
|> need to store each pattern in a hidden node and compute
|> a proper normalization factor. A one-shot procedure. Indeed, training is
|> fast, especially if you have seen plain backprop worrying over
|> even a "moderately complex" problem. Specht found his PNN
|> trained "hundred thousand times faster than backprop" for
|> the same performance. Sure, but not always.

PNN is certainly _not_ 100,000 times faster than any decent training
method for a feedforward net.  While they are not iterative in the same
sense as backprop, kernel methods require that you estimate the kernel
bandwidth, and this requires accessing the data many times. Furthermore,
computing a single output value with kernel methods requires either
accessing the entire training data or clever programming, and either way
is much slower than computing an output with a feedforward net. So
depending on what you are doing and how you do it, PNN may be either
faster or slower than a feedforward net.
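The repeated data access mentioned above can be made concrete with leave-one-out cross-validation, one common way to pick the bandwidth. This is a sketch only; the helper name `loo_error` is hypothetical, and I assume a Gaussian kernel with equal priors.

```python
import numpy as np

def loo_error(X, y, bandwidth):
    """Leave-one-out misclassification count for a Gaussian-kernel PNN.

    Each held-out case is classified against all the others, so every
    bandwidth candidate costs a full pass over the training data --
    this is why bandwidth selection is not a one-shot procedure.
    """
    classes = np.unique(y)
    errors = 0
    for i in range(len(X)):
        d2 = np.sum((X - X[i]) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * bandwidth ** 2))
        k[i] = 0.0  # hold out case i from its own classification
        scores = [k[y == c].sum() for c in classes]
        if classes[int(np.argmax(scores))] != y[i]:
            errors += 1
    return errors

# Choose the bandwidth with the fewest leave-one-out errors, e.g.:
# best = min(candidates, key=lambda h: loo_error(X, y, h))
```

Scanning even a modest grid of candidate bandwidths this way already touches the data many times, which is the point of the comparison above.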

|> A friend of mine found
|> a basic limitation: a PNN cannot separate patterns inside a
|> unit circle from those falling outside. The reason is the PNN's
|> simplistic dependence on dot products for classification.

That's absurd. Given a reasonable amount of data, the circle-in-the-square
problem is trivial for a PNN.
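It is easy to check this numerically. The following sketch (assuming the same Gaussian-kernel PNN as above, with a hand-picked bandwidth of 0.2 and uniformly sampled points) classifies inside-the-unit-circle versus outside with high accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: points in the square, labeled by the unit circle
X = rng.uniform(-1.5, 1.5, size=(400, 2))
y = (np.sum(X ** 2, axis=1) < 1.0).astype(int)

def pnn_predict(X_train, y_train, x, h=0.2):
    """Gaussian-kernel PNN decision for a single point x."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * h * h))
    return int(k[y_train == 1].sum() > k[y_train == 0].sum())

# Accuracy on fresh test points
X_test = rng.uniform(-1.5, 1.5, size=(200, 2))
y_test = (np.sum(X_test ** 2, axis=1) < 1.0).astype(int)
acc = np.mean([pnn_predict(X, y, x) == t for x, t in zip(X_test, y_test)])
```

With a few hundred training cases the errors are confined to a thin band around the circle boundary, exactly as the smoothness argument predicts.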

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
