Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!newsgate.duke.edu!interpath!news.interpath.net!sas!newshost.unx.sas.com!hotellng.unx.sas.com!saswss
From: saswss@unx.sas.com (Warren Sarle)
Subject: changes to "comp.ai.neural-nets FAQ" -- monthly posting
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <nn.changes.posting_836017240@hotellng.unx.sas.com>
Supersedes: <nn.changes.posting_833338836@hotellng.unx.sas.com>
Date: Sat, 29 Jun 1996 03:00:42 GMT
Expires: Sat, 3 Aug 1996 03:00:40 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
Reply-To: saswss@unx.sas.com (Warren Sarle)
Organization: SAS Institute Inc., Cary, NC, USA
Keywords: modifications, new, additions, deletions
Followup-To: comp.ai.neural-nets
Lines: 2159

==> nn1.changes.body <==
*** nn1.oldbody	Tue May 28 23:00:11 1996
--- nn1.body	Fri Jun 28 23:00:10 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1996-06-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 74,82 ****
--- 74,86 ----
     Why use activation functions?
     What is a softmax activation function?
+    What is the curse of dimensionality?
     How do MLPs compare with RBFs?
+    What are OLS and subset regression?
     Should I normalize/standardize/rescale the data?
+    Should I nonlinearly transform the data?
     What is ART?
     What is PNN?
     What is GRNN?
+    What does unsupervised learning learn?
     What about Genetic Algorithms and Evolutionary Computation?
     What about Fuzzy Logic?
***************
*** 374,377 ****
--- 378,384 ----
  the training data. 
  
+ As for simulating human consciousness and emotion, that's still in the realm
+ of science fiction. 
+ 
  For examples of NN applications, see: 
  
***************
*** 384,387 ****
--- 391,395 ----
   o The Applications Corner, provided by NeuroDimension, Inc., at 
     http://www.nd.com/appcornr/purpose.htm. 
+  o The BioComp Systems, Inc. Solutions page at http://www.bio-comp.com 
   o Athanasios Episcopos's web page with References on Neural Net
     Applications to Finance and Economics at 
***************
*** 390,394 ****
     McGraw-Hill, ISBN 0-07-011189-8. 
   o Trippi, R.R. & Turban, E. (1993), Neural Networks in Finance and
!    Investing, Chicago: Probus, ISBN 1-55738-452-5. 
   o The series Advances in Neural Information Processing Systems containing
     proceedings of the conference of the same name, published yearly by
--- 398,402 ----
     McGraw-Hill, ISBN 0-07-011189-8. 
   o Trippi, R.R. & Turban, E. (1993), Neural Networks in Finance and
!    Investing, Chicago: Probus. 
   o The series Advances in Neural Information Processing Systems containing
     proceedings of the conference of the same name, published yearly by
***************
*** 519,522 ****
--- 527,539 ----
  variance than others, then you may be able to use weighted least squares
  instead of ordinary least squares to obtain more efficient estimates. 
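
The weighted-least-squares point can be sketched in a few lines. This is a minimal illustrative example (made-up data and constants, not from any reference above); the usual weighting is w_i = 1/variance_i, implemented by rescaling each row by sqrt(w_i):

```python
# Minimal sketch of weighted vs. ordinary least squares (illustrative
# data and constants).  Cases with larger error variance get weights
# w_i = 1/var_i, implemented by rescaling rows by sqrt(w_i).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
var = np.where(x > 0.5, 4.0, 0.04)            # unequal (heteroscedastic) noise
y = 2.0 + 3.0 * x + rng.normal(0.0, np.sqrt(var))

X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]

# Ordinary least squares: minimize sum (y - Xb)^2.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted least squares: minimize sum w*(y - Xb)^2 with w = 1/var.
sw = 1.0 / np.sqrt(var)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print("OLS estimate:", b_ols)                 # both are near the true (2, 3)
print("WLS estimate:", b_wls)
```

Both estimators are unbiased here; the gain from weighting is only the smaller sampling variance (efficiency).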
+ 
+ Hundreds, perhaps thousands of people have run comparisons of neural nets
+ with "traditional statistics" (whatever that means). Most such studies
+ involve one or two data sets, and are of little use to anyone else unless
+ they happen to be analyzing the same kind of data. But there is an
+ impressive comparative study of supervised classification by Michie,
+ Spiegelhalter, and Taylor (1994), and an excellent comparison of
+ unsupervised Kohonen networks and k-means clustering by Balakrishnan,
+ Cooper, Jacob, and Lewis (1994). 
  
  Communication between statisticians and neural net researchers is often

==> nn2.changes.body <==
*** nn2.oldbody	Tue May 28 23:00:16 1996
--- nn2.body	Fri Jun 28 23:00:15 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1996-06-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 21,29 ****
--- 21,33 ----
     Why use activation functions?
     What is a softmax activation function?
+    What is the curse of dimensionality?
     How do MLPs compare with RBFs?
+    What are OLS and subset regression?
     Should I normalize/standardize/rescale the data?
+    Should I nonlinearly transform the data?
     What is ART?
     What is PNN?
     What is GRNN?
+    What does unsupervised learning learn?
     What about Genetic Algorithms and Evolutionary Computation?
     What about Fuzzy Logic?
***************
*** 201,215 ****
  optimization have been studied for hundreds of years, and there is a huge
  literature on the subject in fields such as numerical analysis, operations
! research, and statistical computing, e.g., Bertsekas 1995, Gill, Murray, and
! Wright 1981. There is no single best method for nonlinear optimization. You
! need to choose a method based on the characteristics of the problem to be
! solved. For functions with continuous second derivatives (which would
! include feedforward nets with the most popular differentiable activation
! functions and error functions), three general types of algorithms have been
! found to be effective for most practical purposes: 
  
   o For a small number of weights, stabilized Newton and Gauss-Newton
     algorithms, including various Levenberg-Marquardt and trust-region
!    algorithms are efficient. 
   o For a moderate number of weights, various quasi-Newton algorithms are
     efficient. 
--- 205,222 ----
  optimization have been studied for hundreds of years, and there is a huge
  literature on the subject in fields such as numerical analysis, operations
! research, and statistical computing, e.g., Bertsekas (1995), Gill, Murray,
! and Wright (1981). Masters (1995) has a good elementary discussion of
! conjugate gradient and Levenberg-Marquardt algorithms in the context of NNs.
! 
! There is no single best method for nonlinear optimization. You need to
! choose a method based on the characteristics of the problem to be solved.
! For functions with continuous second derivatives (which would include
! feedforward nets with the most popular differentiable activation functions
! and error functions), three general types of algorithms have been found to
! be effective for most practical purposes: 
  
   o For a small number of weights, stabilized Newton and Gauss-Newton
     algorithms, including various Levenberg-Marquardt and trust-region
!    algorithms, are efficient. 
   o For a moderate number of weights, various quasi-Newton algorithms are
     efficient. 
***************
*** 217,222 ****
     efficient. 
  
  All of the above methods find local optima. For global optimization, there
! are a variety of approaches. You can simply run any of the local
  optimization methods from numerous random starting points. Or you can try
  more complicated methods designed for global optimization such as simulated
--- 224,232 ----
     efficient. 
  
+ For functions that are not continuously differentiable, the Nelder-Mead
+ simplex algorithm is useful. 
+ 
  All of the above methods find local optima. For global optimization, there
! are also a variety of approaches. You can simply run any of the local
  optimization methods from numerous random starting points. Or you can try
  more complicated methods designed for global optimization such as simulated
***************
*** 229,232 ****
--- 239,245 ----
   o The kangaroos, a nontechnical description of various optimization
     methods, at ftp://ftp.sas.com/pub/neural/kangaroos. 
+  o The Netlib repository, http://www.netlib.org/, containing freely
+    available software, documents, and databases of interest to the numerical
+    and scientific computing community. 
   o John Gregory's nonlinear programming FAQ at 
     http://www.skypoint.com/subscribers/ashbury/nonlinear-programming-faq.html.
***************
*** 248,251 ****
--- 261,267 ----
     nonlinear parameters," SIAM J. Appl. Math., 11, 431-441. 
  
+    Masters, T. (1995) Advanced Algorithms for Neural Networks: A C++
+    Sourcebook, NY: John Wiley and Sons, ISBN 0-471-10588-0 
+ 
   Mor\'e, J.J. (1977) "The Levenberg-Marquardt algorithm: implementation
     and theory," in Watson, G.A., ed., _Numerical Analysis_, Lecture Notes in
***************
*** 404,409 ****
  
  B-splines provide a way of coding ordinal inputs into fewer than C variables
! while retaining information about the order of the categories. See Gifi
! (1990, 365-370). 
  
  Target variables with ordered categories require thermometer coding. The
--- 420,425 ----
  
  B-splines provide a way of coding ordinal inputs into fewer than C variables
! while retaining information about the order of the categories. See Brown and
! Harris (1994) or Gifi (1990, 365-370). 
  
  Target variables with ordered categories require thermometer coding. The
***************
*** 416,419 ****
--- 432,438 ----
  References: 
  
+    Brown, M., and Harris, C. (1994), Neurofuzzy Adaptive Modelling and
+    Control, NY: Prentice Hall. 
+ 
     Gifi, A. (1990), Nonlinear Multivariate Analysis, NY: John Wiley & Sons,
     ISBN 0-471-92620-5. 
***************
*** 563,566 ****
--- 582,632 ----
  ------------------------------------------------------------------------
  
+ Subject: What is the curse of dimensionality? 
+ ==============================================
+ 
+ This answer was provided by Janne Sinkkonen: 
+ 
+    The "curse of dimensionality" (Bellman 1961) refers to the exponential
+    growth of hypervolume as a function of dimensionality. What has this
+    to do with NNs? 
+ 
+    Well, NNs are mappings from an input space to an output space. Thus,
+    loosely speaking, an NN needs to "monitor" or cover or represent
+    every part of its input space in order to know how the space should
+    be mapped. Covering the input space takes resources, and the amount
+    of resources needed is proportional to the hypervolume of the input
+    space. This notion seems to hold generally, but formalizing
+    "resources" and "every part of the input space" would take us so deep
+    that we could eventually surface to a different world on the other
+    side of the deepness. :) 
+ 
+    Here is an example. Think of an unsupervised competitive one-layer
+    network that models data scattered uniformly over a unit hypercube.
+    The network tries to share its units (resources) more or less equally
+    over the hypercube (input space). One could argue that the average
+    distance from a random point of the space to the nearest network unit
+    measures the goodness of the representation: the shorter the
+    distance, the better the representation of the data in the cube. By
+    simulation, or just by thinking it through, it can be shown that the
+    total number of units required to keep the average distance constant
+    increases exponentially with the dimensionality of the cube. 
+ 
+    The curse of dimensionality causes networks with lots of irrelevant
+    inputs to behave relatively badly. The dimension of the input
+    space is high, and the network uses almost all its resources to
+    represent irrelevant portions of the space. 
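
   The hypercube example can be checked with a quick Monte Carlo sketch
   (illustrative Python; the units are placed at random rather than
   trained, and all the counts are arbitrary):

```python
# Average distance from a random point of the unit hypercube to the
# nearest of a fixed number of randomly placed "units".  The distance
# grows with dimension, so keeping it constant requires ever more units.
import math
import random

def avg_nearest_unit_distance(dim, n_units=32, n_probes=2000, seed=1):
    rng = random.Random(seed)
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total = 0.0
    for _ in range(n_probes):
        p = [rng.random() for _ in range(dim)]
        total += min(math.dist(p, u) for u in units)
    return total / n_probes

for dim in (1, 2, 4, 8):
    print(dim, round(avg_nearest_unit_distance(dim), 3))
```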
+ 
+ References: 
+ 
+    Bellman, R. (1961), Adaptive Control Processes: A Guided Tour, Princeton
+    University Press. 
+ 
+    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
+    Oxford University Press, section 1.4. 
+ 
+    Scott, D.W. (1992), Multivariate Density Estimation, NY: Wiley. 
+ 
+ ------------------------------------------------------------------------
+ 
  Subject: How do MLPs compare with RBFs? 
  ========================================
***************
*** 607,626 ****
  hidden unit. 
  
- To illustrate various architectures, an example with two inputs and one
- output will be used so that the results can be shown graphically. The
- function being learned resembles a landscape with a Gaussian hill and a
- logistic plateau as shown in:
- 54K ftp://ftp.sas.com/pub/neural/tnnex_hillplat.ps
- 1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat.sas
- In the examples, files with extensions of ".ps" are postscript files
- containing graphics, whereas files with extensions of ".sas" and ".bls" are
- plain text files. 
- 
- An example using an MLP with one hidden layer and an identity output
- activation function is shown in these files:
- 357K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_mlp.bls
- 885K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_mlp.ps
- 1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_mlp.sas
- 
  Radial basis function (RBF) networks usually have only one hidden layer for
  which the combination function is the Euclidean distance between the input
--- 673,676 ----
***************
*** 638,649 ****
  function, so the activations of all the hidden units are normalized to sum
  to one. This type of network is often called a "normalized RBF", or NRBF,
! network. 
  
  While the distinction between these two types of Gaussian RBF architectures
  is sometimes mentioned in the NN literature, its importance has rarely been
! appreciated except by Tao (1993). 
  
! There are several subtypes of both ORBF and NRBF architectures.
! Descriptions, formulas, and examples are as follows: 
  
  ORBFUN 
--- 688,701 ----
  function, so the activations of all the hidden units are normalized to sum
  to one. This type of network is often called a "normalized RBF", or NRBF,
! network. In an NRBF network, the output units should not have a bias, since
! the constant bias term would be linearly dependent on the constant sum of
! the hidden units. 
  
  While the distinction between these two types of Gaussian RBF architectures
  is sometimes mentioned in the NN literature, its importance has rarely been
! appreciated except by Tao (1993) and Werntges (1993). 
  
! There are several subtypes of both ORBF and NRBF architectures. Descriptions
! and formulas are as follows: 
  
  ORBFUN 
***************
*** 650,662 ****
     Ordinary radial basis function (RBF) network with unequal widths
     h_j = exp(f*log(a_j) - s_j^-2 * [(w_ij-x_i)^2] )
-    177K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeh.bls
-    681K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeh.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeh.sas
  ORBFEQ 
     Ordinary radial basis function (RBF) network with equal widths
     h_j = exp( - s^-2 * [(w_ij-x_i)^2] )
-    159K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeq.bls
-    715K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeq.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_orbfeq.sas
  NRBFUN 
     Normalized RBF network with unequal widths and heights
--- 702,708 ----
***************
*** 663,669 ****
     h_j = softmax(f*log(a_j) - s_j^-2 *
     [(w_ij-x_i)^2] )
-    35K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfun.bls
-    221K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfun.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfun.sas
  NRBFEV 
     Normalized RBF network with equal volumes
--- 709,712 ----
***************
*** 670,682 ****
     h_j = softmax( f*log(b_j) - s_j^-2 *
     [(w_ij-x_i)^2] )
-    39K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfev.bls
-    221K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfev.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfev.sas
  NRBFEH 
     Normalized RBF network with equal heights (and unequal widths)
     h_j = softmax( - s_j^-2 * [(w_ij-x_i)^2] )
-    35K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeh.bls
-    220K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeh.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeh.sas
  NRBFEW 
     Normalized RBF network with equal widths (and unequal heights)
--- 713,719 ----
***************
*** 683,695 ****
     h_j = softmax( f*log(a_j) - s^-2 *
     [(w_ij-x_i)^2] )
-    170K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfew.bls
-    658K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfew.ps
-    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfew.sas
  NRBFEQ 
     Normalized RBF network with equal widths and heights
     h_j = softmax( - s^-2 * [(w_ij-x_i)^2] )
!    152K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeq.bls
!    659K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeq.ps
!    1K ftp://ftp.sas.com/pub/neural/tnnex_hillplat_nrbfeq.sas
  
  The ORBF architectures use radial combination functions and the exp
--- 720,761 ----
     h_j = softmax( f*log(a_j) - s^-2 *
     [(w_ij-x_i)^2] )
  NRBFEQ 
     Normalized RBF network with equal widths and heights
     h_j = softmax( - s^-2 * [(w_ij-x_i)^2] )
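
The ORBFEQ and NRBFEQ formulas can be sketched directly (illustrative
Python; the centers w_j, width s, and input x are made-up values). Since
softmax exponentiates the same argument that ORBFEQ passes to exp, the
NRBFEQ activations are just the ORBFEQ activations normalized to sum to one:

```python
# h_j for ORBFEQ:  exp( -s^-2 * sum_i (w_ij - x_i)^2 )
# h_j for NRBFEQ:  softmax( -s^-2 * sum_i (w_ij - x_i)^2 )
import math

def orbfeq(x, centers, s):
    return [math.exp(-sum((w - xi) ** 2 for w, xi in zip(c, x)) / s ** 2)
            for c in centers]

def nrbfeq(x, centers, s):
    h = orbfeq(x, centers, s)       # softmax = normalized exponentials
    total = sum(h)
    return [hj / total for hj in h]

centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # made-up hidden-unit centers
x = (0.2, 0.3)                                  # made-up input
print(orbfeq(x, centers, 0.5))   # each in (0,1]; need not sum to 1
print(nrbfeq(x, centers, 0.5))   # sums to 1 (a partition of unity)
```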
! 
! To illustrate various architectures, an example with two inputs and one
! output will be used so that the results can be shown graphically. The
! function being learned resembles a landscape with a Gaussian hill and a
! logistic plateau as shown in ftp://ftp.sas.com/pub/neural/hillplat.gif.
! There are 441 training cases on a regular 21-by-21 grid. The table below
! shows the root mean square error (RMSE) for a test data set. The test set
! has 1681 cases on a regular 41-by-41 grid over the same domain as the
! training set. If you are reading the HTML version of this document via a web
! browser, click on any number in the table to see a surface plot of the
! corresponding network output (each plot is a gif file, approximately 9K). 
! 
! The MLP networks in the table have one hidden layer with a tanh activation
! function. All of the networks use an identity activation function for the
! outputs. 
! 
!           Hill and Plateau Data: RMSE for the Test Set
! 
! HUs  MLP   ORBFEQ  ORBFUN  NRBFEQ  NRBFEW  NRBFEV  NRBFEH  NRBFUN
!                                                            
!  2  0.218   0.247   0.247   0.230   0.230   0.230   0.230   0.230
!  3  0.192   0.244   0.143   0.218   0.218   0.036   0.012   0.001
!  4  0.174   0.216   0.096   0.193   0.193   0.036   0.007
!  5  0.160   0.188   0.083   0.086   0.051   0.003
!  6  0.123   0.142   0.058   0.053   0.030
!  7  0.107   0.123   0.051   0.025   0.019
!  8  0.093   0.105   0.043   0.020   0.008
!  9  0.084   0.085   0.038   0.017
! 10  0.077   0.082   0.033   0.016
! 12  0.059   0.074   0.024   0.005
! 15  0.042   0.060   0.019
! 20  0.023   0.046   0.010
! 30  0.019   0.024
! 40  0.016   0.022
! 50  0.010   0.014
  
  The ORBF architectures use radial combination functions and the exp
***************
*** 716,720 ****
  the hidden layer to equal one. Thus, each output unit computes a weighted
  average of the hidden-to-output weights, and the output values must lie
! within the range of the hidden-to-output weights. 
  
  Radial combination functions incorporating altitudes are useful with NRBF
--- 782,789 ----
  the hidden layer to equal one. Thus, each output unit computes a weighted
  average of the hidden-to-output weights, and the output values must lie
! within the range of the hidden-to-output weights. Hence, if the
! hidden-to-output weights are within a reasonable range (such as the range
! of the target values), you can be sure that the outputs will be within that
! same range for all possible inputs, even when the net is extrapolating. 
  
  Radial combination functions incorporating altitudes are useful with NRBF
***************
*** 821,830 ****
  
  The exponential increase in the number of hidden units required for hybrid
! learning is one aspect of the curse of dimensionality (Bellman 1961; Scott
! 1992). The number of training cases required also increases exponentially in
! general. No neural network architecture--in fact no method of learning or
! statistical estimation--can escape the curse of dimensionality in general,
! hence there is no practical method of learning general functions in more
! than a few dimensions. 
  
  Fortunately, in many practical applications of neural networks with a large
--- 890,899 ----
  
  The exponential increase in the number of hidden units required for hybrid
! learning is one aspect of the curse of dimensionality. The number of
! training cases required also increases exponentially in general. No neural
! network architecture--in fact no method of learning or statistical
! estimation--can escape the curse of dimensionality in general, hence there
! is no practical method of learning general functions in more than a few
! dimensions. 
  
  Fortunately, in many practical applications of neural networks with a large
***************
*** 832,836 ****
  irrelevant, and some architectures can take advantage of these properties to
  yield useful results. But escape from the curse of dimensionality requires
! fully supervised learning. 
  
  Additive inputs
--- 901,909 ----
  irrelevant, and some architectures can take advantage of these properties to
  yield useful results. But escape from the curse of dimensionality requires
! fully supervised training as well as special types of data. Supervised
! training for RBF networks can be done by "backprop" (see "What is
! backprop?") or other optimization methods (see "What are conjugate
! gradients, Levenberg-Marquardt, etc.?"), or by subset regression (see
! "What are OLS and subset regression?"). 
  
  Additive inputs
***************
*** 877,883 ****
  produce accurate outputs only near the subspace occupied by the data. Adding
  redundant inputs has little effect on the effective dimensionality of the
! data; hence the curse of dimensionality does not apply. However, if the test
! cases do not follow the same pattern of redundancy as the training cases,
! generalization will require extrapolation and will rarely work. 
  
  Irrelevant inputs
--- 950,957 ----
  produce accurate outputs only near the subspace occupied by the data. Adding
  redundant inputs has little effect on the effective dimensionality of the
! data; hence the curse of dimensionality does not apply, and even hybrid
! methods (2) and (3) can be used. However, if the test cases do not follow
! the same pattern of redundancy as the training cases, generalization will
! require extrapolation and will rarely work well. 
  
  Irrelevant inputs
***************
*** 964,971 ****
  use an architecture with equal widths. 
  
! References: 
  
!    Bellman, R. (1961), Adaptive Control Processes: A Guided Tour, Princeton
!    University Press. 
  
     Friedman, J.H. and Stuetzle, W. (1981), "Projection pursuit regression,"
--- 1038,1048 ----
  use an architecture with equal widths. 
  
! References:
! There are few good references on RBF networks. Bishop (1995) gives one of
! the better surveys, but also see Tao (1993) for the importance of
! normalization. 
  
!    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
!    Oxford University Press. 
  
     Friedman, J.H. and Stuetzle, W. (1981), "Projection pursuit regression,"
***************
*** 995,1000 ****
     ftp://ftp.sas.com/pub/neural/neural1.ps. 
  
-    Scott, D.W. (1992), Multivariate Density Estimation, NY: Wiley. 
- 
     Tao, K.M. (1993), "A closer look at the radial basis function (RBF)
     networks," Conference Record of The Twenty-Seventh Asilomar
--- 1072,1075 ----
***************
*** 1006,1009 ****
--- 1081,1201 ----
     Image Signal Processing, 141, 210-216. 
  
+    Werntges, H.W. (1993), "Partitions of unity improve neural function
+    approximation," Proceedings of the IEEE International Conference on
+    Neural Networks, San Francisco, CA, vol 2, 914-918. 
+ 
+ ------------------------------------------------------------------------
+ 
+ Subject: What are OLS and subset regression? 
+ =============================================
+ 
+ If you are a statistician, "OLS" means "ordinary least squares" (as opposed to
+ weighted or generalized least squares), which is what the NN literature
+ often calls "LMS" (least mean squares). 
+ 
+ If you are a neural networker, "OLS" means "orthogonal least squares", which
+ is an algorithm for forward stepwise regression proposed by Chen et al.
+ (1991) for training RBF networks. 
+ 
+ OLS is a variety of supervised training. But whereas backprop and other
+ commonly-used supervised methods are forms of continuous optimization, OLS
+ is a form of combinatorial optimization. Rather than treating the RBF
+ centers as continuous values to be adjusted to reduce the training error,
+ OLS starts with a large set of candidate centers and selects a subset that
+ usually provides good training error. For small training sets, the
+ candidates can include all of the training cases. For large training sets,
+ it is more efficient to use a random subset of the training cases or to do a
+ cluster analysis and use the cluster means as candidates. 
+ 
+ Each center corresponds to a predictor variable in a linear regression
+ model. The values of these predictor variables are computed from the RBF
+ applied to each center. There are numerous methods for selecting a subset of
+ predictor variables in regression (Myers 1986; Miller 1990). The ones most
+ often used are: 
+ 
+  o Forward selection begins with no centers in the network. At each step the
+    center is added that most decreases the error function. 
+  o Backward elimination begins with all candidate centers in the network. At
+    each step the center is removed that least increases the error function. 
+  o Stepwise selection begins like forward selection with no centers in the
+    network. At each step, a center is added or removed. If there are any
+    centers in the network, the one that contributes least to reducing the
+    error criterion is subjected to a statistical test (usually based on the
+    F statistic) to see if it is worth retaining in the network; if the
+    center fails the test, it is removed. If no centers are removed, then the
+    centers that are not currently in the network are examined; the one that
+    would contribute most to reducing the error criterion is subjected to a
+    statistical test to see if it is worth adding to the network; if the
+    center passes the test, it is added. When all centers in the network pass
+    the test for staying in the network, and all other centers fail the test
+    for being added to the network, the stepwise method terminates. 
+  o Leaps and bounds (Furnival and Wilson 1974) is an algorithm for
+    determining the subset of centers that minimizes the error function; this
+    optimal subset can be found without examining all possible subsets, but
+    the algorithm is practical only up to 30 to 50 candidate centers. 
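
Forward selection, the simplest of the four, can be sketched as follows
(illustrative Python on a made-up 1-D problem; the output weights are
refit from scratch at each step, whereas real OLS does this work
incrementally via orthogonalization):

```python
# Greedy forward selection of RBF centers: at each step, add the candidate
# whose inclusion most reduces the training sum of squared errors, with
# the output weights refit by linear least squares.  Data, Gaussian basis,
# and width are all illustrative.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=50)
y = np.sin(3.0 * x)                       # target function to approximate

def rbf_column(center, s=0.5):
    return np.exp(-((x - center) / s) ** 2)

def sse_with(centers):
    X = np.column_stack([np.ones_like(x)] + [rbf_column(c) for c in centers])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

candidates = list(x)                      # all training cases as candidates
selected, errors = [], []
for _ in range(5):                        # add five centers, one at a time
    best = min(candidates, key=lambda c: sse_with(selected + [c]))
    candidates.remove(best)
    selected.append(best)
    errors.append(sse_with(selected))

print([round(e, 4) for e in errors])      # non-increasing training error
```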
+ 
+ OLS is a particular algorithm for forward selection using modified
+ Gram-Schmidt (MGS) orthogonalization. While MGS is not a bad algorithm, it

==> nn3.changes.body <==
*** nn3.oldbody	Tue May 28 23:00:20 1996
--- nn3.body	Fri Jun 28 23:00:20 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1996-06-25
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 64,68 ****
  possible, please inform the FAQ maintainer (saswss@unx.sas.com). 
  
! The second necessary condition for good generalization. is that the training
  cases be a sufficiently large and representative subset ("sample" in
  statistical terminology) of the set of all cases that you want to generalize
--- 64,68 ----
  possible, please inform the FAQ maintainer (saswss@unx.sas.com). 
  
! The second necessary condition for good generalization is that the training
  cases be a sufficiently large and representative subset ("sample" in
  statistical terminology) of the set of all cases that you want to generalize
***************
*** 95,101 ****
  nonlinear models, although still not nearly as safe as interpolation. You
  can also use such information to choose the training cases more efficiently.
! For example, with a linear model, you should choose training at the outer
! limits of the input space instead of spreading them out throughout the input
! space. 
  
  ------------------------------------------------------------------------
--- 95,101 ----
  nonlinear models, although still not nearly as safe as interpolation. You
  can also use such information to choose the training cases more efficiently.
! For example, with a linear model, you should choose training cases at the
! outer limits of the input space instead of evenly distributing them
! throughout the input space. 
  
  ------------------------------------------------------------------------
***************
*** 108,112 ****
  set is. On the other hand, injecting artificial noise (jitter) into the
  inputs during training is one of several ways to improve generalization for
! smooth functions when you have a small training set. See "What is jitter?".
  
  Certain assumptions about noise are necessary for theoretical results.
--- 108,112 ----
  set is. On the other hand, injecting artificial noise (jitter) into the
  inputs during training is one of several ways to improve generalization for
! smooth functions when you have a small training set. 
  
  Certain assumptions about noise are necessary for theoretical results.
***************
*** 125,128 ****
--- 125,130 ----
  learned by the type of net you are using. 
  
+ Noise in the target values is exacerbated by overfitting (Moody 1992). 
+ 
  Noise in the inputs also limits the accuracy of generalization, but in a
  more complicated way than does noise in the targets. In a region of the
***************
*** 131,134 ****
--- 133,142 ----
  noise can degrade generalization severely. 
  
+ Furthermore, if the target function is Y=f(X), but you observe noisy inputs
+ X+D, you cannot obtain an arbitrarily accurate estimate of f(X) given X+D no
+ matter how large a training set you use. The net will not learn f(X), but
+ will instead learn a convolution of f(X) with the distribution of the noise
+ D (see "What is jitter?"). 
+ 
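A small sketch of this effect (illustrative Python, with made-up
constants): f is a step function, so f(X) jumps from 0 to 1 at zero, but
a regression on the noisy inputs Z = X + D learns a smoothed version of
the step.

```python
# With noisy inputs Z = X + D, the regression E[Y|Z] is f smoothed by the
# noise distribution, not f itself.  All constants here are illustrative.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2.0, 2.0, size=20000)
y = (x > 0.0).astype(float)                  # f(x): step from 0 to 1 at x = 0
z = x + rng.normal(0.0, 0.5, size=x.size)    # observed inputs with noise D

def fitted(z0, width=0.1):
    """Crude regression: average of y over cases with z near z0."""
    return float(y[np.abs(z - z0) < width].mean())

# Far from the step the fit matches f, but at the step it is smoothed
# to about 0.5 instead of jumping from 0 to 1.
print(fitted(-1.5), fitted(0.0), fitted(1.5))
```
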
  For more details, see one of the statistically-oriented references on neural
  nets such as: 
***************
*** 143,146 ****
--- 151,158 ----
     ed., London: Chapman & Hall. 
  
+    Moody, J.E. (1992), "The Effective Number of Parameters: An Analysis of
+    Generalization and Regularization in Nonlinear Learning Systems", NIPS 4,
+    847-854. 
+ 
     Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge:
     Cambridge University Press. 
***************
*** 161,167 ****
  easily lead to predictions that are far beyond the range of the training
  data with many of the common types of NNs. But underfitting can also produce
! wild predictions in multilayer perceptrons, even with noise-free data. There
! are graphical examples of overfitting and underfitting in Sarle (1995). 
  
  The best way to avoid overfitting is to use lots of training data. If you
  have at least 30 times as many training cases as there are weights in the
--- 173,186 ----
  easily lead to predictions that are far beyond the range of the training
  data with many of the common types of NNs. But underfitting can also produce
! wild predictions in multilayer perceptrons, even with noise-free data. 
  
+ For an elementary discussion of overfitting, see Smith (1993). For a more
+ rigorous approach, see the article by Geman, Bienenstock, and Doursat (1992)
+ on the bias/variance trade-off (it's not really a dilemma). We are talking
+ statistical bias here: the difference between the average value of an
+ estimator and the correct value. Underfitting produces excessive bias in the
+ outputs, whereas overfitting produces excessive variance. There are
+ graphical examples of overfitting and underfitting in Sarle (1995). 
+ 
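The trade-off is easy to demonstrate numerically. The sketch below is a hypothetical illustration, using polynomial degree as a stand-in for network complexity: it fits many fresh noisy training sets and measures statistical bias and variance of the resulting estimator directly.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-1, 1, 21)
true = xs ** 3                        # the function to be estimated

def fit_predict(degree):
    """Draw a fresh noisy training set and return the polynomial
    fit's predictions at xs."""
    y = true + rng.normal(0, 0.3, xs.shape)
    return np.polyval(np.polyfit(xs, y, degree), xs)

def bias2_and_variance(degree, reps=200):
    """Average squared bias and average variance of the predictions
    over repeated training sets."""
    preds = np.array([fit_predict(degree) for _ in range(reps)])
    bias2 = float(np.mean((preds.mean(axis=0) - true) ** 2))
    var = float(np.mean(preds.var(axis=0)))
    return bias2, var

b_under, v_under = bias2_and_variance(1)    # underfit: high bias, low variance
b_over, v_over = bias2_and_variance(10)     # overfit: low bias, high variance
```

As expected, the simple model shows the larger bias and the complex model the larger variance.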
  The best way to avoid overfitting is to use lots of training data. If you
  have at least 30 times as many training cases as there are weights in the
***************
*** 179,191 ****
   o Bayesian estimation 
  
  The complexity of a network is related to both the number of weights and the
  size of the weights. Model selection is concerned with the number of
! weights, and hence the number of hidden units and layers. The other
! approaches listed above are concerned, directly or indirectly, with the size
! of the weights. 
! 
! The issue of overfitting/underfitting is intimately related to the
! bias/variance tradeoff in nonparametric estimation (Geman, Bienenstock, and
! Doursat 1992). 
  
  References: 
--- 198,212 ----
   o Bayesian estimation 
  
+ These approaches are discussed in more detail under subsequent questions. 
+ 
  The complexity of a network is related to both the number of weights and the
  size of the weights. Model selection is concerned with the number of
! weights, and hence the number of hidden units and layers. The more weights
! there are, relative to the number of training cases, the more overfitting
! amplifies noise in the targets (Moody 1992). The other approaches listed
! above are concerned, directly or indirectly, with the size of the weights.
! Reducing the size of the weights reduces the "effective" number of
! weights--see Moody (1992) regarding weight decay and Weigend (1994)
! regarding early stopping. 
  
  References: 
***************
*** 194,197 ****
--- 215,222 ----
     the Bias/Variance Dilemma", Neural Computation, 4, 1-58. 
  
+    Moody, J.E. (1992), "The Effective Number of Parameters: An Analysis of
+    Generalization and Regularization in Nonlinear Learning Systems", NIPS 4,
+    847-854. 
+ 
     Sarle, W.S. (1995), "Stopped Training and Other Remedies for
     Overfitting," to appear in Proceedings of the 27th Symposium on the
***************
*** 199,202 ****
--- 224,234 ----
     large compressed postscript file, 747K, 10 pages) 
  
+    Smith, M. (1993), Neural Networks for Statistical Modeling, NY: Van
+    Nostrand Reinhold. 
+ 
+    Weigend, A. (1994), "On overfitting and the effective number of hidden
+    units," Proceedings of the 1993 Connectionist Models Summer School,
+    335-342. 
+ 
  ------------------------------------------------------------------------
  
***************
*** 207,216 ****
  Training with jitter is a form of smoothing related to kernel regression
  (see "What is GRNN?"). It is also closely related to regularization methods
! such as weight decay and ridge regression (see "What is weight decay?"). 
  
  Training with jitter works because the functions that we want NNs to learn
  are mostly smooth. NNs can learn functions with discontinuities, but the
! functions must be continuous in a finite number of regions if our network is
! restricted to a finite number of hidden units. 
  
  In other words, if we have two cases with similar inputs, the desired
--- 239,248 ----
  Training with jitter is a form of smoothing related to kernel regression
  (see "What is GRNN?"). It is also closely related to regularization methods
! such as weight decay and ridge regression. 
  
  Training with jitter works because the functions that we want NNs to learn
  are mostly smooth. NNs can learn functions with discontinuities, but the
! functions must be piecewise continuous in a finite number of regions if our
! network is restricted to a finite number of hidden units. 
  
  In other words, if we have two cases with similar inputs, the desired
***************
*** 235,241 ****
  definition, be the Nadaraya-Watson kernel regression estimator using the
  jitter density as the kernel. Hence, training with jitter is an
! approximation to training on the kernel regression estimator. And choosing
! the amount (variance) of jitter is equivalent to choosing the bandwidth of
! the kernel regression estimator (Scott 1992). 
  
  When studying nonlinear models such as feedforward NNs, it is often helpful
--- 267,273 ----
  definition, be the Nadaraya-Watson kernel regression estimator using the
  jitter density as the kernel. Hence, training with jitter is an
! approximation to training with the kernel regression estimator as target.
! Choosing the amount (variance) of jitter is equivalent to choosing the
! bandwidth of the kernel regression estimator (Scott 1992). 
  
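As an illustration of the idea (a minimal sketch, not code from any particular package), here is training with jitter on a toy problem: a small one-hidden-layer MLP trained by batch gradient descent, with fresh Gaussian noise added to the inputs each epoch while the targets stay fixed.

```python
import numpy as np

# Toy data: learn y = sin(x) from 50 noise-free samples on [-3, 3].
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = np.sin(X)

def train_with_jitter(X, y, sigma=0.1, epochs=2000, lr=0.05, hidden=8):
    """One-hidden-layer MLP (tanh hidden units, identity output)
    trained by batch gradient descent.  Each epoch the inputs are
    perturbed by fresh Gaussian jitter with standard deviation sigma;
    the targets are left unchanged.  All sizes here are arbitrary
    illustrative choices."""
    rng = np.random.default_rng(1)
    W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    n = len(X)
    for _ in range(epochs):
        Xj = X + rng.normal(0, sigma, X.shape)   # jittered inputs
        H = np.tanh(Xj @ W1 + b1)
        err = H @ W2 + b2 - y                    # output minus target
        # Backpropagate the mean-squared-error gradient.
        gW2 = H.T @ err / n
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1 = Xj.T @ dH / n
        W2 -= lr * gW2; b2 -= lr * err.mean(0)
        W1 -= lr * gW1; b1 -= lr * dH.mean(0)
    return W1, b1, W2, b2

W1, b1, W2, b2 = train_with_jitter(X, y)
pred = np.tanh(X @ W1 + b1) @ W2 + b2
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

Increasing sigma smooths the learned function further, at the cost of added bias, exactly as with a wider kernel bandwidth.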
  When studying nonlinear models such as feedforward NNs, it is often helpful
***************
*** 506,510 ****
  near discontinuities. Excessively large weights leading to output units can
  cause wild outputs far beyond the range of the data if the output activation
! function is not bounded to the same range as the data. 
  
  Other penalty terms besides the sum of squared weights are sometimes used. 
--- 538,544 ----
  near discontinuities. Excessively large weights leading to output units can
  cause wild outputs far beyond the range of the data if the output activation
! function is not bounded to the same range as the data. To put it another
! way, large weights can cause excessive variance of the output (Geman,
! Bienenstock, and Doursat 1992). 
  
  Other penalty terms besides the sum of squared weights are sometimes used. 
***************
*** 541,547 ****
  generalization error often requires vast amounts of computation. 
  
! Fortunately, there is a superior alternative to weight decay: hierarchical
  Bayesian estimation. Bayesian estimation makes it possible to estimate
! efficiently numerous decay constants. See "What is Bayesian estimation?" 
  
  References: 
--- 575,581 ----
  generalization error often requires vast amounts of computation. 
  
! Fortunately, there is a superior alternative to weight decay: hierarchical 
  Bayesian estimation. Bayesian estimation makes it possible to estimate
! efficiently numerous decay constants. 
  
  References: 
***************
*** 550,553 ****
--- 584,590 ----
     Oxford University Press. 
  
+    Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks and
+    the Bias/Variance Dilemma", Neural Computation, 4, 1-58. 
+ 
     Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge:
     Cambridge University Press. 
***************
*** 606,610 ****
  may get better generalization with a simple linear model than with a
  complicated nonlinear model if there is too little data or too much noise to
! estimate the nonlinear model accurately. 
  
  In MLPs with step/threshold/Heaviside activation functions, you need two
--- 643,647 ----
  may get better generalization with a simple linear model than with a
  complicated nonlinear model if there is too little data or too much noise to
! estimate the nonlinearities accurately. 
  
  In MLPs with step/threshold/Heaviside activation functions, you need two
***************
*** 612,633 ****
  Bishop (1995, 121-126). 
  
! In MLPs with any of a wide variety of nonlinear hidden-layer activation
! functions, one hidden layer suffices for the "universal approximation"
! property (e.g., Hornik, Stinchcombe and White 1989; Hornik 1993; for more
! references, see Bishop 1995, 130). In practice, a single hidden layer is
! sufficient if you have only one input. But if you have two or more inputs,
! an MLP with a single hidden layer may require a huge number of hidden units
! to fit simple smooth surfaces, and you can often get a better fit with fewer
! weights using two hidden layers (Chester 1990). More than two hidden layers
! can be useful in certain architectures such as cascade correlation (Fahlman
! and Lebiere 1990) and in special applications (Le Cun et al. 1989). 
! 
! Be wary of using any MLP with two or more inputs and fewer than 20 units in
! a single hidden layer, since you can easily get spurious ridges and valleys
! in the output function (see "How many hidden units should I use?)" 
  
  RBF networks are most often used with a single hidden layer. But an extra,
  linear hidden layer before the radial hidden layer enables the network to
! ignore irrelevant inputs (see How do MLPs compare with RBFs?") 
  
  References: 
--- 649,794 ----
  Bishop (1995, 121-126). 
  
! In MLPs with any of a wide variety of continuous nonlinear hidden-layer
! activation functions, one hidden layer with an arbitrarily large number of
! units suffices for the "universal approximation" property (e.g., Hornik,
! Stinchcombe and White 1989; Hornik 1993; for more references, see Bishop
! 1995, 130). But there is no theory yet to tell you how many hidden units are
! needed to approximate any given function. 
! 
! If you have only one input, there seems to be no advantage to using more
! than one hidden layer. But things get much more complicated when there are
! two or more inputs. To illustrate, examples with two inputs and one output
! will be used so that the results can be shown graphically. In each example
! there are 441 training cases on a regular 21-by-21 grid. The test sets have
! 1681 cases on a regular 41-by-41 grid over the same domain as the training
! set. If you are reading the HTML version of this document via a web browser,
! you can see surface plots based on the test set by clicking on the file
! names mentioned in the following text. Each plot is a gif file, approximately
! 9K in size. 
! 
! Consider a target function of two inputs, consisting of a Gaussian hill in
! the middle of a plane (hill.gif). An MLP with an identity output activation
! function can easily fit the hill by surrounding it with a few sigmoid
! (logistic, tanh, arctan, etc.) hidden units, but there will be spurious
! ridges and valleys where the plane should be flat (h_mlp_6.gif). It takes
! dozens of hidden units to flatten out the plane accurately (h_mlp_30.gif). 
! 
! Now suppose you use a logistic output activation function. As the input to a
! logistic function goes to negative infinity, the output approaches zero. The
! plane in the Gaussian target function also has a value of zero. If the
! weights and bias for the output layer yield large negative values outside
! the base of the hill, the logistic function will flatten out any spurious
! ridges and valleys. So fitting the flat part of the target function is easy 
! (h_mlpt_3_unsq.gif and h_mlpt_3.gif). But the logistic function also tends
! to lower the top of the hill. 
! 
! If instead of a rounded hill, the target function was a mesa with a large,
! flat top with a value of one, the logistic output activation function would
! be able to smooth out the top of the mesa just like it smooths out the plane
! below. Target functions like this, with large flat areas with values of
! either zero or one, are just what you have in many noise-free classification
! problems. In such cases, a single hidden layer is likely to work well. 
! 
! When using a logistic output activation function, it is common practice to
! scale the target values to a range of .1 to .9. Such scaling is bad in a
! noise-free classification problem, because it prevents the logistic function
! from smoothing out the flat areas (h_mlpt1-9_3.gif). 
! 
! For the Gaussian target function, [.1,.9] scaling would make it easier to
! fit the top of the hill, but would reintroduce undulations in the plane. It
! would be better for the Gaussian target function to scale the target values
! to a range of 0 to .9. But for a more realistic and complicated target
! function, how would you know the best way to scale the target values? 
! 
! By introducing a second hidden layer with one sigmoid activation function
! and returning to an identity output activation function, you can let the net
! figure out the best scaling (h_mlp1_3.gif). Actually, the bias and weight
! for the output layer scale the output rather than the target values, and you
! can use whatever range of target values is convenient. 
! 
! For more complicated target functions, especially those with several hills
! or valleys, it is useful to have several units in the second hidden layer.
! Each unit in the second hidden layer enables the net to fit a separate hill
! or valley. So an MLP with two hidden layers can often yield an accurate
! approximation with fewer weights than an MLP with one hidden layer
! (Chester 1990). 
! 
! To illustrate the use of multiple units in the second hidden layer, the next
! example resembles a landscape with a Gaussian hill and a Gaussian valley,
! both elliptical (hillanvale.gif). The table below gives the RMSE (root mean
! squared error) for the test set with various architectures. If you are
! reading the HTML version of this document via a web browser, click on any
! number in the table to see a surface plot of the corresponding network
! output. 
! 
! The MLP networks in the table have one or two hidden layers with a tanh
! activation function. The output activation function is the identity. Using a
! squashing function on the output layer is of no benefit for this function,
! since the only flat area in the function has a target value near the middle
! of the target range. 
! 
!           Hill and Valley Data: RMSE for the Test Set
!               (Number of weights in parentheses)
!                          MLP Networks
! 
! HUs in                  HUs in Second Layer
! First  ----------------------------------------------------------
! Layer    0           1           2           3           4
!  1     0.204(  5)  0.204(  7)  0.189( 10)  0.187( 13)  0.185( 16)
!  2     0.183(  9)  0.163( 11)  0.147( 15)  0.094( 19)  0.096( 23)
!  3     0.159( 13)  0.095( 15)  0.054( 20)  0.033( 25)  0.045( 30)
!  4     0.137( 17)  0.093( 19)  0.009( 25)  0.021( 31)  0.016( 37)
!  5     0.121( 21)  0.092( 23)              0.010( 37)  0.011( 44)
!  6     0.100( 25)  0.092( 27)              0.007( 43)  0.005( 51)
!  7     0.086( 29)  0.077( 31)
!  8     0.079( 33)  0.062( 35)
!  9     0.072( 37)  0.055( 39)
! 10     0.059( 41)  0.047( 43)
! 12     0.047( 49)  0.042( 51)
! 15     0.039( 61)  0.032( 63)
! 20     0.025( 81)  0.018( 83)  
! 25     0.021(101)  0.016(103)  
! 30     0.018(121)  0.015(123)  
! 40     0.012(161)  0.015(163)  
! 50     0.008(201)  0.014(203)  
! 
! For an MLP with only one hidden layer (column 0), Gaussian hills and valleys
! require a large number of hidden units to approximate well. When there is
! one unit in the second hidden layer, the network can represent one hill or
! valley easily, which is what happens with three to six units in the first
! hidden layer. But having only one unit in the second hidden layer is of
! little benefit for learning two hills or valleys. Using two units in the
! second hidden layer enables the network to approximate two hills or valleys
! easily; in this example, only four units are required in the first hidden
! layer to get an excellent fit. Each additional unit in the second hidden
! layer enables the network to learn another hill or valley with a relatively
! small number of units in the first hidden layer, as explained by Chester
! (1990). In this example, having three or four units in the second hidden
! layer helps little, and actually produces a worse approximation when there
! are four units in the first hidden layer due to problems with local minima. 
! 
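The weight counts in parentheses in the table follow directly from the layer sizes, counting one bias for every hidden and output unit. A quick sketch of the arithmetic (2 inputs and 1 output in the example above):

```python
def mlp_weight_count(n_in, h1, h2=0, n_out=1):
    """Total number of weights, including biases, for a fully
    connected MLP with one hidden layer (h2 = 0) or two hidden
    layers (h1 units, then h2 units)."""
    if h2 == 0:
        return (n_in + 1) * h1 + (h1 + 1) * n_out
    return (n_in + 1) * h1 + (h1 + 1) * h2 + (h2 + 1) * n_out
```

For instance, one hidden unit and no second layer gives 5 weights, and 3-then-2 hidden units gives 20, matching the table.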
! Unfortunately, using two hidden layers exacerbates the problem of local
! minima, and it is important to use lots of random initializations or other
! methods for global optimization. Local minima with two hidden layers can
! have extreme spikes or blades even when the number of weights is much
! smaller than the number of training cases. One of the few advantages of 
! standard backprop is that it is so slow that spikes and blades will not
! become very sharp for practical training times. 
! 
! More than two hidden layers can be useful in certain architectures such as
! cascade correlation (Fahlman and Lebiere 1990) and in special applications,
! such as the two-spirals problem (Lang and Witbrock 1988) and ZIP code
! recognition (Le Cun et al. 1989). 
  
  RBF networks are most often used with a single hidden layer. But an extra,
  linear hidden layer before the radial hidden layer enables the network to
! ignore irrelevant inputs (see "How do MLPs compare with RBFs?"). The linear
! hidden layer allows the RBFs to take elliptical, rather than radial
! (circular), shapes in the space of the inputs. Hence the linear layer gives
! you an elliptical basis function (EBF) network. In the hill and valley
! example, an ORBFUN network requires nine hidden units (37 weights) to get
! the test RMSE below .01, but by adding a linear hidden layer, you can get an
! essentially perfect fit with three linear units followed by two radial units
! (20 weights). 
  
  References: 
***************
*** 649,652 ****
--- 810,818 ----
     Neural Networks, 6, 1069-1072. 
  
+    Lang, K.J. and Witbrock, M.J. (1988), "Learning to tell two spirals
+    apart," in Touretzky, D., Hinton, G., and Sejnowski, T., eds., 
+    Proceedings of the 1988 Connectionist Models Summer School, San Mateo,
+    CA: Morgan Kaufmann. 
+ 
   Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E.,
     Hubbard, W., and Jackel, L.D. (1989), "Backpropagation applied to
***************
*** 687,694 ****
  must simply try many networks with different numbers of hidden units, 
  estimate the generalization error for each one, and choose the network with
! the minimum estimated generalization error. However, there is little point
! in trying a network with more weights than training cases, since such a
! large network is likely to overfit. 
  
  If you are using early stopping, it is essential to use lots of hidden units
  to avoid bad local optima (Sarle 1995). There seems to be no upper limit on
--- 853,866 ----
  must simply try many networks with different numbers of hidden units, 
  estimate the generalization error for each one, and choose the network with
! the minimum estimated generalization error. 
  
+ Using conventional optimization algorithms (see "What are conjugate
+ gradients, Levenberg-Marquardt, etc.?"), there is little point in trying a
+ network with more weights than training cases, since such a large network is
+ likely to overfit. But Lawrence, Giles, and Tsoi (1996) have shown that
+ standard online backprop can have considerable difficulty reducing training
+ error to a level near the globally optimal value, hence using "oversize"
+ networks can reduce both training error and generalization error. 
+ 
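As an illustration of choosing complexity by estimated generalization error (a hypothetical example, with polynomial degree standing in for the number of hidden units and a simple split-sample estimate standing in for fancier methods):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

# Split the data into a training half and a validation half.
xt, xv = x[:100], x[100:]
yt, yv = y[:100], y[100:]

def holdout_rmse(degree):
    """Fit on the training half and return the RMSE on the
    held-out half as an estimate of generalization error."""
    pred = np.polyval(np.polyfit(xt, yt, degree), xv)
    return float(np.sqrt(np.mean((pred - yv) ** 2)))

errors = {d: holdout_rmse(d) for d in range(1, 12)}
best = min(errors, key=errors.get)   # model with lowest estimated error
```

The same try-and-compare loop applies with hidden-unit counts in place of degrees; only the fitting step changes.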
  If you are using early stopping, it is essential to use lots of hidden units
  to avoid bad local optima (Sarle 1995). There seems to be no upper limit on
***************
*** 735,738 ****
--- 907,917 ----
     the Bias/Variance Dilemma", Neural Computation, 4, 1-58. 
  
+    Lawrence, S., Giles, C.L., and Tsoi, A.C. (1996), "What size neural
+    network gives optimal generalization? Convergence properties of
+    backpropagation," Technical Report UMIACS-TR-96-22 and CS-TR-3617,
+    Institute for Advanced Computer Studies, University of Maryland, College
+    Park, MD 20742,
+    http://www.neci.nj.nec.com/homepages/lawrence/papers/minima-tr96/minima-tr96.html
+ 
     Neal, R.M. (1995), Bayesian Learning for Neural Networks, Ph.D. thesis,
     University of Toronto, ftp://ftp.cs.toronto.edu/pub/radford/thesis.ps.Z. 
***************
*** 882,885 ****
  ------------------------------------------------------------------------
  
! Next part is part 4 (of 7). Previous part is part 2. @
  
--- 1061,1064 ----
  ------------------------------------------------------------------------
  
! Next part is part 4 (of 7). Previous part is part 2. 
  

==> nn4.changes.body <==
*** nn4.oldbody	Tue May 28 23:00:24 1996
--- nn4.body	Fri Jun 28 23:00:24 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1996-06-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 41,53 ****
  American, 267 (September), 144-151. 
  
! The best elementary textbooks on NNs
! ------------------------------------
  
! Masters, Timothy (1994). Practical Neural Network Recipes in C++, Academic
! Press, ISBN 0-12-479040-2, US $45 incl. disks.
! "Lots of very good practical advice which most other books lack."
  
  Weiss, S.M. & Kulikowski, C.A. (1991), Computer Systems That Learn,
  Morgan Kaufmann. ISBN 1 55860 065 5. 
  
  The best intermediate textbooks on NNs
--- 41,72 ----
  American, 267 (September), 144-151. 
  
! The best elementary textbooks on using NNs
! ------------------------------------------
  
! Smith, M. (1993). Neural Networks for Statistical Modeling, NY: Van Nostrand
! Reinhold. 
! Smith is not a statistician, but he tries. The book has entire brief
! chapters on overfitting and validation (early stopping and split-sample
! validation, which he incorrectly calls cross-validation), putting it
! a rung above most other introductions to NNs. There are also brief chapters
! on data preparation and diagnostic plots, topics usually ignored in
! elementary NN books. Only feedforward nets are covered in any detail. 
  
  Weiss, S.M. & Kulikowski, C.A. (1991), Computer Systems That Learn,
  Morgan Kaufmann. ISBN 1 55860 065 5. 
+ Briefly covers at a very elementary level feedforward nets, linear and
+ nearest-neighbor discriminant analysis, trees, and expert systems. For a book
+ at this level, it has an unusually good chapter on estimating generalization
+ error, including bootstrapping. 
+ 
+ The best elementary textbook on using and programming NNs
+ ---------------------------------------------------------
+ 
+ Masters, Timothy (1994). Practical Neural Network Recipes in C++, Academic
+ Press, ISBN 0-12-479040-2, US $45 incl. disks.
+ Masters has written three exceptionally good books on NNs (the two others
+ are listed below). He combines generally sound practical advice with some
+ basic statistical knowledge to produce a programming text that is far
+ superior to the competition (see "The Worst" below). 
  
  The best intermediate textbooks on NNs
***************
*** 107,111 ****
  
  Masters, T. (1994), Signal and Image Processing with Neural Networks: A
! C++ Sourcebook, Wiley.
  
  Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization
--- 126,130 ----
  
  Masters, T. (1994), Signal and Image Processing with Neural Networks: A
! C++ Sourcebook, NY: Wiley.
  
  Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization
***************
*** 127,130 ****
--- 146,155 ----
  MA. 
  
+ The best book on neurofuzzy systems
+ -----------------------------------
+ 
+ Brown, M., and Harris, C. (1994), Neurofuzzy Adaptive Modelling and
+ Control, NY: Prentice Hall. 
+ 
  The best comparison of NNs with other classification methods
  ------------------------------------------------------------
***************
*** 137,153 ****
  
  Aleksander, I. and Morton, H. (1990). An Introduction to Neural Computing.
! Chapman and Hall. (ISBN 0-412-37780-2). Comments: "This book seems to be
! intended for the first year of university education."
  
  Beale, R. and Jackson, T. (1990). Neural Computing, an Introduction. Adam
! Hilger, IOP Publishing Ltd : Bristol. (ISBN 0-85274-262-2). Comments: "It's
! clearly written. Lots of hints as to how to get the adaptive models covered
! to work (not always well explained in the original sources). Consistent
! mathematical terminology. Covers perceptrons, error-backpropagation, Kohonen
! self-org model, Hopfield type models, ART, and associative memories."
  
  Dayhoff, J. E. (1990). Neural Network Architectures: An Introduction. Van
! Nostrand Reinhold: New York. Comments: "Like Wasserman's book, Dayhoff's
! book is also very easy to understand".
  
  Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures,
--- 162,193 ----
  
  Aleksander, I. and Morton, H. (1990). An Introduction to Neural Computing.
! Chapman and Hall. (ISBN 0-412-37780-2). 
! Comments: "This book seems to be intended for the first year of university
! education."
  
  Beale, R. and Jackson, T. (1990). Neural Computing, an Introduction. Adam
! Hilger, IOP Publishing Ltd : Bristol. (ISBN 0-85274-262-2). 
! Comments: "It's clearly written. Lots of hints as to how to get the adaptive
! models covered to work (not always well explained in the original sources).
! Consistent mathematical terminology. Covers perceptrons,
! error-backpropagation, Kohonen self-org model, Hopfield type models, ART,
! and associative memories."
! 
! Caudill, M. and Butler, C. (1990). Naturally Intelligent Systems. MIT Press:
! Cambridge, Massachusetts. (ISBN 0-262-03156-6). 
! The authors try to translate mathematical formulas into English. The results
! are likely to disturb people who appreciate either mathematics or English.
! Have the authors never heard that "a picture is worth a thousand words"?
! What few diagrams they have (such as the one on p. 74) tend to be confusing.
! Their jargon is peculiar even by NN standards. 
! 
! Chester, M. (1993). Neural Networks: A Tutorial, Englewood Cliffs, NJ: PTR
! Prentice Hall. 
! Shallow, sometimes confused, especially with regard to Kohonen networks. 
  
  Dayhoff, J. E. (1990). Neural Network Architectures: An Introduction. Van
! Nostrand Reinhold: New York. 
! Comments: "Like Wasserman's book, Dayhoff's book is also very easy to
! understand".
  
  Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures,
***************
*** 155,162 ****
  published as a Prentice Hall International Edition, ISBN 0-13-042250-9.
  Sample software (source code listings in C and Fortran) is included in an
! Instructor's Manual. "Intermediate in level between Wasserman and
! Hertz/Krogh/Palmer. Algorithms for a broad range of neural networks,
! including a chapter on Adaptive Resonace Theory with ART2. Simple examples
! for each network."
  
  Freeman, James (1994). Simulating Neural Networks with Mathematica,
--- 195,202 ----
  published as a Prentice Hall International Edition, ISBN 0-13-042250-9.
  Sample software (source code listings in C and Fortran) is included in an
! Instructor's Manual.
! "Intermediate in level between Wasserman and Hertz/Krogh/Palmer. Algorithms
! for a broad range of neural networks, including a chapter on Adaptive
! Resonance Theory with ART2. Simple examples for each network."
  
  Freeman, James (1994). Simulating Neural Networks with Mathematica,
***************
*** 166,169 ****
--- 206,214 ----
  World Wide Web.
  
+ Freeman, J.A. and Skapura, D.M. (1991). Neural Networks: Algorithms,
+ Applications, and Programming Techniques, Reading, MA: Addison-Wesley. 
+ A good book for beginning programmers who want to learn how to write NN
+ programs while avoiding any understanding of what NNs do or why they do it. 
+ 
  Gately, E. (1996). Neural Networks for Financial Forecasting. New York:
  John Wiley and Sons, Inc.
***************
*** 176,271 ****
  * Nothing here for those with any neural net experience
  
! Haykin, S. (1994). Neural Networks, a Comprehensive Foundation.
! Macmillan, New York, NY.
! "A very readable, well written intermediate text on NNs Perspective is
! primarily one of pattern recognition, estimation and signal processing.
! However, there are well-written chapters on neurodynamics and VLSI
! implementation. Though there is emphasis on formal mathematical models of
! NNs as universal approximators, statistical estimators, etc., there are also
! examples of NNs used in practical applications. The problem sets at the end
! of each chapter nicely complement the material. In the bibliography are over
! 1000 references."
! 
! Hecht-Nielsen, R. (1990). Neurocomputing. Addison Wesley. Comments: "A
! good book", "comprises a nice historical overview and a chapter about NN
! hardware. Well structured prose. Makes important concepts clear."
  
  McClelland, J. L. and Rumelhart, D. E. (1988). Explorations in Parallel
  Distributed Processing: Computational Models of Cognition and Perception
! (software manual). The MIT Press. Comments: "Written in a tutorial style,
! and includes 2 diskettes of NN simulation programs that can be compiled on
! MS-DOS or Unix (and they do too !)"; "The programs are pretty reasonable as
! an introduction to some of the things that NNs can do."; "There are *two*
! editions of this book. One comes with disks for the IBM PC, the other comes
! with disks for the Macintosh".
  
  McCord Nelson, M. and Illingworth, W.T. (1990). A Practical Guide to Neural
! Nets. Addison-Wesley Publishing Company, Inc. (ISBN 0-201-52376-0).
! Comments: "No formulas at all"; "It does not have much detailed model
! development (very few equations), but it does present many areas of
! application. It includes a chapter on current areas of research. A variety
! of commercial applications is discussed in chapter 1. It also includes a
! program diskette with a fancy graphical interface (unlike the PDP
! diskette)".
  
  Muller, B., Reinhardt, J., Strickland, M. T. (1995). Neural Networks. An
  Introduction (2nd ed.). Berlin, Heidelberg, New York: Springer-Verlag. ISBN
! 3-540-60207-0. (DOS 3.5" disk included.) Comments: The book was developed
! out of a course on neural-network models with computer demonstrations that
! was taught by the authors to Physics students. The book comes together with
! a PC-diskette. The book is divided into three parts: (1) Models of Neural
! Networks; describing several architectures and learing rules, including the
! mathematics. (2) Statistical Physiscs of Neural Networks; "hard-core"
! physics section developing formal theories of stochastic neural networks.
! (3) Computer Codes; explanation about the demonstration programs. First part
! gives a nice introduction into neural networks together with the formulas.
! Together with the demonstration programs a 'feel' for neural networks can be
! developed.
  
  Orchard, G.A. & Phillips, W.A. (1991). Neural Computation: A Beginner's
! Guide. Lawrence Earlbaum Associates: London. Comments: "Short user-friendly
! introduction to the area, with a non-technical flavour. Apparently
! accompanies a software package, but I haven't seen that yet".
  
  Rao, V.B & H.V. (1993). C++ Neural Networks and Fuzzy Logic. MIS:Press,
! ISBN 1-55828-298-x, US $45 incl. disks. "Probably not 'leading edge' stuff
! but detailed enough to get your hands dirty!"
  
  Wasserman, P. D. (1989). Neural Computing: Theory & Practice. Van Nostrand
! Reinhold: New York. (ISBN 0-442-20743-3) Comments: "Wasserman flatly
! enumerates some common architectures from an engineer's perspective ('how it
! works') without ever addressing the underlying fundamentals ('why it works')
! - important basic concepts such as clustering, principal components or
! gradient descent are not treated. It's also full of errors, and unhelpful
! diagrams drawn with what appears to be PCB board layout software from the
! '70s. For anyone who wants to do active research in the field I consider it
! quite inadequate"; "Okay, but too shallow"; "Quite easy to understand"; "The
! best bedtime reading for Neural Networks. I have given this book to numerous
! collegues who want to know NN basics, but who never plan to implement
! anything. An excellent book to give your manager."
! 
! Wasserman, P.D. (1993). Advanced Methods in Neural Computing. Van
! Nostrand Reinhold: New York (ISBN: 0-442-00461-3). Comments: Several neural
! network topics are discussed e.g. Probalistic Neural Networks,
! Backpropagation and beyond, neural control, Radial Basis Function Networks,
! Neural Engineering. Furthermore, several subjects related to neural networks
! are mentioned e.g. genetic algorithms, fuzzy logic, chaos. Just the
! functionality of these subjects is described; enough to get you started.
! Lots of references are given to more elaborate descriptions. Easy to read,
! no extensive mathematical background necessary.
! 
! Zurada, Jacek M. (1992). Introduction To Artificial Neural Systems.
! Hardcover, 785 Pages, 317 Figures, ISBN 0-534-95460-X, 1992, PWS Publishing
! Company, Price: $56.75 (includes shipping, handling, and the ANS software
! diskette). Solutions Manual available.
! "Cohesive and comprehensive book on neural nets; as an engineering-oriented
! introduction, but also as a research foundation. Thorough exposition of
! fundamentals, theory and applications. Training and recall algorithms appear
! in boxes showing steps of algorithms, thus making programming of learning
! paradigms easy. Many illustrations and intuitive examples. Winner among NN
! textbooks at a senior UG/first-year graduate level [175 problems]."
! Contents: Intro, Fundamentals of Learning, Single-Layer & Multilayer
! Perceptron NN, Assoc. Memories, Self-organizing and Matching Nets,
! Applications, Implementations, Appendix.
  
  The Classics:
--- 221,280 ----
  * Nothing here for those with any neural net experience
  
! Hecht-Nielsen, R. (1990). Neurocomputing. Addison Wesley. 
! Comments: "A good book", "comprises a nice historical overview and a chapter
! about NN hardware. Well structured prose. Makes important concepts clear."
  
  McClelland, J. L. and Rumelhart, D. E. (1988). Explorations in Parallel
  Distributed Processing: Computational Models of Cognition and Perception
! (software manual). The MIT Press. 
! Comments: "Written in a tutorial style, and includes 2 diskettes of NN
! simulation programs that can be compiled on MS-DOS or Unix (and they do,
! too!)"; "The programs are pretty reasonable as an introduction to some of the
! things that NNs can do."; "There are *two* editions of this book. One comes
! with disks for the IBM PC, the other comes with disks for the Macintosh".
  
  McCord Nelson, M. and Illingworth, W.T. (1990). A Practical Guide to Neural
! Nets. Addison-Wesley Publishing Company, Inc. (ISBN 0-201-52376-0). 
! Lots of applications without technical details, lots of hype, lots of goofs,
! no formulas.
  
  Muller, B., Reinhardt, J., Strickland, M. T. (1995). Neural Networks. An
  Introduction (2nd ed.). Berlin, Heidelberg, New York: Springer-Verlag. ISBN
! 3-540-60207-0. (DOS 3.5" disk included.) 
! Comments: The book was developed out of a course on neural-network models
! with computer demonstrations that was taught by the authors to Physics
! students. The book comes together with a PC-diskette. The book is divided
! into three parts: (1) Models of Neural Networks, describing several
! architectures and learning rules, including the mathematics; (2) Statistical
! Physics of Neural Networks, a "hard-core" physics section developing formal
! theories of stochastic neural networks; (3) Computer Codes, explaining the
! demonstration programs. The first part gives a nice introduction to neural
! networks, together with the formulas; combined with the demonstration
! programs, a 'feel' for neural networks can be developed.
  
  Orchard, G.A. & Phillips, W.A. (1991). Neural Computation: A Beginner's
! Guide. Lawrence Erlbaum Associates: London. 
! Comments: "Short user-friendly introduction to the area, with a
! non-technical flavour. Apparently accompanies a software package, but I
! haven't seen that yet".
  
  Rao, V.B & H.V. (1993). C++ Neural Networks and Fuzzy Logic. MIS:Press,
! ISBN 1-55828-298-x, US $45 incl. disks. 
! "Probably not 'leading edge' stuff but detailed enough to get your hands
! dirty!"
  
  Wasserman, P. D. (1989). Neural Computing: Theory & Practice. Van Nostrand
! Reinhold: New York. (ISBN 0-442-20743-3) 
! Comments: "Wasserman flatly enumerates some common architectures from an
! engineer's perspective ('how it works') without ever addressing the
! underlying fundamentals ('why it works') - important basic concepts such as
! clustering, principal components or gradient descent are not treated. It's
! also full of errors, and unhelpful diagrams drawn with what appears to be
! PCB board layout software from the '70s. For anyone who wants to do active
! research in the field I consider it quite inadequate"; "Okay, but too
! shallow"; "Quite easy to understand"; "The best bedtime reading for Neural
! Networks. I have given this book to numerous colleagues who want to know NN
! basics, but who never plan to implement anything. An excellent book to give
! your manager."
  
  The Classics:
***************
*** 273,285 ****
  
  Kohonen, T. (1984). Self-organization and Associative Memory.
! Springer-Verlag: New York. (2nd Edition: 1988; 3rd edition: 1989). Comments:
! "The section on Pattern mathematics is excellent."
  
  Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed
  Processing: Explorations in the Microstructure of Cognition (volumes 1 & 2).
! The MIT Press. Comments: "As a computer scientist I found the two Rumelhart
! and McClelland books really heavy going and definitely not the sort of thing
! to read if you are a beginner."; "It's quite readable, and affordable (about
! $65 for both volumes)."; "THE Connectionist bible".
  
  Introductory Journal Articles:
--- 282,295 ----
  
  Kohonen, T. (1984). Self-organization and Associative Memory.
! Springer-Verlag: New York. (2nd Edition: 1988; 3rd edition: 1989). 
! Comments: "The section on Pattern mathematics is excellent."
  
  Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed
  Processing: Explorations in the Microstructure of Cognition (volumes 1 & 2).
! The MIT Press. 
! Comments: "As a computer scientist I found the two Rumelhart and McClelland
! books really heavy going and definitely not the sort of thing to read if you
! are a beginner."; "It's quite readable, and affordable (about $65 for both
! volumes)."; "THE Connectionist bible".
  
  Introductory Journal Articles:
***************
*** 287,301 ****
  
  Hinton, G. E. (1989). Connectionist learning procedures. Artificial
! Intelligence, Vol. 40, pp. 185--234. Comments: "One of the better neural
! networks overview papers, although the distinction between network topology
! and learning algorithm is not always very clear. Could very well be used as
! an introduction to neural networks."
  
  Knight, K. (1990). Connectionist Ideas and Algorithms. Communications of
! the ACM. November 1990. Vol.33 nr.11, pp 59-74. Comments: "A good article,
! and for most people it is easy to find a copy of this journal."
  
  Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks,
! vol. 1, no. 1. pp. 3-16. Comments: "A general review".
  
  Not-quite-so-introductory Literature:
--- 297,319 ----
  
  Hinton, G. E. (1989). Connectionist learning procedures. Artificial
! Intelligence, Vol. 40, pp. 185--234. 
! Comments: "One of the better neural networks overview papers, although the
! distinction between network topology and learning algorithm is not always
! very clear. Could very well be used as an introduction to neural networks."
  
  Knight, K. (1990). Connectionist Ideas and Algorithms. Communications of
! the ACM. November 1990. Vol.33 nr.11, pp 59-74. 
! Comments: "A good article, and for most people it is easy to find a copy of
! this journal."
  
  Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks,
! vol. 1, no. 1. pp. 3-16. 
! Comments: "A general review".
! 
! Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning
! representations by back-propagating errors. Nature, vol 323 (9 October), pp.
! 533-536. 
! Comments: "Gives a very good potted explanation of backprop NN's. It gives
! sufficient detail to write your own NN simulation."
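The comment above says the paper gives enough detail to write your own simulator. As a rough illustration of the algorithm it describes (a sketch, not the paper's own code), here is a one-hidden-layer sigmoid net trained by backpropagation of squared error; the network size, learning rate, and the OR-gate training set are invented for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_hid=2, epochs=4000, lr=0.5, seed=1):
    """Backprop for a 1-hidden-layer sigmoid net; returns a predict function."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # w[i][j]: input i -> hidden j (last row is the bias); v[j]: hidden j -> output
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in + 1)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]

    def forward(x):
        xb = list(x) + [1.0]                       # inputs plus bias
        h = [sigmoid(sum(xb[i] * w[i][j] for i in range(n_in + 1)))
             for j in range(n_hid)]
        hb = h + [1.0]                             # hidden layer plus bias
        y = sigmoid(sum(hb[j] * v[j] for j in range(n_hid + 1)))
        return xb, h, hb, y

    for _ in range(epochs):
        for x, t in data:
            xb, h, hb, y = forward(x)
            dy = (y - t) * y * (1.0 - y)           # error delta at the output
            dh = [dy * v[j] * h[j] * (1.0 - h[j])  # deltas propagated back
                  for j in range(n_hid)]
            for j in range(n_hid + 1):             # update hidden->output weights
                v[j] -= lr * dy * hb[j]
            for i in range(n_in + 1):              # update input->hidden weights
                for j in range(n_hid):
                    w[i][j] -= lr * dh[j] * xb[i]
    return lambda x: forward(x)[3]

# learn the (linearly separable) OR function
net = train([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)])
```

After training, rounding the outputs recovers the OR truth table.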
  
  Not-quite-so-introductory Literature:
***************
*** 303,324 ****
  
  Anderson, J. A. and Rosenfeld, E. (Eds). (1988). Neurocomputing:
! Foundations of Research. The MIT Press: Cambridge, MA. Comments: "An
! expensive book, but excellent for reference. It is a collection of reprints
! of most of the major papers in the field." 
  
  Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (Eds). (1990). 
! Neurocomputing 2: Directions for Research. The MIT Press: Cambridge, MA.
  Comments: "The sequel to their well-known Neurocomputing book."
  
! Caudill, M. and Butler, C. (1990). Naturally Intelligent Systems. MIT Press:
! Cambridge, Massachusetts. (ISBN 0-262-03156-6). Comments: "I guess one of
! the best books I read"; "May not be suited for people who want to do some
! research in the area".
  
  Khanna, T. (1990). Foundations of Neural Networks. Addison-Wesley: New
! York. Comments: "Not so bad (with a page of erroneous formulas (if I
! remember well), and #hidden layers isn't well described)."; "Khanna's
! intention in writing his book with math analysis should be commended but he
! made several mistakes in the math part".
  
  Kung, S.Y. (1993). Digital Neural Networks, Prentice Hall, Englewood
--- 321,355 ----
  
  Anderson, J. A. and Rosenfeld, E. (Eds). (1988). Neurocomputing:
! Foundations of Research. The MIT Press: Cambridge, MA. 
! Comments: "An expensive book, but excellent for reference. It is a
! collection of reprints of most of the major papers in the field." 
  
  Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (Eds). (1990). 
! Neurocomputing 2: Directions for Research. The MIT Press: Cambridge, MA. 
  Comments: "The sequel to their well-known Neurocomputing book."
  
! Bourlard, H.A., and Morgan, N. (1994), Connectionist Speech Recognition: A
! Hybrid Approach, Boston: Kluwer Academic Publishers.
! 
! Deco, G. and Obradovic, D. (1996), An Information-Theoretic Approach to
! Neural Computing, NY: Springer-Verlag. 
! 
! Haykin, S. (1994). Neural Networks, a Comprehensive Foundation.
! Macmillan, New York, NY.
! "A very readable, well-written intermediate text on NNs. The perspective is
! primarily one of pattern recognition, estimation and signal processing.
! However, there are well-written chapters on neurodynamics and VLSI
! implementation. Though there is emphasis on formal mathematical models of
! NNs as universal approximators, statistical estimators, etc., there are also
! examples of NNs used in practical applications. The problem sets at the end
! of each chapter nicely complement the material. In the bibliography are over
! 1000 references."
  
  Khanna, T. (1990). Foundations of Neural Networks. Addison-Wesley: New
! York. 
! Comments: "Not so bad (with a page of erroneous formulas (if I remember
! well), and #hidden layers isn't well described)."; "Khanna's intention in
! writing his book with math analysis should be commended but he made several
! mistakes in the math part".
  
  Kung, S.Y. (1993). Digital Neural Networks, Prentice Hall, Englewood
***************
*** 326,349 ****
  
  Levine, D. S. (1990). Introduction to Neural and Cognitive Modeling.
! Lawrence Erlbaum: Hillsdale, N.J. Comments: "Highly recommended".
  
  Lippmann, R. P. (April 1987). An introduction to computing with neural nets.
  IEEE Acoustics, Speech, and Signal Processing Magazine. vol. 2, no. 4, pp
! 4-22. Comments: "Much acclaimed as an overview of neural networks, but
! rather inaccurate on several points. The categorization into binary and
! continuous-valued input neural networks is rather arbitrary, and may be
! confusing for the inexperienced reader. Not all networks discussed are of
! equal importance."
  
  Maren, A., Harston, C. and Pap, R., (1990). Handbook of Neural Computing
! Applications. Academic Press. ISBN: 0-12-471260-6. (451 pages) Comments:
! "They cover a broad area"; "Introductory with suggested applications
! implementation".
  
  Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks.
! Addison-Wesley Publishing Company, Inc. (ISBN 0-201-12584-6) Comments: "An
! excellent book that ties together classical approaches to pattern
! recognition with Neural Nets. Most other NN books do not even mention
! conventional approaches."
  
  Refenes, A. (Ed.) (1995). Neural Networks in the Capital Markets.
--- 357,382 ----
  
  Levine, D. S. (1990). Introduction to Neural and Cognitive Modeling.
! Lawrence Erlbaum: Hillsdale, N.J. 
! Comments: "Highly recommended".
  
  Lippmann, R. P. (April 1987). An introduction to computing with neural nets.
  IEEE Acoustics, Speech, and Signal Processing Magazine. vol. 2, no. 4, pp
! 4-22. 
! Comments: "Much acclaimed as an overview of neural networks, but rather
! inaccurate on several points. The categorization into binary and
! continuous-valued input neural networks is rather arbitrary, and may be
! confusing for the inexperienced reader. Not all networks discussed are of
! equal importance."

==> nn5.changes.body <==
*** nn5.oldbody	Tue May 28 23:00:28 1996
--- nn5.body	Fri Jun 28 23:00:28 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part5
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ5.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part5
! Last-modified: 1996-06-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ5.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 55,59 ****
  20. Fuzzy ARTmap 
  21. PYGMALION 
! 22. Basis-of-AI-backprop 
  23. Matrix Backpropagation 
  24. WinNN 
--- 55,59 ----
  20. Fuzzy ARTmap 
  21. PYGMALION 
! 22. Basis-of-AI-NN Software 
  23. Matrix Backpropagation 
  24. WinNN 
***************
*** 64,67 ****
--- 64,68 ----
  29. NeoC Explorer 
  30. AINET 
+ 31. DemoGNG 
  
  See also http://www.emsl.pnl.gov:2080/docs/cie/neural/systems/shareware.html
***************
*** 152,157 ****
     /pub/mac/dartnet.sit.hqx (124 KB). 
  
! 10. SNNS
! ++++++++
  
     "Stuttgart Neural Network Simulator" from the University of Stuttgart,
--- 153,158 ----
     /pub/mac/dartnet.sit.hqx (124 KB). 
  
! 10. SNNS 4.1
! ++++++++++++
  
     "Stuttgart Neural Network Simulator" from the University of Stuttgart,
***************
*** 175,179 ****
     a workstation cluster. Available for ftp from
     ftp.informatik.uni-stuttgart.de [129.69.211.2] in directory /pub/SNNS as 
!    SNNSv4.0.tar.gz (1.4 MB, Source code) and SNNSv4.0.Manual.ps.gz (1 MB,
     Documentation). There are also various other files in this directory
     (e.g. the source version of the manual, a Sun Sparc executable, older
--- 176,180 ----
     a workstation cluster. Available for ftp from
     ftp.informatik.uni-stuttgart.de [129.69.211.2] in directory /pub/SNNS as 
!    SNNSv4.1.tar.gz (1.4 MB, Source code) and SNNSv4.1.Manual.ps.gz (1 MB,
     Documentation). There are also various other files in this directory
     (e.g. the source version of the manual, a Sun Sparc executable, older
***************
*** 180,186 ****
     versions of the software, some papers, an implementation manual, and the
     software in several smaller parts). It may be best to first have a look
!    at the file SNNSv4.0.Readme. This file contains a somewhat more elaborate
     short description of the simulator. More information can be found in the
!    WWW under http://vasarely.informatik.uni-stuttgart.de/snns/snns.html 
  
  11. Aspirin/MIGRAINES
--- 181,188 ----
     versions of the software, some papers, an implementation manual, and the
     software in several smaller parts). It may be best to first have a look
!    at the file SNNSv4.1.Readme. This file contains a somewhat more elaborate
     short description of the simulator. More information can be found in the
!    WWW under 
!    http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/snns.html 
  
  11. Aspirin/MIGRAINES
***************
*** 381,406 ****
     imag.imag.fr: archive/pygmalion/pygmalion.tar.Z). 
  
! 22. Basis-of-AI-backprop
! ++++++++++++++++++++++++
  
!    Earlier versions have been posted in comp.sources.misc and people around
!    the world have used them and liked them. This package is free for
!    ordinary users but shareware for businesses and government agencies
!    ($200/copy, but then for this you get the professional version as well).
!    I do support this package via email. Some of the highlights are: 
!     o in C for UNIX and DOS and DOS binaries 
!     o gradient descent, delta-bar-delta and quickprop 
!     o extra fast 16-bit fixed point weight version as well as a conventional
!       floating point version 
!     o recurrent networks 
!     o numerous sample problems 
!    Available for ftp from ftp.mcs.com in directory /mcsnet.users/drt. Or see
!    the WWW page http://www.mcs.com/~drt/home.html. The expanded professional
!    version is $30/copy for ordinary individuals including academics and
!    $200/copy for businesses and government agencies (improved user
!    interface, more activation functions, networks can be read into your own
!    programs, dynamic node creation, weight decay, SuperSAB). More details
!    can be found in the documentation for the student version. Contact: Don
!    Tveter; 5228 N. Nashville Ave.; Chicago, Illinois 60656; drt@mcs.com 
  
  23. Matrix Backpropagation
--- 383,410 ----
     imag.imag.fr: archive/pygmalion/pygmalion.tar.Z). 
  
! 22. Basis-of-AI-NN Software
! +++++++++++++++++++++++++++
! 
!    DOS and UNIX C source code, examples and DOS binaries are available in
!    the following different program sets: 
! 
!       [backprop, quickprop, delta-bar-delta, recurrent networks],
!       [simple clustering, k-nearest neighbor, LVQ1, DSM],
!       [Hopfield, Boltzmann, interactive activation network],
!       [feedforward counterpropagation],
!       [ART I],
!       [a simple BAM] and
!       [the linear pattern classifier]
!       
! 
!    For details see: Basis of AI NN software at
!    http://www.mcs.com/~drt/svbp.html . 
! 
!    An improved professional version of backprop is also available, $30 for
!    regular people, $200 for businesses and governmental agencies. See: Basis
!    of AI Professional Backprop at http://www.mcs.com/~drt/probp.html . 
  
!    Questions to: Don Tveter, drt@mcs.com 
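As an illustration of one algorithm named in the program sets above (a sketch, not Don Tveter's code), LVQ1 moves the prototype nearest to a training sample toward it when their class labels agree, and away from it otherwise. The 1-D data, prototype positions, and learning rate here are invented for the example.

```python
def lvq1(data, prototypes, lr=0.1, epochs=20):
    """LVQ1: data is a list of (x, label); prototypes a list of [x, label]."""
    for _ in range(epochs):
        for x, label in data:
            # find the prototype nearest to the sample
            p = min(prototypes, key=lambda q: (q[0] - x) ** 2)
            if p[1] == label:
                p[0] += lr * (x - p[0])   # same class: pull toward sample
            else:
                p[0] -= lr * (x - p[0])   # wrong class: push away
    return prototypes

# two well-separated 1-D classes and one prototype per class
data = [(0.1, "a"), (0.2, "a"), (0.8, "b"), (0.9, "b")]
protos = lvq1(data, [[0.4, "a"], [0.6, "b"]])
```

With this toy data each prototype settles near the mean of its own class.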
  
  23. Matrix Backpropagation
***************
*** 554,557 ****
--- 558,603 ----
     ftp://oak.oakland.edu/SimTel/win3/math/ainet100.zip 
  
+ 31. DemoGNG
+ +++++++++++
+ 
+    This simulator is written in Java and should therefore run without
+    compilation on all platforms where a Java interpreter (or a browser with
+    Java support) is available. It implements the following algorithms and
+    neural network models: 
+     o Hard Competitive Learning (standard algorithm) 
+     o Neural Gas (Martinetz and Schulten 1991) 
+     o Competitive Hebbian Learning (Martinetz and Schulten 1991, Martinetz
+       1993) 
+     o Neural Gas with Competitive Hebbian Learning (Martinetz and Schulten
+       1991) 
+     o Growing Neural Gas (Fritzke 1995) 
+    DemoGNG is distributed under the GNU General Public License. It allows
+    you to experiment with the different methods using various probability
+    distributions. All model parameters can be set interactively on the
+    graphical user interface. A teaching mode is provided to observe the
+    models in "slow motion" if so desired. It is currently not possible to
+    experiment with user-provided data, so the simulator is useful mainly
+    for demonstration and teaching purposes and as a sample implementation
+    of the above algorithms.
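The first model in the list, hard competitive learning, is simple enough to sketch in a few lines (an illustration, not DemoGNG's Java code): for each input signal only the nearest codebook vector, the winner, is adapted. The 2-D uniform distribution, unit count, and learning rate are invented for the example.

```python
import random

def hard_competitive_learning(signals, n_units=4, lr=0.05, seed=0):
    rng = random.Random(seed)
    # codebook vectors start at random positions in the unit square
    units = [[rng.random(), rng.random()] for _ in range(n_units)]
    for x in signals:
        # winner: the codebook vector closest to the input signal
        win = min(units, key=lambda u: (u[0] - x[0]) ** 2 + (u[1] - x[1]) ** 2)
        # adapt only the winner, moving it a fraction lr toward the input
        win[0] += lr * (x[0] - win[0])
        win[1] += lr * (x[1] - win[1])
    return units

# input signals drawn from a uniform distribution on the unit square
rng = random.Random(42)
signals = [[rng.random(), rng.random()] for _ in range(2000)]
units = hard_competitive_learning(signals)
```

Since each update is a convex combination, the codebook vectors stay inside the square the signals are drawn from.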
+ 
+    DemoGNG can be accessed most easily at 
+    http://www.neuroinformatik.ruhr-uni-bochum.de/ in the file 
+    /ini/VDM/research/gsn/DemoGNG/GNG.html, where it is embedded as a Java
+    applet in a Web page and is downloaded for immediate execution when you
+    visit this page. An accompanying paper entitled "Some competitive
+    learning methods" describes the implemented models in detail and is
+    available in HTML on the same server in the directory
+    ini/VDM/research/gsn/JavaPaper/. 
+ 
+    It is also possible to download the complete source code and a Postscript
+    version of the paper via anonymous ftp from
+    ftp.neuroinformatik.ruhr-uni-bochum.de [134.147.176.16] in directory
+    /pub/software/NN/DemoGNG/. The software is in the file 
+    DemoGNG-1.00.tar.gz (193 KB) and the paper in the file sclm.ps.gz (89
+    KB). There is also a README file (9 KB). Please send any comments and
+    questions to demogng@neuroinformatik.ruhr-uni-bochum.de which will reach
+    Hartmut Loos who has written DemoGNG as well as Bernd Fritzke, the author
+    of the accompanying paper. 
+ 
  For some of these simulators there are user mailing lists. Get the packages
  and look into their documentation for further info.
***************
*** 568,571 ****
  ------------------------------------------------------------------------
  
! Next part is part 6 (of 7). Previous part is part 4. 
  
--- 614,617 ----
  ------------------------------------------------------------------------
  
! Next part is part 6 (of 7). Previous part is part 4. @
  

==> nn6.changes.body <==
*** nn6.oldbody	Tue May 28 23:00:32 1996
--- nn6.body	Fri Jun 28 23:00:32 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part6
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ6.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part6
! Last-modified: 1996-06-17
  URL: ftp://ftp.sas.com/pub/neural/FAQ6.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 46,50 ****
  1. nn/xnn 
  2. BrainMaker 
! 3. SAS Software for Neural Networks 
  4. NeuralWorks 
  5. MATLAB Neural Network Toolbox 
--- 46,50 ----
  1. nn/xnn 
  2. BrainMaker 
! 3. SAS Neural Network Application 
  4. NeuralWorks 
  5. MATLAB Neural Network Toolbox 
***************
*** 57,61 ****
  12. NeuFuz4 
  13. Cortex-Pro 
! 14. PARTEK 
  15. NeuroSolutions v2.0 
  16. Qnet For Windows Version 2.0 
--- 57,61 ----
  12. NeuFuz4 
  13. Cortex-Pro 
! 14. Partek 
  15. NeuroSolutions v2.0 
  16. Qnet For Windows Version 2.0 
***************
*** 73,76 ****
--- 73,77 ----
  28. OWL Neural Network Library (TM) 
  29. Neural Connection 
+ 30. Pattern Recognition Workbench Expo/PRO/PRO+ 
  
  See also http://www.emsl.pnl.gov:2080/docs/cie/neural/systems/software.html 
***************
*** 199,206 ****
       Introduction to Neural Networks 324 pp book
  
! 3. SAS Software for Neural Networks
! +++++++++++++++++++++++++++++++++++
  
!        Name: SAS Software
  
               In USA:                 In Europe:
--- 200,207 ----
       Introduction to Neural Networks 324 pp book
  
! 3. SAS Neural Network Application
! +++++++++++++++++++++++++++++++++
  
!        Name: SAS Neural Network Application
  
               In USA:                 In Europe:
***************
*** 212,239 ****
        Phone: (919) 677-8000          (49) 6221 4160
          Fax: (919) 677-4444          (49) 6221 474 850
!       Email: saswss@unx.sas.com (Neural net macros)
!              sasjub@unx.sas.com or eurgxh@mvs.sas.com (Neural net GUI)
!         URL: ftp://ftp.sas.com/pub/neural/README
!    Operating systems for macros: MS Windows (3.1, 95, NT) IBM OS/2 (2.1, 3.0, Warp),
!       MVS, VM/CMS, VSE/ESA, OpenVMS, ULTRIX, Digital UNIX, DG-UX, HP/UX,
!       Solaris, AIX, ConvexOS, MIPS ABI, INTEL ABI, Novell UNIXware,
!       Macintosh System 7.5, PowerPC 
!    Operating systems for GUI: Windows 3.1, OS/2, HP/UX, Solaris, AIX
!    System requirements: Lots of memory and disk space, floating point hardware
!    Comments: Oriented toward data analysis and statistical applications
! 
!    Several SAS macros for feedforward neural nets are available for release
!    6.08 and later. For a list of macros and articles relating to neural
!    networks, see ftp://ftp.sas.com/pub/neural/README. The macros are free
!    but won't do you any good unless you have licensed the required SAS
!    products. If you want information about licensing SAS products, call one
!    of the phone numbers listed above and ask for Software Sales. 
! 
!    There is also the SAS Neural Network Application including a graphical
!    user interface, on-site training and customisation. For prices and other
!    information, send email to sasjub@unx.sas.com (North America) or
!    eurgxh@mvs.sas.com (Europe). TNN is an elaborate system of macros for
!    feedforward neural nets including multilayer perceptrons, radial basis
!    functions, statistical versions of counterpropagation and learning vector
     quantization, a variety of built-in activation and error functions,
     multiple hidden layers, direct input-output connections, missing value
--- 213,224 ----
        Phone: (919) 677-8000          (49) 6221 4160
          Fax: (919) 677-4444          (49) 6221 474 850
!       Email: sasjub@unx.sas.com      eurgxh@mvs.sas.com
! 
!    Operating systems: Windows 3.1, OS/2, HP/UX, Solaris, AIX
! 
!    The SAS Neural Network Application trains a variety of neural nets and
!    includes a graphical user interface, on-site training and customisation.
!    Features include multilayer perceptrons, radial basis functions,
!    statistical versions of counterpropagation and learning vector
     quantization, a variety of built-in activation and error functions,
     multiple hidden layers, direct input-output connections, missing value
***************
*** 241,253 ****
     and multiple preliminary optimizations from random initial values to
     avoid local minima. Training is done by state-of-the-art numerical
!    optimization algorithms instead of tedious backprop. Maximum likelihood
!    and hierarchical Bayesian training are provided for a wide range of noise
!    distributions. TNN requires the SAS/OR product in release 6.08 or later.
!    Release 6.10 or later is strongly recommended. Release 6.10 is required
!    for the plotting macros to use SAS/INSIGHT. 
! 
!    NETIML is a collection of SAS/IML modules and macros for training and
!    running multilayer perceptrons with a variety of activation and error
!    functions. NETIML requires the SAS/IML product in release 6.08 or later. 
  
  4. NeuralWorks
--- 226,230 ----
     and multiple preliminary optimizations from random initial values to
     avoid local minima. Training is done by state-of-the-art numerical
!    optimization algorithms instead of tedious backprop. 
  
  4. NeuralWorks
***************
*** 592,720 ****
     email: <m.reiss@kcl.ac.uk>. 
  
! 14. PARTEK
  ++++++++++
  
!    PARTEK is a powerful, integrated environment for visual and quantitative
!    data analysis and pattern recognition. Drawing from a wide variety of
!    disciplines including Artificial Neural Networks, Fuzzy Logic, Genetic
!    Algorithms, and Statistics, PARTEK integrates data analysis and modeling
!    tools into an easy to use "point and click" system. The following modules
!    are available from PARTEK; functions from different modules are
!    integrated with each other whereever possible: 
!    1. The PARTEK/AVB - The Analytical/Visual Base. (TM) 
! 
!        * Analytical Spreadsheet (TM)
!          The Analytical Spreadsheet is a powerful and easy to use data analysis,
!          transformations, and visualization tool.  Some features include:
!             - import native format ascii/binary data
!             - recognition and resolution of missing data
!             - complete set of common mathematical & statistical functions
!             - contingency table analysis / correspondence analysis
!             - univariate histogram analysis
!             - extensive set of smoothing and normalization transformations
!             - easily and quickly plot color-coded 1-D curves and histograms,
!               2-D, 3-D, and N-D mapped scatterplots, highlighting selected
!               patterns
!             - Command Line (Tcl) and Graphical Interface
! 
!        * Pattern Visualization System (TM)
!          The Pattern Visualization System offers powerful tools for
!          visual analysis of the patterns in your data.  Some features include:
!             - automatically maps N-D data down to 3-D for visualization of
!               *all* of your variables at once
!             - hard copy color Postscript output
!             - a variety of color-coding, highlighting, and labeling options
!               allow you to generate meaningful graphics
! 
!        * Data Filters
!          Filter out selected rows and/or columns of your data for flexible and
!          efficient cross-validation, jackknifing, bootstrapping, feature set
!          evaluation, and more.
! 
!        * Random # Generators
!          Generate random numbers from any of the following parameterized
!          distributions:
!             - uniform, normal, exponential, gamma, binomial, poisson
! 
!        * Many distance/similarity metrics
!          Choose the appropriate distance metric for your data:
!             - euclidean, mahalanobis, minkowski, maximum value, absolute value,
!               shape coefficient, cosine coefficient, pearson correlation,
!               rank correlation, kendall's tau, canberra, and bray-curtis
! 
!        * Tcl/Tk command line interface
! 
!    2. The PARTEK/DSA - Data Structure Analysis Module 
! 
!        * Principal Components Analysis and Regression
!          Also known as Eigenvector Projection or Karhunen-Loeve Expansions,
!          PCA removes redundant information from your data.
!             - component analysis, correlate PC's with original variables
!             - choice of covariance, correlation, or product dispersion matrices
!             - choice of eigenvector, y-score, and z-score projections
!             - view SCREE and log-eigenvalue plots
! 
!        * Cluster Analysis
!          Does the data form groups?  How many?  How compact?  Cluster Analysis
!          is the tool to answer these questions.
!             - choose between several distance metrics
!             - optionally weight individual patterns
!             - manually or auto-select the cluster number and initial centers
!             - dump cluster counts, mean, cluster to cluster distances,
!               cluster variances, and cluster labeled data to a matrix viewer or
!               the Analytical Spreadsheet for further analysis
!             - visualize n-dimensional clustering
!             - assess goodness of partition using several internal and external
!               criteria metrics
! 
!        * N-Dimensional Histogram Analysis
!          Among the most important questions a researcher needs to answer when
!          analyzing patterns is whether or not the patterns can distinguish
!          different classes of data.  N-D Histogram Analysis is one tool to
!          answer this question.
!             - measures histogram overlap in n-dimensional space
!             - automatically find the best subset of features
!             - rank the overlap of your best feature combinations
! 
!        * Non-Linear Mapping
!          NLM is an iterative algorithm for visually analyzing the structure of
!          n-dimensional data.  NLM produces a non-linear mapping of data which
!          preserves interpoint distances of n-dimensional data while reducing
!          to a lower dimensionality - thus preserving the structure of the data.
!             - visually analyze structure of n-dimensional data
!             - track progress with error curves
!             - orthogonal, PCA, and random initialization
! 
!    3. The PARTEK/CP - Classification and Prediction Module 
! 
!        * Multi-Layer Perceptron
!          The most popular among the neural pattern recognition tools is the MLP.
!          PARTEK takes the MLP to a new dimension, by allowing the network to
!          learn by adapting ALL of its parameters to solve a problem.
!             - adapts output bias, neuron activation steepness, and neuron
!               dynamic range, as well as weights and input biases
!             - auto-scaling at input and output - no need to rescale your data
!             - choose between sigmoid, gaussian, linear, or mixture of neurons
!             - learning rate, momentum can be set independently for each parameter
!             - variety of learning methods and network initializations
!             - view color-coded network, error, etc as network trains, tests, runs
! 
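As a bare-bones illustration of the plain backpropagation at the core of any MLP tool, here is a one-hidden-layer sigmoid network trained on XOR; unlike PARTEK's MLP it adapts only the weights and biases, not activation steepness or dynamic range, and all names and settings here are my own:

```python
import numpy as np

def train_xor_mlp(iters=5000, lr=0.5, seed=0):
    """One-hidden-layer sigmoid MLP trained by plain backpropagation
    on the XOR problem; returns the network's outputs after training."""
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    t = np.array([[0.], [1.], [1.], [0.]])        # XOR targets
    W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(iters):
        h = sig(X @ W1 + b1)              # forward pass, hidden layer
        out = sig(h @ W2 + b2)            # forward pass, output
        d2 = (out - t) * out * (1 - out)  # backprop: output delta
        d1 = (d2 @ W2.T) * h * (1 - h)    # backprop: hidden delta
        W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)
        W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)
    return out
```

Learning rate and momentum set per parameter, as PARTEK advertises, would simply replace the single `lr` above with one value per weight matrix.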
!        * Learning Vector Quantization
!          Because LVQ is a multiple prototype classifier, it adapts to identify
!          multiple sub-groups within classes.
!             - LVQ1, LVQ2, and LVQ3 training methods
!             - 3 different functions for adapting learning rate
!             - choose between several distance metrics
!             - fuzzy and crisp classifications
!             - set number of prototypes individually for each class
! 
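Of the three training methods listed, LVQ1 is the simplest: the nearest prototype moves toward a training sample of its own class and away from one of a different class. A sketch, with names and defaults that are illustrative rather than PARTEK's:

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    """LVQ1 sketch: for each sample, attract the nearest prototype if its
    label matches the sample's class, otherwise repel it."""
    P = prototypes.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            k = np.argmin(((P - X[i]) ** 2).sum(1))   # nearest prototype
            step = lr * (X[i] - P[k])
            P[k] += step if proto_labels[k] == y[i] else -step
    return P

def classify(x, P, proto_labels):
    """Crisp classification: label of the nearest prototype."""
    return proto_labels[np.argmin(((P - x) ** 2).sum(1))]
```

Setting the number of prototypes per class, as the feature list above allows, just means passing more rows in `prototypes` with repeated entries in `proto_labels`.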
!        * Bayesian Classifier
!          Bayes methods are the statistical decision theory approach to
!          classification.  This classifier uses statistical properties of your
!          data to develop a classification model.
! 
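The blurb does not say which Bayes classifier PARTEK implements; a common baseline of this family is the Gaussian naive-Bayes decision rule, which picks the class maximizing prior times likelihood. A sketch (all names mine):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Illustrative Gaussian naive-Bayes fit: per-class feature means,
    variances, and priors estimated from the training data."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, len(Xc) / len(X))
    return stats

def predict_gaussian_nb(x, stats):
    """Pick the class maximizing log prior + sum of log Gaussian likelihoods."""
    best, best_score = None, -np.inf
    for c, (mu, var, prior) in stats.items():
        score = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                             + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best
```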
!    PARTEK is available on HP, IBM, Silicon Graphics, and SUN workstations.
!    For more information, send email to "info@partek.com" or call
!    (314)926-2329. 
  
  15. NeuroSolutions v2.0
--- 569,591 ----
     email: <m.reiss@kcl.ac.uk>. 
  
! 14. Partek
  ++++++++++
  
!    Partek is a young, growing company dedicated to providing our customers
!    with the best software and services for data analysis and modeling. We do
!    this by providing a combination of statistical analysis and modeling
!    techniques and modern tools such as neural networks, fuzzy logic, genetic
!    algorithms, and data visualization. These powerful analytical tools are
!    delivered with high quality, state of the art software. 
! 
!    Please visit our home on the World Wide Web: www.partek.com 
! 
!    Partek Incorporated 
!    5988 Mid Rivers Mall Dr. 
!    St. Charles, MO 63304 
!    voice: 314-926-2329 
!    fax: 314-441-6881 
!    email: info@partek.com 
!    http://www.partek.com/ 
  
  15. NeuroSolutions v2.0
***************
*** 1448,1451 ****
--- 1319,1375 ----
     * Science - classify climate types
  
+ 
+ 30. Pattern Recognition Workbench Expo/PRO/PRO+
+ +++++++++++++++++++++++++++++++++++++++++++++++
+ 
+    Name: Pattern Recognition Workbench Expo/PRO/PRO+ 
+    Company: Unica Technologies, Inc. 
+    Address: 55 Old Bedford Rd., Lincoln, MA 01773 USA 
+    Phone, Fax: (617) 259-5900, (617) 259-5901 
+    Email: unica@unica-usa.com 
+ 
+    Basic capabilities: 
+     o Supported architectures and training methods include backpropagation,
+       radial basis functions, K-nearest neighbors, Gaussian mixture, nearest
+       cluster, K-means clustering, logistic regression, and more. 
+     o Experiment managers interactively control model development by
+       walking you through problem definition and set-up; 
+        o Provides icon-based management of experiments and reports. 
+        o Easily performs automated input feature selection searches and
+          automated algorithm parameter searches (using intelligent search
+          methods including genetic algorithms) 
+        o Statistical model validation (cross-validation, bootstrap
+          validation, sliding-window validation). 
+     o "Giga-spreadsheets" hold 16,000 columns by 16 million rows of data
+       each (256 billion cells)! 
+     o Intelligent spreadsheet supports data preprocessing and manipulation
+       with over 100 built-in macro functions. Custom user functions can be
+       built to create a library of re-usable macro functions. 
+     o C source code generation, DLLs, and real-time application linking via
+       DDE/OLE links. 
+     o Interactive graphing and data visualization (line, histogram, 2D and
+       3D scatter graphs). 
+ 
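PRW's statistical validation options include cross-validation; a generic k-fold sketch follows, where the `fit`/`predict` callables are placeholders and nothing here is PRW's actual interface:

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, k=5, seed=0):
    """Sketch of k-fold cross-validation: each fold is held out once
    while a model is fit on the rest; returns mean held-out accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        preds = np.array([predict(x, model) for x in X[test]])
        accs.append((preds == y[test]).mean())
    return float(np.mean(accs))
```

Bootstrap and sliding-window validation differ only in how the train/test splits are drawn: resampling with replacement, or moving a window forward through time-ordered data.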
+    Operating system: Windows 3.1, WFW 3.11, Windows 95, Windows NT (16-
+    and 32-bit versions available) 
+ 
+    System requirements: Intel 486+, 8+ MB memory, 5+ MB disk space 
+ 
+    Approx. price: software starts at $995.00 (call for more info) 
+    Solving Pattern Recognition Problems text book: $49.95 
+    Money-back guarantee 
+ 
+    Comments: Pattern Recognition Workbench (PRW) is a comprehensive
+    environment/tool for solving pattern recognition problems using neural
+    network, machine learning, and traditional statistical technologies. With
+    an intuitive, easy-to-use graphical interface, PRW has the flexibility to
+    address many applications. With features such as automated model
+    generation (via input feature selection and algorithm parameter
+    searches), experiment management, and statistical validation, PRW
+    provides all the necessary tools from formatting and preprocessing your
+    data to setting up, running, and evaluating experiments, to deploying
+    your solution. PRW's automated model generation capability can generate
+    hundreds of models and select the best ones from a thorough search of
+    the model space, ultimately yielding better solutions. 
  
  ------------------------------------------------------------------------

==> nn7.changes.body <==
*** nn7.oldbody	Tue May 28 23:00:35 1996
--- nn7.body	Fri Jun 28 23:00:37 1996
***************
*** 350,354 ****
  FAQ maintainer at saswss@unx.sas.com. 
  
-  o What is the curse of dimensionality? 
   o How many training cases do I need? 
   o How should I split the data into training and validation sets? 
--- 350,353 ----
***************
*** 360,364 ****
   o How to handle missing data? 
   o Should NNs be used in safety-critical applications? 
-  o What does unsupervised learning learn? 
   o My net won't learn! What should I do??? 
   o My net won't generalize! What should I do??? 
--- 359,362 ----
***************
*** 426,429 ****
--- 424,428 ----
   o Ed Rosenfeld <IER@aol.com> 
   o Franco Insana <INSANA@asri.edu> 
+  o Janne Sinkkonen <janne@iki.fi> 
   o Javier Blasco-Alberto <jblasco@ideafix.cps.unizar.es> 
   o Jean-Denis Muller <jdmuller@vnet.ibm.com> 
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
