Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!newsgate.duke.edu!news.mathworks.com!newsfeed.internetmci.com!in1.uu.net!news.interpath.net!sas!newshost.unx.sas.com!hotellng.unx.sas.com!saswss
From: saswss@unx.sas.com (Warren Sarle)
Subject: changes to "comp.ai.neural-nets FAQ" -- monthly posting
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <nn.changes.posting_833338836@hotellng.unx.sas.com>
Supersedes: <nn.changes.posting_830746833@hotellng.unx.sas.com>
Date: Wed, 29 May 1996 03:00:38 GMT
Expires: Wed, 3 Jul 1996 03:00:36 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
Reply-To: saswss@unx.sas.com (Warren Sarle)
Organization: SAS Institute Inc., Cary, NC, USA
Keywords: modifications, new, additions, deletions
Followup-To: comp.ai.neural-nets
Lines: 2117

==> nn1.changes.body <==
*** nn1.oldbody	Sun Apr 28 23:00:10 1996
--- nn1.body	Tue May 28 23:00:07 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1996-03-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part1
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 58,61 ****
--- 58,62 ----
  
     What is this newsgroup for? How shall it be used?
+    Where is comp.ai.neural-nets archived?
     What is a neural network (NN)?
     What can you do with an NN and what not?
***************
*** 73,76 ****
--- 74,89 ----
     Why use activation functions?
     What is a softmax activation function?
+    How do MLPs compare with RBFs?
+    Should I normalize/standardize/rescale the data?
+    What is ART?
+    What is PNN?
+    What is GRNN?
+    What about Genetic Algorithms and Evolutionary Computation?
+    What about Fuzzy Logic?
+ 
+ Part 3: Generalization
+ 
+    How is generalization possible?
+    How does noise affect generalization?
     What is overfitting and how can I avoid it?
     What is jitter? (Training with noise)
***************
*** 78,101 ****
     What is weight decay?
     What is Bayesian estimation?
     How many hidden units should I use?
     How can generalization error be estimated?
     What are cross-validation and bootstrapping?
-    Should I normalize/standardize/rescale the data?
-    What is ART?
-    What is PNN?
-    What is GRNN?
-    What about Genetic Algorithms and Evolutionary Computation?
-    What about Fuzzy Logic?
  
! Part 3: Information resources
  
     Good literature about Neural Networks?
!    Any journals and magazines about Neural Networks?
     The most important conferences concerned with Neural Networks?
     Neural Network Associations?
     Other sources of information about NNs?
- 
- Part 4: Datasets
- 
     Databases for experimentation with NNs?
  
--- 91,106 ----
     What is weight decay?
     What is Bayesian estimation?
+    How many hidden layers should I use?
     How many hidden units should I use?
     How can generalization error be estimated?
     What are cross-validation and bootstrapping?
  
! Part 4: Books, data, etc.
  
     Good literature about Neural Networks?
!    Journals and magazines about Neural Networks?
     The most important conferences concerned with Neural Networks?
     Neural Network Associations?
     Other sources of information about NNs?
     Databases for experimentation with NNs?
  
***************
*** 247,250 ****
--- 252,277 ----
  ------------------------------------------------------------------------
  
+ Subject: Where is comp.ai.neural-nets archived? 
+ ================================================
+ 
+ Two archives are available for comp.ai.neural-nets: 
+ 
+  o ftp://ftp.cs.cmu.edu/user/ai/pubs/news/comp.ai.neural-nets 
+  o http://asknpac.npac.syr.edu 
+ 
+    According to Gang Cheng, gcheng@npac.syr.edu, the Northeast Parallel
+    Architecture Center (NPAC), Syracuse University, maintains an archive
+    system for searching/reading USENET newsgroups and mailing lists. Two
+    search/navigation interfaces accessible by any WWW browser are provided:
+    one is an advanced search interface allowing queries with various options
+    such as query by mail header, by date, by subject (keywords), by sender.
+    The other is a Hypermail-like navigation interface for users familiar
+    with Hypermail. 
+ 
+ For more information on newsgroup archives, see 
+ http://starbase.neosoft.com/~claird/news.lists/newsgroup_archives.html 
+ 
+ ------------------------------------------------------------------------
+ 
  Subject: What is a neural network (NN)?
  =======================================
***************
*** 357,360 ****
--- 384,390 ----
   o The Applications Corner, provided by NeuroDimension, Inc., at 
     http://www.nd.com/appcornr/purpose.htm. 
+  o Athanasios Episcopos's web page with References on Neural Net
+    Applications to Finance and Economics at 
+    http://phoenix.som.clarkson.edu/~episcopo/neurofin.html. 
   o Chen, C.H., ed. (1996) Fuzzy Logic and Neural Network Handbook, NY:
     McGraw-Hill, ISBN 0-07-011189-8. 

==> nn2.changes.body <==
*** nn2.oldbody	Sun Apr 28 23:00:16 1996
--- nn2.body	Tue May 28 23:00:13 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1996-04-05
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part2
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ2.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 21,32 ****
     Why use activation functions?
     What is a softmax activation function?
!    What is overfitting and how can I avoid it?
!    What is jitter? (Training with noise)
!    What is early stopping?
!    What is weight decay?
!    What is Bayesian estimation?
!    How many hidden units should I use?
!    How can generalization error be estimated?
!    What are cross-validation and bootstrapping?
     Should I normalize/standardize/rescale the data?
     What is ART?
--- 21,25 ----
     Why use activation functions?
     What is a softmax activation function?
!    How do MLPs compare with RBFs?
     Should I normalize/standardize/rescale the data?
     What is ART?
***************
*** 36,41 ****
     What about Fuzzy Logic?
  
! Part 3: Information resources
! Part 4: Datasets
  Part 5: Free software
  Part 6: Commercial software
--- 29,34 ----
     What about Fuzzy Logic?
  
! Part 3: Generalization
! Part 4: Books, data, etc.
  Part 5: Free software
  Part 6: Commercial software
***************
*** 434,443 ****
  ===============================
  
! Consider a multilayer perceptron. Choose any hidden unit or output unit.
! Let's say there are N inputs to that unit, which define an N-dimensional
! space. The given unit draws a hyperplane through that space, producing an
! "on" output on one side and an "off" output on the other. (With sigmoid
! units the plane will not be sharp -- there will be some gray area of
! intermediate values near the separating plane -- but ignore this for now.) 
  
  The weights determine where this hyperplane lies in the input space. Without
--- 427,437 ----
  ===============================
  
! Consider a multilayer perceptron with any of the usual sigmoid activation
! functions. Choose any hidden unit or output unit. Let's say there are N
! inputs to that unit, which define an N-dimensional space. The given unit
! draws a hyperplane through that space, producing an "on" output on one side
! and an "off" output on the other. (With sigmoid units the plane will not be
! sharp -- there will be some gray area of intermediate values near the
! separating plane -- but ignore this for now.) 
  
  The weights determine where this hyperplane lies in the input space. Without
***************
*** 448,454 ****
  bias would ALL be constrained to pass through the origin. 
  
! The "universal approximation" property of multilayer perceptrons does not
! hold if you omit the bias units. 
  
  ------------------------------------------------------------------------
  
--- 442,457 ----
  bias would ALL be constrained to pass through the origin. 
  
! The "universal approximation" property of multilayer perceptrons with most
! commonly-used hidden-layer activation functions does not hold if you omit
! the bias units. But Hornik (1993) shows that a sufficient condition for the
! universal approximation property without biases is that no derivative of the
! activation function vanishes at the origin, which implies that with the
! usual sigmoid activation functions, a fixed nonzero bias can be used. 
  
+ Reference: 
+ 
+    Hornik, K. (1993), "Some new results on neural network approximation,"
+    Neural Networks, 6, 1069-1072. 
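A minimal numerical sketch of the role of the bias (not from the FAQ itself; the unit and weights below are made up for illustration). Without a bias, the separating hyperplane of a sigmoid unit must pass through the origin, where the output is exactly 0.5; a nonzero bias shifts the hyperplane away from the origin:

```python
import numpy as np

def unit(x, w, b):
    """One sigmoid unit: a smooth threshold across the hyperplane w.x + b = 0."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

w = np.array([1.0, 1.0])

# Without a bias, the hyperplane is constrained to pass through the origin:
print(unit(np.zeros(2), w, 0.0))   # 0.5 exactly at the origin

# A bias shifts the hyperplane away from the origin:
print(unit(np.zeros(2), w, 3.0))   # well above 0.5 at the origin
```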
+ 
  ------------------------------------------------------------------------
  
***************
*** 560,1222 ****
  ------------------------------------------------------------------------
  
! Subject: What is overfitting and how can I avoid it? 
! =====================================================
! 
! The critical issue in developing a neural network is generalization: how
! well will the network make predictions for cases that are not in the
! training set? NNs, like other flexible nonlinear estimation methods such as
! kernel regression and smoothing splines, can suffer from either underfitting
! or overfitting. A network that is not sufficiently complex can fail to
! detect fully the signal in a complicated data set, leading to underfitting.
! A network that is too complex may fit the noise, not just the signal,
! leading to overfitting. Overfitting is especially dangerous because, with
! many of the common types of NNs, it can easily lead to predictions far
! beyond the range of the training data. But underfitting can also produce
! wild predictions in multilayer perceptrons, even with noise-free data. There
! are graphical examples of overfitting and underfitting in Sarle (1995). 
! 
! The best way to avoid overfitting is to use lots of training data. If you
! have at least 30 times as many training cases as there are weights in the
! network, you are unlikely to suffer from overfitting. But you can't
! arbitrarily reduce the number of weights for fear of underfitting. 
! 
! Given a fixed amount of training data, there are at least five effective
! approaches to avoiding underfitting and overfitting, and hence getting good
! generalization: 
! 
!  o Model selection 
!  o Jittering 
!  o Weight decay 
!  o Early stopping 
!  o Bayesian estimation 
! 
! The complexity of a network is related to both the number of weights and the
! size of the weights. Model selection is concerned with the number of
! weights, and hence the number of hidden units and layers. The other
! approaches listed above are concerned, directly or indirectly, with the size
! of the weights. 
! 
! The issue of overfitting/underfitting is intimately related to the
! bias/variance tradeoff in nonparametric estimation (Geman, Bienenstock, and
! Doursat 1992). 
! 
! References: 
! 
!    Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks and
!    the Bias/Variance Dilemma", Neural Computation, 4, 1-58. 
! 
!    Sarle, W.S. (1995), "Stopped Training and Other Remedies for
!    Overfitting," to appear in Proceedings of the 27th Symposium on the
!    Interface, ftp://ftp.sas.com/pub/neural/inter95.ps.Z (this is a very
!    large compressed postscript file, 747K, 10 pages) 
! 
! ------------------------------------------------------------------------
! 
! Subject: What is jitter? (Training with noise) 
! ===============================================
! 
! Jitter is artificial noise deliberately added to the inputs during training.
! Training with jitter is a form of smoothing related to kernel regression
! (see "What is GRNN?"). It is also closely related to regularization methods
! such as weight decay and ridge regression (see "What is weight decay?"). 
! 
! Training with jitter works because the functions that we want NNs to learn
! are mostly smooth. NNs can learn functions with discontinuities, but the
! functions must be continuous in a finite number of regions if our network is
! restricted to a finite number of hidden units. 
! 
! In other words, if we have two cases with similar inputs, the desired
! outputs will usually be similar. That means we can take any training case
! and generate new training cases by adding small amounts of jitter to the
! inputs. As long as the amount of jitter is sufficiently small, we can assume
! that the desired output will not change enough to be of any consequence, so
! we can just use the same target value. The more training cases, the merrier,
! so this looks like a convenient way to improve training. But too much jitter
! will obviously produce garbage, while too little jitter will have little
! effect (Koistinen and Holmstro\"m 1992). 
! 
! Consider any point in the input space, not necessarily one of the original
! training cases. That point could possibly arise as a jittered input as a
! result of jittering any of several of the original neighboring training
! cases. The average target value at the given input point will be a weighted
! average of the target values of the original training cases. For an infinite
! number of jittered cases, the weights will be proportional to the
! probability densities of the jitter distribution, located at the original
! training cases and evaluated at the given input point. Thus the average
! target values given an infinite number of jittered cases will, by
! definition, be the Nadaraya-Watson kernel regression estimator using the
! jitter density as the kernel. Hence, training with jitter is an
! approximation to training on the kernel regression estimator. And choosing
! the amount (variance) of jitter is equivalent to choosing the bandwidth of
! the kernel regression estimator (Scott 1992). 
! 
! When studying nonlinear models such as feedforward NNs, it is often helpful
! first to consider what happens in linear models, and then to see what
! difference the nonlinearity makes. So let's consider training with jitter in
! a linear model. Notation: 
! 
!    x_ij is the value of the jth input (j=1, ..., p) for the
!         ith training case (i=1, ..., n).
!    X={x_ij} is an n by p matrix.
!    y_i is the target value for the ith training case.
!    Y={y_i} is a column vector.
! 
! Without jitter, the least-squares weights are B = inv(X'X)X'Y, where
! "inv" indicates a matrix inverse and "'" indicates transposition. Note that
! if we replicate each training case c times, or equivalently stack c copies
! of the X and Y matrices on top of each other, the least-squares weights are
! inv(cX'X)cX'Y = (1/c)inv(X'X)cX'Y = B, same as before. 
! 
! With jitter, x_ij is replaced by c cases x_ij+z_ijk, k=1, ...,
! c, where z_ijk is produced by some random number generator, usually with
! a normal distribution with mean 0 and standard deviation s, and the 
! z_ijk's are all independent. In place of the n by p matrix X, this
! gives us a big matrix, say Q, with cn rows and p columns. To compute the
! least-squares weights, we need Q'Q. Let's consider the jth diagonal
! element of Q'Q, which is 
! 
!                    2           2       2
!    sum (x_ij+z_ijk) = sum (x_ij + z_ijk + 2 x_ij z_ijk)
!    i,k                i,k
! 
! which is approximately, for c large, 
! 
!              2     2
!    c(sum x_ij  + ns ) 
!       i
! 
! which is c times the corresponding diagonal element of X'X plus ns^2.
! Now consider the u,vth off-diagonal element of Q'Q, which is 
! 
!    sum (x_iu+z_iuk)(x_iv+z_ivk)
!    i,k
! 
! which is approximately, for c large, 
! 
!    c(sum x_iu x_iv)
!       i
! 
! which is just c times the corresponding element of X'X. Thus, Q'Q equals
! c(X'X+ns^2I), where I is an identity matrix of appropriate size.
! Similar computations show that the crossproduct of Q with the target values
! is cX'Y. Hence the least-squares weights with jitter of variance s^2 are
! given by 
! 
!        2                2                    2
!    B(ns ) = inv(c(X'X+ns I))cX'Y = inv(X'X+ns I)X'Y
! 
! In the statistics literature, B(ns^2) is called a ridge regression
! estimator with ridge value ns^2. 
! 
! If we were to add jitter to the target values Y, the cross-product X'Y
! would not be affected for large c for the same reason that the off-diagonal
! elements of X'X are not affected by jitter. Hence, adding jitter to the
! targets will not change the optimal weights; it will just slow down
! training. 
! 
! The ordinary least squares training criterion is (Y-XB)'(Y-XB).
! Weight decay uses the training criterion (Y-XB)'(Y-XB)+d^2B'B,
! where d is the decay rate. Weight decay can also be implemented by
! inventing artificial training cases. Augment the training data with p new
! training cases containing the matrix dI for the inputs and a zero vector
! for the targets. To put this in a formula, let's use A;B to indicate the
! matrix A stacked on top of the matrix B, so (A;B)'(C;D)=A'C+B'D.
! Thus the augmented inputs are X;dI and the augmented targets are Y;0,
! where 0 indicates the zero vector of the appropriate size. The squared error
! for the augmented training data is: 
! 
!    (Y;0-(X;dI)B)'(Y;0-(X;dI)B)
!    = (Y;0)'(Y;0) - 2(Y;0)'(X;dI)B + B'(X;dI)'(X;dI)B
!    = Y'Y - 2Y'XB + B'(X'X+d^2I)B
!    = Y'Y - 2Y'XB + B'X'XB + B'(d^2I)B
!    = (Y-XB)'(Y-XB)+d^2B'B
! 
! which is the weight-decay training criterion. Thus the weight-decay
! estimator is: 
! 
!     inv[(X;dI)'(X;dI)](X;dI)'(Y;0) = inv(X'X+d^2I)X'Y
! 
! which is the same as the jitter estimator B(d^2), i.e. jitter with
! variance d^2/n. The equivalence between the weight-decay estimator and
! the jitter estimator does not hold for nonlinear models unless the jitter
! variance is small relative to the curvature of the nonlinear function.
! However, the equivalence of the two estimators for linear models suggests
! that they will often produce similar results even for nonlinear models. 
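For the linear case, the equivalence just derived can be checked numerically. The sketch below (my own illustration, with made-up data) fits the weight-decay/ridge estimator inv(X'X + d^2 I)X'Y directly, and then refits by ordinary least squares on the augmented data X;dI and Y;0; the two estimators coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 30, 4, 0.7          # n cases, p inputs, decay rate d

X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=p) + 0.2 * rng.normal(size=n)

# Ridge / weight-decay estimator: inv(X'X + d^2 I) X'Y
B_decay = np.linalg.solve(X.T @ X + d**2 * np.eye(p), X.T @ Y)

# Same estimator via augmented data: stack dI under X and zeros under Y
X_aug = np.vstack([X, d * np.eye(p)])
Y_aug = np.concatenate([Y, np.zeros(p)])
B_aug, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)

print(np.allclose(B_decay, B_aug))   # True: identical estimators
```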
! 
! B(0) is obviously the ordinary least-squares estimator. It can be shown
! that as s^2 increases, the Euclidean norm of B(ns^2) decreases; in
! other words, adding jitter causes the weights to shrink. It can also be
! shown that under the usual statistical assumptions, there always exists some
! value of ns^2 > 0 such that B(ns^2) provides better expected
! generalization than B(0). Unfortunately, there is no way to calculate a
! value of ns^2 from the training data that is guaranteed to improve
! generalization. There are other types of shrinkage estimators called Stein
! estimators that do guarantee better generalization than B(0), but I'm not
! aware of a nonlinear generalization of Stein estimators applicable to neural
! networks. 
! 
! The statistics literature describes numerous methods for choosing the ridge
! value. The most obvious way is to estimate the generalization error by
! cross-validation, generalized cross-validation, or bootstrapping, and to
! choose the ridge value that yields the smallest such estimate. There are
! also quicker methods based on empirical Bayes estimation, one of which
! yields the following formula, useful as a first guess: 
! 
!     2    p(Y-XB(0))'(Y-XB(0))
!    s   = --------------------
!     1      n(n-p)B(0)'B(0)
! 
! You can iterate this a few times: 
! 
!     2      p(Y-XB(0))'(Y-XB(0))
!    s     = --------------------
!     l+1              2     2
!             n(n-p)B(s )'B(s )
!                      l     l
! 
! Note that the more training cases you have, the less noise you need. 
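The jitter/ridge correspondence above can also be verified by brute force in a linear model. The following sketch (my own illustration, not from the FAQ; the data are made up) compares the ridge estimator B(ns^2) with ordinary least squares on c jittered copies of each training case; for large c the two nearly coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, c = 20, 3, 2000          # n cases, p inputs, c jittered copies per case
s = 0.3                        # jitter standard deviation

X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Ridge estimator with ridge value n*s^2:  B(ns^2) = inv(X'X + n s^2 I) X'Y
B_ridge = np.linalg.solve(X.T @ X + n * s**2 * np.eye(p), X.T @ Y)

# Brute-force jitter: replicate each case c times with noise added to inputs,
# keeping the same target values, then fit by ordinary least squares.
Q = np.repeat(X, c, axis=0) + s * rng.normal(size=(n * c, p))
T = np.repeat(Y, c)
B_jitter = np.linalg.solve(Q.T @ Q, Q.T @ T)

print(B_ridge)
print(B_jitter)   # close to B_ridge for large c
```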
! 
! References: 
! 
!    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
!    Oxford University Press. 
! 
!    Koistinen, P. and Holmstro\"m, L. (1992) "Kernel regression and
!    backpropagation training with noise," NIPS4, 1033-1039. 
! 
!    Scott, D.W. (1992) Multivariate Density Estimation, Wiley. 
! 
!    Vinod, H.D. and Ullah, A. (1981) Recent Advances in Regression Methods,
!    NY: Marcel-Dekker. 
! 
! ------------------------------------------------------------------------
! 
! Subject: What is early stopping? 
! =================================
! 
! NN practitioners often use nets with many times as many parameters as
! training cases. E.g., Nelson and Illingworth (1991, p. 165) discuss training
! a network with 16,219 parameters with only 50 training cases! The method
! used is called early stopping or stopped training and proceeds as follows: 
! 
! 1. Divide the available data into training and validation sets. 
! 2. Use a large number of hidden units. 
! 3. Use very small random initial values. 
! 4. Use a slow learning rate. 
! 5. Compute the validation error rate periodically during training. 
! 6. Stop training when the validation error rate "starts to go up". 
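The procedure above can be sketched for a toy linear model trained by gradient descent (my own illustration; the data and split are made up, and a real application would use a nonlinear net). It follows the "train to convergence, then take the iteration with the lowest validation error" advice discussed later in this answer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: noisy linear target, split into training and validation sets (step 1)
X = rng.normal(size=(60, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=60)
Xtr, ytr, Xval, yval = X[:40], y[:40], X[40:], y[40:]

w = 0.01 * rng.normal(size=5)      # very small random initial weights (step 3)
lr = 0.01                          # slow learning rate (step 4)

best_w, best_val = w.copy(), np.inf
for epoch in range(500):
    grad = Xtr.T @ (Xtr @ w - ytr) / len(ytr)
    w -= lr * grad
    val = np.mean((Xval @ w - yval) ** 2)    # validation error (step 5)
    if val < best_val:                       # remember the best iterate (step 6)
        best_val, best_w = val, w.copy()

print(best_val)   # validation error at the chosen stopping point
```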
! 
! It is crucial to realize that the validation error is not a good estimate
! of the generalization error. One method for getting an unbiased estimate of
! the generalization error is to run the net on a third set of data, the test
! set, that is not used at all during the training process. For other methods,
! see "How can generalization error be estimated?" 
! 
! Early stopping has several advantages: 
! 
!  o It is fast. 
!  o It can be applied successfully to networks in which the number of weights
!    far exceeds the sample size. 
!  o It requires only one major decision by the user: what proportion of
!    validation cases to use. 
! 
! But there are several unresolved practical issues in early stopping: 
! 
!  o How many cases do you assign to the training and validation sets? Rules
!    of thumb abound, but appear to be no more than folklore. The only
!    systematic results known to the FAQ maintainer are in Sarle (1995), which
!    deals only with the case of a single input. Amari et al. (1995) attempts
!    a theoretical approach but contains serious errors that completely
!    invalidate the results, especially the incorrect assumption that the
!    direction of approach to the optimum is distributed isotropically. 
!  o Do you split the data into training and validation sets randomly or by
!    some systematic algorithm? 
!  o How do you tell when the validation error rate "starts to go up"? It may
!    go up and down numerous times during training. The safest approach is to
!    train to convergence, then go back and see which iteration had the lowest
!    validation error. For more elaborate algorithms, see section 3.3 of 
!    ftp://ftp.ira.uka.de/pub/papers/techreports/1994/1994-21.ps.gz. 
! 
! Statisticians tend to be skeptical of stopped training because it appears to
! be statistically inefficient due to the use of the split-sample technique;
! i.e., neither training nor validation makes use of the entire sample, and
! because the usual statistical theory does not apply. However, there has been
! recent progress addressing both of the above concerns (Wang 1994). 
! 
! Early stopping is closely related to ridge regression. If the learning rate
! is sufficiently small, the sequence of weight vectors on each iteration will
! approximate the path of continuous steepest descent down the error function.
! Early stopping chooses a point along this path that optimizes an estimate of
! the generalization error computed from the validation set. Ridge regression
! also defines a path of weight vectors by varying the ridge value. The ridge
! value is often chosen by optimizing an estimate of the generalization error
! computed by cross-validation, generalized cross-validation, or bootstrapping
! (see "What are cross-validation and bootstrapping?") There always exists a
! positive ridge value that will improve the expected generalization error in
! a linear model. A similar result has been obtained for early stopping in
! linear models (Wang, Venkatesh, and Judd 1994). In linear models, the ridge
! path lies close to, but does not coincide with, the path of continuous
! steepest descent; in nonlinear models, the two paths can diverge widely. The
! relationship is explored in more detail by Sjo\"berg and Ljung (1992). 
! 
! References: 
! 
!    Amari, S., Murata, N., Muller, K.-R., Finke, M., and Yang, H. (1995),
!    "Asymptotic Statistical Theory of Overtraining and Cross-Validation,"
!    METR 95-06, Department of Mathematical Engineering and Information
!    Physics, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113, Japan. 
! 
!    Finnoff, W., Hergert, F., and Zimmermann, H.G. (1993), "Improving model
!    selection by nonconvergent methods," Neural Networks, 6, 771-783. 
! 
!    Nelson, M.C. and Illingworth, W.T. (1991), A Practical Guide to Neural
!    Nets, Reading, MA: Addison-Wesley. 
! 
!    Sarle, W.S. (1995), "Stopped Training and Other Remedies for
!    Overfitting," to appear in Proceedings of the 27th Symposium on the
!    Interface, ftp://ftp.sas.com/pub/neural/inter95.ps.Z (this is a very
!    large compressed postscript file, 747K, 10 pages) 
! 
!    Sjo\"berg, J. and Ljung, L. (1992), "Overtraining, Regularization, and
!    Searching for Minimum in Neural Networks," Technical Report
!    LiTH-ISY-I-1297, Department of Electrical Engineering, Linkoping
!    University, S-581 83 Linkoping, Sweden, http://www.control.isy.liu.se . 
! 
!    Wang, C. (1994), A Theory of Generalisation in Learning Machines with
!    Neural Network Application, Ph.D. thesis, University of Pennsylvania. 
! 
!    Wang, C., Venkatesh, S.S., and Judd, J.S. (1994), "Optimal Stopping and
!    Effective Machine Complexity in Learning," NIPS6, 303-310. 
! 
!    Weigend, A. (1994), "On overfitting and the effective number of hidden
!    units," Proceedings of the 1993 Connectionist Models Summer School,
!    335-342. 
! 
! ------------------------------------------------------------------------
! 
! Subject: What is weight decay? 
! ===============================
! 
! Weight decay adds a penalty term to the error function. The usual penalty is
! the sum of squared weights times a decay constant. In a linear model, this
! form of weight decay is equivalent to ridge regression. See "What is
! jitter?" for more explanation of ridge regression. 
! 
! Weight decay is a subset of regularization methods. The penalty term in
! weight decay, by definition, penalizes large weights. Other regularization
! methods may involve not only the weights but various derivatives of the
! output function (Bishop 1995). 
! 
! The weight decay penalty term causes the weights to converge to smaller
! absolute values than they otherwise would. Large weights can hurt
! generalization in two different ways. Excessively large weights leading to
! hidden units can cause the output function to be too rough, possibly with
! near discontinuities. Excessively large weights leading to output units can
! cause wild outputs far beyond the range of the data if the output activation
! function is not bounded to the same range as the data. 
! 
! Other penalty terms besides the sum of squared weights are sometimes used. 
! Weight elimination (Weigend, Rumelhart, and Huberman 1991) uses: 
! 
!           (w_i)^2
!    sum -------------
!     i  (w_i)^2 + c^2
! 
! where w_i is the ith weight and c is a user-specified constant. Whereas
! decay using the sum of squared weights tends to shrink the large
! coefficients more than the small ones, weight elimination tends to shrink
! the small coefficients more, and is therefore more useful for suggesting
! subset models (pruning). 
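The two penalties can be compared directly; the snippet below (my own illustration, with made-up weights) shows that the squared-weight penalty is dominated by the large weights, while each weight-elimination term saturates near 1 for |w| >> c, so it exerts relatively more pressure on the small weights:

```python
import numpy as np

def decay_penalty(w, d):
    """Sum-of-squares weight decay: d^2 * sum(w_i^2)."""
    return d**2 * np.sum(w**2)

def elimination_penalty(w, c):
    """Weight elimination (Weigend, Rumelhart, and Huberman 1991):
    sum of w_i^2 / (w_i^2 + c^2); each term saturates at 1."""
    return np.sum(w**2 / (w**2 + c**2))

w = np.array([0.01, 0.1, 1.0, 5.0])

# Squared-weight decay is dominated by the largest weight (25 of ~26 here),
# while the elimination penalty charges each weight at most 1.
print(decay_penalty(w, 1.0))
print(elimination_penalty(w, 1.0))
```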
! 
! The generalization ability of the network can depend crucially on the decay
! constant, especially with small training sets. One approach to choosing the
! decay constant is to train several networks with different amounts of decay
! and estimate the generalization error for each; then choose the decay
! constant that minimizes the estimated generalization error. Weigend,
! Rumelhart, and Huberman (1991) iteratively update the decay constant during
! training. 
! 
! There are other important considerations for getting good results from
! weight decay. You must either standardize the inputs and targets, or adjust
! the penalty term for the standard deviations of all the inputs and targets.
! It is usually a good idea to omit the biases from the penalty term. 
! 
! A fundamental problem with weight decay is that different types of weights
! in the network will usually require different decay constants for good
! generalization. At the very least, you need three different decay constants
! for input-to-hidden, hidden-to-hidden, and hidden-to-output weights.
! Adjusting all these decay constants to produce the best estimated
! generalization error often requires vast amounts of computation. 
! 
! Fortunately, there is a superior alternative to weight decay: hierarchical
! Bayesian estimation, which makes it possible to estimate numerous decay
! constants efficiently. See "What is Bayesian estimation?" 
! 
! References: 
! 
!    Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford:
!    Oxford University Press. 
! 

==> nn3.changes.body <==
*** nn3.oldbody	Sun Apr 28 23:00:20 1996
--- nn3.body	Tue May 28 23:00:18 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1996-03-14
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part3
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ3.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 13,25 ****
  Part 1: Introduction
  Part 2: Learning
! Part 3: Information resources
  
!    Good literature about Neural Networks?
!    Any journals and magazines about Neural Networks?
!    The most important conferences concerned with Neural Networks?
!    Neural Network Associations?
!    Other sources of information about NNs?
  
! Part 4: Datasets
  Part 5: Free software
  Part 6: Commercial software
--- 13,31 ----
  Part 1: Introduction
  Part 2: Learning
! Part 3: Generalization
  
!    How is generalization possible?
!    How does noise affect generalization?
!    What is overfitting and how can I avoid it?
!    What is jitter? (Training with noise)
!    What is early stopping?
!    What is weight decay?
!    What is Bayesian estimation?
!    How many hidden layers should I use?
!    How many hidden units should I use?
!    How can generalization error be estimated?
!    What are cross-validation and bootstrapping?
  
! Part 4: Books, data, etc.
  Part 5: Free software
  Part 6: Commercial software
***************
*** 28,1051 ****
  ------------------------------------------------------------------------
  
! Subject: Good literature about Neural Networks?
  ===============================================
  
! The Best
! ++++++++
  
! The best popular introduction to NNs
! ------------------------------------
  
! Hinton, G.E. (1992), "How Neural Networks Learn from Experience", Scientific
! American, 267 (September), 144-151. 
  
! The best elementary textbooks on NNs
! ------------------------------------
! 
! Masters, Timothy (1994). Practical Neural Network Recipes in C++, Academic
! Press, ISBN 0-12-479040-2, US $45 incl. disks.
! "Lots of very good practical advice which most other books lack."
! 
! Weiss, S.M. & Kulikowski, C.A. (1991), Computer Systems That Learn,
! Morgan Kaufmann. ISBN 1 55860 065 5. 
! 
! The best intermediate textbooks on NNs
! --------------------------------------
! 
! Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford:
! Oxford University Press. ISBN 0-19-853849-9 (hardback) or 0-19-853864-2
! (paperback), xvii+482 pages.
! This is definitely the best book on neural nets for practical applications
! (rather than for neurobiological models). It is the only textbook on neural
! nets that I have seen that is statistically solid.
! "Bishop is a leading researcher who has a deep understanding of the material
! and has gone to great lengths to organize it in a sequence that makes sense.
! He has wisely avoided the temptation to try to cover everything and has
! therefore omitted interesting topics like reinforcement learning, Hopfield
! networks, and Boltzmann machines in order to focus on the types of neural
! networks that are most widely used in practical applications. He assumes
! that the reader has the basic mathematical literacy required for an
! undergraduate science degree, and using these tools he explains everything
! from scratch. Before introducing the multilayer perceptron, for example, he
! lays a solid foundation of basic statistical concepts. So the crucial
! concept of overfitting is introduced using easily visualized examples of
! one-dimensional polynomials and only later applied to neural networks. An
! impressive aspect of this book is that it takes the reader all the way from
! the simplest linear models to the very latest Bayesian multilayer neural
! networks without ever requiring any great intellectual leaps." -Geoffrey
! Hinton, from the foreword. 
! 
! Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of
! Neural Computation. Addison-Wesley: Redwood City, California. ISBN
! 0-201-50395-6 (hardbound) and 0-201-51560-1 (paperbound)
! "My first impression is that this one is by far the best book on the topic.
! And it's below $30 for the paperback."; "Well written, theoretical (but not
! overwhelming)"; "It provides a good balance of model development,
! computational algorithms, and applications. The mathematical derivations are
! especially well done"; "Nice mathematical analysis on the mechanism of
! different learning algorithms"; "It is NOT for mathematical beginner. If you
! don't have a good grasp of higher level math, this book can be really tough
! to get through."
! 
! The best advanced textbook covering NNs
! ---------------------------------------
! 
! Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge:
! Cambridge University Press, ISBN 0-521-46086-7 (hardback), xii+403 pages.
! Brian Ripley's new book is an excellent sequel to Bishop (1995). Ripley
! starts up where Bishop left off, with Bayesian inference and statistical
! decision theory, and then covers some of the same material on NNs as Bishop
! but at a higher mathematical level. Ripley also covers a variety of methods
! that are not discussed, or discussed only briefly, by Bishop, such as
! tree-based methods and belief networks. While Ripley is best appreciated by
! people with a background in mathematical statistics, the numerous realistic
! examples in his book will be of interest even to beginners in neural nets.
! 
! The best books on image and signal processing with NNs
! ------------------------------------------------------
! 
! Masters, T. (1994), Signal and Image Processing with Neural Networks: A
! C++ Sourcebook, Wiley.
! 
! Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization
! and Signal Processing. NY: John Wiley & Sons, ISBN 0-471-930105 (hardbound),
! 526 pages, $57.95. 
! "Partly a textbook and partly a research monograph; introduces the basic
! concepts, techniques, and models related to neural networks and
! optimization, excluding rigorous mathematical details. Accessible to a wide
! readership with a differential calculus background. The main coverage of the
! book is on recurrent neural networks with continuous state variables. The
! book title would be more appropriate without mentioning signal processing.
! Well edited, good illustrations."
! 
! The best book on time-series forecasting with NNs
! -------------------------------------------------
! 
! Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction:
! Forecasting the Future and Understanding the Past, Addison-Wesley: Reading,
! MA. 
! 
! The best comparison of NNs with other classification methods
! ------------------------------------------------------------
! 
! Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), Machine Learning,
! Neural and Statistical Classification, Ellis Horwood. 
! 
! Books for the Beginner:
! +++++++++++++++++++++++
! 
! Aleksander, I. and Morton, H. (1990). An Introduction to Neural Computing.
! Chapman and Hall. (ISBN 0-412-37780-2). Comments: "This book seems to be
! intended for the first year of university education."
! 
! Beale, R. and Jackson, T. (1990). Neural Computing, an Introduction. Adam
! Hilger, IOP Publishing Ltd : Bristol. (ISBN 0-85274-262-2). Comments: "It's
! clearly written. Lots of hints as to how to get the adaptive models covered
! to work (not always well explained in the original sources). Consistent
! mathematical terminology. Covers perceptrons, error-backpropagation, Kohonen
! self-org model, Hopfield type models, ART, and associative memories."
! 
! Dayhoff, J. E. (1990). Neural Network Architectures: An Introduction. Van
! Nostrand Reinhold: New York. Comments: "Like Wasserman's book, Dayhoff's
! book is also very easy to understand".
! 
! Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures,
! Algorithms and Applications, Prentice Hall, ISBN 0-13-334186-0. Also
! published as a Prentice Hall International Edition, ISBN 0-13-042250-9.
! Sample software (source code listings in C and Fortran) is included in an
! Instructor's Manual. "Intermediate in level between Wasserman and
! Hertz/Krogh/Palmer. Algorithms for a broad range of neural networks,
! including a chapter on Adaptive Resonance Theory with ART2. Simple examples
! for each network."
! 
! Freeman, James (1994). Simulating Neural Networks with Mathematica,
! Addison-Wesley, ISBN: 0-201-56629-X. Helps the reader make his own NNs. The
! Mathematica code for the programs in the book is also available through the
! internet: Send mail to MathSource@wri.com or try http://www.wri.com/ on the
! World Wide Web.
! 
! Haykin, S. (1994). Neural Networks, a Comprehensive Foundation.
! Macmillan, New York, NY.
! "A very readable, well-written intermediate text on NNs. Perspective is
! primarily one of pattern recognition, estimation and signal processing.
! However, there are well-written chapters on neurodynamics and VLSI
! implementation. Though there is emphasis on formal mathematical models of
! NNs as universal approximators, statistical estimators, etc., there are also
! examples of NNs used in practical applications. The problem sets at the end
! of each chapter nicely complement the material. In the bibliography are over
! 1000 references."
! 
! Hecht-Nielsen, R. (1990). Neurocomputing. Addison Wesley. Comments: "A
! good book", "comprises a nice historical overview and a chapter about NN
! hardware. Well structured prose. Makes important concepts clear."
! 
! McClelland, J. L. and Rumelhart, D. E. (1988). Explorations in Parallel
! Distributed Processing: Computational Models of Cognition and Perception
! (software manual). The MIT Press. Comments: "Written in a tutorial style,
! and includes 2 diskettes of NN simulation programs that can be compiled on
! MS-DOS or Unix (and they do, too!)"; "The programs are pretty reasonable as
! an introduction to some of the things that NNs can do."; "There are *two*
! editions of this book. One comes with disks for the IBM PC, the other comes
! with disks for the Macintosh".
! 
! McCord Nelson, M. and Illingworth, W.T. (1990). A Practical Guide to Neural
! Nets. Addison-Wesley Publishing Company, Inc. (ISBN 0-201-52376-0).
! Comments: "No formulas at all"; "It does not have much detailed model
! development (very few equations), but it does present many areas of
! application. It includes a chapter on current areas of research. A variety
! of commercial applications is discussed in chapter 1. It also includes a
! program diskette with a fancy graphical interface (unlike the PDP
! diskette)".
! 
! Muller, B. and Reinhardt, J. (1990). Neural Networks, An Introduction.
! Springer-Verlag: Berlin Heidelberg New York (ISBN: 3-540-52380-4 and
! 0-387-52380-4). Comments: The book was developed out of a course on
! neural-network models with computer demonstrations that was taught by the
! authors to Physics students. The book comes together with a PC-diskette. The
! book is divided into three parts: (1) Models of Neural Networks; describing
! several architectures and learning rules, including the mathematics. (2)
! Statistical Physics of Neural Networks; "hard-core" physics section
! developing formal theories of stochastic neural networks. (3) Computer
! Codes; explanation about the demonstration programs. First part gives a nice
! introduction into neural networks together with the formulas. Together with
! the demonstration programs a 'feel' for neural networks can be developed.
! 
! Orchard, G.A. & Phillips, W.A. (1991). Neural Computation: A Beginner's
! Guide. Lawrence Earlbaum Associates: London. Comments: "Short user-friendly
! introduction to the area, with a non-technical flavour. Apparently
! accompanies a software package, but I haven't seen that yet".
! 
! Rao, V.B & H.V. (1993). C++ Neural Networks and Fuzzy Logic. MIS:Press,
! ISBN 1-55828-298-x, US $45 incl. disks. "Probably not 'leading edge' stuff
! but detailed enough to get your hands dirty!"
! 
! Wasserman, P. D. (1989). Neural Computing: Theory & Practice. Van Nostrand
! Reinhold: New York. (ISBN 0-442-20743-3) Comments: "Wasserman flatly
! enumerates some common architectures from an engineer's perspective ('how it
! works') without ever addressing the underlying fundamentals ('why it works')
! - important basic concepts such as clustering, principal components or
! gradient descent are not treated. It's also full of errors, and unhelpful
! diagrams drawn with what appears to be PCB board layout software from the
! '70s. For anyone who wants to do active research in the field I consider it
! quite inadequate"; "Okay, but too shallow"; "Quite easy to understand"; "The
! best bedtime reading for Neural Networks. I have given this book to numerous
! colleagues who want to know NN basics, but who never plan to implement
! anything. An excellent book to give your manager."
! 
! Wasserman, P.D. (1993). Advanced Methods in Neural Computing. Van
! Nostrand Reinhold: New York (ISBN: 0-442-00461-3). Comments: Several neural
! network topics are discussed, e.g. Probabilistic Neural Networks,
! Backpropagation and beyond, neural control, Radial Basis Function Networks,
! Neural Engineering. Furthermore, several subjects related to neural networks
! are mentioned e.g. genetic algorithms, fuzzy logic, chaos. Just the
! functionality of these subjects is described; enough to get you started.
! Lots of references are given to more elaborate descriptions. Easy to read,
! no extensive mathematical background necessary.
! 
! Zurada, Jacek M. (1992). Introduction To Artificial Neural Systems.
! Hardcover, 785 Pages, 317 Figures, ISBN 0-534-95460-X, 1992, PWS Publishing
! Company, Price: $56.75 (includes shipping, handling, and the ANS software
! diskette). Solutions Manual available.
! "Cohesive and comprehensive book on neural nets; as an engineering-oriented
! introduction, but also as a research foundation. Thorough exposition of
! fundamentals, theory and applications. Training and recall algorithms appear
! in boxes showing steps of algorithms, thus making programming of learning
! paradigms easy. Many illustrations and intuitive examples. Winner among NN
! textbooks at a senior UG/first year graduate level-[175 problems]."
! Contents: Intro, Fundamentals of Learning, Single-Layer & Multilayer
! Perceptron NN, Assoc. Memories, Self-organizing and Matching Nets,
! Applications, Implementations, Appendix) 
! 
! The Classics:
! +++++++++++++
! 
! Kohonen, T. (1984). Self-organization and Associative Memory.
! Springer-Verlag: New York. (2nd Edition: 1988; 3rd edition: 1989). Comments:
! "The section on Pattern mathematics is excellent."
! 
! Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed
! Processing: Explorations in the Microstructure of Cognition (volumes 1 & 2).
! The MIT Press. Comments: "As a computer scientist I found the two Rumelhart
! and McClelland books really heavy going and definitely not the sort of thing
! to read if you are a beginner."; "It's quite readable, and affordable (about
! $65 for both volumes)."; "THE Connectionist bible".
! 
! Introductory Journal Articles:
! ++++++++++++++++++++++++++++++
! 
! Hinton, G. E. (1989). Connectionist learning procedures. Artificial
! Intelligence, Vol. 40, pp. 185--234. Comments: "One of the better neural
! networks overview papers, although the distinction between network topology
! and learning algorithm is not always very clear. Could very well be used as
! an introduction to neural networks."
! 
! Knight, K. (1990). Connectionist Ideas and Algorithms. Communications of
! the ACM, November 1990, Vol. 33, no. 11, pp. 59-74. Comments: "A good article,
! and for most people it is easy to find a copy of this journal."
! 
! Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks,
! vol. 1, no. 1. pp. 3-16. Comments: "A general review".
! 
! Not-quite-so-introductory Literature:
! +++++++++++++++++++++++++++++++++++++
! 
! Anderson, J. A. and Rosenfeld, E. (Eds). (1988). Neurocomputing:
! Foundations of Research. The MIT Press: Cambridge, MA. Comments: "An
! expensive book, but excellent for reference. It is a collection of reprints
! of most of the major papers in the field." 
! 
! Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (Eds). (1990). 
! Neurocomputing 2: Directions for Research. The MIT Press: Cambridge, MA.
! Comments: "The sequel to their well-known Neurocomputing book."
! 
! Caudill, M. and Butler, C. (1990). Naturally Intelligent Systems. MIT Press:
! Cambridge, Massachusetts. (ISBN 0-262-03156-6). Comments: "I guess one of
! the best books I read"; "May not be suited for people who want to do some
! research in the area".
! 
! Khanna, T. (1990). Foundations of Neural Networks. Addison-Wesley: New
! York. Comments: "Not so bad (with a page of erroneous formulas (if I
! remember well), and #hidden layers isn't well described)."; "Khanna's
! intention in writing his book with math analysis should be commended but he
! made several mistakes in the math part".
! 
! Kung, S.Y. (1993). Digital Neural Networks, Prentice Hall, Englewood
! Cliffs, NJ.
! 
! Levine, D. S. (1990). Introduction to Neural and Cognitive Modeling.
! Lawrence Erlbaum: Hillsdale, N.J. Comments: "Highly recommended".
! 
! Lippmann, R. P. (April 1987). An introduction to computing with neural nets.
! IEEE Acoustics, Speech, and Signal Processing Magazine. vol. 2, no. 4, pp
! 4-22. Comments: "Much acclaimed as an overview of neural networks, but
! rather inaccurate on several points. The categorization into binary and
! continuous-valued input neural networks is rather arbitrary, and may be
! confusing for the inexperienced reader. Not all networks discussed are of
! equal importance."
! 
! Maren, A., Harston, C. and Pap, R., (1990). Handbook of Neural Computing
! Applications. Academic Press. ISBN: 0-12-471260-6. (451 pages) Comments:
! "They cover a broad area"; "Introductory with suggested applications
! implementation".
! 
! Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks.
! Addison-Wesley Publishing Company, Inc. (ISBN 0-201-12584-6). Comments: "An
! excellent book that ties together classical approaches to pattern
! recognition with Neural Nets. Most other NN books do not even mention
! conventional approaches."
! 
! Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning
! representations by back-propagating errors. Nature, vol 323 (9 October), pp.
! 533-536. Comments: "Gives a very good potted explanation of backprop NN's.
! It gives sufficient detail to write your own NN simulation."
! 
! Simpson, P. K. (1990). Artificial Neural Systems: Foundations, Paradigms,
! Applications and Implementations. Pergamon Press: New York. Comments:
! "Contains a very useful 37 page bibliography. A large number of paradigms
! are presented. On the negative side the book is very shallow. Best used as a
! complement to other books".
! 
! Zeidenberg. M. (1990). Neural Networks in Artificial Intelligence. Ellis
! Horwood, Ltd., Chichester. Comments: "Gives the AI point of view".
! 
! Zornetzer, S. F., Davis, J. L. and Lau, C. (1990). An Introduction to Neural
! and Electronic Networks. Academic Press. (ISBN 0-12-781881-2) Comments:
! "Covers quite a broad range of topics (collection of articles/papers).";
! "Provides a primer-like introduction and overview for a broad audience, and
! employs a strong interdisciplinary emphasis".
! 
! The Worst
! +++++++++
! 
!    Blum, Adam (1992), Neural Networks in C++, Wiley. 
! 
!    Welstead, Stephen T. (1994), Neural Network and Fuzzy Logic
!    Applications in C/C++, Wiley. 
! 
! Both Blum and Welstead contribute to the dangerous myth that any idiot can
! use a neural net by dumping in whatever data are handy and letting it train
! for a few days. They both have little or no discussion of generalization,
! validation, and overfitting. Neither provides any valid advice on choosing
! the number of hidden nodes. If you have ever wondered where these stupid
! "rules of thumb" that pop up frequently come from, here's a source for one
! of them: 
! 
!    "A rule of thumb is for the size of this [hidden] layer to be
!    somewhere between the input layer size ... and the output layer size
!    ..." Blum, p. 60. 
! 
! (John Lazzaro tells me he recently "reviewed a paper that cited this rule of
! thumb--and referenced this book! Needless to say, the final version of that
! paper didn't include the reference!") 
! 
! Blum offers some profound advice on choosing inputs: 
! 
!    "The next step is to pick as many input factors as possible that
!    might be related to [the target]." 
! 
! Blum also shows a deep understanding of statistics: 
! 
!    "A statistical model is simply a more indirect way of learning
!    correlations. With a neural net approach, we model the problem
!    directly." p. 8. 
! 
! Blum at least mentions some important issues, however simplistic his advice
! may be. Welstead just ignores them. What Welstead gives you is code--vast
! amounts of code. I have no idea how anyone could write that much code for a
! simple feedforward NN. Welstead's approach to validation, in his chapter on
! financial forecasting, is to reserve two cases for the validation set! 
  
! My comments apply only to the text of the above books. I have not examined
! or attempted to compile the code. 
  
  ------------------------------------------------------------------------
  
! Subject: Any journals and magazines about Neural Networks?
! ==========================================================
! 
! [to be added: comments on speed of reviewing and publishing,
!               whether they accept TeX format or ASCII by e-mail, etc.]
! 
! A. Dedicated Neural Network Journals:
! +++++++++++++++++++++++++++++++++++++
! 
! Title:   Neural Networks
! Publish: Pergamon Press
! Address: Pergamon Journals Inc., Fairview Park, Elmsford,
!          New York 10523, USA and Pergamon Journals Ltd.
!          Headington Hill Hall, Oxford OX3, 0BW, England
! Freq.:   10 issues/year (vol. 1 in 1988)
! Cost/Yr: Free with INNS or JNNS or ENNS membership ($45?),
!          Individual $65, Institution $175
! ISSN #:  0893-6080
! WWW:     http://www.elsevier.nl/locate/inca/841
! Remark:  Official Journal of International Neural Network Society (INNS),
!          European Neural Network Society (ENNS) and Japanese Neural
!          Network Society (JNNS).
!          Contains Original Contributions, Invited Review Articles, Letters
!          to Editor, Book Reviews, Editorials, Announcements, Software Surveys.
! 
! Title:   Neural Computation
! Publish: MIT Press
! Address: MIT Press Journals, 55 Hayward Street Cambridge,
!          MA 02142-9949, USA, Phone: (617) 253-2889
! Freq.:   Quarterly (vol. 1 in 1989)
! Cost/Yr: Individual $45, Institution $90, Students $35; Add $9 Outside USA
! ISSN #:  0899-7667
! URL:     http://www-mitpress.mit.edu/jrnls-catalog/neural.html
! Remark:  Combination of Reviews (10,000 words), Views (4,000 words)
!          and Letters (2,000 words).  I have found this journal to be of
!          outstanding quality.
!          (Note: Remarks supplied by Mike Plonski "plonski@aero.org")
! 
! Title:   IEEE Transactions on Neural Networks
! Publish: Institute of Electrical and Electronics Engineers (IEEE)
! Address: IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ,
!          08855-1331 USA. Tel: (201) 981-0060
! Cost/Yr: $10 for Members belonging to participating IEEE societies
! Freq.:   Quarterly (vol. 1 in March 1990)
! Remark:  Devoted to the science and technology of neural networks
!          which disclose significant  technical knowledge, exploratory
!          developments and applications of neural networks from biology to
!          software to hardware.  Emphasis is on artificial neural networks.
!          Specific aspects include self organizing systems, neurobiological
!          connections, network dynamics and architecture, speech recognition,
!          electronic and photonic implementation, robotics and controls.
!          Includes Letters concerning new research results.
!          (Note: Remarks are from journal announcement)
! 
! Title:   International Journal of Neural Systems
! Publish: World Scientific Publishing
! Address: USA: World Scientific Publishing Co., 1060 Main Street, River Edge,
!          NJ 07666. Tel: (201) 487 9655; Europe: World Scientific Publishing
!          Co. Ltd., 57 Shelton Street, London WC2H 9HE, England.
!          Tel: (0171) 836 0888; Asia: World Scientific Publishing Co. Pte. Ltd.,
!          1022 Hougang Avenue 1 #05-3520, Singapore 1953, Rep. of Singapore.
!          Tel: 382 5663.
! Freq.:   Quarterly (Vol. 1 in 1990)
! Cost/Yr: Individual $122, Institution $255 (plus $15-$25 for postage)
! ISSN #:  0129-0657 (IJNS)
! Remark:  The International Journal of Neural Systems is a quarterly
!          journal which covers information processing in natural
!          and artificial neural systems. Contributions include research papers,
!          reviews, and Letters to the Editor - communications under 3,000
!          words in length, which are published within six months of receipt.
!          Other contributions are typically published within nine months.
!          The journal presents a fresh undogmatic attitude towards this
!          multidisciplinary field and aims to be a forum for novel ideas and

==> nn4.changes.body <==
*** nn4.oldbody	Sun Apr 28 23:00:22 1996
--- nn4.body	Tue May 28 23:00:21 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1996-01-06
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part4
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ4.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 13,19 ****
  Part 1: Introduction
  Part 2: Learning
! Part 3: Information resources
! Part 4: Datasets
  
     Databases for experimentation with NNs?
  
--- 13,24 ----
  Part 1: Introduction
  Part 2: Learning
! Part 3: Generalization
! Part 4: Books, data, etc.
  
+    Good literature about Neural Networks?
+    Journals and magazines about Neural Networks?
+    The most important conferences concerned with Neural Networks?
+    Neural Network Associations?
+    Other sources of information about NNs?
     Databases for experimentation with NNs?
  
***************
*** 24,27 ****
--- 29,1082 ----
  ------------------------------------------------------------------------
  
+ Subject: Good literature about Neural Networks?
+ ===============================================
+ 
+ The Best
+ ++++++++
+ 
+ The best popular introduction to NNs
+ ------------------------------------
+ 
+ Hinton, G.E. (1992), "How Neural Networks Learn from Experience", Scientific
+ American, 267 (September), 144-151. 
+ 
+ The best elementary textbooks on NNs
+ ------------------------------------
+ 
+ Masters, Timothy (1994). Practical Neural Network Recipes in C++, Academic
+ Press, ISBN 0-12-479040-2, US $45 incl. disks.
+ "Lots of very good practical advice which most other books lack."
+ 
+ Weiss, S.M. & Kulikowski, C.A. (1991), Computer Systems That Learn,
+ Morgan Kaufmann. ISBN 1 55860 065 5. 
+ 
+ The best intermediate textbooks on NNs
+ --------------------------------------
+ 
+ Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford:
+ Oxford University Press. ISBN 0-19-853849-9 (hardback) or 0-19-853864-2
+ (paperback), xvii+482 pages.
+ This is definitely the best book on neural nets for practical applications
+ (rather than for neurobiological models). It is the only textbook on neural
+ nets that I have seen that is statistically solid.
+ "Bishop is a leading researcher who has a deep understanding of the material
+ and has gone to great lengths to organize it in a sequence that makes sense.
+ He has wisely avoided the temptation to try to cover everything and has
+ therefore omitted interesting topics like reinforcement learning, Hopfield
+ networks, and Boltzmann machines in order to focus on the types of neural
+ networks that are most widely used in practical applications. He assumes
+ that the reader has the basic mathematical literacy required for an
+ undergraduate science degree, and using these tools he explains everything
+ from scratch. Before introducing the multilayer perceptron, for example, he
+ lays a solid foundation of basic statistical concepts. So the crucial
+ concept of overfitting is introduced using easily visualized examples of
+ one-dimensional polynomials and only later applied to neural networks. An
+ impressive aspect of this book is that it takes the reader all the way from
+ the simplest linear models to the very latest Bayesian multilayer neural
+ networks without ever requiring any great intellectual leaps." -Geoffrey
+ Hinton, from the foreword. 
+ 
+ Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of
+ Neural Computation. Addison-Wesley: Redwood City, California. ISBN
+ 0-201-50395-6 (hardbound) and 0-201-51560-1 (paperbound)
+ "My first impression is that this one is by far the best book on the topic.
+ And it's below $30 for the paperback."; "Well written, theoretical (but not
+ overwhelming)"; "It provides a good balance of model development,
+ computational algorithms, and applications. The mathematical derivations are
+ especially well done"; "Nice mathematical analysis on the mechanism of
+ different learning algorithms"; "It is NOT for mathematical beginner. If you
+ don't have a good grasp of higher level math, this book can be really tough
+ to get through."
+ 
+ The best advanced textbook covering NNs
+ ---------------------------------------
+ 
+ Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge:
+ Cambridge University Press, ISBN 0-521-46086-7 (hardback), xii+403 pages.
+ Brian Ripley's new book is an excellent sequel to Bishop (1995). Ripley
+ starts up where Bishop left off, with Bayesian inference and statistical
+ decision theory, and then covers some of the same material on NNs as Bishop
+ but at a higher mathematical level. Ripley also covers a variety of methods
+ that are not discussed, or discussed only briefly, by Bishop, such as
+ tree-based methods and belief networks. While Ripley is best appreciated by
+ people with a background in mathematical statistics, the numerous realistic
+ examples in his book will be of interest even to beginners in neural nets.
+ 
+ The best books on image and signal processing with NNs
+ ------------------------------------------------------
+ 
+ Masters, T. (1994), Signal and Image Processing with Neural Networks: A
+ C++ Sourcebook, Wiley.
+ 
+ Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization
+ and Signal Processing. NY: John Wiley & Sons, ISBN 0-471-930105 (hardbound),
+ 526 pages, $57.95. 
+ "Partly a textbook and partly a research monograph; introduces the basic
+ concepts, techniques, and models related to neural networks and
+ optimization, excluding rigorous mathematical details. Accessible to a wide
+ readership with a differential calculus background. The main coverage of the
+ book is on recurrent neural networks with continuous state variables. The
+ book title would be more appropriate without mentioning signal processing.
+ Well edited, good illustrations."
+ 
+ The best book on time-series forecasting with NNs
+ -------------------------------------------------
+ 
+ Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction:
+ Forecasting the Future and Understanding the Past, Addison-Wesley: Reading,
+ MA. 
+ 
+ The best comparison of NNs with other classification methods
+ ------------------------------------------------------------
+ 
+ Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), Machine Learning,
+ Neural and Statistical Classification, Ellis Horwood. 
+ 
+ Books for the Beginner:
+ +++++++++++++++++++++++
+ 
+ Aleksander, I. and Morton, H. (1990). An Introduction to Neural Computing.
+ Chapman and Hall. (ISBN 0-412-37780-2). Comments: "This book seems to be
+ intended for the first year of university education."
+ 
+ Beale, R. and Jackson, T. (1990). Neural Computing, an Introduction. Adam
+ Hilger, IOP Publishing Ltd : Bristol. (ISBN 0-85274-262-2). Comments: "It's
+ clearly written. Lots of hints as to how to get the adaptive models covered
+ to work (not always well explained in the original sources). Consistent
+ mathematical terminology. Covers perceptrons, error-backpropagation, Kohonen
+ self-org model, Hopfield type models, ART, and associative memories."
+ 
+ Dayhoff, J. E. (1990). Neural Network Architectures: An Introduction. Van
+ Nostrand Reinhold: New York. Comments: "Like Wasserman's book, Dayhoff's
+ book is also very easy to understand".
+ 
+ Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures,
+ Algorithms and Applications, Prentice Hall, ISBN 0-13-334186-0. Also
+ published as a Prentice Hall International Edition, ISBN 0-13-042250-9.
+ Sample software (source code listings in C and Fortran) is included in an
+ Instructor's Manual. "Intermediate in level between Wasserman and
+ Hertz/Krogh/Palmer. Algorithms for a broad range of neural networks,
+ including a chapter on Adaptive Resonance Theory with ART2. Simple examples
+ for each network."
+ 
+ Freeman, James (1994). Simulating Neural Networks with Mathematica,
+ Addison-Wesley, ISBN: 0-201-56629-X. Helps the reader make his own NNs. The
+ Mathematica code for the programs in the book is also available through the
+ internet: Send mail to MathSource@wri.com or try http://www.wri.com/ on the
+ World Wide Web.
+ 
+ Gately, E. (1996). Neural Networks for Financial Forecasting. New York:
+ John Wiley and Sons, Inc.
+ Franco Insana comments:
+ 
+ * Decent book for the neural net beginner
+ * Very little devoted to statistical framework, although there 
+     is some formulation of backprop theory
+ * Some food for thought
+ * Nothing here for those with any neural net experience
+ 
+ Haykin, S. (1994). Neural Networks, a Comprehensive Foundation.
+ Macmillan, New York, NY.
+ "A very readable, well-written intermediate text on NNs. Perspective is
+ primarily one of pattern recognition, estimation and signal processing.
+ However, there are well-written chapters on neurodynamics and VLSI
+ implementation. Though there is emphasis on formal mathematical models of
+ NNs as universal approximators, statistical estimators, etc., there are also
+ examples of NNs used in practical applications. The problem sets at the end
+ of each chapter nicely complement the material. In the bibliography are over
+ 1000 references."
+ 
+ Hecht-Nielsen, R. (1990). Neurocomputing. Addison Wesley. Comments: "A
+ good book", "comprises a nice historical overview and a chapter about NN
+ hardware. Well structured prose. Makes important concepts clear."
+ 
+ McClelland, J. L. and Rumelhart, D. E. (1988). Explorations in Parallel
+ Distributed Processing: Computational Models of Cognition and Perception
+ (software manual). The MIT Press. Comments: "Written in a tutorial style,
+ and includes 2 diskettes of NN simulation programs that can be compiled on
+ MS-DOS or Unix (and they do, too!)"; "The programs are pretty reasonable as
+ an introduction to some of the things that NNs can do."; "There are *two*
+ editions of this book. One comes with disks for the IBM PC, the other comes
+ with disks for the Macintosh".
+ 
+ McCord Nelson, M. and Illingworth, W.T. (1990). A Practical Guide to Neural
+ Nets. Addison-Wesley Publishing Company, Inc. (ISBN 0-201-52376-0).
+ Comments: "No formulas at all"; "It does not have much detailed model
+ development (very few equations), but it does present many areas of
+ application. It includes a chapter on current areas of research. A variety
+ of commercial applications is discussed in chapter 1. It also includes a
+ program diskette with a fancy graphical interface (unlike the PDP
+ diskette)".
+ 
+ Muller, B., Reinhardt, J., Strickland, M. T. (1995). Neural Networks. An
+ Introduction (2nd ed.). Berlin, Heidelberg, New York: Springer-Verlag. ISBN
+ 3-540-60207-0. (DOS 3.5" disk included.) Comments: The book was developed
+ out of a course on neural-network models with computer demonstrations that
+ the authors taught to physics students, and it comes with a PC diskette.
+ The book is divided into three parts: (1) Models of Neural Networks,
+ describing several architectures and learning rules, including the
+ mathematics; (2) Statistical Physics of Neural Networks, a "hard-core"
+ physics section developing formal theories of stochastic neural networks;
+ (3) Computer Codes, explaining the demonstration programs. The first part
+ gives a nice introduction to neural networks together with the formulas,
+ and the demonstration programs help the reader develop a 'feel' for neural
+ networks.
+ 
+ Orchard, G.A. & Phillips, W.A. (1991). Neural Computation: A Beginner's
+ Guide. Lawrence Erlbaum Associates: London. Comments: "Short user-friendly
+ introduction to the area, with a non-technical flavour. Apparently
+ accompanies a software package, but I haven't seen that yet".
+ 
+ Rao, V.B & H.V. (1993). C++ Neural Networks and Fuzzy Logic. MIS:Press,
+ ISBN 1-55828-298-x, US $45 incl. disks. "Probably not 'leading edge' stuff
+ but detailed enough to get your hands dirty!"
+ 
+ Wasserman, P. D. (1989). Neural Computing: Theory & Practice. Van Nostrand
+ Reinhold: New York. (ISBN 0-442-20743-3) Comments: "Wasserman flatly
+ enumerates some common architectures from an engineer's perspective ('how it
+ works') without ever addressing the underlying fundamentals ('why it works')
+ - important basic concepts such as clustering, principal components or
+ gradient descent are not treated. It's also full of errors, and unhelpful
+ diagrams drawn with what appears to be PCB board layout software from the
+ '70s. For anyone who wants to do active research in the field I consider it
+ quite inadequate"; "Okay, but too shallow"; "Quite easy to understand"; "The
+ best bedtime reading for Neural Networks. I have given this book to numerous
+ colleagues who want to know NN basics, but who never plan to implement
+ anything. An excellent book to give your manager."
+ 
+ Wasserman, P.D. (1993). Advanced Methods in Neural Computing. Van
+ Nostrand Reinhold: New York (ISBN: 0-442-00461-3). Comments: Several neural
+ network topics are discussed, e.g., Probabilistic Neural Networks,
+ Backpropagation and beyond, neural control, Radial Basis Function Networks,
+ Neural Engineering. Furthermore, several subjects related to neural networks
+ are mentioned, e.g., genetic algorithms, fuzzy logic, and chaos. Just the
+ functionality of these subjects is described; enough to get you started.
+ Lots of references are given to more elaborate descriptions. Easy to read,
+ no extensive mathematical background necessary.
+ 
+ Zurada, Jacek M. (1992). Introduction To Artificial Neural Systems.
+ Hardcover, 785 Pages, 317 Figures, ISBN 0-534-95460-X, 1992, PWS Publishing
+ Company, Price: $56.75 (includes shipping, handling, and the ANS software
+ diskette). Solutions Manual available.
+ "Cohesive and comprehensive book on neural nets; as an engineering-oriented
+ introduction, but also as a research foundation. Thorough exposition of
+ fundamentals, theory and applications. Training and recall algorithms appear
+ in boxes showing steps of algorithms, thus making programming of learning
+ paradigms easy. Many illustrations and intuitive examples. Winner among NN
+ textbooks at a senior UG/first year graduate level-[175 problems]."
+ Contents: Intro, Fundamentals of Learning, Single-Layer & Multilayer
+ Perceptron NN, Assoc. Memories, Self-organizing and Matching Nets,
+ Applications, Implementations, Appendix) 
+ 
+ The Classics:
+ +++++++++++++
+ 
+ Kohonen, T. (1984). Self-organization and Associative Memory.
+ Springer-Verlag: New York. (2nd Edition: 1988; 3rd edition: 1989). Comments:
+ "The section on Pattern mathematics is excellent."
+ 
+ Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed
+ Processing: Explorations in the Microstructure of Cognition (volumes 1 & 2).
+ The MIT Press. Comments: "As a computer scientist I found the two Rumelhart
+ and McClelland books really heavy going and definitely not the sort of thing
+ to read if you are a beginner."; "It's quite readable, and affordable (about
+ $65 for both volumes)."; "THE Connectionist bible".
+ 
+ Introductory Journal Articles:
+ ++++++++++++++++++++++++++++++
+ 
+ Hinton, G. E. (1989). Connectionist learning procedures. Artificial
+ Intelligence, Vol. 40, pp. 185--234. Comments: "One of the better neural
+ networks overview papers, although the distinction between network topology
+ and learning algorithm is not always very clear. Could very well be used as
+ an introduction to neural networks."
+ 
+ Knight, K. (1990). Connectionist Ideas and Algorithms. Communications of
+ the ACM. November 1990. Vol.33 nr.11, pp 59-74. Comments: "A good article;
+ it should be easy for most people to find a copy of this journal."
+ 
+ Kohonen, T. (1988). An Introduction to Neural Computing. Neural Networks,
+ vol. 1, no. 1. pp. 3-16. Comments: "A general review".
+ 
+ Not-quite-so-introductory Literature:
+ +++++++++++++++++++++++++++++++++++++
+ 
+ Anderson, J. A. and Rosenfeld, E. (Eds). (1988). Neurocomputing:
+ Foundations of Research. The MIT Press: Cambridge, MA. Comments: "An
+ expensive book, but excellent for reference. It is a collection of reprints
+ of most of the major papers in the field." 
+ 
+ Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (Eds). (1990). 
+ Neurocomputing 2: Directions for Research. The MIT Press: Cambridge, MA.
+ Comments: "The sequel to their well-known Neurocomputing book."
+ 
+ Caudill, M. and Butler, C. (1990). Naturally Intelligent Systems. MIT Press:
+ Cambridge, Massachusetts. (ISBN 0-262-03156-6). Comments: "I guess one of
+ the best books I read"; "May not be suited for people who want to do some
+ research in the area".
+ 
+ Khanna, T. (1990). Foundations of Neural Networks. Addison-Wesley: New
+ York. Comments: "Not so bad, though it has a page of erroneous formulas (if
+ I remember correctly), and the number of hidden layers isn't well
+ described."; "Khanna's intention in writing his book with math analysis
+ should be commended, but he made several mistakes in the math part".
+ 
+ Kung, S.Y. (1993). Digital Neural Networks, Prentice Hall, Englewood
+ Cliffs, NJ.
+ 
+ Levine, D. S. (1990). Introduction to Neural and Cognitive Modeling.
+ Lawrence Erlbaum: Hillsdale, N.J. Comments: "Highly recommended".
+ 
+ Lippmann, R. P. (April 1987). An introduction to computing with neural nets.
+ IEEE Acoustics, Speech, and Signal Processing Magazine. vol. 2, no. 4, pp
+ 4-22. Comments: "Much acclaimed as an overview of neural networks, but
+ rather inaccurate on several points. The categorization into binary and
+ continuous-valued input neural networks is rather arbitrary, and may be
+ confusing for the inexperienced reader. Not all networks discussed are of
+ equal importance."
+ 
+ Maren, A., Harston, C. and Pap, R., (1990). Handbook of Neural Computing
+ Applications. Academic Press. ISBN: 0-12-471260-6. (451 pages) Comments:
+ "They cover a broad area"; "Introductory with suggested applications
+ implementation".
+ 
+ Pao, Y. H. (1989). Adaptive Pattern Recognition and Neural Networks.
+ Addison-Wesley Publishing Company, Inc. (ISBN 0-201-12584-6) Comments: "An
+ excellent book that ties together classical approaches to pattern
+ recognition with Neural Nets. Most other NN books do not even mention
+ conventional approaches."
+ 
+ Refenes, A. (Ed.) (1995). Neural Networks in the Capital Markets.
+ Chichester, England: John Wiley and Sons, Inc.
+ Franco Insana comments:
+ 
+ * Not for the beginner
+ * Excellent introductory material presented by editor in first 5 
+   chapters, which could be a valuable reference source for any 
+   practitioner
+ * Very thought-provoking
+ * Mostly backprop-related
+ * Most contributors lay good statistical foundation
+ * Overall, a wealth of information and ideas, but the reader has to 
+   sift through it all to come away with anything useful
+ 
+ Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning
+ representations by back-propagating errors. Nature, vol 323 (9 October), pp.
+ 533-536. Comments: "Gives a very good potted explanation of backprop NN's.
+ It gives sufficient detail to write your own NN simulation."
+ 
+ Simpson, P. K. (1990). Artificial Neural Systems: Foundations, Paradigms,
+ Applications and Implementations. Pergamon Press: New York. Comments:
+ "Contains a very useful 37 page bibliography. A large number of paradigms
+ are presented. On the negative side the book is very shallow. Best used as a
+ complement to other books".
+ 
+ Zeidenberg. M. (1990). Neural Networks in Artificial Intelligence. Ellis
+ Horwood, Ltd., Chichester. Comments: "Gives the AI point of view".
+ 
+ Zornetzer, S. F., Davis, J. L. and Lau, C. (1990). An Introduction to Neural
+ and Electronic Networks. Academic Press. (ISBN 0-12-781881-2) Comments:
+ "Covers quite a broad range of topics (a collection of articles/papers).";
+ "Provides a primer-like introduction and overview for a broad audience, and
+ employs a strong interdisciplinary emphasis".
+ 
+ The Worst
+ +++++++++
+ 
+    Blum, Adam (1992), Neural Networks in C++, Wiley. 
+ 
+    Welstead, Stephen T. (1994), Neural Network and Fuzzy Logic
+    Applications in C/C++, Wiley. 
+ 
+ Both Blum and Welstead contribute to the dangerous myth that any idiot can
+ use a neural net by dumping in whatever data are handy and letting it train
+ for a few days. They both have little or no discussion of generalization,
+ validation, and overfitting. Neither provides any valid advice on choosing
+ the number of hidden nodes. If you have ever wondered where these stupid
+ "rules of thumb" that pop up frequently come from, here's a source for one
+ of them: 
+ 
+    "A rule of thumb is for the size of this [hidden] layer to be
+    somewhere between the input layer size ... and the output layer size
+    ..." Blum, p. 60. 
+ 
+ (John Lazzaro tells me he recently "reviewed a paper that cited this rule of
+ thumb--and referenced this book! Needless to say, the final version of that
+ paper didn't include the reference!") 
+ 
+ Blum offers some profound advice on choosing inputs: 
+ 
+    "The next step is to pick as many input factors as possible that
+    might be related to [the target]." 
+ 
+ Blum also shows a deep understanding of statistics: 
+ 
+    "A statistical model is simply a more indirect way of learning
+    correlations. With a neural net approach, we model the problem
+    directly." p. 8. 
+ 
+ Blum at least mentions some important issues, however simplistic his advice
+ may be. Welstead just ignores them. What Welstead gives you is code--vast
+ amounts of code. I have no idea how anyone could write that much code for a
+ simple feedforward NN. Welstead's approach to validation, in his chapter on
+ financial forecasting, is to reserve two cases for the validation set! 
+ 
+ My comments apply only to the text of the above books. I have not examined
+ or attempted to compile the code. 
+ 
+ ------------------------------------------------------------------------
+ 
+ Subject: Journals and magazines about Neural Networks?
+ ======================================================
+ 
+ [to be added: comments on speed of reviewing and publishing,
+               whether they accept TeX format or ASCII by e-mail, etc.]
+ 
+ A. Dedicated Neural Network Journals:
+ +++++++++++++++++++++++++++++++++++++
+ 
+ Title:   Neural Networks
+ Publish: Pergamon Press
+ Address: Pergamon Journals Inc., Fairview Park, Elmsford,
+          New York 10523, USA and Pergamon Journals Ltd.
+          Headington Hill Hall, Oxford OX3, 0BW, England
+ Freq.:   10 issues/year (vol. 1 in 1988)
+ Cost/Yr: Free with INNS or JNNS or ENNS membership ($45?),
+          Individual $65, Institution $175
+ ISSN #:  0893-6080
+ WWW:     http://www.elsevier.nl/locate/inca/841
+ Remark:  Official Journal of International Neural Network Society (INNS),
+          European Neural Network Society (ENNS) and Japanese Neural
+          Network Society (JNNS).
+          Contains Original Contributions, Invited Review Articles, Letters
+          to Editor, Book Reviews, Editorials, Announcements, Software Surveys.
+ 
+ Title:   Neural Computation
+ Publish: MIT Press
+ Address: MIT Press Journals, 55 Hayward Street Cambridge,
+          MA 02142-9949, USA, Phone: (617) 253-2889
+ Freq.:   Quarterly (vol. 1 in 1989)
+ Cost/Yr: Individual $45, Institution $90, Students $35; Add $9 Outside USA
+ ISSN #:  0899-7667
+ URL:     http://www-mitpress.mit.edu/jrnls-catalog/neural.html
+ Remark:  Combination of Reviews (10,000 words), Views (4,000 words)
+          and Letters (2,000 words).  I have found this journal to be of
+          outstanding quality.
+          (Note: Remarks supplied by Mike Plonski "plonski@aero.org")
+ 
+ Title:   IEEE Transactions on Neural Networks
+ Publish: Institute of Electrical and Electronics Engineers (IEEE)
+ Address: IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ,
+          08855-1331 USA. Tel: (201) 981-0060
+ Cost/Yr: $10 for Members belonging to participating IEEE societies
+ Freq.:   Quarterly (vol. 1 in March 1990)
+ URL:     http://www.ieee.org/nnc/pubs/transactions.html
+ Remark:  Devoted to the science and technology of neural networks
+          which disclose significant  technical knowledge, exploratory
+          developments and applications of neural networks from biology to
+          software to hardware.  Emphasis is on artificial neural networks.
+          Specific aspects include self organizing systems, neurobiological
+          connections, network dynamics and architecture, speech recognition,
+          electronic and photonic implementation, robotics and controls.
+          Includes Letters concerning new research results.
+          (Note: Remarks are from journal announcement)
+ 
+ Title:   International Journal of Neural Systems
+ Publish: World Scientific Publishing
+ Address: USA: World Scientific Publishing Co., 1060 Main Street, River Edge,
+          NJ 07666. Tel: (201) 487 9655; Europe: World Scientific Publishing
+          Co. Ltd., 57 Shelton Street, London WC2H 9HE, England.
+          Tel: (0171) 836 0888; Asia: World Scientific Publishing Co. Pte. Ltd.,
+          1022 Hougang Avenue 1 #05-3520, Singapore 1953, Rep. of Singapore

==> nn5.changes.body <==
*** nn5.oldbody	Sun Apr 28 23:00:26 1996
--- nn5.body	Tue May 28 23:00:25 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part5
! Last-modified: 1996-04-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ5.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part5
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ5.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 13,18 ****
  Part 1: Introduction
  Part 2: Learning
! Part 3: Information resources
! Part 4: Datasets
  Part 5: Free software
  
--- 13,18 ----
  Part 1: Introduction
  Part 2: Learning
! Part 3: Generalization
! Part 4: Books, data, etc.
  Part 5: Free software
  
***************
*** 65,69 ****
  30. AINET 
  
! Here are the full descriptions and references: 
  
  1. Rochester Connectionist Simulator
--- 65,69 ----
  30. AINET 
  
! See also http://www.emsl.pnl.gov:2080/docs/cie/neural/systems/shareware.html
  
  1. Rochester Connectionist Simulator
***************
*** 149,154 ****
     makes use of the Mac's graphical interface, and provides a number of
     tools for building, editing, training, testing and examining networks.
!    This program is available by anonymous ftp from dartvax.dartmouth.edu
!    [129.170.16.4] as /pub/mac/dartnet.sit.hqx (124 KB). 
  
  10. SNNS
--- 149,154 ----
     makes use of the Mac's graphical interface, and provides a number of
     tools for building, editing, training, testing and examining networks.
!    This program is available by anonymous ftp from ftp.dartmouth.edu as 
!    /pub/mac/dartnet.sit.hqx (124 KB). 
  
  10. SNNS
***************
*** 201,220 ****
     software is available from two FTP sites: from CMU's simulator collection
     on pt.cs.cmu.edu [128.2.254.155] in 
!    /afs/cs/project/connect/code/am6.tar.Z and from UCLA's cognitive science
!    machine ftp.cognet.ucla.edu [128.97.50.19] in /pub/alexis/am6.tar.Z (2
!    MB). 
! 
! 12. Adaptive Logic Network Educational Kit (for Windows)
! ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
! 
!    This package differs from the traditional nets in that it uses logic
!    functions AND and OR in all hidden layers but the first, which uses
!    simple perceptrons. This representation of functions from real valued
!    inputs to real outputs allows the user to impose constraints on the
!    learned solution (monotonicity, convexity,...). Execution software is
!    provided in C source form for experimenters. Anonymous ftp from
!    ftp.cs.ualberta.ca in directory /pub/atree/atree3/. See files 
!    atree3ek.exe and atree3ek.brief.guide, This software is the same as the
!    commercial Atree 3.0 program for functions of one or two inputs. 
  
  13. NeuralShell
--- 201,235 ----
     software is available from two FTP sites: from CMU's simulator collection
     on pt.cs.cmu.edu [128.2.254.155] in 
!    /afs/cs/project/connect/code/unsupported/am6.tar.Z and from UCLA's
!    cognitive science machine ftp.cognet.ucla.edu [128.97.50.19] in 
!    /pub/alexis/am6.tar.Z (2 MB). 
! 
! 12. Adaptive Logic Network Educational Kit (for Windows) 
! +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
! 
!    The type of neural net used in the Atree 3.0 Educational Kit (EK) package
!    differs from the traditional one. Logic functions AND and OR form the
!    units in all hidden layers but the first, which uses simple perceptrons.
!    Though this net can't compute real-valued outputs, since its outputs
!    are strictly boolean, it can easily and naturally represent real valued
!    functions by giving a 0 above the function's graph and a 1 otherwise.
!    This unorthodox approach is extremely useful, since it allows the user to
!    impose constraints on the functions to be learned (monotonicity, bounds
!    on slopes, convexity,...). Very rapid computation of functions is done by
!    an ALN decision tree at whose leaves are small expressions of minimum
!    and maximum operations acting on linear functions. 
! 
!    Two simple languages describe ALNs and the steps of training an ALN.
!    Execution software for ALN decision trees resulting from training is
!    provided in C source form for experimenters. EK and a brief guide are
!    obtained by anonymous ftp from ftp.cs.ualberta.ca in directory
!    /pub/atree/atree3/. Get the files atree3ek.exe and atree3ek.brief.guide. 
! 
!    An extensive User's Guide with an introduction to basic ALN theory is
!    available on WWW at http://www.cs.ualberta.ca/~arms/guide/ch0.htm . This
!    Educational Kit software is the same as the commercial Atree 3.0 program
!    except that it allows only two input variables and is licensed for
!    educational uses only. A built-in 2D and 3D plotting capability is useful
!    to help the user understand how ALNs work. 
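
   The min/max-of-linear-functions idea can be sketched in a few lines of
   Python. This is a hypothetical illustration, not the Atree 3.0 code; the
   names linear and aln_eval are made up for the example. A single max node
   over linear leaves yields a convex piecewise-linear function, which is
   how such a tree can approximate, e.g., f(x) = x**2:

```python
# Hypothetical sketch of the ALN decision-tree idea (not the Atree
# 3.0 code): leaves are linear functions, internal nodes take min or
# max of their children. Here a one-node "tree" (a single max over
# linear leaves) approximates f(x) = x**2 by its tangent lines.

def linear(a, b):
    """Leaf: the linear function x -> a*x + b."""
    return lambda x: a * x + b

# Tangent to x**2 at t is y = 2*t*x - t**2.
leaves = [linear(2 * t, -t * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def aln_eval(x):
    # Max over the linear leaves; a max of linears is convex.
    # (Constraining all slopes a >= 0 would instead enforce
    # monotonicity, as described above.)
    return max(leaf(x) for leaf in leaves)

print(aln_eval(0.5))   # prints 0.25, i.e. 0.5**2 (a tangent point)
```

   Adding min nodes above max nodes lifts the convexity restriction, giving
   general piecewise-linear functions while still allowing the slope and
   monotonicity constraints mentioned above.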
  
  13. NeuralShell

==> nn6.changes.body <==
*** nn6.oldbody	Sun Apr 28 23:00:30 1996
--- nn6.body	Tue May 28 23:00:29 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part6
! Last-modified: 1996-03-28
  URL: ftp://ftp.sas.com/pub/neural/FAQ6.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part6
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ6.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 13,18 ****
  Part 1: Introduction
  Part 2: Learning
! Part 3: Information resources
! Part 4: Datasets
  Part 5: Free software
  Part 6: Commercial software
--- 13,18 ----
  Part 1: Introduction
  Part 2: Learning
! Part 3: Generalization
! Part 4: Books, data, etc.
  Part 5: Free software
  Part 6: Commercial software
***************
*** 30,41 ****
  
  Note for future submissions: Please restrict product descriptions to a
! maximum of 60 lines of 72 characters, and send an HTML-formatted version if
! possible. If you include the standard header (name, company, address, etc.),
! you need not count the header in the 60 line maximum.Try to make the
! descriptions objective, and avoid making implicit or explicit assertions
! about competing products, such as "Our product is the *only* one that does
! so-and-so." The FAQ maintainer reserves the right to remove excessive
! marketing hype and to edit submissions to conform to size requirements; if
! he is in a good mood, he may also correct your spelling and punctuation. 
  
  The following simulators are described below: 
--- 30,44 ----
  
  Note for future submissions: Please restrict product descriptions to a
! maximum of 60 lines of 72 characters, in either plain-text format or,
! preferably, HTML format. If you include the standard header (name, company,
! address, etc.), you need not count the header in the 60 line maximum. Please
! confine your HTML to features that are supported by most browsers,
! especially NCSA Mosaic 2.0; avoid tables for example--use <pre> instead. Try
! to make the descriptions objective, and avoid making implicit or explicit
! assertions about competing products, such as "Our product is the *only* one
! that does so-and-so." The FAQ maintainer reserves the right to remove
! excessive marketing hype and to edit submissions to conform to size
! requirements; if he is in a good mood, he may also correct your spelling and
! punctuation. 
  
  The following simulators are described below: 
***************
*** 69,73 ****
--- 72,79 ----
  27. NeuFrame, NeuroFuzzy, NeuDesk and NeuRun 
  28. OWL Neural Network Library (TM) 
+ 29. Neural Connection 
  
+ See also http://www.emsl.pnl.gov:2080/docs/cie/neural/systems/software.html 
+ 
  1. nn/xnn
  +++++++++
***************
*** 207,211 ****
          Fax: (919) 677-4444          (49) 6221 474 850
        Email: saswss@unx.sas.com (Neural net macros)
!              eurgxh@mvs.sas.com (Neural net GUI)
          URL: ftp://ftp.sas.com/pub/neural/README
     Operating systems for macros: MS Windows (3.1, 95, NT) IBM OS/2 (2.1, 3.0, Warp),
--- 213,217 ----
          Fax: (919) 677-4444          (49) 6221 474 850
        Email: saswss@unx.sas.com (Neural net macros)
!              sasjub@unx.sas.com or eurgxh@mvs.sas.com (Neural net GUI)
          URL: ftp://ftp.sas.com/pub/neural/README
     Operating systems for macros: MS Windows (3.1, 95, NT) IBM OS/2 (2.1, 3.0, Warp),
***************
*** 226,242 ****
     There is also the SAS Neural Network Application including a graphical
     user interface, on-site training and customisation. For prices and other
!    information, send email to eurgxh@mvs.sas.com or call the European
!    office. 
! 
!    TNN is an elaborate system of macros for feedforward neural nets
!    including multilayer perceptrons, radial basis functions, statistical
!    versions of counterpropagation and learning vector quantization, a
!    variety of built-in activation and error functions, multiple hidden
!    layers, direct input-output connections, missing value handling,
!    categorical variables, standardization of inputs and targets, and
!    multiple preliminary optimizations from random initial values to avoid
!    local minima. Training is done by state-of-the-art numerical optimization
!    algorithms instead of tedious backprop. Maximum likelihood and
!    hierarchical Bayesian training are provided for a wide range of noise
     distributions. TNN requires the SAS/OR product in release 6.08 or later.
     Release 6.10 or later is strongly recommended. Release 6.10 is required
--- 232,246 ----
     There is also the SAS Neural Network Application including a graphical
     user interface, on-site training and customisation. For prices and other
!    information, send email to sasjub@unx.sas.com (North America) or
!    eurgxh@mvs.sas.com (Europe). TNN is an elaborate system of macros for
!    feedforward neural nets including multilayer perceptrons, radial basis
!    functions, statistical versions of counterpropagation and learning vector
!    quantization, a variety of built-in activation and error functions,
!    multiple hidden layers, direct input-output connections, missing value
!    handling, categorical variables, standardization of inputs and targets,
!    and multiple preliminary optimizations from random initial values to
!    avoid local minima. Training is done by state-of-the-art numerical
!    optimization algorithms instead of tedious backprop. Maximum likelihood
!    and hierarchical Bayesian training are provided for a wide range of noise
     distributions. TNN requires the SAS/OR product in release 6.08 or later.
     Release 6.10 or later is strongly recommended. Release 6.10 is required
***************
*** 257,263 ****
         Phone: (412) 787-8222
           FAX: (412) 787-8220
!        Email: sales@nware.com (soon to change to: sales@neuralware.com).
!     Comments: We are also putting up a web page which should be operational
!               by Christmas or shortly afterward.
  
      Distributor for Europe:
--- 261,266 ----
         Phone: (412) 787-8222
           FAX: (412) 787-8220
!        Email: sales@neuralware.com
!          URL: http://www.neuralware.com/
  
      Distributor for Europe:
***************
*** 1246,1250 ****
     NeuFrame
     NeuFrame provides an easy-to-use, visual, object-oriented approach to
!    problem solving using intelligence technologies, including nneural
     networks and neurofuzzy techniques. It provides features to enable
     businesses to investigate and apply intelligence technologies from
--- 1249,1253 ----
     NeuFrame
     NeuFrame provides an easy-to-use, visual, object-oriented approach to
!    problem solving using intelligence technologies, including neural
     networks and neurofuzzy techniques. It provides features to enable
     businesses to investigate and apply intelligence technologies from
***************
*** 1373,1376 ****
--- 1376,1451 ----
       Outside USA and Canada: (US) $350 object code, (US) $1050 with source
       Shipping, taxes, duties, etc., are the responsibility of the customer.
+ 
+ 29. Neural Connection
+ +++++++++++++++++++++
+ 
+        Name: Neural Connection
+     Company: SPSS Inc.
+     Address: 444 N. Michigan Ave., Chicago, IL 60611
+       Phone: 1-800-543-2185
+              1-312-329-3500 (U.S. and Canada)
+         Fax: 1-312-329-3668 (U.S. and Canada)
+       Email: sales@spss.com
+         URL: http://www.spss.com
+ 
+    SPSS has offices worldwide.  For inquiries outside the U.S. and Canada, 
+    please contact the U.S. office to locate the office nearest you.
+ 
+    Operating system   : Microsoft Windows 3.1 (runs in Windows 95)
+    System requirements: 386 pc or better, 4 MB memory (8MB recommended), 4 MB 
+    free hard disk space, VGA or SVGA monitor, Mouse or other pointing device, 
+    Math coprocessor strongly recommended
+    Price: $995, academic discounts available
+ 
+    Description
+    Neural Connection is a graphical neural network tool which uses an
+    icon-based workspace for building models for prediction, classification,
+    time series forecasting and data segmentation.  It includes extensive
+    data management capabilities so your data preparation is easily done
+    right within Neural Connection.  Several output tools give you the
+    ability to explore your models thoroughly so you understand your
+    results.
+ 
+    Modeling and Forecasting tools
+    * 3 neural network tools: Multi-Layer Perceptron, Radial Basis Function, 
+      Kohonen network
+    * 3 statistical analysis tools: Multiple linear regression, Closest 
+      class means classifier, Principal component analysis
+ 
+    Data Management tools
+    * Filter tool: transformations, trimming, descriptive statistics, 
+      select/deselect variables for analysis, histograms
+    * Time series window: single- or multi-step prediction, adjustable step 
+      size
+    * Network combiner
+    * Simulator
+    * Split dataset: training, validation and test data
+    * Handles missing values
+ 
+    Output Options
+    * Text output: writes ASCII and SPSS (.sav) files, actual values, 
+      probabilities, classification results table, network output
+    * Graphical output: 3-D contour plot, rotation capabilities, WhatIf? tool 
+      includes interactive sensitivity plot, cross section, color contour
+      plot
+    * Time series plot
+ 
+    Production Tools
+    * Scripting language for batch jobs and interactive applications
+    * Scripting language for building applications
+ 
+    Documentation
+    * User's guide includes tutorial, operations and algorithms
+    * Guide to neural network applications
+ 
+    Example Applications
+    * Finance - predict account attrition
+    * Marketing - customer segmentation
+    * Medical - predict length of stay in hospital
+    * Consulting - forecast construction needs of federal court systems
+    * Utilities - predict number of service requests
+    * Sales - forecast product demand and sales
+    * Science - classify climate types
+ 
  
  ------------------------------------------------------------------------

==> nn7.changes.body <==
*** nn7.oldbody	Sun Apr 28 23:00:33 1996
--- nn7.body	Tue May 28 23:00:33 1996
***************
*** 1,4 ****
  Archive-name: ai-faq/neural-nets/part7
! Last-modified: 1996-03-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ7.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
--- 1,4 ----
  Archive-name: ai-faq/neural-nets/part7
! Last-modified: 1996-05-27
  URL: ftp://ftp.sas.com/pub/neural/FAQ7.html
  Maintainer: saswss@unx.sas.com (Warren S. Sarle)
***************
*** 13,18 ****
  Part 1: Introduction
  Part 2: Learning
! Part 3: Information resources
! Part 4: Datasets
  Part 5: Free software
  Part 6: Commercial software
--- 13,18 ----
  Part 1: Introduction
  Part 2: Learning
! Part 3: Generalization
! Part 4: Books, data, etc.
  Part 5: Free software
  Part 6: Commercial software
***************
*** 27,33 ****
  =================================
  
! [who will write some short comment on the most important HW-packages and
! chips?]
  
  Various NN HW information can be found in the Web site 
  http://www1.cern.ch/NeuralNets/nnwInHepHard.html (from people who really use
--- 27,41 ----
  =================================
  
! Overview articles: 
  
+  o Ienne, Paolo and Kuhn, Gary (1995), "Digital Systems for Neural
+    Networks", in Papamichalis, P. and Kerwin, R., eds., Digital Signal
+    Processing Technology, Critical Reviews Series CR57, Orlando, FL: SPIE
+    Optical Engineering, pp. 314-345, 
+    ftp://mantraftp.epfl.ch/mantra/ienne.spie95.A4.ps.gz or 
+    ftp://mantraftp.epfl.ch/mantra/ienne.spie95.US.ps.gz 
+  o ftp://ftp.mrc-apu.cam.ac.uk/pub/nn/murre/neurhard.ps (1995) 
+  o ftp://ftp.urc.tue.nl/pub/neural/hardware_general.ps.gz (1993) 
+ 
  Various NN HW information can be found in the Web site 
  http://www1.cern.ch/NeuralNets/nnwInHepHard.html (from people who really use
***************
*** 35,45 ****
  http://www1.cern.ch/NeuralNets/nnwInHepExpt.html 
  
- There is an overview article from 1993 available from 
- ftp://ftp.urc.tue.nl/pub/neural/hardware_general.ps.gz and another from 1995
- can be found in ftp://ftp.mrc-apu.cam.ac.uk/pub/nn/murre/neurhard.ps 
- 
  Further WWW pointers to NN Hardware:
  http://msia02.msi.se/~lindsey/nnwAtm.html
! ...
  Here is a short list of companies: 
  
--- 43,49 ----
  http://www1.cern.ch/NeuralNets/nnwInHepExpt.html 
  
  Further WWW pointers to NN Hardware:
  http://msia02.msi.se/~lindsey/nnwAtm.html
! 
  Here is a short list of companies: 
  
***************
*** 346,354 ****
  FAQ maintainer at saswss@unx.sas.com. 
  
-  o How do NNs generalize? 
   o What is the curse of dimensionality? 
   o How many training cases do I need? 
-  o How many layers should be used? 
   o How should I split the data into training and validation sets? 
   o What are some good constructive training algorithms? 
   o How can on-line/incremental training be done effectively? 
--- 350,357 ----
  FAQ maintainer at saswss@unx.sas.com. 
  
   o What is the curse of dimensionality? 
   o How many training cases do I need? 
   o How should I split the data into training and validation sets? 
+  o What error functions can be used? 
   o What are some good constructive training algorithms? 
   o How can on-line/incremental training be done effectively? 
***************
*** 356,361 ****
   o How can I select important input variables? 
   o How to handle missing data? 
-  o Comparison of MLPs and RBF networks? 
   o Should NNs be used in safety-critical applications? 
   o My net won't learn! What should I do??? 
   o My net won't generalize! What should I do??? 
--- 359,364 ----
   o How can I select important input variables? 
   o How to handle missing data? 
   o Should NNs be used in safety-critical applications? 
+  o What does unsupervised learning learn? 
   o My net won't learn! What should I do??? 
   o My net won't generalize! What should I do??? 
***************
*** 409,413 ****
   o Wlodzislaw Duch <duch@phys.uni.torun.pl> 
   o E. Robert Tisdale <edwin@flamingo.cs.ucla.edu> 
!  o Athanasios Episcopos <EPISCOPO@icarus.som.clarkson.edu> 
   o Frank Schnorrenberg <fs0997@easttexas.tamu.edu> 
   o Gary Lawrence Murphy <garym@maya.isis.org> 
--- 412,416 ----
   o Wlodzislaw Duch <duch@phys.uni.torun.pl> 
   o E. Robert Tisdale <edwin@flamingo.cs.ucla.edu> 
!  o Athanasios Episcopos <episcopo@fire.camp.clarkson.edu> 
   o Frank Schnorrenberg <fs0997@easttexas.tamu.edu> 
   o Gary Lawrence Murphy <garym@maya.isis.org> 
***************
*** 422,425 ****
--- 425,429 ----
   o Gamze Erten <ictech@mcimail.com> 
   o Ed Rosenfeld <IER@aol.com> 
+  o Franco Insana <INSANA@asri.edu> 
   o Javier Blasco-Alberto <jblasco@ideafix.cps.unizar.es> 
   o Jean-Denis Muller <jdmuller@vnet.ibm.com> 
***************
*** 435,438 ****
--- 439,443 ----
   o Kjetil.Noervaag@idt.unit.no 
   o Luke Koops <koops@gaul.csd.uwo.ca> 
+  o Kurt Hornik <Kurt.Hornik@tuwien.ac.at> 
   o Clark Lindsey <lindsey@particle.kth.se> 
   o William Mackeown <mackeown@compsci.bristol.ac.uk> 
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
