Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!dsinc!netnews.upenn.edu!news.voicenet.com!news.sprintlink.net!news-dc-9.sprintlink.net!tezcat.com!news.ner.bbnplanet.net!nntp-hub2.barrnet.net!newsfeed.internetmci.com!news.ac.net!news.cais.net!news.mathworks.com!uunet!in2.uu.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: [Q] OLS Learning Algorithm
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Dsn8Cu.BIy@unx.sas.com>
Date: Fri, 7 Jun 1996 18:50:54 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4p38vd$c22@aggedor.rmit.EDU.AU> <4p5i0e$c5a@llnews.ll.mit.edu>
Organization: SAS Institute Inc.
Lines: 123


In article <4p5i0e$c5a@llnews.ll.mit.edu>, heath@ll.mit.edu (Greg Heath) writes:
|> I'm beginning to read up on RBFs. What is the citation for Chen's paper?

   Chen, S., Cowan, C.F.N., and Grant, P.M. (1991), "Orthogonal
   least squares learning for radial basis function networks,"
   IEEE Transactions on Neural Networks, 2, 302-309.

Subject: What are OLS and subset regression?

If you are a statistician, "OLS" means "ordinary least squares" (as
opposed to weighted or generalized least squares), which is what the
NN literature often calls "LMS" (least mean squares). 

If you are a neural networker, "OLS" means "orthogonal least squares",
which is an algorithm for forward stepwise regression proposed by
Chen et al. (1991) for training RBF networks.

OLS is a variety of supervised training.  But whereas backprop and other
commonly used supervised methods are forms of continuous optimization,
OLS is a form of combinatorial optimization. Rather than treating the
RBF centers as continuous values to be adjusted to reduce the training
error, OLS starts with a large set of candidate centers (typically the
training inputs themselves) and selects a subset that usually yields a
low training error. 
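
To make the setup concrete, here is a minimal numpy sketch of the
design matrix an RBF network presents to the linear solver. The
Gaussian basis function and the fixed width sigma are my choices for
illustration, not something dictated by OLS itself:

   import numpy as np

   def rbf_design_matrix(X, centers, sigma=1.0):
       # Column j holds the response of the Gaussian basis function
       # centered at centers[j], evaluated at every training input.
       d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
       return np.exp(-d2 / (2.0 * sigma ** 2))

   X = np.random.randn(100, 2)   # 100 training inputs in 2 dimensions
   P = rbf_design_matrix(X, X)   # candidate centers = training inputs

Subset selection then amounts to choosing columns of P.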

There are numerous methods for subset selection in regression
(Myers 1986; Miller 1990). The ones most often used are:

* Forward selection begins with no centers in the network. At each step
  the center that most decreases the error function is added (see the
  sketch after this list).

* Backward elimination begins with all candidate centers in the network.
  At each step, the center whose removal least increases the error
  function is removed.

* Stepwise selection begins, like forward selection, with no centers in
  the network. At each step a center is added or removed. If there are
  any centers in the network, the one that contributes least to reducing
  the error function is subjected to a statistical test (usually based
  on the F statistic) to see if it is worth retaining; if it fails the
  test, it is removed. If no center is removed, the centers not currently
  in the network are examined, and the one that would contribute most to
  reducing the error function is subjected to a statistical test to see
  if it is worth adding; if it passes the test, it is added. The method
  terminates when every center in the network passes the test for
  staying and every other center fails the test for being added.
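
Here is a brute-force sketch of forward selection (my illustration,
not Chen et al.'s orthogonalized algorithm): at each step it refits
the least-squares problem for every remaining candidate column of the
design matrix P and keeps the one giving the smallest residual sum of
squares.

   import numpy as np

   def forward_select(P, y, k):
       # Greedily add the candidate column whose inclusion most
       # reduces the residual sum of squares, until k are chosen.
       selected = []
       remaining = list(range(P.shape[1]))
       for _ in range(k):
           def sse(j):
               cols = P[:, selected + [j]]
               w = np.linalg.lstsq(cols, y, rcond=None)[0]
               r = y - cols @ w
               return r @ r
           best = min(remaining, key=sse)
           selected.append(best)
           remaining.remove(best)
       return selected

The point of OLS is that the same greedy sequence can be computed far
more cheaply by orthogonalizing the candidate columns once per step
instead of refitting from scratch as this sketch does.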

Leaps and bounds (Furnival and Wilson 1974) is an algorithm for
determining the subset of centers that minimizes the error function;
this optimal subset can be found without examining all possible subsets,
but the algorithm is practical only for up to about 30 to 50 candidate
centers.
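
For a handful of candidates you can see what leaps and bounds computes
by exhaustive search (my baseline sketch; the actual algorithm uses
bounds on the error to prune most of this work):

   import numpy as np
   from itertools import combinations

   def best_subset(P, y, k):
       # Try every size-k subset of columns and keep the one with the
       # smallest residual sum of squares.  Leaps and bounds finds the
       # same minimizer without visiting every subset.
       def sse(idx):
           cols = P[:, list(idx)]
           w = np.linalg.lstsq(cols, y, rcond=None)[0]
           r = y - cols @ w
           return r @ r
       return min(combinations(range(P.shape[1]), k), key=sse)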

OLS is a particular algorithm for forward selection using modified
Gram-Schmidt (MGS) orthogonalization. While MGS is not a bad algorithm,
it is not the best algorithm for linear least squares. For
ill-conditioned data, Householder and Givens methods are generally
preferred (Lawson and Hanson 1974), while for large, well-conditioned
data sets, methods based on the normal equations require about one-third
as many floating-point operations and much less disk I/O than OLS.
Normal-equation methods based on sweeping (Goodnight 1979) or Gaussian
elimination (Furnival and Wilson 1974) are especially simple to program. 
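
For reference, here is textbook MGS as it applies to a design matrix.
This is my own illustration of the orthogonalization step alone; Chen
et al. interleave it with the selection of centers:

   import numpy as np

   def mgs_qr(A):
       # Modified Gram-Schmidt QR: as each orthonormal column q is
       # produced, its component is subtracted from all remaining
       # columns immediately, which is what distinguishes MGS from
       # classical Gram-Schmidt and improves numerical behavior.
       A = A.astype(float).copy()
       m, n = A.shape
       Q = np.zeros((m, n))
       R = np.zeros((n, n))
       for k in range(n):
           R[k, k] = np.linalg.norm(A[:, k])
           Q[:, k] = A[:, k] / R[k, k]
           for j in range(k + 1, n):
               R[k, j] = Q[:, k] @ A[:, j]
               A[:, j] -= R[k, j] * Q[:, k]
       return Q, R

   # Least-squares weights from the factorization: solve R w = Q'y.
   # (np.linalg.solve ignores that R is triangular but is correct.)
   # Q, R = mgs_qr(P); w = np.linalg.solve(R, Q.T @ y)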

While the theory of linear models is the most thoroughly developed area
of statistical inference, subset selection invalidates most of the
standard theory (Miller 1990; Roecker 1991; Derksen and Keselman 1992;
Freedman, Pee, and Midthune 1992). 

Subset selection methods usually do not generalize as well as
regularization methods in linear models (Frank and Friedman 1993).
Orr (1995) has proposed combining regularization with subset
selection for RBF training (see also Orr 199?). 
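
As a contrast to subset selection, here is the simplest regularization
approach, ridge regression, in the same numpy setting (a sketch of the
general idea, not of Orr's specific method):

   import numpy as np

   def ridge_weights(P, y, lam=1e-2):
       # Keep *all* candidate centers but shrink the weights by
       # minimizing ||y - P w||^2 + lam * ||w||^2; the solution is
       # (P'P + lam*I)^{-1} P'y.
       n = P.shape[1]
       return np.linalg.solve(P.T @ P + lam * np.eye(n), P.T @ y)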

References:

   Chen, S., Cowan, C.F.N., and Grant, P.M. (1991), "Orthogonal
   least squares learning for radial basis function networks,"
   IEEE Transactions on Neural Networks, 2, 302-309.

   Derksen, S. and Keselman, H.J. (1992), "Backward, forward and
   stepwise automated subset selection algorithms: Frequency of
   obtaining authentic and noise variables," British Journal of
   Mathematical and Statistical Psychology, 45, 265-282.

   Frank, I.E. and Friedman, J.H. (1993), "A statistical view of some
   chemometrics regression tools," Technometrics, 35, 109-148.

   Freedman, L.S., Pee, D., and Midthune, D.N. (1992), "The problem of
   underestimating the residual error variance in forward stepwise
   regression," The Statistician, 41, 405-412. 

   Furnival, G.M. and Wilson, R.W. (1974), "Regression by Leaps and 
   Bounds," Technometrics, 16, 499-511.

   Goodnight, J.H. (1979), "A Tutorial on the SWEEP Operator,"
   The American Statistician, 33, 149-158.

   Lawson, C. L. and Hanson, R. J. (1974),
   Solving Least Squares Problems, Englewood Cliffs, NJ:
   Prentice-Hall, Inc. (I've heard that a 2nd edition is out.)

   Miller, A.J. (1990), Subset Selection in Regression, Chapman & Hall.

   Myers, R.H. (1986), Classical and Modern Regression with
   Applications, Boston: Duxbury Press.

   Orr, M.J.L. (1995), "Regularisation in the selection of radial
   basis function centres," Neural Computation, 7, 606-623.

   Orr, M.J.L. (199?), "Introduction to radial basis function
   networks," http://www.cns.ed.ac.uk/people/mark/intro.ps or
   http://www.cns.ed.ac.uk/people/mark/intro/intro.html .

   Roecker, E.B. (1991), "Prediction error and its estimation for 
   subset-selected models," Technometrics, 33, 459-468.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
