Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!cis.ohio-state.edu!math.ohio-state.edu!cs.utexas.edu!howland.reston.ans.net!newsfeed.internetmci.com!news.netrail.net!barney.gvi.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Regularization (was: Guidelines to no. of neurons)
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DJwo3y.K9q@unx.sas.com>
Date: Wed, 20 Dec 1995 22:07:10 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <x5IGc6X.predictor@delphi.com> <4a3os6INNobl@bhars12c.bnr.co.uk> <Pine.SUN.3.91.951206153641.10817C-100000@solitude> <4a6rbdINNqnq@bhars12c.bnr.co.uk> <4a6td2$cf0@nfw>
Organization: SAS Institute Inc.
Lines: 48


In article <4a6td2$cf0@nfw>, Andrei Korovikov <korovik@bear.com> writes:
|> pgh@bnr.co.uk (Peter Hamer) wrote:
|> >...
|> >In NN terms: Penalty-functions are usually a sum-of-squared-weights term
|> >added to the sum-of-squared-errors (I believe this is mathematically
|> >equivalent to "weight decay").

Yes.

|>      Don't penalty functions usually use the *number* of weights?

There are different types of penalty functions for different purposes.
Some statistics used for model selection, such as AIC and SBC, use a
penalty that is a function of the number of weights. But regularization
requires a penalty function that is a function of the magnitudes of
the weights--after all, the number of weights usually doesn't change
during training.
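To make the distinction concrete, here is a minimal sketch (the function
names and the Gaussian-error form of AIC are my own illustration, not
from the thread): a model-selection criterion penalizes the *count* of
weights, while a regularization penalty depends on their *magnitudes*.

```python
import numpy as np

def aic(sse, n_obs, n_weights):
    # Model-selection criterion (Gaussian-error form): the penalty
    # term 2 * n_weights depends only on how many weights there are,
    # not on their values.
    return n_obs * np.log(sse / n_obs) + 2 * n_weights

def regularized_objective(sse, weights, decay=0.01):
    # Regularization objective: the penalty depends on the weight
    # magnitudes -- this is the usual sum-of-squared-weights
    # (weight-decay) penalty added to the sum of squared errors.
    return sse + decay * np.sum(np.asarray(weights) ** 2)
```

Shrinking the weights lowers the second objective but leaves the first
unchanged, which is why only the second can do any work during training.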

|> I see no
|> reason to penalize large weights as opposed to small ones -- it's mostly a question
|> of scaling, anyway. 

Large weights can cause the output function to be excessively rough,
or even to extend far beyond the range of the data (unless, of course,
the output activation confines the outputs to a known appropriate
range). Keeping the weights sufficiently small prevents overfitting
even with a large number of weights. The disadvantages of large weights
are illustrated in Sarle, W.S. (1995), "Stopped Training and Other
Remedies for Overfitting," available by anonymous ftp from ftp.sas.com
in the directory /pub/neural, under the file names inter95.ps.gz,
inter95.ps.Z, or inter95.zip (your choice of compression programs).
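A toy numerical illustration of the roughness point (my own sketch, not
taken from the paper above): for a single unit f(x) = v*tanh(w*x), the
steepest slope of the output is v*w, so larger weights permit an
arbitrarily steep, hence rougher, output function.

```python
import numpy as np

def unit(x, w, v):
    # One-hidden-unit "network": output = v * tanh(w * x).
    return v * np.tanh(w * x)

# Numerical slope at x = 0 via central differences:
eps = 1e-6
slope_small = (unit(eps, 1.0, 1.0) - unit(-eps, 1.0, 1.0)) / (2 * eps)
slope_large = (unit(eps, 10.0, 1.0) - unit(-eps, 10.0, 1.0)) / (2 * eps)
# Since tanh'(0) = 1, the slope is v * w: the large-weight unit
# is ten times steeper than the small-weight one.
```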

|> Now, the number of weights is a different story, since this
|> is directly related to the degrees of freedom the net has.
|> Besides, there is a fair number of weight decay techniques, which are not
|> equivalent to each other, much less to using a penalty function.

All of the weight decay techniques that I have seen _are_ equivalent
to using a penalty function--different weight decay techniques involve
different penalty functions.
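For instance (a sketch assuming plain gradient descent, with a dummy
error gradient to isolate the decay term): differentiating the
penalized objective E(w) + (decay/2)*sum(w**2) gives the update
w <- w - lr*grad_E(w) - lr*decay*w, which is exactly a gradient step
followed by a multiplicative shrinking of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
lr, decay = 0.1, 0.01

def grad_E(w):
    # Stand-in for the error gradient; zero here so the decay
    # term can be seen on its own.
    return np.zeros_like(w)

# One step on the penalized objective E(w) + (decay/2) * sum(w**2) ...
w_penalty = w - lr * (grad_E(w) + decay * w)
# ... equals a plain gradient step followed by weight decay:
w_decay = (w - lr * grad_E(w)) * (1 - lr * decay)
assert np.allclose(w_penalty, w_decay)
```

Other decay schedules correspond in the same way to other penalty
functions (e.g. a sum of absolute weights gives a constant shrinkage
toward zero), which is the sense in which the two are equivalent.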

 
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
