Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!newsgate.duke.edu!news.mathworks.com!newsfeed.internetmci.com!news.sprintlink.net!new-news.sprintlink.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Q: Small or large weights ?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DtxAAD.J46@unx.sas.com>
Date: Tue, 2 Jul 1996 15:42:13 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4qrk8c$96f@eng_ser1.erg.cuhk.hk> <DtowKK.8Ft@unx.sas.com> <4r9arl$p0a@llnews.ll.mit.edu>
Organization: SAS Institute Inc.
Lines: 42


In article <4qrk8c$96f@eng_ser1.erg.cuhk.hk>, ccszeto@cs.cuhk.hk (Szeto Chi Cheong) writes:
|> If I have two networks
|> (1) small number of large weights
|> (2) large number of small weights
|> Which one is better ?
|> Does the first one correspond to limited degree of freedom and the second 
|> one correspond to limited extent of search space ?

saswss@hotellng.unx.sas.com (Warren Sarle) replied in part:
|> As for limited extent of search space, I have no idea what that means.

To which "Gregory E. Heath" <heath@ll.mit.edu> responded:
|> The small weight *size* constraint limits the search in weight space to a
|> small neighborhood of the origin (vs the small *number* of weights
|> constraint that limits the search to a lower-dimensional subspace).

OK, that makes sense if you put a hard bound on the size (or in
general some norm) of the weights. But if you use weight decay, that
just changes the error function rather than limiting the feasible
region of the weight space.
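To make the distinction concrete, here is a minimal sketch (in modern
NumPy, purely for illustration; the toy least-squares problem, step
size, and decay/bound values are my own assumptions, not anything from
the thread). Weight decay adds a penalty to the error function, so
every point in weight space stays feasible; a hard norm bound instead
projects the weights back into a ball around the origin, genuinely
restricting the search region:

```python
import numpy as np

# Hypothetical toy problem: minimize E(w) = ||Xw - y||^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

def grad(w):
    # Gradient of the unpenalized error ||Xw - y||^2.
    return 2.0 * X.T @ (X @ w - y)

lr, decay, bound = 0.01, 0.5, 1.0

# (a) Weight decay: the penalty decay*||w||^2 is folded into the
#     error function; the feasible region is still all of weight space.
w_decay = np.zeros(3)
for _ in range(500):
    w_decay -= lr * (grad(w_decay) + 2.0 * decay * w_decay)

# (b) Hard norm bound: project back onto {w : ||w|| <= bound} after
#     each step, confining the search to a ball around the origin.
w_proj = np.zeros(3)
for _ in range(500):
    w_proj -= lr * grad(w_proj)
    n = np.linalg.norm(w_proj)
    if n > bound:
        w_proj *= bound / n

print(np.linalg.norm(w_decay))  # shrunk, but not forced under the bound
print(np.linalg.norm(w_proj))   # never exceeds the bound
```

The decayed weights are merely pulled toward the origin (their norm can
still exceed the bound), whereas the projected weights can never leave
the ball.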

Gregory continued:
|> Which network learns faster (e.g., weight elimination vs weight decay
|> and/or fast stopping)?

Training the same network with the same algorithm, early stopping is
clearly going to be faster than training to convergence. But early
stopping requires large hidden layers to avoid bad local optima,
whereas weight elimination/decay can be used with either large or small
hidden layers (data permitting). If you use small hidden layers,
though, you need more expensive algorithms for global optimization, so
the trade-offs aren't entirely clear. I'm still inclined to think that
early stopping tends to be faster.
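For readers unfamiliar with the mechanics, early stopping just means
monitoring error on a held-out validation set and halting when it stops
improving. A minimal sketch (my own toy linear model, noise level, and
"patience" threshold, assumed only for illustration):

```python
import numpy as np

# Hypothetical setup: fit a noisy linear target by gradient descent,
# keeping the weights that did best on a held-out validation set.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(60)
Xtr, ytr = X[:40], y[:40]        # training set
Xva, yva = X[40:], y[40:]        # validation set

w = np.zeros(5)
best_w, best_err, patience = w.copy(), np.inf, 0
for epoch in range(1000):
    # One gradient step on the training mean-squared error.
    w -= 0.005 * 2.0 * Xtr.T @ (Xtr @ w - ytr) / len(ytr)
    val_err = np.mean((Xva @ w - yva) ** 2)
    if val_err < best_err:
        best_err, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 10:       # no improvement for 10 epochs: stop
            break

print(best_err)
```

The point is that training halts as soon as validation error plateaus,
well short of full convergence on the training error, which is why it
tends to be cheap.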



-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
