Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: FF-nets
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Dp7Mu5.CK8@unx.sas.com>
Date: Tue, 2 Apr 1996 01:01:17 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <Pine.A32.3.91.960322123410.8235A-100000@tharros.dipchim.uniss.it> <4joe5b$moc@infa.central.susx.ac.uk>
Organization: SAS Institute Inc.
Lines: 40


In article <4joe5b$moc@infa.central.susx.ac.uk>, kevinc@cogs.susx.ac.uk (Kevin Charley) writes:
|> Antylox (bobo@tharros.dipchim.uniss.it) wrote:
|> : I'm working on a Feed-Forward net that gives me some problems. Its
|> : structure is 222-6-6 with backpropagation. The training set is made of
|> : 360 patterns, and the validation set of 30 patterns. The problem is
|> : the validation error curve doesn't reach a clear minimum, but falls
|> : continuously, with a slope that abruptly decreases after about 300 epochs.
|> : After this point, the curve decreases very slowly, as the training error
|> : curve does. How can I train it to the best point?
|> 
|> I'm hardly an expert on the subject, but as a rule of thumb you are meant to
|> have at least as many training examples as weights in your network. 

That is true if you are doing maximum likelihood training, such as least
(mean) squares or cross entropy. However, you can have more weights than
cases and still possibly get good generalization if you use early
stopping or Bayesian estimation. See the FAQ for more info:
ftp://ftp.sas.com/pub/neural/FAQ.html
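The early-stopping idea can be sketched in a few lines: train while watching
the validation error, remember the epoch with the lowest validation error, and
quit when no improvement has been seen for some number of epochs. The sketch
below is purely illustrative and not from the FAQ; `validation_error` is a
hypothetical stand-in for evaluating a real net on the validation set after
each training epoch, here simulated as a U-shaped curve.

```python
def validation_error(epoch):
    # Hypothetical stand-in for "train one epoch, then measure error on the
    # validation set"; simulates a curve that falls, bottoms out near epoch 50,
    # then rises again as the net overfits the training set.
    return (epoch - 50) ** 2 / 1000.0 + 1.0

def train_with_early_stopping(max_epochs=500, patience=10):
    """Stop when validation error has not improved for `patience` epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        err = validation_error(epoch)
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs; restore best weights
    return best_epoch, best_err

epoch, err = train_with_early_stopping()
```

In practice you would also save the weights at the best epoch and restore them
when stopping, since the final weights are `patience` epochs past the minimum.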

My interpretation of Antylox's problem is that the validation curve did
not reach a minimum, so it was not clear when to stop training.  Since
the net does have lots more weights than training cases, we would expect
to see some overfitting.  But 300 epochs is very little training for
standard backprop, and it's not very much even for conjugate gradients.
So perhaps the net just needs more training.
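The claim that the net has lots more weights than training cases is easy to
check directly, assuming standard fully connected layers with bias terms (the
original post does not say whether biases are used, so this is an assumption):

```python
# Weight count for a fully connected 222-6-6 net with biases:
# each layer contributes (inputs * units) weights plus one bias per unit.
inputs, hidden, outputs = 222, 6, 6
weights = (inputs * hidden + hidden) + (hidden * outputs + outputs)
print(weights)  # 1380 weights, versus only 360 training patterns
```

So the net has nearly four times as many weights as training cases, which is
why some overfitting would be expected under maximum likelihood training.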

Another possibility is that the validation set is too similar to the
training set, or that the validation set is simply too small to estimate
generalization error accurately.

|> Either try and extend your training set ...

Definitely a good idea.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
