Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!newsfeed.internetmci.com!in1.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: WLS and Much more samples of one pattern than the others
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DoAr8G.5nF@unx.sas.com>
Date: Fri, 15 Mar 1996 06:55:28 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4hgvri$7hd@goya.eunet.es> <DnxGvK.EyI@unx.sas.com> <4hqb1m$nrg@llnews.ll.mit.edu> <314430C8.41C67EA6@stats.ox.ac.uk>
Organization: SAS Institute Inc.
Lines: 47


In article <4hgvri$7hd@goya.eunet.es>, Jose Parga <bolsamad@dial.eunet.es> writes:
|> Let's suppose we want to train a NN for character recognition. We want to
|> teach the NN to recognize only the vowels (a,e,i,o,u). But for some reason,
|> we have many more samples of the vowels "a" and "e" than of the rest
|> ("i", "o", "u").
|>
|> Do you think I should train the NN with the same number of samples from each
|> vowel? This, of course, would reduce the total number of samples.

In article <DnxGvK.EyI@unx.sas.com>, saswss@hotellng.unx.sas.com (Warren Sarle) writes:
|> Use all the training data, but weight the error function inversely to
|> the number of cases for each vowel. When applied to the usual squared-
|> error function, this technique is called "weighted least squares" (WLS)
|> and is discussed in any good textbook on linear regression, such as:

Greg Heath wrote:
> Going one step further, if the a priori probability of vowel i is Pi and the
> number of training cases is Ni, then weight the corresponding squared error term
> by the ratio (Pi/Ni).
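To make the weighting concrete, here is a minimal numpy sketch of the
Pi/Ni scheme (the function names are my own, not from any of the posts;
how you plug the weights into your training code depends on the package):

```python
import numpy as np

def class_weights(labels, priors):
    # labels: integer class label for each training case
    # priors: array of a priori probabilities P_i, one per class
    counts = np.bincount(labels)            # N_i: number of cases per class
    return priors[labels] / counts[labels]  # per-case weight w = P_i / N_i

def weighted_sse(targets, outputs, weights):
    # weighted squared error: sum over cases of w_i * ||t_i - y_i||^2
    return np.sum(weights[:, None] * (targets - outputs) ** 2)
```

With equal priors P_i = 1/k this reduces to weighting each case inversely
to the number of cases for its class, as in my earlier post; the total
weight given to each class then comes out equal.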

In article <314430C8.41C67EA6@stats.ox.ac.uk>, "Prof. Brian Ripley" <ripley@stats.ox.ac.uk> writes:
|> This seems to me to be a classification problem, and Warren's references
|> are all to regression.  I find this topic is not usually covered in
|> books on pattern recognition, although it is (briefly) in mine (pp.58-9)
                                                 ^^^^^^^
Very! I'm still looking for something less brief.

|> Ripley, B.D. (1996) Pattern Recognition and Neural Networks. CUP.
|> ISBN 0-521-46086-7. See http://www.stats.ox.ac.uk/~ripley/PRbook/
|> 
|> The issues are the same _but_ you have also got to adjust the posterior
|> probabilities found by the weighted procedure.  People often forget the
|> latter!

What I suggested is correct for equal prior probabilities; what Greg
suggested handles unequal priors. But if you have a class with a small
prior, and you want to estimate probabilities for that class very
accurately, then you may need to use larger weights for that class
than the Pi/Ni ratio gives. In that case, as Prof. Ripley says, you
would have to adjust the posteriors.
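The posterior adjustment itself is simple. Here is a sketch (the
function name is mine), assuming the weighting made the network behave
as if trained under some set of training-time priors: rescale each
output by (true prior / training prior) and renormalize.

```python
import numpy as np

def adjust_posteriors(q, train_priors, true_priors):
    # q: posterior estimates from the net trained under train_priors
    # rescale each class by true_prior / train_prior, then renormalize
    p = q * (true_priors / train_priors)
    return p / p.sum(axis=-1, keepdims=True)
```

For example, a net trained with classes reweighted to equal frequency
but deployed where one class is nine times as common as the other needs
its outputs rescaled by 0.9/0.5 and 0.1/0.5 before renormalizing.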

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
