Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.kei.com!ddsw1!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Relation between MSE and % correct class.?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DA5FyM.8Bp@unx.sas.com>
Date: Wed, 14 Jun 1995 06:13:34 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <19950613090858.BKAMP@pi0192.kub.nl> <3rjtl8$lbt@uuneo.neosoft.com>
Organization: SAS Institute Inc.
Lines: 56


In article <3rjtl8$lbt@uuneo.neosoft.com>, hav@neosoft.com writes:
|> >   BKAMP@kub.nl  (Kamp B.) writes:
|> >
|> >  I am working on a NN that will classify data into seven classes.
|> >  I measure performance both by Mean Square Error (MSE) and the
|> >  percentage of correct classification: I use one output which
|> >  ranges from 0 to 6, and round the classification to the nearest
|> >  integer and compare to the actual class.

Are the outputs ordered in such a way that a misclassification from 0
into 6 is six times as serious as a misclassification from 0 into 1, and
similarly for all other classes?  If so, then the percentage of correct
classification is of little use, since it fails to incorporate
information about the severity of the errors. If not, then MSE is almost
meaningless, since it incorporates irrelevant information.
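To make the contrast concrete, here is a hypothetical seven-case example
(my numbers, not Kamp's): both prediction sets miss exactly one case, so
percent correct cannot tell them apart, while MSE treats the distant
miss as far worse.

```python
# Integer class labels 0..6, one case per class (illustrative data only).
targets    = [0, 1, 2, 3, 4, 5, 6]
preds_near = [1, 1, 2, 3, 4, 5, 6]   # the class-0 case misclassified as 1
preds_far  = [6, 1, 2, 3, 4, 5, 6]   # the class-0 case misclassified as 6

def mse(preds, targets):
    """Mean squared error over the integer labels."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def pct_correct(preds, targets):
    """Percentage of exactly correct classifications."""
    return 100.0 * sum(p == t for p, t in zip(preds, targets)) / len(targets)

print(mse(preds_near, targets))    # 1/7  ~ 0.14
print(mse(preds_far, targets))     # 36/7 ~ 5.14 -- 36 times the squared error
print(pct_correct(preds_near, targets))  # 85.7% ...
print(pct_correct(preds_far, targets))   # ... identical 85.7%
```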

|> Also - to parrot recent postings here - how are you calculating %err for class zero -
|> have you tried using classes 1...7  instead?

The percentage of correct classification is different from the mean
squared percent error. The former is just the number of correctly
classified cases divided by the total number of cases. The latter would
imply that misclassifications of cases belonging to the classes with
small numbers are more serious than misclassifications of cases in the
classes with large numbers, and that any misclassification of a class-0
case is infinitely bad!
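A minimal sketch of that point (the mspe helper below is hypothetical,
not anything from the original posting): percent error divides by the
target value, so small-numbered classes weigh more and class 0 divides
by zero.

```python
def mspe(preds, targets):
    """Mean squared percent error: relative error measured against the target."""
    return sum((100.0 * (p - t) / t) ** 2
               for p, t in zip(preds, targets)) / len(targets)

print(mspe([2], [1]))   # off by one from class 1: squared percent error 10000.0
print(mspe([6], [5]))   # off by one from class 5: only 400.0
try:
    mspe([1], [0])      # off by one from class 0 ...
except ZeroDivisionError:
    print("class 0 divides by zero -- 'infinitely bad'")
```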

|> Also, since RMS (so MSE?) seems related to difference-vector measurement and
|> % seems more related to quantity measurements (I'm ducking Warren {;-),

Umm, I don't understand the question.

|> I've often wondered about what differences in dynamics might exist between
|> single-output vs multiple output topologies for the same problem. (I know,
|> I know...one output is simply a one-element vector ... still I wonder...)
|> Have you tried using 3 (or even 7) outputs instead of just 1?

If you use multiple outputs with the target values coded as 0/1, then
with any remotely reasonable training criterion, the outputs are
estimates of the probability of class membership (it helps to use
a softmax activation function to force the outputs to sum to one).
With the single-output scheme described above, I'm not sure what
the outputs are estimating. Suppose a certain input pattern (case, not
variable) occurs 100 times in the training set. For 50 of those cases
the target is 1, and for the other 50 the target is 3. After training
with a sufficiently large number of hidden units, the net will
produce an output of 2 for those cases. In other words, if there is
a 50/50 chance the target is 1 or 3, the net will be quite certain
that the answer is 2. Does this make any sense for a classification
problem?
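The arithmetic behind that example can be checked directly (an assumed
setup, not code from the posting): a least-squares fit with enough
capacity approaches the conditional mean of the target, so for a pattern
whose target is 1 half the time and 3 the other half, the best constant
output under squared error is 2.

```python
# The input pattern occurs 100 times: target 1 for 50 cases, 3 for the rest.
targets = [1] * 50 + [3] * 50

def sse(output):
    """Total squared error if the net emits `output` for every repeat."""
    return sum((t - output) ** 2 for t in targets)

# Search a fine grid of candidate outputs; the minimizer is the mean.
best = min((o / 100 for o in range(401)), key=sse)
print(best)        # 2.0 -- an output the target never actually takes
print(sse(2.0))    # 100.0
print(sse(1.0))    # 200.0: committing to either true class is *worse* under MSE
```

So squared error actively rewards the "certain answer of 2," whereas two
softmax outputs trained on 0/1 targets would settle near 0.5 and 0.5,
which is what a classifier should report here.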

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
