Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.duke.edu!concert!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Question: Should 1 or 2 outputs Make a Difference?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Cyz0HK.JMp@unx.sas.com>
Date: Tue, 8 Nov 1994 22:44:07 GMT
References: <3NOV94.19210355@owl.sunnybrook.utoronto.ca> <39bukc$pah@cantaloupe.srv.cs.cmu.edu> <39dled$ssn@scapa.cs.ualberta.ca>
Nntp-Posting-Host: hotellng.unx.sas.com
Organization: SAS Institute Inc.
Lines: 53


In article <39dled$ssn@scapa.cs.ualberta.ca>, arms@cs.ualberta.ca (Bill Armstrong) writes:
|> sef@CS.CMU.EDU (Scott Fahlman) writes:
|>
|>
|> >In article <3NOV94.19210355@owl.sunnybrook.utoronto.ca> farrell@owl.sunnybrook.utoronto.ca writes:
|>
|> >       If the network is learning to classify a binary variable, such as
|> >   the presence or absence of some behaviour, should it make a difference whether
|> >   the target output is a single output ranging from 0 to 1 or a dual output
|> >   where "01" would be one classification and "10" would be the other possible
|> >   category?
|>
|> >If the two outputs always have complementary values in the training
|> >data, using two values makes a little bit of extra work for the net:
|> >the answer isn't right until both outputs are right, so you have to
|> >wait for the slower one.  The difference usually won't be dramatic.
|>
|> >-- Scott
|>
|>
|> >===========================================================================
|> >Scott E. Fahlman                    Internet:  sef+@cs.cmu.edu
|>
|>
|> I'm wondering about the influence of the shared subnets in the net.
|> If backpropagated signals come from *two* errors at the output which
|> are exactly opposite (the desired output is 0 in one case and 1 in
|> the other), then, at least averaged over all starting situations, the
|> adaptation by backprop tends to cancel out in the shared subnets.

No. The weights from the shared subnets to the two outputs should
quickly acquire opposite signs, so there is no cancellation.
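
A minimal numeric sketch of this point (the weight and error values are
hypothetical, not from any actual net): the backpropagated signal at a
shared hidden unit is the weighted sum of the output error signals.
With complementary targets the two error signals are opposite, but once
the two output weights have opposite signs the products agree in sign
and reinforce; only if the weights shared the same sign would they
cancel.

```python
# Hypothetical shared hidden unit feeding two complementary outputs.
w1, w2 = 0.8, -0.8          # hidden-to-output weights, opposite signs
delta1, delta2 = 0.3, -0.3  # output error signals, opposite (complementary targets)

# Backprop signal reaching the shared hidden unit:
hidden_delta = w1 * delta1 + w2 * delta2   # 0.24 + 0.24 = 0.48, no cancellation

# If the weights had the same sign, the terms would cancel exactly:
same_sign_delta = w1 * delta1 + (-w2) * delta2   # 0.24 - 0.24 = 0.0
```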

|> Wouldn't this tend to *really* slow down learning, and thus explain a
|> great difference in performance observed by the original questioner
|> between the nets using one vs two outputs?

The problem is the ill-conditioning introduced by the shared subnets,
which slows down all training algorithms to some degree. One solution is
to use the softmax output activation function, which gives you an exact
singularity instead of ill-conditioning, and exact singularities are
easily dealt with in 2nd-order algorithms. Or you can just delete the
weights going to one output, which eliminates the singularity and
effectively reduces the net to one output. This idea generalizes to a
target variable with k categories: you really need only k-1 outputs.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
