Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!goldenapple.srv.cs.cmu.edu!das-news2.harvard.edu!cam-news-feed3.bbnplanet.com!cam-news-hub1.bbnplanet.com!news.bbnplanet.com!news.maxwell.syr.edu!cdc2.cdc.net!newsfeed.concentric.net!vyzynz!inquo!news.mira.net.au!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!matilda.vut.edu.au!news
From: dunne@yarra.vut.edu.au (Robert Dunne)
Subject: Re: Unbalanced classes.
In-Reply-To: Greg Heath's message of Thu, 3 Jul 1997 17:43:01 -0400
Message-ID: <tp90znkz2g.fsf@yarra.vut.edu.au>
Lines: 80
Sender: dunne@yarra.vut.edu.au
Reply-To: dunne@matilda.vut.edu.au
Organization: victoria uni of tech
X-Newsreader: Gnus v5.1
References: <marzban.866563542@turks>
	<Pine.SOL.3.91.970618032811.4181A-100000@miles>
	<Pine.SOL.3.91.970702045831.462A-100000@miles>
	<tpyb7oy8tg.fsf@yarra.vut.edu.au>
	<Pine.SOL.3.91.970703170820.1218C-100000@miles>
Date: Fri, 4 Jul 1997 08:21:11 GMT


> Gregory E. Heath (heath@ll.mit.edu) wrote:
> 
> I just skimmed that section in Bishop, intending to come back for the
> t-crossings and i-dottings. What I thought I picked up from that initial
> pass was that
> 
> 1. Posteriors are expected values of the output probabilities conditional 
> on the targets 
> 2. For a given class of objective functions, only SSE + linear activations 
> and XENT + softmax activations satisfied the property in 1.

Greg,
    I think that we disagree about 2). Sorry to go on at such length -- but I
want to be sure that I understand this. As I understand it now:

1) if \rho is the penalty function (ie SSE or XENT) and z is the network output,
   then \rho is minimized when z = E[t|x] (the conditional expectation),
   for a wide variety of \rho including SSE and XENT (with the restriction on t
   and z that they sum to 1).  See \cite{Gish.1990},
   \cite{Hampshire.and.Perlmutter.1990}, \cite{Richards.Lippmann.1991}.

    This does not depend on the fact that z is the output of an MLP.
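A toy numerical check of 1) (my own sketch in Python -- none of this is from
Bishop or the cited papers): for a batch of targets t that share a single input
x, the constant output z minimizing SSE is the sample mean, ie the empirical
estimate of E[t|x]:

```python
# Sketch: the constant output z that minimizes the sum of squared errors
# over samples of t is the sample mean of t (an estimate of E[t|x]).
# Toy data: four training cases assumed to share the same input x.
targets = [1.0, 0.0, 1.0, 1.0]

def sse(z, ts):
    """Sum of squared errors of a constant output z against targets ts."""
    return sum((z - t) ** 2 for t in ts)

z_star = sum(targets) / len(targets)   # 0.75, the empirical E[t|x]

# SSE at the mean is no larger than at any other candidate output.
for z in [0.0, 0.25, 0.5, 1.0]:
    assert sse(z_star, targets) <= sse(z, targets)
```

The same check with XENT as the penalty picks out the same minimizer, which is
the point of 1): the result is a property of \rho, not of the network.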


2) z is a "universal approximator" (\cite{Hornik.et.al.1989}),
   ie it can approximate a continuous function "f" on compact regions.
   This is known for MLPs with 1) linear output units
                               2) logistic output units, provided f is
                                  restricted to (0,1),
   but I have not seen any results for
                               3) softmax
   (perhaps we can just assume it is true for now).
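To show the flavour of 2), here is a hand-built sketch (mine, in Python -- not
a construction from \cite{Hornik.et.al.1989}): a single steep logistic hidden
unit already approximates a step function, and sums of such ridge functions
are the building blocks of the universal-approximation proofs:

```python
import math

def sigma(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    """Target function: unit step at x = 0.5."""
    return 1.0 if x > 0.5 else 0.0

# A one-hidden-unit "MLP": sigma(k * (x - 0.5)).  As the weight k grows,
# the output approaches the step uniformly outside a shrinking
# neighbourhood of the jump.
k = 200.0
xs = [i / 100.0 for i in range(101) if abs(i / 100.0 - 0.5) > 0.06]
err = max(abs(sigma(k * (x - 0.5)) - step(x)) for x in xs)
assert err < 1e-4
```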
      

3) as 2) gives uniform convergence on compacta, it implies L^2(\mu) convergence,
   where \mu is some probability measure, so that from 1) and 2) we have
   consistency, ie
		
			lim_{N\rightarrow \infty} E[z] = E[t|x]

where the expectation is over the data set of size N and the MLP can be of
arbitrary size.


All this together seems to say that whether we use
	1) SSE and linear output units
	2) SSE and logistic output units
	3) XENT and softmax 
we have the (asymptotic) result that
              lim_{N\rightarrow \infty} E[z] = E[t|x]

However only 3) guarantees that the outputs will sum to 1 for a finite N, and
in addition 3) seems to give better results for realistic sample sizes.
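The sum-to-1 guarantee in 3) is immediate from the definition of softmax; a
minimal sketch (my own, in Python) just to make it concrete:

```python
import math

def softmax(a):
    """Softmax activation: z_k = exp(a_k) / sum_j exp(a_j)."""
    m = max(a)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [e / s for e in exps]

# Softmax outputs lie in (0,1) and sum to 1 by construction, for any
# finite activations a; linear or logistic output units carry no such
# constraint at finite N.
z = softmax([2.0, -1.0, 0.5])
assert abs(sum(z) - 1.0) < 1e-12
```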


I think that what Bishop is saying is that
 for        SSE + linear activations
 and for    XENT + softmax activations
we get a delta_k term of the form (z_k - t_k). He considers this a
natural pairing -- I don't understand why, although it does look
pleasingly simple in form.
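The (z_k - t_k) form for the XENT + softmax pairing can at least be checked
numerically; here is a sketch (mine, in Python) comparing the analytic
delta_k = z_k - t_k against finite differences of XENT taken through softmax:

```python
import math

def softmax(a):
    """Softmax: z_k = exp(a_k) / sum_j exp(a_j)."""
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [v / s for v in e]

def xent(a, t):
    """Cross-entropy of softmax(a) against targets t: -sum_k t_k log z_k."""
    z = softmax(a)
    return -sum(tk * math.log(zk) for tk, zk in zip(t, z))

a = [0.3, -1.2, 0.8]          # pre-softmax activations
t = [0.0, 1.0, 0.0]           # one-hot target (so sum_k t_k = 1)

z = softmax(a)
analytic = [zk - tk for zk, tk in zip(z, t)]   # the delta_k = z_k - t_k form

# Central-difference check of dE/da_k against the analytic delta_k.
h = 1e-6
for k in range(3):
    ap = a[:]; ap[k] += h
    am = a[:]; am[k] -= h
    numeric = (xent(ap, t) - xent(am, t)) / (2 * h)
    assert abs(numeric - analytic[k]) < 1e-5
```

(Note the derivation needs sum_k t_k = 1, which one-hot targets satisfy; the
SSE + linear case gives the same delta_k with no constraint on t.)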

-----------------------------------------------------------------------

References are in Bishop, except for

@inproceedings{Gish.1990,
        author="Herbert Gish",
        title="A probabilistic approach to the understanding and training of
               neural network classifiers",
        year="1990",
        booktitle="Proceedings of the 1990 International Conference on
                   Acoustics, Speech and Signal Processing",
        month="April",
        pages="1361--1364" }
-----------------------------------------------------------------------

 	  	so, what do you think?
				Cheers			
								rob

-- 
* Rob Dunne  
* Victoria University of Technology , Footscray Campus      
* Department of Computer and Mathematical Sciences                
* P.O. 14428, MCMC.                   Fax:   +61 3 9688 4050  
* MELBOURNE 8001, AUSTRALIA           Tel:   +61 3 9688 4757   
* <http://matilda.vut.edu.au/~dunne>  Email: dunne@matilda.vut.edu.au

You can measure a programmer's perspective by noting his attitude on
the continuing viability of FORTRAN.
                -- Alan Perlis
