Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!news.cis.ohio-state.edu!news.maxwell.syr.edu!feed1.news.erols.com!news.ecn.uoknor.edu!munnari.OZ.AU!harbinger.cc.monash.edu.au!news.rmit.EDU.AU!matilda.vut.edu.au!news
From: dunne@yarra.vut.edu.au (Robert Dunne)
Subject: Re:Posterior Estimation Validity [long]
Message-ID: <tpafjjd5ai.fsf@yarra.vut.edu.au>
Lines: 197
Sender: dunne@yarra.vut.edu.au
Reply-To: dunne@matilda.vut.edu.au
Organization: victoria uni of tech
X-Newsreader: Gnus v5.1
Date: Sat, 19 Jul 1997 04:39:17 GMT

Hi all,
As this thread is getting old, here is a little history.
I posted a URL to a paper with a numerical example showing that
cross-entropy + softmax gave a better misclassification rate than
either 1) least squares + logistic
    or 2) least squares + softmax.

Warren posted a counterexample (the data set is shown below)
in which least squares performed better than cross-entropy.
Specifically, least squares was less sensitive to an outlying
point. MLP 1 and MLP 2 below recapitulate a small part of Warren's
example.

However, MLP 3 (described below) has a cross-entropy penalty function
but is less sensitive to the outlier because of the hidden layer.

MLP 1
the MLP is of size 2.2, that is
  2 inputs + bias
  no hidden layer units
  2 output units (softmax activation)
  cross-entropy penalty function
and the confusion matrix on Warren's example is
   1  2
1 47  3            i.e. 7 misclassified
2  4 47

and 

MLP 2
the MLP is of size 2.2, that is
  2 inputs + bias
  no hidden layer units
  2 output units (logistic activation)
  least squares penalty function
and the confusion matrix on Warren's example is
   1  2
1 50  0       i.e. 1 misclassified
2  1 50

but

MLP 3
the MLP is of size 2.1.2, that is
  2 inputs + bias
  1 hidden layer unit (logistic activation)
  2 output units (softmax activation)
  cross-entropy penalty function
and the confusion matrix on Warren's example is
   1  2
1 50  0     i.e. 1 misclassified
2  1 50
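
For anyone who wants to check the counts, here is a small sketch that
recomputes the misclassification totals from the three confusion matrices
quoted above. The matrices are copied from this post; the helper name
`misclassified` is only an illustration and is not from any package.

```python
import numpy as np

# Confusion matrices exactly as quoted in the post (classes 1 and 2).
confusion = {
    "MLP 1": np.array([[47, 3], [4, 47]]),
    "MLP 2": np.array([[50, 0], [1, 50]]),
    "MLP 3": np.array([[50, 0], [1, 50]]),
}

def misclassified(cm):
    """Sum of the off-diagonal cells, i.e. points put in the wrong class."""
    return int(cm.sum() - np.trace(cm))

for name, cm in confusion.items():
    n = int(cm.sum())
    wrong = misclassified(cm)
    print(f"{name}: {wrong} of {n} misclassified ({wrong / n:.1%})")
```

Each matrix sums to 101 points, so the three networks misclassify 7, 1 and 1
of 101 points respectively.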

The outputs of the hidden layer unit are shown in the first plot below. Clearly
the output layer can now separate the two classes, except for the outlier.
The next plot (of the fitted values) shows that this has occurred, and the
final plot shows the decision boundaries.
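
To make the 2.1.2 structure concrete, here is a minimal sketch of the forward
pass of MLP 3: one logistic hidden unit feeding two softmax outputs. The
weights and the three test points are invented for illustration (they are not
the fitted values behind the plots), and the classes are numbered 0 and 1 as
in the plots.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mlp_2_1_2(X, v, c, w, b):
    """Forward pass. X: (n, 2) inputs; v, c: hidden unit; w, b: output layer."""
    h = logistic(X @ v + c)        # (n,) hidden layer outputs, each in (0, 1)
    scores = np.outer(h, w) + b    # (n, 2) inputs to the softmax
    return h, softmax(scores)      # fitted values: each row sums to 1

# Hypothetical weights, not the fitted ones.
v = np.array([1.5, 0.0]); c = -9.0
w = np.array([-6.0, 6.0]); b = np.array([3.0, -3.0])

# Three made-up points: one on each side, plus a far-away point like the outlier.
X = np.array([[2.0, 2.0], [8.0, 6.0], [15.0, 8.0]])
h, p = mlp_2_1_2(X, v, c, w, b)
labels = p.argmax(axis=1)
print(h)
print(p)
print(labels)
```

With these weights the far-away point lands almost on top of the ordinary
class 1 points in the hidden layer, so it is assigned to class 1 whatever its
true label is.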

What is happening here can be viewed in two ways:

1) As the logistic hidden unit maps R^n \rightarrow (0,1),
   it is "squashing" the feature space
   into the interval (0,1). Hence the outlier is no longer
   so outlying.

2) The problem can be viewed in terms of the influence curves (ICs): the
   problem with the cross-entropy penalty function is that the influence
   curves are unbounded. Adding a hidden layer corrects this (although
   not completely).

   There is a draft paper on
   ftp://matilda.vut.edu.au:/pub/papers/dunne/draft/draft.influence.ps.gz
   that describes the ICs for a standard SSE+logistic MLP. I will add a 
   section describing them for a cross-entropy + softmax MLP.
   The paper is unfortunately long and a bit tedious, although the 
   calculation of the ICs is quite simple. 
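
Both views can be illustrated numerically. The sketch below is not the IC
calculation from the draft paper; it uses the size of the per-example gradient
of the cross-entropy penalty as a rough stand-in for how much pull a single
point has on the weights. All weights are hypothetical, and the function names
are made up for this example. With no hidden layer the gradient with respect
to the weights is (p - t) x^T, which grows with |x|. With one logistic hidden
unit the gradient with respect to the hidden weights carries a factor
h(1 - h), which dies away as the point moves off along the direction of v but
not when it moves across it, which is one way to read "not completely".

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical weights, chosen so both networks split the plane near x1 = 5.
W0 = np.array([[-1.0, 0.0], [1.0, 0.0]])  # 2.2 network: output weights
b0 = np.array([5.0, -5.0])
v = np.array([1.0, 0.0]); c = -5.0        # 2.1.2 network: hidden unit
w = np.array([-6.0, 6.0]); b = np.array([3.0, -3.0])  # 2.1.2: output layer

t = np.array([1.0, 0.0])  # target of the outlier: it belongs to the first class

def grad_norm_no_hidden(x):
    """|dE/dW| for softmax + cross-entropy, no hidden layer: (p - t) x^T."""
    p = softmax(W0 @ x + b0)
    return np.linalg.norm(np.outer(p - t, x))

def grad_norm_hidden(x):
    """|dE/dv| for the 2.1.2 network: ((p - t).w) h (1 - h) x."""
    h = logistic(v @ x + c)
    p = softmax(w * h + b)
    delta = ((p - t) @ w) * h * (1.0 - h)  # error passed back to the hidden unit
    return np.linalg.norm(delta * x)

# View 1: the hidden unit squashes the outlier next to the ordinary points.
h_typical = logistic(v @ np.array([7.0, 6.0]) + c)
h_outlier = logistic(v @ np.array([15.0, 8.0]) + c)

# View 2: move the outlier further out along x1 (the direction of v).
along = [np.array([d, 8.0]) for d in (10.0, 15.0, 20.0)]
no_hidden_along = [grad_norm_no_hidden(x) for x in along]
hidden_along = [grad_norm_hidden(x) for x in along]

# Move it along x2 instead, where v.x + c does not change.
across = [np.array([6.0, s]) for s in (10.0, 100.0, 1000.0)]
hidden_across = [grad_norm_hidden(x) for x in across]

print("h typical, outlier :", h_typical, h_outlier)
print("no hidden, along v :", no_hidden_along)
print("hidden,    along v :", hidden_along)
print("hidden,    across v:", hidden_across)
```

The gradient with respect to the output layer of the 2.1.2 network is
(p - t) h and (p - t), both bounded, so only the hidden weights are
looked at here.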



I don't think that this is the end of the story. Although the paper
above shows that the standard MLP is not a robust estimator (as the ICs
are not all bounded), the important practical question of the circumstances
under which it will behave in a resistant fashion still seems unclear.


						Cheers
							rob 


 

________________________hidden layer outputs  MLP 3______________________
             the outlier              
                |
      ......... | ....................................................
      .         V
  1.0..  111111 0111111111111111111111111    1         1         1      1
      .
      .
      .
      .
      .                                                                 0
      .                                                          0
      .                                                0
      .                                     0
      .                              0
  0.8..                    0
      .          0
      .  0
      
      
       
      .
      .
  0.0..                                     0      0          0      0
      .
      .....................................................................
        0           20           40          60           80          100
                                       Index
______________________________________________________________________________

                                  fitted values  MLP 3
       .....................................................................
  1.0..
      .  111111 0111111111111111111111111    1         1         1      1
      .
      .
      .


      .
      .
      .
  0.0..  0       0         0         0      0                           0
      .
      .....................................................................
        0           20           40          60           80          100
                                       Index
______________________________________________________________________________

the two decision boundaries 
    ................................................................
   .            MLP 1      .
10..               .       .
   .               .       . 
   .               .     (MLP 2 and 3)
   .               .       .
   .               .       .
   .               .       .
   .                .      .
   .                .      .
   .                 .     .
   .                 .     .
 8..                 .    0. 1   1   1   1                       0
   .                  .    .
   .                  .    .
   .                  .    .
   .                   .   .
   .                   .  0. 1   1   1   1   1
   .                   .   .
   .                    .  .
   .                    .  .
   .                    .  .
 6..                     .0. 1   1   1   1   1
   .                     . .
   .                     . .
   .                      ..
   .                      ..
   .                      0. 1   1   1   1
   .                       .
   .                       .
   .                       ..
   .                       ..
 4..          0   0   0   0..1
   .                       . .
   .                       . .
   .                       . .
   .                       .  .
   .      0   0   0   0   0. 1.
   .                       .  .
   .                       .   .
   .                       .   .
   .                       .   .
 2..      0   0   0   0   0. 1  .           
   .                       .    .           
   .                       .    .           
   .                       .     .          
   .                       .     .
   .          0   0   0   0. 1    .
   .                       .      .
   .                       .      .
   .                       .       .
   .                       .       .
 0..                       .       .
   .                                .
   .................................................................
     0                   5                   10                 15
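
A note on why all the boundaries in the plot above are straight lines. With no
hidden layer, the two softmax scores tie on a line. With a single logistic
hidden unit, the outputs depend on x only through h = logistic(v.x + c), so
the classes change where h crosses one fixed value, which is again a line
v.x + c = constant. The sketch below computes both lines; the weights are
hypothetical and the function names are invented for this example.

```python
import numpy as np

def boundary_no_hidden(W, b):
    """Line a.x + a0 = 0 where the two softmax scores of a 2.2 network tie."""
    return W[0] - W[1], b[0] - b[1]

def boundary_one_hidden(v, c, w, b):
    """Line a.x + a0 = 0 for a 2.1.2 network, or None if there is no boundary."""
    if w[0] == w[1]:
        return None  # the scores move together, so the class never changes
    h_star = (b[1] - b[0]) / (w[0] - w[1])  # hidden output where the scores tie
    if not 0.0 < h_star < 1.0:
        return None  # the logistic unit never reaches h_star
    # logistic(v.x + c) = h_star  <=>  v.x + c = log(h_star / (1 - h_star))
    return v, c - np.log(h_star / (1.0 - h_star))

# Hypothetical weights (not the fitted ones).
W0 = np.array([[-1.0, 0.0], [1.0, 0.0]]); b0 = np.array([5.0, -5.0])
v = np.array([1.0, 0.0]); c = -5.0
w = np.array([-6.0, 6.0]); b = np.array([3.0, -3.0])

line_a = boundary_no_hidden(W0, b0)
line_b = boundary_one_hidden(v, c, w, b)
print(line_a)
print(line_b)
```

With these weights both lines reduce to x1 = 5.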
-- 
* Rob Dunne  
* Victoria University of Technology , Footscray Campus      
* Department of Computer and Mathematical Sciences                
* P.O. 14428, MCMC.                   Fax:   +61 3 9688 4050  
* MELBOURNE 8001, AUSTRALIA           Tel:   +61 3 9688 4757   
* <http://matilda.vut.edu.au/~dunne>  Email: dunne@matilda.vut.edu.au

You can measure a programmer's perspective by noting his attitude on
the continuing viability of FORTRAN.
                -- Alan Perlis
