Newsgroups: comp.ai.neural-nets
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Results of NN Challenge: NN versus Multiple Reg
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D8nB1C.H9K@unx.sas.com>
Date: Tue, 16 May 1995 00:36:48 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <3p3d3n$p90@newsbf02.news.aol.com> <marzban.800577942@phyast>
Organization: SAS Institute Inc.
Lines: 96


In article <marzban.800577942@phyast>, marzban@phyast.nhn.uoknor.edu (Caren Marzban) writes:
|> ...
|> In addition to the previous comments, I must add this: Why is it that
|> the comparison is done at the level of R-squared (which I assume is
|> simply the r-correlation)? After all, r is a measure of only *linear*
|> correlations.

R-squared is typically defined for nonlinear least-squares models as:

    2       SSE for nonlinear model
   R  = 1 - -----------------------
              SSE for null model

where SSE means "sum of squared errors".  The problem is how to define
the "null model". If the nonlinear model has an intercept term (in a
network with linear output units, the output bias is an intercept), the
null model would be an intercept-only model, i.e. just fit a mean to the
target values. Without an intercept (for example, a network with
logistic output units), the appropriate null model may not be clear.
Training criteria other than least squares also present problems.
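
For the least-squares case with an intercept, the definition above can be
sketched in a few lines of Python. This is only an illustration: the
function names and the toy numbers are made up for the example, and the
null model is assumed to be the intercept-only fit (the mean of the
targets).

```python
def sse(targets, predictions):
    """Sum of squared errors between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


def r_squared(targets, predictions):
    """R^2 = 1 - SSE(model) / SSE(null model).

    Assumption: the null model is intercept-only, so it predicts the
    mean of the targets for every case.
    """
    mean_target = sum(targets) / len(targets)
    sse_model = sse(targets, predictions)
    sse_null = sse(targets, [mean_target] * len(targets))
    return 1.0 - sse_model / sse_null


# Hypothetical targets and outputs of some fitted nonlinear model.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.1, 1.9, 3.2, 3.8]

print(r_squared(targets, predictions))  # about 0.98
```

A model that only reproduces the mean gets R-squared = 0 under this
definition, and a model that fits worse than the mean gets a negative
value.
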
Here is a recent post from the stat-l list regarding problems of
defining R-squared in logistic regression:

Newsgroups: bit.listserv.stat-l
Date: Sun, 26 Mar 1995 11:37:00 EST
Sender: Stat-l Discussion List <STAT-L@VM1.MCGILL.CA>
From: Sigurdur R Saemundsson <SIGGI@UNCMVS.OIT.UNC.EDU>
Subject: pseudo - R**2 in logistic regression.

Hi y'all

 As you are acutely aware, researchers often want to know the
 proportion of the variance explained by their precious little
 regression model. This is the case with me now: I have to
 get some handle on how well my model explains the data, or
 how big a proportion of the phenomenon being modelled is
 explained by the model I've constructed.

 My problem is that the model is a logistic one and no statistic
 exists for that model that has the same interpretation as R**2
 in classical regression.

 Aldrich and Nelson say in their invaluable green book that there
 are several measures of goodness of fit for log models that are
 "in the spirit of R**2". Hosmer and Lemeshow mention two such
 statistics. So the ones I've found are:

   # Proportion of cases correctly predicted.

   # Tabulate predicted versus actual values and use some correlation
   statistic to summarize the table.

   # They (A&N) themselves suggest c/(N+c), where c is the chi**2
   statistic for overall fit, c=-2log(L0/L1), and N is the sample size.

   # Hosmer and Lemeshow talk about
   R**2=100(L0-LP)/(L0-LS), where L0 and LP denote the log-likelihoods
   for the model with only an intercept and the model with the intercept
   and the p covariates, and LS is the same statistic for the saturated
   model.

   # Hosmer and Lemeshow also talk about R**2=100(L0-LP)/L0


   # McKelvey and Zavoina proposed a pseudo R**2 for probit models
   that when modified by Aldrich and Nelson for logistic models looks
   like this:    Pseudo R**2=ExSS/(ExSS+3.29N)  where
   ExSS= sum over all N {(Ypredicted - Ybar)**2}
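
The formula-based measures in the list above can be sketched in Python as
follows. This is a rough illustration, not anyone's reference
implementation: the function names are invented, the log-likelihood
values at the bottom are hypothetical, and "Ypredicted" in the last
formula is assumed to be the linear predictor (the predicted log-odds)
of the logistic model. The constant 3.29 is approximately pi**2/3, the
variance of the standard logistic distribution.

```python
import math


def binary_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes given fitted probabilities.

    Probabilities must lie strictly between 0 and 1.
    """
    return sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def aldrich_nelson(ll_null, ll_model, n):
    """c/(N+c), where c = -2 log(L0/L1) = 2 * (ll_model - ll_null)."""
    c = 2.0 * (ll_model - ll_null)
    return c / (n + c)


def hosmer_lemeshow_saturated(ll_null, ll_model, ll_saturated):
    """100 (L0 - LP) / (L0 - LS), in percent."""
    return 100.0 * (ll_null - ll_model) / (ll_null - ll_saturated)


def hosmer_lemeshow(ll_null, ll_model):
    """100 (L0 - LP) / L0, in percent."""
    return 100.0 * (ll_null - ll_model) / ll_null


def mckelvey_zavoina_logit(linear_predictors):
    """ExSS / (ExSS + 3.29 N), with ExSS = sum((Ypredicted - Ybar)**2).

    Assumption: Ypredicted is the fitted linear predictor (log-odds).
    """
    n = len(linear_predictors)
    ybar = sum(linear_predictors) / n
    exss = sum((yp - ybar) ** 2 for yp in linear_predictors)
    return exss / (exss + 3.29 * n)


# Hypothetical log-likelihoods for a sample of N = 150 cases.
ll_null, ll_model, n_cases = -100.0, -80.0, 150

print(aldrich_nelson(ll_null, ll_model, n_cases))
print(hosmer_lemeshow(ll_null, ll_model))
print(hosmer_lemeshow_saturated(ll_null, ll_model, 0.0))
```

For ungrouped 0/1 data the saturated model fits every case exactly, so
its log-likelihood is 0 and the two Hosmer and Lemeshow forms give the
same number. The likelihood-based measures and the ExSS-based measure are
built from different quantities, so nothing forces them to agree.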

 I've calculated  the last two for my model and frankly I was expecting
the two to indicate the same thing. To my surprise the two statistics
are not even in the same ballpark.  We are talking 18% vs 41%.

A&N go on to say that no one measure is universally accepted
or employed. Now my question is: are there any others that are better
accepted? Do any of these behave better than the others? Is any one of
them better accepted than the others? Has anything
changed since '84, when A&N wrote their text?

 Thank you in advance.

 Sigurdur Runar Saemundsson (SIGGI the ICELANDER)
 Department of Epidemiology and
 Department of Pediatric Dentistry.
 University of North Carolina.
 USA.
 SIGGI@UNC.OIT.UNC.EDU


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
