Newsgroups: comp.ai.neural-nets
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: nn and stat decisions for med:  Bayes vs neorotic nets
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Ds15K9.1u0@unx.sas.com>
Date: Sun, 26 May 1996 20:43:21 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <319E6EBB.41C67EA6@camis.stanford.edu> <4nngcr$g11@scapa.cs.ualberta.ca> <4nqgls$pg6@tuegate.tue.nl> <31A16580.41C67EA6@camis.stanford.edu> <4o25j2$geg@tuegate.tue.nl> <31A63CE8.167EB0E7@camis.stanford.edu>
Organization: SAS Institute Inc.
Lines: 45


In article <31A63CE8.167EB0E7@camis.stanford.edu>, Scott Schmidler <schmidler@camis.stanford.edu> writes:
|> ...
|> Whether or not modeling correlations is beneficial for classification
|> tasks still appears to be an empirical issue, for which I have not seen
|> a good theoretical discussion.  

It depends on the data.

|> Several people in the machine learning
|> community have recently shown that Naive Bayes classifiers commonly
|> outperform more complex Bayesian networks when the models are learned
|> from data.  

It is trivial to construct examples in which either method works
better than the other.

|> Particularly for small data sets, models which estimate the
|> full joint density may perform worse, if the additional parameters being
|> estimated "cancel" in some sense upon conditioning.  I don't believe 
|> this is a well-understood problem, but would welcome any pointers to
|> a thorough treatment of this in the statistical literature.

A simple but slightly incorrect model will often generalize better
than a complicated, correct model if there is too little data or
too much noise to estimate the complicated model accurately. The
theory for linear models is well known. See chapter 21 of:

  Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and Lee, T.
  (1985), The Theory and Practice of Econometrics, 2nd ed., New York:
  John Wiley & Sons.

For discriminant analysis, see chapter 5 of:

  McLachlan, G.J. (1992), Discriminant Analysis and Statistical Pattern
  Recognition, New York: John Wiley & Sons.
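
To illustrate the linear-model case, here is a minimal simulation
sketch in Python (it is not taken from either book). Everything in it
is an assumption chosen for illustration: two correlated predictors
with correlation RHO, true coefficients B1 and B2, noise standard
deviation SIGMA, and no intercept. The "simple" model wrongly omits
x2; the "full" model has the correct form. Because both predictors
have unit variance, the excess expected squared prediction error of
any fitted pair (a1, a2) can be computed exactly, so no test set is
needed.

```python
import random

# Illustrative settings (assumptions, not from the cited texts).
B1, B2 = 1.0, 0.5   # true coefficients of x1 and x2
RHO = 0.9           # correlation between x1 and x2
SIGMA = 2.0         # standard deviation of the noise


def fit_simple(x1, y):
    """Least squares for y ~ a1*x1; x2 is (incorrectly) omitted."""
    a1 = sum(u * v for u, v in zip(x1, y)) / sum(u * u for u in x1)
    return a1, 0.0


def fit_full(x1, x2, y):
    """Least squares for y ~ a1*x1 + a2*x2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    return a1, a2


def excess_error(a1, a2):
    """Expected squared prediction error beyond the noise variance.

    Uses Var(x1) = Var(x2) = 1 and Cov(x1, x2) = RHO.
    """
    d1, d2 = a1 - B1, a2 - B2
    return d1 * d1 + d2 * d2 + 2.0 * RHO * d1 * d2


def average_excess_error(n, trials, seed=0):
    """Average excess error of (simple, full) over repeated samples of size n."""
    rng = random.Random(seed)
    total_simple = total_full = 0.0
    for _ in range(trials):
        x1, x2, y = [], [], []
        for _ in range(n):
            u = rng.gauss(0.0, 1.0)
            z = rng.gauss(0.0, 1.0)
            v = RHO * u + (1.0 - RHO * RHO) ** 0.5 * z  # corr(u, v) = RHO
            x1.append(u)
            x2.append(v)
            y.append(B1 * u + B2 * v + rng.gauss(0.0, SIGMA))
        total_simple += excess_error(*fit_simple(x1, y))
        total_full += excess_error(*fit_full(x1, x2, y))
    return total_simple / trials, total_full / trials


if __name__ == "__main__":
    for n, trials in ((8, 2000), (500, 200)):
        simple, full = average_excess_error(n, trials)
        print(f"n={n:4d}  simple={simple:.4f}  full={full:.4f}")
```

Under these assumed settings the simple model should show the smaller
average excess error at n=8, and the correct model should win at
n=500, where its extra coefficient can be estimated accurately. The
exact figures depend on the seed.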

In practice, it is usually best to fit a variety of models and
estimate the generalization error of each.
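
One common way to do that is k-fold cross-validation. The following
is only a rough sketch in Python under assumptions of my own choosing:
the two candidate models (a constant and a straight line), the helper
names fit_mean, fit_line and cv_error, and the simulated data are all
hypothetical, and squared error stands in for whatever loss suits the
application.

```python
import random


def fit_mean(xs, ys):
    """Candidate 1: predict the training mean regardless of x."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: straight line with intercept, fitted by least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_error(fit, xs, ys, k=5, seed=0):
    """k-fold cross-validated mean squared error of one fitting procedure."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return sq_err / len(xs)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0.0, 1.0) for _ in range(30)]
    ys = [0.3 * x + rng.gauss(0.0, 1.0) for x in xs]  # weak signal, heavy noise
    for name, fit in (("mean", fit_mean), ("line", fit_line)):
        print(name, round(cv_error(fit, xs, ys), 4))
```

With a small, noisy sample like the one simulated here, either
candidate may come out ahead, which is the reason for estimating the
error rather than assuming the more flexible model is better.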

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
