Newsgroups: sci.logic,sci.stat.math,comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!news.mathworks.com!news.sprintlink.net!news-peer.sprintlink.net!interpath!news.interpath.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Occam's razor & WDB2T [was Decidability question]
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <E0vt38.6JL@unx.sas.com>
Date: Thu, 14 Nov 1996 22:28:20 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <55gpi8$1sk@dfw-ixnews12.ix.netcom.com> <327CC727.22EE@postoffice.worldnet.att.net> <32837820.7ACB@postoffice.worldnet.att.net> <56dgil$fcs@netserv.waikato.ac.nz>
Organization: SAS Institute Inc.
Lines: 41
Xref: glinda.oz.cs.cmu.edu sci.logic:20841 sci.stat.math:13247 comp.ai.neural-nets:34570


In article <56dgil$fcs@netserv.waikato.ac.nz>, maj@waikato.ac.nz (Murray Jorgensen) writes:
|> I regret that I do not have the time to respond to this thread in detail. 
|> I have looked at Geoff Webb's article in
|> http://www.cs.washington.edu/research/jair/table-of-contents-vol4.html
|> and it seems to conflict with all my intuition built up as a practising 
|> statistician.
|> ... It is widely accepted in the statistical
|> community that 'overfitting' of a data set [using a needlessly complex
|> model] results in a fitted model that is closely tuned to that particular
|> data set but has poor predictive power. This is not to say that there is not
|> additional complexity to be discovered, just that the data set under 
|> consideration does not contain enough information about possible 
|> elaborations to the model to make it safe to fit them.
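
Murray's point is easy to demonstrate with a small simulation. The
following is a hypothetical sketch in modern Python with numpy and
scikit-learn; the data, noise level, and tree sizes are all invented
for illustration and come from neither Webb's article nor this thread:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(1)

    # Hypothetical data: a smooth signal plus noise, split into
    # a small training set and a larger test set.
    x_train = rng.uniform(0, 10, 100)
    y_train = np.sin(x_train) + rng.normal(scale=0.5, size=100)
    x_test = rng.uniform(0, 10, 1000)
    y_test = np.sin(x_test) + rng.normal(scale=0.5, size=1000)

    for leaves in (4, 100):
        tree = DecisionTreeRegressor(max_leaf_nodes=leaves,
                                     random_state=0)
        tree.fit(x_train[:, None], y_train)
        train_mse = np.mean((tree.predict(x_train[:, None])
                             - y_train) ** 2)
        test_mse = np.mean((tree.predict(x_test[:, None])
                            - y_test) ** 2)
        # The 100-leaf tree nearly memorizes the training data:
        # its training error collapses, while its test error is
        # typically worse than that of the 4-leaf tree.
        print("%3d leaves: train MSE %.3f  test MSE %.3f"
              % (leaves, train_mse, test_mse))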

I will try briefly to appease Murray's statistical intuition. The
problem with Geoff Webb's interpretation of his interesting and possibly
very useful work has to do with the meaning of "complexity". Whether the
number of splits or leaves in a tree-based model is a measure of the
model's complexity depends on how the tree is grown. Consider a
nonlinear regression (i.e. function approximation) problem. Suppose my
prior beliefs indicate that the regression function is smooth, as is
often the case in real life. Regression trees tend to sacrifice
smoothness for interpretability. But I could obtain a smooth regression
tree by doing some form of smooth regression, such as kernel regression,
and then growing a tree with billions of leaves to approximate the
smooth kernel regression surface instead of approximating the original
data.  The size of the resulting tree would not be a measure of the
tree's complexity--in fact, one could argue that the bigger the tree,
the simpler it is, since a bigger tree reproduces the smooth kernel
surface more faithfully, and the complexity of that surface is
governed by the kernel bandwidth, not by the number of leaves!
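
A minimal sketch of this construction, in modern terms (Python with
numpy and scikit-learn; the data, bandwidth, and grid resolution are
hypothetical illustrations, not anything from Webb's article):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)

    # Hypothetical data: noisy observations of a smooth function.
    x = np.sort(rng.uniform(0, 10, 200))
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

    def kernel_smooth(x_train, y_train, x_eval, bandwidth=0.5):
        # Nadaraya-Watson kernel regression with a Gaussian kernel:
        # weight each training point by its distance to the
        # evaluation point and take the weighted mean of y.
        w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :])
                           / bandwidth) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)

    # Evaluate the smooth surface on a dense grid ...
    grid = np.linspace(0, 10, 10000)
    smooth = kernel_smooth(x, y, grid)

    # ... and grow an unpruned tree that approximates the smooth
    # surface rather than the original noisy data.
    tree = DecisionTreeRegressor(min_samples_leaf=1, random_state=0)
    tree.fit(grid[:, None], smooth)

    # Thousands of leaves, yet the function the tree represents is
    # no more complex than the bandwidth-limited kernel fit.
    print(tree.get_n_leaves())

The leaf count here is set by the grid resolution, not by anything
learned from the data: refining the grid makes the tree bigger while
the fitted function stays exactly as smooth as the bandwidth dictates.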

I have not figured out _exactly_ what Geoff Webb is doing, but I think
he is sort of smoothing the tree by growing more branches where there
is no training data. I need to reread the article more carefully to
verify that hypothesis.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
