Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!scramble.lm.com!news.math.psu.edu!news.cse.psu.edu!uwm.edu!news-res.gsl.net!news.gsl.net!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Conjugate Gradient, Bayesian a Posterior Probability
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Duv01t.3EK@unx.sas.com>
Date: Sat, 20 Jul 1996 20:39:29 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4seuen$ovt@mozo.cc.purdue.edu> <4shnnk$8c4@sjx-ixn6.ix.netcom.com> <31EC8356.41C6@smi.stanford.edu> <4si9n4$cp6@dfw-ixnews3.ix.netcom.com>
Organization: SAS Institute Inc.
Lines: 43


In article <4si9n4$cp6@dfw-ixnews3.ix.netcom.com>, jdadson@ix.netcom.com (Jive Dadson) writes:
|> In <31EC8356.41C6@smi.stanford.edu> Scott Schmidler writes:
|> >
|> >CG is just another method for finding zeroes of functions ...
|> 
|> Brrzzzzt. I'm sorry, CG is just another method for finding
|> local minima of functions. 

But you _can_ use CG for finding zeroes of functions, too. 


|> >As Warren has pointed out before, MSE can be viewed as yielding robust
|> >estimates rather than MLEs, which may be desirable if your data
|> >contains outliers.
|> 
|> I'll have to think about that for a bit. MSE does not penalize
|> nearly as severely for giving a very low probability to a "winner"
|> in the training set as log-likelihood does. But golly, that's the most
|> costly mistake you can make. Just ask the folks who make the future
|> book at Caliente about their 50-1 shot that won the Kentucky Derby.

How costly a mistake is depends on the particular application. There
are numerous applications in which mistakes regarding rare events are
no more costly than mistakes regarding even odds. For example, if
you are a market researcher investigating probabilities of consumer 
purchases, you really don't care whether the probability of someone
buying a $100 box of cereal is 1e-6 or 1e-7. In other cases, the cost
may not be symmetric. For example, if you are studying aircraft
safety, you may care very much whether the probability of a hydraulic
line failing is 1e-6 or 1e-7 per flight, but the distinction between
a .999999 and a .9999999 failure probability is irrelevant.
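To put some numbers on the penalty comparison raised above (a sketch of my own, not part of the earlier posts): for a true event predicted with probability p, squared error charges (1-p)^2, which is bounded by 1, while negative log-likelihood charges -log(p), which grows without bound as p goes to 0 and so draws a sharp distinction between 1e-6 and 1e-7 that MSE essentially ignores.

```python
import math

def mse_penalty(p):
    """Squared-error loss for a true event predicted with probability p."""
    return (1.0 - p) ** 2

def nll_penalty(p):
    """Negative log-likelihood loss for the same prediction."""
    return -math.log(p)

for p in (0.5, 1e-6, 1e-7):
    print("p=%g  MSE=%.7f  NLL=%.3f" % (p, mse_penalty(p), nll_penalty(p)))
```

The MSE penalties at p=1e-6 and p=1e-7 differ only in the seventh decimal place, while the NLL penalties differ by log(10), about 2.3, no matter how small p gets. Which sensitivity you want depends, as argued above, on the application.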

The main point here is that people using neural nets or other
statistical models need to think about _why_ they are doing their analyses
and then to choose error functions, case weights, cost matrices, etc.
that are appropriate for their particular purposes.
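For instance (a hypothetical sketch with made-up costs, not from the original discussion), once you attach a cost matrix to the decisions, the action that minimizes expected cost need not be the most probable outcome, which is exactly why the choice of costs matters:

```python
def min_cost_decision(probs, cost):
    """Pick the action minimizing expected cost.

    probs: class probabilities, e.g. {"fail": 0.01, "ok": 0.99}
    cost:  cost[action][true_class], costs are application-specific
    """
    def expected_cost(action):
        return sum(probs[c] * cost[action][c] for c in probs)
    return min(cost, key=expected_cost)

# Hypothetical numbers: a missed failure costs 1000, a false alarm costs 1.
probs = {"fail": 0.01, "ok": 0.99}
cost = {
    "ground": {"fail": 0, "ok": 1},     # inspect the aircraft anyway
    "fly":    {"fail": 1000, "ok": 0},  # risk the failure in flight
}
print(min_cost_decision(probs, cost))  # "ground", despite P(fail) = 0.01
```

With these costs, grounding the aircraft has expected cost 0.99 while flying has expected cost 10, so the rare event dominates the decision even at 1% probability.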

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
