Newsgroups: comp.ai
Path: cantaloupe.srv.cs.cmu.edu!rochester!mucit
From: mucit@cs.rochester.edu (Bulent Murtezaoglu)
Subject: Re: How to do a T-test for statistical significance?
In-Reply-To: randy@axon.cs.byu.edu's message of Wed, 31 May 1995 16:51:52 -0700
Message-ID: <MUCIT.95May31203207@vein.cs.rochester.edu>
Sender: mucit@cs.rochester.edu (Bulent Murtezaoglu)
Organization: University of Rochester, Dept. of Computer Science
References: <randy-3105951651520001@rm3346m1.cs.byu.edu>
Date: 01 Jun 1995 00:32:07 GMT
Lines: 49

>>>>> "RW" == Randy Wilson <randy@axon.cs.byu.edu> writes:

    RW> I am working on several machine learning algorithms, and in
    RW> comparing the generalization accuracy of each algorithm to the
    RW> others (and to well-known algorithms like C4.5, IB4, Backprop,
    RW> etc.), I would like to be able to tell whether my average
    RW> accuracy percentages are statistically significant.  

I don't have a clear idea what 'statistically significant' would
mean in this context unless you are dealing with samples picked
from a large dataset.

    RW> Several
    RW> researchers have mentioned the t-test for statistical
    RW> significance.  Has anyone actually done this? Do you know the
    RW> formulas involved? Are there any code examples of how to do
    RW> this?

You can consult undergraduate statistics books for the formulas and
the general ideas.

    RW> The problem can be formalized as follows: Given two sets of n
    RW> numbers, X1..Xn and Y1..Yn, and their respective averages, Xav
    RW> and Yav, are Xav and Yav significantly different statistically?

The way I understand it, you cannot talk about significance with just
those numbers at hand and no other knowledge of what you are measuring.
Suppose you used (one of my favorite simple datasets) the 'votes'
set from the machine learning repository.  You split it up into
training and test sets and use identical ones for algorithms A and B.
A achieves 90% accuracy and B achieves 92%.  Is this statistically
significant?  Is this an appropriate question to ask?  You've exhausted
the data set (and the politicians) so the difference cannot be an artifact
of (random) sampling errors.  Then what can 'statistical significance'
signify?  Am I missing something?


    RW> I would appreciate any information or pointers on this
    RW> subject.  I have looked in several statistics books but none
    RW> of them came right out and called anything a 't-test', and I'm
    RW> rusty enough on my statistics that I wouldn't necessarily
    RW> recognize the true formula if it didn't have the right name
    RW> there.

You might want to look under "Student's T."  
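
For what it's worth, here is a rough sketch of the paired form of
Student's t for the X1..Xn / Y1..Yn setup you describe.  It assumes the
numbers come in matched pairs (X_i and Y_i measured on the same i-th
sample, e.g. the same train/test split) and that the differences are
roughly normal.  The function name and the accuracy figures are made up
for illustration; this is not taken from any particular textbook or
package.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for paired samples xs and ys.

    Hypothetical helper: assumes xs[i] and ys[i] were measured on the
    same i-th sample, so only the differences matter.
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")

    diffs = [x - y for x, y in zip(xs, ys)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample std dev, n - 1 denominator
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    # t = mean difference divided by its standard error
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up accuracy percentages for algorithms A and B on five samples
acc_a = [90, 88, 91, 89, 92]
acc_b = [92, 91, 92, 90, 95]

t, df = paired_t_statistic(acc_a, acc_b)
print(f"t = {t:.3f}, df = {df}")
```

You would then compare |t| against a table of Student's t with n-1
degrees of freedom at whatever level you like.  Note that this only
makes sense if each pair really is a separate random sample, which is
the very thing I was asking about above.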

cheers,

BM
