Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsstand.cit.cornell.edu!news.tc.cornell.edu!news.cac.psu.edu!news.math.psu.edu!chi-news.cic.net!news.cais.net!news.ac.net!imci4!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Nearest Neighbour Classifiers
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DonpvA.D7C@unx.sas.com>
Date: Fri, 22 Mar 1996 06:54:46 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4ieogp$gt@newsflash.hol.gr> <DoHJDp.1FE@unx.sas.com> <4im2q7$hev@llnews.ll.mit.edu>
Organization: SAS Institute Inc.
Lines: 35


In article <4im2q7$hev@llnews.ll.mit.edu>, heath@ll.mit.edu (Greg Heath) writes:
|> ...
|> My favorite for k-nn classification is 
|> 
|>       Devijver, P.A. and Kittler, J., (1982) Pattern Recognition: A Statistical
|>       Approach, Prentice-Hall International, London, ISBN 0-13-654236-0.
|> 
|> The trick is to "edit" (i.e., "remove") intrinsic training data errors before 
|> using Hart's "condensing" algorithm to thin out the correctly classified part of 
|> the training set. D&K introduce the concept of multiple (i.e., iterative) 1-nn 
|> editing instead of the more traditional (i.e., pre-1982) k-nn editing. 
|> 
|> However, with my large data sets (~ 200K 65-D vectors), I never got good 
|> generalization or fast classification until I started to reward "winning" 
|> training set vectors when classifications were correct. This created cluster
|> centroids similar to those of LVQ, but I never punished winners for incorrect 
|> classifications, and I never rewarded or punished "runners-up". Then I found it
|> more efficient to replace the pre-condensing editing of original data vectors
|> with post-condensing pruning of clusters based on leave-one-out discrimination. 
|> For squared Euclidean distances this just involves the multiplier 
|> (N_i/(N_i-1))^2 where N_i is the number of training vectors that have been 
|> assigned to cluster i. Although classification is now done LVQ winner-take-all
|> style, I could easily create an RBF or a k-Nearest-Cluster Classifier.
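The (N_i/(N_i-1))^2 multiplier quoted above is just the standard leave-one-out
identity for a cluster mean: if x was one of the N_i vectors averaged into
centroid c_i, then the centroid recomputed without x is c' = (N_i*c_i - x)/(N_i - 1),
and the held-out squared distance ||x - c'||^2 equals (N_i/(N_i-1))^2 * ||x - c_i||^2.
A quick numerical check (my sketch, not code from Greg's system):

```python
import numpy as np

rng = np.random.default_rng(0)
cluster = rng.normal(size=(10, 65))   # N_i = 10 training vectors, 65-D
x = cluster[0]
c = cluster.mean(axis=0)              # centroid with x included
c_loo = cluster[1:].mean(axis=0)      # centroid recomputed without x

N = len(cluster)
direct = np.sum((x - c_loo) ** 2)                   # honest leave-one-out distance
scaled = (N / (N - 1)) ** 2 * np.sum((x - c) ** 2)  # the cheap multiplier trick
print(np.allclose(direct, scaled))    # prints True: the identity holds
```

So leave-one-out pruning of clusters costs no extra centroid recomputation at all.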

For such a large data set, how about first running a k-means cluster
analysis on each class to get an initial set of codebook vectors,
then running some sort of "editing" or "condensing" (I forget the
difference) algorithm on the cluster means, with or without rewards?
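To make the suggestion concrete, here is a toy sketch of what I have in mind
(all names and parameters are illustrative, not anyone's production code):
k-means run separately per class to get codebook vectors, then Hart's
condensing rule applied to the means, i.e., keep a mean only if the current
condensed set would misclassify it under 1-nn.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns k cluster means of X."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(0)
    return means

def hart_condense(V, y):
    """Hart's condensing on codebook vectors V with class labels y:
    add a vector to the kept set whenever the kept set misclassifies it."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(V)):
            if i in keep:
                continue
            d = ((V[keep] - V[i]) ** 2).sum(1)
            if y[keep][d.argmin()] != y[i]:   # 1-nn error -> must keep
                keep.append(i)
                changed = True
    return V[keep], y[keep]

# toy problem: two well-separated 2-D classes
rng = np.random.default_rng(1)
m0 = kmeans(rng.normal(0, 1, (100, 2)), 5)
m1 = kmeans(rng.normal(4, 1, (100, 2)), 5)
V = np.vstack([m0, m1])
y = np.array([0] * 5 + [1] * 5)
Vc, yc = hart_condense(V, y)
print(len(Vc), "of", len(V), "codebook vectors kept")
```

The point of the ordering is that k-means does the heavy data reduction once,
so the editing/condensing pass only ever touches a few hundred means instead
of ~200K raw vectors.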

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
