Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!bloom-beacon.mit.edu!gatech!concert!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: K-Means distribution
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <CzFMM5.8q3@unx.sas.com>
Date: Thu, 17 Nov 1994 22:03:41 GMT
References:  <kerog-1711941413540001@sectrl> <CzF8FB.7M0@acsu.buffalo.edu>
Nntp-Posting-Host: hotellng.unx.sas.com
Organization: SAS Institute Inc.
Lines: 32


There are many variations on k-means, of which the one quoted below is
one of the simplest. It works quite well for many applications, but its
initialization method is a weak point.

In article <CzF8FB.7M0@acsu.buffalo.edu>, jn@cs.Buffalo.EDU (Jai Natarajan) writes:
|>
|> K-Means :
|>
|> Lets say you have n nodes or centres
|> Initialise them to random values

This type of initialization is prone to degenerate solutions--one or
more clusters are likely to have zero members. It is better to choose a
subset of the training cases as initial centers. This can be done
randomly or systematically. The systematic algorithm used in the
FASTCLUS procedure in the SAS/STAT product is guaranteed to produce a
global optimum if the data contain well-separated clusters. For RBF
applications, however, one would not often expect to have well-separated
clusters.
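As a rough sketch (nothing like the proprietary FASTCLUS algorithm; the
function name and NumPy usage are my own), initializing centers from a
random subset of the training cases might look like this:

```python
import numpy as np

def init_centers(X, k, seed=None):
    """Choose k distinct training cases as initial cluster centers.

    Starting from actual data points, rather than arbitrary random
    values, guarantees that every initial center lies in a populated
    region of the input space, which makes empty (zero-member)
    clusters far less likely.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()
```

A systematic alternative would pick well-separated cases (e.g. cases at
least some minimum distance apart) instead of sampling uniformly.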

|> Loop :  Assign each training set sample to the centre closest to it
|>          (say in terms of Euclidean distance)
|>          After assigning all samples recompute each center as the mean
|>          of samples which are clustered at that centre
|>          Repeat the Loop until the cluster allocations don't change in
|>          consecutive iterations. Those centres are now your weights at
|>          the n nodes
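The quoted loop can be sketched in a few lines of NumPy (my own
illustration, not anyone's production code; `max_iter` is a safety cap
I added, not part of the quoted algorithm):

```python
import numpy as np

def kmeans(X, centers, max_iter=100):
    """Basic k-means: alternate assignment and mean-update steps
    until the cluster allocations stop changing."""
    centers = centers.copy()
    labels = None
    for _ in range(max_iter):
        # Assign each sample to the nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # allocations unchanged in consecutive iterations
        labels = new_labels
        # Recompute each center as the mean of its assigned samples.
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center alone
                centers[j] = members.mean(axis=0)
    return centers, labels
```

Note the `len(members) > 0` guard: with arbitrary random initialization
a cluster can end up with zero members, which is exactly the degeneracy
discussed above.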
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
