Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsstand.cit.cornell.edu!intac!uunet!in3.uu.net!199.94.215.18!cam-news-hub1.bbnplanet.com!news.bbnplanet.com!news.mathworks.com!newsgate.duke.edu!interpath!news.interpath.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Automatic category learning?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <E5I5x6.C7@unx.sas.com>
Date: Wed, 12 Feb 1997 18:27:06 GMT
Distribution: inet
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <nfyafplqxqq.fsf@delta.hut.fi>
Organization: SAS Institute Inc.
Lines: 308


In article <nfyafplqxqq.fsf@delta.hut.fi>, Ari.Huttunen@hut.fi (Ari Huttunen) writes:
|> I've just started learning NNs, but I kind of wonder if there
|> exists a learning method that can train the network to recognize
|> optimally useful categories.
|> 
|> By optimally useful categories I mean that "optimally useful
|> categories cut the world up at a basic level that most clearly
|> distinguishes categories from each other. The basic level maximizes
|> within-category resemblance while minimizing between-category
|> resemblance." ...

This problem goes by dozens of different names, the most common of
which is "cluster analysis". There is a huge amount of literature
on the subject, and hundreds--perhaps thousands--of algorithms
have been published. You will need to be a lot more specific about
what you mean by "optimally useful" before you can narrow down the
choices.
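
To illustrate why the definition matters, here is a minimal Python
sketch of just one way to turn "maximize within-category resemblance
while minimizing between-category resemblance" into a number: the
pseudo-F ratio of Calinski and Harabasz (1974), which compares the
between-cluster sum of squares B with the within-cluster sum of
squares W as (B/(k-1)) / (W/(n-k)). The function names and the toy
data are assumptions made up for this example, and squared Euclidean
distance is only one of many possible notions of "resemblance".

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Pseudo-F ratio (B / (k - 1)) / (W / (n - k)) for a given partition."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    grand = centroid(points)
    within = 0.0   # W: scatter of points around their own cluster centroid
    between = 0.0  # B: scatter of cluster centroids around the grand centroid
    for members in clusters.values():
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
        between += len(members) * sq_dist(c, grand)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(pseudo_f(points, [0, 0, 0, 1, 1, 1]))  # partition matching the groups
print(pseudo_f(points, [0, 1, 0, 1, 0, 1]))  # partition cutting across them
```

The partition that matches the two groups scores far higher than the
one that cuts across them. A sum-of-squares criterion like this one
rewards compact clusters around a centroid; a different reading of
"optimally useful" would lead to a different criterion and quite
possibly a different answer.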

Some references that I, as a statistician, think are important are
listed in the bibliography below, most of which is excerpted from the
_SAS/STAT User's Guide_.  Someone in artificial intelligence would
probably have a different list.
_____________________________________________________________________
Massart and Kaufman (1983) is the best elementary introduction to
cluster analysis.  Other important texts are Anderberg (1973), Sneath
and Sokal (1973), Duran and Odell (1974), Hartigan (1975), Titterington,
Smith, and Makov (1985), McLachlan and Basford (1988), and Kaufman and
Rousseeuw (1990).  Hartigan (1975) and Spath (1980) give numerous
FORTRAN programs for clustering.  Any prospective user of cluster
analysis should study the Monte Carlo results of Milligan (1980),
Milligan and Cooper (1985), and Cooper and Milligan (1988).  Essential
references on the statistical aspects of clustering include MacQueen
(1967), Wolfe (1970), Scott and Symons (1971), Hartigan (1977; 1978;
1981; 1985), Binder (1978; 1981), Symons (1981), Wong and Schaack
(1982), Wong and Lane (1983), Sarle (1983), Bock (1985), Banfield and
Raftery (1993), and SAS Institute (1993).  For fuzzy clustering, see
Bezdek (1981) and Bezdek and Pal (1992).  See Blashfield and Aldenderfer
(1978) for a discussion of the fragmented state of the literature on
cluster analysis.  Avoid articles in the Journal of Marketing Research.
There is a separate list of references at the end on nonparametric
clustering methods, which define a cluster as a mode in the probability
density function; these nonparametric methods have major advantages over
all traditional methods.
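
As a rough illustration of "a cluster is a mode of the density", the
following Python sketch lets each observation climb a Gaussian kernel
density estimate until it stops moving, and then groups observations
that end up at the same peak. This is a mean-shift hill climb chosen
for brevity; it is not the algorithm used by MODECLUS or by any
particular reference in the list at the end. The bandwidth, the
tolerances, the function names, and the one-dimensional toy data are
all assumptions made up for this example.

```python
import math


def mean_shift_step(x, data, bandwidth):
    """Move x to the Gaussian-kernel-weighted mean of the data."""
    weights = [math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data]
    return sum(w * d for w, d in zip(weights, data)) / sum(weights)


def find_mode(x, data, bandwidth, tol=1e-8, max_iter=1000):
    """Iterate mean-shift steps from x until the step size drops below tol."""
    for _ in range(max_iter):
        nxt = mean_shift_step(x, data, bandwidth)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


def mode_clusters(data, bandwidth, merge_tol=1e-3):
    """Label each observation by the density mode it climbs to."""
    modes = []
    labels = []
    for x in data:
        m = find_mode(x, data, bandwidth)
        for i, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(i)  # reached an already-known mode
                break
        else:
            modes.append(m)       # first observation to reach this mode
            labels.append(len(modes) - 1)
    return labels, modes


# Hypothetical one-dimensional data with two obvious groups.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels, modes = mode_clusters(data, bandwidth=0.5)
print(labels)
print(modes)
```

Note that the number of clusters is not supplied in advance; it falls
out of the density estimate, and so depends on the smoothing
parameter (here the bandwidth). No assumption is made about the shape
of the clusters either.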

Anderberg, M.R. (1973), Cluster Analysis for Applications, New
York: Academic Press, Inc.

Art, D., Gnanadesikan, R., and Kettenring, R. (1982), "Data-based
Metrics for Cluster Analysis," Utilitas Mathematica,
21A, 75-99.

Banfield, J.D. and Raftery, A.E. (1993), "Model-Based Gaussian and
Non-Gaussian Clustering," Biometrics, 49, 803-821.

Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function
Algorithms, New York: Plenum Press.

Bezdek, J.C. and Pal, S.K., eds. (1992), Fuzzy Models for Pattern
Recognition, New York: IEEE Press.

Binder, D.A. (1978), "Bayesian Cluster Analysis," Biometrika, 65,
31-38.

Binder, D.A. (1981), "Approximations to Bayesian Clustering Rules,"
Biometrika, 68, 275-285.

Blashfield, R.K. and Aldenderfer, M.S. (1978), "The Literature on
Cluster Analysis," Multivariate Behavioral Research,
13, 271-295.

Bock, H.H. (1985), "On Some Significance Tests in Cluster
Analysis," Journal of Classification, 2, 77-108.

Calinski, T. and Harabasz, J. (1974), "A Dendrite Method for Cluster
Analysis," Communications in Statistics, 3, 1-27.

Cooper, M.C. and Milligan, G.W. (1988), "The Effect of
Error on Determining the Number of Clusters," Proceedings
of the International Workshop on Data Analysis, Decision
Support and Expert Knowledge Representation in Marketing and
Related Areas of Research, 319-328.

Duda, R.O. and Hart, P.E. (1973), Pattern Classification and Scene
Analysis, New York: John Wiley & Sons, Inc.

Duran, B.S. and Odell, P.L. (1974), Cluster Analysis, New York:
Springer-Verlag.

Englemann, L. and Hartigan, J.A. (1969), "Percentage Points of a Test
for Clusters," Journal of the American Statistical Association,
64, 1647-1648.

Everitt, B.S. (1979), "Unresolved Problems in Cluster Analysis,"
Biometrics, 35, 169-181.

Good, I.J. (1977), "The Botryology of Botryology," in Classification
and Clustering, ed. J. Van Ryzin, New York: Academic Press, Inc.

Harman, H.H. (1976), Modern Factor Analysis, 3d Edition,
Chicago: University of Chicago Press.

Hartigan, J.A. (1975), Clustering Algorithms, New York: John
Wiley & Sons, Inc.

Hartigan, J.A. (1977), "Distribution Problems in Clustering," in
Classification and Clustering, ed. J. Van Ryzin, New York:
Academic Press, Inc.

Hartigan, J.A. (1978), "Asymptotic Distributions for Clustering
Criteria," Annals of Statistics, 6, 117-131.

Hartigan, J.A. (1981), "Consistency of Single Linkage for High-Density
Clusters," Journal of the American Statistical Association, 76,
388-394.

Hartigan, J.A. (1985), "Statistical Theory in Clustering,"
Journal of Classification, 2, 63-76.

Hawkins, D.M., Muller, M.W., and ten Krooden, J.A. (1982), "Cluster
Analysis," in Topics in Applied Multivariate Analysis, ed. D.M.
Hawkins, Cambridge: Cambridge University Press.

Hubert, L. (1974), "Approximate Evaluation Techniques for the
Single-Link and Complete-Link Hierarchical Clustering Procedures,"
Journal of the American Statistical Association, 69,
698-704.

Hubert, L.J. and Baker, F.B. (1977), "An Empirical Comparison of
Baseline Models for Goodness-of-Fit in r-Diameter Hierarchical
Clustering," in Classification and Clustering, ed. J. Van Ryzin,
New York: Academic Press, Inc.

Kaufman, L. and Rousseeuw, P.J. (1990), Finding Groups in Data,
New York: John Wiley & Sons, Inc.

Lee, K.L. (1979), "Multivariate Tests for Clusters," Journal of the
American Statistical Association, 74, 708-714.

Ling, R.F. (1973), "A Probability Theory of Cluster Analysis," Journal
of the American Statistical Association, 68, 159-169.

MacQueen, J.B. (1967), "Some Methods for Classification and Analysis of
Multivariate Observations," Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability,
1, 281-297.

Marriott, F.H.C. (1971), "Practical Problems in a Method of Cluster
Analysis," Biometrics, 27, 501-514.

Marriott, F.H.C. (1975), "Separating Mixtures of Normal Distributions,"
Biometrics, 31, 767-769.

Massart, D.L. and Kaufman, L. (1983), The Interpretation of
Analytical Chemical Data by the Use of Cluster Analysis, New York:
John Wiley & Sons, Inc.

McClain, J.O. and Rao, V.R. (1975), "CLUSTISZ: A Program to Test for the
Quality of Clustering of a Set of Objects," Journal of Marketing
Research, 12, 456-460.

McLachlan, G.J. and Basford, K.E. (1988), Mixture Models,
New York: Marcel Dekker, Inc.

Mezzich, J.E. and Solomon, H. (1980), Taxonomy and Behavioral
Science, New York: Academic Press, Inc.

Milligan, G.W. (1980), "An Examination of the Effect of Six Types of
Error Perturbation on Fifteen Clustering Algorithms,"
Psychometrika, 45, 325-342.

Milligan, G.W. (1981), "A Review of Monte Carlo Tests of Cluster
Analysis," Multivariate Behavioral Research, 16, 379-407.

Milligan, G.W. and Cooper, M.C. (1985), "An Examination of Procedures
for Determining the Number of Clusters in a Data Set,"
Psychometrika, 50, 159-179.

Pollard, D. (1981), "Strong Consistency of k-Means Clustering,"
Annals of Statistics, 9, 135-140.

Sarle, W.S. (1982), "Cluster Analysis by Least Squares," Proceedings of
the Seventh Annual SAS Users Group International Conference,
651-653.

Sarle, W.S. (1983), Cubic Clustering Criterion, SAS Technical
Report A-108, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (1993), SAS/STAT Software: The MODECLUS Procedure,
SAS Technical Report P-256, Cary, NC: SAS Institute Inc.

Scott, A.J. and Symons, M.J. (1971), "Clustering Methods Based on
Likelihood Ratio Criteria," Biometrics, 27, 387-397.

Sneath, P.H.A. and Sokal, R.R. (1973), Numerical Taxonomy, San
Francisco: W.H. Freeman.

Spath, H. (1980), Cluster Analysis Algorithms, Chichester,
England: Ellis Horwood.

Symons, M.J. (1981), "Clustering Criteria and Multivariate Normal
Mixtures," Biometrics, 37, 35-43.

Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985),
Statistical Analysis of Finite Mixture Distributions,
New York: John Wiley & Sons, Inc.

Ward, J.H. (1963), "Hierarchical Grouping to Optimize an Objective
Function," Journal of the American Statistical Association, 58,
236-244.

Wolfe, J.H. (1970), "Pattern Clustering by Multivariate Mixture
Analysis," Multivariate Behavioral Research, 5, 329-350.

Wolfe, J.H. (1978), "Comparative Cluster Analysis of Patterns of
Vocational Interest," Multivariate Behavioral Research,
13, 33-44.

Wong, M.A. (1982), "A Hybrid Clustering Method for Identifying
High-Density Clusters," Journal of the American Statistical
Association, 77, 841-847.

Wong, M.A. and Lane, T. (1983), "A kth Nearest Neighbor Clustering
Procedure," Journal of the Royal Statistical Society, Series B,
45, 362-368.

Wong, M.A. and Schaack, C. (1982), "Using the kth Nearest Neighbor
Clustering Procedure to Determine the Number of Subpopulations,"
American Statistical Association 1982 Proceedings of the Statistical
Computing Section, 40-48.


More references for nonparametric estimation of clusters as modes:

Barnett, V., ed. (1981), Interpreting Multivariate Data, New York:
John Wiley & Sons, Inc.

Girman, C.J. (1994), "Cluster Analysis and Classification Tree
Methodology as an Aid to Improve Understanding of Benign Prostatic
Hyperplasia," Ph.D. thesis, Chapel Hill, NC: Department of
Biostatistics, University of North Carolina.

Gitman, I. (1973), "An Algorithm for Nonsupervised Pattern
Classification," IEEE Transactions on Systems, Man, and Cybernetics,
SMC-3, 66-74.

Hartigan, J.A. and Hartigan, P.M. (1985), "The Dip Test of
Unimodality," Annals of Statistics, 13, 70-84.

Hartigan, P.M. (1985), "Computation of the Dip Statistic to Test for
Unimodality," Applied Statistics, 34, 320-325.

Huizinga, D.H. (1978), "A Natural or Mode Seeking Cluster Analysis
Algorithm," Technical Report 78-1, Behavioral Research Institute, 2305
Canyon Blvd., Boulder, Colorado 80302.

Koontz, W.L.G. and Fukunaga, K. (1972a), "A Nonparametric
Valley-Seeking Technique for Cluster Analysis," IEEE Transactions on
Computers, C-21, 171-178.

Koontz, W.L.G. and Fukunaga, K. (1972b), "Asymptotic Analysis of a
Nonparametric Clustering Technique," IEEE Transactions on Computers,
C-21, 967-974.

Koontz, W.L.G., Narendra, P.M., and Fukunaga, K. (1976), "A
Graph-Theoretic Approach to Nonparametric Cluster Analysis," IEEE
Transactions on Computers, C-25, 936-944.

Minnotte, M.C. (1992), "A Test of Mode Existence with
Applications to Multimodality," Ph.D. thesis, Rice University,
Department of Statistics.

Mizoguchi, R. and Shimura, M. (1980), "A Nonparametric Algorithm for
Detecting Clusters Using Hierarchical Structure," IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2, 292-300.

Mueller, D.W. and Sawitzki, G. (1991), "Excess Mass Estimates and
Tests for Multimodality," Journal of the American Statistical
Association, 86, 738-746.

Polonik, W. (1993), "Measuring Mass Concentrations and Estimating
Density Contour Clusters--An Excess Mass Approach," Technical Report,
Beitraege zur Statistik Nr. 7, Universitaet Heidelberg.

SAS Institute Inc. (1993), SAS/STAT Software: The MODECLUS Procedure,
SAS Technical Report P-256, Cary, NC: SAS Institute Inc.

Silverman, B.W. (1986), Density Estimation, New York: Chapman and
Hall.

Tukey, P.A. and Tukey, J.W. (1981), "Data-Driven View Selection;
Agglomeration and Sharpening," in Barnett (1981).

Wong, M.A. and Lane, T. (1983), "A kth Nearest Neighbor Clustering
Procedure," Journal of the Royal Statistical Society, Series B, 45,
362-368.

Wong, M.A. and Schaack, C. (1982), "Using the kth Nearest Neighbor
Clustering Procedure to Determine the Number of Subpopulations,"
American Statistical Association 1982 Proceedings of the Statistical
Computing Section, 40-48.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
 *** Do not send me unsolicited commercial or political email! ***

