Newsgroups: comp.ai.neural-nets
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: cluster analysis
Message-ID: <DuAw2q.1r4@unx.sas.com>
Date: Wed, 10 Jul 1996 00:01:37 GMT
References:  <4regg5$e10@mack.rt66.com>
Organization: SAS Institute Inc.


In article <4regg5$e10@mack.rt66.com>, llubet@rt66.com (Lloyd Lubet) writes:
|> Can anyone recommend readable books on cluster theory and associative
|> memory?

Here's an answer to the cluster analysis part of the question. I'll let
someone else respond regarding associative memory.

Massart and Kaufman (1983) is the best elementary introduction to
cluster analysis.  Other important texts are Anderberg (1973), Sneath
and Sokal (1973), Duran and Odell (1974), Hartigan (1975), Titterington,
Smith, and Makov (1985), McLachlan and Basford (1988), and Kaufman and
Rousseeuw (1990).  Hartigan (1975) and Spath (1980) give numerous
FORTRAN programs for clustering.  Any prospective user of cluster
analysis should study the Monte Carlo results of Milligan (1980),
Milligan and Cooper (1985), and Cooper and Milligan (1988).  Essential
references on the statistical aspects of clustering include MacQueen
(1967), Wolfe (1970), Scott and Symons (1971), Hartigan (1977; 1978;
1981; 1985), Binder (1978; 1981), Symons (1981), Wong and Schaack
(1982), Wong and Lane (1983), Sarle (1983), Bock (1985), Banfield and
Raftery (1993), and SAS Institute (1993).  For fuzzy clustering, see
Bezdek (1981) and Bezdek and Pal (1992).  See Blashfield and Aldenderfer
(1978) for a discussion of the fragmented state of the literature on
cluster analysis.  Avoid articles in the Journal of Marketing Research.
There is a separate list of references at the end on nonparametric
clustering methods, which define a cluster as a mode in the probability
density function; these nonparametric methods have major advantages over
all traditional methods.
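
To make the idea of a cluster as a density mode concrete, here is a
minimal sketch in Python. It is not taken from any of the references
below and is not the MODECLUS algorithm; as a stand-in it uses a simple
one-dimensional mean-shift iteration with a Gaussian kernel. The
function names, the bandwidth, and the merge distance are all
assumptions chosen for illustration.

```python
import math

def mean_shift_modes(points, bandwidth, iters=200, tol=1e-6):
    """Move each point uphill on a Gaussian kernel density estimate.

    Each step replaces x by the kernel-weighted mean of the data,
    which increases the estimated density until x settles at a mode.
    """
    shifted = []
    for x in points:
        for _ in range(iters):
            weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
            new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        shifted.append(x)
    return shifted

def label_by_mode(shifted, merge_dist):
    """Points whose climbs end within merge_dist of each other share a cluster."""
    modes, labels = [], []
    for x in shifted:
        for k, m in enumerate(modes):
            if abs(x - m) < merge_dist:
                labels.append(k)
                break
        else:
            # No existing mode is close enough, so start a new cluster.
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels

# Toy data (hypothetical): two well-separated groups on the real line.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
shifted = mean_shift_modes(data, bandwidth=0.5)
modes, labels = label_by_mode(shifted, merge_dist=0.5)
print(modes, labels)
```

On this toy data the eight points settle onto two modes, near 1.05 and
5.05, without the number of clusters being given in advance. The answer
does depend on the bandwidth, which has to be chosen with some care.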

Anderberg, M.R. (1973), Cluster Analysis for Applications, New
York: Academic Press, Inc.

Art, D., Gnanadesikan, R., and Kettenring, R. (1982), "Data-based
Metrics for Cluster Analysis," Utilitas Mathematica,
21A, 75-99.

Banfield, J.D. and Raftery, A.E. (1993), "Model-Based Gaussian and
Non-Gaussian Clustering," Biometrics, 49, 803-821.

Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function
Algorithms, New York: Plenum Press.

Bezdek, J.C. and Pal, S.K., eds. (1992), Fuzzy Models for Pattern
Recognition, New York: IEEE Press.

Binder, D.A. (1978), "Bayesian Cluster Analysis," Biometrika, 65,
31-38.

Binder, D.A. (1981), "Approximations to Bayesian Clustering Rules,"
Biometrika, 68, 275-285.

Blashfield, R.K. and Aldenderfer, M.S. (1978), "The Literature on
Cluster Analysis," Multivariate Behavioral Research,
13, 271-295.

Bock, H.H. (1985), "On Some Significance Tests in Cluster
Analysis," Journal of Classification, 2, 77-108.

Calinski, T. and Harabasz, J. (1974), "A Dendrite Method for Cluster
Analysis," Communications in Statistics, 3, 1-27.

Cooper, M.C. and Milligan, G.W. (1988), "The Effect of
Error on Determining the Number of Clusters," Proceedings
of the International Workshop on Data Analysis, Decision
Support and Expert Knowledge Representation in Marketing and
Related Areas of Research, 319-328.

Duda, R.O. and Hart, P.E. (1973), Pattern Classification and Scene
Analysis, New York: John Wiley & Sons, Inc.

Duran, B.S. and Odell, P.L. (1974), Cluster Analysis, New York:
Springer-Verlag.

Engelman, L. and Hartigan, J.A. (1969), "Percentage Points of a Test
for Clusters," Journal of the American Statistical Association,
64, 1647-1648.

Everitt, B.S. (1979), "Unresolved Problems in Cluster Analysis,"
Biometrics, 35, 169-181.

Everitt, B.S. and Hand, D.J. (1981), Finite Mixture Distributions,
New York: Chapman and Hall.

Good, I.J. (1977), "The Botryology of Botryology," in Classification
and Clustering, ed. J. Van Ryzin, New York: Academic Press, Inc.

Harman, H.H. (1976), Modern Factor Analysis, 3d Edition,
Chicago: University of Chicago Press.

Hartigan, J.A. (1975), Clustering Algorithms, New York: John
Wiley & Sons, Inc.

Hartigan, J.A. (1977), "Distribution Problems in Clustering," in
Classification and Clustering, ed. J. Van Ryzin, New York:
Academic Press, Inc.

Hartigan, J.A. (1978), "Asymptotic Distributions for Clustering
Criteria," Annals of Statistics, 6, 117-131.

Hartigan, J.A. (1981), "Consistency of Single Linkage for High-Density
Clusters," Journal of the American Statistical Association, 76,
388-394.

Hartigan, J.A. (1985), "Statistical Theory in Clustering,"
Journal of Classification, 2, 63-76.

Hawkins, D.M., Muller, M.W., and ten Krooden, J.A. (1982), "Cluster
Analysis," in Topics in Applied Multivariate Analysis, ed. D.M.
Hawkins, Cambridge: Cambridge University Press.

Hubert, L. (1974), "Approximate Evaluation Techniques for the
Single-Link and Complete-Link Hierarchical Clustering Procedures,"
Journal of the American Statistical Association, 69,
698-704.

Hubert, L.J. and Baker, F.B. (1977), "An Empirical Comparison of
Baseline Models for Goodness-of-Fit in r-Diameter Hierarchical
Clustering," in Classification and Clustering, ed. J. Van Ryzin,
New York: Academic Press, Inc.

Kaufman, L. and Rousseeuw, P.J. (1990), Finding Groups in Data,
New York: John Wiley & Sons, Inc.

Lee, K.L. (1979), "Multivariate Tests for Clusters," Journal of the
American Statistical Association, 74, 708-714.

Ling, R.F. (1973), "A Probability Theory of Cluster Analysis," Journal
of the American Statistical Association, 68, 159-169.

MacQueen, J.B. (1967), "Some Methods for Classification and Analysis of
Multivariate Observations," Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability,
1, 281-297.

Marriott, F.H.C. (1971), "Practical Problems in a Method of Cluster
Analysis," Biometrics, 27, 501-514.

Marriott, F.H.C. (1975), "Separating Mixtures of Normal Distributions,"
Biometrics, 31, 767-769.

Massart, D.L. and Kaufman, L. (1983), The Interpretation of
Analytical Chemical Data by the Use of Cluster Analysis, New York:
John Wiley & Sons, Inc.

McClain, J.O. and Rao, V.R. (1975), "CLUSTISZ: A Program to Test for the
Quality of Clustering of a Set of Objects," Journal of Marketing
Research, 12, 456-460.

McLachlan, G.J. and Basford, K.E. (1988), Mixture Models,
New York: Marcel Dekker, Inc.

Mezzich, J.E. and Solomon, H. (1980), Taxonomy and Behavioral
Science, New York: Academic Press, Inc.

Milligan, G.W. (1980), "An Examination of the Effect of Six Types of
Error Perturbation on Fifteen Clustering Algorithms,"
Psychometrika, 45, 325-342.

Milligan, G.W. (1981), "A Review of Monte Carlo Tests of Cluster
Analysis," Multivariate Behavioral Research, 16, 379-407.

Milligan, G.W. and Cooper, M.C. (1985), "An Examination of Procedures
for Determining the Number of Clusters in a Data Set,"
Psychometrika, 50, 159-179.

Pollard, D. (1981), "Strong Consistency of k-Means Clustering,"
Annals of Statistics, 9, 135-140.

Sarle, W.S. (1982), "Cluster Analysis by Least Squares," Proceedings of
the Seventh Annual SAS Users Group International Conference,
651-653.

Sarle, W.S. (1983), Cubic Clustering Criterion, SAS Technical
Report A-108, Cary, NC: SAS Institute Inc.

SAS Institute Inc. (1993), SAS/STAT Software: The MODECLUS Procedure,
SAS Technical Report P-256, Cary, NC: SAS Institute Inc.

Scott, A.J. and Symons, M.J. (1971), "Clustering Methods Based on
Likelihood Ratio Criteria," Biometrics, 27, 387-397.

Sneath, P.H.A. and Sokal, R.R. (1973), Numerical Taxonomy, San
Francisco: W.H. Freeman.

Spath, H. (1980), Cluster Analysis Algorithms, Chichester,
England: Ellis Horwood.

Symons, M.J. (1981), "Clustering Criteria and Multivariate Normal
Mixtures," Biometrics, 37, 35-43.

Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985),
Statistical Analysis of Finite Mixture Distributions,
New York: John Wiley & Sons, Inc.

Ward, J.H. (1963), "Hierarchical Grouping to Optimize an Objective
Function," Journal of the American Statistical Association, 58,
236-244.

Wolfe, J.H. (1970), "Pattern Clustering by Multivariate Mixture
Analysis," Multivariate Behavioral Research, 5, 329-350.

Wolfe, J.H. (1978), "Comparative Cluster Analysis of Patterns of
Vocational Interest," Multivariate Behavioral Research,
13, 33-44.

Wong, M.A. (1982), "A Hybrid Clustering Method for Identifying
High-Density Clusters," Journal of the American Statistical
Association, 77, 841-847.

Wong, M.A. and Lane, T. (1983), "A kth Nearest Neighbor Clustering
Procedure," Journal of the Royal Statistical Society, Series B,
45, 362-368.

Wong, M.A. and Schaack, C. (1982), "Using the kth Nearest Neighbor
Clustering Procedure to Determine the Number of Subpopulations,"
American Statistical Association 1982 Proceedings of the Statistical
Computing Section, 40-48.


More references for nonparametric estimation of clusters as modes:

Barnett, V., ed. (1981), Interpreting Multivariate Data, New York:
John Wiley & Sons, Inc.

Girman, C.J. (1994), "Cluster Analysis and Classification Tree
Methodology as an Aid to Improve Understanding of Benign Prostatic
Hyperplasia," Ph.D. thesis, Chapel Hill, NC: Department of
Biostatistics, University of North Carolina.

Gitman, I. (1973), "An Algorithm for Nonsupervised Pattern
Classification," IEEE Transactions on Systems, Man, and Cybernetics,
SMC-3, 66-74.

Hartigan, J.A. and Hartigan, P.M. (1985), "The Dip Test of
Unimodality," Annals of Statistics, 13, 70-84.

Hartigan, P.M. (1985), "Computation of the Dip Statistic to Test for
Unimodality," Applied Statistics, 34, 320-325.

Huizinga, D.H. (1978), "A Natural or Mode Seeking Cluster Analysis
Algorithm," Technical Report 78-1, Behavioral Research Institute, 2305
Canyon Blvd., Boulder, Colorado 80302.

Koontz, W.L.G. and Fukunaga, K. (1972a), "A Nonparametric
Valley-Seeking Technique for Cluster Analysis," IEEE Transactions on
Computers, C-21, 171-178.

Koontz, W.L.G. and Fukunaga, K. (1972b), "Asymptotic Analysis of a
Nonparametric Clustering Technique," IEEE Transactions on Computers,
C-21, 967-974.

Koontz, W.L.G., Narendra, P.M., and Fukunaga, K. (1976), "A
Graph-Theoretic Approach to Nonparametric Cluster Analysis," IEEE
Transactions on Computers, C-25, 936-944.

Minnotte, M.C. (1992), "A Test of Mode Existence with
Applications to Multimodality," Ph.D. thesis, Rice University,
Department of Statistics.

Mizoguchi, R. and Shimura, M. (1980), "A Nonparametric Algorithm for
Detecting Clusters Using Hierarchical Structure," IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2, 292-300.

Mueller, D.W. and Sawitzki, G. (1991), "Excess Mass Estimates and Tests
for Multimodality," Journal of the American Statistical Association,
86, 738-746.

Polonik, W. (1993), "Measuring Mass Concentrations and Estimating
Density Contour Clusters--An Excess Mass Approach," Technical Report,
Beitraege zur Statistik Nr. 7, Universitaet Heidelberg.

SAS Institute Inc. (1993), SAS/STAT Software: The MODECLUS Procedure,
SAS Technical Report P-256, Cary, NC: SAS Institute Inc.

Silverman, B.W. (1986), Density Estimation, New York: Chapman and
Hall.

Tukey, P.A. and Tukey, J.W. (1981), "Data-Driven View Selection;
Agglomeration and Sharpening," in Barnett (1981).

Wong, M.A. and Lane, T. (1983), "A kth Nearest Neighbor Clustering
Procedure," Journal of the Royal Statistical Society, Series B, 45,
362-368.

Wong, M.A. and Schaack, C. (1982), "Using the kth Nearest Neighbor
Clustering Procedure to Determine the Number of Subpopulations,"
American Statistical Association 1982 Proceedings of the Statistical
Computing Section, 40-48.


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
