Box 1679, Station B

Vanderbilt University

Nashville, TN 37235 USA

Clustering is often used for discovering structure in data.
Clustering systems differ in the *objective function*
used to evaluate clustering quality and the *control
strategy* used to search the space of clusterings. Ideally,
the search strategy should consistently construct
clusterings of high quality, but be computationally inexpensive as well.
In general, we cannot have it both ways, but we can
partition the search so that a system
inexpensively constructs a 'tentative' clustering for
initial examination, followed by iterative optimization, which
continues to search in the background for improved clusterings.
Given this motivation, we evaluate an inexpensive strategy
for creating initial clusterings,
coupled with several control strategies for
*iterative optimization*, each of which repeatedly modifies an
initial clustering in search of a better one. One of these methods
appears novel as an iterative optimization
strategy in clustering contexts.
Once a clustering has been constructed, it is judged by
analysts, often according to task-specific criteria.
Several authors have abstracted these criteria and posited
a generic performance task akin to pattern completion,
where the error rate over completed patterns is used to
'externally' judge clustering utility. Given this performance
task, we adapt resampling-based pruning strategies
used by supervised learning systems to the task of
simplifying hierarchical clusterings, thus promising to ease
post-clustering analysis. Finally, we propose a number of objective
functions, based on attribute-selection measures for decision-tree
induction, that might perform well on the error rate and simplicity
dimensions.
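
The two-phase idea described above (a cheap tentative clustering, then iterative optimization that repeatedly modifies it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it works on flat rather than hierarchical clusterings, it uses within-cluster squared error as a stand-in objective function rather than any measure from the paper, and the names `sse`, `initial_clustering`, and `optimize` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Clustering = List[List[Point]]


def sse(clustering: Clustering) -> float:
    """Stand-in objective (assumption): total within-cluster squared error.

    Lower is better. Empty clusters contribute nothing.
    """
    total = 0.0
    for cluster in clustering:
        if not cluster:
            continue
        dim = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in cluster for d in range(dim))
    return total


def initial_clustering(points: Sequence[Point], k: int) -> Clustering:
    """Cheap tentative clustering: deal points round-robin into k clusters."""
    clusters: Clustering = [[] for _ in range(k)]
    for i, p in enumerate(points):
        clusters[i % k].append(p)
    return clusters


def optimize(
    clustering: Clustering,
    objective: Callable[[Clustering], float] = sse,
    max_passes: int = 100,
) -> Clustering:
    """Move single points between clusters for as long as the objective improves."""
    clusters = [list(c) for c in clustering]  # work on a copy
    best = objective(clusters)
    for _ in range(max_passes):
        improved = False
        for src in range(len(clusters)):
            for p in list(clusters[src]):  # iterate over a snapshot
                for dst in range(len(clusters)):
                    if dst == src:
                        continue
                    # Tentatively move p from src to dst and re-score.
                    clusters[src].remove(p)
                    clusters[dst].append(p)
                    score = objective(clusters)
                    if score < best - 1e-12:
                        best = score
                        improved = True
                        break  # keep the move; continue with the next point
                    # Otherwise undo the move.
                    clusters[dst].pop()
                    clusters[src].append(p)
        if not improved:
            break  # a full pass with no improving move: local optimum
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
tentative = initial_clustering(points, k=2)
refined = optimize(tentative)
print(sse(tentative), sse(refined))
```

The tentative clustering is available immediately for inspection, and `optimize` can run afterwards to look for a better one. Swapping in a different `objective` changes what "better" means without touching the control strategy.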

- Introduction
- Generating Hierarchical Clusterings
- Iterative Optimization
- Simplifying Hierarchical Clusterings
- General Discussion
- Concluding Remarks
- References

**Acknowledgements:** I thank Sashank Varma, Arthur Nevins,
and Diana Gordon
for comments on the paper. The reviewers and editor
supplied extensive and helpful comments.
This work was supported by grant NAG 2-834 from
NASA Ames Research Center. A very abbreviated discussion
of some of this article's results appears in Fisher
[1995], published by AAAI Press.

Douglas H. Fisher

Sat Mar 30 11:37:23 CST 1996