Clustering is often used for discovering structure in data.
Clustering systems differ in the *objective function*
used to evaluate clustering quality and the *control
strategy* used to search the space of clusterings. Ideally,
the search strategy should consistently construct
clusterings of high quality, but be computationally inexpensive as
well. Given the
combinatorial complexity of the general clustering problem,
a search strategy cannot be computationally inexpensive
and also guarantee the quality of discovered clusterings
across a diverse set of domains and objective functions. However,
we can partition the search so that an initial clustering
is inexpensively constructed,
followed by iterative optimization procedures
that continue to search in the background for improved clusterings.
This allows an analyst to get an early indication of the
possible presence and form of structure in data, but search can
continue as long as it seems worthwhile. This
seems to be a primary motivation behind the design of systems such as
AUTOCLASS [Cheeseman, Kelly, Self, Stutz, Taylor &
Freeman, 1988]
and SNOB [Wallace &
Dowe, 1994].

This paper describes and
evaluates three strategies for iterative optimization,
one inspired by the iterative `seed' selection strategy of CLUSTER/2 [Michalski &
Stepp, 1983a;
1983b],
one a common form of optimization that
iteratively reclassifies single observations, and a third that
appears novel in the clustering literature. This latter strategy
was inspired, in part, by
macro-learning strategies [Iba, 1989]
-- collections of observations are reclassified
*en masse*, which appears to mitigate problems associated
with local maxima as measured by the objective function.
For evaluation purposes, we couple these strategies
with a simple, inexpensive
procedure used by COBWEB [Fisher,
1987a; 1987b]
and a system by Anderson and Matessa [Anderson
& Matessa, 1991],
which constructs an initial hierarchical
clustering. These iterative optimization strategies, however, can be
paired with other methods for constructing initial clusterings.
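As a concrete illustration of the second strategy, the sketch below hill-climbs by reclassifying one observation at a time. The objective function used here (within-cluster sum of squares, to be minimized) and the flat, fixed-k setting are simplifying assumptions for illustration only; the strategies described above operate over hierarchical clusterings with other objective functions.

```python
def wcss(points, assign, k):
    """Within-cluster sum of squared distances to each cluster's
    centroid -- a simple stand-in objective function (minimized)."""
    total = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assign) if a == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2
                     for p in members for d in range(dim))
    return total

def reclassify_singles(points, assign, k):
    """Iterative optimization by single-observation reclassification:
    repeatedly move one observation to the cluster that most improves
    the objective; stop when no single move improves it (local optimum)."""
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            original = assign[i]
            best_c, best_score = original, wcss(points, assign, k)
            for c in range(k):
                if c == original:
                    continue
                assign[i] = c           # tentatively reclassify point i
                score = wcss(points, assign, k)
                if score < best_score - 1e-9:
                    best_c, best_score = c, score
            assign[i] = best_c          # commit the best reclassification
            if best_c != original:
                improved = True
    return assign
```

For example, starting from the assignment [0, 1, 0, 1] over two well-separated pairs of points, the loop converges to the assignment that groups each pair together.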

Once a clustering has been constructed, it is judged by analysts -- often according to task-specific criteria. Several authors [Fisher, 1987a; 1987b; Cheeseman et al., 1988; Anderson & Matessa, 1991] have abstracted these criteria into a generic performance task akin to pattern completion, where the error rate over completed patterns can be used to `externally' judge the utility of a clustering. In each of these systems, the objective function has been selected with this performance task in mind. Given this performance task, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus easing post-clustering analysis. Experimental evidence suggests that hierarchical clusterings can be greatly simplified with no increase in pattern-completion error rate.
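The pattern-completion task can be sketched as follows. The function name and the use of cluster centroids as predictors are illustrative assumptions, not the mechanism of any particular system cited above: a partially observed pattern is classified into a cluster using its known attributes, and the missing attributes are then predicted from that cluster.

```python
def complete_pattern(clusters, partial, known_dims):
    """Classify a partially observed pattern into the cluster whose
    centroid is nearest over the known dimensions, then predict the
    missing dimensions from that centroid."""
    best, best_d = None, float("inf")
    for members in clusters:
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        # distance measured over the known attributes only
        d2 = sum((partial[d] - centroid[d]) ** 2 for d in known_dims)
        if d2 < best_d:
            best, best_d = centroid, d2
    return best
```

Averaging the error of such completions over held-out observations yields the `external' error rate by which competing clusterings can be compared.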

Our experiments with clustering simplification suggest `external' criteria of simplicity and classification cost, in addition to pattern-completion error rate, for judging the relative merits of differing objective functions in clustering. We suggest several objective functions that are adaptations of selection measures used in supervised, decision-tree induction, which may do well on the dimensions of simplicity and error rate.

Douglas H. Fisher

Sat Mar 30 11:37:23 CST 1996