There are many important issues in clustering that we will not address
in depth. One of these is the possible advantage of overlapping
clusters [Lebowitz, 1987; Martin & Billman, 1994].
We have assumed tree-structured clusterings, which
store each observation in more than one cluster; the clusters along
any path of the tree, however, are related by a proper subset-of
relation. In many cases,
*lattices* [Levinson, 1984; Wilcox & Levinson, 1986;
Carpineto & Romano, 1993],
or more generally, *directed acyclic graphs*
(DAGs) may be a better representation scheme. These structures
allow an observation to be included in multiple clusters, where
one such cluster need not be a subset of another. As such, they
may better provide an analyst with multiple perspectives of
the data.
For example, animals can be partitioned into clusters
corresponding to mammals, birds, reptiles, etc., or
they may be partitioned into clusters corresponding
to carnivores, herbivores, or omnivores. A tree would require
that one of these partitions (e.g., carnivores, herbivores, omnivores)
be `subordinate' to the other (e.g., mammals, birds, reptiles);
classes of the subordinate partition would necessarily
be `distributed' across descendants (e.g., carnivorous-mammal,
omnivorous-mammal, carnivorous-reptile)
of the top-level clusters, which would ideally
represent the clusters of the first partition. A DAG allows
both perspectives to coexist in relative equality,
thus making both perspectives more explicit to an analyst.
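As a minimal sketch (not from the paper; cluster names are illustrative), the key property of a DAG-structured clustering is that an observation may belong to several clusters with no subset relation between them:

```python
# A DAG-structured clustering: each observation may belong to
# multiple clusters, and no subset relation is required between
# the clusters that share it.
clusters = {
    "mammal":    {"lion", "deer", "bear"},
    "reptile":   {"crocodile", "iguana"},
    "carnivore": {"lion", "crocodile"},
    "herbivore": {"deer", "iguana"},
    "omnivore":  {"bear"},
}

def memberships(observation):
    """Return every cluster containing the observation."""
    return sorted(name for name, members in clusters.items()
                  if observation in members)

# "lion" belongs to both "mammal" and "carnivore", yet neither
# cluster is a subset of the other -- impossible in a strict tree.
print(memberships("lion"))  # -> ['carnivore', 'mammal']
assert not clusters["carnivore"] <= clusters["mammal"]
```

In a tree, by contrast, the two clusters containing "lion" would have to lie on a single root-to-leaf path, forcing one to be a subset of the other.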

We have also assumed that variables are nominally valued.
There have been numerous adaptations of the basic **PU**
function, other functions, and discretization strategies to accommodate
numeric variables [Michalski & Stepp, 1983a, 1983b;
Gennari et al., 1989; Reich & Fenves, 1991;
Cheeseman et al., 1988; Biswas et al., 1994].
The basic sorting procedure and the iterative optimization
techniques can be used with data described in whole or in part by
numerically valued variables, regardless of which approach one takes.
Numeric-variable frontiers can still be identified using holdout:
use the mean value of a variable at a node to generate predictions,
and identify the variable's frontier as the set of clusters
that collectively minimizes a measure of error such as
mean squared error.
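The following sketch illustrates this holdout procedure under simplifying assumptions (the `Node` class and `choose_frontier` function are hypothetical, not from the paper): each cluster predicts with the mean of the variable over its training observations, and the frontier descends past a node only when its children's means yield lower mean-squared error on held-out observations.

```python
class Node:
    """A cluster: stores the mean of one numeric variable over its
    training observations, plus its child clusters."""
    def __init__(self, train_values, children=()):
        self.mean = sum(train_values) / len(train_values)
        self.children = list(children)

def choose_frontier(parent, holdout_by_child):
    """Compare holdout squared error when predicting with the parent's
    mean versus the children's means; return the clusters forming the
    better (lower-error) frontier.

    holdout_by_child: one list of holdout values per child, holding
    the observations that sort into that child."""
    values = [v for vs in holdout_by_child for v in vs]
    err_parent = sum((v - parent.mean) ** 2 for v in values)
    err_children = sum((v - c.mean) ** 2
                       for c, vs in zip(parent.children, holdout_by_child)
                       for v in vs)
    return parent.children if err_children < err_parent else [parent]

# The children's means (2.0 and 8.0) predict the holdout values far
# better than the parent's mean (5.0), so the frontier descends.
kids = [Node([2.0, 2.0]), Node([8.0, 8.0])]
parent = Node([2.0, 2.0, 8.0, 8.0], children=kids)
assert choose_frontier(parent, [[2.1, 1.9], [7.8, 8.3]]) == kids
```

Applied recursively down the tree, a test of this kind at each node yields the set of clusters that collectively minimizes holdout error for the variable.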

Douglas H. Fisher

Sat Mar 30 11:37:23 CST 1996