There are many important issues in clustering that we will not address in depth. One of these is the possible advantage of overlapping clusters [Lebowitz, 1987; Martin & Billman, 1994]. We have assumed tree-structured clusterings, which store each observation in more than one cluster, but these clusters are related by a proper subset-of relation as one descends a path in the tree. In many cases, lattices [Levinson, 1984; Wilcox & Levinson, 1986; Carpineto & Romano, 1993] or, more generally, directed acyclic graphs (DAGs) may be a better representation scheme. These structures allow an observation to be included in multiple clusters, where one such cluster need not be a subset of another; as such, they may better provide an analyst with multiple perspectives on the data. For example, animals can be partitioned into clusters corresponding to mammals, birds, reptiles, etc., or they may be partitioned into clusters corresponding to carnivores, herbivores, or omnivores. A tree would require that one of these partitions (e.g., carnivores, etc.) be `subordinate' to the other (e.g., mammals, birds, etc.); classes of the subordinate partition would necessarily be `distributed' across descendants (e.g., carnivorous-mammal, omnivorous-mammal, carnivorous-reptile, etc.) of the top-level clusters, which would ideally represent the clusters of the other partition. A DAG allows both perspectives to coexist on relatively equal footing, thus making each perspective more explicit to an analyst.
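The distinction can be made concrete with a small sketch. The cluster names and memberships below are illustrative only (they are not drawn from any data set discussed here); the point is that in a DAG-structured clustering a single observation may belong to several clusters, none of which contains another.

```python
# Illustrative DAG-structured clustering: clusters are sets of
# observations, and an observation may fall in multiple clusters
# that are not related by subset inclusion.
clusters = {
    "animals":    {"lion", "cow", "hawk", "iguana"},
    "mammals":    {"lion", "cow"},
    "birds":      {"hawk"},
    "reptiles":   {"iguana"},
    "carnivores": {"lion", "hawk"},
    "herbivores": {"cow", "iguana"},
}

# Parent links form a DAG rather than a tree: both partitions
# (taxonomic and dietary) hang directly off the root.
parents = {
    "mammals": {"animals"}, "birds": {"animals"}, "reptiles": {"animals"},
    "carnivores": {"animals"}, "herbivores": {"animals"},
}

def clusters_of(obs):
    """Return every cluster that contains the observation."""
    return {name for name, members in clusters.items() if obs in members}

# "lion" sits under both "mammals" and "carnivores", and neither
# cluster is a subset of the other.
print(sorted(clusters_of("lion")))  # ['animals', 'carnivores', 'mammals']
```

In a tree, one of the two partitions would instead be pushed down a level, and its classes split into composite clusters such as carnivorous-mammal.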
We have also assumed that variables are nominally valued. There have been numerous adaptations of the basic PU function, of other objective functions, and of discretization strategies to accommodate numeric variables [Michalski & Stepp, 1983a; 1983b; Gennari et al., 1989; Reich & Fenves, 1991; Cheeseman et al., 1988; Biswas et al., 1994]. Regardless of which approach one takes, the basic sorting procedure and the iterative optimization techniques can be used with data described in whole or in part by numerically-valued variables. Numeric variable frontiers can be identified using holdout by generating predictions from the mean value of the variable at a node, and selecting as the variable's frontier the set of clusters that collectively minimizes a measure of error such as mean-squared error.
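The frontier-selection step above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the tree, the holdout values, and the routing rule (descending toward the child whose mean is closest, standing in for whatever sorting procedure is in use) are all assumed for the example.

```python
from itertools import product

class Node:
    """A cluster storing the mean of one numeric variable."""
    def __init__(self, mean, children=()):
        self.mean = mean
        self.children = list(children)

def frontiers(node):
    """Enumerate all cuts through the subtree: either the node itself,
    or one frontier per child, combined across children."""
    cuts = [[node]]
    if node.children:
        for combo in product(*(frontiers(c) for c in node.children)):
            cuts.append([n for part in combo for n in part])
    return cuts

def route(root, x, frontier):
    """Sort a holdout value down the tree (here: toward the closest
    child mean, an illustrative rule) until a frontier node is hit."""
    node = root
    while node not in frontier:
        node = min(node.children, key=lambda c: abs(c.mean - x))
    return node

def mse(frontier, root, holdout):
    """Mean-squared error when each frontier node predicts its mean."""
    errs = [(route(root, x, frontier).mean - x) ** 2 for x in holdout]
    return sum(errs) / len(errs)

# Tiny example: a root (mean 5.0) with two children (means 2.0, 8.0).
root = Node(5.0, [Node(2.0), Node(8.0)])
holdout = [1.5, 2.5, 7.5, 8.5]

# The frontier minimizing holdout MSE is kept for prediction.
best = min(frontiers(root), key=lambda f: mse(f, root, holdout))
print([n.mean for n in best])  # [2.0, 8.0]: the children beat the root
```

Here the child-level frontier wins (MSE 0.25 versus 9.25 at the root), since the holdout values cluster tightly around the two child means; with noisier data the coarser root-level frontier could instead minimize error.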