Our strategy for initial clustering is * sorting*, which is
a term adapted from a psychological task that requires subjects
to perform roughly the same procedure that we describe
here [Ahn & Medin, 1989].
Given an observation and a current partition,
sorting evaluates the quality of new clusterings that
result from placing the observation in each of the existing clusters,
and the quality of the clustering that results from creating a new
cluster that only covers the new observation; the option that
yields the highest quality score (e.g., using **PU**) is selected.
The clustering grows incrementally as new observations are added.

This procedure is easily incorporated into a recursive loop that builds tree-structured clusterings: given an existing hierarchical clustering, an observation is sorted relative to the top-level partition (i.e., children of the root); if an existing child of the root is chosen to include the observation, then the observation is sorted relative to the children of this node, which now serves as the root in this recursive call. If a leaf is reached, the tree is extended downward. The maximum height of the tree can be bounded, thus limiting downward growth to fixed depth. Figure 2 shows the tree of Figure 1 after two new observations have been added to it: one observation extends the left subtree downward, while the second is made a new leaf at the deepest, existing level of the right subtree.

**Figure 2:** An updated probabilistic categorization tree.

This sorting strategy is identical to that used by Anderson and Matessa
[Anderson &
Matessa, 1991]. The children of
each cluster partition the observations that are covered by their
parent, though the measure, **PU**, used to guide sorting differs
from that of Anderson and Matessa.
The observations themselves are stored as singleton clusters
at leaves of the tree. Other hierarchical-sort based strategies
augment this basic procedure in a manner described in Section 3.3
[Fisher1987a; Hadzikadic &
Yun, 1989; Decaestecker1991].

Douglas H. Fisher

Sat Mar 30 11:37:23 CST 1996