As in COBWEB, AUTOCLASS [Cheeseman et al., 1988], and other systems [Anderson & Matessa, 1991], we will assume that clusters, C_k, are described probabilistically: each variable value has an associated conditional probability, P(A_i = V_ij | C_k), which reflects the proportion of observations in C_k that exhibit the value, V_ij, along variable A_i. In fact, each variable value is actually associated with the number of observations in the cluster having that value; probabilities are computed "on demand" for purposes of evaluation.
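The count-based representation above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the `Cluster` class and its method names are assumptions, but it follows the stated scheme of storing per-value counts and deriving conditional probabilities only when asked.

```python
from collections import defaultdict

class Cluster:
    """A cluster C_k summarized by per-variable value counts.

    Probabilities P(A_i = V_ij | C_k) are not stored; they are
    computed on demand from the counts, as the text describes.
    """
    def __init__(self):
        self.n = 0  # number of observations covered by this cluster
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, observation):
        """Incorporate one observation, given as {variable: value}."""
        self.n += 1
        for var, val in observation.items():
            self.counts[var][val] += 1

    def prob(self, var, val):
        """P(A_i = V_ij | C_k): proportion of members with this value."""
        if self.n == 0:
            return 0.0
        return self.counts[var][val] / self.n

c = Cluster()
c.add({"Size": "small", "Color": "red", "Shape": "sphere"})
c.add({"Size": "small", "Color": "blue", "Shape": "cube"})
```

Here `c.prob("Size", "small")` yields 1.0 and `c.prob("Color", "red")` yields 0.5, each computed from the stored counts at call time.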

**Figure 1:** A probabilistic categorization tree.

Probabilistically-described clusters arranged in a tree
form a hierarchical
clustering known as a probabilistic categorization tree. Each set of
sibling clusters partitions the observations covered by the common parent.
There is a single *root* cluster, identical in structure to other
clusters, but covering all observations and containing the frequency
information necessary to compute the base-rate probabilities P(A_i = V_ij)
required by category utility. Figure 1 gives
an example of a probabilistic categorization
tree (i.e., a hierarchical clustering) in which each node
is a cluster of observations summarized probabilistically.
Observations are at leaves and are described by three variables:
`Size`, `Color`, and `Shape`.
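The paragraph above appeals to category utility without restating it. As a sketch, assuming Gluck and Corter's standard definition (the measure COBWEB optimizes), the partition score can be computed directly from the count summaries, with the base rates P(A_i = V_ij) taken from the parent cluster (ultimately the root, which covers all observations). The dict layout of the clusters is an illustrative assumption.

```python
def category_utility(parent, children):
    """Category utility of the partition {C_1..C_K} of parent's observations.

    Assumes the Gluck & Corter formulation:
      CU = (1/K) * sum_k P(C_k) * [ sum_ij P(A_i=V_ij|C_k)^2
                                    - sum_ij P(A_i=V_ij)^2 ]
    Each cluster is a dict: {"n": int, "counts": {var: {val: int}}}.
    """
    def sq_sum(c):
        # sum over variables and values of the squared conditional probability
        return sum((cnt / c["n"]) ** 2
                   for vals in c["counts"].values()
                   for cnt in vals.values())

    base = sq_sum(parent)  # base rates come from the parent's frequencies
    k = len(children)
    return sum(ch["n"] / parent["n"] * (sq_sum(ch) - base)
               for ch in children) / k

# A two-cluster partition that perfectly separates a binary variable:
parent = {"n": 4, "counts": {"Color": {"red": 2, "blue": 2}}}
kids = [{"n": 2, "counts": {"Color": {"red": 2}}},
        {"n": 2, "counts": {"Color": {"blue": 2}}}]
cu = category_utility(parent, kids)  # 0.25 for this perfect split
```

The perfect split scores 0.25 here because each child's within-cluster squared probabilities sum to 1 while the parent's base rates contribute 0.5.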

Douglas H. Fisher

Sat Mar 30 11:37:23 CST 1996