Building a Large Decision Committee using Partition Boosting

Suppose that the hypothesis we build (not necessarily a decision committee; it might be, e.g., a decision tree) realizes a partition of the domain ${\cal X}$ into disjoint subsets $X_1, X_2, ..., X_N$. Let $[\![\pi]\!]$ denote the function returning the truth value of a predicate $\pi$. Define

\begin{eqnarray*}
W_{+}^{j,l} & = & \sum_{(o,c_o) \in LS} {w((o,c_o))} \: [\![ (o,c_o) \in X_j \wedge c_o = l ]\!] \:\:,\\
W_{-}^{j,l} & = & \sum_{(o,c_o) \in LS} {w((o,c_o))} \: [\![ (o,c_o) \in X_j \wedge c_o \neq l ]\!] \:\:.
\end{eqnarray*}

In other words, $W_{+}^{j,l}$ is the total weight of the examples of class $l$ falling in subset $X_j$, and $W_{-}^{j,l}$ is the total weight of the examples of classes $\neq l$ falling in $X_j$. According to Schapire \& Singer (1998), a weak learner should minimize the criterion:
\begin{displaymath}
Z = 2\sum_{j} {\sum_{l} {\sqrt{W_{+}^{j,l} W_{-}^{j,l}}}} \:\:. \hspace{2cm} (1)
\end{displaymath}
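As an illustrative sketch, the $Z$ criterion of Eq. (1) can be computed directly from a weighted sample and a partition. The data layout and function names below are hypothetical, chosen for clarity rather than taken from WIDC's implementation:

```python
from collections import defaultdict
from math import sqrt

def z_criterion(examples, weights, subset_of):
    """Z = 2 * sum_j sum_l sqrt(W+^{j,l} * W-^{j,l}).

    examples  : list of (observation, class_label) pairs (the sample LS)
    weights   : parallel list of example weights w((o, c_o))
    subset_of : maps an example to the index j of its partition subset X_j
    """
    labels = {c for _, c in examples}
    # w_plus[j][l] accumulates the weight of class-l examples in subset j
    w_plus = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)   # total weight falling in subset j
    for (o, c), w in zip(examples, weights):
        j = subset_of((o, c))
        w_plus[j][c] += w
        total[j] += w
    z = 0.0
    for j in w_plus:
        for l in labels:
            wp = w_plus[j][l]
            wm = total[j] - wp   # weight of classes != l in subset j
            z += sqrt(wp * wm)
    return 2.0 * z
```

A partition whose subsets are pure (each containing a single class) yields $Z = 0$, which is why a weak learner driving $Z$ down tends toward class-homogeneous subsets.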

In the case of a decision tree, the partition is the one built at the leaves of the tree [Quinlan1994]; in the case of a decision list, the partition is the one built at each rule, to which we add the subset associated with the default class [Nock & Jappy1998]. Suppose that we encode the decision tree as a set of monomials, by taking for each leaf the logical-$\wedge$ of all attribute tests from the root to the leaf. Measuring $Z$ over the tree's leaves is then equivalent to measuring $Z$ over the partition realized by the set of monomials. Note, however, that these monomials are pairwise disjoint (each example satisfies exactly one monomial). Due to this property, only $t$ subsets can be realized with $t$ monomials, or equivalently with a tree having $t$ leaves.
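The leaf-to-monomial encoding can be sketched in a few lines. The tree representation below (nested tuples over boolean attributes) is a toy assumption for illustration; the point is that each root-to-leaf path yields one monomial, and these monomials are pairwise disjoint:

```python
def tree_to_monomials(node, path=()):
    """Encode a binary decision tree as (monomial, leaf_label) pairs.

    node : either a leaf label, or a tuple (attribute, left_subtree,
           right_subtree), where left = attribute False, right = True.
    Each monomial is the logical-AND of the tests on a root-to-leaf path,
    stored as a frozenset of (attribute, required_value) literals.
    """
    if not isinstance(node, tuple):              # leaf: emit the path
        return [(frozenset(path), node)]
    attr, left, right = node
    return (tree_to_monomials(left, path + ((attr, False),)) +
            tree_to_monomials(right, path + ((attr, True),)))

# Toy tree with 3 leaves -> 3 disjoint monomials:
tree = ('x1', ('x2', 'neg', 'pos'), 'pos')
monomials = tree_to_monomials(tree)
```

Because the paths share no example (each example follows exactly one root-to-leaf path), every example satisfies exactly one of the extracted monomials.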
Suppose that we generalize this observation by removing the disjointness condition on the monomials. Then ${\cal O}(2^t)$ subsets become possible with only $t$ monomials, and the number of realizable partitions can be exponentially larger for decision committees than for decision trees. The expected running time, however, is no larger when using decision committees, since the number of nonempty subsets in the partition is in fact bounded by the number of examples, $\vert LS\vert$. Thus, we may expect some reduction in the size of the formula we build when using a decision committee, which is valuable for interpreting the classifier obtained.
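A small sketch makes the counting argument concrete: with overlapping monomials, an example's partition cell is determined by the *subset* of monomials it satisfies, so $t$ monomials can carve out up to $2^t$ cells, while the number of nonempty cells never exceeds $\vert LS\vert$. The representation below is illustrative:

```python
def satisfies(example, monomial):
    # monomial: set of (attribute_index, required_value) literals
    return all(example[a] == v for a, v in monomial)

def induced_partition(examples, monomials):
    """Group examples by the subset of (possibly overlapping) monomials
    they satisfy; each distinct subset is one cell of the partition."""
    cells = {}
    for x in examples:
        key = frozenset(i for i, m in enumerate(monomials)
                        if satisfies(x, m))
        cells.setdefault(key, []).append(x)
    return cells

# t = 2 overlapping monomials over 2 boolean attributes:
mons = [{(0, 1)}, {(1, 1)}]
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
cells = induced_partition(xs, mons)
# 2^2 = 4 distinct cells from just 2 monomials
```

With disjoint monomials (as in a tree), the keys could only be singletons or empty, giving at most $t + 1$ cells; dropping disjointness is what buys the exponential gain.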
Application of this principle in WIDC is straightforward: a large decision committee is built by growing a current monomial iteratively, in a top-down fashion. At each step, the literal added to this monomial is the one minimizing the current $Z$ criterion over all possible literal additions, subject to the constraint that the new monomial does not already exist in the current decision committee (this prevents multiple additions of a single monomial). The $Z$ criterion is computed using the partition induced over $LS$ by the current set of monomials (if two examples satisfy the same monomials, they belong to the same subset of the partition). When no further addition of a literal decreases $Z$, a new monomial is created, initialized at $\emptyset$, and then grown using the same principle. When no further creation of a monomial decreases $Z$, the algorithm stops and returns the current, large decision committee, whose vectors are still empty. In the following step, WIDC calculates these vectors.

In a previous approach to building rule sets for two-class problems [Cohen & Singer1999], an iterative growing-pruning algorithm was designed (SLIPPER). The rule-growing approach of SLIPPER is certainly close to what WIDC does for growing a DC, since it also optimizes a $Z$ criterion; a notable difference, however, is that it does not compute $Z$ over a partition induced by a set of rules. Rather, SLIPPER grows a single monomial at each step, prunes it, then grows a second monomial, prunes it, and so on until a final DNF-shaped formula is complete and returned. Notice that SLIPPER also modifies the weights of the examples, in accordance with Boosting's standards [Schapire & Singer1998].
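The greedy growth step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions (boolean attributes, positive literals of the form attribute $=$ value; names like `grow_monomial` are hypothetical), not WIDC's actual implementation:

```python
from math import sqrt

def z_over_partition(examples, weights, monomials):
    """Z computed over the partition that `monomials` induce on LS:
    two examples share a cell iff they satisfy the same monomials."""
    cells = {}
    for (x, c), w in zip(examples, weights):
        key = frozenset(i for i, m in enumerate(monomials)
                        if all(x[a] == v for a, v in m))
        cells.setdefault(key, {}).setdefault(c, 0.0)
        cells[key][c] += w
    labels = {c for _, c in examples}
    z = 0.0
    for by_class in cells.values():
        tot = sum(by_class.values())
        for l in labels:
            wp = by_class.get(l, 0.0)
            z += sqrt(wp * (tot - wp))
    return 2.0 * z

def grow_monomial(examples, weights, committee, n_attrs):
    """Grow one monomial top-down, adding at each step the literal that
    most decreases Z; `committee` holds the monomials built so far."""
    mono = frozenset()
    best = z_over_partition(examples, weights, committee + [mono])
    while True:
        candidates = [mono | {(a, v)}
                      for a in range(n_attrs) for v in (0, 1)
                      if (a, 0) not in mono and (a, 1) not in mono]
        scored = [(z_over_partition(examples, weights, committee + [m]), m)
                  for m in candidates if m not in committee]
        if not scored:
            return mono
        z, m = min(scored, key=lambda t: t[0])
        if z >= best:            # no literal addition decreases Z: stop
            return mono
        best, mono = z, m
```

The outer loop of the algorithm would repeatedly call `grow_monomial`, append the result to the committee, and stop when a newly grown monomial no longer decreases $Z$.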

© 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.