
Decision Committees

Let $c$ be the number of classes. Unless otherwise specified, an example $e$ is a pair $e=(o,c_o)$, where $o$ is an observation described over $n$ variables and $c_o$ is its corresponding class among $\{0, 1, ..., c-1\}$; to each example $(o,c_o)$ is associated a weight $w((o,c_o))$, representing its appearance probability with respect to a learning sample $LS$ at our disposal. $LS$ is itself a subset of a whole domain which we denote ${\cal X}$. Obviously, we do not have entire access to ${\cal X}$ ($LS \subset {\cal X}$): in general, we even have $\vert LS\vert \ll \vert{\cal X}\vert$ ($\vert . \vert$ denotes cardinality; we suppose in all that follows that ${\cal X}$ is discrete with finite cardinality). In the particular case where $c=2$, the two classes are denoted ``-'' ($c_o=0$) and ``+'' ($c_o=1$), and called respectively the negative and positive class. The learning sample is then the union of two samples, denoted $LS^-$ and $LS^+$, containing respectively the negative and the positive examples. It is worthwhile to think of the positive examples as belonging to the subset of ${\cal X}$ containing all possible positive examples, usually called the target concept.
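As a concrete illustration of the weighted sample just described, the following is a minimal sketch (the data representations are our own assumptions, not the paper's) of how a weighted accuracy estimate of a classifier would be computed over $LS$ from the example weights $w((o,c_o))$:

```python
def empirical_accuracy(classifier, weighted_sample):
    """Weighted accuracy estimate over a learning sample LS (sketch).

    weighted_sample: list of ((observation, true_class), weight) entries,
    where each weight approximates the example's appearance probability.
    classifier: any function mapping an observation to a predicted class.
    """
    total = sum(w for _, w in weighted_sample)
    correct = sum(w for (o, c_o), w in weighted_sample
                  if classifier(o) == c_o)
    return correct / total
```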
Part of our goal in machine learning is to build a reliable approximation to the true classification of the examples in ${\cal X}$, that is, a good approximation of the target concept, by using only the examples in $LS$. A good approximation should have a high accuracy over ${\cal X}$; we do not have access to this quantity, but rather to its estimator, a more or less reliable accuracy computable over $LS$. We refer the reader to standard machine learning books [Mitchell1997] for further considerations on this issue. A DC contains two parts:

- a set of rules, each rule being a pair $(t_i, \vec{v}_i)$, where $t_i$ is a monomial (a conjunction of literals) over the $n$ description variables, and $\vec{v}_i$ is a vector of $c$ components, one for each class;
- a default vector $\vec{D}$ of $c$ components, used to break ties.
For any observation $o$ and any monomial $t_i$, the proposition ``$o$ satisfies $t_i$'' is denoted by $o\Rightarrow t_i$; the opposite proposition, ``$o$ does not satisfy $t_i$'', is denoted by $o \not\Rightarrow t_i$. Any observation $o$ is classified in the following way: define $\vec{V}_o$ as

$$\vec{V}_o \;=\; \sum_{i \,:\, o\Rightarrow t_i} \vec{v}_i \:\:.$$

The class assigned to $o$ is then the index of the maximal component of $\vec{V}_o$: if the maximal component of $\vec{V}_o$ is unique, its index gives the class assigned to $o$. Otherwise, we take, among the maximal components of $\vec{V}_o$, the one corresponding to the maximal component of $\vec{D}$ (remaining ties are broken by a random choice among the maximal components).
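The classification procedure above can be sketched as follows; a minimal sketch in which monomials are represented as boolean predicates over observations and vectors as lists of $c$ floats (these representations are our own assumptions):

```python
import random

def classify(observation, rules, default_vector):
    """Classify an observation with a decision committee (sketch).

    rules: list of (monomial, vector) pairs; a monomial is a function
    observation -> bool, and each vector has c components.
    default_vector: the c components of D, used to break ties.
    """
    c = len(default_vector)
    V = [0.0] * c
    for monomial, v in rules:
        if monomial(observation):                 # o => t_i
            V = [a + b for a, b in zip(V, v)]     # sum the vectors v_i
    m = max(V)
    winners = [k for k in range(c) if V[k] == m]  # maximal components of V_o
    if len(winners) == 1:
        return winners[0]
    # tie: keep the winners maximizing D, then choose randomly among them
    d = max(default_vector[k] for k in winners)
    winners = [k for k in winners if default_vector[k] == d]
    return random.choice(winners)
```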
DC contains a subclass which is among the largest classes of Boolean formulas known to be PAC-learnable [Nock Gascuel1995]; however, this class is less interesting from a practical viewpoint, since its rules can be numerous and hard to interpret. Nevertheless, another subclass of DC [Nock Gascuel1995] presents an interesting compromise between representational power and interpretability. In this class, which is used by WIDC, each vector component is restricted to $\{-1,0,+1\}$ and each monomial is present at most once. The values $-1$, $0$, $+1$ allow natural interpretations of the rules, each vote being either in favor of the corresponding class ($+1$), neutral with respect to the class ($0$), or against the corresponding class ($-1$). This subclass, which we refer to as DC$_{\{-1,0,+1\}}$, suffers, as we now prove, from the same algorithmic drawbacks as DT [Hyafil Rivest1976] and DL [Nock Jappy1998]: even without restricting the components of the vectors, or with any restriction to a set containing at least one real value, the construction of small formulas with sufficiently high accuracy is hard. This is a clear motivation for using heuristics in decision committee induction.
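To illustrate the interpretability of DC$_{\{-1,0,+1\}}$, a rule's vector can be read off directly as a set of votes for or against classes. The following is a minimal sketch of such a textual reading (the function name and representations are hypothetical, not from the paper):

```python
def describe_rule(monomial_str, vector, class_names):
    """Render a DC_{-1,0,+1} rule as text (sketch).

    vector: components in {-1, 0, +1}, one per class; +1 votes for the
    class, -1 votes against it, 0 abstains (and is simply not printed).
    """
    parts = []
    for k, v in enumerate(vector):
        if v == +1:
            parts.append(f"in favor of {class_names[k]}")
        elif v == -1:
            parts.append(f"against {class_names[k]}")
    return f"IF {monomial_str} THEN vote " + ", ".join(parts)
```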

© 2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.