Kearns and Mansour (1998) present a novel algorithm to prune decision trees, based on a test
over locally observed errors. Its principle is simple: each internal node of a
DT is tested only once, in a bottom-up fashion, and we estimate the
local error over the learning examples reaching this node, before and after
the removal of the node. If the local error after removal is not greater than the
local error before, plus a penalty term, then we remove the node and its
subtree. The penalty term makes the pruning essentially optimistic, that is,
we tend to overprune the decision tree. However, thanks to local uniform convergence
results, and due to the fact that certain sub-classes of decision trees are
not too large, Kearns and Mansour (1998) are able to prove that, with high probability, the
overpruning will not be too severe with respect to the optimal subtree of the
initial DT. We refer the reader to their paper for further
theoretical results, not needed here. The point is that, by using the results
of Kearns and Mansour (1998), we can obtain a similar test for DC. We emphasize that
our bound might not enjoy the same theoretical properties as for decision
trees, because of the cardinality reasons briefly outlined before. However,
such a test is interesting since it may in particular lead to very small and
interpretable decision committees, with the obvious hope that their accuracy
will not decrease too much. Furthermore, the paper of Kearns and Mansour (1998) does not contain
experimental results. We view our criterion as a way to test heuristically the experimental
feasibility of some of the results of Kearns and Mansour (1998). The principle of our criterion is exactly the same as that of the original
test of Kearns and Mansour (1998): ``can we compare, when testing some rule and using the examples that satisfy the rule, the errors before and after removing the rule?'' Let $t$ be the monomial of the rule under test, and let $S_t$ be the local sample of learning examples satisfying $t$. Let $\epsilon_t$
represent the error before removing the rule, measured on $S_t$. Denote
by $\epsilon'_t$ the error after removing the rule, still measured on the local sample
$S_t$. Then we define the heuristic ``penalty'' $\alpha_t$ (proof omitted: it is a rough upper bound of Kearns and Mansour (1998), Lemma 1):

(8)

Here, $\ell$ denotes the maximum number of literals, over all rules of the current DC except the one under test, that an arbitrary example could satisfy. The fast computability of the penalty is obtained at the expense of a greater risk of overpruning, whose effects on accuracy were experimentally dramatic on some small datasets. Since our experiments contain very small datasets, we have chosen to tune a parameter limiting the effects of this combinatorial upper bound. More precisely, whenever the initial sample $S$ contained fewer than 5000 examples, we uniformly resampled $S$ into a larger sample of 5000 examples. By this, we artificially increase $|S|$ and mimic, for the small domains, new domains of identical, larger size, with the additional benefit that reasonable comparisons may be made on pruning.
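As a concrete illustration, the rule-level test above could be sketched as follows. This is a minimal sketch under simplifying assumptions: the committee representation (weighted monomial votes thresholded at zero), the function names, and the caller-supplied constant penalty are illustrative choices, not the paper's exact definitions.

```python
import random

def satisfies(monomial, x):
    """A monomial is a list of literals (index, required_value) over
    boolean features; it is satisfied when all literals hold."""
    return all(x[i] == v for i, v in monomial)

def committee_predict(rules, x):
    """Each rule is (monomial, weighted_vote); the committee sums the
    votes of the rules whose monomial fires and thresholds at zero."""
    score = sum(w for mono, w in rules if satisfies(mono, x))
    return 1 if score >= 0 else 0

def local_error(rules, local_sample):
    return sum(committee_predict(rules, x) != y
               for x, y in local_sample) / len(local_sample)

def try_remove_rule(rules, k, sample, alpha):
    """Remove rule k iff, on the local sample of examples satisfying its
    monomial, the error after removal is at most the error before plus
    the penalty alpha (here a constant, standing in for equation (8))."""
    mono, _ = rules[k]
    local = [(x, y) for x, y in sample if satisfies(mono, x)]
    reduced = rules[:k] + rules[k + 1:]
    if not local:                      # rule never fires: always removable
        return reduced
    before = local_error(rules, local)
    after = local_error(reduced, local)
    return reduced if after <= before + alpha else rules

def upsample(sample, target=5000):
    """Uniform resampling (with replacement) of a small sample into a
    larger one, mimicking the trick used above for small datasets."""
    if len(sample) >= target:
        return sample
    return [random.choice(sample) for _ in range(target)]
```

A redundant rule (the committee errs no more on its local sample once it is gone) is removed even with a zero penalty, while a rule whose removal hurts the local error by more than the penalty is kept.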

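For comparison, the original bottom-up test of Kearns and Mansour (1998) on decision trees, described at the start of this section, can be sketched in the same style. The tuple-based tree encoding and the constant penalty are simplifying assumptions for illustration; the paper defines the penalty precisely.

```python
def predict(node, x):
    """A node is either a leaf ('leaf', label) or an internal node
    ('split', feature, left, right) over boolean features."""
    while node[0] == 'split':
        _, f, left, right = node
        node = left if x[f] == 0 else right
    return node[1]

def majority(sample, default=0):
    labels = [y for _, y in sample]
    return max(set(labels), key=labels.count) if labels else default

def prune(node, sample, alpha):
    """Post-order traversal: each internal node is tested once, and its
    subtree is replaced by a majority leaf when the local error after
    removal is at most the local error before plus the penalty alpha."""
    if node[0] == 'leaf':
        return node
    _, f, left, right = node
    left = prune(left, [(x, y) for x, y in sample if x[f] == 0], alpha)
    right = prune(right, [(x, y) for x, y in sample if x[f] == 1], alpha)
    node = ('split', f, left, right)
    if not sample:
        return node
    before = sum(predict(node, x) != y for x, y in sample) / len(sample)
    leaf = ('leaf', majority(sample))
    after = sum(leaf[1] != y for _, y in sample) / len(sample)
    return leaf if after <= before + alpha else node
```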
The value of