Our experience in different medical applications indicates that
intuitions like ``how expensive is every FP prediction in terms of
additional TP predictions made by a rule'' are useful for
understanding the problem of directing the search in the TP/FP
space. Suppose that the definition of cost parameter c is based on
the following argument: ``For every additional FP, the rule should
cover more than c additional TP examples in order to be better.''
Based on such reasoning, it is possible to define a quality measure
qc, using the following TP/FP tradeoff:
In Algorithm SD, the quality measure qg, using a different TP/FP tradeoff is used: qg = TP/(FP+g), where g is the generalization parameter.
If a subgroup discovery algorithm employs exhaustive search (or if all points in the TP/FP space are known in advance) then the two measures qg and qc are equivalent in the sense that every optimal solution lying on the convex hull can be detected by using any of the two heuristics; only the values that must be selected for parameters g and c are different. In this case, qc might be even better because its interpretation is more intuitive.
However, since Algorithm SD is a heuristic beam search algorithm, the situation is different. Subgroup discovery is an iterative process, performing one or more iterations (typically 2-5) until good rules are constructed by forming conjunctions of features in the rule body. In this process, a rule quality measure is used for rule selection (for which the two measures qg and qc are equivalent) as well as for the selection of features and their conjunctions that have high potential for the construction of high quality rules in subsequent iterations; for this use, rule quality measure qg is better than qc. Let us explain why.
Suppose that we have a point (a rule) R in the TP/FP space, where
TP and FP are its true and false positives, respectively.
For a selected g value, qg can be determined for this rule
R. It can be shown that all points that have the same quality
qg as rule R lie on a line defined by the following
In the TP/FP space, points with higher quality than qg are above this line, in the direction of the upper left corner. Notice that in the TP/FP space the top-left is the preferred part of the space: points in that part represent rules with the best TP/FP tradeoff. This reasoning indicates that points that will be included in the beam must all lie above the line of equal weights qbeam which is defined by the last rule in the beam. If represented graphically, first number of rules, found in the TP/FP space when rotating the line from point (0,Pos) in the clockwise direction, will be included in the beam. The center of rotation is point (-g,0). This is illustrated in Figure 12.
On the other hand, for the qc quality measure defined by the situation is similar but not identical. Points with the same quality lie on a line , but its slope is constant and equal to c. Points with higher quality lie above the line in the direction of the left upper corner. The points that will be included into the beam are the first points in the TP/FP space found by a parallel movement of the line with slope c, starting from point (0,Pos) in the direction towards the lower right corner. This is illustrated in Figure 13.
Let us now assume that we are looking for an optimal rule which is very specific. In this case, parameter c will have a high value while parameter g will have a very small value. The intention is to find the same optimal rule in the TP/FP space. At the first stage of rule construction only single features are considered and most probably their quality as the final solution is rather poor. See Figure 14 for a typical placement of potentially interesting features in the TP/FP space.
The primary function of these features is to be good building blocks so that by conjunctively adding other features, high quality rules can be constructed. By adding conjunctions, solutions generally move in the direction of the left lower corner. The reason is that conjunctions can reduce the number of FP predictions. However, they reduce the number of TP's as well. Consequently, by conjunctively adding features to rules that are close to the left lower corner, the algorithm will not be able to find their specializations nearer to the left upper corner. Only the rules that have high TP value, and are in the upper part of the TP/FP space, have a chance to take part in the construction of interesting new rules.
Figure 15 illustrates the main difference between quality measures qg and qc: the former tends to select more general features from the right upper part of the TP/FP space (points in the so-called `g space'), while the later `prefers' specific features from the left lower corner (points in the so-called `c space'). In cases when c is very large and g is very small, the effect can be so important that it may prevent the algorithm from finding the optimal solution even with a large beam width. Notice, however, that Algorithm SD is heuristic in its nature and no statements are true for all cases. This means that in some, but very rare cases, a quality measure based on parameter c may result in a better final solution.