The goal of this section is to clarify the relation between the ROC
space which is usually used for evaluating classifier performance, and
the *TP*/FP space which is being searched by the *q*_{g} heuristic in
the SD algorithm.

Evaluation of induced subgroups in the ROC space (ROC: Receiver
Operating Characteristic,
Provost & Fawcett, 2001) shows their
performance in terms of *TPr* and *FPr*, where *TPr* is the *
sensitivity* of a classifier measuring the fraction of positive
cases that are classified as positive, and *FPr* is the *false
alarm* measuring the fraction of incorrectly classified negative
cases:
,
and
.
A point in the ROC space shows
classifier performance in terms of false alarm rate *FPr* (plotted on the
*X*-axis) that should be as low as possible, and sensitivity *TPr*
(plotted on the *Y*-axis) that should be as high as possible (see
Figure 5 in Section 3.2).

The ROC space is appropriate for measuring the success of subgroup
discovery, since subgroups whose *TPr*/FPr tradeoff is close to the
diagonal can be discarded as uninteresting. Conversely, interesting
rules/subgroups are those sufficiently distant from the diagonal.
Those rules which are most distant from the diagonal define the points
in the ROC space from which a convex hull is constructed. The area
under the ROC curve defined by subgroups with the best *TPr*/FPr
tradeoff can be used as a quality measure for comparing the success of
different learners or subgroup miners. In subgroup construction, the
data analyst can try to achieve the desired *TPr*/FPr tradeoff by
building rules using different data mining algorithms, by different
parameter settings of a selected data mining algorithm or by applying
a cost-sensitive data mining algorithm that takes into the account
different misclassification costs.

The *q*_{g} measure in the SD algorithm that needs to be maximized,
tries to find subgroups that are as far as possible from the diagonal
of the ROC space in the direcion of the left upper corner (with *TPr*
equal to 100% and *FPr* equal to 0%). Note, however, that the actual
computation, as implemented in Algorithm SD, is not performed in terms
of *TPr* and *FPr*, as assumed in the ROC analysis, but rather in
terms of *TP* and *FP* in the so-called *TP*/FP space. The reason is
the improved computational efficiency of computing the *q*_{g} value
which is used as a search heuristic for comparing the quality of rules
for a given, fixed domain. For a fixed domain, the *TP*/FP space is as
appropriate as the ROC space: the ROC space is namely equivalent to
the normalized *TP*/FP space where *Pos* and *Neg* are normalization
constants for *Y* and *X* axes, respectively. The *TP*/FP space and
the ROC space are illustrated in Section 3.2 by
Figures 4 and 5,
respectively.