Evaluation of induced subgroups in the ROC space (Provost & Fawcett, 2001) shows classifier performance in terms of the false alarm or *false positive rate* (plotted on the *X*-axis), which should be minimized, and the sensitivity or *true positive rate* (plotted on the *Y*-axis), which should be maximized. The ROC space is appropriate for measuring the success of subgroup discovery, since subgroups whose *TPr*/*FPr* tradeoff is close to the diagonal can be discarded as insignificant. An appropriate way to evaluate a set of induced subgroups is to use the area under the ROC convex hull, defined by the subgroups with the best *TPr*/*FPr* tradeoff, as a quality measure for comparing the success of different learners.
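The computation described above can be illustrated with a minimal Python sketch. It assumes subgroups are given simply as (*FPr*, *TPr*) points; the hull routine is the upper half of Andrew's monotone chain algorithm (one of several standard convex hull methods), anchored at the trivial points (0, 0) and (1, 1), and the area is obtained by the trapezoid rule:

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of subgroup (FPr, TPr) points, anchored at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop any point that lies on or below the line from its predecessor
        # to p; sub-diagonal subgroups never survive on the hull.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def area_under_hull(points):
    """Area under the ROC convex hull, computed by the trapezoid rule."""
    hull = roc_convex_hull(points)
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
```

For example, subgroups at (0.2, 0.6) and (0.4, 0.8), together with the trivial endpoints, give an area of 0.74, while an insignificant sub-diagonal subgroup such as (0.5, 0.3) is excluded from the hull and does not affect the measure.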

Alternatives to the area under the ROC convex hull are other standard evaluation measures used in rule learning, such as predictive accuracy or, when time/efficiency constraints need to be taken into account, the tradeoff measures DEA (Keller, Paterson, & Berrer, 2000) and the Adjusted Ratio of Ratios (ARR) (Brazdil, Soares, & Pereira, 2001), which combine accuracy and time to assess relative performance.

Optimized accuracy is, however, not the ultimate goal of subgroup discovery. In addition to the area under the ROC convex hull, other important success measures are rule significance (measuring the distributional unusualness of a subgroup), rule coverage (measuring how large a discovered subgroup is), and rule size and rule set size (measuring the simplicity and understandability of the discovered knowledge). These measures were used to evaluate the results of the CN2-SD subgroup discovery algorithm (Lavrac et al., 2002).
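The coverage and significance measures above can be sketched briefly in Python. This is an illustrative sketch, not the CN2-SD implementation: it assumes a rule is summarized by per-class counts of the examples it covers, and it uses the likelihood-ratio statistic commonly employed as a significance test in CN2-style learners, comparing the class distribution among covered examples with the prior distribution:

```python
import math

def rule_coverage(n_covered, n_total):
    """Fraction of all examples covered by the rule."""
    return n_covered / n_total

def rule_significance(covered_counts, prior_counts):
    """Likelihood-ratio significance statistic, as in CN2-style learners.

    covered_counts[i]: examples of class i covered by the rule.
    prior_counts[i]:   examples of class i in the whole training set.
    """
    n = sum(covered_counts)   # examples covered by the rule
    total = sum(prior_counts) # all training examples
    stat = 0.0
    for n_c, p_c in zip(covered_counts, prior_counts):
        if n_c > 0:
            # Expected count of class c under the prior distribution.
            expected = n * p_c / total
            stat += 2.0 * n_c * math.log(n_c / expected)
    return stat
```

A rule covering 40 of 100 examples (30 positive, 10 negative, against a 50/50 prior) thus has coverage 0.4, and its significance statistic grows with the distributional unusualness of the covered class distribution.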