Evaluation of induced subgroups in the ROC space (Provost & Fawcett, 2001) shows classifier performance in terms of the false alarm or false positive rate FPr (plotted on the X-axis), which needs to be minimized, and the sensitivity or true positive rate TPr (plotted on the Y-axis), which needs to be maximized. The ROC space is appropriate for measuring the success of subgroup discovery, since subgroups whose TPr/FPr tradeoff is close to the diagonal can be discarded as insignificant. An appropriate approach to evaluating a set of induced subgroups is to use the area under the ROC convex hull, defined by the subgroups with the best TPr/FPr tradeoffs, as a quality measure for comparing the success of different learners.
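The evaluation described above can be sketched in a few lines of code: each subgroup is a point (FPr, TPr) in ROC space, the upper convex hull (always containing the trivial endpoints (0, 0) and (1, 1)) retains only the subgroups with the best tradeoffs, and the area under that hull serves as the quality measure. The function names below are illustrative, not taken from any particular implementation.

```python
def roc_convex_hull(points):
    """Upper convex hull of (FPr, TPr) points in ROC space.

    The trivial endpoints (0, 0) and (1, 1) are always included;
    subgroups lying on or below the hull contribute nothing to it.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop the last hull point while it makes a non-clockwise
        # (concave or collinear) turn toward the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def area_under_hull(hull):
    """Area under the piecewise-linear hull (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
```

For example, given subgroups at (0.2, 0.6), (0.3, 0.9), and (0.5, 0.7), the hull keeps only (0.3, 0.9): the first point lies on the segment from (0, 0) to (0.3, 0.9) and the last lies below the hull, illustrating how inferior tradeoffs are discarded.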
Alternatives to the area under the ROC convex hull computation are other standard evaluation measures used in rule learning, such as predictive accuracy or, in the case of time/efficiency constraints that need to be taken into account, the tradeoff measures DEA (Keller, Paterson, & Berrer, 2000) and Adjusted Ratio of Ratios (ARR) (Brazdil, Soares, & Pereira, 2001), which combine accuracy and time to assess relative performance.
Optimized accuracy is, however, not the ultimate goal of subgroup discovery. In addition to the area under the ROC convex hull quality measure, other important success measures are rule significance (measuring the distributional unusualness of a subgroup), rule coverage (measuring how large a discovered subgroup is), and rule size and rule set size (measuring the simplicity and understandability of the discovered knowledge). These measures were used to evaluate the results of the CN2-SD subgroup discovery algorithm (Lavrac et al., 2002).
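Two of these measures are easy to make concrete. A common choice for rule significance, used in the CN2 family of algorithms, is the likelihood ratio statistic, which compares the class distribution of the examples covered by a rule against the distribution expected under the class priors; coverage is simply the fraction of examples the rule covers. The sketch below assumes per-class example counts as input and is illustrative rather than a reference implementation.

```python
from math import log

def likelihood_ratio(covered_counts, total_counts):
    """Likelihood ratio significance statistic:
    2 * sum_i n_i * log(n_i / e_i), where n_i is the number of
    covered examples of class i and e_i the count that would be
    expected if the rule covered classes at their prior rates.
    A value near 0 means the covered distribution matches the
    prior, i.e. the subgroup is distributionally uninteresting.
    """
    n = sum(covered_counts)   # examples covered by the rule
    N = sum(total_counts)     # all examples
    stat = 0.0
    for n_i, N_i in zip(covered_counts, total_counts):
        if n_i > 0:
            e_i = n * N_i / N  # expected count under the prior
            stat += 2.0 * n_i * log(n_i / e_i)
    return stat

def rule_coverage(covered_counts, total_counts):
    """Fraction of all examples covered by the rule."""
    return sum(covered_counts) / sum(total_counts)
```

For instance, a rule covering 20 positives and 5 negatives in a balanced 50/50 dataset has coverage 0.25 and a clearly positive significance statistic, while a rule whose covered examples mirror the priors scores 0.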