The CHD case study illustrates that expert-guided induction is an iterative process in which the expert can change the requested generality of the induced subgroups and the subset of attributes (features) that are made available for rule construction. In this way it is possible to induce different patterns (subgroups) from the same data set. The selection of one or more subgroups representing the final solution is left to the expert. The decision depends both on rule prediction properties (such as the number of true positives and the tolerated number of false positives) and on subjective properties such as the understandability, unexpectedness and actionability of the induced subgroup descriptions (Silberschatz & Tuzhilin, 1995), which in turn depend on the features used in the conditions of the induced rules. In the application described in this paper, the main subjective acceptability criteria were understandability, simplicity and actionability.
Partitioning the CHD risk group problem into three data stages A-C was based entirely on the expert's understanding of the typical diagnostic process. From the machine learning point of view, this determines which subsets of attributes are used in the different experiments. Moreover, at data stage A the example set itself was also partitioned. At this data stage only a few attributes are available for rule induction. The expert's understanding of the domain suggested that the CHD population be partitioned into two subpopulations based on the sex of the patients, which made it significantly easier to induce interesting subgroups. This partitioning resulted in patterns A1 and A2.
Alternatively, partitioning can also be performed during the statistical characterization of discovered subgroups, by further splitting a detected subgroup into several parts (e.g., differentiating between male and female patients that are true positive cases for the subgroup) and then comparing the attribute value distributions of these parts. Any significant difference between these distributions is potentially interesting as part of the subgroup description. As a basis for subgroup partitioning one may use either a detected supporting risk factor or any other attribute or attribute combination that is potentially interesting in the light of existing expert knowledge.
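The comparison of attribute value distributions between two parts of a subgroup can be sketched as follows. This is only an illustration under stated assumptions: the records, the attribute names `sex` and `smoker`, and the helper functions `value_distribution` and `chi_square_statistic` are hypothetical and do not come from the CHD data set, and the Pearson chi-square statistic is used here as one possible measure of difference, not necessarily the test applied in this study.

```python
from collections import Counter


def value_distribution(records, attribute):
    """Relative frequency of each value of `attribute` among `records`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def chi_square_statistic(part_a, part_b, attribute):
    """Pearson chi-square statistic for the 2 x k table (part x attribute value)."""
    if not part_a or not part_b:
        raise ValueError("both parts must contain at least one record")
    counts_a = Counter(r[attribute] for r in part_a)
    counts_b = Counter(r[attribute] for r in part_b)
    n_a, n_b = len(part_a), len(part_b)
    n = n_a + n_b
    stat = 0.0
    for value in set(counts_a) | set(counts_b):
        column_total = counts_a[value] + counts_b[value]
        for observed, row_total in ((counts_a[value], n_a), (counts_b[value], n_b)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical true positive cases of one subgroup (toy data, not study data).
true_positives = (
    [{"sex": "M", "smoker": "yes"}] * 3
    + [{"sex": "M", "smoker": "no"}]
    + [{"sex": "F", "smoker": "yes"}]
    + [{"sex": "F", "smoker": "no"}] * 3
)

# Split the subgroup into two parts on the partitioning attribute.
males = [r for r in true_positives if r["sex"] == "M"]
females = [r for r in true_positives if r["sex"] == "F"]

male_dist = value_distribution(males, "smoker")
female_dist = value_distribution(females, "smoker")
stat = chi_square_statistic(males, females, "smoker")
```

A large value of the statistic relative to the chi-square distribution with k - 1 degrees of freedom (k being the number of attribute values) would mark the attribute as a candidate for the subgroup description; with samples as small as in this toy example an exact test would be more appropriate.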
Some effort has also been devoted to automating the partitioning of example sets by means of unsupervised learning, but its presentation is outside the scope of this work (Smuc, Gamberger, & Krstacic, 2001).
From the methodological point of view it is interesting to note that the expert appreciated induced subgroups that cover many target class cases (with a true positive rate of at least 20%) and that have a false positive rate as low as possible, with the intention of keeping it below 10%. In selecting a rule, however, prediction quality was not the most important factor. The necessary condition for selecting a rule was that the expert was able to recognize medically reasonable connections among the features that build the rule. In this sense, short rules are significantly more intuitive; it can be seen from Table 1 that all rules selected by the expert have at most three features defining the principal risk factors. The fact that the expert did not select subgroups with an optimal TP/FP ratio is illustrated by Figures 16-18 in Section 4.2, which show the positions of the patterns A1-C1 in the TP/FP space together with the TP/FP convex hulls induced for data stages A-C, which connect the points with optimal coverage properties. None of the expert-selected patterns lies on the TP/FP convex hull, but all of them are close to it.
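The relation between the expert's thresholds and the TP/FP convex hull can be made concrete with a short sketch. It assumes that every rule is described by a pair (false positive rate, true positive rate); the rule names and values below are invented for illustration and are not the patterns A1-C1, and the helper names `upper_hull`, `hull_tp_at` and `cross` are hypothetical. The hull is computed with a standard monotone chain scan over the points, extended by the trivial corner points (0, 0) and (1, 1).

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of (fp_rate, tp_rate) points, ordered left to right."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_tp_at(hull, fp):
    """True positive rate of the hull at a given false positive rate."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= fp <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    raise ValueError("fp must lie between 0 and 1")


# Hypothetical rules as (fp_rate, tp_rate); toy values only.
rules = {
    "r1": (0.02, 0.10),
    "r2": (0.05, 0.40),
    "r3": (0.08, 0.38),
    "r4": (0.20, 0.70),
    "r5": (0.30, 0.60),
}

hull = upper_hull(rules.values())

# Thresholds mentioned in the text: TP rate of at least 20%, FP rate below 10%.
acceptable = sorted(
    name for name, (fp, tp) in rules.items() if tp >= 0.20 and fp < 0.10
)

# Vertical distance of each rule from the hull (0 means the rule is on the hull).
gap = {name: hull_tp_at(hull, fp) - tp for name, (fp, tp) in rules.items()}
```

In this toy example the rule `r3` satisfies both thresholds but lies slightly below the hull, which mirrors the situation described above: a rule can be acceptable to the expert and close to the hull without being optimal in the TP/FP sense.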