
## 2.1 The Task of Expert-Guided Subgroup Discovery

The task of expert-guided subgroup discovery addressed in this work
differs slightly from the subgroup discovery task defined in
Section 1 and proposed by Klösgen (1996) and Wrobel (1997).
Instead of defining an optimal measure for automated subgroup search
and selection, the goal here is to *support* the expert in
performing a flexible and effective search over a broad range of optimal
solutions. As a consequence, the decision of which subgroups will be
selected to form the final solution is left to the expert. The task of
the subgroup discovery algorithm is to enable the detection of rules
describing potentially optimal subgroups, which are characterized by
the property that they are correct for many target class cases
(patients with coronary heart disease, in the example domain used in
this work) and incorrect for all, or most of, non-target class cases
(healthy subjects). Target class cases included in a subgroup are
called *true positives*, while non-target class cases incorrectly
included in a subgroup are called *false positives*.
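To make the two definitions concrete, the following minimal sketch counts true and false positives for a single rule. The records, the attribute names (`age`, `smoker`, `chd`), and the rule itself are hypothetical illustrations and are not taken from the data used in this work.

```python
from typing import Callable, Iterable, Mapping, Tuple

Case = Mapping[str, object]


def count_tp_fp(
    cases: Iterable[Case],
    rule: Callable[[Case], bool],
    target_key: str = "chd",
) -> Tuple[int, int]:
    """Count target (TP) and non-target (FP) cases covered by a rule."""
    tp = 0
    fp = 0
    for case in cases:
        if not rule(case):
            continue  # case is outside the subgroup
        if case[target_key]:
            tp += 1  # target class case included in the subgroup
        else:
            fp += 1  # non-target class case incorrectly included
    return tp, fp


# Hypothetical toy records; "chd" marks the target class.
cases = [
    {"age": 62, "smoker": True, "chd": True},
    {"age": 58, "smoker": True, "chd": True},
    {"age": 45, "smoker": False, "chd": True},
    {"age": 60, "smoker": True, "chd": False},
    {"age": 35, "smoker": False, "chd": False},
    {"age": 40, "smoker": False, "chd": False},
]


def example_rule(case: Case) -> bool:
    """Hypothetical rule: older smokers."""
    return case["age"] > 55 and bool(case["smoker"])


tp, fp = count_tp_fp(cases, example_rule)
print(f"true positives: {tp}, false positives: {fp}")
```

Here the rule covers three cases, two of which belong to the target class.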

The particular expert-guided subgroup discovery task addressed in this
work assumes the collaboration of the expert and the data analyst in
repeatedly running a subgroup discovery algorithm with the goal of
finding rules describing population subgroups which:

- have sufficiently large coverage,
- have a positive bias towards target class case coverage (have a
sufficiently large true positive/false positive ratio),
- are sufficiently diverse for detecting most of the target
population, and
- fulfill the expert's other subjective measures of acceptability:
understandability, simplicity and actionability.
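The first two criteria are quantitative and can be checked mechanically, as in the rough sketch below. The function names and the threshold values (`min_coverage`, `min_ratio`) are assumptions chosen purely for illustration; the text does not prescribe specific thresholds, and diversity and the subjective criteria remain a matter of expert judgment.

```python
from typing import Tuple


def summarize_subgroup(tp: int, fp: int, n_cases: int) -> Tuple[float, float]:
    """Return (coverage, TP/FP ratio) for a subgroup.

    coverage is the fraction of all cases the rule covers;
    the ratio is infinite when the rule covers no non-target cases.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    coverage = (tp + fp) / n_cases
    ratio = tp / fp if fp > 0 else float("inf")
    return coverage, ratio


def passes_quantitative_criteria(
    tp: int,
    fp: int,
    n_cases: int,
    min_coverage: float = 0.1,  # hypothetical threshold
    min_ratio: float = 2.0,     # hypothetical threshold
) -> bool:
    """Check the coverage and TP/FP-ratio criteria only."""
    coverage, ratio = summarize_subgroup(tp, fp, n_cases)
    return coverage >= min_coverage and ratio >= min_ratio


# A subgroup covering 35 of 200 cases, 30 of them from the target class.
print(passes_quantitative_criteria(tp=30, fp=5, n_cases=200))
# A precise but very small subgroup fails the coverage criterion.
print(passes_quantitative_criteria(tp=4, fp=1, n_cases=200))
```

A filter of this kind can only pre-screen candidates; the final selection stays with the expert.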

In each iteration, the task of the subgroup discovery algorithm is to
suggest one or more potentially optimal solutions.
Section 2.2 describes a heuristic search algorithm, SD,
which can be used to construct many rules that are optimal with
respect to an expert-selected generalization parameter. Since many of
the induced rules can be very similar, both in terms of their coverage
and the selected features, the RSS algorithm described in
Section 2.3 can be used to select a small number of
distinct rules that are offered to the expert as potentially optimal
solutions. Alternatively, subgroup discovery can be implemented
within a 'weighted' covering algorithm, DMS, as is the case in the
publicly available Data Mining Server (Gamberger & Smuc, 2001), which generates up
to three best subgroups in every iteration.
