Precision and Recall

The performance of the system that uses automatic features (including auto-SLU-success) for the first utterance is given in Table 6. This system has an overall accuracy of 69.6%. Given the first exchange, the ruleset predicts that 18.3% of the dialogues will be problematic, whereas 33% of them actually are. It identifies 31.6% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 56.6% of the time (precision).
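The four quantities reported for each class (occurred, predicted, recall, precision) follow from the counts of a binary confusion matrix. The snippet below is a minimal sketch of those standard definitions, not the evaluation code used for the paper. The function name `class_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def class_metrics(tp, fp, fn, tn):
    """Return occurred, predicted, recall, precision and accuracy as fractions.

    tp, fp, fn, tn are confusion-matrix counts for the class of interest
    (here, the PROBLEMATIC class is treated as the positive class).
    """
    total = tp + fp + fn + tn
    return {
        "occurred": (tp + fn) / total,   # share of dialogues that really are in the class
        "predicted": (tp + fp) / total,  # share of dialogues the ruleset assigns to the class
        "recall": tp / (tp + fn),        # share of the class that the ruleset finds
        "precision": tp / (tp + fp),     # share of the ruleset's predictions that are right
        "accuracy": (tp + tn) / total,   # overall share of correct decisions
    }


# Hypothetical counts for illustration only; these are NOT the paper's data.
metrics = class_metrics(tp=30, fp=10, fn=20, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Recall and precision can move independently: a ruleset that predicts the class rarely may be precise while still missing most of the instances, which is the pattern described for the first-exchange system.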

**Figure 11:** A subset of rules learned by RIPPER when given the automatic features for determining problematic dialogues
[Figure body not recovered. Surviving fragment: **if** (e2-salience-coverage ≤ ... **then** *problematic*]

**Figure 12:** A subset of rules learned by RIPPER when given the TASK-INDEPT features for determining problematic dialogues
[Figure body not recovered. Surviving fragment: **if** (e2-top-confidence ≤ 0. ... **then** *problematic*]

**Table 7:** Precision and Recall with Exchange 1&2 Automatic Features

| Class | Occurred | Predicted | Recall | Precision |
|---|---|---|---|---|
| TASKSUCCESS | 67.0% | 80.0% | 94.8% | 79.1% |
| PROBLEMATIC | 33.0% | 20.0% | 49.5% | 79.7% |

The performance of the system that uses automatic features for Exchanges 1&2 is summarized in Table 7. Given the first two exchanges, this ruleset predicts that 20% of the dialogues will be problematic, whereas 33% of them actually are. It identifies 49.5% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 79.7% of the time (precision). Compared with using the first exchange alone, this classifier shows an absolute improvement of 17.87% in recall and 23.09% in precision, for an overall improvement in accuracy of 9.6%.
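The recall and precision gains can be checked against the rounded figures quoted in this section. The sketch below simply subtracts the first-exchange values for the PROBLEMATIC class from the Exchange 1&2 values. The variable names are illustrative, and the small gap between these differences and the reported 17.87% and 23.09% is presumably because the reported gains were computed from unrounded values (an assumption, since the text does not say).

```python
# PROBLEMATIC-class figures quoted in the text, in percent.
first_exchange = {"recall": 31.6, "precision": 56.6}
two_exchanges = {"recall": 49.5, "precision": 79.7}

# Absolute gain (percentage points) from adding the second exchange.
gains = {
    name: round(two_exchanges[name] - first_exchange[name], 1)
    for name in first_exchange
}

for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

Both differences agree with the reported improvements to within rounding, which suggests the gains are absolute differences rather than relative changes.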

Helen Hastie
2002-05-09