Next: Examination of the Rulesets Up: Problematic Dialogue Predictor Previous: Hand-labelled Features

Precision and Recall

Table 6 gives the performance of the system that uses automatic features (including auto-SLU-success) for the first utterance. This system has an overall accuracy of 69.6%. Given the first exchange, the ruleset predicts that 18.3% of the dialogues will be problematic, whereas 33% of them actually are. It identifies 31.6% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 56.6% of the time (precision).
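The relationship between these figures can be checked with a short calculation. The sketch below is illustrative only and is not part of the original system: the function name `confusion_shares` and its argument names are assumptions. It takes the share of problematic dialogues, the recall, and the precision quoted above, and derives the share of dialogues predicted as problematic and the overall accuracy.

```python
def confusion_shares(occurred, recall, precision):
    """Return (tp, fp, fn, tn) as fractions of all dialogues.

    occurred  -- fraction of dialogues that are truly problematic
    recall    -- fraction of problematic dialogues that are flagged
    precision -- fraction of flagged dialogues that are truly problematic
    """
    tp = occurred * recall        # problematic and flagged
    predicted = tp / precision    # everything that was flagged
    fp = predicted - tp           # flagged, but actually successful
    fn = occurred - tp            # problematic, but missed
    tn = 1.0 - tp - fp - fn       # successful and not flagged
    return tp, fp, fn, tn


# Figures quoted in the text for the first-exchange ruleset.
tp, fp, fn, tn = confusion_shares(occurred=0.33, recall=0.316, precision=0.566)
predicted = tp + fp   # share of dialogues predicted problematic
accuracy = tp + tn    # share of dialogues classified correctly

print(f"predicted problematic: {predicted:.1%}")
print(f"accuracy: {accuracy:.1%}")
```

Both derived values should land within a fraction of a percentage point of the reported 18.3% and 69.6%. The small gap is expected because the inputs are themselves rounded.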


  
Figure 11: A subset of rules learned by RIPPER when given the automatic features for determining problematic dialogues
if (e2-salience-coverage ≤ ... then problematic


  
Figure 12: A subset of rules learned by RIPPER when given the TASK-INDEPT features for determining problematic dialogues
if (e2-top-confidence ≤ 0.... then problematic


 
Table 7: Precision and Recall with Exchange 1&2 Automatic Features
Class        Occurred  Predicted  Recall  Precision
TASKSUCCESS  67.0%     80.0%      94.8%   79.1%
PROBLEMATIC  33.0%     20.0%      49.5%   79.7%

 

Table 7 summarizes the performance of the system that uses automatic features for Exchanges 1&2. Given the first two exchanges, this ruleset predicts that 20% of the dialogues will be problematic, whereas 33% of them actually are. It identifies 49.5% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 79.7% of the time (precision). Compared with using the first exchange alone, this classifier improves recall by 17.87 percentage points and precision by 23.09 percentage points, for an overall improvement in accuracy of 9.6 percentage points.
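The quoted improvements are absolute differences between the two rulesets, not relative gains. The following sketch makes the distinction concrete using only the recall and precision figures reported in the text. The dictionary and function names are hypothetical, chosen for illustration.

```python
# PROBLEMATIC-class figures quoted in the text, in percent.
first_exchange = {"recall": 31.6, "precision": 56.6}
two_exchanges = {"recall": 49.5, "precision": 79.7}


def point_gain(before, after):
    """Absolute gain in percentage points for each metric."""
    return {name: round(after[name] - before[name], 1) for name in before}


def relative_gain(before, after):
    """Gain expressed as a percentage of the earlier value."""
    return {
        name: round(100.0 * (after[name] - before[name]) / before[name], 1)
        for name in before
    }


gains = point_gain(first_exchange, two_exchanges)
relative = relative_gain(first_exchange, two_exchanges)

print("percentage-point gain:", gains)
print("relative gain (%):", relative)
```

The percentage-point gains agree with the quoted 17.87 and 23.09 once rounded to one decimal place. The relative gains are considerably larger, which is why the unit matters when these numbers are compared.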


Helen Hastie
2002-05-09