The goal of the PDP is to predict, on the basis of information available early in the dialogue, whether the system will be able to complete the user's task. The output classes are based on the four dialogue categories described above. However, because the system treats HANGUP, WIZARD and TASKFAILURE as equivalently problematic, as illustrated in Figure 5, these three categories are collapsed into a single class, PROBLEMATIC. Note that this categorization is inherently noisy, because it is impossible to know the real reasons why a caller hangs up or a wizard takes over the call. The caller may hang up because she is frustrated with the system, or she may simply dislike automation, or her child may have started crying. Similarly, one wizard may have low confidence in the system's ability to recover from errors and take a conservative approach that results in taking over many calls, while another wizard may be more willing to let the system try to recover. Nevertheless, we treat these human actions as a human labelling of the calls as problematic. Given this binary classification, approximately 33% of the calls in the corpus of 4692 dialogues are PROBLEMATIC and 67% are TASKSUCCESS.
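The collapse of the four dialogue categories into two output classes can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names (`collapse`, `class_distribution`, `majority_class`) and the toy label list are assumptions introduced here, and only the category names and the mapping itself come from the text.

```python
from collections import Counter

# Category names as given in the text; these three are merged into PROBLEMATIC.
PROBLEMATIC_CATEGORIES = {"HANGUP", "WIZARD", "TASKFAILURE"}


def collapse(category: str) -> str:
    """Map one of the four dialogue categories to its binary output class."""
    if category == "TASKSUCCESS":
        return "TASKSUCCESS"
    if category in PROBLEMATIC_CATEGORIES:
        return "PROBLEMATIC"
    raise ValueError(f"unknown dialogue category: {category!r}")


def class_distribution(categories):
    """Return the fraction of dialogues in each binary class."""
    counts = Counter(collapse(c) for c in categories)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_class(categories):
    """Return the most frequent binary class and its share of the dialogues."""
    dist = class_distribution(categories)
    return max(dist.items(), key=lambda item: item[1])


# Toy labels invented for illustration only; they are NOT drawn from the corpus.
toy_labels = ["TASKSUCCESS"] * 4 + ["HANGUP", "WIZARD"]
dist = class_distribution(toy_labels)
label, share = majority_class(toy_labels)
print(dist, label, share)
```

Applied to the real corpus labels, the same computation would yield the class proportions reported above, so a predictor that always guesses TASKSUCCESS would be correct on roughly 67% of the dialogues. That figure is the natural point of comparison for any trained classifier.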