HMIHY is a spoken dialogue system based on the notion of call routing [Gorin et al. 1997; Chu-Carroll and Carpenter 1999]. In the HMIHY call routing system, the services that the user can access are classified into 14 categories, plus a category called other for tasks that are not covered by the automated system and must be transferred to a human operator [Gorin et al. 1997]. Each category describes a different task, such as person-to-person dialing or receiving credit for a misdialed number. The system determines which task the caller is requesting on the basis of its understanding of the caller's response to the open-ended system greeting "AT&T, How May I Help You?". Once the task has been determined, the information needed to complete the caller's request is obtained using dialogue submodules that are specific to each task [Abella and Gorin 1999].
The HMIHY system consists of an automatic speech recognizer, a spoken language understanding module, a dialogue manager, and a computer telephony platform. During the trial, the behaviors of all the system modules were automatically recorded in a log file. The dialogues were later transcribed by humans and labelled, on a per-utterance basis, with one or more of the 15 task categories, representing the task that the caller was asking HMIHY to perform. The log files also included labels indicating whether the wizard had taken over the call or the user had hung up. Our experiments use the log files to extract the automatically obtainable features used as predictors, and to define the classes of dialogues that we want to learn to predict. The corpus of 4692 dialogues used in our experiments was collected in several experimental trials of HMIHY on live customer traffic [Riccardi and Gorin 2000; E. Ammicht and Alonso 1999], and is referred to as HM2 in [Riccardi and Gorin 2000]. The dialogues vary in length: 97% are five exchanges or fewer, and 23% of all the dialogues consist of only two exchanges.
As mentioned above, dialogues in which HMIHY successfully automates the customer's call, as illustrated in Figure 1, are referred to as TASKSUCCESS. Other calls, which are problematic, are divided into three categories. The first category, referred to as HANGUP, results from a customer's decision to hang up on the system. A sample HANGUP dialogue is in Figure 2. A caller may hang up because s/he is frustrated with the system; our goal is to learn from the corpus which system behaviors led to the caller's frustration.
The second problematic category (WIZARD) results from a human customer care agent's decision to take over the call from the system. Because HMIHY is experimental, each call during the field trial was monitored by a human agent serving as a wizard who could override the system. A number of agents participated as wizards during the trial of HMIHY, and each wizard was simply told to take over the call if s/he perceived problems with the system's performance. The wizard's decision was logged by the experimental setup, resulting in the call being labelled as one that the wizard took over. Of course, we can only infer what might have motivated the wizard to take over the call, but we assume that the wizard had good reason for doing so. A dialogue where the wizard decided that the dialogue was problematic and took over the call is shown in Figure 3.
The third problematic category, TASKFAILURE, covers cases where the system completed the call but carried out a task that was not the one that the customer was actually requesting. An example TASKFAILURE dialogue is given in Figure 4: HMIHY interpreted utterance U2 as a request to make a third-party call, e.g., to bill it to my home phone. HMIHY then asked the caller for the information it needed to carry out this task, the caller complied, and the system completed the call.
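The four dialogue classes described above can be summarized as a simple labelling rule over the logged information. The following is a minimal sketch, not the procedure actually used to label the corpus: the field names (`wizard_took_over`, `user_hung_up`, `executed_task`, `labelled_tasks`), the task identifiers, and the choice to check the wizard flag before the hang-up flag are all assumptions made for illustration, since the log format is not specified here. The sketch also assumes that a completed call counts as TASKSUCCESS when the task the system carried out is among the human-assigned task labels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Outcome(Enum):
    TASKSUCCESS = "TASKSUCCESS"
    HANGUP = "HANGUP"
    WIZARD = "WIZARD"
    TASKFAILURE = "TASKFAILURE"


@dataclass(frozen=True)
class DialogueLog:
    # Hypothetical record; the real HMIHY log layout is not given in the text.
    wizard_took_over: bool            # logged when the human agent overrode the system
    user_hung_up: bool                # logged when the caller hung up
    executed_task: Optional[str]      # task the system carried out, if it completed the call
    labelled_tasks: FrozenSet[str]    # human-assigned task labels for the caller's request


def classify(log: DialogueLog) -> Outcome:
    """Map one logged dialogue to one of the four classes."""
    # Assumed precedence: a wizard takeover is checked before a hang-up.
    if log.wizard_took_over:
        return Outcome.WIZARD
    if log.user_hung_up:
        return Outcome.HANGUP
    # The system completed the call: success only if it performed a requested task.
    if log.executed_task is not None and log.executed_task in log.labelled_tasks:
        return Outcome.TASKSUCCESS
    return Outcome.TASKFAILURE


if __name__ == "__main__":
    # Hypothetical task names, loosely mirroring the Figure 4 situation:
    # the system completed a third-party call that the caller did not request.
    example = DialogueLog(
        wizard_took_over=False,
        user_hung_up=False,
        executed_task="third_party",
        labelled_tasks=frozenset({"person_to_person"}),
    )
    print(classify(example).value)
```

Under these assumptions, the first three classes depend only on flags recorded automatically during the call, whereas separating TASKSUCCESS from TASKFAILURE requires the human-assigned task labels.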