The Corpora

The algorithm was developed primarily on a sample of a corpus of Spanish dialogs collected under the JANUS project at Carnegie Mellon University [32]. These dialogs are referred to here as the "CMU dialogs." The algorithm was later tested on a corpus of Spanish dialogs collected under the Artwork project at New Mexico State University by Daniel Villa and his students [42]. These are referred to here as the "NMSU dialogs." In both cases, subjects were asked to set up a meeting based on schedules given to them detailing their commitments. The NMSU dialogs are face-to-face, while the CMU dialogs resemble telephone conversations. The participants in the CMU dialogs rarely discuss anything from their real lives and stay almost exclusively on task. The participants in the NMSU dialogs embellish the schedules given to them with some of their real-life commitments and often stray from the task, discussing topics other than the meeting being planned.