Maintaining the temporal context can aid in other aspects of understanding. For example, Levin et al. and Rosé et al. found that the temporal context, as part of the larger discourse context, can be exploited to improve various kinds of disambiguation, including speech act ambiguity, type of sentence ambiguity, and type of event ambiguity.
This paper presents the results of an in-depth empirical investigation of temporal reference resolution. Temporal reference resolution involves identifying temporal information that is missing due to anaphora, and resolving deictic expressions, which must be interpreted with respect to the current date. The genre addressed is scheduling dialogs, in which participants schedule meetings with one another. Such strongly task-oriented dialogs would arise in many useful applications, such as automated information providers and phone operators.
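To make the notion of a deictic expression concrete, the following minimal sketch resolves a few such expressions against the date of the dialog. It is an illustration only, not the algorithm presented in this paper: the function name `resolve_deictic`, the small set of expressions covered, and the convention that a bare weekday name denotes its next occurrence strictly after the dialog date are all assumptions made for the example.

```python
from datetime import date, timedelta

# Weekday names indexed to match date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_deictic(expression: str, dialog_date: date) -> date:
    """Resolve a small, hypothetical set of deictic expressions
    relative to the date on which the dialog takes place."""
    text = expression.strip().lower()
    if text == "today":
        return dialog_date
    if text == "tomorrow":
        return dialog_date + timedelta(days=1)
    if text in WEEKDAYS:
        # Assumption: a bare weekday name refers to its next occurrence
        # strictly after the dialog date (never the dialog date itself).
        offset = (WEEKDAYS.index(text) - dialog_date.weekday()) % 7
        return dialog_date + timedelta(days=offset or 7)
    raise ValueError(f"unsupported expression: {expression!r}")


if __name__ == "__main__":
    # 1 January 2024 was a Monday.
    print(resolve_deictic("Tuesday", date(2024, 1, 1)))
```

The same expression yields a different date for a different dialog date, which is why such references cannot be interpreted from the utterance alone.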
A model of temporal reference resolution in scheduling dialogs was developed through an analysis of a corpus of scheduling dialogs. A critical component of any method for anaphora resolution is the focus model used. It appeared from our initial observations that a recency-based model might be adequate. To test this hypothesis, we made the strategic decision to limit ourselves to a local, recency-based model of focus, and to analyze the adequacy of such a model for temporal reference resolution in this genre. We also limit the complexity of our algorithm in other ways. For example, there are no facilities for centering within a discourse segment [33,9], and only very limited ones for performing tense and aspect interpretation. Even so, the methods investigated in this work go a long way toward solving the problem.
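The idea of a local, recency-based focus model can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the model defined in this paper: the dictionary representation of a temporal reference, the field names in `FIELDS`, and the function `resolve_anaphoric` are hypothetical. An underspecified reference inherits its missing fields from the most recently mentioned time that can supply any of them.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified representation of a temporal reference.
FIELDS = ("month", "day", "hour")
Time = Dict[str, Optional[int]]


def resolve_anaphoric(partial: Time, focus_list: List[Time]) -> Time:
    """Fill the fields missing from `partial` using the most recent
    entry in `focus_list` (oldest first) that specifies any of them."""
    resolved = {f: partial.get(f) for f in FIELDS}
    missing = [f for f in FIELDS if resolved[f] is None]
    # Search backwards, so the most recently mentioned time is tried first.
    for antecedent in reversed(focus_list):
        if any(antecedent.get(f) is not None for f in missing):
            for f in missing:
                resolved[f] = antecedent.get(f)
            break
    return resolved


if __name__ == "__main__":
    # "the 15th at 9" ... "what about the 16th?" ... "at 2 in the afternoon"
    focus = [
        {"month": 3, "day": 15, "hour": 9},
        {"month": 3, "day": 16, "hour": None},
    ]
    print(resolve_anaphoric({"hour": 14}, focus))
```

Because only the most recent suitable antecedent is consulted, a sketch of this kind has no notion of global focus; it would fail whenever a speaker returns to a time discussed much earlier in the dialog.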
From a practical point of view, the method is reproducible and relatively straightforward to implement. System results and the detailed algorithm are presented in this paper. The model and the implemented system were developed primarily on one data set, and then applied later to a much more complex data set to assess the generalizability of the model for the task being performed. Both data sets are challenging, in that they both include negotiation, contain many disfluencies, and show a great deal of variation in how dates and times are discussed. However, only in the more complex data set do the participants discuss their real-life commitments or stray significantly from the scheduling task.
To support the computational work, the temporal references in the corpus were manually annotated. We developed explicit annotation instructions and performed an intercoder reliability study involving naive subjects, with excellent results. To support analysis of the problem and our approach, additional manual annotations were performed, including anaphoric chain annotations.
The system's performance on unseen test data from both data sets is evaluated. On both, the system achieves a large improvement over the baseline accuracy. In addition, ablation (degradation) experiments were performed, to identify the most significant aspects of the algorithm. The system is also evaluated on unambiguous input, to help isolate the contribution of the model itself to overall performance.
The system is an important aspect of this work, but does not enable direct evaluation of the model, due to errors committed by the system in other areas of processing. Thus, we evaluate the model itself based on detailed manual annotations of the data. The important questions addressed are: how many errors are attributable specifically to the model of focus, and what kinds of errors are they? How good is the coverage of the set of anaphoric relations defined in the model, and how much ambiguity do the relations introduce? The analysis shows that few errors occur specifically due to the model of focus, and that the relations are low in ambiguity for the data sets.
The remainder of this paper is organized as follows. The data sets are described in Section 2. The problem is defined and the results of an intercoder reliability study are presented in Section 3. An abstract model of temporal reference resolution is presented in Section 4 and the high-level algorithm is presented in Section 5. Detailed results of the implemented system are included in Section 6, and other approaches to temporal reference resolution are discussed in Section 7. In the final part of the paper, we analyze the challenges presented by the dialogs to an algorithm that does not include a model of global focus (in Section 8.1), evaluate the coverage, ambiguity, and correctness of the set of anaphoric relations defined in the model (in Section 8.2), and assess the importance of the architectural components of the algorithm (in Section 8.3). Section 9 is the conclusion.
There are three online appendices. Online Appendix 1 contains a detailed specification of the temporal reference resolution rules that form the basis of the algorithm. Online Appendix 2 gives a specification of the input to the algorithm. Online Appendix 3 contains a BNF grammar describing the core set of the temporal expressions handled by the system. In addition, the annotation instructions, sample dialogs, and manual annotations of the dialogs are available on the project web site, http://www.cs.nmsu.edu/~wiebe/projects.