nextupprevious
Next:Constraint and preference setUp:Experimental workPrevious:Corpora, tools, and description

Importance of the anaphoric accessibility space

To show the importance of defining an adequate anaphoric accessibility space, a study of the location of the antecedent of each pronominal and adjectival anaphor was carried out on the training corpus. The results are given in Table 1 (see footnote 9).
 

Table 1: Structural anaphoric accessibility space results
               Same AP (1)   Previous AP (2)   Included AP (3)   Topic (4)   Elsewhere (5)
Pronominal     60.6%         24.6%             8.2%              4.9%        1.7%
Adjectival     44.7%         28.9%             5.2%              13.4%       7.8%
Total results  Proposed anaphoric accessibility space: 95.9%                 4.1%
               (pronominal: 98.3%, adjectival: 92.2%)
1 The antecedent is found in the same adjacency pair as the anaphor 
2 The antecedent is found in the previous adjacency pair to the one containing the anaphor 
3 The antecedent is found in the adjacency pair that encloses the adjacency pair in which the anaphor occurs 
4 The antecedent is found in the topic of the dialogue 
5 The antecedent is found elsewhere
As can be seen in the table, 95.9% of the antecedents were located in the proposed structural anaphoric accessibility space. It is estimated that the remaining antecedents (4.1%) are located in the subtopics of the dialogues (see footnote 10). To incorporate these remaining antecedents into the anaphoric accessibility space, one might employ a strategy that uses the full space (i.e., all the noun phrases from the beginning of the dialogue up to the anaphor). However, as shown in Table 2, our proposal for the anaphoric accessibility space (hereafter referred to as structural) reduces the average number of candidates per anaphor (before applying constraints) to 10.74, from the 34.14 that would be obtained if the full space approach were adopted. In other words, using the full space approach would increase the number of possible candidates by a factor of about three, thereby greatly increasing both the required computational effort and the possibility of selecting incorrect antecedents. Note, too, that these experiments were performed on a collection of short dialogues (around 332 words per dialogue). These problems will be even more acute in longer dialogues.
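The difference between the two spaces can be illustrated with a minimal sketch. The data model here is an assumption made purely for illustration: `AdjacencyPair`, `structural_space`, `full_space`, and the toy dialogue are hypothetical names and data, not taken from the system described in this paper. The sketch also ignores the position of the anaphor inside its own adjacency pair and treats identical strings as the same noun phrase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdjacencyPair:
    """Hypothetical container: the noun phrases uttered in one adjacency pair."""
    noun_phrases: List[str]
    parent: Optional["AdjacencyPair"] = None  # enclosing pair, if this one is nested


def structural_space(pairs, index, topic_nps):
    """Candidates for an anaphor located in pairs[index] (structural space)."""
    current = pairs[index]
    candidates = list(current.noun_phrases)            # same adjacency pair
    if index > 0:
        candidates += pairs[index - 1].noun_phrases    # previous adjacency pair
    if current.parent is not None:
        candidates += current.parent.noun_phrases      # enclosing adjacency pair
    candidates += topic_nps                            # topic of the dialogue
    return list(dict.fromkeys(candidates))             # drop duplicates, keep order


def full_space(pairs, index):
    """Candidates when every noun phrase since the start of the dialogue is kept."""
    candidates = []
    for pair in pairs[: index + 1]:
        candidates += pair.noun_phrases
    return list(dict.fromkeys(candidates))


# Toy dialogue (invented data): the third pair is nested inside the second.
topic = ["the train to Madrid"]
outer = AdjacencyPair(["a ticket", "Friday"])
pairs = [
    AdjacencyPair(["the train to Madrid", "the timetable"]),
    outer,
    AdjacencyPair(["the price"], parent=outer),
    AdjacencyPair(["the morning train"]),
    AdjacencyPair(["a seat"]),
]

structural = structural_space(pairs, 4, topic)
full = full_space(pairs, 4)
print(len(structural), len(full))
```

Even in this tiny example the full space grows with every adjacency pair, while the structural space stays bounded by the local pairs plus the topic.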
 

Table 2: Candidates to be processed for each anaphoric accessibility space
Anaphoric accessibility space   Structural   Full space   Window of utterances
Total candidates                1,063        3,380        1,292
Candidates per anaphor          10.74        34.14        13.05
Proportion                      100%         318%         122%
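The proportions in Table 2 follow directly from the candidate totals, taking the structural space as the 100% baseline. The short check below recomputes them. It also derives the number of anaphors implied by dividing each total by its per-anaphor average; that count is an inference from the table, not a figure stated in this section, and the variable names are illustrative.

```python
# Figures copied from Table 2.
totals = {"structural": 1063, "full space": 3380, "window of utterances": 1292}
per_anaphor = {"structural": 10.74, "full space": 34.14, "window of utterances": 13.05}

# Proportion of candidates relative to the structural space (in percent).
baseline = totals["structural"]
proportion = {name: round(100 * n / baseline) for name, n in totals.items()}

# Number of anaphors implied by total / average (an inference, see lead-in).
implied_anaphors = {name: round(totals[name] / per_anaphor[name]) for name in totals}

print(proportion)
print(implied_anaphors)
```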
Other researchers have proposed using a window with a fixed number of sentences to define the anaphoric accessibility space. This type of approach might be called a window of sentences approach. For example, Ferrández et al. 2000 propose using the three previous sentences to define the accessibility space for pronouns and the four previous sentences for adjectival anaphora in Spanish. For English, Kameyama 1997 proposes the same space for pronominal anaphora. However, there is no structural justification for these definitions: Ferrández et al. and Kameyama performed several empirical studies to find the optimal space for each experiment.

Table 3 below shows the results of a study that we performed using the Corpus Infotren: Person. Its goal was to define an anaphoric accessibility space based on a window of sentences, adapted to dialogues by means of a window of utterances. As the table shows, 11 utterances for pronominal anaphora and 10 utterances for adjectival anaphora are needed in order to cover the same number of antecedents as the structural anaphoric accessibility space (which is defined in terms of adjacency pairs and the topic). Because a space defined by a window of utterances rests on empirical studies rather than on any structural principle, its optimal size may vary from one text to another, which makes it inadequate as a general definition. Moreover, only the structural anaphoric accessibility space can cover those cases that refer to NPs introduced at the outset of the dialogue (topics); a window of sentences/utterances approach cannot.

In conclusion, it would appear that the structural anaphoric accessibility space is to be preferred, at least for anaphora resolution in dialogues.
 

Table 3: Empirical study of anaphoric accessibility space based on a window of utterances
Window of utterances            % pronominal anaphora   % adjectival anaphora
(from anaphor's utterance to)   antecedents             antecedents
Anaphor's utterance             37.7                    18.4
-1                              54.1                    44.7
-2                              70.5                    52.6
-3                              77.0                    55.3
-4                              80.3                    57.9
-5                              83.6                    71.0
-6                              88.5                    73.7
-7                              91.8                    76.3
-8                              91.8                    81.6
-9                              95.1                    81.6
-10                             96.7                    92.1
-11                             98.4                    94.7
-12                             98.4                    97.4
-13                             98.4                    97.4
-14                             100.0                   100.0
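The window sizes quoted in the text can be read off Table 3 by looking for the smallest window whose cumulative coverage reaches the coverage of the structural space (98.3% pronominal, 92.2% adjectival, from Table 1). The sketch below does this lookup. The function `smallest_window` and its `tolerance` parameter are illustrative assumptions, not part of the original study. Under a strict threshold the adjectival window comes out at 11, because the 92.1% reached at 10 utterances falls just short of 92.2%; allowing a small tolerance reproduces the value of 10 quoted in the text.

```python
# Cumulative coverage from Table 3; index k means "anaphor's utterance plus
# the k previous utterances" (index 0 is the anaphor's own utterance).
PRONOMINAL = [37.7, 54.1, 70.5, 77.0, 80.3, 83.6, 88.5, 91.8,
              91.8, 95.1, 96.7, 98.4, 98.4, 98.4, 100.0]
ADJECTIVAL = [18.4, 44.7, 52.6, 55.3, 57.9, 71.0, 73.7, 76.3,
              81.6, 81.6, 92.1, 94.7, 97.4, 97.4, 100.0]


def smallest_window(coverage, target, tolerance=0.0):
    """Return the smallest window size whose coverage is within
    `tolerance` percentage points of `target`, or None if none qualifies."""
    for size, pct in enumerate(coverage):
        if pct >= target - tolerance:
            return size
    return None


# Targets are the structural-space coverages reported in Table 1.
pron_window = smallest_window(PRONOMINAL, 98.3)
adj_strict = smallest_window(ADJECTIVAL, 92.2)
adj_tolerant = smallest_window(ADJECTIVAL, 92.2, tolerance=0.15)
print(pron_window, adj_strict, adj_tolerant)
```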
   

patricio 2001-10-17