Evaluation of Zero-Pronoun Resolution

Once zero pronouns have been detected, they are resolved in the subsequent anaphora-resolution module (explained in the following subsection). This module uses an algorithm that combines different kinds of knowledge by distinguishing between constraints and preferences [Ferrández et al., 1999; Palomar et al., 2001].

The constraints and preferences used for zero-pronoun resolution differ from those used for pronominal anaphora resolution in two basic ways:

Zero-pronoun resolution requires agreement only in person and number, whereas pronominal anaphora resolution also requires gender agreement.
Two new preferences are used to resolve zero pronouns: (a) preference is given to candidates in the same sentence as the anaphor that have also been the solution of a zero pronoun in that same sentence, and (b) when the zero pronoun carries gender information, preference is given to candidates that agree with it in gender.
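
The difference in agreement constraints, together with preference (b), can be illustrated with a minimal sketch. The `Mention` type, the function names, and the example noun phrases are hypothetical and are not taken from AGIR; the sketch also assumes that a preference narrows the candidate list only when at least one candidate satisfies it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical container for the morphological features of an NP or anaphor."""
    text: str
    person: int
    number: str                   # "sg" or "pl"
    gender: Optional[str] = None  # "m", "f", or None when unknown


def satisfies_constraints(anaphor: Mention, candidate: Mention,
                          is_zero_pronoun: bool) -> bool:
    """Constraint: person and number must agree; gender too, unless zero pronoun."""
    if anaphor.person != candidate.person or anaphor.number != candidate.number:
        return False
    if is_zero_pronoun:
        return True
    return anaphor.gender == candidate.gender


def gender_preference(anaphor: Mention, candidates: List[Mention]) -> List[Mention]:
    """Preference (b): if the zero pronoun has gender information, favour
    candidates that agree in gender. Assumption: a preference never empties
    the candidate list."""
    if anaphor.gender is None:
        return candidates
    agreeing = [c for c in candidates if c.gender == anaphor.gender]
    return agreeing or candidates


anaphor = Mention("Ø", 3, "sg", "f")
candidates = [
    Mention("el niño", 3, "sg", "m"),
    Mention("la niña", 3, "sg", "f"),
    Mention("los niños", 3, "pl", "m"),
]

as_zero = [c for c in candidates if satisfies_constraints(anaphor, c, True)]
as_pronominal = [c for c in candidates if satisfies_constraints(anaphor, c, False)]
preferred = gender_preference(anaphor, as_zero)
```

In this toy case the zero-pronoun constraint keeps both singular candidates, the pronominal constraint keeps only the feminine one, and the gender preference then narrows the zero-pronoun candidates in the same way.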

To obtain the best order of preferences (the one that produces the best performance), we used the training phase to identify the importance of each kind of knowledge. We analyzed the antecedent of each pronoun in the training corpora and identified its configurational characteristics with reference to the pronoun (e.g., whether the antecedent was a proper noun, whether it was an indefinite NP, whether it occupied the same position relative to the verb, before or after, as the anaphor, etc.). We then constructed a table showing how often each configurational characteristic held for the solution of a particular kind of pronoun (e.g., the solution of a zero pronoun was a proper noun 63% of the time, whereas for a reflexive pronoun it was a proper noun 53% of the time). In this way, we were able to define the different patterns of Spanish pronoun resolution and apply them to obtain the evaluation results presented in this paper. The order of importance was determined by first sorting the preferences according to the percentage of each configurational characteristic; that is, preferences with higher percentages were considered more important than those with lower percentages. After several experiments on the training corpora, an optimal order for each type of anaphora was obtained. Since in this phase we processed texts from different genres and by different authors, we consider that the final set of preferences and their order of application can be used with confidence on any Spanish text.
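
The initial ordering step described above, sorting preferences by how often each characteristic held, can be sketched as follows. Only the 63% figure for proper nouns comes from the text; the other two percentages and all of the preference names are hypothetical placeholders, and the later experimental refinement of the order is not modelled.

```python
from typing import Dict, List


def order_preferences(frequencies: Dict[str, float]) -> List[str]:
    """Sort preference names from most to least frequent in the training data."""
    return sorted(frequencies, key=frequencies.get, reverse=True)


# Percentage of training cases in which each characteristic held for the
# antecedent of a zero pronoun.
zero_pronoun_frequencies = {
    "same_position_relative_to_verb": 48.0,  # placeholder value, not from the paper
    "antecedent_is_proper_noun": 63.0,       # reported in the text
    "antecedent_is_indefinite_np": 12.0,     # placeholder value, not from the paper
}

initial_order = order_preferences(zero_pronoun_frequencies)
```

With these values, `initial_order` starts with the proper-noun preference; in the procedure described in the text this sorted list is only the starting point for further experiments on the training corpora.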

After training, we conducted a blind test over the entire test corpus; the results are shown in Table 3.

Table 3: Zero-pronoun resolution, evaluation phase

Corpus    Cataphoric    Exophoric    Anaphoric
                                     Correct    Total    P(%)
LEXESP    640           28           455        559      81.4
BB        76            8            30         37       81.1
TOTAL     716           36           485        596      81.4

It is important to mention here that, out of 3,126 verbs in these corpora, 1,348 (Table 2) have a third-person zero pronoun as subject; these are the cases considered for resolution. Table 3 classifies these third-person zero pronouns into three categories:

Cataphoric. This category comprises those zero pronouns whose antecedents, that is, the clause subjects, come after the verb. For instance, in the Spanish sentence Ø Compró [un niño] en el supermercado ([A boy] bought in the supermarket), the subject, un niño (a boy), appears after the verb, compró (bought). Such constructions are quite common in Spanish (53.1%, 716 out of 1,348, as can be seen in Table 3) and represent one of the main difficulties in resolving anaphora in Spanish: sentence structure is more flexible than in English. These are intonationally marked sentences, in which the subject does not occupy its usual position before the verb. Cataphoric zero pronouns are not resolved in AGIR, since semantic information would be needed to discard the other candidate antecedents and to give preference to those that appear after the verb within the same sentence and clause.

For example, the sentence Ø Compró un regalo en el supermercado ([He] bought a present in the supermarket) has the same syntactic structure as the previous sentence (verb, NP, and PP), and the object function of the NP can be distinguished from the subject function only by means of semantic knowledge.
Exophoric. This category consists of those zero pronouns whose antecedents do not appear linguistically in the text (they refer to items in the external world rather than to things mentioned in the text). Exophoric zero pronouns are not resolved by the system.
Anaphoric. This category comprises those zero pronouns whose antecedents are found before the verb. These are the zero pronouns that our system resolves.

Table 3 shows the numbers of cataphoric, exophoric, and anaphoric zero pronouns for each corpus. For anaphoric pronouns, it also gives the number of pronouns correctly solved and the resulting precision, P (the number of pronouns correctly solved divided by the number of solved pronouns). For example, in the LEXESP corpus there are 640 cataphoric, 28 exophoric, and 559 anaphoric zero pronouns. Of these anaphoric pronouns, 455 were correctly solved, giving a precision of 81.4%.

Discussion. In zero-pronoun resolution, the following results have been obtained: LEXESP corpus, P = 81.4%; BB corpus, P = 81.1%. For the combined corpora, an overall precision for this task of 81.4% (485 out of 596) was obtained. The overall recall, R (the number of pronouns correctly solved divided by the number of real pronouns) obtained was 79.1% (485 out of 613).
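
The precision and recall figures can be re-derived from the counts reported in Table 3 and in the paragraph above. The following is a small arithmetic check rather than part of the system; the helper name `pct` is introduced here only for illustration.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the tables."""
    return round(100.0 * numerator / denominator, 1)


# (correctly solved, solved) anaphoric zero pronouns per corpus, from Table 3.
anaphoric = {"LEXESP": (455, 559), "BB": (30, 37)}

precision = {corpus: pct(correct, solved)
             for corpus, (correct, solved) in anaphoric.items()}

total_correct = sum(correct for correct, _ in anaphoric.values())
total_solved = sum(solved for _, solved in anaphoric.values())
real_pronouns = 613  # number of real pronouns, as stated in the text

overall_precision = pct(total_correct, total_solved)
overall_recall = pct(total_correct, real_pronouns)

print(precision, overall_precision, overall_recall)
```

The computed values match the reported ones: 81.4% and 81.1% per corpus, 81.4% overall precision, and 79.1% overall recall.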

From these results, we draw the following conclusions:

There are no meaningful differences between the results obtained from each corpus.
Errors in the zero-pronoun-resolution stage arise from several causes:
- exceptions in the application of preferences that lead to the selection of an incorrect antecedent as the solution of the zero pronoun (64% of all errors)
- the lack of semantic information⁹ (32.4% of all errors)
- mistakes in the POS tagging (3.6% of all errors)
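
As a consistency check, the error shares above can be converted into approximate counts. The source reports only percentages; the absolute counts below are inferred under the assumption that the percentages refer to the 596 - 485 = 111 anaphoric zero pronouns that were solved incorrectly.

```python
solved, correct = 596, 485       # totals from Table 3
errors = solved - correct        # 111 incorrectly solved zero pronouns

# Share of errors per cause, in percent, as reported in the list above.
error_shares = {
    "preference exceptions": 64.0,
    "missing semantic information": 32.4,
    "POS tagging mistakes": 3.6,
}

# Inferred (not reported) absolute counts, rounded to whole pronouns.
error_counts = {cause: round(share / 100.0 * errors)
                for cause, share in error_shares.items()}

print(errors, error_counts)
```

Under that assumption the shares correspond to roughly 71, 36, and 4 errors, which add up to 111.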

Since the results reported in other works were obtained for a different language (English), different texts, and different sorts of knowledge (e.g., Hobbs and Lappin & Leass fully parse the text), direct comparisons are not possible. Therefore, in order to make this comparison, we have implemented some of these approaches in SUPAR¹⁰, adapting them to partial parsing and to Spanish texts. Although these approaches were not proposed for zero pronouns, so the comparison is not entirely fair, we have implemented them because that is the only way to compare our proposal directly with some well-known anaphora-resolution algorithms.

We compared our system with the typical proximity-preference baseline, in which the antecedent closest to the anaphor is chosen from among those that satisfy the constraints (morphological agreement and syntactic conditions). We also compared it with the baseline presented by Hobbs¹¹ [Hobbs, 1978], with Lappin & Leass' method [Lappin & Leass, 1994], and with the centering approach, by implementing functional centering [Strube & Hahn, 1999]. The precision obtained with each of these approaches and with AGIR is shown in Table 4. As can be seen, the precision obtained by AGIR is higher than that of the other proposals on both corpora.

Table 4: Zero-pronoun resolution in Spanish, comparison of AGIR with other approaches (precision, %)

Corpus    Proximity    Hobbs    Lappin & Leass    Strube & Hahn    AGIR
LEXESP    54.9         60.4     66.0              59.7             81.4
BB        48.6         62.2     67.6              59.5             81.1
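
The proximity-preference baseline used in the comparison can be sketched in a few lines. This is an illustrative reading of its description (choose the closest preceding candidate that passes the constraints), not the code that was run in SUPAR; the `Candidate` tuple, the `is_eligible` callback, and the example positions are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation: (text of the NP, token position in the text).
Candidate = Tuple[str, int]


def proximity_baseline(anaphor_position: int,
                       candidates: Sequence[Candidate],
                       is_eligible: Callable[[Candidate], bool]) -> Optional[Candidate]:
    """Return the eligible candidate that precedes the anaphor most closely,
    or None when no candidate satisfies the constraints."""
    eligible = [c for c in candidates
                if c[1] < anaphor_position and is_eligible(c)]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])


candidates = [("Juan", 2), ("los libros", 7), ("María", 10)]

# Stand-in for the constraint check: here, plural NPs are simply rejected.
chosen = proximity_baseline(12, candidates, lambda c: c[0] != "los libros")
nothing = proximity_baseline(1, candidates, lambda c: True)
```

Here `chosen` is the candidate at position 10, the nearest eligible one before the anaphor, and `nothing` is `None` because no candidate precedes position 1.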

Jesus Peral 2002-12-13