
Pronominal Anaphora Translation into Spanish

In this experiment, we evaluated the translation of English third-person personal pronouns into Spanish.

We tested the method on the same portions of the SemCor and MTI corpora that were used earlier for anaphora resolution. The training corpus was used to refine the number and gender rules, and the remaining fragments of the corpora were reserved as test data.

We needed to know the semantic category (person, animal, or object) and the grammatical gender (masculine or feminine) of the pronoun's antecedent in order to apply the number and gender rules. In the SemCor corpus, the WordNet sense was used to identify the antecedent's semantic category. In the MTI corpus, due to the lack of semantic information, a set of heuristics was used to determine the antecedent's semantic category.
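For the SemCor case, a minimal sketch of how a WordNet sense could be mapped to the three semantic categories is shown below. It assumes the category is read from the sense's WordNet lexicographer file name (such as `noun.person` or `noun.animal`); the paper does not say which WordNet field was used, and the names `Category` and `semantic_category` are hypothetical. The heuristics used for the MTI corpus are not described here and are not reproduced.

```python
from enum import Enum


class Category(Enum):
    PERSON = "person"
    ANIMAL = "animal"
    OBJECT = "object"


def semantic_category(lexname: str) -> Category:
    """Map a WordNet lexicographer file name (e.g. 'noun.person') to a category.

    Assumption: person and animal senses are identified by their lexicographer
    file; every other sense is treated as an object.
    """
    if lexname == "noun.person":
        return Category.PERSON
    if lexname == "noun.animal":
        return Category.ANIMAL
    return Category.OBJECT


if __name__ == "__main__":
    print(semantic_category("noun.person"))     # Category.PERSON
    print(semantic_category("noun.cognition"))  # Category.OBJECT
```

With NLTK's WordNet interface, the input string could come from `Synset.lexname()` on the antecedent's annotated sense.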

To obtain the antecedent's gender, an English-Spanish electronic dictionary was used, since POS tags do not usually provide gender and number information. The dictionary was incorporated into the system as a database. For each English word, it provides a Spanish translation together with the word's gender and number in Spanish.
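One way to picture the dictionary-as-database is a single table keyed on the English word. This is a sketch under assumptions: the schema, the use of SQLite, and the names `build_dictionary` and `lookup` are hypothetical, and the entries in the usage example are illustrative only.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: one row per English word with its Spanish translation,
# Spanish grammatical gender ('m'/'f') and number ('sg'/'pl').
SCHEMA = """
CREATE TABLE dictionary (
    english TEXT PRIMARY KEY,
    spanish TEXT NOT NULL,
    gender  TEXT NOT NULL,
    number  TEXT NOT NULL
)
"""


def build_dictionary(rows) -> sqlite3.Connection:
    """Create an in-memory database and load (english, spanish, gender, number) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO dictionary VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn: sqlite3.Connection, english: str) -> Optional[Tuple[str, str, str]]:
    """Return (spanish, gender, number) for an English word, or None if absent."""
    cur = conn.execute(
        "SELECT spanish, gender, number FROM dictionary WHERE english = ?",
        (english.lower(),),
    )
    return cur.fetchone()


if __name__ == "__main__":
    conn = build_dictionary([
        ("information", "información", "f", "sg"),  # illustrative entries
        ("dog", "perro", "m", "sg"),
    ])
    print(lookup(conn, "information"))  # ('información', 'f', 'sg')
    print(lookup(conn, "table"))        # None
```

Returning `None` for unknown words leaves the caller free to decide how to handle antecedents missing from the dictionary.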

The number and gender rules were applied using this morphological and semantic information. We conducted a blind test over the entire test corpus; the results are shown in Table 10.

Table 10: Translation of pronominal anaphora into Spanish, evaluation phase
Corpus   Subject  Complement  Correct  Total  P (%)
SemCor       197          57      229    254   90.2
MTI          239         231      353    470   75.1
Total        436         288      582    724   80.4

This task was evaluated automatically once each pronoun had been annotated anaphorically. The annotation included the antecedent and the translation of the anaphor into the target language; to produce it, the human annotators translated the anaphors according to the criteria established by the morphological rules. For example, the pronoun it with subject function was translated into the Spanish pronoun él if its antecedent was of the animal type and masculine, but into the Spanish pronoun éste if its antecedent was of the object type and masculine; and so on. In the Spanish-English direction, the pronoun él with subject function was translated into the English pronoun he if its antecedent was of the person type and masculine, but into the English pronoun it if its antecedent was of the object or animal type (masculine or feminine); and so on (footnote 20).
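The annotation criteria above can be read as a lookup table keyed on the source pronoun, its syntactic function, and the antecedent's semantic category and gender. The sketch below is a hypothetical encoding, not the AGIR implementation: the names `RULES` and `translate_pronoun` are illustrative, and only the cases stated explicitly in the paragraph are included, so every case covered by "and so on" returns `None`.

```python
from typing import Optional

# Only the cases spelled out in the text are listed; the remaining
# rules ("and so on") are deliberately not reproduced here.
# Key: (source pronoun, syntactic function, antecedent category, antecedent gender)
RULES = {
    # English -> Spanish
    ("it", "subject", "animal", "masculine"): "él",
    ("it", "subject", "object", "masculine"): "éste",
    # Spanish -> English
    ("él", "subject", "person", "masculine"): "he",
    ("él", "subject", "object", "masculine"): "it",
    ("él", "subject", "object", "feminine"): "it",
    ("él", "subject", "animal", "masculine"): "it",
    ("él", "subject", "animal", "feminine"): "it",
}


def translate_pronoun(pronoun: str, function: str, category: str, gender: str) -> Optional[str]:
    """Return the target pronoun, or None if no listed rule covers the case."""
    return RULES.get((pronoun.lower(), function, category, gender))


if __name__ == "__main__":
    print(translate_pronoun("it", "subject", "object", "masculine"))  # éste
    print(translate_pronoun("Él", "subject", "person", "masculine"))  # he
```

Keeping the rules as data rather than as branching code makes it easy to add the remaining cases once they are specified.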

Table 10 classifies the anaphoric pronouns of each corpus by grammatical function: subject and complement (direct or indirect object). The last three columns give the number of correctly solved pronouns, the total number of solved pronouns, and the resulting precision, respectively. For instance, the SemCor corpus contains 197 subject pronouns and 57 complement pronouns, and the precision obtained on it was 90.2% (229 out of 254).
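The precision column of Table 10 is simply correct divided by total. The short check below recomputes it from the Correct and Total columns; the names `TABLE_10` and `precision` are illustrative.

```python
# (correct, total) pairs taken from Table 10.
TABLE_10 = {
    "SemCor": (229, 254),
    "MTI": (353, 470),
}


def precision(correct: int, total: int) -> float:
    """Precision in percent: correctly translated pronouns over all pronouns."""
    return 100.0 * correct / total


if __name__ == "__main__":
    for corpus, (correct, total) in TABLE_10.items():
        print(f"{corpus}: {precision(correct, total):.1f}%")
    # The overall figure pools the counts (micro-average).
    all_correct = sum(c for c, _ in TABLE_10.values())
    all_total = sum(t for _, t in TABLE_10.values())
    print(f"Overall: {precision(all_correct, all_total):.1f}%")
```

This prints 90.2%, 75.1%, and 80.4%. The overall value pools the counts of both corpora, which is why it is 80.4% rather than the unweighted mean of 90.2% and 75.1%.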

Discussion. For the translation of English third-person personal pronouns into Spanish, an overall precision of 80.4% (582 out of 724) was obtained: 90.2% on the SemCor corpus and 75.1% on the MTI corpus.

From these results, we have extracted the following conclusions:

To measure the effectiveness of our proposal, we compared our system with one of the most representative MT systems currently available: Systran. Systran was designed and built more than thirty years ago and has been continually modified to improve its translation quality. Moreover, it is easily accessible to Internet users through BABELFISH (footnote 21), a web MT service that provides free translations between different languages. With regard to pronominal anaphora resolution and translation, Systran is one of the best MT systems studied (see Section 2) because, like our own system, it handles intersentential pronominal anaphora and Spanish zero pronouns in unrestricted texts after partially parsing the source text. As mentioned in Section 2, a free trial of the commercial product SYSTRANLinks (footnote 22) was used to translate the evaluation corpora between English and Spanish. The results appear in Table 11.

Table 11: Translation of pronominal anaphora (complement pronouns only) into Spanish, SYSTRANLinks and AGIR
Corpus   SYSTRANLinks P (%)   AGIR P (%)
SemCor                 75.4         82.5
MTI                    58.1         69.3
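Because both columns of Table 11 are precisions in percent, the gap between the systems is best read in percentage points. The check below recomputes it; the names `TABLE_11` and `gap_points` are illustrative.

```python
# Precision (%) on complement pronouns, from Table 11.
TABLE_11 = {
    "SemCor": {"SYSTRANLinks": 75.4, "AGIR": 82.5},
    "MTI": {"SYSTRANLinks": 58.1, "AGIR": 69.3},
}


def gap_points(scores: dict) -> float:
    """AGIR minus SYSTRANLinks precision, in percentage points (one decimal)."""
    return round(scores["AGIR"] - scores["SYSTRANLinks"], 1)


if __name__ == "__main__":
    for corpus, scores in TABLE_11.items():
        print(f"{corpus}: +{gap_points(scores)} points")
```

This gives a gap of 7.1 points on SemCor and 11.2 points on MTI.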

The SYSTRANLinks output was evaluated manually by a human translator. Pronouns that the translator judged acceptable were counted as correctly translated; all others were counted as incorrectly translated.

Table 11 shows only the evaluation of English complement pronouns translated into Spanish, because Systran did not translate all the subject pronouns into Spanish. From an analysis of the Systran output on both corpora, we drew the following conclusions:

In our AGIR system, by contrast, we evaluated whether the morphological rules were correctly applied to translate every source pronoun into a target pronoun. A subsequent task must decide whether the target-language pronoun (a) should be generated as our system proposes, (b) should be replaced by another kind of pronoun (e.g., a possessive pronoun), or (c) should be omitted (i.e., Spanish zero pronouns). Therefore, to make a fair comparison between the two systems, we considered only complement pronoun translation.

As shown in Table 11, the precision obtained with AGIR is approximately 7 to 11 percentage points higher (depending on the corpus) than that obtained with Systran. Systran's errors originate in the anaphora-resolution stage: when the proposed antecedent and the correct one differ in grammatical gender, the pronoun is translated incorrectly. These errors can occur with intrasentential anaphors (as presented in Section 2) or with intersentential anaphors, as in the following example extracted from the corpora:

This example shows an incorrect English-Spanish translation of the pronoun it by Systran. Here the antecedent (this information, feminine) lies in the sentence preceding the anaphor. Systran resolves the anaphor incorrectly and consequently translates it incorrectly, producing the masculine pronoun él instead of the feminine pronoun ésta.
Jesus Peral 2002-12-13