Next: Anaphora Resolution and its Up: Translation of Pronominal Anaphora Previous: Translation of Pronominal Anaphora

Introduction

Anaphora can be considered one of the most difficult problems in natural language processing (NLP). The term derives from the Ancient Greek word ``anaphora'' ($\alpha \nu \alpha \varphi o \rho \alpha$), which is made up of two words, $\alpha \nu \alpha$ (``back, upstream, back in an upward direction'') and $\varphi o \rho \alpha$ (``the act of carrying''), and which denotes the act of carrying back upstream.

Various definitions of the term anaphora exist, but the same concept underlies all of them. Halliday & Hasan [Halliday & Hasan, 1976] defined anaphora as ``the cohesion (presupposition) which points back to some previous item.'' A more formal definition was proposed by Hirst [Hirst, 1981], who defined anaphora as ``a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.'' Hirst calls the reference an anaphor, and the entity to which it refers its antecedent:

In this example, the pronoun she is the anaphor and the noun phrase Mary is the antecedent. This is the most common type of anaphora, so-called pronominal anaphora.

The processing of anaphora can be further broken down into two tasks: resolution and generation. ``Resolution'' refers to the process of determining the antecedent of an anaphor; ``generation'' is the process of creating references to a discourse entity.
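
As a rough illustration of the resolution task, the following Python sketch picks, for a given pronoun, the nearest preceding noun phrase that agrees with it in gender and number. The Mention class, the feature values, and the nearest-candidate heuristic are assumptions made purely for illustration; this is not the method used by the system described in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str      # surface form, e.g. "Mary" or "she"
    position: int  # token offset in the discourse
    gender: str    # "fem", "masc" or "neut" (hypothetical feature values)
    number: str    # "sg" or "pl"


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Return the closest preceding candidate that agrees with the pronoun."""
    compatible = [
        c for c in candidates
        if c.position < pronoun.position
        and c.gender == pronoun.gender
        and c.number == pronoun.number
    ]
    # The candidate with the largest offset is the one nearest to the pronoun.
    return max(compatible, key=lambda c: c.position, default=None)


candidates = [
    Mention("Mary", 0, "fem", "sg"),
    Mention("the book", 2, "neut", "sg"),
]
pronoun = Mention("she", 5, "fem", "sg")
antecedent = resolve(pronoun, candidates)
print(antecedent.text if antecedent else None)
```

With these toy inputs the only compatible candidate is Mary; a pronoun with no agreeing candidate before it yields None.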

In the context of machine translation (MT), the resolution of anaphoric expressions, that is, the identification of their antecedents, is crucial for translating and generating them correctly in the target language [Mitkov & Schmidt, 1998]. For instance, when translating into languages which mark the gender of pronouns, resolution of the anaphoric relation is essential. Unfortunately, the majority of MT systems do not deal with anaphora resolution, and their successful operation usually does not go beyond the sentence level.
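
To make the gender point concrete, here is a minimal Python sketch of choosing a Spanish direct-object pronoun for English ``it'' once the antecedent is known. The two-entry lexicon and the function name are hypothetical; a real system would obtain gender from a full lexicon and would handle many more cases.

```python
# Hypothetical mini-lexicon: English noun -> (Spanish noun, grammatical gender).
LEXICON = {
    "book": ("libro", "masc"),
    "table": ("mesa", "fem"),
}

# Spanish third-person singular direct-object pronouns by gender.
OBJECT_PRONOUN = {"masc": "lo", "fem": "la"}


def translate_it(antecedent_head: str) -> str:
    """Pick the Spanish object pronoun for 'it' from its antecedent's gender."""
    _, gender = LEXICON[antecedent_head]
    return OBJECT_PRONOUN[gender]


print(translate_it("book"))   # lo
print(translate_it("table"))  # la
```

Without the antecedent, the translation of ``it'' cannot be chosen between the two forms, which is why resolution has to precede generation.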

We have employed a computational system that focuses on anaphora resolution in order to improve MT quality and have then measured the improvements. The SUPAR (Slot Unification Parser for Anaphora Resolution) system is presented in the work of Ferrández, Palomar, & Moreno [Ferrández et al., 1999]. This system can deal with several kinds of anaphora with good results. For example, it resolves pronominal anaphora in Spanish with a precision rate of 76.8% [Palomar et al., 2001]; it resolves one-anaphora in Spanish dialogues with a precision rate of 81.5% [Palomar & Martínez-Barco, 2001]; and it resolves definite descriptions in Spanish direct anaphora and bridging references with precision rates of 83.4% and 63.3%, respectively [Munoz et al., 2000]. In the work presented here, we have used an MT system exclusively for pronominal anaphora resolution and translation. Most MT systems do not take this kind of anaphora into account, and therefore pronouns are usually translated incorrectly into the target language. Although we have focused on pronominal anaphora, our approach can be easily extended to other kinds of anaphora, such as one-anaphora or definite descriptions previously resolved by the SUPAR system.

It is important to emphasize that in this work we only resolve and translate personal pronouns in the third person whose antecedents appear before the anaphor--that is, an anaphoric relation between the pronoun and the antecedent is established, and cataphoric relations (in which the antecedent appears after the anaphor) are not taken into account.
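
The distinction between anaphoric and cataphoric relations comes down to the relative order of the two mentions. The short Python sketch below shows one way such a filter could look; the function name and the token offsets are hypothetical and are not taken from the system described here.

```python
def relation_type(antecedent_pos: int, pronoun_pos: int) -> str:
    """Classify a pronoun-antecedent link by the order of the two mentions."""
    if antecedent_pos < pronoun_pos:
        return "anaphoric"
    if antecedent_pos > pronoun_pos:
        return "cataphoric"
    raise ValueError("a mention cannot be its own antecedent")


# Hypothetical token offsets for (antecedent, pronoun) pairs.
links = [(0, 5), (9, 3)]

# Keep only the links in which the antecedent precedes the pronoun.
in_scope = [link for link in links if relation_type(*link) == "anaphoric"]
print(in_scope)  # [(0, 5)]
```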

This paper focuses on the evaluation of the different tasks carried out in our approach that lead to the final task: the translation of the pronominal anaphora into the target language. The main contributions of this work are a presentation and evaluation of the multilingual anaphora resolution module (English and Spanish) and an exhaustive evaluation of the pronominal anaphora translation between these languages.

The paper is organized as follows: Section 2 describes the need for anaphora resolution in MT and the shortcomings of traditional MT systems in handling this phenomenon. Section 3 presents the analysis module of our approach. In Section 4, we identify and evaluate the NLP problems related to pronominal anaphora that are resolved in our system. Section 5 presents the generation module of the system. In Section 6, the generation module is evaluated in order to measure the efficiency of our proposal. Finally, we present our conclusions.

Jesus Peral 2002-12-13