- ... pronouns1
- This kind of pronoun is presented in
detail in Section 4.1.
- ... Systran2
- A free trial of the commercial product
SYSTRANLinks (copyright 2002 by SYSTRAN S.A.) was used to
translate all the corpora used in the evaluation of our approach
between English and Spanish. (URL =
http://w4.systranlinks.com/config, visited on
06/22/2002).
- ... correctly3
- In this paper, we
have used the symbols (S) and (E) to represent Spanish and English
texts, respectively. The symbol "Ø" indicates the
presence of the omitted pronoun. In the examples, the pronoun and
the antecedent have an index; co-indexing indicates co-reference
between them.
- ... descriptions4
- One-anaphora has the following
structure in English: a determiner and the pronoun one with
some premodifiers or postmodifiers (the red one; the
one with the blue bow). In Spanish, this kind of anaphor consists
of a noun phrase in which the noun has been omitted (el
rojo; el que tiene el lazo azul). In definite
descriptions, anaphors are formed by definite noun phrases that
refer to objects that are usually uniquely determined in the
context.
- ... information5
- The SS stores the
following information for each constituent: constituent name (NP,
PP, etc.), semantic and morphological information, discourse
marker (identifier of the entity or discourse object), and the SS
of its subconstituents.
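The fields listed in this footnote can be pictured as a small recursive record. The following is only a minimal sketch under stated assumptions: the class name `SSNode`, the field names, and the use of dictionaries for semantic and morphological information are hypothetical choices made for illustration, not the representation used in the system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SSNode:
    """Hypothetical record holding the information the footnote lists for one constituent."""
    name: str                                                      # constituent name, e.g. "NP" or "PP"
    semantic: Dict[str, str] = field(default_factory=dict)         # semantic information (assumed key/value form)
    morphological: Dict[str, str] = field(default_factory=dict)    # morphological information (assumed key/value form)
    discourse_marker: Optional[str] = None                         # identifier of the entity or discourse object
    subconstituents: List["SSNode"] = field(default_factory=list)  # SS of each subconstituent

    def markers(self) -> List[str]:
        """Collect the discourse markers of this node and its subconstituents, depth-first."""
        found = [self.discourse_marker] if self.discourse_marker else []
        for child in self.subconstituents:
            found.extend(child.markers())
        return found


# A prepositional phrase that contains one noun phrase with discourse marker "e1".
noun_phrase = SSNode("NP", {"type": "object"}, {"number": "singular"}, "e1")
prep_phrase = SSNode("PP", subconstituents=[noun_phrase])
print(prep_phrase.markers())
```

Nesting each constituent's record inside its parent mirrors the last item of the footnote (the SS of its subconstituents), so an entity mentioned deep inside a phrase remains reachable from the top-level constituent.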
- ... stage6
- In the evaluation of our approach, only one corpus,
the English corpus SemCor, has all its content words
annotated with their WordNet sense; this sense has been used to
identify the semantic category of the word. The remaining corpora
do not have information about the senses of the content words;
therefore, a set of heuristics has been used to identify their
semantic categories. A WSD module [Montoyo & Palomar, 2000]
is currently being developed in our Research Group and will be incorporated into our system in the future.
- ... LEXESP7
- The LEXESP corpus belongs to the project of
the same name, carried out by the Psychology Department of the
University of Oviedo and developed by the Computational
Linguistics Group of the University of Barcelona, with the
collaboration of the Language Processing Group of the Catalonia
University of Technology, Spain.
- ... insignificant8
- In order to compare our
system with other systems, in Section 6.2 we evaluate pronoun
translation (including zero pronouns) between Spanish and English
using the commercial product SYSTRANLinks and the Spanish LEXESP
corpus. The evaluation highlights the deficiencies of zero-pronoun
detection, resolution, and translation (out of 559 anaphoric,
third-person zero pronouns in the LEXESP corpus, only 266 were
correctly translated into English, a precision of only 47.6%).
- ... information9
- It is important to
mention here that semantic information was not available for
the Spanish corpora.
- ... SUPAR10
- A detailed study of these implementations in SUPAR
is presented by Palomar et al. [Palomar, M., et al., 2001].
- ... Hobbs11
- Hobbs's baseline is frequently used as a point of
comparison for work on anaphora resolution. Hobbs's algorithm does not work as well as ours because
it relies on a full parse of the text. Furthermore,
the manner in which Hobbs's algorithm explores the syntactic tree is
not the best one for Spanish, since it is nearly a
free-word-order language.
- ... resolution12
- As
previously mentioned, only anaphoric, third-person, personal
pronouns will be resolved in order to translate them into the
target language.
- ... MTI13
- This corpus was provided by the Computational
Linguistics Research Group of the School of Humanities, Languages
and Social Studies, University of Wolverhampton, England. The
corpus is annotated for anaphora, indicating each anaphor and
its correct antecedent.
- ... entities14
- If we use a
basic ontology based on semantic features, at the top level,
entities could be classified into three main categories:
person, animal, and object.
- ... anaphor15
- The sentences of the SemCor corpus are very long
(with an average of 24.3 words per sentence). This fact implies a
large number of candidates per anaphor (an average of 15.2) after applying constraints.
- ... anaphor16
- The sentences of the MTI corpus are not very long
(with an average of 15.5 words per sentence). However, the
number of candidates per anaphor after applying constraints is
high (an average of 13.6).
- ... presented17
- As mentioned earlier, all the results presented here were automatically
obtained after the anaphoric annotation of each pronoun. After
the tagging and the partial parsing of the source text,
pronominal anaphora were resolved and translated into the
target language. None of the intermediate outputs needed to be
adjusted manually in order to be processed subsequently.
- ... information18
- Hobbs proposed the use
of semantic information, in the form of selectional restrictions, as a
straightforward extension of his method to improve the
results obtained in anaphora resolution.
- ... texts19
- In order to detect pleonastic
it pronouns in AGIR, we constructed a set of pattern-based
rules that identify this type of
pronoun. These rules were based on the work
of [Lappin & Leass, 1994; Paice & Husk, 1987; Denber, 1998], which
dealt with this problem in a similar way. We have used the
information provided by the POS tagger in order to improve the
detection of the different patterns. We have evaluated the
method using journalistic texts from a portion of the Federal
Register corpus that contains a set of 313 documents (156,831
words). In the detection of pleonastic it pronouns, a
precision (P) of 88.7% (568 out of 640) was obtained. Finally, it is
important to point out the high percentage of it pronouns
in the test corpus that are pleonastic (32.9%). This fact
demonstrates the importance of the correct detection of this kind
of pronoun in any MT system.
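To make the idea of pattern-based detection concrete, the following minimal sketch flags a sentence when the pronoun it occurs in a construction that is typically pleonastic. Everything in it is an assumption made for illustration: the word lists, the four patterns, and the function name `is_pleonastic` are hypothetical and loosely in the spirit of the cited work; they are not the rules used in AGIR, and unlike AGIR the sketch works on raw strings without POS tags.

```python
import re

# Hypothetical word lists; real rule sets are considerably larger.
MODAL_ADJ = r"(?:necessary|possible|important|likely|certain|useful|advisable)"
COG_VERB = r"(?:recommended|thought|believed|known|assumed|expected)"

# Each pattern describes one construction in which "it" usually has no antecedent.
PATTERNS = [
    re.compile(rf"\bit\s+(?:is|was)\s+{MODAL_ADJ}\s+(?:that|to|for)\b", re.I),
    re.compile(rf"\bit\s+(?:is|was)\s+{COG_VERB}\s+that\b", re.I),
    re.compile(r"\bit\s+(?:seems|appears|follows)\s+(?:that|to)\b", re.I),
    re.compile(r"\bit\s+(?:is|was)\s+(?:raining|snowing)\b", re.I),
]


def is_pleonastic(sentence: str) -> bool:
    """Return True if any illustrative pattern matches the sentence."""
    return any(pattern.search(sentence) for pattern in PATTERNS)


for example in (
    "It is important to point out the results.",
    "It seems that the committee agreed.",
    "The dog saw the ball and chased it.",
):
    print(is_pleonastic(example), "-", example)
```

A purely lexical matcher of this kind cannot tell apart forms that are ambiguous in part of speech, which is one reason tagger output, as used in the approach described above, can sharpen such patterns.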
- ... on20
- In the automatic
evaluation, a pronoun was considered correctly translated when
the pronoun proposed by the system was the same as that proposed by
the human annotator. With this criterion, we evaluated the correct
application of the corresponding morphological rule.
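The matching criterion described in this footnote amounts to counting exact agreements between two aligned lists of pronouns. The sketch below illustrates that count under stated assumptions: the function name `translation_precision`, the one-to-one alignment of the two lists, and the case-insensitive comparison are hypothetical choices, not details taken from the evaluation itself.

```python
from typing import Sequence


def translation_precision(system: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of pronouns where the system output equals the annotator's choice.

    Assumes both sequences are aligned item by item (one entry per source pronoun).
    """
    if len(system) != len(reference):
        raise ValueError("system and reference must have the same length")
    if not system:
        raise ValueError("at least one pronoun is required")
    # Case and surrounding whitespace are ignored (an assumption of this sketch).
    correct = sum(
        s.strip().lower() == r.strip().lower()
        for s, r in zip(system, reference)
    )
    return 100.0 * correct / len(system)


# Toy data: the system disagrees with the annotator on the second pronoun only.
system_output = ["he", "it", "they", "she"]
annotator_choice = ["he", "she", "they", "she"]
print(translation_precision(system_output, annotator_choice))
```

Because the comparison is on the pronoun form alone, a mismatch signals that a different gender, number, or person was generated than the annotator chose.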
- ... web--BABELFISH21
- URL =
http://www.babelfish.altavista.com (visited on
03/11/2002).
- ... SYSTRANLinks22
- URL =
http://w4.systranlinks.com/config (visited on
06/22/2002).