Evaluation of Zero-Pronoun Detection

For zero-pronoun detection, the training phase was used to modify the grammar so as to improve partial parsing and clause splitting. After this training, we conducted a blind test over the entire test corpus. The evaluation involves two subtasks. First, each verb must be detected; this is easily accomplished because both corpora had already been tagged. Second, the verbs must be classified into two categories: (a) verbs whose subjects have been omitted, and (b) verbs whose subjects have not. The results obtained with the LEXESP (LX) and Blue Book (BB) corpora appear in Table 2.

Table 2: Zero-pronoun detection, evaluation phase
        Verbs with subject omitted                  Verbs with subject not omitted
        1st    P(%)   2nd    P(%)   3rd    P(%)     1st    P(%)   2nd    P(%)   3rd    P(%)
LX      240    96.7   54     98.1   1,227  97.1     31     71     17     94.1   1,085  83.3
        PRECISION = 97.1%                           PRECISION = 83.1%
BB      0      0      0      0      121    97.5     0      0      0      0      351    82
        PRECISION = 97.5%                           PRECISION = 82.0%

The table is divided into two parts, corresponding to categories (a) and (b) above. For each category, it gives the number of verbs in the first, second, and third person, together with the precision (P) obtained for each. Precision is defined as the number of verbs correctly classified (subject omitted or not) divided by the total number of verb classifications attempted for each type. For example, in the LEXESP corpus, 1,227 third-person verbs were classified as having their subjects omitted, with a precision of 97.1%.

Discussion. In the detection of zero pronouns, the LEXESP corpus yielded precisions of 97.1% and 83.1% for verbs whose subjects were omitted and were not omitted, respectively; the BB corpus yielded precisions of 97.5% and 82.0%. Over both corpora combined, the overall precision for this task was 90.4% (2,825 out of a total of 3,126 verbs).
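As a rough consistency check on these figures, the per-cell counts and precisions in Table 2 can be combined into verb-weighted averages. The Python sketch below is illustrative only: the names `TABLE_2`, `select`, and `weighted_precision` are hypothetical, and because the table reports precisions already rounded, the recomputed values can be expected to match the reported ones only to within about a tenth of a percentage point.

```python
# (corpus, category, person) -> (verbs classified, reported precision in %),
# transcribed from Table 2. Cells with zero verbs (BB, 1st and 2nd person)
# are left out because they do not affect any average.
TABLE_2 = {
    ("LX", "omitted", 1): (240, 96.7),
    ("LX", "omitted", 2): (54, 98.1),
    ("LX", "omitted", 3): (1227, 97.1),
    ("LX", "not_omitted", 1): (31, 71.0),
    ("LX", "not_omitted", 2): (17, 94.1),
    ("LX", "not_omitted", 3): (1085, 83.3),
    ("BB", "omitted", 3): (121, 97.5),
    ("BB", "not_omitted", 3): (351, 82.0),
}


def select(corpus=None, category=None):
    """Return the (count, precision) cells matching the given filters."""
    return [
        cell
        for (c, cat, _person), cell in TABLE_2.items()
        if corpus in (None, c) and category in (None, cat)
    ]


def weighted_precision(cells):
    """Verb-weighted mean of the per-cell precisions, in percent.

    Assumption: each reported precision is exact for its cell; in the
    table it is rounded, so the result is only approximate.
    """
    total = sum(n for n, _p in cells)
    return sum(n * p for n, p in cells) / total


total_verbs = sum(n for n, _p in TABLE_2.values())
overall = weighted_precision(select())

for corpus in ("LX", "BB"):
    for category in ("omitted", "not_omitted"):
        value = weighted_precision(select(corpus, category))
        print(f"{corpus} {category}: {value:.2f}%")
print(f"overall: {overall:.2f}% over {total_verbs} verbs")
```

Under these assumptions, the four per-category values and the overall value land within rounding distance of the reported 97.1%, 83.1%, 97.5%, 82.0%, and 90.4%.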

From these results, we have extracted the following conclusions:

Because ours is the first study carried out specifically on Spanish texts, and because the design of the detection stage depends mainly on the structure of the language in question, we have not compared our results with those of other published work. Such comparisons would not be meaningful.$^{8}$

Finally, we emphasize how frequent this phenomenon is in Spanish. Across both corpora, the subject is omitted for 52.5% (1,642 out of 3,126) of the verbs. The phenomenon is more frequent in narrative texts (57.3% in the LEXESP corpus) than in technical manuals (25.6% in the BB corpus). These percentages show how important it is for a machine translation (MT) system to detect these pronouns correctly so that they can be translated appropriately into the target language.
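The omission rates quoted here follow directly from the verb counts in Table 2. The short Python sketch below recomputes them; the names `VERB_COUNTS` and `omission_rate` are hypothetical and serve only to illustrate the arithmetic.

```python
# Verb counts per corpus, summed over grammatical person from Table 2:
# (verbs with subject omitted, verbs with subject not omitted).
VERB_COUNTS = {
    "LEXESP": (240 + 54 + 1227, 31 + 17 + 1085),
    "BB": (121, 351),
}


def omission_rate(omitted, not_omitted):
    """Percentage of verbs whose subject is omitted."""
    return 100.0 * omitted / (omitted + not_omitted)


rates = {name: omission_rate(*counts) for name, counts in VERB_COUNTS.items()}

total_omitted = sum(omitted for omitted, _ in VERB_COUNTS.values())
total_verbs = sum(omitted + kept for omitted, kept in VERB_COUNTS.values())
overall_rate = 100.0 * total_omitted / total_verbs

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of verbs have an omitted subject")
print(f"both corpora: {overall_rate:.1f}% ({total_omitted} of {total_verbs})")
```

The totals recovered this way are 1,642 omitted subjects out of 3,126 verbs, which matches the counts given in the text.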
