The final processing stage is to assign each NEW quote to a speaker from the cast of characters identified in the previous step. We wanted to determine how well this can be done with a minimum of deep analysis.
We analyzed stories within our development corpus to devise conditions that seemed adequate for speaker identification. A number of simplifying assumptions were made that turned out to be very reasonable. First, we assumed that the speaker is referenced in the same paragraph in which the quoted speech occurs; this is, in fact, true in a high percentage of cases. We also assumed that character names within the quote itself are not the speaker, even though some speakers refer to themselves in the third person.
Through some basic analysis of test stories we arrived at the following simple rule: if a character name precedes the quoted speech within the same paragraph, assign it as the speaker; otherwise, use the named character following the quoted speech. We additionally augmented this rule to favor character names that are proper names, using the character CLASS information acquired from the character-identification stage. However, results show that this addition yielded only a minimal improvement in accuracy (approx. +1%). An example of the CSML:
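The rule above can be sketched in a few lines of code. This is a minimal illustration, not ESPER's actual implementation: the function name, the (offset, name, class) representation of character mentions, and the tie-breaking details are all assumptions for the sake of the example.

```python
# Hypothetical sketch of the speaker-assignment rule: prefer the nearest
# character name preceding the quote within the paragraph, falling back
# to the first character name following it, with an optional bias toward
# proper names (the CLASS-based augmentation described in the text).

def assign_speaker(quote_start, quote_end, characters):
    """characters: list of (offset, name, char_class) tuples,
    all within the same paragraph as the quote."""
    preceding = [c for c in characters if c[0] < quote_start]
    following = [c for c in characters if c[0] >= quote_end]

    def pick(candidates, take_nearest_to_quote_end):
        if not candidates:
            return None
        # Favor proper names when any are available (approx. +1% in our tests).
        proper = [c for c in candidates if c[2] == "properName"]
        pool = proper or candidates
        chosen = pool[-1] if take_nearest_to_quote_end else pool[0]
        return chosen[1]

    # Preceding names win; otherwise fall back to the following name.
    return pick(preceding, True) or pick(following, False)
```

For the Alice fragment below, a quote followed by `said Alice` would yield ALICE via the fallback branch, while a quote preceded by a mention of Alice would resolve directly.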
<QUOTE TYPE="NEW" SPEAKER="ALICE"> `As wet as ever,'</QUOTE> said <CHARACTER NAME="ALICE" CLASS="properName"> Alice </CHARACTER> in a melancholy tone: <QUOTE TYPE="CONT"> `it doesn't seem to dry me at all.'</QUOTE> <QUOTE TYPE="NEW" SPEAKER="THE_DODO"> `In that case,'</QUOTE> said <CHARACTER ID="THE_DODO" CLASS="defNP"> the Dodo</CHARACTER> solemnly...

ESPER's speaker-identification performance is as follows:
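Since CSML's exact grammar is not spelled out here, a lightweight pattern-based extraction is one plausible way to consume such annotations. The sketch below is an assumption covering only the attribute shapes visible in the example (NAME or ID, plus CLASS); it is not part of ESPER.

```python
import re

# Hypothetical extractor for CHARACTER annotations in a CSML fragment.
# Matches either NAME or ID (both appear in the example), tolerating
# optional whitespace around "=" as seen in the fragment above.
CHAR_RE = re.compile(
    r'<CHARACTER\s+(?:NAME|ID)\s*=\s*"([^"]+)"\s+CLASS="([^"]+)"\s*>'
    r'(.*?)</CHARACTER>',
    re.DOTALL,
)

csml = ('<QUOTE TYPE="NEW" SPEAKER="ALICE">`As wet as ever,\'</QUOTE> said '
        '<CHARACTER NAME="ALICE" CLASS="properName">Alice</CHARACTER>')

characters = [(m.group(1), m.group(2), m.group(3).strip())
              for m in CHAR_RE.finditer(csml)]
```

Here `characters` would hold `("ALICE", "properName", "Alice")`, which is exactly the information the speaker-assignment rule needs.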
We observe a rather high variance in ESPER's performance between the two test stories. It is evident that Chapter 3 of Alice contains more consistent speech-speaker relationships, so that ESPER is able to achieve a relatively high accuracy in finding the speakers. On the other hand, there is very little regularity in the speech-speaker relationships within Little Tuk, which results in a degradation in performance. Note that these two stories were selected for their contrasting styles in order to demonstrate the difficulties of speaker identification.
Somewhat surprisingly, two criteria that we thought would be necessary to achieve this level of accuracy proved less important than expected: giving higher preference to character names in the proximity of ``speaking'' words such as said, cried, etc., and anaphora resolution. Nevertheless, these conditions are not negligible and will be investigated further in future work.
Detailed error analysis indicates that there are several factors hindering speaker identification. The most prominent one seems to be the difficulty of adapting to novel text structures within a story. For example, the Alice test story contains an entire section artistically structured to resemble the shape of a mouse's tail, and some of these stylistic conventions were easily misinterpreted as paragraph breaks by our system. Other relatively minor sources of error include the problems of differentiating quoted labels from real quoted speech (as noted earlier), and attributing speech to characters who are mentioned outside the paragraph containing the speech.
We observe that almost all errors are attributable to the incorrect character being selected as the speaker. This is preferable to the alternative scenario in which we are unable to find any speaker at all for a given piece of quoted speech. Hence we are confident that by refining the disambiguation algorithm in ESPER, we can make sizable improvements to the speaker-identification task. Furthermore, given more time we would prefer to train this module, but currently we have neither sufficient data nor an appropriate model within which to train. Our current work continues to investigate this matter.