
Story Data

We examined a large number of children's stories for use as data. Two major collections among them are used for development and evaluation: the children's stories of Hans Christian Andersen and works by Lewis Carroll (focusing mainly on Alice in Wonderland). These two authors were selected because each is stylistically diverse and their writing styles contrast with each other, which is useful for testing how flexibly ESPER performs. Note that for our development/testing corpus, we hand-selected stories with an above-average number of quoted speech segments.


Table 6: Statistics for the H.C. Andersen and L. Carroll Story Collections

Story Collection   Total # of Stories   Avg # of Words   Avg # of Quoted Speech Segments
Andersen           130                  2898             23
Carroll            3                    20447            539
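
The text does not say how the counts in Table 6 were produced. As a rough illustration only, the sketch below shows one way statistics of this kind could be computed from plain-text stories. It assumes that a word is any whitespace-separated token and that a quoted speech segment is any span enclosed in a pair of double quotes; the names QUOTE_RE, story_stats, and collection_stats are hypothetical and are not part of ESPER.

```python
import re
from statistics import mean

# Assumption: a quoted speech segment is text between a pair of straight
# double quotes or a pair of curly double quotes. Real stories may need more.
QUOTE_RE = re.compile(r'"[^"]+"|\u201c[^\u201d]+\u201d')


def story_stats(text):
    """Return (word_count, quoted_segment_count) for one story string."""
    words = len(text.split())              # whitespace-separated tokens
    quotes = len(QUOTE_RE.findall(text))   # non-overlapping quoted spans
    return words, quotes


def collection_stats(stories):
    """Average the per-story counts over a list of story strings."""
    per_story = [story_stats(s) for s in stories]
    return {
        "stories": len(per_story),
        "avg_words": mean(w for w, _ in per_story),
        "avg_quoted_segments": mean(q for _, q in per_story),
    }


if __name__ == "__main__":
    # Toy input, not taken from either collection.
    sample = ['"Hello," said Alice. "Who are you?"', "The duck swam away."]
    print(collection_stats(sample))
```

A quote-matching pattern like this one would miscount speech that continues across paragraphs without a closing quote, so any figures it yields should be treated as approximate.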



Alan W Black 2003-10-20