ESPER offers a basic process for marking up raw text from children's stories with quotes, characters, and the speaker of each quoted passage. The framework can be extended beyond the basic rules and trained models that we currently provide.
The resulting markup can then be rendered as speech, with an appropriate hand-specified voice for each character, through a standard speech synthesis markup language.
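As an illustration of this rendering step, the following is a minimal sketch, not ESPER's actual implementation. It assumes the speaker-attributed output is available as a list of (speaker, text) pairs and that the target is W3C SSML with one `<voice>` element per segment. The function name `to_ssml`, the `VOICES` table, and the voice names are hypothetical and stand in for whatever the chosen synthesizer accepts.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical, hand-specified mapping from character to synthesizer voice.
# The voice names are placeholders, not voices of any particular engine.
VOICES = {
    "narrator": "voice_narrator",
    "wolf": "voice_low_male",
    "pig": "voice_child",
}


def to_ssml(segments, voices, default="narrator"):
    """Render (speaker, text) pairs as an SSML document.

    A speaker of None, or one with no entry in `voices`, falls back to
    the `default` voice (assumed here to be the narrator).
    """
    lines = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    ]
    for speaker, text in segments:
        key = speaker if speaker in voices else default
        # Escape the text and quote the attribute so the output stays valid XML.
        lines.append(
            f"  <voice name={quoteattr(voices[key])}>{escape(text)}</voice>"
        )
    lines.append("</speak>")
    return "\n".join(lines)


# Assumed example input: narration followed by one attributed quote.
EXAMPLE = [
    (None, "The wolf knocked & said,"),
    ("wolf", "Little pig, let me come in."),
]
ssml = to_ssml(EXAMPLE, VOICES)
```

Keeping the voice table separate from the rendering function means the hand-specified assignments could later be replaced by an automatic selection without changing how the markup is produced.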
Our future work will focus on improving the current coverage through further analysis and on expanding the types of data in our markup. Beyond resolving speakers, we would like to identify properties of each speaker that might allow automatic selection of an appropriate voice, as well as the style in which the speech should be delivered.