
ESPER: architecture

To narrate a children's story using a variety of synthesized voices, ESPER steps through several stages to identify the speaker of each piece of quoted speech in the story. It must first identify all the pieces of quoted speech in the story, as well as all the characters in the story who are potential speakers. It must then associate each piece of quoted speech with the story character who spoke it. At each processing step, ESPER encapsulates the speech information acquired so far in a markup format such as HTML, Sable (an XML-based speech synthesis markup language) [3], or CSML (Children's Story Markup Language), a markup language created specifically for speech information in children's stories.
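The three stages described above (find the quoted speech, find candidate speakers, associate each quote with a speaker) can be illustrated with a minimal Python sketch. This is not ESPER's implementation, which runs inside Festival. The speech-verb list, the "capitalized word next to a speech verb" heuristic, the nearest-mention rule, and the `<speech speaker="...">` tag are all assumptions made for illustration, and the tag is not actual CSML or Sable syntax.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed, illustrative list of speech verbs (not taken from ESPER).
SPEECH_VERBS = r"(?:said|asked|replied|cried)"
QUOTE_RE = re.compile(r'"([^"]+)"')
# A capitalized word directly after or directly before a speech verb.
MENTION_RE = re.compile(
    r"\b%s\s+([A-Z][a-z]+)|\b([A-Z][a-z]+)\s+%s\b" % (SPEECH_VERBS, SPEECH_VERBS)
)


def find_quotes(text):
    """Stage 1: return (start, end, quote_text) for each double-quoted span."""
    return [(m.start(), m.end(), m.group(1)) for m in QUOTE_RE.finditer(text)]


def find_mentions(text, quotes):
    """Stage 2: return (start, end, name) for candidate speakers in narration."""
    mentions = []
    for m in MENTION_RE.finditer(text):
        group = m.lastindex  # whichever alternative captured the name
        start, end = m.span(group)
        # Skip names that fall inside a quoted span.
        if any(q_start <= start < q_end for q_start, q_end, _ in quotes):
            continue
        mentions.append((start, end, m.group(group)))
    return mentions


def attribute(quotes, mentions):
    """Stage 3: pair each quote with the nearest candidate speaker (or None)."""
    result = []
    for quote in quotes:
        q_start, q_end, _ = quote

        def distance(mention):
            m_start, m_end, _ = mention
            return m_start - q_end if m_start >= q_end else q_start - m_end

        nearest = min(mentions, key=distance, default=None)
        result.append((quote, nearest[2] if nearest else None))
    return result


def to_markup(text, attributed):
    """Wrap each quote in a hypothetical <speech> element naming its speaker."""
    out, pos = [], 0
    for (start, end, quote), speaker in attributed:
        out.append(escape(text[pos:start]))
        out.append(
            "<speech speaker=%s>%s</speech>"
            % (quoteattr(speaker or "unknown"), escape(quote))
        )
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


story = '"Who is there?" asked Rabbit. Pooh said, "It is me."'
quotes = find_quotes(story)
attributed = attribute(quotes, find_mentions(story, quotes))
print(to_markup(story, attributed))
```

A nearest-mention rule like this fails on exactly the hard cases, such as quotes with no nearby speech verb or alternating dialogue, which is why a real system needs richer text analysis than a sketch of this kind.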

ESPER is implemented within the Festival Speech Synthesis framework [4]. Although ESPER itself does not speak, it will be a component of the larger storyteller system. Festival also provides much of the infrastructure that detailed text analysis requires, such as punctuation handling and tokenization, part-of-speech tagging, utterance representation, and extraction of data for machine learning techniques. In addition, we make use of Festival's XML support.



Alan W Black 2003-10-20