Building Synthetic Voices
This tutorial will give an overview of the basic techniques available
for building synthetic voices for speech synthesis systems, including
an actual example of voice building.
The first part will describe the basic components of a speech synthesis
system, covering the state-of-the-art techniques used within each:
- Text Analysis: addressing the expansion of symbols, numbers,
acronyms, etc., and resolving homographs
- Linguistic Analysis: "from words to how to say them",
addressing issues in lexical entries, letter-to-sound rules and
prosodic modeling (phrasing, intonation and duration)
- Waveform Synthesis: "from phones and prosody to waveforms",
describing basic techniques for making computers talk using
recorded prompts, diphones, and general unit selection synthesis
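
As a rough illustration of the text analysis stage, the sketch below
expands small numbers and a few abbreviations into words. It is only a
toy under stated assumptions: the abbreviation table, the function names
and the 0 to 999 range are invented for this example and are not taken
from Festival or FestVox, and real systems also handle dates, currency,
ordinals and homograph disambiguation.

```python
import re

# Hypothetical toy abbreviation table; a real system needs far larger,
# language-specific resources.
ABBREVIATIONS = {"etc.": "et cetera", "vs.": "versus"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (illustrative limit)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace known abbreviations and 1-3 digit numbers with words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)


print(normalize("room 42 vs. room 7"))
```

Tokens the sketch does not recognise pass through unchanged, which is
where a fuller text analysis module would apply further rules.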
The second part will describe the basic stages required in building
new synthetic voices (in English or other languages):
- building a text analysis system
- building a lexicon and letter-to-sound rules
- building phrasing, intonation and duration models
- recording data for concatenative speech synthesis
(diphones, unit selection and/or limited domain)
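
To make the idea of diphone units concrete, the sketch below lists the
phone-to-phone transitions needed to synthesize one utterance. The
function name, the "pau" silence label and the example phone string are
assumptions made for illustration, not the FestVox tools themselves.

```python
def phones_to_diphones(phones, silence="pau"):
    """Return the diphone names (adjacent phone pairs) for an utterance.

    The utterance is padded with a silence phone on both sides so that
    the transitions into the first phone and out of the last phone are
    included. The "pau" label is an assumed name for silence.
    """
    padded = [silence, *phones, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Illustrative phone string for "hello"; the phone set is an assumption.
print(phones_to_diphones(["hh", "ax", "l", "ow"]))
```

An utterance of N phones needs N + 1 diphones once silence padding is
included, which is why a diphone recording script aims to cover every
phone pair that can occur in the language.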
This tutorial is based on the techniques, documentation and tools
freely distributed through CMU's FestVox project
(http://festvox.org/) leading to voices
that can be run on Edinburgh University's Festival Speech Synthesis