Many speech applications have their speech output generated by some computed function. Although there are some truly open domains, such as reading email, many systems are substantially limited. This may be as simple as slot-and-filler templates, in which a known set of names, prices, numbers, etc. is inserted into a set of standard prompts. Many IVR systems still use fully recorded prompts to keep quality high, at the price of a larger resource footprint and reduced flexibility. Our initial investigations into limited domain synthesizers took the form of talking clocks and fixed weather reports, but we have found that the approach also extends to more general dialog systems, especially if a backup method is provided for rare out-of-domain cases.
A key aspect of building a limited domain synthesizer is the design of a prompt list that adequately covers the domain. Ideally, we would like to have an explicit representation of the utterances that can be generated (e.g. the grammar or templates of the generation system), plus information about their frequency of use. From this, a prompt list can be generated that ensures the most frequent (and most important) forms are well represented while coverage still extends to all cases. In a new system, frequency information is not always available, but it can be estimated. In general, the prompts should contain at least one occurrence of each word in the vocabulary in each prosodic context.
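The coverage criterion above can be illustrated with a small prompt selector. This is a minimal sketch under several assumptions that do not come from the text: the templates, slot fillers and frequency weights are invented for a toy talking clock, "prosodic context" is crudely approximated by a word's position in the phrase (initial, medial or final), and greedy set cover is used as the selection strategy. The helper names `expand`, `units` and `select_prompts` are hypothetical.

```python
from itertools import product

# Hypothetical talking-clock templates with assumed relative frequencies.
# A token in braces names a slot to be filled from SLOTS.
TEMPLATES = [
    (["the", "time", "is", "{hour}", "{minute}"], 0.8),
    (["it", "is", "{minute}", "past", "{hour}"], 0.2),
]
SLOTS = {"hour": ["one", "two", "three"], "minute": ["five", "ten"]}


def expand(templates, slots):
    """Yield (words, estimated_frequency) for every filling of every template.

    A template's weight is split evenly over its fillings (an assumption
    standing in for real usage statistics).
    """
    for tokens, weight in templates:
        slot_names = [t[1:-1] for t in tokens if t.startswith("{")]
        fillings = list(product(*(slots[name] for name in slot_names)))
        for filling in fillings:
            values = iter(filling)
            words = [next(values) if t.startswith("{") else t for t in tokens]
            yield words, weight / len(fillings)


def units(words):
    """Label each word with a crude prosodic context: its phrase position."""
    n = len(words)
    return {
        (w, "initial" if i == 0 else "final" if i == n - 1 else "medial")
        for i, w in enumerate(words)
    }


def select_prompts(candidates):
    """Greedily pick utterances until every (word, context) unit is covered.

    Each step takes the candidate covering the most still-uncovered units,
    breaking ties by estimated frequency so frequent forms are chosen first.
    """
    remaining = set().union(*(units(words) for words, _ in candidates))
    chosen = []
    while remaining:
        words, _ = max(
            candidates, key=lambda c: (len(units(c[0]) & remaining), c[1])
        )
        chosen.append(" ".join(words))
        remaining -= units(words)
    return chosen


if __name__ == "__main__":
    for prompt in select_prompts(list(expand(TEMPLATES, SLOTS))):
        print(prompt)
```

Greedy set cover is not guaranteed to find the smallest possible prompt list, but it is simple and naturally favors frequent forms. A realistic system would likely need a richer notion of prosodic context than phrase position alone.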