
Recording in style

When building a unit selection synthetic voice, knowing the most likely usage pattern makes it easier to choose the most suitable style in which to record that voice.

To show explicitly how the same speaker may use different styles, and how listeners may require different styles, we constructed a voice designed to deliver the weather. This is very much a limited domain voice with a well-defined, explicit vocabulary and set of templates. We constructed 100 sentences that gave full coverage of temperature range, outlook, wind speed and direction, etc. We then recorded the same set of sentences in two distinct styles:

Genki
: from the Japanese word for healthy, upbeat.
News
: direct, "no-nonsense".
A typical generated sentence would be of the form:
At 7 P.M., the temperature is sixty-eight degrees Fahrenheit. The wind is from the north, at eight miles per hour. The barometric pressure is thirty inches, and steady.
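
A sentence of this form can be produced by filling a fixed template from a closed vocabulary. The following is only a minimal sketch of that idea, not the code used to build the voices described here: the function names, the slot names, and the lists of wind directions and pressure trends are all assumptions made for illustration.

```python
# Hypothetical sketch of a limited domain weather template.
# The vocabularies below are illustrative assumptions, not the actual prompt set.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
WIND_DIRECTIONS = {"north", "south", "east", "west"}  # assumed vocabulary
PRESSURE_TRENDS = {"steady", "rising", "falling"}     # assumed vocabulary


def number_to_words(n):
    """Spell out an integer from 0 to 99, e.g. 68 -> 'sixty-eight'."""
    if not 0 <= n < 100:
        raise ValueError("only 0..99 is covered in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def weather_sentence(hour, meridiem, temp_f, wind_dir, wind_mph,
                     pressure_in, trend):
    """Fill the fixed template; reject words outside the closed vocabulary."""
    if wind_dir not in WIND_DIRECTIONS:
        raise ValueError(f"wind direction not in vocabulary: {wind_dir}")
    if trend not in PRESSURE_TRENDS:
        raise ValueError(f"pressure trend not in vocabulary: {trend}")
    return (
        f"At {hour} {meridiem}, the temperature is "
        f"{number_to_words(temp_f)} degrees Fahrenheit. "
        f"The wind is from the {wind_dir}, at "
        f"{number_to_words(wind_mph)} miles per hour. "
        f"The barometric pressure is {number_to_words(pressure_in)} inches, "
        f"and {trend}."
    )


if __name__ == "__main__":
    print(weather_sentence(7, "P.M.", 68, "north", 8, 30, "steady"))
```

Because every slot draws from a small, known set of words, a modest set of recorded prompts can cover all the words the template can produce, which is what makes a limited domain voice of this kind practical.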
The output quality of each of these synthesizers is excellent by any standard, but the styles are different (see http://cepstral.com/demos).

When we play these two synthesizers to people, we get different reactions. Although we only have anecdotal results, people who actually want to know the weather prefer the News-style synthesizer, while people who wish to be impressed by high-quality synthesis prefer the Genki-style voice. Neither synthesizer can be criticized for being unnatural, but the style in which the information is delivered makes a significant difference to the listeners' views.


Alan W Black 2003-09-07