We use nonsense carrier words to collect all possible diphones, following . Others have successfully used natural carrier phrases, but the argued advantage of natural delivery may also be a disadvantage: speakers may assume too much and fail to produce exactly the desired phones. Within this framework, an experiment could be carried out to compare voices built from nonsense words with one built from naturalistic text, but we know of none that has been performed so far.
It should be noted that delivering diphones is not a particularly natural endeavor. As these phone segments will be extracted both for pitch and duration, it is important that their delivery be consistent, so that joins are more likely to be acceptable.
We believe that the use of nonsense carrier material makes the delivery of the diphones more consistent. Also, the pronunciation of a phonetic string is more clearly defined in these nonsense words than in elicited natural words. We generate carrier phrases so that, where possible, we can extract the diphones from the middle of a word. As it takes time for the human articulation system to start, we do not want to extract diphones from syllables at the start or end of words, unless these transitions to or from silence (SIL) are part of the diphone in question.
For example, from
    SIL T AA B AA B AA SIL
we would extract the diphones [B-AA] and [AA-B], and from
    SIL T AA T EY AE T AA SIL
we would get [EY-AE], as the [T-EY] and [AE-T] are taken from elsewhere, though one could indeed get all three from the one prompt.
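The idea of taking a diphone from the middle of a carrier word can be illustrated with a minimal sketch. It is not the extraction procedure used by the provided scripts: the function names `adjacent_diphones` and `locate_diphone` are hypothetical, and the rule "pick the occurrence nearest the centre of the prompt, earliest on a tie" is an assumption made only for illustration.

```python
def adjacent_diphones(prompt):
    """Return every adjacent phone pair in a space-separated prompt."""
    phones = prompt.split()
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def locate_diphone(prompt, target):
    """Return the index of the occurrence of `target` nearest the centre.

    Assumption for this sketch: the most central occurrence is the one
    least affected by the start or end of the word. Ties go to the
    earliest occurrence.
    """
    pairs = adjacent_diphones(prompt)
    centre = (len(pairs) - 1) / 2
    positions = [i for i, d in enumerate(pairs) if d == target]
    if not positions:
        raise ValueError(f"{target} does not occur in prompt: {prompt}")
    return min(positions, key=lambda i: abs(i - centre))


prompt = "SIL T AA B AA B AA SIL"
print(adjacent_diphones(prompt))
print(locate_diphone(prompt, "B-AA"))
```

In this prompt B-AA occurs twice, and the sketch selects the occurrence in the middle syllable rather than the one adjacent to the final silence.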
For each class of diphone in the language (consonant-vowel, vowel-consonant, vowel-vowel, consonant-consonant, silence-phone, phone-silence, and any other special sets such as syllabics, consonant clusters, and allophones), we build simple carrier sets and loop through all possible values, generating a long list of phone strings that together contain every diphone in the language to be recorded. Basic scripts are provided for this that can be adapted for other languages, while specific scripts are provided for currently supported languages.
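The looping described above might be sketched as follows. This is an illustration only and not the provided scripts: the tiny phone sets, the carrier shapes (modelled on the two example prompts), and the names `cvc_prompts`, `vv_prompts`, and `build_prompt_list` are all assumptions, and only the consonant-vowel, vowel-consonant, and vowel-vowel classes are covered.

```python
from itertools import chain, product

# Toy phone sets, assumed for illustration; a real language has many more.
CONSONANTS = ["B", "T"]
VOWELS = ["AA", "EY", "AE"]


def cvc_prompts(consonants, vowels):
    """Yield (target diphones, phone list) for carriers of the form T AA C V C AA."""
    for c, v in product(consonants, vowels):
        yield [f"{c}-{v}", f"{v}-{c}"], ["SIL", "T", "AA", c, v, c, "AA", "SIL"]


def vv_prompts(vowels):
    """Yield (target diphones, phone list) for carriers of the form T AA T V V T AA."""
    for v1, v2 in product(vowels, repeat=2):
        yield [f"{v1}-{v2}"], ["SIL", "T", "AA", "T", v1, v2, "T", "AA", "SIL"]


def build_prompt_list(consonants, vowels):
    """Collect prompts, keeping only those that contribute an uncovered diphone."""
    covered = set()
    prompts = []
    for targets, phones in chain(cvc_prompts(consonants, vowels), vv_prompts(vowels)):
        new_targets = [d for d in targets if d not in covered]
        if new_targets:
            covered.update(new_targets)
            prompts.append((new_targets, " ".join(phones)))
    return prompts


for targets, prompt in build_prompt_list(CONSONANTS, VOWELS):
    print(", ".join(targets), "<-", prompt)
```

Further classes (consonant-consonant, silence-phone, phone-silence, and any special sets) would each need their own carrier generator added to the chain.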