To build a unit selection speech synthesizer for Hindi, our first task was to define the phoneme set and then to construct a set of prompts that covers the language as well as possible. We generated a prompt list covering most of the high-frequency syllables in Hindi, where a syllable is considered high-frequency if its occurrence count in a given text corpus is relatively high. We used the large text corpus available with syllable frequency counts for Indian languages. This corpus contains text on subjects ranging from philosophy to short stories. A sentence was selected from this corpus if it contained at least one high-frequency syllable not present in any previously selected sentence. A linguist then examined the selected sentences, primarily to break longer sentences into shorter ones and to make the shorter sentences meaningful and easy to utter. The resulting sentences were recorded by a female speaker, yielding a speech corpus of about 96 minutes. The recording was done in a quiet room with a noise-canceling microphone, using the recording facilities of a typical multimedia computer system. The speech database was labeled at the phone level, and the label boundaries were hand-corrected.
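The selection criterion above amounts to a single pass over the corpus that keeps a sentence only when it adds at least one not-yet-covered high-frequency syllable. The sketch below illustrates this under stated assumptions: sentences are already syllabified into lists of syllable strings (Hindi syllabification itself is not shown), and "relatively high" frequency is approximated by taking the `top_k` most frequent syllables. The function names `high_frequency_syllables` and `select_prompts`, the `top_k` cutoff, and the romanized toy syllables are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def high_frequency_syllables(corpus: Iterable[Sequence[str]], top_k: int) -> Set[str]:
    """Return the top_k most frequent syllables in a pre-syllabified corpus.

    Assumption: a fixed top-k cutoff stands in for "relatively high" frequency.
    """
    counts = Counter(syl for sentence in corpus for syl in sentence)
    return {syl for syl, _ in counts.most_common(top_k)}


def select_prompts(sentences: Sequence[Sequence[str]], targets: Set[str]) -> List[int]:
    """Single pass in corpus order: keep a sentence if it contains at least one
    target syllable not already present in the previously selected sentences."""
    covered: Set[str] = set()
    selected: List[int] = []
    for i, sentence in enumerate(sentences):
        new = (set(sentence) & targets) - covered
        if new:
            selected.append(i)
            covered |= new  # mark these target syllables as covered
    return selected


# Toy example with hypothetical romanized syllables.
sentences = [["ka", "ma", "la"], ["ka", "ma"], ["ra", "ma"], ["na", "ka"]]
targets = high_frequency_syllables(sentences, top_k=4)
print(sorted(targets))                     # ['ka', 'la', 'ma', 'ra']
print(select_prompts(sentences, targets))  # [0, 2]
```

Sentence 1 is skipped because "ka" and "ma" are already covered by sentence 0, and sentence 3 is skipped because "na" is not among the target syllables. Because the pass is order-dependent, the resulting prompt list depends on the order in which the corpus is scanned.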
The speech data used in this study is about 90 minutes long and comprises 620 utterances containing 2344 distinct syllables (22960 realizations), 1414 distinct diphones (51282 realizations) and 48 phones (51282 realizations).
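Inventory figures of this kind pair the number of distinct units with their number of realizations (tokens) in the labeled data. As a minimal sketch, assuming each utterance is given as its sequence of phone labels and a diphone is taken to be any pair of adjacent phones within one utterance, both figures can be counted as follows. The helper `unit_inventory` and the phone labels (including `sil`) are hypothetical and do not reflect the authors' labeling scheme or counting procedure.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def unit_inventory(utterances: Sequence[Sequence[str]]) -> Dict[str, Tuple[int, int]]:
    """Return (distinct units, total realizations) for phones and diphones.

    Assumption: a diphone is a pair of adjacent phones inside one utterance;
    pairs spanning an utterance boundary are not counted.
    """
    phones: Counter = Counter()
    diphones: Counter = Counter()
    for phs in utterances:
        phones.update(phs)
        diphones.update(zip(phs, phs[1:]))  # adjacent phone pairs
    return {
        "phones": (len(phones), sum(phones.values())),
        "diphones": (len(diphones), sum(diphones.values())),
    }


# Toy example with hypothetical phone labels.
utts = [["sil", "k", "a", "sil"], ["sil", "m", "a", "sil"]]
print(unit_inventory(utts))  # {'phones': (4, 8), 'diphones': (5, 6)}
```

In the toy example the diphone ("a", "sil") occurs in both utterances, so there are 5 distinct diphones but 6 realizations.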