
Voweling

As mentioned earlier, ordinary Arabic text written for adults omits the vowels and other phonological markings needed to expand the orthography into a reasonably phonetic form. This is somewhat analogous to the grapheme-to-phoneme problem in English: the correct pronunciation of an English word is often not obvious from its spelling, and many words have more than one possible pronunciation. For English, however, we can rely on electronic lexicons that give the correct pronunciation for an orthographic string. No comparable body of work exists for Arabic.

For synthesis, we must know the correct vowels. Diacritics marking the correct MSA vowels appear in religious texts and in literature for children; collectively they are known as the vocalization or the voweling. The process of adding all of the diacritics to unmarked text is called diacritization.
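To make the relationship between voweled and unvoweled text concrete, the following Python sketch (an illustration, not part of the system described here) treats the Arabic vowel diacritics as the Unicode combining marks U+064B to U+0652 and removes them. Stripping is the easy direction; diacritization is its inverse and is one-to-many, as the two example words show. The names `strip_diacritics`, `KATABA`, and `KUTUB` are hypothetical.

```python
# Illustrative sketch: Arabic short-vowel and related diacritics occupy
# U+064B..U+0652 (fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun). Other marks (e.g. U+0670) are deliberately ignored here.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove vowel diacritics, giving the unvoweled form of ordinary adult text."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


# kataba "he wrote" and kutub "books": different words, same unvoweled spelling.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha
KUTUB = "\u0643\u064F\u062A\u064F\u0628"          # kaf+damma, ta+damma, ba

if __name__ == "__main__":
    bare = strip_diacritics(KATABA)
    print(bare == strip_diacritics(KUTUB))  # True
    print([hex(ord(c)) for c in bare])      # ['0x643', '0x62a', '0x628']
```

Because both words collapse to the same three letters, recovering the vowels from unmarked text cannot be done by a simple character-level rule.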

There are two obvious approaches to the voweling problem for spoken language: inferring the vowels, or enumerating the voweled forms in the lexicon. The former has been applied with some success in speech recognition, where vowels were guessed with 80% accuracy [6]. Synthesis, however, requires much higher accuracy than recognition, so we have chosen the enumerative approach to voweling.
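A minimal sketch of the enumerative approach, assuming a lexicon that lists every voweled form for each unvoweled spelling. The dictionary `LEXICON` and the function `candidate_vowelings` are hypothetical, and the single entry is illustrative rather than taken from any real lexicon.

```python
from typing import Dict, List

# Hypothetical enumerated lexicon: each unvoweled spelling maps to every
# fully voweled form it can stand for. The entry below is illustrative only.
LEXICON: Dict[str, List[str]] = {
    # k-t-b: kataba "he wrote", kutub "books"
    "\u0643\u062A\u0628": [
        "\u0643\u064E\u062A\u064E\u0628\u064E",
        "\u0643\u064F\u062A\u064F\u0628",
    ],
}


def candidate_vowelings(word: str) -> List[str]:
    """Return all listed voweled forms for an unvoweled word (empty if unknown)."""
    return list(LEXICON.get(word, []))


if __name__ == "__main__":
    for form in candidate_vowelings("\u0643\u062A\u0628"):
        print([hex(ord(c)) for c in form])
```

The lookup only enumerates candidates; when a spelling has several, as here, choosing among them is a separate step that this sketch does not attempt.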


Alan W Black 2003-10-27