
Background

As part of the continuing goal of providing sufficient tools to build synthetic voices for all languages, this paper describes experiments in restricting the phonetic knowledge used in building voices. As the technology improves, we are finding more and more uses of speech output that were not considered before. There are around 6,000 active languages in the world, and it seems unfair to exclude them from spoken language output systems because they are not among the top 20 or so languages by population or economics.

In fact, we believe that speech technology can be most helpful when dealing with minority languages. In languages where there is a low level of literacy, either because reading/writing is taught in some other more widely spoken language, or because of the lack of educational resources, a spoken language system may be the only reasonable way to distribute information.

The AVENUES project at CMU is concerned with building speech-to-speech translation systems for indigenous languages in South America. This project is designed to address issues in building speech and translation components even when very little data exists. It is not unusual for the orthography of a minority language to be poorly defined.

[1] gives a description of writing systems used throughout the world and their relative opacity with respect to their phonetics. It is our belief that the orthographies of languages with a short history of writing are often more closely related to their phonetics than those of languages with a longer history. However, there is often the complication that the alphabet used for such a minority language is not appropriate, as it may not have the variation suitable for the language. For example, the Spanish alphabet may be used for a native American language, and the shortcomings may be resolved by the addition of diacritics. That is the case for Mapudungun, an indigenous language spoken by around one million people in Chile, which uses an umlaut in addition to an alphabet based on standard Spanish. Importantly, even with a defined alphabet, the relationship between the orthography and the phonetics may not actually be one to one, especially when one considers dialects.

Making the assumption that there will be a relationship between the letters and pronunciation, we have built a number of synthesizers which use letter information alone to determine the "phone" set.
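The idea of treating letters as the "phone" set can be illustrated with a minimal sketch. This is not the FestVox implementation; the function names `letters_as_phones` and `letter_phone_set`, the lowercasing, and the use of Unicode NFC normalization to keep a base letter and its diacritic together as one unit are all assumptions made for illustration, and the input strings are arbitrary.

```python
import unicodedata


def letters_as_phones(word):
    """Return the letter-based units for one word (hypothetical helper).

    Assumption: each alphabetic character, after lowercasing and NFC
    normalization, is treated as one "phone". NFC composes a base letter
    and a combining diacritic into a single character where possible,
    so a letter with an umlaut stays distinct from the bare letter.
    """
    normalized = unicodedata.normalize("NFC", word.lower())
    # Punctuation, digits, and whitespace are dropped.
    return [ch for ch in normalized if ch.isalpha()]


def letter_phone_set(words):
    """Collect the inventory of letter-based units seen in a word list."""
    units = set()
    for word in words:
        units.update(letters_as_phones(word))
    return sorted(units)


if __name__ == "__main__":
    # Arbitrary example strings, not taken from any real corpus.
    print(letters_as_phones("Tü-la"))
    print(letter_phone_set(["tula", "Tü-la"]))
```

Under these assumptions no pronunciation lexicon or letter-to-sound rules are required: the unit inventory falls out of the text itself, at the cost of ignoring any cases where the orthography and the phonetics are not one to one.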

To build these voices, we based our techniques on the framework provided by the CMU FestVox tool suite [2], which provides basic templates and tools for building synthetic voices in new languages. It has already been used to build a wide range of voices in at least 40 different languages.


Alan W Black 2002-10-01