Introduction

Training and testing large-vocabulary, speaker-independent speech recognition systems requires a large amount of transcribed speech data. About 3000 utterances (around 80,000 spoken words) are sufficient to achieve robust acoustic models, but more than 7000 utterances should be available for training n-gram language models. Transcribing conversational speech is one of the most expensive and time-consuming steps of a database collection. As the demand for portability and fast development of recognition systems in several languages grows, techniques for rapid cross-language transfer, such as bootstrapping from multilingual phoneme sets, are of increasing concern. Developing reliable multilingual phoneme sets and evaluating these techniques requires high-quality speech data which guarantees that the only difference in the acoustic data is the spoken language.