LVCSR-BASED LANGUAGE IDENTIFICATION
        Tanja Schultz, Ivica Rogina, Alex Waibel
                  <tanja@ira.uka.de>

             Interactive Systems Laboratories 
             University of Karlsruhe (Germany)
             Carnegie Mellon University (USA)
                 
                 published at: ICASSP 96

Automatic language identification is an important problem in 
building multilingual speech recognition and understanding 
systems. While building a language identification module for four 
languages, we studied how different levels of knowledge (phonetic, 
phonotactic, lexical, and syntactic-semantic) affect an approach 
based on large vocabulary continuous speech recognition (LVCSR).

The resulting language identification (LID) module can identify 
the language of spontaneous speech input and can be used as a 
front-end for our multilingual speech-to-speech translation system 
JANUS-II.
A comparison of five LID systems showed that incorporating lexical 
and linguistic knowledge reduces the language identification error 
on the 2-language tests by up to 50%.

Based on these results we built an LID module for German, English, 
Spanish, and Japanese that yields an 84% identification rate on the
Spontaneous Scheduling Task (SST).
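
The abstract does not spell out the decision rule. One common way 
to use LVCSR for LID is to run one recognizer per language and pick 
the language whose best hypothesis scores highest. The sketch below 
illustrates that idea only; the function name, the lm_weight 
parameter, and all scores are hypothetical and are not taken from 
the paper.

```python
def identify_language(scores, lm_weight=1.0):
    """Pick the language with the highest combined log score.

    scores: dict mapping language -> (acoustic_logprob, lm_logprob)
            for that language's best recognizer hypothesis.
    lm_weight: hypothetical weight on the lexical/linguistic score;
               0.0 leaves an acoustic-only decision.
    """
    combined = {
        lang: acoustic + lm_weight * lm
        for lang, (acoustic, lm) in scores.items()
    }
    best = max(combined, key=combined.get)
    return best, combined


# Made-up log scores for one utterance (illustration only).
example_scores = {
    "German": (-1200.0, -85.0),
    "English": (-1195.0, -110.0),
    "Spanish": (-1230.0, -95.0),
    "Japanese": (-1260.0, -120.0),
}

acoustic_only, _ = identify_language(example_scores, lm_weight=0.0)
with_lm, _ = identify_language(example_scores, lm_weight=1.0)
print(acoustic_only, with_lm)
```

With these invented numbers the acoustic-only decision and the 
decision that also uses the language model score differ, which is 
the kind of effect the comparison of knowledge sources measures.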