Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!warwick!zaphod.crihan.fr!univ-lyon1.fr!scsing.switch.ch!sun.rediris.es!sevaxu.cica.es!gaia!jesus
From: jesus@hal.ugr.es (Jesus E. Diaz Verdejo)
Subject: Re: Problem with HMM training.
Message-ID: <1993Sep28.110722.19646@sevaxu.cica.es>
Sender: news@sevaxu.cica.es (USENET News System)
Nntp-Posting-Host: gaia.ugr.es
Reply-To: jesus@hal.ugr.es
Organization: Universidad de Granada
References: <1993Sep27.170912.9710@sevaxu.cica.es>
Date: Tue, 28 Sep 1993 11:07:22 GMT
Lines: 75


In article <1993Sep27.170912.9710@sevaxu.cica.es>, jjesus@casip.ugr.es (Jose Jesus Fernandez Rodriguez) writes:
   >Hi everyone!!.
   >
   >I have got a question to ask all of you: Masters of ASR!!
   
You have some Masters of ASR next door in your own department. ;-)

   >
   >We have developed a hybrid system for Automatic Speech Recognition,
   >specifically, for speaker-independent isolated word recognition.
   >Our system consists of a Multilayer Perceptron (MLP), which acts as a
   >phonemic recognizer, and a set of HMMs whose function is to process
   >the sequences of phonemes generated by the MLP.
   >
   >In order to train the set of HMMs, we have employed the ML algorithm
   >(Maximum Likelihood, i.e. the Baum-Welch algorithm). With this
   >configuration we have obtained a max. test recognition rate of 94.25%,
   >and training sequences were recognized at rates from 99.7 to 99.8%.
   >We also tried a system with a different number of states per HMM, and
   >succeeded in improving the test rate to 94.5%.
   >
   >When we got these rates, in order to improve them, we decided to employ
   >a discriminative algorithm to train the HMMs: the MMI algorithm
   >(Bahl, Brown, de Souza, Mercer: "Maximum mutual information estimation of
   >hidden Markov model parameters for speech recognition". Proc. ICASSP, 1986).
   >As a result, we have achieved a max. training seq. rate of 100%; however,
   >the max. test recog. rate does not exceed 92.25%. We have tried to make
   >some modifications to this algorithm, but have not obtained any improvement.
   >
   >I would like to ask you,... 
   >
   >why doesn't the system of HMMs trained with the MMI algorithm generalize?
   >I.e., why does it improve the training rate while making the test rate worse?
   >
   >I would be grateful to you for any comments.
   >

I think you have to consider three main points:

   a) Does the database you are using contain enough data for MMI estimation?
   It is well known that the MMI algorithm needs a large database for a good
estimation. In some experiments carried out in our lab we have concluded that,
for the database we are using, MMI estimation does not improve the
recognition rate.

   b) The number of units you are using.
   If the number of units used for the recognition is small, it is not worth
training the models with MMI. The main effort of MMI training goes into
increasing the differences between confusable classes. So it is better to
first train the models with the ML algorithm to adapt them to each class, and
after that train them with MMI. This way only the confusable models will be
updated, without degrading the performance of the other models.
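The point about confusable classes can be sketched numerically. In the MMI
criterion the correct model's likelihood is normalized by the sum over all
competing models, so the denominator (and hence the update effort) is dominated
by the close competitors, while clearly different models contribute almost
nothing. A toy sketch with made-up log-likelihoods (no real HMMs involved):

```python
import math

def class_posteriors(logliks):
    # Normalized likelihoods, assuming equal class priors; these are the
    # weights that drive the MMI denominator term for each competing model.
    m = max(logliks)                          # shift for numerical stability
    exps = [math.exp(l - m) for l in logliks]
    s = sum(exps)
    return [e / s for e in exps]

def mmi_criterion(logliks, correct):
    # log P(O | correct model) - log sum_j P(O | model j)
    m = max(logliks)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logliks))
    return logliks[correct] - log_denom

# Toy scores for one training sequence against three word models:
# model 0 is the correct class, model 1 a close (confusable) competitor,
# model 2 is clearly different.
logliks = [-10.0, -10.5, -14.0]
post = class_posteriors(logliks)
print([round(p, 3) for p in post])    # nearly all weight on models 0 and 1
print(round(mmi_criterion(logliks, 0), 3))
```

The confusable competitor (model 1) gets roughly 30 times the weight of the
distant one (model 2), which is why MMI updates mostly sharpen the boundary
between classes the ML-trained models already confuse.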

   c) Overtraining.
   If you try to enforce a high recognition rate on the training set, the
models will probably adapt their parameters to those particular sequences. So
you have to be careful with the threshold you are using for the convergence
of the algorithm: if this threshold is too small, the final models will only
work properly with the sequences in the training set. I guess this is the
main problem you have.
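One practical way to guard against this (a sketch only: the function names are
made up, and the score curves below are toy numbers loosely shaped like the
rates quoted above) is to monitor a held-out set during the discriminative
iterations and keep the model from the iteration where the held-out score
peaks, instead of iterating until the training-set criterion converges:

```python
def train_with_early_stopping(update, train_score, dev_score,
                              max_iters=50, threshold=1e-3):
    # Run re-estimation until the training criterion converges, but remember
    # the iteration where the held-out ("dev") score peaked -- that is the
    # model worth keeping, not the final one.
    best_dev, best_iter = float("-inf"), 0
    prev_train = float("-inf")
    for i in range(1, max_iters + 1):
        update()
        tr, dv = train_score(), dev_score()
        if dv > best_dev:
            best_dev, best_iter = dv, i
        if tr - prev_train < threshold:
            break                     # training-set criterion has converged
        prev_train = tr
    return best_iter, best_dev

# Toy curves: training accuracy climbs to 100% while the held-out accuracy
# peaks early and then degrades (overtraining).
train_curve = [80.0, 90.0, 95.0, 98.0, 99.5, 99.9, 100.0, 100.0]
dev_curve   = [85.0, 90.0, 93.0, 94.5, 94.0, 93.5, 92.5, 92.25]
state = {"i": -1}
def update():      state["i"] += 1
def train_score(): return train_curve[state["i"]]
def dev_score():   return dev_curve[state["i"]]

best_iter, best_dev = train_with_early_stopping(
    update, train_score, dev_score, max_iters=len(train_curve))
print(best_iter, best_dev)   # the peak held-out iteration, not the last one
```

With a small convergence threshold the loop runs to the end of the curves,
where the held-out score has already dropped below its earlier peak; keeping
the peak-iteration model is what a larger stopping threshold achieves
implicitly.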

   >--
   >Jose Jesus - jjesus@casip.ugr.es   
   >Department of Electronics & Computer Technology.
   >University of Granada. Spain. 
   >

Jesus E. Diaz Verdejo
Speech Research Lab
Signal Processing & Communications Research Group
Dpt. of Electronics & Computer Technology
University of Granada. SPAIN   


