Newsgroups: comp.speech
Path: lyra.csx.cam.ac.uk!doc.ic.ac.uk!agate!spool.mu.edu!sol.ctr.columbia.edu!xlink.net!news.dfn.de!Germany.EU.net!EU.net!uunet!news.iij.ad.jp!wnoc-tyo-news!aist-nara!wnoc-kyo-news!atrwide!atr-la!lucke
From: lucke@itl.atr.co.jp (Helmut Lucke)
Subject: Re: HMM-isolated word recognizer
In-Reply-To: nemo@INRS-Telecom.UQuebec.CA's message of Mon, 11 Jul 1994 19:10:08 GMT
Message-ID: <LUCKE.94Jul18153132@atrq28.itl.atr.co.jp>
Sender: news@itl.atr.co.jp (USENET News System)
Nntp-Posting-Host: atrq28
Organization: ATR Telecommunications Research Labs., Kyoto, JAPAN
References: <2va0ne$hhj@prakinf2.PrakInf.TU-Ilmenau.DE>
	<LUCKE.94Jul8105817@atrq28.itl.atr.co.jp>
	<1994Jul11.191008.24306@INRS-Telecom.UQuebec.CA>
Date: Mon, 18 Jul 1994 06:31:32 GMT
Lines: 78


nemo@INRS-Telecom.UQuebec.CA (Capt. Nemo Semret) writes:

   Helmut Lucke (lucke@itl.atr.co.jp) wrote:

   : Whether you use Viterbi decoding or forward-backward calculations, your
   : likelihood should always increase. If it doesn't, you certainly have
   : a bug in your programme. My advice: check your code again.

   : The reason why the likelihood monotonically increases for Viterbi
   : training as well is simple:

   : Consider first the problem of supervised training where we are given
   : the `correct' state sequence. In this case the re-estimation 
   : equations will give you a monotonically increasing likelihood for
   : each iteration. (In fact, since the HMM is no longer `hidden', you will
   : get the optimal parameters in the first iteration and the parameters will
   : remain constant from then on, but this is not the point here.)

   : Now if at each iteration you are allowed to choose the optimal
   : path you can only do better: The same state path as the previous
   : iteration would already result in a likelihood at least as good as in the
   : previous iteration. 

   : If the Viterbi algorithm yields an even better path, then
   : so be it. The likelihood must be even higher in this case.


   I believe that's not necessarily the case. The iteration might have
   improved the total likelihood while worsening the likelihood of that
   particular state path.

Well, you are wrong. As I think the above argument shows, the score
of the best path can never decrease if the parameters are updated using the
Viterbi algorithm.

You write:
   For example, suppose there are only two paths, call them a and b. Let
   V(n,a) be the likelihood (Viterbi score) of path a after n
   iterations. Suppose, we have

   V(n,a)   > V(n,b)
   V(n+1,b) > V(n+1,a)

   ie after re-estimation, the optimal path has changed. The total
   likelihood is guaranteed to increase so

   p(n,a)V(n,a) + p(n,b)V(n,b) < p(n+1,a)V(n+1,a) + p(n+1,b)V(n+1,b)

   with p(n,a)+p(n,b)=1. 

   Now you could still have 

   V(n,a) > V(n+1,b) > V(n+1,a)

   ie the Viterbi score has not improved, neither on the old optimal
   path, nor on the new one.

This argument shows that if we were updating the parameters using
the Baum-Welch re-estimation formulas, then the likelihood of any given
path (including the optimal one) might decrease. This is correct. However,
this is not what we are doing. We are updating the parameters using the
Viterbi algorithm (not the Baum-Welch algorithm). This has the effect of
increasing the likelihood of the (previously) optimal path (rather than the
overall likelihood). As I pointed out in my previous post, the score of the
optimal path cannot decrease in this case, whether the optimal path changes
or not.
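The two-step argument above can be checked numerically. The sketch below is
purely illustrative (it is not from the original discussion; the toy two-state
HMM, the observation sequence, and all parameter values are my own arbitrary
assumptions): each iteration finds the Viterbi path, re-estimates the
parameters from counts along that single path, and the best-path log-score
never goes down.

```python
import math

# Toy sketch (illustrative assumptions only): Viterbi training on a
# 2-state discrete HMM with 2 output symbols.
N, M = 2, 2
obs = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]

pi = [0.6, 0.4]                   # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]      # transition probabilities A[i][j]
B = [[0.6, 0.4], [0.3, 0.7]]      # emission probabilities  B[i][symbol]

def lg(x):
    # log that tolerates zero probabilities
    return math.log(x) if x > 0 else float("-inf")

def viterbi(obs, pi, A, B):
    """Return (log-score of the best path, best state path)."""
    T = len(obs)
    delta = [[lg(pi[i]) + lg(B[i][obs[0]]) for i in range(N)]]
    psi = [[0] * N]
    for t in range(1, T):
        row, back = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: delta[t - 1][i] + lg(A[i][j]))
            row.append(delta[t - 1][i_best] + lg(A[i_best][j]) + lg(B[j][obs[t]]))
            back.append(i_best)
        delta.append(row)
        psi.append(back)
    last = max(range(N), key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return delta[T - 1][last], path[::-1]

def reestimate(obs, path):
    """Maximum-likelihood counts along the single best path (Viterbi training)."""
    pi = [0.0] * N
    A = [[0.0] * N for _ in range(N)]
    B = [[0.0] * M for _ in range(N)]
    pi[path[0]] = 1.0
    for t in range(len(path) - 1):
        A[path[t]][path[t + 1]] += 1.0
    for state, symbol in zip(path, obs):
        B[state][symbol] += 1.0
    def norm(row):                # unvisited states get a harmless uniform row
        s = sum(row)
        return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)
    return norm(pi), [norm(r) for r in A], [norm(r) for r in B]

scores = []
for _ in range(5):
    score, path = viterbi(obs, pi, A, B)
    scores.append(score)
    pi, A, B = reestimate(obs, path)

# Best-path score is monotonically non-decreasing across iterations.
assert all(b >= a - 1e-9 for a, b in zip(scores, scores[1:]))
print(scores)
```

This is exactly the argument in the post: the count-based re-estimation makes
the old best path at least as likely as before (it is the maximum-likelihood
fit to that fixed path), and the next Viterbi pass can only find an equal or
better path.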



Helmut
--------------------------------------------------------------------
Helmut Lucke                                <lucke@itl.atr.co.jp>
ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, JAPAN
Tel: +81-7749-5-1382 (direct)               Fax:   +81-7749-5-1308
     +81-7749-5-1301 (switchboard)
--------------------------------------------------------------------
