Newsgroups: comp.speech
Path: lyra.csx.cam.ac.uk!doc.ic.ac.uk!aixssc.uk.ibm.com!ibm.de!Munich.Germany.EU.net!Germany.EU.net!EU.net!howland.reston.ans.net!europa.eng.gtefsd.com!news.uoregon.edu!gaia.ucs.orst.edu!news.cs.indiana.edu!nstn.ns.ca!newsflash.concordia.ca!CC.UMontreal.CA!vogon!nemo
From: nemo@INRS-Telecom.UQuebec.CA (Capt. Nemo Semret)
Subject: Re: HMM-isolated word recognizer
Message-ID: <1994Jul7.043319.10058@INRS-Telecom.UQuebec.CA>
Organization: INRS Telecommunications
X-Newsreader: TIN [version 1.2 PL2]
References: <2va0ne$hhj@prakinf2.PrakInf.TU-Ilmenau.DE>
Date: Thu, 7 Jul 1994 04:33:19 GMT
Lines: 63

Praktikum DOT (opt1@rz.tu-ilmenau.de) wrote:

: The following phenomena appear during the BWA: 

: 1. Sometimes a covariance matrix singularity error occurs in one of
: the states' density functions. Could it be a problem of insufficient
: training data, or is it an error in my implementation? If it is the
: former, what can one do to avoid it?

It could very well be insufficient data. Let x(0),x(1),... be your
training sequence. The re-estimation of the covariance matrix U(i) of
a state i with mean vector u(i) is roughly

   U(i) =  sum (x(t)-u(i))(x(t)-u(i))' p(x(t),t,i)
	    t

where p(x,t,i) is something like the probability of being in state i
at time t and observing vector x. So if your training sequence {x(t)}
is not long enough, you could end up with one or two p(x(t),t,i)
dominating the sum above, because the training sequence does not 'pass
through that state' often enough. Your U(i) will then be very close to
a rank-one matrix yy' (where y=x(t)-u(i) for some t), which is of
course singular.
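Here is a quick numerical illustration of that effect (my own sketch,
not from the post, using NumPy): when the occupancy weights p(x(t),t,i)
are spread over many frames the estimate is healthy, but when nearly
all the mass sits on one frame the estimate collapses toward a rank-one
outer product and becomes numerically singular.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.normal(size=(100, d))       # training vectors x(t)
u = x.mean(axis=0)                  # state mean u(i)
diff = x - u                        # x(t) - u(i) for every frame

# Occupancy spread over all 100 frames -> well-conditioned covariance
w_good = np.full(100, 1.0 / 100)
U_good = np.einsum('t,td,te->de', w_good, diff, diff)

# Nearly all occupancy on one frame -> U close to the rank-one y y'
w_bad = np.full(100, 1e-9)
w_bad[0] = 1.0 - 99e-9
U_bad = np.einsum('t,td,te->de', w_bad, diff, diff)

print(np.linalg.cond(U_good))       # modest condition number
print(np.linalg.cond(U_bad))        # huge: effectively singular
```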

At least I think that is what was happening to me when I was getting
singular covariances. You can solve it with more data if you have
some, or with fewer parameters. You can get fewer parameters by
parameter-tying, e.g. sharing the same covariance matrix across many
states, or by fixing the covariances (not re-estimating them), etc.
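A minimal sketch of the tying idea (my own illustration; the function
name and the toy two-state occupancies are made up): pool the
occupancy-weighted outer products across all tied states and normalize
by the total occupancy, so every state shares one covariance matrix.

```python
import numpy as np

def tied_covariance(x, weights, means):
    """One covariance shared by all tied states: pool the occupancy-
    weighted outer products, then divide by the total occupancy."""
    d = x.shape[1]
    num = np.zeros((d, d))
    den = 0.0
    for w, u in zip(weights, means):   # one (occupancy, mean) per state
        diff = x - u
        num += np.einsum('t,td,te->de', w, diff, diff)
        den += w.sum()
    return num / den

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
w1 = rng.random(200)
w2 = 1.0 - w1                          # toy occupancies for two states
u1 = (w1 @ x) / w1.sum()
u2 = (w2 @ x) / w2.sum()
U = tied_covariance(x, [w1, w2], [u1, u2])
print(np.linalg.eigvalsh(U))           # both eigenvalues well above zero
```

Because the pooled sum sees every frame of data regardless of which
state it was assigned to, a single dominant weight in one state can no
longer drive the shared matrix to rank one.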


: 2. At some points of the iteration the procedure seems to produce a
: negative difference (log P' - log P) (where P' is the probability
: after the iteration step). I think this is possible because one
: computes the log P score by the Viterbi procedure and so gets only a
: part of the whole P (i.e. log P(observation, optimal sequence)). BUT
: where shall I stop the iteration? At the step where the gain became
: negative, or later when the gain became positive again, or should I
: check just the absolute value of the gain?

Why don't you use forward-backward scoring, so you get the full
probability of the sequence rather than just the probability along the
optimal path (which is what Viterbi gives you)? That way, you are
guaranteed that the score will improve at each iteration of the
Baum-Welch algorithm. If it doesn't, then it's certainly an
implementation problem.
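The difference between the two scores is just max versus sum in the
same recursion. A small log-domain sketch (my own, with made-up toy
parameters; the per-frame likelihoods B are arbitrary here):

```python
import numpy as np

def log_forward(log_pi, log_A, log_B):
    """Full log P(O): the forward algorithm sums over ALL state paths."""
    T, N = log_B.shape
    alpha = log_pi + log_B[0]
    for t in range(1, T):
        # sum over predecessor states, in the log domain
        alpha = log_B[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)

def log_viterbi(log_pi, log_A, log_B):
    """log P(O, best path): same recursion with max instead of sum."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    for t in range(1, T):
        delta = log_B[t] + np.max(delta[:, None] + log_A, axis=0)
    return delta.max()

rng = np.random.default_rng(2)
N, T = 3, 10
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
pi = np.full(N, 1.0 / N)
B = rng.random((T, N))                 # toy per-frame state likelihoods
full = log_forward(np.log(pi), np.log(A), np.log(B))
best = log_viterbi(np.log(pi), np.log(A), np.log(B))
print(full, best)                      # full >= best: the sum includes the best path
```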

As a stopping criterion, I stop when the score improves by less than a
certain percentage. This seems more consistent than, say, stopping
after a preset number of iterations.
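In loop form that criterion might look like this (a toy sketch;
`reestimate` is a dummy stand-in I invented for one full Baum-Welch
pass, and the 0.01% threshold is an arbitrary choice):

```python
def reestimate(model):
    """Dummy stand-in for one Baum-Welch pass: here it just halves the
    gap between the current log-likelihood and a fixed optimum."""
    model['loglik'] += 0.5 * (model['target'] - model['loglik'])
    return model, model['loglik']

model = {'loglik': -1000.0, 'target': -100.0}
tol = 1e-4                  # stop below 0.01% relative improvement
prev = None
iters = 0
while True:
    model, loglik = reestimate(model)
    iters += 1
    if prev is not None and (loglik - prev) < tol * abs(prev):
        break
    prev = loglik
print(iters, loglik)
```

Comparing the gain against tol * abs(prev) rather than an absolute
threshold keeps the test meaningful whatever the scale of the
log-likelihood happens to be.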

: I would welcome any discussion concerning hidden Markov modelling and the problems described.

: Jiri Navratil

Good luck!

	-CaptN-
-- 
C N aturally
A E xpressing
P M y
T O pinions
