Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!cam-eng!dsl!ajr
From: ajr@dsl.eng.cam.ac.uk (Tony Robinson)
Subject: Re: speech recognition by ANN
Sender: news@eng.cam.ac.uk (Usenet News)
Message-ID: <AJR.93Nov12221706@dsl.eng.cam.ac.uk>
In-Reply-To: Michael.Witbrock@cs.cmu.edu's message of Fri, 12 Nov 1993 13:25:53 -0500
Date: Fri, 12 Nov 1993 22:17:06 GMT
References: <CGC7L8.L61@cpccspc.cphk.hk> <YgsxIlG00hsBMxs40M@cs.cmu.edu>
Nntp-Posting-Host: dsl.eng.cam.ac.uk
Organization: Engineering Department, Cambridge University, England.
Lines: 38

In article <YgsxIlG00hsBMxs40M@cs.cmu.edu> Michael.Witbrock@cs.cmu.edu writes:
> 
> Excerpts from netnews.comp.speech: 11-Nov-93 speech recognition by ANN
> BACS4 Class B@csun05 (306) 
> 
> > I am doing a speech recognition project using an ANN. Can anybody tell me
> > which features of the speech wave can be recognized by an ANN, and which
> > ones an ANN is good at?
> 
> >  I am reading some material about LPC coefficients. Can they be used as
> > features for speech recognition by an ANN?
> 
> Yes, it can, but most people seem to have found no particular advantage
> for this over filterbank coefficients. 

Which is interesting, as cepstra (the cosine transform of log filterbank
coefficients) are considerably more popular for HMM work.
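To make the relationship concrete, here is a minimal sketch of turning one
frame of filterbank energies into cepstra. It assumes natural-log energies
and an unnormalised DCT-II; real front ends differ in normalisation,
liftering and how many coefficients they keep, and the function name is
purely illustrative.

```python
import math

def cepstra_from_filterbank(fbank, n_ceps=None):
    """Cepstral coefficients as an (unnormalised) DCT-II of log filterbank energies.

    fbank:  positive filterbank channel energies for a single frame.
    n_ceps: number of cepstral coefficients to keep (default: one per channel).
    """
    n = len(fbank)
    if n_ceps is None:
        n_ceps = n
    log_e = [math.log(e) for e in fbank]
    ceps = []
    for k in range(n_ceps):
        # c_k = sum_j log_e[j] * cos(pi * k * (j + 0.5) / n)
        ceps.append(sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n)
                        for j in range(n)))
    return ceps
```

A flat spectrum puts all of its energy into c0 and leaves the higher
coefficients at (numerically) zero, which is the smoothing/decorrelating
behaviour that makes cepstra attractive.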

The difference must lie in the channel-to-channel independence assumption
often made in HMMs (the use of a diagonal covariance matrix), which holds
far better for cepstral coefficients than for filterbank coefficients.
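For reference, this is what the diagonal covariance assumption amounts to in
an HMM output density: a minimal sketch, assuming a single Gaussian per
state (real systems usually use mixtures). The log-likelihood is just a sum
of independent per-channel terms, so any correlation between neighbouring
channels, which is strong for filterbank outputs, is ignored by the model.
The function name is hypothetical.

```python
import math

def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian.

    Each dimension contributes its own 1-D Gaussian term; no cross-channel
    covariance terms appear anywhere.
    """
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll
```

Because the total is a plain sum over channels, the model fits best when the
channels really are close to uncorrelated, as cepstral coefficients tend to be.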

The story is a little more complicated in that, until recently, I had
always found that filterbank coefficients give better performance than
cepstral coefficients. This is odd, because the cosine transform which
relates the two can be absorbed into the first layer of weights, so there
should be no difference (disregarding the argument that gradient descent
optimisation effectively imposes strong priors on the weights through
their initial values).
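A minimal sketch of why the cosine transform can be absorbed into the first
layer: a linear layer W applied to cepstra C*f gives the same activations as
the single layer (W*C) applied directly to the log filterbank vector f. It
assumes the same unnormalised DCT-II as above and ignores the bias, which is
unchanged. All names here are hypothetical.

```python
import math

def dct_matrix(n_ceps, n_chan):
    # Row k holds the DCT-II basis vector cos(pi * k * (j + 0.5) / n_chan).
    return [[math.cos(math.pi * k * (j + 0.5) / n_chan) for j in range(n_chan)]
            for k in range(n_ceps)]

def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    # Matrix product a @ b for lists of rows.
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def fold_dct_into_layer(w, n_chan):
    """Turn first-layer weights w (hidden x n_ceps), which act on cepstra,
    into equivalent weights (hidden x n_chan) acting on log filterbank values."""
    c = dct_matrix(len(w[0]), n_chan)
    return matmul(w, c)
```

The fold only works in one direction when the cepstra are truncated
(n_ceps < n_chan): a network fed truncated cepstra is a constrained version
of one fed the full log filterbank, not the other way round.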

To complicate matters further, I have just found that perceptual linear
prediction (PLP), a cepstral-domain representation, performs as well as
filter banks (and therefore better than the other cepstral-domain
representations).

So, as a one-line summary: try PLP, whether you use HMMs or ANNs.

Tony Robinson
