Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!uknet!cam-eng!jam!ajr
From: ajr@jam.eng.cam.ac.uk (Tony Robinson)
Subject: Re: A little cepstral question.
Sender: ajr@eng.cam.ac.uk (Tony Robinson)
Message-ID: <AJR.93May18160204@jam.eng.cam.ac.uk>
In-Reply-To: sl2d1@neptune.ee.usu.edu's message of 8 May 93 15:32:50 MDT
Date: Tue, 18 May 1993 21:02:04 GMT
Distribution: comp.speech
References: <MHALL.93May4124551@occs.cs.oberlin.edu> <1993May8.153250.67485@cc.usu.edu>
Nntp-Posting-Host: jam.eng.cam.ac.uk
Organization: Engineering Department, Cambridge University, England.
Lines: 37


mhall@occs.cs.oberlin.edu (Matthew Hall) writes:

|> 	I have heard a lot about cepstral analysis in regards to
|> speech processing.  Right now I am just using the results of an FFT,
|> for the patterns I create, however I would like to increase accuracy a
|> little.  The cepstral is supposed to imitate the way the ear hears,
|> which should make recognition accuracy better.

to which sl2d1@neptune.ee.usu.edu (kreifeldt richard allen) replies:

>  Depending on your application (mine is speech recognition) what you 
> may really want are LPC based Cepstral coefficients.  

But the phrase "to imitate the way the ear hears" suggests that what is
needed is mel scale cepstral coefficients.  As far as I know, the best way
to calculate these is to compute a power spectrum via an FFT, do a non-linear
warping to the mel scale (mel = 1125.0 * log(0.0016 * hz + 1.0)), take the
log of the warped power values, and then compute a cosine transform to get
the cepstral coefficients.
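
A minimal sketch of that recipe, for anyone who wants to experiment.  The
warping formula is the one quoted above; everything else is my assumption
for illustration: natural log in the warping, a Hamming window, a naive DFT
standing in for the FFT, rectangular (not triangular) mel bins, an
unnormalised cosine transform, and all function names.

```python
import cmath
import math

def hz_to_mel(hz):
    # Warping as quoted in the post; natural log is an assumption here.
    return 1125.0 * math.log(0.0016 * hz + 1.0)

def power_spectrum(frame):
    # Hamming window, then a naive O(N^2) DFT standing in for an FFT.
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(w))) ** 2
            for k in range(n // 2 + 1)]

def mel_bin_energies(power, sample_rate, n_bins):
    # Sum power into rectangular bins equally spaced on the mel scale.
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    energies = [0.0] * n_bins
    for k, p in enumerate(power):
        hz = k * nyquist / (len(power) - 1)
        b = min(int(hz_to_mel(hz) / top * n_bins), n_bins - 1)
        energies[b] += p
    return energies

def cosine_transform(values, order):
    # Unnormalised DCT-II: c[n] = sum_b v[b] * cos(pi * n * (b + 0.5) / B)
    nb = len(values)
    return [sum(v * math.cos(math.pi * n * (b + 0.5) / nb)
                for b, v in enumerate(values))
            for n in range(order)]

def mel_cepstrum(frame, sample_rate, n_bins=12, order=12, floor=1e-12):
    power = power_spectrum(frame)
    energies = mel_bin_energies(power, sample_rate, n_bins)
    # Floor the energies so that an empty bin cannot produce log(0).
    log_energies = [math.log(max(e, floor)) for e in energies]
    return cosine_transform(log_energies, order)

# Usage: a synthetic "voiced" frame with harmonics of 125 Hz at 8 kHz.
rate = 8000.0
frame = [sum(math.sin(2.0 * math.pi * 125.0 * h * t / rate) / h
             for h in range(1, 20))
         for t in range(256)]
coeffs = mel_cepstrum(frame, rate)
```

Here the number of mel bins equals the cepstral order (12), so the cosine
transform is just a change of basis on the log bin energies.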

Now here is something that I'm not quite happy with.  Periodic signals, such
as voiced speech, only have power at the harmonics.  At other frequencies
you may have some leakage from the harmonics due to windowing, or perhaps
some background noise.  Let's assume that these values are small.  This
doesn't normally matter, but after taking the log their variance is going to
be large.  Now comes the cosine transform stage.  This is going to fit the
log values with minimum mean squared error, and that fit will in turn be
dominated by the values between the harmonics, where there is no speech!
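
To put rough numbers on this, here is a small sketch.  The power values are
made up for illustration (they are not measurements): four bins sitting on
harmonics and four bins between them that hold only leakage or noise.

```python
import math

def variance(values):
    # Population variance of a list of numbers.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical power values, chosen only to illustrate the effect.
harmonic_power = [0.8, 1.0, 1.2, 0.9]     # bins on a harmonic
gap_power = [1e-6, 3e-4, 2e-7, 5e-5]      # bins between harmonics

lin_harmonic = variance(harmonic_power)
lin_gap = variance(gap_power)
log_harmonic = variance([math.log(p) for p in harmonic_power])
log_gap = variance([math.log(p) for p in gap_power])

print("linear: harmonic %.3g, gap %.3g" % (lin_harmonic, lin_gap))
print("log:    harmonic %.3g, gap %.3g" % (log_harmonic, log_gap))
```

On a linear scale the gap values are negligible next to the harmonics, but
after the log they spread over several orders of magnitude and their
variance is far larger than that of the harmonic bins, so a least squares
fit to the log spectrum spends most of its effort on them.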

In practical applications, this problem is mostly avoided by making the
order (number of bins) of the mel scaled power spectrum about the same as
the desired cepstral order.  This means that most bins in the mel scaled
power spectrum contain at least one harmonic, and hence are not dominated by
noise.  However, as I see it there remains an inherent weakness of the
technique: you can't produce high order mel scaled cepstral coefficients.
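
A quick way to check a given setup is to count which mel bins contain no
harmonic at all.  The sketch below assumes rectangular bins equally spaced
on the mel scale up to the Nyquist frequency, natural log in the warping
formula, and an exactly periodic signal; the function names, the pitch and
the sample rate are illustrative choices of mine.

```python
import math

def hz_to_mel(hz):
    # Warping as quoted earlier in the post; natural log is assumed.
    return 1125.0 * math.log(0.0016 * hz + 1.0)

def bins_without_harmonic(f0, sample_rate, n_bins):
    # Return the indices of mel bins that contain no harmonic of f0.
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    counts = [0] * n_bins
    h = 1
    while h * f0 < nyquist:
        b = min(int(hz_to_mel(h * f0) / top * n_bins), n_bins - 1)
        counts[b] += 1
        h += 1
    return [b for b, c in enumerate(counts) if c == 0]

# Usage: 100 Hz pitch, 8 kHz sampling, low order versus high order.
few = bins_without_harmonic(100.0, 8000.0, 12)
many = bins_without_harmonic(100.0, 8000.0, 40)
print("12 bins, empty:", few)
print("40 bins, empty:", many)
```

With 12 bins every bin catches at least one harmonic of a 100 Hz pitch; with
40 bins the narrow low-frequency bins start to fall between harmonics, which
is where the trouble described above comes back.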

Tony Robinson
