Newsgroups: comp.speech
Path: lyra.csx.cam.ac.uk!warwick!news.dcs.warwick.ac.uk!str-ccsun!strath-cs!bnr.co.uk!pipex!howland.reston.ans.net!agate!library.ucla.edu!news.mic.ucla.edu!unixg.ubc.ca!quartz.ucs.ualberta.ca!acs.ucalgary.ca!cpsc.ucalgary.ca!hill
From: hill@cpsc.ucalgary.ca (David Hill)
Subject: Re: Simple Question in Speech Recognition
Message-ID: <Ctq7Gy.98@cpsc.ucalgary.ca>
Sender: news@cpsc.ucalgary.ca (News Manager)
Organization: University of Calgary Computer Science
References: <1994Jul25.131035.69013@kuhub.cc.ukans.edu> <slerner-280794135257@slerner.gte.com>
Date: Fri, 29 Jul 1994 23:48:33 GMT
Lines: 42

In article <slerner-280794135257@slerner.gte.com> slerner@gte.com (Sol Lerner) writes:
>In article <1994Jul25.131035.69013@kuhub.cc.ukans.edu>,
>christos@kuhub.cc.ukans.edu wrote:
>
>> Hello there,
>> 
>> 	I have a very basic question for the Speech Recognition gurus. In
>> isolated word recognition each word is divided into frames and for each frame
>> we extract speech features such as LPC coefficients, cepstrum coefficients etc.
>> It is customary that the frames overlap, i.e., if frame 1 extends from sample 1 to
>> sample 256, frame 2 might extend from sample 128 to sample 256+128. This
>> introduces correlation between neighbouring frames so that the feature vectors
>> are correlated. Why do we wish that neighbouring vectors are correlated?
>
>We don't _wish_ neighboring vectors to be correlated.  Rather, this process
>is a result of our need to both quantify fast-changing signals and to
>accurately estimate our features.  
>
>For example, with an 8-kHz sampling rate, 256 samples span 32 msec.
>However, for fast-changing phenomena, we would like to be able to take
>feature samples about every 10 msec.  We can reduce the number of samples
>used to estimate our features (from 256 to 128 for example), but this would
>reduce the robustness and accuracy of our features for slow-changing
>phenomena.  Therefore, we compromise in the aforementioned way.
>
>Sol
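
To put numbers on Sol's example, here is a rough sketch of the framing
arithmetic in Python. The function name frame_starts and the one-second
signal length are my own assumptions for illustration; only the 8 kHz
rate, the 256-sample frame and the 10 msec step come from the text above.

```python
def frame_starts(n_samples, frame_len, hop):
    """Return the start index of every full frame of frame_len samples,
    advancing by hop samples each time (hypothetical helper)."""
    if n_samples < frame_len:
        return []
    return list(range(0, n_samples - frame_len + 1, hop))


SAMPLE_RATE_HZ = 8000
FRAME_LEN = 256                        # 256 / 8000 Hz = 32 msec per frame
HOP = SAMPLE_RATE_HZ * 10 // 1000      # 10 msec step = 80 samples

# Assumed signal length: one second of speech (8000 samples).
starts = frame_starts(SAMPLE_RATE_HZ, FRAME_LEN, HOP)

# Samples shared by two neighbouring frames; this shared data is where
# the correlation between adjacent feature vectors comes from.
overlap = FRAME_LEN - HOP
```

With these numbers each frame shares 176 of its 256 samples with the
next one, so the overlap is simply the price of a long window combined
with a short step.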

You might find it helpful to do a pitch-synchronous analysis, making the
frame length variable and equal to the pitch period.

Of course, you have to be able to identify pitch periods, which is a problem
in itself, but it gets rid of a lot of unpredictability, which is good
regardless of the style of analysis you choose, and it permits about the
fastest tracking that is practical.
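
As a minimal sketch of the framing step only, assuming the pitch marks
have already been found by some detector (the hard part, not shown), it
might look like this in Python. The function name, the toy signal and
the mark positions are all made up for illustration.

```python
def pitch_synchronous_frames(signal, pitch_marks):
    """Cut signal into variable-length frames, one per pitch period.

    pitch_marks is assumed to be a sorted list of sample indices, each
    marking the start of a pitch period (output of a hypothetical
    pitch-mark detector). Samples before the first mark and after the
    last mark are not returned.
    """
    frames = []
    for start, end in zip(pitch_marks, pitch_marks[1:]):
        frames.append(signal[start:end])
    return frames


# Toy data: the "signal" is just its own sample indices, and the pitch
# period shrinks from 80 to 76 to 72 samples (pitch rising from 100 Hz
# at an 8 kHz sampling rate).
signal = list(range(400))
marks = [0, 80, 156, 228]

frames = pitch_synchronous_frames(signal, marks)
lengths = [len(f) for f in frames]     # one frame length per pitch period
```

Each frame then covers exactly one period, so the frames do not overlap
and the frame rate follows the pitch. This only makes sense for voiced
speech; unvoiced stretches would still need some fixed frame length.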

david
------

-- 
david hill: hill@cpsc.ucalgary.ca	|	Imagination is more
voice: 403-282-6481, fax: 403-282-6778	|	important than knowledge.
nextmail: hill@trillium.ab.ca		|		(Albert Einstein)
