Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!gatech!swrinde!pipex!uknet!info!myhost.subdomain.domain!iijohn
From: iijohn@myhost.subdomain.domain (John Openshaw)
Subject: Re: zero crossing rate as a speech feature in IWR
X-Nntp-Posting-Host: iifeak.swan.ac.uk
Message-ID: <D82215.F4M@info.swan.ac.uk>
Sender: news@info.swan.ac.uk
Organization: String to put in the Organization Header
X-Newsreader: TIN [version 1.2 PL2]
References: <3o9icj$17iv@otho.cc.flinders.edu.au>
Date: Thu, 4 May 1995 13:12:40 GMT
Lines: 48

SuneelRandhawa wrote:
: Hi all,

:        As part of my experiments with isolated word recognition
: using discrete HMMs (speaker dependent), I have found that
: zero crossing rate in the feature vector improves the recognition
: accuracy.  I was wondering if this would be the case in a
: speaker independent context.  Does this feature change considerably
: for a given subword (phoneme?) between different speakers?

: Any pointers to references or other info would be appreciated.

: Sincerely

: Suneel Randhawa



Unfortunately, the answer is yes. For voiced speech, the zero crossing rate
depends on the pitch of the speaker. If I remember rightly, pitch can vary
between about 40 and 200 Hz, with tall men generally having lower pitches
than small women. Pitch is also affected by stress and mood, so it can have
a large intra-speaker variance too.
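
As a rough illustration of the pitch dependence, here is a minimal Python
sketch. It assumes an idealised pure tone as a stand-in for voiced speech
(real voiced speech has harmonics, so its zero crossing rate is not simply
twice the pitch). The function names, the 8 kHz sampling rate and the test
frequencies are my own choices for the example, not anything from the
original experiments.

```python
import math

def zero_crossing_rate(samples, sample_rate):
    """Return the number of sign changes per second in a list of samples."""
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # Count a crossing whenever consecutive samples differ in sign.
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    duration = len(samples) / sample_rate
    return crossings / duration

def sine(freq_hz, sample_rate=8000, seconds=1.0):
    """Idealised 'voiced' signal: a pure tone (an assumption for illustration)."""
    n = int(sample_rate * seconds)
    # The small phase offset keeps samples from landing exactly on zero.
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + 0.1)
            for i in range(n)]

# Hypothetical low-pitched and high-pitched "speakers".
low = zero_crossing_rate(sine(100), 8000)
high = zero_crossing_rate(sine(200), 8000)
print("100 Hz tone:", low, "crossings/s")
print("200 Hz tone:", high, "crossings/s")
```

A pure tone at f Hz crosses zero roughly 2f times per second, so doubling
the pitch roughly doubles the rate, which is the speaker dependence
described above.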

That said, one thing zero crossing rate can help with is determining
whether speech is voiced or unvoiced. This may help the classifier to
decide between voiced and unvoiced phonemes. So my opinion, for what it's
worth, is that zero crossing rate may help a bit, but it is definitely not
a feature to be relied upon.
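
The voiced/unvoiced idea can be sketched along the following lines. This is
only a toy, under stated assumptions: a low-frequency tone stands in for a
voiced frame, seeded white noise stands in for an unvoiced (fricative-like)
frame, and the threshold of 1500 crossings per second is a hypothetical
value picked for this example, not a recommended setting.

```python
import math
import random

SAMPLE_RATE = 8000        # assumed sampling rate in Hz
FRAME_LEN = 800           # 100 ms frame at 8 kHz
ZCR_THRESHOLD = 1500.0    # hypothetical threshold, crossings per second

def zero_crossing_rate(samples, sample_rate):
    """Return the number of sign changes per second in a list of samples."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) / sample_rate)

def looks_unvoiced(frame):
    """Crude decision: a high zero crossing rate suggests an unvoiced frame."""
    return zero_crossing_rate(frame, SAMPLE_RATE) > ZCR_THRESHOLD

# Stand-in for a voiced frame: a 120 Hz tone (assumption for illustration).
voiced_frame = [math.sin(2 * math.pi * 120 * i / SAMPLE_RATE + 0.1)
                for i in range(FRAME_LEN)]

# Stand-in for an unvoiced frame: white Gaussian noise, seeded for repeatability.
rng = random.Random(0)
unvoiced_frame = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

print("voiced frame flagged unvoiced?  ", looks_unvoiced(voiced_frame))
print("unvoiced frame flagged unvoiced?", looks_unvoiced(unvoiced_frame))
```

On real speech the two classes overlap far more than in this toy, which is
why the rate is better treated as one weak cue among several than as a
feature to rely on.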

I don't know of any studies that detail zero crossing rate between speakers
for phonemes, but one place where there may be information on inter-speaker
variance is in voice activity detection (VAD) algorithms, as many VAD
algorithms use zero crossing rate as a feature. Check out some papers on
end-point detection. One to start with would be a paper by J. Haigh
and J. Mason in Eurospeech-93, which will have further references in it.

Cheers

John Openshaw


PS I tried to mail you directly, but your address seems to be incorrectly
configured... 




