Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!warwick!uknet!pipex!bnr.co.uk!bnrgate!nott!torn!howland.reston.ans.net!usenet.ins.cwru.edu!magnus.acs.ohio-state.edu!csn!att!princeton!phoenix.Princeton.EDU!lseltzer
From: lseltzer@phoenix.Princeton.EDU (L. Seltzer)
Subject: Re: Algorithm for pitch detection
Message-ID: <1993May18.162011.3214@Princeton.EDU>
Originator: news@nimaster
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: phoenix.princeton.edu
Organization: Princeton University
References: <1993May17.200530.16738@en.ecn.purdue.edu>
Date: Tue, 18 May 1993 16:20:11 GMT
Lines: 41

>I am working on F0 contour extraction.  The algorithm I am using
>is autocorrelation with center clipping.  However, the result seems to need
>some post-processing to make the contour less 'noisy'.  Is

The book entitled Pitch Determination of Speech Signals reports that
you can get 3% accuracy with an autocorrelation pitch detector.
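For readers who haven't built one, here is a minimal sketch of the kind of detector the original poster describes: center clipping followed by a search for the autocorrelation peak over a lag range. The "clip and shift" clipper, the 60-400 Hz search range, the 30% clipping ratio and the voicing threshold are all assumptions chosen for illustration. They are not taken from the book, and this sketch does not reproduce its 3% figure.

```python
import math

def center_clip(frame, ratio=0.3):
    # Clip-and-shift center clipping: samples with |x| <= C become 0, the rest
    # move toward zero by C. C = ratio * peak is an assumed, tunable threshold.
    peak = max(abs(x) for x in frame)
    c = ratio * peak
    out = []
    for x in frame:
        if x > c:
            out.append(x - c)
        elif x < -c:
            out.append(x + c)
        else:
            out.append(0.0)
    return out

def autocorr_f0(frame, fs, fmin=60.0, fmax=400.0, clip_ratio=0.3, voicing=0.3):
    """Estimate F0 in Hz for one frame, or return None if it looks unvoiced.

    fmin, fmax, clip_ratio and voicing are hypothetical defaults.
    """
    y = center_clip(frame, clip_ratio)
    n = len(y)
    r0 = sum(v * v for v in y)  # zero-lag energy of the clipped frame
    if r0 == 0.0:
        return None
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(y[i] * y[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak periodicity relative to the zero-lag energy is treated as unvoiced.
    if best_lag is None or best_r < voicing * r0:
        return None
    # Integer lags quantize the estimate: F0 can only take values fs / lag.
    return fs / best_lag
```

The integer-lag search is the simplest possible choice. It also puts a hard floor on resolution, which matters for the accuracy question raised further down.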

There is a short article by Ken Steiglitz in IEEE Transactions
on Acoustics, Speech, and Signal Processing (I think it's from the
late 1970s) in which he describes an optimal f0 detection algorithm
using trigonometric curve fitting.
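I haven't reproduced Steiglitz's algorithm here. The sketch below only illustrates the general idea of curve fitting: at each candidate frequency on a grid, fit a*cos + b*sin to the frame by least squares and keep the frequency with the smallest residual. Because frequency is a continuous parameter rather than an integer lag, resolution is set by the grid step (0.5 Hz here, an assumption). It fits a single sinusoid only. On real voiced speech it would tend to lock onto the strongest harmonic, so a practical version would need a harmonic model.

```python
import math

def fit_sinusoid_f0(frame, fs, fmin=60.0, fmax=400.0, step=0.5):
    """Grid search: least-squares fit of a*cos(wn) + b*sin(wn) at each
    candidate frequency; return the frequency with the smallest residual."""
    energy = sum(x * x for x in frame)
    best_f, best_resid = None, float("inf")
    for k in range(int((fmax - fmin) / step) + 1):
        f = fmin + k * step
        w = 2.0 * math.pi * f / fs
        cc = ss = cs = xc = xs = 0.0
        for n, x in enumerate(frame):
            c, s = math.cos(w * n), math.sin(w * n)
            cc += c * c
            ss += s * s
            cs += c * s
            xc += x * c
            xs += x * s
        # Solve the 2x2 normal equations [cc cs; cs ss] [a; b] = [xc; xs].
        det = cc * ss - cs * cs
        if det <= 1e-12:
            continue
        a = (xc * ss - xs * cs) / det
        b = (xs * cc - xc * cs) / det
        # Residual energy = total energy minus energy captured by the fit.
        resid = energy - (a * xc + b * xs)
        if resid < best_resid:
            best_f, best_resid = f, resid
    return best_f
```

For a clean tone between grid points, the estimate lands on the nearest grid frequency rather than on the nearest fs/lag value.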

It would be interesting to hear other people's experiences, but I
think really accurate pitch determination is quite difficult.  It
might be useful to look at the spectrogram of the signal before
analyzing it, and even to break the signal into smaller sections
based on what the spectrogram shows (this is even before dividing
the signal into frames).  I'm not at all sure you can process a long
signal with the same filter settings throughout and get accurate
results from a pitch detector.
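The framing step, and one common answer to the original question about noisy contours, can be sketched as follows. A running median over the per-frame F0 values removes isolated octave jumps. Median smoothing is my suggestion, not something from the sources above. The frame length, hop and window width are hypothetical parameters, and a wide window will also flatten genuine fast pitch movements. Whatever smoother you use, it is worth stating its settings explicitly.

```python
def split_frames(signal, frame_len, hop):
    # Overlapping analysis frames; a trailing partial frame is dropped.
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def median_smooth(contour, width=5):
    # Running median over an F0 contour. None marks unvoiced frames; these are
    # kept as None and excluded from their neighbours' windows.
    half = width // 2
    out = []
    for i, value in enumerate(contour):
        if value is None:
            out.append(None)
            continue
        window = sorted(v for v in contour[max(0, i - half):i + half + 1]
                        if v is not None)
        out.append(window[len(window) // 2])  # upper middle for even counts
    return out

# Hypothetical contour with one octave error (202 Hz) in the middle.
print(median_smooth([100, 101, 202, 102, 103], width=3))
# prints [101, 101, 102, 103, 103]
```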

F0 estimators perform well enough for the quality we hear in
compressed speech when we reconstruct an LPC signal, and that
quality is noticeably degraded.

I mentioned 3% accuracy with autocorrelation.  That is not
good enough for applications such as ethnomusicology.  In spite of
the claims people make in ethnomusicology articles about the
results from commercial pitch trackers (they don't even know what's
inside the box, in terms of smoothers, etc.), I don't know whether
anyone has a system up and running that can determine f0 to within
half a percent, which is more like what you need for really
accurate work.
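To put the 3% and half-percent figures in musical terms, and to show one reason a plain autocorrelation detector struggles, here is a small calculation. It converts a relative error to cents (1200 cents per octave). It also computes the step between adjacent integer lags, assuming the detector does no interpolation between lags.

```python
import math

def percent_to_cents(pct):
    # A relative frequency error of pct percent, expressed in cents
    # (1200 cents per octave, 100 per equal-tempered semitone).
    return 1200.0 * math.log2(1.0 + pct / 100.0)

def lag_step_percent(fs, f0):
    # Autocorrelation picks an integer lag L ~ fs / f0. Moving from lag L to
    # L + 1 changes the estimate from fs / L to fs / (L + 1), a ratio of
    # (L + 1) / L, i.e. a step of 100 / L percent (no interpolation assumed).
    lag = round(fs / f0)
    return 100.0 / lag

for pct in (3.0, 0.5):
    print(f"{pct}% error = {percent_to_cents(pct):.1f} cents")
for fs in (8000, 16000, 48000):
    print(f"fs = {fs} Hz, f0 = 200 Hz: lag step = {lag_step_percent(fs, 200.0):.2f}%")
```

With these numbers, 3% is about 51 cents (roughly half a semitone) and 0.5% is about 8.6 cents. A 200 Hz voice sampled at 8 kHz moves in 2.5% steps between adjacent lags, so at that rate a raw integer-lag estimate alone cannot guarantee half-percent accuracy.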

It would be interesting to hear from anyone who has worked on this
more than I have.  Maybe this discussion should also be in
comp.dsp.




