Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!news.sei.cmu.edu!cis.ohio-state.edu!math.ohio-state.edu!scipio.cyberstore.ca!vanbc.wimsey.com!unixg.ubc.ca!quartz.ucs.ualberta.ca!news.ucalgary.ca!hill
From: hill@salab1.psych.ucalgary.ca (David Hill)
Subject: Re: extracting fundamen. freq. from sonogram (?)
Summary: It's a tough problem
Message-ID: <Oct7.060306.25992@acs.ucalgary.ca>
Sender: david hill
Date: Fri, 7 Oct 1994 06:03:06 GMT
References: <andrew.781422453@bettong>
Organization: U of Calgary, Canada
Keywords: pitch extraction fundamental frequency harmonics
Lines: 45

In article <andrew.781422453@bettong>,
Andrew McClure <andrew@cs.uwa.oz.au> wrote:
>Hi
>
>I've written a wavelet analyser to create a logarithmically based
>sonogram but am keen to learn of ideas for removing the upper harmonics.
>
>There are some obvious methods, like finding the first peak and thresholding
>against the highest valued frequency, but any ideas are welcomed.
>
>cheers
>
>andrew
>
>

Pitch extraction is a well-known hard problem in speech analysis.  Extracting
the fundamental (first harmonic) looks as though it ought to be easy, but
in practice, what with noise, variability, non-continuity (due to voiceless
sounds), and pitch doubling or aperiodic vocal fold vibration, no one
has a good solution.  One thing you can try is looking at the harmonic
spacing (which is the same as the fundamental frequency).  As the resonant
peaks vary, harmonics come and go, but if you can detect the harmonic
patterning wherever it occurs, extract the spacing, and take the smallest
value of the spacing, you should have a usable estimate of the fundamental.
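
A minimal sketch of the harmonic-spacing idea, assuming you have already
picked the harmonic peak frequencies (in Hz) out of one analysis frame.  The
function name and the min_f0 guard (which throws away implausibly small
spacings caused by spurious double peaks) are my own assumptions for
illustration, not an established method:

```python
def f0_from_harmonic_peaks(peak_hz, min_f0=50.0):
    """Estimate the fundamental as the smallest plausible spacing
    between adjacent spectral peaks in one frame.

    peak_hz : peak frequencies in Hz (some harmonics may be missing)
    min_f0  : assumed lower bound on a believable spacing, in Hz
    """
    peaks = sorted(peak_hz)
    # Differences between neighbouring peaks; where harmonics are missing
    # the gap is a multiple of the fundamental, so we want the smallest.
    spacings = [hi - lo for lo, hi in zip(peaks, peaks[1:])]
    plausible = [s for s in spacings if s >= min_f0]
    if not plausible:
        return None  # fewer than two peaks, or nothing believable
    return min(plausible)


if __name__ == "__main__":
    # Harmonics 2, 3, 4, 7 and 8 of a 120 Hz voice; 1, 5 and 6 are missing.
    print(f0_from_harmonic_peaks([240.0, 360.0, 480.0, 840.0, 960.0]))
```

Taking the bare minimum is fragile in noise; a histogram or median of the
spacings near the minimum would probably behave better on real speech.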

You could also try a broad-band analysis (broad enough to give good time
resolution) and look at the amplitude modulation of any energy in as many
bands as possible.  If you correlate across all channels, you could pick
up the voicing frequency (pitch, fundamental, ...) by taking the reciprocal of
the interval between successive peaks, where the peaks are determined by some
fairly sophisticated operation on all channel energy modulation.  That's
almost certainly closely related to how the human perceptual system does
it.
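
A rough sketch of the cross-channel idea, assuming you already have one
energy envelope per broad-band channel, all sampled at the same rate.  For
the "fairly sophisticated operation" I have simply summed the envelopes and
used the autocorrelation peak as the interval between successive peaks; that
is a stand-in chosen for brevity, and the function names, the 60-400 Hz
search range and the synthetic test data are all assumptions:

```python
import math


def f0_from_channel_envelopes(envelopes, frame_rate, f0_min=60.0, f0_max=400.0):
    """Estimate voicing frequency from per-channel energy envelopes.

    envelopes  : list of equal-length lists, one envelope per channel
    frame_rate : envelope samples per second
    """
    n = len(envelopes[0])
    # Pool the modulation from all channels into one signal.
    summed = [sum(ch[i] for ch in envelopes) for i in range(n)]
    mean = sum(summed) / n
    x = [v - mean for v in summed]

    lag_min = max(1, int(frame_rate / f0_max))
    lag_max = min(n - 1, int(frame_rate / f0_min))
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, averaged over the overlapping samples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None  # no positive correlation found: treat as unvoiced
    # Reciprocal of the interval (in seconds) between successive peaks.
    return frame_rate / best_lag


def synthetic_envelopes(f0, frame_rate, n, gains):
    """Hypothetical test data: every channel modulated at f0."""
    return [[g * (1.0 + math.cos(2.0 * math.pi * f0 * i / frame_rate))
             for i in range(n)] for g in gains]


if __name__ == "__main__":
    env = synthetic_envelopes(100.0, 8000.0, 800, [1.0, 0.5, 0.25])
    print(f0_from_channel_envelopes(env, 8000.0))
```

On real speech you would want to run this frame by frame and add a voicing
decision, since the best lag is meaningless during voiceless sounds.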

Good luck, and send me a copy of your working code please ;-)

david
-------

-- 
David R. Hill, CS Dept., U. Calgary         | Imagination is more
Calgary, AB, Canada T2N 1N4 Ph: 403-220-6315| important than knowledge.
hill@cpsc.ucalgary.ca       Fx: 403-282-6778|         (Albert Einstein)
NeXTMail: hill@trillium.ab.ca (Preferred)   | Kill your television!
