Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!swrinde!pipex!uknet!info!atlantic!eeopensh
From: eeopensh@atlantic (J. P. Openshaw)
Subject: Re: Phoneme Recognition using formants
X-Nntp-Posting-Host: atlantic.swan.ac.uk
Message-ID: <D2JnoB.FKK@info.swan.ac.uk>
Sender: news@info.swan.ac.uk
Organization: Swansea Info Server
X-Newsreader: TIN [version 1.2 PL2]
References: <3esvis$g7f@cs6.rmc.ca> <slerner-1101951737210001@slerner.gte.com> <3f1sof$b4q@agate.berkeley.edu>
Date: Tue, 17 Jan 1995 09:57:47 GMT
Lines: 32

John Lazzaro (lazzaro@snap.CS.Berkeley.EDU) wrote:
: In article <slerner-1101951737210001@slerner.gte.com>,
: Sol Lerner <slerner@gte.com> wrote:
: >
: >If it were that easy, I could suitably quantize the Peterson & Barney
: >scattergraphs and look up the phoneme.  The problem IS locating the
: >formants reliably.
: >

: Then again, maybe the problem isn't modeling the quasi-steady-state
: parts of speech like vowel formants, but modeling the transients where
: so much of the information content is ...


Why, then, do most people, myself included, stick mainly to frame-based
cepstral analysis? Is it because it works 'well enough' to keep the
research grants coming!? I must admit I am not the greatest fan of
cepstral-based analysis, and its performance in noisy conditions is
awful, but I haven't really seen anything much better.

As for formant-based analysis, is it the real 'holy grail', or just
another feature with major limitations, like any other form of
analysis? How well will formants work in noise, under speaker stress,
or with any sort of intra-speaker variance due to time, colds, etc.?


Anyway, enough conjecture...

John Openshaw, e-mail j.p.openshaw@swansea.ac.uk


