Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!warwick!doc.ic.ac.uk!agate!howland.reston.ans.net!math.ohio-state.edu!news.acns.nwu.edu!news.eecs.nwu.edu!bovik
From: bovik@eecs.nwu.edu (James Salsman)
Subject: Re: speech, non-speech discrimination
Message-ID: <CIs1oB.FDq@eecs.nwu.edu>
Summary: Acoustics, frequency-domain transformations, and microphone technology
Keywords: Speech Processing
Sender: usenet@eecs.nwu.edu
Organization: BRI
References: <63442@ogicse.ogi.edu>
Date: Wed, 29 Dec 1993 03:28:59 GMT
Lines: 43

In article <63442@ogicse.ogi.edu>,
  L Don Colton <ldcolton@chico.cse.ogi.edu> wrote:

> For my ph.d. research proficiency exam, i am interested in identifying
> speech versus non-speech (background noises, etc) in telephone
> waveforms. The idea of course is that by first identifying the time
> periods that are speech, subsequent stages of recognition can be more
> productive.

Speech is not exactly a discrete property of a sample of sound.  You
can use the acoustical properties of three or more microphones to
isolate sound that originates at a particular point or region of
space, based on speed-of-sound calculations: the same sound reaches
each microphone at a slightly different time.
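
As a rough illustration of the speed-of-sound idea, here is a minimal
sketch in Python that estimates the arrival-time difference between
two microphones by brute-force cross-correlation and converts it to a
difference in path length.  The function names, the 343 m/s figure
(air at roughly 20 C), and the assumption of two clean, equally
sampled channels are mine, not part of any particular system.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed: dry air at about 20 C


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means the sound reached `other` later than `ref`.
    Brute-force cross-correlation; fine for short illustrative signals.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference(lag_samples, sample_rate):
    """Convert a sample lag into a difference in travel distance (metres)."""
    return lag_samples / sample_rate * SPEED_OF_SOUND


# Hypothetical data: the same pulse, arriving 3 samples later at mic B.
mic_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
extra_distance = path_difference(lag, sample_rate=8000)
```

Each pair of microphones gives one such path difference, which
constrains the source to a hyperbola (in a plane); with three or more
microphones the constraints can be intersected to pick out a point or
region.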

For a more detailed "discrimination machine" that is supposed to
declare which intervals of a sound, if any, are speech, you will
probably find yourself doing a lot of operations in the frequency
domain.  This means that you need good tools for accurately
representing the sound wave in the frequency domain.  You can identify
non-speech events based on attributes derived from their behavior in
the frequency domain.  Representations in the time domain tell you
little more than amplitude directly, but a sophisticated time-domain
recognizer can probably do as well as a simple frequency-domain
discriminator.  (The first difference of the time-domain signal is a
very crude indicator of how much high-frequency content it has.)
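
To make the time-domain versus frequency-domain comparison concrete,
here is a small sketch with one feature of each kind: a spectral
centroid computed from a naive DFT, and the energy of the first
difference relative to the energy of the frame.  The feature choices,
function names, and the 8 kHz test tones are my own assumptions for
illustration; a real discriminator would need more than this.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(frame, sample_rate):
    """Power-weighted mean frequency in Hz: a simple frequency-domain feature."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * p for k, p in enumerate(power)) / total


def difference_energy_ratio(frame):
    """Energy of the first difference divided by energy of the frame.

    For a pure tone at angular frequency w (radians/sample) this is
    roughly 4*sin(w/2)**2, so it rises with frequency without any transform.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    diff_energy = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
    return diff_energy / energy


def tone(freq_hz, sample_rate=8000, length=256):
    """Hypothetical test signal: a sine wave sampled at `sample_rate`."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(length)]


low, high = tone(250), tone(2000)
low_ratio = difference_energy_ratio(low)
high_ratio = difference_energy_ratio(high)
```

The difference ratio separates the two tones, but it collapses the
whole spectrum into one number, which is why I call it crude.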

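I know of a speech processing system that uses the filter with
z-domain transfer function

                             1 - 0.97z^-1

to "pre-emphasize" speech, that is, to boost the high frequencies
relative to the low ones before further analysis.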
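
In the time domain, 1 - 0.97z^-1 is the difference equation
y[n] = x[n] - 0.97 x[n-1].  A minimal sketch follows; the function
name and the choice to treat the sample before the first one as zero
are my assumptions, not details of the system I mentioned.

```python
def pre_emphasize(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1], taking x[-1] as 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


# A constant (DC) input is almost cancelled after the first sample,
# while a signal alternating at the Nyquist rate is nearly doubled.
dc = pre_emphasize([1.0, 1.0, 1.0, 1.0])
nyquist = pre_emphasize([1.0, -1.0, 1.0, -1.0])
```

The gain is |1 - 0.97| = 0.03 at zero frequency and 1 + 0.97 = 1.97 at
half the sampling rate, so the filter tilts the spectrum upward.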

Also, consider the grid with which you are quantizing the frequency
domain; what assumptions about speech are inherent there?
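
One way to see the grid you are implicitly choosing: with a plain DFT
the bins are evenly spaced at (sample rate / frame length) Hz, and the
frame length also fixes how long a stretch of sound each spectrum
summarizes.  The sketch below just computes those numbers; the 8 kHz
rate and 256-sample frame are assumed values for illustration.

```python
def dft_grid(sample_rate, frame_length):
    """Return (bin spacing in Hz, frame duration in s, bin centre frequencies)."""
    spacing = sample_rate / frame_length
    duration = frame_length / sample_rate
    centres = [k * spacing for k in range(frame_length // 2 + 1)]
    return spacing, duration, centres


# Assumed example: 8 kHz sampling with 256-sample frames.
spacing, duration, centres = dft_grid(8000, 256)
```

A uniform grid like this treats every hertz as equally important and
treats the signal as unchanging over each frame; both are assumptions
about speech worth questioning.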

>Don Colton               ___e     Center for Spoken Language Understanding
>ldcolton@cse.ogi.edu   _`\ <;     Oregon Graduate Institute, 20000 NW Walker
>bicycle commuter______(_)/_(_)____P.O.Box 91000, Portland, OR 97291-1000
 ^^^^^^^^^^^^^^^^

Wise.

:James
