Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!news.sprintlink.net!mv!turing.mv.com!user
From: buteau@turing.mv.com (Brandon L. Buteau)
Subject: Re: wavelets and NN
Message-ID: <buteau-160994123658@turing.mv.com>
Nntp-Posting-Host: turing.mv.com
Sender: usenet@mv.mv.com (System Administrator)
Organization: MV Communications, Inc.
Date: Fri, 16 Sep 1994 17:36:58 GMT
References: <35987j$r4n@kodak.rdcs.Kodak.COM>
Followup-To: comp.ai.neural-nets
Lines: 45

In article <35987j$r4n@kodak.rdcs.Kodak.COM>, martin@belteshazzar.Kodak.COM
(Craig Martin) wrote:

> 
> -- 
> I am currently working on a thesis project in speech recognition that
> uses wavelet transforms to generate the feature vector and then training
> some of the vectors into associative memory (NN).  I have recorded
> 15 people saying the vowels from the Peterson/Barney group to use as
> my sound data base.  Some of these are trained into the associative
> memory and some are presented to the network for recognition.
>  
> My problem is that I cannot seem to generate good, distinct, feature
> vectors using the wavelet (DAUB4 as shown in numerical recipes).  My
> approach has been to parse the vowel from the beginning of each sound
> into a 1024 byte buffer, normalize the data, feed it into the wavelet
> algorithm, and then add up the coefficients in each section of the
> resulting wavelet transform (first 2, second 2, next 4, next 8...etc).
> There is some similarity in the summed coefficients for a given sound
> spoken by different people, but not enough to give me distinct feature
> vectors for a given sound.
>  
> I have tried the higher order wavelet transforms from the nr book with
> no improvement.
>  
> Is there anyone out there that can tell me what I am doing wrong?????
> It is probably a lack of understanding on my part as to how to interpret
> the wavelet transform.
>  

I don't know how to evaluate your understanding of the wavelet transform,
but the method you are using to combine wavelet coefficients (summing all
the coefficients at each "section" of the transform) is definitely
counter-productive.  As I understand the transform, each section or level
of the result effectively describes the amplitude of the basis wavelet
frequency for each sample of time.  In your example, the final level with
512 coefficients represents 512 samples of the highest frequency
information uniformly distributed from the start to the end of the original
sample.  Lower levels represent coarser samples of lower frequency
information.  If you add all the coefficients at each level, you are
throwing away the distribution of frequencies over time!  That time
localization is precisely what makes the wavelet transform valuable for
applications like yours.  Perhaps you could extract the peak (absolute value)
coefficient from each level along with its offset in time from the
beginning of the level and use these values to train the NN. 
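To make the suggestion concrete, here is a rough sketch in Python of the
peak-per-level idea.  It uses a simple Haar transform as a stand-in for the
NR DAUB4 routine (the structure of the levels is the same), and the function
names `haar_dwt` and `peak_features` are just illustrative, not from any
particular library.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet transform of a length-2^n signal.
    Returns the single approximation coefficient plus a list of
    detail-coefficient arrays ordered coarsest first (lengths 1, 2, 4, ...)."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation)
        diff = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass (detail)
        details.append(diff)                       # finest level appended first
        x = avg
    details.reverse()                              # coarsest level first
    return x[0], details

def peak_features(signal):
    """For each level, keep the peak-magnitude coefficient (signed) and
    its normalized time offset within that level, instead of summing."""
    approx, details = haar_dwt(signal)
    feats = [approx]
    for d in details:
        i = int(np.argmax(np.abs(d)))
        feats.append(d[i])          # peak coefficient at this level
        feats.append(i / len(d))    # where in time it occurred, in [0, 1)
    return np.array(feats)

# Example on a 1024-sample buffer: 10 detail levels, so the
# feature vector has 1 + 2*10 = 21 entries.
sig = np.sin(2 * np.pi * 8 * np.arange(1024) / 1024)
fv = peak_features(sig)
print(fv.shape)   # (21,)
```

For a 1024-sample vowel buffer this yields a 21-element feature vector that
still records roughly where in the utterance each frequency band peaks,
which is the information the per-level sums discard.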
