Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!torn!alf.uwaterloo.ca!watserv2.uwaterloo.ca!mspundsa
From: mspundsa@coulomb.uwaterloo.ca (Mark Stephen Pundsack)
Subject: Re: Continuous phonetic speech recognition using HMM's?
Message-ID: <D3pFpr.It5@watserv2.uwaterloo.ca>
Sender: news@watserv2.uwaterloo.ca
Nntp-Posting-Host: babbage.uwaterloo.ca
Organization: University of Waterloo
References: <D3LB8r.9IC@watserv2.uwaterloo.ca> <AJR.95Feb7210045@compute.demon.co.uk>
Date: Wed, 8 Feb 1995 23:25:03 GMT
Lines: 48

In article <AJR.95Feb7210045@compute.demon.co.uk>,
Tony Robinson <ajr@demon.co.uk> wrote:
>In article <D3LB8r.9IC@watserv2.uwaterloo.ca> mspundsa@coulomb.uwaterloo.ca (Mark Stephen Pundsack) writes:
>
>> I'd like to use HMMs for continuous speech recognition, but I was having
>> some difficulties figuring out how to use the HMM continuously instead 
>> of doing some kind of break detection.
>
<snip>
>
>> Has anyone made continuous phone based speech recognizers using just
>> an HMM for the grammer?  I don't really want to try a stack based decoder
>> or other methods yet.
>
>Yes - you can do this for less than a few thousand words.  As you say,
>it is far better to get the standard beamwidth pruning time syncronous
>decodings working first before delving into anything else.

What are the limitations of using HMMs for more than a few thousand words?
My original problems came from trying to recognize very large vocabularies,
such as entire grammars with 10,000+ words.  I've gotten recognition
working for 500 words, but it simply checks one word at a time.  I'd like
to avoid the single-word-at-a-time method and recognize whole sentences
instead.  This still worked fine with 500 words and 64MB of memory on a
Sparc.  The problem is that the larger the grammar, the more memory the
HMM needs, and it becomes harder to evaluate the whole thing in one
big pass.
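For what it's worth, here's a rough sketch of what I understand the
beamwidth-pruned, time-synchronous Viterbi decoding you mention to look
like, on a toy two-state HMM.  All the state names, probabilities, and the
beam value are made up for illustration; a real recognizer would of course
run this over phone models composed into a word/grammar network:

```python
def viterbi_beam(obs_loglik, log_trans, beam=10.0):
    """Time-synchronous Viterbi search with beamwidth pruning.

    obs_loglik: list over frames of dicts {state: log P(obs_t | state)}
    log_trans:  dict {(prev_state, state): log transition probability}
    beam:       drop hypotheses more than `beam` below the frame best
    """
    # First frame: every emitting state starts as a live hypothesis.
    active = dict(obs_loglik[0])
    for frame in obs_loglik[1:]:
        new = {}
        # Extend every surviving hypothesis by one frame.
        for (prev, cur), lt in log_trans.items():
            if prev in active and cur in frame:
                score = active[prev] + lt + frame[cur]
                if cur not in new or score > new[cur]:
                    new[cur] = score
        # Beam pruning: keep only states near the current frame's best,
        # so the active set stays small even for a huge state space.
        best = max(new.values())
        active = {s: v for s, v in new.items() if v >= best - beam}
    winner = max(active, key=active.get)
    return winner, active[winner]

# Toy example (invented numbers): two states, observations that look
# like 'a' for two frames, then like 'b'.
trans = {('a', 'a'): -0.1, ('a', 'b'): -2.0,
         ('b', 'b'): -0.1, ('b', 'a'): -2.0}
obs = [{'a': -0.5, 'b': -3.0},
       {'a': -0.5, 'b': -3.0},
       {'a': -3.0, 'b': -0.5}]
best_state, best_score = viterbi_beam(obs, trans, beam=10.0)
```

The point of the beam is that memory and time are proportional to the
number of *active* hypotheses per frame, not the total grammar size, which
is presumably what makes the 10,000+ word case tractable at all.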

Is it feasible to use a strictly HMM approach to 10,000+ word recognition?
I know that isn't the popular method since people like to optimize
further, but I just want to know if it is possible.

I would think that most applications using very large vocabularies
break the HMM into smaller parts and use some other decoder to handle
the grammar.  When this is done for continuous speech, how do you know
what boundaries to use for the smaller HMMs?

>
>
>Tony Robinson
>
>At home: ajr@compute.demon.co.uk
>My first real post from my own news feed!

I can feel the web expanding! :)

Thanks,
Mark
