Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!news.sprintlink.net!hookup!olivea!charnel.ecst.csuchico.edu!csusac!csus.edu!netcom.com!marcels
From: marcels@netcom.com (Marcel Schoppers)
Subject: Re: Recog. Accuracy; IN3, CSS, CVA, etc.
Message-ID: <marcelsD1wsE7.2Fn@netcom.com>
Organization: Netcom Online Communications Services (408-241-9760 login: guest)
References: <3eetbj$1vah@tequesta.gate.net>
Date: Thu, 5 Jan 1995 01:34:55 GMT
Lines: 39

I posted a similar question a few weeks ago and got only one reply to the
effect that I should not expect to find any published studies...  I then
found that I'd saved one from IEEE Computer, March '94, which compared Media
Vision's ExecuVoice, Creative Labs VoiceAssist, Covox VoiceBlaster, and
Digital Soup's Rover.  The article preferred VoiceAssist for convenience,
but ExecuVoice got the most words right, with VoiceAssist second.  That
study was done entirely in a quiet room, unfortunately.  (A useful feature
of VoiceAssist is that it warns you if two words sound too similar.)

Needing voice recognition in noisy environments, I have been examining the
Speech Systems (Phonetic Engine) PE500.  When I asked the vendor about noise
rejection they replied that this was heavily dependent on the microphone used
(noise-cancelling mikes a few inches in front of the user's mouth work best),
then added that their PE500 was demoed at Comdex where (despite all the
commotion all around) it worked very reliably.  Even if that is only half
true, it is still remarkable.  My guess is that this was due in part to the
mike, and in part to the fact that the system is continuous-speech, so it
typically works with longer strings than individual words and gets some
noise rejection from that.

A few more details:  the PE500 is a board plus software, for the PC.  It
recognizes continuous speech and is speaker-independent (no training
required).  You define the target recognitions with an English grammar that
the machinery translates into target waveforms; the vocabulary is about
40,000 words.  When recognition occurs you get out a character string, as if
the words had been typed at a keyboard.  The catch is that you have to write
your own programs to
control the software that comes with the board and to interpret the character
strings it returns.  The PE500 plus mike and software developer's kit sells
for about $1200.
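
To give a feel for the "interpret the character strings" part, here is a
minimal sketch in Python.  It assumes only what is said above, namely that
the recognizer hands back plain text as if typed.  The command grammar
("move left three", "stop") and every name in the code are made up for
illustration; none of it comes from the PE500 developer's kit.

```python
# Hypothetical sketch: the recognizer is assumed to return plain text.
# The grammar and all names below are invented, not taken from any vendor kit.

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def interpret(recognized):
    """Turn a recognized string such as 'move left three' into a command tuple.

    Returns None when the utterance falls outside the made-up grammar.
    """
    words = recognized.lower().split()

    # "move <left|right> <number word>"
    if len(words) == 3 and words[0] == "move" and words[1] in ("left", "right"):
        steps = NUMBER_WORDS.get(words[2])
        if steps is not None:
            return ("move", words[1], steps)

    # "stop"
    if words == ["stop"]:
        return ("stop",)

    return None
```

The point is only that the board's output is ordinary text, so ordinary
string handling is enough to turn it into program actions; a real grammar
would need something less ad hoc.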

If you believe my hunch that longer utterances improve recognition accuracy,
you may be able to find a continuous-speech system that doesn't make you do
the programming (see this BB's FAQ posting for a list of possibilities).  Or,
even easier, you can trick the usual one-word recognizers by teaching them
longer phrases instead, and choosing the phrases for easy distinction.  I'd
say, based on the IEEE Computer article, try VoiceAssist but replace their
mike with a headset.
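
One rough way to pre-screen a phrase list for "easy distinction" is sketched
below in Python.  It is a loud assumption that similarity of spelling stands
in for similarity of sound, which it only roughly does; a real check would
compare pronunciations, and the recognizer's own warning (as in VoiceAssist)
is the better guide.  The function name and the 0.7 threshold are
hypothetical choices for illustration.

```python
# Crude screen for confusable phrases. ASSUMPTION: spelling similarity is
# used as a stand-in for acoustic similarity, which is only approximately true.
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(phrases, threshold=0.7):
    """Return (phrase_a, phrase_b, score) for pairs whose spelling is similar.

    score is difflib's similarity ratio in [0, 1]; pairs scoring at or above
    the (arbitrary) threshold are flagged as candidates for rewording.
    """
    flagged = []
    for a, b in combinations(phrases, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            flagged.append((a, b, score))
    return flagged


if __name__ == "__main__":
    candidates = ["open file", "open mail", "shut down the machine"]
    for a, b, score in confusable_pairs(candidates):
        print(f"too similar? {a!r} vs {b!r} (ratio {score:.2f})")
```

With the sample list, "open file" and "open mail" get flagged, which would
be the cue to reword one of them into something longer and more distinct.
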
Marcel

