Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!warwick!uknet!pipex!sunic!psinntp!psinntp!verbex1!mwong
From: mwong@rad.verbex.com (Maurice K. Wong)
Subject: Re: Verbex Listen for Windows
Message-ID: <1993Jun28.162851.21130@rad.verbex.com>
Organization: Verbex Voice Systems, Inc.
References: <1993Jun24.194411.16746@rad.verbex.com> <BESHERS.93Jun25085614@tune.cs.columbia.edu>
Date: Mon, 28 Jun 1993 16:28:51 GMT
Lines: 50

In article <BESHERS.93Jun25085614@tune.cs.columbia.edu> beshers@cs.columbia.edu (Clifford Beshers) writes:
>Let's clarify that.  Verbex is not a dictation system, like
>Dragon Dictate.  It supports command and control.  You can say
>"Please open the second editor window" in one breath, because it
>is trying to recognize that phrase as a whole, rather than trying
>to sort out which words.
>
>Verbex is a nice, inexpensive product for controlling windows,
>but it does not have nearly the power of DD.  No commercially
>available speech recognition systems support full dictation and
>continuous speech, that I know of.
>
>--
>-----------------------------------------------
>Clifford Beshers
>450 Computer Science Department
>Columbia University
>New York, NY 10027
>Office:  (212) 939-7060
>Fax:     (212) 666-0140
>Email:   beshers@cs.columbia.edu
>

Thanks for clarifying that the Verbex system is NOT a dictation system,
and that no continuous-speech dictation system is currently available.
I suspect a system that can truly do unconstrained dictation using
continuous speech will not be available for quite some time.

I would also like to clarify the actual mechanics of recognition.
While it is true that a sentence such as "Please open the second
editor window" appears to be recognized "as a whole", technically this
is not quite the case.  The Verbex system internally stores one (or
more) separate models for each word.  If you say "Please open the
graphics window" instead, the system matches "please", "open", "the",
and "window" against the same word models it uses for the first
phrase; the only new model involved is the one for "graphics".  In
other words, the whole phrase is NOT stored as a pattern; individual
words are, so once the word models exist, they can be recognized in
different combinations within phrases.

By contrast, you can "trick" an isolated-word recognition system into
recognizing a phrase spoken continuously in one breath by treating the
phrase as a single word.  In that case the whole phrase is stored as
one model internally and truly is recognized as a whole.  (This means,
of course, that you have to train each phrase separately, as if it
were a single word.)
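To make the word-model versus whole-phrase distinction concrete, here
is a toy Python sketch.  It is not meant to show how the Verbex system
works internally beyond what is described above: the matching uses
dynamic time warping (DTW), a classic template-matching technique
chosen here only for illustration, and the 1-D "feature" templates,
the sample utterance, and the function names (phrase_template,
recognize, words_needing_training, phrase_needs_training) are all
hypothetical.

```python
from itertools import chain

# Hypothetical word models: each word is a short template of 1-D feature
# values. The numbers are invented for illustration; a real recognizer
# uses multi-dimensional acoustic features and richer models.
WORD_MODELS = {
    "please": [1.0, 1.2, 1.1],
    "open": [3.0, 3.4],
    "the": [0.5],
    "second": [5.0, 5.2, 4.9],
    "editor": [2.0, 2.5, 2.2],
    "window": [4.0, 4.1, 3.8],
}


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def phrase_template(phrase, models):
    """Build a phrase template by concatenating the per-word models."""
    return list(chain.from_iterable(models[w] for w in phrase.lower().split()))


def recognize(utterance, phrases, models):
    """Pick the allowed phrase whose concatenated template is closest."""
    return min(phrases,
               key=lambda p: dtw_distance(utterance, phrase_template(p, models)))


def words_needing_training(phrase, models):
    """Word-model approach: only words without a model must be trained."""
    return [w for w in phrase.lower().split() if w not in models]


def phrase_needs_training(phrase, phrase_models):
    """Whole-phrase 'trick': any phrase not stored verbatim needs training,
    even if every word in it has been trained before."""
    return phrase.lower() not in phrase_models


if __name__ == "__main__":
    new_phrase = "please open the graphics window"
    print(words_needing_training(new_phrase, WORD_MODELS))  # ['graphics']

    # Train one new word model (hypothetical values) and reuse the rest.
    models = {**WORD_MODELS, "graphics": [6.0, 6.3]}
    grammar = ["please open the second editor window", new_phrase]
    # A made-up utterance: the new phrase, slightly stretched and noisy.
    utterance = [1.0, 1.1, 1.2, 1.1, 3.1, 3.4, 0.6, 6.1, 6.2, 6.3, 4.0, 3.9]
    print(recognize(utterance, grammar, models))
```

In this sketch, adding "graphics" to the grammar costs one new word
template, while the whole-phrase approach would need a fresh training
pass over the entire sentence, even though four of its five words are
already known.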

Maurice Wong
