Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!purdue!lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!cs.utexas.edu!news.sprintlink.net!dispatch.news.demon.net!demon!uknet!newsfeed.ed.ac.uk!leeds.ac.uk!news
From: gavin@scs.leeds.ac.uk (G E Churcher)
Subject: Re: Speech Recognition For Visual Basic
Message-ID: <1995Aug11.101450.25587@leeds.ac.uk>
NNTP-Posting-Host: cspci06.leeds.ac.uk
X-Mailer: Mozilla 1.1N (Windows; I; 16bit)
Content-Type: text/plain; charset=us-ascii
MIME-Version: 1.0
Date: Fri, 11 Aug 1995 11:14:49 +0100 (BST)
References: <DD1B7v.K1@cogsci.ed.ac.uk>
    <1995Aug10.083940.1041@leeds.ac.uk>
Lines: 75
Content-Transfer-Encoding: 7bit

Hi all,


In reply to your queries, I shall tell you a bit more about the
PE500 from SSI.

First of all, SSI can be contacted at:

Speech Systems, Inc.
2945 Center Green Court South
Boulder
CO 80301-2275, USA

Tel: 303.938.1110
FAX: 303.938.1874

Unfortunately I do not have a sales email address for them.

The version I have is speaker-independent and cannot be trained without having
an explicit contract with SSI. From what I've heard, training is a pretty big
task. So, at the moment you are stuck with American Male/Female.

The PE500 uses a _proprietary interactive speech card_ (ISC) which incorporates
an acoustic processor based on the Motorola 56001 Digital Signal Processor (DSP).
This converts analog voice input from a noise-cancelling microphone into digital
form for the speech processor, which comprises a phonetic encoder and decoder
(memory-resident on the host PC).

The system is controlled by a pre-compiled syntax which looks rather like a
context-free grammar. It does provide a few other useful mechanisms, though:
recursion, optionality and rule/word iteration are quite nicely supported.
The syntax does NOT allow any form of weighting mechanism, although future
versions possibly may; you will have to contact SSI about this. I found that
for large, high-perplexity syntaxes with a large vocab, some form of weighting
is helpful. But there is a lot you can still do without it.
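
To give a flavour of what recursion, optionality and iteration buy you in a
CFG-like syntax, here is a minimal Python sketch. It is NOT SSI's syntax
notation or compiler: the tuple encoding ("seq", "alt", "opt", "plus", "ref"),
the rule names and the ATC-style phrases are all made up for illustration.

```python
# Toy recogniser for a CFG-like syntax. The encoding below is hypothetical
# and is not the PE500 syntax format.
DIGIT_WORDS = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")

RULES = {
    "COMMAND": ("seq", ("ref", "CALLSIGN"), ("opt", "please"),
                ("alt", ("seq", "climb", "to", ("ref", "LEVEL")),
                        ("seq", "descend", "to", ("ref", "LEVEL")))),
    # Word/rule iteration: one or more digits after the airline name.
    "CALLSIGN": ("seq", "speedbird", ("plus", ("ref", "DIGIT"))),
    "LEVEL": ("seq", "flight", "level", ("ref", "DIGITS")),
    # Right recursion: a digit optionally followed by more DIGITS.
    "DIGITS": ("seq", ("ref", "DIGIT"), ("opt", ("ref", "DIGITS"))),
    "DIGIT": ("alt",) + DIGIT_WORDS,
}


def match(node, words, i, rules):
    """Yield every index j such that node matches words[i:j]."""
    if isinstance(node, str):                  # terminal word
        if i < len(words) and words[i] == node:
            yield i + 1
        return
    kind, *args = node
    if kind == "ref":                          # named rule, may recurse
        yield from match(rules[args[0]], words, i, rules)
    elif kind == "seq":                        # all parts, in order
        ends = {i}
        for part in args:
            ends = {k for j in ends for k in match(part, words, j, rules)}
        yield from sorted(ends)
    elif kind == "alt":                        # any one alternative
        for part in args:
            yield from match(part, words, i, rules)
    elif kind == "opt":                        # zero or one occurrence
        yield i
        yield from match(args[0], words, i, rules)
    elif kind == "plus":                       # one or more occurrences
        seen, frontier = set(), {i}
        while frontier:
            frontier = {k for j in frontier
                        for k in match(args[0], words, j, rules)} - seen
            seen |= frontier
        yield from sorted(seen)


def accepts(utterance, rules=RULES, start="COMMAND"):
    """True if the whole utterance is covered by the start rule."""
    words = utterance.split()
    return any(j == len(words)
               for j in match(("ref", start), words, 0, rules))
```

With these rules, accepts("speedbird one two climb to flight level three five
zero") is True, while accepts("speedbird climb to flight level three") is False
because the callsign needs at least one digit. Note that this naive matcher
would loop forever on a left-recursive rule; a real syntax compiler would have
to deal with that.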

A nice feature is the ability to switch between different (pre-compiled)
syntaxes on the fly. It is possible to apply multiple syntaxes to the same
utterance, although this does take more processor time and hence is not too
practical for time-critical, real-time systems.
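
The idea of switching syntaxes, or trying several against one utterance, can
be sketched as follows. This is only an illustration in Python: compiled
regular expressions stand in for pre-compiled syntaxes, and the Recogniser
class, its method names and the two example syntaxes are all hypothetical,
not the SSI DLL or VBX interface.

```python
import re

# Stand-ins for pre-compiled syntaxes: each pattern is compiled once, up front.
SYNTAXES = {
    "taxi": re.compile(r"taxi to runway (one|two|three)"),
    "level": re.compile(
        r"(climb|descend) to flight level (one|two|three)( (one|two|three))*"),
}


class Recogniser:
    """Holds several compiled syntaxes and one currently active syntax."""

    def __init__(self, syntaxes, active):
        self.syntaxes = syntaxes
        self.active = active

    def switch(self, name):
        """Make another pre-compiled syntax the active one."""
        if name not in self.syntaxes:
            raise KeyError(name)
        self.active = name

    def recognise(self, utterance):
        """Check the utterance against the active syntax only."""
        return self.syntaxes[self.active].fullmatch(utterance) is not None

    def recognise_any(self, utterance, names):
        """Check the utterance against several syntaxes in turn.

        The work grows with the number of syntaxes tried, which is the
        time penalty mentioned above.
        """
        return [n for n in names
                if self.syntaxes[n].fullmatch(utterance) is not None]
```

For example, a Recogniser started with "taxi" active rejects "climb to flight
level one two" until switch("level") is called, whereas recognise_any() with
both names finds the matching syntax at the price of trying each one.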

The VBX (for Visual Basic) is quite easy to use, but does not offer the 
versatility of the DLLs provided. The SDK is well supported with a nice and
_complete_ Windows interface which *supposedly* does everything for you.

A final word about the lexicon - the developer must specify all the words
used by the app. I have been told that there are internal representations
for c. 400,000 words, and any other words are converted using a phonetic
algorithm. I'm not sure how effective this is, but it is possible to
change the internal representation with a bit of fiddling.
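
The lookup-then-fallback behaviour described above might look roughly like
this in Python. I do not know how SSI actually does it: the phoneme symbols,
the tiny LEXICON, the one-letter-one-sound LETTER_TO_SOUND table and the
overrides argument are all assumptions made for the sake of the example.

```python
# Hypothetical stored pronunciations (the real lexicon is said to hold
# roughly 400,000 entries).
LEXICON = {
    "climb": ["k", "l", "ay", "m"],
    "level": ["l", "eh", "v", "ax", "l"],
}

# Deliberately naive letter-to-sound table, one sound per letter.
LETTER_TO_SOUND = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f", "g": "g",
    "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "aa", "p": "p", "q": "k", "r": "r", "s": "s", "t": "t", "u": "ah",
    "v": "v", "w": "w", "x": "ks", "y": "y", "z": "z",
}


def pronounce(word, overrides=None):
    """Return a phoneme list for word.

    Order of preference: developer override, stored lexicon entry,
    then the naive letter-to-sound fallback.
    """
    word = word.lower()
    if overrides and word in overrides:
        return overrides[word]
    if word in LEXICON:
        return LEXICON[word]
    return [LETTER_TO_SOUND[ch] for ch in word if ch in LETTER_TO_SOUND]
```

A word such as "climb" comes straight from the stored entry, whereas an
unknown word such as "zog" falls through to the letter rules. Applying the
same naive rules to "climb" would give a sounded "b", which is why the
quality of the fallback, and the ability to override an entry, matters.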

So, in summary:

	+ good development interface and support
	+ fairly comprehensive control DLLs and VBXs
	+ supposedly unlimited vocabulary
	- have to define vocab as a CFG
	- no weighting mechanism for syntax rules
	- proprietary card raises cost for user base
	+/- ideally suited for small-medium sub-languages with low perplexity
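
To make the perplexity and weighting points a little more concrete, here is a
small Python sketch of the standard definition (perplexity = 2 to the power of
the entropy in bits) applied to a single choice point in a syntax. The two
probability lists are invented numbers, not measurements from the PE500.

```python
import math


def perplexity(probabilities):
    """Return 2**H for one choice point, where H is its entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# Four alternatives with no weighting: every branch is equally likely.
unweighted = [0.25, 0.25, 0.25, 0.25]

# The same four alternatives with (invented) weights favouring one branch.
weighted = [0.7, 0.1, 0.1, 0.1]
```

Here perplexity(unweighted) is 4, i.e. the recogniser faces four equally
likely continuations, while perplexity(weighted) comes out lower, at roughly
2.6. That reduction is what a weighting mechanism would offer for large
syntaxes.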


I have a paper available from anon FTP which you may wish to look at.
It details some of our findings from using commercial SDKs to develop
a corpus-based grammar model for a loosely-defined sub-language, that of
Air Traffic Control.

ftp://agora.leeds.ac.uk/scs/doc/reports/1995/
file: 95.20.ps.Z	(that's UNIX (un)compress)

If you have any further queries, don't hesitate to mail me. I would like to hear
about your work and which systems you have experience with or are intending to use.

Gavin.

