Archive-name: comp-speech-faq/part3
Last-modified: 1995/01/19
COMP.SPEECH FAQ POSTING - PART 3/3
[Note: this document has been automatically extracted from a WWW site:
http://www.speech.su.oz.au/comp.speech
This may introduce some formatting errors.]
===========================================================================
FAQ SECTION 5 - Speech Synthesis
Q5.1: WHAT IS SPEECH SYNTHESIS?
Speech synthesis is the task of transforming written input to spoken
output. The input can either be provided in a graphemic/orthographic
or a phonemic script, depending on its source.
_________________________________________________________________
Q5.2: HOW CAN SPEECH SYNTHESIS BE PERFORMED?
There are several algorithms. The choice depends on the task they're
used for. The easiest way is to just record the voice of a person
speaking the desired phrases. This is useful if only a restricted
volume of phrases and sentences is used, e.g. messages in a train
station, or schedule information via phone. The quality depends on the
way recording is done.
More sophisticated, but lower in quality, are algorithms which split
the speech into smaller pieces. The smaller those units are, the fewer
of them are needed, but the quality also decreases. An often used unit
is the phoneme, the smallest linguistic unit. Depending on the language
used there are about 35-50 phonemes in western European languages,
i.e. there are 35-50 single recordings. The problem is combining them,
as fluent speech requires fluent transitions between the elements. The
intelligibility is therefore lower, but the memory required is small.
A solution to this dilemma is using diphones. Instead of splitting at
the transitions, the cut is done at the center of the phonemes,
leaving the transitions themselves intact. For a set of 20 phonemes
this gives about 400 elements (20*20), more for larger phoneme sets,
and the quality increases.
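As a rough illustration of the diphone idea, the following Python sketch
turns a phoneme sequence into the diphone units needed to speak it, and
computes the upper bound on the inventory size. The function names, the
"_" silence symbol and the phoneme spellings are assumptions made for
this example and do not come from any of the systems listed in this FAQ.

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into diphone unit names.

    Each diphone runs from the middle of one phoneme to the middle of
    the next, so n phonemes padded with silence need n + 1 diphones.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def inventory_size(num_phonemes):
    """Upper bound on distinct diphones: every ordered phoneme pair."""
    return num_phonemes * num_phonemes


# "cat" written as the hypothetical phoneme string k, ae, t
print(to_diphones(["k", "ae", "t"]))  # ['_-k', 'k-ae', 'ae-t', 't-_']
print(inventory_size(20))             # 400
```

In practice not every phoneme pair occurs in a language, so a real
inventory is usually smaller than this bound.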
The longer the units become, the more elements there are, and the
quality increases along with the memory required. Other units which
are widely used are half-syllables, syllables, words, or combinations
of them, e.g. word stems and inflectional endings.
_________________________________________________________________
Q5.3: WHAT ARE SOME GOOD REFERENCES/BOOKS ON SYNTHESIS?
The following are good introductory books/articles.
* Douglas O'Shaughnessy -- Speech Communication: Human and Machine
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
* D. H. Klatt, "Review of Text-To-Speech Conversion for English",
Jnl. of the Acoustical Society of America (JASA), v82, Sept. 1987,
pp 737-793.
* "Talking Machines, Theories, Models and Designs" Eds, G. Bailly &
C. Benoit (Elsevier: North Holland)
* I. H. Witten. Principles of Computer Speech. (London: Academic
Press, Inc., 1982).
* John Allen, M. Sharon Hunnicutt and Dennis H. Klatt, "From Text to
Speech: The MITalk System", Cambridge University Press, 1987.
_________________________________________________________________
Q5.4: WHAT SPEECH SYNTHESIS SOFTWARE/HARDWARE IS AVAILABLE?
Please email any updates, corrections or additions to the following
list. The range of commercially available synthesis software is
growing rapidly so any help in keeping up to date will be appreciated.
Orator Text-to-Speech Synthesizer
* Platform: SUN SPARC, Decstation 5000. Written in C, and
therefore portable to other UNIX platforms. Some successful ports:
HP, RS-6000, PC-Unix [Linux].
* Description: Sophisticated speech synthesis package. Has text
preprocessing (for abbreviations, numbers), acronym rules, and
human-like spelling routines. Natural-sounding synthesis based on
demisyllable concatenation.
Has high accuracy for pronunciation of names of people, places and
businesses in America; good accuracy for English text; rules for
stress and intonation marking; various methods of user control and
customization at most stages of processing.
A new version of the ORATOR system is under development. Both
ORATOR and this new "ORATOR II" system are capable of very good
general text synthesis. The ORATOR II system has a more
natural-sounding voice.
* Hardware: Runs on common SPARC or Decstation workstations, using
their internal audio output capability. Recommend at least 16M of
memory.
* Availability and Pricing: Contact Bellcore's Licensing Office
(1-800-527-1080) or email Anthony Lindsey alin1@panix.com
Text to phoneme program (1)
* Platform: unknown
* Description: Text to phoneme program. Based on Naval Research
Lab's set of text to phoneme rules.
* Availability: by anonymous ftp
+ ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
Text to phoneme program (2)
* Platform: unknown
* Description: Text to phoneme program.
* Availability: by anonymous ftp
+ ftp://wuarchive.wustl.edu/mirrors/unix-c/utils/phoneme.c
Text to phoneme program (3)
* Description: A public domain version of the same Naval Research
Lab text to phoneme rules.
* Availability: By anonymous ftp
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/english2phoneme.shar
Text to speech program
* Description: An implementation of the Klatt phoneme to waveform
speech synthesiser.
* Availability: By anonymous ftp
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/klatt-0.02.tar.Z
"Speak" - a Text to Speech Program
* Platform: Sun SPARC
* Description: Text to speech program based on concatenation of
pre-recorded speech segments. A function library can be used to
integrate speech output into other code.
* Hardware: SPARC audio I/O
* Availability: by anonymous ftp
+ ftp://wilma.cs.brown.edu/pub/speak.tar.Z
TheBigMouth - a Text to Speech Program
* Platform: NeXT
* Description: Text to speech program based on concatenation of
pre-recorded speech segments. NeXT equivalent of "Speak" for Suns.
* Availability: try NeXT archive sites such as
sonata.cc.purdue.edu.
TextToSpeech Kit
* Platform: NeXT Computers
* Description: The TextToSpeech Kit does unrestricted conversion
of English text to synthesized speech in real-time. The user has
control over speaking rate, median pitch, stereo balance, volume,
and intonation type. Text of any length can be spoken, and
messages can be queued up, from multiple applications if desired.
Real-time controls such as pause, continue, and erase are
included. Pronunciations are derived primarily by dictionary
look-up. The Main Dictionary has nearly 100,000 hand-edited
pronunciations which can be supplemented or overridden with the
User and Application dictionaries. A number parser handles numbers
in any form. A letter-to-sound knowledge base provides
pronunciations for words not in the Main or customized
dictionaries. Dictionary search order is under user control.
Special modes of text input are available for spelling and
emphasis of words or phrases. The actual conversion of text to
speech is done by the TextToSpeech Server. The Server runs as an
independent task in the background, and can handle up to 50 client
connections.
* Misc: The TextToSpeech Kit comes in two packages: the Developer
Kit and the User Kit. The Developer Kit enables developers to
build and test applications which incorporate text-to-speech. It
includes the TextToSpeech Server, the TextToSpeech Object, the
pronunciation editor PrEditor, several example applications,
phonetic fonts, example source code, and developer documentation.
The User Kit provides support for applications which incorporate
text-to-speech. It is a subset of the Developer Kit.
* Hardware: Uses standard NeXT Computer hardware.
* Cost:
+ TextToSpeech User Kit: $175 CDN ($145 US)
+ TextToSpeech Developer Kit: $350 CDN ($290 US)
+ Upgrade from User to Developer Kit: $175 CDN ($145 US)
* Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca
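The dictionary search order described for the TextToSpeech Kit (user
and application dictionaries overriding a main dictionary, with
letter-to-sound rules as the fallback) can be sketched in Python as
below. This is only an illustration of the lookup idea, not the Kit's
actual interface: the function names, the sample entries and the
phoneme symbols are all invented for the example.

```python
def make_pronouncer(dictionaries, letter_to_sound):
    """Return a lookup function that tries each dictionary in order.

    `dictionaries` is a list of plain dicts mapping words to
    pronunciations; the first one containing the word wins.
    `letter_to_sound` is the fallback for words found in none of them.
    """
    def pronounce(word):
        key = word.lower()
        for dictionary in dictionaries:
            if key in dictionary:
                return dictionary[key]
        return letter_to_sound(key)
    return pronounce


# Hypothetical data; the phoneme symbols are invented for the example.
main_dict = {"read": "r iy d", "speech": "s p iy ch"}
user_dict = {"read": "r eh d"}  # user override of a main entry


def spell_out(word):
    # Crude stand-in for a letter-to-sound rule base: one symbol per letter.
    return " ".join(word)


pronounce = make_pronouncer([user_dict, main_dict], spell_out)
print(pronounce("read"))    # r eh d (user dictionary wins)
print(pronounce("Speech"))  # s p iy ch
print(pronounce("zyx"))     # z y x (fallback)
```

Changing the order of the list passed to make_pronouncer changes which
dictionary takes priority.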
SGI Developers Toolbox Synthesiser
* Platform: SGI
* Description: The SGI Developer Toolbox 4.0 CDROM contains a
basic public domain text-to-speech program in the publics/speak
directory. The directory includes man pages and source.
* Availability: on the SGI Developer Toolbox 4.0 CDROM
rsynth
* Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
Irix4.x, Linux)
* Description: Public domain text-to-speech system assembled from a
variety of sources. It supports CMU and "beep" format dictionaries
and now utilises stress marks in the dictionary in synthesising
intonation.
* Price: Free
* Availability: by anonymous ftp from
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/rsynth-2.0.tar.Z
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/rsynth-2.0.tar.gz
SENSYN speech synthesizer
* Platform: PC, Mac, Sun, and NeXT
* Rough Cost: $300
* Description: This formant synthesizer produces speech waveform
files based on the (Klatt) KLSYN88 synthesizer. It is intended for
laboratory and research use. Note that this is NOT a
text-to-speech synthesizer, but creates speech sounds based upon a
large number of input variables (formant frequencies, bandwidths,
glottal pulse characteristics, etc.) and would be used as part of
a TTS system. Includes full source code.
* Availability: Sensimetrics Corporation
64 Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com
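To give a feel for what a formant synthesizer computes from inputs such
as formant frequencies and bandwidths, here is a minimal Python sketch
of a chain of two-pole digital resonators driven by an impulse train.
It follows the general resonator form used in Klatt-style synthesizers
but is not the KLSYN88 or SENSYN code; the sample rate, pitch and
formant values are illustrative assumptions, and a real synthesizer has
many more parameters (glottal source shape, parallel branches, etc.).

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(pitch_hz, duration_s, sample_rate_hz):
    """Very crude voiced source: one unit impulse per pitch period."""
    period = int(round(sample_rate_hz / pitch_hz))
    n = int(round(duration_s * sample_rate_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


sample_rate = 10000  # Hz, assumed for the example
signal = impulse_train(100, 0.05, sample_rate)
# Illustrative (frequency, bandwidth) pairs in Hz, not measured values.
for freq, bw in [(700, 60), (1200, 90), (2600, 150)]:
    signal = resonate(signal, freq, bw, sample_rate)
print(len(signal))  # number of output samples
```

Changing the formant frequencies over time, rather than keeping them
fixed as here, is what lets such a synthesizer produce different speech
sounds.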
spchsyn.exe
* Platform: PC?
* Availability: By anonymous ftp as a self extracting DOS archive.
+ ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
* Requirements: May require special TI product(s), but all source
is there.
CSRE: Canadian Speech Research Environment
* Platform: PC
* Cost: Distributed on a cost recovery basis.
* Description: CSRE is a software system which includes in
addition to the Klatt speech synthesizer, SPEECH ANALYSIS and
EXPERIMENT CONTROL SYSTEM. A paper about the whole package can be
found in:
+ Jamieson D.G. et al, "CSRE: A Speech Research Environment",
Proc. of the Second Intl. Conf. on Spoken Language
Processing, Edmonton: University of Alberta, pp. 1127-1130.
* Hardware: Can use a range of data acquisition/DSP hardware.
* Availability: For more information contact
Krystyna Marciniak
email march@uwovax.uwo.ca
Tel (519) 661-3901 Fax (519) 661-3805.
For technical information email ramji@uwovax.uwo.ca
* Note: A more detailed description is given in Section 1.9 on
speech environments.
Eloquence (currently an alpha release)
* Platform: Windows and Solaris
* Description: Software based text-to-speech package. Generates
waveforms completely algorithmically instead of by concatenating
waveforms, for maximum flexibility and naturalism. For instance,
when the user requests a deeper voice, the software simulates a
larger vocal tract, instead of simply pitch-shifting samples.
Uses high-level linguistic parsing, which obviates the need for a
huge dictionary. Handles numbers, acronyms, currency, etc.
Includes a set of annotation symbols, for placing stress on
particular words, expressing excitement/boredom, etc. Also allows
phonetic input. The final version, including support for Windows
DDE and OLE and UNIX Sockets, will be released by the end of 1994.
Produces male and female voices for General American English.
Dialects under development include Alabama, Brooklyn, and Boston.
* Price: $5000 (unconfirmed)
* Availability:
Eloquent Technology, Inc.
2389 North Triphammer Road
Ithaca, NY 14850
Ph: (607) 266-7025 Fax: (607) 266-7030
Email: eti@plab.dmll.cornell.edu
JSRU
* Platform: UNIX and PC
* Cost: 100 pounds sterling (from academic institutions and
industry)
* Description: A C version of the JSRU system, Version 2.3 is
available. It's written in Turbo C but runs on most Unix systems
with very little modification. A Form of Agreement must be signed
to say that the software is required for research and development
only.
* Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)
Klatt-style synthesiser
* Platform: Unix
* Cost: Free
* Description: Software posted to comp.speech in late 1992.
* Availability: By anonymous ftp from the comp.speech archives
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/klatt-0.02.tar.Z
DECtalk
* Description: Speech synthesis hardware and software. Detailed
information on DECtalk and other DEC products is available on a
World-Wide Web site.
+ http://www.digital.com/info.html
For specific information on DECtalk, check out this www url:
+ http://www.digital.com/archive/pub/Digital/info/Customer-Update/940620005.txt
Speech Manager and PlainTalk
* Platform: Macintosh
* Cost: Free
* Description: Apple's new text-to-speech system extension(s) that
enable applications (listed below) to perform text-to-speech
conversion. The Speech Manager runs on most Macs, but PlainTalk
(and the high quality voices) requires a 68020 Mac or better.
* Availability: By anonymous ftp from:
+ ftp://ftp.apple.com/dts/mac/sys.soft/speech
There are 3 files in this directory:
6273632 Aug 14 22:51 macintalk-pro.hqx
PlainTalk Text-To-Speech 1.0 speech synthesizer extension
(includes Female Voice, Compressed); TTS Female Voice;
TTS Male Voice; and TTS Male Voice, Compressed. Requires
68020 or better!
370108 Aug 13 04:30 speech-manager-docs.hqx
Apple DocViewer format (Inside Macintosh style, no
installation instructions - just drag everything onto
your closed System Folder).
262569 Aug 7 07:01 speech-manager.hqx
Speech Manager 1.1.1 (includes Marvin's voice) and
MacInTalk Voices 1.1.1 (9 more voices). Runs most Macs.
Various Mac Speech Output Applications
* Platform: Macintosh
* Cost: Free (except for At Ease)
* Description: Some of the Speech Manager aware text-to-speech
(TTS) applications, etc. are listed below (there are more on the
Apple Developer CD-ROMs).
Application, etc.   Source              Comments
_________________   __________________  _________________________________________________
AddressSpeech       info-mac            4D talking address book (from Speech Pack 2.0)
At Ease 2.0         MacWarehouse        Friendly desktop that speaks file names
At Ease 2.0 WG      MacWarehouse        Friendly desktop that speaks file names
Eliza 3.1           AOL                 Talking Eliza (Rogerian psych therapist)
FB speech           Inside Basic Mag,   FutureBasic demo
                    volume 3, no. 6.
FB Speech demo      Inside Basic Mag,   FutureBasic demo
                    volume 3, no. 7.
Fortune 1.1         info-mac            Like a talking UNIX fortune command - slick
Homer 0.92d9        zaphod.ee.pitt.edu  GUI IRC client, assign nicks voices - slick
MacMessage 1.0      FirstClassBBS       Share talking messages/customizable startup
Say                 info-mac            MPW Tool which converts standard input to speech
ScriptTools 1.2     info-mac            Write AppleScript scripts to say text messages
Siege Watch 1.01f   info-mac            Wryly political speaking clock
SoToSpeak1.0.0b10   info-mac            Two voice conversation (also see Fortune's About)
Speak It!           info-mac            Type in a message and have it spoken
Speaker 1.11        info-mac            Simple text file editor, speaks on CR, macros
Speecher 1.2.1      info-mac            Customizable word pronunciation/substitution
SpeechManagerdemo   info-mac            Command line interface, C source, aka -explorer
Speech Pack 2.0     info-mac            4th Dimension external, add speech to database
SpeechUnitEx        info-mac            Pascal source code for speech in Lab 7
speek-02b           info-mac            Speech XCMD for HyperCard
TalkingClockPro2.0  info-mac            AppleScriptable talking clock extension (2.0b0)
TeachText 7.2       AV Mac              Apple's talking TeachText (simple editor w/QT)
Tex-Edit 1.9        AOL                 Talking word processor, McSink like, modeming
VoiceDemo 1.0.1     info-mac            Bare bones phrase talker
Welcome!v1.3.1      info-mac            A talking Welcome to Macintosh startup
?                   ?                   Talking Plug-In-Module for MS Word 5,
                                        experimental, unsupported, buggy, beware!
Speech Rhythms      AOL                 A cool text file for one of the above apps
* Sources:
+ AOL = America Online
+ info-mac = {ftp sumex-aim.stanford.edu, ftp
wuarchive.wustl.edu, et al.}
+ MacWarehouse = (800) 255-6227
* Misc: Apple's work in spoken language technologies and systems
is described in:
+ Lee, Kai-Fu. "The Conversational Computer: An Apple
Perspective." (Keynote Speech) In Proc. Eurospeech in Berlin,
September, 1993.
MacinTalk
* Platform: Macintosh
* Cost: Free
* Description: Formant based speech synthesis. There is also a
program called "tex-edit" which apparently can pronounce English
sentences reasonably using Macintalk.
* Note: MacinTalk doesn't run reliably on Macintoshes with new
sound hardware under the latest OS (System 7.1 w/HUD 2.0). More
recent software is listed above.
* Availability: By anonymous ftp from many archive sites (have a
look on archie if you can). tex-edit is on many of the same sites.
Try
+ ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk.hqx
+ ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk-stack.hqx
+ ftp://wuarchive.wustl.edu/mirrors2/info-mac/app/tex-edit-15.hqx
Monologue by Creative Labs
* Platform: PC Windows plus SoundBlaster 16
* Cost: $99.00 or free with some MultiMedia packages
* Description: Phoneme based speech synthesis software which
provides output on Sound Blaster compatible audio cards. It
includes a dictionary of words that are "exceptions" together with
a dictionary manager for modifying those words. It can be used
as a stand alone program with Windows' Clipboard or as a DDE
server dynamically linked (DLL) to a program you write.
* Contact:
Creative Labs Inc.
1901 McCarthy Boul, Milpitas, CA 95035, USA
Tel: 408-428-6622 Fax: 408-428-6633 BBS: 408-428-6660
OR Creative Technology Ltd.
67 Ayer Rajah Crescent #03-18, Singapore 0513
Tel: 65-870-0433 Fax: 65-773-0353 BBS: 65-776-2423
Lernout & Hauspie Text-To-Speech SDK
* Platform: IBM-Compatible
* Description: The L&H Text-to-Speech software developers kit lets
you integrate text-to-speech technology with your own or
existing PC applications under Microsoft Windows 3.1. This
software converts written text into clear human
sounding synthetic speech.
* Requirements: IBM-compatible PC 386 DX (33 MHz) or higher, 8 MB
RAM, MS DOS 5.0 (or higher), MS Windows 3.1 (or higher), Compiler
and linker: Microsoft(R) Visual C++ or Borland C++, Windows(TM)
3.1 compatible sound card, preferably 16 bit e.g. Soundblaster,
Windows Sound System, Pro Audio Spectrum
* Price: Unconfirmed $1,999 per copy, and $499 per each additional
language (American English, French, German, or Spanish).
* Contact: USA (617) 932-4118
Tinytalk
* Platform: PC
* Description: Shareware speech 'screen reader' package which
is used by many blind users.
* Availability: By anonymous ftp
+ ftp://handicap.shel.isc-br.com/speech
Get the files ttexe166.zip and ttdoc166.zip.
Narrator - narrator.device
* Platform: Amiga
* Description: Formant based speech synthesis. Includes an
English-to-phoneme translation library, and a SPEAK: pseudo-device
for speech output.
* Hardware: Standard Amiga hardware
* Availability: Part of AmigaOS
Infovox Product Range
* Description: Multilingual Text-to-speech systems, languages
available: American English, British English, German, French,
Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
Finnish.
* Product name: INFOVOX 500, PC BOARD
+ Product description: Half length expansion board for IBM PC,
XT, AT, PS/2 model 30 or compatible personal computers. The
board can also be connected via the serial port. Language and
control program for downloading into RAM or mounted on
EPROMs.
+ Platform: for IBM PC, XT, AT, PS/2 model 30 or compatible
* Product name: INFOVOX 600, OEM BOARD
+ Product description: OEM board built with CMOS IC's. Language
and control program are stored in on-board fixed memory.
+ Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600
Baud
* Product name: INFOVOX 700, DESKTOP UNIT
+ Product description: Desktop unit with built in Infovox 600
to be connected to any computer or terminal via an RS 232-C
serial interface. Built in loudspeaker and rechargeable
battery for 4 hours use, and control knobs for continuous
control of speech volume and speed.
+ Platform: any
* Product name: INFOVOX 650, OEM BOARD
+ Product description: OEM-board built with CMOS IC's. Language
and control program are stored in on-board memory.
+ Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600
Baud
* Product name: INFOVOX 750, DESKTOP UNIT
+ Product description: Desktop unit with built in Infovox 650
to be connected to any computer or terminal via an RS 232-C
serial interface. Built in loudspeaker and rechargeable
battery for 5 hours use, and a control knob for continuous
control of speech volume.
+ Platform: any
* Misc: Infovox multi-lingual Text-to-Speech Technologies can
interface with Apple's PlainTalk System. It enables Apple Third
party developers to write application software with synthetic
speech output using their usual Apple Plain Talk Text-to-Speech
interface. Software already written for the English speaking
market using Apple Plain Talk can be now distributed worldwide,
provided message strings are translated.
* Contact:
Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069
S-171 02 Solna, Sweden
Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
email: tts-sales@infovox.se
SIMTEL-20
* The following is a list of speech related software available from
SIMTEL-20 and its mirror sites for PCs.
* The SIMTEL internet address is WSMR-SIMTEL20.Army.Mil
[192.88.110.20] Try looking at your nearest archive site first.
[Note: problems have been reported in accessing this site - does
anyone know a new address?]
Directory PD1: MSDOS.VOICE
Filename      Type  Length  Date    Description
==========================================================================
AUTOTALK.ARC  B      23618  881216  Digitized speech for the PC
CVOICE.ARC    B      21335  891113  Tells time via voice response on PC
HEARTYPE.ARC  B      10112  880422  Hear what you are typing, crude voice synth.
HELPME2.ARC   B       8031  871130  Voice cries out 'Help Me!' from PC speaker
SAY.ARC       B      20224  860330  Computer Speech - using phonemes
SPEECH98.ZIP  B      41003  910628  Build speech (voice) on PC using 98 phonemes
TALK.ARC      B       8576  861109  BASIC program to demo talking on a PC speaker
TRAN.ARC      B      39766  890715  Repeats typed text in digital voice
VDIGIT.ZIP    B     196284  901223  Toolkit: Add digitized voice to your programs
VGREET.ARC    B      45281  900117  Voice says good morning/afternoon/evening
_________________________________________________________________
===========================================================================
FAQ SECTION 6 - Speech Recognition
Q6.1: WHAT IS SPEECH RECOGNITION?
Automatic speech recognition is the process by which a computer maps
an acoustic speech signal to text.
Automatic speech understanding is the process by which a computer maps
an acoustic speech signal to some form of abstract meaning of the
speech.
_________________________________________________________________
Q6.2: HOW CAN I BUILD A VERY SIMPLE SPEECH RECOGNISER?
Doug Danforth provides a detailed account in article 253 in the
comp.speech archives. A summary is provided below. It is also
available by anonymous ftp
* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
QUICKY RECOGNIZER sketch:
Here is a simple recognizer that should give you 85% or better
recognition accuracy. The accuracy is a function of the words you have
in your vocabulary. Long distinct words are easy. Short similar words
are hard. You can get 98% or better on the digits with this recognizer.
Overview:
* Find the beginning and end of the utterance.
* Filter the raw signal into frequency bands.
* Cut the utterance into a fixed number of segments.
* Average data for each band in each segment.
* Store this pattern with its name.
* Collect training set of about 3 repetitions of each pattern
(word).
* Recognize unknown by comparing its pattern against all patterns in
the training set and returning the name of the pattern closest to
the unknown.
Many variations upon the theme can be made to improve the performance.
Try different filtering of the raw signal and different processing
methods.
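The overview above can be sketched in a few lines of Python. This is a
minimal illustration and not Doug Danforth's code: it assumes the
utterance has already been trimmed to its end points and filtered into
frequency bands, so each utterance arrives as a list of frames holding
one energy value per band. The class and function names, the number of
segments and the squared-distance measure are assumptions made for the
example.

```python
def make_pattern(frames, num_segments=8):
    """Average band energies over a fixed number of time segments.

    `frames` is a list of equal-length lists, one per time frame, each
    holding the energy in every frequency band for that frame.
    """
    if len(frames) < num_segments:
        raise ValueError("utterance is shorter than the number of segments")
    num_bands = len(frames[0])
    pattern = []
    for s in range(num_segments):
        start = s * len(frames) // num_segments
        stop = (s + 1) * len(frames) // num_segments
        chunk = frames[start:stop]
        for band in range(num_bands):
            pattern.append(sum(f[band] for f in chunk) / len(chunk))
    return pattern


def distance(p, q):
    """Squared Euclidean distance between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class QuickRecognizer:
    def __init__(self, num_segments=8):
        self.num_segments = num_segments
        self.templates = []  # list of (name, pattern) pairs

    def train(self, name, frames):
        """Store the pattern of one training repetition with its name."""
        self.templates.append((name, make_pattern(frames, self.num_segments)))

    def recognize(self, frames):
        """Return the name of the stored pattern closest to the unknown."""
        unknown = make_pattern(frames, self.num_segments)
        best = min(self.templates, key=lambda t: distance(t[1], unknown))
        return best[0]


# Synthetic two-band "utterances": energy mostly in band 0 or in band 1.
rec = QuickRecognizer(num_segments=2)
rec.train("low", [[1.0, 0.1]] * 6)
rec.train("high", [[0.1, 1.0]] * 6)
print(rec.recognize([[0.9, 0.2]] * 9))  # low
```

Because every utterance is reduced to the same number of segments, words
spoken at different speeds still produce patterns of the same length,
which is what makes the simple distance comparison possible.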
Q6.8 contains information on public domain speech recognition
software: Lotec and Myers' Hidden Markov Model software.
_________________________________________________________________
Q6.3: WHAT DOES SPEAKER DEPENDENT/ADAPTIVE/INDEPENDENT MEAN?
A speaker dependent system is developed to operate for a single
speaker. These systems are usually easier to develop, cheaper to buy
and more accurate, but not as flexible as speaker adaptive or speaker
independent systems.
A speaker independent system is developed to operate for any speaker
of a particular type (e.g. American English). These systems are the
most difficult to develop, most expensive and accuracy is lower than
speaker dependent systems. However, they are more flexible.
A speaker adaptive system is developed to adapt its operation to the
characteristics of new speakers. Its difficulty lies somewhere
between speaker independent and speaker dependent systems.
_________________________________________________________________
Q6.4: WHAT DOES SMALL/MEDIUM/LARGE/VERY-LARGE VOCABULARY MEAN?
The size of vocabulary of a speech recognition system affects the
complexity, processing requirements and the accuracy of the system.
Some applications only require a few words (e.g. numbers only), others
require very large dictionaries (e.g. dictation machines). There are
no established definitions, but as a rough guide:
* small vocabulary - tens of words
* medium vocabulary - hundreds of words
* large vocabulary - thousands of words
* very-large vocabulary - tens of thousands of words.
_________________________________________________________________
Q6.5: WHAT DOES CONTINUOUS SPEECH OR ISOLATED-WORD MEAN?
An isolated-word system operates on single words at a time - requiring
a pause between saying each word. This is the simplest form of
recognition to perform because the end points are easier to find and
the pronunciation of a word tends not to affect others. Thus, because the
occurrences of words are more consistent they are easier to recognise.
A continuous speech system operates on speech in which words are
connected together, i.e. not separated by pauses. Continuous speech is
more difficult to handle because of a variety of effects. First, it is
difficult to find the start and end points of words. Another problem
is "coarticulation". The production of each phoneme is affected by the
production of surrounding phonemes, and similarly the start and
end of words are affected by the preceding and following words. The
recognition of continuous speech is also affected by the rate of
speech (fast speech tends to be harder).
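For isolated words, finding the end points can be as simple as looking
for stretches of high energy separated by pauses. The Python sketch
below shows one such energy-threshold approach. The frame length,
threshold ratio and minimum pause length are assumed values chosen for
the example, and real systems usually add refinements (noise tracking,
zero-crossing rates) that are left out here.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples in each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_endpoints(samples, frame_len=80, threshold_ratio=0.1,
                   min_pause_frames=5):
    """Return (start, end) sample indices of each word-like region.

    A region starts at the first frame whose energy exceeds the
    threshold and ends once `min_pause_frames` quiet frames follow.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    regions = []
    start = None
    quiet = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # The last loud frame was i - quiet; end is exclusive.
                regions.append((start * frame_len,
                                (i - quiet + 1) * frame_len))
                start = None
                quiet = 0
    if start is not None:  # signal ended before a full pause was seen
        regions.append((start * frame_len,
                        (len(energies) - quiet) * frame_len))
    return regions


# Synthetic signal: silence, a long "word", a pause, a short "word", silence.
signal = [0.0] * 400 + [1.0] * 800 + [0.0] * 800 + [1.0] * 400 + [0.0] * 400
print(find_endpoints(signal))  # [(400, 1200), (2000, 2400)]
```

In continuous speech there are often no such pauses between words, which
is one reason this approach does not carry over.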
_________________________________________________________________
Q6.6: HOW IS SPEECH RECOGNITION PERFORMED?
A wide variety of techniques are used to perform speech recognition,
and there are many types and levels of speech recognition, analysis
and understanding.
Typically speech recognition starts with the digital sampling of
speech. The next stage is acoustic signal processing. Most techniques
include spectral analysis; e.g. LPC analysis, MFCC, cochlea modelling
and many, many more.
The next stage is recognition of phonemes, groups of phonemes and
words. This stage can be achieved by many processes such as DTW
(Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
Networks), expert systems and combinations of techniques. HMM-based
systems are currently the most commonly used and most successful
approach.
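Of the techniques named above, DTW is the easiest to show in a few
lines. The Python sketch below computes a dynamic time warping distance
between two sequences so that the same pattern spoken at different
speeds still matches. To keep it short it compares single numbers per
frame with an absolute-difference cost; a real recogniser would compare
spectral feature vectors and usually constrain the warping path. The
function name and test sequences are invented for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, half speed
other = [3, 2, 1, 0, 1, 2, 3]                      # different shape

print(dtw_distance(template, slow))  # 0.0
print(dtw_distance(template, other) > dtw_distance(template, slow))  # True
```

An isolated-word recogniser built this way keeps one or more templates
per word and returns the word whose template has the smallest distance
to the unknown utterance.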
Most systems utilise some knowledge of the language to aid the
recognition process.
Some systems try to "understand" speech. That is, they try to convert
the words into a representation of what the speaker intended to mean
or achieve by what they said.
_________________________________________________________________
Q6.7: WHAT ARE SOME GOOD REFERENCES/BOOKS ON SPEECH RECOGNITION?
Some reviews of speech recognition for personal computers:
* "Seybold Report on Desktop Publishing" published a nine-page,
head-to-head comparison of Dragon's DOS software with IBM's OS/2
software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
19063 USA, phone (610) 565-2480.
* McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
published a two-page review of IBM's Personal Dictation System
software. May 1994; Volume ?, Number ?; Pages 145-146;
ISSN:0360-5280; Editorial, Executive, and Circulation address: One
Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?
Some general introduction books on speech recognition technology:
* Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
Juang Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
Series), c1993 ISBN 0-13-015157-2
* Speech recognition by machine; W.A. Ainsworth London: Peregrinus
for the Institution of Electrical Engineers, c1988
* Speech synthesis and recognition; J.N. Holmes Wokingham: Van
Nostrand Reinhold, c1988
* Douglas O'Shaughnessy -- Speech Communication: Human and Machine
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
* Electronic speech recognition: techniques, technology and
applications edited by Geoff Bristow, London: Collins, 1986
* Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
Lee. San Mateo: Morgan Kaufmann, c1990
More specific books/articles:
* Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
M.A. Jack. Edinburgh: Edinburgh University Press, c1990
* Automatic speech recognition: the development of the SPHINX
system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
* Prosody and speech recognition; Alex Waibel (Pitman: London)
(Morgan Kaufmann: San Mateo, Calif) 1988
* S. E. Levinson, L. R. Rabiner and M. M. Sondhi, "An Introduction
to the Application of the Theory of Probabilistic Functions of a
Markov Process to Automatic Speech Recognition" in Bell Syst.
Tech. Jnl. v62(4), pp1035--1074, April 1983
* R. P. Lippmann, "Review of Neural Networks for Speech
Recognition", in Neural Computation, v1(1), pp 1-38, 1989.
_________________________________________________________________
Q6.8: WHAT SPEECH RECOGNITION PACKAGES ARE AVAILABLE?
The following packages are presented in no particular order.
HM2007 - Speech Recognition Chip
* Description: HM2007 is a 48-pin single chip CMOS voice
recognition LSI circuit with on-chip analog front end, voice
analysis, recognition process and system control functions. A 40
word isolated-word voice recognition system can be composed of an
external microphone, keyboard, SRAM and a few other components.
When combined with a microprocessor, an intelligent recognition
system can be built. A demo board for this chip is being
distributed by The Summa Group.
* Cost: Approx US$30 for the HM2007 and US$100 for the demo board.
* Warning: Several people have reported problems in obtaining
small numbers of this chip (say less than 10). It appears that the
distributors (including the one listed below) are only interested in
large volumes. If you know of a good source please send it in for
inclusion in the FAQ.
* Contact:
The Summa Group Limited
One California Street, Suite #1940,
San Francisco, CA 94111
Ph: (415) 288-0390
Voice Blaster Ver. 4.0
* Platform: IBM AT or higher, DOS or Windows 3.1
* Description: Uses a Sound Blaster or compatible board. Contains
a microphone headset and a connector for LPT1:. A printer can
still be used on LPT1:. Will recognize 1024 words that are trained
by the operator. Each word activates a macro that can enter an
ascii word on the screen or into a word processor or invoke a
batch file. An optional footswitch may be installed. Software to
run under DOS or Windows 3.1 is included.
* Cost: Around $150 Canadian.
* Contact:
COVOX Inc.
675 Conger Street
Eugene, Oregon, 97402, USA
Ph: (503) 342-1271 Fax: (503) 342-1283
BBS: (503) 342-4135
Votan
* Platform: MS-DOS, SCO UNIX
* Description: Isolated word and continuous speech modes, speaker
dependent and (limited) speaker independent. Vocab size is 255
words or up to a fixed memory limit - but it is possible to
dynamically load different words for effectively unlimited number
of words.
* Rough Cost: Approx US $1,000-$1,500
* Requirements: Cost includes one Votan Voice Recognition ISA-bus
board for 386/486-based machines. A software development system is
also available for DOS and Unix.
* Misc: Up to 8 Votan boards may co-exist for 8 simultaneous voice
users. A telephone interface is also available. There is also a
4GL and a software development system. Apparently there is more
than one version - more info required.
* Contact: 800-877-4756, 510-426-5600
Entropic's HTK (HMM Toolkit)
* Platform: Range of Unix platforms.
* Description: HTK is a software toolkit for building continuous
density HMM based speech recognisers. It consists of a number of
library modules and a number of tools. Functions include speech
analysis, training tools, recognition tools, results analysis, and
an interactive tool for speech labelling. Many standard forms of
continuous density HMM are possible. Can perform isolated word or
connected word speech recognition. It can model whole words or
sub-word units. Can perform speaker verification and other pattern
recognition work using HMMs. HTK is now integrated with the
ESPS/Waves speech research environment which is described in
Section 1.8.
* Misc: The availability of HTK changed in early 1993 when
Entropic obtained exclusive marketing rights to HTK from the
developers at Cambridge.
* Cost: On request.
* Contact:
Entropic Research Laboratory,
600 Pennsylvania Ave, S.E. Suite 202,
Washington, D.C. 20003, USA
Phone: (202) 547-1420.
email - info@entropic.com
DragonDictate version 3.0
* Platform: PC
* Description: Speaker-adaptive recognition system for discrete
speech. Provides 110,000 word dictionary and also allows user to
add words. Active vocabulary of 5,000, 30,000, or 60,000 words.
Allows dictation into almost all DOS applications (word
processors, spreadsheets, etc.) and hands-free operation of the
PC.
* Cost: Prices including audio board and high-quality headset
microphone:
+ US$695 (5,000 word Starter Edition)
+ US$995 (30,000 word Classic Edition)
+ US$1,995 (60,000 word Power Edition)
* Requirements: Minimum of 33 MHz 486 with 8-16M memory and at
least 29M disk space (depending on product), one 8-bit slot, DOS
5.0 and up (also runs in a DOS box under Windows or OS/2).
* Contact:
Dragon Systems, Inc.
320 Nevada Street
Newton, MA 02160, USA
Tel: 1-617-965-5200, Fax: 1-617-527-0372
DragonDictate for Windows
* Platform: PC
* Description: Speech-to-text dictation system. Discrete speech;
speaker-adaptive. Also provides command/control and mouse
movement for hands-free operation of Windows. Comes with a 120,000
word pronunciation dictionary; users can also add their own words
or phrases. Dictate directly into any application.
* Rough Cost: Prices including software, documentation and
microphone:
+ DragonDictate Starter Edition (5,000 words active) -- $395
+ DragonDictate Classic Edition (30,000 words active) -- $695
+ DragonDictate Power Edition (60,000 words active) -- $1,695
* Requirements: 486/33, 7-10 MB dedicated RAM (depending on
edition), Windows 3.1 or later. Supported sound boards: Media
Vision Pro Audio Studio 16, Creative Labs Sound Blaster 16,
Microsoft Windows Sound System, IBM Audio Capture/Playback
Adapter.
* Contact:
Dragon Systems, Inc.
320 Nevada Street
Newton, MA 02160, USA
Phone: (617)965-5200 Fax: (617)527-0372
DragonVoiceTools
* Platform: PC
* Description: Programmer's toolkit for developing speech-aware
DOS or Windows applications. Recognizes continuously spoken digits
and discretely spoken words or phrases. Up to 1,000 words can be
active at one time. Use words from 110,000 word dictionary
(included) and/or develop your own word models.
* Cost:
+ US$1,995 (developer's kit)
+ US$595 (end-user system)
* Requirements: Minimum of 20 MHz 386 (larger vocabulary requires
faster processor) with at least 5M memory and at least 19M disk
space (depending on vocabulary size), DOS 5.0 and up, Windows 3.1
and up, Borland C or C++ or Microsoft C or C++. Also requires IBM
M-ACPA card available from IBM or Dragon Systems ($325).
* Contact:
Dragon Systems, Inc.
320 Nevada Street,
Newton, MA 02160, USA
Tel: 1-617-965-5200, Fax: 1-617-527-0372
IBM VoiceType Dictation
OR: Osborne Personal Dictation System (in Australia)
* Platform: Intel I486 & IBM OS/2
* Description: Speaker independent, discrete speech dictation with
navigation. Navigation does not require setup; most applications
are automatically speech enabled by dynamic control analysis.
Dictation averages 70WPM with 95% accuracy and uses statistical
trigram modelling. The base system is 22K words, other
vocabularies available for specific industries.
* Requirements: 486SX or above, 16MB Ram, 30MB File space,
Dictation Adapter
* Cost: Software $495 (includes mic) / Hardware $495
* Misc 1: A Windows version is now available.
* Misc 2: Based on IBM Tangora Technology
* Availability: US English. Other languages (UK, FR, GR, IT, and
ES) available 3Q94.
* Contact: US Contact 1-800-TALK-2-ME or 1-914-766-9252.
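The "statistical trigram modelling" mentioned in the VoiceType entry
can be illustrated with a minimal sketch. This is not IBM's
implementation: the function names, the toy corpus and the add-one
smoothing are all assumptions chosen for illustration. A trigram model
estimates the probability of a word given the two words before it, so
that a recogniser can prefer likely word sequences.

```python
from collections import defaultdict

def train_trigrams(sentences):
    """Count trigrams and their two-word histories in tokenised sentences."""
    tri = defaultdict(int)   # (w1, w2, w3) -> count
    bi = defaultdict(int)    # (w1, w2)     -> count of that history
    vocab = set()
    for words in sentences:
        # Pad so that the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            tri[history + (padded[i],)] += 1
            bi[history] += 1
    return tri, bi, vocab

def trigram_prob(tri, bi, vocab, w1, w2, w3):
    """Estimate P(w3 | w1, w2) with add-one smoothing (an assumption)."""
    return (tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + len(vocab))

# Hypothetical toy corpus of spoken commands.
corpus = [["open", "the", "file"],
          ["open", "the", "window"],
          ["close", "the", "file"]]
tri, bi, vocab = train_trigrams(corpus)
p_seen = trigram_prob(tri, bi, vocab, "open", "the", "file")
p_unseen = trigram_prob(tri, bi, vocab, "open", "the", "close")
```

Here p_seen is larger than p_unseen because "open the file" occurs in
the toy corpus and "open the close" does not. A real dictation system
would train on a very large corpus and use more careful smoothing.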
VoiceServer for Windows
* Platform: PC
* Description: Speaker dependent, each with an independent
directory. Isolated word. Up to 1000 words/user, 300 words/window.
1 word occupies 2Kb on hard disk. Can be used to control Windows
applications by issuing voice commands instead of menu selection.
* Rough Cost: 292 Pounds(UK)
* Requirements: None
* Misc: Price includes a half-sized AT voice card (including a
DSP), software, documentation & a microphone (attachable to
keyboard or speaker). A light-weight high-spec headset is an
optional extra.
* Contact:
Mark Redwood
Applied Voice Technologies
26 Danbury Street, Islington,
London, UK, N1 8JU
Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225
IN3 Voice Command for Windows
* Platform: PC with Windows 3.1
* Description: IN3 is now available for MS-Windows. Users can call
applications to the foreground with voice commands. Once the
application is called, the user may enter commands and data with
voice commands. Voice macros can reduce the strain of repetitive
stress injuries (RSI) such as Carpal Tunnel Syndrome (CTS) by
replacing heavy repetitive keyboard hammering with simple voice
operations. Voice macros take complex operations and reduce them
to simple verbal commands. Voice input can provide new facilities
for tasks which could not easily have been otherwise performed
without the multiple axis of input. IN3 is hardware-independent:
users with any Windows-compatible audio board can add speech
recognition to the desktop. IN3 works with either 8 bit or 16 bit
Windows audio
boards. IN3 is based on continuous word-spotting technology. A
developer API is also available for creating voice-enabled
applications.
* Price: $179 U.S.
* Requirements: PC with 80386 processor or better, Microsoft
Windows 3.1, and Windows compatible audio system with microphone.
* Misc: Fully functional demos are available on Compuserve in
various Multimedia and CAD forums. Demos are also available from
"America on Line", the comp.binaries.ms-windows archive sites, and
various BBS systems. It is also available by anonymous ftp
+ ftp://ftp.wustl.edu/usenet/comp.binaries.ms-windows/v3/in3demo.zip
+ ftp://ftp.uwasa.fi/mirror/ultrasound/demo/in3demo.zip
An equivalent Sun product is described below.
* Contact:
Brantley Kelly
Email: cbk@gacc.atl.ga.us CIS: 75120,431
FAX: 1-404-925-7924 Phone: 1-404-925-7950
Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA
IN3 Voice Command
* Platform: Sun SPARCstation
* Description: IN3 provides a secure, robust, word spotting,
continuous speech recognition facility for the Sun OS or Solaris
operating systems. The recognition system is a secure operating
system facility capable of working with various interfaces,
microphones, and devices. The operating system interface works
with native UNIX outside of X Windows as well as provides enhanced
X Windows facilities including named window support. The user
interface provides a means to quickly create commands on the fly
for replacing long strings and complex operations with voice
macros. [Voice macros can reduce the strain of repetitive stress
injuries (RSI) such as Carpal Tunnel Syndrome (CTS) by replacing
heavy repetitive keyboard hammering with simple voice operations.]
The IN3 user interface works with generic X servers and window
managers. A developer API is also available for creating voice-
enabled applications, interfacing with other audio sources, and
providing extensive application control over the recognition
facility.
* Availability: SunSite archive at SunSITE.unc.edu as well as on
Catalyst CDware as both a runnable demo and unlockable software.
* Hardware Required: Sun SPARCstation with audio input. Noise
canceling microphone recommended but not required.
* Software Required:
+ Sun OS 4.1.2 with OpenWindows 3.0
+ or, Sun OS 4.1.3
+ or, Solaris 2.1 or Solaris 2.2
* Misc: An equivalent MS-Windows product is described above.
* Price: $495 U.S.
* Contact:
Brantley Kelly
Email: cbk@gacc.atl.ga.us CIS: 75120,431
FAX: 1-404-925-7924 Phone: 1-404-813-8030
Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA
Phonetic Engine 400 (PE400) - Speech Systems, Inc.
* Platform: PC
* Description: Speaker independent, large vocabulary, continuous
speech recognition for MS Windows or DOS.
* Rough Cost: $1195 US dollars. Includes board, microphone,
developer kit, documentation, 2 days of technical training and 90
days of technical support.
* Requirements: IBM AT class machine or better plus 5M disk space.
Most processing is performed on-board (4M standard or 16M
upgrade).
* Misc: Requires developer to provide a context-free grammar.
Vocabulary size unknown (quotes from 500 - 2000 words per
grammar), but dynamic grammar switching capabilities may increase
the effective vocabulary size. Development system includes
lower-level C,C++ library (VoiceLib), higher-level DLL (SPOT)
callable from many languages, SPOT/VBX, a custom control for
Visual Basic and Visual C++.
* Contact:
Speech Systems, Inc.
2945 Center Green Court South
Boulder, CO 80301-2275, USA
Tel: 303.938.1110 Fax: 303.938.1874
SayIt
* Platform: Sun SPARCstation
* Description: Voice recognition and macro building package for
Suns in the OpenWindows 3.0 environment. Speaker dependent
discrete speech recognition. Vocabularies can be associated to
applications and the active vocabulary follows the application
that has input focus. Macros can include mouse commands,
keystrokes, Unix commands, sound, OpenWindows actions and more. An
evaluation copy is available by email.
* Hardware: Microphone required (SunMicrophone is fine).
* Cost: $US295
* Contact:
Phone: 1-800-245-UNIX or 1-415-572-0200
Fax: 1-415-572-1300
Email: info@qualix.com
Kurzweil Voice for Windows
* Platform: MS Windows 3.1
* Description: Kurzweil Voice for Windows is a dictation product
enabling the user to create text and enter data by speaking to
Windows-based applications. System is adaptive but requires no
initial training. Users can choose either 30,000 or 60,000 word
active vocabulary. Includes application command translation
templates for popular Windows applications such as WordPerfect,
1-2-3, Organizer, and Word.
* Cost: US $995
* Hardware: 486DX/33 or higher, 8 or 16 MB dedicated memory
(depending on vocabulary), 30 MB dedicated disk space, VGA or
higher, Kurzweil-supplied microphone and DSP board.
* Contact:
Phone: 1-800-380-1234
Email: info@kurz-ai.com
D6006 Voice Control Processor
* Platform: ?
* Description: ?
* Contact:
DSP Telecommunications Inc.
2855 Kifer Road, Suite 202, Santa Clara CA 95051, USA
Tel:(408)986-4310
Fax:(408)986-4324
Speech Commander - Listen for Windows
* Platform: ?
* Description: ?
* Contact:
Verbex Voice Systems
1090 King Georges Post Rd., Bldg 107,
Edison NJ 08837, USA
Tel:(908)225-5225
Fax:(908)225-7764
Voice-Trek 2.0
* Platform: ?
* Description: ?
* Contact:
Tardis Technology Inc., Voice Recognition Div.
10321 Los Alamitos Blvd., Los Alamitos CA 90720
Tel:(310)799-3355 Fax:(310)799-3360
Visus SpeechKit
* Platform: NeXT
* Description: SpeechKit is based on SPHINX, a
speaker-independent, 1000 word or so, continuous speech
recognition system which allows you to incorporate speech
recognition into your applications. You can design your vocabulary
and grammars.
* Contact: Visus - no address or phone provided. A possible
contact is Robert Brennan at Carnegie Mellon University. email:
Robert_Brennan@cmu.edu
recnet
* Platform: UNIX
* Description: Speech recognition for the speaker independent
TIMIT and Resource Management tasks. It uses recurrent networks to
estimate phone probabilities and Markov models to find the most
probable sequence of phones or words. The system is a snapshot of
evolving research code. There is no documentation other than
published research papers. The components are:
+ A preprocessor which implements many standard and many non-
standard front end processing techniques.
+ A recurrent net recogniser and parameter files
+ Two Markov model based recognisers, one for phone recognition
and one for word recognition
+ A dynamic programming scoring package
The complete system performs competitively.
* Cost: Free
* Requirements: TIMIT and Resource Management databases
* Contact: Tony Robinson: ajr@eng.cam.ac.uk
* Availability: by anonymous ftp
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/misc/recnet-1.3.tar.Z
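The decoding step described in the recnet entry, where per-frame phone
probabilities are combined with a Markov model to find the most
probable phone sequence, is commonly done with the Viterbi algorithm.
The following is a minimal sketch of that general technique, not code
taken from recnet. The function name, the argument layout and the toy
numbers are assumptions, and the network outputs are used directly as
per-frame scores, which is a simplification.

```python
import math

def viterbi(obs_probs, trans, init):
    """Return the most probable state (phone) index sequence.

    obs_probs[t][s]: probability score of state s at frame t
    trans[r][s]:     probability of moving from state r to state s
    init[s]:         probability of starting in state s
    """
    def log(p):
        # Work in the log domain; zero probability maps to -infinity.
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(init)
    score = [log(init[s]) + log(obs_probs[0][s]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in obs_probs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_r = max(range(n_states),
                         key=lambda r: score[r] + log(trans[r][s]))
            new_score.append(score[best_r] + log(trans[best_r][s])
                             + log(frame[s]))
            pointers.append(best_r)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-phone example with three frames.
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
transitions = [[0.7, 0.3], [0.3, 0.7]]
start = [0.5, 0.5]
best_path = viterbi(frames, transitions, start)  # [0, 0, 1]
```

The same dynamic programming idea extends to word recognition by
building the state graph from word pronunciations.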
Lotec Speech Recognition Package
* Platform: Sun
* Description: Public domain speech recognition software. Operates
from input in Sun audio format (.au files) and outputs word
hypotheses and time labelling data. The software includes programs
to collect speech samples, a labeller, a "featurizer" which
parameterises speech files, a word spotter and the recogniser. The
software can perform real time recognition on a Sparc 10 for small
vocabularies.
* Requirements: Sun SPARC audio input and a "decent" microphone
Sun multimedia demo software (in /usr/demo/SOUND) and X.
* Availability: By anonymous ftp
+ ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
* Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp
Myers' Hidden Markov Model software
* Description: Hidden Markov model software for automatic speech
recognition. C++ code that implements a basic left-right hidden
Markov model and corresponding Baum-Welch (ML) training algorithm.
It is meant as an example of the HMM algorithms described by
L.Rabiner and others. The code was built in order to learn how HMM
systems work and we are now offering it to the net so that others
can learn how to use HMMs for speech recognition. Keep in mind
that ease of understanding was our primary concern, not
efficiency. The code can be used to build experimental speech
recognition systems using "train_hmm" and "test_hmm", and can be
used in conjunction with written tutorials on HMMs to understand
how they work.
* Availability: By anonymous ftp from the comp.speech archive
site. There are three files in the directory
+ ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources
The files are
+ hmm.README
+ hmm-1.0.tar.Z
+ OR, hmm-1.0.tar.gz
(Note: hmm-1.0.tar.Z and hmm-1.0.tar.gz are compress and gzip compressed
versions of the same files.)
* Contact: Richard Myers: email rmyers@ics.uci.edu
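For readers unfamiliar with the terms used in the entry above, the
following minimal sketch shows what a left-right HMM looks like and how
the forward algorithm scores an observation sequence against it. It is
not taken from the hmm-1.0 package (which is C++). The function names,
the discrete observation symbols and the toy numbers are assumptions
made for illustration. Baum-Welch training, not shown here, re-estimates
the model parameters using forward and backward passes of this kind.

```python
def left_right_transitions(n_states, stay=0.6):
    """Transition matrix where a state may only repeat or advance by one."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        trans[i][i] = stay
        trans[i][i + 1] = 1.0 - stay
    trans[-1][-1] = 1.0  # the final state can only repeat
    return trans

def forward_likelihood(trans, emit, observations):
    """Return P(observations | model) using the forward algorithm.

    emit[s][o] is the probability that state s emits discrete symbol o.
    The model is assumed to start in state 0, as is usual for
    left-right models.
    """
    n = len(trans)
    alpha = [0.0] * n
    alpha[0] = emit[0][observations[0]]
    for o in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)

# Hypothetical two-state model over two symbols (0 and 1).
trans = left_right_transitions(2, stay=0.5)
emit = [[0.9, 0.1],   # state 0 mostly emits symbol 0
        [0.2, 0.8]]   # state 1 mostly emits symbol 1
likely = forward_likelihood(trans, emit, [0, 1])
unlikely = forward_likelihood(trans, emit, [1, 0])
```

The sequence [0, 1] matches the left-to-right order of the states and
so scores higher than [1, 0]. For long sequences a practical
implementation would scale alpha or work with log probabilities to
avoid numerical underflow.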
Voice Command Line Interface
* Platform: Amiga
* Description: VCLI will execute CLI commands, ARexx commands, or
ARexx scripts by voice command through your audio digitizer. VCLI
allows you to launch multiple applications or control any program
with an ARexx capability entirely by spoken voice command. VCLI is
fully multitasking and will run in the background, continuously
listening for your voice commands even while other programs are
running. Documentation is provided in AmigaGuide format. VCLI 6.0
runs under either Amiga DOS 2.0 or 3.0.
* Cost: Free?
* Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
Sound Magic, and Generic audio digitizers.
* Availability: by ftp from wuarchive.wustl.edu in the file
systems/amiga/incoming/audio/VCLI60.lha and from
amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
* Contact: Author's email is RHorne@cup.portal.com
DATAVOX - French
* Platform: PC
* Description: Continuous speech - speaker independent or
dependent.
* Rough Cost: ?
* Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
A/D - D/A module (ASA116)
* Misc: Application software can communicate with DATAVOX through 2
types of interfaces:
+ Keyboard overlay: The application software may be used with
any PC compatible package. No specific adaptation is
necessary, you only need to define your configuration with
the application software.
+ C library: Allows a user-written program to drive the
recognition system.
DATAVOX is based on the AMADEUS speech recognition software developed
at LIMSI. It provides
+ Continuous speech recognition with 500 words speaker
dependent, 50 words speaker independent (custom-made
vocabulary).
+ Grammar of the application language (syntax acquisition,
verification and simplification software).
+ Large vocabulary : DATAVOX can recognize vocabularies of
several thousand words as long as there are no more than 500
words in the active vocabulary at any given node. It takes
less than 1 second to change syntax and vocabulary.
+ Training controlled by the system (use of co-articulation
models).
+ Response time less than 500 ms for any phrase length.
+ Synthesis (ADPCM) can be heard simultaneously while
recognition is being carried out.
* Contact:
VECSYS
Le Chene rond, 91570 Bievres, France
Fax: 33 1 69 41 24 30
Voice: 33 1 69 41 15 04
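The DATAVOX entry above describes a grammar in which only the words
allowed at the current node are active, which is how a total vocabulary
of several thousand words can be handled with at most 500 active words
at a time. The sketch below illustrates that idea in general terms. It
is not the DATAVOX C library: the data layout, the function names and
the toy grammar are hypothetical.

```python
MAX_ACTIVE_WORDS = 500  # per-node limit quoted in the DATAVOX entry

def check_grammar(grammar, limit=MAX_ACTIVE_WORDS):
    """Return the total vocabulary size, or raise if a node is too large.

    grammar maps each node name to a dict of {active word: next node}.
    """
    vocabulary = set()
    for node, arcs in grammar.items():
        if len(arcs) > limit:
            raise ValueError(f"node {node!r} has {len(arcs)} active words")
        vocabulary.update(arcs)
    return len(vocabulary)

def accepts(grammar, start, words):
    """Return True if every word is active at the node where it is spoken."""
    node = start
    for word in words:
        if word not in grammar[node]:
            return False
        node = grammar[node][word]
    return True

# Hypothetical toy grammar: a command word followed by an object word.
toy = {
    "command": {"open": "object", "close": "object"},
    "object": {"file": "end", "window": "end"},
    "end": {},
}
total_words = check_grammar(toy)                       # 4
ok = accepts(toy, "command", ["open", "file"])         # True
rejected = accepts(toy, "command", ["file", "open"])   # False
```

Because the recogniser only has to distinguish between the words
active at one node, the total vocabulary can grow well beyond the
per-node limit.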
PowerSecretary
* Platform: Centris 650, 660AV. Quadra 650, 660AV, 700, 800, 840AV,
900, 950.
* Description: Speaker dependent/adaptive system requiring words
to be separated by short pauses.
* Vocabulary: 30,000 at any one time, automatically selected from
120,000-word dictionary.
* Cost: US$2,495; non-AV machines need an audio board, which costs
about US$300.
* Requirements: Minimum of 16M of RAM and System 7.0.
* Contact:
Articulate Systems
600 W. Cummings Park, Suite 4500
Woburn, MA 01801
Ph: (617) 935-5656 Fax: (617) 935-0490.
ICSS system from IBM
* Description: A large vocabulary, speaker independent, continuous
speech system which runs under Windows, OS/2, and AIX.
* Requirements: Soundboard (e.g. Soundblaster)
* Price: $US319
* Contact:
A&G Graphics Interface
ICSS Reseller
51 Gore Street, Cambridge, MA, 02139, USA
(617) 492-0120
Custom Voice(TM) by A&G Graphics Interface
* Description: Speech recognition custom control for Visual Basic,
Visual C++, Borland C++, and other development platforms that
support *.VBX. Provides an engine/proprietary independent
development platform for speech recognition. Currently supports
ICSS, but should soon support other platforms. Includes a grammar
debugger and parser APIs to parse spoken speech into useful data
types.
* Requirements: Visual Basic or any development platform that
supports VBX.
* Price: $US495 or $695 bundled with ICSS.
* Contact:
A&G Graphics Interface
51 Gore Street, Cambridge, MA, 02139, USA
(617) 492-0120
Creative VoiceAssist
* Platform: PC (?)
* Price: $US99.95
* Contact:
Creative Labs
Ph: 1-800-998-5227
_________________________________________________________________
Andrew Hunt
---
Speech Technology Research Group Ph: 61-2-351 4509
Dept. of Electrical Engineering Fax: 61-2-351 3847
University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au