Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.mathworks.com!uunet!in1.uu.net!psinntp!xetron.com!hoffer.xetron.com!user
From: kevin@xetron.com (Kevin Hoffer)
Subject: Re: What is state of the art in speech synthesis?
Message-ID: <kevin-0504951126420001@hoffer.xetron.com>
Sender: news@xetron.com
Nntp-Posting-Host: hoffer.xetron.com
Organization: Xetron
References: <3liv51$n5@granite.sentex.net> <D6J9sM.8x0@cix.compulink.co.uk>
Date: Wed, 5 Apr 1995 15:24:48 GMT
Lines: 59

In article <D6J9sM.8x0@cix.compulink.co.uk>, jhaseler@cix.compulink.co.uk
("John Haseler") wrote:

> I have been using ProVoice for Windows, from Creative Technology 
> (connected with First Byte and SoundBlaster).  That is pretty good, and 
> easy to use. It does text to speech and has a large exception dictionary. 
>  However getting really natural sounds needs a pretty deep understanding 
> of the meaning of the words, in my opinion, or a human with that 
> understanding hand-tuning it.
> 
> I know that the system works quite hard to blend one phone with another, 
> that it works basically on phone (consonant-vowel) pairs, and that it can 
> adjust the length of phones a factor of perhaps 3 either way while still 
> sounding quite realistic.  You may find 'Monologue for Windows' with a 
> Soundblaster - this is basically the same, but (I guess) a first version 
> - the programming interface is awful, though the sound is still quite 
> good.
> 
> They advertise different language capabilities as well (French, Spanish 
> soon I think).  However unless you can find a way round it, the cost is 
> $600, which may well put many people off - mine came through work.
> 
> John Haseler

I am currently using a Windows program called TextAssist that ships with
higher-end SoundBlaster cards that include ASP (Advanced Signal
Processing).  TextAssist provides nine speakers (four male, four female,
and a child) and has a fairly sophisticated "voice setup" capability to
adjust not only rate, pitch, and volume but also gender, head size,
smoothness, richness, and laryngealization.  In my opinion, only a few of
the voices sound realistic.  

TextAssist can be associated with individual applications: when an
associated application runs, a small control is attached to the title bar
of its main window that lets you play and stop the speech.  To start
text-to-speech, simply select a range of text and click the play button.
I added a text-to-speech capability to my application that does not
require the user to press the play button (which, by the way, can also be
hidden from view).  Interested parties can e-mail me with
questions.

I bought the SoundBlaster 16 MultiCD (with TextAssist bundled) for $189.
Monologue for Windows ships with SoundBlaster cards that do NOT have the
ASP capability.  I could not use Monologue for Windows because it does not
permit background processing while speech is playing.  I believe
ProVoice for Windows will allow background processing during speech.

I have also evaluated the Logitech SoundMan Wave sound card ($139), which
ships with a text-to-speech program called BestSpeech, and Lernout &
Hauspie's text-to-speech product ($2000 for the SDK, plus royalties).  I
found TextAssist to have the most realistic-sounding speech.

Disclaimer: I am not an employee of any of the companies mentioned above
(except Xetron).  I just went through this evaluation process in the last
month and these represent my opinions only.

-- 
Kevin Hoffer      Software Engineer      Xetron Corp., Cincinnati, OH
(513) 881-3538    Fax: (513) 881-3379
