Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsstand.cit.cornell.edu!newstand.syr.edu!news.maxwell.syr.edu!cam-news-hub1.bbnplanet.com!news.bbnplanet.com!howland.erols.net!cs.utexas.edu!chi-news.cic.net!ftpbox!mothost.mot.com!schbbs!news
From: Orhan Karaali <karaali@mot.com>
Subject: Motorola Neural Network Speech Synthesizer Article
Content-Type: text/plain; charset=us-ascii
Organization: Motorola Chicago Corporate Research Labs
Date: Wed, 12 Feb 1997 11:41:49 -0600
Message-ID: <330200DD.456F@mot.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4m)
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: news@schbbs.mot.com (SCHBBS News Account)
Nntp-Posting-Host: 182.1.83.53
Lines: 51

FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/karaali.synthesis_wcnn96.ps.Z



Motorola Neural Network Speech Synthesizer Article

A new neural-network-based speech synthesizer has been developed here at
Motorola Chicago Corporate Research Laboratories by the Speech Synthesis
and Machine Learning Group.  We believe that the quality of the
synthesized speech it produces surpasses the current state of the art,
particularly in naturalness.

An invited paper describing this neural network speech synthesizer was
presented in the Speech Session of the World Congress on Neural
Networks 96 in San Diego.  The paper is now available in the NEUROPROSE
archive as karaali.synthesis_wcnn96.ps.Z.

If you have a problem getting the paper from NEUROPROSE, I can email
it to you.

Orhan Karaali

email: karaali@mot.com


---------------------------------------------------------------------

Speech Synthesis with Neural Networks
Orhan Karaali, Gerald Corrigan, and Ira Gerson
Motorola, Inc., 1301 E. Algonquin Road, Schaumburg, IL 60196
karaali@mot.com, corrigan@mot.com, gerson@mot.com

ABSTRACT

Text-to-speech conversion has traditionally been performed either by
concatenating short samples of speech or by using rule-based systems to
convert a phonetic representation of speech into an acoustic
representation, which is then converted into speech.  This paper
describes a system that uses a time-delay neural network (TDNN) to
perform this phonetic-to-acoustic mapping, with another neural network
to control the timing of the generated speech.  The neural network
system requires less memory than a concatenation system, and performed
well in tests comparing it to commercial systems using other
technologies.
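
To make the phonetic-to-acoustic idea concrete, here is a minimal
sketch of a time-delay (sliding-window) layer mapping phonetic frames
to acoustic frames.  All dimensions, names, and the random weights are
illustrative assumptions for this sketch, not the architecture or
parameters of the Motorola system described in the paper.

```python
# Sketch of a time-delay layer: each output frame depends on a fixed
# window of input frames, with one weight matrix shared across time.
# Sizes and weights below are assumptions, not the authors' values.
import random

random.seed(0)

CONTEXT = 2        # frames of context on each side (assumed)
PHONE_DIM = 4      # size of each phonetic input frame (assumed)
ACOUSTIC_DIM = 3   # size of each acoustic output frame (assumed)

WINDOW = 2 * CONTEXT + 1

# Shared weight matrix: this weight sharing over a sliding window of
# time steps is what characterizes a time-delay neural network layer.
weights = [[random.uniform(-0.1, 0.1) for _ in range(WINDOW * PHONE_DIM)]
           for _ in range(ACOUSTIC_DIM)]

def tdnn_layer(frames):
    """Map a sequence of phonetic frames to acoustic frames by applying
    the shared weights to a window of +/- CONTEXT frames at each step."""
    out = []
    for t in range(CONTEXT, len(frames) - CONTEXT):
        # Flatten the window of frames into one input vector.
        window = [x for f in frames[t - CONTEXT:t + CONTEXT + 1] for x in f]
        out.append([sum(w * x for w, x in zip(row, window))
                    for row in weights])
    return out

# Ten random "phonetic" frames in; edge frames without full context are
# dropped, leaving six acoustic frames of three parameters each.
frames = [[random.random() for _ in range(PHONE_DIM)] for _ in range(10)]
acoustic = tdnn_layer(frames)
print(len(acoustic), len(acoustic[0]))  # prints: 6 3
```

A full synthesizer would stack such layers with nonlinearities, train
the weights on recorded speech, and (as the abstract notes) use a
second network to predict segment durations; this sketch only shows the
windowed weight sharing that gives the TDNN its name.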
