Newsgroups: comp.speech
Path: lyra.csx.cam.ac.uk!warwick!slxsys!pipex!howland.reston.ans.net!usc!nic-nac.CSU.net!charnel.ecst.csuchico.edu!csusac!csus.edu!netcom.com!alvin
From: alvin@netcom.com (Alvin H. White)
Subject: Re: TTS Technology
Message-ID: <alvinCw1CLn.Cq4@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <bluebCvp5Co.3K2@netcom.com> <AWB.94Sep9165611@as53.itl.atr.co.jp>
Date: Mon, 12 Sep 1994 21:20:11 GMT
Lines: 155

awb@itl.atr.co.jp (Alan W Black) writes:

>Well, again note I've only recently come to this field, one problem
>noticed by many is that it is difficult to measure the quality of the
>resulting speech, unlike recognition where we can give numbers saying
>how much it recognised (though there, there is a problem in defining
>how difficult the task is), in synthesis it is difficult to say if one
>synthesizer is better than an other. Or as is more likely whether
>changing a module in your own synthesizer makes it better or worse.
>Measurement of success makes it difficult to try out all the existing
>techniques (and vary their parameters) to maximise the quality when we
>have no real measure of it.  Often we have to use perceptual tests
>(i.e. have humans judge) but that brings in a whole host of
>other problems.

>Alan

>* Alan W Black ---  ATR Interpreting Telecommunications Laboratories *
>2-2 Hikaridai                         email: awb@itl.atr.co.jp
>Seika-cho, Soraku-gun,                tel: (+81) 7749 5 1314
>Kyoto 619-02, Japan                   fax: (+81) 7749 5 1308


Because of my belief in the importance of the subject to myself, my
family, my church, my school, my culture, my village, my town, my
city, my state, my nation, my world, and every living human being on,
under or over the surface of the globe, and maybe even to the family
dog, if you take for evidence the picture of the dog on the old RCA
record labels of 40 years ago with the caption "His Master's Voice",
I will write a little here.

I so enjoyed Alan Black's paragraph above that I quoted it in full,
and will now divide it little by little to respond detail by detail.

First let me say that a popular book here in my part of the world,
the West Coast of North America, is the Bible, or, more multitudinous,
the New Testament part thereof, in thousands of editions and tongues.

I haven't studied it too terribly much, but one hears a phrase now
and then. One phrase that occasionally plays over and over in my mind
goes something like "Study To Show Thyself Approved, Rightly Dividing
The Words Of Truth". It is thereabouts that ideas like "Singing",
"Music", "Speech", "Computer Music", "Computer Speech", "Computer
Singing", "Speech Teacher", "Singing Teacher", "Reading Teacher" and
"Literacy" come into play.

So, believing that I would like to be able to "speak" and "read aloud"
well, I have now and then invested a little of my lifetime in
self-improvement studies: methods and materials.

Here in San Jose, California, especially after 1980 for me, computing
as a hobby has had a significant and growing importance. There are
"Garage Sales", "Flea Markets", "Computer Swap Meets", "Computer
Stores", "Ham/Electronics/Computer Swaps", and "International and
National Electronics Trade Shows".

I began to collect hardware and software. Music and speech were a
part of the culture. The music was plentiful and varied, if not
always the most edifying; the speech was at times more edifying,
but if just the computer variety was considered it was not nearly
so plentiful and varied.

It seems to me that after one has acquired some "Quantity", "Quality"
then begins to increase in importance. A little "Variety" also comes
back to mind.

Is it then that the questions regarding "How Do You Judge?" arise?

A faint waft of news regarding the world history of the computer
chess tournaments comes to mind. Maybe it's a history of the World
Computer Chess Tournament. Was one of the machines named "Deep Thought"?
Well, so far, we computer speechists haven't had "Great Voice" singing
at that one or at the Rose Bowl beer commercial but .... Then we also
haven't seen the latest entries from St. Peter's Organ [Is there one?],
the Mormon Tabernacle Organ and others of the great musical voices from
around the world. Great place for a plug for "Universe Musicum Omnium
Colloquium", a little millennium music fest I have thought about trying
to hatch up somewhere about.

Any Latin speakers have any idea what that phrase might mean? It sounded
good to me but I am not sure exactly what idea it might give one.
But, so far at least, I haven't seen anyone else who says they have a
copyright on it.

So back to this idea of starting a worldwide contest on computer speech
and singing. "Singing" being in my mind, at the moment, "speech at an
easily documentable predefined rate/time."

So, "How Do We Judge?"

At the aforementioned "trade shows" there would sometimes be a
"speech synthesis" component and, less often, a "speech recognition"
component, but almost never in the same booth. Then there would be the
"music" group. But these groups did not talk to one another. If you
went to one sales representative and asked a question regarding their
product they would burst forth, but if you then got in a small question
regarding one of the other two of the three specialty terms
"Music/Speech Synthesis/Speech Recognition" the rep went instantly
stone deaf, blind and distracted.

I used to say to myself that the speech synthesis people couldn't
tell you when it was going to start, stop or how long it was going
to last but it could be quite edifying when it came out. On the
other hand, the musicians could tell you very precisely when it
would start, when it would stop and the proportion/ratio in between,
but they had few, if any, word samples to distribute. The speech
recognition groups were so few and far between (I think Dragon
Systems of Newton, Massachusetts was the only one I ever saw) that
I never had the opportunity to ask if they could recognize any music.
They said the Dragon, at that time, didn't speak.

I always wanted to set up a contest where on one side would be the
speaking machines and on the other side would be the speech recognition
machines. In the middle would be the referee and s/he would give out
the phrase to be said. In the background would be the computer music
of the moment. The speech machine would have to speak in time to the
music. The recognition contestants would display their results on a
set of big-screen displays, maybe somewhat like bowling alley score
projectors for a live crowd, and in different windows for home
viewers with Windows computers.

There would be two phases to each inning/round/attempt by a contestant.
The first is the completely machine phase that I have just described;
the second is the human phase. The speaking machine would
speak into a telephone and a human listener would hear and then try to 
mimic the machine speech back into the telephone and thereby to the 
recognition machines.
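
As for how a round might be scored, here is one possibility and only
a sketch. It assumes (the contest idea above names no scoring rule)
that each recognition machine is rated by word error rate against the
referee's phrase, once for the machine phase and once for the human
phase. The names word_error_rate and score_round, and the shape of the
result dictionaries, are made up for illustration.

```python
def word_edit_distance(reference, hypothesis):
    """Count word insertions, deletions and substitutions (Levenshtein)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the words of ref seen so far
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped
                            curr[j - 1] + 1,      # extra word heard
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the referee's phrase."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference phrase must contain at least one word")
    return word_edit_distance(reference, hypothesis) / ref_len


def score_round(phrase, machine_results, human_results):
    """Rate every recognizer for both phases of one round.

    machine_results and human_results map a recognizer's name to the
    text it displayed (an assumed format); lower scores are better.
    """
    return {
        name: {
            "machine": word_error_rate(phrase, machine_results[name]),
            "human": word_error_rate(phrase, human_results[name]),
        }
        for name in machine_results
    }
```

A score of 0.0 would mean the recognizer displayed the referee's
phrase word for word. Comparing the two phases for the same recognizer
would hint at whether the speaking machine or the human mimic was the
easier one to understand.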

There would also be contest ratings between the human listeners/singers
such that different international contestants would be rated on their
ability to sing what they had heard.

Now what would it take to get that into the games at Mt. Olympus? Do
they have a Muse contest at the Olympics? Would you call that "poetics"
or "Music"?

So then someone tries to promote these speaking contests locally
and you have prize winners, championships, trophies, ribbons,
certificates. Barnum and Bailey Circuses could have an event where
the contestant tried to get the machine to recognize what the machine
said before the contestant was shot out of a cannon.

What about one in the hospital emergency room, where the first patient
able to get the machine to recognize what was said would be
moved up on the list for medical treatment?

Bon.


-- 
alvin@netcom.COM

Alvin H. White, Gen. Sect.
G.O.D.S.B.R.A.I.N.
P.O.Box 26745
San Jose, CA 95159-6745 USA

(408) 446-1770 

Government Online Database Systems
Bureau for Resource Allocations to Information Networks
[an idea waiting to happen]      .
                                 U   Ohm's Law [Early Version]?
38 North 120 West                3~

Universe Musicum Omnium Colloquium 
Om Mani Padme Hum 
Oh Man! He Paid Me t'Hum

