Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!agate!howland.reston.ans.net!vixen.cso.uiuc.edu!uchinews!iitmax!artn!stephan
From: stephan@artn (Stephan Meyers)
Subject: Novice questions
Message-ID: <1994Feb3.000458.12745@iitmax.iit.edu>
Sender: news@iitmax.iit.edu (News)
Organization: Illinois Institute of Technology / Academic Computing Center
X-Newsreader: TIN [version 1.2 PL0]
Date: Thu, 3 Feb 94 00:04:58 GMT
Lines: 61

	Hi, I have some questions that should be pretty naive by the standards
of this group.  If this material is in the FAQ, I would appreciate a pointer
to it.
	I'm going to phrase my question as a series of half-remembered data
points:

* One of my first computers was a TI-99, noted for having a speech synthesizer
as a peripheral.  If I remember rightly how this thing worked, it used LPC to
model the human vocal tract.  Tapes of the original speaker were analyzed to
calculate the appropriate vocal tract parameters (tongue position, etc.), which
boils down to a very tiny amount of data.  The speech synthesizer chip would
then run this data through its simplified vocal tract model.  Some of the
speech was astoundingly good.
* At the time, the computer used to do the encoding was a fairly hefty one -
what would that break down to in today's standards?  A Pentium?  An Indigo?
A Connection Machine?  My intuition is an Indigo would be about equivalent, but
I don't know how fast the computation was then.
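For what it's worth, the standard textbook way to get those parameters is the
autocorrelation method plus the Levinson-Durbin recursion.  Here's a toy sketch
of the analysis side (my own code, not TI's algorithm -- the model order and
frame size are guesses in the right ballpark, and a real coder would window the
frame and quantize the output):

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

#define ORDER 10    /* 10th-order all-pole model, roughly what the TI chips used */
#define FRAME 180   /* samples per frame -- a guess, ~22.5 ms at 8 kHz */

/* Compute LPC coefficients a[1..ORDER] for one frame of samples s[0..FRAME-1]
   via the autocorrelation method and the Levinson-Durbin recursion.
   Returns the final prediction-error energy. */
double lpc_frame(const double *s, double *a)
{
    double r[ORDER + 1];
    int i, j;

    /* autocorrelation of the frame */
    for (i = 0; i <= ORDER; i++) {
        r[i] = 0.0;
        for (j = 0; j < FRAME - i; j++)
            r[i] += s[j] * s[j + i];
    }

    /* Levinson-Durbin recursion: solve the Toeplitz normal equations */
    double err = r[0];
    for (i = 0; i <= ORDER; i++)
        a[i] = 0.0;
    a[0] = 1.0;
    for (i = 1; i <= ORDER; i++) {
        double k = -r[i];                 /* reflection coefficient */
        for (j = 1; j < i; j++)
            k -= a[j] * r[i - j];
        k /= err;
        a[i] = k;
        for (j = 1; j <= i / 2; j++) {    /* symmetric in-place update */
            double tmp = a[j] + k * a[i - j];
            a[i - j] += k * a[j];
            a[j] = tmp;
        }
        err *= (1.0 - k * k);
    }
    return err;
}
```

That's two O(ORDER * FRAME)-ish loops per frame, which is why I suspect even
the encoding side is cheap by 1994 standards.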

OK, enough brain dumping, you can probably see which way this is going.

	Would it be possible to (a) implement the LPC encoding on a common
workstation (let's use an Indigo R4000, 100 MHz, since that's what I've got),
(b) run it in real time or near real time, and (c) produce data small enough
to run through a medium-speed network link in real time or near real time?
(d) Can the decoding run fast enough on the aforementioned machine, or did the
TI chip use some goofy analog thing that wouldn't work fast enough digitally?
(e) Does TI own some sort of patent which would prevent an implementation
without licensing?  (If so, it should be up soon; patents last 17 years, so
anything filed before 1977 has expired, and the Speak & Spell came out not
long after then.)  (f) Has someone already done this, or parts of it?
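On the decoding question, my understanding is that LPC synthesis is just an
excitation signal pushed through an all-pole filter, which is ORDER
multiply-adds per output sample.  Here's a sketch of one sample of that inner
loop (again my own toy code, not the TI chip's actual implementation; the
8 kHz sample rate is a guess):

```c
#define ORDER 10     /* predictor order -- same ballpark as the TI chips */
#define RATE  8000   /* output sample rate in Hz -- a guess */

/* One sample of LPC synthesis: run the excitation (pitch pulses for voiced
   sounds, noise for unvoiced) through the all-pole vocal tract filter.
   a[1..ORDER] are the predictor coefficients from the encoder; hist[] holds
   the last ORDER output samples, hist[0] being the most recent. */
double lpc_synth_sample(double excitation, const double *a, double *hist)
{
    double out = excitation;
    int j;

    for (j = 1; j <= ORDER; j++)       /* ORDER multiply-adds */
        out -= a[j] * hist[j - 1];

    for (j = ORDER - 1; j > 0; j--)    /* shift the output history */
        hist[j] = hist[j - 1];
    hist[0] = out;
    return out;
}
```

At 8 kHz that's on the order of 80,000 multiply-adds per second, which should
be trivial for an R4000.  And on (c): if a frame is something like 50 bits
every 25 ms (my guess at the ballpark -- I don't have a data sheet handy),
the stream is around 2 kbit/s, which fits comfortably through even a modem
link, never mind a medium-speed network.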

	If it runs in real time, this would seem ideal for an internet voice
chat program.  Yeah, I know phones are easier and cheaper, but it would be fun
and perhaps there are other things you could do with it that you couldn't do
with a phone (maybe link into network games like netrek or something).  Without
the network, this would be good for verbal note-taking: remove the computer,
put the encoding and decoding in hardware, and you've got a nice little
RAM-based dictation box (though I suspect LPC is very susceptible to background
noise).

	If the decoding works in real time and the encoding doesn't, it would
be handy for games.

	If neither end runs in real time, there would still be applications,
such as distribution of internet broadcast radio, or speech in Mosaic.

	I know this technology has been in consumer applications for 10-15
years; it should be a lot easier to pull off now.

	Again, please forgive me if this is in a FAQ or if I've got some of the
information wrong - computer speech is not my field, but everybody seems to be
all hot on speech _recognition_ and to consider synthesis a boring,
mostly solved problem.

		- Stephan

--
Stephan Meyers | stephan@artn.iit.edu
(Art)^n Laboratories, inventors of the PHSCologram (R)
GO(CS) d--- p---(+) c++++ !l u++ e++ m+++/--- 
	s--(+) n* h--- f++ g+ w+++ t++@ r x*
