Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!MathWorks.Com!yeshua.marcam.com!charnel.ecst.csuchico.edu!olivea!trib.apple.com!amd!netcomsv!netcomsv!netcom.com!alvin
From: alvin@netcom.com (Alvin H. White)
Subject: Multi-Lingual Singing Teacher's Computer Phone
Message-ID: <alvinCxL2tL.Ito@netcom.com>
Summary: Computer Synthesized MIDI Speech Organ for Singing in Tongues
Keywords: SMPTE Synchronized Bi-Lingual Polyglot Machine Translated Speech
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Wed, 12 Oct 1994 23:34:32 GMT
Lines: 244


I saw this posted over in sci.electronics. It seems like the basic
kind of building-block information, coding, and explanation that we
Internet users need to develop if we are going to get our computers to
sing in all the languages on Earth.

I am going to post it to: sci.lang, comp.speech, comp.music,
comp.dsp and comp.sys.ibm.pc.soundcard.tech. 

If we could take the phoneme data, find out what is missing, and
sample that; code how to modify each phone when it is sung on a
particular note of the piano; and write a program that adjusts the
length (duration) of each phone, especially the vowels, then the phones
could be strung together to make the computer speak or sing with a
well-defined starting time, stopping time, rate of speed, and pitch.
-alvin
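A minimal sketch of the timing part of this idea, assuming each syllable is given as a list of phones with nominal durations and a vowel flag, plus a MIDI note number and a note length in milliseconds. The names `fit_to_note` and `sing` are hypothetical, the D/A/AY durations come from the Votrax table quoted further down, and stretching only the vowels is one simple choice, not an established method:

```python
def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (A above middle C = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def fit_to_note(phones, note_ms):
    """phones: list of (symbol, nominal_ms, is_vowel).

    Consonants keep their nominal length; the vowels share whatever time is
    left in the note, in proportion to their nominal lengths.
    """
    consonant_ms = sum(ms for _, ms, vowel in phones if not vowel)
    vowel_ms = sum(ms for _, ms, vowel in phones if vowel)
    if vowel_ms == 0 or note_ms <= consonant_ms:
        raise ValueError("no vowel to stretch, or note shorter than its consonants")
    scale = (note_ms - consonant_ms) / vowel_ms
    return [(sym, ms * scale if vowel else ms) for sym, ms, vowel in phones]


def sing(syllables, start_ms=0.0):
    """syllables: list of (phones, midi_note, note_ms).

    Returns (symbol, start_ms, stop_ms, pitch_hz) for every phone, laid
    end to end from start_ms.
    """
    t = start_ms
    timeline = []
    for phones, note, note_ms in syllables:
        hz = midi_to_hz(note)
        for sym, ms in fit_to_note(phones, note_ms):
            timeline.append((sym, t, t + ms, hz))
            t += ms
    return timeline


# "day" held for half a second on A440, starting one second in.
day = [("D", 55, False), ("A", 185, True), ("AY", 65, True)]
for row in sing([(day, 69, 500)], start_ms=1000):
    print(row)
```

Consonants keep their table length here on the assumption that the sustained, sung part of a syllable is its vowel.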

On another subject: I think conditions are nearly right for someone to
make a card that is a modem, speech recognizer/synthesizer, recorder,
dictation/transcription machine, and MIDI music machine all in one.
Maybe it could be called a phone card, or simply a phone.

I wish one would get here for Christmas.
==================================================================

Xref: netcom.com sci.electronics:98035
Newsgroups: sci.electronics
Path: netcom.com!netcomsv!decwrl!src.dec.com!pa.dec.com!decuac.dec.com!haven.umd.edu!news.umbc.edu!europa.eng.gtefsd.com!howland.reston.ans.net!pipex!uunet!hobbes!earth.armory.com!rstevew
From: rstevew@armory.com (Richard Steven Walz)
Subject: Re: Looking for speech chip
Cc: blitzer@vax1.mankato.msus.edu
Organization: The Armory
Date: Tue, 11 Oct 1994 11:14:15 GMT
Message-ID: <CxI9vt.26L@armory.com>
References: <1994Oct6.214846.1@vax1.mankato.msus.edu>
Sender: news@armory.com (Usenet News)
Nntp-Posting-Host: deepthought.armory.com
Lines: 182

In article <1994Oct6.214846.1@vax1.mankato.msus.edu>,
 <blitzer@vax1.mankato.msus.edu> wrote:
>I am looking for a speech synthesis chip that can produce speech by addressing
>the phonones that humans make when speaking the english language. I know 
>Microchip use to make such a chip, but I have been told that they no longer
>produce it. I wish to use this chip in my senior design project for college. 
>
>If anyone knows where I can find this chip or anything similar, I would
>greatly appreciate your help.
>
>Lee E. Myers  (Blitzer@vax1.mankato.msus.edu)
>Mankato State University
>Mankato, MN
-------------------------------------
They are called phonemes or allophones (Votrax's term vs. the SPO256-AL2's).
They are listed in a file I typed in out of a book on the Votrax. The list
is useful even without the chip, because a state machine built from EPROM
or EEPROM data plus address logic can simply emulate this process, and you
can do what we do best now for speech reproduction: we SAMPLE IT!!! Imagine
making a device that can read ASCII in software and, using phonics rules
and exclusion tables of English words, can speak the file as YOU would READ
it!!! This can be done by sampling your voice and extracting allophones
with some simple digital editing of PCM or CVSD recordings. What I will
give you in this file, though, is the expected length of each of these
phonemes in proper speech, in milliseconds!!!! Here it is!!!:
-Steve Walz   rstevew@armory.com
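For anyone who has not met CVSD (continuously variable slope delta modulation): a toy encoder/decoder pair, assuming a 3-bit run detector and multiplicative step growth and decay. Hardware CVSD codecs use their own syllabic filters and constants, so every number below is illustrative, and `_Tracker` is a name made up for this sketch:

```python
# Toy CVSD codec: one bit per sample. The step grows while the last RUN
# bits agree (the signal is outrunning the estimate) and decays otherwise.
MIN_STEP, MAX_STEP = 10.0, 1000.0
RUN, GROW, DECAY = 3, 1.25, 0.9


class _Tracker:
    """Integrator state shared by encoder and decoder so both stay in step."""

    def __init__(self):
        self.estimate = 0.0
        self.step = MIN_STEP
        self.recent = []

    def push(self, bit):
        self.recent = (self.recent + [bit])[-RUN:]
        if len(self.recent) == RUN and len(set(self.recent)) == 1:
            self.step = min(self.step * GROW, MAX_STEP)   # steep slope: speed up
        else:
            self.step = max(self.step * DECAY, MIN_STEP)  # otherwise relax
        self.estimate += self.step if bit else -self.step
        return self.estimate


def cvsd_encode(samples):
    """Emit 1 when the input is at or above the running estimate, else 0."""
    tracker, bits = _Tracker(), []
    for x in samples:
        bit = 1 if x >= tracker.estimate else 0
        tracker.push(bit)
        bits.append(bit)
    return bits


def cvsd_decode(bits):
    """Rebuild the waveform by running the same integrator on the bits."""
    tracker = _Tracker()
    return [tracker.push(bit) for bit in bits]


print(cvsd_encode([0, 0, 0, 0]))
print(cvsd_decode([1, 1, 1, 1]))
```

Because the decoder runs exactly the same integrator as the encoder, it reproduces the encoder's running estimate bit for bit; a run of identical bits makes the step grow, so the estimate can catch up with a steep slope.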
-----------------------------------
Unless someone wants to pay you a lot for a museum, THIS IS A KEEPER!
This is an early Votrax phoneme synthesizer. As I recall, it used nearly
or exactly the same phonemes and phoneme codes that the GI SPO-256 used to
generate sound. Phonemes are the vowel/consonant, consonant/vowel, pure
vowel, and pure consonant pieces of speech. Their codes must be passed to
the chip one by one, and the chip has a busy line and a strobe line for
handshaking during the relatively long time (computer-wise) it needs to
say each one and ask for another. It uses 64 codes, thus 6-bit words, to
key the allophones, and it might (I do not know for sure) have the other
two bits reserved for pitch or intonation. Direct your attention to Steve
Ciarcia (the guy who writes how-to articles for BYTE magazine), or hunt up
his company, MicroMint, from which the data would be available. The
software is trivial and can be written in BASIC, as it only needs to send
a string of 6- or 8-bit values, which is what an LPT port is good for!
Just figure out whether your port is at 03BC hex, if you have your printer
port on an IBM video card, or at 0378 hex or 0278 hex, for LPT1 or LPT2,
if you have no LPT on the video card. A port on the video card takes LPT1
and just bumps the logical names of the others up by one number, to LPT2
and LPT3; otherwise the last two addresses are LPT1 and LPT2. To read the
busy line, read the highest bit (bit 7) of port number + 1: BUSY ON shows
up as a low, or 0, meaning don't send another byte yet, I'm not done! The
strobe line might be needed; see whether pin 1 of the parallel port
connector is used. If you need a strobe* that tells the chip the data on
the eight lines is ready to be read now, use bit 0 of port number + 2.
That bit is inverted on its way to pin 1, so writing a 1 there sends a low
strobe* and writing a 0 returns the pin high; pulse it, and leave the
register's other bits as they were. It sounds like the
thing gets its power from the port, but I don't know where. Likely it is
one of the output control lines (autofeed, initialize, or select), which
are known for their open-collector, pulled-up nature, unless it has a
supply with it; look carefully. The amp might require a bit of juice,
unless its output is line level, 100-500 mV p-p, in which case you'll need
your stereo. But in truth, the thing should be pretty easy to 'suss out.
The data lines are there, and the one going to pin 11 is the BUSY line.
The latch is probably a 74LS374 or similar; it holds the byte and gets the
strobe, which comes from pin 1 on the printer port connector. It sounds
like the rest of it is all ready to go. It's an output-only device, man;
you can't blow it up with simple OUTs and INPs from the port of your
choice, not unless you haven't checked the cable and such for shorts or
the board for weirdness. Just GO FOR IT, I say! (All disclaimers apply
here, of course!:) ) But that's what I'd do, because that's all that chip
was, or any of the Votrax chips.
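A sketch of that handshake loop, assuming `inp`/`outp` callables standing in for BASIC's INP and OUT (on a modern OS, raw port access needs a driver, which is not shown). The bit positions are the standard PC ones; `FakeSpeechBox` is a hypothetical stand-in so the loop can run without hardware, and which strobe edge the real latch uses is not stated in the post:

```python
STATUS_BUSY = 0x80     # base+1, bit 7: hardware-inverted, reads 0 while pin 11 is high
CONTROL_STROBE = 0x01  # base+2, bit 0: hardware-inverted, writing 1 pulls pin 1 low


def send_phonemes(codes, inp, outp, base=0x378, max_polls=100_000):
    """Wait for BUSY to clear, put each byte on D0-D7, then pulse STROBE."""
    for code in codes:
        polls = 0
        while not inp(base + 1) & STATUS_BUSY:       # 0 means "I'm not done!"
            polls += 1
            if polls > max_polls:
                raise TimeoutError("speech device never became ready")
        outp(base, code & 0xFF)                       # byte onto the data lines
        ctrl = inp(base + 2)                          # keep autofeed/init/select as-is
        outp(base + 2, ctrl | CONTROL_STROBE)         # pin 1 low
        outp(base + 2, ctrl & ~CONTROL_STROBE & 0xFF) # pin 1 back high


class FakeSpeechBox:
    """Hypothetical stand-in for the hardware.

    It latches the data byte when the strobe bit goes from 0 to 1, then
    reports busy for a few status reads, as if it were speaking.
    """

    def __init__(self, base=0x378, busy_reads=3):
        self.base, self.busy_reads = base, busy_reads
        self.data, self.control, self.busy_left = 0, 0x0C, 0  # 0x0C: made-up idle value
        self.spoken = []

    def inp(self, port):
        if port == self.base + 1:
            if self.busy_left:
                self.busy_left -= 1
                return 0x00                           # bit 7 = 0: busy
            return 0x80                               # bit 7 = 1: ready
        if port == self.base + 2:
            return self.control
        return self.data

    def outp(self, port, value):
        if port == self.base:
            self.data = value
        elif port == self.base + 2:
            if value & CONTROL_STROBE and not self.control & CONTROL_STROBE:
                self.spoken.append(self.data)         # latch on the strobe edge
                self.busy_left = self.busy_reads
            self.control = value


box = FakeSpeechBox()
send_phonemes([0x1B, 0x3B, 0x18, 0x26, 0x3F], box.inp, box.outp)  # H EH L O STOP
print([hex(b) for b in box.spoken])
```

Reading the control register back before pulsing keeps the autofeed, initialize, and select lines steady, which matters if the box really is powered from one of them.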

Here, I just found the pinout and the phoneme set for you in the book, "The
IBM PC Connection" by Coffron.

The pinout of the Votrax:
1   Vp  nominally +12VDC   (7-14 VDC),  bypass cap to gnd .1 uF
2   I2 high bit of pitch (D7)
3   I1 low bit of pitch  (D6)
4   NC   no-connect
5   TP3 factory test point 3
6   TP2 factory test point 2
7   STB   strobe, active high
8   A/R   acknowledge/request (more data)
9   P5  high bit of phoneme (D5)
10  P4   (D4)
11  P3   (D3)
12  P2   (D2)
13  P1   (D1)
14  P0  low bit of phoneme (D0)
15  MCX  Master Clock eXternal
         15 & 16 usually tied together, pulled up to 12 VDC through 6.8K
         ohms, with a 330 pF cap to ground, to clock the chip at 720 kHz.
16  MCRC Master Clock - resistor/capacitor
17  TP1 factory test point 1
18  Vg   ground
19  NC   no-connect
20  CB   Current source for class B amp output
21  AF   Audio Feedback, output for class A amps
22  AO   Audio Out, 150 pF and 10K ohms in line to amp (741 or even LM386)
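Given that pinout (P5-P0 on D5-D0, pitch bits I2 and I1 on D7 and D6), building the byte for the port is just a shift and an OR. A minimal sketch; the function names are hypothetical, and since the post does not say which pitch setting is highest, levels are simply numbered 0-3:

```python
def pack(phoneme: int, pitch: int = 0) -> int:
    """Phoneme code on D5-D0, pitch level on D7-D6 (I2 = D7, I1 = D6)."""
    if not 0 <= phoneme <= 0x3F:
        raise ValueError("phoneme codes are 6 bits (0x00-0x3F)")
    if not 0 <= pitch <= 3:
        raise ValueError("pitch is 2 bits (0-3)")
    return (pitch << 6) | phoneme


def unpack(byte: int) -> tuple:
    """Split a port byte back into (phoneme code, pitch level)."""
    return byte & 0x3F, (byte >> 6) & 0x03


# E (0x2C) at pitch level 2: 0x80 | 0x2C
print(hex(pack(0x2C, 2)))
```

Pitch levels 1, 2, and 3 amount to adding 64, 128, or 192 to the phoneme code.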

Now, understand that your latch is more likely an open-collector latch and
voltage level shifter with pull-up resistors, as this chip runs on, and
expects, 12 volts on everything! So you either have a twelve-volt power
supply or must steal 12 V from the IBM supply, which is very do-able. The
chip won't work correctly with TTL inputs. Likewise, the strobe and ack/req
lines must be level shifted with open-collector drivers and pull-ups, and
the ack/req line returning from the SC-01 chip must be voltage limited by a
durable buffer before it reaches the port; the chip's output is not strong
enough to harm that buffer.

As for the phoneme/allophone set:
hex dec  sym  as in word  duration      hex dec  sym  as in word  duration
00   0   EH3   jacket<-   59ms          20  32   A       day      185ms
01   1   EH2 ->enlist     71ms          21  33   AY      day       65ms
02   2   EH1    heavy    121ms          22  34   Y1     yard       80ms
03   3   PA0 (no sound    47ms)         23  35   UH3 mission<-     47ms
04   4   DT    butter     47ms          24  36   AH      mop      250ms
05   5   A2      made     71ms          25  37   P      past      103ms
06   6   A1      made    103ms          26  38   O      cold      185ms
07   7   ZH     azure     90ms          27  39   I       pin      185ms
08   8   AH2 ->honest     71ms          28  40   U      move      185ms
09   9   I3      bit<-    55ms          29  41   Y       any<-    103ms
0A  10   I2     ->in      80ms          2A  42   T       tap       71ms
0B  11   I1      hid     121ms          2B  43   R       red       90ms
0C  12   M       mat     103ms          2C  44   E      meet      185ms
0D  13   N       sun      80ms          2D  45   W       win       80ms
0E  14   B       bag      71ms          2E  46   AE      dad      185ms
0F  15   V       van      71ms          2F  47   AE1   after      103ms
10  16   CH*    chip      71ms          30  48   AW2   salty       90ms
11  17   SH     shop     121ms          31  49   UH2 ->about       71ms
12  18   Z       zoo      71ms          32  50   UH1 ->uncle      103ms
13  19   AW1  lawful     146ms          33  51   UH      cup      185ms
14  20   NG    thing     121ms          34  52   O2      for       80ms
15  21   AH1  father     146ms          35  53   O1    board      121ms
16  22   OO1 looking     103ms          36  54   IU    ->you       59ms
17  23   OO     book     185ms          37  55   U1      you<-     90ms
18  24   L      land     103ms          38  56   THV   ->the       80ms
19  25   K     trick      80ms          39  57   TH   ->thin       71ms
1A  26   J*    judge      47ms          3A  58   ER     bird      146ms
1B  27   H     hello      71ms          3B  59   EH      get      185ms
1C  28   G       get      71ms          3C  60   E1       be      121ms
1D  29   F      fast     103ms          3D  61   AW     call      253ms
1E  30   D      paid      55ms          3E  62   PA1 (no sound    185ms)
1F  31   S      pass      90ms          3F  63   STOP (no sound end 47ms)

     * /T/ must precede /CH/ to make CH sound!!
     * /D/ must precede /J/ to make J sound!!
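The table can be carried as data with the list index equal to the 6-bit code, plus an encoder that applies the two footnotes automatically. A sketch under the assumption that a word is already spelled as Votrax symbols; `encode` and `NEEDS_BEFORE` are made-up names, and the total is only the sum of the nominal durations:

```python
# The phoneme table above, in code order: list index == 6-bit code.
TABLE = [
    ("EH3", 59), ("EH2", 71), ("EH1", 121), ("PA0", 47),    # 00-03
    ("DT", 47), ("A2", 71), ("A1", 103), ("ZH", 90),        # 04-07
    ("AH2", 71), ("I3", 55), ("I2", 80), ("I1", 121),       # 08-0B
    ("M", 103), ("N", 80), ("B", 71), ("V", 71),            # 0C-0F
    ("CH", 71), ("SH", 121), ("Z", 71), ("AW1", 146),       # 10-13
    ("NG", 121), ("AH1", 146), ("OO1", 103), ("OO", 185),   # 14-17
    ("L", 103), ("K", 80), ("J", 47), ("H", 71),            # 18-1B
    ("G", 71), ("F", 103), ("D", 55), ("S", 90),            # 1C-1F
    ("A", 185), ("AY", 65), ("Y1", 80), ("UH3", 47),        # 20-23
    ("AH", 250), ("P", 103), ("O", 185), ("I", 185),        # 24-27
    ("U", 185), ("Y", 103), ("T", 71), ("R", 90),           # 28-2B
    ("E", 185), ("W", 80), ("AE", 185), ("AE1", 103),       # 2C-2F
    ("AW2", 90), ("UH2", 71), ("UH1", 103), ("UH", 185),    # 30-33
    ("O2", 80), ("O1", 121), ("IU", 59), ("U1", 90),        # 34-37
    ("THV", 80), ("TH", 71), ("ER", 146), ("EH", 185),      # 38-3B
    ("E1", 121), ("AW", 253), ("PA1", 185), ("STOP", 47),   # 3C-3F
]

CODE = {sym: (code, ms) for code, (sym, ms) in enumerate(TABLE)}
NEEDS_BEFORE = {"CH": "T", "J": "D"}  # footnotes: /T/ before /CH/, /D/ before /J/


def encode(symbols):
    """Return (symbols actually sent, their codes, nominal total in ms)."""
    sent = []
    for sym in symbols:
        lead = NEEDS_BEFORE.get(sym)
        if lead and (not sent or sent[-1] != lead):
            sent.append(lead)  # add the missing /T/ or /D/
        sent.append(sym)
    codes = [CODE[s][0] for s in sent]
    total_ms = sum(CODE[s][1] for s in sent)
    return sent, codes, total_ms


# "chip": the T is added automatically.
print(encode(["CH", "I", "P"]))
```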

And this phoneme set is NOT the same as the SPO-256 set. I get them
confused sometimes because I used both extensively in the early '80s. I
think the Votrax phonemes are nice because you can set the two high bits
of the byte (adding 64, 128, or 192 to the code) for pitch, giving it a
human quality. The same can be rigged on the SPO-256 by switching its
clock speed with a gate, but the Votrax has it built in. I do, however,
think that allophones are more correct than phonemes for a host of uses,
even though careful and laborious use of longer strings of phonemes will
work in more languages.

There you go!
Steve Walz    rstevew@deeptht.armory.com    {That's deepTHOUGHT!:)}
------------------------------------------
There, now you can have a state machine feeding addresses to counters for
the length of each phoneme. Then you can digitize your voice with a simple
ADC circuit on your parallel port and write a simple tool to view and edit
the recorded words, cutting the byte strings out of them and cropping them
to the right length until they fit together into words!! I bet you can do
it with two or three EPROMs, a few counters, and some glue chips! I have
seen it done with the numbers from one to ten, and we had it working and
saying "-teen" after the first 9 that same night!!! It's up to you to
perfect it, develop some good-quality filters, and make it sound like
you!!! Even with eight-bit amplitude resolution it should fool your
family!!! And remember that the phoneme codes are just six bits, so use
the other two bits for varying the clock rate, with either a shift or a
programmed envelope!!! And you'll have it saying, "I'm sorry, Dave, I
can't do that.", in NO TIME at ALL!!!
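A software emulation of that EPROM-and-counters scheme, useful for trying the idea before burning parts. It assumes 8 kHz unsigned 8-bit samples, a 32 KB EPROM, and naive cropping that keeps the start of each clip (a real editor would pick the best-sounding stretch); all names are hypothetical, and the spare-bit clock-rate trick is not emulated:

```python
SAMPLE_RATE = 8000       # assumed ADC rate, samples per second
EPROM_SIZE = 32 * 1024   # assumed 32 KB part
SILENCE = 0x80           # mid-scale for unsigned 8-bit audio


def build_eprom(recordings, durations_ms):
    """recordings: symbol -> list of 0-255 samples; durations_ms: symbol -> ms.

    Crops each clip to its table duration, packs the clips end to end, and
    returns (image, index) where index maps symbol -> (start, length).
    """
    image = bytearray([SILENCE] * EPROM_SIZE)
    index, addr = {}, 0
    for sym, clip in recordings.items():
        wanted = durations_ms[sym] * SAMPLE_RATE // 1000
        clip = clip[:wanted]            # naive crop: keep the head of the clip
        if addr + len(clip) > EPROM_SIZE:
            raise ValueError("EPROM full")
        image[addr:addr + len(clip)] = bytes(clip)
        index[sym] = (addr, len(clip))
        addr += len(clip)
    return bytes(image), index


def play(image, index, symbols):
    """What the counter does: step the address from start to start + length."""
    out = bytearray()
    for sym in symbols:
        start, length = index[sym]
        for addr in range(start, start + length):
            out.append(image[addr])
    return bytes(out)


img, idx = build_eprom({"S": [200] * 1000, "AH": [100] * 3000}, {"S": 90, "AH": 250})
print(idx)
print(len(play(img, idx, ["S", "AH", "S"])))
```

The index plays the role of the state machine's address table: one start address and one count per phoneme.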

EMAIL me for more hints and CVSD info as well. There are some very good
methods of encoding voice on EPROMs or EEPROMs. I know that ISD has that
talking chip, but it just records sentences. This can talk in your own
voice or another's and can say anything, or can read you your email!!!
It can read stories to the blind, or become your computer's prompt voice.
Then you just have to shoot for speech recognition for your next degree!!!:)
There is also some info on an ftp site about the exclusion table used by
GI in their CTS256-AL2 ASCII-to-allophone code converter! The info is
public!!!
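The exclusion-table idea in miniature: look each word up in a table of exceptions first, and fall back to letter rules, longest pattern first. This is NOT the CTS256-AL2's actual rule set; every entry below is a hypothetical placeholder using symbols from the Votrax table, meant only to show the lookup order:

```python
import re

# Hypothetical whole-word exceptions, checked before any letter rule.
EXCEPTIONS = {
    "the": ["THV", "UH1"],
    "one": ["W", "UH", "N"],
    "you": ["IU", "U1"],
}

# Hypothetical letter rules; multi-letter patterns come first so they win.
RULES = [
    ("ch", ["T", "CH"]), ("sh", ["SH"]), ("th", ["TH"]), ("ng", ["NG"]),
    ("ee", ["E"]), ("oo", ["OO"]),
    ("a", ["AE"]), ("b", ["B"]), ("c", ["K"]), ("d", ["D"]), ("e", ["EH"]),
    ("f", ["F"]), ("g", ["G"]), ("h", ["H"]), ("i", ["I"]), ("j", ["D", "J"]),
    ("k", ["K"]), ("l", ["L"]), ("m", ["M"]), ("n", ["N"]), ("o", ["AH"]),
    ("p", ["P"]), ("q", ["K"]), ("r", ["R"]), ("s", ["S"]), ("t", ["T"]),
    ("u", ["UH"]), ("v", ["V"]), ("w", ["W"]), ("x", ["K", "S"]),
    ("y", ["Y1"]), ("z", ["Z"]),
]


def to_phonemes(text):
    """Exceptions first, then left-to-right rule matching; PA0 after each word."""
    out = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in EXCEPTIONS:
            out += EXCEPTIONS[word]
        else:
            i = 0
            while i < len(word):
                for pattern, phones in RULES:
                    if word.startswith(pattern, i):
                        out += phones
                        i += len(pattern)
                        break
                else:
                    i += 1  # no rule matched: skip the letter
        out.append("PA0")  # short pause between words
    return out


print(to_phonemes("The ship"))
```

Real rule sets also look at the letters on both sides of a match, which is why they run to hundreds of rules; the exception table catches the words those rules still get wrong.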
-Steve Walz   rstevew@armory.com

===========================================================================
alvin@netcom.COM

Alvin H. White, Gen. Sect.
G.O.D.S.B.R.A.I.N.
P.O.Box 26745
San Jose, CA 95159-6745 USA

(408) 446-1770 

Government Online Database Systems
Bureau for Resource Allocations to Information Networks
[an idea waiting to happen]      .
                                 U   Ohm's Law [Early Version]?
38 North 120 West                3~

Universe Musicum Omnium Colloquium
Om Mani Padme Hum
Oh Man! He Paid Me t'Hum!

-- 

alvin@netcom.com
