Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!gatech!swrinde!pipex!uunet!psinntp!sunsrvr6!adw
From: adw@cci.com (Derrick Williams)
Subject: Speaker Independent Developer's Kits?
Message-ID: <D7LLEu.GJC@sunsrvr6.cci.com>
Sender: root@sunsrvr6.cci.com (Operator)
Organization: Northern Telecom, Network Application Systems
Date: Tue, 25 Apr 1995 15:52:05 GMT
Lines: 73


 Hello! 

 I've been following this group for some time, and I'm interested in a
developer's kit for a speaker-independent voice recognition system. My
application is to help hard of hearing students use the telephone.

 "hard of hearing" as opposed to "deaf" usually refers to people who have
some hearing (often with the help of hearing aids), but because of nerve
damage and so forth, can't distinguish speech without clues such as lipreading.

 Telephones pose an interesting challenge, as there are no clues other than
sound. Many of today's systems can distinguish sounds better than many
hard of hearing people, and my group has several ideas for implementing a
hints-based system.

 Total accuracy isn't really important, although it would be nice :-). 
Our studies have shown that most students, when given a sentence to listen
to, and a card with the sentence spelled out in syllables, understood the
sentence perfectly. When given one sentence, and several cards, almost
everyone picked the correct one. Students performed much better with
cards that had distorted, missing, misleading, or extra syllables than
with no syllables at all. It seems to follow that the difficulty for
"hard of hearing" people is not so much "hearing" as "distinguishing". I've
glossed over a lot of the details: some students frequently got lost with
the cards but benefited when each syllable was pointed out in turn, and
the emphasis was not on comprehension but on "listening in context"
exercises.

 Well, getting back to the point, my group has some extra money to try
to implement these ideas in a real-time system to see if we can help
students use telephones. I know about $500 can be spent, but as much as
$1000 could be earmarked if the technology looks promising. If the 
experiment works out well, some educational software could be developed
(e.g., "Talking to the Bank": what kinds of words do banks use? Here's
what they sound like; let's practice listening for them, let's practice
feedback techniques with the bank officer, etc.) and polished up as the
technology improves.

 I have a 486DX2 66 to use, and currently, I have my eye on IBM's 
Continuous Speech Series (ICSS) (not to be confused with VoiceType, a
different product). However, I have heard mention of Speech Systems'
Phonetic Engine 400 and 500, as well as Kurzweil and other products.

 Although ICSS is a structured speech system, we hope it will be accurate
enough to pick out syllables, or syllables that kind of sound like the
correct one, given the several thousand we have collected from transcripts
(closed captioning data from TV shows!). The developer's kit is attractive,
since it will allow us to experiment with several techniques that we have
thought up, or will think up in the future.
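
 As a rough illustration of the "syllables that kind of sound like the
correct one" idea, here is a minimal sketch in Python. It is not tied to
ICSS or any other product: the syllable list, the function name
hint_candidates, and the cutoff value are all assumptions made up for the
example, and spelling similarity is used only as a crude stand-in for
acoustic similarity.

```python
from difflib import get_close_matches

# Hypothetical inventory; the real one would be the several thousand
# syllables collected from closed-captioning transcripts.
SYLLABLES = ["bank", "ac", "count", "bal", "ance",
             "de", "pos", "it", "with", "draw"]


def hint_candidates(heard, inventory=SYLLABLES, n=3, cutoff=0.5):
    """Return up to n inventory syllables whose spelling resembles `heard`.

    `heard` is whatever rough syllable the recognizer produced. Results are
    ordered from most to least similar; an empty list means no hint.
    """
    return get_close_matches(heard.lower(), inventory, n=n, cutoff=cutoff)


if __name__ == "__main__":
    for guess in ["bank", "cownt", "zzz"]:
        print(guess, "->", hint_candidates(guess))
```

 Since the card studies suggest that even distorted or misleading
syllables help, a loose cutoff that sometimes offers a wrong candidate may
be acceptable here.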

 The biggest plus for ICSS is that it costs only $400, and we can use the
rest of the money to buy memory, soundcards, microphones, and so forth.
However, none of us have any first hand experience with ICSS. The few
product reviews we've seen speak of ICSS in glowing terms, but again 
it's not clear if ICSS will fit in with what we want to do. On the other
hand, ICSS works under OS/2, which makes it possible to have
multi-threaded applications that print out "here's what we have now"
data and, a few seconds later, print out "here's what we think it was
after having more time to mull it over" data.
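
 The two-stage output described above could be structured roughly as
follows. This is only a sketch using Python threads for illustration, not
the OS/2 or ICSS API; quick_guess, refined_guess, and run_two_pass are
hypothetical names, and both "recognizers" are placeholders.

```python
import queue
import threading
import time


def quick_guess(chunk):
    # Placeholder for a fast first-pass recognizer (hypothetical).
    return chunk.lower()


def refined_guess(chunk):
    # Placeholder for a slower second pass; the sleep stands in for the
    # extra time spent "mulling it over".
    time.sleep(0.05)
    return chunk.lower().strip("?")


def run_two_pass(chunks):
    """Emit a quick guess per chunk right away and a refined one later.

    Returns (stage, index, text) tuples in the order they were produced.
    """
    out = queue.Queue()

    def refine(i, chunk):
        out.put(("later", i, refined_guess(chunk)))

    workers = []
    for i, chunk in enumerate(chunks):
        # "Here's what we have now": queued before the worker starts.
        out.put(("now", i, quick_guess(chunk)))
        t = threading.Thread(target=refine, args=(i, chunk))
        t.start()
        workers.append(t)

    for t in workers:
        t.join()
    # All workers are done, so the queue size is stable here.
    return [out.get() for _ in range(out.qsize())]


if __name__ == "__main__":
    for stage, i, text in run_two_pass(["BAL?", "ANCE"]):
        print(stage, i, text)
```

 A real version would display each "now" result as soon as it arrives
rather than collecting everything at the end.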

 Well, this post is long and maybe not quite to the point, but any
experiences with ICSS, SS PE400 (or PE500) or any other products with
a developer's kit would be greatly appreciated. My request may be a bit
unusual, since the application is intended to help speech recognition for
humans, rather than the machine itself!

 
                                               Thanks again,


                                                      Derrick

