Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!newsfeed.pitt.edu!portc02.blue.aol.com!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!ix.netcom.com!park
From: park@netcom.com (Bill Park)
Subject: Re: Kudos and Questions (was Re: An Open Letter to
	Jim and Janet Baker and the Dragon Gang.)
Message-ID: <parkED9Lw3.Izv@netcom.com>
Followup-To: comp.speech
Keywords: Nuance Schwab Jabra bone conduction shroud telephone answering machine plug in plug-in plugin noise cancellation deferred recognition
Organization: Netcom
References: <33C56DB5.7DAA@clark.net> <01bc8e16$e56ecd80$953aa5cc@davidp> <5qa0ql$1ic$1@nntp1.ba.best.com>
Date: Sun, 13 Jul 1997 16:02:27 GMT
Lines: 81
Sender: park@netcom2.netcom.com

In article <5qa0ql$1ic$1@nntp1.ba.best.com>,
Michael M. Butler <butler@comp*lib.org> wrote:

> However, once again I have to ask: what does it take to
> get support of deferred recognition? I don't want to lug a
> Pentium box with GUI around all day, and I have a hard
> time believing your QA folks haven't got a way to replay
> input as the software is revised.  I'm still waiting for
> an answer. Digital Dictate understands what I want, and
> their price is comparable to yours. But they want me to
> buy a special M-O disc recorder that's absurdly priced.
> MMB

Sounds as if you mean, "deferred until the end of the day."
So, why not use a pocket tape recorder?  Play the tape
directly into the audio input port of your computer when you
get back to it.  You could also easily review your
recordings during the day, correct them, and insert
additional notes (leave some blank space between topics for
insertions).
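
As a rough illustration of the deferred approach, here is a
minimal Python sketch that reads a recording made during the
day and splits it at long pauses, so that each stretch of
speech can be handed to a recognizer later.  The function
names, the amplitude threshold, and the one-second gap are
assumptions chosen for the example, and the recording is
assumed to be available as a 16-bit mono WAV file.  No
particular recognition product is implied.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono WAV file; return (sample_rate, samples)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        rate = w.getframerate()
        samples = array.array("h")
        samples.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rate, samples


def split_on_silence(samples, rate, threshold=500, min_gap_s=1.0):
    """Return (start, end) sample index pairs for stretches of speech.

    A run of at least min_gap_s seconds in which every sample is
    below threshold (hypothetical defaults) closes the current segment.
    """
    min_gap = int(min_gap_s * rate)
    segments = []
    start = None   # index where the current segment began
    quiet = 0      # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # Segment ends where the quiet run began.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Recording ended mid-segment; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet))
    return segments
```

Each (start, end) pair could then be sliced out of the
sample array and passed to whatever recognizer is in use.  A
real tape would need a threshold chosen with its hiss level
in mind.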

I don't know if an inexpensive recorder would work well
enough, but I understand that even the high-end Sony digital
stereo models only cost on the order of $600.  At the other
end of the scale, some recognition systems, such as the one
Nuance Communications (Menlo Park, CA) built for Schwab's
call-in stock-quoting service, can recognize speech over a
telephone.  So, if you have a telephone-capable system, you
should be able to call your office and speak your notes to
your answering machine for playback and recognition when you
return.  Or your secretary could handle it before you get
back (assuming he/she can't simply take dictation
directly!).  Of course, you might also have your recognition
system answer the phone itself and either record your speech
on disc or recognize it immediately.

If ambient noise is a problem when you are recording, use a
noise-canceling microphone.  If that's not good enough,
perhaps you could buy or improvise a sound-insulating shroud
like the ones that used to be used by court reporters during
trials. They pressed the open end of the shroud against
their face, right over their mouth, so that no one in the
courtroom could hear them dictating their stories, and so
that the judge wouldn't eject them for making noise. It also
kept what was being said by others in the courtroom from
being recorded, to avoid confusing the transcriptionist
later.  A shroud designed to fit onto the mouthpiece of a
telephone handset might improve telephone recognition, too.

A shroud might be impractical, though: it looks peculiar,
and you wouldn't want to stop every now and then during a
business meeting to dictate "secret" notes that the client
couldn't hear.  It would certainly draw attention to you,
too, if you used it in a public place.  So a bone-conduction
microphone of the type used in aircraft/helicopter cockpits
or the contemporary Jabra(tm) in-ear mike/earphone might be
a better choice.

You might have to retrain your recognition system completely
to use one of these recording techniques successfully.  I
should think a speech recognition vendor could gain a
marketing advantage by offering ready-made
acoustic-to-phoneme transformation plug-in modules for its
products, tuned for tape recorders, shrouds, noise-canceling
mikes, bone-conduction mikes, telephones, etc., as options
-- perhaps extra-cost options.
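
To make the plug-in idea concrete, here is a small Python
sketch of how channel-specific front ends might be
registered and selected by name.  Everything in it is
hypothetical: the registry, the channel names, and the two
placeholder filters (mean removal and a first-order
pre-emphasis) merely stand in for whatever tuned
transformation a vendor would actually ship.

```python
# Hypothetical registry: channel name -> front-end function.
FRONT_ENDS = {}


def front_end(name):
    """Decorator that registers a front end under a channel name."""
    def register(fn):
        FRONT_ENDS[name] = fn
        return fn
    return register


@front_end("tape")
def remove_dc(samples):
    # Placeholder: subtract the mean to remove a constant offset.
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


@front_end("telephone")
def pre_emphasis(samples, a=0.95):
    # Placeholder: first-order high-pass, y[n] = x[n] - a * x[n-1].
    if not samples:
        return []
    out = [samples[0]]
    out.extend(samples[i] - a * samples[i - 1]
               for i in range(1, len(samples)))
    return out


def prepare(samples, channel):
    """Run the front end registered for this channel, if any."""
    try:
        fn = FRONT_ENDS[channel]
    except KeyError:
        raise ValueError("no front end registered for %r" % channel)
    return fn(samples)
```

A recognizer would call prepare(samples, "telephone") before
its usual processing.  Supporting a new device would then
mean adding one registered function rather than retraining
the whole system, at least in this simplified picture.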

By the way, are there any speech recognition systems that
take advantage of stereo to improve recognition?  Seems like
a fairly obvious idea, given the ubiquity of stereo sound
hardware. You wouldn't be trying to locate where a sound
came from (the thrust of a lot of research with dummy
heads), but rather to pick up the voice of the person
speaking, whose mouth is close to the microphones, more
accurately. So there might be a comparatively
simple, general, licensable solution that you could
implement in a digital signal processor.  But that's another
thread.
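
One simple way stereo could help, offered only as a sketch:
if both microphones are about the same distance from the
mouth, the speech arrives in step on both channels while
much of the noise differs between them, so averaging the
channels keeps the speech and reduces the noise.  This is
delay-and-sum beamforming with zero delay.  The Python
below simulates it under idealized assumptions (a sine wave
standing in for speech, independent Gaussian noise on each
channel).  Real ambient noise is partly correlated between
microphones, so the gain in practice would be smaller.

```python
import math
import random


def mean_power(x):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def sum_channels(left, right):
    """Average two time-aligned channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def demo(n=20000, seed=0):
    """Return (noise power in one channel, noise power after averaging)."""
    rng = random.Random(seed)
    # Stand-in for speech: a 200 Hz tone sampled at 8000 Hz.
    speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(n)]
    # Assumption: noise is independent between the two microphones.
    noise_l = [rng.gauss(0, 1) for _ in range(n)]
    noise_r = [rng.gauss(0, 1) for _ in range(n)]
    left = [s + a for s, a in zip(speech, noise_l)]
    right = [s + b for s, b in zip(speech, noise_r)]
    mixed = sum_channels(left, right)
    # Whatever is left after subtracting the clean speech is noise.
    residual = [m - s for m, s in zip(mixed, speech)]
    return mean_power(noise_l), mean_power(residual)


if __name__ == "__main__":
    before, after = demo()
    print("noise power, one channel:", before)
    print("noise power, averaged:   ", after)
```

Under these assumptions the noise power after averaging
comes out at roughly half that of a single channel (about
3 dB), while the speech is unchanged.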

Bill Park
=========
