Announcing:
   
                The Lotec Speech Recognition Package


All that you need to build a single-speaker, small-vocabulary,
low-quality continuous speech recognition module, for use as part of a
larger system.

Input: a sound sample in Sun .au file format, plus word templates in
  the same format
	
Output: a bunch of word hypotheses, each consisting of temporal
  location and likelihood score (eg, `template 2 for the word "central"
  matched the input best in the time span from 450 to 830 milliseconds,
  and the match discrepancy was 405.32').  That is, it outputs a lattice
  of word hypotheses.

Hardware: Sun SparcStation, decent microphone

Software: SunOS 4.1.2 (Unix plus the "multimedia" library files in
/usr/demo/SOUND), X, and if you want to compile, gcc (the GNU C compiler)
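
To make the Output item above concrete, here is a minimal sketch of how
a downstream program might hold the lattice.  It is in Python only for
brevity.  The record name WordHypothesis, its field names, and the
helper hypotheses_at are illustrative assumptions, not Lotec's actual
output format.  Only the first entry comes from the example above; the
other two are invented, and "lower discrepancy means a better match" is
likewise assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordHypothesis:
    """One entry of the lattice (hypothetical layout, not Lotec's format)."""
    word: str           # the word whose template matched
    template: int       # which template of that word
    start_ms: int       # start of the matched time span
    end_ms: int         # end of the matched time span
    discrepancy: float  # match discrepancy; assumed lower = better


def hypotheses_at(lattice: List[WordHypothesis], t_ms: int) -> List[WordHypothesis]:
    """Return the hypotheses whose span covers t_ms, lowest discrepancy first."""
    covering = [h for h in lattice if h.start_ms <= t_ms < h.end_ms]
    return sorted(covering, key=lambda h: h.discrepancy)


lattice = [
    WordHypothesis("central", 2, 450, 830, 405.32),  # example from the text
    WordHypothesis("central", 1, 460, 810, 455.00),  # invented for illustration
    WordHypothesis("park", 1, 840, 1100, 390.75),    # invented for illustration
]

# Everything the lattice says about the instant 600 ms into the input.
for h in hypotheses_at(lattice, 600):
    print(h.word, h.template, h.start_ms, h.end_ms, h.discrepancy)
```

Note that several hypotheses can cover the same instant; that overlap
is what makes the output a lattice rather than a single word string.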


======= CONTENTS ======= 

"grab" helps you record speech samples.

"labeler" lets you interactively assign word labels to a speech sample.

"chopper" chops a speech sample file into files for each word.

"featurizer" converts a speech file to a parametric representation.

"match" does word spotting.

"real" is an online version of grab|featurizer|match that runs in real
time on a Sun SparcStation 10.

Other goodies.


======= POLITICAL STATEMENT =======

It's time for people to exploit a little speech input functionality in
all sorts of systems.  To do so, they shouldn't have to buy expensive
software, nor learn a lot about speech processing.


======= PERSONAL STATEMENT =======

I hate C.  I don't understand signal processing.  But, having no luck
trying to beg, borrow, or steal some simple speech software, I had no
choice but to put together my own package.  It's simplistic, but has served
at least to let me try out some ideas (on the integration of speech
and language processing).

I'm making it available in the hope that others will find it useful,
but have no time or inclination to support it.

======= BACKGROUND =======

Naively, you might expect speech recognition to be like a
stenographer: converting your speech to words.  But that's impossible
without human-type knowledge.  Using acoustic, phonetic, and lexical
knowledge only, all you can get is probabilities for what word is
where.  Most systems hide this fact, by searching thru the lattice of
word hypotheses to come up with one (or a few) best sentence
hypotheses.  This is fine if it hits on the right interpretation, but
if not, the downstream system is stuck.  So it's better for the speech
recognizer to output the entire lattice of word hypotheses (or so I
claim).  So, if you're interested in building systems which use
speech input, Lotec may be for you.
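
The difference can be sketched in a few lines.  This is a toy
illustration in Python, not Lotec code: the names Hyp, one_best and
best_allowed are hypothetical, the "sentinel" and "center" entries and
their scores are invented, and lower discrepancy is assumed to mean a
better match.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hyp(NamedTuple):
    """A word hypothesis (hypothetical layout)."""
    word: str
    start_ms: int
    end_ms: int
    discrepancy: float  # assumed: lower is better


def one_best(lattice: List[Hyp]) -> Hyp:
    """What a recognizer that hides the lattice would hand downstream."""
    return min(lattice, key=lambda h: h.discrepancy)


def best_allowed(lattice: List[Hyp], allowed: Iterable[str]) -> Optional[Hyp]:
    """Downstream choice: the best hypothesis among words the context permits."""
    allowed_set = set(allowed)
    candidates = [h for h in lattice if h.word in allowed_set]
    return min(candidates, key=lambda h: h.discrepancy) if candidates else None


lattice = [
    Hyp("central", 450, 830, 405.32),   # span and score from the earlier example
    Hyp("sentinel", 440, 850, 398.10),  # invented acoustic competitor
    Hyp("center", 450, 700, 512.00),    # invented acoustic competitor
]

# Acoustics alone prefer the wrong word ...
print(one_best(lattice).word)
# ... but a downstream system that expects a place name can still recover.
print(best_allowed(lattice, {"central", "park"}).word)
```

If only the one-best word were passed on, the second call would have
nothing to choose from.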

Moreover, the low quality shouldn't bother you.  My rationale is this:
Sometime in the next century there will be recognition systems which
can extract useful information from uncooperative speakers with bad
microphones in noisy environments.  What will the output of these
systems be like?  Not very good, and impossible to make sense of without
the application of semantic knowledge.  That is, the output of these
future systems will probably be similar in quality to the output of
Lotec today, with a single speaker in a normal room with a good
microphone and a small vocabulary.  This means that, if you're
interested in building systems that use the results of speech
recognition, you can use Lotec today to prototype systems that will
work well with 21st century speech recognition technology.


======= HOW TO GET IT =======

Lotec is available by anonymous ftp.  To get it, do something like
this:

ftp ftp.sanpo.t.u-tokyo.ac.jp

anonymous                  ((when it asks for Name))

xxx@yyy.zzz                ((or whatever, when it asks for Password))
                          
cd pub/nigel/lotec

binary                     ((so the compressed file is transferred intact))

get  lotec.tar.Z

quit                       ((exit ftp))

gunzip  lotec.tar.Z        ((or uncompress lotec.tar.Z))

tar xvf lotec.tar


Now, put lotec/bin in your path, and enjoy.

If ftp is slow for you, take lotec-no-bin.tar.Z instead; then compile
it by going to lotec/src, saying "make all", and waiting a couple of
minutes.
	
---
Nigel Ward
nigel@sanpo.t.u-tokyo.ac.jp
University of Tokyo
May 1994
---

