User Manual

                The Lotec Speech Recognition Package


=========== Description ===========

Lotec is a small-vocabulary, speaker-dependent, continuous-speech,
low-quality word spotter.

It takes a speech input and infers which words are likely to be
present and where they occur.

Compared to "normal" (eg HMM-based) speech recognition systems, Lotec:

- Does not attempt to come up with a single interpretation: it outputs
  the lattice of word hypotheses as is.
  This may make it suitable as a front end to user interfaces that
  apply semantic or other constraints to interpret the input.

- Incorporates no knowledge of syntax or semantics.
  This may make it suitable for handling noisy or fragmentary speech.

- Is simple and easy to modify.

- Has recognition results of low quality (to see what this means, go
  to the "sample" directory and do "showmatch batman.wh").

- Is suitable only for a small vocabulary (a dozen words, perhaps).


=========== Using Lotec: example ===========

 === first install lotec, as described in the file ANNOUNCE ===

 === second, create some templates ===
  cd  /tmp                  
  mkdir  raw
  cd  raw
  grab  hello-there.au                 ((use the mike to record sentences))
  grab  who-is-frankenstein.au
  grab  who-are-you.au
  labeler  *.au                        ((interactively label the sentences))
  cd ..
  mkdir  templates
  chopper  templates raw/*.au          ((chop the sentences into templates))
  feat  templates/*.au                 ((parameterize the templates))

 === now, record an input and match the templates to it ===

  grab  are-you-frankenstein.au        ((record an input with the mike))
  feat  are-you-frankenstein.au
  match  are-you-frankenstein.fe templates/*.fe ((run the word spotter))
  showmatch  are-you-frankenstein.wh   ((view the word spotting results))
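
Once an input has been recorded, the remaining three steps can be
bundled into a small shell function.  This is only a sketch: the name
"spot" is hypothetical and not part of lotec, and it assumes the
commands behave exactly as in the example above (feat writes NAME.fe
for NAME.au, and match writes NAME.wh).

```shell
# spot NAME [TEMPLATE_DIR]
# Hypothetical helper, not part of lotec: runs the feat, match and
# showmatch steps from the example on an already-recorded NAME.au.
# ASSUMPTION: feat writes NAME.fe and match writes NAME.wh.
spot() {
    name=${1%.au}                   # accept "foo" or "foo.au"
    tdir=${2:-templates}            # template directory, default "templates"
    feat "$name.au" || return 1     # parameterize the input
    match "$name.fe" "$tdir"/*.fe || return 1   # run the word spotter
    showmatch "$name.wh"            # view the results
}
```

With this defined, "spot are-you-frankenstein" would run the same
three commands as shown above.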

 === Notes === 
-  instead of grab you can use soundtool or x_soundtool. These are in
   /usr/demo/SOUND.  This may be better when recording under noisy conditions.
-  to see how well lotec did, you can label the input, eg, with
   "labeler are-you-frankenstein.au".  After you do this, showmatch will 
   indicate how well the match output corresponds to the correct answer.  
-  you will probably want to record several templates for each word, 
   preferably spoken with a variety of neighboring words.


=========== Details ===========

 === more about grab ===

Grab outputs a click when it is ready to listen to you speaking.  When
it detects that you have stopped speaking, it writes the speech sample
to the specified .au file, and echoes back what it recorded.

Every 64 milliseconds it prints out ! or . to indicate whether it is
detecting sound or not.  This is mostly to entertain your eyes while
you are recording samples.
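
As a worked example of the 64 millisecond interval, the string of
marks can be turned back into durations.  The helper below is purely
illustrative (the name "marks_to_ms" is made up, and grab provides
nothing like it); it assumes you pass it the marks as a single string.

```shell
# marks_to_ms MARKS
# Illustrative helper, not part of lotec: given the string of ! and .
# characters printed by grab, report the total time covered and the
# time during which sound was detected, at 64 ms per character.
marks_to_ms() {
    total=$(printf '%s' "$1" | tr -cd '!.' | wc -c)   # all marks
    sound=$(printf '%s' "$1" | tr -cd '!' | wc -c)    # "sound" marks only
    echo "total $((total * 64)) ms, sound $((sound * 64)) ms"
}

marks_to_ms '..!!!!!!..'    # prints: total 640 ms, sound 384 ms
```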

Grab expects a fair-sized chunk of sound.  If you just say a
single word, it may not notice it.  (This is not an unreasonable
limitation, given that, if you want a system to recognize words spoken
in isolation, using lotec is not such a good idea anyway.)


 === more about labeler ===

Labeler is a tool that lets you interactively assign labels for (ie,
specify which words are present in) one or more .au files.

The display shows the audio file contents with time on the x-axis.
The black horizontal bar indicates the "active region".  You should
adjust this region until it corresponds to a single word, then label
the region with the appropriate label (eg, "frankenstein").

Labeler is a keyboard-based system; the mouse is not used.  The list
of available commands appears in the xterm window from which labeler
was invoked.

Labeler's display is not always kept perfectly up to date; this is
done so that its response is not intolerably slow.


 === more about showmatch ===

Sometimes the display comes up blank; if so, type "r" to redraw the
display.

The x-axis is time and the y-axis is match quality: better matches
appear higher up.  The numbers represent the distance between the
template and the input; thus lower numbers indicate better matches.
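
Because lower distances are better, the best hypotheses can also be
pulled out of a .wh file without the display.  The sketch below
ASSUMES that a .wh file holds one entry per line, with whitespace-
separated fields and the distance in the fourth field; this layout is
a guess, so check a real file written by match before relying on it.
The name "best_matches" is hypothetical.

```shell
# best_matches FILE.wh [N]
# ASSUMPTION: one entry per line, whitespace-separated fields, with the
# match distance in field 4.  Not part of lotec.
# Prints the N entries with the smallest distance (default 5).
best_matches() {
    sort -k4,4n "$1" | head -n "${2:-5}"
}
```

For example, "best_matches are-you-frankenstein.wh 3" would list the
three closest matches, best first, if the assumed layout holds.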


=========== File Types ===========

Lotec uses several different representations of the same data.  The
filename extensions for these are:

.au - AUdio data in Sun format (with header)
.fe - FEaturized version of audio data, computed with filterbanks
.wh - Word Hypotheses (recognition result): what words are present
.la - LAbel file (veridical human-assigned labels): what words are present

The various commands will generally infer the appropriate extensions.
For example, you can say "grab lola" and it will expand the filename
to "lola.au".
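
The kind of expansion described above can be pictured with a few
lines of shell.  This is an illustration only, not lotec's own code:
the name "with_ext" is made up, and how the real commands treat a name
that already carries a different extension is not documented here.

```shell
# with_ext NAME EXT
# Illustrative only: appends ".EXT" unless NAME already ends with it.
with_ext() {
    case $1 in
        *."$2") printf '%s\n' "$1" ;;      # extension already present
        *)      printf '%s\n' "$1.$2" ;;   # add the extension
    esac
}

with_ext lola au       # prints lola.au
with_ext lola.au au    # prints lola.au
```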

Each .wh file entry consists of:
 1. template name
 2. the place where this template best matches the input 
       (ie, startpoint in frames from the start of the input)
 3. template length (in frames)
 4. the quality of the match at this place
       (lower numbers mean better matches)
 (note: one frame is 10 milliseconds.)
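
Since one frame is 10 milliseconds, frame counts convert to times by
multiplying by 10.  The sketch below does this for every entry.  It
ASSUMES one entry per line with the four fields above separated by
whitespace, in the order listed; the actual on-disk layout is not
specified here, so inspect a real .wh file first.  The name
"wh_to_ms" is hypothetical.

```shell
# wh_to_ms FILE.wh
# ASSUMPTION: one entry per line, four whitespace-separated fields in
# the order listed above (name, start frame, length in frames, distance).
# Not part of lotec.
# Prints: name, start time (ms), end time (ms), distance.
wh_to_ms() {
    awk 'NF >= 4 { printf "%s %d %d %s\n", $1, $2 * 10, ($2 + $3) * 10, $4 }' "$1"
}
```

Under these assumptions, an entry with start frame 42 and length 55
would be reported as running from 420 ms to 970 ms.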


=========== Command Summary ===========

grab -- records a speech sample and writes it to a .au file

feat -- converts a .au file to a .fe file

match -- matches an input file against some templates (all in .fe
format) and writes the result as a .wh file

labeler -- lets you interactively create a word label (.la) file
corresponding to a .au file

chopper -- given a .au file for an utterance, reads the corresponding
.la file and outputs a separate .au file for each word in the utterance

showmatch -- reads a .wh file and displays the contents; if a .la
file exists, also shows the words that were actually present

judge -- scores the quality of the word spotting by comparing a .wh
file with the corresponding .la file

real -- equivalent to grab | feat | match; operates in realtime.  

(most commands will say something helpful when given the -help option)


=========== For More Information ===========

Look at the file proman (the programmer's manual) and read the code.


---
Nigel Ward
University of Tokyo
May 1994
---

