Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!gatech!newsfeed.pitt.edu!uunet!zib-berlin.de!gs.dfn.de!fauern!rrze.uni-erlangen.de!hub-n.franken.de!ark.franken.de!ralf
From: ralf@ark.franken.de (Ralf W. Stephan)
Subject: ANNOUNCE: ears-0.11 released
Message-ID: <1994Dec13.093529.10309@ark.franken.de>
Organization: his desk writing an article
Date: Tue, 13 Dec 1994 09:35:29 GMT
X-Newsreader: TIN [version 1.2 PL2]
Lines: 185

  Hi all,

the first public version of the EARS package has just been uploaded
to svr-ftp.eng.cam.ac.uk:/Inbox and will soon be moved to
/pub/comp.speech/sources.

As this is the very first release, the README is appended below.
New versions can be found in the sources directory under the
name 'ears-<version>.tar.gz'.  Thanks to Tony Robinson for giving
the package a home.

Have fun,
ralf

    ---------------------------------------------------------
                            EARS
    (something like 'Easy Automatic Recognition of Speech'...)
    ---------------------------------------------------------

The EARS package is intended as a limited, ready-to-use single-word
recognizer for Linux systems.  However, its design already aims at
being a host for all kinds of methods used in speech recognition (SR).

The ALPHA versions of the package consist of the following programs:

  train_ears - to speak a list of words and train the recognizer

  listen     - listens to your mic and, when a word is recognized,
               shows the corresponding keyboard action

  (in beta versions, there will also be the 'ears' program itself
   which acts like 'listen' but additionally drives a shell.)


The primary server for EARS source code is svr-ftp.eng.cam.ac.uk
under /pub/comp.speech/sources.  Major versions can also be found
on sunsite.unc.edu (check mirrors first!) in ...apps/sound/speech/...

What you need
-------------
Preferably a Linux system with a soundcard and mic.  It should be
possible to use EARS on other machines that run Hannu Savolainen's
Voxware sound driver and have GCC.  MS-DOS users are recommended to
install Linux (believe me, it pays off in the long run).

What you get
------------
An SR program you can play with and add your own source code to.
Isn't that nice?  ;)

What is EARS?
-------------
First, let us look at what is available for free right now.  There are
some packages that include functions for processing speech, but you
have to understand those functions, record training samples, and write
a recognizer program yourself before you see them work.  EARS does
this for you.

As there are many possible methods for processing speech, EARS lets
you simply say: take this feature extractor, that recognition method,
and this list of words.  Then you speak the words you want to be
recognized later.  Your utterances can be saved to RIFF WAV files so
you may inspect, change or delete them before they are further
processed into the pattern files on which the recognizer is finally
trained.

New methods for single word recognition are integrated easily, since
EARS uses C++ abstract base classes to process speech and thus is 
designed to be a general-purpose wrapper for SR methods.
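That wrapper design can be pictured roughly as follows.  This is only
an illustrative sketch; all class, type and method names here are my
own guesses, not the actual EARS interfaces:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One feature vector per frame of speech.
typedef std::vector<std::vector<double> > Pattern;

// Abstract base: raw audio samples in, sequence of feature vectors out.
class FeatureExtractor {
public:
    virtual ~FeatureExtractor() {}
    virtual Pattern extract(const std::vector<short>& samples) = 0;
};

// Abstract base: trained on word patterns, later classifies new ones.
class Recognizer {
public:
    virtual ~Recognizer() {}
    virtual void train(const std::string& word, const Pattern& pattern) = 0;
    virtual std::string classify(const Pattern& pattern) = 0;
};

// A trivial stand-in extractor, just to show how a method plugs in:
// it emits a single one-element "feature vector" holding the mean
// sample value.
class MeanExtractor : public FeatureExtractor {
public:
    Pattern extract(const std::vector<short>& samples) {
        double sum = 0.0;
        for (std::size_t i = 0; i < samples.size(); ++i)
            sum += samples[i];
        double mean = samples.empty() ? 0.0 : sum / samples.size();
        return Pattern(1, std::vector<double>(1, mean));
    }
};
```

A new feature extractor or recognizer then only has to derive from
one of these base classes; the rest of the program never needs to
know which concrete method is behind the pointer.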

What is implemented now
-----------------------
As of this alpha release, the only feature extractor is Rasta-PLP.
However, getting LPC, PLP or Mel-Cepstrum to work requires only half
a dozen additional lines of code, thanks to the OGI tool library 
that is included with EARS.  Source code for the library can be
found for example on ftp.cs.tu-berlin.de:/pub/sci/speech/ogi.

The recognizer that works for now, but will soon be replaced by a
better technique, is Dynamic Time Warping (DTW), thanks to Dr.
Robinson's Cookbook.
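For the curious, the DTW match can be sketched like this.  This is a
generic textbook version, not the actual EARS code; it uses scalar
features for brevity, whereas a real recognizer would warp sequences
of Rasta-PLP feature vectors:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Textbook dynamic-time-warping distance between two feature
// sequences.  d[i][j] holds the cheapest cost of aligning the first
// i elements of a with the first j elements of b.
double dtw_distance(const std::vector<double>& a,
                    const std::vector<double>& b)
{
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double> > d(a.size() + 1,
        std::vector<double>(b.size() + 1, inf));
    d[0][0] = 0.0;
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j) {
            double cost = std::fabs(a[i-1] - b[j-1]);
            // best predecessor: stretch a, stretch b, or step both
            d[i][j] = cost + std::min(std::min(d[i-1][j], d[i][j-1]),
                                      d[i-1][j-1]);
        }
    return d[a.size()][b.size()];
}
```

A DTW recognizer then simply keeps one reference pattern per trained
word and answers with the word whose pattern has the smallest warped
distance to the utterance.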

Install
-------
EARS will untar into a directory 'ears-<version>', for example
  $ tar xvfz ears-0.11.tar.gz
untars into ./ears-0.11/*

Then cd to that dir and do a 'make'.  Then, depending on your habits, 
get the usual quantity of stimulant.  It's not really necessary to 
'make install' if you just want to play with the program.

Example session
---------------
Start by running train_ears!

  $ ./train_ears
  --> Ears did not detect its directories where data is stored.
      Do you want to create $(HOME)/.ears ? [Y/n] 
  --> Do you want to use the default word list (digits 0-9)? [Y/n]  
  --> There are new words.  Do you want to record them? [Y/n] 
  Measuring noise level.  Please be silent.

  Please speak: Five    OK.
  Please speak: Four    OK.
  [...]

I think you get the idea...
Then start the program 'listen'.  After again measuring the noise level,
you can check if your spoken digits are recognized.  Please wait until
a word is classified, then speak the next one.  You can stop the 'listen' 
program by typing Control-C.

Now try something different:  copy 'alphabet.words' from the distribution
into your $(HOME)/.ears directory, and start 'train_ears' with

$ ./train_ears -b alphabet
  --> There are new words.  Do you want to record them? [Y/n] 
  Measuring noise level.  Please be silent.

  Please speak: charlie    OK.
  Please speak: x-ray    OK.
  [...]

Then do:
$ ./listen -b alphabet

Now when you say 'hotel echo lima lima oscar whiskey oscar romeo lima delta'
you should get 'helloworld'  ;)  New words are easily added by either
editing a given .words file or creating a new one.  Don't forget to
run train_ears again after you have extended the word list.

Understanding EARS
------------------
I have tried to outline the source in doc/implementation.txt.
After grokking that, it should be easier for you to read the code, 
which, unfortunately, isn't as well commented as I would like.

Improving EARS
--------------
Apart from improvements to the data handling, EARS needs the
following things to make it useful for method comparison:

- more feature extractors
- more, better and faster recognizers.  I guess there are dozens of
  methods that could be implemented, for example HMMs, TDNNs, all
  kinds of non-recurrent and recurrent neural nets, wavelets, etc.
- a program for cross-validation that gives error rates for specific
  feature extractor/recognizer combinations.

And, of course, we need the 'ears' program to drive a shell.  The
latter will need the ability to take back erroneous inputs, plus
everything one needs to work with it, such as showing alternative
interpretations, learning while in use, and so on.

Acknowledgements
----------------
I'm just another one standing on the shoulders of giants.
Without the following people, EARS wouldn't be possible:

- Linus Torvalds
- all the people who made Linux what it is
- the OGI people for their excellent toolkit
- Tilo Schuerer for librecog and his patience
- Tony Robinson for his cookbook and the archive in Cambridge

Thanks to all people who helped with the project:

- Tilman Enss adapted the Makefile to the GNU standard, fixed bugs,
  and made the German message catalog.

Contributing
------------
Changes to the source code should be mailed as diffs to ralf@ark.franken.de
I prefer unified diffs (use diff -u) as they are easy to read.
Please include the version number the diffs are made against!

Many thanks in advance for your contribution,
ralf@ark.franken.de (Ralf W. Stephan)




--
You just began to read the sig and you have finished it now.
