Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!agate!spool.mu.edu!olivea!sgigate.sgi.com!odin!twilight!zuni.esd.sgi.com!prophet.esd.sgi.com!gints
From: gints@prophet.esd.sgi.com (Gints Klimanis)
Subject: Re: Voice recognition : information?
Message-ID: <lh2oqes@zuni.esd.sgi.com>
Sender: news@zuni.esd.sgi.com (Net News)
Organization: Silicon Graphics, Inc.
References: <1993Nov2.173240.17766@topaz.ucq.edu.au> <1993Nov6.145616.17792@topaz.ucq.edu.au>
Date: Sat, 6 Nov 93 08:52:56 GMT
Lines: 191

From a fabulous source on svr-ftp.eng.cam.ac.uk, some text laboriously
compiled by Tony Robinson:

******************************************************************************
                         Studio Quality Speaker-Independent
                          Connected-Digit Corpus
                                (TIDIGITS)

                                CD-ROM Set

                   NIST Speech Discs 4-1, 4-2, and 4-3
                              February, 1991


This three-disc set of CD-ROMs contains a corpus of speech which was designed
and collected at Texas Instruments (TI) for the purpose of "designing and
evaluating algorithms for speaker-independent recognition of connected digit
sequences."[1]  The corpus contains read utterances from 326 speakers (111 men,
114 women, 50 boys, and 51 girls), each speaking approximately* 77 digit
sequences, and has been divided into test and training subsets.

The digit sequences were made up of the digits: "zero", "oh", "one", "two",
"three", "four", "five", "six", "seven", "eight", and "nine".  The digit
sequences spoken by each speaker can be broken down as follows:

     22 isolated digits (2 productions of each of 11 digits)
     11 2-digit sequences
     11 3-digit sequences
     11 4-digit sequences
     11 5-digit sequences
     11 7-digit sequences
     --
     77
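
As a quick arithmetic check of the table above, here is a small Python sketch.
It assumes every speaker read exactly 77 sequences before the 6 utterances
mentioned in the note further down were removed, so the corpus-wide figure is
an estimate under that assumption. The names SEQUENCES_PER_SPEAKER, SPEAKERS,
per_speaker_total and corpus_total are illustrative only.

```python
from typing import Dict

# Sequence length (number of digits) -> sequences read per speaker,
# copied from the breakdown table above.
SEQUENCES_PER_SPEAKER: Dict[int, int] = {
    1: 22,  # 11 isolated digits x 2 productions each
    2: 11,
    3: 11,
    4: 11,
    5: 11,
    7: 11,
}

# Speaker counts by category, from the corpus description.
SPEAKERS: Dict[str, int] = {"man": 111, "woman": 114, "boy": 50, "girl": 51}

REMOVED_UTTERANCES = 6  # dropped for egregious speaking errors


def per_speaker_total() -> int:
    """Number of digit sequences each speaker was asked to read."""
    return sum(SEQUENCES_PER_SPEAKER.values())


def corpus_total() -> int:
    """Estimated utterance count (assumes exactly 77 per speaker before removal)."""
    return sum(SPEAKERS.values()) * per_speaker_total() - REMOVED_UTTERANCES
```

With these figures, per_speaker_total() gives 77 and corpus_total() gives
326 x 77 - 6 = 25096.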

Detailed information on the design and collection of the corpus can be found in
the file "tidigits.doc" in the "doc" subdirectory; this file contains the
original TI documentation.

The corpus has been reformatted for CD-ROM by the National Institute of
Standards and Technology (NIST) and is distributed with TI's permission.  The
speech waveform files have been converted to the NIST SPHERE format and have a
".wav" filename extension.  Because of its large size, the corpus has been
distributed over three CD-ROMs as follows:

  CD4-1 : Men and Women training utterances
  CD4-2 : Men and Women test utterances
  CD4-3 : Boys and Girls test and training utterances
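
If you are scripting access to the discs, the layout above reduces to a small
lookup. This is a minimal sketch; the function name disc_for is hypothetical,
and the usage and speaker-type spellings are taken from the USAGE and
SPEAKER-TYPE names in the directory structure described below.

```python
ADULT = {"man", "woman"}
CHILD = {"boy", "girl"}


def disc_for(usage: str, speaker_type: str) -> str:
    """Return the disc label holding the given usage/speaker-type directory."""
    if usage not in {"train", "test"}:
        raise ValueError(f"unknown usage: {usage!r}")
    if speaker_type in CHILD:
        return "CD4-3"  # boys and girls: both test and training material
    if speaker_type in ADULT:
        # adults are split by usage across the first two discs
        return "CD4-1" if usage == "train" else "CD4-2"
    raise ValueError(f"unknown speaker type: {speaker_type!r}")
```

For example, disc_for("train", "man") returns "CD4-1".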

Each disc contains identical copies of all documentation for the user's
convenience. 
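
Note that the ".wav" files are NIST SPHERE files, not Microsoft RIFF WAVE
files, so ordinary WAVE readers will not open them. The sketch below reads
only the ASCII header. It assumes the common SPHERE layout: a "NIST_1A" line, a
line giving the total header size in bytes, "name -type value" field lines
(-i integer, -r real, -sN string), and an "end_head" line. The function name
read_sphere_header is hypothetical. Decoding the sample data depends on the
header's sample_coding field and is not shown.

```python
from typing import BinaryIO, Dict, Union

Value = Union[int, float, str]


def read_sphere_header(f: BinaryIO) -> Dict[str, Value]:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    On return, f is positioned at the first byte of sample data.
    """
    if f.readline().strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(f.readline().decode("ascii").strip())  # bytes, incl. padding
    text = f.read(header_size - f.tell()).decode("ascii", errors="replace")
    fields: Dict[str, Value] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, -type, value (value may contain spaces)
        if len(parts) < 3:
            continue  # skip blank padding or malformed lines
        name, typ, value = parts
        if typ == "-i":
            fields[name] = int(value)
        elif typ == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    f.seek(header_size)
    return fields
```

Open the file in binary mode, e.g. with open(path, "rb") as f, and pass f in.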


CD-ROM File and Directory Structure:
-----------------------------------
The speech corpus is organized on the discs as follows:

FILESPEC ::= /tidigits/<USAGE>/<SPEAKER-TYPE>/<SPEAKER-ID>/
                                               <DIGIT-STRING><PRODUCTION>.wav

where,

     USAGE ::= test | train
     SPEAKER-TYPE ::= man | woman | boy | girl
     SPEAKER-ID ::= aa | ab | ac | ... | tc
     DIGIT-STRING ::= <DIGIT> | <DIGIT><DIGIT> | <DIGIT><DIGIT><DIGIT> |
                      <DIGIT><DIGIT><DIGIT><DIGIT> |
                      <DIGIT><DIGIT><DIGIT><DIGIT><DIGIT> |
                      <DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><DIGIT>
     where,

          DIGIT ::= z | o | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 

     PRODUCTION ::= a | b
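
The FILESPEC grammar above translates directly into a regular expression. This
Python sketch is an illustration, not part of the NIST distribution; the names
FILESPEC_RE, TidigitsFile and parse_filespec are hypothetical. It restricts
DIGIT-STRING to 1, 2, 3, 4, 5 or 7 digits as the grammar does, but it accepts
any two lowercase letters as a SPEAKER-ID, which is looser than the aa ... tc
range.

```python
import re
from typing import NamedTuple

_DIGIT = "[zo1-9]"  # z = zero, o = oh, 1-9 = one..nine

FILESPEC_RE = re.compile(
    r"^/tidigits/(?P<usage>test|train)"
    r"/(?P<speaker_type>man|woman|boy|girl)"
    r"/(?P<speaker_id>[a-z]{2})"
    # DIGIT-STRING: 1 to 5 digits, or exactly 7 (there are no 6-digit strings)
    rf"/(?P<digits>{_DIGIT}{{1,5}}|{_DIGIT}{{7}})"
    r"(?P<production>[ab])\.wav$"
)


class TidigitsFile(NamedTuple):
    usage: str
    speaker_type: str
    speaker_id: str
    digits: str
    production: str


def parse_filespec(path: str) -> TidigitsFile:
    """Split a TIDIGITS path into its grammar fields, or raise ValueError."""
    m = FILESPEC_RE.match(path)
    if m is None:
        raise ValueError(f"not a TIDIGITS filespec: {path!r}")
    return TidigitsFile(**m.groupdict())
```

Applied to the first example path given below, parse_filespec returns usage
"train", speaker type "man", speaker "fd", digits "6z97z" and production "a".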

The digit codes in a filename indicate the digit sequence spoken and can be
decoded as follows:
     z -> zero          3 -> three          7 -> seven
     o -> oh            4 -> four           8 -> eight
     1 -> one           5 -> five           9 -> nine
     2 -> two           6 -> six
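
The decoding table above as a Python dictionary, with a helper that turns a
DIGIT-STRING into its spoken words (DIGIT_WORDS and decode_digits are
illustrative names):

```python
DIGIT_WORDS = {
    "z": "zero", "o": "oh", "1": "one", "2": "two", "3": "three",
    "4": "four", "5": "five", "6": "six", "7": "seven", "8": "eight",
    "9": "nine",
}


def decode_digits(codes: str) -> str:
    """Map each digit code to its word, e.g. "6z97z" -> "six zero nine seven zero"."""
    try:
        return " ".join(DIGIT_WORDS[c] for c in codes)
    except KeyError as err:
        raise ValueError(f"unknown digit code {err.args[0]!r}") from None
```

Keeping "z" and "o" distinct matters: both denote the digit 0, but they are
different spoken words, so a recognizer's transcriptions must preserve them.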

Note: two productions (a,b) were collected for all 11 single-digit strings and
two productions were randomly collected for a few multi-digit strings.

Example:
     /tidigits/train/man/fd/6z97za.wav

     "tidigits" corpus, training material, adult male, speaker code "fd",
     digit sequence "six zero nine seven zero", first production, NIST
     SPHERE file.

Example:
     /tidigits/test/woman/pf/1b.wav

     "tidigits" corpus, test material, adult female, speaker code "pf",
     digit sequence "one", second production, NIST SPHERE file.


Online Documentation 
-------------------- 
The following documentation files have been included on each CD-ROM and are
located in the directory "/tidigits/doc":

     dialects.txt - dialect codes and their descriptions
     spkrinfo.txt - speaker codes and their attributes
     tidigits.doc - original TI documentation for the corpus


(* Note: 6 utterances have been removed from the corpus because they contained
   egregious speaking errors.)


References
----------
1.  Leonard, R. G., "A Database for Speaker-Independent Digit Recognition", 
    Proc. ICASSP 84, Vol. 3, p. 42.11, 1984.
    [Text is identical to that in the file "tidigits.doc" in the "doc"
     subdirectory.]

 *********************************************************************

I'm posting this reply as I think it is of general interest and I'd
appreciate feedback for generating a FAQ answer.

Tony [Robinson]

*******************************************************************************

Good background on speech analysis, with some coverage of speech recognition:

 Digital processing of speech signals; Lawrence R. Rabiner, Ronald W. Schafer.
 Englewood Cliffs; London: Prentice-Hall, 1978

 Voice and Speech Processing; T. W. Parsons.
 New York: McGraw-Hill

General introductory books on speech recognition:

 Speech recognition by machine; W.A. Ainsworth
 London: Peregrinus on behalf of the Institution of Electrical Engineers, c1988

 Speech synthesis and recognition; J.N. Holmes
 Wokingham: Van Nostrand Reinhold, c1988

 Electronic speech recognition: techniques, technology and applications
 edited by Geoff Bristow, London: Collins, 1986

A collection of papers I like, which I think is both a good introduction and
a fair statement of the state of the art, is:

 Readings in speech recognition; edited by Alex Waibel & Kai-Fu Lee.
 San Mateo: Morgan Kaufmann, c1990

More specific books:

 Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki, M.A. Jack.
 Edinburgh: Edinburgh University Press, c1990

 Automatic speech recognition: the development of the SPHINX system;
 by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989

Major speech journals are:

 IEEE Transactions on Speech and Audio Processing (from Jan 93)
 Computer Speech and Language, Academic Press

Major speech conferences are:

 ICASSP      International Conference on Acoustics, Speech, and Signal Processing
 ICSLP       International Conference on Spoken Language Processing
 EUROSPEECH  European Conference on Speech Communication and Technology

From mfulsmb@uts.mcc.ac.uk Sun Nov 22 14:35:02 1992
From: mfulsmb@uts.mcc.ac.uk (Martin Barry)
Newsgroups: comp.speech
Subject: Re: Help!! Looking for the most recent published books or papers
Date: 20 Nov 92 10:29:35 GMT
Organization: Manchester University, UK.

Another good introductory book, perhaps less technical than the ones Tony
Robinson mentions (hi, Tony), is:

Douglas O'Shaughnessy -- Speech Communication: Human and Machine
(Addison-Wesley series in Electrical Engineering: Digital Signal Processing,
1987).

Martin Barry,
Department of Linguistics,
University of Manchester. (e-mail: M.C.Barry@Manchester.Ac.UK)

***************************************************************************
