This directory contains the Census (AN4) database audio files. Some
files from the original database were excluded, namely those
with filenames starting with "cen9".

The AN4 database was recorded at Carnegie Mellon University circa
1991. For more details, please see "Acoustical and environmental
robustness in automatic speech recognition", by Alex Acero, published
by Kluwer Academic Publishers, 1993.

The files are in Microsoft WAV format, sampled at 16 kHz.
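
As a quick sanity check on the audio format, the header of a file can be
inspected with Python's standard "wave" module. This is a minimal sketch,
not part of the database tools; the function names describe_wav and
check_sample_rate are hypothetical, and the only property taken from this
README is the 16 kHz sampling rate.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, duration_s)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate
        return w.getnchannels(), w.getsampwidth(), rate, duration


def check_sample_rate(path, expected_hz=16000):
    """Return True if the file's header reports the expected sampling rate."""
    _, _, rate, _ = describe_wav(path)
    return rate == expected_hz
```

Running check_sample_rate over every file listed in the fileids files
should flag any recording that was resampled or converted by mistake.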

The directories contain:

-wav/train: training data set recorded on a close-talking microphone.

-wav/test: test data set recorded on a close-talking microphone.

-etc: directory containing the following:
      i.   an4_train.fileids: A list of training files. All filenames are
                              given assuming the "wav" directory as the root.
      ii.  an4_train.transcription: The transcriptions of the training files.
      iii. an4_test.fileids: A list of the testing files to recognize.
                             Filenames are given assuming the "wav" directory
                             as root.
      iv.  an4_test.transcription: The transcriptions of the test files. Use
                                   these to score your recognition output.
       v.  an4.vocab:  The list of all words in the training and test data.
      vi.  an4.phone: The list of phonemes for which HMMs must be learned.
           This includes "SIL", which represents silence.
     vii.  an4.dic:  The dictionary, which specifies the pronunciations of
           all the words in the vocabulary in terms of the phonemes.
    viii.  an4.trigramlm: The trigram language model which must be employed
           for recognition.
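
The files in "etc" can be cross-checked against each other before
training. The sketch below resolves fileids entries against the "wav"
directory and verifies that every phoneme used in the dictionary appears
in the phone list. It rests on assumptions not stated in this README:
that fileids entries may omit the ".wav" extension, that each dictionary
line has the form "WORD PH1 PH2 ...", and that alternate pronunciations
are marked "WORD(2)". All function names are hypothetical.

```python
from pathlib import Path


def read_fileids(fileids_path, wav_root):
    """Map each fileids entry to a path under the wav directory."""
    paths = []
    for line in Path(fileids_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        # Assumption: entries may omit the ".wav" extension; add it if absent.
        if not entry.lower().endswith(".wav"):
            entry += ".wav"
        paths.append(Path(wav_root) / entry)
    return paths


def read_dictionary(dic_path):
    """Parse 'WORD PH1 PH2 ...' lines into {word: [pronunciation, ...]}."""
    lexicon = {}
    for line in Path(dic_path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        word, phones = fields[0], fields[1:]
        # Assumption: alternates are written "WORD(2)", "WORD(3)", ...
        if word.endswith(")") and "(" in word:
            word = word[: word.rindex("(")]
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def unknown_phones(lexicon, phone_path):
    """Return phones used in the dictionary but absent from the phone list."""
    known = set(Path(phone_path).read_text().split())
    used = {p for prons in lexicon.values() for pron in prons for p in pron}
    return sorted(used - known)
```

For example, unknown_phones(read_dictionary("etc/an4.dic"), "etc/an4.phone")
should return an empty list if the dictionary and phone list agree.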
