
Main Readme file for the Entropics/LDC Speech Corpus Sampler

This directory tree contains samples from ten speech corpora that are
available on CD-ROM from the Linguistic Data Consortium (LDC), in
particular:

	atis0
	csr
	maptask
	ntimit
	rdrally
	rm1
	swb
	ti46
	tidigits
	timit

The samples from each corpus named above are stored in a subdirectory
of the same name.  Each subdirectory contains a readme file describing
the data.

Each directory containing speech data includes a script, waves_display,
that displays the data using xwaves (for timit and ntimit, the
transcriptions are also displayed using xlabel).  In some cases the
data are converted prior to display, so write permission is needed.
For details, see the individual scripts.
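
As a convenience, the directories that hold a waves_display script can
be located with a short shell function.  This is only a sketch: the
function name list_display_dirs is hypothetical and not part of the
sampler, and it assumes a POSIX shell with find and sort available.

```shell
# Hypothetical helper (not shipped with the sampler): print each
# directory under the given root that contains a waves_display script.
list_display_dirs() {
    root="${1:-.}"    # default to the current directory
    find "$root" -type f -name waves_display | sort |
    while IFS= read -r script; do
        dirname "$script"
    done
}
```

Calling list_display_dirs from the top of the sampler tree prints one
directory per line.  It is probably safest to change into a listed
directory before running ./waves_display there, since the scripts may
assume they are started from their own directory.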

Note that, depending on the type of workstation in use, you may need
to select the play program used by xwaves. (Select the
"play/record..."  button on the Miscellaneous control panel and then
"Select Play Program", or provide a suitable .wave_pro file.)

For further information on obtaining these and other speech databases
from the LDC, please send e-mail to ldc@unagi.cis.upenn.edu, or call
(215) 898-0464.  Information is also available via anonymous ftp:

	Connect to:  ftp.cis.upenn.edu
	Go to directory:    pub/ldc

A variety of files there provide information about the databases and
about membership in the LDC.

