
	    (README file last updated on:  April 9, 1994)

	WELCOME TO THE LINGUISTIC DATA CONSORTIUM'S FTP SOURCE

In this directory, you will find information about the organization,
goals, and data offerings of the LDC, as well as an explanation of
how to become a member of the LDC, and what membership means. The
LDC also makes databases available to non-members on a per-copy basis,
and has some data that is freely available.

The LDC creates, collects and distributes speech and text databases,
lexicons, and other resources, in support of research and development
in computer-based linguistic technologies.

The LDC mainly distributes its resources on CD-ROM, since this medium
offers high density, durability, random access, and reliable
cross-platform portability, all at fairly low cost for replication and
use.

The documents and data currently available in this directory are
listed below.  They will be updated, and more documents will be added,
as time goes by.

* ldc_intro -- a paper that explains the background and goals of
        the LDC.

* ldc_catalog.txt -- a catalog of resources available from the LDC,
        with a short description of each one.

* price_list -- a listing of all currently available and some planned
	CD-ROMs, with a description of the terms for providing them to
	members, and the individual purchase prices for non-members.

* newsletters/vol(i).issue(j).txt -- the `newsletters' directory holds
	files containing the contents of each LDC newsletter published
	so far, providing articles that describe current and planned
	corpora in greater detail.

* mount.c.src -- a copy of source code for a UNIX program to mount an
	ISO 9660 CD-ROM (a.k.a. "hsfs", the format used for all LDC
	data on CD-ROM); this is a GNU product available from numerous
	other sources.

* pb.data.tar.Z -- the complete formant data from Peterson and Barney
	(1952), "Control Methods Used in a Study of the Vowels," JASA
	24.175-184, as restored and verified by Ray Watrous and
	described in his letter to JASA in the May 1991 issue
	(89.2459-2460).  Note: this is a compressed UNIX tar file;
	extract the data using "zcat pb.data.tar.Z | tar xf -" (or, if
	you can't do this, send e-mail to graff@chestnut.ling.upenn.edu).

* speech_sampler.tar.Z -- a sampling of waveform and associated data
	from ten of the LDC's first-year speech corpora (Switchboard,
	Road Rally, TI-46, TIDIGITS, Resource Management, ATIS-0, Map
	Task, CSR-0, TIMIT and NTIMIT).  It includes detailed
	descriptions of data formats, and scripts for use with the
	"xwaves" software available from Entropics, Inc.

* celex* -- these items provide information about the CELEX lexicons.

Some of the items listed above are available in one or both of two
forms: a PostScript file (*.ps, or in compressed form, *.ps.Z) and an
uncompressed ASCII file (*.txt).  If you would like hard copy of the
PostScript versions mailed to you, please contact:

	Judith Storniolo
	441 Williams Hall
	University of Pennsylvania
	Philadelphia, PA 19104-6305
	Phone:   (215) 898-0464
	Fax:     (215) 573-2175
	e-mail:  storniol@walnut.ling.upenn.edu
