American language standardized dictionary

As an aid to those involved in natural language parsing, dictionary compression, or textual encryption, I have been collecting and compiling a lengthy list of words, which I expect will eventually become a comprehensive standardized dictionary. The dictionary should contain most common American words, abbreviations, and hyphenated words, and even misspellings.

An anonymous FTP server has been set up on wocket.vantage.gte.com. The following files are available in its pub/standard_dictionary directory (word counts are listed for the length files only):

                  words     bytes
    -r--r--r--            8552448  Jan 28 12:00  dic-0194.tar
    -r--r--r--            4058075  Jan 28 12:02  dic-0194.tar.Z
    -r--r--r--            8880128  Feb 24 10:39  dic-0294.tar
    -r--r--r--            4220442  Feb 24 10:41  dic-0294.tar.Z
    -r--r--r--            1269760  Aug 16  1993  dic-0893.tar
    -r--r--r--             523393  Aug 16  1993  dic-0893.tar.Z
    -r--r--r--             421239  Aug 16  1993  dic-0893.zip
    -r--r--r--            3186688  Sep 17 08:26  dic-0993.tar
    -r--r--r--            1503561  Sep 17 09:27  dic-0993.tar.Z
    -r--r--r--            7479296  Oct 26 17:29  dic-1093.tar
    -r--r--r--            3516519  Oct 26 17:32  dic-1093.tar.Z
    -r--r--r--            8273920  Dec 17 11:58  dic-1293.tar
    -r--r--r--            3918385  Dec 17 11:59  dic-1293.tar.Z
    -r--r--r--     1022      4088  Feb 24 10:37  length02.txt
    -r--r--r--    21225    106125  Feb 24 10:37  length03.txt
    -r--r--r--    52657    315940  Feb 24 10:37  length04.txt
    -r--r--r--    83336    583349  Feb 24 10:37  length05.txt
    -r--r--r--   113449    907655  Feb 24 10:37  length06.txt
    -r--r--r--   123546   1111907  Feb 24 10:37  length07.txt
    -r--r--r--   134549   1345480  Feb 24 10:37  length08.txt
    -r--r--r--    94474   1039205  Feb 24 10:37  length09.txt
    -r--r--r--    73793    885502  Feb 24 10:37  length10.txt
    -r--r--r--    55147    716900  Feb 24 10:37  length11.txt
    -r--r--r--    39799    557185  Feb 24 10:37  length12.txt
    -r--r--r--    26870    403037  Feb 24 10:37  length13.txt
    -r--r--r--    17801    284816  Feb 24 10:37  length14.txt
    -r--r--r--    11525    195925  Feb 24 10:37  length15.txt
    -r--r--r--     7228    130104  Feb 24 10:37  length16.txt
    -r--r--r--     4559     86621  Feb 24 10:37  length17.txt
    -r--r--r--     2894     57880  Feb 24 10:37  length18.txt
    -r--r--r--     1871     39291  Feb 24 10:37  length19.txt
    -r--r--r--     1196     26312  Feb 24 10:37  length20.txt
    -r--r--r--      784     18032  Feb 24 10:37  length21.txt
    -r--r--r--      562     13488  Feb 24 10:37  length22.txt
    -r--r--r--      363      9075  Feb 24 10:37  length23.txt
    -r--r--r--      240      6240  Feb 24 10:37  length24.txt
    -r--r--r--      160      4320  Feb 24 10:37  length25.txt
    -r--r--r--      106      2968  Feb 24 10:37  length26.txt
    -r--r--r--       70      2030  Feb 24 10:37  length27.txt
    -r--r--r--        1        30  Feb 24 10:37  length28.txt
    -r--r--r--        0         0  Feb 24 10:37  length29.txt
    -r--r--r--        0         0  Feb 24 10:37  length30.txt
    -r--r--r--        0         0  Feb 24 10:37  length31.txt
    -r--r--r--        1        34  Feb 24 10:37  length32.txt
                 869228   8853539  total
    -r--r--r--              11521  Aug 13  1993  tarread.com

The most recent compilation, dic-0294.tar, consists of the 31 text files length02.txt through length32.txt (one file per word length) and may be restored on an MS-DOS computer using the tarread.com utility program.

Words for inclusion in future dictionaries may be e-mailed to me directly or placed in the /pub/incoming directory. Before submitting, please compare your word lists with the standard Unix 'words' file and send only the differences.

Many thanks to those who submitted the 32,000 words received during the last month.

Take care.

-Sig

Sigurd P. Crossland
Advanced Technology Lab          Telephone: (703) 818-8504
GTE                              Facsimile: (703) 802-3110
15000 Conference Center Drive    Internet:  sig@seuss.vantage.gte.com
Chantilly, VA 22021              Home:      (703) 818-8942
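Sending "only the differences" amounts to a set difference: every word in your list that does not appear in the standard Unix 'words' file (commonly /usr/dict/words on older systems, /usr/share/dict/words on newer ones). The following is a minimal Python sketch of that comparison, not a tool distributed with the dictionary. The helper names (load_words, new_words, write_differences), the one-word-per-line input format, and the exact, case-sensitive comparison are all assumptions.

```python
# Hypothetical helper, not part of the standard_dictionary distribution:
# finds the words in your list that are missing from the Unix 'words' file.
from pathlib import Path


def load_words(path):
    """Read a one-word-per-line list into a set.

    Blank lines are skipped, and surrounding whitespace (including the CR of
    MS-DOS CR/LF line endings) is stripped from each entry.
    """
    # latin-1 maps every byte to a character, so stray 8-bit bytes never raise.
    text = Path(path).read_text(encoding="latin-1")
    return {line.strip() for line in text.splitlines() if line.strip()}


def new_words(candidate, standard):
    """Return the words in `candidate` that are absent from `standard`, sorted.

    Assumption: the comparison is exact and case-sensitive, so "Apple" and
    "apple" count as different words. Apply str.lower() to both sides first
    if you want case-insensitive matching.
    """
    return sorted(set(candidate) - set(standard))


def write_differences(my_list, standard_list, out_path):
    """Write only the words missing from the standard list, one per line.

    Returns the number of words written.
    """
    diffs = new_words(load_words(my_list), load_words(standard_list))
    Path(out_path).write_text("".join(w + "\n" for w in diffs), encoding="latin-1")
    return len(diffs)
```

For example, write_differences("mywords.txt", "/usr/dict/words", "submit.txt") writes the candidate submissions to submit.txt and returns their count. The file names in that call are placeholders.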