THE ALVEY NATURAL LANGUAGE TOOLS (RELEASE 4)

BASIC DESCRIPTION AND DISTRIBUTION ARRANGEMENTS

A fourth (and final) release of the Alvey Natural Language Tools (ANLT) is now available. The UK Alvey Programme originally funded three projects at the Universities of Cambridge, Edinburgh and Lancaster to provide tools for use in natural language processing research. The DTI and SERC have funded their continued support and enhancement. The tools, a MORPHOLOGICAL ANALYSER, PARSERS and a GRAMMAR and LEXICON, are usable individually as well as together (integrated by a GRAMMAR DEVELOPMENT ENVIRONMENT), forming a complete system for the morphological, syntactic and semantic analysis of a considerable subset of English.

DISTRIBUTION AND LICENSING

The ANLT system is available by anonymous FTP from Cambridge University, Computer Laboratory. The files containing grammars, lexicons and source code are encrypted; the reports describing the system, the specimen licence agreement and other information are not.

If, after examining the documentation, you wish to purchase a licence for use of the system for research purposes, you should complete and sign the specimen agreement and return it, together with a cheque for the amount specified in the agreement (currently 500 ECU -- 100 ECU upgrade -- or local currency equivalent), to:

    Lynxvale WCIU Programs
    20 Trumpington St.
    Cambridge, CB2 1QA, UK
    Fax: +223 332797

On receipt, Lynxvale will send you (by letter) the key which can be used in conjunction with the software provided to decrypt the remaining files. If you do not have access to anonymous FTP, you can write to Lynxvale for further details and obtain the system on magnetic tape or cartridge.

We are currently negotiating with Longman Group UK Ltd, who have an interest in the large lexicon, to provide a commercial licence for use of the ANLT system. A specimen commercial licence agreement will be deposited in the files shortly.
DESCRIPTION

The MORPHOLOGICAL ANALYSER provides a set of mechanisms for the analysis of complex word forms. The analyser requires data files specifying a lexicon of base morphemes, rules governing spelling changes when concatenating morphemes, and rules describing valid combinations of morphemes in complex words. The tools include a description of English morphology in this form. The analyser should be capable, though, when provided with the necessary linguistic analyses, of being used for most European languages and many others.

The morphological analyser is now available independently of the rest of the tools package by anonymous FTP from

    scott.cogsci.ed.ac.uk [129.215.144.3]:/pub/phonology/tools/MAP/MAP3.1.tar.Z

Further enquiries may be sent to Alan W Black (awb@ed.ac.uk).

There are two alternative PARSERS. The main one is an optimized chart parser, incorporating a 'packing' mechanism (making it much more efficient when parsing sentences containing multiple local ambiguities). The other parser is a non-deterministic LALR(1) parser which seems, in most cases, to be even more efficient than the chart parser.

The GRAMMAR is a wide-coverage syntactic and semantic grammar of English, written in a metagrammatical formalism derived from Generalized Phrase Structure Grammar. The grammar pairs one or more formulas of the lambda calculus with each syntactic rule and these produce unscoped (mostly) first-order `event-based' compositional semantic representations.
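As a rough illustration of what pairing a lambda-calculus formula with each syntactic rule can look like, here is a small Python sketch. It is not the ANLT formalism: the two rules, the four-word lexicon, the predicate names and the output notation are all hypothetical, and a greedy shift-reduce loop stands in for the chart and LALR(1) parsers described above.

```python
# Toy lexicon (hypothetical): word -> (category, meaning).
# Verb meanings are curried functions ending in an event variable.
LEXICON = {
    "Kim": ("NP", "kim"),
    "Lee": ("NP", "lee"),
    "sleeps": ("VP", lambda subj: lambda e: f"sleep({e},{subj})"),
    "helps": ("V", lambda obj: lambda subj: lambda e: f"help({e},{subj},{obj})"),
}

# Each syntactic rule (mother, daughters) is paired with a lambda that
# builds the mother's meaning from the daughters' meanings.
RULES = [
    ("VP", ("V", "NP"), lambda v, np: v(np)),
    ("S", ("NP", "VP"), lambda np, vp: f"exists e. {vp(np)('e')}"),
]


def interpret(sentence):
    """Return an event-style formula for a sentence of the toy grammar."""
    stack = []  # list of (category, meaning) pairs
    for word in sentence.split():
        stack.append(LEXICON[word])
        reduced = True
        while reduced:  # keep reducing while some rule matches the stack top
            reduced = False
            for mother, daughters, semantics in RULES:
                n = len(daughters)
                top = tuple(cat for cat, _ in stack[-n:])
                if len(stack) >= n and top == daughters:
                    args = [meaning for _, meaning in stack[-n:]]
                    del stack[-n:]
                    stack.append((mother, semantics(*args)))
                    reduced = True
                    break
    if len(stack) == 1 and stack[0][0] == "S":
        return stack[0][1]
    raise ValueError(f"no sentence analysis for: {sentence!r}")


print(interpret("Kim sleeps"))     # exists e. sleep(e,kim)
print(interpret("Kim helps Lee"))  # exists e. help(e,kim,lee)
```

The point of the sketch is only that the meaning of each phrase is computed by applying the rule's formula to the meanings of its daughters, so the semantic representation is built compositionally alongside the syntactic analysis.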
Full coverage is provided of the following constructions and their combinations:

- all sentence types: declaratives, imperatives and questions (yes/no, tag and wh questions),
- all unbounded dependency types: topicalisation, relativisation, wh questions,
- a relatively exhaustive treatment of verb and adjective complement types,
- phrasal and prepositional verbs of many complement types,
- passivisation, verb phrase extraposition,
- sentence and verb phrase modification,
- noun phrase complements,
- noun phrase pre- and post-modification,
- partitives,
- coordination of all major category types,
- nominal and adjectival comparatives.

The LEXICON contains 40,000 homonyms (63,000 entries in total) in the form required by the morphological analyser.

The GRAMMAR DEVELOPMENT ENVIRONMENT gives access to all of the other components of the tools, allowing grammars to be input, edited, and browsed; it also compiles them into the base grammatical formalism used by the parsers, and provides extensive grammar debugging facilities.

A simple quantifier scoping and post-processing module is supplied as an example of how the result of parsing a sentence can be converted into a representation suitable for further semantic and pragmatic processing. In addition, an illustrative database management application with a small database of wine merchants' stock is supplied.

All of the software components are written in Common Lisp and have been tested in several implementations on a wide range of machines.

We have created a BULLETIN BOARD which we hope can be used to inform existing users about developments, to provide some informal support, and as a forum for discussion between people doing research with the ANLT system. Submissions should be sent to alveynltools@cl.cam.ac.uk and requests to be added to or deleted from the distribution list should be sent to alveynltools-request@cl.cam.ac.uk.
If you are an existing user and this message has come to you direct, your email address has been added to the list already; unfortunately though, we do not have up-to-date email addresses for all known users, so please email alveynltools-request otherwise.

Two published REFERENCES to these projects are:

Briscoe, E., C. Grover, B. Boguraev & J. Carroll, 'A Formalism and Environment for the Development of a Large Grammar of English', Proceedings of 10th International Joint Conference on Artificial Intelligence, Milan, 1987, pp. 703-708.

Ritchie, G., G. Russell, A. Black & S. Pulman, 'Computational Morphology: Practical Mechanisms for the English Lexicon', MIT Press, 1991.

Technical reports describing the system in detail are available via FTP as detailed in the file `instruct'. These contain many further references to papers describing aspects of the ANLT system.

********************

ANLT distribution arrangements and instructions, and a machine-readable specimen licence agreement are available in files on the FTP server ftp.cl.cam.ac.uk (128.232.0.56). To fetch this information use anonymous FTP (login with user name anonymous, and password your e-mail address), go to the directory `nltools', and fetch the files

    licence     a machine-readable specimen licence agreement
    instruct    instructions on how to FTP technical reports and the ANLT itself

The following example shows how to fetch these files:

    $ ftp ftp.cl.cam.ac.uk
    Connected to swan.cl.cam.ac.uk.
    220- swan.cl.cam.ac.uk FTP server (Version 5.60+UA) ready.
    ...
    Name (ftp.cl.cam.ac.uk:jac): anonymous
    Password (ftp.cl.cam.ac.uk:anonymous):
    ...
    ftp> cd nltools
    250 CWD command successful.
    ftp> get licence
    ...
    ftp> get instruct
    ...
    ftp> quit
    221 Goodbye.

(The $ is the Unix shell command prompt.) If the FTP command does not know about the address ftp.cl.cam.ac.uk, try giving the command the internet number (128.232.0.56) instead.
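The same two files could also be fetched from a script rather than an interactive session. The following is a minimal Python sketch using the standard `ftplib` module; the host, directory and file names are those given above, while the function name `fetch_anlt_info` and the placeholder e-mail address are illustrative assumptions, and the sketch assumes the server is reachable and behaves as in the session shown.

```python
import os
from ftplib import FTP

# Host, directory and file names as given in the announcement.
HOST = "ftp.cl.cam.ac.uk"
DIRECTORY = "nltools"
FILES = ("licence", "instruct")


def fetch_anlt_info(ftp, email, dest_dir="."):
    """Log in anonymously, download the information files, and
    return the list of local paths written.

    `ftp` is a connected ftplib.FTP object (or anything with the same
    login/cwd/retrbinary/quit methods); `email` is sent as the password.
    """
    ftp.login(user="anonymous", passwd=email)
    ftp.cwd(DIRECTORY)
    saved = []
    for name in FILES:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as out:
            # retrbinary calls out.write once per received block
            ftp.retrbinary(f"RETR {name}", out.write)
        saved.append(path)
    ftp.quit()
    return saved


# Example call (needs network access; the address is a placeholder):
#     fetch_anlt_info(FTP(HOST), "you@example.org")
```

If the host name does not resolve, passing the internet number to `FTP(...)` in place of `HOST` corresponds to the fallback suggested above.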