.de XP
.RT
.if \\n(1T .sp \\n(PDu
.ne 1.1
.if !\\n(IP .nr IP +1
.in +\\n(I\\n(IRu
.ti -\\n(I\\n(IRu
..
.de VS
.KS
.nf
.\\$1D \\$2 \\$1
.ft 1
.ps 8 
.if \\n(VS>=40 .vs \\n(VSu
.if \\n(VS<=39 .vs \\n(VSp
.cs 1 24
.cs 2 24
.cs 3 24
.lg 0
..
.de VE
.ce 0
.if \\n(BD .DF
.nr BD 0
.in \\n(OIu
.KE
.ps \\n(PS
.lg 1
.cs 1
.cs 2
.cs 3
.if \\n(TM .ls 2
.sp \\n(DDu
.fi
..
.ND
.nh
.nr PS 11
.nr VS 13
.nr PO 1i
.ds CH 
.LP
.DS C 
.sp 6
\s+3 The Edinburgh/Cambridge\s-3
\s+3 Morphological Analyser and Dictionary\s-3
\s+3 System\s-3
.sp 0.5v
[Version 3.0]

.sp 0.5v
\s+4User Manual\s-4
.sp 5
G.D. Ritchie
A.W. Black
.sp
Department of
Artificial Intelligence,
University of Edinburgh
.sp 3
S.G. Pulman
G.J. Russell
.sp
Computer Laboratory,
University of Cambridge
.sp 8 
This work was supported by SERC/Alvey grant GR/C/79114.
.sp 2
\s-2COPYRIGHT\s+2:\ \(co G.D. Ritchie, A.W. Black, S.G. Pulman, G.J. Russell
.sp 3
July 1987
.DE
.bp 1
.ds CF - % -
.SH
CONTENTS
.sp 2
.LP
.ft 3
1. Introduction
.sp
2. Compilation and loading of user files
.RS
.nf
.ft 3
2.1 Overview
2.2 User Files Required
2.3 Filename Conventions
2.4 Initiating Compilation of a Dictionary
2.5 Loading the Analyser into LISP
2.6 Loading a Compiled Dictionary into LISP
.RE
.nf
.ft 3
.sp
3. Using the Dictionary/Analyser
.RS
.nf
.ft 3
3.1 LISP functions available
3.2 Formats Returned by the Analyser
3.3 Dictionary Command Interpreter
3.4 UNIX Shell Commands
.RE
.nf
.ft 3
.sp
4. Debugging the Morphological Rules and Dictionary Entries
.RS
.nf
.ft 3
4.1 Morpheme Entries
4.2 Spelling Rules
4.3 Word Grammar
4.4 Other Points
.RE
.nf
.ft 3
.sp
5. User Specified Files
.RS
.nf
.ft 3
5.1 Overview and Notation
5.2 Sharing information between files \(em #include
.RE
.nf
.ft 3
.sp
6. Morphographemic Spelling Rules
.RS
.nf
.ft 3
6.1 Overview
6.2 Spelling Rules \(em Introduction
6.3 Spelling Rule Formalism \(em More Detail
6.4 Spelling Rules \(em User File Format
6.5 Spelling Rules \(em Example File
.RE
.nf
.ft 3
.sp
7. Word Grammar Rules and Feature Defaults
.RS
.nf
.ft 3
7.1 Word Grammar \(em Introduction
7.2 Features and Categories
.RS
.nf
.ft 3
7.2.1 Syntax of Categories
7.2.2 Feature Definitions
.RE
.ft 3
.nf
7.3 Word Grammar Rules
7.4 Aliases
7.5 Variables
7.6 Extension and Unification
.RS
.nf
.ft 3
7.6.1 Unrestricted Unification
7.6.2 Term Unification
.RE
.ft 3
.nf
.bp
.nf
.ft 3
7.7 Defining Structures with Grammar Rules
7.8 Declarations
7.9 Feature Passing Conventions
.RS
.nf
.ft 3
7.9.1 The Word-Head Convention
7.9.2 The Word-Daughter Convention
7.9.3 The Word-Sister Convention
.RE
.nf
.ft 3
7.10 Feature Defaults and LCategory Definitions
7.11 Word Grammar \(em User File Format
7.12 Word Grammar \(em Example File
.RE
.nf
.ft 3
.sp
8. Lexicon File
.RS
.nf
.ft 3
8.1 Morpheme Entries
8.2 Noninflectable Categories
8.3 Lexical Rules
8.4 Basic Format of Lexical Rules
8.5 Pattern Matching in Lexical Rules
8.6 Completion Rules
8.7 Multiplication Rules
8.8 Consistency Checks
8.9 Application of Lexical Rules
8.10 Lexicon File \(em User File Format
8.11 Sample Lexicon
.RE
.nf
.ft 3
.sp
9. Implementation
.RS
.nf
.ft 3
9.1 Basic System
9.2 Installation for Franz (opus 42.15)
9.3 Installation for Franz (opus 38.75)
9.4 Installation for Common LISP
9.5 Programming Conventions
9.6 Restrictions and Bugs
.RE
.nf
.ft 3
.sp
10. Enhancements
.sp
References
.fi
.ft 1
.ds RH Section 1
.bp
.NH 0
Introduction
.LP
This document describes how to use the morphological
analysis and dictionary system, which is intended for use within LISP
programs (i.e. it is implemented as a set of functions
which can be called from within the LISP system).
The expected use of the system is two-fold.  Firstly it may be used by
people who wish to use a general dictionary system within
some larger system (e.g. a
natural language interpreter) but who do not want to have to worry about the
morphology of words; such users will use not only the LISP facilities,
but the supplied set of data files describing English morphology.
Secondly the system is, we hope, general enough
to allow people interested in morphology itself to investigate linguistic
issues by modifying the sample user files.
.LP
The system allows the basic dictionary to contain one kind of entry, which we
term a \*Qmorpheme\*U.  A morpheme is a basic word, suffix or prefix 
e.g. \f2walk\f1,
\f2+ed\f1, \f2+ation\f1 but not \f2walks\f1 or \f2bigger\f1.  The Analyser 
part of the system
allows the user to specify how actual words like \f2walks\f1 
and \f2bigger\f1 can
be made up from other simple morphemes.
However, it is up to the user to decide what is entered as a separate
morpheme \(em any word, part of word, or even phrase could appear as a single
entry.
.LP
A dictionary system consists
basically of three sub-parts specified by the user: a set of \f2spelling
rules\f1, a \f2word grammar\f1,
and a \f2lexicon\f1.  The lexicon contains the entries for morphemes \(em
typically, words in 
their basic form and various affixes (both prefixes and suffixes).
Words can then be looked up using the two forms of morphological rules and
the basic lexicon.
.LP
The user is able to specify each of the three parts separately:
.IP 1.
Orthographic effects which occur when morphemes are concatenated
can be stated in a rule notation based on Koskenniemi (1983, 1985).
.IP 2.
Words can be assigned an internal (morphological) structure using a rule
format which is closely based on Gazdar and Pullum's (1982) generalisation 
of context-free notation.
.IP 3.
Lexicon entries can be specified, including
the various items of information (regarding syntactic category, etc.) 
associated
with each stem or affix.  Rules can also be employed to
expand entries automatically, thus reducing the amount of explicit 
information the user need specify.
.LP
The use of the system can be thought of as falling into two 
phases \(em using the Compiler to pre-process the three sub-parts mentioned
above, and actual running of the Analyser.  That is, the Compiler
takes the user-specified files and converts them into an internal format
suitable for use by the Analyser system.  LISP functions are provided
for compilation of sub-parts, loading of compiled sub-parts and dictionary
look-up.
(N.B. Throughout this document, apart from Section 9,
the term \*Qcompile\*U or \*Qcompilation\*U
will be used to denote this pre-processing of linguistic data files
into a form usable by the Analyser - it will \f2not\f1 mean the use
of the LISP compiler to perform ordinary program compilation).
.LP
Once the system has been installed (see Section 9),
if a compiled set of files is available, as described in Section 2 below,
then the Analyser may be used in either of two ways:
.IP (a)
the UNIX shell command \f3dci\f1 (\*QDictionary Command Interpreter\*U)
loads the Franz LISP system and executes an interactive
command interpreter (see Section 3.3);
.IP (b)
from within LISP, various LISP functions can be called to manipulate
the lexical information (see Section 3.1).
.LP
There are two versions of the system, which differ in their use of
unification during parsing.  This can be \f3term unification\f1, where
categories can only unify if they have the same number (and name) of 
features; or \f3unrestricted unification\f1, where categories can unify as
long as there are no clashing features (i.e. features with the same name but
different values).  This distinction is explained in more detail in
section 7.2.  This selection between the two types of unification is
made at installation time (see section 9).  Dictionaries and analysers
written under one form of unification will not work under the other 
even though they superficially appear to be similar.  The examples
given in this document are for unrestricted unification.  Where the system 
acts differently between the two forms of unification the text describes
these differences.  Small example dictionaries and analysers of both
types are provided in the normal distribution.
.LP
It is important to bear in mind the distinction between the software
(i.e. the Analyser and Compiler programs), which embody the facilities,
rule-notations, etc.
provided by this system, and the user files (of which samples are
supplied with the software) which give a possible description of English
morphology. This document defines the software facilities, but the
user is free to create new data files to give a different description
of English.  
.LP
For a low level description of the internal algorithms and techniques used
in implementing this system the user is referred to the System Description
(Ritchie et al. 1987).
.LP
This work has been designed as part of the Alvey Natural Language Tools
Projects.  In addition to this dictionary and analyser system a GPSG
parser (Phillips & Thompson 1986) and a GPSG Grammar
(Briscoe et al. 1986) have been developed.  Although these
modules have been developed to act as an integrated system, they
also may be used independently within other systems or in stand-alone
mode.
.ds RH Section 2
.bp
.NH 1
Compilation and loading of user files
.NH 2
Overview
.LP
The information outlined in Section 1 above should be entered into three
separate files and compiled into a further four files (the lexicon compilation
process produces two files).  The Analyser itself requires three 
of these four sub-parts
to be loaded before it can run (see Section 2.6 below).
Before giving detailed definitions of the formats, function names, etc.,
it may be helpful to provide some further details of what exactly these
types of information are. 
The whole process can be pictured as in Figure 1 below which shows
the prerequisites for each section.  The top row lists the three basic input 
files.   The bottom row lists the three main analysis functions which
can be called from LISP after compilation (see Section 3.1 for
further details).  Each function
uses varying levels of analysis - \f3D-LookUp\f1 is full analysis of a string 
using the morpho-syntax rules and spelling rules, \f3D-Segment\f1 is
analysis only by the
spelling rules, and \f3D-Morpheme\f1 is a direct lookup of a given single
morpheme in the lexicon (i.e. using none of the rules).
.if n .so /tmp/table.out
.if t .so table.t
.NH 2
User Files Required
.LP
The three user files are:
.XP
\f2Spelling Rules\f1: 
This holds the definitions of the lexical and surface alphabets and the
sets of characters used in the spelling rules.  It also contains the spelling
rules themselves, describing the spelling changes that occur
when morphemes are combined.  Such changes are viewed as resulting from 
correspondences between \*Qsurface\*U and \*Qlexical\*U (or \*Qunderlying\*U)
characters in a particular
context.  For example, \f2moved\f1 is made up of the 
morphemes \f2move+ed\f1 with a surface \f2e\f1 corresponding to 
an underlying \f2e+e\f1 in the lexical forms.
These rules, and the file format, are defined in full in Section 6 below.
.XP
\f2Word-Grammar\f1: 
In addition to the word grammar itself, this file also
contains definitions of aliases, variables, features, etc. which appear
in the grammar, and the feature classes used by the feature passing 
conventions during analysis.  Note that the feature 
specifications made here must be compatible with those made in
the lexicon file. 
These rules, and the file format, are defined in full in Section 7 below.
.XP
\f2Entries\f1: 
This contains the definitions of all features, aliases, etc. that appear
in the syntactic category of an entry.
Each entry is a 5-tuple \(em citation form, phonological form,
syntactic category, semantic field, and a user field, and is intended to 
represent a single morpheme.  This file also contains 
three types of rules for processing lexical
entries.  Firstly, \*Qcompletion rules\*U, which add unspecified features
which are predictable from the given specification; secondly,
\*Qmultiplication rules\*U, which can construct new entries 
from user specified ones;
and thirdly, \*Qconsistency checks\*U, which verify the internal 
consistency of an entry.
These rules, and the file format, are defined in full in Section 8 below.
.NH 2
Filename Conventions
.LP
Each sub-part is identified by a filename with an extension that indicates
which sub-part it is (\f3.sp\f1, \f3.gr\f1, or \f3.le\f1, respectively.)
The name can be any alphanumeric sequence chosen
by the user (subject to the constraint that all files must have valid
UNIX file names). 
File names with special characters must be specified within
double quotes when referred to from within the Analyser system.
Notice that during the dictionary compilation process,
the output filenames are automatically produced from the input
filenames by appending \f3.ma\f1 to them.
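.LP
As an illustration, with a hypothetical base name \f3english\f1 (the name
is not fixed by the system), the input files and the compiled files
derived from them would be as follows.  The second lexicon file is the
one described under \f3D-MakeLexicon\f1 in Section 3.1.
.VS L
english.sp   compiles to   english.sp.ma
english.gr   compiles to   english.gr.ma
english.le   compiles to   english.le.ma  (lexicon index)
                           english.en.ma  (expanded entries)
.VE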
.NH 2
Initiating Compilation of a Dictionary
.LP
Once the input files have been created,
compilation can be initiated in any of three ways:
.IP (a)
selecting the appropriate options from within the interactive
Dictionary Command Interpreter (see Section 3.3).
.IP (b)
calling the appropriate LISP functions, once the basic Analyser
has been loaded into Franz LISP (see Section 3.1)
.IP (c)
calling the UNIX shell commands \f3mksp\f1, \f3mkgram\f1, \f3mklex\f1
(see Section 3.4)
.NH 2
Loading the Analyser into LISP
.LP
In order to use the Analyser from within LISP, it is
first necessary to load the various program files and initialise
the internal structures before any of the user's lexical data files
can be loaded.
The first file that must be loaded before any functions can be used is
\f3maload\f1.\**
.FS
Technically, the file actually loaded will be that which was created by the
LISP compiler e.g. \f2maload.o\f1.  It is assumed the LISP system will
load such a file even if the \f2.o\f1 file extension is not specified.
.FE
This file defines the bootstrap function \f3d-maload\f1, which 
can take zero or one argument.  If specified, the argument
should be the name of the UNIX directory holding the rest of the program files
for the Analyser.
If no argument is given then the program files will be sought in the 
current directory (or any directory in the Franz LISP variable 
\f3load-search-path\f1
- see Section 9.2 for modifying this).  Before \f3d-maload\f1 is called no
dictionary functions are defined.  When it is called, all the dictionary
functions are loaded and the system is initialised (by a call to
\f3D-Initialise\f1 \(em see below).
Note that \f3d-maload\f1 does evaluate its arguments and hence the directory
argument should be quoted.
A typical call might be
.VS C
(d-maload "/usr/local/src/morph")
.VE
.LP
If the interactive Dictionary Command Interpreter is to be called up within
LISP the file \f3morphan\f1 must be loaded \(em see Section 3.3 below.
.NH 2
Loading a Compiled Dictionary into LISP
.LP
In order to use the Analyser from within LISP,
the data files created by the compilation (Section 2.4)
must be correctly loaded.
Once the Analyser program files have been successfully loaded
(Section 2.5 above), any of the LISP functions in Section 3.1 are
available, including those for loading the user files (i.e. 
\f3D-LoadSpRules\f1, \f3D-LoadWordGrammar\f1,
\f3D-LoadLexicon\f1, \f3D-AddLexicon\f1, \f3D-LoadAll\f1).
Alternatively, if the interactive Dictionary Command Interpreter
is being used (Section 3.3), then selecting suitable command options
will load the user's compiled dictionary files.  Note that compiled user
files may only be loaded by the same version of the Analyser system.  It 
is necessary for users to recompile their user files if they receive a
new version of the Analyser software.
.ds RH Section 3
.bp
.NH 1
Using the Dictionary/Analyser
.LP
A compiled set of dictionary files can be accessed either through
a set of LISP functions (see 3.1 below) or via the 
interactive Dictionary Command
Interpreter (see Section 3.3 below).
.NH 2
LISP functions available
.LP
The system provides a set of LISP functions.  Although they are
named here using upper and lower case, they may also be referenced
using lower case only.  (See Section 9 for details of implementation.)
All functions evaluate their arguments unless specifically stated otherwise.
.LP
The first four functions are used to load and initialise the system.  One of
the first three functions should be used to load the code.  If some other
method is used to load the code it is necessary to call the function 
\f3D-Initialise\f1 before any of the other system functions will work.
Square brackets, [..], are used to denote optional arguments while angled
brackets, <..>, denote mandatory arguments.
.XP
(\f3d-maload\f1 [\f2dir\f1]) : This function (note that the name
is all in lower case) loads the LISP object code for the whole dictionary
system.  If \f2dir\f1 is specified then it should be the name of the
directory containing the rest of the  object code files.  If \f2dir\f1
is not specified the directories in the Franz system variable
\f3load-search-path\f1 are searched for the code (see section 9.2).
Note that \f2dir\f1 is evaluated.  In addition to loading the program
this function initialises the dictionary system by a call to
\f3D-Initialise\f1 (see below).
.XP
(\f3d-maloadcomp\f1 [\f2dir\f1]) :  This is similar to the above
function except that it only loads the code for the compilers and not
the code for the actual Analyser.  See previous explanation for note about
the argument \f2dir\f1.  Note that if this function is used to load the
system any Analyser function called will give an error.  This function also
makes a call to \f3D-Initialise\f1.
.XP
(\f3d-maloadmap\f1 [\f2dir\f1]) :  This is similar to the function
\f3d-maloadcomp\f1 above.  This function loads only the code required
to load and run a dictionary and not the code required to compile any
of the user files.  Calling any of the compiler functions after this
function will give an error.
See the note in the description of \f3d-maload\f1 regarding
the \f2dir\f1 argument.  This function does call \f3D-Initialise\f1.
.XP
(\f3D-Initialise\f1) : Initialises the Analyser system by resetting the
global variables.  If a dictionary is already loaded this function call
resets the system and hence removes the previously loaded dictionary
from the LISP environment (the actual files are unaffected).
.LP
The following functions are available within LISP after
program loading and initialisation, i.e. after
\f3D-Initialise\f1 has been called.
.XP
(\f3D-MakeSpRules\f1 <name>) : Compiles the spelling rule file \f3<name>.sp\f1
into \f3<name>.sp.ma\f1.  If \f3<name>.sp\f1 does not exist an error is 
signalled.  
During compilation, information and non-fatal errors are displayed on
standard output.
Any currently loaded set of spelling rules is
not affected.
.XP
(\f3D-MakeWordGrammar\f1 <name>) : Compiles the word grammar in file
\f3<name>.gr\f1 into file \f3<name>.gr.ma\f1.  If \f3<name>.gr\f1 does not 
exist an error is signalled.  During compilation, information and non-fatal 
errors are displayed on standard output.  Any currently loaded word grammar is
not affected.
.XP
(\f3D-MakeLexicon\f1 <name>) : Compiles the lexicon in file \f3<name>.le\f1
into the two files \f3<name>.le.ma\f1 (containing a tree-structured index
to the entries)
and \f3<name>.en.ma\f1 (containing the expanded entries).
If \f3<name>.le\f1 does not exist an
error is signalled.  During compilation, messages are displayed on standard
output.  The user should be aware that compiling large dictionaries can take 
a long time.
Any currently loaded lexicon is not affected.
.XP
(\f3D-LoadSpRules\f1 <name>) : Loads in a previously compiled set of 
spelling rules held in \f3<name>.sp.ma\f1.  An error is signalled if this
file does not exist.  Any previously loaded set of spelling rules
is over-written.  
If the file being loaded was not created by a compilation
from \f2this\f1 version of the system an error is signalled.
Compiled user files can be loaded only by compatible versions.
.XP
(\f3D-LoadWordGrammar\f1 <name>) : Loads in a previously compiled word grammar
held in file \f3<name>.gr.ma\f1.  An error is signalled if this file does not 
exist.  Any previously loaded word grammar is over-written.
If the file being loaded was not created by a compilation
from \f2this\f1 version of the system an error is signalled.  If 
the system is using term unification the error "Incompatible category
types" will be produced if the category definitions in the word grammar
file are incompatible with those currently loaded (from a lexicon
file or a currently loaded word grammar file). 
This may mean the currently loaded lexicon and analyser have to be cleared 
(via the \f3D-Initialise\f1 function) before loading new files.  Or the
function \f3D-LoadAll\f1 can be used which resets the category 
definitions before loading in the new files. 
.XP
(\f3D-LoadLexicon\f1 <name>) : Loads a lexicon tree from the
file \f3<name>.le.ma\f1
and opens a port to \f3<name>.en.ma\f1.  This first file, \f3<name>.le.ma\f1, 
contains the lexicon index, which has pointers to the expanded entries 
held in the second file \f3<name>.en.ma\f1.  
The global variable \f3D-ENTRYFILEID\f1 is set to the 
open port.  If either file is not found an 
error is signalled.  Any previously loaded lexicon is over-written and the old
entry file port closed.  Note, if a loaded version of the system is 
dumped the port 
to \f3<name>.en.ma\f1 will be closed on re-entry to LISP.  This means it 
is necessary to reload the
lexicon each time you enter LISP.
If the files being accessed were not created by a compilation
from \f2this\f1 version of the system an error is signalled.  If
the system is using term unification the error "Incompatible category
types" will be produced if the category definitions in the lexicon file
are incompatible with those currently loaded (from a word grammar
file or a currently loaded lexicon file). 
This may mean the currently loaded lexicon and analyser have to be cleared 
(via the \f3D-Initialise\f1 function) before loading new files.  Or the
function \f3D-LoadAll\f1 can be used which resets the category 
definitions before loading in the new files. 
.XP
(\f3D-AddLexicon\f1 <name>) : Adds another lexicon to the one(s) already 
loaded.  This allows the user to keep separate lexicons and selectively load 
them.  Both \f3<name>.le.ma\f1  and \f3<name>.en.ma\f1 must exist otherwise an
error is signalled and the current lexicons are lost from the Analyser.  An
error is also signalled if there is no currently loaded
lexicon.  If <name> is already loaded the old version is removed from the 
lexicon and is replaced with the latest version of <name>.  Note this 
may not be the same lexicon (it may be a different lexicon with the same file
name).   The reverse can also happen: a lexicon may not be replaced when
it is intended to be, for example if the user changes directory between
loads.
Having separate
lexicons is less efficient than having one large lexicon with respect to look-up
time, but separate lexicons are easier for debugging purposes and are
quicker to compile so they are suitable for development.
The number of added lexicons is limited to the number of open files
allowed by a UNIX process (i.e. 20 - possibly less under other operating
systems). 
If the files being accessed were not created by a compilation
from \f2this\f1 version of the system an error is signalled.  If
the system is using term unification the error "Incompatible category
types" will be produced if the category definitions in the lexicon file
are incompatible with those currently loaded (from a word grammar
file or a currently loaded lexicon file).  Lexicons with incompatible
category definitions \f2cannot\f1 be mixed.
.XP
(\f3D-LoadAll\f1 <name1> <name2> <name3>) : Load all three sub-parts. This is
equivalent to calling
.VS
(\f3D-LoadSpRules\f1 <name1>) 
(\f3D-LoadWordGrammar\f1 <name2>)
(\f3D-LoadLexicon\f1 <name3>)  
.VE
In the term unification version this function also
resets the category definitions before loading the new files.
If any of the respective files do not exist
then an error is signalled.  Any previously loaded sub-part is over-written.
If the files being accessed were not created by a compilation
from \f2this\f1 version of the system an error is signalled.  
.XP
(\f3D-VersionHeading\f1) : Prints \*QMorphological Analyser\*U and
the current version number.
.XP
(\f3D-LookUp\f1 <word>) : looks up the given <word> using the word grammar
and spelling 
rules and returns a list of possible analyses.  <word> must be a LISP
string or symbol made up of characters in the surface alphabet.  The
format of these analyses 
depends on the setting made by the function \f3D-ChangeLookUpFormat\f1
(see below). 
A set of spelling rules, a word grammar and a lexicon 
must 
be loaded before this function can be called.  If one of these sections 
is missing an error is signalled.
.XP
(\f3D-Morpheme\f1 <morph>) : Looks up <morph> (a string or symbol made up 
of lexical characters) directly in the lexicon, without the 
grammar or spelling rules.  A list of lexical entries is returned, one for 
each entry whose citation form is <morph>.  A lexicon must be loaded
otherwise an error is signalled.
.XP
(\f3D-Segment\f1 <word>) : Segments <word> (a string or symbol made up
of characters in the 
surface alphabet) into morphemes if possible, and includes in the result the 
lexical entries of those morphemes.  The result of 
\f3D-Segment\f1 is either the atom \f3nil\f1 (if no segmentation into 
morphemes was possible), or a list of possible segmentations (since there may
be more than one valid segmentation). A segmentation is represented as a list 
in which each element is a list representing a lexical entry. There must be a 
lexicon and a set of spelling rules loaded otherwise this function will signal
an error.  No word grammar need be loaded as this analysis does not use that
part of the system.
.XP
(\f3D-MorphemeConcat\f1 <morphs> <flags>) : Concatenate the given morphemes
to produce their surface form, where <morphs> is a list
of symbols (or strings) made up of characters from the lexical alphabet, 
and <flags> is a list 
of flags.  This function returns a list of all surface forms that correspond
to the concatenation of the given lexical forms, with respect to the 
currently loaded spelling rules.  If no spelling rules are loaded this function
signals an error.  If <flags> is \f3nil\f1 then the surface forms are simply
returned including any null (0) characters.  The only option for <flags>
currently
supported is \f3NONULLS\f1, which suppresses surface nulls.  For example
.VS L
(D-MorphemeConcat '(move +ed) '(NONULLS))
.VE
would return \f2'(moved)\f1 (depending on the currently loaded spelling rules)
while a call of
.VS L
(D-MorphemeConcat '(move +ed) nil)
.VE
would return \f2'(mov00ed)\f1.
.XP
(\f3D-ChangeLookUpFormat\f1 <format>) : Selects the form of 
information returned by the 
function \f3D-LookUp\f1 (see next section for details). 
The argument <format> must be one of \f3D-CATEGORYFORM\f1,
\f3D-WORDSTRUCTURE\f1, \f3D-STRINGSEGMENTCAT\f1, or \f3D-STRINGSEGMENTWS\f1.  
Calling this function with an atom other than one of these four will cause an 
error.
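.LP
As an illustration only, a complete session using the functions above
might look like the following sketch.  The directory is the one used in
the example in Section 2.5, and the base name \f3english\f1 is hypothetical:
the sketch assumes that files \f3english.sp\f1, \f3english.gr\f1 and
\f3english.le\f1 exist in the current directory, and that format names are
passed as quoted atoms.  The values returned depend entirely on the user files.
.VS L
; load the program files and initialise the system
(d-maload "/usr/local/src/morph")
; compile each of the three user files
(D-MakeSpRules "english")
(D-MakeWordGrammar "english")
(D-MakeLexicon "english")
; load the compiled sub-parts
(D-LoadAll "english" "english" "english")
; look up a word, first as categories, then as word structures
(D-ChangeLookUpFormat 'D-CATEGORYFORM)
(D-LookUp "walks")
(D-ChangeLookUpFormat 'D-WORDSTRUCTURE)
(D-LookUp "walks")
.VE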
.NH 2
Formats Returned by the Analyser
.LP
The information returned by the function \f3D-LookUp\f1
will be one of four possible formats (depending on the current setting
made by the last call of \f3D-ChangeLookUpFormat\f1).  The choice of
formats can 
be thought of in two dimensions: how a word
is described (category or word tree), and whether the result is in word form or edge form.
.LP
The first dimension is the way in which actual words are described. Currently
the system allows two forms, category and word structure.  Before the
distinction can be made between these two forms it is necessary to describe
some of the analysis process.
During
the analysis of words, a structure is built by the word-parser which records
all the lexical information about the component morphemes, together with
the morphological rules which were used to combine them (see Section 7).
This can be looked upon as a tree in which each non-terminal 
node has the following associated values:
.IP (a)
a syntactic category  (computed by the parsing process);
.IP (b)
the name of the rule (from the word grammar) used to build that part
of the word;
.IP (c)
a word structure for each daughter of the rule used to build that word.
.LP
Where a word is a single morpheme (or at the lowest level of
the structure)
this cannot be the case, as no rule is involved
in building it. In that case, the fields contain:
.IP (a)
the syntactic category formed by the unification of the category in 
the entry with the category in the word-grammar rule which licensed it.
.IP (b)
the keyword ENTRY (thus the user should not use the name ENTRY for a rule,
although the system does not prevent this).
.IP (c)
the lexical entry for the morpheme (as expanded by the lexical rules).
.LP
Hence the overall BNF for the data-structure, which is expressed as LISP
s-expressions, is:
.VS L
<word-structure> ::= ( <category> <rule-name>
                       <word-structure>* )
                |    ( <category> \f3ENTRY\f1 <lexical-entry> )
.VE
The result of applying \f3D-LookUp\f1
to the word \f2applications\f1  might be:
.VS L
(((BAR 0) (PLU +) (INFL -) (N +) (V -))
 SUFFIXING
 ( ((BAR 0) (PLU -) (N +) (V -) (INFL +))
   SUFFIXING
   ( ((INFL +) (BAR 0) (V +) (N -))
     ENTRY
     (apply appli ((INFL +) (BAR 0) (V +) (N -)) APPLY NIL))
   ( ((BAR -1) (FIX SUF) (INFL +) (PLU -) (N +)
                   (V -) (STEM ((V +) (N -) (INFL +))))
      ENTRY
      (+ation ashon ((BAR -1) (FIX SUF) (INFL +) (PLU -) (N +)
            (V -) (STEM ((V +) (N -) (INFL +)))) ATION NIL))
 )
 ( ((BAR -1) (FIX SUF) (INFL -) (PLU +) (V -)
                   (N +) (STEM ((N +) (V -) (INFL +))))
    ENTRY (+s s ((BAR -1) (FIX SUF) (INFL -) (PLU +) (V -)
          (N +) (STEM ((N +) (V -) (INFL +)))) S NIL)))
.VE
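.LP
A structure of this kind can be taken apart with ordinary list
operations.  The following sketch is not part of the Analyser: the
function names are hypothetical, and it assumes only the layout given in
the BNF above, with the citation form as the first element of a lexical
entry.
.VS L
; accessors for a word structure
(defun ws-category (ws) (car ws))
(defun ws-rule (ws) (cadr ws))
(defun ws-entry-p (ws) (eq (cadr ws) 'ENTRY))
; daughters of a non-terminal node, nil for a single morpheme
(defun ws-daughters (ws)
  (cond ((ws-entry-p ws) nil)
        (t (cddr ws))))
; citation forms of the component morphemes, left to right
(defun ws-morphemes (ws)
  (cond ((ws-entry-p ws) (list (car (caddr ws))))
        (t (mapcan 'ws-morphemes (ws-daughters ws)))))
.VE
Applied to the structure shown for \f2applications\f1, \f3ws-morphemes\f1
should give a list of the three citation forms \f2apply\f1, \f2+ation\f1
and \f2+s\f1.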
.LP
Where there are unbound variables in categories these are denoted by a list
consisting of a number (unique to that variable), the atom
\f3<UNBOUND-VARIABLE>\f1 and the range of the variable.
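.LP
A program that processes categories may need to recognise such values.
A minimal sketch of a test follows, assuming that the three elements
appear in the order just described; the function name is hypothetical
and is not part of the Analyser.
.VS L
; true if x has the form (number <UNBOUND-VARIABLE> range)
(defun unbound-variable-p (x)
  (and (not (atom x))
       (numberp (car x))
       (not (atom (cdr x)))
       (eq (cadr x) '<UNBOUND-VARIABLE>)))
.VE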
.LP
Given this tree structure, the two forms of description that the system
allows are in terms of this word structure tree.  When \f3D-CATEGORYFORM\f1 
or \f3D-STRINGSEGMENTCAT\f1 
is selected as the format, words are described by the top syntactic
category (e.g. ((BAR 0) (PLU +) (INFL -) (N +) (V -)) in the above
example).  When \f3D-WORDSTRUCTURE\f1 or \f3D-STRINGSEGMENTWS\f1 is the 
selected format the words are described by the whole word structure tree.
.LP
The other dimension in the returned format is that the 
function \f3D-LookUp\f1
can take conceptually two forms of input, words or sentences (though
formally it is the user that makes this distinction).
.LP
When the format is \f3D-CATEGORYFORM\f1 or \f3D-WORDSTRUCTURE\f1 the
lookup function returns a simple list of word descriptions (categories or
word structure trees respectively).  These are all the analyses of the given
string that span it and have a top category that matches the definition
in the Top declaration in the word grammar 
(see Section 7 for a full definition).
If the word cannot be analysed an empty list of 
descriptions (i.e. \f3nil\f1) is returned.
.LP
When the
lookup format is \f3D-STRINGSEGMENTCAT\f1 or \f3D-STRINGSEGMENTWS\f1, the 
given parameter to \f3D-LookUp\f1
is treated as a sentence (string of words) and a chart representation
(Thompson and Ritchie 1984) is returned.  This is of the form
.VS L
<chart representation> ::=
         ( 
            ( <chart start vertex> <chart end vertex> )
             <edge> *
         )

<edge> ::=
     ( 
      <edge start vertex>
      <edge end vertex>
      <edge label>
     )
.VE
where the vertex names are arbitrary LISP atoms.
The form of the edge label depends on which form of \f3STRINGSEGMENT\f1 is 
selected; if \f3D-STRINGSEGMENTCAT\f1 is selected the edge label is 
a syntactic category (as in \f3D-CATEGORYFORM\f1); if \f3D-STRINGSEGMENTWS\f1
is selected then the edge label is a word structure tree (as 
in \f3D-WORDSTRUCTURE\f1).
.LP
The edges that are returned represent all possible 
segmentations (i.e. exhaustive partitions) of the given lookup
string that consist of one or more words
(a word is defined by the Top declaration in a word grammar). The intention
is that the strings given are sentences.  For example, if \f3D-STRINGSEGMENTCAT\f1 is 
selected
.VS
(D-LookUp "john liked mary")
.VE
might give the following (with suitable assumptions about the user files)
.VS
((g00001 g00002)
   (g00001 g00003 ((BAR 2) (N +) (V -) (PLU -)))
   (g00003 g00004 ((BAR 0) (N -) (V +) (VFORM ED)))
   (g00003 g00004 ((BAR 0) (N -) (V +) (VFORM EN)))
   (g00004 g00002 ((BAR 2) (N +) (V -) (PLU -)))
)
.VE
These \f3STRINGSEGMENT\f1 formats are intended to be used for segmenting 
sentences
into a form that can easily be fed into a chart parser that deals with
sentence level syntax.
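.LP
As a rough illustration of how a calling program might consume such a
chart, the following LISP sketch lists every sequence of edge labels that
leads from the chart start vertex to the chart end vertex.  The function
names (\f3chart-paths\f1 and its helpers) are hypothetical and are not part
of the Analyser.  The sketch assumes that no edge starts and ends at the
same vertex and that the chart contains no cycles, and it uses
only \f3defun\f1, \f3cond\f1 and basic list functions.
.VS L
```lisp
;; Sketch only: hypothetical helpers, not functions of the Analyser.
;; A chart is ((start end) edge ...), an edge is (from to label).

(defun edges-from (vertex edges)
  ;; all edges in EDGES whose start vertex is VERTEX
  (cond ((null edges) nil)
        ((eq (car (car edges)) vertex)
         (cons (car edges) (edges-from vertex (cdr edges))))
        (t (edges-from vertex (cdr edges)))))

(defun prefix-all (label paths)
  ;; put LABEL on the front of every path in PATHS
  (cond ((null paths) nil)
        (t (cons (cons label (car paths))
                 (prefix-all label (cdr paths))))))

(defun extend-paths (candidates end edges)
  ;; paths to END that begin with one of the CANDIDATES edges
  (cond ((null candidates) nil)
        (t (append
            (prefix-all (caddr (car candidates))
                        (paths-from (cadr (car candidates)) end edges))
            (extend-paths (cdr candidates) end edges)))))

(defun paths-from (vertex end edges)
  ;; list of label sequences leading from VERTEX to END
  (cond ((eq vertex end) (list nil))
        (t (extend-paths (edges-from vertex edges) end edges))))

(defun chart-paths (chart)
  ;; CHART is the value returned in one of the STRINGSEGMENT formats
  (paths-from (car (car chart)) (cadr (car chart)) (cdr chart)))
```
.VE
For a chart shaped like the one above, with two alternative edges for the
middle word, \f3chart-paths\f1 would return two label sequences, one for
each reading of the verb.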
.LP
Note that if this chart form is to be used the user should not also use 
the \f2NonInflect\f1 option (see section 8.2).
If it were to be used the classes in it should be 
things that can appear only at the end of \f2sentences\f1 rather
than \f2words\f1, which probably makes it
no longer a useful option.
.NH 2
Dictionary Command Interpreter
.LP
To make life easier when using the system for developing word grammars,
spelling rules and lexical entries, a simple command interpreter has been 
written.  This is not part of the Analyser system \f2per se\f1 but is an
example of a program that uses the dictionary and analysis functions.
The interpreter allows easier access to the various LISP functions.  It
can be run direct from the UNIX shell (via the command \f3dci\f1, see
Section 3.4) or called within LISP.
To call the command interpreter from LISP it is necessary to load the file
\f3morphan\f1.  The full path name for the file should be specified (or see 
Section 9.2), e.g.
.VS C
(load "/usr/local/src/morph/morphan")
.VE
This will load the interpreter and its related functions.
Once the command interpreter has been loaded it can be started by calling the
function \f3D-Start\f1 (no lower case equivalent is defined as it is not one 
of the actual dictionary functions).  \f3D-Start\f1 takes an 
optional argument specifying the directory containing the Analyser
program files.
Hence the command interpreter can be started by a call like
.VS C
(D-Start "/usr/local/src/morph")
.VE
If no argument is specified the Franz system variable
\f3load-search-path\f1 is used to look for the code (see section 9.2).
\f3D-Start\f1 loads the Analyser code and initialises the dictionary
system (by calling the function \f3D-Initialise\f1) then enters the top
level of the user interface, giving the message \*QMorphological
Analyser\*U, the version number, and the prompt:
.VS
>
.VE
Notice that at this stage no data files have been loaded \(em the
dictionary is completely empty of word grammar, spelling rules  and
lexical entries.  The necessary sub-parts can then be compiled and
loaded using the set of commands below.  Commands may be typed with
their appropriate arguments following on the same line.  When essential
arguments are not given on the command line, the user will be prompted
for them. Any extra arguments are ignored.  Note that where names of
spelling rules, word grammars and lexicons are referred to, only the
\f2name\f1 should be used and not the extensions (e.g. omit the .sp
or .sp.ma etc.).
.XP
\f3h\f1 or \f3?\f1
.br
Prints this list of commands.
.XP
\f3e\f1, \f3q\f1, \f3exit\f1, \f3quit\f1 or \f3EOF\f1 (i.e. end-of-file)
.br
Exits from the Analyser system.
.XP
\f3l\f2  <spelling rule name> <word grammar name> <lexicon name>\f1
.br
Loads all three parts of a dictionary.  Note these must be parts that
have been previously compiled by the system.
.XP
\f3ls\f2  <spelling rule name>\f1
.br
Loads a previously compiled set of spelling rules.
.XP
\f3lg\f2  <word grammar name>\f1
.br
Loads a previously compiled word grammar.
.XP
\f3ll\f2 <lexicon name>\f1
.br
Loads a previously compiled lexicon.
.XP
\f3al\f2 <lexicon name>\f1
.br
Adds a new lexicon to currently loaded dictionary.  If <lexicon name> is 
already loaded it is reloaded replacing the currently loaded version of that
lexicon.
.XP
\f3la\f2 <name>\f1
.br
Loads a spelling rule set, word grammar and lexicon all called
<name>. Thus this is equivalent to l <name> <name> <name>.
.XP
\f3cs\f2  <spelling rule name>\f1
.br
Compiles a set of spelling rules.
.XP
\f3cg\f2  <word grammar name>\f1
.br
Compiles a word grammar.
.XP
\f3cl\f2  <lexicon name>\f1
.br
Compiles a lexicon.
.XP
\f3t\f1
.br
Toggles trace setting  (No arguments) (see section 4).
.XP
\f3f\f2  <type>\f1
.br
Changes the look up output format.
<type> is one of 1-4; if no argument is given, a list
displaying the possible options is shown and the user
is asked to select one.
.XP
\f3w\f2  <word>\f1
.br
Looks up word in the dictionary with morphological analysis.
<word> is the word to be analysed (surface form)
which must be in double quotes if it contains any of
<>(){}[],.#;:!" space tab or newline.
.XP
\f3m\f2 <morpheme>\f1
.br
Looks up morpheme directly in lexicon.
<morpheme> is a morpheme (lexical form)
which must be in double quotes if it contains any of
<>(){}[],.#;:!" space tab or newline.
.XP
\f3s\f2 <word>\f1
.br
Segments word into morphemes.
<word> is a word (surface form)
which must be in double quotes if it contains any of
<>(){}[],.#;:!" space tab or newline.
.XP
\f3cm\f2 <lexical string> <lexical string> ...\f1
.br
Concatenates the given lexical strings to produce a list
of corresponding surface strings.
.XP
\f3spd\f1
.br
Prompts user for a lexical and surface string and enters the 
spelling rule debugger (see section 4.2 for more details).
.XP
\f3db\f2 <word>\f1
.br
Looks up <word> in the dictionary with morphological analysis
(in the same way as the command w) but then enters
the word parser debugger (see section 4.3).
.XP
\f3st\f1
.br
If the system is configured for unrestricted unification this asks for
a category to be used as the distinguished category during word
analysis.  This overrides the TopCategory declaration in the currently
loaded grammar.  If the system is configured for term unification 
this allows the user to give a list (in braces {} separated by commas)
of category types which are to denote complete words in an analysis.
Again this overrides the TopCategory declaration in the currently
loaded grammar.   Note that in both cases the new top category (or categories)
must be given when prompted for, rather than as an argument to the command.
.XP
\f3ds\f1
.br
Displays the names of the currently loaded rules and lexicons.
.XP
\f3clear\f1
.br
Clears the current dictionary (i.e. makes it empty).  This
asks for confirmation before it clears the system.
.XP
\f3!\f2 <s-expression>\f1
.br
Evaluates the given LISP <s-expression>.  Hence
this allows access to the UNIX shell.
.LP
When the \f3w\f1 option is used for looking up words a figure is given
representing the amount of CPU time used in the analysis.  The figure is 
calculated using the Franz LISP function \f3ptime\f1.
.LP
To leave the Interpreter and return to the top level of
LISP type any of \f3q\f1,
\f3exit\f1, \f3quit\f1, \f3e\f1 or end-of-file (CTRL-D on UNIX).  If the user 
wishes to go back into the Interpreter, a \*Qwarm start\*U can be initiated
by the call
.VS C
(D-Restart)
.VE
On such a re-entry, any previously loaded files will still be present
(i.e. the dictionary contents will not have been altered by the exit and 
re-entry).  If a re-initialisation is required (i.e. a clean, empty 
dictionary), the function \f3D-Start\f1 could be used, as on the first 
occasion or, more simply, the command \f3clear\f1 from within the interpreter.
.NH 2
UNIX Shell Commands
.LP
In addition to the LISP functions, some of the above functions are available 
as shell commands.  These commands automatically call LISP and the appropriate
function. (Note that currently these commands are only available in the 
Franz version of the system -
see Section 9 for details of the installation process).
The following commands are provided:
.XP
\f3dci\f1 [\f2dir\f1] : enters the Dictionary Command Interpreter as
described in Section 3.3.  No 
LISP calls are required. When any of the exit options are selected,
the command exits
and returns to the shell.  If \f2dir\f1 is supplied then that directory is 
checked for the appropriate LISP object files.
.XP
\f3mksp\f1 <name> : will compile the spelling rules in file \f3<name>.sp\f1 
into the file \f3<name>.sp.ma\f1.  This is functionally equivalent to the LISP 
function \f3D-MakeSpRules\f1 (see Section 3.1).
.XP
\f3mkgram\f1 <name> : will compile the word grammar in file \f3<name>.gr\f1 
into the file \f3<name>.gr.ma\f1.  This is functionally equivalent to the LISP
function
\f3D-MakeWordGrammar\f1 (see Section 3.1).
.XP
\f3mklex\f1 <name> : will compile the lexicon in file \f3<name>.le\f1 into the 
files \f3<name>.le.ma\f1 and \f3<name>.en.ma\f1.  This is functionally 
equivalent
to the LISP function \f3D-MakeLexicon\f1 (see Section 3.1).
.ds RH Section 4
.bp
.NH 1
Debugging the Morphological Rules and Dictionary Entries
.LP
Writing lexical entries and analyser rules is not easy, so it is very
likely that descriptions will contain bugs.  To try to help the user a number
of debugging aids have been included in the system.  The intention is that 
debugging should be done from the command interpreter (section 3.3) rather 
than as LISP functions.
.LP
There are various levels at which one can debug as there are various forms
of errors.  When a word fails to be analysed when it is thought that the
system should analyse it there are three basic levels where it may have failed.
.IP
- the morphemes are not actually in the lexicon.
.IP
- the spelling rules do not segment the surface word in the desired way
.IP
- the word grammar rules do not recognise the string of morphemes
.LP
In each of these main three cases (there are others) there are simple ways
to find out which level has failed and debugging aids to investigate 
the problem further.
.NH 2 
Morpheme Entries
.LP
First check that the required morphemes for the word are in the lexicon.
This can be done via the "m" command
in the command interpreter.  Note that the \f2lexical form\f1 must be used,
that is all characters that are in the citation form of the entry.  For example
in our example lexicon shown in section 8.11 the form "+ing" must
be entered when looking up morphemes directly rather than just "ing".
This is because none of the rule formalisms are used in morpheme look up;
look up is simply character by character.
.LP
If the morpheme is not found, first check the actual lexicon source file.
Is it there?  And is it spelled correctly?  If it is there the most likely
problem is that its syntactic entry did not pass the validity test 
and/or the consistency checks.  This should be shown during the compilation
of the lexicon.  During compilation any entries that contain undeclared
features or feature values, or fail the consistency checks are written to 
the terminal with the corresponding error message and not added to the 
compiled lexicon. 
.LP
If the entry has failed the consistency checks it may be difficult to 
find out which check failed (and why).  To help solve the problem a tracing
facility for the lexical rules is available.  This may be used in two
forms.  To trace \f2all\f1 entries, the global
trace flag can be set with the "t" command in the command interpreter.  This
will toggle the current setting from off to on or vice versa.  The other
method of tracing is more selective.  When the compiler directives
.VS L
     #trace on

     #trace off
.VE
are placed round a number of entries in the source lexicon, tracing is 
switched on for that period.  
.LP
When tracing is switched on the lexicon compiler will print out the 
citation form of each entry as it processes it and also the names of
all the Completion Rules, Multiplication Rules and Consistency Checks
that have been applied (i.e. their pre-condition matches).
.LP
Only in exceptional cases will it ever be necessary to check the compiler
output file (*.en.ma) that contains the expanded entries.  
Do not change the length of any of the entry fields in this file as the 
compiled lexicon indexes directly into this file.
.NH 2
Spelling Rules
.LP
If the basic morphemes are in the lexicon the next stage to test is
whether the surface form of the word can be successfully segmented into
those morphemes.  The way to do this is by using the command "s" in the 
command interpreter.  This will show all the possible segmentations of a
surface string with respect to the currently loaded spelling rules.  There
can be a lot of segmentations for even what appear to be quite simple words.
Note that no morpho-syntactic checks are done (i.e. no word grammar rules) on
these segmentations, hence a segmentation of a word into a noun and a third
person singular verbal suffix is quite probable.
.LP
A segmentation may fail either because the spelling rules disallow it or
because one of the intended morphemes is declared as \f2NonInflect\f1.  Any morpheme
which has a syntactic category that is an \f2extension\f1 of the declared
noninflectable categories can appear \f2only\f1 at the end of a segmentation
(see section 8.2 for more details).  During debugging it is advisable
not to declare any categories in the class \f2NonInflect\f1.
.LP
The other reason for failure to segment the surface form is that the
spelling rules disallow the segmentation.  To help with this problem 
a spelling rule debugger is provided.  The debugger takes a lexical
form and a surface form and compares them using the currently loaded set of
spelling rules. It displays which rules are being used during a match 
and explains the reason for any failure.
.LP
Before using the spelling rule debugger the user must know what the
intended lexical string and surface string (including nulls) are.  For example
(using the example spelling rules set in section 6.5) the lexical and
surface strings for \f2moved\f1 would be 
.VS L
      Lexical string:  move+ed
      Surface string:  mov00ed
.VE
The spelling rule debugger has two modes of analysis, ``STEP''  (or ``s'',
the default) and ``INFO'' (or ``i'').  ``STEP''
steps through each pair describing the current state of the matching rules,
pausing at each stage (waiting for a RETURN to continue);
while ``INFO'' simply attempts to match the two strings and gives a summary 
of the results at the end.  Note that the answer ``q'' to the stepper
will quit the debugger and return to the main command interpreter.
For example (again using the example spelling
rule set in section 6.5), an analysis of ``flies'' is shown below.
.VS L
> spd
Enter Lexical string: fly+s
Enter Surface string: flies  
Which mode ? s
Debug Mode is STEP

Lexical string :  f l y + s
                  ^
Surface string :  f l i e s
The pair (f f) was licensed by:  DEFAULT
Left hand side(s) active: I-Spelling Elision Elision Y-replacement
 
Lexical string :  f l y + s
                    ^
Surface string :  f l i e s
The pair (l l) was licensed by:  DEFAULT
Left hand side(s) active: I-Spelling Elision Elision Y-replacement 

Lexical string :  f l y + s
                      ^
Surface string :  f l i e s
The pair (y i) was licensed by:  Y-replacement
Right hand side(s) pending for
   the rule(s) Y-replacement for the pair (y i)
Left hand side(s) active: I-Spelling Epenthesis
                          
Lexical string :  f l y + s
                        ^
Surface string :  f l i e s
The pair (+ e) was licensed by:  Epenthesis
Right hand side(s) pending for
   the rule(s) Epenthesis for the pair (+ e)
   the rule(s) Y-replacement for the pair (y i)
Left hand side(s) active: I-Spelling 

Lexical string :  f l y + s
                          ^
Surface string :  f l i e s
The pair (s s) was licensed by:  DEFAULT
The following rule(s) have terminated: Y-replacement Epenthesis
Left hand side(s) active: I-Spelling Elision Elision Y-replacement
                          Epenthesis Epenthesis
.VE
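.LP
The pairs reported by the stepper, such as (y i) and (+ e), arise from
reading the lexical and surface strings in parallel, one symbol at a time.
The following LISP sketch shows that alignment, with each string written as
a list of symbols of equal length and with nulls written as 0.  The
functions \f3pair-up\f1 and \f3drop-nulls\f1 are hypothetical illustrations
and not part of the Analyser.
.VS L
```lisp
;; Sketch only: hypothetical helpers, not functions of the Analyser.

(defun pair-up (lexical surface)
  ;; LEXICAL and SURFACE are lists of the same length;
  ;; the result is a list of (lexical surface) pairs
  (cond ((null lexical) nil)
        (t (cons (list (car lexical) (car surface))
                 (pair-up (cdr lexical) (cdr surface))))))

(defun drop-nulls (symbols)
  ;; remove the null symbol 0, giving the string as the user sees it
  (cond ((null symbols) nil)
        ((equal (car symbols) 0) (drop-nulls (cdr symbols)))
        (t (cons (car symbols) (drop-nulls (cdr symbols))))))
```
.VE
Applied to the lists (f l y + s) and (f l i e s), \f3pair-up\f1 gives the
five pairs stepped through above.  Applied to (m o v 0 0 e d),
\f3drop-nulls\f1 gives (m o v e d), which is why the surface string
for \f2moved\f1 has to be entered with its two nulls.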
.LP
At each stage the name of the rule licensing the current pair is given.
This is usually \f3DEFAULT\f1 where the two characters are the same.
Also the names
of the rules currently active (both left and right hand sides) are listed.
The left hand side of a spelling rule is said to be \f2active\f1 when there is
a left context of the rule matching the left context of the current match.
Right hand sides of rules are \f2pending\f1 when the rule has licensed a pair
but the full right context has yet to be found.
Note that where a rule name appears twice this means there are two 
possible uses of the rule at that time.
.LP
There are three ways a match can fail.  Firstly the current pair may not be
licensed by any of the rules or the defaults.  There are two possible
reasons for this.  The first is that the 
pair never occurs anywhere in any rule or set of default pairs, or possibly
one of the characters has been omitted from one of the alphabets.  The other
reason is that although appearing in a rule, the correct left context has not
been found.  For example (using spelling rules in section 6.5) 
the following lexical and surface forms would fail
because of the unlicensed pair (+ e), as no context of the Epenthesis
rule has been found
.VS L
   Lexical string: month+s
   Surface string: monthes
.VE
The second reason for failure is rule blocking, which occurs
when a left and right context of a surface coercion or a combined rule
are found but the middle pair is not as stated in the rule.  Because these
forms of rule enforce a particular pair (or set of pairs) they will block when 
another pair is found.  For example (using spelling rules in section 6.5)
the following strings will fail because
the rule Epenthesis will block.
.VS L
   Lexical string: church+s
   Surface string: church0s
.VE
The final failure type is where a right hand side fails to match after
the rule has licensed a pair.  An example of this (again using section 6.5) 
is as follows:
.VS L
   Lexical string: church+y
   Surface string: churchey
.VE
This would fail because the pair (+ e) does not appear with a valid right
context as specified by the Epenthesis rule.
.LP
There are other types of problem that can occur (such as rule interaction
as described near the end of section 6.3).  Another possible direction
to take when debugging the spelling rules is to use the "cm" (concatenate
morphemes) command giving it the lexical string that is required.  This
will show which surface strings can correspond to the lexical string.
.NH 2
Word Grammar
.LP
If the spelling rules can successfully segment the surface string into the 
desired set of morphemes the next stage is to look at the word grammar.
First identify which rules should be used in the parse and check that
they are correct.  If there are sub-parts of the word which are also
words check that they can be recognised.
.LP
The word grammar is used to parse the segmented lexical string by a chart
parser.  To understand the word grammar debugger it is necessary to understand
a little about chart parsers.  For a reasonable description of them see
``Language as a Cognitive Process'' pages 116-127 (Winograd 1983).  The
word parser debugger allows the user to investigate which sub-structures
were successfully built during a word parse.  The command \f3db\f1 is used to 
look up a word for debugging.  This is effectively the same as the \f3w\f1
command but enters a debug loop after it has analysed the word.  
.LP
On entry into the debug loop the number of edges and vertices are displayed.
References to the edges and vertices are made as integers.  It
would be better if this chart structure could be displayed as a graph but
because of the lack of graphics within general LISPs this has been omitted.
To display an edge or vertex use the commands \f3de\f1 or \f3dv\f1 
respectively.
A list of edges (or vertices) may be given to these commands to make 
comparison between them easier.
.LP
As an option to the \f3dv\f1 command (display vertex) the user may display
all incomplete incoming edges to the given list of vertices (using 
the option \f3ii\f1).
The other option \f3co\f1 to \f3dv\f1 will display only complete outgoing edges
from the given list of vertices.
.LP
As usual the command \f3q\f1, \f3exit\f1 or EOF will leave the 
debugger and return
to the top level command loop.  The command \f3h\f1 or \f3?\f1 will 
display information
about all the debug commands.
.NH 2
Other Points
.LP
The above are some of the methods that can be used to help debug a 
dictionary description.  The system is complex and contains a lot of options
so care should be taken when writing rules.
.LP
If you find it impossible to detect why an apparently acceptable word is
failing to be analysed, please contact Graeme Ritchie at Edinburgh.  Either
there is a bug in the system or the debuggers are not helping the user
find a problem.  If you find any typical errors that seem to be particularly
difficult to find within the system please inform us.
.ds RH Section 5
.bp
.NH 1
User Specified Files
.NH 2
Overview and Notation
.LP
This section describes each of the three sub-parts of a dictionary
system.  In each description a discussion is given of the section and
how it may be used, followed by a detailed description of the file
formats.
.LP
Formal definitions of notation will use the \*QBackus-Naur\*U
formalism. (If you are not familiar with this consult Wulf et al.
(1981)).  In general it is fairly
transparent: the only special symbols are \*Q::=\*U  (like  the \*Qrewrite\*U
symbol -> in context-free 
grammar), \*Q|\*U,   which  means  \*Qor\*U,  and  \*Q*\*U,  which  means  \*Q0  or  more
occurrences of\*U. We have also used the symbol \*Q*1\*U to mean \*Q1  or
more occurrences of\*U. Notice that the symbols \*Q(\*U, \*Q)\*U are
\f2not\f1
part of BNF notation and so when they appear, they represent literal symbols
appearing in the format.
The BNF definitions have been annotated with the
occasional comment, preceded by a semi-colon. These interjections
should be clearly distinguishable from the actual definitions.
.LP
As far as the user is concerned, there are three separate sections;
any compatible set of three sections is sufficient for the Analyser to run.
By  altering  or rewriting the contents of these files the user can tailor the
package to specific requirements, or rebuild it completely from scratch.
.NH 2
Sharing information between files \(em #include
.LP
Each of the sections is basically stored in a separate file (though
some information may be shared between sections).
Each section is self-contained and all related declarations are made
within each section.  Where declarations need to be shared between 
two sections,
e.g. features and values in the word grammar and lexical entries, then
the \*Qinclude\*U facility can be used.  The compiler directive \*Qinclude\*U 
can be used only in the lexicon file and the word grammar file:
.VS
#include <filename>
.VE
This makes the Compiler include \f3<filename>\f1 in-line and 
interpret its contents.  (This is analogous to sharing header files in C 
programming).
.ds RH Section 6
.bp
.NH 1
Morphographemic Spelling Rules
.NH 2
Overview
.LP
These rules (called \*Qmorphographemic rules\*U) are concerned with
undoing spelling or phonological changes to 
recover the form of a word which corresponds to some morpheme entry in the 
lexicon. This version has been tested with a full set of rules for
(British) English spelling changes.
.LP
For example, \f2moved\f1 can be viewed as \f2move+ed\f1, but with the
deletion of the extra \f2e\f1; 
\f2provability\f1 can be viewed as \f2prove+able+ity\f1, with adjustments
occurring at both the internal boundary points.
.NH 2
Spelling Rules \(em Introduction
.LP
The formalism used within this system is based on the work of Koskenniemi
(1983a, 1983b, Karttunen 1983).  In earlier versions of this system the user 
needed to specify the spelling rules in a low level notation similar
to finite state automata, but now there is a high level notation based on
Koskenniemi (1985) allowing the rules to be written in a more readable form, 
and compiled into suitably interpretable structures
during the pre-processing stage.  The compilation process is
similar to Bear's method (Bear 1985), though the spelling rule form is
more like Koskenniemi's.
.LP
Before a more detailed description of the formalism is given a simple
example may help the reader to understand the notation.  The following example
describes the phenomenon of adding an \f2e\f1
when pluralising some nouns (also
making some verbs into their third person singular form).  For
example, \f2boys\f1 is analysed
as \f2boy+s\f1 while \f2boxes\f1 is analysed as \f2box+s\f1. This phenomenon is 
called \*Qepenthesis\*U:
.VS L
Epenthesis
    +:e  <=>  { < { s:s c:c } h:h > s:s x:x z:z } --- s:s
.VE
Note that this is only one way among several of describing this spelling 
change.  The
rule assumes that the morpheme \f2+s\f1
(see below for comments on the + character) is in the lexicon to represent the
plural morpheme. (Let us exclude for the time being its use as the third 
person singular morpheme).  Basically the epenthesis rule states that \f2e\f1
can be added at a morpheme boundary when and only when the boundary has 
\f2sh\f1, \f2ch\f1, \f2s\f1, \f2x\f1, or \f2z\f1 on the left side
and \f2s\f1 
on the right.  The braces in the rule indicate alternatives and
angled brackets indicate 
sequences.  The \*Q---\*U can be thought of as marking the position of the
symbol pair \f2+:e\f1 (see later for full description).
.LP
Now let us show how such rules may be developed.  Consider the example
\f2moved\f1.  The first point to 
understand about the rule formalism is that the rules describe relationships 
between the \*Qsurface form\*U, that is the actual word as it appears in a 
sentence, and the \*Qlexical form\*U, as it appears in the citation
forms of the 
lexical entries.  In the example above \f2moved\f1 is the surface form while 
\f2moveed\f1 is the lexical form.  What is required is a rule that allows the 
deletion of an \f2e\f1 from the lexical form.  Note that the rule should refer
to the context where the \f2e\f1 can be deleted and not just allow arbitrary 
deletions of \f2e\f1s in the lexical form as then the surface form \f2reed\f1 
would match \f2red\f1 in the lexicon.
.LP
The spelling rules are specified as a pair (lexical symbol : surface
symbol), and the context in which that pair is acceptable.  One possible 
rule that allows deletion of an \f2e\f1 in \f2moved\f1 is
.VS
e:0 <=> < m:m o:o v:v > --- < e:e d:d >
.VE
The 0 (zero) symbol (the null symbol) in the rule
pair means \*Qmatch with no character\*U.
This rule is of course a very specific rule which only copes
with \f2moved\f1 (and also \f2removed\f1) but not words 
like \f2taped\f1, \f2prepared\f1 
or \f2taping\f1, \f2moving\f1, and \f2preparation\f1, all of which seem to 
exhibit the same phenomenon.
.LP
What we need to do is generalise our rule to say
that \f2e\f1 can be deleted when in the context \*Qconsonant on the left and
vowel on the right\*U.  This may look like:
.VS
e:0 <=> C:C --- V:V
.VE
where C is declared as the set of consonants and V the set of vowels.
This rule is now too general as it does not refer to the morpheme
boundary.  This means that \f2reed\f1 will not
match \f2reed\f1 in the lexicon as the rule states that an \f2e\f1
must be deleted even when there is no morpheme boundary.
.LP
There are within the Analyser system, deliberately, no built-in conventions
concerning morpheme boundaries.  The solution to our over-generalised
rule is to introduce some way of allowing the rule to stipulate the presence
of a morpheme boundary in the context.  One way to do this is to add a marker 
(some special character) to the lexical form of the morphemes
involved.  Rules would then be able to refer indirectly to morpheme 
boundaries by means of this special character in the context statement.
Within the description
of English distributed with the system the citation form of each suffix starts
with the character \*Q+\*U.  This means we have morphemes of the lexical 
form \f2+ed\f1, \f2move\f1, \f2+ing\f1, \f2+ation\f1, etc.  Our e-deletion 
rule will now be of the form
.VS
e:0 <=> C:C --- < +:0 V:V >
.VE
which describes e-deletion more accurately, though it is still a little too
general. The rule in our English description is
.VS
e:0  <=>  =:C2 --- < +:0 V:= >
        or   < C:C V:V > --- < +:0 e:e >
        or   { g:g c:c } --- < +:0 { e:e i:i } >
        or   l:0 --- +:0
        or   c:c --- < +:0 a:0 t:t >  ;; A-deletion
.VE
The \*Qor\*U operator allows the user to state alternative contexts for the
rule pair \f2e:0\f1, whereas the \*Q{}\*U notation only specifies local
alternatives within either a left or right context.
.LP
Each context is for particular cases: the first allows words 
like \f2moved\f1 as \f2move+ed\f1; the second 
allows \f2argued\f1 as \f2argue+ed\f1;
the third allows \f2encouraging\f1 as \f2encourage+ing\f1 but also
copes with \f2courageous\f1 as \f2courage+ous\f1; the fourth context deals
with e-deletion in words like \f2readability\f1 as \f2read+able+ity\f1;
and the last context allows e-deletion in \f2reduction\f1 
as \f2reduce+ation\f1.
.NH 2
Spelling Rule Formalism \(em More Detail
.LP
This section describes the rule formalism in detail and discusses some
of the issues in choosing types of rule.  The full BNF for the rules is 
given in the next section (6.4).
.LP
A set of spelling rules consists of zero or more rules.  Each rule
is of the form 
.VS L
<rule> ::= <name> <pair> <operator> <contexts> <where clause>
.VE
.VS L
<operator> ::=    \f3=>\f1  |  \f3<=\f1  |  \f3<=>\f1 
.VE
.VS L
<contexts> ::=  <simple context> 
             |  <simple context> \f3or\f1 <contexts>
.VE
.VS L
<simple context> ::= <context expr> \f3---\f1 <context expr>
.VE
.VS L
<where clause> ::= 
         \f3where\f1 <where variable name> \f3in\f1 <enumerated set>
         |                           ;; empty
.VE
where <name> is an atomic name, and
<context expr> (the left and right contexts) is a regular expression
using pairs of the form (lexical symbol : surface symbol).  The three
possible values for <operator> are: <=, => or <=>, which represent forms of
implication. 
.LP
Context Restriction:
.VS
a:b  =>  LC --- RC
.VE
means the lexical character \f2a\f1 can match the surface character \f2b\f1
only when it is in the context of LC and RC, hence
\f2a:b\f1 cannot appear in any other context.
.LP
Surface Coercion:
.VS
a:b  <=  LC --- RC
.VE
means in the context LC and RC a lexical \f2a\f1 can only be matched with 
a surface \f2b\f1 and nothing
else.
.LP
Combined Rule:
.VS
a:b  <=>  LC --- RC
.VE
this is equivalent to the combination of the context restriction and surface
coercion rules.  It means \f2a\f1 must match \f2b\f1 in the context LC and RC 
and only in that context.
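.LP
The three operators can be read as simple logical conditions on a single
pair.  The LISP sketch below states them that way; it is an illustration of
the definitions above rather than the Analyser's own matching code, and the
function names are hypothetical.  A pair is written as a two-element list
(lexical surface), and the argument \f3in-context\f1 is assumed to
be \f3t\f1 exactly when both LC and RC of the rule match around the pair.
.VS L
```lisp
;; Sketch only: hypothetical helpers, not functions of the Analyser.

(defun restriction-ok (rule-pair actual-pair in-context)
  ;; a:b => LC --- RC : the pair a:b may occur only inside the context
  (or (not (equal actual-pair rule-pair)) in-context))

(defun coercion-ok (rule-pair actual-pair in-context)
  ;; a:b <= LC --- RC : inside the context a lexical a must surface as b
  (or (not in-context)
      (not (equal (car actual-pair) (car rule-pair)))
      (equal (cadr actual-pair) (cadr rule-pair))))

(defun combined-ok (rule-pair actual-pair in-context)
  ;; a:b <=> LC --- RC : both conditions must hold
  (and (restriction-ok rule-pair actual-pair in-context)
       (coercion-ok rule-pair actual-pair in-context)))
```
.VE
With the rule pair (+ e), \f3combined-ok\f1 accepts the pair (+ e) inside
the context and the pair (+ 0) outside it.  It rejects (+ e) outside the
context (an unlicensed pair) and (+ 0) inside it (rule blocking).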
.LP
Rules will typically be written only for pairs \f2a:b\f1
where \f2a\f1 and \f2b\f1 are different characters.  It is built into the 
formalism that unless otherwise restricted, all feasible pairs (see
below for definition) are accepted in any context.
.LP
Pairs are made up of a lexical symbol and a surface symbol separated 
by a \*Q:\*U
.VS L
<pair> ::= <lexical symbol> \f3:\f1 <surface symbol>
.VE
A lexical symbol can be one of three types: a lexical character
from the declared lexical alphabet; a lexical set, declared over a range
of lexical characters; or the symbol 0 (zero) which represents the null
symbol.  Similarly there are three possibilities for the surface symbol.
.LP
The left and right contexts are basically regular expressions.  Their
syntax is
.VS L
<context expr> ::=  <pair>
    |   \f3<\f1 <itemlist> \f3>\f1     ; sequential items
    |   \f3{\f1 <itemlist> \f3}\f1     ; or choice of items
    |   \f3(\f1 <itemlist> \f3)1+\f1   ; one or more occurrences

<itemlist> ::= <context expr>
           | <context expr> <itemlist>
.VE
Although alternatives can be specified within a left or right context
using the \*Q{ <itemlist> }\*U construct,
we also need the ability to allow alternatives for full contexts.  If
separate rules were given for each alternative left and right context
there would be the undesirable effect of each one blocking the other.
Rules are effectively ANDed together, so to get an OR choice for contexts
there is the \*Qor\*U connective, as used in \*Qe-deletion\*U above.
.LP
The interpretation of pairs containing sets depends on the 
\*Qfeasible pairs\*U.
When the rules are compiled, after the 
\*Qwhere\*U clauses
are expanded (see below), all rules are searched for what are termed 
\*Qconcrete pairs\*U.
Concrete pairs are those
which are made up of characters in the alphabets or null symbol only (i.e.
containing no sets).  In addition to these concrete pairs found from the 
rules, identity pairs made from the intersection of the lexical and 
surface alphabets are added.  This set represents the 
\*Qfeasible pairs\*U.
Pairs containing sets, such as \f2V:V\f1 where the lexical
set V is { a e i o u y }
and the surface set V is { a e i o u y } are interpreted as all feasible
pairs that match.  If \f2y:i\f1 is a feasible pair then it will 
match \f2V:V\f1.
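.LP
The construction of the feasible pairs can be sketched in Python as
follows.  This is a minimal sketch under stated assumptions, not the
system's implementation: the names \f2feasible_pairs\f1 and
\f2matches\f1, and the use of tuples and Python sets, are invented for
the example.
.VS L
```python
def feasible_pairs(lexical_alphabet, surface_alphabet, rule_pairs, sets):
    """Concrete pairs found in the rules, plus identity pairs.

    rule_pairs: (lexical, surface) pairs as written in the rules,
                after the where clauses have been expanded.
    sets:       the names of all declared lexical and surface sets.
    """
    # A concrete pair contains no set names, only characters or 0.
    concrete = {(l, s) for (l, s) in rule_pairs
                if l not in sets and s not in sets}
    # Identity pairs come from the intersection of the two alphabets.
    shared = set(lexical_alphabet) & set(surface_alphabet)
    identity = {(c, c) for c in shared}
    return concrete | identity


def matches(pair, lexical_sets, surface_sets, feasible):
    """All feasible pairs that a (possibly set-valued) pair stands for."""
    lex, surf = pair
    lex_ok = lexical_sets.get(lex, {lex})
    surf_ok = surface_sets.get(surf, {surf})
    return {(l, s) for (l, s) in feasible
            if l in lex_ok and s in surf_ok}


V = set("aeiouy")
pairs = feasible_pairs("aeiouy+", "aeiouy",
                       [("y", "i"), ("V", "V")], {"V"})
vv = matches(("V", "V"), {"V": V}, {"V": V}, pairs)
```
.VE
Here \f2vv\f1 contains the identity pairs for the six vowels together
with \f2y:i\f1, because \f2y:i\f1 occurs as a concrete pair in the
(assumed) rule list.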
.LP
In addition to the definition above for feasible pairs there is the 
facility to specifically add other pairs.  This may be useful where
some pair in a rule contains a set and the user wishes it to stand
for some concrete
pair that does not actually exist in any of the currently specified rules.
For example 
the pair \f2+:=\f1 may be used, where = is a set ranging over the whole surface
alphabet.  The user may intend this pair to stand for, among 
others, \f2+:l\f1, although \f2+:l\f1 does not actually appear in any of the 
rules.  In this case, \f2+:l\f1 should be declared as a Default 
Pair (see below).
.LP
An addition to the formalism, which is formally not needed, is the 
introduction of a 
\*Qwhere\*U clause.
This saves the user typing separate rules
for similar phenomena. 
A good example can be seen in the rule for consonant doubling (gemination):
.VS
+:X  <=>  < C:C V:V =:X > --- V:V
        where X in { b d f g l m n p r s t }
.VE
The rule is effectively duplicated 
with the variable \f2X\f1 bound to each member of the set in turn.  Note, if
a \*Qwhere\*U clause were not used and \f2X\f1 declared as a set ranging over
{ b d f g l m n p r s t }, the value found for \f2X\f1 in the rule
pair \f2+:X\f1 would not necessarily be the same value for \f2X\f1 in the left 
context.  There would be no point in changing the interpretation of these sets
to that of the \*Qwhere\*U variables 
as we do not want the \f2V:V\f1 in the left context necessarily to be the 
same \f2V:V\f1 in the right.  
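.LP
The duplication performed for a \*Qwhere\*U clause can be pictured with
the following Python sketch.  It is an illustration under simplifying
assumptions: the rule is held as a list of whitespace-separated tokens,
quoted characters are not handled, and the function name
\f2expand_where\f1 is invented for the example.
.VS L
```python
def expand_where(rule_tokens, variable, values):
    """Return one copy of the rule for each value of the variable."""
    expanded = []
    for value in values:
        copy = []
        for token in rule_tokens:
            if ":" in token:
                lex, surf = token.split(":")
                # Bind the variable on either side of the pair.
                lex = value if lex == variable else lex
                surf = value if surf == variable else surf
                token = lex + ":" + surf
            copy.append(token)
        expanded.append(" ".join(copy))
    return expanded


gemination = "+:X <=> < C:C V:V =:X > --- V:V".split()
rules = expand_where(gemination, "X", "bdfglmnprst")
```
.VE
Each of the eleven resulting rules has the same character in the rule
pair and in the left context, which is exactly the linkage that a set
declaration could not express.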
.LP
Any number of spelling rules can be specified (the sample English description
has 16).  These rules are applied in parallel to the matching of the 
surface form and the lexical forms.  For a match to work, 
all rules must find it acceptable.  All members of the set of feasible pairs 
not on the left-hand side of some rule (i.e. \f2a:a, b:b, c:c\f1, etc.) are 
accepted in any context.
.LP
The rules refer to single characters (although set names may be
of arbitrary length).  This means that the surface and lexical alphabets
have to be lists of single character symbols.  Any normal typable characters
may be used, but the following characters must be surrounded by double
quotes:
.VS
space, tab, newline, comma, ), (, #, ;, :, }, {, >, <, ], [, !
.VE
A double quote itself is specified
as a double double quote within quotes, e.g. """" represents one quote.
.LP
Note that when a rule pair \f2a:b\f1 from some
rule A with the operator <=> or => also appears within a context of some rule 
B, the user must take care to ensure that the context where \f2a:b\f1 appears 
within rule B is catered for in rule A.  An example will help to illustrate 
this point.  Consider the following two rules:
.VS L
Elision
     e:0  <=>  =:C2 --- < +:0 V:= >
       or   < C:C V:V > --- < +:0 e:e >
       or   { g:g c:c } --- < +:0 { e:e i:i } >
       or   l:0 --- +:0

A-deletion 
     a:0  <=>  < c:c e:0 +:0 > --- t:t
.VE
The \f2e:0\f1 in the left context of the A-deletion rule is in a context that 
is not catered for within
the Elision rule.  This means that A-deletion will always fail.  What is
required is the addition of another context to the Elision rule:
.VS L
       or   c:c --- < +:0 a:0 t:t >  ;; A-deletion
.VE
This rule-clashing is a problem and it does seem to be a common error
that occurs when specifying spelling rules.  It may be possible in future
versions of the system to have an automatic check for clashing pairs 
during compilation.
.LP
Another decision the user has to make is when to treat a given alternation as 
morphographemic, and when to treat it by writing distinct morpheme entries.
For example, it seems ridiculous to go as far as writing the following rule:
.VS
o:e  <=> g:w --- < +:0 e:n d:t >
.VE
which will match \f2went\f1 to \f2go+ed\f1.  This rule is in fact insufficient
as it introduces the pairs \f2g:w\f1, \f2e:n\f1 and \f2d:t\f1 into the 
feasible pairs set and thus allows \f2wear\f1 to match \f2gear\f1 etc.  If 
this rule were to be included
then three more would be needed to cope with these three extra 
pairs.  But rules that match surface forms to such different lexical forms
are not recommended.  It seems wise to have \f2went\f1 as a morpheme entry 
with the necessary past tense marking.  \f2Went\f1 is a clear example but some
others are not so clear.  Should \f2written\f1 match \f2write+en\f1?  
.LP
The question is whether a change is to be taken as a different morpheme or 
just as a spelling change.  The answer to this problem is up to the user, and
the best advice is to choose whichever is more elegant.  (The definition 
of \*Qelegance\*U is left as an exercise for the reader.)
.NH 2
Spelling Rules \(em User File Format
.LP
The format for the \f3*.sp\f1 file is as follows.
.VS L
<spelling rules> ::= \f3SurfaceAlphabet\f1 <enumerated set>
                     \f3SurfaceSets\f1 <set declaration> *
                     \f3LexicalAlphabet\f1 <enumerated set>
                     \f3LexicalSets\f1 <set declaration> *
                     \f3DefaultPairs\f1 <pair> *
                     \f3Rules\f1 <rule> *
.VE
.VS L
<enumerated set> ::= \f3{\f1 <alphabet char> * \f3}\f1
.VE
.VS L
<alphabet char> ::= <surface character> 
                  | <lexical character>
                  | \f30\f1            ;; null symbol
.VE
.VS L
<surface character> or <lexical character> ::=
            <single non special character>
          | <quoted character>  ;; e.g. " " for space, """" for "
.VE
.VS L
<set declaration> ::=  <atom name> \f3is\f1 <enumerated set>
.VE
.VS L
<rule> ::= <name> <pair> <operator> <contexts> <where clause>
.VE
.VS L
<operator> ::=    \f3=>\f1  |  \f3<=\f1  |  \f3<=>\f1 
.VE
.VS L
<contexts> ::=  <simple context> 
             |  <simple context> \f3or\f1 <contexts>
.VE
.VS L
<simple context> ::= <context expr> \f3---\f1 <context expr>
.VE
.VS L
<context expr> ::=  <pair>
       |   \f3<\f1 <itemlist> \f3>\f1     ; sequential items
       |   \f3{\f1 <itemlist> \f3}\f1     ; or choice of items
       |   \f3(\f1 <itemlist> \f3)1+\f1   ; one or more occurrences
.VE
.VS L
<itemlist> ::= <context expr>
            | <context expr> <itemlist>
.VE
.VS L
<where clause> ::= 
         \f3where\f1 <where variable name>
                 \f3in\f1 <enumerated set>
         |         ;; no where clause
.VE
.VS L
<pair> ::= <lexical symbol> \f3:\f1 <surface symbol>
.VE
.VS L
<lexical symbol> ::= <lexical character>
                   | <lexical set name>
                   | <where variable name>
                   | \f30\f1                   ; null symbol
.VE
.VS L
<surface symbol> ::= <surface character>
                   | <surface set name>
                   | <where variable name>
                   | \f30\f1                   ; null symbol
.VE
Comments can appear anywhere in the file; they begin with a semi-colon
and continue to the end of the line (following the LISP convention).
.NH 2
Spelling Rules \(em Example File
.LP
As an example of a set of spelling rules, the five rules described
in Karttunen and Wittenburg (1983) could be specified as follows:
.VS L
SurfaceAlphabet
        { a b c d e f g h i j k l m n o p q r s t u v w x y z }
.VE
.VS L
SurfaceSets
        C is { b c d f g h j k l m n p q r s t v w x z }
        CC is { b d f h j k l m n p q r s t v w x y z }
        NA is { b c d e f g h j k l m n o p q r s t u v w x y z }
        S is { s x z }
        V is { a e i o u }
        = is { a b c d e f g h i j k l m n o p q r s t u v w x y z }
.VE
.VS L
LexicalAlphabet      ;; comments can appear if required
        { a b c d e f g h i j k l m n o p q r s t u v w x y z + }
.VE
.VS L
LexicalSets
        C is { b c d f g h j k l m n p q r s t v w x z }
        NA is { b c d e f g h j k l m n o p q r s t u v w x y z + } 
        S is { s x z }
        V is { a e i o u }
        = is { a b c d e f g h i j k l m n o p q r s t u v w x y z + } 
.VE
.VS L
DefaultPairs
        +:0 e:0    ;; actually not necessary in this description
.VE
.VS L
Rules

Epenthesis
        +:e   <=>  { < s:s h:h > S:S y:i }  ---  s:s
                or  < c:c h:h > --- s:s
.VE
.VS L
Gemination
        +:X  <=>  < V:V =:X > --- V:V
                where X in { b d f g l m n p r s t }
.VE
.VS L
Y-replacement
        y:i   <=>  C:C --- < +:= NA:NA >
.VE
.VS L
Elision
        e:0   <=>  =:CC --- < +:0 V:V >
                or  < C:C V:V > ---  < +:0 e:e >
                or  { g:g c:c } ---  < +:0 { e:e i:i } >
.VE
.VS L
I-Spelling
        i:y   <=>  =:= --- < e:0 +:0 i:i >
.VE
.ds RH Section 7
.bp
.NH 1
Word Grammar Rules and Feature Defaults
.NH 2
Word Grammar - Introduction
.LP
This file is concerned with derivational and inflectional
morphology and consists of a set of declarations and a
sequence of word-structure rules.
These rules describe what constitutes an allowable sequence of morphemes,
stating which concatenations are valid, and the category of the 
overall
word formed by several morphemes. For
example, \f2happy+ness\f1 is a valid noun, but \f2arrive+ness\f1 is not
a valid word.
.NH 2
Features and Categories
.LP
The word grammar is based on the concept of features and values.
Any constituent (morpheme, word, word-part, etc.)  can be represented by
a set of features and values, called a category.
.NH 3
Syntax of Categories
.LP
There are two ways of writing categories:
a simple \*QLISPish\*U notation,
and a notation closer to that used in GPSG.  For example, the
category of a plural noun can simply be represented as:
.VS
((N +) (V -) (PLU +) (BAR 0))
.VE
The other notation relies on the use of declared aliases (see 7.4 below)
so that with the appropriate aliases declared the above category can also be
written:
.VS
Noun[PLU +,BAR 0]
.VE
That is, an alias followed by a feature \f2bundle\f1 in square brackets. The
above syntax applies regardless of the style of unification \(em see sections
7.6.1 and 7.6.2.
.NH 3
Feature Definitions
.LP
All features used in the word grammar (and lexical entries) must
be declared to the Analyser system.  There are two types of features,
\f2atomic-valued\f1
and
\f2category-valued\f1.
Atomic-valued features must be declared with an enumerated set of 
atomic values.  Category-valued features can take any valid category as
their value.  These are declared using the keyword \f3category\f1 (or
\f3CAT\f1), e.g.
.VS L
Feature  N    {+,-}
Feature  BAR  {-1,0,1,2}
Feature  AGR  category
.VE
Although our sample English description uses particular feature names,
there is no need for the user to copy such conventions.  There is only
one restriction on the features declared.  If a feature of the name STEM
is declared, it \f2must\f1
be a category-valued feature.  This feature is used by the WSister
Convention (see section 7.9.3) and should not be used in any other
way.
.NH 2
Word Grammar Rules
.LP
The word grammar is a feature unification grammar (see section 7.6)
with rules of the
form:
.VS L
mother  ->  daughter1 daughter2 ... daughterN
.VE
where \f2mother, daughter1, daughter2\f1, etc. are categories made up of
features.  Rules may have one or more daughters.  In addition to 
simple categories the grammar may also contain    
variables and aliases (see below).
.NH 2
Aliases
.LP
Aliases are a short-hand for writing categories (and parts of 
categories).  They allow an atomic name to be associated with
a category (or part of a category), and hence can be used to
represent that category in 
a rule.  For example the aliases \f2Noun\f1 and \f2Verb\f1 might be declared
as:
.VS L
Alias  Noun = ((N +) (V -))
Alias  Verb = ((V +) (N -))
Alias  Prep = [-V,-N]
Alias  -V   = ((V -))
Alias  -N   = ((N -))
.VE
Note that either category notation may be used in declaring aliases and that 
aliases may be declared in terms of aliases.  Where an alias is declared
referring to itself the system displays an error.
.NH 2
Variables
.LP
There are two types of variables allowed within the categories in the 
grammar; \*Qrule-category variables\*U and \*Qfeature value variables\*U.
Rule-category variables range over specific categories, and are a short-hand 
for writing similar grammar
rules.  They are declared
with a range of possible values that must be stated as a list
of aliases.  Rule-category variables
can be used to capture generalisations in rules.  For example, in French
both nouns and adjectives can take a plural morpheme \f2s\f1 (which can
be represented by the category ((PLU +) (FIX SUF)) ). This phenomenon
could be described
using the following alias statements and rules:
.VS L
Alias    Adj  =  ((BAR 0) (N +) (V +)) 
Alias    Noun =  ((BAR 0) (N +) (V -)) 

(AdjPlural
        Adj[PLU +] ->  Adj[PLU -], [PLU +,FIX SUF]  )

(NounPlural
       Noun[PLU +] ->  Noun[PLU -], [PLU +,FIX SUF] ) 
.VE
Alternatively, the two rules can be written as one by declaring
a category variable:
.VS L
Alias    Adj  =  ((BAR 0) (N +) (V +)) 
Alias    Noun =  ((BAR 0) (N +) (V -)) 

Variable   C  =  {Adj,Noun}

(Plural
       C[PLU +]  -> C[PLU -], [PLU +,FIX SUF] )
.VE
Rule-category variables are \*Qcompiled out\*U during grammar compilation,
and are thus actually used to collapse a number of rules.
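.LP
The \*Qcompiling out\*U of a rule-category variable amounts to making one
copy of the rule per alias in the variable's range.  The Python sketch
below shows the idea only; the tuple representation of rules, the
function name \f2compile_out\f1 and the naming of the generated rules
are assumptions made for the example, not a description of the grammar
compiler.
.VS L
```python
def compile_out(rule, variable, aliases):
    """Expand a rule-category variable into one rule per alias.

    rule: (name, mother, daughters); each category is written as
          (head, bundle), where head is an alias, a variable or None.
    """
    name, mother, daughters = rule

    def bind(category, alias):
        head, bundle = category
        return (alias if head == variable else head, bundle)

    return [(name + "-" + alias,          # hypothetical naming scheme
             bind(mother, alias),
             [bind(d, alias) for d in daughters])
            for alias in aliases]


plural = ("Plural",
          ("C", "PLU +"),
          [("C", "PLU -"), (None, "PLU +,FIX SUF")])
rules = compile_out(plural, "C", ["Adj", "Noun"])
```
.VE
The two rules produced correspond to the AdjPlural and NounPlural rules
written out by hand above.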
.LP
Feature value variables, on the other hand,  can best be thought of 
as \*Qholes\*U that are filled in during parsing (although theoretically
they have equivalent semantics to rule-category variables, if we overlook
the distinction between abbreviations for finite sets and for infinite sets).
There are two
types of feature value variables; \f2atomic\f1-valued and \f2category\f1-valued
(category-valued variables are \f2not\f1 the same as rule-category
variables).  The distinction is analogous to that between the
atomic-valued features and category-valued features described above.
Atomic-valued variables are declared with an enumerated set of values,
while category-valued variables are declared with the keyword \f3category\f1
(or \f3CAT\f1):
.VS L
Variable   ALPHA  =  {+,-}
Variable   ?AGR   =   category
.VE
Feature value variables are not
compiled out at grammar compile time but are instantiated during parsing.
The ranges of feature value variables can be used to restrict the 
scope of rules.  They can also be used to \*Qcopy\*U values of
features up (and down) the parse tree.  For example, a compound
noun can be said to inherit its plural feature marking from the 
rightmost daughter.  Using feature value variables we can write
a rule that ensures that the compound noun will have the same PLU
marking as its rightmost daughter:
.VS L
Variable   ?X  =  {+,-}
Alias      N   =  ((BAR 0) (N +) (V -)) 

(NounCompound
      N[PLU ?X]   -> 
          N[PLU -],   ;; ensure basic noun
          N[PLU ?X]  )
.VE
Note that although atomic-valued variables can be thought of as a short-hand
for a number of rules, one for each value in the range of the variable,
category-valued variables cannot.  This is because there is potentially
an infinite number of categories that could be the value of a 
category-valued feature.
.LP
There are no typographical conventions built-in for specifying variables; the
user, however, may wish to adopt some convention such as all variables
starting with underscore or question mark.  This does make rules easier to read
but is in no way mandatory.
.LP
In addition to the use of variables for \*Qpassing\*U features
around during parsing there are some built-in feature passing conventions
(see Section 7.9 for more details).
.NH 2
Extension and Unification
.LP
Before a description of what constitutes a valid analysis can be given
two definitions are required.
.LP 
\f3Extension\f1
.IP (a)
A feature-value (either atomic or a category) is an extension
of any variable of an appropriate type.
.IP (b)
An atomic feature-value is an extension of itself.
.IP (c)
Category \f2A\f1 is an extension of category \f2B\f1 iff
for any feature \f2f\f1 in category \f2B\f1, there is a value of \f2f\f1 in
\f2A\f1
which is an extension of the value of \f2f\f1 in category \f2B\f1.
.ne 4
.LP 
\f3Unification\f1
.IP
The unification of two categories is the least specified category
that is an extension of both of them if such a category exists.  It is
possible that no such category exists, and in that case unification is 
undefined.
.LP
Intuitively, extension and unification can be thought of as the set relation
superset and the set operation union, respectively, with the extra
refinement of allowing at most one entry for each feature within a category.
The creation of the unification of two (or more) categories is referred
to as \*Qunifying\*U the categories and may cause variables to be bound
to particular values.
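.LP
The two definitions can be made concrete with a short Python sketch.
It is offered as an illustration under stated assumptions rather than as
the Analyser's algorithm: categories are Python dictionaries, atomic
values are strings, the class \f2Var\f1 and the function names are
invented, and the sketch does not track a variable that occurs in
several places (so it cannot \*Qcopy\*U a value between nodes).
.VS L
```python
class Var:
    """A feature value variable; values=None means category-valued."""
    def __init__(self, values=None):
        self.values = None if values is None else frozenset(values)


def is_extension(a, b):
    """True if value a is an extension of value b."""
    if isinstance(b, Var):
        if b.values is None:              # category-valued variable
            return isinstance(a, dict)
        return isinstance(a, str) and a in b.values
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, dict) and isinstance(b, dict):
        return all(f in a and is_extension(a[f], b[f]) for f in b)
    return False


def unify(a, b):
    """Least specified value extending both a and b, or None."""
    if isinstance(a, Var):
        a, b = b, a                       # keep any variable in b
    if isinstance(b, Var):
        if isinstance(a, Var):
            # Simplification: two variables unify only if their
            # ranges are identical.
            return a if a.values == b.values else None
        return a if is_extension(a, b) else None
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None
    result = dict(a)
    for f, value in b.items():
        if f in result:
            merged = unify(result[f], value)
            if merged is None:
                return None               # unification is undefined
            result[f] = merged
        else:
            result[f] = value
    return result


noun = {"N": "+", "V": "-"}
plural = {"PLU": "+", "N": Var({"+", "-"})}
both = unify(noun, plural)
clash = unify({"PLU": "+"}, {"PLU": "-"})
```
.VE
In this example \f2both\f1 holds N, V and PLU with the variable bound to
\*Q+\*U, and it is an extension of \f2noun\f1; \f2clash\f1 is None
because no category can extend two categories that disagree on PLU.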
.LP
The definition of a category, and how categories are used to represent parts
of the dictionary depends on the type of unification chosen in the system.
The choice is between \f3term unification\f1 and \f3unrestricted
unification\f1.
They are formally equivalent but in their normal usage they encourage
quite different viewpoints of what a category represents.
.NH 3
Unrestricted Unification
.LP
Both our large English descriptions are written using unrestricted unification
(though that fact is not very important in the Simple description).
.LP
This method of representation and unification is based heavily on 
the GPSG model of syntactic features (cf. Gazdar et al. (1985), chap. 2). 
The important distinction in the unrestricted unification version is that 
there is no formal concept of category types.  A category
is simply represented as \f2any\f1
set of features with values, where a feature value is atomic or a category \(em
although a description will implicitly have varying category types (since
there will be 
restrictions on which features can appear together in the one category).
.LP
This means that a category can be
used in a grammar rule to refer to many \f2user-\f1category-types while
this is not true in the term unification version.  However, the very general
unification process required is computationally intensive, and  not as
amenable to optimisation as term unification (see below).
.NH 3
Term Unification
.LP
In term unification there are a pre-declared number of category types
defined using the \f3CatDef\f1 definition.  A category definition consists
of an atomic name and a list of feature names that are part of that category.
For example:
.VS L
   CatDef  Noun has {CAT,PLU,COUNT,SUBCAT}
.VE
Note that \f2each\f1 category that appears in the lexical entries and grammar
rules \f2must be of one and only one\f1 category type, though 
it need not actually
specify all its features explicitly.  Also it must be the case that any
reference to a category must be such that the category type can be identified
solely on the feature names it contains - \f2not\f1 their values.
Note that the GPSG representation of nouns and verbs (using the features \f3N\f1
and \f3V\f1) does not allow this distinction and if that representation
is to be used it is necessary to ensure, for example, that all references to 
verbs contain the feature \f3VFORM\f1 and all noun references contain 
\f3NFORM\f1.  This assumes that the linguistic categories of
noun and verb are to be distinguished within the system.
.LP
User written categories are expanded to contain variable values for those
features that have not been mentioned.  When the categories are
returned or printed the full feature list is given.  Variable values
are denoted by a list consisting of a number (unique to that variable),
the atom \f3<UNBOUND-VARIABLE>\f1 then the range of the variable.  Thus
all information that the word parser has is returned to the user of the
system.  The expansion of the basic category to the full term unification
category
is done after all other syntactic sugar has been applied (such as aliases).
.LP
Note that
in most of the examples given in this document an unrestricted unification
category is assumed.
.LP
The definitions for extension and unification (above) hold for both types
of unification.
However, it is worth adding that in term unification two categories
can unify (or that one category is an extension of the other) only if
they are of the same type.
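.LP
The requirement that a category's type be identifiable from its feature
names alone, and the expansion of a category to its full feature list,
can be sketched in Python.  This is a hedged illustration: the function
names are invented, the two category definitions are the Verb and Noun
examples used in the Declarations section, and an unbound variable is
shown as a plain marker string rather than the (number, atom, range)
list that the system actually returns.
.VS L
```python
CATDEFS = {
    "Verb": ["VFORM", "PN", "TENSE", "SUBCAT"],
    "Noun": ["PLU", "COUNT", "SUBCAT"],
}
UNBOUND = "<UNBOUND-VARIABLE>"


def category_type(category, catdefs=CATDEFS):
    """Name of the one category type whose features cover category."""
    names = [name for name, features in catdefs.items()
             if set(category) <= set(features)]
    if len(names) != 1:
        raise ValueError("category type cannot be identified uniquely")
    return names[0]


def expand(category, catdefs=CATDEFS):
    """Fill in every unmentioned feature with an unbound marker."""
    features = catdefs[category_type(category, catdefs)]
    return {f: category.get(f, UNBOUND) for f in features}


full = expand({"PLU": "+"})
```
.VE
A reference such as {SUBCAT ...} alone would be rejected by
\f2category_type\f1, since SUBCAT belongs to both definitions and so
does not pick out a single type.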
.NH 2
Defining Structures with Grammar Rules
.LP
The Analyser finds all possible structures for a given word.
Each structure can be thought of as a tree, where each node is labelled with:
.IP (a)
a syntactic category, and
.IP (b)
either a lexical entry (if a terminal node) or a word grammar rule (if a
non-terminal node).
.LP
If a non-terminal node \f2N\f1 has a syntactic category \f2C0\f1 and a rule
\f2A -> d1, d2, ..., dn\f1, then \f2C0\f1 must be an extension of \f2A\f1,
and \f2N\f1 must have \f2n\f1 daughter nodes \f2M1, ..., Mn\f1 labelled with
syntactic categories \f2C1, ..., Cn\f1 such that \f2Ci\f1 is an
extension of \f2di\f1 for each \f2i\f1 from \f21\f1 to \f2n\f1.  Also the 
feature conventions must be true for any non-terminal node (see section 7.9).
For terminal nodes, the syntactic category must be an extension of the
category in its lexical entry. 
.LP
The syntactic category on the 
root of the structure must be (in unrestricted unification) an extension of 
the Top Category in the word grammar, and (in term unification) a category
type that is declared in the Top Category definition in the word grammar.
.LP
When the STRINGSEGMENT formats are not selected (see Section 3.2),
the Analyser (function D-LookUp) returns all constituents that span
the given input and have a category that is defined by the 
declared Top Category.  When one of the STRINGSEGMENT formats is selected
a list of edges is returned representing all paths through the given input
made up of subparts
that have a category that is valid in terms of the Top Category as defined
above.
.NH 2
Declarations
.LP
A number of different types of declarations are required in the 
word grammar.  This section gives a short description of the 
syntax and use of each declaration type.  Some declarations
might be better shared with an associated lexicon.  This can be done with
the compiler directive \f3#include\f1 (see Section 5.2).
These declarations may appear in
any order in the file.  Briefly the declarations allowable in the 
grammar file are as follows:
.XP
Features : all features and their values that appear in the rules must be
declared.
The name of a feature and its allowable feature-values must be declared
together (i.e. the range of possible values must be included).
For example the
following declaration would define two features:
.VS 
Feature  Plu  {+,-}  
Feature  Agreement CAT
.VE
A feature must either be \*Qatomic-valued\*U
(like Plu in the above example), or \*Qcategory-valued\*U
(like Agreement in the above example); no feature can have
a range which is a mixture of atomic values and categories.  In the case of
a feature being atomic-valued its values should be enumerated, and if the 
feature is category-valued the keyword \f3category\f1 (or \f3CAT\f1) should
be used; that is,
no details can be given of the exact values which a category-valued
feature can take.
The association of a range with a feature can be thought of as analogous
to associating a type with the field of a record in Pascal.  The feature 
definitions must
be the same as those made in the lexicon file (see Section 8 below).
.IP
There is one built-in feature STEM \(em if this feature is declared, it
\f2must\f1
be declared as category-valued, otherwise an error will be signalled.  STEM
is used in the WSister feature passing convention (see Section 7.9).
.XP
Category Definitions:  these are only allowed in the term unification 
version of the system.  Each category that is used in the 
grammar rules and lexical entries must correspond to one and only one 
category type.  A definition consists of a name and a list of feature names.
All feature names in a category definition must also be declared in the 
normal way.  An example category definition may be
.VS L
   CatDef Verb has {VFORM,PN,TENSE,SUBCAT}
   CatDef Noun has {PLU,COUNT,SUBCAT}
.VE
Though note that when referencing a category of the above defined type
all that is necessary is to use a subset of the features that is not
also a subset of any other category type definitions.  Note also 
that in the word grammar rules each category must be of one and
only one type.  Thus using the above definitions it would not be possible
to write a rule which generalised over nouns and verbs (without either
having a more general category type or using rule-category variables).
.XP
Aliases : these can be used to associate a mnemonic symbol with a 
category; aliases may be declared in terms of other aliases, e.g.
.VS L
Alias Noun  = ((N +) (V -))
Alias Verb  = ((N -) (V +))
.VE
.XP
Variables : there are effectively three types of variable, rule-category 
variables
which are used as a short-hand in writing rules; atomic-valued variables;
and category-valued variables.  Only rule-category variables are compiled out
at grammar compile time; atomic-valued and category-valued variables
are instantiated during parsing.  Rule-category variables must be declared
as ranging over aliases representing categories.  Atomic-valued variables
must be declared with an atomic range, and category-valued variables
must be declared with the keyword \f3category\f1 (or \f3CAT\f1).
For example, assuming that \f2Ingform\f1, \f2Edform\f1, \f2Infform\f1,
\f2Presform\f1, have all been declared
as aliases, and that \*Q+\*U and \*Q-\*U are atomic values, we can declare:
.VS L
Variable  Verbtype = {Ingform,Edform,Infform,Presform} ;; rule-category
Variable  Plurality = {+,-}      ;; atomic valued
Variable  ?AGR  = category       ;; category valued
.VE
This would declare \f2Verbtype\f1 as a rule-category variable, \f2Plurality\f1
as an atomic-valued variable and \f2?AGR\f1 as a category-valued variable.
.XP
Feature-classes : certain classes of feature are useful for defining rules;
at present, there are three classes.  Two of these, \f2WHead\f1 and 
\f2WDaughter\f1, are
used within the feature-passing conventions (see Section 7.9).  These are
only allowed in the unrestricted unification version of the system.
The
third class is \f2MorphologyOnly\f1.  When the format is D-CATEGORYFORM or
D-STRINGSEGMENTCAT all features in the returned syntactic category that
are declared within the class \f2MorphologyOnly\f1 are removed.  This allows 
features which are not of interest to the sentence 
syntax to be removed from the analyses.
.VS L
FeatureClass WHead = {AGR,INFL}
FeatureClass WDaughter = {TAKES}
FeatureClass MorphologyOnly = {INFL}
.VE
.IP
would cause the features AGR and INFL to be affected by the Word-Head
Convention, and TAKES to be affected by the Word-Daughter convention.
Also the feature INFL would be removed from any category before it was
returned.
Notice:
.IP
(a) There are
no
built-in members of these feature-classes \(em if the user 
does not declare
any elements of WHead, WDaughter and MorphologyOnly no conventions will take
effect, and all features will be returned.
.IP
(b) There is no way for the user to affect the third convention (Word-Sister)
as it applies not to a feature-class but to one specific feature (STEM).
.IP
(c) The features mentioned in the feature-class definitions must also
be declared as features in the normal way.
.IP
(d) No feature should normally be declared to be both a WHead
and a WDaughter feature,
but no error will be reported if this is done.
.XP
LCategories : These specify necessary features in the returned form of 
a look up.  These definitions allow the user to specify that the
overall category of a looked up word should contain certain features adding
them with a variable value if necessary.  See section
7.10 for more details.  Note these are only allowed in the unrestricted
unification version of the system.
.XP
Feature Defaults : these allow specification of default values for features 
that are unspecified in a category (see Section 7.10 below).
.XP
Top Category : the syntax of this depends on the version of the system.
This definition is used to identify what a valid word is in a parse.
.IP
In unrestricted unification,
a valid word is any structure that has a category
which is an extension of this Top category.  If no Top category is 
declared a warning message is given and the Top category is set to 
the empty category.  This means all structures that span the given word
are valid.  If more than one Top category is declared then the latest
one is taken.  (When string segment options are selected - Section 3.2 -
this category is used to define what portion of a string counts as a word).
In the command interpreter this category may be reset during a session 
using the command \f3st\f1.
.IP
In term unification the Top category is a list of category types (names)
as defined in the category definitions.  If no Top category is defined
then an error is signalled as there must be a definition of valid words
for a word grammar to make any sense.
.LP
Declarations made here are referenced only within the grammar compilation.
But note that the feature declarations \f2must\f1
correspond exactly with those in the lexicon file (see Section 8).
It is therefore
wise to \*Qinclude\*U the feature declarations and have a common file included
by both the word-grammar file and the lexicon file (see Section 5.2).
If declarations are made which conflict, the system will 
indicate an error.
.LP
A valid analysis is one that spans the
entire input string and (if the format is D-CATEGORYFORM or D-WORDSTRUCTURE)
whose label is valid with respect to the Top category, i.e. a possible
complete word or (if the format is
D-STRINGSEGMENTCAT or D-STRINGSEGMENTWS) a set of edges (which may be nil)
that represent all segmentations of the given string in which
each edge is labelled with a category which is valid with respect to
the Top category.
.NH 2
Feature Passing Conventions
.LP
These are available only in the unrestricted unification version of the system.
.LP
Feature passing conventions can be thought of as a way of extracting various
patterns which occur in the word-grammar rules and stating them separately.
The effect of this is to diminish the amount of explicit information
that needs to be stated in the word-grammar rules, reducing both the
size of the word-grammar (the number of rules) and the complexity
of the individual rules.
These regularities can be expressed as \*Qfeature passing conventions\*U:
statements dealing with similarities which must hold between
the features in the derivation of words from the word grammar.
They can be thought of as rules for passing information \s-2UP\s+2 
the analysis tree (from terminal morphemes to the final word), or for passing
information \s-2DOWN\s+2 the analysis tree (from word to constituent 
morphemes). The style and content of these conventions are borrowed from 
the mechanisms employed by Generalised Phrase Structure Grammar at the level
of the sentence (Gazdar et al. (1985)), but the content is very close to
the percolation principles in Selkirk (1982).
.LP
There are three conventions built into the system at present.
Notice that the definitions
of the feature passing conventions themselves are not
under the control of the user, although the features that are
affected by 
the conventions may be controlled by suitable user declarations.
All three conventions act on certain specific
feature-classes,
so the user can make use of these conventions by defining certain features
to lie within these named classes (via the declaration in the \f3*.gr\f1 file
as described in Section 7.8 above).
The system will then automatically
apply the conventions to these features. A user who wishes to avoid
all use of the conventions should avoid declaring any features
to be in these feature-classes.  The user may also employ
variables to achieve similar effects to these conventions.
.LP
All three feature conventions act on what is called within GPSG
terminology a \*Qlocal tree\*U \(em a set of one mother node and its
immediate daughters.  The conventions apply \f2only\f1 to nodes with 
two daughters (i.e. created by binary branching rules).
The conventions are written in terms of \*Qmother\*U, \*Qleft daughter\*U 
and \*Qright daughter\*U.
The three conventions are \f2Word-Head\f1, \f2Word-Daughter\f1 and 
\f2Word-Sister\f1 and are described in the next three sections.
.NH 3
The Word-Head Convention
.IP
The values of the WHead features in the mother should be the same as the
values of the corresponding WHead features of the right daughter. 
.LP
In the word parser, this is achieved, roughly speaking,
by unifying the WHead features of
the right daughter and those of the mother when the daughter is attached.
WHead features are declared in the declarations file using the syntax:
.VS L
    FeatureClass WHead = 
        {<feature-1>,<feature-2>, ... ,<feature-N>}
.VE
For example:
.VS L
    FeatureClass WHead = {N,V,INFL,AFORM,VFORM,ADV,AGR,PLU}
.VE
From a linguistic point of view, the WHead features typically include those
that will be relevant to sentence-level syntax, and hence
those that will be of particular use to a sentence-parser which uses the
dictionary.  This convention is a straightforward analogue of the 
simplest case of the Head Feature Convention in Gazdar et al. (1985). Its
effect is to enforce identity of the relevant feature values between mother
and the head daughter.  Note that in the current system there is no formal
definition of \*Qhead\*U to which the user has access (despite the name given
to this convention), since the right
daughter always acts in this head-like fashion within our treatment of English
morphology.  Other analyses may deviate from this pattern, of course; 
different views of \*Qhead\*U may be implemented using variables and 
the definition of a well-formed structure given in section 7.7.
.LP
Assuming the set of WHead features defined above,
the Word-Head Convention would allow the following trees:
.VS L
((N +) (V -) (PLU +))
        ()
        ((BAR -1) (N +) (V -) (PLU +))
and
((N -) (V +) (VFORM ING))
        ((N -) (V +))
        ((BAR -1) (N -) (V +) (VFORM ING))
.VE
but not a tree of the form:
.VS L
((N +) (V +) (PLU +))
        () 
        ((BAR -1) (N +) (V -) (PLU +))
.VE
.LP
since it has a clash in the V value for mother and right
daughter.
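.LP
The check that separates the legal trees from the illegal one can be
sketched in Python. This is only an illustration and not the code of the
word parser: categories are modelled as dictionaries, the names WHEAD and
word_head_ok are hypothetical, and a feature that is missing from either
node is assumed to be compatible (as it would be under unification).
.VS L
```python
# Hypothetical sketch: a category is a dict from feature name to value.
WHEAD = {"N", "V", "INFL", "AFORM", "VFORM", "ADV", "AGR", "PLU"}


def word_head_ok(mother, right, whead=WHEAD):
    """Return True when no WHead feature clashes between mother and right daughter."""
    for feature in whead:
        if feature in mother and feature in right:
            if mother[feature] != right[feature]:
                return False
    return True


legal = word_head_ok({"N": "+", "V": "-", "PLU": "+"},
                     {"BAR": "-1", "N": "+", "V": "-", "PLU": "+"})
illegal = word_head_ok({"N": "+", "V": "+", "PLU": "+"},
                       {"BAR": "-1", "N": "+", "V": "-", "PLU": "+"})
```
.VE
.LP
The left daughter plays no part in the check, which is why it is not an
argument of the function.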
.NH 3
The Word-Daughter Convention
.LP
For each WDaughter feature:
.IP (a)
If it exists on the
right daughter then the value must be the same  on the mother as it is
on the right daughter.
.IP (b)
If the WDaughter feature does not exist on the right
daughter but does on the left daughter, it must have the same value 
on the mother as on the left daughter.
.LP
Again, this is ensured by carrying out unification of the
appropriate feature markings during parsing.
This convention is designed to capture the fact that the subcategorization
class of a word (in English) is not affected by
inflectional affixation, although it may be affected by derivation.
.LP
WDaughter features are declared in the declarations file, using the syntax:
.VS L
    FeatureClass WDaughter = 
        {<feature-1>,<feature-2>, ... ,<feature-N>}
.VE
For example:
.VS L
    FeatureClass WDaughter = {TAKES}
.VE
Assuming TAKES to be the only WDaughter feature, this convention allows
trees such as:
.VS L
((TAKES NP))
        ((V +) (N -))
        ((TAKES NP))

((TAKES NP))
        ((TAKES NP))
        ((VFORM ING))
.VE
but not
.VS L
((TAKES NP))
        ((V +) (N -))
        ((TAKES VP))

((TAKES NP))
        ((TAKES VP))
        ((VFORM ING))
.VE
In the first example the right daughter is specified for a TAKES 
value,
and the mother has the same specification; in the second example,
the right daughter has no specification for TAKES
and so the second clause of the WDaughter convention applies.  The third
example is illegal because the values of TAKES on the right daughter and 
mother differ, and the fourth is illegal because, under 
clause (b) of the convention,
the left daughter 
and mother WDaughter features must be identical when there are no WDaughter
features in the right daughter.  
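.LP
The two clauses can be sketched in Python as follows. This is an
illustration under stated assumptions, not the implementation: categories
are dictionaries, the function name word_daughter_ok is hypothetical, and
the test is strict identity on finished trees (the parser itself reaches
the same result by unification while the tree is being built).
.VS L
```python
def word_daughter_ok(mother, left, right, wdaughter=("TAKES",)):
    """Check clauses (a) and (b) of the convention on one local tree."""
    for feature in wdaughter:
        if feature in right:        # clause (a): right daughter decides
            source = right
        elif feature in left:       # clause (b): otherwise left daughter decides
            source = left
        else:
            continue                # neither daughter mentions the feature
        if mother.get(feature) != source[feature]:
            return False
    return True


first = word_daughter_ok({"TAKES": "NP"}, {"V": "+", "N": "-"}, {"TAKES": "NP"})
second = word_daughter_ok({"TAKES": "NP"}, {"TAKES": "NP"}, {"VFORM": "ING"})
third = word_daughter_ok({"TAKES": "NP"}, {"V": "+", "N": "-"}, {"TAKES": "VP"})
fourth = word_daughter_ok({"TAKES": "NP"}, {"TAKES": "VP"}, {"VFORM": "ING"})
```
.VE
.LP
The four calls correspond, in order, to the four trees shown above.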
.NH 3
The Word-Sister Convention
.IP
When one
daughter (either left or right) has the feature STEM,
the category of the other daughter must be an extension (superset)
of the category value of STEM.
.LP
This third convention enables affixes to be subcategorized for the
type of stem to which they attach.
Notice that this convention is
not
defined in terms of any feature-classes, but is defined using just one
feature (STEM). Hence, the way that the user makes use of this convention
is not by declaring the extent of feature classes (as for the other two
conventions), but by adding STEM specifications to the features in morphemes
in the lexicon, thereby indicating the combination
possibilities for each affix.
The following examples follow the convention:
.VS L
()
     ((N -) (V +))
     ((STEM ((N -) (V +))))

()
     ((V +) (N -) (INFL +))
     ((STEM ((N -) (V +) (INFL +))))
.VE
.LP
Note that the feature STEM is built into the system, and need not be 
declared by the user.  If the user does declare it, it must be declared as
a category-valued feature. If STEM is declared as atomic-valued an error
message is generated and the declaration is ignored.
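.LP
As a rough illustration, the extension test behind this convention can be
sketched in Python. The names is_extension and word_sister_ok are
hypothetical, categories are modelled as dictionaries, and the comparison
of feature values is assumed to be simple equality (nested category values
are not compared by extension here).
.VS L
```python
def is_extension(category, base):
    """True when category contains every feature-value pair of base."""
    return all(category.get(name) == value for name, value in base.items())


def word_sister_ok(left, right):
    """Each daughter bearing STEM constrains the category of its sister."""
    if "STEM" in left and not is_extension(right, left["STEM"]):
        return False
    if "STEM" in right and not is_extension(left, right["STEM"]):
        return False
    return True


verb_stem = {"N": "-", "V": "+"}
suffix = {"STEM": {"N": "-", "V": "+"}}
noun_stem = {"N": "+", "V": "-"}
```
.VE
.LP
With these values, word_sister_ok(verb_stem, suffix) holds, while
word_sister_ok(noun_stem, suffix) does not, since the noun stem is not an
extension of the STEM value of the suffix.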
.NH 2
Feature Defaults and LCategory Definitions
.LP
Feature Defaults are similar in concept to the Feature Specification
Defaults of Gazdar et al. (1985). They are statements which 
define values for particular
features in circumstances where no value has been entered
by other mechanisms (i.e. the original morpheme entries, the action of
the lexical rules, or the feature-passing conventions).
That is, they state what the value of a feature should be if there is
no information to indicate any other value for it.
The defaults are applied to all new constituents (words or parts of
words) built during morphological parsing.
There is an assumption that the morpheme entries have no need for defaults
as they can be fully expanded by lexical rules (as described in 
section 8.6).
The defaults are applied after all other mechanisms have taken action.
Hence, all possible sources
of information about a feature-value are allowed to take effect before
the defaults are considered.
(In terms of the active chart implementation of the parsing mechanism,
the default checking is done whenever a complete (i.e. inactive) edge
is entered into the chart).
.LP
At present, only very simple defaults are available, compared to the various
kinds of defaults proposed (for sentence-level grammar) by Gazdar et
al. (1985). All the user can do is define the default value for a given
feature (either a category-valued feature or an atomic-valued one).
For example, the statement
.VS
Defaults  BAR 0, AGR Inf
.VE
declares default values for two features (BAR and AGR), where \*QInf\*U
could be an alias for some category.
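.LP
The effect of such a statement on a newly completed constituent can be
sketched in Python. This is a minimal illustration, assuming categories
are dictionaries; the function name apply_defaults is hypothetical and the
alias Inf is left unexpanded as a plain string.
.VS L
```python
def apply_defaults(category, defaults):
    """Return a copy of category with defaults added for absent features only."""
    completed = dict(category)
    for feature, value in defaults.items():
        completed.setdefault(feature, value)   # an existing value always wins
    return completed


DEFAULTS = {"BAR": "0", "AGR": "Inf"}
plain = apply_defaults({"N": "+", "V": "-"}, DEFAULTS)
affix = apply_defaults({"BAR": "-1", "FIX": "SUF"}, DEFAULTS)
```
.VE
.LP
In the second call the existing BAR value is kept and only AGR is
filled in.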
.LP
LCategory definitions \**
.FS
LCategory Definitions are a temporary addition; there are
problems with them such as being ambiguous and not fully general and hence
they may change in later versions.
.FE
(which are allowed only in the unrestricted unification version)
are another way of adding a particular type of default.
These definitions act on the root category of a word (either the 
root category of a word structure tree or the syntactic category when
category form is selected).  They allow the user to specify which
features are necessary for a complete category.  If any features are not already
present then they are added with a variable value.  These are primarily
designed to cope with sentence level parsers that deal in \*Qterm
unification\*U where
the same features must exist in two categories before they may unify.  
That is, these definitions can form an interface between a morphological
analyser using \f2unrestricted\f1 unification and a sentence-level parser using
\f2term\f1 unification.
.LP
There are two forms, simple LCategory definitions and feature LCategory 
definitions.
The first of these specifies that a basic category implies a list of feature
names.  All root categories of a word that are extensions of the basic
category are checked against
the list of implied features, and if one of these features does not exist in
the root category it is added with a value of an atom starting with \f3@D\f1.
For example
.VS L
       LCategory [N +,V -] => {PLU}
.VE
This ensures that all words marked with ((N +) (V -)) will
also be marked with a value for \f2PLU\f1 or will be given a special
value starting with \f3@D\f1\**.
.FS
The LISP macro (D-BlankVariable) in the file subrout is used to define
what forms of value are used to fill in missing feature values.  This
may be modified by the installer of the system if necessary.
.FE
.LP
A feature LCategory consists of a category-valued feature name
followed by a category, then an implied list of features.  All root categories
that contain that category-valued feature with a value which is an extension
of the specified category are dealt with.
The value in the root category of the category-valued feature is then
checked against the list of implied features.
Any feature that is not in that category
is added with a special value 
starting with \f3@D\f1. For example
.VS
   LCategory AGR () => {PLU,PERS}
.VE
This definition ensures that values of the category-valued feature
\f2AGR\f1 in root categories of words will be marked with either real 
values for the features
\f2PLU\f1 and \f2PERS\f1 or with atoms starting with \f3@D\f1.
.LP
It is the intention that values starting with \f3@\f1 be treated as a form
of variable by the sentence level parser.
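.LP
A simple LCategory definition can be pictured with the following Python
sketch. It is an illustration only: the function name
apply_simple_lcategory is hypothetical, categories are dictionaries, and
the numbering scheme for the generated values (@D1, @D2 and so on) is an
assumption, since the actual form is fixed by the installer.
.VS L
```python
from itertools import count


def apply_simple_lcategory(root, basic, implied, fresh):
    """Add a variable value for each implied feature missing from a matching root."""
    matches = all(root.get(name) == value for name, value in basic.items())
    completed = dict(root)
    if matches:
        for feature in implied:
            if feature not in completed:
                completed[feature] = "@D" + str(next(fresh))
    return completed


fresh = count(1)
noun = apply_simple_lcategory({"N": "+", "V": "-"}, {"N": "+", "V": "-"}, ["PLU"], fresh)
plural = apply_simple_lcategory({"N": "+", "V": "-", "PLU": "+"},
                                {"N": "+", "V": "-"}, ["PLU"], fresh)
verb = apply_simple_lcategory({"N": "-", "V": "+"}, {"N": "+", "V": "-"}, ["PLU"], fresh)
```
.VE
.LP
Only the first root category is changed: the second already has a value
for PLU, and the third is not an extension of the basic category.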
.NH 2
Word Grammar \(em User File Format
.LP
The \f3*.gr\f1 file should have declarations for features and their values; 
aliases used in the grammar rules; variables used in the grammar rules; 
feature classes (for feature passing conventions); feature specification 
defaults; and the actual rules themselves.  The \f3include\f1 facility
(section 5.2) is useful 
here as the grammar must have the same feature and value declarations as the 
lexical 
entries.  Comments start with 
a semicolon and are terminated by the following newline.  
Options specific to unrestricted unification are commented with \f3UU only\f1
and those specific to term unification marked \f3TU only\f1.
The syntax of this file is:
.VS L
<word grammar> ::= Declarations <declaration> *
                   Grammar <word-structure-rule> *1
.VE
.bp
.VS L
<declaration> ::=  
      \f3Alias\f1 <alias name> \f3=\f1 <category>
    | \f3Variable\f1 <variable name> \f3=\f1 <variable range>
    | \f3Feature\f1 <feature name> <feature range>
    | \f3FeatureClass\f1 <feature class name> \f3=\f1 <feature list>
    | \f3Defaults\f1 <default list>
    | \f3LCategory\f1 <lcategory definition>            ;; UU only
    | \f3CatDef\f1 <cat name> \f3has\f1 <feature list>  ;; TU only
    | \f3Top\f1 \f3=\f1 <category>                      ;; UU only
    | \f3Top\f1 \f3=\f1 <category type list>            ;; TU only
.VE
.VS L
<variable range> ::= 
      \f3{\f1 <alias list> \f3}\f1         ;; rule-category variable
    | \f3{\f1 <feature value list> \f3}\f1 ;; atomic-valued variable
    | \f3category\f1 | \f3CAT\f1           ;; category-valued variable
.VE
.VS L
<feature value list> ::=
      <feature value> \f3,\f1 <feature value list>
    | <feature value>
.VE
.VS L
<alias list> ::=
      <alias name> \f3,\f1 <alias list>
    | <alias name>
.VE
.VS L
<feature range> ::=
      \f3{\f1 <feature value list> \f3}\f1 ;; atomic-valued feature
    | \f3category\f1 | \f3CAT\f1           ;; category-valued feature
.VE
.VS L
<feature class name> ::=
      \f3WHead\f1
    | \f3WDaughter\f1
    | \f3MorphologyOnly\f1
.VE
.VS L
<feature list> ::=
     \f3{\f1 <feature name list> \f3}\f1
.VE
.VS L
<feature name list> ::=
      <feature name> \f3,\f1 <feature name list>
    | <feature name>
.VE
.VS L
<category type list> ::=
     \f3{\f1 <category type name list> \f3}\f1
.VE
.VS L
<category type name list> ::=
      <cat name> \f3,\f1 <category type name list>
    | <cat name>
.VE
.VS L
<default list> ::=
      <feature name> <feature value> \f3,\f1 <default list>
    | <feature name> <feature value>
.VE
.VS L
<lcategory definition> ::=
      <category> \f3=>\f1 <feature list>
    | <category-valued feature> <category> \f3=>\f1 <feature list>
.VE
.VS L
<feature name>, <feature value>, <alias name>, <variable name> ::=
      <atomic-symbol>
.VE
.VS L
<word-structure-rule> ::=      ;; one or more daughters
      \f3(\f1 <name> <node-spec> \f3->\f1 <node-spec list> \f3)\f1
.VE
.VS L
<node-spec list> ::=
       <node-spec> \f3,\f1 <node-spec list>
   |   <node-spec>
.VE
.VS L
<node-spec> ::=  <category>
.VE
.VS L
<category> ::=
      <GPSG category form>
    | <simple category form>
.VE
.VS L
<GPSG category form> ::= 
      <variable name>                     ;; variables are only allowed 
   |  <variable name> <feature bundle>    ;; in grammar rule categories
   |  <alias name> 
   |  <alias name> <feature bundle>
   |  <feature bundle>
.VE
.VS L
<feature bundle> ::=
      \f3[\f1 <GPSG feature pair list> \f3]\f1
.VE
.VS L
<GPSG feature pair list> ::=
      <GPSG feature pair> \f3,\f1 <GPSG feature pair list>
   |  <GPSG feature pair>
.VE
.VS L
<GPSG feature pair> ::=
      <feature name> <variable name>   ;; only in rule category
   |  <feature name> <category>
   |  <alias name>
.VE
.VS L
<simple category form> ::= 
      \f3(\f1 <feature pair> * \f3)\f1
.VE
.VS L
<feature pair> ::=
      <alias name>
   |  <variable name>                     ;; only if a rule category
   |  \f3(\f1 <feature name> <variable name> \f3)\f1
   |  \f3(\f1 <feature name> <category> \f3)\f1
   |  \f3(\f1 <feature name> <feature value> \f3)\f1
.VE
.LP
Where items are declared twice, the second declaration is taken.
If declared, STEM must be a category-valued feature;
otherwise an error will
be signalled.  If STEM is not declared by the user, it will be declared 
automatically as category-valued.
.NH 2
Word Grammar \(em Example File
.LP
An example word grammar file (using unrestricted unification) is:
.VS L
;;
;;   an example word grammar 
;;   
Declarations
   Feature V {+,-}
   Feature N {+,-}
   Feature STEM category   
   Feature BAR {-1,0,1,2}
   Feature PN {PER1,PER2,PER3,PLUR}
   Feature PLU {+,-}
   Feature VFORM {EN,ED,ING,BSE}
   Feature AFORM {ER,EST}
   Feature INFL {+,-}
   Feature FIX {PRE,SUF}
.VE
.VS L
   Alias        Noun   =    ((BAR 0) (N +) (V -)) 
   Alias        Verb   =    ((BAR 0) (N -) (V +)) 
   Alias        Adj    =    ((BAR 0) (N +) (V +))
   Alias        Prep   =    ((BAR 0) (N -) (V -))
   Alias        PNoun  =    ((BAR 2) (N -) (V -))
   Alias        Prefix =    ((FIX PRE)(BAR -1))
   Alias        Suffix =    ((FIX SUF)(BAR -1))
.VE
.VS L
   Variable alpha  = {ING,ED,PAS}
   Variable WORDS  = {0,1,2}

   FeatureClass WHead = {PLU,PN,V,N,VFORM,AFORM,INFL}
   FeatureClass MorphologyOnly = {INFL}

   Top = [BAR WORDS]
.VE
.VS L
Grammar
   
   (PREFIXING
       [BAR 0] -> Prefix, [BAR 0]   )
 
   (SUFFIXING
       [BAR 0] -> [BAR 0], Suffix   )
.VE
.ds RH Section 8
.bp
.NH 1
Lexicon File
.LP
The entries are specified in a file \f3*.le\f1.  This file should contain
three parts: declarations of features, aliases, etc.; lexical rules;
and finally the entries themselves.  It is expected that the feature
declarations be common between the lexical file and the word grammar file.
This can be achieved by using the \f3#include\f1 facility.
.NH 2
Morpheme Entries
.LP
Each lexical entry is an n-tuple \(em in the current version n = 5,
although only two fields are actively used.
.VS L
<lexical-entry> ::=
     (  <citation-form> 
        <phonological-form>
        <syntax-field> 
        <semantics-field> 
        <user-field>  )
.VE
The <citation-form> is a form of a word for which there is a separate
entry.
However, it is not mandatory to have only one entry that contains
a given citation form - \*Qbank\*U, for example, might appear several
times in the lexicon, once for each \*Qmeaning\*U of the word.
.VS L
<citation-form> ::= <lexical-alphabet symbol> *1
.VE
Each <citation-form> represents what we
call a morpheme.  It is difficult to give a formal definition of what 
constitutes a morpheme.  Although we have borrowed the term from theoretical 
linguistics, we are using it in a sense which is rather different from any 
current in that field.  To some linguists, for example, the words \f2men\f1 
and \f2dogs\f1 are both realisations of two abstract entities, and these 
entities are termed morphemes.  However, we treat only the latter as being 
composed of two subparts.  Similarly, in our analysis of English morphology, 
the word \f2division\f1 is listed separately in the lexicon, rather than 
being derived from the obviously related \f2divide\f1.  There is in principle 
no reason not to decompose such items; the choice of how far to segment words
is the user's.  There is of course a trade-off between the number and 
complexity of spelling and grammar rules and the separate listing of such
irregular forms.  The plural \f2children\f1 could be treated as 
morphologically complex, either by writing spelling rules that change the 
general plural suffix \f2s\f1 to \f2ren\f1 in the desired context, or by
creating an entry for \f2ren\f1 as a plural suffix in its own right.  In the
latter case, its ability to attach to stems other than \f2child\f1 would need
to be restricted by means of a suitable STEM feature specification.  It 
is clearly 
simpler to take \f2children\f1 as being morphologically undecomposable.  
.LP
Any normal typable characters
may be used, but if any of the following characters are used, the entire
citation form must be surrounded by double quotes:
.VS
space, tab, newline, comma, ), (, #, ;, :, }, {, >, <, ], [, !
.VE
Double quotes may be used within citations by using a double double quote
within quotes, e.g. \f2"hello""world"\f1 will give the form \f2hello"world\f1 
in the lexicon.
(Notice that this set of characters requiring quotation is the same as
the set requiring quotation in spelling rules \(em see Section 6.3.)
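.LP
A tool that generates lexicon files could apply the quoting rule with a
helper along the following lines. This Python sketch is not part of the
system; the name quote_citation is hypothetical, and it assumes that a form
containing a double quote must itself be quoted.
.VS L
```python
# Characters that force quotation; chr(9) is tab and chr(10) is newline.
SPECIAL = set(" ,)(#;:}{><][!") | {chr(9), chr(10)}


def quote_citation(form):
    """Return form as it should be written in a lexicon file."""
    if '"' in form or any(ch in SPECIAL for ch in form):
        # Double any embedded double quote, then wrap the whole form.
        return '"' + form.replace('"', '""') + '"'
    return form


examples = [quote_citation(form) for form in ("man", "+ing", "a b", 'hello"world')]
```
.VE
.LP
The first two forms are returned unchanged; the last two come back
surrounded by double quotes, with the embedded quote doubled.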
.LP
All characters specified in the citation form must be dealt with
by the spelling rules (otherwise no analyses involving that form will be
found).  It is the 
responsibility of the user to identify particular contexts for the spelling 
rules, but as a guideline the following method may be used.
When a particular context has to be referred to within the spelling
rules, add a special character to the lexical form at that position.
For example in the case where spelling changes occur with the adjoining
of the morpheme \f2ing\f1, the user can specify a citation form of \f2+ing\f1
and hence refer to the \f2+\f1 in the spelling rules that relate to that 
context.
Note that any extra character in the lexical form must be resolved with 
some character (or more often null) in the surface form. This system
does \f2not\f1 support wholly null citation forms.
.LP
The <phonological-form> is either an atomic name or a list of atomic
names.  The intention is in future versions of the system to allow look
up by phonological form as well as citation form.  At present, though,
this field is not actively used, but simply returned in a word
structure when the appropriate format is selected.
.LP
The <syntax-field> is a category as described in the previous section on the
word grammar.
.VS L
<syntax-field> ::= <category> 
.VE
Examples of syntax field entries:
.VS L
PP        ; where PP is an alias for ( (BAR 2)(N -)(V -) )

( N (PLU +) )      ; N is an alias for ( (N +)(V -)(BAR 0) )

N[PLU +]     ; same as previous example

( (V +)(BAR 0)(N -)(VFORM fin)(PAST +) )    

V[VFORM fin,PAST +]  ; V is an alias for ( (V +)(N -)(BAR 0) )
.VE
.LP
If variables are included in an entry, each variable may appear only once in
that entry.
.LP
In the entries used to test this version the semantic field
for a word is just an atomic
symbol representing its citation form, which may or may not be
found elsewhere as a heading. E.g. the semantic field for both \f2man\f1
and \f2men\f1 is \f2MAN\f1. This reflects the fact that no real processing 
is done on this field as yet, but a suitable mnemonic label makes 
the entries more readable.  However, any LISP s-expression may be 
specified here.  Note that this must be a simple s-expression and no macro
characters (including single quotes) take effect when reading this string.
.VS L
<semantics-field> ::= <LISP s-expression> 
.VE
.LP
Similarly, we have an extremely general definition of the field
which is available to users:
.VS L
<user-field> ::=  <LISP s-expression>
.VE
In the implementation, any items which the user places
in this position (when defining the morpheme entries) are stored in the
lexicon and passed back in the composition form  returned by the look up 
processes (see Section 3.2).  
.LP
Example entries:
.VS L
( man man  N[PLU -] MAN nil)
( men men  ( (N +)(V -)(BAR 0) (PLU +)) MAN nil)
( afterwards AftuwEdz PP AFTERWARDS nil)
( saw sO ( V (PAST +)) SEE nil)
.VE
.LP
Note that each morpheme entry has 
one 
syntactic field and 
one
semantic field.  If the user wishes a morpheme (e.g. \f2bank\f1 or \f2tap\f1)
to have more than one syntactic entry or semantic entry then 
separate entries must be made in the lexicon file.
.NH 2
Noninflectable Categories
.LP
In addition to the standard declarations as described in the previous
chapter, a user may also optionally declare categories in a class 
of \f2NonInflect\f1.  All entries whose syntactic entry is an extension
of one of the non-inflectable categories will be marked so that 
they can appear \f2only\f1
at the end of a word.  This declaration allows the system to analyse
words more efficiently and hence speed up look up time.  Possible categories
for inclusion in this set are (in English) prepositions, determiners, 
irregular inflected forms, inflexional suffixes etc.
.LP
An example declaration may be
.VS L
   NonInflect = { ((CAT prep)),
                  ((CAT determiner)),
                  ((INFL -)) }
.VE
If more than one declaration is made, only the last one takes effect.  If a
\f2NonInflect\f1 declaration is made in a word grammar file it is ignored.
.LP
Another way of indicating non-inflectable categories is by the compiler
directive \f3noninflect\f1.  Entries in the lexicon file between the
directives
.VS L
   #noninflect on

   #noninflect off
.VE
are treated as non-inflectable (irrespective of their syntactic field values).
.NH 2
Lexical Rules
.LP
The system allows rules to be written that can expand the user written 
lexical entries in some predictable way.  These are designed to 
make the task of creating large lexicons easier.  There are three
types of lexical rule:
.XP
Completion Rules: these are used to add predictable features to the 
syntactic fields of user-specified entries (Section 8.6).
.XP
Multiplication Rules: these construct new lexical entries that are predictable
from the user-specified ones (Section 8.7).
.XP
Consistency Checks: these ensure the lexical entries are internally consistent
before they are added to the lexicon (Section 8.8).
.NH 2
Basic Format of Lexical Rules
.LP
All three types of rule have the same basic form:
.VS
< name > : < pre-condition > < operator > < action >
.VE
The name of the rule is used for identification during debugging (see 
Section 4.1).
The < operator > and < action > are different in each type of rule but
the syntax of the < pre-condition > is the same.  
.LP
Pre-conditions are specified as conjunctions of (possibly negated) lexical 
entry patterns.  Note that these patterns may \f2not\f1 contain aliases.
They have the syntax:
.VS L
< pre-condition > ::= < literal > \f3and\f1 < pre-condition >
                  |   < literal >
.VE
.VS L
< literal > ::= \f3~\f1 < lexical entry pattern > ;; tilde indicates negation
                | < lexical entry pattern >
.VE
.VS L
< lexical entry pattern > ::=
         ( < citation-form pattern > 
          < phonological pattern >
          < syntactic pattern >
          < semantic pattern > 
          < user-field pattern > )
.VE
.VS L
< citation-form pattern > ::=
                    \f3_\f1            ; wild card (underscore)
            | < citation-form >  ; atom in lexical alphabet
.VE
.VS L
< phonological pattern > ::=
                    \f3_\f1            ; wild card (underscore)
            | < phonological form> ; atom
.VE
.VS L
< semantic pattern > ::= 
                    \f3_\f1            ; wild card
            | < s-expression >  ;; any form (no macro characters)
.VE
.VS L
< user-field pattern > ::= 
                    \f3_\f1            ; wild card
            | < s-expression >  ;; any form (no macro characters)
.VE
.VS L
< syntactic pattern > ::=
                    \f3_\f1            ; wild card
            | ( < feature patterns > < rest > )
.VE
.VS L
< rest > ::=  < variable >
            |           ;; empty
.VE
.VS L
< feature patterns > ::=
               < feature pattern > < feature patterns >
            |           ;; empty
.VE
.VS L
< feature pattern > ::=
      \f3~\f1 ( < feature name >  < feature value pattern > ) 
      | ( < feature name >  < feature value pattern > )
.VE
.VS L
< feature value pattern > ::=
              \f3_\f1             ; wild card
           |  < feature value >
           |  < variable > 
.VE
.VS L
< variable > is  an atom starting with an underscore _
.VE
.NH 2
Pattern Matching in Lexical Rules
.LP
Variables are denoted by atoms starting with an underscore e.g.
\f3_fred\f1, \f3_fix\f1 etc.  Variables are bound during matching and
can be used later in a match or in a rule action.  There is a special
variable consisting of only an underscore (\*Q_\*U), which never gets
bound and hence can be used to match anything (cf. Prolog).  All other
variables have a consistent interpretation throughout a rule.  Matching
is done from left to right which \f2is\f1 significant in the matching
of syntactic fields.  
The entry being matched does not have to have the
features in the same order as the pattern.  Note that the pattern acts
on a fully expanded category and it does not matter what format (simple
LISPish format or Alias and feature bundle format as described in section 
7.2) the
lexical entry was originally specified in by the user.  Syntactic patterns in
the lexical rules however, can only be specified in the simple LISPish form,
with no aliases.
.LP
The following examples illustrate some of the above points:
.VS L
((FIX _fix) ~(BAR _) _rest) 
matches 
((FIX SUF) (N +) (V -)) 
with \f3_fix\f1 bound to SUF and \f3_rest\f1 bound to ((N +) (V -))

((FIX _fix) ~(BAR _) _rest) 
does not match 
((FIX SUF) (BAR -1) (N +) (V -))

((N -) (V +) _rest)  
matches 
((V +) (PLU -) (N -) (INFL +)) 
with \f3_rest\f1 bound to ((PLU -) (INFL +))
.VE
The pattern ((N +) _junk (V -)) does not match any syntactic category
because the variable \f3_junk\f1 will match all remaining features in the 
category being checked, leaving nothing for (V -) to match.
.VS  L
((N +) (V -) _rest) 
matches 
((V -) (N +)) 
with \f3_rest\f1 bound to an empty list of features.
.VE
.LP
When negation is used no bindings that are made within a negative pattern are 
passed on through the match (again cf. Prolog), although bindings can be 
passed into negations.
.LP
The following pre-condition
.VS
~(be _ _ _ _) and (_ _ ((N -) (V +) _rest) _ _)
.VE
would match all entries that do not have the citation 
form \*Qbe\*U and are marked with the features (N -) and (V +), and
.VS
(_ _ ((N +) (V -) ~(PLU _) _) _ _)
.VE
would match any entry with the features (N +) and (V -) but not
the feature PLU (with any value).
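.LP
The matching of syntactic patterns described in this section can be
sketched in Python. This is an illustration, not the system's LISP code.
The encoding is hypothetical: a category is a dictionary, a positive
feature pattern is a pair, a negated one is a triple starting with "not",
and a rest variable is a bare string. It is also an assumption that a
pattern with no rest variable fails if features are left over.
.VS L
```python
def match_value(pattern, value, bindings):
    """Match one feature value pattern; return bindings or None."""
    if pattern == "_":                       # anonymous variable, never bound
        return bindings
    if pattern.startswith("_"):              # named variable
        if pattern in bindings:
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}
    return bindings if pattern == value else None


def match_category(pattern, category, bindings=None):
    """Match a syntactic pattern left to right; return bindings or None."""
    bindings = dict(bindings or {})
    remaining = dict(category)
    rest_seen = False
    for item in pattern:
        if isinstance(item, str):            # rest variable takes what is left
            if item != "_":
                bindings[item] = remaining
            remaining, rest_seen = {}, True
            continue
        negated = len(item) == 3
        name, value_pattern = item[-2], item[-1]
        result = None
        if name in remaining:
            result = match_value(value_pattern, remaining[name], bindings)
        if negated:
            if result is not None:
                return None
            continue                         # bindings made inside are dropped
        if result is None:
            return None
        bindings = result
        del remaining[name]
    if remaining and not rest_seen:
        return None
    return bindings


fix_rule = [("FIX", "_fix"), ("not", "BAR", "_"), "_rest"]
bound = match_category(fix_rule, {"FIX": "SUF", "N": "+", "V": "-"})
blocked = match_category(fix_rule, {"FIX": "SUF", "BAR": "-1", "N": "+", "V": "-"})
junk = match_category([("N", "+"), "_junk", ("V", "-")], {"N": "+", "V": "-"})
```
.VE
.LP
The first call binds _fix to SUF and _rest to the N and V features, the
second fails on the negated BAR pattern, and the third fails because
_junk has already consumed every remaining feature.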
.NH 2
Completion Rules
.LP
Completion Rules are designed for adding defaults, etc. to the 
entries that are specified by the user.  Completion rules are applied to 
the expanded entries (after aliases have been expanded) and are applied
in the order they are specified.  Accordingly, the order of the 
completion rules within the file
is significant.  A completion rule is of the form 
.VS
< pre-condition > => < entry skeleton >
.VE
If a pre-condition matches an entry the entry is \f2replaced\f1 with the newly
constructed one described by the entry skeleton.
The syntax of an \f2<entry skeleton>\f1 is 
.VS L
< entry skeleton > ::=
      ( < citation form >
        < phonological form>
        < syntactic form >
        < semantic form >
        < user field form > )
.VE
.VS L
< citation form > ::=  \f3&\f1  ;; as in entry being checked
             |   < atom >                   ;; new morpheme
.VE
.VS L
< phonological form > ::=  \f3&\f1  ;; as in entry being checked
             |   < atom >                   ;; new version
.VE
.VS L
< semantic form > ::=  \f3&\f1   ;; as in entry being checked
                 |   < s-expression >  ;; new semantic form
.VE
.VS L
< user field form > ::=  \f3&\f1 ;; as in entry being checked
                 |   < s-expression >     ;; new user field    
.VE
.VS L
< syntactic form > ::=  \f3&\f1  ;; as in entry being checked
                 | ( < feature forms > )
.VE
.VS L
< feature forms > ::= < feature form > < feature forms >
                 |                                 ;; empty
.VE
.VS L
< feature form > ::= ( < feature name > 
                         < feature value form > )
                 |   < variable >          ;; rest variable
.VE
.VS L
< feature value form > ::= 
                    < atom starting with _ >  ;; variable
                 |  < atomic value >
                 |  < syntactic form >        ;; category-valued
.VE
.LP
It is the responsibility of the user to ensure that the
newly built entry is a valid entry; care should be taken not
to add features which may already be specified in the entry.  If an
invalid entry is built then a warning message is given only after all three
types of rules have applied.  The use of unbound variables in the skeleton
will cause an error to be signalled and the lexicon compilation will 
halt.
.LP
For example the rules:
.VS L
Add_BAR_MINUS_ONE:
(_ _ ((FIX _fix) ~(BAR _) _rest) _ _) =>
       (& & ((FIX _fix) (BAR -1) _rest) & &)

Add_BAR_ZERO:
(_ _ (~(BAR _) _rest) _ _) =>
       (& & ((BAR 0) _rest) & &)

Add_INFL_PLUS:
(_ _ ((STEM (~(INFL _) _stem)) _rest) _ _) =>
       (& & ((STEM ((INFL +) _stem)) _rest) & &)
.VE
have the action of adding (BAR -1) to entries containing the feature
FIX, adding (BAR 0) to all entries that do not have a BAR marking and
lastly adding (INFL +) to all values of STEM that do not already have a
marking for INFL.  Note that the ordering of the first two rules is
significant. If the first two rules were in the reverse order, the FIX rule
would not apply, as all entries would by that time have had (BAR 0) added.
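.LP
The dependence on rule order can be seen in a small Python sketch. It is
an illustration only: each of the three rules above is hand-translated
into a hypothetical function acting on a dictionary that stands for the
syntactic field, rather than being run through a general pattern matcher.
.VS L
```python
def add_bar_minus_one(cat):
    """Add (BAR -1) to a category that has FIX but no BAR."""
    if "FIX" in cat and "BAR" not in cat:
        return {**cat, "BAR": "-1"}
    return cat


def add_bar_zero(cat):
    """Add (BAR 0) to any category that has no BAR."""
    if "BAR" not in cat:
        return {**cat, "BAR": "0"}
    return cat


def add_infl_plus(cat):
    """Add (INFL +) inside a STEM value that has no INFL."""
    stem = cat.get("STEM")
    if stem is not None and "INFL" not in stem:
        return {**cat, "STEM": {**stem, "INFL": "+"}}
    return cat


def complete(cat, rules):
    """Apply each rule once, in the order given, replacing the category."""
    for rule in rules:
        cat = rule(cat)
    return cat


suffix = {"FIX": "SUF", "STEM": {"N": "-", "V": "+"}}
in_order = complete(suffix, [add_bar_minus_one, add_bar_zero, add_infl_plus])
reversed_order = complete(suffix, [add_bar_zero, add_bar_minus_one, add_infl_plus])
```
.VE
.LP
With the rules in the order given in the file the suffix receives
(BAR -1); with the first two swapped it receives (BAR 0) instead.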
.NH 2
Multiplication Rules
.LP
These rules construct additional entries (as opposed to replacing 
entries as in
Completion Rules).  These are typically used to generate similar entries
with slightly varying feature markings, e.g. in English these rules
can be used to generate the first person singular, second person singular
and plural of
verbs from the base form.  The syntax of these rules is very similar
to that of the completion rules.
.VS L
< name > : < pre-condition > =>> ( < list of entry skeletons > )
.VE
The syntax of an entry skeleton is described above.
.LP
The ordering of the rules is not significant as newly created entries are 
not re-tested against the other multiplication rules.  This is to avoid 
the possibility of infinite application of the rules.
.LP
A multiplication rule to generate the first and second person
singular and plural of a base verb could be
.VS L
MultRule1:
(_ _ ((V +) (N -) (BAR 0) (VFORM BSE) (INFL +) _rest) _ _) =>>
    (
      (& & ((V +) (N -) (BAR 0) (PN PER1) (INFL -) _rest) & &)
      (& & ((V +) (N -) (BAR 0) (PN PER2) (INFL -) _rest) & &)
      (& & ((V +) (N -) (BAR 0) (PN PLUR) (INFL -) _rest) & &)
     )
.VE
Note that the entry being tested is not replaced but remains 
in the lexicon (assuming the Consistency Checks are
passed; see below).  So, given the entry
.VS L
(like lAIk ((BAR 0) (V +) (VFORM BSE) (N -) 
            (INFL +) (SUBCAT VP2a)) LIKE NIL)
.VE
four entries would exist after the application of the multiplication rule, 
having the form:
.VS L
(like lAIk ((BAR 0) (V +) (VFORM BSE)
            (N -) (INFL +) (SUBCAT VP2a)) LIKE NIL)        
(like lAIk ((V +) (N -) (PN PER1) 
            (BAR 0) (INFL -) (SUBCAT VP2a)) LIKE NIL)        
(like lAIk ((V +) (N -) (PN PER2)
            (BAR 0) (INFL -) (SUBCAT VP2a)) LIKE NIL)        
(like lAIk ((V +) (N -) (PN PLUR)
            (BAR 0) (INFL -) (SUBCAT VP2a)) LIKE NIL)        
.VE
.NH 2
Consistency Checks
.LP
After the above two sets of rules have applied, each entry (including
newly created ones) is compared against the consistency checks.
Any entry that does not pass these tests is not added to the lexicon.
The system itself requires only that entries are quintuples and that the 
syntactic field is a set of feature pairs with values as declared.  These
consistency checks allow the user to check linguistic dependencies 
within entries;
for example, the system will not reject a category that contains
a V marking but no N marking, but the user may wish to specify that
such a category is invalid.  Consistency checks are statements of
the form:
.VS L
< name > : < pre-condition > demands < post-condition >
.VE
The < post-condition > has the same syntax as the pre-conditions.  The
interpretation is:
.IP
If an entry matches the pre-condition it \f2must\f1
match the post-condition as well. 
.LP
Entries that match the pre-condition
but fail the post-condition are printed with an error message and \f2not\f1
added to the lexicon. 
.LP
For instance, if all entries that are marked for V must 
also be marked for N and vice versa then this condition can be written as:
.VS L
MustHave_N:
   (_ _ ((V _) _) _ _) demands (_ _ ((N _) _) _ _)
MustHave_V:
   (_ _ ((N _) _) _ _) demands (_ _ ((V _) _) _ _)
.VE
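.LP
The grammar given in the User File Format section also allows a literal
to be negated with \f3~\f1 and literals to be joined with \f3and\f1.  The
following sketch shows how such checks might be written.  The rule names
and the dependencies they express are hypothetical, and the reading of a
negated literal as "the entry must not match this pattern" is an
assumption based on the comment in that grammar.
.VS L
;; hypothetical checks, not taken from the distributed lexicons
Affix_Needs_BAR_and_STEM:
   (_ _ ((FIX _) _) _ _) demands
        (_ _ ((BAR -1) _) _ _) and (_ _ ((STEM _) _) _ _)
PN_Excludes_VFORM:
   (_ _ ((PN _) _) _ _) demands ~(_ _ ((VFORM _) _) _ _)
.VE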
.NH 2
Application of Lexical Rules
.LP
Although Completion Rules and Multiplication Rules have been described
in this order, they may also be declared with Multiplication Rules before
Completion Rules.  The two types of rule cannot, however, be interleaved.  The order
chosen by the user is significant, since the two rule types are applied in
the order in which they are declared, and a different order will
usually result in a different lexicon.  Note also that the
order of the individual Completion Rules is significant.  After
the two modifying rule types have been applied in the order
specified by the user, the Consistency Checks are applied.  These checks 
are applied to both the expanded and the newly created entries.
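.LP
The Sample Lexicon given later in this section illustrates the effect of
this ordering.  The sketch below assumes the aliases and rules of that
lexicon, and assumes that a syntactic pattern with no rest variable
matches only when every feature of the entry is accounted for.
.VS L
;; entry as written by the user (Verb = ((N -) (V +)))
(like lAIk Verb LIKE NIL)

;; Multiplication Rules declared first: the pre-condition of
;; MultVerb,
;;    (_ _ ((V +) (N -)) _ _)
;; has no rest variable, so it matches this bare entry and
;; the three PN entries are created.

;; Completion Rules declared first: the entry would already
;; carry (BAR 0), (INFL +) and (VFORM BSE) when MultVerb
;; is tried, so the same pre-condition would not match.
.VE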
.LP
Two types of error can occur.  Severe errors are fatal 
to the compilation process; when they occur the current line is displayed
and an error message is given before the compilation is terminated.  Severe
errors occur in two cases: when the user file is syntactically incorrect,
and when lexical rules try to use unbound variables.
Mild errors are not fatal to the compilation process; they occur when
an alias is unknown or
an entry fails a Consistency Check or the syntax field is not a
valid category.  When mild errors occur
an error message is given but the compilation continues without adding
the erroneous entry to the lexicon.
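.LP
The two kinds of error can be sketched as follows.  The rule name and
the entry are hypothetical, and the sketch assumes a lexicon in which
the alias Verb has been declared but Vrb has not.
.VS L
;; severe: _bar does not occur in the pre-condition, so it
;; is unbound when the skeleton is built
Bad_Add_BAR:
(_ _ ((FIX _fix) ~(BAR _) _rest) _ _) =>
       (& & ((FIX _fix) (BAR _bar) _rest) & &)

;; mild: Vrb is not a declared alias, so this entry is
;; reported and left out of the lexicon
(run run Vrb RUN NIL)
.VE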
.NH 2
Lexicon File \(em User File Format
.LP
The format of the lexicon file is as follows:
.VS L
<lexicon file> ::= Declarations <decls> *
                   Rules  <rule types> 
                   Entries <entry> *
.VE
.VS L
<decls> ::=  
      \f3Alias\f1 <alias name> \f3=\f1 <category>
    | \f3Feature\f1 <feature name> <feature range>
    | \f3Variable\f1 <variable name> \f3=\f1 <variable range>
    | \f3NonInflect\f1 \f3=\f1 \f3{\f1 <category list> \f3}\f1
    | \f3CatDef\f1 <cat name> \f3has\f1 <feature list> ;; TU only
.VE
.VS L
<variable range> ::= 
      \f3{\f1 <feature value list> \f3}\f1 ;; atomic-valued variable
    | \f3category\f1 | \f3CAT\f1           ;; category-valued variable
.VE
.VS L
<category list> ::=
      <category> \f3,\f1 <category list>
    | <category>
.VE
.VS L
<feature range> ::=
      \f3{\f1 <feature value list> \f3}\f1
    | \f3category\f1 | \f3CAT\f1
.VE
.VS L
<feature list> ::=
     \f3{\f1 <feature name list> \f3}\f1
.VE
.VS L
<feature name list> ::=
     <feature name> \f3,\f1 <feature name list>
   | <feature name>
.VE
.VS L
<feature value list> ::=
      <feature value> \f3,\f1 <feature value list>
    | <feature value>
.VE
.VS L
<feature name>, <feature value>, <alias name>, 
       <variable name>, <cat name>  ::= <atomic-symbol>
.VE
.VS L
<category> ::=
      <GPSG category form>
    | <simple category form>
.VE
.VS L
<GPSG category form> ::= 
      <alias name> 
   |  <alias name> <feature bundle>
   |  <feature bundle>
.VE
.VS L
<feature bundle> ::=
      \f3[\f1 <GPSG feature pair list> \f3]\f1
.VE
.VS L
<GPSG feature pair list> ::=
      <GPSG feature pair> \f3,\f1 <GPSG feature pair list>
   |  <GPSG feature pair>
.VE
.VS L
<GPSG feature pair> ::=
      <feature name> <category>
   |  <alias name>
.VE
.VS L
<simple category form> ::= 
      \f3(\f1 <feature pair> * \f3)\f1
.VE
.VS L
<feature pair> ::=
      <alias name>
   |  \f3(\f1 <feature name> <category> \f3)\f1
   |  \f3(\f1 <feature name> <feature value> \f3)\f1
.VE
.VS L
<rule types> ::= 
        \f3Completion Rules\f1 <completion rule> *
        \f3Multiplication Rules\f1 <multiplication rule> *
        \f3Consistency Checks\f1 <consistency check> *
     |  \f3Multiplication Rules\f1 <multiplication rule> *
        \f3Completion Rules\f1 <completion rule> *
        \f3Consistency Checks\f1 <consistency check> *
.VE
.VS L
<completion rule> ::=
    <name> : <pre-condition> \f3=>\f1 <entry skeleton>
.VE
.VS L
<multiplication rule> ::= 
    <name> : <pre-condition> \f3=>>\f1 \f3(\f1 <entry skeleton> * \f3)\f1
.VE
.VS L
<consistency check> ::= 
    <name> : <pre-condition> \f3demands\f1 <post-condition>
.VE
.VS L
<name> ::= <atomic-symbol>
.VE
.VS L
<post-condition> ::= 
                <literal> \f3and\f1 <post-condition>
                | <literal>
.VE
.VS L
<pre-condition> ::= 
                <literal> \f3and\f1 <pre-condition>
                | <literal>
.VE
.VS L
<literal> ::=   \f3~\f1 <lexical entry pattern>        ; not
              | <lexical entry pattern>
.VE
.VS L
<lexical entry pattern> ::=
     \f3(\f1
        <citation-form pattern> <phonological pattern>
        <syntactic pattern> <semantic pattern> 
        <user-field pattern>
     \f3)\f1
.VE
.VS L
<citation-form pattern> ::=
             \f3_\f1        ; wild card
          |  <citation form>   ; atom in lexical alphabet
.VE
.VS L
< phonological pattern > ::=
            \f3_\f1        ; wild card
          | < phonological-form >  ; atom
.VE
.VS L
<syntactic pattern> ::=
            \f3_\f1       ; wild card
          | (<feature patterns> <rest>)
.VE
.VS L
<semantic pattern> ::=
            \f3_\f1       ; wild card
          | <s-expression> ; any form
.VE
.VS L
<user-field pattern> ::=
            \f3_\f1               ; wild card
          | <s-expression> ; any form
.VE
.VS L
<rest> ::= <variable> 
          |     ;; empty, all features were specified
.VE
.VS L
<feature patterns> ::=
          <feature pattern> <feature patterns>
        |        ; empty
.VE
.VS L
<feature pattern> ::=
          \f3~(\f1 <feature name> <feature value pattern> \f3)\f1  ; not
        | \f3(\f1 <feature name> <feature value pattern> \f3)\f1
.VE
.VS L
<feature value pattern> ::=
             \f3_\f1       ; wild card
        | <feature value>
        | <variable>
.VE
.VS L
<variable> is an atom starting with \f3_\f1
.VE
.VS L
<entry skeleton> ::=
          \f3(\f1
             <citation form>
             <phonological form>                
             <syntactic form>
             <semantic form>
             <user field form>
          \f3)\f1
.VE
.VS L
<citation form> ::=
            \f3&\f1     ; as in entry being checked
        | <atom>     ; new morpheme
.VE
.VS    L
< phonological form > ::=  
            \f3&\f1
        | <atom>     ; new phonological form
.VE
.VS L
<syntactic form> ::= 
            \f3&\f1
        | \f3(\f1 <feature forms> \f3)\f1
.VE
.VS L
<semantic form> ::=
            \f3&\f1
        | <s-expression>  ; new semantic form
.VE
.VS L
<user field form> ::= 
            \f3&\f1
        | <s-expression>  ; new user field
.VE
.VS L
<feature forms> ::=
          <feature form> <feature forms>
        |       ;; empty
.VE
.VS L
<feature form> ::= 
          \f3(\f1 <feature name> <feature value form> \f3)\f1
        | <variable>     ;; rest variable
.VE
.VS L
<feature value form> ::=
           <atom starting with _>        ; variable
        |  <atomic value>
        |  <syntactic form>  ; category-valued
.VE
.NH 2
Sample Lexicon
.LP
The following is an example lexicon (using unrestricted unification).
.VS L
;;
;;   Example lexicon
;;
.VE
.VS L
Declarations
   Feature        BAR        {-1,0,1,2}
   Feature        STEM       CAT
   Feature        V          {+,-}
   Feature        N          {+,-}
   Feature        PN         {PER1,PER2,PER3,PLUR}
   Feature        PLU        {+,-}
   Feature        VFORM      {EN,ED,ING,BSE}
   Feature        AFORM      {ER,EST}
   Feature        INFL       {+,-}
   Feature        FIX        {PRE,SUF}
.VE
.VS L
   Alias        Noun    =   ((N +) (V -))
   Alias        Verb    =   ((N -) (V +))
   Alias        Adj     =   ((N +) (V +)) 
   Alias        Prep    =   ((N -) (V -)) 
   Alias        PNoun   =   ((BAR 2) (N +) (V -)) 
.VE
.VS L
   NonInflect = {((INFL -))}
.VE
.VS L
Rules
.VE
.VS L
Multiplication Rules
MultVerb:
(_ _ ((V +) (N -)) _ _) =>>
  (
    (& & ((V +) (N -) (BAR 0) (PN PER1) (INFL -)) & &)
    (& & ((V +) (N -) (BAR 0) (PN PER2) (INFL -)) & &)
    (& & ((V +) (N -) (BAR 0) (PN PLUR) (INFL -)) & &)
  )
.VE
.VS L
Completion Rules
Add_BAR_MINUS_ONE:
    (_ _ ((FIX _fix) ~(BAR _) _rest) _ _) =>
      (& & ((BAR -1) (FIX _fix) _rest) & &)
Add_INFL_MINUS:
    (_ _ ((FIX _fix) ~(INFL _) _rest) _ _) =>
      (& & ((INFL -) (FIX _fix) _rest) & &)
.VE
.VS L
Add_INFL_to_STEM:
    (_ _ ((STEM (~(INFL _) _stem)) _rest) _ _) =>
      (& & ((STEM (_stem (INFL +))) _rest) & &)
Add_BAR_ZERO:
    (_ _ ((N _n) (V _v) ~(BAR _) _rest) _ _) =>
      (& & ((BAR 0) (N _n) (V _v) _rest) & &)
.VE
.VS L
Add_Def_PLU:
    (_ _ ((N +) (V -) (BAR 0) ~(PLU _) _rest) _ _) =>
      (& & ((PLU -) (N +) (V -) (BAR 0) _rest) & &)
Add_Def_INFL_Preps:
    (_ _ ((N -) (V -) (BAR 0) ~(INFL _) _rest) _ _) =>
      (& & ((INFL -) (N -) (V -) (BAR 0) _rest) & &)
.VE
.VS L
Add_Def_INFL:
    (_ _ ((N _n) (V _v) (BAR 0) ~(INFL _) _rest) _ _) =>
      (& & ((INFL +) (N _n) (V _v) (BAR 0) _rest) & &)
Add_VFORM:
    (_ _ ((V +) (N -) (BAR 0) ~(VFORM _) ~(PN _) _rest) _ _) => 
      (& & ((VFORM BSE) (V +) (N -) (BAR 0) _rest) & &)
.VE
.VS L
Consistency Checks
Demands_N:
      (_ _ ((V _) _) _ _) demands (_ _ ((N _) _) _ _) 
Demands_V:
      (_ _ ((N _) _) _ _) demands (_ _ ((V _) _) _ _)
Demands_BAR:
      (_ _ ((N _) (V _) _) _ _)
               demands (_ _ ((BAR _) _) _ _)
.VE
.VS L
Entries
   (boy bOI Noun BOY NIL)
   (girl guRl Noun GIRL NIL)
   (park pAk Noun PARK NIL)
   (saw sO Noun SAW NIL)
   (telescope teliskowp Noun TELESCOPE NIL)
   (john jon PNoun JOHN NIL)
   (mary meerI PNoun MARY NIL)
   (like lAIk Verb LIKE NIL)
   (see sI Verb SEE NIL)
   (saw sO (Verb (VFORM ED) (INFL -)) SEE NIL)
   (walk wAk Verb WALK NIL)
   (+s s (Verb (PN PER3) (FIX SUF) (STEM Verb)) S NIL)
   (+s s (Noun (PLU +) (FIX SUF) (STEM Noun)) S NIL)
   (+ed id (Verb (VFORM ED) (FIX SUF) (STEM Verb)) ED NIL)
   (+ing iN (Verb (VFORM ING) (FIX SUF) (STEM Verb)) ING NIL)
   (big big Adj BIG NIL)
   (+er ur (Adj (AFORM ER) (FIX SUF) (STEM Adj)) ER NIL)
   (+er ur (Noun (FIX SUF) (INFL +) (PLU -) (STEM (Verb (INFL +)))) ER NIL)
   (+est ist (Adj (AFORM EST) (FIX SUF) (STEM Adj)) EST NIL)
.VE
.ds RH Section 9
.bp
.NH 1
Implementation
.NH 2
Basic System
.LP
The system is written in Franz LISP (opus 42.15) for UNIX Berkeley
4.2 systems.  Provisions are also made for running the system under
Franz LISP opus 38.79 (the version distributed for VAX systems) and 
Common LISP.
.LP
The system is distributed in a directory containing 5 sub-directories:
.XP
\f3src\f1:
contains the LISP source code for the system, together with a file \f3example.l\f1
which shows how the system can be used within a LISP program.
.XP
\f3common\f1:
contains the files used to map the Franz LISP version of the system to
Common LISP.
.XP
\f3man\f1:
UNIX manual pages for the four UNIX level commands.
.XP
\f3doc\f1:
User manual and descriptions of example lexicons.
.XP
\f3examples\f1:
This directory contains two sub-directories, each holding a different
example set of user files.  The first, \f3examples/GPSG\f1, contains
a 6800 morpheme dictionary and uses feature names and values mainly
consistent with those in GPSG (Gazdar et al. 1985).  This description
is intended to be compatible with the grammar of Briscoe et al. (1986)
and the parser of Phillips and Thompson (1986).  The second
sub-directory, \f3examples/Simple\f1, contains a 3300 morpheme
lexicon, but its method of description is somewhat simpler than the GPSG
version.  Both these sample descriptions are much more complex and
realistic than the illustrative examples given in this user document.
Also in this directory are two small example descriptions: one called 
\f3test\f1, which is used as an example throughout this manual, and one
called \f3tu\f1, which is a small term unification example.
.LP
Using the GPSG sample lexicon as a benchmark, the system takes on average
1.0 seconds to analyse a word.  Simple non-inflected words take significantly
less time.  These timings were obtained using Franz (opus 42.15) on a Sun 2/120
with 4Mb of memory.  Franz (opus 38.75) is about twenty per cent faster,
while Sun 3s are around three times faster than Sun 2s.  The Common LISP
version is significantly slower than the Franz version, both because of lack
of main memory and because the system was ported to Common LISP rather
than re-implemented making full use of Common LISP's features. 
.LP
Users interested in the internal implementation details of the system should
consult the System Description (Ritchie et al. 1987).
.NH 2
Installation for Franz (opus 42.15)
.LP
A makefile is provided for building and installing the system.  To compile the
system, first change to the top directory of the system and type
.VS C
make
.VE
This will use the Liszt compiler to compile the source files in the 
\f3src\f1 directory (it takes around 40 minutes on a Sun 2 to
compile the full system).  This
command also copies the necessary object code files to the top directory
ready for use.  Note that the system assumes that the commands 
\f3lisp\f1 and \f3liszt\f1 invoke opus 42.15 of Franz LISP.  This 
can be checked by entering LISP from the UNIX shell and reading the
version number displayed before the prompt.
.LP
The UNIX level commands described in section 3.4 are also compiled and
copied into the top directory.  The manual pages may be installed in directory
\f3/usr/man/manl\f1 (manl for local manuals) by the command
.VS C
make install
.VE
This can be run only by the super-user.
.LP
This basic \f3make\f1 builds the unrestricted unification version of the 
system.  This can also be built by typing
.VS C
make 42u
.VE
To build a term unification version type
.VS C
make 42t
.VE
Each user who wishes to use the dictionary system is advised to add the top
directory of the system to the search path of the LISP system.  Using
a UNIX environment variable this can be done by adding the following
two lines to their \f3.login\f1 file (in their home directory).
.VS L
setenv usemap "/usr/local/src/morph"
set path=($path $usemap)
.VE
replacing the string \f3/usr/local/src/morph\f1 with the name of the top 
directory of the installed system.  Users should also add the following
lines to the \f3.lisprc\f1 file in their home directory:
.VS L
(eval (list
      'sstatus
      'load-search-path
         (append
            (status load-search-path)
            (list
               (cdr (assoc "usemap" environment)))
         )))
.VE
The above code ensures that the morphology system can be loaded
irrespective of the 
current directory.  Users who prefer not to
make these changes to their UNIX and LISP environments
can instead use the directory options described in section 3.3.
.NH 2
Installation for Franz (opus 38.75)
.LP   
The system will also run under opus 38.75, which is the version of Franz
LISP distributed with the Berkeley 4.2 UNIX systems for VAXes and other
larger machines.  This version is actually faster than the more recent
opus 42.15.  To build an unrestricted unification version under 38.75 type
.VS C
make 38u
.VE
and for a term unification version type
.VS C
make 38t
.VE
.LP
A few minor changes to the source code are required to make it compatible 
with this earlier version of Franz.  A makefile is provided which makes
the necessary changes automatically, but it may be useful for the
implementor to know what the changes are.  Note again that no manual changes
are actually necessary; the following merely explains the problems.
.LP
The basic system itself will compile and run with no changes, but the UNIX
commands require slight modification.  The files \f3dci.l\f1, \f3mksp.l\f1,
\f3mklex.l\f1 and \f3mkgram.l\f1,  require expressions referring to
the environment variable \f3usemap\f1 to be changed.  In each of these files
there is an expression of the form
.VS C
(assoc "usemap" environment)
.VE 
This should be changed to the form
.VS C
(assoc 'usemap environment)
.VE
Also, the additions to a user's \f3.lisprc\f1 file suggested in section 9.2
above must be altered, replacing \f3"usemap"\f1 with \f3'usemap\f1.
.LP
If the system is to be used on another version of Franz LISP, it is recommended
that the 38 version be used for versions up to opus 42, and the 42 version
for versions later than 42.15.  
.NH 2
Installation for Common LISP
.LP
In addition to a Franz LISP version there is also compatibility code that 
allows the system to be run in Common LISP.  Again a makefile is provided,
but the distributed code is set up to compile for Kyoto Common Lisp.
However only very simple changes are necessary to make it run on other
Common LISP
systems (see below).  To build a Common LISP version (unrestricted 
unification version) 
of the system type
.VS C
make cl
.VE
in the top directory.  Note that the UNIX level functions have not been ported
to Common LISP as there is currently no way to define these in Common LISP
(though there is in Kyoto Common Lisp).  All other functions have been 
translated without change in functionality.
As with the Franz LISP version,
.VS C
make clu
.VE
builds a Common LISP unrestricted unification version and
.VS C
make clt
.VE
builds a Common LISP term unification version.
.LP
One problem not dealt with is error catching, as Common LISP currently has
no definition for error handling (though some suggestions are now being
discussed).  During the lexicon user file compilation if an error is
found a LISP error is signalled, which will put the user into the break
handler.  There is no way in general to catch an error like this and
simply return to the top level of the dictionary command interpreter.
.LP
The port is done by first loading the file \f3mapcl\f1 which contains a set 
of definitions (mostly macros)
which translate code from Franz to Common LISP.
The other change is the redefinition of some of the routines
in the general source file \f3subrout\f1.  This is done by loading the
redefinitions at the end of the file \f3subrout\f1.  Users who wish to complete
the port by catching errors (if their version of Common LISP allows it)
can best do this by modifying the macro definition for \f3errset\f1 in the
file \f3mapcl\f1.  Currently this macro is ignored, which unfortunately
means that when errors occur in the Analyser system the break loop is
entered.
.LP
If the code is to be run on some Common LISP other than Kyoto then it
may be necessary for the installer to make various changes.  Unfortunately it
is not the case that Common LISP code is completely compatible between 
Common LISP systems.  The system has been tried on Lucid Common Lisp and
Gould Common Lisp with a fair degree of success (functionally it works but
there was a problem with prompts).  An attempt has also been
made to run the code on Poplog Common Lisp (version 12).  This was unsuccessful
because of (at least) the Common LISP function \f3(file-position)\f1 not being
implemented.  This function is used by the system to allow the actual entries
to be held on disk rather than filling up the virtual memory space.  It may be
that minor changes to the code could circumvent this problem.  Also the
code has been run on DEC's VAXLisp running under VMS and GCLISP running
on an IBM PC AT; both of these were successful but both required modifications
so that the lexical entries are held in LISP rather than on disk.  This
requires two very simple changes in the functions \f3D-GetFilePos\f1 
(in file \f3entryconv\f1) and \f3D-ReadLexicalEntry\f1 (in file \f3subrout\f1).
.LP
If the code is to be ported to a system other than Kyoto then use the command
.VS C
make setup
.VE
in the \f3common\f1 directory.  This copies the source code from the \f3src\f1
directory.  It also appends the conversion functions in the file \f3mapcl\f1
onto the front of \f3morphan.l\f1 and \f3maload.l\f1.  Note that the order in
which the
files are compiled is significant (or can be, if only parts of the system are
loaded at once).  The file \f3mafuncs.l\f1 should be compiled first as the
\f3include\f1 facility only loads files that have not already been loaded
in that session (via the global variable \f3D-LOADEDFILES\f1).  \f3include\f1
behaves this way because there is no need to re-load source code files, as there
is no concept of "local functions" in Common LISP.
\f3mapcl\f1 contains mapping functions from Franz to Common LISP, and it may
be necessary to change these.  If an \f3errset\f1 is offered by the local
system it should be implemented via the macro \f3errset\f1 (Common LISP
has not defined \f3errset\f1).  One change that was required in some systems
was the deletion of the first call to \f3D-ReadToEndOfLine\f1 in the function
\f3D-Restart\f1.  This call reads in the carriage return from the terminal port
at the end of the line that contained the function call.  Not all systems
require it; where it is not required, its effect is that no prompt is given 
until a command is given (which is then ignored).  Also the prompts did not
always
appear in some of the systems tested.  Apparently the function
\f3clear-output\f1 was not working in the intended manner.
.LP
In all the ports the major problems existed at the interface with the 
operating system, with either the naming of files or reading of characters
from them.  The system provides its own reader (in file \f3readatom\f1)
which once implemented should allow most of the system to run with little
problem.
.LP
Another point that implementors should consider is memory management.
At present nothing is done in the Common LISP versions to set up the 
virtual memory space in the most efficient way.  This task is very important
in ensuring an efficient system.
.LP
Some of these problems may improve with later versions of the system as
corrections and solutions to the porting problems are found.  In the meantime
although most of the work involved in porting the system to Common LISP
is done some work will be required by the local implementor to complete
this task.
.LP
Note that the system has only been \f2ported\f1 to Common LISP rather 
than \f2re-implemented\f1.
There are more efficient ways to implement this system than the one used for
this port.  The intention was to allow the system to at least run in
Common LISP so it can have a potentially wider user group.  It may be that
in future versions a more efficient implementation could be distributed.
.NH 2
Programming Conventions
.LP
This section describes some points that may be of interest to 
people who are going to use the Analyser within some other program.
.LP
All the functions mentioned in section 3.1 are also defined in lower case,
so that the user need not worry about case.  
The functions are defined
(and described) in the notation used by some LISP programmers, with an upper
case letter at the start of each word in the name.  (Note that in the Common
LISP version all functions are defined in \f2all\f1 upper case only,
but they may be referred to in any mixture of case.)
.LP
All atoms in the dictionary system that appear on the oblist have the 
prefix \f3d-\f1 or \f3D-\f1.  This is intended to ensure that a user program 
does
not have name clashes with any of the dictionary system functions or global 
variables.  Franz opus 42.15 (and Common LISP) has a packages facility
that allows distinct name spaces but the intention is to avoid facilities that
would prevent the use of the dictionary on opus 38.75 (i.e. the Franz
LISP opus available with Berkeley 4.2 on VAX systems).
.NH 2
Restrictions and Bugs
.LP
One very important thing to remember about the system is that it
is a prototype and not intended for wide distribution.  
Constructive suggestions will be appreciated.
.LP
Not all the checks that are necessary to make a robust system are 
carried out yet,
in particular checking that the feature declarations are consistent 
between the different files.
.LP
If you find any other bugs please report them to Graeme Ritchie at Edinburgh
(JANET: graeme@uk.ac.edinburgh.edai or UUCP: ..!seismo!mcvax!ukc!edai!graeme).
At present it is not possible to guarantee that bugs will be fixed.
.ds RH Section 10
.bp
.NH 1
Enhancements
.LP
This is a prototype system and was written for its functionality
rather than its aesthetics.  Improvements would be desirable in the following
areas:
.IP 1.
Spelling Rules \(em Unintended interference can happen between rules,
so it would be interesting to look into possible ways of detecting 
this and informing
the user during the compilation phase.  Also there may be better notations
for specifying spelling rules (see Black et al. (1987)).
.IP 2.
The user interface \(em improvements like tracing, 
adding more comprehensive checking of the declarations and use of features,
variables etc.,
and other simple but useful aids could be introduced.  
.IP 3.
Phonological Form \(em it would be desirable to integrate the phonological
form in some way into the system.  This could be in the form of offering
two forms of spelling rules \(em one for orthography and one for 
phonology \(em and hence offering different look-up functions based on
orthography and phonology.
.IP 4.
Generation \(em the system could in principle be used for generation
of words. Currently only simple concatenation of morphemes using the 
spelling rules is possible but other features could be included.
.IP 5.
Speed \(em It is hoped that sometime in the future the 
system will also be implemented in some other language like C which
may offer a far faster response time. 
.ds RH 
.bp
.sp 2
.SH
References
.sp 1
.XP
Bear, J. (1985) \*QA Morphological Recogniser with Syntactic and Phonological
Rules.\*U  Unpublished paper. SRI International, Menlo Park, CA., USA.
.XP
Black, A.W., G.D. Ritchie, S.G. Pulman, and G.J. Russell (1987)
\*QFormalisms for Morphographemic Description.\*U In: \f2Proceedings of
3rd Conference of the European Chapter of the Association for Computational
Linguistics\f1, Copenhagen, Denmark.
.XP
Briscoe, E.J., I. Craig, and C. Grover. (1986)
\*QThe Use of the LOB Corpus in the Development of a Phrase Structure Grammar
of English.\*U
In: \f2Proceedings of 6th ICAME\f1, Amsterdam.
(To be published eds. Meijs, W., and van der Steen, G.J.).
.XP
Gazdar, G. and G.K. Pullum (1982) \f2Generalised Phrase Structure 
Grammar \(em A Theoretical Synopsis.\f1 Indiana University Linguistics Club.
.XP
Gazdar, G., E. Klein, G.K. Pullum, and I.A. Sag (1985) \f2Generalised Phrase 
Structure Grammar.\f1 Oxford: Blackwell.
.XP 
Karttunen, L. and K. Wittenburg (1983) \*QA Two-level Morphological Analysis
of English.\*U \f2Texas Linguistics Forum 22\f1, Department of Linguistics,
University of Texas, Austin, Texas.
.XP
Karttunen, L. et al. (1983)
\*QKIMMO : A General Morphological Processor.\*U
In \f2Texas Linguistics Forum 22\f1, Department of Linguistics, University
of Texas, Austin, Texas.
.XP
Kay, M. (1983)
\*QWhen meta-rules are not meta-rules.\*U Pp. 94-116 in K. Sparck Jones 
and Y. Wilks (eds.) \f2Automatic Natural Language Parsing,\f1 
Chichester: Ellis Horwood.
.XP
Koskenniemi, K. (1983a)
\*QTwo-level model for morphological analysis.\*U
Pp. 683-685 in \f2Proceedings of the Eighth International Joint Conference on
Artificial Intelligence\f1, Karlsruhe.
.XP
Koskenniemi, K. (1983b) \f2Two-level Morphology: a general computational 
model for word-form recognition and production.\f1
Publication No.11, University of Helsinki, Finland.
.XP 
Koskenniemi, K. (1984)
\*QA General Computational Model for Word-Form Recognition and Production.\*U
Pp. 178-181 in \f2Proceedings of COLING-84\f1 (10th International Conference
on Computational Linguistics/22nd Annual Meeting of the ACL), Stanford, CA.
.XP
Koskenniemi, K. (1985) \*QCompilation of Automata from Two-Level Rules.\*U Talk 
given at Workshop on Finite-State Morphology, CSLI, Stanford, CA, July 1985.
.XP
Phillips, J. and H. Thompson (1986)
\*QA Parser for Generalised Phrase-Structure Grammars.\*U 
D.A.I. Research paper 289, Department of A.I. University of Edinburgh.
.XP 
Ritchie, G.D., A.W. Black, S.G. Pulman, and G.J. Russell (1987) \*QThe
Edinburgh/Cambridge Morphological Analyser and Dictionary System:
System Description. Version 3.0.\*U Software Paper no. 11, Department of
Artificial Intelligence, University of Edinburgh. 
.XP
Russell, G.J., S.G. Pulman, G.D. Ritchie, and A.W. Black (1986)
\*QA Dictionary and Morphological Analyser for English.\*U Pp. 277-279
in \f2Proceedings of the 11th International Conference on Computational
Linguistics\f1,
Bonn.
.XP
Selkirk, Elisabeth O. (1982)
\f2The Syntax of Words.\f1
Cambridge, Mass: MIT Press.
.XP
Thompson, H. and G. Ritchie (1984)
\*QImplementing Natural Language Parsers.\*U In T. O'Shea and M. Eisenstadt
(eds.) \f2Artificial Intelligence: Tools, Techniques, and Applications.\f1
New York: Harper and Row.
.XP
Winograd, T. (1983) \f2Language as a Cognitive Process\f1. 
Reading, Mass.: Addison-Wesley.
.XP 
Wulf, W.A, M. Shaw, P.N. Hilfinger and L. Flon (1981) \f2Fundamental 
Structures of Computer Science\f1.  
Reading, Mass.: Addison-Wesley.  Pp. 356-7.
