This Protolexicon is version 0.  It is divided into the following
parts:

this directory		documentation (this file)
src			source files for building the system
Lexicon			sample lexica

The system is implemented in rough accordance with the Protolexicon
specification of winter '87, with the following differences.

Morphological Tables
====================

The syntax of morphological tables has changed.  They now have the
following form:

table(Id, IC, 
	[Rule1, ..., RuleN],
	[Form1, ..., FormN]).

where 
Id is a unique identifier for the table, 
IC is the set of input conditions,
Rule1 through RuleN are the rules to apply and
Form1 through FormN are the resulting forms.

The syntax of the ICs and Forms has changed somewhat.  ICs can now
consist of properties (i.e. (possibly parameterized) templates), and
of equations of the form Root = Stem + e or of the form Character = `c.
The first form analyses a string into two or more parts.  The second
requires that Character be instantiated to a member of the character
class ``c''.  See the file Protolexicon/Lexicon/UCG/proto.mt for
examples. 

There is the additional construct 

class(Abbreviation, Set)

for defining character classes, where Abbreviation is a PROLOG atom
and Set is of the form {Chars}, Chars being a comma-separated list.
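
As a sketch (the identifier, conditions and character class below are
invented for illustration, and the Rule/Form positions are left
schematic; see proto.mt for real examples), a class definition and a
table using it might look like:

class(v, {a,e,i,o,u})

table(t1,
	[noun, Root = Stem + e, Character = `v],
	[Rule1],
	[Form1]).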

Lexical Entries
===============

The syntax of lexical entries has changed slightly.  They now have the
form

Item :- CommaList

(rather than Item: CommaList).  This is simply to get round a problem
of operator precedence.  
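
For instance (the word and the template names here are invented purely
for illustration; morphology(regular) is discussed below), a basic
entry would now be written:

walk :- verb, morphology(regular).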


Files
=====

NB:  THE NAME OF THE CUSTOMIZATION FILE HAS BEEN CHANGED (rather
unmnemonically) TO
custom.q

In order to run the system, the following files need to be in the
directory specified by the variable `grammar_directory', with the
assumption that the `grammar_name' variable is set to "proto":

proto.axiom	*	axioms about the sort system (see T1.6)
proto.def		type declarations, path_abbreviations,
			templates
proto.gram		grammar rules
proto.lex		lexical entries
proto.lr		lexical rules
proto.mt		morphological tables
proto.net		definitions of the lexical network
proto.sort	*	definitions of the sort system

All these files MUST exist, or the system won't start.  (An exception
to this is if you are running the graph unification system, in which
case the files marked * are not necessary).  The file proto.axiom
should contain at least the line:

properties([]).

Apart from that, any of the files can be empty.

The ``source'' feature for including definitions &c. from different
files is not guaranteed to work for files other than those with the
suffixes .lex, .def and .gram.

Any of the suffixes can be redefined in the customization file.

Unimplemented features
======================

Some of the things that were talked about at the Protolexicon Meeting
just prior to the review have not been implemented (or at least not
fully).  In particular, there is no facility for allowing disjunction
and negation within a lexical entry.  The main reason for this is that
I'm not sure how these things should interact with defaults.  


The Syntax and Semantics of PIMPLE and Protolexicon entries
===========================================================

(This section is not intended to be complete, merely to give
information on some of the more arcane aspects).

The syntax of PIMPLE objects is basically unchanged, as is their semantics.
This means that you can include definitions you already have for your
system and test Protolexical entries with the rest of a grammar.

An addition to the PIMPLE repertoire is the notion of a parameterized
template.  A parameterized template has the form:

template(T1, ..., Tn) ...

where each Ti is either a variable bound in the definition or a path
name which points into the associated graph.  

Another addition (which is a hack for the Protolexicon) is the
``template equivalence''.  An example (as I write, the only example) of
this is:

morphology(regular) <-> string(X), morph_stem(X), morph_root(X).

This is a hack to get information about the string form defaulted into
expanded lexical entries.  If anyone can suggest a better way of doing
this I'd be glad to hear about it.

As described in the Protolexicon document, the processing of lexical
items starts by expanding them with respect to the network defined in
the .net file.  Once that is done (here comes the hack), the
information about the actual string form of the current item (i.e.
string(Item)) is added briefly to the set of templates/properties of
the item in order that the template equivalence relation can have some
effect.  This information is then removed.  The process of computing
the actual structure of the lexical item from the template names then
goes ahead, and we throw the generated structure and set of template
names at the morphological tables.  

NB:  UNLIKE THE PIMPLE INTERPRETATION OF TEMPLATES in lexical
definitions, it is not necessary for all the templates referenced in
Protolexical entries to be defined.  The process of constructing real
term or graph structure by default ignores templates for which there
is no definition.  This allows template names/properties to be used as
essentially diacritic features to inhibit or invoke particular
morphological tables, and you can do similar things with the
invocation of defaults.  If you don't like this behaviour, change the
value of the variable ``ignore_unknown_templates'' to ``off''.  

Loading of Definitions, Entries &c.
===================================

As with PIMPLE, the Protolexicon system resides in a PROLOG saved
state, Protolexicon/src/pl.ss.  Saying

prolog+ <PROLOG ARGUMENTS IF YOU NEED THEM> <PATHNAME TO DIRECTORY>src/pl.ss

in a directory with a custom.q file in it will automatically start up
the system.  

The top-level command for loading the definitions, entries &c. is
``load''.  (This must be called after the customization routine.)  The
following commands load up the different types of file:


p_load_axioms			proto.axiom
p_load_sorts			proto.sort
q_load_network			proto.net
q_load_templates		proto.def
q_load_lexical_rules		proto.lr
q_load_tables			proto.mt
q_load_lexicon			proto.lex
load_grammar			proto.gram

(``q_'' is generally the prefix for routines to do with the
Protolexicon).   Any of these can be called to reload the relevant
information (taking care to ensure that all things dependent on those
files are also reloaded).  After reloading the sorts and axioms, you
should explicitly invoke ``encode_sort_system''.
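
For example, to reload just the morphological tables and lexicon from
the PROLOG top level, or to rebuild the sort encoding after editing the
sort and axiom files, one might say (a sketch, assuming the standard
top-level prompt):

?- q_load_tables, q_load_lexicon.

?- p_load_sorts, p_load_axioms, encode_sort_system.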

As specified in the Protolexicon document, the total lexicon is
defined as the transitive closure of the basic entries (i.e. those
defined in the .lex file) with respect to the set of morphological
tables.  Currently, this closure is not computed automatically: you
must invoke the routine `q_compute_lexicon_closure' by hand.  If
you want to change this behaviour, set the variable
`compile_all_entries' to ``on''.  Note that you still have to run
the routine `q_compute_lexicon_closure' by hand after (re)loading
individual files.  (I.e. only the top-level routine `load' takes
account of the setting of this variable.)

Loading Times (Caveat)
======================

The loading times for the system are pretty slow, especially for the
morphological tables.  This is basically because we have to check for
subsumption pairwise across all of the tables, so the cost is
proportional to n^2, where n is the number of tables.  (A rough timing
for a system with 33 tables is between 5 and 10 minutes.)

Extra Features
==============

There is an extra feature available to those who have a system
spelling checker for the language they're working in.  (For English,
for example, the UNIX spelling checker is invoked.)  The code
that does this is in the file src/qlex.pl.  If the variable
``check_output_against_system_speller'' is set to `on', every lexical
entry generated by the Protolexicon is checked against the system's
idea of what good words are.  This should work with any spelling
checker that accepts text via standard input and writes unknown words
to standard output.  Are there French and German versions of this
utility?  If there are, change the string in qlex.pl

q_system_speller("spell -b").

to whatever your utility is called.  Beware: even for English the UNIX
spelling checker has bugs.


More generally useful are the following two predicates:

q_test(Word)

where Word is an atom with a corresponding basic lexical item; this
reports all the lexical items generable from Word.

q_test_templates(Ts)

where Ts is a list of templates; this reports the maximal subset(s) of
Ts whose template definitions are compatible.
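
For instance (the word and one of the template names below are
invented; substitute items from your own lexicon, though
morphology(regular) is the equivalence discussed above):

?- q_test(walk).

?- q_test_templates([noun, morphology(regular)]).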


