Language Technologies Institute
11-712: Self-Paced Laboratory

Algorithms for NLP:
GLR Module Design Specs

Goal

Write a small parsing system that can handle simple declarative sentences, simple NPs, and prepositional phrases, using simple semantic restrictions to block attachment of PPs that aren't licensed by a basic semantic lexicon.

Syntactic Lexicon

The syntactic lexicon should contain lines with the following form:

("word" (feature value)+)

Look at the given syntactic lexicon for an example.

The syntactic lexicon should encode the following slots:

Nouns: cat (n), number (sg or pl, for irregular forms), sem (see below)
Verbs: cat (v), valency (trans or intrans), sem (see below)
Prepositions: cat (p), semrole (see below)
Determiners: cat (det), reference (definite or indefinite), number (sg or pl)

The value of the sem feature is a symbol naming the word's entry in the semantic lexicon. It consists of a prefix (*A- for verbs, *O- for nouns), a symbol denoting the root form of the word (e.g., SEE), and an index (a positive integer, starting at 1) that distinguishes the particular meanings of the (root, part-of-speech) pair. For example, to encode both the transitive and intransitive meanings of "see", we would use *A-SEE-1 and *A-SEE-2.

The value of the semrole feature is a symbol naming the relation (slot name) in the semantic lexicon that the preposition denotes. It consists of a prefix (+) followed by the name of the relation; e.g., for "in" we might encode the semantic role as +LOCATION.
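Both naming conventions are mechanical enough to express in a few lines of Lisp. The sketch below uses two hypothetical helpers, make-sem-name and make-semrole-name (they are not part of the given code file), which build such symbols with INTERN. It assumes the symbols should live in the current package, as they would if you typed them into a lexicon file.

```lisp
;; Hypothetical helpers (not in the given code file) that build lexicon
;; symbols following the sem and semrole naming conventions.
(defun make-sem-name (prefix root index)
  "Return a concept symbol such as *A-SEE-1.
PREFIX is \"A\" (verbs) or \"O\" (nouns), ROOT is the root form,
INDEX is a positive integer distinguishing senses of the (root, pos) pair."
  (check-type index (integer 1))
  (intern (format nil "*~A-~A-~D"
                  (string-upcase prefix) (string-upcase root) index)))

(defun make-semrole-name (relation)
  "Return a semantic-role symbol such as +LOCATION."
  (intern (format nil "+~A" (string-upcase relation))))

;; (make-sem-name "A" "see" 2)     => *A-SEE-2
;; (make-semrole-name "location")  => +LOCATION
```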

Since part of your assignment is to reuse the Tomita morphology code and combine it with lexical lookup to inflect lexical entries for number, you should include the number feature only when encoding an irregular form that the morphology code does not handle (this part of the assignment is described in more detail later on).
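As an illustration, here are entries for the coverage words in the ("word" (feature value)+) format, together with a hypothetical well-formedness check. The feature spellings, the sense numbering for "see" (1 = transitive, 2 = intransitive, following the order of the example above), and the use of two separate entries for the ambiguous "with" are assumptions; compare them with the given syntactic lexicon before reusing them. Only the irregular plural "men" carries an explicit number feature.

```lisp
;; Illustrative entries; feature spellings, the SEE sense numbers and the two
;; entries for "with" are assumptions to check against the given lexicon.
(defparameter *sample-syntactic-entries*
  '(("a"         (cat det) (reference indefinite) (number sg))
    ("the"       (cat det) (reference definite))      ; number left open: sg or pl
    ("man"       (cat n) (sem *o-man-1))
    ("men"       (cat n) (number pl) (sem *o-man-1))  ; irregular plural
    ("boy"       (cat n) (sem *o-boy-1))
    ("dog"       (cat n) (sem *o-dog-1))
    ("telescope" (cat n) (sem *o-telescope-1))
    ("see"       (cat v) (valency trans) (sem *a-see-1))
    ("see"       (cat v) (valency intrans) (sem *a-see-2))
    ("with"      (cat p) (semrole +instrument))
    ("with"      (cat p) (semrole +accomplice))))

(defun well-formed-entry-p (entry)
  "True if ENTRY is a word string followed by one or more (feature value) pairs."
  (and (consp entry)
       (stringp (first entry))
       (consp (rest entry))
       (every (lambda (pair)
                (and (consp pair)
                     (symbolp (first pair))
                     (consp (rest pair))
                     (null (cddr pair))))
              (rest entry))))
```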

Semantic Lexicon

The semantic lexicon should contain entries with the following form:

(frame-name (slot-name slot-value)+)

Look at the given semantic lexicon for an example.

The semantic lexicon should encode the following features:

Objects (Nouns): =is-a (class), semroles
Actions (Verbs): =is-a (class), semroles

Lexical concepts (those which appear in the sem feature in the syntactic lexicon) are always prefixed with *A- or *O-, and appear as the first element in the semantic lexicon entries.

Inheritance (e.g., IS-A) links are always prefixed with '='. For this assignment, you only need to include =IS-A. These links appear inside the semantic lexicon entry as a list, where the first element is =IS-A and the second element is the parent class.

Classes are always prefixed with '&', and appear as fillers in =IS-A or semantic role slots.

Semantic role names (semroles) are always prefixed with '+'. Semantic roles appear in semantic lexicon entries as lists, where the first element is the role name and the second element denotes the class restricting the set of legal fillers for the role.
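Because every kind of symbol carries a distinctive prefix, the parts of a frame are easy to tell apart programmatically. A minimal sketch, using hypothetical helpers (symbol-kind, frame-parent, frame-semroles) that are not in the given code file:

```lisp
;; Hypothetical helpers: classify semantic-lexicon symbols by prefix and
;; pick apart a frame of the form (frame-name (slot-name slot-value)+).
(defun prefix-p (prefix name)
  "True if string NAME begins with string PREFIX."
  (and (>= (length name) (length prefix))
       (string= prefix name :end2 (length prefix))))

(defun symbol-kind (sym)
  "Classify SYM by the prefix conventions of the semantic lexicon."
  (let ((name (symbol-name sym)))
    (cond ((prefix-p "*A-" name) :action)
          ((prefix-p "*O-" name) :object)
          ((prefix-p "=" name) :inheritance-link)
          ((prefix-p "&" name) :class)
          ((prefix-p "+" name) :semrole)
          (t :other))))

(defun frame-parent (frame)
  "Filler of FRAME's =IS-A link, or NIL if it has none."
  (second (assoc '=is-a (rest frame))))

(defun frame-semroles (frame)
  "The (role class) slots of FRAME, i.e. those whose name starts with +."
  (remove-if-not (lambda (slot) (eq (symbol-kind (first slot)) :semrole))
                 (rest frame)))
```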

Coverage

You will need to write syntactic lexicon entries to handle the word occurrences in these sentences:

a man
the man
the men
the boy
the boys
the man sees
the man sees the boy
the man sees the boy with the telescope
the man sees the boy with the dog

You will need to write semantic lexicon entries to handle these concepts:

*A-SEE-1
*A-SEE-2
&O_ANIMATE
*O-MAN-1
*O-BOY-1
*O-DOG-1
*O-TELESCOPE-1

You will need to encode these semantic roles:

*A-SEE-1, *A-SEE-2: (+INSTRUMENT &OPTICAL_INSTRUMENT)
&O_ANIMATE: (+ACCOMPLICE &O_ANIMATE)

Your semantic frames should model this hierarchy fragment:
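One way to write the required frames is sketched below. It assumes the simplest hierarchy consistent with the coverage sentences: MAN, BOY and DOG directly under &O_ANIMATE, TELESCOPE directly under &OPTICAL_INSTRUMENT, and a hypothetical &ACTION class above both SEE senses. The actual fragment may include intermediate classes, so adjust the =IS-A fillers to match it. The helpers missing-concepts and ancestors are illustrative only.

```lisp
;; ASSUMED hierarchy: MAN, BOY, DOG -> &O_ANIMATE; TELESCOPE -> &OPTICAL_INSTRUMENT;
;; &ACTION is a hypothetical parent for the SEE senses.
(defparameter *semantic-entries*
  '((*a-see-1 (=is-a &action) (+instrument &optical_instrument))
    (*a-see-2 (=is-a &action) (+instrument &optical_instrument))
    (&o_animate (+accomplice &o_animate))
    (*o-man-1 (=is-a &o_animate))
    (*o-boy-1 (=is-a &o_animate))
    (*o-dog-1 (=is-a &o_animate))
    (*o-telescope-1 (=is-a &optical_instrument))))

(defparameter *required-concepts*
  '(*a-see-1 *a-see-2 &o_animate *o-man-1 *o-boy-1 *o-dog-1 *o-telescope-1))

(defun missing-concepts (entries)
  "Required concepts that have no frame in ENTRIES."
  (remove-if (lambda (concept) (assoc concept entries)) *required-concepts*))

(defun ancestors (name entries)
  "NAME followed by each class reachable through =IS-A links (assumes no cycles)."
  (loop for current = name
          then (second (assoc '=is-a (rest (assoc current entries))))
        while current
        collect current))
```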

Loading the Lexicons

In the given code file, you will find functions load-lexicon and load-semantics, which you can use to load your completed lexicons into Lisp.
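Before loading, it can help to confirm that a lexicon file reads as plain Lisp data. The helper below is hypothetical: it is not load-lexicon and does not build that function's data structure; it simply READs every top-level form from a stream.

```lisp
;; Hypothetical helper (not in the given code file): read every top-level
;; form from STREAM so a lexicon file can be sanity-checked before loading.
(defun read-all-forms (stream)
  "Return a list of all forms READ from STREAM, stopping at end of file."
  (loop for form = (read stream nil stream)   ; the stream itself marks EOF
        until (eq form stream)
        collect form))

;; Usage, with a hypothetical file name:
;; (with-open-file (in "syntax.lex") (length (read-all-forms in)))
```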

Syntactic Grammar

The grammar should include rules for the following constructions:

<start> <==> (<np>)
<start> <==> (<vp>)
<start> <==> (<np> <vp>)
<np> <==> (<np> <pp>)
<np> <==> (<det> <n>)
<np> <==> (<n>)
<vp> <==> (<vp> <pp>)
<vp> <==> (<v> <np>)
<vp> <==> (<v>)
<pp> <==> (<p> <np>)

Lexical Lookup, Morphological Inflection

Your grammar should not contain lexical rules; instead, you should use the Tomita "wildcard" rule syntax and write a Lisp callout function to look up lexical items:

<n> <-- (%)
<v> <-- (%)
<det> <-- (%)
<p> <-- (%)

Each rule should have the following form (shown here for <n>):

(<n> <-- (%)
     ((x0 <= (parse-eng-word (string-downcase (symbol-name (x1 value)))))
      ((x0 cat) = n)))

You should write a function called parse-eng-word, which performs morphological analysis on its string argument and returns the inflected lexical f-structure for the word. This should be done in three steps:

1. Use the built-in function parse-eng-morph to return the set of possible ("root" morph) pairs for the word;

2. Look up each root form in the lexicon to see if it exists;

3. If a morpheme was found attached to the root, inflect the matching lexical entries appropriately. Write a function called inflect-lex to inflect nouns and verbs, as follows:

INFLECT-LEX

Assigns agreement features for N and V, depending on presence or
absence of +S morpheme and/or explicit lexical features:

 N: 
   - Defaults to (PERSON 3), unless feature supplied by lexicon
   - Defaults to (NUMBER SG), unless:
            * feature supplied by lexicon
            * +S is present -> (NUMBER PL)
 V: 
   - If +S present, (PERSON 3), else will unify with any SUBJ
     (functionally the same as (*OR* 1 2 3))
   - Defaults to (NUMBER PL), unless:
            * feature supplied by the lexicon
            * +S is present -> (NUMBER SG)
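A minimal sketch of inflect-lex, assuming that a lexical f-structure is a list of (feature value) pairs, as in the syntactic lexicon, and that the morpheme is passed as the symbol +S or NIL. Lexicon-supplied features always win over the defaults. The helper names fs-get and fs-default are hypothetical.

```lisp
;; Sketch of INFLECT-LEX. ASSUMPTIONS: FS is a list of (feature value) pairs
;; and MORPH is the symbol +S or NIL.
(defun fs-get (fs feature)
  "Value of FEATURE in FS, or NIL if FS does not supply it."
  (second (assoc feature fs)))

(defun fs-default (fs feature value)
  "Add (FEATURE VALUE) to FS unless the lexicon already supplied FEATURE."
  (if (assoc feature fs)
      fs
      (append fs (list (list feature value)))))

(defun inflect-lex (fs morph)
  "Add PERSON/NUMBER agreement features to a noun or verb f-structure FS."
  (let ((plus-s (eq morph '+s)))
    (case (fs-get fs 'cat)
      ;; Nouns: PERSON 3 by default; NUMBER SG, or PL when +S is present.
      (n (fs-default (fs-default fs 'person 3)
                     'number (if plus-s 'pl 'sg)))
      ;; Verbs: PERSON 3 only with +S (bare forms stay open for any SUBJ);
      ;; NUMBER PL by default, SG when +S is present.
      (v (fs-default (if plus-s (fs-default fs 'person 3) fs)
                     'number (if plus-s 'sg 'pl)))
      (otherwise fs))))
```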

(Hint: study the data structure built by the load-lexicon function so that you can retrieve the uninflected lexical items from the lexicon using gethash.)
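The three parse-eng-word steps can be sketched as follows. This is not a drop-in implementation: stub-parse-eng-morph is a hypothetical stand-in for the built-in parse-eng-morph, the hash table built by make-lexicon is an assumed layout (the real one comes from load-lexicon), and the inflection function is passed in as a parameter so the sketch runs on its own.

```lisp
;; ASSUMPTIONS (not the real API):
;;  - STUB-PARSE-ENG-MORPH stands in for the built-in PARSE-ENG-MORPH; it only
;;    strips a final "s" and uses NIL for "no morpheme".
;;  - The lexicon is an EQUAL hash table from root string to a list of
;;    f-structures; check how load-lexicon actually stores entries.
;;  - INFLECT is the inflection function (e.g. your INFLECT-LEX).
(defun stub-parse-eng-morph (word)
  "Return possible (\"root\" morph) pairs for WORD."
  (let ((analyses (list (list word nil)))
        (len (length word)))
    (when (and (> len 1) (char= (char word (1- len)) #\s))
      (push (list (subseq word 0 (1- len)) '+s) analyses))
    analyses))

(defun make-lexicon (entries)
  "Hash ENTRIES of the form (\"word\" (feature value)+) by word string."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (entry entries table)
      (push (rest entry) (gethash (first entry) table)))))

(defun parse-eng-word (word lexicon inflect &key (analyze #'stub-parse-eng-morph))
  "Return the list of inflected f-structures for every analysis of WORD."
  (loop for (root morph) in (funcall analyze word)          ; step 1: morphology
        append (loop for fs in (gethash root lexicon)        ; step 2: lookup
                     collect (funcall inflect fs morph))))   ; step 3: inflect
```

In the grammar callout, parse-eng-word receives only the word string, so you would close over the loaded lexicon and inflect-lex; how several readings of one word should be handed back to the parser depends on the Tomita runtime and is not shown here.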

Compiling and Loading the Grammar

You should use the compgra function to compile and load the grammar (see the example in the given code file). You must recompile the grammar with compgra after every change before you can test that change.

Semantic Restrictions on PP Attachment

Once you have your grammar working, you should add Lisp callouts to the rules which attach PPs to NP and VP, in order to implement semantic restrictions.

The function semrole-filler-match, provided in the given code file, will do most of the work for you -- its arguments are the semantic lexicon entry for the head (NP or VP), the semrole (from the P's syntactic lexicon entry), and the semantic lexicon entry for the filler (the PP object). This function will return T or NIL depending on whether the semantic lexicon contains information that licenses the given attachment, using some inheritance methods defined in the function.
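Since semrole-filler-match already does this work, you do not need to reimplement it. Purely to illustrate the kind of inheritance check involved, here is a hypothetical licensed-p over a toy frame list: the restriction on a role is looked up on the head or inherited through =IS-A, and the filler is accepted if the restricting class appears among its ancestors. This reflects an assumption about the general idea, not a description of how the provided function works.

```lisp
;; Illustration only: NOT the provided SEMROLE-FILLER-MATCH.
;; Toy frames follow an ASSUMED hierarchy (BOY, DOG under &O_ANIMATE;
;; TELESCOPE under &OPTICAL_INSTRUMENT).
(defparameter *toy-frames*
  '((*a-see-1 (+instrument &optical_instrument))
    (&o_animate (+accomplice &o_animate))
    (*o-boy-1 (=is-a &o_animate))
    (*o-dog-1 (=is-a &o_animate))
    (*o-telescope-1 (=is-a &optical_instrument))))

(defun toy-slot (name slot)
  "Filler of SLOT in NAME's own frame, or NIL."
  (second (assoc slot (rest (assoc name *toy-frames*)))))

(defun lineage (name)
  "NAME followed by its =IS-A ancestors (assumes no cycles)."
  (loop for c = name then (toy-slot c '=is-a)
        while c
        collect c))

(defun role-restriction (head role)
  "Class restricting ROLE on HEAD, found on HEAD itself or inherited."
  (loop for c in (lineage head)
        for restriction = (toy-slot c role)
        when restriction return restriction))

(defun licensed-p (head role filler)
  "True if FILLER is (or inherits from) the class HEAD requires for ROLE."
  (let ((restriction (role-restriction head role)))
    (and restriction (member restriction (lineage filler)) t)))
```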

In order to use this function, you will have to write a grammar callout called license-attachment, which takes two arguments: the f-structure for the head (NP or VP) and the f-structure for the filler (PP). It should extract the appropriate information from the f-structures and/or the semantic lexicon, call semrole-filler-match, and return either the new f-structure (the head with the PP attached) or NIL, depending on the result of that call.

Your code for license-attachment should print a trace message signalling the result of each call; see the examples (mentioned below) for the format of the messages.
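A sketch of the overall shape of license-attachment, under several assumptions that are also labeled in the code: the head's concept is in a top-level sem feature, the PP f-structure carries the preposition's semrole and its object under obj, and attached PPs are collected under a pp-adjuncts feature. Unlike the required two-argument callout, this hypothetical license-attachment-sketch takes the matcher as a keyword argument (standing in for semrole-filler-match, whose argument format is not reproduced), and its trace format is made up; use the format from the examples.

```lisp
;; Sketch only. ASSUMPTIONS: head concept under SEM; PP has SEMROLE and OBJ;
;; attached PPs go under a hypothetical PP-ADJUNCTS feature; MATCHER stands in
;; for SEMROLE-FILLER-MATCH and receives concept/role symbols, not frames.
(defun fs-value (fs feature)
  "Value of FEATURE in f-structure FS, or NIL."
  (second (assoc feature fs)))

(defun license-attachment-sketch (head pp &key (matcher (constantly nil)))
  "Return HEAD with PP attached if MATCHER licenses it, otherwise NIL."
  (let* ((head-sem (fs-value head 'sem))
         (role (fs-value pp 'semrole))
         (filler-sem (fs-value (fs-value pp 'obj) 'sem))
         (licensed (funcall matcher head-sem role filler-sem)))
    ;; Trace message; the real format should follow the course examples.
    (format t "~&[license] ~A ~A ~A: ~:[rejected~;accepted~]~%"
            head-sem role filler-sem licensed)
    (when licensed
      ;; Append PP to any PPs already attached, keeping one PP-ADJUNCTS slot.
      (cons (list 'pp-adjuncts (append (fs-value head 'pp-adjuncts) (list pp)))
            (remove 'pp-adjuncts head :key #'first)))))
```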

Examples

When you're all done, you should get outputs like those shown in this set of examples, assuming you've got all the parts right. (See the instructions on how to run the testing function, (run-tests).)

5-Nov-96 by ehn@cs.cmu.edu
