Language Technologies Institute
11-712: Self-Paced Laboratory


Algorithms for NLP:
GLR Module Instructions, Scone semantics

NLP researchers working on large projects are frequently asked to fix partially working code, or to update existing code to improve or extend its functionality. In this assignment, you are presented with a hypothetical scenario which involves both fixing and extending a pre-existing parsing module.


Day 1

Your advisor calls you into his office to relay some discouraging news. Narley Q. Hacker, a second-year grad student on your project, has left CMU for a job writing Java applets at corporate giant Acmesoft. Unfortunately, Narley was right in the middle of prototyping a new syntactic parser for the project, and was only half done when he left. There is very little documentation, so your job will be to analyze the existing code and figure out how to complete it, while fixing any bugs that Narley left behind. Your advisor gives you a pointer to the rough design specs that he supplied initially. As you leave your advisor's office, you wonder what kind of shape Narley's code is in, and you silently swear never to buy an Acmesoft product again.


Day 2

Having read the design specs, you decide to check out the current state of the code.

You fire up Lisp, and load the given code file: given-code.lisp

(NOTE: Before loading, be sure to update this file with the name of your working directory. In the section labeled "SET HOME DIRECTORY", change the value of the variable *module-home* to your own working path.)
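
For illustration only, the edited setting might look something like the following. This is a minimal sketch: the defining form actually used in given-code.lisp may differ (for example setq or defvar), and the path shown is a placeholder, not a real location.

```lisp
;; Sketch only: substitute your own working directory.
;; The path below is a hypothetical placeholder, and the form used in
;; given-code.lisp may be defvar or setq rather than defparameter.
(defparameter *module-home* "/path/to/your/working/directory/")
```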

The code loads without errors, so at least Narley left the files in a stable state. The trace output, however, shows that the syntactic lexicon contains only one entry, and the Scone Knowledge Base has fewer than 30 elements:

SYNLEX: ("man" (CAT N) (ROOT MAN))
 ...
Elements in the Knowledge Base:
  {common:thing}
  {common:part}
  {common:relation}
 ...

You take a look at the grammar file, and realize that Narley never implemented the lexical lookup function that your advisor was talking about; instead, there are just a bunch of lexical rules hard-wired into the grammar:

;; Temporary lexical rules

(<n> <-- (boy)
     (((x0 root) = boy)
      ((x0 number) = sg)
      ((x0 cat) = n)))

(<n> <-- (boys)
     (((x0 root) = boy)
      ((x0 number) = pl)
      ((x0 cat) = n)))

(<v> <-- (see)
     (((x0 root) = see)
      ((x0 valency) = (*or* trans intrans))
      ((x0 cat) = v)))

(<det> <-- (the)
     (((x0 root) = the)
      ((x0 cat) = det)))

(<det> <-- (a)
     (((x0 root) = a)
      ((x0 cat) = det)))

(<p> <-- (with)
     (((x0 root) = with)
      ((x0 cat) = p)))

You decide to experiment a little bit, and find that the grammar seems to work on some cases:

>(parser boy)

((CAT N) (NUMBER SG) (ROOT BOY))
NIL

>(parser the boy)

((CAT N) (NUMBER SG) (ROOT BOY)
 (DET
    ((CAT DET) (ROOT THE))))
NIL

>(parser the boys see)

((MOOD DECLARATIVE) (VALENCY INTRANS) (CAT V) (ROOT SEE)
 (SUBJ
    ((CAT N) (NUMBER PL) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL

However, the existing grammar does not always meet the design specification. It also accepts strings that violate subject-verb and determiner-noun agreement:

>(parser the boy see)

((MOOD DECLARATIVE) (VALENCY INTRANS) (CAT V) (ROOT SEE)
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL

>(parser a boys)

((CAT N) (NUMBER PL) (ROOT BOY)
 (DET
    ((CAT DET) (ROOT A))))
NIL

Day 3

You implement the features that are required for proper DET-N and NP-VP (subject-verb) agreement by changing the hard-wired lexical rules and the appropriate rules for NP and S.
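
As a rough illustration of the kind of change involved, the fragment below adds a number feature to a determiner and makes the phrase rules unify number values. It is a sketch in the same rule notation as the temporary lexical rules, not the given grammar itself: the rule names <np>, <s> and <vp> and the exact constraint lists are assumptions, and the verb entries would need a number feature of their own for the <s> constraint to have any effect.

```lisp
;; Sketch only: <np>, <s> and <vp> are assumed rule names; adapt the
;; constraints to the rules actually present in the given grammar.

;; "a" is marked singular. "the" can be left without a number feature
;; so that it unifies with either value.
(<det> <-- (a)
     (((x0 root) = a)
      ((x0 number) = sg)
      ((x0 cat) = det)))

;; DET-N agreement: the rule fails unless the number values unify.
(<np> <-- (<det> <n>)
     (((x1 number) = (x2 number))
      (x0 = x2)
      ((x0 det) = x1)))

;; Subject-verb agreement: the same constraint between NP and VP.
(<s> <-- (<np> <vp>)
     (((x1 number) = (x2 number))
      (x0 = x2)
      ((x0 mood) = declarative)
      ((x0 subj) = x1)))
```

With constraints along these lines, (parser a boys) should no longer produce a parse, because sg and pl do not unify.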


Day 4

You remove the hard-wired lexical rules from the grammar, and put them into the lexicon file in the proper format. Then you add the callout-based lexical rules for N, V, P and DET, as shown in the design specs.

You experiment by calling the built-in function parse-eng-morph on some inflected strings like "sees", "boys", etc.

Having loaded in your syntactic lexicon using the given load-lexicon function, you write your version of parse-eng-word as per the data specs.

You test your version of parse-eng-word by running it by hand on some sample strings.

Finally, you integrate your version of parse-eng-word by recompiling the grammar and running some tests using the parser macro.
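
The lookup step can be pictured with a small self-contained toy. Everything in it is hypothetical: toy-morph is a crude stand-in for parse-eng-morph (whose real return format is not shown here), the lexicon is a hand-written list in the SYNLEX entry format from the trace output, and the number assignment is deliberately simplified. The required behavior of parse-eng-word is defined by the design specs, not by this sketch.

```lisp
;; Toy lexicon in the SYNLEX entry format shown in the trace output.
(defparameter *toy-lexicon*
  '(("boy" (CAT N) (ROOT BOY))
    ("see" (CAT V) (ROOT SEE))
    ("the" (CAT DET) (ROOT THE))))

(defun toy-morph (word)
  "Hypothetical stand-in for parse-eng-morph.
Returns candidate analyses as (root suffix) lists; suffix is :s or NIL."
  (let ((n (length word)))
    (if (and (> n 1) (char= (char word (1- n)) #\s))
        (list (list word nil) (list (subseq word 0 (1- n)) :s))
        (list (list word nil)))))

(defun toy-number (cat suffix)
  "Simplified number assignment: nouns take -s in the plural,
verbs take -s in the singular."
  (if (eq cat 'n)
      (if suffix 'pl 'sg)
      (if suffix 'sg 'pl)))

(defun toy-parse-word (word)
  "Return a list of feature structures for WORD, one per lexicon match."
  (loop for (root suffix) in (toy-morph word)
        for entry = (assoc root *toy-lexicon* :test #'string=)
        when entry
          collect (let ((cat (second (assoc 'cat (rest entry)))))
                    (if (member cat '(n v))
                        ;; Nouns and verbs get a number feature.
                        (append (rest entry)
                                (list (list 'number
                                            (toy-number cat suffix))))
                        ;; Other categories are returned unchanged.
                        (rest entry)))))
```

Calling (toy-parse-word "boys") finds no entry for "boys" itself but does find the stripped root "boy", so a single noun analysis marked plural comes back. A word with no matching root yields NIL.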


Day 5

You notice that the basic given system allows ambiguous attachment of PP to NP and VP:

>(parser the boy see the boy with the boy)

((ROOT SEE) (CAT V) (VALENCY TRANS) (MOOD DECLARATIVE)
 (OBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (PPADJUNCT
        ((CAT P) (ROOT WITH)
         (OBJ
            ((CAT N) (NUMBER SG) (ROOT BOY)
             (DET
                ((CAT DET) (ROOT THE)))))))
     (DET
        ((CAT DET) (ROOT THE)))))
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))

((ROOT SEE) (CAT V) (VALENCY TRANS) (MOOD DECLARATIVE)
 (OBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE)))))
 (PPADJUNCT
    ((CAT P) (ROOT WITH)
     (OBJ
        ((CAT N) (NUMBER SG) (ROOT BOY)
         (DET
            ((CAT DET) (ROOT THE)))))))
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL

Following your advisor's instructions in the design specs, you write a callout called license-attachment, and rewrite the rules that attach PP to NP and VP to use that callout.

To get the right attachment licensed, you'll have to write the correct dependencies into the Scone Knowledge Base. Use the design specs, which describe the proper format of a KB, is-a relations, and Scone roles, to write a KB that captures the hierarchy fragment given in the design specs.

(Hint: you will need to encode the semrole for "with" in your syntactic lexicon using the *OR* operator, since "with" can map to either {INSTRUMENT} or {ACCOMPLICE}.)
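
The licensing idea can be sketched without Scone at all. The toy below is not the Scone API and not the hierarchy from the design specs: the is-a table, the role constraints, and the function names are all hypothetical, chosen only to show how a candidate role is accepted or rejected by walking is-a links.

```lisp
;; Toy is-a table (child . parent). NOT the Scone API; the hierarchy
;; here is a hypothetical stand-in for the one in the design specs.
(defparameter *toy-isa*
  '((boy . person) (man . person) (telescope . tool) (see . action)
    (person . thing) (tool . thing) (action . thing)))

;; Each role lists the type required of the head and of the filler.
(defparameter *toy-roles*
  '((instrument action tool)
    (accomplice action person)))

(defun toy-is-a-p (x y)
  "True if X is Y, or Y is reachable from X by following is-a links."
  (loop for node = x then (cdr (assoc node *toy-isa*))
        while node
        thereis (eq node y)))

(defun toy-license-attachment (head filler roles)
  "Return the first role in ROLES under which FILLER may attach to HEAD,
or NIL if no role licenses the attachment. Every role in ROLES must
have an entry in *toy-roles*."
  (find-if (lambda (role)
             (destructuring-bind (head-type filler-type)
                 (cdr (assoc role *toy-roles*))
               (and (toy-is-a-p head head-type)
                    (toy-is-a-p filler filler-type))))
           roles))
```

Here (toy-license-attachment 'see 'telescope '(instrument accomplice)) returns INSTRUMENT, while the same call with 'boy as the head returns NIL, which mirrors the pattern of ATTACHING and ATTACH FAILED messages in the trace.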

If you've put it all together correctly, then the previously ambiguous sentence will have only one legal attachment for PP, and the output should look something like this:

>(parser the men see the boys with the telescope)

** ATTACH FAILED {common: telescope} cannot be the {instrument (role)} of {common: boy}
** ATTACH FAILED {common: telescope} cannot be the {common: accomplice (role)} of {common: boy}
** ATTACHING {common: telescope} as the {instrument (role)} of {common: see.01}
** ATTACH FAILED {common: telescope} cannot be the {common: accomplice (role)} of {common: see.01}
((MOOD DECLARATIVE) (NUMBER PL) (PERSON 3) (VALENCY TRANS)
 (CAT V) (ROOT "see") (SEM {common: see.01})
 (SUBJ
    ((CAT N) (PERSON 3) (ROOT "men") (NUMBER PL) (SEM {common: man})
     (DET
        ((NUMBER PL) (CAT DET) (ROOT "the") (REFERENCE DEFINITE)))))
 (OBJ
    ((CAT N) (NUMBER PL) (PERSON 3) (ROOT "boy") (SEM {common: boy})
     (DET
        ((NUMBER PL) (CAT DET) (ROOT "the") (REFERENCE DEFINITE)))))
 (PPADJUNCT
    ((CAT P) (ROOT "with") (SEMROLE {common: instrument (role)})
     (OBJ
        ((CAT N) (NUMBER SG) (PERSON 3) (ROOT "telescope")
         (SEM {common: telescope})
         (DET
            ((NUMBER SG) (CAT DET) (ROOT "the") (REFERENCE DEFINITE))))))))

Day 6

You test your code by loading the test file:

/afs/cs/user/atribble/www/scone-lab/test-code.lisp

Call the function (run-tests). Once you fix any remaining bugs, you're ready to comment your code and hand it in! (Note: the output of (run-tests) should be placed in a file called test-output.txt in your handin directory.)
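
One way to capture the output in a file is to rebind *standard-output* around the call. This is a minimal sketch that assumes (run-tests) prints to *standard-output*; the helper name save-output-to-file is hypothetical and not part of the given code.

```lisp
(defun save-output-to-file (thunk path)
  "Call THUNK with *standard-output* redirected to the file at PATH.
An existing file at PATH is overwritten."
  (with-open-file (*standard-output* path
                                     :direction :output
                                     :if-exists :supersede
                                     :if-does-not-exist :create)
    (funcall thunk)))

;; Usage, once test-code.lisp has been loaded:
;; (save-output-to-file #'run-tests "test-output.txt")
```

If the tests write to a different stream, the standard dribble function, which records an interactive session to a file, is an alternative.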


03-Feb-2005 by atribble@cs.cmu.edu Updated from original by EHN.