NLP researchers working on large projects are frequently asked to fix partially working code, or to update existing code to improve or extend its functionality. In this assignment, you are presented with a hypothetical scenario which involves both fixing and extending a pre-existing parsing module.
Your advisor calls you into his office to relay some discouraging news. Narley Q. Hacker, a second-year grad student on your project, has left CMU for a job writing Java applets at corporate giant Acmesoft. Unfortunately, Narley was right in the middle of prototyping a new syntactic parser for the project, and was only half done when he left. There is very little documentation, so your job will be to analyze the existing code and figure out how to complete it, while fixing any bugs that Narley left behind. Your advisor gives you a pointer to the rough design specs that he supplied initially. As you leave your advisor's office, you wonder what kind of shape Narley's code is in, and you silently swear never to buy an Acmesoft product again.
Having read the design specs, you decide to check out the current state of the code. You fire up Lisp, and load the given code file:
/afs/cs/project/cmt-55/lti/Lab/Modules/NLP-712/glr/given-code.lisp
You note that the code loads without errors -- at least Narley left the files in a stable state. However, looking at the trace output, you note that the lexicons contain only one entry each:
SYNLEX: ("man" (CAT N) (SEM *O-MAN-1))
SEMLEX: (*O-MAN-1 (=IS-A &O_ANIMATE))
You take a look at the grammar file, and realize that Narley never implemented the lexical lookup function that your advisor was talking about; instead, there are just a bunch of lexical rules hard-wired into the grammar:
;; Temporary lexical rules
(<n> <-- (boy)
     (((x0 root) = boy)
      ((x0 number) = sg)
      ((x0 cat) = n)))
(<n> <-- (boys)
     (((x0 root) = boy)
      ((x0 number) = pl)
      ((x0 cat) = n)))
(<v> <-- (see)
     (((x0 root) = see)
      ((x0 valency) = (*or* trans intrans))
      ((x0 cat) = v)))
(<det> <-- (the)
     (((x0 root) = the)
      ((x0 cat) = det)))
(<det> <-- (a)
     (((x0 root) = a)
      ((x0 cat) = det)))
(<p> <-- (with)
     (((x0 root) = with)
      ((x0 cat) = p)))
You decide to experiment a little bit, and find that the grammar seems to work on some cases:
>(parser boy)
((CAT N) (NUMBER SG) (ROOT BOY))
NIL
>(parser the boy)
((CAT N) (NUMBER SG) (ROOT BOY)
 (DET
    ((CAT DET) (ROOT THE))))
NIL
>(parser the boys see)
((MOOD DECLARATIVE) (VALENCY INTRANS) (CAT V) (ROOT SEE)
 (SUBJ
    ((CAT N) (NUMBER PL) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL
However, there are definitely examples where the existing grammar doesn't meet the design specification. In particular, it accepts inputs that violate number agreement:
>(parser the boy see)
((MOOD DECLARATIVE) (VALENCY INTRANS) (CAT V) (ROOT SEE)
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL
>(parser a boys)
((CAT N) (NUMBER PL) (ROOT BOY)
 (DET
    ((CAT DET) (ROOT A))))
NIL
You implement the features that are required for proper DET-N and NP-VP (subject-verb) agreement by changing the hard-wired lexical rules and the appropriate rules for NP and S.
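The agreement check itself can be pictured outside the grammar formalism. The following is only a sketch in plain Common Lisp, not the grammar-rule syntax from the design specs: it assumes f-structures are lists of (FEATURE VALUE) pairs like those printed by the parser, and the names feature-value and features-agree-p are hypothetical helpers invented for illustration. It also assumes that "the" carries no NUMBER feature while "a" is marked SG.

```lisp
;; Sketch only: in the real grammar, agreement would be stated as
;; equations inside the NP and S rules. This models the same idea
;; on plain lists of (FEATURE VALUE) pairs.

(defun feature-value (fs feature)
  "Return the value of FEATURE in the f-structure FS, or NIL if absent."
  (second (assoc feature fs)))

(defun features-agree-p (fs1 fs2 feature)
  "True if FS1 and FS2 do not carry conflicting values for FEATURE.
An absent value is treated as unspecified and agrees with anything."
  (let ((v1 (feature-value fs1 feature))
        (v2 (feature-value fs2 feature)))
    (or (null v1) (null v2) (eql v1 v2))))

;; Assumed lexical f-structures (hypothetical feature assignments).
(defparameter *the*  '((cat det) (root the)))             ; number unspecified
(defparameter *a*    '((cat det) (root a) (number sg)))
(defparameter *boys* '((cat n) (root boy) (number pl)))

(features-agree-p *the* *boys* 'number) ; "the boys": compatible
(features-agree-p *a* *boys* 'number)   ; "a boys": conflicting values
```

Treating a missing feature as compatible is a design choice: it lets one entry for "the" combine with both singular and plural nouns without listing both values.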
You remove the hard-wired lexical rules from the grammar, and put them into the lexicon file in the proper format. Then you add the callout-based lexical rules for N, V, P and DET, as shown in the design specs.
You experiment by calling the built-in function parse-eng-morph on some inflected strings like "sees", "boys", etc.
Having loaded in your syntactic lexicon using the given load-lexicon function, you write your version of parse-eng-word as per the data specs.
You test your version of parse-eng-word by running it by hand on some sample strings.
Finally, you integrate your version of parse-eng-word by recompiling the grammar and running some tests using the parser macro.
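As a rough picture of how a lexical lookup can combine morphology with the lexicon, here is a minimal sketch. It is not the parse-eng-word required by the data specs, and it does not call the real parse-eng-morph, whose return format is not shown here. Everything named toy-... or add-entry is hypothetical, the (SUFFIX S) feature is an invented placeholder, and the only detail taken from the trace output is the entry shape ("root" (FEATURE VALUE) ...).

```lisp
(defparameter *toy-synlex* (make-hash-table :test #'equal)
  "Hypothetical syntactic lexicon: root string -> list of feature lists.")

(defun add-entry (entry)
  "Store an entry of the form (\"root\" (FEATURE VALUE) ...)."
  (push (rest entry) (gethash (first entry) *toy-synlex*)))

(add-entry '("boy" (cat n) (sem *o-boy-1)))
(add-entry '("see" (cat v) (sem *a-see-1)))

(defun toy-morph (word)
  "Stand-in for a morphological analyzer (NOT parse-eng-morph).
Returns (ROOT . EXTRA-FEATURES) candidates: the word itself, plus
the word minus a final s when it ends in one."
  (let ((candidates (list (cons word '()))))
    (when (and (> (length word) 1)
               (char= (char word (1- (length word))) #\s))
      (push (cons (subseq word 0 (1- (length word))) '((suffix s)))
            candidates))
    candidates))

(defun toy-parse-word (word)
  "Pair each morphological candidate with every lexicon entry for its root."
  (loop for (root . extra) in (toy-morph word)
        append (loop for features in (gethash root *toy-synlex*)
                     collect (append (list (list 'root root))
                                     features
                                     extra))))

(toy-parse-word "boys")  ; one analysis, built from the "boy" entry
(toy-parse-word "xyzzy") ; no analyses: unknown word
```

Returning a list of analyses, rather than a single one, leaves room for words that are ambiguous between categories.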
You notice that the basic given system allows ambiguous attachment of PP to NP and VP:
>(parser the boy sees the boy with the boy)
((ROOT SEE) (CAT V) (VALENCY TRANS) (MOOD DECLARATIVE)
 (OBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (PPADJUNCT
        ((CAT P) (ROOT WITH)
         (OBJ
            ((CAT N) (NUMBER SG) (ROOT BOY)
             (DET
                ((CAT DET) (ROOT THE)))))))
     (DET
        ((CAT DET) (ROOT THE)))))
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
((ROOT SEE) (CAT V) (VALENCY TRANS) (MOOD DECLARATIVE)
 (OBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE)))))
 (PPADJUNCT
    ((CAT P) (ROOT WITH)
     (OBJ
        ((CAT N) (NUMBER SG) (ROOT BOY)
         (DET
            ((CAT DET) (ROOT THE)))))))
 (SUBJ
    ((CAT N) (NUMBER SG) (ROOT BOY)
     (DET
        ((CAT DET) (ROOT THE))))))
NIL
Following your advisor's instructions in the design specs, you write a callout called license-attachment, and rewrite the rules that attach PP to NP and VP to use that callout.
In order to get the right attachment licensed, you'll have to write the correct entries into the semantic lexicon. Use the design specs, which describe the proper format of frames, is-a relations, and semantic roles, to write frames that capture the hierarchy fragment given in the design specs.
(Hint: you will need to encode the semrole for "with" in your syntactic lexicon using the *OR* operator, since "with" can map to either +INSTRUMENT or +ACCOMPLICE.)
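The licensing idea can be sketched as a walk up the is-a hierarchy. This is an illustration under stated assumptions, not the callout interface from the design specs: the function toy-license-attachment and its three-argument signature are hypothetical, the class names &O_DEVICE and &A_PERCEIVE are invented, and so is the convention that a head frame lists each role it accepts together with the class its filler must belong to. Only the frame for *O-MAN-1 is taken from the trace output.

```lisp
(defparameter *toy-semlex*
  '((*o-man-1       (=is-a &o_animate))   ; as in the trace output
    (*o-boy-1       (=is-a &o_animate))   ; hypothetical
    (*o-telescope-1 (=is-a &o_device))    ; hypothetical
    (*a-see-1       (=is-a &a_perceive)   ; hypothetical
                    (+instrument &o_device)
                    (+accomplice &o_animate)))
  "Toy semantic lexicon: (FRAME-NAME (SLOT FILLER ...) ...).")

;; An assumed syntactic entry for the preposition, following the hint:
;;   ("with" (CAT P) (SEMROLE (*OR* +INSTRUMENT +ACCOMPLICE)))

(defun slot-fillers (frame-name slot)
  "Return the fillers of SLOT in the frame FRAME-NAME, or NIL."
  (rest (assoc slot (rest (assoc frame-name *toy-semlex*)))))

(defun is-a-p (frame-name class)
  "True if FRAME-NAME is CLASS or reaches it through =IS-A links.
Assumes the hierarchy has no cycles."
  (or (eql frame-name class)
      (some (lambda (parent) (is-a-p parent class))
            (slot-fillers frame-name '=is-a))))

(defun toy-license-attachment (head role filler)
  "Return T if HEAD accepts ROLE and FILLER belongs to an allowed class.
Prints a trace line in the style of the sample output."
  (let ((ok (some (lambda (class) (is-a-p filler class))
                  (slot-fillers head role))))
    (format t "** ~:[ATTACH FAILED~;ATTACHING~] ~S~%"
            ok (list head (list role filler)))
    (and ok t)))

(toy-license-attachment '*a-see-1 '+instrument '*o-telescope-1) ; licensed
(toy-license-attachment '*o-boy-1 '+instrument '*o-telescope-1) ; rejected
```

Under these toy frames, a noun frame that lists no role slots never licenses a PP, which is one simple way to force the VP attachment for the telescope example.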
If you've put it all together correctly, then a previously ambiguous sentence will have only one legal attachment for the PP, and the parser will produce output along these lines:
>(parser the men see the boys with the telescope)
** ATTACH FAILED (*O-BOY-1 (+INSTRUMENT *O-TELESCOPE-1))
** ATTACH FAILED (*O-BOY-1 (+ACCOMPLICE *O-TELESCOPE-1))
** ATTACHING (*A-SEE-1 (+INSTRUMENT *O-TELESCOPE-1))
** ATTACH FAILED (*A-SEE-1 (+ACCOMPLICE *O-TELESCOPE-1))
((MOOD DECLARATIVE) (NUMBER PL) (PERSON 3) (VALENCY TRANS)
 (CAT V) (ROOT "see") (SEM *A-SEE-1)
 (SUBJ
    ((CAT N) (PERSON 3) (ROOT "men") (NUMBER PL) (SEM *O-MAN-1)
     (DET
        ((NUMBER PL) (CAT DET) (ROOT "the") (REFERENCE DEFINITE)))))
 (OBJ
    ((CAT N) (NUMBER PL) (PERSON 3) (ROOT "boy") (SEM *O-BOY-1)
     (DET
        ((NUMBER PL) (CAT DET) (ROOT "the") (REFERENCE DEFINITE)))))
 (PPADJUNCT
    ((CAT P) (ROOT "with") (SEMROLE +INSTRUMENT)
     (OBJ
        ((CAT N) (NUMBER SG) (PERSON 3) (ROOT "telescope")
         (SEM *O-TELESCOPE-1)
         (DET
            ((NUMBER SG) (CAT DET) (ROOT "the") (REFERENCE DEFINITE))))))))
NIL
You test your code by loading the test file:
/afs/cs/project/cmt-55/lti/Lab/Modules/NLP-712/glr/test-code.lisp
Call the function (run-tests). Once you fix any remaining bugs, you're ready to comment your code and hand it in!
(Note: the output of (run-tests) should be placed in a file called test-output.txt in your handin directory.)