University of Sussex
8 Sep 2003

Software for morphological processing of English as developed by
Kevin Humphreys, John Carroll and Guido Minnen.


1. Morpha Input Specification
-----------------------------

Where the -u option is not used, each input token is expected to be of
the form <word>_<tag>. For example:

  A_AT1 move_NN1 to_TO stop_VV0 Mr._NNS Gaitskell_NP1 from_II nominating_VVG

Contractions and punctuation must already have been split into separate
tokens. The tagset is assumed to resemble CLAWS-2 in the following
respects:

  V...        all verbs
  NP...       all proper names
  N[^P]...    all common nouns

and for specific cases of ambiguous lexical items:

  'd_VH...    root is 'have'
  'd_VM...    root is 'would'
  's_VBZ...   root is 'be'
  's_VHZ...   root is 'have'
  's_$...     possessive morpheme (also _POS for CLAWS-5)
  ai_VB...    root is 'be'
  ai_VH...    root is 'have'
  ca_VM...    root is 'can'
  sha_VM...   root is 'shall'
  wo_VM...    root is 'will'
  n't_XX...   root is 'not'

In the output, proper names and the pronoun 'I' (expected to appear as
I_PPIS1 / I_PNP) are capitalised; all other words are downcased (unless
the -c option is specified).


2. Morphg Input Specification
-----------------------------

Each input token is assumed to be of the form <lemma>+<affix>_<tag>,
for example:

  nominate+ing_VVG

With the option -t, the output for this example will be

  nominating_VVG


3. Usage Examples
-----------------

See the file README for a description of the command-line options.

If the -f option is used, it should be the last option specified, and
the argument that follows it should be the name of a file containing a
list of verb stems that undergo consonant doubling. If this option is
not specified, the default file, 'verbstem.list' in the user's current
directory, will be used. You can change the default directory and
filename in the source code:

  else read_verbstem("verbstem.list");

in both morpha and morphg, but you will need to recompile them.

Morpha and morphg act as filters, reading from the standard input and
writing to the standard output.
For example:

  morpha -act < file.txt > analysed.txt
  morpha -u < untagged.txt
  echo 'nominate+ing_VVG' | morphg -t

As well as accepting piped input within a Unix command line, morpha and
morphg can be invoked within programs; examples follow of use within
Common Lisp and Perl.

------

;;; For Franz Allegro CL

(in-package :cl-user)

(defparameter *morph-name* "./morpha -at")

(defun open-morph ()
  "Open a stream to morph"
  (run-shell-command *morph-name*
                     :input :stream :output :stream :wait nil))

(defun read-exhausting (stream)
  "Return as a string all characters currently available on STREAM"
  (do ((acc nil)
       (ch (read-char-no-hang stream) (read-char-no-hang stream)))
      ((null ch) (coerce (nreverse acc) 'string))
    (push ch acc)))

(defun close-morph (morph-stream)
  "Close the stream to morph and reap the child process"
  (close morph-stream)
  (sys:os-wait))

#|
(setq str (open-morph))
(format str "added_VVD many_DA2 more_DAR exceptions_NN2 ;_; ~
             revised_VVN to_TO work_VV0 as_II a_AT1 filter_NN1 ._.~%")
(read-exhausting str)
(close-morph str)
|#

------

#!/local/perl5/bin/perl -w
#
# For Perl
#
use FileHandle;
use IPC::Open2;

# Start morpha with pipes attached to its standard output and input.
$pid = open2(\*Reader, \*Writer, "./morpha -at");
Writer->autoflush();
sleep 1;

# Send each input line, prefixed with its line number, and echo the reply.
while (<>) {
    $n++;
    print Writer "$n $_";
    $got = <Reader>;
    print $got;
}
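
------

The token format and tag classes described in section 1 can be
illustrated with a short Python sketch. It is not part of morpha and
does not reproduce its internals: the function names split_token and
tag_class are hypothetical, and splitting at the last underscore is an
assumption made here so that a word containing '_' stays intact. It may
still be handy for checking input before piping it to morpha.

```python
import re


def split_token(token):
    """Split a 'word_TAG' token at its last underscore (an assumption)."""
    word, sep, tag = token.rpartition("_")
    if not sep or not word or not tag:
        raise ValueError("not of the form <word>_<tag>: %r" % token)
    return word, tag


def tag_class(tag):
    """Map a CLAWS-2-like tag to the broad classes listed in section 1."""
    if tag.startswith("V"):
        return "verb"
    if tag.startswith("NP"):
        return "proper name"
    if re.match(r"N[^P]", tag):
        return "common noun"
    return "other"


if __name__ == "__main__":
    # Example input line taken from section 1.
    line = ("A_AT1 move_NN1 to_TO stop_VV0 Mr._NNS "
            "Gaitskell_NP1 from_II nominating_VVG")
    for token in line.split():
        word, tag = split_token(token)
        print(word, tag, tag_class(tag))
```

A token without an underscore, or with nothing on one side of it,
raises ValueError, which makes untagged input easy to spot when the -u
option is not being used.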