Source code and
a Mac OS X (Intel) binary are available for a newer implementation
of the parser (models are not compatible with previous
versions, and only a WSJ model is provided for now).
Usage:
(Usage for the new version is slightly different. See the
README file included in the version you download for usage
information.)
./ksdep -m MODEL -b 10 INPUTFILE
where MODEL can be
- wsj.mod (a model trained on PTB WSJ sections 02-21)
- genia.mod (a model trained on the GENIA Treebank)
- combo-genia-wsj.mod (a model trained on GENIA + WSJ)
and INPUTFILE is in CoNLL-X format.
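As a rough illustration of the expected input, the sketch below writes one tagged sentence in the ten-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with a blank line after the sentence. The helper name to_conllx, the sample sentence, and the choice to fill every unknown column with "_" are assumptions made for this example. Which columns the parser actually reads is not stated here, so check the README.

```python
def to_conllx(tagged_sentence):
    """Format one sentence, given as (word, pos) pairs, as CoNLL-X text."""
    lines = []
    for index, (word, pos) in enumerate(tagged_sentence, start=1):
        columns = [
            str(index),  # ID: token position, starting at 1
            word,        # FORM
            "_",         # LEMMA (unknown in this sketch)
            pos,         # CPOSTAG (assumption: same as POSTAG)
            pos,         # POSTAG
            "_",         # FEATS
            "_",         # HEAD (assumption: left empty for unparsed input)
            "_",         # DEPREL
            "_",         # PHEAD
            "_",         # PDEPREL
        ]
        lines.append("\t".join(columns))
    # Sentences are separated by a blank line.
    return "\n".join(lines) + "\n\n"


# Hypothetical sample sentence with PTB-style tags.
sentence = [("The", "DT"), ("parser", "NN"), ("works", "VBZ"), (".", ".")]
conllx_text = to_conllx(sentence)
print(conllx_text, end="")
```

Writing conllx_text to a file gives something that could be passed as INPUTFILE, provided the part-of-speech tags match the tag set the chosen model was trained on.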
The -b option controls the beam width. For deterministic parsing,
use -b 1. Larger values (for example, -b 1000) may improve
accuracy slightly, at the expense of a much greater computational
cost.
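To make the speed/accuracy trade-off concrete, here is a small sketch that assembles the parse command from the usage line for different beam widths. It uses only the -m and -b options and the argument order shown in that usage line. The function name parse_command and the file name input.conll are hypothetical, and the command is only built here, not run.

```python
import shlex


def parse_command(model, input_file, beam=10, binary="./ksdep"):
    """Build the argument list for: ./ksdep -m MODEL -b BEAM INPUTFILE"""
    if beam < 1:
        raise ValueError("beam width must be at least 1")
    return [binary, "-m", model, "-b", str(beam), input_file]


# -b 1 requests deterministic parsing; -b 10 matches the usage line.
deterministic = parse_command("wsj.mod", "input.conll", beam=1)
default_beam = parse_command("wsj.mod", "input.conll")

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(deterministic))
print(shlex.join(default_beam))
```

To actually run one of these, the list could be passed to subprocess.run(cmd, check=True) from the directory that contains the binary and the model file.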
The parser can be trained using the -t option.
When training, -c FLOAT sets the regularization parameter
(-c 1.0 is usually a good guess; smaller values may overfit more).
The -i INT option sets the number of iterations for maxent
training (every 100 iterations, a snapshot of the model is
written to disk).
The -m STRING option sets the model name, as it does in
parse mode (except that in training mode the file will be created,
or overwritten if it already exists).
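The training options can be combined in a similar way. The sketch below only assembles an argument list from the options described above (-t, -c, -i, -m). No full training command is shown here, so treating -t as a bare flag, passing the training file as the last argument, assuming snapshots fall on multiples of 100, and the names train_command, snapshot_iterations, my.mod, and train.conll are all assumptions. The README in the download is the reference for the real invocation.

```python
def train_command(model, train_file, c, iterations, binary="./ksdep"):
    """Build a hypothetical training argument list.

    Assumptions: -t is a bare flag and the training file comes last.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [
        binary,
        "-t",                   # training mode
        "-c", str(c),           # regularization parameter
        "-i", str(iterations),  # number of maxent training iterations
        "-m", model,            # model file to create or overwrite
        train_file,
    ]


def snapshot_iterations(iterations):
    """Iterations at which a snapshot would be written (assumed: multiples of 100)."""
    return list(range(100, iterations + 1, 100))


cmd = train_command("my.mod", "train.conll", c=1.0, iterations=300)
print(" ".join(cmd))
print(snapshot_iterations(300))
```

Because the -m file is overwritten in training mode, it may be worth choosing a model name that does not collide with a model you want to keep.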
Download:
GDep (GENIA Dependency parser)
GDep is a version of KSDep that does
part-of-speech tagging, named entity recognition and dependency parsing,
tuned specifically for biomedical text using the GENIA Treebank. GDep
takes plain text as input, with one sentence per line.
KSDep with a GENIA model
Use this for parsing biomedical text if you want to use your own part-of-speech tagger (for plain text input, use GDep instead). Input must be in CoNLL-X format, with part-of-speech tags as in the GENIA Treebank. Included: Linux binary and source code. To build on Mac OS X or Windows (with Cygwin), just type "make" in the directory where you unpacked the files.
WSJ tagging + parsing
This version includes tokenization and POS tagging (by Yoshimasa Tsuruoka), and takes plain text sentences as input. Included: source code, a Linux binary, and WSJ models for tagging and parsing. Do not try to train with this parser; it will not work. If you need to train new parsing models, use the one below.
Anyone is free to download and use the parser and the models included.
However, because this is an alpha release, I strongly recommend you
contact me (sagae+lrdep at cs dot cmu dot edu) if you want to do
anything beyond simple testing.