GDep (GENIA Dependency parser)

A dependency parser for biomedical text developed by Kenji Sagae at Tsujii Lab (University of Tokyo) and the Institute for Creative Technologies (University of Southern California).

This is a version of the KSDep dependency parser trained on the GENIA Treebank for parsing biomedical text. KSDep is described in

Sagae, K. and Tsujii, J. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. Proceedings of the CoNLL 2007 Shared Task. Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL'07). Prague, Czech Republic.

and was used in the experiments in

Miyao, Y., Saetre, R., Sagae, K., Matsuzaki, T. and Tsujii, J. 2008. Task-oriented Evaluation of Syntactic Parsers and Their Representations. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT).


GDep beta1
Source code and models for tagging and parsing biomedical text.

Executables for Linux, Mac OS X and Windows coming soon.

Building GDep

To build GDep on Linux, Mac OS X, or Windows with Cygwin, you need gcc. Unpack the archive with
tar xzvf gdep-beta1.tar.gz

Then type
cd gdep-beta1
make

This will produce an executable named gdep.

Using GDep

To parse biomedical text, simply type
./gdep INPUTFILE
where INPUTFILE is a text file containing one sentence per line.
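GDep expects its input to be split into sentences already, one per line. If your text is in paragraphs (for example, abstracts), a rough conversion can be sketched in POSIX shell as below. The function name split_sentences is hypothetical and not part of GDep, and its rule (end a sentence at ".", "!" or "?" when followed by whitespace and an uppercase letter) is a naive heuristic: it will wrongly split after abbreviations such as "et al." when the next word is capitalized, so a dedicated sentence splitter is preferable for real corpora.

```shell
# Hypothetical helper, not part of GDep: convert running text read from
# stdin into one sentence per line on stdout. Naive heuristic: a sentence
# ends at ".", "!" or "?" followed by whitespace and an uppercase letter.
split_sentences() {
  # Join all input lines into one line so sentences that wrap across
  # line breaks are handled, then split with awk.
  tr '\n' ' ' | awk '{
    rest = $0
    sub(/^[ \t]+/, "", rest)  # drop leading whitespace
    while (match(rest, /[.!?][ \t]+[A-Z]/)) {
      # Print through the sentence-final punctuation mark.
      print substr(rest, 1, RSTART)
      # Resume at the uppercase letter that starts the next sentence.
      rest = substr(rest, RSTART + RLENGTH - 1)
    }
    sub(/[ \t]+$/, "", rest)  # drop trailing whitespace
    if (rest != "") print rest
  }'
}
```

For example, the following writes one sentence per line to abstract.sent (both file names are placeholders), which can then be given to GDep as INPUTFILE:
split_sentences < abstract.txt > abstract.sent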

Output is written to stdout. To save the output to a file, type
./gdep INPUTFILE > OUTPUTFILE
where OUTPUTFILE is the file where the parser output will be written.
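To parse many files, each into its own output file, the same redirection can be applied in a loop. The sketch below is hypothetical: the function parse_dir, the .txt/.dep naming convention, and the GDEP variable (used to point at the executable, defaulting to ./gdep) are illustrative choices, and it assumes GDep is invoked with the input file as its only argument, as described above.

```shell
# Hypothetical helper, not part of GDep: parse every .txt file in a
# directory, writing each parse to a .dep file with the same base name.
# Assumes the invocation form "gdep INPUTFILE"; the executable defaults
# to ./gdep unless the GDEP variable names another path.
parse_dir() {
  dir=$1
  parser=${GDEP:-./gdep}
  for f in "$dir"/*.txt; do
    # If no .txt files exist, the glob stays unexpanded; skip it.
    [ -e "$f" ] || continue
    # GDep writes to stdout, so redirect each parse into its own file.
    # Stop at the first failure so a broken parse is not overlooked.
    "$parser" "$f" > "${f%.txt}.dep" || return 1
  done
}
```

For example, with a built gdep in the current directory, parse_dir abstracts would write abstracts/x.dep for each abstracts/x.txt.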

If you don't specify an input file, GDep accepts input from stdin.

Anyone is free to download and use the parser and the models included for research purposes. However, because this is a beta release, I strongly recommend you contact me (sagae+lrdep at cs dot cmu dot edu) if you want to do anything beyond simple testing.