AQMAR Arabic Wikipedia Named Entity Corpus & Tagger
The corpus is a 74,000-token collection of 28 Arabic Wikipedia articles, hand-annotated for named entities.
The tagger, implemented in Java, processes Arabic text and includes a pretrained named entity model.
The tagger is available for download on GitHub.
Please cite the following if you write any papers involving the use of the data above:
This research was supported by Qatar National Research Fund grant NPRP 08-485-1-083.
Please e-mail behrang [strudel] cmu.edu or nschneid [strudel] cs.cmu.edu with questions.