AQMAR Arabic Wikipedia Named Entity Corpus & Tagger
Corpus
This is a 74,000-token corpus of 28 Arabic Wikipedia articles hand-annotated for named entities.
Tagger
This is a tagger for Arabic text, implemented in Java. It includes a pretrained named entity model.
The tagger is available for download on GitHub.
Further Reading
If you use the data above in a paper, please cite the following:
Acknowledgments
This research was supported by Qatar National Research Fund grant NPRP 08-485-1-083.
Contact
Please e-mail behrang [strudel] cmu.edu or nschneid [strudel] cs.cmu.edu with questions.