AQMAR Arabic Wikipedia Named Entity Corpus & Tagger

These resources were developed by Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, and Noah Smith as part of the AQMAR project.


The corpus consists of 74,000 tokens from 28 Arabic Wikipedia articles, hand-annotated for named entities.


The tagger for Arabic text is implemented in Java and includes a pretrained named entity model.

Further Reading

Please cite the following in any paper that uses the data above:


This research was supported by Qatar National Research Fund grant NPRP 08-485-1-083.


Please e-mail behrang [strudel] or nschneid [strudel] with questions.