Manaal Faruqui

Lexical & Distributional Semantics Evaluation Benchmarks

This page lists evaluation benchmarks, in no particular order, that researchers working in semantics have developed and released as open source. If you know of a resource that belongs here, please email me.

A more up-to-date version of this page is available here.


Word Similarity

  1. WordSim-353. [data and reference]

  2. WordSim-353 similarity. [data and reference]

  3. WordSim-353 relatedness. [data and reference]

  4. Rubenstein and Goodenough. [data] [reference]

  5. Miller and Charles. [data] [reference]

  6. Word pair similarity using MTurk. [data] [reference]

  7. MEN dataset of word pair similarity. [data and reference]

  8. Word pair similarity in context. [data] [reference]

  9. Rare word similarity dataset. [data and reference]
    • 2034 pairs of relatively rare words, each with a human similarity score.
    • Example: belligerence hostility 8.6

  10. TOEFL word pair similarity. [reference]
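
As a rough illustration of how a word similarity benchmark like those above is typically used, the sketch below scores a set of word vectors against human judgments with Spearman rank correlation, a commonly reported metric for this task. It assumes one whitespace-separated `word1 word2 score` triple per line, as in the rare word example above; other datasets may use a different layout. The function names (`parse_pairs`, `evaluate`, and so on) and the toy vectors are hypothetical and are not part of any dataset's released code.

```python
import math


def parse_pairs(lines):
    """Parse 'word1 word2 score' lines into (word1, word2, float) tuples."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pairs.append((parts[0], parts[1], float(parts[2])))
    return pairs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (0.0 if either input has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs, vectors):
    """Return (Spearman rho, number of pairs scored).

    Pairs with a word missing from `vectors` are skipped.
    """
    gold, pred = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            pred.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, pred), len(gold)
```

To use it, read the dataset file into a list of lines, pass it through `parse_pairs`, and call `evaluate` with a dictionary mapping each word to its vector. Reporting the number of pairs actually scored alongside the correlation matters, because pairs with out-of-vocabulary words are silently dropped here.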

Word Relations

  1. Syntactic word relations. [code, data and reference]

  2. Semantic word relations. [code, data and reference]

  3. SAT word analogy. [reference]

  4. Nouns and their colors. [data] [reference]

  5. BLESS collection. [data] [reference]

  6. Similarity scores between phrases. [data] [reference]

  7. SemEval 2010 Task 8. [data and reference]

  8. SemEval 2012 Task 2. [data and reference]

Other Properties

  1. MSR Sentence completion dataset. [data and reference]

  2. Noun-noun entailment. [data] [reference]

  3. Concreteness/Abstractness. [reference]

  4. TroFi. [data and reference]

  5. Intensional/Non-intensional adj-noun pairs. [data] [reference]

  6. Literal/Non-literal usage of colors. [data] [reference]

Other Languages

  1. WordSim-353 for different languages

  2. Miller and Charles for different languages

  3. German: [data and reference]