Question-Answer Dataset

This page provides a link to a corpus of Wikipedia articles, manually generated factoid questions about them, and manually generated answers to these questions, for use in academic research. These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010.

Download

Manually generated factoid question/answer pairs, with difficulty ratings, drawn from Wikipedia articles. The dataset includes the articles, questions, and answers.

Version 1.2 released August 23, 2013 (same data as 1.1, but now released under GFDL and CC BY-SA 3.0)
README.v1.2; Question_Answer_Dataset_v1.2.tar.gz
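
The release archives are gzip-compressed tarballs, so they can be inspected and unpacked with Python's standard `tarfile` module. The following is a minimal sketch, assuming the archive has already been downloaded to the working directory. The function names and the output directory are illustrative, not part of the release, and nothing here assumes a particular layout inside the archive; consult the README for that.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


def extract_archive(archive_path, destination):
    """Extract the archive into `destination` and return the resolved path.

    Every entry is checked first so that nothing is written outside
    `destination` (a basic guard against path traversal).
    """
    destination = Path(destination).resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            target = (destination / member.name).resolve()
            if target != destination and destination not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(destination)
    return destination


if __name__ == "__main__":
    # Assumption: the v1.2 archive sits next to this script.
    archive = "Question_Answer_Dataset_v1.2.tar.gz"
    for name in list_archive_members(archive):
        print(name)
    # "qa_dataset" is a hypothetical output directory name.
    extract_archive(archive, "qa_dataset")
```

Listing the members before extracting is a cheap way to see what the archive contains without writing anything to disk.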

Archived Releases

Version 1.1 released August 6, 2010
README.v1.1; Question_Answer_Dataset_v1.1.tar.gz
Version 1.0 released February 18, 2010
README.v1.0; Question_Answer_Dataset_v1.0.tar.gz

Acknowledgments

This research project was supported by NSF IIS-0713265 (to Smith), an NSF Graduate Research Fellowship (to Heilman), NSF IIS-0712810 and IIS-0745914 (to Hwa), and the Institute of Education Sciences, U.S. Department of Education R305B040063 (to Carnegie Mellon).
