The fields of reading comprehension and question answering are developing rapidly, in terms of both modeling and data work. There are currently over 100 datasets, more than 40 of which were published after 2018. However, most new datasets get "solved" soon after publication, largely not because of the verbal reasoning capabilities of our models, but because of annotation artifacts and shallow cues in the data that the models can exploit. This talk discusses the main methodological issues, as well as the latest tips and tricks for formulating challenging questions. The target audience is both researchers working on benchmark development and NLP practitioners who would like to know what current benchmarks are actually measuring.
Anna Rogers is a postdoctoral associate at the University of Copenhagen. Her main research areas are interpretability, evaluation, and analysis of deep learning models for NLP. She is also active in meta-research and NLP methodology, working on issues in peer review and organizing the Workshop on Insights from Negative Results in NLP (EMNLP 2020, 2021).
Zoom Participation. See announcement.