Machine Learning / Duolingo Seminar

  • Remote Access Enabled - Zoom
  • Virtual Presentation
  • Assistant Professor of Linguistics, Data Science & Computer Science
  • Department of Linguistics, Center for Data Science, and Department of Computer Science
  • New York University

How do we fix natural language understanding evaluation?

We'd like computers to understand language. Tasks like textual entailment, question answering, and coreference resolution offer an appealing way to isolate and measure key natural language understanding (NLU) skills and, for a while, datasets for these tasks served as useful benchmarks for research. Recently, though, we've reached human-level performance on most of these datasets, despite abundant evidence that we haven't solved NLU. So what do we do now?

This talk opens with a brief discussion of what I learned from organizing GLUE and SuperGLUE—two multitask benchmark competitions for NLU—and then transitions into an opinionated survey of recent ideas about evaluation from across the NLU research community, ending with some vague suggestions and open questions about what's next. I'll try to leave plenty of time for discussion, so bring your own provocative ideas about evaluation.

Sam Bowman has been an assistant professor at NYU since 2016, when he completed a PhD with Chris Manning and Chris Potts at Stanford. At NYU, Sam is part of the new school-level Center for Data Science, which focuses on machine learning, as well as the Department of Linguistics and the Department of Computer Science. Sam's research focuses on data, evaluation techniques, and modeling techniques for sentence and paragraph understanding in natural language processing, and on applications of machine learning to scientific questions in linguistic syntax and semantics. Sam led a twenty-three-person research team at JSALT 2018, organized the GLUE and SuperGLUE benchmark competitions, and received a 2015 EMNLP Best Resource Paper Award, a 2017 Google Faculty Research Award, and a 2019 *SEM Best Paper Award.

Zoom Participation. See announcement.
