Language Technologies Institute Special Talk

  • Amir Zeldes
  • Assistant Professor of Computational Linguistics
  • Department of Linguistics
  • Georgetown University

A Multilayer View of Discourse Relation Graphs

Discourse relations such as ‘contrast’, ‘cause’ or ‘background’ are often postulated to explain our ability to construct coherence in discourse. Discourse analysis frameworks such as Rhetorical Structure Theory (RST) assume that discourse relations can be structured hierarchically, forming a graph or tree of discourse units. In this talk, I will empirically examine properties of discourse graphs using multifactorial methods. Drawing on the richly annotated GUM corpus (Zeldes 2017), which contains 64,000 tokens annotated with 4,700 instances of 20 discourse relations across four English genres, I will suggest refinements to proposed constraints on discourse structures. Using ensemble methods and RNNs trained on multiple annotation layers in the corpus, we can visualize ‘heat maps’ of referential accessibility in discourse graphs, and identify and disambiguate discourse markers in a way that is sensitive to utterance-level context.
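To make the idea of a hierarchical discourse graph concrete, here is a minimal sketch, assuming a simplified RST-style tree in which each satellite unit attaches to a nucleus unit via a labeled relation. The names `Unit`, `attach` and `edges` are hypothetical, the example sentences are a toy illustration, and the sketch ignores multinuclear relations; it is not the GUM corpus format or the method used in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Unit:
    """Hypothetical discourse unit: some text plus the satellites attached to it."""
    text: str
    # Each satellite is stored with the relation label linking it to this unit.
    satellites: List[Tuple[str, Unit]] = field(default_factory=list)

    def attach(self, relation: str, satellite: Unit) -> Unit:
        """Attach a satellite to this unit (the nucleus) and return the nucleus."""
        self.satellites.append((relation, satellite))
        return self


def edges(unit: Unit, depth: int = 1) -> Iterator[Tuple[int, str, str, str]]:
    """Walk the tree top-down, yielding (depth, relation, satellite, nucleus)."""
    for relation, sat in unit.satellites:
        yield depth, relation, sat.text, unit.text
        # Recurse so that satellites of satellites appear one level deeper.
        yield from edges(sat, depth + 1)


# Toy example: a 'cause' satellite that itself has a 'background' satellite.
main = Unit("We cancelled the picnic.")
rain = Unit("It started to rain.")
rain.attach("background", Unit("We had been outside all morning."))
main.attach("cause", rain)

for edge in edges(main):
    print(edge)
```

Depth in this tree is one simple way to talk about how far apart two units are in the hierarchy; real RST annotations add multinuclear relations and span-level grouping that this sketch leaves out.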

Amir Zeldes is a computational linguist specializing in corpus linguistics: the extraction and analysis of linguistic structures in digital text collections. His main areas of interest lie at the syntax-semantics interface: he is interested in how we say what we want to say, and especially in the kinds of discourse models we retain across sentences. This includes representing entity models of who or what has been mentioned and how those entities are introduced and referred back to, as well as the relationships between utterances as a complex discourse is constructed, such as expressing causality, supporting arguments and opinions with evidence, signaling contrasts and more.

He is also very interested in how we learn to be productive in our first, second and subsequent languages, producing some (but not only, and not just any) utterances and combinations we have never heard before. He believes that many factors constantly and concurrently influence the choice between competing constructions, which means that we need multifactorial methods and multilayer corpus data to understand what we do when we produce and understand language.



Instructor: Graham Neubig
