SCS Researchers Honored by the Association for Computational Linguistics

Aaron Aupperlee | Tuesday, August 22, 2023

Work by several researchers in the School of Computer Science received awards and honors at the recent Annual Meeting of the Association for Computational Linguistics.

Maarten Sap, an assistant professor in the Language Technologies Institute (LTI), and Jenny Liang, a Ph.D. student in the Software and Societal Systems Department, were among the researchers who won an Outstanding Paper Award for "NLPositionality: Characterizing Design Biases of Datasets and Models."

NLPositionality provides a framework for characterizing design biases and quantifying the positionality of natural language processing (NLP) data sets and models. The researchers found that data sets and models align predominantly with Western, white, college-educated and younger populations, and that certain groups, such as nonbinary people and nonnative English speakers, were further marginalized by data sets and models. The paper also discusses how researchers can examine their own positionality by considering how their identity, background and life experiences influence their research, data sets and models — opening the door for more inclusive NLP systems.
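At its core, the framework asks whose judgments a data set or model agrees with, by comparing its labels against annotations gathered from identifiable demographic groups. The sketch below illustrates that style of measurement in a minimal way; the group names, toy labels and function name are placeholders for illustration, not the paper's released code.

```python
# Hedged sketch of the kind of alignment measurement NLPositionality
# performs: compare a model's labels against annotations from different
# demographic groups and see whom the model agrees with most. All data
# and names here are toy placeholders, not the paper's code or results.

import numpy as np

def group_alignment(model_labels, annotations_by_group):
    """Pearson correlation between model labels and each group's labels."""
    scores = {}
    m = np.asarray(model_labels, dtype=float)
    for group, labels in annotations_by_group.items():
        g = np.asarray(labels, dtype=float)
        scores[group] = np.corrcoef(m, g)[0, 1]
    return scores

# Toy 0/1 labels for eight instances of, say, a toxicity judgment task.
model = [1, 0, 1, 1, 0, 0, 1, 0]
annotations = {
    "college-educated": [1, 0, 1, 1, 0, 0, 1, 1],
    "non-native English speakers": [0, 0, 1, 0, 1, 0, 1, 0],
}
print(group_alignment(model, annotations))
```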

Sachin Kumar, a Ph.D. student in the LTI, contributed to "Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker," which also received an Outstanding Paper Award. The researchers developed SymbolicToM, a plug-and-play algorithm that enhances the theory of mind (ToM) of off-the-shelf neural language models.

Theory of mind, the ability to reason about the mental states of other people, is a key element of human social intelligence. Despite their increasingly impressive performance, large-scale neural language models still lack basic ToM capabilities. SymbolicToM reasons about the belief states of multiple characters in reading comprehension tasks, tracking each entity's beliefs, their estimates of other entities' beliefs and higher-order levels of reasoning. This method allows for more precise and interpretable reasoning than previous approaches.
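As a concrete illustration of the kind of bookkeeping the article describes, the toy tracker below maintains explicit first- and second-order beliefs on the classic Sally-Anne false-belief test. Everything here, from the class name to the update rule, is an invented simplification rather than the actual SymbolicToM implementation.

```python
# Minimal sketch of multi-character belief tracking in the spirit of
# SymbolicToM. The names and update logic are illustrative assumptions.

class BeliefTracker:
    """Tracks first- and second-order beliefs about object locations."""

    def __init__(self, characters):
        # beliefs[("Anne",)] -> Anne's beliefs about the world.
        # beliefs[("Anne", "Sally")] -> Anne's estimate of Sally's beliefs.
        self.beliefs = {}
        for a in characters:
            self.beliefs[(a,)] = {}
            for b in characters:
                if a != b:
                    self.beliefs[(a, b)] = {}

    def observe(self, witnesses, obj, location):
        """An event (obj moved to location) seen only by `witnesses`."""
        for a in witnesses:
            self.beliefs[(a,)][obj] = location
            # Each witness also updates its model of the other witnesses.
            for b in witnesses:
                if a != b:
                    self.beliefs[(a, b)][obj] = location

    def query(self, chain, obj):
        """Where does chain[0] think (chain[1] thinks ...) obj is?"""
        return self.beliefs[tuple(chain)].get(obj, "unknown")


# Classic Sally-Anne false-belief test.
tracker = BeliefTracker(["Sally", "Anne"])
tracker.observe(["Sally", "Anne"], "marble", "basket")   # both see it
tracker.observe(["Anne"], "marble", "box")               # Sally is absent

print(tracker.query(["Anne"], "marble"))           # box
print(tracker.query(["Sally"], "marble"))          # basket (false belief)
print(tracker.query(["Anne", "Sally"], "marble"))  # basket
```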

LTI Professor Alex Rudnicky and Ta-Chung Chi, an LTI Ph.D. student, won an Outstanding Paper Award for their contributions to "Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis." Length extrapolation lets researchers train a transformer language model on short sequences while preserving its perplexity when it is tested on substantially longer ones.

The research examined some of the most common length extrapolation methods using receptive field analysis, aided by a novel cumulative normalized gradient tool. The researchers found that alignment between the receptive field and the training sequence length is the key to successful length extrapolation. This finding motivated Sandwich, a new relative positional embedding method that makes genuine use of the additional information in longer sequences.
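For intuition about how a receptive field interacts with length extrapolation, the toy attention function below penalizes attention logits by token distance, which bounds how far each position effectively looks. The penalty schedule is an assumption chosen for illustration; this is a generic relative bias, not the Sandwich method itself.

```python
# Toy illustration of how a relative positional bias shapes a
# self-attention receptive field. A generic sketch, not the paper's
# Sandwich embedding; the decay rate below is an assumption.

import numpy as np

def attention_with_relative_bias(q, k, v, decay=0.1):
    """Single-head causal attention whose logits are penalized by distance.

    A bias that decays with |i - j| keeps the effective receptive field
    bounded, the property the paper ties to successful extrapolation
    beyond the training sequence length.
    """
    seq_len, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    # Relative-position penalty: tokens far apart attend weakly.
    i, j = np.indices((seq_len, seq_len))
    logits = logits - decay * np.abs(i - j)
    # Causal mask: position i may only attend to positions j <= i.
    logits = np.where(j <= i, logits, -np.inf)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))          # 16 tokens, hidden dim 8
out = attention_with_relative_bias(x, x, x)
print(out.shape)                      # (16, 8)
```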

Rudnicky and Chi joined Li-Wei Chen, also an LTI Ph.D. student, on "Latent Positional Information Is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings," which received an honorable mention. The research supports recent work that recommended discarding positional embeddings to facilitate more efficient pretraining of transformer language models.
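The effect is easy to see in a toy setting: with no positional embeddings and causal attention, each position averages over a prefix of a different length, so the variance of its output betrays its position. The demo below assumes uniform attention over the causal prefix, a simplification of learned attention.

```python
# Toy demonstration of the paper's central observation: even without
# positional embeddings, causal self-attention leaks position through
# the variance of its outputs. Uniform attention over the prefix is a
# simplifying assumption, not the paper's full analysis.

import numpy as np

rng = np.random.default_rng(0)
seq_len, d, trials = 64, 32, 200
variances = np.zeros(seq_len)

for _ in range(trials):
    x = rng.normal(size=(seq_len, d))
    # Uniform causal attention: position i averages tokens 0..i.
    out = np.cumsum(x, axis=0) / np.arange(1, seq_len + 1)[:, None]
    variances += out.var(axis=1)

variances /= trials
# Variance decays roughly like 1/(i+1): later positions are "flatter",
# so a model can in principle read position off this signal.
print(np.round(variances[[0, 7, 31, 63]], 3))
```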

Graham Neubig, an associate professor in the LTI; LTI Ph.D. students Patrick Fernandes and Emmy Liu; Kayo Yin, a former master's student in the LTI now pursuing her Ph.D. at the University of California, Berkeley; and André Martins, a former Ph.D. student in the CMU-Portugal dual-degree Ph.D. program and now an associate professor at Instituto Superior Técnico, won the Resource Award for their work, "When Does Translation Require Context? A Data-Driven, Multilingual Exploration." The work seeks to improve machine translation by addressing elements of text or speech that require context to interpret properly, known as discourse phenomena.

In the paper, the researchers developed the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers that identify discourse phenomena and evaluate how well models translate them. They released code and data for 14 language pairs to encourage the multilingual machine translation community to focus on accurately capturing discourse phenomena.
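To make the idea of a discourse tagger concrete, the minimal sketch below flags tokens whose translation typically depends on surrounding context, such as anaphoric pronouns that force a grammatical gender choice in many target languages. The rule and word list are toy assumptions, not the released MuDA taggers.

```python
# Illustrative sketch of a discourse-phenomenon tagger in the spirit of
# MuDA. The single rule below (flagging ambiguous anaphoric pronouns) is
# a toy assumption; the released taggers are far more sophisticated and
# cover 14 language pairs.

AMBIGUOUS_PRONOUNS = {"it", "they", "them", "this", "that"}

def tag_context_dependent_tokens(sentence):
    """Return (token, needs_context) pairs for a source sentence.

    Pronouns like English "it" often cannot be translated in isolation:
    many target languages force a grammatical gender that depends on an
    antecedent in a previous sentence.
    """
    tags = []
    for token in sentence.split():
        word = token.strip(".,!?").lower()
        tags.append((token, word in AMBIGUOUS_PRONOUNS))
    return tags

print(tag_context_dependent_tokens("I bought a bike and it was cheap."))
# [('I', False), ('bought', False), ..., ('it', True), ...]
```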

For more on this and related research, visit the Association for Computational Linguistics Annual Meeting website.

For More Information

Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu