Language Technologies Ph.D. Thesis Defense
- Remote Access Enabled - Zoom
- Virtual Presentation
- DANIEL SCHWARTZ
- Ph.D. Student
- Language Technologies Institute
- Carnegie Mellon University
Using Multitask Learning to Understand Language Processing in the Brain
Understanding the cognitive processes involved in human language comprehension has been a longstanding goal in the scientific community. While significant progress towards that goal has been made, the processes involved in integrating a sequence of individual word meanings into the meaning of a clause, sentence, or discourse are poorly understood. Recently, the natural language processing (NLP) community has demonstrated that deep language models are, to an extent, capable of representing word meanings and performing integration of those meanings into a representation that can successfully capture the meaning of a sequence. In this thesis, we therefore leverage deep language models as an analysis tool to improve our understanding of human language processing.
In this talk, we first investigate what a deep language model learns when it is trained to predict brain activity recordings. We find evidence that when a deep language model is fine-tuned to predict recordings of brain activity, the information encoded in the model's parameters better represents the information people use to process language than does the information encoded in a model that has not been fine-tuned to predict brain activity. Furthermore, the parameter changes that occur during fine-tuning generalize to predicting an unseen person's brain activity and, to some degree, across different brain activity recording modalities. This suggests that fine-tuning causes the model to learn representations that better fit the latent cognitive processes involved in language, and not just idiosyncrasies of a particular person's brain activity or a particular recording modality.
Next, we develop an analysis method that compares how a model trained in a multitask setting makes its predictions for different kinds of labels on language data from the psycholinguistics, neuroscience, and NLP communities. In our demonstration of the method, we combine roughly eighty different tasks from eleven different datasets that do not share examples. We show that the method rates tasks as highly similar if they are equivalent by definition or are intuitively very similar, even when those tasks come from different datasets and do not share examples.
We then show how the similarity profile of a task (those tasks to which it is most similar) can help us understand the mechanisms a deep language model uses to make its predictions. We also examine the similarities between cognition-relevant tasks and NLP tasks, and find that the mechanisms underlying the model's predictions in cognition-relevant tasks are most related to semantic information, non-core arguments and modifiers in a sentence. The methods developed here can be applied to different sets of tasks to gain different kinds of insight into both deep language models and cognitive processing, and they offer a promising direction for understanding language processing in the brain.
Tom Mitchell (Chair)
Stefan Frank (Radboud University Nijmegen)