- Maggie Henderson (postdoc, co-advised by Mike Tarr)
- Aria Wang (PNC/MLD PhD student, co-advised by Mike Tarr)
- Jennifer Williams (CBD PhD student)
- Ruogu Lin (CBD PhD student)
- Andrew Luo (PNC/MLD PhD student, co-advised by Mike Tarr)
- Tara Pirnia (PNC/MLD PhD student, co-advised by Bonnie Nozari)
- Joel Ye (PNC PhD student, co-advised by Rob Gaunt)
- Yuchen Zhou (CNBC/Psych PhD student, co-advised by Mike Tarr)
- Mariya Toneva (MLD PhD student, co-advised by Tom Mitchell) - Assistant Professor at the Max Planck Institute for Software Systems
- Anand Bollu (CS Master's student) - Applied Intuition
- Srinivas Ravishankar (MLD Master's student) - IBM Research
- Aniketh Reddy (MLD Master's student) - UC Berkeley PhD student
- Nidhi Jain (CS Master's student)
Success in AI is often defined as achieving human-level performance on tasks such as text or scene understanding. To perform like the human brain, is it useful for neural networks to have representations that are similar to brain representations?
In these projects, we use brain activity recordings to interpret neural network representations, to attempt to find heuristics to improve them, and even to change the weights learned by networks to make them more brain-like. The results promise an exciting research direction.
In this project, we use functional Magnetic Resonance Imaging (fMRI) to record the brain activity of subjects while they read an unmodified chapter of a popular book. We model the measured brain activity as a function of the content of the text being read. Our model is able to extrapolate to predict brain activity for novel passages of text, beyond those on which it has been trained. Not only can our model decode from brain activity which passage of text was being read, but it can also report which type of information about the text (syntax, semantic properties, narrative events, etc.) modulates the activity of every brain region. Using this model, we found that the different regions usually associated with language appear to process different types of linguistic information. We were able to build detailed reading representation maps, in which each voxel is labeled by the type of information the model suggests it is processing.
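The core pipeline, fitting a regularized linear encoding model on annotated text features and evaluating it on held-out passages, can be sketched as follows. This is a minimal illustration using synthetic data in place of real fMRI recordings and stimulus annotations; the dimensions, feature counts, and the choice of ridge regression with a fixed penalty are assumptions for the example, not the study's actual settings.

```python
# Sketch of an encoding model: predict per-voxel activity from text features,
# then score predictions on held-out (novel) passages. All data are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated stimulus features: each row is one passage, annotated with
# (hypothetical) syntactic, semantic, and narrative features.
n_samples, n_features, n_voxels = 200, 20, 50
X = rng.standard_normal((n_samples, n_features))

# Simulated brain activity: a linear response to the features plus noise.
true_weights = rng.standard_normal((n_features, n_voxels))
Y = X @ true_weights + 0.5 * rng.standard_normal((n_samples, n_voxels))

# Fit on some passages, predict activity for held-out passages.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-voxel prediction accuracy: correlation of predicted vs. observed activity.
corrs = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean voxel correlation: {np.mean(corrs):.2f}")
```

In the real setting, examining which feature groups carry a voxel's predictive weight is what labels that voxel in the reading representation maps.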
This approach is valuable in several ways. We are able not only to detect where language processing increases brain activity, but also to reveal what type of information is encoded in each of the regions classically reported as responsive to language. From just one experiment, we can reproduce multiple findings; had we followed the classical method, each of our results would have required its own experiment. This approach could make neuroimaging much more flexible: if a researcher develops a new reading theory after running an experiment, they can annotate the stimulus text accordingly and test the theory against the previously recorded data, without having to collect new experimental data.
To study the sub-word dynamics of story reading, we turned to Magnetoencephalography (MEG), which records brain activity with millisecond time resolution. We recorded MEG activity while subjects performed the same naturalistic task of reading a complex chapter from a popular novel. We were interested in identifying the different stages of continuous meaning construction as subjects read a text. We noticed the similarity between the human brain and neural network language models, which can "read" a text word by word and predict the next word in a sentence. Both the models and the brain have to maintain a representation of the previous context, represent the features of the incoming word, and integrate that word with the previous context before moving on to the next word.
We used the neural network language model to detect these different processes in brain data. Our novel results include a suggested timeline of how the brain updates its representation of context. They also demonstrate the incremental perception of every new word, starting early in the visual cortex, moving next to the temporal lobes, and finally to the frontal regions. Furthermore, the results suggest that the integration process occurs in the temporal lobes after the new word has been perceived.
Abstract: Many statistical models for natural language processing exist, including context-based neural networks that (1) model the previously seen context as a latent feature vector, (2) integrate successive words into the context using some learned representation (embedding), and (3) compute output probabilities for incoming words given the context. On the other hand, brain imaging studies have suggested that during reading, the brain (a) continuously builds a context from the successive words, and every time it encounters a word it (b) fetches its properties from memory and (c) integrates it with the previous context, with a degree of effort that is inversely proportional to how probable the word is. This hints at a parallelism between the neural networks and the brain in modeling context (1 and a), representing the incoming words (2 and b), and integrating them (3 and c). We explore this parallelism to better understand brain processes and neural network representations. We study the alignment between the latent vectors used by neural networks and the brain activity observed via Magnetoencephalography (MEG) when subjects read a story. For that purpose, we apply the neural network to the same text the subjects are reading, and explore the ability of these three vector representations to predict the observed word-by-word brain activity.
Our novel results show that: first, before a new word i is read, brain activity is well predicted by the neural network's latent representation of context, and this predictability decreases as the brain integrates the word and changes its own representation of context. Second, the neural network embedding of word i can predict the MEG activity when word i is presented to the subject, revealing that it is correlated with the brain's own representation of word i. Moreover, we find that the activity is predicted in different regions of the brain with varying delay, consistent with the placement of each region on the processing pathway that starts in the visual cortex and moves to higher-level regions. Finally, we show that the output probability computed by the neural network agrees with the brain's own assessment of the probability of word i: it can be used to predict the brain activity after word i's properties have been fetched from memory, while the brain is in the process of integrating the word into the context.
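The delay analysis in this abstract can be pictured as follows: regress the network's word vectors onto MEG data at several candidate delays and see where prediction peaks. Everything below (the dimensions, the two-sample ground-truth lag, the use of RidgeCV on simulated sensor data) is an invented illustration of the idea, not the paper's actual pipeline.

```python
# Toy lag analysis: at which delay do word embeddings best predict MEG data?
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
n_words, dim, n_sensors = 300, 16, 10

# Stand-in for the language model's embedding of each word in the story.
embeddings = rng.standard_normal((n_words, dim))

# Simulated MEG: sensors respond to each word's embedding two samples later.
true_lag = 2
W = rng.standard_normal((dim, n_sensors))
meg = 0.3 * rng.standard_normal((n_words, n_sensors))
meg[true_lag:] += embeddings[:-true_lag] @ W

def prediction_score(lag):
    """Mean held-out correlation when predicting MEG at the given delay."""
    X, Y = embeddings[: n_words - lag], meg[lag:]
    split = len(X) // 2
    model = RidgeCV().fit(X[:split], Y[:split])
    pred = model.predict(X[split:])
    return np.mean([np.corrcoef(pred[:, s], Y[split:, s])[0, 1]
                    for s in range(n_sensors)])

scores = {lag: prediction_score(lag) for lag in range(5)}
best_lag = max(scores, key=scores.get)
print("best delay:", best_lag)
```

In the real analysis, the best-predicting delay differs by brain region, which is what places each region along the visual-to-frontal processing pathway.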
Abstract: Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and the different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies, each of which focuses on one aspect of language processing, and that offer new insights on which types of information are processed by the different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single-subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.
We present a methodological approach employing magnetoencephalography (MEG) and machine learning techniques to investigate the flow of perceptual and semantic information decodable from neural activity in the half second during which the brain comprehends the meaning of a concrete noun. Important information about the cortical location of neural activity related to the representation of nouns in the human brain has been revealed by past studies using fMRI. However, the temporal sequence of processing from sensory input to concept comprehension remains unclear, in part because of the poor time resolution provided by fMRI. In this study, subjects answered 20 questions (e.g. is it alive?) about the properties of 60 different nouns prompted by simultaneous presentation of a pictured item and its written name. Our results show that the neural activity observed with MEG encodes a variety of perceptual and semantic features of stimuli at different times relative to stimulus onset, and in different cortical locations. By decoding these features, our MEG-based classifier was able to reliably distinguish between two different concrete nouns that it had never seen before. The results demonstrate that there are clear differences between the time course of the magnitude of MEG activity and that of decodable semantic information. Perceptual features were decoded from MEG activity earlier in time than semantic features, and features related to animacy, size, and manipulability were decoded consistently across subjects. We also observed that regions commonly associated with semantic processing in the fMRI literature may not show high decoding results in MEG. We believe that this type of approach and the accompanying machine learning methods can form the basis for further modeling of the flow of neural information during language processing and a variety of other cognitive processes.
Abstract: Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high-dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a suite of approaches which regularize via shrinkage: ridge regression, the elastic net (a generalization of ridge regression and the lasso), and a hierarchical Bayesian model based on small-area estimation (SAE) ideas. The SAE approach draws heavily on borrowing strength from related areas so as to make estimates more precise. We contrast regularization with spatial smoothing and combinations of smoothing and shrinkage. All methods are tested on functional magnetic resonance imaging data from multiple subjects participating in two different experiments related to reading, for both predicting neural response to stimuli and decoding stimuli from responses. Interestingly, cross validation (CV) automatically picks very low/high regularization parameters in regions where the classification accuracy is high/low, indicating that CV is a good tool for identification of relevant voxels for each feature. However, surprisingly, all the regularization methods work equally well, suggesting that beating basic smoothing and shrinkage will take not just clever methods, but careful modeling.
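The CV observation in this abstract, that the selected penalty tracks how informative a voxel is, can be illustrated with two simulated voxels: one driven by the stimulus features, one pure noise. The setup below is contrived for illustration and uses scikit-learn's RidgeCV rather than the paper's full suite of methods.

```python
# CV-chosen ridge penalties for an informative voxel vs. a pure-noise voxel.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
n, p = 120, 15
X = rng.standard_normal((n, p))       # stimulus features
w = rng.standard_normal(p)

# Voxel 1: strong stimulus-driven response. Voxel 2: noise only.
y_signal = X @ w + 0.5 * rng.standard_normal(n)
y_noise = rng.standard_normal(n)

# Let cross-validation pick the penalty separately for each voxel.
alphas = np.logspace(-2, 4, 25)
alpha_signal = RidgeCV(alphas=alphas).fit(X, y_signal).alpha_
alpha_noise = RidgeCV(alphas=alphas).fit(X, y_noise).alpha_
print(f"informative voxel alpha: {alpha_signal:.3g}")
print(f"noise voxel alpha:       {alpha_noise:.3g}")
```

CV shrinks the uninformative voxel much harder, which is why the selected penalty itself can serve as a marker of voxel relevance.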
Abstract: This paper deals with the problem of nonparametric independence testing, a fundamental decision-theoretic problem that asks if two arbitrary (possibly multivariate) random variables X,Y are independent or not, a question that comes up in many fields like causality and neuroscience. While quantities like correlation of X,Y only test for (univariate) linear independence, natural alternatives like mutual information of X,Y are hard to estimate due to a serious curse of dimensionality. A recent approach, avoiding both issues, estimates norms of an operator in Reproducing Kernel Hilbert Spaces (RKHSs). Our main contribution is strong empirical evidence that by employing shrunk operators when the sample size is small, one can attain an improvement in power at low false positive rates. We analyze the effects of Stein shrinkage on a popular test statistic called HSIC (Hilbert-Schmidt Independence Criterion). Our observations provide insights into two recently proposed shrinkage estimators, SCOSE and FCOSE - we prove that SCOSE is (essentially) the optimal linear shrinkage method for estimating the true operator; however, the non-linearly shrunk FCOSE usually achieves greater improvements in test power. This work is important for more powerful nonparametric detection of subtle nonlinear dependencies for small samples.
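To make the test statistic concrete, here is a minimal computation of the (unshrunk, biased) HSIC estimator with Gaussian kernels on a dependence that correlation misses; the shrinkage variants (SCOSE, FCOSE) analyzed in the paper are not shown, and the kernel bandwidth and sample sizes are arbitrary choices for the example.

```python
# Biased HSIC estimate with Gaussian kernels: (1/n^2) * trace(K H L H).
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Pairwise Gaussian kernel matrix for 1-D samples."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y):
    """Biased empirical HSIC of two 1-D samples."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = gaussian_kernel(x), gaussian_kernel(y)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(4)
x = rng.standard_normal(200)
y_dep = x ** 2 + 0.1 * rng.standard_normal(200)   # nonlinear dependence
y_ind = rng.standard_normal(200)                  # truly independent

print(f"HSIC(x, x^2 + noise): {hsic(x, y_dep):.4f}")
print(f"HSIC(x, independent): {hsic(x, y_ind):.4f}")
```

Note that x and x^2 have near-zero linear correlation, so a correlation test would miss this dependence entirely, while HSIC separates the two cases; a permutation test on the statistic then yields p-values.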
Abstract: As a person reads, the brain performs complex operations to create higher order semantic representations from individual words. While these steps are effortless for competent readers, we are only beginning to understand how the brain performs these actions. Here, we explore semantic composition using magnetoencephalography (MEG) recordings of people reading adjective-noun phrases presented one word at a time. We track the neural representation of semantic information over time, through different brain regions. Our results reveal several novel findings: 1) the neural representation of adjective semantics observed during adjective reading is reactivated after phrase reading, with remarkable consistency, 2) a neural representation of the adjective is also present during noun presentation, but this neural representation is the reverse of that observed during adjective presentation, and 3) the neural representation of adjective semantics is oscillatory and entrained to alpha-band frequencies. We also introduce a new method for analyzing brain image time series called Time Generalized Averaging. Taken together, these results paint a picture of information flow in the brain as phrases are read and understood.
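Tracking how a representation evolves (and can reverse) over time is commonly done with a temporal-generalization analysis: train a decoder at one time point and test it at all others. The sketch below is that generic analysis on simulated data, not the paper's Time Generalized Averaging method; the sign flip halfway through the simulated signal is an invented stand-in for a "reversed" representation like the one described for adjectives.

```python
# Temporal generalization: decoders trained at time t, tested at time t'.
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_sensors, n_times = 200, 30, 6
stim = rng.standard_normal(n_trials)          # per-trial semantic feature
pattern = rng.standard_normal(n_sensors)      # spatial pattern of the code

data = 0.5 * rng.standard_normal((n_trials, n_sensors, n_times))
for t in range(n_times):
    sign = 1.0 if t < 3 else -1.0             # representation flips sign
    data[:, :, t] += sign * np.outer(stim, pattern)

def decode_score(train_t, test_t):
    """Correlation between the stimulus feature and a least-squares
    decoder trained at train_t, applied at test_t (split-half)."""
    w, *_ = np.linalg.lstsq(data[:100, :, train_t], stim[:100], rcond=None)
    pred = data[100:, :, test_t] @ w
    return np.corrcoef(pred, stim[100:])[0, 1]

gen = np.array([[decode_score(tr, te) for te in range(n_times)]
                for tr in range(n_times)])
print(np.round(gen, 2))
```

In the resulting matrix, strong positive off-diagonal scores mean the same code persists across time, while strong negative scores mean the code is present but with reversed sign, the signature reported for the adjective during noun presentation.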