Machine Learning Thesis Defense

  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University
Thesis Orals

Using Machine Learning for Time Series to Elucidate Sentence Processing in the Brain

Language comprehension is a crucial human ability, but many questions remain unanswered about the processing of sentences. Specifically, how can sentences that are structured differently, e.g. "The woman helped the man." and "The man was helped by the woman.", map to the same proposition? High temporal resolution neuroimaging, coupled with machine learning, can potentially provide answers. Using magnetoencephalography (MEG), we can measure the activity of many neurons at a rate of 1 kHz while humans read sentences. With machine learning, we can decode sentence attributes from the neural activity and gain insight into the inner computations of the brain during sentence comprehension.

We collected data from subjects reading active and passive voice sentences in two experiments: a pilot set and a confirmation set. The pilot set constituted a testbed for optimizing the application of machine learning to MEG data, and was used for exploratory analysis to generate data-driven hypotheses. The confirmation set allowed for confirmation of these hypotheses via replication.

Through exploration of the pilot data set, we are able to make several concrete recommendations on the optimal application of machine learning to MEG data. Specifically, we demonstrate that combining data from multiple human subjects as additional features significantly improves classifier performance, even without additional data samples. Furthermore, we show that while test set signal-to-noise ratio (SNR) is critical for classifier performance, training set SNR has limited impact. We achieve near-perfect classification accuracy on a wide range of tasks that decode sentence attributes from neural activity. We also explored a non-machine learning technique, representational similarity analysis (RSA), that is quite popular for analyzing neuroimaging data, and show that combining data across subjects again greatly improves performance.
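As a rough illustration of what "combining data from multiple human subjects as additional features" could look like in practice, the sketch below concatenates per-subject feature matrices along the feature axis, so the number of samples stays fixed while the feature count grows. Everything here is an assumption made for illustration: the function names, the synthetic data standing in for MEG features, and the nearest-centroid classifier, which is a simple stand-in and not necessarily the classifier used in the thesis.

```python
import numpy as np


def stack_subjects(subject_data):
    """Concatenate per-subject feature matrices along the feature axis.

    subject_data: list of arrays, each of shape (n_samples, n_features_s).
    Rows are assumed to be aligned, i.e. row i is the same stimulus for
    every subject (hypothetical data layout).
    """
    n_samples = subject_data[0].shape[0]
    if any(x.shape[0] != n_samples for x in subject_data):
        raise ValueError("all subjects must share the same stimulus rows")
    return np.concatenate(subject_data, axis=1)


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in)."""
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i  # hold out sample i
        classes = np.unique(y[train])
        centroids = np.stack(
            [X[train & (y == c)].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / len(y)


# Synthetic stand-in for MEG features: 4 subjects, 32 aligned samples,
# 20 features each, with a weak class-dependent shift buried in noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 16)
signal = np.where(y[:, None] == 1, 0.5, -0.5)
subjects = [signal + rng.normal(size=(32, 20)) for _ in range(4)]

X_all = stack_subjects(subjects)
print(X_all.shape)  # (32, 80): same samples, four times the features

acc_single = loo_nearest_centroid_accuracy(subjects[0], y)
acc_stacked = loo_nearest_centroid_accuracy(X_all, y)
print(acc_single, acc_stacked)
```

The row alignment is the key design constraint: stacking subjects as features only makes sense when every subject saw the same stimuli, so that row i describes the same sentence in every matrix.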

We examine how sentence processing differs between active and passive sentences by showing the information flow over time during the reading of each type of sentence. We additionally explore post-sentence wrap-up activity that carries information about syntax and integrated semantics of the sentence being read. We compare the ability of models that separate syntax, semantics, and integration to explain neural activity during the post-sentence time period. Our results provide converging evidence that after a sentence is read, its syntactic structure is processed, followed by a semantic integration of sentence meaning. These results refine previous theories of sentence processing as a purely incremental process by revealing the existence of a post-sentence wrap-up period.

Thesis Committee:
Tom Mitchell (Chair)
John Anderson
Geoffrey Gordon
Mark Richardson (UPMC, Department of Neurological Surgery)
Stanislas Dehaene (Collège de France)

Copy of Thesis Document

For More Information, Please Contact: