June 2008 @ Kobe, Japan
Kai-min Kevin Chang
Research Associate (Special Faculty),
Language Technologies Institute,
School of Computer Science,
Carnegie Mellon University.
Profile: CV, Resume, Research Statement
My research interests include using mathematical methodologies and machine learning techniques to investigate and model various human cognitive processes. In particular, I have studied the semantic representation of objects using functional Magnetic Resonance Imaging, knowledge representation in the context of an Intelligent Tutoring System, and language processing in the connectionist framework.
Recent advances in functional Magnetic Resonance Imaging (fMRI) provide a significant new approach to studying semantic representations in humans by making it possible to directly observe brain activity while people comprehend words and sentences. fMRI measures the hemodynamic response (changes in blood flow and blood oxygenation) related to neural activity in the human brain. Images can be acquired at good spatial resolution and reasonable temporal resolution: the activity level of 15,000 to 20,000 brain volume elements (voxels) of about 50 mm³ each can be measured every second. Supervised by Dr. Marcel Just and Dr. Tom Mitchell, I used fMRI to study the cortical systems that underpin the semantic representation of object knowledge. In a picture-naming task, participants were presented with black-and-white line drawings of 60 objects from a range of categories (e.g., tools, dwellings, and animals) and were instructed to think of the same properties consistently during each presentation. In Mitchell et al. (2008), we showed that word features computed from the occurrences of the stimulus words within a trillion-token Google text corpus, which captures the typical use of words in English text, can predict the brain activity associated with the meanings of these words. We developed a generative model that predicts fMRI neural activity well enough to match words it has not yet encountered to their previously unseen fMRI images with accuracies far above chance level. The distributed pattern of neural activity encodes the meanings of words, and the model's success indicates some initial access to this encoding. This work was also featured in the CBS 60 Minutes segment "Mind Reading," which aired on June 28, 2009.
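The prediction-and-matching idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published model's code: it assumes each voxel's activity is a linear combination of a word's corpus-derived features, fits the weights by ridge regression, and scores a held-out pair of words by checking whether the correct pairing of predicted and observed images has higher summed cosine similarity than the swapped pairing. The function names, the ridge penalty, and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: each voxel's activity is modeled as a weighted sum of a
# word's semantic features; evaluation uses a leave-two-out matching test.


def fit_voxel_weights(features, images, ridge=1e-3):
    """Learn one weight per (semantic feature, voxel) pair by ridge regression.

    features: (n_words, n_features) corpus-derived feature vectors
    images:   (n_words, n_voxels) observed activity for each training word
    returns:  (n_features, n_voxels) weights, so a prediction is features @ weights
    """
    f = np.asarray(features, dtype=float)
    gram = f.T @ f + ridge * np.eye(f.shape[1])
    return np.linalg.solve(gram, f.T @ np.asarray(images, dtype=float))


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def leave_two_out_match(features, images, i, j):
    """Train without words i and j, predict both of their images, and report
    whether the correct pairing scores higher than the swapped one."""
    keep = [k for k in range(len(features)) if k not in (i, j)]
    weights = fit_voxel_weights(features[keep], images[keep])
    pred_i, pred_j = features[i] @ weights, features[j] @ weights
    correct = cosine(pred_i, images[i]) + cosine(pred_j, images[j])
    swapped = cosine(pred_i, images[j]) + cosine(pred_j, images[i])
    return correct > swapped


# Synthetic data only: 60 "words", 25 features, 500 voxels.
rng = np.random.default_rng(0)
features = rng.random((60, 25))
true_weights = rng.normal(size=(25, 500))
images = features @ true_weights + 0.5 * rng.normal(size=(60, 500))

pairs = [(i, j) for i in range(20) for j in range(i + 1, 20)]
accuracy = float(np.mean([leave_two_out_match(features, images, i, j) for i, j in pairs]))
print(f"leave-two-out matching accuracy (chance = 0.5): {accuracy:.2f}")
```

Because the two held-out words are never seen during training, above-chance matching shows that the learned feature-to-voxel mapping generalizes to new words, which is the point the paragraph makes about the real data.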
More recently, we applied vector-based models of semantic representation from computational linguistics to model the neural activation patterns obtained while subjects comprehended multi-word expressions, such as adjective-noun phrases (Chang et al., 2009) and noun-noun concept combinations.
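As an illustration of what "vector-based" composition means, the sketch below builds a phrase vector from an adjective vector and a noun vector in two common ways (addition and element-wise multiplication) and then maps the phrase vector to predicted voxel activity with a linear map. The choice of these two composition functions, the toy vectors, and the weight matrix are assumptions for illustration, not the exact models or data of Chang et al. (2009).

```python
import numpy as np

# Hypothetical sketch: two simple ways to build a phrase vector from word
# vectors, followed by a linear map from the phrase vector to voxel activity.


def compose_additive(adjective, noun):
    return adjective + noun


def compose_multiplicative(adjective, noun):
    return adjective * noun  # element-wise product


def predict_phrase_image(adjective, noun, weights, compose):
    """Compose the two word vectors, then map the phrase vector to voxels."""
    return compose(adjective, noun) @ weights


adjective = np.array([0.2, 0.0, 0.8])
noun = np.array([0.5, 0.4, 0.1])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # 3 features -> 2 voxels

print(compose_additive(adjective, noun))
print(compose_multiplicative(adjective, noun))
print(predict_phrase_image(adjective, noun, weights, compose_additive))
```

The additive model keeps every feature either word contributes, while the multiplicative model keeps only features both words share (note the zeroed second feature), so the two make different predictions about which parts of the noun's representation an adjective modulates.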
Intelligent tutoring systems derive much of their power from having a student model that describes the learner's competencies. However, constructing a student model is challenging for computer tutors that use automated speech recognition (ASR) as input, due to inherent inaccuracies in ASR. Under the supervision of Dr. Jack Mostow and Dr. Joseph Beck, I proposed two models of how word decoding skills develop and demonstrated that ASR output contains sufficient information to determine which model better fits student performance, and under what circumstances (Chang et al., 2005). Moreover, we found that modeling individual learners' proficiencies may enable improved speech recognition in a computer tutor (Beck et al., 2005). In this work, we used Knowledge Tracing, a derivative of Atkinson's (1972) model of human memory, to trace students' knowledge across different skills. We then built on Reye's work (1998), which showed that Knowledge Tracing is a special case of a Bayesian network, and implemented a generic Bayesian network toolkit for student modeling (BNT-SM; Chang et al., 2006). BNT-SM takes as input a data set and a compact XML specification of a (dynamic) Bayes net model, hypothesized by a researcher to describe the causal relationships between student knowledge and observed behavior. It then generates and executes the code to train and test the model using the Bayes Net Toolbox (Murphy, 1998). BNT-SM allows researchers to easily explore different hypotheses about the knowledge representation in a student model. For example, by varying the graphical structure of a Bayesian network, we examined how a tutoring intervention affects students' knowledge state: whether the intervention is likely to scaffold students' immediate performance or to help them learn.
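To make the Knowledge Tracing update and its Bayesian-network reading concrete, here is a minimal sketch using the common four-parameter formulation (initial knowledge, learning, guess, slip) with no forgetting. It computes the update twice: once with the usual Knowledge Tracing formulas and once as a forward-filtering step of a two-state dynamic Bayes net, and the two agree. The parameter values and function names are hypothetical; this does not use BNT-SM or the Bayes Net Toolbox.

```python
from dataclasses import dataclass


@dataclass
class KTParams:
    p_init: float   # P(L0): skill already known before the first opportunity
    p_learn: float  # P(T): unknown -> known transition at each opportunity
    p_guess: float  # P(G): correct response although the skill is unknown
    p_slip: float   # P(S): incorrect response although the skill is known


def kt_update(p_known, correct, prm):
    """Knowledge Tracing: Bayesian update on the observed response,
    then one opportunity to learn (no forgetting)."""
    if correct:
        num = p_known * (1 - prm.p_slip)
        den = num + (1 - p_known) * prm.p_guess
    else:
        num = p_known * prm.p_slip
        den = num + (1 - p_known) * (1 - prm.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * prm.p_learn


def hmm_filter_step(belief, correct, prm):
    """The same update written as one forward step of a two-state dynamic
    Bayes net; belief is [P(unknown), P(known)]."""
    emit = [prm.p_guess, 1 - prm.p_slip] if correct else [1 - prm.p_guess, prm.p_slip]
    joint = [b * e for b, e in zip(belief, emit)]
    total = sum(joint)
    unknown, known = (x / total for x in joint)
    return [unknown * (1 - prm.p_learn), known + unknown * prm.p_learn]


prm = KTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
responses = [True, False, True, True]

p_kt = prm.p_init
belief = [1 - prm.p_init, prm.p_init]
for r in responses:
    p_kt = kt_update(p_kt, r, prm)
    belief = hmm_filter_step(belief, r, prm)
    print(f"correct={r}: KT P(known)={p_kt:.4f}, DBN P(known)={belief[1]:.4f}")
```

Writing the model as a graph is what makes structural variations possible: for instance, an intervention node could be attached to the emission (affecting immediate performance) or to the transition (affecting learning), which mirrors the scaffold-versus-learn question above.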
I have also worked on low-level computational models of reading, in which cognitive processes are implemented in terms of cooperative and competitive interactions among large numbers of simple, neuron-like processing units. Leading researchers in the computational modeling of reading are divided over whether a localist or a distributed representation is more appropriate. For example, a localist representation assumes that a word is represented by the activation of a single corresponding lexical unit, whereas a distributed representation assumes that it is a pattern of activation distributed over a number of primitive representational units. I have had the opportunity to work in both camps. On the localist side, I worked with Dr. Derek Besner to challenge one of the fundamental assumptions of DRC (the Dual-Route Cascaded model), a leading computational model of reading that assumes a dual-route architecture and cascaded information processing within the model. As part of my undergraduate thesis, I implemented a threshold at the letter level in DRC so that the model correctly reproduces the joint effects of letter length and stimulus quality seen in skilled readers. On the distributed side, I worked with Dr. David Plaut to address two tasks for reading models in the parallel distributed processing (PDP) framework: 1) learning static representations of variable-length strings, and 2) generating continuous articulatory trajectories as output. These two tasks are fundamental extensions of PDP models of word reading, enabling them to process multi-syllabic words and to generate more realistic analogues of human response time.
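The difference between cascaded and thresholded transmission can be illustrated with a toy two-level accumulator: a letter-level unit feeds a word-level unit, either continuously (cascaded) or only once the letter unit has crossed a threshold. This is a hypothetical sketch, not DRC itself; the update rule, rates, threshold, and response criterion are all assumptions chosen only to show the mechanism.

```python
# Hypothetical toy model (not DRC): one letter-level unit feeds one word-level unit.


def cycles_to_respond(letter_rate, threshold=None, word_rate=0.2,
                      criterion=0.7, max_cycles=1000):
    """Return the cycle on which the word unit first reaches `criterion`.

    letter_rate: how quickly the letter unit approaches full activation
                 (a smaller value stands in for a degraded stimulus)
    threshold:   None for cascaded transmission; otherwise the letter unit
                 sends nothing upward until its activation reaches this value
    """
    letter = word = 0.0
    for cycle in range(1, max_cycles + 1):
        letter += letter_rate * (1.0 - letter)
        passes = threshold is None or letter >= threshold
        word_input = letter if passes else 0.0
        word += word_rate * (word_input - word)
        if word >= criterion:
            return cycle
    return None


for rate in (0.3, 0.1):  # clear vs. degraded stimulus
    print(rate, cycles_to_respond(rate), cycles_to_respond(rate, threshold=0.9))
```

In the cascaded case the word level starts accumulating as soon as the letter unit is partly active; with a threshold, the word level waits until letter processing is essentially complete, so factors acting on the letter stage and factors acting later no longer overlap in time.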
The ultimate automated tutor could peer directly into students' minds to identify their mental states (e.g., engagement, competencies, and intentions) and decide accordingly what and how to teach at each moment. Recent advances in brain imaging technologies have produced several portable, commercially available EEG headsets that are simple enough to use in schools (NeuroSky; Emotiv; BCInet). Using EEG signals recorded from adults and children reading text and isolated words, both aloud and silently, we train and test classifiers to tell whether students are reading easy or hard sentences, and to distinguish among easy words, hard words, pseudo-words, and unpronounceable strings (Mostow, Chang & Nelson, 2011). Better-than-chance performance shows promise for tutors to use EEG at school. This development makes it feasible to record longitudinal EEG data in authentic school settings.
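A train-and-test setup of this kind can be sketched with a simple probabilistic classifier. The example below uses Gaussian naive Bayes written directly in NumPy on synthetic feature vectors standing in for precomputed EEG features (for example, band powers); the classifier choice, the features, and the data are assumptions for illustration and are not claimed to be what Mostow, Chang & Nelson (2011) used.

```python
import numpy as np

# Hypothetical sketch: Gaussian naive Bayes on precomputed EEG features,
# evaluated on a held-out split of synthetic data.


def fit_gaussian_nb(x, y):
    classes = np.unique(y)
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    variances = np.array([x[y == c].var(axis=0) + 1e-6 for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors


def predict_gaussian_nb(model, x):
    classes, means, variances, priors = model
    diff = x[:, None, :] - means[None, :, :]  # (trials, classes, features)
    # Sum of per-feature Gaussian log-likelihoods (independence assumption).
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + np.log(priors)[None], axis=1)]


# Synthetic "trials": label 0 = easy sentence, 1 = hard sentence.
rng = np.random.default_rng(1)
n_per_class, n_features = 100, 8
x = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_features)),
               rng.normal(0.8, 1.0, (n_per_class, n_features))])
y = np.repeat([0, 1], n_per_class)

order = rng.permutation(len(y))
train, test = order[:100], order[100:]
model = fit_gaussian_nb(x[train], y[train])
accuracy = float(np.mean(predict_gaussian_nb(model, x[test]) == y[test]))
print(f"held-out accuracy: {accuracy:.2f} (chance is about 0.5)")
```

The key design point is the held-out split: "better than chance" only means something when accuracy is measured on trials the classifier never saw during training.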
Humans use speech to communicate what's on their minds. However, until now, automatic speech recognizers (ASR) and dialogue systems have had no direct way to take into account what is going on in a speaker's or listener's mind. One way to address this limitation is to use EEG signals to infer mental states. Chen et al. (to appear) used EEG to adapt the language model of an ASR system. We train and test classifiers that take as input EEG signals from adults and children reading text, and we use their probabilistic output to control the weighted interpolation of separate language models for easy and difficult reading. We show that such EEG-adapted ASR achieves higher accuracy than two baselines, and analyze how its performance depends on EEG classification accuracy. Furthermore, ASR systems are usually used to recognize a target speaker's speech, but sometimes they operate in an environment full of other sounds (e.g., background noise and speech from other people). Recognition performance then suffers because some of these sounds are erroneously recognized as user speech, which increases insertion errors. Because humans use speech to communicate what's on their minds, taking the speaker's mental state into account can help determine whether a sound comes from the speaker's speech. Chen et al. (submitted) train and test EEG classifiers that estimate the probability that a target user is speaking, listening, or idling. These probability estimates are used to remove recognized words that did not come from the target user, thereby reducing insertion errors caused by other sounds. These pilot studies are steps toward improving ASR more generally by using EEG to distinguish mental states.
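The two uses of EEG output described above can be sketched as follows. For language-model adaptation, the sketch assumes the classifier's probability that the text is easy is used directly as the interpolation weight between an "easy" and a "hard" model; for insertion filtering, it assumes a simple rule that keeps a recognized word only if the time-aligned probability that the user was speaking reaches a threshold. Both rules, the toy unigram models, and the function names are hypothetical simplifications, not the exact methods of Chen et al.

```python
# Hypothetical sketch: EEG-driven language-model interpolation and
# EEG-based filtering of likely insertion errors.


def interpolated_prob(word, p_easy, easy_lm, hard_lm, floor=1e-6):
    """P(word) = p_easy * P_easy(word) + (1 - p_easy) * P_hard(word).

    p_easy: a classifier's probability that the current text is easy, used
            directly as the interpolation weight (an assumption).
    easy_lm, hard_lm: toy unigram models as {word: probability} dicts; unseen
            words get a small floor probability instead of a proper back-off.
    """
    return p_easy * easy_lm.get(word, floor) + (1 - p_easy) * hard_lm.get(word, floor)


def drop_likely_insertions(words, p_speaking, threshold=0.5):
    """Keep only recognized words for which the EEG classifier's probability
    that the target user was speaking reaches the threshold."""
    return [w for w, p in zip(words, p_speaking) if p >= threshold]


easy_lm = {"cat": 0.02, "photosynthesis": 0.0001}
hard_lm = {"cat": 0.005, "photosynthesis": 0.004}
print(interpolated_prob("cat", 0.8, easy_lm, hard_lm))
print(drop_likely_insertions(["hello", "uh", "world"], [0.9, 0.2, 0.7]))
```

With a confident "easy" reading (p_easy = 0.8), a common word like "cat" keeps most of its easy-model probability, while a word favored by the hard model is down-weighted; as the classifier becomes less certain, the mixture moves toward an even blend of the two models.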
| When | Where | What |
|---|---|---|
| 1981-1995 | Taipei, Taiwan | I spent the first 14 years of my life in Taiwan. I was pretty ordinary. |
| 1995 | Canada | At the age of 14, my family decided to immigrate to Canada, a move that fundamentally shaped my life and my character. |
| 1995-1998 | Vancouver, BC, Canada | I studied at Eric Hamber Secondary School. |
| Summer 1998 | Hamilton, ON, Canada | I was a MacShad98 of Shad Valley. |
| 1999-2003 | Waterloo, ON, Canada | I graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science and Psychology. |
| 2003-2004 | Taipei, Taiwan | I worked on the Automatic Speech Analysis System engine of MyET, a promising piece of English-teaching software developed by LLabs. |
| 2003-present | Pittsburgh, PA, USA | I am a graduate student in the Language Technologies Institute at Carnegie Mellon University. |
| March 8, 2010 | Tokyo, Japan | I am engaged! |
| Dec 29, 2010 | Vancouver, BC, Canada | I am married to my lovely Yi-Chia Wang. |
| June 6, 2011 | Pittsburgh, PA, USA | Dr. Chang! |

Some people write their diaries with words, some record them with pictures. I mark mine with food! Yes, I love to eat! My plan is to taste all the savoury dishes in the world and mark them on my Savoury Google Maps! Still a long way to go, but I am getting there! :p I like to read Slashdot and the tw.bbs.talk.joke newsgroup, and to watch Comedy Central on TV. Three comic strips that I frequently visit are Piled Higher and Deeper, Dilbert, and River's 543. For leisure, I enjoy playing poker, chess, and pool. I am also very into mobile devices: I frequent xda-developers and stay up to date on many smartphones. My current phone is an AT&T Tilt2. Finally, I treasure freedom of speech, thought, and code, and am an advocate of Open Source software. P.S. I was named a student of Watermelon according to this news article, originally published by University of Waterloo school officials on Apr 1, 2003. ;) As it turned out, I later joined Carnegie Mellon University, which indeed made me a Watermelon. FYI, Kevyn Collins-Thompson is also a Watermelon.