Project LISTEN
Project LISTEN Publications

[Note: Links to full text are included when possible, e.g. after publication or conference presentation. * marks publications by others.]

[ITS 2008 help] Beck, J. E., Chang, K.-m., Mostow, J., & Corbett, A. (2008, June 23-27). Does help help? Introducing the Bayesian Evaluation and Assessment methodology. 9th International Conference on Intelligent Tutoring Systems, Montreal. Abstract: Most ITS have a means of providing assistance to the student, either on student request or when the tutor determines it would be effective. Presumably, such assistance is included by the ITS designers since they feel it benefits the students. However, whether, and how, help helps students has not been a well studied problem in the ITS community. In this paper we present three approaches for evaluating the efficacy of the Reading Tutor's help: creating experimental trials from data, learning decomposition, and Bayesian Evaluation and Assessment, an approach that uses dynamic Bayesian networks. We have found that experimental trials and learning decomposition both find a negative benefit for help--that is, help hurts! However, the Bayesian Evaluation and Assessment framework finds that help both promotes student long-term learning and provides additional scaffolding on the current problem. We discuss why these approaches give divergent results, and suggest that the Bayesian Evaluation and Assessment framework is the strongest of the three. In addition to introducing Bayesian Evaluation and Assessment, a method for simultaneously assessing students and evaluating tutorial interventions, this paper describes how help can both scaffold the current problem attempt as well as teach the student knowledge that will transfer to later problems.

[ITS 2008 LD] Beck, J. E., & Mostow, J. (2008, June 23-27). How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students. 9th International Conference on Intelligent Tutoring Systems, Montreal. Abstract: A basic question of instruction is how much students will actually learn from it. This paper presents an approach called learning decomposition, which determines the relative efficacy of different types of learning opportunities. This approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other. We analyze 346 students reading 6.9 million words and show that different types of practice differ reliably in how efficiently students acquire the skill of reading words quickly and accurately. Specifically, massed practice is generally not effective for helping students learn words, and rereading the same stories is not as effective as reading a variety of stories. However, we were able to analyze data for individual students' learning and use bottom-up processing to detect small subgroups of students who did benefit from rereading (11 students) and from massed practice (5 students). The existence of these has two implications: 1) one-size-fits-all instruction is adequate for perhaps 95% of the student population using computer tutors, but as a community we can do better and 2) the ITS community is well poised to study what type of instruction is optimal for the individual.

[ITS 2008 compare] Zhang, X., Mostow, J., & Beck, J. E. (2008). A Case Study Empirical Comparison of Three Methods to Evaluate Tutorial Behaviors. 9th International Conference on Intelligent Tutoring Systems, Montreal. Abstract: Researchers have used various methods to evaluate the fine-grained interactions of intelligent tutors with their students. We present a case study comparing three such methods on the same data set, logged by Project LISTEN's Reading Tutor from usage by 174 children in grades 2-4 (typically 7-10 years) over the course of the 2005-2006 school year.
The Reading Tutor chooses randomly between two different types of reading practice. In assisted oral reading, the child reads aloud and the tutor helps. In "Word Swap," the tutor reads aloud and the child identifies misread words. One method we use here to evaluate reading practice is conventional analysis of randomized controlled trials (RCTs), where the outcome is performance on the same words when encountered again later. The second method is learning decomposition, which estimates the impact of each practice type as a parameter in an exponential learning curve. The third method is knowledge tracing, which estimates the impact of practice as a probability in a dynamic Bayes net. The comparison shows qualitative agreement among the three methods, which is evidence for their validity.

[FLET 2008] Mostow, J. (2008). Experience from a Reading Tutor that listens: Evaluation purposes, excuses, and methods. In C. K. Kinzer & L. Verhoeven (Eds.), Interactive Literacy Education: Facilitating Literacy Environments Through Technology, pp. 117-148. New York: Lawrence Erlbaum Associates, Taylor & Francis Group. Abstract: This chapter gives three good reasons to evaluate reading software, identifies three methods for doing so, and refutes three excuses for not evaluating – namely, that evaluation is premature, unnecessary, or will be done by others: (1) Wizard of Oz
experiments help test whether (and clarify how) a proposed approach might
work, and refute the excuse that evaluation is premature because the approach
has not yet been implemented in a proposed system that may take years to
develop. (2) Conventional
controlled studies help determine whether an implemented system helps
children gain more in reading than they would otherwise. This criterion
is necessary to improve on the status quo, but the difficulty of meeting it refutes
the excuse that evaluation is unnecessary due to the supposedly innate
superiority of learning on computers, or of a proposed way to use them. (3) Experiments
embedded in an automated tutor help analyze which tutorial actions help which
students and words, thereby guiding improvement of the tutor in ways that
third party evaluation cannot, thus refuting the excuse that evaluation can
be left to others. The chapter
details some practical lessons learned from designing, performing, and
analyzing experiments embedded in Project LISTEN’s school-deployed Reading
Tutor, which uses speech recognition to listen to children read aloud, and is
helping hundreds of children learn to read.

[STLL 2008 SC] Aist, G., & Mostow, J. (2008). Faster, better task choice in a reading tutor that listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning: From Research Toward Practice (pp. 220-240). New York: Routledge. Abstract: We analyze the efficiency and effectiveness of task choice in the context of a reading tutor that listens to children read aloud. We define efficiency as the time to pick a story, and effectiveness in terms of exposing students to new material. We describe design features we added to improve the Reading Tutor’s efficiency and effectiveness, and evaluate the resulting systems quantitatively, as follows. First, we made the story menu child-friendlier by incorporating two improvements: (a) to support use by nonreaders, the new menu spoke all items on the list; (b) to speed up choice, the new menu required just one click to select an item. Second, we instituted a mixed-initiative story choice policy where the Reading Tutor and the student took turns choosing stories. These improvements made story choice measurably more efficient and effective.

[STLL 2008 S98] Mostow, J., Aist, G., Huang, C., Junker, B., Kennedy, R.,
Lan, H., Latimer, D., O'Connor, R., Tassone, R., Tobin, B., & Wierman, A.
(2008). 4-Month evaluation of a learner-controlled Reading Tutor that
listens. In V. M. Holland & F. P. Fisher (Eds.), The Path of Speech
Technologies in Computer Assisted Language Learning: From Research Toward Practice (pp.
201-219). New York: Routledge. Abstract: We evaluated an automated Reading Tutor that let children pick stories to read, and listened to them read aloud. All 72 children in three classrooms (grades 2, 4, 5) were independently tested on the nationally normed Word Attack, Word Identification, and Passage Comprehension subtests of the Woodcock Reading Mastery Test (where they averaged nearly 2 standard deviations below national norms), and on oral reading fluency. We split each class into 3 matched treatment groups: Reading Tutor, commercial reading software, or other activities. In 4 months, the Reading Tutor group gained significantly more in Passage Comprehension than the control group (effect size = 1.2, p=.002) - even though actual usage was a fraction of the planned daily 20-25 minutes. To help explain these results, we analyzed relationships among gains in Word Attack, Word Identification, Passage Comprehension, and fluency by 108 additional children who used the Reading Tutor in 7 other classrooms (grades 1-4). Gains in Word Identification predicted Passage Comprehension gains only for Reading Tutor users, both in the controlled study (n=21, p=.042, regression coefficient B=.495 ± s.e. .227) and in the other classrooms (n=108, p=.005, B=.331±.115), where grade was also a significant predictor (p=.024, B=2.575±1.127).

* [JECR 2007] Poulsen, R., Wiemer-Hastings, P., & Allbritton, D. (2007).
Tutoring Bilingual Students with an Automated Abstract: Children from non-English-speaking homes are doubly disadvantaged when learning English in school. They enter school with less prior knowledge of English sounds, word meanings, and sentence structure, and they get little or no reinforcement of their learning outside of the classroom. This article compares the classroom standard practice of sustained silent reading with the Project LISTEN Reading Tutor, which uses automated speech recognition to "listen" to children read aloud, providing both spoken and graphical feedback. Previous research with the Reading Tutor has focused primarily on native speaking populations. In this study 34 Hispanic students spent one month in the classroom and one month using the Reading Tutor for 25 minutes per day. The Reading Tutor condition produced significant learning gains in several measures of fluency. Effect sizes ranged from 0.55 to 1.27. These dramatic results from a one-month treatment indicate this technology may have much to offer English language learners.

[SLaTE 2007 ASL] Xu, L.,
Varadharajan, V., Maravich, J., Tongia, R., & Mostow, J. (2007, October
1-3). DeSIGN: An Intelligent Tutor to Teach American Sign Language.
SLaTE workshop on Speech and Language Technology for Education, ISCA Tutorial
and Research Workshop, The Summit Inn, Abstract: This paper presents the development of
DeSIGN, an educational software application for those deaf students who are
taught to communicate using American Sign Language (ASL). The software
reinforces English vocabulary and ASL signs by providing two essential
components of a tutor, lessons and tests. The current version was designed
for 5th and 6th graders, whose literacy skills lag by a grade or more on
average. In addition, a game that allows the students to be creative has been
integrated into the tests. Another
feature of DeSIGN is its ability to intelligently adapt its tests to the
changing knowledge of the student as determined by a knowledge tracing algorithm.
A separate interface for the teacher enables additions and modifications to
the content of the tutor and provides progress monitoring. These dynamic
aspects help motivate the students to use the software repeatedly. This
software prototype aims at a feasible and sustainable approach to increase
the participation of deaf people in society. DeSIGN has undergone an
iteration of testing and is currently in use at a school for the deaf in

[AIED 2007 motivation] Beck, J. E. (2007, July 9-13). Does learner control affect
learning? Proceedings of the 13th International Conference on Artificial
Intelligence in Education, Abstract: Many intelligent tutoring systems permit some degree of learner control. A natural question is whether the increased student engagement and motivation such control provides results in additional student learning. This paper uses a novel approach, learning decomposition, to investigate whether students do in fact learn more from a story they select to read than from a story the tutor selects for them. By analyzing 346 students reading approximately 6.9 million words, we have found that students learn approximately 25% more in stories they choose to read, even though from a purely pedagogical standpoint such stories may not be as appropriate as those chosen by the computer. Furthermore, we found that (for our instantiation of learner control) younger students may derive less benefit from learner control than older students, and girls derive less benefit than boys.

[AIED 2007 comprehension] Zhang,
X., Mostow, J., & Beck, J. E. (2007, July 9-13). Can a Computer Listen
for Fluctuations in Abstract: The ability to detect fluctuation in students' comprehension of text would be very useful for many intelligent tutoring systems. The obvious solution of inserting comprehension questions is limited in its application because it interrupts the flow of reading. To investigate whether we can detect comprehension fluctuations simply by observing the reading process itself, we developed a statistical model of 7805 responses by 289 children in grades 1-4 to multiple-choice comprehension questions in Project LISTEN's Reading Tutor, which listens to children read aloud and helps them learn to read. Machine-observable features of students' reading behavior turned out to be statistically significant predictors of their performance on individual questions.

[EDM 2007 LFA transfer] Leszczenski, J. M., & Beck, J. E. (2007, July 9). What’s in a word? Extending learning factors analysis to modeling reading transfer. Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 31-39. Abstract: Learning Factors Analysis (LFA) has been proposed as a generic solution to evaluate and compare cognitive models of learning [1]. By performing a heuristic search over a space of statistical models, the researcher may evaluate different cognitive representations of a set of skills. We introduce a scalable application of this framework in the context of transfer in reading and demonstrate it upon Reading Tutor data. Using an assumption of a word-level model of learning as a baseline, we apply LFA to determine whether a representation with fewer word independencies will produce a better fit for student learning data. Specifically, we show that representing some groups of words as their common root leads to a better fitting model of student knowledge, indicating that this representation offers more information than merely viewing words as independent, atomic skills.
In addition, we demonstrate an approximation to LFA which allows it to scale tractably to large datasets. We find that using a word root-based model of learning leads to an improved model fit, suggesting students make use of this information in their representation of words. Additionally, we present evidence based on both model fit and learning rate relationships that low proficiency students tend to exhibit a lesser degree of transfer through the word root representation than higher proficiency students.

[EDM 2007 LD transfer] Zhang, X.,
Mostow, J., & Beck, J. E. (2007, July 9). All in the (word)
family: Using learning decomposition
to estimate transfer between skills in a Abstract: In this paper, we use the method of learning decomposition to study students’ mental representations of English words. Specifically, we investigate whether practice on a word transfers to similar words. We focus on the case where similar words share the same root (e.g., “dog” and “dogs”). Our data comes from Project LISTEN’s Reading Tutor during the 2003-2004 school year, and includes 6,213,289 words read by 650 students. We analyze the distribution of transfer effects across students, and identify factors that predict the amount of transfer. The results support some of our hypotheses about learning, e.g., the transfer effect from practice on similar words is greater for proficient readers than for poor readers. More significant than these empirical findings, however, is the novel analytic approach to measure transfer effects.

[EDM 2007 Dirichlet] Beck, J. E. (2007, July 9). Difficulties in inferring student knowledge from observations (and why you should care). Proceedings of the AIED2007 Workshop on Educational Data Mining, Marina del Rey, CA, 21-30. Abstract: Student modeling has a long history in the field of intelligent educational software and is the basis for many tutorial decisions. Furthermore, the task of assessing a student’s level of knowledge is a basic building block in the educational data mining process. If we cannot estimate what students know, it is difficult to perform fine-grained analyses to see if a system’s teaching actions are having a positive effect. In this paper, we demonstrate that there are several unaddressed problems with student model construction that negatively affect the inferences we can make. We present two partial solutions to these problems, using Expectation Maximization to estimate parameters and using Dirichlet priors to bias the model fit procedure.
Aside from reliably improving model fit in predictive accuracy, these approaches might result in model parameters that are more plausible. Although parameter plausibility is difficult to quantify, we discuss some guidelines and propose a derived measure of predicted number of trials until mastery as a method for evaluating model parameters.

[UM 2007] Beck, J. E., & Chang,
K.-m. (2007, June 25-29). Identifiability: A Fundamental Problem of
Student Modeling. Proceedings of
the 11th International Conference on User Modeling (UM 2007), Abstract: In this paper we show how model identifiability is an issue for student modeling: observed student performance corresponds to an infinite family of possible model parameter estimates, all of which make identical predictions about student performance. However, these parameter estimates make different claims, some of which are clearly incorrect, about the student’s unobservable internal knowledge. We propose methods for evaluating these models to find ones that are more plausible. Specifically, we present an approach using Dirichlet priors to bias model search that results in a statistically reliable improvement in predictive accuracy (AUC of 0.620 ± 0.002 vs. 0.614 ± 0.002). Furthermore, the parameters associated with this model provide more plausible estimates of student learning, and better track with known properties of students’ background knowledge. The main conclusion is that prior beliefs are necessary to bias the student modeling search, and even large quantities of performance data alone are insufficient to properly estimate the model.

[ICASSP 2007] Anumanchipalli, G. K.,
Ravishankar, M., & Reddy, R. (2007, April 15-20). Improving
Pronunciation Inference Using N-Best List, Acoustics and Orthography.
Proc. 32nd IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP), Abstract: In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and phonotactic priors to rescore the hypotheses and refine the baseforms.

[IERI 2007] Mostow, J., & Beck, J. (2007).
When the Rubber Meets the Road:
Lessons from the In-School Adventures of an Automated Abstract:
Project LISTEN's Reading Tutor (www.cs.cmu.edu/~listen) uses automatic
speech recognition to listen to children read aloud, and helps them learn to
read. Its experimental deployment in schools has expanded from a single
computer used by eight third graders in one school in 1996 to two hundred
computers used by children in grades 1-3 in nine schools in 2003. This
project illustrates how technology can not just scale up an intervention, but
instrument its implementation. For example, analysis of 2002-2003 usage
showed that session frequency and duration averaged significantly higher in
lab settings than in classrooms.

[ICSLP2006] Mostow, J. (2006, September 17-21). Is ASR accurate enough for automated reading tutors, and how can we tell? Ninth International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), Pittsburgh, PA, 837-840. Abstract: We discuss pros and cons of several ways to evaluate ASR accuracy in automated tutors that listen to students read aloud. Whether ASR is accurate enough for a particular reading tutor function depends on what ASR-based judgment it requires, the visibility of that judgment to students and teachers, and the amount of input speech on which it is based. How to tell depends on the purpose, criterion, and space of the evaluation.

[AAAI2006 help] Chang, K., Beck, J. E., Mostow, J., &
Corbett, A. (2006, July 17). Does Help Help? A Bayes Net Approach to Modeling Tutor
Interventions. AAAI2006 Workshop on Educational Data Mining, Abstract: This paper describes an effort to measure the effectiveness of tutor help in an intelligent tutoring system. Conventional pre- and post-test experimental methods can determine whether help is effective but are expensive to conduct. Furthermore, a pre- and post-test methodology ignores a source of information: students request help about words they do not know. Therefore, we propose a dynamic Bayes net (which we call the help model) that models tutor help and student knowledge in one coherent framework. The help model distinguishes two different effects of help: scaffolding immediate performance vs. teaching persistent knowledge that improves long term performance. We train the help model to fit the student performance data gathered from usage of the Reading Tutor. The parameters of the trained model suggest that students benefit from both the scaffolding and teaching effects of help. Thus, our framework is able to distinguish two types of influence that help has on the student, and can determine whether help helps learning without an explicit controlled study.

[SSSR2006 cloze] Hensler, B. S., & Beck, J. (2006, July 6-8). Are all
questions created equal? Factors that
influence cloze question difficulty. Thirteenth Annual Meeting of the
Society for the Scientific Study of Abstract: The multiple choice cloze (MCC)
assessment methodology is widely used in assessing reading comprehension;
therefore an improved scoring methodology would have broad impact within the
reading research community. We have
constructed an MCC question model that simultaneously estimates the student's
comprehension proficiency and the impact of various terms on MCC difficulty.
To build the model, we analyzed 16,161 MCC question responses that were
administered by a computer reading tutor over the course of a school year. Participants were 373 students in grades 1
through 6 (ages 5-12) in urban and suburban public schools in To develop our model of MCC difficulty, we used multinomial logistic regression to calculate the relative impact of a number of factors. Our model includes the location of the deleted target word within the sentence and question length as covariates. As factors, we used student identity, reaction time (rounded to the nearest second) and level of difficulty of the target word. We hypothesized that more proficient readers would use syntactic cues while less proficient readers would not. To add syntax to the model, we used the TreeTagger part of speech tagger to annotate the part of speech of the correct answer for each cloze question. We then computed how many of the distractors could have the same part of speech as the answer. Presumably questions with many distractors able to take on the same part of speech as the answer would be harder.
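As a rough illustration of the distractor count described above (not the authors' code), the sketch below counts how many distractors could take the same part of speech as the correct answer. The lexicon, the tag names, and the function name are hypothetical stand-ins. In the study the tags came from a part-of-speech tagger, whereas this toy table is made up.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word -> set of part-of-speech tags it can take.
# These entries are invented for illustration only.
LEXICON: Dict[str, Set[str]] = {
    "dog": {"NOUN"},
    "run": {"NOUN", "VERB"},
    "quickly": {"ADV"},
    "green": {"ADJ", "NOUN"},
}


def count_pos_matching_distractors(
    answer_pos: str, distractors: List[str], lexicon: Dict[str, Set[str]]
) -> int:
    """Count distractors that could take the same part of speech as the answer."""
    # Words missing from the lexicon are treated as non-matching.
    return sum(1 for word in distractors if answer_pos in lexicon.get(word, set()))


# The correct answer is a noun; "run" and "green" can also be nouns.
n_matching = count_pos_matching_distractors(
    "NOUN", ["run", "quickly", "green"], LEXICON
)
print(n_matching)
```

A higher count would then enter the difficulty model as a feature of the question.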
After training the model on our 16,161 MCC questions, there were two main findings. First, our model found that students who had a second grade reading proficiency (as measured by Woodcock Reading Comprehension Cluster) or higher were sensitive to how many of the possible responses could take on the same part of speech as the correct answer (p=0.002) for the cloze sentence, while students below second grade proficiency were insensitive to this term (p=0.467). This result suggests that students' syntactic awareness, at least within the context of MCC questions, begins at around the second grade. The second main finding was the degree of correlation of each student's Beta parameter, the model's estimate of her ability to answer MCC questions, with her associated Woodcock test score. The mean within-grade correlation between Beta and the Reading Comprehension Cluster score was 0.69, a very strong fit.

[SSSR2006 fluency] Mostow, J. and J. Beck (2006, July 6-8). Refined micro-analysis of fluency gains in a Abstract: Our SSSR2005 talk presented a linear model of speedup in word reading between successive encounters in connected text, based on a quarter of a million such encounters. The model indicated that reading a word in a new context contributed more to speedup than re-encountering it in an old context, implying that wide reading builds fluency more than rereading. Our new, improved model uses a growth curve to model word reading time as a function of the number and types of encounters of the word. This approach lets us estimate -- both overall and at different reading levels -- the relative value of encountering a word in a new context versus an old one, and for the first time on a given day versus subsequently.

[ITS2006 gaming] Baker, R. S. J. d., Corbett, A. T., Koedinger, K. R., Evenson,
S., Roll, I., Wagner, A. Z., Naim, M., Raspat, J., Baker, D. J., & Beck,
J. E. (2006, June 26-30). Adapting to When Students Game an Intelligent
Tutoring System [Best Paper]. Proceedings of the 8th International
Conference on Intelligent Tutoring Systems, Abstract: It has been found in recent years that many students who use intelligent tutoring systems game the system, attempting to succeed in the educational environment by exploiting properties of the system rather than by learning the material and trying to use that knowledge to answer correctly. In this paper, we introduce a system which gives a gaming student supplementary exercises focused on exactly the material the student bypassed by gaming, and which also expresses negative emotion to gaming students through an animated agent. Students using this system engage in less gaming, and students who receive many supplemental exercises have considerably better learning than is associated with gaming in the control condition or prior studies.

[ITS2006 BNT-SM] Chang, K., Beck, J., Mostow, J., & Corbett, A. (2006, June
26-30). A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring
Systems. Proceedings of the 8th International Conference on Intelligent
Tutoring Systems, Abstract: This paper describes an effort to model a student’s changing knowledge state during skill acquisition. Dynamic Bayes Nets (DBNs) provide a powerful way to represent and reason about uncertainty in time series data, and are therefore well-suited to model student knowledge. Many general-purpose Bayes net packages have been implemented and distributed; however, constructing DBNs often involves complicated coding effort. To address this problem, we introduce a tool called BNT-SM. BNT-SM inputs a data set and a compact XML specification of a Bayes net model hypothesized by a researcher to describe causal relationships among student knowledge and observed behavior. BNT-SM generates and executes the code to train and test the model using the Bayes Net Toolbox [1]. Compared to the BNT code it outputs, BNT-SM reduces the number of lines of code required to use a DBN by a factor of 5. In addition to supporting more flexible models, we illustrate how to use BNT-SM to simulate Knowledge Tracing (KT) [2], an established technique for student modeling. The trained DBN does a better job of modeling and predicting student performance than the original KT code (Area Under Curve = 0.610 > 0.568), due to differences in how it estimates parameters.

[ITS2006 cloze] Hensler, B. S., & Beck, J. (2006, June 26-30). Better
student assessing by finding difficulty factors in a fully automated
comprehension measure. Proceedings of the 8th International Conference on
Intelligent Tutoring Systems, Abstract: The multiple choice cloze (MCC) question format is commonly used to assess students' comprehension. It is an especially useful format for ITS because it is fully automatable and can be used on any text. Unfortunately, very little is known about the factors that influence MCC question difficulty and student performance on such questions. In order to better understand student performance on MCC questions, we developed a model of MCC questions. Our model shows that the difficulty of the answer and the student’s response time are the most important predictors of student performance. In addition to showing the relative impact of the terms in our model, our model provides evidence of a developmental trend in syntactic awareness beginning around the 2nd grade. Our model also accounts for 10% more variance in students’ external test scores compared to the standard scoring method for MCC questions.

[ITS2006 vocabulary] Heiner, C., Beck, J., & Mostow, J. (2006, June 26-30). Automated
Vocabulary Instruction in a Abstract: This paper presents a within-subject, randomized experiment to compare automated interventions for teaching vocabulary to young readers using Project LISTEN's Reading Tutor. The experiment compared three conditions: no explicit instruction, a quick definition, and a quick definition plus a post-story battery of extended instruction based on a published instructional sequence for human teachers. A month-long study with elementary school children indicates that the quick instruction, which lasts about seven seconds, has immediate effects on learning gains that did not persist. Extended instruction, which lasted about thirty seconds longer than the quick instruction, had a persistent effect and produced gains on a posttest one week later.

[ITS2006 decomposition] Beck, J. (2006, June 26). Using learning decomposition
to analyze student fluency development. ITS2006 Educational Data Mining
Workshop, Abstract: This paper introduces an approach called learning decomposition to analyze what types of practice are most effective for helping students learn a skill. The approach is a generalization of learning curve analysis, and uses non-linear regression to determine how to weight different types of practice opportunities relative to each other. We are able to show that different types of practice differ reliably in how quickly students acquire the skill of reading words quickly and accurately. Specifically, massed practice is generally not effective for helping students learn words, but may be acceptable for less proficient readers. Rereading the same story is generally not as effective as reading a variety of stories, but might be beneficial for more proficient readers. [JNLE2006] Mostow, J. and J. Beck (2006). Some useful tactics to modify, map, and mine data from intelligent tutors. Natural Language Engineering (Special Issue on Educational Applications) 12(2),195-208. © 2006 Cambridge University Press. Click here for .pdf file. Abstract: Mining data logged by intelligent tutoring systems has the potential to discover information of value to students, teachers, authors, developers, researchers, and the tutors themselves -- information that could make education dramatically more effcient, effective, and responsive to individual needs. We factor this discovery process into tactics to modify tutors, map heterogeneous event streams into tabular data sets, and mine them. This model and the tactics identified mark out a roadmap for the emerging area of tutorial data mining, and may provide a useful vocabulary and framework for characterizing past, current, and future work in this area. We illustrate this framework using experiments that tested interventions by an automated reading tutor to help children decode words and comprehend stories. [IJAIED2006] Beck, J. E., & Sison, J. (2006). 
Using knowledge tracing in a noisy environment to measure student reading proficiencies. International Journal of Artificial Intelligence in Education, 16, 129-143. (In Special “Best of ITS 2004” Issue.) Abstract: Constructing a student model for language tutors is a challenging task. This paper describes using knowledge tracing to construct a student model of reading proficiency and validates the model. We use speech recognition to assess a student’s reading proficiency at a subword level, even though the speech recognizer output is at the level of words and is statistically noisy. Specifically, we estimate the student’s knowledge of 80 letter to sound mappings, such as ch making the sound /K/ in “chemistry.” At a coarse level, the student model did a better job at estimating reading proficiency for 47.2% of the students than did a standardized test designed for the task. Although not quite as strong as the standardized test, our assessment method can provide a report on the student at any time during the year and requires no break from reading to administer. Our model’s estimate of the student’s knowledge on individual letter to sound mappings is a significant predictor of whether he will ask for help on a particular word. Thus, our student model is able to describe student performance both at a coarse- and at a fine-grain size. [AIED2005 event] Mostow,
J., Beck, J., Cen, H., Gouvea, E., & Heiner, C. (2005, July). Interactive
Demonstration of a Generic Tool to Browse Tutor-Student Interactions.
Interactive Events Proceedings of the 12th International Conference on
Artificial Intelligence in Education (AIED 2005), Abstract: Project LISTEN's Session Browser is a generic tool to browse a database of students' interactions with an automated tutor. Using databases logged by Project LISTEN's Reading Tutor, we illustrate how to specify phenomena to investigate, explore events and the context where they occurred, dynamically drill down and adjust which details to display, and summarize events in human-understandable form. The tool should apply to MySQL databases from other tutors as well. [AIED2005
browser] Mostow, J., Beck, J., Abstract: A basic question in mining data from an intelligent tutoring system is, "What happened when…?" A generic tool to answer such questions should let the user specify which phenomenon to explore; explore selected events and the context in which they occurred; and require minimal effort to adapt the tool to new versions, to new users, or to other tutors. We describe an implemented tool and how it meets these requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It infers the implicit hierarchical structure of tutorial interaction so humans can browse it. A companion paper [1] illustrates the use of this tool to explore data from Project LISTEN's automated Reading Tutor. [AIED2005
interruption] Heiner, C., Beck, J., & Mostow, J. (2005, July 18-22). When do students interrupt help? Effects of individual differences. Proceedings
of the 12th International Conference on Artificial Intelligence in Education
(AIED 2005),
Abstract: When do students interrupt help to request different help? To study this question, we analyze a within-subject experiment in the 2003-2004 version of Project LISTEN's Reading Tutor. From 168,983 trials of this experiment, we report patterns in when students choose to interrupt help. To improve model fit for individual data, we adjust our model to account for individual differences. We report small but significant correlations between a student parameter in our model and gender as well as external measures of motivation and academic performance. [AIED2005
engagement] Beck, J. (2005, July 18-22). Engagement tracing: using
response times to model student disengagement. Proceedings of the 12th International Conference on Artificial
Intelligence in Education (AIED 2005), Abstract: Time on task is an important predictor for how much students learn. However, students must be focused on the learning for the time invested to be productive. Unfortunately, students do not always try their hardest to solve problems presented by computer tutors. This paper explores student disengagement and proposes an approach, engagement tracing, for detecting whether a student is engaged in answering questions. This model is based on item response theory, and uses as input the difficulty of the question, how long the student took to respond, and whether the response was correct. From these data, the model determines the probability a student was actively engaged in trying to answer the question. The model has a reliability of 0.95, and its estimate of student engagement correlates at 0.25 with student gains on external tests. Finally, the model is sensitive enough to detect variations in student engagement within a single tutoring session. The novel aspect of this work is that it requires only data normally collected by a computer tutor, and the affective model is validated against student performance on an external measure. [AIED2005 ASR] Beck, J. E., Chang, K., Mostow, J., & Corbett,
A. (2005, July 19). Using a student
model to improve a computer tutor's speech recognition. Proceedings of the AIED 05 Workshop on
Student Modeling for Language Tutors, 12th International Conference on
Artificial Intelligence in Education, Abstract: Intelligent computer tutors can derive much of their power from having a student model that describes the learner’s competencies. However, constructing a student model is challenging for computer tutors that use automated speech recognition (ASR) as input. This paper reports using ASR output from a computer tutor for reading to compare two models of how students learn to read words: a model that assumes students learn words as whole-unit chunks, and a model that assumes students learn the individual letter→sound mappings that make up words. We use the data collected by the ASR to show that a model of letter→sound mappings better describes student performance. We then compare using the student model and the ASR, both alone and in combination, to predict which words the student will read correctly, as scored by a human transcriber. Surprisingly, majority class has a higher classification accuracy than the ASR. However, we demonstrate that the ASR output still has useful information, that classification accuracy is not a good metric for this task, and that the Area Under Curve (AUC) of ROC curves is a superior scoring method. The AUC of the student model is statistically reliably better (0.670 vs. 0.550) than that of the ASR, which in turn is reliably better than majority class. These results show that ASR can be used to compare theories of how students learn to read words, and modeling individual learners’ proficiencies may enable improved speech recognition. [AIED 2005
model] Chang, K., Beck, J. E., Mostow, J., & Corbett, A. (2005, July
19). Using speech recognition to evaluate two student models for a reading
tutor. Proceedings of the AIED 05
Workshop on Student Modeling for Language Tutors, 12th International
Conference on Artificial Intelligence in Education,
Abstract: Intelligent
Tutoring Systems derive much of their power from having a student model that
describes the learner's competencies. However, constructing a student model
is challenging for computer tutors that use automated speech recognition
(ASR) as input, due to inherent inaccuracies in ASR. We describe two
extremely simplified models of developing word decoding skills and explore
whether there is sufficient information in ASR output to determine which
model fits student performance better, and under what circumstances one model
is preferable to another. The two models
that we describe are a lexical model that assumes students learn words as
whole-unit chunks, and a grapheme-to-phoneme (G-to-P) model that assumes students
learn the individual letter-to-sound mappings that compose the words. We use
the data collected by the ASR to show that the G-to-P model better describes
student performance than the lexical model. We then determine which model
performs better under what conditions. On one hand, the G-to-P model better
correlates with student performance data when the student is older or when
the word is more difficult to read or spell. On the other hand, the lexical
model better correlates with student performance data when the student has
seen the word more times. [AAAI 2005
workshop] Beck, J. (Ed.). (2005, July 10). Proceedings of the AAAI2005
Workshop on Educational Data Mining. [AAAI2005
browser] Mostow, J., Beck, J., Cen, H., Abstract: A basic question in mining data from an intelligent tutoring system is, "What happened when…?" We identify requirements for a tool to help answer such questions by finding occurrences of specified phenomena and browsing them in human-understandable form. We describe an implemented tool and how it meets the requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It automatically computes and displays the temporal hierarchy implicit in this representation. We illustrate the use of this tool to mine data from Project LISTEN's automated Reading Tutor. [AAAI2005
usage] Abstract: Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different “learning pages” which varied in length and in difficulty. We computed the time spent on each page by each student during each session they were logged in. We then modeled the time spent for a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages (over their whole course experience) was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty. The page itself was highly predictive, but so was the average time spent on learning pages in a given session. This indicates that local considerations, e.g., mood, deadline proximity, etc., play a much greater role in determining student pace and attention than do intrinsic student traits. We also consider the average time spent on learning pages as a function of the time of semester. Students spent less time on pages later in the semester, even for more demanding material. [SSSR 2005] Mostow, J., & Beck,
J. (2005). Micro-analysis of fluency gains in a Reading Tutor that
listens: Wide vs. repeated guided oral
reading. Talk at Twelfth Annual
Meeting of the Society for the Scientific Study of Reading. Abstract: Fluency growth is essential but imperfectly understood. By using automatic speech recognition to listen to children read aloud, Project LISTEN's Reading Tutor provides a novel instrument to study fluency development. During the 2002-2003 school year, hundreds of children in grades 1-4 used the Reading Tutor, which recorded them reading millions of words of text. The latency preceding each word reflects the reader’s cognitive effort to identify the word. Using automatic speech recognition to analyze latency changes between successive encounters of words in the same or different contexts provides new data about how fluency grows. * [Toronto 2005] Cunningham, T., & Geva, E. (2005, June 24).
The effects of reading technologies on literacy development of ESL students
[poster presentation]. Twelfth Annual
Meeting of the Society for the Scientific Study of Reading. *
[UBC 2005] Reeder, K., Early, M., Kendrick, M., Shapiro, J., & [AERA 2005] Beck,
J. E., & Mostow, J. (2005). Mining Data from Randomized Within-Subject
Experiments in an Automated Reading Tutor (poster in session 34.080,
"Logging Students' Learning in Complex Domains: Empirical Considerations and Technological
Solutions"). American Educational Research Association 2005 Annual
Meeting: Demography and Democracy in
the Era of Accountability, Abstract: Experiments embedded in the Reading Tutor help evaluate its decisions in tutoring decoding, vocabulary, and comprehension.
Abstract:
This study looked at factors influencing teachers’ perception and usage of
Project LISTEN’s Reading Tutor, a computerized tutor used with elementary
students in 9 classroom-based, 10 computer lab-based, and 3 specialist-room
school settings. Thirteen interviews and 22 survey responses (of a
possible 28 teachers) examined teachers’ perception of the Reading Tutor and
suggested that teachers’ belief in the Tutor influenced their usage of it (r
= .46, p < .03). Three factors seemed to influence teacher belief:
1) perceived ease of use (r = .52, p < .01), 2) teachers’ reported
experience with computers (r = .41, p < .04) and instructional technology
(r = .48, p < .03), and 3) perceived technical problems such as frequency
of technical problems (r = -.44, p < .04) and speed with which problems
were fixed (r = .49, p < .02). Analysis of these factors suggested
four themes that cut across factors and seem to influence the way teachers
evaluate and use the Reading Tutor – the technology’s degree of convenience,
competition from other educational priorities and practices, teacher
experience and/or interest with technology, and data available to teachers
and the way teachers prioritize that data. These results suggest that
improving convenience of the Reading Tutor, instituting specialized training
programs, and improving feedback mechanisms for teachers by providing
relevant, situated data may influence teacher belief in the Reading Tutor and
thereby increase teacher usage. This study contributes to current
literature on educational technology usage by supporting previous literature
suggesting that teacher belief in the importance of a technology influences
their use of it. One unique feature of this study is that it uses both
quantitative and qualitative methods to look at the research questions from
two different research perspectives.
Abstract:
A two-month pilot study comprised of 34 second through fourth grade Hispanic
students from four bilingual education classrooms was conducted to compare
the efficacy of the 2004 version of the Project LISTEN Reading Tutor against
the standard practice of sustained silent reading (SSR). The Reading
Tutor uses automated speech recognition to listen to children read
aloud. It provides both spoken and graphical feedback in order to
assist the children with the oral reading task. Prior research with
this software has demonstrated its efficacy within populations of native
English speakers. This study was undertaken to obtain some initial
indication as to whether the tutor would also be effective within a
population of English language learners. The study
employed a crossover design where each participant spent one month in each of
the treatment conditions. The experimental treatment consisted of 25
minutes per day using the Reading Tutor within a small pullout lab
setting. Control treatment consisted of the students who remained in the
classroom where they participated in established reading instruction
activities. Dependent variables consisted of the school district's
curriculum-based measures for fluency, sight word recognition and
comprehension. The Reading Tutor
group out-gained the control group in every measure during both halves of the
crossover experiment. Within-subject results from a paired t-test
indicate these gains were significant for one sight word measure (p = .056)
and both fluency measures (p < .001). Effect sizes were 0.55 for
timed sight words, a robust 1.16 for total fluency and an even larger 1.27
for fluency controlled for word accuracy. These dramatic results
observed during a one-month treatment indicate this technology may have much
to offer English language learners.
Abstract:
We describe the automated generation and use of 69,326 comprehension cloze
questions and 5,668 vocabulary matching questions in the 2001-2002 version of
Project LISTEN's Reading Tutor used by 364 students in grades 1-9 at seven
schools. To validate our methods, we used students' performance on
these multiple-choice questions to predict their scores on the Woodcock
Reading Mastery Test. A model based on students' cloze performance
predicted their Passage Comprehension scores with correlation R=.85.
The percentage of vocabulary words that students matched correctly to their
definitions predicted their Word Comprehension scores with correlation R=.61.
We used both
types of questions in a within-subject automated experiment to compare four
ways to preview new vocabulary before a story - defining the word, giving a
synonym, asking about the word, and doing nothing. Outcomes included
comprehension as measured by performance on multiple-choice cloze questions
during the story, and vocabulary as measured by matching words to their
definitions in a posttest after the story. A synonym or short
definition significantly improved posttest performance compared to just
encountering the word in the story - but only for words students didn't
already know, and only if they had a grade 4 or better vocabulary. Such
a preview significantly improved performance during the story on cloze
questions involving the previewed word - but only for students with a grade
1-3 vocabulary. [TICL fluency] Beck, J. E., Jia, P., & Mostow, J. (2004). Automatically assessing oral reading fluency in a computer tutor that listens. Technology, Instruction, Cognition and Learning, 2, 61-81. Abstract:
Much of the power of a computer tutor comes from its ability to assess
students. In some domains, including oral reading, assessing the
proficiency of a student is a challenging task for a computer. Our
approach for assessing student reading proficiency is to use data that a
computer tutor collects through its interactions with a student to estimate
his performance on a human-administered test of oral reading
fluency. A model with data collected from the tutor's speech
recognizer output correlated, within-grade, at 0.78 on average with student
performance on the fluency test. For assessing students, data from the
speech recognizer were more useful than student help-seeking behavior.
However, adding help-seeking behavior increased the average within-grade
correlation to 0.83. These results show that speech recognition is a
powerful source of data about student performance, particularly for reading. [ITS 2004 tracing] Beck, J. E., & Sison, J. (2004,
September 1-3). Using knowledge tracing to measure student reading
proficiencies. Proceedings of the 7th International Conference on
Intelligent Tutoring Systems,