Simone Says

Simone Says is an effort to develop a software environment for teaching basic language skills to young, language-disordered children, especially those with autistic spectrum disorders (ASD). The significant language delays and impairments in this population often co-occur with a marked preference for computer rather than human interaction, strong visual processing skills and rote memory, age-appropriate articulation, and preferential attention to language that is patiently repeated with little or no variation in prosody, word choice, or syntactic structure (for example, television commercials and videos). In other words, the weaknesses and strengths of children with ASD pair particularly well with the strengths and weaknesses of current Artificial Intelligence (AI) technology, especially leading-edge speech recognition, language processing, and computer-aided instruction. By combining such technology in a clinically informed way, we may be able to create both a uniquely useful tool for remediation and a platform for studying the language acquisition process in impaired populations.

Educational and clinical techniques for stimulating language in children with ASD focus on achieving a complete, speech-to-speech, communicative loop. Research and practice both stress the need for achieving engagement and sustaining the motivation to use language in functionally appropriate ways. Of particular concern is the area of social communication, and much therapeutic effort is spent on pragmatic skills like conversational turn taking and establishing joint attention (for example, understanding the need to specify the large book when two books are present but only one should be taken). In contrast, current software options consist primarily of comprehension drills for vocabulary and syntax, with interaction that is mouse- or keyboard-based rather than verbal. Existing software that provides speech-based turn taking (for example, IBM's SpeechViewer products) targets only the acoustic level, with a focus on reinforcing prosodic features such as pitch and duration.

Unlike other software interventions, Simone Says is intended to be a speech-based, interactive environment for young, non-mute children with ASD. The system combines sound clinical practice with the engaging features found in off-the-shelf early learning software, including child-centered control, animation and sound, and readily available help in the form of animated character guides. It is designed to create opportunities for meaningful, verbal language practice across a wide range of linguistic tasks in a simple social world. Tasks cover vocabulary, basic syntax, semantics and pragmatics, joint attention, conversational turn taking, and simple conversational repair. The system's interactive loop consists of (1) the presentation of a visual image, (2) the production of a referentially meaningful speech act by the child (or modeled by subdialog with Simone the cat or another character), and (3) a natural-consequence animation sequence as reward. The visual stimuli consist of common, everyday objects, actions, and situations, both to teach functionally useful vocabulary and to maximize the likelihood of practice and transfer in the home and school settings. Multiple examples of each stimulus are generated automatically within relevant dimensions of variability (for example, color and size) in order to increase the likelihood of generalization.
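
The automatic generation of examples and the three-step loop described above might be sketched as follows. This is only an illustration under stated assumptions: the `Stimulus` class, the `generate_variants` and `respond` functions, and the rule that an utterance counts as referentially meaningful when it contains the target noun are all hypothetical and are not taken from the Simone Says design.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Stimulus:
    """One visual example: an object plus its varying attributes (hypothetical)."""
    noun: str
    color: str
    size: str

    def target_phrase(self) -> str:
        # Fullest description a child could produce for this image.
        return f"{self.size} {self.color} {self.noun}"


def generate_variants(noun, colors, sizes):
    """Enumerate every color/size combination for one object."""
    return [Stimulus(noun, c, s) for c, s in product(colors, sizes)]


def respond(stimulus, utterance):
    """Pick the system's next step after the child speaks.

    Assumption: naming the object is enough to earn the reward;
    otherwise a character models the target phrase.
    """
    words = utterance.lower().split()
    if stimulus.noun in words:
        return "reward_animation"
    return "model_with_character"


variants = generate_variants("ball", ["red", "blue"], ["big", "little"])
first = variants[0]
print(first.target_phrase())        # fullest description of the first variant
print(respond(first, "Red ball"))   # child named the object
print(respond(first, "cat"))        # child did not name the object
```

Enumerating the full cross product keeps the object constant while its surface features change, which is the kind of controlled variability the paragraph ties to generalization.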

Creating interactions like these requires a number of advanced technologies. A primary requirement is a speaker-independent, continuous speech recognition system (such as SPHINX-II, developed at Carnegie Mellon University) with an acoustic model that can be adapted to young children's voices. Simone Says also requires a method for dynamically tracking the child's language development (such as CHAMP, also developed at CMU) in order to construct the next target example and provide a constrained language model to the speech recognizer. Finally, the system must have a method for translating the linguistic description of the next example into a visual image and animation that can dynamically accommodate variation along dimensions such as color, size, number of objects, participation of animated characters, and substitution of objects from the same semantic class. The off-the-shelf animation authoring environment Director seems to be adequate to this task. Each of the components is understood well enough to make the implementation of Simone Says possible.
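
One way to picture how a tracked model of the child's language could constrain the recognizer is the sketch below. It is a toy illustration, not the SPHINX-II or CHAMP interface: the function `build_constrained_phrases`, the dictionary layout of the target, and the rule that a phrase is admitted only when its modifiers are already known to the child are all assumptions made for the example.

```python
def build_constrained_phrases(target, known_words):
    """List the utterances the recognizer should listen for (hypothetical).

    target: dict with "noun", "color", and "size" for the next example.
    known_words: words the tracking component believes the child uses.
    The target noun is always allowed, since it is the word being taught.
    """
    noun, color, size = target["noun"], target["color"], target["size"]
    candidates = [
        noun,
        f"{color} {noun}",
        f"{size} {noun}",
        f"{size} {color} {noun}",
    ]
    return [
        phrase
        for phrase in candidates
        if all(word == noun or word in known_words for word in phrase.split())
    ]


target = {"noun": "ball", "color": "red", "size": "big"}
early = build_constrained_phrases(target, {"red"})
later = build_constrained_phrases(target, {"red", "big"})
print(early)  # phrases allowed while the child knows only "red"
print(later)  # phrases allowed once "big" is also known
```

A small phrase list like this narrows what the recognizer must distinguish, and it grows as the child's tracked vocabulary grows.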

Part of the motivation for building Simone Says is that it provides a platform for collecting data and testing hypotheses that, in turn, can inform our models of human language processing. Such an environment can answer questions about the efficacy of the technology (for example, is there demonstrable growth in language during human-computer interaction, as measured by response latency, errors, generalization, etc.?). Beyond such straightforward questions, however, a computational system also makes it practical to systematically examine relationships between language learning and other factors such as rate of repetition, the variability (or constancy) of prosodic, lexical, and syntactic information in the environment, the importance of non-verbal cues like gaze-following and pointing, and the likelihood of skill transfer to human-human interaction given different environmental simplifications. Understanding these relationships empirically in well-controlled, reproducible experimental settings (like an instrumented computer environment) is important to researchers in cognitive science, language development, and autism.

The more compelling reason for this work, however, is what it can mean to children with autism and, eventually, the 3-5% of all children who enter school with a language disorder. The lesson from the new therapeutic focus on early intervention is quite clear: acquiring age-appropriate language has a profound effect on behavior, socialization, and the long-term prognosis for an independent adulthood. By providing meaning-based interactive experiences that range linguistically from vocabulary-building to simple social discourse, Simone Says may be the first chance for children who need it to learn the efficacy of language from a constantly available, infinitely patient teacher.