Automatic Conversation Analysis

The core technical contribution of my research is in the automated analysis of conversational interactions (especially automation of the SouFLé framework described in the previous section) and in the analysis of the social aspects of text (i.e., perspective modeling, sentiment analysis, and opinion mining). I refer to work on these problems as social interpretation of language. Basic research contributions to the field of language technologies from my group’s work on these problems have been published over the past 10 years in 6 full and 8 short papers at the field’s top conferences, namely ACL, NAACL, EACL, EMNLP, and SIGDIAL. Over the same period, applications of this work to the field of education have been published as 12 full and 4 short papers in the top conferences in the learning sciences, namely ICLS and CSCL; 15 full and 13 short papers in the top conferences in educational technology, namely AIED and ITS; and 4 articles in the top journal in Computer Supported Collaborative Learning, namely ijCSCL.

Impact: Seminal Ideas and Findings. What sets my group’s work apart is its key idea: automated social interpretation of language is enabled by insights from theories in sociolinguistics and discourse analysis, which motivate the design of novel representations of language. Designing computational models that reflect these insights makes the patterns learnable. My work in this area has been influential in the birth and growth of the area of Automated Analysis of Collaborative Learning Processes. Pierre Dillenbourg, a leader in the area of scripted collaboration, has advocated the use of machine learning for modeling human collaborative learning processes since the inception of the field of Computer Supported Collaborative Learning, that is, since at least the mid-90s. However, judging from conference session titles prior to the past six years, there was no formal acknowledgement of this as a focal area of CSCL research. The tide began to turn with a series of visionary lectures at CSCL 2005 forecasting the important contributions to the field over the next 10 years. Automation of analysis of collaborative learning processes was highlighted as a focal area for the community. It was at this conference that my group’s debut in this area was published and nominated for a Best Paper award. That work began with a collaboration with Frank Fischer, an educational psychologist and leader in the area of collaborative learning process analyses. His research group produced one of the most sophisticated current multi-dimensional coding schemes for characterizing aspects of collaborative learning processes from multiple theoretical perspectives in the learning sciences. Our collaboration produced the first proof of concept that machine learning applied to raw communication data could replicate such a sophisticated analysis on multiple dimensions with reliability approaching that of human coders.
Extensions of that work were published in my 2008 article in the International Journal of Computer Supported Collaborative Learning (ijCSCL), an article that is one of the most highly cited publications in the field of CSCL from the past 5 years. Since 2005 there has been growing recognition of automated analysis of collaborative learning processes as an important area. In January 2006, the first Kaleidoscope CSCL Rendez Vous was held in Villars, Switzerland. I was invited to give a keynote talk on dynamic support for collaborative learning enabled by automated analysis of collaborative learning processes, as well as two additional workshop talks, at that event. Invitations for talks in this area have continued since that time, including a symposium talk at ICLS 2008 and panel presentations at the European Association for Research on Learning and Instruction in the fall of 2011 and the fall of 2013. At CSCL 2011, my now-graduated PhD student Gahgene Gweon presented her paper on automated analysis of Transactivity in speech data and won the Best Student Paper award.

Impact: Tools and Resources for Other Researchers. In addition to basic research in machine learning applied to problems in conversation analysis, we have produced two publicly available tool kits that are in wide use, namely TagHelper Tools (Rosé et al., 2008) and LightSIDE (Mayfield & Rosé, in press), each of which has been downloaded over 4,000 times from over 70 countries. In the past three months alone, LightSIDE has been downloaded by over 400 users, including users from 43 out of the top 100 schools of computer science in the country and 55 other universities from 20 countries. A recent survey sent out to the 200 most recent downloaders of LightSIDE indicates that 39% of respondents continue to use LightSIDE regularly after downloading. Both tool kits provide a convenient GUI environment in which novice users of text classification technology can easily run text extraction and classification experiments. In addition, LightSIDE serves as a vehicle for dissemination of new techniques for effective application of machine learning to text mining, including novel feature extraction techniques (Gianfortoni et al., 2011). The newest version (LightSIDE 2.0) includes a model specification panel that enables easy use of multi-level modeling techniques from applied statistics as domain adaptation and multi-domain learning approaches. One of its most distinctive capabilities is its sophisticated support for error analysis. Of the respondents to the recent LightSIDE survey, 52% report using LightSIDE for applied text mining, 39% for homework, 29% for research, 29% for writing assessment, and 10% for teaching; the last group includes instructors at the University of Maryland, the University of North Carolina, the University of Pennsylvania, the University of Texas at Austin, and American University. Tutorials on the use of TagHelper Tools and LightSIDE have been offered both locally and at international conferences such as CSCL and AIED.
LightSIDE’s value to research in the learning sciences has been recognized through a recent invitation for me to offer a tutorial on Discourse Analytics using LightSIDE at the upcoming Learning Analytics Summit, to be held at Stanford University in July 2013. In a similar vein, the International Society of the Learning Sciences Network of Academic Programs in the Learning Sciences (NAPLES) invited me to contribute a unit on “Learning Analytics and Educational Data Mining of Discourse Data” for inclusion in a collection of resources they will disseminate as short online courses.

Impact: Change in research, teaching, and assessment practices. LightSIDE has been featured on NPR, in Education Week, and in other news sources because of its performance in a recent Automated Student Assessment Prize (ASAP) nationwide evaluation of automated essay scoring technology, where its performance was comparable to that of top industrial vendors such as Kaplan and ETS, as well as its successful performance in smaller evaluations by leaders in educational assessment for science education. The media coverage of LightSIDE regarding the ASAP Challenge sparked increased interest in the use of LightSIDE, especially for assessment of student work. As an example of its use, science education assessment expert Ross Nehm has published four journal articles and seven conference papers since 2010 about his work using LightSIDE. This work was recognized in an Editor’s Choice column in Science. Increasing interest in consulting services related to LightSIDE, as well as invitations from large companies such as College Board and McGraw Hill to offer tutorials on its use, has encouraged PhD student and LightSIDE developer Elijah Mayfield to work with Olympus, a CMU-based organization that supports technology transfer efforts, to grow the LightSIDE project into a startup company called LightSIDElabs.com. The company focuses on development of enterprise software solutions in the area of text analytics, with College Board and CTB McGraw Hill as its current large contracts. LightSIDElabs.com recently won 2nd place for “Best university-based startup in Pittsburgh” at the Three Rivers Venture Fair.

Carolyn Penstein Rose (cprose@cs.cmu.edu) / Carnegie Mellon University