Automatic Conversation Analysis

The previous section focused on theoretical and methodological work related to discussion analysis. The core technical contribution of my research is in the area of automated analysis of conversational interactions (especially automation of the SouFLé framework introduced in the previous section) as well as analysis of the social aspects of text (i.e., perspective modeling, sentiment analysis, and opinion mining). I refer to work on these problems as social interpretation of language. Basic research contributions to the field of language technologies from my group's work on these problems have been published in the past 5 years in 9 full papers at the field's top conferences, namely ACL, NAACL, EACL, EMNLP, and SIGDIAL. Over the same period, applications of this work to the field of education have been published as 7 full papers in the top conferences in learning sciences, namely ICLS and CSCL, as 7 full papers in the top conferences in educational technology, namely AIED, ITS, EDM, and LAK, and finally as 14 journal articles that span these three fields.

What sets my group's work apart is its key idea: insights from theories in sociolinguistics and discourse analysis motivate the design of novel representations of language, and these representations are what enable automated social interpretation of language. Designing computational models that reflect these insights makes the patterns learnable. My early work in this area served as the first proof of concept that machine learning applied to raw communication data could replicate multi-dimensional approaches to analysis of collaborative processes that were recognized as influential within the CSCL community. Extensions of that work were published in my 2008 article in the International Journal of Computer-Supported Collaborative Learning (ijCSCL), which is one of the most highly cited publications in the field of CSCL, ranking as the 5th most cited article in the journal since its inception in 2005. Since my tenure review, the focus of my computational work has shifted from analysis at the turn level to analysis of role-based behavior profiles and how these predict important outcomes in large-scale social interaction, such as in MOOCs, Wikipedia, and GitHub. Several of the top-cited articles in the area of automated analysis of discussion in MOOCs are from my group's work. For example, according to a Google Scholar collection of articles related to implications of social interaction in MOOCs, the ten top-cited articles focusing on analysis of discussion include four from my group's work, and the top-cited article within this subset is from my group.

In addition to basic research in machine learning applied to problems in conversation analysis, my research group has produced two publicly available toolkits that are in wide use, namely TagHelper tools (Rosé et al., 2008) and LightSIDE (Mayfield & Rosé, 2013), which cumulatively have been downloaded over 18,000 times from over 70 countries. At the time of my tenure review, my former PhD students Elijah Mayfield and David Adamson had spun off a company building on LightSIDE technology, referred to as LightSIDE Labs, which has recently been acquired by TurnItIn.com. The company continues to be active in the Computational Linguistics, Assessment, and Educational Technology communities. In addition to leveraging these tools in my own teaching and sharing them with other instructors locally, I taught this tool as one of four instructors of a Massive Open Online Course on Learning Analytics offered by edX in the fall of 2014, with over 40,000 students enrolled.

Carolyn Penstein Rose (cprose@cs.cmu.edu) / Carnegie Mellon University