Text Mining and Educational Discourse

The key insight of this tutorial is that if we understand how socio-psychological processes connect to language through the social signals encoded in it, we can structure computational models of language interactions more effectively. The tutorial has a theoretical component and a hands-on component.

In the theoretical component, I will give an overview of work on the connection between discourse and learning. To that end, I will discuss my group’s work on a computational reinterpretation of qualitative frameworks from sociolinguistics and discourse analysis for studying conversational interactions. The focus of this work is to take these rich and expansive but informally specified constructs and distil them into precise operationalizations that capture their essence and can be learned by machine learning algorithms. The challenge is to identify that essence in a way that generalizes across contexts and has predictive validity for important learning and interaction outcomes.

In the hands-on component, I will offer instruction in the use of LightSIDE, a freely downloadable tool for applying machine learning to natural language data. It provides a convenient GUI environment in which novice users of text classification technology can easily run feature extraction and classification experiments. LightSIDE also serves as a vehicle for disseminating new techniques for the effective application of machine learning to text mining, including novel feature extraction techniques. The newest version (LightSIDE 2.0) includes a model specification panel that enables easy use of multi-level modeling techniques from applied statistics as domain adaptation and multi-domain learning approaches. One of its most distinctive capabilities is its sophisticated support for error analysis.
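For readers who have not run a text classification experiment before, the workflow described above (extract features, train a model, inspect the errors) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: it is not LightSIDE's implementation or API, the classifier (multinomial naive Bayes over unigram counts) is simply a common baseline chosen for brevity, and the labels "reasoning" and "other" and all example turns are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_unigrams(text):
    """Lowercase the text and split on whitespace to get unigram counts."""
    return Counter(text.lower().split())


def train_naive_bayes(examples):
    """Collect label and per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        features = extract_unigrams(text)
        label_counts[label] += 1
        word_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest add-one-smoothed log probability."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word, count in extract_unigrams(text).items():
            if word in vocabulary:  # words unseen in training are skipped
                score += count * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(model, examples):
    """Count (gold label, predicted label) pairs as a basis for error analysis."""
    matrix = Counter()
    for text, gold in examples:
        matrix[(gold, predict(model, text))] += 1
    return matrix


# Hypothetical toy data: chat turns labeled for whether they display reasoning.
train = [
    ("i think the pressure rises because the volume shrinks", "reasoning"),
    ("so the answer must be larger because heat was added", "reasoning"),
    ("ok", "other"),
    ("hello are you there", "other"),
    ("what page are we on", "other"),
]
held_out = [
    ("because the volume shrinks the pressure rises", "reasoning"),
    ("are you on the page", "other"),
    ("the answer is on the page", "other"),
]

model = train_naive_bayes(train)
matrix = confusion_matrix(model, held_out)
for (gold, predicted), count in sorted(matrix.items()):
    print(f"gold={gold:10s} predicted={predicted:10s} count={count}")
```

On this toy data the last held-out turn is labeled "reasoning" although its gold label is "other", because words such as "the" and "answer" occurred only in the reasoning examples. Looking at the off-diagonal cells of the confusion matrix and asking which features drove each mistake is the kind of question error analysis is meant to answer.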

To prepare for the tutorial:

  1. Download and unpack LightSIDE. If it does not work properly when you click on the executable, send email to dadamson@cs.cmu.edu. Be sure to look through the user's manual, which is included in the download.
  2. Skim through this introductory article on automated collaborative learning process analysis.
  3. Skim through this overview of the SouFLé framework for analysis of discussions for learning. Feel free to email cprose@cs.cmu.edu to obtain a data set that is formatted for LightSIDE and annotated with this three-dimensional annotation scheme.
  4. You may want to skim through the slide decks from last year's tutorial: Part 1, Part 2, Part 3, and Part 4.

Selected Recent Publications (available on request):

  1. Rosé, C. P., Wang, Y.C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (2008). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning 3(3), pp. 237-271.
  2. Gweon, G., Jain, M., McDonough, J., Raj, B., Rosé, C. P. (accepted). Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation, International Journal of Computer Supported Collaborative Learning.
  3. Howley, I., Mayfield, E. & Rosé, C. P. (2013). Linguistic Analysis Methods for Studying Small Groups, in Cindy Hmelo-Silver, Angela O’Donnell, Carol Chan, & Clark Chinn (Eds.) International Handbook of Collaborative Learning, Taylor and Francis, Inc.
  4. Dyke, G., Kumar, R., Ai, H., Rosé, C. P. (2012). Challenging Assumptions: Using Sliding Window Visualizations to Reveal Time-Based Irregularities in CSCL Processes, in Proceedings of the International Conference of the Learning Sciences. Sydney, Australia.
  5. Sionti, M., Ai, H., Rosé, C. P., Resnick, L. (2011). A Framework for Analyzing Development of Argumentation through Classroom Discussions, in Niels Pinkwart & Bruce McLaren (Eds.) Educational Technologies for Teaching Argumentation Skills, Bentham Science.
  6. Mayfield, E. & Rosé, C. P. (2011). Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 1018–1026.
  7. Howley, I., Mayfield, E., Rosé, C. P. (2011). Missing Something? Authority in Collaborative Learning, in Proceedings of the 9th International Computer Supported Collaborative Learning Conference, Volume 1: Long Papers, pp. 336-373.
Carolyn Penstein Rosé (cprose@cs.cmu.edu), Carnegie Mellon University