Automatic Conversation Analysis

The core technical contribution of my research is the automated analysis of conversational interactions, especially automation of the SouFLé framework described in the previous section, together with analysis of the social aspects of text (i.e., perspective modeling, sentiment analysis, and opinion mining). I refer to work on these problems as social interpretation of language. My group’s basic research contributions to language technologies on these problems have been published as full and short papers at the field’s top conferences, including ACL, EACL, EMNLP, and SIGDIAL. Applications of this work to education have been published at top conferences in the learning sciences (ICLS and CSCL), at top conferences in educational technology (AIED and ITS), and in the top journal in Computer Supported Collaborative Learning, ijCSCL.

The key idea behind my recent computational work on social interpretation of language is to use insights from theories in sociolinguistics and discourse analysis to motivate novel representations of language that make these problems learnable. One example is computational analysis of Authoritativeness (Mayfield & Rosé, 2011). In this work, we draw on the theoretical foundation for the coding scheme, which imposes sequencing constraints on patterns of codes within an interaction. Although the codes are assigned to individual contributions in a conversation, we are able to encode the sequencing constraints within an Integer Linear Programming framework. The best performing model included these constraints, imported directly from the theoretical foundation for the coding scheme, and significantly outperformed an otherwise equivalent model without them. The model achieved high correlation with Authoritativeness ratings from human-assigned codes in a corpus of direction-giving dialogues (R = .97) as well as a corpus of doctor-patient interactions (R = .96).
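The effect of adding sequencing constraints to a per-contribution classifier can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the three labels, the single constraint, and the scores are hypothetical and are not the coding scheme or the model from the cited work, and exhaustive search stands in for an Integer Linear Programming solver.

```python
from itertools import product

# Hypothetical label set; not the coding scheme from the cited work.
LABELS = ("request", "response", "other")


def is_valid(sequence):
    """Assumed sequencing constraint: every 'response' must answer an
    earlier 'request' that has not already been answered."""
    open_requests = 0
    for label in sequence:
        if label == "request":
            open_requests += 1
        elif label == "response":
            if open_requests == 0:
                return False
            open_requests -= 1
    return True


def decode(scores, constrained=True):
    """Return the highest-scoring label sequence.

    scores holds one dict per contribution, mapping each label to a
    classifier score. Exhaustive search replaces the ILP solver here,
    so this is only practical for very short conversations.
    """
    best, best_total = None, float("-inf")
    for candidate in product(LABELS, repeat=len(scores)):
        if constrained and not is_valid(candidate):
            continue
        total = sum(s[label] for s, label in zip(scores, candidate))
        if total > best_total:
            best, best_total = candidate, total
    return best


# Made-up scores for a two-contribution exchange.
scores = [
    {"request": 0.4, "response": 0.5, "other": 0.1},
    {"request": 0.2, "response": 0.7, "other": 0.1},
]
unconstrained = decode(scores, constrained=False)
constrained = decode(scores, constrained=True)
```

In this toy case the unconstrained decoder labels both contributions as responses, which the constraint rules out, while the constrained decoder recovers a request followed by a response. An ILP formulation expresses the same objective and constraints declaratively, which avoids enumerating every candidate sequence by hand.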

Another example is modeling speech style accommodation using unsupervised Dynamic Bayesian Networks (Jain et al., 2012), work done in collaboration with Bhiksha Raj. When stylistic shifts are confined to specific linguistic features, measuring the extent of stylistic accommodation is simple: a speaker’s style can be represented in a one- or two-dimensional space, and movement within this space can be measured precisely using simple linear functions. However, the rich sociolinguistic literature on speech style accommodation highlights a much greater variety of speech style characteristics that may be associated with social status within an interaction and may thus be worth monitoring for stylistic shifts. Unfortunately, within any given context, the linguistic features that have these status associations are only a small subset of the linguistic features in use. Furthermore, which features carry this status-related indexicality is specific to a context. Thus, separating the socially meaningful variation from variation in linguistic features occurring for other reasons is difficult using a discriminative approach. In this case, the theory is agnostic to the features that should be used; instead it informs the structure of the model itself, which is then able to identify the important structure in the speech data without further supervision. The hypothesis that drives this technical work is that stylistic shifts resulting from social processes are likely to display some consistency over time, and that a model whose structure leverages this insight can measure this important social process in an unsupervised way. We do this by including the concept of an accommodation state in the model, which embodies the idea that the effect of one speaker’s style on another speaker’s style is regulated by the extent to which accommodation is happening throughout the interaction. Our evaluation demonstrated that including the novel accommodation states within the model had a significant positive effect on the ability of the model to detect stylistic accommodation.
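For contrast with the unsupervised model, the simple case mentioned above, in which style is a single known feature and movement is measured with a linear function, might be sketched as follows. This is not the Dynamic Bayesian Network from the cited work; the function name, the half-and-half split of the interaction, and the example feature (speaking rate) are all assumptions chosen for illustration.

```python
from statistics import mean


def convergence(speaker_a, speaker_b):
    """Reduction in the gap between two speakers' mean feature value
    from the first half of the interaction to the second half.

    Each argument is a list of per-turn values of one style feature
    (for example speaking rate), with at least two values per speaker.
    Positive results mean the speakers moved closer together.
    """
    def gap(a, b):
        return abs(mean(a) - mean(b))

    half_a, half_b = len(speaker_a) // 2, len(speaker_b) // 2
    early = gap(speaker_a[:half_a], speaker_b[:half_b])
    late = gap(speaker_a[half_a:], speaker_b[half_b:])
    return early - late


# Made-up per-turn values: the speakers start 2.0 apart and end aligned.
shift = convergence([4.0, 4.0, 5.0, 5.0], [6.0, 6.0, 5.0, 5.0])
```

A measure like this only works when the socially meaningful feature is known in advance, which is exactly the assumption that does not hold in the general setting described above.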

The work on speech style accommodation contributes to a series of papers on computationally modeling Transactivity in chat (Joshi & Rosé, 2007), newsgroup-style interactions (Rosé et al., 2008), transcribed whole-classroom discussions (Ai et al., 2010), and face-to-face conversations using raw speech (Gweon et al., 2012). The concept of Transactivity grows out of a Piagetian theory of learning in which this conversational behavior is said to reflect a balance of perceived power within an interaction. Earlier research on speech style accommodation suggests that it should be possible to find evidence of power differentials, as well as adjustments in these differentials, through shifts in language usage patterns. It can be expected, then, that linguistic accommodation would predict the occurrence of Transactivity, and therefore that a representation of language capturing evidence of such usage shifts should be useful for predicting it. This hypothesis has been confirmed through a demonstration that speech style accommodation, as measured by the unsupervised model of Jain et al. (2012), has a significant positive correlation with the prevalence of Transactive contributions in debates between undergraduate students discussing reasons for the fall of the Ottoman Empire (R = .4) (Gweon et al., 2012). Consistent with this work, across a variety of efforts to automatically identify Transactive conversational contributions in different forms of conversational data, the most successful have been those that included a feature representing language similarity (Rosé et al., 2008; Ai et al., 2010).
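One simple way a language similarity feature could be computed is cosine similarity between the word counts of a contribution and the contribution before it. The sketch below is an assumption for illustration only: the cited papers may define similarity differently, and the whitespace tokenization and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words counts of two texts.
    Tokenization is a plain lowercase whitespace split (an assumption)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm = sqrt(sum(c * c for c in counts_a.values())) * sqrt(
        sum(c * c for c in counts_b.values())
    )
    return dot / norm if norm else 0.0


def similarity_to_previous(contributions):
    """One feature value per contribution: its similarity to the
    immediately preceding contribution, with 0.0 for the first."""
    if not contributions:
        return []
    features = [0.0]
    for previous, current in zip(contributions, contributions[1:]):
        features.append(cosine_similarity(previous, current))
    return features
```

The resulting value could be appended to whatever other features a classifier of Transactive contributions uses; a higher value indicates that a speaker is reusing the partner's wording.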

Lexical accommodation is also an important language process to consider in computational modeling of perspective-based lexical selection in text (Nguyen, Mayfield, & Rosé, 2010). In this work, which analyzes contributions to a politics discussion forum where participants self-identify as left-affiliated or right-affiliated, we construct a measure of political polarization of word usage. By analyzing language usage patterns in replies and how they shift depending on the affiliation of the author of the initiating post, we are able to identify accommodation strategies that show intentional avoidance of appearing to concede to the alternate viewpoint while maintaining coherence within the interaction.
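The source does not spell out how the polarization measure is defined. As one plausible stand-in, a smoothed log-odds ratio of a word's relative frequency in left-affiliated versus right-affiliated posts could be computed as below. The function name, the additive smoothing, and the sign convention are assumptions, not the measure from the cited work.

```python
from collections import Counter
from math import log


def polarization(left_tokens, right_tokens, smoothing=1.0):
    """Smoothed log-odds of each word's relative frequency in left
    versus right posts. Positive scores lean left, negative lean right.
    smoothing must be greater than zero to avoid taking log(0)."""
    left, right = Counter(left_tokens), Counter(right_tokens)
    vocab = set(left) | set(right)
    left_total = sum(left.values()) + smoothing * len(vocab)
    right_total = sum(right.values()) + smoothing * len(vocab)
    return {
        word: log((left[word] + smoothing) / left_total)
        - log((right[word] + smoothing) / right_total)
        for word in vocab
    }
```

With per-word scores of this kind, a reply can be characterized by the average score of its words, and shifts in that average can be compared across replies to left-affiliated and right-affiliated initiating posts.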

Not all of my work on the design of effective text representations for classification has been so closely tied to theories from sociolinguistics. Earlier work was motivated by specific problems with over-fitting that arise from the inherently non-IID nature of social interaction data, which can also be addressed using multi-domain learning techniques (Joshi et al., 2012). The principle motivating this earlier work was to strike a balance between informativity and generalizability. Part-of-Speech (POS) ngrams, for example, are able to approximate syntactic structure and style without modeling them directly. In an attempt to capture syntactic structure more faithfully, some of my earlier experimentation with syntactic dependency features for sentiment analysis showed promise (Joshi & Rosé, 2009; Arora, Joshi, & Rosé, 2009). One direction that has succeeded in exceeding the representational power and performance of POS bigrams with only a very modest increase in feature space size is a genetic programming based approach that learns to build a strategic set of rich features, so that the benefits of rich features can be obtained without the cost of feature space expansion. Successful experiments with this technique have been conducted in sentiment analysis, with terminal symbols including unigrams in one case (Mayfield & Rosé, 2010) and graph features extracted from dependency parses in another (Arora et al., 2010). A final direction has been even more successful in practice: template-based features called stretchy patterns (Gianfortoni, Adamson, & Rosé, 2011). Like POS ngrams, stretchy patterns are a flat representation. Like the backoff version of dependency features, their symbols represent sets of words, which may be POS tags, learned word classes, distribution-based word classes (such as high frequency words or low frequency words), or individual words. Stretchy patterns have yielded significant improvements for gender recognition in blog data. Even greater success has been achieved with stretchy patterns in more recent work using them to learn extraction patterns for cancer events in discussion forum data (Wen et al., 2012).
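The two ingredients named above, flat ngrams and symbols that stand for sets of words, can be illustrated with a minimal sketch. It is not an implementation of stretchy patterns as published; the function names, the `keep_words` parameter, and the hand-supplied POS tags are assumptions, and no tagger is invoked.

```python
def ngrams(symbols, n=2):
    """All contiguous n-grams over a flat sequence of symbols."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def back_off(tagged_tokens, keep_words=frozenset()):
    """Map each (word, tag) pair to a symbol: the lowercased word if it
    is in keep_words, otherwise its POS tag, which stands for the whole
    set of words sharing that tag."""
    return [
        word.lower() if word.lower() in keep_words else tag
        for word, tag in tagged_tokens
    ]


# Tags are supplied by hand for illustration (Penn Treebank style).
tagged = [("I", "PRP"), ("really", "RB"), ("love", "VBP"), ("it", "PRP")]

pos_bigrams = ngrams(back_off(tagged))
mixed_bigrams = ngrams(back_off(tagged, keep_words={"really"}))
```

Keeping a few informative words while backing off the rest to classes is one way to trade informativity against generalizability: the feature space stays close to that of POS bigrams, yet selected lexical cues remain visible to the classifier.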

Carolyn Penstein Rose (cprose@cs.cmu.edu)/ Carnegie Mellon University