Malouf and Mullen, WICOW 2007

From ScribbleWiki: Analysis of Social Media

This page maintained by: Mahesh Joshi

Graph-based User Classification for Informal Online Political Discourse

Rob Malouf and Tony Mullen

Summary

The paper describes the task of identifying the political affiliation of the users of an online discussion board.

In this paper, a follow-up to their previous work (Mullen and Malouf, AAAI 2006), the authors experiment with two new approaches to classifying users according to a “left” vs. “right” taxonomy of political affiliation.

The first approach applies the idea of the semantic orientation of a phrase, from [Turney, ACL 2001], to perform unsupervised classification of users into the “left” and “right” categories. By analogy with the semantic orientation of a phrase toward positive or negative sentiment, Malouf and Mullen define the political orientation of a phrase as follows:

SO(w) = PMI(w, liberal) - PMI(w, conservative)

where,

<math>PMI(x, y) = \log \frac{P(x,y)}{P(x)P(y)}\,\!</math>

To estimate the PMI scores of phrases, the authors use a subset of the Reuters corpus consisting of 200 million words. Although the highest-ranking phrases for the “left” and “right” classes seem to make sense, this approach obtains an accuracy of only around 41%, which is worse than even a majority-class baseline that assigns the “left” category to all users and achieves 52% accuracy. The authors do not elaborate on the possible causes, but they do acknowledge that the corpus they used for estimating political orientation may not be large enough. The paper therefore does not conclusively establish that the semantic orientation approach fails for this task.
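The political-orientation score defined above can be sketched from raw corpus counts. This is a minimal illustration, not the authors' implementation: the function names, the base-2 logarithm, the small `smoothing` constant that avoids taking the log of zero, and the counts in the usage lines are all assumptions made for the example.

```python
import math


def pmi(joint, count_x, count_y, total, smoothing=0.01):
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), estimated from counts.

    `smoothing` is an assumed constant added to the joint count so that
    a phrase that never co-occurs with y does not produce log(0).
    """
    p_xy = (joint + smoothing) / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def political_orientation(near_liberal, near_conservative, count_w,
                          count_liberal, count_conservative, total):
    """SO(w) = PMI(w, liberal) - PMI(w, conservative)."""
    return (pmi(near_liberal, count_w, count_liberal, total)
            - pmi(near_conservative, count_w, count_conservative, total))


# Made-up counts: the phrase co-occurs with "liberal" four times as often
# as with "conservative", and both anchor words are equally frequent.
score = political_orientation(
    near_liberal=40, near_conservative=10, count_w=100,
    count_liberal=1000, count_conservative=1000, total=1_000_000,
)
print("leans left" if score > 0 else "leans right")
```

Under this definition a positive score means the phrase is more strongly associated with "liberal" than with "conservative", and a negative score means the reverse. Because the estimate depends on co-occurrence counts, rare phrases in a small corpus yield noisy scores.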

The second approach that the authors propose utilizes the citation, or quoting, behavior of the users. The hypothesis is that users whose quoting behavior is similar will have the same affiliation. This is based on an initial analysis showing that users, more often than not, quote others with the opposite affiliation, so two users who quote the same people are likely to be on the same side. The key steps in this approach are as follows:

  1. Create an adjacency matrix that records the number of times each user quotes each other user
  2. Apply singular value decomposition to this adjacency matrix to get a low rank approximation
    • The exact number of reduced dimensions is not reported
  3. Cluster the users in this low rank space
    • The details of the clustering algorithm used, the number of clusters formed, and how that number was chosen are not reported
  4. Combine all the postings of the users of a given cluster
  5. Run the naïve Bayes classifier on this modified set of documents
  6. Assign the predicted political affiliation for the combined document to all the users in the cluster
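Steps 1 to 3 can be sketched with NumPy as follows. Since the clustering algorithm, the number of dimensions, and the number of clusters are not reported, this sketch assumes plain k-means with two clusters in a rank-2 space; the function names and the toy quote matrix are hypothetical.

```python
import numpy as np


def embed_users(quote_counts, rank):
    """Map each user (a row of the quote matrix) into a rank-dimensional space."""
    matrix = np.asarray(quote_counts, dtype=float)
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Rows of U scaled by the singular values are the low-rank user coordinates.
    return u[:, :rank] * s[:rank]


def kmeans(points, k, iterations=50):
    """Plain k-means (an assumed choice) with farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(iterations):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy data: entry [i][j] is how often user i quotes user j.
# Users 0 and 1 mostly quote users 2 and 3, and vice versa.
quotes = [
    [0, 0, 5, 4],
    [0, 0, 4, 5],
    [5, 4, 0, 0],
    [4, 5, 0, 0],
]
clusters = kmeans(embed_users(quotes, rank=2), k=2)
print(clusters)
```

In this toy matrix, users 0 and 1 have near-identical quoting rows and end up in one cluster, while users 2 and 3 end up in the other, which is the behavior the hypothesis relies on.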

This approach essentially constrains the classifier to assign the same class to every user in a group, and it improves on the simple text-categorization-style naïve Bayes baseline reported in the previous paper (Mullen and Malouf, AAAI 2006). The accuracy obtained is around 68%. It is good to see this improvement from a simple constraint that takes the conversational nature of the data into account; further use of that structure seems promising.
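The effect of the constraint in steps 4 to 6 can be illustrated with a small sketch that pools posts per cluster, classifies each pooled document, and copies the label to every member. It assumes a hand-rolled multinomial naïve Bayes with add-one smoothing and whitespace tokenization; the training documents, user names, and cluster assignments are invented for the example and are not from the paper.

```python
import math
from collections import Counter, defaultdict


def pool_posts(user_posts, user_cluster):
    """Merge the tokens of all posts written by users in the same cluster."""
    pooled = defaultdict(list)
    for user, posts in user_posts.items():
        tokens = pooled[user_cluster[user]]
        for post in posts:
            tokens.extend(post.lower().split())
    return dict(pooled)


def train_naive_bayes(labeled_docs):
    """Collect per-class word counts from (tokens, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(tokens, model):
    """Return the class with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def label_users(user_posts, user_cluster, model):
    """Classify each pooled cluster document and copy the label to its users."""
    pooled = pool_posts(user_posts, user_cluster)
    cluster_label = {c: classify(toks, model) for c, toks in pooled.items()}
    return {user: cluster_label[user_cluster[user]] for user in user_posts}


# Invented training documents and users, for illustration only.
model = train_naive_bayes([
    ("tax cuts freedom market".split(), "right"),
    ("healthcare union equality".split(), "left"),
])
user_posts = {
    "ann": ["We need healthcare reform"],
    "bob": ["equality and union rights"],
    "cat": ["tax cuts now"],
    "dan": ["ok"],
}
user_cluster = {"ann": 0, "bob": 0, "cat": 1, "dan": 1}
labels = label_users(user_posts, user_cluster, model)
print(labels)
```

Note that the hypothetical user "dan" writes nothing informative, yet receives the label of the cluster document. That is the sense in which the clustering acts as a constraint on the text classifier.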
