Zinman and Donath, CEAS 2007
Is Britney Spears Spam?
Authors: Aaron Zinman and Judith Donath
Paper: Is Britney Spears Spam?
The paper proposes a "conceptual scaffold" for spam filtering in social networking services (SNS) like MySpace and Facebook. Spam in SNS differs from traditional email or comment spam in that users have very different preferences about what they consider spam (e.g. some people want to subscribe to Britney Spears's PR news on MySpace), and some profiles are ambiguous. Additionally, many SNS users receive hundreds of friend requests, many of which carry very little content. Because it is too difficult for a computer to understand users' subjective preferences, Zinman and Donath's method is designed to make profile and network features clearer, helping the user decide in ambiguous cases.
The long-term goal of the research is to build a "people-oriented reasoning AI engine" that matches users' mental models of who to friend (e.g. that guy central to the punk rock scene, or someone who shares media similar to what I share). In the near term, the researchers present lower-level feature bundles (e.g. someone who sends more movie clips than he receives, or someone with little public information). These near-term goals are reduced even further in this experiment, to two measures: sociability and promotionalism.
The authors point out that network approaches traditionally used for spam classification don't work as well in SNS because:
- Trust in "friends of friends" changes over time and in different contexts, so it can't be confidently evaluated several hops away.
- Network clustering components work for classic spam, but not for borderline cases (like undesirable friends posting political spam). Clustering components don't adequately match the user's mental model.
A set of 800 MySpace profiles and their top friends were harvested and hand-scored on 5-point scales for sociability (s):
- # of personal comments
- customized graphics
- other "normal social activity"
and promotionality (p):
- amount of material meant to influence others
Half of the profiles had p > 1; the other half had p = 1. The authors expected users to fall into one of four quadrants:
| ||low sociability||high sociability|
|low promotion||New members (no info)||Normal users|
|high promotion||Spammers, PR agents||Bands, small labels|
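The quadrant table above can be read as a simple lookup on the two hand-scored values. The following is a minimal sketch, not the authors' method: the function name `quadrant` and both cutoffs are assumptions made for illustration. The promotion cutoff borrows the p = 1 versus p > 1 split mentioned above, and the sociability cutoff is an assumed midpoint of the 5-point scale, which the summary does not specify.

```python
def quadrant(s: int, p: int, s_cut: int = 3, p_cut: int = 1) -> str:
    """Map hand-scored sociability (s) and promotion (p) to a quadrant label.

    Both scores are expected on the 1-5 scale. The cutoffs are hypothetical:
    s >= s_cut counts as high sociability, p > p_cut as high promotion.
    """
    if not (1 <= s <= 5 and 1 <= p <= 5):
        raise ValueError("scores must be on the 1-5 scale")

    high_s = s >= s_cut  # assumed cutoff, not stated in the source
    high_p = p > p_cut   # assumed cutoff, echoing the p = 1 / p > 1 split

    # Labels are taken from the quadrant table.
    labels = {
        (False, False): "New members (no info)",
        (True, False): "Normal users",
        (False, True): "Spammers, PR agents",
        (True, True): "Bands, small labels",
    }
    return labels[(high_s, high_p)]
```

For example, under these assumed cutoffs a profile scored s = 4, p = 3 would land in the "Bands, small labels" quadrant. In the paper's setting the scores themselves are predicted by a classifier, so a lookup like this would only be the final, interpretive step.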
They used a set of 40 features to classify sociability and promotionality. Features were:
- Network based
- Profile based
The best model (90% accuracy) used both network and profile features, although it performed only marginally better than a profile-only model. The authors suggest, however, that profile features are more easily faked in the spam arms race; network features, being more expensive to fake (in time and effort), will become more important in these models in the future.