From ScribbleWiki: Analysis of Social Media


Varieties of social media spam

Social media spam is the use of any electronic communication platform to disseminate unsolicited commercial messages. The earliest and most prevalent form of spam is email spam, but with the growth of social media platforms such as MySpace, blogs, and discussion groups, unsolicited messages now appear in many other forms.

Goals of social media spam

Generally, social media spam isn't intended to fool human readers; it is simply trying to increase a site's PageRank, a practice also known as "spamdexing" (Mishne et al., WWW 2005). Also, because blog search engines rank results by recency, whereas regular search engines rank by relevance, blog spam that is updated frequently is likely to appear prominently in search results.
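The effect of recency ranking can be illustrated with a toy example. The sketch below is only an assumption-laden illustration: the `Post` class, the relevance scores, and the post ages are all hypothetical values invented for the example, not data from any real blog search engine.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    relevance: float  # hypothetical query-match score in [0, 1]
    age_hours: float  # hours since the post was last updated


# Toy data (invented): a spam blog that republishes constantly,
# next to two genuine but older posts.
posts = [
    Post("In-depth camera review", relevance=0.9, age_hours=72.0),
    Post("Camera buying guide", relevance=0.7, age_hours=30.0),
    Post("CHEAP CAMERAS best deals", relevance=0.2, age_hours=0.5),
]


def rank_by_relevance(items):
    """Most relevant first, as a regular search engine would order results."""
    return sorted(items, key=lambda p: p.relevance, reverse=True)


def rank_by_recency(items):
    """Most recently updated first, as a blog search engine would."""
    return sorted(items, key=lambda p: p.age_hours)


print([p.title for p in rank_by_relevance(posts)])
print([p.title for p in rank_by_recency(posts)])
```

Under relevance ranking the spam post comes last; under recency ranking it comes first, simply because it was updated most recently.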

Social media phishing, unlike strict spamming, is intended to fool users, such as those on Facebook and MySpace, where it is relatively easy to identify users in a target demographic. One form of phishing uses fake social network profiles claiming to raise money for charity, while others use dubious profiles to "friend" users and then post advertisements on those users' walls (Zinman and Donath, CEAS 2007).

Approaches to spam control

Generally, approaches to social media spam control fall into two categories: human-moderated and automatic. Human-moderated techniques require time-consuming reading, introducing a publishing bottleneck disliked by many bloggers. Automatic techniques are generally based on keywords or blacklists.

Human-moderated techniques:

  • Human moderation of comments
  • Distributed human moderation (e.g. Wikipedia)
  • Posting by people in your friend network (e.g. Facebook)

Automatic techniques:

  • Captchas: simple Turing tests designed to be difficult for spam bots to solve
  • Content-based filters (keywords)
  • Network-based filters (network shape or poster reputation)
  • Whitelists or blacklists
  • Preventing HTML in the comments
  • Throttling comment rate
  • Link markup to prevent search engines from trusting links (e.g. rel="nofollow")
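Several of the automatic techniques above can be combined in a single comment pipeline. The following is a minimal sketch, not a production filter: the keyword set, the blacklisted domain, the rate limits, and all function names (`is_throttled`, `looks_like_spam`, `render`, `submit`) are assumptions made for illustration, and the URL matching is deliberately naive.

```python
import html
import re
import time
from collections import defaultdict, deque

# All values below are illustrative assumptions, not recommended settings.
BLACKLISTED_DOMAINS = {"spam.example"}
SPAM_KEYWORDS = {"viagra", "casino"}
MAX_COMMENTS = 3        # comments allowed per poster ...
WINDOW_SECONDS = 60.0   # ... within this sliding window

# Naive URL pattern: group 1 captures the host part of the URL.
URL_RE = re.compile(r"https?://([^/\s]+)[^\s]*")
_recent = defaultdict(deque)  # poster id -> timestamps of accepted comments


def is_throttled(poster, now):
    """Throttling: True if the poster already has MAX_COMMENTS in the window."""
    stamps = _recent[poster]
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    return len(stamps) >= MAX_COMMENTS


def looks_like_spam(text):
    """Content-based keyword check plus a domain blacklist check."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if words & SPAM_KEYWORDS:
        return True
    domains = {m.group(1).lower() for m in URL_RE.finditer(text)}
    return bool(domains & BLACKLISTED_DOMAINS)


def render(text):
    """Escape all HTML, then turn bare URLs into rel="nofollow" links."""
    escaped = html.escape(text, quote=True)
    return URL_RE.sub(
        lambda m: '<a href="{0}" rel="nofollow">{0}</a>'.format(m.group(0)),
        escaped,
    )


def submit(poster, text, now=None):
    """Return rendered HTML for an accepted comment, or None if rejected."""
    now = time.time() if now is None else now
    if is_throttled(poster, now) or looks_like_spam(text):
        return None
    _recent[poster].append(now)
    return render(text)
```

For example, `submit("alice", "<b>hi</b> see http://ok.example/page")` returns the comment with the markup escaped and the URL wrapped in a `rel="nofollow"` link, while a comment containing a blacklisted domain or keyword, or a fourth comment from the same poster within a minute, is rejected with `None`.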


Towards Spam Detection at Ping Servers, Kolari et al., ICWSM 2007

Detecting Spam Blogs: A Machine Learning Approach, Kolari et al., AAAI 2006

Blocking Blog Spam with Language Model Disagreement, Mishne et al., WWW 2005

Is Britney Spears Spam?, Zinman and Donath, CEAS 2007

Web Spam Taxonomy, Gyongyi and Garcia-Molina, 2004

Leveraging Social Networks to Fight Spam, Boykin and Roychowdhury, 2005

Taking TrackBack Back (from Spam), Gerecht et al., 2005

Blog Track Open Task: Spam Blog Classification, Kolari et al., TREC 2006

BlogVox: Separating Blog Wheat from Blog Chaff, Java et al., IJCAI 2006

SVMs for the Blogosphere: Blog Identification and Splog Detection, Kolari, AAAI 2006

Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics, Lin et al., AIRWeb 2007

Splog Detection Using Content, Time and Link Structures, Lin et al., ICME 2007

A Quantitative Study of Forum Spamming Using Context-based Analysis, Niu et al., 2007

Weblog Classification for Fast Splog Filtering: A URL Language Model Segmentation Approach, Salvetti and Nicolov, HLTC 2006

Relaxed Online SVMs for Spam Filtering, Sculley and Wachman, SIGIR 2007

A Learning Approach to Spam Detection Based on Social Networks, Lam and Yeung, CEAS 2007

Characterizing the Splogosphere, Kolari et al., WWE 2006

Detecting Spam Web Pages through Content Analysis, Ntoulas et al., WWW 2006

Learning Fast Classifiers for Image Spam, Dredze et al., CEAS 2007

Filtering Image Spam with Near-Duplicate Detection, Wang et al., CEAS 2007

Asymmetric Gradient Boosting with Application to Spam Filtering, He et al., CEAS 2007

SpamRank - Fully Automatic Link Spam Detection, Benczur et al., 2005

Combating Web Spam with TrustRank, Gyongyi et al., VLDB 2004

Link Spam Detection Based on Mass Estimation, Gyongyi et al., VLDB 2006

A Large-Scale Study of Link Spam Detection by Graph Algorithms, Saito et al., AIRWeb 2007

Detecting Link Spam Using Temporal Information, Shen et al., ICDM 2006

Additional Resources