Leskovec et al., SDM 2007
From ScribbleWiki: Analysis of Social Media
Cascading Behavior in Large Blog Graphs
In many ways, this paper extends the authors’ previous work on viral marketing (Leskovec et al., PAKDD 2006) to the blogosphere. Here, the blogosphere is the collective term for all blogs and posts linked together. Within the blogosphere, posts and the links between them form a post network; blogs and the weighted edges between them (obtained by collapsing all post-level links between each pair of blogs) form a blog network; and posts that are linked in the direction of information propagation form a directed subgraph (a cascade) within the post network.
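The collapsing step that turns the post network into the blog network can be sketched in a few lines. This is only an illustration on made-up data: the names `post_to_blog`, `post_links`, and `collapse_to_blog_network` are hypothetical, and whether links between two posts of the same blog are kept as self-loops is an assumption exposed here as a flag, not something the summary specifies.

```python
from collections import Counter

# Hypothetical toy data: post id -> blog that published it.
post_to_blog = {"p1": "A", "p2": "B", "p3": "B", "p4": "C"}

# Directed post-level links: (linking post, linked-to post).
post_links = [("p2", "p1"), ("p3", "p1"), ("p4", "p2"), ("p3", "p2")]


def collapse_to_blog_network(post_links, post_to_blog, keep_self_links=True):
    """Collapse post-to-post links into weighted blog-to-blog edges.

    The weight of edge (blog_u, blog_v) is the number of links from
    posts of blog_u to posts of blog_v.
    """
    weights = Counter()
    for src_post, dst_post in post_links:
        src_blog = post_to_blog[src_post]
        dst_blog = post_to_blog[dst_post]
        # Assumption: links within one blog are optional self-loops.
        if src_blog == dst_blog and not keep_self_links:
            continue
        weights[(src_blog, dst_blog)] += 1
    return dict(weights)


blog_edges = collapse_to_blog_network(post_links, post_to_blog)
print(blog_edges)
```

On this toy input the two links from blog B's posts to blog A's post collapse into a single edge (B, A) of weight 2.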
The goal of the paper is to explore the properties and dynamics of these networks and sub-networks, and thereby understand the underlying blogosphere. Specifically, they try to answer three groups of questions:
-- Temporal questions: How does popularity die off? Is there burstiness/periodicity?
-- Topological questions: What topological patterns do posts and blogs follow? What are the characteristics (size, shape, etc.) of a cascade?
-- Generative model: Can we build a model that generates realistic cascades?
The study covers 45,000 blogs that participate in cascades, together with all of their posts for 3 months (Aug-Sept '05). In total there are 2.4 million posts with about 5 million links between them.
As to the temporal questions, they found a periodic weekend drop in both the number of posts and the number of links. Post popularity drop-off follows a power law with an exponent of -1.5, the same value found in other work on the bursty nature of human behavior (Barabási, The origin of bursts and heavy tails in human dynamics).
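To make the power-law claim concrete, the sketch below estimates an exponent from popularity-over-time data by fitting a straight line in log-log space. The data are synthetic (generated with exponent -1.5 on purpose), the function name `fit_power_law_exponent` is hypothetical, and log-log least squares is a simple stand-in that is not necessarily the fitting procedure used in the paper.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Return the least-squares slope of log(y) against log(x).

    If y is proportional to x**a, the slope is an estimate of a.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var


# Synthetic data (an assumption for illustration): links received
# d days after posting, decaying as d ** -1.5.
days = list(range(1, 31))
links = [1000.0 * d ** -1.5 for d in days]

exponent = fit_power_law_exponent(days, links)
print(round(exponent, 3))
```

On real, noisy counts this kind of regression is only a rough estimator, so the recovered slope should be read as indicative rather than exact.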
The main topological observation is that (heavy-tailed) power laws are everywhere: in the in-degree and out-degree distributions of the blog network, in the in-degree and out-degree distributions of the post network (though the post network is more sparsely connected than the blog network), in the size distribution of cascades (in general and at different levels), etc. They also find that cascades are mostly tree-like, and that blog cascades tend to be larger than viral marketing cascades.
Finally, they model cascade generation as the spread of an epidemic, using a simple virus-propagation model of the SIS (susceptible-infected-susceptible) type. Simulations show that this simple model can generate cascades that match multiple properties of realistic cascades, such as in-degree and cascade size. However, the process only generates cascades that are trees, so every non-root node in a generated cascade has an out-degree of 1.
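An SIS-style cascade generator of the kind described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' exact procedure: the toy graph, the infection probability `beta`, the `max_steps` guard, and the rule that a blog infected by several neighbors in the same step keeps only the first infector (which is what forces a tree) are all choices made here for illustration.

```python
import random


def generate_cascade(in_neighbors, beta, rng, max_steps=50):
    """Simulate one outbreak and return (root, edges) of the cascade.

    in_neighbors maps each blog to the blogs that can be infected by it.
    Cascade nodes are (blog, step) pairs, so a blog that recovers and is
    infected again later shows up as a separate node.
    """
    root = (rng.choice(sorted(in_neighbors)), 0)
    edges = []  # (newly infected node, node that infected it)
    infected = [root]
    for step in range(1, max_steps + 1):
        newly = {}  # blog -> infector; one infector per blog keeps a tree
        for blog, t in infected:
            for neighbor in in_neighbors[blog]:
                if neighbor not in newly and rng.random() < beta:
                    newly[neighbor] = (blog, t)
        if not newly:
            break
        for neighbor, parent in newly.items():
            edges.append(((neighbor, step), parent))
        # SIS: the previous wave recovers and is susceptible again.
        infected = [(blog, step) for blog in newly]
    return root, edges


# Hypothetical toy blog graph; every blog must appear as a key.
toy_graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}

root, edges = generate_cascade(toy_graph, beta=0.5, rng=random.Random(0))
children = [child for child, _ in edges]
print(root, len(edges))
```

Because each infected node records exactly one infector, every node other than the root has a single outgoing edge, which is the tree restriction noted above.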
The empirical observations in the paper do not seem particularly useful on their own. But based on these findings, more realistic models of the social network (the blogosphere in this case) may be constructed to help solve other, more practical problems. One example is the same authors' subsequent work on cost-effective outbreak detection in networks (Leskovec et al., KDD 2007).