Kleinberg et al, KDD 2003
From ScribbleWiki: Analysis of Social Media
Maximizing the spread of influence through a social network
In this paper, the question inverstigated by the authors is one at the core of every marketing campaign: given a certain budget, and a potential network of consumers, and the goal is to maximize the adoption of a new product through the network; what set of individuals should we focus the budget on?
A key to answering the question is the understanding of the underlying network and how influence gets spread through the network. In the paper, two basic diffusion models are discussed: linear threshold and independent cascade. Both models represent the social network as a directed graph, with nodes as individuals, and weighted edges as influence from one to another. Nodes start either active or inactive, and an active node may trigger activation of neighboring nodes (but active nodes never deactivate), so influence gets spread.
Within the framework of these two models, the problem becomes how to select the most influential set of k nodes, given a budget k, so as to maximize the expected number of active nodes at the end of the process. Unfortunately, the authors demonstrated that the objective function for both models are submodular functions(if a function f maps a finite ground set U to non-negative real numbers, and satisfies a natural "diminishing returns" property, then f is a submodular function). It means that the optimization problem of selecting k elements is NP-hard, according to a proven theorem. On the other hand, the same "diminishing returns" property lends the problem to a simple greedy hill-climbing algorithm, which can obtains a solution that is provably within 63% of optimal. Besides its provable guarantees, the paper shows that their greedy algorithm significantly out-performs other widely used heuristics including degree centrality and distance centrality) in computational experiments.
The paper also describes generalizations of the basic models in various aspects: a general framework that has equivalent formulations in terms of thresholds and cascades; a model in which nodes CAN switch in both directions rather only from inactivity to activity; a more realistic scenario in which different marketing actions are available, and each of them only affect some subset of nodes probabilistically rather than deterministically. Finally, the paper shows that all of these generalized models satisfy the nice submodularity property, and hence can all be solved with the proposed greedy algorithm and with the same performance guarantee.
The contribution of the paper is that it (approximately) solved a computationally intractable problem with a neat and mathematically sound greedy heuristics, by exploiting a subtle "diminishing returns" property. This idea, as well as some of the richer models proposed by the paper, subsequently proved useful to researchers modeling the spread of influence in other domains, such as the spread of news stories through the links among weblogs. It also won the paper "Best Paper" award for KDD 2003.