Kleinberg, STOC 2000
From ScribbleWiki: Analysis of Social Media
The Small-World Phenomenon: An Algorithmic Perspective
- Author: Kleinberg J.
- Conference: 32nd ACM Symposium on Theory of Computing, 2000.
- Link: http://www.cs.cornell.edu/home/kleinber/swn.pdf
- Maintainer: Sameer Badaskar
This paper gives an algorithmic perspective of the "Small World Phenomenon", observed first in the experiments of Milgram. The implication of Milgram's observations are that the diameter of real-life networks like social-networks is small. Here, it is assumed that the diameter of a network is defined as the average number of steps (or hops) required for a message to get transmitted between any two randomly selected nodes.
The work of Watts and Strogatz attempts to explain the small-world phenomenon by proposing a model in which every node in the network has a set of local connections and a few geographically long-range connections. When a message is being transmitted from a source node to a destination node, every node in the chain selects (with uniform probability) another node among its connections to pass on the message. While the work of Strogatz et. al. does capture the structure of a social network, Kleinberg argues that this model is inefficient. If N is the number of nodes in the network, the model of Strogatz results in the average number of transmission-steps being polynomial in N (<math> N^k </math>, where k is a real number <math> k > 0</math>). Morover, individuals in a social network make use of geographical cues to transmit the message to the target which the model of Strogatz et. al. does not account for. The contributions of this paper are:
- A more generalized extension of the network model of Strogatz et. al.
- Bounds on the average number of steps required to transmit a message between any two randomly selected nodes for special cases of the generalized model.
The model of Strogatz et. al. is generalized as follows
- It is assumed that the network of individuals are arranged in lattice that represents their geographical location.
- Every node X has a set of local connections.
- X also has a set of long-range contacts having a connection probability that is inversely proportional to the distance. In other words if Y is a long-range node, the probability of X sending a message to Y is P(X,Y) = A / d(X,Y)^R. Here, d(X,Y) is the distance, R is a characteristic of the network and A, the proportionality constant.
The average delivery time of a message between a source node and target node has been shown to be dependent on the network characteristic R. Specifically, this paper treats 3 cases
- R = 0 (Strogatz et.al. Model): The average transmission time is polynomial in N (number of nodes of the network).
- R = 2: The average time for transmitting a message is upper-bounded by polynomial in log(N) which explains the short-chains inherently present in a social-network.
- R > 2 and 0 < R < 2: The lower-bound of the average transmit time is shown to be polynomial in N.
Given a node X, let the distances between X and other N-1 nodes in the network be quantized into logarithmic distance-bins B_0 B_1 ... B_log(N). A node Y is present in distance-bin B_i if its distance from X is between 2^i and 2^(i+1).
The model in case 1 is inefficient because, it does not use geographical information. If R > 2, a node in the transmission chain prefers to pass on the message to nodes in distance-bins nearer to it thereby making "slow progress" towards the target node. On the other hand, if 0 < R < 2 a node prefers nodes in farther distance-bins for passing on the message which leads to an increased likelihood of "missing" or "over-shooting" the target node. However, when R = 2, all distance-bins are equally favored thereby enabling nodes in the chain to "home-in" progressively on the target node. Thus, for a two-dimensional lattice, R=2 is shown to be optimal network characteristic which results in short message chains on an average like in Milgram's experiments.
In general, for a K dimensional lattice, R=K is the optimal characteristic of network connectivity which can result in short (polynomial in log(N)) transmission chains.
The distance function used for analysis in this paper captures the geographical proximity between two nodes. The bounds for average transmission time are arrived at based on this distance measure. However, in real-life individuals use a variety of cues to guide message to the target. These cues (in addition to the geographical location) can be the target person's organization, community etc. For example, if S wants to transmit to T and if S has a friend Q who works in the same company as T, S has a better chance of reaching T by passing on the message to Q. The distance between between two persons in the same community may be "lesser" even though they may be situated geographically far apart. It will be interesting to see how different distance functions based on community closeness affect the bounds on the average transmission time. Further, every node in the network can be characterized by a feature vector which stores geographical location, organization-affiliation etc on which a suitable distance measure could be defined for analysis.