Huberman et. al., 2003
From ScribbleWiki: Analysis of Social Media
Information Dynamics in the Networked World
- Authors: Huberman B. A. and Adamic L. A.
- Journal: Lecture Notes in Physics, Springer, 2003.
- Link: http://www.hpl.hp.com/research/idl/papers/infodynamics/infodynamics.pdf
- Maintainer: Sameer Badaskar
Individuals in a social network are connected through various modalities, email being one of them. This paper explores different aspects of social-networks which are formed using email as a means of networking. The dynamics of information flow among networked individuals is examined experimentally from 3 perspectives:
- Detection of communities in an email network.
- Factors affecting the rate of spread (viral/non-viral) of information among a set of email-connected individuals.
- The small world phenomenon in communities
Detection of Communities
Communities typically represent a set of people who share common interests. For example a mailing list for a course consists of a set of individuals crediting the course. The mailing list represents a community where members can interact and share information. In this paper communities are extracted from an email-network by successively eliminating edges in the email network that have the highest betweenness. Individuals can be affiliated to multiple communities with varying strengths of affiliations. To account for multiple affiliations, the community detection is randomized using the following procedure
- For K iterations do
- Check if the connected subgraph of N nodes is a community. If the highest betweenness of the edge of the subgraph is less than N-1, the subgraph is output as a community.
- If the subgraph is not a community remove the edge of highest betweenness and keep the two subgraphs
- Repeat 1 and 2 till all the nodes have been placed into communities
For large sub-graphs, step 2 is expensive (O(n^4)) and only a randomly chosen subset of nodes is used to identify the edge with highest betweenness. Therefore, the whole procedure is repeated for K random iterations at the end of which, the communities are averaged into a final list of communities. From a list of 185,000 emails within the same organization, the authors obtained 60 communities. To evaluate the procedure, 16 individuals chosen from some communities were interviewed. All the individuals agreed that the communities they came from were correctly identified.
Spread of Information
Email networks are scale free in the sense that their structure obeys power-law distributions. While the spread of viruses in these networks can result in an epidemic, same may not be the case for spread of a piece of information (like a rumor). Individuals tend to pass on information to other interested individuals. This means that information is more likely to spread among individuals having similar characteristics. The authors analyze the email network of an organization for assessing the trends in the spread of information. Given a source node, a node which is several steps removed from the source node is less likely to have anything common with the source node. Therefore, the probability of transmission from the source node to a destination node decreases exponentially with the distance between them. This effectively limits the viral spread of information in the network. Experiments conducted on an actual email-network confirm with this analysis. Note: interested readers may directly refer to eqn (16) in the paper which gives a relation for the epidemic threshold.
Small World Phenomenon
To analyze the small world phenomenon the authors replicate Milgram's experiments in an actual email network within an organization. The three strategies for transmitting a message from a source to a destination are:
- Passing on the message to person with the most number of connections
- Passing the message to the person who is closest to the target in the organizational hierarchy
- Passing the message to the person who is geographically closest to the target person as in (Kleinberg, STOC 2000).
The 3 strategies were compared in terms of the average number of steps required for transmitting a message from a source to a target in the organization. Strategy 2. performed the best followed by Strategy 3. and Strategy 1. Strategy 1. was found to perform the worst (avg steps 40) perhaps due to the fact that most connected individuals could be thought of as "spammers" and ignored. The good performance of Strategy 2. shows that individuals use cues apart from geography in order to route the message in a chain. This point is discussed in the comments section on the annotation of (Kleinberg, STOC 2000).