# Graph-based analysis

### From ScribbleWiki: Analysis of Social Media

## Class meetings

The Sep 11 lecture of the Analysis of Social Media course included an overview of this area. Advanced topics were discussed on Sep 25 and Oct 2.

## Modeling social networks, a short history

While not necessarily meant as a model for social networks, Paul Erdos and Alfred Renyi developed a model for generating graphs (Erdos-Renyi random graph). It begins with a fixed number of nodes *n* and randomly adds edges under probability *p*. In (Erdos+1960) they proved several properties about the properties of these graphs over time. While E-R graphs have many differing properties from graphs found in real networks (such as degree distribution and clustering), they formed a basis for modeling real graphs.

The first wide study on studying social networks on a large scale was done by Jeffrey Travers and Stanley Milgram (the same Milgram responsible who headed the experiments on obedience to authority). They picked a stockbroker in the Boston area, and chose samples from phone books in Nebraska, Kansas, and Massachusetts. Each person in the sample was mailed a package asking them to forward the package to somebody they knew on a first-name ("handshake") basis who they think would be "closer" to the target stockbroker, based on number of handshakes. Of the approximately 20 percent of the letters that did reach the destination, the distribution of "handshakes" was bimodal, with modes at 5 and 7. This is where the idea of "six degrees of separation" came from. They published results in (Travers+1969).

Duncan Watts and Steven Strogatz designed a model (Watts+1998) which builds on random networks. Beginning with a regular lattice (each node is connected with *k* neighbors), choose a parameter *p*. For each node *i*, for each edge from node *i* to node *j*, with probability *p* "rewire" it, removing the edge from *i* to *j* and make new edge *i* *k* where *k* is a random node. As *p* approaches 1, it will become a random (E-R) graph. But for *p* between 0 and 1, there is something between a random graph and a completely "local" model.

Granovetter created a model for behavior that may be applied to social networks. He was able to find that there is often a *threshold* for creating a behavior. The idea of thresholds is prevalent in graph modeling, that often a certain parameter will generate very different graph properties on different sides of a threshold. Kempe, Kleinberg, and Tardos presented diffusion models for social network which present threshold properties. Granovetter also proposed that "weak ties' were in many ways more "important" in marketing of politics (Granovetter, 1973), which influenced later work in developing network models.

The idea of *scale-free* networks is accepted by many scientists, although there is still controversy (See Clauset+2007 for a compelling argument against). Under scale-free networks, degree distributions follow a power law. Barabasi and Albert proposed the *preferential attachment* model to generate such networks (Barabasi+1999). Under preferential attachment, a new node joins an existing network, linking to an existing node *j* with probability proportional to its degree. Clearly this will generate "rich-get-richer" behavior, where well-connected nodes gain more connections.

Several other models for web graphs have been proposed. Kleinberg et al proposed a model for graph formation based on stochastic processes (Klenberg+1999) and "copying" (that is, creating a node "j" identical to "i" in the set of neighbors, and then stochastically rewiring). Copying was also introduced in a model by Krapivsky and Redner. Kroenecker multiplication has also been used to generate self-similar networks, which can exhibit both scale-free and small-world properties (Leskovec+2007). Leskovec, Kleinberg, and Faloutsos also introduced two other potential models, *Community-guided attachment*, which is based on self-similarity arguments, and *forest fire*, which is based on diffusion of ties (Leskovec+2005). And of course, there are many other models which have not been addressed here.

- Albert-László Barabási & Réka Albert (October 1999). "Emergence of scaling in random networks". Science 286: 509–512.
- Erdős, P.; Rényi, A. (1960). "The Evolution of Random Graphs". Magyar Tud. Akad. Mat. Kutató Int. Közl. 5: 17-61.
- Granoveter, Mark "The Strength of Weak Ties"; American Journal of Sociology, Vol. 78, No. 6., May 1973, pp 1360-1380.
- Granovetter, Mark. "Threshold Models of Collective Behavior"; American Journal of Sociology, Vol. 83, No. 6, November 1978, pp 1420-1443.
- David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining, 2003.
- Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan and Andrew S. Tomkins. "The Web as a graph: measurements, models and methods" Proceedings of the 5th International Computing and combinatorics Conference,1999
- Krapivsky, P. L., and Redner, S., Network growth by copying, Phys. Rev. E 71 036118 (2005).
- Jure Leskovec, Jon Kleinberg, Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2007.
- Jure Leskovec, Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. International Conference on Machine Learning (ICML), 2007.
- Travers, Jeffrey & Stanley Milgram. 1969. "An Experimental Study of the Small World Problem." Sociometry, Vol. 32, No. 4, pp. 425-443.
- Watts, D. J., and S. H. Strogatz, "Collective dynamics of 'small-world' networks." 1998, Nature (London) 393, 440.

## Proximity on Graphs

Hanghang Tong will take care of the following sub-field in graph-based analysis