III:Small:Influence and Virus Propagation in Large Graphs - Theory and Algorithms

 
Christos Faloutsos Phone: (412)-268.1457
Department of Computer Science Fax : (412)-268.5576
Carnegie Mellon Univ. Email: christos@cs.cmu.edu
Pittsburgh, PA 15213 WWW page: http://www.cs.cmu.edu/~christos

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1017415. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1. GENERAL INFORMATION

1.1. Abstract

Link to NSF abstract

Given a graph, such as a social/computer network, or the blogosphere, how will a virus (or rumor or new product) propagate in it? Will it take over, creating a pandemic? How to select the 'k' best nodes/edges for immunization, or, conversely, the best 'k' blogs for the fastest dissemination of a new idea? The answer to such questions is vital, for public health, for network security, for market penetration, for blog monitoring and many more applications. In the PI's past work, they studied arbitrary graphs, and showed that propagation depends on a single number, namely, the first eigenvalue of the adjacency matrix of the network. Specifically, they studied the so-called 'epidemic threshold' for flu-like propagation (``SIS'' model = susceptible-infectious-susceptible), on un-directed, un-weighted static graphs. All earlier work focused on full cliques, or homogeneous graphs, or specific cases of power-law graphs - *all* of which are special cases of the PI's eigenvalue result. The major thrusts of the current proposal are two. The first is *theory*: For a mumps-like model (``SIR'' = susceptible - infected - recovered), and for additional models, when will a virus result in a pandemic? What can we say about weighted graphs? About time-evolving graphs, like an ad-hoc network of mobile phone users? The second thrust is on *algorithms*: Given a graph, a virus model (SIS, SIR, etc), and a fixed budget of 'k' nodes/edges to immunize, how can we quickly find an optimal or near-optimal solution, to best contain the virus? How can we modify the algorithm, when the network changes over time? The TECHNICAL MERIT of the work is that it is the first to focus on *arbitrary* graphs, thus including real ones. In contrast, the vast majority of past analytical work makes unrealistic assumptions about the graph topology (cliques, homogeneous graphs etc.). The BROADER IMPACT is high, as dynamics of large-scale graphs appear in numerous settings: cascades on blogs; product penetration and viral marketing; rumor/information propagation; immunization policies; advertisement policies etc.

1.2. Keywords

Data mining, Virus Propagation, Immunization, Viral Marketing.

1.3. Funding agency

2. PEOPLE INVOLVED

In addition to the PI, the following people work on the project.

3. RESEARCH

3.1. Project goals

There are two motivating questions, one for each proposed thrust: (a) Theory: Given the virus/influence property and the graph structure, what is the condition that an epidemic will break out (or a piece of information will survive)? (b) Algorithms: Given the virus/influence property and the graph structure, which k nodes/edges are the best to immunize so that we can prevent an epidemic from breaking out to the maximum extent? We give the highlights of each thrust next.
Theory: The first thrust tries to understand the epidemic threshold of virus propagation. In our past work, we discovered a general epidemic threshold for SIS model with un-directed static graphs. First of all, we propose to generalize it to (1) other types of viruses (e.g., SIR), and (2) more general graphs (e.g., directed graphs). Unlike the existing work on epidemic threshold, which are designed for some special graphs; in our proposed work, we aim to address the arbitrary, real graphs. What is more, we also want to study some important settings, which are barely (if at all) studied by the existing work. To name a few, we plan to study the epidemic threshold (1) for multiple competing viruses, (2) on dynamic graphs, etc.
Algorithms: The second thrust involves the development of immunization algorithms. In our on-going work, we developed a fast near-optimal immunization strategy for SIS model with un-directed, un-weighted static graphs. We propose to continue this work by generalizing it to (1) SIR virus propagation model; (2) directed weighted graphs. Unlike the existing immuniza- tion algorithms which are based on some heuristic and/or designed for some special graphs, the proposed immunization algorithms will focus on arbitrary, real graphs; and try to (ap- proximately) maximize the corresponding epidemic threshold. What is more, we also want to develop immunization algorithms under some important, yet barely (if at all) studied settings in the existing work, including (1) dynamic immunization, (2) multiple viruses cost-sensitive immunization. Finally, we want to handle the scalability issue by implementing our algorithms on map/reduce architecture.

3.2. Current Results

Our main results so far are as follows:

3.3. Publications

Last updated: September 12, 2011, by B. Aditya Prakash.