Leskovec et al, KDD 2007
From ScribbleWiki: Analysis of Social Media
Cost-Effective Outbreak Detection in Networks
In this paper, the authors present a general methodology for near optimal sensor placement in outbreak detection in networks. The main contributions are three-fold:
1. They show that many objective functions for detecting outbreaks in networks are submodular, including detection likelihood, detection time and population affected.
2. They exploit the submodularity of the objective to develop an efficient approximation algorithm, CELF, which achieves near-optimal placements.
3. They evaluate their methodology in applications such as water quality and blogosphere monitoring.
I am most impressed by the following points:
1. By changing from the original minimization problem to the alternative maximization problem, the objective function becomes non-decreasing and submodular, which makes possible simple near-optimal algorithms for the unit cost case and the more general non-constant cost case. Furthermore, submodularity also helps reduce the computation load greatly.
2. In real applications where we have multiple objectives, a natural solution is to minimize / maximize a weighted average of the objectives. As mentioned in section 2.4, such solutions are actually Pareto-optimal solutions. Furthermore, nonnegative linear combination of submodular functions is still submodular.
3. To demonstrate the performance of the proposed CELF algorithm, the authors did experiments on two large data sets: blog network with 45,000 blogs, 10.5 million posts, and 16.2 million links; water networks with 21,000 nodes and 25,000 pipes. With both of these datasets, CELF is significant better than the heuristic methods.
However, I do not quite agree on the online bounds. According to my understanding, in the online setting, the data should come in sequentially. However, in theorem 4, the scores for each node are calculated using ALL the data, and the produced bound is not quite for the ONLINE setting.