This work explores scalable and accurate dynamic pattern detection methods in graph-based data sets. The proposed Dynamic Subset Scan method is applied to the task of detecting, tracking, and source-tracing contaminant plumes spreading through a water distribution system equipped with noisy, binary sensors. While static patterns affect the same subset of data over a period of time, dynamic patterns may affect different subsets of the data at each time step. These dynamic patterns require a new approach to define and optimize penalized likelihood ratio statistics in the subset scan framework, as well as new computational techniques that scale to large, real-world networks. To address the first concern, this work develops a new subset scan methods that allows the detected subset of nodes to change over time, while incorporating temporal consistency constraints to reward patterns that do not dramatically change between adjacent time steps.
Second, the Additive GraphScan algorithm allows this novel scan statistic to process small graphs (500 nodes) in 4.1 seconds on average while maintaining an approximation ratio over 98% compared to an exact optimization method, and to scale to large graphs with over 12,000 nodes in 30 minutes on average. Evaluation results across multiple detection, tracking, and source-tracing tasks demonstrate substantial performance gains achieved by the Dynamic Subset Scan approach.