National Science Foundation, Award IIS-0953330
CAREER: Machine Learning and Event Detection for the Public Good
PI: Daniel B. Neill (neill @ cs.cmu.edu)

Funding duration: July 1, 2010 - June 30, 2015
Funding amount: $529,962

Project personnel:

Daniel B. Neill (Associate Professor of Information Systems, Heinz College, CMU) (PI)
Seth Flaxman (Ph.D. student, Joint Ph.D. in Machine Learning and Public Policy, Heinz College and School of Computer Science, CMU)
Edward McFowland III (Ph.D. student, Heinz College, CMU)
Kenton Murray (M.S. student, Language Technologies Institute, CMU)
Sriram Somanchi (Ph.D. student, Heinz College, CMU)
Skyler Speakman (Ph.D. student, Heinz College, CMU)
Donghan (Jarod) Wang (research programmer and system administrator, CMU)
Xin Wu (M.S. student, Very Large Information Systems, CMU)
Yating Zhang (MISM student, Heinz College, CMU)

Project alumni:

Michael Baysek (research programmer and system administrator, CMU)
Tarun Kumar (M.S., Very Large Information Systems, CMU)
Yandong Liu (M.S., Language Technologies, CMU)
Rajas Lonkar (M.S., Information Systems Management, CMU)
Amrut Nagasunder (M.S., Very Large Information Systems, CMU)
Kan Shao (Ph.D., Engineering and Public Policy, and M.S., Machine Learning, CMU)

Project description:

The goal of this research is to create and explore novel methods for detecting emerging events in massive, complex real-world datasets. The approach combines new algorithms that efficiently and exactly find the most anomalous subsets of a large, high-dimensional dataset with methodological advances that incorporate incremental model learning from user feedback into event detection; incorporate society-scale data from emerging, transformative technologies such as cellular phones and user-generated web content; and augment event detection with methods and tools for event characterization, explanation, visualization, investigation, and response.
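To illustrate what "efficiently and exactly" finding the most anomalous subset can mean, here is a minimal sketch of linear-time subset scanning (LTSS), the idea behind the fast subset scan papers listed under Publications. It assumes each record i has an observed count c_i and a positive expected (baseline) count b_i, and scores a subset with the expectation-based Poisson log-likelihood ratio. Because this score satisfies the LTSS property with priority c_i / b_i, the best of all 2^N subsets is one of the N "top-k" subsets after sorting by priority. The function names and example data are hypothetical; this is not the project's deployed code.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio score F(C, B).

    Only increases over the baseline count as anomalous, so the score
    is 0 whenever the aggregate count C does not exceed the baseline B.
    Assumes baseline > 0.
    """
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def fast_subset_scan(counts, baselines):
    """Return (best_score, best_subset) over all subsets of records.

    LTSS sketch: sort records by priority c_i / b_i (highest first);
    the optimal subset is then one of the N top-k prefixes, so only N
    candidate subsets are scored instead of 2^N.
    Returns (0.0, []) if no subset has a positive score.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        # Grow the top-k subset by one record and rescore it.
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


if __name__ == "__main__":
    counts = [10, 2, 8, 3]
    baselines = [4, 3, 4, 3]
    print(fast_subset_scan(counts, baselines))
```

For the example data, the scan selects records [0, 2], with score 18 ln(18/8) - 10 (about 4.60); an exhaustive search over all 15 non-empty subsets gives the same maximum, while the sorted scan scores only 4 candidates.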

The experimental research is integrated with a multi-pronged educational initiative to incorporate machine learning into the public policy curriculum through development of courses and seminars, workshops in machine learning and policy research and education, and establishment of a new Joint Ph.D. Program in Machine Learning and Policy. The results of this project will be incorporated into deployed event surveillance systems and applied to the public health, law enforcement, and health care domains, enabling more timely and accurate detection of emerging outbreaks of disease, prediction of emerging hot-spots of violent crime, and identification of anomalous patterns of patient care.

Detailed descriptions of our current research and educational activities, and results/findings are available here.



Publications:

Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning 79: 261-282, 2010. (pdf)

Daniel Oliveira, Daniel B. Neill, James H. Garrett Jr., and Lucio Soibelman. Detection of patterns in water distribution pipe breakage using spatial scan statistics for point events in a physical network. Journal of Computing in Civil Engineering 25(1): 21-30, 2011. (pdf)

Daniel B. Neill. Fast Bayesian scan statistics for multivariate event detection and visualization. Statistics in Medicine 30(5): 455-469, 2011. (pdf)

Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset scan for multivariate spatial biosurveillance. Emerging Health Threats Journal 4: s42, 2011. (pdf)

Daniel B. Neill and Yandong Liu. Generalized fast subset sums for Bayesian detection and visualization. Emerging Health Threats Journal 4: s43, 2011. (pdf)

Kan Shao, Yandong Liu, and Daniel B. Neill. A generalized fast subset sums framework for Bayesian event detection. Proceedings of the 11th IEEE International Conference on Data Mining, 617-625, 2011. (pdf)

Yandong Liu and Daniel B. Neill. Detecting previously unseen outbreaks with novel symptom patterns. Emerging Health Threats Journal 4: 11074, 2011. (pdf)

Sriram Somanchi and Daniel B. Neill. Fast graph structure learning from unlabeled data for outbreak detection. Emerging Health Threats Journal 4: 11017, 2011. (pdf)

Skyler Speakman, Edward McFowland III, Sriram Somanchi, and Daniel B. Neill. Scalable detection of irregular disease clusters using soft compactness constraints. Emerging Health Threats Journal 4: 11121, 2011. (pdf)

Daniel B. Neill. Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society (Series B: Statistical Methodology) 74(2): 337-360, 2012. (pdf)

Daniel B. Neill. New directions in artificial intelligence for public health surveillance. IEEE Intelligent Systems 27(1): 56-59, 2012. (pdf)

Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset scan for multivariate event detection. Statistics in Medicine, in press, 2012. Published online 22 Nov 2012, DOI: 10.1002/sim.5675. (link)

Skyler Speakman, Sriram Somanchi, Edward McFowland III, and Daniel B. Neill. Scalable detection of anomalous subgraphs. Book chapter, Encyclopedia of Social Network Analysis and Mining, in press, 2012.

Seth Flaxman and Daniel B. Neill. Detecting spatially localized subsets of leading indicators for event prediction. Submitted for publication, 2012.

Tarun Kumar and Daniel B. Neill. Fast tensor scan for event detection and characterization. Submitted for publication, 2012.

Edward McFowland III, Skyler Speakman, and Daniel B. Neill. Fast generalized subset scan for anomalous pattern detection. Submitted for publication, 2012.

Sriram Somanchi and Daniel B. Neill. Fast graph structure learning from unlabeled data for event detection. Submitted for publication, 2012.

Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable detection of anomalous patterns with connectivity constraints. Submitted for publication, 2012.



Presentations:

Daniel B. Neill. Fast subset sums for scalable Bayesian detection and visualization. Fifth International Workshop on Applied Probability, Madrid, Spain, July 2010. (pdf)

Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable detection of anomalous patterns with connectivity constraints. INFORMS Annual Conference, Austin, TX, November 2010. (pdf)

Edward McFowland III, Skyler Speakman, and Daniel B. Neill. Fast generalized subset scan for anomalous pattern detection. INFORMS Annual Conference, Austin, TX, November 2010. (pdf)

Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset scan for multivariate spatial biosurveillance. International Society for Disease Surveillance Annual Conference, Park City, UT, December 2010. (pdf)

Daniel B. Neill and Yandong Liu. Generalized fast subset sums for Bayesian detection and visualization. International Society for Disease Surveillance Annual Conference, Park City, UT, December 2010. (pdf)

Daniel B. Neill. Research challenges for biosurveillance: the next ten years (invited plenary). International Society for Disease Surveillance Annual Conference, Park City, UT, December 2010. (pdf)

Daniel B. Neill. Spatial and subset scanning for multivariate health surveillance. Data Fusion Research Meeting, Ottawa, ON, March 2011. (pdf)

Daniel B. Neill. Machine learning for population health and disease surveillance. Advanced Analytics Workshop, Washington, DC, April 2011. (pdf)

Edward McFowland III and Daniel B. Neill. Fast generalized subset scan for anomalous pattern detection in mixed data sets. 17th Conference for African-American Researchers in the Mathematical Sciences, Los Angeles, CA, June 2011.

Daniel B. Neill. Fast multivariate subset scanning for scalable cluster detection. Joint Statistical Meetings 2011, Miami, FL, August 2011. (pdf)

Edward McFowland III and Daniel B. Neill. Efficient methods for anomalous pattern detection in general datasets. INFORMS Annual Conference, Charlotte, NC, November 2011. (pdf)

Sriram Somanchi and Daniel B. Neill. Fast learning of graph structure from unlabeled data for anomalous pattern detection. INFORMS Annual Conference, Charlotte, NC, November 2011. (pdf)

Skyler Speakman and Daniel B. Neill. Dynamic pattern detection with connectivity and temporal consistency constraints. INFORMS Annual Conference, Charlotte, NC, November 2011. (pdf)

Daniel B. Neill. Analytical methods for large scale surveillance of unstructured data. International Conference on Digital Disease Detection, Boston, MA, February 2012. (pdf)

Daniel B. Neill and Edward McFowland III. Fast generalized subset scan for anomalous pattern detection. Sixth International Workshop on Applied Probability, Jerusalem, Israel, June 2012.

Daniel B. Neill, Skyler Speakman, Edward McFowland III, and Sriram Somanchi. Efficient subset scanning with soft constraints. Sixth International Workshop on Applied Probability, Jerusalem, Israel, June 2012.

Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable detection of anomalous patterns with connectivity constraints. 29th Quality and Productivity Research Conference, Long Beach, CA, June 2012.

Daniel B. Neill and Seth Flaxman. Detecting spatially localized subsets of leading indicators for event prediction. 32nd International Symposium on Forecasting, Boston, MA, June 2012.



Broader Impacts: The Machine Learning and Policy (MLP) Initiative

Given the critical importance of addressing global policy problems, from disease pandemics to crime and terrorism, and the continually increasing size and complexity of policy data, machine learning has become increasingly essential both for data-driven policy analysis and for developing new, practical information technologies that can be applied directly for the public good. Meeting the many challenges facing our world will require broad, successful innovation at the intersection of machine learning and public policy. This endeavor will require widespread collaboration between machine learning and policy researchers, increased emphasis on educating future researchers with in-depth knowledge of both disciplines, and a broadly shared research focus on developing novel machine learning methods that directly address critical policy challenges.

We are working to build a multi-pronged curricular program, the Machine Learning and Policy (MLP) initiative, which will facilitate the widespread use of machine learning methods for the public good by incorporating machine learning throughout the public policy curriculum. Key components of this program include a new Joint Ph.D. program in Machine Learning and Public Policy; an introductory course in machine learning ("Large Scale Data Analysis for Policy") geared toward public policy students; a Ph.D.-level research seminar in Machine Learning and Policy; and a course series, "Special Topics in Machine Learning and Policy", with courses including "Event and Pattern Detection" (Spring 2010), "Machine Learning for the Developing World" (Spring 2011), "Harnessing the Wisdom of Crowds" (Spring 2012), and "Crime Hot-Spot Detection and Prediction" (anticipated Spring 2013).



Tutorials and Educational Material:

Daniel B. Neill. Lecture slides for the course, Large Scale Data Analysis for Public Policy. Last taught Fall 2011. (link)

Daniel B. Neill. Machine learning and event detection for the public good. Guest lecture, April 2011. (pdf)

Daniel B. Neill and Weng-Keen Wong. A tutorial on event detection. Presented at the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009. (pdf)

Daniel B. Neill. Spatial scan tips and tricks for practical outbreak detection. Invited webinar for the International Society for Disease Surveillance, January 2011. (pdf)



Awards:

The project PI, Dr. Neill, was named one of "AI's 10 to Watch" by IEEE Intelligent Systems, Jan/Feb 2011. (link)

Edward McFowland III was awarded an NSF Graduate Research Fellowship (link) and an AT&T Labs Research Fellowship (link), 2011.

Edward McFowland III was the 2012 winner of the Suresh Konda Award, presented yearly to Heinz College's best Second Heinz Research Paper.



This material is based upon work supported by the National Science Foundation, grants IIS-0953330 (primary funding source), IIS-0916345, and IIS-0911032. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contact the PI: Daniel Neill, neill (at) cs (dot) cmu (dot) edu
Last update: May 18, 2012