National Science Foundation, Award IIS-0953330
CAREER: Machine Learning and Event Detection for the Public Good
PI: Daniel B. Neill (neill @ cs.cmu.edu)
Funding duration: July 1, 2010 - June 30, 2015
Funding amount: $529,962
Project personnel:
Daniel B. Neill (Associate Professor of Information Systems, Heinz College, CMU) (PI)
Seth Flaxman (Ph.D. student, Joint Ph.D. in Machine Learning and Public Policy, Heinz College and School of Computer Science, CMU)
Edward McFowland III (Ph.D. student, Heinz College, CMU)
Kenton Murray (M.S. student, Language Technologies Institute, CMU)
Sriram Somanchi (Ph.D. student, Heinz College, CMU)
Skyler Speakman (Ph.D. student, Heinz College, CMU)
Donghan (Jarod) Wang (research programmer and system administrator, CMU)
Xin Wu (M.S. student, Very Large Information Systems, CMU)
Yating Zhang (MISM student, Heinz College, CMU)
Project alumni:
Michael Baysek (research programmer and system administrator, CMU)
Tarun Kumar (M.S., Very Large Information Systems, CMU)
Yandong Liu (M.S., Language Technologies, CMU)
Rajas Lonkar (M.S., Information Systems Management, CMU)
Amrut Nagasunder (M.S., Very Large Information Systems, CMU)
Kan Shao (Ph.D., Engineering and Public Policy, and M.S., Machine Learning, CMU)
Project description:
The goal of this research is to create and explore novel methods
for detection of emerging events in massive, complex real-world
datasets. The approach consists of new algorithms to efficiently
and exactly find the most anomalous subsets of a large,
high-dimensional dataset, as well as methodological advances to
incorporate incremental model learning from user feedback into
event detection, incorporate society-scale data from emerging,
transformative technologies such as cellular phones and
user-generated web content, and augment event detection by
creating methods and tools for event characterization,
explanation, visualization, investigation and response.
The experimental research is integrated with a multi-pronged
educational initiative to incorporate machine learning into the
public policy curriculum through development of courses and
seminars, workshops in machine learning and policy research and
education, and establishment of a new Joint Ph.D. Program in
Machine Learning and Public Policy. The results of this project will be
incorporated into deployed event surveillance systems and applied
to the public health, law enforcement, and health care domains,
enabling more timely and accurate detection of emerging outbreaks
of disease, prediction of emerging hot-spots of violent crime, and
identification of anomalous patterns of patient care.
Detailed descriptions of our current research and educational
activities, and of our results and findings, are available here.
Publications:
Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan
statistic for early event detection and characterization. Machine
Learning 79: 261-282, 2010. (pdf)
Daniel Oliveira, Daniel B. Neill, James H. Garrett Jr., and Lucio
Soibelman. Detection of patterns in water distribution pipe breakage
using spatial scan statistics for point events in a physical network.
Journal of Computing in Civil Engineering 25(1): 21-30,
2011. (pdf)
Daniel B. Neill. Fast Bayesian scan statistics for multivariate event
detection and visualization. Statistics in Medicine 30(5):
455-469, 2011. (pdf)
Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset
scan for multivariate spatial biosurveillance. Emerging Health Threats
Journal 4: s42, 2011. (pdf)
Daniel B. Neill and Yandong Liu. Generalized fast subset sums for
Bayesian detection and visualization. Emerging Health Threats
Journal 4: s43, 2011. (pdf)
Kan Shao, Yandong Liu, and Daniel B. Neill. A generalized fast subset
sums framework for Bayesian event detection. Proceedings of the 11th
IEEE International Conference on Data Mining, 617-625, 2011. (pdf)
Yandong Liu and Daniel B. Neill. Detecting previously unseen outbreaks
with novel symptom patterns. Emerging Health Threats Journal 4:
11074, 2011. (pdf)
Sriram Somanchi and Daniel B. Neill. Fast graph structure learning from
unlabeled data for outbreak detection. Emerging Health Threats
Journal 4: 11017, 2011. (pdf)
Skyler Speakman, Edward McFowland III, Sriram Somanchi, and Daniel B.
Neill. Scalable detection of irregular disease clusters using
soft compactness constraints. Emerging Health Threats Journal 4:
11121, 2011. (pdf)
Daniel B. Neill. Fast subset scan for spatial pattern detection.
Journal of the Royal Statistical Society (Series B: Statistical
Methodology) 74(2): 337-360, 2012. (pdf)
Daniel B. Neill. New directions in artificial intelligence for public health
surveillance. IEEE Intelligent Systems 27(1): 56-59, 2012. (pdf)
Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset
scan for multivariate event detection. Statistics in Medicine,
in press, 2012. Article published online: 22 NOV 2012, DOI:
10.1002/sim.5675. (link)
Skyler Speakman, Sriram Somanchi, Edward McFowland III, and Daniel B.
Neill. Scalable detection of anomalous subgraphs. Book
chapter, Encyclopedia of Social Network Analysis and Mining,
in press, 2012.
Seth Flaxman and Daniel B. Neill. Detecting spatially localized subsets
of leading indicators for event prediction. Submitted for publication,
2012.
Tarun Kumar and Daniel B. Neill. Fast tensor scan for event detection and
characterization. Submitted for publication, 2012.
Edward McFowland III, Skyler Speakman, and Daniel B. Neill. Fast
generalized subset scan for anomalous pattern detection. Submitted for
publication, 2012.
Sriram Somanchi and Daniel B. Neill. Fast graph structure learning from
unlabeled data for event detection. Submitted for publication,
2012.
Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable
detection of anomalous patterns with connectivity constraints. Submitted
for publication, 2012.
Presentations:
Daniel B. Neill. Fast subset sums for scalable Bayesian detection and
visualization. Fifth International Workshop on Applied Probability,
Madrid, Spain, July 2010. (pdf)
Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable
detection of anomalous patterns with connectivity constraints. INFORMS
Annual Conference, Austin, TX, November 2010. (pdf)
Edward McFowland III, Skyler Speakman, and Daniel B. Neill. Fast
generalized subset scan for anomalous pattern detection. INFORMS Annual
Conference, Austin, TX, November 2010. (pdf)
Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset
scan for multivariate spatial biosurveillance. International Society for
Disease Surveillance Annual Conference, Park City, UT, December
2010. (pdf)
Daniel B. Neill and Yandong Liu. Generalized fast subset sums for
Bayesian detection and visualization. International Society for Disease
Surveillance Annual Conference, Park City, UT, December 2010. (pdf)
Daniel B. Neill. Research challenges for biosurveillance: the next ten
years (invited plenary). International Society for Disease Surveillance
Annual Conference, Park City, UT, December 2010. (pdf)
Daniel B. Neill. Spatial and subset scanning for multivariate health
surveillance. Data Fusion Research Meeting, Ottawa, ON, March
2011. (pdf)
Daniel B. Neill. Machine learning for population health and disease
surveillance. Advanced Analytics Workshop, Washington, DC, April
2011. (pdf)
Edward McFowland III and Daniel B. Neill. Fast generalized subset scan
for anomalous pattern detection in mixed data sets. 17th Conference for
African-American Researchers in the Mathematical Sciences, Los Angeles,
CA, June 2011.
Daniel B. Neill. Fast multivariate subset scanning for scalable cluster
detection. Joint Statistical Meetings 2011, Miami, FL, August
2011. (pdf)
Edward McFowland III and Daniel B. Neill. Efficient methods for anomalous
pattern detection in general datasets. INFORMS Annual Conference,
Charlotte, NC, November 2011. (pdf)
Sriram Somanchi and Daniel B. Neill. Fast learning of graph structure from
unlabeled data for anomalous pattern detection. INFORMS Annual Conference,
Charlotte, NC, November 2011. (pdf)
Skyler Speakman and Daniel B. Neill. Dynamic pattern detection with
connectivity and temporal consistency constraints. INFORMS Annual
Conference, Charlotte, NC, November 2011. (pdf)
Daniel B. Neill. Analytical methods for large scale surveillance of
unstructured data. International Conference on Digital Disease Detection,
Boston, MA, February 2012. (pdf)
Daniel B. Neill and Edward McFowland III. Fast generalized subset scan
for anomalous pattern detection. Sixth International Workshop on Applied
Probability, Jerusalem, Israel, June 2012.
Daniel B. Neill, Skyler Speakman, Edward McFowland III, and Sriram
Somanchi. Efficient subset scanning with soft constraints. Sixth
International Workshop on Applied Probability, Jerusalem, Israel, June
2012.
Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable
detection of anomalous patterns with connectivity constraints. 29th
Quality and Productivity Research Conference, Long Beach, CA, June
2012.
Daniel B. Neill and Seth Flaxman. Detecting spatially localized subsets
of leading indicators for event prediction. 32nd International Symposium
on Forecasting, Boston, MA, June 2012.
Broader Impacts: The Machine Learning and Policy (MLP) Initiative
Global policy problems ranging from disease pandemics to crime and
terrorism are critically important, and policy data continue to grow in
size and complexity. Machine learning has therefore become essential for
data-driven policy analysis and for development of new, practical
information technologies that can be directly applied for the public good.
The numerous challenges facing our
world will require broad, successful innovations at the intersection of
machine learning and public policy. This endeavor will require widespread
collaboration between machine learning and policy researchers, increased
emphasis on the education of future researchers with in-depth knowledge of
both disciplines, and a broadly shared research focus on developing novel
machine learning methods which directly address critical policy
challenges. We are working to build a multi-pronged curricular program,
the Machine Learning and Policy (MLP) initiative. This program will
facilitate the widespread use of machine learning methods for the public
good by incorporating machine learning throughout the public policy
curriculum. Key components of this program include a new Joint Ph.D.
program in Machine Learning and Public Policy, an introductory course in
machine learning ("Large Scale Data Analysis for Policy") geared toward
public policy students, a Ph.D.-level research seminar in Machine Learning
and Policy, and a course series in "Special Topics in Machine Learning and
Policy", with courses including "Event and Pattern Detection" (Spring
2010), "Machine Learning for the Developing World" (Spring 2011),
"Harnessing the Wisdom of Crowds" (Spring 2012), and "Crime
Hot-Spot Detection and Prediction" (anticipated Spring 2013).
Tutorials and Educational Material:
Daniel B. Neill. Lecture slides for the course, Large Scale Data Analysis
for Public Policy. Last taught Fall 2011. (link)
Daniel B. Neill. Machine learning and event detection for the public good.
Guest lecture, April 2011. (pdf)
Daniel B. Neill and Weng-Keen Wong. A tutorial on event
detection. Presented at the 15th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining, 2009. (pdf)
Daniel B. Neill. Spatial scan tips and tricks for practical outbreak
detection. Invited webinar for the International Society for Disease
Surveillance, January 2011. (pdf)
Awards:
The project PI, Dr. Neill, was named one of "AI's 10 to Watch" by IEEE
Intelligent Systems, Jan/Feb 2011. (link)
Edward McFowland III was awarded an NSF Graduate Research Fellowship
(link) and an AT&T Labs Research
Fellowship, 2011. (link)
Edward McFowland III was the 2012 winner of the Suresh Konda Award,
presented yearly for the best Second Heinz Research Paper at Heinz
College.
This material is based upon work supported by the National Science
Foundation, grants IIS-0953330 (primary funding source), IIS-0916345, and
IIS-0911032. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Science Foundation.
Contact the PI: Daniel Neill, neill (at) cs (dot) cmu (dot) edu
Last update: May 18, 2012