Daniel B. Neill
Dean's Career Development Professor
and Associate Professor of Information Systems
Director, Event and Pattern Detection Laboratory
H.J. Heinz III College
Carnegie Mellon University
Hamburg Hall #2105B, x8-3885
I am currently teaching four courses at the Heinz College. Course
descriptions, sample syllabi, and lecture slides can be obtained by
clicking on the course names below, and current course materials are
available on Blackboard.
Large Scale Data Analysis for Public
Policy (90-866) is a master's-level course that focuses on the
application of artificial intelligence and machine learning methods to
real-world policy problems.
I am also teaching two Ph.D.-level seminar courses, intended for doctoral
students (and qualified master's students) from Heinz College, the Machine
Learning Department, and other university departments who wish to engage
in cutting-edge research at the intersection of machine learning and
public policy. The Research Seminar in
Machine Learning and Policy (90-904, cross-listed in MLD as 10-830)
is a half-semester course which covers a broad range of MLP topics.
Special Topics in Machine Learning and Policy (90-921, cross-listed in MLD
as 10-831) is a half-semester course which will explore a single MLP topic
in detail. Topics covered include Event and Pattern Detection
(Spring 2010), Machine Learning for the Developing World (Spring
2011), Harnessing the Wisdom of Crowds (Spring 2012), and Mining Massive Datasets (Spring 2013).
I am also coordinating the new Joint Ph.D. Program in Machine Learning and
Public Policy, offered by Heinz College and the Machine Learning
Department at CMU. Information about this program is available
here.
Research:
My research is focused on novel statistical and computational methods for
discovery of emerging events and other relevant patterns in complex and
massive datasets, applied to real-world policy problems ranging from
medicine and public health to law enforcement and security. Application
areas include disease surveillance (e.g., using electronically
available public health data such as hospital visits and medication sales
to automatically identify and characterize emerging outbreaks), law
enforcement (e.g., detection and prediction of crime patterns using
offense reports and 911 calls), and health care (e.g., detecting
anomalous patterns of care which significantly impact patient
outcomes).
I was recently featured in IEEE Intelligent Systems Magazine as one of
their "ten artificial intelligence researchers to watch". A more detailed
description of my research
(updated July 2011) is available here,
and my 2011 CSD/MLD immigration course talk is available here.
*** I am currently seeking Heinz and SCS Ph.D. students for research on
the following funded projects: ***
NSF IIS-0953330, CAREER: Machine Learning and Event Detection for the
Public Good (summary) (NSF page) (project page).
In general, my research focuses on the development of new statistical and
computational techniques for accurate and efficient pattern detection in
massive, high-dimensional datasets. While most previous data mining work
has focused on detection and classification of single records, pattern
detection extends these methods to groups of records, in order
to detect and identify patterns not visible from any individual record
alone. A key idea of our work is that pattern detection can often be
transformed into a subset scan problem, in which we search over
subsets of the data records to find those groups that are likely to
correspond to some probabilistically modeled pattern type. However, this
idea creates two main challenges: the statistical problem of evaluating
the "interestingness" of a given subset (whether it corresponds to some
specific pattern, is anomalous, etc.) and the computational problem of
efficiently searching a massive dataset for the most interesting subsets
(finding a "needle in a haystack").
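The two challenges can be illustrated with a minimal sketch. It is not the lab's implementation: the function names are hypothetical, and the choice of an expectation-based Poisson log-likelihood ratio as the "interestingness" score is an assumption made for illustration. Each record i has an observed count and an expected baseline. The exhaustive search evaluates all 2^N - 1 non-empty subsets, while the fast search sorts records by their count-to-baseline ratio and scores only the N "top-k" subsets. That shortcut depends on a property of this particular score function and does not hold for arbitrary scores.

```python
import math
from itertools import combinations


def score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for one subset,
    given its aggregate observed count and aggregate expected baseline.
    (Assumed score function, chosen for illustration.)"""
    if baseline <= 0 or count <= baseline:
        return 0.0  # no evidence of an increase over expectation
    return count * math.log(count / baseline) + baseline - count


def exhaustive_scan(counts, baselines):
    """Statistical problem solved naively: score every non-empty subset.
    Cost grows as 2^N, so this is only feasible for tiny N."""
    n = len(counts)
    best_score, best_subset = 0.0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            s = score(c, b)
            if s > best_score:
                best_score, best_subset = s, subset
    return best_score, best_subset


def fast_scan(counts, baselines):
    """Computational shortcut: rank records by count/baseline and score
    only the subsets formed by the top-k ranked records, k = 1..N."""
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_subset = 0.0, ()
    c = b = 0.0
    for k, i in enumerate(order, start=1):
        c += counts[i]
        b += baselines[i]
        s = score(c, b)
        if s > best_score:
            best_score, best_subset = s, tuple(sorted(order[:k]))
    return best_score, best_subset


if __name__ == "__main__":
    counts = [12, 9, 30, 8, 20]        # made-up observed counts
    baselines = [10.0] * 5             # made-up expected counts
    print(exhaustive_scan(counts, baselines))
    print(fast_scan(counts, baselines))
```

On this toy input both searches select records 2 and 4, the two with the largest excess over their baselines, but the fast search scores 5 subsets rather than 31.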
Our past work has focused primarily on detection of
emerging events (e.g., outbreaks of disease) in multivariate spatial time
series data. We have developed a variety of new statistical methods which
achieve more timely and accurate event detection through better use of
spatial and temporal information, integration of multiple data streams,
and incorporation of prior knowledge.
Some current research topics include:
Extending event detection methodology to more general approaches for
pattern detection in large multivariate datasets.
Developing novel Bayesian and nonparametric approaches for more
accurate detection, characterization, and explanation of events and
patterns.
Creating new, fast algorithms for computationally efficient detection
of patterns in massive datasets.
Incorporating model learning into the event detection framework,
enabling us to distinguish between relevant and irrelevant patterns.
Incorporating active learning from user feedback, enabling us to
rapidly "zero in" on those patterns that are most relevant to an
individual user.
Integrating web-scale data sources, such as search engine queries or
information from online social networks.
Providing interactive tools for investigation, tracking, and
discovery of patterns in massive data.
Primary application areas include disease surveillance, monitoring
of water quality and food safety, detection and prediction of crime
patterns, network intrusion detection, fraud detection, and scientific
discovery. We are currently involved in the development and deployment of
several large-scale systems for health and crime surveillance. These
collaborations will provide exciting opportunities to work with real-world
data, interact with law enforcement and public health officials, and
directly contribute to the public good by improving health, safety, and
security.
Here are links to some recent papers. A complete list of publications is
available in my CV.
EVENT AND PATTERN DETECTION
Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset
scan for multivariate event detection. Statistics in Medicine,
in press, 2012. Article published online: 22 NOV 2012, DOI:
10.1002/sim.5675. (link)
Daniel B. Neill. Fast subset scan for spatial pattern detection.
Journal of the Royal Statistical Society (Series B: Statistical
Methodology) 74(2): 337-360, 2012. (pdf)
Daniel B. Neill. New directions in artificial intelligence for public health
surveillance. IEEE Intelligent Systems 27(1): 56-59, 2012. (pdf)
Kan Shao, Yandong Liu, and Daniel B. Neill. A generalized fast subset
sums framework for Bayesian event detection. Proceedings of the 11th
IEEE International Conference on Data Mining, 617-625, 2011. (pdf)
Daniel B. Neill. Fast Bayesian scan statistics for multivariate event
detection and visualization. Statistics in Medicine 30(5):
455-469, 2011. (pdf)
Daniel Oliveira, Daniel B. Neill, James H. Garrett Jr., and Lucio
Soibelman. Detection of patterns in water distribution pipe breakage
using spatial scan statistics for point events in a physical network.
Journal of Computing in Civil Engineering 25(1): 21-30,
2011. (pdf)
Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan
statistic for early event detection and characterization. Machine
Learning 79: 261-282, 2010. (pdf)
Daniel B. Neill and Weng-Keen Wong. A tutorial on event detection.
Presented at the 15th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, 2009. (pdf)
Daniel B. Neill. An empirical comparison of spatial scan statistics for
outbreak detection. International Journal of Health Geographics 8:
20, 2009. (pdf) (open access)
Daniel B. Neill. Expectation-based scan statistics for monitoring spatial
time series data. International Journal of Forecasting 25:
498-517, 2009. (pdf)
Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff
Schneider. Bayesian network scan statistics for multivariate pattern
detection. In J. Glaz, V. Pozdnyakov, and S. Wallenstein, eds., Scan
Statistics: Methods and Applications, 221-250, 2009. (pdf)
Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern
detection in categorical datasets. Proceedings of the 14th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, 169-176, 2008.
(pdf)
Maxim Makatchev and Daniel B. Neill. Learning outbreak regions in
Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT
Workshop on Machine Learning for Health Care Applications, 2008.
(pdf)
Daniel B. Neill. Detection of spatial and spatio-temporal clusters.
Ph.D. thesis, Carnegie Mellon University, Department of Computer
Science, Technical Report CMU-CS-06-142, 2006.
(pdf)
Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A
Bayesian spatial scan statistic. In Y. Weiss, et al., eds. Advances
in Neural Information Processing Systems 18, 1003-1010, 2006.
(pdf)
Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny
Daniel. Detection of emerging space-time clusters.
Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, 218-227, 2005.
(pdf)
Daniel B. Neill and Andrew W. Moore. Anomalous spatial cluster
detection. Proceedings of the KDD 2005 Workshop on Data Mining
Methods for Anomaly Detection, 2005.
(pdf)
FAST DETECTION ALGORITHMS
Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell.
Detecting significant multidimensional spatial clusters. In L.K. Saul, et
al., eds. Advances in Neural Information Processing Systems 17,
969-976, 2005.
(pdf)
Daniel B. Neill and Andrew W. Moore. Rapid detection of
significant spatial clusters. Proceedings of the 10th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining,
256-265, 2004.
(pdf)
DISEASE SURVEILLANCE
Xia Jiang, Gregory F. Cooper, and Daniel B. Neill. Generalized AMOC
curves for evaluation and improvement of event surveillance.
Proceedings of the American Medical Informatics Association Annual
Symposium, 281-285, 2009.
(pdf)
Maheshkumar R. Sabhnani, Daniel B. Neill, Andrew W. Moore, Fu-Chiang
Tsui, Michael M. Wagner, and Jeremy U. Espino. Detecting anomalous
patterns in pharmacy retail data. Proceedings of the KDD 2005
Workshop on Data Mining Methods for Anomaly Detection, 2005.
(pdf)
M. Wagner, F.-C. Tsui, J. Espino, W. Hogan, J. Hutman, J. Hersh, D. Neill,
A. Moore, G. Parks, C. Lewis, and R. Aller. A national retail data
monitor for public health surveillance. Morbidity and Mortality Weekly
Report 53: 40-42, 2004.
(pdf)
HEALTH CARE INFORMATION SYSTEMS
Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman.
Automatic detection of omissions in medication lists. Journal of the
American Medical Informatics Association 18(4): 449-458, 2011.
Huanian Zheng, Rema Padman, Sharique Hasan, and Daniel B. Neill. A
comparison of collaborative filtering methods for medication
reconciliation. Proceedings of the 13th International Congress on
Medical Informatics, 2010.
(pdf)
Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman.
Towards a collaborative filtering approach to medication reconciliation.
Proceedings of the American Medical Informatics Association Annual
Symposium, 288-292, 2008.
(pdf)
Christopher A. Harle, Daniel B. Neill, and Rema Padman. An information
visualization approach to classification and assessment of diabetes risk
in primary care. Proceedings of the 3rd INFORMS Workshop on Data
Mining and Health Informatics, 2008.
(pdf)
GAME THEORY
Daniel B. Neill. Cascade effects in heterogeneous
populations. Rationality and Society 17(2): 191-241, 2005.
(pdf)
Daniel B. Neill. Evolutionary stability for large populations.
Journal of Theoretical Biology 227(3): 397-401, 2004.
(pdf)
Daniel B. Neill. Evolutionary dynamics with large aggregate
shocks. Dept. of Computer Science, Technical Report CMU-CS-03-197, 2003.
(pdf)
Daniel B. Neill. Cooperation and coordination in the Turn-Taking
Dilemma. Proceedings of the Ninth Conference on Theoretical Aspects
of Rationality and Knowledge: 231-244, 2003.
(pdf)
Daniel B. Neill. Optimality under noise: higher memory
strategies for the Alternating Prisoner's Dilemma. Journal of
Theoretical Biology 211(2): 159-180, 2001.
(pdf)
NATURAL LANGUAGE PROCESSING
Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider.
Alias detection in link data sets. Proceedings of the First
International Conference on Intelligence Analysis, 2005.
(pdf)
Daniel B. Neill. Fully automatic word sense induction by
semantic clustering. Cambridge University, master's thesis, M.Phil. in
Computer Speech, 2002.
(pdf)