Daniel B. Neill
Assistant Professor of Information Systems
H.J. Heinz III College
School of Public Policy and Management
School of Information Systems and Management
Carnegie Mellon University
Hamburg Hall #2105B, x8-3885

neill @ cs.cmu.edu

I am an Assistant Professor of Information Systems in the Heinz College at Carnegie Mellon University. I also hold courtesy appointments in the Machine Learning Department and Robotics Institute in CMU's School of Computer Science, and an adjunct appointment in the Department of Biomedical Informatics at the University of Pittsburgh. I received my Ph.D. in Computer Science from CMU in 2006. Before that, I received my B.S.E. in Electrical Engineering and Computer Science from Duke University, M.Phil. in Computer Speech from Cambridge University, and M.S. in Computer Science from Carnegie Mellon.


Teaching:

I am currently teaching four courses at the Heinz College. Course descriptions, sample syllabi, and lecture slides can be obtained by clicking on the course names below, and current course materials are available on Blackboard.

Statistics for IT Managers (95-796) is the core statistics course for students in the Master of Information Systems Management program.

Artificial Intelligence Tools for Policy (90-866) is a new elective course that I developed and taught for the first time in Spring 2008. It is geared primarily toward students in the Master of Science in Public Policy and Management program, but is open to any student interested in applying artificial intelligence and machine learning to real-world policy problems. No previous background in artificial intelligence is required.

I am also teaching two Ph.D.-level seminar courses, intended for doctoral students (and qualified master's students) from Heinz College, the Machine Learning Department, and other university departments who wish to engage in cutting-edge research at the intersection of machine learning and public policy. The Research Seminar in Machine Learning and Policy (90-904, cross-listed in MLD as 10-830) is a half-semester course which covers a broad range of MLP topics. Special Topics in Machine Learning and Policy (90-921, cross-listed in MLD as 10-831) is a half-semester course which will explore a single MLP topic in detail. Anticipated future topics are Event and Pattern Detection (Spring 2010) and Machine Learning for the Developing World (Spring 2011).

I am also coordinating the new Joint Ph.D. Program in Machine Learning and Public Policy, offered jointly by the Heinz College and the Machine Learning Department at CMU. Information about this program is available here.


Research:

My research interests include pattern detection, machine learning, data mining, algorithms, biosurveillance, and health care information systems. I am currently researching new machine learning methods and fast algorithms for pattern detection in massive datasets. One major application of this work is the development of systems for early detection of emerging outbreaks of disease. A more detailed description of my research is available here, and my 2009 CSD/MLD IC talk is available here.

*** I am currently seeking Heinz and SCS Ph.D. students for research on the following NSF-funded projects: ***

NSF IIS-0916345: Fast Subset Scan for Anomalous Pattern Detection (summary) (NSF page).

NSF IIS-0911032: Discovering Complex Anomalous Patterns (summary) (NSF page).
This research project will be conducted jointly with Artur Dubrawski (CMU), Jeff Schneider (CMU), Greg Cooper (Pitt), and Gilles Clermont (Pitt).

In general, my research focuses on the development of new statistical and computational techniques for accurate and efficient pattern detection in massive, high-dimensional datasets. While most previous data mining work has focused on detection and classification of single records, pattern detection extends these methods to groups of records, in order to detect and identify patterns not visible from any individual record alone. A key idea of our work is that pattern detection can often be transformed into a subset scan problem, in which we search over subsets of the data records to find those groups that are likely to correspond to some probabilistically modeled pattern type. However, this idea creates two main challenges: the statistical problem of evaluating the "interestingness" of a given subset (whether it corresponds to some specific pattern, is anomalous, etc.) and the computational problem of efficiently searching a massive dataset for the most interesting subsets (finding a "needle in the haystack").
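As a rough illustration of these two challenges, here is a minimal sketch in Python. It is not the implementation used in any of the projects above. It assumes each record has an observed count and a positive expected baseline, and it assumes an expectation-based Poisson log-likelihood ratio as the "interestingness" score; the function names and the toy data are hypothetical. The exhaustive search scores every subset and so grows exponentially with the number of records. The priority-ordered search sorts records by count-to-baseline ratio and scores only the top-k prefixes, a shortcut that is valid only for score functions with a suitable ordering property, so the sketch checks it against the exhaustive result rather than taking it on faith.

```python
import math
from itertools import combinations


def poisson_score(count, baseline):
    """Assumed score: expectation-based Poisson log-likelihood ratio.

    Returns 0 when the subset's count does not exceed its baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def exhaustive_scan(counts, baselines):
    """Statistical problem plus naive search: score all 2^n - 1 subsets."""
    n = len(counts)
    best_score, best_subset = 0.0, ()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            score = poisson_score(c, b)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


def priority_scan(counts, baselines):
    """Computational shortcut: score only the n prefixes of the records
    sorted by count/baseline ratio (highest ratio first)."""
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_subset = 0.0, ()
    c = b = 0.0
    for k, i in enumerate(order, start=1):
        c += counts[i]
        b += baselines[i]
        score = poisson_score(c, b)
        if score > best_score:
            best_score, best_subset = score, tuple(sorted(order[:k]))
    return best_score, best_subset


if __name__ == "__main__":
    # Hypothetical toy data: six records with counts and expected baselines.
    counts = [12, 5, 30, 8, 7, 20]
    baselines = [10.0, 6.0, 15.0, 8.0, 9.0, 12.0]
    print("exhaustive:", exhaustive_scan(counts, baselines))
    print("priority:  ", priority_scan(counts, baselines))
```

On this toy data the two searches select the same subset, while the priority-ordered search scores only six candidate subsets instead of sixty-three.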

Our past work has focused primarily on detection of emerging events (e.g. outbreaks of disease) in multivariate spatial time series data. We have developed a variety of new statistical methods which achieve more timely and accurate event detection through better use of spatial and temporal information, integration of multiple data streams, and incorporation of prior knowledge.

Some current research topics include:

Primary application areas include disease surveillance (using electronic health data such as hospital visits and medication sales to detect and characterize emerging outbreaks), monitoring of water quality and food safety, detection and prediction of crime patterns, network intrusion detection, fraud detection, and scientific discovery. We are currently involved in the development and deployment of several large-scale systems for health and crime surveillance. These collaborations will provide exciting opportunities to work with real-world data, interact with law enforcement and public health officials, and directly contribute to the public good by improving health, safety, and security.


Here are links to some recent papers. A complete list of publications is available in my CV.

EVENT AND PATTERN DETECTION

Daniel B. Neill and Weng-Keen Wong. A tutorial on event detection. Presented at the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009. (pdf)

Daniel B. Neill. An empirical comparison of spatial scan statistics for outbreak detection. International Journal of Health Geographics 8: 20, 2009. (pdf) (open access)

Daniel B. Neill. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 25: 498-517, 2009. (pdf)

Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning, 2009, e-pub ahead of print, DOI 10.1007/s10994-009-5144-4. (pdf)

Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff Schneider. Bayesian network scan statistics for multivariate pattern detection. In J. Glaz, V. Pozdnyakov, and S. Wallenstein, eds., Scan Statistics: Methods and Applications, 221-250, 2009. (pdf)

Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern detection in categorical datasets. Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 169-176, 2008. (pdf)

Maxim Makatchev and Daniel B. Neill. Learning outbreak regions in Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT Workshop on Machine Learning for Health Care Applications, 2008. (pdf)

Daniel B. Neill. Detection of spatial and spatio-temporal clusters. Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Technical Report CMU-CS-06-142, 2006. (pdf)

Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A Bayesian spatial scan statistic. In Y. Weiss, et al., eds. Advances in Neural Information Processing Systems 18, 1003-1010, 2006. (pdf)

Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Anomalous spatial cluster detection. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

FAST DETECTION ALGORITHMS

Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell. Detecting significant multidimensional spatial clusters. In L.K. Saul, et al., eds. Advances in Neural Information Processing Systems 17, 969-976, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Rapid detection of significant spatial clusters. Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 256-265, 2004. (pdf)

DISEASE SURVEILLANCE

Maheshkumar R. Sabhnani, Daniel B. Neill, Andrew W. Moore, Fu-Chiang Tsui, Michael M. Wagner, and Jeremy U. Espino. Detecting anomalous patterns in pharmacy retail data. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

M. Wagner, F.-C. Tsui, J. Espino, W. Hogan, J. Hutman, J. Hersh, D. Neill, A. Moore, G. Parks, C. Lewis, and R. Aller. A national retail data monitor for public health surveillance. Morbidity and Mortality Weekly Report 53: 40-42, 2004. (pdf)

HEALTH CARE INFORMATION SYSTEMS

Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman. Towards a collaborative filtering approach to medication reconciliation. Proceedings of the American Medical Informatics Association Annual Symposium, 288-292, 2008. (pdf)

Christopher A. Harle, Daniel B. Neill, and Rema Padman. An information visualization approach to classification and assessment of diabetes risk in primary care. Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics, 2008. (pdf)

GAME THEORY

Daniel B. Neill. Cascade effects in heterogeneous populations. Rationality and Society 17(2): 191-241, 2005. (pdf)

Daniel B. Neill. Evolutionary stability for large populations. Journal of Theoretical Biology 227(3): 397-401, 2004. (pdf)

Daniel B. Neill. Evolutionary dynamics with large aggregate shocks. Dept. of Computer Science, Technical Report CMU-CS-03-197, 2003. (pdf)

Daniel B. Neill. Cooperation and coordination in the Turn-Taking Dilemma. Proceedings of the Ninth Conference on Theoretical Aspects of Rationality and Knowledge: 231-244, 2003. (pdf)

Daniel B. Neill. Optimality under noise: higher memory strategies for the Alternating Prisoner's Dilemma. Journal of Theoretical Biology 211(2): 159-180, 2001. (pdf)

NATURAL LANGUAGE PROCESSING

Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider. Alias detection in link data sets. Proceedings of the First International Conference on Intelligence Analysis, 2005. (pdf)

Daniel B. Neill. Fully automatic word sense induction by semantic clustering. Cambridge University, master's thesis, M.Phil. in Computer Speech, 2002. (pdf)


Links:

My Poetry
Google
CNN.com
The Onion
Arts and Letters Daily