Daniel B. Neill

Daniel B. Neill, Ph.D.

Associate Professor of Information Systems
Director, Event and Pattern Detection Laboratory
H.J. Heinz III College of Information Systems and Public Policy
Carnegie Mellon University

(effective 7/1/2018)
Associate Professor of Computer Science, Public Service, and Urban Analytics
New York University
firstname.lastname @ nyu.edu

*** PLEASE NOTE THAT THIS PAGE IS NO LONGER BEING UPDATED (AS OF JULY 2018). FOR UP-TO-DATE INFORMATION, PLEASE VISIT MY NEW NYU WEB PAGE. ***

I am an Associate Professor of Information Systems in the Heinz College at Carnegie Mellon University, where I have been the H.J. Heinz III College Dean's Career Development Professor. I also hold courtesy appointments in the Machine Learning Department and Robotics Institute in CMU's School of Computer Science, and an adjunct appointment in the Department of Biomedical Informatics at the University of Pittsburgh. I received my Ph.D. in Computer Science from CMU in 2006. Before that, I received my B.S.E. from Duke University, M.Phil. from Cambridge University, and M.S. from Carnegie Mellon. At CMU, I direct the Event and Pattern Detection Laboratory, and co-direct the Healthcare Information Technology thrust of Heinz College's iLab.

Latest News:

As of July 1st, 2018, I will be leaving CMU to join New York University, as Associate Professor of Computer Science and Public Service, in NYU's Courant Institute Department of Computer Science and Wagner School of Public Service, and Associate Professor of Urban Analytics, in NYU's Center for Urban Science and Progress. Please direct all correspondence to the NYU e-mail address above.

Our pre-syndromic surveillance project was selected as the runner-up in the Department of Homeland Security's Hidden Signals Challenge, a nationwide system design competition which focuses on detecting emerging bio-threats in real time. Here is the link to the winner announcement.

I am guest co-editor of a special issue of GeoInformatica on "Analytics for Local Events and News". Submissions are due August 15th. Please feel free to distribute this call for papers. Note that all papers should be submitted through the Springer GeoInformatica website.

Our rodent prevention work was recently featured in an article on CityLab. According to the article, "The city of Chicago is still running Neill's predictive analytics approach and has touted that it's 20 percent more effective than the traditional method of baiting rats after they've been discovered."

Our paper on Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams was named the winner of this year's Yelp Dataset Challenge. Our approach for identifying emerging topics can be used both for public health (detecting "novel" outbreaks with rare or previously unseen symptom patterns) and for identifying emerging regional business trends. Thanks to both Yelp and CMU for their very nice press coverage of this work!

Our crime prediction work with the Pittsburgh Bureau of Police was featured in an editorial in the 30 Sep 2016 issue of Science.

Our comprehensive review article, "Youth violence: what we know and what we need to know", was featured in a press release by the American Psychological Association. The article was published in the January 2016 issue of the APA's flagship journal, American Psychologist, and is available here.

We are grateful to the Richard King Mellon Foundation for their support of our project, "Urban Predictive Analytics for a Safer and Cleaner Pittsburgh", as part of the award, "Metro21: Knowledge-Powered Pittsburgh to Improve Urban Quality of Life". More information on this project is available here.

What can machine learning do for the healthcare industry? Here are some examples from my own work, presented as part of the UPMC Enterprises "Inspiration, Innovation, and Excellence" talk series. And here is a related summary of our lab's recent work and ongoing projects in healthcare and other domains.

Click here for more EPD Lab news updates.

Teaching:

I am currently teaching four courses at the Heinz College. Course descriptions, sample syllabi, and lecture slides can be obtained by clicking on the course names below, and current course materials are available on Blackboard.

Statistics for IT Managers (95-796) is the core statistics course for students in the Master of Information Systems Management program.

Large Scale Data Analysis for Public Policy (90-866) is a master's level course which focuses on the application of artificial intelligence and machine learning methods to real-world policy problems.

I am also teaching two Ph.D.-level seminar courses, intended for doctoral students (and qualified master's students) from Heinz College, the Machine Learning Department, and other university departments who wish to engage in cutting-edge research at the intersection of machine learning and public policy. The Research Seminar in Machine Learning and Policy (90-904, cross-listed in MLD as 10-830) is a half-semester course which covers a broad range of MLP topics. Special Topics in Machine Learning and Policy (90-921, cross-listed in MLD as 10-831) is a half-semester course which will explore a single MLP topic in detail. Topics covered include Event and Pattern Detection (Spring 2010 and Spring 2014), Machine Learning for the Developing World (Spring 2011), Harnessing the Wisdom of Crowds (Spring 2012), and Mining Massive Datasets (Spring 2013).

I also direct the Joint Ph.D. Program in Machine Learning and Public Policy, offered jointly by the Heinz College and Machine Learning Department at CMU. Information about this program is available here.

Research:

My research is focused on novel statistical and computational methods for discovery of emerging events and other relevant patterns in complex and massive datasets, applied to real-world policy problems ranging from medicine and public health to law enforcement and security. Application areas include disease surveillance (e.g., using electronically available public health data such as hospital visits and medication sales to automatically identify and characterize emerging outbreaks), law enforcement (e.g., detection and prediction of crime patterns using offense reports and 911 calls), health care (e.g., detecting anomalous patterns of care which significantly impact patient outcomes), and urban analytics (e.g., helping city governments to predict and proactively respond to emerging patterns of citizen needs).

A more detailed description of my research, updated December 2015, is available here, and a complete list of publications is available in my CV. Also, please see our new Event and Pattern Detection Laboratory web site, http://epdlab.heinz.cmu.edu, for the most up-to-date descriptions of our ongoing research projects, and links to our publications and presentations.

My research has been partially supported by the following grants from the National Science Foundation:

NSF IIS-0953330, CAREER: Machine Learning and Event Detection for the Public Good (summary) (NSF page) (project page).

NSF IIS-0916345, Fast Subset Scan for Anomalous Pattern Detection (summary) (NSF page) (project page).

NSF IIS-0911032, Discovering Complex Anomalous Patterns (summary) (NSF page) (project page).

I also gratefully acknowledge funding support from a UPMC Healthcare Technology Innovation Grant, an NSF Graduate Research Fellowship, the John D. and Catherine T. MacArthur Foundation, the Richard King Mellon Foundation, and the Disruptive Health Technology Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, UPMC, DHTI, the Richard King Mellon Foundation, or the MacArthur Foundation.

Below are links to some recent papers, organized by topic. Additional papers and presentations are accessible through the Event and Pattern Detection Laboratory. A complete list of publications is available in my CV.

EVENT AND PATTERN DETECTION - SUBSET SCAN

Daniel B. Neill. Subset scanning for event and pattern detection. In S. Shekhar and H. Xiong, eds., Encyclopedia of GIS, 2nd ed., Springer, 2017, pp. 2218-2228. (pdf)

Skyler Speakman, Sriram Somanchi, Edward McFowland III, and Daniel B. Neill. Penalized fast subset scanning. Journal of Computational and Graphical Statistics 25(2): 382-404, 2016. Selected for "Best of JCGS" invited session by the journal's editor-in-chief. (pdf)

Skyler Speakman, Edward McFowland III, and Daniel B. Neill. Scalable detection of anomalous patterns with connectivity constraints. Journal of Computational and Graphical Statistics 24(4): 1014-1033, 2015. (pdf)

Edward McFowland III, Skyler Speakman, and Daniel B. Neill. Fast generalized subset scan for anomalous pattern detection. Journal of Machine Learning Research, 14: 1533-1561, 2013. (pdf)

Skyler Speakman, Yating Zhang, and Daniel B. Neill. Dynamic pattern detection with temporal consistency and connectivity constraints. Proc. 13th IEEE International Conference on Data Mining, 697-706, 2013. (pdf)

Daniel B. Neill, Edward McFowland III, and Huanian Zheng. Fast subset scan for multivariate event detection. Statistics in Medicine 32: 2185-2208, 2013. (pdf)

Daniel B. Neill. Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society (Series B: Statistical Methodology) 74(2): 337-360, 2012. (pdf)

EVENT AND PATTERN DETECTION - TWITTER EVENT DETECTION

Feng Chen and Daniel B. Neill. Human rights event detection from heterogeneous social media graphs. Big Data 3(1): 34-40, 2015. (pdf)

Feng Chen and Daniel B. Neill. Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1166-1175, 2014. (pdf)

EVENT AND PATTERN DETECTION - BAYESIAN SCAN STATISTICS

Daniel B. Neill. Bayesian scan statistics. In J. Glaz and M. V. Koutras, eds., Handbook of Scan Statistics, 2019, in press.

Kan Shao, Yandong Liu, and Daniel B. Neill. A generalized fast subset sums framework for Bayesian event detection. Proceedings of the 11th IEEE International Conference on Data Mining, 617-625, 2011. (pdf)

Daniel B. Neill. Fast Bayesian scan statistics for multivariate event detection and visualization. Statistics in Medicine 30(5): 455-469, 2011. (pdf)

Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Machine Learning 79: 261-282, 2010. (pdf)

Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff Schneider. Bayesian network scan statistics for multivariate pattern detection. In J. Glaz, V. Pozdnyakov, and S. Wallenstein, eds., Scan Statistics: Methods and Applications, 221-250, 2009. (pdf)

Maxim Makatchev and Daniel B. Neill. Learning outbreak regions in Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT Workshop on Machine Learning for Health Care Applications, 2008. (pdf)

Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A Bayesian spatial scan statistic. In Y. Weiss, et al., eds. Advances in Neural Information Processing Systems 18, 1003-1010, 2006. (pdf)

EVENT AND PATTERN DETECTION - SPATIAL SCAN STATISTICS

Daniel Oliveira, Daniel B. Neill, James H. Garrett Jr., and Lucio Soibelman. Detection of patterns in water distribution pipe breakage using spatial scan statistics for point events in a physical network. Journal of Computing in Civil Engineering 25(1): 21-30, 2011. (pdf)

Daniel B. Neill. An empirical comparison of spatial scan statistics for outbreak detection. International Journal of Health Geographics 8: 20, 2009. (pdf) (open access)

Daniel B. Neill. Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting 25: 498-517, 2009. (pdf)

Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Anomalous spatial cluster detection. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell. Detecting significant multidimensional spatial clusters. In L.K. Saul, et al., eds. Advances in Neural Information Processing Systems 17, 969-976, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Rapid detection of significant spatial clusters. Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 256-265, 2004. (pdf)

EVENT AND PATTERN DETECTION - GENERAL

Feng Chen, Petko Bogdanov, Daniel B. Neill, and Ambuj K. Singh. Anomalous and significant subgraph detection in attributed networks. Tutorial presented at IEEE International Conference on Big Data, 2016. (part 1) (part 2)

Daniel B. Neill and Weng-Keen Wong. A tutorial on event detection. Presented at the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009. (pdf)

Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern detection in categorical datasets. Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 169-176, 2008. (pdf)

Daniel B. Neill. Detection of spatial and spatio-temporal clusters. Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Technical Report CMU-CS-06-142, 2006. (pdf)

BAYESIAN NONPARAMETRICS / GAUSSIAN PROCESSES

William Herlands, Edward McFowland III, Andrew Gordon Wilson, and Daniel B. Neill. Gaussian process subset scanning for anomalous pattern detection in non-iid data. Proc. 21st International Conference on Artificial Intelligence and Statistics, PMLR 84: 425-434, 2018. (pdf)

William Herlands, Andrew Gordon Wilson, Hannes Nickisch, Seth Flaxman, Daniel B. Neill, Willem van Panhuis, and Eric P. Xing. Scalable Gaussian processes for characterizing multidimensional change surfaces. Proc. 19th International Conference on Artificial Intelligence and Statistics, PMLR 51: 1013-1021, 2016. (pdf)

Seth R. Flaxman, Daniel B. Neill, and Alexander J. Smola. Gaussian processes for independence tests with non-iid data in causal inference. ACM Transactions on Intelligent Systems and Technology, 7(2): 22:1-22:23, 2015. (pdf)

Seth R. Flaxman, Andrew Gordon Wilson, Daniel B. Neill, Hannes Nickisch, and Alexander J. Smola. Fast Kronecker inference in Gaussian processes with non-Gaussian likelihoods. Proc. 32nd International Conference on Machine Learning, PMLR 37: 607-616, 2015. (pdf)

PUBLIC HEALTH / DISEASE SURVEILLANCE

Daniel B. Neill and William Herlands. Machine learning for drug overdose surveillance. Journal of Technology in Human Services 36(1): 8-14, 2018. Presented at Bloomberg Data for Good Exchange Conference, 2017. (pdf) (link to journal version)

Sriram Somanchi and Daniel B. Neill. Graph structure learning from unlabeled data for early outbreak detection. IEEE Intelligent Systems 32(2): 80-84, 2017. (pdf) (extended version on arXiv)

Zachary Faigen, Lana Deyneka, Amy Ising, Daniel B. Neill, Mike Conway, Geoffrey Fairchild, Julia Gunn, David Swenson, Ian Painter, Lauren Johnson, Chris Kiley, Laura Streichert, and Howard Burkom. Cross-disciplinary consultancy to bridge public health technical needs and analytic developers: asyndromic surveillance use case. Online Journal of Public Health Informatics, 7(3):e228, 2015. (pdf)

Daniel B. Neill. New directions in artificial intelligence for public health surveillance. IEEE Intelligent Systems 27(1): 56-59, 2012. (pdf)

Xia Jiang, Gregory F. Cooper, and Daniel B. Neill. Generalized AMOC curves for evaluation and improvement of event surveillance. Proceedings of the American Medical Informatics Association Annual Symposium, 281-285, 2009. (pdf)

Maheshkumar R. Sabhnani, Daniel B. Neill, Andrew W. Moore, Fu-Chiang Tsui, Michael M. Wagner, and Jeremy U. Espino. Detecting anomalous patterns in pharmacy retail data. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

M. Wagner, F.-C. Tsui, J. Espino, W. Hogan, J. Hutman, J. Hersh, D. Neill, A. Moore, G. Parks, C. Lewis, and R. Aller. A national retail data monitor for public health surveillance. Morbidity and Mortality Weekly Report 53: 40-42, 2004. (pdf)

HEALTH CARE INFORMATION SYSTEMS

Sriram Somanchi, Daniel B. Neill, and Anil V. Parwani. Discovering anomalous patterns in large digital pathology images. Statistics in Medicine, 2018, in press. (link)

Daniel Gartner, Rainer Kolisch, Daniel B. Neill, and Rema Padman. Machine learning approaches for early DRG classification and resource allocation. INFORMS Journal on Computing 27(4): 718-734, 2015. (pdf) (supplementary material)

Daniel B. Neill. Using artificial intelligence to improve hospital inpatient care. IEEE Intelligent Systems 28(2): 92-95, 2013. (pdf)

Sriram Somanchi and Daniel B. Neill. Discovering anomalous patterns in large digital pathology images. Proc. 8th INFORMS Workshop on Data Mining and Health Informatics, 2013. (pdf)

Christopher A. Harle, Daniel B. Neill, and Rema Padman. Information visualization for chronic disease risk assessment. IEEE Intelligent Systems 27(6): 81-85, 2012. (pdf)

Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman. Automatic detection of omissions in medication lists. Journal of the American Medical Informatics Association 18(4): 449-458, 2011. (pdf)

Huanian Zheng, Rema Padman, Sharique Hasan, and Daniel B. Neill. A comparison of collaborative filtering methods for medication reconciliation. Proceedings of the 13th International Congress on Medical Informatics, 2010. (pdf)

Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman. Towards a collaborative filtering approach to medication reconciliation. Proceedings of the American Medical Informatics Association Annual Symposium, 288-292, 2008. (pdf)

Christopher A. Harle, Daniel B. Neill, and Rema Padman. An information visualization approach to classification and assessment of diabetes risk in primary care. Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics, 2008. (pdf)

YOUTH VIOLENCE

Brad J. Bushman, Katherine Newman, Sandra L. Calvert, Geraldine Downey, Mark Dredze, Michael Gottfredson, Nina G. Jablonski, Ann S. Masten, Calvin Morrill, Daniel B. Neill, Daniel Romer, and Daniel W. Webster. Youth violence: what we know and what we need to know. American Psychologist 71(1): 17-39, 2016. (pdf) (APA press release)

GAME THEORY

Daniel B. Neill. Cascade effects in heterogeneous populations. Rationality and Society 17(2): 191-241, 2005. (pdf)

Daniel B. Neill. Evolutionary stability for large populations. Journal of Theoretical Biology 227(3): 397-401, 2004. (pdf)

Daniel B. Neill. Evolutionary dynamics with large aggregate shocks. Dept. of Computer Science, Technical Report CMU-CS-03-197, 2003. (pdf)

Daniel B. Neill. Cooperation and coordination in the Turn-Taking Dilemma. Proceedings of the Ninth Conference on Theoretical Aspects of Rationality and Knowledge, 231-244, 2003. (pdf)

Daniel B. Neill. Optimality under noise: higher memory strategies for the Alternating Prisoner's Dilemma. Journal of Theoretical Biology 211(2): 159-180, 2001. (pdf)

NATURAL LANGUAGE PROCESSING

Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider. Alias detection in link data sets. Proceedings of the First International Conference on Intelligence Analysis, 2005. (pdf)

Daniel B. Neill. Fully automatic word sense induction by semantic clustering. Cambridge University, master's thesis, M.Phil. in Computer Speech, 2002. (pdf)

Links:

Event and Pattern Detection Laboratory
My Poetry
Google
CNN.com
The Onion
Arts and Letters Daily