Experiments in CyberSpace
Two kinds of experiments are being run in CyberSpace: those run by
the attacking black hats, and those run by the defending white hats.
The black-hat experiments need only result in demonstrations --
deployed worms, viruses, and other forms of malicious activity that may
work only partially, yet still achieve black-hat goals.
White-hat experiments, which underpin defenses, are
typically far less effective. That the black hats enjoy even a
modest continuing success is testimony to the relative failure of
the white hats. The goal of this work is to change that.
This material is based upon work supported by the National Science
Foundation under Grant No. 0430474. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation.

The research addresses the issues of metrics,
measurement, reference data sets, and experimental computer science in
computer security and intrusion detection. In particular:
- Metrics for gauging the effectiveness of detection
algorithms. Common metrics are not presently available in the
detection community; we are providing a basis set of metrics that
everyone can use, drawn from the extensive literature and practice
of reliability.
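A basis set of this kind can be sketched in a few lines. The metric names and function below are illustrative assumptions, not the project's actual metric set; they show the reliability-style quantities (hit rate, false-alarm rate, precision) that such a basis would standardize.

```python
# Sketch of a basis metric set for detector evaluation (names are
# illustrative, not the project's actual metrics). Given per-event
# ground truth and detector verdicts, compute the reliability staples.

def basis_metrics(truth, verdicts):
    """truth, verdicts: parallel lists of booleans (True = attack)."""
    tp = sum(t and v for t, v in zip(truth, verdicts))          # hits
    fp = sum((not t) and v for t, v in zip(truth, verdicts))    # false alarms
    fn = sum(t and (not v) for t, v in zip(truth, verdicts))    # misses
    tn = sum((not t) and (not v) for t, v in zip(truth, verdicts))
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Example: 4 events; the detector catches 1 of 2 attacks and raises
# 1 false alarm, so every metric comes out to 0.5.
m = basis_metrics([True, True, False, False], [True, False, True, False])
```

Because every group computes the same quantities from the same confusion counts, two detectors evaluated anywhere become directly comparable.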
- Reference data sets. These are gold standards, with calibrated
ground truth, to be shared among producers and consumers of detection
technologies. They are generated by a synthesizer and stored
online for anyone to use, particularly for replicating experiments
and for determining the extent of improvement when detectors are
modified to compensate for measured weaknesses.
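One plausible shape for such a reference set is shown below; the field names and toy detector are assumptions for illustration, not the project's schema. The point is that each event carries a calibrated ground-truth label, so a detector's output can be scored exactly and the scoring re-run by anyone.

```python
# Illustrative shape for a reference data set with calibrated ground
# truth. Field names ("payload", "label") are assumptions, not the
# project's actual schema.
reference_set = [
    {"id": 1, "payload": "GET /index.html",       "label": "benign"},
    {"id": 2, "payload": "GET /../../etc/passwd", "label": "attack"},
    {"id": 3, "payload": "POST /login",           "label": "benign"},
]

def score(detector, data):
    """Fraction of events where the detector agrees with ground truth."""
    return sum(detector(e) == (e["label"] == "attack") for e in data) / len(data)

# A toy detector that flags path traversal. Replication means simply
# re-running score() on the same frozen reference set.
toy = lambda e: ".." in e["payload"]
accuracy = score(toy, reference_set)
```

Re-scoring a modified detector against the same frozen set is what makes "extent of improvement" a measurable quantity rather than a claim.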
- A data synthesizer/generator for producing the reference and
customized data sets used to calibrate detector algorithms on a common
basis. This is a software system, being developed over the
course of the project and employed in the analysis of detection
algorithms.
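The essential property of such a synthesizer is that it records ground truth as it generates, and that a fixed seed reproduces the data set exactly, which is what allows a generated set to serve as a shared reference. The sketch below is a minimal illustration under those assumptions; the real system's interface will differ.

```python
import random

# Minimal sketch of a data synthesizer: mixes benign background events
# with injected attacks at a chosen rate, recording ground truth as it
# goes. A fixed seed makes the output reproducible, so a generated set
# can be frozen and shared. Names are illustrative.

def synthesize(n_events, attack_rate, seed=0):
    rng = random.Random(seed)  # private RNG: reproducible, isolated
    events = []
    for i in range(n_events):
        is_attack = rng.random() < attack_rate
        events.append({"id": i, "attack": is_attack})  # ground truth baked in
    return events

# Same parameters and seed => an identical data set, so experiments
# run anywhere replicate exactly.
a = synthesize(100, 0.1, seed=42)
b = synthesize(100, 0.1, seed=42)
```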
- Automated evaluation tools and analyses for achieving rigorous
precision. One of the difficulties in conducting experimental
evaluations is that there are no standards to follow; consequently,
when different people evaluate in different ways,
common comparisons cannot be made. The evaluation tools produced here
will ensure a uniform and consistent approach to measurement and
evaluation across the developer and consumer communities.
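The uniformity argument above can be made concrete: run every detector over the same data and report the same figure of merit. The harness below is a hedged sketch of that idea (all names are assumptions), reporting only detection rate for brevity.

```python
# Sketch of an automated evaluation harness: every detector runs over
# the same data set and is reported with the same metric, so results
# from different groups stay comparable. All names are illustrative.

def evaluate(detectors, data):
    """detectors: dict of name -> predicate; data: events with 'attack' truth."""
    report = {}
    for name, det in detectors.items():
        hits = sum(det(e) and e["attack"] for e in data)
        attacks = sum(e["attack"] for e in data)
        report[name] = hits / attacks if attacks else 0.0  # detection rate
    return report

# Two threshold detectors evaluated on identical data: the strict one
# catches 1 of 2 attacks, the loose one catches both.
data = [{"attack": True,  "score": 0.9},
        {"attack": False, "score": 0.2},
        {"attack": True,  "score": 0.4}]
report = evaluate({"strict": lambda e: e["score"] > 0.8,
                   "loose":  lambda e: e["score"] > 0.3}, data)
```

Because the data and the metric are fixed by the harness rather than chosen per experimenter, the resulting numbers can be compared across laboratories.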