Experiments in CyberSpace
Two kinds of experiments are being run in CyberSpace: those run by
the attacking black hats, and those run by the defending white hats.
The black-hat experiments need only result in demonstrations --
deployed worms, viruses, and other forms of malicious activity that may
work only partially, yet still achieve black-hat goals.
White-hat experiments, which underpin defenses, are
typically far less effective. That the black hats enjoy even a
modest continuing success is testimony to the relative failure of
the white hats. The goal of this work is to change that.
This material is based upon work supported by the National Science
Foundation under Grant No. 0430474. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation.

The research addresses the issues of metrics,
measurement, reference data sets, and experimental computer science in
computer security and intrusion detection. In particular:
- Metrics for gauging the effectiveness of detection
algorithms. Common metrics are not presently available in the
detection community; we are providing a basis set of metrics that
everyone can use, drawn from the extensive literature and practice
of reliability.
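A basis set of this kind can be sketched in a few lines. The metric names and function below are illustrative assumptions, not the project's actual metric set; they show the reliability-style quantities (hit rate, false-alarm rate, precision) that such a basis would standardize.

```python
# Sketch of a basis metric set for detector evaluation (names are
# illustrative, not the project's actual metrics). Given per-event
# ground truth and detector verdicts, compute the reliability staples.

def basis_metrics(truth, verdicts):
    """truth, verdicts: parallel lists of booleans (True = attack)."""
    tp = sum(t and v for t, v in zip(truth, verdicts))          # hits
    fp = sum((not t) and v for t, v in zip(truth, verdicts))    # false alarms
    fn = sum(t and (not v) for t, v in zip(truth, verdicts))    # misses
    tn = sum((not t) and (not v) for t, v in zip(truth, verdicts))
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

# Example: 4 events; the detector catches 1 of 2 attacks and raises
# 1 false alarm, so every metric comes out to 0.5.
m = basis_metrics([True, True, False, False], [True, False, True, False])
```

Because every group computes the same quantities from the same confusion counts, two detectors evaluated anywhere become directly comparable.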
- Reference data sets. These are gold standards, with calibrated
ground truth, to be shared among producers and consumers of detection
technologies. They are generated by a synthesizer and stored
online for anyone to use, particularly for replicating experiments
and for determining the extent of improvement when detectors are
modified to compensate for measured weaknesses.
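One plausible shape for such a reference set is shown below; the field names and toy detector are assumptions for illustration, not the project's schema. The point is that each event carries a calibrated ground-truth label, so a detector's output can be scored exactly and the scoring re-run by anyone.

```python
# Illustrative shape for a reference data set with calibrated ground
# truth. Field names ("payload", "label") are assumptions, not the
# project's actual schema.
reference_set = [
    {"id": 1, "payload": "GET /index.html",       "label": "benign"},
    {"id": 2, "payload": "GET /../../etc/passwd", "label": "attack"},
    {"id": 3, "payload": "POST /login",           "label": "benign"},
]

def score(detector, data):
    """Fraction of events where the detector agrees with ground truth."""
    return sum(detector(e) == (e["label"] == "attack") for e in data) / len(data)

# A toy detector that flags path traversal. Replication means simply
# re-running score() on the same frozen reference set.
toy = lambda e: ".." in e["payload"]
accuracy = score(toy, reference_set)
```

Re-scoring a modified detector against the same frozen set is what makes "extent of improvement" a measurable quantity rather than a claim.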
- A data synthesizer/generator for producing the reference and
customized data sets used to calibrate detector algorithms on a common
basis. This is a software system, being developed over the
course of the project and employed in the analysis of detection
algorithms.
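The essential property of such a synthesizer is that it records ground truth as it generates, and that a fixed seed reproduces the data set exactly, which is what allows a generated set to serve as a shared reference. The sketch below is a minimal illustration under those assumptions; the real system's interface will differ.

```python
import random

# Minimal sketch of a data synthesizer: mixes benign background events
# with injected attacks at a chosen rate, recording ground truth as it
# goes. A fixed seed makes the output reproducible, so a generated set
# can be frozen and shared. Names are illustrative.

def synthesize(n_events, attack_rate, seed=0):
    rng = random.Random(seed)  # private RNG: reproducible, isolated
    events = []
    for i in range(n_events):
        is_attack = rng.random() < attack_rate
        events.append({"id": i, "attack": is_attack})  # ground truth baked in
    return events

# Same parameters and seed => an identical data set, so experiments
# run anywhere replicate exactly.
a = synthesize(100, 0.1, seed=42)
b = synthesize(100, 0.1, seed=42)
```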
- Automated evaluation tools and analyses for achieving rigorous
precision. One of the difficulties in conducting experimental
evaluations is that there are no standards to follow; consequently,
when different people evaluate in different ways,
common comparisons cannot be made. The evaluation tools produced here
will ensure a uniform and consistent approach to measurement and
evaluation across the developer and consumer communities.
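The uniformity argument above can be made concrete: run every detector over the same data and report the same figure of merit. The harness below is a hedged sketch of that idea (all names are assumptions), reporting only detection rate for brevity.

```python
# Sketch of an automated evaluation harness: every detector runs over
# the same data set and is reported with the same metric, so results
# from different groups stay comparable. All names are illustrative.

def evaluate(detectors, data):
    """detectors: dict of name -> predicate; data: events with 'attack' truth."""
    report = {}
    for name, det in detectors.items():
        hits = sum(det(e) and e["attack"] for e in data)
        attacks = sum(e["attack"] for e in data)
        report[name] = hits / attacks if attacks else 0.0  # detection rate
    return report

# Two threshold detectors evaluated on identical data: the strict one
# catches 1 of 2 attacks, the loose one catches both.
data = [{"attack": True,  "score": 0.9},
        {"attack": False, "score": 0.2},
        {"attack": True,  "score": 0.4}]
report = evaluate({"strict": lambda e: e["score"] > 0.8,
                   "loose":  lambda e: e["score"] > 0.3}, data)
```

Because the data and the metric are fixed by the harness rather than chosen per experimenter, the resulting numbers can be compared across laboratories.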