You can download R from the webpage for the R Project for Statistical
Computing (http://www.r-project.org).
The R statistical-programming environment is a general programming
language with many functions and packages for conducting a range of
statistical analyses and data visualizations. It is available for
most modern operating systems, and it is free and open-source. We
developed and tested our evaluation script with R version 2.6.2, but
we expect that it will work with similar versions of R.
If you are not familiar with R, there are many tutorials and
references available online. The following is a collection of some
that we have used, or that have been recommended to us:
Once you have installed R and have become familiar with how to use it,
the next step is to install an additional package (called ROCR) that
is necessary for running our evaluation. The R project maintainers have organized
a large collection of packages, and have made it easy to download and
install these packages. The necessary package can be installed with a
single R command:
install.packages( 'ROCR' )
The ROCR package [2] provides a set of functions that help with the
analysis of the performance of anomaly-detection algorithms.
Specifically, we use the package to generate ROC curves based on the
output of each anomaly detector's scoring function, and then use the
ROC curves to calculate the detector's equal-error rates.
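As a rough illustration of this step, the sketch below uses ROCR to
estimate an equal-error rate from two sets of anomaly scores. It is
not the code from our evaluation script: the function name
equal_error_rate, the toy scores, and the linear interpolation between
neighboring ROC points are assumptions made for the example. It also
assumes that higher scores mean "more anomalous" and that impostor
samples are the positive class.

```r
library(ROCR)

# Hypothetical helper (not taken from the evaluation script).
# Assumes higher scores are more anomalous; impostors are labeled 1.
equal_error_rate <- function(genuine_scores, impostor_scores) {
  stopifnot(length(genuine_scores) > 0, length(impostor_scores) > 0)
  scores <- c(genuine_scores, impostor_scores)
  labels <- c(rep(0, length(genuine_scores)),
              rep(1, length(impostor_scores)))

  pred <- prediction(scores, labels)
  perf <- performance(pred, measure = "fnr", x.measure = "fpr")
  fpr <- perf@x.values[[1]]  # false-alarm rate at each cutoff
  fnr <- perf@y.values[[1]]  # miss rate at each cutoff

  # Cutoffs run from high to low, so (fnr - fpr) falls from 1 to -1.
  d <- fnr - fpr
  i <- max(which(d >= 0))    # last point with miss rate >= false-alarm rate
  j <- i + 1                 # next point, where miss rate < false-alarm rate

  # Interpolate linearly to the point where the two rates are equal.
  w <- d[i] / (d[i] - d[j])
  fnr[i] + w * (fnr[j] - fnr[i])
}

# Toy scores, made up for illustration only.
genuine  <- c(1, 2, 3, 5)
impostor <- c(4, 6, 7, 8)
eer <- equal_error_rate(genuine, impostor)
print(eer)
```

The result is a fractional rate between 0.0 and 1.0, on the same scale
as the results reported further down.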
The final step is to download and install the evaluation script from
this webpage, and to place it in the same directory as the data in
fixed-width format. To run the script, use the R command source:
source('evaluation-script.R')
The script first loads the libraries that it uses in the evaluation.
Next, it defines
training and scoring functions for the three anomaly detectors (i.e.,
the Euclidean, Manhattan, and Mahalanobis detectors). Then, the
script defines a function for calculating the equal-error rate of a
detector based on its output, and a test function that evaluates how
well a given detector can discriminate a given subject from the rest.
Finally, the script loads the typing data, and uses the previously
defined functions to calculate the average equal-error rate of each
detector across all subjects.
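To make the idea of paired training and scoring functions concrete,
here is a minimal sketch in base R. It is not the code from the
evaluation script: the function names (train_detector,
score_euclidean, score_manhattan, score_mahalanobis) and the toy data
are hypothetical, and it assumes the textbook form of each detector,
in which training records the mean (and covariance) of the training
vectors and scoring measures the distance from each test vector to
that mean. The script's own definitions may differ in detail, for
example in whether distances are squared.

```r
# Hypothetical training step: summarize the training vectors
# (one row per typing sample, one column per timing feature).
train_detector <- function(train_x) {
  list(mean = colMeans(train_x), cov = cov(train_x))
}

# Hypothetical scoring steps: larger scores mean "more anomalous".
score_euclidean <- function(model, test_x) {
  sqrt(rowSums(sweep(test_x, 2, model$mean)^2))
}

score_manhattan <- function(model, test_x) {
  rowSums(abs(sweep(test_x, 2, model$mean)))
}

score_mahalanobis <- function(model, test_x) {
  # stats::mahalanobis() returns the squared distance.
  sqrt(mahalanobis(test_x, model$mean, model$cov))
}

# Toy data, made up for illustration only.
train_x <- matrix(c(0, 2, 0, 2,
                    0, 0, 2, 2), ncol = 2)
test_x  <- matrix(c(1, 1,
                    4, 5), ncol = 2, byrow = TRUE)

model <- train_detector(train_x)
print(score_euclidean(model, test_x))
print(score_manhattan(model, test_x))
print(score_mahalanobis(model, test_x))
```

Keeping training and scoring separate means the same test function can
evaluate any detector: it only needs to call the training function on
one subject's data and the scoring function on held-out samples.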
If you have installed R correctly, installed the appropriate packages,
and run the evaluation script successfully, it should print
information with which you can monitor the progress of the evaluation.
Eventually, it should tally and print the following results for the
three anomaly detectors:
             eer.mean  eer.sd
Euclidean       0.171   0.095
Manhattan       0.153   0.092
Mahalanobis     0.110   0.065
Note that these results are fractional rates between 0.0 and 1.0 (not
percentages between 0% and 100%). They match the average equal-error
rates and standard deviations for the detectors from Table 2 of our
original paper (and reproduced in the table of results, below). By
running this script successfully, you will have replicated our
evaluation methodology and reproduced our results for these three
detectors.
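The tally itself is a per-detector mean and standard deviation of the
per-subject equal-error rates. The sketch below shows one way such a
table could be built in base R. The detector labels ("A", "B") and the
rates are made-up placeholders, not values from our data, and the
variable names are hypothetical. Only the column names eer.mean and
eer.sd follow the output shown above.

```r
# Made-up per-subject equal-error rates for two placeholder detectors.
eers <- data.frame(
  detector = rep(c("A", "B"), each = 3),
  eer      = c(0.10, 0.20, 0.30, 0.05, 0.15, 0.25)
)

# Group the rates by detector, then summarize each group.
by_detector <- split(eers$eer, eers$detector)
tally <- data.frame(
  eer.mean = sapply(by_detector, mean),
  eer.sd   = sapply(by_detector, sd)
)
print(round(tally, 3))

# The rates are fractions; multiply by 100 to report percentages.
tally_pct <- 100 * tally
print(round(tally_pct, 1))
```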