.tie5Roanl) 400 times.
.tie5Roanl) was chosen to be representative of a strong 10-character password.
read.table command can be used to read the data into a structure called a
X <- read.table( 'DSL-StrongPasswordData.txt', header=TRUE )
s057). Even though the data set contains 51 subjects, the identifiers do not range from
s051; subjects have been assigned unique IDs across a range of keystroke experiments, and not every subject participated in every experiment. For instance, Subject 1 did not perform the password typing task and so
s001 does not appear in the data set. The second column, sessionIndex, is the session in which the password was typed (ranging from 1 to 8). The third column, rep, is the repetition of the password within the session (ranging from 1 to 50).
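The column descriptions above can be sanity-checked once the file is loaded. The following is a minimal sketch, assuming the data file sits in the working directory under the name used in the read.table call; the variable names dataFile and expectedRows are illustrative and not part of the original material.

```r
# Path is the file name used in the read.table call above; adjust as needed.
dataFile <- 'DSL-StrongPasswordData.txt';

# 8 sessions of 50 repetitions give the expected number of rows per subject.
expectedRows <- 8 * 50;

if( file.exists( dataFile ) ) {
  X <- read.table( dataFile, header = TRUE );
  print( length( unique( X$subject ) ) );   # the text reports 51 subjects
  print( range( X$sessionIndex ) );         # the text reports 1 to 8
  print( range( X$rep ) );                  # the text reports 1 to 50
  # TRUE if every subject contributed the expected number of rows
  print( all( table( X$subject ) == expectedRows ) );
}
```

The check is skipped silently when the file is not present, so the snippet can be run before the data has been downloaded.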
H.key designate a hold time for the named key (i.e., the time from when the key was pressed to when it was released). Column names of the form DD.key1.key2 designate a keydown-keydown time for the named digraph (i.e., the time from when key1 was pressed to when key2 was pressed). Column names of the form UD.key1.key2 designate a keyup-keydown time for the named digraph (i.e., the time from when key1 was released to when key2 was pressed). Note that UD times can be negative, and that H times and UD times add up to DD times.
subject sessionIndex rep H.period DD.period.t UD.period.t ...
s002    1            1   0.1491   0.3979      0.2488      ...
The example presents typing data for subject 2, session 1, repetition 1. The period key was held down for 0.1491 seconds (149.1 milliseconds); the time between pressing the period key and the t key (keydown-keydown time) was 0.3979 seconds; the time between releasing the period and pressing the t key (keyup-keydown time) was 0.2488 seconds; and so on.
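The three timing definitions imply that, for any digraph, the hold time of the first key plus the keyup-keydown time equals the keydown-keydown time. A small sketch using only the values from the example row illustrates this; the data frame name exampleRow is an illustrative choice.

```r
# Values copied from the example row above (subject 2, session 1, repetition 1).
exampleRow <- data.frame( subject = 's002', sessionIndex = 1, rep = 1,
                          H.period = 0.1491,
                          DD.period.t = 0.3979,
                          UD.period.t = 0.2488 );

# Hold time plus keyup-keydown time should equal keydown-keydown time.
reconstructedDD <- exampleRow$H.period + exampleRow$UD.period.t;
print( isTRUE( all.equal( reconstructedDD, exampleRow$DD.period.t ) ) );

# Times are stored in seconds; multiply by 1000 for milliseconds.
holdMs <- exampleRow$H.period * 1000;
print( holdMs );
```

all.equal is used instead of == because the sums are floating-point values.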
www.r-project.org)—demonstrates how to use the data to evaluate three anomaly detectors (called Euclidean, Manhattan, and Mahalanobis). ROCR for generating ROC curves.
http://www.r-project.org). The R statistical-programming environment is a general programming language with many functions and packages for conducting a range of statistical analyses and data visualizations. It is available for most modern operating systems, and it is free and open-source. We developed and tested our evaluation script with R version 2.6.2, but we expect that it will work with similar versions of R.
install.packages( 'ROCR' )
            eer.mean eer.sd
Euclidean      0.171  0.095
Manhattan      0.153  0.092
Mahalanobis    0.110  0.065
Note that these results are fractional rates between 0.0 and 1.0 (not percentages between 0% and 100%). They match the average equal-error rates and standard deviations for the detectors from Table 2 of our original paper (and reproduced in the table of results, below). By running this script successfully, you will have replicated our evaluation methodology and reproduced our results for these three detectors.
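For readers unfamiliar with the metric, an equal-error rate is the error rate at the threshold where the false-alarm rate (genuine-user samples flagged as anomalous) equals the miss rate (impostor samples accepted). The sketch below is a coarse base-R approximation written for illustration only; it is not the evaluation script's implementation, which relies on ROCR. The function name equalErrorRate, its arguments, and the sample scores are all hypothetical, and higher scores are assumed to mean "more anomalous".

```r
# Hypothetical helper: approximate the equal-error rate from anomaly scores.
# Assumes higher scores indicate more anomalous typing.
equalErrorRate <- function( userScores, impostorScores ) {
  thresholds <- sort( unique( c( userScores, impostorScores ) ) );

  # False-alarm rate: fraction of genuine-user scores above the threshold.
  falseAlarm <- sapply( thresholds, function( t ) mean( userScores > t ) );

  # Miss rate: fraction of impostor scores at or below the threshold.
  miss <- sapply( thresholds, function( t ) mean( impostorScores <= t ) );

  # Pick the threshold where the two rates are closest and average them.
  i <- which.min( abs( falseAlarm - miss ) );
  return( ( falseAlarm[i] + miss[i] ) / 2 );
}

# Made-up scores, for illustration only.
eer <- equalErrorRate( c( 1, 2, 3, 4 ), c( 3, 4, 5, 6 ) );
print( eer );  # 0.25
```

Like the script's output, the value is a fraction between 0.0 and 1.0. Because this version only inspects observed score values as thresholds, it can differ slightly from an interpolated estimate.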
newScore, it could be evaluated simply by adding these two functions to the detectorSet list of detectors:
detectorSet = list( NewDetector = list( train = newTrain, score = newScore ) );
Our intent in sharing the data is for the password-timing tables to be used to evaluate a range of anomaly detectors so that the results of the evaluations can be soundly compared, using the same data and the same evaluation procedure. Consequently, we encourage other researchers to use our evaluation script to evaluate new and better anomaly-detection strategies for keystroke dynamics.
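As an illustration of what a train/score pair might look like, here is a minimal sketch. The function signatures are assumptions, since the interface is not spelled out in this text: newTrain is assumed to take a numeric matrix with one row per training repetition and return a model, and newScore is assumed to take that model and a matrix of test repetitions and return one anomaly score per row. The mean-vector model with a city-block distance is chosen only for simplicity, and the timing values are made up.

```r
# Assumed interface: YTrain is a numeric matrix, one row per repetition.
newTrain <- function( YTrain ) {
  # The model is simply the mean timing vector of the training data.
  return( list( mean = colMeans( YTrain ) ) );
}

# Assumed interface: returns one anomaly score per row of YScore.
newScore <- function( dmod, YScore ) {
  # Subtract the mean vector from every row, then sum absolute deviations.
  deviations <- sweep( YScore, 2, dmod$mean );
  return( rowSums( abs( deviations ) ) );
}

# Made-up timing vectors, for illustration only.
YTrain <- rbind( c( 0.1, 0.3 ), c( 0.3, 0.5 ) );
YScore <- rbind( c( 0.2, 0.4 ), c( 0.5, 0.4 ) );

dmod <- newTrain( YTrain );
scores <- newScore( dmod, YScore );
print( scores );

# Registering the pair in the form shown above.
detectorSet = list( NewDetector = list( train = newTrain, score = newScore ) );
```

A row identical to the training mean scores zero, and scores grow as the timings drift away from it. Check the evaluation script itself for the exact arguments it passes before plugging in a new detector.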
| Detector | Average Equal-Error Rate (stddev) |
|---|---|
| Manhattan (scaled) | 0.0962 (0.0694) |
| Nearest Neighbor (Mahalanobis) | 0.0996 (0.0642) |
| Outlier Count (z-score) | 0.1022 (0.0767) |
| SVM (one-class) | 0.1025 (0.0650) |
| Mahalanobis (normed) | 0.1101 (0.0645) |
| Manhattan (filter) | 0.1360 (0.0828) |
| Neural Network (auto-assoc) | 0.1614 (0.0797) |
| Euclidean (normed) | 0.2153 (0.1187) |
| Fuzzy Logic | 0.2213 (0.1051) |
| k Means | 0.3722 (0.1391) |
| Neural Network (standard) | 0.8283 (0.1483) |