The data consist of keystroke-timing information from 51 subjects, each of whom typed a password (.tie5Roanl) 400 times.
DSL-StrongPasswordData.txt (fixed-width format)
DSL-StrongPasswordData.csv (comma-separated-value format)
DSL-StrongPasswordData.xls (Excel format)
The password (.tie5Roanl) was chosen to be representative of a strong 10-character password.
In R, the read.table command can be used to read the data into a structure called a data.frame:
X <- read.table( 'DSL-StrongPasswordData.txt', header=TRUE )
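As a minimal sketch of what can be done right after loading, the snippet below reads a one-row excerpt (the row quoted in this document's example) and inspects it. It assumes an R version whose read.table accepts inline text through the `text` argument, which is newer than the 2.6.2 release mentioned on this page; the variable names `excerpt` and `dataFile` are illustrative only.

```r
# One-row excerpt copied from the example in this document; the real file
# has many more rows and timing columns.
excerpt <- "subject sessionIndex rep H.period DD.period.t UD.period.t
s002 1 1 0.1491 0.3979 0.2488"

# Reading from inline text keeps the sketch runnable without the data file.
X <- read.table(text = excerpt, header = TRUE)

# If the full file is in the working directory, load it instead.
dataFile <- "DSL-StrongPasswordData.txt"
if (file.exists(dataFile)) {
  X <- read.table(dataFile, header = TRUE)
}

str(X)                      # column names and types
length(unique(X$subject))   # number of distinct subjects in the table
```

With the full file loaded, the last line should report the number of subjects in the data set.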
The first column, subject, is a unique identifier for each subject (e.g., s002 or s057). Even though the data set contains 51 subjects, the identifiers do not range from s001 to s051; subjects have been assigned unique IDs across a range of keystroke experiments, and not every subject participated in every experiment. For instance, Subject 1 did not perform the password typing task and so s001 does not appear in the data set. The second column, sessionIndex, is the session in which the password was typed (ranging from 1 to 8). The third column, rep, is the repetition of the password within the session (ranging from 1 to 50).
Column names of the form H.key
designate a hold time for the named key (i.e., the time from when key was pressed to when it was released). Column names of the form DD.key1.key2 designate a keydown-keydown time for the named digraph (i.e., the time from when key1 was pressed to when key2 was pressed). Column names of the form UD.key1.key2 designate a keyup-keydown time for the named digraph (i.e., the time from when key1 was released to when key2 was pressed). Note that UD times can be negative (when key2 is pressed before key1 is released), and that H times and UD times add up to DD
times.

subject | sessionIndex | rep | H.period | DD.period.t | UD.period.t | ... |
---|---|---|---|---|---|---|
s002 | 1 | 1 | 0.1491 | 0.3979 | 0.2488 | ... |

The example presents typing data for subject 2, session 1, repetition 1. The period key was held down for 0.1491 seconds (149.1 milliseconds); the time between pressing the period key and the t key (keydown-keydown time) was 0.3979 seconds; the time between releasing the period key and pressing the t key (keyup-keydown time) was 0.2488 seconds; and so on.
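The relationship between the three kinds of columns can be checked directly. The sketch below uses only the example row quoted above; the helper name `checkDigraph` and its tolerance are assumptions made for illustration, and the helper expects the caller to supply the key names rather than parsing them out of the column names.

```r
# Example row from this document (timing columns only).
exampleRow <- data.frame(H.period = 0.1491,
                         DD.period.t = 0.3979,
                         UD.period.t = 0.2488)

# Hypothetical helper: TRUE where keydown-keydown time equals
# hold time plus keyup-keydown time, within a small tolerance.
checkDigraph <- function(X, key1, key2, tol = 1e-4) {
  h  <- X[[paste("H", key1, sep = ".")]]
  dd <- X[[paste("DD", key1, key2, sep = ".")]]
  ud <- X[[paste("UD", key1, key2, sep = ".")]]
  abs(dd - (h + ud)) < tol
}

checkDigraph(exampleRow, "period", "t")   # 0.1491 + 0.2488 = 0.3979
```

For the example row the check returns TRUE, since 0.1491 + 0.2488 = 0.3979.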
The evaluation script (evaluation-script.R), written in R (www.r-project.org), demonstrates how to use the data to evaluate three anomaly detectors (called Euclidean, Manhattan, and Mahalanobis). Note that this script depends on the R package ROCR for generating ROC curves [2].
R is available from the R project website (http://www.r-project.org).
The R statistical-programming environment is a general programming
language with many functions and packages for conducting a range of
statistical analyses and data visualizations. It is available for
most modern operating systems, and it is free and open-source. We
developed and tested our evaluation script with R version 2.6.2, but
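we expect that it will work with similar versions of R.
The ROCR package can be installed from within R:
install.packages( 'ROCR' )
The evaluation script can then be run with R's source command:
source('evaluation-script.R')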
The script should print the following results:

Detector | eer.mean | eer.sd |
---|---|---|
Euclidean | 0.171 | 0.095 |
Manhattan | 0.153 | 0.092 |
Mahalanobis | 0.110 | 0.065 |

Note that these results are fractional rates between 0.0 and 1.0 (not percentages between 0% and 100%). They match the average equal-error rates and standard deviations for the detectors from Table 2 of our original paper (and reproduced in the table of results, below). By running this script successfully, you will have replicated our evaluation methodology and reproduced our results for these three detectors.
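For readers unfamiliar with the metric, an equal-error rate is the error rate at the threshold where the false-alarm rate (genuine user rejected) and the miss rate (impostor accepted) coincide. The base-R sketch below illustrates the idea only; it is not the evaluation script's implementation, which depends on ROCR for generating ROC curves. The function name `equalErrorRate`, the convention that higher scores mean "more anomalous", and the scores themselves are assumptions made for illustration.

```r
# Hypothetical helper: approximate equal-error rate from anomaly scores.
# Assumes higher scores are more anomalous. No interpolation is done;
# the threshold where the two error rates are closest is used.
equalErrorRate <- function(userScores, impostorScores) {
  thresholds <- sort(unique(c(userScores, impostorScores)))
  # Genuine-user samples scored above the threshold are false alarms.
  falseAlarm <- vapply(thresholds, function(t) mean(userScores > t), numeric(1))
  # Impostor samples scored at or below the threshold are misses.
  miss <- vapply(thresholds, function(t) mean(impostorScores <= t), numeric(1))
  closest <- which.min(abs(falseAlarm - miss))
  (falseAlarm[closest] + miss[closest]) / 2
}

# Made-up, perfectly separated scores (illustration only).
equalErrorRate(userScores = c(1, 2, 3), impostorScores = c(4, 5, 6))
```

With perfectly separated scores as above the function returns 0; when the two score sets are identical it returns 0.5.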
If a new detector were implemented as a pair of training and scoring functions, newTrain and newScore, it could be evaluated simply by adding these two functions to the detectorSet list of detectors:
detectorSet = list( NewDetector = list( train = newTrain, score = newScore ) );
Our intent in sharing the data is for the password-timing tables to be used to evaluate a range of anomaly detectors so that the results of the evaluations can be soundly compared, using the same data and the same evaluation procedure. Consequently, we encourage other researchers to use our evaluation script to evaluate new and better anomaly-detection strategies for keystroke dynamics.
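To make the train/score pairing concrete, here is a sketch of what such a pair of functions might look like. The function signatures are assumptions: this page shows only the list structure, so the sketch supposes that the training function receives a matrix of one user's timing vectors and returns a model, and that the scoring function receives that model plus a matrix of test vectors and returns one anomaly score per row (higher meaning more anomalous). The detector itself, a Manhattan distance scaled by each feature's mean absolute deviation, is chosen for illustration and is not claimed to reproduce any entry in the table of results.

```r
# Hypothetical training function: summarize the genuine user's timing vectors.
newTrain <- function(YTrain) {
  YTrain <- as.matrix(YTrain)
  center <- colMeans(YTrain)
  # Mean absolute deviation of each feature from its mean.
  spread <- colMeans(abs(sweep(YTrain, 2, center)))
  # Guard against division by zero for constant features.
  spread <- pmax(spread, .Machine$double.eps)
  list(center = center, spread = spread)
}

# Hypothetical scoring function: scaled Manhattan distance to the center.
newScore <- function(model, YScore) {
  YScore <- as.matrix(YScore)
  deviations <- abs(sweep(YScore, 2, model$center))
  rowSums(sweep(deviations, 2, model$spread, "/"))
}

# Registration, following the list structure shown in this document.
detectorSet <- list(NewDetector = list(train = newTrain, score = newScore))
```

If the evaluation script expects different argument names or return types, the two functions would need to be adapted to match.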
Detector | Average Equal-Error Rate (stddev) |
---|---|
Manhattan (scaled) | 0.0962 (0.0694) |
Nearest Neighbor (Mahalanobis) | 0.0996 (0.0642) |
Outlier Count (z-score) | 0.1022 (0.0767) |
SVM (one-class) | 0.1025 (0.0650) |
Mahalanobis | 0.1101 (0.0645) |
Mahalanobis (normed) | 0.1101 (0.0645) |
Manhattan (filter) | 0.1360 (0.0828) |
Manhattan | 0.1529 (0.0925) |
Neural Network (auto-assoc) | 0.1614 (0.0797) |
Euclidean | 0.1706 (0.0952) |
Euclidean (normed) | 0.2153 (0.1187) |
Fuzzy Logic | 0.2213 (0.1051) |
k Means | 0.3722 (0.1391) |
Neural Network (standard) | 0.8283 (0.1483) |