Principal Investigator |
Roy A. Maxion, Carnegie Mellon University
(maxion@cs.cmu.edu)
|
Project Heading |
Invictus |
Objective |
Develop a synthetic environment that generates realistic, but
carefully controlled, datasets for testing anomaly-detection systems.
|
Overview |
Cinnamon is a synthetic environment to generate realistic system performance
data that enables:
- Evaluation of competing anomaly-detection methodologies, either
within the Invictus project or across other projects, benchmarked
to a common standard.
- Assessment, in a statistically rigorous fashion, of the capability
of the core algorithms (e.g., robustness of Type I and Type II error
rates to noise, multidimensionality, and nonstationarity).
- Measurement of system scalability (i.e., performance degradation
as a function of the complexity of the system of systems).
- Generation of patterns that imitate evolutionary system behavior,
as well as patterns that represent attempted intrusion or other
kinds of system compromise.
The synthesizer will support designed factorial experiments that
make statistical comparisons of different kinds of system monitors and
enable tuning of Harbinger to specific applications.
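
As an illustration of what a designed factorial experiment could look like,
the following minimal Python sketch enumerates every combination of a few
factor levels, so that each monitor under comparison can be run once per
combination. The factor names and levels (noise_sd, dimensions, drift) and
the helper full_factorial are hypothetical; this report does not list the
factors that Cinnamon actually exposes.

```python
import itertools

# Hypothetical factors and levels, chosen only for illustration.
FACTORS = {
    "noise_sd": [0.5, 1.0, 2.0],
    "dimensions": [1, 4],
    "drift": ["none", "linear"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels (full factorial design)."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in itertools.product(*level_lists)]


design = full_factorial(FACTORS)

# Each entry describes one dataset to generate and feed to every monitor.
for run_id, cell in enumerate(design):
    print(run_id, cell)
```

A full factorial design grows multiplicatively with the number of levels
(here 3 x 2 x 2 = 12 runs), which is one reason a synthetic generator is
convenient: every cell can be produced on demand under controlled conditions.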
|
Work Completed |
The following items have been completed:
- Generate univariate interval data from any one of the following
statistical distributions: Binomial, Cauchy, Chi-squared,
Exponential, F, Normal, Poisson, T, Uniform, or Weibull.
- Generate multivariate interval data from either of the following
statistical distributions: ARMA (autoregressive moving average)
or Multivariate Normal.
- Generate univariate nominal data using any one of the following
statistical distributions: Binomial, Multinomial, or Markov Model.
- Add linear, exponential, or sinusoidal drift to generated univariate
interval data.
- Insert perturbations into generated data, where a perturbation is
data generated from a different statistical distribution.
- Generate autocorrelated univariate interval data.
- Generate data that is a probabilistic mixture of two or more
statistical distributions.
- Provide the capability for continual data generation.
- Write text for About Cinnamon and About Data.
- Develop a web interface to allow the user to enter a specification
of the data to be generated, to view graphs of the generated data,
and to retrieve the generated data files.
- Write help text for each statistical distribution that describes
an appropriate application of the distribution.
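
Several of the completed items can be illustrated with a short Python sketch.
It is not Cinnamon's implementation, which this report does not show; all
function names, parameters, and numeric values below are assumptions made for
illustration. The sketch generates univariate Normal data, adds drift,
overwrites a window with a perturbation drawn from a different distribution,
and also shows a probabilistic mixture and a simple autocorrelated (AR(1))
series.

```python
import math
import random


def generate_normal(n, mean, sd, rng):
    """Univariate interval data drawn from a Normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def add_drift(series, kind="linear", scale=0.01, period=100.0):
    """Add linear, exponential, or sinusoidal drift (zero at t = 0)."""
    out = []
    for t, x in enumerate(series):
        if kind == "linear":
            d = scale * t
        elif kind == "exponential":
            d = math.exp(scale * t) - 1.0
        elif kind == "sinusoidal":
            d = scale * math.sin(2.0 * math.pi * t / period)
        else:
            raise ValueError("unknown drift kind: " + kind)
        out.append(x + d)
    return out


def insert_perturbation(series, start, length, sampler):
    """Replace a window of the series with draws from another distribution."""
    out = list(series)
    for t in range(start, min(start + length, len(out))):
        out[t] = sampler()
    return out


def mixture(n, samplers, weights, rng):
    """Probabilistic mixture: pick a component by weight for each observation."""
    chosen = rng.choices(samplers, weights=weights, k=n)
    return [draw() for draw in chosen]


def ar1(n, phi, sd, rng):
    """Autocorrelated series: x[t] = phi * x[t-1] + Normal(0, sd) noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out


rng = random.Random(42)  # fixed seed so the dataset is reproducible
base = generate_normal(1000, 10.0, 1.0, rng)
drifted = add_drift(base, kind="linear", scale=0.005)
# Perturbation: 50 time steps of Exponential data inside the Normal series.
perturbed = insert_perturbation(drifted, 400, 50, lambda: rng.expovariate(0.05))
```

Seeding the generator is a design choice worth noting: it keeps a dataset
carefully controlled in the sense that the same specification always yields
the same data, so competing detectors see identical inputs.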
|
Work In Progress |
There is currently no work in progress.
|
Future Plans |
The following items are planned for future development:
- Write text for About Random Numbers.
- Allow the user to submit a file containing a specification of the
data to be generated, rather than entering it through the web interface.
- Add capability to specify intermittent perturbations (e.g., insert
perturbation X every 500 time steps).
- Add capability to specify lagged perturbations (e.g., perturb vector
2 with perturbation X 5 time steps after perturbing vector 1 with
perturbation Y).
- Produce a key which enumerates the perturbations inserted into the
data and the details/characteristics for each.
- Validate the specifications entered by the user to ensure there
are no conflicts and that all necessary information has been entered.
- Modify the web interface to include default values for each user-specified
entry that correspond to the example scenario for the distribution.
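
The planned intermittent and lagged perturbations, together with the key that
enumerates them, could be sketched as follows. These features are listed above
as future work, so this is only one possible design under stated assumptions:
the helper names, the 20-step perturbation length, the distributions used for
X and Y, and the layout of the key entries are all hypothetical.

```python
import random


def intermittent_starts(n, every):
    """Start indices for a perturbation inserted every `every` time steps."""
    return list(range(every, n, every))


def apply_perturbations(series, starts, length, sampler, label, key):
    """Overwrite a window at each start index and record it in the key."""
    out = list(series)
    for start in starts:
        end = min(start + length, len(out))
        for t in range(start, end):
            out[t] = sampler()
        # The key lists each inserted perturbation and its location.
        key.append({"label": label, "start": start, "end": end})
    return out


rng = random.Random(7)
n = 2000
vector1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
vector2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
key = []

# Intermittent: perturbation Y hits vector 1 every 500 time steps.
starts_y = intermittent_starts(n, every=500)
vector1_p = apply_perturbations(
    vector1, starts_y, 20, lambda: rng.gauss(5.0, 1.0), "Y on vector 1", key
)

# Lagged: perturbation X hits vector 2 five time steps after each Y.
lag = 5
starts_x = [s + lag for s in starts_y if s + lag < n]
vector2_p = apply_perturbations(
    vector2, starts_x, 20, lambda: rng.uniform(-10.0, 10.0), "X on vector 2", key
)
```

Recording the key at insertion time, rather than reconstructing it afterward,
gives a ground-truth record against which a detector's alarms can be scored.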
|