Harbinger: Anomaly Detection Techniques

Principal Investigator

Roy A. Maxion, Carnegie Mellon University (maxion@cs.cmu.edu)

Project Heading



Build a system that learns about a new or changing environment in much the same way that people learn, and that automatically detects anomalous behavior in such environments.


Harbinger is a core set of anomaly-detection techniques and algorithms that apply new statistical methods for high-dimensional data analysis to the problem of detecting system anomalies that indicate intrusion or other kinds of system compromise. Goals include:
  • Detectability of both new, previously unseen system threats and previously known threats, with a 98% detection rate and a 5% false-alarm rate.
  • Adaptability to evolutionary drift, so that changing patterns of normal system usage will not trigger impractical numbers of false alarms.
  • Tunability, so that the system can be tailored to be particularly, but not exclusively, sensitive to the kinds of threats that are anticipated in the local context.
  • Visualizability of the structure of high-dimensional data to support management assessment of suspected threats hidden in complex system performance data.
  • Discoverability of unanticipated patterns in data, with the capability to generate descriptive rules for future use.
Feasibility will be demonstrated, and technology transition facilitated, through real-world application in collaboration with an industrial partner (e.g., international banking, telecommunications, semiconductor fabrication).
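The detection and false-alarm targets above can be made concrete by scoring a detector's alarms against labeled events. The Python sketch below assumes the conventional definitions, detection rate = TP / (TP + FN) and false-alarm rate = FP / (FP + TN); the summary does not say which denominators the 98% and 5% figures use (a false-alarm rate is sometimes instead reported as the share of alarms that are false). The names `Outcome`, `detection_metrics`, and `meets_targets`, and the labeled run, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    is_threat: bool  # ground truth: was this event a real threat?
    alarmed: bool    # did the detector raise an alarm for it?


def detection_metrics(outcomes):
    """Return (detection_rate, false_alarm_rate) for a list of Outcome records.

    detection_rate   = TP / (TP + FN): share of real threats that were flagged
    false_alarm_rate = FP / (FP + TN): share of normal events that were flagged
    Either rate is reported as 0.0 when its denominator is empty.
    """
    tp = sum(o.is_threat and o.alarmed for o in outcomes)
    fn = sum(o.is_threat and not o.alarmed for o in outcomes)
    fp = sum(not o.is_threat and o.alarmed for o in outcomes)
    tn = sum(not o.is_threat and not o.alarmed for o in outcomes)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate


def meets_targets(outcomes, min_detection=0.98, max_false_alarm=0.05):
    """Check the stated goal: >= 98% detection at <= 5% false alarms."""
    detection, false_alarm = detection_metrics(outcomes)
    return detection >= min_detection and false_alarm <= max_false_alarm


# Hypothetical labeled run: 50 threats (49 caught), 100 normal events (4 flagged).
outcomes = ([Outcome(True, True)] * 49 + [Outcome(True, False)]
            + [Outcome(False, True)] * 4 + [Outcome(False, False)] * 96)
print(detection_metrics(outcomes))  # (0.98, 0.04)
print(meets_targets(outcomes))      # True
```

Under these definitions, a result with no misses and no false alarms corresponds to a detection rate of 1.0 and a false-alarm rate of 0.0.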

Work Completed

The following items have been completed:
  • Longitudinal data collected from several large Ethernet networks were analyzed to discover anomalous behavior associated with network fault conditions, achieving 86% correct diagnosis overall and up to 99% for selected faults.
  • Data drawn from real-time monitoring of over 46,000 production wafers in an operational fabrication plant were used to derive rules for diagnosing faults in semiconductor silicon-wafer fabrication. These rules achieved 100% correct fault detection and diagnosis, with no misses and no false alarms. The semiconductor fabrication domain is especially difficult because the fabrication process is inherently nonstationary: its normal operating characteristics drift over time.
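Nonstationarity is also why the goals call for adaptability to drift and for tunable sensitivity. One common, generic way to cope with slow drift is to compare each new measurement against an exponentially weighted moving baseline rather than a fixed one. The summary does not describe Harbinger's own methods, so the Python sketch below is only an illustration of that general idea, not the technique used in the wafer study; the class `EwmaDetector`, its parameters `alpha`, `k`, and `warmup`, and the synthetic `drifting_series` signal are all hypothetical.

```python
import math


class EwmaDetector:
    """Flag values that stray too far from an exponentially weighted baseline.

    alpha:  smoothing factor in (0, 1]; larger values track drift faster.
    k:      alarm threshold in (estimated) standard deviations; lower k means
            a more sensitive detector.
    warmup: number of initial observations used only to build the baseline.
    """

    def __init__(self, alpha=0.1, k=4.0, warmup=10):
        self.alpha = alpha
        self.k = k
        self.warmup = warmup
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Return True if x is anomalous; otherwise fold x into the baseline."""
        self.n += 1
        if self.mean is None:          # first observation seeds the baseline
            self.mean = x
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        anomalous = (self.n > self.warmup
                     and abs(diff) > self.k * max(std, 1e-9))
        if not anomalous:
            # Incremental exponentially weighted mean and variance update.
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


def drifting_series(n=200, spike_at=150):
    """Synthetic signal: slow upward drift, small bounded noise, one spike."""
    for t in range(n):
        noise = ((t * 7) % 5 - 2) * 0.5        # repeating values in [-1, 1]
        spike = 20.0 if t == spike_at else 0.0
        yield 100 + 0.05 * t + noise + spike


detector = EwmaDetector(alpha=0.1, k=4.0)
flags = [t for t, x in enumerate(drifting_series()) if detector.update(x)]
print(flags)  # [150]: the drift itself (+10 over 200 steps) is never flagged
```

Because anomalous points are not folded into the baseline, a sustained intrusion cannot gradually teach the detector that it is normal. The trade-off is that a genuine, abrupt level shift keeps alarming until the baseline is reset.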

Work In Progress

The following items are currently in progress:
  • Data from mission-critical computer systems (e.g., the national telecommunications infrastructure, regional power grids, international financial systems, and corporate intranets) are being analyzed to detect past or ongoing information-warfare intrusions.
  • Anomaly-detection tools are being developed, and the efficacy of each is being evaluated.

Future Plans

Near-term plans are to develop additional anomaly-detection tools, evaluate their efficacy, and apply them to real-world data.