Jia-Yu Pan's Publications


FastCARS: Fast, Correlation-Aware Sampling for Network Data Mining

Jia-Yu Pan, Srinivasan Seshan, and Christos Faloutsos. FastCARS: Fast, Correlation-Aware Sampling for Network Data Mining. CMU-CS-02-167, Carnegie Mellon University, 2002.

Download

[PDF] 1.2MB  [gzipped postscript] 556.0kB

Abstract

Measuring traffic on routers is vital for finding patterns, traffic modeling, and anomaly detection. Unfortunately, technology trends are making it more and more difficult to observe and record the large amount of data generated by high speed links. Traffic sampling techniques provide a simple alternative that reduces the volume of data collected. Real world data is seldom temporally independent and data observed at one time is likely to have important correlations with data observed at close-by instants in time. A good sampling method should be able to give measurements that take this correlation into account. Unfortunately, existing sampling techniques largely hide any temporal relationship in the recorded data.
Our proposed method, "FastCARS", naturally captures statistics for packets that are 1, 2 or more steps away. It has the following properties: (a) provides accurate measurements of the full trace's statistics, (b) is simple and scalable to implement, (c) captures correlations between successive packets, as well as packets that are further apart, (d) evenly spreads sampling effort over time, and (e) generalizes previously proposed sampling methods and includes them as special cases.
We also propose several new tools for network data mining and demonstrate the good quality of the information provided by FastCARS. These tools include: (a) the n-step histograms, which give correlated statistics at different levels of temporal correlation, (b) the convolution test, which can be used to examine the level of dependence between packet arrivals, (c) the n-step packet-size/delay graph, which provides accurate bandwidth estimation and load monitoring, and (d) the 1-step flow graph, which effectively visualizes flow patterns hidden in a trace.
The experimental results on multiple real-world datasets (479Mb in total) show that the proposed FastCARS sampling method and these new data mining tools are effective. With these tools, we show that the independence assumption of packet arrival is not correct, and that packet trains may not be the only cause of dependence among arrivals. The provided tools may be useful in applications such as monitoring link load and traffic flows.

BibTeX Entry

@TechReport{TechReport02FastCARS,
  author =       {Jia-Yu Pan and Srinivasan Seshan and Christos Faloutsos},
  title =        {FastCARS: Fast, Correlation-Aware Sampling for Network Data Mining},
  institution =  {Carnegie Mellon University},
  number =       {CMU-CS-02-167},
  year =         2002,
  abstract = {Measuring traffic on routers is vital for finding patterns, traffic modeling, and anomaly detection.  Unfortunately, technology trends are making it more and more difficult to observe and record the large amount of data generated by high speed links. Traffic sampling techniques provide a simple alternative that reduces the volume of data collected.  Real world data is seldom temporally independent and data observed at one time is likely to have important correlations with data observed at close-by instants in time.  A good sampling method should be able to give measurements that take this correlation into account.  Unfortunately, existing sampling techniques largely hide any temporal relationship in the recorded data. <br>
Our proposed method, ``FastCARS'', naturally captures statistics for
packets that are 1, 2 or more steps away. It has the
following properties: (a) provides accurate measurements
of the full trace's statistics, (b) is simple and scalable to implement,
(c) captures correlations between successive packets,
as well as packets that are further apart,
(d) evenly spreads sampling effort over time, and
(e) generalizes previously proposed sampling methods and
includes them as special cases. <br>
We also propose several new tools for network data mining and demonstrate the good quality of the information provided by FastCARS. These tools include:
(a) the \textit{$n$-step histograms}, which give correlated statistics at different levels of temporal correlation,
(b) the \textit{convolution test}, which can be used to examine the level of dependence between packet arrivals,
(c) the \textit{$n$-step packet-size/delay graph}, which provides accurate bandwidth estimation and load monitoring, and
(d) the \textit{1-step flow graph} which effectively visualizes flow patterns hidden in a trace. <br>
The experimental results on multiple real-world datasets (479Mb in
total) show that the proposed FastCARS sampling method and these new data
mining tools are effective.  With these tools, we show that the
independence assumption of packet arrival is not correct, and that packet
trains may not be the only cause of dependence among arrivals.
The provided tools may be useful in applications such as monitoring link load and traffic flows.},
  bib2html_pubtype = {Tech Report},
  bib2html_rescat = {Network Data Mining},
}
