SIGMOD 2004 tutorial

Indexing and Mining Streams.

Christos Faloutsos, CMU


DESCRIPTION - OBJECTIVES

How can we find patterns in a sequence of sensor measurements (e.g., a sequence of temperatures, or water-pollutant measurements)? How can we compress it? What are the major tools for forecasting and outlier detection? The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find patterns in sensor sequences. Sensor data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing on-sensor processing abilities. We review the state of the art in three related fields: (a) fast similarity search for time sequences, (b) linear forecasting with the traditional AR (autoregressive) and ARIMA methodologies, and (c) non-linear forecasting for chaotic/self-similar time sequences, using lag-plots and fractals. The emphasis of the tutorial is on the intuition behind these powerful tools, which is usually lost in the technical literature, and on case studies that illustrate their practical use.
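As a rough illustration of item (b), the simplest autoregressive model, AR(1), predicts each value as a linear function of the previous one. The sketch below is not taken from the tutorial foils: the function names `fit_ar1` and `forecast` and the synthetic readings are assumptions made only for this example, and the fit is an ordinary least-squares regression of each sample on its predecessor.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b (hypothetical helper)."""
    prev = series[:-1]   # x[t-1]
    curr = series[1:]    # x[t]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three samples")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("series is constant; slope is undefined")
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    a = cov / var_prev
    b = mean_curr - a * mean_prev
    return a, b


def forecast(series, a, b, steps=1):
    """Iterate the fitted model forward from the last observed value."""
    out = []
    last = series[-1]
    for _ in range(steps):
        last = a * last + b
        out.append(last)
    return out


# Synthetic, noise-free readings generated by x[t] = 0.5 * x[t-1] + 10
# (made-up data, standing in for a temperature sequence).
readings = [30.0]
for _ in range(9):
    readings.append(0.5 * readings[-1] + 10.0)

a, b = fit_ar1(readings)
next_values = forecast(readings, a, b, steps=3)
```

On this noise-free input the fit should recover the generating coefficients almost exactly. Real sensor data is noisy and usually needs more lags (AR(p)) or differencing (ARIMA), which is where the methodologies covered in the tutorial come in.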

NOTICE: At SIGMOD, Prof. Dennis Shasha will deliver a related but complementary tutorial, which will discuss multi-window techniques for burst detection, moving-window correlation, a query language for order, and applications in physics, music, and finance.

FOILS

The pdf of the foils is here

CONTENT AND OUTLINE

GOAL - WHO SHOULD ATTEND

Researchers who want to get up to speed with the major tools in time sequence analysis, and practitioners who want a concise, intuitive overview of the state of the art.

PREREQUISITES:

None. The emphasis is on the intuition behind all these mathematical tools.

PRESENTER - BIO

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), four "best paper" awards, and several teaching awards. He is a member of the executive committee of SIGKDD; he has published over 120 refereed articles and one monograph, and holds four patents. His research interests include data mining in streams and graphs, fractals, indexing methods for multimedia and text databases, and database performance.