Mining Large Time-evolving Data Using Matrix and Tensor Tools
SDM 2007 tutorial, Minneapolis, MN
DESCRIPTION - OBJECTIVES
How can we find patterns in sensor streams (e.g., a
sequence of temperatures, water-pollutant measurements, or
machine room measurements)?
How can we mine Internet traffic graphs over time?
Further, how can we make the process incremental?
We review the state of the art in four related fields:
(a) numerical analysis and linear algebra (b) multi-linear/tensor analysis
(c) graph mining and (d) stream mining.
We will present theoretical results and algorithms, as well as
case studies on several real applications.
Our emphasis is on the intuition behind each method,
and on guidelines for the practitioner.
CONTENT AND OUTLINE
- Part I. Core
- Data model - Fundamental concepts
- Time series
- Matrix analysis
- SVD, PCA and eigen-decomposition
- PageRank, HITS
- sparse decompositions: CUR
- Co-clustering and cross-associations
- Tensor analysis
- Tucker Model
- Tucker 1 and PCA
- Tucker 2 and Tensor PCA
- Tucker 3 and High-order SVD (HO-SVD)
- Other models
- Combination of PARAFAC and Tucker
- Part II. Extensions
- Nonnegative matrix factorization
- Nonnegative tensor factorization
- Missing values
- Stream mining
- Incremental PCA
- Dynamic tensor analysis
- Window-based tensor analysis
- Part III. Practitioner's guide
- Issues: Scalability, Accuracy, Sparsity
- Case studies
- sensor network, machine monitoring
- Internet forensic computing
- social network analysis
- web graph study
WHO SHOULD ATTEND
Researchers who want to get up to speed with the major tools
in stream mining and graph mining.
Also, practitioners who want a concise, intuitive
overview of the state of the art.
ABOUT THE INSTRUCTORS
Christos Faloutsos is a Professor at Carnegie Mellon University.
He has received the Presidential Young Investigator Award from
the National Science Foundation (1989), seven "best paper" awards,
and several teaching awards.
He has served as a member of the executive committee of SIGKDD;
he has published over 140 refereed articles, one monograph,
and holds five patents. His research interests include data mining
for streams and networks, fractals, indexing for
multimedia and bioinformatics databases, and performance.
Tamara G. Kolda is a researcher at Sandia National Laboratories in
Livermore, California and has received the Presidential Early
Career Award for Scientists and Engineers (2003). She has
published over 25 refereed articles and released several software
packages including the MATLAB Tensor Toolbox. She is an associate
editor for the SIAM Journal on Scientific Computing. Her research
interests include multilinear algebra and tensor decompositions,
data mining, optimization, nonlinear solvers, graph algorithms,
parallel computing and the design of scientific software.
is a PhD candidate in the Computer Science Department at Carnegie
Mellon University. His research interests include data mining on streams,
graphs and tensors, and anomaly detection.
Last updated: Feb. 3, 2007