TITLE: Mining large graphs and streams using matrix and tensor tools

INSTRUCTORS:
  Christos Faloutsos, CMU
  Tamara G. Kolda, Sandia National Labs
  Jimeng Sun, CMU

INTENDED DURATION: 3 hours

DESCRIPTION - OBJECTIVES

How can we find patterns in sensor streams (e.g., a sequence of temperatures, water-pollutant measurements, or machine room measurements)? How can we mine Internet traffic graphs over time? Further, how can we make the process incremental? We review the state of the art in four related fields: (a) numerical analysis and linear algebra, (b) multi-linear/tensor analysis, (c) graph mining, and (d) stream mining. We will present theoretical results and algorithms, as well as case studies on several real applications. Our emphasis is on the intuition behind each method, and on guidelines for the practitioner.

CONTENT AND OUTLINE

Note to evaluators: We are asking for a 3-hour duration, to include all parts. However, we could easily create a 2-hour version by omitting Part II below.

[Part I. Core - 1.5 hours]
  Data model
    - Fundamental concepts
    - Time series
    - Matrices
    - Tensors
  Matrix analysis
    - SVD, PCA and eigen-decomposition
    - PageRank, HITS
    - Independent Component Analysis (ICA)
    - Sparse decompositions: CUR
    - Semi-discrete decomposition (SDD)
    - Co-clustering
  Tensor analysis
    - Intro
    - PARAFAC
    - Tucker model
        Tucker 1 and PCA; Tucker 2 and Tensor PCA; Tucker 3 and High-order SVD (HO-SVD)
    - Other models
        Combination of PARAFAC and Tucker
        DEDICOM

[Part II. Extensions - 1 hour]
  Non-negativity
    - Nonnegative matrix factorization
    - Nonnegative tensor factorization
  Missing values
    - Matrices
    - Tensors
  Stream mining
    - Incremental PCA
    - Dynamic tensor analysis
    - Window-based tensor analysis

[Part III. Practitioner's guide - 30 minutes]
  Software
    - Intro
    - Issues
        Scalability, accuracy, sparsity
  Case studies
    - Sensor networks, machine monitoring
    - Internet forensic computing
    - Social network analysis
    - Web graph study

WHO SHOULD ATTEND

Researchers who want to get up to speed with the major tools in stream mining and graph mining. Also, practitioners who want a concise, intuitive overview of the state of the art.

RELATED PREVIOUS TUTORIALS (if any)

None

ABOUT THE INSTRUCTORS

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), seven "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 140 refereed articles and one monograph, and holds five patents. His research interests include data mining for streams and networks, fractals, indexing for multimedia and bio-informatics databases, and performance.

Tamara G. Kolda is a researcher at Sandia National Laboratories in Livermore, California, and has received the Presidential Early Career Award for Scientists and Engineers (2003). She has published over 25 refereed articles and released several software packages, including the MATLAB Tensor Toolbox. She is an associate editor for the SIAM Journal on Scientific Computing. Her research interests include multilinear algebra and tensor decompositions, data mining, optimization, nonlinear solvers, graph algorithms, parallel computing, and the design of scientific software.

Jimeng Sun is a PhD candidate in the Computer Science Department at Carnegie Mellon University. His research interests include data mining on streams, graphs and tensors, and anomaly detection.