Large graph mining: patterns, tools and case studies

Tutorial proposal for CIKM 2008, Napa Valley, California

Christos Faloutsos and Hanghang Tong,
Carnegie Mellon University


How do graphs look like? How do they evolve over time? How can we find patterns, anomalies and regularities in them? How to find influential nodes in the network? We will present both theoretical results and algorithms as well as case studies on several real applications. Our emphasis is on the intuition behind each method, and on guidelines for the practitioner.

The tutorial has the following parts: (a) Statistical properties and models and graph generators of static and evolving networks. (b) Tools for the analysis of static and dynamic graphs, like the Singular Value Decomposition, tensor decomposition for community detection, HITS/PageRank etc. (c) Proximity measurements on graphs, the main ideas to quantify the closeness of two nodes of the graph, fast algorithms to compute the proximity scores, applications of proximity, like CenterPiece subgraphs, pattern match, trend analysis etc. (d) Case studies of how a virus or information or influence spreads through the network, how to  find influential bloggers or nodes to target for viral marketing, how to find fraudsters on eBay, how to find communities on graphs.

Keywords: Graph mining, linear algebra, SVD, tensors, pageRank


[Part 1 | Part 2 | Part 3 | Part 4]

Aims and Learning Objectives

The goal of the tutorial is to cover the most powerful tools for the analysis of large, real graphs. The tutorial starts old and new patterns that most real graphs obey (small diameter, power laws etc). It continues with powerful, traditional tools from linear algebra (singular value decomposition SVD, eigenvalue analysis); it shows that they form the basis for the extremely successful PageRank and HITS algorithms; and it concludes with more advanced tools, namely, sparse low rank approximations ('CUR' and derivatives).
The next part focuses on proximity of two nodes on a graph, and how to assess it. We describe several measures (electric current, maximum flow, escape probability), we compare them and we focus on the most successful ones and on fast algorithms to compute them.
The tutorial concludes with several case studies: influence propagation, fraud detection on e-bay, a survey of algorithms for community detection and graph partitioning, and a description of the map/reduce method for the analysis of Tera- and Peta-byte scale graphs.

Outline - Description of topics

The proposed format is 1 day (6 hours).

·      Part I: Patterns [1h - Faloutsos]

Target Audience

The target audience is data management, data mining and machine learning researchers and professionals who work on static or time-evolving graphs and want to know about tools and models when dealing with large network datasets.
Computer science background (B.Sc. or equivalent); familiarity with undergraduate linear algebra (eigenvectors). The tutorial will focus on intuition and examples, carefully introducing only the minimal necessary mathematical tools, and always focusing on practical applications.

Tutorial History

Overlapping tutorials have been presented at:

We also plan to submit a short version (3 hours) of this tutorial to ICDE 2009.

About the instructors

christos photo

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), the Research Contributions Award in ICDM 2006, twelve ``best paper'' awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and he has given over 20 tutorials and 10 invited distinguished lectures. His research interests include data mining for streams and graphs, fractals, database performance, and indexing for multimedia and bio-informatics data.

(Full CV at )

Hanghang Tong is a senior Ph.D. student in the Machine Learning Department at Carnegie Mellon University. He has received best paper awards from  SIAM-DM 2008 and ICDM 2006, and he has 25 refereed publications. He holds an M.S. degree and a B.S. degree from Tsinghua University, P.R. China. His research interests include data mining for multimedia and for graphs. (Full CV at )


Related References


  1. Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks 30(1-7): 107-117, 1998.
  2. Randy Bryant. Data Intensive Scientific Computing, Tech report. available at
  3. Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, and Christos Faloutsos. Fully Automatic Cross-Associations, KDD 2004, Washington, DC.
  4. Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. Information-theoretic co-clustering. KDD 2003, Washington, DC.
  5. Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast monte carlo algorithms for matrices iii: Computing a compressed approximate matrix decomposition, SIAM Journal of Computing, 2005.
  6. Jon Kleinberg. Authoritative sources in a hyperlinked environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
  7. Tamara Kolda, Brett Bader, and Joseph Kenny. Higher-order Web link analysis using multilinear algebra, ICDM 2005, Houston, Texas.
  8. Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005, Chicago, IL. ("Best Research Paper" award).
  9. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, ECML/PKDD 2005, Porto, Portugal.
  10. Jure Leskovec and  Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA
  11. Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie S. Glance, and Matthew Hurst. Patterns of Cascading Behavior in Large Blog Graphs, SDM 2007, Minneapolis, Minnesota.
  12. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
  13. Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA.
  14. Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007, Minneapolis, Minnesota. ("Best Research Paper" award)
  15. Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. GraphScope: parameter-free mining of large time-evolving graphs, KDD 2007, San Jose, CA.
  16. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong. ("Best Research Paper" award)
  17. Hanghang Tong and Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA.
  18. Hanghang Tong, Brian Gallagher, Tina Eliassi-Rad, and Christos Faloutsos. Fast best-effort pattern matching in large attributed graphs, KDD 2007, San Jose, CA.
  19. Hanghang Tong, Yehuda Koren, and Christos Faloutsos. Fast direction-aware proximity for graph mining, KDD 2007, San Jose, CA.
  20. Hanghang Tong, Spiros Papadimitriou, Philip S. Yu and Christos Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008, Atlanta, GA. ("Best Paper" award)
  21. Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, and Christos Faloutsos. Colibri: Fast Mining of Large Static and Dynamic Graphs, KDD 2008, Las Vegas, NV.
  22. Yang Wang, Deepayan Chakrabarti, Chenxi Wang and Christos Faloutsos. Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint, SRDS 2003, Florence, Italy.


Last Updated: July 27, 2008