Large graph mining: patterns, tools
and case studies
proposal for CIKM 2008, Napa Valley, California
Faloutsos and Hanghang Tong,
How do graphs look like? How do they evolve over time? How
can we find patterns, anomalies and regularities in them? How to find
influential nodes in the network? We will present both theoretical results and
algorithms as well as case studies on several real applications. Our emphasis
is on the intuition behind each method, and on guidelines for the practitioner.
The tutorial has the following parts: (a) Statistical
properties and models and graph generators of static and evolving networks. (b) Tools for the analysis of static and dynamic graphs, like the
Singular Value Decomposition, tensor decomposition for community detection,
HITS/PageRank etc. (c) Proximity measurements on graphs, the
main ideas to quantify the closeness of two nodes of the graph, fast algorithms
to compute the proximity scores, applications of proximity, like CenterPiece
subgraphs, pattern match, trend analysis etc. (d) Case studies
of how a virus or information or influence spreads through the network, how
to find influential bloggers or nodes to target for viral marketing, how
to find fraudsters on eBay, how to find communities on graphs.
Keywords: Graph mining, linear algebra, SVD, tensors, pageRank
[Part 1 | Part 2 | Part 3 | Part 4]
Aims and Learning Objectives
The goal of the tutorial is to cover the most powerful tools
for the analysis of large, real graphs. The tutorial starts old and new
patterns that most real graphs obey (small diameter, power laws etc). It
continues with powerful, traditional tools from linear algebra (singular value
decomposition SVD, eigenvalue analysis); it shows that they form the basis for
the extremely successful PageRank and HITS algorithms; and it concludes with
more advanced tools, namely, sparse low rank approximations ('CUR' and
The next part focuses on proximity of two nodes on a graph, and how to assess
it. We describe several measures (electric current, maximum flow, escape
probability), we compare them and we focus on the most successful ones and on
fast algorithms to compute them.
The tutorial concludes with several case studies: influence propagation, fraud detection
on e-bay, a survey of algorithms for community detection and graph
partitioning, and a description of the map/reduce method for the analysis of
Tera- and Peta-byte scale graphs.
Outline - Description of topics
The proposed format is 1 day (6 hours).
I: Patterns [1h - Faloutsos]
- patterns on static graphs
- patterns on time-evolving graphs
- patterns on weighted graphs
- Part II: Power Tools [2h - Faloutsos]
- HITS, PageRank
- example-based decomposition: CUR, CMD and Colibri
- tensors - concepts and applications
- Part III: Proximity [1.5h - Tong]
- main ideas, properties & generalizations (random
walks, escape probabilities, etc)
- fast algorithms
- applications: (center-piece subgraphs, pattern match
- Part IV: Case Studies [1.5 hour - Faloutsos]
- virus/influence propagation
- blog analysis
- eBay fraud detection
- community detection and tracking (co-clustering,
cross-associations, graphScope etc)
- tensor for web mining: TOPHITS
- scalability: map/reduce and graph mining
The target audience is data management, data mining and
machine learning researchers and professionals who work on static or
time-evolving graphs and want to know about tools and models when dealing with
large network datasets.
Prerequisites: Computer science background (B.Sc. or equivalent);
familiarity with undergraduate linear algebra (eigenvectors). The tutorial will
focus on intuition and examples, carefully introducing only the minimal
necessary mathematical tools, and always focusing on practical applications.
Overlapping tutorials have been
also plan to submit a short version (3 hours) of this tutorial to ICDE
About the instructors
Christos Faloutsos is a Professor
at Carnegie Mellon University. He has received the Presidential Young
Investigator Award by the National Science Foundation (1989), the Research
Contributions Award in ICDM 2006, twelve ``best paper'' awards, and several
teaching awards. He has served as a member of the executive committee of
SIGKDD; he has published over 160 refereed articles, 11 book chapters and one
monograph. He holds five patents and he has given over 20 tutorials and 10
invited distinguished lectures. His research interests include data mining
for streams and graphs, fractals, database performance, and indexing for
multimedia and bio-informatics data.
CV at www.cs.cmu.edu/~christos/webvitae.pdf
Hanghang Tong is a senior Ph.D. student in the Machine
Learning Department at Carnegie Mellon University. He has received best paper
awards from SIAM-DM 2008 and ICDM 2006, and he has 25 refereed
publications. He holds an M.S. degree and a B.S. degree from Tsinghua
University, P.R. China. His research interests include data mining for
multimedia and for graphs. (Full CV at www.cs.cmu.edu/~htong/pdf/cv_Tong.pdf
- Sergey Brin and Lawrence Page. The Anatomy of a
Large-Scale Hypertextual Web Search Engine, Computer Networks 30(1-7):
- Randy Bryant. Data Intensive Scientific Computing, Tech
report. available at http://www.cs.cmu.edu/~bryant/pubdir/cmu-cs-07-128.pdf.
- Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra
S. Modha, and Christos Faloutsos. Fully Automatic Cross-Associations, KDD
2004, Washington, DC.
- Inderjit S. Dhillon, Subramanyam Mallela, and
Dharmendra S. Modha. Information-theoretic co-clustering. KDD 2003,
- Petros Drineas, Ravi Kannan, and Michael W. Mahoney.
Fast monte carlo algorithms for matrices iii: Computing a compressed
approximate matrix decomposition, SIAM Journal of Computing, 2005.
- Jon Kleinberg. Authoritative sources in a hyperlinked
environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
- Tamara Kolda, Brett Bader, and Joseph Kenny.
Higher-order Web link analysis using multilinear algebra, ICDM 2005,
- Jure Leskovec, Jon Kleinberg and Christos Faloutsos.
Graphs over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL. ("Best Research Paper"
- Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and
Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation
and Evolution, Using Kronecker Multiplication, ECML/PKDD 2005, Porto,
- Jure Leskovec and Christos Faloutsos. Scalable
Modeling of Real Graphs using Kronecker Multiplication, ICML 2007,
Corvallis, OR, USA
- Jure Leskovec, Mary McGlohon, Christos Faloutsos,
Natalie S. Glance, and Matthew Hurst. Patterns of Cascading Behavior in
Large Blog Graphs, SDM 2007, Minneapolis, Minnesota.
- Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang
and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud
Detection in Online Auction Networks WWW 2007, Banff, Alberta, Canada, May
- Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond
Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA.
- Jimeng Sun, Yinglian Xie, Hui Zhang, Christos
Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse
Graphs, SDM 2007, Minneapolis, Minnesota. ("Best Research Paper"
- Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and
Christos Faloutsos. GraphScope: parameter-free mining of large
time-evolving graphs, KDD 2007, San Jose, CA.
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast
Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.
("Best Research Paper" award)
- Hanghang Tong and Christos Faloutsos. Center-Piece
Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia,
- Hanghang Tong, Brian Gallagher, Tina Eliassi-Rad, and
Christos Faloutsos. Fast best-effort pattern matching in large attributed
graphs, KDD 2007, San Jose, CA.
- Hanghang Tong, Yehuda Koren, and Christos Faloutsos.
Fast direction-aware proximity for graph mining, KDD 2007, San Jose, CA.
- Hanghang Tong, Spiros Papadimitriou, Philip S. Yu and Christos
Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008,
Atlanta, GA. ("Best Paper" award)
- Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip
S. Yu, and Christos Faloutsos. Colibri: Fast Mining of Large Static and
Dynamic Graphs, KDD 2008, Las Vegas, NV.
- Yang Wang, Deepayan Chakrabarti, Chenxi Wang and
Christos Faloutsos. Epidemic Spreading in Real Networks: an Eigenvalue
Viewpoint, SRDS 2003, Florence, Italy.
Updated: July 27, 2008