PEGASUS (Peta Graph Mining Project)
Pegasus logo designed by Maria Tsiarli
Goals
The goal of this project is to create a software library using HADOOP
that performs typical graph mining tasks in big graphs, currently handling hundreds of giga-bytes and heading towards tera- and peta-byte sized graphs.
For some information click
here.
Currently, PEGASUS provides software for 0) degree distributions and PageRank, 1) diameter estimation, 2) connected components and
3) triangle counting.
Code
You can get the binaries from
here.
If you want the triangle counting code (JAVA, HADOOP and MATLAB) accompanying the paper "Doulion: Counting Triangles in Massive Graphs with a Coin", KDD '09,
send an email to
graphminingtoolbox@gmail.com.
Related Publications
Collaborators
U Kang and Christos Faloutsos.
Triangles
Goals
My personal interest in this problem, i.e., computing the number of triangles in the graph,
began in the context of social networks where triangles have a natural interpretation: "friends of friends tend to become friends themselves".
Two frequently computed metrics in complex network analysis, involve computing the number of triangles in a graph, namely the
clustering coefficient and the transitivity ratio.
My goal was to develop algorithms that compute triangles fast approximately in large graphs.
Short Description of Contributions
The first contribution was effectively an observation: we can take advantage of the properties of the spectrum of the adjacency matrix of
a typical "power-law" graph and make a good estimate of the number of triangles using few eigenvalues and eigenvectors.
Furthermore, in that same paper, two power laws are observed and a theorem on the number of triangles of the so-called
Kronecker graphs (see Leskovec et.al.) is proved.
Given the experimental nature of the spectral approach (to the best of my knowledge, there is no paper explaining why
the bulk of the eigenvalues is almost symmetric around 0) the goal was to develop an algorithm for the triangle counting problem
that works for all graphs. The first contribution, was a simple sampling idea, i.e., choose a random subset of the edges, let's say
in expectation of size pm, where m is the number of edges p is the percentage of edges we keep, reweight each edge with 1/p
and count weighted triangles. This idea appeared in the KDD '09 paper, with experimental evidence that the estimator is good even
if the graph is very sparse with respect to the number of triangles.
The natural question of what is the minimum sparsification value for the Doulion sparsification idea was answered by M.N. Kolountzakis, Gary L. Miller and me and can be found in our preprint in Arxiv.
For more details, take a look in the related publications.
Related Publications
-
Fast Counting of Triangles in Large Real Networks, without counting: Algorithms and Laws
Charalampos E. Tsourakakis
In IEEE International Conference on Data Mining (ICDM 2008)
Invited as a best paper to KAIS Journal
-
Counting Triangles using Projections: Algorithms and Laws ,
Charalampos E. Tsourakakis
(under review, KAIS Journal)
-
Spectral Counting of Triangles in Power-Law Graphs via Element-Wise Sparsification
Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis and Christos Faloutsos
In Advances in Social Networks Analysis and Mining (ASONAM '09)
-
Spectral Counting of Triangles via Element-Wise Sparsification and Triangle-based Link Recommendation
Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis and Christos Faloutsos
Invited as a chapter to the upcoming book entitled "Advances in Social Networks Analysis and Mining".
-
Doulion: Counting triangles in massive graphs with a coin
Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos
In Knowledge Discovery and Data Mining (KDD '09)
- Approximate Triangle Counting
Charalampos E. Tsourakakis, Mihail N. Kolountzakis, Gary L. Miller
Preprint (all preprints are subject to changes)
Tensor Mining
Goals
The goal of this project is to new develop tensor mining algorithms.
Related Publications
-
Two heads better than one: pattern discovery in time-evolving multi-aspect data
Jimeng Sun, Charalampos E. Tsourakakis, Evan Hoke, Christos Faloutsos and Tina Eliassi-Rad
In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2008)
Invited Journal paper
-
Two heads better than one: pattern discovery in time-evolving multi-aspect data
Jimeng Sun, Charalampos E. Tsourakakis, Evan Hoke, Christos Faloutsos and Tina Eliassi-Rad
In Data Mining and Discovery Journal
-
MACH: Fast Randomized Tensor Decompositions
Charalampos E. Tsourakakis
Preprint (all preprints are subject to changes)
Collaborators
Jimeng Sun, Christos Faloutsos
Graph Mining: Detecting "Bridges"
Related Publications
-
BridgeFind: Fast Detection of Top-k Bridges in Large Graphs
Hanghang Tong, Aditya Prakash, Charalampos E. Tsourakakis Tina Eliassi-Rad, Christos Faloutsos and Duen Horng Chau
Under submission
Graph Similarity
Ongoing work with Ioannis Koutis, Gary Miller and Christos Faloutsos.
Manifold Learning
Ongoing work. Check our reading group by clicking at Don's web page .
Spectral Rounding for Vision and Machine Learning
Image Segmentation produced using the Spectral Rounding Algorithm
Ongoing work with Dave Tolliver.
A recent paper by Dave Tolliver and Gary Miller called Spectral Rounding
introduced a way to segment an image by reweighting iteratively the edges of the
graph in a specific way so that it "breaks up" in a "natural" way
into k components, i.e. the multiplicity of the eigenvalue 0 is k (see paper to learn how this idea actually works).
We are investigating further this method.