HyCoM Hyperbolic Community Model - pdf

HyCoM-FIT is a free software for the identification of communities following the hyperbolic model in graphs. It's a fast and effective method that can be applied to graphs with millions of nodes and edges.
HyCoM-FIT has been successful in detecting communities in social networks and in identifying interesting clusters in the Wikipedia-articles graph.


What do real communities in social networks look like? Community detection plays a key role in understanding the structure of real-life graphs with impact on recommendation systems, load balancing and routing. Previous community detection methods look for uniform blocks in adjacency matrices. However, after studying four real networks with ground-truth communities, we provide empirical evidence that communities are best represented as having an hyperbolic structure. We detail HyCoM - the Hyperbolic Community Model - as a better representation of communities and the relationships between their members, and show improvements in compression compared to standard methods.

We also introduce HyCoM-FIT, a fast, parameter free algorithm to detect communities with hyperbolic structure. We show that our method is effective in finding communities with a similar structure to self-declared ones. We report findings in real social networks, including a community in a blogging platform with over 34 million edges in which more than 1000 users established over 300000 relations.


Quick guide

java -jar comdet.jar <path to configuration file>

HyCoM-FIT is included as part of a community detection framework. Users specify their network and community detection method by editing simple configuration files passed as arguments to the framework.

You can try a simple example by running java -jar comdet.jar; more examples can be found under examples/.

Configuration files

Configuration files have a very simple structure and are very self-explanatory. We suggest the user to start by editing an existing file. The following are mandatory options:

An integer representing the number of dimensions of the data, i.e. 2 for matrices and 3 or more for higher order tensors. In case of doubt, choose 2 for static graphs and 3 for time-evolving data.
The number of elements on each mode of the data, where K represents the modes from 0 to numDimensions-1; elements should be numbered from 0 to dimensionSizeK-1. In case of static non-bipartite graphs, dimensionSize0 should equal dimensionSize1.
Path to a space separated file listing the non-zeros of the graph. Each line should contain numDimension integers between 0 and dimensionSizeK-1. Edges are treated as directed, so be sure to include both directions for undirected networks.
Number of non-zeros (lines) in edgelistfile.
Path for the csv output file.
Shape of the communities to be found, either "hyperbola" or "block".
One per dimensionSize. Used to specify that some (or all) of the modes share community members. If sharedDimensionK = K, then the algorithm will treat this dimension individually; otherwise, the algorithm will consider dimension K and its shared dimension to be the same. For example: in a static graph, you might want to represent a community by either a single set of nodes (in which case sharedDimension1 = 0) or by two separate sets of nodes (in which case sharedDimension0 = 0 and sharedDimension1 = 1).

The following options related to termination are optional. By default, the algorithm will terminate once all non-zeros (edges) have been explained in a community (full deflation).

The algorithm will terminate after numComs suitable communities have been found.
The algorithm will ignore communities smaller than this threshold.
The algorithm will terminate if maxConsecutive communities do not meet the minCommunitySize threshold.
Warning: Some options have never been tested simultaneously and will probably not work together, such as high-dimensional hyperbolic communities.