Clique Percolation Method in R: a fast implementation

Clique Percolation Method (CPM) is an algorithm for finding overlapping communities within networks, intruduced by Palla et al. (2005, see references). This implementation in R, firstly detects communities of size k, then creates a clique graph. Each community will be represented by each connected component in the clique graph.

Algorithm

The algorithm performs the following steps:

1- first find all cliques of size k in the graph
2- then create graph where nodes are cliques of size k
3- add edges if two nodes (cliques) share k-1 common nodes
4- each connected component is a community

Example with k=3

Graph
Graph

The presented graph contains the following cliques:

{1, 2, 3} {1, 3, 4} {4, 5, 6} {5, 6, 7} {5, 6, 8} {5, 7, 8} {6, 7, 8}

Each clique will represent a node in the clique graph and those node are connected each other if they share k-1 (2 in this case) nodes.

Clique Graph
Clique Graph

As a result, the clique graph presents two connected components containing {1,2,3,4} and {4,5,6,7,8}. The node 4 belongs to both communities. In other words, the graph contains two overlapping communities.

Description

Usage

  • clique.community(graph, k) : Basic implementation with some bug fix from an old version (see section Bug Fix)
  • clique.community.opt(graph, k) : Optimized version with reduction of search space (see section Optimizations)
  • clique.community.opt.par(graph, k): Optimization via parallelization (see section Optimizations)

Arguments

  • graph: the input graph (igraph object)
  • k: the size of the clique (usually = 3)

Values

  • memberships: an object containing list of communities

Package Dependencies

This implementation requires the following packages:

install.packages("igraph")
install.packages("doParallel")
install.packages("foreach")

Implementation, bug fix and optimisations

An old and rather inefficient implementation of this algorithm can be found here: http://igraph.wikidot.com/community-detection-in-r#toc0. However, this implementation contains couple of bugs, related to some updates of the library and it has room for an optimisation.

Bug Fix

1) Non-negative Vertex Id

Since version 0.6, vertices and edge are indexed from one in R iGraph and then that implementation produces the following error:

Error in .Call("R_igraph_create", as.numeric(edges) - 1, as.numeric(n),  : 
  At type_indexededgelist.c:117 : cannot create empty graph with negative number of vertices, Invalid value
In addition: Warning message:
In max(edges) : no non-missing arguments to max; returning -Inf
Called from: .Call("R_igraph_create", as.numeric(edges) - 1, as.numeric(n), 
    as.logical(directed), PACKAGE = "igraph")

This issue has been now fixed.

2) Clique-Graph

In the same old implementation, the graph was created using the edge list: clq.graph <- simplify(graph(edges)) and then simplified. However, when performing the clique() function, some isolated cliques may appear (cliques that do not share k-1 nodes with others cliques). Relying on the previous version, isolated cliques and therefore cliques without any edge do not appear in the final computed clique graph. In some extreme cases, when the edges structure is empty (every clique is not connected with others), the algorithm generates the following error:

Error in .Call("R_igraph_create", as.numeric(edges) - 1, as.numeric(n),  : 
  At type_indexededgelist.c:117 : cannot create empty graph with negative number of vertices, Invalid value
In addition: Warning message:
In max(edges) : no non-missing arguments to max; returning -Inf
Called from: .Call("R_igraph_create", as.numeric(edges) - 1, as.numeric(n), 
    as.logical(directed), PACKAGE = "igraph")

In this new version, clique.community.R, the graph is created inserting the right amount of vertexes and then all the edges. In this way, we avoid to lose nodes (cliques without edges), that can represent communities.

Optimizations of the Algorithm

The most computationally expensive section in the algorithm is the clique-graph creation process. It performs an extensive search on the space of cliques, looking for couples that share k-1 nodes. In the basic implementation (clique.community.R) there are two nested for loops comparing cliques and then performing n*n iterations.

Reduction of the Search Space

clique.community.opt.R implements an optimised approach to discover couples of cliques in the search space. As already mentioned the algorithm performs an exhaustive search of couples that share k-1 nodes. The two nested for loops over the same list of cliques correspond to a symmetric matrix-based search space that can be reduced in investigating either the upper or lower part. This implementation therefore reduces the number of iterations to n*(n-1)/2.

Parallelisation

Another implementation of the algorithm can be found in clique.community.opt.par.R. This implementation parallelises the research of couples of cliques exploiting the number of cores that nowadays CPU have at their disposal. In addition, it also implements the previous optimisation. To run, this implementation requires the foreach and doParallel packages and the number of clusters (parallel session of R) to parallelise the execution need to be defined at priori by the user. They can be defined using the following code:

cl <- makeCluster(4) #test on 4 cores
registerDoParallel(cl)
getDoParWorkers()

Results

Using the Zachary network as a benchmark, and running the three implementations on a Processor Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz, 2401 Mhz, 4 Core(s), 8 Logical Processor(s), it is possible to compare the elapsed of the different implementations:

> g <- make_graph("Zachary") #the karate network
> #Execution of the different implementation of the algorithm
> ptm <- proc.time()
> res1<-clique.community(g,3)
> proc.time() - ptm
   user  system elapsed 
   3.47    0.02    3.49 
> ptm <- proc.time()
> res2<-clique.community.opt(g,3)
> proc.time() - ptm
   user  system elapsed 
   1.78    0.00    1.78 
> ptm <- proc.time()
> res3<-clique.community.opt.par(g,3)
> proc.time() - ptm
   user  system elapsed 
   0.06    0.00    0.09 

Project info

Github repository: https://github.com/angelosalatino/CliquePercolationMethod-R
Issue tracker: Here

References

Palla, G., Derényi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and societyNature, 435(7043), 814-818.