RDBMS as an Efficient Tool to Mine Cliques on Complex Networks

Ana Paula Appel, Adriano Arantes Paterlini, Caetano Traina Jr.

Abstract



Complex networks are intrinsically present in a wide range of applications.
Real world networks have several unique properties, such as, sparsity, node degree distribution, which follow a power law and a large amount of triangles that further form larger cliques.
Triangles and cluster coefficient, which are usually used to find groups, are not always enough to distinguish a different node neighborhood topology.
By using cliques of sizes 4 and 5, it is possible to study how triangles become involved to form large cliques.
To retrieve these cliques called k4 and k5 a novel technique called ``FCR - Fast Clique Retrieval'' has been developed, taking advantage of the data management and optimization techniques of a relational database management system and SQL to query cliques of sizes 4 and 5. This paper demonstrates that cliques (3, 4 and 5) follow interesting power laws that allow identifying nodes with suspicious behaviors. It also presents an extension of the cluster coefficient formula, which may become a valuable equation to identify nodes that most influence the network first eigenvalue.

Keywords


cliques; cluster coefficient; graph mining; power law; RDBMS

Full Text:

PDF


An official publication of the Brazilian Computer Society Special Interest Group on Databases.