Izvestiya of Saratov University.

Mathematics. Mechanics. Informatics

ISSN 1816-9791 (Print)
ISSN 2541-9005 (Online)


For citation:

Ionkin M. S., Ogneva M. V. Implementation, Efficiency Analysis and Quality Evaluation of Clustering Algorithms for Graph Models of Social Networks. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2017, vol. 17, iss. 4, pp. 441-451. DOI: 10.18500/1816-9791-2017-17-4-441-451, EDN: ZXJPOD

This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0).
Published online: 
28.11.2017
Full text:
(downloads: 132)
Language: 
Russian
Heading: 
UDC: 
519.17, 519.237
EDN: 
ZXJPOD

Implementation, Efficiency Analysis and Quality Evaluation of Clustering Algorithms for Graph Models of Social Networks

Autors: 
Ionkin Michael S., Saratov State University
Ogneva Marina V., Saratov State University
Abstract: 

The article deals with the community detection problem (the clustering problem) for undirected graphs. The clustering (grouping together of similar objects) is one of the fundamental tasks in the data analysis. This task is applied in a wide range of areas: image segmentation, marketing, anti-fraud, forecasting, text analysis and much more. At the moment, there is no universal and effective solution of this problem. There are several dozens of methods and there are many modifications of them which group objects that are as similar as possible to each other. The article describes algorithms for solving this task, presents their asymptotic complexity estimates, traditional metrics and quality functionals needed to evaluate the results of their work. The authors propose a solution to the problem which is the opposite of the resolution limit problem (algorithms find communities that are quite small in relation to the entire graph). The authors implemented the Smart Local Moving algorithm which is an improvement of the well-known Louvain algorithm. Performed an experimental comparison of the considered algorithms efficiency on large sparse graphs containing several hundreds of thousands of vertices and edges which corresponding to real data from YouTube, Amazon, Live Journal. The comparative analysis was performed on these three “impersonal” data sets with a previously known division into communities (ground-truth communities), as well as on a data set with all available information about the vertices (users) from the social network “Vkontakte”. The communities found by different algorithms on the same data set were also compared with each other. The authors examined such characteristics as the execution time of algorithms, values of modularity and normalized mutual information.

References: 
  1. Aggarwal C. C., Charu C., Reddy C. K. Data clustering. Algorithms and applications. N.Y. : CRC Press, 2014. 652 p.
  2. Jain A. K., Murty M. N., Flynn P. J. Data clustering: a review // ACM Computing Surveys. 1999. Vol. 31, iss. 3. P. 264–323. DOI: https://doi.org/10.1145/331499.331504.
  3. Newman M. E. J. Detecting community structure in networks // The European Physical Journal B – Condensed Matter and Complex Systems. 2004. Vol. 38, iss. 2. P. 321–330. DOI: https://doi.org/10.1140/epjb/e2004-00124-y.
  4. Leskovec J., Rajaraman A., Ullman J. Mining of massive datasets. 2nd ed. Cambridge Univ. Press, 2014. 511 p.
  5. Fortunato S. Community detection in graphs // Phys. Rep. 2010. Vol. 486, iss. 3–5. P. 75– 174. DOI: https://doi.org/10.1016/j.physrep.2009.11.002.
  6. Информационно-аналитический ресурс, посвященный машинному обучению, распознаванию образов и интеллектуальному анализу данных. URL: http://www.machinelearning.ru (дата обращения: 12.02.2017).
  7. Rosvall M., Axelsson D., Bergstrom C. T. The map equation // The European Physical Journal – Special Topics. 2009. Vol. 178, iss. 1. P. 13–23. DOI: https://doi.org/10.1140/epjst/e2010-01179-1.
  8. Pons P., Latapy M. Computing communities in large networks using random walks // Computer and Information Sciences – ISCIS 2005. 2005. P. 284–293. DOI: https://doi.org/10.1007/11569596_31.
  9. Raghavan U. N., Albert R., Kumara S. Near linear time algorithm to detect community structures in large-scale networks // Phys. Rev. E. 2007. Vol. 76, iss. 3. P. 036106. DOI: https://doi.org/10.1103/PhysRevE.76.036106.
  10. Clauset A., Newman M. E. J., Moore C. Finding community structure in very large networks // Phys. Rev. E. 2004. Vol. 70, iss. 6. P. 066111. DOI: https://doi.org/10.1103/PhysRevE.70.066111.
  11. Girvan M., Newman M. E. J. Community structure in social and biological networks // Proc. National Academy of Sciences. 2002. Vol. 99, № 12. P. 7821–7826. DOI: https://doi.org/10.1073/pnas.122653799.
  12. Blondel V. D., Guillaume J., Lambiotte R., Lefebvre E. Fast unfolding of communities in large networks // Journal of Statistical Mechanics : Theory and Experiment. 2008. Vol. 2008, № 10. P. P10008. DOI: https://doi.org/10.1088/1742-5468/2008/10/P10008.
  13. Waltman L., Eck N. J. A smart local moving algorithm for large–scale modularity–based community detection // The European Physical Journal B. 2013. Vol. 86, iss. 11. P. 471. DOI: https://doi.org/10.1140/epjb/e2013-40829-0.
  14. Romano S., Bailey J., Nguyen V., Verspoor K. Standardized mutual information for clustering comparisons: one step further in adjustment for chance // Proc. 31st Intern. Conf. on Machine Learning. Beijing, China : PMLR, 2014. Vol. 32, № 2. P. 1143–1151. URL: http://proceedings.mlr.press/v32/romano14.pdf (дата обращения: 25.04.2017).
  15. Хайкин C. Нейронные сети : полный курс. М. : ИД «Вильямс», 2006. 1104 с.
  16. Fortunato S., Barthelemy M. Resolution limit in community detection // Proc. National Academy of Sciences. 2007. № 104. P. 36–41. DOI: https://doi.org/10.1073/pnas.0605965104.449
  17. Traag V. A., Dooren P. V., Nesterov Y. Narrow scope for resolution-limit-free community detection // Phys. Rev. E. 2011. Vol. 84, iss. 1. P. 016114. DOI: https://doi.org/10.1103/PhysRevE.84.016114.
  18. Stanford Large Network Dataset Collection. URL: https://snap.stanford.edu/data (дата обращения: 25.04.2017).
Received: 
19.07.2017
Accepted: 
12.11.2017
Published: 
05.12.2017
Short text (in English):
(downloads: 59)