Graph Analytics for Community Detection in Social Media Data
Keywords:
graph analytics; community detection; social media; modularity; Leiden; Louvain; Infomap; spectral clustering; label propagation; node2vec; LFR benchmark; conductance; NMI; dynamic networks; GNNs
Abstract
Social media platforms generate massive, complex networks in which users, posts, and interactions form densely connected substructures commonly called communities. Detecting these communities enables tasks such as rumor tracking, influencer mapping, interest-based recommendation, and coordinated-behavior analysis. This manuscript presents a comprehensive, practical study of graph analytics for community detection in social media data. We synthesize foundations (graph models, modularity, conductance, and information-theoretic criteria), classical algorithms (Louvain, Leiden, Infomap, spectral clustering, label propagation), and modern embedding/GNN-based approaches. To ground the discussion, we design a simulation research protocol using the LFR benchmark to emulate social graphs with power-law degree distributions, variable community sizes, overlapping memberships, noise, and temporal drift. We also outline preprocessing steps for real platforms (retweet/reply/mention graphs; interaction weighting; bot/noise mitigation; attribute integration) that make methods reliable at scale.
Our methodology compares six approaches (Louvain, Leiden, Infomap, spectral clustering, label propagation, and node2vec+k-means) under controlled scenarios. Evaluation uses modularity (Q), Normalized Mutual Information (NMI) against ground truth (for simulations), and cut quality (conductance), alongside runtime. Statistical analysis over 10 randomized runs shows that Leiden improves modularity by ~5.4% and NMI by ~6.2% over Louvain with a small runtime overhead; Infomap yields the best conductance but is slower; label propagation remains the fastest but is unstable; spectral clustering performs strongly in quality but scales poorly; and embedding-based clustering is competitive and flexible, especially when attributes are informative. We discuss limitations (resolution limits, sensitivity to parameter choices, sampling bias, and temporal dynamics) and offer design guidelines for production pipelines, covering graph construction, algorithm selection, quality assurance, and ethical use. The study concludes with a set of actionable recommendations for deploying community detection in real social media analytics.
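The three quality measures named in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the study's evaluation code: the function names, the six-node graph (two triangles joined by one bridge edge), and the arithmetic-mean normalization of NMI are assumptions made for illustration, since the abstract does not specify them. Modularity here is the standard unweighted, undirected form with resolution 1.

```python
import math
from collections import Counter

def modularity(edges, labels):
    """Q = sum over communities of L_c/m - (d_c / 2m)^2 (undirected, unweighted)."""
    m = len(edges)
    intra = Counter()       # L_c: edges with both endpoints in community c
    degree_sum = Counter()  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def conductance(edges, subset):
    """cut(S, rest) / min(vol(S), vol(rest)); lower means a cleaner cut."""
    subset = set(subset)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for node in (u, v):
            if node in subset:
                vol_in += 1
            else:
                vol_out += 1
        if (u in subset) != (v in subset):
            cut += 1
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom

def nmi(a, b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)); this normalization is an assumption."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mi = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    if h_a + h_b == 0:  # both partitions are a single cluster
        return 1.0
    return 2 * mi / (h_a + h_b)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TRUTH = [0, 0, 0, 1, 1, 1]      # ground-truth community of each node
DETECTED = [0, 0, 1, 1, 1, 1]   # a deliberately imperfect detected partition

if __name__ == "__main__":
    print("Q(truth)      =", round(modularity(EDGES, TRUTH), 4))
    print("Q(detected)   =", round(modularity(EDGES, DETECTED), 4))
    print("conductance   =", round(conductance(EDGES, {0, 1, 2}), 4))
    print("NMI(detected) =", round(nmi(TRUTH, DETECTED), 4))
```

On this graph the ground-truth partition has Q = 5/14 and each triangle has conductance 1/7, while the imperfect partition scores lower on both Q and NMI. In a simulation study of the kind described, the same three numbers would be computed per algorithm and per run, then aggregated across runs.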
License
Copyright (c) 2026 The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
