Code Smell Detection Using Machine Learning in Static Analysis

Authors

  • Prof (Dr) Ajay Shriram Kushwaha, Sharda University, Knowledge Park III, Greater Noida, U.P. 201310, India (kushwaha.ajay22@gmail.com)

DOI:

https://doi.org/10.63345/

Keywords:

code smell detection; static analysis; software metrics; machine learning; class imbalance; cross-project generalization; explainability

Abstract

Code smells—recurring design or implementation symptoms that indicate deeper problems—degrade maintainability, increase fault-proneness, and inflate long-term costs. Traditional smell detection relies on heuristics woven into static analysis rules or on expert judgment, both of which struggle to generalize across projects and languages. This manuscript presents a machine-learning (ML) approach that uses features derived from static analysis—object-oriented metrics, control-flow measures, dependency signals, and lightweight lexical cues—to detect prominent smells such as God Class, Long Method, Feature Envy, Data Class, and Shotgun Surgery. We design a reproducible pipeline covering dataset construction, feature extraction, imbalance handling, model training, evaluation, and statistical testing. A simulation study emulates multi-project conditions and cross-version drift to approximate realistic industrial scenarios. Four supervised learners—Logistic Regression, Random Forest, SVM (RBF), and XGBoost—are compared under stratified cross-validation and cross-project holdout. Performance is reported using F1-score (primary), with secondary examinations of calibration, error structure, and explainability (via model-agnostic feature attribution).
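The cross-project holdout and F1-based scoring described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample fields (`project`, `loc`, `smelly`), the toy data, and the mean-length cutoff standing in for a trained learner are all assumptions made for illustration.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_project_out(samples):
    """Yield (held_out_project, train, test), holding out one whole project per fold."""
    by_project = defaultdict(list)
    for sample in samples:
        by_project[sample["project"]].append(sample)
    for held_out in sorted(by_project):
        train = [s for name, rows in by_project.items() if name != held_out for s in rows]
        yield held_out, train, by_project[held_out]


# Hypothetical toy data: one row per method, labelled for the Long Method smell.
samples = [
    {"project": "A", "loc": 120, "smelly": 1},
    {"project": "A", "loc": 15, "smelly": 0},
    {"project": "B", "loc": 90, "smelly": 1},
    {"project": "B", "loc": 30, "smelly": 0},
    {"project": "C", "loc": 200, "smelly": 1},
    {"project": "C", "loc": 25, "smelly": 0},
]

for held_out, train, test in leave_one_project_out(samples):
    # Stand-in "model" (an assumption): flag a method whose length exceeds
    # the mean length observed in the training projects.
    cutoff = sum(s["loc"] for s in train) / len(train)
    preds = [1 if s["loc"] > cutoff else 0 for s in test]
    print(held_out, f1_score([s["smelly"] for s in test], preds))
```

Holding out an entire project per fold keeps project-specific coding style out of the training data, which is the property that distinguishes this protocol from stratified cross-validation over pooled samples.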

Results show tree-based ensembles (Random Forest and XGBoost) consistently outperform linear and kernel baselines, particularly for class-imbalance-sensitive smells (e.g., Shotgun Surgery). Statistical analysis using non-parametric tests indicates significant differences among learners, and ablation suggests that combining structure-aware metrics with succinct lexical signals yields the best trade-off between accuracy and interpretability. We conclude with practical guidance for toolsmiths and teams: use ensemble ML as an assistive layer on top of static analysis, expose explainable rankings rather than hard flags, calibrate thresholds by smell type, and validate models cross-project to avoid overfitting to local coding styles.
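As an illustration of the advice to calibrate thresholds by smell type, the sketch below picks, for each smell, the score cutoff that maximizes F1 on a validation set. The function names, the validation scores, and the choice of F1 as the calibration objective are assumptions for this example, not details taken from the manuscript.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every sample with score >= threshold is flagged as smelly."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores, labels):
    """Try each observed score as a cutoff; ties go to the lowest cutoff."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def calibrate_thresholds(validation):
    """Map each smell type to its own F1-maximizing cutoff."""
    return {smell: best_threshold(scores, labels)[0]
            for smell, (scores, labels) in validation.items()}


# Hypothetical validation scores (model probabilities) and labels per smell.
validation = {
    "Long Method": ([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]),
    "Shotgun Surgery": ([0.05, 0.20, 0.15, 0.10], [0, 1, 0, 0]),
}
thresholds = calibrate_thresholds(validation)
print(thresholds)
```

A rare smell such as Shotgun Surgery will often end up with a different cutoff than a common one, which is why a single global threshold tends to serve imbalanced smells poorly. The same per-smell scores can also be shown to developers as a ranked list instead of being turned into hard flags.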

Published

2026-06-06

How to Cite

Kushwaha, Prof (Dr) Ajay Shriram. “Code Smell Detection Using Machine Learning in Static Analysis”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) U.S. ISSN: 3071-0154 2, no. 2 (June 6, 2026): Jun (38–49). Accessed June 13, 2026. https://ijarcse.org/index.php/ijarcse/article/view/150.
