Code Smell Detection Using Machine Learning in Static Analysis

Prof (Dr) Ajay Shriram Kushwaha

Authors

Prof (Dr) Ajay Shriram Kushwaha, Sharda University, Knowledge Park III, Greater Noida, U.P. 201310, India (kushwaha.ajay22@gmail.com)

Keywords:

code smell detection; static analysis; software metrics; machine learning; class imbalance; cross-project generalization; explainability

Abstract

Code smells—recurring design or implementation symptoms that indicate deeper problems—degrade maintainability, increase fault-proneness, and inflate long-term costs. Traditional smell detection relies on heuristics woven into static analysis rules or on expert judgment, both of which struggle to generalize across projects and languages. This manuscript presents a machine-learning (ML) approach that uses features derived from static analysis—object-oriented metrics, control-flow measures, dependency signals, and lightweight lexical cues—to detect prominent smells such as God Class, Long Method, Feature Envy, Data Class, and Shotgun Surgery. We design a reproducible pipeline covering dataset construction, feature extraction, imbalance handling, model training, evaluation, and statistical testing. A simulation study emulates multi-project conditions and cross-version drift to approximate realistic industrial scenarios. Four supervised learners—Logistic Regression, Random Forest, SVM (RBF), and XGBoost—are compared under stratified cross-validation and cross-project holdout. Performance is reported using F1-score (primary), with secondary examinations of calibration, error structure, and explainability (via model-agnostic feature attribution).
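The abstract names two evaluation protocols, stratified cross-validation and cross-project holdout, and uses F1 as the primary metric. The standard-library sketch below shows one way to compute these splits and the metric. It assumes binary labels (1 = smelly, 0 = clean) and one project name per row, and it assumes leave-one-project-out as the cross-project scheme, because the abstract does not say how projects are held out. The function names are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (smelly) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # 2*TP / (2*TP + FP + FN) equals the harmonic mean of precision and recall.
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def stratified_k_fold(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split row indices into k folds that keep roughly the same smelly/clean ratio."""
    rng = random.Random(seed)
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        # Deal each class round-robin; the running offset keeps fold sizes even.
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    return folds


def leave_one_project_out(
    projects: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_project, train_idx, test_idx); training never sees the held-out project.

    Assumption: leave-one-project-out stands in for the paper's unspecified holdout scheme.
    """
    by_project: Dict[str, List[int]] = defaultdict(list)
    for idx, proj in enumerate(projects):
        by_project[proj].append(idx)
    for held_out in sorted(by_project):
        train_idx = [i for i, p in enumerate(projects) if p != held_out]
        yield held_out, train_idx, by_project[held_out]
```

In a full pipeline, each index list would select rows of the static-analysis feature matrix passed to a learner. Stratification matters here because smells such as Shotgun Surgery are rare. Without it, a fold could end up with almost no positives, and its F1 would be unstable.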

Results show tree-based ensembles (Random Forest and XGBoost) consistently outperform linear and kernel baselines, particularly for class-imbalance-sensitive smells (e.g., Shotgun Surgery). Statistical analysis using non-parametric tests indicates significant differences among learners, and ablation suggests that combining structure-aware metrics with succinct lexical signals yields the best trade-off between accuracy and interpretability. We conclude with practical guidance for toolsmiths and teams: use ensemble ML as an assistive layer on top of static analysis, expose explainable rankings rather than hard flags, calibrate thresholds by smell type, and validate models cross-project to avoid overfitting to local coding styles.
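The abstract reports that non-parametric tests show significant differences among the learners, but it does not name the test. A common choice for comparing several learners over the same folds or projects is the Friedman test on per-fold F1 ranks. The standard-library sketch below assumes that choice. Each row holds the F1 values for one fold or held-out project, and each column holds the values for one learner. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the Friedman test; the abstract only says "non-parametric tests".


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank learners within one fold: 1 = highest F1, tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-15:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def friedman_test(scores: Sequence[Sequence[float]]) -> Tuple[float, float, List[float]]:
    """scores[i][j] = F1 of learner j on fold/project i.

    Returns (chi-square statistic, p-value, mean rank per learner).
    No tie correction is applied, which makes the test slightly conservative when ties occur.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 * sum(t * t for t in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1), [t / n for t in rank_sums]
```

With the four learners named in the abstract, the statistic has k − 1 = 3 degrees of freedom. A significant result is usually followed by a post-hoc comparison, such as the Nemenyi test, to identify which pairs of learners differ.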

Published

2026-06-06

Issue

Vol. 2 No. 2 (2026): Apr-Jun 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.

How to Cite

Kushwaha, Prof (Dr) Ajay Shriram. “Code Smell Detection Using Machine Learning in Static Analysis”. International Journal of Advanced Research in Computer Science and Engineering 2, no. 2 (June 6, 2026): 38–49. Accessed July 28, 2026. https://ijarcse.org/index.php/ijarcse/article/view/150.

Similar Articles

Feature Engineering for Software Defect Prediction Models

AI-Driven Malware Behaviour Classification and Detection Systems

Early Disease Prediction Using Hybrid Ensemble ML Techniques

AI-Based Intrusion Detection Systems for Software-Defined Networks

Explainability-Driven Feature Selection for Financial Fraud Detection

IoT Firmware Security Auditing Using Automated Vulnerability Scanning

Cross-Dataset Face Anti-Spoofing Using Domain Adaptation Techniques

ML-Based Fault Prediction in Wind Turbine Monitoring Systems

Adaptive Learning Rate Strategies in Deep Reinforcement Learning Agents

Self-Healing AI: An Autonomous Deep Learning Approach for Software Error Correction