Early Disease Prediction Using Hybrid Ensemble ML Techniques
DOI:
https://doi.org/10.63345/ijarcse.v1.i1.301
Keywords:
Early disease prediction, machine learning, ensemble models, hybrid techniques, healthcare analytics.
Abstract
Early disease prediction plays a pivotal role in the modern healthcare paradigm by enabling timely interventions, improving prognosis, and reducing the burden on medical systems. The increasing availability of electronic health records (EHRs), wearable sensor data, and large-scale medical databases has facilitated the application of machine learning (ML) to extract meaningful patterns for early diagnosis. However, no single ML model has proven to be universally optimal across all disease categories due to data heterogeneity, imbalance, and complexity.
This study addresses the challenge by proposing a hybrid ensemble machine learning approach that integrates multiple model types—specifically, Random Forest (RF), Gradient Boosting Machine (GBM), XGBoost, and a Multi-layer Perceptron (MLP) neural network—within ensemble frameworks such as stacking and soft voting. By combining the predictive strengths of individual algorithms, the hybrid model mitigates overfitting, enhances generalization, and offers robustness against noisy or incomplete data.
Three benchmark medical datasets—diabetes, heart disease, and chronic kidney disease—were used to evaluate model performance. Standard preprocessing techniques such as normalization, missing value imputation, and label encoding were applied. The models were evaluated on metrics including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Statistical analysis was conducted using paired t-tests and ANOVA to establish the significance of observed improvements.
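The threshold-based metrics and the paired t-test mentioned above can be sketched in plain Python as follows. The function names, the toy labels, and the per-fold scores are hypothetical values chosen for illustration only; they are not data or results from the study. AUC and ANOVA are omitted, and only the t statistic and its degrees of freedom are computed (a p-value would still need a t-distribution lookup).

```python
import math
from statistics import mean, stdev


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# Hypothetical per-fold accuracies for an ensemble and one base learner,
# evaluated on the same cross-validation folds.
ensemble_scores = [0.80, 0.82, 0.81, 0.83, 0.79]
baseline_scores = [0.78, 0.79, 0.80, 0.80, 0.78]
t_stat, dof = paired_t_statistic(ensemble_scores, baseline_scores)
print(metrics, t_stat, dof)
```

The pairing matters: both models must be scored on identical folds so that each difference reflects the models rather than the split.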
Simulation experiments under varying data quality conditions confirmed that the hybrid model retained high predictive capability even in challenging scenarios. Results indicated that the hybrid ensemble model achieved up to 94.2% accuracy and outperformed all individual base learners.
The findings emphasize the potential of hybrid ensemble ML frameworks in the early detection of chronic diseases, with applications in clinical decision support systems, remote diagnostics, and personalized healthcare. The integration of interpretable machine learning and model explainability is suggested for future work to ensure transparency and clinical trust.
License
Copyright (c) 2025 The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.