Auto-Scaling Algorithms in Serverless Cloud Environments
Keywords: Auto-scaling; Serverless computing; Function-as-a-Service; Predictive scaling; Cloud performance

Abstract
Serverless cloud computing, epitomized by Function-as-a-Service (FaaS) platforms, offers a revolutionary paradigm where developers focus solely on code logic while infrastructure concerns are fully abstracted by cloud providers. By enabling fine-grained resource billing based on actual execution duration and per-request consumption, serverless mitigates upfront capacity planning and minimizes idle infrastructure costs. However, inherent workload variability and abrupt request bursts introduce performance and cost challenges. Traditional reactive auto-scaling approaches, which provision additional function instances only after utilization thresholds are breached, often incur cold-start delays and transient latency spikes. Conversely, fully predictive algorithms, relying on historical time-series forecasting, can misestimate sudden demand changes, leading to under- or over-provisioning that either degrades user experience or elevates cost inefficiency. In this manuscript, we propose a novel Hybrid Predictive-Reactive (HPR) auto-scaling algorithm specifically tailored for serverless environments.
The algorithm integrates lightweight single exponential smoothing for near-term workload forecasting with robust reactive threshold triggers: proactive scale-outs are initiated when forecasts anticipate imminent capacity exhaustion, and reactive adjustments fire when actual utilization deviates beyond safe bounds. Controlled experiments are conducted in an enhanced CloudSim-based simulation framework, employing synthetic sinusoidal patterns, randomized Poisson bursts, and an industry-standard real-world FaaS workload trace. Performance metrics such as average response time, scaling latency, CPU utilization, throughput, and cost per thousand requests are systematically evaluated. Compared against baseline reactive-only and predictive-only schemes, HPR reduces mean response time by over 20%, lowers average scaling latency by 25%, increases utilization by 8%, and cuts cost by 8% on average. These results underscore the effectiveness of combining predictive foresight with reactive safety nets in achieving both stringent Service Level Objectives (SLOs) and cost efficiency in serverless auto-scaling. Implications for practical deployment and avenues for integrating advanced machine-learning forecasts and dynamic threshold tuning are also discussed.
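The hybrid mechanism described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the class name, parameter names, thresholds, and smoothing factor are all assumptions, chosen only to show how a single-exponential-smoothing forecast can be combined with reactive utilization bounds.

```python
import math


class HPRAutoscaler:
    """Illustrative hybrid predictive-reactive (HPR) scaler sketch.

    The predictive path uses single exponential smoothing of the
    observed request rate; the reactive path acts as a safety net
    when actual utilization breaches upper or lower bounds.
    All defaults are assumptions, not values from the paper.
    """

    def __init__(self, alpha=0.3, per_instance_capacity=100,
                 high_util=0.8, low_util=0.3, instances=1):
        self.alpha = alpha                     # smoothing factor for the forecast
        self.capacity = per_instance_capacity  # requests/sec one instance can serve
        self.high_util = high_util             # reactive scale-out bound
        self.low_util = low_util               # reactive scale-in bound
        self.instances = instances
        self.forecast = None                   # smoothed request rate s_t

    def observe(self, request_rate):
        """Ingest one measurement and return the new instance count."""
        # Single exponential smoothing: s_t = a*x_t + (1 - a)*s_{t-1}
        if self.forecast is None:
            self.forecast = request_rate
        else:
            self.forecast = (self.alpha * request_rate
                             + (1 - self.alpha) * self.forecast)

        utilization = request_rate / (self.instances * self.capacity)
        # Instances needed to keep forecast load under the high-util bound.
        predicted_need = self.forecast / (self.capacity * self.high_util)

        if predicted_need > self.instances:
            # Predictive path: scale out before forecast exhausts capacity.
            self.instances = math.ceil(predicted_need)
        elif utilization > self.high_util:
            # Reactive safety net: actual load breached the upper bound.
            self.instances += 1
        elif utilization < self.low_util and self.instances > 1:
            # Reactive scale-in when the fleet sits underutilized.
            self.instances -= 1
        return self.instances
```

A burst that the forecast has not yet caught still triggers the reactive branch, which is the "safety net" role the abstract describes; conversely, a steadily rising forecast scales the fleet out before utilization ever crosses the reactive bound.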
License
Copyright (c) 2025. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
