Auto-Scaling Algorithms in Serverless Cloud Environments
Keywords: Auto-scaling; Serverless computing; Function-as-a-Service; Predictive scaling; Cloud performance

Abstract
Serverless cloud computing, epitomized by Function-as-a-Service (FaaS) platforms, offers a revolutionary paradigm where developers focus solely on code logic while infrastructure concerns are fully abstracted by cloud providers. By enabling fine-grained resource billing based on actual execution duration and per-request consumption, serverless mitigates upfront capacity planning and minimizes idle infrastructure costs. However, inherent workload variability and abrupt request bursts introduce performance and cost challenges. Traditional reactive auto-scaling approaches, which provision additional function instances only after utilization thresholds are breached, often incur cold-start delays and transient latency spikes. Conversely, fully predictive algorithms, relying on historical time-series forecasting, can misestimate sudden demand changes, leading to under- or over-provisioning that either degrades user experience or elevates cost inefficiency. In this manuscript, we propose a novel Hybrid Predictive-Reactive (HPR) auto-scaling algorithm specifically tailored for serverless environments.
The algorithm integrates lightweight single exponential smoothing for near-term workload forecasting with robust reactive threshold triggers: proactive scale-outs are initiated when forecasts anticipate imminent capacity exhaustion, and reactive adjustments fire when actual utilization deviates beyond safe bounds. Controlled experiments are conducted in an enhanced CloudSim-based simulation framework, employing synthetic sinusoidal patterns, randomized Poisson bursts, and an industry-standard real-world FaaS workload trace. Performance metrics such as average response time, scaling latency, CPU utilization, throughput, and cost per thousand requests are systematically evaluated. Compared against baseline reactive-only and predictive-only schemes, HPR reduces mean response time by over 20%, lowers average scaling latency by 25%, increases utilization by 8%, and cuts cost by 8% on average. These results underscore the effectiveness of combining predictive foresight with reactive safety nets in achieving both stringent Service Level Objectives (SLOs) and cost efficiency in serverless auto-scaling. Implications for practical deployment and avenues for integrating advanced machine-learning forecasts and dynamic threshold tuning are also discussed.
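The hybrid mechanism described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the class name, parameter names, thresholds, and smoothing factor are all assumptions, chosen only to show how a single-exponential-smoothing forecast can be combined with reactive utilization bounds.

```python
import math


class HPRAutoscaler:
    """Illustrative hybrid predictive-reactive (HPR) scaler sketch.

    The predictive path uses single exponential smoothing of the
    observed request rate; the reactive path acts as a safety net
    when actual utilization breaches upper or lower bounds.
    All defaults are assumptions, not values from the paper.
    """

    def __init__(self, alpha=0.3, per_instance_capacity=100,
                 high_util=0.8, low_util=0.3, instances=1):
        self.alpha = alpha                     # smoothing factor for the forecast
        self.capacity = per_instance_capacity  # requests/sec one instance can serve
        self.high_util = high_util             # reactive scale-out bound
        self.low_util = low_util               # reactive scale-in bound
        self.instances = instances
        self.forecast = None                   # smoothed request rate s_t

    def observe(self, request_rate):
        """Ingest one measurement and return the new instance count."""
        # Single exponential smoothing: s_t = a*x_t + (1 - a)*s_{t-1}
        if self.forecast is None:
            self.forecast = request_rate
        else:
            self.forecast = (self.alpha * request_rate
                             + (1 - self.alpha) * self.forecast)

        utilization = request_rate / (self.instances * self.capacity)
        # Instances needed to keep forecast load under the high-util bound.
        predicted_need = self.forecast / (self.capacity * self.high_util)

        if predicted_need > self.instances:
            # Predictive path: scale out before forecast exhausts capacity.
            self.instances = math.ceil(predicted_need)
        elif utilization > self.high_util:
            # Reactive safety net: actual load breached the upper bound.
            self.instances += 1
        elif utilization < self.low_util and self.instances > 1:
            # Reactive scale-in when the fleet sits underutilized.
            self.instances -= 1
        return self.instances
```

A burst that the forecast has not yet caught still triggers the reactive branch, which is the "safety net" role the abstract describes; conversely, a steadily rising forecast scales the fleet out before utilization ever crosses the reactive bound.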
License
Copyright (c) 2025. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
