Real-Time Stock Price Forecasting Using Big Data Pipelines
Keywords:
real-time forecasting, big data pipelines, limit order book, streaming analytics, LSTM, Transformer, concept drift, quantile regression, feature store, market microstructure
Abstract
Real-time stock price forecasting is no longer just a modeling problem—it is a systems problem. Predictive performance depends as much on a low-latency, fault-tolerant data pipeline as on model choice. This manuscript presents an end-to-end approach for forecasting next-interval prices (and uncertainty bands) using a streaming big-data architecture that ingests tick-level market data and exogenous signals, engineers microstructure-aware features on the fly, and serves probabilistic deep learning forecasts with millisecond latency. We unify three strands: (i) robust ingestion/processing with distributed logs and stream processors, (ii) online learning with drift-aware model updates, and (iii) risk-aware evaluation that ties forecast quality to trading utility under realistic constraints. The literature review traces the evolution from ARIMA/GARCH to LSTM/Transformer families and highlights how scalable stream processing (e.g., Kafka-like logs, Spark/Flink operators) made “always-learning” models viable. Our methodology deploys a dual-path feature stack—ultra-low-latency order-flow features and slightly slower enriched signals (options-implied volatility, news/sentiment)—merged by a temporal attention forecaster trained with quantile loss.
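To make the training objective concrete, the following minimal Python sketch illustrates the quantile (pinball) loss mentioned above; it assumes PyTorch tensors, and the model and feature names in the commented usage note are hypothetical placeholders rather than the manuscript's actual interfaces.

    import torch

    def quantile_loss(y_pred, y_true, quantiles):
        # y_pred: (batch, num_quantiles) predicted quantiles; y_true: (batch,) realized targets
        losses = []
        for i, q in enumerate(quantiles):
            err = y_true - y_pred[:, i]                        # positive when the model under-predicts
            losses.append(torch.max(q * err, (q - 1.0) * err))  # asymmetric pinball penalty for quantile q
        return torch.mean(torch.stack(losses))

    # Usage (hypothetical model with three quantile heads for 10%/50%/90% bands):
    #   preds = model(features)                                # shape (batch, 3)
    #   loss = quantile_loss(preds, targets, [0.1, 0.5, 0.9])

Minimizing this loss produces lower, median, and upper forecasts of the kind used for the uncertainty bands described above.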
A walk-forward protocol with rolling re-calibration and change-point monitoring combats concept drift. The simulation study replays historical limit-order-book (LOB) streams in real time, benchmarking classical baselines (ARIMA, GBM), machine learning (XGBoost), and deep learning (LSTM, Transformer with temporal fusion). The statistical analysis shows that the proposed pipeline improves RMSE/MAE by 8–15% over strong baselines while keeping p99 end-to-end latency under 80 ms on commodity cloud instances. Results illustrate that (a) microstructure features dominate at sub-minute horizons, (b) probabilistic forecasts enable superior drawdown control, and (c) lightweight online fine-tuning maintains the forecasting edge across volatility regimes. We conclude with deployment guidance, limitations (microstructure regime shifts, data quality, and tail events), and directions for future research in adaptive uncertainty calibration and multi-asset transfer learning.
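The evaluation protocol can be sketched as a simple walk-forward loop. The Python outline below uses NumPy only; fit_fn, forecast_fn, and drift_fn are hypothetical interfaces standing in for the manuscript's models and change-point monitor, not its actual code.

    import numpy as np

    def walk_forward(series, train_window, horizon, fit_fn, forecast_fn, drift_fn=None):
        # fit_fn(history) -> model; forecast_fn(model, history, horizon) -> next `horizon` forecasts
        # drift_fn(residuals) -> True when a change-point is flagged (re-calibration trigger)
        preds, actuals = [], []
        model = fit_fn(series[:train_window])
        t = train_window
        while t + horizon <= len(series):
            history = series[t - train_window:t]
            f = np.asarray(forecast_fn(model, history, horizon))  # forecast the next block
            a = series[t:t + horizon]
            preds.append(f)
            actuals.append(a)
            if drift_fn is None or drift_fn(a - f):               # re-fit on the latest window when drift is flagged
                model = fit_fn(series[t + horizon - train_window:t + horizon])
            t += horizon
        preds, actuals = np.concatenate(preds), np.concatenate(actuals)
        rmse = float(np.sqrt(np.mean((preds - actuals) ** 2)))    # headline error metrics reported above
        mae = float(np.mean(np.abs(preds - actuals)))
        return preds, actuals, rmse, mae

The same harness applies whether fit_fn wraps an ARIMA baseline, a gradient-boosted model, or a temporal attention forecaster; only the fitting and forecasting callables change.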
License
Copyright (c) 2026. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which allows others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
