Real-Time Stock Price Forecasting Using Big Data Pipelines

Authors

  • Niharika Singh, ABES Engineering College, Crossings Republik, Ghaziabad, Uttar Pradesh 201009 (niharika250104@gmail.com)

Keywords:

real-time forecasting, big data pipelines, limit order book, streaming analytics, LSTM, Transformer, concept drift, quantile regression, feature store, market microstructure

Abstract

Real-time stock price forecasting is no longer just a modeling problem; it is a systems problem. Predictive performance depends as much on a low-latency, fault-tolerant data pipeline as on model choice. This manuscript presents an end-to-end approach for forecasting next-interval prices (and uncertainty bands) using a streaming big-data architecture that ingests tick-level market data and exogenous signals, engineers microstructure-aware features on the fly, and serves probabilistic deep learning forecasts with millisecond-scale latency. We unify three strands: (i) robust ingestion and processing with distributed logs and stream processors, (ii) online learning with drift-aware model updates, and (iii) risk-aware evaluation that ties forecast quality to trading utility under realistic constraints. The literature review traces the evolution from ARIMA/GARCH to the LSTM and Transformer families and highlights how scalable stream processing (e.g., Kafka-like logs, Spark/Flink operators) made “always-learning” models viable. Our methodology deploys a dual-path feature stack, combining ultra-low-latency order-flow features with slightly slower enriched signals (options-implied volatility, news/sentiment), merged by a temporal attention forecaster trained with quantile loss.
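
As an illustration of the quantile loss mentioned above, the following minimal sketch computes the standard pinball loss for one quantile level and averages it over several levels to score an uncertainty band. The function names, the choice of quantile levels, and the equal weighting across levels are assumptions made for this example; the abstract does not specify the exact training objective.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-prediction (diff > 0) costs q per unit of error;
        # over-prediction (diff < 0) costs (1 - q) per unit of error.
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def multi_quantile_loss(y_true, preds_by_q):
    """Equal-weight mean of pinball losses over several quantile levels.

    preds_by_q maps a quantile level (e.g. 0.1, 0.5, 0.9) to a list of
    predictions for that level. Equal weighting is an assumption here.
    """
    losses = [pinball_loss(y_true, preds, q) for q, preds in preds_by_q.items()]
    return sum(losses) / len(losses)
```

At q = 0.9 an under-prediction is penalized nine times as heavily as an over-prediction of the same size, which is what pushes the fitted 0.9-quantile toward the upper edge of the band.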

A walk-forward protocol with rolling re-calibration and change-point monitoring combats concept drift. The simulation study replays historical limit-order-book (LOB) streams in real time, benchmarking classical baselines (ARIMA, GBM), machine learning (XGBoost), and deep learning (LSTM, Transformer with temporal fusion). The statistical analysis shows that the proposed pipeline improves RMSE and MAE by 8–15% over strong baselines while keeping p99 end-to-end latency under 80 ms on commodity cloud instances. The results illustrate that (a) microstructure features dominate at sub-minute horizons, (b) probabilistic forecasts enable superior drawdown control, and (c) lightweight online fine-tuning maintains an edge across volatility regimes. We conclude with deployment guidance, limitations (microstructure regime shifts, data quality, and tail events), and directions for future research in adaptive uncertainty calibration and multi-asset transfer learning.
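
The walk-forward protocol and change-point monitoring described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the window sizes, the one-sided CUSUM detector, and its `target`, `slack`, and `threshold` parameters are hypothetical choices for the example, not the configuration used in the article.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_range, test_range) index pairs that roll forward in time.

    Each test window starts right after its training window, so no future
    observation leaks into training. step defaults to test_size.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


def cusum_alarm(errors, target, slack, threshold):
    """Return the index of the first one-sided CUSUM alarm, or None.

    errors:    sequence of absolute forecast errors, in time order
    target:    error level expected when the model is well calibrated
    slack:     tolerance subtracted at each step before accumulating
    threshold: cumulative excess that triggers a re-calibration alarm
    """
    cumulative = 0.0
    for i, err in enumerate(errors):
        # Accumulate only the excess over target + slack; reset at zero.
        cumulative = max(0.0, cumulative + (err - target - slack))
        if cumulative > threshold:
            return i
    return None
```

In a replay, each test window would be scored with the model fitted on its training window, and an alarm from `cusum_alarm` on the running errors would be one possible trigger for an early re-fit instead of waiting for the next scheduled roll.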

Published

2026-02-03

How to Cite

Singh, Niharika. “Real-Time Stock Price Forecasting Using Big Data Pipelines”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) 2, no. 1 (February 3, 2026): Feb (12–22). Accessed February 5, 2026. https://ijarcse.org/index.php/ijarcse/article/view/109.