Video Summarization Techniques Using Attention-Based CNN-LSTM Models

Authors

  • Camille Dupont, Independent Researcher, Lille 59000, France

Keywords:

Video summarization, CNN-LSTM, attention mechanism, deep learning, temporal modeling, feature extraction

Abstract

Video summarization is a critical task in multimedia processing that aims to generate concise, informative, and visually appealing summaries from lengthy video content while preserving essential information. Traditional approaches relied on handcrafted features, shot detection, and heuristic rules, which often failed to generalize to diverse content domains. With the advent of deep learning, convolutional neural networks (CNNs) and recurrent architectures such as long short-term memory (LSTM) networks have shown remarkable potential in visual feature extraction and temporal sequence modeling, respectively. Recent advances integrate attention mechanisms to enhance the relevance and quality of generated summaries by selectively focusing on the most informative segments. This paper investigates attention-based CNN-LSTM models for supervised and unsupervised video summarization.

The proposed model employs a CNN backbone for spatial feature encoding, an LSTM layer for modeling temporal dynamics, and a self-attention module that learns per-segment importance scores. Experiments are conducted on benchmark datasets such as SumMe and TVSum, and results are evaluated using F-score and mean Average Precision (mAP). Statistical analysis demonstrates that attention-enhanced models outperform baseline CNN-LSTM approaches by up to 12% in summarization accuracy. The study concludes that attention mechanisms significantly improve temporal context understanding and help produce more human-like summaries, paving the way for practical deployment in surveillance, entertainment, and educational video applications.
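As a rough illustration of the scoring and evaluation steps described above, the following Python sketch assigns an importance score to each frame with scaled dot-product self-attention, selects the highest-scoring frames, and compares the selection with a reference summary using the F-score. It is not the paper's implementation. The function names, the toy feature vectors, and the frame budget are hypothetical, and several simplifications are assumed: the feature vectors stand in for CNN-LSTM outputs, no learned query/key projections are used, and a frame's importance is taken to be the average attention it receives.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_importance(features):
    """Score frames by the average attention each one receives.

    `features` is a list of equal-length float vectors, used here as
    stand-ins for per-frame LSTM hidden states (an assumption). A trained
    model would apply learned projections; this sketch uses the raw vectors
    as both queries and keys.
    """
    n = len(features)
    scale = math.sqrt(len(features[0]))
    received = [0.0] * n
    for query in features:
        logits = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        for j, weight in enumerate(softmax(logits)):
            received[j] += weight
    return [r / n for r in received]


def select_summary(scores, budget):
    """Return the indices of the `budget` highest-scoring frames."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:budget])


def f_score(predicted, reference):
    """Harmonic mean of precision and recall between two frame-index sets."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data: four frames with 3-dimensional features (hypothetical values).
toy_features = [
    [0.1, 0.0, 0.2],
    [1.0, 0.9, 1.1],
    [0.9, 1.0, 1.0],
    [0.0, 0.1, 0.1],
]
toy_scores = attention_importance(toy_features)
toy_summary = select_summary(toy_scores, budget=2)
print(sorted(toy_summary), f_score(toy_summary, {1, 2}))
```

Published evaluations on SumMe and TVSum typically select shots under a duration budget rather than individual frames, so the top-k frame selection here is only a simplification for readability.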

Published

2026-01-02

How to Cite

Dupont, Camille. “Video Summarization Techniques Using Attention-Based CNN-LSTM Models”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) 2, no. 1 (January 2, 2026): Jan (1–5). Accessed February 5, 2026. https://ijarcse.org/index.php/ijarcse/article/view/101.
