Video Summarization Techniques Using Attention-Based CNN-LSTM Models
Keywords: video summarization, CNN-LSTM, attention mechanism, deep learning, temporal modeling, feature extraction

Abstract
Video summarization is a critical task in multimedia processing that aims to generate concise, informative, and visually appealing summaries from lengthy video content while preserving essential information. Traditional approaches relied on handcrafted features, shot detection, and heuristic rules, which often failed to generalize to diverse content domains. With the advent of deep learning, convolutional neural networks (CNNs) and recurrent architectures such as long short-term memory (LSTM) networks have shown remarkable potential in visual feature extraction and temporal sequence modeling, respectively. Recent advances integrate attention mechanisms to enhance the relevance and quality of generated summaries by selectively focusing on the most informative segments. This paper investigates attention-based CNN-LSTM models for supervised and unsupervised video summarization.
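The idea of attention-based frame selection described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the query/key projections below are random placeholders standing in for learned weights, and the "importance" heuristic (a frame is important if many other frames attend to it) is one simple assumption among several possible readings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_importance(features, rng=None):
    """Toy self-attention over per-frame features of shape (T, d).

    Returns a (T,) vector of importance scores normalized to [0, 1].
    The projection matrices are random stand-ins for learned parameters.
    """
    T, d = features.shape
    if rng is None:
        rng = np.random.default_rng(0)
    # Hypothetical "learned" query/key projections (random for illustration)
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Q, K = features @ Wq, features @ Wk
    # Scaled dot-product attention over the temporal axis: (T, T)
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    # Heuristic: a frame that receives attention from many frames is important
    importance = attn.sum(axis=0)
    return importance / importance.max()
```

In a trained model the projections would be optimized end-to-end and the scores thresholded or ranked to pick keyframes; here the sketch only shows the shape of the computation.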
The proposed model employs a CNN backbone for spatial feature encoding, an LSTM layer for temporal dynamics modeling, and a self-attention module for learning importance scores. A comprehensive simulation is performed using benchmark datasets such as SumMe and TVSum, and the results are evaluated using F-score and mean Average Precision (mAP) metrics. Statistical analysis demonstrates that attention-enhanced models outperform baseline CNN-LSTM approaches by up to 12% in summarization accuracy. This study concludes that attention mechanisms significantly improve temporal context understanding and help create more human-like summaries, paving the way for practical deployment in surveillance, entertainment, and educational video applications.
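The F-score metric mentioned above is conventionally computed from the frame overlap between a predicted summary and a ground-truth summary. The sketch below assumes both summaries are given as binary per-frame selection masks; the exact evaluation protocol on SumMe and TVSum may differ (e.g. averaging over multiple annotators).

```python
def summary_fscore(pred, gt):
    """Overlap-based F-score between two binary frame-selection masks.

    pred, gt: sequences of 0/1 flags, one per frame (assumed format).
    Precision = overlap / |pred|, Recall = overlap / |gt|.
    """
    overlap = sum(p & g for p, g in zip(pred, gt))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(gt)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction selecting frames {0, 1} against a ground truth of {0, 2} has one overlapping frame, giving precision and recall of 0.5 each and an F-score of 0.5.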
License
Copyright (c) 2026. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
