Video Summarization Techniques Using Attention-Based CNN-LSTM Models

Camille Dupont

Authors

Camille Dupont, Independent Researcher, Lille, France, 59000

Keywords:

Video summarization, CNN-LSTM, attention mechanism, deep learning, temporal modeling, feature extraction

Abstract

Video summarization is a critical task in multimedia processing that aims to generate concise, informative, and visually appealing summaries from lengthy video content while preserving essential information. Traditional approaches relied on handcrafted features, shot detection, and heuristic rules, which often failed to generalize to diverse content domains. With the advent of deep learning, convolutional neural networks (CNNs) and recurrent architectures such as long short-term memory (LSTM) networks have shown remarkable potential in visual feature extraction and temporal sequence modeling, respectively. Recent advances integrate attention mechanisms to enhance the relevance and quality of generated summaries by selectively focusing on the most informative segments. This paper investigates attention-based CNN-LSTM models for supervised and unsupervised video summarization.

The proposed model employs a CNN backbone for spatial feature encoding, an LSTM layer for modeling temporal dynamics, and a self-attention module that learns importance scores. Experiments are run on benchmark datasets such as SumMe and TVSum, and the results are evaluated using F-score and mean Average Precision (mAP). Statistical analysis shows that attention-enhanced models outperform baseline CNN-LSTM approaches by up to 12% in summarization accuracy. The study concludes that attention mechanisms significantly improve temporal context understanding and help produce more human-like summaries, paving the way for practical deployment in surveillance, entertainment, and educational video applications.
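As a rough illustration of the scoring and evaluation steps mentioned in the abstract, the following minimal sketch computes attention-based importance scores, selects a summary, and measures its F-score. It is not the paper's implementation: the function names (`attention_importance`, `select_top_k`, `f_score`) are hypothetical, the attention uses no learned projections, the input vectors stand in for LSTM hidden states computed over CNN features, scoring is assumed to be per frame, and the F-score is taken over selected frame indices rather than over the temporal overlap with user summaries that SumMe and TVSum protocols typically use.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_importance(features):
    """Score each frame by the average attention it receives from all frames.

    `features` is a list of equal-length vectors, one per frame. They stand in
    for LSTM hidden states (an assumption); plain lists keep the sketch
    runnable without a deep learning framework.
    """
    n, d = len(features), len(features[0])
    scale = math.sqrt(d)
    received = [0.0] * n
    for query in features:
        # Scaled dot-product similarity between this frame and every frame.
        logits = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in features]
        for j, weight in enumerate(softmax(logits)):
            received[j] += weight
    # Each attention row sums to 1, so the returned scores also sum to 1.
    return [r / n for r in received]


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring frames, in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def f_score(predicted, reference):
    """Harmonic mean of precision and recall over selected frame indices."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy features for four frames (made-up values for illustration only).
    toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]
    toy_scores = attention_importance(toy_features)
    toy_summary = select_top_k(toy_scores, k=2)
    print(toy_summary, f_score(toy_summary, reference=[0, 2]))
```

In a trained model the queries and keys would come from learned projections of the LSTM outputs, and the importance scores would typically be fitted against human annotations; the sketch only shows how attention weights can be turned into a ranking and how a selected summary can be compared with a reference.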

Published

2026-01-02

Issue

Vol. 2 No. 1 (2026): Jan-Mar 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Articles are published under the Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.

How to Cite

Dupont, Camille. “Video Summarization Techniques Using Attention-Based CNN-LSTM Models”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) U.S. ISSN: 3071-0154 2, no. 1 (January 2, 2026): Jan (1–5). Accessed June 29, 2026. https://ijarcse.org/index.php/ijarcse/article/view/101.
