DeepFake Video Detection Using Spatio-Temporal Feature Fusion

Authors

  • Huang Bo, Independent Researcher, Heping District, Tianjin, China (CN) 300041

Keywords:

DeepFake detection; spatio-temporal fusion; video forensics; transformer; rPPG; frequency features; domain generalization

Abstract

DeepFake videos, synthetic clips that manipulate a subject’s identity or expression, pose escalating risks to privacy, journalism, elections, and platform integrity. Early detectors focused on per-frame spatial artifacts (e.g., blending seams, color mismatches, and frequency anomalies), but modern generators increasingly suppress such cues, which shifts the detection frontier toward temporal inconsistencies in motion, physiology, and cross-frame coherence. This manuscript proposes a principled framework for spatio-temporal feature fusion (STFF) that integrates complementary signals along three axes: (i) rich spatial descriptors from RGB and frequency representations, (ii) subtle physiological and photometric cues (e.g., remote photoplethysmography (rPPG) and specular dynamics), and (iii) temporal dynamics captured by convolutional and attention-based sequence models. We outline the full pipeline, from face tracking and frame sampling through multi-branch feature extraction and attention-based temporal aggregation to calibrated video-level decisions, together with training strategies that target robustness to codec variation and generalization across datasets.
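The fusion pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the feature choices (per-channel RGB statistics plus radially binned log-magnitude FFT bands as the spatial branch, absolute frame-to-frame feature differences as the temporal branch), the single-query softmax attention pooling, the logistic fusion head, and temperature scaling as the calibration step are all assumptions, and every function and parameter name (`rgb_features`, `frequency_features`, `attention_pool`, `video_score`, `params`) is hypothetical. Face tracking, the rPPG branch, and training are omitted; the weights are random placeholders that a real system would learn.

```python
import numpy as np


def rgb_features(frame):
    """Per-channel mean and standard deviation of an (H, W, 3) crop in [0, 1]."""
    return np.concatenate([frame.mean(axis=(0, 1)), frame.std(axis=(0, 1))])


def frequency_features(frame, n_bands=4):
    """Mean log-magnitude spectrum of the grayscale crop in equal-width radial bands."""
    gray = frame.mean(axis=2)
    h, w = gray.shape
    logmag = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)  # distance from the centered DC term
    r = r / r.max()
    band = np.minimum((r * n_bands).astype(int), n_bands - 1)
    return np.array([logmag[band == b].mean() for b in range(n_bands)])


def attention_pool(feats, query):
    """Softmax attention over time; feats is (T, D), query is (D,)."""
    scores = feats @ query / np.sqrt(feats.shape[1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ feats, weights


def video_score(frames, params, temperature=1.0):
    """Fuse spatial and temporal cues into one fake probability per video."""
    # Spatial branch: RGB statistics + frequency bands per frame -> (T, D).
    spatial = np.stack([np.concatenate([rgb_features(f), frequency_features(f)])
                        for f in frames])
    # Temporal branch: absolute frame-to-frame change of the spatial features.
    temporal = np.abs(np.diff(spatial, axis=0))
    pooled, weights = attention_pool(spatial, params["query"])
    z = np.concatenate([pooled, temporal.mean(axis=0), temporal.std(axis=0)])
    logit = params["w"] @ z + params["b"]
    # Temperature scaling: temperature > 1 softens over-confident scores.
    prob = 1.0 / (1.0 + np.exp(-logit / temperature))
    return prob, weights


# Usage with random stand-ins for 16 aligned face crops and untrained weights.
rng = np.random.default_rng(0)
frames = rng.random((16, 32, 32, 3))
params = {"query": rng.normal(size=10),          # D = 6 RGB + 4 frequency
          "w": rng.normal(scale=0.01, size=30),  # 3 * D fused features
          "b": 0.0}
prob, attn = video_score(frames, params, temperature=1.5)
print(f"p(fake) = {prob:.3f}; attention weights over {attn.size} frames")
```

Temperature scaling is only one way to obtain calibrated video-level scores; in practice the `temperature` value would be fitted on a held-out validation set rather than fixed by hand.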

A statistical analysis (with an illustrative results table) suggests that fusing spatial and temporal features yields consistent gains in AUC and F1 over spatial-only and temporal-only baselines across common benchmarks. We also discuss ablations, error modes under heavy compression and open-world domain shift, and model calibration. The paper concludes with limitations and future directions, including self-supervised pretraining, open-set recognition, and causal temporal modeling to reduce overfitting to superficial artifacts.
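The evaluation quantities named above can be computed as in the following NumPy-only sketch. AUC is obtained from the Mann-Whitney rank statistic, F1 at a fixed 0.5 threshold, and calibration as top-label expected calibration error (ECE) over equal-width confidence bins. The function names, the threshold, and the bin count are illustrative assumptions, and the six labels and scores are toy values, not results from the paper.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic (tied pairs count as one half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the 'fake' class after thresholding the scores."""
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(preds & labels)
    fp = np.sum(preds & ~labels)
    fn = np.sum(~preds & labels)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def expected_calibration_error(labels, scores, n_bins=10):
    """Top-label ECE: bin-weighted |accuracy - confidence| gap."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    conf = np.maximum(scores, 1.0 - scores)  # confidence of the predicted class
    correct = (scores >= 0.5) == labels
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece


# Toy data: 0 = real, 1 = fake (illustrative only).
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]
auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores)
ece = expected_calibration_error(labels, scores)
print(round(float(auc), 3), round(float(f1), 3), round(float(ece), 3))  # 0.778 0.667 0.208
```

AUC is threshold-free while F1 depends on the chosen operating point, which is why a detector can rank videos well yet still need calibration before a fixed decision threshold is applied.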

Published

2026-01-15

How to Cite

Bo, Huang. “DeepFake Video Detection Using Spatio-Temporal Feature Fusion.” International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) 2, no. 1 (January 15, 2026): 23–29. Accessed February 5, 2026. https://ijarcse.org/index.php/ijarcse/article/view/105.