DeepFake Video Detection Using Spatio-Temporal Feature Fusion
Keywords:
DeepFake detection; spatio-temporal fusion; video forensics; transformer; rPPG; frequency features; domain generalization

Abstract
DeepFake videos—synthetic clips that manipulate a subject’s identity or expression—pose escalating risks to privacy, journalism, elections, and platform integrity. While early detectors focused on per-frame spatial artifacts (e.g., blending seams, color mismatches, and frequency anomalies), modern generators increasingly minimize such cues, shifting the detection frontier toward temporal inconsistencies in motion, physiology, and cross-frame coherence. This manuscript proposes a principled framework for spatio-temporal feature fusion (STFF) that integrates complementary signals across three axes: (i) rich spatial descriptors from RGB and frequency representations, (ii) subtle physiological and photometric cues (e.g., remote photoplethysmography (rPPG) and specular dynamics), and (iii) temporal dynamics captured by convolutional and attention-based sequence models. We outline a full pipeline—from face tracking and frame sampling to multi-branch feature extraction, attention-based temporal aggregation, and calibrated video-level decision-making—along with training strategies that promote cross-codec robustness and cross-dataset generalization.
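As a concrete illustration of the fusion-then-aggregate structure described above, the following is a minimal PyTorch-style sketch, not the authors' implementation: a lightweight spatial CNN and a frequency branch (log-magnitude 2D FFT) produce per-frame features that are fused into tokens, passed through a transformer encoder, attention-pooled over time, and classified at the video level. All module names, layer sizes, and hyperparameters are illustrative assumptions.

# Illustrative sketch (not the authors' code): per-frame spatial and frequency
# features are fused, then aggregated over time with a transformer encoder and
# attention pooling to produce a single video-level logit.
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Spatial branch: small CNN over RGB face crops (stand-in for a
        # pretrained backbone such as an Xception or EfficientNet).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Frequency branch: same topology applied to the log-magnitude FFT,
        # which exposes upsampling and blending artifacts.
        self.freq = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal aggregation: transformer over fused per-frame tokens,
        # followed by learned attention pooling to a video embedding.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.attn_pool = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, frames):                       # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        x = frames.flatten(0, 1)                     # (B*T, 3, H, W)
        # Log-magnitude spectrum as the frequency-domain view of each frame.
        spec = torch.log1p(torch.abs(torch.fft.fft2(x)))
        tokens = self.fuse(torch.cat([self.spatial(x), self.freq(spec)], dim=-1))
        tokens = self.temporal(tokens.view(B, T, -1))        # (B, T, feat_dim)
        w = torch.softmax(self.attn_pool(tokens), dim=1)     # (B, T, 1)
        video_emb = (w * tokens).sum(dim=1)                  # (B, feat_dim)
        return self.classifier(video_emb).squeeze(-1)        # video-level logit

# Example: two clips of 16 sampled face crops at 112x112.
model = SpatioTemporalFusion()
logits = model(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2])

In practice the placeholder CNNs would be replaced by pretrained backbones and an additional rPPG branch, but the per-frame fusion followed by attention-based temporal aggregation stays the same.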
A statistical analysis (with an illustrative results table) suggests that fusing spatial and temporal features yields consistent gains in AUC and F1 over spatial-only and temporal-only baselines across common benchmarks. We discuss ablations, error modes under heavy compression, open-world domain shift, and model calibration. The paper concludes with limitations and future directions, including self-supervised pretraining, open-set recognition, and causal temporal modeling to reduce overfitting to superficial artifacts.
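To make the calibration step concrete, the following is a minimal sketch of post-hoc temperature scaling on held-out video-level logits; the function name fit_temperature and its hyperparameters are assumptions for illustration, not taken from the paper.

# Illustrative post-hoc calibration (temperature scaling), assuming a held-out
# validation set of video-level logits and binary real/fake labels.
import torch

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    """Learn a single temperature T > 0 minimizing NLL on validation data."""
    log_t = torch.zeros(1, requires_grad=True)   # T = exp(log_t) stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = bce(val_logits / log_t.exp(), val_labels.float())
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Calibrated fake probability for a new video's logit:
# p_fake = torch.sigmoid(torch.tensor(logit) / T)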
License
Copyright (c) 2026. The journal retains copyright of all published articles, ensuring that authors retain control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
