Gesture Recognition Systems Using Depth Sensors and Neural Networks
Keywords:
Depth sensors; gesture recognition; neural networks; 3D skeleton; graph convolution; video transformer; CNN-LSTM; depth motion maps; robustness; edge AIAbstract
Depth sensing has transformed gesture recognition from a brittle, appearance-driven problem into one that can reason directly about 3D structure and motion. This manuscript proposes an end-to-end design for gesture recognition using commodity depth sensors and modern neural architectures. We outline a pipeline that converts raw depth frames into multiple complementary representations—temporally aligned depth maps, depth-motion summaries, and 3D skeleton graphs—and we develop three models tailored to those views: (1) DepthMapNet, a lightweight 2D CNN with a bidirectional LSTM for temporal context; (2) SkeletoNet, a spatio-temporal graph convolutional network (ST-GCN) over skeletal joints; and (3) DepthFormer, a factorized video transformer operating directly on depth clips. We evaluate on a composite, depth-only gesture corpus of 30 classes created by harmonizing multiple public-style protocols (cross-subject and cross-view), and we present simulation studies probing robustness to sensor noise, occlusion, and distance.
Late-fusion of the three models improves macro-F1 by 6.3 percentage points over the depth-map baseline while maintaining sub-10 ms per-frame latency on an edge GPU. Statistical analysis across five folds shows the transformer and ST-GCN significantly outperform the CNN-LSTM baseline (paired t-tests, p < 0.05) with medium-to-large effect sizes. The study underscores three practical lessons: depth-only systems can be privacy-preserving yet highly accurate; skeleton graphs are strong under occlusion; and transformers capture long-range temporal dependencies but require careful regularization. We conclude with implementation guidance for embedded deployment and outline future directions in self-supervised pretraining and multi-sensor calibration.
Downloads
Downloads
Published
Issue
Section
License
Copyright (c) 2026 The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissenmination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
