Gesture Recognition Systems Using Depth Sensors and Neural Networks

Authors

  • Li Na, Independent Researcher, Pudong, Shanghai, China – 200120

Keywords:

Depth sensors; gesture recognition; neural networks; 3D skeleton; graph convolution; video transformer; CNN-LSTM; depth motion maps; robustness; edge AI

Abstract

Depth sensing has transformed gesture recognition from a brittle, appearance-driven problem into one that can reason directly about 3D structure and motion. This manuscript proposes an end-to-end design for gesture recognition using commodity depth sensors and modern neural architectures. We outline a pipeline that converts raw depth frames into multiple complementary representations—temporally aligned depth maps, depth-motion summaries, and 3D skeleton graphs—and we develop three models tailored to those views: (1) DepthMapNet, a lightweight 2D CNN with a bidirectional LSTM for temporal context; (2) SkeletoNet, a spatio-temporal graph convolutional network (ST-GCN) over skeletal joints; and (3) DepthFormer, a factorized video transformer operating directly on depth clips. We evaluate on a composite, depth-only gesture corpus of 30 classes created by harmonizing multiple public-style protocols (cross-subject and cross-view), and we present simulation studies probing robustness to sensor noise, occlusion, and distance.
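One of the complementary representations named above, the depth-motion summary, can be illustrated with a minimal sketch. The function below is a hypothetical illustration (not the manuscript's actual preprocessing code): it collapses a depth clip of shape (T, H, W) into a single 2D map by accumulating absolute frame-to-frame differences, so moving regions such as a gesturing hand stand out while the static background cancels.

```python
import numpy as np

def depth_motion_map(clip: np.ndarray) -> np.ndarray:
    """Collapse a depth clip (T, H, W) into one motion-summary image.

    Accumulates absolute frame-to-frame depth differences: pixels that
    change across time accumulate energy, static pixels stay near zero.
    """
    diffs = np.abs(np.diff(clip.astype(np.float32), axis=0))  # (T-1, H, W)
    dmm = diffs.sum(axis=0)
    # Normalize to [0, 1] so the map can feed a CNN directly.
    peak = dmm.max()
    return dmm / peak if peak > 0 else dmm
```

A 2D CNN such as the DepthMapNet backbone can then consume this single-channel map in place of (or alongside) raw depth frames.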

Late fusion of the three models improves macro-F1 by 6.3 percentage points over the depth-map baseline while maintaining sub-10 ms per-frame latency on an edge GPU. Statistical analysis across five folds shows that the transformer and ST-GCN significantly outperform the CNN-LSTM baseline (paired t-tests, p < 0.05) with medium-to-large effect sizes. The study underscores three practical lessons: depth-only systems can be privacy-preserving yet highly accurate; skeleton graphs are robust under occlusion; and transformers capture long-range temporal dependencies but require careful regularization. We conclude with implementation guidance for embedded deployment and outline future directions in self-supervised pretraining and multi-sensor calibration.
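The late-fusion step described above can be sketched as a weighted average of per-model class probabilities. This is a generic illustration under assumed conventions (the manuscript does not specify its fusion weights or whether fusion happens on logits or probabilities); the function names and shapes here are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logit_list, weights=None):
    """Fuse per-model logits by averaging their class probabilities.

    `logit_list`: one (num_classes,) logit vector per model, e.g. from
    a CNN-LSTM, an ST-GCN, and a video transformer. With `weights=None`
    every model contributes equally.
    """
    probs = np.stack([softmax(np.asarray(l)) for l in logit_list])  # (M, C)
    if weights is None:
        weights = np.full(len(logit_list), 1.0 / len(logit_list))
    fused = (np.asarray(weights)[:, None] * probs).sum(axis=0)
    return int(fused.argmax()), fused
```

Because fusion happens after each model's forward pass, the three networks can run in parallel, which is one way a sub-10 ms per-frame budget remains feasible on an edge GPU.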


Published

2026-01-10

How to Cite

Li Na. “Gesture Recognition Systems Using Depth Sensors and Neural Networks”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) 2, no. 1 (January 10, 2026): 16–22. Accessed February 5, 2026. https://ijarcse.org/index.php/ijarcse/article/view/104.
