Real-Time Sign Language Detection Using YOLOv5 and CNN

Authors

  • Arnav Khanna, Independent Researcher, Aliganj, Lucknow, India – 226024

Keywords:

YOLOv5; convolutional neural networks; sign language recognition; real-time detection; human–computer interaction; edge AI; computer vision

Abstract

Real-time sign language understanding can expand access to education, healthcare, and public services for Deaf and hard-of-hearing communities, but it remains technically challenging due to fast hand motion, self-occlusion, variable lighting, diverse signing styles, and the need to operate at edge-device frame rates. This manuscript presents a practical, end-to-end pipeline that combines YOLOv5 for fast, robust hand-and-face localization with a lightweight convolutional neural network (CNN) for isolated sign classification. The detector narrows attention to semantically relevant regions, while the classifier focuses on pose, shape, and finger configuration. For dynamic signs, we extend the classifier with a short sliding-window encoder that aggregates evidence across 8–16 frames while keeping latency low. The workflow supports online augmentation, temporal smoothing, and confidence-aware post-processing to stabilize predictions in live streams. We describe dataset preparation, annotation strategies, loss functions, hyperparameters, and deployment considerations (CPU/GPU and embedded).
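As an illustration of this two-stage design, the following is a minimal sketch of the inference loop, assuming PyTorch, OpenCV, and the public ultralytics/yolov5 torch.hub entry point. The SignClassifier layout, the 12-frame window, and the 0.6 confidence gate are illustrative assumptions, not the paper's released code.

```python
# Sketch of detector -> ROI crop -> CNN classifier -> temporal smoothing.
# All architecture and threshold choices here are illustrative assumptions.
from collections import Counter, deque

import cv2
import torch
import torch.nn as nn


class SignClassifier(nn.Module):
    """Compact CNN over detector-cropped ROIs (hypothetical layout)."""

    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


# Stock YOLOv5 via torch.hub; a real system would load weights fine-tuned
# for hand/face detection instead of the generic 'yolov5s' checkpoint.
detector = torch.hub.load("ultralytics/yolov5", "yolov5s")
classifier = SignClassifier().eval()

window = deque(maxlen=12)  # 8–16 frame sliding window, per the abstract
CONF_GATE = 0.6            # assumed detector-confidence threshold


@torch.no_grad()
def predict(frame):
    """Classify the most confident ROI in a BGR frame, smoothed over time."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    det = detector(rgb).xyxy[0]            # rows: x1, y1, x2, y2, conf, cls
    if det.shape[0] == 0:
        return None
    x1, y1, x2, y2, conf, _ = det[det[:, 4].argmax()].tolist()
    if conf < CONF_GATE:                   # confidence-aware gating
        return None
    roi = cv2.resize(frame[int(y1):int(y2), int(x1):int(x2)], (64, 64))
    x = torch.from_numpy(roi).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    window.append(int(classifier(x).argmax()))
    return Counter(window).most_common(1)[0][0]  # majority vote = smoothing
```

A capture loop would call predict(frame) on each webcam frame; the deque doubles as the 8–16 frame evidence window, and swapping the majority vote for an exponential moving average over softmax scores is a common variant of the temporal smoothing described above.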

A simulation study using public ASL-style alphabets and custom capture clips evaluates accuracy, mAP, macro-F1, and latency/FPS; ablations quantify the contribution of region-of-interest (ROI) cropping, color jitter, and temporal aggregation. Results indicate that coupling YOLOv5 with a compact CNN improves classification accuracy by ~10 percentage points over whole-frame baselines while meeting real-time constraints on commodity GPUs and modern edge SoCs. We discuss typical failure modes (similar hand shapes, signer variance, background clutter) and outline opportunities in continuous (sentence-level) recognition, signer adaptation, and multilingual coverage. The proposed design balances accuracy, speed, and simplicity, making it suitable for assistive kiosks, classroom captioning, and mobile translation aids.
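For concreteness, here is a small sketch of how the reported macro-F1 and FPS numbers might be measured, assuming scikit-learn for the metric; macro_f1 and measure_fps are hypothetical helper names, not the paper's benchmark harness.

```python
import time

from sklearn.metrics import f1_score


def macro_f1(y_true, y_pred):
    # Macro-F1 averages per-class F1 with equal weight, so rare signs
    # count as much as common ones.
    return f1_score(y_true, y_pred, average="macro")


def measure_fps(predict_fn, frames):
    # Average throughput over a clip; mean per-frame latency is 1 / fps.
    start = time.perf_counter()
    for frame in frames:
        predict_fn(frame)
    return len(frames) / (time.perf_counter() - start)
```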


Published

2025-12-04

How to Cite

Khanna, Arnav. “Real-Time Sign Language Detection Using YOLOv5 and CNN.” International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) 1, no. 4 (December 4, 2025): 9–15. Accessed January 22, 2026. https://ijarcse.org/index.php/ijarcse/article/view/97.
