Real-Time Sign Language Detection Using YOLOv5 and CNN
Keywords:
YOLOv5; convolutional neural networks; sign language recognition; real-time detection; human–computer interaction; edge AI; computer vision

Abstract
Real-time sign language understanding can expand access to education, healthcare, and public services for Deaf and hard-of-hearing communities, but it remains technically challenging due to fast hand motion, self-occlusion, variable lighting, diverse signing styles, and the need to operate at edge-device frame rates. This manuscript presents a practical, end-to-end pipeline that combines YOLOv5 for fast, robust hand-and-face localization with a lightweight convolutional neural network (CNN) for isolated sign classification. The detector narrows attention to semantically relevant regions, while the classifier focuses on pose, shape, and finger configuration. For dynamic signs, we extend the classifier with a short sliding-window encoder that aggregates evidence across 8–16 frames without sacrificing latency. The workflow supports online augmentation, temporal smoothing, and confidence-aware post-processing to stabilize predictions in live streams. We describe dataset preparation, annotation strategies, loss functions, hyperparameters, and deployment considerations (CPU/GPU and embedded).
A simulation study using public ASL-style alphabets and custom capture clips evaluates accuracy, mAP, macro-F1, and latency/FPS; ablations quantify the contribution of region-of-interest (ROI) cropping, color jitter, and temporal aggregation. Results indicate that coupling YOLOv5 with a compact CNN improves classification accuracy by ~10 percentage points over whole-frame baselines while meeting real-time constraints on commodity GPUs and modern edge SoCs. We discuss typical failure modes (similar hand shapes, signer variance, background clutter) and outline opportunities in continuous (sentence-level) recognition, signer adaptation, and multilingual coverage. The proposed design balances accuracy, speed, and simplicity, making it suitable for assistive kiosks, classroom captioning, and mobile translation aids.
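The confidence-aware temporal aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window length, threshold, class labels, and the `SlidingWindowDecoder` name are all illustrative, and the upstream YOLOv5 detector and CNN classifier are assumed to supply one (top-1 label, softmax score) pair per frame.

```python
from collections import Counter, deque

class SlidingWindowDecoder:
    """Aggregates per-frame class scores over a short window (e.g. 8-16
    frames) and emits a label only when its average confidence clears a
    threshold, suppressing flicker in live streams."""

    def __init__(self, window=12, min_conf=0.6):
        self.window = deque(maxlen=window)  # recent (label, score) pairs
        self.min_conf = min_conf

    def update(self, label, score):
        self.window.append((label, score))
        # Confidence-weighted vote: sum scores per label across the window.
        totals = Counter()
        for lbl, s in self.window:
            totals[lbl] += s
        best, best_sum = totals.most_common(1)[0]
        mean_conf = best_sum / len(self.window)
        # Emit the winner only when its mean confidence is high enough.
        return best if mean_conf >= self.min_conf else None

# Hypothetical per-frame stream from the classifier: (label, softmax score).
frames = [("A", 0.9), ("A", 0.8), ("B", 0.4), ("A", 0.85), ("A", 0.9)]
decoder = SlidingWindowDecoder(window=4, min_conf=0.6)
outputs = [decoder.update(lbl, s) for lbl, s in frames]
print(outputs)  # a low-confidence "B" frame is absorbed rather than emitted
```

Returning `None` below the threshold lets the caller hold the previous stable prediction on screen instead of flickering through transient misclassifications.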
License
Copyright (c) 2025. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
