Deep Learning Techniques for Spam URL Detection in Emails
DOI: https://doi.org/10.63345/ijarcse.v1.i2.305
Keywords: deep learning; spam URL detection; email security; sequence modeling; transformer encoder
Abstract
With the exponential growth of email communication, malicious actors increasingly embed harmful URLs in spam messages to phish, distribute malware, or facilitate fraud. Traditional rule‐based and shallow machine‐learning approaches struggle to generalize to novel URL patterns and obfuscation techniques. Deep learning, with its capacity for hierarchical feature extraction and sequence modeling, offers a promising solution for robust spam URL detection. This manuscript presents a comprehensive study of multiple deep neural architectures—including Convolutional Neural Networks (CNNs), Long Short‐Term Memory networks (LSTMs), and transformer‐based models—applied to the task of identifying spam URLs in email corpora. We detail a pipeline encompassing data collection and labeling, URL tokenization, character‐level and word‐level embeddings, and model training via stratified k-fold cross-validation. Statistical comparisons are conducted using one-way ANOVA and post-hoc testing to assess performance differentials among models.
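As a rough illustration of two of the pipeline stages named above (character-level URL encoding and stratified k-fold splitting), the following minimal sketch uses only the Python standard library. The vocabulary, the padding and unknown indices, the `max_len` value, and the helper names `encode_url` and `stratified_folds` are assumptions made for this example; they are not taken from the manuscript, and the example URLs and labels are invented.

```python
from collections import defaultdict
import random

# Assumed vocabulary: printable ASCII. Index 0 is padding, 1 is "unknown".
PAD, UNK = 0, 1
VOCAB = {chr(code): i + 2 for i, code in enumerate(range(32, 127))}

def encode_url(url, max_len=64):
    """Map a URL to a fixed-length list of character indices."""
    ids = [VOCAB.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))  # right-pad short URLs

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class ratios."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal round-robin within each class
    return folds

# Hypothetical inputs, for illustration only.
urls = ["http://example.com/login", "https://example.org/a?b=1"]
encoded = [encode_url(u, max_len=32) for u in urls]
labels = [0] * 10 + [1] * 5          # 10 benign, 5 spam
folds = stratified_folds(labels, k=5)
```

Each fold here serves once as the held-out set while the remaining folds are used for training. In practice a library routine such as scikit-learn's `StratifiedKFold` would likely replace the hand-written splitter.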
A simulation environment is developed to mimic real-world email traffic with configurable spam injection rates, enabling assessment of detection latency and throughput under varying load conditions. Results demonstrate that transformer‐based encoders achieve peak detection accuracy (95.8 % ± 0.9 %) and F1-score (0.956 ± 0.008), significantly outperforming CNN (92.3 % ± 1.2 %) and LSTM (93.1 % ± 1.0 %) baselines. The conclusions underscore the trade-offs between detection performance, computational cost, and real-time applicability, offering guidelines for deployment in enterprise email security gateways.
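The kind of load simulation described above can be sketched as follows, assuming a generator that injects spam URLs at a configurable probability and a timing loop that records per-message latency and overall throughput. The function names, the synthetic hostnames, and `placeholder_detector` (a trivial substring rule standing in for a trained model) are all hypothetical and do not reproduce the authors' environment.

```python
import random
import time

def simulate_traffic(n_messages, spam_rate, seed=0):
    """Yield (url, is_spam) pairs; spam_rate is the injection probability."""
    rng = random.Random(seed)
    for i in range(n_messages):
        is_spam = rng.random() < spam_rate
        host = "spam.example" if is_spam else "mail.example"
        yield f"http://{host}/msg/{i}", is_spam

def placeholder_detector(url):
    """Stand-in for a trained model: flags URLs by a trivial substring rule."""
    return "spam." in url

def benchmark(detector, n_messages=1000, spam_rate=0.2, seed=0):
    """Run the detector over simulated traffic; report accuracy and timing."""
    latencies, correct = [], 0
    start = time.perf_counter()
    for url, is_spam in simulate_traffic(n_messages, spam_rate, seed):
        t0 = time.perf_counter()
        pred = detector(url)
        latencies.append(time.perf_counter() - t0)
        correct += (pred == is_spam)
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / n_messages,
        "mean_latency_s": sum(latencies) / n_messages,
        "throughput_per_s": n_messages / elapsed if elapsed > 0 else float("inf"),
    }

report = benchmark(placeholder_detector, n_messages=1000, spam_rate=0.2)
```

Swapping `placeholder_detector` for a real model's prediction function and sweeping `spam_rate` and `n_messages` would give a first approximation of how latency and throughput change with load.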
License
Copyright (c) 2025 The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.