Efficient String Matching Algorithms in DNA Sequence Analysis

Er. Niharika Singh

doi:10.63345/

Efficient String Matching Algorithms in DNA Sequence Analysis

Authors

Er. Niharika Singh ABES Engineering College, Crossings Republik, Ghaziabad, Uttar Pradesh 201009 niharika250104@gmail.com Author

DOI:

https://doi.org/10.63345/

Keywords:

DNA sequence analysis; string matching; FM-index; seed-and-extend; edit distance; read mapping; algorithm efficiency

Abstract

String matching underpins nearly every task in computational genomics—from mapping next-generation sequencing (NGS) reads to reference genomes and detecting motifs to calling variants and scanning regulatory regions. The volume and heterogeneity of contemporary sequencing data place extreme demands on algorithmic efficiency, with practical constraints on time, memory, and accuracy under sequencing errors and biological variation. This manuscript surveys the algorithmic foundations of efficient string matching for DNA sequence analysis, distinguishes exact from approximate matching, and clarifies when to favor hash-based seeding, automata, or index-based approaches such as suffix arrays and FM-indexes. We complement the conceptual review with a controlled simulation that compares six representative techniques—Knuth–Morris–Pratt (KMP), Boyer–Moore, Rabin–Karp, a bit-parallel edit-distance method (Myers), an FM-index (Burrows–Wheeler–transform based), and a seed-and-extend strategy with spaced seeds—on synthetic read-mapping tasks.

Evaluation emphasizes sensitivity, precision, F1-score, throughput (reads per second), and memory footprint. Results highlight trade-offs: FM-index and seed-and-extend approaches achieve the best F1 and throughput on noisy reads, while classical exact matchers are fast but brittle to mismatches/indels. We discuss statistical significance, effect sizes, and practical guidance for algorithm selection across read length, error profiles, and genome repetitiveness. We conclude with scope and limitations relevant to large genomes, long-read technologies, and structural variation, offering a roadmap for deploying efficient string matching in modern bioinformatics pipelines.

Downloads

Download data is not yet available.

Downloads

Additional Files

Confirmation Letter ⁰

Published

2026-05-03

Issue

Vol. 2 No. 2 (2026): Apr-Jun 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Articles are published under the Creative Commons Attribution NonCommercial 4.0 License (CC BY NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.

How to Cite

Singh, Er. Niharika. “Efficient String Matching Algorithms in DNA Sequence Analysis”. International Journal of Advanced Research in Computer Science and Engineering (IJARCSE) U.S. ISSN: 3071-0154 2, no. 2 (May 3, 2026): May (14–26). Accessed June 12, 2026. https://ijarcse.org/index.php/ijarcse/article/view/143.

Download Citation

Efficient String Matching Algorithms in DNA Sequence Analysis

Authors

DOI:

Keywords:

Abstract

Downloads

Downloads

Additional Files

Published

Issue

Section

License

How to Cite

Similar Articles

MakeSubmission

Call Submission

Information

Visitors

Keywords

Similar Articles

Multi-View Clustering Algorithms for Big Data Analytics

AI-Based Route Optimization in Urban Public Transport Networks

Post-Quantum Cryptographic Algorithm Performance on IoT Devices

Graph Analytics for Community Detection in Social Media Data

Optimization of Search Algorithms for High-Dimensional Data Spaces

Energy-Efficient Resource Allocation in Green Cloud Infrastructure

IoT-Enabled Smart Waste Management Systems Using RFID and Sensors

Explainability-Driven Feature Selection for Financial Fraud Detection

Real-Time Air Quality Monitoring Using Edge IoT Gateways

Comparative Study of Bio-Inspired Algorithms for Task Scheduling