Phase transition in the computational complexity of the shortest common superstring and genome assembly
Resumen: Genome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the “easy” phase (solvable by polynomial-time algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime.
Idioma: Inglés
DOI: 10.1103/PhysRevE.109.014133
Año: 2024
Publicado en: Physical Review E 109, 1 (2024), 014133 [8 pp.]
ISSN: 2470-0045

Factor impacto JCR: 2.4 (2024)
Categ. JCR: PHYSICS, MATHEMATICAL rank: 13 / 61 = 0.213 (2024) - Q1 - T1
Categ. JCR: PHYSICS, FLUIDS & PLASMAS rank: 17 / 41 = 0.415 (2024) - Q2 - T2

Factor impacto SCIMAGO: 0.705 - Condensed Matter Physics (Q2) - Statistics and Probability (Q2) - Statistical and Nonlinear Physics (Q2)

Financiación: info:eu-repo/grantAgreement/ES/MICINN AEI/PID2022-136374NB-C21
Tipo y forma: Artículo (PostPrint)

Derechos Reservados Derechos reservados por el editor de la revista


Exportado de SIDERAL (2025-09-22-14:31:28)


Visitas y descargas

Este artículo se encuentra en las siguientes colecciones:
Artículos



 Registro creado el 2025-01-15, última modificación el 2025-09-23


Postprint:
 PDF
Valore este documento:

Rate this document:
1
2
3
 
(Sin ninguna reseña)