GenArchBench: A genomics benchmark suite for arm HPC processors

López-Villellas, Lorién; Moretó, Miquel; Setoain, Javier; Ono, Makoto; Aguado-Puig, Quim; Kim, Chulho; López-Paradís, Guillem; Soria-Pardos, Víctor; Doblas, Max; Langarita-Benítez, Rubén; Marco-Sola, Santiago; Ibáñez, Pablo; Badouh, Asaf; Alastruey-Benedé, Jesús; Armejach, Adrià
doi:10.1016/j.future.2024.03.050
000134533 001__ 134533
000134533 005__ 20260217205450.0
000134533 0247_ $$2doi$$a10.1016/j.future.2024.03.050
000134533 0248_ $$2sideral$$a138191
000134533 037__ $$aART-2024-138191
000134533 041__ $$aeng
000134533 100__ $$0(orcid)0000-0002-1891-4359$$aLópez-Villellas, Lorién$$uUniversidad de Zaragoza
000134533 245__ $$aGenArchBench: A genomics benchmark suite for arm HPC processors
000134533 260__ $$c2024
000134533 5060_ $$aAccess copy available to the general public$$fUnrestricted
000134533 5203_ $$aArm usage has substantially grown in the High-Performance Computing (HPC) community. Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June 2022, currently sitting in the fourth position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm Graviton3 processors. Projects like European Mont-Blanc and U.S. DOE/NNSA Astra are further examples of Arm irruption in HPC. In parallel, over the last decade, the rapid improvement of genomic sequencing technologies and the exponential growth of sequencing data have placed a significant bottleneck on the computational side. While most genomics applications have been thoroughly tested and optimized for x86 systems, just a few are prepared to perform efficiently on Arm machines. Moreover, these applications do not exploit the newly introduced Scalable Vector Extensions (SVE).
This paper presents GenArchBench, the first genome analysis benchmark suite targeting Arm architectures. We have selected computationally demanding kernels from the most widely used tools in genome data analysis and ported them to Arm-based A64FX and Graviton3 processors. Overall, the GenArch benchmark suite comprises 13 multi-core kernels from critical stages of widely-used genome analysis pipelines, including base-calling, read mapping, variant calling, and genome assembly. Our benchmark suite includes different input data sets per kernel (small and large), each with a corresponding regression test to verify the correctness of each execution automatically. Moreover, the porting features the usage of the novel Arm SVE instructions, algorithmic and code optimizations, and the exploitation of Arm-optimized libraries. We present the optimizations implemented in each kernel and a detailed performance evaluation and comparison of their performance on four different HPC machines (i.e., A64FX, Graviton3, Intel Xeon Skylake Platinum, and AMD EPYC Rome). Overall, the experimental evaluation shows that Graviton3 outperforms other machines on average. Moreover, we observed that the performance of the A64FX is significantly constrained by its small memory hierarchy and latencies. Additionally, as proof of concept, we study the performance of a production-ready tool that exploits two of the ported and optimized genomic kernels.
000134533 536__ $$9info:eu-repo/grantAgreement/ES/MICINN/TED2021-132634A-I00$$9info:eu-repo/grantAgreement/ES/MICINN/PID2022-136454NB-C22$$9info:eu-repo/grantAgreement/ES/MICINN/PID2019-107255GB-C21-AEI-10.13039/501100011033$$9info:eu-repo/grantAgreement/ES/MICINN/PID2019-105660RB-C21$$9info:eu-repo/grantAgreement/ES/DGA/T58-23R
000134533 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc$$uhttps://creativecommons.org/licenses/by-nc/4.0/deed.es
000134533 590__ $$a6.1$$b2024
000134533 592__ $$a1.551$$b2024
000134533 591__ $$aCOMPUTER SCIENCE, THEORY & METHODS$$b15 / 147 = 0.102$$c2024$$dQ1$$eT1
000134533 593__ $$aComputer Networks and Communications$$c2024$$dQ1
000134533 593__ $$aSoftware$$c2024$$dQ1
000134533 593__ $$aHardware and Architecture$$c2024$$dQ1
000134533 594__ $$a17.1$$b2024
000134533 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000134533 700__ $$aLangarita-Benítez, Rubén
000134533 700__ $$aBadouh, Asaf
000134533 700__ $$aSoria-Pardos, Víctor
000134533 700__ $$aAguado-Puig, Quim
000134533 700__ $$aLópez-Paradís, Guillem
000134533 700__ $$aDoblas, Max
000134533 700__ $$aSetoain, Javier
000134533 700__ $$aKim, Chulho
000134533 700__ $$aOno, Makoto
000134533 700__ $$aArmejach, Adrià
000134533 700__ $$aMarco-Sola, Santiago
000134533 700__ $$0(orcid)0000-0003-4164-5078$$aAlastruey-Benedé, Jesús$$uUniversidad de Zaragoza
000134533 700__ $$0(orcid)0000-0002-5916-7898$$aIbáñez, Pablo$$uUniversidad de Zaragoza
000134533 700__ $$aMoretó, Miquel
000134533 7102_ $$15007$$2035$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Arquit.Tecnología Comput.
000134533 773__ $$g157 (2024), 313-329$$pFuture gener. comput. syst.$$tFuture Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications$$x0167-739X
000134533 8564_ $$s1833596$$uhttps://zaguan.unizar.es/record/134533/files/texto_completo.pdf$$yPublished version
000134533 8564_ $$s2491666$$uhttps://zaguan.unizar.es/record/134533/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000134533 909CO $$ooai:zaguan.unizar.es:134533$$particulos$$pdriver
000134533 951__ $$a2026-02-17-20:18:50
000134533 980__ $$aARTICLE
Universidad de Zaragoza Repository