<?xml version="1.0" encoding="UTF-8"?>
<collection>
<dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:invenio="http://invenio-software.org/elements/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:identifier>doi:10.1109/PACT65351.2025.00035</dc:identifier><dc:language>eng</dc:language><dc:creator>Langarita, Rubén</dc:creator><dc:creator>Alastruey-Benedé, Jesús</dc:creator><dc:creator>Ibáñez-Marín, Pablo</dc:creator><dc:creator>Marco-Sola, Santiago</dc:creator><dc:creator>Moretó, Miquel</dc:creator><dc:creator>Armejach, Adrià</dc:creator><dc:title>Squire: a general-purpose accelerator to exploit fine-grain parallelism on dependency-bound kernels</dc:title><dc:identifier>ART-2025-147666</dc:identifier><dc:description>Multiple HPC applications are often bottlenecked by compute-intensive kernels implementing complex dependency patterns (data-dependency bound). Traditional general-purpose accelerators struggle to effectively exploit fine-grain parallelism due to limitations in implementing convoluted data-dependency patterns (like SIMD) and overheads due to synchronization and data transfers (like GPGPUs). In contrast, custom FPGA and ASIC designs offer improved performance and energy efficiency at a high cost in hardware design and programming complexity and often lack the flexibility to process different workloads. We propose Squire, a general-purpose accelerator designed to exploit fine-grain parallelism effectively on dependency-bound kernels. Each Squire accelerator has a set of general-purpose low-power in-order cores that can rapidly communicate among themselves and directly access data from the L2 cache. Our proposal integrates one Squire accelerator per core in a typical multicore system, allowing the acceleration of dependency-bound kernels within parallel tasks with minimal software changes. 
As a case study, we evaluate Squire’s effectiveness by accelerating five kernels that implement complex dependency patterns. We use three of these kernels to build an end-to-end read-mapping tool that will be used to evaluate Squire. Squire obtains speedups up to 7.64× in dynamic programming kernels. Overall, Squire provides an acceleration for an end-to-end application of 3.66×. In addition, Squire reduces energy consumption by up to 56% with a minimal area overhead of 10.5% compared to a Neoverse N1 baseline.</dc:description><dc:date>2025</dc:date><dc:source>http://zaguan.unizar.es/record/168108</dc:source><dc:doi>10.1109/PACT65351.2025.00035</dc:doi><dc:identifier>http://zaguan.unizar.es/record/168108</dc:identifier><dc:identifier>oai:zaguan.unizar.es:168108</dc:identifier><dc:relation>info:eu-repo/grantAgreement/ES/AEI/PID2022-136454NB-C22</dc:relation><dc:relation>info:eu-repo/grantAgreement/ES/AEI/PID2023-146193OB-I00</dc:relation><dc:relation>info:eu-repo/grantAgreement/ES/DGA/T58-23R</dc:relation><dc:relation>info:eu-repo/grantAgreement/ES/MICIU/PID2023-146511NB-I00</dc:relation><dc:identifier.citation>Proceedings of the Conference on Parallel Architectures and Compilation Techniques (2025), 292-305</dc:identifier.citation><dc:rights>All rights reserved</dc:rights><dc:rights>http://www.europeana.eu/rights/rr-f/</dc:rights><dc:rights>info:eu-repo/semantics/openAccess</dc:rights></dc:dc>

</collection>