Lightweight asynchronous scheduling in heterogeneous reconfigurable systems
Abstract: The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, such as the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores makes the case for exploring strategies that exploit tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted at FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems comprising a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing near-optimal chunk sizes compared to a previous state-of-the-art auto-tuned approach, which makes it more suitable for fine-grained applications. Additionally, MultiFastFit has been designed to enable the efficient co-execution of work among compute devices so that all the devices are kept busy while load imbalance is minimized. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunk size for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms another auto-tuned approach by up to 2x and achieves results similar to manually tuned schedulers without requiring an offline search for the ideal CPU-FPGA partition or FPGA chunk granularity. © 2022 The Authors
Language: English
DOI: 10.1016/j.sysarc.2022.102398
Year: 2022
Published in: Journal of Systems Architecture 124 (2022), 102398 [14 pp]
ISSN: 1383-7621

JCR impact factor: 4.5 (2022)
JCR category: COMPUTER SCIENCE, SOFTWARE ENGINEERING rank: 22 / 108 = 0.204 (2022) - Q1 - T1
JCR category: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE rank: 11 / 54 = 0.204 (2022) - Q1 - T1

CITESCORE impact factor: 8.5 - Computer Science (Q1)

SCIMAGO impact factor: 1.276 - Software (Q1) - Hardware and Architecture (Q1)

Funding: info:eu-repo/grantAgreement/ES/MICINN/PID2019-105396RB-I00
Type and form: Article (Published version)
Area (Department): Computer Architecture and Technology Area (Dept. of Computer Science and Systems Engineering)

Creative Commons: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.


Exported from SIDERAL (2024-03-18-13:43:25)



This article is in the following collections:
Articles > Articles by area > Computer Architecture and Technology



 Record created 2022-05-03, last modified 2024-03-19


Published version: PDF