Lightweight asynchronous scheduling in heterogeneous reconfigurable systems

Rodríguez, A.; Asenjo, R.; Navarro, A.; Nunez-Yanez, J.; Suárez Gracia, D.; Gran Tejero, R.; Nikov, K.

doi:10.1016/j.sysarc.2022.102398


Rodríguez, A.; Navarro, A.; Nikov, K.; Nunez-Yanez, J.; Gran Tejero, R. (Universidad de Zaragoza); Suárez Gracia, D. (Universidad de Zaragoza); Asenjo, R.

Abstract: The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores make the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper, we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems composed of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are kept busy while load imbalance is minimized. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms another auto-tuned approach by up to 2x and achieves results similar to those of manually tuned schedulers without requiring an offline search for the ideal CPU-FPGA partition or FPGA chunk granularity. © 2022 The Authors
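The abstract describes co-execution in which every CPU core and FPGA IP stays busy by pulling work asynchronously from a shared iteration space. The Python sketch below illustrates only that general idea using plain dynamic self-scheduling. It is not FastFit or MultiFastFit, which additionally compute the FPGA chunksize automatically. The device names (`fpga0`, `cpu0`, `cpu1`), the chunk sizes, and all function names are hypothetical. Each "device" is simulated by a host thread that runs the same kernel.

```python
import threading


class SharedIterationSpace:
    """Hands out the iteration range [0, n) in chunks, guarded by a lock."""

    def __init__(self, n):
        self._next = 0
        self._n = n
        self._lock = threading.Lock()

    def grab(self, chunk):
        # Atomically reserve up to `chunk` iterations; None when exhausted.
        with self._lock:
            if self._next >= self._n:
                return None
            begin = self._next
            end = min(begin + chunk, self._n)
            self._next = end
            return begin, end


def device_worker(name, chunk, space, kernel, log):
    # Each simulated device keeps requesting chunks of its own size until
    # the iteration space is empty, so no device idles while work remains.
    while True:
        rng = space.grab(chunk)
        if rng is None:
            return
        kernel(*rng)
        log.append((name, rng))  # list.append is atomic under CPython's GIL


def co_execute(n, devices, kernel):
    """Run `kernel(begin, end)` over [0, n) on (name, chunk) devices."""
    space = SharedIterationSpace(n)
    log = []
    threads = [
        threading.Thread(target=device_worker,
                         args=(name, chunk, space, kernel, log))
        for name, chunk in devices
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    out = [0] * 1000

    def square_kernel(begin, end):
        for i in range(begin, end):
            out[i] = i * i

    # Hypothetical configuration: a large chunk for the accelerator and
    # small chunks for CPU cores, which helps balance the tail of the loop.
    log = co_execute(1000, [("fpga0", 256), ("cpu0", 16), ("cpu1", 16)],
                     square_kernel)
    print(len(log), "chunks executed")
```

In this sketch, the chunk size is fixed per device. The point of an auto-tuning strategy such as the one the abstract describes is to choose the accelerator's chunk size at run time rather than by an offline search.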
Language: English
DOI: 10.1016/j.sysarc.2022.102398
Year: 2022
Published in: Journal of Systems Architecture 124 (2022), 102398 [14 pp]
ISSN: 1383-7621
JCR impact factor: 4.5 (2022)
JCR category: COMPUTER SCIENCE, SOFTWARE ENGINEERING rank: 22 / 108 = 0.204 (2022) - Q1 - T1
JCR category: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE rank: 11 / 54 = 0.204 (2022) - Q1 - T1
CiteScore: 8.5 - Computer Science (Q1)

SCImago (SJR): 1.276 - Software (Q1) - Hardware and Architecture (Q1)

Funding: info:eu-repo/grantAgreement/ES/MICINN/PID2019-105396RB-I00
Type: Article (final version)
Area (Department): Computer Architecture and Technology (Department of Computer Science and Systems Engineering)

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Exported from SIDERAL (2024-03-18-13:43:25)


Record created on 2022-05-03, last modified on 2024-03-19
