Lightweight asynchronous scheduling in heterogeneous reconfigurable systems
Rodríguez, A.; Navarro, A.; Nikov, K.; Nunez-Yanez, J.; Gran Tejero, R. (Universidad de Zaragoza); Suárez Gracia, D. (Universidad de Zaragoza); Asenjo, R.
Abstract:
The trend in heterogeneous embedded systems is to integrate accelerators and general-purpose CPU cores on the same die. In these integrated architectures, such as the Zynq UltraScale+ board (CPU+FPGA) targeted in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores makes the case for exploring strategies that exploit tight collaboration between the CPUs and the accelerator. In this paper we propose FastFit, a novel lightweight scheduling strategy targeted at FPGA accelerators, and MultiFastFit, a new scheduler based on it that asynchronously handles heterogeneous systems composed of a variety of CPU cores and FPGA IPs. Compared with a previous state-of-the-art auto-tuned approach, our strategy significantly reduces the overhead of automatically computing near-optimal chunk sizes, which makes it more suitable for fine-grained applications. Additionally, MultiFastFit has been designed to enable efficient co-execution of work among compute devices so that all devices are kept busy while load imbalance is minimized. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunk size for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms another auto-tuned approach by up to 2x and achieves results similar to manually tuned schedulers without requiring an offline search for the ideal CPU-FPGA partition or FPGA chunk granularity. © 2022 The Authors
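The abstract describes co-execution in which CPU cores and FPGA IPs asynchronously take work from a shared iteration space so that every device stays busy. The following is a minimal sketch of generic dynamic self-scheduling, not the authors' FastFit or MultiFastFit implementation: the class and function names (`SharedIterationSpace`, `worker`, `co_execute`), the chunk sizes, and the simulated per-iteration costs are all hypothetical, and Python threads with `time.sleep` stand in for real devices. The FPGA chunk size is simply fixed here; FastFit's actual contribution, automatically finding a near-optimal FPGA chunk size, is not modeled.

```python
import threading
import time


class SharedIterationSpace:
    """Hypothetical shared pool of loop iterations [0, n) that devices claim in chunks."""

    def __init__(self, n):
        self.next = 0
        self.n = n
        self.lock = threading.Lock()

    def claim(self, chunk):
        """Atomically reserve up to `chunk` iterations; return (begin, end) or None when empty."""
        with self.lock:
            if self.next >= self.n:
                return None
            begin = self.next
            end = min(begin + chunk, self.n)
            self.next = end
            return begin, end


def worker(name, space, chunk, cost_per_iter, log):
    # Each simulated device keeps pulling chunks until the space is exhausted,
    # so no device sits idle while iterations remain.
    while (rng := space.claim(chunk)) is not None:
        begin, end = rng
        time.sleep((end - begin) * cost_per_iter)  # stand-in for real CPU or FPGA work
        log.append((name, begin, end))


def co_execute(n, devices):
    """Run all (name, chunk_size, cost_per_iter) devices concurrently over n iterations."""
    space = SharedIterationSpace(n)
    log = []
    threads = [
        threading.Thread(target=worker, args=(name, space, chunk, cost, log))
        for name, chunk, cost in devices
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    # Assumed setup: one fast device with large chunks, two slower devices with small chunks.
    log = co_execute(1000, [("fpga", 64, 1e-6), ("cpu0", 8, 1e-5), ("cpu1", 8, 1e-5)])
    for name in ("fpga", "cpu0", "cpu1"):
        done = sum(e - b for n, b, e in log if n == name)
        print(name, done)
```

Giving the slower devices small chunks means that, near the end of the iteration space, no single large chunk is left running on a slow device while the others wait, which is the load-imbalance concern the abstract mentions. How the iterations split among devices varies from run to run.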
Language: English
DOI: 10.1016/j.sysarc.2022.102398
Year: 2022
Published in: Journal of Systems Architecture 124 (2022), 102398 [14 pp]
ISSN: 1383-7621
JCR impact factor: 4.5 (2022)
JCR category: COMPUTER SCIENCE, SOFTWARE ENGINEERING; rank: 22 / 108 = 0.204 (2022); Q1; T1
JCR category: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; rank: 11 / 54 = 0.204 (2022); Q1; T1
CiteScore: 8.5, Computer Science (Q1)
SCImago (SJR): 1.276, Software (Q1); Hardware and Architecture (Q1)
Funding: info:eu-repo/grantAgreement/ES/MICINN/PID2019-105396RB-I00
Type and form: Article (final published version)
Area (Department): Computer Architecture and Technology Area (Department of Computer Science and Systems Engineering)
You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Exported from SIDERAL (2024-03-18-13:43:25)
This article is in the following collection: Articles
Record created on 2022-05-03, last modified on 2024-03-19
Published version: PDF