Abstract: Heterogeneous chips that combine CPUs and FPGAs can distribute processing so that the algorithm tasks are mapped onto the most suitable processing element. New software-defined high-level design environments for these chips use general-purpose languages such as C++ and OpenCL for hardware and interface generation without the need for register transfer language expertise. These advances in hardware compilers have resulted in significant increases in FPGA design productivity. In this paper, we investigate how to enhance an existing software-defined framework to reduce overheads and enable the utilization of all the available CPU cores in parallel with the FPGA hardware accelerators. Instead of selecting the best processing element for a task and simply offloading onto it, we introduce two schedulers, Dynamic and LogFit, which distribute the tasks among all the resources in an optimal manner. A new platform is created based on interrupts that removes spin-locks and allows the processing cores to sleep when not performing useful work. For a compute-intensive application, we obtained up to 45.56% more throughput and 17.89% less energy consumption when all devices of a Zynq-7000 SoC collaborate in the computation compared against FPGA-only execution.
Language: English
DOI: 10.1007/s11227-018-2367-9
Year: 2018
Published in: Journal of Supercomputing 75 (2018), 4078 - 4095
ISSN: 0920-8542
JCR impact factor: 2.157 (2018)
JCR category: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE rank: 22 / 52 = 0.423 (2018) - Q2 - T2
JCR category: ENGINEERING, ELECTRICAL & ELECTRONIC rank: 132 / 265 = 0.498 (2018) - Q2 - T2
JCR category: COMPUTER SCIENCE, THEORY & METHODS rank: 35 / 104 = 0.337 (2018) - Q2 - T2
SCImago impact factor: 0.385 - Hardware and Architecture (Q2) - Theoretical Computer Science (Q2) - Software (Q2) - Information Systems (Q2)