Universidad de Zaragoza. Curated by the Biblioteca de la Universidad de Zaragoza.
Record: oai:zaguan.unizar.es:9229 (datestamp: 2017-08-31)
Language: eng
Author: Ferrerón Labari, Alexandra
Advisor: Suárez Gracia, Darío
Title: Efficient instruction and data caching for high-performance low-power embedded systems
Files:
  https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.pdf
  https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919.pdf

Abstract: Although multi-threading processors can increase the performance of embedded systems with minimal overhead, fetching instructions from multiple threads each cycle also increases the pressure on the instruction cache, potentially harming the performance/consumption ratio. Instruction caches are responsible for a high percentage of the total energy consumption of the chip, which becomes a critical issue for battery-powered embedded devices. A direct way to reduce the energy consumption of the first-level instruction cache is to decrease its size and associativity. However, demanding applications, and especially applications with several threads running together, might suffer a dramatic performance slowdown, or even increase the total energy consumption of the cache hierarchy, due to the extra misses incurred. In this work we introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional second-level cache (L2) and improves the Energy–Delay of the system. We provide iLP-NUCA with a new tree-based transport network-in-cache that reduces both the cache line service latency and the energy consumption with respect to the former LP-NUCA implementation. We modeled both conventional instruction hierarchies and iLP-NUCAs in our cycle-accurate simulation environment. Our experiments show that, running SPEC CPU2006, iLP-NUCA performs better and consumes less energy than a state-of-the-art high-performance conventional cache hierarchy (three cache levels, dedicated L1 and L2, shared L3).
Furthermore, iLP-NUCA reaches, on average, the performance of a conventional instruction cache hierarchy implementing a double-sized L1, independently of the number of threads. This translates into a reduction of the Energy–Delay product of 21%, 18%, and 11%, reaching 90%, 95%, and 99% of the ideal performance for 1, 2, and 4 threads, respectively. These results are consistent across the considered application distribution, and the biggest gains appear in the most demanding applications (those with high instruction cache requirements). Besides, we increase the performance of applications with several threads without being detrimental to any of them. The new transport topology reduces the average service latency of cache lines by 8%, and the energy consumption of its components by 20%.

Date: 2014-11-27
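The Energy–Delay (ED) product cited in the abstract is the product of total energy consumed and execution time, so it penalizes designs that save energy only by running slower (lower is better). A minimal sketch of how such a reduction is computed; the energy and delay figures below are hypothetical, chosen only so the result matches the 21% single-thread reduction reported above:

```python
def energy_delay(energy_j: float, delay_s: float) -> float:
    """ED product of a run: energy (joules) times execution time (seconds)."""
    return energy_j * delay_s

# Hypothetical baseline vs. improved cache hierarchy (not thesis measurements):
baseline = energy_delay(energy_j=2.0, delay_s=1.0)   # 2.0 J*s
improved = energy_delay(energy_j=1.58, delay_s=1.0)  # 1.58 J*s

reduction = 1 - improved / baseline
print(f"ED reduction: {reduction:.0%}")  # -> ED reduction: 21%
```

Because energy and delay are multiplied, a design that cuts energy 21% at equal performance, or cuts delay 21% at equal energy, yields the same ED improvement.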
Record ID: 9229 (last modified: 2017-08-31)
Identifier: TAZ-TFM-2012-919
Language: eng
Author: Ferrerón Labari, Alexandra
Title: Efficient instruction and data caching for high-performance low-power embedded systems
Published: Zaragoza, Universidad de Zaragoza, 2012
Rights: Creative Commons by-nc-sa 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
Note: Abstract also available in Spanish.
Degree: Máster Universitario en Ingeniería de Sistemas e Informática
Keywords: computer architecture; cache memory; multi-thread; embedded systems
Advisor: Suárez Gracia, Darío (dir.)
Institution: Universidad de Zaragoza, Informática e Ingeniería de Sistemas, Arquitectura y Tecnología de Computadores
Rapporteur: Alastruey Benedé, Jesús (ponente)
Contact: ferreron@unizar.es
Files:
  https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.pdf (Anexos, eng, 897144 bytes)
  https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919.pdf (Memoria, eng, 966806 bytes)
Collections: trabajos-fin-master; TAZ; TFM; EINA
Record URI: https://zaguan.unizar.es/record/9229
Note: the annex is stored in DJVU format (image/x.djvu, DJVU/6); viewing it requires specific software, such as the DjVu Browser Plugin.
License: http://creativecommons.org/licenses/by-nc/3.0 You are free
to adapt, copy, transmit, or distribute the work under the following conditions: (1) you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work); (2) you may not use this work for commercial purposes; (3) for any reuse or distribution, you must make clear to others the license terms of this work; (4) any of the above conditions can be waived if you get permission from the copyright holder; (5) nothing in this license impairs or restricts the author's moral rights. This object is licensed under Creative Commons Attribution-NonCommercial 3.0 (further details: http://creativecommons.org/licenses/by-nc/3.0/).

Universidad de Zaragoza, Automatización de Bibliotecas, Edif. Matemáticas, Pedro Cerbuna 12, 50009 Zaragoza. Contact: auto.buz@unizar.es