Universidad de Zaragoza
Held by the Library of the Universidad de Zaragoza
PREMIS plugin for CDS Invenio, developed by Miguel Martín
Miguel Martín González
oai:zaguan.unizar.es:9229
2017-08-31
eng
Ferrerón Labari, Alexandra
Suárez Gracia, Darío
Efficient instruction and data caching for high-performance low-power embedded systems
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.pdf
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919.pdf
Although multi-threading processors can increase the performance of embedded systems with minimal overhead, fetching instructions from multiple threads each cycle also increases the pressure on the instruction cache, potentially harming the performance/consumption ratio. Instruction caches are responsible for a high percentage of the total energy consumption of the chip, which becomes a critical issue for battery-powered embedded devices. A direct way to reduce the energy consumption of the first-level instruction cache is to decrease its size and associativity. However, demanding applications, and especially applications with several threads running together, might suffer a dramatic performance slowdown, or even increase the total energy consumption of the cache hierarchy, due to the extra misses incurred. In this work we introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional second-level cache (L2) and improves the Energy-Delay of the system. We provide iLP-NUCA with a new tree-based transport network-in-cache that reduces both the cache line service latency and the energy consumption with respect to the former LP-NUCA implementation. We modeled both conventional instruction hierarchies and iLP-NUCAs in our cycle-accurate simulation environment. Our experiments show that, running SPEC CPU2006, iLP-NUCA performs better and consumes less energy than a state-of-the-art high-performance conventional cache hierarchy (three cache levels, dedicated L1 and L2, shared L3). Furthermore, iLP-NUCA reaches, on average, the performance of a conventional instruction cache hierarchy implementing a double-sized L1, independently of the number of threads. This translates into a reduction of the Energy-Delay product of 21%, 18%, and 11%, reaching 90%, 95%, and 99% of the ideal performance for 1, 2, and 4 threads, respectively.
These results are consistent across the considered application distribution, with larger gains in the most demanding applications (those with high instruction cache requirements). Moreover, we increase the performance of applications with several threads without penalizing any of them. The new transport topology reduces the average service latency of cache lines by 8%, and the energy consumption of its components by 20%.
2014-11-27
9229
20170831220420.0
TAZ-TFM-2012-919
eng
Ferrerón Labari, Alexandra
Efficient instruction and data caching for high-performance low-power embedded systems
Zaragoza
Universidad de Zaragoza
2012
by-nc-sa
Creative Commons
3.0
http://creativecommons.org/licenses/by-nc-sa/3.0/
Abstract also available in Spanish.
Although multi-threading processors can increase the performance of embedded systems with minimal overhead, fetching instructions from multiple threads each cycle also increases the pressure on the instruction cache, potentially harming the performance/consumption ratio. Instruction caches are responsible for a high percentage of the total energy consumption of the chip, which becomes a critical issue for battery-powered embedded devices. A direct way to reduce the energy consumption of the first-level instruction cache is to decrease its size and associativity. However, demanding applications, and especially applications with several threads running together, might suffer a dramatic performance slowdown, or even increase the total energy consumption of the cache hierarchy, due to the extra misses incurred. In this work we introduce iLP-NUCA (Instruction Light Power NUCA), a new instruction cache that replaces the conventional second-level cache (L2) and improves the Energy-Delay of the system. We provide iLP-NUCA with a new tree-based transport network-in-cache that reduces both the cache line service latency and the energy consumption with respect to the former LP-NUCA implementation. We modeled both conventional instruction hierarchies and iLP-NUCAs in our cycle-accurate simulation environment. Our experiments show that, running SPEC CPU2006, iLP-NUCA performs better and consumes less energy than a state-of-the-art high-performance conventional cache hierarchy (three cache levels, dedicated L1 and L2, shared L3). Furthermore, iLP-NUCA reaches, on average, the performance of a conventional instruction cache hierarchy implementing a double-sized L1, independently of the number of threads. This translates into a reduction of the Energy-Delay product of 21%, 18%, and 11%, reaching 90%, 95%, and 99% of the ideal performance for 1, 2, and 4 threads, respectively.
These results are consistent across the considered application distribution, with larger gains in the most demanding applications (those with high instruction cache requirements). Moreover, we increase the performance of applications with several threads without penalizing any of them. The new transport topology reduces the average service latency of cache lines by 8%, and the energy consumption of its components by 20%.
Máster Universitario en Ingeniería de Sistemas e Informática
Rights governed by a Creative Commons license
computer architecture
cache memory
multi-thread
embedded systems
Suárez Gracia, Darío
dir.
Universidad de Zaragoza
Informática e Ingeniería de Sistemas
Arquitectura y Tecnología de Computadores
Alastruey Benedé, Jesús
advisor
ferreron@unizar.es
897144
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.pdf
Annexes (eng)
Annexes (eng)
966806
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919.pdf
Report (eng)
oai:zaguan.unizar.es:9229
trabajos-fin-master
driver
TAZ
TFM
EINA
URI
https://zaguan.unizar.es/record/9229
SUPPORTED
0
MD5
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.md5
0
image/x.djvu
6
http://djvu.sourceforge.net/abstract.html
DJVU/6
Profile information
Lizardtech Document Express Enterprise
5.1
0
URI
https://zaguan.unizar.es/record/9229/files/TAZ-TFM-2012-919_ANE.pdf
disk
Minimum
View
Print
Viewing DjVu files requires specific software, such as the DjVu Browser Plugin
URI
http://creativecommons.org/licenses/by-nc/3.0
URI
http://creativecommons.org/licenses/by-nc/3.0
license
URI
http://creativecommons.org/licenses/by-nc/3.0
You are free to adapt, copy, transmit, or distribute the work under the following conditions:
(1) You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
(2) You may not use this work for commercial purposes.
(3) For any reuse or distribution, you must make clear to others the license terms of this work.
(4) Any of the above conditions can be waived if you get permission from the copyright holder.
(5) Nothing in this license impairs or restricts the author's moral rights.
This object is licensed under Creative Commons Attribution-NonCommercial 3.0 (further details: http://creativecommons.org/licenses/by-nc/3.0/).
Universidad de Zaragoza
Automatizacion de Bibliotecas
Edif. Matematicas, Pedro Cerbuna 12, 50009 Zaragoza
auto.buz@unizar.es