000057881 001__ 57881
000057881 005__ 20190219123624.0
000057881 037__ $$aTESIS-2016-216
000057881 041__ $$aeng
000057881 080__ $$a004.3
000057881 1001_ $$aFerrerón Labari, Alexandra
000057881 24500 $$aExploiting Natural On-chip Redundancy for Energy Efficient Memory and Computing
000057881 260__ $$aZaragoza$$bUniversidad de Zaragoza, Prensas de la Universidad$$c2016
000057881 300__ $$a124
000057881 4900_ $$aTesis de la Universidad de Zaragoza$$v2016-216$$x2254-7606
000057881 500__ $$aPresented: 25 11 2016
000057881 502__ $$aThesis-Univ. Zaragoza, Informática e Ingeniería de Sistemas, 2016$$bZaragoza, Universidad de Zaragoza$$c2016
000057881 506__ $$aby-nc-nd$$bCreative Commons$$c3.0$$uhttps://creativecommons.org/licenses/by-nc-nd/3.0/
000057881 520__ $$aPower density is currently the primary design constraint across most computing segments and the main factor limiting performance. For years, industry kept power density constant while increasing frequency, by lowering transistor supply (Vdd) and threshold (Vth) voltages. However, Vth scaling has stopped because leakage current increases exponentially as Vth decreases. Transistor count and integration density keep doubling every process generation (Moore’s Law), but the power budget caps the amount of hardware that can be active at the same time, leading to dark silicon. With each new generation more resources are available, but we cannot fully exploit their performance potential. In recent years, several research trends have explored how to cope with dark silicon and unlock the energy efficiency of chips, including Near-Threshold Voltage Computing (NTC) and approximate computing. NTC aggressively lowers Vdd to values near Vth. This allows a substantial reduction in power, as dynamic power scales quadratically with supply voltage. The resulting power savings could be used to activate more chip resources and potentially improve performance. Unfortunately, Vdd scaling is limited by the tight functionality margins of on-chip SRAM transistors. When Vdd is scaled down to near-threshold values, manufacturing-induced parameter variations affect the functionality of SRAM cells, which eventually become unreliable. On the other hand, a large number of emerging applications feature an intrinsic error-resilience property, tolerating a certain amount of noise. Approximate computing takes advantage of this observation and exploits the gap between the level of accuracy required by the application and the level of accuracy delivered by the computation, provided that reducing accuracy translates into an energy gain.
However, deciding which instructions and data, and which techniques, are best suited for approximation still poses a major challenge. This dissertation contributes to both directions, near-threshold computing and approximate computing. First, it proposes a new approach to mitigating the impact of SRAM failures caused by parameter variation, enabling effective operation at ultra-low voltages. We identify two levels of natural on-chip redundancy: cache level and content level. The first arises from the replication of blocks in multi-level cache hierarchies. We exploit this redundancy with a cache management policy that allocates blocks to entries taking into account both the nature of the cache entry and the use pattern of the block. Compared with block disabling, a technique of similar complexity, this policy improves performance by between 2% and 34% without incurring any additional storage overhead. The second, content-level redundancy, arises from the redundancy of data in real-world applications. We exploit it by compressing cache blocks so that they fit in partially functional cache entries. At the cost of a slight increase in overhead, performance stays within 2% of that obtained with a cache built from fault-free cells, even when more than 90% of the cache entries have at least one faulty cell. Then, we analyze how the intrinsic noise tolerance of emerging applications can be exploited to design an approximate Instruction Set Architecture (ISA). Exploiting ISA redundancy, we explore a set of techniques to approximate the execution of instructions across a set of emerging applications, pointing out both the potential of reducing ISA complexity and the trade-offs of the approach. In a proof-of-concept implementation, the ISA is shrunk in two dimensions: Breadth (i.e., simplifying instructions) and Depth (i.e., dropping instructions). This proof of concept shows that energy can be reduced by 20.6% on average at around 14.9% accuracy loss.
000057881 6531_ $$acomputer science
000057881 6531_ $$acomputer architecture
000057881 700__ $$aAlastruey Benedé, Jesús$$edir.
000057881 700__ $$aSuárez Gracia, Darío$$edir.
000057881 7102_ $$aUniversidad de Zaragoza$$bInformática e Ingeniería de Sistemas
000057881 8560_ $$fchperez@unizar.es
000057881 8564_ $$s4211004$$uhttps://zaguan.unizar.es/record/57881/files/TESIS-2016-216.pdf$$zFull text (eng)
000057881 909CO $$ooai:zaguan.unizar.es:57881$$pdriver
000057881 909co $$ptesis
000057881 9102_ $$aArquitectura y tecn. Computadoras$$bInformática e Ingeniería de Sistemas
000057881 980__ $$aTESIS