000084317 001__ 84317
000084317 005__ 20200609132535.0
000084317 0247_ $$2doi$$a10.1093/comjnl/bxx099
000084317 0248_ $$2sideral$$a103086
000084317 037__ $$aART-2017-103086
000084317 041__ $$aeng
000084317 100__ $$aRodríguez-Rodríguez, Roberto
000084317 245__ $$aReuse Detector: Improving the management of STT-RAM SLLCs
000084317 260__ $$c2017
000084317 5060_ $$aAccess copy available to the general public$$fUnrestricted
000084317 5203_ $$aVarious constraints of Static Random Access Memory (SRAM) are leading to consider new memory technologies as candidates for building on-chip shared last-level caches (SLLCs). Spin-Transfer Torque RAM (STT-RAM) is currently postulated as the prime contender due to its better energy efficiency, smaller die footprint and higher scalability. However, STT-RAM also exhibits some drawbacks, like slow and energy-hungry write operations that need to be mitigated before it can be used in SLLCs for the next generation of computers. In this work, we address these shortcomings by leveraging a new management mechanism for STT-RAM SLLCs. This approach is based on the previous observation that although the stream of references arriving at the SLLC of a Chip MultiProcessor (CMP) exhibits limited temporal locality, it does exhibit reuse locality, i.e. those blocks referenced several times manifest high probability of forthcoming reuse. As such, conventional STT-RAM SLLC management mechanisms, mainly focused on exploiting temporal locality, result in low efficient behavior. In this paper, we employ a cache management mechanism that selects the contents of the SLLC aimed to exploit reuse locality instead of temporal locality. Specifically, our proposal consists in the inclusion of a Reuse Detector (RD) between private cache levels and the STT-RAM SLLC. Its mission is to detect blocks that do not exhibit reuse, in order to avoid their insertion in the SLLC, hence reducing the number of write operations and the energy consumption in the STT-RAM. Our evaluation, using multiprogrammed workloads in quad-core, eight-core and 16-core systems, reveals that our scheme reports on average, energy reductions in the SLLC in the range of 37–30%, additional energy savings in the main memory in the range of 6–8% and performance improvements of 3% (quad-core), 7% (eight-core) and 14% (16-core) compared with an STT-RAM SLLC baseline where no RD is employed. More importantly, our approach outperforms DASCA, the state-of-the-art STT-RAM SLLC management, reporting—depending on the specific scenario and the kind of applications used—SLLC energy savings in the range of 4–11% higher than those of DASCA, delivering higher performance in the range of 1.5–14% and additional improvements in DRAM energy consumption in the range of 2–9% higher than DASCA.
000084317 536__ $$9info:eu-repo/grantAgreement/EC/H2020/687698/EU/High Performance and Embedded Architecture and Compilation/HiPEAC$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 687698-HiPEAC$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2012-32180$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2015-65277-R
000084317 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc$$uhttp://creativecommons.org/licenses/by-nc/3.0/es/
000084317 590__ $$a0.792$$b2017
000084317 591__ $$aCOMPUTER SCIENCE, THEORY & METHODS$$b75 / 103 = 0.728$$c2017$$dQ3$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, SOFTWARE ENGINEERING$$b85 / 104 = 0.817$$c2017$$dQ4$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, HARDWARE & ARCHITECTURE$$b47 / 52 = 0.904$$c2017$$dQ4$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, INFORMATION SYSTEMS$$b131 / 148 = 0.885$$c2017$$dQ4$$eT3
000084317 592__ $$a0.319$$b2017
000084317 593__ $$aComputer Science (miscellaneous)$$c2017$$dQ2
000084317 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000084317 700__ $$aDíaz, Javier
000084317 700__ $$aCastro, Fernando
000084317 700__ $$0(orcid)0000-0002-5916-7898$$aIbáñez, Pablo$$uUniversidad de Zaragoza
000084317 700__ $$aChaver, Daniel
000084317 700__ $$0(orcid)0000-0002-5976-1352$$aViñals, Víctor$$uUniversidad de Zaragoza
000084317 700__ $$aSáez, Juan Carlos
000084317 700__ $$aPrieto, Manuel
000084317 700__ $$aPiñuel, Luis
000084317 700__ $$aMonreal, Teresa
000084317 700__ $$aLlabería, José María
000084317 7102_ $$15007$$2035$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Arquit.Tecnología Comput.
000084317 773__ $$g61, 6 (2017), 856 – 880$$pComput. j.$$tCOMPUTER JOURNAL$$x0010-4620
000084317 8564_ $$s291180$$uhttps://zaguan.unizar.es/record/84317/files/texto_completo.pdf$$yPostprint
000084317 8564_ $$s120809$$uhttps://zaguan.unizar.es/record/84317/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000084317 909CO $$ooai:zaguan.unizar.es:84317$$particulos$$pdriver
000084317 951__ $$a2020-06-09-13:22:40
000084317 980__ $$aARTICLE