000084317 001__ 84317
000084317 005__ 20200609132535.0
000084317 0247_ $$2doi$$a10.1093/comjnl/bxx099
000084317 0248_ $$2sideral$$a103086
000084317 037__ $$aART-2017-103086
000084317 041__ $$aeng
000084317 100__ $$aRodríguez-Rodríguez, Roberto
000084317 245__ $$aReuse Detector: Improving the management of STT-RAM SLLCs
000084317 260__ $$c2017
000084317 5060_ $$aAccess copy available to the general public$$fUnrestricted
000084317 5203_ $$aVarious constraints of Static Random Access Memory (SRAM) are prompting the consideration of new memory technologies as candidates for building on-chip shared last-level caches (SLLCs). Spin-Transfer Torque RAM (STT-RAM) is currently postulated as the prime contender due to its better energy efficiency, smaller die footprint and higher scalability. However, STT-RAM also exhibits some drawbacks, such as slow and energy-hungry write operations, that need to be mitigated before it can be used in SLLCs for the next generation of computers. In this work, we address these shortcomings by leveraging a new management mechanism for STT-RAM SLLCs. This approach is based on the previous observation that, although the stream of references arriving at the SLLC of a Chip MultiProcessor (CMP) exhibits limited temporal locality, it does exhibit reuse locality, i.e. blocks referenced several times show a high probability of forthcoming reuse. As such, conventional STT-RAM SLLC management mechanisms, mainly focused on exploiting temporal locality, behave inefficiently. In this paper, we employ a cache management mechanism that selects the contents of the SLLC so as to exploit reuse locality instead of temporal locality. Specifically, our proposal consists of including a Reuse Detector (RD) between the private cache levels and the STT-RAM SLLC. Its mission is to detect blocks that do not exhibit reuse, in order to avoid their insertion in the SLLC, thereby reducing the number of write operations and the energy consumption in the STT-RAM. Our evaluation, using multiprogrammed workloads in quad-core, eight-core and 16-core systems, shows that our scheme achieves, on average, energy reductions in the SLLC in the range of 37–30%, additional energy savings in the main memory in the range of 6–8% and performance improvements of 3% (quad-core), 7% (eight-core) and 14% (16-core) compared with an STT-RAM SLLC baseline where no RD is employed. More importantly, our approach outperforms DASCA, the state-of-the-art STT-RAM SLLC management scheme. Depending on the specific scenario and the kind of applications used, it reports SLLC energy savings in the range of 4–11% higher than those of DASCA, higher performance in the range of 1.5–14% and additional improvements in DRAM energy consumption in the range of 2–9% higher than DASCA.
000084317 536__ $$9info:eu-repo/grantAgreement/EC/H2020/687698/EU/High Performance and Embedded Architecture and Compilation/HiPEAC$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 687698-HiPEAC$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2012-32180$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2015-65277-R
000084317 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc$$uhttp://creativecommons.org/licenses/by-nc/3.0/es/
000084317 590__ $$a0.792$$b2017
000084317 591__ $$aCOMPUTER SCIENCE, THEORY & METHODS$$b75 / 103 = 0.728$$c2017$$dQ3$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, SOFTWARE ENGINEERING$$b85 / 104 = 0.817$$c2017$$dQ4$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, HARDWARE & ARCHITECTURE$$b47 / 52 = 0.904$$c2017$$dQ4$$eT3
000084317 591__ $$aCOMPUTER SCIENCE, INFORMATION SYSTEMS$$b131 / 148 = 0.885$$c2017$$dQ4$$eT3
000084317 592__ $$a0.319$$b2017
000084317 593__ $$aComputer Science (miscellaneous)$$c2017$$dQ2
000084317 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000084317 700__ $$aDíaz, Javier
000084317 700__ $$aCastro, Fernando
000084317 700__ $$0(orcid)0000-0002-5916-7898$$aIbáñez, Pablo$$uUniversidad de Zaragoza
000084317 700__ $$aChaver, Daniel
000084317 700__ $$0(orcid)0000-0002-5976-1352$$aViñals, Víctor$$uUniversidad de Zaragoza
000084317 700__ $$aSáez, Juan Carlos
000084317 700__ $$aPrieto, Manuel
000084317 700__ $$aPiñuel, Luis
000084317 700__ $$aMonreal, Teresa
000084317 700__ $$aLlabería, José María
000084317 7102_ $$15007$$2035$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Arquit.Tecnología Comput.
000084317 773__ $$g61, 6 (2017), 856 – 880$$pComput. j.$$tCOMPUTER JOURNAL$$x0010-4620
000084317 8564_ $$s291180$$uhttps://zaguan.unizar.es/record/84317/files/texto_completo.pdf$$yPostprint
000084317 8564_ $$s120809$$uhttps://zaguan.unizar.es/record/84317/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000084317 909CO $$ooai:zaguan.unizar.es:84317$$particulos$$pdriver
000084317 951__ $$a2020-06-09-13:22:40
000084317 980__ $$aARTICLE