000120154 001__ 120154
000120154 005__ 20221213161034.0
000120154 0247_ $$2doi$$a10.1109/MICRO56248.2022.00072
000120154 0248_ $$2sideral$$a131192
000120154 037__ $$aART-2022-131192
000120154 041__ $$aeng
000120154 100__ $$aNavarro-Torres, Agustin$$uUniversidad de Zaragoza
000120154 245__ $$aBerti: an Accurate Local-Delta Data Prefetcher
000120154 260__ $$c2022
000120154 5060_ $$aAccess copy available to the general public$$fUnrestricted
000120154 5203_ $$aData prefetching is a technique that plays a crucial role in modern high-performance processors by hiding long latency memory accesses. Several state-of-the-art hardware prefetchers exploit the concept of deltas, defined as the difference between the cache line addresses of two demand accesses. Existing delta prefetchers, such as best offset prefetching (BOP) and multi-lookahead prefetching (MLOP), train and predict future accesses based on global deltas. We observed that the use of global deltas results in missed opportunities to anticipate memory accesses. In this paper, we propose Berti, a first-level data cache prefetcher that selects the best local deltas, i.e., those that consider only demand accesses issued by the same instruction. Thanks to a high-confidence mechanism that precisely detects the timely local deltas with high coverage, Berti generates accurate prefetch requests. Then, it orchestrates the prefetch requests to the memory hierarchy, using the selected deltas. Our empirical results using ChampSim and SPEC CPU2017 and GAP workloads show that, with a storage overhead of just 2.55 KB, Berti improves performance by 8.5% compared to a baseline IP-stride and 3.5% compared to IPCP, a state-of-the-art prefetcher. Our evaluation also shows that Berti reduces dynamic energy at the memory hierarchy by 33.6% compared to IPCP, thanks to its high prefetch accuracy.
000120154 536__ $$9info:eu-repo/grantAgreement/ES/AEI-FEDER/PID2019-105660RB-C21$$9info:eu-repo/grantAgreement/ES/AEI-FEDER/RTI2018-098156-B-C53$$9info:eu-repo/grantAgreement/ES/DGA-ESF/T58-20R$$9info:eu-repo/grantAgreement/EC/H2020/819134/EU/Extending Coherence for Hardware-Driven Optimizations in Multicore Architectures/ECHO$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 819134-ECHO
000120154 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000120154 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000120154 700__ $$aPanda, Biswabandan
000120154 700__ $$0(orcid)0000-0003-4164-5078$$aAlastruey-Benede, Jesus$$uUniversidad de Zaragoza
000120154 700__ $$0(orcid)0000-0002-5916-7898$$aIbañez, Pablo$$uUniversidad de Zaragoza
000120154 700__ $$0(orcid)0000-0002-5976-1352$$aViñals-Yufera, Victor$$uUniversidad de Zaragoza
000120154 700__ $$aRos, Alberto
000120154 7102_ $$15007$$2035$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Arquit.Tecnología Comput.
000120154 773__ $$g55 (2022), 975-991$$tProceedings of the Annual International Symposium on Microarchitecture, MICRO$$x1072-4451
000120154 8564_ $$s800357$$uhttps://zaguan.unizar.es/record/120154/files/texto_completo.pdf$$yPostprint
000120154 8564_ $$s2731914$$uhttps://zaguan.unizar.es/record/120154/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000120154 909CO $$ooai:zaguan.unizar.es:120154$$particulos$$pdriver
000120154 951__ $$a2022-12-13-14:05:33
000120154 980__ $$aARTICLE