000165054 001__ 165054
000165054 005__ 20251204150239.0
000165054 0247_ $$2doi$$a10.1109/TC.2025.3575909
000165054 0248_ $$2sideral$$a146455
000165054 037__ $$aART-2025-146455
000165054 041__ $$aeng
000165054 100__ $$0(orcid)0000-0002-0824-5833$$aValero, Alejandro$$uUniversidad de Zaragoza
000165054 245__ $$aDual Fast-Track Cache: Organizing Ring-Shaped Racetracks to Work as L1 Caches
000165054 260__ $$c2025
000165054 5060_ $$aAccess copy available to the general public$$fUnrestricted
000165054 5203_ $$aStatic Random-Access Memory (SRAM) is the fastest memory technology and has been the common design choice for implementing first-level (L1) caches in the processor pipeline, where speed is a key design requirement. In contrast, this technology offers much lower density than other technologies such as Dynamic RAM, limiting the L1 cache sizes of modern processors to a few tens of KB. This paper explores the use of slower but denser Domain Wall Memory (DWM) technology for L1 caches. DWM incurs slow access times since it arranges multiple bits sequentially in a magnetic racetrack; to access a given bit, the racetrack must be shifted to place that bit under an access head. A 1-bit shift usually takes one processor cycle, which can significantly hurt application performance, making this behavior inappropriate for L1 caches. Building on the temporal and spatial locality principles exploited by caches, this work proposes the Dual Fast-Track Cache (Dual FTC) design, a new approach to organizing a set of racetracks to build set-associative caches. Compared to a conventional SRAM cache, Dual FTC enhances storage capacity by 5× while incurring minimal shifting overhead, rendering it a practical and appealing solution for L1 cache implementations. Experimental results show that the devised cache organization is as fast as an SRAM cache for 78% of L1 data cache hits and 86% of L1 instruction cache hits (i.e., no shift is required). Consequently, thanks to the larger L1 cache capacities, significant system performance gains (22% on average) are obtained under the same silicon area.
000165054 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PID2022-136454NB-C22$$9info:eu-repo/grantAgreement/ES/AEI/TED2021-130233B-C33$$9info:eu-repo/grantAgreement/ES/MICINN/PID2021-123627OB-C52
000165054 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttps://creativecommons.org/licenses/by/4.0/deed.es
000165054 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000165054 700__ $$aLorente, Vicente
000165054 700__ $$aPetit, Salvador
000165054 700__ $$aSahuquillo, Julio
000165054 7102_ $$15007$$2035$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Arquit.Tecnología Comput.
000165054 773__ $$g74, 8 (2025), 2812-2826$$pIEEE trans. comput.$$tIEEE TRANSACTIONS ON COMPUTERS$$x0018-9340
000165054 8564_ $$s3209334$$uhttps://zaguan.unizar.es/record/165054/files/texto_completo.pdf$$yVersión publicada
000165054 8564_ $$s3634150$$uhttps://zaguan.unizar.es/record/165054/files/texto_completo.jpg?subformat=icon$$xicon$$yVersión publicada
000165054 909CO $$ooai:zaguan.unizar.es:165054$$particulos$$pdriver
000165054 951__ $$a2025-12-04-14:40:08
000165054 980__ $$aARTICLE