Publication:
Berti: An Accurate Local-Delta Data Prefetcher

dc.contributor.author: Navarro-Torres, Agustín
dc.contributor.author: Panda, Biswabandan
dc.contributor.author: Alastruey-Benedé, Jesús
dc.contributor.author: Ibáñez, Pablo
dc.contributor.author: Viñals-Yúfera, Victor
dc.contributor.author: Ros, Alberto
dc.contributor.department: Ingeniería y Tecnología de Computadores
dc.date.accessioned: 2022-10-21T10:31:38Z
dc.date.available: 2022-10-21T10:31:38Z
dc.date.issued: 2022-10
dc.description: © 2022 IEEE. This document is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0). This document is the accepted version of a published work that appeared in final form in the 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO). To access the final work, see DOI: 10.1109/MICRO56248.2022.00072
dc.description.abstract: Data prefetching is a technique that plays a crucial role in modern high-performance processors by hiding long-latency memory accesses. Several state-of-the-art hardware prefetchers exploit the concept of deltas, defined as the difference between the cache line addresses of two demand accesses. Existing delta prefetchers, such as best offset prefetching (BOP) and multi-lookahead prefetching (MLOP), train and predict future accesses based on global deltas. We observed that the use of global deltas results in missed opportunities to anticipate memory accesses. In this paper, we propose Berti, a first-level data cache prefetcher that selects the best local deltas, i.e., those that consider only demand accesses issued by the same instruction. Thanks to a high-confidence mechanism that precisely detects the timely local deltas with high coverage, Berti generates accurate prefetch requests. Then, it orchestrates the prefetch requests to the memory hierarchy, using the selected deltas. Our empirical results using ChampSim with SPEC CPU2017 and GAP workloads show that, with a storage overhead of just 2.55 KB, Berti improves performance by 8.5% compared to a baseline IP-stride and 3.5% compared to IPCP, a state-of-the-art prefetcher. Our evaluation also shows that Berti reduces dynamic energy at the memory hierarchy by 33.6% compared to IPCP, thanks to its high prefetch accuracy. [es]
dc.format: application/pdf [es]
dc.format.extent: 17 [es]
dc.identifier.doi: https://doi.org/10.1109/MICRO56248.2022.00072
dc.identifier.eissn: 978-1-6654-6272-3
dc.identifier.uri: http://hdl.handle.net/10201/124766
dc.language: eng [es]
dc.publisher: IEEE
dc.relation: European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (ECHO: Extending Coherence for Hardware-Driven Optimizations in Multicore Architectures, grant agreement No 819134, Consolidator Grant, 2018). [es]
dc.relation.ispartof: 55th International Symposium on Microarchitecture (MICRO) [es]
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/9923806
dc.rights: info:eu-repo/semantics/openAccess [es]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Data prefetching [es]
dc.subject: Hardware prefetching [es]
dc.subject: First-level cache [es]
dc.subject: Local deltas [es]
dc.subject: Accuracy [es]
dc.subject: Timeliness [es]
dc.title: Berti: An Accurate Local-Delta Data Prefetcher [es]
dc.type: info:eu-repo/semantics/article [es]
dspace.entity.type: Publication [es]
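
The abstract distinguishes global deltas (differences between the cache line addresses of any two demand accesses) from local deltas (differences restricted to accesses issued by the same instruction). The following is a minimal illustrative sketch of that distinction only, not Berti's actual mechanism: it omits the timeliness and confidence logic entirely and, as a simplification, compares only consecutive accesses. The 64-byte line size, the function names, and the trace values are assumptions made for the example.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


def global_deltas(trace):
    """Line-address deltas between consecutive accesses, ignoring the instruction."""
    lines = [addr // LINE_SIZE for _, addr in trace]
    return [b - a for a, b in zip(lines, lines[1:])]


def local_deltas(trace):
    """Line-address deltas between consecutive accesses of the same instruction (IP)."""
    last_line = {}               # IP -> cache line of its previous access
    deltas = defaultdict(list)   # IP -> list of observed local deltas
    for ip, addr in trace:
        line = addr // LINE_SIZE
        if ip in last_line:
            deltas[ip].append(line - last_line[ip])
        last_line[ip] = line
    return dict(deltas)


# Hypothetical trace of (instruction pointer, byte address) pairs:
# one load walks forward two lines at a time, another walks backward
# one line at a time, and the two are interleaved.
trace = [
    (0x400, 0x1000), (0x500, 0x9000),
    (0x400, 0x1080), (0x500, 0x8FC0),
    (0x400, 0x1100), (0x500, 0x8F80),
]

print(global_deltas(trace))  # [512, -510, 509, -507, 506]
print(local_deltas(trace))   # {1024: [2, 2], 1280: [-1, -1]}
```

In this made-up trace the global deltas never repeat, while each instruction on its own shows a stable delta (+2 and -1), which is the kind of regularity the abstract says global-delta prefetchers can miss.
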
Files
Original bundle
Name: anavarrotorres-micro22.pdf
Size: 771.12 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.39 KB
Format: Item-specific license agreed upon to submission