Browsing by Subject "Timeliness"
Now showing 1 - 2 of 2
- Publication | Open Access | A complexity-effective local delta prefetcher (Institute of Electrical and Electronics Engineers, 2025-01-31). Navarro-Torres, Agustín; Panda, Biswabandan; Alastruey-Benedé, Jesús; Ibáñez, Pablo; Viñals-Yúfera, Víctor; Ros Bardisa, Alberto. Ingeniería y Tecnología de Computadores. Data prefetching is crucial for performance in modern processors because it effectively masks long-latency memory accesses. Over the past decades, numerous data prefetching mechanisms have been proposed, which have continuously reduced the access latency to the memory hierarchy. Several state-of-the-art prefetchers, namely Instruction Pointer Classifier Prefetcher (IPCP) and Berti, target the first-level data cache, and thus they are able to completely hide the miss latency for timely prefetched cache lines. Berti exploits timely local deltas to achieve high accuracy and performance. This paper extends Berti with a larger evaluation and with extra optimizations on top of the previous conference paper. The result is a complexity-effective version of Berti that outperforms it for a large number of workloads and simplifies its control logic. The key to these advances is a simple mechanism for learning timely deltas without the need to track the fetch latency of each cache miss. Our experiments, conducted with a wide range of workloads (CVP traces by Qualcomm, SPEC CPU2017, and GAP), show performance improvements of 4.0% over a mainstream stride prefetcher and of a non-negligible 1.4% over the previously published version of Berti, while requiring similar storage.
- Publication | Open Access | Berti: An Accurate Local-Delta Data Prefetcher (IEEE, 2022-10). Navarro-Torres, Agustín; Panda, Biswabandan; Alastruey-Benedé, Jesús; Ibáñez, Pablo; Viñals-Yúfera, Victor; Ros, Alberto. Ingeniería y Tecnología de Computadores. Data prefetching is a technique that plays a crucial role in modern high-performance processors by hiding long-latency memory accesses. Several state-of-the-art hardware prefetchers exploit the concept of deltas, defined as the difference between the cache line addresses of two demand accesses. Existing delta prefetchers, such as best offset prefetching (BOP) and multi-lookahead prefetching (MLOP), train and predict future accesses based on global deltas. We observed that the use of global deltas results in missed opportunities to anticipate memory accesses. In this paper, we propose Berti, a first-level data cache prefetcher that selects the best local deltas, i.e., those that consider only demand accesses issued by the same instruction. Thanks to a high-confidence mechanism that precisely detects the timely local deltas with high coverage, Berti generates accurate prefetch requests. Then, it orchestrates the prefetch requests to the memory hierarchy using the selected deltas. Our empirical results, obtained using ChampSim with SPEC CPU2017 and GAP workloads, show that, with a storage overhead of just 2.55 KB, Berti improves performance by 8.5% compared to a baseline IP-stride and by 3.5% compared to IPCP, a state-of-the-art prefetcher. Our evaluation also shows that Berti reduces dynamic energy at the memory hierarchy by 33.6% compared to IPCP, thanks to its high prefetch accuracy.
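
The Berti abstract above defines a delta as the difference between the cache line addresses of two demand accesses, and a local delta as one computed only among accesses issued by the same instruction. The following is a minimal illustrative sketch of that distinction, not Berti's actual mechanism: it only compares consecutive accesses and ignores timeliness and confidence entirely. The 64-byte line size, the function names, and the sample trace are all assumptions made for illustration.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes (not stated in the abstracts)


def global_deltas(trace):
    """Cache-line deltas between consecutive accesses, ignoring the instruction."""
    lines = [addr // LINE_SIZE for _, addr in trace]
    return [b - a for a, b in zip(lines, lines[1:])]


def local_deltas(trace):
    """Cache-line deltas between consecutive accesses of the same instruction (IP)."""
    last_line = {}              # IP -> cache line of its previous access
    deltas = defaultdict(list)  # IP -> list of observed local deltas
    for ip, addr in trace:
        line = addr // LINE_SIZE
        if ip in last_line:
            deltas[ip].append(line - last_line[ip])
        last_line[ip] = line
    return dict(deltas)


# Hypothetical trace of (instruction pointer, byte address) demand accesses:
# two interleaved instructions, one striding forward and one backward.
trace = [
    (0x400, 0x1000), (0x408, 0x8000),
    (0x400, 0x1080), (0x408, 0x7FC0),
    (0x400, 0x1100), (0x408, 0x7F80),
]

print(global_deltas(trace))
print(local_deltas(trace))
```

In this made-up trace the interleaving of the two instructions makes the global deltas vary from one access to the next, while each instruction's local deltas stay constant, which is the kind of missed opportunity the abstract attributes to global deltas.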