Publication:
Hardware cache locking for all memory updates

Date
2024
Authors
Asgharzadeh, Ashkan ; Gómez Hernández, Eduardo José ; Cebrián, Juan M. ; Kaxiras, Stefanos ; Ros Bardisa, Alberto
Publisher
IEEE Computer Society
DOI
https://doi.org/10.1109/ICCD63220.2024.00092
Type
info:eu-repo/semantics/article
Description
© 2024 IEEE. This document is the Submitted Published version of a Published Work that appeared in final form in the 42nd IEEE International Conference on Computer Design (ICCD 2024). To access the final edited and published work, see https://doi.org/10.1109/ICCD63220.2024.00092
Abstract
Many applications need to perform operations that involve reading a value from memory, modifying it, and then writing it back. Multiple architectures provide hardware support for these operations via read-modify-write (RMW) instructions. The primary benefit is that the read can request a cacheline with write permissions, reducing coherence protocol overhead since the write will find the cacheline with appropriate permissions. RMWs can be either atomic or non-atomic. Atomic RMWs, used for synchronization, commonly require (i) locking the cacheline to guarantee atomicity by preventing invalidations and (ii) enforcing serialization of instructions in the program (e.g., via memory fences), which may cause performance degradation based on the implemented memory consistency model. Non-atomic RMWs, while not requiring such strict measures, should only be used in data-race free code sections. However, other cores may invalidate a cacheline during a non-atomic RMW (e.g., due to false sharing), flushing the pipeline and causing the loss of write permissions obtained by the read, which is detrimental to performance.

In this work, we propose a microarchitectural mechanism that enables non-atomic RMWs to fetch the cacheline locking it, thus preventing other cores from “stealing” the cacheline while allowing them to run concurrently with other instructions in the same core. Our proposal enables concurrent hardware cache locking for multiple non-atomic RMWs while guaranteeing deadlock freedom and no programmer/compiler intervention. We also propose a lock-chaining mechanism to allow multiple consecutive memory updates to the same cacheline up to a predefined maximum (to prevent starvation and load imbalance). Our evaluation using gem5 full-system simulator shows that for an eight-core configuration, our proposal improves performance by up to 5.36% (2.05% on average), requiring just 45 bytes of storage per core.
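The access pattern the abstract describes can be sketched in software. The C++ fragment below is only an illustration of atomic versus non-atomic memory updates under false sharing; it does not model the proposed hardware mechanism. The names `SharedLine`, `Totals`, and `run_updates` are hypothetical, and the 64-byte cacheline size is an assumption about the target machine. Each thread performs a non-atomic RMW on its own counter (data-race free, but both counters sit in the same cacheline) and an atomic RMW on a counter shared by both threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumption: 64-byte cachelines, as on most current x86-64 and Arm cores.
constexpr std::size_t kAssumedLineBytes = 64;

// All three counters are deliberately packed into one cacheline.
struct alignas(kAssumedLineBytes) SharedLine {
    std::uint64_t a = 0;                   // written only by thread 0
    std::uint64_t b = 0;                   // written only by thread 1
    std::atomic<std::uint64_t> shared{0};  // written by both threads
};
static_assert(sizeof(SharedLine) == kAssumedLineBytes,
              "counters are expected to fit in one assumed cacheline");

struct Totals {
    std::uint64_t a;
    std::uint64_t b;
    std::uint64_t shared;
};

Totals run_updates(std::uint64_t iters) {
    SharedLine line;

    // Check the false-sharing setup: a and b fall in the same assumed line.
    assert(reinterpret_cast<std::uintptr_t>(&line.a) / kAssumedLineBytes ==
           reinterpret_cast<std::uintptr_t>(&line.b) / kAssumedLineBytes);

    auto worker = [&line, iters](std::uint64_t& own) {
        for (std::uint64_t i = 0; i < iters; ++i) {
            // Non-atomic memory update: conceptually load, add, store.
            // No data race, because each thread has its own variable.
            own += 1;
            // Atomic RMW: needed because both threads update this counter.
            line.shared.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread t0(worker, std::ref(line.a));
    std::thread t1(worker, std::ref(line.b));
    t0.join();
    t1.join();

    return Totals{line.a, line.b, line.shared.load()};
}
```

Calling `run_updates(n)` yields `n` for each private counter and `2 * n` for the shared one. The results are correct in every case; the cost of false sharing shows up only as extra coherence traffic when the two cores repeatedly take the line from each other, which is the situation the abstract's cache-locking proposal targets.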