Hardware cache locking for all memory updates

Asgharzadeh, Ashkan; Gómez Hernández, Eduardo José; Cebrián, Juan M.; Kaxiras, Stefanos; Ros Bardisa, Alberto

Files

aasgharzadeh-iccd24.pdf (339.17 KB)

Date

2024

Authors

Asgharzadeh, Ashkan ; Gómez Hernández, Eduardo José ; Cebrián, Juan M. ; Kaxiras, Stefanos ; Ros Bardisa, Alberto

Publisher

IEEE Computer Society

Department

Ingeniería y Tecnología de Computadores

DOI

https://doi.org/10.1109/ICCD63220.2024.00092

Type

info:eu-repo/semantics/article

Description

© 2024 IEEE. This document is the Submitted Published version of a Published Work that appeared in final form in the 42nd IEEE International Conference on Computer Design (ICCD 2024). To access the final edited and published work, see https://doi.org/10.1109/ICCD63220.2024.00092

Abstract

Many applications need to perform operations that involve reading a value from memory, modifying it, and then writing it back. Multiple architectures provide hardware support for these operations via read-modify-write (RMW) instructions. The primary benefit is that the read can request a cacheline with write permissions, reducing coherence protocol overhead since the write will find the cacheline with appropriate permissions. RMWs can be either atomic or non-atomic. Atomic RMWs, used for synchronization, commonly require (i) locking the cacheline to guarantee atomicity by preventing invalidations and (ii) enforcing serialization of instructions in the program (e.g., via memory fences), which may cause performance degradation based on the implemented memory consistency model. Non-atomic RMWs, while not requiring such strict measures, should only be used in data-race free code sections. However, other cores may invalidate a cacheline during a non-atomic RMW (e.g., due to false sharing), flushing the pipeline and causing the loss of write permissions obtained by the read, which is detrimental to performance.

In this work, we propose a microarchitectural mechanism that enables non-atomic RMWs to fetch the cacheline locking it, thus preventing other cores from “stealing” the cacheline while allowing them to run concurrently with other instructions in the same core. Our proposal enables concurrent hardware cache locking for multiple non-atomic RMWs while guaranteeing deadlock freedom and no programmer/compiler intervention. We also propose a lock-chaining mechanism to allow multiple consecutive memory updates to the same cacheline up to a predefined maximum (to prevent starvation and load imbalance). Our evaluation using gem5 full-system simulator shows that for an eight-core configuration, our proposal improves performance by up to 5.36% (2.05% on average), requiring just 45 bytes of storage per core.
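As a source-level illustration only (not taken from the paper, whose mechanism lives in hardware and needs no programmer or compiler intervention), the following minimal C++ sketch contrasts the two kinds of update the abstract describes. The helper names `count_non_atomic` and `count_atomic` are hypothetical. In the first, each thread updates only its own slot, so the code is data-race free, yet adjacent slots will typically share a cacheline (false sharing). In the second, all threads update one shared counter and therefore need an atomic RMW. Whether the compiler emits a single RMW instruction for `slots[t] += 1`, or keeps the running sum in a register, depends on the target ISA and optimization level.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: every thread increments only its own slot, so there is
// no data race, but neighbouring slots usually fall in the same cacheline.
std::vector<std::uint64_t> count_non_atomic(int threads, std::uint64_t iters) {
    std::vector<std::uint64_t> slots(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&slots, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                slots[t] += 1;  // non-atomic update: read, modify, write back
            }
        });
    }
    for (auto& th : pool) th.join();
    return slots;
}

// Hypothetical helper: all threads increment one shared counter, so each
// update must be an atomic RMW (sequentially consistent here).
std::uint64_t count_atomic(int threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                total.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

Both helpers produce correct counts. The difference the abstract targets is performance: in the first, another core writing a neighbouring slot can invalidate the cacheline between the read and the write of an update.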

No Creative Commons license.
