Browsing by Subject "Memory Consistency Models"
Now showing 1 - 2 of 2
Results Per Page
Sort Options
- PublicationOpen AccessRegional Out-of-Order Writes in Total Store Order(2020-10) Singh, Sawan; Jimborean, Alexandra; Ros, Alberto; Ingeniería y Tecnología de ComputadoresThe store buffer, an essential component in today’s processors, is designed to hide memory latency by moving stores off the processor’s critical path. Furthermore, under the Total Store Order (TSO) memory model, the store buffer ensures the in-order retirement of stores. Problems arise when the store buffer is full or, under TSO, when the leading store encounters a cache miss, which blocks all subsequent stores and incurs severe performance bottlenecks.This work presents a software-hardware co-designed approach to cope with this bottleneck for processors with strong consistency guarantees. Our proposal is driven by the insight that store operations can be reordered if their reordering does not change the observable program behavior. The compiler delineates safe regions within which stores can be shuffled while still delivering the same observable behavior as if they performed in program order and unsafe regions within which stores must be kept in program order. This is leveraged by a novel dual-mode store buffer that switches between the out-of-order and in-order execution of stores within the safe and respectively unsafe regions. Correctness is preserved through well-placed fences inserted by the compiler, which impede the execution of stores from the following regions until all stores of the current region complete. Our dual-mode store buffer only requires one extra bit per entry, significantly decreases processor stall cycles, and brings 8.13% performance improvements compared to a mainstream store buffer.
- PublicationOpen AccessSpeculative Enforcement of Store Atomicity(2020-10) Ros, Alberto; Kaxiras, Stefanos; Ingeniería y Tecnología de ComputadoresVarious memory consistency model implementations (e.g., x86, SPARC) willfully allow a core to see its own stores while they are in limbo, i.e., executed (and perhaps retired) but not yet inserted in memory order. This is known as store-to-load forwarding and it is a necessity to safeguard the local thread’s sequential program semantics while achieving high performance.However, this can lead to counter-intuitive behaviours, requiring fences to prevent such behaviours when needed.Other vendors (e.g., IBM 370 and the z/Architecture series)opt for enforcing what we call in this work store atomicity, that is, disallowing a core to see its own stores before they are written to memory, trading off performance for a more intuitive memory model. Ideally, we want a stricter model to ease programability at the same time that architects can provide high-performance solutions. We make a simple observation. What holds for any other rule in a consistency model, also holds for store atomicity: it is not a crime to break the rule, unless we get caught.In this work, we detail the different ways of detecting a store atomicity violation. This leads us to a new insight: a load performed by a forwarding from an in-limbo store is not speculative; younger loads performed after that forwarding are. Based on this insight we propose an effective and cheap speculative approach to dynamically enforce store atomicity only when the detection of its violation actually occurs. In practice, these cases are rare during the execution of a program. In all other cases (the bulk of the execution of a program) store-to-load forwarding can be done without violating store atomicity.The end result is that we provide the best of both worlds: amore intuitive store-atomic memory model, i.e., the 370 model,with the performance and cost approaching (at an average of just2.5% and 2.7% overhead for parallel and sequential applications,respectively) that of a non-store-atomic model, i.e., the x86 model.