DigitalUM :: Browsing by Subject "Microarchitecture"

Restricted
Chaining transactions for effective concurrency management in hardware transactional memory
(IEEE Computer Society, 2024-12-03) Nicolas Conesa, Víctor; Titos Gil, Rubén; Fernández Pascual, Ricardo; Acacio, Manuel E.; Ros Bardisa, Alberto; Ingeniería y Tecnología de Computadores
Hardware Transactional Memory (HTM) offers the opportunity to ease parallel programming. However, driven by hardware limitations, commercial implementations eschew the complexity involved in early sophisticated proposals from academia, and, among other things, opt for simple conflict resolution policies that inevitably increase transaction aborts. To increase thread level parallelism, previous works propose conflict resolution schemes that, instead of aborting, add a second level of speculation consisting in using not-yet-committed data from another transaction. This policy, which we refer to as requester-speculates, has not yet been considered in the context of the kind of best-effort HTM support provided by commercial processors. This work proposes CHAining TransactionS (CHATS), a simple yet effective realization of the requester-speculates conflict resolution policy in which cyclic dependencies between transactions are avoided and the commit ordering respects the dependencies that transactions make once speculative values are communicated. The ultimate result is a best-effort HTM implementation that forces a partial order between transactions in a way that ensures effective utilization of forwarded data and that gets away from the complexity of previous proposals. Simulations using gem5 demonstrate the effectiveness of CHATS in both commercial-like setups and academic state-of-the-art best-effort systems (22% and 16% reduction in execution time, on average, respectively). These improvements are achieved by requiring less than 280 bytes of extra storage.
Open Access
Development of the dynamic structure force lines of the middle ear ossicles in human foetuses
(Murcia : F. Hernández, 2008) Whyte, J.; Cisneros, A.; Yus, C.; Obón, J.; Whyte, A.; Serrano, P.; Pérez-Castejon, C.; Vera, A.
Objectives: To study the ontogenic development of the organisation of the human middle ear ossicles structure. Material and methods: 46 human temporal bones of ages varying from 32 days postconception to newborns. Results: The development of the structural organisation of the malleus begins at 16 weeks via two cortical fascicles situated in the neck; at 21 weeks they extend towards the head, at 23 weeks to the lateral process and at 24 weeks to the handle. In the handle, the force lines are transmitted via three cardinal fascicles, two of them via the cortical fascicle and one via the centre, which starts after 29 weeks' development and is consolidated after 31 weeks. In the incus the force lines start at 16 weeks via two cortical fascicles situated in the long process, which progressively extend in a rostro-caudal direction between 17 and 20 weeks. At 21 weeks they occupy the whole extension of the long process and at 22 weeks the fusion of both cortical fascicles begins. From 30 weeks onwards it is strengthened by the crossing of bone trabeculae from one cortical to another. Two fascicles come out of the incus body, surrounding the medullary cavity and going in the direction of the short process. In the beginning, the stapes have two cortical fascicles in their crura. The remodelling process makes the internal cortical fascicle disappear and after 31 weeks all the force lines run through the external cortical fascicle. The tympanic membrane of the stapes footplate undergoes a remodelling process and after 28 weeks bony trabeculae are deposited. In newborns (40 weeks), the ossicles’ structure is cavitary and has not been completed. The fan-shaped trabecular fascicle, which starts in the articular facets of the malleus and the incus, still has to develop.
Embargo
Exploring Instruction Fusion Opportunities in General Purpose Processors
(IEEE Press, 2023-12-18) Singh, Sawan; Perais, Arthur; Jimborean, Alexandra; Ros, Alberto; Ingeniería y Tecnología de Computadores
The Complex Instruction Set Computer (CISC) paradigm has led to the introduction of instruction cracking, in which an architectural instruction is divided into multiple microarchitectural instructions (μ-ops). However, the dual concept, instruction fusion, is also prevalent in modern microarchitectures to maximize resource utilization. In essence, some architectural instructions are too complex to be executed as a unit, so they should be cracked, while others are too simple to waste resources on executing them as a unit, so they should be fused with others. In this paper, we focus on instruction fusion and explore opportunities for fusing additional instructions in a high-performance general purpose pipeline. We show that enabling fusion for common RISC-V idioms improves performance by 7%. Then, we determine experimentally that enabling fusion only for memory instructions achieves 86% of the potential of fusion in this particular case. Finally, we propose the Helios microarchitecture, able to fuse non-consecutive and non-contiguous memory instructions, and discuss microarchitectural changes required to do so efficiently while preserving correctness. Helios allows fusing an additional 5.5% of dynamic instructions, yielding a 14.2% performance uplift over no fusion (8.2% over baseline fusion).
Open Access
Free Atomics: Hardware Atomic Operations without Fences
(Association for Computing Machinery, 2022-06-11) Asgharzadeh, Ashkan; Cebrian, Juan M.; Perais, Arthur; Kaxiras, Stefanos; Ros, Alberto; Ingeniería y Tecnología de Computadores
Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation, current x86 implementations serialize atomic RMW operations, i.e., the store buffer is drained before issuing atomic RMWs and subsequent memory operations are stalled until the atomic RMW commits. This serialization, carried out by memory fences, incurs a performance cost which is expected to increase with deeper pipelines. This work proposes Free atomics, a lightweight, speculative, deadlock-free implementation of atomic operations that removes the need for memory fences, thus improving performance, while preserving atomicity and consistency. Free atomics is, to the best of our knowledge, the first proposal to enable store-to-load forwarding for atomic RMWs. Free atomics only requires simple modifications and incurs a small area overhead (15 bytes). Our evaluation using gem5-20 shows that, for a 32-core configuration, Free atomics improves performance by 12.5%, on average, for a large range of parallel workloads and 25.2%, on average, for atomic-intensive parallel workloads over a fenced atomic RMW implementation.
Embargo
Secure prefetching for secure cache systems
(IEEE Computer Society, 2024-12-03) Nath, Sumon; Navarro Torres, Agustín; Ros Bardisa, Alberto; Panda, Biswabandan; Ingeniería y Tecnología de Computadores
Transient execution attacks like Spectre and its variants can cause information leakage through a cache hierarchy. There are two classes of techniques that mitigate speculative execution attacks: delay-based and invisible speculation. Invisible speculation-based techniques like GhostMinion are the high-performing yet secure techniques that mitigate all kinds of speculative execution attacks. Similar to a cache system, hardware prefetchers can also cause speculative information leakage. To mitigate it, GhostMinion advocates on-commit prefetching on top of strictness ordering in the cache system. Our experiments show that the GhostMinion cache system interacts negatively with the hardware prefetchers leading to redundant traffic between different levels of cache. This traffic causes contention and increases the miss latency leading to performance loss. Next, we observe that on-commit prefetching enforced by GhostMinion leads to performance loss as it affects the prefetcher timeliness. We perform the first thorough analysis of the interaction between state-of-the-art prefetching techniques and the secure cache system. Based on this, we propose two microarchitectural solutions that ensure high performance while designing secure prefetchers on top of secure cache system. The first solution detects and filters redundant traffic when updating the cache hierarchy non-speculatively. The second solution ensures the timeliness of the prefetcher to compensate for the delayed triggering of prefetch requests at commit, resulting in a secure yet high-performing prefetcher. Overall, our enhancements are secure and provide synergistic interactions between hardware prefetchers and a secure cache system. Our experiments show that our filter consistently improves the performance of secure cache systems like GhostMinion in the presence of state-of-the-art prefetchers (by 1.9% for single-core and 19.0% for multi-core for the top-performing prefetcher). We see a synergistic behavior of the filter with our proposed secure prefetcher, which leads to a further increase in performance by 6.3% and 23.0% (over the top-performing prefetcher), for single-core and multi-core systems, respectively. Our enhancements are extremely lightweight incurring a storage overhead of 0.59 KB per core.
