DigitalUM :: Browsing by Subject "Interconnection networks"

Open Access
Scalability Limitations of Processing-in-Memory using Real System Evaluations
(Association for Computing Machinery (ACM), 2024-03) Gilbert, Jonatan; Haeyoon, Cho; Hyojun, Son; Xiangyu, Wu; Neal, Livesay; Evelio, Mora; Kaustubh, Shivdikar; José L., Abellán; Ajay, Joshi; David, Kaeli; John, Kim; Ingeniería y Tecnología de Computadores
Processing-in-memory (PIM), where the compute is moved closer to the memory or the data, has been widely explored to accelerate emerging workloads. Recently, different PIM-based systems have been announced by memory vendors to minimize data movement and improve performance as well as energy efficiency. One critical component of PIM is the large amount of compute parallelism provided across many PIM "nodes", or the compute units near the memory. In this work, we provide an extensive evaluation and analysis of real PIM systems based on UPMEM PIM. We show that while there are benefits of PIM, there are also scalability challenges and limitations as the number of PIM nodes increases. In particular, we show how collective communications that are commonly found in many kernels/workloads can be problematic for PIM systems. To evaluate the impact of collective communication in PIM architectures, we provide an in-depth analysis of two workloads on the UPMEM PIM system that utilize representative common collective communication patterns: AllReduce and All-to-All communication. Specifically, we evaluate 1) embedding tables that are commonly used in recommendation systems that require AllReduce and 2) the Number Theoretic Transform (NTT) kernel, which is a critical component of Fully Homomorphic Encryption (FHE) that requires All-to-All communication. We analyze the performance benefits of these workloads and show how they can be efficiently mapped to the PIM architecture through alternative data partitioning. However, since each PIM compute unit can only access its local memory, when communication is necessary between PIM nodes (or remote data is needed), communication between the compute units must be done through the host CPU, thereby severely hampering application performance. To increase the scalability (or applicability) of PIM to future workloads, we make the case for how future PIM architectures need efficient communication or interconnection networks between the PIM nodes that require both hardware and software support.
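
The host-mediated communication pattern described in the abstract can be illustrated with a small model. The sketch below is not taken from the paper and does not use the UPMEM SDK; the function name `host_mediated_allreduce` and the transfer counter are hypothetical, and each "node" is just a Python list standing in for a PIM unit's local memory. It only shows why an AllReduce routed through the host needs a number of host transfers that grows with the number of nodes.

```python
from typing import List, Tuple


def host_mediated_allreduce(
    local_buffers: List[List[int]],
) -> Tuple[List[List[int]], int]:
    """Sum equal-length vectors held by simulated PIM nodes via the host.

    Hypothetical model: nodes cannot read each other's memory, so every
    byte exchanged between nodes passes through the host.
    Returns the new per-node buffers and the number of host transfers.
    """
    if not local_buffers:
        return [], 0

    width = len(local_buffers[0])
    reduced = [0] * width
    host_transfers = 0

    # Gather: the host copies each node's partial result out and accumulates it.
    for buf in local_buffers:
        host_transfers += 1
        for i, value in enumerate(buf):
            reduced[i] += value

    # Broadcast: the host copies the reduced vector back into every node.
    result = []
    for _ in local_buffers:
        host_transfers += 1
        result.append(list(reduced))

    return result, host_transfers


if __name__ == "__main__":
    nodes = [[1, 2], [3, 4], [5, 6], [7, 8]]
    buffers, transfers = host_mediated_allreduce(nodes)
    print(buffers[0], transfers)
```

In this toy model the transfer count is twice the number of nodes, so the host becomes the serialization point as nodes are added; a direct interconnect between nodes, as the abstract argues for, would remove that dependence on the host.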
