Repository logo
  • English
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Log In
    or
    New user? Click here to register.
Repository logo

Repositorio Institucional de la Universidad de Murcia

Repository logoRepository logo
  • Communities & Collections
  • All of DSpace
  • Statistics
  • menu.section.collectors
  • menu.section.acerca
  • English
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Log In
    or
    New user? Click here to register.
  1. Home
  2. Browse by Subject

Browsing by Subject "Multi GPU"

Now showing 1 - 2 of 2
Results Per Page
Sort Options
  • Loading...
    Thumbnail Image
    Publication
    Open Access
    An autotuning approach to select the inter-GPU communication library on heterogeneous systems
    (Springer, 2024-12-12) Cámara, Jesús; Cuenca Muñoz, Antonio Javier; Cuenca, Javier; Boratto, Murilo; Vicente Jaén, Arturo; Galindo Garre, Víctor; Ingeniería y Tecnología de Computadores
    In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries (such as CUDA- Aware MPI or NCCL) are used with a set of routines to perform the numerical oper- ations among the GPUs located on the compute nodes. The main objective is the selection of the most appropriate communication library, the number of GPUs to be used and the workload to be distributed among them in order to reduce the cost of data movements, which represent a large percentage of the total execution time. To this end, a hierarchical modelling of the execution time of each routine to be opti- mised is proposed, combining experimental and theoretical approaches. The results show that near-optimal decisions are taken in all the scenarios analysed.
  • Loading...
    Thumbnail Image
    Publication
    Open Access
    Griffin: hardware-software support for efficient page migration in Multi-GPU systems
    (IEEE, 2020-04-16) Baruah, Trinayan; Sun, Yifan; Dinçer, Ali Tolga; Mojumder, Saiful A.; Abellán Miguel, José Luis; Ukidave, Yash; Joshi, Ajay; Rubin, Norman; Kim, John; Kaeli, David; Ingeniería y Tecnología de Computadores; Facultad de Informática
    As transistor scaling becomes increasingly more difficult to achieve, scaling the core count on a single GPU chip has also become extremely challenging. As the volume of data to process in today's increasingly parallel workloads continues to grow unbounded, we need to find scalable solutions that can keep up with this increasing demand. To meet the need of modern-day parallel applications, multi-GPU systems offer a promising path to deliver high performance and large memory capacity. However, multi-GPU systems suffer from performance issues associated with GPU-to-GPU communication and data sharing, which severely impact the benefits of multi-GPU systems. Programming multi-GPU systems has been made considerably simpler with the advent of Unified Memory which enables runtime migration of pages to the GPU on demand. Current multi-GPU systems rely on a first-touch Demand Paging scheme, where memory pages are migrated from the CPU to the GPU on the first GPU access to a page. The data sharing nature of GPU applications makes deploying an efficient programmer-transparent mechanism for inter-GPU page migration challenging. Therefore following the initial CPU-to-GPU page migration, the page is pinned on that GPU. Future accesses to this page from other GPUs happen at a cache-line granularity - pages are not transferred between GPUs without significant programmer intervention. We observe that this mechanism suffers from two major drawbacks: 1) imbalance in the page distribution across multiple GPUs, and 2) inability to move the page to the GPU that uses it most frequently. Both of these problems lead to load imbalance across GPUs, degrading the performance of the multi-GPU system. To address these problems, we propose Griffin, a holistic hardware-software solution to improve the performance of NUMA multi-GPU systems. Griffin introduces programmer-transparent modifications to both the IOMMU and GPU architecture, supporting efficient runtime page migration based on locality information. In particular, Griffin employs a novel mechanism to detect and move pages at runtime between GPUs, increasing the frequency of resolving accesses locally, which in turn improves the performance. To ensure better load balancing across GPUs, Griffin employs a Delayed First-Touch Migration policy that ensures pages are evenly distributed across multiple GPUs. Our results on a diverse set of multi-GPU workloads show that Griffin can achieve up to a 2.9× speedup on a multi-GPU system, while incurring low implementation overhe

DSpace software copyright © 2002-2026 LYRASIS

  • Cookie settings
  • Accessibility
  • Send Feedback