Publication:
Graphfire: Synergizing Fetch, Insertion, and Replacement Policies for Graph Analytics

Authors
Manocha, Aninda ; Aragón, J.L. ; Martonosi, Margaret
Publisher
IEEE
DOI
https://www.doi.org/10.1109/TC.2022.3157525
Type
info:eu-repo/semantics/article
Description
©2021. This manuscript version is made available under the CC-BY 4.0 license (http://creativecommons.org/licenses/by/4.0/). This document is the accepted version of a published work that appeared in final form in IEEE Transactions on Computers. To access the final edited and published work, see https://www.doi.org/10.1109/TC.2022.3157525
Abstract
Despite their ubiquity in many important big-data applications, graph analytic kernels continue to challenge modern memory hierarchies due to their frequent, long-latency, pointer indirect accesses to vertex property data. Such accesses exhibit poor locality and variable reuse that trouble cache replacement policies, and consequently increase memory bandwidth pressure. Specialized graph-tailored prefetching mechanisms, processor designs, and memory hierarchy engines have been developed to tolerate the long latencies of such accesses. However, these approaches are either too bandwidth-intensive, require invasive hardware changes that inhibit general-purpose computation flexibility, or rely on software preprocessing that limits true speedup. This work introduces Graphfire, a flexible memory hierarchy approach that learns different access patterns in graph processing and exploits the synergy of specialized fetch, insertion, and replacement optimizations for problematic indirect accesses without relying on software or ISA support. More specifically, Graphfire identifies when these irregular accesses occur and employs tailored access granularities, data-aware insertion, and frequency-based replacement accordingly. It achieves up to a 1.79× speedup (geomean 1.3×) and these improvements scale due to bandwidth efficiency; with 64 cores, Graphfire yields up to a 71.33× speedup (geomean 63.32×) over a single baseline core and allows memory-bound graph analytic codes to scale far beyond prior work.
Citation
IEEE Transactions on Computers, vol. 72, issue 1, pp. 291-304, ISSN: 0018-9340, January 2023