DigitalUM :: Browsing by Subject "Compilers"

Browsing by Subject "Compilers"

Now showing 1 - 4 of 4

Open Access
Code Detection for Hardware Acceleration Using Large Language Models
(2024-03-01) Martínez Sánchez, Pablo Antonio; Bernabé García, Gregorio; García Carrasco, José Manuel; Ingeniería y Tecnología de Computadores
Large language models (LLMs) have been massively applied to many tasks, often surpassing state-of-the-art approaches. While their effectiveness in code generation has been extensively studied (e.g., AlphaCode), their potential for code detection remains unexplored. This work presents the first analysis of code detection using LLMs. Our study examines essential kernels, including matrix multiplication, convolution, fast-fourier transform and LU factorization, implemented in C/C++. We propose both a preliminary, naive prompt and a novel prompting strategy for code detection. Results reveal that conventional prompting achieves great precision but poor accuracy (67.5%, 22.5%, 79.5% and 64% for GEMM, convolution, FFT and LU factorization, respectively) due to a high number of false positives. Our novel prompting strategy substantially reduces false positives, resulting in excellent overall accuracy (91.2%, 98%, 99.7% and 99.7%, respectively). These results pose a considerable challenge to existing state-of-the-art code detection methods.
Open Access
Multiversioned Decoupled Access-Execute: the Key to Energy-Efficient Compilation of General-Purpose Programs
(ACM, 2016-03-17) Koukos, Konstantinos; Ekemark, Per; Zacharopoulos, Georgios; Spiliopoulos, Vasileios; Kaxiras, Stefanos; Jimborean, Alexandra; Ingeniería y Tecnología de Computadores
Computer architecture design faces an era of great challenges in an attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management become severely limited, and thus, compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarsegrain phases that are well-matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method is proved efficient for statically analyzable codes, generalpurpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple generalpurpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves over 20% on average energy-delay-product (EDP) improvements (energy over 15% and performance over 5%) across 14 benchmarks from SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.
Open Access
Static Instruction Scheduling for High Performance on Limited Hardware
(IEEE, 2018-04-01) Tran, Kim-Anh; Carlson, Trevor E.; Koukos, Konstantinos; Själander, Magnus; Spiliopoulos, Vasileios; Kaxiras, Stefanos; Jimborean, Alexandra; Ingeniería y Tecnología de Computadores
Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increases memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14 percent for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high-performance.
Restricted
SWOOP: Software-Hardware Co-design for Non-speculative, Execute-Ahead, In-Order Cores
(ACM, 2018-06-11) Tran, Kim-Anh; Jimborean, Alexandra; Carlson, Trevor E.; Koukos, Konstantinos; Själander, Magnus; Kaxiras, Stefanos; Ingeniería y Tecnología de Computadores
Increasing demands for energy eiciency constrain emerging hardware. These new hardware trends challenge the established assumptions in code generation and force us to rethink existing software optimization techniques. We propose a cross-layer redesign of the way compilers and the underlying microarchitecture are built and interact, to achieve both performance and high energy eiciency. In this paper, we address one of the main performance bottlenecksÐlast-level cache missesÐthrough a softwarehardware co-design. Our approach is able to hide memory latency and attain increased memory and instruction level parallelism by orchestrating a non-speculative, execute-ahead paradigm in software (SWOOP). While out-of-order (OoO) architectures attempt to hide memory latency by dynamically reordering instructions, they do so through expensive, power-hungry, speculative mechanisms.We aim to shift this complexity into software, and we build upon compilation techniques inherited from VLIW, software pipelining, modulo scheduling, decoupled access-execution, and software prefetching. In contrast to previous approaches we do not rely on either software or hardware speculation that can be detrimental to eiciency. Our SWOOP compiler is enhanced with lightweight architectural support, thus being able to transform applications that include highly complex control-low and indirect memory accesses. The efectiveness of our software-hardware co-design is proven on the most limited but energy-eicient microarchitectures, non-speculative, in-order execution (InO) cores, which rely entirely on compile-time instruction scheduling. We show that (1) our approach achieves its goal in hiding the latency of the last-level cache misses and improves performance by 34% and energy eiciency by 23% over the baseline InO core, competitive with an oracle InO core with a perfect last-level cache; (2) can even exceed the performance of the oracle core, by exposing a higher degree of memory and instruction level parallelism. Moreover, we compare to a modest speculative OoO core, which hides not only the latency of last-level cache misses, but most instruction latency, and conclude that while the OoO core is still 39% faster than SWOOP, it pays a steep price for this advantage by doubling the energy consumption.

Browsing by Subject "Compilers"

Results Per Page

Sort Options