UBC Theses and Dissertations
Shared load-store unit for instruction-based accelerators : case study with soft multicore and multithreaded RISC-V vector processors
Maheshe, Joseph
Abstract
The end of Moore's Law and Dennard scaling has shifted computing systems toward specialized accelerators that increase performance and lower power per operation. Accelerators are special-purpose hardware optimized to perform specific functions within general-purpose computing systems, and modern SoCs combine many general-purpose processors with a growing number of accelerators. Accelerators achieve higher performance and power efficiency by prioritizing specialization and heterogeneity over generality and homogeneity. Once specialization lowers the cost of computation far enough, accelerators become memory-bound. High-performance accelerators therefore typically access memory directly or through caches, avoiding the throughput bottleneck of moving all data through the processor's register file. However, direct access to shared memory introduces coherence and consistency issues, both of which are challenging aspects of modern processor design. As accelerators proliferate in shared-memory heterogeneous SoCs, ensuring coherence and consistency in each accelerator separately adds significant design complexity.

This thesis proposes a novel shared load-store unit (LSU) that addresses these challenges once, for all accelerators. The shared LSU specifically targets instruction-based accelerators integrated into the processor's instruction set architecture. Sharing an LSU consolidates LSU logic, buffering, multiple memory access ports, and coherence and consistency logic into a single unit. It can also enable automatic dependence checking, which improves performance. The shared LSU aims to relieve designers of building accelerator-specific LSUs. Its platform-independent design is written in standard SystemVerilog HDL and uses open-source communication protocols to facilitate hardware reuse. It also offers programmers a simplified programming model featuring custom, accelerator-based, fine-grain fences.
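The consolidation and dependence-checking ideas above can be illustrated with a small behavioral model. This is a hypothetical sketch in Python, not the thesis's SystemVerilog design: the `MemOp` and `SharedLSU` names, the byte-range overlap rule, and the per-accelerator fence counter are all simplifying assumptions. It shows one LSU accepting requests tagged with an accelerator ID, flagging overlapping accesses where at least one is a store regardless of which accelerator issued them, and a fine-grain fence that waits only on the issuing accelerator's outstanding operations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemOp:
    acc_id: int      # hypothetical: ID of the accelerator that issued the op
    is_store: bool
    addr: int
    size: int        # access size in bytes

    def overlaps(self, other: "MemOp") -> bool:
        # True if byte ranges [addr, addr + size) intersect.
        return self.addr < other.addr + other.size and other.addr < self.addr + self.size


@dataclass
class SharedLSU:
    """Toy model of one LSU shared by several accelerators (hypothetical)."""
    in_flight: List[MemOp] = field(default_factory=list)  # oldest first

    def conflict(self, op: MemOp) -> Optional[MemOp]:
        # Automatic dependence check: find the youngest older op that overlaps
        # `op` where at least one of the two is a store (RAW, WAR, or WAW),
        # no matter which accelerator issued it.
        for older in reversed(self.in_flight):
            if (older.is_store or op.is_store) and older.overlaps(op):
                return older
        return None

    def issue(self, op: MemOp) -> Optional[MemOp]:
        """Enqueue `op`; return the older op it must wait for, or None."""
        dep = self.conflict(op)
        self.in_flight.append(op)
        return dep

    def retire_oldest(self) -> MemOp:
        # Remove and return the oldest outstanding op (assumes in-order retire).
        return self.in_flight.pop(0)

    def fence_outstanding(self, acc_id: int) -> int:
        # Fine-grain fence: only ops from `acc_id` must drain before the fence
        # completes; ops from other accelerators are not counted.
        return sum(1 for op in self.in_flight if op.acc_id == acc_id)


if __name__ == "__main__":
    lsu = SharedLSU()
    lsu.issue(MemOp(acc_id=0, is_store=True, addr=0x100, size=64))
    dep = lsu.issue(MemOp(acc_id=1, is_store=False, addr=0x120, size=16))
    print("load from accelerator 1 depends on accelerator", dep.acc_id)
    print("accelerator 1 fence waits on", lsu.fence_outstanding(1), "op(s)")
```

A coarse fence in this model would wait for `len(lsu.in_flight)` to reach zero and so stall every accelerator; scoping the count by `acc_id` is what makes the fence fine-grained here.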
A case study demonstrates a shared LSU that replaces the individual vector LSUs of high-performance soft RISC-V vector cores integrated with a single scalar core. Each vector core uses contexts to simplify multithreading of single-threaded programs. Multiple such vector cores form a novel multicore and multithreaded vector architecture that further improves performance. The shared LSU integrates seamlessly with the vector cores in an SoC implemented on the AMD ZCU104 evaluation kit. In this system, a single-threaded program using two vector cores and two contexts achieves a 2.66x speedup on rgba2luma, effectively hiding memory latency and pipeline stalls.
Item Metadata
Title: Shared load-store unit for instruction-based accelerators : case study with soft multicore and multithreaded RISC-V vector processors
Creator: Maheshe, Joseph
Publisher: University of British Columbia
Date Issued: 2024
Language: eng
Date Available: 2025-01-09
Provider: Vancouver : University of British Columbia Library
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
DOI: 10.14288/1.0447717
Degree Grantor: University of British Columbia
Graduation Date: 2025-05
Scholarly Level: Graduate
Aggregated Source Repository: DSpace