UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Architectural support for inter-thread synchronization in SIMT architectures ElTantawy, Ahmed Mohammed

Abstract

Single-Instruction Multiple-Threads (SIMT) architectures have seen widespread interest in accelerating data parallel applications. In the SIMT model, small groups of scalar threads operate in lockstep. Within each group, current SIMT implementations serialize the execution of threads that follow different paths, and to ensure efficiency, revert to lockstep execution as soon as possible. These thread scheduling constraints may cause a deadlock-free program on a multiple-instruction multiple-data architecture to deadlock on a SIMT machine. Further, fine-grained synchronization is often implemented using busy-wait loops. However, busy-wait synchronization incurs significant overheads and existing CPU solutions do not readily translate to SIMT architectures. In this thesis, we tackle these challenges. First, we propose a static analysis technique that detects SIMT deadlocks by inspecting the application control flow graph (CFG). We further propose a CFG transformation that avoids SIMT deadlocks when synchronization is local to a function. The static detection has a false detection rate of 4%-5%. The automated transformation has an average performance overhead of 8.2%-10.9% compared to manual transformation. We also propose an adaptive hardware reconvergence mechanism that supports MIMD synchronization without changing the application CFG. Our hardware approach performs on par with the compiler transformation but avoids key limitations in the compiler only solution. We show that this hardware can be further extended to support concurrent multi-path execution to improve the performance of divergent applications. Finally, We propose a hardware warp scheduling policy informed by a novel hardware mechanism for accurately detecting busy-wait synchronization on GPUs. When employed, it deprioritizes spinning warps achieving a speedup of 42.7% over Greedy Then Oldest scheduling.

Item Media

Item Citations and Data

Rights

Attribution-NoDerivatives 4.0 International