UBC Theses and Dissertations
Towards scalar synchronization in SIMT architectures Ramamurthy, Arun
An important class of compute accelerators are graphics processing units (GPUs). Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads as a single warp or wavefront and executes this group of scalar threads in lockstep. The inherent mismatch between scalar programming model and vector hardware creates a challenge when developing applications that employ synchronization on the GPU. This challenge arises from the use of a hardware stack to manage control flow divergence among scalar threads. This thesis explains the porting of the Apriori benchmark to a GPU which led to the research on synchronization in SIMT hardware. It then proposes instruction set and hardware changes that simplify the implementation of mutual exclusion when porting multiple-instruction, multiple data (MIMD) programs with synchronization to accelerators employing single-instruction, multiple thread (SIMT) hardware. These instructions when compared with more complex software only solutions, achieve similar performance. This thesis also implements and evaluates queue based mutual exclusion on SIMT hardware.
Item Citations and Data
Attribution-NonCommercial-NoDerivatives 4.0 International