Architectural support for inter-thread synchronization in SIMT architectures

UBC Theses and Dissertations

Architectural support for inter-thread synchronization in SIMT architectures ElTantawy, Ahmed Mohammed

Abstract

Single-Instruction Multiple-Threads (SIMT) architectures have seen widespread interest in accelerating data-parallel applications. In the SIMT model, small groups of scalar threads operate in lockstep. Within each group, current SIMT implementations serialize the execution of threads that follow different paths and, to ensure efficiency, revert to lockstep execution as soon as possible. These thread scheduling constraints may cause a program that is deadlock-free on a multiple-instruction multiple-data (MIMD) architecture to deadlock on a SIMT machine. Further, fine-grained synchronization is often implemented using busy-wait loops. However, busy-wait synchronization incurs significant overheads, and existing CPU solutions do not readily translate to SIMT architectures. In this thesis, we tackle these challenges. First, we propose a static analysis technique that detects SIMT deadlocks by inspecting the application control flow graph (CFG). We further propose a CFG transformation that avoids SIMT deadlocks when synchronization is local to a function. The static detection has a false detection rate of 4%-5%. The automated transformation has an average performance overhead of 8.2%-10.9% compared to manual transformation. We also propose an adaptive hardware reconvergence mechanism that supports MIMD synchronization without changing the application CFG. Our hardware approach performs on par with the compiler transformation but avoids key limitations of the compiler-only solution. We show that this hardware can be further extended to support concurrent multi-path execution to improve the performance of divergent applications. Finally, we propose a hardware warp scheduling policy informed by a novel hardware mechanism for accurately detecting busy-wait synchronization on GPUs. When employed, it deprioritizes spinning warps, achieving a speedup of 42.7% over Greedy Then Oldest scheduling.
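
The SIMT deadlock described in the abstract can be illustrated with a toy model. The sketch below is not the thesis's analysis, transformation, or hardware. It is a minimal Python simulation under simplifying assumptions: one warp, one spin lock, atomics serialized across lanes, and a reconvergence rule in which threads that leave the busy-wait loop wait until every thread in the warp has left it. The function name `run_warp` and the flag `release_inside_loop` are hypothetical names chosen for this illustration. With the release placed after the loop, the lock holder waits at the reconvergence point while the others spin forever. Moving the critical section and the release inside the loop body, a commonly used manual rewrite, lets one thread finish per iteration.

```python
def run_warp(num_threads, release_inside_loop, max_iterations=1000):
    """Toy model of one warp contending on a spin lock.

    Assumption (illustrative only): threads that exit the busy-wait loop
    wait at the reconvergence point until all threads have exited it.
    Returns the number of loop iterations needed for the whole warp to
    leave the loop, or None if it never does (modelled deadlock).
    """
    lock = 0
    spinning = list(range(num_threads))  # threads still inside the loop

    for iteration in range(1, max_iterations + 1):
        winner = None
        # Lockstep step: every spinning thread attempts a compare-and-swap.
        # Atomics are serialized, so at most one thread acquires the lock.
        for t in spinning:
            if lock == 0:
                lock = 1
                winner = t

        if winner is not None:
            spinning.remove(winner)
            if release_inside_loop:
                # Critical section and release run inside the loop body,
                # so the lock is free again before the next iteration.
                lock = 0
            # Otherwise the winner waits at the reconvergence point,
            # still holding the lock, until the others leave the loop.

        if not spinning:
            return iteration

    return None  # the remaining threads spin until the iteration budget ends


if __name__ == "__main__":
    print("release after loop: ", run_warp(4, release_inside_loop=False))
    print("release inside loop:", run_warp(4, release_inside_loop=True))
```

In this model a single-thread warp never deadlocks, which matches the intuition that the problem comes from scheduling constraints among threads of the same warp rather than from the lock itself.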

Item Metadata

Title	Architectural support for inter-thread synchronization in SIMT architectures
Creator	ElTantawy, Ahmed Mohammed
Publisher	University of British Columbia
Date Issued	2018
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2018-06-30
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NoDerivatives 4.0 International
DOI	10.14288/1.0363330
URI	http://hdl.handle.net/2429/64507
Degree	Doctor of Philosophy - PhD
Program	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2018-02
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nd/4.0/
Aggregated Source Repository	DSpace
