TY - THES AU - Turner, Andrew Edmund PY - 2012 TI - On replay and hazards in graphics processing units KW - Thesis/Dissertation LA - eng M3 - Text AB - Graphics Processing Units (GPUs) have potential for more efficient execution of programs, both time wise and energy wise. They can achieve this by devoting more of their hardware for functional units. GPGPU-Sim is modified to simulate an aggressively modern compute accelerator and used to investigate the impact of hazards in massively multithreaded accelerator architectures such as those represented by modern GPUs employing a single-instruction, multiple thread (SIMT) execution model. Hazards are events that occur in the execution of a program which limit performance. We explore design tradeoffs in hazard handling. We find that in common architectures hazards that stall the pipeline cause other unrelated threads to be unable to make forward progress. This thesis explores an alternative organization, called replay, in which instructions that cannot proceed through the memory stage are squashed and later replayed so that instructions from other threads can make forward progress. This replay architecture can behave pathologically in some cases by excessively replaying without forward progress. A method which predicts when hazards may occur and thus can reduce the number of unnecessary replays and improve performance is proposed and evaluated. This method is found to improve performance up to 13.3% over the stalling baseline, and up to 3.3% over replay. N2 - Graphics Processing Units (GPUs) have potential for more efficient execution of programs, both time wise and energy wise. They can achieve this by devoting more of their hardware for functional units. GPGPU-Sim is modified to simulate an aggressively modern compute accelerator and used to investigate the impact of hazards in massively multithreaded accelerator architectures such as those represented by modern GPUs employing a single-instruction, multiple thread (SIMT) execution model. Hazards are events that occur in the execution of a program which limit performance. We explore design tradeoffs in hazard handling. We find that in common architectures hazards that stall the pipeline cause other unrelated threads to be unable to make forward progress. This thesis explores an alternative organization, called replay, in which instructions that cannot proceed through the memory stage are squashed and later replayed so that instructions from other threads can make forward progress. This replay architecture can behave pathologically in some cases by excessively replaying without forward progress. A method which predicts when hazards may occur and thus can reduce the number of unnecessary replays and improve performance is proposed and evaluated. This method is found to improve performance up to 13.3% over the stalling baseline, and up to 3.3% over replay. UR - https://open.library.ubc.ca/collections/24/items/1.0073331 ER - End of Reference