UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Error resilience evaluation on GPGPU applications Fang, Bo

Abstract

While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which makes it difficult to achieve representativeness while being time-efficient. This thesis makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate the end-to-end reliability properties of application kernels running on GPUs. Second, it introduces a fault-injection tool that uses real GPU hardware and offers a good balance between the representativeness and the efficiency of the fault injection experiments. Third, it characterizes the error resilience characteristics of twelve GPGPU applications. Last but not least, this thesis provides preliminary insights on correlations between algorithm properties and the measured silent data corruption rates of applications.

Item Media

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada