UBC Theses and Dissertations

Effective debug ecosystem for machine learning hardware accelerators

Holanda Noronha, Daniel

Abstract

Recent years have seen a dramatic increase in the use of hardware accelerators to perform machine learning computations. Designing these circuits is challenging, especially because of bugs that may only manifest after long run times and because the interactions between hardware and software are complex to understand. As a result, debugging a machine learning accelerator and ensuring that the system delivers acceptable performance are very time-consuming processes that significantly limit productivity. This dissertation investigates how additional circuitry added to machine learning hardware designs can enable effective debugging of those systems, and gathers insights on how to debug them better. More specifically, we focus on techniques suitable for accelerators implemented on Field-Programmable Gate Arrays (FPGAs), since many accelerators are prototyped on this type of reconfigurable fabric. This dissertation comprises four major contributions towards this goal. First, we present a debugging framework that allows the designer to observe domain-specific information (e.g., sparsity and other statistics) about the machine learning workload running on the accelerator, rather than raw information that is expensive to trace. This includes custom circuitry that allows information to be recorded for at least 21.8x longer than in previous debugging architectures. Second, we create a technique that reduces the time between debug iterations: an overlay that enables designers to change which signals are traced and how those signals are aggregated without resynthesizing the design. Third, we investigate how to debug underperforming training jobs, resulting in a novel programmable debug architecture that allows designers to create custom ways of aggregating data at debug time, instead of being constrained to a few options selected at compile time.
Finally, we investigate the impact of hardware acceleration on the optimization landscape of training systems and how this information may be used to accelerate debugging. We anticipate that the concepts presented in this thesis will allow designers to aggregate domain-specific information in future commercial debugging tools and will motivate future work on improving the training performance of hardware accelerators.
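
The abstract does not describe how the custom circuitry extends recording time, so the following Python model is only a sketch of the general idea of aggregating domain-specific information instead of tracing raw values, not the thesis's architecture. It assumes a fixed-capacity on-chip trace buffer (modeled with a bounded `deque`) and a hypothetical `AggregatingTracer` that stores one summary value, such as the fraction of zeros (sparsity), per window of observed values. The window size of 64, the buffer depth of 1,024 entries, and the synthetic input stream are illustrative assumptions. The hypothetical `set_aggregate` method loosely models selecting a different aggregation at debug time without rebuilding the tracer.

```python
from collections import deque
from typing import Callable, List, Sequence


# Hypothetical aggregation functions for illustration only; the statistics
# and hardware encodings used in the thesis are not specified here.
def sparsity(window: Sequence[float]) -> float:
    """Fraction of zero-valued elements in one window of traced values."""
    return sum(1 for v in window if v == 0) / len(window)


def mean(window: Sequence[float]) -> float:
    """Arithmetic mean of one window of traced values."""
    return sum(window) / len(window)


class AggregatingTracer:
    """Software model of a trace buffer that stores one summary per window.

    Hypothetical: window=1 with an identity aggregate models raw tracing.
    """

    def __init__(self, capacity: int, window: int,
                 aggregate: Callable[[Sequence[float]], float]) -> None:
        self.buffer: deque = deque(maxlen=capacity)  # oldest entries are overwritten
        self.window = window
        self.aggregate = aggregate
        self.pending: List[float] = []

    def set_aggregate(self, aggregate: Callable[[Sequence[float]], float]) -> None:
        """Switch the aggregation function at debug time; drop the partial window."""
        self.aggregate = aggregate
        self.pending = []

    def observe(self, value: float) -> None:
        """Accept one traced value per modeled clock cycle."""
        self.pending.append(value)
        if len(self.pending) == self.window:
            self.buffer.append(self.aggregate(self.pending))
            self.pending = []

    def history_span(self) -> int:
        """Number of observed values summarized by the current buffer contents."""
        return len(self.buffer) * self.window


CAPACITY = 1024  # illustrative trace-buffer depth, in entries
raw = AggregatingTracer(CAPACITY, window=1, aggregate=lambda w: w[0])
agg = AggregatingTracer(CAPACITY, window=64, aggregate=sparsity)

# Synthetic activation stream: three of every four values are zero.
stream = [0.0 if i % 4 else 1.0 for i in range(100_000)]
for v in stream:
    raw.observe(v)
    agg.observe(v)

print(raw.history_span(), agg.history_span())  # 1024 65536
print(agg.buffer[-1])  # 0.75
```

In this toy model the same buffer covers 64x more samples only because each entry summarizes 64 values; the 21.8x figure above comes from the thesis's own evaluation and is not reproduced here. The trade-off is that individual values inside a window can no longer be recovered, which is acceptable only when the aggregated statistic is what the designer actually needs to observe.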

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International