Enhancing FPGA accelerated machine learning debug with lossless compression

UBC Theses and Dissertations

Nafziger, Zakary Snider

Abstract

Acceleration of machine learning models is proving to be an important application for FPGAs. Unfortunately, debugging such models during training or inference is difficult. Software simulations of a machine learning system may be insufficiently detailed to provide meaningful debug insight, or may require infeasibly long run-times. Thus, it is often desirable to debug the accelerated model while it is running on real hardware. Effective on-chip debug often requires instrumenting a design with additional circuitry to store run-time data, which consumes valuable chip resources. Previous work has developed methods to perform lossy compression of signals by exploiting machine-learning-specific knowledge, thereby increasing the amount of debug context that can be stored in an on-chip trace buffer. However, all prior work compresses each successive element in a signal of interest independently. Since debug signals may have temporal similarity in many machine learning applications, there is an opportunity to further increase trace buffer utilization. To that end, this thesis presents two major research contributions. The first contribution is an architecture to perform lossless temporal compression in addition to the existing lossy element-wise compression. Further, it is shown that, when applied to typical machine learning algorithms in realistic debug scenarios, approximately twice as much information can be stored in an on-chip buffer while increasing the total area of the debug instrument by approximately 25%. The impact is that, for a given instrumentation budget, a significantly larger trace window is available during debug, possibly allowing a designer to narrow down the root cause of a bug faster. The second contribution is an evaluation of the margin for compression performance improvement. An attempt was made to determine the entropy at the input of the proposed encoder using information theory, but this was mathematically intractable. Instead, a comparison was made to a best-in-class software compression algorithm. It was demonstrated that, while not superior to the software algorithm, the proposed encoder performs well in the memory-scarce context of FPGA debug.
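
The general idea of lossless temporal compression can be illustrated with a small Python sketch. It is not the architecture proposed in the thesis, which this record does not describe. It only assumes that successive frames of an already element-wise-compressed debug signal are often identical or nearly so, takes the difference between each frame and the previous one, and run-length encodes the resulting zeros. The frame layout, the token format ("D" for a delta, "Z" for a run of zeros) and the function names are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Frame = List[int]        # one sample of the traced signal (hypothetical layout)
Token = Tuple[str, int]  # ("D", delta) or ("Z", number_of_zero_deltas)


def encode(frames: List[Frame]) -> List[Token]:
    """Delta each frame against the previous one, then run-length encode zeros."""
    tokens: List[Token] = []
    prev = [0] * len(frames[0]) if frames else []
    zero_run = 0
    for frame in frames:
        for cur, old in zip(frame, prev):
            delta = cur - old
            if delta == 0:
                zero_run += 1          # unchanged element: extend the current run
            else:
                if zero_run:
                    tokens.append(("Z", zero_run))
                    zero_run = 0
                tokens.append(("D", delta))
        prev = frame
    if zero_run:
        tokens.append(("Z", zero_run))  # flush a trailing run of zeros
    return tokens


def decode(tokens: List[Token], width: int) -> List[Frame]:
    """Invert encode(): expand zero runs, then accumulate deltas frame by frame."""
    deltas: List[int] = []
    for kind, value in tokens:
        if kind == "Z":
            deltas.extend([0] * value)
        else:
            deltas.append(value)
    frames: List[Frame] = []
    prev = [0] * width
    for start in range(0, len(deltas), width):
        frame = [old + d for old, d in zip(prev, deltas[start:start + width])]
        frames.append(frame)
        prev = frame
    return frames


# Four frames of width 4 in which only one element changes over time.
frames = [[3, 3, 0, 1], [3, 3, 0, 1], [3, 4, 0, 1], [3, 4, 0, 1]]
tokens = encode(frames)
restored = decode(tokens, width=4)
```

Because decoding reproduces the input frames exactly, the scheme is lossless with respect to its input. Any gain depends entirely on how much the signal repeats from one frame to the next, and a hardware encoder would also have to account for the bit width of each token, which this sketch ignores.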

Item Metadata

Title	Enhancing FPGA accelerated machine learning debug with lossless compression
Creator	Nafziger, Zakary Snider
Supervisor	Wilton, Steve
Publisher	University of British Columbia
Date Issued	2023
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2023-04-20
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0431330
URI	http://hdl.handle.net/2429/84363
Degree (Theses)	Master of Applied Science - MASc
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2023-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
