Using mixed low-precision formats in multiply-accumulate (MAC) units for DNN training

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Using mixed low-precision formats in multiply-accumulate (MAC) units for DNN training Tatsumi, Mariko

Abstract

Due to limited size, cost and power, embedded devices do not offer the same computational throughput as graphics processing units (GPUs) for training Deep Neural Networks (DNNs). The most compute-intensive stage of multilayer perceptron (MLP) and convolutional neural network (CNN) training is the general matrix multiply (GEMM) kernel which is executed three times per layer in each iteration: once for forward-propagation and twice for back-propagation. To reduce the number of operations, techniques such as distillation (to reduce model size) and pruning (to introduce sparsity) are commonly applied. This thesis considers another technique, where the computational effort of each operation is reduced using low-precision arithmetic. While the use of small data types is common in DNN inference, this is not yet common in DNN training. Previous work in the area is somewhat limited, sometimes only considering 16-bit floating-point formats or overlooking implementation details, such as the area and accuracy tradeoffs from exact digital multiplier designs. This thesis considers the use of mixed-precision operations (MPO) within the GEMM kernel for DNN training. To conduct these investigations, we have implemented a complete DNN training framework for embedded systems, Archimedes-MPO. Starting with the C++ library TinyDNN, we have abstracted each layer to use custom data types and accelerated the GEMM stage with CUDA and Vitis HLS to produce bit-accurate GPU and FPGA implementations. This framework allows us to exactly measure the impact of various multiplier and accumulator designs on area and accuracy. Compared to 32-bit floating-point multiplier, as few as 2 mantissa bits attain similar accuracy. Accuracy losses are reduced with adaptive loss scaling and the removal of hardware for rounding and not-a-number (NaN) representations. Removal of subnormals saves area as well, but hurts accuracy, so we propose a custom subnormal encoding as a compromise. For accumulation, 12-bit floating-point and 21-bit fixed-point formats work similarly. Fixed-point accumulation seems to have an area advantage, but the impact of a wider output data size can be costly on downstream logic. While precise results depend upon the model and dataset used, the observed trends and framework can help the design of future GEMM-based hardware accelerators for DNNs.

Item Metadata

Title	Using mixed low-precision formats in multiply-accumulate (MAC) units for DNN training
Creator	Tatsumi, Mariko
Supervisor	Lemieux, Guy
Publisher	University of British Columbia
Date Issued	2022
Description	Due to limited size, cost and power, embedded devices do not offer the same computational throughput as graphics processing units (GPUs) for training Deep Neural Networks (DNNs). The most compute-intensive stage of multilayer perceptron (MLP) and convolutional neural network (CNN) training is the general matrix multiply (GEMM) kernel which is executed three times per layer in each iteration: once for forward-propagation and twice for back-propagation. To reduce the number of operations, techniques such as distillation (to reduce model size) and pruning (to introduce sparsity) are commonly applied. This thesis considers another technique, where the computational effort of each operation is reduced using low-precision arithmetic. While the use of small data types is common in DNN inference, this is not yet common in DNN training. Previous work in the area is somewhat limited, sometimes only considering 16-bit floating-point formats or overlooking implementation details, such as the area and accuracy tradeoffs from exact digital multiplier designs. This thesis considers the use of mixed-precision operations (MPO) within the GEMM kernel for DNN training. To conduct these investigations, we have implemented a complete DNN training framework for embedded systems, Archimedes-MPO. Starting with the C++ library TinyDNN, we have abstracted each layer to use custom data types and accelerated the GEMM stage with CUDA and Vitis HLS to produce bit-accurate GPU and FPGA implementations. This framework allows us to exactly measure the impact of various multiplier and accumulator designs on area and accuracy. Compared to 32-bit floating-point multiplier, as few as 2 mantissa bits attain similar accuracy. Accuracy losses are reduced with adaptive loss scaling and the removal of hardware for rounding and not-a-number (NaN) representations. Removal of subnormals saves area as well, but hurts accuracy, so we propose a custom subnormal encoding as a compromise. For accumulation, 12-bit floating-point and 21-bit fixed-point formats work similarly. Fixed-point accumulation seems to have an area advantage, but the impact of a wider output data size can be costly on downstream logic. While precise results depend upon the model and dataset used, the observed trends and framework can help the design of future GEMM-based hardware accelerators for DNNs.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2022-04-22
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0412995
URI	http://hdl.handle.net/2429/81290
Degree	Master of Applied Science - MASc
Program	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2022-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Using mixed low-precision formats in multiply-accumulate (MAC) units for DNN training Tatsumi, Mariko

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights