UBC Theses and Dissertations
Accelerating deep learning with lossy compression
Evans, Robert David
Parallel hardware accelerators, such as Graphics Processing Units (GPUs), have limited on-chip memory capacity and off-chip bandwidth. Neural network workloads can generally be separated into the preparation (training) and usage (inference) of the network. Training deep neural networks is especially hampered by memory limitations, as increasing model size (and thus memory consumption) is a common technique for improving accuracy. Previous approaches have examined lossy compression and offloading to reduce activation memory, the primary memory consumer for most models. However, prior works apply compression without exploiting the spatial properties of many neural networks. Furthermore, no known relationship exists between a network's trained accuracy and its compression rate, leading to expensive searches for suitable compression rates.

In this dissertation, we begin by examining lossy spatial compression using JPEG for ACTivations (JPEG-ACT). JPEG-ACT uses error sensitivities custom-tuned to machine perception and a hardware accelerator to offload compressed activation values during training. The JPEG-ACT accelerator achieves an 8.5x memory reduction over uncompressed training and 2.3x higher performance than the next-best offload accelerator. We then examine the theoretical relationship between compression error and accuracy. Prior approaches required expensive tuning after training to determine the compression/accuracy trade-off. In contrast, our Activation Compression with Guaranteed Convergence (AC-GC) error bounds allow accuracy guarantees to be set before training, which lowers runtime cost and quantifies how much accuracy is given up for compression. AC-GC adds a performance overhead of only 0.4% over compressed training, and combining its bounds with various compression methods yields 15.1x compression on average, with theoretical guarantees of convergence.
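The core idea, lossily compressing activations saved for the backward pass under an error bound, can be sketched minimally. The snippet below is an illustration only, not the dissertation's implementation: a uniform quantizer with a hypothetical per-element error bound `eps` stands in for JPEG-ACT's tuned JPEG pipeline, and zlib stands in for its entropy coder; all function names are invented for this sketch.

```python
# Minimal sketch (assumed, not the dissertation's method): compress an
# activation tensor with a bounded per-element error, then recover it.
import zlib
import numpy as np

def compress_activation(act: np.ndarray, eps: float):
    """Quantize with step 2*eps (max elementwise error <= eps), then deflate."""
    step = 2.0 * eps
    q = np.round(act / step).astype(np.int16)   # lossy: uniform quantization
    payload = zlib.compress(q.tobytes())        # lossless: entropy coding
    return payload, q.shape, step

def decompress_activation(payload: bytes, shape, step: float) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
    return q.astype(np.float32) * step

# Toy activation tensor standing in for a layer's saved output.
rng = np.random.default_rng(0)
act = rng.standard_normal((64, 32, 32)).astype(np.float32)

payload, shape, step = compress_activation(act, eps=0.05)
recon = decompress_activation(payload, shape, step)
ratio = act.nbytes / len(payload)
print(f"compression ratio: {ratio:.1f}x, "
      f"max error: {np.max(np.abs(recon - act)):.3f}")
```

The point of the sketch is the trade-off the abstract describes: a larger `eps` gives a higher compression ratio but larger reconstruction error, and AC-GC's contribution is bounding that error so convergence is guaranteed before training begins.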
These guarantees give AC-GC better potential to function well on current and yet-to-be-developed models. By reducing activation memory consumption, JPEG-ACT and AC-GC allow faster iteration through possible neural network models, advancing the field toward new applications and better models.
Attribution-NonCommercial-NoDerivatives 4.0 International