UBC Theses and Dissertations
Accelerating deep learning with lossy compression
Evans, Robert David
Parallel hardware accelerators, such as Graphics Processing Units (GPUs), have limited on-chip memory capacity and off-chip bandwidth. Neural network workloads can generally be separated into the preparation (training) and usage (inference) of the network. Training deep neural networks is especially hampered by memory limitations, as increasing model size (and thus memory consumption) is a common technique for improving accuracy. Previous approaches have examined lossy compression and offloading to reduce activation memory, the primary memory consumer for most models. However, prior works apply compression without exploiting the spatial properties of many neural networks. Furthermore, no known relationship exists between a network's trained accuracy and its compression rate, leading to expensive searches for suitable compression rates.

In this dissertation, we begin by examining lossy spatial compression using JPEG for ACTivations (JPEG-ACT). JPEG-ACT uses error sensitivities custom-tuned to machine perception and a hardware accelerator to offload compressed activation values during training. The JPEG-ACT accelerator achieves an 8.5x memory reduction over uncompressed training and 2.3x higher performance than the next-best offload accelerator. We then examine the theoretical relationship between compression error and accuracy. Prior approaches required expensive tuning after training to determine the compression/accuracy trade-off. In contrast, our Activation Compression with Guaranteed Convergence (AC-GC) error bounds allow accuracy guarantees to be set before training, which lowers runtime cost and quantifies how much accuracy is given up for compression. AC-GC adds a performance overhead of only 0.4% over compressed training, and combining its bounds with various compression methods yields 15.1x compression on average, with theoretical guarantees of convergence.
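The core idea, lossily compressing activations saved for the backward pass under an error bound, can be sketched minimally. The snippet below is an illustration only, not the dissertation's implementation: a uniform quantizer with a hypothetical per-element error bound `eps` stands in for JPEG-ACT's tuned JPEG pipeline, and zlib stands in for its entropy coder; all function names are invented for this sketch.

```python
# Minimal sketch (assumed, not the dissertation's method): compress an
# activation tensor with a bounded per-element error, then recover it.
import zlib
import numpy as np

def compress_activation(act: np.ndarray, eps: float):
    """Quantize with step 2*eps (max elementwise error <= eps), then deflate."""
    step = 2.0 * eps
    q = np.round(act / step).astype(np.int16)   # lossy: uniform quantization
    payload = zlib.compress(q.tobytes())        # lossless: entropy coding
    return payload, q.shape, step

def decompress_activation(payload: bytes, shape, step: float) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int16).reshape(shape)
    return q.astype(np.float32) * step

# Toy activation tensor standing in for a layer's saved output.
rng = np.random.default_rng(0)
act = rng.standard_normal((64, 32, 32)).astype(np.float32)

payload, shape, step = compress_activation(act, eps=0.05)
recon = decompress_activation(payload, shape, step)
ratio = act.nbytes / len(payload)
print(f"compression ratio: {ratio:.1f}x, "
      f"max error: {np.max(np.abs(recon - act)):.3f}")
```

The point of the sketch is the trade-off the abstract describes: a larger `eps` gives a higher compression ratio but larger reconstruction error, and AC-GC's contribution is bounding that error so convergence is guaranteed before training begins.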
These guarantees give AC-GC better potential to function well on current and yet-to-be-developed models. By reducing activation memory consumption, JPEG-ACT and AC-GC allow faster iteration through possible neural network models, advancing the field toward new applications and better models.
Attribution-NonCommercial-NoDerivatives 4.0 International