- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Efficient in-hardware compression of on-chip data
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Efficient in-hardware compression of on-chip data Ghasemazar, Amin
Abstract
The past decade has seen tremendous growth in how much data is collected and stored, and consequently in the sizes of application working sets. On-chip memory capacities, however, have not kept up: average CPU last-level cache capacity per core (thread) has stagnated at 1MB. Similar trends exist in special-purpose computing systems, with only up to tens of megabytes of on-chip memory available in most recent AI accelerators. In this dissertation, we explore hardware-friendly online data compression techniques to gain the performance benefits of larger on-chip memories without paying the costs of larger silicon. We propose several solutions including two methods to compress general workloads in CPU caches, and two methods to compress AI workloads in special-purpose computing systems. In order to efficiently compress on-chip data, the compression mechanisms need to leverage all relevant data stored in on-chip memory. We propose 2DCC, a cache compression mechanism that leverages redundancy both within and across all cache data blocks, and results in a 2.12× compression factor. We then extend this insight by observing that many on-chip blocks are often similar to each other. We propose Thesaurus, a hardware- level on-line cacheline clustering mechanism to dynamically form clusters as these similar blocks appear in the data access stream. Thesaurus significantly improves the state-of-the-art cache compression ratio to 2.25×. Next, we apply our insights to special-purpose applications. We first pro- pose Channeleon, which tackles the problem of compressing the activation maps in deep neural networks (DNNs) at inference time. Leveraging the observed similarity among activation channels, Channeleon first forms clusters of similar activation channels, and then quantizes activations within each cluster. This enables the activations to have low bit-width while incurring acceptable accuracy losses. Lastly, we propose Procrustes, a sparse DNN training accelerator that prunes weights by exploiting both software and hardware knowledge. Procrustes reduces the memory footprint of models by an order of magnitude while maintaining dense model accuracy.
Item Metadata
Title |
Efficient in-hardware compression of on-chip data
|
Creator | |
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2021
|
Description |
The past decade has seen tremendous growth in how much data is collected and stored, and consequently in the sizes of application working sets. On-chip memory capacities, however, have not kept up: average CPU last-level cache capacity per core (thread) has stagnated at 1MB. Similar trends exist in special-purpose computing systems, with only up to tens of megabytes of on-chip memory available in most recent AI accelerators.
In this dissertation, we explore hardware-friendly online data compression techniques to gain the performance benefits of larger on-chip memories without paying the costs of larger silicon. We propose several solutions including two methods to compress general workloads in CPU caches, and two methods to compress AI workloads in special-purpose computing systems.
In order to efficiently compress on-chip data, the compression mechanisms need to leverage all relevant data stored in on-chip memory. We propose 2DCC, a cache compression mechanism that leverages redundancy both within and across all cache data blocks, and results in a 2.12× compression factor. We then extend this insight by observing that many on-chip blocks are often similar to each other. We propose Thesaurus, a hardware- level on-line cacheline clustering mechanism to dynamically form clusters as these similar blocks appear in the data access stream. Thesaurus significantly improves the state-of-the-art cache compression ratio to 2.25×.
Next, we apply our insights to special-purpose applications. We first pro- pose Channeleon, which tackles the problem of compressing the activation maps in deep neural networks (DNNs) at inference time. Leveraging the observed similarity among activation channels, Channeleon first forms clusters of similar activation channels, and then quantizes activations within each cluster. This enables the activations to have low bit-width while incurring acceptable accuracy losses. Lastly, we propose Procrustes, a sparse DNN training accelerator that prunes weights by exploiting both software and hardware knowledge. Procrustes reduces the memory footprint of models by an order of magnitude while maintaining dense model accuracy.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2021-12-08
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0404515
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2022-05
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International