UBC Theses and Dissertations

Efficient in-hardware compression of on-chip data

Ghasemazar, Amin

Abstract

The past decade has seen tremendous growth in how much data is collected and stored, and consequently in the sizes of application working sets. On-chip memory capacities, however, have not kept up: average CPU last-level cache capacity per core (thread) has stagnated at 1 MB. Similar trends exist in special-purpose computing systems, with only up to tens of megabytes of on-chip memory available in most recent AI accelerators. In this dissertation, we explore hardware-friendly online data compression techniques to gain the performance benefits of larger on-chip memories without paying the costs of larger silicon. We propose several solutions, including two methods to compress general workloads in CPU caches and two methods to compress AI workloads in special-purpose computing systems. In order to efficiently compress on-chip data, the compression mechanisms need to leverage all relevant data stored in on-chip memory. We propose 2DCC, a cache compression mechanism that leverages redundancy both within and across all cache data blocks, and results in a 2.12× compression factor. We then extend this insight by observing that many on-chip blocks are often similar to each other. We propose Thesaurus, a hardware-level online cacheline clustering mechanism that dynamically forms clusters as these similar blocks appear in the data access stream. Thesaurus significantly improves the state-of-the-art cache compression ratio to 2.25×. Next, we apply our insights to special-purpose applications. We first propose Channeleon, which tackles the problem of compressing the activation maps in deep neural networks (DNNs) at inference time. Leveraging the observed similarity among activation channels, Channeleon first forms clusters of similar activation channels, and then quantizes activations within each cluster. This enables the activations to have low bit-width while incurring acceptable accuracy losses. Lastly, we propose Procrustes, a sparse DNN training accelerator that prunes weights by exploiting both software and hardware knowledge. Procrustes reduces the memory footprint of models by an order of magnitude while maintaining dense model accuracy.
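To make the cluster-then-quantize idea behind Channeleon concrete, the sketch below groups activation channels and applies a separate low-bit uniform quantizer to each group. This is an illustrative approximation only: the grouping criterion (mean absolute value per channel), the function name cluster_then_quantize, and all parameter choices are assumptions made for this example, not the mechanism specified in the dissertation.

import numpy as np

def cluster_then_quantize(acts, num_clusters=4, bits=4):
    # acts: activation tensor of shape (channels, H, W).
    # Group channels by mean absolute value (an illustrative stand-in for the
    # similarity-based clustering described in the abstract), then apply
    # uniform quantization with a separate scale per cluster.
    c = acts.shape[0]
    stats = np.abs(acts).reshape(c, -1).mean(axis=1)
    order = np.argsort(stats)
    cluster_of = np.empty(c, dtype=int)
    cluster_of[order] = np.arange(c) * num_clusters // c
    qmax = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. -7..7 for 4 bits
    quantized = np.empty_like(acts)
    for k in range(num_clusters):
        members = np.where(cluster_of == k)[0]
        if members.size == 0:
            continue
        scale = np.abs(acts[members]).max() / qmax
        q = np.clip(np.round(acts[members] / max(scale, 1e-12)), -qmax, qmax)
        quantized[members] = q * scale  # dequantized values representable at low bit-width
    return quantized, cluster_of

# Example usage on random data (placeholder shapes, not results from the dissertation):
acts = np.random.randn(64, 8, 8).astype(np.float32)
q_acts, clusters = cluster_then_quantize(acts)

In a hardware realization, each cluster's scale would be kept as a small piece of metadata alongside the quantized activations; sharing one scale across similar channels is one plausible way the low bit-width described in the abstract can be maintained with limited accuracy loss.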

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International