UBC Theses and Dissertations

Analytically driven software/hardware co-design for accelerating tensor workloads

Ng, Christopher Cheuk Wing

Abstract

The emergence of deep learning has spurred many works on deep learning accelerators. To fully realize the potential of these accelerators, dataflow mapping must be optimized to reduce the number of memory accesses. Dataflow mapping is crucial to the performance of deep learning workloads, but mapping optimization is a difficult problem due to the enormous, non-convex, and non-differentiable search space. As workloads grow, the problem becomes harder while the importance of dataflow increases. To tackle the problem, prior work reduces the search space using empirically driven or arbitrary heuristics. However, these heuristics are either too simple, leaving the optimization process too slow, or too aggressive, removing optimal mappings. Prior work has also explored black-box optimizers, but reformulating the problem into the input of these optimizers is not always feasible or scalable, leading to sub-optimal or even invalid solutions. In this thesis, we tackle the problem by first formally analyzing how the different aspects of mapping (tiling, ordering, unrolling) algebraically affect memory reuse and performance, in order to identify sub-optimal spaces. Next, we introduce new state-space representations and traversal methods that enable pruning of these spaces, dramatically reducing the search space without rejecting the best solutions. Finally, we extend these analyses and techniques to problems closely related to mapping optimization, such as memory configuration optimization. Sunstone, our proof-of-concept implementation, improves optimization time for some complex tensor operations by up to 10x compared to prior work, and can yield mappings with up to 1.5-2.5x lower energy-delay product (EDP).
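To make the tiling aspect of the mapping search space concrete, the following is a minimal sketch (not from the thesis) of a brute-force tiling search for a matrix multiplication. It uses a deliberately simplified, output-stationary cost model in which each A tile is re-fetched once per column-tile of B and vice versa; the function names, cost model, and buffer-capacity constraint are illustrative assumptions, not Sunstone's actual formulation.

```python
from itertools import product

def divisors(n):
    """All positive divisors of n (candidate tile sizes)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def matmul_dram_traffic(M, K, N, tm, tk, tn):
    # Simplified output-stationary reuse model (an assumption for
    # illustration): A is reloaded once per B column-tile, B once per
    # A row-tile, and each output element is written exactly once.
    a_traffic = M * K * (N // tn)
    b_traffic = K * N * (M // tm)
    c_traffic = M * N
    return a_traffic + b_traffic + c_traffic

def best_tiling(M, K, N, buffer_capacity):
    """Exhaustively search valid tilings; returns (traffic, (tm, tk, tn))."""
    best = None
    for tm, tk, tn in product(divisors(M), divisors(K), divisors(N)):
        # Tiles of A, B, and the output must fit on-chip together.
        if tm * tk + tk * tn + tm * tn > buffer_capacity:
            continue
        traffic = matmul_dram_traffic(M, K, N, tm, tk, tn)
        if best is None or traffic < best[0]:
            best = (traffic, (tm, tk, tn))
    return best

print(best_tiling(64, 64, 64, 2048))
```

Even for one small operator this loop visits every divisor triple, and the cost surface is non-convex (doubling a tile size can raise or lower traffic depending on the capacity constraint), which is why the thesis's analytical pruning of provably sub-optimal subspaces matters for realistic workloads.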

Rights

Attribution 4.0 International