Open Collections
UBC Theses and Dissertations
Analytically driven software/hardware co-design for accelerating tensor workloads
Ng, Christopher Cheuk Wing
Abstract
The emergence of deep learning has spurred a large body of work on deep learning accelerators. To fully realize the potential of these accelerators, the dataflow mapping must be optimized to reduce the number of memory accesses. Dataflow mapping is crucial to the performance of deep learning workloads, but mapping optimization is difficult because the search space is enormous, non-convex, and non-differentiable. As workloads grow larger, the problem becomes harder even as the choice of dataflow becomes more important.
To tackle the problem, prior work reduces the search space using empirically driven or arbitrary heuristics. However, these heuristics are either too simple, leaving the optimization process too slow, or too aggressive, pruning away optimal mappings. Prior work has also explored black-box optimizers, but reformulating the problem as input to these optimizers is not always feasible or scalable, which leads to sub-optimal or even invalid solutions.
In this thesis, we tackle the problem by first formally analyzing how the different aspects of mapping (tiling, ordering, and unrolling) algebraically affect memory reuse and performance, in order to identify sub-optimal regions of the search space. Next, we introduce new state-space representations and traversal methods that enable these regions to be pruned, dramatically reducing the search space without rejecting the best solutions. Finally, we extend these analyses and techniques to problems closely related to mapping optimization, such as memory configuration optimization.
Sunstone, our proof-of-concept implementation, reduces optimization time for some complex tensor operations by up to 10x compared to prior work, and can yield mappings with up to 1.5-2.5x lower energy-delay product (EDP).
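The sketch below shows concretely how loop ordering alone changes memory reuse. It uses a hypothetical toy model and is not Sunstone's analysis. The model assumes the following:

- The workload is a tiled matrix multiply C[M,N] += A[M,K] * B[K,N].
- Memory has two levels, and the on-chip buffer holds exactly one tile of each operand.
- Only the three outer tile loops are reordered. There is no unrolling or spatial parallelism.

The names DEPENDS and dram_traffic, the matrix and tile sizes, and the read-plus-write accounting for output tiles are all illustrative assumptions.

```python
from itertools import permutations
from math import prod

# Hypothetical toy model (not Sunstone's cost model): a tiled matrix multiply
# C[M,N] += A[M,K] * B[K,N] on a two-level memory (DRAM + an on-chip buffer
# that holds exactly one tile of each operand).

# Which outer tile loops each operand's tile index depends on.
DEPENDS = {"A": {"m", "k"}, "B": {"k", "n"}, "C": {"m", "n"}}


def dram_traffic(order, trips, tile_elems):
    """Return DRAM words moved for a given outer loop order (outermost first).

    An operand's tile is reused across every loop nested inside its innermost
    relevant loop, so it is fetched prod(trip counts up to and including that
    loop) times. Output tiles are counted as one read plus one write per visit
    (a simplification: the first read of a zero-initialized tile could be skipped).
    """
    total = 0
    for operand, deps in DEPENDS.items():
        # Position of the innermost loop that changes this operand's tile.
        innermost = max(i for i, loop in enumerate(order) if loop in deps)
        visits = prod(trips[loop] for loop in order[: innermost + 1])
        words = visits * tile_elems[operand]
        total += 2 * words if operand == "C" else words
    return total


# Illustrative problem and tile sizes (assumed, not taken from the thesis).
M = N = K = 256
tm = tn = tk = 32
trips = {"m": M // tm, "n": N // tn, "k": K // tk}
tile_elems = {"A": tm * tk, "B": tk * tn, "C": tm * tn}

# Evaluate every ordering of the three outer loops.
results = {order: dram_traffic(order, trips, tile_elems) for order in permutations("mnk")}
for order, words in sorted(results.items(), key=lambda kv: kv[1]):
    print("".join(order), words)
```

With these sizes, the two orders that place k innermost (mnk and nmk) keep each output tile on chip for the whole reduction. They move the fewest words in this model. Changing the tile sizes or the relative sizes of M, N, and K can change which order wins. This is one reason the mapping space is hard to search by hand.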
Item Metadata
Title | Analytically driven software/hardware co-design for accelerating tensor workloads
Creator | Ng, Christopher Cheuk Wing
Supervisor |
Publisher | University of British Columbia
Date Issued | 2022
Genre |
Type |
Language | eng
Date Available | 2022-10-19
Provider | Vancouver : University of British Columbia Library
Rights | Attribution 4.0 International
DOI | 10.14288/1.0421312
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2022-11
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace