Sparse weight activation training
Raihan, Md Aamir
Abstract
Neural network training is computationally and memory intensive. Sparse training can reduce this burden, but it can affect network convergence. In this work, we propose a novel CNN training algorithm, Sparse Weight Activation Training (SWAT). SWAT (1) is more computation- and memory-efficient than conventional training, (2) learns a sparse network topology directly, and (3) can be adapted to learn either a structured or an unstructured sparse topology. SWAT is based on insights derived from an empirical sensitivity analysis of network training on six network architectures and three datasets. Empirically, we find that network convergence is robust to the elimination of small-magnitude weights during the forward pass, and of small-magnitude weights and activations during the backward pass. SWAT obtains efficiency by constraining the forward and backward passes during training, and it dynamically searches for a sparse topology. This dynamic search over the weights allows SWAT to train a wide variety of architectures, such as ResNet, VGG, DenseNet, and WideResNet, at up to 90% sparsity. SWAT demonstrates similar or better accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets compared to other pruning and sparse-learning algorithms. Moreover, SWAT reduces total computation during training by 50% to 90%, and it reduces the memory footprint of the backward pass by 23% to 50% for activations and by 50% to 90% for weights.
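To make the training constraint described above concrete, the sketch below shows one way the forward/backward sparsification could look in PyTorch: the forward pass convolves with only the top-K-magnitude weights, while the backward pass computes gradients from top-K weights and top-K activations. The helper `topk_mask` and the class `SparseConv2dFn` are hypothetical names introduced here for illustration; this is a minimal sketch of the idea under stated assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn.functional as F


def topk_mask(x, sparsity):
    # Keep only the largest-magnitude entries of x; `sparsity` is the
    # fraction dropped (e.g. 0.9 keeps the top 10% of entries).
    k = max(1, int(round(x.numel() * (1.0 - sparsity))))
    threshold = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    return (x.abs() >= threshold).to(x.dtype)


class SparseConv2dFn(torch.autograd.Function):
    # Illustrative convolution: the forward pass uses top-K weights; the
    # backward pass uses top-K weights and top-K activations.

    @staticmethod
    def forward(ctx, x, weight, sparsity):
        w_sparse = weight * topk_mask(weight, sparsity)
        # Only the sparsified activations are saved for the backward pass,
        # which is where the activation-memory savings come from.
        ctx.save_for_backward(x * topk_mask(x, sparsity), w_sparse)
        return F.conv2d(x, w_sparse, padding=1)

    @staticmethod
    def backward(ctx, grad_out):
        x_sparse, w_sparse = ctx.saved_tensors
        grad_x = torch.nn.grad.conv2d_input(
            x_sparse.shape, w_sparse, grad_out, padding=1)
        grad_w = torch.nn.grad.conv2d_weight(
            x_sparse, w_sparse.shape, grad_out, padding=1)
        return grad_x, grad_w, None


# Example: one sparse 3x3 convolution at 90% sparsity.
x = torch.randn(8, 16, 32, 32, requires_grad=True)
w = torch.randn(32, 16, 3, 3, requires_grad=True)
y = SparseConv2dFn.apply(x, w, 0.9)
y.sum().backward()
```

In this sketch the top-K selection is recomputed at every step, which is one simple way to realize the dynamic topology search mentioned in the abstract; the sparsity fraction would normally be scheduled rather than fixed.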
Item Metadata

| Title | Sparse weight activation training |
| Creator | Raihan, Md Aamir |
| Publisher | University of British Columbia |
| Date Issued | 2020 |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2021-05-31 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0390935 |
| URI | |
| Degree | |
| Program | |
| Affiliation | |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2020-11 |
| Campus | |
| Scholarly Level | Graduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |