UBC Theses and Dissertations
Sparse Weight Activation Training
Raihan, Md Aamir
Neural network training is computationally and memory intensive. Sparse training can reduce this burden, but it can affect network convergence. In this work, we propose a novel CNN training algorithm, Sparse Weight Activation Training (SWAT). SWAT: (1) is more computation- and memory-efficient than conventional training, (2) learns a sparse network topology directly, and (3) can be adapted to learn either a structured or an unstructured sparse topology. SWAT is developed from insights derived from an empirical sensitivity analysis of network training on six network architectures and three datasets. Empirically, we find that network convergence is robust to the elimination of small-magnitude weights during the forward pass, and of small-magnitude weights and activations during the backward pass. SWAT obtains efficiency by constraining the forward and backward passes during training, while dynamically searching for a sparse topology. This dynamic search allows SWAT to train a wide variety of architectures, such as ResNet, VGG, DenseNet, and WideResNet, at up to 90% sparsity. SWAT demonstrates similar or better accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets compared to other pruning and sparse-learning algorithms. Moreover, SWAT reduces total computation during training by 50% to 90%, and reduces the memory footprint during the backward pass by 23% to 50% for activations and by 50% to 90% for weights.
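The core mechanism the abstract describes, keeping only the largest-magnitude weights in the forward pass and the largest-magnitude weights and activations in the backward pass, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation; the helper `topk_mask`, the shapes, and the 50% sparsity level are illustrative assumptions.

```python
import numpy as np

def topk_mask(x, sparsity):
    """Keep the largest-magnitude entries of x; zero the rest.

    `sparsity` is the fraction of entries zeroed (e.g. 0.9 keeps 10%).
    This is an illustrative helper, not the thesis's implementation.
    """
    k = int(round((1.0 - sparsity) * x.size))
    if k <= 0:
        return np.zeros_like(x)
    # Threshold at the k-th largest absolute value.
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # hypothetical layer weights
a = rng.normal(size=(4,))     # hypothetical input activations

# Forward pass: small-magnitude weights are dropped.
W_sparse = topk_mask(W, sparsity=0.5)
out = W_sparse @ a

# Backward pass: both weights and activations are sparsified
# before computing the gradients of a linear layer.
a_sparse = topk_mask(a, sparsity=0.5)
grad_out = rng.normal(size=(4,))
grad_a = W_sparse.T @ grad_out           # uses sparsified weights
grad_W = np.outer(grad_out, a_sparse)    # uses sparsified activations
```

The computation saved scales with the sparsity level: at 50% weight sparsity, half of the multiply-accumulates in both `W_sparse @ a` and the gradient products can in principle be skipped.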
Attribution-NonCommercial-NoDerivatives 4.0 International