UBC Theses and Dissertations
Sparse Weight Activation Training
Raihan, Md Aamir
Neural network training is computationally and memory intensive. Sparse training can reduce this burden, but it can affect network convergence. In this work, we propose a novel CNN training algorithm, Sparse Weight Activation Training (SWAT). SWAT: (1) is more computation- and memory-efficient than conventional training, (2) learns a sparse network topology directly, and (3) can be adapted to learn either a structured or an unstructured sparse topology. SWAT is developed from insights derived from an empirical sensitivity analysis of network training on six network architectures and three datasets. Empirically, we find that network convergence is robust to the elimination of small-magnitude weights during the forward pass, and of small-magnitude weights and activations during the backward pass. SWAT obtains efficiency by constraining the forward and backward passes during training, while dynamically searching for a sparse topology. This dynamic search allows SWAT to train a wide variety of architectures, such as ResNet, VGG, DenseNet, and WideResNet, at up to 90% sparsity. SWAT demonstrates similar or better accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets compared to other pruning and sparse-learning algorithms. Moreover, SWAT reduces total computation during training by 50% to 90%, and reduces the memory footprint during the backward pass by 23% to 50% for activations and by 50% to 90% for weights.
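The core mechanism the abstract describes, keeping only the largest-magnitude weights in the forward pass and the largest-magnitude weights and activations in the backward pass, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation; the helper `topk_mask`, the shapes, and the 50% sparsity level are illustrative assumptions.

```python
import numpy as np

def topk_mask(x, sparsity):
    """Keep the largest-magnitude entries of x; zero the rest.

    `sparsity` is the fraction of entries zeroed (e.g. 0.9 keeps 10%).
    This is an illustrative helper, not the thesis's implementation.
    """
    k = int(round((1.0 - sparsity) * x.size))
    if k <= 0:
        return np.zeros_like(x)
    # Threshold at the k-th largest absolute value.
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # hypothetical layer weights
a = rng.normal(size=(4,))     # hypothetical input activations

# Forward pass: small-magnitude weights are dropped.
W_sparse = topk_mask(W, sparsity=0.5)
out = W_sparse @ a

# Backward pass: both weights and activations are sparsified
# before computing the gradients of a linear layer.
a_sparse = topk_mask(a, sparsity=0.5)
grad_out = rng.normal(size=(4,))
grad_a = W_sparse.T @ grad_out           # uses sparsified weights
grad_W = np.outer(grad_out, a_sparse)    # uses sparsified activations
```

The computation saved scales with the sparsity level: at 50% weight sparsity, half of the multiply-accumulates in both `W_sparse @ a` and the gradient products can in principle be skipped.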
Attribution-NonCommercial-NoDerivatives 4.0 International