Interpolation, growth conditions, and stochastic gradient descent

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Interpolation, growth conditions, and stochastic gradient descent Mishkin, Aaron

Abstract

Current machine learning practice requires solving huge-scale empirical risk minimization problems quickly and robustly. These problems are often highly under-determined and admit multiple solutions which exactly fit, or interpolate, the training data. In such cases, stochastic gradient descent has been shown to converge without decreasing step-sizes or averaging, and can achieve the fast convergence of deterministic gradient methods. Recent work has further shown that stochastic acceleration and line-search methods for step-size selection are possible in this restricted setting. Although pioneering, existing analyses for first-order methods under interpolation have two major weaknesses: they are restricted to the finite-sum setting, and, secondly, they are not tight with the best deterministic rates. To address these issues, we extend the notion of interpolation to stochastic optimization problems with general, first-order oracles. We use the proposed framework to analyze stochastic gradient descent with a fixed step-size and with an Armijo-type line-search, as well as Nesterov’s accelerated gradient method with stochastic gradients. In nearly all settings, we obtain faster convergence with a wider range of parameters. The improvement for stochastic Nesterov acceleration is comparable to dividing by the square-root of the condition number and addresses criticism that existing rates were not truly “accelerated”. In the case of convex functions, our convergence rates for stochastic gradient descent — both with and without the stochastic Armijo line-search — recover the best-known rates in the deterministic setting. We also provide a simple extension to L2-regularized minimization, which opens the path to proximal-gradient methods and non-smooth optimization under interpolation.

Item Metadata

Title	Interpolation, growth conditions, and stochastic gradient descent
Creator	Mishkin, Aaron
Publisher	University of British Columbia
Date Issued	2020
Description	Current machine learning practice requires solving huge-scale empirical risk minimization problems quickly and robustly. These problems are often highly under-determined and admit multiple solutions which exactly fit, or interpolate, the training data. In such cases, stochastic gradient descent has been shown to converge without decreasing step-sizes or averaging, and can achieve the fast convergence of deterministic gradient methods. Recent work has further shown that stochastic acceleration and line-search methods for step-size selection are possible in this restricted setting. Although pioneering, existing analyses for first-order methods under interpolation have two major weaknesses: they are restricted to the finite-sum setting, and, secondly, they are not tight with the best deterministic rates. To address these issues, we extend the notion of interpolation to stochastic optimization problems with general, first-order oracles. We use the proposed framework to analyze stochastic gradient descent with a fixed step-size and with an Armijo-type line-search, as well as Nesterov’s accelerated gradient method with stochastic gradients. In nearly all settings, we obtain faster convergence with a wider range of parameters. The improvement for stochastic Nesterov acceleration is comparable to dividing by the square-root of the condition number and addresses criticism that existing rates were not truly “accelerated”. In the case of convex functions, our convergence rates for stochastic gradient descent — both with and without the stochastic Armijo line-search — recover the best-known rates in the deterministic setting. We also provide a simple extension to L2-regularized minimization, which opens the path to proximal-gradient methods and non-smooth optimization under interpolation.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-09-24
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0394494
URI	http://hdl.handle.net/2429/76150
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2020-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Interpolation, growth conditions, and stochastic gradient descent Mishkin, Aaron

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights