- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Interpolation, growth conditions, and stochastic gradient...
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Interpolation, growth conditions, and stochastic gradient descent Mishkin, Aaron
Abstract
Current machine learning practice requires solving huge-scale empirical risk minimization problems quickly and robustly. These problems are often highly under-determined and admit multiple solutions which exactly fit, or interpolate, the training data. In such cases, stochastic gradient descent has been shown to converge without decreasing step-sizes or averaging, and can achieve the fast convergence of deterministic gradient methods. Recent work has further shown that stochastic acceleration and line-search methods for step-size selection are possible in this restricted setting. Although pioneering, existing analyses for first-order methods under interpolation have two major weaknesses: they are restricted to the finite-sum setting, and, secondly, they are not tight with the best deterministic rates. To address these issues, we extend the notion of interpolation to stochastic optimization problems with general, first-order oracles. We use the proposed framework to analyze stochastic gradient descent with a fixed step-size and with an Armijo-type line-search, as well as Nesterov’s accelerated gradient method with stochastic gradients. In nearly all settings, we obtain faster convergence with a wider range of parameters. The improvement for stochastic Nesterov acceleration is comparable to dividing by the square-root of the condition number and addresses criticism that existing rates were not truly “accelerated”. In the case of convex functions, our convergence rates for stochastic gradient descent — both with and without the stochastic Armijo line-search — recover the best-known rates in the deterministic setting. We also provide a simple extension to L2-regularized minimization, which opens the path to proximal-gradient methods and non-smooth optimization under interpolation.
Item Metadata
Title |
Interpolation, growth conditions, and stochastic gradient descent
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
2020
|
Description |
Current machine learning practice requires solving huge-scale empirical risk minimization problems quickly and robustly. These problems are often highly under-determined and admit multiple solutions which exactly fit, or interpolate, the training data. In such cases, stochastic gradient descent has been shown to converge without decreasing step-sizes or averaging, and can achieve the fast convergence of deterministic gradient methods. Recent work has further shown that stochastic acceleration and line-search methods for step-size selection are possible in this restricted setting. Although pioneering, existing analyses for first-order methods under interpolation have two major weaknesses: they are restricted to the finite-sum setting, and, secondly, they are not tight with the best deterministic rates. To address these issues, we extend the notion of interpolation to stochastic optimization problems with general, first-order oracles. We use the proposed framework to analyze stochastic gradient descent with a fixed step-size and with an Armijo-type line-search, as well as Nesterov’s accelerated gradient method with stochastic gradients. In nearly all settings, we obtain faster convergence with a wider range of parameters. The improvement for stochastic Nesterov acceleration is comparable to dividing by the square-root of the condition number and addresses criticism that existing rates were not truly “accelerated”. In the case of convex functions, our convergence rates for stochastic gradient descent — both with and without the stochastic Armijo line-search — recover the best-known rates in the deterministic setting. We also provide a simple extension to L2-regularized minimization, which opens the path to proximal-gradient methods and non-smooth optimization under interpolation.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2020-09-24
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0394494
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2020-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International