UBC Undergraduate Research

Towards a Tight Asymptotic Bound for One-Dimensional Stochastic Gradient Descent

Lu, William; Harvey, Nicholas J. A.

Abstract

Supervised machine learning algorithms work by optimizing model parameters to minimize a loss (objective) function. For many classes of models, such as robust regression and logistic regression, there is no closed-form solution for the minimizers. Therefore, iterative optimization algorithms play a key role in training machine learning models. The stochastic gradient descent (SGD) algorithm requires very few assumptions on the objective function and is therefore one of the most frequently used optimizers in practice. When training on large datasets, it is useful to know how fast an iterative optimizer converges to the solution. A common quantification of the convergence rate is the expected suboptimality of the final iterate, denoted E[f(x_T)] − f*. Prior work has established an upper bound of O(ln T / √T) and a lower bound of Ω(1/√T) on the expected suboptimality of SGD. In this thesis, we conjecture that the upper bound on the expected suboptimality of SGD can be improved to O(1/√T) when the objective function is one-dimensional. We prove this conjecture under a restricted setting. Our analysis frames this problem through the lens of stochastic processes and random walks. To investigate the limiting behaviour of biased random walks, we use techniques from generating functions and combinatorics. We make standard assumptions of convexity and Lipschitz continuity on the objective function. For simplicity, we assume a constant step size η_t = Θ(1/√T).
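The setting described above can be illustrated with a small numerical sketch. The snippet below is our own illustration, not code from the thesis: it runs one-dimensional SGD with a constant step size η = c/√T on the convex, 1-Lipschitz objective f(x) = |x| (whose minimizer is x* = 0), using a stochastic subgradient with additive Gaussian noise, and reports the empirical suboptimality of the final iterate alongside 1/√T. The function name, the specific objective, and the noise model are all assumptions made purely for illustration.

```python
import numpy as np

# Minimal sketch (illustrative assumption, not from the thesis):
# one-dimensional SGD with constant step size eta = c / sqrt(T) on
# f(x) = |x|, a convex, 1-Lipschitz objective with minimizer x* = 0,
# so the suboptimality of the final iterate is f(x_T) - f(x*) = |x_T|.

def sgd_final_iterate_suboptimality(T, x0=1.0, c=1.0, noise_std=1.0, rng=None):
    """Run T steps of SGD on f(x) = |x| and return f(x_T) - f(x*)."""
    rng = np.random.default_rng() if rng is None else rng
    eta = c / np.sqrt(T)          # constant step size, eta_t = Theta(1/sqrt(T))
    x = x0
    for _ in range(T):
        # noisy subgradient: sign(x) is a subgradient of |x|, plus zero-mean noise
        g = np.sign(x) + noise_std * rng.standard_normal()
        x -= eta * g
    return abs(x)                 # suboptimality at the final iterate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for T in (10**2, 10**3, 10**4, 10**5):
        runs = [sgd_final_iterate_suboptimality(T, rng=rng) for _ in range(200)]
        print(f"T = {T:>6}:  mean suboptimality ~ {np.mean(runs):.4f}, "
              f"1/sqrt(T) = {1/np.sqrt(T):.4f}")
```

Averaging over repeated runs gives an empirical sense of how the expected suboptimality of the final iterate scales with T; the conjectured O(1/√T) bound, if correct, would mean this quantity shrinks at the same rate as the known Ω(1/√T) lower bound in the one-dimensional case.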


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International