UBC Undergraduate Research

Towards a Tight Asymptotic Bound for One-Dimensional Stochastic Gradient Descent

Lu, William; Harvey, Nicholas J. A.

Abstract

Supervised machine learning algorithms work by optimizing model parameters to minimize a loss (objective) function. For many classes of models, such as robust regression and logistic regression, there is no closed-form solution for the minimizers. Therefore, iterative optimization algorithms play a key role in training machine learning models. The stochastic gradient descent (SGD) algorithm requires very few assumptions on the objective function and is therefore one of the most frequently used optimizers in practice. When training on large datasets, it is useful to know how fast an iterative optimizer converges to the solution. A common quantification of the convergence rate is the expected suboptimality of the final iterate, denoted E[f(x_T)] − f*. Prior work has established an upper bound of O(ln T / √T) and a lower bound of Ω(1/√T) on the expected suboptimality of SGD. In this thesis, we conjecture that the upper bound on the expected suboptimality of SGD can be improved to O(1/√T) when the objective function is one-dimensional. We prove this conjecture under a restricted setting. Our analysis frames this problem through the lens of stochastic processes and random walks. To investigate the limiting behaviour of biased random walks, we use techniques from generating functions and combinatorics. We make standard assumptions of convexity and Lipschitz continuity on the objective function. For simplicity, we assume a constant step size η_t = Θ(1/√T).
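The setting described above can be illustrated with a small numerical sketch. The snippet below is our own illustration, not code from the thesis: it runs one-dimensional SGD with a constant step size η = c/√T on the convex, 1-Lipschitz objective f(x) = |x| (whose minimizer is x* = 0), using a stochastic subgradient with additive Gaussian noise, and reports the empirical suboptimality of the final iterate alongside 1/√T. The function name, the specific objective, and the noise model are all assumptions made purely for illustration.

```python
import numpy as np

# Minimal sketch (illustrative assumption, not from the thesis):
# one-dimensional SGD with constant step size eta = c / sqrt(T) on
# f(x) = |x|, a convex, 1-Lipschitz objective with minimizer x* = 0,
# so the suboptimality of the final iterate is f(x_T) - f(x*) = |x_T|.

def sgd_final_iterate_suboptimality(T, x0=1.0, c=1.0, noise_std=1.0, rng=None):
    """Run T steps of SGD on f(x) = |x| and return f(x_T) - f(x*)."""
    rng = np.random.default_rng() if rng is None else rng
    eta = c / np.sqrt(T)          # constant step size, eta_t = Theta(1/sqrt(T))
    x = x0
    for _ in range(T):
        # noisy subgradient: sign(x) is a subgradient of |x|, plus zero-mean noise
        g = np.sign(x) + noise_std * rng.standard_normal()
        x -= eta * g
    return abs(x)                 # suboptimality at the final iterate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for T in (10**2, 10**3, 10**4, 10**5):
        runs = [sgd_final_iterate_suboptimality(T, rng=rng) for _ in range(200)]
        print(f"T = {T:>6}:  mean suboptimality ~ {np.mean(runs):.4f}, "
              f"1/sqrt(T) = {1/np.sqrt(T):.4f}")
```

Averaging over repeated runs gives an empirical sense of how the expected suboptimality of the final iterate scales with T; the conjectured O(1/√T) bound, if correct, would mean this quantity shrinks at the same rate as the known Ω(1/√T) lower bound in the one-dimensional case.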


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International