UBC Faculty Research and Publications

Convergence rate of stochastic gradient with constant step size Schmidt, Mark Sep 5, 2014

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
52383-stochasticGradientConstant.pdf [ 206.14kB ]
Metadata
JSON: 52383-1.0050992.json
JSON-LD: 52383-1.0050992-ld.json
RDF/XML (Pretty): 52383-1.0050992-rdf.xml
RDF/JSON: 52383-1.0050992-rdf.json
Turtle: 52383-1.0050992-turtle.txt
N-Triples: 52383-1.0050992-rdf-ntriples.txt
Original Record: 52383-1.0050992-source.json
Full Text
52383-1.0050992-fulltext.txt
Citation
52383-1.0050992.ris

Full Text

Convergence Rate of Stochastic Gradient with Constant Step SizeMark SchmidtUniversity of British ColumbiaSeptember 5, 2014AbstractWe show that the basic stochastic gradient method applied to a strongly-convex differentiablefunction with a constant step-size achieves a linear convergence rate (in function value and iterates)up to a constant proportional the step-size (under standard assumptions on the gradient).1 Overview and AssumptionsWe want to minimize f(x) = E[fi(x)], where the expectation is taken with respect to i. The mostcommon case is minimizing a finite sum,minx∈Rd1nn∑i=1fi(x), (1.1)as in problems like least squares and logistic regression. With use the iterationxk+1 = xk − αf ′ik (xk),where ik is sampled uniformly (and step-size α is the step-size). We will assume that f ′ is L-Lipschitz,f is µ-strongly convex, ‖f ′i(x)‖ ≤ C for all x and i, that the minimizer is x∗, and 0 < α < 1/2µ. Wewill show thatE[f(xk)− f(x∗)] ≤ (1− 2αµ)k(f(x0)− f(x∗)) +O(α),E[∥∥xk − x∗∥∥2] ≤ (1− 2αµ)k∥∥x0 − x∗∥∥2 +O(α),meaning that the function values and iterates converge linearly up to some error level proportional toα. For the special case of (1.1), Proposition 3.4 in the paper of Nedic and Bertsekas (‘ConvergenceRates of Incremental Subgradient Algorithms’, 2000) gives a similar argument/result but here wealso consider the function value and we work with the expectation to get rid of the dependence onn.2 Useful inequalititesBy L-Lipschitz of f ′, for all x and y we havef(y) ≤ f(x) + f ′(x)T (y − x) +L2‖y − x‖2.By µ-strong-convexity of f , for all x and y we havef(y) ≥ f(x) + f ′(x)T (y − x) +µ2‖y − x‖2.Minimizing both sides in terms of y, by setting y = x− 1µf′(x) on the right hand side and using thedefinition of x∗ on the left hand side,f(x∗) ≥ f(x)−1µ∥∥f ′(x)∥∥2 +12µ∥∥f ′(x)∥∥2 = f(x)−12µ∥∥f ′(x)∥∥2.Also by strong-convexity,f ′(x)T (x− x∗) = (f ′(x)− f ′(x∗))T (x− x∗) ≥ µ‖x− x∗‖2.1By definition of ik and f ,E[f ′ik (xk)] = f ′(xk).Recall the limit of the geometric series,∞∑i=0ri =11− r, for |r| < 1.3 Function Valuef(xk+1) ≤ f(xk) + f ′(xk)T (xk+1 − xk) +L2∥∥xk+1 − xk∥∥2 (x = xk, y = xk+1 in L-Lipshitz inequality)= f(xk)− αf ′(xk)T fik (xk) +Lα22∥∥f ′ik (xk)∥∥2 (eliminate (xk+1 − xk) using definition of xk+1)≤ f(xk)− αf ′(xk)T fik (xk) +Lα2C22. (use∥∥f ′i(xk)∥∥ ≤ C)E[f(xk+1)− f(x∗)] ≤ f(xk)− f(x∗)− αf ′(xk)E[fik (xk)] +Lα2C22(take expectation WRT ik, subtract f(x∗))≤ f(xk)− f(x∗)− α∥∥f ′(xk)∥∥2 +Lα2C22(use E[f ′ik (xk)] = f ′(xk)))≤ f(xk)− f(x∗)− 2αµ(f(xk)− f(x∗)) +Lα2C22(use12µ∥∥f ′(xk)∥∥2 ≥ f(xk)− f(x∗))= (1− 2αµ)(f(xk)− f(x∗)) +Lα2C22.E[f(xk)− f(x∗)] ≤ (1− 2αµ)k(f(x0)− f(x∗)) +k∑i=0(1− 2αµ)iLα2C22(apply recursively, take total expectation)≤ (1− 2αµ)k(f(x0)− f(x∗)) +∞∑i=0(1− 2αµ)iLα2C22(extra terms are positive because α < 1/2µ)= (1− 2αµ)k(f(x0)− f(x∗)) +LαC24µ. (use that∞∑i=0(1− 2αµ)i = 1/2αµ)4 Iterates∥∥xk+1 − x∗∥∥2 =∥∥(xk − αf ′ik (xk))− x∗∥∥2 (definition of xk+1)=∥∥xk − x∗∥∥2 − 2αf ′ik (xk)T (x− x∗) + α2∥∥f ′ik (xk)∥∥2 (group (xk − x∗), expand)≤∥∥xk − x∗∥∥2 − 2αf ′ik (xk)T (xk − x∗) + α2C2. (use∥∥f ′i(xk)∥∥ ≤ C)E[∥∥xk+1 − x∗∥∥2] ≤∥∥xk − x∗∥∥2 − 2αf ′(xk)T (xk − x∗) + α2C2 (take expectation WRT ik)≤ ‖x− x∗‖2 − 2αµ‖x− x∗‖+ α2C2 (use f ′(x)T (x− x∗) ≥ µ‖x− x∗‖2 )= (1− 2αµ)∥∥xk − x∗∥∥2 + α2C2E[∥∥xk − x∗∥∥2] ≤ (1− 2αµ)k∥∥x0 − x∗∥∥2 +k∑i=0(1− 2αµ)iα2C2 (apply recursively, take total expectation)≤ (1− 2αµ)k∥∥x0 − x∗∥∥2 +αC22µ. (as before, use thatk∑i=0(1− 2αµ)i ≤ 1/2αµ).2

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.52383.1-0050992/manifest

Comment

Related Items