{"@context":{"@language":"en","Affiliation":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","AggregatedSourceRepository":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","Campus":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","Creator":"http:\/\/purl.org\/dc\/terms\/creator","DateAvailable":"http:\/\/purl.org\/dc\/terms\/issued","DateIssued":"http:\/\/purl.org\/dc\/terms\/issued","Degree":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","DegreeGrantor":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","Description":"http:\/\/purl.org\/dc\/terms\/description","DigitalResourceOriginalRecord":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","FullText":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","Genre":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","GraduationDate":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","IsShownAt":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","Language":"http:\/\/purl.org\/dc\/terms\/language","Program":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","Provider":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","Publisher":"http:\/\/purl.org\/dc\/terms\/publisher","Rights":"http:\/\/purl.org\/dc\/terms\/rights","RightsURI":"https:\/\/open.library.ubc.ca\/terms#rightsURI","ScholarlyLevel":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","Title":"http:\/\/purl.org\/dc\/terms\/title","Type":"http:\/\/purl.org\/dc\/terms\/type","URI":"https:\/\/open.library.ubc.ca\/terms#identifierURI","SortDate":"http:\/\/purl.org\/dc\/terms\/date"},"Affiliation":[{"@value":"Science, Faculty of","@language":"en"},{"@value":"Mathematics, Department of","@language":"en"}],"AggregatedSourceRepository":[{"@value":"DSpace","@language":"en"}],"Campus":[{"@value":"UBCV","@language":"en"}],"Creator":[{"@value":"Bao, Anyi","@language":"en"}],"DateAvailable":[{"@value":"2019-04-12T17:56:53Z","@language":"en"}],"DateIssued":[{"@value":"2019","@language":"en"}],"Degree":[{"@value":"Master of 
Science - MSc","@language":"en"}],"DegreeGrantor":[{"@value":"University of British Columbia","@language":"en"}],"Description":[{"@value":"This thesis studies a widely used solver SPGL1, which applies a general root-finding process to solve the basis pursuit denoising problem. This process involves a nested loop. The outer loop is an inexact Newton root-finding process, and the inner loop approximately solves LASSO (least absolute shrinkage and selection operator). We propose an accelerated dual method to accelerate the inner loop by optimizing the dual problem of LASSO on a low-dimensional space. Experimental results show that our accelerated method can successfully reduce the total iteration count. Our future work is to reduce the total running time of our accelerated method.","@language":"en"}],"DigitalResourceOriginalRecord":[{"@value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/69644?expand=metadata","@language":"en"}],"FullText":[{"@value":"An accelerated dual method for SPGL1, by Anyi Bao, B.Sc., Simon Fraser University, 2017. A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Mathematics), THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver), April 2019. \u00a9 Anyi Bao, 2019. The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: An accelerated dual method for SPGL1, submitted by Anyi Bao in partial fulfillment of the requirements for the degree of Master of Science in Mathematics. Examining Committee: Michael P. Friedlander, Computer Science and Mathematics, Supervisor; Yifan Sun, Computer Science, Supervisory Committee Member. Additional Supervisory Committee Members: Ewout van den Berg, IBM T.J. 
Watson Research Center, Supervisory Committee Member. Abstract. This thesis studies a well-known solver SPGL1, which applies a general root-finding process to solve the basis pursuit denoising problem. This process involves a nested loop. The outer loop is an inexact Newton root-finding process, and the inner loop approximately solves LASSO (least absolute shrinkage and selection operator). We propose an accelerated dual method to accelerate the inner loop by optimizing the dual problem of LASSO on a low-dimensional space. Experimental results show that our accelerated method can successfully reduce the total iteration count. Our future work is to reduce the total running time of our accelerated method. Lay Summary. This thesis focuses on a widely used solver called SPGL1, which is used for large-scale sparse reconstruction. This solver contains a nested loop. We propose an accelerated dual method to increase the speed of convergence of the inner loop, and thus reduce the overall computational complexity of the solver. Preface. This thesis contains my research during my M.Sc. studies at the University of British Columbia. The research was conducted under the supervision of Prof. Michael P. Friedlander from the University of British Columbia and Dr. Ewout van den Berg from the IBM T.J. Watson Research Center. The accelerated dual method proposed in Chapter 4 came from discussions with Prof. Friedlander and Dr. van den Berg. The algorithm implementation was done on my own. Table of Contents: Abstract; Lay Summary; Preface; Table of Contents; List of Tables; List of Figures; Acknowledgements; 1 Introduction (1.1 Related work; 1.2 Thesis overview and contributions; 1.3 Notation and key definitions); 2 Level-set methods for convex optimization (2.1 The level-set approach; 2.1.1 Warm-starts; 2.2 Root-finding procedure; 2.2.1 Inexact Newton method; 2.2.2 Oracle; 2.2.3 Feasibility of the solution); 3 Spectral Projected Gradient for L1 minimization (SPGL1) (3.1 The dual problem of LASSO; 3.2 Relation between value function and inexact Newton oracle; 3.3 SPGL1 for LASSO algorithm; 3.4 Discussion); 4 An accelerated dual method (4.1 Construction of matrix B; 4.1.1 Analysis of the general matrix B; 4.1.2 Analysis of orthogonal matrix Q; 4.2 Different solvers for dual problem; 4.2.1 PDCO; 4.2.2 Gurobi; 4.3 Measure quantity; 4.4 Interpretation; 4.5 SPGL1 with an accelerated method); 5 Experimental results (5.1 Test problems generation; 5.2 Numerical experiments and discussion; 5.3 Future directions); Bibliography. List of Tables: 2.1 Some typical applications of the level-set method [1]. 5.1 Experimental results. Note that in the second problem, the dotted line in PDCO means that it is unable to solve the problem within the maximum iteration limit. List of Figures: 2.1 An illustration of the goal of this proof. 2.2 Plots in the first row show the first iteration of the inexact Newton method (on the left) and the classical Newton\u2019s method (on the right). The inexact method uses approximate function values while the classical method computes the true ones. 3.1 This plot illustrates the first two iterations of the oracle SPGL1 for LASSO. At each iteration i, we have an upper bound obtained from the primal value and a lower bound obtained from the dual value. 3.2 Conceptual map of the algorithm SPGL1 for LASSO presented in the previous section. 3.3 Top: primal and dual values of LASSO computed by SPGL1 for LASSO at each iteration. We observe that the dual value does not monotonically decrease. Bottom: the duality gap at each iteration; this value does not decrease monotonically either. Acknowledgements. It is a pleasure to acknowledge the support I have gained during my master\u2019s studies. First, I would like to send my great thanks to my supervisor Michael P. Friedlander. It has been a pleasure and a privilege to be his student. His energy and enthusiasm are infectious. My research has greatly benefited from insightful discussion and collaboration with Ewout van den Berg. 
I would also like to thank Yifan Sun for fruitful discussions and constant encouragement. They have helped me a lot in my studies. The conversations with them have shaped and sharpened my thinking about optimization. My group members Halyun Jeong, Liran Li, Huang Fang, and Zhenan Fan have been of much help and shown me great kindness throughout my graduate studies. Finally, I would like to thank my family for continuous support. Especially my boyfriend Chen-Chih Lai, who offered steadfast and everlasting companionship and support throughout my years at UBC. Without their encouragement I could not have completed this quest. Chapter 1: Introduction. The main goal of the basis pursuit problem is to find a sparse solution of an underdetermined system of equations Ax = b, where A \u2208 Rm\u00d7n is a sensing matrix with m \u226a n, and b \u2208 Rm is called a measurement vector. The goal is to find a sparse signal x \u2208 Rn that picks out certain columns of A to fit the measurement b. Formally, we have the following convex optimization problem: (BP) min_{x\u2208Rn} \u2016x\u20161 subject to Ax = b. In practice, we often encounter noisy measurements. Assume that the measurement b can be decomposed into clean data and noise, i.e., b = s + \u03c3oz, where s represents clean data, z is a standard white Gaussian noise vector, and \u03c3o > 0 indicates the noise level. Furthermore, if the clean data s can be written as s = Ax, then we have \u03c3oz = b \u2212 Ax. Say that we have control of the total noise in the system, i.e., \u2016\u03c3oz\u20162 \u2264 \u03c3; then we arrive at the following basis pursuit denoising (BPDN) formulation: (BP\u03c3) min_{x\u2208Rn} \u2016x\u20161 subject to \u2016Ax \u2212 b\u20162 \u2264 \u03c3. This can be interpreted as minimizing a regularizer subject to a prescribed level of data fit. 
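The measurement model above can be made concrete with a small NumPy sketch (an illustrative toy instance; the sizes, random seed, and noise level are arbitrary choices for demonstration, not values from the thesis):

```python
import numpy as np

# Sketch of the (BP_sigma) measurement model: b = A x + sigma_o * z,
# with A an m-by-n sensing matrix, m << n, and x sparse.
rng = np.random.default_rng(0)
m, n, k = 20, 100, 3                      # illustrative sizes
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
sigma_o = 0.01
z = rng.standard_normal(m)                # white Gaussian noise
b = A @ x_true + sigma_o * z

# With sigma at least ||sigma_o z||_2, the true sparse signal is
# feasible for (BP_sigma).
sigma = np.linalg.norm(sigma_o * z)
residual = np.linalg.norm(A @ x_true - b)
assert residual <= sigma + 1e-12
```

Any \u03c3 at least as large as the realized noise norm keeps the true signal feasible, which is why (BP\u03c3) is a natural choice when an estimate of the noise level is available.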
The theme of this thesis is that users have an a priori estimate of the noise level inherent in the measurements, and thus (BP\u03c3) is a natural formulation in this setting. With the help of the Lagrange multiplier, we can convert the constrained problem (BP\u03c3) into an unconstrained problem. This yields another variant of the BPDN problem: the penalized least-squares problem (QP\u03bb) min_{x\u2208Rn} \u2016Ax \u2212 b\u20162 + \u03bb\u2016x\u20161. This is also the original formulation of the basis pursuit denoising problem that was proposed by Chen, Donoho, and Saunders [9]. From a modeling perspective, this formulation establishes the tradeoff between reconstruction accuracy and signal sparsity using a positive penalty parameter \u03bb. A third formulation is the least absolute shrinkage and selection operator (LASSO) [24]: (LS\u03c4 ) min_{x\u2208Rn} \u2016Ax \u2212 b\u20162 subject to \u2016x\u20161 \u2264 \u03c4. For appropriate choices of \u03c4, \u03c3, and \u03bb, the solutions to these three problems are identical, and the three formulations are in this sense equivalent. However, except for special cases, such as A orthogonal, we often do not have a priori knowledge of the parameters that make these problems equivalent [25]. 1.1 Related work. There are several optimization algorithms to solve (BP\u03c3), (QP\u03bb), and (LS\u03c4 ). This section briefly discusses some algorithms for each formulation. We first present algorithms to solve (QP\u03bb). Figueiredo et al. [13] propose the GPSR algorithm to solve (QP\u03bb). This method reformulates the problem as a bound-constrained quadratic problem, and then uses the gradient-projection method with Barzilai-Borwein (BB) steps to solve it. In the same year, Kim et al. [20] present an interior-point method for the penalized least-squares problem. Hale, Yin and Zhang [11, 16] propose the fixed point continuation algorithm (FPC). This method is based on operator splitting. 
It first splits the optimality condition into the sum of two maximal monotone operators: T1, the gradient of the least-squares loss, and T2, the soft-thresholding operator. It then expresses the optimal solution in terms of these two operators, which leads to the forward-backward splitting algorithm for finding a zero of T1 + T2. Wen et al. [27, 28] improve the performance of FPC by adding an active-set step (FPC AS). Next, we present several algorithms to solve the formulation (BP\u03c3). Becker, Bobin and Candes [4, 5] proposed NESTA, which is based on Nesterov\u2019s smoothing technique and an accelerated first-order algorithm [21]. The level-set framework was first used by van den Berg and Friedlander [25] in 2008 for solving BPDN using a sequence of LASSO subproblems, and the theory was implemented in the software package called SPGL1. The big success of SPGL1 motivated them to extend this framework to a more general problem class: gauge optimization [26]. The theory of level-set methods was formalized and applied to general convex programming by Aravkin et al. in 2016 [1]. We give a summary of this method in Chapter 2. 1.2 Thesis overview and contributions. The main contribution of this thesis is to develop a new accelerated method for solving LASSO in SPGL1. In the original method, LASSO is solved iteratively until the duality gap converges. To achieve that, at each iteration, the dual solutions are computed directly from the primal solutions. Our new method instead optimizes the dual problem separately on a reduced space, and this reduced space is constructed based on the primal feasible points. Experimental results show this accelerated method can successfully reduce the number of iterations needed to solve LASSO. The rest of the thesis is structured as follows. In Section 1.3, we give some convex analysis background needed to understand our algorithms and theorems. 
Chapter 2 provides a description of the intuition behind the theme of connecting (BP\u03c3) and (LS\u03c4 ). It gives a general theory called the level-set method for exchanging the roles of the objective and constraints, and instead approximately solving a sequence of level-set problems. In Chapter 3, we go back to look at level-set techniques in the context of the BPDN problem. We describe the SPGL1 algorithm in detail, as well as the challenges of the current algorithm. Chapter 4 introduces and gives details on a new variant of SPGL1. In Chapter 5, we show a series of experimental results demonstrating that our new method is numerically comparable with the original SPGL1 in solving LASSO, and we outline some directions for future research. 1.3 Notation and key definitions. Before we go into the main content, this section provides notation and the definitions of terminology that we use throughout the thesis. We use \u2016x\u2016p = (\u2211 |xi|p)1\/p to denote the p-norm of a vector x. Unless otherwise specified, we adopt the default norm \u2016 \u00b7 \u2016 = \u2016 \u00b7 \u20162. We use the symbol e to represent a vector of all ones. Now we review some fundamental concepts in convex analysis. A set C is convex if the line segment between any two points in C is also in C. That is, for any x, y \u2208 C and \u03b8 \u2208 [0, 1], we have \u03b8x + (1 \u2212 \u03b8)y \u2208 C. We consider all functions to be on the extended real line R \u222a {\u221e}. The domain of a function f is dom f = {x | f(x) < \u221e}. The indicator function \u03b4 on a convex set C is defined as \u03b4C(x) = 0 if x \u2208 C, and +\u221e otherwise. We call a function f convex if and only if its epigraph, which is the set epi f = {(x, r) \u2208 Rn+1 | f(x) \u2264 r} \u2286 Rn+1, is convex. Chapter 2: Level-set methods for convex optimization. How is (LS\u03c4 ) related to (BP\u03c3)? 
It is not hard to observe that for proper choices of \u03c4 and \u03c3, one can be \u201ctransformed\u201d into the other by simply swapping the objective function and the constraints. In practice, we often encounter the case where we have to minimize an objective function subject to a much more complicated and difficult constraint function. To remedy this problem, Aravkin, Burke, Drusvyatskiy, Friedlander, and Roy [1] propose a level-set method that exchanges the objective and the difficult constraint, and then uses the easier flipped problem to solve the original. We have the following pair of problems in the general setting: (P\u03c3) minimize \u03c6(x) over x \u2208 X subject to \u03c1(Ax \u2212 b) \u2264 \u03c3, and (Q\u03c4 ) minimize \u03c1(Ax \u2212 b) over x \u2208 X subject to \u03c6(x) \u2264 \u03c4, where X is a closed convex set, and the functions \u03c6, \u03c1 are closed and convex. Such pairs of problems are very common in contemporary optimization. One example is the pair (BP\u03c3) and (LS\u03c4 ). In Table 2.1 (borrowed from [1]), we list some other well-known applications, which include gauge optimization, matrix completion, and robust elastic-net regularization. 2.1 The level-set approach. The level-set approach states that the solution of (P\u03c3) can be found by executing a root-finding procedure on the nonlinear equation v(\u03c4) = \u03c3, (2.1) where v(\u03c4) is the value function of (Q\u03c4 ). Table 2.1: Some typical applications of the level-set method [1]. BPDN: (P\u03c3) min_x \u2016x\u20161 s.t. \u2016Ax \u2212 b\u20162 \u2264 \u03c3; (Q\u03c4 ) min_x \u2016Ax \u2212 b\u20162 s.t. \u2016x\u20161 \u2264 \u03c4. Gauge optimization (\u03ba, \u03b3: gauge functions): (P\u03c3) min_x \u03ba(x) s.t. \u03b3(Ax \u2212 b) \u2264 \u03c3; (Q\u03c4 ) min_x \u03b3(Ax \u2212 b) s.t. \u03ba(x) \u2264 \u03c4. Matrix completion: (P\u03c3) min_X \u2016X\u2016\u2217 s.t. \u2016AX \u2212 b\u20162 \u2264 \u03c3; (Q\u03c4 ) min_X \u2016AX \u2212 b\u20162 s.t. \u2016X\u2016\u2217 \u2264 \u03c4. Robust elastic-net: (P\u03c3) min_x \u03b1\u2016x\u20161 + \u03b2\u2016x\u20162 s.t. \u2016Ax \u2212 b\u20162 \u2264 \u03c3; (Q\u03c4 ) min_x \u2016Ax \u2212 b\u20162 s.t. \u03b1\u2016x\u20161 + \u03b2\u2016x\u20162 \u2264 \u03c4. This v(\u03c4) is a univariate function defined by v(\u03c4) := min_{x\u2208X} {\u03c1(Ax \u2212 b) | \u03c6(x) \u2264 \u03c4}. It is called the value function of (Q\u03c4 ) since it gives the optimal value of (Q\u03c4 ) for a given parameter \u03c4. In order to understand the level-set approach, we first study some important properties of this value function. Proposition 2.1.1. Given a convex set X and two closed convex functions \u03c1 and \u03c6, the univariate function defined by v(\u03c4) = min_{x\u2208X} {\u03c1(Ax \u2212 b) | \u03c6(x) \u2264 \u03c4} is non-increasing and convex. Proof. As the feasible region grows when we increase \u03c4, the minimum over a larger feasible set can be no larger than the one attained over a smaller feasible set. Therefore, the value function v(\u03c4) is non-increasing. We now show the convexity of v(\u03c4). We first recast v(\u03c4) as v(\u03c4) = min_x {\u03c1(Ax \u2212 b) + \u03b4epi(\u03c6)(x, \u03c4) + \u03b4X (x)}. Let f(x, \u03c4) := \u03c1(Ax \u2212 b) + \u03b4epi(\u03c6)(x, \u03c4) + \u03b4X (x). This function f is jointly convex in (x, \u03c4) because it is the sum of three convex functions. In more detail, the first function \u03c1(Ax \u2212 b) is convex in x as given. We know that the indicator function of a set is convex if and only if the set is convex [10, Exercise 2.18]. This fact yields the convexity of the third function \u03b4X (x), as X is convex. Moreover, since the function \u03c6 is convex, its epigraph is convex by the definition of a convex function. Hence, the second function \u03b4epi(\u03c6)(x, \u03c4) is convex in (x, \u03c4). Finally, since f(x, \u03c4) is convex in both variables, its infimal projection v(\u03c4) is convex [7, Equation 3.16]. Let Popt\u03c3 denote the optimal value of the problem (P\u03c3) given a parameter \u03c3. 
With the mild assumption that the constraint \u03c1(Ax \u2212 b) \u2264 \u03c3 is active (i.e., satisfied with equality) at the optimal solution of (P\u03c3), the following result shows that the value \u03c4\u2217\u03c3 := Popt\u03c3 is the smallest \u03c4 that satisfies v(\u03c4\u2217\u03c3) = \u03c3. Proposition 2.1.2. Denote by Popt\u03c3 the optimal value of the problem (P\u03c3) for a given \u03c3 > 0. For that given \u03c3, assume there exists a \u03c4 o\u03c3 such that v(\u03c4 o\u03c3) = \u03c3 and \u03c4 o\u03c3 = min{\u03c4 : v(\u03c4) = \u03c3}. Then \u03c4 o\u03c3 = Popt\u03c3 . Proof. Knowing that v(\u03c4) is non-increasing and convex, we might have the case illustrated in Figure 2.1. Our goal is to show that the two points \u03c4 o\u03c3 and \u03c4\u2217\u03c3 coincide. (Figure 2.1: An illustration of the goal of this proof.) We first show that \u03c4 o\u03c3 \u2264 \u03c4\u2217\u03c3 . Let x\u2217 be a minimizer of (P\u03c3), i.e., x\u2217 \u2208 argmin_{x\u2208X} {\u03c6(x) | \u03c1(Ax \u2212 b) \u2264 \u03c3}; then \u03c6(x\u2217) = \u03c4\u2217\u03c3 . Thus, v(\u03c4\u2217\u03c3) = min_{x\u2208X} {\u03c1(Ax \u2212 b) | \u03c6(x) \u2264 \u03c4\u2217\u03c3} (i)\u2264 \u03c1(Ax\u2217 \u2212 b) (ii)\u2264 \u03c3 = v(\u03c4 o\u03c3), where (i) comes from the fact that v(\u03c4\u2217\u03c3) is the minimal value, and (ii) holds because x\u2217 is feasible for (P\u03c3). Moreover, since v(\u03c4) is a non-increasing function of \u03c4, we conclude that \u03c4 o\u03c3 \u2264 \u03c4\u2217\u03c3 . Conversely, we want to prove \u03c4 o\u03c3 \u2265 \u03c4\u2217\u03c3 . Since v(\u03c4 o\u03c3) = \u03c3, let xo be a minimizer that attains that minimal value, xo \u2208 argmin_{x\u2208X} {\u03c1(Ax \u2212 b) | \u03c6(x) \u2264 \u03c4 o\u03c3}; then xo satisfies \u03c1(Axo \u2212 b) = \u03c3. That means xo is a feasible point for (P\u03c3). 
Therefore, we have \u03c4\u2217\u03c3 (i)\u2264 \u03c6(xo) (ii)\u2264 \u03c4 o\u03c3 , where (i) comes from the fact that \u03c4\u2217\u03c3 is the minimum of (P\u03c3), and (ii) follows from the constraint in v(\u03c4 o\u03c3). This completes the proof of both directions. An alternative proof can be given using strong duality [12]. This theorem provides a natural and intrinsic characterization of the level-set method. Finding the optimal value of (P\u03c3) is equivalent to finding a root \u03c4 of equation (2.1). Moreover, it immediately follows from the theorem that for all \u03c4 \u2208 (0, \u03c4\u2217\u03c3), v(\u03c4) will return an optimal solution x that satisfies \u03c6(x) \u2264 Popt\u03c3 = \u03c4\u2217\u03c3 and \u03c1(Ax \u2212 b) \u2264 \u03c3 + \u03b5, (2.2) where \u03b5 > 0. A point x that satisfies (2.2) is called super-optimal and \u03b5-infeasible, respectively. 2.1.1 Warm-starts. We thus reduce the problem (P\u03c3) to solving (2.1) effectively for a sequence of monotonically increasing \u03c4. One practical approach is to solve it iteratively, using the approximate solution x\u03c4k\u22121 from the previous iteration with \u03c4k\u22121 as a warm-start for v(\u03c4k). We note this warm-start is practicable because the approximate minimizer x\u03c4k\u22121 from the last iteration v(\u03c4k\u22121) is indeed a feasible point in the next iteration for v(\u03c4k). That is because \u03c6(x\u03c4k\u22121) \u2264 \u03c4k\u22121 \u2264 \u03c4k, where the second inequality holds because we take a monotonically increasing sequence {\u03c4k}. With warm-starts, the overall computational complexity is much smaller than kC, where k is the number of root-finding steps and C is the cost of evaluating v(\u03c4k) in each step. 2.2 Root-finding procedure. The core idea of this section is to present a way to solve the root-finding problem arising in the level-set approach. 
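The warm-start idea of Section 2.1.1 can be sketched with a toy projected-gradient LASSO solver (a minimal illustration only; `project_l1` and `lasso_pg` are hypothetical helpers written for this sketch, not part of SPGL1, and the problem data are random):

```python
import numpy as np

def project_l1(x, tau):
    # Euclidean projection onto the l1 ball of radius tau (sort-based).
    if np.abs(x).sum() <= tau:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lasso_pg(A, b, tau, x0, tol=1e-6, maxit=5000):
    # Plain projected gradient for min 0.5*||Ax-b||^2 s.t. ||x||_1 <= tau.
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = x0.copy()
    for it in range(maxit):
        g = A.T @ (A @ x - b)
        x_new = project_l1(x - g / L, tau)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new, it + 1
        x = x_new
    return x, maxit

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 80))
b = rng.standard_normal(30)
taus = [0.5, 1.0, 1.5]                       # monotonically increasing tau_k
x = np.zeros(80)
for tau in taus:
    # The iterate from the previous tau is feasible for the next one,
    # since ||x||_1 <= tau_{k-1} <= tau_k.
    assert np.abs(x).sum() <= tau + 1e-10
    x, iters = lasso_pg(A, b, tau, x)        # warm-started solve
```

The assertion inside the loop is exactly the feasibility chain \u03c6(x\u03c4k\u22121) \u2264 \u03c4k\u22121 \u2264 \u03c4k from the text; no projection of the previous iterate is ever needed before reusing it.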
Throughout this section, we fix one particular \u03c3, let w(\u03c4) := v(\u03c4) \u2212 \u03c3, and denote by \u03c4\u2217 the root of w(\u03c4). Each algorithm we describe below depends on an oracle; thus the overall computational complexity of the algorithm relies on the oracle as well. It is important to note that the algorithms terminate when 0 \u2264 w(\u03c4) \u2264 \u03b5, where \u03b5 > 0 is a preassigned tolerance. This yields a point \u03c4 \u2264 \u03c4\u2217 that produces a super-optimal and \u03b5-infeasible solution x satisfying (2.2). We discuss a root-finding method called the inexact Newton method. It is inspired by the classical Newton\u2019s method. The difference is that it uses an approximate function value at each iteration. It is illustrated in Figure 2.2. This inexact Newton method has a nested loop. The inner loop is an iterative process that evaluates the value function v(\u03c4) approximately, and this iterative process is provided by an oracle. The oracle takes in a \u03c4k, and outputs a lower bound \u2113k and an upper bound uk that are relatively close to each other, i.e., uk\/\u2113k \u2264 \u03b1, where \u03b1 \u2208 (1, 2) is a user-input constant. It also outputs a slope sk which, combined with \u2113k, gives an affine minorant of the function v at \u03c4k. The outer loop then uses the oracle outputs to compute the next iterate \u03c4k+1. We present below a pseudocode of this inexact approach. Algorithm 1: Pseudocode of the inexact Newton method. Input: \u03c40. [outer loop] for k = 1, 2, . . . do: [inner loop: evaluate v(\u03c4) approximately] (\u2113k, uk, sk) = O(\u03c4k); [end inner loop] update \u03c4k; [end outer loop] return \u03c4k. Figure 2.2: Plots in the first row show the first iteration of the inexact Newton method (on the left) and the classical Newton\u2019s method (on the right). 
The inexact method uses approximate function values while the classical method computes the true ones. 2.2.1 Inexact Newton method. The proposed inexact Newton method works similarly to the standard Newton\u2019s method. Rather than evaluating the exact w\u2032(\u03c4) at each iteration, it constructs an affine minorant of w. The formal Algorithm 2 and the definition of the oracle are presented as follows. Algorithm 2: Inexact Newton method [1]. Input: w : R+ \u2192 R, a decreasing convex function; Onew,w, an affine minorant oracle; \u03b5 > 0, an accuracy; \u03c40, an initial point where w(\u03c40) > 0; \u03b1 \u2208 (1, 2), a constant. Output: \u03c4\u2217. 1: u\u22121 \u2190 +\u221e; 2: k \u2190 0; 3: while uk\u22121 > \u03b5 do; 4: (\u2113k, uk, sk) \u2190 Onew,w(\u03c4k, \u03b1); [update \u03c4] 5: uk \u2190 min{uk, uk\u22121}; 6: \u03c4k+1 \u2190 \u03c4k \u2212 \u2113k\/sk; [end of the update] 7: k \u2190 k + 1; 8: end while; 9: return \u03c4k. Definition 1 (Affine minorant oracle [1]). For a function w : R+ \u2192 R, an affine minorant oracle is a mapping Onew,w that assigns to each pair (\u03c4, \u03b1) \u2208 [w > 0] \u00d7 [1, +\u221e) real numbers (\u2113, u, s) such that 0 < \u2113 \u2264 w(\u03c4) \u2264 u and u\/\u2113 \u2264 \u03b1, and the affine function \u03c4 \u2032 \u2192 \u2113 + s(\u03c4 \u2032 \u2212 \u03c4) globally minorizes w. The next theorem states the global convergence of this method. Theorem 2.2.1 (Linear convergence of the inexact Newton method [1]). The inexact Newton method terminates after at most k \u2264 max{1 + log2\/\u03b1(2C\/\u03b5), 2} iterations, where C = max{|s0|(\u03c4\u2217 \u2212 \u03c41), \u21130}. The total complexity of the inexact Newton method is at most COnew,w \u00d7 k, where COnew,w is the cost of the Newton oracle Onew,w. This total complexity can be reduced significantly if we use warm-starts. 2.2.2 Oracle. There are many first-order methods that might serve as an oracle for evaluating w(\u03c4). 
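The outer loop of Algorithm 2 can be sketched in a few lines (a hedged illustration: the stopping logic is slightly simplified, and the oracle below is an exact toy oracle for a hypothetical test function w chosen for this sketch, which trivially satisfies the affine-minorant requirements):

```python
import math

def inexact_newton(oracle, tau0, eps=1e-8, maxit=100):
    # Sketch of Algorithm 2: root-finding on a decreasing convex w(tau)
    # via an oracle returning (l, u, s) with 0 < l <= w(tau) <= u and
    # slope s of a global affine minorant of w at tau.
    tau, u_prev = tau0, math.inf
    for _ in range(maxit):
        l, u, s = oracle(tau)
        u_prev = min(u, u_prev)   # u_k <- min(u, u_{k-1})
        if u_prev <= eps:
            break
        tau = tau - l / s         # Newton step on the minorant
    return tau

# Hypothetical test function (not from the thesis): w(tau) = exp(-tau) - 0.1
# is decreasing and convex with root tau* = ln(10). An exact oracle
# (l = u = w(tau), s = w'(tau)) is a valid affine-minorant oracle.
def oracle(tau):
    w = math.exp(-tau) - 0.1
    return w, w, -math.exp(-tau)

root = inexact_newton(oracle, 0.0)
```

Because w is convex, each tangent-based step stays to the left of the root, so the iterates increase monotonically toward \u03c4\u2217, matching the super-optimal, \u03b5-infeasible behavior described above.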
For example, Frank-Wolfe [14, 18], also known as the conditional gradient method, provides a global minorant of the value function v and an approximate derivative at each point. These can be used as the lower bound \u2113 and the approximate derivative s. Other examples include projected subgradient [2], cutting-plane [19], and primal-dual [8] methods. 2.2.3 Feasibility of the solution. In Section 2.2.1, we introduced the inexact Newton method and stated that the method outputs a super-optimal and \u03b5-infeasible solution x. To attain feasibility, we need to project it onto the constraint set F := {x \u2208 X | \u03c1(Ax \u2212 b) \u2264 \u03c3}. In general, there is no efficient approach for computing this projection because of the composite structure of \u03c6. To circumvent the problem of the feasibility of the solution, one inexpensive approach is to perform a radial projection. This method requires the computation of a point that is strictly feasible for the constraint of the problem [22]. Chapter 3: Spectral Projected Gradient for L1 minimization (SPGL1). In the previous chapter, we introduced the level-set method, which can be applied to solve the basis pursuit denoising problem by solving a sequence of LASSO problems. It then becomes clear that we require an effective oracle (an affine minorant oracle for the Newton method) to solve LASSO in order for this method to work. However, solving the problem exactly is expensive and may not be necessary. In 2008, van den Berg and Friedlander [25] proposed the spectral projected gradient for \u21131-minimization (SPGL1). SPGL1 is able to solve two types of problems. The first type is LASSO: SPGL1 contains an iterative oracle that provides an approximate solution to LASSO, named SPGL1 for LASSO. The second type is BPDN: SPGL1 uses SPGL1 for LASSO together with the inexact Newton method to solve BPDN. The rest of the thesis will be focused primarily on SPGL1 for LASSO. The roadmap is the following. 
First, in Section 3.1, we describe the dual problem of LASSO and the necessary and sufficient optimality conditions for the primal-dual solution. From the previous chapter, we know that the LASSO problem can be characterized by a value function. In Section 3.2, we state the differentiability of this value function. More importantly, we show how this property relates to the inexact Newton oracle mentioned in Section 2.2.1. Next, we outline the algorithm of the oracle SPGL1 for LASSO. Finally, we discuss the challenges of the current algorithm. See also Figure 3.2 for a conceptual map of the notions and results presented in this chapter. 3.1 The dual problem of LASSO. We can write (LS\u03c4 ) equivalently as (LS\u03c4 ) min_{x,r} \u2016r\u20162 subject to b \u2212 Ax = r, \u2016x\u20161 \u2264 \u03c4, which has the dual problem (DLS\u03c4 ) max_{y,\u03bb} bT y \u2212 \u03c4\u03bb subject to \u2016y\u20162 \u2264 1, \u2016AT y\u2016\u221e \u2264 \u03bb. Denote the solution to the primal problem by x\u03c4 , and the dual solutions by y\u03c4 and \u03bb\u03c4 . Feasible dual variables y and \u03bb can be computed directly from a feasible primal residual r [25]: y = r\/\u2016r\u20162 , \u03bb = \u2016AT y\u2016\u221e. (3.1) The second equality in (3.1) comes from the tightness of the bound in (DLS\u03c4 ). 
If the bound is not tight, we can always choose λ = ‖Aᵀy‖∞ to improve the dual objective value in (DLSτ).

In addition to these relations, we also have the following necessary and sufficient optimality conditions [25] for the primal-dual solution:

Ax + r = b,  ‖x‖₁ ≤ τ,   (primal feasibility)
‖Aᵀr‖∞ ≤ λ‖r‖₂,   (dual feasibility)
λ(‖x‖₁ − τ) = 0.   (complementarity)

3.2 Relation between value function and inexact Newton oracle

In the last chapter, we mentioned that BPDN can be solved by iteratively performing the left-most root-finding procedure on (2.1), where v(τ) is the non-increasing and convex value function of LASSO,

v(τ) := min_{x ∈ ℝⁿ} {‖Ax − b‖₂ | ‖x‖₁ ≤ τ},   τ ≥ 0.

More specifically, if we choose the inexact Newton method for the root-finding, we require an oracle that provides an upper bound u_k, a lower bound ℓ_k, and an approximate derivative s_k of the value function v(τ_k) at each iteration k. From this point onward, we let k denote the index of the inexact Newton iteration.

Let i denote the index of the iterative oracle SPGL1 for LASSO. For a given τ_k, if we obtain an approximate solution x_k^i of (LSτ), then we have the upper bound

u_k^i = ‖r_{τ_k}^i‖₂ ≥ ‖r_{τ_k}‖₂ > 0.

On the other hand, weak duality yields the lower bound

ℓ_k^i = bᵀy_{τ_k}^i − τλ_{τ_k}^i ≤ ‖r_{τ_k}‖₂ ≤ ‖r_{τ_k}^i‖₂.

An illustrative figure is shown below.

Figure 3.1: This plot illustrates the first two iterations of the oracle SPGL1 for LASSO. At each iteration i, we have an upper bound obtained from the primal value and a lower bound obtained from the dual value.

Moreover, the value function is differentiable, and its first derivative is related to the dual variable.
This is stated in the following result.

Theorem 3.2.1 ([25]). Let v(τ) be the value function of LASSO. Then for all τ ∈ (0, τ_σ), v is continuously differentiable, with

v′(τ) = −λ_τ,

where the optimal dual variable λ_τ = ‖Aᵀy_τ‖∞ and y_τ = r_τ/‖r_τ‖₂.

With this relation and (3.1), we can compute approximate quantities u_k^i, ℓ_k^i, and s_k^i as

u_k^i = ‖r_{τ_k}^i‖₂,   ℓ_k^i = (bᵀr_{τ_k}^i − τ_k‖Aᵀr_{τ_k}^i‖∞)/‖r_{τ_k}^i‖₂,   and   s_k^i = −‖Aᵀr_{τ_k}^i‖∞/‖r_{τ_k}^i‖₂.

Consider the duality gap

δ_{τ_k}^i = u_k^i − ℓ_k^i,

which is always nonnegative because of weak duality. This quantity can be used to measure the accuracy of the approximate values. That is,

u_k^i − v(τ_k) < δ_{τ_k}^i   and   |s_k^i − v′(τ_k)| < γδ_{τ_k}^i,

where γ is some positive constant related to the condition number of A.

The SPGL1 implementation uses a slightly different iteration update than the standard inexact Newton method. Denote by u_k, ℓ_k, and s_k the final iterates from SPGL1 for LASSO for a given τ_k, and by δ_k the corresponding duality gap. In the inexact Newton method of Section 2.2.2, the Newton update is τ_{k+1} ← τ_k − (ℓ_k − σ)/s_k. The SPGL1 implementation instead uses u_k, i.e.,

τ_{k+1} = τ_k − (u_k − σ)/s_k.

Because of this implementation, SPGL1 does not inherit the linear convergence stated in Theorem 2.2.1. However, it still enjoys a local convergence rate dependent on the duality gap δ_k, described as follows.

Theorem 3.2.2 ([25]). Suppose that A has full rank, σ ∈ (0, ‖b‖₂), and δ_k = δ_{τ_k} → 0 as k → ∞.
Then if τ_0 is close enough to τ_σ, the iteration with v and v′ replaced by their approximations generates a sequence τ_k → τ_σ that satisfies

|τ_{k+1} − τ_σ| = γδ_k + η_k|τ_k − τ_σ|,

where η_k → 0 and γ is a positive constant.

In the case where LASSO is solved exactly (i.e., δ_k = 0), we retain the superlinear convergence expected of a standard Newton iteration. Otherwise, the rate depends on how fast δ_k converges to 0.

3.3 SPGL1 for LASSO algorithm

Algorithm 3 summarizes the SPGL1 for LASSO algorithm.

Algorithm 3: SPGL1 for LASSO [25, Algorithm 1]
Input: x, τ, δ
Output: x_τ, r_τ
1  Set minimum and maximum step lengths 0 < α_min < α_max.
2  Set initial step length α_0 ∈ [α_min, α_max] and sufficient descent parameter γ ∈ (0, 1).
3  Set an integer line-search history length M ≥ 1.
4  Set initial iterates: x_0 ← P_τ[x], r_0 ← b − Ax_0, g_0 ← −Aᵀr_0.
5  i ← 0
6  begin
7    δ_i ← ‖r_i‖₂ − (bᵀr_i − τ‖g_i‖∞)/‖r_i‖₂   [compute duality gap]
8    if δ_i < δ then
9      break   [exit if duality gap is small enough]
10   end
11   α ← α_i
12   begin
13     x̄ ← P_τ[x_i − αg_i]   [candidate line-search iterate]
14     r̄ ← b − Ax̄   [update the corresponding residual]
15     if ‖r̄‖₂² ≤ max_{j ∈ [0, min{i, M−1}]} ‖r_{i−j}‖₂² + γ(x̄ − x_i)ᵀg_i then
16       break   [exit line search]
17     else
18       α ← α/2   [decrease step length]
19     end
20   end
21   x_{i+1} ← x̄, r_{i+1} ← r̄, g_{i+1} ← −Aᵀr_{i+1}
22   Δx ← x_{i+1} − x_i, Δg ← g_{i+1} − g_i   [update iterates]
23   if ΔxᵀΔg ≤ 0   [update the Barzilai-Borwein step length]
24   then
25     α_{i+1} ← α_max
26   else
27     α_{i+1} ← 
min{\u03b1max,max[\u03b1min, (\u2206xT\u2206x)\/(\u2206xT\u2206g)]}28 end29 i\u2190 i+ 130 end31 return x\u03c4 \u2190 xi, r\u03c4 \u2190 riWe now give a brief explanation about the steps in this algorithm. ThisSPGL1 for LASSO procedure is motivated from the algorithm SPG1 by Bir-173.4. Discussiongin, Martinez, and Raydan [6, Algorithm 2.1]. In step 13, we require toorthogonal project the iterates onto the one-norm feasible set, i.e.,P\u03c4 [xi \u2212 \u03b1gi] = argminx\u00af{\u2016xi \u2212 \u03b1gi \u2212 x\u00af\u20162 : \u2016x\u00af\u20161 \u2264 \u03c4} .The variable g = \u2212AT (b \u2212 Ax) is the steepest descent direction of theobjective \u2016Ax\u2212b\u20162 in LASSO. In order to conduct this one-norm projectioneffectively, in their paper, Van de Berg and Friedlander describe an algorithm[25, Section 4.2], whose worst-case complexity is of O(n log n).Steps 15-19 describes a nonmonotone line search procedure. This tech-nique is first proposed by Grippo, Lampariello, and Lucidi [15]. The condi-tion in step 15 allows an increase of the objective function value at each step,but requires an sufficient decrease on the maximum of objective function inevery min{k,M \u2212 1} iterations. This nonmonotone line search is named assuch because of this property. With this sufficient decrease, this line searchthus maintains the global convergence.This nonmonotone line search is naturally combined with the Barzilai-Borwein steps (BB-steps) [3]. 
In lines 23-28, the BB-step is computed and is used as the step length for the next iterate in line 13.

3.4 Discussion

In conclusion, the steps in SPGL1 for LASSO are summarized in Figure 3.2.

Figure 3.2: Conceptual map of the algorithm SPGL1 for LASSO presented in the previous section.

We observe that the duality gap δ_k^i (the ith iteration of the LASSO solve with parameter τ_k) does not decrease monotonically in i. To illustrate this, we present two examples below. The first is an example in which the primal value decreases but the duality gap increases.

Example 1. Let A = I be the identity matrix, b = (1, 0)ᵀ, and τ = 1. The primal and dual problems are

minimize_{x,r} ‖r‖₂   subject to   x − b = r,  ‖x‖₁ ≤ τ,
maximize_{y,λ} bᵀy − τλ   subject to   ‖y‖₂ ≤ 1,  ‖y‖∞ ≤ λ.

Let r_1 and r_2 be the residuals at the first and second iterations, with ‖r_2‖₂ < ‖r_1‖₂. Choose r_1 = (10.25, 0)ᵀ and r_2 = (0, 10)ᵀ. Then the corresponding duality gaps are

δ_1 = ‖r_1‖₂ − (bᵀy_1 − τλ_1) = ‖r_1‖₂ − bᵀr_1/‖r_1‖₂ + ‖r_1‖∞ = 19.5,
δ_2 = ‖r_2‖₂ − (bᵀy_2 − τλ_2) = ‖r_2‖₂ − bᵀr_2/‖r_2‖₂ + ‖r_2‖∞ = 20 > δ_1.

Example 2. In this example, we consider a case where A ∈ ℝ^{50×128} and b ∈ ℝ^{50} have normally distributed entries with mean 0 and variance 1, and τ = 15. We use SPGL1 for LASSO to solve

min_x ‖Ax − b‖₂   subject to   ‖x‖₁ ≤ τ.

Below are the plots of the primal and dual values, and the duality gap, at each iteration.
Figure 3.3: Top: primal and dual values of LASSO computed by SPGL1 for LASSO at each iteration. We observe that the dual value does not increase monotonically. Bottom: the duality gap at each iteration; this value does not decrease monotonically either.

Chapter 4

An accelerated dual method

In this chapter, we present a new approach that estimates the primal-dual gap more accurately. As shown in Figure 3.3, the trajectory of the dual objective fluctuates considerably over the first 30 iterations before it starts to converge. This is because the approximate dual solutions y_i, λ_i are computed directly from r_i using relation (3.1). They are only dual feasible solutions, not dual optimal. Moreover, since they depend only on r_i, the residuals computed in previous iterations do not help the current dual objective calculation. In other words, the dual objective calculated at each iteration is independent, and the one calculated at the ith iteration does not help with later computations. We now present an accelerated dual method that uses each residual calculation to greater advantage. It provides a more accurate estimate of the dual objective, and it guarantees that the dual objective increases monotonically. Therefore, we are able to obtain a more accurate estimate of the primal-dual gap from this dual objective. At the end of this chapter, we provide an algorithm to solve BPDN with this accelerated method.

To begin, we recast (LSτ) in the following form:

(LSQPτ)   minimize_{x ∈ ℝⁿ, r ∈ ℝᵐ} ½‖r‖₂²   subject to   r = b − Ax,  ‖x‖₁ ≤ τ.

Note that this formulation and (LSτ) share the same minimizer (x*, r*) because the quadratic function ‖r‖₂² is monotone in ‖r‖₂ ≥ 0. 
The Lagrangian dual is also a quadratic optimization problem,

(DLSQPτ)   minimize_{y,λ} ½yᵀy − bᵀy + τλ   subject to   ‖Aᵀy‖∞ ≤ λ.

To reduce the cost of solving the dual problem exactly, we instead consider optimizing it over a reduced subspace. Consider y restricted to the range of a matrix B ∈ ℝ^{m×p}, p ≪ m, i.e., y = Bc. Then the dual problem becomes

minimize_{c,λ} ½‖Bc‖₂² − bᵀBc + τλ   subject to   ‖AᵀBc‖∞ ≤ λ.   (4.1)

This problem is easier than (DLSQPτ) because we only seek solutions in the reduced subspace spanned by the columns of B, rather than the entire space ℝᵐ.

4.1 Construction of matrix B

The low-dimensional matrix B_i has p columns, where p is a preassigned window size. The first p − 2 columns are called the base set. The second-to-last column is the current residual, and the last column is the dual solution y_best that corresponds to the largest dual objective found so far. The accelerated method is summarized in Algorithm 4.

In the first p − 2 iterations, we construct the approximate dual variables just as in standard SPGL1,

y_i = r_i   and   λ_i = ‖Aᵀy_i‖∞.

We append the residual r_i to B_i, so the matrix at the (p − 2)th iteration looks like

B_{p−2} = [r_1 … r_{p−2} | 0 0].

Starting from the (p − 1)th iteration, we apply the accelerated method. This method involves two steps: solve the dual problem (4.1) and update the matrix B. Two parts need to be updated. The base set is updated whenever we obtain a larger dual objective: if at iteration i the dual objective is larger than all previous dual objectives, we replace the oldest residual in the base set with the current residual r_i. This replacement is implemented with a circular buffer. In addition, we update y_best to y_i and put it in the last column. These yield a new matrix B_{i+1}. 
To formalize this mathematically, the matrix B_i takes the form

B_i = [r_{i_1} … r_{i_{p−2}} | r_i  y_best],   ∀i ≥ p − 2,

where

{i_1, …, i_{p−2}} = argmin_{I, |I| = p−2} Σ_{i ∈ I} |½(y_i*)ᵀy_i* − bᵀy_i* + τλ_i*|,

and (y_i*, λ_i*) are the optimal dual solutions at the ith iteration.

Algorithm 4: An accelerated dual method for SPGL1: accelDual
Input: A, b, r_i, τ
1  if i = 0 then
2    y_best ← 0
3    dualMax ← −∞
4    B ← [0 … 0 | 0 0]
5  else if 1 ≤ i ≤ p − 2 then
6    y_i ← r_i, λ_i ← ‖Aᵀy_i‖∞
7    δ_i ← r_iᵀr_i − bᵀy_i + τλ_i
8    dualMax ← max(dualMax, δ_i)
9    B(i) ← r_i   [add residual to B]
10 else if i ≥ p − 1 then
11   B(p − 1) ← r_i   [add current r_i to second-to-last column]
12   B(p) ← y_best   [add y_best to last column]
13   y_acc ← qpsolver(B)   [solve dual problem]
14   λ_acc ← ‖Aᵀy_acc‖∞
15   dual ← −½y_accᵀy_acc + bᵀy_acc − τλ_acc
16   δ_i ← ½r_iᵀr_i − dual   [compute duality gap]
17   if dual > dualMax then
18     B(mod(i, p − 2)) ← r_i   [update the base set in B]
19     y_best ← y_acc   [update best dual solution]
20   end
21 end
22 return δ_i, y_i

This construction has two main advantages. First, it guarantees the monotonic increase of the dual objective: the dual objective obtained at the ith iteration via B_i is at least as good as the largest dual objective obtained in the previous iterations. This is because y_best occupies the last column of B_i. When we use B_i to solve (4.1), notice that

c = [0 … 0 | 0 1]ᵀ

is always a feasible point. Choosing this c, we obtain

y = B_i c = y_best,

which gives the largest dual objective from the first i − 1 iterations. Therefore, one is guaranteed a monotone increase of the dual objective. Second, 
this construction guarantees that the dual objective obtained from our accelerated method is comparable to the one obtained from the original SPGL1 at each iteration. Recall that the original SPGL1 only obtains a feasible dual variable y_i = r_i at iteration i. The matrix B_i, however, includes r_i in its second-to-last column. When we optimize (4.1) using B_i, we always have the feasible point

c = [0 … 0 | 1 0]ᵀ,

with y = B_i c = r_i. Thus the optimal value of (4.1) is no greater than the objective value at this y; that is,

OPT ≤ ½‖y‖₂² − bᵀy + τ‖Aᵀy‖∞,

where OPT denotes the optimal value of (4.1). In other words, the accelerated dual estimate is at least as good as the one the original SPGL1 computes from r_i.

There are two potentially expensive components in (4.1). One arises in the constraint, where we must form the matrix-matrix product AᵀB_i. The other is the matrix-vector multiplication B_i c in the objective. We argue, in the following two sections, that by using different matrices in (4.1), one can reduce the computation of one component, but not both.

4.1.1 Analysis of the general matrix B

If we use the matrix B_i constructed as above, then we can avoid much of the computation in the matrix product AᵀB_i. Treating the matrix A as an operator, AᵀB_i is simply a sequence of matrix-vector multiplications on the columns of B_i. That is,

AᵀB_i = [Aᵀr_{i_1} … Aᵀr_{i_{p−2}} | Aᵀr_i  Aᵀy_best].

Since we already have Aᵀr_i computed at each iteration i, the only column we need to compute is Aᵀy_best. Thus, constructing AᵀB_i is inexpensive. Moreover, the storage cost of

[Aᵀr_{i_1} … Aᵀr_{i_{p−2}} | Aᵀr_i]

is roughly n(p − 1), which is small compared to the cost of forming AᵀB_i from scratch.

We now focus on how this choice affects the quadratic part of the dual objective in (4.1). The quadratic part is ‖B_i c‖₂² = cᵀB_iᵀB_i c, which requires computing the Hessian B_iᵀB_i at each iteration.
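The trade-off between the two bases can be illustrated with a small numpy sketch (the dimensions and random data below are arbitrary placeholders): with a general B we must form the p × p Hessian BᵀB, while after a reduced QR factorization the Hessian becomes the identity at the price of working with Q.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 30, 50, 5
b = rng.standard_normal(m)
B = rng.standard_normal((m, p))        # stand-in for [r_i1 ... | r_i  y_best]

# general B: the quadratic term needs the p x p Hessian B^T B
H = B.T @ B

# reduced QR: B = Q R with Q^T Q = I_p, so the Hessian is the identity
Q, R = np.linalg.qr(B)
assert np.allclose(Q.T @ Q, np.eye(p))

# both parameterizations describe the same subspace: y = B c = Q (R c)
c = rng.standard_normal(p)
assert np.allclose(B @ c, Q @ (R @ c))

# and the quadratic-plus-linear parts of (4.1) agree:
# 0.5 c^T H c - b^T B c  ==  0.5 ||R c||^2 - b^T Q (R c)
d = R @ c
v1 = 0.5 * c @ H @ c - b @ (B @ c)
v2 = 0.5 * d @ d - b @ (Q @ d)
assert np.allclose(v1, v2)
```

The sketch confirms that the two formulations are equivalent objectives over the same subspace; they differ only in which matrix product must be recomputed.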
4.1.2 Analysis of orthogonal matrix Q

We apply a reduced QR factorization to B to get a matrix Q with orthonormal columns (i.e., B = QR). Since QᵀQ = I, using this Q simplifies the quadratic part of (4.1) to

min_{c,λ} ½‖c‖₂² − bᵀQc + τλ   subject to   ‖AᵀQc‖∞ ≤ λ.

This orthogonal matrix Q leads to an identity Hessian, but we must now compute AᵀQ.

4.2 Different solvers for the dual problem

Since our dual problem (4.1) is a quadratic program (QP), we present two solvers for it. We give a description of each solver, and show how to formulate our problem in a form the solver accepts.

4.2.1 PDCO

PDCO [23] is a popular free and open-source solver implemented in MATLAB. It uses a primal-dual interior method to minimize a convex objective function subject to linear constraints.

PDCO solves problems of the form

min_{x,r} φ(x) + ½‖D_1 x‖² + ½‖r‖²   subject to   Ax + D_2 r = b,   bl ≤ x ≤ bu,   r unconstrained.

Since PDCO only accepts equality constraints and element-wise bound constraints, we add two slack variables s_1, s_2, and our problem becomes

min  ½ [cᵀ λ s_1ᵀ s_2ᵀ] diag(I_p, 0, 0, 0) [c; λ; s_1; s_2] + [−Qᵀb; τ; 0; 0]ᵀ [c; λ; s_1; s_2]

subject to

[−AᵀQ  −e  I  0;  AᵀQ  −e  0  I] [c; λ; s_1; s_2] = 0,

where the diagonal block matrix has size (2n + 1 + p) × (2n + 1 + p), the constraint matrix has size 2n × (2n + 1 + p), and e denotes the vector of all ones, 
[−∞; −∞; 0; 0] ≤ [c; λ; s_1; s_2] ≤ [∞; ∞; ∞; ∞].

Consider the new variable

x = [cᵀ λ s_1ᵀ s_2ᵀ]ᵀ.

We now fit our problem to the PDCO format. We put the quadratic part into the primal regularization term ½‖D_1 x‖₂². Here we take d_1 ∈ ℝ^{2n+1+p} and D_1 = diag(d_1). Thus,

φ(x) = [−Qᵀb; τ; 0; 0]ᵀ [c; λ; s_1; s_2],

with primal regularization term

d_1 = [1 ⋯ 1 0 ⋯ 0]ᵀ ∈ ℝ^{2n+1+p},

where the first p entries are 1, and D_1 = diag(d_1).

4.2.2 Gurobi

The Gurobi Optimizer [17] is a commercial solver designed to tackle mathematical optimization problems including quadratic programming, linear programming, and mixed-integer programming. It is written in C and features several interfaces, including MATLAB.

The Gurobi MATLAB interface requires problems of the following form:

minimize xᵀPx + cᵀx + α   subject to   Mx = b (linear constraints),   l ≤ x ≤ u (bound constraints).

In a similar manner, we tailor our problem to fit this form with

P := diag(I_p, 0, 0, 0) ∈ ℝ^{(2n+1+p)×(2n+1+p)},   x := [c; λ; s_1; s_2],   c := [−Qᵀb; τ; 0; 0];

M := [−AᵀQ  −e  I  0;  AᵀQ  −e  0  I] ∈ ℝ^{2n×(2n+1+p)},   b := 0 ∈ ℝ^{2n},

l := [−∞; −∞; 0; 0],   u := [∞; ∞; ∞; ∞].

4.3 Measure quantity

Here we use the average mutual coherence to measure the dependence of the columns of our final matrix B_i.
The average mutual coherence v of a matrix R is defined as

v(R) = (1/(n(n − 1))) Σ_{i,j=1, j≠i}^n |⟨u_i, u_j⟩|,

where n is the number of columns of R and u_i, i ∈ {1, …, n}, is the normalized ith column of R. Notice that the formula contains the scalar 1/(n(n − 1)), which comes from the fact that there are n(n − 1) ordered inner products between n columns.

We show below that the expected mutual coherence for the unconstrained problem is approximately 1, depending on the step size. Suppose all points are feasible, i.e., satisfy ‖x‖₁ ≤ τ. Consider the unconstrained minimization problem

min_x ½‖Ax − b‖₂².

The gradient descent update is

x_{i+1} = x_i − α∇f(x_i) = x_i − α(Aᵀ(Ax_i − b)) = x_i + αAᵀr_i,

where r_i = b − Ax_i. Then

−r_{i+1} = Ax_{i+1} − b = A(x_i + αAᵀr_i) − b = −r_i + αAAᵀr_i.

Thus

⟨r_{i+1}, r_i⟩ = ⟨r_i − αAAᵀr_i, r_i⟩ = ⟨(I − αAAᵀ)r_i, r_i⟩.

We know that

⟨r_{i+1}, r_i⟩ = ⟨(I − αAAᵀ)r_i, r_i⟩ ≤ ‖(I − αAAᵀ)r_i‖‖r_i‖ ≤ ‖I − αAAᵀ‖‖r_i‖²,

where the first inequality comes from the Cauchy-Schwarz inequality and the second from the definition of the matrix norm. Therefore, if α is small, then ⟨r_{i+1}, r_i⟩ ≈ ‖r_i‖².
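This near-collinearity of successive residuals can be checked numerically. The sketch below (arbitrary synthetic data and a deliberately small step size) runs a few gradient-descent steps and evaluates the average mutual coherence of the normalized residuals:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 40, 60
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

alpha = 1e-4                 # small step size, so I - alpha*A*A^T is near identity
x = np.zeros(n)
residuals = []
for _ in range(5):
    r = b - A @ x
    residuals.append(r / np.linalg.norm(r))   # store normalized residual
    x = x + alpha * (A.T @ r)                 # x_{i+1} = x_i + alpha A^T r_i

R = np.column_stack(residuals)

# average mutual coherence: mean of |<u_i, u_j>| over all ordered pairs i != j
G = np.abs(R.T @ R)
k = G.shape[0]
coh = (G.sum() - np.trace(G)) / (k * (k - 1))
assert coh > 0.95            # successive residuals are nearly collinear
```

With a small α the coherence comes out very close to 1, matching the analysis above.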
For our matrix B, the average mutual coherence of the normalized columns will therefore be roughly 1 for a small step size α.

4.4 Interpretation

Now consider y = Bc; the dual problem becomes

min_{c,λ} ½‖Bc‖₂² − bᵀBc + τλ   subject to   −λe ≤ AᵀBc ≤ λe.

The "reduced primal" problem corresponding to this dual is

min_x ½‖Bᵀ(Ax − b)‖₂²   subject to   ‖x‖₁ ≤ τ.

If B is an orthogonal matrix, this can be interpreted as projecting the columns of A onto the subspace spanned by the orthogonal columns of B.

4.5 SPGL1 with an accelerated method

We present pseudocode for solving BPDN (BP_σ) using SPGL1 with the accelerated method.

Algorithm 5: Solve BPDN using SPGL1 with the accelerated dual method
Input: A, b, tol
1  τ_0 ← 0
   [Outer loop: root-finding]
2  for k = 1, … do
     [Inner loop: solve LASSO]
3    while δ_i > tol do
4      compute x_i
5      r_i ← b − Ax_i
       [accelerated dual method: Algorithm 4]
6      (δ_i, y_i) ← accelDual(A, b, r_i, τ_k)
7      i ← i + 1
8      compute x_{i+1} using the nonmonotone line search and BB-step
9    end
     [end of inner loop: output u = ‖r_i‖₂, s = −‖Aᵀy_i‖∞]
10   u_k ← u, s_k ← s
11   update τ_{k+1} ← τ_k − (u_k − σ)/s_k
12 end
13 return τ_k

Chapter 5

Experimental results

All the numerical experiments presented in this chapter were performed using MATLAB 2017b on a MacBook Pro equipped with a 2.8 GHz Intel Core i7 CPU and 16 GB of memory. The specialized solvers against which we compare our results were kindly provided by some of their authors.

5.1 Test problems generation

In our experiments, we focus on hard test cases in which the original SPGL1 for LASSO requires more than 1000 iterations to converge. The difficulty of the problem depends on the choice of τ.
Hard problems normallyoccur when \u03c4 is very close to the true root \u03c4\u2217\u03c3 .To generate hard problems, we first choose a value \u03c3, and solve BPDNusing SPGL1. This gives us an approximate \u03c4 is very close to the true root\u03c4\u2217\u03c3 . Then we hard code \u03c4 to finalize a value of \u03c4 so that the original SPGL1for LASSO takes a great number of iterations to solve.5.2 Numerical experiments and discussionTable 5.1 presents the numerical results. In this table, the first column isthe description of the problem. Both sensing matrix A and measurementvector b have normally distributed random entries with mean 0 and variance1. The second column shows the number of iteration each method used.The first row is the original SPGL1 for LASSO, we use it as the benchmark.The second row is the hybrid SPGL1 for LASSO. The rest of rows showour accelerated method with different quadratic programming solvers. Themaximal iterations were set to be 106. Finally, the last two columns indicatethe total running times of each methods. For the accelerated method, wealso display the performance with respect to the running times of each solver.As shown in the table, our accelerated method successfully reduce theiteration cost compared to the original SPGL1 for LASSO. Especially in the305.2. Numerical experiments and discussionsecond problem, our method with Gurobi qp solver is able to reduce theiteration cost by approximately a factor of 4.In general, our accelerated dual method outperforms the original SPGL1for LASSO in terms of the iteration cost. However, it increases the overalltime complexity.Problem No. 
of iterations Total time (sec) Solver time (sec)A : 512\u00d7 1024 original: 9672 9b : 512\u00d7 1 PDCO: 3590 829.9 806.6\u03c4 = 22 Gurobi: 3191 579.3 168.5A : 256\u00d7 1024 original: 42409 20.0b : 256\u00d7 1 PDCO: \u2212\u2212\u2212 \u2212\u2212\u2212 \u2212\u2212\u2212\u03c4 = 11.1 Gurobi: 15051 2958.6 838.1A : 256\u00d7 1024 original: 5588 2.9b : 256\u00d7 1 PDCO: 3872 889.5 873.4\u03c4 = 10.9 Gurobi: 3642 658.1 185.1A : 256\u00d7 1024 original: 2168 1.0b : 256\u00d7 1 PDCO: 1987 466.9 458.6\u03c4 = 10.7 Gurobi: 1970 351.6 101.7A : 512\u00d7 768 original: 25763 15b : 512\u00d7 1 PDCO: 14519 2473.6 2402.8\u03c4 = 28 Gurobi: 14205 1781.3 608.2Table 5.1: Experimental results. Note that in the second problem, thedotted line in PDCO means that it is able to solve the problem within themaximum iteration limit.315.3. Future directions5.3 Future directionsWe observe from the experimental results that our accelerated method in-creases the overall time cost. This suggests that future work should focus onthe reduction of per-iteration cost. There are two possible directions. Onedirection is to find a cheaper QP solver, such as an active-set solver, or wecould use ADMM (alternating direction method of multipliers) to solve thisQP sub-problem. The other interesting direction is to reduce the frequencyof activating our accelerated method. Currently we are using acceleratedmethod at every iteration to solve the dual problem. Perhaps we could doit every some iterations, or frequent initially and less frequent in later oniterations.32Bibliography[1] Aleksandr Y Aravkin, James V Burke, Dmitry Drusvyatskiy, Michael PFriedlander, and Scott Roy. Level-set methods for convex optimization.Mathematical Programming, pages 1\u201332.[2] Francis Bach. Duality between subgradient and conditional gradientmethods. SIAM Journal on Optimization, 25(1):115\u2013129, 2015.[3] Jonathan Barzilai and Jonathan M. Borwein. Two-point step sizegradient methods. IMA J. Numer. 
Anal., 8(1):141-148, 1988. ISSN 0272-4979. doi: 10.1093/imanum/8.1.141. URL https://doi.org/10.1093/imanum/8.1.141.

[4] Stephen Becker, Jérôme Bobin, and Emmanuel J. Candès. NESTA: a fast and accurate first-order method for sparse recovery, 2009. URL https://statweb.stanford.edu/~candes/nesta/.

[5] Stephen Becker, Jérôme Bobin, and Emmanuel J. Candès. NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sciences, 4(1):1-39, 2011. doi: 10.1137/090756855. URL https://doi.org/10.1137/090756855.

[6] Ernesto G. Birgin, José Mario Martínez, and Marcos Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196-1211, 2000. ISSN 1052-6234. doi: 10.1137/S1052623497330963. URL https://doi.org/10.1137/S1052623497330963.

[7] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, March 2004. ISBN 0521833787. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike-20&path=ASIN/0521833787.

[8] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120-145, 2011. ISSN 0924-9907. doi: 10.1007/s10851-010-0251-1. URL https://doi.org/10.1007/s10851-010-0251-1.

[9] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33-61, 1998. ISSN 1064-8275. doi: 10.1137/S1064827596304010. URL https://doi.org/10.1137/S1064827596304010.

[10] Francis Clarke. Functional Analysis, Calculus of Variations and Optimal Control, volume 264 of Graduate Texts in Mathematics. Springer, London, 2013. ISBN 978-1-4471-4819-7; 978-1-4471-4820-3. doi: 10.1007/978-1-4471-4820-3. URL https://doi.org/10.1007/978-1-4471-4820-3.

[11] Elaine T. Hale, Wotao Yin, and Yin Zhang. Fixed-point continuation (FPC), 2007. 
URL https://www.caam.rice.edu/~optimization/L1/fpc/#abs.

[12] R. Estrin and M. P. Friedlander. A perturbation view of level-set methods for convex optimization. 2018.

[13] Mário A. T. Figueiredo, Robert D. Nowak, and Stephen J. Wright. Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4):586-597, 2007.

[14] Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95-110, 1956.

[15] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal., 23(4):707-716, 1986. ISSN 0036-1429. doi: 10.1137/0723046. URL https://doi.org/10.1137/0723046.

[16] Elaine T. Hale, Wotao Yin, and Yin Zhang. A fixed-point continuation method for l1-regularization with application to compressed sensing. Technical report, 2007.

[17] Gurobi Optimization Inc. Gurobi optimizer. Software, 2012. URL http://www.gurobi.com/documentation/8.1/refman.pdf.

[18] Martin Jaggi. Revisiting Frank-Wolfe: projection-free sparse convex optimization. In ICML (1), pages 427-435, 2013.

[19] James E. Kelley, Jr. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8(4):703-712, 1960.

[20] Kwangmoo Koh, Seung-Jean Kim, and Stephen P. Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519-1555, 2007. URL http://dl.acm.org/citation.cfm?id=1314550.

[21] Yurii Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127-152, 2005. doi: 10.1007/s10107-004-0552-5. URL https://doi.org/10.1007/s10107-004-0552-5.

[22] James Renegar. A framework for applying subgradient methods to conic optimization problems. arXiv preprint arXiv:1503.02611, 2015.

[23] Michael Saunders. 
PDCO: primal-dual interior method for convex objectives, 2010. URL http://web.stanford.edu/group/SOL/software/pdco/pdco.pdf.

[24] Robert Tibshirani. Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol., 73(3):273-282, 2011. ISSN 1369-7412. doi: 10.1111/j.1467-9868.2011.00771.x. URL https://doi.org/10.1111/j.1467-9868.2011.00771.x.

[25] Ewout van den Berg and Michael P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890-912, 2008/09. ISSN 1064-8275. doi: 10.1137/080714488. URL https://doi.org/10.1137/080714488.

[26] Ewout van den Berg and Michael P. Friedlander. Sparse optimization with least-squares constraints. SIAM J. Optim., 21(4):1201-1229, 2011. ISSN 1052-6234. doi: 10.1137/100785028. URL https://doi.org/10.1137/100785028.

[27] Zaiwen Wen, Wotao Yin, Donald Goldfarb, and Yin Zhang. A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation. SIAM Journal on Scientific Computing, 32(4):1832-1857, 2010.

[28] Zaiwen Wen and Wotao Yin. FPC_AS (fixed-point continuation and active set), 2008. 
URL https://www.caam.rice.edu/~optimization/L1/FPC_AS/.