An accelerated dual method for SPGL1byAnyi BaoB.Sc., Simon Fraser University, 2017A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)April 2019c© Anyi Bao, 2019The following individuals certify that they have read, and recommend tothe Faculty of Graduate and Postdoctoral Studies for acceptance, the thesisentitled:An accelerated dual method for SPGL1submitted by Anyi Bao in partial fulfillment of the requirements for thedegree of Master of Science in Mathematics.Examining Committee:Michael P. Friedlander, Computer Science and MathematicsSupervisorYifan Sun, Computer ScienceSupervisory Committee MemberAdditional Supervisory Committee Members:Ewout van de Berg, IBM T.J. Watson Research CenterSupervisory Committee MemberiiAbstractThis thesis studies a well-known solver SPGL1, which applies a general root-finding process to solve the basis pursuit denoising problem. This processinvolves a nested loop. The outer loop is an inexact Newton root-findingprocess, and the inner loop approximately solves LASSO (least absoluteshrinkage and selection operator). We propose an accelerated dual methodto accelerate the inner loop by optimizing the dual problem of LASSO ona low-dimensional space. Experimental results show that our acceleratedmethod that can successfully reduce the total iteration counts. Our futurework is to reduce the total running time of our accelerated method.iiiLay SummaryThis thesis focuses on a widely-used solver called SPGL1, which is used forlarge-scale sparse reconstruction. This solver contains a nested loop. Wepropose an accelerated dual method to increase the speed of convergenceof the inner loop, and thus reduces the overall computational complexity ofthe solver.ivPrefaceThis thesis contains my research during my M.Sc studies at the Universityof British Columbia. The research is conducted under the supervision ofProf. Michael P. Friedlander from the University of British Columbia andDr. Ewout van de Berg from the IBM T.J. Watson Research Center.The accelerated dual method proposed in Chapter 4 came from discus-sions with Prof. Friedlander and Dr. van de Berg. The algorithm imple-mentation was done on my own.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Thesis overview and contributions . . . . . . . . . . . . . . . 31.3 Notation and key definitions . . . . . . . . . . . . . . . . . . 32 Level-set methods for convex optimization . . . . . . . . . . 52.1 The level-set approach . . . . . . . . . . . . . . . . . . . . . 52.1.1 Warm-starts . . . . . . . . . . . . . . . . . . . . . . . 92.2 Root-finding procedure . . . . . . . . . . . . . . . . . . . . . 92.2.1 Inexact Newton method . . . . . . . . . . . . . . . . 102.2.2 Oracle . . . . . . . . . . . . . . . . . . . . . . . . . . 
122.2.3 Feasibility of the solution . . . . . . . . . . . . . . . . 123 Spectral Projected Gradient for L1 minimization (SPGL1) 133.1 The dual problem of LASSO . . . . . . . . . . . . . . . . . . 133.2 Relation between value function and inexact Newton oracle . 143.3 SPGL1 for LASSO algorithm . . . . . . . . . . . . . . . . . . 163.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18viTable of Contents4 An accelerated dual method . . . . . . . . . . . . . . . . . . . 214.1 Construction of matrix B . . . . . . . . . . . . . . . . . . . . 224.1.1 Analysis of the general matrix B . . . . . . . . . . . . 244.1.2 Analysis of orthogonal matrix Q . . . . . . . . . . . . 254.2 Different solvers for dual problem . . . . . . . . . . . . . . . 254.2.1 PDCO . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2.2 Gurobi . . . . . . . . . . . . . . . . . . . . . . . . . . 264.3 Measure quantity . . . . . . . . . . . . . . . . . . . . . . . . 274.4 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . 284.5 SPGL1 with an accelerated method . . . . . . . . . . . . . . . 285 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . 305.1 Test problems generation . . . . . . . . . . . . . . . . . . . . 305.2 Numerical experiments and discussion . . . . . . . . . . . . . 305.3 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . 32Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33viiList of Tables2.1 Some typical applications of the level-set method [1]. . . . . . 65.1 Experimental results. Note that in the second problem, thedotted line in PDCO means that it is able to solve the problemwithin the maximum iteration limit. . . . . . . . . . . . . . . 31viiiList of Figures2.1 An illustration of the goal of this proof. . . . . . . . . . . . . 72.2 Plots in the first row show the first iteration of the inex-act Newton method (on the left), and the classical Newton’smethod (on the right). The inexact method use the approxi-mate function values while the classical methods compute thetrue ones. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.1 This plot illustrates the first two iterations of the oracle SPGL1for LASSO. At each iteration i, we have an upper bound gotfrom the primal value and a lower bound got from the dualvalue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.2 Conceptual map of the algorithm SPGL1 for LASSO presentedin the previous section. . . . . . . . . . . . . . . . . . . . . . . 183.3 Top: primal and dual values of LASSO computed by SPGL1for LASSO at each iteration. We observe that the dual valuedoes not monotonically decrease. Bottom: the duality gap ateach iteration, and this value does not decrease monotonicallyas well. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20ixAcknowledgementsIt is a pleasure to acknowledge the support I have gained during my master’sstudies. First, I would like to send my grate thanks to my supervisor MichaelP. Friedlander. It has been a pleasure and a privilege to be his student. Hisenergy and enthusiasm are infectious.My research has greatly benefited from insightful discussion and collab-oration with Ewout van den Berg. I also like to thank Yifan Sun for fruitfuldiscussions and constantly encouragement. They have helped me a lot inmy studies. 
The conversations with them have shaped and sharpened mythinking about optimization.My group members Halyun Jeong, Liran Li, Huang Fang, and ZhenanFan have been of much help and shown me great kindness throughout mygraduate studies.Finally, I would like to thank my family for continuous support. Es-pecially my boyfriend Chen-Chih Lai, he offered steadfast and everlastingcompanionship and support throughout my years at UBC. Without theirencouragement I could not have completed this quest.xChapter 1IntroductionThe main goal of the basis pursuit problem is to find a sparse solution of anunderdetermined system of equations Ax = b, where A ∈ Rm×n is a sensingmatrix with m  n, and b ∈ Rm is called a measurement vector. The goalis to find a sparse signal x ∈ Rn that picks out the certain columns of A tofit the measurement b. Formally, we have the following convex optimizationproblem(BP) minx∈Rn‖x‖1 subject to Ax = b.In practice, we often encounter the case of noisy measurements. Assumethat the measurement b can be decomposed into clean data and noise, i.e.,b = s+ σoz,where s represents clean data, z is a standard white Gaussian noise vector,and σo > 0 indicates the noise level. Furthermore, the clean data s can bewritten as s = Ax, then we will have σoz = b − Ax. Say that we have acontrol of the total noise in the system, i.e., ‖σoz‖2 ≤ σ, then we arrive thefollowing basis pursuit denoising (BPDN) formulation(BPσ) minimizex∈Rn‖x‖1 subject to ‖Ax− b‖2 ≤ σ.This can be interpreted as minimizing a regularizer subject to a prescribedlevel of data fit. The theme of this thesis is that users have a priori estimateon the noise levels inherent in the measurements, and thus the (BPσ) is anatural formulation to this theme.With the help of the Lagrange multiplier, we can convert the constrainedproblem (BPσ) into an unconstrained problem. This yields another variantof the BPDN problem: the penalized least-square problem(QPλ) minimizex∈Rn‖Ax− b‖2 + λ‖x‖1.This is also the original formulation of basis pursuit denoise problemthat was proposed by Chen, Donoho, and Saunders [9]. From a modeling11.1. Related workperspective, this formulation establishes the tradeoff between reconstructionaccuracy and the signal sparsity using a positive penalty parameter λ.A third formulation is the least absolute shrinkage and selection operator(LASSO) [24](LSτ ) minimizex∈Rn‖Ax− b‖2 subject to ‖x‖1 ≤ τ.For appropriate choices of τ, σ, and λ, the solutions to these three prob-lems are identical, and these three formulations are in this sense equivalent.However, except for special cases– such as A orthogonal– we often don’t havea prior knowledge of the parameters that make these problems equivalent[25].1.1 Related workThere are some optimization algorithms to solve (BPσ), (QPλ), and (LSτ ).This section briefly discusses some algorithms to solve each formulation.We first present the algorithms to solve (QPλ). Figueiredo et al. [13] pro-pose GPSR algorithm to solve (QPλ). This method re-formulates the prob-lem to a bound-constrained quadratic problem, and then use the gradient-projection method with Barzilai-Borwein (BB) steps to solve it. In the sameyear, Kim et al. [20] present an interior-point method for this the penalizedleast-square problem. Hale, Yin and Zhang [11, 16] propose the fixed pointcontinuation algorithm (FPC). This method is based on operator splitting. 
Itfirst splits the optimality condition into the sum of two maximal monotoneoperators, T1, which is gradient of least-square loss, and T2, which is thesoft-thresholding. It then expresses the optimal solution in terms of thesetwo operators, and it leads to the forward-backward splitting algorithm forfinding a zero of T1 + T2. Wen et al. [27, 28] improve the performance ofFPC by adding an active-set step (FPC AS).Next, we present several algorithms to solve the formulation (BPσ).Becker, Bobin and Candes [4, 5] proposed NESTA, it is based on Nesterov’sa smoothing technique and an accelerated first-order algorithm [21]. Thelevel-set framework was first used by van den Berg and Friedlander [25]in 2008 for solving BPDN using a sequence of LASSO subproblems, andthe theory was implemented in the software package called SPGL1. The bigsuccess of SPGL1 motivated them to extend this framework to a more gen-eralized problem class: gauge optimization [26]. The theory of level-set wasformalized and applied to the general convex programming by Aravkin etal. in 2016 [1]. We will give a summary of this method in Chapter 2.21.2. Thesis overview and contributions1.2 Thesis overview and contributionsThe main contribution of this thesis is to develop a new accelerated methodfor solving LASSO in SPGL1. In the original method, LASSO is solved itera-tively until the duality gap converges. To achieve that, at each iteration, thedual solutions are computed directly from the primal solutions. Our newmethod is to actually optimize the dual problem separately on a reducedspace, and this reduced space is constructed based on the primal feasiblepoints. Experimental results show this accelerated method could success-fully reduce the number of iteration to solve LASSO.The rest of the thesis is structured as follows. In Section 1.3, we givesome convex analysis background to understand our algorithms and theo-rems. Chapter 2 provides a description of the intuition behind the theme ofconnecting (BPσ) and (LSτ ). It gives a general theory called the level-setmethod for exchanging the roles of the objective and constraints, and insteadapproximately solving a sequence of level-set problems. In the chapter 3, wego back to look at level set techniques in the context of the BPDN problem.We describe the SPGL1 algorithm in details, as well as describe the challengeof the current algorithm. Chapter 4 introduces and gives details on a newvariant of SPGL1. In Chapter 5, we show a series of experimental results thatusing our new method is numerically comparable with the original SPGL1 insolving LASSO. Finally, in Chapter 6 we outline some directions for futureresearch.1.3 Notation and key definitionsBefore we go into the main content, this section provides notation and thedefinitions of terminologies that we use throughout the thesis.We use ‖x‖p = (∑ |xi|p)1/p to denote the p-norm of a vector x. Unlessotherwise specified, we adapt the default norm ‖ · ‖ = ‖ · ‖2. We use thesymbol e to represent a vector of all ones.Now we review some fundamental concepts in convex analysis. A set Cis convex if the line segment between any two points in the set C is also inC. That is, for any x, y ∈ C, with θ ∈ [0, 1], we have θx + (1 − θ)y ∈ C.We consider all functions are on the extended real line R∪ {∞}. A domainof a function f is dom f = {x | f(x) <∞} . The indicator function δ on aconvex set C is defined asδC(x) ={0 x ∈ C+∞ otherwise.31.3. 
Notation and key definitionsWe call a function f convex if and only if its epigraph, which is the setepi f ={(x, r) ∈ Rn+1, f(x) ≤ r} ⊆ Rn+1,is convex.4Chapter 2Level-set methods for convexoptimizationHow is (LSτ ) related to (BPσ)? It is not hard to observe that for properchoices of τ and σ, one can be “transformed” to the other by simply swap-ping the objective function and the constraints. In practice, we often en-counter the case when we have to minimize an objective function subjectto a relatively much more complicated and difficult constraint function. Toremedy this problem, Aravkin, Burke, Drusvyatskiy, Friedlander, and Roy[1] propose a level-set method that exchanges the objective and the difficultconstraint, and then uses the easier flipped problem to solve the original.We have the following pair of problems in the general setting(Pσ) minimizex∈Xφ(x) subject to ρ(Ax− b) ≤ σ,and(Qτ ) minimizex∈Xρ(Ax− b) subject to φ(x) ≤ τ,where X is a closed convex set, and the functions φ, ρ are closed and convex.Such pairs of problems are very common in contemporary optimization.One example is the (BPσ) and (LSτ ). In Table 2.1 (borrowed from [1]), welist some other well-known applications which include gauge optimization,matrix completion, and robust elastic net regularization, etc.2.1 The level-set approachThe level-set approach states that the solution of (Pσ) can be found byexecuting a root-finding procedure on the nonlinear equationv(τ) = σ, (2.1)52.1. The level-set approachApplications Pσ QτBPDN minx‖x‖1 minx‖Ax− b‖2s.t. ‖Ax− b‖2 ≤ σ s.t. ‖x‖1 ≤ τgauge optimization minxκ(x) minxγ(Ax− b)(κ, γ: gauge func) s.t. γ(Ax− b) ≤ σ s.t. κ(x) ≤ τmatrix minX‖X‖∗ minX‖AX − b‖2completion s.t. ‖AX − b‖2 ≤ σ s.t. ‖X‖∗ ≤ τrobust minxα‖x‖1 + β‖x‖2 minx‖Ax− b‖2elastic-net s.t. ‖Ax− b‖2 ≤ σ s.t. α‖x‖1 + β‖x‖2 ≤ τTable 2.1: Some typical applications of the level-set method [1].where v(τ) is the value function of (Qτ ). This v(τ) is an univariate functiondefined byv(τ) := minx∈X{ρ(Ax− b) | φ(x) ≤ τ}.It is called the value function of (Qτ ) since it gives the optimal value of (Qτ )for a given parameter τ . In order to understand the level-set approach, wefirst study some important properties of this value function.Proposition 2.1.1. Given a convex set X , and two closed convex functionsρ and φ, the univariate function defined byv(τ) = minx∈X{ρ(Ax− b) | φ(x) ≤ τ},is non-increasing and convex.Proof. As the feasible region increases as we increase τ , the minimizer ina larger feasible set must be no bigger than the one attained in a smallerfeasible set. Therefore, the value function v(τ) is non-increasing.We now show the convexity of v(τ). We first recast v(τ) intov(τ) = minx{ρ(Ax− b) + δepi(φ)(x, τ) + δX (x)}.62.1. The level-set approachLet f(x, τ) := ρ(Ax − b) + δepi(φ)(x, τ) + δX (x). This function f is jointlyconvex in (x, τ) because it is the sum of three convex functions. In moredetail, the first function ρ(Ax − b) is convex in x as given. We know thatthe indicator function of a set is convex if and only if the set is convex [10,Exercise 2.18]. This fact yields the convexity of the third function δX (x)as X is convex. Moreover, since the function φ is convex, its epigraph isconvex by the definition of a convex function. Hence, the third function isconvex in (x, τ). Finally, since f(x, τ) is convex in both variables, its infimalprojection v(τ) is convex [7, Equation 3.16].Let Poptσ denote the optimal value of the problem (Pσ) given a parameterσ. 
With a mild assumption that the constraint ρ(Ax− b) ≤ σ is active (i.e.,satisfied with equality) at the optimal solution of Pσ, the following theoremshows that the value τ∗σ := Poptσ is the smallest τ that satisfiesv(τ∗σ) = σ.Proposition 2.1.2. Let us denote by Poptσ the optimal value of the problem(Pσ) for a given σ > 0. For that given σ, assume there exists a τ oσ such thatv(τ oσ) = σ, and τoσ = min{τ : v(τ) = σ}.Then τ oσ = Poptσ .Proof. Knowing that v(τ) is non-increasing and convex, we might have thecase as illustrated below. Our goal is to show that two points τ oσ and τ∗σcoincide.Figure 2.1: An illustration of the goal of this proof.72.1. The level-set approachWe first show that τ oσ ≤ τ∗σ . Let x∗ be a minimizer to (Pσ), i.e.,x∗ ∈ argminx∈X{φ(x) | ρ(Ax− b) ≤ σ},then φ(x∗) = τ∗σ . Thus,v(τ∗σ) = minx∈X{ρ(Ax− b) | φ(x) ≤ τ∗σ}(i)≤ ρ(Ax∗ − b)(ii)≤ σ = v(τ oσ),where (i) comes from the fact v(τ∗σ) is the minimal value, and (ii) is becausethat x∗ is feasible to (Pσ). Moreover, since v(τ) is a nonincreasing functionin τ , we conclude thatτ oσ ≤ τ∗σ .Conversely, we want to prove τ oσ ≥ τ∗σ . Since v(τ oσ) = σ, let xo be aminimizer that attained that minimal valuexo ∈ argminx∈X{ρ(Ax− b) | φ(x) ≤ τ oσ},then xo satisfiesρ(Axo − b) = σ.That means xo is a feasible point to (Pσ). Therefore, we haveτ∗σ(i)≤ φ(xo)(ii)≤ τ oσ ,where (i) comes from the fact τ∗σ is the minima of (Pσ), and (ii) is by theconstraint in v(τ oσ). This completes our proof of both sides.An alternative proof can be shown using the strong duality [12].This theorem provides a natural and intrinsic characterization of thelevel-set method. Finding the optimal value of (Pσ) is equivalent to findinga root τ in the equation (2.1). Moreover, it immediately follows from thetheorem that for all τ ∈ (0, τ∗σ), v(τ) will return an optimal solution x thatsatisfiesφ(x) ≤ Poptσ = τ∗σ and ρ(Ax− b) ≤ σ + , (2.2)where  > 0. A point x that satisfies (2.2) is called super-optimal and-infeasible, respectively.82.2. Root-finding procedure2.1.1 Warm-startsWe thus reduce the problem (Pσ) to solve (2.1) effectively for a sequence ofmonotonically increasing τ . One practical approach is to solve it iteratively,by using the approximate solution xτk−1 from the previous iteration withτk−1 as a warm-start to v(τk). We notice this warm-start is practicablebecause the approximate minimizer xτk−1 from the last iteration v(τk−1) isindeed a feasible point in the next iteration for v(τk). That is becauseφ(xτk−1) ≤ τk−1 ≤ τk,where the second inequality is because we take a monotonically increasingsequence {τk}k=1. With the warm-start, the overall computational complex-ity is much smaller than kC, where k is the number of root-finding stepsand C is the cost of evaluating v(τk) in each step.2.2 Root-finding procedureThe core idea of this section is to present a way to solve the root-findingproblem arising in the level-set approach. Throughout this section, we fixone particular σ, and let w(τ) := v(τ)− σ, and denote τ∗ the root of w(τ).Each algorithm we describe below is dependent on one oracle; thus theoverall computational complexity of the algorithm relies on the oracle aswell. It is important to note that both algorithms terminate when0 ≤ w(τ) ≤ ,where  > 0 is a preassigned tolerance. This yields a point τ ≤ τ∗ thatproduces a super-optimal and -infeasible solution x satisfying (2.2).We discuss a root-finding method called the inexact Newton method. Itis inspired from the classical Newton’s method. 
The difference is that it isusing an approximate function value at each iteration. It is illustrated inFigure 2.2.This inexact Newton method has a nested loop. The inner loop is alsoan iterative process that evaluates the value function v(τ) approximately,and this iterative process is provided by an oracle. This oracle takes in a τk,and outputs a lower bound `k and a upper bound uk that are relatively closeto each other, i.e., uk/`k ≤ α, where α ∈ (1, 2) is a user-input constant. Italso outputs a slope sk, combined with `k, it gives a lower minorant to thefunction v(τk). The outer loop then uses the oracle outputs to compute thenext iterate τk+1. We present below a pseudocode of this inexact approach.92.2. Root-finding procedureAlgorithm 1: Pseudocode of inexact Newton method.Input : τ0[outer loop]1 for k = 1, 2, . . . do[inner loop: evaluate v(τ) approximately]2 (`k, uk, sk) = O(τk)[end inner loop]3 update τk[end outer loop]4 return τkFigure 2.2: Plots in the first row show the first iteration of the inexactNewton method (on the left), and the classical Newton’s method (on theright). The inexact method use the approximate function values while theclassical methods compute the true ones.2.2.1 Inexact Newton methodThe proposed inexact Newton’s method works similarly as the standardNewton’s method. Rather than evaluating the exact w′(τ) at each iteration,it constructs an affine minorant w. The formal algorithm 2 and the definitionof the oracle are presented as follows.102.2. Root-finding procedureAlgorithm 2: Inexact Newton method [1]Input : w : R+ → R: a decreasing convex function;Onew,w: an affine minorant oracle; > 0: accuracy;τ0: initial point where f(τ0) > 0α ∈ (1, 2): a constantOutput: τ∗1 u−1 ← +∞2 k ← 03 while uk−1 >  do4 (`k, uk, sk)← Onew,w(τk, α)[Update τ]5 uk ← min{uk, uk−1}6 τk+1 ← τk − `k/sk[end of the update]7 k ← k + 18 end9 return τkDefinition 1 (Affine minorant oracle [1]). For a function w : R+ → R,an affine minorant oracle is a mapping Onew,w that assigns to each pair(τ, α) ∈ [w > 0]× [1,+∞) real numbers (`, u, s) such that 0 < ` ≤ w(τ) ≤ uand u/` ≤ α, and the affine function τ ′ → ` + s(τ ′ − τ) globally minorizesw.The next theorem states the global convergence for this method.Theorem 2.2.1 (Linear convergence of the inexact Newton method [1]).The inexact Newton method terminates after at mostk ≤ max{1 + log2/α(2C/), 2}iterations, where C = max{|s0|(τ∗ − τ1), `0}.The total complexity of the inexact Newton is at mostCOnew,w × k,where COnew,w is the cost of the Newton oracle Onew,w. This total complexitycan be reduced intensively is we use warm-starts.112.2. Root-finding procedure2.2.2 OracleThere are many first-order methods that might serve as an oracle for eval-uating w(τ). For example, Frank-Wolfe [14, 18], also known as conditionalgradient method, provides a global minorant to the function value v, and theapproximate derivative at each point. These can be used as lower bound` and approximate derivative s. Other examples include some projectedsubgradient [2], cutting-plane [19], and primal-dual [8] methods.2.2.3 Feasibility of the solutionIn Sections 2.2.1, we introduce the inexact Newton method, and state thatthe method output a super-optimal and -infeasible solution x. To attainthe feasibility, we require to project it over the constraint set F := {x ∈X | ρ(Ax− b) ≤ σ}. In general, there is no efficient approach for computingthis projection because of the composite structure of φ. 
To circumvent theproblem of the feasibility of the solution, one inexpensive approach is toperform a radial-projection. This method requires to the computation of apoint that is strictly feasible for the constraint of the problem [22].12Chapter 3Spectral Projected Gradientfor L1 minimization (SPGL1)In the previous chapter, we introduced the level-set method that can beapplied to solve the basis pursuit denoising problem by solving a sequenceof LASSO problems. It then becomes clear that we require to an effectiveoracle (affine minorant oracle for Newton method) to solve LASSO in orderfor this method to work. However, solving the problem exactly is expensiveand may not be necessary.In 2008, van de Berg and Friedlander [25] propose the spectral projectedgradient for `1-minimization (SPGL1). The SPGL1 is able to solve two typesof problems. The first type is LASSO. SPGL1 contains an iterative oraclethat provides an approximate solution to LASSO. This oracle is namedSPGL1 for LASSO. The second type is BPDN. SPGL1 is using the SPGL1for LASSO with the inexact Newton method to solve BPDN. The rest ofthe thesis will be focused primarily on SPGL1 for LASSO.The roadmap is the following. First, in Section 3.1, we describe the dualproblem of LASSO, and the necessary and sufficient optimality conditionsfor the primal-dual solution. From the previous chapter, we know that theLASSO problem can be characterized by a value function. In Section 3.2, westate the differentiable property of this value function. More importantly,we disclose how this property related to the inexact Newton’s oracle wementioned in Section 2.2.1. Next, we outline the algorithm of the oracleSPGL1 for LASSO. Finally, we discuss the challenge of the current algorithm.See also Figure 3.2 for a conceptual map of the notions and results presentedin this chapter.3.1 The dual problem of LASSOWe can write (LSτ ) equivalently as(LSτ ) minimizex,r‖r‖2 subject to b−Ax = r, ‖x‖1 ≤ τ,133.2. Relation between value function and inexact Newton oraclewhich has the dual problem(DLSτ ) maximizey,λbT y − τλ subject to ‖y‖2 ≤ 1, ‖AT y‖∞ ≤ λ.Denote the solution to primal problem as xτ , and the dual solutions as yτand λτ . Feasible dual variables y and λ can be computed directly from afeasible primal variable r [25],y =r‖r‖2 , λ = ‖AT y‖∞. (3.1)The second equality in 3.1 comes from the tightness of bound in (DLSτ ). Ifthe bound is not tight, we can always choose λ = ‖AT y‖∞ to reduce thedual objective value in (DLSτ ).In addition to these relations, we also have the following the necessaryand sufficient optimality conditions [25] for the primal-dual solution:Ax+ r = b, ‖x‖1 ≤ τ, (primal feasibility)‖AT r‖∞ ≤ λ‖r‖2, (dual feasibility),λ(‖x‖ − τ) = 0, (complementarity).3.2 Relation between value function and inexactNewton oracleIn the last chapter, we mentioned that BPDN can be solved by iterativelyperforming the left-most root-finding procedure on (2.1), where v(τ) is thenon-increasing and convex value function of LASSO,v(τ) := minx∈Rn{‖Ax− b‖ | ‖x‖1 ≤ τ}, τ ≥ 0.More specifically, if we choose to use the inexact Newton’s method to dothe root-finding, we require an oracle that provides us the upper bound uk,lower bound `k, and an approximate derivative sk to the value function v(τk)at each iteration k. From this point onward, we let k denote the index foriterative inexact Newton method.Let i denote the index for the iterative oracle SPGL1 for LASSO. 
For a given τ_k, an approximate solution x^i_k of (LSτ) gives an upper bound

u^i_k = ‖r^i_{τk}‖2 ≥ ‖r_{τk}‖2 > 0.

On the other hand, weak duality yields a lower bound

ℓ^i_k = bᵀy^i_{τk} − τ_k λ^i_{τk} ≤ ‖r_{τk}‖2 ≤ ‖r^i_{τk}‖2.

An illustrative figure is shown below.

Figure 3.1: This plot illustrates the first two iterations of the oracle SPGL1 for LASSO. At each iteration i, we have an upper bound obtained from the primal value and a lower bound obtained from the dual value.

Moreover, the value function is differentiable, and its derivative is related to the dual variable. This is stated in the following result.

Theorem 3.2.1 ([25]). Let v(τ) be the value function of LASSO. Then for all τ ∈ (0, τσ), v is continuously differentiable, with v′(τ) = −λτ, where the optimal dual variable λτ = ‖Aᵀyτ‖∞ and yτ = rτ/‖rτ‖2.

With this relation and (3.1), approximate values of u^i_k, ℓ^i_k, and s^i_k can be computed as

u^i_k = ‖r^i_{τk}‖2,  ℓ^i_k = (bᵀr^i_{τk} − τ_k‖Aᵀr^i_{τk}‖∞)/‖r^i_{τk}‖2,  s^i_k = −‖Aᵀr^i_{τk}‖∞/‖r^i_{τk}‖2.

Consider the duality gap

δ^i_{τk} = u^i_k − ℓ^i_k,

which is always nonnegative because of weak duality. This quantity can be used to measure the accuracy of the approximate values. That is,

u^i_k − v(τ_k) < δ^i_{τk}  and  |s^i_k − v′(τ_k)| < γ δ^i_{τk},

where γ is a positive constant related to the condition number of A.

The SPGL1 implementation uses a slightly different iteration update than the inexact Newton method of Chapter 2. Denote by u_k, ℓ_k, and s_k the final iterates returned by SPGL1 for LASSO for a given τ_k, and by δ_k the corresponding duality gap. In the inexact Newton method of Section 2.2.1, the update is τ_{k+1} ← τ_k − ℓ_k/s_k. The SPGL1 implementation instead uses u_k, i.e.,

τ_{k+1} = τ_k − (u_k − σ)/s_k.

Because of this choice, SPGL1 does not inherit the linear convergence stated in Theorem 2.2.1. However, it still enjoys local convergence governed by the duality gap δ_k, as described below.

Theorem 3.2.2 ([25]). Suppose that A has full rank, σ ∈ (0, ‖b‖2), and δ_k := δ_{τk} → 0 as k → ∞. Then, if τ_0 is close enough to τσ, the Newton iteration with v and v′ replaced by their approximations v̄ and v̄′ generates a sequence τ_k → τσ that satisfies

|τ_{k+1} − τσ| = γ δ_k + η_k |τ_k − τσ|,

where η_k → 0 and γ is a positive constant.

In the case where LASSO is solved exactly (i.e., δ_k = 0), we retain the superlinear convergence expected of a Newton iteration. Otherwise, the rate depends on how quickly δ_k converges to 0.

3.3
SPGL1 for LASSO algorithmAlgorithm 3: SPGL1 for LASSO [25, Algorithm 1]Input : x, τ , δOutput: xτ , rτ1 Set minimum and maximum steplengths 0 < αmin < αmax2 Set initial step length α0 ∈ [αmin, αmax] and sufficient descentparameter γ ∈ (0, 1).3 Set an integer line search history length M ≥ 1.4 Set initial iterates: x0 ← Pτ [x], r0 ← b−Ax0, g0 ← −AT r05 i← 06 begin7 δi ← ‖ri‖2 − (bT ri − τ‖gi‖∞)/‖ri‖2 [compute duality gap]8 if δi < δ then9 break [exit if duality gap is small enough]10 end11 α← αi12 begin13 x¯← Pτ [xi − αgi] [candidate line search iterate]14 r¯ = b−Ax¯ [update the corresponding residual]15 if ‖r¯‖22 ≤ maxj∈[0,min{k,M−1}] ‖ri−j‖22 + γ(x¯− xi)T gi then16 break [exit line search]17 else18 α← α/2 [decrease step length]19 end20 end21 xi+1 ← x¯, ri+1 ← r¯, gi+1 ← −AT ri+122 ∆x← xi+1 − xi, ∆g ← gi+1 − gi [update iterates]23 if ∆xT∆g ≤ 0 [update the Barzilai-Borwein steplength]24 then25 αi+1 ← imax26 else27 αi+1 ← min{αmax,max[αmin, (∆xT∆x)/(∆xT∆g)]}28 end29 i← i+ 130 end31 return xτ ← xi, rτ ← riWe now give a brief explanation about the steps in this algorithm. ThisSPGL1 for LASSO procedure is motivated from the algorithm SPG1 by Bir-173.4. Discussiongin, Martinez, and Raydan [6, Algorithm 2.1]. In step 13, we require toorthogonal project the iterates onto the one-norm feasible set, i.e.,Pτ [xi − αgi] = argminx¯{‖xi − αgi − x¯‖2 : ‖x¯‖1 ≤ τ} .The variable g = −AT (b − Ax) is the steepest descent direction of theobjective ‖Ax−b‖2 in LASSO. In order to conduct this one-norm projectioneffectively, in their paper, Van de Berg and Friedlander describe an algorithm[25, Section 4.2], whose worst-case complexity is of O(n log n).Steps 15-19 describes a nonmonotone line search procedure. This tech-nique is first proposed by Grippo, Lampariello, and Lucidi [15]. The condi-tion in step 15 allows an increase of the objective function value at each step,but requires an sufficient decrease on the maximum of objective function inevery min{k,M − 1} iterations. This nonmonotone line search is named assuch because of this property. With this sufficient decrease, this line searchthus maintains the global convergence.This nonmonotone line search is naturally combined with the Barzilai-Borwein steps (BB-steps) [3]. In the line 23-28, the BB-step is computedand is used to update the next iterate in line 13.3.4 DiscussionIn conclusion, the steps in SPGL1 for LASSO can be summarized in Figure3.2.Iterate x(i)r(i) = b− Ax(i)g(i) = − ATr(i)δ(i) = ∥r(i)∥2 −bTr(i) − τ∥g(i)∥∞∥r(i)∥2δ(i)ExitIf smallUpdate iterateLinesearch with BB-stepFigure 3.2: Conceptual map of the algorithm SPGL1 for LASSO presentedin the previous section.We observe that the current duality gap δik (ith iteration in LASSOsolving with τk) does not monotonically decrease at each iteration i. To183.4. Discussionillustrate that, we present two examples below. The first is an example forwhich the primal value decreases but the duality gap increases.Example 1. Let A = I be an identity matrix, b =(1 0)T, and τ = 1. Theprimal and dual problems areminimizex,r‖r‖2 subject to x− b = r, ‖x‖1 ≤ τ,maximizey,λbT y − τλ subject to ‖y‖2 ≤ 1, ‖y‖∞ ≤ λ.Let r1 and r2 be the first and second iterations, respectively, and ‖r2‖2 <‖r1‖2. Choose r1 =(10.25 0)Tand r2 =(0 10)T. Then their correspond-ing duality gaps areδ1 = ‖r1‖2 − (bT y1 − τλ1) = ‖r1‖2 − bT r1‖r1‖2 + ‖r1‖∞ = 19.5,δ2 = ‖r2‖2 − (bT y2 − τλ2) = ‖r2‖2 − bT r2‖r2‖2 + ‖r2‖∞ = 20 > δ1.Example 2. 
In this example, we consider a case where A ∈ R^{50×128} and b ∈ R^{50} have normally distributed entries with mean 0 and variance 1, and τ = 15. We use SPGL1 for LASSO to solve

min_x ‖Ax − b‖2  subject to  ‖x‖1 ≤ τ.

Below are the plots of the primal and dual values, and of the duality gap, at each iteration.

Figure 3.3: Top: primal and dual values of LASSO computed by SPGL1 for LASSO at each iteration. We observe that the dual value does not increase monotonically. Bottom: the duality gap at each iteration; this value does not decrease monotonically either.

Chapter 4

An accelerated dual method

In this chapter, we present a new approach that estimates the primal-dual gap more accurately. As shown in Figure 3.3, the trajectory of the dual objective fluctuates considerably over the first 30 iterations before it starts to converge. That is because the approximate dual solutions y_i, λ_i are computed directly from r_i using relation (3.1): they are dual feasible, but not dual optimal. Moreover, since they depend only on r_i, the residuals computed in previous iterations do not help the current dual objective calculation. In other words, the dual objective calculated at each iteration is independent of the others, and the one calculated at iteration i does not help with later computations. We now present an accelerated dual method that uses each residual calculation to greater advantage. It provides a more accurate estimate of the dual objective and, in addition, guarantees that the dual objective increases monotonically. As a result, we obtain a more accurate estimate of the primal-dual gap. At the end of this chapter, we give an algorithm that solves BPDN with this accelerated method.

To begin, we recast (LSτ) in the form

(LSQPτ)  minimize_{x∈Rⁿ, r∈Rᵐ} ½‖r‖2²  subject to  r = b − Ax, ‖x‖1 ≤ τ.

This formulation and (LSτ) share the same minimizer (x∗, r∗) because the map ‖r‖2 ↦ ½‖r‖2² is monotone for ‖r‖2 ≥ 0. The Lagrangian dual is also a quadratic optimization problem, stated here as a minimization of the negative dual objective:

(DLSQPτ)  minimize_{y,λ} ½yᵀy − bᵀy + τλ  subject to  ‖Aᵀy‖∞ ≤ λ.

To reduce the cost of solving the dual problem exactly, we instead optimize it over a reduced subspace. Restrict y to the range of a matrix B ∈ R^{m×p} with p ≪ m, i.e., y = Bc. The dual problem then becomes

minimize_{c,λ} ½‖Bc‖² − bᵀBc + τλ  subject to  ‖AᵀBc‖∞ ≤ λ.  (4.1)

This problem is easier than (DLSQPτ) because we search only over the reduced subspace spanned by the columns of B, rather than over the entire space Rᵐ.
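Before describing how B is chosen, note that once B is fixed, (4.1) is a small convex quadratic program in the stacked variable z = [c; λ] that any QP solver can handle. The MATLAB sketch below uses quadprog purely as a generic stand-in for the solvers discussed in Section 4.2; the function name and interface are illustrative assumptions, not part of SPGL1.

function [y, lam, dual_val] = solve_reduced_dual(A, b, B, tau)
% Solve the reduced dual (4.1) over z = [c; lambda]:
%   minimize   0.5*||B*c||^2 - b'*(B*c) + tau*lambda
%   subject to -lambda <= (A'*B)*c <= lambda   (elementwise)
[~, p] = size(B);
G = A' * B;                              % n-by-p; cheap column by column (Section 4.1.1)
H = blkdiag(B' * B, 0);                  % Hessian over [c; lambda]
f = [-B' * b; tau];                      % linear term
n = size(G, 1);
M = [ G, -ones(n, 1);                    %  G*c - lambda <= 0
     -G, -ones(n, 1)];                   % -G*c - lambda <= 0
z = quadprog(H, f, M, zeros(2*n, 1));    % generic QP solve; PDCO/Gurobi are used in Sec. 4.2
c = z(1:p);  lam = z(end);
y = B * c;                               % lift the solution back to the full space
dual_val = b' * y - 0.5 * (y' * y) - tau * lam;   % dual objective, a lower bound on the primal
end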
4.1 Construction of matrix B

The matrix B_i has p columns, where p is a preassigned window size. The first p − 2 columns are called the base set, the second-to-last column is the current residual, and the last column is the dual solution y_best that has so far attained the largest dual objective. The accelerated method is summarized in Algorithm 4.

In the first p − 2 iterations, we construct the approximate dual variables just as in standard SPGL1,

y_i = r_i  and  λ_i = ‖Aᵀy_i‖∞.

We append the residual r_i to B_i, so that at the (p − 2)-th iteration the matrix looks like

B_{p−2} = [r_1 … r_{p−2} | 0 0].

Starting from the (p − 1)-th iteration, we apply the accelerated method. It involves two steps: solve the dual problem (4.1) and update the matrix B. Two parts of B need to be updated. The base set is updated whenever we obtain a larger dual objective: if, at iteration i, the dual objective is larger than all previous dual objectives, we replace the oldest residual with the current residual r_i. This replacement is implemented with a circular buffer. In addition, we update y_best to the dual solution that achieved this larger objective and place it in the last column. These updates yield the new matrix B_{i+1}. To formalize this, the columns of B_i take the form

B_i = [r_{i1} … r_{i_{p−2}} | r_i  y_best],  for all i ≥ p − 2,

where

{i1, …, i_{p−2}} = argmin_{I, |I|=p−2} Σ_{i∈I} | ½(y*_i)ᵀy*_i − bᵀy*_i + τλ*_i |,

and (y*_i, λ*_i) are the optimal dual solutions at the i-th iteration.

Algorithm 4: An accelerated dual method for SPGL1: accelDual
Input: A, b, r_i, τ (the window size p and the state B, y_best, dualMax are maintained across calls)
1   if i = 0 then
2       y_best ← 0
3       dualMax ← −∞
4       B ← [0 … 0 | 0 0]
5   else if 1 ≤ i ≤ p − 2 then
6       y_i ← r_i, λ_i ← ‖Aᵀy_i‖∞
7       dual ← bᵀy_i − ½y_iᵀy_i − τλ_i            [dual objective from the residual]
8       δ_i ← ½r_iᵀr_i − dual                      [compute duality gap]
9       if dual > dualMax then dualMax ← dual, y_best ← y_i
10      B(i) ← r_i                                 [add residual to B]
11  else if i ≥ p − 1 then
12      B(p − 1) ← r_i                             [add current r_i to second-to-last column]
13      B(p) ← y_best                              [add y_best to last column]
14      y_acc ← qpsolver(B)                        [solve the reduced dual problem (4.1)]
15      λ_acc ← ‖Aᵀy_acc‖∞
16      dual ← −½y_accᵀy_acc + bᵀy_acc − τλ_acc
17      δ_i ← ½r_iᵀr_i − dual                      [compute duality gap]
18      if dual > dualMax then
19          B(mod(i, p − 2)) ← r_i                 [update the base set in B via the circular buffer]
20          y_best ← y_acc                         [update best dual solution]
21          dualMax ← dual
22      end
23  end
24  return δ_i, y_i  [with y_i := y_acc in the accelerated phase]

This construction has two main advantages. First, it guarantees a monotonic increase of the dual objective: the dual objective obtained at the i-th iteration via B_i is at least as large as the largest dual objective obtained in previous iterations. This is because y_best occupies the last column of B_i. When we solve (4.1) with B_i, the vector

c = [0 … 0 | 0 1]

is always feasible, and choosing it gives y = B_i c = y_best, which attains the largest dual objective of the first i − 1 iterations. Hence the dual objective cannot decrease. Second, the construction guarantees that the dual objective obtained from the accelerated method is at least as good as the one obtained from the original SPGL1 at each iteration. Recall that the original SPGL1 only produces the feasible dual variable y_i = r_i at iteration i. The matrix B_i, however, contains r_i in its second-to-last column, so when we optimize (4.1) over B_i we always have the feasible solution

c = [0 … 0 | 1 0],  with y = B_i c = r_i.

Thus the objective of (4.1) evaluated at this y is no smaller than the optimal value of (4.1); equivalently, the dual bound produced by the accelerated method is at least as large as the one computed from r_i alone. That is,

½‖y‖² − bᵀy + τ‖Aᵀy‖∞ ≥ OPT,

where OPT denotes the optimal value of (4.1).

There are two potentially expensive components in (4.1). One arises in the constraint, where we must form the matrix-matrix product AᵀB_i; the other is the matrix-vector product B_i c in the objective. We argue, in the following two sections, that by using different matrices in (4.1), one can reduce the cost of one component, but not both.

4.1.1 Analysis of the general matrix B

If we use the matrix B_i constructed as above, we can avoid most of the work in the product AᵀB_i. Treating A as an operator, AᵀB_i is simply a sequence of matrix-vector products with the columns of B_i:

AᵀB_i = [Aᵀr_{i1} … Aᵀr_{i_{p−2}} | Aᵀr_i  Aᵀy_best].

Since Aᵀr_i is already computed at each iteration i, the only new column we need to compute is Aᵀy_best. Thus, constructing AᵀB_i is inexpensive; the sketch below illustrates the bookkeeping. Moreover, the storage cost of [Aᵀr_{i1} … Aᵀr_{i_{p−2}} | Aᵀr_i] is roughly n(p − 1).
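A minimal MATLAB sketch of this column-wise caching is given below; the function, its arguments, and the flags are hypothetical names introduced for illustration, not part of SPGL1.

function AtB = update_AtB(AtB, Atr, A, ybest, refresh_best, col)
% Update the cached product A'*B one column at a time.
% AtB          : n-by-p cache of A'*B from the previous iteration
% Atr          : A'*r_i, already available from the SPGL1 gradient computation
% refresh_best : true when y_best changed and its column must be recomputed
% col          : base-set column to overwrite, or [] if the base set is unchanged
p = size(AtB, 2);
AtB(:, p-1) = Atr;              % second-to-last column always holds A'*r_i
if refresh_best
    AtB(:, p) = A' * ybest;     % the only genuinely new matrix-vector product
end
if ~isempty(col)
    AtB(:, col) = Atr;          % base set absorbed r_i; reuse the cached product
end
end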
This storage cost is small compared to the computationcost of ATBi.We now focus on how this affects to the quadratic part in the dualobjective in 4.1. The quadratic part is ‖Bic‖2 = cTBTBc, which requires tocompute the Hessian BTB at each iteration.244.2. Different solvers for dual problem4.1.2 Analysis of orthogonal matrix QWe apply a reduced-QR factorization to B to get an orthogonal matrix Q(i.e., B = QR). Using this Q helps us to reduce the quadratic part in 4.1 tominc,λ12‖c‖2 − bTQc+ τλ subject to ‖ATQc‖∞ ≤ λ,since QTQ = I.This orthogonal matrix Q leads to an identity Hessian, but we need tocalculate ATQ.4.2 Different solvers for dual problemSince our dual problem (4.1) is a quadratic problem, we present two solversfor this QP. We give a description of each solver, as well as present how toformulate our problem into a form that accepted by the solver.4.2.1 PDCOPDCO [23] is a popular free and open-source solver implemented in MATLAB.It uses a primal-dual interior method to solve a convex objective functionwith linear constraints.PDCO is able to solve problems of the following form:minx,rφ(x) +12‖D1x‖2 + 12‖r‖2 subject toAx+D2r = bbl ≤ x ≤ bur unconstrained.Since PDCO only takes in equality constraints and element-wise bound con-straints, we add two slack variables s1, s2, and our problem becomesmin12[cT λT sT1 sT2]︸ ︷︷ ︸(2n+1+p)I 0 . . . 00 . . . 00 . . . 00 . . . 0︸ ︷︷ ︸(2n+1+p)×(2n+1+p)cλs1s2+−QT bτ00T cλs1s2subject to[−ATQ −e I 0ATQ −e 0 I]︸ ︷︷ ︸2n×(2n+1+p)cλs1s2 = 0,254.2. Different solvers for dual problem−∞−∞00 ≤cλs1s2 ≤∞∞∞∞ .Consider the new variable x,x =[cT λT sT1 sT2].We now fit our problem to the PDCO formats. We put the quadratic partinto the primal regularization term 12‖D1x‖22. Here we require d1 ∈ R2n+1+pand D1 = diag(d1). Thus,φ(x) =−QT bτ00T cλs1s2 ,with primal regularization termd1 =[1 · · · 1 0 · · · 0]T ∈ R2n+1+p,where first p entries are 1, and D1 = diag(d1).4.2.2 GurobiThe Gurobi Optimizer [17] is a commercial solver designed to tackle math-ematical optimization problems including quadratic programming, linearprogramming, and mixed-integer programming, etc. It is written in C andfeatures several interfaces including MATLAB.The Gurobi MATLAB interface requires the problems to be of the followingform:minimize xTPx+ cTx+ αsubject to Mx = b (linear constraints)l ≤ x ≤ u (bound constraints).In a similar manner, we tailor our problem to fit the form withP :=I 0 . . . 00 . . . 00 . . . 00 . . . 0 ,︸ ︷︷ ︸(2n+1+p)×(2n+1+p)x :=cλs1s2 , c :=−QT bτ00 ;264.3. Measure quantityM :=[−ATQ −e I 0ATQ −e 0 I]︸ ︷︷ ︸2n×(2n+1+p), b := 0 ∈ R2n,l :=−∞−∞00 , u :=∞∞∞∞ .4.3 Measure quantityHere we use the average mutual coherence to measure the dependence ofcolumns of our final matrix Bi. The average mutual coherence v of a matrixR is defined asv(R) =1n(n− 1) max∣∣∣∣∣∣n∑i,j=1,j 6=i〈ui, uj〉∣∣∣∣∣∣ ,where n is the number of columns in R, and ui, i ∈ {1, . . . , n} is a normalizedith column inR. Notice that the formula contains a scalar 1n(n−1) , that comesfrom the fact that there are n(n− 1) inner products between n columns.We show below the expected mutual coherence for the unconstrainedproblem is approximately 1, dependent on the stepsize. Suppose all pointsare feasible, i.e., satisfies ‖x‖1 ≤ τ . Consider the unconstrained minimiza-tion problemminx12‖Ax− b‖2.The gradient descent update isxi+1 = xi − α∇f(xi) = xi − α(AT (Ax− b)) = xi + αAT ri,where ri = b−Axi. 
Then

r_{i+1} = b − Ax_{i+1} = b − A(x_i + αAᵀr_i) = r_i − αAAᵀr_i,

and hence

⟨r_{i+1}, r_i⟩ = ⟨(I − αAAᵀ)r_i, r_i⟩ ≤ ‖(I − αAAᵀ)r_i‖‖r_i‖ ≤ ‖I − αAAᵀ‖‖r_i‖²,

where the first inequality is the Cauchy-Schwarz inequality and the second follows from the definition of the matrix norm. Therefore, if α is small, then ⟨r_{i+1}, r_i⟩ ≈ ‖r_i‖². For our matrix B, the average mutual coherence of the normalized columns will therefore be roughly 1 for a small step size α.

4.4 Interpretation

Now consider y = Bc; the dual problem becomes

min_{c,λ} ½‖Bc‖² − bᵀBc + τλ  subject to  −λe ≤ AᵀBc ≤ λe.

The “reduced primal” problem corresponding to this dual is

min_x ½‖Bᵀ(Ax − b)‖2²  subject to  ‖x‖1 ≤ τ.

If B has orthonormal columns, this can be interpreted as projecting the columns of A (and the vector b) onto the subspace spanned by the columns of B.

4.5 SPGL1 with an accelerated method

We present a pseudocode for solving BPDN (BPσ) using SPGL1 with the accelerated dual method.

Algorithm 5: Solve BPDN using SPGL1 with the accelerated dual method
Input: A, b, σ, tol
1   τ_0 ← 0
    [Outer loop: root-finding]
2   for k = 1, 2, … do
        [Inner loop: solve LASSO]
3       while δ_i > tol do
4           compute x_i
5           r_i ← b − Ax_i
            [accelerated dual method: Algorithm 4]
6           (δ_i, y_i) ← accelDual(A, b, r_i, τ_k)
7           i ← i + 1
8           compute x_{i+1} using the nonmonotone line search and BB step
9       end
        [end of inner loop: output u = ‖r_i‖2, s = −‖Aᵀy_i‖∞]
10      u_k ← u, s_k ← s
11      update τ_{k+1} ← τ_k − (u_k − σ)/s_k
12  end
13  return τ_k

Chapter 5

Experimental results

All numerical experiments in this chapter were performed in MATLAB 2017b on a MacBook Pro with a 2.8 GHz Intel Core i7 CPU and 16 GB of memory. The specialized solvers against which we compare our results were kindly provided by their authors.

5.1 Test problems generation

In our experiments, we focus on hard test cases in which the original SPGL1 for LASSO requires over 1000 iterations. The difficulty of a problem depends on the choice of τ: hard problems typically occur when τ is very close to the true root τ*_σ. To generate hard problems, we first choose a value of σ and solve BPDN using SPGL1. This gives an approximate τ that is very close to the true root τ*_σ. We then fix τ at a nearby value for which the original SPGL1 for LASSO requires a large number of iterations.
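A sketch of this setup under the stated assumptions (Gaussian A and b, τ fixed near the root located by a preliminary BPDN solve) is shown below. The names spg_bpdn and spg_lasso stand in for the corresponding SPGL1 entry points and may differ from the actual interface; the choice of σ is illustrative.

% Generate a hard LASSO instance (a sketch; spg_bpdn and spg_lasso are
% stand-ins for the SPGL1 entry points and may differ from the actual API).
rng(0);                           % reproducibility
m = 256;  n = 1024;
A = randn(m, n);                  % Gaussian sensing matrix
b = randn(m, 1);                  % Gaussian measurement vector
sigma = 0.1 * norm(b);            % illustrative noise level

x   = spg_bpdn(A, b, sigma);      % preliminary BPDN solve to locate the root
tau = round(norm(x, 1), 1);       % fix tau near tau_sigma (cf. tau = 11.1 in Table 5.1)

xt  = spg_lasso(A, b, tau);       % this LASSO instance is now slow for plain SPGL1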
5.2 Numerical experiments and discussion

Table 5.1 presents the numerical results. In this table, the first column describes the problem; both the sensing matrix A and the measurement vector b have normally distributed random entries with mean 0 and variance 1. The second column shows the number of iterations each method used. For each problem, the first row is the original SPGL1 for LASSO, which we use as the benchmark; the second row is the hybrid SPGL1 for LASSO; the remaining rows show our accelerated method with different quadratic programming solvers. The maximum number of iterations was set to 10^6. The last two columns give the total running time of each method; for the accelerated method, we also report the time spent inside the QP solver.

As shown in the table, our accelerated method successfully reduces the iteration count relative to the original SPGL1 for LASSO. In particular, in the second problem, our method with the Gurobi QP solver reduces the iteration count from 42,409 to 15,051, nearly a factor of three.

In general, our accelerated dual method outperforms the original SPGL1 for LASSO in terms of iteration count. However, it increases the overall running time.

Problem            Method      No. of iterations   Total time (sec)   Solver time (sec)
A: 512 × 1024      original    9672                9
b: 512 × 1         PDCO        3590                829.9              806.6
τ = 22             Gurobi      3191                579.3              168.5
A: 256 × 1024      original    42409               20.0
b: 256 × 1         PDCO        ---                 ---                ---
τ = 11.1           Gurobi      15051               2958.6             838.1
A: 256 × 1024      original    5588                2.9
b: 256 × 1         PDCO        3872                889.5              873.4
τ = 10.9           Gurobi      3642                658.1              185.1
A: 256 × 1024      original    2168                1.0
b: 256 × 1         PDCO        1987                466.9              458.6
τ = 10.7           Gurobi      1970                351.6              101.7
A: 512 × 768       original    25763               15
b: 512 × 1         PDCO        14519               2473.6             2402.8
τ = 28             Gurobi      14205               1781.3             608.2

Table 5.1: Experimental results. Note that in the second problem, the dashed entries for PDCO indicate that it is unable to solve the problem within the maximum iteration limit.

5.3 Future directions

We observe from the experimental results that our accelerated method increases the overall running time. This suggests that future work should focus on reducing the per-iteration cost. There are two possible directions. One is to find a cheaper QP solver, such as an active-set method, or to use ADMM (the alternating direction method of multipliers) to solve the QP subproblem. The other interesting direction is to reduce the frequency with which the accelerated method is invoked. Currently we invoke the accelerated method at every iteration to solve the dual problem; we could instead invoke it only every few iterations, or frequently at first and less frequently in later iterations.

Bibliography

[1] Aleksandr Y. Aravkin, James V. Burke, Dmitry Drusvyatskiy, Michael P. Friedlander, and Scott Roy. Level-set methods for convex optimization. Mathematical Programming, pages 1–32.

[2] Francis Bach. Duality between subgradient and conditional gradient methods. SIAM Journal on Optimization, 25(1):115–129, 2015.

[3] Jonathan Barzilai and Jonathan M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8(1):141–148, 1988. ISSN 0272-4979. doi: 10.1093/imanum/8.1.141. URL https://doi.org/10.1093/imanum/8.1.141.

[4] Stephen Becker, Jérôme Bobin, and Emmanuel J. Candès. NESTA: A fast and accurate first-order method for sparse recovery, 2009. URL https://statweb.stanford.edu/~candes/nesta/.

[5] Stephen Becker, Jérôme Bobin, and Emmanuel J. Candès. NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sciences, 4(1):1–39, 2011. doi: 10.1137/090756855. URL https://doi.org/10.1137/090756855.

[6] Ernesto G. Birgin, José Mario Martínez, and Marcos Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196–1211, 2000. ISSN 1052-6234. doi: 10.1137/S1052623497330963. URL https://doi.org/10.1137/S1052623497330963.

[7] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, March 2004. ISBN 0521833787. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike-20&path=ASIN/0521833787.

[8] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011. ISSN 0924-9907. doi: 10.1007/s10851-010-0251-1. URL https://doi.org/10.1007/s10851-010-0251-1.

[9] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci.
Comput., 20(1):33–61, 1998. ISSN 1064-8275. doi: 10.1137/S1064827596304010. URLhttps://doi.org/10.1137/S1064827596304010.[10] Francis Clarke. Functional analysis, calculus of variations andoptimal control, volume 264 of Graduate Texts in Mathematics.Springer, London, 2013. ISBN 978-1-4471-4819-7; 978-1-4471-4820-3. doi: 10.1007/978-1-4471-4820-3. URL https://doi.org/10.1007/978-1-4471-4820-3.[11] Yin Zhang Elaine T. Hale, Wotao Yin. Fixed-point continuation (fpc),2007. URL https://www.caam.rice.edu/~optimization/L1/fpc/#abs.[12] R. Estrin and M. P. Friedlander. A perturbation view of level-set meth-ods for convex optimization. 2018.[13] Ma´rio AT Figueiredo, Robert D Nowak, and Stephen J Wright. Gra-dient projection for sparse reconstruction: Application to compressedsensing and other inverse problems. IEEE Journal of selected topics insignal processing, 1(4):586–597, 2007.[14] Marguerite Frank and Philip Wolfe. An algorithm for quadratic pro-gramming. Naval research logistics quarterly, 3(1-2):95–110, 1956.[15] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line searchtechnique for Newton’s method. SIAM J. Numer. Anal., 23(4):707–716,1986. ISSN 0036-1429. doi: 10.1137/0723046. URL https://doi.org/10.1137/0723046.[16] Elaine T Hale, Wotao Yin, and Yin Zhang. A fixed-point continuationmethod for l1-regularization with application to compressed sensing.Technical report, 2007.[17] Gurobi Optimization Inc. Gurobi optimizer. software, 2012. URL http://www.gurobi.com/documentation/8.1/refman.pdf.[18] Martin Jaggi. Revisiting frank-wolfe: Projection-free sparse convexoptimization. In ICML (1), pages 427–435, 2013.34Bibliography[19] James E Kelley, Jr. The cutting-plane method for solving convex pro-grams. Journal of the society for Industrial and Applied Mathematics,8(4):703–712, 1960.[20] Kwangmoo Koh, Seung-Jean Kim, and Stephen P. Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journalof Machine Learning Research, 8:1519–1555, 2007. URL http://dl.acm.org/citation.cfm?id=1314550.[21] Yurii Nesterov. Smooth minimization of non-smooth functions. Math.Program., 103(1):127–152, 2005. doi: 10.1007/s10107-004-0552-5. URLhttps://doi.org/10.1007/s10107-004-0552-5.[22] James Renegar. A framework for applying subgradient methods toconic optimization problems. arXiv preprint arXiv:1503.02611, 2015.[23] Michael Saunders. Pdco: Primaldual interior method for convex objec-tives, 2010. URL http://web.stanford.edu/group/SOL/software/pdco/pdco.pdf.[24] Robert Tibshirani. Regression shrinkage and selection via the lasso: aretrospective. J. R. Stat. Soc. Ser. B Stat. Methodol., 73(3):273–282,2011. ISSN 1369-7412. doi: 10.1111/j.1467-9868.2011.00771.x. URLhttps://doi.org/10.1111/j.1467-9868.2011.00771.x.[25] Ewout van den Berg and Michael P. Friedlander. Probing the Paretofrontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, 2008/09. ISSN 1064-8275. doi: 10.1137/080714488. URL https://doi.org/10.1137/080714488.[26] Ewout van den Berg and Michael P. Friedlander. Sparse optimizationwith least-squares constraints. SIAM J. Optim., 21(4):1201–1229, 2011.ISSN 1052-6234. doi: 10.1137/100785028. URL https://doi.org/10.1137/100785028.[27] Zaiwen Wen, Wotao Yin, Donald Goldfarb, and Yin Zhang. A fastalgorithm for sparse reconstruction based on shrinkage, subspace op-timization, and continuation. SIAM Journal on Scientific Computing,32(4):1832–1857, 2010.[28] Wotao Yin Zaiwen Wen. Fpc as (fixed-point continuation and activeset), 2008. 
URL https://www.caam.rice.edu/~optimization/L1/FPC_AS/.