UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Bundle-type methods for dual atomic pursuit Fan, Zhenan 2019

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2019_september_fan_zhenan.pdf [ 538.46kB ]
Metadata
JSON: 24-1.0380495.json
JSON-LD: 24-1.0380495-ld.json
RDF/XML (Pretty): 24-1.0380495-rdf.xml
RDF/JSON: 24-1.0380495-rdf.json
Turtle: 24-1.0380495-turtle.txt
N-Triples: 24-1.0380495-rdf-ntriples.txt
Original Record: 24-1.0380495-source.json
Full Text
24-1.0380495-fulltext.txt
Citation
24-1.0380495.ris

Full Text

Bundle-type methods for dual atomicpursuitbyZhenan FanB.Sc., University of Toronto, 2017A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2019c© Zhenan Fan 2019The following individuals certify that they have read, and recommend to the Faculty ofGraduate and Postdoctoral Studies for acceptance, the thesis entitled:Bundle-type methods for dual atomic pursuitsubmitted by Zhenan Fan in partial fulfillment of the requirements for the degree ofMaster of Science in Computer Science.Examining Committee:Michael P. Friedlander, Computer ScienceSupervisorBruce Shepherd, Computer ScienceSecond ReaderiiAbstractWe discuss a type of structured optimization problem called atomic pursuit, where the aimis to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit amodel to data, popularized in compressed sensing and machine learning. The solutions toatomic pursuit problems can be formed as the sum of a few atoms, which means they lie onlow-dimensional facets of the convex hull of the atomic set and the gauges induced by theconvex hull of the atomic sets have been shown to promote atomic sparsity. Examples includewell-studied cases such as one norm promoting general sparsity and nuclear norm promotinglow rankness. In this thesis, we examine the gauge dual of atomic pursuit and show thatthe support of the primal solution can be recovered from the dual solution. In particular, atwo-stage algorithm based on gauge duality and bundle-type methods is proposed. The firststage discovers the optimal atomic support for the primal problem by solving a sequence ofapproximations of the dual problem using a bundle-type method, which geometrically canbe seen as producing a sequence of inscribing subsets of the atomic set. The second stagerecovers the approximate primal solution using the atoms discovered in the first stage. Weshow that these methods lead to efficient and scalable semidefinite optimization methodswith low-rank solutions, producing a fast, scalable, SDP solver for phase retrieval and graphproblems.iiiLay SummaryWe discuss a type of structured optimization problem where the aim is to assemble asolution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. Atwo-stage algorithm based on gauge duality and bundle-type methods is proposed. The firststage discovers the optimal atomic support for the primal problem by solving a sequence ofapproximations of the dual problem using a bundle-type method. The second stage recoversthe approximate primal solution using the atoms discovered in the first stage. The overallapproach leads to implementable and efficient algorithms for large problems.ivPrefaceThe work described in this paper is an original intellectual product of the author, Zhenan Fan,under the supervision of Professor Michael P. Friedlander. Part of Chapter 2 and Chapter 5is the result of additional collaboration with Dr. Yifan Sun during her post-doctoral careerin UBC. This work has been submitted to the Asilomar conference (2019).vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiList of Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiList of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Atomic pursuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Gauge duality background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Dual atomic pursuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 Classical bundle methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.1 Cutting-plane method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.2 Proximal bundle method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226.3 Level bundle method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23viTable of Contents7 Bundle-type two-stage algorithm . . . . . . . . . . . . . . . . . . . . . . . . 247.1 Polyhedral Atomic Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277.2 Spectral Atomic Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348.1 Basis pursuit denoising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348.2 Phase retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39viiList of Tables3.1 Commonly used sets of atoms and their gauge and support function represen-tations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10viiiList of Figures1.0.1 Atomic Decompostion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22.0.1 Gauge induced by atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.0.2 Support function induced by atoms . . . . . . . . . . . . . . . . . . . . . . . 53.0.1 Exposed face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105.0.1 Optimal support identification . . . . . . . . . . . . . . . . . . . . . . . . . 196.0.1 Cutting plane model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.0.2 Inner approximation to the atomic set . . . . . . . . . . . . . . . . . . . . . . 218.1.1 Original signal vs Optimal signal computed by cvx vs Recovered signal. . . 348.1.2 Convergence behavior of dual objective. . . . . . . . . . . . . . . . . . . . . 358.1.3 Size of bundle and its intersection with the atomic support for both originalsignal and optimal signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358.2.1 Images recovered at intermediate iterations of the spectral bundle methodAlgorithm 3 for recovering a ground-truth image. . . . . . . . . . . . . . . 368.2.2 Convergence of objective value, primal infeasibility, and Hausdorff distanceto the optimal atomic support. . . . . . . . . . . . . . . . . . . . . . . . . . 37ixList of Theorems5.0.1 Theorem (Optimal support identification) . . . . . . . . . . . . . . . . . . . 187.0.3 Theorem (Convergence of Algorithm 1) . . . . . . . . . . . . . . . . . . . . 267.1.2 Theorem (Convergence of primal estimate) . . . . . . . . . . . . . . . . . . 287.1.3 Theorem (Finite optimal support identification) . . . . . . . . . . . . . . . . 297.2.4 Theorem (Spectral recovery guarantee) . . . . . . . . . . . . . . . . . . . . . 33xList of Propositions2.0.1 Proposition (Gauge equivalence) . . . . . . . . . . . . . . . . . . . . . . . . 33.0.1 Proposition (Properties of gauge, support function and gauge dual pair) . . 76.0.1 Proposition (Cutting-plane model for support functions) . . . . . . . . . . . . 217.0.1 Proposition (Solution set inclusion) . . . . . . . . . . . . . . . . . . . . . . . 247.1.1 Proposition (Polyhedral bundle update) . . . . . . . . . . . . . . . . . . . . 277.2.1 Proposition (Spectral cutting-plane model) . . . . . . . . . . . . . . . . . . 307.2.2 Proposition (Spectral bundle update) . . . . . . . . . . . . . . . . . . . . . . . 31xiList of Examples4.0.1 Example (One norm) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114.0.2 Example (Nuclear norm) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114.0.3 Example (Total variation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124.0.4 Example (Group norms) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134.0.5 Example (Trace norm for PSD matrices) . . . . . . . . . . . . . . . . . . . . 144.0.6 Example (Weighted trace norm for PSD matrices) . . . . . . . . . . . . . . 154.0.7 Example (Submodular optimization) . . . . . . . . . . . . . . . . . . . . . . 16xiiList of Definitions2.0.2 Definition (Atomic pursuit) . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.0.3 Definition (Atomic support) . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.0.4 Definition (Dual atomic pursuit) . . . . . . . . . . . . . . . . . . . . . . . . 53.0.2 Definition (Exposed face) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10xiiiList of Algorithms1 Generic bundle-type method . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 Polyhedral Bundle-type method . . . . . . . . . . . . . . . . . . . . . . . . . 283 Bundle-type method for spectral atomic sets . . . . . . . . . . . . . . . . . . . 31xivAcknowledgementsFirst and foremost, I would like to express my sincere gratitude to my supervisor, ProfessorMichael Friedlander for the continuous support of my master study and research, for hispatience, enthusiasm, and extensive knowledge. His guidence not only helped me completemy master’s degree, but also strengthened my determination to continue my research as aPh.D student of him.Besides my supervisor, I would like to thank Professor Bruce Shepherd, for his verykindly accepting to be part of my Examining committee, encouragement and insightfulcomments.I would like to further thank my colleages and friends Yifan Sun, Halyun Jeong, HuangFang, Liran Li and Casie Bao, for stimulating discussions and active encouragement when Iwas lost.To my parents Ying Qiu and Zeshi Fan, I am deeply grateful for your all-enduring andselfless love. Without your support and understanding, I would have not been able to evenstart my journey of overseas study.Last but not the least, I would like to thank my wife, Wen Xiao, for her love and constantsupport, for all the late nights and early mornings. Without her company, it is impossiblefor me to get to this point. Yor are the light of my life.xvChapter 1IntroductionA recurring approach for solving inverse problems that arises in statistics, signal processing,and machine learning is based on recognizing that the desired solution can often be representedas the superposition of a relatively few canonical atoms as compared to the signal’s ambientdimension. Canonical examples include compressed sensing and model selection, where theaim is to obtain sparse vector solutions; and recommender systems, where low-rank matrixsolutions are required. Our aim is to design a set of algorithms that leverage this sparseatomic structure in order to gain computational efficiencies necessary for problems withlarge data, large parameters and complex constraints.Define a set of atoms by a set A ⊂ Rn. The atomic set may be finite or infinite, but ineither case, we assume that the set is closed and bounded. A point x ∈ Rn is said to besparse with respect to A if it can be written as a nonnegative superposition of a few atomsin A, i.e.,x =∑a∈Acaa, ca ≥ 0, (1.0.1)where most of the coefficients ca associated with each atom a are zero. A geometricexplanation is given in Figure 1.0.1. Two examples are compressed sensing, where the atomsare the canonical unit vectors and the aim to retrieve a sparse vector x, and low-rank matrixcompletion, where the atoms are the set of rank-1 matrices and the aim is to recover alow-rank matrix x. In each of these cases, there is a convex optimization problem whosesolution is sparse relative to the required atomic set. There now exists substantial literaturethat delineates conditions under which the correct solution is identified, typically in aprobabilistic sense [RXH08,Don06,CRT04,RFP10].Our focus here is on the approach advocated by Chandrasekaran et al. [CRPW12], whoidentified a set of convex analytical techniques based on gauge functions, which are norm-likefunctions that are especially well suited to the atomic description of the underlying model.We describe below a generalized linear inverse problem that captures the models analyzedby Chandrasekaran et al.In particular, these sparse vectors lying on low-dimensional facets of the convex hull ofthe atomic set A are often desirable solutions to important convex optimization problems(e.g., LASSO, low-rank matrix completion). Then the problem of recovering such a A-sparsevector x given limited linear measurements Mx ∈ B is of growing interest.1.1 RoadmapIn Chapter 2 we introduce the atomic pursuit problem and give a formal definition to for theatomic support. We also introduce the gauge dual to the atomic pursuit, which is defined11.1. Roadmap0xa1a2a3 a4a5AFigure 1.0.1: Atomic decomposition, where the atomic set is A = {a1, . . . , a5} and x can be expressedas a linear combination of a1 and a5.to be the dual atomic pursuit problem, and describe several possible advantages of dealingwith it. Before analyzing the key relationships between the primal and the dual, we reviewthe necessary convex analysis background for gauges in Chapter 3 and show some gaugesinduced by important atomic sets in Chapter 4. In Chapter 5 we show in Theorem 5.0.1 thatwe can discover the atomic support for the optimal primal variable via solving its gauge dual.This leads to the idea of development of a two-stage algorithm, which depends on Kelly’scutting-plane method [Kel60]. We introduce the cutting-plane method and geometricallyexplain why it is suitable for the dual atomic pursuit in Chapter 6. We also introduce twowell-known stabilized versions of the cutting-plane method, which are often called bundlemethods. Finally we present our two-stage bundle-type algorithm in Chapter 7. We providea generic version of the algorithm, with theoretical guarantee of convergence, as well as twospecialized versions applied to polyhedral atomic set and spectral atomic set respectively.We show some numerical results when applying our algorithm to basis pursuit denoisingphase retrieval in Chapter 8.2Chapter 2Atomic pursuitThe atomic set A ⊆ Rn induces the gauge functionγA(x) = inf{µ ≥ 0∣∣∣ x ∈ µÂ } , (2.0.1)where  = conv(A ∪ {0}) denotes the convex hull of A and 0. The convex hull is notgenerally needed because γA = γconv(0∪A) [FMP14, Proposition 2.3(ii)], but it is helpful forour development to keep it explicit.Intuitively, µ = γA(x) is the amount that the atomic set A must be expanded or shrunkto include a query point x. All positive homogeneous nonnegative functions are gauges; inparticular, norms, seminorms, and indicator functions to cones are gauges. The values ofγA(x) may be 0, finite nonzero, or infinite, depending on the properties of A. Any functionthat is positive homogeneous and nonnegative everywhere is a gauge function for some closedconvex set. We illustrate the geometric idea of gauge function in Figure 2.0.1.0a1a2a3 a4a5x2x3x1µ (µ > 1)µÂ (µ < 1)Figure 2.0.1: Gauge induced by atoms, where the atomic set is A = {a1, . . . , a5} and x1, x2 and x3are the query points with gauge values γA(x1) < γA(x2) = 1 < γA(x3).Proposition 2.0.1 (Gauge equivalence). The gauge to A can be expressed equivalently asγA(x) = inf{ ∑a∈A,ca≥0ca∣∣∣ x = ∑a∈Acaa}; (2.0.2)3Chapter 2. Atomic pursuitProof. Let κ(x) = inf{∑a∈A,ca≥0 ca∣∣∣ x = ∑a∈A caa}, then it follows thatκ(x) = inf{ ∑a∈A,ca≥0ca∣∣∣ x = ∑a∈Acaa}= inf{µ ≥ 0∣∣∣ x = ∑a∈Aµc¯aa,∑a∈Ac¯a = 1, c¯a ≥ 0,∀a ∈ A}= inf{µ ≥ 0∣∣∣ x ∈ µÂ } = γA(x).This alternative expression for the gauge function γA(x) is defined to be the minimalsum of weights over all possible atomic decompositions for x, as in (1.0.1).Definition 2.0.2 (Atomic pursuit). The atomic pursuit problem minimizes the gaugefunction over a set of linear measurements Mx ∈ B, where M : Rn → Rm is a linear operator,and B ⊂ Rm \ {0} denotes the admissible set:minimizex∈RnγA(x) subject to Mx ∈ B. (2.0.3)Chandrasekaran et al. [CRPW12] and Amelunxen et al. [ALMT14] describe conditionsunder which a solution to this convex optimization problem yields a good approximation tothe underlying ground truth.Although (2.0.3) is convex and in theory amenable to efficient algorithms, in practicethe computational and memory requirements of general-purpose algorithms are prohibitivelyexpensive. However, algorithms specially tailored to recognize the sparse atomic structurecan be made to be effective in practice. In particular, if we had information on which atomsparticipate meaningfully in constructing a solution x∗, then (2.0.3) can be reduced to aproblem over just those atoms. This leads to the following definition of atomic support.Definition 2.0.3 (Atomic support). A set S ⊂ A is a support set for x with respect to Aif every element a ∈ S has a coefficient ca from (2.0.2) that is strictly positive. That is,γA(x) =∑a∈Sca, x =∑a∈Scaa, ca > 0 ∀a ∈ S. (2.0.4)The set suppA(x) is defined as the set of all support sets, all sets of atoms in A thatcontribute non-trivially to some construction of x.If we can identify any support set S ∈ supp∗A := suppA(x∗) for any solution x∗,then (2.0.3) is equivalent to the reduced problemminimizex∈RnγS(x) subject to Mx ∈ B. (2.0.5)This is a potentially easier problem to solve, particularly in the case where it is possible toidentify a support set S ∈ supp∗A that has small cardinality. For example, the cardinality of4Chapter 2. Atomic pursuitS is small relative to the cardinality of A in the case where A is finite; or the dimension ofS is small relative to the dimension of A in the case where A is the set of rank-1 matrices.For example, when  is the cross-polytope, then identifying a small support S meansthat the reduced problem (2.0.5) only involves the few variables in S. Similarly, when Aˆ isthe set of rank-1 symmetric PSD matrices, then identifying a small support S correspondsto finding the eigenspace of a low-rank solution X∗. In both cases, knowing S can reducethe computational complexity significantly.Our approach for constructing the optimal support set supp∗A is founded on approximatelysolving a dual problem that is particular to gauge optimization (2.0.3).Definition 2.0.4 (Dual atomic pursuit). The dual atomic pursuit problem is defined to bethe gauge dual of the atomic pursuit problem (2.0.3), which takes the formminimizey∈RmσÂ(M∗y) subject to y ∈ B′, (2.0.6)whereσÂ(·) = supa∈Â〈a, ·〉 (2.0.7)is the support function to the set Â, andB′ = { y ∈ Rm | 〈b, y〉 ≥ 1 ∀b ∈ B } (2.0.8)is the antipolar to B.0a1a2a3 a4a5Âz{a | 〈a, z〉 = σÂ(z)}Figure 2.0.2: Support function induced by atoms. where the atomic set is A = {a1, . . . , a5} and z issome unit vector and a2 is the intersection of A and the hyperplane { a | 〈a, z〉 = σÂ(a) }. Also notethat the hyperplane is perpendicular to vector z.A more detailed explanation of relationship between the primal (2.0.3) and gaugedual (2.0.6) is given in Chapter 3. We finish this section by showing an example of thepossible advantages of dealing with the gauge dual. Consider Example 4.0.5, where A is theset of normalized positive semidefinite rank-one matrices. As we see, the gauge dual (2.0.6)turns out to be5Chapter 2. Atomic pursuitminimizey∈Rmmax { 0, λmax(MT y) } subject to 〈b, y〉 ≥ 1. (2.0.9)Compared with the primal problem, the dual variable y has much lower dimension. Comparedwith the Lagrange dualmaximizey∈Rm〈b, y〉 subject to I −M∗y  0, (2.0.10)where the constraint I −M∗y  0 means the positive semidefiniteness of I −M∗y, the gaugedual constraint is much simpler and thus the projection onto the constraint set is mucheasier.6Chapter 3Gauge duality backgroundIn this section we review known facts about gauges and support functions and introduceresults that are useful for our subsequent analysis. We now consider the following generalgauge dual pair introduced in [FMP14],(P )minimizexγA(x)subject to Mx ∈ B, (D)minimizeyσÂ(M∗y)subject to y ∈ B′. (3.0.1)We include some important properties regarding gauge, support function and gauge dualpair (3.0.1) in the following proposition. The proofs are outlined by Rockafellar [Roc97],Hiriart [HUL12] and Friedlander et al. [FMP14]. For completeness, we provide self-containedproofs here.Proposition 3.0.1 (Properties of gauge, support function and gauge dual pair). Let A bethe atomic set. The following statements hold.a) (Closure and convex hull) σ = max { 0, σA };b) (Bijection)  = {x ∈ Rn | γA(x) ≤ 1 } = {x ∈ Rn | 〈x, z〉 ≤ σÂ(z) for all z ∈ Rn };c) (Linear transformation) For a linear operator M with adjoint M∗,σMA(z) = σA(M∗z),where we interpret MA = {Ma | a ∈ A};d) (Lipschitz continuity) If A is bounded, then the support function is Lipschitz continuouswith Lipschitz constant L = sup { ‖a‖ | a ∈ A};e) (Lower semi-continuity) The support function σA is lower semi-continuous.f) (Subdifferential) ∂σA(z) = { a ∈ A | 〈a, z〉 = σA(z) };g) (Outer semi-continuity) The subdifferential of the support function is outer semi-continuous at any z ∈ dom σA, i.e.,∀ > 0,∃δ > 0 : z′ ∈ B(z, δ)⇒ ∂σA(z′) ⊆ ∂σA(z) + B(0, );h) (Weak duality) Any pair (x, y) that is primal-dual feasible in (3.0.1) must satisfy1 ≤ 〈x,M∗y〉 ≤ γA(x)σÂ(M∗y).7Chapter 3. Gauge duality backgroundi) (Strong duality) If both primal problem (P) and dual problem (D) in (3.0.1) are strictlyfeasible, then any primal-dual optimal pair (x, y) must satisfy1 = 〈x,M∗y〉 = γA(x)σÂ(M∗y).Proof. a) First, we show that σA∪{ 0 } = σÂ. Since A ∪ { 0 } ⊆ Â, it is obvious thatσA∪{ 0 } ≤ σÂ. And since each element a of  is a convex combination of points inσA∪{ 0 }, it follows that 〈a, ·〉 ≤ σA∪{ 0 }, and thus σ ≤ σA∪{ 0 }.Next, we show that σA∪{ 0 } = max { 0, σA }.σA∪{ 0 }(z) = supa∈A∪{ 0 }〈a, z〉= max { 0, supa∈A〈a, z〉 }= max { 0, σA(z) } .b) From the definition of gauge function (2.0.1), it is easy to see that  = {x ∈ Rn :γA(x) ≤ 1}. So we only need to show that  = {x ∈ Rn | 〈x, z〉 ≤ σA(z) for all z ∈ Rn }.Let D = {x ∈ Rn | 〈x, z〉 ≤ σA(z) for all z ∈ Rn }. By the definition of support func-tion, it can be easily shown that  ⊆ D. So we only need to prove that D ⊆ Â.Assume there is some x ∈ D such that x /∈ Â. Then by the separating hyperplanetheorem, there exists s ∈ Rn such that〈s, x〉 > sup { 〈s, y〉 | y ∈  } = σÂ(s).This leads to a contradiction. Therefore, we can conclude that  = D.c) From the definition of support function and adjoint operator,σMC(z) = sup { 〈Mx, z〉 | x ∈ C }= sup { 〈x,M∗z〉 | x ∈ C } = σC(M∗z).d) For any z, w ∈ domσA. Without loss of generality we assume σA(z) ≥ σA(w). Letaz ∈ A be the exposed atom, namely σA(z) = 〈az, z〉. Then,σA(z)− σA(w) = 〈az, z〉 − σA(w)≤ 〈az, z〉 − 〈az, w〉= 〈az, z − w〉≤ sup { ‖a‖ | a ∈ A}‖a− w‖,where the last inequality comes from Cauchy-Schwatz inequality and boundedness ofA.e) By Rockafellar’s theorem [Roc97, Theorem 7.1], we prove this by showing that theeipgraph of σA is closed, where the epigrah is defined to beepi(σA) = { (z, t) | σA(z) ≤ t } .8Chapter 3. Gauge duality backgroundLet { (zk, tk) }∞k=1 be some convergent sequence in epi(σA), and let (z, t) denote itslimit point. Now by the definition of support function and epigraph, we have〈a, zk〉 ≤ tk, ∀k ≥ 1,∀a ∈ A.Taking the limit, we get〈a, z〉 ≤ t, ∀a ∈ A.Therefore, it follows that (z, t) ∈ epi(σA). Thus epi(σA) is closed.f) Let D = { a ∈ A | 〈a, z〉 = σA(z) }. First, we show that D ⊆ ∂σA(z). Assume a ∈ D,then for any w ∈ Rn,σA(w) ≥ 〈a,w〉 = σA(z) + 〈a,w − z〉.Thus, a ∈ ∂σA(z).Next, we prove that ∂σA(z) ⊆ D. Assume a ∈ ∂σA(z), thenσA(w) ≥ σA(z) + 〈a,w − z〉, ∀w ∈ Rn (3.0.2)By the subadditivity of support functions, we must haveσA(z) + σA(w − z) ≥ σA(w), ∀w ∈ Rn. (3.0.3)It then follows from (3.0.2) and (3.0.3) that σA(v) ≥ 〈a, v〉 for all v. By part (d), wecan thus conclude that x ∈ A. Now let w = 0 in (3.0.2), it follows that 〈a, z〉 ≥ σA(z).Therefore, it follows that 〈x, z〉 = σA(z) and thus x ∈ D.g) Assume for contradiction that, at some z, there is  > 0 and a sequence { (zk, ak) }∞k=1such thatzk → z, ak ∈ ∂σA(zk), ak /∈ ∂σA(z) + B(0, ).Now since { ak }∞k=1 is bounded, there exist a convergent subsequence { aj }j∈J , andlet a denote its limit. Since aj ∈ ∂σA(zj), we haveσA(zj) = 〈aj , zj〉.After taking the limit, we getσA(z) = 〈a, z〉,and thus a ∈ ∂σA(z). This leads to a contradiction.h) First, we show that 〈x,M∗y〉 ≤ γA(x)σÂ(M∗y). By Proposition 3.0.1(b),σÂ(M∗y) = sup { 〈x,M∗y〉 : x ∈  } = sup { 〈x,M∗y〉 : γA(x) ≤ 1 } .So it follows that〈x,M∗y〉 = γA(x)〈 1γA(x)x,M∗y〉≤ γA(x)σÂ(M∗y).Next, we show that 〈x,M∗y〉 ≥ 1. Since (x, y) are primal-dual feasible, it follows thatMx ∈ B and y ∈ B′. Then by the definition of antipolar (2.0.8), we can conclude that〈x,M∗y〉 ≥ 1.9Chapter 3. Gauge duality backgroundAtomic sparsity A Aˆ γA(x) suppA(x) σÂ(z)non-negative cone({ e1, . . . , en }) non-negative orthant δ≥0 cone({ ei | xi > 0 }) δ≤0elementwise {±e1, . . . ,±en } cross polytope ‖ · ‖1 { sign(xi)ei | xi 6= 0 } ‖ · ‖∞low rank {uvT | ‖u‖2 = ‖v‖2 = 1 } nuclear-norm ball nuclear norm singular vectors of x spectral normPSD & low rank {uuT | ‖u‖2 = 1 } {X  0 | trX ≤ 1 } tr +δ0 eigenvectors of x max {λmax, 0 }Table 3.1: Commonly used sets of atoms and their gauge and support function representations. Theindicator function δC(x) is zero if x ∈ C and +∞ otherwise.i) We refer the reader to [FMP14, Corollary 5.2].For a closed convex set C ⊂ Rn, a face F of C is a subset of C where for all x ∈ C, y ∈ C,θ ∈ (0, 1),θx+ (1− θ)y ∈ F ⇐⇒ x ∈ F and y ∈ F .Note that the backward direction is true for any convex set F , but the forward onlyholds if F is a face.Definition 3.0.2 (Exposed face). The face of A exposed by a vector z is defined asFA(z) := {x ∈ A | 〈x, z〉 = σA(z) } = ∂σA(z).It is not hard to see that the exposed face is indeed a face. A geometric interpretation isgiven in Figure 3.0.1.0a1a2a3 a4a5Âz1z2FA(z1)FA(z2)Figure 3.0.1: Exposed face. The atomic set is A = {a1, . . . , a5} and z1, z2 are some unit vectors.The exposed faces of z1 and z2 are FA(z1) = { a2 } and FA(z2) = conv { a3, a4 } respectively.10Chapter 4ExamplesWe introduce several gauges and support functions induced by some important atomic setsin this chapter. We also introduce some corresponding atomic pursuit problems and theirgauge dual.Example 4.0.1 (One norm). Let A = {±e1, . . . ,±en} be the signed standard basis vectors,and it follows that  is the cross polytope. It is easy to see that this atomic set induces the1-norm:γA(x) = ‖x‖1,and the corresponding support function is the ∞-norm:σÂ(z) = ‖z‖∞.The support for x with respect to A:suppA(x) = { sign(xi)ei | xi 6= 0 } .The face of A exposed by z is:FA(z) = { sign(zi)ei | | zi| = ‖z‖∞ } .One corresponding atomic pursuit problem is the basis pursuit denoising (BPDN) [CDS01]problem. We will discuss this more in Section 8.1.Example 4.0.2 (Nuclear norm). The nuclear norm, or Schatten 1-norm, of a matrix isinduced by the atomic set A = {uvT | ‖u‖2 = ‖v‖2 = 1 }, which is the set of normalizedrank-1 matrices. To see this, note thatγA(X) = min{r∑i=1ci∣∣∣∣∣ X =r∑i=1ciuivTi , ‖ui‖ = ‖vi‖ = 1,∀i}= min { ‖c‖1 | X =r∑i=1ciuivTi , ‖ui‖ = ‖vi‖ = 1,∀i }= ‖X‖1.The corresponding support function is the spectral norm, or Schatten ∞-norm becauseσA(Z) = max{ 〈Z, uvT 〉 ∣∣ ‖u‖ = ‖v‖ = 1 }= max{〈Z, uvT 〉∣∣∣∣∣ Z =r∑i=1ciuivTi , ‖u‖ = ‖v‖ = ‖ui‖ = ‖vi‖ = 1,∀i}= max { ‖c‖∞ | Z =r∑i=1ciuivTi , ‖ui‖ = ‖vi‖ = 1, ∀i }= ‖Z‖∞.11Chapter 4. ExamplesThus the support for X with respect to A is:suppA(X) = { singular vectors of X } .The face of A exposed by Z is:FA(Z) = { top singular vectors of Z } .One corresponding atomic pursuit problem is the low rank matrix completion prob-lem [RFP10], which is about seeking a row rank X that matches certain observations.Example 4.0.3 (Total variation). The total variation norm of an n-vector x is defined as‖x‖TV =n∑i=2|xi − xi−1| = ‖Dx‖1, D =1 −1 0 · · · 0 00 1 −1 · · · 0 0.......... . .......0 0 0 · · · 1 −1 .Since the matrix D is singular, the atomic set that induces this norm is not unique.Specifically, for any matrix A = [a1, . . . , an−1] where DA = I, the atomic set can be definedasA = {±a1, . . . ,±an−1,±e}+ cone(±e). (4.0.1)To see this, observe thatγA(x) = min {n−1∑i=1| ci| | x =n−1∑i=1ciai + cee }= min { ‖c‖1 | x = Ac+ cee }= ‖Dx‖1.To see the corresponding support function, we first note that from the construction of theatomic set (4.0.1), it’s obvious that if z /∈ null(e), then the support function will be infinity.So we assume that z ∈ null(e). ThenσA(z) = sup { 〈x, z〉 | γA(x) ≤ 1 }= sup { 〈x, z〉 | x = Ac+ cee, ‖c‖1 ≤ 1 }= sup { 〈Ac, z〉 | ‖c‖1 ≤ 1 }= sup { 〈c, AT z〉 | ‖c‖1 ≤ 1 }= ‖AT z‖∞.Next, we show that for any z ∈ null(e) and any matrix A satisfying DA = I, AT z is uniquelydetermined. For any matrix A satisfying DA = I, A can be expressed as A = B + eT s,whereB = [b1, . . . , bn−1] :=1 1 · · · 1 10 1 · · · 1 1....... . .......0 0 · · · 1 10 0 · · · 0 10 0 · · · 0 0. (4.0.2)12Chapter 4. ExamplesTherefore, AT z = (BT + sT e)z = BT z, which is uniquely determined.Provided x = Ac+ cee, the support for x with respect to A issuppA(x) = { sign(ci)ai | ci 6= 0 }+ range(sign(ce)e),The face of A exposed by z isFA(z) = { sign(〈ai, z〉)ai | | 〈ai, z〉)| = ‖AT z‖∞ } .One corresponding atomic pursuit problem is the image reconstruction problem [CESV15],which is about recovering an unknown image from a set of linear measurements.Example 4.0.4 (Group norms). Consider the ` subsets gi ⊆ {1, . . . , n} such that ∪`i=1gi ={1, . . . , n}. Define the group norm with respect to the groups G = {g1, . . . , g`} as the solutionof the convex optimization problem‖x‖G = minyi{∑`i=1‖yi‖2∣∣∣ x = ∑`i=1Pgiyi}, (4.0.3)where the linear operator PI : R|I| → Rn scatters the elements of a vector into an n vectorat positions indexed by I, i.e., {(PIy)i}i∈I = y, and (PIy)k = 0 for any k /∈ I. It is easy tosee that this norm is induced by the atomic setA ={Pgisi∣∣∣ si ∈ R|gi|, ‖si‖2 = 1, i = 1, . . . , `} ,which yields the decompositionx =∑`i=1ci︸︷︷︸caPgisi︸ ︷︷ ︸a. (4.0.4)Next, we consider the corresponding support function,σA(z) = sups∈A〈s, z〉= maxi=1,...,`sup{〈si, zgi〉∣∣ ‖si‖2 ≤ 1, si ∈ R|gi|}= maxi=1,...,`‖zgi‖2.It follows that the support for x with respect to A isSA(x) ={Pgixgi‖xgi‖2∣∣∣∣ ‖xgi‖2 > 0} ,and the face of A exposed by z isFA(z) ={zgi‖zgi‖2∣∣∣∣ ‖zgi‖2 = maxj ‖zgj‖2} .One corresponding atomic pursuit problem is group Lasso with overlaps [OJV11], whichis about recovering signals whose support can be expressed as a union of groups.13Chapter 4. ExamplesExample 4.0.5 (Trace norm for PSD matrices). The trace norm is given by the functionκ(X) = tr(X) + δ0(X),which encourages sparsity with respect to the set of rank-1 matrices—i.e., low-rank. Weclaim that κ is the gauge induced by the atomic setA = {uuT | u ∈ Rn, ‖u‖2 = 1 } ,which is the spectrahedron. By Proposition 3.0.1(b), we only need to show that X ∈ A ifand only if κ(X) ≤ 1.Take any element X ∈ A, then by definition of gauge (2.0.2), X can be expressed asX =r∑i=1ciuiuTi , ci ≥ 0 ∀i,r∑i=1ci ≤ 1.then it follows thatκ(X) = tr(X) + δ(X) = 〈I,r∑i=1ciuiuTi 〉 =r∑i=1ci ≤ 1.Now take any element X such that κ(X) ≤ 1. By the definition of indicator function, Xmust be positive semidefinite. Consider the eigenvalue decomposition for X:X =r∑i=1λiuiuTi , ci ≥ 0 ∀i.Then it follows that 1 ≥ κ(X) =r∑i=1λi. And thus X ∈ A.Next, we derive the support function with respect to A:σA(Z) = supX∈Â〈X,Z〉= max{0, sup‖u‖2=1uTZu}= max{0, λmax(Z)}.which vanishes only if Z is negative semidefinite, and otherwise is achieved when u is amaximal eigenvector of Z.Moreover, it follows that the nontrivial eigenvectors provide a support set for X, i.e.,suppA(X) ⊇ {u1uT1 , . . . , uruTr },where r is the rank of X. Note that this support is not unique, however, and in fact the setof supports of X is very large. To see this, consider any valid conic atomic decompositionX = c1v1vT1 + · · ·+ ckvkvTk = V CV T ,14Chapter 4. Exampleswhere ci and vi, respectively, are the ith diagonal entry of the diagonal matrix C and columnof matrix V . Thentr(X) = tr(V CV T ) = tr(CV TV ) = 〈diag(C),diag(V TV )〉 =k∑i=1ci,where the last equality follows from the fact that each vivTi is in A and thus has unit norm.Therefore any conic atomic decomposition of X yields the same gauge value, which is thetrace of X. Specifically, the support of X with respect to the spectrahedron A can becharacterized assuppA(X) ={ { v1vT1 , . . . , vkvTk } ∣∣ ‖vi‖2 = 1, range(V ) = range(X) } . (4.0.5)Because we do not impose orthonormality among the vectors vi, this set is not unique.The face of A exposed by Z is given by the eigenvectors corresponding to the maximaleigenvalue of Z, including all of their convex combinations:FA(Z) = conv {uuT | uTZu = λmax(Z) } .One corresponding atomic pursuit problem is the phase retrieval [CESV15]. We discussthis more in Section 8.2.Example 4.0.6 (Weighted trace norm for PSD matrices). We describe a generalization ofthe trace norm for positive semidefinite matrices, which was covered by Example 4.0.5. Theweighted trace norm is given by the functionκ(X) = 〈L,X〉+ δ0(X),where L is positive semidefinite. Write the decomposition of L asL =[V V¯] [Λ 00 0] [V TV¯ T]= V ΛV T ,where Λ is diagonal with strictly positive elements and V and V¯ , respectively, span therange and nullspace of L.We claim that κ is the gauge to the atomic setA = { rrT | r = V p, pTΛp = 1 }+ { ssT | s = V¯ q for all q } , (4.0.6)which we establish by showing that X ∈ A implies κ(X) = 1 and vice versa.Take any element X ∈ A, and observeκ(X) = 〈L,X〉 = 〈L, V ppTV T 〉 = pTV TLV p = pTΛp = 1.Conversely, take any X such that κ(X) = 1. Clearly, X is PSD. The orthogonal decomposi-tion of X into the range and nullspace of L is given byX = V V TXV V T + V¯ V¯ TXV¯ V¯ T .15Chapter 4. ExamplesThen,1 = κ(X) = 〈L,X〉 = 〈L, V V TXV V T 〉 = 〈Λ, V TXV 〉,which implies that V TXV ∈ conv { ppT | pTΛp = 1 }. Therefore, X is in the convex hull ofA. The second set in the sum (4.0.6) is in the nullspace of L and thus can be ignored. Thisestablishes the claim, and also provides an expression for the support set to X:suppA(X) = { (V pi)(V pi)T | pTiΛpi = 1, range(V [p1 · · · pk]) = range(X) } .Clearly the minimal set of vectors needed to complete the support is k equal to the rank ofX.The support function with respect to A can be reduced to a maximum generalizedeigenvalue problem, as follows:σA(Z) = sup { 〈X,Z〉 | X ∈  }= sup { 〈V ppTV T, Z〉 | pTΛp ≤ 1 }= sup { 〈V TZV,Λ−1/2uuTΛ−1/2〉 | uTu ≤ 1 }= sup { 〈Λ−1/2V TZV Λ−1/2, uuT 〉 | uTu ≤ 1 }= max{0, λmax(Λ−1/2V TZV Λ−1/2)}.We recognize that the expression inside the supremum is the generalized eigenvalue of thepencil (Z,L), so thatσA(Z) = max {0, λmax(Z,L)} .Hence, the exposed face is given by the maximal generalized eigenvectors, including theirconvex combinations:FA(Z) = conv {uuT | uTZu = λmax(Z,L) } .Example 4.0.7 (Submodular optimization). Consider a real-valued submodular set-functionF : 2V → R, where V = {1, . . . , p} is the base set. Submodularity means that F satisfiesF (A) + F (B) ≥ F (A ∪B) + F (A ∩B) and F (∅) = 0.The corresponding unconstrained submodular minimization problem is:minimizeAF (A) subject to A ⊆ V. (4.0.7)It was shown by Lova´sz [Lov83] that problem (4.0.7) is equivalent tominimizewf(w) subject to w ∈ [0, 1]p, (4.0.8)wheref(w) := Eλ∼[0,1]F ({ i | wi ≥ λ }) (4.0.9)16Chapter 4. Examplesis the Lova´sz extension of function F . It was then shown by Bach [B+13] that the Lova´szextension (4.0.9) is actually a support function,f(w) = σB(F )(w),whereB(F ) := { s ∈ Rp | s(V ) = F (V ), ∀A ⊆ V, s(A) ≤ F (A) }is the base polyhedron of function F .17Chapter 5Dual atomic pursuitThe dual relation between the pair (2.0.3) and (2.0.6) is encapsulated by the weak andstrong duality as outlined in Proposition 3.0.1(h) and Proposition 3.0.1(i). The followingtheorem shows that the gauge dual reveals the optimal support for the primal solution.Theorem 5.0.1 (Optimal support identification). Let (x, y) ∈ Rn × Rm be any optimalprimal-dual solution of the dual pair (2.0.3) and (2.0.6). ThenS ⊆ FA(M∗y) ∀S ∈ supp∗A .Proof. Let S ∈ suppA(x). It follows from (2.0.2) that there exist strictly positive numbers{ ca | a ∈ S } such thatx∗ =∑a∈Scaa and γA(x) =∑a∈Sca,Let xˆ = x/γA(x) be a normalized solution. Thenxˆ =∑a∈ScaγA(x)a and γA(xˆ) =∑a∈AcaγA(x)≡ 1.This implies that xˆ is necessarily a strict convex combination of every point in S. Thusin order to establish that S ⊆ FA(M∗y), it is sufficient to show xˆ ∈ FA(M∗y). By strongduality, as described in Proposition 3.0.1(i),〈xˆ,M∗y〉 = σA(M∗y),and it follows from the definition of exposed faces that xˆ ∈ FA(M∗y∗).This result can be interpreted geometrically: the optimal support atoms lie in the faceof  exposed by M∗y∗. Moreover, each atom is a subgradient of σA. A geometric exampleis shown in Figure 5.0.1. The theoretical basis for this approach is outlined by Friedlanderet al. [FMP14] and Aravkin et al. [ABD+17].18Chapter 5. Dual atomic pursuit0Â0MÂa1a2a3 a4a5x∗Ma3Ma4y∗{x |Mx ∈ B} {y | y ∈ B′}Figure 5.0.1: Optimal support identification, where A = { a1, . . . , a5 } is the atomic set, x∗ and y∗are the optimal solutions to primal (2.0.3) and (2.0.6) respectively. The atomic support for x∗ issuppA(x∗) = { a3, a4 } and the exposed face for M∗y∗ is also { a3, a4 }.Importantly, Theorem 5.0.1 shows that we can pursue the atomic support for the optimalprimal variable by solving the corresponding gauge dual, and this explains why we call thisproblem (2.0.6) dual atomic pursuit. Moreover, together with the reduced problem (2.0.5),it turns out that we should solve the atomic pursuit problem with a two-stage algorithm,where the first stage is discovering the optimal atomic support via solving the dual atomicpursuit and the second stage is solving the reduced problem over the discovered support.We give a detailed introduction to this algorithm in Chapter 7.19Chapter 6Classical bundle methodsThe cutting-plane method for general nonsmooth convex optimization was first introducedby Kelley [Kel60]. It solves the optimization problem via approximating the objectivefunction by a bundle of linear inequalities, called cutting planes. The approximation isiteratively refined by adding new cutting planes computed from the responses of the oracle.The trick behind the method is not to approximate the entire objective function well by aconvex polyhedron, but to do so only near the optimum. Several stabilized versions, usuallyknown as bundle methods, were subsequently developed by Lemarechal et al. [LNN95] andKiewel [Kiw90].We give a simple description of the construction of the lower minorant in the context ofa generic convex function f : Rn → R. Let {x(j), g(j) ∈ ∂f(x(j)) }kj=1 be the set of pairs ofiterates and subgradients visited through iteration k. The cutting plane model at iterationk isf (k)(x) = maxj=1,...,k{f(x(j)) + 〈g(j), x− x(j)〉}. (6.0.1)A geometric interpretation is shown in Figure 6.0.1. In our simplified description, thecutting-plane model is polyhedral, but can be defined more generally, as we describe inconnection with Algorithm 3, where the models are spectrahedral.xx(1)f(x(1)) + 〈g(1), x− x(1)〉x(2)f(x(2)) + 〈g(2), x− x(2)〉x(3)f(x(3)) + 〈g(3), x− x(3)〉f (3)(x)Figure 6.0.1: Cutting plane model f (3) with iterates x(1), . . . , x(3) and corresponding gradientsg(1), . . . , g(3).The next proposition shows that, when specialized to support functions, the cutting-planemodels are themeselves support functions, which only depends on the previous subgradients.206.1. Cutting-plane methodA geometric illustration is given in Figure 6.0.2.Proposition 6.0.1 (Cutting-plane model for support functions). The cutting-plane model (6.0.1)for f = σ and { z(j), a(j) }kj=1 being the set of pairs of iterates and subgradients isf (k)(z) = σA(k)(z) with A(k) = {a(1), . . . , a(k)} ⊂ Â.Proof. By the definition of subdifferential of support functions Proposition 3.0.1(f), we havef (k)(z) = maxj=1,...,kσÂ(z(j)) + 〈a(j), z − z(j)〉= maxj=1,...,k〈a(j), z〉= σA(k)(z).0Âa1a2a3 a4a5A(1)0A(2)Âa1a2a3 a4a50A(3)Âa1a2a3 a4a5Figure 6.0.2: Inner approximation to the atomic set, where A = { a1, . . . , a5 } with A(1) = { a1 },A(2) = conv { a1, a3 } and A(3) = conv { a1, a3, a5 }.Now again consider the dual atomic pursuit problem (2.0.6). Coupled with Theorem 5.0.1and Proposition 6.0.1, we observe that that the inscribing sets A(1) ⊂ A(k) ⊂ . . .  thatdefine the cutting-plane model are constructed from the faces of  exposed by previousiterates {M∗y(i) }ki=1. The sets A(k) thus contain atoms that are candidates for the supportof the optimal solution. This leads to the idea of applying cutting-plane-like methodsto (2.0.6).We will describe cutting-plane, proximal bundle and level bundle methods applied tosolving (2.0.6) in Section 6.1, Section 6.2 and Section 6.3 respectively. We introduce ourbundle-type method in Chapter 7.6.1 Cutting-plane methodKelley’s cutting plane method applied to (2.0.6) can be summaried as minimizing a sequenceof cutting-plane models,216.2. Proximal bundle methody(k+1) ∈ arg miny∈B′σA(k)(M∗y),a(k+1) ∈ FA(M∗y(k+1)),A(k+1) = A(k) ∪ { a(k+1) } ,(6.1.1)where A(k) = {a(1), . . . , a(k)}.The classical cutting-plane method is known to be slow, where the slowness is becauseof the possible zig-zag behavior of the iterates [NY83]. Bundle methods are stabilizedrefinements of the classical cutting-plane method. Also in order to make this approachcomputationally useful, care must be taken to ensure that the sets A(k) do not grow toolarge, which would defeat the purpose of the approach. Both of the bundle methods allowthe control over the size of A(k).6.2 Proximal bundle methodThe proximal bundle method was first introduced by Kiwiel [Kiw90]. The idea of theproximal bundle method is to distinguish one of the generated iterates y(1), . . . , y(k) so farand set it to be the center yˆ(k). Then instead of generating the next iterate as the minimizerof σMA(k) , we compute the next iterate by computing the Moreau-Yosida regularization forσMA(k) at yˆ(k), namelyy(k+1) = arg miny∈B′{σA(k)(M∗y) +µ2∥∥∥ y − yˆ(k)‖2 } ,a(k+1) ∈ FA(M∗y(k+1)),A(k+1) = A(k) ∪ { a(k+1) } .(6.2.1)The two norm here is responsible for the stabilization of the cutting plane method. It forcesthe next iterate not to be too far away from the current center, which will avoid the drasticmovements as in the case of the cutting plane method. And the center will be updated ifcurrent iterate make enough improvement, namely we will set yˆ(k+1) = y(k+1) ifσA(yˆ(k))− σA(y(k+1)) ≥ m[σA(yˆ(k))−(σA(k)(y(k+1)) +µ2‖y(k+1) − yˆ(k)‖2)].This can simply be interpreted as the new center must improve the objective value by atleast a fixed fraction m of the improvement made by the proximal cutting plane model.In contrast to the cutting-plane method, the proximal bundle method allows the controlover the size of A(k), and the process is called aggregation. Here we follow the similarprocedure as described by Belloni [Bel]. The subproblem in (6.2.1) can be written asminimizet,yt+ µ2‖y − yˆ(k)‖2 + δB′(y)subject to 〈a(j),M∗y〉 ≤ t, j = 1, . . . ,m.(6.2.2)226.3. Level bundle methodLet {β(j) }kj=1 denote the Lagrange multiplier of (6.2.2) and leta¯(k) =k∑j=1β(j)a(j),which can be viewed as the weighted average of the atoms a(j). Then the aggregation processcan be described asA(k+1) = A(k) \ { a(j) ∈ A(k) | β(j) < βmin } ∪ { a¯(k) } ∪ { a(k+1) } ,where βmin is a pre-set positive parameter.6.3 Level bundle methodThe level bundle method was first introduced by Lemarechal et al. [LNN95]. The idea ofthe level bundle method is to enclose the optimal solution in a contracting sequence of sets.Formally, we are going to build a sequence of contracting neighbourhoods Y (1) ⊇ Y (2) · · · ⊇Y (k) of the optimal point y∗ based on all former generated iterates y(1), . . . , y(k). The waythey construct these contracting sets is based on the idea of level sets. FormallyU (k) = mini=1,...,kσA(M∗y(i)), L(k) = miny∈B′σA(k)(M∗y),δ(k) = U (k) − L(k), `(k) = L(k) + λδ,Y (k) ={y ∈ B′∣∣∣ σA(k)(M∗y) ≤ `(k) } ,y(k+1) = projY (k)(y(k)), a(k+1) ∈ FA(M∗y(k+1)), A(k+1) = A(k) ∪ { a(k+1) } .(6.3.1)The convergence of the level bundle method depends on the compactness of the domain,which may not be satisfied. Thus, a modified version was proposed by Bra¨nnlund andKiwiel [BKL95] which does not require the domain to be compact. And the modificationcan be summarized as bellow.The projection in (6.3.1) can be written asminimizey12‖y − yˆ(k)‖2 + δB′(y)subject to 〈a(j),M∗y〉 ≤ `(k), j = 1, . . . ,m.(6.3.2)Let {β(j) }kj=1 denote the Lagrange multiplier of (6.3.2). If maxj=1,...,kβ(j) > βmax, where βmax isa pre-set positive parameter, then increase the level `(k) and resolve the subproblem (6.3.2).The level bundle method also allows the control over the size of A(k), which is calledtruncation. As outlined in [BKL95], only the linearizations that have contributed to y(k+1)need be retained for the next iteration, namelyA(k+1) = { a(j) ∈ A(k) | β(j) > 0 } ∪ { a(k+1) } . (6.3.3)23Chapter 7Bundle-type two-stage algorithmWe consider below an algorithmic variation that allows us to periodically trim the inscribingsets. Our method is based on the level bundle-type method introduced by Bello Cruzand Oliveira [CdO14]. Each iterate is computed via a projection onto the level set of thecorresponding lower minorant. The candidate atomic set A(k) continues to inscribe Â, butdoes not necessarily grow monotonically as per Proposition 6.0.1. Instead, we follow therecipe outlined by Bra¨nnlund and Kiwiel [BKL95], as shown in (6.3.3), which only requiresA(k+1) to contain the elements that contribute to y(k+1). However, this only works for thepolyhedral atomic sets A(k). When generalizing this idea to general atomic sets, we similarlygetFA(k)(M∗y(k+1)) ∪ {a(k+1)} ⊆ A(k+1) ⊆ FA(M∗y(k+1)) ∪ A(k), (7.0.1)where y(k+1) is the latest iterate and a(k+1) ∈ FA(M∗y(k+1)). This rule ensures that theupdates to the candidate atomic set A(k) always contain exposed atoms that define thelower minorant, and at least one exposed atom from the full set. The general version of ourproposed method is outlined in Algorithm 1.For simplicity, we assume that we know the optimal value d∗ of (2.0.6). This assumptionis valid in cases where it is possible to normalize the ground truth, and then the optimalvalue d∗ is known in advance.Let S∗ denote the solution set to the problem (2.0.6), and assume that S∗ 6= ∅. Thefollowing proposition then shows that S∗ is always included in the level sets Y (k)’s.Proposition 7.0.1 (Solution set inclusion). S∗ ⊆ Y (k) for all k.Proof. We prove this by induction.First, we show that S∗ ⊆ Y (1). For any y ∈ S∗, it is obvious that σA(1)(M∗y) ≤σÂ(M∗y) = d∗ and 〈y − y(1), yˆ − y(1)〉 = 〈y − y(1), y(1) − y(1)〉 = 0. So y ∈ Y (1) and thusS∗ ⊆ Y (1).Next, suppose S∗ ⊆ Y (k) and we will show S∗ ⊆ Y (k+1). For any y ∈ S∗, it is obviousthat σA(k+1)(M∗y) ≤ σÂ(y) = d∗. By the fact that y(k+1) = projY (k)(yˆ) and y ∈ S∗ ⊆ Y (k),we can conclude that 〈y − y(k+1), yˆ − y(k+1)〉 ≤ 0. Therefore, S∗ ⊆ Y (k+1).We describe the relationship between iterates in the following technical lemma, which iscrutial for Theorem 7.0.3.Lemma 7.0.2 (Relationship between iterates). For all k > 0, we have the following resultsa)∥∥∥∥y(k) − 12(yˆ + y∗)∥∥∥∥ ≤ ‖yˆ − y∗‖2 for any y∗ ∈ S∗.24Chapter 7. Bundle-type two-stage algorithmAlgorithm 1 Generic bundle-type method1: (Set tolerance) δ > 02: (Initial point) y(1) ∈ B′ and a(1) ∈ FA(M∗y(1))3: (Initialize bundle) Construct A(1) such thata(1) ∈ A(1) ⊆ FA(M∗y(1))4: (Set center) yˆ = y(1)5: for k = 1, 2, ... do6: (Upper bound) U (k) = mini=1,....kσÂ(M∗y(i))7: (Gap) δ(k) = U (k) − d∗8: (Stopping criterion) δ(k) ≤ δ9: (Level set)Y (k) ={y ∈ B′∣∣∣ σA(k)(M∗y) ≤ d∗, 〈y − y(k), yˆ − y(k)〉 ≤ 0 }10: (Next iterate)y(k+1) = projY (k)(yˆ), a(k+1) ∈ FA(M∗y(k+1))11: (Update bundle) Construct A(k+1) satisfying (7.0.1)12: end for13: return A(k)b) ‖y(k+1) − yˆ‖2 − ‖y(k) − yˆ‖2 ≥ ‖y(k+1) − y(k)‖2.c) ‖y(k+1) − y(k)‖2 ≥ (σÂ(M∗y(k))− d∗)2‖Ma(k)‖2Proof. (a) Let w(k) = y(k) − 12(yˆ + y∗) and w∗ = y∗ − 12(yˆ + y∗). By Proposition 7.0.1, weknow that y∗ ∈ S∗ ⊆ Y (k), it then follows that0 ≥ 〈y∗ − y(k), yˆ − y(k)〉= 〈w∗ − w(k),−w∗ − w(k)〉= ‖w(k)‖2 − ‖w∗‖2.Therefore, ∥∥∥∥y(k) − 12(yˆ + y∗)∥∥∥∥ ≤ ∥∥∥∥y∗ − 12(yˆ + y∗)∥∥∥∥ = 12‖yˆ − y∗‖.(b) Since y(k+1) ∈ Y (k), it follows that0 ≥ 〈y(k+1) − y(k), yˆ − y(k)〉=12(‖y(k+1) − y(k)‖2 − ‖y(k+1) − yˆ‖2 + ‖y(k) − yˆ‖2).25Chapter 7. Bundle-type two-stage algorithm(c) By the construction of the atomic set A(k), we know that a(k) ∈ A(k). Now sincey(k+1) ∈ Y (k), it follows that〈a(k),M∗y(k+1)〉 ≤ d∗⇒ σÂ(M∗y(k)) + 〈a(k),M∗y(k+1) −M∗y(k)〉 ≤ d∗⇒ σÂ(M∗y(k))− d∗ ≤ 〈Ma(k), y(k+1) − y(k)〉 ≤ ‖Ma(k)‖‖y(k+1) − y(k)‖.The theorem is our theoretical fundation, which shows the convergence of Algorithm 1.Moreover, it shows that the iterates y(k) will converge to a point in solution set S∗ and thatpoint is actually the projection point of yˆ onto S∗.Theorem 7.0.3 (Convergence of Algorithm 1). The sequence { y(k) }∞k=1 converges to thepoint y∗ := projS∗(yˆ).Proof. First, we show that { y(k) }∞k=1 is bounded and limk→∞ ‖y(k+1)−y(k)‖ = 0. The bound-edness follows directly from Lemma 7.0.2(a). Lemma 7.0.2(b) implies that { ‖y(k) − yˆ‖ }∞k=1 isnondecreasing and bounded, and thus converges. Taking limits on both sides of Lemma 7.0.2(b),we can conclude thatlimk→∞‖y(k+1) − y(k)‖ = 0.Next, we show that limk→∞ σÂ(M∗y(k)) = d∗. It follows from Lemma 7.0.2(c) that0 = limk→∞‖y(k+1) − y(k)‖2≥ limk→∞(σÂ(M∗y(k))− d∗)2‖Ma(k)‖2 ≥ 0.Therefore, it follows that limk→∞ σÂ(M∗y(k)) = d∗. Then, we show that { y(k) }∞k=1 hasconvergent subsequence and every convergent subsequence of { y(k) }∞k=1 has a limit inS∗. Since { y(k) }∞k=1 is bounded, it has convergent subsequence. Let { y(j) }j∈J be someconvergent subsequence with limit y¯ ∈ B′. By Proposition 3.0.1(e), the support function islower-semicontinuous, and thusd∗ ≤ σÂ(M∗y¯)≤ lim infj∈JσÂ(M∗y(j))= limk→∞σÂ(M∗y(k)) = d∗,which implies y¯ ∈ S∗.Moreover, we show that every convergent subsequence of { y(k) }∞k=1 converges to y∗. Let{ y(j) }j∈J be some convergent subsequence with limit y¯. By the definition of projectionand Proposition 7.0.1,‖y(k) − yˆ‖ ≤ ‖y∗ − yˆ‖ for all k.267.1. Polyhedral Atomic SetThen it follows that ∀j ∈ J ,‖y(j) − y∗‖2 = ‖y(j) − yˆ − (y∗ − yˆ)‖2= ‖y(j) − yˆ‖2 + ‖y∗ − yˆ‖2 − 2〈y(j) − yˆ, y∗ − yˆ〉≤ 2(‖y∗ − yˆ‖2 − 〈y(j) − yˆ, y∗ − yˆ〉).By the convergence of { y(j) }j∈J ,lim supj∈J‖y(j) − y∗‖2 ≤ 2(‖y∗ − yˆ‖2 − 〈y¯ − yˆ, y∗ − yˆ〉).Besides, since y∗ = projS∗(yˆ) and y¯ ∈ S∗, we have〈yˆ − y∗, y¯ − y∗〉 ≤ 0.Also notice that‖‖y∗ − yˆ‖2 − 〈y¯ − yˆ, y∗ − yˆ〉 = 〈yˆ − y∗, y¯ − y∗〉.Combining the three equations above, we thus conclude thatlim supj∈J‖y(j) − y∗‖2 = 0.Since { y(j) }j∈J is convergent, it follows that { y(j) }j∈J converges to y∗. Finally, since everyconvergent subsequence of { y(k) }∞k=1 converges to y∗, we can thus conclude that { y(k) }∞k=1converges to y∗.We propose two approaches for constructing the candidate sets A(k), specialized forpolyhedral atomic sets and for spectral atomic sets. In the polyhedral case, we constructA(k) by the traditional polyhedral cutting-plane model described by Proposition 6.0.1 andshow in that the optimal atomic support is identified in finite time. In the spectral case, wefollow Helmberg and Rendl [HR00] who replace the polyhedral cutting-plane model by asemidefinite cutting-plane model as discussed in Section 7.2.7.1 Polyhedral Atomic SetIn the case where the atomic set A is finite, the convex hull Aˆ is polyhedral. Assume thefirst stage stops at iteration T with a candidate atomic set AT . Then we solve the primalatomic pursuit on the discovered atoms, namely the reduced problem (2.0.5) with S = AT .Algorithm 2 summarizes the method for solving (2.0.6) with polyhedral atomic set. It is aspecialized version of Algorithm 1, which applies generally.Proposition 7.1.1 (Polyhedral bundle update). The bundle update step (7.1.1) in Algo-rithm 2 satisfies the generic bundle update rule outlined in (7.0.1).277.1. Polyhedral Atomic SetAlgorithm 2 Polyhedral Bundle-type method1: (Set tolerance) δ > 02: (Initial point) y(1) ∈ B′ and a(1) ∈ FA(M∗y(1))3: (Initialize bundle) A(1) := { a(1) }4: (Set center) yˆ = y(1)5: for k = 1, 2, ... do6: (Upper bound) U (k) = mini=1,....kσÂ(M∗y(i))7: (Gap) δ(k) = U (k) − d∗8: (Stopping criterion 1) δ(k) ≤ δ9: (Level set)Y (k) ={y ∈ B′∣∣∣ σA(k)(M∗y) ≤ d∗, 〈y − y(k), yˆ − y(k)〉 ≤ 0 }10: (Next iterate)y(k+1) = projY (k)(yˆ), a(k+1) ∈ FA(M∗y(k+1))11: (Update bundle)A(k+1) = A(k)∪{a(k+1)}\{a ∈ A(k) | 〈a,M∗y(k+1)〉 < min {σA(k)(M∗y(k+1)), d∗ − δ }}(7.1.1)12: (Stopping criterion 2) A(k) = A(k+1)13: end for14: return A(k)Proof. By comparing (7.1.1) and (7.0.1), we only need to show thatFA(k)(M∗y(k+1)) ⊆{a ∈ A(k) | 〈a,M∗y(k+1)〉 ≥ min {σA(k)(M∗y(k+1)), d∗ − δ }}.This is indeed true, because for any a ∈ FA(k)(M∗y(k+1)), we have〈a,M∗y(k+1)〉 = σA(k)(M∗y(k+1))≥ min {σA(k)(M∗y(k+1)), d∗ − δ } .Theorem 7.1.2 (Convergence of primal estimate). Under stopping criterion 1, Algorithm 2stops in a finite number of iterations T . Moreover,0 ≤ pT − p∗ ≤ δd∗(d∗ − δ) ,where pT and p∗ are the optimal values, respectively, of (2.0.5) (with S = AT ) and (2.0.3).287.1. Polyhedral Atomic SetProof. By strong duality( Proposition 3.0.1(i)), we know thatσA(M∗y∗)γA(x∗) = 1,σAˆ(M∗yˆ)γAˆ(xˆ) = 1.Then it follows that,γAˆ(xˆ)− γA(x∗) =1σAˆ(M∗yˆ)− 1σA(M∗y∗)=σA(M∗y∗)− σAˆ(M∗yˆ)σAˆ(M∗yˆ)σA(M∗y∗)≤ d∗(d∗ − ) .Now by the fact that γAˆ(xˆ) ≥ γA(xˆ) ≥ γA(x∗), we can conclude thatγA(xˆ)− γA(x∗) ≤ d∗(d∗ − ) .The next result establishes that Algorithm 2 identifies the optimal support in finite time.Theorem 7.1.3 (Finite optimal support identification). Under stopping critoerion 2, Algo-rithm 2 stops in a finite number of iterations T , and S ⊆ AT for all S ∈ supp∗A.Proof. First, we show that the algorithm will terminate in finite steps with stopping criteria 2.DefineS(k)δ = {a ∈ A | 〈a,M∗yk〉 ≥ d∗ − δ}.By Theorem 7.0.3, we know that y(k) → y∗, where y∗ is some optimal solution to (2.0.6).Then there exist positive number K such that ∀k ≥ K,FA(M∗y∗) ⊆ S(k)δ .By the construction of A(k+1), we know that A(k+1) ⊆ S(k)δ for all k. We can thus concludethat there exist some finite number T such that AT+1 = AT .Next, we show that when the algorithm terminate, the bundle will contain suppA(x∗).From the discussion above, we know thatAT = S(T )δ and FA(M∗y∗) ⊆ S(T )δ .Then by Theorem 5.0.1, the result follows.297.2. Spectral Atomic Set7.2 Spectral Atomic SetAll the bunlde-type methods on the gauge dual (2.0.6) we have seen so far can interpretedas forming inner polyhedral approximations to the atomic set. However, when the atomicset is not polyhedral, which is usually the case when dealing with semidefinite programs,these polyhedral bundle-types do not perform that well. Can we form richer approximations,namely non-polyhedral, approximations to the atomic set? In [HR00], instead of thetraditional polyhedral cutting plane model, the author propose a semidefinite cutting planemodel that is formed by restricting the feasible set to an appropriate face of the semidefinitecone. Here we apply a similar idea.The spectral atomic set A = {uuT | ‖u‖2 = 1 } contains uncountably many atoms. Thesupport function corresponding to A has the explicit form σA(z) = max{λmax(z), 0} [FMP14,Proposition 7.2].Now consider problem (2.0.6). Let y(k) be the current iterate and P (k) be a n-by-rorthogonal matrix whose range intersects the leading eigenspace of M∗y(k). (In this setting,the adjoint operator maps m-vectors to n-by-n symmetric matrices.) Then we can build alocal spectral inner approximation of A byA(k) = {P (k)V P (k)T | V  0, tr(V ) ≤ 1 } .This definition only uses information from the current iterate y(k). Now we consider allthe previous iterates y(1), . . . , y(k−1). Following the aggregation step proposed by Helmbergand Rendl [HR00], we aggragate the information from previous iterates into a single matrixW (k) ∈ Â, and get a richer spectral inner approximation byA(k) = {αW (k) + P (k)V P (k)T | α+ tr(V ) ≤ 1, V  0 } . (7.2.1)The following result shows that the corresponding lower minorant is easy to compute.Proposition 7.2.1 (Spectral cutting-plane model). With A(k) as defined in (7.2.1), thespectral cutting-plane model is given byσA(k)(M∗y) = max { 0, λmax(P (k)TM∗yP (k)), 〈W (k),M∗y〉 } . (7.2.2)Proof.σA(k)(M∗y)= max{〈M∗y, αW (k) + P (k)V P (k)T 〉 | α ≥ 0, α+ tr(V ) ≤ 1, V  0}= max0≤α≤1{α〈W (k),M∗y〉+ (1− α) max{〈P (k)V P (k)T ,M∗y〉 | tr(V ) ≤ 1, V  0}}= max0≤α≤1{α〈W (k),M∗y〉+ (1− α) max { 0, λmax(P (k)TM∗yP (k)) }}= max { 0, λmax(P (k)TM∗yPk), 〈W (k),M∗y〉 }307.2. Spectral Atomic SetAlgorithm 3 Bundle-type method for spectral atomic sets1: (Set tolerance) δ > 02: (Initial point) y(1) ∈ B′ and choose some leading eigenvector v(1) of M∗y(1)3: (Initialize bundle) Construct A(1) as (7.2.1) whereW (1) = v(1)v(1)Tand P (1) = v(1)4: (Set center) yˆ = y(1)5: for k = 1, 2, ... do6: (Upper bound) U (k) = mini=1,....kσÂ(M∗y(i))7: (Gap) δ(k) = U (k) − d∗8: (Stopping criterion) δ(k) ≤ δ9: (Level set)Y (k) ={y ∈ B′∣∣∣ σA(k)(M∗y) ≤ d∗, 〈y − y(k), yˆ − y(k)〉 ≤ 0 }10: (Next iterate)y(k+1) = projY (k)(yˆ)11: (Update bundle) Construct A(k) as (7.2.1) with W (k) and P (k) defined in (7.2.3).12: end for13: return A(k)The update of the bundle is as follows. Take any matrix W¯ := α¯W (k) + P (k)V¯ P (k)Texposed by the latest iterate M∗y(k+1), i.e., 〈W¯ ,M∗y(k+1)〉 = σA(k)(M∗y(k+1)). The im-portant information of W¯ is contained in the spectrum spanned by the eigenvectors of V¯associated with maximal eigenvalues. Consider the eigenvalue decomposition V¯ = QΛQT ,where Λ = diag(λ1, . . . , λr) with λ1 ≥ · · · ≥ λr. Split the spectrum Λ into two parts:Λ1 = λ1I contains the maximal eigenvalue with possible multiplicity, and Λ2 contains theremaining eigenvalues. Let Q1 and Q2 be the corresponding eigenvectors. Then we updatethe bundle byW (k+1) =α¯W (k) + P (k)Q2Λ2QT2 P(k)Tα¯+ tr(Λ2),P (k+1) = orthog[P (k)Q1, v(k+1)],(7.2.3)where v(k+1) is any leading normalized eigenvector of M∗y(k+1). We describe the algorithmin Algorithm 3, which is also a specified version of the generic algorithm (Algorithm 1).Proposition 7.2.2 (Spectral bundle update). The bundle update step (7.2.3) in Algorithm 3satisfies the generic bundle update rule outlined in (7.0.1).317.2. Spectral Atomic SetProof. Since by construction, a(k+1) = v(k+1)v(k+1)Talways lies in A(k+1). We only needto show FA(k)(M∗y(k+1)) ⊂ A(k+1). Based on Proposition 7.2.1, we consier three differentcases.• Consider the case σA(k)(M∗y) = 0. In this case, we haveFA(k)(M∗y(k+1)) = { 0 } ⊂ A(k+1);• Consider the case σA(k)(M∗y) = λmax(P (k)TM∗yP (k)). In this case, we haveFA(k)(M∗y(k+1)) = { vvT | vTP (k)TM∗yP (k)v = λmax(P (k)TM∗yP (k)) }= {QT1 P (k)TV P (k)Q1 | tr(V ) = 1, V  0 }⊂ A(k+1);• Consider the case σA(k)(M∗y) = 〈W (k),M∗y〉. In this case, we haveFA(k)(M∗y(k+1)) = {W (k) } ⊂ A(k+1).Because the spectral atomic set is a continuum, we do not expect to exactly obtainthe optimal atomic support, and thus exact recovery of the primal solution in finite timeis not possible. Friedlander and Maceˆdo [FM16, Corollary 4] show that an approximateprimal solution can be recovered by solving a semidefinite least-squares problem. Given aset of candidate optimal atoms A(T ), an approximate primal solution can be obtained as thesolution ofminimizex∈Rn12‖Mx− b‖22 subject to x ∈1d∗A(T ). (7.2.4)It is shown by Ding in [DYC+19] that under certain assumptions, the recovery qualitycan be guaranteed.Assumption 7.2.3 (Assumptions to ensure recovery quality). The following three assump-tions are critical for the recovery guarantee of (7.2.4).a) (Uniqueness) Both primal problem (P) and dual problem (D) in (3.0.1) have uniquesolution x∗ and y∗.b) (Strong duality) Every solution pair (x∗, y∗) satisfies the strong duality, as shownin Proposition 3.0.1(i):1 = 〈x∗,M∗y∗〉 = γA(x∗)σÂ(M∗y∗).c) (Strict complementarity) Every solution pair (x∗, y∗) satisfies the strict complementaritycondition:rank(x∗) + rank(M∗y∗) = n.327.2. Spectral Atomic SetTheorem 7.2.4 (Spectral recovery guarantee). Assume Assumption 7.2.3 holds and thefirst stage stops at iteration T with UT − d∗ ≤ δ. Let x∗ and x denote the optimizer for(2.0.3) and (7.2.4) respectively. Then‖x∗ − x‖F = O(√δ)for any solution x∗ of (2.0.3).The proof for this theorem follows directly from Ding et al. [DYC+19, Theorem 1.2]with minor modification.33Chapter 8Experiments8.1 Basis pursuit denoisingThe basis pursuit denoising (BPDN) [CDS01] problem arises in sparse recovery applications.Let A : Rn → Rm be some measurement matrix. Let x0 denote the original signal andb = Ax0 be the vector of observations, where x0 is sparse and the observation b might benoisy. For some expected noise level  > 0, the BPDN model isminimizex∈Rn‖x‖1 subject to ‖Ax− b‖2 ≤ . (8.1.1)In this case, the atomic set A is the set of signed unit one-hot vectors A = {±e1, . . . ,±en },the linear measurement operator M is the matrix A and B is the 2-norm ball centered at bwith radius .And follow the discussion in Example 4.0.1, we can see that the corresponding gaugedual problem is given by:minimizey∈Rm‖AT y‖∞ subject to y ∈ B′, (8.1.2)where B′ = { y | 〈b, y〉 − ‖y‖2 ≥ 1 } follows directly from the definition of antipolar (2.0.8).0 1000 2000-0.2-0.100.10.20.3Original coefficientsOptimal coefficientsRecovered coefficientsFigure 8.1.1: Original signal vs Optimal signal computed by cvx vs Recovered signal.348.2. Phase retrieval0 20 40 60iteration10-1010-5100valuegapupper boundlower boundFigure 8.1.2: Convergence behavior of dual objective.0 20 40 60iteration0204060sizesize of bundlesize of intersection with ground truthsize of intersection with optimalFigure 8.1.3: Size of bundle and its intersection with the atomic support for both original signal andoptimal signal.8.2 Phase retrievalPhase retrieval is a problem of recovering a signal from magnitude-only measurements.Specifically, let x0 ∈ Rn be some unknown signal and the measurements are given bybi = |〈x0,mi〉|2, where each vector mi encodes illumination i = 1, . . . ,m. Cande´s etal. [CESV15] advocate “lifting” the signal as X0 = x0xT0 so that the measurements are linearin X0:bi = 〈x0xT0 ,mimTi 〉 = 〈X0,Mi〉, i = 1, . . . , k,358.2. Phase retrievalwhere Mi = mimTi . The following semidefinite program can be used to recover X∗ ≈ x0xT0 :minimizeXtr(X) + δ0(X)subject to 〈X,Mi〉 = bi, i = 1, . . . ,m.(8.2.1)Define A as the set of normalized positive semidefinite rank-1 matrices, and define the linearoperator M asM(X) = [〈X,Mi〉]i=1,...,m . (8.2.2)Then the atomic pursuit problem (2.0.3) is equivalent to (8.2.1) with B = {b}.And follow the discussion in Example 4.0.5, we can see that the corresponding gaugedual problem is given by:minimizey∈Rmmax { 0, λmax(M∗y) } subject to 〈b, y〉 ≥ 1, (8.2.3)where the joint operator M∗ is defined to beM∗y =m∑i=1yiMi.In order to measure the distance between the atomic support and the bundle, we use theHausdorff distance from the point X0 to the convex hull of the set A(k):dH(X0, A(k)) =∥∥∥∥ X0‖X0‖F − projA(k)(X0‖X0‖F)∥∥∥∥F. (8.2.4)10 iterations 20 iterations 30 iterations 40 iterations50 iterations 60 iterations 70 iterations 80 iterations90 iterations 100 iterations 110 iterations 120 iterations130 iterations 140 iterations 150 iterations 160 iterations170 iterations 180 iterations 190 iterations 200 iterationsFigure 8.2.1: Images recovered at intermediate iterations of the spectral bundle method Algorithm 3for recovering a ground-truth image.368.2. Phase retrieval0 50 100 150 200iteration10-2100102 optimal value gapprimal infeasibilityHausdorff distanceFigure 8.2.2: Convergence of objective value, primal infeasibility, and Hausdorff distance to theoptimal atomic support.37Chapter 9ConclusionConvex optimization formulations of inverse problems often come with very strong recoveryguarantees, but the formulations may be too large to be practical for large problems. Thisis especially true of spectral problems, which require very expensive computational kernels.Our atomic pursuit approach shifts the focus from the solution of the full convex optimizationproblem to a sequence of “reduced” problems meant to expose the constituent atoms thatform the final solution. In some sense, atomic pursuit is a generalization of more classicalactive-set algorithms. Future avenues of research include the design of specialized SDPsolvers for the solution of the highly-structured bundle subproblems, and applying thealgorithm framework to other atomic sets.38Bibliography[ABD+17] Alexandre Y Aravkin, James V Burke, Dmitriy Drusvyatskiy, Michael P Fried-lander, and Kellie MacPhee. Foundations of gauge and perspective duality.arXiv preprint arXiv:1702.08649, 2017.[ALMT14] Dennis Amelunxen, Martin Lotz, Michael B McCoy, and Joel A Tropp. Living onthe edge: Phase transitions in convex programs with random data. Informationand Inference: A Journal of the IMA, 3(3):224–294, 2014.[B+13] Francis Bach et al. Learning with submodular functions: A convex optimizationperspective. Foundations and Trends R© in Machine Learning, 6(2-3):145–373,2013.[Bel] Alexandre Belloni. Lecture notes for iap 2005 course introduction to bundlemethods.[BKL95] Ulf Bra¨nnlund, Krzysztof C Kiwiel, and Per Olof Lindberg. A descent proximallevel bundle method for convex nondifferentiable optimization. OperationsResearch Letters, 17(3):121–126, 1995.[CdO14] JY Bello Cruz and Welington de Oliveira. Level bundle-like algorithms forconvex optimization. Journal of Global Optimization, 59(4):787–809, 2014.[CDS01] Scott Shaobing Chen, David L Donoho, and Michael A Saunders. Atomicdecomposition by basis pursuit. SIAM review, 43(1):129–159, 2001.[CESV15] Emmanuel J Candes, Yonina C Eldar, Thomas Strohmer, and Vladislav Voronin-ski. Phase retrieval via matrix completion. SIAM review, 57(2):225–251, 2015.[CRPW12] Venkat Chandrasekaran, Benjamin Recht, Pablo A Parrilo, and Alan S Willsky.The convex geometry of linear inverse problems. Foundations of Computationalmathematics, 12(6):805–849, 2012.[CRT04] Emmanuel Candes, Justin Romberg, and Terence Tao. Robust uncertainty prin-ciples: Exact signal reconstruction from highly incomplete frequency information.arXiv preprint math/0409186, 2004.[Don06] D.L. Donoho. For most large underdetermined systems of linear equations theminimal `1-norm solution is also the sparsest solution. Communications on pureand applied mathematics, 59(6):797–829, 2006.39Bibliography[DYC+19] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell.An optimal-storage approach to semidefinite programming using approximatecomplementarity. arXiv preprint arXiv:1902.03373, 2019.[FM16] Michael P Friedlander and Ives Macedo. Low-rank spectral optimization viagauge duality. SIAM Journal on Scientific Computing, 38(3):A1616–A1638,2016.[FMP14] Michael P Friedlander, Ives Macedo, and Ting Kei Pong. Gauge optimizationand duality. SIAM Journal on Optimization, 24(4):1999–2022, 2014.[HR00] Christoph Helmberg and Franz Rendl. A spectral bundle method for semidefiniteprogramming. SIAM Journal on Optimization, 10(3):673–696, 2000.[HUL12] Jean-Baptiste Hiriart-Urruty and Claude Lemare´chal. Fundamentals of convexanalysis. Springer Science & Business Media, 2012.[Kel60] James E Kelley, Jr. The cutting-plane method for solving convex programs.Journal of the society for Industrial and Applied Mathematics, 8(4):703–712,1960.[Kiw90] Krzysztof C Kiwiel. Proximity control in bundle methods for convex nondiffer-entiable minimization. Mathematical programming, 46(1-3):105–122, 1990.[LNN95] Claude Lemare´chal, Arkadii Nemirovskii, and Yurii Nesterov. New variants ofbundle methods. Mathematical programming, 69(1-3):111–147, 1995.[Lov83] La´szlo´ Lova´sz. Submodular functions and convexity. In Mathematical Program-ming The State of the Art, pages 235–257. Springer, 1983.[NY83] Arkadii Semenovich Nemirovsky and David Borisovich Yudin. Problem com-plexity and method efficiency in optimization. 1983.[OJV11] Guillaume Obozinski, Laurent Jacob, and Jean-Philippe Vert. Group lasso withoverlaps: the latent group lasso approach. arXiv preprint arXiv:1110.0413, 2011.[RFP10] Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAMreview, 52(3):471–501, 2010.[Roc97] R Tyrrell Rockafellar. Convex analysis. princeton landmarks in mathematics,1997.[RXH08] Benjamin Recht, Weiyu Xu, and Babak Hassibi. Necessary and sufficientconditions for success of the nuclear norm heuristic for rank minimization. In2008 47th IEEE Conference on Decision and Control, pages 3065–3070. IEEE,2008.40

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0380495/manifest

Comment

Related Items