Gauge duality and low-rankspectral optimizationbyIves José de Albuquerque Macêdo JúniorB.Sc., Universidade Federal de Pernambuco, 2005M.Sc., Associação Instituto Nacional de Matemática Pura e Aplicada, 2007D.Sc., Associação Instituto Nacional de Matemática Pura e Aplicada, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinThe Faculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)December 2015c© Ives José de Albuquerque Macêdo Júnior 2015AbstractThe emergence of compressed sensing and its impact on various applications insignal processing and machine learning has sparked an interest in generalizingits concepts and techniques to inverse problems that involve quadratic mea-surements. Important recent developments borrow ideas from matrix liftingtechniques in combinatorial optimization and result in convex optimizationproblems characterized by solutions with very low rank, and by linear opera-tors that are best treated with matrix-free approaches. Typical applicationsgive rise to enormous optimization problems that challenge even the very bestworkhorse algorithms and numerical solvers for semidefinite programming.The work presented in this thesis focuses on the class of low-rank spectraloptimization problems and its connection with a theoretical duality frameworkfor gauge functions introduced in a seminal paper by Freund (1987). Throughthis connection, we formulate a related eigenvalue optimization problem moreamenable to the design of specialized algorithms that scale well and can beused in practical applications.We begin by exploring the theory of gauge duality focusing on a slightlyspecialized structure often encountered in the motivating inverse problems.These developments are still made in a rather abstract form, thus allowing forits application to different problem classes.What follows is a connection of this framework with two important classesof spectral optimization problems commonly found in the literature: traceminimization in the cone of positive semidefinite matrices and affine nuclearnorm minimization. This leads us to a convex eigenvalue optimization problemwith rather simple constraints, and involving a number of variables equal tothe number of measurements, thus with dimension far smaller than the primal.The last part of this thesis exploits a sense of strong duality between theprimal-dual pair of gauge problems to characterize their solutions and to devisea method for retrieving a primal minimizer from a dual one. This allows usto design and implement a proof of concept solver which compares favorablyagainst solvers designed specifically for the PhaseLift formulation of the cele-brated phase recovery problem and a scenario of blind image deconvolution.iiPrefaceThe work presented herein is the product of a joint work and active col-laboration between myself and my supervisor, Professor Michael Friedlander.The material that appears in Chapter 2 and parts of Chapter 3 are the resultof additional collaboration with Dr. Ting Kei Pong while he was a postdocat UBC. The developments that result from this joint work and collaborationhave been published in the SIAM Journal on Optimization; see Friedlander,Macêdo, and Pong (2014).Some parts of Chapter 3 as well as the developments and results appearingin Chapters 4 and 5 have been submitted to the SIAM Journal on ScientificComputing and made publicly available in preprint form; see Friedlander andMacêdo (2015).The proof of concept implementation described in Chapter 4 was developedin Matlab jointly by myself and Professor Michael Friedlander, while thespecialized solvers against which we compare our results in Chapter 5 werekindly provided by some of their authors.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 From sparse to low-rank signal recovery . . . . . . . . . . . . . 21.1.1 Phase retrieval via matrix lifting . . . . . . . . . . . . 81.1.2 Blind deconvolution and biconvex compressive sensing 141.2 Norm-minimization and gauge duality . . . . . . . . . . . . . . 171.3 Thesis overview and contributions . . . . . . . . . . . . . . . . 252 Gauge optimization and duality . . . . . . . . . . . . . . . . . 272.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.1.1 Polar sets . . . . . . . . . . . . . . . . . . . . . . . . . 302.1.2 Gauge functions . . . . . . . . . . . . . . . . . . . . . . 312.1.3 Antipolar sets . . . . . . . . . . . . . . . . . . . . . . . 352.2 Antipolar calculus . . . . . . . . . . . . . . . . . . . . . . . . . 362.2.1 Linear transformations . . . . . . . . . . . . . . . . . . 372.2.2 Unions and intersections . . . . . . . . . . . . . . . . . 402.3 Duality derivations . . . . . . . . . . . . . . . . . . . . . . . . 412.3.1 The gauge dual . . . . . . . . . . . . . . . . . . . . . . 432.3.2 Lagrange duality . . . . . . . . . . . . . . . . . . . . . 45ivTable of Contents2.4 Strong duality . . . . . . . . . . . . . . . . . . . . . . . . . . . 462.5 Variational properties of the gauge value function . . . . . . . 502.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522.6.1 Composition and conic side constraints . . . . . . . . . 522.6.2 Nonnegative conic optimization . . . . . . . . . . . . . 543 Spectral gauge optimization and duality . . . . . . . . . . . . 563.1 Semidefinite nonnegative conic optimization . . . . . . . . . . 563.1.1 Trace minimization in the PSD cone . . . . . . . . . . 573.2 Nuclear-norm minimization . . . . . . . . . . . . . . . . . . . 603.2.1 Reduction to PSD trace minimization . . . . . . . . . . 623.3 Spectral lower models for convex minimization . . . . . . . . . 633.3.1 Spectral lower models for the dual gauge objective . . . 643.3.2 Proximal-bundle subproblems . . . . . . . . . . . . . . 674 Low-rank spectral optimization . . . . . . . . . . . . . . . . . . 714.1 Derivation of the approach . . . . . . . . . . . . . . . . . . . . 734.1.1 Recovering a primal solution . . . . . . . . . . . . . . . 734.1.2 Primal recovery subproblem . . . . . . . . . . . . . . . 754.2 The implementation of a proof of concept solver . . . . . . . . 774.2.1 Dual descent . . . . . . . . . . . . . . . . . . . . . . . 774.2.2 Primal recovery . . . . . . . . . . . . . . . . . . . . . . 804.2.3 Primal-dual refinement . . . . . . . . . . . . . . . . . . 804.3 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814.3.1 Weighted trace minimization . . . . . . . . . . . . . . . 824.3.2 Weighted affine nuclear-norm minimization . . . . . . . 835 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . 865.1 Phase recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 865.1.1 Random Gaussian signals . . . . . . . . . . . . . . . . 875.1.2 Random problems with noise . . . . . . . . . . . . . . 885.1.3 Two-dimensional signal . . . . . . . . . . . . . . . . . . 895.2 Blind deconvolution . . . . . . . . . . . . . . . . . . . . . . . . 916 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96vList of TablesTable 2.1 The main rules of the antipolar calculus . . . . . . . . . 37Table 5.1 Phase retrieval comparisons for random complex Gaus-sian signals of size n = 128 measured using random com-plex Gaussian masks . . . . . . . . . . . . . . . . . . . . 88Table 5.2 Phase retrieval comparisons for problems with noise . . 89Table 5.3 Phase retrieval comparisons on a 2-dimensional image. . 90Table 5.4 Blind deconvolution comparisons. . . . . . . . . . . . . . 92viList of FiguresFigure 1.1 Unit balls of the 1-, 2- and max-norms in R2 . . . . . . 3Figure 1.2 Unit balls of the 1-, 2- and max-norms in R3 . . . . . . 3Figure 1.3 Unit balls of the nuclear- and operator-norms for realsymmetric 2-matrices . . . . . . . . . . . . . . . . . . . 5Figure 1.4 Unit ball of the nuclear-norm for 2-by-2 real symmetricmatrices . . . . . . . . . . . . . . . . . . . . . . . . . . 6Figure 1.5 Unit ball of the operator-norm for 2-by-2 real symmetricmatrices . . . . . . . . . . . . . . . . . . . . . . . . . . 6Figure 1.6 Visualizing the structural content encoded in the Fourierphase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Figure 2.1 Visual depiction of the counter-example described in Ex-ample 2.2.1 . . . . . . . . . . . . . . . . . . . . . . . . 42Figure 3.1 Visualizing the geometry of the gauge dual feasible set 59Figure 3.2 Generalized unit balls of the trace-gauge and its polarfor real symmetric 2-by-2 matrices . . . . . . . . . . . . 60Figure 3.3 Generalized unit ball of the trace-gauge for real symmet-ric 2-matrices . . . . . . . . . . . . . . . . . . . . . . . 61Figure 3.4 Generalized unit ball of the polar to the trace-gauge forreal symmetric 2-matrices . . . . . . . . . . . . . . . . 61Figure 5.1 Image used for large phase retrieval experiment (≈2MP) 90Figure 5.2 Images used for the blind deconvolution experiments . 92viiAcknowledgmentsFirst and foremost, I would like to express my utmost respect, admiration andgratitute to my supervisor, Professor Michael Friedlander. During all theseyears working together, he has provided me insightful guidance, enthusiasticencouragement and tireless support. His work ethic and attitude towardscollaboration has given me an example to aim for.I would like to further express my gratitude to my colleages Ting KeiPong, Nathan Krislock, and Ernie Esser, who has left us way too young andthis World a bit less friendly and jovial. Our numerous and insightful dis-cussions and colaboration have had a great impact on me and broadened myperspectives. I am thankful and priviledged to have crossed paths with you.Thank you to professors Nick Harvey and Philip Loewen for very kindlyaccepting to be part of my Supervisory Committee and providing insightfuladvice during early and later stages of this research. To Joyce Poon, for makingmy life so much easier by taking care of and providing guidance through somany administrative matters.I am also grateful to professors Eldad Haber and Yaniv Plan to generouslyaccepting to be part of my committee as University Examiners and for theirthoughtful feedback on this research. I am humbled and thankful to ProfessorRobert Freund for his acceptance to serve as my External Examiner and forhis presence and comments during my Final Doctoral Exam.Without the financial support provided by awards from the Universityof British Columbia and its Department of Computer Science, by means oftheir Four-Year Doctoral Fellowship (4YF) and International Partial TuitionScholarship, I would have not been able to even start this endeavour. For that,I am very grateful.I am thankful to my colleagues Gabriel Goh and Julie Nutini for many in-sightful discussions and great times during our years as students under Michaeland to this day. My times in the Scientific Computing Lab were filled withgreat interactions and good-hearted discussions with them, Farbod Roosta-Khorasani, Michael Wathen, Essex Edwards and, in earlier times, Ewout vanden Berg, who convinced me a second doctorate might make sense if workingwith Michael Friedlander.viiiAcknowledgmentsMy years at UBC provided me with the opportunity to meet some trulyamazing people. I am deeply grateful to the numerous friends I made whileliving in St. John’s College and while part of UBC’s Salsa-Rueda and LatinDance clubs. You guys kept me sane throughout most of this crazy time.A special thank you to my very dear friends Cristina, Emilio, Clarissa,Julio, Mario and Patricia. You guys made my visits to Calgary so very muchwarm, no matter how cold it was outside. With you, I would feel like I wasback home.Toè, I am grateful for your support, companionship, patience and love.We endured together much pain, frustration and anxiety that is often involvedin the later stages of a doctorate. Without you by my side, this would havebeen a much steeper journey.At last, I am grateful and forever indebted to the unconditional love andneverending emotional support from my parents, Rosélis and Ives, and mysisters, Catarine and Jéssica. You are my constants.ixToMy Parents, Rosélis and Ives,&My Sisters, Catarine and Jéssica.xChapter 1IntroductionRecovering a signal from partial, and possibly corrupted, measurements isa fundamental and ubiquitous problem in science and engineering. In manyapplications, the nature of the measurement acquisition process limits the typeand number of observations available for recovery. This limited availability ofdata in relation to the unknown signal’s ambient space often leads to highlyill-posed inverse problems.Common approaches to resolve this ambiguity often lead to optimizationproblems whose solution one wishes to (perhaps approximately) reproducethe measurements while also exhibiting some application-dependent extremalproperty expected from a good estimate. Classical examples of these ap-proaches are Tikhonov/2-norm regularization and the problem of obtaininga minimum-norm solution to an underdetermined linear system.The past decade has seen the emergence and fast growth of sparse reco-very and compressed sensing, where one expects the signal to possess somelow-dimensional structure, typically represented by the number of coefficientsnecessary for its representation as a linear combination of elements from somebasis or larger dictionary. This approach has received much interest for itstheoretical recovery guarantees and the development of effective computationalmethods for solving the challenging optimization problems that are required.Motivated by the successful application of these developments, there hasbeen a growing body of work that extends concepts and techniques from sparserecovery to problems with different senses of low-dimensionality and moregeneral measurements. Recent examples include extensions of sparse recoveryresults to low-rank matrices—in which the rank of a matrix is the counterpartto the number of nonzeros in a vector—and problems where the measurementsare quadratic, which are then seen as linear measurements from a “lifted”matrix space in which the unknown signal is rank-1.For the latter examples, there have been suitable theoretical recoveryresults, but the computational cost associated to the lifting technique haslimited the scale of the problems and the practical use of these approaches.The main purpose of this thesis is to study the low-rank spectral optimizationproblems that arise from these techniques and propose numerical methods able11.1. From sparse to low-rank signal recoveryto handle large-scale problems that arise in practice.In the following sections we contextualize and introduce the problems thatmotivate this work along with existing approaches for their solution. We thendescribe the class of gauge optimization problems, which neatly captures theseand a wide variety of other formulations that arise from inverse problems, andprovides the basis for our subsequent developments.1.1 From sparse to low-rank signal recoveryIn a number of applications of signal processing, it has been observed thatdigital signals admit compact approximate representations using techniquessuch as transform coding, i.e., they can be well represented by just a fewelements from a suitable basis or dictionary of functions and their coefficients.This simple observation forms the basis for many compression schemes,such as the ubiquitous JPEG (Taubman and Marcellin, 2001), and naturallyleads to the ambitious endeavour of searching for an optimally sparse signalthat satisfies a set of measurements.More formally, let s ∈ Rd be a vector representation of our digital signal,Φ ∈ Rd×n a basis or overcomplete dictionary of elementary signals (i.e., d ≤ nand rank Φ = d), Ψ ∈ Rm×d a model for a linear measurement process andb ∈ Rm a vector of possibily corrupted observations under this process. Thisquest for a sparsest representation of s with respect to Φ, while approximatingthe measurements ‖b−Ψs‖2 ≤ , can be posed as the optimization problemminimizex∈Rn‖x‖0 subject to ‖b− Ax‖2 ≤ ,where s = Φx, A = ΨΦ ∈ Rm×n and ‖x‖0 := #{i = 1, . . . , n |xi 6= 0}.Although easy to state, this problem is extremely difficult to solve in general,in fact it is known to be NP-hard; c.f. Natarajan (1995).While studying exact sparse overcomplete representations of signals, Chen,Donoho, and Saunders (2001) propose a convex relaxation that substitutesthe function ‖·‖0 by the 1-norm of vectors. This approach leads to a convexminimization problem of the formminimizex∈Rn‖x‖1 subject to ‖b− Ax‖2 ≤ ,where ‖x‖1 :=∑ni=1 |xi| , which we refer to as the basis pursuit denoise (BPDN)problem. This convex optimization problem can be solved in practice using anumber of techniques; e.g., see van den Berg and Friedlander (2011).21.1. From sparse to low-rank signal recoveryFigure 1.1: Unit balls of the 1-, 2- and max-norms in R2.Figure 1.2: Unit balls of the 1-, 2- and max-norms in R3.By looking at the unit balls of different p-norms (Figures 1.1 and 1.2), onecan intuitively observe that a minimum-norm point on an affine subspace ingeneral position is most likely to be sparse if the norm used is the 1-norm.The practical observation that basis pursuit ( = 0) solutions often leadto an exact recovery of sparse signals led to further study of the relationshipsbetween the original and the convexified problems. This convex optimizationapproach to sparse recovery gained tremendous momentum with theoreticalresults providing a number of conditions under which these two problems arein fact equivalent, or at least bound the error for the BPDN solution—e.g.,see Candès and Tao (2005); Donoho (2006); Bruckstein, Donoho, and Elad(2009).Motivated by the practical success and available theoretical tools and tech-niques developed for compressed sensing based on the 1-norm heuristic, therehas been an effort to extend this framework to other classes of problemsin which different notions of complexity play roles analogous to that of thecardinality (i.e., number of nonzeros) in sparse recovery.31.1. From sparse to low-rank signal recoveryMost notable is the generalization to matrices in which the rank functionserves as the matrix counterpart to the number of nonzeros in a vector. Appli-cations in which a low-rank matrix is sought appear in many disciplines andexamples include: factor analysis, where one looks for low-degree statisticalmodels; the computation of low-order controllers or realizations of linear sys-tems in control theory; and finding low-dimensional Euclidean embeddings.In such scenarios, one seeks a lowest-rank matrix satisfying a number of linearmeasurements (possibly within a bound on the misfit), leading to the problemminimizeX∈Rn1×n2‖X‖0 subject to ‖b−AX‖2 ≤ , (1.1)where A : Rn1×n2 → Rm is a linear map, ‖X‖0 := ‖σ(X)‖0 = rankX andσ : Rn1×n2 → Rnmin+ maps a matrix to the vector of its singular values innonincreasing order (nmin := min{n1, n2}), i.e., σ1(X) ≥ . . . ≥ σnmin(X) ≥ 0.This is indeed a generalization: a vector can be encoded as a diagonal matrixand the rank function then gives the vector’s cardinality.By observing the characterization of the rank function in terms of thecardinality of the vector of singular values, one can naturally devise a con-vex relaxation by taking the 1-norm of that vector. This is in fact the convexheuristic for rank minimization advocated and proved tightest by Fazel, Hindi,and Boyd (2001); see also Fazel (2002). This approach leads to convex opti-mization problems of the formminimizeX∈Rn1×n2‖X‖1 subject to ‖b−AX‖2 ≤ , (1.2)where ‖X‖1 := ‖σ(X)‖1 =∑nmini=1 σi(X) is the nuclear norm of a matrix (alsoknown as the trace norm, Ky Fan n-norm and Schatten 1-norm).Remark 1.1.1. We shall adopt the Schatten p-norm notation throughout,i.e., ‖X‖p := ‖σ(X)‖p with 1 ≤ p ≤ ∞, which closely parallels the notationof the vector case. A peculiar consequence of this choice of notation is that‖X‖2 will in fact denote the Frobenius norm of X, as opposed to its operatornorm denoted by ‖X‖∞ = ‖σ(X)‖∞ = σ1(X).Figure 1.3 depicts the unit balls of the nuclear and operator norms ina parameterization of the space of real symmetric 2-by-2 matrices (in thisparameterization, the Frobenius norm has the same geometry as that of the2-norm in R3, depicted in Figure 1.2). The brighter kinks and edges depictthe extreme points of these convex bodies. In the case of the nuclear norm,the brighter points along the circumferences forming the edges of the cylindercorrespond to rank-1 matrices, which intuitively justifies the minimization of41.1. From sparse to low-rank signal recoveryFigure 1.3: Unit balls of the nuclear- and operator-norms for real symmetric 2-matrices. Depictions above correspond to the parameterization[x, z√2; z√2, y].Figures 1.4 and 1.5 provide different viewing positions.this norm in the search for low-rank solutions. Figures 1.4 and 1.5 providedifferent viewpoints to better visualize the geometry of these sets.Early work by Recht, Fazel, and Parrilo (2010) exploited this connectionbetween the 1-norm relaxation from sparse recovery and the nuclear-normheuristic from rank minimization to extend concepts and techniques from com-pressed sensing and provide the first sufficient guarantees for the recovery oflow-rank matrices. Central to their work is a suitable generalization of therestricted isometry property (RIP) to operators on low-rank matrices, intro-duced first for sparse vectors by Candès and Tao (2005).Definition 1.1.1 (Restricted Isometry Property). Let A : Rn1×n2 → Rmbe a linear map and nmin := min{n1, n2}. For every 1 ≤ r ≤ nmin, define ther-restricted isometry constant to be the smallest number δr(A) such that(1− δr) ‖X‖2 ≤ ‖AX‖2 ≤ (1 + δr) ‖X‖2holds for all matrices X ∈ Rn1×n2 with rankX ≤ r.With this RIP concept, X0 ∈ Rn1×n2 a fixed matrix of rank at most r andb := AX0, the following result provides a condition for injectivity of A on theset of matrices with rank bounded by r.51.1. From sparse to low-rank signal recoveryFigure 1.4: Unit ball of the nuclear-norm for 2-by-2 real symmetric matrices.Figure 1.5: Unit ball of the operator-norm for 2-by-2 real symmetric matrices.61.1. From sparse to low-rank signal recoveryTheorem 1.1.1 (Recht et al. (2010)). Suppose that δ2r < 1 for some integerr ≥ 1. Then X0 is the only matrix of rank at most r satisfying AX = b.The following result then provides a connection between rank minimiza-tion and its convex relaxation. Let X∗ be a solution of the nuclear-normminimization problem (1.2) with = 0 and the remaining data as above.Theorem 1.1.2 (Recht et al. (2010)). Suppose that r ≥ 1 is such thatδ5r < 1/10. Then X∗ = X0.Recht et al. (2010) then exploited these sufficient conditions to provide afamily of random measurement maps A for which RIP is satisfied with highprobability whenever m is sufficiently large, however still asymptotically muchsmaller than n1n2.A typical example of such constructions with probabilistic guarantees isthe case when the entries of A are assumed i.i.d. Gaussian random variableswith zero mean and variance 1/m. The result below is but a particular case ofa theorem of Recht et al. (2010).Theorem 1.1.3 (Recht et al. (2010, Theorem 4.2)). Fix 0 < δ < 1.If A is a random variable as above, then, for every 1 ≤ r ≤ nmin, there existpositive constants c0 and c1 depending only on δ such that, with probabilityat least 1− exp(−c1m), δr(A) ≤ δ whenever m ≥ c0r(n1 + n2) log(n1n2).Because the number of degrees of freedom of n1 × n2 real matrices with rankat most r is r(n1 +n2− r), this result ensures that the sufficient recoverabilityconditions above are very likely satisfied without having to perform a numberof measurements anywhere near the dimension of the ambient space (= n1n2)with a qualitatively modest increase on the underlying dimensionality of theproblem.Although these results generalize in a natural manner their counterparts insparse recovery, the measurement process induced by such families of operators—in which one essentially measures random projections of the unknown matrixand accesses information about all of its entries—does not commonly arise inpractice.Later recovery results studied the problem ofmatrix completion (also knownas collaborative filtering or the Netflix problem), in which one seeks to re-trieve a low-rank matrix by observing a relatively small subset of its entries.Motivated by this problem, Candès and Recht (2009) introduced similar pro-babilistic results under a more practical measurement scenario. Avoiding a71.1. From sparse to low-rank signal recoverynumber of technical definitions, their results essentially state that “most n-by-n matrices” of rank r can be recovered exactly, with high probability, vianuclear-norm minimization from m entries observed uniformly at random aslong as m ≥ Crn1+c log n, for constants C and 1/5 ≤ c ≤ 1/4.These works sparked an interest in providing more refined conditions underwhich recoverability of low-rank matrices can be attained via nuclear normminimization under a variety of linear measurement models, these includeRecht (2011); Recht, Xu, and Hassibi (2011); Eldar, Needell, and Plan (2012).The salient characteristics in these results relevant for the design of struc-tured optimization methods are that the number of measurements is consid-erably smaller than the dimension of the ambient matrix space, as well as thetypical rank expected in the solution is also low, while the linear measure-ment operator admits fast evaluation on factored low-rank matrices as doesthe product of its adjoint (and transpose) to a vector, i.e., (A∗y)v and (A∗y)Tu.In the following subsections we describe and contextualize two importantproblems in scientific imaging and signal processing to which these techniquesfor low-rank matrix completion have been extended. Interesting characteristicsshared by these two problems are that they involve quadratic measurements intheir nonconvex formulations, typically involve large dimensions already in thisoriginal form, and are expected to lead to rank-1 solutions of their relaxations.1.1.1 Phase retrieval via matrix liftingIn a number of imaging science applications, the recovery of a complex sig-nal from magnitude observations of its Fourier transform is a fundamentalproblem. In such applications, physical properties of the measurement systemtypically prevent gathering phase information or at least without incurringlarge errors (Millane, 2006).Since much of the structural content of a signal is encoded in its phaseinformation, the problem of phase retrieval appears naturally and must betackled. In Figure 1.6 we illustrate this property by computing the Fouriertransform of two real images, swapping their phases, and visualizing the resultsfrom their inverse Fourier transforms—independently across color channels.Assuming that magnitude measurements are available, one can think of amodel in which the signal is first probed by a linear process prior to observingits magnitudes. In such a scenario, we seek to recover a complex signal x ∈ Cnfrom measurements of the form |Ax| , where |·| denotes the componentwiseabsolute-value map and A ∈ Cm×n models the initial probing process, in whichone would naturally expect m ≥ n. An example is the case when A = F thediscrete Fourier transform (DFT) matrix.81.1. From sparse to low-rank signal recovery(a) Signal x1 (b) Signal x2(c) F−1{|Fx1| (Fx2)/ |Fx2|} (d) F−1{|Fx2| (Fx1)/ |Fx1|}Figure 1.6: Visualizing the structural content encoded in the Fourier phase.91.1. From sparse to low-rank signal recoveryThis is a scenario in which the most popular algorithms operate, the al-ternating (nonconvex) projection methods introduced in the seminal works ofGerchberg and Saxton (1972) and Fienup (1978, 1982). The main difficultiesassociated with these approaches lie on the need for a careful exploitation ofprior knowledge about the signal—e.g., support, nonnegativity, real-valuednessetc.—, sensible parameter selection and availability of a good initial iterate.Motivated by these difficulties, structured multiple illumination techniquesin diffraction imaging, and prior connections to rank minimization and matrixcompletion problems by Chai, Moscoso, and Papanicolaou (2011), Candès,Eldar, Strohmer, and Voroninski (2013a) propose and study a more applicablemeasurement model and a convex relaxation for the phase recovery problem.We start with a description of the convex relaxation by noticing that|Ax|2i = (Ax)i(Ax)i = [(Ax)(Ax)∗]ii = [A(xx∗)A∗]ii,leads us to a problem of the formfind x ∈ Cn such that diag[A(xx∗)A∗] = b,where b ∈ Rm denotes the input (squared) magnitude measurements.Observing the intrinsic global-phase ambiguity in the measurement process,in which exp(iθ)x evaluates to the same measurements as x, we can identifythese x with a rank-1 positive semidefinite Hermitian matrix X via X = xx∗.Therefore, if it does admit a solution, this nonconvex feasibility problem canbe recast equivalently as one of rank minimization with the formminimizeX∈HnrankX subject to AX = b and X 0,whereX 0 denotes that the matrixX is Hermitian and positive semidefinite.Employing the nuclear-norm convex relaxation for rank minimization, wearrive at the problemminimizeX∈Hn‖X‖1 subject to AX = b and X 0,which can then be simplified by noticing that σ(X) = λ(X) implies ‖X‖1 =traceX for all X 0, where λ : Hn → Rn denotes the ordered eigenvaluemap, i.e., λ1(X) ≥ . . . ≥ λn(X). This results in the trace minimization SDPminimizeX∈HntraceX subject to AX = b and X 0.This rationale was employed by Chai et al. (2011) in their study of aclass of array imaging problems in which only magnitude measurements can101.1. From sparse to low-rank signal recoverybe obtained by current sensors. They exploit the connection with low-rankrecovery to provide conditions on the imaging and scatterer configurationsunder which the rank minimization problem admits a unique solution. Makinguse of then recent low-rank recovery results, this was accomplished by provingsufficient conditions for δ2(A) < 1 and invoking Theorem 1.1.1 above.Candès et al. (2013a) introduce a measurement model inspired on a numberof structured illumination techniques in scientific imaging. In this model, anillumination of the signal x is modulated by a coded diffraction pattern (CDP),modeled as a diagonal matrix C = Diag(c) ∈ Cn×n, and the resulting signal Cxundergoes a measurement of the squared magnitude of its Fourier transform,leading to measurements of the form |FCx|2 . This procedure is performedL ≥ 1 times with different CDPs, which can be deterministic or drawn from arandom ensemble, e.g., Bernoulli entries modeling a binary mask.In our notation, this model leads to m = nL measurements of the formA :=FC1...FCL , where Ck := Diag(ck), k = 1, . . . , L.It also give rise to very large optimization problems involving linear maps onlyfeasibly treated as matrix-free operators.In that work, Candès et al. (2013a) provide a deterministic construction ofCDPs for which, if the trace minimization problem admits a rank-1 solution, itmatches the measured signal up to a global phase. The main shortcomings werelack of guarantees that the convex problem would have a rank-1 minimizer andwhether the approach would be stable to noisy measurements. These issueswere further studied by Candès, Strohmer, and Voroninski (2013b) in the casewhere the rows of A are Gaussian or uniformly drawn from the sphere of radius√n and the measurements have bounded errors. The convex problem proposedis the one we call here (PhaseLift) (although this originally assumes = 0)minimizeX∈HntraceX subject to ‖b−AX‖2 ≤ and X 0. (PhaseLift)The authors show that, for an arbitrary signal and = 0, if m ≥ c0n log n mea-surements are taken from the distributions above, this problem has a uniquesolution and it has rank-1 with probability at least 1 − 3 exp(−γm/n), forpositive constants c0, γ.For uniform measurements of the sphere, they show that, under the sameconditions, if the measurements are contaminated by additive noise boundedon the 2-norm by > 0, the best rank-1 approximation Xˆ to the solution111.1. From sparse to low-rank signal recoveryof (PhaseLift) will be within a distance of C0 to the measured signal in theFrobenius norm, with nearly the same probability as the noiseless case.These results where further refined by Candès and Li (2014) removingthe logarithmic term, the dependence on the signal—i.e., draw A and theprobabilistic bounds hold for all signals—and proving that the feasible set ofthe = 0 problem is in fact a singleton, a result also shown by Demanet andHand (2014) but with a less tight lower bound on the number of measurements.Much more recently, similar results have been proved for the semidefiniterelaxation under the CDP model in which the ck are drawn for certain randomensembles. Candès, Li, and Soltanolkotabi (2015b) show that L ≥ c0 log4 nensure that the feasible set of the = 0 problem is a singleton with probabilityat least 1 − 1/n. Their result being further refined by Gross, Krahmer, andKueng (2014) decreasing the exponent on the logarithm from 4 down to 2.Although the matrix lifting and convexification techniques relax the quadra-tic measurements to linear operators on matrices, they lead to incredibly chal-lenging convex optimization problems due to the quadratic increase in thedimensionality of the primal space. This calls for customized computationalmethods that exploit the structure of these operators and the (nearly) low-rankproperties of the minimizers.In the following, we describe two main approaches designed for these convexrelaxations of the phase retrieval problem and contrast their results with thosewe obtain in Chapter 5.Convex prox-gradient methodsIn their original paper proposing the PhaseLift formulation, Candès et al.(2013a) propose a computational strategy based on PSD constrained leastsquares (or alternative misfit) regularized by the trace. For a given regula-rization parameter ρ > 0, their basic approach aims to solve the problemminimizeX∈Hnf(X) :={12‖b−AX‖22 + ρ traceX}subject to X 0.To solve this problem, the authors leverage the excellent Matlab packageTFOCS, introduced by Becker, Candès, and Grant (2011). TFOCS provides awrapper for a number of related first-order algorithms to create customizedsolvers for a variety of convex models. The main building block used to solvethe problem above is the implementation of a prox-gradient operator, i.e., aroutine to solveminimizeX∈Hnf(Xk)+〈∇f(Xk), X−Xk〉+ 12αk‖X −Xk‖22 subject to X 0.121.1. From sparse to low-rank signal recoveryBy elementary manipulations, it is easy to see that this amounts to computinga projection onto the PSD cone, i.e., P·0(Xk−αk∇f(Xk)). This operator canbe evaluated by computing the positive eigenvalues and eigenvectors of a ma-trix. Formally, ifX = QDiag(λ(X))Q∗ is an eigendecomposition ofX, we havethat P·0(X) = QDiag([λ(X)]+)Q∗, where ([x]+)i := max{0, xi} denotes thepositive-part operator. Assuming Xk is available as a low-rank factorization,the structure of ∇f(Xk) = −A∗(b−AXk)+ρI allows for efficient products be-tween the matrix to be projected and a given vector, thus amenable to exploitLanczos-based eigensolvers to compute the positive rightmost eigenpairs.A downside of this approach is that there is no a priori knowledge norcontrol on how many positive eigenvalues may appear throughout the itera-tions. Aware of this issue, the authors heuristically fix the maximum numberof computed eigenpairs and mention that the convergence guarantees providedby the convex optimization algorithms implemented in TFOCS are lost. Never-theless, good empirical results have been reported in their paper, where thisnumber is taken between 10 and 20.For it being one of the first computational methods able to solve thePhaseLift formulation in instances just too large for off-the-shelf SDP solvers,and whose code was kindly provided to us by Prof. Thomas Strohmer, wecontrast the results obtained with our proof-of-concept solver to those of theirimplementation later in Chapter 5.It is noteworthy that Candès et al. (2013a) also propose an iterativelyreweighted method in which sequences of problems similar to the one above aresolved and achieve better low-rank recovery results with fewer measurements.Although we have not performed experiments with reweighted formulations,we show in §4.3 how they can be dealt with and fit within the approach wepropose.Gradient descent on a nonconvex formulationDuring the later stages of our research, Candès, Li, and Soltanolkotabi (2015a)proposed an approach for the phase recovery problem in which one employsa simple gradient descent method on a nonlinear least squares (or alternativemisfit function) formulation. Their approach aims to solve the problemminimizex∈Cnf(x) :=14m ‖x0‖22‖b−A(xx∗)‖22 .Using a predefined sequence of steplengths (αk), this leads to the sequencexk+1 := xk +αk‖x0‖22{1m[A∗(b−A(xkx∗k))]xk}131.1. From sparse to low-rank signal recoveryand has been reported to provide excellent numerical solutions.The key idea behind the effectiveness of this method, and difference fromclassical nonconvex approaches, lies in a judicious choice of its first iterate x0 :x0 ∈ arg max‖x‖22=n∑mi=1bi‖A‖22〈(1mA∗b)x, x〉.This way, the first iterate is computed by a suitable rescaling of an eigenvectorof A∗b associated to its rightmost eigenvalue λ1(A∗b), which can be computedvia matrix-free Lanczos-based methods, thus taking advantage of fast ope-rators and allowing the solution of problems far larger than the prox-basedmethods used by Becker et al. (2011).Naturally, this nonconvex approach with spectral initialization requires acertain number of measurements to be taken for the initial iterate to be closeenough to the basin of attraction of a solution. The remarkable achievementof this work is that Candès et al. (2015a) provide guarantees under which, ifa large enough number of measurements in the coded diffraction model aretaken (m ≥ Cn logd n, for constant C and 2 ≤ d ≤ 4), the algorithm willconverge to an exact solution with high probability.Similar provably effective nonconvex approaches with spectral initializationhave been proposed for Gaussian measurements A that induce operators of theform AX = diag(AXA∗)—see Netrapalli, Jain, and Sanghavi (2013)—and themuch more recent works by White, Sanghavi, and Ward (2015) and Chen andCandès (2015), the latter of which came to our attention at the time of writingthis thesis.In Chapter 5, we contrast results obtained by our solver with those providedby an implementation of this approach made available by its authors.1.1.2 Blind deconvolution and biconvex compressivesensingMotivated by applications in image processing and wireless communications,Ahmed, Recht, and Romberg (2014) study a restricted version of the blinddeconvolution problem under the framework of low-rank matrix recovery.Succintly, the blind (circular) deconvolution problem can be stated as:given measurements of the form b = f1 ∗ f2 ∈ Cm, where ∗ : Cm × Cm →Cm denotes the discrete circular convolution operator, recover the generatingsignals f1, f2 ∈ Cm up to scaling (since cf1 and c−1f2 evaluate to the samemeasurements for all c 6= 0).141.1. From sparse to low-rank signal recoveryThis problem has been approached from a number of perspectives and al-gorithms have been proposed for it in different scenarios by exploiting variousapplication-domain specific priors, e.g., see Levin, Weiss, Durand, and Free-man (2011) for uses and solution schemes in natural image processing.Ahmed et al. (2014) analyze an idealized version of this problem in whichthe signals are spread out in known subspaces, i.e., given full column-rankmatrices B1 ∈ Cm×n1 and B2 ∈ Cm×n2 , then f1 = B1x1 and f2 = B2x2 forsome (dense) coefficient vectors x1 ∈ Cn1 and x2 ∈ Cn2 . This leads to theproblemfind x1 ∈ Cn1 and x2 ∈ Cn2 such that (B1x1) ∗ (B2x2) = b.Invoking the convolution theorem, and denoting by F ∈ Cm×m the unitaryDFT matrix (F−1 = F ∗), we observe the following chain of equalities(B1x1) ∗ (B2x2) = F ∗ diag[(FB1x1)(FB2x2)T]= F ∗ diag[(FB1)(x1x2∗)(FB2)∗].Defining A1 := FB1 ∈ Cm×n1 and A2 := FB2 ∈ Cm×n2 , we can employ amatrix lifting technique similar to the one used for the phase recovery problemand identify the coefficient vectors x1 and x2 with the matrix X = x1x2∗.Thus reformulating the problem above as one of rank minimizationminimizeX∈Cn1×n2rankX subject to F ∗ diag(A1XA∗2) = b.A convex relaxation follows imediately from our previous discussions bysubstituting the rank function by its convex surrogate ‖·‖1 , leading to anuclear-norm minimization problem of the formminimizeX∈Cn1×n2‖X‖1 subject to ‖b−AX‖2 ≤ , (NNM)where we use AX := F ∗ diag(A1XA∗2) and have introduced bounded-misfitconstraints to allow for errors, as described by Ahmed et al. (2014).Ahmed et al. (2014) then provide technical conditions on these bases andthe spread of f1 in Fourier domain under which exact recovery is achieved inthe noiseless case with = 0.In order to experimentally illustrate the recovery ability of this approachin a scenario of two-dimensional image deblurring, they design experimentswhere the image support of a blurring kernel is known (but not its values) andthe image to be recovered lies in a known subspace represented as a subset151.1. From sparse to low-rank signal recoveryof the Haar wavelet basis (or is estimated from the Haar coefficients of theblurred image). Their numerical results are very encouraging and we comparethem with our results on one of their experiments in Chapter 5. This choicewas made essentially to compare the optimization approaches, as we make noclaims of contributions regarding recovery ability once the data for the convexrelaxations has been defined.It is noteworthy that the matrix lifting rationale used above can be em-ployed for more general bilinear (or sesquilinear) measurements. An approachin this direction was presented recently by Ling and Strohmer (2015), whopropose an alternative scheme to achieve both low-rank and sparse recovery(where x1 and x2 are expected to be sparse) by using a matrix lifting to lin-earize the measurement operators, and the elementwise 1-norm on the liftedmatrix to encourage sparsity. Although they provide recovery guarantees fortheir measurement models, there is still a lack of scalable computational ap-proaches for their SparseLift formulation.In the following we provide some details on the method used by Ahmedet al. (2014) to solve (NNM).Augmented Lagrangian for low-rank SDPsAhmed et al. (2014) use the nonlinear programming (NLP) algorithm of Burerand Monteiro (2003) to solve nuclear-norm minimization problems whose sizewould be too large for off-the-shelf SDP packages.Their approach first reformulates (NNM), with = 0, into a fully equivalentSDP as proposed by Fazel (2002). This leads to a problem of the formminimizeX∈Cn1×n2 ,U∈Hn1 ,V ∈Hn212trace(U XX∗ V)subject toAX = b and(U XX∗ V) 0.Next, this method represents X in a low-rank factored form X = ZZ∗,with Z ∈ C(n1+n2)×r and r ≥ 1 chosen a priori to be small. This way, the SDPabove can be reformulated as the nonconvex constrained NLPminimizeZ∈C(n1+n2)×r12‖Z‖22 subject to Aˆ(ZZ∗) = b,where we define Aˆ : Hn1+n2 → Cm as the extension of the measurementoperator A to Hermitian matrices that acts on the top-right off-diagonal block.161.2. Norm-minimization and gauge dualityAt this point the classical augmented Lagrangian method is applied to theresulting problem. This leads to a sequence of subproblems of the formZk+1 ∈ arg minZ∈C(n1+n2)×r12‖Z‖22 + 〈yk, b− Aˆ(ZkZ∗k)〉+ρk2∥∥∥b− Aˆ(ZkZ∗k)∥∥∥22,in which we omit the details related to rules for updating the dual variablesyk ∈ Cm and regularization parameters ρk > 0. The numerical solution of thesesubproblems (to first-order stationarity) is then carried out using a standardlimited-memory quasi-Newton method via the minFunc package by Schmidt(2005), and thus being able to take full advantage of matrix-free FFT routines.Such an approach relies on the effectiveness of the subproblem solver toprovide good solutions and that the a priori choice of the rank r is large enoughto capture the expected solution, which seems reasonable in the scenarioswhere low-rank recovery is likely. For problems with positive , the authorsmention they stop the iterations as soon as the misfit falls below that value,not necessarily solving (NNM) but providing a feasible point.Chapter 5 compares the solution computed by this method, from an im-plementation made available online by the authors, to that output from oursolver in an instance of image deblurring.1.2 Norm-minimization and gauge dualityChandrasekaran, Recht, Parrilo, and Willsky (2012) generalize results of sparseand low-rank recovery under linear measurements to more abstract notions oflow-complexity. They observe that the constructions of convex relaxations forthe cardinality function of vectors and for the rank of matrices can be suitablygeneralized by the notion of atomic norms defined below.Definition 1.2.1 (Atomic norm). Let T ⊂ Rn be compact (with elementscalled atoms), such that all of its elements are extreme points of conv T and0 ∈ conv T . The gauge of T , denoted as ‖·‖T : Rn → R+∪{+∞}, is definedby‖x‖T := inf{t > 0 |x ∈ t conv T }. (1.3)If T is centrally symmetric about the origin (i.e., a ∈ T implies −a ∈ T ),then ‖·‖T is a norm and called the atomic norm induced by T .With this concept, the set S := {±e1, . . . ,±en} (where ek ∈ Rn is thekth-element of the canonical basis for Rn) will induce the atomic norm ‖·‖S =171.2. Norm-minimization and gauge duality‖·‖1 : Rn → R+. Under this framework, the atoms of S are taken to be thesparsest unit 2-norm elements in Rn, i.e. the elements of the canonical basisand their negated counterparts. Similarly, the nuclear norm is induced bytaking the gauge of all rank-1 matrices with unit operator-norm—i.e.,L := {xyT |x ∈ Rn1 , y ∈ Rn2 , 1 = ‖x‖2 = ‖y‖2}induces ‖·‖L = ‖·‖1 : Rn1×n2 → R+.In their approach, the notion of low-complexity is embedded in the choiceof the collection of atoms T , which would be dictated by the problem at hand.The studied recovery procedure is given by the convex optimization problemminimizex∈Rn‖x‖T subject to ‖b− Ax‖2 ≤ , (1.4)where 0 ≤ < ‖b‖2 prescribes the admissible measurement misfit away fromthe vector of observations b ∈ Rm.Assuming measurement maps A ∈ Rm×n whose entries are drawn i.i.d.from the standard Gaussian distribution, Chandrasekaran et al. (2012) pro-vide probabilistic recoverability and stability bounds similar to those discussedin §1.1 for a range of different “low-complexity” models generalized by the con-cept of atomic norm and the solution of the convex optimization problem (1.4).For sparse and low-rank recovery, they provide much tighter lower bounds onthe number of measurements sufficient for the effectiveness of the relaxation.Candès and Recht (2013) also leveraged this concept of atomic norms andtheir dual-norms to provide a unified treatment of recoverability bounds fromnoiseless linear measurements for both (block) sparse vectors and low-rankmatrices.As mentioned by Chandrasekaran et al. (2012), most of their developmentsare applicable even in some cases where the collection of atoms T does notinduce a norm—e.g., sparse nonnegative vectors and low-rank positive semidef-inite matrices. In such scenarios, convex problems of the form (1.4) are partic-ular cases of a general class of gauge optimization studied by Freund (1987).In his seminal paper, Freund (1987) develops a duality framework around aclass of optimization problems that generalizes the minimization of a p-norm,convex QPs, and linear programs with nonnegative optimal values.This class of problems can be succintly described with the formulationminimizex∈Xκ(x) subject to x ∈ C, (1.5)where X is a finite-dimensional real inner-product space, κ : X → R ∪ {+∞}is a gauge function—i.e., a nonnegative convex function that is positively ho-mogeneous and satisfies κ(0) = 0—and C ⊆ X is a closed convex set.181.2. Norm-minimization and gauge dualityRemark 1.2.1. It must be noted that we will often assume κ to be closed(i.e., its epigraph epiκ := {(x, t) ∈ X × R |κ(x) ≤ t} is a closed set) and thatC does not contain the origin, otherwise x = 0 would be a solution of (1.5).As can be readily inferred from definitions, the class of gauge functionssubsumes all norms and seminorms. Generalizing the notion of dual norms,we have that of a polar gauge κ◦ : X → R∪ {+∞} defined byκ◦(y) := inf{t > 0 | tκ(x) ≥ 〈x, y〉 , ∀x ∈ X}, (1.6)giving rise to a natural generalization of the Cauchy-Schwartz inequality〈x, y〉 ≤ κ(x)κ◦(y), ∀x, y ∈ X , (1.7)which holds whenever the product makes sense, i.e., {κ(x), κ◦(y)} 6= {0,+∞}.Although motivated by problems in which C is polyhedral, Freund (1987)proposed an approach to handle general nonlinear constraints based on a con-cept of the antipolar set to C, denoted by C ′ ⊂ X and defined asC ′ := {y ∈ X | 〈x, y〉 ≥ 1,∀x ∈ C}. (1.8)From its definition, it can be readily observed that this set is closed and convexand does not contain the origin (however it might be empty).An interesting consequence of the concepts of polar gauges and antipolarsets is that they lead to a certain duality relationship represented byx ∈ C and y ∈ C ′ =⇒ 1 ≤ κ(x)κ◦(y), (1.9)naturally leading to a gauge minimization problem of the formminimizey∈Xκ◦(y) subject to y ∈ C ′, (1.10)which is the corresponding dual to (1.5) proposed by Freund (1987).In his study of duality relationships between the gauge optimization prob-lems (1.5) and (1.10), Freund (1987) provides a general weak duality resultfor that primal-dual pair. In his theorem, compiled below, the domain of afunction f : X → R ∪ {+∞}, denoted by dom f, is defined asdom f := {x ∈ X | f(x) < +∞},i.e., it corresponds to the set of points at which the function is finite-valued.191.2. Norm-minimization and gauge dualityTheorem 1.2.1 (Freund (1987, Theorem 1A (Weak Duality))).Let p∗ ∈ R+ ∪ {+∞} and d∗ ∈ R+ ∪ {+∞} be optimal values for (1.5)and (1.10), respectively. Then(i) If x ∈ C ∩ domκ and y ∈ C ′ ∩ domκ◦, then κ(x)κ◦(y) ≥ 1, and hencep∗d∗ ≥ 1.(ii) If p∗ = 0, then (1.10) is essentially infeasible, i.e., d∗ = +∞.(iii) If d∗ = 0, then (1.5) is essentially infeasible, i.e., p∗ = +∞.(iv) If x∗ ∈ C ∩ domκ and y∗ ∈ C ′ ∩ domκ◦ and κ(x∗)κ◦(y∗) = 1, then x∗and y∗ are optimal solutions of (1.5) and (1.10), respectively.This previous result is a direct consequence of (1.9) and it illustrates themultiplicative notion of duality in gauge optimization, which is different fromthe usual additive one from Lagrangian duality in general convex optimization.For problems in which C is polyhedral, Freund (1987) introduces a qualifi-cation condition for strong duality to hold.Definition 1.2.2 (Projection property and qualification). A convex setS ⊆ X satisfies the projection property if, for all linear maps A : X → Rm,we have that AS = {Ax |x ∈ S} is closed.A gauge function κ : X → R∪{+∞} satisfies the projection qualificationif both {x ∈ X |κ(x) ≤ 1} and {y ∈ X |κ◦(y) ≤ 1} satisfy the projectionproperty.With these concepts, the following strong duality result can be proved.Theorem 1.2.2 (Freund (1987, Theorem 2)). Assume that κ satisfies theprojection qualification and C is polyhedral. Let p∗ ∈ R+ ∪ {+∞} andd∗ ∈ R+ ∪ {+∞} be optimal values for (1.5) and (1.10), respectively. Then(i) If C ∩ domκ 6= ∅ and C ′ ∩ domκ◦ 6= ∅, then p∗d∗ = 1; and the optimalvalues of (1.5) and (1.10) are achieved for some x∗ and y∗.(ii) C ∩ domκ = ∅ (i.e., p∗ = +∞) if and only if d∗ = 0; and d∗ = 0 isachieved for some y∗ ∈ C ′ if C = ∅.(iii) C ′ ∩ domκ◦ = ∅ (i.e., d∗ = +∞) if and only if p∗ = 0; and p∗ = 0 isachieved for some x∗ ∈ C if C ′ = ∅.201.2. Norm-minimization and gauge dualityFreund (1987) provides alternative strong duality results under qualifica-tions similar to the classical Slater conditions, i.e., when the intersections inTheorem 1.2.2 (i) involve relative interiors. Later in §2.4 we provide morerefined sufficient conditions for strong gauge duality in an abstract setting.These results were used by Freund (1987) to show that the gauge dualfor the minimization of strictly convex p-norms provides tighter lower boundswhen compared to the corresponding Lagrange dual. Other authors have ex-ploited these for applications ranging from proving robust stability of lineardynamical systems to approximate Farkas Lemmas in linear programming andtight inequalities in probability; see Teboulle and Kogan (1994); Todd and Ye(1998); Bertsimas and Popescu (2005).Our motivation is the natural gauge optimization form of the convex re-laxations to inverse problems in low-complexity recovery. For these problems,the convex relaxation typically has the formminimizex∈Xκ(x) subject to ρ(b− Ax) ≤ , (1.11)where A : X → Rm is a linear map and ρ : Rm → R is also a gauge, modelingthe measurement misfit and typically a p-norm.As we shall see later in Chapter 2, the corresponding dual gauge problemhas the formminimizey∈Rmκ◦(A∗y) subject to 〈b, y〉 − ρ(y) ≥ 1, (1.12)which can be contrasted to the Lagrange dual problemmaximizey∈Rm〈b, y〉 − ρ(y) subject to κ◦(A∗y) ≤ 1. (1.13)In the applications that motivate our work, ρ is typically rather simplewhile κ has a more complicated structure. Our approach described later inChapter 4 leverages the fact that m is expected to be much smaller thandimX to solve the dual problem (1.12), whose feasible set is much simplerand is thus more amenable to computational approaches for nonsmooth convexoptimization than its Lagrangian counterpart (1.13).Examples. The following self-contained examples, some condensed from ourprevious and later discussions, illustrate the versatility of the gauge optimiza-tion formulation in modeling a variety of problems, and also provide a motiva-tion for the abstract treatment that we give to gauge duality in the followingChapter 2.211.2. Norm-minimization and gauge dualityExample 1.2.1 (Norms and minimum-length solutions). Norms are specialcases of gauge functions that are finite everywhere, symmetric, and zeroonly at the origin. (Semi-norms drop the last requirement, and allow thefunction to be zero at other points.) Let κ(x) = ‖x‖ be any norm, and C ={x | Ax = b } describe the solutions to an underdetermined linear system.Then (1.5) yields a minimum-length solution to the linear system Ax = b.This problem can be modeled as an instance of (1.11) by letting ρ be anygauge function for which ρ−1(0) = { 0 } and setting = 0. The polar κ◦ =‖·‖∗ is the norm dual to ‖·‖, and C ′ = {A∗y | 〈b, y〉 ≥ 1 }; cf. Corollary 2.3.2.The corresponding gauge dual (1.10) is thenminimizey∈Y‖A∗y‖∗ subject to 〈b, y〉 ≥ 1.Example 1.2.2 (Sparse optimization and atomic norms). In his thesis,van den Berg (2009) describes a framework for sparse optimization based onthe formulation where κ is a gauge, and the function ρ is differentiable awayfrom the origin. The nonnegative regularization parameter influences thedegree to which the linear model Ax fits the observations b. This formulationis specialized by van den Berg and Friedlander (2011) to the particular casein which ρ is the 2-norm. In that case, C = {x | ‖Ax− b‖2 ≤ } andC ′ = {A∗y | 〈b, y〉 − ‖y‖2 ≥ 1 } ;cf. Corollary 2.3.1. Teuber, Steidl, and Chan (2013) consider a related casewhere the misfit between the model and the observations is measured bythe Kullback-Leibler divergence.Chandrasekaran et al. (2012) describe how to construct regularizers thatgeneralize the notion of sparsity in linear inverse problems. In particular,they define the gauge‖x‖T := inf { t ≥ 0 | x ∈ t conv T } (1.14)over the convex hull of the set of canonical atoms given by the set T . If0 ∈ int conv T and T is bounded and symmetric, i.e., T = −T , then thedefinition (1.14) yields a norm. For example, if T consists of the set of unitn-vectors that contain a single nonzero element, then (1.14) is the 1-norm; ifT consists of the set of rank-1 matrices with unit spectral norm, then (1.14)is the Schatten 1-norm. The polar κ◦(y) = sup { 〈y, a〉 | a ∈ conv({0} ∪ T ) }is the support function of the closure of conv({0}∪T ). Jaggi (2013) catalogsvarious sets of atoms that yield commonly used gauges in machine learning.221.2. Norm-minimization and gauge dualityExample 1.2.3 (Conic gauge optimization). In this example we demon-strate that it is possible to cast any convex conic optimization problem inthe gauge framework. Let K be a closed convex cone, and let K∗ denote itsdual. Consider the primal-dual pair of feasible conic problems:minimizex〈c, x〉 subject to Ax = b, x ∈ K, (1.15a)maximizey〈b, y〉 subject to c− A∗y ∈ K∗. (1.15b)Suppose that ŷ is a dual-feasible point, and define ĉ = c − A∗ŷ. Becauseĉ ∈ K∗, it follows that 〈ĉ, x〉 ≥ 0 for all x ∈ K. In particular, the primalproblem can be equivalently formulated as a gauge optimization problemby definingκ(x) = 〈ĉ, x〉+ δK(x) and C = {x | Ax = b } , (1.16)where δK is the indicator function on the set K. (More generally, it isevident that any function of the form γ+δK is a gauge if γ is a gauge.) Thisformulation is a generalization of the nonnegative linear program discussedby Freund, and we refer to it as conic gauge optimization. The generalizationcaptures some important problem classes, such as trace minimization ofpositive semidefinite (PSD) matrices, which arises in the phase-retrievalproblem (c.f. §§1.1.1). This is an example where c ∈ K∗, in which case thedual-feasible point ŷ = 0 is trivially available for the gauge reformulation;cf. §3.1.Example 1.2.4 (Semidefinite programming relaxation for MAX-CUT).A concrete example of conic gauge programming, in the simple case wherec ∈ K∗, is the semidefinite programming relaxation of the max-cut problemstudied by Goemans and Williamson (1995). Let G = (V,E) be an undi-rected graph, and D = diag((dv)v∈V), where dv denotes the degree of vertexv ∈ V. The max-cut problem can be written asmaximizex14〈D − A, xxT 〉 subject to x ∈ {−1, 1}V ,where A denotes the adjacency matrix associated with G. The semidefiniteprogramming relaxation for this problem is derived by lifting xxT into aPSD matrix, as we have done for phase retrieval in §1.1.1:maximizeX14〈D − A,X〉 subject to diagX = e, X 0,231.2. Norm-minimization and gauge dualitywhere e denotes the vector of all ones. The constraint diagX = e impliesthat 〈D,X〉 = ∑v∈V dv = 2|E| is constant. Thus, the optimal value is equalto|E| − 14·minX{ 〈D + A,X〉 | diagX = e, X 0 } , (1.17)and the solution can be obtained by solving this latter problem. Note thatD +A is PSD because it has nonnegative diagonals and is diagonally dom-inant. (In fact, it is possible to reduce the problem in linear time to onewhere D+A is positive definite by identifying its bipartite connected com-ponents.) Because the dual of the cone of PSD matrices is itself, and thetrace inner product between PSD matrices is nonnegative, (1.17) falls intothe class of conic gauge problems defined by (1.15a).Example 1.2.5 (Submodular functions). Let V = { 1, . . . , n }, and considerthe set-function f : 2V → R, where f(∅) = 0. The Lovàsz (1983) extensionf̂ : Rn → R of f is given byf̂(x) =n∑k=1xjk[f({ j1, . . . , jk })− f({ j1, . . . , jk−1 })],where xj1 ≥ xj2 ≥ · · · ≥ xjn are the sorted elements of x. Clearly, theextension is positively homogeneous and vanishes at the origin. As shownby Lovász, the extension is convex if and only if f is submodular, i.e.,f(A) + f(B) ≥ f(A ∪B) + f(A ∩B) for all A,B ⊂ V ;see also (Bach, 2013, Proposition 2.3). If f is additionally non-decreasing,i.e.,A,B ⊂ V and A ⊂ B =⇒ f(A) ≤ f(B),then the extension is nonnegative over Rn+. Thus, when f is a submodu-lar and non-decreasing set function, that function plus the indicator on thenonnegative orthant, i.e., f̂+δRn+ , is a gauge. Bach (2013) surveys the prop-erties of submodular functions and their application in machine learning;see Proposition 3.7 therein.241.3. Thesis overview and contributions1.3 Thesis overview and contributionsThe main body of this thesis is comprised of four chapters following the presentintroduction. In Chapter 2 we present the theoretical framework of gaugeoptimization and extend its duality theory with a number of results useful formodeling and deriving duals with focus given to problems involving gauge-constrained linear measurements, an abstraction of the types of measurementsfrequently encountered in the resulting convex relaxations of nonlinear inverseproblems that motivated this research.Chapter 3 specializes this abstract gauge duality framework to problemsinvolving matrix variables. It connects the spectral optimization problems oftrace minimization in the PSD cone and nuclear-norm minimization to a con-crete primal-dual pair of constrained gauge minimization problems. The mainresult is the formulation of a dual convex eigenvalue optimization problem,whose feasible set is defined by a rather simple constraint and is amenable tomatrix-free numerical methods, a requirement for scalabity to large problems.Chapter 4 completes the abstract to concrete flow of this work with ananalysis of the strong duality and optimality conditions of the primal PSDtrace minimization and dual constrained eigenvalue minimization problems.These conditions induce an approach for recovering low-rank primal solutionsfrom dual minimizers and are exploited to design a computational methodto solve the primal-dual gauge pair. A description of the main algorithmiccomponents of our proof-of-concept solver is presented and extensions of ouranalysis to the reweighted formulations by Candès et al. (2013a) and Mohanand Fazel (2010) conclude that chapter.The numerical experiments in Chapter 5 contrast the results of our im-plementation to those obtained by solvers designed for each of the two mainproblems we study. For the PhaseLift formulation of phase recovery, our re-sults are compared against those of TFOCS (Becker et al., 2011; Candès et al.,2013b) and Wirtinger Flow (Candès et al., 2015a) on a number of randomproblems and a large two-dimensional image, used to validate the ability ofour approach to solve problems of a practical scale. An instance of blinddeconvolution for motion deblurring of a two-dimensional image is used tocompare our solver’s results to those obtained by the augmented Lagrangiansolver proposed in (Ahmed et al., 2014) and assess its feasibility in solvingnuclear norm minimization problems of such scale.We conclude this work in Chapter 6 with a discussion of our developmentsalong with our perspective of interesting avenues for further investigation.251.3. Thesis overview and contributionsReproducible researchThe data files and Matlab scripts used to generate the tables and figuresfrom all numerical experiments presented in Chapter 5 can be obtained at:http://www.cs.ubc.ca/labs/scl/low-rank-opt26Chapter 2Gauge optimization and dualityGauge functions significantly generalize the notion of a norm, and gauge opti-mization, as defined by Freund (1987), seeks the element of a convex set thatis minimal with respect to a gauge function. This conceptually simple problemcan be used to model a remarkable array of useful problems that arise in arange of fields (see examples in §1.2). The gauge structure of these problemsallows for a special kind of duality framework which we explore and specializeto a particular form that exposes some useful properties of the gauge opti-mization framework (such as the variational properties of its value function),and yet maintains most of the generality of the abstract form shown in §1.2.As observed in the TFOCS-based method discussed in Chapter 1, one ap-proach to solving linear inverse problems is to optimize a regularization func-tion over the set of admissible deviations between the observations and theforward model. Although regularization functions come in a wide range offorms depending on the particular application, they often share some com-mon properties. In this chapter we describe and study the class of gaugeoptimization problems, which neatly captures a wide variety of regularizationformulations that arise from the convex relaxations described in Chapter 1 andothers from fields such as machine learning and inverse problems. We explorethe duality and variational properties particular to this family of problems.All of the problems that we consider can be expressed asminimizex∈Xκ(x) subject to x ∈ C, (P)where X is a finite-dimensional Euclidean space, C ⊆ X is a closed convex set,and κ : X → R ∪ {+∞} is a gauge function, i.e., a nonnegative, positivelyhomogeneous convex function that vanishes at the origin. (We assume that0 /∈ C, since otherwise the origin is trivially a solution of the problem.) Thisclass of problems admits a duality relationship that is different from Lagrangeduality, and is founded on the gauge structure of its objective. Indeed, Freund(1987) defines the dual gauge counterpart to beminimizey∈Xκ◦(y) subject to y ∈ C ′, (D)272. Gauge optimization and dualitywhere the setC ′ := { y | 〈y, x〉 ≥ 1 for all x ∈ C } (2.1)is the antipolar of C (in contrast to the better-known polar of a convex set), andthe polar κ◦ (also a gauge) is the function that satisfies the Cauchy-Schwartz-like inequality most tightly:〈x, y〉 ≤ κ(x)κ◦(y), ∀x ∈ domκ, ∀y ∈ domκ◦; (2.2)see (2.5) for the precise definition. It follows directly from this inequalityand the definition of C ′ that all primal-dual feasible pairs (x, y) satisfy theweak-duality relationship1 ≤ κ(x)κ◦(y), ∀x ∈ C ∩ domκ, ∀y ∈ C ′ ∩ domκ◦. (2.3)This duality relationship stands in contrast to the more usual Lagrange frame-work, where the primal and dual objective values bound each other in anadditive sense.A roadmap of our developments. Freund’s analysis of gauge duality ismainly concerned with specialized linear and quadratic problems that fit intothe gauge framework, and with the pair of abstract problems (P) and (D). Ourtreatment in this chapter considers the particular formulation of (P) given byminimizex∈Xκ(x) subject to ρ(b− Ax) ≤ , (Pρ)where ρ is also a gauge. Typical applications might use ρ to measure themismatch between the model Ax and the measurements b, and in that case,it is natural to assume that ρ vanishes only at the origin, so that the con-straint reduces to Ax = b when = 0. This formulation is only very slightlyless general than (P) because any closed convex set can be represented as{x | ρ(b− x) ≤ 1 } for some vector b and gauge ρ; cf. §2.1.2. However, itis sufficiently concrete that it allows us to develop a calculus for computinggauge duals for a wide range of existing problems. (Conic side constraintsand a linear map in the objective can be easily accommodated; this is coveredin §2.6.)The special structure of the functions in the gauge program (Pρ) leadsto a duality framework that is analogous to the classical Lagrange-dualityframework. The gauge program dual to (Pρ) is given by282.1. Preliminariesminimizey∈Xκ◦(A∗y) subject to 〈y, b〉 − ρ◦(y) ≥ 1, (Dρ)which bears a striking similarity to the Lagrange dual problemmaximizey∈X〈y, b〉 − ρ◦(y) subject to κ◦(A∗y) ≤ 1. (D`)Note that the objective and constraints between the two duals play differentroles. (These two duals are derived in §2.3 under suitable assumptions.) Asignificant practical difference between these two formulations is when ρ is asimple Euclidean norm and κ is a more complicated function (such as the onedescribed in Example 1.2.2). The result is that the Lagrange dual optimizesa “simple” objective function over a potentially “complicated” constraint; incontrast, the situation is reversed in the gauge optimization formulation.We develop in §2.2 an antipolar calculus for computing the antipolars ofsets such as {x | ρ(b− Ax) ≤ }, which corresponds to the constraint in ourcanonical formulation (Pρ). This calculus is applied in §2.3 to derive the gaugedual (Dρ).The formal properties of the polar and antipolar operations are describedin §§2.1–2.2. In §2.4 we develop conditions sufficient for strong duality, i.e.,for there to exist a primal-dual pair that satisfies (2.3) with equality. Ourderivation parts with the “ray-like” assumption used by Freund, and in certaincases further relaxes the required assumptions by leveraging connections withestablished results from convex analysis and Fenchel duality.2.1 PreliminariesIn this section we review known facts about polar sets, gauges and their polars,and introduce results that are useful for our subsequent analysis. We mainlyfollow Rockafellar (1970) and use a number of results from his monograph:see §14 in that text for a discussion of polarity operations on convex sets,and §15 for a discussion of gauge functions and their corresponding polarityoperations. A few results from other books are partially compiled and creditedwhen needed for convenient reference.We use the following notation throughout. For a closed convex set D, letriD and clD denote, respectively, the relative interior and the closure of D.The indicator function of the setD is denoted by δD and defined as δD(x) = 0, ifx ∈ D, and δD(x) = +∞, otherwise. The recession cone of D is defined below;see Auslender and Teboulle (2003, Definition 2.1.2).292.1. PreliminariesDefinition 2.1.1 (Recession cone). Let D be a nonempty set in X . Thenthe recession (or asymptotic) cone of the set D, denoted by D∞, is the setof vectors d ∈ X that are limits in direction of the sequences {xk} ⊂ D,namelyD∞ :={d ∈ X∣∣∣∣∃tk → +∞, ∃xk ∈ D with limk→+∞ xktk = d}.For a gauge κ : X → R ∪ {+∞}, its domain is denoted by domκ ={x | κ(x) < +∞}, and its epigraph is denoted by epiκ = { (x, µ) | κ(x) ≤ µ }.A function is called closed if its epigraph is closed, which is equivalent to thefunction being lower semi-continuous; c.f. Rockafellar (1970, Theorem 7.1).Let clκ denote the gauge whose epigraph is cl epiκ, which is the largestlower semi-continuous function smaller than κ; see Rockafellar (1970, p. 52).Finally, for any x ∈ domκ, the subdifferential of κ at x is denoted ∂κ(x) ={ y | κ(u) ≥ κ(x) + 〈y, u− x〉, ∀u } .We make the following blanket assumptions throughout. The set C is anonempty closed convex set that does not contain the origin; the set D is anonempty convex set that may or may not contain the origin, depending onthe context. The gauge function ρ : X → R ∪ {+∞}, used in (Pρ), is closed;when = 0, we additionally assume that ρ−1(0) = {0} essentially to ensurethat {x | ρ(b− Ax) ≤ 0} = {x | Ax = b} , as would be expected in the case ofa zero tolerance on the mismatch between Ax and the observations b.2.1.1 Polar setsThe polar of a nonempty closed convex set D is defined asD◦ := { y | 〈x, y〉 ≤ 1, ∀x ∈ D } ,which is necessarily closed convex, and contains the origin.The bipolar theorem states that if D is closed, then it contains the originif and only if D = D◦◦; see Rockafellar (1970, Theorem 14.5).When D = K is a closed convex cone, the polar is equivalently given byK◦ := { y | 〈x, y〉 ≤ 0, ∀x ∈ K} .The positive polar cone (also known as the dual cone) of D is given byD∗ := { y | 〈x, y〉 ≥ 0, ∀x ∈ D } .The polar and positive polar are related via the closure of the conic hull, i.e.,D∗ = (cl coneD)∗ = −(cl coneD)◦, where coneD :=⋃λ≥0λD.302.1. Preliminaries2.1.2 Gauge functionsAll gauges can be represented in the form of a Minkowski function γD of somenonempty convex set D, i.e.,κ(x) = γD(x) := inf {λ ≥ 0 | x ∈ λD } . (2.4)In particular, one can always choose D = {x | κ(x) ≤ 1 }, and the aboverepresentation holds. The polar of the gauge κ is defined byκ◦(y) := inf {µ > 0 | 〈x, y〉 ≤ µκ(x), ∀x } , (2.5)which explains the inequality (2.2). Because κ is a proper convex function,one can also define its convex conjugate:κ∗(y) := supx{ 〈x, y〉 − κ(x) } . (2.6)It is well known that κ∗ is a proper closed convex function (Rockafellar, 1970,Theorem 12.2). The next fact is used in the proof of the proposition followingit and compiled from Auslender and Teboulle (2003, Proposition 3.1.3) forconvenient reference.Proposition 2.1.1. Let f : X → R ∪ {+∞} be proper closed and convex.Then 0 ∈ int dom f ∗ if and only if the optimal set {x | f(x) = inf f} isnonempty and compact.The next result collects properties relating the polar and conjugate of a gauge.Proposition 2.1.2. For the gauge κ : X → R ∪ {+∞}, it holds that(i) κ◦ is a closed gauge function;(ii) κ◦◦ = clκ = κ∗∗;(iii) κ◦(y) = supx { 〈x, y〉 | κ(x) ≤ 1 } for all y;(iv) κ∗(y) = δκ◦(·)≤1(y) for all y;(v) domκ◦ = X if κ is closed and κ−1(0) = { 0 };(vi) epiκ◦ = { (y, λ) | (y,−λ) ∈ (epiκ)◦ }.312.1. PreliminariesProof. The first two items are proved in Theorems 15.1 and 12.2 of Rockafellar(1970). Item (iii) follows directly from the definition (2.5) of the polar gauge.To prove item (iv), we note that if g(t) = t, t ∈ R, then the so-called monotoneconjugate g+ isg+(s) = supt≥0{ st− t } = δ[0,1](s),where s ≥ 0. Now, apply Rockafellar (1970, Theorem 15.3) with g(t) = t,and κ∗∗ in place of f in that theorem to obtain that κ∗∗∗(y) = δ[0,1](κ∗∗◦(y)).The conclusion in item (iv) now follows by noting that κ∗∗∗ = κ∗ and κ∗∗◦ =κ◦◦◦ = κ◦. To prove item (v), note that the assumptions together with Propo-sition 2.1.1 show that 0 ∈ int domκ∗. This together with item (iv) and thepositive homogeneity of κ◦ shows that domκ◦ = X . Finally, item (vi) isstated on Rockafellar (1970, p. 137) and can also be verified directly from thedefinition.In many interesting applications, the objective in (P) is the compositionκ ◦A, where κ is a gauge and A is a linear map. Clearly, κ ◦A is also a gauge.The next result gives the polar of this composition.Proposition 2.1.3. Let A be a linear map. Suppose that either(i) epiκ is polyhedral; or(ii) ri domκ ∩ rangeA 6= ∅.Then(κ ◦ A)◦(y) = infu{κ◦(u) | A∗u = y } .Moreover, the infimum is attained when the value is finite.Proof. Since κ ◦ A is a gauge, we have from Proposition 2.1.2(iii) that(κ ◦ A)◦(y) = supx{ 〈y, x〉 | κ(Ax) ≤ 1 } = − infx{ 〈−y, x〉+ δD(Ax) } ,where D = {x | κ(x) ≤ 1 }. Since κ is positively homogeneous, we havedomκ =⋃λ≥0 λD. Hence, ri domκ =⋃λ>0 λ riD from Rockafellar (1970,p. 50). Thus, assumption (ii) implies that riD ∩ rangeA 6= ∅. On the otherhand, assumption (i) implies that D is polyhedral; and D ∩ rangeA 6= ∅because they both contain the origin. Use these conclusions and apply Rock-afellar (1970, Corollary 31.2.1) (see also Rockafellar’s remark right after that322.1. Preliminariescorollary for the case when D is polyhedral) to conclude that(κ ◦ A)◦(y) = − supu{−(〈−y, ·〉)∗(−A∗u)− (δD)∗(u) }= − supu{−κ◦(u) | A∗u = y } ,where the second equality follows from the definition of conjugate functionsand Proposition 2.1.2(iii). Moreover, from that same corollary, the supremumis attained when finite. (Note that Rockafeller’s statement of that corollary isformulated for the difference between convex and concave function, and mustbe appropriately adapted to our case.) This completes the proof.Suppose that a gauge is given as the Minkowski function of a nonemptyconvex set that may not necessarily contain the origin. The following propo-sition summarizes some properties concerning this representation.Proposition 2.1.4. Suppose that D is a nonempty convex set. Then(i) (γD)◦ = γD◦ ;(ii) γD = γconv({0}∪D);(iii) If conv({0} ∪ D) is closed, then γD is closed;(iv) If κ = γD, D is closed, and 0 ∈ D, then D is the unique closed convexset containing the origin such that κ = γD; indeed, D = {x |κ(x) ≤ 1} .Proof. Item (i) is proved in Rockafellar (1970, Theorem 15.1). Item (ii) followsdirectly from the definition. To prove (iii), we first notice from item (ii) thatwe may assume without loss of generality that D contains the origin. Noticealso that γD is closed if and only if γD = γ∗∗D . Moreover, γ∗∗D = γD◦◦ = γclD,where the first equality follows from Proposition 2.1.2(ii) and item (i), whilethe second equality follows from the bipolar theorem. Thus, γD is closed if andonly if γD = γclD. The latter holds when D = clD. Finally, the conclusionin item (iv) was stated on Rockafellar (1970, p. 128); indeed, the relationD = {x | κ(x) ≤ 1 } can be verified directly from definition.From Proposition 2.1.2(iv) and Proposition 2.1.4(iv), it is not hard to provethe following formula on the polar of the sum of two gauges of independentvariables.332.1. PreliminariesProposition 2.1.5. Let κ1 and κ2 be gauges. Then κ(x1, x2) := κ1(x1) +κ2(x2) is a gauge, and its polar is given byκ◦(y1, y2) = max {κ◦1(y1), κ◦2(y2) } .Proof. It is clear that κ is a gauge. Moreover,κ∗(y1, y2) = κ∗1(y1) + κ∗2(y2) = δD1×D2(y1, y2),where Di = {x | κ◦i (x) ≤ 1 } for i = 1, 2; the first equality follows from thedefinition of the convex conjugate and the fact that y1 and y2 are decoupled,and the second equality follows from Proposition 2.1.2(iv). This together withProposition 2.1.4(iv) implies thatκ◦(y1, y2) = inf {λ ≥ 0 | y1 ∈ λD1, y2 ∈ λD2 }= max { inf {λ ≥ 0 | y1 ∈ λD1 } , inf {λ ≥ 0 | y2 ∈ λD2 } }= max { γD1(y1), γD2(y2) } = max {κ◦1(y1), κ◦2(y2) } .This completes the proof.The following corollary is immediate from Propositions 2.1.3 and 2.1.5.Corollary 2.1.1. Let κ1 and κ2 be gauges. Suppose that either(i) epiκ1 and epiκ2 are polyhedral; or(ii) ri domκ1 ∩ ri domκ2 6= ∅.Then(κ1 + κ2)◦(y) = infu1,u2{max {κ◦1(u1), κ◦2(u2) } | u1 + u2 = y } . (2.7)Moreover, the infimum is attained when finite.Proof. Apply Proposition 2.1.3 with Ax = (x, x) and the gauge κ1(x1)+κ2(x2),whose polar is given by Proposition 2.1.5.The support function for a nonempty convex set D is defined asσD(y) = supx∈D〈x, y〉.342.1. PreliminariesIt is straightforward to check that if D contains the origin, then the supportfunction is a (closed) gauge function. Indeed, we have the following rela-tionship between support and Minkowski functions (Rockafellar, 1970, Corol-lary 15.1.2).Proposition 2.1.6. Let D be a closed convex set that contains the origin.Then γ◦D = σD and σ◦D = γD.The following result relates the domain of the support function and therecession cone via polarity; see Auslender and Teboulle (2003, Theorem 2.2.1).Theorem 2.1.1. If D is nonempty and convex, then (domσD)◦ = D∞.2.1.3 Antipolar setsThe antipolar C ′, defined by (2.1), is nonempty as a consequence of the sepa-ration theorem. Freund’s 1987 derivations are largely based on the followingdefinition of a ray-like set. (As Freund mentions, the terms antipolar andray-like are not universally used.)Definition 2.1.2. A set D is ray-like if for any x, y ∈ D,x+ αy ∈ D for all α ≥ 0.Note that the antipolar C ′ of a (not necessarily ray-like) set C must be ray-like.The following result is analogous to the bipolar theorem for antipolar op-erations; see McLinden (1978, p. 176) and Freund (1987, Lemma 3).Theorem 2.1.2 (Bi-antipolar theorem). C = C ′′ if and only if C is ray-like.The following proposition, stated by McLinden (1978, p. 176), follows fromthe bi-antipolar theorem.Proposition 2.1.7. C ′′ = ⋃λ≥1 λC.The next fact is used in the proof of the lemma following it; it has beendrawn from Auslender and Teboulle (2003, Proposition 2.1.5) and compiledhere for convenient reference.352.2. Antipolar calculusProposition 2.1.8. Let D be a nonempty convex set in X . Then therecession cone D∞ is a closed convex cone. Moreover, definingD(x) := {d ∈ X |x+ td ∈ clD,∀t > 0}∀x ∈ D,E := {d ∈ X | ∃x ∈ D such that x+ td ∈ clD,∀t > 0},F := {d ∈ X | d+ clD ⊂ clD},we have that D(x) is in fact independent of x, now denoted simply by D,and D∞ = D = E = F.The next lemma relates the positive polar of a convex set, its antipolar andthe recession cone of its antipolar.Lemma 2.1.1. cl cone(C ′) = C∗ = (C ′)∞.Proof. It is evident that cl cone(C ′) ⊆ C∗. To show the converse inclusion, takeany x ∈ C∗ and fix an x0 ∈ C ′. Then for any τ > 0, we have〈c, x+ τx0〉 ≥ τ〈c, x0〉 ≥ τ for all c ∈ C,which shows that x + τx0 ∈ cone C ′. Taking the limit as τ goes to 0 showsthat x ∈ cl cone(C ′). This proves the first equality.Next we show the second equality, and begin with the observation thatC∗ ⊆ (C ′)∞. Conversely, suppose that x ∈ (C ′)∞ and fix any x0 ∈ C ′. Then,by Proposition 2.1.8, x0 + τx ∈ C ′ for all τ > 0. Hence, for any c ∈ C,1τ〈c, x0〉+ 〈c, x〉 = 1τ〈c, x0 + τx〉 ≥ 1τ.Since this is true for all τ > 0, we must have 〈c, x〉 ≥ 0. Since c ∈ C isarbitrary, we conclude that x ∈ C∗.2.2 Antipolar calculusIn general, it may not always be easy to obtain an explicit formula for theMinkowski function of a given closed convex set D. Hence, we derive someelements of an antipolar calculus that allows us to express the antipolar of amore complicated set in terms of the antipolars of its constituents. These rulesare useful for writing down the explicit gauge duals of problems such as (Pρ).Table 2.1 summarizes the main elements of the calculus.As a first step, the following formula gives an expression for the antipolarof a set defined via a gauge. The formula follows directly from the definitionof polar functions.362.2. Antipolar calculusTable 2.1: The main rules of the antipolar calculus; the required assumptionsare made explicit in the specific references.Result Reference(AC)′ = (A∗)−1C ′ Proposition 2.2.2(A−1C)′ = cl(A∗C ′) Propositions 2.2.3 and 2.2.4(C1 ∪ C2)′ = C ′1 ∩ C ′2 Proposition 2.2.5(C1 ∩ C2)′ = cl conv(C ′1 ∪ C ′2) Proposition 2.2.6Proposition 2.2.1. Let C = {x | ρ(b− x) ≤ } with 0 < < ρ(b). ThenC ′ = { y | 〈b, y〉 − ρ◦(y) ≥ 1 } .Proof. Note that y ∈ C ′ is equivalent to 〈x, y〉 ≥ 1 for all x ∈ C. Thus, for allx such that ρ(b− x) ≤ ,〈x− b, y〉 ≥ 1− 〈b, y〉 ⇐⇒ 〈b− x, y〉 ≤ 〈b, y〉 − 1.From Proposition 2.1.2(iii), this is further equivalent to ρ◦(y) ≤ 〈b, y〉−1.Proposition 2.2.1 is very general since any closed convex set D containingthe origin can be represented in the form of {x | ρ(x) ≤ 1 }, where ρ(x) =inf {λ ≥ 0 | x ∈ λD }; cf. (2.4). For conic constraints in particular, one obtainsthe following corollary by setting ρ(x) = δ−K(x).Corollary 2.2.1. Let C = {x | x ∈ b+K} for some closed convex cone Kand a vector b /∈ −K. ThenC ′ = { y ∈ K∗ | 〈b, y〉 ≥ 1 } .Note that Proposition 2.2.1 excludes the potentially important case = 0;however, Corollary 2.2.1 can instead be applied by defining K = ρ−1(0) = { 0 }.2.2.1 Linear transformationsWe now consider the antipolar of the image of C under a linear map A.Proposition 2.2.2. It holds that(AC)′ = (A∗)−1C ′.Furthermore, if cl(AC) does not contain the origin, then both sets above arenonempty.372.2. Antipolar calculusProof. Note that y ∈ (AC)′ is equivalent to〈y, Ac〉 = 〈A∗y, c〉 ≥ 1 for all c ∈ C.The last relation is equivalent to A∗y ∈ C ′. Hence, (AC)′ = (A∗)−1C ′. Further-more, the assumption that cl(AC) does not contain the origin, together withan argument using supporting hyperplanes, implies (AC)′ is nonempty. Thiscompletes the proof.We have the following result concerning the pre-image of C.Proposition 2.2.3. Suppose that A−1C 6= ∅. Then(A−1C)′ = cl(A∗C ′),and both sets are nonempty.Proof. Recall from our blanket assumption (cf. §2.1) that C is a closed convexset not containing the origin. It follows that cl(A∗C ′) is nonempty. Moreover,A−1C is also a closed convex set that does not contain the origin. Hence,(A−1C)′ is also nonempty.We next show that cl(A∗C ′) does not contain the origin. Suppose thaty ∈ A∗C ′ so that y = A∗u for some u ∈ C ′. Then for any x ∈ A−1C, we haveAx ∈ C and thus〈x, y〉 = 〈x,A∗u〉 = 〈Ax, u〉 ≥ 1,which shows that y ∈ (A−1C)′. Thus, we have A∗C ′ ⊆ (A−1C)′ and conse-quently that cl(A∗C ′) ⊆ (A−1C)′. Since the set A−1C is nonempty, (A−1C)′does not contain the origin. Hence, it follows that cl(A∗C ′) also does notcontain the origin.Now apply Proposition 2.2.2 with A∗ in place of A, and C ′ in place of C,to obtain(A∗C ′)′ = A−1C ′′.Taking the antipolar on both sides of the above relation, we arrive at(A∗C ′)′′ = (A−1C ′′)′. (2.8)Since C ′ is ray-like, it follows that cl(A∗C ′) is also ray-like. Since cl(A∗C ′)does not contain the origin, we conclude from the bi-antipolar theorem that(A∗C ′)′′ = cl(A∗C ′). Moreover, we have(A−1C ′′)′ = (⋃λ≥1λA−1C)′=(A−1C)′,382.2. Antipolar calculuswhere the first equality follows from Proposition 2.1.7, and the second equalitycan be verified directly from definition. The conclusion now follows from theabove discussion and (2.8).The next result is used in the proof of the proposition following it andwe compile it here for convenient reference; see (Berman, 1973, Lemma 3.1).Theorem 2.2.1 (R.A. Abrams). Let K ⊂ X and A a linear map from X .Then AK is closed if and only if K + kerA is closed.We now have the following further consequence.Proposition 2.2.4. Suppose that A−1C 6= ∅, and either C is polyhedral orri C ∩ rangeA 6= ∅. Then (A−1C)′ is nonempty and(A−1C)′ = A∗C ′.Proof. We will show that A∗C ′ is closed under the assumption of this proposi-tion. Then the conclusion follows immediately from Proposition 2.2.3.Abrams’s Theorem 2.2.1 asserts that A∗C ′ is closed if and only if C ′+kerA∗is closed. We will thus establish the closedness of the latter set.Suppose that C is a polyhedral. Then it is routine to show that C ′ is also apolyhedral and thus C ′+kerA∗ is closed. Hence, the conclusion of the corollaryholds under this assumption.Finally, suppose that ri C ∩ rangeA 6= ∅. From Theorem 2.1.1 and thebipolar theorem mentioned in §2.1.1, we have cl domσC′ = [(C ′)∞]◦, where(C ′)∞ is the recession cone of C ′, which turns out to be just C∗ by Lemma 2.1.1.From this and the bipolar theorem, we see further thatcl domσC′ = [C∗]◦ = [(cl cone C)∗]◦ = − cl cone C,and hence ri domσC′ = − ri cone C, thanks to Rockafellar (1970, Theorem 6.3).Furthermore, the assumption that ri C∩rangeA 6= ∅ is equivalent to ri cone C∩rangeA 6= ∅, since ri cone C = ⋃λ>0 λ ri C; see Rockafellar (1970, p. 50). Thus,the assumption ri C ∩ rangeA 6= ∅ together with Rockafellar (1970, Theo-rem 23.8) imply thatC ′ + kerA∗ = ∂σC′(0) + ∂δrangeA(0) = ∂(σC′ + δrangeA)(0).In particular, C ′ + kerA∗ is closed.392.2. Antipolar calculus2.2.2 Unions and intersectionsOther important set operations are union and intersection, which we discusshere. Ruys and Weddepohl (1979, Appendix A.1) outline additional rules.Proposition 2.2.5. Let C1 and C2 be nonempty closed convex sets. Then(C1 ∪ C2)′ = C ′1 ∩ C ′2.If 0 /∈ cl conv(C1 ∪ C2), then the sets above are nonempty.Proof. By definition, every y ∈ (C1 ∪ C2)′ obeys 〈y, x〉 ≥ 1 for all x ∈ (C1 ∪C2) ⊇ C1, so y ∈ C ′1. Likewise, y ∈ C ′2, and thus y ∈ C ′1 ∩ C ′2. The converse isequally direct. Moreover, if we assume further that 0 /∈ cl conv(C1 ∪ C2), then(C1 ∪ C2)′ = [cl conv(C1 ∪ C2)]′ is nonempty. This completes the proof.We now consider the antipolar of intersections. Note that it is necessaryto assume that both C1 and C2 are ray-like, which was missing from Ruysand Weddepohl (1979, Property A.5). (The necessity of this assumption isdemonstrated by Example 2.2.1, which follows the proposition.)Proposition 2.2.6. Let C1 and C2 be nonempty ray-like closed convex setsnot containing the origin. Suppose further that C1 ∩ C2 6= ∅. Then(C1 ∩ C2)′ = cl conv(C ′1 ∪ C ′2),and both sets are nonempty.Proof. From the fact that both C1 and C2 are closed convex sets not containingthe origin, it follows that C ′1 and C ′2 are nonempty and hence cl conv(C ′1∪C ′2) 6=∅. Moreover, because C1 ∩ C2 does not contain the origin, (C1 ∩ C2)′ is alsononempty.We first show that cl conv(C ′1 ∪ C ′2) does not contain the origin. To thisend, let y ∈ C ′1 ∪C ′2. For any x ∈ C1 ∩C2, we have 〈y, x〉 ≥ 1, which shows thatC ′1 ∪ C ′2 ⊆ (C1 ∩ C2)′, and hence cl conv(C ′1 ∪ C ′2) ⊆ (C1 ∩ C2)′. Since C1 ∩ C2 isnonempty, (C1∩C2)′ does not contain the origin. Consequently, cl conv(C ′1∪C ′2)does not contain the origin, as claimed.Now apply Proposition 2.2.5, with C ′1 in place of C1 and C ′2 in place of C2,to obtain(C ′1 ∪ C ′2)′ = C ′′1 ∩ C ′′2 = C1 ∩ C2.Take the antipolar of both sides to obtain(C1 ∩ C2)′ = (C ′1 ∪ C ′2)′′ = [cl conv(C ′1 ∪ C ′2)]′′ = cl conv(C ′1 ∪ C ′2),402.3. Duality derivationswhere the second equality follows from the definition of antipolar, and the thirdequality follows from the observation that cl conv(C ′1 ∪ C ′2) is a nonempty ray-like closed convex set not containing the origin. This completes the proof.The following counter-example shows that the requirement that C1 and C2are ray-like cannot be removed from Proposition 2.2.6. Refer to Figure 2.1 fora visual depiction of its construction.Example 2.2.1 (Set intersection and the ray-like property).Consider the setsC1 = { (x1, x2) | 1− x1 ≤ x2 ≤ x1 − 1 } and C2 = { (x1, x2) | x1 = 1 } .Define H1 = { (x1, x2) | x1 + x2 ≥ 1 } and H2 = { (x1, x2) | x1 − x2 ≥ 1 } sothat C1 = H1 ∩H2. Clearly the set C2 is not ray-like, while the sets C1, H1,and H2 are. Moreover, all four sets do not contain the origin. Furthermore,C1∩C2 is the singleton { (1, 0) }, and hence a direct computation shows that(C1 ∩ C2)′ = { (y1, y2) | y1 ≥ 1 }.Next, it follows directly from the antipolar definition thatC ′2 = { (y1, 0) | y1 ≥ 1 } .Also note that H1 = L−11 I, where L1(x1, x2) = x1 +x2 and I = {u | u ≥ 1 }.Thus, by Proposition 2.2.4, H ′1 = { (y1, y1) | y1 ≥ 1 }.Similarly, H ′2 = { (y1,−y1) | y1 ≥ 1 }. Because H1 and H2 are ray-like,it follows from Proposition 2.2.6 thatC ′1 = (H1 ∩H2)′ = cl conv(H ′1 ∪H ′2),which contains C ′2. Thus,cl conv(C ′1 ∪ C ′2) = C ′1 ( { (y1, y2) | y1 ≥ 1 } = (C1 ∩ C2)′.2.3 Duality derivationsWe derive in this section the gauge and Lagrange duals of the primal prob-lem (Pρ). LetC = {x | ρ(b− Ax) ≤ } (2.9)denote the constraint set, where ρ is a closed gauge and 0 ≤ < ρ(b). We alsoconsider the associated set412.3. Duality derivations(a) C1 = H1 ∩H2 (b) C′1 = cl conv(H ′1 ∪H ′2)(c) C2 (d) C′2(e) C1 ∩ C2 (f) (C1 ∩ C2)′ ) C′1 = cl conv(C′1 ∪ C′2)Figure 2.1: Visual depiction of the counter-example described in Example 2.2.1422.3. Duality derivationsC0 = {u | ρ(b− u) ≤ } , (2.10)and note that C = A−1C0. Recall from our blanket assumption in §2.1 thatwhen = 0, we only consider closed gauges ρ with ρ−1(0) = { 0 }.2.3.1 The gauge dualWe consider two approaches for deriving the gauge dual of (Pρ). The first usesexplicitly the abstract definition of the gauge dual (D). The second approachredefines the objective function to also contain an indicator for the nonlineargauge ρ where C is an affine set. This alternative approach is instructive,because it illustrates the modeling choices that are available when workingwith gauge functions.First approachThe following combines Proposition 2.2.4 with Proposition 2.2.1, and gives anexplicit expression for the antipolar of C when > 0.Corollary 2.3.1. Suppose that C is given by (2.9), where 0 < < ρ(b),and C0 is given by (2.10). If C0 is polyhedral, or ri C0 ∩ rangeA 6= ∅, thenC ′ = {A∗y | 〈b, y〉 − ρ◦(y) ≥ 1 } .As an aside, we present the following result, which follows from Corol-lary 2.2.1 and Proposition 2.2.4, concerning a general closed convex cone K.Corollary 2.3.2. Suppose that C = {x | Ax− b ∈ K} for some closed con-vex cone K and b /∈ −K. If K is polyhedral, or (b + riK) ∩ rangeA 6= ∅,thenC ′ = {A∗y | 〈b, y〉 ≥ 1, y ∈ K∗ } .These results can be used to obtain an explicit representation of the gaugedual problem. We rely on the antipolar calculus developed in §2.2. AssumeC0 is polyhedral, or ri C0 ∩ rangeA 6= ∅. (2.11)Consider separately the cases > 0 and = 0.Case 1 ( > 0): Apply Corollary 2.3.1 to derive the antipolar setC ′ = {A∗y | 〈b, y〉 − ρ◦(y) ≥ 1 } . (2.12)432.3. Duality derivationsCase 2 ( = 0): Here we use the blanket assumption (see §2.1) that ρ−1(0) ={ 0 }, and in that case, C = {x | Ax = b }. Apply Corollary 2.3.2 withK = { 0 }to obtainC ′ = {A∗y | 〈b, y〉 ≥ 1 } . (2.13)Since ρ−1(0) = { 0 } and ρ is closed, we conclude from Proposition 2.1.2(v)that dom ρ◦ = X . Hence, (2.13) can be seen as a special case of (2.12) with = 0.These two cases can be combined, and we see that when (2.11) holds, thegauge dual problem (D) for (Pρ) can be expressed as (Dρ). If the assump-tions (2.11) are not satisfied, then in view of Proposition 2.2.3, it still holdsthat (D) is equivalent tominimizeu,yκ◦(u) subject to u ∈ cl {A∗y | 〈y, b〉 − ρ◦(y) ≥ 1 } .This optimal value can in general be less than or equal to that of (Dρ).Second approachThis approach does not rely on assumptions (2.11). Define the functionξ(x, r, τ) := κ(x)+δepi ρ(r, τ), which is a gauge because epi ρ is a cone. Then (Pρ)can be equivalently reformulated asminimizex,r,τξ(x, r, τ) subject to Ax+ r = b, τ = . (2.14)Invoke Proposition 2.1.5 to obtainξ◦(z, y, α) = max {κ◦(z), (δepi ρ)◦(y, α) }(i)= max {κ◦(z), δ(epi ρ)◦(y, α) }(ii)= κ◦(z) + δ(epi ρ)◦(y, α)(iii)= κ◦(z) + δepi(ρ◦)(y,−α),where (i) follows from Proposition 2.1.4(i), (ii) follows from the definition ofindicator function, and (iii) follows from Proposition 2.1.2(vi). As Freund(1987, §2) shows for gauge programs with linear constraints, the gauge dual isgiven byminimizey,αξ◦(A∗y, y, α) subject to 〈y, b〉+ α ≥ 1,which can be rewritten asminimizey,ακ◦(A∗y) subject to 〈y, b〉+ α ≥ 1, ρ◦(y) ≤ −α.442.3. Duality derivations(The gauge dual for problems with linear constraints also follows directly fromCorollary 2.3.2 with K = { 0 }.) Further simplification leads to the gauge dualprogram (Dρ).Note that the transformation used to derive (2.14) is very flexible. Forexample, if (Pρ) contained the additional conic constraint x ∈ K, then ξ couldbe defined to contain an additional term given by the indicator of K.Even though this approach does not require the assumptions (2.11) usedin §2.3.1, and thus appears to apply more generally, it is important to keepin mind that we have yet to impose conditions that imply strong duality. Infact, as we show in §2.4, the assumptions required there imply (2.11).2.3.2 Lagrange dualityOur derivation of the Lagrange dual problem (D`) is standard, and we includeit here as a counterpoint to the corresponding gauge dual derivation. We beginby reformulating (Pρ) by introducing an artificial variable r, and deriving thedual of the equivalent problemminimizex,rκ(x) subject to Ax+ r = b, ρ(r) ≤ . (2.15)Define the Lagrangian functionL(x, r, y) = κ(x) + 〈y, b− Ax− r〉.The Lagrange dual problem is given bymaximizeyinfx, ρ(r)≤L(x, r, y).Consider the (concave) dual function`(y) = infx, ρ(r)≤L(x, r, y)= infx, ρ(r)≤{〈y, b〉 − 〈y, r〉 − (〈A∗y, x〉 − κ(x))}= 〈y, b〉 − supρ(r)≤〈y, r〉 − supx{〈A∗y, x〉 − κ(x)}= 〈y, b〉 − ρ◦(y)− δκ◦(·)≤1(A∗y),where the first conjugate on the right-hand side follows from Proposition 2.1.2(iii)when > 0, and when = 0, it is a direct consequence of the assumption thatρ−1(0) = { 0 } so that dom ρ◦ = X from Proposition 2.1.2(v); the last conjugate452.4. Strong dualityfollows from Proposition 2.1.2(iv). The Lagrange dual problem is obtained bymaximizing `, leading to (D`).Strictly speaking, the Lagrangian primal-dual pair of problems that we havederived is given by (2.15) and (D`), but it is easy to see that (Pρ) is equivalentto (2.15) in the sense that the respective optimal values are the same, andthat solutions to one problem readily lead to solutions for the other. Thus,without loss of generality, we refer to (D`) as the Lagrange dual to the primalproblem (Pρ).2.4 Strong dualityFreund’s 1987 analysis of the gauge dual pair is mainly based on the clas-sical separation theorem. It relies on the ray-like property of the constraintset C. Our study of the gauge dual pairs allows us to relax the ray-like as-sumption. By establishing connections with the Fenchel duality framework,we can develop strong duality conditions that are analogous to those requiredfor Lagrange duality theory.The Fenchel dual (Rockafellar, 1970, §31) of (P) is given bymaximizey−σC(−y) subject to κ◦(y) ≤ 1, (2.16)where we use (δC)∗ = σC and Proposition 2.1.2(iv) to obtain κ∗ = δ[κ◦≤1]. Letvp, vg, and vf , respectively, denote the optimal values of (P), (D) and (2.16).The following result relates their optimal values and dual solutions.Theorem 2.4.1 (Weak duality). Suppose that domκ◦ ∩ C ′ 6= ∅. Thenvp ≥ vf = 1/vg > 0.Furthermore,(i) if y∗ solves (2.16), then y∗ ∈ cone C ′ and y∗/vf solves (D);(ii) if y∗ solves (D) and vg > 0, then vfy∗ solves (2.16).Proof. The fact that vp ≥ vf follows from standard Fenchel duality theory.We now show that vf = 1/vg.Because domκ◦ ∩ C ′ 6= ∅, there exists y0 such that κ◦(y0) ≤ 1 and y0 ∈ τC ′for some τ > 0. In particular, because −σC(−y) = infc∈C〈c, y〉 for all y, itfollows from the definition of vf thatvf = supy{ infc∈C〈c, y〉 | κ◦(y) ≤ 1 } ≥ infc∈C〈c, y0〉 ≥ τ > 0. (2.17)462.4. Strong dualityHencevf = supy,λ{λ | κ◦(y) ≤ 1, −σC(−y) ≥ λ, λ > 0 } . (2.18)From this, we have further thatvf = supy,λ{λ | κ◦(y/λ) ≤ 1/λ, −σC(−y/λ) ≥ 1, 1/λ > 0 }= supy,µ{ 1/µ | κ◦(µy) ≤ µ, −σC(−µy) ≥ 1, µ > 0 } .Inverting both sides of this equation gives1/vf = infy,µ{µ | κ◦(µy) ≤ µ, −σC(−µy) ≥ 1, µ > 0 }= infw,µ{µ | κ◦(w) ≤ µ, −σC(−w) ≥ 1, µ > 0 }(i)= infw,µ{µ | κ◦(w) ≤ µ, w ∈ C ′, µ > 0 }= infw,µ{µ | κ◦(w) ≤ µ, w ∈ C ′ }= infw{κ◦(w) | w ∈ C ′ } = vg,(2.19)where equality (i) follows from the definition of C ′. This proves vf = 1/vg.We now prove item (i). Assume that y∗ solves (2.16). Then vf is nonzero(by (2.17)) and finite, and so is vg = 1/vf . Then y∗ ∈ cone C ′ because−σC(−y∗) = infc∈C〈c, y∗〉 = vf > 0, and we see from (2.19) that y∗/vf solves(D). We now prove item (ii). Note that if y∗ solves (D) and vg > 0, thenκ◦(y∗) > 0. One can then observe similarly from (2.19) that y∗/vg = vfy∗solves (2.16). This completes the proof.Fenchel duality theory allows us to use Theorem 2.4.1 to obtain severalsufficient conditions that guarantee strong duality, i.e., vpvg = 1, and theattainment of the gauge dual problem (D). For example, applying Rockafellar(1970, Theorem 31.1) yields the following corollary.Corollary 2.4.1 (Strong duality I). Suppose that domκ◦ ∩ C ′ 6= ∅ andri domκ ∩ ri C 6= ∅. Then vpvg = 1 and the gauge dual (D) attains itsoptimal value.Proof. From ri domκ ∩ ri C 6= ∅ and Rockafellar (1970, Theorem 31.1), we seethat vp = vf and vf is attained. The conclusion of the corollary now followsimmediately from Theorem 2.4.1.472.4. Strong dualityWe would also like to guarantee primal attainment. Note that the gaugedual of the gauge dual problem (D) (i.e., the bidual of (P)) is given byminimizexκ◦◦(x) subject to x ∈ C ′′, (2.20)which is not the same as (P) unless C is ray-like and κ is closed; see Theo-rem 2.1.2 and Proposition 2.1.2(ii). However, we show in the next propositionthat (2.20) and (P) always have the same optimal value when κ is closed (evenif C is not ray-like), and that if the optimal value is attained in one problem,it is also attained in the other.Proposition 2.4.1. Suppose that κ is closed. Then the optimal values of(P) and (2.20) are the same. Moreover, if the optimal value is attained inone problem, it is also attained in the other.Proof. From Proposition 2.1.7, we see that (2.20) is equivalent tominimizeλ,xλκ(x) subject to x ∈ C, λ ≥ 1,which clearly gives the same optimal value as (P). This proves the first con-clusion. The second conclusion now also follows immediately.Hence, we obtain the following corollary, which generalizes Freund (1987,Theorem 2A) by dropping the ray-like assumption on C.Corollary 2.4.2 (Strong duality II). Suppose that κ is closed, and thatri domκ∩ ri C 6= ∅ and ri domκ◦ ∩ ri C ′ 6= ∅. Then vpvg = 1 and both valuesare attained.Proof. The conclusion follows from Corollary 2.4.1, Proposition 2.4.1, the factthat κ = κ◦◦ for closed gauge functions, and the observation that ri domκ ∩ri C 6= ∅ if and only if ri domκ∩ri C ′′ 6= ∅, since ri C ′′ = ⋃λ>1 λ ri C (Rockafellar,1970, p. 50) and domκ is a cone.Before closing this section, we specialize Theorem 2.4.1 to study the rela-tionship between the Lagrange (D`) and gauge (Dρ) duals. Let vl denote theoptimal value of (D`). We use the fact that, for any y,− σC(−y) = infc∈C〈c, y〉{> 0 if y ∈ cone C ′\ { 0 } ,≤ 0 otherwise, (2.21)which is directly verifiable using the definition of C ′.482.4. Strong dualityCorollary 2.4.3. Suppose that C is given by (2.9), where 0 ≤ < ρ(b),assumption (2.11) holds, and domκ◦∩C ′ 6= ∅. Then vl = vf > 0. Moreover,(i) if y∗ solves (D`), then y∗/vl solves (Dρ);(ii) if y∗ solves (Dρ) and vg > 0, then vly∗ solves (D`).Proof. From (2.21), for any y ∈ cone C ′\ { 0 }, we have−σC(−y) = infc∈C〈c, y〉 >0 and is hence finite. Note that infc∈C〈c, y〉 = infc,r{〈c, y〉 |Ac+ r = b, ρ(r) ≤ }.Use this reformulation and proceed as in §2.3.2 to obtain the dual function`(u) = infc, ρ(r)≤{〈u, b〉 − 〈u, r〉 − (〈A∗u, c〉 − 〈c, y〉)}= 〈b, u〉 − supρ(r)≤〈u, r〉 − supc{〈A∗u− y, c〉}= 〈b, u〉 − ρ◦(u)− δA∗u=y(u).The dual problem to infc∈C〈c, y〉 is given by maximizing ` over u. Because ofassumption (2.11) and the finiteness of −σC(−y),infc∈C〈c, y〉 = supy=A∗u{〈b, u〉 − ρ◦(u)}, (2.22)and the supremum is attained, which is a consequence of Rockafellar (1970,Corollary 28.2.2 and Theorem 28.4). On the other hand, for any y /∈cone C ′\{0},we have from weak duality and (2.21) thatsupy=A∗u{〈b, u〉 − ρ◦(u)} ≤ infc∈C〈c, y〉 ≤ 0. (2.23)Since domκ◦ ∩ C ′ 6= ∅, we can substitute (2.22) into (2.18) and obtain0 < vf = sup {λ | κ◦(y) ≤ 1, −σC(−y) ≥ λ > 0 }= sup { 〈b, u〉 − ρ◦(u) | κ◦(A∗u) ≤ 1, A∗u ∈ cone C ′\ { 0 } }= sup { 〈b, u〉 − ρ◦(u) | κ◦(A∗u) ≤ 1 } = vl,where the last equality follows from (2.22), (2.23), and the positivity of vf .This completes the first part of the proof. In particular, the Fenchel dualproblem (2.16) has the same optimal value as the Lagrange dual problem(D`), and y∗ = A∗u∗ solves (2.16) if and only if u∗ solves (D`). Moreover, sinceassumption (2.11) holds, §2.3.1 shows that (D) is equivalent to (Dρ). Theconclusion now follows from these and Theorem 2.4.1.We next state a strong duality result concerning the primal-dual gauge pair(Pρ) and (Dρ).492.5. Variational properties of the gauge value functionCorollary 2.4.4. Suppose that C and C0 are given by (2.9) and (2.10),where 0 ≤ < ρ(b). Suppose also that κ is closed,ri domκ ∩ A−1 ri C0 6= ∅, and ri domκ◦ ∩ A∗ ri C ′0 6= ∅. (2.24)Then the optimal values of (Pρ) and (Dρ) are attained, and their product isequal to 1.Proof. Since A−1 ri C0 6= ∅, A satisfies the assumption in (2.11). Then §2.3.1shows that (D) is equivalent to (Dρ). Moreover, from Rockafellar (1970, The-orem 6.6, Theorem 6.7), we see that ri C = A−1 ri C0 and ri C ′ = A∗ ri C ′0. Theconclusion now follows from Corollary 2.4.2.This last result also holds if C0 were polyhedral; in that case, the assump-tions (2.24) could be replaced with ri domκ ∩ C 6= ∅ and ri domκ◦ ∩ C ′ 6= ∅.2.5 Variational properties of the gauge valuefunctionThus far, our analysis has focused on the relationship between the optimal val-ues of the primal-dual pair (Pρ) and (Dρ). As with Lagrange duality, however,there is also a fruitful view of dual solutions as providing sensitivity informa-tion on the primal optimal value. Here we provide a corresponding variationalanalysis of the gauge optimal-value function with respect to perturbations inb and .Sensitivity information is captured in the subdifferential of the value func-tionv(h, k) = infxf(x, h, k), (2.25)withf(x, h, k) = κ(x) + δepi ρ(b+ h− Ax, + k). (2.26)Following the discussion in Aravkin, Burke, and Friedlander (2013, Section 4),we start by computing the conjugate of f , which can be done as follows:f ∗(z, y, τ) = supx,h,k{ 〈z, x〉+ 〈y, h〉+ τk − κ(x)− δepi ρ(b+ h− Ax, + k) }= supx,w,µ{ 〈z + A∗y, x〉 − κ(x) + 〈y, w〉+ τµ− δepi ρ(w, µ) } − 〈b, y〉 − τ= κ∗(z + A∗y) + δ∗epi ρ(y, τ)− 〈b, y〉 − τ.502.5. Variational properties of the gauge value functionUse Proposition 2.1.2(iv) and the definition of support function and convexconjugate to further transform this asf ∗(z, y, τ) + 〈b, y〉+ τ = δκ◦(·)≤1(z + A∗y) + σepi ρ(y, τ)(i)= δκ◦(·)≤1(z + A∗y) + δ(epi ρ)◦(y, τ)(ii)= δκ◦(·)≤1(z + A∗y) + δepi(ρ◦)(y,−τ)= δκ◦(·)≤1(z + A∗y) + δρ◦(·)≤·(y,−τ),where equality (i) follows from Proposition 2.1.6 and Proposition 2.1.4(i), andequality (ii) follows from Proposition 2.1.2(vi). Combining this with the defi-nition of the value function v(h, k),v∗(y, τ) = suph,k{ 〈y, h〉+ τk − v(h, k) }= supx,h,k{ 〈y, h〉+ τk − f(x, h, k) }= f ∗(0, y, τ) = −〈b, y〉 − τ + δκ◦(·)≤1(A∗y) + δρ◦(·)≤·(y,−τ).(2.27)In view of Rockafellar and Wets (1998, Theorem 11.39), under a suitable con-straint qualification, the set of subgradients of v is nonempty and given by∂v(0, 0) = argmaxy,τ{−f ∗(0, y, τ) }= argmaxy,τ{ 〈b, y〉+ τ | κ◦(A∗y) ≤ 1, ρ◦(y) ≤ −τ }={(y,−ρ◦(y))∣∣∣∣ y∈argmaxy{〈b, y〉 − ρ◦(y) |κ◦(A∗y) ≤ 1}},(2.28)in terms of the solution set of (D`) and the corresponding function value ofρ◦(y). We state formally this result, which is a consequence of the abovediscussion and Corollary 2.4.3.Proposition 2.5.1. For fixed (b, ), define v as in (2.25) and f as in (2.26).Thendom f(·, 0, 0) 6= ∅ ⇐⇒ 0 ∈ A domκ− [ρ(b− ·) ≤ ],and hence(0, 0) ∈ int dom v ⇐⇒ 0 ∈ int(A domκ− [ρ(b− ·) < ])512.6. ExtensionsIf (0, 0) ∈ int dom v and v(0, 0) > 0, then ∂v(0, 0) 6= ∅ with∂v(0, 0) ={(y,−ρ◦(y))∣∣∣∣y∈argmaxy{〈b, y〉 − ρ◦(y) |κ◦(A∗y) ≤ 1}}={v(0, 0) · (y,−ρ◦(y))∣∣∣∣y∈argminy{κ◦(A∗y) | 〈b, y〉 − ρ◦(y) ≥ 1}}.Proof. It is routine to verify the properties of the domain of f(·, 0, 0) and theinterior of the domain of v. Suppose that (0, 0) ∈ int dom v. Then the valuefunction is continuous at (0, 0) and hence ∂v(0, 0) 6= ∅. The first expressionof ∂v(0, 0) follows directly from Rockafellar and Wets (1998, Theorem 11.39)and the discussions preceding this proposition.We next derive the second expression of ∂v(0, 0). Since (0, 0) ∈ int dom vimplies 0 ∈ int(A domκ − [ρ(b − ·) < ]), the linear map A satisfies assump-tion 2.11. Moreover, as another consequence of Rockafellar and Wets (1998,Theorem 11.39), (0, 0) ∈ int dom v also implies that v(0, 0) = supy,τ{−f ∗(0, y, τ)},which is just the optimal value of the Lagrange dual problem (D`). Further-more, v(0, 0) being finite and nonzero together with the definition of (D`) and(2.12) implies that domκ◦ ∩ C ′ 6= ∅. The second expression of ∂v(0, 0) nowfollows from these three observations and Corollary 2.4.3.2.6 ExtensionsThe following examples illustrate how to extend the canonical formulation (Pρ)to accommodate related problems. It also provides an illustration of the tech-niques that can be used to pose problems in gauge form and how to derivetheir corresponding gauge duals.2.6.1 Composition and conic side constraintsA useful generalization of (Pρ) is to allow the gauge objective to be composedwith a linear map, and for the addition of conic side constraints. The compositeobjective can be used to capture, for example, problems such as weighted basispursuit (e.g., Candés, Wakin, and Boyd (2008); Friedlander, Mansour, Saab,and Yilmaz (2012)), or together with the conic constraint, problems such asnonnegative total variation (Krishnan, Lin, and Yip, 2007).The following result generalizes the canonical primal-dual gauge pair (Pρ)and (Dρ).522.6. ExtensionsProposition 2.6.1. Let D be a linear map and K be a convex cone. Thefollowing pair of problems constitute a primal-dual gauge pair:minimizexκ(Dx) subject to ρ(b− Ax) ≤ , x ∈ K, (2.29a)minimizey, zκ◦(z) subject to 〈y, b〉 − ρ◦(y) ≥ 1, D∗z − A∗y ∈ K∗.(2.29b)Proof. Reformulate (2.29a) as a gauge optimization problem by introducingadditional variables, and lifting both the cone K and the epigraph epi ρ into theobjective by means of their indicator functions: use the function f(x, s, r, τ) :=δK(x) + κ(s) + δepi ρ(r, τ) to define the equivalent gauge optimization problemminimizex,s,r,τf(x, s, r, τ) subject to Dx = s, Ax+ r = b, τ = .As with §2.3.1, observe that f is a sum of gauges on disjoint variables. Thus,we invoke Proposition 2.1.5 to deduce the polar of the above objective:f ◦(u, z, y, α) = max { δ◦K(u), κ◦(z), δ◦epi ρ(y, α) }(i)= max { δK◦(u), κ◦(z), δ(epi ρ)◦(y, α) }(ii)= max { δK∗(−u), κ◦(z), δepi(ρ◦)(y,−α) }(iii)= δK∗(−u) + κ◦(z) + δepi(ρ◦)(y,−α),where (i) follows from Proposition 2.1.4(i), (ii) follows from Proposition 2.1.2(vi),and (iii) follows from the definition of indicator function. Moreover, use Corol-lary 2.3.2 to derive the antipolar of the linear constraint setC = { (x, s, r, τ) | Dx = s, Ax+ r = b, τ = } ,which leads us toC ′ = { (−D∗z + A∗y, z, y, α) | 〈b, y〉+ α ≥ 1 } .From the above discussion, we obtain the following gauge programminimizey,z,αδK∗(D∗z−A∗y)+κ◦(z)+δepi(ρ◦)(y,−α) subject to 〈b, y〉+α ≥ 1.Bringing the indicator functions down to the constraints leads tominimizey,z,ακ◦(z) subject to 〈y, b〉+ α ≥ 1, ρ◦(y) ≤ −α, D∗z−A∗y ∈ K∗;further simplification by eliminating α yields the gauge dual problem (2.29b).532.6. Extensions2.6.2 Nonnegative conic optimizationConic optimization subsumes a large class of convex optimization problemsthat ranges from linear, to second-order, to semidefinite programming, amongothers. Example 1.2.3 describes how a general conic optimization problem canbe reformulated as an equivalent gauge problem; see (1.16).We can easily accommodate a generalization of (1.16) by embedding itwithin the formulation defined by (1.15a), and defineminimizex〈c, x〉+ δK(x) subject to ρ(b− Ax) ≤ , (2.30)with c ∈ K∗, as the conic gauge optimization problem. The following resultdescribes its gauge dual.Proposition 2.6.2. Suppose that K ⊂ X is a convex cone and c ∈ K∗.Then the gaugeκ(x) = 〈c, x〉+ δK(x)has the polarκ◦(u) = inf {α ≥ 0 | αc ∈ K∗ + u } , (2.31)with domκ◦ = span{c} − K∗. If K is closed and c ∈ intK∗, then κ hascompact level sets, and domκ◦ = X .Proof. From Proposition 2.1.2, we have thatκ◦(u) = sup {〈u, x〉 |κ(x) ≤ 1}= sup {〈u, x〉 | 〈c, x〉 ≤ 1 and x ∈ K} (2.32)= inf {α ≥ 0 |αc− u ∈ K∗} ,where the strong (Lagrangian) duality relationship in the last equality stemsfrom the following argument. First consider the case where u ∈ domκ◦. Be-cause the maximization problem in (2.32) satisfies Slater’s condition, equalityfollows from Rockafellar (1970, Corollary 28.2.2 and Theorem 28.4). Next, con-sider the case where u /∈ domκ◦, where κ◦(u) = +∞. The last equality thenfollows from weak duality. For the domain, note that the minimization prob-lem is feasible if and only if u ∈ span{c} −K∗; hence domκ◦ = span{c} −K∗.To prove compactness of the level sets of κ when K is closed and c ∈ intK∗,define γ := infx { 〈c, x〉 | ‖x‖ = 1, x ∈ K} and observe that compactness of thefeasible set in this minimization implies that the infimum is attained and thatγ > 0. Thus, for any x ∈ K \ {0}, 〈c, x〉 ≥ γ ‖x‖ > 0 and, consequently,that {x ∈ X | κ(x) ≤ α } = {x ∈ K | 〈c, x〉 ≤ α } ⊂ {x ∈ X | ‖x‖ ≤ α/γ }.542.6. ExtensionsThis guarantees that the level sets of κ are bounded, which establishes theircompactness. From this and Proposition 2.1.2(iii), we see that κ◦(u) is finitefor any u ∈ X .Remark 2.6.1. Note that even though the polar gauge in (2.31) is closed, itis not necessarily the case that it has a closed domain. For example, let K bethe cone of PSD 2-by-2 matrices, and definec =(1 00 0)and un =(0 11 − 1n),for each n = 1, 2, . . .. Use the expression (2.31) to obtain that κ◦(un) = n.Hence, un ∈ domκ◦, but limn→+∞ un 6∈ domκ◦.This is an example of a more general result described by Ramana, Tunçel,and Wolkowicz (1997, Lemma 2.2), which shows that the cone of PSD matricesis devious (i.e., for every nontrivial proper face F of K, spanF + K is notclosed). The concept of a devious cone seems to be intimately related to theclosedness of the domain of polar gauges such as (2.31) because span{c}−K∗ =−(spanF +K∗), where F ⊆ K∗ is the smallest face of K∗ that contains c; seeTunçel and Wolkowicz (2012, Proposition 3.2).With that in mind, it is interesting to derive a representation for the closureof the domain of (2.31). It follows from Rockafellar (1970, Corollary 16.4.2)that cl domκ◦ = cl (span{c} − K∗) = ({c}⊥ ∩ clK)◦.55Chapter 3Spectral gauge optimization anddualityThe gauge duality framework developed in Chapter 2, albeit inspired by convexformulations arising from practical inverse problems, was still approached ina rather abstract form.In this chapter we connect this general class of problems with those con-vex relaxations for phase recovery and blind deconvolution presented in §1.1.Our goal is to derive their corresponding dual problems and to show that theyadmit a unified description suitable for the design of numerical methods fortheir minimization.Due to its expressiveness and direct relationship to the PhaseLift formula-tion, we begin by specializing the nonnegative conic gauge optimization classfrom §2.6.2 to matrix problems in which the abstract cone is instantiated tobe that of positive semidefinite Hermitian matrices. This specialization alongwith further analysis lead us to a constrained eigenvalue minimization prob-lem derived from the dual gauge problem, which will serve as the basis for ourapproach described in Chapter 4.This is followed by an analogous derivation of a dual gauge problem associ-ated to the nuclear norm relaxation described in §1.1.2, which is then furtherreduced to a problem of the same form as that for PhaseLift—thus allowingfor a unified approach to both problems.We conclude this chapter with a construction of a family of lower approx-imants to this spectral gauge dual objective and a discussion of their possibleapplication within the framework of proximal bundle methods for nonsmoothconvex minimization.3.1 Semidefinite nonnegative conic optimizationFor our purposes, an instance of semidefinite nonnegative conic optimizationis a problem of the formminimizeX∈Hn〈C,X〉 subject to ρ(b−AX) ≤ and X 0, (3.1)563.1. Semidefinite nonnegative conic optimizationwhere C 0, ρ : Rm → R ∪ {+∞} is a closed gauge function such thatρ−1(0) = {0}, b ∈ Rm \ {0}, < ρ(b), and A : Hn → Rm is a linear mapdefined by matrices Ak ∈ Hn in such a way that(AX)k = 〈X,Ak〉 := trace(A∗kX) = trace(AkX)for each k = 1, . . . ,m and all X ∈ Hn.We begin with the specialization of the general nonnegative conic gaugeoptimization presented in §2.6.2 to problems of the form (3.1). In that abstractformulation, we take X = Hn to be the real vector space of n-by-n Hermitianmatrices, K ⊂ X to be the (self-dual) cone of positive semidefinite matrices,c = C and A = A, this specialization translates (2.30) to the problemminimizeX∈Hnκ(X) := 〈C,X〉+ δ·0(X) subject to ρ(b−AX) ≤ , (3.2)where the PSD constraint from (3.1) is moved to the objective function turningit into the gauge κ, a technique used in §2.6.2 for more general convex cones.Invoking Proposition 2.6.2, we have that the gauge polar to κ is given byκ◦(U) = infα∈R{α ≥ 0 | αC U }= max{0, inf {α ∈ R | αC U }}= max{0, λ1(U,C)}= [λ1(U,C)]+ ,where λ1(U,C) denotes the rightmost generalized eigenvalue—i.e., largest realnot in absolute value—corresponding to the generalized eigenvalue problemUx = λCx (which might be +∞ as in the example given in Remark 2.6.1).With this characterization, the abstract gauge dual (Dρ) derived in §2.3.1specializes to the spectral optimization problemminimizey∈Rm[λ1(A∗y, C)]+ subject to 〈b, y〉 − ρ◦(y) ≥ 1, (3.3)which is feasible for 0 ≤ < ρ(b), but not necessarily finite due to assumingonly semidefiniteness of C; c.f. Remark 2.6.1.In the following, we further specialize and analyze this primal-dual pair toits most commonly encountered form from convex relaxations.3.1.1 Trace minimization in the PSD coneThe lifted formulation of phase retrieval is an example of the semidefiniteconic gauge optimization problem (3.2) where C is the identity matrix andρ : Cm → R is some norm ‖·‖ (typically the 2-norm).573.1. Semidefinite nonnegative conic optimizationFor these cases, (3.2) reduces to the problem of minimizing the trace of aPSD matrix that satisfies a prescribed bound on the (norm-measured) linearmeasurements misfit, while the dual consists of a constrained eigenvalue mini-mization problem. This neatly specializes the abstract primal-dual gauge pairminimizex∈Xκ(x) subject to ρ(b− Ax) ≤ , (3.4a)minimizey∈Yκ◦(A∗y) subject to 〈y, b〉 − ρ◦(y) ≥ 1, (3.4b)to the spectral gauge pair that forms the core of our approachminimizeX∈HntraceX + δ(X | · 0) subject to ‖b−AX‖ ≤ , (3.5a)minimizey∈Rm[λ1(A∗y)]+ subject to 〈y, b〉 − ‖y‖∗ ≥ 1, (3.5b)where ‖·‖∗ denotes the norm dual to ‖·‖ and A∗y =∑mk=1Akyk.Figure 3.1 illustrates the geometry of the gauge dual feasible set in (3.5b)for different norms and values of in the case wherem = 2 and b = (0, 1) ∈ R2.The generalized “unit balls” of the gauge traceX + δ(X | · 0) and its corre-sponding polar [λ1(U)]+ are depicted in Figure 3.2 for 2-by-2 real symmetricmatrices parameterized as[x, z√2; z√2, y], where (x, y, z) correspond to the co-ordinate axes in R3. Figures 3.3 and 3.4 provide alternative viewing positions.Assuming the original problem with the cone constraints is feasible andb 6= 0, we can further simplify the dual objective and safely eliminate thepositive-part operator: since in this case κ(X) is obviously strictly positivefor all nonzero X, and is additionally finite over the feasible set of the origi-nal cone-constrained problem, it follows from weak gauge duality (cf. Theo-rem 2.4.1) that κ◦(A∗y) is positive for all dual feasible points. In other words,0 < [λ1(A∗y)]+ = λ1(A∗y) (3.6)for all dual feasible points y, leading to the equivalent spectral problemminimizey∈Rmλ1(A∗y) subject to 〈b, y〉 − ‖y‖∗ ≥ 1, (3.7)which will play a central role in our computational approach described inChapter 4.In practice, we need to be prepared to detect infeasiblity. The failure ofcondition (3.6) in fact furnishes a certificate of infeasibility for the originalcone-constrained problem (3.1): if λ1(A∗y) ≤ 0 for some dual-feasible vectory, it follows from weak gauge duality that κ(X) is necessarily infinite over thefeasible set of (3.5a)—i.e., X 6 0 for all X feasible in problem (3.5a).583.1. Semidefinite nonnegative conic optimization(a) ‖·‖ = ‖·‖1 and ‖·‖∗ = ‖·‖∞(b) ‖·‖ = ‖·‖2 and ‖·‖∗ = ‖·‖2(c) ‖·‖ = ‖·‖∞ and ‖·‖∗ = ‖·‖1Figure 3.1: Visualizing the dual feasible set for b = (0, 1) and ∈ [0:.1:.9].593.2. Nuclear-norm minimizationFigure 3.2: Generalized unit balls of the trace-gauge and its polar for realsymmetric 2-by-2 matrices (intersected with [−1.5, 1.5]3 for illustration pur-poses). Depictions above correspond to the parameterization[x, z√2; z√2, y].Figures 3.3 and 3.4 provide different viewing positions.3.2 Nuclear-norm minimizationRecall the nuclear-norm minimization problem described in Chapter 1,minimizeX∈Cn1×n2‖X‖1 =∑iσi(X) subject to ‖b−AX‖ ≤ ,where the measurement operator A : Cn1×n2 → Cm is defined by matricesAk ∈ Cn1×n2 as (AX)k := 〈X,Ak〉 := trace(A∗kX), k = 1, . . . ,m, and < ‖b‖ .The dual to this gauge minimization problem admits a far simpler treat-ment if we exploit the norm structure of the objective. By observing that itspolar is the corresponding dual norm (i.e., the operator norm ‖·‖∞ = σ1(·))and identifying Y = Cm as a real 2m-dimensional vector space with the inner-product R 〈·, ·〉 , we have the following primal-dual gauge pairminimizeX∈Cn1×n2‖X‖1 subject to ‖b−AX‖ ≤ , (3.8a)minimizey∈Cm‖A∗y‖∞ subject to R 〈y, b〉 − ‖y‖∗ ≥ 1, (3.8b)where A∗y = ∑mk=1Akyk ∈ Cn1×n2 is the adjoint of A evaluated at y.603.2. Nuclear-norm minimizationFigure 3.3: Generalized unit ball of the trace-gauge for symmetric 2-matrices.Figure 3.4: Generalized unit ball of the polar to the trace-gauge for real sym-metric 2-matrices. Set is unbounded, so it is illustrated here within a box.613.2. Nuclear-norm minimization3.2.1 Reduction to PSD trace minimizationIt will be convenient, for both theoretical and algorithmic developments of theapproach we discuss in Chapter 4, to embed the nuclear-norm minimizationproblem (3.8a) within the symmetric SDP formulation (3.5a). The resultsare no less general, and it will allow us to solve both problems with what isessentially a single algorithmic approach and software implementation.The reduction to the Hermitian trace-minimization problem (3.5a) leve-rages the SDP formulation of the nuclear norm as introduced by Fazel (2002)and used by Ahmed et al. (2014) for their algorithmic approach, as we discussedin §1.1.2. It embeds the problem in a larger Hermitian space and separatesthe complex measurements into their real and imaginary parts, leading to aproblem of the formminimizeU∈Hn1 ,V ∈Hn2X∈Cn1×n2r1,r2∈Rm〈12(I 00 I),(U XX∗ V)〉subject to〈12(0 AkA∗k 0),(U XX∗ V)〉+ r1k = Rbk,〈i2(0 Ak−A∗k 0),(U XX∗ V)〉+ r2k = Ibk,‖r1 + ir2‖ ≤ ,(U XX∗ V) 0, k = 1, . . . ,m,(3.9)where the residual vectors r1, r2 ∈ Rm are introduced simply to allow a com-pact presentation.The gauge dual obtained from this problem when instantiated in (3.3) willhave the exact same form as (3.8b) once the dual variables y1, y2 ∈ Rm are“compacted” back into a complex vector y = y1 + iy2 ∈ Cm and the objectiveis simplified by observing the following equalities[λ1((0 12(A∗y)12(A∗y)∗ 0),12I)]+=[λ1(0 A∗y(A∗y)∗ 0)]+= [‖A∗y‖∞]+= ‖A∗y‖∞ ,where we use the fact that the eigenvalues of the matrix [0, Z;Z∗, 0] are±σk(Z)for each k = 1, . . . ,min{n1, n2} and |n1 − n2| zeros.Based on this reduction, from now on we focus entirely on the PSD traceminimization formulation (3.5a), its gauge dual (3.5b) and the correspondingconstrained eigenvalue optimization problem (3.7).623.3. Spectral lower models for convex minimization3.3 Spectral lower models for convexminimizationIn the next chapter we describe a computational approach for problem (3.5b)that is based on a projected subgradient method (PSGM) to solve that con-vex eigenvalue minimization problem. Although simple and promising (c.f.,experiments described in Chapter 5), that approach inherits some difficultiesfrom PSGMs—e.g., lack of natural stopping criteria—for which workaroundsneed to be devised.Bundle methods for nonsmooth convex minimization provide natural mech-anisms to address such issues, however they require good (lower) approxima-tions for the objective function. The main goal of this section is to describethe construction of a family of such lower models that seems suitable for thespectral objectives appearing in the dual gauge problems from §3.1 and §3.2.Although promising, these approximations have yet to be thoroughly ex-ploited in the design of computational alternatives to the PSGM approach. Tosupport the first steps towards such an endeavour, the following subsectionsdescribe the construction of these lower models and briefly outline their usewithin the context of proximal bundle methods.An overview. Many classical methods for nonsmooth convex minimizationhave been designed for problems of the formminimizex∈Xf(x) subject to x ∈ F, (CM)where f : X → R is a convex objective function and F ⊆ X is the set of fea-sible points, a closed and convex subset of a finite-dimensional real Euclideanspace X . Due to the simplicity of the dual constraints in (3.5b) and the non-smoothness of the objective, we focus on casting that dual gauge problem inthis framework and specialize the numerical methods to that class of problems.In order to compute solutions for problems in the general formulation (CM),many methods assume that it is possible to query the objective function (seenas a “black-box”) for its value and subgradients at given points (F might al-low for the solution of certain quadratic problems associated with it as anadditional constraint, e.g., projection onto F ). A common structure in thesemethods is that such functional values and subgradients are used to constructsuitable lower models for the original objective function (e.g., bundle meth-ods (Bonnans, Gilbert, Lemaréchal, and Sagastizábal, 2006, Chapter 10)).Following this reasoning, we proceed by presenting a family of lower modelsfor the dual gauge objective in (3.5b) and will assume that the constraints in633.3. Spectral lower models for convex minimizationthat problem are simple enough to allow for the operations required in thosemethods to be performed without significant overhead (since those constraintscan be transformed into a particularly simple second-order cone constraint, webelieve this is not unreasonable).3.3.1 Spectral lower models for the dual gauge objectiveOur approach to construct lower models for the dual gauge objective in (3.5b)is based on forming outer-approximations for its epigraph in a manner equiv-alent to that used by Helmberg and Rendl (2000).Whenever U ∈ Cn×k is an arbitrary matrix with C-orthonormal columns(i.e., U∗CU = I), we have thatepi [λ1(A∗·, C)]+ ={(y, t) ∈ Y × R+∣∣∣ t > λ1(A∗y, C)}={(y, t) ∈ Y × R+∣∣∣ tC < A∗y}⊆{(y, t) ∈ Y × R+∣∣∣ tU∗CU < U∗ (A∗y)U}={(y, t) ∈ Y × R+∣∣∣ tI < U∗ (A∗y)U}={(y, t) ∈ Y × R+∣∣∣ t > λ1(U∗ (A∗y)U)}= epi [λ1(U∗ (A∗·)U)]+= epi [λ1(A∗U ·)]+ ,where we define the reduced measurement operatorAU : Hk → Cm asAU(Z) :=A (UZU∗) , which has an adjointA∗U : Cm → Hk defined byA∗Uy = U∗ (A∗y)U.The advantage of using such approximations is that, while the originalfunction involved a typically very large linear matrix inequality (LMI) withmatrices of order n, the outer-approximation is described by LMIs of a givenorder k. As long as we can control this order k, we have the possibility todeal with much reduced problems (an observation also made in Helmberg andRendl (2000) and exploited via proximal bundle methods (Bonnans et al.,2006, Chapter 10)).Relationship with approximate subdifferentialsKey observations for the suitability of such lower models show up when theyare built using generalized eigenvectors of the adjoint measurement operatorat a given point and related to approximate subdifferentials of the objective at643.3. Spectral lower models for convex minimizationfeasible points. To that end, let us begin with a simple characterization of thesubdifferential of the abstract gauge objective in (3.4b) presented inProposition 3.3.1. Let κ : X → R ∪ {+∞} be a closed gauge functionwith compact level-sets and A : X → Y be a linear map. Then, for eachε > 0, we have thatg ∈ ∂ε(κ◦(A∗·))(y)⇐⇒ g = Au, κ(u) 6 1 and ε > κ◦(A∗y)− 〈A∗y, u〉 .Proof. Since κ has compact level-sets, we have that κ◦ is finite-valued every-where and the first and second of the following chain of equivalences holdby (Hiriart-Urruty and Lemaréchal, 1993, Theorem XI.3.2.1) and (Hiriart-Urruty and Lemaréchal, 1993, Proposition XI.1.2.1), respectively:g ∈ ∂ε(κ◦(A∗·))(y)⇐⇒ g = Au and u ∈ ∂ε(κ◦(·))(A∗y)⇐⇒ g = Au and (κ◦)∗(u) + κ◦(A∗y)− 〈A∗y, u〉 6 ε⇐⇒ g = Au and δκ(·)61 (u) + κ◦(A∗y)− 〈A∗y, u〉 6 ε⇐⇒ g = Au, κ(u) 6 1 and κ◦(A∗y)− 〈A∗y, u〉 6 ε.By noticing that, if the primal (3.5a) is feasible, weak gauge duality andSlater’s constraints qualification in the Lagragian dual of (3.5a) give us0 < (traceX)−1 6 [λ1(A∗y, C)]+ = λ1(A∗y, C) ,for every (dual) feasible point in (3.5b).Observing these two results and that the assumption C 0 implies that theprimal gauge objective κ(X) = 〈C,X〉 + δ·<0 (X) has compact level sets (c.f.Proposition 2.6.2), we can relate our family of outer-approximations with aclass of subdifferential-based lower models for our objective in (3.5b) centeredat a given dual feasible point. Let yˆ ∈ Y be feasible in (3.5b), U ∈ Cn×khave as columns generalized eigenvectors corresponding to the (decreasinglyordered) rightmost generalized eigenvalues of A∗yˆ with respect to C (countingmultiplicities) and Z ∈ Hk be such that traceZ = 1 and Z < 0, then〈yˆ,A (UZU∗)〉 = 〈U∗(A∗yˆ)U,Z〉=〈Diag(λ[k](A∗yˆ, C)), Z〉> λk(A∗yˆ, C)= [λ1(A∗yˆ, C)]+ − ([λ1(A∗yˆ, C)]+ − λk(A∗yˆ, C))= [λ1(A∗yˆ, C)]+ − (λ1(A∗yˆ, C)− λk(A∗yˆ, C)),653.3. Spectral lower models for convex minimizationwhere the last equality uses the observation above that λ1(A∗yˆ, C) > 0 atdual feasible points y. By noticing that 〈C,UZU∗〉 = 〈U∗CU,Z〉 = 〈I, Z〉 =traceZ, we deduce that∂εˆ [λ1(A∗·, C)]+ (yˆ) ⊇{AUZ∣∣∣ traceZ = 1, Z < 0},where εˆ := λ1(A∗yˆ, C) − λk(A∗yˆ, C) . Moreover, it is easy to see that theset on the right-hand side of this inclusion has a nonempty intersection with∂ [λ1(A∗·, C)]+ (yˆ), a property that will justify the use of such lower models inproximal bundle methods as we will see later.These considerations shed some light on the structure of our family of lowermodels as a possibly uncountable supremum of linear lower models given byapproximate subgradient inequalities:[λ1(A∗y, C)]+> suptraceZ=1Z<0{[λ1(A∗yˆ, C)]+ + 〈y − yˆ,AUZ〉−([λ1(A∗yˆ, C)]+ −〈λ[k](A∗yˆ, C), diag(Z)〉) }+= suptraceZ=1Z<0{λ1(A∗yˆ, C) + 〈A∗U(y − yˆ), Z〉−λ1(A∗yˆ, C) +〈λ[k](A∗yˆ, C), diag(Z)〉 }+= suptraceZ=1Z<0{〈A∗U(y − yˆ), Z〉+〈λ[k](A∗yˆ, C), diag(Z)〉}+= suptraceZ=1Z<0{〈A∗Uy, Z〉 − 〈A∗U yˆ, Z〉+〈λ[k](A∗yˆ, C), diag(Z)〉}+= suptraceZ=1Z<0{〈A∗Uy, Z〉 −〈Diag(λ[k](A∗yˆ, C)), Z〉+〈λ[k](A∗yˆ, C), diag(Z)〉}+= suptraceZ=1Z<0{〈A∗Uy, Z〉 −〈λ[k](A∗yˆ, C), diag(Z)〉+〈λ[k](A∗yˆ, C), diag(Z)〉}+= suptraceZ=1Z<0〈A∗Uy, Z〉+= [λ1(A∗Uy)]+ .663.3. Spectral lower models for convex minimization3.3.2 Proximal-bundle subproblemsIn a seminal paper, Correa and Lemaréchal (1993) introduced general condi-tions under which the classical proximal bundle method (Bonnans et al., 2006,Chapter 10) converges to a solution of (CM).The basic idea in those methods consists of approximating f using lowermodels fˇi around the main iterates xk and minimizing a stabilization of theform fˇi + 12αk ‖ · −xk‖22 + δF (·) . By suitably constructing those lower modelsto satisfy a few general conditions and updating the main iterates when suffi-cient descent is achieved, it can be shown that the sequence of main iteratesconverges to a minimizer of f in F when such a point exists.In the following considerations, we skip many details on the outer prox-imal bundle iterations and focus our attention on verifying that the familyof lower models presented in the previous subsection can be used within theframework presented in (Correa and Lemaréchal, 1993) and on the structureof the proximal bundle’s stabilized subproblems when specialized to the dualgauge problem for the generalized PhaseLift formulation (3.5b).Conditions on lower models for use in proximal bundle methodsBundle methods generate iterates approximating solutions of (CM) by solvingproblems of the formxˆk := arg minx∈Xfˇk(x) +12αs‖x− xs‖22 subject to x ∈ F,in order to achieve a tentative iterate xˆk ∈ F. If a certain sufficient decreasecondition is satisfied, this step is called a serious step and the stability centerxs is updated with the new serious iterate xs+1 := xˆk. Otherwise, the step isdeclared a null step and the null iterate xˆk is only used to update the lowermodel.In principle, the lower model can updated at any time after either a seriousor a null step as long as the conditions below, introduced in (Correa andLemaréchal, 1993, page 269), are satisfied:fˇk(x) 6 f(x), ∀k = 0, . . . and ∀x ∈ F,max{fˇk(xˆk) +〈α−1s (xs − xˆk), · − xˆk〉 ,f(xˆk) +〈gˆk, · − xˆ〉}6 fˇk+1, if xˆk ∈ F was generatedin a null step,where gˆk ∈ ∂f(xˆk) is arbitrary.673.3. Spectral lower models for convex minimizationThis result is remarkable since it presents very meager conditions on theconstruction of lower models for the objective which will still satisfy the con-vergence results for a proximal bundle iteration based on them. In fact, themaximum of the two affine functions on the left-hand side of the second con-dition can be used to construct such lower models since it satisfies the firstcondition. This can be verified by noticing that the second term in that max-imum corresponds to a subgradient linearization of the convex f, and then byobserving that the optimality condition for the stabilized subproblem impliesthat the first term in the maximum is a subgradient linearization of fˇk inF—i.e., α−1s (xs − xˆk) ∈ ∂fˇk(xˆk) +NF (xˆk)—, which is a lower model for f byassumption.Naturally, one would not expect good practical convergence behaviour ifonly these meager assumptions are satisfied (i.e., using arbitrary lower modelsafter serious steps and the maximum of these two linearizations). However, bydesigning more accurate lower models satisfying these conditions, one might beable to trade-off convergence performance and complexity of the lower modelsand difficulty in the numerical solution of the subproblems.It is noteworthy that the spectral bundle method introduced by Helmbergand Rendl (2000) can be roughly seen as implementing lower models con-structed taking the maximum of the first affine function above (also knownin the bundle literature as the aggregate linearization (Bonnans et al., 2006,Chapter 10)) and substituting the second subgradient linearization with aspectral lower model as presented in the previous subsection. For those, U isconstructed in such a way that it contains columns with a few of the rightmosteigenvectors of the corresponding A∗yˆk, as well as re-orthonormalized columnsfrom previously computed eigenvectors.Proximal bundle subproblems with spectral lower modelsFrom our discussion above, the design and maintenance of lower models forthe dual gauge objective in (3.5b) allow for considerable flexibility. As longas the models lie below the objective at all times and above the aggregatelinearization and some subgradient linearization at the most recent subprob-lem solution, we have the freedom to manipulate the model function in orderto balance memory usage and subproblem solution costs against the accu-racy/tightness of those lower models while still satisfying the assumptionsused to prove convergence of the serious iterates to a minimizer.To present our construction, we first present the structure of the proximal683.3. Spectral lower models for convex minimizationbundle subproblems when specialized to (3.5b):yˆk := minimizey∈Cmfˇk(y) +12αs‖y − ys‖22 subject to R 〈y, b〉 − ‖y‖2 > 1,where fˇk is our current lower model for [λ1(A∗·, C)]+ .A simple and natural construction of lower models can be derived from theconditions stated above based on the lower models from the previous subsec-tion. Given a C-orthonormal matrix Uk ∈ Cn×rk whose columns comprise atleast one generalized eigenvector of A∗yˆk corresponding to λ1(A∗yˆk) and givenalso arbitrary C-orthonormal {Ui}i∈Ik , we have that lower models of the formfˇk+1 :=max{0, fˇk(yˆk) +〈α−1s (ys − yˆk), · − yˆk〉, λ1(A∗Uk−1·),maxi∈Ik {λ1(A∗Ui ·)}},satisfy the conditions presented by Correa and Lemaréchal (1993). This is dueto our considerations in the previous subsection regarding the relationshipbetween lower models of the form [λ1(A∗U ·)]+ and the (approximate) subdif-ferentials of the dual gauge objective at dual feasible points.Remark 3.3.1. It is noteworthy that Uk does not need to contain more thanone column, as long as one of its columns is a generalized eigenvector of A∗yˆkcorresponding to λ1(A∗yˆk, C) , not all columns need to be generalized eigen-vectors and Ik could be empty (i.e., no additional Ui at all). In fact, one canshow that this is (by equivalence) the case in the spectral bundle method Helm-berg and Rendl (2000) where Uk contains some generalized eigenvectors andyˆk as well as C-reorthonormalized columns of previously computed generalizedeigenvectors.Using lower models like these, we can reformulate bundle subproblems asminimize(y,t)∈Cm×Rt+12αs‖y − ys‖22subject toR 〈y, b〉 − ‖y‖2 > 1,tˆk−1 +〈α−1s (ys − yˆk−1), y − yˆk−1〉 6 t,A∗Uky 4 tI,A∗Uiy 4 tI, i ∈ Ik−1,t > 0,693.3. Spectral lower models for convex minimizationwhere tˆk−1 := fˇk(yˆk). These problems can be solved by off-the-shelf solversimplementing interior point methods (typically after introducing multiple, butsmall, semidefinite slacks and second-order cone auxiliary variables).In this approach, we trade-off the minimization of a linear objective overone huge (n × n) linear matrix inequality (LMI) on m (possibly complex)variables by the sequential minimization of simple quadratic objectives overcontrollably-many controllably-small LMIs (still on m variables).As mentioned earlier, we would not expect to be able to arbitrarily decreasethe complexity of the subproblems and the induced lower models withoutimpacting the number of iterations needed to achieve fixed stopping criteria.However, this family of spectral lower models satisfying the conditions of theframework presented by Correa and Lemaréchal (1993) allows much flexibilityfor bundle maintenance while retaining convergence guarantees.Although the family of lower models described in this section allows for anatural exploitation within the framework of bundle methods for nonsmoothconvex minimization, the approach and proof of concept solver we describenext make use of a simpler projected subgradient descent method for the dualspectral optimization problem (3.5b). The use of such models for the numericalsolution of that problem, along with the primal recovery technique presentednext, seems to be an interesting topic for further investigation.70Chapter 4Low-rank spectral optimizationHaving established in Chapter 3 the connection between those matrix-liftedconvex relaxations of phase recovery and blind deconvolution from Chapter 1and the gauge duality framework studied in Chapter 2, we are in the positionto develop a computational strategy for those problems.The approach that we propose is designed for the numerical solution ofthe following problems in the scenarios of low-rank recovery described in §1.1:minimizeX∈HntraceX subject to ‖b−AX‖ ≤ , X 0, (4.1a)minimizeX∈Cn1×n2‖X‖1 :=min{n1,n2}∑i=1σi(X) subject to ‖b−AX‖ ≤ , (4.1b)where the parameter ∈ [0, ‖b‖) controls the admissible deviations between thelinear model AX and the vector of observations b. (The particular propertiesof b and the linear operators A are detailed in §1.1.) Our strategy for bothproblems is based on first solving a related constrained Hermitian eigenvalueoptimization problem over a simple constraint, and then using that solution torecover a solution of the original problem. This eigenvalue problem is highlystructured, and because the constraint is easily handled, we are free to applya projected first-order method with inexpensive per-iteration costs that scaleswell to very large problems.The method that we develop applies to the much broader class of conicoptimization problems with nonnegative objective values. We pay special at-tention to the low-rank spectral problems just mentioned because their mea-surement operators and solution are highly structured and can be exploitedboth theoretically and computationally.In the following we summarize our overall scheme and its building blocksare described in the next sections.714. Low-rank spectral optimizationA roadmap of our approach. Our strategy for these low-rank spectraloptimization problems is based on solving the following dual constrained eigen-value optimization problem derived via gauge duality in Chapter 3:minimizey∈Rmλ1(A∗y) subject to 〈b, y〉 − ‖y‖∗ ≥ 1. (4.2)The dimension of the variable y in the eigenvalue optimization problem cor-responds to the number of measurements. In the context of phase retrievaland blind deconvolution, Candès et al. (2015b) and Ahmed et al. (2014) showthat the number of measurements needed to recover with high probability theunderlying signals is within logarithmic factors of the signal length (see §1.1).The crucial implication is that the dimension of the dual problem grows slowlyas compared to the dimension of the primal problem, which grows quadrati-cally on the signal length.In our implementation, we apply a simple first-order projected subgradientmethod to solve this convex constrained spectral optimization problem. Thedominant cost at each iteration of our algorithm is the computation of right-most eigenpairs of the n × n Hermitian linear operator A∗y, which are usedto construct descent directions for (4.2). The structure of the measurementoperators then allows us to use Krylov-based eigensolvers, such as ARPACK(Lehoucq, Sorensen, and Yang, 1998), for obtaining these leading eigenpairs.Primal solution estimates X are then recovered and retained in a low-rank fac-tored form via relatively small constrained least-squares problems, describednext in §4.2.An analogous approach based on the classical Lagrangian duality wouldalso lead to a dual optimization problem in the same space as our dual eigen-value problem (c.f. considerations in §§2.3.2 and characterization in §3.1):maximizey∈Rm〈y, b〉 − ‖y‖∗ subject to A∗y I. (4.3)Note that the Lagrange dual possesses a rather simple objective and a difficultlinear matrix inequality of order n as a constraint. Precisely the reverse situ-ation holds for the gauge dual (4.2), which has a relatively simple constraint.It is well known that SDPs with a constant-trace property—i.e., AX =b implies trace(X) is constant—have Lagrange dual problems that can bereformulated as unconstrained eigenvalue problems. This approach is usedby Helmberg and Rendl (2000) to develop a spectral bundle method. Theapplications that we consider however, do not necessarily exhibit this property.724.1. Derivation of the approach4.1 Derivation of the approachThere are two key theoretical pieces needed for our approach. The first is thederivation of the convex constrained eigenvalue optimization problem (4.2),which we described in §3.1.1 (see equation (3.7)). The second piece is describednext and consists of the derivation of a subproblem that allows recovery of aprimal solution X from a solution of (4.2).4.1.1 Recovering a primal solutionOur derivation of a subproblem for primal recovery proceeds in two stages.The first stage develops necessary and sufficient optimality conditions for theprimal-dual gauge pair (3.5a) and (3.5b). The second stage uses these toderive a subproblem that can be used to recover a primal solution from a dualsolution.Strong duality and optimality conditionsFrom Theorem 2.4.1, the weak gauge duality condition1 ≤ κ(X)κ◦(A∗y) (4.4)holds for all primal-dual feasible pairs (X, y). The following result asserts thatif the pair is optimal, then that inequality must necessarily hold tightly.Proposition 4.1.1 (Strong duality). If (4.1a) is feasible and 0 ≤ < ‖b‖ ,then minX∈Hn‖b−AX‖≤traceX + δ(X | · 0) · infy∈Rm〈y,b〉−‖y‖∗≥1[λ1(A∗y)]+ = 1. (4.5)Proof. We proceed by reasoning about the Lagrangian-dual pair (4.1a) and (4.3).We then translate these results to the corresponding gauge-dual pair (3.5a)and (3.5b).The primal problem (4.1a) is feasible by assumption. Because its Lagrangedual problem (4.3) admits strictly feasible points (e.g., y = 0), it follows fromRockafellar (1970, Theorems 28.2 and 28.4) that the primal problem attainsits positive minimum value and that there is zero duality gap between theLagrange-dual pair (a broadly known fact of satisfying dual Slater conditions).734.1. Derivation of the approachMoreover, because the primal problem (4.1a) attains its positive minimumvalue for some X̂, and there is zero duality gap, there exists a sequence{yj} such that [λ1(A∗yj)]+ ≤ 1 and 〈yj, b〉 − ‖yj‖∗ ↗ trace X̂. Becausetrace X̂ > 0, we can take a subsequence {yjk} for which 〈yjk , b〉 − ‖yjk‖∗ isuniformly bounded above zero. Define the sequence {ŷk} by ŷk := yjk(〈yjk , b〉− ‖yjk‖∗)−1. Then 〈ŷk, b〉− ‖ŷk‖∗ = 1 for all k, which is a feasible sequence forthe gauge dual problem (3.5b). Weak gauge duality (4.4) and the definitionof ŷk then implies that(trace X̂)−1 ≤ [λ1(A∗ŷk)]+ ≤ (〈yjk , b〉 − ‖yjk‖∗)−1 ↘ (trace X̂)−1.Multiply the series of inequalities by trace X̂ to obtain (4.5).Note the lack of symmetry in the statement of Proposition 4.1.1: the pri-mal problem is stated with a “min”, but the dual problem is stated with an“inf”. This is because the dual Slater condition—i.e., strict feasibility of thecorresponding Lagrange-dual problem (4.3)—allows us to assert that a primaloptimal solution necessarily exists. However, we cannot assert in general thata dual optimal solution exists because the corresponding primal feasible setdoes not necessarily satisfy the Slater condition. We comment further on thistheoretical question in the concluding Chapter 6.The following result characterizes gauge primal-dual optimal pairs. It relieson von Neumann’s trace inequality, which says that: for Hermitian A and B,〈A,B〉 ≤ 〈λ(A), λ(B)〉 ,and equality holds if and only if A and B admit a simultaneous ordered eigen-decomposition, i.e., A = U Diag[λ(A)]U∗ and B = U Diag[λ(B)]U∗ for someunitary matrix U ; see Lewis (1996).Proposition 4.1.2 (Optimality conditions). If (4.1a) is feasible and 0 ≤ < ‖b‖ , then (X, y) ∈ Hn × Rm is primal-dual optimal for the gauge dualpair (3.5a) and (3.5b) if and only if the following conditions hold:1. X 0 and ‖b−AX‖ = ;2. 〈y, b〉 − ‖y‖∗ = 1;3. 〈y, b−AX〉 = ‖y‖∗ ‖b−AX‖ ;4. λi(X) · (λ1(A∗y)− λi(A∗y)) = 0, i = 1, . . . , n;744.1. Derivation of the approach5. X and A∗y admit a simultaneous ordered eigendecomposition.Proof. By strong duality (Proposition 4.1.1), the pair (X, y) ∈ Hm × Rm isprimal-dual optimal if and only if they are primal-dual feasible and the productof their corresponding objective values is equal to one. In this case,1 = [traceX + δ(X | · 0)] · [λ1(A∗y)]+ (strong duality)= 〈e, λ(X)〉 · λ1(A∗y)= 〈λ1(A∗y) · e, λ(X)〉≥ 〈λ(A∗y), λ(X)〉 (λ1(A∗y) ≥ λi(A∗y) and X 0)≥ 〈A∗y,X〉 (von Neumann’s trace inequality)= 〈y,AX〉= 〈y, b〉 − 〈y, b−AX〉≥ 〈y, b〉 − ‖y‖∗ ‖b−AX‖ (Cauchy-Schwartz inequality)≥ 〈y, b〉 − ‖y‖∗ (primal feasibility)≥ 1. (dual feasibility)Thus all of the above inequalities hold with equality. This proves conditions1–4. Condition 5 follows from again invoking von Neumann’s trace inequal-ity and noting its implication that X and A∗y share a simultaneous orderedeigenvalue decomposition. Sufficiency of those conditions can be verified bysimply following the reverse chain of reasoning and again noticing that theinequalities can be replaced by equalities.4.1.2 Primal recovery subproblemThe optimality conditions stated in Proposition 4.1.2 furnish the means forderiving a subproblem that can be used to recover a primal solution from adual solution. The next result establishes an explicit relationship betweenprimal solutions X and A∗y for an arbitrary optimal dual solution y.Corollary 4.1.1. Suppose that the conditions of Proposition 4.1.2 hold.Let y ∈ Rm be an arbitrary optimal solution for the dual gauge pro-gram (3.5b), r1 ∈ {1, . . . , n} be the multiplicity of λ1(A∗y), and U1 ∈ Cn×r1be the matrix formed by the first r1 eigenvectors of A∗y. Then a matrixX ∈ Hn is a solution for the primal problem (3.5a) if and only if there754.1. Derivation of the approachexists an r1 × r1 matrix S 0 such thatX = U1SU∗1 and (b−AX) ∈ ∂ ‖·‖∗ (y). (4.6)Proof. The assumptions imply that the optimal dual value is positive. If y ∈Rm is an optimal solution to (3.5b), the positive-homogeneity of its objectiveand constraint, and the positivity of the optimal value, allow us to deduce thatthe dual constraint must be active, i.e., 〈y, b〉 − ‖y‖∗ = 1. Thus condition 2of Proposition 4.1.2 holds.The construction of X in (4.6) guarantees that it shares a simultaneousordered eigendecomposition with A∗y, and that it has rank of r1 at most.Thus, conditions 4 and 5 of Proposition 4.1.2 hold.We now show that conditions 1 and 3 of the proposition hold. The sub-differential ∂ ‖·‖∗ corresponds to the set of maximizers of the linear func-tion that defines the dual norm; i.e., ‖·‖∗ := max‖z‖≤1R 〈·, z〉 . Then because(b−AX) ∈ ∂ ‖·‖∗ (y), it holds that ‖b−AX‖ ≤ and ‖y‖∗ = 〈y, b−AX〉 ≤‖y‖∗ ‖b−AX‖ ≤ ‖y‖∗ , implying that ‖b−AX‖ = and 〈y, b−AX〉 =‖y‖∗ ‖b−AX‖ . This way, condition 1 and 3 of Proposition 4.1.2 are also sat-isfied. Hence, all the conditions of the proposition are satisfied, and the pair(X, y) ∈ Hn × Rm is optimal.Suppose now that X ∈ Hn is optimal for (3.5a). We can invoke Propo-sition 4.1.2 on the pair (X, y) ∈ Hn × Rm. Condition 4 implies that anyeigenvector of A∗y associated to an eigenvalue λi(A∗y) with i > r1 is in thenullspace ofX, therefore there is an r1×r1 matrix S 0 such thatX = U1SU∗1 .Conditions 1 and 3 imply that ‖b−AX‖ ≤ and 〈y, b−AX〉 = ‖y‖∗ , thusverifying that (b−AX) ∈ ∂ ‖·‖∗ (y), as required.Corollary 4.1.1 thus provides us with a way to recover a solution to ourmodel problem (4.1a) after computing a solution to the gauge dual prob-lem (3.5a). When the residual in (4.1a) is measured in the 2-norm, condi-tion (4.6) simplifies, and implies that the matrix S that defines X = USU∗can be obtained by solvingminimizeS0‖A(U1SU∗1 )− b‖22 , with b := b− y/ ‖y‖ . (4.7)When the multiplicity r1 of the eigenvalue λ1(A∗y) is much smaller than n, thisoptimization problem is relatively inexpensive. In particular, if r1 = 1—whichmay be expected in some matrix-lifted applications such as PhaseLift—theoptimization problem is over a scalar s that can be obtained immediately ass = [〈A(u1u∗1), b〉]+/ ‖A(u1u∗1)‖22764.2. The implementation of a proof of concept solverwhere u1 is a rightmost eigenvector of A∗y. This approach exploits the comple-mentarity relation on eigenvalues in condition 4 of Proposition 4.1.2 to reducethe dimensionality of the primal solution recovery. Its computational diffi-culty effectively depends on finding a dual solution y at which the rightmosteigenvalue has low multiplicity r1.4.2 The implementation of a proof of conceptsolverThe effectiveness of our approach hinges on efficiently solving the constrainedeigenvalue optimization problem (4.2) in order to generate solution estimatesy and rightmost eigenvector estimates U1 of A∗y that we can feed to (4.7). Thetwo main properties of this problem that drive our approach are that it has anonsmooth objective and that projections on the feasible set are inexpensive.Our implementation is based on a basic projected-subgradient descent method,although certainly other choices are available. For example, Nesterov (2009)and Richtárik (2011) propose specialized algorithms for minimizing positivelyhomogeneous functions with affine constraints; some modification of this ap-proach could possibly apply to (4.3). Another possible choice is Helmbergand Rendl’s (2000) spectral bundle method. For simplicity, and because ithas proven sufficient for our needs, we use a standard projected subgradientmethod as described below.4.2.1 Dual descentThe generic projected subgradient method is based on the iterationy+ = P(y − αg), (4.8)where g is a subgradient of the objective at the current iterate y, α is a positivesteplength, and the operator P : Rm → Rm gives the Euclidean projectiononto the feasible set. For the objective function f(y) = λ1(A∗y) of (3.7), thesubdifferential has the form∂f(y) = {A(U1TU∗1 ) | T 0, traceT = 1 } , (4.9)where U1 is the n × r1 matrix of rightmost eigenvectors of A∗y (Overton,1992, Theorem 3). A Krylov-based eigenvalue solver can be used to evaluatef(y) and a subgradient g ∈ ∂f(y). Such methods require products of the774.2. The implementation of a proof of concept solverform (A∗y)v for arbitrary vectors v. In many cases, these products can becomputed without explicitly forming the matrix A∗y. In particular, for theapplications described in §1.1, these products can be computed entirely usingfast operators such as the FFT and the fast discrete Wavelet transform (DWT),with essentially O(n log n) arithmetic operations. Similar efficiencies can beused to compute a subgradient g from the forward map A(U1TU∗1 ).For large problems, further efficiencies can be obtained simply by comput-ing a single eigenvector u1, i.e., any unit-norm vector in the range of U1. Inour implementation, we typically request at least two rightmost eigenpairs:this gives us an opportunity to detect if the leading eigenpair is isolated. If itis, then the subdifferential contains only a single element, which implies thatf is in fact differentiable at that point.Any sequence of step lengths {αk} that satisfies the generic conditionslimk→∞αk = 0,∞∑k=0αk =∞is sufficient to guarantee that the value of the objective at yk converges tothe optimal value (Bertsekas, 2015, Proposition 3.2.6). A typical choice isαk = O(1/k). Our implementation defaults to a Barzilai-Borwein steplength(Barzilai and Borwein, 1988) with a nonmonotonic linesearch (Zhang andHager, 2004) if it is detected that a sequence of iterates is differentiable (byobserving separation of the leading eigenpair); and otherwise it falls back to adecreasing step size.The projection operator P onto the dual-feasible set (3.7) is inexpensivewhen the residual is measured in the 2-norm. In particular, if = 0, the dual-feasible set is a halfspace, and the projection can be accomplished in lineartime. When is positive, the projection requires computing the roots of a1-dimensional degree-4 polynomial, which in practice requires little additionalexpense. In the following, we describe the approach we use for implementingthe projection operator.Computing the projection onto the gauge dual feasible setIn order to compute the projection onto the gauge dual feasible set, we needto be able to solve the following optimization problem:minimizey∈Cm12‖y − yˆ‖22 subject to R 〈b, y〉 − ‖y‖2 ≥ 1, (4.10)where yˆ ∈ Cm is arbitrary and ‖b‖2 > , otherwise the feasible set is empty.The following result allows for a concrete way to compute this projection by784.2. The implementation of a proof of concept solversolving for the roots of a quartic polynomial and then picking one satisfying afew simple conditions.Proposition 4.2.1. Given b ∈ Cm \ {0} and ∈ [0, ‖b‖2), there is a uniquey ∈ Cm that solves (4.10). Moreover, it is characterized byy ={yˆ if R 〈b, yˆ〉 − ‖yˆ‖2 ≥ 1;(yˆ + λb)(1− λ‖yˆ+λb‖2)otherwise;where λ > 0 is a positive real root of the degree-4 polynomialp(λ) = ‖yˆ + λb‖22[R 〈b, yˆ + λb〉+ λ2 − 1]2−2 [‖yˆ + λb‖2 + λR 〈b, yˆ + λb〉]2which satisfies‖yˆ + λb‖2[R 〈b, yˆ + λb〉+ λ2 − 1] = [‖yˆ + λb‖2 + λR 〈b, yˆ + λb〉]and for which ‖yˆ + λb‖2 − λ is positive and smallest.Proof. Existence and uniqueness follow from the fact that the feasible set isnonempty, closed and convex, and from the strong convexity of the objective.It is clear that if yˆ is feasible, it is also the solution. This way we can focus onthe second part of the characterization. Defining h(·) := 1−R 〈b, ·〉+ ‖·‖2 , wehave that ∂h(y) ={ y‖y‖2 − b}63 0. And the necessary and sufficient first-orderoptimality conditions tell us we need to find λ > 0 such thaty − yˆ + λ(y‖y‖2− b)= 0 and R 〈b, y〉 − ‖y‖2 = 1.Manipulating the first condition, we have thaty(1 +λ‖y‖2)= yˆ + λb =⇒ ‖y‖2 = ‖yˆ + λb‖2 − λ=⇒ y = (yˆ + λb)(1− λ‖yˆ + λb‖2),which we can then substitute into the second condition, resulting in‖yˆ + λb‖2[R 〈b, yˆ + λb〉+ λ2 − 1] = [‖yˆ + λb‖2 + λR 〈b, yˆ + λb〉] .The result follows by squaring both sides and noticing that λ must then satisfyp(λ) = 0.794.2. The implementation of a proof of concept solverOur implementation is motivated by the approach outlined above as itleads to a simple and inexpensive enough Matlab implementation. An alter-native approach to compute λ can be devised by deriving the Lagrangian dualproblem to (4.10) and then employing an efficient iterative method to solvethe resulting one-dimensional problem. This might lead to improved robust-ness and accuracy, but we have not identified a particular need for that in thecourse of our numerical experimentations.4.2.2 Primal recoveryAt each iteration of the descent method (4.8) for the constrained eigenvalueoptimization problem (4.2), we compute a corresponding primal estimateX+ = U1S+U∗1 (4.11)maintained in factored form. The matrix U1 has already been computed inthe evaluation of the objective and its subgradient; see (4.9). The positivesemidefinite matrix S+ is the solution of the primal-recovery problem (4.7).A byproduct of the primal-recovery problem is that it provides a suitablestopping criterion for the overall algorithm. Because the iterations yk are dualfeasible, it follows from Corollary 4.1.1 that if (4.7) has a zero residual, thenthe dual iterate yk and the corresponding primal iterate Xk are optimal. Thus,we use the size of the residual to determine a stopping test for approximateoptimality.4.2.3 Primal-dual refinementThe primal-recovery procedure outlined in §4.2.2 is used as a stopping criterionand to provide primal solutions, it does not directly affect the sequence of dualiterates from (4.8). In our numerical experiments, we find that signficant gainscan be had by refining the primal estimate (4.11) and feeding it back into thedual sequence. We use the following procedure, which involves two auxiliarysubproblems that add relatively little to the overall cost.Inspired by the Wirtinger Flow algorithm described in §§1.1.1, the firststep is to refine the primal estimate obtained via (4.7) by using its solutionto determine the starting point Z0 = U1S1/2+ for the smooth unconstrainednon-convex problemminimizeZ∈Cn×rh(Z) := 14‖A(ZZ∗)− b‖22 . (4.12)804.3. ExtensionsIn effect, we continue to minimize (4.7), where additionally U1 is allowed tovary. Several options are available for solving this smooth unconstrained prob-lem. Our implementation has the option of using a steepest-descent iterationwith a spectral steplength and non-monotone linesearch (Zhang and Hager,2004), or a limited-memory BFGS method (Nocedal and Wright, 2006, §7.2).The main cost at each iteration is the evaluation of the gradient∇h(Z) = A∗(A(ZZ∗)− b)Z. (4.13)We thus obtain a candidate improved primal estimate X̂ = ẐẐ∗, where Ẑis a solution of (4.12). When = 0, this non-convex problem coincides withthe problem used by Candès et al. (2015a) in their Wirtinger Flow algorithm.They use the initialization Z0 = γu1, where u1 is a leading eigenvector of A∗b,and γ = n∑i bi/∑i ‖ai‖22. Our initialization, on the other hand, is based ona solution of the primal-recovery problem (4.7).The second step of the refinement procedure is to construct a candidatedual estimate ŷ from a solution of the constrained linear-least-squares problemminimizey∈Rm12∥∥∥(A∗y)Ẑ − λ̂Ẑ∥∥∥22subject to 〈y, b〉 − ‖y‖∗ ≥ 1, (4.14)where λ̂ := 1/ trace X̂ ≡ 1/∥∥∥Ẑ∥∥∥22is the reciprocal of the primal objective valueassociated with X̂. This constrained linear-least-squares problem attempts toconstruct a vector ŷ such that the columns of Ẑ correspond to eigenvectorsof A∗ŷ associated with λ̂. If f(ŷ) < f(y+), then ŷ improves on the currentdual iterate y+ obtained by the descent method (4.8), and we are free to useŷ in its place. This improved estimate, which is exogenous to the dual descentmethod, can be considered a “spacer” iterate, as described by Bertsekas (1999,Proposition 1.2.6). Importantly, it does not interfere with the convergence ofthe underlying descent method. The projected-descent method used to solvethe dual problem and generate the dual sequence can also be applied to (4.14),though in this case the objective is guaranteed to be differentiable.4.3 ExtensionsThe formulations (4.1) that we have considered so far are stated in their sim-plest form. Reweighted formulations, as introduced by Mohan and Fazel (2010)and Candès et al. (2013a), are also useful, and are accommodated by ourapproach. In the next subsections we provide alternative derivations of thecorresponding gauge duals for these weighted formulations and specialize theprimal-from-dual recovery conditions from Corollary 4.1.1.814.3. Extensions4.3.1 Weighted trace minimizationCandès, Eldar, Strohmer, and Voroninski (2013a) show that an iterativelyreweighted sequence of trace minimization problems of the formminimizeX∈Hn〈W−1, X〉subject to ‖b−AX‖ ≤ , X 0, (4.15)whereW 0, can improve the range of signals that can be recovered using thePhaseLift relaxation. Each problem in the sequence uses the previous solutionto derive the next weight matrix W , which is a low-rank update to a smallpositive multiple of the identity.Define the mapsW(·) := W 12 (·)W 12 and AW := A ◦W .It is evident that X 0 if and only ifW−1(X) 0, and so the weighted traceminimization problem can be stated equivalently asminimizeX∈HntraceW−1(X) subject to ∥∥b−AW (W−1(X))∥∥ ≤ , W−1(X) 0.Because W is a bijection, we can optimize over X̂ :=W−1(X) instead of X:minimizeX̂∈Hntrace X̂ subject to∥∥∥b−AW X̂∥∥∥ ≤ , X̂ 0. (4.16)This clearly falls within the structure of (4.1a), and has the correspondinggauge dualminimizey∈Rm[λ1(A∗Wy)]+ subject to 〈y, b〉 − ‖y‖∗ ≥ 1. (4.17)Observing that λ1(A∗Wy) = λ1(W12 (A∗y)W 12 ) = λ1(A∗y,W−1), the dual gaugeproblem corresponding to (4.15) has the formminimizey∈Rm[λ1(A∗y,W−1)]+ subject to 〈y, b〉 − ‖y‖∗ ≥ 1. (4.18)This shows that the introduction of a weighting matrix that is not a simplemultiple of the identity leads to a dual gauge problem involving the minimiza-tion of the rightmost generalized eigenvalue of A∗y with respect to that weight.Now that we have a formulation for the gauge dual problem, we focus on howa primal solution to the original weighted trace minimization can be computedgiven a dual minimizer. The following result extends Corollary 4.1.1.824.3. ExtensionsCorollary 4.3.1. Suppose that problem (4.15) is feasible and 0 ≤ < ‖b‖ .Let y ∈ Rm be an arbitrary optimal solution for the dual gauge (4.18),r1 ∈ {1, . . . , n} be the multiplicity of λ1(A∗y,W−1), and U1 ∈ Cn×r1 be thematrix formed by the first r1 generalized eigenvectors of A∗y with respectto W−1. Then X ∈ Hn is a solution for the primal problem (4.15) if andonly if there exists S 0 such thatX = U1SU∗1 and (b−AX) ∈ ∂ ‖·‖∗ (y).Proof. A solution for (4.18) is clearly a solution for (4.17). We may thus invokeCorollary 4.1.1 and assert that X̂ ∈ Hn is a solution for (4.16) iff there is S 0such that X̂ = Uˆ1SUˆ∗1 and (b − AW X̂) ∈ ∂ ‖·‖∗ (y), where Uˆ1 ∈ Cn×r1 is amatrix formed by the first r1 eigenvectors of A∗Wy = W12 (A∗y)W 12 . From thestructure of W , we have that X is a solution to (4.15) iff X = W(X̂). Thus,X = W12 Uˆ1SUˆ∗1W12 = U1SU∗1 , where U1 := W12 Uˆ1 corresponds to the first r1generalized eigenvectors of A∗y with respect to W−1.Once again, this provides us with a way to recover a solution to the weightedtrace minimization problem by computing a solution to the gauge dual prob-lem (now involving the rightmost generalized eigenvalue) and then solving aproblem of potentially much reduced dimensionality.4.3.2 Weighted affine nuclear-norm minimizationWe can similarly extend the reweighted extension to the non-symmetric case (4.1b).Let W1 ∈ Hn1 and W2 ∈ Hn2 be invertible. The weighted nuclear-norm mini-mization problem becomesminimizeX∈Cn1×n2∥∥W−11 XW−∗2 ∥∥1 subject to ‖b−AX‖ ≤ . (4.19)Define the weighted quantitiesW(·) = W1(·)W ∗2 : Cn1×n2 → Cn1×n2 , AW = A ◦W , andX̂ :=W−1(X).The weighted problem can then be stated equivalently asminimizeX̂∈Cn1×n2∥∥∥X̂∥∥∥1subject to∥∥∥b−AW X̂∥∥∥ ≤ ,834.3. Extensionswhich, following the approach introduced in Fazel (2002), can be embedded ina symmetric problem:minimizeÛ∈Hn1V̂ ∈Hn2X̂∈Cn1×n2〈12I,(Û X̂X̂∗ V̂)〉subject to(Û X̂X̂∗ V̂) 0 and∥∥∥b−AW X̂∥∥∥ ≤ .(4.20)Define the measurement operator fromM : Hn1+n2 → Cm by the map(Û X̂X̂∗ V̂)7→ AW X̂,and identify Cm with R2m as a real inner-product space. The adjoint of themeasurement operator is then given byM∗y =(0 A∗Wy(A∗Wy)∗ 0),where A∗Wy = 12∑mi=1W1AiW∗2 yi. We can now state the gauge dual problem:minimizey∈Cm[λ1(M∗y, 12I)]+subject to R 〈y, b〉 − ‖y‖∗ ≥ 1. (4.21)Observe the identityλ1(M∗y, 12I)= λ1(2M∗y)=[λ1(0∑mi=1W1AiW∗2 yi(∑mi=1W1AiW∗2 yi)∗ 0)]+= [‖W1(A∗y)W ∗2 ‖∞]+ = ‖W1(A∗y)W ∗2 ‖∞ .We can now deduce the simplified form for the gauge dual problem:minimizey∈Cm‖W1(A∗y)W ∗2 ‖∞ subject to R 〈y, b〉 − ‖y‖∗ ≥ 1. (4.22)This weighted gauge dual problem can be derived from first principlesusing the tools from §3.1.1 by observing that the primal problem is alreadyin standard gauge form (c.f. §3.2.1). We present this alternative approach,844.3. Extensionshowever, to make explicit the close connection between the (weighted) nuclear-norm minimization problem and the (weighted) trace-minimization problemdescribed in §4.3.1.The following result provides a way to characterize solutions of the nuclearnorm minimization problem when a solution to the dual gauge problem isavailable.Corollary 4.3.2. Suppose that problem (4.19) is feasible and 0 ≤ < ‖b‖.Let y ∈ Cm be an arbitrary optimal solution for the dual gauge prob-lem (4.22), r1 ∈ {1, . . . , n} be the multiplicity of σ1(W1(A∗y)W ∗2 ), U1 ∈Cn1×r1 and V1 ∈ Cn2×r1 be the matrices formed by the first r1 left- andright-singular-vectors of W1(A∗y)W ∗2 , respectively. Then X ∈ Cn1×n2 is asolution for the primal problem (4.19) if and only if there exists S 0 suchthat X = (W1U1)S(W2V1)∗ and (b−AX) ∈ ∂ ‖·‖∗ (y).Proof. A solution for (4.22) is clearly a solution for (4.21); this way we in-voke Corollary 4.1.1 and have that (Û , V̂ , X̂) ∈ Hn1 × Hn2 × Cn1×n2 in-duce a solution for (4.20) iff there is S 0 such that X̂ = Û1SV̂ ∗1 and(b − AW X̂) ∈ ∂ ‖·‖∗ (y), where Û1 ∈ Cn1×r1 and V̂1 ∈ Cn2×r1 are matricesformed by the first r1 left- and right-singular-vectors of A∗Wy = W1(A∗y)W ∗2 .From the structure ofW , we have that X is a solution to (4.19) iff X =W(X̂).This way, X = (W1Û1)S(W2V̂1)∗.85Chapter 5Numerical experimentsThis chapter reports on a set of numerical experiments for solving instances ofthe phase retrieval and blind deconvolution problems described in Chapter 1.The various algorithmic pieces described in §4.2 have been implemented as aMatlab software package. The implementation uses Matlab’s eigs routinefor the eigenvalue computations described in §4.2.1, thus only accessing A∗yvia its product with a given vector and not requiring to explicitly form a largedense matrix. We implemented a projected gradient-descent method, which isthen used for solving (4.8), (4.12), and (4.14).5.1 Phase recoveryWe conduct three experiments for phase retrieval via the PhaseLift formula-tion. The first experiment is for a large collection of small 1-dimensional ran-dom signals, and is meant to contrast our approach against a general-purposeconvex optimization algorithm and a specialized non-convex approach. Thesecond experiment tests problems where the vector of observations b is con-taminated by noise, hence testing the case where > 0. The third experimenttests the scalability of the approach on a large 2-dimensional natural image.Our phase retrieval experiments follow the approach outlined in Candèset al. (2015a). The diagonal matrices Ck ∈ Cn×n encode diffraction patternsthat correspond to the kth “mask” (k = 1, . . . , L) through which a signalx0 ∈ Cn is measured. The measurements are given byb = A(x0x∗0) := diagFC1...FCL (x0x∗0)FC1...FCL∗ ,where F is the unitary discrete Fourier transform (DFT). The adjoint of theassociated linear map A is thenA∗y :=L∑k=1C∗kF∗Diag(yk)FCk,865.1. Phase recoverywhere y = (y1, . . . , yL) and Diag(yk) is the diagonal matrix formed from thevector yk ∈ Rn. The main cost in the evaluation of the forward map A(V V ∗)involves L applications of the DFT for each column of V . Each evaluation ofthe adjoint map applied to a vector v—i.e., (A∗y)v—requires L applications ofboth the DFT and its inverse. In the experimental results reported below, thecolumns labeled “nDFT” indicate the total number of DFT evaluations usedover the course of a run. The costs of these DFT evaluations are invariantacross the different algorithms, and dominate the overall computation.5.1.1 Random Gaussian signalsIn this section we consider a set of experiments for different numbers of masks.For each value of L = 6, 7, . . . , 12, we generate a fixed set of 100 randomcomplex Gaussian vectors x0 of length n = 128, and a set of L random complexGaussian masks Ck.Table 5.1 summarizes the results of applying four different solvers to eachset of 100 problems. The solver GAUGE is our implementation of the approachdescribed in §4.2; TFOCS (Becker et al., 2011) is a first-order conic solver appliedto the primal problem (4.1a). The version used here was modified to avoidexplicitly forming the matrix A∗y (Strohmer, 2013). The algorithm WFLOW(Candès et al., 2015a) is a non-convex approach that attempts to recover theoriginal signal directly from the feasibility problem (4.12), with = 0. Tomake sensible performance comparisons to WFLOW, we add to its implemen-tation a stopping test based on the norm of the gradient (4.13); the defaultalgorithm otherwise uses a fixed number of iterations. We also show the re-sults of applying the GAUGE code in a “feasibility” mode that exits with thefirst candidate primal-feasible solution; this is the solver labeled GAUGE-feas.This derivative of GAUGE is in some respects akin to WFLOW, with the maindifference that GAUGE-feas uses starting points generated by the dual-descentestimates, and generates search directions and step-lengths for the feasibilityproblem from a spectral gradient algorithm. The columns labeled “xErr” re-port the median relative error ‖x0x∗0 − x̂x̂∗‖F / ‖x0‖22 of the 100 runs, wherex̂ is the solution returned by the corresponding solver. The columns labeled“%” give the percentage of problems solved to within a relative error of 10−2.This column is excluded for GAUGE and GAUGE-feas because these solvers ob-tained the prescribed accuracy for all problems in each test set. At least onthis set of artificial experiments, the GAUGE solver (and its feasibility variantGAUGE-feas) appear to be most efficient.875.1. Phase recoveryTable 5.1: Phase retrieval comparisons for random complex Gaussian signalsof size n = 128 measured using random complex Gaussian masks. Numbersof the form n−e are a shorthand for n · 10−e.GAUGE TFOCS GAUGE-feas WFLOWL nDFT xErr nDFT xErr % nDFT xErr nDFT xErr %12 18,330 1.6−6 2,341,800 3.6−3 100 3,528 1.3−6 5,232 1.2−5 10011 19,256 1.5−6 2,427,546 4.3−3 100 3,344 1.4−6 4,906 1.6−5 10010 19,045 1.4−6 2,857,650 5.5−3 100 3,120 1.6−6 4,620 2.1−5 1009 21,933 1.6−6 1.2 · 107 7.5−3 89 2,889 1.4−6 4,374 2.5−5 1008 23,144 2.1−6 1.1 · 107 1.2−2 22 2,688 1.9−6 4,080 3.3−5 1007 25,781 1.8−6 6,853,245 2.4−2 0 2,492 2.0−6 3,836 5.2−5 956 34,689 3.0−6 2,664,126 6.4−2 0 2,424 2.5−6 3,954 9.5−5 625.1.2 Random problems with noiseIn this set of experiments, we assess the effectiveness of the gauge-based solversto problems with > 0, which could be useful in recovering signals with noise.For this purpose, it is convenient to generate problems instances with noise andknown primal-dual solutions, which we can do by using Corollary 4.1.1. Eachinstance is generated by first sampling octanary masks Ck—as described byCandès et al. (2015a)—and real Gaussian y ∈ Rm; a solution x0 ∈ Cn is thenchosen as a unit-norm rightmost eigenvector of A∗y, and the measurements arecomputed as b := A(x0x∗0)+y/ ‖y‖ , where is chosen as := ‖b−A(x0x∗0)‖ =η ‖b‖ , for a given noise-level parameter 0 < η < 1.For these experiments, we generate 100 instances with n = 128 for eachpairwise combination (L, η) withL ∈ {6, 9, 12} and η ∈ {0.1%, 0.5%, 1%, 5%, 10%, 50%}.Table 5.2 summarizes the results of applying the solvers GAUGE, GAUGE-feas,and WFLOW to these problems. It is not clear that GAUGE-feas and WFLOWare relevant for this experiment, since they do not directly attempt to solvethe norm-constrained PSD trace minimization problem, but for interest weinclude them in the results. GAUGE is generally successful in recovering therank-1 minimizer for most problems—even for cases with significant noise,though in these cases the overall cost increases considerably.885.1. Phase recoveryTable 5.2: Phase retrieval comparisons for problems with noise, i.e., > 0.Numbers of the form n−e are a shorthand for n · 10−e.GAUGE GAUGE-feas WFLOWL η nDFT xErr % nDFT xErr % nDFT xErr %12 0.1% 29,988 2.9−3 100 936 3.4−3 100 14,856 7.1−4 1009 0.1% 36,292 2.7−3 100 774 2.9−3 100 11,511 7.7−4 1006 0.1% 50,235 2.9−3 100 612 3.2−3 100 8,922 1.2−3 9812 0.5% 27,252 4.9−3 100 936 4.6−3 100 14,808 2.5−3 1009 0.5% 31,032 5.0−3 100 774 4.4−3 100 11,430 3.5−3 1006 0.5% 38,766 5.2−3 100 606 5.9−3 100 8,790 5.7−3 9812 1.0% 22,620 8.5−3 97 936 7.2−3 100 14,712 5.1−3 1009 1.0% 24,813 8.5−3 96 774 7.3−3 100 11,331 7.1−3 996 1.0% 2 · 105 7.1−3 98 600 9.9−3 53 8,634 1.1−2 812 5.0% 9 · 105 4.7−3 90 936 3.3−2 0 14,148 2.6−2 09 5.0% 8 · 105 4.2−3 90 774 3.4−2 0 10,701 3.5−2 06 5.0% 5 · 105 4.8−3 98 600 4.4−2 0 7,728 5.4−2 012 10.0% 1 · 106 4.1−3 89 912 6.7−2 0 13,548 5.3−2 09 10.0% 7 · 105 3.6−3 91 765 6.7−2 0 10,125 7.1−2 06 10.0% 5 · 105 3.2−3 100 588 8.2−2 0 7,098 1.1−1 012 50.0% 5 · 105 2.7−3 94 888 3.7−1 0 11,424 3.3−1 09 50.0% 3 · 105 2.0−3 99 738 3.5−1 0 8,586 4.3−1 06 50.0% 2 · 105 2.4−3 95 588 3.7−1 0 7,176 6.3−1 05.1.3 Two-dimensional signalWe conduct a third experiment on a stylized application in order to assess thescalability of the approach to larger problem sizes. In this case the measuredsignal x0 is a two-dimensional real-valued image of size 1600 × 1350 pixels,a grayscale version of the image shown in Figure 5.1, which corresponds ton = 2.16 · 106. The dimension of the ambient space of the primal lifted for-mulation is(n+12)> 2.3 · 1012, which makes it clear that the resulting SDPis enormous, and must be handled by a specialized solver. We have excludedTFOCS from the list of candidate solvers because it cannot make progress onthis example. We generate 10 and 15 octanary masks. Table 5.3 summarizesthe results. The column headers carry the same meaning as Table 5.1.895.1. Phase recoveryFigure 5.1: Image used for phase retrieval experiment; size 1600× 1350 pixels(7.5MB).Table 5.3: Phase retrieval comparisons on a 2-dimensional image.GAUGE GAUGE-feas WFLOWL nDFT xErr nDFT xErr nDFT xErr15 200,835 2.1 · 10−6 5,700 2.1 · 10−6 8,100 4.1 · 10−610 195,210 5.8 · 10−7 12,280 9.1 · 10−7 12,340 2.1 · 10−5905.2. Blind deconvolution5.2 Blind deconvolutionIn this blind deconvolution experiment, the convolution of two signals s1 ∈ Cmand s2 ∈ Cm are measured. Let B1 ∈ Cm×n1 and B2 ∈ Cm×n2 be two bases.The circular convolution of the signals can be described byb = s1 ∗ s2 = (B1x1) ∗ (B2x2)= F−1 diag((FB1x1)(FB2x2)T)= F−1 diag((FB1)(x1x∗2)(FB2)∗) =: A(x1x∗2),where A is the corresponding asymmetric linear map with the adjointA∗y := (FB1)∗Diag(Fy)(FB2).Because F is unitary, it is possible to work instead with measurementsb̂ ≡ Fb = diag ((FB1)(x1x∗2)(FB2)∗)in the Fourier domain. For the experiments that we run, we choose to workwith the former real-valued measurements b because they do not require ac-counting for the imaginary parts, and thus the number of constraints in (3.9)that would be required otherwise is reduced by half.We follow the experimental setup outlined by Ahmed et al. (2014) and usethe data and code that they provide. In that setup, B1 is a subset of theHaar wavelet basis, and B2 is a mask corresponding to a subset of the identitybasis. The top row of Figure 5.2 shows the original image, a depiction of themotion blur kernel, and the observed blurred image. The second row of thefigure shows the image reconstructed using the augmented Lagrangian codeprovided by Ahmed et al. (2014), GAUGE, and GAUGE-feas. Table 5.4 summa-rizes the results of applying the three solvers. The columns headed “nDFT”and “nDWT” count the number of discrete Fourier and wavelet transforms re-quired by each solver; the columns headed “xErr1” and “xErr2” report the rela-tive errors ‖xi − x̂i‖2 / ‖xi‖2, i = 1, 2, where x̂i are the recovered solutions; thecolumn headed “rErr” reports the relative residual error∥∥∥b−A(x̂1x̂∗2)∥∥∥2/ ‖b‖2.Although the non-convex augmented Lagrangian approach yields visibly bet-ter recovery of the image, the table reveals that the solutions are numericallysimilar, and are recovered with far less work.It is noteworthy that this scenario is extremely limited in the context ofblind deconvolution problems. Our aim with its setup is to compare the com-putational approach described in Chapter 4 against the augmented Lagrangianmethod used by Ahmed et al. (2014) for the numerical solution of the nuclearnorm minimization problem arising from relaxing the lifted formulation pre-sented in §1.1.2.915.2. Blind deconvolution(a) (b) (c)(d) (e) (f)Figure 5.2: Images used for the blind deconvolution experiments: (a) originalimage; (b) zoom of the motion-blurring kernel; (c) blurred image; (d) imagerecovered by the augmented Lagrangian approach; (e) image recovered byGAUGE; (f) image recovered by GAUGE-feas. The shaded background on therecovered images is an artifact of the uniform scaling used to highlight anyerror between the original and recovered signals.Table 5.4: Blind deconvolution comparisons.solver nDFT nDWT xErr1 xErr2 rErraugmented Lagrangian 92,224 76,872 7.9 · 10−2 5.0 · 10−1 1.4 · 10−4GAUGE 17,432 8,374 1.9 · 10−2 5.4 · 10−1 3.8 · 10−4GAUGE-feas 4,128 2,062 8.4 · 10−2 5.5 · 10−1 4.0 · 10−492Chapter 6ConclusionsThe reconstruction of low-complexity signals from given data, under differ-ent measurement and noise models comprises an important field of researchthat generalizes many recent developments in compressed sensing, and hasan impact on a broad range of applications. It encompasses different notionsof complexity—e.g., sparsity, low rank, atomic norms—, reconstruction algo-rithms and the theoretical conditions under which these achieve recoverability,either exactly or approximately. Recent approaches to these problems rely onconvexifying a suitable, but hard, optimization reformulation and providingconditions under which, for certain measurement and noise models, the so-lution of the resulting convex optimization problem will very likely yield asolution of the original problem.This thesis focused on the practical aspect of providing an effective ap-proach to numerically solve the large spectral optimization problems arisingfrom the matrix lifting convex relaxation approach for phase recovery (Candèset al., 2013b) and blind deconvolution (Ahmed et al., 2014). By identifyingboth problems as particular cases of a class of gauge optimization problems,via suitable reformulations, we were able to abstract some of the structureand contribute results useful to our approach while expanding the theoreticalframework of gauge optimization.In the following paragraphs, we discuss our salient developments whilepresenting some interesting opportunities for further investigation.A theoretical framework based on gauge duality. Chapter 2 focusedon presenting and extending the duality theory of gauge optimization to aparticular—but still rather broad—class of problems, and on providing the-oretical tools to simplify the manipulations commonly involved in modelingproblems that fit this framework. Some results for dealing with antipolar setswere presented in §2.2 which can be invoked in the derivation of dual problems.The structure particular to gauge optimization allows for an alternative tothe usual Lagrange duality (a parallel explored in §2.3), which was later usefulfor providing an avenue of exploration for modeling and algorithm develop-ment. Depending on the particular application, it may prove computationally936. Conclusionsconvenient or more efficient to use existing algorithms to solve the gauge dualrather than the Lagrange dual problem. For example, a variation of the pro-jected subgradient method was used to exploit the relative simplicity of thegauge dual constraints in the eigenvalue optimization problem (3.7). As withmethods that solve the Lagrange dual problem, a procedure would be neededto recover the primal solution. Although this is difficult to do in general, forspecific problems it is possible to develop a primal-from-dual recovery proce-dure via the optimality conditions, an approach undertaken later in Chapter 4.More generally, an important question left unanswered is if there exists aclass of algorithms that can leverage this special structure. It is intriguingthe possibility of developing a primal-dual algorithm specialized to the gauge-constrained primal-dual gauge pair.The sensitivity analysis presented in §2.5 relied on existing results fromLagrange duality. It would be preferable, however, to develop a line of analysisthat is self-contained and based entirely on gauge duality theory and someform of “gauge multipliers”. In this regard, if we define the value function asv˜(b, σ) = infx {κ(x) + δepi ρ(b− Ax, σ) }, then v˜ is a gauge. It is conceivablethat sensitivity analysis could be carried out based on studying its polar, givenbyv˜◦(y, τ) = inf {µ ≥ 0 | (y, τ) ∈ µD } = κ◦(A∗y) + δρ◦(·)≤·(y,−τ),where D = { (y, τ) | κ◦(A∗y) ≤ 1, ρ◦(y) ≤ −τ }. This formula follows fromProposition 2.1.2(iv) and a computation of v˜∗ similar to the one leading to(2.27). This approach would be in contrast to the usual sensitivity analysis,which is based on studying a certain (convex) value function and its conjugate.Extensions of the standard primal-dual gauge pair are presented in §2.6.They were motivated by subproblems that show up in iteratively reweightedalgorithms for trace and nuclear-norm minimization. These are further special-ized in Chapter 3 to provide concrete primal-dual pairs for trace minimizationin the PSD cone, connecting our spectral optimization problems to the gaugeframework.An approach to low-rank spectral optimization. One of the criticismsthat have been leveled at relaxation approaches based on matrix lifting is thatthey lead to problems that are too difficult to be useful in practice due tothe quadratic increase in the primal ambient space. This has led to workon non-convex recovery algorithms that may not have as-strong statistical re-covery guarantees, but are nonetheless effective in practice; Netrapalli et al.(2013); Candès et al. (2015a); White et al. (2015). A major motivation for946. Conclusionsthis work was to determine whether it would be possible to develop convexoptimization algorithms that are as efficient as non-convex approaches. Thenumerical experiments on these problems presented in Chapter 5 suggest thatthe gauge-dual approach may prove effective. Indeed, other convex optimiza-tion algorithms may be possible, and clearly the key to their success will beto leverage the special structure of these problems.A theoretical question not addressed here is to delineate conditions underwhich dual attainment will hold. In particular, the conclusion (4.5) of The-orem 4.1.1 is asymmetric: it is possible to assert that, if the primal problemis feasible, a primal point exists that attains the primal optimal value whilesatisfying the constraints (because the Lagrange dual is strictly feasible), butin general one cannot assert that a dual point exists such that it attains thedual optimal value while still being feasible. A related theoretical question isto understand the relationship between the quality of a “near optimal” dualsolution, and the quality of the primal estimate obtained from it by the primalrecovery procedure.During the course of our experiments, we noticed that the rightmost eigen-value of A∗y seems to remain fairly well separated from the others acrossiterations. This appears to contribute to the overall effectiveness of the dual-descent method. Is there a special property of these problems or of the algo-rithm that encourages this separation? It seems likely that there are solutionsy at which the objective is not differentiable, and in that case, one can wonderif there are algorithmic devices that might be used to avoid such points.The dual-descent method used to solve the dual subproblem (cf. §4.2.1) isonly one possible algorithm among many. Other more specialized methods,such as the spectral bundle method of Helmberg and Rendl (2000), its second-order variant (Helmberg, Overton, and Rendl, 2014), or the stochastic-gradientmethod of d’Aspremont and Karoui (2014), may prove effective alternatives.Chapter 3 includes the construction of a family of lower models for the dualgauge objective which might be exploited to construct bundle-type methodsfor its minimization, this seems an interesting avenue for further investigation.It was convenient to embed the nuclear-norm minimization problem (4.1b)in the SDP formulation (4.1a) because it allows us to use the same solver forboth problems. Further efficiencies, however, may be gained by implementinga solver that applied directly to the corresponding gauge dualminimizey∈Cm‖A∗y‖∞ subject to R 〈y, b〉 − ‖y‖∗ ≥ 1.This would require an iterative solver for evaluating leading singular valuesand singular vectors of the non-symmetric operator A∗y, such as PROPACK(Larsen, 2001).95BibliographyA. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convexprogramming. IEEE Transactions on Information Theory, 60(3):1711–1732,2014.A. Y. Aravkin, J. V. Burke, and M. P. Friedlander. Variational properties ofvalue functions. SIAM Journal on Optimization, 23(3):1689–1717, 2013.A. Auslender and M. Teboulle. Asymptotic Cones and Functions in Opti-mization and Variational Inequalities. Springer Monographs in Mathematics.Springer-Verlag, New York, 2003.F. Bach. Learning with submodular functions: A convex optimization per-spective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMAJournal of Numerical Analysis, 8:141–148, 1988.S. Becker, E. Candès, and M. Grant. Templates for convex cone problemswith applications to sparse signal recovery. Mathematical Programming Com-putation, 3:165–218, 2011.A. Berman. Cones, Matrices and Mathematical Programming, volume 79of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag,1973.D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA,second edition, 1999.D. P. Bertsekas. Convex optimization algorithms. Athena Scientific, Belmont,MA, 2015.D. Bertsimas and I. Popescu. Optimal inequalities in probability theory: aconvex optimization approach. SIAM Journal on Optimization, 15(3):780–804, 2005.96BibliographyJ. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal. Numer-ical optimization: Theoretical and practical aspects. Universitext. Springer-Verlag, Berlin, second edition, 2006. ISBN 3-540-35445-X.A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions ofsystems of equations to sparse modeling of signals and images. SIAM Review,51(1):34–81, 2009.S. Burer and R. D. C. Monteiro. A nonlinear programming algorithm forsolving semidefinite programs via low-rank factorization. Mathematical Pro-gramming, 95(2, Series B):329–357, 2003.E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transac-tions on Information Theory, 51(12):4203–4215, December 2005.E. J. Candés, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweightedL1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtingerflow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, April 2015a.E. J. Candès and X. Li. Solving quadratic equations via PhaseLift when thereare about as many equations as unknowns. Foundations of ComputationalMathematics, 14(5):1017–1026, October 2014.E. J. Candès and B. Recht. Exact matrix completion via convex optimization.Foundations of Computational Mathematics, 9(6):717–772, 2009.E. J. Candès and B. Recht. Simple bounds for recovering low-complexitymodels. Mathematical Programming, 141(1-2, Series A):577–589, 2013.E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski. Phase retrievalvia matrix completion. SIAM Journal on Imaging Sciences, 6(1):199–225,2013a.E. J. Candès, T. Strohmer, and V. Voroninski. Phaselift: Exact and sta-ble signal recovery from magnitude measurements via convex programming.Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013b.E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval from codeddiffraction patterns. Applied and Computational Harmonic Analysis, 39(2):277–299, 2015b.97BibliographyA. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011.V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geom-etry of linear inverse problems. Foundations of Computational Mathematics,12(6):805–849, 2012.S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition bybasis pursuit. SIAM Review, 43(1):129–159, 2001.Y. Chen and E. J. Candès. Solving random quadratic systems of equations isnearly as easy as solving linear systems. CoRR, abs/1505.05114, 2015. URLhttp://arxiv.org/abs/1505.05114. (Retrieved on December 16, 2015).R. Correa and C. Lemaréchal. Convergence of some algorithms for convexminimization. Mathematical Programming, 62(2, Series B):261–275, 1993.A. d’Aspremont and N. E. Karoui. A stochastic smoothing algorithm forsemidefinite programming. SIAM Journal on Optimization, 24(3):1138–1177,2014.L. Demanet and P. Hand. Stable optimizationless recovery from phaselesslinear measurements. Journal of Fourier Analysis and Applications, 20(1):199–221, 2014.D. L. Donoho. For most large underdetermined systems of linear equations,the minimal l1-norm solution is also the sparsest solution. Communicationson Pure and Applied Mathematics, 59, 2006.Y. C. Eldar, D. Needell, and Y. Plan. Uniqueness conditions for low-rankmatrix recovery. Applied and Computational Harmonic Analysis, 33(2):309–314, 2012.M. Fazel. Matrix rank minimization with applications. PhD thesis, Elec. Eng.Dept, Stanford University, 2002.M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic withapplication to minimum order system approximation. In American ControlConference, Arlington, 2001.J. R. Fienup. Reconstruction of an object from the modulus of its fouriertransform. Optics Letters, 3(1):27–29, Jul 1978.98BibliographyJ. R. Fienup. Phase retrieval algorithms: a comparison. Applied Optics, 21(15):2758–2769, Aug 1982.R. M. Freund. Dual gauge programs, with applications to quadratic pro-gramming and the minimum-norm problem. Mathematical Programming, 38(1):47–67, 1987.M. P. Friedlander, H. Mansour, R. Saab, and O. Yilmaz. Recovering com-pressively sampled signals using partial support information. IEEE Trans.Inform. Theory, 58(2):1122–1134, 2012.M. P. Friedlander and I. Macêdo. Low-rank spectral optimization. CoRR,abs/1508.00315, 2015. URL http://arxiv.org/abs/1508.00315. (Re-trieved on December 16, 2015).M. P. Friedlander, I. Macêdo, and T. K. Pong. Gauge optimization andduality. SIAM Journal on Optimization, 24(4):1999–2022, 2014. doi:10.1137/130940785. URL https://dx.doi.org/10.1137/130940785. (Re-trieved on December 16, 2015).R. W. Gerchberg and W. O. Saxton. A practical algorithm for the deter-mination of the phase from image and diffraction plane pictures. Optik, 35:237–246, 1972.M. X. Goemans and D. P. Williamson. Improved approximation algorithmsfor maximum cut and satisfiability problems using semidefinite programming.Journal of the ACM, 42(6):1115–1145, November 1995.D. Gross, F. Krahmer, and R. Kueng. Improved recovery guarantees for phaseretrieval from coded diffraction patterns. CoRR, abs/1402.6286, 2014. URLhttp://arxiv.org/abs/1402.6286. (Retrieved on December 16, 2015).C. Helmberg and F. Rendl. A spectral bundle method for semidefinite pro-gramming. SIAM Journal on Optimization, 10(3):673–696, 2000.C. Helmberg, M. L. Overton, and F. Rendl. The spectral bundle methodwith second-order information. Optimization Methods and Software, 29(4):855–876, 2014.J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimiza-tion algorithms. II: Advanced theory and bundle methods, volume 306 ofGrundlehren der Mathematischen Wissenschaften [Fundamental Principlesof Mathematical Sciences]. Springer-Verlag, Berlin, 1993.99BibliographyM. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimiza-tion. In Proceedings of the 30th International Conference on Machine Learn-ing (ICML-13), pages 427–435, 2013.D. Krishnan, P. Lin, and A. M. Yip. A primal-dual active-set method fornon-negativity constrained total variation deblurring problems. IEEE Trans-actions on Image Processing, 16(11):2766–2777, 2007.R. M. Larsen. Combining implicit restart and partial reorthogonalization inLanczos bidiagonalization, 2001. URL http://sun.stanford.edu/~rmunk/PROPACK/. (Retrieved on December 16, 2015).R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users’ guide: solutionof large-scale eigenvalue problems with implicitly restarted Arnoldi methods,volume 6. SIAM, 1998.A. Levin, Y. Weiss, F. Durand, and W. Freeman. Understanding blind decon-volution algorithms. IEEE Transactions on Pattern Analysis and MachineIntelligence, 33(12):2354–2367, Dec 2011.A. S. Lewis. Convex analysis on the hermitian matrices. SIAM Journal onOptimization, 6(1):164–177, 1996.S. Ling and T. Strohmer. Self-calibration and biconvex compressive sensing.CoRR, abs/1501.06864, 2015. URL http://arxiv.org/abs/1501.06864.(Retrieved on December 16, 2015).L. Lovàsz. Submodular functions and convexity. In A. Bachem, B. Korte,and M. Grötschel, editors, Mathematical Programming The State of the Art,pages 235–257. Springer, 1983.L. McLinden. Symmetric duality for structured convex programs. Transac-tions of the American Mathematical Society, 245:147–181, 1978.R. P. Millane. Recent advances in phase retrieval. Proceedings of SPIE 6316,Image Reconstruction from Incomplete Data IV, 63160E, September 2006.K. Mohan and M. Fazel. Reweighted nuclear norm minimization with ap-plication to system identification. In American Control Conference (ACC),2010, pages 2953–2959, June 2010.B. K. Natarajan. Sparse approximate solutions to linear systems. SIAMJournal on Computing, 24(2):227–234, 1995.100BibliographyY. Nesterov. Unconstrained convex minimization in relative scale. Mathe-matics of Operations Research, 34(1):180–193, 2009.P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternatingminimization. In Advances in Neural Information Processing Systems 26,pages 2796–2804, 2013.J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York,second edition, 2006.M. L. Overton. Large-scale optimization of eigenvalues. SIAM Journal onOptimization, 2(1):88–120, 1992.M. V. Ramana, L. Tunçel, and H. Wolkowicz. Strong duality for semidefiniteprogramming. SIAM Journal on Optimization, 7(3):641–662, 1997.B. Recht. A simpler approach to matrix completion. Journal of MachineLearning Research, 12:3413–3430, 2011.B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutionsof linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.B. Recht, W. Xu, and B. Hassibi. Null space conditions and thresholds forrank minimization. Mathematical Programming, 127(1, Series B):175–202,2011.P. Richtárik. Improved algorithms for convex minimization in relative scale.SIAM Journal on Optimization, 21(3):1141–1167, 2011.R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton,1970.R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag,New York, 1998.P. H. Ruys and H. M. Weddepohl. Economic theory and duality. In J. Kriens,editor, Convex analysis and mathematical economics, pages 1–72. Springer,Berlin, 1979.M. Schmidt. minFunc: unconstrained differentiable multivariate optimiza-tion in Matlab, 2005. URL http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html. (Retrieved on December 16, 2015).101BibliographyT. Strohmer. Personal communication, December 2013.D. S. Taubman and M. W. Marcellin. JPEG 2000: Image Compression Fun-damentals, Standards and Practice. Kluwer Academic Publishers, Norwell,MA, USA, 2001. ISBN 079237519X.M. Teboulle and J. Kogan. Applications of optimization methods to robuststability of linear systems. Journal of Optimization Theory and Applications,81(1):169–192, 1994.T. Teuber, G. Steidl, and R. H. Chan. Minimization and parameter es-timation for seminorm regularization models with I-divergence constraints.Inverse Problems, 29(3), 2013.M. J. Todd and Y. Ye. Approximate Farkas lemmas and stopping rules foriterative infeasible-point algorithms for linear programming. MathematicalProgramming, 81(1, Series A):1–21, 1998.L. Tunçel and H. Wolkowicz. Strong duality and minimal representationsfor cone optimization. Computational Optimization and Applications, 53(2):619–648, 2012.E. van den Berg. Convex optimization for generalized sparse recovery. PhDthesis, University of British Columbia, December 2009.E. van den Berg and M. P. Friedlander. Sparse optimization with least-squaresconstraints. SIAM Journal on Optimization, 21(4):1201–1229, 2011.C. D. White, S. Sanghavi, and R. Ward. The local convexity of solvingsystems of quadratic equations. CoRR, abs/1506.07868, 2015. URL http://arxiv.org/abs/1506.07868. (Retrieved on December 16, 2015).H. Zhang and W. Hager. A nonmonotone line search technique and its ap-plication to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004.102
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Gauge duality and low-rank spectral optimization
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Gauge duality and low-rank spectral optimization de Albuquerque Macêdo Júnior, Ives José 2015
pdf
Page Metadata
Item Metadata
Title | Gauge duality and low-rank spectral optimization |
Creator |
de Albuquerque Macêdo Júnior, Ives José |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | The emergence of compressed sensing and its impact on various applications in signal processing and machine learning has sparked an interest in generalizing its concepts and techniques to inverse problems that involve quadratic measurements. Important recent developments borrow ideas from matrix lifting techniques in combinatorial optimization and result in convex optimization problems characterized by solutions with very low rank, and by linear operators that are best treated with matrix-free approaches. Typical applications give rise to enormous optimization problems that challenge even the very best workhorse algorithms and numerical solvers for semidefinite programming. The work presented in this thesis focuses on the class of low-rank spectral optimization problems and its connection with a theoretical duality framework for gauge functions introduced in a seminal paper by Freund (1987). Through this connection, we formulate a related eigenvalue optimization problem more amenable to the design of specialized algorithms that scale well and can be used in practical applications. We begin by exploring the theory of gauge duality focusing on a slightly specialized structure often encountered in the motivating inverse problems. These developments are still made in a rather abstract form, thus allowing for its application to different problem classes. What follows is a connection of this framework with two important classes of spectral optimization problems commonly found in the literature: trace minimization in the cone of positive semidefinite matrices and affine nuclear norm minimization. This leads us to a convex eigenvalue optimization problem with rather simple constraints, and involving a number of variables equal to the number of measurements, thus with dimension far smaller than the primal. The last part of this thesis exploits a sense of strong duality between the primal-dual pair of gauge problems to characterize their solutions and to devise a method for retrieving a primal minimizer from a dual one. This allows us to design and implement a proof of concept solver which compares favorably against solvers designed specifically for the PhaseLift formulation of the celebrated phase recovery problem and a scenario of blind image deconvolution. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-12-18 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0221436 |
URI | http://hdl.handle.net/2429/55920 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2016-02 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2016_february_dealbuquerquemacedojunior_ivesjose.pdf [ 7.95MB ]
- Metadata
- JSON: 24-1.0221436.json
- JSON-LD: 24-1.0221436-ld.json
- RDF/XML (Pretty): 24-1.0221436-rdf.xml
- RDF/JSON: 24-1.0221436-rdf.json
- Turtle: 24-1.0221436-turtle.txt
- N-Triples: 24-1.0221436-rdf-ntriples.txt
- Original Record: 24-1.0221436-source.json
- Full Text
- 24-1.0221436-fulltext.txt
- Citation
- 24-1.0221436.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0221436/manifest