Numerical solution of semidefinite constrained least squares problems Krislock, Nathan Gavin Beauregard 2003


NUMERICAL SOLUTION OF SEMIDEFINITE CONSTRAINED LEAST SQUARES PROBLEMS

by

NATHAN GAVIN BEAUREGARD KRISLOCK

B.Sc. Hon. (Combined Math/Computer Science), University of Regina, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Mathematics, Institute of Applied Mathematics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
May 2003
(c) Nathan Gavin Beauregard Krislock, 2003

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics
The University of British Columbia
Vancouver, Canada

Abstract

In this thesis, we are concerned with computing the least squares solution of the linear matrix equation AX = B subject to the constraint that the matrix X is positive semidefinite. For symmetric X, this is the previously studied semidefinite least squares (SDLS) problem; for nonsymmetric X, we introduce the nonsymmetric semidefinite least squares (NS-SDLS) problem. An application of this second problem is the estimation of the compliance matrix at some location on a deformable object. This is an important step in the process of making interactive computer models of deformable objects, a technique which is used in the area of medical simulation. We also introduce a third semidefinite constrained least squares problem called the linear matrix inequality least squares (LMI-LS) problem, which is a generalization of the first two problems. Sufficient conditions for the existence and uniqueness of solutions for each of these three problems are provided. These solutions are characterized as solutions of nonlinear systems of equations known as the Karush-Kuhn-Tucker (KKT) equations. It is shown that the KKT equations for each of these problems can be stated as a semidefinite linear complementarity problem (SDLCP). Interior-point methods are proposed for the numerical solution of each of these three problems. Computational experiments are conducted which indicate that predictor-corrector interior-point methods solve these semidefinite constrained least squares problems efficiently. A noticeable improvement is made over the current computational methods used for solving the SDLS problem.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1. Introduction
  1.1 The Semidefinite Constrained Least Squares Problems
  1.2 An Application
  1.3 Numerical Solutions
  1.4 Previous Research
  1.5 Outline of Thesis
Chapter 2. Convex Analysis and Optimality Conditions
  2.1 Preliminaries
  2.2 Convex Analysis
    2.2.1 Convex sets and functions
    2.2.2 Convex optimization
    2.2.3 Cone constraints
  2.3 Lagrangian Duality and Optimality Conditions
    2.3.1 Lagrangian duality
    2.3.2 Sufficient conditions for optimality
    2.3.3 The KKT conditions
    2.3.4 Necessary conditions for optimality
    2.3.5 Summary
  2.4 Application to Semidefinite Constrained Least Squares
    2.4.1 The symmetric semidefinite least squares problem (SDLS)
    2.4.2 The nonsymmetric semidefinite least squares problem (NS-SDLS)
    2.4.3 The linear matrix inequality least squares problem (LMI-LS)
  2.5 The Semidefinite Linear Complementarity Problem
    2.5.1 The SDLS problem is an SDLCP
    2.5.2 The NS-SDLS problem is an SDLCP
    2.5.3 The LMI-LS problem is an SDLCP
Chapter 3. Interior-Point Methods and Algorithms
  3.1 Overview
  3.2 Log Barrier Problems
    3.2.1 The SDLS log barrier problem
    3.2.2 The NS-SDLS log barrier problem
    3.2.3 The LMI-LS log barrier problem
  3.3 The Central Path
  3.4 Interior-Point Methods for the SDLCP
    3.4.1 Choosing a suitable direction (ΔX, ΔY)
    3.4.2 Choosing a suitable step length θ
    3.4.3 Summary
  3.5 Convergence Results
  3.6 Implementation Issues
    3.6.1 Computing search directions for the SDLS problem
    3.6.2 Computing search directions for the NS-SDLS problem
    3.6.3 Computing search directions for the LMI-LS problem
    3.6.4 Differences in the search directions
Chapter 4. Numerical Results
  4.1 Implementation and experimentation details
    4.1.1 Solving for the search direction
    4.1.2 Computing the maximum step length
    4.1.3 Starting points
    4.1.4 Stopping criteria
    4.1.5 Woodgate's Algorithm
    4.1.6 Generating problem instances
  4.2 Numerical comparison of the algorithms
    4.2.1 The SPF/PC experiment
    4.2.2 The Woodgate/PC experiment
    4.2.3 A final comparison
  4.3 Estimating the compliance matrix
Chapter 5. Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
Bibliography
Appendix A. Kronecker products
Appendix B. Matlab M-files
  B.1 SDLS M-files
    B.1.1 sdls.m
    B.1.2 sdls_precorr.m
  B.2 NS-SDLS M-files
    B.2.1 ns_sdls.m
    B.2.2 ns_sdls_precorr.m
  B.3 LMI-LS M-files
    B.3.1 lmi_ls.m
    B.3.2 lmi_ls_precorr.m
  B.4 Woodgate's Algorithm M-files
    B.4.1 wg_sdls.m
  B.5 Miscellaneous M-files
    B.5.1 vec.m
    B.5.2 mat.m
    B.5.3 vecV.m
    B.5.4 vecK.m
    B.5.5 max_step.m
    B.5.6 eigshift.m
    B.5.7 wg_f.m

List of Tables

3.1 The matrices in the vec linear system defining the SDLS search directions
3.2 The matrices in the svec linear system defining the SDLS search directions
3.3 The matrices in the vec linear system defining the NS-SDLS search directions
3.4 The matrices in the svec linear system defining the NS-SDLS search directions
3.5 The matrices in the vec linear system defining the LMI-LS search directions
3.6 The matrices in the svec linear system defining the LMI-LS search directions
4.1 Results from computing solutions of the SDLS, NS-SDLS, and LMI-LS problems
4.2 The Woodgate/PC comparison results

List of Figures

2.1 The graph of a convex function f
3.1 A framework for many optimization algorithms
3.2 A general path-following algorithm for the SDLCP
3.3 A general predictor-corrector algorithm for the SDLCP
4.1 Plots of results from computing solutions to the SDLS problem
4.2 Plots of results from computing solutions to the NS-SDLS problem
4.3 Plots of results from computing solutions to the LMI-LS problem
4.4 Plots of the Woodgate/PC comparison results
4.5 Duality gap convergence curves

Acknowledgements

This thesis is dedicated to my late father, Brian Russel Krislock. I wish to give thanks to my supervisor, Dr. James Varah, and my reader, Dr. Philip Loewen, for all the effort and advice they provided me during my studies and research at the University of British Columbia. I am also incredibly grateful to Dr. Jochen Lang and Dr. Dinesh Pai for providing me with such an interesting and exciting problem as the topic for my thesis. A special thanks is given to all my friends at St. John's College at UBC who provided me with some of the best years of my life. Of course, none of this would have been possible without the love and support of my family, especially my wonderful parents and brother. Thank you.

Chapter 1

Introduction

There are often times in which one would like to find a solution to a problem for which no exact solution exists. We would still like to be able to give such a problem to a computer and have it respond with more than just "There is no exact solution." It is preferable to have the computer let us know that there is no exact solution, but provide us with a solution which gives us the smallest error possible in our problem. In addition, there are many times in which we would like to compute such a solution with least error for a problem which requires, for example, that the variables of this solution cannot be negative. We must find a way to have the computer provide us with a solution that satisfies our problem as closely as possible subject to such a constraint.

In this thesis, we are concerned with the problem of solving the linear matrix equation which is defined as

    AX = B      (1.1)

where A and B are rectangular matrices and the variable X is a square matrix.
There is often no exact solution to this equation, but we are still interested in a solution which minimizes the sum of the squares of the components of the matrix AX − B. Such a solution is called a least squares solution of this equation. In certain applications there are constraints that must be satisfied, constraints which the standard least squares solution does not usually satisfy. For example, we could require that all the entries of the matrix X be nonnegative, or that the matrix X be symmetric, or both.

We focus our attention on two possible constraints which we can place on the matrix X when finding the least squares solution of equation (1.1). The first is to insist that X be symmetric and have nonnegative eigenvalues; that is, to constrain X to be positive semidefinite. In the second case, we allow X to be nonsymmetric, but constrain the symmetric part of X, ½(X + X^T), to be positive semidefinite. We also consider a third problem which is a generalization of the first two problems. This problem seeks a least squares solution of the linear equation

    Ax = b      (1.2)

where A is a rectangular matrix and x and b are vectors. In this case we constrain the variable x by insisting that a linear combination of symmetric matrices, which is defined by x, be positive semidefinite. We now define each of these three semidefinite constrained least squares problems explicitly.

1.1 The Semidefinite Constrained Least Squares Problems

The first of our three problems is called the symmetric semidefinite least squares (SDLS) problem, and is defined as follows.

    minimize   ||AX − B||_F
    subject to X ⪰ 0               (1.3)

Here we use the notation X ⪰ 0 to mean that X is symmetric and positive semidefinite. Moreover, ||·||_F is the Frobenius matrix norm which is used to measure the size of the residual matrix, AX − B, for a given matrix X. We will define this norm in Chapter 2, but for now it is enough to observe that ||AX − B||_F > 0 if X does not satisfy AX = B; if X satisfies AX = B, then ||AX − B||_F = 0. Therefore, the SDLS problem is the problem of finding a symmetric n × n matrix X which minimizes this measure of the error in the equation AX = B, for some m × n matrices A and B, subject to the constraint that X be positive semidefinite.

We call our second problem the nonsymmetric semidefinite least squares (NS-SDLS) problem, and we define it as follows.

    minimize   ||AX − B||_F
    subject to ½(X + X^T) ⪰ 0       (1.4)

This time we are looking for an n × n matrix X, which is possibly nonsymmetric, and which minimizes the Frobenius norm of the residual matrix AX − B, where A and B are some m × n matrices, subject to the constraint that ½(X + X^T) be positive semidefinite.

Finally, our third problem is defined as

    minimize   ||Ax − b||_2
    subject to 𝒦x ⪯ C               (1.5)

where

    𝒦x = Σ_{i=1}^n x_i K_i

and K_1, ..., K_n and C are k × k symmetric matrices. The constraint that 𝒦x ⪯ C means that we require that the k × k symmetric matrix, C − 𝒦x, be positive semidefinite. Since this type of constraint is commonly known as a linear matrix inequality, we call this problem the linear matrix inequality least squares (LMI-LS) problem. Here A is an m × n matrix, b is a vector of length m, and x is a vector of length n. This time we measure the size of the residual vector, Ax − b, using the Euclidean vector norm, ||·||_2, which we will define in Chapter 2.
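To make the role of the constraints concrete, the following small MATLAB sketch (with synthetic data and matrix sizes chosen only for illustration, not taken from the thesis) computes the unconstrained least squares solution of AX = B and checks whether it happens to satisfy the SDLS and NS-SDLS constraints; in general it does not, which is what the algorithms developed in later chapters address.

    % Synthetic example: the unconstrained least squares solution of AX = B
    % need not satisfy the semidefinite constraints of (1.3) or (1.4).
    randn('state', 0);                 % fix the random state for repeatability
    m = 8; n = 4;
    A = randn(m, n);                   % A has full column rank with probability one
    B = randn(m, n);
    X = A \ B;                         % unconstrained least squares solution of AX = B
    S = (X + X')/2;                    % symmetric part of X
    fprintf('residual ||AX - B||_F        = %.4f\n', norm(A*X - B, 'fro'));
    fprintf('min eigenvalue of (X+X'')/2   = %.4f\n', min(eig(S)));
    % A negative minimum eigenvalue means X is infeasible for (1.4); since X is
    % in general not even symmetric, it is also not feasible for (1.3).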
We would like to note that while the SDLS problem has been previously studied in great detail, this seems to be the first time that the NS-SDLS and LMI-LS problems have been formulated and studied. Furthermore, as we will see, each of these problems is a convex optimization problem whose local minimizers are global minimizers, and, under the condition of strict convexity, there is at most one global minimizer.

1.2 An Application

The inspiration behind the study of these semidefinite constrained least squares problems began with the need to solve the NS-SDLS problem in order to estimate the compliance (or stiffness) matrix of a deformable (or elastic) object (see [15]). This compliance matrix estimation is important in many areas including robotics, medical simulation, and computer graphics. The idea is to use this compliance matrix to be able to model a deformable object, such as a human organ, and interact with its virtual model on a computer. This interaction is done by using some computer interface which allows users to touch this virtual object, and which also provides some force-feedback depending on the elasticity of the object.

The method used for estimating the compliance matrix at a certain location on a deformable object is to experimentally measure the displacement vector u resulting from some force, represented by the vector p, applied to the object at this location. The compliance matrix X for this contact point governs the relationship between u and p:

    u = Xp.      (1.6)

In order to estimate the compliance matrix at this contact point, many displacement measurements, u^1, ..., u^l, are taken for the same number of different forces, p^1, ..., p^l. We then want to find the matrix X which satisfies

    [p^1 ··· p^l]^T X^T = [u^1 ··· u^l]^T.      (1.7)

Due to unavoidable errors in this recording process, we will not be able to find the matrix X which satisfies equation (1.7) exactly. Therefore, we aim to find a matrix X which minimizes the error in this equation. However, a problem occurs when using the simple unconstrained least squares solution of equation (1.7). If the compliance matrix X has an eigenvalue with a negative real part, this may cause the virtual model to respond to the user touching that contact point by pulling the user's hand further in the direction of the touch. By insisting that ½(X + X^T) is positive semidefinite, none of the real parts of the eigenvalues of X will be negative. Therefore, we are interested in solving the NS-SDLS problem in order to obtain a compliance matrix which produces a well-behaved virtual model for the deformable object. For more on this topic, see [14].

1.3 Numerical Solutions

The purpose of this thesis is to present algorithms for computing approximate solutions to each of these three problems. We would like these approximate solutions to agree with the exact solution to within a prescribed accuracy, and we would also like to compute these approximate solutions efficiently; i.e., the algorithm should not take too long to run on a computer. The methods we choose to implement are the popular interior-point methods which have been used to solve many other similar constrained optimization problems. We choose these methods due to the great success they have had in solving these types of problems accurately and efficiently. Moreover, it does not appear that interior-point methods have been used in the design of algorithms for solving the SDLS problem before this.
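As a small illustration of the compliance-matrix formulation of Section 1.2, the MATLAB sketch below stacks force/displacement pairs into equation (1.7) and forms the unconstrained least squares estimate. The number of measurements, the noise level, and the example compliance matrix are all assumptions made purely for the illustration; the experimental data of [15] are not reproduced here.

    % Synthetic compliance estimation in the form of equation (1.7):
    % columns of P are the forces p^i, columns of U the measured displacements u^i.
    randn('state', 1);
    l = 20;                                    % number of measurements
    Xtrue = [2 0.3 0; -0.3 1 0.1; 0 -0.1 0.5]; % hypothetical 3x3 compliance matrix
    P = randn(3, l);                           % applied forces
    U = Xtrue*P + 0.1*randn(3, l);             % displacements u^i = X p^i plus noise
    % Equation (1.7) reads P' * X' = U', so the unconstrained estimate is
    Xhat = (P' \ U')';
    lmin = min(eig((Xhat + Xhat')/2));         % admissibility check for the virtual model
    fprintf('min eigenvalue of sym(Xhat) = %.4f\n', lmin);
    % If lmin is negative, the unconstrained estimate can produce the ill-behaved
    % response described in Section 1.2; the NS-SDLS problem (1.4), posed with
    % A = P', B = U', and unknown X', avoids this by construction.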
1.4 Previous Research

Although neither the NS-SDLS problem nor the LMI-LS problem seem to have been previously studied, the SDLS problem was first proposed back in 1968 by Brock [7] for obtaining symmetric positive definite compliance matrices like those discussed in Section 1.2. Brock had proposed a method for computing a symmetric positive definite least squares solution to (1.1); however, the method he uses only returns the solution to

    minimize   ||AX − B||_F
    subject to X = X^T              (1.8)

which is not always positive definite. The SDLS problem was later reformulated and studied in 1988 by Allwright [3]. In that paper, Allwright proved that there is a unique solution to this problem if and only if the matrix A has full column rank, and proposed a computational method for approximating this solution under that condition. Allwright also mentions that although there does not seem to be a simple formula for the solution of (1.3), we do have the following result that was shown in the Ph.D. thesis of Woodgate [32].

• If A = I, then the solution of (1.3) is X = ½(B + B^T)_(0).

Here I represents the n × n identity matrix, and M_(0) denotes the result of changing all of the negative eigenvalues of M to 0 in the eigenvalue decomposition of M. In 1990 Allwright provided a necessary and sufficient condition for the existence of a solution when A does not have full column rank in the follow-up paper [4] written jointly with Woodgate.

Woodgate went on in 1996 to provide a Lagrange multiplier type condition which characterized solutions of the SDLS problem in [33], although it was not stated in terms of Lagrange multipliers. This result was then used to provide expressions for the solution of the SDLS problem in a couple of special cases when A has full column rank:

• If AB^T is symmetric and positive definite, then the solution of (1.3) is X = A†B = (A^T A)^{−1} A^T B.
• The solution of (1.3) is X = 0 if and only if −A^T B − B^T A is positive definite.

Notice that X = A†B is the solution to the unconstrained least squares problem

    min_X ||AX − B||_F,      (1.9)

where A† is the Moore-Penrose pseudoinverse of A. Woodgate also provided a computational method in [33] for solving the SDLS problem when A has full column rank. In 1998, Woodgate proposed a new algorithm for solving the SDLS problem in [34] which is based on the related unconstrained problem (1.10).

    min_E ||AE^T E − B||_F      (1.10)
Notice that this is a more general result than the one provided in [33] for the case when ABT is symmetric and positive definite. Liao also used this result to provide an expres-sion for a l l of the solutions of a specific case of the S D L S problem, the least squares inverse eigenvalue problem for positive semidefinite matrices. This is the problem of determining the positive semidefinite matrices A for which, given some eigenvalues A i , . . . , A m wi th correspond-ing eigenvectors v\,... ,vm, minimize the norm of the eigenvalue equation AV = VA, where A = d i a g ( A i , . . . , A m ) and V = {v\--- vm). minimize ^\AV — VA| | i 7 , .. subject to A h 0 A related paper by X ie [35] provided the only example of the N S - S D L S problem (1.4) in the collection of literature we considered. In that paper, X i e studied the least squares inverse eigenvalue problem for nonsymmetric positive semidefinite matrices (1.12). minimize | |^4I^ — V^A , ^. subject to \{A + AT)yQ ^' ' A n interesting result which is found i n [35] is the expression for the solution of the N S - S D L S problem when A = I: • If A = I, then the solution of (1.4) is X = \{B + BT){0) + \{B - BT). U n t i l now, we have only mentioned previous work done on semidefinite constrained least squares problems which are covered by the three problems which we are considering. However, there has also been much work done on problems that are generalizations of the S D L S problem. In 1995, H u [11] proposed a computational method for solving the following matrix estimation problem. minimize \\AX — B\\p subject to X = XT, L<X<U, (1.13) AminPO > C > 0, X eV In (1.13), the inequality, L < X < U, is to be interpreted component-wise: Lij < Xij < Uij, for a l l i,j £ { 1 , . . . , n). Furthermore, A m j n ( X ) represents the minimum eigenvalue of X, and V 5 Chapter 1. Introduction specifies a set of n x n matrices having a particular linear pattern. That is, { m i=l where K\,..., Km are some symmetric n x n matrices. The idea behind Hu's method is to transform (1.13) into an equivalent convex quadratic program wi th infinitely many linear con-straints, and then solve this problem by generating and solving a sequence of ordinary convex quadratic programs, adding a new linear constraint at every iteration. Escalante and Raydan [9] also consider the matrix estimation problem (1.13), but wi th A = I. They propose using a different computational method, known as Dykstra 's alternating projection algorithm. This method is based on viewing the constraints in (1.13) as the intersec-tion of three sets, and performing a series of projections onto each of these sets unt i l a desired tolerance is reached. This paper also contains the following interesting result: • The solution to minimize \\X — B\\p subject to X = XT, (1.14) AminPQ > e > 0 is X = \{B + v3 T)( £), where denotes the result of changing all of the eigenvalues less than e to e in the eigenvalue decomposition of M. 1.5 Outline of Thesis We summarize the remainder of this thesis as follows. In Chapter 2 we provide a general discussion of convex optimization and develop the systems of equations (the KKT conditions) which w i l l characterize the solutions of each of our three semidefinite constrained problems: the S D L S (1.3), the N S - S D L S (1.4), and the L M I - L S (1.5). 
This chapter also mentions the conditions which will guarantee the existence and uniqueness of solutions to our problems, and shows the connection of each of these problems to the semidefinite linear complementarity problem (SDLCP). Chapter 3 then goes on to describe the interior-point methods which we will be using to solve our three problems. Starting with some background about interior-point methods, Chapter 3 provides the intuition behind how we can define a central path which will be used as the basis for describing the interior-point algorithms for our problems. Due to the connection to the SDLCP, we are able to provide a uniform discussion of the standard path-following and predictor-corrector algorithms which we are considering; this discussion also includes some of the known theoretical convergence results for these two types of algorithms. Finally, Chapter 3 concludes by mentioning the issues involved in implementing the standard path-following algorithm and the predictor-corrector algorithm for each of our three problems. The results of the numerical experiments of our interior-point algorithms conducted in MATLAB, along with a comparison to Woodgate's Algorithm, will be presented in Chapter 4. We finish with our conclusions and ideas for future work in Chapter 5.

Chapter 2

Convex Analysis and Optimality Conditions

2.1 Preliminaries

We begin by discussing the framework in which the theoretical results will be presented. Since we will be working both in the space of vectors and in the space of matrices, it will be useful to generalize these spaces as is done in Borwein and Lewis' book [6]. There they define a Euclidean space E to be a finite-dimensional vector space over the reals with which there is an inner product ⟨·,·⟩ : E × E → ℝ that satisfies the following conditions (x, y, z ∈ E; α, β ∈ ℝ):

1. ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩,
2. ⟨x, y⟩ = ⟨y, x⟩,
3. ⟨x, x⟩ ≥ 0, where ⟨x, x⟩ = 0 if and only if x = 0.

Examples of Euclidean spaces are ℝ^n and ℝ^{m×n}, which are the spaces of real column vectors x = (x_1, ..., x_n)^T and real m × n matrices A = (a_ij), respectively. The standard inner product on ℝ^n is defined as

    ⟨x, y⟩ := Σ_{i=1}^n x_i y_i = x^T y.

Similarly, the inner product we will use on ℝ^{m×n} will be defined as

    ⟨A, B⟩ := Σ_{i=1}^m Σ_{j=1}^n a_ij b_ij = tr(A^T B),

where tr(A) = Σ_{i=1}^n a_ii is the trace of a matrix A ∈ ℝ^{n×n}. The norm on a Euclidean space E is then defined as ||x|| := √⟨x, x⟩. This is the Euclidean norm, ||·||_2, when E = ℝ^n; when E = ℝ^{m×n}, it is the well known Frobenius norm, which is denoted by ||·||_F. When referring to topological properties in these spaces, it is with respect to these norms.

By the least squares solution to a matrix equation AX = B, where A ∈ ℝ^{m×n}, B ∈ ℝ^{m×p}, and X ∈ ℝ^{n×p}, we mean a matrix X (or vector, if p = 1) such that the norm of the residual matrix, R = B − AX, is minimized. We notice that ||AX − B||_F is minimized exactly when the sum of squares of the residuals,

    ||AX − B||²_F = Σ_{i=1}^m Σ_{j=1}^p r_ij²,

is minimized. It is also noticed (we will show this later) that if f(X) = ½||AX − B||²_F and g(X) = ||AX − B||_F, then

    ∇f(X) = A^T(AX − B)   and   ∇g(X) = (1 / ||AX − B||_F) A^T(AX − B).

Thus, we can see that ∇f(X) is a simpler expression than ∇g(X) and avoids the difficulty arising when ||AX − B||_F = 0.
It is for these reasons that we will henceforth only deal with the (normalized) sum of squares function, f(X) = ½||AX − B||²_F, when finding the least squares solution of a matrix equation. For an account of the statistical justification behind the principle of least squares, see [5].

We will use S^n to denote the space of real symmetric n × n matrices A, where symmetric means A = A^T. We make S^n a Euclidean space by defining the inner product to be ⟨X, Y⟩ := tr(XY), and we note that the dimension of S^n is ½n(n + 1). A symmetric matrix A is called positive semidefinite if x^T Ax ≥ 0 for all x ∈ ℝ^n, and positive definite if strict inequality holds whenever x ≠ 0. The sets of positive semidefinite and positive definite matrices in S^n will be denoted respectively as S^n_+ and S^n_{++}. Notice that S^n_{++} is actually the interior of S^n_+. For X, Y ∈ S^n we say X ⪯ Y if Y − X ∈ S^n_+ and X ≺ Y if Y − X ∈ S^n_{++}. This is known as the Loewner partial ordering of S^n.

It is important to note that S^n_+ is a cone in S^n; a cone in E is a nonempty subset C that satisfies C = ℝ_+ C, where ℝ_+ C := {tx | t ∈ ℝ_+, x ∈ C} and ℝ_+ := {t ∈ ℝ | t ≥ 0}. Thus C is a cone if for every vector x in C, the ray from 0 through x is completely contained within C. Another example of a cone is the positive orthant in ℝ^n, ℝ^n_+ := {x ∈ ℝ^n | x_i ≥ 0 for i = 1, ..., n}, whose interior is the set ℝ^n_{++} := {x ∈ ℝ^n | x_i > 0 for i = 1, ..., n}. For x, y ∈ ℝ^n, the partial ordering induced by ℝ^n_+ is written as x ≤ y if y − x ∈ ℝ^n_+ and x < y if y − x ∈ ℝ^n_{++}.

We denote the spectrum, or set of eigenvalues, of a matrix X ∈ ℝ^{n×n} by σ(X). If X ∈ S^n, then

1. σ(X) ⊆ ℝ,
2. X ∈ S^n_+ if and only if σ(X) ⊆ ℝ_+, and
3. X ∈ S^n_{++} if and only if σ(X) ⊆ ℝ_{++}.

A special feature of the cones ℝ^n_+ and S^n_+ is the fact that they are self-dual. Let H ⊆ E be a cone. Then H is called self-dual if H = H*, where

    H* = {y ∈ E | ⟨x, y⟩ ≥ 0 for all x ∈ H}

is the dual cone of H. As we will see later, the importance of self-dual cones in optimization is due to the fact that if H is self-dual, then

    inf_{x∈H} ⟨x, y⟩ = { 0,    if y ∈ H
                         −∞,   otherwise.      (2.1)

We will now summarize and prove these results in the following proposition.

Proposition 2.1.1 Let H be a self-dual cone in a Euclidean space E. Then equation (2.1) holds. Furthermore, ℝ^n_+ and S^n_+ are self-dual cones.

Proof. Let y ∈ H. Then y ∈ H*, which implies that ⟨x, y⟩ ≥ 0 for all x ∈ H. Moreover, since H is a cone, 0 ∈ H and ⟨0, y⟩ = 0. Thus inf_{x∈H} ⟨x, y⟩ = 0. On the other hand, if y ∉ H, then y ∉ H*, which implies that there exists an x ∈ H such that ⟨x, y⟩ < 0. Furthermore, tx ∈ H for all t ≥ 0 and ⟨tx, y⟩ = t⟨x, y⟩ < 0. Now lim_{t→∞} t⟨x, y⟩ = −∞ implies inf_{x∈H} ⟨x, y⟩ = −∞, which proves the first claim.

To show ℝ^n_+ and S^n_+ are self-dual cones, we first note that both sets are clearly cones in their respective spaces. If x, y ∈ ℝ^n_+, then x_i, y_i ≥ 0 for i = 1, ..., n, and thus ⟨x, y⟩ = Σ_{i=1}^n x_i y_i ≥ 0. Therefore, ℝ^n_+ ⊆ (ℝ^n_+)*. To show that (ℝ^n_+)* ⊆ ℝ^n_+, suppose to the contrary that there is a y ∈ (ℝ^n_+)* such that y ∉ ℝ^n_+. Then y has the properties that ⟨x, y⟩ ≥ 0 for all x ∈ ℝ^n_+ and that some y_i < 0. Take x = e_i (where e_i is the ith unit vector: the vector with 1 in the ith component, and zeros elsewhere). Then x ∈ ℝ^n_+ and ⟨x, y⟩ = y_i < 0, which is a contradiction. Therefore, y ∈ ℝ^n_+.

To show that S^n_+ is self-dual, we follow a proof given in [27, p. 520]. Let X, Y ∈ S^n_+.
Since X is positive semidefinite, it has a unique positive semidefinite square root, √X. Using this factorization of X and the fact that tr(A^T B) = tr(BA^T) for all A, B ∈ ℝ^{m×n} (see [27, p. 521]), we get ⟨X, Y⟩ = tr(√X √X Y) = tr(√X Y √X). Since Y is positive semidefinite, we know that √X Y √X must also be positive semidefinite, and thus have real nonnegative eigenvalues. By the fact that the trace of a square matrix is equal to the sum of its eigenvalues, we get ⟨X, Y⟩ ≥ 0, showing that S^n_+ ⊆ (S^n_+)*. Suppose now that Y ∈ (S^n_+)*. Then ⟨X, Y⟩ ≥ 0 for all X ∈ S^n_+. If Y ∉ S^n_+, then Y has a negative eigenvalue λ with eigenvector u ∈ ℝ^n. Let X = uu^T; then X ∈ S^n_+ and ⟨X, Y⟩ = tr(uu^T Y) = tr(u^T Yu) = λ||u||²_2 < 0 gives the contradiction. Thus Y ∈ S^n_+, proving that (S^n_+)* ⊆ S^n_+. ∎

2.2 Convex Analysis

As we will see, the problems introduced in Chapter 1 fall under the scope of convex optimization. This is due to the fact that the sum of squares functions which we are considering are convex functions, and we are minimizing them over a convex set of points. As per the classical work of Rockafellar [25], we find that the convexity of sets and functions is a powerful concept in optimization, largely due to the facts that a local minimizer of a convex function over a convex set is a global minimizer and that a minimizer of a strictly convex function is unique. After defining these concepts in §2.2.1, we will state and prove these results in §2.2.2.

2.2.1 Convex sets and functions

We will begin by defining the concepts of convex sets and functions, after which we will show that the sum of squares functions that are the subject of this thesis are convex.

Definition 2.2.1 A set Ω in a Euclidean space E is called convex if tx + (1 − t)y ∈ Ω, for all x, y ∈ Ω and t ∈ [0,1]. Given a convex set Ω, a function f : Ω → ℝ is called convex if

    f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y),

for all x, y ∈ Ω and t ∈ [0,1]. The function f is called strictly convex if the above inequality is strict whenever x ≠ y and t ∈ (0,1).

Thus a set is convex if for any pair of points in the set, the line segment between that pair lies entirely inside the set. For example, the cones ℝ^n_+ and S^n_+ are convex. A function is convex if between all pairs of points, x and y, the line segment connecting (x, f(x)) and (y, f(y)) lies on or above the graph of f at each point on the line segment between x and y (see Figure 2.1).

[Figure 2.1: The graph of a convex function f.]

Proposition 2.2.2 Given A ∈ ℝ^{m×n} and B ∈ ℝ^{m×p}, the sum of squares function f(X) = ½||AX − B||²_F is convex in ℝ^{n×p} and strictly convex if and only if A has full column rank.

Proof. Here we give an extended version of a proof provided by Adlers [1, p. 5]. Let X, Y ∈ ℝ^{n×p} and t ∈ [0,1]. Since f(X) = ½⟨AX, AX⟩ − ⟨AX, B⟩ + ½⟨B, B⟩, then

    f(tX + (1 − t)Y) = ½t²⟨AX, AX⟩ + t(1 − t)⟨AX, AY⟩ + ½(1 − t)²⟨AY, AY⟩
                       − t⟨AX, B⟩ − (1 − t)⟨AY, B⟩ + ½⟨B, B⟩,

which implies that

    [t f(X) + (1 − t) f(Y)] − f(tX + (1 − t)Y)
        = ½(t − t²)⟨AX, AX⟩ − t(1 − t)⟨AX, AY⟩ + ½[(1 − t) − (1 − t)²]⟨AY, AY⟩
        = ½ t(1 − t)⟨AX − AY, AX − AY⟩
        = ½ t(1 − t)||A(X − Y)||²_F ≥ 0.

If A has full column rank and X ≠ Y, then A(X − Y) can not be the zero matrix. Further, if t ∈ (0,1) then the inequality above will be strict. However, if A does not have full column rank, then there exists a nonzero U ∈ ℝ^{n×p} such that AU = 0.
Letting X ∈ ℝ^{n×p} be arbitrary, we can choose Y = X − U so that X ≠ Y and A(X − Y) = AU = 0. In this case, even with t ∈ (0,1), the inequality above is not strict, showing that f is only strictly convex when A has full column rank. ∎

From Proposition 2.2.2 it also follows that f(X) = ½||AX − B||²_F is convex over S^n since S^n is a convex subset of ℝ^{n×n}. It is also useful to note that when p = 1 above, we get the result presented in [1] that f(x) = ½||Ax − b||²_2 is convex over ℝ^n.

2.2.2 Convex optimization

Given a convex set Ω ⊆ E and a convex function f : Ω → ℝ, a convex programming problem is an optimization problem of the form

    inf{f(x) | x ∈ Ω}      (2.2)

where we would like to determine the value of the infimum. If the infimum is attained in problem (2.2), then we would like to find a point x̄ ∈ Ω such that f(x̄) = min{f(x) | x ∈ Ω}. Such an x̄ is called a global minimizer of f over Ω, and is referred to as an optimal solution to the optimization problem. A local minimizer is a point x̄ ∈ Ω for which there is an open set N ⊆ E containing x̄ such that f(x̄) = min{f(x) | x ∈ Ω ∩ N}. Any point x in Ω is called feasible; if x is in the interior of Ω, then x is called strictly feasible. A feasible direction at a point x ∈ Ω is a vector v ∈ E such that x + tv ∈ Ω for all t > 0 small enough.

Theorem 2.2.3 Let x̄ ∈ Ω be a local minimizer of the convex programming problem (2.2). Then x̄ is a global minimizer. If f is strictly convex, then x̄ is the unique global minimizer.

Proof. Suppose x̄ is not a global minimizer. Then there exists a y ∈ Ω such that f(y) < f(x̄). Since f is convex,

    f(tx̄ + (1 − t)y) ≤ t f(x̄) + (1 − t) f(y) < t f(x̄) + (1 − t) f(x̄) = f(x̄),
Transitive: x <c y and y <c z =>• x <c zi Vx, y, z G E Let i , i ; , z £ E such that x <c y and y <c z. Then y — x G C and z — x G C , and the convexity of C implies ^(z — a;) = ^(y — x) + \{z — y) G C. Now, since C is a cone, we have z — x G C. • A slightly different version of the following definition can be found in [6, p. 59] Definition 2.2.5 Let E and Y be Euclidean spaces and C a pointed closed convex cone in Y . A function g : E —> Y is called C-convex if g(tx + (1 - t)y) <c tg(x) + (1 - t)g{y), 12 Chapter 2. Convex Analysis and Optimality Conditions for all x,y G E and t G [0,1]. Furthermore, g is called strictly C-convex if the above inequality holds with <c when x ^ y and t G (0,1). Proposition 2.2.6 Let E and Y be Euclidean spaces, C a pointed closed convex cone in Y , and g : E —> Y a C-convex function. Then the set fl = {a; G E | g(x) <c 0} is convex. Proof. Let x,y G fl and t G [0,1]. Then —g(x),—g(y) G C and the convexity of C implies -tg(x) - (1 - t)g(y) G C. Therefore g(tx + (1 - t)y) <c tg(x) + (1 - t)c/(y) < c 0 and the transitivity of <c gives us tx + (1 — i )y e f i . • In §2.1 we stated that the cones of primary interest to us are cones that are self-dual. Recall that a cone H in E is called self-dual i f H = H*, where H* = {y G E | (x, y) > 0 for al l x G H}. It turns out that self-dual cones are always pointed, closed, and convex. Proposition 2.2.7 Let H be self-dual cone in E . Then H is a pointed closed convex cone. Proof. Let y G H, and suppose — y G H. Then we have (x,y) > 0 and (x,—y) > 0 for a l l x G H. Thus (x, y) = 0 for al l x G H, which implies that (y, y) — 0. Therefore y = 0, which shows that H fl —H C {0}. Since every cone satisfies 0 G H, we see that H is pointed. To show H is closed, it suffices to show that for every sequence {yn} i n H that converges to some point y G E, we have y G H. Let x G H, then (a;, y n ) > 0 for a l l n G IN, and since the inner product is continuous, we have (x,y) = l im (x,yn) > 0. Since x was arbitrary, (x,y) > 0 Tl—>0O for a l l a; G i ? , and therefore y G i f . Finally, to show that 7f is convex, we let u,v G H, t G [0,1], and we notice that (ar, tu+(l- t)v) = t (x, u) + (1 - i ) {x, v) > 0 for a l l x £ H, which implies that tu + (1 — i)u G i ? . • 2.3 Lagrangian Duality and Optimality Conditions We are now in a position to describe the duality theory for the convex program inf{/(a:) | g(x) <H 0} (2.3) where / : E —> IR is convex, g : E —» Y is //-convex, E and Y are Euclidean spaces, and H is a self-dual cone in Y . Recall that IR™, l R m x ™ , and <S™ are a l l Euclidean spaces, and that IR™ and <S™ are both self-dual cones. Therefore this general formulation allows us to apply the results obtained to the many problems of interest i n this thesis, and the proofs of the results are practically identical to the case of semidefinite constraints, which we w i l l be considering for our least squares problems. The majority of this section is drawn from Borwein and Lewis ' book [6, §3.2, § 4 . 3 ] where they present the duality theory of problem (2.3) for the case of Y = IR7™ and H = IR™. Furthermore, the following development is undertaken here because it is believed that it is not to be found elsewhere stated in these terms. 13 Chapter 2. Convex Analysis and Optimality Conditions 2.3.1 Lagrangian duality We begin by introducing the central concept in this section, the Lagrangian function L : E x Y -> IR, defined by L(x;X) :=f(x) + (X,g(x)). 
The immediate power of the Lagrangian function becomes apparent when we notice that by equation (2.1), the self-duality of H implies that i f — g(x) G H, then (A, g(x)) < 0 for al l A G H, and i f — g(x) £ H, we can pick A G H to make (A,g{x)) as large as we want. Thus, , . _ J f(x), ii x is feasible Ae# . \ +oo, otherwise, and we can state our optimization problem as p := inf sup L(x; A), (2.4) xeE Aeff where p G [—oo, +oo] is called the primal value of the primal problem (2.3). It is also useful to quickly note that this same trick can be applied to problems with equality constraints. For example, consider the problem i n f { / ( x ) | g(x) = 0} which w i l l then have the Lagrangian interpretation sup Llx- A) = / / ( : E ) ' l f X i S f e a S i W e A G Y \ +oo, otherwise, where L(x; A) := f(x) + (X,g(x)}. In a sense, this is due to the fact that Y is the dual of the cone {0}. That is, {0}* = Y . Returning to problem (2.3), we define the Lagrangian dual problem as d := sup inf L(x; A), (2.5) where d G [—oo, -f oo] is called the dual value, and notice the following fundamental result. Theorem 2.3.1 (Weak Duality) Defining the primal function : E —> (—oo,+oo] by $(x) := supL{x; A) and the dual function $ : H —> [—oo, +oo) by $(A) := inf L{x;X) we have ^ (x) > $(A) for all x G E and A G H. Moreover, p> d. Proof. Let x G E and A € i f . Then = supL(x;X) > L{x;X) > mi L(x;X) = $ (A) . Since x G E and A G H are arbitrary and independent, p = inf * ( x ) > sup $(A) = d. • 14 Chapter 2. Convex Analysis and Optimality Conditions 2.3.2 Sufficient conditions for optimali ty The usefulness of the Weak Dual i ty Theorem is that i f we find x G E and A G H such that ty(x) = 3>(A) G ( — 00, +00), then we know that x is feasible for (2.3) and attains the minimum. Not only is x opt imal for the primal problem, but A is optimal for the dual problem as well. Furthermore, we notice that 0 = tf(x) - $(A) = f(x)-irdxeE[f(x) + (X,g(x))} ) > f(x)-f(x)-(~X,g(x)) \ (2.6) = (X,-g(x)) > 0, J where the last inequality comes from the self-duality of H. Therefore, equality holds throughout, and we have that x minimizes the function L ( - ; A ) over E and that (A, — g(x)) = 0 (hence (X,g(x)) — 0). This motivates the following definition and theorem. Definition 2.3.2 Let x G E be feasible for the convex program (2.3). We call A G H a Lagrange multiplier for x if x minimizes the function L( •; A) over E and (A, g(x)} = 0. Theorem 2.3.3 (Lagrangian sufficient conditions) If x G E is feasible for the convex program (2.3) and has a Lagrange multiplier A, then x is optimal. Proof. Assuming A is a Lagrange multiplier for x, equation (2.6) shows that ^f(x) = 3>(A), and the optimality of x follows. • 2.3.3 The K K T conditions U p unti l this point, we have not had to use the convexity of the problem (2.3) in the above results. However, using the facts that / is convex, g is iJ-convex, and H is a self-dual cone, we have the following result. Lemma 2.3.4 (Convexity of the Langrangian) Let E and Y be Euclidean spaces, H C Y a self-dual cone, f : E —> JR a convex function, and g : E —> Y an H-convex function. Defining L : E x Y —> IR by L(x;X) = f(x) + (X,g(x)), we have the following results: (a) If / i <H ^ and X G H, then (A,//) < (A, L>). (b) The function L( •; A) : E —> IR is convex for all X G H. Proof. Part (a) follows from the self-duality of H and the following implications: fi <H v =^ v — /i G H {X,u-n)>0 => (A,/i) < (X,v). 15 Chapter 2. 
Convex Analysis and Optimality Conditions For part (b), we let x , t /£E and t G [0,1]. Then L(tx + (1 - t)y; A) = f(tx + (1 - t)y) + (A, g(tx + (1 - t)y)) < tf(x) + (1 - + (A, tg(x) + (1 -= tL(x;\) + (l-t)L(y;\), where the inequality follows from the convexity of / , the if-convexity of g, and part (a) of this lemma. • Therefore, i f we are dealing wi th differentiable functions, and in particular, i f the function L(x; A) is differentiable with respect to x, then any cri t ical point of L( •; A) is in fact a global minimizer of L( •; A) over E . Thus, finding a solution to the following system of equations wi l l give us a feasible point x wi th Lagrange multiplier A, and hence an optimal solution to the convex program (2.3). V x L ( x ; A) = 0 -g(x) >H 0 A >H 0 (\g(x)) = 0 This system of equations is often known i n other settings as the Karush-Kuhn-Tucker (KKT) conditions (for example, [21, p. 328]). In the K K T conditions (2.7), V x denotes the derivative, or gradient, w i th respect to x, where the gradient of a function h : E —> IR at x G E , denoted Vh(x), is defined as the element a G E (if it exists) such that h'(x;d) = (a,d) for all d G E . Here h'(x;d) is the directional derivative of h at x in the direction d, and is defined as h'(x;d) := l im h(x + td)-h{x) i f this l imit exists. Thus, h has a (Gateaux) derivative S7h(x) at x i f h'(x; d) exists for al l d G E and i f h'(x; •) is a linear function on E. If S7h(x) exists for a l l x G E, then we say that h is differentiable. If V / i : E —> E is a continuous function, then we say that h is continuously differentiable. (2.7) Example 2.3.5 Consider the sum of squares function f(X) = ±\\AX - B\\2F for A G B G I R m x p , andX G l R n X p . Letting D G M n X p be some direction, we have f(X;D) = l i m / ( * + ^ ) - / P 0 t-*o+ t \ (A{X + W) - B, A(X + tD)-B)-\ (AX -B,AX- B) = l im i->0+ t t (AX - B, AD) + U2 (AD, AD) = l i m t->o+ t = (AT(AX -B),D). Therefore, Vf(X) = AT(AX - B). 16 Chapter 2. Convex Analysis and Optimality Conditions If we consider f defined over the Euclidean space Sn with A,B G I R m x ™ , then we require V / P O G Sn. Letting D G Sn and Z := AT{AX - B), we have f'(X;D) = (Z,D) and {\{Z + ZT),D) = \{Z,D) + \{ZT,D) = \(Z,D) + \(Z,D) = (Z,D), where the second equality follows from the symmetry of D. Since \{Z + ZT) G <Sn, we find that Vf(X) = ±(Z + ZT). • 2.3.4 Necessary conditions for optimali ty Now we know that i f we are able to solve the K K T system (2.7), then we have solved our optimization problem, but how do we know that the K K T system is i n fact solvable for our problem? That is, how do we know that i f an optimal solution exists, it must have a Lagrange multiplier? To answer this question, we state the Slater constraint qualification for problem (2.3): There exists x G E such that g{x) <// 0. (2.8) It turns out that the Slater condition is enough to guarantee the existence of a Lagrange multiplier for every optimal solution. Theorem 2.3.6 (Lagrangian necessary conditions) Suppose that x G E i s optimal for the convex program (2.3) and that the Slater condition (2.8) holds. Then there is a Lagrange multiplier for x. Before presenting a proof to this theorem, we need to develop some further concepts. The first of these is the value function v : Y —> [—oo, +oo] defined by v{y):=mf{f{x)\g{x)<Hy}, (2.9) so that v(0) is the optimal value of the convex program (2.3). 
We w i l l use the value function in the proof of Theorem 2.3.6 by showing that any subgradient of v at 0 is a Lagrange multiplier for any optimal x, where subgradient is defined as follows: Definition 2.3.7 Let h : Y —> ( — oo, +oo]. We say (f> G Y is a subgradient of h at y if (</>,y-y) < h(y) - h(y), for ally G Y We denote the set of subgradients of h at y as dh(y). For a convex function h : Y —> ( —oo,+oo], where convex here means h is convex on its domain, dom(/i) := {y G Y | h(y) < +oo}, the connection between the gradient and subgradients at a point can be seen in the following theorem. 17 Chapter 2. Convex Analysis and Optimality Conditions Theorem 2.3.8 (Borwein &: Lewis [6, p. 36]) If the function h : Y —> (—oo,+oo] is con-vex then any point y £ core(dom h) and any direction d £ E satisfy h'(y;d) = max{(0,d) | 4> £ dh(y)}. (2.10) In particular, the subdifferential dh{y) is nonempty. Here, the core of a set C C Y is the set of points y £ C such that for al l directions d £ Y we have y + td £ C for al l t > 0 small enough. For extended real valued functions h : Y —> [—oo, +oo] we say h is a convex function i f epi(h) = {(y,r)£YxTR\h{y)<r} is a convex set. Using this extended definition of convexity, we can show that the value function is actually convex. Proposition 2.3.9 The value function defined by (2.9) is convex. Proof. Let ( y i , n ) , ( y 2 , r 2 ) £ epi(v), t £ [0,1], and set (y,r) := « ( y i , n ) + (1 -t)(y2,r2). Then v{yi) < r\-> V{V2) < r2t and we want to show that v(y) < r. First of al l , we notice that v{yi) <n =>• mi{f(x) | g(x) <H yi} < n => 3XJ £ E such that g(xi) <H yi (2-H) for i = 1,2. Now we let x := fcri + (1 — t)x2, and the //-convexity of g implies that g(x) <H tg{x\) + (1 - t)g(x2) <H tyi + (1 - t )y 2 = y, so that v(y) < f(x). This fact and the convexity of / implies that v{v)<tf{Xl) + {l-t)f(x2). Taking the infimum over a l l x\ and x2 satisfying 2.11 gives v{y) < tv(yj) + (1 -t)v(y2) < tn + (1 - t)r2 = r. Therefore, epi(w) is a convex set, and thus v is a convex function. • A s we w i l l see in the proof of Theorem 2.3.6, the convexity of the value function v allows us to use the following result to show that the Slater condition (2.8) implies that v never takes on the value —oo. Lemma 2.3.10 (Borwein & Lewis [6, p. 44]) If the function h : Y —> [—oo, +oo] is convex and some point y £ core(dom/i) satisfies h(y) > —oo, then h never takes the value —oo. 18 Chapter 2. Convex Analysis and Optimality Conditions We are now in a position to prove the Lagrangian necessary conditions. Proof of Theorem 2.3.6. Recal l that we are assuming that x G E is optimal for the convex program (2.3) and that the Slater condition (2.8) holds: 3x G E such that g(x) <H 0. Since x is optimal, we have v(0) = f(x) > —oo. B y showing that 0 G core(domv), we can use Lemma 2.3.10 to show that v : Y —> (—oo,+co]. Thus, we need to show that for al l directions d G E we have 0 + td G dom(t;) for a l i i > 0 small enough. Let t ing d G E we have td G dom(w) <=> v(td) = inf{/(o;) | g(x) <H td} < +oo C(t) := {x G E | g(x) <n td} ^ 0. B y showing that 3e > 0 such that V i G (0,e) we have x G C(t), we shall have proved our claim that 0 G core(domu). First of al l , we wi l l show that 3 k G IN such that x G C ( | ) . Suppose, on the contrary, that x fi C(^) for al l k G IN. That is, the sequence defined by = \d — g(x) lies entirely in E \ H, which implies that — g(x) = l i m G c l ( E \ H) — E \ mt(H), wi th the last equality following k—»oo from the fact that H is closed. 
This gives us our contradiction, since — g(x) G 'mt(H). Lett ing e ^ , where x, G C(jjr), we w i l l now show that x G C(t) for a l l t G (0, e). This can be achieved by showing that the set T := {t > 0 | x G C(t)} is convex, and using the fact that 0, e G T. Let t i , t2 G T and a G [0,1]. We want to show that t := od\ + (1 — a)*2 G T . Since U G T ^ £ G C(<i) #(£) <H Ud, for « = 1,2, we have td — g(x) = at\d + (1 — a)t<id — g(x) = afad - g(x)) + {1 - a){t2d - g{x)) G H. Therefore, g(x) <H td, which implies t G T . Hence, T is convex, finally proving that 0 G core(domt;). Thus, by Lemma 2.3.10, we have v : Y -» (—oo, +oo], so we can apply Theorem 2.3.8 to get the fact that dv(0) ^ 0, which implies the existence of a subgradient —A G Y of v at 0. It remains to show that A is a Lagrange multiplier for x. That is, A G H, x minimizes L( •; A), and (\,g(x)) = 0 . We begin by showing that A G H, which, by the self-duality of H, is equivalent to showing ( A , y ) > 0 for a l l y G H. Let t ing y G H, we have that g(x) <H 0 implies g(x) <H y. Thus, v(y) = i n f { / ( x ) | g{x) <H y} < mi{f(x) | g(x) <H 0} = v(0). 19 Chapter 2. Convex Analysis and Optimality Conditions Since - A G dv(0), Definition 2.3.7 shows (-X,y - 0) < v(y) - v{0), so v{0) < v{y) + (X,y) < v(0) + (X,y), giving us (A, y) > 0 for al l y G H. To show that x minimizes L( •; A) and (A, g(x)) — 0, we again use the definition of subgra-dients, this time to get that ( — X,g(x) — 0) < v(g(x)) — v(0), so that v(0) < v(g(x)) + (X,g(x)) = i n f { / ( x ) | xeB}+(\,g(x)) < f(x) + (\,g(x)) for a l l x G E . Since x is optimal, we also have v(0) — / ( x ) , which implies that f(x) < f(x) + (X,g(x)) for a l l x G E . Using this fact, we find that f(x) < f(x) + (\,g(x)), so we get (\,g(x)) > 0. O n the other hand, (\,g(x)) < 0 since —g(x),X G H. Therefore, (A,g(x)) = 0, which implies that L(x;X) = f(x) + (X,g(x)) = m < f{x) + (\,g{x)) = L(x;X). Therefore, A is a Lagrange multiplier for x. • 2.3.5 Summary We wi l l now summarize the preceding results and introduce some new terminology which wi l l be used to guide our discussion. For our purposes here, we w i l l be assuming that the Lagrangian function is differentiable with respect to x. We first introduce the slack vector, s G Y , which we use to "add slack" to the inequality in problem (2.3), giving us the equivalent problem inf f{x) s.t. g(x) + s = 0, (2.12) S >H 0, sup L(x; X) s.t. VxL(x;X) = 0, (2.13) X>H 0. W i t h this slack vector, the K K T conditions for optimality can now be stated as V x L ( x ; A ) = 0, X>H0, g(x) + s = 0, s > H 0 , } (2.14) (X,s) = 0, with associated dual problem 20 Chapter 2. Convex Analysis and Optimality Conditions where the first two lines can be understood as dual feasibility and primal feasibility, respectively, and the last condition is often coined as complementary slackness. We say that a point (x, s, A) G E x Y x Y is primal-dual feasible if g(x) + s = 0, WxL(x; X) = 0, and s, A ># 0. Given a primal-dual feasible point (x, s ,A) , we know from Theorem 2.3.1 (Weak Duali ty) that the value of the objective function of the pr imal is always greater or equal to that of the dual, wi th equality implying that (x, s) is optimal for the pr imal problem and A is optimal for the dual problem. Let t ing (i{x,s, X) represent this duality gap, we find that n(x,s,X) := f(x) — L(x; X) = f(x) — f(x) — (X,g(x)) (2.15) = (X,-g(x)) = (X,s) > 0 wi th n(x, s, X) = 0 implying that (x, s, X) is optimal for problems (2.12) and (2.13). 
Further-more, i f (x, s) is optimal and the Slater condition holds for problem (2.12), then there exists A G Y such that (x, s, A) is primal-dual feasible and fi(x, s, A) = 0. Finally, we summarize these results i n the following theorem. T h e o r e m 2 . 3 . 1 1 Suppose the Lagrangian function for problem (2.12) is differentiable with respect to x and that the Slater condition (2.8) holds. Then x G E is optimal if and only if there exist points s, A G Y such that (x, s, A) is primal-dual feasible and fi(x, s, A) = 0, which is equivalent to (x,s,X) satisfying the KKT conditions (2.14). 2.4 Application to Semidefinite Constrained Least Squares We can now apply the results of the previous section to the three problems of interest in order to state the conditions necessary and sufficient for optimality. 2.4.1 T h e s y m m e t r i c semidef ini te least squares p r o b l e m ( S D L S ) A s we mentioned in Chapter 1, we would like to find a solution to the matrix equation AX = B, where A,BE M m x n and X G Sn, i n the least squares sense, but where we require that X be positive semidefinite. The S D L S problem can be stated formally as inf l\\AX-B\\F s.t. XhO [ ' which is a special case of problem (2.3) wi th E = Y = Sn, H = 5", f(X) = ^ | | A X " - B\\F, and g(X) = —X. We can use Proposit ion 2.2.2 to show that / is a convex function over <S", and we note that since g is clearly a linear function, it is necessarily if-convex. For this problem the Lagrangian function L : Sn x Sn —> IR is defined by L(X;A) = f(X) + (A,g(X)) = i\\AX-B\\2F-(A,X), 21 Chapter 2. Convex Analysis and Optimality Conditions with X derivative V * L ( X ; A) = \{Z + ZT) - A , where Z = AT(AX - B). Since the Slater condition (2.8) clearly holds for this problem (for example, X — I), we have that X G Sn is optimal for (2.16) i f and only i f there exists A G Sn such that the following K K T conditions are satisfied: AT(AX-B) = Z, \{Z + ZT) = A , (A,X) = 0, X, A h 0. We now state the following important fact, which is found as an exercise in [6, p. 108]. Proposition 2.4.1 (Semidefinite complementarity) If X,Y >t 0, then the following are equivalent: (i) (X,Y) = 0, (ii) XY = 0, (iii) \{XY + YX) = 0. Proof, (i) (ii): To show that (X,Y) = t r ( X Y ) = 0 implies XY = 0, we follow Todd's advice [27, p. 523] by considering the eigenvalue decomposition of X, X — UAUT, where U is an orthonormal matrix (UUT — I) and A is a diagonal matrix wi th the eigenvalues of X, along its diagonal. We first notice that ^ - - • ^ A r !> Xr+i — • • • — A n — 0, 0 = tr{XY) = tr(UAUTY) = tr{AUTYU) = (A , UTYU) n = £ Xi(UTYU)i Since Y is positive semidefinite, so is UTYU, and thus [UTYU)u > 0 for i = 1 , . . . , n. This implies that A, (UTYU)a — 0 for i — 1 , . . . , n , which tells us that (UTYU)a = 0 for i — 1 , . . . , r. Since UTYU is positive semidefinite wi th zeros along the diagonal in the first r positions, we must have " 0 0 UTYU = o y 22 Chapter 2. Convex Analysis and Optimality Conditions for some Y £ <S™ r . Let t ing A = D i a g ( A i , . . . , Xr), we also have A = A 0 0 0 and so AUTYU = 0, which in turn implies that XY = U{AUTYU)UT = U • 0 • UT = 0. (ii) =» (iii): Clearly, i f X Y = 0, then \(XY + YX) = \(XY + (XY)T) = 0. (iii) =» (j): This follows from the fact that tr(\(XY + YX)) = ti(XY). • Therefore, the K K T conditions for problem (2.16) are equivalent to the following nonlinear system of equations. 
AT(AX-_B) = Z \(Z + ZT) = A A X = 0 x, A y o It should be noted that although such a Lagrange multiplier type characterization of optimality was stated by Woodgate in [33], it was not stated using the Lagrange multiplier matrix A . For comparative purposes, we provide this characterization of Woodgate: (2.17) where L(X) . Y e 5+ and L(X) £ <S", := ATAX + XATA — ATB — BTA, := {X 6 SI | tr(XL(X)) = 0}. • We can clearly see why Woodgate's characterization of optimality is equivalent to (2.17), how-ever, as we w i l l see, using (2.17) w i l l be highly useful when discussing the interior-point algo-rithms of Chapter 3. Now for a note on whether or not the S D L S problem actually has an optimal solution. We are concerned about the existence of an optimal solution because the usefulness of the above K K T characterization of optimality only applies if such a solution exists, and, as we see from the following example, this is not always the case. Example 2.4.2 (Woodgate's counterexample, [4]) Consider the SDLS problem with A=[l 0 ] , B = [ - 1 1 ] , and X a b b c Then f(X) = i | | A X " - B\\2F = i[(a + l ) 2 + (b - l ) 2 ] and X h 0 if and only if a > 0 and ac — b2 > 0. This implies that f(X) > | ( a + l ) 2 > \ . Furthermore, if Xn — r I 1 n 1 n then Xn y 0 for all n 6 IN and l im f(Xn) = l i m k(-\ + l ) 2 = i . Therefore, the optimal value n—>oo n—>oo " is p = \ , but as we will now show, this value is not attained. 23 Chapter 2. Convex Analysis and Optimality Conditions Suppose X y 0 and f(X) = | [ (a + l ) 2 + (b — l ) 2 ] = \ . Since a > 0, it must be the case that a = 0 and b = 1, which implies that ac — ti2 = — 1 ^ 0, contradicting the assumption that X is positive semidefinite. Therefore, no X £ 5 2 satisfies f(X) = 5. • Originally, in [3, Theorem 3.1], Allwright claimed that the minimum in the S D L S problem always exists. However, after the discovery by Woodgate of the above counterexample, they jointly produced an erratum and addendum paper [4] where they point out the error in the proof of that theorem, and correct the statement to say that the minimum i n the S D L S problem exists when the matrix A has full column rank. Using the following theorem of Weierstrass, we w i l l present an alternate proof of this result. Theorem 2.4.3 (Weierstrass, [6, p. 4]) Suppose that the set D C E is nonempty and closed, and that all the sub-level sets of the continuous function f : D —> IR are bounded. Then f has a global minimizer. Here, the sub-level sets of a function / : D —> IR are defined to be the family of sets {x£D\ f(x) < a} (2.18) for a G IR. Lemma 2.4.4 Let f : TRnxp -> IR be the sum of squares function f{X) = ^ | | A X — B\\p, with A G H m x " and B G I R m x p . Then f has bounded sub-level sets if and only if A has full column rank. Proof. Since / is always nonnegative, we need only consider sub-level sets C{a) := {X G 1R" X P I f{X) < a} for a > 0. Let us first show that i f A does not have full column rank, then some of the sub-level sets of / are not bounded. For example, i f a > ^\\B\\F, we wi l l show that C(a) is unbounded. Since rank(A) < n, there exists a nonzero X G I R " x p such that AX — 0. Furthermore, letting t > 0, we also have A(tX) = 0, which implies that f(tX) = ^\\B\\F < a, and so tX G C(a) for a l l t > 0. However, since \\X\\p > 0, we find that l im = l im = +00. Therefore, C(a) is unbounded. Now suppose that A has full column rank. Let a > 0 and X G C(a). Then | | | A X — B\\p < a, which implies that \\AX — B\\p < \/2a. 
This in turn implies that | | A X | | F = \\AX-B + B\\F < \\AX - B\\F + \\B\\F < y/2a+\\B\\F, 24 Chapter 2. Convex Analysis and Optimality Conditions and so tr(XTATAX) = \\AX\\2F < /? := (V2a+\\B\\F)2. Since A has full column rank, the n x n symmetric matrix ATA is positive definite and has eigenvalue decomposition A1 A = QTAQ, where Q is orthonormal and A is diagonal wi th the eigenvalues of ATA, A i > • • • > A n > 0, along its diagonal. Let t ing Y = QX (then X = QTY and \\Y\\F = ||-X"||.p), we have P > ti(XTATAX) = t r ( Y T A Y ) = t r ( A Y Y r ) n = £ \i(YTY)ii i=i n > J2Xn(YTY)u i=\ A „ t r ( Y J Y ) = Xn\\X\\2F, which implies that \\X\\F < M := V p / A ^ , and so C(a) is bounded. • Therefore, i f A has full column rank, then f{X) — \\\AX - B\\2F must also have bounded sub-level sets over any subset of TRnxp. Since <S™ is nonempty and closed, and / is a strictly convex and continuous function, Theorem 2.4.3 gives us the desired result. T h e o r e m 2.4.5 If A has full column rank, then the SDLS problem has a unique optimal solu-tion. 2.4.2 T h e n o n s y m m e t r i c semidef in i te least squares p r o b l e m ( N S - S D L S ) It is sometimes desirable to find the least squares solution to the matrix equation AX = B, where A,B £ JRmxn, X £ IR™*™, and requiring that X be "positive semidefinite" in the sense that vTXv > 0 for a l l v £ IR™. Since vT(l(X+XT))v = \vTXv + \vTXTv = \vTXv + \vTXv = vTXv, this requirement turns out to be equivalent to requiring that the symmetric part oi X, \{X + XT), be positive semidefinite. This specific problem, which was the starting point for this thesis, does not seem to have been previously studied. We can state the N S - S D L S problem as follows. inf \\\AX-B\\2F s.t. l(X + XT)hO { y j Here we have the instance of problem (2.3) where E = I R " X " , Y = <Sn, H = 5?, f{X) = \\\AX - B\\2F, and g(X) = -\{X +XT). Proposit ion 2.2.2 states that / is convex over M" x ™, and g being a linear function implies that it is an TJ-convex function. 25 Chapter 2. Convex Analysis and Optimality Conditions This problem then has the Lagrangian function L : ] R n x n x 5 " -> IR defined as L(X;A) = \\\AX-B\\F - (A,\{X + XT)) = \\\AX-B\\F-\(A,X)-\(A,XT) = \\\AX-B\\F-(A,X), and therefore satisfies VXL{X; A) = AT{AX - B) - A. Thus, wi th Proposit ion 2.4.1 in mind, and given that X = I implies that the Slater condition holds, we find that X G 1R™X" is optimal for the N S - S D L S problem i f and only if there exists a slack matrix S G «Sn and a Lagrange multiplier matrix A G <S" such that the following K K T conditions hold. AT(AX — B) = A } 5, A >: 0 J Again , we can use Theorem 2.4.3, Lemma 2.4.4, and the fact that the set of feasible points, D := {X G I R " X " | \{X + XT) h 0}, is nonempty and closed to get the following result. T h e o r e m 2.4.6 If A has full column rank, then the NS-SDLS problem has a unique optimal solution. 2.4.3 The linear matrix inequality least squares problem ( L M I - L S ) The th i rd problem under consideration is actually a generalization of the first two problems. Here we seek a least squares solution to the linear equation Ax = b, where A G I R m x " , x G IR", b G I R m , and we require that x satisfies a linear matrix inequality, i.e., that a linear combination of symmetric matrices be positive semidefinite. More specifically, the L M I - L S problem is described as inf \\\Ax-b\\l ( 2 2 1 ) where s.t. tCx -< C ICx = XiKi i=l and Ku...,Kn,C G Sk. 
In terms of problem (2.3), we have E — IR", Y = Sk, H = <S* f(x) = ^\\Ax — &H2, and g(x) = ICx — C. The constraints here are defined by an affine function (a linear function plus a constant), which satisfies g(tx + (1 — t)y) = tg(x) + (1 — t)g(y) for al l x,y G IR" and t G [0,1]. Therefore, g is an if-convex function, and the convexity of / follows from Proposit ion 2.2.2. 26 Chapter 2. Convex Analysis and Optimality Conditions To show why this formulation is equivalent to the previous two problems, consider, for example, the N S - S D L S problem wi th n = 2. Then X = Xi x 3 X2 Xi and we have the constraints Ux + xT) = 1 (x2 + s3) X l 2 i ( x 2 + x 3 ) x 4 y 0, which is equivalent to the constraints of the L M I - L S problem, with x = ( x i , x 2 , X 3 , X i ) T and 0 -I - 1 0 0 0 , K2 = K3 = 0 KA 0 0 0 - 1 , and C If the matrices associated wi th the N S - S D L S problem are A and B = bi b2 0 0 0 0 , then setting A = A 0 0 A and b bi b2 implies that ^||^4x — b\fe = ^\\AX — B\\F. In a similar manner, it can also be shown that the S D L S problem is a special case of the L M I - L S problem. The observation that this third problem generalizes the first two problems already gives us more than enough justification for studying it. However, a further reason to study the L M I - L S problem is that its constraints are the same as those arising in the area of (linear) semidefinite optimization (see the recent survey paper by Todd, [27]). A semidefinite programming problem (SDP) is described as inf (C, X) s.t. K*X = d, (2.22) x y 0 where K.*X = ((Ki,X))n , and whose dual problem is described as sup dTX s.t. K\ < C. (2.23) It is also instructive to note that the K K T conditions for the S D P problem are described as (2.24) 0 + 5 = C, K.*X_ = d, XS = 0, S,X y 0. The linear functions K : IR™ —> Sk and K* : Sk —» IR™ described here are adjoints of each other, in that (A, /Cx) = (AC*A, x) for al l A 6 Sk and x G IR™. This is in fact easy to show: n (A,/Cx) = ^ ( A . t f i ) i = l = x r (AC*A) = (AC*A,x). 27 Chapter 2. Convex Analysis and Optimality Conditions The fact that the functions K. and /C* are adjoints allows us to show that the Lagrangian function for problem (2.21), L : IR™ x S f c - > ] R , satisfies L(x;A) = \\\Ax-b\\l + (A,K,x-C) = ±\\Ax-b\\% + {lCmA,x)-(A,C), which in turn implies that VxL{x; A) = AT(Ax - b) + K.*A. Unlike the problems S D L S and N S - S D L S , the Slater condition (2.8) does not always hold for the L M I - L S problem. However, if the Slater condition does hold for this problem, then we have that x £ H™ is optimal i f and only i f there exist S, A £ Sk such that the K K T conditions (2.25) hold. AT(Ax -b) + IC*A = 0 ICx + S = C AS = 0 S, A y 0 For example, i f the set of matrices {K\,..., Kn} spans Sk (i.e., span{Ku...,Kn} = Sk), then the function K. maps IR™ onto Sk. In this case, the Slater condition (2.8) clearly holds: {x £ W1 | ICx -< C} ^ 0. For this more general problem, we have to be a little more careful when stating the existence theorem. The set of feasible points, D := {x £ JRn | ICx z< C}, can be shown to be closed using the fact that /C is continuous and Sk is closed. However, we do not know i f D is nonempty. These facts give rise to the following theorem. Theorem 2.4.7 If A has full column rank and the set of feasible points is nonempty, then the LMI-LS problem has a unique optimal solution. Again we notice that a useful assumption which guarantees that the set of feasible points is nonempty is to have span{.K"i , . . . ,Kn} = Sk. 
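To make the n = 2 reduction described above concrete, the following sketch builds one consistent choice of the matrices K1, ..., K4, C, A-tilde, and b-tilde; the displayed matrices in the text are partly garbled, so the particular entries below should be read as a reconstruction rather than as the thesis's own display. With x taken to be the column-stacked entries of X, it verifies numerically that the LMI constraint encodes positive semidefiniteness of the symmetric part of X and that the two least squares objectives coincide.

```python
import numpy as np

# Sketch of the n = 2 reduction of NS-SDLS to LMI-LS; the K_i, C, A~, b~ below
# are one consistent reconstruction, with x = (x1, x2, x3, x4) the column-stacked
# entries of X.
K1 = np.array([[-1.0,  0.0], [ 0.0,  0.0]])
K2 = np.array([[ 0.0, -0.5], [-0.5,  0.0]])
K3 = K2.copy()
K4 = np.array([[ 0.0,  0.0], [ 0.0, -1.0]])
C  = np.zeros((2, 2))
Ks = [K1, K2, K3, K4]

def K_op(x):
    """The linear map K(x) = sum_i x_i K_i."""
    return sum(xi * Ki for xi, Ki in zip(x, Ks))

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 2))
x = X.flatten(order="F")               # x = (X11, X21, X12, X22)

# The LMI  K(x) <= C  encodes  0.5*(X + X^T) >= 0, since C - K(x) is that symmetric part.
print(np.allclose(C - K_op(x), 0.5 * (X + X.T)))        # True

# Objective equivalence with A~ = blkdiag(A, A) and b~ = vec(B).
A = rng.standard_normal((3, 2)); B = rng.standard_normal((3, 2))
A_tilde = np.kron(np.eye(2), A)
b_tilde = B.flatten(order="F")
print(np.isclose(np.linalg.norm(A_tilde @ x - b_tilde) ** 2,
                 np.linalg.norm(A @ X - B, "fro") ** 2))  # True
```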
More on the assumption span{K"i , . . . , Kn} = Sk Twice now, we have found the assumption sp<m{K1,...,Kn} = Sk (2.26) to be very useful. First of al l , when assuming that A has full column rank, assuming (2.26) implies that the L M I - L S problem has a unique optimal solution. Secondly, assuming (2.26) implies that optimality is characterized by the K K T conditions (2.25). We now take some time to investigate some other useful implications of assuming (2.26). We begin by giving the following definition. 28 Chapter 2. Convex Analysis and Optimality Conditions Definition 2.4.8 Let E and Y be Euclidean spaces, and let T : E -> Y be a linear function. We define the null space of T as null T = {x G E | Tx = 0} and the range ofT as range T = {y G Y | Tx = y, for some x G E}. Given a subset fl o /E, we define the orthogonal complement of fl as flx = {x G E | (x,v) = 0, for all v G fl}. We immediately have the following familiar result from linear algebra. Proposition 2.4.9 Let 1C : ] R n -> Sk be defined as in the LMI-LS problem (2.21). Then null(K.*) = (range AC) X . Proof. The result follows from the following series of implications. t7Gnull( /C*) AC*[/ = 0 <S> (K*U,x) = 0, for a l l x G E T (U, Kx) = 0, for all x G IR™ ^ (U, V) = 0, for all V G range AC O i7 G (range AC)"1 • We wi l l find the following corollary to the preceding result to be extremely useful in the upcoming discussion. Corollary 2.4.10 Let K, : IR™ -» Sk be defined as in the LMI-LS problem (2.21). Assume that (2.26) is true; that is, s p a n { i f i , . . . , Kn} = Sk. Then \k(k + 1) < n . Moreover, if A G Sk is nonzero, then AC* A ^ 0. Proof. It is clear that the dimension of s p a n j i f i , . . . , Kn} can be at most n. However, by assumption, the dimension of s p a n j i ^ i , . . . , Kn} equals \k(k + 1), which gives us our first claim. B y the definition of AC, we have range AC = span{ i l " i , . . . ,Kn}. Thus, range AC = Sk, which implies that null(AC*) = (range AC)X = {0}. Therefore, i f A G Sk is nonzero, then A £ null(AC*), which implies that AC*A ^ 0 . • 29 Chapter 2. Convex Analysis and Optimality Conditions 2.5 The Semidefinite Linear Complementarity Problem A fantastic connection between our three semidefinite least squares problems was discovered while researching the interior-point methods which we wi l l be using to compute solutions to our problems (see Chapter 3). It turns out that each of our three problems can be stated as a semidefinite linear complementarity problem ( S D L C P ) , which was first introduced by Ko j ima , Shinoh, and Hara [13] as an extension of the L C P , or (monotone) linear complementarity problem (2.27), and as a generalization of the SDP, or semidefinite programming problem (2.22). For comparative purposes, we state the definition of the L C P (see [31, Chap. 8]), which is the problem of finding vectors x and y in IR™ such that y = Mx + q, x>0, y > 0, and xTy = 0, (2.27) where q is a vector i n IR™ and M is an n x n positive semidefinite (but not necessarily symmetric) matrix; i.e., uTMu > 0 for al l u G IR™. O n the other hand, the S D L C P is defined as the problem of finding matrices X and Y in J?" such that (X,Y)eC, XhO, YhO, and (X,Y)=0. (2.28) Here £ is a monotone ^n(n+ l)-dimensional affine subspace of <S™ x Sn, where monotone means that £ satisfies (X1 -X,Y' -Y)> 0, for a l l (X',Y'), (X,Y) G £ . (2.29) If strict inequality holds in (2.29) for al l distinct pairs (X',Y') and ( X , V ) G £ , then we say that £ is strictly monotone. 
Furthermore, we say that a monotone subset A of <S™ x <S™ is maximal i f there is no monotone subset of <S™ x Sn which properly contains A. The following result from a paper by Shida, Shindoh, and K o j i m a [26] provides insight into the need for £ to have dimension |n(n + 1) = dim<S™. Lemma 2.5.1 ([26, p. 388]) Let E be a Euclidean space. Suppose £ is a monotone affine subspace o / E x E . Then £ is maximal if and only if d im £ = d i m E . The next result, which is a slightly modified version of a result from the same paper, illustrates the connection between the definitions of the L C P and the S D L C P , and w i l l be a useful tool for showing that the S D L S (2.16), N S - S D L S (2.19), and L M I - L S (2.21) problems can each be stated as an S D L C P (2.28). Corollary 2.5.2 ([26, p. 389]) Let E be a Euclidean space. Suppose that £ is an affine subspace o / E x E given by £ = {(x,y) G E x E | y = Mx + q], (2.30) where M. : E —> E is a linear function and q G E . Then £ is monotone and maximal if and only if Ai is positive semidefinite; i.e., (u,M.u) > 0 for every t i £ E . 30 Chapter 2. Convex Analysis and Optimality Conditions We conclude wi th some further notation. We define the set of feasible solutions, the set of interior feasible solutions, and the set of solutions of the S D L C P (2.28) as c+ = {(x,Y) €£ | x y o, Y y o}, (2.31) C++ = {{X,Y) eC\XyO, YyO}, (2.32) £* = {(X,Y)eC+ I (X,Y) = 0}, (2.33) respectively. Thus we can state the S D L C P as the problem of finding a pair of matrices (X, Y) i n £* . 2.5.1 T h e S D L S p r o b l e m is an S D L C P Recall that the semidefinite least squares (SDLS) problem (2.16) is defined as inf s.t. \\\AX B\\F x yo where A, B G TRmxn and X G Sn. We were able to show in Section 2.4.1 that the S D L S problem can be solved by finding symmetric matrices X and A satisfying the following K K T equations: AT{AX -B) = (X,A) = x, A y z, A , 0, 0. (2.34) Lett ing M : Sn ->• Sn and Q G Sn be defined as MX = \(ATAX + XATA), and Q = —\(ATB + BT A), and letting £ = {{X,A) G 5n x Sn | A = MX + Q), we see that (X, A) satisfies the K K T equations (2.34) i f and only i f (X, A) satisfies [X,A)eC, XhO, AyO, and <X,A) = 0. (2.35) Since M is clearly linear, by showing that M is positive semidefinite, we can use Lemma 2.5.1 and Corollary 2.5.2 to conclude that the S D L S problem (2.16) is indeed an S D L C P (2.28). To show that M is positive semidefinite, we first let U G Sn. Then (U,MU) = (U,±(ATAU + UATA)) = \[tr{UATAU)+tx{UUATA)] = ±[tr(UATAU)+tr(UATAU)} = \\AU\\2F > 0, where the third equality follows from the fact that tr((VT) = tr(VU) for a l l V G l R n x n (see [27, p. 521]). Thus, M is positive semidefinite. We also notice that i f A has full column rank, and U is nonzero, then the last inequality would be strict, which shows that M is positive definite in this case. 31 Chapter 2. Convex Analysis and Optimality Conditions 2.5.2 The N S - S D L S problem is an S D L C P We stated the nonsymmetric semidefinite least squares (NS-SDLS) problem (2.19) as inf l\\AX-B\\2F s.t. \(X + XT) h 0 ' where A,B € ] R m x ™ and X G IR™*™. Recall that in Section 2.4.2, we showed that we can solve the N S - S D L S problem by finding symmetric matrices X and A satisfying the following K K T equations: AT(AX-B) = A, \ \{X + XT) = 5 , (A, S) = 0, A , s y o. 
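The construction just given is easy to verify numerically. The sketch below builds the map MX = (1/2)(A^T A X + X A^T A) and the matrix Q for randomly generated (hypothetical) data A and B, and confirms the identity <U, MU> = ||AU||_F^2 used above to show that M is positive semidefinite.

```python
import numpy as np

# Sketch of the SDLS-to-SDLCP map on S^n (random data used purely as a test case).
rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
G = A.T @ A

def M_op(X):
    """M X = 0.5*(A^T A X + X A^T A)."""
    return 0.5 * (G @ X + X @ G)

Q = -0.5 * (A.T @ B + B.T @ A)

# Monotonicity check: <U, M U> = ||A U||_F^2 >= 0 for symmetric U.
U = rng.standard_normal((n, n)); U = 0.5 * (U + U.T)
lhs = np.trace(U @ M_op(U))
rhs = np.linalg.norm(A @ U, "fro") ** 2
print(np.isclose(lhs, rhs), lhs >= 0)
```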
(2.36) To show that the K K T equations (2.36) can be stated as an S D L C P (2.28), we assume that A has full column rank and let M : Sn —> Sn and Q G Sn be defined as MA = | ( G _ 1 A + A G - 1 ) , and Q = ^(G~1ATB + BTAG~1), where G = ATA is positive definite. Lett ing C = {(A, S) G Sn x Sn | S = MA + Q}, we see that the equations (2.36) imply ( A , S ) G £ , A ^ O , ShO, and (A, S) = 0. (2.37) Conversely, i f (A, S) satisfy (2.37), defining X := G _ 1 ( A + ATB) gives us a solution of (2.36). Aga in we see that M is clearly linear, and by showing that M is positive definite, we can use Lemma 2.5.1 and Corollary 2.5.2 to conclude that the N S - S D L S problem (2.19) is also an S D L C P (2.28). To show that M is positive definite, we first let U G Sn be nonzero. Then (U,MU) = (U^iG^U + UG-1)) = ^[ tr^G- ' f /J+tr f f /PG- 1 ) ] = triUG^U) = \\VG^U\\2F > 0, where we used the same trick as before to get the third equality, and the inequality follows from the fact that G _ 1 is positive definite, and thus VG_1 is positive definite. 2.5.3 The L M I - L S problem is an S D L C P A s before, we state the linear matrix inequality least squares problem (2.21) as, inf i | | A B - 6 | | | s.t. K,x<C ' 32 Chapter 2. Convex Analysis and Optimality Conditions where ICx = y^yXjKj, i=i and Ki,..., Kn, C G Sk. Here we also have A G R m x " , b G IR™, and x G IR™. In this context, we make the following assumptions: 1. A has full column rank; and 2 . the set of strictly feasible points, {x G IR™ | ICx -< C}, is nonempty. Under these assumptions, we know from Section 2 . 4 . 3 that there is a unique optimal solution to the L M I - L S problem, and that to find this solution we need only find a vector x G IR™ and matrices S, A G Sk which satisfy the following K K T equations: AT{Ax-b) + K*A = 0, 1 ICx + S = C, ( A , 5) = 0, A , s y o. ( 2 . 3 8 ) We would now like to show that these K K T equations ( 2 . 3 8 ) can be stated as an S D L C P ( 2 . 2 8 ) . We begin by rewriting the first equation of ( 2 . 3 8 ) as x = G-l{ATb-IC*K), where G = ATA, and substituting into the second equation of ( 2 . 3 8 ) to get S = KG^Vk + {C- KG~lATb). Thus, i f we define M : Sk -» Sk and Q G Sk as MA = / C G " 1 / C * A , and Q = C - ICG~lATb, and let £ = { ( A , S) G Sn x 5 " | S = MA + Q}, then a solution of the K K T equations ( 2 . 3 8 ) gives us a pair ( A , S) which satisfies ( A , S) G C, A y 0, S y 0, and ( A , S) = 0. ( 2 . 3 9 ) Conversely, i f (A, 5) satisfy ( 2 . 3 9 ) , defining x := G~1{ATb - K* A) gives us a solution of ( 2 . 3 8 ) . Since M is composed from linear functions, it too is linear. Now we would like to show that M is also positive semidefinite. Lett ing U G Sk, we have (U,MU) = (U,KG~lK*U) = {K*U,G~lIC*U) = vTG~1v > 0, 3 3 Chapter 2. Convex Analysis and Optimality Conditions where v = K*U G IR™. Thus, M is positive semidefinite. Also, i f we assume that s p a n j ^ a , . . . , ^ } = Sk, then having U ^ 0 implies that v 0 (see Corollary 2.4.10), which then makes the inequality above strict by the fact that G~l is positive definite; therefore, under this assumption, M. is positive definite. Since M. is positive semidefinite, we know from Lemma 2.5.1 and Corollary 2.5.2 that C is indeed a monotone \k(k + l)-dimensional subspace of Sk x Sk, showing that we have stated the L M I - L S problem (2.21) as an S D L C P (2.28). 
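As a numerical sanity check of the LMI-LS construction, the sketch below builds the map K, its adjoint K*, and M(Lambda) = K G^{-1} K*(Lambda) from randomly generated symmetric matrices K_i (hypothetical data, not an instance from the thesis), and verifies both the adjoint identity <Lambda, Kx> = <K* Lambda, x> and the formula <U, MU> = v^T G^{-1} v with v = K*U used above.

```python
import numpy as np

# Sketch of the LMI-LS-to-SDLCP map M(Lam) = K G^{-1} K*(Lam), G = A^T A,
# with random K_i used purely as hypothetical data.
rng = np.random.default_rng(3)
n, k, m = 5, 3, 7
Ks = [0.5 * (Ki + Ki.T) for Ki in rng.standard_normal((n, k, k))]
A = rng.standard_normal((m, n))
G_inv = np.linalg.inv(A.T @ A)

def K_op(x):          # K x = sum_i x_i K_i
    return sum(xi * Ki for xi, Ki in zip(x, Ks))

def K_adj(Lam):       # K* Lam = (<K_i, Lam>)_i
    return np.array([np.trace(Ki @ Lam) for Ki in Ks])

def M_op(Lam):
    return K_op(G_inv @ K_adj(Lam))

x = rng.standard_normal(n)
U = rng.standard_normal((k, k)); U = 0.5 * (U + U.T)
print(np.isclose(np.trace(U @ K_op(x)), K_adj(U) @ x))      # adjointness
v = K_adj(U)
print(np.isclose(np.trace(U @ M_op(U)), v @ G_inv @ v))     # <U, M U> = v^T G^{-1} v >= 0
```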
34 Chapter 3 Interior-Point Methods and Algorithms 3.1 Overview Based on the results of Chapter 2 that describe when the S D L S (2.16), N S - S D L S (2.19), and L M I - L S (2.21) problems have a unique optimal solution, we w i l l hence forth assume that the coefficient matrix A in these problems has full column rank and that {x € IR™ | Kx -< C} ^ 0 in the L M I - L S problem. Using these facts, we are now interested in describing algorithms to generate a close approximation to the unique optimal solution of each of these problems. According to Nocedal and Wright [21, p. 7], the basic idea behind al l optimization algo-rithms is to start at some ini t ia l guess and iterate through a sequence of points unti l we are "close enough" to an optimal solution of the problem. Many of these optimization algorithms can be seen as following the framework provided i n Figure 3.1; i n each iteration they first choose a direction i n which to move and then they choose the distance to travel. Interior-point methods for constrained convex optimization problems do exactly this, wi th the additional requirement that a l l iterates remain in the interior of the feasible region. Depending on whether the iterates are pr imal feasible points, dual feasible points, or primal-dual feasible points, an interior-point method w i l l be classified as a pr imal method, a dual method, or a primal-dual method. We also find that the most important interior-point methods fall under the classes of path-following methods or potential-reduction methods. A l g o r i t h m 3.1 G i v e n an ini t ia l point i £ E . W h i l e x is not "close enough" to an optimal solution, do 1. F i n d a suitable direction d £ E . 2. F i n d a suitable step length 9 > 0. 3. Update x := x + Od. E n d Figure 3.1: A framework for many optimization algorithms 35 Chapter 3. Interior-Point Methods and Algorithms The path-following methods for convex programming problems use a path defined in the feasible region that traces a route to the optimal solution, and, at the same time, stays as far away from the boundary of the feasible region as possible. B y following this path, the iterates are guided toward the solution of the problem before getting too close to the boundary. This technique is used because it was noticed that if iterates are st i l l far from the solution, but close to the boundary, then taking steps toward the solution can be hindered by the requirement that al l iterates remain in the interior of the feasible region. This situation can force us to choose very small step lengths, making the overall progress of the algorithm extremely slow. See [21, p. 398] for a description of this behaviour in the case of linear programming. Potential-reduction methods take a different approach by defining a potential function that is small both at the optimal solution and when we are far away from the boundary. B y choosing directions and step lengths wi th the purpose of reducing the potential function at each step, our iterates wi l l tend toward the solution of the problem while staying away from the boundary of the feasible region. See [30] for a description of a potential-reduction method used for solving the S D P problem (2.22). For our purposes here, we wi l l only be considering primal-dual path-following methods for solving the S D L S , N S - S D L S , and L M I - L S problems. 
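The framework of Figure 3.1 translates almost directly into code. The following is a minimal Python rendering of Algorithm 3.1 in which choose_direction, choose_step_length, and close_enough are placeholder callables that a concrete method, such as the primal-dual path-following methods considered below, would have to supply; the names are introduced here only for illustration.

```python
# A minimal Python rendering of the framework in Figure 3.1 (Algorithm 3.1).
# choose_direction, choose_step_length and close_enough are placeholder callables
# supplied by a concrete method (for instance a primal-dual path-following method).
def generic_iterative_method(x0, choose_direction, choose_step_length, close_enough,
                             max_iter=100):
    x = x0
    for _ in range(max_iter):
        if close_enough(x):                # stop once x is "close enough" to optimal
            break
        d = choose_direction(x)            # 1. find a suitable direction
        theta = choose_step_length(x, d)   # 2. find a suitable step length
        x = x + theta * d                  # 3. update the iterate
    return x
```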
A s our guidance, we wi l l be looking at the primal-dual path-following methods used for solving the S D P problem (2.22), which are described in [27], and the same methods for solving the nonnegative least squares problem ( N N L S ) , i n f 111 4<r- _ Ml? (3.1) inf i | | A c - 6 | | | s.t. x > 0, which are described in [23]. We do this because, in many ways, the S D L S and N S - S D L S problems are extensions of the N N L S problem from the vector case to the matrix case. Further, the L M I - L S problem can be seen as a combination of the S D P and N N L S problems, wi th the constraints coming from the S D P problem and the objective function coming from the N N L S problem. In fact, we can also notice many similarities in the K K T conditions for the N N L S problem, AT(Ax-b) = A, } XiX{ — 0, for i = 1 , . . . , n , > (3-2) x, A > 0, J to the K K T conditions (2.17) and (2.20). Notice that the conditions (3.2) follow from applying the results of Chapter 2 to problem (2.3) wi th E = Y = IR", H = 1R™_, f(x) = \\\Ax - b\\%, and g(x) = —x, and further that i f x, A > 0, then (A, x) = \Tx = 0 i f and only i f AjXj = 0 for each i. It should also be noted that the primary reason for considering primal-dual path-following interior-point methods for solving the S D L S , N S - S D L S , and L M I - L S problems is the success these methods have had for solving the N N L S problem (see [23]) and especially for solving the S D P problem (see [27]). In particular, these methods are not only efficient i n practice, they are also efficient in theory in that they have a provably polynomial complexity. That is, these methods have a worst-case running time that is a polynomial function of the size of the problem (see [20] and [24]). This chapter is summarized as follows. We begin by discussing the log barrier problems associated wi th each of our three problems in Section 3.2, and show how solutions to these 36 Chapter 3. Interior-Point Methods and Algorithms problems satisfy systems of equations which define our central path. We then provide a uniform treatment of the existence of the central for each of our three problems under the setting of the S D L C P (2.28) in Section 3.3. The existence of this central path which traces a route to the optimal solution provides us with the motivation for discussing in Section 3.4 the interior-point methods under consideration. After providing some convergence results for these interior-point methods in Section 3.5, we discuss the issues involved in implementing these methods in Section 3.6. 3.2 Log Barrier Problems Consider the function F : <S™ —> ( — oo, +oo] defined by F(X) = \ " l n d e t * ' (3-3) [ +oo, otherwise. Notice that F has the barrier property for <S™: as X G <S™+ approaches the boundary of <S™, d e t X > 0 and approaches 0, and thus — l n d e t X approaches +oo. The function F defined here is known as the logarithmic barrier function for the cone <S™. A s we w i l l see, we w i l l be able to use this function to impose the positive semidefinite constraints in our optimization problems. There is also a logarithmic barrier function for the cone IR™, which is the function F : IR™ -> (-oo, +oo] defined by 53" (3-4) F{x) = { ^ = 1 l n ^ , ifx>n +oo, otherwise. A s put forth in the highly significant book of Nesterov and Nemirovskii [20] (and more recently i n the work of Renegar [24]), at the heart of many interior-point methods lies a self-concordant barrier function which is responsible for the rate of convergence of the method. 
The rate of this convergence is in turn based on the parameter of this function (which is referred to as the complexity value in [24]), from which it is implied that the smaller this complexity value, the faster the convergence (see [20, p. 42] and [24, pp. 35, 39]). The functions defined in (3.3) and (3.4) are special types of self-concordant barrier functions in that they have the smallest possible complexity value over a l l such functions for <S™ and IR™ (see [20, pp. 42, 198] and [24, p. 40]). We w i l l now follow Renegar's geometric presentation and define these terms for a general self-dual cone i f in a Euclidean space E. Firs t of a l l , we need to define the second derivative, or Hessian, of a differentiable function / : U —> IR, where U is an open subset of E. We denote the Hessian of / at x G U as the linear function S72f(x) : E -> E (if it exists) which satisfies l i m i!v/(3H-») •  y i x ) v 2/(sHI = o. I M H O If V 2 / ( x ) exists for al l x G U, then we say that / is twice differentiable. If the function x i—^ V 2 f{x) is a continuous function, then we say that / is twice continuously differentiable, in which case, the function V 2 f(x) is self-adjoint for a l l x G U (i.e., (u, V 2 / ( x ) w ) = (V2f(x)u, v) for a l l u,v G E) [24, p. 9]. 37 Chapter 3. Interior-Point Methods and Algorithms In Theorem 3.2.3 we wi l l show that the derivative of the function F defined in (3.3) is V . F ( X ) = — X - 1 for al l X G <S™ + - Just for interest, we wi l l mention without proof (see [24, p. 9]) that second derivative of this function at X G 5 ™ + is the linear function V 2 F ( X ) : <S™ —> Sn defined by V2F(X)V = X^VX'1. It is interesting to note the similarity which these formulas for V F ( X ) and V 2 F ( X ) have to the well-known scalar case: i f f(x) = — \n(x), then f'(x) = —x~l and f"(x) = x~2. Definition 3.2.1 Let E be a Euclidean space, H C E be a self-dual cone with nonempty in-terior, and F : int i f —> IR be a twice continuously differentiate strictly convex function with V 2 i ? ( x ) positive definite for all x G int i f (i.e., (u,S72F(x)u) > 0 for all u ^ 0). The local inner product at x G int i f is defined as (u,v)x:=(u,V2F(x)v), for all u, v G E . Furthermore, we define the local norm at x G int i f as \\u\\x := y/(u, u)x for all u 6 E . Using the local inner product at x G int i f to define the gradient of F at x G int i f , we get the local gradient at x of F at x, denoted as VFx(x), and which equals V2F(x)~1\7F(x). The function F is said to be self-concordant if the following hold (see [24, p. 58]): 1. For all x,y G int i f , if \\y — x\\x < 1, then \\v\\ 1 TTTT < Ti F T ) for all v ^ Q \m\x i - | | y - z | | x 2. F has the barrier property for H; that is, if a sequence {x^} C int i f converges to a point on the boundary of H, then F{x^) —> +oo. The function F is said to be a (self-concordant) barrier function if F is self-concordant and u F := sup | | V F x ( x ) | | 2 < +oo, xGint H where •uV is called the complexity value of F. We wi l l now state the following facts from [20] and [24]. Theorem 3.2.2 Any self-concordant barrier function F for S™ or IR™ must satisfy ftp > n. The functions defined in (3.3) and (3-4) are self-concordant barrier functions for 5 " and IR™, respectively, and each have complexity value $F = n. Therefore, it is in this sense that (3.3) and (3.4) are the best possible self-concordant barrier functions for <S™ and IR™. 
In fact, we also have the following important result which can be found in [27], and for reasons of interest we wi l l also present a proof. 38 Chapter 3. Interior-Point Methods and Algorithms T h e o r e m 3.2.3 The logarithmic barrier function F for S™, defined by (3.3), is a strictly con-vex function over <S™+ with derivative VF(X) = — X - 1 for X £ <S™_|_. P r o o f . Here we present a different proof than the one given in [27], and we begin by showing that V / ( X ) = -X-1 for X £ S £ + . It is a fairly well-known fact that V d e t ( A ) = a d j ( A ) T for A £ IR™*™, where adj(A) = (m^) is the n x n adjugate matrix defined by rmj = detMji and Mji is formed by deleting row j and column i from A. This can easily be seen by considering the partial derivative of det(A) wi th respect to a,j, which we can calculate by expanding the determinant along row i: n det(A) — "Y^dikmki = a^m^ + ^2aikmki. k=l k^j Since neither mji nor the summation term at the end of this expression depend on a^-, we have 8 f f ^ = mji, and the result follows. If A is invertible, we also have that A-1 = d e t ^ a d j ( y l ) , and so V det(A) = det(A)A-T. For X £ this result becomes V d e t ( X ) = d e t ( X ) X _ 1 , and since F(X) = — l n d e t X , we apply the chain rule to get = d5(]?)V d c t<*> -We follow the advice given in an exercise from [6, p. 40] to show that F is strictly convex over <S™+. There it states that F is convex i f and only i f ( V F ( X ) - V F ( Y ) , X - Y) > 0 for a l l X, Y £ S$+, and strictly convex i f and only i f the inequality is strict whenever X ^ Y. Let t ing X, Y £ <S™+ such that X ^ Y, we have {VF{X)-VF{Y),X-Y) = (Y~l -X~\X -Y) = tviY^X) - 2tr(J) + t r ( X _ 1 y ) = tr (y/XY^y/XJ + tr ( v / X ~ V v / X ~ 1 ) - 2n > 0, where the inequality comes from the fact [6, p. 8] that t r (Z) + t r ( Z - 1 ) > 2n for a l l Z £ <S™+ with equality i f and only i f Z = I. • 3.2.1 The SDLS log barrier problem Notice how we can use the barrier function F defined in (3.3) along wi th a positive parameter T to impose the positive semidefinite constraints in the S D L S problem: min \\\AX-B\\1 + TF{X) (3.5) 39 Chapter 3. Interior-Point Methods and Algorithms Thus we get an unconstrained problem (3.5) called the SDLS log barrier problem, whose solution is necessarily positive definite. If X G <S™+ is such a solution, it is clearly characterized as a crit ical point of the strictly convex objective function f{-) + rF(-), where f{X) = \ \ \ A X - B \ \ F . That is, V(f + rF)(X) = Vf(X) + rVF(X) = \{Z + ZT) -TX-1 = 0, where Z = AT(AX - B). Let t ing A = TX~1, we have the following necessary and sufficient conditions for an optimal solution of (3.5). AT{AX-B) = Z ) X, A >- 0 J If (X, A) satisfies (3.6), then it is a primal-dual feasible point wi th duality gap / i ( X , A ) = <A,X> = tr(AX) = nr. We immediately notice the similarity of conditions (3.6) to the K K T conditions (2.17) for the S D L S problem. In Section 3.3, we wi l l show that there is a unique solution (XT, A T ) to (3.6) for each T > 0, and that as r approaches 0, (XT,AT) approaches the unique optimal solution to the S D L S problem. 3.2.2 T h e N S - S D L S log ba r r i e r p r o b l e m Again we use the barrier function for <S™ from (3.3) to define the NS-SDLS log barrier problem with positive parameter T : min \\\AX-B\\1 + TFC1(X + XT)) ( 3 . 7 ) Lett ing f{X) = ^\\AX — B\\F and g(X) = \(X + XT), we see that the function which we are minimizing, /(•) + rF(g(-)), is strictly convex. 
This can be shown by noticing that g{tX + (l-t)Y) = tg(X) + {l-t)g(Y) for a l l X,Y G M n x n and t G [0,1], which implies that F o g is convex on its domain. Since the sum of a strictly convex function and a convex function must be strictly convex, we find that we are minimizing a strictly convex function. A matrix X G JRNXN is a solution to (3.7) i f and only i f it is a cri t ical point of the function /(•) + rF(g(-)), and S = \{X + XT) is necessarily positive definite. First we must determine 40 Chapter 3. Interior-Point Methods and Algorithms W(F o g)(X). Let D G l R n x r i be a direction, then using the chain rule and the fact that g is linear, and so g'(X;D) = g(D), we get (Fog)'(X;D) = F'(g(X);g'(X;D)) = (VF(g{X)),g'(X;D)) = ( - ( i ( X + XT))-\$(D + DT)) = (~S-\D), where the last equality follows from the symmetry of — 5"" 1. Therefore, V(F o g)(X) = — and we have V ( / + T F O 5 ) ( 1 ) = Vf(X) + TV(Fog)(X) = AT(AX — B) — r 5 _ 1 = 0. Now we let A = TS~X, and we get that X is optimal for problem (3.7) i f and only i f it satisfies the following conditions. AT(AX-B) = A ' \(X + XT) = 5 1 , s AS = rl \ [6*> S, A >- 0 Again we have that the conditions for optimality of the log barrier problem are remarkably similar to the K K T conditions (2.20) for the original problem. Aga in we notice that i f ( X , S, A) satisfies (3.8), then it is a primal-dual feasible point of the N S - S D L S problem wi th duality gap / / ( X , 5 , A ) = (A, S) = nr. Moreover, as we w i l l see i n Section 3.3, for each r > 0, there is a unique solution ( X T , 5 T , A T ) to (3.8), and that this solution approaches the unique optimal solution to the N S - S D L S problem as r -> 0. 3.2.3 T h e L M I - L S log ba r r i e r p r o b l e m We now define the LMI-LS log barrier problem using the logarithmic barrier function for S+ with r > 0 as follows. min \\\Ax-b\\l + TF{C-lCx) (3.9) If we let f(x) = \ || Ax - b\\?, and g(x) = C — ICx, then x G IR" is a solution to problem (3.9) i f and only i f S = C — ICx is positive definite and a; is a crit ical point of the function f{-) + TF(g(-)), which can be shown to be strictly convex by a similar argument as was used for the N S - S D L S log barrier problem (3.7). Let us first compute V(F o g)(x) by considering a direction d G IR" and using the chain rule wi th the fact that g'(x; d) = —ICd: (Fog)'(x;d) = F'(g(x);g'(x;d)) = (WF(g(x)),g'(x;d)) = (-(C -!Cx)-l,-lCd) = (>C*S-\d). 41 Chapter 3. Interior-Point Methods and Algorithms Therefore, we have V(/ + T f o j ) ( i ) = V / ( i ) + r V ( f o 5 ) ( j ) = AT{Ax-b) +TK.*S-1 = 0, and letting A = T 5 _ 1 , we have the following conditions for optimality of problem (3.9). AT{Ax - b) + K.*A = 0 1 ICx + S_ = C AS = TI 5, A >- 0 (3.10) Once again, except for the right hand side of these equations, these conditions are exactly the L M I - L S K K T conditions (2.25). If (x,5, A) satisfies (3.10) then it is a primal-dual feasible point wi th duality gap / i ( x , S , A) = (A, S) = nr. Final ly , as we w i l l see i n Section 3.3, under the assumption that there exists a primal-dual feasible point (x, 5, A) of the L M I - L S problem such that S, A y 0, we have, for each r > 0, that there exists a unique solution ( x T , 5 T , A T ) to (3.10), and this solution converges to the unique optimal solution of the L M I - L S problem as T -> 0. 
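As a small numerical check of the machinery used in this section, the sketch below evaluates the barrier F(X) = -ln det X through a Cholesky factorization and compares a central finite-difference directional derivative with <-X^{-1}, V>, the value predicted by the gradient formula of Theorem 3.2.3; the test matrix is randomly generated and purely illustrative.

```python
import numpy as np

# Sketch of the logarithmic barrier (3.3) and a finite-difference check of the
# gradient formula grad F(X) = -X^{-1} from Theorem 3.2.3 (hypothetical data).
def log_barrier(X):
    """F(X) = -ln det X for X positive definite, computed via Cholesky."""
    L = np.linalg.cholesky(X)              # fails (LinAlgError) outside S^n_{++}
    return -2.0 * np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(4)
n = 4
R = rng.standard_normal((n, n))
X = R @ R.T + n * np.eye(n)                # a positive definite test point

grad = -np.linalg.inv(X)                   # claimed gradient at X
V = rng.standard_normal((n, n)); V = 0.5 * (V + V.T)
h = 1e-6
directional = (log_barrier(X + h * V) - log_barrier(X - h * V)) / (2 * h)
print(np.isclose(directional, np.trace(grad @ V), rtol=1e-4))   # <grad F(X), V>
```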
3.3 The C e n t r a l P a t h Now that we have described the log barrier problems associated wi th our three semidefinite constrained least squares problems, intuitively we can see how the solutions to these problems stay as far away from the boundary of the semidefinite cone as possible, while approaching solutions of our original problems as we let r —> 0. Each of the systems of equations, (3.6), (3.8), and (3.10), describes what is called the central path which we would like the iterates of our algorithms to follow. A key question to be raised now is that of the existence of the central path for each of our problems. Luckily, we are able to give a uniform treatment for a l l three problems due to the fact that each problem can be stated as an S D L C P (see Section 2.5). Recall that we were able to write each of our three problems in the form: F i n d [X,Y) G £ , such that X y 0, Y h 0, and {X,Y)=0. For each of our problems, we were also able to express £ as £ = {{X,Y) eSp xSp \Y = MX + Q}, with M : Sp —> Sp being a linear positive semidefinite function, and Q G Sp. Also recall the definitions of £ + , £ + + , and £* (2.31-2.33). In each of our three problems, we know that there is a unique optimal solution (X,Y); i.e., £* = {(X,Y)}. Using this notation, we can write the central path equations for each of our problems as (X, Y) G £, X y 0, Y y 0, and XY = T J , or more succinctly as (X,Y) G £ + + \ i i \ XY = TI j where T > 0. We now state the following result from the paper that introduced the S D L C P . 42 Chapter 3. Interior-Point Methods and Algorithms Theorem 3.3.1 ([13, p. 98]) If there exists an {X,Y) £ C++, then: (1) For all r > 0, there exists a unique (XT,YT) £ C++ such that XrYT = TI. (2) The set C := {(XT,YT) | r > 0} forms a smooth path, called the central path. (3) l im (XT,YT) = (X,Y) £ £». T -+0+ In order to apply Theorem 3.3.1 to our problems, it suffices to show that there exists an {X, Y) £ C++. For the first two problems, the S D L S , and the N S - S D L S , showing that C++ # 0 reduces to showing that the solution to a Lyapunov equation of the form GX+XG = H (3.12) is in S++, where G,H £ S++. To show this, we w i l l use the Kronecker product, ®, and symmetric Kronecker product, <g>5, which we w i l l find much use for later, and so we give a treatment of their properties in Appendix A . Unfortunately, for the L M I - L S problem, even after much effort, we were not able to prove that C++ / 0 i n general, nor were we able to find a counter-example. We begin by stating an important theorem about the Lyapunov equation. Theorem 3.3.2 Let G,H £ S++. Then there exists a unique X £ S++ such that (3.12) holds. Proof. This proof is drawn from [10, p. 268-271]. Firs t of a l l , we notice that we can rewrite equation (3.12) as (/ ® G + G ® I)vec(X) = vec(H). Since (I®G)T = IT®GT = I®G and similarly for the matrix G®I, we find that K := I®G + G®l£Sp2. Furthermore, i f a(G) = {A;}, then a {I <g> G) = a(G ® I) = {A,}, and so G being positive definite implies that K is positive definite. Therefore, X = mat(/< ' _ 1 vec(i /)) £ JRpxp is the unique solution to equation (3.12). To show that X is symmetric, we simply take the transpose of both sides of (3.12) to get XTG + GXT = H, and the uniqueness of X implies that X = XT. Finally, Lyapunov's Theorem [10, p. 96] states that i f we are given symmetric matrices G, X, and H satisfying (3.12) such that H is positive definite, then G being positive definite implies that X is positive definite. 
Therefore, X £ S++ is the unique solution to equation (3.12). • Recall that for the S D L S problem (2.16), we had p = n and had defined the function M. by MX = \{GX + XG), 43 Chapter 3. Interior-Point Methods and Algorithms where G = ATA G S%+. Thus, i f ( X , Y ) G C, then Y = MX + Q, which can be rewritten as GX + XG = 2(Y -Q). Since Q G Sn, we can clearly choose Y G <S£ + so that H := 2(Y - Q) G <S£ + . App ly ing Theorem 3.3.2, we are now able to conclude that there exists a unique X G S++ satisfying GX + XG = H. Therefore, C++ ^ 0 for the S D L S problem, which, by Theorem 3.3.1, implies that the S D L S central path exists. Similarly, for the N S - S D L S problem (2.19), we also h a d p = n , but had defined the function M by MX = \{G-LX + XG~L) where G = ATA G 5 ? + , and so G " 1 G S £ + . For (X,Y) G £ , we have Y = MX + Q, which we rewrite as G^X + XG'1 = H, where we can again choose Y G S++ to make H := 2(Y — Q) positive definite. Now Theorem 3.3.2 provides the existence and uniqueness of an X G S++ which satisfies G _ 1 X + XG~L = H. Thus, for the N S - S D L S problem, we also have C++ ^ 0. Using Theorem 3.3.1, we conclude that the N S - S D L S central path also exists. Since we were not able to determine whether or not C++ ^ 0 is always true for the L M I - L S problem, for the remainder of the discussion we wi l l be assuming that there exists a primal-dual feasible point (x, S, A) of the L M I - L S problem such that S, A >- 0. 3.4 Interior-Point M e t h o d s for the S D L C P We are now in a position to describe interior-point methods used for solving the S D L C P (2.28). The structure of our discussion is inspired by material regarding interior-point methods i n [21] on the linear programming problem ( L P ) , and in [27] on the S D P (2.22). Suppose we are given an S D L C P wi th C = { ( X , Y) G Sp x Sp | Y = MX + Q\, (3.13) where M : SP -» Sp is a linear positive definite function, and Q G Sp. Here we w i l l assume that C++ ^ 0 and that C* = { ( X , Y ) } . The basic idea behind interior-point methods for solving the S D L C P is to construct a sequence { ( X r , YR)} i n <S^+ x S++ which converges to the unique solution ( X , Y). That is, we require that X r and YT be positive definite for each iteration r. A /easi&/e-interior-point method also requires that the positive definite in i t ia l point (Xo, YQ) is in C so that YR = MXR + Q in each iteration; an m/easi 'Me-interior-point method only requires that (XQ,YQ) is positive definite and the method w i l l approach the set £ as it iterates. 44 Chapter 3. Interior-Point Methods and Algorithms The methods we describe here w i l l be path-following methods in that they w i l l use the central path discussed in previous section in order to be guided toward the solution while staying as far away from the boundary of positive semidefinite region as possible. A s in the framework described in Figure 3.1, in each iteration we w i l l update our current iterate (X, Y) to (X,Y) = (X,Y) + 9 (AX, A V ) for some suitable direction (AX, AY) £ Sp x Sp and some suitable step length 9 > 0 chosen so that (X,Y) is positive definite. 3.4.1 C h o o s i n g a su i tab le d i r e c t i o n ( A X , AY) In each iteration, we want to take a step toward a point (XT,YT) on the central path, which is defined as the solution of the following system of nonlinear equations. 
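For the SDLS instance, the Lyapunov argument of Section 3.3 also gives a concrete way to manufacture a strictly feasible starting point (X0, Y0) in C_{++} for a feasible-interior-point method. The sketch below, with randomly generated data and under the assumption that A has full column rank, chooses Y0 = Q + tI so that both Y0 and H = 2(Y0 - Q) are positive definite, and then solves GX + XG = H through the Kronecker-product system used in the proof of Theorem 3.3.2. SciPy also ships a Lyapunov solver, but the explicit Kronecker formulation is kept here because it mirrors that proof.

```python
import numpy as np

# Sketch: a strictly feasible starting point (X0, Y0) for the SDLS form of the
# SDLCP, following the Lyapunov argument of Section 3.3 (hypothetical data;
# a random Gaussian A has full column rank with probability one).
rng = np.random.default_rng(5)
m, n = 6, 4
A = rng.standard_normal((m, n)); B = rng.standard_normal((m, n))
G = A.T @ A
Q = -0.5 * (A.T @ B + B.T @ A)

# Pick Y0 = Q + t*I with t large enough that Y0 and H = 2*(Y0 - Q) = 2t*I are both
# positive definite.
t = max(1.0, 1.0 - np.linalg.eigvalsh(Q).min())
Y0 = Q + t * np.eye(n)
H = 2.0 * (Y0 - Q)

# Solve G X + X G = H as in the proof of Theorem 3.3.2:
# (I kron G + G kron I) vec(X) = vec(H), with column-stacking vec.
K = np.kron(np.eye(n), G) + np.kron(G, np.eye(n))
X0 = np.linalg.solve(K, H.flatten(order="F")).reshape((n, n), order="F")

print(np.allclose(G @ X0 + X0 @ G, H))                    # Lyapunov equation holds
print(np.linalg.eigvalsh(X0).min() > 0,
      np.linalg.eigvalsh(Y0).min() > 0)                   # X0, Y0 positive definite
print(np.allclose(0.5 * (G @ X0 + X0 @ G) + Q, Y0))       # (X0, Y0) lies in L
```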
F(X,Y) := MX + Q-Y' 0' XY - r l 0 (3.14) Here we choose r = a fx, where fx (X, Y) jp is the normalized duality gap, and a is a centering parameter chosen from the interval [0,1]. When we choose a = 1, we wi l l be stepping toward the point on the central path with the same duality gap as our current iterate: /x(XT,YT) = (XT,YT) = tr(XYr) = t r ( £ J ) = PA = (X,Y) = tx(X,Y). Thus, when a = 1, we w i l l step toward the center of the feasible region, away from the boundary, but we w i l l make little or no progress toward the solution of the S D L C P . O n the other hand, by choosing a = 0, we wi l l be stepping directly toward the solution of the S D L C P , but we may be hindered in the next iteration because we end up being too close to the boundary, forcing us to take only very small steps toward the solution. Typical ly a is chosen in the open interval (0,1) in order to balance between these two goals of reducing the duality gap of our iterates and staying as far away from the boundary as possible. A s discussed in [21, Chap. 11], at the heart of many important algorithms for solving nonlinear equations like (3.14) lies Newton's method. This is largely due to the rapid convergence properties of this method under fairly reasonable assumptions. The Newton step taken in each iteration of Newton's method is defined as the solution of the linear approximation of the equations at the current point. For example, the Newton step taken toward the solution of (3.14) is defined as the direction (AX, AY) £ Sp x Sp satisfying F(X,Y) + VF(X,Y) AX AY 0. Thus, to calculate the Newton step for (3.14) at our current iterate (X, Y), we need only solve the system of linear equations, VF(X,Y) AX AY -F(X,Y), (3.15) 45 Chapter 3. Interior-Point Methods and Algorithms which can also be written as MAX-AY' AXY + XAY Notice that the first equation in (3.16) can be rearranged as Y + AY = M(X + A X ) + Q. In the literature for the S D L C P , when describing the Newton step for a general £ , they state this first equation as ( X + A X , Y + AY) £ C. Here arises the key problem that is the focus of much research i n interior-point methods for the S D P (see [27]). The function F defined in (3.14) maps a point in SV x SP to a point in SP x I R p x p . The reason behind this is that even i f the matrices X and Y are symmetric, the matrix XY is usually nonsymmetric. This means that the gradient of F at ( X , Y) is a linear function mapping between vector spaces of different dimension. In other words, VF(X,Y) cannot be a bijective function, and hence it is not invertible. We could compute the solution to (3.16) where the matrices A X and AY are in I R p x p , but we find that, contrary to our wishes, this solution is often nonsymmetric. Since this problem is due to the second equation in (3.14), that XY = rl, we need to find a way to replace this equation with an equivalent symmetric equation. A simple way of doing this is to just take the symmetric part of both sides of this equation, from which we get \{XY + YX) = TI. (3.17) This idea was introduced by Alizadeh, Haeberly, and Overton in [2], and the resulting Newton step, defined by MAX - A Y _\{YAX + AXY) + \(XAY + AYX) has been coined the A HO direction. However, there are many other ways of symmetrizing the equation XY — rl. A whole family of directions was proposed by Monteiro and Zhang (see [18] and [36]), called the MZ-family of directions. 
In [36], Zhang proposed using replacing XY = rl in (3.14) wi th HP{XY) = rl (3.19) where Hp : JRpxp —>• <SP is the linear transformation defined by HP(U) = \ (PUP-1 + P-TUTPT) (3.20) with P £ I R p x p being some constant nonsingular matrix. In fact, we have Hp(XY) =rl & XY = rl for al l symmetric matrices X and Y (see [36, p. 377]). Thus, substituting XY — rl wi th (3.19), the Newton step (3.15) can be stated as MAX - A Y HP(AXY + XAY) Y - MX - Q rl-XY (3.16) Y - MX - Q TI - \ (XY + Y X ) (3.18) Y - M X - Q TI-HP(XY) (3.21) 46 Chapter 3. Interior-Point Methods and Algorithms Notice that the A H O direction (3.18) is a member of the MZ-fami ly wi th P = I. Another important direction in the MZ-family is the HRVW/KSH/M (or HKM) direction. This is the MZ-direct ion (3.21) defined with P = y/Y. That is, the H K M direction is obtained by solving MAX - AY VYAXVY + KVYXAYVY'1 + VY^AYXVY) or equivalently, [ MAX - AY AX + \{XAYY~L + Y~LAYX) for ( A X , /AY). This direction was independently discovered by Helmberg, Rendl, Vanderbei, and Wolkowicz, and by Koj ima , Shindoh, and Hara, only to be later rediscovered by Monteiro as a member of the MZ-family. We also mention the ducd-HKM direction as the MZ-direct ion (3.21) wi th P = \/X , which could be obtained by solving the following linear system. . M A X -\ ( Y A X X " 1 + X~XAXY) + AY A third direction from the MZ-family which is s t i l l widely used for S D P is the Nesterov-Todd or NT direction. This direction is defined as the solution of (3.21) wi th P = W~2, where W is the unique positive definite matrix satisfying WYW = X. We can even state this matrix W explicitly as W = \/X (VxYx/x)" Y X . (3.24) It was shown in [29] that the NT-direct ion is the solution to the following equivalent linear systems. " MAX - A y ' A X + WAYW For each of the directions already mentioned, we also have their corresponding predictor-corrector direction. Given a nonsingular direction P 6 I R p x p , we compute this direction ( A X , A y ) by first computing the predictor direction (dX, 5Y) as the solution of MSX - 5Y HP{8XY + X5Y) and then correct this direction by (5X, SY), which is computed as the solution of MSC-SY^ HP(SXY + X5Y) where a G (0,1] and T = afi. Together, these two directions give us the predictor-corrector direction, defined as ( A X , A y ) := (SX, 8Y) + (SX, 5Y). Y-MX-Q TI - VYXVY Y - MX - Q TY-1 - X (3.22) y - MX - Q r X " 1 - y (3.23) Y - M X - Q TY'1 - X MAX-AY W^AXW'1 + AY Y - MX - Q T X - 1 - y (3.25) y - MX - Q -Hp(XY) (3.26) 0 TI - Hp{6X5Y) (3.27) 47 Chapter 3. Interior-Point Methods and Algorithms Clearly, once we have solved (3.26) for the predictor direction, we can compute ( A X , AY) directly as the solution of M A X - AY HP(AXY + XAY) The idea behind this predictor-corrector direction is as follows. First we attempt to reduce the duality gap as much as possible by setting cr = 0, which may end up taking us far away from the central-path and too close to the boundary of the semidefinite region. We then correct this step by computing a direction with o ^ 0 chosen to take us toward a point on the central-path wi th a duality gap as small as the one obtained by the predictor direction. We can also interpret this as having used second-order information to determine this predictor-corrector direction; i f the predictor direction gives us the direction tangent to the trajectory we are following, then the predictor-corrector direction takes into account the curvature of the trajectory as well (see [21, §14.2]). 
A n alternate interpretation of the predictor-corrector direction is that it is the direction which comes from taking a step wi th o = 0, and then a step wi th a G (0,1] (see [27, p. 549]). We choose to give the above description of the predictor-corrector direction to highlight the fact that the two linear systems we must solve, (3.26) and (3.28), both have the same coefficient matrix, which greatly reduces the computational effort required to compute this direction. We would also like to mention that there are many more search directions and families of search directions that have been studied i n the area of S D P . However, the most important of those studied appear to be the A H O , H K M , d u a l - H K M , N T directions, and their predictor-corrector variants, which we have mentioned above. Moreover, as we w i l l discuss further i n Section 3.6, H K M is currently the direction of choice for large S D P problems [22]. Y-MX-Q TI - HP{XY + 6X5Y) (3.28) E x i s t e n c e a n d un iqueness o f sea rch d i r e c t i o n s Now that we have defined a few search directions that can be used in interior-point methods for solving an S D L C P , we would like to know under what conditions these search directions exist. The answer comes from a paper by Shida, Shindoh, and K o j i m a [26]. ++ x S++ T h e o r e m 3.4.1 ([26, p . 392]) Let £ be given as in (3.13). Suppose that ( X , Y ) G S\ and T > 0. Then both the dual-HKM (3.23) and NT (3.25) search directions exist and are unique. Suppose further that | ( X Y + Y X ) G S+. Then the AHO search direction (3.18) exists and is unique. The assumption that \{XY + YX) G S+ is necessary in Theorem 3.4.1, for even when X and Y are both positive definite, the matrix \ (XY + YX) may not be positive semidefinite. Consider the counter-example provided i n [26] where X = 49 0 0 1 and Y = 4 1 1 1 Clearly X and Y are positive definite. However, the matrix \{XY + YX) = 196 25 25 1 48 Chapter 3. Interior-Point Methods and Algorithms is clearly not positive semidefinite. In fact, a further example is provided i n [29] of an S D P for which the A H O direction does not exist for a given positive definite iterate. 3.4.2 C h o o s i n g a su i tab le step l eng th 0 Recall that our current iterate is (X, Y) G S++ x S\+. Once we have picked a step direction ( A X , A Y ) from the directions discussed i n Section 3.4.1, we need to compute a step length 9 > 0, where X := X + 9AX, Y := Y + 9AY, and we require that X,Y y 0. The natural step length for a Newton direction is 9 = 1, but a step length of 1 may cause us to exceed the positive definite region to which we are constrained. Therefore, we need to determine the maximum possible step length # m a x such that i f 0 < 9 < 9max, then X, Y y 0. The maximum step length in the direction AX can easily be seen as being the smallest positive number 9\ such that &et{X + 91AX) = 0, or 9\ = oo if no positive solutions exist. Similarly, the maximum step length in the direction AY is defined as the smallest positive number 92 such that det (Y + 92AY) = 0, or 92 = oo i f no positive solutions exist. Our maximum possible step length is then the minimum of these two numbers: 0 m a x = min{0 i , 0 2 } . (3.29) Final ly, since we do not want to take a step al l the way to the boundary, we define our step length as 9 := m i n { l , c x 9max}, (3.30) for some c G (0,1) which we w i l l call the step length parameter; in practice, c is usually a fixed number chosen from the interval [0.9, 1.0). 
Defining 9 in this way guarantees that our next iterate, (X, Y ) , satisfies X, Y y 0. We can compute 9\ and 92 as the minimum positive eigenvalues of the respective generalized eigenvalue problems Xv = X{-AX)v, and Yv = X(-AY)v. Since algorithms for solving generalized eigenvalue problems prefer to have the matrix on the right-hand-side be positive definite (see [17]), it is better computationally to find the maximum eigenvalues A i and A2 of the respective problems -AXv = XXv, and - AYv = XYv, then let ^ [ 1 / A „ if A, > 0, ( 3 3 1 ) I +00, otherwise, for i = 1,2. See Lemma 5.1 from [13]. 49 Chapter 3. Interior-Point Methods and Algorithms A l g o r i t h m 3.2 G i v e n ini t ia l matrices XQ,YO >- 0, and tolerances TOL\,TOL2 > 0. L e t {X,Y) := (X0,Y0) and r = 0. W h i l e \\MX + Q - Y\\2F > TOLX o r {X,Y) > TOL2: d o 1. Let ft := (X, Y)/p and choose a centering parameter cr 6 [0,1]. 2. Choose a nonsingular matr ix P G I R p x p and compute the corresponding search direction ( A X , AY) eSp xSp by solving (3.21) wi th r := op,. 3. Choose a step length parameter c G (0,1) and compute the step length 9 by equation (3.30) so that (X,Y) := (X,Y) + 9(AX,AY) satisfies X,Y >- 0. 4. Update (X,Y) := (X,Y). 5. Let r := r + 1, and (XR,YT) := ( X , Y ) . E n d Figure 3.2: A general path-following algorithm for the S D L C P based on the MZ-family of search directions. 3 .4 .3 S u m m a r y In summary, we describe a general path-following algorithm and a general predictor-corrector algorithm for solving the S D L C P (2.28) wi th C defined as in (3.13). The general path-following algorithm is given in Figure 3.2 while Figure 3.3 contains the general outline used for many predictor-corrector algorithms. Note that when TOL2 = 0, these algorithms wi l l each consis-tently generate an infinite sequence {(XT, YT)} i n <S^+ x <SP + . Now we would like to know when { ( X r , YR)} converges to the unique solution of the S D L C P , and, given e > 0, how many itera-tions w i l l it take for Algor i thm 3.2 and Algor i thm 3.3 to terminate when we take the tolerances to be TOLX = e | | M X 0 + Q - Y0\\F and TOL2 = e (X0, Y0). We call such an iterate (XR,YR) which satisfies \\MXR + Q-YR\\2F < e\\MXo + Q-Y0\\% (XR,YR) < e(X0,Y0) an E-approximate solution of the S D L C P . The next section deals with the questions mentioned above. (3.32) 50 Chapter 3. Interior-Point Methods and Algorithms A l g o r i t h m 3.3 G i v e n ini t ia l matrices XO,YQ y 0, and tolerances TOLi,TOL2 > 0. L e t (X, Y) := ( X 0 , Y0) and r = 0. W h i l e \\MX + Q- YfF > TOLx o r (X,Y) > TOL2, d o 1. Let fb := (X,Y)/p and choose a nonsingular P G JRpxp. 2. (Predictor step) Compute the predictor direction (5X, 5Y) G Sp x Sp by solving (3.26). 3. Choose a centering parameter a G (0,1]. 4. (Corrector step) Compute the predictor-corrector direction ( A X , AY) by solving (3.28) wi th r := ajl. 5. Choose a step length parameter c G (0,1) and compute the step length 6 by equation (3.30) so that ( X , Y) := ( X , Y) + 9(AX, AY) satisfies X , Y y 0. 6. Update ( X , Y ) := (X,Y). 7. Let r := r + 1, and ( X r , Y r ) := ( X , Y ) . E n d Figure 3.3: A general predictor-corrector algorithm for the S D L C P based on the MZ-family of search directions. 51 Chapter 3. Interior-Point Methods and Algorithms 3.5 Convergence Results A number of algorithms of theoretical importance have been formulated for solving the S D L C P (2.28), many of which fit nicely into the framework of either Algor i thm 3.2 or Algor i thm 3.3. 
Their importance comes from the fact that, in each case, it has been shown that the sequence of iterates converges to a solution of the S D L C P , and that the number of iterations required to get an £-approximate solution is a polynomial function in both 1/e and in the size of the problem, p. A n algorithm which has this polynomial iteration complexity is said to be a fully polynomial-time approximation scheme, and any constant-factor decrease in e, or increase in p, only results in a corresponding constant-factor increase in the running time of the algorithm (see [8, C h . 37]). Also see [31, C h . 3] for a complete discussion, i n the case of linear programming, of why we need only be concerned with bounding the number of iterations by a polynomial in order to conclude that the algorithm itself has a polynomial running time. These interior-point path-following algorithms for solving the S D L C P come under many different classifications: • feasible / infeasible, • short-step / long-step / predictor-corrector, • A H O / H K M / d u a l - H K M / N T search directions. Recall that the difference between feasible and infeasible algorithms is whether we are assuming that the ini t ia l matrices satisfy (XQ, YO) G C or not. The difference between short-step, and long-step algorithms comes down to a difference in the strategies used for choosing the center-ing parameter a and the step length parameter c i n Algor i thm 3.2, while predictor-corrector algorithms are based on the framework of Algor i thm 3.3. Finally, we classify each algorithm according to which search direction it uses. The key issue wi th any interior-point path-following algorithm is that the Newton steps we take toward the central path are not guaranteed to land us directly on the central path; the most we can hope for is that we are "near" the central path i n each iteration. This idea of nearness is made concrete wi th the notion of neighbourhoods of the central path. First notice that we can state the central path as the set C = {(X,Y) G C++ I XY = fil, where fi = , (3.33) where we are assuming that C is defined as in Section 3.4. The two most common feasible neighbourhoods considered are the narrow neighbourhood, A/>( 7 ) := {(X,Y) G C++ | \\y/XYy/X - jiI\\F < 7 £ , where fi = ^p-) , (3.34) and the wide neighbourhood, J V -oo ( 7 ) == {(X,Y) e C++ | A m i n ( X Y ) > (1 - 7 ) A , where / i = <^> } , (3.35) where Xmm{XY) is the minimum eigenvalue of XY, and 7 G (0,1) is a parameter which determines the width of the neighbourhoods. Clearly, we can see that C C NF{I) and C C 52 Chapter 3. Interior-Point Methods and Algorithms jV-ood) for a l l 7 G (0,1). We can also consider the infeasible central path, which is defined as SC = {(X, Y) e Sp++ x Sp++ XY = fil, where fi = &jp-} . (3.36) The corresponding infeasible neighbourhoods are then SMF(-y) := {(X,Y) G Sp++ x Sp++ | \\VXYy/X- jiI\\F < 1ft, where fi = ™ } , (3.37) and Stf-ood) : = {(X,Y) G Sp++ x Sp++ | A m i n ( X Y ) > (1 - 7 ) / i , where fi = ^ } , (3.38) where 7 G (0,1). Now that we have a way of measuring nearness to the central path, we would like to determine strategies for choosing a and c so that each iterate is wi th in some neighbourhood, either narrow or wide, of the central path. Short-step algorithms are those which keep al l iterates wi th in a (feasible or infeasible) narrow neighbourhood, while those algorithms which keep al l iterates wi th in a (feasible or infeasible) wide neighbourhood are called long-step. 
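As a concrete illustration, the norm condition defining the narrow neighbourhood (3.34) is straightforward to test in MATLAB. The fragment below is an illustrative sketch only (the helper in_narrow_nbhd is not one of the Appendix B codes); it checks the inequality involving the symmetric square root of X, but it does not verify membership of (X, Y) in the feasible set C, which the definition also requires.

function in_nbhd = in_narrow_nbhd(X, Y, gamma)
% IN_NARROW_NBHD  Test || sqrt(X)*Y*sqrt(X) - mu*I ||_F <= gamma*mu, the
% condition of the narrow neighbourhood (3.34), where mu = <X,Y>/p.
p  = size(X, 1);
mu = trace(X*Y)/p;                          % normalized duality gap <X,Y>/p
Xh = sqrtm(X);                              % symmetric square root of X
in_nbhd = norm(Xh*Y*Xh - mu*eye(p), 'fro') <= gamma*mu;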
The predictor-corrector type algorithms use two narrow neighbourhoods of different sizes, and keep al l the predictor iterates i n the larger neighbourhood, and al l the corrector iterates in the smaller neighbourhood. Ko j ima , Shindoh, and Hara proposed the first algorithm for solving the S D L C P in [13]. It is a feasible short-step algorithm using the d u a l - H K M direction (3.23) (see [13, p. 115] and [26, p. 391]) and keeps al l iterates in the neighbourhood A / F ( 7 ) , where 7 is chosen from the interval (0, 0.1]. Beginning at a point (XO,YQ) G A O K T ) , they choose a = 1 — j/y/p in each iteration, and are able to take a full step, 6 = 1, while remaining in the neighbourhood of the central path. We now state the theorem from that paper which provides this result. T h e o r e m 3.5.1 ([13, T h m 8.1]) Let 7 G (0, 0.1]. Suppose that (X, Y) G JV>( 7 ) . In Algo-rithm 3.2, let a = 1 -l/y/p, ( A X , AY) be the dual-HKM direction (3.23), and 6 = 1. Then the next iterate, (X,Y), satisfies ( X , Y) = ( X , Y) + (AX, AY) G MF(j), and aft < < ( l - ^ (3-39) where p, = (X,Y)/p. What this tells us is that we are guaranteed to reduce the duality gap of our current iterate by at least a constant factor in each iteration, meaning that the duality gap of our iterates w i l l converge to zero at a linear rate. Moreover, it follows that, under the conditions of Theorem 3.5.1, this algorithm is globally convergent in the sense that, given any (XO,YQ) G jVj?( 7 ) , the infinite sequence {(XR,YR)} generated by Algor i thm 3.2 when TOL2 = 0 wi l l always 53 Chapter 3. Interior-Point Methods and Algorithms converge to the unique optimal solution (X,Y). Following the commentary that appears after Theorem 8.1 in [13], we let e > 0, and find that for every r = 0 , 1 , 2 , . . . we have (XR,YR) e A/>( 7 ) and (XR,YR) < (l - (XQ,Y0). To determine the number of iterations required in order to compute an e-approximate solution which satisfies (XR,YR) e C++ and (XT, YR) < e (XQ, Y0), we follow the proof of a general complexity theorem in [31, p. 61-62]. Let t ing r > l o g - , 7 £ we have iog(jrr,K> < r i o g ( i - J = j + iog(Xo,y0) < loge + l o g ( X 0 , Y o ) , which implies that (XR,YR) < e (XO,YQ). Therefore, we are able to compute an e-approximate solution in O (^/plogj) iterations, which implies that this feasible short-step algorithm of Ko j ima , Shindoh, and Hara is a fully polynomial-time approximation scheme. O f course, as is stated in the concluding remarks of [13], this feasible short-step algorithm is mainly of theoretical importance. It requires that we prepare an ini t ia l point (XQ, YQ) 6 MF(I) with 7 = 0.1, and even i f we know such an ini t ia l point, it would be more computationally efficient to use a smaller centering parameter cr and a larger step length 6. Indeed, i f p = 100, 7 = 0.1, e = 1 0 - 1 0 , and TOL2 = e {XQ,YQ), the above complexity result only guarantees that our algorithm w i l l terminate i n 2000 iterations, which is clearly unreasonable in practice. Other important interior-point algorithms designed for solving the S D L C P were developed by Monteiro and Tsuchiya [19], and Koj ima , Shida, and Shindoh [12]. In [19] the authors put forth two feasible algorithms, a short-step algorithm and a predictor-corrector algorithm, both based on an entire family of search directions called the K S H (Kojima-Shindo-Hara) family of directions, of which the d u a l - H K M direction is a member. 
Bo th of these algorithms are shown to be able to produce an e-approximate solution of an S D L C P in at most O (.y/plog J ) iterations. The algorithm considered in [12] is an infeasible predictor-corrector algorithm which uses the A H O direction. Using an infeasible wide neighbourhood similar to (3.38), but related to the equation | (XY + YX) = TI, the authors develop an algorithm wi th a sophisticated step length control rule that keeps a l l iterates wi th in this neighbourhood. Furthermore, they prove that their algorithm is globally convergent, and, given e > 0, must provide an e-approximate solution in a finite number of iterations; under certain conditions, this number of iterations can be determined as being O ^ l o g ^ . However, the true strength i n their algorithm lies in the fact that, under the assumption that the solution of the S D L C P satisfies the strict complementarity condition, X + Y y 0, 54 Chapter 3. Interior-Point Methods and Algorithms they are able to prove that a parameter which measures the optimality and feasibility of their iterates converges quadratically to zero. This is a fast moving field and many aspects of interior-point algorithms for the S D L C P , and i n particular for the S D P (2.22), continue to be studied. For example, many other search directions, and families of search directions, have been proposed which we have not mentioned here. Our goal has only been to obtain some theoretical convergence results for the interior-point methods we w i l l be using to solve our three semidefinite constrained least squares problems. We now turn our attention toward the computational aspects of interior-point algorithms for our problems, while we keep in mind the following statement made by Todd i n [27], where he observes that the intense study in the field of semidefinite optimization is partly due to " . . . great advances i n our ability to solve such problems efficiently i n theory and in practice (perhaps 'or' would be more appropriate: the most effective computational methods are not always provably efficient in theory, and vice versa)." 3.6 Implementation Issues The purpose of this section is to discuss the implementation of both a standard path-following algorithm and a predictor-corrector algorithm for solving each of our three semidefinite con-strained least squares problems. In each case, we have decided to use the A H O direction, but we w i l l discuss the advantages of using the H K M , d u a l - H K M , and N T directions at the end of this section. We use the following rule for computing the centering parameter a in each iteration of our standard path-following algorithm, where (X, Y) is our current iterate. "Vr-MX-QfKTO!* otherwise The idea is to wait to reduce the duality gap of our iterates by choosing a near 1 unt i l we are wi th in our desired tolerance of being feasible, after which we choose a close to 0 to make rapid progress toward the solution. Based the results reported in [2] and [29], we have decided to use the Mehrotra algorithm for our predictor-corrector algorithm. This algorithm is based on Algor i thm 3.3 wi th an adaptive formula for determining the centering parameter a i n Step 3. Firs t we must choose a step length parameter c G (0,1) and compute the step length 0 by equation (3.30) so that the predictor iterate, (X+,Y+) = (X,Y) + e(8X,8Y), satisfies X+,Y+ >- 0. 
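In code, this predictor step amounts to the same step length computation as in (3.29)-(3.31), applied to the predictor direction (dX, dY). A minimal sketch follows (the function name predictor_step and its organization are illustrative only and are not taken from the Appendix B M-files).

function [Xp, Yp, mu_plus] = predictor_step(X, Y, dX, dY, c)
% PREDICTOR_STEP  Damped predictor step of Algorithm 3.3.  The maximum
% feasible step along (dX, dY) is found from the generalized eigenvalue
% problems of (3.31), damped by the factor c, and capped at 1 as in (3.30).
p  = size(X, 1);
t1 = max(real(eig(-dX, X)));          % largest lambda with -dX*v = lambda*X*v
t2 = max(real(eig(-dY, Y)));
theta_max = 1/max([t1, t2, 0]);       % min of the two bounds in (3.29); Inf if neither is finite
theta = min(1, c*theta_max);
Xp = X + theta*dX;                    % predictor iterate X+
Yp = Y + theta*dY;                    % predictor iterate Y+
mu_plus = trace(Xp*Yp)/p;             % the predictor duality gap <X+,Y+>/p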
Lett ing jl+ = (X+ ,Y+)/p, the Mehrotra rule for defining a is A s is pointed out i n [21] wi th regards to Mehrotra's algorithm for linear programming, i f a lot of progress was made during the predictor step in reducing the duality gap of the current iterate, then / 2 + -C fi, which w i l l cause a -C 1. Having a small w i l l then allow for much progress to 55 Chapter 3. Interior-Point Methods and Algorithms be made during the corrector step as well. O n the other hand, i f not much progress is made during the predictor step, and fi+ ~ fi, then we w i l l have a « 1, which w i l l cause the corrector step to put more emphasis on centering the predictor iterate. We have chosen to implement the step length parameter c according to a recommendation of Todd et a l i n [29]. They observe that letting c = 0.98 in each iteration r is usually a good choice, but that choosing c adaptively according to c : = 0.9 +0.09 x 0 r _ i , (3.42) where 0 r -1 was the step length used i n the previous iteration (9Q = 1), can alleviate stagnation problems that sometimes occur when using a fixed step length parameter. For both the standard path-following algorithm and the predictor-corrector algorithm, most of the work of each iteration is in solving a linear system of the form (3.21) for the step direction. This amounts to rewriting the linear system i n the matrix-vector form using vec and the Kronecker product, or svec and the symmetric Kronecker product (see Appendix A ) , and then solving this fairly large system by factoring the 2p2 x 2p2 coefficient matrix. However, we w i l l see that there are ways to take advantage of the structure present in these linear systems and reduce the bulk of the work to that of factoring a p2 x p2 matrix. Moreover, it may be possible to avoid using vec or svec by solving a Lyapunov system directly (see [27, p. 547]), but we have not pursued this issue. Computing the predictor-corrector direction amounts to having to solve two different linear systems which have the same coefficient matrix. B y retaining the matrix factorization (for example, the L U or Cholesky factorizations) from the predictor step, we are able to compute the corrector step wi th much less effort. For each of our three least squares problems, we wi l l consider both the vec and svec versions of the linear system we must solve. In each case we wi l l propose a method that can be used to solve the resulting linear system more efficiently; there may be more efficient ways to solve these structured systems, especially i f the matrices involved are large and sparse, but we have not pursued this issue. For simplicity, we w i l l only examine the linear systems we must solve in order to obtain the A H O and d u a l - H K M directions for each of our least squares problems. The H K M and N T directions are similar. 3.6.1 C o m p u t i n g search d i rec t ions for the S D L S p r o b l e m Recall that when we stated the S D L S problem (2.16) as an S D L C P (2.28), we had defined M:Sn^>-Sn and Q G Sn by MX = \(ATAX + XATA), and Q = -l(ATB + BTA), so that A = MX + Q is the same as AT{AX -B) = Z , \{Z + ZT) = A. Using vec and the Kronecker product, we can write both of the linear systems for the A H O direction (3.18) and for the d u a l - H K M direction (3.23) in the form 56 Chapter 3. 
Interior-Point Methods and Algorithms Direction E F D A H O 1(1 ®A + A ® / ) \(I®X + X®I) TI - \(XA + AX) d u a l - H K M \(X~L ® A + A ® * " 1 ) J ® J TX~1 - A Table 3.1: E, F, and D i n equation (3.43) for the A H O and d u a l - H K M directions. Direction E F D A H O I®SX TI -\(XA + AX) d u a l - H K M X - 1 ® s A 7 ® s I TX'1 - A Table 3.2: E, F, and D i n equation (3.48) for the A H O and d u a l - H K M directions. ~\(I ®ATA + ATA®I) - J ® / E F where Z = AT(AX — B), and the matrices E, F, and D are as i n Table 3.1. Notice that while the coefficient matrix in the linear system (3.43) is 2n2 x 2n2, we are able to solve this system efficiently by mult iplying the first block row by F and adding it to the second block row, resulting in the n2 x n2 system M v e c ( A X ) = d (3.44) where M = lF(I®ATA + ATA®I) + E, (3.45) d = F v e c ( A - l(Z + ZT)) +vec(L>), (3.46) and then letting A A := l(ATAAX + AXATA) + \(Z + ZT)- A. (3.47) If we choose the d u a l - H K M direction which has F = I ® 7, then the corresponding M in equation (3.45) w i l l be an n2 x n2 symmetric and positive definite matrix (see Appendix A ) and we can solve (3.44) using the Cholesky factorization of Af . However, the A H O direction yields a matrix M that is, i n general, nonsymmetric; this implies that an L U factorization wi l l be necessary to solve (3.44) in this case. We could also use svec and the symmetric Kronecker product to write both of the linear systems, (3.18) and (3.23), in the form vec (AX) vec(AA) vec (A - KZ + Z1)) vec (D) (3.43) 57 Chapter 3. Interior-Point Methods and Algorithms I ®s ATA -I <g>s / E F where the matrices are defined i n Table 3.2. Th is time we can solve system (3.48) by solving Msvec(AX) = d (3.49) where M = F{I®SATA) + E, (3.50) d = F s v e c (A - \{Z + ZT)) +svec(D) , (3.51) and then letting A A be defined as in (3.47). Aga in , if we choose the d u a l - H K M direction, then M wi l l be an n x n symmetric positive definite matrix, where n := ^n(n + 1); otherwise, M w i l l be nonsymmetric for the A H O direction. svec(AX) svec(AA) svec (A - \{Z + Z1)) svec(Z>) (3.48) 3.6.2 C o m p u t i n g search d i rec t ions for the N S - S D L S p r o b l e m When we stated the N S - S D L S problem (2.19) as an S D L C P (2.28), we had defined M : Sn and Q <E Sn by MA = ^ ( G - ^ + A G " 1 ) , and Q = ^(G-1ATB + BTAG-1), where G = ATA e <5"+. W i t h M and Q defined in this way, the equation S = MA + Q is equivalent to AT(AX — B) = A , \{X + XT) = S. Furthermore, given X e Hrx", A e Sn, and letting S := \{X + XT), it is easy to see that i f and only if MAA - AS = S - MA - Q ATAAX - A A = A - AT(AX - B) and A S = \{&X + AXT). (3.52) (3.53) Indeed, to show the forward implication, we simply let A X := G " 1 [(A + A A ) + ATB] - X; the backward implication is straightforward. Therefore, we are justified in replacing equation (3.52) with the first equation in (3.53) when solving for the search direction (AA, AS), so long as we define A S := \{AX + AXT) after determining AX. Given the current iterate (X, A), where \{X + XT),A >- 0, we first let S := \{X + XT). Then we solve either the matrix-vector equation defined using vec, I ® ATA E - 7 ® / F vec(AX) vec(AA) vec (A - AT{AX - B)) vec(D) (3.54) 58 Chapter 3. Interior-Point Methods and Algorithms Direction E F D A H O \{I ®A + A®I)V \{I ®S + S®I) T I - \{SA + A S ) d u a l - H K M \{S~L ® A + A®S~1)V I®I TS-1 - A Table 3.3: E, F, and D in equation (3.54) for the A H O and d u a l - H K M directions. 
Direction E F D A H O [I®S A)UTV I ® S S TI-1(SA + AS) d u a l - H K M ( S " 1 <g>s A)UTV I ® S I TS'1 - A Table 3.4: E, F, and D in equation (3.55) for the A H O and d u a l - H K M directions, or the matrix-vector equation defined using svec for AA £ Sn, while s t i l l using vec for AX G I®ATA -U 'vec(AX)' vec (A - AT(AX - B))' E F svec(AA.) svec(D) (3.55) where U is defined as the matrix in equation (A.2). In the linear system (3.54), we have the matrices E, F, and D defined as in Table 3.3, while E, F, and D are defined as in Table 3.4 for the linear system (3.55). Note that V is the n 2 x n2 matrix which satisfies Vvec(AX) = vec Q ( A X + AXT)) ; thus UTVvec(AX) - svec(AS). In order to solve (3.54) more efficiently, we can solve M v e c ( A X ) = d , (3.56) where M = F{I®ATA) + E, (3.57) d = -Fvec (A — AT(AX - B)) + vec(D), (3.58) and let A A := ATAAX + AT(AX - B) - A. (3.59) Alternately, we could solve (3.55) by solving system (3.56) wi th M = I®ATA + UF-1E, (3.60) d = vec(A-AT{AX-B))+UF-1svec{D), (3.61) 59 Chapter 3. Interior-Point Methods and Algorithms and then let A A be defined as in (3.59). Notice that M is an n 2 x n 2 matrix in both (3.57) and (3.60), and so no advantage is obtained by solving (3.55) rather than (3.54). Furthermore, the d u a l - H K M direction does not appear to give a symmetric positive definite M in either (3.57) or (3.60), which means that we wi l l have to resort to using the L U decomposition for both the A H O and d u a l - H K M directions. 3.6.3 C o m p u t i n g search d i rec t ions for the L M I - L S p r o b l e m We stated the L M I - L S problem (2.21) as an S D L C P (2.28) by defining M : Sk -> Sk and Q G Sk as MA = / C G " 1 / C * A , and Q = C-KG~XAT\ where G = ATA G S l + and K : H T -> Sk and K* : Sk -> IR" are defined by ICx = ^^XiKi, 1C*A = (Ki,A) i = 1 [(Kn,A)_ for some matrices K\,...,Kn G Sk. In this way, we have that S = MA + Q is equivalent to AT(Ax - b) + 1C*A = 0, Kx + S = C. Also, as before, i f we are given x G IR™ and A , S G Sk, we have that . M A A -AS = S-MA-Q i f and only i f ATAAx + /C*AA = -K*A - AT{Ax - 6), /CAc + A S = C-ICx-S. Again, this is easy to see; the forward implication follows by defining A E := G~l [ATb - K* (A + AA)] - x, while the backward implication is immediate. Therefore, in order to solve for the search direction given the current iterate (x, A, S), where A , S >- 0, we can solve either the linear system defined wi th vec, (3.62) ~ATA KT 0 ' A E '-K*A - AT{Ax - b)~ K 0 J ® J v e c ( A A ) = vec(C — tCx — S) 0 E F v e c ( A S ) where E, F, and D are as in Table 3.5, or we can solve the linear system defined wi th svec, 60 Chapter 3. Interior-Point Methods and Algorithms Direction E F D A H O \{I®S + S®I) \{I®K + K®I) T I - i ( A S + S A ) d u a l - H K M i ( A - J ® S + S® A " 1 ) I® I r A " 1 - 5 Table 3.5: E, F, and £> in equation (3.62) for the A H O and d u a l - H K M directions. Direction E F D A H O I®SS I ®S A T I - i ( A 5 + 5 A ) d u a l - H K M A " 1 ®s S I ® S I r A - 1 - 5 Table 3.6: E, F, and £> in equation (3.63) for the A H O and d u a l - H K M directions. ~ATA KT 0 " Ax ~-K.*A - AT{Ax - b)~ K 0 I®S I svec(AA.) = svec(C7 — K.x — S) 0 E F _svec(AS')_ svec(.D) (3.63) wi th E, F, and D as in Table 3.6. Furthermore, the K in equation (3.62) is defined as K = [vec(Ki) ••• vec(Kn)} (3.64) so that vec (ICAx) = KAx, and AC*AA = KT vec(AA), while the K i n equation (3.63) is defined as K = [svec(Ki) • • • s vec ( i i „ ) ] (3.65) so that svec {ICAx) = KAx, and AC*AA = KT svec(AA). 
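These identities are easy to verify numerically. The following illustrative MATLAB fragment builds the matrix K of (3.64) from randomly generated symmetric matrices K_1, ..., K_n and checks that the vec of the operator value (the sum of the x_i*K_i) equals K*x, and that the adjoint value equals K'*vec(Lambda); the svec version (3.65) can be checked in the same way, with the scaling by sqrt(2) on the off-diagonal entries.

n = 3;  k = 4;
Ks = cell(n, 1);
for i = 1:n
    T = randn(k);  Ks{i} = (T + T')/2;        % random symmetric K_i
end
K = zeros(k^2, n);
for i = 1:n
    K(:, i) = reshape(Ks{i}, k^2, 1);          % K = [vec(K_1) ... vec(K_n)]
end
x = randn(n, 1);
Lambda = randn(k);  Lambda = (Lambda + Lambda')/2;
Kx = zeros(k);
for i = 1:n
    Kx = Kx + x(i)*Ks{i};                      % the operator applied to x
end
norm(reshape(Kx, k^2, 1) - K*x)                % ~ 0: vec of the operator value = K*x
Kstar = zeros(n, 1);
for i = 1:n
    Kstar(i) = sum(sum(Ks{i}.*Lambda));        % <K_i, Lambda>
end
norm(Kstar - K'*reshape(Lambda, k^2, 1))       % ~ 0: the adjoint identity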
Following [21, p. 408], we can simplify the linear system (3.62) by eliminating A S , producing the equivalent system ATA KT " Ax K -F~lE_ vec(AA) rp - F " 1 v e c ( i ? ) vec(AS) rp - KAc, where rd = -K*K- AT{Ax-b), rp = vec(C — Kx — S). 61 Chapter 3. Interior-Point Methods and Algorithms We can further reduce this system by eliminating A A , which gives us (ATA + KTE~1FK)Ax = rd + KTE-lrq, (3.66) vec(AA) = E^iFK&c-rq), (3.67) vec(AS') = rp-KAx, (3.68) where rq = Frp - vec(.D). We note here that these equations (3.66)-(3.68) are identical to those for S D P in [2] where they state that computing the step direction in this manner is more stable than other methods considered. A similar discussion also holds for the linear system (3.63). A t this point we could use the Cholesky factorization on the n x n matrix in equation (3.66), provided that E~XF is symmetric positive definite, and that K has full-column rank (i.e., the matrices Ki,..., Kn are linearly independent). For example, it is easy to see that the H K M direction would have E~lF symmetric positive definite, due to the fact that E = I®I, and F= i ^ S " 1 <g> A + A <g> S ^ 1 ) . Of course, i f the matrix in (3.66) is not symmetric, we can always use the L U decomposition to solve for A E . 3.6.4 Differences i n the search d i rec t ions We have already mentioned one significant difference among the different search directions we are considering i n Theorem 3.4.1. This was the fact that it is possible that the A H O direction w i l l not exist at certain interior points, although, as stated in [27, p. 547], this does not seem to cause difficulties in practice. O n the other hand, by the same theorem, we also know that the d u a l - H K M and N T search directions always exist. Al though Theorem 3.4.1 would seem to imply that we would prefer to use the d u a l - H K M or N T directions over the A H O direction, this is not necessarily the case. A s is noticed in both [2] and [29], performing numerical experiments wi th path-following algorithms for solving S D P s (2.22) using the A H O , H K M , and N T directions shows that the A H O direction gives rise to a method that • can achieve higher accuracy i n the duality gap, and • converges in fewer iterations, than the methods using the H K M and N T directions. We can see why the A H O direction might produce a more stable method for our problems from the fact that the linear system defining this direction is actually defined at the optimal solution, which is a feature that the H K M , d u a l - H K M , and N T directions do not possess. See also [2] for more stability analysis of the A H O direction. However, as we also observed for the S D L S and L M I - L S problems, the main disadvantage of using the A H O direction for solving S D P problems is that each iteration of the method w i l l be 62 Chapter 3. Interior-Point Methods and Algorithms about twice as expensive as each iteration of the method using either the H K M or N T directions [29, p. 790]. This is due to the fact that there is some symmetry to be taken advantage of when solving for the H K M and N T directions for an S D P problem; this symmetry does not exist when solving for the A H O direction. It turns out that, in terms of computer running time, using the H K M direction gives rise to the fastest method for reducing the duality gap by a factor of 10 1 0 for S D P s (2.22) (see [29, p. 791]). 
This point is further reinforced by the fact that one of the more widely used codes for solving SDPs , S D P T 3 [28], has removed the option of using the A H O direction due to it being inefficient or under-used, while retaining the options of using the H K M and N T directions. Moreover, as was mentioned before, H K M is currently the direction of choice for large S D P problems [22]. A l l things considered, we have decided to use the A H O direction in our experiments, pri-marily due to the fact that we are mostly interested in solving only small scale problems. We also find the stability properties and the simplicity of implementing the A H O direction very appealing. Indeed, we see in [2] how care must be observed when implementing the H K M direction for S D P . 63 Chapter 4 Numerical Results The purpose of this chapter is to obtain some information about how the standard path-following (SPF) algorithm (Figure 3.2) and the predictor-corrector (PC) algorithm (Figure 3.3) perform in practice on each of the S D L S (2.16), N S - S D L S (2.19), and L M I - L S (2.21) problems; this gives us a total of six algorithms for our numerical experiments. We w i l l also compare Woodgate's Algor i thm [34] to the P C algorithm for the S D L S problem. A l l of our experiments were conducted i n M A T L A B [17]. A listing of the M A T L A B M-files used to produce these results can be found in Appendix B . Before reporting any results, we w i l l first explain some of the details of these codes and the methods of experimentation used. 4.1 I m p l e m e n t a t i o n a n d e x p e r i m e n t a t i o n d e t a i l s The implementation details that need to be discussed are: • Which search direction are we using and how are we solving for it? • How are we computing the maximum step length? • What have we chosen for the starting points? • What is the stopping criteria? • How have we implemented Woodgate's Algori thm? • How are we generating problem instances for our numerical experiments? We w i l l now discuss each of these questions in order. 4.1.1 S o l v i n g for the search d i r e c t i o n In each of our six M A T L A B codes we have used the A H O direction (3.18). We have decided to solve for this direction using vec and the standard Kronecker product (Appendix A ) ; this is due to the fact that there was too much overhead computation in our implementations using svec and the symmetric Kronecker product to give any advantage in solving a smaller linear system. Since we are using vec to solve for the A H O direction, we use the equations (3.44-3.47) and (3.56-3.59) to solve for this direction for the S D L S and N S - S D L S problems, respectively. 64 Chapter 4. Numerical Results However, we can improve on these equations as follows. Since we are using the A H O direction, we can write (3.45) and (3.46) as M = I [{I®XATA) + (ATA®X) + (X® ATA) + (XATA®I)} + E, (4.1) d = vec{rl- \(XZ + ZX)), (4.2) where Z = AT(AX — B). We can easily see that this is a more efficient way to compute M because the sum of four n2 x n2 matrices is cheaper to compute than the product of two n2 x n2 matrices. Similarly, when using the A H O direction, we can write (3.57) as M = § [(I ® SATA) + {S® ATA)} + E, (4.3) d = vec{rl-\{SR + RS + SA + AS)), (4.4) where R = Z — A and Z = AT(AX — B). We do not simplify the expression for d in (4.4) any further due to a loss of stability that was noticed in numerical experiments. 
In fact, replacing (4.4) wi th d = v e c ( T J - \(SZ + ZS)), results i n a loss in feasibility of the iterates as the duality gap goes to zero. A similar phe-nomenon was also noticed for S D P in [2, p. 757]. For the L M I - L S problem, our implementation using equations (3.66-3.68) did not produce the efficiency we were hoping for; therefore, we chose to solve (3.62) directly for the A H O direction. 4.1.2 Comput ing the maximum step length When solving the generalized eigenvalue problems in the computation of the maximum step, we have found that computing every eigenvalue and taking the maximum one is often less expensive than using an iterative method for just computing the maximum eigenvalue. In our l imited experimentation wi th sparsity, we found that it is advantageous to use an iterative method when dealing wi th sparse matrices, but in the case of dense matrices, it was more efficient to just solve these generalized eigenvalue problems directly. 4.1.3 Starting points The starting points we choose for a l l our experiments is as follows. • S D L S / N S - S D L S : X0 = I and A 0 = I • L M I - L S : x0 = 0, S0 = I, and A 0 = I. For the S D L S and N S - S D L S problems, I is the nxn identity matrix; for the L M I - L S problem, 0 is the zero vector of length n and I is the k x k identity matrix. 4.1.4 Stopping criteria We use the following stopping criteria: 65 Chapter 4. Numerical Results (i) Stop when ||res|| < TOL\ and p. < TOL2-Here we have the norm of the residual defined as res := \1{Z + ZT)-A\\F, for S D L S , \\Z-A\\F, for N S - S D L S , where Z = AT{AX -B). Recall that the normalized duality gap is defined as p, := where ( X , A ) / n , for S D L S , <S,A)/n, for N S - S D L S , s = Ux + xT). For the L M I - L S problem, the norm of the residual and the normalized duality gap are defined as ||res|| := rd (S,A)/k, where rd = -IC*A-AT(Ax-b), rp — vec(c7 — Kx — S). Furthermore, we set TOLi := Ve| | res 0 | | , TOL2 := eAo, where ||reso|| and /IQ are the norm of the residual and the normalized duality gap at the ini t ia l iterate, and e > 0 is a given tolerance. Notice that, given the starting points above, we w i l l always have /} = 1, and so TOL2 = e. For the most part, we w i l l be using e = 1 0 ~ 1 0 except when investigating the performance of Woodgate's Algor i thm and the predictor-corrector algorithm on different tolerances. (ii) Stop when we have exceeded the maximum number of iterations, Maxlt. For a l l six algorithms, we have set Maxlt = 100. If we have reached our maximum number of iterations without meeting our desired tolerances, our algorithm w i l l report a failure. Failed attempts w i l l be reported but not included in our averages. 66 Chapter 4. Numerical Results 4.1.5 Woodgate's A lgor i thm For the details of Woodgate's Algor i thm, please refer to [34]. However, we would like to mention here the ini t ia l iterate and stopping conditions we w i l l be using in our implementation of this algorithm. We have decided to use the recommended starting point from [34], which is the matrix X0 = ai(o), where X is the solution to the symmetric unconstrained least squares problem (1.3), X^ is the result of the result of changing all of the negative eigenvalues of X to 0 in the eigenvalue decomposition of X, and a is some nonnegative scalar. 
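For completeness, this starting point is a simple eigenvalue projection. A small illustrative helper is given below (the name woodgate_start is for illustration only; the thesis implementation of Woodgate's Algorithm in Appendix B may differ in detail).

function X0 = woodgate_start(Xhat, alpha)
% WOODGATE_START  Starting point X0 = alpha*Xhat_(0) for Woodgate's
% Algorithm: replace the negative eigenvalues of the symmetric
% unconstrained least squares solution Xhat by zero, then scale by the
% nonnegative parameter alpha.
[V, D] = eig((Xhat + Xhat')/2);    % symmetric eigenvalue decomposition
D = max(D, 0);                     % change all negative eigenvalues to 0
X0 = alpha*(V*D*V');               % X0 = alpha * Xhat_(0)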
Since a stopping criterion for approximate solutions was not provided in [34], we have decided to stop iterating Woodgate's Algor i thm when ||res|| < TOL\ and fi < TOL2, or when we have exceeded the maximum number of iterations, Maxlt. This time, we have ||res|| = \\L(X){0) - L(X)\\F, fi = (X,L(X)(0))/n, Maxlt = 100, where L(X) := ATAX + XATA - ATB - BTA. Moreover, we set the tolerances as TOLi := y/i, TOL2 := e. This was done in order to make a fair comparison wi th the predictor-corrector algorithm for the S D L S problem, given the fact that, in general, fiQ ^ 1 for the ini t ia l iterate XQ above. One more difference should be noted between Woodgate's Algor i thm from [34] and our implementation of this algorithm. In each iteration, it is necessary to solve a linear system which has an n2 x n2 symmetric coefficient matrix, M, that is sometimes near singular. In order to avoid this near singularity, we modified M by adding vi, for some small positive parameter v. We were surprised to find that this modification greatly improved the convergence of this algorithm! After some experimentation, we decided to use v = 1 /n 2 . 4.1.6 Generating problem instances For a l l of our experiments, we have chosen to report each result averaged over 10 randomly generated instances of our problems. We randomly generate the matrices A and B for the S D L S and N S - S D L S problems, and the vector b and the matrices A, Ki,... ,Kn, and C for the L M I - L S problem by choosing their entries uniformly from the interval [—1,1]. We have chosen the dimensions of our problems to satisfy m > n and n > ^k(k + 1) so that A has full column rank in al l three problems, and s p a n { K T i , . . . , Kn} = Sk in the L M I - L S problem (see Corollary 2.4.10). 67 Chapter 4. Numerical Results 4.2 Numerical comparison of the algorithms Our experimentation comes in three parts. Firs t of a l l , we compare the S P F algorithm wi th the P C algorithm for each of our three semidefinite constrained least squares problems. We then compare the P C algorithm for solving the S D L S problem to Woodgate's Algor i thm. Final ly, we look at each of these algorithms in action on a typical instance of each problem. 4.2.1 T h e S P F / P C expe r imen t We compare the performance of the S P F algorithm wi th the P C algorithm for the S D L S , N S -S D L S , and L M I - L S problems. This is done by determining how the average number of iterations and amount of C P U time required to compute an e-approximate solution, where e = 10~ 1 0 , and the resulting average feasibility of this solution - measured by the norm of the residual of the solution - each depend on the size of the problem. These averages are based on 10 randomly generated problem instances for each problem size. The results of this experiment are shown in Table 4.1. We have plotted the results from Table 4.1 in Figure 4.1 for the S D L S problem, in Figure 4.2 for the N S - S D L S problem, and in Figure 4.3 for the L M I - L S problem. Based on these results, we make the following observations. 1. The P C algorithm is the clear winner in each of the three problems when we consider the C P U time required to reduce the duality gap by a factor of 1 0 1 0 . Even though each iteration of the P C algorithm is more expensive than each iteration of the S P F algorithm, we find that the overall C P U time for the P C algorithm is much less that of the S P F algorithm. 
This is due to the fact that, on average, the P C algorithm converges i n far fewer iterations than the S P F algorithm. 2. O n the other hand, we notice that the S P F algorithm is better suited to producing an approximate solution that is closer to being feasible. This is partly due to the strategy we have used for choosing the centering parameter u (3.40) in the S P F algorithm. In fact, we find that while the P C algorithm tends to reduce the norm of the residual gradually, the S P F algorithm achieves feasibility once we are able to take a step length of 9 = 1. 3. We observe that the N S - S D L S algorithms seem to require much more C P U time than their corresponding S D L S algorithms. This difference can be explained by the need to mult iplying an two n2 x n2 matrices each iteration in order to compute the matrix E = \{I®K + k®I)V. 4. Al though we were successful in approximating the solution the S D L S and N S - S D L S prob-lems wi th a tolerance of 1 0 ~ 1 0 in each attempt (in fact, we were able to perform a Cholesky factorization on these approximations, proving that they were numerically positive defi-nite), we encountered a few higher order problems for which the S P F algorithm for the L M I - L S problem stagnated (we exceeded our maximum number of iterations). However, as the results show, the P C algorithm for the L M I - L S problem was successful in each attempt. 5. The rapid growth in the C P U time required to solve these problems is to be expected due to the fact that in each iteration we are performing an L U decomposition of an n 2 x n 2 68 Chapter 4. Numerical Results (or (n + 2k2) x (n + 2k2) for the L M I - L S problem) matrix. The C P U time required for this L U decomposition is itself proportional to n 6 (or (n + 2fc 2) 3 for the L M I - L S problem). 4.2.2 The Woodgate / P C experiment Now we would like to compare Woodgate's Algor i thm to the P C algorithm for solving the S D L S problem. In order to do this, we have chosen to compare their ability to achieve a variety of tolerances on both small instances (n = 5) and large instances (n = 30). The results averaged over 10 random instances for n = 5 and over 10 random instances for n = 30 are shown in Table 4.2; the corresponding plots are presented in Figure 4.4. From this experiment we reach the same conclusion as in [34] that Woodgate's Algo-r i thm is able to converge very quickly to an approximate solution of the S D L S problem for small instances wi th limited tolerances. However, for larger problems and smaller tolerances, Woodgate's Algor i thm is much slower to converge than the P C algorithm, both in terms of number of iterations and C P U time required. In fact, as we see for n = 30 and e < 1 0 - 4 , Woodgate's Algor i thm sometimes exceeds the maximum number of iterations, 100, before be-ing able to achieve the desired tolerance. 4.2.3 A final comparison The purpose of this section is to take a typical medium sized random instance of each of the S D L S , N S - S D L S , and L M I - L S problems, and investigate the convergence properties of each of the algorithms considered in this thesis. We can see how each algorithm converges to the optimal solution by watching how the duality gap goes to zero i n the duality gap convergence curves shown in Figure 4.5. These convergence curves again indicate the rapid convergence of the P C algorithm for each problem. 
A t the same time we see how the S P F algorithm has a fairly level convergence curve while the centering parameter a is s t i l l near 1, and then converges fairly rapidly for the S D L S and N S - S D L S problems when a is chosen near 0. However, we also witness the ability for the S P F algorithm to stagnate slightly when solving an L M I - L S instance. Woodgate's Algor i thm is seen to make slow but steady progress toward the solution. 4.3 E s t i m a t i n g the compliance m a t r i x We now return to the application that was mentioned in Chapter 1 about estimating the compliance matrix at a certain location on a deformable object. Given a number of displacement measurements, u1 • • • ul G IR 3 , corresponding to some forces, p1 • • -pl G IR 3 , our aim is to find a matrix X which is the least squares solution of the equation \P1...plfXT = [ui...ul}T (4.5) subject to the constraint that the real parts of the eigenvalues of X are nonnegative; i.e., that \{X + XT) y 0. Notice that we can write (4.5) as AXT = B. 69 Chapter 4. Numerical Results S D L S # iterations C P U time (sec) log 1 0(| |res| |) m n S P F P C S P F P C S P F P C 20 5 11.4 7.4 0.02 0.02 -15.2 -10.6 40 10 11.2 8.1 0.06 0.05 -14.6 -10.8 60 15 11.5 8.5 0.39 0.31 -14.4 -10.1 80 20 12.4 9.1 1.38 1.05 -14.2 -10.1 100 25 13.7 9.3 4.06 2.84 -14.0 -10.7 120 30 13.8 9.2 9.61 6.57 -13.8 -10.3 140 35 15.9 9.6 23.85 14.70 -13.7 -10.4 160 40 16.1 9.6 47.95 28.95 -13.6 -10.1 N S - S D L S # iterations C P U time (sec) l o g i o ( l l r e s l l ) m n S P F P C S P F P C S P F P C 20 5 11.2 7.2 0.02 0.01 -15.0 -10.4 40 10 11.4 8.4 0.07 0.06 -12.5 -10.9 60 15 12.2 8.9 0.50 0.39 -12.2 -10.4 80 20 12.4 9.1 2.06 1.57 -11.0 -10.1 100 25 12.7 9.1 6.72 4.94 -11.0 -10.4 120 30 13.4 9.1 18.91 13.15 -10.7 -10.1 140 35 13.9 9.5 47.09 32.84 -9.8 -9.8 160 40 14.8 9.6 106.99 70.61 -9.0 -9.4 L M I - L S % success # iterations C P U time (sec) log 1 0(| |res| |) m n k S P F P C S P F P C S P F P C S P F P C 40 20 5 100 100 11.3 7.7 0.03 0.02 -14.3 -10.3 120 60 10 100 100 13.5 8.3 0.30 0.20 -13.6 -10.2 300 150 15 100 100 17.3 8.3 2.97 1.45 -13.0 -9.5 500 250 20 100 100 26.3 8.7 19.38 6.54 -12.7 -9.8 700 350 25 90 100 32.1 8.9 75.85 20.96 -12.4 -9.6 1000 500 30 70 100 45.1 9.2 289.44 60.86 -12.2 -10.3 Table 4.1: Results from computing approximate solutions of the S D L S , N S - S D L S , and L M I - L S problems with fi < 10~ 1 0 , where fi is the normalized duality gap. The S P F algorithm and the P C algorithm are used. Results averaged over 10 randomly generated problems for each problem size. Failures are not included in the averages. 70 Chapter 4. Numerical Results S D L S 5 10 15 20 25 30 35 40 (a) Average # iterations vs problem size n Figure 4.1: The plots of the data from Table 4.1 for the S D L S problem. The solid line sponds to the S P F algorithm, while the dashed line corresponds to the P C algorithm. 71 Chapter 4. Numerical Results N S - S D L S (a) Average # iterations vs problem size n Figure 4.2: The plots of the data from Table 4.1 for the N S - S D L S problem. The solid line corresponds to the S P F algorithm, while the dashed line corresponds to the P C algorithm. 72 Chapter 4. Numerical Results L M I - L S ,i , , , , 1 5 10 15 20 25 30 (a) Average # iterations vs problem size k 300 (c) Average log 1 0(| |res| |) vs problem size k Figure 4.3: The plots of the data from Table 4.1 for the L M I - L S problem. The solid line corresponds to the S P F algorithm, while the dashed line corresponds to the P C algorithm. 73 Chapter 4. 
Numerical Results S D L S % success # iterations C P U time (sec) !ogio(e) W G P C W G P C W G P C -2 100 100 0.6 3.0 0.01 0.01 -4 100 100 2.2 4.3 0.02 0.01 -6 100 100 3.9 5.3 0.04 0.01 -8 100 100 6.1 6.3 0.07 0.01 -10 100 100 9.4 7.3 0.10 0.02 (a) m = 20, n = 5 S D L S % success # iterations C P U time (sec) logio(e) W G P C W G P C W G P C -2 100 100 7.9 5.0 12.21 3.57 -4 80 100 10.1 6.0 15.52 4.29 -6 80 100 19.4 7.3 29.27 5.22 -8 80 100 31.3 8.3 46.91 5.94 -10 80 100 42.8 9.3 64.02 6.66 (b) m = 120, n = 30 Table 4.2: Results from computing approximate solutions of the S D L S problem wi th normalized duality gap p, < e, for e = 1 0 - 2 , . . . , 1 0 - 1 0 . Woodgate's Algor i thm (WG) [34] and the P C algorithm are used. Results averaged over 10 randomly generated problems. Failures are not included in the averages. 74 Chapter 4. Numerical Results Woodgate / Predictor-Corrector comparison 10 (a) - 4 - 6 - 8 # iterations vs log 1 0 (e) (TO = 20, n = 5) (b) C P U time vs log 1 0 (e) (TO = 20, n = 5) - 8 - 9 - 1 0 (c) # iterations vs log 1 0 (e) (TO = 120, n = 30) (d) C P U time vs log 1 0 (e) (TO = 120, n = 30) Figure 4.4: The plots of the data from Table 4.2. The dashed line corresponds to the P C algorithm, while the dotted line corresponds to Woodgate's Algor i thm [34]. 75 Chapter 4. Numerical Results Duali ty gap convergence curves (a) S D L S (m = 80, n = 20, e = 1 0 " 1 0 ) (b) N S - S D L S (m = 80, n = 20, e = l O " 1 0 ) 0 5 10 15 20 25 30 (c) L M I - L S (m = 500, n = 250, A: = 20, e = 1 Q - 1 0 ) Figure 4.5: Dual i ty gap convergence curves of typical random S D L S , N S - S D L S , and L M I - L S problems. The solid line corresponds to the S P F algorithm, the dashed line corresponds to the P C algorithm, and the dotted line corresponds to Woodgate's Algor i thm [34]. 76 Chapter 4. Numerical Results In [14], the following ten force and displacement measurements were made from the nose of a stuffed toy tiger. » A A = - 0 3157 0 0330 0 0603 - 0 3274 - 0 0158 0 0625 - 0 3569 0 0787 0 0563 - 0 2994 0 0301 0 0496 - 0 3243 - 0 0048 0 0715 - 0 3447 0 0736 0 0545 - 0 2417 0 0709 0 0522 - 0 2063 - 0 0099 0 0233 - 0 3285 0 1585 0 0979 - 0 2484 0 0878 0 0622 - 0 2196 0 0023 0 0280 - 0 3148 0 1506 0 0922 B -1 4257 0 1528 - 0 4398 -1 4024 - 0 3092 - 0 4187 -1 3766 0 4366 - 0 4197 -1 4274 0 1424 - 0 4353 -1 3994 - 0 3095 - 0 4206 -1 3716 0 4285 - 0 4193 -1 .4269 0 1581 - 0 4335 -1 .4015 - 0 3229 - 0 4214 -1 .3767 0 4189 - 0 4333 -1 .4257 0 1515 - 0 4358 -1 .3989 - 0 3276 - 0 4217 -1 .3724 0 4154 - 0 4356 A s we can see, the regular unconstrained least squares solution, X, which minimizes | | A X " T — B\\p over al l possible 3 x 3 matrices X, does not satisfy our positive semidefinite requirement. » Xhat = ( A \ B ) ' Xhat = 5.1595 0.3075 2.3185 77 Chapter 4. Numerical Results -0.8348 6.2621 -8.1377 1.5400 -0.0070 0.6169 » norm(A*Xhat'-B,'fro') ans = 0.9805 » eig(Xhat) ans = -0.1683 6.1034 + 0.8875i 6.1034 - 0.8875i » eig((Xhat+Xhat')/2) ans = -1.8795 5.1456 8.7725 Using the P C algorithm to compute an e-approximate solution to the N S - S D L S problem, with e = 1 0 - 1 6 , we obtain a matrix X which satisfies \{X-\-XT) >z 0 and minimizes | | j 4 X r — B \ \ F over a l l 3 x 3 matrices X which satisfy + XT) >z 0. 
» [X,Y] = ns_sdls_precorr(A,B,[], [],le-16,l); Xbar = X' r sigma theta norm(res) <S,Y>/n 0 5 4503e+00 1 0000e+00 1 1 1534e -01 8 0257e-01 1 0761e+00 1 1770e+00 2 9 2476e -02 8 8364e-01 1 2521e-01 1 7550e-01 3 1 4385e -01 9 0385e-01 1 2039e-02 3 7142e-02 4 2 7082e -02 1 0000e+00 8 9038e-16 3 1443e-03 5 1 5178e -02 1 0000e+00 6 4673e-16 2 1483e-04 6 5 1198e -04 9 9198e-01 8 9622e-16 2 4720e-06 7 1 5787e -06 9 8928e-01 8 0999e-16 2 6492e-08 8 1 3218e -06 9 8904e-01 4 0358e-16 2 9050e-10 9 1 3263e -06 9 8901e-01 5 1891e-16 3 1919e-12 10 1 3270e -06 9 8901e-01 8 8842e-16 3 5076e-14 11 1 3460e -06 9 8895e-01 6 6974e-16 3 8598e-16 12 4 1771e -06 9 8768e-01 5 5102e-16 6 2161e-18 78 Chapter 4. Numerical Results Average iteration time: 0.0030 seconds Total time: 0.0486 seconds Xbar = 5.0392 0.4423 1.5978 -0.6207 6.0223 -6.8559 1.8979 -0.4079 2.7600 » norm(A*Xbar'-B,'fro') ans = 0.9859 » eig(Xbar) ans = 1.1481 6.3367 + 0.7865i 6.3367 - 0.7865i » eig((Xbar+Xbar')/2) ans = 0.0000 5.1401 8.6813 A s we see, the eigenvalues of X have nonnegative real parts and the symmetric part of X, \{X + XT), is positive semidefinite. Furthermore, the fact that neither \\X — X\\p nor H ^ X 7 ^ — B\\p — | | ^ 4 X r — B\\p is large indicates that the unconstrained least squares solution, X, was near the semidefinite constrained least squares solution, X; this result is to be expected due to the physical nature of this problem. » norm(Xbar-Xhat,'fro') ans = 2.6795 » norm(A*Xbar'-B,'fro') - norm(A*Xhat'-B,'fro') ans = 0.0055 79 Chapter 5 Conclusions and Future Work 5.1 Conclusions The purpose of this thesis was to develop numerical methods for solving three semidefinite constrained least squares problems: the symmetric semidefinite least squares (SDLS) problem (1.3), the nonsymmetric semidefinite least squares (NS-SDLS) problem (1.4), and the linear matrix inequality least squares ( L M I - L S ) problem (1.5). Al though the S D L S problem had been previously studied, we found that neither the N S - S D L S nor the L M I - L S problems appear to have been proposed before. Our inspiration for studying these problems came from the need to solve the N S - S D L S problem for the purpose of compliance matrix estimation i n the modeling of deformable objects. The numerical methods we chose to use to solve each of these problems were the standard path-following (SPF) and predictor-corrector (PC) interior-point methods which have been used to solve the related semidefinite programming (SDP) and nonnegative least squares (NNLS) problems. Whi le algorithms have already been proposed for the numerical solution of the S D L S problem, namely Woodgate's Algor i thm [34], this appears to be the first time these interior-point methods have been used for this problem. We presented a uniform discussion of the K K T conditions which characterize the solutions of each of the three semidefinite constrained least squares problems by developing these con-ditions for self-dual cone constrained convex programming problems in the general setting of Euclidean spaces. Using an important theorem due to Weierstrass, we were able to show that if the coefficient matrix A has full column rank then the S D L S and N S - S D L S problems have unique optimal solutions, and, under the additional assumption that the set of feasible points is nonempty, that the same holds true for the L M I - L S problem. 
The key discovery made in this thesis was that each of the three problems under considera-tion are actually instances of a more general problem, the semidefinite linear complementarity problem ( S D L C P ) . This connection to the S D L C P then assisted us in providing a uniform discussion of the interior-point methods we were considering. After discussing some known theoretical convergence results for interior-point methods applied to the S D L C P , we went on to describe techniques used for the numerical implementation of these interior-point methods for each of our three least squares problems. Based on the numerical experiments performed in MATLAB wi th the S P F and P C algorithms using the A H O search direction, it was determined that the P C algorithm was the most efficient for a l l three problems. Moreover, the P C algorithm for the S D L S problem was seen to be an 80 Chapter 5. Conclusions and Future Work improvement over the best current algorithm for solving this problem, Woodgate's Algor i thm. We also saw how the P C algorithm for the N S - S D L S problem was able to efficiently compute an accurate estimate of the compliance matrix at a certain location on a deformable object that satisfied the required positive semidefinite constraint. In conclusion, this thesis has made three important contributions. The first is the intro-duction and study of two new semidefinite constrained least squares problems, the N S - S D L S problem and the L M I - L S problem. The second is the proposed use of interior-point methods for the solution of the S D L S , N S - S D L S , and L M I - L S problems. Included in this second point are the resulting efficient method for compliance matrix estimation and an improved algorithm for solving the S D L S problem. The final contribution, while related to the first two, is the development of the K K T conditions for each of the three problems and the explicit connection of these conditions to the S D L C P . 5.2 Future Work In this section we summarize the many promising ideas for possible future work which arose during the course of research for this thesis. • When considering the existence of the central path i n Section 3.3, the question as to when C++ ^ 0 holds for the L M I - L S problem remained undecided. Th is is an important question that should be the topic of future work in this area. • More research could be done to propose more efficient ways to solve for the search direc-tions in these interior-point methods. For example, it may be possible and preferable to solve a Lyapunov system directly rather than using the functions vec or svec in order to solve these large linear systems. • Similarly, if the matrix A is very ill-conditioned, we could find that the methods mentioned here for solving for the search directions would not be successful due to the presence of the matrix ATA whose condition number is known to be the square of the condition number of A. It may be possible to determine an equivalent system which could be solved in a more stable manner using a Q R factorization (see, for example, [23]). • Another topic for future consideration is to develop methods for solving large sparse semidefinite constrained least squares problems. • Final ly, it would be interesting to determine i f any of the other search directions mentioned here would result in more efficient algorithms for solving our three problems than the algorithms we have implemented here wi th the A H O search direction. 81 Bibliography M . Adlers. 
Sparse least squares problems wi th box constraints. Linkoping Studies in Science and Technology, Linkoping, 1998. F . Alizadeh, J . A . Haeberly, and M . Overton. Primal-dual interior-point methods for semidefinite programming: Convergence rates, stability and numerical results. SIAM J. Optimization, 8:746-768, 1998. J . C . Al lwright . Positive semidefinite matrices: Characterization v ia conical hulls and least-squares solution of a matrix equation. SIAM J. Control and Optimization, 26:537-555, 1988. J . C . Allwright and K . G . Woodgate. Er ra ta and addendum: Positive semidefinite matrices: Characterization via conical hulls and least-squares solution of a matrix equation. SIAM J. Control and Optimization, 28:250-251, 1990. A. Bjorck. Numerical Methods for Least Squares Problems. S I A M , Philadelphia, 1996. J . M . Borwein and A . S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer-Verlag, New York, 2000. J . E . Brock. Optimal Matrices Describing Linear Systems. A I A A Journal, 6:1292-1296, 1968. T . H . Cormen, C . E . Leiserson, and R . L . Rivest. Introduction to Algorithms. M c G r a w - H i l l , New York, 1990. R . Escalante and M . Raydan. Dykstra 's algorithm for a constrained least-squares matrix problem. Numerical Linear Algebra with Applications, 3:459-471, 1996. R . A . Horn and C . R . Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991. H . H u . Positive definite constrained least-squares estimation of matrices. Linear Algebra and Its Applications, 229:167-174, 1995. M . Ko j ima , M . Shida, and S. Shindoh. A predictor-corrector interior-point algorithm for the semidefinite linear complementarity problem using the Alizadeh-Haeberly-Overton search direction. SIAM J. Optimization, 9:444-465, 1999. M . Ko j ima , S. Shindoh, and S. Hara. Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM J. Optimization, 7:86-125, 1997. N . Kris lock, J . Lang, J . Varah, D . K . Pa i , and H. -P . Seidel. Loca l compliance estimation with a positive semi-definite constraint. Work in progress, 2003. J . Lang, D . K . Pa i , and R . J . Woodham. Robotic acquisition of deformable models. Pro-ceedings of the 2002 IEEE International Conference on Robotics & Automation, 1:933-938, 2002. 82 Bibliography [16] A . Liao . O n the least squares problem of a matrix equation. Journal of Computational Mathematics, 17:589-594, 1999. [17] T H E M A T H W O R K S , I N C . M A T L A B Reference Guide. The MathWorks, Inc., Natick, M A , 1992. [18] R . D . C . Monteiro. Polynomial convergence of primal-dual algorithms for semidefinite programming based on the Monteiro and Zhang family of directions. SI AM J. Optimization, 8:797-812, 1998. [19] R . D . C . Monteiro and T . Tsuchiya. Polynomiali ty of primal-dual algorithms for semidefi-nite linear complementarity problems based on the Kojima-Shindoh-Hara family of direc-tions. Math. Programming, 85:51-80, 1999. [20] Y . Nesterov and A . Nemirovskii . Interior-Point Polynomial Algorithms in Convex Pro-gramming. S I A M , Philadelphia, 1994. [21] J . Nocedal and S. J . Wright. Numerical Optimization. Springer-Verlag, New York, 2000. [22] M . L . Overton. Private communication, 2003. [23] L . F . Portugal, J . J . Jiidice, and L . N . Vicente. A comparison of block pivoting and interior-point algorithms for linear least squares problems wi th nonnegative variables. Mathematics of Computation, 63:625-643, 1994. [24] J . Renegar. 
A Mathematical View of Interior-Point Methods in Convex Optimization. S I A M , Philadelphia, 2001. [25] R . T . Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970. [26] M . Shida, S. Shindoh, and M . Ko j ima . Existence and uniqueness of search directions in interior-point algorithms for the S D P and the monotone S D L C P . SIAM J. Optimization, 8:387-396, 1998. [27] M . J . Todd. Semidefinite optimization. Acta Numerica, 10:515-560, 2001. [28] M . J . Todd, K . C . Toh, and R . H . T i i t imc i i . S D P T 3 - a M A T L A B software package for semidefinite-quadratic-linear programming, http://www.math.cmu.edu/~reha/sdpt3.html, 2003. [29] M . J . Todd, K . C . Toh, and R . H . Tu t i inc i i . O n the Nesterov-Todd direction in semidefinite programming. SIAM J. Optimization, 8:769-796, 1998. [30] L . Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49-95, 1996. [31] S. J . Wright. Primal-Dual Interior-Point Methods. S I A M , Philadelphia, 1997. [32] K . G . Woodgate. Optimization over positive semi-definite symmetric matrices with appli-cations to quasi-Newton algorithms. P h . D . thesis, Dept. of Electrical Engineering, Imperial College, London, England, 1987. 83 Bibliography [33] K . G . Woodgate. Least-squares solution of F = PG over positive semidefinite symmetric P. Linear Algebra and Its Applications, 245:171-190, 1996. [34] K . G . Woodgate. Efficient stiffness matrix estimation for elastic structures. Computers and Structures, 69:79-84, 1998. [35] D . - X . X i e . Least-squares solution of inverse eigenpair problem of nonnegative definite ma-trices. Computers and Mathematics with Applications, 40:1241-1251, 2000. [36] Y . Zhang. O n extending some primal-dual interior-point algorithms from linear program-ming to semidefinite programming. SIAM J. Optimization, 8:365-386, 1998. 84 Appendix A Kronecker products A t many times in our discussion, we must solve a system of linear equations, but the variables we are solving for reside in a matrix rather than a vector. Here we follow the development in [2] and [29] by noticing that an easy way to deal wi th this situation is to rewrite the linear system by putt ing the entries of the unknown matrix into a vector. We do this by defining the function vec : I R n x " -> IR" 2 as vec(X) X2 where X = [x\ x2 • • • xn\. In other words, the function vec stacks the columns of a matrix into 2 a vector. In fact, vec is an isometry between IR"X™ and IR 7 1 in that it is a linear bijective function that preserves the inner-product: (X, Y) = (vec(X), vec (Y) ) , for a l l X,Y £ I R n x " . 2 We denote the inverse of vec as the function mat : IR" —> I R " x n , which is defined in the obvious way. Using vec to rewrite the linear matrix equation AXBT = C, where A G I R p x n , B G ] R « x n , X G E T X " , and C G Wxq, as the linear equation M v e c ( X ) = vec(C), we find that the matrix M G I R P ? X " is the Kronecker product of B and A, which is defined as [bnA • • • binA~] B®A:= bqiA bqnA (A.1) The properties of the Kronecker product (see [29, p. 791, 794]) can be summarized as follows: 1. [B ® A)vec{X) = vec{AXBT). 2. {B®A)T = BT®AT. 85 Appendix A. Kronecker products 3. (B®A)~1 = B~x ® A"1. 4. (B ®A)(D®C) = BD® AC. 5. If a(A) = {Xi} and a{B) = {HJ}, then a{B ® A) = {\ifij}. 6. If A and B are symmetric and positive definite, then so is B ® A We can also define an isometry between real symmetric matrices and vectors. 
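Property 1 and the definition (A.1) can be checked numerically with MATLAB's built-in kron; the following fragment is illustrative only.

p = 3;  n = 4;  q = 5;
A = randn(p, n);  B = randn(q, n);  X = randn(n, n);
lhs = kron(B, A)*X(:);                   % (B kron A)*vec(X); X(:) stacks the columns of X
rhs = reshape(A*X*B', p*q, 1);           % vec(A*X*B')
norm(lhs - rhs)                          % ~ 1e-14, confirming property 1

The same kind of check applies to the symmetric Kronecker product defined next.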
We can also define an isometry between real symmetric matrices and vectors. We denote svec : S^n → ℝ^{n̄}, where n̄ := n(n+1)/2, as the function defined by

    svec(X) = [x_{11}, √2 x_{21}, ..., √2 x_{n1}, x_{22}, √2 x_{32}, ..., √2 x_{n2}, ..., x_{nn}]^T,

where X = [x_{ij}]. The reason for scaling the off-diagonal entries of X by √2 is so that the inner-product is preserved:

    <X, Y> = <svec(X), svec(Y)>,   for all X, Y ∈ S^n.

The inverse of svec is denoted as the function smat : ℝ^{n̄} → S^n.

Now suppose we would like to solve the linear matrix equation (AXB^T + BXA^T)/2 = C for X ∈ S^n, where A, B ∈ ℝ^{n×n} and C ∈ S^n. We define the symmetric Kronecker product of B and A, denoted as B ⊗_s A, as the matrix in ℝ^{n̄×n̄} which satisfies

    (B ⊗_s A) svec(X) = svec( (AXB^T + BXA^T)/2 ),   for all X ∈ S^n.

In fact, we can compute B ⊗_s A as

    B ⊗_s A = (1/2) U^T (B ⊗ A + A ⊗ B) U,          (A.2)

where U is the n^2 × n̄ matrix that satisfies U svec(X) = vec(X), for all X ∈ S^n. To compute U, let e_i be the ith unit vector in ℝ^{n̄}, for i = 1, ..., n̄, and let E_i = smat(e_i) ∈ S^n. Then E_i has either a single diagonal entry equal to 1, or a symmetric pair of off-diagonal entries equal to 1/√2, with all other entries zero, and the ith column of U is then

    u_i = U e_i = U svec(E_i) = vec(E_i) = vec(smat(e_i)).

Therefore, we find that U = [vec(smat(e_1)) ... vec(smat(e_{n̄}))], and since u_i^T u_j = <E_i, E_j> = 0 if i ≠ j, we also find that U^T U = I ∈ ℝ^{n̄×n̄}. Thus, we have U^T vec(X) = svec(X), for all X ∈ S^n, and identity (A.2) follows.

We now summarize the properties of the symmetric Kronecker product (see [29, p. 794]):

1. (B ⊗_s A) svec(X) = svec( (AXB^T + BXA^T)/2 ).
2. (B ⊗_s A)^T = B^T ⊗_s A^T.
3. B ⊗_s A = A ⊗_s B.
4. (B ⊗_s A)(D ⊗_s C) = (BD ⊗_s AC + BC ⊗_s AD)/2.
5. If A, B ∈ S^n commute, σ(A) = {λ_i}, and σ(B) = {μ_j}, then σ(B ⊗_s A) = { (λ_i μ_j + λ_j μ_i)/2 }.
6. If A and B are symmetric and positive definite, then so is A ⊗_s B.
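As an illustrative sketch of this construction (the size n, the helper variable names, and the explicit loop are arbitrary choices; reshape again stands in for vec), the matrix U and identity (A.2) can be checked numerically:

    % Build U with U*svec(X) = vec(X), then verify that
    % (1/2)*U'*(kron(B,A)+kron(A,B))*U maps svec(X) to svec((A*X*B'+B*X*A')/2).
    n = 4;  nbar = n*(n+1)/2;
    U = zeros(n^2,nbar);
    col = 0;
    for j = 1:n                          % columns of the lower triangle, as in svec
      for i = j:n
        col = col + 1;
        Eij = zeros(n);
        if i == j
          Eij(i,j) = 1;                  % diagonal basis matrix
        else
          Eij(i,j) = 1/sqrt(2);          % symmetric off-diagonal pair
          Eij(j,i) = 1/sqrt(2);
        end
        U(:,col) = reshape(Eij,[],1);    % vec(E_i)
      end
    end
    A = randn(n);  B = randn(n);
    X = randn(n);  X = (X+X')/2;         % a random symmetric X
    BsA = 0.5*U'*(kron(B,A)+kron(A,B))*U;            % identity (A.2)
    lhs = BsA*(U'*reshape(X,[],1));                  % U'*vec(X) = svec(X)
    rhs = U'*reshape((A*X*B'+B*X*A')/2,[],1);
    disp(norm(lhs - rhs))                            % zero up to roundoff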
Appendix B

Matlab M-files

B.1 SDLS M-files

B.1.1 sdls.m

function [X,Y,norm_res,muv,tt,iter,fail] = sdls(A,B,X0,Y0,tol,verbose)
% [X,Y,norm_res,muv,tt,iter,fail] = SDLS(A,B,X0,Y0,tol,verbose)
%
% Uses a standard path-following interior-point method based on the
% AHO search direction to solve the symmetric semidefinite constrained
% least squares problem:
%
%     min   norm(A*X-B,'fro')
%     s.t.  X symm. pos. semidef.
%
% where A and B are real m-by-n matrices, and X is a real n-by-n matrix.
%
% X0 and Y0 are n-by-n initial strictly feasible matrices, which means
% that X0 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(n).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% SDLS returns approximate optimal solutions to the above primal
% problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(X*Y)      <= tol*trace(X0*Y0)
%
% where res = (Z+Z')/2-Y, Z = A'*(A*X-B), and res0 is res evaluated
% at X0, Y0.
%
% SDLS optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc. thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);
AA = A'*A;  AB = A'*B;  I = eye(n);

if isempty(X0), X = I; else, X = X0; end
if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(tol), tol = 1e-10; end

XAA = X*AA;  XY = X*Y;
Z = XAA' - AB;  Z = (Z+Z')/2;  R = Z - Y;
norm_res = norm(R,'fro');
mu = trace(XY)/n;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <X,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute sigma and tau
   if norm_res < tol1
      sigma = 1/n^2;
   else
      sigma = 1-1/n^2;
   end
   tau = sigma*mu;

   % Compute the AHO search direction (dX,dY)
   E = (kron(I,Y)+kron(Y,I))/2;   % F = (kron(I,X)+kron(X,I))/2;
   XZ = X*Z;

   M = (kron(I,XAA)+kron(AA,X)+kron(X,AA)+kron(XAA,I))/4 + E;   % M = F*kronAA + E;
   d = vec(tau*I - (XZ+XZ')/2);   % d = F*vec(-R) + vec(tau*I-(X*Y+Y*X)/2);

   [L,U,P] = lu(M);
   dx = U\(L\(P*d));

   dX = mat(dx);  dX = (dX+dX')/2;
   AAdX = AA*dX;
   dY = (AAdX+AAdX')/2 + R;  dY = (dY+dY')/2;

   % Compute the step length theta
   c = 0.9 + 0.09*theta;
   theta1 = max_step(X,dX);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   X = X + theta*dX;  Y = Y + theta*dY;
   XAA = X*AA;  XY = X*Y;
   Z = XAA' - AB;  Z = (Z+Z')/2;  R = Z - Y;
   norm_res = norm(R,'fro');
   mu = trace(XY)/n;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.1.2 sdls_precorr.m

function [X,Y,norm_res,muv,tt,iter,fail] = sdls_precorr(A,B,X0,Y0,tol,verbose)
% [X,Y,norm_res,muv,tt,iter,fail] = SDLS_PRECORR(A,B,X0,Y0,tol,verbose)
%
% Uses a predictor-corrector interior-point method to solve the
% symmetric semidefinite constrained least squares problem:
%
%     min   norm(A*X-B,'fro')
%     s.t.  X symm. pos. semidef.
%
% where A and B are real m-by-n matrices, and X is a real n-by-n matrix.
%
% X0 and Y0 are n-by-n initial strictly feasible matrices, which means
% that X0 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(n).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% SDLS_PRECORR returns approximate optimal solutions to the above
% primal problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(X*Y)      <= tol*trace(X0*Y0)
%
% where res = (Z+Z')/2-Y, Z = A'*(A*X-B), and res0 is res evaluated
% at X0, Y0.
%
% SDLS_PRECORR optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc. thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);
AA = A'*A;  AB = A'*B;  I = eye(n);

if isempty(X0), X = I; else, X = X0; end
if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(tol), tol = 1e-10; end

XAA = X*AA;  XY = X*Y;
Z = XAA' - AB;  Z = (Z+Z')/2;  R = Z - Y;
norm_res = norm(R,'fro');
mu = trace(XY)/n;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <X,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute the predictor directions dXp, dYp
   E = (kron(I,Y)+kron(Y,I))/2;   % F = (kron(I,X)+kron(X,I))/2;
   XZ = X*Z;

   M = (kron(I,XAA)+kron(AA,X)+kron(X,AA)+kron(XAA,I))/4 + E;   % M = F*kronAA + E;
   d = -vec((XZ+XZ')/2);   % d = F*vec(-R) + vec(-(X*Y+Y*X)/2);

   [L,U,P] = lu(M);
   dxp = U\(L\(P*d));

   dXp = mat(dxp);  dXp = (dXp+dXp')/2;
   AAdXp = AA*dXp;
   dYp = (AAdXp+AAdXp')/2 + R;  dYp = (dYp+dYp')/2;

   % Compute predictor step length ptheta
   c = 0.9 + 0.09*theta;
   ptheta1 = max_step(X,dXp);
   ptheta2 = max_step(Y,dYp);
   ptheta_max = min([ptheta1, ptheta2]);
   ptheta = min([c*ptheta_max,1]);

   % Compute the Mehrotra sigma and parameter tau
   muhat = sum(sum((X+ptheta*dXp).*(Y+ptheta*dYp)))/n;
   sigma = (muhat/mu)^3;
   tau = sigma*mu;

   % Compute the predictor-corrector directions dX, dY
   dYpdXp = dYp*dXp;
   d = d + vec(tau*I-(dYpdXp+dYpdXp')/2);
   dx = U\(L\(P*d));

   dX = mat(dx);  dX = (dX+dX')/2;
   AAdX = AA*dX;
   dY = (AAdX+AAdX')/2 + R;  dY = (dY+dY')/2;

   % Compute the step length theta
   theta1 = max_step(X,dX);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   X = X + theta*dX;  Y = Y + theta*dY;
   XAA = X*AA;  XY = X*Y;
   Z = XAA' - AB;  Z = (Z+Z')/2;  R = Z - Y;
   norm_res = norm(R,'fro');
   mu = trace(XY)/n;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.2 NS-SDLS M-files

B.2.1 ns_sdls.m

function [X,Y,norm_res,muv,tt,iter,fail] = ns_sdls(A,B,X0,Y0,tol,verbose)
% [X,Y,norm_res,muv,tt,iter,fail] = NS_SDLS(A,B,X0,Y0,tol,verbose)
%
% Uses a standard path-following interior-point method to solve the
% nonsymmetric semidefinite constrained least squares problem:
%
%     min   norm(A*X-B,'fro')
%     s.t.  (X+X')/2 pos. semidef.
%
% where A and B are real m-by-n matrices, and X is a real n-by-n matrix.
%
% X0 and Y0 are n-by-n initial strictly feasible matrices, which means
% that (X0+X0')/2 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(n).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% NS_SDLS returns approximate optimal solutions to the above primal
% problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(S*Y)      <= tol*trace(S0*Y0)
%
% where res = A'*(A*X-B)-Y, S = (X+X')/2 and res0 and S0 are res
% and S evaluated at X0, Y0.
%
% NS_SDLS optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc. thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);
AA = A'*A;  AB = A'*B;  I = eye(n);
V = vecV(n);   % vec((X+X')/2) = V*vec(X)

if isempty(X0), X = I; else, X = X0; end, S = (X+X')/2;
if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(tol), tol = 1e-10; end

Z = AA*X - AB;  R = Z - Y;  SY = S*Y;
norm_res = norm(R,'fro');
mu = trace(SY)/n;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <S,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute sigma and tau
   if norm_res < tol1, sigma = 1/n^2; else sigma = 1-1/n^2; end
   tau = sigma*mu;

   % Find suitable directions dX, dS, dY
   E = (kron(I,Y)+kron(Y,I))*V/2;   % F = (kron(I,S)+kron(S,I))/2;

   M = (kron(I,S*AA)+kron(S,AA))/2 + E;
   d = vec(tau*I - (S*R+R*S+SY+SY')/2);

   [L,U,P] = lu(M);
   dx = U\(L\(P*d));

   dX = mat(dx);  dS = (dX+dX')/2;
   dY = AA*dX + R;  dY = (dY+dY')/2;

   % M = F*kron(I,AA) + E;
   % d = F*vec(-R) + vec(tau*I-(S*Y+Y*S)/2);

   % Find step length theta
   c = 0.9 + 0.09*theta;
   theta1 = max_step(S,dS);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   X = X + theta*dX;  S = (X+X')/2;  Y = Y + theta*dY;
   Z = AA*X - AB;  R = Z - Y;  SY = S*Y;
   norm_res = norm(R,'fro');
   mu = trace(SY)/n;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.2.2 ns_sdls_precorr.m

function [X,Y,norm_res,muv,tt,iter,fail] = ns_sdls_precorr(A,B,X0,Y0,tol,verbose)
% [X,Y,norm_res,muv,tt,iter,fail] = NS_SDLS_PRECORR(A,B,X0,Y0,tol,verbose)
%
% Uses a predictor-corrector interior-point method to solve the
% nonsymmetric semidefinite constrained least squares problem:
%
%     min   norm(A*X-B,'fro')
%     s.t.  (X+X')/2 pos. semidef.
%
% where A and B are real m-by-n matrices, and X is a real n-by-n matrix.
%
% X0 and Y0 are n-by-n initial strictly feasible matrices, which means
% that (X0+X0')/2 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(n).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% NS_SDLS_PRECORR returns approximate optimal solutions to the above
% primal problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(S*Y)      <= tol*trace(S0*Y0)
%
% where res = A'*(A*X-B)-Y, S = (X+X')/2 and res0 and S0 are res
% and S evaluated at X0, Y0.
%
% NS_SDLS_PRECORR optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc. thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);
AA = A'*A;  AB = A'*B;  I = eye(n);
V = vecV(n);   % vec((X+X')/2) = V*vec(X)

if isempty(X0), X = I; else, X = X0; end, S = (X+X')/2;
if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(tol), tol = 1e-10; end

Z = AA*X - AB;  R = Z - Y;  SY = S*Y;
norm_res = norm(R,'fro');
mu = trace(SY)/n;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <S,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute the predictor directions dXp, dSp, dYp
   E = (kron(I,Y)+kron(Y,I))*V/2;   % F = (kron(I,S)+kron(S,I))/2;

   M = (kron(I,S*AA)+kron(S,AA))/2 + E;   % M = F*kronAA + E;
   d = -vec((S*R+R*S+SY+SY')/2);   % d = F*vec(-R) + vec(-(S*Y+Y*S)/2);

   [L,U,P] = lu(M);
   dxp = U\(L\(P*d));

   dXp = mat(dxp);  dSp = (dXp+dXp')/2;
   dYp = AA*dXp + R;  dYp = (dYp+dYp')/2;

   % Compute predictor step length ptheta
   c = 0.9 + 0.09*theta;
   ptheta1 = max_step(S,dSp);
   ptheta2 = max_step(Y,dYp);
   ptheta_max = min([ptheta1, ptheta2]);
   ptheta = min([c*ptheta_max,1]);

   % Compute the Mehrotra sigma and parameter tau
   muhat = sum(sum((S+ptheta*dSp).*(Y+ptheta*dYp)))/n;
   sigma = (muhat/mu)^3;
   tau = sigma*mu;

   % Compute the predictor-corrector directions dX, dS, dY
   dYpdSp = dYp*dSp;
   d = d + vec(tau*I-(dYpdSp+dYpdSp')/2);
   dx = U\(L\(P*d));

   dX = mat(dx);  dS = (dX+dX')/2;
   dY = AA*dX + R;  dY = (dY+dY')/2;

   % Compute the step length theta
   theta1 = max_step(S,dS);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   X = X + theta*dX;  S = (X+X')/2;  Y = Y + theta*dY;
   Z = AA*X - AB;  R = Z - Y;  SY = S*Y;
   norm_res = norm(R,'fro');
   mu = trace(SY)/n;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.3 LMI-LS M-files

B.3.1 lmi_ls.m

function [x,S,Y,norm_res,muv,tt,iter,fail] = lmi_ls(A,b,KK,C,x0,S0,Y0,tol,verbose)
% [x,S,Y,norm_res,muv,tt,iter,fail] = LMI_LS(A,b,KK,C,x0,S0,Y0,tol,verbose)
%
% Uses a standard path-following interior-point method to solve the
% linear matrix inequality least squares problem:
%
%     min   norm(A*x-b)
%     s.t.  K(x) + S = C
%           S is symm. pos. semidef.
%
% where A is a real m-by-n matrix, x is a real n-by-1 vector,
% b is a real m-by-1 vector, S and C are real k-by-k symmetric
% matrices, and K(x) = sum( x(i)*K_i, i=1:n ) where KK = [K_1,...,K_n]
% and K_i is a real k-by-k symmetric matrix for i=1:n.
%
% S0 and Y0 are k-by-k initial strictly feasible matrices, which means
% that S0 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(k).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% LMI_LS returns approximate optimal solutions to the above primal
% problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(S*Y)      <= tol*trace(S0*Y0)
%
% where res = [rp; rd] is the primal-dual residual vector
% and res0 is res evaluated at x0, S0, Y0.
%
% LMI_LS optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc.
% thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);  k = length(C);
AA = A'*A;  Ab = A'*b;
I = eye(k);  Ik2 = eye(k^2);
O = zeros(k^2,n);  Ok2 = zeros(k^2,k^2);
K = vecK(KK);   % vec(K(x)) = K*x,  K^{*}(Y) = K'*vec(Y)

if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(x0), x = zeros(n,1); else, x = x0; end
if isempty(S0), S = I; else, S = S0; end
if isempty(tol), tol = 1e-10; end

Kx = mat(K*x);  y = vec(Y);  KY = K'*y;  SY = S*Y;
z = Ab - AA*x;  rd = z - KY;
Rp = C - Kx - S;  rp = vec(Rp);
norm_res = norm([rp;rd]);
mu = trace(SY)/k;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <S,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute sigma and tau
   if norm_res < tol1, sigma = 1/k^2; else sigma = 1-1/k^2; end
   tau = sigma*mu;

   % Find suitable directions dx, dS, dY
   E = (kron(I,S)+kron(S,I))/2;
   F = (kron(I,Y)+kron(Y,I))/2;
   D = tau*I-(SY+SY')/2;  rc = vec(D);

   M = [ AA, K', O'; K, Ok2, Ik2; O, E, F ];
   d = [ rd; rp; rc ];

   w = M\d;

   dx = w(1:n);
   dy = w((n+1):(k^2+n));
   ds = w((k^2+n+1):(2*k^2+n));

   dY = mat(dy);  dY = (dY+dY')/2;
   dS = mat(ds);  dS = (dS+dS')/2;

   % Find step length theta
   c = 0.9 + 0.09*theta;
   theta1 = max_step(S,dS);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   Y = Y + theta*dY;  x = x + theta*dx;  S = S + theta*dS;
   Kx = mat(K*x);  y = vec(Y);  KY = K'*y;  SY = S*Y;
   z = Ab - AA*x;  rd = z - KY;
   Rp = C - Kx - S;  rp = vec(Rp);
   norm_res = norm([rp;rd]);
   mu = trace(SY)/k;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.3.2 lmi_ls_precorr.m

function [x,S,Y,norm_res,muv,tt,iter,fail] = lmi_ls_precorr(A,b,KK,C,x0,S0,Y0,tol,verbose)
% [x,S,Y,norm_res,muv,tt,iter,fail]
%    = LMI_LS_PRECORR(A,b,KK,C,x0,S0,Y0,tol,verbose)
%
% Uses a predictor-corrector interior-point method to solve the
% linear matrix inequality least squares problem:
%
%     min   norm(A*x-b)
%     s.t.  K(x) + S = C
%           S is symm. pos. semidef.
%
% where A is a real m-by-n matrix, x is a real n-by-1 vector,
% b is a real m-by-1 vector, S and C are real k-by-k symmetric
% matrices, and K(x) = sum( x(i)*K_i, i=1:n ) where KK = [K_1,...,K_n]
% and K_i is a real k-by-k symmetric matrix for i=1:n.
%
% S0 and Y0 are k-by-k initial strictly feasible matrices, which means
% that S0 and Y0 are symmetric positive definite.
% Set as [] for the default value of eye(k).
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% LMI_LS_PRECORR returns approximate optimal solutions to the above primal
% problem and its associated dual problem so that
%
%     norm(res,'fro') <= sqrt(tol)*norm(res0,'fro')
%     trace(S*Y)      <= tol*trace(S0*Y0)
%
% where res = [rp; rd] is the primal-dual residual vector
% and res0 is res evaluated at x0, S0, Y0.
%
% LMI_LS_PRECORR optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   muv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% N. Krislock.  Numerical solution of semidefinite constrained least
% squares problems.  M.Sc. thesis, University of British Columbia,
% Vancouver, British Columbia, Canada, 2003.

tic;   % Start the preprocessing timer

MaxIt = 100;   % max iteration

[m,n] = size(A);  k = length(C);
AA = A'*A;  Ab = A'*b;
I = eye(k);  Ik2 = eye(k^2);
O = zeros(k^2,n);  Ok2 = zeros(k^2,k^2);
K = vecK(KK);   % vec(K(x)) = K*x,  K^{*}(Y) = K'*vec(Y)

if isempty(Y0), Y = I; else, Y = Y0; end
if isempty(x0), x = zeros(n,1); else, x = x0; end
if isempty(S0), S = I; else, S = S0; end
if isempty(tol), tol = 1e-10; end

Kx = mat(K*x);  y = vec(Y);  KY = K'*y;  SY = S*Y;
z = Ab - AA*x;  rd = z - KY;
Rp = C - Kx - S;  rp = vec(Rp);
norm_res = norm([rp;rd]);
mu = trace(SY)/k;  muv = mu;
tol1 = sqrt(tol)*norm_res;  tol2 = tol*mu;
r = 0;  theta = 0;

if verbose==1
   disp(' ');
   disp(['  r     sigma        theta       norm(res)     <S,Y>/n']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',[norm_res, mu]) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | mu > tol2 ) & r < MaxIt

   tic;   % Start the iteration timer

   % Compute the predictor directions dxp, dSp, dYp
   E = (kron(I,S)+kron(S,I))/2;
   F = (kron(I,Y)+kron(Y,I))/2;
   D = -(SY+SY')/2;  rcp = vec(D);

   M = [ AA, K', O'; K, Ok2, Ik2; O, E, F ];
   d = [ rd; rp; rcp ];

   [L,U,P] = lu(M);
   w = U\(L\(P*d));

   dxp = w(1:n);
   dyp = w((n+1):(k^2+n));
   dsp = w((k^2+n+1):(2*k^2+n));

   dYp = mat(dyp);  dYp = (dYp+dYp')/2;
   dSp = mat(dsp);  dSp = (dSp+dSp')/2;

   % Compute predictor step length ptheta
   c = 0.9 + 0.09*theta;
   ptheta1 = max_step(S,dSp);
   ptheta2 = max_step(Y,dYp);
   ptheta_max = min([ptheta1, ptheta2]);
   ptheta = min([c*ptheta_max,1]);

   % Compute the Mehrotra sigma and parameter tau
   muhat = sum(sum((S+ptheta*dSp).*(Y+ptheta*dYp)))/k;
   sigma = (muhat/mu)^3;
   tau = sigma*mu;

   % Compute the predictor-corrector directions dx, dS, dY
   dSpdYp = dSp*dYp;
   D = tau*I-(dSpdYp+dSpdYp')/2;  rc = vec(D);
   d = [ rd; rp; rcp + rc ];
   w = U\(L\(P*d));

   dx = w(1:n);
   dy = w((n+1):(k^2+n));
   ds = w((k^2+n+1):(2*k^2+n));

   dY = mat(dy);  dY = (dY+dY')/2;
   dS = mat(ds);  dS = (dS+dS')/2;

   % Compute the step length theta
   theta1 = max_step(S,dS);
   theta2 = max_step(Y,dY);
   theta_max = min([theta1, theta2]);
   theta = min([c*theta_max,1]);

   % Update
   Y = Y + theta*dY;  x = x + theta*dx;  S = S + theta*dS;
   Kx = mat(K*x);  y = vec(Y);  KY = K'*y;  SY = S*Y;
   z = Ab - AA*x;  rd = z - KY;
   Rp = C - Kx - S;  rp = vec(Rp);
   norm_res = norm([rp;rd]);
   mu = trace(SY)/k;  muv(r+2) = mu;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[sigma, theta, norm_res, mu]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==MaxIt & ( norm_res > tol1 | mu > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.4 Woodgate's Algorithm M-files

B.4.1 wg_sdls.m

function [K,norm_res,dgv,tt,iter,fail] = wg_sdls(A,B,K0,tol,verbose);
% [K,norm_res,dgv,tt,iter,fail] = WG_SDLS(A,B,K0,tol,verbose);
%
% Uses Woodgate's Algorithm to solve the symmetric semidefinite
% constrained least squares problem:
%
%     min   norm(A*K-B,'fro')
%     s.t.  K symm. pos. semidef.
%
% where A and B are real m-by-n matrices, and K is a real n-by-n matrix.
%
% K0 is an n-by-n initial symmetric matrix.
% Set as [] for the default value of K0 = aleph(Kbar+), where Kbar is the
% solution of the symmetric least squares problem,
%
%     min   norm(A*K-B,'fro')
%     s.t.  K = K'
%
% Kbar+ is the result of setting any negative eigenvalue in the
% eigenvalue decomposition of Kbar to zero, and
%
%     aleph(Kbar+) = alpha*Kbar+.
%
% tol is the zero tolerance described below.
% Set as [] for the default value of 1e-10.
%
% Set verbose = 1 for screen output during algorithm execution,
% otherwise set verbose = 0 for no output.
%
% WG_SDLS returns approximate optimal solutions to the above problem
% so that
%
%     norm(res,'fro') <= sqrt(tol)
%     trace(K*L(K))   <= tol
%
% where L(K) = eigshift(A'*(A*K-B) + (A*K-B)'*A, 0) and
% res = L(K) - (A'*(A*K-B) + (A*K-B)'*A).
%
% WG_SDLS optionally returns:
%
%   norm_res : norm(res,'fro') at the final iterate,
%   dgv      : a vector of the duality gaps for each iteration
%   tt       : the total running time of the algorithm
%   iter     : the number of iterations required
%   fail     : fail = 1 if the algorithm failed to achieve the desired
%              tolerances within the maximum number of iterations allowed;
%              otherwise fail = 0
%
% Nathan Krislock, University of British Columbia, 2003.
%
% Based on an algorithm described in
% K. G. Woodgate. "Efficient stiffness matrix estimation for elastic
% structures" Computers & Structures, 69:79-84, 1998.

tic;   % Start the preprocessing timer

ItMax = 100;   % max iteration

[m,n] = size(A);
X = A';  F = B';  I = eye(n);
In2 = eye(n^2);  In2divn2 = (1/n^2)*In2;
XX = X*X';  FF = F*F';  FX = F*X';  Q = FX + FX';

% vec(E') = V*vec(E)
V = [];
for i=1:n
   W = [];
   for j=1:n
      Eij = zeros(n);
      Eij(i,j) = 1;
      W = [W; Eij];
   end
   V = [V, W];
end

if isempty(K0)
   K = lyap(XX,-Q);
   K = eigshift(K,0);
   trKKXX = trace(K*K*XX);
   if trKKXX==0
      alpha = 0;
   else
      alpha = max([0,trace(K*Q)/(2*trKKXX)]);
   end
   K = alpha*K;
else
   K = K0;
end
if isempty(tol), tol = 1e-10; end

E = real(sqrtm(K + 1e-14*I));  K = E'*E;
KXX = K*XX;
LK = KXX + KXX' - Q;  LK = eigshift(LK,0);
res = LK - KXX - KXX' + Q;
norm_res = norm(res,'fro');
dual_gap = abs(trace(K*LK)/n);  dgv = dual_gap;
tol1 = sqrt(tol);  tol2 = tol;
r = 0;  t = 0;

if verbose==1
   disp(' ');
   disp(['  r     omega        alpha       norm(res)     dual_gap']);
   disp([' --------------------------------------------------------']);
   ol = sprintf('%3d',r);
   ol = [ ol, sprintf('                           ') ];
   ol = [ ol, sprintf(' %12.4e',norm_res,dual_gap) ];
   disp(ol);
end

pretime = toc;   % End the preprocessing timer

while ( norm_res > tol1 | dual_gap > tol2 ) & r < ItMax

   tic;   % Start the iteration timer

   % Determine the direction D
   kronEXXEV = (kron(E*XX,E') + kron(E,XX*E'))*V;
   Z = kron(E*XX*E',I) + kron(K,XX) + kronEXXEV;

   M = Z + kron(I,LK);   % This is the matrix used in Woodgate's paper

   M = M + In2divn2;     % We have modified this matrix by adding
                         % (1/n^2)*kron(I,I) to avoid near singularity
                         % and discovered it has greatly improved the
                         % convergence of this algorithm.

   w = (kronEXXEV - kron(I,Q))*vec(E');

   [L,U,P] = lu(M);
   d = U\(L\(P*w));
   D = mat(d)';

   % Determine the step length omega
   OPTIONS = optimset('fminbnd');
   OPTIONS.TolX = 1e-4;
   omega = fminbnd(@wg_f,0,1,OPTIONS,FF,E,D,XX,Q);

   % Update E to aleph(E-omega*D)
   E = E - omega*D;  K = E'*E;
   trKKXX = trace(K*K*XX);
   if trKKXX==0
      alpha = 0;
   else
      alpha = sqrt(max([0,trace(K*Q)/(2*trKKXX)]));
   end
   E = alpha*E;  K = E'*E;

   KXX = K*XX;
   LK = KXX + KXX' - Q;  LK = eigshift(LK,0);
   res = LK - KXX - KXX' + Q;
   norm_res = norm(res,'fro');
   dual_gap = abs(trace(K*LK)/n);  dgv(r+2) = dual_gap;
   r = r + 1;

   if verbose==1
      ol = sprintf('%3d',r);
      ol = [ ol, sprintf(' %12.4e',[omega, alpha, norm_res, dual_gap]) ];
      disp(ol);
   end

   t(r) = toc;   % End the iteration timer

end

if r==ItMax & ( norm_res > tol1 | dual_gap > tol2 )
   fail = 1;
else
   fail = 0;
end

avt = mean(t);  tt = sum(t) + pretime;  iter = r;

if verbose==1
   if fail==1, disp(' '); disp('Failed to reach desired tolerance.'); end
   disp(' ');
   disp(sprintf('Average iteration time:\t%2.4f seconds', avt));
   disp(sprintf('Total time:\t\t%2.4f seconds', tt));
   disp(' ');
end

B.5 Miscellaneous M-files

B.5.1 vec.m

function v = vec(V)
% v = vec(V)
%
% Given an m-by-n matrix V, returns the corresponding
% m*n column vector v.

[m,n] = size(V);
v = [];
for i=1:n
   v = [v; V(:,i)];
end

B.5.2 mat.m

function V = mat(v,n)
% V = mat(v,n)
%
% Given an m*n column vector v, returns the corresponding
% m-by-n matrix V.  If n is not given, it is assumed that
% v is an n^2 column vector.

if nargin == 1
   n2 = length(v);
   n = sqrt(n2);  m = n;  mn = m*n;
else
   mn = length(v);
   m = mn/n;
end

k = 1;
for i=1:m:mn
   V(:,k) = v(i:(i+m-1));
   k = k+1;
end

B.5.3 vecV.m

function V = vecV(n)
% V = vecV(n)
%
% Calculates the n^2-by-n^2 matrix V which satisfies
%
%     vec((X+X')/2) = V*vec(X)
%
% for all n-by-n matrices X.

V = [];
for i=1:n
   W = [];
   for j=1:n
      E = zeros(n);
      E(i,j) = 1;
      W = [W; E];
   end
   V = [V, W];
end
V = (eye(n^2)+V)/2;

B.5.4 vecK.m

function K = vecK(KK)
% K = vecK(KK)
%
% Calculates the k^2-by-n matrix K which satisfies
%
%     vec(K(x)) = K*x
%
% where K(x) = x(1)*K_1 + ... + x(n)*K_n,
% KK = [K_1,...,K_n], and each K_i is a k-by-k matrix.
% This matrix K can be described as
%
%     K = [vec(K_1),...,vec(K_n)].

[k,nk] = size(KK);
n = nk/k;
K = [];
for i=1:n
   K = [K, vec(KK(:,((i-1)*k+1):(i*k)))];
end

B.5.5 max_step.m

function theta_max = max_step(X,dX)
% theta = MAX_STEP(X,dX)
%
% Given n-by-n symmetric matrices, X and dX, where X is positive definite,
% MAX_STEP returns the largest theta_max > 0 such that
%
%     X + theta*dX
%
% is positive definite for all 0 < theta < theta_max.  If X + theta*dX is
% positive definite for all theta, then theta_max = Inf.

x = max(eig(-dX,X,'chol'));
if x > 0
   theta_max = 1/x;
else
   theta_max = Inf;
end

B.5.6 eigshift.m

function M = eigshift(X,lam_min)
% M = eigshift(X,lam_min)
%
% Returns the result of setting all the eigenvalues less than lam_min
% in the spectral form of X (or (X+X')/2 if X is not symmetric) to lam_min.

X = (X+X')/2;
n = length(X);
[V,D] = eig(X);
for i=1:n
   if D(i,i) < lam_min
      D(i,i) = lam_min;
   end
end
M = V*D*V';

B.5.7 wg_f.m

function f = wg_f(t,FF,E,D,XX,Q)
% f = wg_f(t,FF,E,D,XX,Q)
%
% Required for computing the omega which minimizes WG_F in WG_SDLS.

f = .5*trace(FF+((E-t*D)'*(E-t*D))^2*XX-(E-t*D)'*(E-t*D)*Q);
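As a usage sketch (the problem sizes, the noise level, and the variable names below are arbitrary illustrative choices), a small random SDLS instance can be generated and solved with either solver above, assuming the M-files in this appendix are on the MATLAB path:

    % Random test problem: B is a perturbation of A*Xtrue for a known
    % symmetric positive semidefinite Xtrue.
    m = 20;  n = 8;
    A = randn(m,n);
    Xtrue = randn(n);  Xtrue = Xtrue*Xtrue';     % symmetric positive semidefinite
    B = A*Xtrue + 1e-2*randn(m,n);

    [X1,Y1,res1,muv1,tt1,it1] = sdls(A,B,[],[],[],1);          % standard path-following
    [X2,Y2,res2,muv2,tt2,it2] = sdls_precorr(A,B,[],[],[],1);  % predictor-corrector

    disp([min(eig(X1)), norm(A*X1-B,'fro')])     % feasibility and objective value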
