Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs by Erin Moulding B.Sc. (Hons), The University of Saskatchewan, 2010 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2012 c Erin Moulding 2012 Abstract The solution of convex quadratic programs using primal-dual interior-point methods has at its core the solution of a series of linear systems, which are in practice commonly reduced by block Gaussian elimination from the original unsymmetric block 3-by-3 formulation to either a block 2-by-2 saddle-point matrix or a block 1-by-1 normal equations form. The 3-by-3 formulation can also be symmetrized with a similarity transformation if a symmetric solver is to be used. We examine whether this practice of reduction is beneficial for the solver. For each of these formulations we find analytical results for the following spectral properties: the inertia, condition number, conditions for nonsingularity, and bounds on the eigenvalues. While the reduced systems become increasingly ill-conditioned throughout the iterations except in special cases, the 3-by-3 formulations remain nonsingular and well-conditioned with only mild assumptions on the problem; with regularization the assumptions are further simplified. Numerical examples are used to support the analytical results. We conclude that the 3-by-3 formulations, unsymmetric or symmetric, unregularized or regularized, have superior spectral properties that support their use in practice. ii Preface Chapters 3, 4, and 5 are based on work conducted with Chen Greif and Dominique Orban, and this material is in preparation for publication. I was responsible for much of the analysis appearing in Chapters 3 and 4, with input from my coauthors, and the numerical results in Chapter 5 are from my own code. The initial writeup of these chapters was done by me, though there have been many collaborative edits in the time since. The small section on nonstrict complementarity, Section 2.7, was originally written by Dominique Orban. iii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Tables vi Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Convex quadratic programming 2.1 Definitions and examples . . . 2.1.1 Non-standard form . . 2.1.2 Examples . . . . . . . . 2.2 Convexity . . . . . . . . . . . . 2.3 Duality and the dual problem 2.4 Constraint qualifications . . . 2.5 Conditions for a solution . . . 2.6 Interior-point methods . . . . 2.7 Non-strict complementarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7 8 9 10 14 16 17 19 22 3 Properties of matrices in interior-point methods 3.1 Normal equations . . . . . . . . . . . . . . . . . . 3.1.1 Nonsingularity . . . . . . . . . . . . . . . . 3.1.2 Inertia . . . . . . . . . . . . . . . . . . . . 3.1.3 Eigenvalue bounds . . . . . . . . . . . . . . 3.1.4 Condition numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 27 27 28 29 30 iv Table of Contents 3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 31 31 31 32 33 33 35 37 42 of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 45 45 45 45 46 47 47 48 48 49 50 50 53 55 64 . . . . . . . . . . . . . . . . . . . . . . 65 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.3 Saddle-point form . . . . 3.2.1 Nonsingularity . . 3.2.2 Inertia . . . . . . 3.2.3 Eigenvalue bounds 3.2.4 Condition numbers Unreduced 3-by-3 form . 3.3.1 Nonsingularity . . 3.3.2 Inertia . . . . . . 3.3.3 Eigenvalue bounds 3.3.4 Condition numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Regularization and the properties 4.1 Normal equations . . . . . . . . 4.1.1 Nonsingularity . . . . . . 4.1.2 Inertia . . . . . . . . . . 4.1.3 Eigenvalue bounds . . . . 4.1.4 Condition numbers . . . 4.2 Saddle-point form . . . . . . . . 4.2.1 Nonsingularity . . . . . . 4.2.2 Inertia . . . . . . . . . . 4.2.3 Eigenvalue bounds . . . . 4.2.4 Condition numbers . . . 4.3 Unreduced 3-by-3 form . . . . . 4.3.1 Nonsingularity . . . . . . 4.3.2 Inertia . . . . . . . . . . 4.3.3 Eigenvalue bounds . . . . 4.3.4 Condition numbers . . . 5 Numerical experiments 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Tables 4.1 5.1 5.2 5.3 5.4 5.5 5.6 5.7 A summary of the necessary and sufficient conditions on nonsingularity throughout the interior-point iterations and at the limit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Properties of quadratic programs solved. . . . . . . . . . . . . TOMLAB problem 6: progression of the interior-point solver. TOMLAB problem 6: progression of the interior-point solver regularized with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . TOMLAB problem 21: progression of the interior-point solver. TOMLAB problem 21: progression of the interior-point solver regularized with ρ = δ =1e−4. . . . . . . . . . . . . . . . . . . Iteration counts for different problems, part 1. A problem that did not converge is noted by “—”, and a problem that blew up is noted by “*”. . . . . . . . . . . . . . . . . . . . . Iteration counts for different problems, part 2. A problem that did not converge is noted by “—”, and a problem that blew up is noted by “*”. . . . . . . . . . . . . . . . . . . . . . 53 65 70 73 79 82 85 86 vi List of Figures 2.1 Illustration of convexity. . . . . . . . . . . . . . . . . . . . . . 11 5.1 5.2 5.3 TOMLAB problem 6: eigenvalues of K1 in log scale. . . . . . TOMLAB problem 6: eigenvalues of K2 in log scale. . . . . . TOMLAB problem 6: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ˆ 3 in log scale. Note TOMLAB problem 6: eigenvalues of K that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TOMLAB problem 6: comparison of the condition numbers. . TOMLAB problem 6: eigenvalues of K1,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . TOMLAB problem 6: eigenvalues of K2,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . TOMLAB problem 6: eigenvalues of K3,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . ˆ 3,reg in log scale with TOMLAB problem 6: eigenvalues of K ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . TOMLAB problem 6: comparing the condition numbers over different regularization parameters. . . . . . . . . . . . . . . . TOMLAB problem 21: eigenvalues of K1 in log scale. . . . . TOMLAB problem 21: eigenvalues of K2 in log scale. . . . . TOMLAB problem 21: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ˆ 3 in log scale. Note TOMLAB problem 21: eigenvalues of K that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TOMLAB problem 21: comparing the condition numbers. . . 66 67 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 5.14 5.15 68 68 69 69 71 72 72 75 76 76 77 78 78 vii List of Figures 5.16 TOMLAB problem 21: eigenvalues of K1,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.17 TOMLAB problem 21: eigenvalues of K2,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.18 TOMLAB problem 21: eigenvalues of K3,reg in log scale with ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . ˆ 3,reg in log scale with 5.19 TOMLAB problem 21: eigenvalues of K ρ = δ = 1e−4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.20 TOMLAB problem 21: comparing the condition numbers over different regularization parameters. . . . . . . . . . . . . 80 81 81 83 84 viii List of Algorithms 2.1 2.2 Mehrotra’s algorithm for QPs . . . . . . . . . . . . . . . . . . Starting point computation for Mehrotra’s algorithm . . . . . 21 23 ix Acknowledgements I would first like to thank my supervisor, Chen Greif, whose guidance and collaboration on my research were invaluable. Our weekly or twice-weekly meetings often went long while discussing various subjects, research-related or otherwise. He has supported my directions and goals through my academics, and has never failed to believe in me. My other collaborator on this work, Dominique Orban, provided many insights from an optimizer’s perspective. During every meeting we had, I learned something new, pushing me to deeper understanding of the material. The Scientific Computing Lab, the Institute of Applied Mathematics, and the Mathematics Department have been my academic homes, and they house many people who have been helpful in ways too varied to count. I appreciate all the time I have been able to spend with friends, as a break from research or as help with research, sometimes simultaneously. Rob Miller, Iain Moyles, Mark Willoughby, Kai Rothauge, and Rebecca Hiller became my group of friends in a new place, and we have had great fun on our (occasional) nights off. Jill Slind has been a wonderful and close friend for many years, and she continues to tolerate me after all this time. I would like to thank everyone who taught me math over the years for inspiring me and nurturing my love of the subject. Special thanks to Raymond Spiteri, my undergraduate supervisor, who first suggested summer research and graduate studies, and to Janet Christ, my high school math teacher, who took math beyond school and continues to inspire my teaching. My parents have always believed that I could do anything, and their guidance throughout the years has led here. They started my love for math as a child, and their continued support and love is greatly appreciated. Finally, my work would not have been possible without the daily support and encouragement of my husband, Kyle Van Impe. He has been a stabilizing force through the difficult and often frustrating process of my degree, and has allowed me to focus by picking up the slack in everything else in our life. I am always grateful for everything he does. Funding for my graduate work was provided by a NSERC CGS-M scholarship and by UBC affiliated awards. The financial support is appreciated. x Chapter 1 Introduction Optimization is the technique of finding the best solution out of a set of choices. Optimization problems occur throughout all fields, from maximizing profit in business to minimizing material stress in engineering. Some problems model situations created by humans, such as minimizing labour costs at a factory, while others solve problems occurring naturally, such as the minimization of energy in a molecule. A very simple example is to minimize x2 + y 2 , which has the obvious solution x = y = 0. The set of choices may be restricted in some way, such as maximizing profit with a fixed cost of materials. The example of minimizing x2 +y 2 can be extended by restricting x + y ≥ 2, where the solution is no longer as obvious. As computing power becomes ubiquitously available, ever larger optimization problems can be solved, leading to the increasing use of optimization throughout all fields. Any optimization problem involves an objective function, a measure of how good the current choice is. The variables in the problem represent properties of the choice being made, and restrictions on the choice are given as constraints to the problem. A general optimization problem in standard form is given by minimize f (x) x subject to c(x) = 0, d(x) ≥ 0, (1.1) where x ∈ Rn and c and d may be vector functions. In this standard form, x are the variables, f (x) is the objective function and c(x) = 0 and d(x) ≥ 0 are constraints. The exact solutions to only a few optimization problems can be found; for example, certain smooth unconstrained optimization problems can be solved using straightforward calculus. However, most classes of optimization problems, and especially the large problems that occur in practice, require the use of iterative methods that find approximate solutions. Quadratic programs are a type of the general optimization problem (1.1) in which the objective function is quadratic and the constraints are linear. In standard form, the problem is 1 minimize f (x) = cT x + xT Hx x 2 subject to Jx = b, x ≥ 0, 1 Chapter 1. Introduction where H ∈ Rn×n and J ∈ Rm×n . Details on the setup of these problems and converting to standard form will be covered in Section 2.1. These problems arise in many applications, and also as subproblems within solution algorithms for more general constrained optimization problems in the form (1.1), making the efficient solution of quadratic programs an important goal. The two major and contrasting categories of algorithms for quadratic programming are active-set methods and interior-point methods [45]. Active-set methods attempt to decide which inequality constraints are active, or hold with equality, reducing the problem to one that has only equality constraints and is simpler to solve. Interior-point methods, by contrast, require that the inequalities hold strictly, moving towards a solution from the interior of the feasible set. The primal-dual interior-point algorithm, our focus here, uses Newton’s method for rootfinding to solve a set of equations, so the major computational cost of each iteration is the solution of a linear system. A key question in solving quadratic programs with interior-point methods is how to solve these linear systems. The focus of this thesis will be to answer this question by examining properties of different formulations of these systems. The system is naturally posed as a block 3-by-3 system, where the matrix is unsymmetric and indefinite. The system may be symmetrized using a similarity transformation to a symmetric indefinite block 3-by-3 system. Alternatively, one step of block Gaussian elimination can be performed to achieve a symmetric indefinite block 2-by-2 system in the classic saddlepoint form [4]. Finally, a second step of block Gaussian elimination can be performed to reduce further to a symmetric positive definite block 1-by-1 system in the normal equations form. The quadratic program can also be regularized to alleviate numerical difficulties that arise during the solution of the linear system, leading to a set of regularized matrices that can be manipulated in similar ways. Interior-point methods come out of the theoretical background of the older barrier methods and penalty methods, which solve (1.1) by transforming the problem into a sequence of unconstrained minimization problems. Fiacco and McCormick [15] traced in detail the history of these methods until 1968, a few highlights of which follow. In 1943, Courant [10] proposed studying barrier-type functions, but this idea was not followed into the creation of an algorithm. The logarithmic potential method was introduced in two papers by Frisch in 1954 and 1955, using the gradients for problems of the form f (x) + αi log(di (x)) to solve problems with inequality constraints. This method was utilized in 1961 by Parisot in his Ph.D. thesis to solve linear programming problems, and other convex programs, with 2 Chapter 1. Introduction sequential unconstrained methods, later called the sequential unconstrained minimization technique (SUMT). Fiacco and McCormick extended SUMT for convex programming problems with both equality and inequality constraints in 1966 [13], using a barrier function for the inequality constraints and a penalty function for the equality constraints, and in 1967 [14] they established several theoretical results for the same problem, including duality results. Many results of theirs and others are collected in [15]. Barrier methods declined in popularity during the 1970s for several reasons, one being the ill-conditioning of matrices involved in the solution [18, 49]. Several developments specific to linear programming changed this. The solution of linear programs had been done almost exclusively by variants of the simplex method developed in 1947 and published in 1951 by Dantzig [12], which has excellent practical performance but exponential theoretical complexity [15, 49]. In 1979, Khachiyan [31] published the ellipsoid method, the first polynomial-time algorithm for linear programs, which was based on nonlinear programming techniques [49, 51]. Unfortunately its practical performance was poor. A dramatic change to the field occurred in 1984 when Karmarkar published a polynomial-complexity algorithm for linear programs [33], which he claimed to be consistently 50 times faster than simplex [49, 51]. The algorithm was an interior method using a logarithmic potential function, leading to the revival of interior-point methods for linear programming. In 1986, Gill et al. [22] showed the relationship between Karmarkar’s algorithm and classical barrier methods, meaning that similar methods could be exploited for nonlinear programming as well. Research since then has tended to view linear and nonlinear programming from a unified perspective, in contrast to the separated view during the dominance of simplex [49, 51]. The first polynomial interior-point algorithm for convex quadratic programming was introduced in 1986 by Kapoor and Vaidya [32] and independently by Ye and Tse [25]. The methods covered so far have all been formulated as primal methods, focusing only on the original form of the problem. Primal-dual methods were developed after Karmarker’s discovery, and while the definitions differ, they use both the original (primal) variables and the Lagrange multipliers (dual variables), and use Newton’s method to solve a set of nonlinear equations in these variables [51]. They have many excellent theoretical properties, including better conditioning than strictly primal methods, and can be extended to many nonlinear programs [51]. One key component of this discovery was the definition and derivation in 1980 by McLinden [38] of theoretical properties of the central path for a primal-dual set of problems [52]. In 1987, Megiddo [39] provided several insights to the central path, including its ge3 Chapter 1. Introduction ometry near the solution, which motivated Kojima, Mizuno, and Yoshise [34] in the same year to develop the first polynomial primal-dual algorithm, a long-step path-following algorithm. Shortly thereafter, Kojima, Mizuno, and Yoshise [35] also developed a short-step path-following algorithm with better theoretical complexity. Mizuno, Todd, and Ye in 1990 [42] provided a predictor-corrector improvement on the short-step algorithm. Parallel developments to this series of primal-dual methods focused on the logarithmic function, using it to generate search directions or update function values [52]. Path-following algorithms for convex quadratic programs were developed by several groups, including Mehrotra and Sun in 1990 [41]. Gonzaga in 1992 [25] covered many theoretical concepts and algorithms relating to path-following algorithms for linear programming, and lists some developments for these algorithms for quadratic programming and other problems. Nesterov and Nemirovskii in 1994 [44] covered interior-point algorithms for convex optimization in a general setting with many theoretical results and new proposed methods. They also commented that techniques for linear programming can generally be extended without difficulty to convex quadratic programming, and that a thorough bibliography of interior-point methods at that time had numbered over a thousand papers, indicating the explosion of the subject. The previously mentioned methods all require feasibility of the initial starting point, which is difficult to satisfy. In the search for practical algorithms, infeasible-interior-point methods were introduced, allowing more general starting points. Important discoveries for this group of methods were global convergence by Kojima, Megiddo, and Mizuno in 1993 [36] and polynomial complexity by Zhang in 1994 [53], the latter of which is general to convex quadratic programming. Many current algorithms are based on the predictor-corrector algorithm of Mehrotra in 1992 [40]. This infeasibleinterior-point primal-dual method uses a higher-order approximation for the central path. It also uses heuristic guidelines to choose a starting point and several parameters, which leads to an efficient and practical algorithm, though it has no convergence theory [52]. This algorithm is the basis for the solvers covered in this thesis. While the algorithms of modern interior-point solvers are mostly settled, the choice of matrix formulation to use in the algorithm differs, and various linear algebra issues related to the solution of these systems arise. Many modern solvers reduce fully to the normal equations form; one prominent example is PCx for linear programming [11]. Others reduce to the saddlepoint form; examples include OOQP for quadratic programming [21] and IPOPT and KNITRO for general nonlinear programming [7, 8, 48]. An4 Chapter 1. Introduction other example is HOPDM for linear programming and convex quadratic programming, which automatically chooses either the normal equations or saddle-point form [2]. We are not aware of any existing solvers which solve the unreduced problem for any of these problems. Properties of the normal equations form are straightforward, and there is relatively little analysis of this form. One difficulty with the use of the normal equations is that an interim step involves the inversion of a matrix, which may lead to fill-in [18]. The saddle-point formulation has properties that directly follow from the general properties of such matrices; see [4, 28, 46, 47] for some relevant general results and [20] for results specialized to optimization. The ill-conditioning of some reduced matrices is well-known [15, 16, 18, 49, 51], but it has been referred to, with some assumptions on solution methods, as “benign” [16, 51],“usually harmless” [18], and “highly structured” [18]. The matrices for classical barrier methods are also illconditioned [18, 50]. There are relatively few existing results for the unreduced 3-by-3 formulation. Some spectral properties for various formulations for the special case of linear programming are covered in [37]. Uniform boundedness of the inverse under several assumptions is proved in [3], intended to be used in further theoretical proofs. Saunders is cited in [16] as stating by personal communication that a similarity transformation leads to a symmetrized version of the 3-by-3 system, equivalent to the symmetrized matrix used in this thesis. Several works [16, 18] note that the matrix of this system remains well-conditioned though ill-conditioning remains when forming the righthand side and computing the variables due to multiplication by a diagonal matrix with large elements; they also both mention a different symmetric formulation that is ill-conditioned. There are also other formulations of the system besides those covered in this thesis. An augmented formulation is proposed in [17] that has the benefit of being positive definite on and near the central path, allowing the use of more specialized linear solvers. Scalings of the systems are used in [37] and [19] to alleviate ill-conditioning. A partially-eliminated formulation for the penalty problem is used in [27] and [23] based on partitioning inequality constraints by residual size, after noting that ill-conditioning is due to the varying sizes of these residuals. The conditioning of various formulations was discussed in [5] for discretized variational problems, but since they use iterative methods with large-scale problems, the ill-conditioning of the reduced forms is problematic, and they therefore use in experiments only the original unreduced form and a partially-eliminated formulation. They also offer a scaling of the saddle-point form that is well-conditioned, but recov5 Chapter 1. Introduction ering the variables of interest requires the reverse scaling which can amplify numerical errors. An outline of the remainder of this thesis is as follows: Chapter 2 will cover background material on quadratic programming, Chapter 3 will cover properties of the matrices for interior-point methods, Chapter 4 will cover properties of the regularized matrices, Chapter 5 will present numerical experiments, and Chapter 6 will conclude. 6 Chapter 2 Convex quadratic programming In this chapter we introduce quadratic programming and necessary theoretical concepts for its analysis. This material is well known, but is provided for completeness. A thorough treatment of quadratic programming, and the source of the material of this chapter unless otherwise noted, can be found in the books [6, 30, 45]. Section 2.1 will define the problem and give examples, Section 2.2 will cover convexity, Section 2.3 will define the dual problem, Section 2.4 will define constraint qualifications, Section 2.5 will cover the necessary and sufficient conditions for a minimum, Section 2.6 will cover interior-point methods, and Section 2.7 will cover non-strict complementarity. 2.1 Definitions and examples Given matrices H ∈ Rn×n and J ∈ Rm×n and vectors b ∈ Rm and c ∈ Rn , a quadratic program (QP) in standard form is given by 1 minimize cT x + xT Hx x∈Rn 2 subject to Jx = b, x ≥ 0. (2.1) The function cT x + 12 xT Hx is called the objective function, the equations Jx = b are the equality constraints, and the inequalities x ≥ 0 are the inequality constraints or nonnegativity constraints. A common assumption is that J is full rank with m < n, meaning none of the equality constraints are redundant. H is symmetric, and is variously required to be positive definite, positive definite on the nullspace of J, or positive definite on a critical cone lying in the nullspace of J. Throughout this thesis, H will be assumed to be at least positive semi-definite and no assumption will be made on the rank of J, with other conditions on H and J specified as needed. If H ≡ 0, (2.1) is a linear program (LP), another important type of optimization problem. A point x is called feasible if all the constraints are satisfied, and the feasible set is the set of all feasible points, {x | Jx = b, x ≥ 0}. An 7 2.1. Definitions and examples inequality constraint is inactive if it is satisfied strictly, and is called active if it is satisfied with equality. The active set A(x) at a feasible point x is the set of constraints that are active for the point x, and the inactive set I(x) is similarly the set of constraints that are inactive. Henceforth if unambiguous these sets will be referred to as A and I for brevity. For an index set N ⊆ {1, . . . , n} and a vector v ∈ Rn , vN is the subvector of v indexed by N . Similarly, if A is a matrix with n columns, AN is the submatrix of the columns of A corresponding to indices in N , and AN N is the square submatrix with both rows and columns corresponding to indices in N . Using this notation, the vector xA corresponds to the elements of x in the active set, thus xA = 0 for a QP in standard form (2.1). A point x∗ is a local minimizer or local solution of (1.1) if it is feasible and f (x∗ ) ≤ f (x) for all feasible x in some neighbourhood surrounding x∗ . A point x∗ is a global minimizer or global solution if it is feasible and f (x∗ ) ≤ f (x) for all feasible x. The gradient of a differentiable function f (x) is ∇f = ∂f ∂f ∂x1 , . . . , ∂xn . The Hessian is ∂f 2 ∂x21 ∂f 2 ∇ f = ∂x2.∂x1 . . 2 ∂f 2 ∂xn ∂x1 ∂f 2 ∂x1 ∂x2 .. ... ∂f 2 ∂x1 ∂xn .. .. . .. . . ∂f 2 ∂xn ∂x2 . ... ∂f 2 ∂x2n . The Taylor series for a function f (x) is an infinite sum of terms representing the function. Truncating the Taylor series gives a Taylor polynomial that approximates f near a point x0 . We will use only up to the quadratic Taylor polynomial, which is given by 1 f (x) ≈ f (x0 ) + (x − x0 )T ∇f (x0 ) + (x − x0 )T ∇2 f (x0 )(x − x0 ). 2 2.1.1 Non-standard form A quadratic program may be stated in many different forms, but all can be reduced to the standard form. The simplest case is that the variables x are bounded below element-wise, as x ≥ d. This can be changed to standard form by shifting to a new variable vector x = x − d which is then bounded 8 2.1. Definitions and examples below by zeros, and setting b = b − Jd to formulate the new constraints Jx = b . The objective function does not change, although the optimal value will. Another case is if the constraints with the matrix J are given as inequality constraints, Jx ≤ b. By introducing slack variables xs ≥ 0, the x constraints can be rewritten as J I = b. Upper bounds on the xs variables are considered as constraints and dealt with in the same way. Similarly, if the constraints were Jx ≥ b, then the problem can be transformed by subtracting slack variables in the same way (these are sometimes referred to as excess variables). In both cases, the objective function is corrected to x H 0 x cT 0T + 21 xT xTs . xs 0 0 xs If the variables lack nonnegativity constraints and are thus free variables, this can be fixed by setting x = x+ − x− , with x+ ≥ 0 and x− ≥ 0. x+ The equality constraints are corrected to J −J = b, and the new x− x+ H −H x+ . objective function is cT −cT + 21 xT+ xT− x− −H H x− Finally, these techniques can all be combined in the case that the problem is non-standard in more than one way, and each of these techniques can be applied to one or more variables. Thus since any problem can be expressed in standard form, standard form will be assumed henceforth. 2.1.2 Examples One common example of quadratic programming is least-squares approximation. The problem is to approximately solve Jx = b, with J ∈ Rm×n a full rank matrix with m > n. Unless b is in the range-space of J, there is no solution to this equation. Instead, the problem is solved to minimize the residual Jx − b 2 . In the form of a quadratic program, this is minimize x Jx − b 2 2 = xT J T Jx − 2bT Jx + bT b. (2.2) This is an unconstrained QP, and it has the solution x = (J T J)−1 J T b. (2.3) Bound constraints in the form l ≤ x ≤ u can be added to this problem to get a constrained least-squares problem, which no longer has the solution 9 2.2. Convexity (2.3). If J is large and sparse, regardless of whether or not there are constraints, iterative methods are applied to solve (2.2) rather than using (2.3) to compute x, which is computationally inefficient and potentially unstable. An example that comes from modelling a real-life situation is Markowitz’s model for portfolio optimization. It models a collection of n possible investments, each with return ri which is not known in advance. These returns may be assumed to be random variables with expected value µi = E[ri ] and variance σi2 = E[(ri − µi )2 ]. The returns are not independent, and the E[(ri −µi )(rj −µj )] correlations between the returns are given by ρi,j = . Given σi σj this information, the investor invests a fraction of the available money xi in the ith investment. Constraints are that all money must be invested, so the sum of the xi must be one, and that the fractions must be positive. The optimization is to maximize return while minimizing variance, resulting in the quadratic program n T T maximize µ x − κx Gx n x∈R xi = 1, x ≥ 0. subject to i=1 The matrix G is the covariance matrix with entries Gi,j = ρi,j σi σj , and the parameter κ is a risk tolerance parameter. A large value for κ places more emphasis on having less variance, which is a more conservative investment, while a small value will allow more variance but possibly larger returns. When applying this model to real life investments, the expected values, variances, and correlations are not known, and are generally estimated by combining historical data with the insights of professionals. 2.2 Convexity Convexity is a fundamental property for optimization that determines how local and global minimizers behave and are related. The convexity of an optimization problem is determined by the convexity of the objective and constraint functions. A set C ∈ Rn is convex if for any x, y ∈ C, we have αx + (1 − α)y ∈ C for all α ∈ [0, 1]. Geometrically, this says that the line segment connecting any two points in the set lies entirely in the set. A function f (x) is convex on a convex set C if for any x, y ∈ C it satisfies f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y), ∀α ∈ [0, 1]. (2.4) It is strictly convex if it satisfies f (αx + (1 − α)y) < αf (x) + (1 − α)f (y), ∀α ∈ (0, 1) 10 2.2. Convexity for any x, y ∈ C with x = y. A function f (x) is concave if −f (x) is convex. Convexity for functions can be thought of as a generalization of linearity: a function is linear if the condition (2.4) holds with equality. Thus linear functions are both convex and concave. Convexity in geometric terms says that the line connecting (x, f (x)) and (y, f (y)) lies on or above the function, as illustrated in Figure 2.1. Figure 2.1: Illustration of convexity. For differentiable functions, convexity can be characterized in terms of derivatives. Theorem 2.1 (First order conditions for convexity). For f continuously differentiable, it is convex on a convex set C if and only it satisfies f (x) ≥ f (y) + ∇f (y)T (x − y), for all x, y in C. It is strictly convex if and only if the above holds with a strict inequality. Proof. First we show that convexity implies the inequality as stated. Assume that f is convex on a set C. Using the definition of convexity of f , we have f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y), f (y + α(x − y)) − f (y) ≤ f (x) − f (y). α Allowing α to tend to zero from above gives ∇f (y)T (x − y) ≤ f (x) − f (y), which rearranges to the result. 11 2.2. Convexity Now, for the other direction, assume that f satisfies f (x) ≥ f (y) + ∇f (y)T (x − y), for all x, y in C. Then for an arbitrary α ∈ [0, 1], let z = αx + (1 − α)y ∈ C, and take f (x) ≥ f (z) + ∇f (z)T (x − z), f (y) ≥ f (z) + ∇f (z)T (y − z). Multiplying the first inequality by α and the second by (1 − α) and adding gives αf (x) + (1 − α)f (y) ≥ f (z) + ∇f (z)T (αx + (1 − α)y) − ∇f (z)T z, αf (x) + (1 − α)f (y) ≥ f (αx + (1 − α)y), giving that f is convex. The proof for strict convexity is identical except for the use of strict inequalities, so it is omitted. Theorem 2.2 (Second order conditions for convexity). For f twice continuously differentiable, it is convex if and only if its Hessian is positive semidefinite. It is strictly convex if and only if its Hessian is positive definite. Proof. Assume that f is convex on an open convex set C. Let y ∈ C and d be any direction. Then for small α > 0, y + αd ∈ C, and we can use a Taylor expansion as follows: f (y + αd) = f (y) + α∇f (y)T d + α2 dT ∇2 f (y)d + α3 O( d 3 ). Using the result from Theorem 2.1 with x = y + αd, we have α2 dT ∇2 f (y)d + αO( d 3 ) ≥ 0. Dividing by α2 and then allowing α to tend to zero, we have that ∇2 f (y) is positive semi-definite. Assume that ∇2 f (z) is positive semidefinite for all z ∈ C, an open convex set. Then for any x, y ∈ C, the following holds by Taylor’s theorem f (y) = f (x) + ∇f (x)T (y − x) + (y − x)T ∇2 f (z)(y − x), for some z ∈ C. Since ∇2 f (z) is positive semidefinite, the result of Theorem 2.1 holds and thus f is convex. Again, the results for strictly convex are similar and are omitted. 12 2.2. Convexity Using the definitions of convexity and strict convexity for functions, we can now define convexity for optimization problems. Definition 2.3 (Convexity for optimization problems). A general optimization problem (1.1) is called convex if the objective function f is convex, the equality constraint functions ci are linear, and the inequality constraint functions di are concave. It is called strictly convex if the objective function is strictly convex and the latter two conditions also hold. The benefit of convex optimization is that a local solution is also a global solution, as shown in the following theorem. Theorem 2.4 (Convex global solutions). If x∗ is a local minimizer of a convex optimization problem (1.1), then x∗ is a global minimizer. If the problem is strictly convex, then x∗ is the unique global minimizer. Proof. Assume that x∗ is a local minimizer, but that it is not a global minimizer. Then there exists some x with c(x) = 0 and d(x) ≥ 0 such that f (x) < f (x∗ ). Then for α ∈ (0, 1), we have f (αx∗ + (1 − α)x) ≤ αf (x∗ ) + (1 − α)f (x), < αf (x∗ ) + (1 − α)f (x∗ ), = f (x∗ ), showing that there exist points arbitrarily close to x∗ with function values strictly less than f (x∗ ), contradicting the assumption that x∗ is a local minimizer. Thus such an x cannot exist, and x∗ is a global minimizer. For f strictly convex, assume that x∗ is a global minimizer but that it is not unique. Then there exists some x with c(x) = 0 and d(x) ≥ 0 such that f (x) = f (x∗ ). Then for α ∈ (0, 1), we have f (αx∗ + (1 − α)x) < αf (x∗ ) + (1 − α)f (x), = αf (x∗ ) + (1 − α)f (x∗ ), = f (x∗ ), again contradicting the assumption that x∗ is a local minimizer. Thus x∗ is the unique global minimizer. For quadratic programming, the conditions that the equality constraints be linear and the inequality constraints concave always hold, so whether the problem is convex depends only on the objective function. If H is positive semi-definite, then the quadratic programming problem (2.1) is convex, and 13 2.3. Duality and the dual problem further if H is positive definite, then the problem is strictly convex. The assumptions of this thesis ensure that the problem is always convex, which will ensure additional theoretical properties such as duality, which is covered next. 2.3 Duality and the dual problem The theory of duality allows the construction of an alternative optimization problem, the dual problem, from the original problem, called the primal problem for contrast. In the dual problem, the variables and constraints have switched positions, so that the variables of the dual problem are related to the constraints for the primal problem, and similarly for the constraints of the dual. For some problems, the dual is easier to solve than the primal, and when the problems satisfy certain properties , then given the solution to one problem, the solution to the other can be easily computed. In the case of quadratic programming, the dual objective will provide a convenient lower bound on the primal objective. Here we use Lagrangian duality, though other theories exist; see [30]. By introducing vectors y ∈ Rm and z ∈ Rn , the Lagrange multipliers for the equality and inequality constraints respectively, the dual problem can be defined for the primal QP (2.1). The Lagrangian function for (2.1) is 1 L(x, y, z) = cT x + xT Hx − y T (Jx − b) − z T x. 2 We can recover the primal problem as follows. The primal function is defined to be L∗ (x) = sup L(x, y, z), y,z≥0 = sup y,z≥0 1 cT x + xT Hx − y T (Jx − b) − z T x . 2 If any component of Jx − b = 0, then by allowing the corresponding multiplier in y to increase or decrease, depending on the sign of the component, and setting all other multipliers to zero, the Lagrangian will increase without bound. Similarly, if any x < 0, by allowing the corresponding multiplier in z to increase, the Lagrangian will increase without bound. Thus the terms y T (Jx − b) and z T x must be zero, and the primal function is L∗ (x) = cT x + 21 xT Hx, ∞, if x ≥ 0, Jx = b otherwise. 14 2.3. Duality and the dual problem The original primal problem is thus minimize L∗ (x) on the domain where x L∗ (x) is finite. The primal problem is therefore a min-max problem for the Lagrangian function. The dual problem is given by the max-min problem for the same Lagrangian. Note that since the variables z correspond to the inequality constraints, they must be nonnegative, and thus we have the constraint z ≥ 0. For any z ≥ 0, the dual function is defined as L∗ (y, z) = inf L(x, y, z), x 1 = inf cT x + xT Hx − y T (Jx − b) − z T x . x 2 If any of ∇x L = c + Hx − J T y − z = 0, then by changing the corresponding element of x and setting the remaining elements to zero the Lagrangian will decrease without bound. The condition c + Hx − J T y − z = 0 thus becomes a constraint. Solving this equation for x would require H positive definite, which we do not assume. Alternatively, x can be retained as a variable to obtain L∗ (x, y, z) = bT y − 21 xT Hx, −∞, if z ≥ 0, J T y + z − Hx = c otherwise. The dual problem is then defined to be maximize L∗ (x, y, z) on the domain x,y,z where this is finite. Then the dual problem can be stated as follows 1 maximize bT y − xT Hx subject to J T y + z − Hx = c, z ≥ 0. (2.5) x,y,z 2 Note that this is also a convex quadratic programming problem, though not in standard form. Taking the dual of this dual problem recovers the primal problem. If H is not positive semi-definite, the relationship between the primal and dual problems no longer holds. Given a feasible primal-dual triple (x, y, z), the difference in the objectives is 1 1 (cT x + xT Hx) − (bT y − xT Hx), 2 2 = cT x + xT Hx − bT y, = (c + Hx)T x − bT y, = (J T y + z)T x − (Jx)T y, = y T J T x + z T x − xT J T y, = z T x ≥ 0. 15 2.4. Constraint qualifications Thus for any feasible triple, the dual objective is a lower bound on the primal objective. This is known as weak duality, and the inner product z T x = xT z is called the duality gap. In the case of infeasibility, the duality gap is the difference in objective functions, but this is not equal to xT z. The dual function is in fact a lower bound on the primal function for all triples: if the triple is not primal feasible, the primal function takes the value ∞, and if it is not dual feasible, the dual function takes the value −∞. 2.4 Constraint qualifications Methods for solution of many optimization problems (1.1), and specifically quadratic programs (2.1), work by using linear approximations to the objective and constraint functions to build a better iterate from the current iterate. This approach only makes sense when linear approximations are good approximations. The theory of constraint qualifications finds conditions when these approximations make sense. Let C be the closed, convex set of feasible points for (2.1). If x is a feasible point, the sequence {zk } is a feasible sequence approaching x if zk ∈ C for all k large enough and zk → x. The following definition is one way to capture the geometry of a set C at a feasible point. Definition 2.5 (Tangent cone). The vector d is a tangent to C at x if there is a feasible sequence {zk } approaching x and a sequence {tk } with tk > 0, tk → 0 such that limk→∞ zkt−x = d. The set of all tangents to C at x is k called the tangent cone and is denoted TC (x). Starting from any feasible point for (2.1), we want to move in directions that stay within the feasible set. This is given by the following definition. Definition 2.6 (Linearized feasible direction set). Given a feasible point x and the active set A(x), the set of linearized feasible directions is F(x) = {d | Jd = 0, dT ei = 0, i ∈ A(x)}, where ei is the ith standard basis vector. Constraint qualifications are conditions where TC (x) is identical to, or at least similar to, F(x). This ensures that the feasible directions represent the actual geometry of the set C, and thus that a linearized approximation to the problem captures the essential features and will yield useful information. They are also used extensively for theoretical analysis of optimization problems. There are many different constraint qualifications, some of which 16 2.5. Conditions for a solution are more easily characterized than others. One constraint qualification is that all active constraint gradients be linear, which is obviously satisfied for quadratic programming. However, the constraint qualification we will use throughout this thesis is another common constraint qualification, the linear independence constraint qualification. Definition 2.7 (LICQ). A triple (x, y, z) that is feasible for (2.1) satisfies the Linear Independence Constraint Qualification (LICQ) if the set of active constraint gradients is linearly independent, i.e. J T −IA has full column rank. This matrix comes from the gradients of the equality constraints, J T , and the gradients of the active inequality constraints, IA . The negative sign is added for convenience in later results. The LICQ can be used to find conditions for a solution, as in the following section, and we will also use it in Chapters 3 and 4 to find requirements for nonsingularity of matrices involved in the algorithms. 2.5 Conditions for a solution Conditions for a solution of an optimization problem (1.1) can be characterized by using first-order or second-order derivative information. The first-order necessary conditions for a solution to (2.1) are fundamental to constrained optimization, and in particular to interior-point methods. They are given in the following theorem. Theorem 2.8 (KKT conditions). If x∗ is a local minimizer of (2.1) and the LICQ holds at x∗ , then there exist y ∗ and z ∗ such that the following conditions are satisfied at x∗ , y ∗ , z ∗ : c + Hx − J T y − z = 0, (2.6a) Jx − b = 0, (2.6b) x ≥ 0, (2.6c) z ≥ 0, (2.6d) xi zi = 0, i = 1, 2, . . . , n. (2.6e) These are referred to as the Karush–Kuhn–Tucker (KKT) conditions for the quadratic programming problem (2.1). Proof. The proof is long and technical. The interested reader is referred to [45, Section 12.4] for a thorough proof for general optimization problems. 17 2.5. Conditions for a solution The conditions xi zi = 0 are referred to as the complementarity conditions, and they require that at least one of xi and zi be zero for each index i. If exactly one of these is zero for all indices, then strict complementarity is said to hold. A QP is called degenerate if either strict complementarity fails to hold, or some constraints are redundant at the solution. Thus if the LICQ and strict complementarity hold, the problem is nondegenerate. If the LICQ fails to hold, we can regularize the problem, as described in Chapter 4. Non-strict complementarity complicates matters significantly, but by partitioning the variables, it can be dealt with. We cover this in Section 2.7. At this point the duality result from Section 2.3 can be revisited. The difference in the primal and dual objectives is z T x ≥ 0. At the solution, the KKT conditions require that z T x = 0. Thus at the solution, the primal and dual objectives are equal. This is known as strong duality. The second order necessary conditions are given in the following theorem. Theorem 2.9 (Second-order necessary conditions). Assume x∗ is a local minimizer of (2.1), the LICQ holds at x∗ , the vectors y ∗ and z ∗ are the Lagrange multipliers for which the KKT conditions are satisfied, and strict complementarity holds. Define N to be a null-space basis matrix for the J T matrix T . Then N HN is positive semi-definite. −IA Proof. See [45, Section 12.5]. The second-order sufficient conditions are given in the following theorem. Theorem 2.10 (Second-order sufficient conditions). Assume x∗ is a feasible point for (2.1) and there exists y ∗ and z ∗ such that the KKT conditions are satisfied at x∗ , y ∗ , z ∗ with strict complementarity. Assume also that N T HN is positive definite. Then x∗ is a strict local solution for (2.1). Proof. See [45, Section 12.5]. Remark. Note that for a strictly convex QP, H is positive definite and thus N T HN is positive definite always since N is a basis matrix. Then if strict complementarity is satisfied, by Theorem 2.10 this implies that the KKT conditions are in fact necessary and sufficient. This fact is used to derive methods for solving (2.1) by iteratively finding solutions to the KKT conditions, as is covered in the next section. 18 2.6. Interior-point methods 2.6 Interior-point methods The two main classes of solvers for QPs are active-set methods and interiorpoint methods. Active-set methods have been in use since the 1970s, and extend the idea of Dantzig’s simplex method for LPs. They use an estimate of the active set A to solve at each iteration a QP which has only equality constraints. Then a new estimate of the active set is found, and the method proceeds. Each iteration is cheap, but no advantage of sparsity is taken, and the time required to solve in the worst case is exponential in the problem size. These methods are very effective for smaller QPs where the exponential worst-case bound is not insurmountable, or where a rough solution is known that can warm-start the algorithm [45]. The history of interior-point methods is covered in Chapter 1. While they have polynomial complexity, each iteration is expensive. The idea is to solve the equalities from (2.6) while ensuring that the inequalities from (2.6) are strictly satisfied. These equations have both linear and mildly nonlinear components, and are not too difficult to solve, but the inequalities create more difficulties. The particular class of interior-point methods we focus on are primal-dual interior-point methods. The KKT conditions for the primal QP (2.1) are given by (2.6), and since the solution (x, y, z) of the KKT conditions must be primal and dual feasible and satisfy complementarity, the dual problem is also solved. Thus by solving the KKT conditions for (x, y, z), we obtain a solution for both problems simultaneously. These primal-dual methods were shown to be the most efficient form of interior-point methods, and have been the ones that continue to be used. The equations from the KKT conditions can be rewritten as a mapping as follows c + Hx − J T y − z = 0, b − Jx F (x, y, z) = −XZe where the matrices X and Z are diagonal matrices of the vectors x and z respectively, and e is the vector of ones. Using Newton’s method, the system that is solved at each iteration is k H −J T −I ∆x −c − Hxk + J T y k + z k −J . 0 0 ∆y k = −b + Jxk (2.7) k k k k k −Z 0 −X X Z e ∆z The new iterate is (xk+1 , y k+1 , z k+1 ) = (xk , y k , z k ) + αk (∆xk , ∆y k , ∆z k ), with step size αk ∈ (0, 1] chosen to retain feasibility. Henceforth, we drop the iteration numbers for compactness, and it is understood that the system 19 2.6. Interior-point methods is composed of the vectors from the most recent iteration. This system is referred to as the affine Newton system, and its solution is referred to as the affine Newton direction. The full step with α = 1 generally violates the nonnegativity of x and z, so a line search can be performed to determine the step size. Unfortunately, the step size found from this pure Newton step is often very small. To resolve this difficulty, the complementarity conditions xi zi = 0 are replaced with the relaxed complementarity conditions xi zi = τ , τ > 0. In subsequent iterations, τ is reduced to zero to recover the original complementarity conditions. This relaxed complementarity condition can also be found by considering the optimization problem with a logarithmic barrier term with barrier parameter τ , and finding the first-order optimality conditions for this problem. The revised KKT conditions can again be written as a function as follows c + Hx − J T y − z = 0. b − Jx (2.8) Fτ (x, y, z) = −XZe + τ e The system that is solved at each iteration is then −c − Hx + J T y + z ∆x H −J T −I . −J −b + Jx 0 0 ∆y = XZe − τ e ∆z −Z 0 −X (2.9) A Newton step in this direction is usually able to take a larger step length without violating nonnegativity. The set of solutions {(xτ , yτ , zτ ) | τ > 0} is referred to as the central path, and all solutions on the central path are strictly feasible. The value of τ is chosen to reduce the current average value of xi zi , defined as xT z µ= , n by some fraction σ ∈ [0, 1]. This σ is called the centering parameter, and µ is the duality measure. If σ = 1, this step is a pure centering step, moving the solution towards the point (xµ , yµ , zµ ) on the central path where all the complementarity products are equal to the current average value. If σ = 0, the standard Newton step from (2.7) is recovered. Most solvers choose a value between these two extremes. The solution technique we use for the numerical experiments in Chapter 5 is based on the predictor-corrector method for linear programs by Mehrotra, which was later extended to QPs. It is a practical algorithm that ignores 20 2.6. Interior-point methods some theory but has good results. It involves two Newton steps. The first is a predictor step to find the affine Newton direction by solving the system (2.7). The results from this computation are then used in a second Newton step which computes the search direction in a combination of three contributions: the predictor from the affine Newton step, a corrector for the error in linearization made in the predictor step, and a correction towards the central path. The system for this step is as follows H −J T −I ∆x −c − Hx + J T y + z −J , (2.10) 0 0 ∆y = −b + Jx −Z 0 −X ∆z XZe + ∆Xaff ∆Zaff e − σµe where ∆Xaff and ∆Zaff are diagonal matrices of the vectors ∆xaff and ∆zaff respectively, the steps from the affine Newton step. The value of σ is also based on the results from the affine Newton step, in a formula with no theoretical justification but good performance. The use of the same matrix for two steps allows the matrix to be factored once with the resulting factorization used twice, which is only marginally more expensive than one solve. The full algorithm is given in Algorithm 2.1. Algorithm 2.1 Mehrotra’s algorithm for QPs Input: H, J, c, b, tolerance. Initialize x, y, z. repeat Solve (2.7) to get ∆xaff , ∆yaff , ∆zaff . pri dual . and αaff Compute step sizes αaff pri dual ∆z )/n. T ∆xaff )T (z + αaff Compute µ = x z/n and µaff = (x + αaff aff Compute centering parameter σ = (µaff /µ)3 . Solve (2.10)to get ∆x, ∆y, ∆z. Compute step sizes αpri and αdual . Update x = x + αpri ∆x, y = y + αdual ∆y, z = z + αdual ∆z. until duality gap is less than tolerance. The maximum step size for the primal affine step before moving into infeasibility would be the minimum of −xi /∆xaff,i over all indices i where ∆xaff,i < 0. The actual step size is taken to be most of the distance to the boundary, using a factor called the step factor which is close to 1. We take a step factor of 0.95, and compute a step size as follows: pri αaff = min(1, 0.95 min∆xaff,i <0 (−xi /∆xaff,i )). The calculation of the other step sizes are similar, with the dual steps using the corresponding values of z rather than x. 21 2.7. Non-strict complementarity One difficulty in using interior-point methods is finding a good starting point. Mehrotra’s initialization solves two equality-constrained minimum norm problems, then modifies these to ensure strict feasibility with respect to the inequality constraints. This again has no theoretical basis, but has good practical performance, and the algorithm in fact performs poorly without it. In our work, we solve two equality constrained QPs, following the method of Mehrotra’s original algorithm for LPs and the work to extend to QPs in [20]. Their initialization problem for x is an equality constrained QP, and the problem for y and z is chosen to use the same matrix, as Mehrotra’s LP algorithm does. We follow this idea, and the initial primal variable x is 1 1 chosen to solve the problem minimize x 2 + xT Hx subject to Jx = b, x 2 2 which can be solved with the following linear system: H + I −J T −J 0 x t = 0 . −b (2.11) The solution for t is then discarded. The initial dual variable z is then chosen 1 1 to solve the problem minimize z 2 + z T Hz + cT z subject to Jz = 0, z 2 2 which can be solved with the following system: H + I −J T −J 0 z y = −c , 0 (2.12) which also provides the initial Lagrange multipliers y. Since these problems both use the same matrix, a single factorization can be applied. The initial points x and z will in general include negative elements, so the procedure of Mehrotra is applied to ensure that they are strictly positive by modifying negative elements. The full algorithm to compute the starting values is given in Algorithm 2.2. 2.7 Non-strict complementarity Many of the results of this thesis rely on strict complementarity being satisfied in the limit, however, not all QPs (2.1) satisfy strict complementarity at a local solution. In the case of linear programming, if at least one solution exists, a strictly complementary solution always exists [24]. With mild assumptions, interior-point methods iterate towards a strictly complementary limit point [52, Theorem 6.8]. However, this no longer holds for QPs. One counter-example is minimize x 1 2 x 2 subject to x ≥ 0, 22 2.7. Non-strict complementarity Algorithm 2.2 Starting point computation for Mehrotra’s algorithm Input: H, J, c, b. if c ≡ 0 then Perturb a random element of c to 10−2 . end if if b ≡ 0 then Perturb a random element of b to 10−2 . end if Solve (2.11) to get x ˜. Solve (2.12) to get y˜ and z˜. Compute δx = max((−3/2) mini x˜i , 0) and δz = max((−3/2) mini z˜i , 0). Compute x ˆ=x ˜ + δx e and zˆ = z˜ + δz e. T T ˆ Compute δx = 0.5 xeˆT zzˆˆ and δˆz = 0.5 xeˆT xˆzˆ . Compute x = x ˆ + δˆx e, y = y˜, and z = zˆ + δˆz e. which has (x, z) = (0, 0) as the only solution. The matrices of (2.7), (2.9), and (2.10) will have zero rows in the limit, and thus will be singular. Here we outline a partially eliminated system which will allow our results to apply to these problems. Indicator sets can be used to separate indices into three groups, two active and one inactive. We define two types of active constraints, and thus partition the active set A. The set of strongly active constraints at x is AS := {i = 1, . . . , n | xi = 0 < zi }. The set of weakly active constraints at x is AW := {i = 1, . . . , n | xi = zi = 0}, the constraints at which strict complementarity fails to hold. Suppose at each iteration k of the interiorpoint method, we can identify approximations AkS , AkW , and I k to AS , AW , and I, respectively. Such indicator sets can resolve the singular limit difficulty provided they ensure that zik /xki → 0 as k → ∞ for i ∈ AkW ∪ I k while xki /zik → 0 as k → ∞ for i ∈ AkS . Indeed if this were the case, upon partitioning x, z, H, and J according to B k := AkW ∪ I k and S k := AkS , the variables ∆zB can be eliminated in (2.9) to get HSS HSB −JST −I rd,S ∆xS −1 T H T ∆xB rd,B + X −1 rc,B B . SB HBB + XB ZB −JB ∆y = −JS rp −JB ∆zS rc,S −ZS −XS The coefficient matrix of this system has a well-defined limit whenever J has full row rank. Details of indicator sets with the requisite properties, along 23 2.7. Non-strict complementarity with pointers to the literature, are given in [9] and [43]. Should J not have full row rank, the system can be regularized, as covered in Chapter 4. 24 Chapter 3 Properties of matrices in interior-point methods In this chapter we examine the properties of several different formulations of the matrices in systems (2.7), (2.9), and (2.10). For each of these matrices, we find conditions for nonsingularity, the inertia, bounds on the eigenvalues, and bounds and estimates of the condition numbers. The matrices of (2.7), (2.9), and (2.10) are all the same, so we can write a general form of the step as rd ∆x H −J T −I −J 0 0 ∆y = rp , (3.1) ∆z rc −Z 0 −X where the subscripts on the right side indicate the constraints for dual feasibility, primal feasibility, and the complementarity condition. These are defined by rd = −c − Hx + J T y + z, rp = −b + Jx, and rc will vary depending on the type of step being performed. The matrix of this system is H −J T −I 0 0. K3 = −J −Z 0 −X The system (3.1) can be reduced by eliminating ∆z, resulting in the system H + X −1 Z −J T ∆x r − X −1 rc = d . −J 0 ∆y rp Then ∆z can be recovered by computing ∆z = −X −1 (rc + Z∆x), which since X is diagonal, is an inexpensive computation. The matrix of this system is H + X −1 Z −J T K2 = . −J 0 25 Chapter 3. Properties of matrices in interior-point methods One more step of Gaussian elimination can be performed to eliminate ∆x, resulting in the system J(H + X −1 Z)−1 J T ∆y = −J(H + X −1 Z)−1 (rd − X −1 rc ) − rp . Then ∆z and ∆x can be recovered via ∆z = −X −1 (rc + Z∆x) and ∆x = (H + X −1 Z)−1 (J T ∆y + rd − X −1 rc ), which is an inexpensive computation if the inverse of H + X −1 Z is stored for use several times in each iteration. The matrix of this system is K1 = J(H + X −1 Z)−1 J T . The system (3.1) can be solved using any of the matrices K3 , K2 , or K1 , each of which has different properties. The properties of K2 and K1 are relatively well-known, while K3 has had comparatively little analysis; see Chapter 1 for a review of existing literature. We provide analysis on all formulations, including new results for K3 . The following definitions and notation will be used throughout this and the next chapter. The maximum and minimum singular values of a general matrix B are denoted by σmax (B) and σmin (B) respectively, and similarly the maximum and minimum eigenvalues of a square matrix M are λmax (M ) and λmin (M ). The eigenvalues of H are denoted by λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0, and the singular values of J are denoted by σ1 ≥ σ2 ≥ · · · ≥ σm ≥ 0. We use θ for the eigenvalues of the matrices K1 , K2 , and K3 as needed to avoid confusion with other quantities. For any vectors u ∈ Rn , v ∈ Null(J)⊥ ⊂ Rn , and w ∈ Rm , the following bounds are satisfied ≤ uT Hu ≤ λ1 u 2 , (3.2) ≤ Jv ≤ σ1 v , (3.3) σm w ≤ J w ≤ σ1 w . (3.4) λn u σm v 2 T Note that the right inequality in (3.3) is satisfied for all v ∈ Rn . Definition 3.1 (Inertia). The inertia of a square matrix S with real eigenvalues is the triple (n+ , n− , n0 ), where n+ , n− and n0 are the numbers of positive, negative, and zero eigenvalues of S, respectively. 26 3.1. Normal equations Theorem 3.2 (Sylvester’s law of inertia, [26]). For any nonsingular matrix R, the matrices S and RSRT have the same inertia. Definition 3.3 (Condition number). The spectral condition number of a matrix B is denoted κ(B) and is defined by κ(B) = σmax (B) . σmin (B) Definition 3.4 (Asymptotic notation, [29, Section 9.2]). f (x) is said to be at most order g(x), written f (x) = O(g(x)), if there exists a positive real number M and a real number x0 such that |f (x)| ≤ M |g(x)| for all x > x0 . Definition 3.5 (Asymptotic bounds). We say that g(x) is an asymptotic upper bound for f (x), and write f (x) g(x), if f (x) ≤ O(g(x)). Similarly, we say that h(x) is an asymptotic lower bound for f (x), and write h(x) f (x), if f (x) ≥ O(h(x)). At each iteration of the interior-point method, the matrices K1 , K2 , and K3 will change since X and Z change. These matrices are collectively referred to as the case of during the interior-point method or during the iterations, as similar properties hold. The limiting case or the case in the limit refers to the situation where τ is taken to zero and we find a solution satisfying the original complementarity condition, and thus the solution to (2.1). Although this case will not occur in the interior-point method and the matrices will not be used, we can assemble matrices in the same forms as K1 , K2 , and K3 and discuss their properties. This analysis is helpful because the interior-point method is iterating towards this solution and thus the matrices of the method iterate towards these limiting matrices. 3.1 Normal equations We first examine the properties of the matrix K1 , the furthest reduced of the formulations. An obvious benefit of the use of this formulation is the smaller size, but spectral properties may not be optimal, as we see in this section. 3.1.1 Nonsingularity First we observe the conditions for nonsingularity of K1 , considering separately the cases of during the interior-point iteration and the limit of the iterations. 27 3.1. Normal equations Theorem 3.6. The matrix K1 is positive definite during the iterations, and thus nonsingular, if and only if J has full rank. Proof. Assume J has full rank. Then for any v = 0, w = J T v = 0 as well. Forming v T K1 v = v T J(H + X −1 Z)−1 J T v = wT (H + X −1 Z)−1 w > 0 since H is positive semi-definite and X and Z are diagonal and strictly positive. Thus J full rank is sufficient for K1 to be positive definite. Assume J does not have full rank. Thus there exists v ∈ Null(J T ), v = 0, and v T K1 v = 0. Thus J full rank is necessary. Theorem 3.7. The matrix K1 is positive definite at the limit of the iterations if and only if the variables x > 0, H is positive definite, and J has full rank. Proof. We show that each of these conditions is necessary. If any element of x = 0, then the matrix X is singular and K1 is as well. Thus x > 0 is necessary. Now assuming x > 0, by complementarity, z = 0, and thus X −1 Z = 0. Then H positive definite is necessary. Assuming that x > 0 and H is positive definite, but that J does not have full rank, there exists v ∈ Null(J), v = 0. Then v T K1 v = 0, and J full rank is necessary. Now for sufficiency, assume x > 0 and H is positive definite. Following the same proof as for 3.6, J together with the two assumed conditions is sufficient for K1 to be positive definite. The case that K1 is positive definite in the limit is a special case: if all variables x are positive, the inequality constraints are all inactive, and the only active constraints are the equality constraints. This means that the solution is the same as the solution to an equality constrained QP, which is much simpler to solve. The conditions of H positive definite and J full rank are also strong conditions, and generally, K1 is singular at the limit. 3.1.2 Inertia Given the conditions for nonsingularity as listed in the previous section, the inertia of K1 is always (m, 0, 0). The inertia is of interest in optimization algorithms since at a local minimum, the problem is locally convex. Thus many methods working with nonconvex problems control the inertia to ensure the correct properties during the iterations; see [16]. 28 3.1. Normal equations 3.1.3 Eigenvalue bounds Bounds on the eigenvalues of K1 can be easily found. Theorem 3.8. The eigenvalues of K1 are bounded in the interval 2 σm σ12 . , λmax (H + X −1 Z) λmin (H + X −1 Z) Proof. The eigenvalue equation for K1 is J(H + X −1 Z)−1 J T v = θv, and multiplying by v T gives v T J(H + X −1 Z)−1 J T v = θ v 2 . (3.5) To find the upper bound, we can bound by the largest eigenvalue of (H + X −1 Z)−1 as follows λmax ((H + X −1 Z)−1 ) J T v 2 ≥ θ v 2. Now using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have σ12 v 2 ≥ θ v 2. λmin (H + X −1 Z) If v = 0, then it is trivial and is therefore not an eigenvector. Thus v = 0, and we can divide by v 2 to get the upper bound. To find the lower bound, we start from (3.5). Bounding by the smallest eigenvalue of (H + X −1 Z)−1 gives λmin ((H + X −1 Z)−1 ) J T v 2 ≤ θ v 2. Using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 2 σm v 2 ≤ θ v 2, λmax (H + X −1 Z) and dividing by v 2 gives the bound. 29 3.1. Normal equations 3.1.4 Condition numbers Using the bounds on eigenvalues from the previous section, we can find a bound on the condition number of K1 . Theorem 3.9. The condition number of K1 is bounded by κ(K1 ) ≤ κ(J)2 κ(H + X −1 Z). Proof. Using the eigenvalue bounds and the definition of condition number, we find σ12 λmin (H+X −1 Z) κ(K1 ) ≤ 2 σm λmax (H+X −1 Z) σ12 2 σm = , λmax (H + X −1 Z) λmin (H + X −1 Z) , = κ(J)2 κ(H + X −1 Z). We can find an asymptotic bound on this condition number as follows. If our solution (x, y, z) exactly solves the relaxed complementarity condition xi zi = µ, then we have X −1 Z = µX −2 . Assuming that at least one element of x goes to zero and does so at the same rate as µ → 0, which is typically the case under our assumptions [52, Theorem 6.8], we have the asymptotic estimates λmin (H + X −1 Z) ≈ λn + µ, λmax (H + X −1 Z) ≈ λ1 + 1/µ ≈ 1/µ. (3.6) Using these, asymptotic bounds on the eigenvalues of K1 are µ θ 1/(λn + µ). The condition number is then O( µ(λn1+µ) ), which simplifies to κ(K1 ) 1 µ, 1 , µ2 if λn > 0 if λn = 0. (3.7) The estimate for λn > 0 is in line with those of [50], who assumes that a second-order sufficiency condition holds. The asymptotic bound hides a factor of κ(J), however we focus on the effect of µ, which changes throughout the iterations. 30 3.2. Saddle-point form 3.2 Saddle-point form We now move on to examine the saddle-point form, the matrix K2 . This formulation is widely used and advocated, and has the advantage of the vast analysis on saddle-point matrices in general. We specialize these to find spectral properties of K2 . 3.2.1 Nonsingularity We can easily observe the conditions for nonsingularity of K2 , since it is a standard saddle-point matrix and there is existing analysis of these matrices. Theorem 3.10. The matrix K2 is nonsingular during the iterations if and only if J has full rank. Proof. Noting that H + X −1 Z is positive definite, this follows directly from [4, Theorem 3.1]. Similarly to the normal equations, the case that K2 is nonsingular at the limit is a special case, requiring the variables x > 0. Theorem 3.11. The matrix K2 is nonsingular at the limit of the iteration if and only if x > 0, J has full rank, and Null(H) ∩ Null(J) = {0}. Proof. If any element of x = 0, then the matrix X is singular and K2 is as well. Thus x > 0 is necessary. Assuming x > 0, by complementarity, z = 0, and thus X −1 Z = 0. Using [4, Theorem 3.2], J has full rank and Null(H) ∩ Null(J) = {0} are necessary and sufficient. 3.2.2 Inertia Assuming the conditions for nonsingularity given previously, the inertia of K2 is (n, m, 0). 3.2.3 Eigenvalue bounds Eigenvalue bounds for a saddle-point matrix are given in [46]; the results here directly follow these. The (1, 1) block of K2 is H + X −1 Z and is strictly positive definite during the iterations; note also that X and Z change at each iteration, so these bounds will change at each iteration as well. 31 3.2. Saddle-point form Theorem 3.12. The eigenvalues of K2 are bounded by the following: 1 λmin (H + X −1 Z) − 2 1 λmax (H + X −1 Z) − θ≤ 2 λmin (H + X −1 Z)2 + 4σ12 , θ≥ 2 , λmax (H + X −1 Z)2 + 4σm for negative eigenvalues θ, and θ ≥ λmin (H + X −1 Z), θ≤ 1 2 λmax (H + X −1 Z) + λmax (H + X −1 Z)2 + 4σ12 , for positive eigenvalues θ. 3.2.4 Condition numbers Using the bounds from the previous section, we can form an asymptotic bound on the condition number. Theorem 3.13. The condition number of K2 is bounded asymptotically by κ(K2 ) 1 . µ2 Proof. Using the situation where the extremal eigenvalues of H + X −1 Z are approximated by (3.6), we obtain the asymptotic estimates µ θ 1/µ for the positive eigenvalues, and −1 θ −µ for the negative eigenvalues. The asymptotic bounds on the negative eigenvalues are simplified from Taylor expansions. Using these bounds on the eigenvalues gives the condition number. Remark. The lower bound on the negative eigenvalues is finite, and thus the remaining three bounds are responsible for the ill-conditioning of K2 . When λn = 0, K2 has the same asymptotic conditioning as K1 , but when λn > 0, it appears that K2 has worse asymptotic conditioning than K1 , due to the upper bound on the negative eigenvalues. 32 3.3. Unreduced 3-by-3 form 3.3 Unreduced 3-by-3 form We now cover the properties of the matrix K3 , providing several new results. 3.3.1 Nonsingularity We consider the conditions for nonsingularity of K3 during the iterations. Theorem 3.14. The matrix K3 is nonsingular during the iterations if and only if J has full rank. Proof. Assume that there exists a nontrivial nullspace vector (u, v, w), that is, 0 u H J T −I J 0 0 v = 0 . (3.8) 0 w −Z 0 −X Solving the third block row of (3.8) for w, which is permissible since X is positive definite during the iteration, yields w = −X −1 Zu. Taking the inner product of the first block row with u and substituting for w gives uT (H + X −1 Z)u + (Ju)T v = 0. The second block row is Ju = 0, so using this we reduce to uT (H + X −1 Z)u = 0. (3.9) Since X and Z are diagonal and strictly positive throughout the iteration, H + X −1 Z is positive definite. Thus (3.9) has only the trivial solution u = 0, which implies w = 0. It follows from the first block row of (3.8) that J T v = 0. If J has full rank, this implies v = 0. If J does not have full row rank, (u, v, w) with v ∈ Null(J T ), v = 0, is a nontrivial null vector. Therefore the condition of J full rank is both necessary and sufficient for K3 to be nonsingular during the iteration. Remark. It is sufficient for nonsingularity of K3 to assume that H is positive semidefinite on the nullspace of J only, since then (3.9) still implies that u = 0. However, in this case (2.1) is no longer a convex QP and the duality relationship with (2.5) does not hold. This definiteness assumption is common in literature; see [26]. We now consider the limit of the iteration. 33 3.3. Unreduced 3-by-3 form Theorem 3.15. The matrix K3 is nonsingular at the limit of the iterations if and only if the solution (x, y, z) is strictly complementary, Null(H) ∩ Null(J)∩Null(Z) = {0}, and the linear independence constraint qualification (LICQ) is satisfied. Proof. If (x, y, z) is not strictly complementary, there is a zero row in the third block row of (3.8) and K3 is singular. Therefore, strict complementarity is necessary. If Null(H) ∩ Null(J) ∩ Null(Z) = {0}, take u ∈ Null(H) ∩ Null(J) ∩ Null(Z), u = 0, v = 0 and w = 0. Since Hu = Ju = −Zu = 0, it follows from (3.8) that (u, v, w) is a nontrivial null vector of K3 . Thus this condition is necessary. Now, assume strict complementarity and Null(H) ∩ Null(J) ∩ Null(Z) = {0}, and suppose (u, v, w) is in the nullspace of K3 . Since zI = 0 at the solution, the third block row of (3.8) and strict complementarity yield uA = 0 and wI = 0. Therefore, uT w = 0. Taking the inner product of the first block row of (3.8) with u and substituting Ju = 0 from the second block row gives uT Hu = 0, thus u ∈ Null(H). Noting that Ju = 0 is equivalent to saying u ∈ Null(J), and uA = 0 is equivalent to u ∈ Null(Z), combining all conditions on u gives u ∈ Null(H) ∩ Null(J) ∩ Null(Z), which implies that u = 0 by the assumption. Eliminating u and wI from (3.8), we have JT −IA v wA = 0, which has only the trivial solution (v, wA ) = (0, 0) if and only if the LICQ holds. Thus the three conditions together are sufficient, and the LICQ is necessary. Remark. The condition that Null(H) ∩ Null(J) ∩ Null(Z) = {0} cannot be verified before a solution is found, but Null(H)∩Null(J) = {0} is a sufficient condition. Comparing the requirements for nonsingularity of K1 , K2 , and K3 , we see that during the iterations, all are nonsingular when J has full rank. However, at the limit, K1 and K2 are singular unless x > 0, or all the inequality constraints are inactive, together with other conditions on J and H. Contrast this with the requirements for nonsingularity of K3 . Strict complementarity is a common assumption for the solution algorithms, and we have already detailed how this requirement can be dealt with. Null(H) ∩ Null(J) = {0} is another common assumption for optimization algorithms. Finally, the LICQ is also a common assumption for analysis of algorithms, though it is 34 3.3. Unreduced 3-by-3 form too strong for use in practice. We shall see that when some of these conditions do not hold, regularization alleviates the difficulties. Numerical results also indicate that even when all forms of the system are singular in the limit and the matrices of the iteration become increasingly ill-conditioned, the conditioning of K3 is superior. 3.3.2 Inertia To work with the eigenvalues and inertia of K3 , a symmetric matrix is required to guarantee that the eigenvalues are real. By using a similarity transformation with the diagonal matrix D defined by I 0 0 D = 0 I 0 , 1 0 0 Z2 (3.1) can be symmetrized to give 1 rd ∆x H −J T −Z 2 0 0 ∆y = rp , −J 1 1 ∆z Z − 2 rc 0 −X −Z 2 (3.10) 1 where ∆z = Z − 2 ∆z. The matrix from this system is 1 H −J T −Z 2 ˆ 3 := D−1 K3 D = 0 0 . K −J 1 0 −X −Z 2 ˆ 3 both during the iterations and in We find results on the inertia of K the limit. We first need the following result. Lemma 3.16 ([26], Lemma 3.1). Let 0 0 It B := 0 S 0 It 0 0 for a arbitrary square matrix S and size t identity matrix It . Then B has t eigenvalues −1, t eigenvalues 1, and the remaining eigenvalues the same as those of S. Theorem 3.17. Assume that J has full rank and Null(H) ∩ Null(J) = {0}. ˆ 3 during the iterations is (n, n + m, 0). Then the inertia of K 35 3.3. Unreduced 3-by-3 form Proof. Let the matrix N be an orthonormal nullspace basis matrix for J. Since Null(H) ∩ Null(J) = {0}, N T HN is positive definite. The following construction follows the proof of [26, Lemma 3.2]. Complete the basis N so that Y N is orthogonal. Then J Y N = L 0 , where L is nonsingular. Define Y N 0 0 M1 := 0 0 I 0 . 0 0 0 I Then we have Y T HY T N HY ˆ 3 M1 = M1T K L 1 −Z 2 Y Let In−m 0 M2 := − 1 L−T Y T HY 2 0 Y T HN N T HN 0 1 −Z 2 N 1 −Y T Z 2 1 −N T Z 2 . 0 −X LT 0 0 0 0 Im −L−T Y T HN 0 0 0 L−T 0 0 0 . 0 I Then 0 ˆ 3 M 1 M2 = 0 M2T M1T K Im 1 −Z 2 Y 0 T N HN 0 1 −Z 2 N 1 Im −Y T Z 2 1 0 −N T Z 2 . 0 0 0 −X Finally, let I 0 M3 := 0 0 1 0 I 0 0 0 0 0 R , I S 0 I 1 where R := −(N T HN )−1 N T Z 2 and S := −Y T Z 2 . The product M := ˆ 3 is nonsingular by Theorem 3.14, the M1 M2 M3 is nonsingular, and since K matrix 0 0 Im 0 T ˆ 3 M = 0 N HN 0 0 , MT K (3.11) Im 0 0 0 0 0 0 G 36 3.3. Unreduced 3-by-3 form 1 1 withG = −X − Z 2 N (N T HN )−1 N T Z 2 , is also nonsingular. From the previously noted Lemma 3.16, Sylvester’s law of inertia, and the fact that G ˆ 3 has n + m negative and n positive is negative definite, we obtain that K eigenvalues. This result holds in the limit. Theorem 3.18. Assume that Null(H) ∩ Null(J) = {0} and (x, y, z) is a solution of (2.1) and (2.5) where strict complementarity and the LICQ are ˆ 3 is (n, n + m, 0). satisfied. Then the inertia of K Proof. Following the proof of Theorem 3.17, (3.11) still holds. G is clearly at least negative semidefinite, and since M is nonsingular by the proof of ˆ 3 is nonsingular by Theorem 3.15, G is in fact negative Theorem 3.17 and K ˆ 3 has n + m negative and n positive eigenvalues. definite. Therefore K Remark. In fact, the assumption of the LICQ can be weakened to assume only that J is full rank, and G can be proven negative definite via strict complementarity. ˆ 3 is the same during the iterations and in the limit with The inertia of K conditions sufficient for nonsingularity. This is in contrast to the cases for K1 and K2 . Theorems 3.17 and 3.18 also hold for indefinite H, as long as the condition of positive definiteness on the nullspace of J is satisfied. 3.3.3 Eigenvalue bounds We now find bounds on the eigenvalues of the symmetric indefinite matrix ˆ 3 , which given the similarity transformation will also apply to K3 . We K ˆ 3 and K3 are nonsinassume that the conditions of Theorem 3.15 hold, so K gular throughout the iterations and at the limit. Our technique uses energy estimates in the style of [46]. ˆ 3 is formulated as The eigenvalue problem for K 1 H J T −Z 2 u u J 0 0 v v. = θ (3.12) 1 w w −Z 2 0 −X ˆ 3 , and thus also K3 , are Theorem 3.19. The positive eigenvalues of K bounded in min j 1 2 λ n − xj + (λn + xj )2 + 4zj , 1 2 λ1 + λ21 + 4(σ12 + zmax ) . 37 3.3. Unreduced 3-by-3 form The lower positive bound in Theorem (3.19) is useful in the case that λn = 0, but in the case where H is strictly positive definite, there is an alternative uniform lower bound. ˆ3 Corollary 3.20. A uniform lower bound for the positive eigenvalues of K is given by λn . The following proof covers both the theorem and corollary. Proof. We proceed one bound at a time, using energy estimates and establishing the desired inequalities. First we find the upper bound. Since θ and x are both positive, the matrix θI + X is nonsingular. We can then solve for w in the third block row of (3.12) to get 1 w = −(θI + X)−1 Z 2 u. Substituting into the first block row of (3.12), we obtain 1 1 Hu + J T v + Z 2 (θI + X)−1 Z 2 u = θu. 1 Taking the inner product with u, and noting that the matrices Z 2 and (θI +X)−1 are diagonal and therefore commute, gives the following equation for θ: uT Hu + uT J T v + uT (θI + X)−1 Zu = θ u 2 . (3.13) Solving for v in the second block row of (3.12) gives v = substitute into (3.13) to get uT Hu + 1 Ju θ 2 1 θ Ju, + uT (θI + X)−1 Zu = θ u 2 . which we (3.14) Now we bound terms in (3.14). We use (3.2) and (3.3) to bound the first and second terms in (3.14), giving λ1 u 2 1 + σ12 u θ 2 + uT (θI + X)−1 Zu ≥ θ u 2 . Since the matrix (θI + X)−1 Z is diagonal, we may bound the remaining term on the left with the maximum of the diagonal elements, leaving 1 zi λ1 + σ12 + max i θ + xi θ u 2 ≥ θ u 2. This maximum term can be bounded as follows zi zmax max ≤ , i θ + xi θ 38 3.3. Unreduced 3-by-3 form where zmax indicates the maximum value of zi at the current iterate. This bound becomes tight as the iterations proceed, since generally at least one xi → 0. On the other hand, if this does not occur, then all zi = 0 at the limit and therefore zmax = 0. We have 1 zmax λ1 + σ12 + θ θ u 2 ≥ θ u 2. Multiplying by θ and rearranging gives θ2 − λ1 θ − (σ12 + zmax ) u 2 ≤ 0. The vector u must be nonzero. If u = 0 then the second block row of (3.12) implies θv = 0, and since θ is strictly positive, this gives v = 0. The first 1 block row then yields Z 2 w = 0 and w = 0, which is a contradiction since an eigenvector must be nontrivial. We can therefore divide by u 2 and bound by the larger root of the quadratic to obtain θ≤ 1 2 λ1 + λ21 + 4(σ12 + zmax ) , (3.15) giving an upper bound. Next, we find a lower bound on θ. Taking the inner product of v with the second block row of (3.12), we have v T Ju ≡ uT J T v = θ v 2 , which we substitute into (3.13) to give uT Hu + θ v 2 + uT (θI + X)−1 Zu = θ u 2 . Using (3.2), we have λn u 2 +θ v 2 + uT (θI + X)−1 Zu ≤ θ u 2 . Bounding the last term on the left with a minimum of the diagonal elements, this becomes zi u 2 ≤ θ u 2. (3.16) λn u 2 + θ v 2 + min i θ + xi This minimum will occur for some index j, and we then multiply by θ + xj and rearrange into (θ2 + (xj − λn )θ − (λn xj + zj )) u 2 ≥θ v 2 ≥ 0. 39 3.3. Unreduced 3-by-3 form Since again u = 0, we then bound by the positive root of the quadratic, giving 1 λn − xj + (λn + xj )2 + 4zj . θ≥ 2 Taking the minimum over all j gives the bound. To find the uniform lower bound given in the corollary, we begin from (3.16) and bound the minimum term below by zero, since the terms are all positive. Then rearranging we have (θ − λn ) u 2 ≥θ v 2 ≥ 0, which since u = 0 gives the bound θ ≥ λn . We now consider bounds on the negative eigenvalues, but are only able to find an effective lower negative bound. Theorem 3.21. Assume that θI + X is nonsingular for all θ < 0 in the ˆ 3 , and thus also K3 , are ˆ 3 . The negative eigenvalues of K spectrum of K bounded in [ζ, 0), where ζ := min 1 2 λn − λ2n + 4σ12 , min {j|θ+xj <0} θj∗ and θj∗ is the smallest negative root of the cubic equation θ3 + (xj − λn )θ2 − (σ12 + zj + xj λn )θ − σ12 xj = 0. Proof. Proceeding as in the proof of Theorem 3.19, we start from (3.14) with the bounds in (3.2) and (3.3) to get λn u 2 1 + σ12 u θ 2 + uT (θI + X)−1 Zu ≤ θ u 2 . Bounding the last term of the left-hand side by the minimum, zi 1 λn + σ12 + min i θ + xi θ u 2 ≤ θ u 2. (3.17) We now need to consider two cases. In case one, θ + xi > 0 for all indices i, and in this case we can bound the minimum term from below by zero. In case zj zi two, some θ+xi < 0, and there exists an index j such that mini θ+x = θ+x . i j 40 3.3. Unreduced 3-by-3 form Beginning with case one, we start from (3.17) and bound the minimum term below by zero, giving 1 (λn + σ12 ) u 2 ≤ θ u 2 , θ which we can multiply by θ and rearrange to give (θ2 − λn θ − σ12 ) u 2 ≤ 0. Since u = 0, we can divide by u 2 and bound by the root of the quadratic to get the bound 1 λn − λ2n + 4σ12 . θ≥ 2 For case two, we use the index j for the minimum in (3.17) to get zj 1 λn + σ12 + θ θ + xj u 2 ≤ θ u 2. Multiplying by θ(θ + xj ) > 0 and rearranging, we get θ3 + (xj − λn )θ2 − (σ12 + zj + xj λn )θ − σ12 xj Since u = 0, we can divide by u the cubic 2 and define θj∗ u 2 ≥ 0. to be the smallest root of θ3 + (xj − λn )θ2 − (σ12 + zj + xj λn )θ − σ12 xj = 0. Evaluating this cubic at θ = 0 gives −σ12 xj < 0, so there must be at least one positive real root because the cubic increases to infinity as θ increases. Looking at the derivative of the cubic and evaluating again at θ = 0 we have −σ12 − zj − xj λn . Since the slope of the cubic for large negative θ would be positive, this indicates that there are either two negative roots or a pair of complex conjugate roots. If no cubic offers a real negative root, this implies that case one applies. The bound is given by the minimum of these possible bounds from cases one and two. Remark. It is not possible in practice to know which indices j satisfy θ+xj < 0, but in computations, θj∗ can be computed for all indices j. Regarding the unsatisfying upper negative bound, the bounds in Theorems 3.19 and 3.21 are pessimistic, as we shall see in the discussion of the condition number next. Regularization, covered in the next chapter, will allow more definite conclusions, including a nonzero upper negative bound. 41 3.3. Unreduced 3-by-3 form 3.3.4 Condition numbers Since we have not been able to find an upper bound for the negative eigenvalues, we cannot find an estimate of the condition number. However, we can make some general statements on the conditioning of K3 . Under the ˆ 3 is nonsingular and converges to assumptions of Theorems 3.14 and 3.15, K a well-defined limit as µ → 0. Therefore, its condition number is asymptotically uniformly bounded independently of µ, which is not reflected by the bounds on the eigenvalues. In practice, we have observed that the condiˆ 3 and K3 are typically substantially better than that of K1 or tioning of K K2 . One potential explanation for this behaviour is that using K3 avoids division by elements of X, avoiding the large roundoff errors that can occur with such divisions. Some observations on the conditioning of these matrices have been previously made; see Chapter 1. 42 Chapter 4 Regularization and the properties of matrices As seen in Chapter 3, there are numerical difficulties with K1 and K2 , and while K3 has finite outer bounds and less restrictive conditions for nonsingularity, some conditions are difficult to verify before solving. Additionally, the lack of an upper negative bound on the eigenvalues means that we have no estimate for the condition number. To alleviate these difficulties, we consider a regularized version of (2.1). There are several ways to regularize: primal regularization alleviates ill-conditioning of the Hessian, while dual regularization alleviates ill-conditioning of the Jacobian. Details of several regularization approaches can be found in [20]. We focus on the two-parameter approach of [20] which applies both primal and dual regularization. For parameters ρ > 0 and δ > 0, the regularized primal QP is 1 1 cT x + xT Hx + ρ x − xk 2 2 subject to Jx + δr = b, x ≥ 0. minimize x,r 2 1 + δ r + yk 2 2 (4.1) In this formulation, xk and y k are the current iterates for x and y respectively. The dual problem corresponding to (4.1) is 1 1 1 bT y − xT Hx − δ y − y k 2 − ρ s + xk x,y,z,s 2 2 2 T subject to − Hx + J y + z − ρs = c, z ≥ 0. maximize 2 (4.2) By setting δ = ρ = 0, the original primal-dual pair (2.1) and (2.5) is recovered. Given this modified primal dual-pair, we can proceed with an interiorpoint method. The KKT conditions with a modified complementarity con- 43 Chapter 4. Regularization and the properties of matrices dition, similarly to (2.8), are c + Hx + ρs − J T y − z δ(r + y k ) − δy k = 0. ρx − ρ(s + x ) Fτ (x, y, z, r, s) = b − Jx − δr −XZe + τ e An interior point-method for these problems is proposed in [20] that converges under standard conditions with either fixed or decreasing regularization parameters. The 3-by-3 system to solve at each iteration, after eliminating the variables s and r and substituting the current iterates for each variable, is rd ∆x H + ρI −J T −I −J −δI 0 ∆y = rp . ∆z rc −Z 0 −X The vectors on the right hand side are defined, depending on the context, as in (2.9) or (2.10). The matrix of this system is H + ρI −J T −I −δI 0 . K3,reg = −J −Z 0 −X Using Gaussian elimination, we can eliminate ∆z, resulting in the system H + X −1 Z + ρI −J T −J −δI ∆x ∆y = rd − X −1 rc . rp Then ∆z can be recovered by computing ∆z = −X −1 (rc + Z∆x). The matrix from this system is K2,reg = H + X −1 Z + ρI −J T −J −δI . One more step of Gaussian elimination can be performed to eliminate ∆x, resulting in the system J(H + X −1 Z + ρI)−1 J T + δI ∆y = −J(H + X −1 Z + ρI)−1 (rd − X −1 rc ) − rp . The remaining variables can be recovered via ∆z = −X −1 (rc + Z∆x) and ∆x = (H + X −1 Z + ρI)−1 (J T ∆y + rd − X −1 rc ). The matrix from this system is K1,reg = J(H + X −1 Z + ρI)−1 J T + δI . 44 4.1. Normal equations 4.1 Normal equations We first consider the matrix K1,reg , the regularized analogue to K1 . 4.1.1 Nonsingularity First we observe the conditions for nonsingularity of K1,reg , both during the iterations and in the limit. Theorem 4.1. For δ > 0, the matrix K1,reg is unconditionally positive definite during the iterations. Proof. Since x is strictly positive during the iterations, H + X −1 Z + ρI is positive definite, and J(H + X −1 Z + ρI)−1 J T is positive semidefinite. Thus for δ > 0, K1,reg is positive definite. Theorem 4.2. For δ, ρ > 0, the matrix K1,reg is positive definite at the limit of the iteration if and only if x > 0. Proof. If any element of x = 0, then the matrix X is singular and K1,reg is as well. Thus x > 0 is necessary. Assume x > 0. Then by complementarity, z = 0, and thus X −1 Z = 0. Then H +X −1 Z +ρI = H +ρI is positive definite, and thus K1,reg is positive definite. Comparing to the unregularized case, we note that during the iterations, J is no longer required to be full rank, and in the limit the requirements of H positive definite and J full rank are not needed. 4.1.2 Inertia Given the conditions for nonsingularity as listed in the previous section, the inertia of K1,reg is always (m, 0, 0). Both K1,reg and K1 are thus strictly positive definite, but under weaker assumptions for K1,reg . 4.1.3 Eigenvalue bounds Bounds on the eigenvalues of K1,reg can be easily found. Theorem 4.3. The eigenvalues of K1,reg are bounded in the interval 2 σm σ12 + δ, +δ . λmax (H + X −1 Z + ρI) λmin (H + X −1 Z + ρI) 45 4.1. Normal equations Proof. The eigenvalue equation for K1,reg is J(H + X −1 Z + ρI)−1 J T + δI v = θv, and multiplying by v T gives v T J(H + X −1 Z + ρI)−1 J T + δI v = θ v 2 . (4.3) First to find the upper bound, we can bound by the largest eigenvalue of (H + X −1 Z + ρI)−1 as follows λmax ((H + X −1 Z + ρI)−1 ) J T v 2 +δ v 2 ≥ θ v 2. Now using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have σ12 + δI λmin (H + X −1 Z + ρI) v 2 ≥ θ v 2. If v = 0, then it is trivial and thus not an eigenvector. We may then divide by v 2 to get the upper bound. To find the lower bound, we start from (4.3). Bounding by the smallest eigenvalue of (H + X −1 Z + ρI)−1 gives λmin ((H + X −1 Z + ρI)−1 ) J T v 2 +δ v 2 ≤ θ v 2. Using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 2 σm + δ v 2 ≤ θ v 2, λmax (H + X −1 Z + ρI) and dividing by v 4.1.4 2 gives the bound. Condition numbers We use the eigenvalue bounds from the previous section to find a bound on the condition number of K1,reg Theorem 4.4. The condition number of K1,reg is bounded by κ(K1,reg ) ≤ σ12 + δλmin (H + X −1 Z + ρI) κ(H + X −1 Z + ρI). 2 + δλ −1 Z + ρI) σm max (H + X 46 4.2. Saddle-point form Proof. Using the eigenvalue bounds and the definition of condition number, we find κ(K1,reg ) ≤ σ12 λmin (H+X −1 Z+ρI) +δ 2 σm λmax (H+X −1 Z+ρI) +δ , λmax (H + X −1 Z + ρI) σ12 + δλmin (H + X −1 Z + ρI) , 2 + δλ −1 Z + ρI)) λmin (H + X −1 Z + ρI) (σm max (H + X σ 2 + δλmin (H + X −1 Z + ρI) κ(H + X −1 Z + ρI). = 21 σm + δλmax (H + X −1 Z + ρI) = From this condition number bound, it it easy to see that if both ρ and δ are positive, the condition number is strictly smaller than the unregularized matrix. In the case that the extremal eigenvalues of H + X −1 Z can be approximated by (3.6), the eigenvalues of K1,reg are asymptotically bounded by δ θ 1/ρ. Then we can find the following asymptotic condition number κ(K1,reg ) 1 . ρδ (4.4) √ In practice, ρ and δ are allowed to take values as small as εmach . In this case, there is a definite disadvantage to using the normal equations formulation since the condition number likely exceeds the inverse of machine precision early. Indeed, the implementation of [20] initializes ρ = δ = 1 and divides both parameters by 10 at each iteration. In double precision, after just 8 iterations, the smallest allowed value of 10−8 is reached but the procedure typically has not yet converged. 4.2 Saddle-point form We now consider spectral properties of K2,reg . 4.2.1 Nonsingularity Using existing analysis for saddle-point matrices, we find conditions for nonsingularity of K2,reg . Theorem 4.5. For δ > 0, the matrix K2,reg is unconditionally nonsingular during the iterations. 47 4.2. Saddle-point form Proof. Since x is strictly positive during the iterations, H + X −1 Z + ρI is positive definite. Thus for δ > 0, nonsingularity follows directly from [4, Theorem 3.1]. Theorem 4.6. For δ, ρ > 0, the matrix K2,reg is nonsingular at the limit of the iteration if and only if x > 0. Proof. If any element of x = 0, then the matrix X is singular and K2,reg is as well. Thus x > 0 is necessary. Assume x > 0. Then by complementarity, z = 0, and thus X −1 Z = 0. Then H +X −1 Z +ρI = H +ρI is positive definite, and again, nonsingularity follows directly from [4, Theorem 3.1]. As in the case of the normal equations, nonsingularity requires fewer assumptions than in the unregularized case. 4.2.2 Inertia Assuming the conditions for nonsingularity given previously, the inertia of K2,reg is (n, m, 0). This result can also be found in [20]. 4.2.3 Eigenvalue bounds Eigenvalue bounds for K2,reg are given in [20, Theorem 5.1] following results of [46] and [47]. Theorem 4.7 ([20]). The eigenvalues of K2,reg are bounded by the following: θ≥ 1 λmin (H + X −1 Z + ρI) − δ 2 − θ≤ (λmin (H + X −1 Z + ρI) + δ)2 + 4σ12 , 1 λmax (H + X −1 Z + ρI) − δ 2 − 2 , (λmax (H + X −1 Z + ρI) + δ)2 + 4σm 48 4.2. Saddle-point form for negative eigenvalues θ, and θ ≥ λmin (H + X −1 Z + ρI), θ≤ 1 λmax (H + X −1 Z + ρI) − δ 2 + (λmax (H + X −1 Z + ρI) + δ)2 + 4σ12 , for positive eigenvalues. Additionally, −δ is an eigenvalue of K2,reg if and only if J is rank deficient. Remark. Using a Taylor expansion on the upper negative bound gives θ≤ 1 λmax (H + X −1 Z + ρI) − δ 2 − ≈ 2 , (λmax (H + X −1 Z + ρI) + δ)2 + 4σm 1 λmax (H + X −1 Z + ρI) − δ 2 − (λmax (H + X −1 Z + ρI) + δ) 1 + = −δ − 2 2σm (λmax (H + X −1 Z + ρI) + δ)2 , 2 2σm , (λmax (H + X −1 Z + ρI) + δ) ≤ −δ, thus all negative eigenvalues are bounded above by −δ. The upper negative bound of approximately −δ is the only place in which the bounds of the regularized version differ substantially from the unregularized. This will play a role in the conditioning of the matrix, seen next. 4.2.4 Condition numbers Using the bounds from the previous section, we can find an asymptotic bound on the condition number. Theorem 4.8. The condition number of K2,reg is bounded asymptotically by 1 κ(K2,reg ) . µ min(ρ, δ) 49 4.3. Unreduced 3-by-3 form Proof. Using the situation where the extremal eigenvalues of H + X −1 Z are approximated by (3.6), we obtain the asymptotic estimates ρ θ λmax (H + X −1 Z + ρI) ≈ 1/µ for the positive eigenvalues, and −1 θ −δ for the negative eigenvalues. Using these asymptotic bounds on the eigenvalues gives the condition number. As in the unregularized case, the lower negative bound is finite, and thus the remaining three bounds are responsible for the condition number. The limits of machine precision, given the common bounds on δ and ρ, are √ not achieved until µ reaches εmach , which typically occurs in the last few iterations. 4.3 Unreduced 3-by-3 form We now turn to K3,reg , providing new results on the spectral properties. 4.3.1 Nonsingularity We start by stating necessary and sufficient conditions for nonsingularity during the iterations. Lemma 4.9. For δ > 0, the matrix K3,reg given is nonsingular throughout the interior-point iterations. Proof. Looking at the system H + ρI J T −I u 0 J −δI 0 v = 0 , −Z 0 −X w 0 (4.5) we attempt to find a nontrivial solution, that is, a nonzero element (u, v, w) in the nullspace of K3,reg . From the third block row, we have w = −X −1 Zu. Taking the inner product of the first block row with u and substituting for w yields uT (H + ρI + X −1 Z)u + (Ju)T v = 0, 50 4.3. Unreduced 3-by-3 form Using using v = 1δ Ju from the second block row, this simplifies to uT 1 (H + ρI + X −1 Z) + J T J u = 0. δ Since X and Z are diagonal and positive definite, and the remaining matrices are positive semidefinite, the matrix associated with the above equation is positive definite for any ρ ≥ 0. Therefore, the only solution is u = 0, which implies v = 0 and w = 0. Lemma 4.10. For δ = 0, the matrix K3,reg is nonsingular throughout the interior-point iterations if and only if J has full rank. Proof. Assume that (u, v, w) lies in the nullspace of K3,reg . The only difference with the proof of Lemma 4.9 is that Ju = 0. Following the same steps, we obtain uT H + ρI + X −1 Z u = 0. Again, the matrix is positive definite, giving u = 0 as the only solution, and this implies that w = 0. The first block row leaves us with J T v = 0, which implies v = 0 if J has full rank. Therefore the full rank condition for J is a sufficient condition for nonsingularity of K3,reg . If J does not have full row rank, then (u, v, w) with v ∈ Null(J T ), v = 0, u = 0, w = 0 is a nontrivial nullspace vector. Therefore the full-rank condition on J is both necessary and sufficient. We collect Lemmas 4.9 and 4.10 into a unified theorem. Theorem 4.11. Let ρ ≥ 0. The matrix K3,reg is nonsingular throughout the interior-point iterations if and only if either δ > 0, or δ = 0 and J has full rank. We now consider what happens to K3,reg in the limit of the interior-point iteration. If (x, y, z) is not strictly complementary, there is a zero row in the third block row of (4.5) and K3,reg is singular. Thus strict complementarity is necessary for nonsingularity in each case. The proof of each lemma in this section attempts to find a nontrivial nullspace element of K3,reg . Lemma 4.12. For ρ > 0 and δ > 0, K3,reg is nonsingular at the limit of the interior-point iteration if and only if (x, y, z) is strictly complementary. Proof. Since zI = 0 at the solution, the third block row of (4.5) and strict complementarity yield uA = 0 and wI = 0, and therefore, uT w = 0. We 51 4.3. Unreduced 3-by-3 form take the inner product of the first block row with u, substitute v = from the second block row and use uT w = 0 to get uT 1 δ Ju 1 H + ρI + J T J u = 0. δ Since ρ > 0, the coefficient matrix is positive definite. Therefore, u = 0. It follows that v = 0, and the first block row leaves w = 0. Therefore K3,reg is nonsingular, and strict complementarity is sufficient. Lemma 4.13. For ρ > 0 and δ = 0, K3,reg is nonsingular at the limit of the interior-point iteration if and only if the solution (x, y, z) is strictly complementary and the LICQ is satisfied. Proof. The only difference with the proof of Lemma 4.12 is that Ju = 0. Following the same steps, we obtain uT (H+ρI)u = 0. Since H+ρI is positive definite, the only solution is u = 0. Examining the remaining equations, we have v J T −IA = 0, wA which has only the trivial solution (v, wA ) = (0, 0) if and only if the LICQ holds. Thus these two conditions together are sufficient, and the LICQ is necessary, completing the proof. Lemma 4.14. For ρ = 0 and δ > 0, K3,reg is nonsingular at the limit of the interior-point iteration if and only if Null(H) ∩ Null(J) ∩ Null(Z) = {0} and the solution (x, y, z) is strictly complementary. Proof. Following again the same steps as in the proofs of the previous two lemmas, taking an inner product with u, we obtain this time uT 1 H + J T J u = 0. δ The coefficient matrix above is positive semidefinite, and thus we must have u ∈ Null(H) ∩ Null(J). Since uA = 0, we also have u ∈ Null(Z). Thus u = 0 if Null(H) ∩ Null(J) ∩ Null(Z) = {0}, and the second block row gives v = 0. The first block row leaves w = 0, so the only solution is the trivial solution. Thus Null(H)∩Null(J)∩Null(Z) = {0} together with strict complementarity are sufficient for nonsingularity of K3,reg . Assume now that Null(H)∩Null(J)∩Null(Z) = {0}, then for any nonzero u ∈ Null(H)∩Null(J)∩Null(Z), we have a nontrivial nullspace vector of the form (u, 0, 0). Therefore the condition is both necessary and sufficient. 52 4.3. Unreduced 3-by-3 form Lemmas 4.12, 4.13, and 4.14 are all summarized in the following theorem. Also included in the theorem is the unregularized case, δ = ρ = 0, which is covered in Theorem 3.15. Theorem 4.15. Necessary and sufficient conditions for the matrix K3,reg to be nonsingular at the limit of the interior-point iteration are that the solution (x, y, z) be strictly complementary, Null(H) ∩ Null(J) ∩ Null(Z) = {0} if ρ = 0, and the LICQ be satisfied if δ = 0. Table 4.1 summarizes all conditions both during the iterations and at the limit covered here and in Chapter 3. We note that regularization with positive ρ and δ gives nonsingularity with no further requirements during the iterations, and that it removes all but the requirement of strict complementarity for nonsingularity in the limit. Table 4.1: A summary of the necessary and sufficient conditions on nonsingularity throughout the interior-point iterations and at the limit. δ>0 ρ>0 IP iteration: unconditional nonsingularity (Lemma 4.9) at limit: (x, y, z) is strictly complementary (Lemma 4.12) δ=0 IP iteration: J has full rank (Lemma 4.10) at limit: (x, y, z) is strictly complementary, and the LICQ (Lemma 4.13) 4.3.2 ρ=0 IP iteration: unconditional nonsingularity (Lemma 4.9) at limit: Null(H) ∩ Null(J) ∩ Null(Z) = {0}, (x, y, z) is strictly complementary (Lemma 4.14) IP iteration: J has full rank (Theorem 3.14 and Lemma 4.10) at limit: Null(H) ∩ Null(J) ∩ Null(Z) = {0}, (x, y, z) is strictly complementary, and the LICQ (Theorem 3.15) Inertia We again require a symmetric matrix to ensure real eigenvalues. The matrix K3,reg is symmetrizable with the diagonal matrix D used earlier in (3.10), ˆ 3,reg . This allows us to consider bounds generating the symmetric matrix K 53 4.3. Unreduced 3-by-3 form in real arithmetic for the latter, which will then also apply to the matrix ˆ 3,reg is given by K3,reg . The matrix K 1 H + ρI J T −Z 2 −δI 0 J 1 0 −X −Z 2 ˆ 3,reg , showing that it is the same as We find results for the inertia of K in the unregularized case both during the iterations and in the limit. Theorem 4.16. For ρ ≥ 0, δ > 0, assume H + ρI is positive definite. Then ˆ 3,reg during the iterations is (n, n + m, 0). the inertia of K Proof. Defining the matrix M as follows I 0 0 M = A I 0 , B 0 I 1 ˆ 3,reg with A := −J(H + ρI)−1 and B := Z 2 (H + ρI)−1 , we can decompose K as H + ρI 0 0 ˆ 3,reg M = 0 U 0 MT K (4.6) 0 0 W 1 1 where U := −δI − J(H + ρI)−1 J T and W := −X − Z 2 (H + ρI)−1 Z 2 . Since H + ρI is positive definite and U and W are negative definite, the inertia of ˆ 3,reg is (n, n + m, 0). K Theorem 4.17. For ρ ≥ 0, δ > 0, assume H + ρI is positive definite and (x, y, z) is a solution of (4.1) and (4.2) where strict complementarity is ˆ 3,reg is (n, n + m, 0). satisfied. Then the inertia of K Proof. Following the proof of Theorem 4.16, the relation (4.6) holds. By ˆ 3,reg is nonsingular, and therefore since M is nonsingular, Lemma 4.12, K the block diagonal matrix in (4.6) is nonsingular as well. Then H + ρI is ˆ 3,reg positive definite and U and W are negative definite, and the inertia of K is (n, n + m, 0). Finally, we can also consider the case where δ > 0, but require only nonnegativity of ρ. ˆ 3,reg during the Theorem 4.18. For δ > 0 and ρ ≥ 0, the inertia of K iterations is (n, n + m, 0). 54 4.3. Unreduced 3-by-3 form ˆ 3,reg as Proof. We can decompose K I − 1δ J T I 1 X −1 Z 2 A −δI I I − 1δ J 1 −X X −1 Z 2 I , I where A := H + ρI + 1δ J T J + X −1 Z, which is positive definite. The result follows. We do not include a similar result for the limit; our goal is to present the regularization with strictly positive parameters as ideal. We add that the inertia results also hold in the case where δ = 0, which follows from the unregularized results. 4.3.3 Eigenvalue bounds The eigenvalue problem for H + ρI J 1 −Z 2 ˆ 3,reg is K 1 J T −Z 2 u u −δI 0 v = θ v. w w 0 −X (4.7) ˆ 3,reg . Our first result provides bounds on the positive eigenvalues of K ˆ 3,reg are bounded Theorem 4.19. The positive eigenvalues of the matrix K in [ξ, η] , where ξ = min j 1 2 λn + ρ − xj + (λn + ρ + xj )2 + 4zj and η is the largest root of the cubic equation θ3 + (δ − (λ1 + ρ))θ2 − (δ(λ1 + ρ) + σ12 + zmax )θ − zmax δ = 0. We can also find a simplified uniform lower bound which is more easily computed. Corollary 4.20. A uniform lower bound for the positive eigenvalues of ˆ 3,reg is given by K ξ0 = λn + ρ. 55 4.3. Unreduced 3-by-3 form Proof. As before, we separate the discussion into the upper bound and lower bound cases. Beginning with the upper bound, we solve for w in the third block row of 1 (4.7) to get w = −(θI + X)−1 Z 2 u, which we substitute into the first block row to obtain 1 1 (H + ρI)u + J T v + Z 2 (θI + X)−1 Z 2 u = θu. 1 Taking the inner product with u and noting that the matrices Z 2 and (θI + X)−1 are diagonal and therefore commute gives the following equation for θ: uT (H + ρI)u + uT J T v + uT (θI + X)−1 Zu = θ u 2 . (4.8) Solving for v in the second block row of (4.7) gives v = substitute into (4.8) to get uT (H + ρI)u + 1 Ju θ+δ 2 1 θ+δ Ju, which we + uT (θI + X)−1 Zu = θ u 2 . (4.9) We use (3.2) and (3.3) to bound the first and second terms in (4.9): (λ1 + ρ) u 2 + σ12 u θ+δ 2 + uT (θI + X)−1 Zu ≥ θ u 2 . Since the matrix (θI + X)−1 Z is diagonal, we bound by the maximum of the diagonal elements uT (θI + X)−1 Zu ≤ max i zi u 2. θ + xi As in the proof of Theorem 3.19, we can bound the maximum term above by zmax θ . We now have λ1 + ρ + zmax σ12 + θ+δ θ u 2 ≥ θ u 2. (4.10) Multiplying by (θ + δ)θ and rearranging gives θ3 + (δ − (λ1 + ρ)) θ2 − δ(λ1 + ρ) + σ12 + zmax θ − zmax δ u 2 ≤ 0. We must have u = 0, since if u = 0 then the second block row of (4.7) implies ˆ 3,reg is nonsingular, θ > 0 and thus θ + δ > 0, that (θ + δ)v = 0. Since K 1 implying that v = 0. Then, by the first block row, Z 2 w = 0 would imply 56 4.3. Unreduced 3-by-3 form w = 0, which must not occur since the eigenvector must be nontrivial. We can thus divide by u 2 and bound by the largest real root of θ3 + (δ − (λ1 + ρ)) θ2 − δ(λ1 + ρ) + σ12 + zmax θ − zmax δ = 0, yielding an upper bound. Note that there is exactly one positive real root, since the values of the cubic and its derivative are negative at zero. For the lower bound, taking the inner product of v with the second block row of (4.7) and rearranging, we have v T Ju ≡ uT J T v = (θ + δ) v 2 , which we substitute into (4.8) to give uT (H + ρI)u + (θ + δ) v 2 + uT (θI + X)−1 Zu = θ u 2 . 2 + uT (θI + X)−1 Zu ≤ θ u 2 . Using (3.2), we have (λn + ρ) u 2 + (θ + δ) v Bounding the last term on the left with a minimum, this becomes (λn + ρ) u 2 + (θ + δ) v 2 + min i zi u θ + xi 2 ≤ θ u 2. (4.11) This minimum will occur for some index j, and we then multiply by θ + xj and rearrange into (θ2 + (xj − λn − ρ)θ − (xj (λn + ρ) + zj )) u 2 ≥ (θ + δ) v 2 ≥ 0. Since again u = 0, we then bound by the positive root of the quadratic. Taking the minimum over all j gives the bound. Now, a uniform lower bound as given in the corollary can be found by zi taking zero as a lower bound for the mini θ+x term in (4.11), giving i (λn + ρ) u 2 + (θ + δ) v 2 ≤ θ u 2, which after rearranging is (θ − λn − ρ) u 2 ≥ (θ + δ) v 2 ≥ 0. Thus an alternative lower bound is θ ≥ λn + ρ. We begin our investigation of negative eigenvalues with an upper bound, which turns out to depend on the scaling of the problem. 57 4.3. Unreduced 3-by-3 form Lemma 4.21. Let ρ ≥ 0 and δ > 0. Suppose that xi > δ for i = 1, . . . , n. ˆ 3,reg are bounded above by −δ, and θ = −δ Then the negative eigenvalues of K is an eigenvalue if and only if J is rank deficient. Proof. First we will show the bound. Assume by contradiction that there 1 is a negative eigenvalue that satisfies θ > −δ. Upon extracting v = θ+δ Ju 1 from the second block row and using the identity wT Z 2 u = −wT (θI + X)w from the third block row, taking the inner product of the first block row with u gives uT (H + ρI)u + 1 θ+δ Ju 2 + wT (θI + X)w = θ u 2 . Since θ + δ > 0 by assumption and all xi > δ > −θ, the left-hand side of the last identity is positive. If u = 0, then both v and w are also zero, giving a trivial eigenvector and therefore a contradiction. If u = 0, θ must be positive, which contradicts our initial assumption on the sign of θ. Thus the negative eigenvalues are bounded above by −δ. Suppose J is rank deficient. Then u = 0, v ∈ Null(J T ), v = 0 and w = 0 satisfies (4.7) with θ = −δ. Suppose now that J has full rank and show that θ = −δ. By contradiction, assume that θ = −δ. From the third block row and the assumption that all xi > δ, we have 1 wT Z 2 u = wT (δI − X)w ≤ 0. Taking the inner product of the first block row of (4.7) with u and using the above inequality and Ju = 0 from the second block row, we obtain −δ u 2 1 = uT (H + ρI)u − uT Z 2 w ≥ 0. Since δ > 0, this must mean that u = 0. The third block row then gives w = 0 and we are left with J T v = 0 in the first block row. Since J has full row rank, we conclude that v = 0 and that θ = −δ cannot be an eigenvalue. Interestingly, a similar result holds in the limit. Note however that there seems to be a transition zone between the moment where some components of x drop below δ and the limit when strict complementarity applies. This “grey zone” is necessarily attained if A = ∅, and it is more difficult to characterize the relationship between θ and −δ in that zone. 58 4.3. Unreduced 3-by-3 form Lemma 4.22. Let ρ > 0 and δ > 0. Suppose that in the limit of the interior-point iterations, (x, y, z) is strictly complementary, xi > δ for all √ i ∈ I, and maxi zi is sufficiently small. Then the negative eigenvalues of ˆ 3,reg are bounded above by −δ, and θ = −δ is an eigenvalue if and only if K J is rank deficient. Proof. We will first show the bound. Assume by contradiction that there ˆ 3,reg is nonsinguexists a negative eigenvalue that satisfies θ > −δ. Since K lar, there must exist > 0 such that θ ≤ − for all negative eigenvalues. If δ ≤ , − ≤ −δ and the eigenvalues are bounded above by −δ, a contradiction. Thus we can consider only the case where < δ. Suppose then that √ maxi zi < . In the limit, we have xA = 0, xI > 0 and zI = 0. Because strict complementarity is satisfied, we must also have zA > 0. Partitioning ˆ 3,reg in (4.7) into active and inactive components the third block row of K gives 1 2 uA = θwA , −ZAA (4.12) −XII wI = θwI . (4.13) We may rewrite (4.13) as (XII + θI)wI = 0, which implies wI = 0 because xi + θ > xi − δ > 0 for all i ∈ I by assumption. Taking the inner product of both sides of (4.12) with wA gives 1 T 2 − wA ZAA uA = θ wA 2 . (4.14) Taking now the inner product of the first block row of (4.7) with u, the inner product of the second block row with v and combining, we may write θ uI 2 = uT (H + ρI)u + (θ + δ) v 2 + θ( wA 2 − uA 2 ), (4.15) where we used the decomposition u 2 = uA 2 + uI 2 and (4.14). Note that from (4.12), uA = 0 if and only if wA = 0. If both vanish, necessarily uI = 0, and then the right hand side of (4.15) is strictly positive. This implies that θ > 0, a contradiction. Suppose now that uA = 0 and wA = 0. 1 2 Rearranging (4.12) we find that wA = − 1θ ZAA uA , and using the upper 2 √ bounds maxi zi < and θ ≤ − we have wA 2 ≤ θ2 uA 2 ≤ uA 2 . Substituting into (4.15) gives θ uI 2 ≥ uT (H + ρI)u + (θ + δ) v 2 . Then the right hand side is strictly positive, and if uI = 0, θ > 0, a contradiction. If uI = 0, then uA = 0, again a contradiction. Therefore, we cannot have θ > −δ, and we conclude that θ ≤ −δ. 59 4.3. Unreduced 3-by-3 form Suppose J is rank deficient. As in Lemma 4.21, (0, v, 0) with v ∈ Null(J T ), v = 0 is an eigenvector for θ = −δ. Now assume that J has full rank, and assume by contradiction that θ = −δ. Partitioning as above gives 1 2 ZAA uA = δwA (4.16) XII wI = δwI . (4.17) Rearranging (4.17), the matrix (XII − δI) has full rank by assumption, so we have wI = 0. If u = 0 we also have wA = 0 and there only remains J T v = 0 in the first block row of (4.7), giving the trivial eigenvector and a contradiction. Thus u = 0. Taking the inner product of (4.16) with wA reveals that 1 1 T 2 ZAA uA = δ wA 2 . (4.18) w T Z 2 u = wA √ Using the Cauchy–Schwarz inequality and the bound on maxi zi , we have 1 T 2 wA ZAA uA ≤ wA uA , and using (4.18) and rearranging gives wA ≤ δ uA ≤ uA ≤ u (4.19) since < δ. Taking the inner product of the first block row with u, using (4.18) and Ju = 0 from the second block row, and rearranging, we have uT (H + ρI)u = δ( wA 2 − u 2 ), which reduces to uT (H + ρI)u ≤ 0 by (4.19). Therefore u = 0, a contradiction, and θ = −δ when J is full rank. Remark. By scaling the optimization problem prior to solving, it is always possible to arrange so that the assumption on scaling of xi and zi are always satisfied. These results hold trivially when δ = 0. Next, we derive a lower bound on the negative eigenvalues. Theorem 4.23. Assume θI + X is nonsingular for all θ throughout the ˆ 3,reg satisfy computation. Then the negative eigenvalues θ of the matrix K θ ≥ ζ, 60 4.3. Unreduced 3-by-3 form where ζ := min 1 2 λn + ρ − δ − (λn + ρ + δ)2 + 4σ12 , min j|θ+xj <0 θj∗ , and θj∗ is the smallest negative root of the cubic equation θ3 + (xj + δ − λn − ρ) θ2 + δxj − δ(λn + ρ) − xj (λn + ρ) − σ12 − zj θ − δxj (λn + ρ) + σ12 xj + zj δ = 0. Proof. We assume that θ + δ < 0. The case where θ ≥ −δ poses no difficulty here since it can be easily verified that ζ ≤ −δ. We start from (4.9) with the bounds in (3.2) and (3.3) to get (λn + ρ) u 2 + 1 σ2 u θ+δ 1 2 + uT (θI + X)−1 Zu ≤ θ u 2 . Bounding the remaining term of the previous inequality by the minimum, we obtain λn + ρ + 1 zi σ12 + min i θ + xi θ+δ u 2 ≤ θ u 2. (4.20) We consider two cases. In case one, θ + xi > 0 for all indices i, and in this case we can bound the minimum term from below by zero. In case two, zj zi some θ + xi < 0, and there exists an index j such that mini θ+x = θ+x . i j In case one, we bound the minimum term in (4.20) below by zero, giving λn + ρ + 1 σ2 θ+δ 1 u 2 ≤ θ u 2, (4.21) which we can multiply by (θ + δ) and rearrange to give θ2 + (δ − λn − ρ)θ − (δ(λn + ρ) + σ12 ) u 2 ≤ 0. We must have u = 0, since if u = 0 the second line of (3.12) implies (θ+δ)v = 1 0, implying that v = 0. The first line yields Z 2 w = 0 and thus w = 0, a contradiction. Thus we can divide by u 2 and bound by the negative root of the quadratic, giving θ≥ 1 2 λn + ρ − δ − (λn + ρ + δ)2 + 4(σ12 ) . 61 4.3. Unreduced 3-by-3 form In case 2, we use the index j for the minimum in (4.20) to get λn + ρ + zj 1 σ12 + θ+δ θ + xj u 2 ≤ θ u 2. (4.22) Multiplying by (θ + δ)(θ + xj ) and rearranging, we get θ3 + (xj + δ − λn − ρ)θ2 + (δxj − δ(λn + ρ) − xj (λn + ρ) − σ12 − zj )θ − (δxj (λn + ρ) + σ12 xj + zj δ) u We note that again u = 0, so we can divide by u smallest root of the cubic 2 2 ≥ 0. and define θj∗ to be the θ3 + (xj + δ − λn − ρ) θ2 + δxj − δ(λn + ρ) − xj (λn + ρ) − σ12 − zj θ − δxj (λn + ρ) + σ12 xj + zj δ = 0. Then the bound is given by the minimum of these possible bounds from cases one and two. We can compare the bounds of the regularized and unregularized 3-by3 systems using asymptotic techniques. For the upper positive bound, we begin with (4.10) and set = δ/θ to get λ1 + ρ + Using 1 1+ 1 zmax σ12 + θ(1 + ) θ ≈ 1 − for small u 2 ≥ θ u 2. and dividing by u 2 gives zmax 1 ≥ θ, λ1 + ρ + σ12 (1 − ) + θ θ which after multiplying by θ and rearranging becomes θ2 − (λ1 + ρ)θ − (σ12 (1 − ) + zmax ) ≤ 0. This yields the bound θ≤ 1 2 λ1 + ρ + (λ1 + ρ)2 + 4(σ12 (1 − ) + zmax ) . (4.23) By comparing with the unregularized bound given in Theorem 3.19, we can see that the regularization with ρ is approximately a shift of this bound 62 4.3. Unreduced 3-by-3 form away from zero by ρ, and that the regularization with δ gives a small contraction of the bound. Comparing the lower positive bounds, we see that the regularization with ρ results in a shift away from zero by approximately ρ. In total, the positive spectral interval is additively shifted away from zero by approximately ρ, and contracted by a factor related to δ. For the upper negative bound, we note that given the scaling assumptions in Lemmas 4.21 and 4.22, these show that the effect of regularization is to buffer the eigenvalues away from zero by δ. For the lower negative bound, we consider each of the two cases. For case one we begin from (4.21), divide by u 2 and use the same asymptotic argument as above to get 1 λn + ρ + σ12 (1 − ) ≤ θ. θ Multiplying by θ and rearranging gives θ2 − (λn + ρ)θ − σ12 (1 − ) ≤ 0, leading to the bound θ≥ 1 2 λn + ρ − (λn + ρ)2 + 4σ12 (1 − ) . Comparing this to the bound given in Theorem 3.21, we see that ρ has no significant effect, while the effect of δ is to bring the bound closer to zero. For case two, we begin from (4.22), which we divide by u 2 and use the same asymptotic argument as above to get λn + ρ + σ12 (1 − ) + zj ≤ θ. θ + xj Multiplying by θ + xj , which is negative, and collecting we get θ3 + (xj − λn − ρ)θ2 − (σ12 (1 − ) + zj + xj (λn + ρ))θ − xj σ12 (1 − ) ≥ 0. We now separate the original equation giving the bound in Theorem 3.21 from the ρ and terms. The ρ terms are −θ2 − xj θ = −θ(θ + xj ) and the terms are σ12 θ + σ12 xj = σ12 (θ + xj ). Since θ + xj < 0, these are both negative, so the cumulative effect is shift the cubic polynomial down. This will shift the lower negative bound closer to zero. In total, there is no significant effect from ρ, while δ both shifts the interval away from zero and contracts the interval. 63 4.3. Unreduced 3-by-3 form 4.3.4 Condition numbers Unlike the situation for the unregularized 3-by-3 system, for the regularized system we can find an asymptotic bound on the condition number. Theorem 4.24. The condition number of K3,reg is bounded asymptotically by 1 if λn > 0 δ, κ(K3,reg ) 1 min(ρ,δ) , if λn = 0. Proof. We can obtain the following asymptotic estimates for the eigenvalues: λn + ρ ≤ θ ≤ η 1 for the positive eigenvalues, and 1 ζ ≤ θ ≤ −δ for the negative eigenvalues. Using these asymptotic bounds on the eigen1 values gives an asymptotic bound of min(ρ+λ , which further simplifies to n ,δ) the given bounds. This validates the claim that the unreduced system sees the best conditioning. Under the usual choices of ρ and δ and unless the conditioning of the problem is such that the constants hidden in the asymptotic bound are very large, this condition number will remain within computing limits through convergence of the iteration. Our numerical experiments, covered in the next chapter, will verify that the unreduced matrices remain numerically nonsingular throughout. 64 Chapter 5 Numerical experiments We offer examples to validate the analysis of previous sections. The computations are done in Matlab. The code is a straightforward implementation of Algorithm 2.1, with starting point computed via Algorithm 2.2. Matlab’s built-in functions are used to compute singular values, eigenvalues, and condition numbers, and we use the backslash operator to solve linear systems, which implements an LU factorization and solves the system using it. The code terminates when the difference between the primal and dual objective functions, the duality gap, is below a tolerance of 10−8 . The comparisons between different formulations are both illustrating the differences in the matrices themselves and the performance of the solver when using that formulation. Plots of eigenvalues and condition numbers are versus iteration number, and should be read left to right as progression of the interior point solver. We use solid lines for eigenvalue bounds and dots for the actual eigenvalues. Details of the problems we solve are given in Table 5.1. The problems are small so that the exact eigenvalues can be calculated, but they are representative. Table 5.1: Properties of quadratic programs solved. Problem 1 2 3 6 8 13 17 21 22 Source TOMLAB TOMLAB TOMLAB TOMLAB TOMLAB TOMLAB TOMLAB TOMLAB TOMLAB n 6 5 10 10 6 7 293 51 295 m 5 3 6 7 4 5 286 27 173 LICQ? Yes Yes Yes Yes Yes Yes Yes No No We first illustrate the eigenvalue bounds of Chapter 3. The problem 65 Chapter 5. Numerical experiments we examine is a small quadratic program from the TOMLAB optimization package [1], problem number 6. At the limit of the iterations, this problem satisfies the LICQ, strict complementarity, and the non-intersection of the nullspaces, and thus by Theorem 3.15 the 3-by-3 matrices will remain nonsingular in the limit. These favourable properties cause the inner eigenvalue bounds to be pessimistic, since our bounds do not distinguish whether the LICQ and non-intersection of the nullspaces hold. First we consider the bounds on K1 . Figure 5.1 shows the eigenvalues of K1 in log scale. The eigenvalues remain within the bounds, and the largest follow the upper bound, while the lower bound is pessimistic. The upper bound is exponentially increasing as 1/µ, and the lower bound is decaying like µ, which were the asymptotic bounds given in Chapter 3. Figure 5.1: TOMLAB problem 6: eigenvalues of K1 in log scale. 20 10 upper positive lower positive 15 10 10 10 eigenvalues 5 10 0 10 −5 10 −10 10 −15 10 −20 10 0 5 10 15 20 iteration Now for K2 , Figure 5.2 shows the positive and negative eigenvalues separately in log scale. These show similar trends as for K1 : the upper positive and lower negative bounds appear fairly tight, while the lower positive and upper negative bounds are pessimistic. The upper positive bound increases as 1/µ, while the lower positive and upper negative bounds decay like µ, again following the asymptotic bounds given in Chapter 3. For both K1 and K2 , the exponential increase of the upper positive bound, with the eigenvalues achieving this bound, is the cause of the ill-conditioning of these matrices. While the inner bounds decay towards zero, the eigenvalues themselves remain well away from zero and do not contribute to the ill-conditioning. 66 Chapter 5. Numerical experiments Figure 5.2: TOMLAB problem 6: eigenvalues of K2 in log scale. 15 1 10 10 0 10 10 10 −1 10 5 −2 10 eigenvalues eigenvalues 10 0 10 −5 upper negative lower negative −3 10 −4 10 −5 10 10 −6 10 −10 10 −7 10 upper positive lower positive −15 10 −8 0 5 10 15 iteration (a) Positive 20 25 10 0 5 10 15 20 25 iteration (b) Negative Figure 5.3 shows the eigenvalues of K3 in log scale. The upper positive and lower negative bounds remain fixed with respect to µ, and these bounds are relatively tight. While the upper negative bound is zero and the lower positive bound decays like µ, these bounds do not reflect the eigenvalues which are bounded away from zero on both the positive and negative side. The conditioning of this matrix is thus better than that of K1 or K2 : it is fixed with respect to both the iteration number and µ. Figure 5.4 shows ˆ 3 , which has near identical behaviour to the same thing for the symmetric K that of its unsymmetric counterpart. Figure 5.5 shows the condition numbers of the different formulations. The 3-by-3 formulations are indeed well-conditioned throughout, while the reduced forms have exponentially increasing condition numbers. Table 5.2 gives the duality gaps during the iterations, showing the progression of the iterations. Note that the 3-by-3 formulations take fewer iterations, possibly due to their better conditioning, though this needs further investigation and remains an item for future work. Next we illustrate the eigenvalue bounds of Chapter 4. We use the same problem and regularize with ρ = δ = 1e−4. Figure 5.6 shows the eigenvalues of K1,reg in log scale. We see that the upper and lower bounds are now fixed with respect to µ after the initial iterations. 67 Chapter 5. Numerical experiments Figure 5.3: TOMLAB problem 6: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 2 3 10 10 upper negative lower negative 0 10 −2 2 10 10 eigenvalues eigenvalues −4 10 upper positive lower positive −6 10 −8 10 −10 1 10 0 10 10 −12 10 −14 10 −1 0 2 4 6 8 10 10 12 0 2 4 iteration 6 8 10 12 iteration (a) Positive (b) Negative ˆ 3 in log scale. Note that Figure 5.4: TOMLAB problem 6: eigenvalues of K the upper negative bound is zero and is not visible in this scale. 2 3 10 10 upper negative lower negative 0 10 −2 2 10 10 eigenvalues eigenvalues −4 10 upper positive lower positive −6 10 −8 10 −10 1 10 0 10 10 −12 10 −14 10 −1 0 2 4 6 iteration (a) Positive 8 10 12 10 0 2 4 6 8 10 12 iteration (b) Negative 68 Chapter 5. Numerical experiments Figure 5.5: TOMLAB problem 6: comparison of the condition numbers. 16 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 14 10 12 10 10 10 8 10 6 10 4 10 2 10 0 10 0 5 10 15 20 25 iteration Figure 5.6: TOMLAB problem 6: eigenvalues of K1,reg in log scale with ρ = δ = 1e−4. 5 10 upper positive lower positive 4 10 3 10 2 eigenvalues 10 1 10 0 10 −1 10 −2 10 −3 10 −4 10 0 5 10 15 20 iteration 69 Chapter 5. Numerical experiments Table 5.2: TOMLAB problem 6: progression of the interior-point solver. k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 K3 solver 5.031373e+04 1.490725e+04 5.199114e+04 2.079359e+03 1.300615e+02 6.930406e+00 3.483175e−01 1.741601e−02 8.708002e−04 4.354001e−05 2.177003e−06 1.088520e−07 5.442416e−09 Duality gap ˆ 3 solver K K2 solver 5.031373e+04 5.031373e+04 1.490725e+04 1.490725e+04 5.199114e+04 2.783944e+04 2.079359e+03 3.336749e+04 1.300615e+02 7.693309e+03 6.930406e+00 1.623930e+02 3.483175e−01 −1.709920e+02 1.741601e−02 −4.382071e+00 8.708002e−04 1.536128e+00 4.354001e−05 6.165176e−01 2.177003e−06 1.815162e−01 1.088520e−07 3.393544e−02 5.442416e−09 6.215981e−03 1.618046e−03 2.389091e−04 9.084266e−05 2.157252e−05 6.648293e−06 1.296226e−06 2.493653e−07 4.124377e−08 5.205948e−09 K1 solver 5.031373e+04 2.671911e+04 1.239206e+04 −6.535199e+03 −3.759075e+02 −3.068804e+01 −7.616401e−01 −1.220000e−01 1.252850e−02 1.859958e−03 6.841660e−04 1.158979e−04 2.299556e−05 3.688372e−06 5.986658e−07 8.586721e−08 1.209264e−08 1.524313e−09 70 Chapter 5. Numerical experiments Figure 5.7 shows the eigenvalues of K2,reg in log scale. While the inner bounds are now well away from zero, the upper positive bound is still increasing exponentially with 1/µ, and the eigenvalues have not substantially changed. Figure 5.7: TOMLAB problem 6: eigenvalues of K2,reg in log scale with ρ = δ = 1e−4. 1 10 0 10 −1 eigenvalues 10 upper negative lower negative −2 10 −3 10 −4 10 −5 10 0 5 10 15 20 25 iteration Figure 5.8 shows the eigenvalues of K3,reg in log scale, and Figure 5.9 ˆ 3,reg . The lower positive bound is now well shows the same thing for K away from zero, as given in Theorem 4.19, and there is an upper negative bound at −δ, as given in Lemma 4.21. While these bounds are helpful for analytical purposes, the eigenvalues themselves have not substantially changed from the unregularized case due to the favourable properties of this problem. Table 5.3 gives the duality gaps during the iterations, showing that the 3-by-3 forms take fewer iterations but with little difference to the unregularized case. Figure 5.10 shows the effect on the condition numbers of varying the regularization parameters, again using the same problem. The 3-by-3 formulations are unchanged over different choices of regularization parameters, and referring back to Figure 5.5, the condition numbers are still unchanged. The properties of this problem do not require regularization for the 3-by-3 formulations and accordingly, regularization has little effect on the matrix. Referring back to Theorem 4.24, the upper bound on the condition numbers is O(1/ min(ρ, δ)) when λn = 0, as it is here, but in this case this 71 Chapter 5. Numerical experiments Figure 5.8: TOMLAB problem 6: eigenvalues of K3,reg in log scale with ρ = δ = 1e−4. 2 3 10 10 2 10 1 10 1 10 0 eigenvalues eigenvalues 10 −1 10 upper positive lower positive 0 10 −1 10 −2 10 −2 10 upper negative lower negative −3 10 −3 10 −4 10 −4 0 2 4 6 8 10 10 12 0 2 4 iteration 6 8 10 12 iteration (a) Positive (b) Negative ˆ 3,reg in log scale with Figure 5.9: TOMLAB problem 6: eigenvalues of K ρ = δ = 1e−4. 2 3 10 10 2 10 1 10 1 10 0 eigenvalues eigenvalues 10 −1 10 upper positive lower positive 0 10 −1 10 −2 10 −2 10 upper negative lower negative −3 10 −3 10 −4 10 −4 0 2 4 6 iteration (a) Positive 8 10 12 10 0 2 4 6 8 10 12 iteration (b) Negative 72 Chapter 5. Numerical experiments Table 5.3: TOMLAB problem 6: progression of the interior-point solver regularized with ρ = δ = 1e−4. k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 K3 solver 5.031373e+04 1.492640e+04 5.159215e+04 2.051814e+03 1.306938e+02 6.966969e+00 3.505399e−01 1.755095e−02 8.787479e−04 4.399788e−05 2.202938e−06 1.102999e−07 5.522452e−09 Duality gap ˆ K3 solver K2 solver 5.031373e+04 5.031373e+04 1.492640e+04 1.492640e+04 5.159215e+04 2.746570e+04 2.051814e+03 3.295581e+04 1.306938e+02 7.512121e+03 6.966969e+00 1.848690e+02 3.505399e−01 −1.716993e+02 1.755095e−02 −4.548437e+00 8.787479e−04 1.526718e+00 4.399788e−05 5.909084e−01 2.202938e−06 1.736246e−01 1.102999e−07 3.174431e−02 5.522452e−09 5.825992e−03 1.521274e−03 2.248358e−04 8.643147e−05 2.060630e−05 6.363942e−06 1.241999e−06 2.390589e−07 3.948298e−08 4.994945e−09 K1 solver 5.031373e+04 2.673851e+04 1.239648e+04 −6.519909e+03 −3.701845e+02 −2.742238e+01 −5.221952e−01 −2.666012e−02 1.887891e−02 3.894197e−03 8.123547e−04 1.487522e−04 2.498628e−05 4.084137e−06 6.214614e−07 8.887218e−08 1.224180e−08 1.531589e−09 73 Chapter 5. Numerical experiments bound is an overestimate since the condition numbers are fixed with respect to the regularization parameters. The 2-by-2 saddle-point problem is also unchanged over the different regularization parameters, but for a different reason: the regularization cannot fix the conditioning of this formulation. Looking at Theorems 3.13 and 4.8, we see that the condition number for the unregularized matrix is O(µ−2 ), while for the regularized matrix it is O(1/(µ min(ρ, δ)). However, the regularized case is as poor as the unregularized, suggesting that at least the unregularized bound is an overestimate. The 1-by-1 is interesting; the condition number increases to a certain point, then levels off. The condition number is O(1/(ρδ)), given in (4.4), so using fixed regularization parameters has this effect. The approximate point of levelling off appears to be closer to O(1/ min(ρ, δ)), suggesting again an overestimate of the asymptotic bound. Common practice is to decrease the regularization parameters at each iteration, which would lead to the condition number of the 1-by-1 form continuing to increase instead of levelling off. Overall, the condition numbers of the 3-by-3 formulations are small and fixed both as the iteration progresses and for different choices of the parameters, while the condition numbers of the 2-by-2 and 1-by-1 are exponentially increasing and sensitive to choice of parameters respectively. We now show an example which does not satisfy the LICQ, but which still does have strict complementarity. Thus the requirements of Theorem 3.15 fail, but the requirements of Lemma 4.12 are satisfied and the 3-by-3 matrices will be nonsingular when regularized. We use problem 21 from the TOMLAB package, another small quadratic program. The inner eigenvalue bounds are more representative of the actual eigenvalues for this problem because the LICQ is not satisfied. Figure 5.11 shows the eigenvalues of K1 in log scale. The eigenvalues follow the trends of both bounds, the upper bound increasing as 1/µ and the lower bound decaying as µ, though neither bound is quite tight. Contrast this to the case for problem 6, where the eigenvalues only followed the upper bound. Figure 5.12 shows the eigenvalues of K2 in log scale. The upper positive bound increases as 1/µ and is tight, and the lower positive and upper negative bounds decay as µ, with the eigenvalues following these trends. This is in contrast to the case for problem 6, where the eigenvalues were well away from zero on both the positive and negative sides. Figure 5.13 shows the eigenvalues of K3 in log scale. The upper positive and lower negative bounds are fixed and tight, while the lower positive bound decays as µ and the upper negative bound is zero. We see for this problem that the eigenvalues are also tending towards zero, in contrast to problem 74 Chapter 5. Numerical experiments Figure 5.10: TOMLAB problem 6: comparing the condition numbers over different regularization parameters. 14 14 10 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 12 10 10 10 10 10 10 8 8 10 10 6 6 10 10 4 4 10 10 2 2 10 10 0 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 12 0 0 5 10 15 20 25 10 0 5 iteration 15 20 25 20 25 iteration (a) ρ = δ = 1e−2 (b) ρ = δ = 1e−4 14 14 10 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 12 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 12 10 10 10 10 10 8 8 10 10 6 6 10 10 4 4 10 10 2 2 10 10 0 10 10 0 0 5 10 15 iteration (c) ρ = δ = 1e−6 20 25 10 0 5 10 15 iteration (d) ρ = δ = 1e−8 75 Chapter 5. Numerical experiments Figure 5.11: TOMLAB problem 21: eigenvalues of K1 in log scale. 20 10 upper positive lower positive 15 10 10 eigenvalues 10 5 10 0 10 −5 10 −10 10 −15 10 0 5 10 15 20 iteration Figure 5.12: TOMLAB problem 21: eigenvalues of K2 in log scale. 15 2 10 10 10 0 10 10 5 −2 10 eigenvalues eigenvalues 10 0 10 −5 10 −10 −4 10 −6 10 −8 10 10 −15 upper negative lower negative −10 10 10 upper positive lower positive −20 10 −12 0 5 10 iteration (a) Positive 15 10 0 5 10 15 iteration (b) Negative 76 Chapter 5. Numerical experiments 6. The cumulative effect is that K3 has better conditioning than K1 or K2 which have poor conditioning due to both the large and small eigenvalues, though all matrices are numerically singular. Figure 5.14 shows the same ˆ 3. thing for K Figure 5.13: TOMLAB problem 21: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 2 4 10 10 0 10 2 10 −2 10 0 10 −4 −2 eigenvalues eigenvalues 10 −6 10 −8 10 −10 10 −4 10 −6 10 10 upper positive lower positive −12 10 −10 10 −14 10 −16 10 upper negative lower negative −8 10 −12 0 2 4 6 8 iteration (a) Positive 10 12 14 10 0 2 4 6 8 10 12 14 iteration (b) Negative Figure 5.15 shows the condition numbers of the different formulations, illustrating that all formulations are singular in the limit, but that the 3-by-3 formulations are better conditioned. Note that although the condition numbers of the 1-by-1 system and both 3-by-3 systems are very similar through the first 15 iterations, the method with the 1-by-1 has not yet converged and requires several more iterations, at which point the condition number is orders of magnitude higher. The small fluctuation in the condition numbers for the 1-by-1 system in the final iterations is likely a numerical artefact of the condition number computation for very ill-conditioned matrices. Table 5.4 gives the duality gaps during the iterations. Note again that the 3-by-3 formulations take fewer iterations. We now regularize the same problem with ρ = δ = 1e−4. Figure 5.16 shows the eigenvalues of K1,reg in log scale. Both bounds are now fixed with respect to µ after the initial iterations, similarly to the case for problem 6. Figure 5.17 shows the eigenvalues of K2,reg in log scale. While the in77 Chapter 5. Numerical experiments ˆ 3 in log scale. Note Figure 5.14: TOMLAB problem 21: eigenvalues of K that the upper negative bound is zero and is not visible in this scale. 2 4 10 10 0 10 2 10 −2 10 0 10 −4 −2 eigenvalues eigenvalues 10 −6 10 −8 10 −10 10 −4 10 −6 10 10 upper positive lower positive −12 10 −10 10 −14 10 −16 10 upper negative lower negative −8 10 −12 0 2 4 6 8 10 12 10 14 0 2 4 6 iteration 8 10 12 14 iteration (a) Positive (b) Negative Figure 5.15: TOMLAB problem 21: comparing the condition numbers. 25 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 20 10 15 10 10 10 5 10 0 10 0 5 10 15 20 iteration 78 Chapter 5. Numerical experiments Table 5.4: TOMLAB problem 21: progression of the interior-point solver. k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 K3 solver 4.503941e+06 1.777060e+05 6.685252e+04 2.302182e+03 1.884457e+02 2.327962e+01 3.770574e+00 7.890978e−01 4.111731e−02 2.053638e−03 1.026760e−04 5.133785e−06 2.566892e−07 1.283446e−08 6.417226e−10 Duality gap ˆ K3 solver K2 solver 4.503941e+06 4.503941e+06 1.777060e+05 1.777060e+05 6.685252e+04 6.430173e+04 2.302182e+03 2.664655e+03 1.884457e+02 1.413989e+02 2.327962e+01 1.488893e+01 3.770574e+00 1.531868e+00 7.890978e−01 4.724062e−01 4.111731e−02 3.333264e−02 2.053638e−03 3.364520e−03 1.026760e−04 3.064170e−04 5.133785e−06 3.251172e−05 2.566892e−08 2.923329e−06 1.283446e−08 2.661271e−07 6.417226e−10 2.343025e−08 1.825309e−09 K1 solver 4.503941e+06 4.727740e+05 8.839079e+04 4.533996e+04 4.737956e+03 6.459162e+02 5.649864e+01 1.068655e+01 1.561047e+00 4.095004e−01 7.294324e−02 1.476766e−02 2.762271e−03 4.647201e−04 7.562990e−05 1.052725e−05 1.495218e−06 1.736741e−07 2.168710e−08 2.136175e−09 79 Chapter 5. Numerical experiments Figure 5.16: TOMLAB problem 21: eigenvalues of K1,reg in log scale with ρ = δ = 1e−4. 6 10 upper positive lower positive 4 eigenvalues 10 2 10 0 10 −2 10 −4 10 0 5 10 15 20 iteration ner bounds are now well away from zero, the upper positive bound is still increasing exponentially, and all bounds are tight. Figure 5.18 shows the eigenvalues of K3,reg in log scale, and Figure 5.19 ˆ 3,reg . The lower positive bound is now well shows the same thing for K away from zero, and there is an upper negative bound at −δ. Note that at iteration 9, the magnitude of the smallest negative eigenvalues drops below δ, and this is exactly the point when the smallest xi drops below δ. This illustrates the “grey zone” when the conditions of Lemma 4.21 no longer hold, but Lemma 4.22 does not yet apply. Note that despite these eigenvalues dropping below the bound, they are still well away from zero. Table 5.5 gives the duality gaps during the iterations. Figure 5.20 shows the effect on the condition numbers of varying the regularization parameters, again using the same problem. The conditioning of the 2-by-2 system is still poor, with only slight variations depending on the parameters. The 1-by-1 and 3-by-3 systems have good conditioning, with the condition number increasing with smaller regularization parameters. As in the unregularized case, the 1-by-1 system and both 3-by-3 systems have similar condition numbers for several iterations, but again, the 1-by-1 requires more iterations and the final condition numbers are larger. Note that the iteration counts for the 3-by-3 systems are smallest for the intermediate values of the regularization parameters, showing the tradeoff 80 Chapter 5. Numerical experiments Figure 5.17: TOMLAB problem 21: eigenvalues of K2,reg in log scale with ρ = δ = 1e−4. 14 1 10 10 12 10 0 10 10 10 upper positive lower positive 8 eigenvalues eigenvalues 10 6 10 4 10 −1 10 −2 10 2 10 upper negative lower negative 0 10 −3 10 −2 10 −4 10 −4 0 5 10 10 15 0 5 iteration 10 15 iteration (a) Positive (b) Negative Figure 5.18: TOMLAB problem 21: eigenvalues of K3,reg in log scale with ρ = δ = 1e−4. 2 3 10 10 2 10 1 10 1 10 0 0 10 eigenvalues eigenvalues 10 −1 10 −2 −1 10 −2 10 −3 10 10 upper positive lower positive −4 10 −3 10 −5 10 −4 10 upper negative lower negative −6 0 2 4 6 8 iteration (a) Positive 10 12 14 10 0 2 4 6 8 10 12 14 iteration (b) Negative 81 Chapter 5. Numerical experiments Table 5.5: TOMLAB problem 21: progression of the interior-point solver regularized with ρ = δ =1e−4. k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 K3 solver 4.503941e+06 1.774202e+05 6.696263e+04 2.287257e+03 1.856561e+02 2.244920e+01 3.377959e+00 7.869090e−01 3.175784e−02 1.187212e−03 3.992364e−05 1.107967e−06 1.547644e−08 −9.907045e−10 Duality ˆ 3 solver K 4.503941e+06 1.774202e+05 6.696263e+04 2.287257e+03 1.856561e+02 2.244920e+01 3.377959e+00 7.869090e−01 3.175784e−02 1.187212e−03 3.992364e−05 1.107967e−06 1.547644e−08 −9.907045e−10 gap K2 solver 4.503941e+06 1.774202e+05 6.445851e+04 2.643388e+03 1.401005e+02 1.433592e+01 1.350548e+00 6.385468e−01 3.318791e−02 4.503474e−03 5.549683e−04 6.684694e−05 7.343561e−06 7.320559e−07 7.355794e−08 6.148318e−09 K1 solver 4.503941e+06 4.763510e+05 8.848551e+04 4.634337e+04 4.814498e+03 6.538652e+02 5.169495e+01 9.709657e+00 1.234782e+00 3.603788e−01 6.298637e−02 1.371323e−02 2.562025e−03 4.360643e−04 7.058248e−05 1.005925e−05 1.422312e−06 1.726436e−07 2.153875e−08 2.267023e−09 82 Chapter 5. Numerical experiments ˆ 3,reg in log scale with Figure 5.19: TOMLAB problem 21: eigenvalues of K ρ = δ = 1e−4. 2 3 10 10 2 10 1 10 1 10 0 0 10 eigenvalues eigenvalues 10 −1 10 −2 −1 10 −2 10 −3 10 10 upper positive lower positive −4 10 −3 10 −5 10 −4 10 upper negative lower negative −6 0 2 4 6 8 iteration (a) Positive 10 12 14 10 0 2 4 6 8 10 12 14 iteration (b) Negative in regularization: larger parameters give more stabilization, but at the cost of changing the problem significantly. Finally, we consider the iteration counts for different problems with and without regularization. Tables 5.6 and 5.7 show the iteration counts for all the problems listed in Table 5.1 with 5 different choices of regularization parameters. The notation “—” indicates a problem which did not converge in the 300 maximum iterations, and the notation “*” indicates a problem which blew up to infinity or NaN. We observe that the iteration counts for the 3-by-3 formulations are either similar to or substantially better than the iteration counts for the reduced formulations. Note the restrictions on regularization parameters in, for example, problem 17: for ρ = δ = 1e−2 the method blows up, for ρ = δ = 1e−4 the iteration counts are large, and for ρ = δ = 1e−6 and ρ = δ = 1e−8 the iteration counts are the same as without regularization. 83 Chapter 5. Numerical experiments Figure 5.20: TOMLAB problem 21: comparing the condition numbers over different regularization parameters. 25 18 10 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 20 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 16 10 14 10 12 10 15 10 10 10 10 10 8 10 6 10 5 10 4 10 0 10 2 0 5 10 15 20 25 10 0 5 iteration 10 15 20 iteration (a) ρ = δ = 1e−2 (b) ρ = δ = 1e−4 18 25 10 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 16 10 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 20 10 14 10 12 10 15 10 10 10 10 10 8 10 6 10 5 10 4 10 2 10 0 0 5 10 15 iteration (c) ρ = δ = 1e−6 20 10 0 5 10 15 20 iteration (d) ρ = δ = 1e−8 84 Chapter 5. Numerical experiments Table 5.6: Iteration counts for different problems, part 1. A problem that did not converge is noted by “—”, and a problem that blew up is noted by “*”. Problem 1 2 3 6 8 Regularization parameters ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 Matrix formulation in solver ˆ3 K3 K K2 K1 9 9 9 9 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 10 10 9 9 11 11 11 12 10 10 10 10 10 10 9 9 10 10 9 9 11 11 26 16 28 28 — 26 11 11 24 15 11 11 26 16 11 11 26 16 12 12 21 17 17 17 23 18 12 12 21 17 12 12 21 17 12 12 21 17 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 85 Chapter 5. Numerical experiments Table 5.7: Iteration counts for different problems, part 2. A problem that did not converge is noted by “—”, and a problem that blew up is noted by “*”. Problem 13 17 21 22 Regularization parameters ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 ρ=δ=0 ρ = δ = 1e−2 ρ = δ = 1e−4 ρ = δ = 1e−6 ρ = δ = 1e−8 Matrix formulation in solver ˆ3 K K3 K2 K1 10 10 10 13 11 11 11 13 10 10 10 13 10 10 10 13 10 10 10 13 16 16 18 25 * — * * 41 43 40 40 16 16 18 25 16 16 18 25 14 14 15 19 24 24 21 25 13 13 15 19 14 14 15 19 14 14 15 19 24 24 33 32 — — — — — — — — 28 28 31 35 24 24 32 32 86 Chapter 6 Conclusions We have found analytical results for spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs. We began by considering the original, unregularized problem (2.1). We considered four formulations of the Newton system occurring from the interiorpoint method for this problem: the original 3-by-3 unsymmetric form, the symmetrized 3-by-3 form, the 2-by-2 saddle-point form, and the 1-by-1 normal equations form. For each of these formulations, we have provided results on the conditions for nonsingularity, inertia, bounds on the eigenvalues, and condition number. The matrices for all formulations are nonsingular during the iterations if J has full rank, but other properties vary. When we are exactly on the central path and the asymptotic estimates of (3.6) hold, we can find asymptotic estimates of the condition numbers in terms of µ, the duality measure. For the normal equations, we saw that the matrix is generally singular in the limit, and that the condition number through the iteration is O(µ−2 ). For the saddle-point form, the matrix is again generally singular in the limit, and the condition number is O(µ−2 ). The unsymmetric and symmetric 3-by-3 forms have the same spectral properties since the symmetrization is via a similarity transformation, and they are nonsingular in the limit assuming strict complementarity, the LICQ, and that Null(H) ∩ Null(J) ∩ Null(Z) = {0}. These conditions are much less restrictive than those for the reduced formulations. We found inertia results on these matrices both during the iterations and in the limit. We provided upper and lower bounds on the positive eigenvalues, and a lower bound on the negative eigenvalues. For this unregularized case, we were not able to find a nonzero upper bound on the negative eigenvalues, and thus were not able to find a bound on the condition number, but the results on nonsingularity indicate that the condition number is uniformly bounded independently of µ. This alone is stronger than the results on the condition number of the reduced formulations. We next considered the regularized problem (4.1) and considered the same four formulations of the Newton system mentioned above. Again, the matrices for all formulations share the same requirements for nonsin87 Chapter 6. Conclusions gularity during the iteration; when the regularization parameters ρ and δ are positive, the matrices are unconditionally nonsingular. For the normal equations, the matrix is generally singular in the limit, and the condition number is O(1/(ρδ)). If this bound on the condition number were achieved, the conditioning for the regularized 1-by-1 form would be worse than the unregularized form. For the saddle-point form, we again generally have singularity at the limit, and the condition number is O(1/(µ min(ρ, δ))). The 3-by-3 forms require only strict complementarity for nonsingularity in the limit, and the use of a partially reduced formulation will allow nonstrict complementarity. We have found bounds on the eigenvalues, including a nonzero upper bound for the negative eigenvalues, in contrast to the unregularized case where the bound was simply zero. We have also shown that the regularization parameters both shift the spectral intervals away from zero and contract the intervals. Finally, we have shown that the condition number is O(1/δ) when λn > 0 and O(1/ min(ρ, δ)) when λn = 0, which indicates that for common choices of ρ and δ the regularized matrices will be numerically nonsingular throughout the iterations. We illustrated the analytical results with numerical examples, and while the inner eigenvalue bounds were pessimistic for some problems, the outer bounds were generally tight. The 3-by-3 formulations were superior in conditioning for both unregularized and regularized matrices. In preliminary experiments, the performance of the method using the 3-by-3 formulations in terms of iteration counts was equivalent or superior to those using the 2-by-2 or 1-by-1 formulations. Overall, we have presented a thorough analysis of the spectral properties of a range of formulations of the Newton systems. We have shown that the regularized 3-by-3 systems have the most favourable spectral properties and require only strict complementarity to ensure nonsingularity. Based on this analysis, we suggest that the regularized 3-by-3 systems be considered to solve the Newton systems in these methods. This is in contrast to the current state of the field, where the normal equations or saddle-point forms seem to be used exclusively. The ill-conditioning of these systems is well-known, but it has been argued that it does not impact the performance of the method when using direct solvers; this is often known as “benign” illconditioning. However, the increase in availability of computing power has led to larger problems being solved, and these large problems will necessitate the use of iterative solvers for these linear systems, where spectral properties become crucial. The natural extension to this work is thus to consider the performance of interior-point methods with iterative linear solvers. This would include anal88 Chapter 6. Conclusions ysis of the clustering of eigenvalues for all formulations, with the goal of then finding effective preconditioners for each formulation. There is much existing work for preconditioning saddle-point systems, both for specific problems and a few general black-box preconditioners. Development of preconditioners for the 3-by-3 formulations may prove difficult, particularly in a general case where there is no specific information about the structure or properties of H and J. However, the favourable conditioning of the regularized 3-by-3 forms may allow direct application of iterative solvers with either no preconditioner or a very simple preconditioner. One key point for the use of iterative solvers is that they use only matrix-vector products, and thus the increased size of the 3-by-3 forms is not a difficulty compared to the 2-by-2 forms since it is only in diagonal matrices. Another extension to this work would be to consider spectral analysis of the matrices arising from interior-point methods for other classes of optimization problems. This could include nonconvex quadratic programs, general nonlinear programs, and semidefinite programs. While a few of our results apply directly to nonconvex quadratic programs, extension to these classes of problems will require further analysis. Semidefinite programming in particular will require careful analysis since the matrices X and Z are no longer diagonal; instead they are restricted to be positive semidefinite, but are allowed to be dense. 89 Bibliography [1] The TOMLAB optimization environment. tomlab/. Accessed July 16, 2012. http://tomopt.com/ [2] A. Altman and J. Gondzio. An efficient implementation of a higher order primal-dual interior point method for large sparse linear programs. Arch. Control Sci., 2(1-2):23–40 (1994), 1993. ISSN 0004-072X. [3] P. Armand and J. Benoist. Uniform boundedness of the inverse of a jacobian matrix arising in regularized interior-point methods. Math. Programming, pages 1–6, 2011. [4] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1–137, 2005. doi: 10.1017/ S0962492904000212. [5] M. Benzi, E. Haber, and L. Taralli. Multilevel algorithms for large-scale interior point methods. SIAM Journal on Scientific Computing, 31(6): 4152–4175, 2009. doi: 10.1137/060650799. [6] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004. ISBN 0-521-83378-7. [7] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point method for large scale nonlinear programming. SIAM Journal on Optimization, 9:877–900, 1999. [8] R. H. Byrd, J.-Ch. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, Series A, 89(1):149–185, 2000. [9] Z. Coulibaly, N. I. M. Gould, and D. Orban. Fast local convergence of interior-point methods in the absence of strict complementarity. Cahier du GERAD G-2012-xx, 2012. In preparation. 90 Bibliography [10] R. Courant. Variational methods for the solution of problems of equilibrium and vibrations. Bull. Amer. Math. Soc., 49:1–23, 1943. ISSN 0002-9904. [11] J. Czyzyk, S. Mehrotra, M. Wagner, and S. J. Wright. PCx: an interiorpoint code for linear programming. Optim. Methods Softw., 11/12(14):397–430, 1999. ISSN 1055-6788. doi: 10.1080/10556789908805757. Interior point methods. [12] G. B. Dantzig. Maximization of a linear function of variables subject to linear inequalities. In Activity Analysis of Production and Allocation, Cowles Commission Monograph No. 13, pages 339–347. John Wiley & Sons Inc., New York, N. Y., 1951. [13] A. V. Fiacco and G. P. McCormick. Extensions of SUMT for nonlinear programming: Equality constraints and extrapolation. Management Sci., 12:816–828, 1966. ISSN 0025-1909. [14] A. V. Fiacco and G. P. McCormick. The slacked unconstrained minimization technique for convex programming. SIAM J. Appl. Math., 15: 505–515, 1967. ISSN 0036-1399. [15] A. V. Fiacco and G. P. McCormick. Nonlinear programming, volume 4 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 1990. ISBN 089871-254-8. doi: 10.1137/1.9781611971316. Sequential unconstrained minimization techniques. [16] A. Forsgren. Inertia-controlling factorizations for optimization algorithms. Applied Numerical Mathematics, 43(1-2):91–107, 2002. doi: 10.1016/S0168-9274(02)00119-8. 19th Dundee Biennial Conference on Numerical Analysis (2001). [17] A. Forsgren and P. E. Gill. Primal-dual interior methods for nonconvex nonlinear programming. SIAM Journal on Optimization, 8(4):1132– 1152, 1998. doi: 10.1137/S1052623496305560. [18] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for nonlinear optimization. SIAM Rev., 44(4):525–597 (2003), 2002. ISSN 0036-1445. doi: 10.1137/S0036144502414942. [19] R. Fourer and S. Mehrotra. Solving symmetric indefinite systems in an interior-point method for linear programming. Mathematical Programming, 62(1):15–39, 1993. doi: 10.1007/BF01585158. 91 Bibliography [20] M. P. Friedlander and D. Orban. A primal-dual regularized interiorpoint method for convex quadratic programs. Mathematical Programming Computation, 4(1):71–107, 2012. doi: s12532-012-0035-2. [21] E. M. Gertz and S. J. Wright. Object-oriented software for quadratic programming. ACM Trans. Math. Software, 29(1):58–81, 2003. ISSN 0098-3500. doi: 10.1145/641876.641880. [22] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright. On projected Newton barrier methods for linear programming and an equivalence to Karmarkar’s projective method. Math. Programming, 36 (2):183–209, 1986. ISSN 0025-5610. doi: 10.1007/BF02592025. [23] P. E. Gill, W. Murray, D. B. Poncele´on, and M. A. Saunders. Preconditioners for indefinite systems arising in optimization. SIAM Journal on Matrix Analysis and Applications, 13(1):292–311, 1992. doi: 10.1137/0613022. [24] A. J. Goldman and A. W. Tucker. Theory of linear programming. In Linear inequalities and related systems, Annals of Mathematics Studies, no. 38, pages 53–97. Princeton University Press, Princeton, N.J., 1956. [25] C. C. Gonzaga. Path-following methods for linear programming. SIAM Rev., 34(2):167–224, 1992. ISSN 0036-1445. doi: 10.1137/1034048. [26] N. I. M. Gould. On practical conditions for the existence and uniqueness of solutions to the general equality quadratic-programming problem. Mathematical Programming, 32(1):90–99, 1985. [27] N. I. M. Gould. On the accurate determination of search directions for simple differentiable penalty functions. IMA J. Numer. Anal., 6(3): 357–372, 1986. ISSN 0272-4979. doi: 10.1093/imanum/6.3.357. [28] N. I. M. Gould and V. Simoncini. Spectral analysis of saddle point matrices with indefinite leading blocks. SIAM Journal on Matrix Analysis and Applications, 31(3):1152–1171, 2009. doi: 10.1137/080733413. [29] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete mathematics. Addison-Wesley Publishing Company, Reading, MA, second edition, 1994. ISBN 0-201-55802-5. A foundation for computer science. [30] I. Griva, S. G. Nash, and A. Sofer. Linear and nonlinear optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 92 Bibliography PA, second edition, 2009. ISBN 978-0-898716-61-0. doi: 10.1137/1. 9780898717730. [31] L. G. Haˇcijan. A polynomial algorithm in linear programming. Dokl. Akad. Nauk SSSR, 244(5):1093–1096, 1979. ISSN 0002-3264. English translation: Soviet Math. Dokl., 20(1):191–194, 1979. [32] S. Kapoor and P. M. Vaidya. Fast algorithms for convex quadratic programming and multicommodity flows. In Proceedings of the eighteenth annual ACM symposium on Theory of computing, STOC ’86, pages 147–159, 1986. ISBN 0-89791-193-8. [33] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984. ISSN 0209-9683. doi: 10.1007/BF02579150. [34] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm for linear programming. In Progress in mathematical programming (Pacific Grove, CA, 1987), pages 29–47. Springer, New York, 1989. [35] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class of linear complementarity problems. Math. Programming, 44 (1, (Ser. A)):1–26, 1989. ISSN 0025-5610. doi: 10.1007/BF01587074. [36] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasibleinterior-point algorithm for linear programming. Math. Programming, 61(3, Ser. A):263–280, 1993. ISSN 0025-5610. doi: 10.1007/ BF01582151. [37] J. Korzak. Eigenvalue relations and conditions of matrices arising in linear programming. Computing, 62(1):45–54, 1999. doi: 10.1007/ s006070050012. [38] L. McLinden. An analogue of Moreau’s proximation theorem, with application to the nonlinear complementarity problem. Pacific J. Math., 88(1):101–161, 1980. ISSN 0030-8730. [39] N. Megiddo. Pathways to the optimal set in linear programming. In Progress in mathematical programming (Pacific Grove, CA, 1987), pages 131–158. Springer, New York, 1989. 93 Bibliography [40] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2(4):575–601, 1992. ISSN 1052-6234. doi: 10.1137/0802028. [41] S. Mehrotra and J. Sun. An algorithm for convex quadratic programming that requires O(n3.5 L) arithmetic operations. Math. Oper. Res., 15(2):342–363, 1990. ISSN 0364-765X. doi: 10.1287/moor.15.2.342. [42] S. Mizuno, M. J. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Math. Oper. Res., 18(4):964–981, 1993. ISSN 0364-765X. doi: 10.1287/moor.18.4.964. [43] R. D. C. Monteiro and S. J. Wright. Local convergence of interior-point algorithms for degenerate monotone LCP. Comput. Optim. Appl., 3(2): 131–155, 1994. ISSN 0926-6003. doi: 10.1007/BF01300971. [44] Y. Nesterov and A. Nemirovskii. Interior-point polynomial algorithms in convex programming, volume 13 of SIAM Studies in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994. ISBN 0-89871-319-6. doi: 10.1137/1. 9781611970791. [45] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, second edition, 2006. ISBN 978-0387-30303-1; 0-387-30303-0. [46] T. Rusten and R. Winther. A preconditioned iterative method for saddlepoint problems. SIAM Journal on Matrix Analysis and Applications, 13(3):887–904, 1992. doi: 10.1137/0613054. Iterative methods in numerical linear algebra (Copper Mountain, CO, 1990). [47] D. Silvester and A. Wathen. Fast iterative solution of stabilised Stokes systems. II. Using general block preconditioners. SIAM Journal on Numerical Analysis, 31(5):1352–1367, 1994. doi: 10.1137/0731070. [48] A. W¨ achter and L. T. Biegler. On the implementation of an interiorpoint filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, Series A, 106(1):25–57, 2006. ISSN 00255610. [49] M. H. Wright. Interior methods for constrained optimization. In Acta numerica, 1992, Acta Numer., pages 341–407. Cambridge Univ. Press, Cambridge, 1992. 94 Bibliography [50] M. H. Wright. Some properties of the Hessian of the logarithmic barrier function. Mathematical Programming, 67(1-3):265–295, 1994. doi: 10. 1007/BF01582224. [51] M. H. Wright. The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull. Amer. Math. Soc. (N.S.), 42(1):39–56 (electronic), 2005. ISSN 0273-0979. doi: 10.1090/ S0273-0979-04-01040-7. [52] S. J. Wright. Primal-dual interior-point methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997. ISBN 089871-382-X. doi: 10.1137/1.9781611971453. [53] Y. Zhang. On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem. SIAM J. Optim., 4(1):208–227, 1994. ISSN 1052-6234. doi: 10.1137/0804012. 95
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Spectral properties of matrices arising from primal-dual...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic… Moulding, Erin 2012
pdf
Page Metadata
Item Metadata
Title | Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs |
Creator |
Moulding, Erin |
Publisher | University of British Columbia |
Date Issued | 2012 |
Description | The solution of convex quadratic programs using primal-dual interior-point methods has at its core the solution of a series of linear systems, which are in practice commonly reduced by block Gaussian elimination from the original unsymmetric block 3-by-3 formulation to either a block 2-by-2 saddle-point matrix or a block 1-by-1 normal equations form. The 3-by-3 formulation can also be symmetrized with a similarity transformation if a symmetric solver is to be used. We examine whether this practice of reduction is beneficial for the solver. For each of these formulations we find analytical results for the following spectral properties: the inertia, condition number, conditions for nonsingularity, and bounds on the eigenvalues. While the reduced systems become increasingly ill-conditioned throughout the iterations except in special cases, the 3-by-3 formulations remain nonsingular and well-conditioned with only mild assumptions on the problem; with regularization the assumptions are further simplified. Numerical examples are used to support the analytical results. We conclude that the 3-by-3 formulations, unsymmetric or symmetric, unregularized or regularized, have superior spectral properties that support their use in practice. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2012-08-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0073086 |
URI | http://hdl.handle.net/2429/43088 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2012-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2012_fall_moulding_erin.pdf [ 1.38MB ]
- Metadata
- JSON: 24-1.0073086.json
- JSON-LD: 24-1.0073086-ld.json
- RDF/XML (Pretty): 24-1.0073086-rdf.xml
- RDF/JSON: 24-1.0073086-rdf.json
- Turtle: 24-1.0073086-turtle.txt
- N-Triples: 24-1.0073086-rdf-ntriples.txt
- Original Record: 24-1.0073086-source.json
- Full Text
- 24-1.0073086-fulltext.txt
- Citation
- 24-1.0073086.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0073086/manifest