Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs by Erin Moulding B.Sc. (Hons), The University of Saskatchewan, 2010 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2012 c Erin Moulding 2012Abstract The solution of convex quadratic programs using primal-dual interior-point methods has at its core the solution of a series of linear systems, which are in practice commonly reduced by block Gaussian elimination from the original unsymmetric block 3-by-3 formulation to either a block 2-by-2 saddle-point matrix or a block 1-by-1 normal equations form. The 3-by-3 formulation can also be symmetrized with a similarity transformation if a symmetric solver is to be used. We examine whether this practice of reduction is bene cial for the solver. For each of these formulations we nd analytical results for the following spectral properties: the inertia, condition number, conditions for nonsingularity, and bounds on the eigenvalues. While the reduced systems become increasingly ill-conditioned throughout the iterations except in spe- cial cases, the 3-by-3 formulations remain nonsingular and well-conditioned with only mild assumptions on the problem; with regularization the as- sumptions are further simpli ed. Numerical examples are used to support the analytical results. We conclude that the 3-by-3 formulations, unsym- metric or symmetric, unregularized or regularized, have superior spectral properties that support their use in practice. iiPreface Chapters 3, 4, and 5 are based on work conducted with Chen Greif and Dominique Orban, and this material is in preparation for publication. I was responsible for much of the analysis appearing in Chapters 3 and 4, with input from my coauthors, and the numerical results in Chapter 5 are from my own code. The initial writeup of these chapters was done by me, though there have been many collaborative edits in the time since. The small section on nonstrict complementarity, Section 2.7, was originally written by Dominique Orban. iiiTable of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Convex quadratic programming . . . . . . . . . . . . . . . . . 7 2.1 De nitions and examples . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Non-standard form . . . . . . . . . . . . . . . . . . . 8 2.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Duality and the dual problem . . . . . . . . . . . . . . . . . 14 2.4 Constraint quali cations . . . . . . . . . . . . . . . . . . . . 16 2.5 Conditions for a solution . . . . . . . . . . . . . . . . . . . . 17 2.6 Interior-point methods . . . . . . . . . . . . . . . . . . . . . 19 2.7 Non-strict complementarity . . . . . . . . . . . . . . . . . . . 22 3 Properties of matrices in interior-point methods . . . . . . 25 3.1 Normal equations . . . . . . . . . . . . . . . . . . . . . . . . 27 3.1.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 27 3.1.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.1.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 29 3.1.4 Condition numbers . . . . . . . . . . . . . . . . . . . 30 ivTable of Contents 3.2 Saddle-point form . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 31 3.2.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 31 3.2.4 Condition numbers . . . . . . . . . . . . . . . . . . . 32 3.3 Unreduced 3-by-3 form . . . . . . . . . . . . . . . . . . . . . 33 3.3.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 33 3.3.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 37 3.3.4 Condition numbers . . . . . . . . . . . . . . . . . . . 42 4 Regularization and the properties of matrices . . . . . . . . 43 4.1 Normal equations . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 45 4.1.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 45 4.1.4 Condition numbers . . . . . . . . . . . . . . . . . . . 46 4.2 Saddle-point form . . . . . . . . . . . . . . . . . . . . . . . . 47 4.2.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 47 4.2.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.2.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 48 4.2.4 Condition numbers . . . . . . . . . . . . . . . . . . . 49 4.3 Unreduced 3-by-3 form . . . . . . . . . . . . . . . . . . . . . 50 4.3.1 Nonsingularity . . . . . . . . . . . . . . . . . . . . . . 50 4.3.2 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.3 Eigenvalue bounds . . . . . . . . . . . . . . . . . . . . 55 4.3.4 Condition numbers . . . . . . . . . . . . . . . . . . . 64 5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . 65 6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 vList of Tables 4.1 A summary of the necessary and su cient conditions on non- singularity throughout the interior-point iterations and at the limit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.1 Properties of quadratic programs solved. . . . . . . . . . . . . 65 5.2 TOMLAB problem 6: progression of the interior-point solver. 70 5.3 TOMLAB problem 6: progression of the interior-point solver regularized with = = 1e 4. . . . . . . . . . . . . . . . . . 73 5.4 TOMLAB problem 21: progression of the interior-point solver. 79 5.5 TOMLAB problem 21: progression of the interior-point solver regularized with = =1e 4. . . . . . . . . . . . . . . . . . . 82 5.6 Iteration counts for di erent problems, part 1. A problem that did not converge is noted by \|", and a problem that blew up is noted by \*". . . . . . . . . . . . . . . . . . . . . 85 5.7 Iteration counts for di erent problems, part 2. A problem that did not converge is noted by \|", and a problem that blew up is noted by \*". . . . . . . . . . . . . . . . . . . . . . 86 viList of Figures 2.1 Illustration of convexity. . . . . . . . . . . . . . . . . . . . . . 11 5.1 TOMLAB problem 6: eigenvalues of K1 in log scale. . . . . . 66 5.2 TOMLAB problem 6: eigenvalues of K2 in log scale. . . . . . 67 5.3 TOMLAB problem 6: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5.4 TOMLAB problem 6: eigenvalues of K^3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5.5 TOMLAB problem 6: comparison of the condition numbers. . 69 5.6 TOMLAB problem 6: eigenvalues of K1;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.7 TOMLAB problem 6: eigenvalues of K2;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.8 TOMLAB problem 6: eigenvalues of K3;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.9 TOMLAB problem 6: eigenvalues of K^3;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.10 TOMLAB problem 6: comparing the condition numbers over di erent regularization parameters. . . . . . . . . . . . . . . . 75 5.11 TOMLAB problem 21: eigenvalues of K1 in log scale. . . . . 76 5.12 TOMLAB problem 21: eigenvalues of K2 in log scale. . . . . 76 5.13 TOMLAB problem 21: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.14 TOMLAB problem 21: eigenvalues of K^3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.15 TOMLAB problem 21: comparing the condition numbers. . . 78 viiList of Figures 5.16 TOMLAB problem 21: eigenvalues of K1;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 5.17 TOMLAB problem 21: eigenvalues of K2;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.18 TOMLAB problem 21: eigenvalues of K3;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.19 TOMLAB problem 21: eigenvalues of K^3;reg in log scale with = = 1e 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.20 TOMLAB problem 21: comparing the condition numbers over di erent regularization parameters. . . . . . . . . . . . . 84 viiiList of Algorithms 2.1 Mehrotra’s algorithm for QPs . . . . . . . . . . . . . . . . . . 21 2.2 Starting point computation for Mehrotra’s algorithm . . . . . 23 ixAcknowledgements I would rst like to thank my supervisor, Chen Greif, whose guidance and collaboration on my research were invaluable. Our weekly or twice-weekly meetings often went long while discussing various subjects, research-related or otherwise. He has supported my directions and goals through my aca- demics, and has never failed to believe in me. My other collaborator on this work, Dominique Orban, provided many insights from an optimizer’s perspective. During every meeting we had, I learned something new, pushing me to deeper understanding of the material. The Scienti c Computing Lab, the Institute of Applied Mathematics, and the Mathematics Department have been my academic homes, and they house many people who have been helpful in ways too varied to count. I appreciate all the time I have been able to spend with friends, as a break from research or as help with research, sometimes simultaneously. Rob Miller, Iain Moyles, Mark Willoughby, Kai Rothauge, and Rebecca Hiller became my group of friends in a new place, and we have had great fun on our (occasional) nights o . Jill Slind has been a wonderful and close friend for many years, and she continues to tolerate me after all this time. I would like to thank everyone who taught me math over the years for inspiring me and nurturing my love of the subject. Special thanks to Ray- mond Spiteri, my undergraduate supervisor, who rst suggested summer research and graduate studies, and to Janet Christ, my high school math teacher, who took math beyond school and continues to inspire my teaching. My parents have always believed that I could do anything, and their guidance throughout the years has led here. They started my love for math as a child, and their continued support and love is greatly appreciated. Finally, my work would not have been possible without the daily support and encouragement of my husband, Kyle Van Impe. He has been a stabi- lizing force through the di cult and often frustrating process of my degree, and has allowed me to focus by picking up the slack in everything else in our life. I am always grateful for everything he does. Funding for my graduate work was provided by a NSERC CGS-M schol- arship and by UBC a liated awards. The nancial support is appreciated. xChapter 1 Introduction Optimization is the technique of nding the best solution out of a set of choices. Optimization problems occur throughout all elds, from maximiz- ing pro t in business to minimizing material stress in engineering. Some problems model situations created by humans, such as minimizing labour costs at a factory, while others solve problems occurring naturally, such as the minimization of energy in a molecule. A very simple example is to min- imize x2 + y2, which has the obvious solution x = y = 0. The set of choices may be restricted in some way, such as maximizing pro t with a xed cost of materials. The example of minimizing x2 +y2 can be extended by restricting x+ y 2, where the solution is no longer as obvious. As computing power becomes ubiquitously available, ever larger optimization problems can be solved, leading to the increasing use of optimization throughout all elds. Any optimization problem involves an objective function, a measure of how good the current choice is. The variables in the problem represent properties of the choice being made, and restrictions on the choice are given as constraints to the problem. A general optimization problem in standard form is given by minimize x f(x) subject to c(x) = 0; d(x) 0; (1.1) where x 2 Rn and c and d may be vector functions. In this standard form, x are the variables, f(x) is the objective function and c(x) = 0 and d(x) 0 are constraints. The exact solutions to only a few optimization problems can be found; for example, certain smooth unconstrained optimization problems can be solved using straightforward calculus. However, most classes of optimization problems, and especially the large problems that occur in practice, require the use of iterative methods that nd approximate solutions. Quadratic programs are a type of the general optimization problem (1.1) in which the objective function is quadratic and the constraints are linear. In standard form, the problem is minimize x f(x) = cTx+ 1 2 xTHx subject to Jx = b; x 0; 1Chapter 1. Introduction where H 2 Rn n and J 2 Rm n. Details on the setup of these problems and converting to standard form will be covered in Section 2.1. These problems arise in many applications, and also as subproblems within solution algo- rithms for more general constrained optimization problems in the form (1.1), making the e cient solution of quadratic programs an important goal. The two major and contrasting categories of algorithms for quadratic program- ming are active-set methods and interior-point methods [45]. Active-set methods attempt to decide which inequality constraints are active, or hold with equality, reducing the problem to one that has only equality constraints and is simpler to solve. Interior-point methods, by contrast, require that the inequalities hold strictly, moving towards a solution from the interior of the feasible set. The primal-dual interior-point algorithm, our focus here, uses Newton’s method for root nding to solve a set of equations, so the major compu- tational cost of each iteration is the solution of a linear system. A key question in solving quadratic programs with interior-point methods is how to solve these linear systems. The focus of this thesis will be to answer this question by examining properties of di erent formulations of these systems. The system is naturally posed as a block 3-by-3 system, where the ma- trix is unsymmetric and inde nite. The system may be symmetrized using a similarity transformation to a symmetric inde nite block 3-by-3 system. Alternatively, one step of block Gaussian elimination can be performed to achieve a symmetric inde nite block 2-by-2 system in the classic saddle- point form [4]. Finally, a second step of block Gaussian elimination can be performed to reduce further to a symmetric positive de nite block 1-by-1 system in the normal equations form. The quadratic program can also be regularized to alleviate numerical di culties that arise during the solution of the linear system, leading to a set of regularized matrices that can be manipulated in similar ways. Interior-point methods come out of the theoretical background of the older barrier methods and penalty methods, which solve (1.1) by transform- ing the problem into a sequence of unconstrained minimization problems. Fiacco and McCormick [15] traced in detail the history of these methods until 1968, a few highlights of which follow. In 1943, Courant [10] proposed studying barrier-type functions, but this idea was not followed into the cre- ation of an algorithm. The logarithmic potential method was introduced in two papers by Frisch in 1954 and 1955, using the gradients for problems of the form f(x) + P i log(di(x)) to solve problems with inequality con- straints. This method was utilized in 1961 by Parisot in his Ph.D. thesis to solve linear programming problems, and other convex programs, with 2Chapter 1. Introduction sequential unconstrained methods, later called the sequential unconstrained minimization technique (SUMT). Fiacco and McCormick extended SUMT for convex programming problems with both equality and inequality con- straints in 1966 [13], using a barrier function for the inequality constraints and a penalty function for the equality constraints, and in 1967 [14] they es- tablished several theoretical results for the same problem, including duality results. Many results of theirs and others are collected in [15]. Barrier methods declined in popularity during the 1970s for several rea- sons, one being the ill-conditioning of matrices involved in the solution [18, 49]. Several developments speci c to linear programming changed this. The solution of linear programs had been done almost exclusively by variants of the simplex method developed in 1947 and published in 1951 by Dantzig [12], which has excellent practical performance but exponential theoretical complexity [15, 49]. In 1979, Khachiyan [31] published the ellipsoid method, the rst polynomial-time algorithm for linear programs, which was based on nonlinear programming techniques [49, 51]. Unfortunately its practical performance was poor. A dramatic change to the eld occurred in 1984 when Karmarkar published a polynomial-complexity algorithm for linear programs [33], which he claimed to be consistently 50 times faster than sim- plex [49, 51]. The algorithm was an interior method using a logarithmic potential function, leading to the revival of interior-point methods for lin- ear programming. In 1986, Gill et al. [22] showed the relationship between Karmarkar’s algorithm and classical barrier methods, meaning that similar methods could be exploited for nonlinear programming as well. Research since then has tended to view linear and nonlinear programming from a uni ed perspective, in contrast to the separated view during the dominance of simplex [49, 51]. The rst polynomial interior-point algorithm for convex quadratic programming was introduced in 1986 by Kapoor and Vaidya [32] and independently by Ye and Tse [25]. The methods covered so far have all been formulated as primal methods, focusing only on the original form of the problem. Primal-dual methods were developed after Karmarker’s discovery, and while the de nitions di er, they use both the original (primal) variables and the Lagrange multipliers (dual variables), and use Newton’s method to solve a set of nonlinear equations in these variables [51]. They have many excellent theoretical properties, includ- ing better conditioning than strictly primal methods, and can be extended to many nonlinear programs [51]. One key component of this discovery was the de nition and derivation in 1980 by McLinden [38] of theoretical prop- erties of the central path for a primal-dual set of problems [52]. In 1987, Megiddo [39] provided several insights to the central path, including its ge- 3Chapter 1. Introduction ometry near the solution, which motivated Kojima, Mizuno, and Yoshise [34] in the same year to develop the rst polynomial primal-dual algorithm, a long-step path-following algorithm. Shortly thereafter, Kojima, Mizuno, and Yoshise [35] also developed a short-step path-following algorithm with better theoretical complexity. Mizuno, Todd, and Ye in 1990 [42] provided a predictor-corrector improvement on the short-step algorithm. Parallel de- velopments to this series of primal-dual methods focused on the logarithmic function, using it to generate search directions or update function values [52]. Path-following algorithms for convex quadratic programs were devel- oped by several groups, including Mehrotra and Sun in 1990 [41]. Gonzaga in 1992 [25] covered many theoretical concepts and algorithms relating to path-following algorithms for linear programming, and lists some develop- ments for these algorithms for quadratic programming and other problems. Nesterov and Nemirovskii in 1994 [44] covered interior-point algorithms for convex optimization in a general setting with many theoretical results and new proposed methods. They also commented that techniques for linear pro- gramming can generally be extended without di culty to convex quadratic programming, and that a thorough bibliography of interior-point methods at that time had numbered over a thousand papers, indicating the explosion of the subject. The previously mentioned methods all require feasibility of the initial starting point, which is di cult to satisfy. In the search for practical al- gorithms, infeasible-interior-point methods were introduced, allowing more general starting points. Important discoveries for this group of methods were global convergence by Kojima, Megiddo, and Mizuno in 1993 [36] and polynomial complexity by Zhang in 1994 [53], the latter of which is general to convex quadratic programming. Many current algorithms are based on the predictor-corrector algorithm of Mehrotra in 1992 [40]. This infeasible- interior-point primal-dual method uses a higher-order approximation for the central path. It also uses heuristic guidelines to choose a starting point and several parameters, which leads to an e cient and practical algorithm, though it has no convergence theory [52]. This algorithm is the basis for the solvers covered in this thesis. While the algorithms of modern interior-point solvers are mostly settled, the choice of matrix formulation to use in the algorithm di ers, and various linear algebra issues related to the solution of these systems arise. Many modern solvers reduce fully to the normal equations form; one prominent example is PCx for linear programming [11]. Others reduce to the saddle- point form; examples include OOQP for quadratic programming [21] and IPOPT and KNITRO for general nonlinear programming [7, 8, 48]. An- 4Chapter 1. Introduction other example is HOPDM for linear programming and convex quadratic programming, which automatically chooses either the normal equations or saddle-point form [2]. We are not aware of any existing solvers which solve the unreduced problem for any of these problems. Properties of the normal equations form are straightforward, and there is relatively little analysis of this form. One di culty with the use of the normal equations is that an interim step involves the inversion of a matrix, which may lead to ll-in [18]. The saddle-point formulation has properties that directly follow from the general properties of such matrices; see [4, 28, 46, 47] for some relevant general results and [20] for results specialized to optimization. The ill-conditioning of some reduced matrices is well-known [15, 16, 18, 49, 51], but it has been referred to, with some assumptions on solution methods, as \benign" [16, 51],\usually harmless" [18], and \highly structured" [18]. The matrices for classical barrier methods are also ill- conditioned [18, 50]. There are relatively few existing results for the unreduced 3-by-3 for- mulation. Some spectral properties for various formulations for the special case of linear programming are covered in [37]. Uniform boundedness of the inverse under several assumptions is proved in [3], intended to be used in further theoretical proofs. Saunders is cited in [16] as stating by personal communication that a similarity transformation leads to a symmetrized ver- sion of the 3-by-3 system, equivalent to the symmetrized matrix used in this thesis. Several works [16, 18] note that the matrix of this system remains well-conditioned though ill-conditioning remains when forming the right- hand side and computing the variables due to multiplication by a diagonal matrix with large elements; they also both mention a di erent symmetric formulation that is ill-conditioned. There are also other formulations of the system besides those covered in this thesis. An augmented formulation is proposed in [17] that has the bene t of being positive de nite on and near the central path, allowing the use of more specialized linear solvers. Scalings of the systems are used in [37] and [19] to alleviate ill-conditioning. A partially-eliminated formulation for the penalty problem is used in [27] and [23] based on partitioning inequality constraints by residual size, after noting that ill-conditioning is due to the varying sizes of these residuals. The conditioning of various formulations was discussed in [5] for discretized variational problems, but since they use iterative methods with large-scale problems, the ill-conditioning of the re- duced forms is problematic, and they therefore use in experiments only the original unreduced form and a partially-eliminated formulation. They also o er a scaling of the saddle-point form that is well-conditioned, but recov- 5Chapter 1. Introduction ering the variables of interest requires the reverse scaling which can amplify numerical errors. An outline of the remainder of this thesis is as follows: Chapter 2 will cover background material on quadratic programming, Chapter 3 will cover properties of the matrices for interior-point methods, Chapter 4 will cover properties of the regularized matrices, Chapter 5 will present numerical ex- periments, and Chapter 6 will conclude. 6Chapter 2 Convex quadratic programming In this chapter we introduce quadratic programming and necessary theoret- ical concepts for its analysis. This material is well known, but is provided for completeness. A thorough treatment of quadratic programming, and the source of the material of this chapter unless otherwise noted, can be found in the books [6, 30, 45]. Section 2.1 will de ne the problem and give examples, Section 2.2 will cover convexity, Section 2.3 will de ne the dual problem, Section 2.4 will de ne constraint quali cations, Section 2.5 will cover the necessary and su cient conditions for a minimum, Section 2.6 will cover interior-point methods, and Section 2.7 will cover non-strict complementar- ity. 2.1 De nitions and examples Given matrices H 2 Rn n and J 2 Rm n and vectors b 2 Rm and c 2 Rn, a quadratic program (QP) in standard form is given by minimize x2Rn cTx+ 1 2 xTHx subject to Jx = b; x 0: (2.1) The function cTx + 12x THx is called the objective function, the equations Jx = b are the equality constraints, and the inequalities x 0 are the inequality constraints or nonnegativity constraints. A common assumption is that J is full rank withm < n, meaning none of the equality constraints are redundant. H is symmetric, and is variously required to be positive de nite, positive de nite on the nullspace of J , or positive de nite on a critical cone lying in the nullspace of J . Throughout this thesis, H will be assumed to be at least positive semi-de nite and no assumption will be made on the rank of J , with other conditions on H and J speci ed as needed. If H 0, (2.1) is a linear program (LP), another important type of optimization problem. A point x is called feasible if all the constraints are satis ed, and the feasible set is the set of all feasible points, fx j Jx = b; x 0g. An 72.1. De nitions and examples inequality constraint is inactive if it is satis ed strictly, and is called active if it is satis ed with equality. The active set A(x) at a feasible point x is the set of constraints that are active for the point x, and the inactive set I(x) is similarly the set of constraints that are inactive. Henceforth if unambiguous these sets will be referred to as A and I for brevity. For an index set N f1; : : : ; ng and a vector v 2 Rn, vN is the subvector of v indexed by N . Similarly, if A is a matrix with n columns, AN is the submatrix of the columns of A corresponding to indices in N , and ANN is the square submatrix with both rows and columns corresponding to indices in N . Using this notation, the vector xA corresponds to the elements of x in the active set, thus xA = 0 for a QP in standard form (2.1). A point x is a local minimizer or local solution of (1.1) if it is feasible and f(x ) f(x) for all feasible x in some neighbourhood surrounding x . A point x is a global minimizer or global solution if it is feasible and f(x ) f(x) for all feasible x. The gradient of a di erentiable function f(x) is rf = @f @x1 ; : : : ; @f@xn : The Hessian is r2f = 0 B B B B B @ @f2 @x21 @f2 @x1@x2 : : : @f 2 @x1@xn @f2 @x2@x1 . . . ... ... . . . ... @f2 @xn@x1 @f2 @xn@x2 : : : @f 2 @x2n 1 C C C C C A : The Taylor series for a function f(x) is an in nite sum of terms representing the function. Truncating the Taylor series gives a Taylor polynomial that approximates f near a point x0. We will use only up to the quadratic Taylor polynomial, which is given by f(x) f(x0) + (x x0) Trf(x0) + 1 2 (x x0) Tr2f(x0)(x x0): 2.1.1 Non-standard form A quadratic program may be stated in many di erent forms, but all can be reduced to the standard form. The simplest case is that the variables x are bounded below element-wise, as x d. This can be changed to standard form by shifting to a new variable vector x0 = x d which is then bounded 82.1. De nitions and examples below by zeros, and setting b0 = b Jd to formulate the new constraints Jx0 = b0. The objective function does not change, although the optimal value will. Another case is if the constraints with the matrix J are given as in- equality constraints, Jx b. By introducing slack variables xs 0, the constraints can be rewritten as J I x xs = b. Upper bounds on the variables are considered as constraints and dealt with in the same way. Sim- ilarly, if the constraints were Jx b, then the problem can be transformed by subtracting slack variables in the same way (these are sometimes referred to as excess variables). In both cases, the objective function is corrected to cT 0T x xs + 12 xT xTs H 0 0 0 x xs . If the variables lack nonnegativity constraints and are thus free variables, this can be xed by setting x = x+ x , with x+ 0 and x 0. The equality constraints are corrected to J J x+ x = b, and the new objective function is cT cT x+ x + 12 xT+ x T H H H H x+ x . Finally, these techniques can all be combined in the case that the problem is non-standard in more than one way, and each of these techniques can be applied to one or more variables. Thus since any problem can be expressed in standard form, standard form will be assumed henceforth. 2.1.2 Examples One common example of quadratic programming is least-squares approxi- mation. The problem is to approximately solve Jx = b, with J 2 Rm n a full rank matrix with m > n. Unless b is in the range-space of J , there is no solution to this equation. Instead, the problem is solved to minimize the residual kJx bk2. In the form of a quadratic program, this is minimize x kJx bk22 = x TJTJx 2bTJx+ bT b: (2.2) This is an unconstrained QP, and it has the solution x = (JTJ) 1JT b: (2.3) Bound constraints in the form l x u can be added to this problem to get a constrained least-squares problem, which no longer has the solution 92.2. Convexity (2.3). If J is large and sparse, regardless of whether or not there are con- straints, iterative methods are applied to solve (2.2) rather than using (2.3) to compute x, which is computationally ine cient and potentially unstable. An example that comes from modelling a real-life situation is Markowitz’s model for portfolio optimization. It models a collection of n possible invest- ments, each with return ri which is not known in advance. These returns may be assumed to be random variables with expected value i = E[ri] and variance 2i = E[(ri i) 2]. The returns are not independent, and the correlations between the returns are given by i;j = E[(ri i)(rj j)] i j . Given this information, the investor invests a fraction of the available money xi in the ith investment. Constraints are that all money must be invested, so the sum of the xi must be one, and that the fractions must be positive. The optimization is to maximize return while minimizing variance, resulting in the quadratic program maximize x2Rn Tx xTGx subject to nX i=1 xi = 1; x 0: The matrix G is the covariance matrix with entries Gi;j = i;j i j , and the parameter is a risk tolerance parameter. A large value for places more emphasis on having less variance, which is a more conservative investment, while a small value will allow more variance but possibly larger returns. When applying this model to real life investments, the expected values, variances, and correlations are not known, and are generally estimated by combining historical data with the insights of professionals. 2.2 Convexity Convexity is a fundamental property for optimization that determines how local and global minimizers behave and are related. The convexity of an optimization problem is determined by the convexity of the objective and constraint functions. A set C 2 Rn is convex if for any x; y 2 C, we have x + (1 )y 2 C for all 2 [0; 1]. Geometrically, this says that the line segment connecting any two points in the set lies entirely in the set. A function f(x) is convex on a convex set C if for any x; y 2 C it satis es f( x+ (1 )y) f(x) + (1 )f(y); 8 2 [0; 1]: (2.4) It is strictly convex if it satis es f( x+ (1 )y) < f(x) + (1 )f(y); 8 2 (0; 1) 102.2. Convexity for any x; y 2 C with x 6= y. A function f(x) is concave if f(x) is convex. Convexity for functions can be thought of as a generalization of linearity: a function is linear if the condition (2.4) holds with equality. Thus linear functions are both convex and concave. Convexity in geometric terms says that the line connecting (x; f(x)) and (y; f(y)) lies on or above the function, as illustrated in Figure 2.1. Figure 2.1: Illustration of convexity. For di erentiable functions, convexity can be characterized in terms of derivatives. Theorem 2.1 (First order conditions for convexity). For f continuously di erentiable, it is convex on a convex set C if and only it satis es f(x) f(y) +rf(y)T (x y); for all x; y in C. It is strictly convex if and only if the above holds with a strict inequality. Proof. First we show that convexity implies the inequality as stated. Assume that f is convex on a set C. Using the de nition of convexity of f , we have f( x+ (1 )y) f(x) + (1 )f(y); f(y + (x y)) f(y) f(x) f(y): Allowing to tend to zero from above gives rf(y)T (x y) f(x) f(y); which rearranges to the result. 112.2. Convexity Now, for the other direction, assume that f satis es f(x) f(y) +rf(y)T (x y); for all x; y in C. Then for an arbitrary 2 [0; 1], let z = x+ (1 )y 2 C, and take f(x) f(z) +rf(z)T (x z); f(y) f(z) +rf(z)T (y z): Multiplying the rst inequality by and the second by (1 ) and adding gives f(x) + (1 )f(y) f(z) +rf(z)T ( x+ (1 )y) rf(z)T z; f(x) + (1 )f(y) f( x+ (1 )y); giving that f is convex. The proof for strict convexity is identical except for the use of strict inequalities, so it is omitted. Theorem 2.2 (Second order conditions for convexity). For f twice con- tinuously di erentiable, it is convex if and only if its Hessian is positive semide nite. It is strictly convex if and only if its Hessian is positive de - nite. Proof. Assume that f is convex on an open convex set C. Let y 2 C and d be any direction. Then for small > 0, y + d 2 C, and we can use a Taylor expansion as follows: f(y + d) = f(y) + rf(y)Td+ 2dTr2f(y)d+ 3O(kdk3): Using the result from Theorem 2.1 with x = y + d, we have 2 dTr2f(y)d+ O(kdk3) 0: Dividing by 2 and then allowing to tend to zero, we have that r2f(y) is positive semi-de nite. Assume thatr2f(z) is positive semide nite for all z 2 C, an open convex set. Then for any x; y 2 C, the following holds by Taylor’s theorem f(y) = f(x) +rf(x)T (y x) + (y x)Tr2f(z)(y x); for some z 2 C. Since r2f(z) is positive semide nite, the result of Theo- rem 2.1 holds and thus f is convex. Again, the results for strictly convex are similar and are omitted. 122.2. Convexity Using the de nitions of convexity and strict convexity for functions, we can now de ne convexity for optimization problems. De nition 2.3 (Convexity for optimization problems). A general optimiza- tion problem (1.1) is called convex if the objective function f is convex, the equality constraint functions ci are linear, and the inequality constraint func- tions di are concave. It is called strictly convex if the objective function is strictly convex and the latter two conditions also hold. The bene t of convex optimization is that a local solution is also a global solution, as shown in the following theorem. Theorem 2.4 (Convex global solutions). If x is a local minimizer of a convex optimization problem (1.1), then x is a global minimizer. If the problem is strictly convex, then x is the unique global minimizer. Proof. Assume that x is a local minimizer, but that it is not a global minimizer. Then there exists some x with c(x) = 0 and d(x) 0 such that f(x) < f(x ). Then for 2 (0; 1), we have f( x + (1 )x) f(x ) + (1 )f(x); < f(x ) + (1 )f(x ); = f(x ); showing that there exist points arbitrarily close to x with function values strictly less than f(x ), contradicting the assumption that x is a local minimizer. Thus such an x cannot exist, and x is a global minimizer. For f strictly convex, assume that x is a global minimizer but that it is not unique. Then there exists some x with c(x) = 0 and d(x) 0 such that f(x) = f(x ). Then for 2 (0; 1), we have f( x + (1 )x) < f(x ) + (1 )f(x); = f(x ) + (1 )f(x ); = f(x ); again contradicting the assumption that x is a local minimizer. Thus x is the unique global minimizer. For quadratic programming, the conditions that the equality constraints be linear and the inequality constraints concave always hold, so whether the problem is convex depends only on the objective function. If H is positive semi-de nite, then the quadratic programming problem (2.1) is convex, and 132.3. Duality and the dual problem further if H is positive de nite, then the problem is strictly convex. The assumptions of this thesis ensure that the problem is always convex, which will ensure additional theoretical properties such as duality, which is covered next. 2.3 Duality and the dual problem The theory of duality allows the construction of an alternative optimization problem, the dual problem, from the original problem, called the primal problem for contrast. In the dual problem, the variables and constraints have switched positions, so that the variables of the dual problem are related to the constraints for the primal problem, and similarly for the constraints of the dual. For some problems, the dual is easier to solve than the primal, and when the problems satisfy certain properties , then given the solution to one problem, the solution to the other can be easily computed. In the case of quadratic programming, the dual objective will provide a convenient lower bound on the primal objective. Here we use Lagrangian duality, though other theories exist; see [30]. By introducing vectors y 2 Rm and z 2 Rn, the Lagrange multipliers for the equality and inequality constraints respectively, the dual problem can be de ned for the primal QP (2.1). The Lagrangian function for (2.1) is L(x; y; z) = cTx+ 1 2 xTHx yT (Jx b) zTx: We can recover the primal problem as follows. The primal function is de ned to be L (x) = sup y;z 0 L(x; y; z); = sup y;z 0 cTx+ 1 2 xTHx yT (Jx b) zTx : If any component of Jx b 6= 0, then by allowing the corresponding multi- plier in y to increase or decrease, depending on the sign of the component, and setting all other multipliers to zero, the Lagrangian will increase without bound. Similarly, if any x < 0, by allowing the corresponding multiplier in z to increase, the Lagrangian will increase without bound. Thus the terms yT (Jx b) and zTx must be zero, and the primal function is L (x) = ( cTx+ 12x THx; if x 0; Jx = b 1; otherwise: 142.3. Duality and the dual problem The original primal problem is thus minimize x L (x) on the domain where L (x) is nite. The primal problem is therefore a min-max problem for the Lagrangian function. The dual problem is given by the max-min problem for the same La- grangian. Note that since the variables z correspond to the inequality con- straints, they must be nonnegative, and thus we have the constraint z 0. For any z 0, the dual function is de ned as L (y; z) = inf x L(x; y; z); = inf x cTx+ 1 2 xTHx yT (Jx b) zTx : If any of rxL = c+Hx JT y z 6= 0, then by changing the corresponding element of x and setting the remaining elements to zero the Lagrangian will decrease without bound. The condition c+Hx JT y z = 0 thus becomes a constraint. Solving this equation for x would require H positive de nite, which we do not assume. Alternatively, x can be retained as a variable to obtain L (x; y; z) = ( bT y 12x THx; if z 0; JT y + z Hx = c 1; otherwise: The dual problem is then de ned to be maximize x;y;z L (x; y; z) on the domain where this is nite. Then the dual problem can be stated as follows maximize x;y;z bT y 1 2 xTHx subject to JT y + z Hx = c; z 0: (2.5) Note that this is also a convex quadratic programming problem, though not in standard form. Taking the dual of this dual problem recovers the primal problem. If H is not positive semi-de nite, the relationship between the primal and dual problems no longer holds. Given a feasible primal-dual triple (x; y; z), the di erence in the objec- tives is (cTx+ 1 2 xTHx) (bT y 1 2 xTHx); = cTx+ xTHx bT y; = (c+Hx)Tx bT y; = (JT y + z)Tx (Jx)T y; = yTJTx+ zTx xTJT y; = zTx 0: 152.4. Constraint quali cations Thus for any feasible triple, the dual objective is a lower bound on the primal objective. This is known as weak duality, and the inner product zTx = xT z is called the duality gap. In the case of infeasibility, the duality gap is the di erence in objective functions, but this is not equal to xT z. The dual function is in fact a lower bound on the primal function for all triples: if the triple is not primal feasible, the primal function takes the value 1, and if it is not dual feasible, the dual function takes the value 1. 2.4 Constraint quali cations Methods for solution of many optimization problems (1.1), and speci cally quadratic programs (2.1), work by using linear approximations to the ob- jective and constraint functions to build a better iterate from the current iterate. This approach only makes sense when linear approximations are good approximations. The theory of constraint quali cations nds condi- tions when these approximations make sense. Let C be the closed, convex set of feasible points for (2.1). If x is a feasible point, the sequence fzkg is a feasible sequence approaching x if zk 2 C for all k large enough and zk ! x. The following de nition is one way to capture the geometry of a set C at a feasible point. De nition 2.5 (Tangent cone). The vector d is a tangent to C at x if there is a feasible sequence fzkg approaching x and a sequence ftkg with tk > 0, tk ! 0 such that limk!1 zk x tk = d. The set of all tangents to C at x is called the tangent cone and is denoted TC(x). Starting from any feasible point for (2.1), we want to move in directions that stay within the feasible set. This is given by the following de nition. De nition 2.6 (Linearized feasible direction set). Given a feasible point x and the active set A(x), the set of linearized feasible directions is F(x) = fd j Jd = 0; dT ei = 0; i 2 A(x)g; where ei is the ith standard basis vector. Constraint quali cations are conditions where TC(x) is identical to, or at least similar to, F(x). This ensures that the feasible directions represent the actual geometry of the set C, and thus that a linearized approximation to the problem captures the essential features and will yield useful informa- tion. They are also used extensively for theoretical analysis of optimization problems. There are many di erent constraint quali cations, some of which 162.5. Conditions for a solution are more easily characterized than others. One constraint quali cation is that all active constraint gradients be linear, which is obviously satis ed for quadratic programming. However, the constraint quali cation we will use throughout this thesis is another common constraint quali cation, the linear independence constraint quali cation. De nition 2.7 (LICQ). A triple (x; y; z) that is feasible for (2.1) satis es the Linear Independence Constraint Quali cation (LICQ) if the set of active constraint gradients is linearly independent, i.e. JT IA has full column rank. This matrix comes from the gradients of the equality constraints, JT , and the gradients of the active inequality constraints, IA. The negative sign is added for convenience in later results. The LICQ can be used to nd conditions for a solution, as in the following section, and we will also use it in Chapters 3 and 4 to nd requirements for nonsingularity of matrices involved in the algorithms. 2.5 Conditions for a solution Conditions for a solution of an optimization problem (1.1) can be char- acterized by using rst-order or second-order derivative information. The rst-order necessary conditions for a solution to (2.1) are fundamental to constrained optimization, and in particular to interior-point methods. They are given in the following theorem. Theorem 2.8 (KKT conditions). If x is a local minimizer of (2.1) and the LICQ holds at x , then there exist y and z such that the following conditions are satis ed at x ; y ; z : c+Hx JT y z = 0; (2.6a) Jx b = 0; (2.6b) x 0; (2.6c) z 0; (2.6d) xizi = 0; i = 1; 2; : : : ; n: (2.6e) These are referred to as the Karush{Kuhn{Tucker (KKT) conditions for the quadratic programming problem (2.1). Proof. The proof is long and technical. The interested reader is referred to [45, Section 12.4] for a thorough proof for general optimization problems. 172.5. Conditions for a solution The conditions xizi = 0 are referred to as the complementarity condi- tions, and they require that at least one of xi and zi be zero for each index i. If exactly one of these is zero for all indices, then strict complementarity is said to hold. A QP is called degenerate if either strict complementarity fails to hold, or some constraints are redundant at the solution. Thus if the LICQ and strict complementarity hold, the problem is nondegenerate. If the LICQ fails to hold, we can regularize the problem, as described in Chap- ter 4. Non-strict complementarity complicates matters signi cantly, but by partitioning the variables, it can be dealt with. We cover this in Section 2.7. At this point the duality result from Section 2.3 can be revisited. The di erence in the primal and dual objectives is zTx 0: At the solution, the KKT conditions require that zTx = 0. Thus at the solution, the primal and dual objectives are equal. This is known as strong duality. The second order necessary conditions are given in the following theorem. Theorem 2.9 (Second-order necessary conditions). Assume x is a local minimizer of (2.1), the LICQ holds at x , the vectors y and z are the Lagrange multipliers for which the KKT conditions are satis ed, and strict complementarity holds. De ne N to be a null-space basis matrix for the matrix J ITA . Then NTHN is positive semi-de nite. Proof. See [45, Section 12.5]. The second-order su cient conditions are given in the following theorem. Theorem 2.10 (Second-order su cient conditions). Assume x is a feasible point for (2.1) and there exists y and z such that the KKT conditions are satis ed at x ; y ; z with strict complementarity. Assume also that NTHN is positive de nite. Then x is a strict local solution for (2.1). Proof. See [45, Section 12.5]. Remark. Note that for a strictly convex QP, H is positive de nite and thus NTHN is positive de nite always since N is a basis matrix. Then if strict complementarity is satis ed, by Theorem 2.10 this implies that the KKT conditions are in fact necessary and su cient. This fact is used to derive methods for solving (2.1) by iteratively nding solutions to the KKT condi- tions, as is covered in the next section. 182.6. Interior-point methods 2.6 Interior-point methods The two main classes of solvers for QPs are active-set methods and interior- point methods. Active-set methods have been in use since the 1970s, and extend the idea of Dantzig’s simplex method for LPs. They use an estimate of the active set A to solve at each iteration a QP which has only equality constraints. Then a new estimate of the active set is found, and the method proceeds. Each iteration is cheap, but no advantage of sparsity is taken, and the time required to solve in the worst case is exponential in the problem size. These methods are very e ective for smaller QPs where the exponential worst-case bound is not insurmountable, or where a rough solution is known that can warm-start the algorithm [45]. The history of interior-point methods is covered in Chapter 1. While they have polynomial complexity, each iteration is expensive. The idea is to solve the equalities from (2.6) while ensuring that the inequalities from (2.6) are strictly satis ed. These equations have both linear and mildly nonlinear components, and are not too di cult to solve, but the inequalities create more di culties. The particular class of interior-point methods we focus on are primal-dual interior-point methods. The KKT conditions for the primal QP (2.1) are given by (2.6), and since the solution (x; y; z) of the KKT conditions must be primal and dual feasible and satisfy complementarity, the dual problem is also solved. Thus by solving the KKT conditions for (x; y; z), we obtain a solution for both problems simultaneously. These primal-dual methods were shown to be the most e cient form of interior-point methods, and have been the ones that continue to be used. The equations from the KKT conditions can be rewritten as a mapping as follows F (x; y; z) = 0 @ c+Hx JT y z b Jx XZe 1 A = 0; where the matrices X and Z are diagonal matrices of the vectors x and z respectively, and e is the vector of ones. Using Newton’s method, the system that is solved at each iteration is 0 @ H JT I J 0 0 Zk 0 Xk 1 A 0 @ xk yk zk 1 A = 0 @ c Hxk + JT yk + zk b+ Jxk XkZke 1 A : (2.7) The new iterate is (xk+1; yk+1; zk+1) = (xk; yk; zk) + k( xk; yk; zk), with step size k 2 (0; 1] chosen to retain feasibility. Henceforth, we drop the iteration numbers for compactness, and it is understood that the system 192.6. Interior-point methods is composed of the vectors from the most recent iteration. This system is referred to as the a ne Newton system, and its solution is referred to as the a ne Newton direction. The full step with = 1 generally violates the nonnegativity of x and z, so a line search can be performed to determine the step size. Unfortu- nately, the step size found from this pure Newton step is often very small. To resolve this di culty, the complementarity conditions xizi = 0 are re- placed with the relaxed complementarity conditions xizi = , > 0. In subsequent iterations, is reduced to zero to recover the original comple- mentarity conditions. This relaxed complementarity condition can also be found by considering the optimization problem with a logarithmic barrier term with barrier parameter , and nding the rst-order optimality condi- tions for this problem. The revised KKT conditions can again be written as a function as follows F (x; y; z) = 0 @ c+Hx JT y z b Jx XZe+ e 1 A = 0: (2.8) The system that is solved at each iteration is then 0 @ H JT I J 0 0 Z 0 X 1 A 0 @ x y z 1 A = 0 @ c Hx+ JT y + z b+ Jx XZe e 1 A : (2.9) A Newton step in this direction is usually able to take a larger step length without violating nonnegativity. The set of solutions f(x ; y ; z ) j > 0g is referred to as the central path, and all solutions on the central path are strictly feasible. The value of is chosen to reduce the current average value of xizi, de ned as = xT z n ; by some fraction 2 [0; 1]. This is called the centering parameter, and is the duality measure. If = 1, this step is a pure centering step, moving the solution towards the point (x ; y ; z ) on the central path where all the complementarity products are equal to the current average value. If = 0, the standard Newton step from (2.7) is recovered. Most solvers choose a value between these two extremes. The solution technique we use for the numerical experiments in Chapter 5 is based on the predictor-corrector method for linear programs by Mehrotra, which was later extended to QPs. It is a practical algorithm that ignores 202.6. Interior-point methods some theory but has good results. It involves two Newton steps. The rst is a predictor step to nd the a ne Newton direction by solving the system (2.7). The results from this computation are then used in a second Newton step which computes the search direction in a combination of three contri- butions: the predictor from the a ne Newton step, a corrector for the error in linearization made in the predictor step, and a correction towards the central path. The system for this step is as follows 0 @ H JT I J 0 0 Z 0 X 1 A 0 @ x y z 1 A = 0 @ c Hx+ JT y + z b+ Jx XZe+ Xa Za e e 1 A ; (2.10) where Xa and Za are diagonal matrices of the vectors xa and za respectively, the steps from the a ne Newton step. The value of is also based on the results from the a ne Newton step, in a formula with no the- oretical justi cation but good performance. The use of the same matrix for two steps allows the matrix to be factored once with the resulting factor- ization used twice, which is only marginally more expensive than one solve. The full algorithm is given in Algorithm 2.1. Algorithm 2.1 Mehrotra’s algorithm for QPs Input: H, J , c, b, tolerance. Initialize x, y, z. repeat Solve (2.7) to get xa , ya , za . Compute step sizes pria and dual a . Compute = xT z=n and a = (x+ pri a xa ) T (z + duala za )=n. Compute centering parameter = ( a = )3. Solve (2.10)to get x, y, z. Compute step sizes pri and dual. Update x = x+ pri x, y = y + dual y, z = z + dual z. until duality gap is less than tolerance. The maximum step size for the primal a ne step before moving into in- feasibility would be the minimum of xi= xa ;i over all indices i where xa ;i < 0. The actual step size is taken to be most of the distance to the boundary, using a factor called the step factor which is close to 1. We take a step factor of 0:95, and compute a step size as follows: pria = min(1; 0:95 min xa ;i<0( xi= xa ;i)). The calculation of the other step sizes are similar, with the dual steps using the corresponding values of z rather than x. 212.7. Non-strict complementarity One di culty in using interior-point methods is nding a good starting point. Mehrotra’s initialization solves two equality-constrained minimum norm problems, then modi es these to ensure strict feasibility with respect to the inequality constraints. This again has no theoretical basis, but has good practical performance, and the algorithm in fact performs poorly without it. In our work, we solve two equality constrained QPs, following the method of Mehrotra’s original algorithm for LPs and the work to extend to QPs in [20]. Their initialization problem for x is an equality constrained QP, and the problem for y and z is chosen to use the same matrix, as Mehrotra’s LP algorithm does. We follow this idea, and the initial primal variable x is chosen to solve the problem minimize x 1 2 kxk2 + 1 2 xTHx subject to Jx = b, which can be solved with the following linear system: H + I JT J 0 x t = 0 b : (2.11) The solution for t is then discarded. The initial dual variable z is then chosen to solve the problem minimize z 1 2 kzk2 + 1 2 zTHz + cT z subject to Jz = 0, which can be solved with the following system: H + I JT J 0 z y = c 0 ; (2.12) which also provides the initial Lagrange multipliers y. Since these problems both use the same matrix, a single factorization can be applied. The initial points x and z will in general include negative elements, so the procedure of Mehrotra is applied to ensure that they are strictly positive by modifying negative elements. The full algorithm to compute the starting values is given in Algorithm 2.2. 2.7 Non-strict complementarity Many of the results of this thesis rely on strict complementarity being satis- ed in the limit, however, not all QPs (2.1) satisfy strict complementarity at a local solution. In the case of linear programming, if at least one solution exists, a strictly complementary solution always exists [24]. With mild as- sumptions, interior-point methods iterate towards a strictly complementary limit point [52, Theorem 6.8]. However, this no longer holds for QPs. One counter-example is minimize x 1 2 x2 subject to x 0; 222.7. Non-strict complementarity Algorithm 2.2 Starting point computation for Mehrotra’s algorithm Input: H, J , c, b. if c 0 then Perturb a random element of c to 10 2. end if if b 0 then Perturb a random element of b to 10 2. end if Solve (2.11) to get ~x. Solve (2.12) to get ~y and ~z. Compute x = max(( 3=2) mini ~xi; 0) and z = max(( 3=2) mini ~zi; 0). Compute x^ = ~x+ xe and z^ = ~z + ze. Compute ^x = 0:5 x^ T z^ eT z^ and ^z = 0:5 x^T z^ eT x^ . Compute x = x^+ ^xe, y = ~y, and z = z^ + ^ze. which has (x; z) = (0; 0) as the only solution. The matrices of (2.7), (2.9), and (2.10) will have zero rows in the limit, and thus will be singular. Here we outline a partially eliminated system which will allow our results to apply to these problems. Indicator sets can be used to separate indices into three groups, two active and one inactive. We de ne two types of active constraints, and thus partition the active set A. The set of strongly active constraints at x is AS := fi = 1; : : : ; n j xi = 0 < zig. The set of weakly active constraints at x is AW := fi = 1; : : : ; n j xi = zi = 0g, the constraints at which strict complementarity fails to hold. Suppose at each iteration k of the interior- point method, we can identify approximations AkS , A k W , and I k to AS , AW , and I, respectively. Such indicator sets can resolve the singular limit di culty provided they ensure that zki =x k i ! 0 as k ! 1 for i 2 A k W [ I k while xki =z k i ! 0 as k ! 1 for i 2 A k S . Indeed if this were the case, upon partitioning x, z, H, and J according to Bk := AkW [ I k and Sk := AkS , the variables zB can be eliminated in (2.9) to get 0 B B @ HSS HSB JTS I HTSB HBB +X 1 B ZB J T B JS JB ZS XS 1 C C A 0 B B @ xS xB y zS 1 C C A = 0 B B @ rd;S rd;B +X 1 B rc;B rp rc;S 1 C C A : The coe cient matrix of this system has a well-de ned limit whenever J has full row rank. Details of indicator sets with the requisite properties, along 232.7. Non-strict complementarity with pointers to the literature, are given in [9] and [43]. Should J not have full row rank, the system can be regularized, as covered in Chapter 4. 24Chapter 3 Properties of matrices in interior-point methods In this chapter we examine the properties of several di erent formulations of the matrices in systems (2.7), (2.9), and (2.10). For each of these matrices, we nd conditions for nonsingularity, the inertia, bounds on the eigenvalues, and bounds and estimates of the condition numbers. The matrices of (2.7), (2.9), and (2.10) are all the same, so we can write a general form of the step as 0 @ H JT I J 0 0 Z 0 X 1 A 0 @ x y z 1 A = 0 @ rd rp rc 1 A ; (3.1) where the subscripts on the right side indicate the constraints for dual fea- sibility, primal feasibility, and the complementarity condition. These are de ned by rd = c Hx + JT y + z, rp = b + Jx, and rc will vary de- pending on the type of step being performed. The matrix of this system is K3 = 0 @ H JT I J 0 0 Z 0 X 1 A : The system (3.1) can be reduced by eliminating z, resulting in the system H +X 1Z JT J 0 x y = rd X 1rc rp : Then z can be recovered by computing z = X 1(rc + Z x), which since X is diagonal, is an inexpensive computation. The matrix of this system is K2 = H +X 1Z JT J 0 : 25Chapter 3. Properties of matrices in interior-point methods One more step of Gaussian elimination can be performed to eliminate x, resulting in the system J(H +X 1Z) 1JT y = J(H +X 1Z) 1(rd X 1rc) rp : Then z and x can be recovered via z = X 1(rc + Z x) and x = (H +X 1Z) 1(JT y + rd X 1rc), which is an inexpensive computation if the inverse of H +X 1Z is stored for use several times in each iteration. The matrix of this system is K1 = J(H +X 1Z) 1JT : The system (3.1) can be solved using any of the matrices K3, K2, or K1, each of which has di erent properties. The properties of K2 and K1 are relatively well-known, while K3 has had comparatively little analysis; see Chapter 1 for a review of existing literature. We provide analysis on all formulations, including new results for K3. The following de nitions and notation will be used throughout this and the next chapter. The maximum and minimum singular values of a general matrix B are denoted by max(B) and min(B) respectively, and similarly the maximum and minimum eigenvalues of a square matrix M are max(M) and min(M). The eigenvalues of H are denoted by 1 2 n 0; and the singular values of J are denoted by 1 2 m 0: We use for the eigenvalues of the matrices K1, K2, and K3 as needed to avoid confusion with other quantities. For any vectors u 2 Rn, v 2 Null(J)? Rn, and w 2 Rm, the following bounds are satis ed nkuk 2 uTHu 1kuk 2; (3.2) mkvk kJvk 1kvk; (3.3) mkwk kJ Twk 1kwk: (3.4) Note that the right inequality in (3.3) is satis ed for all v 2 Rn. De nition 3.1 (Inertia). The inertia of a square matrix S with real eigen- values is the triple (n+; n ; n0), where n+, n and n0 are the numbers of positive, negative, and zero eigenvalues of S, respectively. 263.1. Normal equations Theorem 3.2 (Sylvester’s law of inertia, [26]). For any nonsingular matrix R, the matrices S and RSRT have the same inertia. De nition 3.3 (Condition number). The spectral condition number of a matrix B is denoted (B) and is de ned by (B) = max(B) min(B) : De nition 3.4 (Asymptotic notation, [29, Section 9.2]). f(x) is said to be at most order g(x), written f(x) = O(g(x)), if there exists a positive real number M and a real number x0 such that jf(x)j M jg(x)j for all x > x0. De nition 3.5 (Asymptotic bounds). We say that g(x) is an asymptotic upper bound for f(x), and write f(x) . g(x), if f(x) O(g(x)). Similarly, we say that h(x) is an asymptotic lower bound for f(x), and write h(x) . f(x), if f(x) O(h(x)). At each iteration of the interior-point method, the matrices K1, K2, and K3 will change since X and Z change. These matrices are collectively referred to as the case of during the interior-point method or during the iterations, as similar properties hold. The limiting case or the case in the limit refers to the situation where is taken to zero and we nd a solution satisfying the original complementarity condition, and thus the solution to (2.1). Although this case will not occur in the interior-point method and the matrices will not be used, we can assemble matrices in the same forms as K1, K2, and K3 and discuss their properties. This analysis is helpful because the interior-point method is iterating towards this solution and thus the matrices of the method iterate towards these limiting matrices. 3.1 Normal equations We rst examine the properties of the matrix K1, the furthest reduced of the formulations. An obvious bene t of the use of this formulation is the smaller size, but spectral properties may not be optimal, as we see in this section. 3.1.1 Nonsingularity First we observe the conditions for nonsingularity of K1, considering sep- arately the cases of during the interior-point iteration and the limit of the iterations. 273.1. Normal equations Theorem 3.6. The matrix K1 is positive de nite during the iterations, and thus nonsingular, if and only if J has full rank. Proof. Assume J has full rank. Then for any v 6= 0, w = JT v 6= 0 as well. Forming vTK1v = vT J(H +X 1Z) 1JT v = wT (H + X 1Z) 1w > 0 since H is positive semi-de nite and X and Z are diagonal and strictly positive. Thus J full rank is su cient for K1 to be positive de nite. Assume J does not have full rank. Thus there exists v 2 Null(JT ), v 6= 0, and vTK1v = 0. Thus J full rank is necessary. Theorem 3.7. The matrix K1 is positive de nite at the limit of the itera- tions if and only if the variables x > 0, H is positive de nite, and J has full rank. Proof. We show that each of these conditions is necessary. If any element of x = 0, then the matrix X is singular and K1 is as well. Thus x > 0 is necessary. Now assuming x > 0, by complementarity, z = 0, and thus X 1Z = 0. Then H positive de nite is necessary. Assuming that x > 0 and H is positive de nite, but that J does not have full rank, there exists v 2 Null(J), v 6= 0. Then vTK1v = 0, and J full rank is necessary. Now for su ciency, assume x > 0 and H is positive de nite. Following the same proof as for 3.6, J together with the two assumed conditions is su cient for K1 to be positive de nite. The case that K1 is positive de nite in the limit is a special case: if all variables x are positive, the inequality constraints are all inactive, and the only active constraints are the equality constraints. This means that the solution is the same as the solution to an equality constrained QP, which is much simpler to solve. The conditions of H positive de nite and J full rank are also strong conditions, and generally, K1 is singular at the limit. 3.1.2 Inertia Given the conditions for nonsingularity as listed in the previous section, the inertia of K1 is always (m; 0; 0). The inertia is of interest in optimization algorithms since at a local min- imum, the problem is locally convex. Thus many methods working with nonconvex problems control the inertia to ensure the correct properties dur- ing the iterations; see [16]. 283.1. Normal equations 3.1.3 Eigenvalue bounds Bounds on the eigenvalues of K1 can be easily found. Theorem 3.8. The eigenvalues of K1 are bounded in the interval 2m max(H +X 1Z) ; 21 min(H +X 1Z) : Proof. The eigenvalue equation for K1 is J(H +X 1Z) 1JT v = v; and multiplying by vT gives vTJ(H +X 1Z) 1JT v = kvk2: (3.5) To nd the upper bound, we can bound by the largest eigenvalue of (H + X 1Z) 1 as follows max((H +X 1Z) 1)kJT vk2 kvk2: Now using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 21 min(H +X 1Z) kvk2 kvk2: If v = 0, then it is trivial and is therefore not an eigenvector. Thus v 6= 0, and we can divide by kvk2 to get the upper bound. To nd the lower bound, we start from (3.5). Bounding by the smallest eigenvalue of (H +X 1Z) 1 gives min((H +X 1Z) 1)kJT vk2 kvk2: Using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 2m max(H +X 1Z) kvk2 kvk2; and dividing by kvk2 gives the bound. 293.1. Normal equations 3.1.4 Condition numbers Using the bounds on eigenvalues from the previous section, we can nd a bound on the condition number of K1. Theorem 3.9. The condition number of K1 is bounded by (K1) (J) 2 (H +X 1Z): Proof. Using the eigenvalue bounds and the de nition of condition number, we nd (K1) 21 min(H+X 1Z) 2m max(H+X 1Z) ; = 21 2m max(H +X 1Z) min(H +X 1Z) ; = (J)2 (H +X 1Z): We can nd an asymptotic bound on this condition number as follows. If our solution (x; y; z) exactly solves the relaxed complementarity condition xizi = , then we have X 1Z = X 2. Assuming that at least one element of x goes to zero and does so at the same rate as ! 0, which is typically the case under our assumptions [52, Theorem 6.8], we have the asymptotic estimates min(H +X 1Z) n + ; max(H +X 1Z) 1 + 1= 1= : (3.6) Using these, asymptotic bounds on the eigenvalues of K1 are . . 1=( n + ): The condition number is then O( 1 ( n+ )), which simpli es to (K1) . ( 1 ; if n > 0 1 2 ; if n = 0: (3.7) The estimate for n > 0 is in line with those of [50], who assumes that a second-order su ciency condition holds. The asymptotic bound hides a factor of (J), however we focus on the e ect of , which changes throughout the iterations. 303.2. Saddle-point form 3.2 Saddle-point form We now move on to examine the saddle-point form, the matrix K2. This formulation is widely used and advocated, and has the advantage of the vast analysis on saddle-point matrices in general. We specialize these to nd spectral properties of K2. 3.2.1 Nonsingularity We can easily observe the conditions for nonsingularity of K2, since it is a standard saddle-point matrix and there is existing analysis of these matrices. Theorem 3.10. The matrix K2 is nonsingular during the iterations if and only if J has full rank. Proof. Noting that H +X 1Z is positive de nite, this follows directly from [4, Theorem 3.1]. Similarly to the normal equations, the case that K2 is nonsingular at the limit is a special case, requiring the variables x > 0. Theorem 3.11. The matrix K2 is nonsingular at the limit of the iteration if and only if x > 0, J has full rank, and Null(H) \ Null(J) = f0g. Proof. If any element of x = 0, then the matrix X is singular and K2 is as well. Thus x > 0 is necessary. Assuming x > 0, by complementarity, z = 0, and thus X 1Z = 0. Using [4, Theorem 3.2], J has full rank and Null(H) \ Null(J) = f0g are necessary and su cient. 3.2.2 Inertia Assuming the conditions for nonsingularity given previously, the inertia of K2 is (n;m; 0). 3.2.3 Eigenvalue bounds Eigenvalue bounds for a saddle-point matrix are given in [46]; the results here directly follow these. The (1; 1) block of K2 is H+X 1Z and is strictly positive de nite during the iterations; note also that X and Z change at each iteration, so these bounds will change at each iteration as well. 313.2. Saddle-point form Theorem 3.12. The eigenvalues of K2 are bounded by the following: 1 2 min(H +X 1Z) q min(H +X 1Z)2 + 4 21 ; 1 2 max(H +X 1Z) p max(H +X 1Z)2 + 4 2m ; for negative eigenvalues , and min(H +X 1Z); 1 2 max(H +X 1Z) + q max(H +X 1Z)2 + 4 21 ; for positive eigenvalues . 3.2.4 Condition numbers Using the bounds from the previous section, we can form an asymptotic bound on the condition number. Theorem 3.13. The condition number of K2 is bounded asymptotically by (K2) . 1 2 : Proof. Using the situation where the extremal eigenvalues of H+X 1Z are approximated by (3.6), we obtain the asymptotic estimates . . 1= for the positive eigenvalues, and 1 . . for the negative eigenvalues. The asymptotic bounds on the negative eigen- values are simpli ed from Taylor expansions. Using these bounds on the eigenvalues gives the condition number. Remark. The lower bound on the negative eigenvalues is nite, and thus the remaining three bounds are responsible for the ill-conditioning of K2. When n = 0, K2 has the same asymptotic conditioning as K1, but when n > 0, it appears that K2 has worse asymptotic conditioning than K1, due to the upper bound on the negative eigenvalues. 323.3. Unreduced 3-by-3 form 3.3 Unreduced 3-by-3 form We now cover the properties of the matrix K3, providing several new results. 3.3.1 Nonsingularity We consider the conditions for nonsingularity of K3 during the iterations. Theorem 3.14. The matrix K3 is nonsingular during the iterations if and only if J has full rank. Proof. Assume that there exists a nontrivial nullspace vector (u; v; w), that is, 0 @ H JT I J 0 0 Z 0 X 1 A 0 @ u v w 1 A = 0 @ 0 0 0 1 A : (3.8) Solving the third block row of (3.8) for w, which is permissible since X is positive de nite during the iteration, yields w = X 1Zu. Taking the inner product of the rst block row with u and substituting for w gives uT (H +X 1Z)u+ (Ju)T v = 0: The second block row is Ju = 0, so using this we reduce to uT (H +X 1Z)u = 0: (3.9) Since X and Z are diagonal and strictly positive throughout the iteration, H + X 1Z is positive de nite. Thus (3.9) has only the trivial solution u = 0, which implies w = 0. It follows from the rst block row of (3.8) that JT v = 0. If J has full rank, this implies v = 0. If J does not have full row rank, (u; v; w) with v 2 Null(JT ), v 6= 0, is a nontrivial null vector. Therefore the condition of J full rank is both necessary and su cient for K3 to be nonsingular during the iteration. Remark. It is su cient for nonsingularity of K3 to assume that H is pos- itive semide nite on the nullspace of J only, since then (3.9) still implies that u = 0. However, in this case (2.1) is no longer a convex QP and the duality relationship with (2.5) does not hold. This de niteness assumption is common in literature; see [26]. We now consider the limit of the iteration. 333.3. Unreduced 3-by-3 form Theorem 3.15. The matrix K3 is nonsingular at the limit of the iterations if and only if the solution (x; y; z) is strictly complementary, Null(H) \ Null(J)\Null(Z) = f0g, and the linear independence constraint quali cation (LICQ) is satis ed. Proof. If (x; y; z) is not strictly complementary, there is a zero row in the third block row of (3.8) and K3 is singular. Therefore, strict complemen- tarity is necessary. If Null(H) \ Null(J) \ Null(Z) 6= f0g, take u 2 Null(H) \ Null(J) \ Null(Z), u 6= 0, v = 0 and w = 0. Since Hu = Ju = Zu = 0, it follows from (3.8) that (u; v; w) is a nontrivial null vector of K3. Thus this condition is necessary. Now, assume strict complementarity and Null(H)\ Null(J)\ Null(Z) = f0g, and suppose (u; v; w) is in the nullspace of K3. Since zI = 0 at the solution, the third block row of (3.8) and strict complementarity yield uA = 0 and wI = 0. Therefore, uTw = 0. Taking the inner product of the rst block row of (3.8) with u and substituting Ju = 0 from the second block row gives uTHu = 0, thus u 2 Null(H). Noting that Ju = 0 is equivalent to saying u 2 Null(J), and uA = 0 is equivalent to u 2 Null(Z), combining all conditions on u gives u 2 Null(H) \ Null(J) \ Null(Z), which implies that u = 0 by the assumption. Eliminating u and wI from (3.8), we have JT IA v wA = 0; which has only the trivial solution (v; wA) = (0; 0) if and only if the LICQ holds. Thus the three conditions together are su cient, and the LICQ is necessary. Remark. The condition that Null(H) \ Null(J) \ Null(Z) = f0g cannot be veri ed before a solution is found, but Null(H)\Null(J) = f0g is a su cient condition. Comparing the requirements for nonsingularity ofK1, K2, andK3, we see that during the iterations, all are nonsingular when J has full rank. However, at the limit, K1 and K2 are singular unless x > 0, or all the inequality con- straints are inactive, together with other conditions on J and H. Contrast this with the requirements for nonsingularity of K3. Strict complementarity is a common assumption for the solution algorithms, and we have already detailed how this requirement can be dealt with. Null(H) \ Null(J) = f0g is another common assumption for optimization algorithms. Finally, the LICQ is also a common assumption for analysis of algorithms, though it is 343.3. Unreduced 3-by-3 form too strong for use in practice. We shall see that when some of these condi- tions do not hold, regularization alleviates the di culties. Numerical results also indicate that even when all forms of the system are singular in the limit and the matrices of the iteration become increasingly ill-conditioned, the conditioning of K3 is superior. 3.3.2 Inertia To work with the eigenvalues and inertia of K3, a symmetric matrix is required to guarantee that the eigenvalues are real. By using a similarity transformation with the diagonal matrix D de ned by D = 0 @ I 0 0 0 I 0 0 0 Z 1 2 1 A ; (3.1) can be symmetrized to give 0 B @ H JT Z 1 2 J 0 0 Z 1 2 0 X 1 C A 0 @ x y f z 1 A = 0 @ rd rp Z 1 2 rc 1 A ; (3.10) where f z = Z 1 2 z. The matrix from this system is K^3 := D 1K3D = 0 B @ H JT Z 1 2 J 0 0 Z 1 2 0 X 1 C A : We nd results on the inertia of K^3 both during the iterations and in the limit. We rst need the following result. Lemma 3.16 ([26], Lemma 3:1). Let B := 0 @ 0 0 It 0 S 0 It 0 0 1 A for a arbitrary square matrix S and size t identity matrix It. Then B has t eigenvalues 1, t eigenvalues 1, and the remaining eigenvalues the same as those of S. Theorem 3.17. Assume that J has full rank and Null(H)\ Null(J) = f0g. Then the inertia of K^3 during the iterations is (n; n+m; 0). 353.3. Unreduced 3-by-3 form Proof. Let the matrix N be an orthonormal nullspace basis matrix for J . Since Null(H) \ Null(J) = f0g, NTHN is positive de nite. The follow- ing construction follows the proof of [26, Lemma 3.2]. Complete the basis N so that Y N is orthogonal. Then J Y N = L 0 , where L is nonsingular. De ne M1 := 0 @ Y N 0 0 0 0 I 0 0 0 0 I 1 A : Then we have MT1 K^3M1 = 0 B B B @ Y THY Y THN LT Y TZ 1 2 NTHY NTHN 0 NTZ 1 2 L 0 0 0 Z 1 2Y Z 1 2N 0 X 1 C C C A : Let M2 := 0 B B @ In m 0 0 0 0 Im 0 0 12L TY THY L TY THN L T 0 0 0 0 I 1 C C A : Then MT2 M T 1 K^3M1M2 = 0 B B B @ 0 0 Im Y TZ 1 2 0 NTHN 0 NTZ 1 2 Im 0 0 0 Z 1 2Y Z 1 2N 0 X 1 C C C A : Finally, let M3 := 0 B B @ I 0 0 0 0 I 0 R 0 0 I S 0 0 0 I 1 C C A ; where R := (NTHN) 1NTZ 1 2 and S := Y TZ 1 2 . The product M := M1M2M3 is nonsingular, and since K^3 is nonsingular by Theorem 3.14, the matrix MT K^3M = 0 B B @ 0 0 Im 0 0 NTHN 0 0 Im 0 0 0 0 0 0 G 1 C C A ; (3.11) 363.3. Unreduced 3-by-3 form withG = X Z 1 2N(NTHN) 1NTZ 1 2 , is also nonsingular. From the pre- viously noted Lemma 3.16, Sylvester’s law of inertia, and the fact that G is negative de nite, we obtain that K^3 has n + m negative and n positive eigenvalues. This result holds in the limit. Theorem 3.18. Assume that Null(H) \ Null(J) = f0g and (x; y; z) is a solution of (2.1) and (2.5) where strict complementarity and the LICQ are satis ed. Then the inertia of K^3 is (n; n+m; 0). Proof. Following the proof of Theorem 3.17, (3.11) still holds. G is clearly at least negative semide nite, and since M is nonsingular by the proof of Theorem 3.17 and K^3 is nonsingular by Theorem 3.15, G is in fact negative de nite. Therefore K^3 has n+m negative and n positive eigenvalues. Remark. In fact, the assumption of the LICQ can be weakened to assume only that J is full rank, and G can be proven negative de nite via strict complementarity. The inertia of K^3 is the same during the iterations and in the limit with conditions su cient for nonsingularity. This is in contrast to the cases for K1 and K2. Theorems 3.17 and 3.18 also hold for inde nite H, as long as the condition of positive de niteness on the nullspace of J is satis ed. 3.3.3 Eigenvalue bounds We now nd bounds on the eigenvalues of the symmetric inde nite matrix K^3, which given the similarity transformation will also apply to K3. We assume that the conditions of Theorem 3.15 hold, so K^3 and K3 are nonsin- gular throughout the iterations and at the limit. Our technique uses energy estimates in the style of [46]. The eigenvalue problem for K^3 is formulated as 0 B @ H JT Z 1 2 J 0 0 Z 1 2 0 X 1 C A 0 @ u v w 1 A = 0 @ u v w 1 A : (3.12) Theorem 3.19. The positive eigenvalues of K^3, and thus also K3, are bounded in min j 1 2 n xj + q ( n + xj)2 + 4zj ; 1 2 1 + q 21 + 4( 2 1 + zmax) : 373.3. Unreduced 3-by-3 form The lower positive bound in Theorem (3.19) is useful in the case that n = 0, but in the case where H is strictly positive de nite, there is an alternative uniform lower bound. Corollary 3.20. A uniform lower bound for the positive eigenvalues of K^3 is given by n. The following proof covers both the theorem and corollary. Proof. We proceed one bound at a time, using energy estimates and estab- lishing the desired inequalities. First we nd the upper bound. Since and x are both positive, the matrix I + X is nonsingular. We can then solve for w in the third block row of (3.12) to get w = ( I +X) 1Z 1 2u: Substituting into the rst block row of (3.12), we obtain Hu+ JT v + Z 1 2 ( I +X) 1Z 1 2u = u: Taking the inner product with u, and noting that the matrices Z 1 2 and ( I+X) 1 are diagonal and therefore commute, gives the following equation for : uTHu+ uTJT v + uT ( I +X) 1Zu = kuk2: (3.13) Solving for v in the second block row of (3.12) gives v = 1 Ju, which we substitute into (3.13) to get uTHu+ 1 kJuk2 + uT ( I +X) 1Zu = kuk2: (3.14) Now we bound terms in (3.14). We use (3.2) and (3.3) to bound the rst and second terms in (3.14), giving 1kuk 2 + 1 21kuk 2 + uT ( I +X) 1Zu kuk2: Since the matrix ( I + X) 1Z is diagonal, we may bound the remaining term on the left with the maximum of the diagonal elements, leaving 1 + 1 21 + maxi zi + xi kuk2 kuk2: This maximum term can be bounded as follows max i zi + xi zmax ; 383.3. Unreduced 3-by-3 form where zmax indicates the maximum value of zi at the current iterate. This bound becomes tight as the iterations proceed, since generally at least one xi ! 0. On the other hand, if this does not occur, then all zi = 0 at the limit and therefore zmax = 0. We have 1 + 1 21 + zmax kuk2 kuk2: Multiplying by and rearranging gives 2 1 ( 2 1 + zmax) kuk2 0: The vector u must be nonzero. If u = 0 then the second block row of (3.12) implies v = 0, and since is strictly positive, this gives v = 0. The rst block row then yields Z 1 2w = 0 and w = 0, which is a contradiction since an eigenvector must be nontrivial. We can therefore divide by kuk2 and bound by the larger root of the quadratic to obtain 1 2 1 + q 21 + 4( 2 1 + zmax) ; (3.15) giving an upper bound. Next, we nd a lower bound on . Taking the inner product of v with the second block row of (3.12), we have vTJu uTJT v = kvk2; which we substitute into (3.13) to give uTHu+ kvk2 + uT ( I +X) 1Zu = kuk2: Using (3.2), we have nkuk 2 + kvk2 + uT ( I +X) 1Zu kuk2: Bounding the last term on the left with a minimum of the diagonal elements, this becomes nkuk 2 + kvk2 + min i zi + xi kuk2 kuk2: (3.16) This minimum will occur for some index j, and we then multiply by + xj and rearrange into ( 2 + (xj n) ( nxj + zj))kuk 2 kvk2 0: 393.3. Unreduced 3-by-3 form Since again u 6= 0, we then bound by the positive root of the quadratic, giving 1 2 n xj + q ( n + xj)2 + 4zj : Taking the minimum over all j gives the bound. To nd the uniform lower bound given in the corollary, we begin from (3.16) and bound the minimum term below by zero, since the terms are all positive. Then rearranging we have ( n)kuk 2 kvk2 0; which since u 6= 0 gives the bound n: We now consider bounds on the negative eigenvalues, but are only able to nd an e ective lower negative bound. Theorem 3.21. Assume that I + X is nonsingular for all < 0 in the spectrum of K^3. The negative eigenvalues of K^3, and thus also K3, are bounded in [ ; 0), where := min 1 2 n q 2n + 4 2 1 ; min fjj +xj<0g j and j is the smallest negative root of the cubic equation 3 + (xj n) 2 ( 21 + zj + xj n) 2 1xj = 0: Proof. Proceeding as in the proof of Theorem 3.19, we start from (3.14) with the bounds in (3.2) and (3.3) to get nkuk 2 + 1 21kuk 2 + uT ( I +X) 1Zu kuk2: Bounding the last term of the left-hand side by the minimum, n + 1 21 + mini zi + xi kuk2 kuk2: (3.17) We now need to consider two cases. In case one, + xi > 0 for all indices i, and in this case we can bound the minimum term from below by zero. In case two, some +xi < 0, and there exists an index j such that mini zi +xi = zj +xj . 403.3. Unreduced 3-by-3 form Beginning with case one, we start from (3.17) and bound the minimum term below by zero, giving ( n + 1 21)kuk 2 kuk2; which we can multiply by and rearrange to give ( 2 n 2 1)kuk 2 0: Since u 6= 0, we can divide by kuk2 and bound by the root of the quadratic to get the bound 1 2 n q 2n + 4 2 1 : For case two, we use the index j for the minimum in (3.17) to get n + 1 21 + zj + xj kuk2 kuk2: Multiplying by ( + xj) > 0 and rearranging, we get 3 + (xj n) 2 ( 21 + zj + xj n) 2 1xj kuk2 0: Since u 6= 0, we can divide by kuk2 and de ne j to be the smallest root of the cubic 3 + (xj n) 2 ( 21 + zj + xj n) 2 1xj = 0: Evaluating this cubic at = 0 gives 21xj < 0; so there must be at least one positive real root because the cubic increases to in nity as increases. Looking at the derivative of the cubic and evaluating again at = 0 we have 21 zj xj n: Since the slope of the cubic for large negative would be positive, this indi- cates that there are either two negative roots or a pair of complex conjugate roots. If no cubic o ers a real negative root, this implies that case one ap- plies. The bound is given by the minimum of these possible bounds from cases one and two. Remark. It is not possible in practice to know which indices j satisfy +xj < 0, but in computations, j can be computed for all indices j. Regarding the unsatisfying upper negative bound, the bounds in Theo- rems 3.19 and 3.21 are pessimistic, as we shall see in the discussion of the condition number next. Regularization, covered in the next chapter, will allow more de nite conclusions, including a nonzero upper negative bound. 413.3. Unreduced 3-by-3 form 3.3.4 Condition numbers Since we have not been able to nd an upper bound for the negative eigen- values, we cannot nd an estimate of the condition number. However, we can make some general statements on the conditioning of K3. Under the assumptions of Theorems 3.14 and 3.15, K^3 is nonsingular and converges to a well-de ned limit as ! 0. Therefore, its condition number is asymptot- ically uniformly bounded independently of , which is not re ected by the bounds on the eigenvalues. In practice, we have observed that the condi- tioning of K^3 and K3 are typically substantially better than that of K1 or K2. One potential explanation for this behaviour is that using K3 avoids division by elements of X, avoiding the large roundo errors that can occur with such divisions. Some observations on the conditioning of these matrices have been previously made; see Chapter 1. 42Chapter 4 Regularization and the properties of matrices As seen in Chapter 3, there are numerical di culties with K1 and K2, and while K3 has nite outer bounds and less restrictive conditions for nonsin- gularity, some conditions are di cult to verify before solving. Additionally, the lack of an upper negative bound on the eigenvalues means that we have no estimate for the condition number. To alleviate these di culties, we consider a regularized version of (2.1). There are several ways to regular- ize: primal regularization alleviates ill-conditioning of the Hessian, while dual regularization alleviates ill-conditioning of the Jacobian. Details of several regularization approaches can be found in [20]. We focus on the two-parameter approach of [20] which applies both primal and dual regu- larization. For parameters > 0 and > 0, the regularized primal QP is minimize x;r cTx+ 1 2 xTHx+ 1 2 kx xkk2 + 1 2 kr + ykk2 subject to Jx+ r = b; x 0: (4.1) In this formulation, xk and yk are the current iterates for x and y respec- tively. The dual problem corresponding to (4.1) is maximize x;y;z;s bT y 1 2 xTHx 1 2 ky ykk2 1 2 ks+ xkk2 subject to Hx+ JT y + z s = c; z 0: (4.2) By setting = = 0, the original primal-dual pair (2.1) and (2.5) is recov- ered. Given this modi ed primal dual-pair, we can proceed with an interior- point method. The KKT conditions with a modi ed complementarity con- 43Chapter 4. Regularization and the properties of matrices dition, similarly to (2.8), are F (x; y; z; r; s) = 0 B B B B @ c+Hx+ s JT y z (r + yk) y x (s+ xk) b Jx r XZe+ e 1 C C C C A = 0: An interior point-method for these problems is proposed in [20] that con- verges under standard conditions with either xed or decreasing regulariza- tion parameters. The 3-by-3 system to solve at each iteration, after elimi- nating the variables s and r and substituting the current iterates for each variable, is 0 @ H + I JT I J I 0 Z 0 X 1 A 0 @ x y z 1 A = 0 @ rd rp rc 1 A : The vectors on the right hand side are de ned, depending on the context, as in (2.9) or (2.10). The matrix of this system is K3;reg = 0 @ H + I JT I J I 0 Z 0 X 1 A : Using Gaussian elimination, we can eliminate z, resulting in the system H +X 1Z + I JT J I x y = rd X 1rc rp : Then z can be recovered by computing z = X 1(rc + Z x). The matrix from this system is K2;reg = H +X 1Z + I JT J I : One more step of Gaussian elimination can be performed to eliminate x, resulting in the system J(H +X 1Z + I) 1JT + I y = J(H +X 1Z + I) 1(rd X 1rc) rp : The remaining variables can be recovered via z = X 1(rc + Z x) and x = (H + X 1Z + I) 1(JT y + rd X 1rc). The matrix from this system is K1;reg = J(H +X 1Z + I) 1JT + I : 444.1. Normal equations 4.1 Normal equations We rst consider the matrix K1;reg, the regularized analogue to K1. 4.1.1 Nonsingularity First we observe the conditions for nonsingularity of K1;reg, both during the iterations and in the limit. Theorem 4.1. For > 0, the matrix K1;reg is unconditionally positive de nite during the iterations. Proof. Since x is strictly positive during the iterations, H + X 1Z + I is positive de nite, and J(H+X 1Z+ I) 1JT is positive semide nite. Thus for > 0, K1;reg is positive de nite. Theorem 4.2. For ; > 0, the matrix K1;reg is positive de nite at the limit of the iteration if and only if x > 0. Proof. If any element of x = 0, then the matrix X is singular and K1;reg is as well. Thus x > 0 is necessary. Assume x > 0. Then by complementarity, z = 0, and thus X 1Z = 0. Then H+X 1Z+ I = H+ I is positive de nite, and thus K1;reg is positive de nite. Comparing to the unregularized case, we note that during the iterations, J is no longer required to be full rank, and in the limit the requirements of H positive de nite and J full rank are not needed. 4.1.2 Inertia Given the conditions for nonsingularity as listed in the previous section, the inertia of K1;reg is always (m; 0; 0). Both K1;reg and K1 are thus strictly positive de nite, but under weaker assumptions for K1;reg. 4.1.3 Eigenvalue bounds Bounds on the eigenvalues of K1;reg can be easily found. Theorem 4.3. The eigenvalues of K1;reg are bounded in the interval 2m max(H +X 1Z + I) + ; 21 min(H +X 1Z + I) + : 454.1. Normal equations Proof. The eigenvalue equation for K1;reg is J(H +X 1Z + I) 1JT + I v = v; and multiplying by vT gives vT J(H +X 1Z + I) 1JT + I v = kvk2: (4.3) First to nd the upper bound, we can bound by the largest eigenvalue of (H +X 1Z + I) 1 as follows max((H +X 1Z + I) 1)kJT vk2 + kvk2 kvk2: Now using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 21 min(H +X 1Z + I) + I kvk2 kvk2: If v = 0, then it is trivial and thus not an eigenvector. We may then divide by kvk2 to get the upper bound. To nd the lower bound, we start from (4.3). Bounding by the smallest eigenvalue of (H +X 1Z + I) 1 gives min((H +X 1Z + I) 1)kJT vk2 + kvk2 kvk2: Using (3.4) and the properties on the eigenvalues of an inverse of a matrix, we have 2m max(H +X 1Z + I) + kvk2 kvk2; and dividing by kvk2 gives the bound. 4.1.4 Condition numbers We use the eigenvalue bounds from the previous section to nd a bound on the condition number of K1;reg Theorem 4.4. The condition number of K1;reg is bounded by (K1;reg) 21 + min(H +X 1Z + I) 2m + max(H +X 1Z + I) (H +X 1Z + I): 464.2. Saddle-point form Proof. Using the eigenvalue bounds and the de nition of condition number, we nd (K1;reg) 21 min(H+X 1Z+ I) + 2m max(H+X 1Z+ I) + ; = max(H +X 1Z + I) 21 + min(H +X 1Z + I) min(H +X 1Z + I) ( 2m + max(H +X 1Z + I)) ; = 21 + min(H +X 1Z + I) 2m + max(H +X 1Z + I) (H +X 1Z + I): From this condition number bound, it it easy to see that if both and are positive, the condition number is strictly smaller than the unregularized matrix. In the case that the extremal eigenvalues of H + X 1Z can be approximated by (3.6), the eigenvalues of K1;reg are asymptotically bounded by . . 1= : Then we can nd the following asymptotic condition number (K1;reg) . 1 : (4.4) In practice, and are allowed to take values as small as p "mach. In this case, there is a de nite disadvantage to using the normal equations formulation since the condition number likely exceeds the inverse of machine precision early. Indeed, the implementation of [20] initializes = = 1 and divides both parameters by 10 at each iteration. In double precision, after just 8 iterations, the smallest allowed value of 10 8 is reached but the procedure typically has not yet converged. 4.2 Saddle-point form We now consider spectral properties of K2;reg. 4.2.1 Nonsingularity Using existing analysis for saddle-point matrices, we nd conditions for non- singularity of K2;reg. Theorem 4.5. For > 0, the matrix K2;reg is unconditionally nonsingular during the iterations. 474.2. Saddle-point form Proof. Since x is strictly positive during the iterations, H + X 1Z + I is positive de nite. Thus for > 0, nonsingularity follows directly from [4, Theorem 3.1]. Theorem 4.6. For ; > 0, the matrix K2;reg is nonsingular at the limit of the iteration if and only if x > 0. Proof. If any element of x = 0, then the matrix X is singular and K2;reg is as well. Thus x > 0 is necessary. Assume x > 0. Then by complementarity, z = 0, and thus X 1Z = 0. Then H+X 1Z+ I = H+ I is positive de nite, and again, nonsingularity follows directly from [4, Theorem 3.1]. As in the case of the normal equations, nonsingularity requires fewer assumptions than in the unregularized case. 4.2.2 Inertia Assuming the conditions for nonsingularity given previously, the inertia of K2;reg is (n;m; 0). This result can also be found in [20]. 4.2.3 Eigenvalue bounds Eigenvalue bounds for K2;reg are given in [20, Theorem 5.1] following results of [46] and [47]. Theorem 4.7 ([20]). The eigenvalues of K2;reg are bounded by the following: 1 2 min(H +X 1Z + I) q ( min(H +X 1Z + I) + )2 + 4 21 ; 1 2 max(H +X 1Z + I) p ( max(H +X 1Z + I) + )2 + 4 2m ; 484.2. Saddle-point form for negative eigenvalues , and min(H +X 1Z + I); 1 2 max(H +X 1Z + I) + q ( max(H +X 1Z + I) + )2 + 4 21 ; for positive eigenvalues. Additionally, is an eigenvalue of K2;reg if and only if J is rank de cient. Remark. Using a Taylor expansion on the upper negative bound gives 1 2 max(H +X 1Z + I) p ( max(H +X 1Z + I) + )2 + 4 2m ; 1 2 max(H +X 1Z + I) ( max(H +X 1Z + I) + ) 1 + 2 2m ( max(H +X 1Z + I) + )2 ; = 2 2m ( max(H +X 1Z + I) + ) ; ; thus all negative eigenvalues are bounded above by . The upper negative bound of approximately is the only place in which the bounds of the regularized version di er substantially from the unregularized. This will play a role in the conditioning of the matrix, seen next. 4.2.4 Condition numbers Using the bounds from the previous section, we can nd an asymptotic bound on the condition number. Theorem 4.8. The condition number of K2;reg is bounded asymptotically by (K2;reg) . 1 min( ; ) : 494.3. Unreduced 3-by-3 form Proof. Using the situation where the extremal eigenvalues of H+X 1Z are approximated by (3.6), we obtain the asymptotic estimates . . max(H +X 1Z + I) 1= for the positive eigenvalues, and 1 . . for the negative eigenvalues. Using these asymptotic bounds on the eigen- values gives the condition number. As in the unregularized case, the lower negative bound is nite, and thus the remaining three bounds are responsible for the condition number. The limits of machine precision, given the common bounds on and , are not achieved until reaches p "mach, which typically occurs in the last few iterations. 4.3 Unreduced 3-by-3 form We now turn to K3;reg, providing new results on the spectral properties. 4.3.1 Nonsingularity We start by stating necessary and su cient conditions for nonsingularity during the iterations. Lemma 4.9. For > 0, the matrix K3;reg given is nonsingular throughout the interior-point iterations. Proof. Looking at the system 0 @ H + I JT I J I 0 Z 0 X 1 A 0 @ u v w 1 A = 0 @ 0 0 0 1 A ; (4.5) we attempt to nd a nontrivial solution, that is, a nonzero element (u; v; w) in the nullspace of K3;reg. From the third block row, we have w = X 1Zu. Taking the inner product of the rst block row with u and substituting for w yields uT (H + I +X 1Z)u+ (Ju)T v = 0; 504.3. Unreduced 3-by-3 form Using using v = 1 Ju from the second block row, this simpli es to uT (H + I +X 1Z) + 1 JTJ u = 0: Since X and Z are diagonal and positive de nite, and the remaining matrices are positive semide nite, the matrix associated with the above equation is positive de nite for any 0. Therefore, the only solution is u = 0, which implies v = 0 and w = 0. Lemma 4.10. For = 0, the matrix K3;reg is nonsingular throughout the interior-point iterations if and only if J has full rank. Proof. Assume that (u; v; w) lies in the nullspace of K3;reg. The only di er- ence with the proof of Lemma 4.9 is that Ju = 0. Following the same steps, we obtain uT H + I +X 1Z u = 0: Again, the matrix is positive de nite, giving u = 0 as the only solution, and this implies that w = 0. The rst block row leaves us with JT v = 0, which implies v = 0 if J has full rank. Therefore the full rank condition for J is a su cient condition for nonsingularity of K3;reg. If J does not have full row rank, then (u; v; w) with v 2 Null(JT ), v 6= 0, u = 0, w = 0 is a nontrivial nullspace vector. Therefore the full-rank condition on J is both necessary and su cient. We collect Lemmas 4.9 and 4.10 into a uni ed theorem. Theorem 4.11. Let 0. The matrix K3;reg is nonsingular throughout the interior-point iterations if and only if either > 0, or = 0 and J has full rank. We now consider what happens to K3;reg in the limit of the interior-point iteration. If (x; y; z) is not strictly complementary, there is a zero row in the third block row of (4.5) and K3;reg is singular. Thus strict complementarity is necessary for nonsingularity in each case. The proof of each lemma in this section attempts to nd a nontrivial nullspace element of K3;reg. Lemma 4.12. For > 0 and > 0, K3;reg is nonsingular at the limit of the interior-point iteration if and only if (x; y; z) is strictly complementary. Proof. Since zI = 0 at the solution, the third block row of (4.5) and strict complementarity yield uA = 0 and wI = 0, and therefore, uTw = 0. We 514.3. Unreduced 3-by-3 form take the inner product of the rst block row with u, substitute v = 1 Ju from the second block row and use uTw = 0 to get uT H + I + 1 JTJ u = 0: Since > 0, the coe cient matrix is positive de nite. Therefore, u = 0. It follows that v = 0, and the rst block row leaves w = 0. Therefore K3;reg is nonsingular, and strict complementarity is su cient. Lemma 4.13. For > 0 and = 0, K3;reg is nonsingular at the limit of the interior-point iteration if and only if the solution (x; y; z) is strictly complementary and the LICQ is satis ed. Proof. The only di erence with the proof of Lemma 4.12 is that Ju = 0. Following the same steps, we obtain uT (H+ I)u = 0: SinceH+ I is positive de nite, the only solution is u = 0. Examining the remaining equations, we have JT IA v wA = 0; which has only the trivial solution (v; wA) = (0; 0) if and only if the LICQ holds. Thus these two conditions together are su cient, and the LICQ is necessary, completing the proof. Lemma 4.14. For = 0 and > 0, K3;reg is nonsingular at the limit of the interior-point iteration if and only if Null(H)\ Null(J)\ Null(Z) = f0g and the solution (x; y; z) is strictly complementary. Proof. Following again the same steps as in the proofs of the previous two lemmas, taking an inner product with u, we obtain this time uT H + 1 JTJ u = 0: The coe cient matrix above is positive semide nite, and thus we must have u 2 Null(H) \ Null(J). Since uA = 0, we also have u 2 Null(Z). Thus u = 0 if Null(H) \ Null(J) \ Null(Z) = f0g, and the second block row gives v = 0. The rst block row leaves w = 0, so the only solution is the trivial solution. Thus Null(H)\Null(J)\Null(Z) = f0g together with strict complementarity are su cient for nonsingularity of K3;reg. Assume now that Null(H)\Null(J)\Null(Z) 6= f0g, then for any nonzero u 2 Null(H)\Null(J)\Null(Z), we have a nontrivial nullspace vector of the form (u; 0; 0). Therefore the condition is both necessary and su cient. 524.3. Unreduced 3-by-3 form Lemmas 4.12, 4.13, and 4.14 are all summarized in the following theorem. Also included in the theorem is the unregularized case, = = 0, which is covered in Theorem 3.15. Theorem 4.15. Necessary and su cient conditions for the matrix K3;reg to be nonsingular at the limit of the interior-point iteration are that the solution (x; y; z) be strictly complementary, Null(H) \ Null(J) \ Null(Z) 6= f0g if = 0, and the LICQ be satis ed if = 0. Table 4.1 summarizes all conditions both during the iterations and at the limit covered here and in Chapter 3. We note that regularization with positive and gives nonsingularity with no further requirements during the iterations, and that it removes all but the requirement of strict comple- mentarity for nonsingularity in the limit. Table 4.1: A summary of the necessary and su cient conditions on nonsin- gularity throughout the interior-point iterations and at the limit. > 0 = 0 > 0 IP iteration: unconditional nonsingularity (Lemma 4.9) IP iteration: unconditional nonsingularity (Lemma 4.9) at limit: (x; y; z) is strictly complementary (Lemma 4.12) at limit: Null(H) \ Null(J) \ Null(Z) = f0g, (x; y; z) is strictly complementary (Lemma 4.14) = 0 IP iteration: J has full rank (Lemma 4.10) IP iteration: J has full rank (Theorem 3.14 and Lemma 4.10) at limit: (x; y; z) is strictly complementary, and the LICQ (Lemma 4.13) at limit: Null(H) \ Null(J) \ Null(Z) = f0g, (x; y; z) is strictly complementary, and the LICQ (Theorem 3.15) 4.3.2 Inertia We again require a symmetric matrix to ensure real eigenvalues. The matrix K3;reg is symmetrizable with the diagonal matrix D used earlier in (3.10), generating the symmetric matrix K^3;reg. This allows us to consider bounds 534.3. Unreduced 3-by-3 form in real arithmetic for the latter, which will then also apply to the matrix K3;reg. The matrix K^3;reg is given by 0 B @ H + I JT Z 1 2 J I 0 Z 1 2 0 X 1 C A We nd results for the inertia of K^3;reg, showing that it is the same as in the unregularized case both during the iterations and in the limit. Theorem 4.16. For 0, > 0, assume H+ I is positive de nite. Then the inertia of K^3;reg during the iterations is (n; n+m; 0). Proof. De ning the matrix M as follows M = 0 @ I 0 0 A I 0 B 0 I 1 A ; with A := J(H+ I) 1 and B := Z 1 2 (H+ I) 1, we can decompose K^3;reg as MT K^3;regM = 0 @ H + I 0 0 0 U 0 0 0 W 1 A (4.6) where U := I J(H+ I) 1JT and W := X Z 1 2 (H+ I) 1Z 1 2 . Since H + I is positive de nite and U and W are negative de nite, the inertia of K^3;reg is (n; n+m; 0). Theorem 4.17. For 0, > 0, assume H + I is positive de nite and (x; y; z) is a solution of (4.1) and (4.2) where strict complementarity is satis ed. Then the inertia of K^3;reg is (n; n+m; 0). Proof. Following the proof of Theorem 4.16, the relation (4.6) holds. By Lemma 4.12, K^3;reg is nonsingular, and therefore since M is nonsingular, the block diagonal matrix in (4.6) is nonsingular as well. Then H + I is positive de nite and U and W are negative de nite, and the inertia of K^3;reg is (n; n+m; 0). Finally, we can also consider the case where > 0, but require only nonnegativity of . Theorem 4.18. For > 0 and 0, the inertia of K^3;reg during the iterations is (n; n+m; 0). 544.3. Unreduced 3-by-3 form Proof. We can decompose K^3;reg as 0 @ I 1 J T X 1Z 1 2 I I 1 A 0 @ A I X 1 A 0 @ I 1 J I X 1Z 1 2 I 1 A ; where A := H + I + 1 J TJ +X 1Z, which is positive de nite. The result follows. We do not include a similar result for the limit; our goal is to present the regularization with strictly positive parameters as ideal. We add that the inertia results also hold in the case where = 0, which follows from the unregularized results. 4.3.3 Eigenvalue bounds The eigenvalue problem for K^3;reg is 0 B @ H + I JT Z 1 2 J I 0 Z 1 2 0 X 1 C A 0 @ u v w 1 A = 0 @ u v w 1 A : (4.7) Our rst result provides bounds on the positive eigenvalues of K^3;reg. Theorem 4.19. The positive eigenvalues of the matrix K^3;reg are bounded in [ ; ] ; where = min j 1 2 n + xj + q ( n + + xj)2 + 4zj and is the largest root of the cubic equation 3 + ( ( 1 + )) 2 ( ( 1 + ) + 2 1 + zmax) zmax = 0: We can also nd a simpli ed uniform lower bound which is more easily computed. Corollary 4.20. A uniform lower bound for the positive eigenvalues of K^3;reg is given by 0 = n + : 554.3. Unreduced 3-by-3 form Proof. As before, we separate the discussion into the upper bound and lower bound cases. Beginning with the upper bound, we solve for w in the third block row of (4.7) to get w = ( I +X) 1Z 1 2u, which we substitute into the rst block row to obtain (H + I)u+ JT v + Z 1 2 ( I +X) 1Z 1 2u = u: Taking the inner product with u and noting that the matrices Z 1 2 and ( I+ X) 1 are diagonal and therefore commute gives the following equation for : uT (H + I)u+ uTJT v + uT ( I +X) 1Zu = kuk2: (4.8) Solving for v in the second block row of (4.7) gives v = 1 + Ju, which we substitute into (4.8) to get uT (H + I)u+ 1 + kJuk2 + uT ( I +X) 1Zu = kuk2: (4.9) We use (3.2) and (3.3) to bound the rst and second terms in (4.9): ( 1 + )kuk 2 + 21 + kuk2 + uT ( I +X) 1Zu kuk2: Since the matrix ( I + X) 1Z is diagonal, we bound by the maximum of the diagonal elements uT ( I +X) 1Zu max i zi + xi kuk2: As in the proof of Theorem 3.19, we can bound the maximum term above by zmax . We now have 1 + + 21 + + zmax kuk2 kuk2: (4.10) Multiplying by ( + ) and rearranging gives 3 + ( ( 1 + )) 2 ( 1 + ) + 2 1 + zmax zmax kuk2 0: We must have u 6= 0, since if u = 0 then the second block row of (4.7) implies that ( + )v = 0. Since K^3;reg is nonsingular, > 0 and thus + > 0, implying that v = 0. Then, by the rst block row, Z 1 2w = 0 would imply 564.3. Unreduced 3-by-3 form w = 0, which must not occur since the eigenvector must be nontrivial. We can thus divide by kuk2 and bound by the largest real root of 3 + ( ( 1 + )) 2 ( 1 + ) + 2 1 + zmax zmax = 0; yielding an upper bound. Note that there is exactly one positive real root, since the values of the cubic and its derivative are negative at zero. For the lower bound, taking the inner product of v with the second block row of (4.7) and rearranging, we have vTJu uTJT v = ( + )kvk2; which we substitute into (4.8) to give uT (H + I)u+ ( + )kvk2 + uT ( I +X) 1Zu = kuk2: Using (3.2), we have ( n + )kuk 2 + ( + )kvk2 + uT ( I +X) 1Zu kuk2: Bounding the last term on the left with a minimum, this becomes ( n + )kuk 2 + ( + )kvk2 + min i zi + xi kuk2 kuk2: (4.11) This minimum will occur for some index j, and we then multiply by + xj and rearrange into ( 2 + (xj n ) (xj( n + ) + zj))kuk 2 ( + )kvk2 0: Since again u 6= 0, we then bound by the positive root of the quadratic. Taking the minimum over all j gives the bound. Now, a uniform lower bound as given in the corollary can be found by taking zero as a lower bound for the mini zi +xi term in (4.11), giving ( n + )kuk 2 + ( + )kvk2 kuk2; which after rearranging is ( n )kuk 2 ( + )kvk2 0: Thus an alternative lower bound is n + . We begin our investigation of negative eigenvalues with an upper bound, which turns out to depend on the scaling of the problem. 574.3. Unreduced 3-by-3 form Lemma 4.21. Let 0 and > 0. Suppose that xi > for i = 1; : : : ; n. Then the negative eigenvalues of K^3;reg are bounded above by , and = is an eigenvalue if and only if J is rank de cient. Proof. First we will show the bound. Assume by contradiction that there is a negative eigenvalue that satis es > . Upon extracting v = 1 + Ju from the second block row and using the identity wTZ 1 2u = wT ( I +X)w from the third block row, taking the inner product of the rst block row with u gives uT (H + I)u+ 1 + kJuk 2 + wT ( I +X)w = kuk2: Since + > 0 by assumption and all xi > > , the left-hand side of the last identity is positive. If u = 0, then both v and w are also zero, giving a trivial eigenvector and therefore a contradiction. If u 6= 0, must be positive, which contradicts our initial assumption on the sign of . Thus the negative eigenvalues are bounded above by . Suppose J is rank de cient. Then u = 0, v 2 Null(JT ), v 6= 0 and w = 0 satis es (4.7) with = . Suppose now that J has full rank and show that 6= . By contradic- tion, assume that = . From the third block row and the assumption that all xi > , we have wTZ 1 2u = wT ( I X)w 0: Taking the inner product of the rst block row of (4.7) with u and using the above inequality and Ju = 0 from the second block row, we obtain kuk2 = uT (H + I)u uTZ 1 2w 0: Since > 0, this must mean that u = 0. The third block row then gives w = 0 and we are left with JT v = 0 in the rst block row. Since J has full row rank, we conclude that v = 0 and that = cannot be an eigenvalue. Interestingly, a similar result holds in the limit. Note however that there seems to be a transition zone between the moment where some components of x drop below and the limit when strict complementarity applies. This \grey zone" is necessarily attained if A 6= ;, and it is more di cult to characterize the relationship between and in that zone. 584.3. Unreduced 3-by-3 form Lemma 4.22. Let > 0 and > 0. Suppose that in the limit of the interior-point iterations, (x; y; z) is strictly complementary, xi > for all i 2 I, and maxi p zi is su ciently small. Then the negative eigenvalues of K^3;reg are bounded above by , and = is an eigenvalue if and only if J is rank de cient. Proof. We will rst show the bound. Assume by contradiction that there exists a negative eigenvalue that satis es > . Since K^3;reg is nonsingu- lar, there must exist > 0 such that for all negative eigenvalues. If , and the eigenvalues are bounded above by , a contradic- tion. Thus we can consider only the case where < . Suppose then that maxi p zi < . In the limit, we have xA = 0, xI > 0 and zI = 0. Because strict complementarity is satis ed, we must also have zA > 0. Partitioning the third block row of K^3;reg in (4.7) into active and inactive components gives Z 1 2 AAuA = wA; (4.12) XIIwI = wI : (4.13) We may rewrite (4.13) as (XII + I)wI = 0, which implies wI = 0 because xi + > xi > 0 for all i 2 I by assumption. Taking the inner product of both sides of (4.12) with wA gives wTAZ 1 2 AAuA = kwAk 2: (4.14) Taking now the inner product of the rst block row of (4.7) with u, the inner product of the second block row with v and combining, we may write kuIk 2 = uT (H + I)u+ ( + )kvk2 + (kwAk 2 kuAk 2); (4.15) where we used the decomposition kuk2 = kuAk2 + kuIk2 and (4.14). Note that from (4.12), uA = 0 if and only if wA = 0. If both vanish, necessarily uI 6= 0, and then the right hand side of (4.15) is strictly positive. This implies that > 0, a contradiction. Suppose now that uA 6= 0 and wA 6= 0. Rearranging (4.12) we nd that wA = 1 Z 1 2 AAuA, and using the upper bounds maxi p zi < and we have kwAk2 2 2 kuAk 2 kuAk2. Substituting into (4.15) gives kuIk 2 uT (H + I)u+ ( + )kvk2: Then the right hand side is strictly positive, and if uI 6= 0, > 0, a con- tradiction. If uI = 0, then uA = 0, again a contradiction. Therefore, we cannot have > , and we conclude that . 594.3. Unreduced 3-by-3 form Suppose J is rank de cient. As in Lemma 4.21, (0; v; 0) with v 2 Null(JT ), v 6= 0 is an eigenvector for = . Now assume that J has full rank, and assume by contradiction that = . Partitioning as above gives Z 1 2 AAuA = wA (4.16) XIIwI = wI : (4.17) Rearranging (4.17), the matrix (XII I) has full rank by assumption, so we have wI = 0. If u = 0 we also have wA = 0 and there only remains JT v = 0 in the rst block row of (4.7), giving the trivial eigenvector and a contradiction. Thus u 6= 0. Taking the inner product of (4.16) with wA reveals that wTZ 1 2u = wTAZ 1 2 AAuA = kwAk 2: (4.18) Using the Cauchy{Schwarz inequality and the bound on maxi p zi, we have wTAZ 1 2 AAuA kwAkkuAk; and using (4.18) and rearranging gives kwAk kuAk kuAk kuk (4.19) since < . Taking the inner product of the rst block row with u, using (4.18) and Ju = 0 from the second block row, and rearranging, we have uT (H + I)u = (kwAk 2 kuk2); which reduces to uT (H + I)u 0 by (4.19). Therefore u = 0, a contradic- tion, and 6= when J is full rank. Remark. By scaling the optimization problem prior to solving, it is always possible to arrange so that the assumption on scaling of xi and zi are always satis ed. These results hold trivially when = 0. Next, we derive a lower bound on the negative eigenvalues. Theorem 4.23. Assume I + X is nonsingular for all throughout the computation. Then the negative eigenvalues of the matrix K^3;reg satisfy ; 604.3. Unreduced 3-by-3 form where := min 1 2 n + q ( n + + )2 + 4 21 ; min jj +xj<0 j ; and j is the smallest negative root of the cubic equation 3 + (xj + n ) 2 + xj ( n + ) xj( n + ) 2 1 zj xj( n + ) + 2 1xj + zj = 0: Proof. We assume that + < 0. The case where poses no di culty here since it can be easily veri ed that . We start from (4.9) with the bounds in (3.2) and (3.3) to get ( n + )kuk 2 + 1 + 21kuk 2 + uT ( I +X) 1Zu kuk2: Bounding the remaining term of the previous inequality by the minimum, we obtain n + + 1 + 21 + mini zi + xi kuk2 kuk2: (4.20) We consider two cases. In case one, + xi > 0 for all indices i, and in this case we can bound the minimum term from below by zero. In case two, some + xi < 0, and there exists an index j such that mini zi +xi = zj +xj . In case one, we bound the minimum term in (4.20) below by zero, giving n + + 1 + 21 kuk2 kuk2; (4.21) which we can multiply by ( + ) and rearrange to give 2 + ( n ) ( ( n + ) + 2 1) kuk2 0: We must have u 6= 0, since if u = 0 the second line of (3.12) implies ( + )v = 0, implying that v = 0. The rst line yields Z 1 2w = 0 and thus w = 0, a contradiction. Thus we can divide by kuk2 and bound by the negative root of the quadratic, giving 1 2 n + q ( n + + )2 + 4( 21) : 614.3. Unreduced 3-by-3 form In case 2, we use the index j for the minimum in (4.20) to get n + + 1 + 21 + zj + xj kuk2 kuk2: (4.22) Multiplying by ( + )( + xj) and rearranging, we get 3 + (xj + n ) 2 + ( xj ( n + ) xj( n + ) 2 1 zj) ( xj( n + ) + 2 1xj + zj ) kuk2 0: We note that again u 6= 0, so we can divide by kuk2 and de ne j to be the smallest root of the cubic 3 + (xj + n ) 2 + xj ( n + ) xj( n + ) 2 1 zj xj( n + ) + 2 1xj + zj = 0: Then the bound is given by the minimum of these possible bounds from cases one and two. We can compare the bounds of the regularized and unregularized 3-by- 3 systems using asymptotic techniques. For the upper positive bound, we begin with (4.10) and set = = to get 1 + + 1 (1 + ) 21 + zmax kuk2 kuk2: Using 11+ 1 for small and dividing by kuk 2 gives 1 + + 1 21(1 ) + zmax ; which after multiplying by and rearranging becomes 2 ( 1 + ) ( 2 1(1 ) + zmax) 0: This yields the bound 1 2 1 + + q ( 1 + )2 + 4( 21(1 ) + zmax) : (4.23) By comparing with the unregularized bound given in Theorem 3.19, we can see that the regularization with is approximately a shift of this bound 624.3. Unreduced 3-by-3 form away from zero by , and that the regularization with gives a small con- traction of the bound. Comparing the lower positive bounds, we see that the regularization with results in a shift away from zero by approximately . In total, the positive spectral interval is additively shifted away from zero by approximately , and contracted by a factor related to . For the upper negative bound, we note that given the scaling assumptions in Lemmas 4.21 and 4.22, these show that the e ect of regularization is to bu er the eigenvalues away from zero by . For the lower negative bound, we consider each of the two cases. For case one we begin from (4.21), divide by kuk2 and use the same asymptotic argument as above to get n + + 1 21(1 ) : Multiplying by and rearranging gives 2 ( n + ) 2 1(1 ) 0; leading to the bound 1 2 n + q ( n + )2 + 4 21(1 ) : Comparing this to the bound given in Theorem 3.21, we see that has no signi cant e ect, while the e ect of is to bring the bound closer to zero. For case two, we begin from (4.22), which we divide by kuk2 and use the same asymptotic argument as above to get n + + 2 1(1 ) + zj + xj : Multiplying by + xj , which is negative, and collecting we get 3 + (xj n ) 2 ( 21(1 ) + zj + xj( n + )) xj 2 1(1 ) 0: We now separate the original equation giving the bound in Theorem 3.21 from the and terms. The terms are 2 xj = ( + xj) and the terms are 21 + 2 1xj = 2 1( +xj). Since +xj < 0, these are both negative, so the cumulative e ect is shift the cubic polynomial down. This will shift the lower negative bound closer to zero. In total, there is no signi cant e ect from , while both shifts the interval away from zero and contracts the interval. 634.3. Unreduced 3-by-3 form 4.3.4 Condition numbers Unlike the situation for the unregularized 3-by-3 system, for the regularized system we can nd an asymptotic bound on the condition number. Theorem 4.24. The condition number of K3;reg is bounded asymptotically by (K3;reg) . ( 1 ; if n > 0 1 min( ; ) ; if n = 0: Proof. We can obtain the following asymptotic estimates for the eigenvalues: n + . 1 for the positive eigenvalues, and 1 . for the negative eigenvalues. Using these asymptotic bounds on the eigen- values gives an asymptotic bound of 1min( + n; ) , which further simpli es to the given bounds. This validates the claim that the unreduced system sees the best con- ditioning. Under the usual choices of and and unless the conditioning of the problem is such that the constants hidden in the asymptotic bound are very large, this condition number will remain within computing limits through convergence of the iteration. Our numerical experiments, covered in the next chapter, will verify that the unreduced matrices remain numerically nonsingular throughout. 64Chapter 5 Numerical experiments We o er examples to validate the analysis of previous sections. The compu- tations are done in Matlab. The code is a straightforward implementation of Algorithm 2.1, with starting point computed via Algorithm 2.2. Matlab’s built-in functions are used to compute singular values, eigenvalues, and con- dition numbers, and we use the backslash operator to solve linear systems, which implements an LU factorization and solves the system using it. The code terminates when the di erence between the primal and dual objective functions, the duality gap, is below a tolerance of 10 8. The comparisons between di erent formulations are both illustrating the di erences in the matrices themselves and the performance of the solver when using that for- mulation. Plots of eigenvalues and condition numbers are versus iteration number, and should be read left to right as progression of the interior point solver. We use solid lines for eigenvalue bounds and dots for the actual eigenvalues. Details of the problems we solve are given in Table 5.1. The problems are small so that the exact eigenvalues can be calculated, but they are representative. Table 5.1: Properties of quadratic programs solved. Problem Source n m LICQ? 1 TOMLAB 6 5 Yes 2 TOMLAB 5 3 Yes 3 TOMLAB 10 6 Yes 6 TOMLAB 10 7 Yes 8 TOMLAB 6 4 Yes 13 TOMLAB 7 5 Yes 17 TOMLAB 293 286 Yes 21 TOMLAB 51 27 No 22 TOMLAB 295 173 No We rst illustrate the eigenvalue bounds of Chapter 3. The problem 65Chapter 5. Numerical experiments we examine is a small quadratic program from the TOMLAB optimization package [1], problem number 6. At the limit of the iterations, this problem satis es the LICQ, strict complementarity, and the non-intersection of the nullspaces, and thus by Theorem 3.15 the 3-by-3 matrices will remain non- singular in the limit. These favourable properties cause the inner eigenvalue bounds to be pessimistic, since our bounds do not distinguish whether the LICQ and non-intersection of the nullspaces hold. First we consider the bounds on K1. Figure 5.1 shows the eigenvalues of K1 in log scale. The eigenvalues remain within the bounds, and the largest follow the upper bound, while the lower bound is pessimistic. The upper bound is exponentially increasing as 1= , and the lower bound is decaying like , which were the asymptotic bounds given in Chapter 3. Figure 5.1: TOMLAB problem 6: eigenvalues of K1 in log scale. 0 5 10 15 20 10−20 10−15 10−10 10−5 100 105 1010 1015 1020 iteration eigenvalue s upper positive lower positive Now for K2, Figure 5.2 shows the positive and negative eigenvalues sep- arately in log scale. These show similar trends as for K1: the upper positive and lower negative bounds appear fairly tight, while the lower positive and upper negative bounds are pessimistic. The upper positive bound increases as 1= , while the lower positive and upper negative bounds decay like , again following the asymptotic bounds given in Chapter 3. For both K1 and K2, the exponential increase of the upper positive bound, with the eigenval- ues achieving this bound, is the cause of the ill-conditioning of these matri- ces. While the inner bounds decay towards zero, the eigenvalues themselves remain well away from zero and do not contribute to the ill-conditioning. 66Chapter 5. Numerical experiments Figure 5.2: TOMLAB problem 6: eigenvalues of K2 in log scale. 0 5 10 15 20 25 10−15 10−10 10−5 100 105 1010 1015 iteration eigenvalue s upper positive lower positive (a) Positive 0 5 10 15 20 25 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10−1 100 101 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.3 shows the eigenvalues of K3 in log scale. The upper positive and lower negative bounds remain xed with respect to , and these bounds are relatively tight. While the upper negative bound is zero and the lower positive bound decays like , these bounds do not re ect the eigenvalues which are bounded away from zero on both the positive and negative side. The conditioning of this matrix is thus better than that of K1 or K2: it is xed with respect to both the iteration number and . Figure 5.4 shows the same thing for the symmetric K^3, which has near identical behaviour to that of its unsymmetric counterpart. Figure 5.5 shows the condition numbers of the di erent formulations. The 3-by-3 formulations are indeed well-conditioned throughout, while the reduced forms have exponentially increasing condition numbers. Table 5.2 gives the duality gaps during the iterations, showing the progression of the iterations. Note that the 3-by-3 formulations take fewer iterations, possibly due to their better conditioning, though this needs further investigation and remains an item for future work. Next we illustrate the eigenvalue bounds of Chapter 4. We use the same problem and regularize with = = 1e 4. Figure 5.6 shows the eigenvalues of K1;reg in log scale. We see that the upper and lower bounds are now xed with respect to after the initial iterations. 67Chapter 5. Numerical experiments Figure 5.3: TOMLAB problem 6: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 0 2 4 6 8 10 12 10−14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.4: TOMLAB problem 6: eigenvalues of K^3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 0 2 4 6 8 10 12 10−14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative 68Chapter 5. Numerical experiments Figure 5.5: TOMLAB problem 6: comparison of the condition numbers. 0 5 10 15 20 25 100 102 104 106 108 1010 1012 1014 1016 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. Figure 5.6: TOMLAB problem 6: eigenvalues of K1;reg in log scale with = = 1e 4. 0 5 10 15 20 10−4 10−3 10−2 10−1 100 101 102 103 104 105 iteration eigenvalue s upper positive lower positive 69Chapter 5. Numerical experiments Table 5.2: TOMLAB problem 6: progression of the interior-point solver. k Duality gap K3 solver K^3 solver K2 solver K1 solver 0 5.031373e+04 5.031373e+04 5.031373e+04 5.031373e+04 1 1.490725e+04 1.490725e+04 1.490725e+04 2.671911e+04 2 5.199114e+04 5.199114e+04 2.783944e+04 1.239206e+04 3 2.079359e+03 2.079359e+03 3.336749e+04 6.535199e+03 4 1.300615e+02 1.300615e+02 7.693309e+03 3.759075e+02 5 6.930406e+00 6.930406e+00 1.623930e+02 3.068804e+01 6 3.483175e 01 3.483175e 01 1.709920e+02 7.616401e 01 7 1.741601e 02 1.741601e 02 4.382071e+00 1.220000e 01 8 8.708002e 04 8.708002e 04 1.536128e+00 1.252850e 02 9 4.354001e 05 4.354001e 05 6.165176e 01 1.859958e 03 10 2.177003e 06 2.177003e 06 1.815162e 01 6.841660e 04 11 1.088520e 07 1.088520e 07 3.393544e 02 1.158979e 04 12 5.442416e 09 5.442416e 09 6.215981e 03 2.299556e 05 13 1.618046e 03 3.688372e 06 14 2.389091e 04 5.986658e 07 15 9.084266e 05 8.586721e 08 16 2.157252e 05 1.209264e 08 17 6.648293e 06 1.524313e 09 18 1.296226e 06 19 2.493653e 07 20 4.124377e 08 21 5.205948e 09 70Chapter 5. Numerical experiments Figure 5.7 shows the eigenvalues of K2;reg in log scale. While the inner bounds are now well away from zero, the upper positive bound is still in- creasing exponentially with 1= , and the eigenvalues have not substantially changed. Figure 5.7: TOMLAB problem 6: eigenvalues of K2;reg in log scale with = = 1e 4. 0 5 10 15 20 25 10−5 10−4 10−3 10−2 10−1 100 101 iteration eigenvalue s upper negative lower negative Figure 5.8 shows the eigenvalues of K3;reg in log scale, and Figure 5.9 shows the same thing for K^3;reg. The lower positive bound is now well away from zero, as given in Theorem 4.19, and there is an upper negative bound at , as given in Lemma 4.21. While these bounds are helpful for analytical purposes, the eigenvalues themselves have not substantially changed from the unregularized case due to the favourable properties of this problem. Table 5.3 gives the duality gaps during the iterations, showing that the 3-by-3 forms take fewer iterations but with little di erence to the unregularized case. Figure 5.10 shows the e ect on the condition numbers of varying the regularization parameters, again using the same problem. The 3-by-3 for- mulations are unchanged over di erent choices of regularization parameters, and referring back to Figure 5.5, the condition numbers are still unchanged. The properties of this problem do not require regularization for the 3-by-3 formulations and accordingly, regularization has little e ect on the matrix. Referring back to Theorem 4.24, the upper bound on the condition num- bers is O(1= min( ; )) when n = 0, as it is here, but in this case this 71Chapter 5. Numerical experiments Figure 5.8: TOMLAB problem 6: eigenvalues of K3;reg in log scale with = = 1e 4. 0 2 4 6 8 10 12 10−4 10−3 10−2 10−1 100 101 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 10−4 10−3 10−2 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.9: TOMLAB problem 6: eigenvalues of K^3;reg in log scale with = = 1e 4. 0 2 4 6 8 10 12 10−4 10−3 10−2 10−1 100 101 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 10−4 10−3 10−2 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative 72Chapter 5. Numerical experiments Table 5.3: TOMLAB problem 6: progression of the interior-point solver regularized with = = 1e 4. k Duality gap K3 solver K^3 solver K2 solver K1 solver 0 5.031373e+04 5.031373e+04 5.031373e+04 5.031373e+04 1 1.492640e+04 1.492640e+04 1.492640e+04 2.673851e+04 2 5.159215e+04 5.159215e+04 2.746570e+04 1.239648e+04 3 2.051814e+03 2.051814e+03 3.295581e+04 6.519909e+03 4 1.306938e+02 1.306938e+02 7.512121e+03 3.701845e+02 5 6.966969e+00 6.966969e+00 1.848690e+02 2.742238e+01 6 3.505399e 01 3.505399e 01 1.716993e+02 5.221952e 01 7 1.755095e 02 1.755095e 02 4.548437e+00 2.666012e 02 8 8.787479e 04 8.787479e 04 1.526718e+00 1.887891e 02 9 4.399788e 05 4.399788e 05 5.909084e 01 3.894197e 03 10 2.202938e 06 2.202938e 06 1.736246e 01 8.123547e 04 11 1.102999e 07 1.102999e 07 3.174431e 02 1.487522e 04 12 5.522452e 09 5.522452e 09 5.825992e 03 2.498628e 05 13 1.521274e 03 4.084137e 06 14 2.248358e 04 6.214614e 07 15 8.643147e 05 8.887218e 08 16 2.060630e 05 1.224180e 08 17 6.363942e 06 1.531589e 09 18 1.241999e 06 19 2.390589e 07 20 3.948298e 08 21 4.994945e 09 73Chapter 5. Numerical experiments bound is an overestimate since the condition numbers are xed with respect to the regularization parameters. The 2-by-2 saddle-point problem is also unchanged over the di erent regularization parameters, but for a di erent reason: the regularization cannot x the conditioning of this formulation. Looking at Theorems 3.13 and 4.8, we see that the condition number for the unregularized matrix is O( 2), while for the regularized matrix it is O(1=( min( ; )). However, the regularized case is as poor as the unregu- larized, suggesting that at least the unregularized bound is an overestimate. The 1-by-1 is interesting; the condition number increases to a certain point, then levels o . The condition number is O(1=( )), given in (4.4), so us- ing xed regularization parameters has this e ect. The approximate point of levelling o appears to be closer to O(1=min( ; )), suggesting again an overestimate of the asymptotic bound. Common practice is to decrease the regularization parameters at each iteration, which would lead to the condi- tion number of the 1-by-1 form continuing to increase instead of levelling o . Overall, the condition numbers of the 3-by-3 formulations are small and xed both as the iteration progresses and for di erent choices of the parame- ters, while the condition numbers of the 2-by-2 and 1-by-1 are exponentially increasing and sensitive to choice of parameters respectively. We now show an example which does not satisfy the LICQ, but which still does have strict complementarity. Thus the requirements of Theorem 3.15 fail, but the requirements of Lemma 4.12 are satis ed and the 3-by-3 ma- trices will be nonsingular when regularized. We use problem 21 from the TOMLAB package, another small quadratic program. The inner eigenvalue bounds are more representative of the actual eigenvalues for this problem because the LICQ is not satis ed. Figure 5.11 shows the eigenvalues of K1 in log scale. The eigenvalues follow the trends of both bounds, the upper bound increasing as 1= and the lower bound decaying as , though neither bound is quite tight. Contrast this to the case for problem 6, where the eigenvalues only followed the upper bound. Figure 5.12 shows the eigenvalues of K2 in log scale. The upper positive bound increases as 1= and is tight, and the lower positive and upper nega- tive bounds decay as , with the eigenvalues following these trends. This is in contrast to the case for problem 6, where the eigenvalues were well away from zero on both the positive and negative sides. Figure 5.13 shows the eigenvalues of K3 in log scale. The upper positive and lower negative bounds are xed and tight, while the lower positive bound decays as and the upper negative bound is zero. We see for this problem that the eigenvalues are also tending towards zero, in contrast to problem 74Chapter 5. Numerical experiments Figure 5.10: TOMLAB problem 6: comparing the condition numbers over di erent regularization parameters. 0 5 10 15 20 25 100 102 104 106 108 1010 1012 1014 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (a) = = 1e 2 0 5 10 15 20 25 100 102 104 106 108 1010 1012 1014 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (b) = = 1e 4 0 5 10 15 20 25 100 102 104 106 108 1010 1012 1014 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (c) = = 1e 6 0 5 10 15 20 25 100 102 104 106 108 1010 1012 1014 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (d) = = 1e 8 75Chapter 5. Numerical experiments Figure 5.11: TOMLAB problem 21: eigenvalues of K1 in log scale. 0 5 10 15 20 10−15 10−10 10−5 100 105 1010 1015 1020 iteration eigenvalue s upper positive lower positive Figure 5.12: TOMLAB problem 21: eigenvalues of K2 in log scale. 0 5 10 15 10−20 10−15 10−10 10−5 100 105 1010 1015 iteration eigenvalue s upper positive lower positive (a) Positive 0 5 10 15 10−12 10−10 10−8 10−6 10−4 10−2 100 102 iteration eigenvalue s upper negative lower negative (b) Negative 76Chapter 5. Numerical experiments 6. The cumulative e ect is that K3 has better conditioning than K1 or K2 which have poor conditioning due to both the large and small eigenvalues, though all matrices are numerically singular. Figure 5.14 shows the same thing for K^3. Figure 5.13: TOMLAB problem 21: eigenvalues of K3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 0 2 4 6 8 10 12 14 10−16 10−14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 104 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.15 shows the condition numbers of the di erent formulations, illustrating that all formulations are singular in the limit, but that the 3-by-3 formulations are better conditioned. Note that although the condition num- bers of the 1-by-1 system and both 3-by-3 systems are very similar through the rst 15 iterations, the method with the 1-by-1 has not yet converged and requires several more iterations, at which point the condition number is orders of magnitude higher. The small uctuation in the condition numbers for the 1-by-1 system in the nal iterations is likely a numerical artefact of the condition number computation for very ill-conditioned matrices. Ta- ble 5.4 gives the duality gaps during the iterations. Note again that the 3-by-3 formulations take fewer iterations. We now regularize the same problem with = = 1e 4. Figure 5.16 shows the eigenvalues of K1;reg in log scale. Both bounds are now xed with respect to after the initial iterations, similarly to the case for problem 6. Figure 5.17 shows the eigenvalues of K2;reg in log scale. While the in- 77Chapter 5. Numerical experiments Figure 5.14: TOMLAB problem 21: eigenvalues of K^3 in log scale. Note that the upper negative bound is zero and is not visible in this scale. 0 2 4 6 8 10 12 14 10−16 10−14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 14 10−12 10−10 10−8 10−6 10−4 10−2 100 102 104 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.15: TOMLAB problem 21: comparing the condition numbers. 0 5 10 15 20 100 105 1010 1015 1020 1025 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. 78Chapter 5. Numerical experiments Table 5.4: TOMLAB problem 21: progression of the interior-point solver. k Duality gap K3 solver K^3 solver K2 solver K1 solver 0 4.503941e+06 4.503941e+06 4.503941e+06 4.503941e+06 1 1.777060e+05 1.777060e+05 1.777060e+05 4.727740e+05 2 6.685252e+04 6.685252e+04 6.430173e+04 8.839079e+04 3 2.302182e+03 2.302182e+03 2.664655e+03 4.533996e+04 4 1.884457e+02 1.884457e+02 1.413989e+02 4.737956e+03 5 2.327962e+01 2.327962e+01 1.488893e+01 6.459162e+02 6 3.770574e+00 3.770574e+00 1.531868e+00 5.649864e+01 7 7.890978e 01 7.890978e 01 4.724062e 01 1.068655e+01 8 4.111731e 02 4.111731e 02 3.333264e 02 1.561047e+00 9 2.053638e 03 2.053638e 03 3.364520e 03 4.095004e 01 10 1.026760e 04 1.026760e 04 3.064170e 04 7.294324e 02 11 5.133785e 06 5.133785e 06 3.251172e 05 1.476766e 02 12 2.566892e 07 2.566892e 08 2.923329e 06 2.762271e 03 13 1.283446e 08 1.283446e 08 2.661271e 07 4.647201e 04 14 6.417226e 10 6.417226e 10 2.343025e 08 7.562990e 05 15 1.825309e 09 1.052725e 05 16 1.495218e 06 17 1.736741e 07 18 2.168710e 08 19 2.136175e 09 79Chapter 5. Numerical experiments Figure 5.16: TOMLAB problem 21: eigenvalues of K1;reg in log scale with = = 1e 4. 0 5 10 15 20 10−4 10−2 100 102 104 106 iteration eigenvalue s upper positive lower positive ner bounds are now well away from zero, the upper positive bound is still increasing exponentially, and all bounds are tight. Figure 5.18 shows the eigenvalues of K3;reg in log scale, and Figure 5.19 shows the same thing for K^3;reg. The lower positive bound is now well away from zero, and there is an upper negative bound at . Note that at iteration 9, the magnitude of the smallest negative eigenvalues drops below , and this is exactly the point when the smallest xi drops below . This illustrates the \grey zone" when the conditions of Lemma 4.21 no longer hold, but Lemma 4.22 does not yet apply. Note that despite these eigenvalues dropping below the bound, they are still well away from zero. Table 5.5 gives the duality gaps during the iterations. Figure 5.20 shows the e ect on the condition numbers of varying the reg- ularization parameters, again using the same problem. The conditioning of the 2-by-2 system is still poor, with only slight variations depending on the parameters. The 1-by-1 and 3-by-3 systems have good conditioning, with the condition number increasing with smaller regularization parameters. As in the unregularized case, the 1-by-1 system and both 3-by-3 systems have similar condition numbers for several iterations, but again, the 1-by-1 re- quires more iterations and the nal condition numbers are larger. Note that the iteration counts for the 3-by-3 systems are smallest for the intermediate values of the regularization parameters, showing the tradeo 80Chapter 5. Numerical experiments Figure 5.17: TOMLAB problem 21: eigenvalues of K2;reg in log scale with = = 1e 4. 0 5 10 15 10−4 10−2 100 102 104 106 108 1010 1012 1014 iteration eigenvalue s upper positive lower positive (a) Positive 0 5 10 15 10−4 10−3 10−2 10−1 100 101 iteration eigenvalue s upper negative lower negative (b) Negative Figure 5.18: TOMLAB problem 21: eigenvalues of K3;reg in log scale with = = 1e 4. 0 2 4 6 8 10 12 14 10−4 10−3 10−2 10−1 100 101 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 14 10−6 10−5 10−4 10−3 10−2 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative 81Chapter 5. Numerical experiments Table 5.5: TOMLAB problem 21: progression of the interior-point solver regularized with = =1e 4. k Duality gap K3 solver K^3 solver K2 solver K1 solver 0 4.503941e+06 4.503941e+06 4.503941e+06 4.503941e+06 1 1.774202e+05 1.774202e+05 1.774202e+05 4.763510e+05 2 6.696263e+04 6.696263e+04 6.445851e+04 8.848551e+04 3 2.287257e+03 2.287257e+03 2.643388e+03 4.634337e+04 4 1.856561e+02 1.856561e+02 1.401005e+02 4.814498e+03 5 2.244920e+01 2.244920e+01 1.433592e+01 6.538652e+02 6 3.377959e+00 3.377959e+00 1.350548e+00 5.169495e+01 7 7.869090e 01 7.869090e 01 6.385468e 01 9.709657e+00 8 3.175784e 02 3.175784e 02 3.318791e 02 1.234782e+00 9 1.187212e 03 1.187212e 03 4.503474e 03 3.603788e 01 10 3.992364e 05 3.992364e 05 5.549683e 04 6.298637e 02 11 1.107967e 06 1.107967e 06 6.684694e 05 1.371323e 02 12 1.547644e 08 1.547644e 08 7.343561e 06 2.562025e 03 13 9.907045e 10 9.907045e 10 7.320559e 07 4.360643e 04 14 7.355794e 08 7.058248e 05 15 6.148318e 09 1.005925e 05 16 1.422312e 06 17 1.726436e 07 18 2.153875e 08 19 2.267023e 09 82Chapter 5. Numerical experiments Figure 5.19: TOMLAB problem 21: eigenvalues of K^3;reg in log scale with = = 1e 4. 0 2 4 6 8 10 12 14 10−4 10−3 10−2 10−1 100 101 102 iteration eigenvalue s upper positive lower positive (a) Positive 0 2 4 6 8 10 12 14 10−6 10−5 10−4 10−3 10−2 10−1 100 101 102 103 iteration eigenvalue s upper negative lower negative (b) Negative in regularization: larger parameters give more stabilization, but at the cost of changing the problem signi cantly. Finally, we consider the iteration counts for di erent problems with and without regularization. Tables 5.6 and 5.7 show the iteration counts for all the problems listed in Table 5.1 with 5 di erent choices of regularization parameters. The notation \|" indicates a problem which did not converge in the 300 maximum iterations, and the notation \*" indicates a problem which blew up to in nity or NaN. We observe that the iteration counts for the 3-by-3 formulations are either similar to or substantially better than the iteration counts for the reduced formulations. Note the restrictions on regularization parameters in, for example, problem 17: for = = 1e 2 the method blows up, for = = 1e 4 the iteration counts are large, and for = = 1e 6 and = = 1e 8 the iteration counts are the same as without regularization. 83Chapter 5. Numerical experiments Figure 5.20: TOMLAB problem 21: comparing the condition numbers over di erent regularization parameters. 0 5 10 15 20 25 100 105 1010 1015 1020 1025 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (a) = = 1e 2 0 5 10 15 20 102 104 106 108 1010 1012 1014 1016 1018 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (b) = = 1e 4 0 5 10 15 20 102 104 106 108 1010 1012 1014 1016 1018 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (c) = = 1e 6 0 5 10 15 20 100 105 1010 1015 1020 1025 iteration 3x3 unsym. cond. 3x3 sym. cond. 2x2 cond. 1x1 cond. (d) = = 1e 8 84Chapter 5. Numerical experiments Table 5.6: Iteration counts for di erent problems, part 1. A problem that did not converge is noted by \|", and a problem that blew up is noted by \*". Problem Regularization Matrix formulation in solver parameters K3 K^3 K2 K1 1 = = 0 9 9 9 9 = = 1e 2 8 8 8 8 = = 1e 4 9 9 9 9 = = 1e 6 9 9 9 9 = = 1e 8 9 9 9 9 2 = = 0 10 10 9 9 = = 1e 2 11 11 11 12 = = 1e 4 10 10 10 10 = = 1e 6 10 10 9 9 = = 1e 8 10 10 9 9 3 = = 0 11 11 26 16 = = 1e 2 28 28 | 26 = = 1e 4 11 11 24 15 = = 1e 6 11 11 26 16 = = 1e 8 11 11 26 16 6 = = 0 12 12 21 17 = = 1e 2 17 17 23 18 = = 1e 4 12 12 21 17 = = 1e 6 12 12 21 17 = = 1e 8 12 12 21 17 8 = = 0 11 11 11 11 = = 1e 2 11 11 11 11 = = 1e 4 11 11 11 11 = = 1e 6 11 11 11 11 = = 1e 8 11 11 11 11 85Chapter 5. Numerical experiments Table 5.7: Iteration counts for di erent problems, part 2. A problem that did not converge is noted by \|", and a problem that blew up is noted by \*". Problem Regularization Matrix formulation in solver parameters K3 K^3 K2 K1 13 = = 0 10 10 10 13 = = 1e 2 11 11 11 13 = = 1e 4 10 10 10 13 = = 1e 6 10 10 10 13 = = 1e 8 10 10 10 13 17 = = 0 16 16 18 25 = = 1e 2 * | * * = = 1e 4 41 43 40 40 = = 1e 6 16 16 18 25 = = 1e 8 16 16 18 25 21 = = 0 14 14 15 19 = = 1e 2 24 24 21 25 = = 1e 4 13 13 15 19 = = 1e 6 14 14 15 19 = = 1e 8 14 14 15 19 22 = = 0 24 24 33 32 = = 1e 2 | | | | = = 1e 4 | | | | = = 1e 6 28 28 31 35 = = 1e 8 24 24 32 32 86Chapter 6 Conclusions We have found analytical results for spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs. We began by considering the original, unregularized problem (2.1). We consid- ered four formulations of the Newton system occurring from the interior- point method for this problem: the original 3-by-3 unsymmetric form, the symmetrized 3-by-3 form, the 2-by-2 saddle-point form, and the 1-by-1 nor- mal equations form. For each of these formulations, we have provided results on the conditions for nonsingularity, inertia, bounds on the eigenvalues, and condition number. The matrices for all formulations are nonsingular dur- ing the iterations if J has full rank, but other properties vary. When we are exactly on the central path and the asymptotic estimates of (3.6) hold, we can nd asymptotic estimates of the condition numbers in terms of , the duality measure. For the normal equations, we saw that the matrix is generally singular in the limit, and that the condition number through the iteration is O( 2). For the saddle-point form, the matrix is again generally singular in the limit, and the condition number is O( 2). The unsym- metric and symmetric 3-by-3 forms have the same spectral properties since the symmetrization is via a similarity transformation, and they are non- singular in the limit assuming strict complementarity, the LICQ, and that Null(H) \ Null(J) \ Null(Z) = f0g. These conditions are much less restric- tive than those for the reduced formulations. We found inertia results on these matrices both during the iterations and in the limit. We provided up- per and lower bounds on the positive eigenvalues, and a lower bound on the negative eigenvalues. For this unregularized case, we were not able to nd a nonzero upper bound on the negative eigenvalues, and thus were not able to nd a bound on the condition number, but the results on nonsingularity indicate that the condition number is uniformly bounded independently of . This alone is stronger than the results on the condition number of the reduced formulations. We next considered the regularized problem (4.1) and considered the same four formulations of the Newton system mentioned above. Again, the matrices for all formulations share the same requirements for nonsin- 87Chapter 6. Conclusions gularity during the iteration; when the regularization parameters and are positive, the matrices are unconditionally nonsingular. For the normal equations, the matrix is generally singular in the limit, and the condition number is O(1=( )). If this bound on the condition number were achieved, the conditioning for the regularized 1-by-1 form would be worse than the unregularized form. For the saddle-point form, we again generally have sin- gularity at the limit, and the condition number is O(1=( min( ; ))). The 3-by-3 forms require only strict complementarity for nonsingularity in the limit, and the use of a partially reduced formulation will allow nonstrict complementarity. We have found bounds on the eigenvalues, including a nonzero upper bound for the negative eigenvalues, in contrast to the unreg- ularized case where the bound was simply zero. We have also shown that the regularization parameters both shift the spectral intervals away from zero and contract the intervals. Finally, we have shown that the condition number is O(1= ) when n > 0 and O(1=min( ; )) when n = 0, which indicates that for common choices of and the regularized matrices will be numerically nonsingular throughout the iterations. We illustrated the analytical results with numerical examples, and while the inner eigenvalue bounds were pessimistic for some problems, the outer bounds were generally tight. The 3-by-3 formulations were superior in con- ditioning for both unregularized and regularized matrices. In preliminary experiments, the performance of the method using the 3-by-3 formulations in terms of iteration counts was equivalent or superior to those using the 2-by-2 or 1-by-1 formulations. Overall, we have presented a thorough analysis of the spectral properties of a range of formulations of the Newton systems. We have shown that the regularized 3-by-3 systems have the most favourable spectral properties and require only strict complementarity to ensure nonsingularity. Based on this analysis, we suggest that the regularized 3-by-3 systems be considered to solve the Newton systems in these methods. This is in contrast to the current state of the eld, where the normal equations or saddle-point forms seem to be used exclusively. The ill-conditioning of these systems is well-known, but it has been argued that it does not impact the performance of the method when using direct solvers; this is often known as \benign" ill- conditioning. However, the increase in availability of computing power has led to larger problems being solved, and these large problems will necessitate the use of iterative solvers for these linear systems, where spectral properties become crucial. The natural extension to this work is thus to consider the performance of interior-point methods with iterative linear solvers. This would include anal- 88Chapter 6. Conclusions ysis of the clustering of eigenvalues for all formulations, with the goal of then nding e ective preconditioners for each formulation. There is much exist- ing work for preconditioning saddle-point systems, both for speci c problems and a few general black-box preconditioners. Development of precondition- ers for the 3-by-3 formulations may prove di cult, particularly in a general case where there is no speci c information about the structure or properties of H and J . However, the favourable conditioning of the regularized 3-by-3 forms may allow direct application of iterative solvers with either no pre- conditioner or a very simple preconditioner. One key point for the use of iterative solvers is that they use only matrix-vector products, and thus the increased size of the 3-by-3 forms is not a di culty compared to the 2-by-2 forms since it is only in diagonal matrices. Another extension to this work would be to consider spectral analysis of the matrices arising from interior-point methods for other classes of op- timization problems. This could include nonconvex quadratic programs, general nonlinear programs, and semide nite programs. While a few of our results apply directly to nonconvex quadratic programs, extension to these classes of problems will require further analysis. Semide nite programming in particular will require careful analysis since the matrices X and Z are no longer diagonal; instead they are restricted to be positive semide nite, but are allowed to be dense. 89Bibliography [1] The TOMLAB optimization environment. http://tomopt.com/ tomlab/. Accessed July 16, 2012. [2] A. Altman and J. Gondzio. An e cient implementation of a higher or- der primal-dual interior point method for large sparse linear programs. Arch. Control Sci., 2(1-2):23{40 (1994), 1993. ISSN 0004-072X. [3] P. Armand and J. Benoist. Uniform boundedness of the inverse of a jacobian matrix arising in regularized interior-point methods. Math. Programming, pages 1{6, 2011. [4] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of sad- dle point problems. Acta Numerica, 14:1{137, 2005. doi: 10.1017/ S0962492904000212. [5] M. Benzi, E. Haber, and L. Taralli. Multilevel algorithms for large-scale interior point methods. SIAM Journal on Scienti c Computing, 31(6): 4152{4175, 2009. doi: 10.1137/060650799. [6] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge Uni- versity Press, Cambridge, 2004. ISBN 0-521-83378-7. [7] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point method for large scale nonlinear programming. SIAM Journal on Optimization, 9:877{900, 1999. [8] R. H. Byrd, J.-Ch. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathe- matical Programming, Series A, 89(1):149{185, 2000. [9] Z. Coulibaly, N. I. M. Gould, and D. Orban. Fast local convergence of interior-point methods in the absence of strict complementarity. Cahier du GERAD G-2012-xx, 2012. In preparation. 90Bibliography [10] R. Courant. Variational methods for the solution of problems of equi- librium and vibrations. Bull. Amer. Math. Soc., 49:1{23, 1943. ISSN 0002-9904. [11] J. Czyzyk, S. Mehrotra, M. Wagner, and S. J. Wright. PCx: an interior- point code for linear programming. Optim. Methods Softw., 11/12(1- 4):397{430, 1999. ISSN 1055-6788. doi: 10.1080/10556789908805757. Interior point methods. [12] G. B. Dantzig. Maximization of a linear function of variables subject to linear inequalities. In Activity Analysis of Production and Allocation, Cowles Commission Monograph No. 13, pages 339{347. John Wiley & Sons Inc., New York, N. Y., 1951. [13] A. V. Fiacco and G. P. McCormick. Extensions of SUMT for nonlinear programming: Equality constraints and extrapolation. Management Sci., 12:816{828, 1966. ISSN 0025-1909. [14] A. V. Fiacco and G. P. McCormick. The slacked unconstrained mini- mization technique for convex programming. SIAM J. Appl. Math., 15: 505{515, 1967. ISSN 0036-1399. [15] A. V. Fiacco and G. P. McCormick. Nonlinear programming, volume 4 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 1990. ISBN 0- 89871-254-8. doi: 10.1137/1.9781611971316. Sequential unconstrained minimization techniques. [16] A. Forsgren. Inertia-controlling factorizations for optimization algo- rithms. Applied Numerical Mathematics, 43(1-2):91{107, 2002. doi: 10.1016/S0168-9274(02)00119-8. 19th Dundee Biennial Conference on Numerical Analysis (2001). [17] A. Forsgren and P. E. Gill. Primal-dual interior methods for nonconvex nonlinear programming. SIAM Journal on Optimization, 8(4):1132{ 1152, 1998. doi: 10.1137/S1052623496305560. [18] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for non- linear optimization. SIAM Rev., 44(4):525{597 (2003), 2002. ISSN 0036-1445. doi: 10.1137/S0036144502414942. [19] R. Fourer and S. Mehrotra. Solving symmetric inde nite systems in an interior-point method for linear programming. Mathematical Program- ming, 62(1):15{39, 1993. doi: 10.1007/BF01585158. 91Bibliography [20] M. P. Friedlander and D. Orban. A primal-dual regularized interior- point method for convex quadratic programs. Mathematical Program- ming Computation, 4(1):71{107, 2012. doi: s12532-012-0035-2. [21] E. M. Gertz and S. J. Wright. Object-oriented software for quadratic programming. ACM Trans. Math. Software, 29(1):58{81, 2003. ISSN 0098-3500. doi: 10.1145/641876.641880. [22] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright. On projected Newton barrier methods for linear programming and an equivalence to Karmarkar’s projective method. Math. Programming, 36 (2):183{209, 1986. ISSN 0025-5610. doi: 10.1007/BF02592025. [23] P. E. Gill, W. Murray, D. B. Poncele on, and M. A. Saunders. Pre- conditioners for inde nite systems arising in optimization. SIAM Jour- nal on Matrix Analysis and Applications, 13(1):292{311, 1992. doi: 10.1137/0613022. [24] A. J. Goldman and A. W. Tucker. Theory of linear programming. In Linear inequalities and related systems, Annals of Mathematics Studies, no. 38, pages 53{97. Princeton University Press, Princeton, N.J., 1956. [25] C. C. Gonzaga. Path-following methods for linear programming. SIAM Rev., 34(2):167{224, 1992. ISSN 0036-1445. doi: 10.1137/1034048. [26] N. I. M. Gould. On practical conditions for the existence and uniqueness of solutions to the general equality quadratic-programming problem. Mathematical Programming, 32(1):90{99, 1985. [27] N. I. M. Gould. On the accurate determination of search directions for simple di erentiable penalty functions. IMA J. Numer. Anal., 6(3): 357{372, 1986. ISSN 0272-4979. doi: 10.1093/imanum/6.3.357. [28] N. I. M. Gould and V. Simoncini. Spectral analysis of saddle point ma- trices with inde nite leading blocks. SIAM Journal on Matrix Analysis and Applications, 31(3):1152{1171, 2009. doi: 10.1137/080733413. [29] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete mathematics. Addison-Wesley Publishing Company, Reading, MA, second edition, 1994. ISBN 0-201-55802-5. A foundation for computer science. [30] I. Griva, S. G. Nash, and A. Sofer. Linear and nonlinear optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 92Bibliography PA, second edition, 2009. ISBN 978-0-898716-61-0. doi: 10.1137/1. 9780898717730. [31] L. G. Ha cijan. A polynomial algorithm in linear programming. Dokl. Akad. Nauk SSSR, 244(5):1093{1096, 1979. ISSN 0002-3264. English translation: Soviet Math. Dokl., 20(1):191{194, 1979. [32] S. Kapoor and P. M. Vaidya. Fast algorithms for convex quadratic pro- gramming and multicommodity ows. In Proceedings of the eighteenth annual ACM symposium on Theory of computing, STOC ’86, pages 147{159, 1986. ISBN 0-89791-193-8. [33] N. Karmarkar. A new polynomial-time algorithm for linear program- ming. Combinatorica, 4(4):373{395, 1984. ISSN 0209-9683. doi: 10.1007/BF02579150. [34] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm for linear programming. In Progress in mathematical pro- gramming (Paci c Grove, CA, 1987), pages 29{47. Springer, New York, 1989. [35] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class of linear complementarity problems. Math. Programming, 44 (1, (Ser. A)):1{26, 1989. ISSN 0025-5610. doi: 10.1007/BF01587074. [36] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasible- interior-point algorithm for linear programming. Math. Program- ming, 61(3, Ser. A):263{280, 1993. ISSN 0025-5610. doi: 10.1007/ BF01582151. [37] J. Korzak. Eigenvalue relations and conditions of matrices arising in linear programming. Computing, 62(1):45{54, 1999. doi: 10.1007/ s006070050012. [38] L. McLinden. An analogue of Moreau’s proximation theorem, with ap- plication to the nonlinear complementarity problem. Paci c J. Math., 88(1):101{161, 1980. ISSN 0030-8730. [39] N. Megiddo. Pathways to the optimal set in linear programming. In Progress in mathematical programming (Paci c Grove, CA, 1987), pages 131{158. Springer, New York, 1989. 93Bibliography [40] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2(4):575{601, 1992. ISSN 1052-6234. doi: 10.1137/0802028. [41] S. Mehrotra and J. Sun. An algorithm for convex quadratic program- ming that requires O(n3:5L) arithmetic operations. Math. Oper. Res., 15(2):342{363, 1990. ISSN 0364-765X. doi: 10.1287/moor.15.2.342. [42] S. Mizuno, M. J. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Math. Oper. Res., 18(4):964{981, 1993. ISSN 0364-765X. doi: 10.1287/moor.18.4.964. [43] R. D. C. Monteiro and S. J. Wright. Local convergence of interior-point algorithms for degenerate monotone LCP. Comput. Optim. Appl., 3(2): 131{155, 1994. ISSN 0926-6003. doi: 10.1007/BF01300971. [44] Y. Nesterov and A. Nemirovskii. Interior-point polynomial algo- rithms in convex programming, volume 13 of SIAM Studies in Ap- plied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994. ISBN 0-89871-319-6. doi: 10.1137/1. 9781611970791. [45] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, second edition, 2006. ISBN 978-0387-30303-1; 0-387-30303-0. [46] T. Rusten and R. Winther. A preconditioned iterative method for sad- dlepoint problems. SIAM Journal on Matrix Analysis and Applications, 13(3):887{904, 1992. doi: 10.1137/0613054. Iterative methods in nu- merical linear algebra (Copper Mountain, CO, 1990). [47] D. Silvester and A. Wathen. Fast iterative solution of stabilised Stokes systems. II. Using general block preconditioners. SIAM Journal on Numerical Analysis, 31(5):1352{1367, 1994. doi: 10.1137/0731070. [48] A. W achter and L. T. Biegler. On the implementation of an interior- point lter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, Series A, 106(1):25{57, 2006. ISSN 0025- 5610. [49] M. H. Wright. Interior methods for constrained optimization. In Acta numerica, 1992, Acta Numer., pages 341{407. Cambridge Univ. Press, Cambridge, 1992. 94Bibliography [50] M. H. Wright. Some properties of the Hessian of the logarithmic barrier function. Mathematical Programming, 67(1-3):265{295, 1994. doi: 10. 1007/BF01582224. [51] M. H. Wright. The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull. Amer. Math. Soc. (N.S.), 42(1):39{56 (electronic), 2005. ISSN 0273-0979. doi: 10.1090/ S0273-0979-04-01040-7. [52] S. J. Wright. Primal-dual interior-point methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997. ISBN 0- 89871-382-X. doi: 10.1137/1.9781611971453. [53] Y. Zhang. On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem. SIAM J. Optim., 4(1):208{227, 1994. ISSN 1052-6234. doi: 10.1137/0804012. 95
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Spectral properties of matrices arising from primal-dual...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic.. Moulding, Erin 2012-08-30
pdf
Page Metadata
Item Metadata
Title | Spectral properties of matrices arising from primal-dual interior-point methods for convex quadratic programs |
Creator |
Moulding, Erin |
Publisher | University of British Columbia |
Date Issued | 2012 |
Description | The solution of convex quadratic programs using primal-dual interior-point methods has at its core the solution of a series of linear systems, which are in practice commonly reduced by block Gaussian elimination from the original unsymmetric block 3-by-3 formulation to either a block 2-by-2 saddle-point matrix or a block 1-by-1 normal equations form. The 3-by-3 formulation can also be symmetrized with a similarity transformation if a symmetric solver is to be used. We examine whether this practice of reduction is beneficial for the solver. For each of these formulations we find analytical results for the following spectral properties: the inertia, condition number, conditions for nonsingularity, and bounds on the eigenvalues. While the reduced systems become increasingly ill-conditioned throughout the iterations except in special cases, the 3-by-3 formulations remain nonsingular and well-conditioned with only mild assumptions on the problem; with regularization the assumptions are further simplified. Numerical examples are used to support the analytical results. We conclude that the 3-by-3 formulations, unsymmetric or symmetric, unregularized or regularized, have superior spectral properties that support their use in practice. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2012-08-30 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0073086 |
URI | http://hdl.handle.net/2429/43088 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2012-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2012_fall_moulding_erin.pdf [ 1.38MB ]
- Metadata
- JSON: 24-1.0073086.json
- JSON-LD: 24-1.0073086-ld.json
- RDF/XML (Pretty): 24-1.0073086-rdf.xml
- RDF/JSON: 24-1.0073086-rdf.json
- Turtle: 24-1.0073086-turtle.txt
- N-Triples: 24-1.0073086-rdf-ntriples.txt
- Original Record: 24-1.0073086-source.json
- Full Text
- 24-1.0073086-fulltext.txt
- Citation
- 24-1.0073086.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0073086/manifest