Augmentation Preconditioning for Saddle Point Systems Arising from Interior Point Methods

by

Tim Rees

B.A.Sc., The University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

The Faculty of Graduate Studies (Computer Science)

The University of British Columbia

August 17, 2006

© Tim Rees, 2006

Abstract

We investigate a preconditioning technique applied to the problem of solving linear systems arising from primal-dual interior point algorithms in linear and quadratic programming. The preconditioner has the attractive property of improved eigenvalue clustering with increased ill-conditioning of the (1,1) block of the saddle point matrix. We demonstrate performance of the preconditioner on problems from the NETLIB and CUTEr test suites. The numerical experiments include results based on inexact inner iterations, and comparisons of the proposed techniques with constraint preconditioners. Approximations to the preconditioner are considered for systems with simple (1,1) blocks. The preconditioning approach is also extended to deal with stabilized systems. We show that for stabilized saddle point systems a minimum residual Krylov method will converge in just two iterations.

Contents

Abstract  ii
Contents  iii
List of Tables  v
List of Figures  vi
Preface  vii
Acknowledgements  viii
Co-Authorship Statement  ix
1 Introduction  1
1.1 Krylov Subspace Methods  2
1.2 Preconditioning  7
1.3 Saddle Point Systems  9
2 Interior Point Methods for Quadratic Programming  13
2.1 Convex Quadratic Programming  13
2.2 A Generic Interior Point Method  15
2.3 Properties of the Step Matrix  18
2.4 Solving the Step Equation  20
2.4.1 Direct Solutions  21
2.4.2 Iterative Solutions  22
3 Augmentation Preconditioning For Interior Point Methods  25
3.1 Introduction  25
3.2 The Preconditioning Approach  30
3.2.1 A Block Triangular Preconditioner  30
3.2.2 A Block Diagonal Preconditioner  32
3.3 Practical Considerations and Computational Cost  35
3.3.1 The Inner Iteration and Choices of the Weight Matrix  36
3.3.2 Dealing with a Dense Row in A  38
3.4 Numerical Experiments  41
3.5 Conclusions  50
4 Extensions of Augmentation Preconditioning  52
4.1 Preconditioning Stabilized Systems  53
4.2 Applications of the Stabilized Preconditioner  59
4.2.1 Approximately Solving Non-Stabilized Systems  59
4.2.2 An Assisted Direct Solve  61
4.3 Approximate Augmentation Preconditioners  65
5 Conclusions  69
Bibliography  73

List of Tables

3.1 Solver results using BICG-STAB. BICG-STAB error tolerance was 1.0e-02  43
3.2 Solver results using BICG-STAB. BICG-STAB error tolerance was 1.0e-06  44
3.3 Solver results using MINRES. MINRES error tolerance was 1.0e-02  45
3.4 Solver results using MINRES. MINRES error tolerance was 1.0e-06  46
3.5 Results obtained using the symmetric dense row removal scheme of Section 3.3.2. Problems solved using MINRES with error tolerance 1.0e-05  46
3.6 Results obtained using the symmetric dense row removal scheme of Section 3.3.2. Problems solved using MINRES with error tolerance 1.0e-02  47

List of Figures

3.1 MINRES iteration counts with various W = γI  37
3.2 Eigenvalues of M^-1 A at different steps of an LP solution  48
3.3 MINRES iteration counts for the QP problem "cvxqp1", n = 1000, m1 = 250  49
3.4 MINRES iteration counts for the QP problem "cvxqp2", n = 10000, m1 = 2500  50
4.1 Comparing an assisted direct solve with a direct solve  64

Preface

This thesis presents research conducted in partial fulfillment of an MSc degree requirement. The work was conducted from May 2005 to July 2006, under the guidance of my supervisor, Chen Greif. Initially, the research was geared towards interior point methods for linear programming, but we expanded our investigations to include convex quadratic programming. A student paper focusing on augmentation preconditioning for linear programming was submitted to the Copper Mountain conference on iterative methods, and can be found in the conference proceedings.

Chapter 3 was built on the work of the student paper submission, and expanded to deal with quadratic programming. The chapter in this thesis is a slightly modified form of a paper submitted to the SIAM Journal on Scientific Computing, special issue on the ninth Copper Mountain conference.

The consistent theme of this thesis is the study and application of augmentation preconditioners. The process of writing this thesis and studying iterative methods has been an excellent learning experience that I have enjoyed immensely.

Acknowledgements

I appreciate the support I have received from many people. My family has always been supportive of my educational pursuits, and I am grateful for that. I am especially thankful to my parents and my wife, Aya. Their continuous encouragement has helped keep me motivated.

I am also grateful for the excellent guidance I have received from my supervisor, Chen Greif. He has given me exposure to a wide range of interesting problems, and his enthusiasm is contagious. I look forward to future collaborations with him. Jim Varah has also been of great assistance. He kindly agreed to be a reader of the thesis and provided many excellent suggestions. I am also grateful to Uri Ascher for providing his MATLAB code which served as a starting point for our implementation.

Co-Authorship Statement

Chapter 3 is a slightly modified version of a paper submitted for publication to the SIAM Journal on Scientific Computing. The paper was co-authored with Chen Greif. The division of tasks between the two authors is outlined below.

Overall guidance and direction for the research were provided by Chen. The theoretical results that were discovered along the way were found in unison, often as a direct result of discussions. The experiments conducted to test the theoretical results were primarily conducted by me.

Preparation of the manuscript involved significant effort by both authors. Initial writing and development work was carried out by me, but the overall layout was determined by Chen. Writing and editing work was done by both authors, with my initial focus on writing and Chen's on editing. In its final phases, though, both authors were equally involved in both tasks.

Chapter 1

Introduction

The development of techniques for efficiently solving saddle point systems plays a central role in numerical analysis. A vast range of problems in applied mathematics involve solving these linear systems, and their solution is often the critical, most time-consuming component of the underlying algorithm. Many techniques for solving saddle point systems are available, but their viability for large scale systems is uncertain, and new methods are in demand.
This thesis describes a new approach to preconditioning saddle point systems arising from interior point methods in optimization. The approach is applied within the general framework of iterative solvers. We provide an array of the-oretical and experimental results that demonstrate the merits of our proposed technique. The goals of this thesis are three-fold: • To introduce a new preconditioner and provide theoretical analysis of its properties and expected performance. • To demonstrate the performance of the preconditioner on standard con-strained optimization problems. • To extend the preconditioner to deal with new, more general problems, and provide modifications to improve its performance on special classes of subproblems. To meet these goals, this thesis is organized as follows. Chapter 1. Introduction 2 This chapter begins with an introduction to Krylov subspace methods for solving linear systems, and a discussion of their convergence. We then consider preconditioning, and discuss how it can significantly improve the convergence of a Krylov method. Next, saddle point linear systems are discussed, and examples of these systems from real applications are given. These topics are fundamental to all of the theory presented throughout the rest of the thesis. Chapter 2 provides a detailed introduction to interior point methods for quadratic programming. The framework described there is the setting for the preconditioning approaches we explore. A discussion of several software pack-ages and the approaches they employ is also given. Chapters 3 and 4 present a collection of new ideas and original work con-ducted in collaboration with Chen Greif. Chapter 3 contains theoretical results and discussion of experiments related to the application of the precondition-ing technique to convex quadratic programming problems. The chapter can be treated as stand-alone, and is a slightly modified version of a paper submitted for publication. Chapter 4 extends the augmentation preconditioning approach in several ways. A modification is presented to deal with stabilized saddle point systems. In addition, approximate augmentation preconditioners are considered. The final chapter provides general concluding remarks about the research. A summary of the contents of the thesis is also outlined there. 1.1 Krylov Subspace Methods Direct methods for solving linear systems are very popular in many applica-tions in science and engineering, but they are not ideal in many senses. While Cholesky, LU, and symmetric indefinite factorizations and substitutions are known for being robust, there are certain drawbacks to them. A detailed sum-Chapter 1. Introduction 3 mary of the weaknesses of direct, methods can be found in many standard texts, and includes the following points: • Factorizations typically require an operation count that is is quadratic (for some sparse matrices) or cubic (for dense matrices) in the number of unknowns. • For a sparse matrix, the resulting factors can be dense, and require signif-icantly more storage than the original matrix. • Direct access to each element of the matrix involved is required, but this is not available for all problems of interest. • Direct solves produce exact solutions to linear systems, when approximate answers may be sufficient. • Direct methods do not take advantage of known approximate solutions, regardless of how accurate they may be. Certain advanced pivoting techniques and ordering schemes can reduce the space and time complexity of a direct factorization. 
Such techniques are applicable to classes of problems, but involve a complicated implementation, and often do not reduce time and storage requirements significantly. While iterative methods are not appropriate for all problems, they suffer from few of these weaknesses. Iterative methods for solving linear systems have existed in various forms for quite some time, and have received great attention in the last half-century. Examples of these methods include simple fixed point iterations such as Gauss-Seidel and SSOR, Krylov subspace methods, and multi-grid techniques. Treatments of iterative methods in numerical linear algebra can be found in many texts, see [1, 11, 22, 30, 32]. A key advantage of iterative methods is that each iteration generally produces a more accurate solution, and from a reasonably accurate initial guess they can converge very quickly. Chapter 1. Introduction 4 In recent decades, K r y l o v subspace methods have t ruly blossomed in impor-tance. In their raw form, these methods do not directly access elements of the matr ix involved, they only require the functionality to compute matrix-vector products. W h e n a matr ix is sparse this can be performed very efficiently. The pr imary factor dr iv ing the popularity of K r y l o v methods is their rapid conver-gence for a very broad range of problems. A s wi l l be discussed shortly, K r y l o v methods converge at a rate heavily dependent on both the condit ion number and eigenvalue distr ibut ion of the underlying matrix. This has helped to identify large classes of problems where these methods are highly suitable. A s there is a considerable body of literature discussing these techniques, we only present a brief description here. The interested reader can find detailed introductions and analysis in [11, 22, 30], and many other books and papers. Suppose we are to solve Ax = b iteratively. The exact solution is clearly x = A~lb, but as a direct implicat ion of the Cayley-Hami l ton theorem, A~l can be writ ten as a polynomial in A. Tha t is, x = A~1b = p(A)b for some polynomial p(x). Since p(A)b can be writ ten as Ylk=o ckAkb (where Ck are constants), it is natural to consider the vectors (b, Ab, A2b,..., Ak_1b), and search for solutions to Ax — b as linear combinations of these vectors. This suggests that an ap-proximate method for solving the linear system could be based on repeatedly mul t ip ly ing A w i t h b and combining the results. The vectors (b, Ab, A2b,..., Ak~1b) span the Krylov subspace Kk{A, b), but as k increases, they become more linearly dependent. A s a result, it is more appropriate to consider an orthogonal basis for Kk(A,b), denoted by Qk, and construct solutions Xk as linear combinations of the columns of Qk- Since Qk spans Kk(A, b), one should expect that for large k accurate solutions to Ax = b Chapter 1. Introduction 5 can be found. The process of constructing an orthogonal basis for the Krylov subspace is one of the two fundamental components of every Krylov method. This is done using the Arnoldi algorithm for asymmetric matrices, and the Lanczos algorithm for symmetric matrices. Each iteration adds an orthogonal column to Qk- The approximate solution Xk, sought from the columns of Qk, will becomes more accurate as the orthogonal basis grows. The second fundamental component of a Krylov method is the linear combi-nation of columns of Qk that defines Xk- This is typically chosen to be optimal in some sense. 
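To make the basis-building step concrete, the following sketch (Python with NumPy; the thesis itself contains no code, so this is purely illustrative) implements the Arnoldi process: each iteration performs one matrix-vector product and orthogonalizes the result against the existing basis, so that the columns of Q span the Krylov subspace K_k(A, b). For symmetric A the same recurrence simplifies to the Lanczos process, in which the projected matrix H is tridiagonal.

```python
import numpy as np

def arnoldi(matvec, b, k):
    """Orthonormal basis for span{b, Ab, ..., A^(k-1) b}, built with
    Gram-Schmidt.  Only matrix-vector products with A are required.
    Returns Q with k+1 orthonormal columns and the (k+1) x k upper
    Hessenberg matrix H satisfying A Q[:, :k] = Q H."""
    Q = np.zeros((b.size, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = matvec(Q[:, j])                # the only way A is accessed
        for i in range(j + 1):             # orthogonalize against earlier columns
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]      # (assumes no breakdown, for brevity)
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
Q, H = arnoldi(lambda v: A @ v, b, 10)
print(np.allclose(Q.T @ Q, np.eye(11)))       # orthonormal basis
print(np.allclose(A @ Q[:, :10], Q @ H))      # Arnoldi relation
```

Which linear combination of these basis vectors is taken as the iterate x_k is the second component, and it is the choice of optimality measure that distinguishes the individual Krylov methods.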
Different Krylov methods can be developed based on the measure of optimality used to define Xk- For instance, the conjugate gradient method (CG) for symmetric positive definite systems takes solutions Xk to minimize ||rfc||_4-i. Another example is MINRES, which takes Xk so that ||rfc||2 is minimized. In [11, pp. 306] a'list of common Krylov methods and the optimality measures they employ can be found. The definition of Xk can lead to beneficial properties under some circum-stances. For C G , it turns out that only the final two columns of Qk need to be stored throughout the solve. MINRES can be shown to depend on a similar short-term recurrence, but not all Krylov methods have this desirable property. Each iteration of a Krylov method results in a higher dimensional Krylov subspace. Thus if A is n x n, after n iterations, the Krylov subspace spans K n x n , and an exact solution to Ax = b can be found. Originally, C G was treated as a direct method because of this property. Experiments showed, though, that for some problems the iteration could be stopped after relatively few steps and still produce an accurate solution. Detailed analysis reveals that the convergence of Krylov methods are closely connected with the condition number and eigenvalues of the matrix involved. Chapter 1. Introduction 6 Defining t\. = x — Xk and the ^4-norm \\V\\A = \\Av\\2, one can show the well-known result [22, pp. 51] that for C G , the absolute error e^is related to the initial error eo and the condition number K of A by A similar result holds for MINRES in terms of the residual 7 > at each iteration [22]. Equation (1.1) illustrates two important points about the convergence of C G and other Krylov methods. First, an accurate initial guess xo can ensure small error values at early iterates. Second, if the condition number of A is close to one, then (^/K — 1 ) / ( - v / K + 1) will be small and the error will decay quickly. A more involved error analysis of C G and other Krylov methods reveals that convergence is intimately related to the distribution of eigenvalues of A. For C G it is known that where pk denotes the set of all kth order polynomials such that Pfc(O) = 1, see, e.g. [22]. The Cayley-Hamilton theorem states that a matrix A is a zero of its characteristic polynomial. It follows that-if A is symmetric and j is the number of distinct eigenvalues in A, then there is a jth order polynomial such that pj(A) = 0. The number of iterations needed to achieve \\ek\\A — 0 is then equal to the number of distinct eigenvalues of A plus one. In addition, by expressing eo as a linear combination of the eigenvectors of A, one can show that the number of iterations required to achieve an exact solution depends on how many eigenvectors are required in the representation of eo. Constructing solutions from the Krylov subspace is equivalent to fitting a polynomial through the eigenvalues of A. The number of iterations required for convergence to an exact solution corresponds to the degree of the polynomial required to fit through all of the eigenvalues. A matrix with many distinct (1.1) ||efc|U = min||pjfe(A)eo|U, Chapter 1. Introduction 7 eigenvalues will require a high order polynomial fit and thus many iterations will be necessary for convergence. For matrices with few distinct eigenvalues, or eigenvalues that are clustered close together, one can expect rapid convergence of a Krylov method. 
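The effect of clustering is easy to see in a small experiment. The sketch below (Python/NumPy; a hand-rolled CG is used only to keep the example self-contained and is not an implementation from the thesis) runs conjugate gradients on two symmetric positive definite matrices with the same condition number κ = 100 but very different spectra: the matrix with only five distinct eigenvalues converges in roughly five iterations, while the matrix with a spread-out spectrum needs many more.

```python
import numpy as np

def cg(A, b, tol=1e-10, maxiter=1000):
    """Plain conjugate gradient iteration; returns the solution and the
    number of iterations needed to reduce ||r|| below tol * ||b||."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            return x, k
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxiter

def iterations_for_spectrum(eigvals, rng):
    n = len(eigvals)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal eigenvectors
    A = Q @ np.diag(eigvals) @ Q.T                     # SPD with prescribed spectrum
    b = rng.standard_normal(n)
    _, k = cg(A, b)
    return k

rng = np.random.default_rng(1)
n = 200
spread = np.linspace(1.0, 100.0, n)                            # kappa = 100, n distinct eigenvalues
clustered = np.repeat([1.0, 25.0, 50.0, 75.0, 100.0], n // 5)  # kappa = 100, only 5 distinct eigenvalues

print("spread spectrum:   ", iterations_for_spectrum(spread, rng), "CG iterations")
print("clustered spectrum:", iterations_for_spectrum(clustered, rng), "CG iterations")
```

In exact arithmetic the clustered case terminates after one iteration per distinct eigenvalue, which is exactly the polynomial-fitting argument above.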
The convergence dependence on condition number and spectral clustering of the matrix A gives the solver a means of estimating how effective a method will be in advance of its application. This has greatly contributed to the popularity of Krylov methods. When analysis revealing spectral properties of the matrix is possible, these convergence results allow solvers to make accurate estimates of run times required prior to attempting a solve. 1.2 Preconditioning In the previous section it was stated that the convergence of a Krylov subspace method depends directly on the condition number and distribution of eigen-values of the linear system involved. Fewer distinct eigenvalues lead to fewer Krylov iterations, and this observation has given great importance to the field of preconditioning. Preconditioning is the science of transforming linear systems so that their spectral properties are more amenable to rapid convergence with iterative methods. A basic preconditioning approach is to introduce a preconditioner M , and solve M~lAx = M~lb instead of Ax = b. It is important to point out that preconditioning does not require the explicit inversion of M, the inverse is implicitly used .by solving linear systems such as Mx = c at each iteration. For symmetric positive definite linear systems this type of precondition-ing destroys symmetry and seems unsuitable with a method such as C G . This Chapter 1. Introduction 8 problem can be overcome by recognizing that M~lA has the same inertia as M1/2M~1AM1/2 = M~1/2AMl/2. As long as M is symmetric positive definite, M - 1 / 2 ' A M 1 / 2 will be, and since M~*A has the same inertia, C G can be applied directly to it. The choice of M can make a dramatic difference in the convergence of a Krylov method. As an extreme example, if one chose M = A, then M~lA = / which has just one distinct eigenvalue and a Krylov method would converge in one iteration. This is impractical, though, because it would require the ability to invert A or solve systems with it, which was the goal in the first place! There is a delicate balance between a preconditioner that is easy to invert or solve with, and one that effectively alters the eigenvalues of A. As explained in [4, 11, 30], there are several important properties a precon-ditioner should have: • The system Mx = c must be easy to solve. • The matrix M must be nonsingular. • M must in some way approximate A, so that M _ 1 is close to A-1 and the preconditioned system is easy to solve. • M must not be too difficult to construct. The first requirement arises because at each step of the iterative solution of Ax = b, a linear system in M must be solved. The second requirement is somewhat more obvious, if a preconditioner does not make a problem easier to solve it is not worthwhile. Requiring M to be nonsingular guarantees that the problem of no solution, or infinitely many solutions does not arise. Finally, if M is extremely difficult to construct solving the non-preconditioned system might be more efficient. Chapter 1. Introduction 9 There are a wide range of preconditioners commonly used. For a broad sur-vey of popular methods including approximate inverses, and incomplete factor-izations, see [4]. These methods are designed for very general classes of matrices, when a problem has very specialized structure, better preconditioners can be derived. Matrices with special structures arise often in real applications and special preconditioning techniques are necessary to deal with them effectively. 
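As a minimal numerical illustration of this transformation (Python/NumPy; illustrative only, using the simple Jacobi choice M = diag(A), which is not one of the preconditioners advocated in this thesis), consider a badly scaled symmetric positive definite matrix. Applying M through its square root, as in the M^{-1/2} A M^{-1/2} argument above, preserves symmetry and sharply reduces the condition number. In a real solver M^{-1} would be applied implicitly by solving with M at each iteration; the scaled matrix is formed explicitly here only so that its conditioning can be displayed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# An SPD matrix with badly scaled rows/columns: A = D B D with B well conditioned.
B = rng.standard_normal((n, n))
B = B @ B.T + n * np.eye(n)                  # SPD, modest condition number
d = 10.0 ** rng.uniform(-3, 3, size=n)       # scaling over six orders of magnitude
A = (d[:, None] * B) * d[None, :]

# Jacobi preconditioner M = diag(A), applied symmetrically via its square root.
m_inv_sqrt = 1.0 / np.sqrt(np.diag(A))
A_prec = (m_inv_sqrt[:, None] * A) * m_inv_sqrt[None, :]

print("kappa(A)               = %.2e" % np.linalg.cond(A))
print("kappa(M^-1/2 A M^-1/2) = %.2e" % np.linalg.cond(A_prec))
```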
1.3 Saddle Point Systems Until now, we have discussed iterative methods for general, possibly unstruc-tured matrices. In real applications, though, the underlying linear systems are often highly structured. It makes sense to develop techniques that apply to special, but still broad, classes of matrices. One such class of matrices is the saddle point matrix. The ubiquity of saddle point systems in applied mathematics is at first glance astonishing. From constrained optimization to computational fluid dynamics to circuit design, problems consistently lead to matrix equations with saddle point systems. A n extensive list of problems that lead to saddle point systems can be found in the survey [5]. Additional demonstrations of the proliferation of these structured matrices throughout applied mathematics can be found in [31]. Saddle point systems can be asymmetric, but our interest is primarily in the symmetric case. These saddle point matrices are of the form where G is n x n and symmetric, A is m x n, with m < n, and the zero block is my. m. These systems can also appear in regularized or stabilized form, where the zero (2, 2) block is replaced by a symmetric negative definite matrix. We consider these problems in Section 4.1. (1.2) Chapter 1. Introduction 10 Saddle point systems come in all sizes, and have many special properties. In the symmetric case, (1.2) is non-singular if and only if AT has full column rank and G is non-singular. If G is symmetric positive semi-definite, then (1.2) is invertible if the null spaces of G and A do not intersect. We now present a small sample of problems that immediately lead to saddle point linear systems, more thorough derivations can be found in [5, 31]. Equality Constrained Quadratic Programming Consider the following equality constrained quadratic program min ^xTHx + cTx, subject to: Ax = b, where x e R n , H is n x n, and A is m x n, with m < n. The solution can be found by introducing Lagrange multipliers A, and solving the unconstrained problem min -xTHx + cTx — \T(Ax — b). x,x 2 When H is symmetric positive semi definite, first order Karush-Kuhn-Tucker ( K K T ) conditions guarantee the optimality of a solution [28, Chapter 16]. These are found by setting the derivatives with respect to x and A to zero. This results in n + rn linearly independent equations in n + m unknowns. Hx- ATX = - c , Ax = b. This can be reformulated as the solution of a saddle point linear system This simple derivation shows how to set up and solve equality constrained quadratic programs in terms of a single linear system. As will be shown shortly, Chapter 1. Introduction 11 for inequality constrained programs the linear system to solve is similar, but solves must be performed repeatedly. The 2D Stokes Problems The 2D Stokes problem involves finding the velocity v = (vi(x,y),V2(x,y)) and pressure p(x,y) for an incompressible flow for a region in K 2 . The incompress-ibility condition leads to the constraint which can be expressed for the discrete problem as Ax = 0, where A is a finite difference matrix. The Stokes problem governs slow moving flows, and can be reduced to the solution of the differential equations where A is the Laplacian operator. By discretizing the differential equation, and letting L represent the discrete Laplace operator, the Stokes problem can be solved by finding solutions to the linear equations which has the familiar saddle point structure. 
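Returning to the first example, the KKT system of the equality-constrained quadratic program can be assembled and solved directly. The sketch below (Python/NumPy; the random data are purely illustrative) builds the block matrix with H in the (1,1) block and the constraints in the off-diagonal blocks, solves the single linear system, and verifies the optimality conditions Hx - A^T λ = -c and Ax = b.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 3

# Equality-constrained QP:  min 0.5 x^T H x + c^T x   subject to  A x = b
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                 # symmetric positive definite Hessian
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))         # full row rank (with probability 1)
b = rng.standard_normal(m)

# Saddle point (KKT) matrix: [[H, A^T], [A, 0]]
K = np.block([[H, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])

sol = np.linalg.solve(K, rhs)
x, y = sol[:n], sol[n:]
lam = -y                                # sign convention matching Hx - A^T lambda = -c

print(np.allclose(H @ x - A.T @ lam, -c))   # first-order condition
print(np.allclose(A @ x, b))                # feasibility
```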
Constra ined Weighted Least Squares The constrained, weighted least squares problem requires the solution of V -v = 0, -Av = / - Vp, Lv - ATp = -/, Ax = 0. These equations can naturally be expressed in matrix form min ||c — Ga;||2, subject to: Ex — d Chapter 1. Introduction 12 This problem can arise from constrained curve fitting. Optimality conditions for this problem require introducing vectors r, A and differentiating to find the optimality conditions. The optimality conditions lead to the solution of the following system of equations 0 G 0 0 E GT ET 0 A This is a saddle point system" with (1,1) block \ 7 iP o 0 0, and (2,1) block: A= [G1 E1}. This illustrates the point that even if a system cannot be written immediately in block 2x2 form, often a simple permutation or grouping of the block variables leads to the familiar saddle point structure. 13 Chapter 2 Interior Point Methods for Quadratic Programming The role of optimization in science and engineering problems is prominent. De-sign tasks frequently require a "best" solution. From aeronautical engineering to financial planning, the selection of a set of parameters that maximizes or minimizes an objective function occurs on a regular basis. To deal with these problems, operations researchers and numerical analysts have developed sophis-ticated algorithms for solving the most important classes of optimization prob-lems. One such technique known as interior point methods, is applicable to constrained optimization problems, and has become popular in recent decades. In this chapter we present a general introduction to interior point methods for solving linear (LP) and convex quadratic programs (QP). The focus of the presentation is on the nature of the linear systems that arise, and the methods that are used to solve them. The interested reader is directed to [28] for a thorough treatment of the topic. 2.1 Convex Quadratic Programming The standard convex QP problem is of the form min \xTHx + cTx (2.1) x 2 subject to: Ax = b, Cx> d. Chapter 2. Interior Point Methods for Quadratic Programming 14 The vector the matrix H is n x n, A is m i x n, with m i < n, and C is 7712 x n. The other vectors have appropriate dimensions to match these. To be a convex quadratic program, H must be symmetric positive semi definite. This is similar to the equality constrained QP problem discussed in Section 1.3, with the additional constraint Cx > d. Notably, linear programming (LP) is just an instance of this problem with H = 0. The description given here applies to LP, though typically a different notation is used for L P problems. Linear programming is so important in its own right that many specialized techniques have been developed specifically for it. For further details refer to [28, Chapter 13-14], and [34]. Optimality Conditions When the Hessian H is symmetric positive semi-definite and the constraints are linear (as is the case for convex QP problems) the first order K K T conditions are sufficient to guarantee global optimality of a solution [19, 28]. The conditions involve introducing Lagrange multipliers y, z and vector slack variables s such that Hx - ATy - CTz = - c , Ax = b, Cx-s = d, (2.2) sTz = 0, s,z>0. Were it not for the requirement s, z > 0, the first four conditions could easily be rewritten as a single block 4 x 4 linear system and solved. Chapter 2. Interior Point Methods for Quadratic Programming 15 2.2 A Generic Interior Point Method A basic interior point method works as follows. 
A function F(x,y, z, s) is defined so that its roots coincide with the K K T conditions (Hx- ATy - CTz + c\ Ax -b Cx — s — d F(x,y,z,s) V J The solution of the minimization problem can then be treated as an iterative root finding problem for F(x, y, z, s), constrained so that s, z > 0. From the kth iterate (xk,yk,Zk,Sk) a search direction can be computed by using a Newton type step. This is formulated in the following block 4 x 4 step equation: (2.3) (H -AT -CT o \ ( Ax" A 0 0 0 Ay n C 0 0 -I Az 0 s ZJ \AS) V") where S, Z are diagonal matrices with Sa — Si, Zn — Zi , and rc = Hxk - ATyk - CTzk + c, rb = Axk - b, rd = Cxk — Sk-d, rsz = SZe. We use e = [1,1 , . . . , 1) T in the definition of rsz. By iteratively applying the step procedure, the optimal point should eventually be reached. The above framework is not generally applied in its raw form, since experience has led to the development of more sophisticated heuristics for these problems, as we now discuss. Chapter 2. Interior Point Methods for Quadratic Programming 16 A Pred ic tor -Cor rec tor A l g o r i t h m A more effective scheme, known as a predictor-corrector algorithm was proposed by Mehrotra for linear programming [26]. The technique has been extended to QP, and is commonly used [28]. The description given here is adapted from [19]. After a step is computed with Equation (2.3), another step is computed with an adjusted right hand side. To understand the adjustment, a notion of the central path is necessary. The central path is defined to be the set of points (x, y, z, s) such that M F(x,y,z,s) = 0 0 yej (z,s) > 0. (2.4) Ideally, the iterates (xk,yk,Zk,Sk) should progress along this central path, and T should decrease with each step. Predictor-corrector algorithms favor iterates that follow the central path, under the intuition that progression along the central path gives the most rapid convergence towards the optimal solution. After a step direction is computed using Equation (2.3), a parameter a is calculated to scale the step direction so that (z, s) remain positive: a = arg max {(z, s) + a(Az, As)}. "6(0,1] With initial complementarity measure defined by \i = SkTZk/m2, the pre-dicted value p,' = (sk + aAs)T(zk + aAz) is computed. A new scaling factor cr = (ji1 jfj,)^ is then found. The corrector step is then computed by re-solving Equation (2.3) with a modified right hand side. The only change in the right hand side is to the rsz term, which becomes r'sz — —(rsz — ajie + aAsAze). Chapter 2. Interior Point Methods for Quadratic Programming 17 A final step scaling parameter a ' is computed and used to scale the new step so that (z,s) remain positive. The iterate is then updated with (xk+1,yk+i,Zk+i,sk+i) = (xk,yk,zk,sk) + a'(Ax,Ay,Az,As). The corrected step computation steers the iterate (xk+i, yk+i, zk+i, sk+i) closer to the central path, compensating for the progress made in the predictor calculation. The constants a, ip control how much the corrector step favors the central path, but their definitions are somewhat arbitrary and dictated by experience and intuition. Typically tp € [2,4]. When the predictor step leads to significant reduction in the complementary condition, (n'/p.) is small, and the shift towards the central path is minor. When sTz is not significantly reduced, the corrector step shifts the next iterate away from the computed direction given by the Newton step. This actually reduces the progress that can be made at the current iterate. 
This is considered an acceptable trade-off in the hope that the following iterates will progress much more rapidly towards the optimal point. Predictor-corrector algorithms require the solution of two step equations at each iterate. In fact, even higher order schemes exist, requiring the solution of up to six step equations per iteration [19]. This illustrates the, importance of being able to efficiently solve the step equation, which in Section 2.3 will be reduced to a saddle point problem. Stopping Criteria The final requirement of an optimization algorithm is a set of stopping criteria so that computations may terminate when a solution is accurate enough (or does not exist). Stopping criteria generally involve a measure of each of the optimality conditions, appropriately scaled. OOQP [19] and P C x [10] use a similar, reasonable set of stopping conditions Chapter 2. Interior Point Methods for Quadratic Programming 18 which we now present. To test an iterate for optimality, residuals at each iterate are tested to ensure they are sufficiently small. Given tolerances r r , r M , each iterate is checked to see if || (r c, Tb, rj) ||oo < Tr (with residuals defined in Equation (2.3)) and SkTZk < r M . If both of these conditions hold, and the tolerances are sufficiently small, the algorithm exits with a successful status.' To detect infeasibility, the duality gap is recorded at each iteration. This is defined to be ga-Pk = ^XkTHxk + cTxk - bTyk - dzk, and is just the difference between primal and dual objective functions. At the optimal point the gap should be zero. A function fa is also defined: , = \\{rc,n,rd)\\oo +gapk and a point is deemed infeasible if both fa> e-i, and fa, > e2 min fa 0<i<fc-l hold. In OOQP, the parameters sx = 1 0 - 8 and e 2 = 104 [19]. Additional conditions are used to quit the algorithms with an indeterminate status. This occurs i f sufficiently many iterations have occurred with relatively little progress, or if the residual errors appear to be blowing up relative to the complementarity measure \i. Beyond satisfaction of the optimality conditions, stopping criteria are some-what arbitrary. Optimization researchers continue to explore these conditions, and different software packages use different variants. 2.3 Properties of the Step Matrix In the previous section we derived the linear systems (2.3) that must be solved at each step of an interior point method. In our description of predictor-corrector Chapter 2. Interior Point Methods for Quadratic Programming 19 algorithms, it was stated that variants of the linear system must be solved twice at each step. We now show how the system can be reformulated as a saddle point problem, and discuss important properties of the underlying matrix. Equation (2.3) is asymmetric but can be reduced to a symmetric system through a simple observation. Since S and Z are diagonal, we can easily in-vert (analytically and computationally) and eliminate A s . By setting A s = -Z lSz + Z 1 r s z , the system reduces to H AT CT A O 0 C 0 -Z~lS ( A ™ \ rp,c, (2.5) -Ay \~AZJ where rp,c denotes the right hand side, for the predictor or corrector step, after eliminating A s . Note that Equation (2.5) can be regarded as a saddle point system if Ay and Az are permuted. The standard approach, though, is to utilize the fact that Z~1S is diagonal. 
Then Az can be eliminated, and the final system to be solved has the form H + CTS-lZC AT A 0 (2.6) where denotes the appropriately adjusted right hand side after elimination of Az for either the predictor or corrector step. Note the standard block 2 x 2 saddle point form of the system. In the typical case when inequality constraints are simply bounds on indi-vidual variables, the matrix C is an identity or the concatenation of an identity with a negative identity matrix. In this situation, CTS~lZC- is a diagonal matrix. The linear systems that arise from interior point methods are notoriously ( Chapter 2. Interior Point Methods for Quadratic Programming 20 difficult to solve. This is evident from the derivation given above. The difficulties arise from both the size of the linear systems involved, and the complementarity condition for s and z. Each step of an interior point method requires the solution of a saddle point system which, depending on the original problem, could be very large. When a predictor-corrector algorithm is used, two system must be solved at each iteration. This alone is motivation for finding efficient methods for solving saddle point systems. The steady growth in size of design problems of interest places great demand on the ability to solve saddle point systems. To complicate matters, as the minimizer is approached s and z become strictly complementary with s, z > 0. The diagonal entries of S~XZ become ex-tremely large and extremely small. The matrix equations that must be solved become more ill-conditioned with each interior point step, and the combination of large and small entries in S~lZ prevents the use of a simple scaling approach. In the theoretical limit, the diagonal elements of S~lZ could become infinite, which is why interior point methods are only used to attain approximate solu-tions. The ill-conditioning of the matrix in Equation (2.6) can manifest itself even at an early iteration, depending on the Hessian H and constraint matrix A. Typically, though, the ill-conditioning of the step matrix in the final iterations of the solve is due almost entirely to the S~lZ term. 2.4 Solving the Step Equation There are a few common approaches to solving the step equation. Here we briefly describe some common direct solve approaches, then focus on iterative techniques for solving the step equation. In particular, the discussion of iterative techniques focusses on preconditioning techniques for saddle point systems. Chapter 2. Interior Point Methods for Quadratic Programming 21 2.4.1 Direct Solutions Direct methods are still the industry standard for solving the step equations of constrained optimization such as (2.6). Each of the software packages OOQP, IPOPT, and LIPSOL [19, 33, 35] employ direct factorizations and solves to compute step directions at each iteration. The step matrix need only be factored once per step, and then applied in the predictor and corrector solves. In the case of QP problems with diagonal Hessian matrices, a further reduc-tion of Equation (2.3) to normal form (Schur complement) is often made. This reduction is made possible by noting the matrix H + CTS~1ZC is diagonal, and can be easily inverted to eliminate Ax. The resulting system that must be solved is of the form A(H + CTS~lZC)~l ATAy = r'. One of the benefits of the Schur complement approach is that A(H + CTS~lZC)~lAT is symmetric positive definite, so a Cholesky decomposition can be applied to it. The L P problems (when H = 0) warrants special consideration. 
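A sketch of this reduction is given below (Python/NumPy with SciPy's Cholesky routines; the diagonal (1,1) block G and the random data are purely illustrative and stand in for H + C^T S^{-1} Z C in the LP-like case). Eliminating x from the first block row leaves the symmetric positive definite normal matrix A G^{-1} A^T, which is factored by Cholesky; the recovered solution is then checked against both block equations of the original saddle point system.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(4)
n, m = 2000, 300

# Saddle point system with a diagonal SPD (1,1) block:
#   [[G, A^T], [A, 0]] [x; y] = [f; g]
g_diag = 10.0 ** rng.uniform(-4, 4, size=n)   # widely spread entries, as near an IP solution
A = rng.standard_normal((m, n))
f = rng.standard_normal(n)
g = rng.standard_normal(m)

# Normal (Schur complement) form: x = G^{-1}(f - A^T y) gives
#   (A G^{-1} A^T) y = A G^{-1} f - g,  an m x m SPD system.
Ginv = 1.0 / g_diag
S = (A * Ginv) @ A.T                          # A G^{-1} A^T
rhs = A @ (Ginv * f) - g
y = cho_solve(cho_factor(S), rhs)             # Cholesky solve of the SPD system
x = Ginv * (f - A.T @ y)

# Check both block equations of the full system.
r1 = np.linalg.norm(g_diag * x + A.T @ y - f) / np.linalg.norm(f)
r2 = np.linalg.norm(A @ x - g) / np.linalg.norm(g)
print(f"relative residuals: {r1:.1e}, {r2:.1e}")
```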
It is unclear whether the reduction to normal form is a viable approach for truly large scale problems, however. With a zero Hessian and simple upper bounds on variables, the normal matrix becomes A(Z~lS)AT, and its condition number can become astronomical. In addition, the reduction to normal form can result in substantial fill-in. When m is small this is not a serious issue, but for problems with large m and n, there may be no effective way to factor the normal matrix. It should also be noted that the reduction to Schur complement form amounts to a forced ordering of variables and limits pivoting strategies. Even so, the reduction approach is fairly standard for L P problems, and is employed by LIPSOL [36]. To deal with the presence of extremely large and small pivots in the factors, special modifications of the Cholesky decomposition are necessary [27]. For general QP problems, the default linear solver used by IPOPT and Chapter 2. Interior Point Methods for Quadratic Programming 22 OOQP is the MA27 routine [14]. This routine performs a sparse symmetric indefinite factorization of the step matrix. OOQP also allows for solving the block 3 x 3 step equation 2.5 to allow for more flexible selection of pivots. In general, the direct approaches outlined here are very fast for many of the problems in the C U T E r [21] and other test suites. Even for sparse problems in the tens of thousands of unknowns, the direct approach performs very well. As problem size increases, though, these methods may break down. With few excep-tions, general sparse problems will lead to significant fill-in in computed factors, and the time required to factor the step matrix will be unacceptable. The trend towards larger problems will undoubtedly leave direct solve approaches behind. 2.4.2 Iterative Solutions The continued growth in the size of L P and QP problems has placed greater demand on iterative methods and preconditioning techniques specifically de-signed for the step equations. The advantage of an iterative approach primarily lies in the fact that many optimization problems involve very sparse matrices, and direct methods can lead to unacceptable fill-in. As discussed in Section 2.3, the matrices of interest become extremely ill-conditioned. With the basic understanding of the convergence of Krylov methods given in Section 1.1, it is evident that a non-preconditioned approach will not perform well. While generic preconditioning techniques may be used, they do not take ad-vantage of the saddle point structure of the step equation matrix. More special-ized block preconditioners have been developed to deal with matrices with block structure. A comprehensive discussion of block preconditioning techniques, in-cluding block diagonal and block triangular preconditioners, can be found in [5]. We now describe a few of these techniques and how they can be applied to the step equation for interior point methods (2.6). For the remainder of this section, we will simplify our notation by represent-Chapter 2. Interior Point Methods for Quadratic Programming 23 ing saddle point systems as G AT A = V A 0 Constraint Precondi t ion ing A popular choice of block preconditioner for saddle point systems is the con-straint preconditioner [25]. This has the form where M is an approximation of G. As shown in [25], M.~XA has the eigenvalue A = 1 with algebraic multiplicity 2m. The geometric multiplicity, however, is m. 
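This multiplicity claim is easy to check numerically. The sketch below (Python/NumPy; a small dense example with M = diag(G), purely illustrative) forms the constraint preconditioner for a random saddle point matrix and counts how many eigenvalues of the preconditioned matrix are (numerically) equal to one.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 60, 20

B = rng.standard_normal((n, n))
G = B @ B.T + np.eye(n)                           # symmetric positive definite (1,1) block
A = rng.standard_normal((m, n))                   # full row rank constraint matrix

K = np.block([[G, A.T], [A, np.zeros((m, m))]])   # saddle point matrix

# Constraint preconditioner: keep the constraint blocks, replace G by diag(G).
M_block = np.diag(np.diag(G))
P = np.block([[M_block, A.T], [A, np.zeros((m, m))]])

eigs = np.linalg.eigvals(np.linalg.solve(P, K))   # spectrum of P^{-1} K
near_one = int(np.sum(np.abs(eigs - 1.0) < 1e-4))
print(f"eigenvalues equal to 1 (numerically): {near_one}, expected at least 2m = {2 * m}")
print(f"remaining eigenvalues: {eigs.size - near_one}, expected n - m = {n - m}")
```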
A careful choice of M can lead to further clustering, and even those eigenvalues that are not unity may be well distributed. In [25], constraint preconditioners are applied to the QP step equation, and are shown to perform well in comparison to the standard direct solvers. Fun-damental concerns exist with regards to these preconditioners, though. As an obvious drawback, we point out that constraint preconditioners are indefinite, which limits the choices of Krylov solvers they can be applied to. For many problems of interest, m may be small relative to n. The number of distinct eigenvalues of the preconditioned matrix is then n — m + 1 which could be very large. Even worse, since the geometric multiplicity of A = 1 is only m, the number of iterations required for convergence could be n + 1. With m,n both large, many costly Krylov iterations would be required to solve a saddle point system with a constraint preconditioner. Recall also that each step of a preconditioned Krylov method requires the solution of a linear system of the form Mx = b. How can M be chosen so that solving linear systems with the preconditioner repeatedly is easier than solving Chapter 2. Interior Point Methods for Quadratic Programming 24 linear systems with the original saddle point matrix? A common choice of M for constraint preconditioners is M = diag(G). When G is diagonal, though, the resulting preconditioner is equal to the original saddle point matrix. Is there an efficient method for solving saddle point systems with diagonal (1,1) blocks? If so, then the entire problem of linear programming would be greatly simplified (since the L P step matrix has a diagonal (1,1) block). While reducing such systems to normal form is a common approach, it can lead to significant fill-in, and worse conditioning. For problems with m and n extremely large, this will not be appropriate. Variants of constraint preconditioners have also been explored in the context of optimization. A n example of this is where the preconditioner involves an approximate constraint matrix in the (1,2) and (2,1) blocks. Such methods are still under investigation [8]. Augmentation Preconditioning The focus of this thesis is the analysis and application of a new class of precondi-tioners for the saddle point matrix problem. Our interests are in preconditioners with augmented (1,1) blocks. We consider preconditioners of the form and other variants. These are referred to as augmentation preconditioners be-cause of the augmentation term ATW~1A in the (1,1) block. Theoretical prop-erties of this preconditioner and its relatives will be explored in the following chapters, but we mention now that the number of distinct eigenvalues of the preconditioned matrix is less than m. 25 Chapter 3 Augmentation Preconditioning For Interior Point Methods This chapter was originally submitted as a paper to the SI A M Journal on Sci-entific Computing, special issue on the ninth Copper Mountain conference. The paper was co-authored with Chen Greif. 3.1 Introduction Interior point methods for solving linear and quadratic programming problems have been gaining much popularity in the last two decades. These methods have forged connections between previously disjoint fields and allowed for a fairly gen-eral algebraic framework to be used; see, for example, [17] for a comprehensive survey. The size of many problems of interest is very large and the matrices involved are frequently sparse and often have a special structure. 
As a result, there is an increasing interest in iterative solution methods for the saddle point linear systems that arise throughout the iterations. The general optimization framework is as follows. Consider the quadratic programming (QP) problem min x \xT Rx + cTx subject to : Ax = b, Cx > d. Chapter 3. Augmentation Preconditioning For Interior Point Methods 26 Here, x G R n , H is an n x n Hessian, often symmetric positive semi-definite, and c is an n x 1 vector; the constraint matrix A is m\ x n, with m i < n, and we assume it has rank mi. Inequality constraints are expressed in the m2 x n matrix C. Often simple bounds on the variables are given, so C is an identity matrix or its concatenation with a negative identity matrix. When H is symmetric positive semi-definite and the constraints are linear, satisfaction of the first-order K K T conditions is sufficient to guarantee global optimality of a solution [28, Chap. 16]. If Lagrange multipliers y, z and slack variables s are introduced, the K K T conditions for this problem are ' Hx - ATy - CTz = -c, Ax = b, Cx — s = d, s > 0, z > 0, sTz = 0. Typical interior point methods [28, 34] for QP problems define a function whose roots coincide with the K K T conditions and take Newton steps to pro-gressively approach an optimal solution. Predictor and corrector solves are performed at each step to ensure sufficient progress towards the optimal point occurs. After elimination of some of the unknowns, we obtain a saddle point system of the form (H + CTS-1ZC AT\ I Ax \ A 0 ) \-Ay where S and Z are diagonal and keep changing throughout the QP iteration. The right hand side, u, is related to residual measures of the K K T conditions satisfaction, and depends on whether a predictor or corrector step is being computed [28, 34]. While we will focus primarily on QP, it is worth also considering the linear Chapter 3. Augmentation Preconditioning For Interior Point Methods 27 programming (LP) problem, which is formulated as min cTx x 6 E n subject to : Ax = 6, x > 0. L P problems share similarities with QP; in fact they can be classified as simple subproblems, with a zero Hessian and further simplifications. It is convenient to present the corresponding linear system in the 2 x 2 block form where D is diagonal and changes throughout the L P iteration, and v is con-structed in a similar manner to (3.1). For both L P and QP the saddle point matrix becomes highly ill-conditioned as the solution is approached, due to increased ill-conditioning of the Hessian. In the typical case when inequality constraints are simple bounds on the primal variables, H+CTS~1ZC is a diagonal perturbation of H. The complementarity of S and Z gives rise to extremely small and extremely large values in S~lZ. Both for L P and QP, D and H + CTS~1ZC, respectively, will never be exactly singular, but will approximate singularity as the solution is approached. This is a key property that we exploit to our advantage in our derivation of the proposed preconditioner. Thorough treatments of the theory involved in interior point methods for linear and quadratic programming can be found, for example, in [28, 34]. The development of using a predictor and corrector step calculation at each iter-ation of the solve is originally presented in [26]. These sources point out the importance of efficiently solving the step equations, and identify the difficul-ties involved. Many software packages such as IPOPT [33], LIPSOL [35], and (3.2) Chapter 3. 
Augmentation Preconditioning For Interior Point Methods 28 OOQP [19] use direct solvers to solve the step equations. While this is a sound approach for certain problems, it may suffer the combined ailments of poor scal-ing with problem size and significant increase in ill-conditioning of solutions as the QP or L P solution is approached. Special care must be taken in matrix factorizations to deal with the presence of large and small pivots [27]. These factors motivate the study of iterative methods in the context of optimization. Modern solution techniques like Krylov subspace methods rely on the ease of sparse matrix-vector products, and converge in a rate dependent on the number of distinct eigenvalues of the preconditioned matrix [11, 30]. In this paper we study a preconditioner that has the property that the more ill-conditioned the (1,1) block of the saddle point matrix is, the better a mini-mum residual solver (such as MINRES) performs. Therefore, the corresponding solver is particularly effective in the last few iterations of the L P or QP solver. Our approach is based on augmentation of the (1,1) block using a weight ma-trix. Augmentation has been used in several areas of applications and in many flavors (see for example [5] and references therein). Our particular methodology extends recent work done by Greif & Schotzau [23, 24] into new directions and introduces a technique that well fits into the algebraic framework of interior point methods for optimization problems. Throughout the paper, we discuss analytical and numerical properties of our proposed preconditioner and point out spectral distribution results. As we discuss later on, the technique involves the choice of a weight matrix, and a sensible choice is necessary for assuring that the inner iterations are not compu-tationally costly. We discuss these aspects and provide numerical evidence that supports our analytical findings. The use of iterative methods in constrained optimization also relies on the notion of inexact interior point methods. These have been investigated in [2, 3, 18]. Their findings justify the use of approximate Chapter 3. Augmentation Preconditioning For Interior Point Methods 29 solutions at each step of the method. Our preconditioner is part of a growing set of preconditioned iterative ap-proaches for solving the optimization problem. A preconditioning technique that has emerged recently as a popular choice is the class of constraint precon-ditioners (see Keller, Gould and Wathen [25] and references therein), which rely on leaving the constraints intact, and seeking to replace the (1,1) block by a matrix that is much easier to invert. Recent work by Dollar and Wathen [13] introduces implicit factorizations that further facilitate the use of constraint preconditioners. Similar factorizations are applied to regularized saddle point systems by Dollar, Gould, Schilders and Wathen in [12]. Forsgren, Gil l and Griffin [16], extend constraint-based preconditioners to deal with regularized saddle point systems using an approximation of the (1,1) block coupled with an augmenting term (related to a product with the constraint matrix and reg-ularized (2,2) block). The technique is intended for interior point methods for general constrained optimization problems. In [8], Bergamaschi, Gondzio, and Zilli employ constraint-based preconditioners with (1,1) blocks equal to the di-agonal of the saddle point matrix. 
This simple choice allows for a factorization of the preconditioner, or its reduced normal equation form. In related work the authors explore approximate constraint preconditioners [7]. Other block structured preconditioning approaches are also available. For example, Oliveira and Sorensen [29] consider linear programming and make use of block triangular preconditioners that have the constraint matrix in their (1,2) block and easy to invert matrices in the main diagonal. The preconditioned matrix has an eigenvalue A = 1 with algebraic multiplicity n, and since for linear programs G is diagonal, the preconditioner can be factored and solved with efficiently. The remainder of this paper is organized as follows. In Section 3.2 the aug-Chapter 3. Augmentation Preconditioning For Interior Point Methods 30 mentation preconditioner and its general form are presented, algebraic prop-erties of augmentation preconditioners are derived, and the motivation for the block diagonal form is given. In Section 3.3.1 the choice of the weight matrix W and the inner solve are discussed. This is followed by two schemes for re-ducing fill-in in the preconditioner. In Section 3.4 we discuss numerical results demonstrating performance of the preconditioner. In Section 3.5 we draw some conclusions. 3.2 The Preconditioning Approach We will adopt the general notation to represent the saddle point matrices of equations (3.1) and (3.2). We assume G is symmetric, positive semi-definite, and that A is of size n x m and has full row rank. 3.2.1 A Block Triangular Preconditioner Consider the preconditioner where fc is a scalar, and W is an m x m symmetric positive definite weight matrix. The eigenvalues of the preconditioned matrix M~lA satisfy the generalized eigenvalue problem (3.3) M = (3.4) Chapter 3. Augmentation Preconditioning For Interior Point Methods 3 1 The second block row gives y = \W lAx, and substituting it into the first block equation gives A(A - l)Gx + (A 2 + fcA - l)ATW~1Ax = 0 . Regardless of the choice of fc, we see that A = 1 with multiplicity n — m (equal to the nullity of A). For each null vector of G we can find two A values satisfying A 2 + fcA — 1 = 0 . Thus we have A± = ~ f c ± ^ f c 2 + 4 - , each with algebraic multiplicity p. The remaining 2(m — p) eigenvalues satisfy Since both G and ATW~1A are positive semi-definite, we must have A 2 - A A 2 + fcA - 1 thus we can write > 0 , A 2 - A 2 A 2 + fcA - 1 ~ M for some \i G IR, p, > 0 . We can rearrange this to ( 1 + M 2 ) A 2 + (ku.2 - 1 ) A - M 2 = 0 , giving _ _ ( f c M 2 - 1 ) ± ^ ( f c ^ _ 1 ) 2 + 4 m 2 ( 1 + ^ 2 ) A - : 2aT7) ^ ' ( } This expression gives an explicit formula in terms of the generalized eigenvalues of ( 3 . 5 ) and can be used to identify the intervals in which the eigenvalues lie. To illustrate this, we examine the case fc = — 1 , which corresponds to set-ting the ( 1 , 2 ) block of the preconditioner to be — AT. We have A = 1 with multiplicity n — m, and A ± = , each with multiplicity p. By ( 3 . 6 ) we have Chapter 3. 
Augmentation Preconditioning For Interior Point Methods 32 Since A + is a strictly increasing function of /J, on (0, oo) (and A_ are strictly decreasing), the intervals containing the remaining eigenvalues can be found using limM_,o,oo ^±(M)- From this one finds that the remaining eigenvalues lie in the intervals ( 1~ 2 V^, 0) U (1, 1 + 2 V ^ ) - I t ' s worth noting that since G is typically highly singular, many of the generalized eigenvalues are large, in which case the corresponding eigenvalues A± are bounded away from zero. For example, many of the negative ones will tend to 1 ~ 2 V ^ . It is apparent that the separation of the 2p eigenvalues (k ± \Jk2 + 4)/2 becomes large as grows. Since those eigenvalues are unbounded as k goes to co, we conclude that k should be chosen to be of moderate size. 3.2.2 A Block Diagonal Preconditioner The choice k = 0 yields a block diagonal, symmetric positive definite precon-ditioner, suitable for use with minimal residual methods based on short re-currences, such as MINRES. Furthermore, the formulas given in the previous section may indicate special clustering properties for this case. This motivates us to further study this choice. The preconditioner (for k = 0) is of the form: (G + ATW-1A CM M = \ (3.7) I 0 WJ Assume G is positive semi-definite, with nullity p. Suppose further that A has full rank, and choose W to be symmetric positive definite. It is straightforward to show that if A is non-singular, then G+ATW~1A must be symmetric positive definite. The following spectral clustering theorem demonstrates the effective-ness of the preconditioner, especially when the (1,1) block of A is singular. Part of the results presented below were recently proved in [23], but we offer here additional results related to the reduced space generated by projections onto the Chapter 3. Augmentation Preconditioning For Interior Point Methods 33 null space of A, and prove our results using orthogonal transformations, taking similar steps to those taken in [25]. Theorem 1. The preconditioned matrix M~lA has eigenvalues A = 1 with multiplicity n, and A = — 1 with multiplicity p. The corresponding eigenvectors can be explicitly found in terms of the null space and column space of A. The remaining eigenvalues lie in the interval (—1,0) and satisfy the mxm generalized eigenvalue problem RW-lRTx = A [CT(ZTGZ)-1C - QTGQxq - RW'1RT] x, (3.8) where C = ZTGQ, Z is an orthogonal basis for the null space of A, and QR = AT is the QR factorization of AT. Proof. The eigenvalues of the preconditioned matrix M ~lA can be found through the generalized eigenvalue problem G + ATW-lA 0 (3.9) We transform this system to observe its behavior on the null space of A. Let QR = AT be the QR factorization of AT. Since AT is n x TO, Q is n x TO, and R is m x TO. Define Z to be a n x (n — m) orthogonal basis for the null space of A. Since Z UQ forms an orthogonal basis for R", any vector x £ R " can be written as x = Zxz + Qxq. Following the spirit of the proof of [25, Thm. 2.1], we define the (n + m) x (n + TO) matrix P = 1' Z Q 0^ 0 0 / (3.10) and perform a similarity transformation as follows. We express x = Zxz +Qxq, and let v = (xz,xq,y)T where Pv = (xz,xq:y)T. The generalized eigenvalue Chapter 3. Augmentation Preconditioning For Interior Point Methods 34 problem can then be written as PTAPv = \PTMPv. 
This yields
\[
\begin{pmatrix} Z^T G Z & Z^T G Q & 0 \\ Q^T G Z & Q^T G Q & R \\ 0 & R^T & 0 \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix}
= \lambda
\begin{pmatrix} Z^T G Z & Z^T G Q & 0 \\ Q^T G Z & Q^T G Q + R W^{-1} R^T & 0 \\ 0 & 0 & W \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix}. \qquad (3.11)
\]
By inspection, we observe that setting λ = 1 reduces the system to
\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & -R W^{-1} R^T & R \\ 0 & R^T & -W \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix} = 0.
\]
Let e_i denote the i-th column of the identity matrix. Evidently there are n - m corresponding eigenvectors that can be written in the form (x_z, x_q, y) = (e_i, 0, 0). In addition, m linearly independent eigenvectors can be written in the form (x_z, x_q, y) = (0, e_i, W^{-1} R^T e_i).

Now consider λ = -1. Equation (3.11) reduces to
\[
\begin{pmatrix} 2Z^T G Z & 2Z^T G Q & 0 \\ 2Q^T G Z & 2Q^T G Q + R W^{-1} R^T & R \\ 0 & R^T & W \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix} = 0.
\]
Any vector x* = Z x_z* + Q x_q* in the null space of G obeys G(Z x_z* + Q x_q*) = 0. There are p such vectors, so p linearly independent eigenvectors of the form (x_z, x_q, y) = (x_z*, x_q*, -W^{-1} R^T x_q*) will satisfy (3.11) with λ = -1.

To derive an expression for the remaining eigenvalues, λ ≠ ±1, we reduce Equation (3.11) to an eigenvalue problem in x_q. From the third block row of (3.11), y = (1/λ) W^{-1} R^T x_q. The first block row can be reduced to x_z = -(Z^T G Z)^{-1} Z^T G Q x_q. Substituting these values into the second block row of (3.11) and simplifying yields (3.8) with x = x_q. By [23, Theorem 2.2] those eigenvalues lie in the interval (-1, 0). ∎

Theorem 1 illustrates the strong spectral clustering obtained when the (1,1) block of the saddle point matrix is singular, which is precisely the context of interior point methods: a well-known property (and difficulty) associated with interior point methods is the increased ill-conditioning and singularity of the (1,1) block as the solution is approached.
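The clustering predicted by Theorem 1 is easy to check numerically. The following sketch is not part of the original experiments; it assumes only that NumPy is available and uses small random matrices purely for illustration. It builds a saddle point matrix with a rank-deficient (1,1) block, forms the block diagonal preconditioner (3.7) with W = I, and counts the preconditioned eigenvalues at ±1.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, p = 40, 15, 10                          # p = nullity of G (illustrative sizes)
    A = rng.standard_normal((m, n))               # full row rank constraint matrix
    B = rng.standard_normal((n - p, n))
    G = B.T @ B                                   # symmetric positive semi-definite, nullity p
    W = np.eye(m)

    Acal = np.block([[G, A.T], [A, np.zeros((m, m))]])                 # saddle point matrix (3.3)
    Mcal = np.block([[G + A.T @ np.linalg.solve(W, A), np.zeros((n, m))],
                     [np.zeros((m, n)), W]])                           # preconditioner (3.7)

    evals = np.sort(np.linalg.eigvals(np.linalg.solve(Mcal, Acal)).real)
    print("eigenvalues equal to  1:", int(np.sum(np.isclose(evals, 1.0))))   # expect n
    print("eigenvalues equal to -1:", int(np.sum(np.isclose(evals, -1.0))))  # expect p
    rest = evals[~np.isclose(np.abs(evals), 1.0)]
    print("remaining eigenvalues in (-1, 0):", bool(np.all((rest > -1.0) & (rest < 0.0))))

In exact arithmetic the two counts are n and p, and the remaining m - p eigenvalues fall in (-1, 0), in agreement with the theorem.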
3.3 Practical Considerations and Computational Cost

We now discuss the choice of the weight matrix W and ways of reducing the cost of the inner iterations. We also describe procedures for dealing with a dense row in A.

3.3.1 The Inner Iteration and Choices of the Weight Matrix

There are two critical issues to consider in the application of the proposed preconditioner. First, the weight matrix W must be chosen. Secondly, given a weight matrix, an efficient method of factoring or iteratively solving systems with the preconditioner must be sought. These considerations are motivated by the fact that each iteration of a preconditioned Krylov subspace method requires the solution of a linear system of the form Mx = b; based on the block structure of M, this requires solving systems with G + A^T W^{-1} A and with W.

The simplest choice of a weight matrix is diagonal, which clearly makes inverting W trivial. A simple one-parameter choice is a scaled identity. Letting W = γI, γ can be chosen so that the augmenting term (1/γ)A^T A is of norm comparable to G; see, for example, [20] for a related discussion. Note that since G changes at each step, γ must also be updated.

For LP, G is diagonal, and choosing 1/γ related to an ordered statistic, such as the mean, median or maximum of the diagonal entries of G, has proven effective in our experiments in reducing the number of MINRES iterations required at each step. This is illustrated in Figure 3.1 for the NETLIB problem "tuff", where the MINRES iteration counts (in the predictor step) are plotted against each step of the LP solve. An arbitrary fixed choice of 1/γ leads to a "hump" with a large number of intermediate outer iteration counts. As predicted by our theoretical results, close to the solution the near-singularity of the (1,1) block results in fast convergence regardless of the choice of γ. But as illustrated in the figure, it is the dynamic choice of 1/γ as the maximal entry of G that yields rapid convergence of MINRES throughout the LP iteration; in fact the "hump" is flattened in this case. The choice 1/γ = max(D), where D denotes the diagonal (1,1) block, produces a sequence of values of 1/γ that increases monotonically from approximately 1 to approximately 10^10 over the course of the iteration, and this choice significantly outperforms the others in terms of the number of MINRES iterations, while the cost of each inner solve is unchanged.

Figure 3.1: MINRES iteration counts with various W = γI. Problem "tuff" has m = 292, n = 617 after LIPSOL preprocessing. MINRES iteration counts are plotted at each LP step for the various choices of γ. The MINRES error tolerance was set to 10^{-8}.

For QP, similar approaches are possible. With W = γI, a choice of γ such as ||A||^2 / ||G|| (or an approximation thereof) ensures that the norm of the augmenting term is not too small in comparison with G.

For solving Mx = b, iterative and direct methods are possible. In a purely iterative scheme, simple preconditioners for G + A^T W^{-1} A can be computed, and the inner iteration can be solved using PCG. In this case G + A^T W^{-1} A does not need to be explicitly formed; it is used solely in matrix-vector products. Such a product can be computed quickly by computing a product with A, then with A^T, and adding the result (scaled by 1/γ) to a product with G. Our experiments were based on this approach, making use of an incomplete LU factorization as the inner preconditioner and solving with PCG.

Symmetric positive definiteness of the preconditioner also allows for the use of a sparse Cholesky factorization. By explicitly factoring G + A^T W^{-1} A, a single factorization can be reused for both the predictor and the corrector step at each iterate (since the saddle point matrix changes between iterates, not between the predictor and corrector steps). When G and A are narrow-banded, the preconditioner itself is narrow-banded and a sparse Cholesky factorization can be applied efficiently.
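The matrix-free inner solve described above can be sketched as follows. This is an illustration only, under the assumption that SciPy's sparse module is available, that A and G are stored as sparse matrices, and with a simple Jacobi preconditioner standing in for the incomplete factorization used in our experiments; all names are illustrative.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, cg

    def augmentation_preconditioner(G, A, gamma):
        """Return a LinearOperator applying M^{-1} for the block diagonal
        preconditioner (3.7) with W = gamma*I, without forming G + (1/gamma) A^T A."""
        n, m = G.shape[0], A.shape[0]
        K = LinearOperator((n, n), matvec=lambda v: G @ v + (A.T @ (A @ v)) / gamma,
                           dtype=float)
        # Jacobi preconditioner for the inner PCG: diag(G) + (1/gamma) * diag(A^T A).
        d = G.diagonal() + np.asarray(A.power(2).sum(axis=0)).ravel() / gamma
        P = sp.diags(1.0 / np.maximum(d, 1e-12))
        def apply(r):
            x, _ = cg(K, r[:n], M=P)        # inexact inner solve; its tolerance can be loosened
            return np.concatenate([x, r[n:] / gamma])
        return LinearOperator((n + m, n + m), matvec=apply, dtype=float)

    # A dynamic LP-style weight, as discussed above: 1/gamma = max of the diagonal of G.
    # gamma = 1.0 / max(G.diagonal().max(), 1e-12)

The resulting operator can be passed as the preconditioner argument of scipy.sparse.linalg.minres applied to the saddle point matrix. Strictly speaking, the inexact inner solve makes the preconditioner slightly nonlinear, which is the usual caveat for this kind of inner-outer iteration.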
3.3.2 Dealing with a Dense Row in A

The presence of even a single dense row in A can lead to a fully dense augmenting matrix A^T W^{-1} A. Arguably we never need to explicitly form this matrix, as it can be used only in matrix-vector products in an inner iteration. We present two possible approaches for dealing with dense rows in the situation where it is desirable to explicitly form the (1,1) block of the preconditioner.

First, we present an asymmetric preconditioner, motivated by the analysis of Section 3.2.1. With a_i denoting the dense column i of A^T, and e_i being the i-th column of the m x m identity, we define the preconditioner
\[
\bar{\mathcal{M}} = \begin{pmatrix} G + A^T \bar{W} A & -a_i e_i^T \\ 0 & W \end{pmatrix}. \qquad (3.12)
\]
Suppose W = γI for some γ > 0, and let \bar{W} = (1/γ)I - (1/γ) e_i e_i^T. Assuming the preconditioner is non-singular, the eigenvalues of the preconditioned matrix are given by the following theorem.

Theorem 2. The preconditioned matrix \bar{\mathcal{M}}^{-1} A has λ = 1 with multiplicity n - 1 and λ = -1 with multiplicity p. Corresponding eigenvectors can be explicitly found in terms of the null space and column space of A.

Proof. Exactly as in the proof of Theorem 1 we define Q, R, Z and transform the generalized eigenvalue problem using P as in (3.10). This yields
\[
\begin{pmatrix} Z^T G Z & Z^T G Q & 0 \\ Q^T G Z & Q^T G Q & R \\ 0 & R^T & 0 \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix}
= \lambda
\begin{pmatrix} Z^T G Z & Z^T G Q & 0 \\ Q^T G Z & Q^T G Q + \tfrac{1}{\gamma} R R^T - \tfrac{1}{\gamma} r_i r_i^T & -r_i e_i^T \\ 0 & 0 & \gamma I \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix}, \qquad (3.13)
\]
where r_i denotes the i-th column of R. As before, by inspection we check λ = 1, which reduces the equation to
\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & -\tfrac{1}{\gamma} R R^T + \tfrac{1}{\gamma} r_i r_i^T & R + r_i e_i^T \\ 0 & R^T & -\gamma I \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix} = 0.
\]
Immediately we see n - m corresponding eigenvectors of the form (x_z, x_q, y) = (u, 0, 0), for n - m linearly independent vectors u. Additional eigenvectors can be found by seeking consistent solutions in the free variables x_q, y to the equation
\[
\begin{pmatrix} -\tfrac{1}{\gamma} R R^T + \tfrac{1}{\gamma} r_i r_i^T & R + r_i e_i^T \\ R^T & -\gamma I \end{pmatrix}
\begin{pmatrix} x_q \\ y \end{pmatrix} = 0.
\]
Substituting y = (1/γ) R^T x_q, this requires (2/γ) r_i r_i^T x_q = 0. In general we can find exactly m - 1 eigenvectors orthogonal to r_i. That is, there are m - 1 eigenvectors of the form (x_z, x_q, y) = (0, x_q*, (1/γ) R^T x_q*), where x_q* is orthogonal to r_i, corresponding to λ = 1.

The p eigenvectors corresponding to λ = -1 are also evident by inspection. Substituting λ = -1 requires finding a solution to
\[
\begin{pmatrix} 2Z^T G Z & 2Z^T G Q & 0 \\ 2Q^T G Z & 2Q^T G Q + \tfrac{1}{\gamma} R R^T - \tfrac{1}{\gamma} r_i r_i^T & R - r_i e_i^T \\ 0 & R^T & \gamma I \end{pmatrix}
\begin{pmatrix} x_z \\ x_q \\ y \end{pmatrix} = 0.
\]
Vectors x_z, x_q, y can be found to solve this equation. Consider any x* = Z x_z* + Q x_q* in the null space of G. Then G Z x_z* + G Q x_q* = 0, and we are left with finding a y such that
\[
\begin{pmatrix} \tfrac{1}{\gamma} R R^T - \tfrac{1}{\gamma} r_i r_i^T & R - r_i e_i^T \\ R^T & \gamma I \end{pmatrix}
\begin{pmatrix} x_q^{*} \\ y \end{pmatrix} = 0.
\]
The choice y = -(1/γ) R^T x_q* correctly cancels the left-hand side for the fixed x_q*, and it becomes apparent why the minus sign was chosen for the (1,2) block of the preconditioner; without it, we could not explicitly find a suitable y. Since the p vectors in the null space of G are linearly independent, we have constructed p linearly independent eigenvectors of \bar{\mathcal{M}}^{-1} A corresponding to λ = -1. ∎

Theorem 2 shows that the preconditioner is sparse while maintaining strong spectral clustering. The preconditioner is asymmetric, though, and it is desirable to find a simpler form that still retains the strong spectral properties. To this end, consider replacing the preconditioner of Equation (3.12) with
\[
\widehat{\mathcal{M}} = \begin{pmatrix} G + A^T \widehat{W} A & 0 \\ 0 & W \end{pmatrix},
\]
where \widehat{W} is again an approximation to W^{-1}. Similarly to the asymmetric case, consider the choice
\[
\widehat{W} = W^{-1} - \tfrac{1}{\gamma} e_i e_i^T.
\]
The matrix \widehat{W} is diagonal but singular, since each of its rows that corresponds to a dense row of A is identically zero. As a result, the matrix A^T \widehat{W} A no longer experiences fill-in from the product a_i a_i^T.

This modification does not result in significant changes in the spectral clustering of the preconditioned matrix. Since \widehat{\mathcal{M}} is a rank-1 perturbation of M (from Equation (3.7)), it follows that \widehat{\mathcal{M}}^{-1} A is a rank-1 perturbation of M^{-1} A and we can apply the interlacing theorem. We note that for the interlacing theorem to hold we need a symmetric formulation of the problem, which can easily be obtained by an appropriate similarity transformation. If we let μ_i denote the i-th largest eigenvalue of \widehat{\mathcal{M}}^{-1} A, and λ_i the i-th largest eigenvalue of M^{-1} A, the interlacing theorem guarantees that
\[
\lambda_{i-1} \le \mu_i \le \lambda_i.
\]
Since the eigenvalues λ_i are known and have high algebraic multiplicities, so do the eigenvalues μ_i: for each multiple eigenvalue λ_i, the multiplicity of the corresponding μ_i goes down by at most one due to interlacing. Thus, if preconditioned MINRES is used, we have strong spectral clustering without introducing any fill-in. We can summarize our findings as follows.

Proposition 1. Assume \widehat{\mathcal{M}} is non-singular. Then the preconditioned matrix \widehat{\mathcal{M}}^{-1} A has λ = 1 with multiplicity at least n - 1, and λ = -1 with multiplicity at least p - 1.
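As a small illustration of the symmetric variant above, the modified augmented block can be formed as follows. This is a sketch only; it assumes W = γI, SciPy sparse storage for G and A, and that the offending row indices are either supplied or detected by a simple nonzero count (the 50% threshold below is arbitrary).

    import numpy as np
    import scipy.sparse as sp

    def augmented_block_without_dense_rows(G, A, gamma, dense_rows):
        """Form G + A^T W_hat A, where W_hat = (1/gamma) I with the entries belonging
        to the dense rows of A set to zero, so that no fill-in from a_i a_i^T occurs."""
        m = A.shape[0]
        w = np.full(m, 1.0 / gamma)
        w[list(dense_rows)] = 0.0
        return (G + A.T @ sp.diags(w) @ A).tocsr()

    # One way to flag dense rows: more than half of the entries in the row are nonzero.
    # dense_rows = np.where(np.diff(A.tocsr().indptr) > 0.5 * A.shape[1])[0]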
3.4 Numerical Experiments Numerical experiments were conducted on problems from the C U T E r test suite [21], using MATLAB , on an Intel 2.5GHZ processor with 2GB of R A M . In our experiments we focused on QP test problems with a non-diagonal and semi-definite (1,1) block, for which our preconditioner is suitable. We also illustrate how the number of iterations required by MINRES drops to its theoretical limit and how inexact inner iterations reduce the overall computational work. Results are also included for the row removal scheme discussed in Section 3.3.2. Chapter 3. Augmentation Preconditioning For Interior Point Methods 42 We used a variety of methods for solving the inner iteration Mx = 6, but most of our experiments made use of I L U P A C K [9]: This package uses multi-level incomplete L U factorizations as preconditioners for conjugate gradient and G M R E S methods, and was found to be efficient and easy to use. Tables 3.1-3.4 demonstrate several measures of the work required in the application of our method. In these tables, n , T O I , T O 2 denote the dimensions of the problem being solved, as defined in the Introduction, and NQP denotes the total number of QP .steps required for convergence. The average number of Krylov solver steps per QP steps is given by NK • The average number of iterations of P C G used by I L U P A C K in the inner iteration, and the total number summed over all QP steps are given by Ni and Tot(Ni) respectively. The time (in seconds) required to solve a problem is given in the column T. ^ The first two tables show results for applying BICG-STAB, once with a tight outer tolerance of ICT 6 and once with a loose outer tolerance of 1 0 - 2 . The third and the fourth tables show results using the same tolerances, with MINRES. The following observations can be made. In general, loosening tolerance for the Krylov solver increases the overall number of QP iterations only modestly, and at the same time substantially reduces (in most tested examples) the overall number of solves. We mention here that loosening the convergence tolerance of the inner-most iterations did not result in a similar reduction of computational work. We therefore observe in our experiments that inexactness is more effective on the level of the outer Krylov iterations rather than on the level of the inner-most iterations. Comparing the performance of B iCG-STAB to the performance of MINRES is not within our stated goals, but having results using more than one Krylov solver allows us to confirm the consistency of convergence behavior for most problems. With the exception of a small number of problems (such as "static3" Chapter 3. Augmentation Preconditioning For Interior Point Methods 43 and "steenbra"), the two solvers behave in a similar manner, and the modest running times indicate that the proposed preconditioner seems to be efficient and robust. Problem n mi m 2 NQP NK Nj Tot{Nj) Time (s) avgasa 12 10 18 5 2.80 2.00 112 0.22 avgasb 12 10 18 5 2.75 2.00 110 0.21 blockqpl - 10 26 11 51 4 1.75 2.71 76 0.15 blockqpl - 10 26 11 51 . 4 .1.81 3.07 89 0.14 blockqp3 - 10 26' 11 51 6 2.00 3.42 164 0.25 . blockqpA - 10 26 11 51 5 2.50 3.44 172 0.22 blockqp5 - 10 26 11 . 51 6 2.38 3.25 185 0.28 cvxqpl — 100 100 50 200 34 3.66 4.67 3840 3.75. cvxqpl — 100 100 25 200 13 3.06 3.57 567 0.81 cvxqpZ — 100 100 75 200 13 4.06 3.47 732 1.10 duall 85 1 170 7 1.75 4.26 362 0.99 dual2 96 . 
1 192 5 1.50 3.37 101 0.57 dual3 111 1 222 5 1.50 3.07 92 0.71 dualA 75 1 150 5 1.45 3.55 103 0.37 govldqpl 699 349 1398 10 1.82 2.08 152 1.00 gouldqp2 - 30 59 29 118 7 1.64 2.59 119 0.25 gouldqp2> 699 349 1398 11 1.52 3.03 203 1.07 gouldqpS - 30 59 29 118 5 1.55 2.71 84 0.17 statici 434 96 144 20 1.71 3.42 469 1.57 steenbra 432 108 432 11 1.86 5.28 433 2.12 Table 3.1: Solver results using BICG-STAB. Problems were solved to a tolerance of 1.0e-06. BICG-STAB error tolerance was fixed at 1.0e-02. I L U P A C K error tolerance was set to 1.0e-06. Next, to demonstrate the application of the row removal scheme proposed in Section 3.3.2 we consider the "blockqp" set of problems. These problems are characterized by a Hessian with two non-zero diagonals, and a constraint matrix with a single non-zero diagonal, a dense row, and a dense column. As a result, if the augmentation preconditioner is fully formed, it will be dense. To avoid this, the symmetric row removal scheme is used. This leads to a preconditioner with a nearly diagonal (1,1) block, which can be approximately factored with I L U P A C K in an efficient manner. Theorem 1 guarantees increased spectral clustering of the preconditioned Chapter 3. Augmentation Preconditioning For Interior Point Methods 44 Problem n m i m 2 NQP NK Ni Tot{Ni) Time (s) avgasa 12 10 18 3 12.00 2.00 288 0.53 avgasb 12 10 18 4 10.56 2.00 338 0.48 blockqpl - 10 26 11 51 4 2.12 2.59 88 0.15 blockqp2 - 10 ' 26 11 51 4 2.31 3.08 114 0.16 blockqp3 - 10 26 11 51 5 7.60 3.01 457 0.53 blockqpA - 10 26 11 51 5 5.60 3.96 444 0.39 blockqp5 - 10 26 11 51 6 6.75 3.09 500 0.56 cvxqpl — 100 100 50 200 13 21.44 3.94 5002 4.74 cvxqp2— 100 100 25 200 12 10.27 4.01 2008 1.91 cvxqpS — 100 100 75 200 11 22.91 3.73 4107 4.30 duall 85 1 170 7 2.07 5.42 716 1.38 dual2 96 1 192 5 1.50 3.37 101 0.56 duals 111 1 222 5 1.50 3.07 92 0.73 dualA 75 1 150 5 3.50 4.40 308 0.64 gouldqp2 699 349 1398 9 4.28 2.05 315 1.59 gouldqp2 - 3 0 59 29 118 5 5.00 2.00 200 0.35 gouldqp3 699 349 1398 10 3.45 2.69 371 1.64 gouldqp3 - 3 0 59 29 118 5 3.55 2.70 192 0.29 staticZ 434 96 144 878 1.77 2.78 17292 68.72 steenbra 432 108 432 11 6.82 5.30 1591 5.31 Table 3.2: Solver results using BICG-STAB. Problems were solved to a tolerance of 1.0e-06. BICG-STAB error tolerance was fixed at 1.0e-06. I L U P A C K error tolerance was set to 1.0e-06. Chapter 3. Augmentation Preconditioning For Interior Point Methods 45 Problem n TOi m.2 NQP NK Ni Tot(Nj) Time (s) avgasa 12 10 18 4 4.00 2.00 96 0.19 avgasb 12 10 18 5 3.70 2.00 114 0.23 blockqpl - 10 26' 11 51 4 1.50 2.86 80 0.16 blockqp2 - 10 26 11 51 4 1.38 3.07 86 0.15 blockqpS - 10 26 11 51 6 1.92 3.79 178 0.24 blockqpi - 10 26 11 51 5 2.00 3.70 148 0.20 blockqp5 - 10 26 11 51 6 ' 2.58 3.45 190 0.26 cvxqpl — 100 100 50 200 21 8.74 4.29 2000 2.40 cvxqp2 — 100 100 25 200 13 3.31 3.59 496 0.77 cvxqp3 — 100 100 75 200 16 5.94 3.68 946 1.47 duall 85 1 170 7 1.57 5.91 384 0.88 dual2 96 1 192 5 1.00 3.27 98 0.58 dual3 111 1 222 5 1.00 3.07 92 0.74 dualA 75 1 150 5 2.80 5.02 266 0.57 gouldqp2 699 349 1398 10 1.65 2.14 156 • 1.09 gouldqp2 - 3 0 59 29 118 6 1.50 2.33 98 0.21 gouldqp3 699 349 1398 13 1.35 3.14 273 1.48 gouldqp3 - 3 0 59 29 118 6 1.58 2.86 123 0.24 static3 434 96 144 3 , 0.00 2.50 • 20 0.15 steenbra 432 108 432 12 6.38 6.92 1390 4.14 Table 3.3: Solver results using MINRES. Problems were solved to a tolerance of 1.0e-06. MINRES error tolerance was fixed at 1.0e-02. I L U P A C K error tolerance was set to 1.0e-06. Chapter 3.. 
Augmentation Preconditioning For Interior Point Methods 46 Problem n m i m 2 NQP NK Ni Tot{Ni) Time (s) avgasa 12 10 18 4 12.25 2.00 236 0.41 avgasb 12 10 18 4 11.25 2.00 220 0.38 blockqpl - 10 26 11 51 4 1.75 2.73 82 0.15 blockqp2 - 10 26 11 51 4 1.75 3.07 92 0.16 blockqp3 - 10 26 11 , 51 5 8.10 2.85 288 0.42 blockqpA - 10 26 11 51 5 7.20 3.96 404 0.42 blockqpZ - 10 26 11 51 6 7.25 2.99 332 0.46 cvxqpl — 100 100 50 200 14 21.39 4.06 3072 3.69 cvxqp2 — 100 100 25 200 12 14.71 3.99 1641 1.95 cvxqp3 — 100 100 75 200 14 17.82 3.78 2779 3.66 duall 85 1 170 7 2.64 6.66 586 1.08 dual2 96 1 192 5 1.00 3.27 98 0.58 dual3 111 1 222 5 1.00 3.07 92 0.72 dualA 75 1 150 5 4.60 5.65 418 0.72 gouldqp2 699 349 1398 9 6.22 2.03 300 2.00 gouldqp2 - 30 59 29 118 5 7.10 2.00 182 0.39 gouldqpZ 699 349 1398 10 4.50 2.67 347 1.94 gouldqp3 - 30 59 29 118 5 4.80 2.68 182 0.32 static3 434 96 144 3 0.00 2.50 20 0.16 steenbra 432 108 432 27 40.54 7.60 19033 41.83 Table 3.4: Solver results using MINRES. Problems were solved to a tolerance of 1.0e-06. MINRES error tolerance was fixed at 1.0e-06. I L U P A C K error tolerance was set to 1.0e-06. Problem n m i TO2 NQP Ni Ni Time (s) blockqpl 2006 1001 4011 3 3.00 7.67 1.78 blockqp2 2006 1001 4011 4 3.00 8.00 2.38 blockqp3 2006 1001 4011. 8 30.69 9.44 27.28 blockqpA 2006 1001 4011 6 18.88 8.28 17.31 blockqpl 2006 1001 4011 8 25.38 8.21 23.32 Table 3.5: Results obtained using the symmetric dense row removal scheme of Section 3.3.2. Problems solved using MINRES with error tolerance 1.0e-05. Problems solved to accuracy 1.0e-4, with I L U P A C K error tolerance 1.0e-6. Chapter 3. Augmentation Preconditioning For Interior Point Methods 47 Problem n m i m 2 NQP NK Nj Time (s) blockqpl 2006 1001 4011 4 2.00 7.34 2.11 blockqp2 2006 1001 4011 4 2.75 7.37 2.36 blockqp3 2006 1001 4011 12 2,54 7.71 6.90 blockqpb 2006 1001 4011 10 2.40 7.86 5.57 Table 3.6: Results obtained using the symmetric dense row removal scheme of Section 3.3.2. Problems solved using MINRES with error tolerance 1.0e-02. Problems solved to accuracy 1.0e-4, with I L U P A C K error tolerance 1.0e-6. Note "blockqp4" is not present, the method did not converge with this loose error tolerance. matrix M~lA when the (1,1) block of A is singular. The L P and QP saddle point matrices, however, only become approximately singular as a solution is approached. It is useful to evaluate whether the strong clustering of the pre-conditioned eigenvalues will be achieved under approximate conditions. To test this, we examined the eigenvalues of the preconditioned matrix at various steps in the process of solving an LP- Figure 3.2 depicts the sorted eigenvalues at three different steps of the L P solve for the problem "share2b". Preconditioned eigenvalues at the first, sixth, and tenth L P steps are shown from top to bottom. (The problem took 13 steps to solve). In confirmation of Theorem 1, we see that all eigenvalues lie between —1 and 1. Furthermore, right from the first iteration A = 1 has high multiplicity. It is interesting to note that already by the sixth step (the middle subplot), only a handful of unclustered eigenvalues remain. In the 10th L P step, all eigenvalues appear to be ± 1 . These observations all confirm Theorem 1, and illustrate how tightly clustered the preconditioned eigenvalues can be when the saddle point system is severely ill-conditioned. This also demonstrates that even in early iterations the preconditioner can be effective. Next, we present basic comparisons with the constraint preconditioners. 
The results here are mixed, as we explain below.

Figure 3.2: Eigenvalues of M^{-1}A at different steps of the LP solution for "share2b". Eigenvalues are plotted in sorted order, with values along the y axis. Note that clustering to λ = ±1 occurs quickly, typically within a few LP steps. As governed by Theorem 1, all unclustered eigenvalues lie in the interval (-1, 0).

Figure 3.3 shows a basic comparison of MINRES iteration counts (of the predictor computation). The plot shows the MINRES iteration counts for each QP step for the problem "cvxqp1". The constraint preconditioner used for this plot was chosen to have a (1,1) block equal to the diagonal of the saddle point system. For both preconditioners, an exact inner solve was applied. For this problem the constraint preconditioner outperformed the augmentation preconditioner in most steps of the QP solve and in terms of overall computational work. On the other hand, in the final few steps, where the saddle point matrix is most ill-conditioned, MINRES iteration counts for our preconditioner dropped significantly and convergence is almost immediate, whereas the constraint preconditioner still required approximately 20 iterations. This again confirms Theorem 1, and indicates that the proposed preconditioner is most effective when the saddle point matrix is most singular. This may in fact suggest a hybrid approach, in which it may be useful to switch to an augmentation-based preconditioner as the iterates approach the solution.

Figure 3.3: MINRES iteration counts for "cvxqp1", n = 1000, m1 = 250. Constraint preconditioner iterations are represented by 'o', augmentation preconditioner iterations by 'x'. The constraint preconditioner is consistently better. Note that in the final few iterations, though, iteration counts for the augmentation preconditioner decrease significantly. This is due to the increased singularity of the (1,1) block.

Figure 3.4 shows an example in which our preconditioner is superior to the constraint preconditioner throughout the iteration. For the QP problem "cvxqp2", MINRES iteration counts (of the predictor computation) are plotted against the QP step. This is a large problem, but throughout the solve no more than 30 iterations are needed per step. In the final few QP steps, the MINRES iteration count approaches its theoretical limit of two. Depicted in the same plot are the corresponding predictor iteration counts for a constraint preconditioner, with (1,1) block set to match the diagonal of the saddle point system. The constraint preconditioner consistently requires more MINRES iterations at each QP step.

Figure 3.4: MINRES iteration counts for "cvxqp2", n = 10000, m1 = 2500. Constraint preconditioner iterations are represented by 'o', augmentation preconditioner iterations by 'x'. The augmentation preconditioner is consistently better and approaches theoretical convergence. The final two MINRES iteration counts are 3 and 2, respectively.
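The hybrid strategy suggested above could be realized with a simple switch on the interior point duality measure. The following sketch is not from our experiments; the threshold and function names are purely illustrative.

    def choose_preconditioner(mu, constraint_prec, augmentation_prec, switch_tol=1e-4):
        """Use a constraint preconditioner early in the interior point iteration and
        switch to the augmentation preconditioner once the duality measure mu is small,
        i.e. once the (1,1) block has become nearly singular."""
        return augmentation_prec if mu < switch_tol else constraint_prec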
3.5 Conclusions

We have studied a new preconditioner for quadratic programming problems, and have demonstrated its merits in several respects. The preconditioner is well suited for saddle point systems with a highly singular (1,1) block; close to convergence, where the ill-conditioning is at its worst, convergence of MINRES is fastest and is theoretically guaranteed to occur within two iterations (in the absence of roundoff errors). We have also provided a spectral analysis on the null space of the constraint matrix.

The choice of the weight matrix W is crucial, and we have tested simple scaled identity matrices that incorporate a dynamic choice of the parameter throughout the iteration. A sensible choice may substantially reduce the iteration counts throughout, as we demonstrate by the "flattened hump" for an LP problem in Figure 3.1. Furthermore, we have shown that applying an inexact version of the Krylov solver throughout the iteration, with a convergence tolerance as loose as 0.01, substantially reduces the overall amount of computational work.

The excellent performance of the preconditioner, in particular near the solution, may suggest that regardless of what preconditioner is used throughout the iteration, it may be worthwhile to switch to a preconditioner of the form explored here as the iterates approach the solution.

Chapter 4

Extensions of Augmentation Preconditioning

In this chapter we consider various extensions of augmentation preconditioning. This includes modifications to deal with regularized or stabilized saddle point systems (saddle point matrices with non-zero (2,2) blocks). We also consider approximations to augmentation preconditioners for dealing with special classes of saddle point systems. Throughout this chapter we refer to saddle point matrices by
\[
\mathcal{A} = \begin{pmatrix} G & A^T \\ A & 0 \end{pmatrix}, \qquad (4.1)
\]
and denote generic augmentation preconditioners (as in Chapter 3) by
\[
\mathcal{M} = \begin{pmatrix} G + A^T W^{-1} A & 0 \\ 0 & W \end{pmatrix}. \qquad (4.2)
\]

In Section 4.1, we consider variants of augmentation preconditioning that deal with stabilized systems. When a saddle point problem is ill-posed, or an inertia shift in the saddle point matrix is necessary, a stabilizing matrix is often used to replace the (2,2) block of the saddle point matrix. It is not immediately obvious whether directly applying M as a preconditioner will be effective, and we therefore consider the spectral clustering of M^{-1}A for regularized systems. Section 4.2 makes use of the strong theoretical results obtained for stabilized augmentation preconditioners to motivate two novel applications. We consider the possibility of approximately solving saddle point problems by artificial stabilization, and we present an argument showing that, under mild assumptions, the preconditioned approach to regularized saddle point systems is guaranteed to be more efficient than a direct solve of the stabilized matrix. We shift our focus in Section 4.3 and consider approximate augmentation preconditioners. In many applications, the saddle point systems that arise have simple (1,1) blocks: sometimes they are diagonal, or narrow-banded, and easy to factor or solve with, making a reduction to the Schur complement feasible. When explicitly formed, augmentation preconditioners generally introduce fill-in to the (1,1) block, and it is worth exploring variants that deal with this problem. This chapter deals primarily with theoretical results and proposed solution methods. Numerical experiments will be undertaken shortly to solidify the results presented here.

4.1 Preconditioning Stabilized Systems

Consider the following stabilized saddle point matrix:
\[
\mathcal{A}_R = \begin{pmatrix} G & A^T \\ A & -S \end{pmatrix}, \qquad (4.3)
\]
where S is a symmetric positive definite regularizing matrix. The subscript "R" indicates a regularized system. In practice, S is often a scaled identity, and in many applications ||S||_2 is very small. Examples of regularized saddle point systems in optimization problems can be found in [16, 33]. Stabilized systems also arise often in computational fluid dynamics; see [15] for instance. More general problems leading to this matrix structure can be found in [5].

Suppose we wish to apply M (defined in (4.2)) as a preconditioner, and consider the generalized eigenvalue problem that the eigenvalues of the preconditioned matrix M^{-1}A_R satisfy:
\[
\begin{pmatrix} G & A^T \\ A & -S \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}
= \lambda \begin{pmatrix} G + A^T W^{-1} A & 0 \\ 0 & W \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.
\]
An arbitrary choice of W leads to the complicated eigenvalue problem
\[
(\lambda - 1)Gx + \lambda A^T W^{-1} A x - A^T(\lambda W + S)^{-1} A x = 0,
\]
and it is evident that with a general W further analytical progress will be difficult. However, when S is non-singular, the particular choice W = S results in a simplified problem. Notably, with this choice the (1,1) block of the preconditioner becomes the dual Schur complement. The eigenvalue problem for the preconditioned matrix can then be simplified, by substituting y = (1/(λ+1)) S^{-1} A x, to
\[
(\lambda^2 - 1)Gx + (\lambda^2 + \lambda - 1)A^T S^{-1} A x = 0.
\]
This results in spectral clustering properties similar to those guaranteed for the asymmetric preconditioners of Section 3.2.1: λ = 1 with multiplicity n - m, and λ = (-1 ± √5)/2, each with multiplicity p, where p is the nullity of G. In many situations this is quite satisfactory, as relatively few unclustered eigenvalues remain. The number of distinct eigenvalues, though, depends on the nullity of G. When G is symmetric positive definite, far fewer eigenvalues are clustered; in fact, the preconditioned system could have up to m + 1 distinct eigenvalues. When m is large, many Krylov iterations could then be required to achieve an accurate solution. It is natural to search for variants of M that lead to a tightly clustered spectrum regardless of the nullity of G.

An Asymmetric Preconditioner

Some exploratory analysis yields a modification to M for which the preconditioned matrix has excellent spectral properties regardless of the nullity of G. Consider the preconditioner
\[
\mathcal{M}_R = \begin{pmatrix} G + A^T S^{-1} A & -A^T \\ 0 & S \end{pmatrix}.
\]
The minus sign in the (1,2) block is by design, and leads to interesting results. A related approach has recently been explored for problems in fluid flow [6], whereby a similar preconditioning methodology is used, but the preconditioner is applied to an augmented linear system. In that work, properties of the underlying differential operators are exploited to obtain computationally inexpensive approximations.

The preconditioner is no longer symmetric, but as will be shown, its spectral clustering properties more than compensate for this shortcoming. In fact, the preconditioned stabilized saddle point matrix M_R^{-1} A_R has only two distinct eigenvalues. The following theorem describes the eigenvalues of the preconditioned matrix when G is symmetric positive semi-definite and S is symmetric positive definite.

Theorem 3. Let S be symmetric positive definite, and G be symmetric positive semi-definite. The preconditioned matrix M_R^{-1} A_R has only two distinct eigenvalues, λ = ±1.

Proof.
When G is semi definite, the (1,1) block of MR is invertible (since G + ATS~lA must be symmetric positive definite). The preconditioner is block upper triangular, so we can write its inverse, and obtain an analytical expression Chapter 4. Extensions of Augmentation Preconditioning 56 for MR1AR. That is, M~RLAR = S-^A {G + ATS~1A)-1 (G + ATS-1A)-1ATS~1 \ IG AT 0 S - 1 I I A - 5 I 0 The eigenvalues of a lower triangular matrix lie on its diagonal, so clearly for M^AR, A = ± 1 . Checking the dimensions of the block matrix reveals the multiplicity of A = 1 is n, and the multiplicity of A = —1 is TO. Theorem 3 has important implications. The fact that the preconditioned matrix only has two distinct eigenvalues implies that a preconditioned Krylov solver should converge to the exact solution in just two iterations. While Theorem 3 holds when G is symmetric positive semi-definite, it does not give insight into the distribution of the preconditioned eigenvectors. It is es-sential that the preconditioned matrix has a complete set of linearly independent eigenvectors to explain the convergence of a Krylov solver. Since the precon-ditioner is asymmetric, it is not immediately obvious whether it is possible to diagonalize the preconditioned matrix. These issues can be addressed, however, by constructing a proof using the techniques of the proof of Theorem 1. This requires the following definitions. Let QR = AT be the QR factorization of AT. Since AT is n x TO, Q is n x m, and R is TO x TO. Define Z to be a n x (n — m) orthogonal basis for the null space of A. Since Z UQ forms an orthogonal basis for R™, any vector can be written as: x — Zxz + Qxq. Theorem 4 . Assume that G+ATS~1A is non-singular. For the preconditioned matrix M.R^AR, the eigenvalue A = 1 has (algebraic and geometric) multiplicity • Chapter 4. Extensions of Augmentation Preconditioning 57 n. For A = 1, n — m of the eigenvectors can be written in the form (^Q^J > and ( Qe m linearly independent eigenvectors can be written in the form x c_\ J , T The eigenvalue A = —1 has (algebraic and geometric) multiplicitym, with eigen-vectors (^^j • A minimum residual Krylov method can be expected to converge in two iterations. (z Q o\ Proof. We define P = to be (n + m) x (n + m). We express x = V° 0 V Zxz + Qxq, and let v = (xz,xq,y)T where Pv = (xz,xq,y)T. The generalized eigenvalue problem can then be written as PTARPV = \PTM.RPv. This yields: f ZTGZ ZTGQ 0 A fx.\ QTGZ QTGQ R 0 RT -S Xq \yj = A ZTGZ ZTGQ 0 QTGZ QTGQ + RS~lRT -R 0 O S (4.4) From this equation linearly independent eigenvectors can be immediately iden-tified. Testing A = 1 yields the system of equations which must be consistent for A to be an eigenvalue. The resulting system is = 0. Clearly, choosing (xz,xq,y) = (ej,0,0) satisfies this equation. Since xz has length (n — m) x I, n — m linearly independent eigenvectors of this form are present. In addition, m more linearly independent eigenvectors corresponding to A = 1 are present. These are of the form (xz,xq,y) = (0, e i , \ S ~ l R T e i ) . Since we have found n linearly independent eigenvectors, it follows that the geometric multiplicity is n for A = 1. . (° 0 0 \ (") 0 -RS-'RT 2R Xq RT - 2 5 y w Chapter 4. Extensions of Augmentation Preconditioning 58 For A = — 1, the analysis is much simpler. Inspection immediately reveals any m linearly independent eigenvectors of the form (x,y) = (0,ej satisfy the eigenvalue problem. The geometric multiplicity of A = —1 is therefore m. 
A complete set of linearly independent eigenvectors have been found for the two eigenvalues of the preconditioned matrix MJ^AR. This guarantees that a Krylov method will converge in just two iterations. • Remark. The analysis made no assumptions on the definiteness of G. The only assumption in the proof was that MR be non-singular. This requires non-singularity of S and G + ATS~1A. The preconditioner is asymmetric, so a method such as G M R E S or BICG-STAB must be used. The spectral clustering, though, is sufficient to guarantee that either of these methods would converge rapidly. With this preconditioner, a Krylov solver essentially behaves like a direct method, computing an exact solution in just two iterations. The cost per iteration of applying MR in a preconditioned Krylov solver is similar to that of applying M for a non-regularized system. Each iteration requires the solution of a linear system of the form Since MR is block upper triangular, y can be solved for first. Then y can be eliminated from the first block row at the cost of a matrix-vector product with AT and a subtraction from the right hand side. The preconditioner MR is suitable for many stabilized saddle point sys-tems. The fact that spectral clustering is independent of the nullity of G is a very desirable property. This extends the applicability of augmentation pre-conditioners to problems with non-singular (1,1) blocks. The strong spectral Chapter 4. Extensions of Augmentation Preconditioning 59 clustering properties of the stabilized preconditioner can also lead to some other interesting applications, which are presented in the next section. 4.2 Applications of the Stabilized Preconditioner We now consider two novel applications of the regularized preconditioner MR. The tight spectral clustering of the preconditioned matrix M^AR guarantees rapid convergence of a preconditioned iterative solver, and introduces several new uses for the preconditioner. The first application we consider is the use of the regularized preconditioner to solve non-stabilized saddle point systems. We make the distinction between this approach and stabilizing for the purposes of treating ill-posed problems. Our proposal is to stabilize purely for the purposes of taking advantage of the spectral clustering properties of the regularized preconditioned approach. We will show that the stabilized preconditioner can be used to approximately solve non-stabilized systems very efficiently. The second application we consider is of an entirely different nature. We argue that under mild assumptions, the stabilized preconditioning approach can be coupled with a direct solver to achieve exceptional performance. We show that such an approach would require significantly fewer floating point operations than a direct solve with the original stabilized saddle point matrix. 4.2.1 Approximately Solving Non-Stabilized Systems Consider the following situation. A saddle, point system must be solved, but not necessarily exactly. This could occur as an intermediate step in a process such as an interior point method or some other algorithm. Preconditioning the non-stabilized system with the basic preconditioner M (4.2) guarantees at most m + 1 distinct eigenvalues in the preconditioned matrix. If the (1,1) block of Chapter 4. Extensions of Augmentation Preconditioning 60 the saddle point system is highly singular, directly applying Ai to A (defined in (4.1)) does result in fast convergence. 
When the (1,1) block of A is symmetric positive definite, or has low nullity, and m is large, however, a large number of Krylov iterations could be required. When the (1,1) block of the saddle point system is not highly singular, why not artificially stabilize so that an approximate solution with the regularized preconditioner can be found more efficiently? In the past section we saw that the preconditioned stabilized matrix AiR1AR has only two distinct eigenvalues, regardless of the nullity of the (1,1) block, so only two Krylov iterations should be expected. That is, instead of solving solve using the stabilized preconditioning approach. Note the right hand side of the stabilized problem has not been adjusted. Since we are interested in approximate solutions, it might be possible to achieve a relatively accurate answer if the stabilizing matrix is chosen carefully. To see that the error in the approximate solution can be small, suppose that the regularized system is solved to within a relative accuracy of e. That , a vector x = that ^j[a*| | 2 b ^ 2 ^ e- How accurate does x* satisfy the original, non-regularized system? is, for the right hand side b is computed such Chapter 4. Extensions of Augmentation Preconditioning 61 A simple analysis and use of the triangle inequality reveals that \\Ax*-b\\2 = \\Ax* - ARx* + ARx* - b\\2 < | | ^ E * - ^ H i * | | 2 + Mfla;*-6||2 < M-^|| 2||a:*||2+£||a:1|2 < | | S | | 2 | |S12 + 6| |Z*| |2. • Thus l l ^ - 6 | | 2 < | | 5 | | 2 + . 11*12 As long as | |5 | | 2 is small, the error in the approximate solve is also small. With choices such as S = 51, with S <SC 1, this can be easily attained. This approach of artificially stabilizing saddle point problems might be very effective for interior point methods. In early steps, before the saddle point ma-trix is highly ill-conditioned, the stabilized approach could ensure fast conver-gence. As the solution is approached, and the importance of accurately solving the step equation increases, the original method could be used. Fast convergence would still be expected because of the ill-conditioning near the solution. 4.2.2 A n Assisted Direct Solve Theorem 3 guarantees the preconditioned matrix M^AR has only two distinct eigenvalues. Thus a preconditioned solver should require only two iterations. It is interesting to consider if this rapid convergence can be used to guarantee that the preconditioned approach can beat a direct solve in terms of floating point operation counts. We contend that, under mild assumptions, factoring G + ATS~lA and then applying the preconditioner, using the factors to solve MRX = b at each iteration, is faster than a direct factorization and solve with AR. The analysis in this section relies on the following fact: Gaussian elimination with partial pivoting for a general (not necessarily sparse) N xN matrix requires Chapter 4. Extensions of Augmentation Preconditioning 62 | . / V 3 + 0(N2) floating point operations [11, pp. 44]. The justification for our argument is that it is much cheaper to factor a n x n matrix than a (n + TO) x (n + m) matrix. We also require that TO is large relative to n, otherwise benefits diminish. Let us assume that for AR the stabilizing matrix is diagonal. Let us further assume that though the constraint matrix and (1,1) block G may have sparse structure, they are sufficiently complicated that no obvious simplifications can be exploited (such as reduction to the Schur complement). 
It then follows that !(n+m) 3 +0((n+?n) 2 ) operations are required in the factorization and forward-backwards substitution solves of the saddle point system. This is the cost of a direct solve of the linear system ARX = b. Now we consider the cost required to solve ARX — b using the preconditioned approach. Let us assume that a matrix-vector product with A or AT can be performed in fc^n operations, and that a matrix-vector product with G can be done in ken operations. With S = SI, forming ATS~1A requires fc^n2 operations. Explicitly forming G+ATS~1A then requires (kA + l)n2 operations. The cost of factoring G + ATS~1A is ^n3 + 0(n2), since it is symmetric positive definite and a Cholesky factorization can be used [11, pp. 78]. To apply the preconditioned Krylov solve requires two iterations, and at each iteration a matrix-vector product with AR is required at a cost of (ka + 2kA)n + m. At each iteration the solution of a linear system of the form must also be computed. This can be done by a simple block elimination of y at a cost of TO + (k^ + l)n, followed by solving (G + ATS~1A)x = bi + ATy. With computed factors of G + A T S ~ 1 A, the solve for x at each iteration requires 0{n2) operations. Chapter 4. Extensions of Augmentation Preconditioning 63 The total cost for the preconditioned approach is then the cost of forming and factoring G + ATS^1 A, plus twice the cost of matrix-vector multiplication with AR and solving with MR. Thus the total cost is • \n3 + kAn2 + 2((kG + 2kA)n + m) + 2{(kA + l)n + m) + 0(n2) The 0(n2) term is the cost of performing two substitutions with the factors of G + ATS~1A. Clearly, when kA is small, ^ n 3 is the dominant term in the cost of the preconditioned approach. When kA is close to m, kAn2 must also be considered. For the direct solve approach, the dominant cost term is | ( n + rn) 3 = | n 3 + 2mn2 + 2m2n + | m 3 . Assuming m is not too small (relative to n), each of these terms, must be considered. For large matrices (n large) we can look at the ratio of expected work required by both methods to see which method is best asymptotically. The asymptotic ratio of expected cost of the preconditioned approach to the direct solve is then I n 3 I 3 - _ 3 . § n 3 + 2mn2 + 2m2n + §m 3 § + 2a + 2a2 + f a 3 ' We have substituted m = an, for some a £ (0,1). With a = 1, we find that the preconditioned approach requires a factor of 16 fewer operations asymptotically. For a = 0.5 and a = 0.1, the preconditioned approach requires a factor of 3.37 and 1.3 fewer operations. The preconditioned approach is most beneficial when m is close to n, and when kA is small. For dense A, kA = 2m, and the ratio of expected operation counts can be written 3-+2a | + 2a + 2a2 + | Q 3 ' When m = n, the preconditioned approach beats the direct solve by a factor of Y- A s m decreases, the number of operation counts for the preconditioned Chapter 4. Extensions of Augmentation Preconditioning 64 1.2. ; ...O' ,nV=q:5'n;: :—r*-^ r m =1'n 10-o 4> •2. 0 l * i , j , j £ 1 Li 1. •;1000 •1200.' -1400; '1600 1800; :2000 :;2200; 2400; ,;2600; 2800, '3000 Figure 4.1: Demonstrating the advantages of the assisted direct solve approach on randomly generated matrices. A non-zero density of 1 percent was used to construct random matrices A and G. The ratio of times required for an LU approach to the preconditioned method are plotted against n. The three curves (from top to bottom) are m — n, m = 0.5n, and m = O.ln. 
method approaches a factor of 2 better than for the direct solve, a result of the benefits of a Cholesky decomposition. To demonstrate the possible advantages of the preconditioned approach, preliminary numerical experiments were conducted. For a range of m and n values, randomly generated constraint matrices A and symmetric matrices G were used to create test saddle point systems. The stabilizing matrix S = 1 0 - 4 / was used. For randomly generated right hand sides, the times required to solve the linear system using an LU factorization and the preconditioned approach were recorded. Figure 4.1 depicts the ratios of time required by the direct method to the preconditioned approach over a range of n values, for m = O.ln, m = 0.5n, and m = n. The plot shows that for m = n the time required to solve the larger test problems was reduced by a factor greater than 12. Even for m — 0.5n the preconditioned approach requires less than one fifth of the time Chapter 4. Extensions of Augmentation Preconditioning 65 needed by the direct method. These results suggest that the preconditioned approach, or "assisted" direct solve, deserves serious consideration. For medium size problems, where factoring the saddle point matrix is possible but slow, the preconditioned approach could be used to great benefit. 4.3 Approximate Augmentation Preconditioners We now revert back to our study of non-stabilized saddle point systems, and consider the situation where the (1,1) block G of A is diagonal, or otherwise simple to invert or solve with. In this situation, augmenting with ATW~1A can lead to substantial fill-in, and a problem that was originally sparse can become dense. We now consider two approximate augmentation preconditioners, and demonstrate how they can avoid this problem, while retaining good spectral clustering properties. A Column Space Approximation For the generic augmentation preconditioner M (defined in Equation (4.2)), we consider W = ^AAT. With this choice, the augmented (1,1) block of the preconditioner becomes G + ^AT(AAT)~lA. Explicitly forming this matrix is computationally infeasible, but it can be approximated. We approximate the augmentation preconditioner with the following precon-ditioner How is this approximation related to the original preconditioner, and when is it valid? To answer this, consider vectors x in the column space of AT. For any such vector, AT{AAT)~l A behaves like / . This can be understood by observing Chapter 4. Extensions of Augmentation Preconditioning 66 that AT(AAT)~1A is a projection matrix onto the column space of AT, or though the following simple analysis. Letting QR = AT be the QR factorization of AT, the (1,1) block of precon-ditioner M can be rewritten as G + •~fQR(RTQTQR)~lRTQT = G + •yQQT. Since any x in the column space of AT can be written as x = Qv for some m x l vector v, ,(G + -yAT(AAT)-1A)x = Gx + 1QQT{Qv) = Gx + 'yQv = (G + -,I)x Thus if x is in the column space of AT, the preconditioner behaves as an aug-mentation preconditioner with W = AAT. It is unreasonable to assume that the solution vector and intermediate it-erates all lie in the column space of AT; however it is useful in motivating the preconditioner Mc- We now consider spectral properties of the preconditioned matrix MQ1 A. Propos i t ion 2. The preconditioned matrix MQ1A has eigenvalue A = 1 with multiplicity m. Proof. The' preconditioned eigenvalues satisfy the generalized eigenvalue prob-lem Ax = XMcx. That is From the second row, we obtain y = ^(AAT) lAx. 
Substituting this into the first block row yields ( A 2 - \)Gx + X2jIx-jAT{AAT)-1Ax = 0. (4.5) From the arguments preceding the proposition, we know that if x is non-zero Chapter 4. Extensions of Augmentation Preconditioning 67 and is in the column space of AT, then AT(AAT) lAx = x. It therefore follows that (4.5) can be written as when x is in the column space of AT'. Since Gx and x cannot be zero simulta-neously, it follows that A = 1 with multiplicity TO, as required. A n advantage to this approach is that the augmentation to the (1,1) block of A does not introduce any fill-in. When a saddle point system has a diagonal (possibly singular) (1,1) block, this results in a preconditioner that can be solved with relatively easily. The primary difficulty in using this preconditioner remains in solving systems such as AAFx = b at each Krylov iteration. Interestingly, this preconditioner results in similar spectral clustering prop-erties to constraint preconditioners [25]. This method may be most effective for solving multiple systems with the same constraint matrix, since AAT could be factored once and then reused. A Null Space A p p r o x i m a t i o n We now consider an alternative approximation to augmentation precondition-ing. Analogous to the discussion of the column space approximation, this ap-proximation holds for vectors in the null space of the constraint matrix. This approximation, however, is only valid when G is non-singular. Consider the preconditioner Similar to the constraint approximation discussion, this preconditioner be-(A 2 - \)Gx + (A 2 - l ) 7 a ; = 0 • Chapter 4. Extensions of Augmentation Preconditioning 68 haves exactly as an augmentation preconditioner. for vectors in the null space of A. Since the null space of A is of dimension n — m, better spectral clustering properties could be attained. Spectral properties of the preconditioned matrix are given by the following proposition. P ropos i t ion 3. The preconditioned matrix AiJf1A has eigenvalue A = 1 with multiplicity n — m. Proof. The preconditioned eigenvalues satisfy the generalized eigenvalue prob-lem Ax = XMNX. That is As in the proof of Proposition 2, the second row of the matrix equation yields y = j(AAT)~1 Ax. Substituting into the first row and rearranging yields (A 2 - \)Gx - AT(AAT)~lAx = 0. Clearly, any x £ null(A) leads to (A 2 — \)Gx = 0. By the non-singularity of A, this implies that A = 1. Since n — m linearly independent vectors can be found int he null space of A, it follows that the. multiplicity of A = 1 is n — m as required. • The null space approximate preconditioner leads to fewer distinct precondi-tioned eigenvalues than the constraint approximation preconditioner, but it is only usable when G is non-singular. It is worth pointing out that both precon-ditioners are most useful in a process involving repeated solves of saddle point systems with the same constraint matrix. In that situation, if possible, a one time factorization of AAT can be constructed, Chapter 5 Conclusions 69 In this thesis we have considered the problem of efficiently solving saddle point linear systems arising from interior point methods. Reproduced here for conve-nience, we have denoted saddle point systems by (5.1) The primary focus of this work has been the theoretical and numerical investi-gation of augmentation preconditioners. Represented by (G + ATW-1A O \ M = . , (5.2) I 0 WJ these preconditioners have many attractive properties. 
We have provided a detailed introduction to interior point methods for quadratic programming (QP) problems. We considered the following QP prob-lem: min \xT Hx + cTx, x 2 subject to Ax = b, Cx > d. From the optimality conditions for a solution, we described an interior point framework for finding solutions. At the core of an interior point method is the repeated solution of saddle point systems. Known as step equations, these linear systems become increasingly ill-conditioned as a solution is approached. Chapter 5. Conclusions 70 We focussed on applying augmentation preconditioners to efficiently solve the step equations arising from an interior point method for QP problems. Af-ter analyzing asymmetric generalizations of M, we proved that M~XA has eigenvalue A = 1 with multiplicity n, and A = — 1 with multiplicity equal to the nullity of the.(1,1) block of A. The remaining eigenvalues lie in the interval (-1,0). Our theoretical results show the potential suitability of applying augmen-tation preconditioning to the step equations. The approach to a QP solution involves a step matrix with an increasingly ill-conditioned and singular (1,1) block. The preconditioner is therefore expected to perform best close to the QP solution, when the step equation is hardest to solve. The excellent performance of the preconditioner in particular near the solution suggests that regardless of what preconditioner is used throughout the iteration, it may be worthwhile to switch to an augmentation preconditioner as the iterates approach the solution. The choice of the weight matrix W is crucial and we have tested with sim-ple scaled identity matrices that incorporate a dynamic choice of the parameter throughout the iteration. A sensible choice may substantially reduce the itera-tion counts throughout, as we demonstrate by the "flattened hump" for an L P problem in Figure 3.1. The merits of the approach were confirmed by the excellent performance witnessed on problems from the N E T L I B and C U T E r test suites [21]. We experimented with the use of inexactness in our solution of the step equation. Interestingly, we found that we could still converge to the optimal QP solution with a loose tolerance on our Krylov solver. A convergence tolerance of 0.01 for MINRES substantially reduced the overall amount of work involved in solving a Q P . Arguably the (1,1) block of an augmentation preconditioner need never be Chapter 5. Conclusions 71 formed, as solving systems such as Mx = b could be done iteratively using only matrix-vector products. In the situation that it is desirable to form M (for example when a preconditioner is sought), the presence of a dense row in A could lead to a dense (1,1) block in M. To counter this problem, we presented a row removal scheme that resulted in excellent clustering properties, and demonstrated its performance on C U T E r "problems. Several interesting modifications of augmentation preconditioners were also explored. Specifically, we considered the problem of preconditioning regularized saddle point matrices of the form where S is symmetric positive definite. We introduced an asymmetric modifica-tion to the augmentation preconditioner that resulted in excellent clustering of preconditioned eigenvalues regardless of the nullity of G. These preconditioners are of the form When G+ATS~1A is non-singular, the preconditioned matrix M^AR has only two distinct eigenvalues, and a complete set of linearly independent eigenvectors. 
This result holds for the very important subset of problems where G is semi-definite. The spectral properties of the preconditioned matrix M^AR are sufficient to guarantee that a minimum residual Krylov method will converge in two iter-ations. This intriguing property led to the conception of two additional applica-tions of regularized augmentation preconditioning. This included an argument showing how the preconditioned approach can be proven to require fewer float-ing point operations than a direct solve of the regularized saddle point system. Chapter 5. Conclusions 72 This thesis has illuminated many of the possible benefits of augmentation preconditioning. We have shown that augmentation preconditioners could play an important role in solving the step equations of interior point methods, but also their potential importance in a much broader context. The extensions of augmentation preconditioners to regularized systems and related applications could be the tip of the iceberg. In general, augmentation preconditioning shows great promise and is worthy of much further consideration. 73 Bibliography U . M . Ascher and C. Greif. A first course on numerical methods. UBC Lecture Notes, 2006. S. Bellavia. Inexact interior-point method. J. Optim. Theory Appl, 96(1):109-121, 1998. S. Bellavia and S. Pieraccini. Convergence analysis of an inexact infeasible interior point method for semidefinite programming. Compt. Optim. Appl, 29(3):289-313, 2004. M . Benzi. Preconditioning techniques for large linear systems: A survey. J. Compt. Phys., 182:418-477, 2002. M . Benzi, G. Golub, and J . Liesen. Numerical solution of saddle point problems. Acta Numerica, pages 1-137, 2005. M . Benzi and M . A . Olshanskii. An augmented lagrangian-based approach to the oseen problem. SIAM J. Sci. Compt., accepted 2006. L. Bergamaschi, J . Gondzio, M . Venturin, and G. Zill i . Inexact constraint preconditioners for linear systems arising in interior point methods. Tech-nical Report MS-05-002, School of Mathematics, The University of Edin-burgh, 2005. L. Bergamaschi, J . Gondzio, and G. Zilli. Preconditioning indefinite sys-tems in interior point methods for optimization. Compt. Optim. Appl, 28:149-171, 2004. M . Bolhofer and Y . Saad. Multilevel preconditioners constructed from inverse based. ILUs . SIAM J. Sci. Compt, 27:1627-1650, 2006. J. Czyzyk, S. Mehrotra, and S. J . Wright. PCx User Guide. Technical Report O T C 96/011, Optimization Technology Center, 1996. J. W. Demmel. Applied Numerical Linear Algebra. S IAM, Philadelphia PA, 1997. H . S. Dollar, N . I. M . Gould, W. H. A . Schilders, and A. J . Wathen. Implicit-factorization preconditioning, and iterative solvers for regularized saddle-point systems. SIAM J. Matrix Anal. Appl., 28(1):170-189, 2006. Bibliography 74 [13] H. S. Dollar and A. J . Wathen. Approximate factorization constraint pre-conditioners for saddle-point matrices. SIAM J. Sci. Compt., 27(5):1555-1572, 2006. [14] I. S. Duff and J . K . Reid. MA27 - a set of FORTRAN subroutines for solving sparse symmetric sets of linear equations. Technical Report A E R E R10533, A E R E Harwell Laboratories, London, England, 1982. [15] H.C. Elman, D.J . Silvester, and A . J . Wathen. Finite Elements and Fast Iterative Methods. Oxford University Press, Oxford, 2005. [16] A . Forsgren, P. E . Gil l , and J . Griffin. Iterative solution of augmented systems arising in interior methods. Technical Report NA-05-03, UCSD Department of Mathematics, 2005. [17] A . Forsgren, P. E . Gil l , and M . H. Wright. 
Bibliography

[1] U. M. Ascher and C. Greif. A first course on numerical methods. UBC Lecture Notes, 2006.

[2] S. Bellavia. Inexact interior-point method. J. Optim. Theory Appl., 96(1):109-121, 1998.

[3] S. Bellavia and S. Pieraccini. Convergence analysis of an inexact infeasible interior point method for semidefinite programming. Compt. Optim. Appl., 29(3):289-313, 2004.

[4] M. Benzi. Preconditioning techniques for large linear systems: A survey. J. Compt. Phys., 182:418-477, 2002.

[5] M. Benzi, G. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, pages 1-137, 2005.

[6] M. Benzi and M. A. Olshanskii. An augmented Lagrangian-based approach to the Oseen problem. SIAM J. Sci. Compt., accepted 2006.

[7] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli. Inexact constraint preconditioners for linear systems arising in interior point methods. Technical Report MS-05-002, School of Mathematics, The University of Edinburgh, 2005.

[8] L. Bergamaschi, J. Gondzio, and G. Zilli. Preconditioning indefinite systems in interior point methods for optimization. Compt. Optim. Appl., 28:149-171, 2004.

[9] M. Bollhöfer and Y. Saad. Multilevel preconditioners constructed from inverse-based ILUs. SIAM J. Sci. Compt., 27:1627-1650, 2006.

[10] J. Czyzyk, S. Mehrotra, and S. J. Wright. PCx User Guide. Technical Report OTC 96/011, Optimization Technology Center, 1996.

[11] J. W. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[12] H. S. Dollar, N. I. M. Gould, W. H. A. Schilders, and A. J. Wathen. Implicit-factorization preconditioning and iterative solvers for regularized saddle-point systems. SIAM J. Matrix Anal. Appl., 28(1):170-189, 2006.

[13] H. S. Dollar and A. J. Wathen. Approximate factorization constraint preconditioners for saddle-point matrices. SIAM J. Sci. Compt., 27(5):1555-1572, 2006.

[14] I. S. Duff and J. K. Reid. MA27 - a set of FORTRAN subroutines for solving sparse symmetric sets of linear equations. Technical Report AERE R10533, AERE Harwell Laboratories, London, England, 1982.

[15] H. C. Elman, D. J. Silvester, and A. J. Wathen. Finite Elements and Fast Iterative Solvers. Oxford University Press, Oxford, 2005.

[16] A. Forsgren, P. E. Gill, and J. Griffin. Iterative solution of augmented systems arising in interior methods. Technical Report NA-05-03, UCSD Department of Mathematics, 2005.

[17] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for nonlinear optimization. SIAM Rev., 44(4):525-597, 2002.

[18] R. W. Freund, F. Jarre, and S. Mizuno. Convergence of a class of inexact interior-point algorithms for linear programs. Math. Oper. Res., 24(1):50-71, 1999.

[19] E. M. Gertz and S. J. Wright. Object-oriented software for quadratic programming. ACM Transactions on Mathematical Software, 29:58-81, 2003.

[20] G. H. Golub and C. Greif. On solving block-structured indefinite linear systems. SIAM J. Sci. Compt., 24(6):2076-2092, 2003.

[21] N. I. M. Gould, D. Orban, and P. L. Toint. CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw., 29(4):373-394, 2003.

[22] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, PA, 1997.

[23] C. Greif and D. Schötzau. Preconditioners for saddle point linear systems with highly singular (1,1) blocks. ETNA, Special Volume on Saddle Point Problems, 22:114-121, 2006.

[24] C. Greif and D. Schötzau. Preconditioners for the discretized time-harmonic Maxwell equations in mixed form. NLAA, accepted, 2006.

[25] C. Keller, N. I. M. Gould, and A. J. Wathen. Constraint preconditioning for indefinite linear systems. SIAM J. Matrix Anal. Appl., 2000.

[26] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2:575-601, 1992.

[27] E. Ng and B. W. Peyton. Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM J. Sci. Compt., 14:617-629, 1995.

[28] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999.

[29] A. Oliveira and D. Sorensen. A new class of preconditioners for large-scale linear systems from interior point methods for linear programming. Technical Report CRPC-TR97771, Center for Research on Parallel Computation, Rice University, 1997.

[30] Y. Saad. Iterative Methods for Sparse Linear Systems, Second Edition. SIAM, Philadelphia, PA, 2003.

[31] G. Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley, MA, 1986.

[32] L. N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[33] A. Wächter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Prog., 106(1):25-57, 2006.

[34] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, PA, 1997.

[35] Y. Zhang. Solving large-scale linear programs by interior-point methods under the MATLAB environment. Optim. Meth. Softw., 1998.

[36] Y. Zhang. User's Guide to LIPSOL. Optimization Methods and Software, 11-12:385-396, 1999.
