A Block Preconditioning Cost Analysis For Solving Saddle-Point Linear Systems by Sarah Claire MacKinnon-Cormier B.Sc, Dalhousie University, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF T H E REQUIREMENTS FOR T H E DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science) The University Of British Columbia October ;' 2005 © Sarah Claire MacKinnon-Cormier 2005 11 Abstract We investigate the cost of preconditioning when solving large sparse saddle-point linear systems with Krylov subspace methods. To use the block structure of the original matrix, we apply one of two block preconditioners. Algebraic eigenvalue analysis is given for a particular case of the preconditioners. We also give eigenvalue bounds for the preconditioned matrix when the precondi-tioner is block diagonal and positive definite. We treat linear solves involving the preconditioner as a subproblem which we solve iteratively. In order to min-imize cost, we implement a fixed inner tolerance and a varying inner tolerance based on bounds developed by Simoncini and Szyld (2003) and van den Eshof, Sleijpen, and van Gijzen (2000). Numerical experiments compare the cost of preconditioning for various iterative solvers and block preconditioners. We also experiment with different tolerances for the iterative solution of linear solves involving the preconditioner. iii Contents Abstract ii Contents • iii List of Tables v List of Figures vi Acknowledgements vii 1 Introduction 1 2 Background . . 5 2.1 Saddle-Point Systems 5 2.2 Krylov Subspace Solvers 7 2.2.1 Conjugate Gradient Method ' 9 2.2.2 Minimal Residual Method . . . 9 2.2.3 Generalized Residual Method 10 3 Block Preconditioners 14 3.1 Constraint Preconditioner 15 3.1.1 Factorizations of Mc 15 3.1.2 Eigenvalues of Mc • • 1 7 3.2 Positive-Definite Preconditioner 18 4 Eigenvalue Analysis 19 4.1 Connections Between the Eigenvalues of Mc and Mp 19 Contents iv 4.2 Eigenvalues of Mc when G — I . 23 4.3 Eigenvalue Bounds 24 4.3.1 Bounding the positive eigenvalues 26 4.3.2 Bounding the negative eigenvalues . . 28 5 Computational Cost of Preconditioning . . . • 31 5.1 Solving Mu = f 32 5.2 Cost of solving preconditioned linear system with GMRES . . . . 35 5.3 Cost of solving preconditioned linear system with MINRES . . . 38 6 Inner Tolerances 40 6.1 Constant Inner Tolerance 40 6.2 Varying Inner Tolerance 41 7 Numerical Experiments 45 7.1 Test Problems 45 7.1.1 stokes2d(0,60) 45 7.1.2 mosarqp2 47 7.2 Comparison of various G values , 49 7.3 The Dominance of the Schur Complement Solve 49 7.4 Comparison of Krylov subspace solvers 56 7.5 Setting the Inner Tolerance . . ' 62 7.6 The Bottom Line 70 8 Conclusions and Future Work 74 8.1 Future Research ' 75 Bibliography 77 A The Matlab Program 81 A . l OURGMRES 81 A.2 OURMINRES ; 84 V L i s t o f T a b l e s 7.1 Checking validity of estimated cost functions 55 7.2 Determining ratio of work for 5-solves for stokes2d(0,60) . . . 57 7.3 Determining ratio of work for 5-solves for mosarqp2, GMRES(15), Mp, and G equal to the identity 58 7.4 Multiple of extra work required for GMRES(15) to converge based on GMRES convergence • 59 7.5 Multiple of extra work required for MINRES to converge based on GMRES convergence 61 7.6 Comparison of number of 5-solve iterations required when using fixed inner tolerance 10~1 0 and varying inner tolerance 63 7.7 Number of 5-solve iterations for stokes2d(0,60) when G equals the identity 70 7.8 Number of 5-solve iterations for stokes2d(0,60) when G equals the diagonal of A 71 7.9 Number of 5-solve iterations for stokes2d(0,60) when G is the incomplete Cholesky factorization of A 71 7.10 Number of 5-solve iterations for mosarqp2 when G equals the identity 72 7.11 Number of 5-solve iterations for mosarqp2 when G equals the diagonal of A 72 7.12 Number of 5-solve iterations for mosarqp2 when G is the incom-plete Cholesky factorization of A . . 73 L i s t o f F i g u r e s 6.1 Comparison of clamped versus undamped varying inner tolerance 44 7.1 ,• Sparsity pattern'of H formed by stokes2d(0,60) . . . . . . . . 46 7.2 Eigenvalues of K, M~1K, and M~1K for stokes2d(0,5) . . . . 51 7.3 Sparsity pattern of Ti formed by mosarqp2 52 7.4. Eigenvalues of H, M^U, and M^H for mosarqp2 53 7.5 Comparison of various values for G 54 7.6 Differences in convergence behaviour for full GMRES, GMRES(15), and MINRES . . . : 60 7.7 Total number of S-solve iterations needed at various inner toler-ances when using GMRES with Mc- •. • 64 7.8 Total number of S-solve iterations needed at various inner toler-ances when using GMRES with M,v . . . 65 7.9 Total number of S-solve iterations needed at various inner toler-ances when using GMRES(15) with Mc 66 7.10 Total number of S-solve iterations needed at various inner toler-ances when using GMRES(15) with M.v 67 7.11 Total number of S-solve iterations needed at various inner toler-ances when using MINRES 68 7.12 X-shaped graphs demonstrating rates at which inner tolerance increases and relative residual decreases as GMRES converges . . 69 Acknowledgements First and foremost, I wish to thank my supervisor, Dr. Chen Greif. Thank you! Thank you! Thank you! I could double the length of this thesis with thank-yous to you alone Chen and still feel indebted. Your guidance, support, encouragement, patience, and kindness will never be forgotten. It has been a true honour working with you. To my second reader, Dr. Michael Friedlander, thank you too. Your thor-oughness and willingness to work under such strict time constraints have cer-tainly not gone unnoticed. Infinite thanks are owed to my family whose love and support have never wavered, even when I wouldn't have blamed them if they had. Mum: we did it! You are an inspiration and I am so proud of you. I look forward to our well-earned adventures in Portugal together. •Papa, tu es ma roche. Mon ancre. J'apprecie ton avis, ton support, et surtout ton amour. Sebastien and Julien, where do I start? I am the middle child sandwiched between two brothers, no wonder I ended up in computer science. You taught me to hold my own, to say what I mean, and to be true to myself. I love you guys. Many thanks to the best office mates a girl could ask for. Micheline and Greg, your friendship alone would have made the last two years worthwhile. Luckily, I got a thesis out of it too. Bullpen 238, you all made me feel so welcome right from the start. Yeah thats right, that's you Vance, Jim, Jeanette, Juan, and Richard. Lastly, I apologize to my roommate and best friend Jackie. Living with me these last few months has been hell. I promise to make it up to you. C h a p t e r 1 I n t r o d u c t i o n i We investigate iterative techniques for computing the solution of the linear system is also commonly known as a Karush-Kuhn-Tucker (KKT) system. We will concentrate on saddle-point systems where A is symmetric positive definite, B has full column rank, and m < n (possibly m <fC n). We assume that H is large and sparse. Hence, the solution must be found with an iterative method where matrix-vector products are the only convenient operation to perform. In recent years, preconditioning has been the focus of much active research. The idea behind preconditioning is quite simple: transform a linear system into another one that is easier to solve. Left preconditioning (1.1) would give us the new linear system where M G K ( n + m ) x is a preconditioner. When a linear system is altered by post-multiplying by a preconditioner, the system is said to be right preconditioned. Although right preconditioning is a common preconditioning technique, we will only focus on left preconditioning throughout this thesis. Therefore, left preconditioning will be referred to simply as preconditioning. Since we are working with linear systems with a particular block structure, we wish to exploit that structure when preconditioning. We investigate the (1.1) where A G R n x n , B G R n x m , and m < n. The above saddle-point linear system M-1Hu = M~1b, (1.2) Chapter 1. Introduction 2 following block preconditioners Mc = | ~_ " | , (1.3) and ( G 0 \ M P = (1.4) \ o ^ G - ^ y where G is an approximation to A. Within this thesis, M.c is called a constraint preconditioner while A4P is called a positive-definite preconditioner. The latter of the two names is appropriate as we will only be considering matrices A and G that are positive definite. In that case, Mp is positive definite, while A4C is indefinite. The goal of this thesis is to compare the cost of preconditioning a saddle-point linear system with the constraint and the positive definite preconditioners when iteratively solving with the Krylov subspace methods GMRES [30] and MINRES [27]. In Krylov subspace methods, solves of the form (1.2) are per-formed at each iteration. Instead of performing this solve with M A T L A B ' S built-in backslash operator, we treat linear solves involving the preconditioner as a subproblem. We solve the subproblem with a second iterative solver embedded into the first. We define the iterations taken by the first solver to be the outer iterations while those taken by the embedded solver are defined to be the inner iterations. We derive cost functions for solving (1.2) based on the block structure of the preconditioner used. The cost functions are broken down into total number of matrix-vector products (MVPs) with B or BT, as well as G-solves and 5-solves. As the name infers, a G-solve is the solution a linear system involving G. An S-solve is the solution of a linear system involving the negation of the Schur complement of Mc- G-solves and S-solves are central to this thesis and are properly defined in Chapter 5. Inner iterations can (and should) be solved with different input parameters than outer iterations. We concentrate on two different ways of setting the inner tolerance in the hopes of reducing the overall cost of preconditioning. Chapter 1. Introduction 3 The first method involves simply setting the inner tolerance to a fixed con-stant independent of the outer iteration. MINRES responds nicely to this meth-ods and has an optimal fixed tolerance of 10 - 2 for our test problems. On the other hand, full GMRES often stagnates when the inner tolerance is larger than the outer tolerance. Similar behaviour can be seen when applying GMRES(15) to the same problems. Stagnation in full GMRES and GMRES.(15) can be resolved by introducing a varying inner tolerance. The idea is to force a tight inner tolerance at the be-ginning of the solve and to relax the tolerance as the iterative method converges. Simoncini and Szyld [31-33], Notay [26] s and van den Eshof, Sleijpen, and van Gijzen [35] all developed similar inner tolerance bounds for inexact matrix-vector products in Krylov subspace methods. Despite different inner iteration definitions, we are able to extend this theory to meet our needs. We implement the varying- inner tolerance for both full GMRES and restarted GMRES. For our test problems, the results are favourable. In all cases where stagnation had occurred with a fixed inner tolerance, GMRES and GMRES(15) with a varying inner tolerance converged. All numerical examples are performed with one of three choices for G. Either the identity, the diagonal of A, or the incomplete Cholesky factorization of A. In the case that G is diagonal, we assume that all diagonal entries are positive and therefore that G is positive definite. The outline of this thesis is as follows: In Chapter 2 we introduce to saddle-point linear systems, their origins, and the fields in which they most often arise. Although it is assumed the reader has some knowledge of iterative methods, an overview of Krylov subspace solvers is given. More specifically, we revisit the conjugate gradient (CG) method, the minimized residual (MINRES) method and the generalized minimal residual (GMRES) method. In Chapter 3, we discuss important properties of Mc and Mp. We present block factorizations and pre-existing eigenvalue analysis for the two block pre-conditioners and the generalized eigenvalue problem formed when applying them to (1.1). Chapter 1. Introduction 4 In Chapter 4 we discuss a new algebraic result which links the eigenvalues of the two preconditioners when we take G equal to the identity. We also show a connection between the eigenvalues of M.c and the singular values of B when G is the identity. Then, we closely follow theory developed by Rusten and Winther [29] to find algebraic bounds for the eigenvalues of M~lH, where M is a block-diagonal positive-definite matrix. The bounds are based on the singular values of the (1,2) block of M~lH. Cost functions to quantify the cost of preconditioning GMRES or MINRES with M.c or Mp are given in terms of MVPs, G-solves, and S-solves in Chap-ter 5. When we solve the preconditioned saddle-point system with a fixed inner tolerance, these cost functions can be drastically simplified by using an average iteration number for an S-solve. Chapter 6 describes the two ways of setting the inner tolerance: fixed and varying. We also show some small cost savings when using a clamped varying inner tolerance (as opposed to an undamped one). Two test problems are chosen to demonstrate our findings in the numerical experiments chapter. We address the differences between the values for G, the dominance of the Schur complement solve, the differences between the conver-gence of the three Krylov subspace solvers, fixed versus varying inner tolerances, and most importantly, the difference between Mc and Mp. We conclude with an overview of future research. 5 C h a p t e r 2 B a c k g r o u n d We assume the reader has a fairly extensive knowledge of linear systems and the various ways of iteratively computing their solutions. This chapter is included to introduce the origin and importance of a particular linear system: the saddle-point linear system. A refresher on Krylov subspace methods is also given as they are the methods of choice throughout this thesis. 2.1 Saddle-Point Systems The K K T system (1.1) often arises from the first-order'optimality conditions for the following classical quadratic program: min ^xT Ax — cTx subject to BTx = d. In the quadratic programming formulation, y represents the vector of Lagrange. multipliers. A solution (x,y) of (1.1) is a saddle-point for the Lagrangian func-tion cf)(x, y) = ^xTAx - xTc + yT{BTx - d). This explains where and why saddle-point systems get their name. To compute the saddle points of <p(x,y), solve V4>(x,y)=0. For a more in depth look at the quadratic program and its origin, we refer the reader to [25, Chapter 16]. Chapter 2. Background 6 Benzi, Golub, and Liesen [3] provide a survey of numerical solution tech-niques used to solve saddle-point systems. They also offer the following exten-sive, though not exhaustive, list of fields where such systems arise: • Circuit analysis • Computational fluid dynamics • Constrained and weighted least-squares estimation • Constrained optimization • Data interpolation'and surface fitting • Economics • Electromagnetism • Finance • Image reconstruction • Image registration • Interior-point methods • Interpolation of scattered data • Linear elasticity • Mesh generation for computer graphics • Mixed finite-element approximations of elliptic PDEs • Model order reduction for dynamical systems • Optimal control • Parameter identification • Structural analysis Chapter 2. Background 7 Typically, saddle-point systems arise when a certain quantity needs to be minimized while assuring that a set of linear constraints are met. In other words, we need to minimize a function of n variables subject to m linear constraints. In some cases, such as in mixed finite-element formulations, constraints are sometimes added to unconstrained problems to acquire stability. Our numerical experiments will be done using saddle-point systems that arise in either optimization or fluid dynamics, more specifically, the Stokes problem. We have added the above lengthy list to convey the importance and abundance of saddle-point systems. 2.2 Krylov Subspace Solvers The literature is rich with ways of solving the saddle-point linear system (1.1). If the system is of a reasonable size, direct methods can be used [7]. There are also null-space methods [14], classical and inexact Uzawa methods [1, 11], splitting schemes [10], and Krylov subspace solvers. We are interested in the solution of (1.1) by preconditioned Krylov subspace solvers. Since the conjugate gradient (CG) method, the minimal residual (MIN-RES) method, and the generalized residual (GMRES) method play a significant role in this thesis, a short overview of Krylov subspace methods is included. Krylov subspace methods date back several decades. Two ground-breaking papers were published in 1952, one by Lanczos [21] and the other by Hestenes and Stiefel [17], describing what we now know today as the CG method. Oddly enough, Hestenes, who was based in Zurich at the time, and Stiefel, who was based in UCLA, developed the same algorithm independently and only collabo-rated on a publication after realizing the coincidence. Initially, Krylov subspace methods were not viewed as iterative methods. Despite their early discovery, it was not until 1971 when Reid [28] demonstrated the importance of the finite termination property that Krylov subspace methods were popularized. Like most iterative methods, Krylov subspace methods are used to solve the linear system Ax = b when A G R n x n is large and sparse or when we only have Chapter 2. Background 8 access to a subroutine that returns the matrix-vector product of A and a given vector. Definition 1: Let A G R n x n andbeW1 (b^O). Then, K.k(A, b) = span{6, Ab, A2b,Ak~lb) is called the k-dimensional Krylov subspace associated with A and b. To understand how Krylov subspace methods work, we first define minimal polynomials. Definition 2. The minimal polynomial p(x) of A £ R n x n , also known as the annihilating polynomial, is the monic polynomial of smallest degree such that P(A) =0. Suppose p(x) is the minimal polynomial of A. Then, we have several desir-able properties: • The roots of p(x) are equal to the eigenvalues of A • p(x) is unique up to scaling. • If A is nonsingular, the solution lies in a Krylov subspace of dimension k where k is equal to the degree of p(x). Hence, the fewer the number of distinct eigenvalues, the smaller the size of the Krylov subspace we must search for the solution. This leads to faster conver-gence. The algorithmic sketch of Krylov subspace methods is as follows: Guess an initial solution x^ of Ax = b (Typically, a;(0) — 0). Set the initial residual = b — Ax^°\ At every iteration i, produce an approximation x^ of Ax = b such that x^ G aj(°) + K.i(r(°\A) satisfies a certain optimality property. Like with all iterative methods, it is essential to use some form of precondi-tioning as will be done in this thesis. Chapter 2. Background 9 2.2.1 Conjugate Gradient Method A s p r e v i o u s l y m e n t i o n e d , H e s t e n e s a n d S t i e f e l a r e r e s p o n s i b l e f o r t h e d i s c o v e r y o f t h e C G m e t h o d . T o d a y , t h e C G m e t h o d i s t h e m e t h o d o f c h o i c e f o r s o l v i n g s y m m e t r i c p o s i t i v e - d e f i n i t e ( s p d ) l i n e a r s y s t e m s . I t i s s t r o n g l y r e l a t e d t o t h e L a n c z o s a l g o r i t h m f o r t r i d i a g o n a l i z i n g m a t r i c e s . I f e x a c t a r i t h m e t i c i s a s s u m e d , i t c a n b e s h o w n t h a t t h e c o n j u g a t e m e t h o d i s g u a r a n t e e d t o c o n v e r g e i n n i t e r a t i o n s w h e n a p p l i e d t o n x n m a t r i c e s . R e i d w a s t h e f i r s t u s e t h e f a c t t h a t t h e s o l u t i o n i s r e f i n e d a t e a c h i t e r a t i o n . H e n c e , i n e x a c t s o l u t i o n s c a n b e f o u n d m u c h b e f o r e c o m p l e t i n g a l l n i t e r a t i o n s . A t e a c h i t e r a t i o n , a n a p p r o x i m a t i o n i W o f Ax = b i s p r o d u c e d b y m i n i m i z i n g t h e e n e r g y n o r m o f t h e e r r o r . N a m e l y , w e a r e m i n i m i z i n g t h e e n e r g y n o r m (x^ — x)T A(x^ — x), w h e r e x i s t h e e x a c t s o l u t i o n . Algorithm 1 P s e u d o c o d e f o r t h e c o n j u g a t e g r a d i e n t m e t h o d Pi = r0 for i = 1 , 2 , . . . do „ _ r(i-i)rr(i-p Pi^l — r(;-2)rr(i-2) p(i) = r(i-l) + 1) q(i) = Ap(i) _ r<i-l)Tr(i-l) (^i) = x ( i - i ) + a . p ( i ) r(i) _ AxM = r ^ " 1 ) - aiqW end for 2.2.2 Minimal Residual Method M I N R E S i s a L a n c z o s b a s e d m e t h o d d e v e l o p e d i n 1 9 7 5 b y P a i g e a n d S a u n d e r s [ 2 7 ] . I t i s a v a r i a n t o f t h e c o n j u g a t e g r a d i e n t a l g o r i t h m t h a t c a n b e d i r e c t l y a p -p l i e d t o s y m m e t r i c i n d e f i n i t e s y s t e m s . I n s t e a d o f m i n i m i z i n g t h e e n e r g y n o r m ( w h i c h i s n o t v a l i d i n t h e i n d e f i n i t e c a s e ) , t h e 2 - n o r m o f t h e r e s i d u a l i s m i n i -m i z e d . Chapter 2. Background 10 The pseudocode for unpreconditioned MINRES in Algorithm 2 is given in [36]. 2.2.3 Generalized Residual Method GMRES was developed in 1986 by Saad and Schultz [30]. It is an extension of .MINRES that is applicable to unsymmetric matrices. Although the algorithm still generates a sequence of orthogonal vectors, this can no longer be accom-plished with short recurrences. The added storage requirements prompted the development of restarted GMRES. See Algorithms 3 and 4 for pseudocode. Chapter 2. Background 11 Algorithm 2 Pseudocode for the MINRES method Compute v\ = b — Axq for some initial guess xq r? = /3i 7i = 7o = 1 0"! = CTo = 0 v0 =0 u>o = = 0 for i = 1,2,... do ^ = cti = (Avi,Vi) Vi+i = Avi - ctiVi - 0Vi-i 8i+l. = ||fi+l||2 5 = liOLi - 7i_iaj/?i . ' Pi = y/s2 + /3f+1 p2 — Oia{ + 7 i_ i7 iA P3 = 7i+i = 5/pi o~i+i = ft+i/pi Wi = (Vi - P3Wi-2 - P2Wi-\)/p\ I N | 2 = k t+ i l lk i - i lb Check convergence; Continue if necessary. V - — c r i + 1 T j end for Chapter 2. Background 12 Algorithm 3 Pseudocode for the restarted GMRES method is an initial guess for j = 1, 2,... do solve r for Mr = b — Ax^ vW =r/\\r\\2 s:=lkl|2ei for ! = l , 2 , . . . , m d o solve w for Mw = Av^ for k = 1,2,..., i do w = w — hktiV^ end for hi+i,i = HHU apply J i , . . . , Ji-i on / i ^ , . . . , / i i + M construct J,; acting on ith and (i + 1)"1 component of h.^ such that (z + l) s* component of Jih.ti is 0 s := JjS if s(z + 1) is small enough then U P D A T E D ) quit end if end for UPDATE(x,m) check convergence, continue if necessary end for Chapter 2. Background 13 Algorithm 4 Pseudocode for UPDATE(5,i) Compute y as solution of Hy = s, in which the upper i x i triangular part of H has Hij elements (in least squares sense if H is singular). s represents the first i components of s. x = x(0) + yiv^ H h yiv{i) s ( » + i ) = | | 6 - J 4 5 | | 2 if x is an accurate enough approximation then quit else =x end if 14 C h a p t e r 3 Block Preconditioners Since we solve (1.2) iteratively, there is no need to form M~lTi explicitly. Doing so defeats the purpose of preconditioning altogether. At each iteration of our iterative solver, we solve a linear system of the form Mz = r. The computational cost of this solve is given in Chapter 5 for two particular choices of M. Choosing M is no easy task. Our selection of M should reduce the overall number of iterations within which our iterative method converges while keeping the cost of each iteration relatively low. We aim for M~lTi to have nicer spectral properties than Ti. In other words, we want M~1Ti to have fewer distinct eigenvalues than Ti and that the distinct eigenvalues are closely clustered together [16]. However, we also hope Mz = r can be solved cheaply for z at each iteration of the method. These two conditions tend not to coincide. A good choice for M should satisfy both conditions as reasonably as possible. Let us look at two extreme cases. If M is equal to Ti, then M_1Ti = / has wonderful spectral properties: all eigenvalues are equal to 1.'However, solving Mz — r is the same as solving Hz = r. We only precondition in the first place if we assume the original matrix is hard to solve for. On the other hand, if M is equal to the identity, then the linear system Iz = r could not be easier to solve. However, the spectral properties of M~lTi are identical to the spectral properties of Ti. We wish to precondition the saddle-point system in such a way that we take advantage of its block structure. This is accomplished through the use of block preconditioners. We focus on the constraint preconditioner (Mc) and the positive-definite preconditioner (Mp) defined in (1.3) and (1.4) respectively. Chapter 3. Block Preconditioners 15 G e l " 1 " 1 is a symmetric positive-definite approximation to A. Regardless of whether we employ Mc or A4P, a choice for the (1,1) block G needs to be made. Again, this choice is not trivial. The algebraic analysis presented is with a general G or with G equal to the identity. All of our numerical experiments are performed with G equal to the identity, the incomplete Cholesky factorization of A, or the diagonal of A. It is assumed in the last case that all diagonal entries of A are positive and that G is positive definite. 3.1 Constraint Preconditioner MC} also known as a constraint preconditioner, was given its name and popu-larized by Keller, Gould, and Wathen [19]. Much of their work stems from [22], where a preconditioner of the same form is used in an optimization setting to precondition Newton's method. One of the most attractive properties of the constraint preconditioner is that the constraints are left intact. The solution always stays on the constraint manifold. 3.1.1 Factorizations of Mc The popularity of the constraint preconditioner can be somewhat attributed to the fact that it has several nice block factorizations. The first that we will present is as follows: Dollar and Wathen [8, 9] have recently published a factorization developed by Wil Schilders. Since the factorization was unpublished at the time, they accredited Schilders by naming the factorization after him. The Schilders fac-(3-1) Chapter 3. Block Preconditioners 16 torization is as follows: ( Gn G21 Bi Mc G12 G22 B2 V Bf Bl 0 j ' Bi 0 L x \ ( D i 0 l\ I Bl Bl B2 L2 E 0 D2 0 0 I f v o 0 i •) \ i 0 o J \^ L f £ T 0 0 \ where Di —Bl 1GnB1 T — LjB1 t — Bx lLi, D2 = L J 1 ( G 2 2 - B2DxBl - EBl - B2ET)L, E = G 2 1 5 r T - B2Di - B2LTB^T. -T 2 ? G can be chosen to define D\, D 2 , and E; or Di, D 2 , and E can be chosen to define G. In either case, two conditions must be met: • G must be nonsingular; • ZTGZ, where Z is an n x (n — m) matrix spanning the nullspace of B, must be symmetric positive definite. The matrices Li e R m x m and L2 € K("-™)x(™-™) m u s t be chosen so that L2 is nonsingular. One of the difficulties of working with Schilders's factorization is that a permutation of rows must be devised that splits B into such that Bi G K m x m i s nonsingular. On the other hand, the 3 x 3 block form gives a lot of flexibility and several families of preconditioners can be derived. Schilders's factorization is not discussed further. We return to our block t factorization of interest, namely (3.1). The (2,2) block of P is the negation of the Schur complement of G which we denote as S = BTG~lB. Chapter 3. Block Preconditioners 17 3.1.2 Eigenvalues of Mc Suppose (v,w) is an eigenvector of the preconditioned matrix A4~lH with asso-ciated eigenvalue A. This leads to the following generalized eigenvalue problem Let / T B = QT = ( Y z){^ 0 be an orthogonal factorization of B such that T 6 ]R m x m i s upper triangular, V 6 R n x m is a basis for the range space of B and Z € K n x ( n - m ) - g a j^sis for the nullspace of BT. Then, we can decompose v as follows: v = Yvy + Zvz. for some vectors vv and vz, The following useful proposition was proved in [19]. Proposition 1 (Keller, et al. [19]). Preconditioning H by Mc = where G ^ A is symmetric, implies that has an eigenvalue 1 with alge-braic multiplicity 2m and that the other n — m eigenvalues are defined by the generalized eigenvalue problem ZTAZvz = \ZTGZvz. In the same paper, Keller et al. also showed that the n — m eigenvalues of M~lTi not equal to 1 are bounded by the extreme eigenvalues of G~lA. Let ai, c*2, • • •, an be the eigenvalues of G~lA. Then, ak < Afc < akfm, k = 1,2,... ,n - rn. The latter result illustrates the importance of the selection of G. Unlike the next preconditioner we are about to introduce, Aic is indefinite. Chapter 3. Block Preconditioners 1 8 3.2 Positive-Definite Preconditioner Since we only allow choices of G that are positive-definite and we assume that B has full column rank, we know that BTG~XB is also positive definite. Mp is a block-diagonal matrix whose block diagonals are both positive definite. Therefore, A4P is also positive definite. Numerous forms of block diagonal preconditioners have been used in the past to solve saddle-point problems arising in a variety of applications. As early as 1993, Wathen and Silvester [37] used a positive-definite block preconditioner to precondition stabilized Stokes systems. Batterman and Heinkenschloss [2] used a form of Mp to precondition linear systems arising in optimal control. Kuznetsov's [20] need for a block preconditioner had to do with problems in elliptic finite elements on non-matching grids. Only a few references are men-tioned here for they are many. We direct the'reader to [3] for a comprehensive list. - • Our reason for choosing to pursue A4C and Mp in this thesis has to do with the fact that the middle matrix of factorization (3.1), otherwise known as P, differs from Mp only by a negative sign in the (2,2) block. We exploit this fact in Chapter 4 in order to draw another parallel between the two preconditioners. The similarity between P and A4P also allow us to make comparisons between the cost of forming the two block preconditioners that are otherwise much harder to conduct. Murphy, Golub, and Wathen [24] showed some elegant algebraic results when G = A. In this case, the preconditioned matrix M~1H, where Although analytically elegant, in practice, G = A is not used. If G and besides, forming the Schur complement would be equivalent to solving the original linear system. has at most three distinct eigenvalues: 1, | ± A'1 were easily computed, then there wouldn't b< ie a need for preconditioning, 19 C h a p t e r 4 Eigenvalue Analysis In Chapter 3 we presented some of the analytic eigenvalue theory known for Mc and Mp in the literature. This chapter presents three new results concerning the eigenvalues of our chosen block preconditioners. The first draws a parallel be-tween the eigenvalues of the constraint preconditioner and the positive-definite preconditioner when G is the identity. The second is specific to the constraint preconditioner when G equals the identity. The third result is generic to all positive-definite block-diagonal preconditioners and is based on work done by Rusten and Winther [29]. 4.1 Connections Between the Eigenvalues of M c and Mp Since Mc is symmetric, it is unitarily diagonalizable. Let the spectral decom-position of Mc be Mc = VAVT. Here, A is a diagonal matrix whose entries are the eigenvalues of Mc and V is an orthogonal matrix whose columns are the eigenvectors of Mc- We can write Chapter 4. Eigenvalue Analysis 20 this in term of the block factorization (3.1) RPRT = VAVT RPRT{RTrl = VAVT(RTy1 RP = VAVT(RTy1 RTRP = RTVA(RTV)-1 RTRP = WAW~l where W = RTV. Since RPRT and RTRP are similairty transformations of A, we have just proved the following proposition. Proposition 2. RPRT and RTRP have the same eigenvalues. For an alternate (and more general) proof of proposition 2, we refer the reader to [18, Theorem 1.3.20, page 53]. It is beneficial to work with RTRP instead of RPRT. To find the eigenvalues of . M c , we consider the generalized eigenvalue problem Pv = X(RTR)-1v. (4.1) Let C = G-^B. Then, R1 R = CT I Since RTR is symmetric, we know its inverse will also be symmetric. There-fore, we are searching for matrices X,Y, and Z such that / 0 0 / ( Chapter 4. Eigenvalue Analysis 21 It suffices to solve the following four equations X + CCTX + CYT = I (4.2) Y + CCTY + CZ = 0 • (4.3) CTX + YT = 0 (4.4) CTY + Z = I. (4.5) Solve (4.2) for Z Z = I-CTY. (4.6) Substitute (4.6) into (4.3) Y + CCTY + C- CCTY = 0 This gives us Y = —C which we proceed to plug into (4.4) CTX - CT = 0. Hence, X — I. The validity of (4.2) is trivially satisfied. Therefore, x y \ ( I -C YT Z ) ~ \ - C T I + CTC The generalized eigenvalue problem (4.1) can be rewritten as G o U « . ) _ A ( ' -c ) N 0 -BTG~'B j \ v2 J \ - C T 1 + CTC j \ v2 where v = (vi,V2). Despite many failed attempts, we can say nothing further about the general generalized eigenvalue problem. However, if we allow G = I, this is not the case. Doing so simplifies C to B. Let A ^ 1 be an eigenvalue of Mc (with G = I) and hence of RTRP. Our generalized eigenvalue problem is now reduced to 1 0 \ ( x \ ( 1 ~B \ ( x 0 -BTB J [ y J " M -BT I + BTB j [ y Chapter 4. Eigenvalue Analysis 22 which can be rewritten as x = Xx- XBy- . (4.7) -BTBy = -XBTx + X(I + BTB)y. (4.8) Solve (4.7) for x to obtain x = X(X — l)~lBy. Substitute the value for x in (4.8): A 2 -BTBy = - - -BTBy + X(I + BTB)y. X — 1 Multiply both sides by (A — 1) to eliminate the denominator, -(A - \)BTBy = -X2BTBy + A(A - + BTB)y. After some simplification, we are left with BTBy = (A2 - X)y Now, let p be an eigenvalue of P (with G = I) with associated eigenvector (p,q). Then, ( : - . ) ( : ) - ( : ) • -P and Mp differ only in the fact that the (2,2) blocks are the negations of each other. So, if p is an eigenvalue of P,.then —p is an eigenvalue of Mp. We can rewrite (4.9) as P = m -BTBq = pq. If we focus solely on the second equation, we see" that for m eigenvalues -u = A 2 - A. In the case where G = / , we have found a connection between m eigenvalues of Mc and the eigenvalues of Mv. Chapter 4. Eigenvalue Analysis 23 4.2 Eigenvalues of M.c when G — I If we allow the (1,1) block of the constraint preconditioner to equal the identity once again, we come up with an equality relating the eigenvalues of Mc to the singular values of B. Let B = UYiVT be the singular value decomposition of B. Rewrite our preconditioner as / UY,VT VT,TU 0 U 0 0 V Not only do we have a block factorization of Mc, we have a similarity transfer mation. Therefore, the eigenvalues of A4C are equal to the eigenvalues of Let T J = Pit e\ en+i e2 en+2 v where denotes the ith column of the identity matrix. Pn is a permutation matrix. Therefore, pre-multiplying J by Pv and post-multiplying by P j does not affect the eigenvalues of J . It also leaves us with a more desirable block structure. Namely, ' 1 °x ox 0 1 02 CT2 0 1 o-m o-m ,0 •f-n—m Chapter 4. Eigenvalue Analysis 24 Since we have a block-diagonal matrix, the eigenvalues are simply the eigen-values of the blocks. We already know n — m eigenvalues are equal to 1. The other 2m eigenvalues can be trivially solved since we are dealing with 2 x 2 blocks. The characteristic equation for each block is 1 — Aj <7j o~i —Xi 0 for i = 1, which can be rewritten as A 2 — Xi — of = 0 for i = 1,..., m. By the quadratic formula, we can conclude that the rest of the eigenvalues are 2 In the situation where n 3> m, it is feasible that one would have easier access to the singular values of B than the eigenvalues of the constraint preconditioner. If G equals the identity, these two pieces of information are now one and the same. 4.3 Eigenvalue Bounds Using the same energy estimates procedure as Rusten and Winther [29], we now derive bounds for the eigenvalues of M^H where M is a block-diagonal positive-definite preconditioner. Since M.~lTC is an indefinite matrix, we will be able to bound the negative and the positive eigenvalues separately in terms of eigenvalues of the (1,1) block of M~1H and singular values of the (1,2) block of M^H. Let ( P 0 \ M = V 0 Q where P e R n x n and Q e j £ m X m are positive definite. We can factor M in the Chapter 4. Eigenvalue Analysis 25 following manner - l p-ip-i 0 0 P - 5 0 \ / p-i 0 0 Q-i ) \ 0 g - i By Proposition 2, we know that A- l - 1 ?^ has the same eigenvalues as / P-i 0 \ ( A B \ / P - T 0 \ 0 Q - i j ^ 5 T 0 j \ 0 Q - i / P-iAP-i P-iBQ-i | _ c ~ \ Q-12BTP~l2 0 Let A G A(C) with corresponding eigenvector (x,y). Then, P-lAP-i P-^BQ'i \ ( x \ _ x q-\btp-^ o ) \ y ) ~ Alternatively, we can write this as p-3AP-ix + p-*BQ-2y = \x; (4.10) Q-iBTP~ix = Ay. (4.11) Note that x ^ 0. This is due to the fact that we assume B, and hence P'iBQ~%, has full column rank. Proposition 3. Suppose A G JR™xrl is symmetric positive definite with eigen-values pi > p2 > • • • Mn > 0. Then, MII-TII2 > (Ax,x) > M n | |x|| 2. Proof. Let A = V AVT be the eigenvalue decomposition of A. The columns of V, vi, V2, • • •, vn, are the eigenvectors associated with pi, p2,..., pn respectively. We can write x as a linear combination of the columns of V. Namely, y = Vx. Then, yTAy _ (Vx)TA(Vx) _ xTVTAVx _ xTAx p\x\ H h pn^\ yTy = {Vx)T{Vx) = x T V T V x = xTx = xj + • • • + x2 ' Chapter 4. Eigenvalue Analysis 26 Hence, Mi > —. ;—n— -So, Ul\\x\[2 >(Ax,x)>nn\\x\\2. • Let P~iAP~? have eigenvalues pi > ^2 > • • • > p.n > 0. T h e n , by P r o p o -s i t ion 3, we have fxn\\x\\2 < (p-^AP-ix,x) < fi.Wxf, For a l l x e K". Since B has full co lumn rank, so does P~^BQ~^. L e t P~^BQ~i have singular values CTi > cr2 > • • • > c r m > 0. T h e n we can use P r o p o s i t i o n 3 to show that o-m\\y\\<\\P-tBQ-h\\<°i\\yl For a l l y eM.m and < \\Q-lBTP-iX\\ <CTl | | l | | , For a l l x € range( iJ) . 4.3.1 Bounding the positive eigenvalues L e t A be a posi t ive eigenvalue of M~l1i. P r e - m u l t i p l y (4.10) by xT to ob ta in (P-iAP~ix,x) + (Q-iBTp-$x)Ty = X\\x\\2. Use (4.11) to s impl i fy the second te rm on the left-hand side of the above equat ion and ob ta in (P-vAP-lx,x) + \\\y\\2 = A | | x | | 2 . So, (P-tAP~ix,x) = A | | x | | 2 - A | | 2 / | | 2 . (4.12) W e now use (4.12), along w i t h the inequal i ty Pn\\x\\2 <\\(P-^AP-±X,X), Chapter 4. Eigenvalue Analysis 27 to get ( ^ - A ) | | x | | 2 < - A | | y | | 2 . (4.13) Because A > 0, (4.13) implies that A > pn. This gives us a lower bound for the positive eigenvalues. Next, we derive an upper bound for the positive eigenvalues. To do so, we look at equation (4.11). We know that \\Q^BTp-^x\\=X\\y\\. Square both sides and solve for ||y||2: \\y\\2 = ^\\Q-iBTP^x\\2. (4.14) Substitute (4.14) into (4.12) to get (P~iAP-ix,x) = X\\xf - \\\Q-lBTp-ix\\2. A To eliminate the denominator, we multiply all terms by A, leaving us with X(p-^AP-ix,x) = X2\\xf-\\Q-^BTP-^X\\2. (4.15) Substitute the inequalities (P-iAP-kx,x) <pi\\x\\2 and | |Q-* J B : r J p-*rc| | 2 < a2||a;||2 from Proposition 3 into (4.15) to get ( A 2 - A M l - C 7 2 ) | | x | | 2 < 0 . Since ||a.'||2 > 0, this leaves us with A 2 - A/^i - o\ < 0. By the quadratic formula, we find that < Mi ± VMI + 4g 2 2 We have assumed A is a positive eigenvalue. Therefore, we ignore one of the inequalities and conclude that the positive eigenvalues are bounded from above by • Mi + v V i + 4er2 Chapter 4. Eigenvalue Analysis 28 4.3.2 Bound ing the negative eigenvalues Let A be a negative eigenvalue of M.~lT-L. Substitute inequalities from Proposition 3 into (4.15) to get ( A 2 - A / x „ - o - ? ) | | a ; | | 2 < 0 . . Since ||x||2 > 0, this leaves us with A 2 - Xpn - a\ < 0. By the quadratic formula, we find that < / i n ± y/u2 + 4<T2 ~ 2 We have assumed that A is negative. Therefore, we ignore one of the inequalities and conclude that the negative eigenvalues are bounded from below by M» ~ V > 2 + 4 g 2 2 Finally, we derive an upper bound for the negative eigenvalues. Let x = v + w where v G range(B) and w G null(BT). Note that (P~i AP~iv,-w) = (P~5AP~^w,v) since P~%AP~i is symmetric. Rewrite equations (4.10) and (4.11) in terms of v and w p-iAP-h + P-^AP-^W + p-^BQ-^y = Xv + Xw, (4.16) Q-^BTP~h = Xy. , (4.17) Pre-multiply (4.16) by vT vTP~^AP~^v + vTP~^AP^^w + vTP'^BQ~^y = XvTv + XvTw. We simplify by writing with inner products, norms, and remembering that vTw = 0: •(P-lAP-$v,v) + (P-*AP-*w,v) + (Q-iBTp-iv)Ty = X\\v\\2. (4.18) Use the same strategy as* we did for the positive eigenvalues. Solve (4.17) for y ^ y = ±(Q-iBTP-iv). Chapter 4. Eigenvalue Analysis 29 Plug this into (4.18) to get (P-iAP~iv,v) + (p-iAP-$w,v) + \\\Q~i BT P~h\\2 = X\\v\\2. A Use inequalities (P-*AP-iv,v) < m\\v\\2 and \\Q-% BT P~i v\\2 > a2m\\v\\2 from Proposition 3 and substitute them into (4.18) to get (P-2AP-?w,v) > A - ^ j A v |2 We now have a lower bound for (P 2 AP *w,v). We need an upper bound for (P~%AP~%v,w) = (P~% AP~%w,v). Pre-multiply (4.16) by wT to get wTP~i AP~^v + wJP~i AP'^w + wT P~^BQ~^y = \wTv + \wTw. We can simplify by writing with inner products and norms and remembering that vTw = 0 and (P-%BQ~i)Tw = 0: (P-*AP-h,w) + {P-lAP~*w,w) = A||w||2. (4.19) Substitute the inequality (P-%AP~?w,w) > pn\\wf from Proposition 3 into (4.19) to get (P-*AP-lv,w) < (\-nn)\\w\\2. Since we assume A is negative, we know the above quantity is less than or equal to zero. Putting it all together, we have 0> (X-Un)\\w\\2 >(P-iAP-$v,w) = (P-iAP-iw,v) > (\ - px - ^ \\v\\2. If we ignore the middle terms, we are left with - 0 > ( x - U l - ^ f \ H I 2 . Chapter 4. Eigenvalue Analysis 30 By equation (4.17), if v — 0, then y = 0. However, this reduces equation (4.10) to P~'% AP~?x = Xx. Since P~^AP~% is positive definite, A must be positive, a contradiction to our original assumption. So, we can safely say that 0 > A - m - ^ . ' Multiply both sides by A: 0 < A 2 - A M l - a i . ' By the quadratic formula we find that We assume A is a negative eigenvalue. Therefore, we ignore one of the inequal-ities and conclude that the negative eigenvalues are bounded from above by M i ~ \/Mj + 4Q-2, 2 We summarize the eigenvalue bounds just derived in the following theorem. Theorem 1. Suppose that ( P 0 M = \ 0 Q where P € R n x n and Q G R m X m a r e positive definite, is used to precondition • " n = where A £ M.nxn is positive definite and B £ fl£™xm kas full column rank. Let Mi =^ M2 > ' •' > Mn > 0 6e the eigenvalues of P~?AP~*. Let o\ > a-i > • • • > am > 0 be the singular values of P~iBQ~i. Then, the eigenvalues of A4~lH are in one of two intervals: Mi + VMT+4CT^ pn - V M „ + 4o-2 and Mi + \/Mf + 4t72 Mn! n 31 C h a p t e r 5 Computational Cost of Preconditioning Unfortunately, preconditioning is not free. There is a cost involved in applying a preconditioner to a linear system. An effective preconditioner is one whose additional cost is inconsequential when compared with the overall savings of solving a linear system with improved spectral properties. When solving a preconditioned linear system like (1.2) with an iterative solver such as GMRES or MINRES, the additional cost of applying the preconditioner is directly related to the cost of solving Mu=.f, (5.1) where M. is the preconditioner and / is a given right-hand side. We begin this chapter with two definitions that we refer to often. Definition 3. A G-solve is the process of solving a linear system of the form Gx = y, for x £ Mn. The matrix G £ Wixn is defined by the preconditioner A4C or M.v introduced in (1.3) and (1.4) respectively. Definition 4. An 5-solve is the process of solving a linear system of the form Sx = BTG~1Bx = y, for x £ R m . The matrix B £ R n x m is the (1,2) block of Ti and G £ R n x n is defined by the preconditioner Mc or Mp introduced in (1.3) and (1.4) respec-tively. Chapter 5. Computational Cost of Preconditioning 32 Note: Solving an 5-solve involves solving a G-solve. If the 5-solve is performed iteratively, then a G-solve must be performed at each iteration. Using the above definitions, we can now formulate cost functions associ-ated with solving (5.1) where M is either the constraint preconditioner or the positive-definite preconditioner. 5.1 Solving Mu = / Let us first consider the case of the constraint preconditioner M.c. The linear system of interest is M u f which can be solved in two stages as BTG~1Bw = BTG-1h1-h2; (5.3) and Gv = h 1 - Bw. (5.4) We first describe the solution of (5.3) for w. Solve Gk — hi for k (a G-solve). Next, form the right-hand side by performing a matrix-vector product with B. Namely, let / = BTk — h2. We are left with the 5-solve BTG~lBw = I. (5.5) We will discuss the cost of iteratively solving the 5-solve shortly. For now, we assume that we can find a solution w for (5.5). Next, solve (5.4) for v. Form the right-hand side by performing a matrix-vector product with B. Namely, let r — hi — Bw. Then, solve Gv = r for v (yet another G-solve). Chapter 5. Computational Cost of Preconditioning 33 By using the block structure, the overall cost of solving (5.2) can be com-pressed into equation form as 2 G-solves + 1 5-solve + 2 MVP . (5.6) where MVP stands for matrix-vector products with B or BT. Suppose we wish to solve (5.5) iteratively. Since we assume that G is positive definite and B is full rank, the negation of the Schur complement of G, 5, is also positive definite. The conjugate gradient method is therefore appropriate as an iterative solver. We are given wq, an initial guess for w. At iteration i of the conjugate gradient method, we do the following twice: • Let pi = BiVi (an MVP). • Solve Gqi = pi for g; (a G-solve). • Form B T q i (an MVP). Suppose 7 iterations of the conjugate gradient method are needed for con-vergence. Then, the cost of solving (5.5) is 27 (2 MVP + 1 G-solve). (5.7) Incorporate (5.7) into (5.6) to get a function for the cost of solving (5.2) in terms of only G-solves and MVPs: 2 7 (2 MVP + 1 G-solve) + 2 MVP + 2 G-solves = (47 + 2) MVP + (27 + 2) G-solves. (5.8) Now, let us consider the case of the positive-definite preconditioner Mp. The linear system of interest is Chapter 5. Computational Cost of Preconditioning 34 which can be solved in two stages as . Gv — hi; BTG~1Bw = h2. Here, the order in which the equations are solved does not matter. Simple observation tells us that the cost of solving (5.9) is 1 5-solve + I G-solve. (5.10) In (5.7) we derived the cost of an iterative 5-solve. Thus, the cost of solving (5.9) in terms of only G-solves and matrix-vector products with B (and BT) is 27 (2 MVP + 1 G-solve) + 1 G-solve = 47 MVP +(27 + 1) G-solves. ' (5.11) When we compare (5.8) and (5.11), we see that solving Mcu = / requires 2 more matrix-vector products and one more G-solve than solving Mpu = f. However, as we will see in Chapter 7 (Numerical'Experiments), the 5-solves account for roughly 98% of the total work of solving (5.1). This can also be seen by examining the following ratios: 47 . 2 7 + I 47 + 2 27 + 2' The number of MVPs and G-solves needed to solve Aipu = / appear on the numerator of the first and second ratios respectively. The number of MVPs and G-solves needed to solve Mcu = f appear on the denominator of the first and second ratios respectively. Because 7 > 0, both ratios are less than one and as 7 —> 00, the ratios approach 1. Thus, when 7 is large (typically the case) the cost difference between solving Mpu = / and M.cu = f is minor. Since we are dealing with presumably large matrices, we wish to solve the original preconditioned linear system (1.2) with an iterative solver. We explore two iterative Krylov subspace methods: GMRES and MINRES. Instead of using MATLAB'S built-in backslash operator for equations of the form (5.1), we use the block factorizations just described. Chapter 5. Computational Cost of Preconditioning 35 5.2 Cost of solving preconditioned linear system with G M R E S The iterative solver G M R E S is suitable for any square linear system. 'We have altered MATLAB'S built-in G M R E S function to take advantage of the factorizations of the previous section. Consequently, we have altered the input and output parameters in the following manner: [u,flag,relres,iter,resvec, |cost p = ourgmres(H,b,restart,|tol | , . . . max i t Gi G2 ,u0, pcf lag 1,1 Sf lag , pcSf lag , tolf lag The new or altered parameters are highlighted by a. box. For more implemen-tation details, we refer the reader to the appendix. The output parameter i ter returned by ourgmres is a 2 x 1 vector. The first entry is the number of outer iterations and the second entry is the number of inner iterations. Let i ter = [a /?] where 0 < a < maxit and 0 < (3 < restart. If we are using full GMRES, a = 1 and the total number of inner iterations is p. Otherwise, the total number of inner iterations is (a — 1) x restart + /?. Let 0 if restart = [ ] restart otherwise. Preconditioned GMRES solves a linear system of the form (5.1) at every outer and every inner iteration. As we have already shown, there is an 5-solve involved in solving (5.1) regardless of whether Mc or Mp is chosen. Let 7^ be the'number of iterations required for the 5-solve at iterations a = i and j3 = j. A subscript of 0 indicates the iteration count has not yet commenced for (3. The 1 Chapter 5. Computational Cost of Preconditioning 36 cost of preconditioning (1.1) with A4C is ((47<o + 2) MVP + (27«, + 2) G-solves) i=l + ( ( 4 ^ ' + 2) M V P + ( 27«=i + 2) ^-solves) 3 = 1 • Ot(: — 1 T + I ] H ((47ii + 2) MVP + (2 7 i i + 2) G-solves) i=l j=l = 2 (2^o MVP + 7 i 0 G-solves) + 2 ^ (2 7 Q c j MVP + 7 i i G-solves) i=i j=i + 2 ^ ^ (2 7 i j MVP + 7 i j G-solves) i=i j=i + 2(ac + a c r - t + /3C) MVP + 2(ac + a c r - r + /3C) G-solves (5.12) when GMRES (full or restarted) is the iterative solver used. The subscript c on a and (3 indicates that the preconditioner is A4C. The cost of preconditioning (1.1) with Mp is (47io MVP + (2 7 i 0 + 1) G-solves) i=l + J2 ( 4 7 a P i MVP + (2 7 i i + 1) G-solve) ap-l T + (4^«. M V P + ( 2 7 i i + 1) G-solves) ! i=i j=i = 2 (2 7 i 0 MVP + 7 i0G-solves) + ^ (2 7 ( l p j MVP + 7 y G-solves) a p -l r + 2 ^ ^ (2jij MVP + 7 i j G-solves) + (ap + apr - t-+ Pp) G-solves 1=1 j = l (5.13) when GMRES (full or restarted) is the iterative solver used. The subscript p on a and (3 indicates that the preconditioner is M.v. The right-hand side is the only element of the S-solve changing from one iteration to the next. 7 y is roughly the same regardless of i and j. To get a Chapter 5. Computational Cost of Preconditioning 37 more manageable cost function, let 7 be the average number of iterations needed for a successful 5-solve. Then, the cost of preconditioning (1.1) with Mc and the iterative solver GMRES is approximately (ac + pc + (ac - 1)T) ((47 + 2) MVP + (2 7 + 2) G-solves) = (4 7 (q c + pc + (a c - l)r) +2{ac+pc + (a c - l)r)) MVP + (27 (q c + PC + (q c - l)r) + 2 (a c + /?c + (a c - l)r)) G-solves. (5.14) The cost of preconditioning (1.1) with A4P and the iterative solver GMRES is approximately (ap + pp + (a p - l)r) (4 7 MVP + (2 7 + 1) G-solves) = 4 7 (q p + PP + {ap - 1)T) MVP + (27 (q p + PP + (q p - l)r) + (q p + /?p + (a p - l)r)) G-solves. (5.15) Let us compare the number of matrix-vector products with B (or BT) in-volved in the preconditioning of (1.1) when using both preconditioners by di-viding the number found in (5.15) by the number found in (5.14): 2j (a p + PP + (aP - l)r) + (q p + Pp + (q p - l)r) 27 (a c + Pc + (a c - l)r) + 2 (a c + pc + (a c - l)r)" 1 " ' We expect 7 to be large. Therefore, the second terms on both the numerator and the denominator are menial in comparison to the first terms on the numerator and denominator respectively. Hence, (5.16) is approximately equal to 27 ( a p + Pp + (ap - l ) r ) _ ap + Pp + (ap - l ) r (5.17) 27(a c + /?c + (a c - l ) -r) a c + pc + {ac - l)r ' We can do the same kind of comparison with G-solves by dividing the number of G-solves found in (5.15) by the number found in (5.14): 2 7 (a p + pp + (q p - 1)T) + (a p + ^ p + (a p - l)r) 27(q c+/3 c + ( q c - l ) r ) + 2 ( q c + /?c + ( q c - l ) r ) ' 1 ' ' Once again, the second terms on both the numerator and the denominator can be omitted since their contribution is minor. Hence, (5.18) is approximately equal to Chapter 5. Computational Cost of Preconditioning 38 27 (g p + (3p + (a p - 1)T) = qp + Pp + (a p - l)r 27 (a c + /3C + (a c - l)r) a c + /3C + (a c - 1)T ' The important thing to notice when looking at (5.17) and (5.19) is that neither ratio is dependent on 7. The overall work involved in preconditioning our saddle-point linear system is a function of amount of work per iteration ((5.8) or (5.11) depending on whether we are preconditioning with Mc or Mp respectively) times the total number of iterations. We know the ratios for work per iteration are fairly close. Therefore, the difference in overall cost depends almost entirely on the iteration counts a and /3. > 5.3 Cost of solving preconditioned linear system with M I N R E S The iterative solver MINRES is suitable for any symmetric linear system. Pre-conditioners must be symmetric positive definite. Hence, we only discuss pre-conditioning with Mp. We have altered MATLAB'S built-in MINRES function to take advantage of the factorization in Section 5.1. Consequently, we have altered the input and output parameters in the following manner: [u,f lag.relres , iter .resvec ,resveccg,| cost |] = ourminres(H,b, _ I) to l maxit GI G2 ,u0, Sflag pcSflag The new or altered parameters are highlighted by a box. Let i ter = <p be the number of MINRES iterations. We must solve a linear system of the form (5.1) twice before entering the iterative loop and once at every iteration of the loop. Let 7, be the number of iterations required, for a successful 5-solve when ip = i. Then, the cost of preconditioning (1.1) with Mp when solving with Chapter 5. Computational Cost of Preconditioning 39 MINRES is 47Q MVP + (27a + 1) G-solves + 4-yb MVP + (2jb + 1) G-solves + (47i MVP + (2 7 i + 1) G-solves) i=i = 4( 7 a + 7 6 ) MVP + (2( 7 a + 7 6 ) + 2) G-solves v 5^ (4 7 i MVP + (2 7 i + 1) G-solves) (5.20) where ja and 76 represent the number of conjugate gradient iterations needed to solve the S-solve occurring in the first and second solves before entering the loop. Once again, to get a more manageable cost function, let 7 be the average number of iterations needed for a successful S-solve. Then, the cost of precon-ditioning (1.1) with Mp is -[ip + 2) (47 MVP + (2 7 + 1) G-solves) = (4</>7 + 87) MVP + (2957 + 47 + ip + 2) G-solves (5.21) when using MINRES as the iterative solver. An analytic comparison between the cost functions arising from using GM-RES (with Mp) and MINRES (with Mp) is not easy. [ap,(3v] and </? play significant roles in the total work and we cannot relate the iteration counts. Be-cause of the 3-term recurrence, MINRES is more susceptible to rounding errors than GMRES. In [34], it is shown that the rounding errors propagated to the solution are proportional to k(H)2 for MINRES while for GMRES these errors are proportional only to k(H). This may explain why MINRES tends to need more iterations than GMRES for convergence in all the numerical experiments performed in Chapter 7. 1 C h a p t e r 6 I n n e r T o l e r a n c e s 40 Often solving a linear system with an inexact method leads to solving a sub-problem at each iteration. A second iterative solver can be embedded into the original iterative solver to find the inexact solution of such a subproblem. This concept is known as inner-outer iterations. The outer iterations are associated with the original problem while the inner iterations are associated with the subproblems. The subproblem we are concerned with is the solution of the linear system (5 .1) , which we must do at least orice for every outer iteration of all Krylov sub-' space methods. The subproblem involves an 5-solve which we solve iteratively using the conjugate gradient method. There is no reason to restrict ourselves to using the input parameters of the outer method for the inner method. In our implementation, we allow for .different tolerances and maximum iteration number for the outer and inner methods. In this chapter, we explore two ways of setting the inner tolerance for the inner solve at each iteration of the outer method. We assume all G-solves are done with MATLAB'S built-in backslash operator (as opposed to iteratively). 6.1 Constant Inner Tolerance The first way of setting the inner tolerance is quite simple. We just fix the inner tolerance over all outer iterations, independently of the outer tolerance. We solve each test problem several times with different inner tolerances (which remain fixed throughout the outer iterations). We quantify the cost of each of these runs with the cost functions presented in the previous chapter. We Chapter 6. Inner Tolerances 41 expect the number of outer iterations to be smaller when the inner tolerance is quite strict, leading tosmore inner iterations. Alternatively, we expect that the number of outer iterations to be larger with a loose inner tolerance, leading to fewer inner iterations per outer iteration. As we allow our inner tolerance to grow, we expect convergence to approach that of an unpreconditioned system. As we will see, this is not the case. Full GMRES, and sometimes restarted GMRES(15), stagnate when the inner tolerance is chosen to be greater than the predetermined outer tolerance. Oddly enough, MINRES does not suffer from such stagnation issues. As expected, there is no benefit in solving the inner iterations with a smaller tolerance than the outer tolerance. Furthermore, when the inner tolerance ap-proaches machine precision, the conjugate gradient method stagnates. 1 6.2 Varying Inner Tolerance The second approach allows the inner tolerance to differ from one outer iteration to the next. Initially, the inner tolerance needs to be as strict as the outer tolerance. However, as we approach convergence, the inner tolerance can be relaxed. In this case, GMRES does not suffer from stagnation as it does for the above case. We have implemented an inner tolerance bound proposed by Simoncini and Szyld in [32] and van den Eshof, Sleijpen, and van Gijzen in [35]. In both papers, inner iterations solve matrix-vector products inexactly. This concept of inner iteration comes from the promising numerical experiments carried out by Bouras and Fraysse [5]. Their experiments implemented inexact matrix-vector product inner iterations for numerous Krylov subspace methods, including GMRES. Theory developed by Simoncini and Szyld [31] is only applicable to full GMRES. Despite their "disclaimer, we were able so successfully apply it to GMRES(15). We do not even attempt to apply the theory to MINRES since MINRES does not stagnate when the inner tolerance is large. Although our definition of an inner iteration is different, the theory is still Chapter 6. Inner Tolerances 42 applicable. If we solve the unpreconditioned system (1.1) with a Krylov.subspace method, at each outer iteration i a matrix-vector product involving H is formed. Instead of forming Hvi exactly, (H + E^Vi ( 6 . 1 ) is formed, where Ei is an error matrix. Denote by z{ = H.Vi the exact matrix-vector product and Zi the computed matrix-vector product. Then, \\zi-ZiW = \\EiVi\\ < \\Ei\\\\vi\\. Since Vi (and therefore is given, convergence depends on Ei. We wish to solve the preconditioned linear system (1.2) with a Krylov sub-space method. At each outer iteration i, a matrix-vector product involving A4~1H is formed. Let yi = M-1Hvi, (6.2) and pre-multiply both sides of (6.2) by M. to get Myi = Hvi. Like with the unpreconditioned linear solve, let Zi = Hvi. Then, we have Myi = Zi. We need to solve for y\, a solve which coincides with our subproblem that defines our inner iteration. We perform this solve inexactly. So, Myi = Zi + Si, where y,; is the computed solution. Solve for to get y% = M~lZi + M~1Si = M~lHvi + M~l8i. Then, \\Vi-U = \\M-Hi\\<\\M-1\\\\5i\\. Chapter 6. Inner Tolerances 43 Since A4 (and therefore is given, convergence can be guaranteed by monitoring Si closely. Simoncini and Szyld [32] prove the following proposition which bounds the error matrix in (6.1). Proposition 4 (Simoncini and Szyld [31]). Let e > 0. Assume that m iterations of the inexact Arnoldi method have been carried out and let y^1 be the GMRES solution. Let r^1 = r0 - Axg^ be the GMRES residual after m iterations of the inexact Arnoldi method. If for every k < m, np || Q~m{Hm) 1 -E'fc — II ~gm || e> m F f c - i l l then \\r^n - ?~^m|| < e. Moreover, if 1 1 m K(Hm) then \\(Vm+1Hm)Tr9™\\ <e. Here, V is the matrix whose columns are the Arnoldi vectors, H is the upper Hessenberg matrix formed in Algorithm 3, and (Hm) is the condition number of the Hessenberg matrix. The magnitude of o(Hm) varies significantly depending on m, indicating sensitivity. Hence, we would like to avoid having to compute K(Hm) at each outer iteration. The bound from Proposition 4 is simplified to 1 \E,\\ < •m n _ I F f c - i This bound guarantees overall convergence below the given outer tolerance. Bouras and Fraysse [5] simplify the bound even further by allowing £m = 1, leaving l l ^ l l < TT^S, Chapter 6. Inner Tolerances 44 where k is the outer iteration number, tol^ = \\Ek\\ is the inner tolerance allowed at outer iteration k, fk-i is the computed residual at outer iteration k — 1, and e > 0 is a predetermined value. We choose e equal to the outer tolerance. We demonstrate in the numerical experiments section that using an inner tolerance that is smaller than the outer tolerance is futile. Therefore, we clamp our inner tolerance in the following manner: tolfc = max { 1 £, £ To illustrate the benefit of clamping the inner tolerance, we have solved the problem mosarqpl with both with a clamped and an undamped inner tolerance (See Figure 6.1). outer iteration Figure 6.1: A comparison of clamped versus undamped varying inner tolerances used to solve mosarqpl with full GMRES, outer tolerance e = 10~10, Mc, and G = I The overall cost savings due to clamping based on the cost functions of Chapter 5 is only about 1.1%. However, this is a cost savings nonetheless. 45 Chapter 7 Numerical Experiments All numerical experiments presented in this chapter were run using version 7.0.1 of M A T L A B . Throughout the outer tolerance is set to 10 - 1 0 . When restarted GMRES is used, it is done so with a restart value of 15. We begin by presenting two test problems on which all numerical experiments are performed. 7.1 Test Problems 7.1.1 stokes2d(0,60) Our first test problem is courtesy of Robert Bridson [6]: [A.B.K.f] = stokes2d(k,n) creates a Stokes problem whose domain length is n and fc is inversely propor-tional to the time step. If k = 0, we are solving the stationary Stokes problem. The matrices A and B are the (1,1) and (1,2) blocks, respectively, of the K K T matrix H = K and / is the right-hand side. We have selected values of fc = 0 and n = 60. This selection leaves us with a K K T matrix whose sparsity pattern is depicted in Figure 7.1. It is worth mentioning that excellent problem-dependent preconditioners exist for Stokes problems [12]. We focus purely on an algebraic investigation. Chapter 7. Numerical Experiments 46 Figure 7.1: Sparsity pattern of Ti formed by stokes2d(0,60) Properties of A: dimension rank condition number number of nonzero entries 7080 x 7080 7080 1482 34924 Properties of B: dimension rank condition number 7080 x 3540 3540 107 number of nonzero entries 13982 Properties of K: dimension rank condition number 10620 x 10620 10620 746 number of nonzero entries 62888 Properties of / : dimension number of nonzero entries 10620 x 1 169 Chapter 7. Numerical Experiments 47 Krylov subspace method convergence is strongly related to eigenvalue clus-tering. We have computed all the eigenvalues of a small Stokes problem, s t o k e s 2 d ( 0 , 5 ) . All three graphs of Figure 7.2 display much better clustering when using A4C. This is mainly due to the fact that at least 2m of the eigenvalues of M~XK are equal to one, in keeping with the theory of Chapter 3 and [19]. One of the advantages to preconditioning with Alp is that it does not change the inertia of the initial matrix. 7.1.2 mosarqp2 The second test problem comes from the Constrained and Unconstrained Testing Environment (CUTE) [4]. This test problem was constructed by Morales-Perez and Sargent in [23] as an academic problem purely for testing purposes, and is a convex quadratic program: min xT Ax + cTx dx < Wx < d2 subject to I < x < u where A is symmetric positive definite and B is large and sparse. Because we are only interested in the saddle-point linear system we only consider the equality constrained problem, where we ignore the bounds. Therefore, (7.1) is the first-order conditions of min xT Ax + cTx subject to Wx = d, • Chapter 7. Numerical Experiments 48 with B = —WT. As a right-hand side, we define where e £ R™+m [s a vector of ones. The sparsity pattern of the K K T matrix is depicted in Figure 7.3. Properties of A: dimension 900 x 900 rank 900 condition number 38 number of nonzero entries 990 Properties of B: dimension 900 x 600 rank 600 condition number 192 number of nonzero entries 2930 Properties of Ti: dimension • 1500 x 1500 rank 1500 condition number 77677 number of nonzero entries 6850 Properties of b: dimension 1500 x 1 number of nonzero entries 968 The 1500 eigenvalues of Ti for mosarqp2 are depicted in Figure 7.4. Chapter 7. Numerical Experiments 49 7.2 Comparison of various G values We begin the numerical experiments with a comparison of the three choices of G: the identity, the diagonal of A, and its incomplete Cholesky factorization. As can be seen in Figures 7.5, using the incomplete Cholesky factorization of A allows for convergence in significantly fewer iterations regardless of the test problem, iterative Krylov solver, or preconditioner (Mc or Mp). Allowing G to be the diagonal of A seems to be the an intermediate choice in terms of inner iteration count while G equal to the identity means needing the most number of inner iterations for convergence. This follows the theory that fewer iterations will be needed the closer the preconditioner is an approximation to the original matrix. Each G-solve performed with the incomplete factorization consists of two rounds of back-substitution at a cost of O(n) since G is sparse: Each G-solve performed with the diagonal of A costs n. Hence, G-solves where G is the incomplete Cholesky factorization of A are not substantially more expensive than when G is the diagonal of A. There is no cost to solving a G-solve where G is the identity. . The leftmost graph of figure 7.5 appears to be missing data from inner tolerance 10 _ 1 to 10~9. This is due to the fact that GMRES stagnated at those inner tolerances. All data points accentuated by a square indicate that the inner method (conjugate gradient) stagnated but the outer method was still able to converge. 7.3 The Dominance of the Schur Complement Solve In Chapter 5 we defined the notions of G-solves and S-solves. We were also able to give cost functions as to how much additional work is done when precondi-tioning with Mc or Mp based entirely on G-solves and matrix-vector products with B (and BT). Since S-solves are composed of G-solves and matrix-vector products, we wish Chapter 7. Numerical Experiments 50 to quantify the percentage of G-solves and matrix-vector products with B (and BT) done within S-solves. These percentages determine the dominance of the Schur complement solves in terms of total cost. Table 7.1 validates the use of the simpler cost functions (5.14), (5.15), and (5.21). Therefore, as long as a fixed inner tolerance is used, then 7 can be averaged and the simplier cost functions should be used. Chapter 7. Numerical Experiments 51 * eigenvalues of K \7 eigenvalues of M* 'K O eigenvalues ol M"1 K (a) G is the identity > eigenvalues ol K V eigenvalues ol M"'K O eigenvalues ol M _ , K (b) G is the diagonal of A * eigenvalues ol K •? eigenvalues al M K •j eigenvalues ol M" K (c) G is the incomplete Cholesky factoriza-tion of A Figure 7.2: Eigenvalues of K, M~1K, and M~1K for s t o k e s 2 d ( 0 , 5 ) Chapter 7. Numerical Experiments 52 Figure 7.3: Sparsity pattern of H formed by mosarqp2 Chapter 7. Numerical Experiments 53 40 35 30 0 500 1000 1500 (a) Eigenvalues of Ti 0 500 1000 1500 0 3 500 1000 1500 0 500 1000 1500 2 r • • 0 500 1000 1500 0 500 1000 1500 0 500 1000 1500 (b) Top row: eigenvalues of MclrH where G is identity, diagonal of A , and incomplete Cholesky factorizat ion of A respectively. Bo t tom row: eigenvalues of Mp 1Ti where G is identity, diagonal of A , and incomplete Cholesky factorization of A respectively. Figure 7.4: Eigenvalues of H, Mc lH, and M~lH for mosarqp2 Chapter 7. Numerical Experiments 54 x 10 ~ S — G: identiy H— G: diagonal of A - * - G: incomplete Cholesky factorization JBr-1e-2 1e-4 1e-6 1e-8 1e-10 Inner tolerance 1e-12 1e-14 (a) Solving mosarqp2 wi th full G M R E S using an outer tolerance of 10 and several inner tolerances.. Preconditioner is Aic-1e-2 1e-4 1e-6 1e-8 1e-10 1e-12 1e-14 Inner tolerance (b) Solving s tokes2d(0 ,60) wi th M I N R E S using an outer tolerance of 1 0 - 1 0 and several inner tolerances. Preconditioner is Figure 7.5: Comparison of various values for G Iterative p/c G i ter 7 Actual Estimated Actual # Estimated # solver # MVPs # MVPs G-solves G-solves identity [1 202] 344 279622 279734 140014 140070 GMRES Mc diagonal [1 198] 336 267994 267854 134196 134126 Cholesky [1 61] 108 26852 26908 13488 13516 identity - - - - - -GMRES Mp diagonal Cholesky - - - - - -identity [24 11] 345 525492 524500 263126 262300 GMRES (15) Mc diagonal [31 14] 334 663034 661440 332012 330780 Cholesky [6 11] 108 40068 39808 20126 19936 identity [477 12] 329 10028064 10039764 5021661 5027511 GMRES(15) Mv diagonal [177 12] 319 3610928 3609804 1808292 1807731 Cholesky [26 13] 105 173192 173880 87010 87354 identity 1029 348 1435620 1435152 718841 718607 MINRES Mp diagonal 637 343 876884 876708 439081 438993 Cholesky 179 108 77956 78192 39159 39277 Table 7.1: Checking validity of approximations (5.14), (5.15), and (5.21) of (5.12), (5.13), and (5.20) respectively by solving stokes2d(0,60) with various iterative solvers, both available block preconditioners, and all three G values. Chapter 7. Numerical Experiments 56 The missing data from rows 4, 5, and 6 of Table 7.3 means the given method was unable to converge in a reasonable number of outer iterations due to stag-nation. Table 7.3 demonstrates that the 5-solves are a dominant part of the total work involved in preconditioning a saddle-point linear system regardless of the test problem or preconditioner. It appears that the better G is an approximation to A, the less the percentage of total work is devoted to 5-solves. Upwards of 99.88% of all G-solves are done within 5-solves. Upwards of 99.86% of all matrix-vector products were performed within an 5-solve when preconditioning with the constraint preconditioner. All matrix-vector products performed are done within 5-solves when preconditioning with the positive-definite preconditioner. See (5.10). 5-solves are done as part of an inner iteration. Therefore, it would make sense that the fraction of work done for 5-solves reduces as the inner tolerance increases. To show that this is indeed the case, we have included Table 7.3. Inner tolerances with an asterisk beside them indicate that inner method stagnated but that GMRES(15) was still able to converge to the desired outer tolerance. Despite a rather large range of inner tolerances, 89.42% to 91.64% of all G-solves are still performed within 5-solves. 7.4 Comparison of Krylov subspace solvers The three Krylov subspace methods we implemented are full GMRES, restarted GMRES, and MINRES. From the previous section, we know that the bulk of the work associated with preconditioning (1.1) with either A4C or M.v is in the 5-solves. Hence all comparisons in this section will be done based on the total number of iterations needed to solve the 5-solves over all outer iterations. Since MINRES requires the preconditioner to be positive definite, we will not be able to show a MINRES convergence plot when M.c is used to precondition. Missing data points, which occur (or don't occur) only when the inner tolerance is large,' indicate the outer method stagnated and was unable to converge in Chapter 7. Numerical Experiments 57 Iterative p/c G % G-solves % MVP in solver in 5-solves S-solves identity 99.71% 99.85% GMRES Mc diagonal 99.70% 99.85% Cholesky 99.08% 99.54% identity - -GMRES Mp diagonal Cholesky - -identity 99.71% 99.86% GMRES(15) Mc diagonal 99.70% 99.85% Cholesky 99.09% 99.54% identity 93.88% 100% GMRES(15) Mp diagonal 93.49% 100% Cholesky 93.45% 100% identity 99.86% 100% MINRES Mp diagonal 99.85% 100% . Cholesky 99.54% 100% Table 7.2: Determining ratio of work done for S-solves for stokes2d(0,60) when the inner tolerance equals the outer tolerance. Chapter 7. Numerical Experiments 58 iter 7 Total # of of G-solves % of G-solves in 5-solves. 10-1 [100 9] 29 82434 89.42% lO" 2 [70 6] 142 42326 89.94% lO" 3 [43 15] 272 36948 91.66% IO"4 [40 14] 351 41236 90.83% io- 5 [35 15] 413 46127 91.24% i cr 6 [33 15] 446 50663 91.64% io- 7 [44 9] 472 52300 91.56% io- 8 [45 14] 501 54522 91.56% io- 9 [38 15] 523 56170 91.58% ' 1 0 - i o [38 15] 540 57758 91.59% IO" 1 1 [39 8] 554 59096 91.57% 10-12* [39 8] 565 60264 91.56% 1Q-13* [39 8] 575 61360 91.61% 1Q-14* [39 8] . 586 62814 91.68% IO" 1 5* [39 8] . 587 62814 91.64% Table 7.3: Determining ratio of work done for 5-solves for mosarqp2 precondi-tioned with Mp where G is the identity. Solving with restarted GMRES(15). Asterisk indicate the conjugate gradient method stagnated for those particular inner tolerances. Chapter 7. Numerical Experiments 59 Mc Mp . G: eye G: diag G: chol G: eye G: diag G: chol mosarqp2 1.80 1.00 1.00 2.36 1.15 1.00 stokes2d(0,60) 2.25 2.48 1.49 8.56 4.20 2.25 Table 7.4: Multiple of extra work required for GMRES(15) to converge. Based on average GMRES convergence over inner tolerances 10~10, 10 - 1 1 , and 10~12. a reasonable amount of iterations. Data points that are accentuated with a square indicate the inner method stagnated yet the outer method was still able to converge. In all three graphs of Figure (7.6), the full GMRES method is unable to con-verge when the inner tolerance is greater than the outer tolerance of 10 - 1 0 . We see that the circles which indicate data points associated with the full GMRES method commence at an inner tolerance of 10 - 1 1 and are located directly below the data points for MINRES. When an inner tolerance that is smaller or matches the outer tolerance is used, GMRES is consistently the cheapest method. When restarting actually does take place, GMRES outperforms GMRES (15) and MINRES as can be seen in Tables 7.4 and 7.5. We use full GMRES as the basis and show how much additional work must be done for GMRES(15) and MINRES to converge. Hence, one must multiply a value in one of the tables by the overall cost function of solving with full GMRES to get a cost function for the solving the same problem with a MINRES or GMRES(15). Note: Values were calculated as an average over inner tolerances 10~10, 10 - 1 1 , and 10~12 since GMRES did not converge for larger inner tolerances and the inner method stagnated for smaller ones for all iterative methods. The terms eye, diag, and chol are short form for G equal to the identity, the diagonal of A, or the incomplete Cholesky factorization of A, respectively. As can be seen on the first row of Table 7.4, GMRES and GMRES(15) need virtually the same number of S-solves for convergence when G is a good Chapter 7. Numerical Experiments 60 s I 4 ....<>.... unreftarlsd QMRES •• 4- ~ restarted GMRES(15) - * - • MINRES .... . . - a i 4 ^ - - * ~ r ~ - M i-8 1a-10 Inner tolerance Inner tolerance (a) Solving mosarqp2 with A4C where G (b) Solving mosarqp2 with MP where G is identity. is diagonal of A. —O— unrestarted QMRES - + - restarted GMRES(15) •• • MINRES (c) Solving stokes2d(0,60) with MV where G is incomplete Cholesky factor-ization of A. Figure 7.6: Differences in convergence behaviour for full GMRES, GMRES(15), and MINRES (where applicable) Chapter 7. Numerical Experiments 61 Mp G: eye G: diag G. chol mosarqp2 1.15 1.04 1.00 stokes2d(0,60) 1.01 1.02 1.00 Table 7.5: Multiple of extra work required for MINRES to converge. Based on average GMRES convergence over inner tolerances 10 - 1 0 , 10 - 1 1 , and 10 - 1 2 . approximation to A. This is due to the fact that mosarqp2 is the smallest of the test problems and converges before restarting actually takes place. The restarted GMRES method does not suffer from stagnation for the three examples shown. However, this is not always the case, as we will see in subse-quent numerical experiments to come. Restarted GMRES is used when memory restrictions arise. But, the memory savings come at another cost. At the kth iteration of full GMRES, the solution estimate is computed over a subspace of dimension k. At the kth iteration of GMRES(restart), the solution estimate is computed over a subspace of dimension restart (restart < k). Theoretically, the larger our restart value, the fewer number of iterations needed for convergence. However, this is not always the case. Embree [13] presented 2 examples where GMRES(l) easily converged while GMRES(2) stagnated. We choose restart = 15. For the test problems mosarqp2 and stokes2d(0,60), GMRES(15) is just as stable (i.e., does not suffer from stagnation issues) than full GMRES if not more so. MINRES converges even when the inner tolerance is as large as 10~2. This is consistent with observations made by Notay [26] and Golub and Overton [15] for the conjugate gradient method, another Krylov subspace method whose basis vectors are formed implicitly. Upon observing Table 7.5, it appears that MINRES and GMRES require almost the same amount of work to converge. Theoretically, the two methods should converge identically since solving a symmetric matrix with an Arnoldi based method such as GMRES should reduce to a Lanczos-type solution. We Chapter 7. Numerical Experiments 62 attribute the small difference to the rounding errors mentioned at the end of Chapter 5. 7.5 Setting the Inner Tolerance To experimentally find the optimal fixed inner tolerance, we solve each of our test problems various times, each time with a different value for the inner tolerance in the hope of finding the optimal fixed inner tolerance. The inner tolerances are chosen from the range 10 _ l where i € (0,15]. Using an inner tolerance that is small may leads to stagnation in the inner method since the inner tolerance is too close to machine precision as seen by the data points accentuated by squares for previous figures. Using a tolerance that is too large can lead to stagnation in the outer method making it impossible for convergence to occur. There is some variation in the optimal fixed inner tolerances in Figures 7.7-7.11. For our test problems, the following fixed inner tolerances have worked well: GMRES -» l O " 1 0 GMRES(15) -» l O - 1 0 MINRES -> H T 2 As described in Chapter 6, we have implemented a clamped varying inner tolerance. In order to illustrate the rate at which the inner tolerance will vary as GMRES or GMRES(15) converges, we plot the inner tolerance and the residual of the outer method at outer iteration fc, for two examples. In the previous section, we observed 15 out of 24 cases where full GMRES or restarted GMRES(15) stagnated and was unable to converge when the inner tolerance was set to be larger than the outer tolerance. Table 7.5 shows the total number of S-solve iterations needed to solve those very problems when using the optimal inner tolerance versus using the varying inner tolerance proposed by Simoncini and Szyld [31]. In every case of Table 7.5, allowing the inner tolerance to vary requires the Chapter 7. Numerical Experiments 63 Test Iterative p/c G # S-solve itns # 5-solve itns Problem Solver fixed inner tol varying inner tol mosarqp2 GMRES Mc eye 19464 17169 mosarqp2 GMRES Mc diag 5615 5490 mosarqp2 GMRES Mc chol 1611 1611 mosarqp2 GMRES MP eye 140198 117742 mosarqp2 GMRES Mp diag 18453 17251 mosarqp2 GMRES Mp chol 3209 3209 mosarqp2 GMRES(15) Mc diag 5635 5185 mosarqp2 GMRES(15) Mc chol 1611 1611 mosarqp2 GMRES(15) Mp chol 3209 3119 stokes2d(0,60) GMRES Mc eye 69804 51255 stokes2d(0,60) GMRES Mc diag 66899 47984 stokes2d(0,60) GMRES Mc chol 6682 5085 stokes2d(0,60) GMRES Mp eye - 1791668 stokes2d(0,60) GMRES Mp diag - 846833 stokes2d(0,60) GMRES Mp chol - 29260 Table 7.6: Comparison of number of 5-solve iterations required for convergence when using a fixed inner tolerance of 10~10 and the varying inner tolerance described in Chapter 6. Chapter 7. Numerical Experiments 64 1e-5 1e-10 1e-5 1 e - 1 0 1 e - 5 1e-10 Inner t o l e rance Figure 7.7: Total number of S-solve iterations needed for convergence at various inner tolerances when using GMRES with Mc. In the first column, G is the identity. In the second column, G is the diagonal of A. In the third column, G is the incomplete Cholesky factorization of A. same, or fewer S-solve iterations, than solving with the optimal inner tolerance of 10~10. Missing data indicates that the iterative solver stagnated and was unable to converge even when the inner tolerance equaled the outer tolerance. When the incomplete Cholesky factorization of A is used for G, convergence occurs in few outer iterations and the inner tolerance does not have a chance to be increased, causing the values of the two rightmost columns to be the same. For mosarqp2 the cost savings range from 0% to 16.03%. For stokes2d(0,60), the cost savings range from 23.9% to 28.4%. Loosely speaking, the larger and the more poorly conditioned the problem is, the greater the cost savings in using Chapter 7. Numerical Experiments 65 mosarqp2 1e-5 1e-10 1e-5 1e-10 Inner tolerance 1e-5 1e-10 Figure 7.8: Total number of S-solve iterations needed for convergence at various inner tolerances when using GMRES with M.P. In the first column, G is the identity. In the second column, G is the diagonal of A. In the third column, G is the incomplete Cholesky factorization of A. the varying inner tolerance as opposed to a fixed inner tolerance. Now that we have seen that allowing the inner tolerance to vary is a better alternative than to simply setting the inner tolerance equal to the outer tol-erance when using GMRES or GMRES(15), we should see how the total cost measures up when comparing the total number of 5-solve iterations performed when solving the same problem with MINRES and it's optimal inner tolerance of 10~2. Of course, these comparisons can only be done for the cases where the positive-definite preconditioner MP is used. Let us revisit the seven cases that appear in table 7.5. Chapter 7. Numerical Experiments 66 mosarqp2 x10 1e-5 1e-10 1e-5 1e-10 1e-5 1e-10 Inner tolerance Figure 7.9: Total number of S-solve iterations needed for convergence at various inner tolerances when using GMRES(15) with M C - In the first column, G is the identity. In the second column, G is the diagonal of A. In the third column, G is the incomplete Cholesky factorization of A . Case 1: mosarqp2, G is the identity matrix Full GMRES with varying inner tol requires 117742 5-solve itns MINRES requires 39118 5-solve itns Case 2: mosarqp2, G is the diagonal of A Full GMRES with varying inner tol requires 17251 5-solve itns MINRES requires 13330 5-solve itns ) Chapter 7. Numerical Experiments 67 1e-5 1e-10 1e-5 1e-10 1e-5 1e-10 Inner t o l e rance Figure 7.10: Total number of S-solve iterations needed for convergence at various<. inner tolerances when using GMRES(15) with A4p. In the first column, G is the identity. In the second column, G is the diagonal of A. In the third column, G is the incomplete Cholesky factorization of A. Cases 3 and 4: mosarqp2, G is the incomplete Cholesky factorization of A Full GMRES with varying inner tol requires 3209 S-solve itns GMRES(15) with varying inner tol requires 3119 S-solve itns MINRES requires 3219 S-solve itns Case 5: stokes2d(0,60), G is the identity matrix Full GMRES with varying inner tol requires 1791668 S-solve itns MINRES requires 140375 S-solve itns Chapter 7. Numerical Experiments 68 1e-5 1e-10 1e-5 1e-10 1e-5 1e-10 stokes2d(0,60) 1e-5 1e-10 1 e - 5 1 e - 1 0 1e-5 1e-10 Inner tolerance Figure 7.11: Total number of S-solve iterations needed for convergence at various inner tolerances when using MINRES (obviously with A4P). In the first column, G is the identity. In the second column, G is the diagonal of A. In the third column, G is the incomplete Cholesky factorization of A. Case 6: stokes2d(0,60), G is the diagonal of A Full GMRES with varying inner tol requires 846833 5-solve itns MINRES requires 107262 5-solve itns Case 7: stokes2d(0,60), G is the incomplete Cholesky factorization of A Full GMRES with varying inner tol requires 29260 5-solve itns ^ MINRES requires 7460 5-solve itns Chapter 7. Numerical Experiments 69 Figure 7.12: X-shaped graphs demonstrating rates at which inner tolerance increases and relative residual decreases as GMRES converges. 100 150 Iteration number (a) mosarqp2 solved with G M R E S and a varying inner toler-ance. Preconditioning with M.P where G is the identity. (b) stokes2d(0,60) solved with G M R E S and a varying inner tolerance. Preconditioning with MC where G is the diagonal of A. Chapter 7. Numerical Experiments 70 Table 7.7: Number of S-solve iterations needed for stokes2d(0,60) to converge given that G is the identity. Iterative Solver Inner Tolerance Preconditioner # S-Solve Itns GMRES fixed Mc 6980.4 GMRES varying Mc 52376 GMRES(15) fixed Mc 131183 GMRES(15) varying Mc 100192 GMRES fixed Mv -GMRES varying Mp 264385 GMRES(15) fixed • Mp 2507016 GMRES(15) varying Mp 1791668 MINRES fixed Mp 140375 Cases 3 and 4 were added for completeness. However, they are not very interesting as convergence occurs almost immediately. The benefits of a large inner tolerance can not be seen here. However, cases 1 and 2 strongly suggest that the optimal inner tolerance'of 10~2 allowed by MINRES leads to significant cost savings when preconditioning (1.1) with Mp. 7.6 The Bottom Line After all has been said and done, all we really care about is preconditioning as cheaply.as possible. Since the cost of an S-solve iteration differs greatly from one value of G to another, we list the cost of preconditioning both our test problems for each of the three values of G for both preconditioners, all three iterative Krylov subspace solvers, and fixed and varying inner tolerances where applicable. All values for solves with GMRES or GMRES(15) and a fixed inner tolerance are based on fixed inner tolerance of 10~10. All values for solves with MINRES are based on fixed inner tolerance of 10~2. Chapter 7. Numerical Experiments 71 Iterative Solver Inner Tolerance Preconditioner # 5-Solve Itns GMRES fixed Mc . . 66899 GMRES varying Mc 47984 . GMRES(15) fixed Mc 165511 GMRES(15) . varying Mc • 119244 GMRES fixed M v -GMRES varying Mp 190073 GMRES(15) fixed Mp 902732 GMRES(15) varying Mp 846833 MINRES fixed' Mp 107262 Table 7.8: Number of 5-solve iterations needed for stokes2d(0,60) to converge given that G is the diagonal of A. Iterative Solver Inner Tolerance Preconditioner # 5-Solve Itns GMRES fixed Mc 6682 GMRES varying Mc 5085 GMRES(15) fixed Mc 9971 ' GMRES(15) varying Mc 7349 GMRES fixed Mp -GMRES varying Mp ' 14260 GMRES(15) fixed. Mp 43298 GMRES(15) varying Mp 29260 MINRES fixed Mp 7460 Table 7.9: Number of 5-solve iterations needed for stokes2d(0,60) to converge given that G is the incomplete Cholesky factorization of A. Chapter 7. Numerical Experiments 72 Iterative Solver Inner Tolerance Preconditioner # 5-Solve Itns GMRES fixed Mc 19464 GMRES varying Mc .17169 GMRES(15) fixed Mc 32980 GMRES(15) varying Mc 28287 GMRES fixed Mv 140198 GMRES varying Mp 117742 GMRES(15) fixed . Mp , 328457 GMRES(15) varying Mp 282139 MINRES fixed Mp 39118 Table 7.10: Number of 5-solve iterations needed for mosarqp2 to converge given that G is the identity. Iterative Solver Inner Tolerance Preconditioner # 5-Solve Itns GMRES fixed Mc 5615 GMRES varying Mc 5490 GMRES(15) fixed Mc 5635 GMRES (15) varying Mc 5185 GMRES fixed Mp 18453 GMRES varying Mp 17251 GMRES(15) fixed Mp 28861 GMRES(15) varying Mp 24929 MINRES fixed Mp 13330 Table 7.11: Number of 5-solve iterations needed for mosarqp2 to converge given that G is the diagonal of A Chapter 7. Numerical Experiments 73 Iterative Solver Inner Tolerance Preconditioner # S-Solve Itns GMRES fixed Mc 1611 GMRES varying Mc 1611 GMRES(15) fixed Mc 1611 GMRES(15) varying Mc 1611 GMRES fixed Mp . 3209 GMRES varying Mp 3209 GMRES(15) fixed Mv 3209 GMRES (15) varying Mv 3209 MINRES fixed Mp 3219 Table 7.12: Number of S-solve iterations needed for mosarqp2 to converge given that G is the incomplete Cholesky factorization of A. For the test problems we examined, preconditioning with Mc is more cost effective than preconditioning with Mp, even if A4P is used with a very large fixed inner tolerance and MINRES. The iterative solver that appears to require the fewest number of S-solve iterations is GMRES. However, GMRES requires a lot of memory, which explains the need for restarted GMRES. Also, it does not take advantage of the symmetry of the saddle-point linear system as does MINRES. 74 Chapter 8 Conclusions and Future Work The contributions of this thesis commence in Chapter 4 where we show three analytic results that have to do with eigenvalues and our block preconditioners of interest. The first of these results relate m eigenvalues of A4C to those of Mp when G is equal to the identity. Hence, if the eigenvalues of one of the block preconditioners are known, m eigenvalues of the second preconditioner are also known. The second result relates the eigenvalues of the constraint preconditioner to singular values of B when G is the identity. The final result of Chapter 4 is based oh work done by Rusten and Winther [29]. However, instead of just finding eigenvalue bounds for Ti, we are able to and P G M.nxn and Q G M m x m are both positive definite. The bounds are based on the extremal eigenvalues of P~i AP~i and the extremal singular values of The concepts of G-solves and 5-solves are vital. Using these concept, we are able to quantify the cost of solving M.cu — f and Mpu — f in such a way that we take advantage of the block structure of the preconditioners. Within the Krylov subspace solvers MINRES and GMRES, we define solves of the type (5.2) and (5.9) to be our inner iteration. Hence, by multiplying find eigenvalue bounds for M 1Ti, where Chapter 8. Conclusions and Future Work 75 the total number of outer iterations by our cost function for one such solve, we derive a cost function for the total cost involved in preconditioning a saddle-point linear system with the constraint or positive-definite preconditioner. In order to minimize the above cost functions, we try setting the inner toler-ance in two different ways. When we allow the inner tolerance to be fixed over-all outer iterations, we can use a simpler cost function involving 7, the average number of inner iterations needed to solve an S-solve. For our test problems, MINRES responded well to solving the inner iteration with a large inner tol-erance. GMRES, on the other hand, did not. When the inner tolerance was set to be larger than the outer tolerance, GMRES stagnated and was unable to converge. This phenomenon also occurred for GMRES(15), albeit less often. This stagnation prompted the implementation of a varying inner tolerance. Simoncini and Szyld [31], as well as van den Eshof, Sleijpen, and van Gijzen [35], developed an inner tolerance bound dependent on the previous outer iteration number and residual. Although their definition of an inner iteration slightly differed from ours, we were able to extend their theory to our inner iteration. Neither GMRES or GMRES(15) stagnated with solving (1.2) with a varying inner tolerance when solving stokes2d(0,60) and mosarqp2. On the contrary, it was shown in tables 7.7 - 7.11 that GMRES with a varying inner tolerance and the preconditioner Mc was by far the cheapest iterative solver in terms of preconditioning. For our test problems, if Mp is chosen as the preconditioner, solving with MINRES and a large inner tolerance is the ideal way of solving (1.1). However, if the choice of preconditioner is fixed, solving with GMRES, Mc, and a varying inner tolerance reduces the amount of work needed for preconditioning to a minimum. This however, does not take into consideration memory constraints. 8.1 Future Research The choice of G greatly affects the total number of S-solve iterations. The better an approximation to A, the fewer iterations needed. However, each iteration is Chapter 8. Conclusions and Future Work 76 more expensive. Although we only explored three options for G, these are not the complete list of available choices. It might be worthwhile to run some tests with alternate G values. For restarted GMRES, our choice of restart = 15 was rather arbitrary. It might also be worthwhile to re-run the same test problems with a different value for restart. As Embree [13] showed, we might see some very different convergence behaviour. Numerically we observed that GMRES (and sometimes GMRES(15)) stag-nates when solving saddle point linear systems with a fixed inner tolerance that is greater than the outer tolerance while MINRES does not. The theory devel-oped by Simoncini and Szyld [31] is based on an Arnoldi basis that is explicitly formed. The (Lanczos) basis vectors are implicitly formed for MINRES. We suspect this is one of the reasons behind the different convergence behaviours of GMRES and MINRES. A further investigation into this matter would be informative. Preconditioning the 5-solve greatly reduces the total number of 5-solve iter-ations. However, the additional cost of this preconditioning has not been worked out and therefore has not been added to the cost functions of chapter 5. An-other layer needs to be added to the cost function for a fair comparison between preconditioned 5-solves and non preconditioned 5-solves. Although this thesis focuses solely on Mc and A4P, it should be mentioned that these are not the only two block preconditioners available for solving saddle-point linear systems, ourgmres and ourminres could be extended to accommo-date other block preconditioners as well (only positive-definite block precondi-tioners for ourminres). Of course, new cost functions would have to be derived for the total cost of preconditioning. 77 Bibliography [1] K. Arrow, L. Hurwicz, and H. Uzawa. Studies in Nonlinear Programming. Stanford University Press, Stanford, CA, 1958. [2] A. Batterman and M. Heinkenschloss. Preconditioners for Karush-Kuhn-Tucker systems arising in the optimal control of distributed systems. Op-timal Control of Partial Differential Equations, pages 15-32, 1998. [3] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1-137, 2005. [4] I. Bongartz, A. R. Conn, Nick Gould, and P. L. Toint. CUTE: Con-strained and unconstrained testing environment. ACM Trans. Math. Softw., 21(1):123-160, 1995. [5] A. Bouras and V. Fraysse. Inexact matrix-vector products in Krylov meth-ods for solving linear systems: a relaxation strategy. SIAM J. Matrix Anal. Appl., 26:660-678, 2005. [6] R. Bridson. stokes2d(k,n). Private communication, 2005. [7] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite systems of linear equations. SIAM J. Numer. Anal, 8:639-655, 1971. [8] H. S. Dollar and A. J. Wathen. Approximate factorization constraint pre-conditioners for saddle-point matrices. SIAM J. Sci. Comput., 2004. To appear. Bibliography 78 [9] H. S. Dollar and A. J. Wathen. Incomplete factorization constraint precon-ditioners for saddle-point matrices. Technical Report NA-04/01, Oxford University, Numerical Analysis Group, 2004. [10] N. Dyn and W. E. Ferguson. The numerical solution of equality-constrained quadratic programming problems. Math. Comp., 41:165-170, 1983. [11] H. C. Elman and G. H. Golub. Inexact and preconditioned uzawa algo-rithms for saddle point problems. SIAM J. Numer. Anal, 31:1645-1661, 1994. [12] H.C. Elman, D.J. Silvester, and A.J. Wathen. Finite Elements and Fast Iterative Solvers. Oxford University Press, 2005. [13] M. Embree. The tortoise and the hare restart GMRES. SIAM Review, 45:259-266, 2003. [14] R. Fletcher and T. Johnson. On the stability of null-space methods for K K T systems. Technical Report NA/167, University of Dundee, Department of Mathematics and Computer Science, 1995. [15] G. H. Golub and M. L. Overton. The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems. Numer. Math., 53:571-593, 1988. [16] A. Greenbaum. Iterative methods for solving linear systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997,. [17] M. R. Hestenes and R. Stiefel. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49:409-436, 1952. [18] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, New York, NY, USA, 1986. [19] C. Keller, N. I. M. Gould, and A. J. Wathen. Constraint preconditioning for indefinite linear systems. SIAM J. Matrix Anal. Appl, 21:1300-1317, 2000. Bibliography 79 [20] Y. A. Kuznetsov. Efficient iterative solvers for elliptic finite element prob-lems on nonmatching grids. Russian J. Numer. Math. Modelling, 10:187-211, 1995. [21] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Research Nat. Bur. Standards, 49:33-53, 1952. [22] L. Luksan and J. Vlcek. Indefinitely preconditioned inexact newton method for large sparse equality constrained non-linear programming problems. Nu-mer. Linear Algebra Appl., 5:219-247, 1998. [23] J. L. Morales-Perez and R. W. H. Sargent. On the implementation and performance of an interior point method for large sparse convex quadratic programming. Optim. Methods Sofw., 1:153-168, 1992. [24] M. F. Murphy, G. H. Golub, and A. J. Wathen. A note on preconditioning for indefinite linear systems. SIAM J. Sci. Comput, 21:1969-1972, 2000. [25] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999. [26] Y. Notay. Flexible conjugate gradients. SIAM J. Sci. Comput, 22:1444-1460, 2000. [27] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617-629, 1975. [28] J. K. Reid. On the method of conjugate gradients for the solution of large sparse linear equations. In J. K. Reid, editor, Large Sparse Sets of Linear Equations, pages 231-254. Academic Press, New York, NY, USA, 1971. [29] T. Rusten and R. Winther. A preconditioned iterative method for saddle-point problems. SIAM J. Matrix Anal. Appl., 13:887-904, 1992. [30] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algo-rithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856-869, 1986. Bibliography 80 [31] V. Simoncini and D. B. Szyld. Flexible inner-outer Krylov subspace meth-ods. SIAM J. Numer. Anal, 40:2219-2239,2003. [32] V. Simoncini and D. B. Szyld. Theory of inexact Krylov subspace methods and applications to scientific computing. SIAM J. Sci. Comput., 25:454-477, 2003. [33] V. Simoncini and D. B. Szyld. On the occurrence of superlinear convergence of exact and inexact Krylov subspace methods. SIAM Review, 47:247-272, 2005. [34] G. L. G. Sleijpen, H. A. van der Vorst, and J. Modersitzki. Differences in the effects of rounding error in Krylov solvers for symmetric indefinite linear systems. SIAM J. Matrix Anal. Appl., 22:726-751, 2000. [35] J. van den Eshof, G. L. G. Sleijpen, and M. B. van Gijzen. Relaxation strategies for nested Krylov methods. Journal of Computational and Ap-plied Mathematics, 177:347-365, 2005. [36] Henk A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Cambridge University Press, 2003. [37] A. J. Wathen and D. J. Silvester. Fast iterative solution of stabilised Stokes systems Part I: Using simple diagonal preconditioners. SIAM J. Numer. Anal., 30:630-649, 1993. 81 Appendix A The Matlab Program All numerical experiments were performed with an altered version of M A T L A B ' S built-in G M R E S or M I N R E S functions which we cleverly call O U R G M R E S and O U R M I N R E S , respectively. Both new functions take advantage of the structure of the block preconditioner and uses the factorizations discussed in Chapter 5 to reduce the amount of work for the linear solves of the form Aiu = f. A . l O U R G M R E S In order to make OURGMRES as robust as possible we have altered the input and output arguments in the following manner: [ u . f l a g , r e l r e s , i t e r , r e s v e c , c o s t p = o u r g m r e s ( H , b , r e s t a r t , t o l maxit |,| GI |,| G2 |,u0,| p c f l a g , Sf l a g , pcSf l a g , t o l f l a g ) New or altered parameters are highlighted by a box. Input Parameters: H . K K T matrix of dimension (n + m) x (n + m). b Right-hand side of unpreconditioned K K T linear system, b is of dimension (n + m) x 1 r e s t a r t Restarts the method every restart iterations. If restart is n or [ ], then OURGMRES uses the full version of the method. Appendix A. The Matlab Program 82 to l Specifies the tolerances. If to l is [ ], a default value of le-6 is placed in to l ( l ) . tol(2) is the tolerance used if the S solves are solved inexactly. If tol(2) is [ ], to l ( l ) is the default value. to l (3) is the tolerance used if the G solves are solved inexactly. If tol(3) is [ ], tol(2) is used as a default. Note: If tolf lag = 1, the value of to l (2) is obsolete, maxit Specifies the maximum number of iterations. maxit(2) is the maximum number of iterations for solving the S solves if they . are done so inexactly. If maxit (2) is [ ], maxit (1) is the default value, maxit (3) is the maximum number of iterations for solving the G solves if they are done so inexactly. Gl If G2 is [ ], then G = Gl is (1,1) block of preconditioner. Positive definite and of dimensions n x n . If G2 is [ ], then a preconditioner is not applied. G2 If G2 is nonempty, then G = G1*G2 is (1,1) block of preconditioner. G must be positive definite. G2 is of dimensions n x n uO Specifies the first initial guess. If uO is [.], OURGMRES uses the default, an all zero vector of dimension (n + ra) x 1. pcf lag Determines the preconditioner: 0 —> Mc (constraint preconditioner). 1 —> Mp (positive-definite preconditioner). Sf lag Determines the method of solving S solves: 0 —•> Uses. MATLAB's built-in backslash operator. 1 —» Uses the conjugate gradient method. Note: If Sflag — 1, value'of tolf lag is obsolete. pcSf lag Determines whether we will precondition our S solves or not (as-suming we are solving S solves inexactly): 0 —+ No preconditioner! 1 —> Precondition with BTB. 2 -> Precondition with BT diag(G) - 1 B. Appendix A. The Matlab Program 83 tolf lag Determines whether the inner tolerance is fixed or varying (as-suming Sf lag =1). 0 —-> Inner tolerance is fixed (user specified). 1 —> Inner tolerance varies according to Simoncini and Szyld anal-ysis. Output Parameters: u Solution of linear system Hu = b. Vector of dimension (n+m) x 1. flag Convergence flag: 0 —> OURGMRES converged to desired tol within maxit itera-tions. 1 —> OURGMRES iterated maxit times but did not converge. 2 —> Preconditioner was ill-conditioned. 3 OURGMRES stagnated, relres Relative residual. i ter 1x2 vector whose (1,1) entry is the number of outer OURGMRES iterations and whose (1,2) entry is the number of inner OURGM-RES inner iterations necessary to compute u. 0 < i ter( l ) < maxit and 0 < iter(2) < restart. If we are using the unrestarted OURGMRES method, iter (1) = 1. Note: the terms inner and outer iterations do not have the same meaning in this context as they do everywhere else in this thesis, resvec Vector of residual norms at each iteration. cost Structure that keeps track of the number of solves (G and S) and the number of matrix-vector products. Appendix A. The Matlab Program 84 A . 2 O U R M I N R E S In.order to make OURMINRES as robust as possible we have altered the input and output arguments in the following manner: [u,f lag.relres, i ter ,resvec,resveccg,| cost |] = ourminres(H,b, . . . to l |,| maxit |,|G1 |,|G2 |,u0,| Sflag , pcSf lag ) New or altered parameters are highlighted by a box. Input Parameters: H K K T matrix of dimension (n + m) x (n + m). b Right-hand side of unpreconditioned K K T linear system, b is of dimension (n + m) x 1 to l ' Specifies the tolerances. If to l is [ ], a default value of le-6 is placed in to l ( l ) . tol(2) is the tolerance used if the S solves are solved inexactly. If tol(2) is [ ], to l ( l ) is the default value, to l (3) is the tolerance used if the G solves are solved inexactly. If tol(3) is [ ], tol(2) is used as a default. maxit Specifies the maximum number of iterations, maxit(2) is the maximum number of iterations for solving the S solves if they are done so inexactly. If maxit (2) is [ ], maxit (1) is the default value, maxit (3) is the maximum number of iterations for solving the G solves if they are done so inexactly. ' . Gl If G2 is ["], then G = Gl is (1,1) block of preconditioner. Positive definite and of dimensions n x n . If G2 is [ ], then, a preconditioner is not applied. G2 If G2 is nonempty, then G = G1*G2 is (1,1) block of preconditioner. G must be positive definite. G2 is of dimensions n x n uO Specifies the first initial guess. If uO is [ ], OURMINRES uses the default, an all zero vector of dimension (n + m) x 1. Appendix A. The Matlab Program 85 Sflag Determines the method of solving 5-solves: 0 —• Uses MATLAB's built-in backslash operator. 1 —• Uses the conjugate gradient method. Note: If Sflag = 1, value of tolf lag is obsolete. pcSf lag Determines whether we will precondition our 5-solves or not (as-suming we are solving 5-solves inexactly): 0 —» No preconditioner. 1 —» Precondition with B1 B. 2 -> Precondition with BT diag(G)-1 B. Output Parameters: b. Vector of dimension (n + m) x 1. u Solution of linear system Hu flag Convergence Hag: 0 —> OURMINRES converged to desired tolerance within maxit iterations. 1 —> OURMINRES iterated maxit times but did not converge. 2 —> Preconditioner was ill-conditioned. 3 -> OURMINRES stagnated. 4 -* One of the scalar quantities calculated during OURMINRES became too small or too large to continue computing. 5 —+ Preconditioner is not symmetric positive definite, relres Relative residual. iter Iteration number at which </. was computed. 0 < i ter < maxit resvec Vector of residual norms at each iteration. resveccg Vector of estimates of the conjugate gradient residual norms at each iteration. cost Structure that keeps track of the number of solves (G and 5) and the number of matrix-vector products.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A block preconditioning cost analysis for solving saddle-point...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A block preconditioning cost analysis for solving saddle-point linear systems MacKinnon-Cormier, Sarah Claire 2005
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | A block preconditioning cost analysis for solving saddle-point linear systems |
Creator |
MacKinnon-Cormier, Sarah Claire |
Publisher | University of British Columbia |
Date Issued | 2005 |
Description | We investigate the cost of preconditioning when solving large sparse saddlepoint linear systems with Krylov subspace methods. To use the block structure of the original matrix, we apply one of two block preconditioners. Algebraic eigenvalue analysis is given for a particular case of the preconditioners. We also give eigenvalue bounds for the preconditioned matrix when the preconditioner is block diagonal and positive definite. We treat linear solves involving the preconditioner as a subproblem which we solve iteratively. In order to minimize cost, we implement a fixed inner tolerance and a varying inner tolerance based on bounds developed by Simoncini and Szyld (2003) and van den Eshof, Sleijpen, and van Gijzen (2000). Numerical experiments compare the cost of preconditioning for various iterative solvers and block preconditioners. We also experiment with different tolerances for the iterative solution of linear solves involving the preconditioner. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2009-12-12 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051203 |
URI | http://hdl.handle.net/2429/16638 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2005-0542.pdf [ 2.91MB ]
- Metadata
- JSON: 831-1.0051203.json
- JSON-LD: 831-1.0051203-ld.json
- RDF/XML (Pretty): 831-1.0051203-rdf.xml
- RDF/JSON: 831-1.0051203-rdf.json
- Turtle: 831-1.0051203-turtle.txt
- N-Triples: 831-1.0051203-rdf-ntriples.txt
- Original Record: 831-1.0051203-source.json
- Full Text
- 831-1.0051203-fulltext.txt
- Citation
- 831-1.0051203.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051203/manifest