Numerical Solution of Skew-Symmetric Linear Systems by Tracy Lau B.Sc., The University of British Columbia, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) December 2009 c Tracy Lau 2009 ° Abstract We are concerned with iterative solvers for large and sparse skew-symmetric linear systems. First we discuss algorithms for computing incomplete factorizations as a source of preconditioners. This leads to a new Crout variant of Gaussian elimination for skew-symmetric matrices. Details on how to implement the algorithms efficiently are provided. A few numerical results are presented for these preconditioners. We also examine a specialized preconditioned minimum residual solver. An explicit derivation is given, detailing the effects of skew-symmetry on the algorithm. ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . vi 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Abstract List of Tables 2 Factorizations of skew-symmetric matrices . . . 2.1 Gaussian elimination variants . . . . . . . . . . . 2.2 Bunch’s LDLT decomposition . . . . . . . . . . 2.3 Incomplete factorizations . . . . . . . . . . . . . 2.4 Crout factorization for skew-symmetric matrices 2.5 Numerical results . . . . . . . . . . . . . . . . . . . . . . . 3 3 4 7 8 14 skew-symmetric systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 17 19 21 24 4 Conclusions and future work . . . . . . . . . . . . . . . . . . . 28 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3 Minimum residual iterations for 3.1 Skew-Lanczos . . . . . . . . . 3.2 QR factorization of Tk+1,k . . 3.3 Skew-MINRES . . . . . . . . . 3.4 Preconditioned skew-MINRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables 2.1 Convergence results of preconditioned GMRES for test problem. 15 iv List of Figures 2.1 Residual vector norms for test problem. . . . . . . . . . . . . 16 v Acknowledgements I am deeply indebted to my supervisor, Chen Greif, for tirelessly guiding and encouraging me on the journey that led to this thesis, sharing his knowledge and enthusiasm, and for supporting me in all my endeavours. Jim Varah has also assisted in my work and kindly volunteered to be a reader of this thesis, providing many helpful comments. Scientific Computing Lab members Dan Li, Ewout van den Berg, Hui Huang, and Shidong Shan have been of much help and shown me great kindness throughout my graduate studies. I am especially grateful to Ewout for consistently going out of his way to offer his expertise and excellent advice from the very first time I stepped into the lab for my undergraduate research. I thank all my labmates for fielding my numerous questions, sharing their wisdom and experience, and for their good company and humour. vi Chapter 1 Introduction A real skew-symmetric matrix S is one that satisfies S = −S T . Such matrices are found in many applications, perhaps not explicitly, but more likely as an implicit part of the problem. Any general square matrix can be split into a symmetric part and a skew-symmetric part, A= A + AT A − AT + , 2 2 and any non-symmetric linear system has a nonzero skew component. In some problems, the skew component may dominate the system, for example in convection-dominated convection-diffusion problems. Here are a few basic properties of a skew-symmetric matrix A: • Its diagonal is zero since aij = −aji → aii = 0 ∀i. • For any x, (xT Ax)T = −xT Ax = 0. • If the dimension n of A is odd, then A is singular. • A−1 is also skew-symmetric if it exists. • All eigenvalues of A are either 0 or occur in pure imaginary complex conjugate pairs. Skew-symmetric matrices can be factorized as P AP T = LDLT , where P is a permutation matrix, L is block lower triangular, and D is block diagonal with 1×1 and 2×2 blocks. This is similar to the LDLT factorization for symmetric indefinite matrices [6, §11.1]. In this thesis, we are concerned with solving large and sparse skewsymmetric systems using iterative solvers. Thus preconditioners are the focus of the first section. With the goal of using incomplete factorizations as preconditioners, we begin in Chapter 2 by discussing general factorizations of skew-symmetric matrices. The Bunch decomposition [1] is stable with appropriate monitoring of element growth, and it can be adapted to produce incomplete factorizations. By using skew-symmetry, it halves work 1 Chapter 1. Introduction and storage when compared to general LU factorizations. For an efficient algorithm for sparse matrices, however, we turn to the Crout variants of Gaussian elimination, and see how exactly to incorporate pivoting such that skew-symmetry is preserved. A few numerical experiments illustrate the performance of preconditioners generated by this procedure. In Chapter 3, we discuss the topic of iterative solvers themselves. The various properties of skew-symmetric matrices can lead to specific variants of existing solvers. We focus on a minimum residual algorithm starting with a specialized Lanczos procedure and examining how matrix structure affects MINRES [10]. Finally, we examine the preconditioned skew-MINRES algorithm. Thesis contributions In the section on factorizations, we derive a new algorithm to compute incomplete factorizations of skew-symmetric matrices, very much in the spirit of [8, Algorithm 3.3] for symmetric matrices. Our algorithm is based on an incomplete Crout variant of Gaussian elimination and incorporates the partial pivoting strategy of Bunch [1]. Implementation details are given for the two main factorization algorithms we discuss. In particular we consider the issues involved with working only on half of the matrix. Also stemming from our work in Matlab, prototype code was developed and numerical experiments were run to test our preconditioners. Our work on MINRES unpacks the details of the derivation of the method given in [5]. We present explicit algorithms for both the unpreconditioned and preconditioned schemes. 2 Chapter 2 Factorizations of skew-symmetric matrices We are interested in using iterative solvers on skew-symmetric systems, and so it is natural to explore the options for preconditioning. In particular, we look at incomplete LDLT factorizations based on the well-known ILU(0) and ILUT factorizations [11]. We start by briefly looking at two basic variants of Gaussian elimination since the order in which the elimination is executed affects the type of dropping scheme that can be implemented. The Bunch LDLT decomposition is then discussed along with some considerations for working with skewsymmetric matrices. We then review incomplete factorizations and mention how they can be generated with the Bunch decomposition. However, Crout factorizations may be better suited to generating incomplete factorizations for sparse matrices, so we derive a skew-symmetric version. Finally, a few numerical experiments examine the performance of the preconditioners that are generated by this new variant. 2.1 Gaussian elimination variants Gaussian elimination for a general matrix A is typically presented in its KIJ form (Algorithm 2.1), so-named due to the order of the loop indices. At Algorithm 2.1 KIJ Gaussian elimination for general matrices 1: for k = 1 to n − 1 do 2: for i = k + 1 to n do 3: aik = aik /akk 4: for j = k + 1 to n do 5: aij = aij − aik akj 6: end for 7: end for 8: end for 3 2.2. Bunch’s LDLT decomposition step k, every row below row k is modified as Schur complement updates for the Ak+1:n,k+1:n submatrix are computed. This is fine for a dense matrix, but it is inefficient when working with matrices stored in sparse mode. In this case, the IKJ variant of Gaussian elimination (Algorithm 2.2) is more efficient, modifying only row i in step i. It accesses rows above row i, which have previously been modified, but not those below. This is referred to as a “delayed-update” variant, since all Schur complement updates for a particular element aij are computed within step i [8]. Algorithm 2.2 IKJ Gaussian elimination for general matrices 1: for i = 2 to n do 2: for k = 1 to i − 1 do 3: aik = aik /akk 4: for j = k + 1 to n do 5: aij = aij − aik akj 6: end for 7: end for 8: end for The KIJ and IKJ algorithms produce the same complete factorization of a matrix if it has an LU decomposition [11, Proposition 10.3]. Both of these variants can be adapted in various ways to decomposing skew-symmetric matrices. Beyond modifying them to compute the block LDLT factorization, the idea is to use skew-symmetry wherever possible to make gains over the plain Gaussian elimination process. 2.2 Bunch’s LDLT decomposition A method for decomposing a skew-symmetric matrix A into LDLT is detailed in [1]. It takes advantage of the structure of A to halve both work and storage, and stability can be increased with pivoting. It generates the factorization P AP T = LDLT where P is a permutation matrix, L is block unit lower triangular containing the multipliers, and D is block diagonal with 1×1 and 2×2 ¤ Because the blocks are also skew, they are either £ blocks. 0 or of the form α0 −α 0 . In the case of the latter, they are nonsingular in this decomposition. We describe here a simplified version of the decomposition, and consider throughout only nonsingular matrices. This implies n is even, and that D will only have 2×2 blocks on its diagonal. When performing operations only on half the matrix, we will use the lower triangular part. 4 2.2. Bunch’s LDLT decomposition The decomposition starts by partitioning A as · ¸ S −C T C B where S is 2×2, C is (n−2)×2, and£B is (n−2)×(n−2). Note that S and ¤ B are also skew-symmetric. If S = a021 −a021 is nonsingular, i.e., a21 6= 0, then the first step in the factorization is · ¸ · ¸· ¸ ¸· S −C T I S 0 0 I −S −1 C T A= = . C B CS −1 I 0 B + CS −1 C T 0 I The Schur complement B +CS −1 C T is again skew-symmetric, so to continue the factorization, repeat this first step on the (n−2)×(n−2) submatrix. Because the submatrix in each step is skew-symmetric, the factorization can be carried out using only the lower triangular part of the matrix. At each step, A can be overwritten with the multipliers in CS −1 as well as the lower triangular parts of S and B + CS −1 C T . If S is singular, i.e. a21 = 0, there must be some i, 2 ≤ i ≤ n, such that ai1 6= 0 since A is nonsingular. Interchange the ith and second row and column to move ai1 into the (2, 1) position. This gives ¸ · S −C T T P1 A = P1 C B where S is now nonsingular and P1 = P1T is obtained by interchanging the ith and second column of the identity matrix. A high-level summary of the decomposition is given in Algorithm 2.3. The factors are obtained as L, which is the collection of CS −1 from each step, and D, the collection of matrices S from each step. Both L and D can be stored in half of A, and pivoting information can be stored in one vector. Algorithm 2.3 Bunch LDLT for skew-symmetric matrices 1: for k = 1, 3, . . . , n − 1 do 2: find pivot apq and permute A 3: set S = Ak:k+1,k:k+1 , C = Ak+2:n,k:k+1 , B = Ak+2:n,k+2:n 4: Ak+2:n,k:k+1 = CS −1 5: Ak+2:n,k+2:n = B + CS −1 C T 6: end for With regard to using only half of A, there is a small implementation issue that relates to the exact amount of storage required in total. When 5 2.2. Bunch’s LDLT decomposition computing the multiplier CS −1 and the Schur complement B + CS −1 C T , it would make sense to avoid computing CS −1 twice. It is then natural to overwrite C with CS −1 , but C T is still needed in the computation of the Schur complement. There are two straightforward ways to resolve this. If memory is not an issue, temporarily store either C or CS −1 , which only requires two vectors in either case, such that both are available for the Schur complement computation that follows. For the other solution, use the fact that C T = ((CS −1 )S)T , hence C T may be recovered with S and CS −1 . Since S is skew-symmetric, this amounts to multiplying two vectors with scalars and swapping their positions; in practice this only involves indexing into the proper elements if the Schur complement is computed element-wise. Pivoting and symmetric permutations The pivoting scheme used above only ensures that the factorization may be completed, but does nothing to control element growth in factors. In [1], Bunch proposes a partial piovting scheme that finds max2≤p≤n, 1≤q≤2 {|apq |}. It then moves that element into the block pivot A1:2,1:2 by interchanging the first and qth row and column, followed by interchanging the pth and second row and column. Of course, interchanging rows and columns should also exploit skewsymmetry and be performed only on the lower triangular part of the matrix. Let us examine a small 6×6 example to see the effects of skew-symmetry on how elements are moved around by symmetric permutations. Example 2.1. Let A be as shown below. The largest element in magnitude in the first two columns is a51 = 12. It is already in the first column, so we need only interchange the fifth and second row and column. Let Ã denote this permuted matrix. 2 6 6 6 A=6 6 6 4 0 1 5 11 12 3 −1 0 2 7 8 4 −5 −2 0 13 15 6 −11 −7 −13 0 10 14 −12 −8 −15 −10 0 9 −3 −4 −6 −14 −9 0 3 7 7 7 7 7 7 5 2 6 6 6 Ã = 6 6 6 4 0 12 5 11 1 3 −12 0 −15 −10 −8 9 −5 15 0 13 −2 6 −11 10 −13 0 −7 14 −1 8 2 7 0 4 −3 −9 −6 −14 −4 0 3 7 7 7 7 7 7 5 We show the entire matrix only for reference; the dark shading indicates the elements that cannot be accessed. The light shading highlights positions in the lower triangular part of the matrix where elements are modified. Starting in the (5, 2) position, ã52 = −a52 . Elements in Ã to the right of 6 2.3. Incomplete factorizations this position take their values from elements above this position in A and vice versa. For example, we set ã53 = −a32 , instead of a23 , which is off limits. That takes care of the elements that change sign. To the left of the second column, rows two and five are swapped giving ã21 = 12 and ã51 = 1. Likewise, below row five, columns two and five are swapped; ã62 = 9 and ã65 = 4 in this example. In general, suppose rows and columns p and q are to be interchanged where p < q. Then • • • • ãqp = −aqp , ãtp = −aqt and ãqt = −atp for t = p + 1 . . . q − 1, ãpt = aqt and ãqt = apt for t = 1 . . . p − 1, and ãtp = atq and ãtq = apt for t = q + 1 . . . n. While the Bunch partial pivoting scheme can limit element growth, it does not guarantee that the elements in L are smaller than one in magnitude. Indeed, even in Example 2.1, |ã32 | = 15 > 12 = |ã21 |. Thus element growth would have to be monitored to guarantee stability, similar to Gaussian elimination with partial pivoting. The rook pivoting strategy chooses a pivot that is maximal in both its original row and column; see [6, §9.1] for its use in general matrices. It is considered as an alternate pivoting scheme for this decomposition in [5]. We only mention here that the adaptation of rook pivoting to using only half of A is straightforward. Where either A:,p or Ap,: was searched in the full matrix for its maximal entry in magnitude, now search instead Ap,1:p−1 and Ap:n,p . 2.3 Incomplete factorizations Incomplete factorizations of the coefficient matrix of a linear system are a rich source of preconditioners [11, §10.3]. There are numerous ways to generate these, but here we focus on adapting two of the basic ones, namely ILU(0) and ILUT. In ILU(0), the sparsity pattern of the factors matches that of the original matrix [11, §10.3.2]. In the skew-symmetric case, since operations are performed on 2 × 2 blocks, the analogous ILDLT (0) produces factors that match the block sparsity pattern of A. This is fairly straightforward to incorporate into the Bunch decomposition. Compute the multiplier in line 4 of Algorithm 2.3 only if the original 2×2 block in A has nonzero norm. Then 7 2.4. Crout factorization for skew-symmetric matrices only compute the update for a block in line 5 if the corresponding multiplier from CS −1 has nonzero norm. For general matrices, ILUT is a more accurate factorization than ILU(0) [11, §10.4]. It uses two dropping rules to determine which elements to replace with zero. The first rule limits the amount of work performed by ignoring elements with a small norm: if the norm of an element is below a threshold relative to the norm of its row, then set that element to zero to skip further computations with it. The second dropping rule limits the amount of memory used to store L: in each row, allow at most a fixed number of nonzero blocks, keeping those that are largest in magnitude and setting the rest to zero. The diagonal element is never dropped. For the skew-symmetric version of ILUT, which we call ILDLT T, we again treat 2×2 blocks rather than individual elements. If ILDLT T were computed using the Bunch decomposition, then column-based dropping should be used instead of the row-based dropping described above. Again, any dropping would be performed after line 4 in Algorithm 2.3. However, the Bunch decomposition is general and not specialized for sparse matrices. When working with such matrices in a compressed storage format, it is not particularly efficient due to the pattern in which matrix entries are accessed. We now turn to Crout factorization, which is able to handle computing ILDLT T efficiently. 2.4 Crout factorization for skew-symmetric matrices It is possible to adapt ILUT, derived from IKJ Gaussian elimination, to skew-symmetric matrices by creating a 2×2 block version that also halves both work and storage. However, preserving symmetry such that these savings are possible is impractical once pivoting is introduced. The Compressed Sparse Row storage scheme [11, §3.4], which is the data structure typically used for sparse matrices, does not easily lend itself to symmetric pivoting in IKJ Gaussian elimination. Since pivoting is often necessary, we follow [8], which discusses sparse symmetric matrices, in using a Crout variant of Gaussian elimination. As a side note, Crout factorization allows for efficient implementation of more robust dropping strategies [9], but we will not discuss them here. The full Crout LU factorization for general matrices from [11, Algorithm 10.8] is shown here in Algorithm 2.4. The Crout form of Gaussian elimination for general matrices allows for both symmetric pivoting and the inclusion of dropping rules to produce 8 2.4. Crout factorization for skew-symmetric matrices Algorithm 2.4 Crout LU factorization for general matrices 1: for k = 1 : n do 2: for i = 1 : k − 1 and if aki 6= 0 do 3: ak,k:n = ak,k:n − aki ai,k:n 4: end for 5: for i = 1 : k − 1 and if aik 6= 0 do 6: ak+1:n,k = ak+1:n,k − aik ak+1:n,i 7: end for 8: for i = k + 1 : n do 9: aik = aik /akk 10: end for 11: end for incomplete factorizations. Additionally, it is easily adapted to produce a block factorization. It is similar to IKJ Gaussian elimination in that it also involves a delayed update. At each step k, the kth row of U and the kth column of L are computed. Note that if A were symmetric, lines 3 and 6 would produce essentially the same numbers. The Ak+1:n,k+1:n submatrix is not accessed. We now derive an incomplete skew-symmetric version of Crout factorization by making modifications to Algorithm 2.4. The new version is summarized in Algorithm 2.5. This algorithm operates on 2×2 blocks and produces an LDLT factorization. As with the Bunch factorization, work and storage can be halved, and so lines 2–4 of Algorithm 2.4 are redundant for skewsymmetric matrices. Algorithm 2.5 Incomplete skew-symmetric Crout factorization, ILDLT C 1: for k = 1, 3, . . . , n − 1 do 2: for i = 1, 3, . . . , k − 2 and kAk:k+1,i:i+1 k = 6 0 do 3: Ak:n,k:k+1 = Ak:n,k:k+1 − Ak:n,i:i+1 Ai:i+1,i:i+1 ATk:k+1,i:i+1 4: end for 5: apply dropping rules to Ak+1:n,k:k+1 6: for i = k + 2, k + 4, . . . , n − 1 do 7: Ai:k+1,k:k+1 = Ai:i+1,k:k+1 A−1 k:k+1,k:k+1 8: end for 9: end for 9 2.4. Crout factorization for skew-symmetric matrices In line 7, there is no explicit inversion of Ak:k+1,k:k+1 since Ai:i+1,k:k+1 A−1 k:k+1,k:k+1 · 0 −ak+1,k = Ai:i+1,k:k+1 ak+1,k 0 · ¸ 1 0 1 Ai:i+1,k:k+1 , = −1 0 ak+1,k ¸−1 which can be trivially computed element-wise. To keep with using only half of the matrix, line 3 can also be computed element-wise so that only ai+1,i is required from Ai:i+1,i:i+1 . Again there is the issue of overwriting A with the multipliers in L in line 3. If the full matrix were being operated on, the line would read Ak:n,k:k+1 = Ak:n,k:k+1 − Ak:n,i:i+1 Ai:i+1,k:k+1 . Since Ai:i+1,k:k+1 is in the upper half of A and not available, we want to use −ATk:k+1,i:i+1 , but the required value for a particular i would have been overwritten in line 7 during the ith step of the factorization. So Ai:i+1,k:k+1 can be recovered with Ak:k+1,i:i+1 Ai:i+1,i:i+1 . Transposing and negating gives line 3 as shown. Incorporating thresholding to obtain an incomplete LDLT factorization only requires the addition of line 5. Either method presented in §2.3 can be used here, and the algorithm is certainly not restricted to using only these dropping schemes. Pivoting Bunch’s partial pivoting strategy employed in the LDLT decomposition is somewhat more involved when used in a delayed update algorithm. In Crout, once pivoting is incorporated, it is no longer true that elements in Ak+2:n,k+2:n are not accessed at step k. As we shall see though, it will be only one column that is needed from this submatrix. Algorithm 2.6 summarizes the result of incorporating Bunch pivoting into ILDLT C. When choosing a pivot in the usual KIJ Gaussian elimination schemes, the set of elements from which the pivot is chosen has already been modified from the original matrix in previous steps with Schur complement updates. So in Algorithm 2.5, a pivot may only be chosen after the updates in lines 2–4. In Algorithm 2.6, instead of writing these updates directly into A, we store them in a temporary block vector w. This is due to the swapping of elements to and from the submatrix Ak+2:n,k+2:n when performing row and column interchanges. 10 2.4. Crout factorization for skew-symmetric matrices Algorithm 2.6 ILDLT C-BP 1: for k = 1, 3, . . . , n − 1 do 2: w1:k−1,1:2 = 0, wk:n,1:2 = Ak:n,k:k+1 3: for i = 1, 3, . . . , k − 2 and kAk:k+1,i:i+1 k = 6 0 do 4: wk:n,1:2 = wk:n,1:2 − Ak:n,i:i+1 Ai:i+1,i:i+1 ATk:k+1,i:i+1 5: end for 6: find max |wpq′ | s.t. p > q ′ 7: set q = q ′ + k − 1 8: if p > k + 1 then 9: z1:p−1 = −ATp,1:p−1 , zp:n = Ap:n,p 10: for i = 1, 3, . . . , k − 2 do 11: zk:n = zk:n − Ak:n,i:i+1 Ai:i+1,i:i+1 ATp,i:i+1 12: end for 13: end if 14: update A with w and z 15: interchange rows and columns k and q 16: interchange rows and columns k + 1 and p 17: apply dropping rules to Ak+2:n,k:k+1 18: for i = k + 2, k + 4, . . . , n − 1 do 19: Ai:i+1,k:k+1 = Ai:i+1,k:k+1 A−1 k:k+1,k:k+1 20: end for 21: end for For simplicity, we want to satisfy the invariant that Ak+2:n,k+2:n is untouched after each step of the elimination. Holding the updates in a temporary vector allows us to write into A only the updates for elements that remain, after permutations are performed, in Ak:n,k:k+1 . Any elements that are brought into Ak+2:n,k:k+1 due to row and column interchanges must also be updated first. Such elements lie in one column and we use the temporary vector z for this purpose. We illustrate this update process with the following example. Example 2.2. We ignore dropping rules for a moment and step through one step of both Bunch (Algorithm 2.3) and Crout (Algorithm 2.6) factorizations, paying attention to how elements are updated in the context of pivoting. In making this comparison, we see an instance of how delayedupdate algorithms differ from the usual KIJ-based algorithms. 11 2.4. Crout factorization for skew-symmetric matrices Let us take A to be A= 0 10 1 4 2 4 9 3 −10 0 2 5 3 10 1 2 −1 −2 0 0.7 4.9 11.2 10.3 2.6 −4 −5 −0.7 0 2.2 9 3.9 3.3 −2 −3 −4.9 −2.2 0 13.8 12.5 5.5 −4 −10 −11.2 −9 −13.8 0 1.4 11.8 −9 −1 −10.3 −3.9 −12.5 −1.4 0 10.5 −3 −2 −2.6 −3.3 −5.5 −11.8 −10.5 0 . We will now focus only on the lower triangular part of A. Let B (1) and C (1) denote what A looks like after the first step of Bunch and Crout, respectively. (The shading will be discussed shortly.) Note that the lower right submatrix of C (1) is unmodified from A. B (1) C (1) = = 0 10 −0.2 −0.5 −0.3 −1 −0.1 −0.2 0 10 −0.2 −0.5 −0.3 −1 −0.1 −0.2 0 0.1 0.4 0.2 0.4 0.9 0.3 0 0.1 0.4 0.2 0.4 0.9 0.3 0 0.7 4.9 11.2 10.3 2.6 0 1 5 11 12 3 0 2 7 8 4 0 13 15 6 0 10 14 0 9 0 0 2.2 9 3.9 3.3 0 13.8 12.5 5.5 0 1.4 11.8 0 10.5 0 In the second step, k = 3, we see from B (1) that the next pivot element (1) is in the (7, 3) position. Contrast this with C7,3 , which is not the largest element in magnitude in the third and fourth columns. The Schur complement update needs to be computed first for these two columns, and so w is (1) loaded with C3:8,3:4 . After the update, w is the same as the third and fourth columns of B (1) . Its elements below the diagonal are shaded. With the pivot element in the (p, q) = (7, 3) position, only rows and columns 4 and 7 need to be interchanged to move it into the block pivot. 12 2.4. Crout factorization for skew-symmetric matrices Performing this on B (1) , P B (1) P T = 0 10 −0.2 −0.1 −0.3 −1 −0.5 −0.2 0 0.1 0.9 0.2 0.4 0.4 0.3 0 12 5 11 1 3 0 −15 −10 −8 9 0 13 0 −2 −7 6 14 0 4 0 . There are now unshaded elements in the third and fourth columns, and some shaded elements have moved to the right. This tells us which elements of w need to be written into A in the Crout factorization, namely, those corresponding to the shaded elements that are still within the third and fourth columns. Elements of w corresponding to those that are now to the right of the fourth column should not be updated. Since p > k + 1 in our example, this also shows which elements to the right of the fourth column in C (1) need to be loaded into z and updated, namely column seven. Due to the restriction to the lower-half of the matrix, z is also loaded from the seventh row, as stated in line 9 of Algorithm 2.6. The shaded elements in C (1) are those that need to be updated before the matrix is permuted. It is no surprise that the shading pattern here is the same as the one in P B (1) P T which highlights elements originating from the third and fourth columns; it follows from how symmetric permutations are performed. The updated and permuted matrix in the Crout factorization is 0 10 −0.2 −0.1 −0.3 −1 −0.5 −0.2 0 0.1 0.9 0.2 0.4 0.4 0.3 0 12 5 11 1 3 0 −15 −10 −8 9 0 13.8 −2.2 5.5 0 −9 11.8 0 3.3 0 , which agrees with P B (1) P T in the first four columns. From here, the multipliers are computed in both factorizations, the Bunch factorization also computes Schur complements, and the second step is complete. We summarize the update procedure for line 14 of Algorithm 2.6 with the following: 13 2.5. Numerical results if q == k A(k+1:n,k) = w(k+1:n,1); if p == k+1 A(k+2:n,k+1) = w(k+2:n,2); else A(p,k+1:p-1) = -z(k+1:p-1); A(p+1:n,p) = z(p+1:n); end else % q==k+1 A(k+1,k) = w(k+1,1); A(k+2:p-1,k+1) = w(k+2:p-1,2); A(p+1:n,k+1) = w(p+1:n,2); if p > k+1 A(p,k:p-1) = -z(k:p-1); if p < n A(p+1:n,p) = z(p+1:n); end end end The idea is to update elements so that the pivot can be found, and then take permutations into account and write only the updates into A for elements that end up in columns k and k + 1. There are a few equivalence classes of possible pivot locations within these two columns, and the exact update details, which are trivial and somewhat tedious, are in the code shown above. We note that rook pivoting may again be used instead of Bunch partial pivoting here, but it is slightly more involved. Recall from §2.2 that rook pivoting on half the matrix involves looking at Ap,k:p−1 and Ap:n,p in each step. In Algorithm 2.6, after finding the maximal element in magnitude of w, z is exactly the next row/column searched in rook. At each subsequent step of rook, another vector serving a role similar to z will be needed to temporarily store updates before finding its maximal element. This is certainly more work, but as noted in [5], the number of rook steps needed to find the pivot is typically small. 2.5 Numerical results We compare in Matlab the performance of preconditioners generated from the incomplete factorizations ILDLT and ILU. We use our ILDLT C-BP routine to generate ILDLT factors. For the general ILU factors, we use 14 2.5. Numerical results Matlab’s built-in luinc routine. For the test problem, we use the skewsymmetric part of the matrix for the 3D convection-diffusion equation, discretized using the centered finite difference scheme on the unit cube with Dirichlet boundary conditions. The mesh Reynolds numbers are 0.48, 0.5, and 0.52. The grid size is 24, so A has dimension n = 13, 824. The number of nonzeros of A is 79,488, and the number of nonzeros in its full LU factorization is 7,220,435. Our results are presented in Table 2.1. The right-hand vector b is set preconditioner luinc luinc ildlc-bp ildlc-bp parameters 1e-1 1e-2 1e-2, 50 1e-3, 50 nnz 4,409,643 6,832,291 411,779 489,190 itn 21 31 9 9 err 1.50 2e-5 1.22 1.14 res 7e-7 8e-7 3e-7 1e-7 Table 2.1: Convergence results of preconditioned GMRES for test problem. to be Axe where xe is the normalized all-ones vector. Preconditioners are generated with a few different parameters for each of the methods. For luinc the parameter is the drop tolerance. For ildlc-bp, the first parameter shown is the drop tolerance and the second is the maximum number of nonzero blocks allowed per row. The number of nonzeros (nnz) reported is that of L+U for luinc and L+D for ildlc-bp since we need not store LT . We use Matlab’s gmres with tolerance 10−6 and restart of 30 iterations. The remaining columns of the table show the number of iterations required for convergence and the 2-norms of the error and residual. A plot of the relative residual norms at each step of GMRES is given in Figure 2.1 with no preconditioner and with the second and fourth preconditioners from the table. Unpreconditioned GMRES does not converge within 500 outer iterations. With the ILDLT preconditioners, GMRES converges to within the desired tolerance faster than with either of the ILU preconditioners. For all the preconditioned systems, the desired tolerance is achieved for the norm of the residual; however, most of them are not so well conditioned and we see that the error is relatively large except in the case of luinc with tolerance 1e-2. Unfortunately, the storage cost for achieving such accuracy is prohibitively large; the number of nonzeros in this luinc preconditioner is nearly that of the full factorization. When trying to obtain a sparser factor by using a higher drop tolerance with luinc, the error is similar to that for ILDLT preconditioners. For the latter, however, the factors are much 15 2.5. Numerical results 0 10 −1 10 −2 relative residual 10 −3 10 −4 10 −5 10 −6 unpreconditioned luinc (1e−2) ildlc−bp (1e−3,50) 10 −7 10 0 50 100 150 iteration 200 250 300 Figure 2.1: Residual vector norms for test problem. sparser, requiring an order of magnitude less storage than either of the ILU preconditioners. This gain is beyond that coming from not storing LT . There is potential for further gains in using ILDLT over ILU. Gaining insight into the parameter choice may be useful. Let us note that the factors generated were often quite ill-conditioned. It seems from some numerical experiments that this is due to large isolated eigenvalues in the preconditioned matrix. Seeking ways to improve conditioning remains an item for future work. 16 Chapter 3 Minimum residual iterations for skew-symmetric systems For skew-symmetric systems, it is natural to turn to short recurrence solvers such as the ones used in the symmetric case, for example, conjugate gradient and MINRES. These save on storage that would be required when using general non-symmetric solvers such as GMRES. Both CG and MINRES are slightly different when adapted to skew-symmetric systems. Greif and Varah give a detailed account of skew-CG in [5] and summarize the derivation for skew-MINRES, including preconditioning for both. See also [7] for an unpreconditioned version of MINRES for shifted skew-symmetric systems. In this chapter, we elaborate on skew-MINRES. We first discuss the unpreconditioned algorithm, starting with the Lanczos procedure and developing skew-MINRES, before adding preconditioning. Skew-symmetry introduces a number of zeros in predictable places, all due to the zero diagonal in the matrix. This slightly simplifies the iteration compared to that for standard symmetric systems. 3.1 Skew-Lanczos The Arnoldi iteration computes the Hessenberg reduction of a general matrix A = Vk Hk VkT with Hk k×k upper Hessenberg and Vk n×k orthonormal. For symmetric A, it is referred to as the Lanczos iteration and Hk is tridiagonal [2, §6.6.1]. Similarly, if A were skew-symmetric then Hk , which we shall now label Tk , is also skew-symmetric: TkT = (VkT AVk )T = VkT AT Vk = −VkT AVk = −Tk . This gives rise to an analogous skew-Lanczos iteration [5]. The iteration starts with the n×n, skew-symmetric matrix A and some initial vector v0 . We take the first vector in Vk to be v1 = v0 /kv0 k2 . At the kth step of the iteration, we have a sequence of vectors vi such that AVk = Vk+1 Tk+1,k . (3.1) 17 3.1. Skew-Lanczos The columns of Vk form an orthonormal basis of Kk (A; r0 ). Due to skewsymmetry, Tk+1,k is even sparser than that in the symmetric case: 0 α1 −α1 0 α2 −α2 0 α3 . . . . . . −α3 Tk+1,k = . . . . .. .. .. .. . 0 αk−1 −αk−1 0 −αk The kth column of (3.1) satisfies Avk = αk−1 vk−1 − αk vk+1 . Rearranging this gives the two-term recurrence, vk+1 = (αk−1 vk−1 − Avk )/αk , instead of the three-term recurrence found in the symmetric case. The αi are chosen to normalize the vi . If ever αi is effectively 0, the algorithm breaks down and another vector cannot be generated. This implies Avi ∈ Ki (A; r0 ). Algorithm 3.1 summarizes the skew-Lanczos iteration. The results of the algorithm are the set of vi that make up Vk+1 and the αi that define Tk+1,k . Note that A need not be available explicitly; it suffices to have a routine that returns matrix-vector products with A. Algorithm 3.1 Skew-Lanczos 1: set v1 = b/kbk2 , α0 = 0, v0 = 0 2: for i = 1 to k do 3: zi = Avi − αi−1 vi−1 4: αi = kzi k2 5: if αi = 0 then 6: quit 7: end if 8: vi+1 = −zi /αi 9: end for 18 3.2. QR factorization of Tk+1,k 3.2 QR factorization of Tk+1,k The skew-MINRES procedure that follows uses the QR factorization of Tk+1,k , µ ¶ Rk = Q̂k+1 Rk , Tk+1,k = Qk+1 0 where Qk+1 is (k + 1) × (k + 1) and Rk is k × k. It is convenient to define Q̂k+1 , which consists of the first k columns of Qk+1 . The relatively simple structure of Tk+1,k in turn gives a relatively simple Rk , and we can derive explicit formulas for each of its nonzero elements. The orthonormal matrix Q is not needed explicitly for skew-MINRES, but it is a product of Givens rotation matrices. ¡ 0 ¢ . Applying a Givens The first step of skew-Lanczos produces T2,1 = −α 1 rotation, µ ¶µ ¶ µ ¶ µ ¶ 0 R1 α1 0 −1 T G1 T21 = , = = 1 0 −α1 0 0 so R1 = (α1 ). Taking Q2 = G1 , the first factorization is complete. Each Gi denotes a particular rotation, but its dimension will vary depending on context, padding it with the identity matrix as necessary. In the kth iteration, GT1 will first be applied to Tk+1,k , followed by GT2 and so on up to GTk . They are all of dimension (k+1)×(k+1), and Qk+1 is defined as Qk+1 = G1 G2 ...Gk . The second step of skew-Lanczos produces 0 α1 0 . T3,2 = −α1 0 −α2 Applying the first rotation 0 −1 GT1 T3,2 = 1 0 0 0 To zero out −α2 , use the and α2p /r2,2 , respectively. r2,2 = α12 + α22 . In full, 1 0 GT2 (GT1 T3,2 ) = 0 c2 0 s2 matrix, α1 0 0 α1 0 0 = 0 α1 . 0 −α1 0 −α2 0 −α2 1 ¡ c 2 s2 ¢ T rotation −s , where c2 and s2 are α1 /r2,2 2 c2 This replaces the α1 in the second column with α1 p 0 α1 0 0 α1 = 0 −s2 0 α12 + α22 . 0 −α2 c2 0 0 19 3.2. QR factorization of Tk+1,k To explicitly see the pattern forming, we continue for a few more iterations, generating Rk from Tk+1,k . For k = 3, α1 p 0 −α2 1 0 0 0 0 1 0 0 0 α12 + α22 0 GT3 (GT2 GT1 T4,3 ) = 0 0 0 −1 0 0 0 0 0 1 0 0 0 −α3 r13 = −α2 α1 p 0 −α2 0 α12 + α22 0 , with r33 = α3 = . 0 0 α3 c = 0 3 0 0 0 s3 = 1 For k = 4, GT4 (QT4 T5,4 ) = α1 p 0 −α2 0 2 2 α1 + α2 0 −α3 s2 0 0 0 α3 p 0 0 0 0 (α3 c2 )2 + α42 0 0 0 0 with c4 = α3 c2 /r44 , and s4 = α4 /r44 . For k = 5, we again have c5 = 0 and s5 = 1. For k = 6, 0 B B B B T T G6 (Q6 T7,6 ) = B B B B @ α1 0 0 0 0 0 0 p 0 α12 + α22 0 0 0 0 0 −α2 0 α3 0 0 0 0 0 −α3 s2 0 p (α3 c2 )2 + α42 0 0 0 0 0 −α4 0 α5 0 0 0 0 0 −α5 s4 0 p (α5 c4 )2 + α62 0 1 C C C C C C C C A with c6 = α5 c4 /r66 and s6 = α6 /r66 . In summary, the QR factorization of Tk+1,k can be stated explicitly. First define c0 = 1, and the ci and si are ( ( 1 i odd 0 i odd , si = αi . ci = αi−1 ci−2 i even i even rii rii Qk+1 = G1 G2 · · · Gk and each Gi is a Givens rotation matrix, which is essentially the identity matrix with the substitution ¶ µ ci si . Gi (i : i + 1, i : i + 1) = −si ci 20 3.3. Skew-MINRES There are only two nonzero diagonals in Rk given by ( ( αi i odd −αi+1 ri,i = q , ri,i+2 = 2 2 −αi+1 si (αi−1 ci−2 ) + αi i even 3.3 i odd . i even Skew-MINRES Having developed skew-Lanczos and examined the QR factorization of the resulting skew-symmetric tridiagonal matrix, we have the tools to adapt MINRES into skew-MINRES to solve Ax = b where A is skew-symmetric. At each iteration, we seek xk such that kb − Axk k2 is minimized over the set x0 + Kk (A; r0 ). Thus xk = x0 + Vk yk for some yk , with Vk generated by skew-Lanczos using v0 = r0 . Throughout, we can use the initial guess x0 = 0 without loss of generality, for suppose we had x0 6= 0. Then define b′ = b − Ax0 = r0 , implying Kk (A; r0 ) = Kk (A; b′ ), and rewrite the original problem as min xk ∈x0 +Kk (A;r0 ) kb − Axk k2 = = min kb − Ax0 − A(xk − x0 )k2 min kb′ − Ax′k k2 , (xk −x0 )∈Kk (A;r0 ) x′k ∈Kk (A;b′ ) with x′k := xk − x0 . This new problem is equivalent to the original one with the new iterates being related to the original xk , and in particular, x′0 = x0 − x0 = 0. Hence from here on we will consider Kk (A; r0 ) to be Kk (A; b), and xk = Vk yk . (3.2) Making the usual substitutions, min kb − Axk k2 = min kb − AVk yk k2 xk yk = min kb − Vk+1 Tk+1,k yk k2 . yk Since Vk+1 is orthonormal and v1 = b/kbk2 , the problem becomes min kρe1 − Tk+1,k yk k2 , yk (3.3) where ρ = kbk2 and e1 is the first standard basis vector of size k+1. 21 3.3. Skew-MINRES Define Wk = Vk Rk−1 and zk = Q̂Tk+1 ρe1 . Combining (3.2), (3.3), and the QR factorization of Tk+1,k , the kth approximation to the solution of the system can be written as xk = (Vk Rk−1 )(Q̂Tk+1 ρe1 ) = Wk zk . Now we step through skew-MINRES iterations to derive expressions for Wk and zk , beginning with the latter. Define Iˆk to be the first k rows of the k+1 identity matrix. Rewrite zk = Q̂Tk+1 ρe1 = Iˆk QTk+1 ρe1 = Iˆk GTk (QTk ρe1 ). Note zk−1 = Iˆk−1 (QTk ρe1 ). For k = 1, µ ¶µ ¶ µ ¶ ρ 0 0 −1 T ˆ ˆ ˆ z1 = I1 Q2 ρe1 = I1 = I1 = 0, 1 0 0 ρ so x1 = 0 also. For k = 2 to 6, z2 z3 z4 z5 µ ¶ 0 1 0 0 0 0 ˆ ˆ ρ 0 c2 −s2 ρc2 = I2 = I2 = , ρc2 0 0 s2 c2 ρs2 0 0 1 0 0 0 0 ρc2 0 1 0 0 ρc2 ρc2 , ˆ = Iˆ3 0 0 0 −1 ρs2 = I3 0 = 0 ρs2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 ρc2 ρc2 ρc2 ˆ 0 = Iˆ4 0 = I4 0 = 0 0 0 1 0 ρc4 s2 0 0 0 c4 −s4 ρs2 ρc4 s2 ρs2 s4 0 0 0 0 s4 c4 = Iˆ5 (0 ρc2 0 ρc4 s2 0 ρs2 s4 )T = (0 ρc2 0 ρc4 s2 0)T , and , z6 = Iˆ6 (0 ρc2 0 ρc4 s2 0 ρc6 s2 s4 ρs2 s4 s6 )T = (0 ρc2 0 ρc4 s2 0 ρc6 s2 s4 )T . In general, z1 = 0 and ¶ µ zk−1 , zk = ζk where ( 0 ζk = Q ρck i=2,4,...,k−2 si k odd . k even As for Wk , only the even-indexed columns are of interest since xk = Wk zk and all odd-indexed elements of zk are zero. Indeed, skew-MINRES will only produce an approximation xk for k even. 22 3.3. Skew-MINRES ´ ³ v2 v2 1 For k = 2, W2 := V2 R2−1 = rv1,1 r2,2 and w2 = r2,2 . For k = 4, 6, . . ., recalling the structure of Rk gives vk = wk−2 rk−2,k + wk rk,k . Rearranging this, vk − wk−2 rk−2,k wk = . rk,k In the end, each even iteration of skew-MINRES produces xk = w2 ζ2 + w4 ζ4 + . . . + wk ζk . Algorithm 3.2 shows the method in detail, incorporating skew-Lanczos. As in GMRES with a nonsingular A, here if αi = 0 for some i, then residual is also 0 and the solution has been found [11, Proposition 6.10]. Algorithm 3.2 Skew-MINRES 1: p = kbk2 , α0 = 0, v0 = w = 0, v1 = b/p, c = 1, s = 0, x0 = 0 2: for k = 1, 2, . . . , until convergence do 3: zk = Avk − αk−1 vk−1 4: αk = kzk k2 5: vk+1 = −zk /αk 6: if k even q then 7: r = (αk−1 c)2 + αk2 8: w = (vk + αk−1 sw)/r 9: c = αk−1 c/r 10: ζ = cp 11: xk = xk−2 + ζw 12: s = αk /r 13: p = ps 14: end if 15: end for Typically, the norm of the residual is used to determine convergence and, as usual, it can be computed without calculating b−Ax explicitly. Note that yk solves minyk kρe1 − Tk+1,k yk k by satisfying Tk yk = ρe1 [11, Proposition 23 3.4. Preconditioned skew-MINRES 6.9]. Thus the residual vector itself is rk = b − Axk = ρv1 − AVk yk = ρv1 − Vk+1 Tk+1,k yk = ρVk e1 − (Vk Tk + αk vk+1 eTk )yk = Vk (ρe1 − Tk yk ) − αk vk+1 eTk yk = −αk eTk yk vk+1 , where eTk yk is the kth element of yk . Its norm is given by krk k2 = |αk eTk yk | = |αk eTk (VkT Wk zk )| = |αk eTk (Rk−1 zk )| = |αk ζk /rkk | = |ζk |, exactly as in the general case. 3.4 Preconditioned skew-MINRES We can precondition skew-symmetric systems in a manner very similar to that in the symmetric case [4, Chapter 8] to obtain a preconditioned skewMINRES iteration, or skew-PMINRES. To preserve skew-symmetry, a symmetric positive definite preconditioner M = LLT is used in a symmetric preconditioning scheme, giving the system L−1 AL−T x̂ = L−1 b, x = L−T x̂. (3.4) Kk (A; b) Instead of searching over for iterates, skew-PMINRES searches k −1 −T −1 over K (L AL ; L b), so we start by developing preconditioned skewLanczos. If regular skew-Lanczos (Algorithm 3.1) were run with the matrix L−1 AL−T and initial vector v̂1 = L−1 b/kL−1 bk2 , then each iteration would produce ẑk = L−1 AL−T v̂k − αk−1 v̂k−1 q αk = ẑkT ẑk v̂k+1 = −ẑk /αk . 24 3.4. Preconditioned skew-MINRES Using this directly in a preconditioned MINRES algorithm would require having L explicitly, yet M may not always be available in factored form. Additionally, it would only compute iterates x̂k , and an additional solve with LT would be required to obtain xk . We make modifications to arrive at skew-PMINRES which addresses both of these issues. It requires only one solve with M at each iteration. First, to combine the L and LT solves in preconditioned skew-Lanczos, define vk = Lv̂k zk = Lẑk uk = L−T L−1 vk = M −1 vk . Note that the vk are no longer orthonormal; it is the v̂k that are orthonormal and form a basis for Kk (L−1 AL−T ; L−1 b). Now each iteration generates zk = LL−1 AL−T L−1 vk − αk−1 LL−1 vk = Auk − αk−1 vk q q αk = (L−1 zk )T (L−1 zk ) = zkT M −1 zk vk+1 = −zk /αk uk+1 = M −1 vk+1 . Define ũk = M −1 zk to remove the unnecessary preconditioner solve for αk . Then q αk = zkT ũk uk+1 = −ũk /αk . This gives Algorithm 3.3, the preconditioned skew-Lanczos routine that generates the Vk and Tk+1,k used in skew-PMINRES. There is only one further change to make in skew-MINRES to obtain the skew-PMINRES algorithm; one that makes the algorithm explicitly compute iterates xk approximating the solution to Ax = b rather than the x̂k of (3.4). Consider line 8 in Algorithm 3.2, wk = (vk + αk−1 sk−2 wk−1 )/rk . The xk generated in line 11 are simply linear combinations of the wk , which in turn are linear combinations of the vk . However, these are the vk from preconditioned skew-Lanczos. To obtain x̂k , we need v̂k = L−1 vk . If the solve with LT is introduced here, then line 11 computes the xk desired. Conveniently, 25 3.4. Preconditioned skew-MINRES Algorithm 3.3 Preconditioned skew-Lanczos √ 1: a0 = 0, v0 = 0, v1 = b/ bT M −1 b, u1 = M −1 v1 2: for i = 1, 2, . . . , k do 3: zi = Aui − αi−1 vi−1 −1 4: ũi = M q zi 5: αi = ziT ũi 6: vi+1 = −zi /αi 7: ui+1 = −ũi /αi 8: end for the uk from preconditioned skew-Lanczos are exactly L−T L−1 vk , and line 8 of Algorithm 3.2 becomes wk = (uk + αk−1 sk−2 wk−1 )/rk . With this we have skew-PMINRES in Algorithm 3.4. The inputs are A, b, and either M or L. It works on the preconditioned system of (3.4), and it returns approximations xk to the solution of Ax = b. Algorithm 3.4 Skew-PMINRES √ 1: p = bT M −1 b, α0 = s = 0, c = 1, v0 = w = x0 = 0, v1 = b/p, u1 = M −1 v1 2: for k = 1, 2, . . . , until convergence do 3: zk = Auk − αk−1 vk−1 −1 4: ũk = M q zk 5: αk = zkT ũk 6: vk+1 = −zk /αk 7: uk+1 = −ũk /αk 8: if k even q then 9: r = (αk−1 c)2 + αk2 10: w = (uk + αk−1 sw)/r 11: c = αk−1 c/r 12: ζ = cp 13: xk = xk−2 + ζw 14: s = αk /r 15: p = ps 16: end if 17: end for 26 3.4. Preconditioned skew-MINRES For computing the residual easily in the preconditioned algorithm, the derivation is similar to that for plain skew-MINRES. The problem now solved is that given in (3.4), yet the residual of interest is still b − Axk . Instead of (3.1), we now have (L−1 AL−T )(L−1 Vk ) = (L−1 Vk+1 )Tk+1,k , or AM −1 Vk = Vk+1 Tk+1,k , since it is the L−1 vk that are orthonormal here. Additionally, it is x̂k that can be written as L−1 Vk yk , for some yk . By the definition of x̂k , xk = L−T L−1 Vk yk = M −1 Vk yk . Putting these together, rk = b − Axk = ρv1 − AM −1 Vk yk = ρv1 − Vk+1 Tk+1,k yk . The remainder of the derivation is exactly as shown previously in the unpreconditioned case, so again krk k2 = |ζk |. 27 Chapter 4 Conclusions and future work We discussed the Bunch LDLT decomposition for skew-symmetric matrices and derived a Crout-based factorization for computing incomplete LDLT factorizations with pivoting. Implementation details have been given on how to practically save on computational work and storage by performing operations on only the lower triangular half of the matrix. This may be useful in writing more efficient implementations using compressed matrix storage schemes. Numerical results show that a skew-symmetric preconditioner can be an improvement over the general incomplete LU factorization. For preconditioners that gave roughly the same error and residual in GMRES, the skew preconditioner was much sparser than the general one and required fewer iterations for convergence. Further work here may investigate the effects of the parameters of ILDLT C-BP on the performance of preconditioned GMRES. Another item is to explore improving the conditioning of the factors, since those produced by our factorization were not as robust as expected. We have also presented the details of a skew preconditioned MINRES iteration, derived in [5], starting by adapting Lanczos to skew-symmetric systems and seeing that the structure results in skew-MINRES producing approximations every other iteration. While preconditioning was incorporated, the procedure relies on having a symmetric positive definite preconditioner. Given the gains observed when using a skew-symmetric preconditioner in GMRES, future work includes investigating whether it is possible to derive a form of MINRES that can utilize skew-symmetric preconditioners. It is not clear to us how to formulate such a MINRES procedure. Finally, we return to the point mentioned originally that skew-symmetric systems arise from non-symmetric systems. Another potential area of work lies in incorporating the above into general non-symmetric solvers when dealing with systems that have a dominant skew part. 28 Bibliography [1] J. R. Bunch. A note on the stable decomposition of skew-symmetric matrices. Math. Comp., 38(158):475–479, 1982. [2] J. W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997. [3] I. S. Duff. The design and use of a sparse direct solver for skew symmetric matrices. J. Comput. Appl. Math., 226(1):50–54, 2009. [4] A. Greenbaum. Iterative methods for solving linear systems. SIAM, 1997. [5] C. Greif and J. M. Varah. Iterative solution of skew-symmetric linear systems. SIAM J. Math. Anal., 31(2):584–601, 2009. [6] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2nd edition, 2002. [7] R. Idema and C. Vuik. A minimal residual method for shifted skewsymmetric systems. Technical Report 07-09, Delft University of Technology, 2007. [8] N. Li and Y. Saad. Crout versions of the ILU factorization with pivoting for sparse symmetric matrices. Electron. Trans. Numer. Anal., 20:75– 85, 2005. [9] N. Li, Y. Saad, and E. Chow. Crout versions of ILU for general sparse matrices. SIAM J. Sci. Comput., 25(2):716–728, 2003. [10] C. C. Page and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(4):617–629, 1975. [11] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelpha, PA, 2nd edition, 2003. [12] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997. 29
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Numerical solution of skew-symmetric linear systems
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Numerical solution of skew-symmetric linear systems Lau, Tracy 2009
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Numerical solution of skew-symmetric linear systems |
Creator |
Lau, Tracy |
Publisher | University of British Columbia |
Date Issued | 2009 |
Description | We are concerned with iterative solvers for large and sparse skew-symmetric linear systems. First we discuss algorithms for computing incomplete factorizations as a source of preconditioners. This leads to a new Crout variant of Gaussian elimination for skew-symmetric matrices. Details on how to implement the algorithms eﬃciently are provided. A few numerical results are presented for these preconditioners. We also examine a specialized preconditioned minimum residual solver. An explicit derivation is given, detailing the eﬀects of skew-symmetry on the algorithm. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-ShareAlike 3.0 Unported |
DOI | 10.14288/1.0051396 |
URI | http://hdl.handle.net/2429/17435 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_spring_lau_tracy.pdf [ 235.12kB ]
- Metadata
- JSON: 24-1.0051396.json
- JSON-LD: 24-1.0051396-ld.json
- RDF/XML (Pretty): 24-1.0051396-rdf.xml
- RDF/JSON: 24-1.0051396-rdf.json
- Turtle: 24-1.0051396-turtle.txt
- N-Triples: 24-1.0051396-rdf-ntriples.txt
- Original Record: 24-1.0051396-source.json
- Full Text
- 24-1.0051396-fulltext.txt
- Citation
- 24-1.0051396.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0051396/manifest