ILDL Factorizations for Sparse Skew-Symmetric Matrices by Shiwen He B.Sc., The University of British Columbia, 2008 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Computer Science) The University Of British Columbia (Vancouver) April 2012 c© Shiwen He, 2012 Abstract Our goal is to solve a sparse skew-symmetric linear system efficiently. We pro- pose a slight modification to the Bunch LDLT factorization with partial pivoting strategy for skew-symmetric matrices, which saves approximately one third of the overall number of swaps of rows and columns on average. We also develop a rook pivoting strategy for this LDLT factorization in Crout order. We derive an incom- plete LDLT factorization based on the full LDLT factorization, with both partial pivoting strategy and rook pivoting strategy. The incomplete factorization can be used as a preconditioner for Krylov subspace solvers. Various column versions of ILUTP factorizations are investigated. We show that the approach taken saves approximately half of the work and storage compared to standard ILU factoriza- tions that do not exploit the structure of the matrix, and the rook pivoting strategy is often superior to the partial pivoting strategy, in particular when the matrix is ill-conditioned and the overall number of elements in the rows of the factors is restricted. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Factorizations of skew-symmetric matrices . . . . . . . . . . . . . . 3 2.1 Gaussian elimination variants . . . . . . . . . . . . . . . . . . . . 3 2.2 LDLT decomposition for symmetric matrices . . . . . . . . . . . 5 2.2.1 Symmetric Partial pivoting . . . . . . . . . . . . . . . . . 7 2.2.2 Symmetric Rook Pivoting . . . . . . . . . . . . . . . . . 10 2.3 LDLT decomposition for skew-symmetric matrices . . . . . . . . 11 2.3.1 Bunch’s partial pivoting strategy . . . . . . . . . . . . . . 12 2.3.2 Modified Bunch’s partial pivoting strategy . . . . . . . . . 15 2.3.3 Skew-Symmetric Rook pivoting . . . . . . . . . . . . . . 17 2.3.4 Crout factorization for skew-symmetric matrices . . . . . 19 3 Incomplete factorizations of skew-symmetric matrices . . . . . . . . 23 3.1 Incomplete LU factorizations . . . . . . . . . . . . . . . . . . . . 23 3.1.1 IKJ variant . . . . . . . . . . . . . . . . . . . . . . . . . 24 iii 3.1.2 JKI variant . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2 Incomplete LDLT of skew-symmetric matrices . . . . . . . . . . 32 3.3 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . 35 4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 iv List of Tables Table 3.1 Performance of 4 variants of ILUTPC for general matrices. . . 37 Table 3.2 Performance of ILDLT and ILUTPC for skew-symmetric ma- trices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 v List of Figures Figure 3.1 Preconditioned GMRES for RAEFSKY1 . . . . . . . . . . . 30 Figure 3.2 Preconditioned GMRES for RAEFSKY3 . . . . . . . . . . . 31 Figure 3.3 Preconditioned GMRES for a skew-symmetric matrix A . . . 39 vi Acknowledgments I am very grateful to Chen Greif, my supervisor, for his help and guidance on my thesis, especially when I have had trouble moving on. His kindness and enthusiasm have assisted me a lot in writing this thesis. I thank Robert Bridson for being a reader of this thesis and giving useful suggestions on it. I also thank all the people that have devoted their time to provide all kinds of help in my life, including my family members, my lab-mates, and my friends. vii Chapter 1 Introduction A real skew-symmetric matrix A satisfies A = −AT . Every square matrix has a symmetric part and a skew-symmetric part, which can be combined in the follow- ing way: A= A+AT 2 + A−AT 2 . Symmetric matrices are very important and much time and effort have been de- voted to explore them, while skew-symmetric matrices have not been as extensively explored. Although skew-symmetric matrices are not as prominent as symmetric ones, they arise in many applications, explicitly and implicitly. In the Hermitian and Skew-Hermitian method [2], we need to solve a shifted skew-symmetric sys- tem as part of the solution procedure. For example, consider the matrix A associated with the equation of the finite difference of discretization of −∆u+(σ ,τ)∇u= f . If σ and τ are very large, A is dominated by its own skew-symmetric part. This makes the problem hard to solve. However, if we have a fast solver for a skew- symmetric linear system, it may help in solving the original linear system. A skew-symmetric matrix has several properties: • Every diagonal entry is zero since ai j =−a ji→ aii = 0 ∀i. 1 • For any x, (xTAx)T =−xTAx= 0. • If the dimension n of A is odd, then A is singular. • A−1 is also skew-symmetric if it exists. • All eigenvalues of A are either 0 or occur in pure imaginary complex conju- gate pairs. Our goal is to find an effective incomplete factorization as a preconditioner for a large and sparse skew-symmetric matrix. In the second chapter of this thesis, we explore a full factorization PAPT = LDLT , using Bunch’s partial pivoting strategy, where A is a skew-symmetric matrix, P is a permutation matrix, L is a unit lower triangular matrix, and D is a block diagonal matrix with every block of size 2×2. We develop a rook pivoting variant, and explore the merits of a Crout version of the factorization of a skew-symmetric matrix. In chapter 3 we derive an incomplete version of PAPT = LDLT and compare the performance of this incomplete factorization as a preconditioner with the in- complete LU factorization for general matrices. Thesis contributions Despite the similarities between skew-symmetric matrices and symmetric indefi- nite matrices, the zero diagonal elements and the inherent 2-by-2 block structure of factorizations for skew-symmetric matrices provide unique computational chal- lenges. The main contribution of this thesis is the development of a new incomplete LDLT factorization, which presents savings of half of the computational work and storage, compared to a standard ILU factorization. We also offer a discussion of variants of Gaussian elimination techniques. We note that the recent MSc thesis [8] also deals with skew-symmetric linear systems. The current work goes beyond it in terms of the pivoting strategies and the dropping rules that are applied. 2 Chapter 2 Factorizations of skew-symmetric matrices We want a fast iterative solver for sparse skew-symmetric systems, so it is natural to find an incomplete factorization. The general-purpose incomplete LU factorization does not serve us very well because it does not make use of the skew-symmetric pattern of the matrix. We first explore the LDLT factorization for skew-symmetric matrices and later derive an incomplete LDLT factorization from it. We first review the two basic variants of Gaussian elimination and the Crout version of it since the order of elimination is important when we turn the com- plete factorization into an incomplete factorization. The Bunch LDLT decompo- sition is then discussed as it is the factorization we are modifying to get an in- complete LDLT factorization for a skew-symmetric matrix. Li and Saad proposed an incomplete LDLT factorization for symmetric matrices in [9]. We incorporate some dropping rules with Bunch LDLT factorization to get an incomplete LDLT for skew-symmetric matrices. 2.1 Gaussian elimination variants Gaussian elimination is usually presented in its KIJ form (Algorithm 2.1). The name is due to the ordering of the loop indices. At step k, the Ak+1:n,k+1:n submatrix is updated by subtracting a rank-1 matrix, which is the outer product of one column 3 vector and one row vector. This is natural for a dense matrix, but as noted in [7] it is not suitable when we are working with matrices stored in sparse mode. It has the disadvantage that we need to access the whole reduced matrix at every step. Algorithm 2.1 KIJ Gaussian elimination for general matrices 1: for k = 1 to n−1 do 2: for i= k+1 to n do 3: aik = aik/akk 4: for j = k+1 to n do 5: ai j = ai j−aikak j 6: end for 7: end for 8: end for Therefore, the IKJ variant of Gaussian elimination (Algorithm 2.2) may be preferred over the KIJ variant in this case. It is more efficient since it accesses only row i at step i. This is referred to as a ”delayed-update” variant because all Schur complement updates for an element ai j are delayed to step i [10]. The KIJ and IKJ variants produce the same LU factorization for a matrix in exact arithmetic [11]. There is also a JKI variant, which is similar to the IKJ variant but looks at one column instead of one row at each step. Whether IKJ or JKI is more suitable for a sparse matrix depends on how the matrix is stored. The IKJ variant is better when the matrix is stored in compressed row form, while the JKI variant is better for compressed column form. Algorithm 2.2 IKJ Gaussian elimination for general matrices 1: for i= 2 to n do 2: for k = 1 to i−1 do 3: aik = aik/akk 4: for j = k+1 to n do 5: ai j = ai j−aikak j 6: end for 7: end for 8: end for The Crout variant of the factorization (Algorithm 2.4) is even more suitable than the IKJ variant. It accesses only the kth row and kth column at step k and thus 4 Algorithm 2.3 JKI Gaussian elimination for general matrices 1: for j = 1 to n do 2: for k = 1 to j−1 do 3: for i= k+1 to n do 4: ai j = ai j−aikak j 5: end for 6: end for 7: for i= j+1 to n do 8: ai j = ai j/a j j 9: end for 10: end for does not touch the reduced matrix Ak+1:n,k+1:n. The incomplete LU factorization, based on the Crout variant, is usually more robust than the incomplete LU factor- ization, based on the ILUT method [10]. This is also the method we are interested in and which we incorporate into the Bunch LDLT factorization. Algorithm 2.4 Crout LU factorization for general matrices 1: for k = 1 : n do 2: for i= 1 : k−1 and if aki 6= 0 do 3: ak,k:n = ak,k:n−akiai,k:n 4: end for 5: for i= 1 : k−1 and if aik 6= 0 do 6: ak+1:n,k = ak+1:n,k−aikak+1:n,i 7: end for 8: for i= k+1 : n do 9: aik = aik/akk 10: end for 11: end for 2.2 LDLT decomposition for symmetric matrices Before looking at skew-symmetric matrices, it is useful to first look at some factor- izations of symmetric matrices. A symmetric matrix is indefinite if it has both pos- itive and negative eigenvalues. It arises in many applications. How to solve Ax= b efficiently when A is symmetric indefinite? The Cholesky decomposition does not 5 work because A is not SPD. Gaussian elimination with partial pivoting works for this, but it destroys the symmetry of the matrix. A factorization A = LDLT is not always possible, where L is unit lower triangular and D is diagonal, even if sym- metric pivoting is used [7]. Moreover, if this factorization exists, it may not be stable. For example, the nearly perfectly conditioned matrix A≡ PAPT = ( ε 1 1 ε ) = ( 1 0 1/ε 1 )( ε 0 0 ε−1/ε )( 1 1/ε 0 1 ) has large element growth when 0 < ε 1. However, three are two different types of factorization that are useful for a symmetric indefinite matrix. The first one is the block LDLT factorization PAPT = LDLT where P is a permutation matrix, L is unit lower triangular, and D is block diagonal with diagonal blocks of dimension 1×1 or 2×2. The second one is the Aasen’s method, PAPT = LTLT where P is a permutation matrix, L is unit lower triangular with first column e1, and T is tridiagonal. Details and references for the above two factorizations are given in [7]. Aasen’s method is not contained in many program libraries. It seems that part of the reason is it is easier to work with a block diagonal matrix than to work with a tridiagonal matrix. We may need to further decompose a tridiagonal matrix into a block diago- nal one, so it may not be effective to have a tridiagonal matrix in some intermediate stage when the goal is to have a block diagonal matrix. We will not further pursue Aasen’s method here. We are interested in the block LDLT factorization. If the symmetric indefinite matrix A is not zero, there exists a permutation matrix P and an integer s= 1 or 2 such that PAPT = ( E CT C B ) , 6 where E is nonsingular. After finding the permutation matrix P and permuting A, we can factorize PAPT = ( Is 0 CE−1 In−s )( E 0 0 B−CE−1CT )( Is E−1CT 0 In−s ) . Continuing this process recursively on the (n− s)× (n− s) Schur complement Â= B−CE−1CT will give the factorization PAPT = LDLT . Algorithm 2.5 computes the LDLT factorization of a symmetric matrix A. Algorithm 2.5 LDLT for symmetric matrices 1: while k < n do 2: find a pivot apq in Ak:n,k:n and permute A accordingly 3: s = size of E 4: set E = Ak:k+s−1,k:k+s−1, C = Ak+s:n,k:k+s−1, B= Ak+s:n,k+s:n 5: Ak+s:n,k:k+s−1 =CE−1 6: Ak+s:n,k:k+s−1 = B−CE−1CT 7: k = k+ s 8: end while This algorithm overwrites the lower triangular part of A to store L, and the diag- onal part to store D. Different pivoting strategies differ in choosing the permutation matrix P at each step. That is, in line 2 of Algorithm 2.5, we choose a different E and permute A accordingly. We do not need to store P explicitly, but instead we have a vector p that records the interchanges of rows and columns. In MATLAB notation, P can be generated by I(p, :). It suffices for each pivoting strategy to describe only line 2 of the algorithm. 2.2.1 Symmetric Partial pivoting In 1971, Bunch and Parlett [4] proposed a complete pivoting strategy for decom- posing a symmetric matrix into LDLT form. This strategy, although very stable, searches the entire reduced matrix at each step. This is not practical when we are working on large sparse matrices, so we will not consider it here. In 1977, Bunch 7 and Kaufman proposed a partial pivoting strategy. This strategy only searches at most two columns of the reduced matrix at each step, so it is efficient and effec- tive. Recall that the partial pivoting strategy for LU decomposition searches only one column at each step. This partial pivoting strategy, however, searches at most two columns at each step because we sometimes need a 2× 2 pivoting matrix. It suffices to describe the first step of this strategy. Algorithm 2.6 Bunch-Kaufman LDLT using partial pivoting strategy 1: α = (1+ √ 17)/8 (≈ 0.64) 2: ω1 = maximum magnitude of any subdiagonal entry in column 1 3: if ω1 = 0 there is nothing to do at this stage 4: if |a11| ≥ αω1 then 5: Use a11 as a 1×1 pivot (s= 1) (case 1) 6: else 7: r= row index of first (subdiagonal) entry of maximum magnitude in column 1 8: ωr = maximum magnitude of any off-diagonal entry in column r 9: if |a11|ωr ≥ αω21 then 10: Use a11 as a 1×1 pivot (s= 1) (case 2) 11: else if |arr| ≥ αωr then 12: Use arr as a 1×1 pivot (s= 1, swap rows and columns 1 and r) (case 3) 13: else 14: Use ( a11 ar1 ar1 arr ) as a 2×2 pivot (s= 2, swap rows and columns 2 and r) (case 4) 15: end if 16: end if It’s important to look at some details of this pivoting strategy. Line 3 of Algo- rithm 2.6 will detect whether the matrix is singular. Even if the original matrix is singular, the algorithm can continue without any problems. In that case, the diag- onal matrix D will have at least one block of 0, and the lower triangular matrix L will have at least one column being standard unit vector ei. The mysterious constant α = (1+ √ 17)/8 controls the growth factor. It is the same as the growth factor in complete pivoting strategy, but the elements of the multipliers CE−1 are not bounded at all if we run into case 2 or case 4. The partial pivoting strategy can be shown to have satisfactory backward stability. In 2000, 8 Higham proved the stability of this partial pivoting strategy in [7]. The crucial part of this partial pivoting strategy is it has four cases. At each step, it determines which one of these four cases applies. It has several logical tests to do this. A brief review of these four cases is given below. • Case 1: This case applies when the diagonal entry a11 is larger than the largest entry ω1 times the constant α , both in absolute values. It guarantees the multipliers in L are smaller than 1/α in the first column. • Case 2: If case 1 does not apply, then the partial pivoting strategy will search another column, indexed by the row index of first (subdiagonal) entry of maximum magnitude in column 1. The product |a11ωr| is compared with αω21 . If the former is larger, then it is in case 2. Note that in this case, the ratio of ω1 to a11 can be anything larger than 1/α , which means the multipliers in the first column of L are unbounded. • Case 3: If cases 1 and 2 do not apply, and |arr| is larger than αωr, then it is defined as case 3. The analysis is similar to that in case 1. Look at how similar these two comparisons are: |a11| ≥ αω1 (case 1) VS. |arr| ≥ αωr (case 3). This case, unlike the first two, requires swapping of rows and columns 1 and r. Thus, this case also guarantees that the multipliers in L are smaller than 1/α in the first column. • Case 4: This case applies if none of the above three cases hold. This time, not only swapping of rows and columns 2 and r is required, but also a 2×2 matrix is used as the pivot. Similar to case 2, the ratio of ω1 to a11 can be anything larger than 1/α . The multipliers in L here are not as simple as a division of two numbers because now the pivot is a 2×2 matrix. Similar to case 2, the multipliers are unbounded. In sum, cases 1 and 2 use a11 as the pivot, case 3 uses arr, and case 4 uses [ a11 ar1ar1 arr ]. Cases 2, 3, and 4 need to search two columns while case 1 only needs to search one column in each loop iteration. The multipliers in L are bounded by a constant in both cases 1 and 3, but are unbounded in cases 2 and 4. Even though sometimes the multipliers in L are unbounded, the growth factor for D is bounded, which guarantees stability of the factorization. 9 2.2.2 Symmetric Rook Pivoting Although the partial pivoting strategy is backward stable, the possibly large ele- ments in the unit lower triangular matrix L may cause some trouble. We might want a pivoting strategy that has both the advantages of complete pivoting strat- egy and partial pivoting strategy. Similar to the LU decomposition, there is a rook pivoting strategy that is a trade-off between complete pivoting strategy and partial pivoting strategy. It searches the matrix in spiral order to find the pivoting matrix E. Algorithm 2.7 LDLT using rook pivoting strategy 1: α = (1+ √ 17)/8 (≈ 0.64) 2: ω1 = maximum magnitude of any subdiagonal entry in column 1 3: if ω1 = 0 there is nothing to do on this stage 4: if |a11| ≥ αω1 then 5: Use a11 as a 1×1 pivot (s= 1) 6: else 7: i= 1 8: repeat 9: r = row index of first (subdiagonal) entry of maximum magnitude in col- umn i 10: ωr = maximum magnitude of any off-diagonal entry in column r 11: if |arr| ≥ αωr then 12: Use arr as a 1×1 pivot (s= 1, swap rows and columns 1 and r) 13: else if ωi = ωr then 14: Use ( aii ari ari arr ) as a 2×2 pivot (s= 2, swap rows and columns 1 and i, and 2 and r) 15: else 16: i= r,ωi = ωr 17: end if 18: until a pivot is chosen 19: end if Algorithm 2.7 searches the elements of the matrix in spiral order until it finds an element that is largest in absolute value in both its row and its column, or ter- minates if it finds a relatively large diagonal element. Here, a diagonal element being relatively large means it is larger than α times the largest element in its col- 10 umn. Comparing the rook pivoting strategy with the partial pivoting strategy, we see that case 2 of the partial pivoting strategy is omitted in the rook pivoting strat- egy, and case 4 of the partial pivoting strategy is modified a little bit to ensure that after swapping, a12 and a21 are the largest element in column 1 and 2. With this property, the elements of L are bounded, similar to the complete pivoting strategy. 2.3 LDLT decomposition for skew-symmetric matrices After looking at symmetric matrices, we consider skew-symmetric ones. When the size of a skew-symmetric matrix is odd, it is singular. We do not want to work with singular matrices, so we concentrate on only nonsingular skew-symmetric matrices with even dimensions. Bunch proposed an algorithm to decompose a skew-symmetric matrix into LDLT form using partial pivoting strategy [3]. Here we describe a simplified version (Algorithm 2.8) of it, only for skew-symmetric matrices with even size. A similar analysis for skew-symmetric matrices as that for symmetric matrices is PAPT = ( E −CT C B ) , where E is nonsingular. After finding the permutation matrix P and permuting A, we can factorize PAPT = ( Is 0 CE−1 In−s )( E 0 0 B+CE−1CT )( Is −E−1CT 0 In−s ) . Continuing this process recursively on the (n− s)× (n− s) Schur complement Â= B+CE−1CT will give the factorization PAPT = LDLT for skew-symmetric matrices. Note that the factors above are almost the same as those in symmetric ma- trices. Since we are working on skew-symmetric matrices with an even size and assume nonsingularity, the dimension s of the pivoting matrices is always 2. B is skew-symmetric, and CE−1CT is also skew-symmetric, so the Schur complement Â = B+CE−1CT is skew-symmetric too, which allows the process to continue 11 recursively. Algorithm 2.8 LDLT for skew-symmetric matrices (KIJ order) 1: for k = 1,3, . . . ,n−1 do 2: find a pivot apq in Ak:n,k:n and permute A accordingly 3: set E = Ak:k+1,k:k+1, C = Ak+2:n,k:k+1, B= Ak+2:n,k+2:n 4: Ak+2:n,k:k+1 =CE−1 5: Ak+2:n,k+2:n = B+CE−1CT 6: end for Algorithm 2.8 is in KIJ order, and it is similar to the algorithm for decomposing a symmetric matrix (Algorithm 2.5). There are only two differences between these two algorithms. The first difference is that two columns are updated in each loop iteration in Algorithm 2.8, while one or two columns are updated in Algorithm 2.5, depending on whether the pivoting matrix is 1× 1 or 2× 2. The second one is that Ak+2:n,k+2:n = B+CE−1CT is used in Algorithm 2.8, while Ak+2:n,k+2:n = B−CE−1CT is used in Algorithm 2.5. The signs are different because the upper right part of a skew-symmetric matrix is −CT , not CT . It suffices to describe only line 2 of the algorithm for each pivoting strategy. 2.3.1 Bunch’s partial pivoting strategy In line 2 of Algorithm 2.8, we find a pivot apq and permute it to the position of Ak+1,k, that is, a21 of the Schur complement submatrix of Ak:n,k:n. There are several pivoting strategies for how to find the pivot apq: complete pivoting, partial pivot- ing, and rook pivoting. Similar to the pivoting strategies for symmetric matrices, the complete pivoting strategy looks at every entry of the submatrix Ak:n,k:n to find a good pivot apq, and the rook pivoting strategy searches the pivot in a spiral or- der. However, the partial pivoting strategy is somewhat different from that used in decomposing symmetric matrices. For a symmetric matrix, the partial pivoting strategy considers at most two columns (rows) at each step. The first column (row) to be looked at is always the first column (row). If the entries of the first column satisfy some property (case 1), we do not need to look at another column. Other- wise, we need to look at another column indexed by the row index of the largest entry in absolute value in the first column. 12 In contrast, for a skew-symmetric matrix, Bunch’s partial pivoting strategy al- ways looks at the first two columns (rows) at each step to find the pivot apq and permute it to a21. Let apq be the entry with maximum magnitude in the first two columns in the current step. Bunch’s pivoting strategy can be described in two cases: • If apq is in the first column (q = 1), swap rows and columns 2 and p to put apq in a21. • If apq is in the second column (q= 2), first swap rows and columns 1 and 2 to put apq in ap1, and then swap rows and columns 2 and p to put ap1 in a21. Since two columns are examined each time, we always advance the loop index by 2 after each iteration, as in line 1 of Algorithm 2.8. Recall that in partial pivoting strategy for symmetric matrices, only case 4 uses a 2×2 matrix as the pivoting matrix, whereas for nonsingular skew-symmetric ma- trices, 2×2 matrices are always used. Therefore, we can think of this factorization as always being in case 4 conceptually. However, using a 2× 2 skew-symmetric matrix as the pivot just means we use the two entries a21 and a12 as the pivots to zero out entries in the first two columns (rows) because a11 = a22 = 0. Bunch gave the following analysis of the stability of his decomposition in [3]. The entries in L are CE−1 in the current step. The row i of L is [ai1,ai2] [ 0 1/a21 −1/a21 0 ] = [−ai2/a21,ai1/a21] and the elements of Â= B+CE−1CT are of the form âi j = ai j− (ai2/a21)a j1+(ai1/a21)a j2 That is, if |a21| ≥ min{|ai2|, |a j1|}, and |a21| ≥ min{|ai1|, |a j2|}, then |âi j| ≤ 3×max r,s |ars|. Bunch’s pivoting strategy ensures that a21 is larger than every entry in the first column in absolute value, so the above condition is always true. This guarantees that the element growth factor, the largest element in all the reduced matrices divided by max r,s |ars|, is bounded by √ 3 (n−2) , or 1.7321(n−2) [3]. This 13 growth factor is satisfactory, compared to the factor 2(n−1) in the classical example for GE with partial pivoting. There is one subtlety here. If the maximum entry apq in absolute value appears in the second column (q=2), Bunch performs two swappings of rows and columns to put it into the position of a21. The first swap puts the entry to ap1, and the second swap puts it to a21. An alternative would be just swapping the entry ap2 to a12 by swapping rows and columns of 1 and p. Because of the skew-symmetric of the matrix, now a21 is the negation of apq. To illustrate this, consider a small example: A= 0 −5 −1 −2 −3 −4 5 0 −6 −7 −8 −9 1 6 0 0 0 0 2 7 0 0 0 0 3 8 0 0 0 −10 4 9 0 0 10 0 . The maximum entry in the first two columns of A is a62 = 9. Bunch first swaps the first and second row and column to put the 9 at a61. The new matrix is: A1 = 0 5 −6 −7 −8 −9 −5 0 −1 −2 −3 −4 6 1 0 0 0 0 7 2 0 0 0 0 8 3 0 0 0 −10 9 4 0 0 10 0 . Then the sixth row and column are swapped with the second row and column. The resulting matrix is: A2 = 0 −9 −6 −7 −8 5 9 0 0 0 10 4 6 0 0 0 0 1 7 0 0 0 0 2 8 −10 0 0 0 3 −5 −4 −1 −2 −3 0 . 14 Now, the maximum entry 9 of the first two columns in the original matrix A has been put into the position of a21. Observe that in the first two columns (or rows) of A2, −10 (or 10) is the largest entry in absolute value, not a21 = 9. However, this still guarantees that the element growth factor is bounded by √ 3 (n−2) because the hypothesis of Bunch’s analysis only requires a21 to be larger than every entry in the first column in absolute value. The fact that the magnitude of a21 is not necessarily the largest in the first two columns does affect the entries of L. Remember that the entries of L in the first two columns are CE−1. E is a 2× 2 skew-symmetric matrix. Multiplying E only involves swapping and scaling of the two columns of C. Because every entry in the first column of C is smaller than E21 in magnitude, and some entries in the second column of C may be larger than E21 in magnitude, the entries of L=CE−1 in the second column are bounded by 1 in magnitude, and the entries of L in the first column are unbounded. Unboundedness just means they may be greater than 1 in magnitude, but in practice very few entries are. After the algorithm finishes, all entries of L in odd-numbered columns are bounded by 1 in magnitude, and entries of L in even-numbered columns are unbounded. Notice that only three columns (and rows) are touched when we perform pivoting. In this example, only the first, second, and sixth columns (and rows) are touched, while the block A3:5,3:5 is not touched at all. If a21 happens to be the largest entry in magnitude in the first two columns (and rows), we do not need to pivot and only two columns are touched in each step. Relatively few columns (2 or 3) are looked at in each step; this is a key part of the partial pivoting strategy. 2.3.2 Modified Bunch’s partial pivoting strategy In the previous section, we mentioned that Bunch’s partial pivoting strategy per- forms two swappings of rows and columns if the maximum entry apq in absolute value appears in the second column (q= 2). However, two swappings are not nec- essary. In this section we describe a slightly modified version of Bunch’s partial pivoting strategy that can save some work. We call it the modified Bunch partial pivoting strategy: • If apq is in the first column (q = 1), swap rows and columns 2 and p to put apq in a21. 15 • If apq is in the second column (q = 2), swap rows and columns 1 and p to put apq in a12. In the first case, we do the same swapping as Bunch does, but in the second case we only do one swapping of rows and columns to put the largest entry in the first two columns in a12. We still use A as the example to illustrate. a62 = 9 is largest in magnitude in the first two columns. Bunch’s pivoting strategy first swaps it to a61 and then to a21. Instead of doing this in two steps, we can just swap the first and sixth row and column of A to put a62 = 9 into the position of a12 and a26 =−9 into the position of a21 to get A3 = 0 9 0 0 10 4 −9 0 −6 −7 −8 5 0 6 0 0 0 1 0 7 0 0 0 2 −10 8 0 0 0 3 −4 −5 −1 −2 −3 0 Both A2 and A3 have the desired property that a21 and a12 are relatively large. Observe that in A2, the entry−10 (or 10), which causes the trouble of not bounding entries of L by 1, is in the second column (or row), but in A3, −10 (or 10) is in the first column (or row). Note that this modified version of the partial pivoting strategy also considers at most three columns in each step. We wish to ensure that the growth factor is bounded by √ 3 (n−2) in this mod- ified Bunch’s partial pivoting strategy. When we deduce the bound, the hypoth- esis is after swapping rows and columns, |a21| must be greater than or equal to min{|ai2|, |a j1|}, and |a21| must be greater than or equal to min{|ai1|, |a j2|}. When using Bunch’s pivoting strategy, a21 is larger than every entry in the first column in magnitude in both cases. This makes the hypothesis true. In comparison, when we use modified Bunch’s partial pivoting strategy, a21 is larger than either every entry in the first column (first case) or every entry in the second column (second case) in magnitude. This also makes the hypothesis true! To see this, notice that ai1 and a j1 are in the first column, and a j2 and ai2 are in the second column. Thus, either |a21| ≥ |ai1| and |a21| ≥ |a j1|, 16 or |a21| ≥ |ai2| and |a21| ≥ |a j2|. is true. The L factors generated by the modified Bunch partial pivoting strategy have a special property that for every two columns of k and k+1 (k is odd), the entries of L in one column are bounded by 1 in magnitude, but the entries in another column are unbounded. In total, we know that half entries of L are bounded by 1, but the other half are unbounded. However, similar to the case in Bunch’s pivoting strategy, in practice for those entries that are unbounded in theory, very few entries are larger than 1 in magnitude. Therefore, most entries in L are less than 1 in magnitude. In practice, these two versions give similar results. The error estimates ||PAPT− LDLT ||/||PAPT ||, generated by our codes in MATLAB, of both factorizations are very close, so we will adopt the modified Bunch version as it is simpler in the second case when the largest entry in magnitude is in the second column. For sim- plicity, when we say partial pivoting strategy for skew-symmetric matrices in the rest of this thesis, we refer to the modified Bunch partial pivoting strategy in this section, not the original Bunch partial pivoting strategy in the previous section. Al- though the modified version of partial pivoting strategy saves some work, it does not help improve the stability of the factorization. The question remains: Can we improve the stability of the factorization? 2.3.3 Skew-Symmetric Rook pivoting Both Bunch’s pivoting strategy and modified Bunch’s pivoting strategy cannot bound the factors of L, although both of them have a relatively small element growth bound √ 3 for D. For complete factorizations, it does not matter too much, as L only occasionally has a few entries larger than 1 in magnitude. However, we want a good incomplete factorization for the matrix A. The entries of L are the entries to be dropped when doing incomplete factorization, not the entries of D because D stores the 2× 2 pivoting matrix in every loop iteration and is more important than L in some sense. In the LDLT factorization for symmetric matrix, the rook pivoting strategy can guarantee the entries of L smaller than or equal to 1 in magnitude. The LDLT 17 factorization using rook pivoting strategy for skew-symmetric matrices also has this property. To describe the LDLT factorization with rook pivoting strategy for skew- symmetric matrices, it suffices to describe only line 2 of Algorithm 2.8. Algorithm 2.9 LDLT using rook pivoting strategy 1: ω1 = maximum magnitude of any subdiagonal entry in column 1 2: i= 1 3: repeat 4: r= row index of first (subdiagonal) entry of maximum magnitude in column i 5: ωr = maximum magnitude of any off-diagonal entry in column r 6: if ωi = ωr then 7: Use ( 0 −ari ari 0 ) as a 2×2 pivot (swap rows and columns 1 and i, and 2 and r) 8: else 9: i= r,ωi = ωr 10: end if 11: until a pivot is chosen Algorithm 2.9 is again similar to Algorithm 2.7, but even simpler. It does not have any predefined constant. It does not have a 1× 1 pivoting matrix, either. Only a 2× 2 skew-symmetric matrix can be used as the pivoting matrix. The algorithm just looks for an entry that is largest in both its column and its row in spiral order. After it finds this entry apq, which is largest in column (and row) p and q in magnitude, it first permutes it to ap1 and then permutes it to a21. Since a21 is in the first column and the second row of the reduced matrix, it is the largest entry in absolute value in the first two columns (and rows) in magnitude. Recall that in the LDLT factorization with partial pivoting strategy, in most cases only three columns of the reduced matrix Â are touched in each step. One possible drawback of the rook pivoting strategy is it may need to search many columns until a good pivot is found, but as noted in [6], the number of steps is usually small. The element growth factor is bounded by √ 3 (n−2) , same as that in the partial pivoting strategy, but the entries in L are all bounded by 1. Therefore, the rook pivoting strategy is more stable than the partial pivoting strategy. 18 2.3.4 Crout factorization for skew-symmetric matrices We have seen that for Gaussian elimination, there exists a Crout version. For skew- symmetric matrices, it is also possible to decompose A into LDLT form in Crout order. We can adapt Algorithm 2.8 to have a new one so that it examines the entries of the matrix in Crout order. Algorithm 2.10 Skew-symmetric Crout factorization, LDLTC 1: for k = 1,3, . . . ,n−1 do 2: find a pivot apq in Ak:n,k:n and permute A accordingly 3: for i= 1,3, . . . ,k−2 and ‖Ak:k+1,i:i+1‖ 6= 0 do 4: Ak:n,k:k+1 = Ak:n,k:k+1−Ak:n,i:i+1Ai:i+1,i:i+1ATk:k+1,i:i+1 5: end for 6: Ak+2:n,k:k+1 = Ak+2:n,k:k+1A−1k:k+1,k:k+1 7: end for The factorization in Crout order only touches two columns in each loop itera- tion. There is no need to look at the rows of the matrix due to skew-symmetry of it. It looks like the JKI variant because both of them load and update the matrix col- umn by column. However, the columns which the JKI variant work on are usually of fixed size, while the Crout variant works at columns with shrinking sizes. Thus this factorization is called Crout order, not JKI order. Different pivoting strategies (partial pivoting and rook pivoting) only differ in line 2 of Algorithm 2.10. The order of decomposition is not necessarily related to the pivoting strategy. We can have any combination of order of decomposition and pivoting strategy. In line 6 of Algorithm 2.10, we do not need to fine the inverse of Ak:k+1,k:k+1 since Ak+2:n,k:k+1A−1k:k+1,k:k+1 = Ak+2:n,k:k+1 [ 0 −ak+1,k ak+1,k 0 ]−1 = 1 ak+1,k Ak+2:n,k:k+1 [ 0 1 −1 0 ] , which can be trivially computed by swapping of columns of k and k+1 and scaling. To keep with using only half of the matrix, line 4 can also be computed element- 19 wise so that only ai+1,i is required from Ai:i+1,i:i+1. There is the issue of overwriting A with the multipliers in L in line 4, but we can recover those entries of A with the multipliers and the diagonal factors. If the full matrix were being operated on, the line would read Ak:n,k:k+1 = Ak:n,k:k+1− Ak:n,i:i+1Ai:i+1,k:k+1 [8]. Because Ai:i+1,k:k+1 is in the upper half of A and we only work on the lower half of A, we can use −ATk:k+1,i:i+1 instead of Ai:i+1,k:k+1. We give an example here to show that when the factorization is in Crout order, it is more involved than that in the usual KIJ order, especially when the rook pivoting strategy is used. Consider a very simple skew-symmetric matrix A= 0 −1 0 0 1 0 1 0 1 1 0 1 0 −1 0 −1 −1 −1 0 −1 1 0 1 0 −1 0 1 −1 0 −4 0 −1 1 0 4 0 . After one step of the factorization in KIJ order (Algorithm 2.8), the matrix A is updated to A1 = 0 −1 0 0 1 0 1 0 1 1 0 1 0 −1 0 −1 −2 −1 0 −1 1 0 0 0 −1 0 2 0 0 −3 0 −1 1 0 3 0 . Note that the factors of L should be stored in the lower triangular part of A. However, for the sake of simplicity here, we keep the first two columns of A (and A1) unchanged and assume we have a separate matrix L to store the multipliers. In fact, in this simple example, there is only one nonzero entry in the first column (row) of A. The first loop operation only updates the fifth column (and row) of A. It is as simple as adding the second row of A to the fifth row of A, followed by 20 adding the second column of A to the fifth column of A because adding a21 = 1 to a51 =−1 zeros out a51. Look at the reduced matrix A1(3:6,3:6) of dimension 4×4. If the partial pivoting strategy is used, 2 (and -2) will be selected as the pivot and we need to swap rows and columns of 2 and 3 to put the entry 2 at a21 of the reduced matrix. If the rook pivoting strategy is used, it first searches the first column and finds 2, then searches the third column and finds 3, then searches the fourth column and finds out that -3 (and 3) is indeed the maximum entry in magnitude in both its row and column. 3 (and 3) is permuted to the position of a21 as the pivot. The above discussion is based on the factorization in KIJ order, and all the searches are performed on the reduced matrix A1(3:6,3:6). If the factorization is done in Crout order, after one loop iteration, we do not have A1 yet because the updates are delayed. We only have A3:6,3:6 to work on. This makes the search for a pivot harder, but as explained below, it is still possible and involves no more than a few column update operations. If the partial pivoting strategy is used, we need to load the first and second columns of A3:6,3:6 into two temporary column vectors w1 and w2, and then update them. That is, w1 = [0,1,1,1]T and w2 = [−1,0,−1,0]T . They are updated to w1 = [0,1,2,1]T and w2 = [−1,0,0,0]T . We find that 2 is the maximum entry in these two columns and we need to swap rows and columns of 2 and 3. Another temporary column vector w is used to store the third column. w is loaded with [−1,1,0,4]T and then updated to [−2,0,0,3]T . Both w1 and w are the first two columns in the reduced matrix after swapping, while w2 is discarded because column 2 is swapped out. If the rook pivoting strategy is used, we need to load the first column into a temporary column vector w1 and then update it. That is, w1 = [0,1,1,1]T is updated to w1 = [0,1,2,1]T . The row index of the largest element 2 in w1 is 3, so the third column is loaded into another vector and then updated. That is, w2 is loaded with [−1,1,0,4]T and then updated to [−2,0,0,3]T . Now the row index of the largest element 4 in w2 is 4, so the fourth column is loaded and updated. Before this is done, w1, which stores the old column vector, is set to be w2. w2 can be used again for storing a new column. In the next search, w2 is loaded with the fourth column [−1,0,−4,0]T and then updated to [−1,0,−3,0]T . Finally, the pivot 3 (and −3) is 21 found and we need to swap rows and columns of 1 and 3, followed by rows and columns of 2 and 4, to put 3 at the position of a21. The two temporary column vectors w1 and w2 are the first two columns in the reduced matrix after swapping. From the explanation above, we see that when we search for a pivot in the factorization process in Crout order, we have actually updated the first two columns after swapping. There is no need to do line 4 again since the update is done in line 2. In most cases, the partial pivoting strategy needs to visit three columns in each loop iteration. In the above example, the partial pivoting strategy looks at column 1, 2, and 3 of the reduced matrix A3:6,3:6 when it searches for a pivot. In contrast, the rook pivoting strategy looks at column 1, 3, and 4 of the reduced matrix A3:6,3:6. The number of steps needed to find a good pivot for the rook pivoting strategy is small in this example, as is usually the case. If the skew-symmetric matrix is block diagonally dominant, then there is no need to pivot and both the partial pivoting strategy and the rook pivoting strategy only visit two columns in each loop iteration. 22 Chapter 3 Incomplete factorizations of skew-symmetric matrices We now turn to incomplete factorizations in this chapter. When we want to solve large sparse linear systems, incomplete factorizations are a good source of precon- ditioners [1]. Our goal is to find an effective matrix as preconditioner for solving a large sparse skew-symmetric linear system. We first look at the standard incom- plete factorization for a general matrix using Gaussian Elimination, that is, ILUT. Later we will generalize this incomplete factorization and incorporate pivoting to derive the incomplete factorizations of LDLT for skew-symmetric matrices. 3.1 Incomplete LU factorizations Incomplete factorizations can be generated in many ways, depending on the matri- ces we are working on. We start with the basic ones, incomplete LU factorization for a general matrix. The simplest form of incomplete LU factorization is ILU(0). In ILU(0), the sparsity pattern of L+U matches the sparsity pattern of the original matrix A. Although this version of ILU is extremely simple and introduces no fill- in, it is not very robust as a preconditioner because it drops a lot of entries during the factorization process. We will thus focus on a somewhat more complicated but accurate version of ILU, the incomplete LU factorization with threshold (ILUT). 23 3.1.1 IKJ variant The following algorithm 3.1 [11] is adapted from algorithm 2.2, since the IKJ variant is more suitable for incomplete factorizations than the KIJ variant for a general matrix. Algorithm 3.1 ILUT (IKJ order) 1: for i= 2 to n do 2: w= ai∗ 3: for k = 1 to i−1 and when wk 6= 0 do 4: wk = wk/ukk 5: Apply a dropping rule to wk 6: if wk 6= 0 then 7: w= w−wk ∗uk∗ 8: end if 9: end for 10: Apply a dropping rule to row w 11: for j = 1 to i−1 do 12: li j = w j 13: end for 14: for j = i to n do 15: ui j = w j 16: end for 17: end for A brief review is given in [11, §10.4] and reproduced here. The algorithm 3.1 takes two arguments p and τ . In the factorization process of ILUT(p, τ), the following dropping rules are used: 1. In line 5, the entry wk is dropped if it is smaller than the relative tolerance τi, obtained by multiplying the tolerance τ with the norm of the ith row. 2. In line 10, the whole working row w is examined. Again entries that are smaller than the relative tolerance τi are dropped, and then only the p largest entries in magnitude in the L part and U part are kept, respectively. The first dropping rule drops small entries, while the second dropping rule controls the memory usage. These two dropping rules together constitute the key 24 part of ILUT. They are incorporated into the IKJ variant of GE. The key idea of these two dropping rules is just to keep the p largest entries that are no smaller than the relative tolerance τi in each row of L and U , respectively. Note that Algorithm 3.1 may break down if it encounters a zero pivot. To make it more stable, we incorporate pivoting into it, but we will not discuss pivoting for this form of ILUT for two reasons: 1. This variant of ILUT is in IKJ order. We visit the matrix A and generate both factors L and U row by row. Row pivoting (swapping rows) is impossible here because at step i, every entry in column i below aii has not been touched at all. We cannot select a correct candidate entry to pivot when encounter- ing a zero pivot. Instead, we are free to do a column pivoting. Recall that premultiplying a permutation matrix swaps the rows of a matrix, while post- multiplying a permutation matrix swaps the columns of a matrix. We will have a factorization AP= LU instead of PA= LU , as we normally get when we factor the matrix in KIJ order. Although this is completely fine, we will not explore it further in this thesis. 2. MATLAB stores matrices in Compressed Sparse Column format. MATLAB’s build-in luinc command thus visits the matrix column by column and also computes both the L andU factors column by column. We are going to com- pare the performance of preconditioners generated by our incomplete LDLT factorization and ILU factorization by MATLAB’s luinc routine. However, MATLAB’s luinc routine only accepts one argument, the drop tolerance τ . It does not have the second dropping rule to control memory usage. We need to implement our own ILUT factorization, but in JKI order not in IKJ order. 3.1.2 JKI variant As explained in the previous section, we will explore the ILUT factorization in JKI order in detail. First look back to Algorithms 2.2 and 2.3 in the previous chapter. Note that there are several differences between these two algorithms besides the difference of their orders of looping indices. Inside the nested loop of running index k of Algorithm 2.2, one multiplier is computed, followed by a row update. 25 It is clear here that when this algorithm is turned into an incomplete factorization, the first dropping rule should be incorporated between line 3 and line 4. We decide whether to drop a multiplier before it is used to do a row update operation. In contrast, inside the nested loop of running index k of algorithm 2.3, there is only one column update operation. After j−1 column update operations, we compute a column of multipliers at the same time. At first glance, it is not very clear that where the first dropping rule should be incorporated into. Look at line 4 carefully: It’s ai j = ai j−aikak j, same as line 5 of algorithm 2.2. However, the running indices of the loops where these lines are inside are different. When doing a column update operation, the product of a column of multipliers and one entry of the U factor ak j is subtracted from the working column. It seems that the entry ak j plays a key role here. It is natural to guess ak j in Algorithm 2.3 has the same functionality as aik has in Algorithm 2.2. We may add the first dropping rule between line 2 and line 3 of Algorithm 2.3. That is, drop the element ak j ofU if it is smaller than the relative tolerance, obtained by multiplying the tolerance τ with the norm of the jth column. Somewhat surprisingly, MATLAB’s luinc routine does not do it this way. It performs column update operation no matter how small the entry ak j is. The for loop in lines 3 to 5 is always executed. The first dropping rule is added between line 5 and line 6 of 2.3. Doing it this way makes the incomplete factorization more accurate at the cost of more preprocessing time. Another issue here is how we drop the elements of L. Unlike Algorithm 2.2, where a row of multipliers are not computed in consecutive operations, the for loop from line 7 to line 9 in Algorithm 2.3 computes a column of multipliers in consecutive operations. When we apply the first dropping rule to these elements of L, do we compare each individual element with the relative tolerance, obtained by multiplying the tolerance τ with the norm of the jth column of A, or only the norm of the jth column of L? We know that in the IKJ variant, when we compute the relative tolerance, we choose the norm of the ith row of A. We do not have another choice because the norm of the ith row of L is not available at that time. However, in the JKI variant, the whole column of L is computed at the same time, making it possible to compute the norm of the jth column of L at the time when we decide which elements of L should be dropped. It seems like a trivial question here. Why not just choose the norm of the jth column, as in Algorithm 3.1? 26 We raise this question here because later when we want to compute the in- complete LDLT factorization for a skew-symmetric matrix, we will have a similar situation. For a skew-symmetric matrix, we store only the lower triangular part of the matrix A. As the factorization goes, some elements of A have been overwrit- ten by elements of L. Although recovering one whole column of A is completely possible, it may not be worthwhile to do this just for the norm of one column of A. Instead, it is more natural to consider the norm of a column of L. Going back to the ILUT with JKI order for a general matrix, we want a justified reason to choose the norm of a column of L to calculate the relative tolerance. We have two things that we are uncertain about. The first is whether we can drop the entry ak j before the column update operation. The second is can we choose the norm of a column of L instead of a column of A to calculate the relative tolerance. We therefore implement a total of 2×2 variants of ILUT in JKI order. We compare the performance of these 4 implementations of ILUT. We first list all the 4 variants here. Algorithm 3.2 ILUT (JKI order, variant 1) 1: for j = 1 to n do 2: w= a∗ j 3: for k = 1 to j−1 do 4: if wk 6= 0 then 5: wk+1:n = wk+1:n−wk ∗ lk+1:n,k 6: end if 7: Apply a dropping rule to wk 8: end for 9: Apply a dropping rule to column w (based on the norm of w) 10: for i= 1 to j do 11: ui j = w j 12: end for 13: for i= j to n do 14: li j = wi/w j 15: end for 16: end for If the element ak j is small, the first 2 variants do the column update operations before dropping ak j, while the last 2 variants drop it first and do no column update 27 Algorithm 3.3 ILUT (JKI order, variant 2) 1: for j = 1 to n do 2: w= a∗ j 3: for k = 1 to j−1 do 4: if wk 6= 0 then 5: wk+1:n = wk+1:n−wk ∗ lk+1:n,k 6: end if 7: Apply a dropping rule to wk 8: end for 9: Apply a dropping rule to column w (U part based on the norm of w, L part based on the norm of w j+1:n) 10: for i= 1 to j do 11: ui j = w j 12: end for 13: for i= j to n do 14: li j = wi/w j 15: end for 16: end for Algorithm 3.4 ILUT (JKI order, variant 3) 1: for j = 1 to n do 2: w= a∗ j 3: for k = 1 to j−1 do 4: Apply a dropping rule to wk 5: if wk 6= 0 then 6: wk+1:n = wk+1:n−wk ∗ lk+1:n,k 7: end if 8: end for 9: Apply a dropping rule to column w (U part based on the norm of w, L part based on the norm of w j+1:n) 10: for i= 1 to j do 11: ui j = w j 12: end for 13: for i= j to n do 14: li j = wi/w j 15: end for 16: end for 28 Algorithm 3.5 ILUT (JKI order, variant 4) 1: for j = 1 to n do 2: w= a∗ j 3: for k = 1 to j−1 do 4: Apply a dropping rule to wk 5: if wk 6= 0 then 6: wk+1:n = wk+1:n−wk ∗ lk+1:n,k 7: end if 8: end for 9: Apply a dropping rule to column w (based on the norm of w) 10: for i= 1 to j do 11: ui j = w j 12: end for 13: for i= j to n do 14: li j = wi/w j 15: end for 16: end for operation. In line 9 of all 4 variants, we have finished all the j−1 column update operations. We again drop small elements of the working column w. For the el- ements in the L part of w, variants 2 and 3 compare each individual element with the norm of all elements in the L part of w, while variants 1 and 4 compare it with the norm of w. For the elements in the U part of w, all 4 variants compare each individual element with the norm of w. If we want to do pivoting, we can just add it between line 9 and line 10 to swap the largest element under a j j with a j j. If we set the second argument to be ∞ in variant 1, it is MATLAB’s luinc routine. Recall that luinc does not have the second argument, so variant 1 has more functionality when given the second argument. Here we include 2 typical examples of the performance comparison of these 4 variants. More comparisons are given in section 3.3. In Figure 3.1, the testing matrix is RAEFSKY1 [5]. We use 113, which is equal to 2.5 times nnz(A)/2n, half the number of nonzero entries per row, as the second argument. It turns out that this number is large enough for the first variant, so the second dropping rule is not applied in this case and the first variant gives the same result as MATLAB’s luinc routine. The red curve overlaps with the 29 Figure 3.1: Preconditioned GMRES for RAEFSKY1 cyan curve. As expected, the second variant converges faster than the other three variants because it performs column update operation before dropping the element ak j and it compares an individual L factor with the norm of a column of L, which is often smaller than the norm of a column of A and thus makes it hard to drop an element. However, the second variant has the largest number of nonzeros in the factor L+U . The nonzeros of the factors L+U of these four variants are 137533, 289383, 243065, 134365, respectively. If we use the number of nonzeros times the iteration number as a rough estimate, it seems that the first variant is the best, but the performance of the other three variants are not so far from the first variant, taking into account that the first two variants take more time to do the preprocessing 30 Figure 3.2: Preconditioned GMRES for RAEFSKY3 step. In Figure 3.2, the testing matrix is RAEFSKY3. We use ∞ as the second argu- ment because this matrix is ill-conditioned and we may need to keep a lot of entries in each column to have an accurate incomplete factorization. The red curve again overlaps with the cyan curve. The purpose of this example is to show that some- times only the most accurate incomplete factorization, variant 2, converges. The first variant does not converge within 600 iterations, although it has a trend to con- verge in the next 100 iterations. We see that those variants which drop element ak j before performing column update operations fail here, so it is hard to tell whether we can do it this way. That really depends on which matrix we are working on. 31 The above examples do not show any significant drawbacks of replacing the norm of a column of A with the norm of a column of L. Therefore, we will do this in the incomplete LDLT factorization for skew-symmetric matrices. We finally add pivoting into line 10 of each of these 4 variants and rename them as ILUTPC1, ILUTPC2, ILUTPC3, and ILUTPC4. P stands for pivoting, and C stands for column version (JKI order). Only ILUTPC1 is shown here. Algorithm 3.6 ILUTPC1 1: for j = 1 to n do 2: w= a∗ j 3: for k = 1 to j−1 do 4: if wk 6= 0 then 5: wk+1:n = wk+1:n−wk ∗ lk+1:n,k 6: end if 7: Apply a dropping rule to wk 8: end for 9: Apply a dropping rule to column w (based on the norm of w) 10: find an entry wp that is largest in absolute value in w j:n 11: swap rows j and p of A and swap w j and wp 12: for i= 1 to j do 13: ui j = w j 14: end for 15: for i= j to n do 16: li j = wi/w j 17: end for 18: end for 3.2 Incomplete LDLT of skew-symmetric matrices Now that we know how to incorporate the dropping rules into the factorization, we can incorporate these two dropping rules into the algorithm 2.10 to derive an incomplete factorization for skew-symmetric matrices. It is almost immediate to get Algorithm 3.7. The only change is we add line 6 to Algorithm 2.10. Recall that Ak+2:n,k:k+1 stores the entries of L after each loop iteration when the looping index is k. Those entries of L are calculated in line 7, so in line 6, they have not been calculated yet. 32 Algorithm 3.7 Incomplete skew-symmetric Crout factorization, ILDLTC 1: for k = 1,3, . . . ,n−1 do 2: find a pivot apq in Ak:n,k:n and permute A accordingly 3: for i= 1,3, . . . ,k−2 and ‖Ak:k+1,i:i+1‖ 6= 0 do 4: Ak:n,k:k+1 = Ak:n,k:k+1−Ak:n,i:i+1Ai:i+1,i:i+1ATk:k+1,i:i+1 5: end for 6: Apply dropping rules to Ak+2:n,k:k+1 7: Ak+2:n,k:k+1 = Ak+2:n,k:k+1A−1k:k+1,k:k+1 8: end for Before they are computed, we first apply dropping rules to these two columns to make them sparse. It can save some work and storage. If we put line 6 after line 7, that is, apply dropping rules after the entries of L are computed, we still get the same incomplete factorization. The reason is line 7 only involves swapping and scaling of the two columns of Ak+2:n,k:k+1 to compute the entries of L. It does not change the order of the entries in each column. Note that the row index goes from k+2 to n because Ak:k+1,k:k+1 is the D factors. Again, different pivoting strategies (partial pivoting and rook pivoting) only do differently in line 2, as explained in the previous chapter. In line 6, We need to drop entries of two columns. One approach is to partition Ak+2:n,k:k+1 into (n− k−1)/2 blocks of 2×2 matrices. We drop a 2×2 matrix if its 2-norm is small. This approach works, but it does not give a good result. The reason is small entries are usually grouped with large entries in one block, so they are not dropped even if they are extremely small (especially for those that are 0 in exact arithmetic but not in floating point system). Another approach is to consider these two columns independently and apply both dropping rules to each column independently. Small entries are decoupled with their neighbouring entries. This approach is better than the first approach, since it drops small entries most of the time. It is very useful that in Algorithm 3.7 both dropping rules 1 and 2 are together in line 6, unlike the ILUTPC, in which the two dropping rules are separate. The relevant part of the implementation in the MATLAB script is given below: 33 % first dropping rule if (tol > 0) relativetol1 = tol * norm(w1); relativetol2 = tol * norm(w2); for (i = 1:length(w1)) if (w1(i) ˜= 0 && abs(w1(i)) < relativetol1) w1(i) = 0; % first dropping end end for (i = 1:length(w2)) if (w2(i) ˜= 0 && abs(w2(i)) < relativetol2) w2(i) = 0; % first dropping end end end % second dropping rule if (pel < nnz(w1)) wtemp = abs(w1); [r index] = sort(wtemp); w1(index(1:length(wtemp)-pel)) = 0; end if (pel < nnz(w2)) wtemp = abs(w2); [r index] = sort(wtemp); w2(index(1:length(wtemp)-pel)) = 0; end w1 and w2 are two temporary column vectors that store the first two columns of the reduced matrix Ã. The two relative tolerance variables are calculated inde- pendently. The first dropping rule 1 is applied, immediately followed by the second dropping rule 2 that controls memory usage, but entries of the diagonal blocks are never dropped. Note that the sizes of both w1 and w2 are shrinking as the factor- 34 ization proceeds, and this makes entries in the right part of the matrix harder to drop. This is fine and there is no need to restore a full column of A. 3.3 Numerical experiments We first test the 4 variants of ILUTPC as preconditioners in MATLAB for 7 general matrices from the Davis collection [5]. These matrices were used as testing ma- trices in [10]. In the column labeled by itn, ”-” means GMRES does not converge within 600 iterations. In most cases, the order of rate of convergence is ILUTPC2 > ILUTPC3 > ILUTPC1 > ILUTPC4, which is the reverse order of their nonze- ros. Note that in general, the rate of convergence greatly depends on the condition number and the eigenvalue distribution. Those matrices with small condition num- bers typically converge fast, while matrices with large condition numbers often converge slowly or even do not converge. The fraction nnz2n denotes half the average number of nonzero entries in each column. The parameters are chosen this way: 1. If the drop tolerance τ is 0.01, then p, which controls memory usage, is selected to be 2.5× nnz2n . 2. If τ is 0.001, then p is 5× nnz2n . However, if a scheme does not converge within the specified number of itera- tion, we replace the second parameter p with ∞. It is more likely to converge after the replacement, at the cost of more memory usage. It seems that ILUTPC1 is the best, taking into account of both convergence rate and storage. Therefore, ILUTPC1 is used in the next set of tests for skew- symmetric matrices. The performance of ILDLT Crout (both partial pivoting and rook pivoting) is compared with ILUTPC1. ILDLTRP stands for incomplete LDLT factorization with rook pivoting, and ILDLTPP stands for incomplete LDLT factorization with modified Bunch’s partial pivoting. MATLAB’s luinc command is also used here for reference. luinc (τ) is same as ILUTPC1(τ,∞) The testing matrix A is the skew-symmetric part of the 2D convection-diffusion equation with constant coefficients, discretized using the centered finite difference on the unit square. The mesh Reynolds numbers are 0.8 and 0.2. The mesh is 35 100× 100, so the dimension of A is 10000× 10000. A skew-symmetric matrix arising this way is often ill-conditioned. We also explore better conditioned skew- symmetric matrices B and C by making the matrices block diagonally dominant. Define a block diagonal skew-symmetric matrix of the same size as A: J = 0 1 −1 0 0 1 −1 0 . . . . . . 0 1 −1 0 . J has a condition number of 1. It is a permutation of the matrix( I 0 0 −I ) . B and C are obtained by adding some multiples of J to A. Observe that for the matrix A, whose nnz2n = 1.98 is extremely small and con- dition number is not small, all preconditioners are very sensitive to the second parameter p. For A, p must be way larger than 5× nnz2n for the scheme to converge. If all the 4 preconditioners converge, ILDLTRP and ILDLTPP usually use only a little more than half the storage ILUTPC1 uses, but still converge slightly faster. We also include Figure 3.3 to show the converge plot when using parameter (τ = 0.01, p = 40), ILDLTRP outperforms ILDLTPP significantly. With limited memory usage, ILDLTRP is much more stable than ILDLTPP because every entry in L using rook pivoting strategy is guaranteed to be less than 1, while only half of the entries in L using partial pivoting strategy are guaranteed to be less than 1. For a well conditioned matrix B, we can safely set the second parameter p to be 2.5× nnz2n = 5 without any trouble. All the preconditioners converge in almost same number of iterations. ILDLTRP and ILDLTPP still save about half the com- putational work and storage. 36 For a nearly perfectly conditioned matrix C, with condition number of 2.5, all preconditioners perform very well. There are not many differences among those preconditioners. Matrix Size (n×n) nnz nnz2n condition number preconditioner parameters (τ, p) nnz(L+U) itn LNS3937 3937 25407 3.2 1.06×106 ILUTPC1 1e-3, inf 90961 48 ILUTPC2 137273 18 ILUTPC3 153843 80 ILUTPC4 88995 77 RAEFSKY1 3242 293409 45.4 3.72×104 ILUTPC1 1e-2, 113 137533 38 ILUTPC2 289383 21 ILUTPC3 243065 31 ILUTPC4 134365 40 RAEFSKY2 3242 293551 45.4 1.17×104 ILUTPC1 1e-2, 113 204038 58 ILUTPC2 305794 21 ILUTPC3 286570 39 ILUTPC4 201693 61 RAEFSKY3 21200 1488768 35.1 2.69×105 ILUTPC1 1e-2, inf 973176 - ILUTPC2 1449893 98 ILUTPC3 1525163 - ILUTPC4 1043537 - SHERMAN5 3312 20793 3.1 6.52×103 ILUTPC1 1e-2, 7 17060 22 ILUTPC2 19771 22 ILUTPC3 19433 23 ILUTPC4 16975 23 UTM3060 3060 42211 6.9 2.13×107 ILUTPC1 1e-3, inf 306280 - ILUTPC2 341918 - ILUTPC3 329390 - ILUTPC4 297393 - UTM5940 5940 83842 7.1 1.04×109 ILUTPC1 1e-3, inf 884993 - ILUTPC2 927344 - ILUTPC3 926486 - ILUTPC4 816136 - Table 3.1: Performance of 4 variants of ILUTPC for general matrices. 37 Matrix Size (n×n) nnz nnz2n condition number preconditioner parameters (τ, p) nnz(L+D) or nnz(L+U) itn A 10000 39600 1.98 6.74×103 ILDLTRP 1e-2, inf 350064 79 ILDLTPP 374709 77 ILUTPC1 608589 - luinc 1e-2 608589 - ILDLTRP 1e-2, 40 303505 179 ILDLTPP 329535 - ILUTPC1 587376 - luinc 1e-2 608589 - ILDLTRP 1e-3, inf 496079 6 ILDLTPP 500285 6 ILUTPC1 957897 8 luinc 1e-3 957897 8 ILDLTRP 1e-3, 45 459459 20 ILDLTPP 459848 32 ILUTPC1 884727 208 luinc 1e-3 957897 8 B= A−4× J 10000 39600 1.98 701.03 ILDLTRP 1e-2, 5 54780 13 ILDLTPP 54780 13 ILUTPC1 87660 14 luinc 131342 13 C = A+4× J 10000 39600 1.98 2.5 ILDLTRP 1e-2, 5 54002 4 ILDLTPP 54002 4 ILUTPC1 49302 4 luinc 49302 4 Table 3.2: Performance of ILDLT and ILUTPC for skew-symmetric matrices. 38 Figure 3.3: Preconditioned GMRES for a skew-symmetric matrix A 39 Chapter 4 Conclusion We have proposed a slightly simpler version of Bunch’s LDLT factorization for skew-symmetric matrices with partial pivoting strategy, which saves one swapping of rows and columns half of the time. We have also examined the LDLT factor- ization with rook pivoting strategy. These two factorizations can be done in Crout order. They save about half the computational work and storage, compared to the standard LU factorization. We have derived an incomplete LDLT factorization, based on the LDLT fac- torization in Crout order, with both partial pivoting strategy and rook pivoting strategies. Their performance when applied as preconditioners for GMRES for skew-symmetric matrices is compared with the standard ILU factorization. Nu- merical results show that both the incomplete LDLT factorization with rook pivot- ing strategy (ILDLTRP) and partial pivoting strategy (ILDLTPP) can save half the computational work and storage. ILDLTRP is even stable than ILDLTPP. The usage of a second parameter p, which controls how many nonzero ele- ments in a column of L are kept, is greatly affected by the condition number of the skew-symmetric matrix. To make our scheme more robust, future work needs to be done on investigating the relationship between condition number and p. 40 Bibliography [1] U. M. Ascher and C. Greif. A First Course in Numerical Methods. SIAM, 2011. → pages 23 [2] Zhong-Zhi Bai, Gene H. Golub, and Michael K. Ng. Hermitian and skew-Hermitian splitting methods fornon-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl., 24(3):603–626, 2003. → pages 1 [3] J. R. Bunch. A note on the stable decomposition of skew-symmetric matrices. Math. Comp., 38(158):475–479, 1982. → pages 11, 13 [4] James R. Bunch and Linda Kaufman. Some stable methods for calculating inertia and solving symmetric linear systems. Math. Comp., 31(137):163–179. → pages 7 [5] Timothy A. Davis. The university of florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices. → pages 29, 35 [6] C. Greif and J. M. Varah. Iterative solution of skew-symmetric linear systems. SIAM J. Math. Anal., 31(2):584–601, 2009. → pages 18 [7] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2nd edition, 2002. → pages 4, 6, 9 [8] T. Lau. Numerical solution of skew-symmetric linear systems. Master’s thesis, The University of British Comlumbia, 2009. → pages 2, 20 [9] N. Li and Y. Saad. Crout versions of the ILU factorization with pivoting for sparse symmetric matrices. Electron. Trans. Numer. Anal., 20:75–85, 2005. → pages 3 [10] N. Li, Y. Saad, and E. Chow. Crout versions of ILU for general sparse matrices. SIAM J. Sci. Comput., 25(2):716–728, 2003. → pages 4, 5, 35 41 [11] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelpha, PA, 2nd edition, 2003. → pages 4, 24 42
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Ildl factorizations for sparse skew-symmetric matrices
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Ildl factorizations for sparse skew-symmetric matrices He, Shiwen 2012
pdf
Page Metadata
Item Metadata
Title | Ildl factorizations for sparse skew-symmetric matrices |
Creator |
He, Shiwen |
Publisher | University of British Columbia |
Date Issued | 2012 |
Description | Our goal is to solve a sparse skew-symmetric linear system efficiently. We propose a slight modification to the Bunch LDL\T factorization with partial pivoting strategy for skew-symmetric matrices, which saves approximately one third of the overall number of swaps of rows and columns on average. We also develop a rook pivoting strategy for this LDL\T factorization in Crout order. We derive an incomplete LDL\T factorization based on the full LDL\T factorization, with both partial pivoting strategy and rook pivoting strategy. The incomplete factorization can be used as a preconditioner for Krylov subspace solvers. Various column versions of ILUTP factorizations are investigated. We show that the approach taken saves approximately half of the work and storage compared to standard ILU factorizations that do not exploit the structure of the matrix, and the rook pivoting strategy is often superior to the partial pivoting strategy, in particular when the matrix is ill-conditioned and the overall number of elements in the rows of the factors is restricted. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2012-04-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0103449 |
URI | http://hdl.handle.net/2429/42185 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2012-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2012_spring_he_shiwen.pdf [ 301.45kB ]
- Metadata
- JSON: 24-1.0103449.json
- JSON-LD: 24-1.0103449-ld.json
- RDF/XML (Pretty): 24-1.0103449-rdf.xml
- RDF/JSON: 24-1.0103449-rdf.json
- Turtle: 24-1.0103449-turtle.txt
- N-Triples: 24-1.0103449-rdf-ntriples.txt
- Original Record: 24-1.0103449-source.json
- Full Text
- 24-1.0103449-fulltext.txt
- Citation
- 24-1.0103449.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0103449/manifest