ANALYSIS OF CYCLIC REDUCTION FOR THE NUMERICAL SOLUTION OF THREE-DIMENSIONAL CONVECTION-DIFFUSION EQUATIONS

by

CHEN GREIF

B.Sc. (Applied Mathematics), Tel Aviv University, 1991
M.Sc. (Applied Mathematics), Tel Aviv University, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Mathematics, Institute of Applied Mathematics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1998
© Chen Greif, 1998

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics
The University of British Columbia
Vancouver, Canada

Abstract

This thesis deals with the numerical solution of convection-diffusion equations. In particular, the focus is on the analysis of applying one step of cyclic reduction to linear systems of equations which arise from finite difference discretization of steady-state three-dimensional convection-diffusion equations. The method is based on decoupling the unknowns and solving the resulting smaller linear systems using iterative methods. In three dimensions this procedure results in some loss of sparsity, compared to lower dimensions.
Nevertheless, the resulting linear system has excellent numerical properties: it is generally better conditioned than the original system, and it gives rise to faster convergence of iterative solvers, including convergence in cases where solvers applied to the original system of equations fail to converge. The thesis starts with an overview of the equations that are solved and general properties of the resulting linear systems. Then, the unsymmetric discrete operator is derived and the structure of the cyclically reduced linear system is described. Several important aspects are analyzed in detail. The issue of orderings is addressed and a highly effective ordering strategy is presented. The complicated sparsity pattern of the matrix requires careful analysis; comprehensive convergence analysis for block stationary methods is provided, and the bounds on convergence rates are shown to be very tight. The computational work required to perform cyclic reduction and compute the solution of the linear system is discussed at length. Preconditioning techniques and various iterative solvers are considered.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter 1. Introduction
  1.1 Background
    1.1.1 The Convection-Diffusion Equation
    1.1.2 Finite Difference Methods
    1.1.3 Solving the Linear System
  1.2 One Step of Cyclic Reduction
    1.2.1 The One-Dimensional Case
    1.2.2 More Space Dimensions
  1.3 Thesis Outline and Notation

Chapter 2. Cyclic Reduction
  2.1 Complete Cyclic Reduction
  2.2 The Three-Dimensional Cyclically Reduced Operator
    2.2.1 The Constant Coefficient Model Problem
    2.2.2 The Variable Coefficient Case
  2.3 Properties of the Reduced Matrix

Chapter 3. Ordering Strategies
  3.1 Block Ordering Strategies for 3D Grids
  3.2 The Family of Two-Plane Orderings
  3.3 The Family of Two-Line Orderings
  3.4 Comparison Results

Chapter 4. Convergence Analysis for Block Stationary Methods
  4.1 Symmetrization of the Reduced System
    4.1.1 The Constant Coefficient Case
    4.1.2 The Variable Coefficient Case
  4.2 Bounds on Convergence Rates
  4.3 "Near-Property A" for 1D Partitioning of the Two-Plane Matrix
  4.4 Computational Work
  4.5 Comparison with the Unreduced System
  4.6 Fourier Analysis
  4.7 Comparisons

Chapter 5. Solvers, Preconditioners, Implementation and Performance
  5.1 Krylov Subspace Solvers and Preconditioners - Overview
  5.2 Incomplete Factorizations for the Reduced System
  5.3 The Overall Cost of Construction of the Reduced System
  5.4 Numerical Experiments
  5.5 Vector and Parallel Implementation

Chapter 6. Summary, Conclusions and Future Research
  6.1 Summary and Conclusions
  6.2 Future Research

Bibliography

List of Tables

4.1 comparison between the computed spectral radius and the bound for the 1D splitting
4.2 comparison between the computed spectral radius and the bound for the 2D splitting
4.3 comparison of computed spectral radii of the 1D Jacobi iteration matrix with matrices which are consistently ordered
4.4 comparison between iteration counts for the reduced and unreduced systems
4.5 iteration counts for different iterative schemes
4.6 iteration counts for one nonzero convective term
4.7 iteration counts for two nonzero convective terms
5.1 number of nonzero elements in the ILU(1) factorization
5.2 computational work involved in the construction of the reduced matrix
5.3 comparison between the computed spectral radii and the bounds
5.4 comparison of solving work/time for various mesh sizes
5.5 comparison between estimated condition numbers
5.6 performance of block stationary methods for Test Problem 1
5.7 construction time/work of ILU preconditioners
5.8 performance of QMR for Test Problem 1
5.9 performance of BiCG for Test Problem 1
5.10 performance of Bi-CGSTAB for Test Problem 1
5.11 performance of CGS for Test Problem 1
5.12 overall flop counts and average computed constants for Test Problem 2
5.13 performance of various incomplete factorizations for Test Problem 2
5.14 norm of the error for Test Problem 3
5.15 comparison of performance for Test Problem 3
5.16 comparison of iteration counts of natural and multicolor orderings

List of Figures

1.1 computational molecules
1.2 numerical solution of Eq. (1.24)
1.3 red/black ordering of the one-dimensional grid
1.4 the matrix associated with the red/black ordering for the one-dimensional case
1.5 eigenvalues of the reduced matrix (solid line) and the unreduced matrix (broken line) for the one-dimensional model problem with n = 65 and β = 0.3
1.6 natural lexicographic ordering of the tensor-product grids
1.7 sparsity patterns of the matrices
1.8 red/black ordering in the two-dimensional case
2.1 a three-dimensional checkerboard
2.2 red/black ordering of the 3D grid
2.3 points that are affected by the block Gaussian elimination
2.4 structure of the computational molecule associated with the reduced operator
2.5 sparsity pattern of the lexicographically ordered reduced matrix
2.6 eigenvalues of both systems for Poisson's equation
2.7 singular values of both matrices for Eq. (2.24)
3.1 orderings of the 2D block grid
3.2 three members in the family of natural two-plane orderings
3.3 red/black and toroidal two-plane ordering corresponding to x-y oriented 2D blocks
3.4 sparsity patterns of two members of the two-plane family of orderings
3.5 block computational molecule corresponding to the family of orderings 2PN
3.6 four-color 1D block ordering
3.7 ordering and sparsity pattern of the matrix associated with the 2LNxy ordering
3.8 block computational molecule corresponding to the ordering strategy 2LNxy
3.9 possible block partitionings of the two-plane matrix
3.10 a zoom on 2D blocks
3.11 comparison of the spectral radii of block Jacobi iteration matrices for cross-sections of the mesh Reynolds numbers
3.12 spectral radii of iteration matrices vs. mesh Reynolds numbers
3.13 symmetric reverse Cuthill-McKee ordering of the reduced matrix
4.1 sparsity patterns of the matrices involved in the proof of Lemma 4.8
4.2 "Near Property A" for the 1D splitting
4.3 sparsity pattern of the matrix C_J
4.4 the function h_s(a)
4.5 spectral radius of the SOR iteration matrix vs. the relaxation parameter
4.6 comparison of the spectral radii of the Gauss-Seidel iteration matrices
5.1 a two-dimensional slice of the stencil associated with U
5.2 fill-in in the construction of ILU(1) in the plane containing the gridpoint for which the discretization was done
5.3 fill-in in the construction of ILU(1) in the plane adjacent to the gridpoint for which the discretization was done
5.4 sparsity patterns of the factors of ILU(1) and ILU(2)
5.5 sparsity patterns of factors for ILU with drop tolerance 10^-2
5.6 l2-norm of relative residual for preconditioned Bi-CGSTAB
5.7 a 2D slice of the numerical solution of Test Problem 3
5.8 2D mesh architecture (3 × 3)
5.9 a 2D slice of gridpoints associated with one processor
5.10 the part of the reduced matrix that contains the gridpoints associated with a certain subcube, and the rectangular matrix before local ordering
5.11 the "local part" of the rectangular matrix after re-ordering
5.12 a 2D slice of processors that hold entries of 1D or 2D sets of unknowns

Acknowledgements

I would like to thank my supervisor, Jim Varah, for his devoted guidance. Jim has been a great inspiration to me, always supportive, enthusiastic, insightful and helpful. I learned a lot from Jim about matrix analysis and linear algebra, and am deeply grateful to him for the many hours he devoted to me, and for making these years of work on my thesis a most pleasant, enjoyable and satisfying experience.
I am indebted to Gene Golub. This work started after I had read Gene's joint papers with Howard Elman, and was fascinated by the results and the analysis. I am grateful to Gene for his interest in my work, for pointing out useful references and important aspects of the problem, and for his kind hospitality during my visits at Stanford University.

I would also like to thank Howard Elman for being the external examiner of this thesis, and in particular, for his valuable comments and suggestions concerning Section 4.3, which substantially improved the exposition.

Many thanks to Uri Ascher. Uri has been very helpful, and I am grateful to him for numerous pleasant and fruitful discussions and for teaching me many interesting and important aspects of numerical computation. Thanks to Michael Ward and Anthony Peirce for their interest and input in our committee meetings; thanks to Michael for providing interesting problems and references. Thanks also to Eldad Haber, for many stimulating discussions on everything from global politics to cyclic reduction. I have benefited from several discussions with Xiaodi Sun and Xiao-Wen Chang and thank them for their helpful suggestions.

Finally, very special thanks to my wife Roni and my daughter Shakaed, who have been a wonderful source of motivation and support, and have given me all the love and encouragement I could hope for.

Chapter 1. Introduction

Mathematical models for fluid mechanics are of major interest in science and engineering.
The important task of modelling physical phenomena (such as the motion of particles in a fluid, or the relative motion between an object and a fluid) is complicated and challenging, not only from the point of view of the scientists who design the model for a given problem, but also from the point of view of the numerical analysts who are interested in finding accurate numerical solutions for the model. Concentration of a pollutant (for example radioactive waste or atmospheric pollution), flow of fluid past an obstacle, the motion of airplanes and submarines, the wind that blows past a bridge, semiconductor modelling, and even some financial models of share options: all these important problems have in common the type of equations that are used to describe them. These are partial differential equations whose associated unknown quantities are typically functions of space and time; the equations are typically second order in space and first order in time; they can be nonlinear, and an analytical solution is usually not available, due to their complicated nature.

One such family of equations, which we will focus on throughout this thesis, is known as the convection-diffusion equations. A great deal of investigation and analysis has been performed on these equations in the last few decades. They encapsulate many mathematical difficulties, and are of major importance in several areas of application. In the preface to Morton's book [81] on the numerical solution of this type of equation, the first sentence reads: "Accurate modelling of the interaction between convective and diffusive processes is the most ubiquitous and challenging task in the numerical approximation of partial differential equations".

In this thesis the focus is on the particular problem of how to solve linear systems that arise from finite difference discretization of three-dimensional steady-state convection-diffusion equations.
In particular, we analyze the technique of one step of cyclic reduction, which proves to be numerically stable and effective. The description of the method and its analysis are to come later. We first give a general description of the equations we deal with, and the linear systems that arise when the equations are discretized.

1.1 Background

1.1.1 The Convection-Diffusion Equation

We consider steady-state convection-diffusion equations of the following type:

  -∇·(D∇u) + ∇·(Eu) = F   on Ω ;        (1.1a)
  νu + ψ ∂u/∂n = G   on ∂Ω .            (1.1b)

Here u is an unknown quantity. We focus on the three-dimensional problem, in which case u, D, E, F and G are trivariate functions, and Ω ⊂ R³.

Consider for example the equations associated with fluid motion. Newton's second law of motion for fluids, according to which the rate of change of momentum of a fluid particle is equal to the net force acting on it, is represented by the Navier-Stokes equations (see, for example, [46]). Considering the particular class of viscous, incompressible fluids, the equations are:

  ρ(u_t + u·∇u) = -∇p + μΔu + f ;       (1.2a)
  ∇·u = 0 ,                             (1.2b)

where:

• ρ is the density of the fluid.
• The unknown quantity u is the velocity field: a vector function, each of whose elements depends on the spatial variables x, y and z, and on the time variable, t.
• p is the pressure: a scalar quantity considered to be unknown, which depends on the spatial variables and the time.
• μ is the coefficient of viscosity, which is a physical property of the material.
• f denotes the external force acting on the fluid, which is assumed to be known. Such a force could be, for example, gravity.

Eq. (1.2a) is called the momentum equation. Eq.
(1.2b) is the continuity equation, and expresses the incompressibility of the flow, which means that pressure variations do not produce any significant density variations.

The most obvious difficulty associated with the Navier-Stokes equations is the nonlinearity. One can deal with it in several ways. For example, using Picard-like iteration [73] for the momentum equation leads to the linear Oseen equation [44], which for steady state is a linear equation of the form (1.1). Other common ways to handle the nonlinearity are Newton's method or quasi-Newton methods [86]. Typically, in a nonlinear iterative solver a linear system of equations has to be solved at each iterate. It is therefore of much importance to derive efficient methods for numerically solving these linear systems.

Equations of the type (1.1) arise when considering, for example, temperature variation, or varying concentration of some substance (e.g. a chemical or radioactive pollutant) mixed with the fluid. Related examples are thermal processes in meteorology, salt concentration in oceans, and air and water pollution [109]. An important term here is "convection", which describes heat transfer in the case of temperature variation, or mass transfer when concentration variation is considered. The distribution of the concentration c of a fluid is determined by its advection by moving particles and by its diffusion between fluid particles. The equation is

  c_t + u·∇c = ∇·(D∇c) + f .            (1.3)

Here u is the velocity and D is a coefficient of diffusivity. If these quantities do not depend on c, Eq. (1.3) is linear. This is a typical convection-diffusion model problem.

In order to "quantify" difficulties that arise in physical problems such as the ones that have been described above, a dimensionless quantity that is typical of the problem is defined.
In the problem describing the concentration of a pollutant, it is the Peclet number, which is the velocity times the length scale of the domain, divided by the diffusivity coefficient. In the Navier-Stokes equations, the dimensionless Reynolds number is the product of velocity, density and length, divided by the viscosity of the fluid. The Reynolds number is denoted by Re; the Peclet number is denoted by Pe. These numbers vary significantly from one problem to another. As it turns out, as long as these numbers are small, the problem is relatively simple to solve and theory can provide an accurate description of the physical phenomenon. For example, according to [109], the flow past a cylinder can be quite accurately modelled for Reynolds numbers up to about 40; the flow is called laminar in this case. As the Reynolds number passes this limit, there is separation to what is called turbulent flow, instability develops, and it is much more difficult to predict or model the behavior of the flow. The physical problems scientists and engineers are interested in are obviously not limited to low Reynolds or Peclet numbers. When these numbers are high, the difficulties that arise are not limited to the physical sense; there are also difficulties in the numerical solution of the equations.

1.1.2 Finite Difference Methods

There are various approaches for the numerical solution of convection-diffusion equations. Among them we mention finite differences [80],[91],[101], finite elements [17],[66],[72], and finite volumes [70]. The approach used in this thesis is finite differences. Here the idea is to approximate the differential operator by a difference quotient, which can be thought of as a discrete difference operator. For example, consider the ordinary differential equation

  Lu = f                                (1.4)
on the interval (0,1), where L is a differential operator of the type L = a(x) d²/dx² + b(x) d/dx, and either u or u' is given at x = 0 and x = 1. First, the problem is discretized as follows: a grid consisting of n gridpoints is defined, each of which is a point where an approximate solution of u is to be computed. Suppose the grid is uniform, denote its size by h = 1/(n+1), and let u_i denote the numerical solution at x_i = ih, i = 1, ..., n. If h is small enough and u is sufficiently smooth [101], the first few terms of the Taylor expansion can be used to derive an effective approximation; given i, u(x_{i+1}) can be expressed as:

  u(x_{i+1}) = u(x_i) + h u'(x_i) + (h²/2) u''(x_i) + (h³/6) u'''(x_i) + O(h⁴) .   (1.5)

Similarly, for u(x_{i-1}):

  u(x_{i-1}) = u(x_i) - h u'(x_i) + (h²/2) u''(x_i) - (h³/6) u'''(x_i) + O(h⁴) .   (1.6)

Now, simple combinations of Eq. (1.5) and (1.6) provide a few possible approximation formulas for u'(x_i) and u''(x_i), expressed in terms of u(x_{i-1}), u(x_i) and u(x_{i+1}). For the first derivative it is straightforward to verify that:

  u'(x_i) = [u(x_{i+1}) - u(x_i)] / h + O(h) ;
  u'(x_i) = [u(x_{i+1}) - u(x_{i-1})] / (2h) + O(h²) .   (1.7)

The following approximation schemes can therefore be defined:

1. First order accurate "forward scheme": (u')_i = (u_{i+1} - u_i) / h .   (1.8)

2. First order accurate "backward scheme": (u')_i = (u_i - u_{i-1}) / h .   (1.9)

3. Second order accurate "centered scheme": (u')_i = (u_{i+1} - u_{i-1}) / (2h) .

For the second derivative, one usually uses a second order accurate centered scheme, which can be obtained by summing Eq. (1.5) and Eq. (1.6):

  (u'')_i = (u_{i+1} - 2u_i + u_{i-1}) / h² .   (1.10)

What all the approximation schemes that have been presented have in common is that the derivatives of u at the point x_i are approximated by three values only, namely u_{i-1}, u_i and u_{i+1}. The difference operator is thus a three-point operator. The generalization to more space dimensions is straightforward.
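The quoted accuracy orders are easy to confirm numerically. The sketch below is illustrative only (the test function u(x) = sin x and the sample point are hypothetical, not taken from the thesis); halving h should roughly halve the one-sided errors, while the centered error should drop by about a factor of four:

```python
import math

def forward(u, x, h):   # first order accurate: (u(x+h) - u(x)) / h
    return (u(x + h) - u(x)) / h

def backward(u, x, h):  # first order accurate: (u(x) - u(x-h)) / h
    return (u(x) - u(x - h)) / h

def centered(u, x, h):  # second order accurate: (u(x+h) - u(x-h)) / (2h)
    return (u(x + h) - u(x - h)) / (2 * h)

u, du, x0 = math.sin, math.cos, 0.7
for h in (1e-2, 5e-3):
    errs = [abs(s(u, x0, h) - du(x0)) for s in (forward, backward, centered)]
    print(h, errs)
```

The observed error ratios between the two grid sizes are close to 2, 2 and 4 respectively, matching the O(h) and O(h²) claims.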
In the two-dimensional case, if n is the number of unknowns in each horizontal or vertical line of the mesh and h = 1/(n+1) is the grid size, we denote by u_{i,j} the numerical approximations to u(ih, jh), 1 ≤ i, j ≤ n, and discretize the differential equation as follows [101]:

• The first order derivatives are approximated by either a second order accurate centered difference scheme, or by a first order accurate one-sided scheme. For example, for the derivative in the x-direction, in the case of centered differences we have

  ∂u/∂x (ih, jh) ≈ (u_{i+1,j} - u_{i-1,j}) / (2h) ;   (1.11)

a backward scheme would be

  ∂u/∂x (ih, jh) ≈ (u_{i,j} - u_{i-1,j}) / h ;   (1.12)

and a forward scheme is

  ∂u/∂x (ih, jh) ≈ (u_{i+1,j} - u_{i,j}) / h .   (1.13)

• The second derivatives are approximated using a second order accurate centered difference scheme. For example:

  ∂²u/∂x² (ih, jh) ≈ (u_{i+1,j} - 2u_{i,j} + u_{i-1,j}) / h² ,   (1.14)

and analogously for the y-derivative.

At this point the pattern (for any number of space dimensions) is clear: each gridpoint is associated with its two neighbors in each direction, and thus in the two-dimensional case five points are involved in the difference equation, and in the three-dimensional case seven points are involved. Indeed, for the 2D and 3D cases, the operators are termed the five-point operator and the seven-point operator, respectively [67]. The associated computational molecule is defined as the set of values that are associated with the discretization of a certain gridpoint. In the (general) variable coefficient case the values of the computational molecule are gridpoint-dependent. On the other hand, in the constant coefficient case the computational molecule has the same form for all gridpoints that are not next to the boundary.
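The same kind of numerical check works in two dimensions. The following sketch (illustrative only; the test function sin x · sin y is hypothetical, not from the thesis) applies the centered second differences of (1.14) in both directions, which together form the five-point approximation to the Laplacian, and the error decreases like O(h²):

```python
import math

def laplacian5(u, x, y, h):
    """Five-point approximation to u_xx + u_yy, built from centered
    second differences of the form (1.14) in the x and y directions."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4 * u(x, y)) / h**2

u = lambda x, y: math.sin(x) * math.sin(y)
exact = -2 * u(0.3, 0.4)   # the Laplacian of sin(x)sin(y) is -2 sin(x)sin(y)
for h in (1e-2, 5e-3):
    print(h, abs(laplacian5(u, 0.3, 0.4, h) - exact))
```

Halving h reduces the error by roughly a factor of four, consistent with second order accuracy.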
Throughout the thesis, we shall use the term "the model problem" in reference to the following constant coefficient equation:

  -Δu + V^T ∇u = w ,   (1.15)

on Ω = (0,1)^s, subject to Dirichlet boundary conditions on ∂Ω. Here s is the number of space dimensions, and V is an s-dimensional vector of constants: V = (σ) if s = 1, V = (σ, τ) if s = 2, and V = (σ, τ, ρ) if s = 3. Let

  β = σh/2 ,   γ = τh/2 ,   δ = ρh/2 .   (1.16)

Clearly, in two dimensions only β and γ are defined, and in one dimension only β is defined. β, γ and δ are the mesh Reynolds numbers.¹ In order to examine how effective a numerical solution would be, both the magnitude of the PDE coefficients and the size of the grid are considered, and a mesh Reynolds number encapsulates both of them in one mathematical quantity. The discrete operators can be expressed in terms of the mesh Reynolds numbers. For the model problem, the computational molecules for interior gridpoints in one, two and three dimensions are graphically illustrated in Fig. 1.1.

¹ We note that we are following the definition used by Elman & Golub in [40], [41], [42]. In some texts, e.g. [46], [81], there is no division by 2 in (1.16). The words "cell" instead of "mesh", and "Peclet" rather than "Reynolds", are used interchangeably in the literature and are equivalent.

If centered differences are used for approximating the first order derivatives, the values of the computational molecules are as follows:

  a = 6 ;  b = -1 - γ ;  c = -1 - β ;  d = -1 + β ;  e = -1 + γ ;  f = -1 - δ ;  g = -1 + δ .   (1.17)

For backward approximations we have:

  a = 6 + 2(β + γ + δ) ;  b = -1 - 2γ ;  c = -1 - 2β ;  d = -1 ;  e = -1 ;  f = -1 - 2δ ;  g = -1 .   (1.18)

For forward approximations we have:

  a = 6 - 2(β + γ + δ) ;  b = -1 ;  c = -1 ;  d = -1 + 2β ;  e = -1 + 2γ ;  f = -1 ;  g = -1 + 2δ .
  (1.19)

[Figure 1.1: computational molecules associated with the three-point, five-point and seven-point operators for the constant coefficient problem.]

The difference operators for the 1D, 2D and 3D problems, after scaling by h², are given respectively by

  F₁ u_i = a u_i + c u_{i-1} + d u_{i+1} ,   (1.20a)
  F₂ u_{i,j} = a u_{i,j} + b u_{i,j-1} + c u_{i-1,j} + d u_{i+1,j} + e u_{i,j+1} ,   (1.20b)
  F₃ u_{i,j,k} = a u_{i,j,k} + b u_{i,j-1,k} + c u_{i-1,j,k} + d u_{i+1,j,k} + e u_{i,j+1,k} + f u_{i,j,k-1} + g u_{i,j,k+1} .   (1.20c)

1.1.3 Solving the Linear System

Once the finite difference scheme has been determined, we have a large sparse linear system of equations for the unknowns. The sparsity pattern of the associated matrix depends on the ordering of the unknowns. For many commonly used orderings the matrix is narrow-banded. In more than one dimension the band is necessarily sparse, regardless of the ordering used. Suppose the matrix is of size N × N, and the linear system is given by

  Ax = b .   (1.21)

The direct solution approach of Gaussian elimination [54] is equivalent to transforming Eq. (1.21) into a system Ux = y, where U is upper triangular, after solving a system Ly = b, where L is a unit lower triangular matrix. A = LU is known as the LU decomposition of A, and storing L and U is advantageous when one needs to solve a system with the same matrix but with multiple right-hand-side vectors. Unfortunately, for sparse matrices this approach suffers from the major drawback of fill-in in the factors L and U, which occurs during the factorization: the bandwidth of the matrix is preserved, but the sparsity inside the band is destroyed. The issue here is not only waste of storage; fill-in causes an unacceptable increase in computational work, and in fact, the amount of work increases with the number of space dimensions when finite difference discretization of partial differential equations is concerned.
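To make the centered molecule (1.17) and the seven-point operator (1.20c) concrete, here is a minimal sketch (an illustration, not code from the thesis) that assembles the scaled matrix for the model problem on an n × n × n grid in a dictionary-of-keys sparse format, with the Dirichlet boundary simply truncating the stencil. Note that the molecule entries sum to zero, so interior rows have zero row sums:

```python
def seven_point_matrix(n, beta, gamma, delta):
    """Sparse (dict-of-keys) matrix for the centered scheme, Eq. (1.17):
    diagonal a = 6, neighbors b, c, d, e, f, g in the y-, x-, x+, y+,
    z- and z+ directions, lexicographic ordering of the unknowns."""
    a, b, c = 6.0, -1.0 - gamma, -1.0 - beta
    d, e = -1.0 + beta, -1.0 + gamma
    f, g = -1.0 - delta, -1.0 + delta
    idx = lambda i, j, k: (k * n + j) * n + i
    A = {}
    for k in range(n):
        for j in range(n):
            for i in range(n):
                r = idx(i, j, k)
                A[r, r] = a
                for (ii, jj, kk), v in [((i, j - 1, k), b), ((i - 1, j, k), c),
                                        ((i + 1, j, k), d), ((i, j + 1, k), e),
                                        ((i, j, k - 1), f), ((i, j, k + 1), g)]:
                    if 0 <= ii < n and 0 <= jj < n and 0 <= kk < n:
                        A[r, idx(ii, jj, kk)] = v
    return A

A = seven_point_matrix(4, 0.3, 0.0, 0.0)
print(len(A))   # 64 diagonal entries plus 6 * 48 neighbor entries: 352 for n = 4
```

Each row holds at most 7 nonzeros (4 at a corner of the grid), which is the sparsity that direct LU factorization destroys but iterative methods preserve.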
It is thus advisable to use iterative methods, which exploit the sparsity pattern of the matrix and typically require only a very small number of extra column vectors in addition to the storage required for the linear system. These are methods which generate a sequence of vectors {x^(k)} which converge to the solution x. The amount of computational work involved in solving the system is modest when efficient iterative techniques are used. The multigrid method [16],[115], for example, can solve the system with only O(N) floating point operations. A typical iterative method involves essentially only matrix-vector products (as opposed to direct methods). Throughout the thesis we shall consider stationary methods and Krylov subspace methods. Derivation and analysis of these methods can be found in many books. In particular, we mention the books of Varga [113] and Young [117] on stationary methods, the book of Greenbaum [61] on Krylov subspace solvers and preconditioners, and the books of Golub & Van Loan [55] and Saad [95], which present a variety of topics, including stationary and Krylov subspace methods. In Chap. 5 a short overview of Krylov subspace solvers will be given. Below we briefly describe stationary methods.

This is a family of fixed point schemes, which have been known and used for a long time. Consider the system Ax = b and denote by x^(k) the approximation to the solution x at the kth iterate. The idea is: given an initial guess x^(0), iterate as follows:

  M x^(k+1) = (M - A) x^(k) + b ,   k = 0, 1, ...   (1.22)

where M is a matrix associated in some way with the matrix A.
Ideally, M should approximate A (or rather, M⁻¹ should be an effective approximation to A⁻¹), and at the same time solving a system involving M should be inexpensive: much cheaper than solving a system involving A. Clearly, these two requirements are somewhat contradictory, and here lies the main difficulty in picking an efficient scheme. Standard schemes are based on using a splitting [113] of the matrix A, which is the operation of writing it as

  A = M - N .   (1.23)

The matrix M⁻¹N is the iteration matrix. A necessary condition for convergence for any initial guess is ρ(M⁻¹N) < 1, where ρ denotes the spectral radius (the maximal absolute value among the eigenvalues). Denoting the diagonal part of A by D and its strict lower and upper triangular parts by E and F, respectively, classical schemes are point Jacobi, which corresponds to picking M = D, and point Gauss-Seidel, which corresponds to picking M = D + E. The point Successive Over-Relaxation scheme (SOR) is obtained by picking M = (1/ω)[D + ωE], where ω is a scalar between 0 and 2, and can be considered as a technique for accelerating the convergence of the Gauss-Seidel scheme. When ω is close to the optimal value (that is, the value that would give the smallest possible spectral radius of M⁻¹N), the convergence of the SOR scheme is substantially faster than the convergence of Jacobi and Gauss-Seidel. A great deal of convergence analysis for the SOR method (mainly for symmetric positive definite matrices) was done by Young in the 1950s and 1960s. In particular, Young defined a class of "consistently ordered" matrices, for which strong connections between the eigenvalues of the Jacobi, Gauss-Seidel and SOR iteration matrices exist. Some of these results will be discussed in Chapter 4.
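For a small instance of the one-dimensional centered-difference operator these splittings can be compared directly. The sketch below is illustrative only (numpy is assumed to be available, and in 1D the diagonal entry of the scaled operator is 2 rather than the three-dimensional value 6); it forms the iteration matrix M⁻¹N = M⁻¹(M - A) for each choice of M and prints the spectral radii. For this tridiagonal (consistently ordered) matrix the Gauss-Seidel radius is the square of the Jacobi radius, and a well-chosen ω makes SOR markedly faster:

```python
import numpy as np

n, beta = 20, 0.3                        # 1D model problem, centered differences
c, d = -1.0 - beta, -1.0 + beta          # sub- and super-diagonal molecule values
A = np.diag([2.0] * n) + np.diag([c] * (n - 1), -1) + np.diag([d] * (n - 1), 1)

D = np.diag(np.diag(A))                  # diagonal part
E = np.tril(A, -1)                       # strict lower triangular part
rho = lambda M: max(abs(np.linalg.eigvals(np.linalg.solve(M, M - A))))

omega = 1.5                              # close to optimal for this example
print("Jacobi      :", rho(D))
print("Gauss-Seidel:", rho(D + E))
print("SOR         :", rho((D + omega * E) / omega))
```

All three radii are below 1, so all three schemes converge for this matrix, with SOR the fastest by a wide margin.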
Another technique for accelerating the convergence of iterative schemes is the Chebyshev method (see Golub & Varga [59]). Other splittings, which are more relevant to this work, are block splittings. The idea is to work with square submatrices of the system's matrix, rather than single matrix elements, as the basic building blocks for the splitting. For example, the block Jacobi scheme corresponds to taking M = D, where D now is no longer the diagonal part of A; rather, it is the block-diagonal part of A. General convergence results for stationary methods are given in Varga's book [113]. Varga presents many interesting and powerful results, and below we mention the ones which will be useful for us later. See [113] for proofs.

Definition 1.1. A matrix B = (b_ij) is reducible if there exists a permutation matrix P such that P^T B P is a 2 × 2 block upper triangular matrix.

If a matrix does not satisfy the conditions of Defn. 1.1, it is termed irreducible.

Definition 1.2. A matrix B is called positive (nonnegative) if all its elements are positive (nonnegative), and is denoted by B > 0 (B ≥ 0).

An important class of matrices is the following:

Definition 1.3. A matrix B = (b_ij) is an M-matrix if it is nonsingular, b_ij ≤ 0 for all i ≠ j, and B⁻¹ ≥ 0.

Obviously, one does not want to compute the inverse of a matrix in order to determine whether or not it is an M-matrix. A useful way of determining whether a matrix is an M-matrix is:

Theorem 1.1. [113, Cor. 1, p. 85] If a matrix B = (b_ij) is a real, irreducibly diagonally dominant n × n matrix with b_ij ≤ 0 for all i ≠ j and b_ii > 0 for all 1 ≤ i ≤ n, then B⁻¹ > 0.

As it turns out, the above-defined properties can in some circumstances guarantee convergence of stationary schemes:

Definition 1.4.
A = M − N is a regular splitting of A if M is nonsingular with M⁻¹ ≥ 0 and N ≥ 0.

Theorem 1.2. [113, Thm. 3.13, p. 89] If A = M − N is a regular splitting of A and A⁻¹ ≥ 0, then the iterative scheme associated with this splitting converges for any initial guess.

When attempting to solve the linear systems that arise from discretizing PDEs, potential difficulties that might arise are:

1. If any of the mesh Reynolds numbers is nonzero, the matrix is nonsymmetric. The nonsymmetry complicates the analysis significantly. The numerical properties are difficult to assess, compared to the symmetric case, for which a great deal of analysis is available. The matrix is not necessarily diagonally dominant.

2. If the Reynolds numbers are large, the matrix can be ill-conditioned, leading to reduced accuracy in the numerical solution and possible divergence of iterative methods.

A good example of these numerical difficulties is when the original (continuous) problem has ill-conditioning which the discretized linear system inherits. This occurs, for example, when considering the singularly perturbed exit problem [77] of Brownian motion of particles confined by a finite potential well. In this case the continuous eigenvalue problem has an eigenvalue which is exponentially small, and a standard finite difference method would result in a matrix which is close to singular. For discussion and examples, see Sun [103] and references therein. It is important to distinguish between numerical difficulties that arise as a result of the discretization scheme itself, and ones that arise when solving the underlying linear system. As an illustration of this, consider the following classical problem (see, for example, [46], [81], [97]):

−ε u″(x) + a u′ = 0 ,   u(0) = 0 ,   u(1) = 1 .   (1.24)
This equation has the exact solution

u(x) = (e^{ax/ε} − 1) / (e^{a/ε} − 1) .   (1.25)

Here the mesh Reynolds number is β = ah/(2ε). Using centered difference schemes for discretizing u″ and u′, the corresponding difference equation is given by

(−1 − β) u_{j−1} + 2 u_j + (−1 + β) u_{j+1} = 0 .   (1.26)

In this simple case, the difference equation can be solved directly by seeking a solution of the form u_j = φ^j, and we get a quadratic equation whose solutions for φ are [46]:

φ₊ = (1 + β)/(1 − β) ;   φ₋ = 1 .   (1.27)

After incorporating the boundary conditions, we obtain:

u_j = (φ₊^j − 1) / (φ₊^{n+1} − 1) .   (1.28)

When β > 1 the numerical solution is oscillatory (u_j changes sign in dependence on the parity of j), whereas the analytical solution of the differential equation is smooth and monotonically increasing. The larger β is, the more oscillatory and less accurate (as an approximation to the analytical solution) the numerical solution is. In Fig. 1.2(a) the oscillations are illustrated. In this graph, a was chosen to be equal to 1, ε = 0.01, and a uniform 16-point grid was used. The solid line represents the analytical solution, and the broken line represents the numerical solution. Here the oscillations are not a result of numerical instability in the solution process; rather, they are a result of the finite difference scheme that is used. The natural remedy in this case would be to use a different numerical scheme. For example, if upwind differences are used, then the numerical solution is smooth; see Fig. 1.2(b). However, in this case the scheme is only first order accurate, and in particular, the numerical solution is not accurate in the vicinity of the boundary layer.

Figure 1.2: numerical solution of Eq. (1.24): (a) centered scheme, (b) backward scheme.

1.2 One Step of Cyclic Reduction

At this point we present the technique that this thesis is concerned with: (one step of) cyclic reduction.
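Parenthetically, the closed form (1.27)-(1.28) derived above makes the oscillation for β > 1 easy to reproduce. In the sketch below (an illustration, not from the thesis; β = 2 and n = 15 are arbitrary choices), the discrete solution is evaluated, checked against the difference equation (1.26), and its interior values are seen to alternate in sign:

```python
# Discrete solution of (-1-b)u_{j-1} + 2u_j + (-1+b)u_{j+1} = 0 with
# u_0 = 0, u_{n+1} = 1, via u_j = (phi^j - 1)/(phi^{n+1} - 1),
# phi = (1+b)/(1-b).  beta and n are arbitrary choices for the sketch.
def discrete_solution(beta, n):
    phi = (1.0 + beta) / (1.0 - beta)
    return [(phi**j - 1.0) / (phi**(n + 1) - 1.0) for j in range(n + 2)]

beta, n = 2.0, 15
u = discrete_solution(beta, n)

# Residual of the centered difference equation at the interior points.
residual = max(abs((-1 - beta) * u[j - 1] + 2 * u[j] + (-1 + beta) * u[j + 1])
               for j in range(1, n + 1))

# For beta > 1, phi < 0 and consecutive interior values alternate in sign.
alternating = all(u[j] * u[j + 1] < 0 for j in range(1, n))
```

For the upwind scheme the corresponding root is 1 + 2β > 0, so no sign change occurs, in line with Fig. 1.2(b).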
The idea of the method is to take the linear system arising from a discretization scheme, decouple the unknowns, and then compute the solution by solving smaller systems. The smaller systems generally have better numerical properties compared to the original system (including a smaller condition number), and thus by performing this procedure the difficulties described in the previous section become easier to handle.

1.2.1 The One-Dimensional Case

In order to illustrate cyclic reduction and its advantages, we first consider a simple one-dimensional model problem:

−u″(x) + a u′ = 0 ,   (1.29)

subject to Dirichlet boundary conditions. The most common ordering of the unknowns is the natural lexicographic ordering. In one dimension this ordering simply means that for each i, i = 1, …, n, the ith unknown corresponds to the numerical solution approximating u(ih). Let β = ah/2 be the mesh Reynolds number. The matrix associated with the linear system, using centered difference discretization, is the tridiagonal matrix [40]:

A = tri[−1 − β, 2, −1 + β] .   (1.30)

Lexicographic ordering is only one alternative. Another common ordering strategy is the red/black ordering: referring to the grid as a checkerboard, each gridpoint is assigned a color, either red or black, and points of one of the colors, say red, are numbered first. In the one-dimensional case red/black ordering means that we take the indices used in the lexicographic ordering and re-order them, so that the odd-numbered indices are ordered first. This is illustrated in Fig. 1.3.

Figure 1.3: red/black ordering of the one-dimensional grid.

The sparsity pattern of the matrix associated with the red/black ordering is depicted in Fig. 1.4.
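That sparsity pattern can be generated directly (an illustrative sketch, not from the text; n = 7 and β = 0.4 are arbitrary choices): form the matrix (1.30) and permute it so that the odd-numbered unknowns come first; the leading (red) block then comes out diagonal, since no two red points are adjacent on the grid.

```python
# Form A = tri[-1-beta, 2, -1+beta] and apply the red/black (odds-first)
# permutation, giving P A P^T with a diagonal leading block.
n, beta = 7, 0.4
A = [[2.0 if i == j else
      (-1.0 - beta if i == j + 1 else (-1.0 + beta if i == j - 1 else 0.0))
      for j in range(n)] for i in range(n)]

# 0-based even indices are the odd-numbered (red) unknowns in 1-based terms.
perm = [i for i in range(n) if i % 2 == 0] + [i for i in range(n) if i % 2 == 1]
P_A_Pt = [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

n_red = (n + 1) // 2
red_block_is_diagonal = all(P_A_Pt[i][j] == 0.0
                            for i in range(n_red) for j in range(n_red) if i != j)
```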
It is evident by looking at the matrix in this figure that the linear system has the form:

( B  C ) ( u^(r) )     ( w^(r) )
( D  E ) ( u^(b) )  =  ( w^(b) )   (1.31)

where both B and E are diagonal. In Eq. (1.31) the superscripts (r) and (b) have been introduced to illustrate the dependence on the color of the gridpoint.

Figure 1.4: the matrix associated with the red/black ordering for the one-dimensional case.

Now, referring to the matrix as a 2 × 2 block matrix, a simple process of block Gaussian elimination in Eq. (1.31) leads to a system whose second block-row is a (smaller) system for the black points, which is called the reduced system [40]:

[E − D B⁻¹ C] u^(b) = w^(b) − D B⁻¹ w^(r) .   (1.32)

Since B⁻¹ is diagonal and has entries whose typical values are not zero or close to zero (when an appropriate discretization scheme is used), the inversion is numerically stable. The new system in the 1D case is tridiagonal, just like the original system, but is half the size. Once the solution for the black points is computed, the solution for the red points corresponds to solving a diagonal system. The procedure of moving from system (1.31) to system (1.32) amounts to performing one step of cyclic reduction.

In order to illustrate the advantages of one step of cyclic reduction in the nonsymmetric case, let us refer back to Eq. (1.30). The region of numerical stability for centered difference discretization is |β| < 1 [65]. For these values of β the matrix is irreducibly diagonally dominant [40]. Also, Elman & Golub showed in [40] that for these values of β there exists a real diagonal matrix Q such that Q⁻¹AQ is symmetric positive definite. For solving a tridiagonal system, Gaussian elimination is clearly the preferred numerical technique, and from either one of the above two observations it follows that if |β| < 1 the algorithm is stable even without pivoting.
If |β| > 1 the matrix A is not diagonally dominant, and pivoting needs to be used to ensure stability. Alternatively, consider applying the point-Jacobi iterative scheme; the Jacobi iteration matrix is tridiagonal with zeros along its diagonal, and convergence results can be readily obtained by using the following (see, for example, [40] or [95]):

Lemma 1.1. The eigenvalues of an n × n tridiagonal matrix with b, a and c along its subdiagonal, main diagonal, and superdiagonal, respectively, are given by

λ_i = a + sign(c) · 2√(bc) · cos(iπh) ,   i = 1, …, n .   (1.33)

By Lemma 1.1 the Jacobi scheme is convergent only if β² < 2 [40]. Using the Jacobi scheme for solving a tridiagonal system is generally not efficient, but the fact that the scheme converges for such a narrow range of mesh Reynolds numbers is a primary indication of the difficulties that arise (and in fact are magnified) when considering two-dimensional or three-dimensional problems (where, as opposed to 1D problems, iterative methods are efficient).

We now compare the above with the analogous properties of the cyclically reduced matrix. The reduced matrix (for odd n), after scaling by 2, is [40]:

S = tri[−(1 + β)², 2 + 2β², −(1 − β)²] .   (1.34)

It can now be observed that the reduced matrix is diagonally dominant for all values of β. In addition, it is symmetrizable by a real diagonal matrix for all β, and the symmetrized matrix is positive definite [40]. Comparing that to the original system, we see that the restriction |β| < 1 has disappeared. (We note, however, that the symmetrization operation itself, for both the reduced and unreduced matrix, may be numerically unstable [40].)
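These claims can be verified numerically. The sketch below (illustrative only; n = 9 and β = 2.5 are arbitrary choices) forms (1.30), permutes it red/black, computes the Schur complement of (1.32), and checks that it matches half of (1.34) and that its rows are diagonally dominant even though |β| > 1:

```python
# One step of cyclic reduction for A = tri[-1-beta, 2, -1+beta]:
# eliminate the odd-numbered (red) unknowns, B = diag(2), and compare
# the Schur complement E - D B^{-1} C with (1.34) scaled by 1/2.
n, beta = 9, 2.5
A = [[2.0 if i == j else
      (-1.0 - beta if i == j + 1 else (-1.0 + beta if i == j - 1 else 0.0))
      for j in range(n)] for i in range(n)]

red = [i for i in range(n) if i % 2 == 0]
black = [i for i in range(n) if i % 2 == 1]
m = len(black)
S = [[A[black[i]][black[j]]
      - sum(A[black[i]][p] * A[p][black[j]] / A[p][p] for p in red)
      for j in range(m)] for i in range(m)]

expected = [[(2.0 + 2.0 * beta**2) / 2.0 if i == j else
             (-(1.0 + beta) ** 2 / 2.0 if i == j + 1 else
              (-(1.0 - beta) ** 2 / 2.0 if i == j - 1 else 0.0))
             for j in range(m)] for i in range(m)]
max_diff = max(abs(S[i][j] - expected[i][j]) for i in range(m) for j in range(m))

# Row diagonal dominance of the reduced matrix, despite |beta| > 1.
dominant = all(abs(S[i][i]) + 1e-12 >= sum(abs(S[i][j]) for j in range(m) if j != i)
               for i in range(m))
```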
As far as the point-Jacobi scheme is concerned, it has been shown in [40] that it is convergent for β ≠ 1. In comparison to the condition of convergence for the unreduced solver (β² < 2), this is a substantial improvement. Additional observations regarding the eigenvalues and the spectral condition numbers of the unreduced and the reduced matrices are presented below:

Proposition 1.1. Suppose |β| < 1, n is odd, and h is sufficiently small. Denote the eigenvalues of the unreduced and the unscaled reduced matrices by λ^(U) and λ^(R), respectively. Then

min λ^(U) ≤ min λ^(R) ≤ max λ^(R) ≤ max λ^(U) .   (1.35)

Proof. By Lemma 1.1, the eigenvalues of the unreduced matrix are:

λ_j^(U) = 2 − 2√(1 − β²) · cos(πjh) ,   j = 1, …, n .   (1.36)

The eigenvalues of the unscaled reduced matrix (namely ½S) are

λ_j^(R) = 1 + β² − (1 − β²) · cos(πjĥ) ,   j = 1, …, ½(n − 1) ,   (1.37)

where ĥ = 2h is the mesh size of the reduced grid. Since max λ^(U) = 2 + 2√(1 − β²)·cos(πh) and max λ^(R) = 1 + β² + (1 − β²)·cos(πĥ), and since 1 + β² ≤ 2, 1 − β² ≤ 2√(1 − β²) and cos(πĥ) < cos(πh), it readily follows that max λ^(R) ≤ max λ^(U). As for the smallest eigenvalues, min λ^(U) = 2 − 2√(1 − β²)·cos(πh) and min λ^(R) = 1 + β² − (1 − β²)·cos(πĥ). Denote x = √(1 − β²). Then min λ^(R) − min λ^(U) = 2x·(1 − x) + o(h); this expression is nonnegative for h sufficiently small.

Fig. 1.5 illustrates Prop. 1.1 for a particular example: n = 65 and β = 0.3. Here the smallest and largest eigenvalues are 0.1841 and 1.9959 for the reduced matrix, and 0.0943 and 3.9057 for the unreduced matrix. The condition numbers are 72.58 and 262.27, a ratio of about 3.6 in favor of the reduced matrix. Numerical experiments that we have performed indicate
that for |β| < 1 the reduced matrix is better conditioned than the unreduced matrix by a factor of 2 to 4. In fact, for the symmetric case β = 0 it can be shown that:

Proposition 1.2. For h sufficiently small, κ₂(A) ≈ 4κ₂(S). In other words, the reduced matrix is better conditioned, and the improvement is by a factor of approximately 4.

Proof. The two matrices in this case are identical (as far as the values on each diagonal are concerned), except the reduced matrix is about half the size of the unreduced matrix. By the eigenvalues specified in Prop. 1.1 it follows that

κ₂(A) = (1 + cos(πh)) / (1 − cos(πh))   and   κ₂(S) = (1 + cos(πĥ)) / (1 − cos(πĥ)) .

Using Taylor expansions, for h sufficiently small these behave like 4/(π²h²) and 4/(π²ĥ²), respectively; since ĥ = 2h, the result readily follows.

Figure 1.5: eigenvalues of the reduced matrix (solid line) and the unreduced matrix (broken line) for the one-dimensional model problem with n = 65 and β = 0.3.

We have seen, then, by examining a one-dimensional model problem, that one step of cyclic reduction improves the numerical properties of the linear system. In addition, since it involves solving a tridiagonal system for only half of the unknowns and a diagonal system for the rest of the unknowns, the cyclic reduction step gives rise to a faster solution procedure. In finite precision arithmetic and in situations of very large linear systems, the improvement can be very significant.

1.2.2 More Space Dimensions

In more than one space dimension the equations are considerably more complicated, and the difficulties directly affect the level of difficulty in numerically solving the underlying discretized system of equations.
From the linear algebra point of view, a major difference between the one-dimensional problem and higher dimensions is that for the latter there is no way to turn the matrix into one with a dense band, regardless of the ordering strategy used. The natural lexicographic ordering for 2D and 3D is illustrated in Fig. 1.6; the corresponding sparsity patterns are depicted in Fig. 1.7. The red/black ordering in two dimensions is depicted in Fig. 1.8. Another major difference is that more than one convective term is involved; thus, we can now face a situation of small and large Reynolds numbers simultaneously. When a discretization gives rise to a matrix which is not diagonally dominant, there might be a need for block methods which are performed in accordance with how weakly or strongly coupled the unknowns are in each of the directions [6]. Finite difference schemes and orderings that take into account the direction of the flow, or attempt to minimize the truncation error, might perform better than standard schemes [92], [98], [39]. In [92], Roe & Sidilkover derive multidimensional equivalents of the upwinding scheme in a direct fashion (that is, not by generalizing the 1D scheme in a dimension-by-dimension fashion). In particular, they examine linear schemes and determine the range of values of coefficients for which the truncation error is minimized over all positive schemes. The formulas which were derived depend on the geometry of the characteristics in the time-dependent problem. As is shown in [92], the cross-stream diffusion is smaller for the suggested scheme, compared to the dimensional upwind scheme, and these schemes typically have narrow stencils. See [99] for an overview of several narrow schemes with small truncation error for convection-dominated equations. The procedure of one step of cyclic reduction can be performed for any two-dimensional and three-dimensional problem originally discretized by a finite difference scheme.
When the five-point and seven-point operators are used, the difference equation for a red point depends only on the point itself and its neighboring black points; similarly for the difference equations corresponding to the black points. Therefore the matrix B is diagonal, and the cost of performing one step of cyclic reduction is low. However, the matrix B in Eq. (1.31) is not always diagonal. In higher dimensions cyclic reduction causes some loss of sparsity (as opposed to the 1D case). Another important difference from the 1D case is that the elimination must be accompanied by a re-ordering of the unknowns (equivalently, a permutation of the matrix). The re-ordering is essential because orderings which are effective for the standard tensor-product grid are not necessarily effective for the reduced grid, which has "holes" that correspond to the now-eliminated red points.

Figure 1.6: natural lexicographic ordering of the tensor-product grids: (a) 2D, (b) 3D.

In the early 1990s, Elman & Golub conducted analysis and an extensive set of numerical experiments, examining the properties of one-dimensional and two-dimensional non-self-adjoint problems to which one step of cyclic reduction is applied. Their findings were published in a series of papers [40], [41], [42]. They showed that in the two-dimensional case the cyclically reduced operator is a 9-point operator (different from the classical compact 9-point operator), and a matrix-vector product with the reduced matrix is slightly less costly than a matrix-vector product with the unreduced, original matrix. One-line and two-line orderings were considered, and bounds on convergence rates were derived for both orderings.

Figure 1.7: sparsity patterns of the matrices: (a) 2D (n = 4), (b) 3D (n = 4).

Symmetrization results were
given, which showed that the reduced matrix can be symmetrized by a real diagonal similarity transformation for a larger region of mesh Reynolds numbers, compared to the unreduced matrix: both matrices can be symmetrized for any mesh Reynolds number if upwind differences are used, or if centered differences are used and the mesh Reynolds numbers are smaller than 1 in magnitude. However, only the reduced matrix can be symmetrized if both mesh Reynolds numbers are larger than 1 in magnitude and centered differences are used. The symmetrization was used to derive tight bounds on convergence rates of block stationary methods [40], [41].

Figure 1.8: red/black ordering in the two-dimensional case.

Earlier observations of Thompson, Ferziger & Golub indicate convergence of the reduced SOR solver for any mesh Reynolds number, and convergence which is faster than the convergence of unreduced solvers. Elman & Golub showed, analytically as well as experimentally, that reduced solvers converge faster. In the constant coefficient case it was shown in [40] that the reduced matrix is a diagonally dominant M-matrix for any value of the PDE coefficients when upwind differencing is used, or for PDE coefficients in the diffusion-dominated region when centered differencing is used. The variable coefficient case was also addressed [42], and an extensive set of numerical experiments, demonstrating performance and the connection between orderings and the direction of the flow, was given. The work of Elman & Golub is very relevant to the current work. Even though the matrices derived and analyzed in this work are very different from the ones in [40], [41], [42], some of the same techniques are used. Throughout the thesis we will point out similarities and differences between the 3D case discussed here and the findings of Elman & Golub for the two-dimensional case.
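The 9-point structure of the 2D reduced operator is easy to confirm numerically. The sketch below (illustrative only; the 5 × 5 five-point Poisson grid is an arbitrary choice) applies one step of cyclic reduction to the 2D five-point Laplacian and counts the nonzeros per row of the Schur complement:

```python
# One step of cyclic reduction for the 2D five-point Laplacian on an
# n x n grid: color by parity of i+j, eliminate the "red" points
# (their block B is diagonal with entries 4), inspect the stencil size.
n = 5
pts = [(i, j) for i in range(n) for j in range(n)]

def entry(p, q):
    if p == q:
        return 4.0
    if abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1:
        return -1.0
    return 0.0

red = [p for p in pts if (p[0] + p[1]) % 2 == 1]
black = [p for p in pts if (p[0] + p[1]) % 2 == 0]
S = [[entry(p, q) - sum(entry(p, r) * entry(r, q) / 4.0 for r in red)
      for q in black] for p in black]

nonzeros_per_row = [sum(1 for v in row if abs(v) > 1e-14) for row in S]
```

Interior rows of the reduced matrix have exactly 9 nonzeros: the point itself, four axial points at distance 2, and four diagonal points, which is the (non-compact) 9-point operator mentioned above.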
The step of cyclic reduction could be repeated several times. The complete cyclic reduction approach involves repeating the decoupling until the reduced systems are small enough that they can be solved directly. This procedure was originally presented by Hockney [71], based on Golub's ideas. Buneman [18] pointed out that computing the right-hand-side vector in finite precision arithmetic could result in severe round-off errors, and suggested a numerically stable algorithm (mathematically equivalent to Hockney's algorithm). Buzbee, Golub and Nielson [20] presented Buneman's algorithm and a variation of it, and provided careful analysis of the stability issue.

1.3 Thesis Outline and Notation

This work is the first to derive and analyze one step of cyclic reduction in conjunction with the important class of three-dimensional non-self-adjoint elliptic problems. In the succeeding chapters, we demonstrate the superiority of the reduced linear system of equations, in particular the convergence rate and the amount of computational work involved. In Chapter 2 we describe cyclic reduction and derive the cyclically reduced operator for the three-dimensional problem. In Chapter 3 we present block ordering strategies which can be applied to any three-dimensional problem, and demonstrate how they are applied to the reduced grid. Full details on the structures of the matrices associated with the various orderings are given. In Chapter 4 a comprehensive convergence analysis is performed for block stationary methods, and it is shown that the new system of equations is in general better conditioned than the original. Symmetrization conditions and tight bounds on convergence rates are derived. In Chapter 5 we discuss ways of solving the reduced system using Krylov subspace methods in conjunction with several preconditioning techniques. We give details on implementation, and present numerical experiments which include comparisons with the unreduced system.
Finally, in Chapter 6 we draw conclusions and suggest future directions of investigation.

For notational convenience, the following rules are used throughout the thesis:

• For narrow banded matrices the terms 'diag', 'tri', 'penta', etc. are used in the same manner as in the Matlab command 'spdiags', that is: if x, y and z are vectors of size n, then tri[x, y, z] will denote a tridiagonal matrix whose main diagonal consists of the entries of the vector y, the subdiagonal consists of x₁ to x_{n−1}, and the superdiagonal consists of z₂ to z_n. (x_n and z₁ do not appear in the matrix in this case.) x, y and z could also be matrices, in which case the same rules (as for vectors) apply.

• The index of a given diagonal in a matrix is: 0 for the main diagonal, positive numbers for superdiagonals and negative numbers for subdiagonals.

• I_n stands for the identity matrix of order n.

• ρ(T) stands for the spectral radius of the matrix T.

• If A and B are matrices, then B > A or B ≥ A refer to elementwise inequalities.

• E_{s₁,…,s_k} denotes a vector whose entries are s₁, …, s_k, repeated. For example, E₀₁ = (0, 1, 0, 1, 0, 1, …), E₁₀₀₁ = (1, 0, 0, 1, 1, 0, 0, 1, …), and so on. The size of the vector will be clear from the context where it appears.

Chapter 2. Cyclic Reduction

In the Introduction the term "one step of cyclic reduction" was explained and its advantages in the one-dimensional and two-dimensional cases were briefly described. As was mentioned, several steps of cyclic reduction can be performed, until the system is small enough to be solved directly [71], [18], [20]. In Sec. 2.1 we present this procedure, namely complete cyclic reduction. The purposes of presenting it are: first, to clarify how cyclic reduction can be carried out for general block tridiagonal systems, and how the step can be iterated. Secondly, we mention the issues that arise when seeking to compute the solution accurately.
It was shown in [20] that complete cyclic reduction can be performed in a stable fashion if the matrix associated with the linear system arises from Poisson's equation, i.e. it is symmetric positive definite. However, in the nonsymmetric case, where typically the matrix is not diagonally dominant and does not necessarily have real eigenvalues, the stability of the complete cyclic reduction algorithm cannot be guaranteed. Therefore, in non-self-adjoint problems, the focus is on applying a single step of cyclic reduction. In Section 2.2 we derive the cyclically reduced operator for the three-dimensional case, and in Section 2.3 we present some of its properties.

2.1 Complete Cyclic Reduction

Consider the linear system Ax = b, where A is an n × n block tridiagonal matrix of the form

A = tri[C, B, C] ,   (2.1)

and B and C are n × n matrices. (In general there is no need to assume equality between the number of blocks and the size of each of the blocks; it is done here merely for the purpose of presentation.) Suppose also that n = 2^k − 1, where k is some positive integer (see Sweet [105], [106] for an explanation of how to perform cyclic reduction if n is arbitrary). In the following it is assumed that when an index exceeds the dimensions of the matrix, the corresponding value is considered as being identically zero. For any even 2 ≤ j ≤ n − 1, we write three consecutive block equations of this system, as follows:

C x_{j−2} + B x_{j−1} + C x_j = b_{j−1}
C x_{j−1} + B x_j + C x_{j+1} = b_j
C x_j + B x_{j+1} + C x_{j+2} = b_{j+1}

If we now multiply the first and third block equations by C, the second block equation by −B, and add up the three block equations, we get:

C² x_{j−2} + (2C² − B²) x_j + C² x_{j+2} = C b_{j−1} + C b_{j+1} − B b_j .   (2.2)

We thus obtain a new system of equations for the vector (x₂, x₄, …, x_{n−1})^T, where the corresponding matrix is

tri[C², 2C² − B², C²] ,   (2.3)

and the right-hand side is

(C b₁ + C b₃ − B b₂, C b₃ + C b₅ − B b₄, …, C b_{n−2} + C b_n − B b_{n−1})^T .   (2.4)

We can proceed recursively, as follows: define B^(0) = B, C^(0) = C and b_j^(0) = b_j, and for m ≥ 0 (renumbering the unknowns at each level),

B^(m+1) = 2(C^(m))² − (B^(m))² ,   C^(m+1) = (C^(m))² ,   b_j^(m+1) = C^(m) (b_{2j−1}^(m) + b_{2j+1}^(m)) − B^(m) b_{2j}^(m) .

We now have a system of equations of the form A^(m) x^(m) = b^(m), where

A^(m) = tri[C^(m), B^(m), C^(m)] .   (2.5)

Once this system is solved, it is straightforward to compute the solution for the unknowns that were previously eliminated throughout the process.

Note that even if B and C are sparse, B^(m) and C^(m) are not necessarily sparse. For solving the reduced system, Fourier-type algorithms, based on diagonalizing the matrix (by the matrix whose columns are the eigenvectors), could be used. This option is very attractive when B and C commute (BC = CB), as in this case these two matrices can be diagonalized simultaneously. Since B^(m) and C^(m) are polynomials in B and C, it follows that their eigenvalues can be easily computed, provided that the eigenvalues of B and C are known. Indeed, for Poisson's equation the spectrum of the matrix is known, and stability analysis is available [20]. The problematic term in the computation of the right-hand side is B^(m) b_j^(m); there is more than one way to employ a recurrence relation for computing it [20]. Suppose λ_j and ω_j are the eigenvalues of B and C, respectively.
It is shown in [20] that if |λ_j| > 2|ω_j|, then the computation of B^(m) b^(m) is actually done by a recurrence relation involving the term cosh(2^m z_j), where z_j = cosh⁻¹(λ_j / (2ω_j)). Thus there can be a significant difference in magnitude between B^(m) b^(m) and b^(m), and the latter could be lost in rounding errors if n is large. (This situation occurs frequently; for example, it occurs in the finite difference discretization of Poisson's equation.) Buneman's algorithm [18] for overcoming the difficulty involves "backward"-type computations. For Poisson's equation, where C is the identity matrix, the technique uses the equality B = 2B⁻¹ − B^(1) B⁻¹, and in general (B^(m))² = 2 I_n − B^(m+1). Substituting this for j = 1, …, m, after some algebraic manipulation, a two-step recurrence relation is obtained, where the term (B^(m−1) B^(m−2) ⋯ B^(0))⁻¹ B^(m) replaces the problematic term in the original algorithm. It is then shown that the 2-norm of the matrix (B^(m−1) B^(m−2) ⋯ B^(0))⁻¹ is bounded by c e^{−θ₁}, where c is some constant and θ₁ = cosh⁻¹(λ₁/2). For Poisson's equation, θ₁ > 1, and thus the bound is small enough to ensure stability. The full proof of stability of this procedure can be found in [20, pp. 648-655].

Throughout the years, many extensions to the original papers and algorithms [71], [18], [20] have been presented. Buzbee, Dorr, George & Golub [19] use cyclic reduction for solving Poisson's equation on irregular regions; Concus & Golub [25] discuss two-dimensional nonseparable cases; Bauer and Reiss [12] generalize the procedure for block pentadiagonal systems arising from the discrete biharmonic operators. Heller [68] compares the iteration count of block cyclic reduction to that of block Gaussian elimination [111] and shows that when the matrix is diagonally dominant, the norms of the off-diagonal blocks decrease quadratically relative to the diagonal blocks at each step of reduction; based on this observation, he suggests criteria for early termination of the process, when the matrix becomes essentially block diagonal. Detyna [33] suggests an algorithm which is based on introducing two sets of equations, each with a different stencil, whose solutions are identical, and using the deliberately created large number of degrees of freedom to eliminate half of the equations in each of these two sets, in a manner that preserves the structure of the computational molecules (of each of the sets). This step can be repeated, and leads to a procedure with O(N²) operations, thus a faster algorithm than the classical algorithm [71], [18], [20] described in detail above; on the other hand, the algorithm gives rise to larger errors in the numerical solution. The error can be reduced by applying the suggested algorithm on the finer grids, and proceeding at the higher grids in the hierarchy with the more stable cyclic reduction algorithm (or any other accurate fast solver). Bondeli & Gander discuss the application of cyclic reduction to special tridiagonal systems [15]; Swarztrauber & Sweet [104], Gallopoulos & Saad [51], and Amodio & Mastronardi [2] discuss aspects of parallelization; Amodio & Paprzycki [3] present a parallel algorithm for solving boundary value problems using cyclic reduction, and apply it on a distributed memory machine. A description of the cyclic reduction idea and a list of references can be found in Golub & Van Loan [55].
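For the simplest case of 1 × 1 blocks (a scalar constant-coefficient tridiagonal system), the recursion of Sec. 2.1 is easy to implement in full. The sketch below is an illustration, not from the thesis; it omits the Buneman stabilization, so it is only intended for well-behaved cases such as tri[−1, 2, −1] with n = 2^k − 1:

```python
# Complete cyclic reduction for tri[c, b, c] x = rhs, n = 2^k - 1:
# forward reduction b^(m+1) = 2c^2 - b^2, c^(m+1) = c^2 with the
# right-hand-side update (2.2), then back substitution level by level.
def complete_cyclic_reduction(b, c, rhs):
    n = len(rhs)
    k = (n + 1).bit_length() - 1          # n = 2^k - 1
    f = [0.0] + list(rhs) + [0.0]         # 1-based, zero-padded boundaries
    bs, cs = [float(b)], [float(c)]
    for m in range(k - 1):                # forward reduction
        s = 2 ** m
        for j in range(2 * s, n + 1, 2 * s):
            f[j] = cs[m] * (f[j - s] + f[j + s]) - bs[m] * f[j]
        bs.append(2.0 * cs[m] ** 2 - bs[m] ** 2)
        cs.append(cs[m] ** 2)
    x = [0.0] * (n + 2)
    x[(n + 1) // 2] = f[(n + 1) // 2] / bs[k - 1]   # single remaining unknown
    for m in range(k - 2, -1, -1):        # back substitution
        s = 2 ** m
        for j in range(s, n + 1, 2 * s):  # odd multiples of s at this level
            x[j] = (f[j] - cs[m] * (x[j - s] + x[j + s])) / bs[m]
    return x[1:n + 1]

n = 31
x_true = [float(j + 1) for j in range(n)]
xt = [0.0] + x_true + [0.0]
rhs = [-xt[j - 1] + 2.0 * xt[j] - xt[j + 1] for j in range(1, n + 1)]
x = complete_cyclic_reduction(2.0, -1.0, rhs)
err = max(abs(u - v) for u, v in zip(x, x_true))
```

For |b| > 2|c| the magnitude of b^(m) grows like cosh(2^m z) with z = cosh⁻¹(b/(2c)), which is precisely the source of the round-off difficulty that Buneman's algorithm addresses.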
2.2 The Three-Dimensional Cyclically Reduced Operator

Even though the complete cyclic reduction algorithm can be carried out when B and C do not commute [20], for nonsymmetric matrices the spectrum is frequently not known and numerical stability cannot be ensured in general. For example, Heller's observations in [68] are not applicable, as no diagonal dominance is guaranteed. In three dimensions issues of fill-in and stability are magnified, and could be very problematic. It is our purpose, then, to focus on a single step of cyclic reduction and perform comprehensive analysis. As a first step we derive the cyclically reduced operator. The derivation is done by performing the block Gaussian elimination step described in the Introduction [see Eqs. (1.31) and (1.32)]. Once the points corresponding to one of the colors (say red) are eliminated, the resulting reduced grid has a somewhat irregular structure: there are "holes" in the grid, which correspond to the previously present red points. A difficulty which is specific to three dimensions is that the parity of the planes has to be taken into consideration; points of the same color are not located on the same spots for even-indexed planes and odd-indexed planes. This can be illustrated by looking at the three-dimensional checkerboard depicted in Fig. 2.1.

Figure 2.1: a three-dimensional checkerboard.

Our starting point is the original, unreduced problem in three dimensions. The red/black ordering is depicted in Fig. 2.2; the points that have to do with the block elimination are numbered in Fig. 2.3. Here the point for which the discretization is done is in the center, indexed by #13. If this point is black, then points #4, #9, #12, #14, #17 and #22 are red and are to be eliminated. The other points in the figure are black, but their corresponding entries in the matrix at row #13 are to be changed after eliminating the red points.
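The effect of this elimination on a typical row can be previewed numerically. The sketch below is an illustration, not from the thesis: it builds a 7-point constant-coefficient matrix on a 5 × 5 × 5 grid, with center coefficient a and neighbor coefficients c, d in the x-direction, b, e in the y-direction and f, g in the z-direction (an assumed labeling, chosen to be consistent with the derivation in the next subsection; all coefficient values are arbitrary), eliminates the points of one color via a Schur complement, and reads off a few entries of the row of an interior retained point:

```python
# One step of cyclic reduction for a 7-point operator on an n^3 grid.
# Labeling (assumed): a center, c/d x-neighbors, b/e y, f/g z.
# Eliminated color: even parity of i+j+k, so (3,3,3) is retained.
n = 5
a, b, c, d, e, f, g = 10.0, 1.1, 1.3, 0.7, 0.9, 1.7, 0.5
pts = [(i, j, k) for i in range(1, n + 1)
       for j in range(1, n + 1) for k in range(1, n + 1)]

def entry(p, q):
    di, dj, dk = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    if (di, dj, dk) == (0, 0, 0):
        return a
    if (dj, dk) == (0, 0) and abs(di) == 1:
        return c if di == -1 else d
    if (di, dk) == (0, 0) and abs(dj) == 1:
        return b if dj == -1 else e
    if (di, dj) == (0, 0) and abs(dk) == 1:
        return f if dk == -1 else g
    return 0.0

red = [p for p in pts if sum(p) % 2 == 0]   # eliminated points (B = a I)

def schur(p, q):    # entry of E - D B^{-1} C for retained points p, q
    return entry(p, q) - sum(entry(p, r) * entry(r, q) / a for r in red)

ctr = (3, 3, 3)                      # an interior retained point
diag_val = schur(ctr, ctr)           # should be a - (2be + 2cd + 2fg)/a
axial_val = schur(ctr, (1, 3, 3))    # distance-2 x-coupling: -c^2/a
plane_val = schur(ctr, (2, 2, 3))    # xy-diagonal coupling: -2bc/a
```

The values obtained this way match the typical diagonal entry (2.6) and the corresponding terms of the reduced operator (2.7), after accounting for the scaling by a.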
The construction of the reduced computational molecule is done by eliminating half of the rows and the columns of the matrix that corresponds to red/black ordering. The constant coefficient case is easier to derive, as the computational molecule is independent of the coordinates of the gridpoints. We thus illustrate how to construct the cyclically reduced operator for this case first.

Figure 2.2: red/black ordering, and the corresponding sparsity pattern of the associated matrix, for the three-dimensional problem.

2.2.1 The Constant Coefficient Model Problem

Consider the model problem (1.15). In terms of the matrix elements, the entries affected by the block elimination step are those of row #13 in the columns associated with the points of Fig. 2.3. Row #13 contains the diagonal value a in column #13 and the values f, e, c, d, b and g in the columns of the red points #4, #9, #12, #14, #17 and #22 respectively; each red row contains a on its own diagonal and the molecule values in the columns of its six black neighbors. The block elimination is therefore performed by subtracting from row #13 the multiples

fa^{-1} · (row 4),  ea^{-1} · (row 9),  ca^{-1} · (row 12),  da^{-1} · (row 14),  ba^{-1} · (row 17),  ga^{-1} · (row 22).

Figure 2.3: points that are affected by block Gaussian elimination for point #13 (center)

Each of these subtractions annihilates the corresponding red entry of row #13 and introduces, in the columns of the surrounding black points, products of two molecule values divided by a. From this, we can see that the typical value on the diagonal of the reduced matrix is

a^{-1} (a^2 - 2be - 2cd - 2fg) .   (2.6)

By 'typical' we mean an entry that is associated with an interior point.
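Eq. (2.6) can be verified numerically by assembling the seven-point matrix for a small grid and forming the Schur complement explicitly. The sketch below is our own illustration (the routine names, grid size and coefficient values are arbitrary choices, not the thesis's code); it checks the diagonal value and two representative off-diagonal values of the reduced matrix against the molecule derived above.

```python
import numpy as np

def seven_point_matrix(n, a, b, c, d, e, f, g):
    # Constant-coefficient seven-point operator on an n^3 grid
    # (Dirichlet values dropped), lexicographic ordering.
    N = n ** 3
    A = np.zeros((N, N))
    idx = lambda i, j, k: (k * n + j) * n + i
    for k in range(n):
        for j in range(n):
            for i in range(n):
                r = idx(i, j, k)
                A[r, r] = a
                if j > 0:     A[r, idx(i, j - 1, k)] = b
                if i > 0:     A[r, idx(i - 1, j, k)] = c
                if i < n - 1: A[r, idx(i + 1, j, k)] = d
                if j < n - 1: A[r, idx(i, j + 1, k)] = e
                if k > 0:     A[r, idx(i, j, k - 1)] = f
                if k < n - 1: A[r, idx(i, j, k + 1)] = g
    return A

def reduce_once(A, n):
    # One step of cyclic reduction: eliminate the red points (even
    # parity of i+j+k); the matrix B to be inverted is diagonal.
    pts = [(i, j, k) for k in range(n) for j in range(n) for i in range(n)]
    red   = [t for t, p in enumerate(pts) if sum(p) % 2 == 0]
    black = [t for t, p in enumerate(pts) if sum(p) % 2 == 1]
    B = A[np.ix_(red, red)]
    S = (A[np.ix_(black, black)]
         - A[np.ix_(black, red)] @ np.linalg.inv(B) @ A[np.ix_(red, black)])
    return S, {pts[t]: m for m, t in enumerate(black)}
```

After scaling a row of S by a, the diagonal reproduces a^2 - 2be - 2cd - 2fg, and the off-diagonal entries reproduce the squared and cross terms of the reduced molecule.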
For non-interior points fewer elimination operations are required, and the value of the associated diagonal entry changes with respect to (2.6) in the following manner:

• add a^{-1}cd in case the x-coordinate of the associated point (ih, jh, kh) satisfies i = 1 or i = n;
• add a^{-1}be in case the y-coordinate of the associated point (ih, jh, kh) satisfies j = 1 or j = n;
• add a^{-1}fg in case the z-coordinate of the associated point (ih, jh, kh) satisfies k = 1 or k = n.

Let R be the reduced operator for the model problem, after scaling by ah^2. Then for an interior gridpoint (ih, jh, kh), the difference operator is given by:

R u_{i,j,k} = (a^2 - 2be - 2cd - 2fg) u_{i,j,k}
  - c^2 u_{i-2,j,k} - d^2 u_{i+2,j,k} - b^2 u_{i,j-2,k} - e^2 u_{i,j+2,k} - f^2 u_{i,j,k-2} - g^2 u_{i,j,k+2}
  - 2bc u_{i-1,j-1,k} - 2bd u_{i+1,j-1,k} - 2ce u_{i-1,j+1,k} - 2de u_{i+1,j+1,k}
  - 2cf u_{i-1,j,k-1} - 2df u_{i+1,j,k-1} - 2bf u_{i,j-1,k-1} - 2ef u_{i,j+1,k-1}
  - 2cg u_{i-1,j,k+1} - 2dg u_{i+1,j,k+1} - 2bg u_{i,j-1,k+1} - 2eg u_{i,j+1,k+1} .   (2.7)

Figure 2.4: structure of the computational molecule associated with the reduced operator. The gray squares correspond to the gridpoints which have been eliminated throughout the elimination process.

Referring to the indices used in Figure 2.3, in terms of the mesh Reynolds numbers (β, γ and δ, associated with the x-, y- and z-directions respectively), the values of the computational molecule for an interior gridpoint are given below.

Index | General case            | Centered differences | Upwind differences
  1   | -f^2                    | -(1+δ)^2             | -(1+2δ)^2
  2   | -2ef                    | -2(1-γ)(1+δ)         | -2(1+2δ)
  3   | -2cf                    | -2(1+β)(1+δ)         | -2(1+2β)(1+2δ)
  5   | -2df                    | -2(1-β)(1+δ)         | -2(1+2δ)
  6   | -2bf                    | -2(1+γ)(1+δ)         | -2(1+2γ)(1+2δ)
  7   | -e^2                    | -(1-γ)^2             | -1
  8   | -2ce                    | -2(1+β)(1-γ)         | -2(1+2β)
 10   | -2de                    | -2(1-β)(1-γ)         | -2
 11   | -c^2                    | -(1+β)^2             | -(1+2β)^2
 13   | a^2 - 2be - 2cd - 2fg   | 30 + 2(β^2+γ^2+δ^2)  | 30 + 20(β+γ+δ) + 4(β+γ+δ)^2
 15   | -d^2                    | -(1-β)^2             | -1
 16   | -2bc                    | -2(1+β)(1+γ)         | -2(1+2β)(1+2γ)
 18   | -2bd                    | -2(1-β)(1+γ)         | -2(1+2γ)
 19   | -b^2                    | -(1+γ)^2             | -(1+2γ)^2
 20   | -2eg                    | -2(1-γ)(1-δ)         | -2
 21   | -2cg                    | -2(1+β)(1-δ)         | -2(1+2β)
 23   | -2dg                    | -2(1-β)(1-δ)         | -2
 24   | -2bg                    | -2(1+γ)(1-δ)         | -2(1+2γ)
 25   | -g^2                    | -(1-δ)^2             | -1

If the points in the reduced grid are ordered lexicographically, then the sparsity pattern of the matrix is as in Fig. 2.5. Notice that the matrix corresponds to a grid of size 6 x 6 x 6, but is only 108 x 108 in size, whereas 6^3 = 216. This is because half of the unknowns in the original linear system were eliminated. Another thing to notice is that the matrix is less sparse than the original, unreduced matrix: there are typically 19 nonzeros in each row, as opposed to (typically) 7 nonzeros in each row of the unreduced matrix. Finally, notice that the main block diagonal has "holes" within itself, which get "larger" when the size of the matrix is larger. This is because no re-ordering which better fits the structure of the reduced grid has been performed here. This aspect is addressed in Chapter 3.

Figure 2.5: sparsity pattern of the reduced matrix corresponding to lexicographic ordering of the unknowns (for a 6 x 6 x 6 grid)

In the discussion that follows, we refer to centered schemes. Similar observations can be made for upwind schemes. Denote the continuous operator corresponding to the model problem by

L = -Δ + (σ, τ, μ)^T · ∇ .   (2.8)

Expanding (2.7) in a multivariate Taylor expansion about the gridpoint (ih, jh, kh) yields

(1/(2ah^2)) R u = L u - h^2 E u + O(h^3) ,   (2.9)

where E is a linear differential operator whose terms involve third and fourth derivatives of u (such as u_xxx and u_xxxx) and mixed derivatives (such as u_xxy, u_xyy, u_xxz, u_xzz and u_xxzz), with coefficients that depend on σ, τ and μ. The above computation was carried out using Maple V. The right-hand side in Eq. (2.9) can be referred to as the introduction of a new continuous operator, based on a combination of the original equation and O(h^2) terms associated with second or higher derivatives of the solution. This continuous operator can be discretized directly, using the points that belong to the computational molecule of the cyclically reduced operator. For example,

u_xx ≈ (u_{i+2,j,k} - 2u_{i,j,k} + u_{i-2,j,k}) / (4h^2) ,

and similarly for u_yy and u_zz; and for the first derivatives,

u_x ≈ (u_{i+1,j+1,k} + u_{i+1,j-1,k} - u_{i-1,j+1,k} - u_{i-1,j-1,k}) / (4h) ,

and similarly for u_y and u_z. The result is a discretization scheme with an O(h^2) discretization error. This result is similar to the result in [57] for the two-dimensional case. We also state that the reduced right-hand side is equal to w_{i,j,k} with an O(h^2) error. Indeed, Gaussian elimination yields the following right-hand side:

w_{i,j,k} - a^{-1} (b w_{i,j-1,k} + c w_{i-1,j,k} + d w_{i+1,j,k} + e w_{i,j+1,k} + f w_{i,j,k-1} + g w_{i,j,k+1}) ,   (2.10)

whose Taylor expansion about the gridpoint (ih, jh, kh), after scaling by 2ah^2 [that is, after doing the same scaling as for Eq. (2.9)], is

w - (h^2/12) [ -Δw + (σ w_x + τ w_y + μ w_z) ] ,

where all the derivatives of w are evaluated at the point (ih, jh, kh). This result is similar to the result in [41] for the two-dimensional case.

2.2.2 The Variable Coefficient Case

For the variable coefficient case, performing cyclic reduction is considerably more difficult, as now the components of the computational molecule depend on each gridpoint's coordinates.
We first illustrate how the seven-point operator is derived. This derivation is standard; see for example [41] for details on the two-dimensional case. The discrete difference operator is derived by working on a fine grid whose size is h/2 rather than h. Equation (1.1) is rewritten in the following manner:

-[(p u_x)_x + (q u_y)_y + (r u_z)_z] + s u_x + t u_y + v u_z = w .   (2.11)

Assume that p, q, r > 0 on Ω. The seven-point discretization is done as follows:

• For the convective terms a centered finite difference approximation or an upwind discretization scheme is used. For example, if s > 0 on Ω, the term involving the derivative in the x-direction is discretized either as (centered differencing)

s u_x ≈ s_{i,j,k} (u_{i+1,j,k} - u_{i-1,j,k}) / (2h) ,

or as (upwind backward differencing)

s u_x ≈ s_{i,j,k} (u_{i,j,k} - u_{i-1,j,k}) / h .

The terms t u_y and v u_z are discretized in a similar fashion.

• For the diffusive terms a centered difference scheme is applied: a mesh of size h/2 is used and the terms p u_x, q u_y and r u_z are discretized, and then the derivatives of these terms in the x, y and z directions respectively are computed. In doing so, a scheme that defines u on a grid whose mesh size is h is obtained, using values of the coefficient functions on the finer grid whose size is h/2. For example, for the first term in Eq. (1.1),

(p u_x)_x ≈ [ p_{i+1/2,j,k} (u_{i+1,j,k} - u_{i,j,k}) - p_{i-1/2,j,k} (u_{i,j,k} - u_{i-1,j,k}) ] / h^2 .

The terms (q u_y)_y and (r u_z)_z are discretized by applying a similar procedure.
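The second-order accuracy of this treatment of the diffusive terms can be checked in one dimension. The following sketch is an illustration of ours, with hypothetical smooth test functions; it evaluates the discretization of (p u_x)_x at a point and confirms that the error decreases by roughly a factor of four when h is halved.

```python
import numpy as np

def diffusion_term(p, u, x, h):
    # Discretization of (p u_x)_x at x, using the coefficient p at the
    # half-points x - h/2 and x + h/2 (the fine grid of spacing h/2).
    return (p(x + h/2) * (u(x + h) - u(x))
            - p(x - h/2) * (u(x) - u(x - h))) / h**2

# Hypothetical smooth test data with a known exact value of (p u_x)_x:
p = lambda x: 2.0 + np.sin(x)
u = lambda x: np.cos(2.0 * x)
exact = lambda x: (np.cos(x) * (-2.0 * np.sin(2.0 * x))            # p' u'
                   + (2.0 + np.sin(x)) * (-4.0 * np.cos(2.0 * x))) # p u''
```

The pointwise truncation error of this flux form is O(h^2), so halving h should shrink the error by about four.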
The difference operator for an interior gridpoint is given by:

F_h u_{i,j,k} = a_{i,j,k} u_{i,j,k} + b_{i,j,k} u_{i,j-1,k} + c_{i,j,k} u_{i-1,j,k} + d_{i,j,k} u_{i+1,j,k} + e_{i,j,k} u_{i,j+1,k} + f_{i,j,k} u_{i,j,k-1} + g_{i,j,k} u_{i,j,k+1} .   (2.12)

If s_{i,j,k}, t_{i,j,k} and v_{i,j,k} are all positive and centered differences are used, the values of the computational molecule are given by

a_{i,j,k} = p_{i+1/2,j,k} + p_{i-1/2,j,k} + q_{i,j+1/2,k} + q_{i,j-1/2,k} + r_{i,j,k+1/2} + r_{i,j,k-1/2} ,
b_{i,j,k} = -q_{i,j-1/2,k} - t_{i,j,k} h/2 ,   e_{i,j,k} = -q_{i,j+1/2,k} + t_{i,j,k} h/2 ,
c_{i,j,k} = -p_{i-1/2,j,k} - s_{i,j,k} h/2 ,   d_{i,j,k} = -p_{i+1/2,j,k} + s_{i,j,k} h/2 ,
f_{i,j,k} = -r_{i,j,k-1/2} - v_{i,j,k} h/2 ,   g_{i,j,k} = -r_{i,j,k+1/2} + v_{i,j,k} h/2 .   (2.13)

If one uses upwind schemes, then the type of scheme depends on the sign of the convective terms. Assuming that s, t and v do not change sign in the domain, for each of them which is positive one would use the backward scheme, and the forward scheme would be used in the negative case. Discretizing using backward differences yields

a_{i,j,k} = p_{i+1/2,j,k} + p_{i-1/2,j,k} + q_{i,j+1/2,k} + q_{i,j-1/2,k} + r_{i,j,k+1/2} + r_{i,j,k-1/2} + (s_{i,j,k} + t_{i,j,k} + v_{i,j,k}) h ,
b_{i,j,k} = -q_{i,j-1/2,k} - t_{i,j,k} h ,   e_{i,j,k} = -q_{i,j+1/2,k} ,
c_{i,j,k} = -p_{i-1/2,j,k} - s_{i,j,k} h ,   d_{i,j,k} = -p_{i+1/2,j,k} ,
f_{i,j,k} = -r_{i,j,k-1/2} - v_{i,j,k} h ,   g_{i,j,k} = -r_{i,j,k+1/2} ,   (2.14)

and for forward differences the above needs to be modified in an obvious manner. As opposed to the constant coefficient model problem [Eq. (1.20c)], now the components of the computational molecule do depend on the associated coordinates of the gridpoints. Nevertheless, the sparsity structure of the reduced matrix is identical in both cases. The entries of the matrix which are affected by the elimination are specified below.
The entries which change are those coupling each black point to the 19 points of the reduced computational molecule. The multiplier used to eliminate a red neighbor is now the corresponding molecule entry of the black row divided by the diagonal entry of the red row: for example, eliminating the red point (i, j, k-1) from the row of (i, j, k) uses the multiplier f_{i,j,k}/a_{i,j,k-1}, and contributes products such as -f_{i,j,k} g_{i,j,k-1}/a_{i,j,k-1} to the diagonal and -f_{i,j,k} c_{i,j,k-1}/a_{i,j,k-1} to the column of (i-1, j, k-1).

Let R denote the reduced discrete operator, after scaling by h^2. For an interior gridpoint (ih, jh, kh), the operation of R is given by a difference equation over the 19 points of the reduced molecule [Eq. (2.15)]: the diagonal coefficient is

a_{i,j,k} - ( f_{i,j,k} g_{i,j,k-1} / a_{i,j,k-1} + b_{i,j,k} e_{i,j-1,k} / a_{i,j-1,k} + c_{i,j,k} d_{i-1,j,k} / a_{i-1,j,k} + d_{i,j,k} c_{i+1,j,k} / a_{i+1,j,k} + e_{i,j,k} b_{i,j+1,k} / a_{i,j+1,k} + g_{i,j,k} f_{i,j,k+1} / a_{i,j,k+1} ) ,

and each off-diagonal coefficient is the sum of one or two analogous products of two molecule entries divided by the diagonal entry of the eliminated red point; for instance, the coefficient of u_{i-1,j-1,k} is -( b_{i,j,k} c_{i,j-1,k} / a_{i,j-1,k} + c_{i,j,k} b_{i-1,j,k} / a_{i-1,j,k} ).   (2.15)

Notice that Eq. (2.15) involves significantly more floating point operations compared to the seven-point operator. The computational work involved in constructing the reduced matrix is discussed at length in Chap. 5.

2.3 Properties of the Reduced Matrix

The reduced matrix S = E - D B^{-1} C is the Schur complement matrix of the original matrix A. Schur complements arise in many applications. In particular, we mention the class of domain decomposition techniques [22]. These are techniques which are suitable for irregular (e.g. L-shaped) domains: the domain is divided into "simple" (e.g. rectangular) subdomains, where the partial differential equation can be easily solved. The difficulty is that for some partitionings, between each two subdomains there is an "interface" layer of unknowns that belong to both subdomains (see e.g. [95]). The submatrix associated with the interface unknowns is the Schur complement, and is typically a much smaller matrix than the submatrices associated with unknowns that belong to the subdomains. If the solution for the interface unknowns is obtained first, by solving with the Schur complement, then it is easy to find the solution for the rest of the unknowns. This approach is used in several areas of application, for example in cut-loop techniques for multibody systems simulation (see [7] and references therein).
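The interface-first solution process described above can be sketched in a few lines (our own illustration on a small dense matrix; in the domain decomposition setting B would be block diagonal over the subdomains, and the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 12, 4                        # m "interface" unknowns, ordered last
A = rng.standard_normal((N, N)) + N * np.eye(N)   # safely nonsingular
rhs = rng.standard_normal(N)

B, C = A[:-m, :-m], A[:-m, -m:]
D, E = A[-m:, :-m], A[-m:, -m:]
S = E - D @ np.linalg.solve(B, C)   # Schur complement of B in A

# Solve for the interface unknowns first, then back-substitute.
x2 = np.linalg.solve(S, rhs[-m:] - D @ np.linalg.solve(B, rhs[:-m]))
x1 = np.linalg.solve(B, rhs[:-m] - C @ x2)
x = np.concatenate([x1, x2])
```

The m x m Schur complement system is much smaller than the full system, and once x2 is known the remaining unknowns follow from a solve with B alone.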
Schur complements have many useful properties (see, for example, [27]). In the context of cyclic reduction, we are interested in seeing whether numerical properties of the unreduced matrix are preserved, or improved, after one step of cyclic reduction is performed. The results in Lemmas 2.1-2.4 below are not specific to the particular class of problems discussed in this thesis and are well known (see, for example, [95] for some of the results presented below); the discussion and results following these lemmas are specific to our reduced system.

Lemma 2.1. If the original matrix A is nonsingular, then the Schur complement matrix S is nonsingular as well.

Proof. The block-LU decomposition of A is

A = [ B  C ; D  E ] = [ I  0 ; D B^{-1}  I ] [ B  C ; 0  S ] ,   (2.16)

thus S must be nonsingular. □

Lemma 2.2. If the matrix A is symmetric positive definite, then so is S.

Proof. It is straightforward to show that

S = [ -C^T B^{-1}   I ] A [ -B^{-1} C ; I ] ,   (2.17)

and thus, if A is symmetric positive definite, so is S. □

For M-matrices (see Defn. 1.3), we have the following result (the proof can be found in [45] or [9, Thm. 6.10]):

Lemma 2.3. If a matrix is an M-matrix, then so is the associated Schur complement matrix.

As far as conditioning is concerned, we have:

Lemma 2.4. If A is symmetric positive definite, then

κ_2(S) ≤ κ_2(A) .   (2.18)

In other words, the matrix S is better conditioned than A.

Proof. From the proof of Lemma 2.2, B^{-1} is symmetric positive definite, and hence so is C^T B^{-1} C. Since S = E - C^T B^{-1} C, the matrix E - S must be symmetric positive semidefinite, and thus

‖S‖_2 ≤ ‖E‖_2 ≤ ‖A‖_2 .   (2.19)

On the other hand,

A^{-1} = [ B^{-1} + B^{-1} C S^{-1} C^T B^{-1}   -B^{-1} C S^{-1} ; -S^{-1} C^T B^{-1}   S^{-1} ] ,   (2.20)

so we have

‖S^{-1}‖_2 ≤ ‖A^{-1}‖_2 .   (2.21)

From Eqs.
(2.19) and (2.21) the result stated in the lemma readily follows. □

Moving to the nonsymmetric case, we first note that, as already mentioned in the Introduction, the first step of cyclic reduction is attractive mainly because the matrix to be inverted is diagonal. In fact, matrices that arise from discretization of differential equations using three-point, five-point or seven-point operators all belong to a special class of matrices, defined as follows [67, §9.2]:

Definition 2.1. Let T be partitioned into q^2 submatrices T_{i,j}:

T = [ T_{1,1}  ...  T_{1,q} ; ... ; T_{q,1}  ...  T_{q,q} ] .   (2.22)

T is said to have block Property A with respect to this partitioning if there exist two disjoint subsets S_R and S_B of {1, 2, ..., q} such that S_R ∪ S_B = {1, ..., q}, and such that if T_{i,j} ≠ 0 and i ≠ j, then i ∈ S_R and j ∈ S_B, or i ∈ S_B and j ∈ S_R.

Property A was first defined by Young [116], and was later generalized by Arms, Gates & Zondek [4], and by Varga [112]. The matrices that are associated with the three-point, five-point and seven-point operators all have Property A with respect to 1 x 1 submatrices; showing this is easy: the two subsets S_R and S_B mentioned in Definition 2.1 correspond to the set of red gridpoints and the set of black gridpoints when red/black ordering is used. A matrix that corresponds to any arbitrary ordering of the unknowns can be transformed by a symmetric permutation to the matrix corresponding to red/black ordering, and thus it has Property A. The question whether a matrix has Property A or not is very significant in the convergence analysis of stationary methods, in particular the SOR scheme [117]. We will discuss this in Chapter 4, where we will also address the question in what circumstances the reduced matrix has Property A, and relative to which partitionings. In Fig.
2.6, plots and histograms of the eigenvalues of both the unreduced and the reduced systems are depicted, for the symmetric positive definite case corresponding to Poisson's equation

-Δu = f   (2.23)

discretized on a 12 x 12 x 12 grid. In the histograms, the x-axis shows the eigenvalues and the y-axis their count. The eigenvalues of the unreduced system are explicitly known in analytical terms and are distributed between 0 and 12. The maximal eigenvalue is close to 12, and the minimal eigenvalue is close to 0. On the other hand, the reduced matrix has all its eigenvalues between 0 and 6, and it is evident, either from the histogram or from the moderate slope of the graph of the eigenvalues (in the region of the largest ones), that the majority of the eigenvalues lie between 5 and 6. Also, the smallest eigenvalues of the reduced matrix are more isolated compared to the unreduced matrix, and are larger than the smallest eigenvalues of the unreduced matrix. In this particular case the minimal eigenvalues of the unreduced and reduced matrices are 0.174 and 0.344 respectively. The condition number of the unreduced matrix is larger by a factor of 3.885. Since the rate of convergence of Krylov subspace methods strongly depends on the condition number of the matrix and on the clustering of the eigenvalues, the above observations lead to the conclusion that convergence of the reduced solver will be faster.

Figure 2.6: eigenvalues of both systems for Poisson's equation on a 12 x 12 x 12 grid — (a) unreduced, (b) reduced.

Similar observations apply to the general nonsymmetric case; when convection is moderate in size, the structure of the eigenvalues is very similar to the structure presented in Fig. 2.6. When convection dominates, the situation is a bit different, but still, the reduced matrix is better conditioned. In Fig.
2.7 we refer to a more difficult problem: the singularly perturbed equation

-ε Δu + v · ∇u = f ,   (2.24)

where v = ( (x - 1/2)^2 , (y - 1/2)^2 , (z - 1/2)^2 )^T and the domain is the unit cube (0,1) x (0,1) x (0,1). Upwind differencing is used to discretize the problem on a uniform grid. Taking ε = 0.02 on an 8 x 8 x 8 grid, we get that the spectral condition number of the unreduced matrix is κ_2 = 2292, with σ_min = 0.0098 and σ_max = 22.46; the condition number of the reduced matrix is κ_2(R) = 759, with σ_min = 0.0195 and σ_max = 14.80. In Fig. 2.7 we give the histograms of the singular values of both matrices for this particular problem.

Figure 2.7: singular values of both matrices for Eq. (2.24) (8 x 8 x 8 grid) — (a) unreduced, (b) reduced.

Another point of interest is the circumstances in which the reduced matrix is an irreducibly diagonally dominant M-matrix. Here, in a way similar to [40, Cor. 3], we can obtain the following useful result:

Lemma 2.5. If be, cd, fg > 0, then both the unreduced and the reduced matrices are diagonally dominant M-matrices.

Proof. Consider the unreduced matrix. For the matrix to be an M-matrix, a necessary condition is that the off-diagonal elements be nonpositive, which is readily obtained if be, cd, fg > 0. It is straightforward to see, by substituting the values of the computational molecule in (1.17),
A s for the reduced m a t r i x , consider a row associated w i t h an interior point: diagonal dominance is translated to the requirement that a 2 - 2be - 2cd - 2fg > b + c + d 2 2 2 + e + f 2 2 + g 2 + 2\bf\ + 2\cf\ + 2\df\ + 2\ef\ +2\bc\ + 2\bd\ + 2\ce\ + 2\de\ + 2\bg\ + 2\cg\ + 2\dg\ + 2\eg\ . (2.25) If (2.25) holds, then diagonal dominance of rows which correspond to points close to the boundary also holds, as in this case the associated diagonal entry is larger. E q . (2.25) is identical to a 2 > (H + |c| + | d | + H + | / | + \g\) , which holds because the unreduced m a t r i x is diagonally 2 d o m i n a n t , by L e m m a 2.3 the reduced m a t r i x is also an M - m a t r i x . â€¢ T h e results mentioned in this section are i m p o r t a n t in the sense t h a t if the original m a t r i x has valuable numerical properties, then so does the reduced m a t r i x ; the procedure of cyclic reduction does not damage properties such as diagonal dominance or positive definiteness. 45 Chapter 3 O r d e r i n g Strategies T h e question of ordering of the unknowns is of importance for both iterative and direct solvers, as a good ordering strategy can lead to significant saving in c o m p u t a t i o n a l work. A m o n g books t h a t address this subject we mention George & L i u [52] and Duff, E r i s m a n & R e i d [35]. A m o n g the m a n y papers that deal w i t h effects of orderings on preconditioned iterative methods (in particular K r y l o v subspace solvers) we mention [14],[23],[29],[31],[36],[37],[89]. T h e r e are several possible "guidelines" for picking orderings. P o p u l a r ones are orderings for s m a l l b a n d w i d t h , which a t t e m p t to minimize the profile of the m a t r i x (e.g. C u t h i l l - M c K e e [28] or Reverse C u t h i l l - M c K e e [52]), or orderings which attempt to m i n i m i z e the amount of fill-in, i.e. 
the number of entries that were initially zero and become nonzero in the process of elimination (e.g. Minimum Degree [108]). A convenient way to obtain such orderings is by working with the graph of the matrix [52]. Ordering strategies which consider the graph of the matrix as well as its values are MDF [29] and TPABLO [83]. In seeking an effective ordering technique for our reduced matrix, our purpose in general is to choose a strategy that results in a well-structured narrow-banded matrix. In addition, we are interested in ordering the unknowns so that the main diagonal blocks of the matrix are as dense as possible. Some motivation for this can be given, for example, by considering the following result due to Varga [113, Thm 3.15]: Let A = M_1 - N_1 = M_2 - N_2 be two regular splittings of A, where A^{-1} ≥ 0. If N_2 ≥ N_1 ≥ 0,
Inversion of the diagonal block would then still require only a number of floating point operations which is proportional to the number of unknowns, and at the same time a substantial part of the m a t r i x is to be included in its block diagonal part. E x a m i n i n g the sparsity pattern of the lexicographically ordered m a t r i x ( F i g . 2.5) reveals i the flaws of this ordering: it is not suitable for the reduced grid, as it does not take into account the structure of the grid, that is, the fact that the red points are now missing. In a d d i t i o n , it does not take into account the special structure of the c o m p u t a t i o n a l molecule of the reduced operator. Since the stencil is not compact, and contains gridpoints from five parallel planes, an ordering strategy which numbers unknowns from more than one plane at a time might be required. A n o t h e r major point of consideration should be the ability to parallelize the solution procedures: the matrices considered here are very large due to the number of space dimensions, and their block structure might lend itself to parallelism. Below we present ordering strategies which can be applied to any three-dimensional problem. T h e idea is to divide the unknowns into one-dimensional and two-dimensional sets, and 47 Chapter 3. Ordering analyze the Strategies o r d e r i n g not only relative to the three-dimensional g r i d but lower d i m e n s i o n a l grid of sets of 3.1 also relative to the gridpoints. Block Ordering Strategies for 3D Grids D e n o t e the n u m b e r of u n k n o w n s by N and suppose t h a t a certain g r i d p o i n t ' s i n d e x I, 1 < Â£ < is associated w i t h c o o r d i n a t e values x, y, z. W e denote its coordinate indices by i = | , j N, = k â€” I", and refer to sets of coordinate indices s i m p l y as " c o o r d i n a t e sets". T h e c o o r d i n a t e indices are integers a s s u m i n g values from 1 to n. Here n is the same as i n the previous chapters. 
Definition 3.1. Let {S_m} be disjoint O(n)-item sets of clustered gridpoints, such that in each set S_m exactly one of the coordinate sets contains all integers from 1 to n, and ∪_m S_m = {1, ..., N}. Each S_m is a 1D block of gridpoints, and we say that it is x-oriented, y-oriented or z-oriented, according to which independent variable is the one associated with all integers from 1 to n in its coordinate set.

Definition 3.2. Let {T_m} be disjoint O(n^2)-item sets of clustered gridpoints, such that in each set T_m two of the coordinate sets contain all integers from 1 to n, and ∪_m T_m = {1, ..., N}. Each T_m is a 2D block of gridpoints, and we say that it is x-y oriented, x-z oriented or y-z oriented, according to the two independent variables that are associated with all integers from 1 to n.

Notice that from the above definitions it follows that the 1D blocks do not necessarily form single lines, and the 2D blocks do not necessarily form single planes. Below we define a 2D block grid and a block computational molecule.

Definition 3.3. The 2D block grid of a three-dimensional grid is defined as the grid of s_1-oriented 1D blocks, where s_1 is one of {x, y, z}. The independent variables associated with the points of this grid are {s_2, s_3} = {x, y, z}\{s_1}. The ordering of the 1D blocks can then be carried out in this two-dimensional grid.

Definition 3.4. For a given gridpoint, its associated block computational molecule is defined as the computational molecule in the corresponding 2D block grid. That is, its components are all the 1D blocks which contain at least one gridpoint that belongs to the computational molecule associated with the point in the 3D problem.
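As a concrete instance of Definitions 3.1-3.3, the simplest choice of x-oriented 1D blocks is the grid lines themselves. The sketch below (our own illustration) builds them and checks that they partition the grid and that the resulting block grid is two-dimensional:

```python
def x_oriented_lines(n):
    """The simplest x-oriented 1D blocks: the grid lines
    {(i, j, k) : i = 1..n} for fixed (j, k).  The definitions also
    allow more general clustered sets, which need not be single
    lines (e.g. the two-plane blocks of Section 3.2)."""
    return {(j, k): [(i, j, k) for i in range(1, n + 1)]
            for k in range(1, n + 1) for j in range(1, n + 1)}
```

Each key (j, k) is one point of the 2D block grid, whose independent variables are y and z.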
Using the above definitions, we can now easily define variants of block orderings:

Definition 3.5. A certain 3D ordering strategy is called natural block ordering if the one-dimensional blocks are ordered in the corresponding 2D block grid using the natural lexicographic ordering.

Definition 3.6. A certain 3D ordering strategy is called red/black block ordering if the one-dimensional blocks are ordered in the corresponding 2D block grid using red/black ordering.

Definition 3.7. A certain 3D ordering strategy is called toroidal block ordering if the one-dimensional blocks are ordered in the corresponding 2D block grid using the toroidal ordering strategy [41].

In Fig. 3.1 we illustrate the above-mentioned orderings. See [41] for an explanation of the toroidal ordering and its advantages.

Figure 3.1: orderings of the 2D block grid — (a) natural, (b) red/black, (c) toroidal. Each point in these grids corresponds to a 1D block of gridpoints in the underlying 3D grid.

Having established the general idea, we now consider particular families of orderings for the reduced grid. Various families can be determined according to the way their 1D and 2D sets of gridpoints are picked.

3.2 The Family of Two-Plane Orderings

This ordering corresponds to ordering the unknowns by gathering blocks of 2n gridpoints from two horizontal lines in two adjacent planes. In Figure 3.2 three different members of the natural two-plane ordering family are depicted. For notational convenience, we label this family by "2PN" ("2P" stands for two-plane, and "N" stands for natural). Similarly, we name the red/black family and the toroidal family "2PRB" and "2PT" respectively.
Two additional letters are added in order to distinguish between members of the family: for example, 2PNxy is the ordering associated with 1D blocks of gridpoints in the x-direction and 2D blocks in the x-y direction, and so on.

(a) 2PNxy    (b) 2PNxz    (c) 2PNyz

Figure 3.2: three members of the family of natural two-plane orderings.

Let us illustrate the definitions presented in the previous section by examining the ordering 2PNxy. The basic one-dimensional sets are 2n-item sets: in the specific case of Fig. 3.2(a), which corresponds to n = 4, indices 1-8, 9-16, 17-24 and 25-32 are each a one-dimensional set which forms a "stripe" of size n × 2 × 2; thus the 1D sets are x-oriented. Next, the sets 1-16 and 17-32 form an n × n × 2 shape; thus the 2D sets are x-y oriented. The block grid is thus a 2D grid of x-oriented 1D sets of gridpoints, associated with the independent variables y and z, and if we associate y with rows and z with columns, then the 1D sets are ordered in a natural lexicographic rowwise fashion.

In Fig. 3.3 we depict the two-plane red/black and toroidal families of orderings. The combination of Fig. 3.1, Fig. 3.2 and Fig. 3.3 clarifies how the ordering for a larger grid is done.

(a) 2PRBxy    (b) 2PTxy

Figure 3.3: red/black and toroidal two-plane orderings corresponding to x-y oriented 2D blocks.

We now specify the entries of the matrices for the two-plane family of orderings. We arbitrarily pick one member of this family, 2PNxz; any other family member's specific entries can be deduced simply by interchanging the roles of x, y and z, or by changing the indexing of the blocks in an obvious manner. The matrix can be written as a block tridiagonal matrix of order n/2,

    S = tri[ S_{j,j-1}, S_{j,j}, S_{j,j+1} ] .    (3.2)

Each of the above components of S is an n² × n² matrix, block tridiagonal relative to 2n × 2n blocks.
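The block tridiagonal structure (3.2) can be assembled generically from its blocks; the sketch below (an illustrative helper, not the thesis's construction) builds a dense matrix from given sub-, main- and superdiagonal block lists.

```python
import numpy as np

def block_tridiag(sub, diag, sup):
    """Assemble tri[S_{j,j-1}, S_{j,j}, S_{j,j+1}] from lists of square blocks.

    diag holds p blocks of size m x m; sub and sup hold p-1 blocks each."""
    p, m = len(diag), diag[0].shape[0]
    S = np.zeros((p * m, p * m))
    for j in range(p):
        S[j*m:(j+1)*m, j*m:(j+1)*m] = diag[j]
        if j > 0:                                  # subdiagonal block S_{j,j-1}
            S[j*m:(j+1)*m, (j-1)*m:j*m] = sub[j-1]
        if j < p - 1:                              # superdiagonal block S_{j,j+1}
            S[j*m:(j+1)*m, (j+1)*m:(j+2)*m] = sup[j]
    return S
```

For the reduced matrix of 2PNxz one would take p = n/2 and m = n², with each block itself block tridiagonal relative to 2n × 2n blocks.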
Let us use superscripts (−1), (0) and (1) to denote the subdiagonal, diagonal and superdiagonal blocks of each of the block matrices, respectively. Denote by 'cntr' the center of the computational molecule, specified by (2.6) and the explanation that follows. The diagonal submatrices S_{j,j}^{(0)} have nonzero entries in diagonals −4 to 4 [Eq. (3.3)]: the outermost diagonals carry the entries −c² and −d², the main diagonal carries cntr, and each intermediate diagonal carries entries of the form −2pq·E_σ, where pq is a product of two components of the computational molecule (the products cf, ce, bc, bg, df, eg, bd, ef, de, bf and dg occur), and E_σ is a diagonal 0/1 indicator matrix whose subscript σ (for example E_{1001} or E_{0110}) gives the period-4 pattern of positions, within the 2n × 2n block, at which the coupling is present.

The other submatrices contained in S_{j,j} are of an irregular structure. The superdiagonal submatrix S_{j,j}^{(1)} has nonzero entries in diagonals −3 to 1, consisting of the g-products −2cg·E_σ, −2eg·E_σ, −2bg·E_σ and −2dg·E_σ together with a zero diagonal [Eq. (3.4)], and the subdiagonal submatrix S_{j,j}^{(−1)} has nonzero entries in diagonals −1 to 3, consisting of the corresponding f-products and −f² [Eqs. (3.5)-(3.6)].

The components of S_{j,j+1} and S_{j,j−1} are given by Eqs. (3.7)-(3.11). The blocks of S_{j,j+1} involve products with e; in particular

    S_{j,j+1}^{(0)} = penta[ −2ce·E_{0110}, −2ef·E_{0010}, −e², −2eg·E_σ, −2de·E_σ ] ,    (3.7)

and its off-diagonal components are diagonal matrices such as diag[−2ef·E_σ] and diag[−2eg·E_σ] [Eqs. (3.8)-(3.9)]. The blocks of S_{j,j−1} involve products with b; in particular

    S_{j,j−1}^{(0)} = penta[ −2bc·E_σ, −2bf·E_σ, −b², −2bg·E_{0011}, −2bd·E_σ ] ,    (3.10)

and its off-diagonal components are diagonal matrices such as diag[−2bg·E_{1000}] [Eq. (3.11)].

The sparsity pattern of the matrix is depicted in Fig. 3.4(a).

Figure 3.4: sparsity patterns of two members of the two-plane family of orderings; (a) natural, (b) 2D red/black (both matrices have nz = 3760).

Having specified the entries of the matrix, we can now find the block computational molecule corresponding to the two-plane ordering. Refer to the natural ordering depicted in Fig. 3.2. Examining the 2n-item blocks of the ordering, we get a block computational molecule of the form depicted in Fig. 3.5(c). Figs. 3.5(a) and 3.5(b) illustrate which sets are associated with a single gridpoint. As is evident, the structure depends upon the parity of one of the gridpoint's indices. The block computational molecule is obtained by taking the union of the 2n-item sets associated with each of the gridpoints in the block, and is identical to the computational molecule of the classical nine-point operator. Revealing this structure allows one to learn about the cyclically reduced operator (as a block operator) from what is known about the 9-point operator. For example, it is clear that the reduced matrix does not have block Property A relative to partitioning into 2n × 2n blocks (a discussion of this is to come in Sec. 3.4).

Figure 3.5: block computational molecule corresponding to the family of orderings 2PN; panels (a) and (b) show the sets associated with a single gridpoint for the two parities of k, and panel (c) shows the resulting computational molecule.

The 9-point operator gives rise to matrices for which parallelization can be obtained by using four-color schemes [57],[95]. In general, multicoloring techniques are useful for parallel computation [95]. Here we consider a four-color ordering of 1D sets of gridpoints; the resulting matrix for the two-plane ordering is illustrated in Fig. 3.6. Notice that the main blocks are narrow-banded block diagonal matrices, and thus are easy to invert.
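The four-color idea for a 9-point block coupling can be sketched as follows (an illustration under the assumption of a plain 9-point stencil, not thesis code): coloring the 2D block grid by the parities of its two indices guarantees that no two same-colored blocks are coupled, since a 9-point stencil only couples points at Chebyshev distance 1.

```python
def four_color(m):
    """Group the points of an m x m block grid into four colors by index
    parity.  Under 9-point coupling, same-color points are uncoupled."""
    colors = {0: [], 1: [], 2: [], 3: []}
    for row in range(m):
        for col in range(m):
            colors[2 * (row % 2) + (col % 2)].append((row, col))
    return colors
```

Because same-color points are mutually uncoupled, the corresponding diagonal blocks of the reordered matrix are easy to invert, which is the parallelization payoff mentioned above.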
Figure 3.6: four-color 1D block version of the two-plane ordering (the matrix is of size 2048 × 2048, obtained by discretization on a 16 × 16 × 16 grid).

The same ideas can be applied to 2D blocks of gridpoints. The block grid in this case is one-dimensional. In Fig. 3.4(b) the sparsity pattern of a 256 × 256 matrix corresponding to red/black ordering based on 2D blocks is depicted. As in the common cases of red/black ordering for problems with a smaller number of space dimensions, the bandwidth of the red/black matrix is larger than the one corresponding to natural ordering.

3.3 The Family of Two-Line Orderings

The two-plane ordering, presented in the previous section, is unique to three-dimensional problems. We now consider an alternative: a straightforward generalization to 3D of the two-line ordering for 2D problems used by Elman & Golub in [41]. In this ordering the numbering is done in pairs of horizontal lines that lie in a single plane. The ordering and the sparsity structure of the matrix corresponding to the natural two-line ordering 2LNxy are depicted in Fig. 3.7. Other members of the family can be obtained in a straightforward manner. Here the 1D blocks are sets of n gridpoints, and each of the 2D blocks corresponds to a single x-y plane. The reduced matrix for this ordering strategy can be written as a block pentadiagonal matrix:

    S = penta[ S_{j,j−2}, S_{j,j−1}, S_{j,j}, S_{j,j+1}, S_{j,j+2} ] .    (3.12)

Figure 3.7: ordering and the sparsity pattern of the matrix associated with the 2LNxy ordering strategy.

Each S_{i,j} is (n²/2) × (n²/2), and is a combination of n/2 uncoupled matrices, each of size n × n. The diagonal matrices {S_{j,j}} are themselves block tridiagonal. Each submatrix is of size n × n, and its diagonal block is

    S_{j,j}^{(0)} = penta( −c², −2ce·E_{10} − 2bc·E_{01}, cntr, −2bd·E_{10} − 2de·E_{01}, −d² )   for j odd,
    S_{j,j}^{(0)} = penta( −c², −2bc·E_{10} − 2ce·E_{01}, cntr, −2de·E_{10} − 2bd·E_{01}, −d² )   for j even,

where E_{10} and E_{01} are diagonal 0/1 matrices indicating alternating positions. For the superdiagonal and the subdiagonal blocks of the matrices S_{j,j} we have an irregular tridiagonal structure, which depends on whether j is even or odd. The superdiagonal matrices are tridiagonal with main diagonal −e² and with the entries −2ce and −2de alternating along the off-diagonals, the alternation pattern being interchanged between odd j [Eq. (3.13)] and even j [Eq. (3.14)]. The subdiagonal matrices have the same structure, with −b² on the main diagonal and the entries −2bc and −2bd on the off-diagonals [Eqs. (3.15)-(3.16)].
The superdiagonal and the subdiagonal blocks of S, namely S_{j,j±1}, are block tridiagonal; their components again depend on the parity of j:

    S_{j,j−1}^{(−1)} = diag( −2bf·E_{01} ) for j odd,  diag( −2bf·E_{10} ) for j even ;    (3.17)
    S_{j,j−1}^{(0)}  = tri( −2cf, −2bf·E_{10} − 2ef·E_{01}, −2df ) for j odd,
                       tri( −2cf, −2bf·E_{01} − 2ef·E_{10}, −2df ) for j even ;    (3.18)
    S_{j,j−1}^{(1)}  = diag( −2ef·E_{10} ) for j odd,  diag( −2ef·E_{01} ) for j even ;    (3.19)
    S_{j,j+1}^{(−1)} = diag( −2bg·E_{01} ) for j odd,  diag( −2bg·E_{10} ) for j even ;    (3.20)
    S_{j,j+1}^{(0)}  = tri( −2cg, −2bg·E_{10} − 2eg·E_{01}, −2dg ) for j odd,
                       tri( −2cg, −2bg·E_{01} − 2eg·E_{10}, −2dg ) for j even ;    (3.21)
    S_{j,j+1}^{(1)}  = diag( −2eg·E_{10} ) for j odd,  diag( −2eg·E_{01} ) for j even .    (3.22)

Finally, the matrices S_{j,j−2} and S_{j,j+2} are diagonal:

    S_{j,j−2} = diag( −f² ) ,  j = 3, ..., n ;    (3.23)
    S_{j,j+2} = diag( −g² ) ,  j = 1, ..., n−2 .    (3.24)

The block computational molecule corresponding to the two-line ordering, relative to splitting into 1D blocks, is presented in Fig. 3.8.

Figure 3.8: block computational molecule corresponding to the ordering strategy 2LNxy; panels (a) and (b) show the sets associated with a single gridpoint for the two parities of k, and panel (c) shows the resulting computational molecule.

3.4 Comparison Results

In order to compare the various orderings, we consider stationary methods, for which there are some useful general results [113] that make it possible to relate the orderings to the rate of convergence. Various splittings are possible. We consider two obvious ones, based on dimension: the term "one-dimensional splitting" is used for a splitting which is based on partitioning the matrix into O(n²) blocks (2n × 2n blocks for the two-plane ordering and n × n blocks for the two-line ordering).
A "two-dimensional splitting" is one which is based on partitioning the matrix into O(n) blocks (n² × n² blocks for the two-plane ordering and (n²/2) × (n²/2) blocks for the two-line ordering). In other words, the 1D (2D) splitting for both ordering strategies is essentially associated with 1D (2D) sets of gridpoints. The partitionings for the two-plane matrix are illustrated in Fig. 3.9.

Figure 3.9: possible block partitionings of the two-plane matrix; (a) 1D partitioning, (b) 2D partitioning (nz = 1440).

An illustration of the importance of using ordering strategies which fit the structure of the reduced grid is given in Fig. 3.10, where the sparsity patterns of a single n² × n² diagonal block of the two-plane matrix and of the matrix that corresponds to point natural lexicographic ordering are depicted. The matrices in the figure arise from discretization on a 12 × 12 × 12 grid. If the two matrices are regarded as block tridiagonal, the bandwidth of the block diagonal part of the two-plane matrix is significantly smaller. At the same time, if only diagonals whose indices do not depend on n are considered, then the two-plane matrix has more of them grouped next to the main diagonal.

In Fig. 3.11 the spectral radii of the one-dimensional Jacobi iteration matrices for both the two-plane ordering and the natural lexicographic ordering are plotted. The graphs correspond to a constant coefficient case, and two cross-sections of the mesh Reynolds numbers are given. As is evident, the two-plane iteration matrix has a smaller spectral radius. This results in faster convergence.
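The effect shown in Fig. 3.11 can be reproduced in miniature. The sketch below (an illustrative helper, not the thesis's computation) computes the spectral radius of a block Jacobi iteration matrix for a given block size; on the 1D model matrix tridiag(−1, 2, −1), enlarging the blocks shrinks the radius, which is the same qualitative effect the figure shows for the 3D reduced matrix.

```python
import numpy as np

def block_jacobi_radius(A, m):
    """Spectral radius of the block Jacobi iteration matrix D^{-1}(D - A),
    where D is the block diagonal part of A with m x m diagonal blocks."""
    n = A.shape[0]
    D = np.zeros_like(A)
    for s in range(0, n, m):
        D[s:s+m, s:s+m] = A[s:s+m, s:s+m]
    T = np.linalg.solve(D, D - A)
    return max(abs(np.linalg.eigvals(T)))

# 1D model problem: point Jacobi vs. 2x2-block Jacobi
A = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
assert block_jacobi_radius(A, 2) < block_jacobi_radius(A, 1)
```

For this 4 × 4 model matrix the point Jacobi radius is cos(π/5) ≈ 0.809, while the 2 × 2-block radius drops to 2/3.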
It is clear, then, that block ordering strategies such as the two-plane ordering can be more effective than natural lexicographic ordering of the reduced grid. At this point, we compare the two families of block orderings that have been introduced in the previous sections: the two-plane ordering and the two-line ordering.

Theorem 3.1. If the reduced matrix is an M-matrix, then for both the 1D and the 2D partitionings the spectral radius of the Jacobi iteration matrix associated with two-plane ordering is smaller than the spectral radius of the iteration matrix associated with two-line ordering.

Proof. Each of the two ordering strategies generates a matrix which is merely a symmetric permutation of a matrix associated with the other ordering. Suppose S₁ = M₁ − N₁ is either a 1D splitting or a 2D splitting of the two-plane ordering matrix, and S₂ = M₂ − N₂ is an analogous splitting for the two-line ordering, with 1D and 2D blocks of gridpoints oriented in the same direction. There exists a permutation matrix P such that PᵀS₂P = S₁. Consider the splitting PᵀS₂P = PᵀM₂P − PᵀN₂P. It is straightforward to show, by examining the matrix entries, that PᵀN₂P ≥ N₁. The latter are both nonnegative matrices; thus by [113, Thm 3.15] it follows that 0 ≤ ρ(M₁⁻¹N₁) ≤ ρ(PᵀM₂⁻¹N₂P) = ρ(M₂⁻¹N₂) < 1. ∎

Figure 3.10: a zoom on 2D blocks of the matrices corresponding to two ordering strategies of the reduced grid; (a) lexicographic, (b) two-plane (nz = 1636).

Some of our numerical experiments, presented in Fig. 3.12, validate the results indicated in Thm. 3.1. In the figure, spectral radii of the block Jacobi iteration matrices are presented.
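The comparison theorem [113, Thm 3.15] invoked in the proof can be illustrated on a small M-matrix with two regular splittings (a sketch with hypothetical splittings, not the thesis's matrices): the splitting with the elementwise larger nonnegative N has the larger spectral radius.

```python
import numpy as np

def rho(T):
    """Spectral radius of a matrix."""
    return max(abs(np.linalg.eigvals(T)))

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # an M-matrix
M1 = np.tril(A)                  # Gauss-Seidel splitting, N1 = M1 - A
M2 = np.diag(np.diag(A))         # Jacobi splitting,      N2 = M2 - A
N1, N2 = M1 - A, M2 - A
assert (N1 >= 0).all() and (N2 >= N1).all()            # regular, N2 >= N1
r1 = rho(np.linalg.solve(M1, N1))
r2 = rho(np.linalg.solve(M2, N2))
assert r1 <= r2 < 1              # larger N gives the larger spectral radius
```

For this consistently ordered model matrix the two radii are in fact related by r1 = r2², the classical Jacobi/Gauss-Seidel relation.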
It is interesting to observe that the spectral radius of the two-plane iteration matrix is smaller also in the case be, cd, fg < 0, which corresponds to the region of mesh Reynolds numbers larger than 1, for which the theorem does not apply. A few cross-sections of mesh Reynolds numbers are examined. For example, graph (a) corresponds to flow with the same velocity in the x, y and z directions. Graph (b) corresponds to flow only in the x and y directions, with no convection in the z direction, and so on [see Eq. (1.16) for the definition of β, γ and δ].

We note that the bandwidth of the two-plane matrix is larger than the bandwidths of the naturally lexicographically ordered matrix and the two-line matrix: n² + 2n vs. n². However, the difference is negligible, as typically 2n ≪ n². A smaller bandwidth can be obtained by using direct solver orderings such as RCM (see Fig. 3.13). However, numerical experiments that have been performed with this ordering show slower convergence of block stationary methods, compared to the two-plane ordering.

Figure 3.11: comparison of the spectral radii of block Jacobi iteration matrices for certain cross-sections of the mesh Reynolds numbers; panel (b) corresponds to γ = δ = 0. The upper curves correspond to lexicographic ordering; the lower curves correspond to two-plane ordering. The matrices were obtained using centered difference discretization.

Next, we consider the issue of consistent ordering. Recall the definition of consistently ordered matrices (see [67],[95],[117]):

Definition 3.8. A matrix is said to be consistently ordered if the vertices of its adjacency graph can be partitioned in p sets S₁, S₂, ..., S_p with the property that any two adjacent vertices i and j in the graph belong to two consecutive partitions S_k and S_{k′}, with k′ = k − 1 if j < i, and k′ = k + 1 if j > i.

A matrix that is consistently ordered has Property A; conversely, a matrix with Property A can be permuted so that it is consistently ordered [95]. It should be noted that not all consistently ordered matrices for a given problem have the same Jordan canonical form [47]. This is of importance, because the size of the Jordan block associated with the largest eigenvalue (in magnitude) affects the magnitude of the spectral norm of the iteration matrix; the latter affects the speed of convergence [47],[113].

It was mentioned in the Introduction that the unreduced matrix has Property A relative to partitioning into 1 × 1 matrices.

Figure 3.12: spectral radii of iteration matrices vs. mesh Reynolds numbers for the block Jacobi scheme, using 1D splitting and centered difference discretization; (a) β = γ = δ, (b) β = γ, δ = 0, (c) γ = δ = 0, (d) β = 0, γ = δ, (e) β = γ = 0, (f) β = δ = 0. The broken lines correspond to two-plane ordering; the solid lines correspond to two-line ordering.

For the reduced system, we have the following observations:

Proposition 3.1. The reduced matrix associated with two-line ordering, S_L, does not have Property A relative to 1D or 2D partitionings.

Proof. This can be deduced directly from the structure of the block computational molecules of the 1D and 2D partitionings. Formally, let S_{i,j} denote the (i,j)th n × n block of S_L, and let Q be an (n²/2) × (n²/2) matrix whose entries satisfy q_{ij} = 1 if S_{i,j} ≠ 0 and q_{ij} = 0 otherwise.
Let T be an n × n matrix such that t_{ij} = 1 if the (i,j)th (n²/2) × (n²/2) block submatrix of S_L is nonzero, and t_{ij} = 0 otherwise. T is a pentadiagonal matrix, and thus does not have Property A. Since Q could have Property A only if T did, it does not have Property A either. ∎

Proposition 3.2. The reduced matrix associated with two-plane ordering, S_P, does not have Property A relative to 1D partitioning.

Proof. Let S_{i,j} denote the (i,j)th 2n × 2n block of S_P, and let Q be an (n²/4) × (n²/4) matrix whose entries satisfy q_{ij} = 1 if S_{i,j} ≠ 0 and q_{ij} = 0 otherwise. It is straightforward to see that the nonzero pattern of Q is identical to that of the matrix associated with using a 9-point operator on a 2D grid (indeed, this is indicated by the structure of the block computational molecule). Since the latter does not have Property A relative to partitioning into 1 × 1 matrices, the result follows. ∎

Figure 3.13: symmetric reverse Cuthill-McKee ordering of the reduced matrix (of size 256 × 256).

On the other hand, we have:

Proposition 3.3. The reduced matrix associated with two-plane ordering, S_P, has Property A, and moreover is consistently ordered, relative to 2D partitioning.

Proof. The matrix is block tridiagonal relative to this partitioning, and thus is consistently ordered [117]. ∎

The above results indicate that it will be difficult to analyze the convergence properties of the reduced matrices, as they are not necessarily consistently ordered. In Chapter 4 we will address this point and provide some analysis which overcomes the loss of Property A for the two-plane matrix, relative to 1D partitioning.
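Property A relative to 1 × 1 partitioning is equivalent to 2-colorability (bipartiteness) of the off-diagonal adjacency graph, so observations like the ones above can be checked mechanically. The sketch below is an illustrative helper, not thesis code.

```python
import numpy as np
from collections import deque

def has_property_A(A):
    """A matrix has Property A (w.r.t. 1x1 partitioning) iff the graph of
    its off-diagonal nonzeros is 2-colorable, i.e. bipartite."""
    n = A.shape[0]
    adj = [[j for j in range(n) if j != i and (A[i, j] != 0 or A[j, i] != 0)]
           for i in range(n)]
    color = [-1] * n
    for s in range(n):                   # BFS 2-coloring of each component
        if color[s] != -1:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            i = q.popleft()
            for j in adj[i]:
                if color[j] == -1:
                    color[j] = 1 - color[i]
                    q.append(j)
                elif color[j] == color[i]:
                    return False
    return True
```

A tridiagonal matrix passes this test, while a matrix with 9-point coupling fails it, because diagonal coupling creates triangles in the adjacency graph.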
Chapter 4. Convergence Analysis for Block Stationary Methods

In this chapter bounds on the convergence rates of the Jacobi, Gauss-Seidel and SOR schemes are derived. First, the constant coefficient case is analyzed, and then the results are generalized to the variable coefficient case. We consider block splittings, which well suit the block structure of the reduced matrix. In particular, the 1D and the 2D splittings discussed in Chapter 3 are considered. Since the two-plane ordering was shown in Chapter 3 to be more effective than other orderings, we focus on matrices associated with this ordering. We analyze the block Jacobi scheme, derive upper bounds on convergence rates, and demonstrate their tightness. We then discuss consistently ordered matrices [113],[117] in the context of the reduced matrix, and use our observations to derive bounds for the block Gauss-Seidel and SOR schemes. Finally, convergence results are derived for the unreduced (original) system, and are compared to the bounds on convergence rates for the reduced system.

4.1 Symmetrization of the Reduced System

In order to estimate the spectral radii of the iteration matrices involved in solving the reduced system, the following idea of Elman & Golub [40],[41] is used: denoting the reduced matrix by S, suppose there exists a real diagonal nonsingular matrix Q such that S̃ = Q⁻¹SQ is symmetric. The symmetry can then be used as follows: suppose S = D − C is a splitting such that D̃ = Q⁻¹DQ is symmetric positive definite and C̃ = Q⁻¹CQ is symmetric. Since D⁻¹C and D̃⁻¹C̃ are similar, it follows that

    ρ(D⁻¹C) = ρ(D̃⁻¹C̃) ≤ ‖D̃⁻¹C̃‖₂ ≤ ‖C̃‖₂ / λ_min(D̃) .    (4.1)

The latter is useful, since the eigenvalues of a symmetric matrix are real, are positive if the matrix is positive definite, and are easier to compute than those of nonsymmetric matrices.

4.1.1 The Constant Coefficient Case

Consider the model problem (1.15). Let a, b, c, d, e, f, g be the components of the computational molecule [see Eqs. (1.17), (1.18) and (1.19)]. We have:

Theorem 4.1. The reduced matrix S can be symmetrized with a real diagonal similarity transformation if and only if the products bcde, befg and cdfg are positive.

Proof. For a certain ordering, the corresponding matrix is merely a symmetric permutation of a matrix which is associated with another ordering. Thus we can examine the symmetrization conditions for the matrix associated with a particular ordering without loss of generality. We refer to the particular ordering whose associated matrix was described in detail in Chapter 3, namely 2PNxz. Our aim is to find a real diagonal matrix Q so that Q⁻¹SQ is symmetric. Suppose Q = diag[Q_{1,1}, Q_{1,2}, ..., Q_{1,n/2}, Q_{2,1}, Q_{2,2}, ..., Q_{2,n/2}, ..., Q_{n/2,n/2}], where each matrix Q_{j,l}, 1 ≤ j, l ≤ n/2, is a diagonal 2n × 2n matrix whose entries are denoted by

    Q_{j,l} = diag[ q_1^{(j,l)}, ..., q_{2n}^{(j,l)} ] .    (4.2)

Consider first the diagonal block matrices S_{j,j}^{(0)}. We require that Q_{j,l}⁻¹ S_{j,j}^{(0)} Q_{j,l} be symmetric. Notice that for any j, S_{j,j}^{(0)} is a notation that corresponds to n/2 submatrices, but in order to avoid additional notation we do not add a script indicating the location of these matrices in each block, as it can be understood from the double index used for the submatrices of Q. A straightforward computation for the entries of S_{j,j}^{(0)}, 1 ≤ j ≤ n/2, and for all the n/2 submatrices in each block S_{j,j}, leads to the following conditions:

    ( q_i^{(j,l)} / q_{i−4}^{(j,l)} )² = c² / d² ,   i = 5, 6, ..., 2n ;    (4.3)
    ( q_i^{(j,l)} / q_{i−3}^{(j,l)} )² = cf / dg ,   i = 4, 6, 8, ..., 2n ;    (4.4)
    ( q_i^{(j,l)} / q_{i−2}^{(j,l)} )² = ce / bd ,   1 ≤ i ≤ 2n , i mod 4 = 2 or 3 ;    (4.5)
    ( q_i^{(j,l)} / q_{i−2}^{(j,l)} )² = bc / de ,   1 ≤ i ≤ 2n , i mod 4 = 0 or 1 ;    (4.6)
    ( q_i^{(j,l)} / q_{i−1}^{(j,l)} )² = cg / df ,   i = 3, 5, 7, ..., 2n−1 ;    (4.7)
    ( q_i^{(j,l)} / q_{i−1}^{(j,l)} )² = ef / bg ,   1 ≤ i ≤ 2n , i mod 4 = 2 ;    (4.8)
    ( q_i^{(j,l)} / q_{i−1}^{(j,l)} )² = bf / eg ,   1 ≤ i ≤ 2n , i mod 4 = 0 .    (4.9)

The restriction that the diagonal similarity transformation be real leads to the following conditions:

• Equations (4.4) and (4.7) imply cdfg > 0.
• Equations (4.5) and (4.6) imply bcde > 0.
• Equations (4.8) and (4.9) imply befg > 0.

We choose q_1^{(j,l)}, 1 ≤ j, l ≤ n/2, arbitrarily, and use Equations (4.7), (4.8) and (4.9) to determine q_i^{(j,l)}, 2 ≤ i ≤ 2n, 1 ≤ j, l ≤ n/2. In so doing, we must make sure that Equations (4.3)-(4.6) are consistent with Equations (4.7)-(4.9). Indeed there is full consistency. Below we show this for entries whose index i satisfies i mod 4 = 0; the same procedure can be carried out for the other values of i. Since (i − 1) mod 4 = 3, applying (4.7) for i − 1 (rather than i) and multiplying it by (4.9), we obtain

    ( q_i / q_{i−2} )² = (bf/eg) · (cg/df) = bc/de ,

which is exactly (4.6). Next, since (i − 2) mod 4 = 2, we can use (4.8) to conclude that ( q_{i−2} / q_{i−3} )² = ef/bg, and combining this equation with (4.6) we obtain

    ( q_i / q_{i−3} )² = (bc/de) · (ef/bg) = cf/dg ,

which is identical to Equation (4.4). Finally, (i − 3) mod 4 = 1, therefore (4.7) implies ( q_{i−3} / q_{i−4} )² = cg/df. Multiplying this by Equation (4.4) yields

    ( q_i / q_{i−4} )² = (cf/dg) · (cg/df) = c² / d² ,

which is identical to (4.3).
This completes the proof of consistency for indices which correspond to Equation (4.9). For indices which satisfy either Eq. (4.7) or Eq. (4.8) the process is completely analogous, and the algebraic details are omitted.

The same procedure repeats when considering the off-diagonal matrices of the main block diagonals, namely S_{j,j}^{(±1)}, and the off-diagonal block matrices S_{j,j±1}. For the former we have the following equation, which determines the ratio between q_1^{(j,l+1)} and q_1^{(j,l)} and thus defines the values of q_i^{(j,l+1)} using q_i^{(j,l)}:

    ( q_1^{(j,l+1)} / q_1^{(j,l)} )² = g / f ,   1 ≤ l ≤ n/2 − 1 .    (4.10)

Notice that (4.10) establishes conditions for values that were previously considered arbitrary, namely q_1^{(j,l)}, l > 1. At this stage only q_1^{(j,1)}, 1 ≤ j ≤ n/2, are left arbitrary. In addition, we have conditions (4.11)-(4.14) (in which l goes from 1 to n/2 − 1 and j goes from 1 to n/2), which relate entries of Q_{j,l} and Q_{j,l+1} at the various parities of i through ratios of products of the molecule coefficients, such as bf/eg. Equations (4.11)-(4.14) can all be obtained from combinations of the group of Equations (4.3)-(4.10); thus they are consistent and do not impose any additional restrictions.

Next, the off-diagonal block matrices S_{j,j±1} are examined. We start with S_{j,j±1}^{(0)}. The following condition, for 1 ≤ j ≤ n/2 − 1 and 1 ≤ l ≤ n/2, needs to be satisfied:

    ( q_i^{(j+1,l)} / q_i^{(j,l)} )² = b² / e² ,   1 ≤ i ≤ 2n ,    (4.15)

together with conditions (4.16)-(4.19) for the remaining entries at the various parities of i. Equation (4.15) defines the values of q_i^{(j+1,l)}, 1 ≤ j ≤ n/2 − 1, given q_i^{(j,l)}; imposing this condition leaves only q_1^{(1,1)} arbitrary. Eqs. (4.16)-(4.19) are consistent with the previous equations:

• Equations (4.15) + (4.8) imply (4.16).
• Equations (4.15) + (4.5) imply (4.17).
• Equations (4.15) + (4.9) imply (4.18).
• Equations (4.15) + (4.6) imply (4.19).

In order for Equations (4.15)-(4.19) to hold with a real Q, we require bcde, befg > 0. These conditions are contained in the three conditions that were imposed when Equations (4.3)-(4.9) were discussed.

Finally, the matrices S_{j,j±1}^{(±1)} are considered. For these we require conditions (4.20) and (4.21), for 1 ≤ j ≤ n/2 and 1 ≤ l ≤ n/2 − 1, at the parities i mod 4 = 0 and i mod 4 = 2 respectively. Equations (4.18)+(4.10) imply (4.20), whereas Equations (4.15)+(4.11) imply (4.21); a short computation, Eq. (4.22), reduces both to ratios of the form bg/ef. Equations (4.20) and (4.21) can hold with a real Q only if befg > 0, which is a condition that has already been imposed. ∎

In terms of the mesh Reynolds numbers, Theorem 4.1 leads to:

Corollary 4.1. For any β, γ and δ, the reduced matrix can be symmetrized with a real diagonal similarity transformation if one uses upwind (backward) difference schemes, and, if one uses centered difference schemes, for either |β|, |γ|, |δ| < 1 or |β|, |γ|, |δ| > 1.

Proof.
For upwind schemes, bcde = (1 + 2β)(1 + 2γ), which is always positive; the same holds for befg = (1 + 2β)(1 + 2δ) and cdfg = (1 + 2γ)(1 + 2δ). For centered difference schemes, bcde = (1 − β²)(1 − γ²), befg = (1 − β²)(1 − δ²) and cdfg = (1 − γ²)(1 − δ²). For cdfg to be positive, we require that either |γ| < 1 and |δ| < 1, or |γ| > 1 and |δ| > 1. If |γ| < 1 and |δ| < 1, then befg > 0 implies |β| < 1, and then bcde > 0 holds as well. If |γ|, |δ| > 1, then the same argument yields |β| > 1. ∎

In order to construct the matrix Q, Equations (4.7)-(4.15) are used. However, from the proof it is clear how the symmetrized matrix Q⁻¹SQ looks, and there is no need to actually construct Q and perform the similarity transformation. Moreover, the symmetrizer might contain very large values; thus using it might cause numerical difficulties. In [40] it is shown that in the one-dimensional case, for the equation −u″ + σu′ = f with Dirichlet boundary conditions, if the first entry of the symmetrizer is 1 and the mesh Reynolds number is between 0 and 1, the last entry of the symmetrizer tends to e^σ as n goes to infinity, and thus is very large for large values of the underlying PDE coefficient. Instability occurs in the three-dimensional problem as well, as is now shown:

Theorem 4.2. Solving the linear system obtained by symmetrizing with the diagonal symmetrizer Q is numerically unstable.

Proof. Symmetrizing by Q means that instead of solving the system Sx = w, a linear system of the form

    S̃ x̃ = w̃    (4.24)

is solved, where S̃ = Q⁻¹SQ is symmetric, x̃ = Q⁻¹x and w̃ = Q⁻¹w. The matrix Q is constructed as follows:
    Choose any value for q_1^{(1,1)}
    for i = 2 to 2n
        if i mod 2 = 1 :  q_i^{(1,1)} = q_{i−1}^{(1,1)} · (cg/df)^{1/2}
        if i mod 4 = 2 :  q_i^{(1,1)} = q_{i−1}^{(1,1)} · (ef/bg)^{1/2}
        if i mod 4 = 0 :  q_i^{(1,1)} = q_{i−1}^{(1,1)} · (bf/eg)^{1/2}
    for l = 2 to n/2
        for i = 1 to 2n
            q_i^{(1,l)} = q_i^{(1,l−1)} · (g/f)^{1/2}
    for j = 2 to n/2
        for l = 1 to n/2
            for i = 1 to 2n
                q_i^{(j,l)} = q_i^{(j−1,l)} · (b/e)^{1/2}

From the algorithm it follows that the entries of Q are not bounded as n tends to infinity. For example, the last row in the algorithm implies that if |b/e| ≫ 1, then even if q_1^{(1,1)} is small, there exists j such that computing q_i^{(j,l)} will cause overflow. Such a situation occurs when γ ≈ 1 and centered differences are used. In this situation, if the other two mesh Reynolds numbers are small, the original matrix is well conditioned; hence the instability arises from the symmetrization, not as a result of the conditioning of the matrix. ∎

From Theorem 4.2 we can deduce, then, that the symmetrized matrix should be used merely for the convergence analysis; in actual numerical computation, the nonsymmetric system is used.

The entries of the symmetrizer can be determined up to sign. For example, for be, cd, fg > 0, a symmetrization operator that preserves the signs of the matrix entries transforms them as follows:

    −b², −e² → −be ;    −c², −d² → −cd ;    −f², −g² → −fg ;
    −2bf, −2ef, −2bg, −2eg → −2√(befg) ;
    −2cf, −2df, −2cg, −2dg → −2√(cdfg) ;
    −2bc, −2bd, −2ce, −2de → −2√(bcde) .

The value of the computational molecule corresponding to the center point is unchanged under the symmetrization operation.

4.1.2 The Variable Coefficient Case

Consider Equation (2.11).
As in the constant coefficient case, we denote the diagonal symmetrizer by Q, and require symmetry element by element to derive the conditions on the entries of Q. Let qₗ denote the l-th diagonal entry of Q. It is sufficient to look at 2n x 2n blocks. We start with the main diagonal blocks: we have to examine all the entries of the main block that appear in the l-th row of the matrix, namely s_{l,l-4}, s_{l,l-3}, ..., s_{l,l}, ..., s_{l,l+4}. For s_{l,l-4}, if l mod 2n ≥ 5 or is equal to 0, then l - 4 corresponds to the (i-2, j, k) mesh point. Requiring q_{l-4}⁻¹ s_{l-4,l} qₗ = qₗ⁻¹ s_{l,l-4} q_{l-4} then yields

(qₗ / q_{l-4})² = s_{l,l-4} / s_{l-4,l} = (d_{i-2,j,k} d_{i-1,j,k}) / (c_{i-1,j,k} c_{i,j,k}) ,  (4.26)-(4.27)

in which the values associated with the center of the computational molecule (namely a_{i,j,k}) cancel; this cancellation happens only for rows that involve the (i±2, j, k), (i, j±2, k) and (i, j, k±2) gridpoints. Applying the same procedure to the rest of the entries of the main diagonal block, we obtain analogous relations for the ratios qₗ/q_{l-1} and qₗ/q_{l-2} [Eqs. (4.28)-(4.33)], the applicable relation being determined by the residues of l modulo 4 and modulo 2n; for example, (4.31) holds for l mod 2n = 3, ...
, 2n - 1, while (4.32) and (4.33) hold for l mod 4 = 2 and l mod 4 = 0, respectively.

As is evident, Equations (4.31)-(4.33) are sufficient to determine all the diagonal entries, except the first entry in each 2n x 2n block, which at this stage can be arbitrarily chosen. We have to make sure, then, that Eqs. (4.27)-(4.30) are consistent with these three equations, and this requirement imposes some additional conditions. In the constant coefficient case there is unconditional consistency. The problematic nature of the variable coefficient case can be demonstrated simply by looking at one of the consistency conditions. Consider the (i, j, k) gridpoint whose associated index l satisfies l mod 4 = 0. Applying (4.31) to l - 1 means looking at the row corresponding to the (i, j-1, k-1) gridpoint, and multiplying (4.31), applied to l - 1, by (4.33) results in an equation for qₗ/q_{l-2} which should be consistent with (4.30). The resulting consistency condition, Eq. (4.34), equates two expressions, each a sum of products of the coefficients b, c, d, e, f, g and a evaluated at gridpoints neighboring (i, j, k). There are three additional consistency conditions for the main block, and eight additional conditions for the rest of the blocks of the reduced matrix.
Equating variables that belong to the same location in the computational molecule, necessary conditions for the above consistency conditions to hold are

c_{i,j-1,k-1} = c_{i,j,k} ,  d_{i-1,j-1,k-1} = d_{i-1,j,k} ,  e_{i,j-1,k-1} = e_{i-1,j-1,k} ,
f_{i,j,k} = f_{i-1,j-1,k} ,  g_{i,j,k-1} = g_{i-1,j-1,k-1} ;

that is, c and d may depend only on i, b and e only on j, and f and g only on k. Under these conditions Eq. (4.34) reduces to the identity (4.35) in cᵢ, d_{i-1}, bⱼ, e_{j-1}, f_k and g_{k-1}, which is obviously satisfied. The analysis for the off-diagonal blocks is identical, and the following additional conditions are obtained:

(qₗ / q_{l-2n²})² = (f_{i,j,k} f_{i,j,k-1}) / (g_{i,j,k-1} g_{i,j,k-2}) ,  2n² < l ≤ n³/2 ,  (4.36)

(qₗ / q_{l-2n})² = (b_{i,j,k} b_{i,j-1,k}) / (e_{i,j-1,k} e_{i,j-2,k}) ,  2n < l ≤ n³/2 .  (4.37)

The above two equations determine the rest of the entries of the symmetrizer, so that only its first entry can be chosen arbitrarily. Last, in order for the symmetrizer to be real, the products cᵢd_{i-1}, bⱼe_{j-1} and f_k g_{k-1} must all have the same sign. The actual meaning of the conditions stated above is that the continuous problem has to be separable: in (2.11) the PDE variable coefficients should satisfy p = p(x), q = q(y), r = r(z), s = s(x), t = t(y) and w = w(z). In this sense the three-dimensional problem has the same behavior as the two-dimensional problem [42]. We can now summarize all that has been said in the following symmetrization theorem:

Theorem 4.3. Suppose the operator of (2.11) is separable. If cᵢd_{i-1}, bⱼe_{j-1} and f_k g_{k-1} are all nonzero and have the same sign for all i, j and k, then there exists a real nonsingular diagonal matrix Q such that Q⁻¹SQ is symmetric.

As in the constant coefficient case, the symmetrized computational molecule should be derived without actually performing the similarity transformation.
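The two practical points above, that the symmetrizer is trivial to write down entry by entry and that it should not actually be applied, are easy to see in the one-dimensional model cited from [40]. The following Python/NumPy sketch (function names and parameter values are ours, for illustration only) symmetrizes the centered-difference matrix of -u'' + σu' and shows how fast the symmetrizer entries grow:

```python
import numpy as np

def conv_diff_1d(sigma, n):
    """Centered-difference matrix (scaled by h^2) of -u'' + sigma*u',
    tridiag(-1 - rho, 2, -1 + rho), where rho = sigma*h/2 is the mesh
    Reynolds number."""
    h = 1.0 / (n + 1)
    rho = sigma * h / 2.0
    A = (2.0 * np.eye(n)
         + (-1.0 - rho) * np.eye(n, k=-1)
         + (-1.0 + rho) * np.eye(n, k=1))
    return A, rho

def symmetrizer(rho, n):
    """Diagonal entries q_i with q_1 = 1; element-by-element symmetry of
    Q^{-1} A Q requires (q_{i+1}/q_i)^2 = (1 + rho)/(1 - rho)."""
    return np.sqrt((1.0 + rho) / (1.0 - rho)) ** np.arange(n)

n = 50
A, rho = conv_diff_1d(sigma=40.0, n=n)
q = symmetrizer(rho, n)
S = np.diag(1.0 / q) @ A @ np.diag(q)

assert np.allclose(S, S.T)   # the transformed matrix is symmetric
print(q[-1])                 # geometric growth: already enormous here
```

The similarity transformation is exact in exact arithmetic; the numerical trouble described in Theorem 4.2 comes entirely from the size of the entries of Q.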
For example, the symmetrized value corresponding to -c²_{i,j,k} is obtained by scaling with the appropriate ratio of symmetrizer entries, which replaces it by (minus) a geometric mean of products of neighboring c and d values; the symmetrized values corresponding to the other off-center molecule entries are obtained analogously. When the problem is separable, the value at the center of the computational molecule, a_{i,j,k}, is unchanged under the similarity transformation.

4.2 Bounds on Convergence Rates

Below we attach the subscripts '1' and '2' to matrices associated with the 1D and 2D partitionings, respectively. Let S = D₁ - C₁ = D₂ - C₂ be the corresponding splittings, and let D̃ᵢ = Q⁻¹DᵢQ and C̃ᵢ = Q⁻¹CᵢQ, where Q is the diagonal symmetrizer. Below we refer to the particular matrix ordered by the two-plane ordering (convergence results for the other matrices in this family can be obtained by appropriate interchange of indices or of the roles of x, y and z). Consider the constant coefficient case, with be, cd, fg > 0.

Definition 4.1. An interior block is a 2n x 2n block Sⱼⱼ whose associated set of gridpoints consists of points whose y and z coordinates are not 1/(n+1) or n/(n+1). No restriction is imposed on the x coordinates.

We can focus on interior blocks, because the minimal eigenvalues of the D̃ᵢ are sought; since non-interior blocks differ from interior blocks only on their diagonals, and their diagonal entries are algebraically larger when be, cd, fg > 0, non-interior blocks have larger minimal eigenvalues than interior blocks.
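The derivation below leans on two standard facts about Kronecker products, the mixed-product rule and the eigenvalue product rule [cf. (4.43)-(4.44)]; both can be sanity-checked in a few lines of Python/NumPy (random matrices, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); C = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2)); D = rng.standard_normal((2, 2))

# mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# eigenvalues of A ⊗ B are all products of eigenvalues of A and B;
# checked on symmetrized copies so that every eigenvalue is real
As, Bs = A + A.T, B + B.T
prods = np.sort(np.kron(np.linalg.eigvalsh(As), np.linalg.eigvalsh(Bs)))
assert np.allclose(np.sort(np.linalg.eigvalsh(np.kron(As, Bs))), prods)
```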
Let us define a few auxiliary matrices and constants:

r = -2√(befg) ,  s = -2√(cdfg) ;
Sₙ = tri[E₁₀, 0, E₀₁] ;  Rₙ = tri[E₀₁, 0, E₁₀] ;  Tₙ = tri[1, 0, 1] ;
Uₙ = penta[1, 0, 0, 0, 1] ;  Vₙ = septa[E₁₀, 0, 0, 0, 0, 0, E₀₁] ;  Zₙ = tri[s, r, s] .  (4.38)

The subscript n stands for the order of the matrices. Notice that all the above matrices are symmetric (see Sec. 1.3 for an explanation of the notation). The matrix U²_{2n} has 1's on its fourth superdiagonal and subdiagonal, 2's on its main diagonal except for the first two and last two entries, where the values are 1, and zeros elsewhere. If we define

W_{2n} = (a² - 2be - 2fg)·I_{2n} - 2√(bcde)·U_{2n} - cd·U²_{2n}  (4.39)

and

X_{2n} = -2√(cdfg)·(R_{2n} + V_{2n}) - 2√(befg)·S_{2n} ,  (4.40)

then an interior block of D̃₁ is given by

Sⱼⱼ = W_{2n} + X_{2n} .  (4.41)

We now examine the eigenvalues of W_{2n} and X_{2n}. In the following, we shall make use of Kronecker products (see, for example, Barnett [10]), denoted by the symbol ⊗. Recall that for any two matrices A and B of sizes, say, m x n and p x q respectively, A ⊗ B is the (mp) x (nq) block matrix

A ⊗ B = [ a_{1,1}B  ...  a_{1,n}B ; ... ; a_{m,1}B  ...  a_{m,n}B ] ,  (4.42)

in which each block a_{i,j}B is of size p x q. Kronecker products have a few useful properties [10], of which we mention in particular the ones which we will use for deriving bounds:

• Any four given matrices A, B, C, D with the appropriate sizes satisfy

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) .  (4.43)

• If {λᵢ, uᵢ} and {μⱼ, vⱼ} are the sets of eigenvalues/eigenvectors of A and B respectively, then

(A ⊗ B)(uᵢ ⊗ vⱼ) = (λᵢ μⱼ)(uᵢ ⊗ vⱼ) .  (4.44)

Using these properties, we have:

Lemma 4.1.
The eigenvalues of W_{2n} are given by

a² - 2be - 2fg - 4√(bcde)·cos(πjh) - 4cd·cos²(πjh) ,  j = 1, ..., n ,  (4.45)

each with algebraic multiplicity 2.

Proof. Since

U_{2n} = Tₙ ⊗ I₂ ,  (4.46)

the eigenvalues of U_{2n} are {2cos(πjh)}ⁿ_{j=1}, each of algebraic multiplicity 2. W_{2n} is a polynomial in U_{2n}, and therefore has the eigenvalues stated in (4.45). □

Lemma 4.2. The matrix X_{2n} has the following eigenvalues:

λⱼ± = ±[2√(befg) + 4√(cdfg)·cos(πjh)] ,  j = 1, ..., n .  (4.47)

Proof. Let Y₂ denote the 2 x 2 matrix with zero diagonal and 1's off the diagonal. Then

X_{2n} = Zₙ ⊗ Y₂ ,  (4.48)

and thus by (4.44) the eigenvalues of X_{2n} are all the products of the eigenvalues of Zₙ, given by r + 2s·cos(πjh), 1 ≤ j ≤ n, and the eigenvalues of Y₂, namely ±1. □

Lemmas 4.1 and 4.2 can now be used to establish the following:

Theorem 4.4.
The eigenvalues μⱼ± of interior blocks are

μⱼ± = a² - 2be - 2fg - 4√(bcde)·cos(πjh) - 4cd·cos²(πjh) ± [2√(befg) + 4√(cdfg)·cos(πjh)] ,  j = 1, ..., n .  (4.49)

Proof. Applying (4.43), using (4.46) and (4.48), we have

X_{2n} U_{2n} = (Zₙ Tₙ) ⊗ (Y₂ I₂) ,  (4.50a)
U_{2n} X_{2n} = (Tₙ Zₙ) ⊗ (I₂ Y₂) ,  (4.50b)

and since Tₙ Zₙ = Zₙ Tₙ we conclude that

X_{2n} U_{2n} = U_{2n} X_{2n} ;  (4.51)

hence X_{2n} and U_{2n} commute, which means that X_{2n} and W_{2n} have common eigenvectors and can be simultaneously diagonalized, and the eigenvalues of Sⱼⱼ are the sums of the eigenvalues of W_{2n} and X_{2n}, as given in (4.49). □

We remark that another way of analyzing the spectrum of X_{2n} is by using the relation (X_{2n})² = (Zₙ)² ⊗ I₂. We are now ready to prove

Theorem 4.5. The matrix D̃₁ is positive definite if be, cd, fg > 0. The eigenvalues given in (4.49) are eigenvalues of D̃₁, each of multiplicity (n/2 - 2)², the number of interior blocks; the rest of the eigenvalues of D̃₁ all lie in the interval [minⱼ(μⱼ⁻), maxⱼ(μⱼ⁺) + be + cd + fg]. The minimal eigenvalue of D̃₁ is given by

η = a² - 2be - 2fg - 2√(befg) - 4(√(bcde) + √(cdfg))·cos(πh) - 4cd·cos²(πh) .  (4.52)

Proof. The eigenvalues of D̃₁ are positive by combining Lemma 2.5, [113, Cor. 2] and [113, Thm. 3.12]. For the eigenvalues (4.49), the multiplicity is determined by counting the number of interior blocks. Since non-interior blocks differ from interior blocks only in their diagonals, and only by amounts between 0 and be + cd + fg in each diagonal entry, the minimal eigenvalue is attained in interior blocks and is thus given exactly by (4.52). Using the equality between the 2-norm and the spectral radius of symmetric matrices, together with the triangle inequality, yields the upper bound maxⱼ(μⱼ⁺) + be + cd + fg for the eigenvalues. □

Next, D₂ - D₁ is considered. For a given value of j (1 ≤ j ≤ n/2), denote the j-th block of D₂ - D₁ by Sⱼⱼ = tri[S⁽¹⁾, 0, S⁽¹⁾ᵀ]. Sⱼⱼ is an n² x n² block tridiagonal matrix. All the block matrices Sⱼⱼ, 1 ≤ j ≤ n/2, are identical to each other, thus the analysis can be carried out for a single block. Define Y_{2n}(r, s) to be the 2n x 2n matrix whose even-numbered rows are zero and whose odd-numbered rows carry the entries s, r, s on even-numbered columns; for 2n = 8,

Y₈(r, s) =
[ 0 r 0 s 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 s 0 r 0 s 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 s 0 r 0 s ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 s 0 r ]
[ 0 0 0 0 0 0 0 0 ] ,  (4.53)

and let W(r, s) = tri[Y_{2n}ᵀ(r, s), 0, Y_{2n}(r, s)] be an n² x n² matrix, consisting of n/2 block rows and columns, each block of size 2n x 2n. Then

Sⱼⱼ = tri[S⁽¹⁾, 0, S⁽¹⁾ᵀ] = tri[-fg·I_{2n}, 0, -fg·I_{2n}] + W(-2√(befg), -2√(cdfg)) .  (4.54)

Since Sⱼⱼ is symmetric, its spectral radius can be estimated using

ρ(Sⱼⱼ) = ||tri[-fg·I_{2n}, 0, -fg·I_{2n}] + W(-2√(befg), -2√(cdfg))||₂
       ≤ ||tri[-fg·I_{2n}, 0, -fg·I_{2n}]||₂ + ||W(-2√(befg), -2√(cdfg))||₂ .  (4.55)

The eigenvalues of the n² x n² matrix tri[-fg·I_{2n}, 0, -fg·I_{2n}] are -2fg·cos(πi/(n/2+1)), 1 ≤ i ≤ n/2, each with algebraic multiplicity 2n.
Since it is a symmetric matrix, we obtain

||tri[-fg·I_{2n}, 0, -fg·I_{2n}]||₂ = ρ(tri[-fg·I_{2n}, 0, -fg·I_{2n}]) = 2fg·cos(π/(n/2+1)) .  (4.56)

To estimate ||W||₂ in (4.55), it is easier to work with W², using ||W||₂² = ||W²||₂. We have:

Lemma 4.3. The matrix W² has the following eigenvalues: 0, of multiplicity 2n, and

4befg + 16cdfg·cos²(πjh) + 16√(bcde)·fg·cos(πjh) ,  1 ≤ j ≤ n ,  (4.57)

each of multiplicity n - 2.

Proof. Forming W² in terms of Y_{2n} and Yᵀ_{2n}, a straightforward computation shows that (Y_{2n})² = (Yᵀ_{2n})² = 0, from which it follows that W² is actually block diagonal:

W² = diag[ Y Yᵀ , YᵀY + Y Yᵀ , ... , YᵀY + Y Yᵀ , YᵀY ]  (4.58)

[in Eq. (4.58) the subscripts 2n were dropped]. The matrices Y_{2n}Yᵀ_{2n} and Yᵀ_{2n}Y_{2n}, which appear in the first and last diagonal block entries of W², are pentadiagonal-patterned [Eqs. (4.59)-(4.60)]: their entries are built from r² and s² on the main diagonal, 2rs at distance two from the diagonal, and s² at distance four. Combining (4.59) and (4.60), the matrix Y_{2n}Yᵀ_{2n} + Yᵀ_{2n}Y_{2n}, which appears in all the rest of the diagonal blocks of W², is given by (4.61); from it, it is evident that in terms of U_{2n} [see Eq. (4.38)] we have

Y_{2n}Yᵀ_{2n} + Yᵀ_{2n}Y_{2n} = r²·I_{2n} + 2rs·U_{2n} + s²·U²_{2n}
(4.62)

thus, the eigenvalues of this matrix, with r and s as in (4.38), are the ones given in Eq. (4.57), each of algebraic multiplicity 2. The matrix Y_{2n}Yᵀ_{2n} can be permuted so that the n rows containing nonzero entries come first, followed by the zero rows. Doing so, we obtain a matrix of the form

[ X 0 ; 0 0 ] ,

where X is a symmetric pentadiagonal n x n matrix given by penta[s², 2rs, r² + 2s², 2rs, s²], except that the first and the last entries of the main diagonal are r² + s². Again using an auxiliary matrix introduced in Eq. (4.38), we have

X = r²·Iₙ + 2rs·Tₙ + s²·T²ₙ .  (4.63)

From this it follows that the eigenvalues of Y_{2n}Yᵀ_{2n} and Yᵀ_{2n}Y_{2n} are exactly the ones given by (4.57), each of multiplicity 1, plus the eigenvalue 0, of multiplicity n. We can now find the eigenvalues of W² by assembling the eigenvalues of all the blocks. Since there are n/2 - 2 blocks in W² that are equal to Y_{2n}Yᵀ_{2n} + Yᵀ_{2n}Y_{2n}, the result in the statement of the lemma follows. □

Having the eigenvalues of W² at hand, we can now use Equations (4.56) and (4.57) to obtain:

Lemma 4.4. The spectral radius of D₂ - D₁ can be bounded by

ξ = 2fg·cos(π/(n/2+1)) + √(4befg + 16cdfg·cos²(πh) + 16√(bcde)·fg·cos(πh)) .  (4.64)

Combining Thm. 4.5 and Lemma 4.4, and applying Rayleigh quotients to the matrices D̃₁ and D̃₂ - D̃₁, we obtain:

Lemma 4.5. The minimal eigenvalue of D̃₂ is bounded from below by η - ξ, where η and ξ are the expressions given in (4.52) and (4.64), respectively.

As a last step, we estimate the spectral radius of C̃₂, which splits as

-C̃₂ = tri[S⁽²⁾ᵀ, 0, S⁽²⁾] + tri[S⁽³⁾ᵀ, 0, S⁽³⁾] .  (4.65)
Lemma 4.6. The spectral radius of tri[S⁽³⁾ᵀ, 0, S⁽³⁾] is 2√(befg); its eigenvalues are either +2√(befg), -2√(befg) or 0.

Proof. The square of the matrix is a diagonal matrix whose entries are either zeros or 4befg. □

Lemma 4.7. The spectral radius of the (n/2)-block matrix tri[-be·I_{n²}, 0, -be·I_{n²}] is 2be·cos(π/(n/2+1)).

Proof. Let Z denote the (n/2) x (n/2) matrix tri[1, 0, 1]. Then tri[-be·I_{n²}, 0, -be·I_{n²}] = -be·(Z ⊗ I_{n²}). □

Lemma 4.8. The spectral radius of tri[S⁽²⁾ᵀ, 0, S⁽²⁾] - tri[-be·I_{n²}, 0, -be·I_{n²}] is given by

2√(befg) + 4√(bcde)·cos(πh) .  (4.66)

Proof. Let C₁ denote the matrix that results from permuting the rows of the matrix given in the statement of the lemma so that indices whose residues modulo 4 are 0 or 1 come first, in increasing order, and indices whose residues modulo 4 are 2 or 3 come later. Let C₂ be a permutation of C₁ such that the rows and columns indexed n³/4 - n²/2 + 1 to n³/4 (n²/2 such rows and columns) become rows/columns n³/2 - n²/2 + 1 to n³/2, and the rest of the rows/columns are shifted accordingly. The last n² rows and columns of C₂ are zeros. If C₃ denotes the upper left (n³/2 - n²) x (n³/2 - n²) submatrix of C₂, then it is a block matrix of the form [0 U ; Uᵀ 0], where U is a block diagonal matrix consisting of n x n tridiagonal matrices given by -2·tri[√(bcde), √(befg), √(bcde)]. In Fig. 4.1 the sparsity patterns of the above matrices are depicted.

Figure 4.1: sparsity patterns of the matrices involved in the proof of Lemma 4.8: (a) C₁, (b) C₂, (c) C₃.

By Lemma 1.1 the spectral radius of C₃ is the one given in (4.66).
It is also the spectral radius of the matrix referred to in the statement of the lemma, as the rest of that matrix's eigenvalues are zero. □

We can now combine the results obtained in Lemmas and Theorems 4.3-4.8 to obtain the following result:

Theorem 4.6. The spectral radius of C̃₂ is bounded by

φ = 4√(befg) + 4√(bcde)·cos(πh) + 2be·cos(π/(n/2+1)) ;  (4.67)

the spectral radius of C̃₁ is bounded by ξ + φ [ξ is defined in Eq. (4.64)].

We are now ready to prove the main convergence result.

Theorem 4.7. The spectral radii of the iteration matrices D₁⁻¹C₁ and D₂⁻¹C₂ are bounded by (ξ + φ)/η and φ/(η - ξ), respectively, where η, ξ and φ are defined in Eqs. (4.52), (4.64) and (4.67).

From Theorem 4.7 we can draw the following conclusion with regard to the convergence of the block Jacobi scheme:

Corollary 4.2. If be, cd, fg > 0 then the block Jacobi iteration converges for both the 1D and 2D splittings.

Proof. The Taylor expansions of the bounds given in Theorem 4.7 have the form

ρ(D₁⁻¹C₁) ≤ (ξ + φ)/η = 1 - c₁h² + o(h²)  (4.68)

and

ρ(D₂⁻¹C₂) ≤ φ/(η - ξ) = 1 - c₂h² + o(h²) ,  (4.69)

with positive constants c₁ and c₂ that depend on the mesh Reynolds numbers β, γ and δ; both bounds are thus smaller than 1. □

We remark that convergence of the scheme can also be deduced from [113, Thm. 3.13]. In Tables 4.1 and 4.2 we give some indication of the quality of the bounds. As can be observed, the bounds are tight and become tighter as n increases, which suggests that they tend to the actual spectral radii as h → 0. Another important point is that the actual spectral radii are bounded far below 1.
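The closed-form spectrum of the interior blocks can also be confirmed directly. The following Python/NumPy sketch builds W_{2n} + X_{2n} from the definitions (4.38)-(4.41), under our reading of the notation (U_{2n} = Tₙ ⊗ I₂ and X_{2n} = Zₙ ⊗ Y₂, with Y₂ the 2 x 2 reversal matrix), for arbitrary sample coefficients with be, cd, fg > 0, and compares the computed eigenvalues with the formula of Theorem 4.4:

```python
import numpy as np

# sample coefficient values (assumed, for illustration only), be, cd, fg > 0
b, c, d, e, f, g, a2 = 1.0, 0.8, 1.2, 0.9, 1.1, 0.7, 20.0
be, cd, fg = b*e, c*d, f*g
bcde, befg, cdfg = be*cd, be*fg, cd*fg

n = 8
h = 1.0 / (n + 1)
T = np.eye(n, k=1) + np.eye(n, k=-1)           # T_n = tri[1, 0, 1]
U = np.kron(T, np.eye(2))                      # U_{2n} = T_n ⊗ I_2
r, s = -2*np.sqrt(befg), -2*np.sqrt(cdfg)
Z = r*np.eye(n) + s*T                          # Z_n = tri[s, r, s]
Y2 = np.array([[0.0, 1.0], [1.0, 0.0]])

W = (a2 - 2*be - 2*fg)*np.eye(2*n) - 2*np.sqrt(bcde)*U - cd*(U @ U)
X = np.kron(Z, Y2)                             # X_{2n} = Z_n ⊗ Y_2

computed = np.sort(np.linalg.eigvalsh(W + X))
cos = np.cos(np.pi*np.arange(1, n + 1)*h)
base = a2 - 2*be - 2*fg - 4*np.sqrt(bcde)*cos - 4*cd*cos**2
pm = 2*np.sqrt(befg) + 4*np.sqrt(cdfg)*cos
predicted = np.sort(np.concatenate([base - pm, base + pm]))
assert np.allclose(computed, predicted)        # matches Eq. (4.49)
```

The ± pairing of the eigenvalues is what produces the 2√(befg) term in the minimal eigenvalue (4.52).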
Table 4.1: comparison between the computed spectral radius and the bound, for the 1D splitting, with β = 0.4, γ = 0.5, δ = 0.6.

             upwind                    centered
   n       ρ      bound   ratio      ρ      bound   ratio
   8     0.641    0.717   1.12     0.475    0.525   1.11
  12     0.720    0.758   1.05     0.529    0.554   1.05
  16     0.752    0.775   1.03     0.551    0.566   1.03
  20     0.768    0.783   1.02     0.562    0.571   1.02
  24     0.778    0.788   1.01     0.568    0.575   1.01

Table 4.2: comparison between the computed spectral radius and the bound, for the 2D splitting, with β = 0.4, γ = 0.5, δ = 0.6.

             upwind                    centered
   n       ρ      bound   ratio      ρ      bound   ratio
   8     0.497    0.583   1.17     0.343    0.388   1.13
  12     0.585    0.632   1.08     0.392    0.415   1.06
  16     0.625    0.653   1.05     0.412    0.427   1.04
  20     0.645    0.664   1.03     0.423    0.432   1.02
  24     0.657    0.671   1.02     0.429    0.436   1.02

For the variable coefficient case, recall from the previous section that if the problem is separable, the matrix can be symmetrized by a real diagonal similarity transformation. Inspired by Elman & Golub's strategy [42, Cor. 2], and the technique that is used to prove it, Theorem 4.7 can be generalized as follows:

Theorem 4.8. Suppose the continuous problem is separable, and that the products c_{i+1}dᵢ, b_{j+1}eⱼ and f_{k+1}g_k are all positive and bounded by β_x, β_y and β_z, respectively. Suppose also that a_{i,j,k} ≥ ā for all i, j and k. Denote h = 1/(n+1). Then the spectral radii of the iteration matrices associated with the block Jacobi scheme which correspond to the 1D splitting and the 2D splitting are bounded by (ξ + φ)/η and φ/(η - ξ), respectively, where:

η = ā² - 2β_y - 2β_z - 2√(β_y β_z) - 4(√(β_x β_y) + √(β_x β_z))·cos(πh) - 4β_x·cos²(πh) ;  (4.70a)

ξ = 2β_z·cos(πh) + √(4β_y β_z + 16β_x β_z·cos²(πh) + 16β_z·√(β_x β_y)·cos(πh)) ;  (4.70b)

φ = 4√(β_y β_z) + 4√(β_x β_y)·cos(πh) + 2β_y·cos(πh) .
(4.70c)

Proof. The conditions stated in the theorem guarantee that the matrix is symmetrizable by a real diagonal nonsingular matrix. Denote the reduced matrix by S and the symmetrized matrix by S̃, and suppose S* is obtained by modifying S̃ in the following manner: replace each occurrence of cᵢ and dᵢ by √β_x, each occurrence of bⱼ and eⱼ by √β_y, each occurrence of f_k and g_k by √β_z, and each occurrence of a_{i,j,k} by ā. Denote by S* = D* - C* the splitting which is analogous (as far as sparsity pattern is concerned) to the splitting S̃ = D̃ - C̃. For the 1D splitting, the matrix D* is block diagonal with semi-bandwidth 4, its sparsity pattern is identical to that of D̃, and its entries are componentwise smaller than or equal to those of D̃. By Cor. 2.5, D* is a diagonally dominant M-matrix. Clearly, C* ≥ C̃ ≥ 0. Thus the Perron-Frobenius theorem [113, p. 30] can be used to obtain an upper bound on the convergence rate. Since the matrix S* can now be regarded as the symmetrized version of a matrix associated with a constant coefficient problem, the bounds on the convergence rates are readily obtained from Thm. 4.7. □

4.3 "Near-Property A" for the 1D Partitioning of the Two-Plane Matrix

The block Jacobi scheme converges slowly, and thus is not satisfactory. We are interested in performing convergence analysis for the block SOR solver, which is typically very efficient if the optimal relaxation parameter (or a good approximation to it) is known. Considering the two-plane ordering, the reduced matrix is consistently ordered relative to the 2D partitioning (by Prop. 3.3), and thus in this case Young's analysis can be straightforwardly applied.
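For a matrix that actually is consistently ordered, Young's analysis gives ρ_GS = ρ_J² and, at the optimal relaxation parameter ω*, ρ(L_{ω*}) = ω* - 1. A minimal Python/NumPy check of both facts on the standard 1D model matrix (used here purely as a consistently ordered reference point, not the reduced matrix):

```python
import numpy as np

n = 30
A = 2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # consistently ordered model matrix
D = np.diag(np.diag(A))
L = -np.tril(A, -1)          # A = D - L - U
U = -np.triu(A, 1)

rho = lambda M: np.max(np.abs(np.linalg.eigvals(M)))
rho_J = rho(np.linalg.solve(D, L + U))                 # Jacobi iteration matrix
rho_GS = rho(np.linalg.solve(D - L, U))                # Gauss-Seidel iteration matrix
assert abs(rho_GS - rho_J**2) < 1e-6                   # rho_GS = rho_J^2

# Young's optimal SOR parameter and the resulting spectral radius
w_star = 2.0 / (1.0 + np.sqrt(1.0 - rho_J**2))
M_sor = np.linalg.solve(D - w_star*L, (1.0 - w_star)*D + w_star*U)
assert abs(rho(M_sor) - (w_star - 1.0)) < 1e-4         # rho(L_{w*}) = w* - 1
```

For the reduced matrix with the 1D partitioning these relations are not exact, which is what the remainder of this section quantifies.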
On the other hand, analyzing the 1D partitioning is more challenging: by Prop. 3.2 the reduced matrix does not have Property A relative to this partitioning. However, in Section 4.4 it will be shown that the 1D partitioning gives rise to a more efficient solution process. It is thus important to provide some analysis for this case.

Our analysis and experimental observations are for the constant coefficient case and refer mainly to centered difference discretization. First, we consider the (easier) case of the 2D partitioning and provide an approximation to the optimal relaxation parameter. We then consider the 1D partitioning: the spectral radius of the Jacobi iteration matrix is bounded in terms of the spectral radii of two iteration matrices, one of which is associated with a consistently ordered matrix, and one whose spectral radius is small. We then use Varga's extensions of the theory of p-cyclic matrices [112], [113, Sec. 4.4] (we are concerned with p = 2) to show that, relative to the 1D partitioning, the reduced matrix behaves like a consistently ordered matrix.

We begin with the 2D partitioning. For this case we have the following:

Theorem 4.9. Let L_ω denote the block SOR operator associated with the 2D splitting and the two-plane ordering. If be, cd, fg > 0 or < 0, then the choice

ω* = 2 / (1 + √(1 - ρ²(D₂⁻¹C₂)))

minimizes ρ(L_ω) with respect to ω, and ρ(L_{ω*}) = ω* - 1.

The proof of this theorem follows directly from [117, §5.2 and §14.3] and is essentially identical to the proof of [41, Thm. 4]. The algebraic details on how to pick the signs of the diagonal symmetrizer so that the symmetrized block diagonal part of the splitting is a diagonally dominant M-matrix are omitted. By Cor.
4.2 it is known that ρ(D₂⁻¹C₂) < 1. The reduced matrix is consistently ordered by Prop. 3.3.

A way to approximately determine the optimal relaxation parameter for be, cd, fg > 0 is to replace ρ(D₂⁻¹C₂) by the bound for it (given in Theorem 4.7) in the expression for ω* in Theorem 4.9. If the bound for the block Jacobi scheme is tight, then the estimate of ω* is fairly accurate:

Corollary 4.3. Suppose be, cd, fg > 0. For the system associated with the 2D splitting, and for h sufficiently small, the choice

ω̃* = 2 / (1 + √(1 - [φ/(η - ξ)]²))  (4.71)

approximately minimizes ρ(L_ω). The spectral radius of the iteration matrix is approximately ω̃* - 1.

It is straightforward to verify that if ω̃* is an accurate approximation to the optimal relaxation parameter, the asymptotic convergence rate of the SOR scheme is O(h).

Moving to the 1D partitioning: as mentioned above, it is fundamentally different from the 2D partitioning, and a result analogous to the one stated in Theorem 4.9 cannot be obtained by using Young's analysis. However, for be, cd, fg > 0 we have noticed that in many senses the reduced matrix does behave like a consistently ordered matrix relative to the 1D partitioning. For example, we have observed that the spectral radii of the block Gauss-Seidel and block Jacobi iteration matrices satisfy ρ_GS ≈ ρ_J², and the graphs in Fig. 4.2 illustrate this phenomenon numerically. The broken lines correspond to the square of the spectral radius of the iteration matrix associated with block Jacobi, for an 8 x 8 x 8 grid and β = γ = δ = 0.5. The solid lines correspond to the spectral radius of the block Gauss-Seidel iteration matrix. As can be seen, the curves are almost indistinguishable. This phenomenon becomes more dramatic as the systems become larger.
Figure 4.2: "Near-Property A" for the 1D splitting: (a) ρ_GS vs. ρ_J², centered differences; (b) ρ_GS vs. ρ_J², upwind differences.

Let {Sᵢⱼ} denote the n² x n² blocks of the reduced matrix. Each block Sᵢⱼ is a block tridiagonal matrix relative to 2n x 2n blocks. Define

C₁ = C₁⁽¹⁾ + C₁⁽²⁾ ,  (4.72)

where the matrices on the right-hand side are built from the blocks defined in Sec. 3.2. As before, let S = D₁ - C₁ be the 1D Jacobi splitting of the matrix, so that S = D₁ - (C₁⁽¹⁾ + C₁⁽²⁾). The matrix D₁ - C₁⁽²⁾ is consistently ordered, but S is not, and does not have Property A either. It is convenient to regard the very sparse matrix C₁⁽¹⁾ as the part which prevents S from having Property A. In this matrix the magnitudes of the nonzero values are bounded by 2 if be, cd, fg > 0 and centered difference discretization is used for the convective terms. The nonzero pattern of C₁⁽¹⁾ is depicted in Fig. 4.3.

Figure 4.3: sparsity pattern of the matrix C₁⁽¹⁾.

When be, cd, fg > 0, the reduced matrix S is a diagonally dominant M-matrix (by Lemma 2.5), which can be symmetrized by a real diagonal similarity transformation (by Theorem 4.1). By Theorem 4.5 the symmetrized matrix D̃₁ is positive definite, so D̃₁^{1/2} is well defined. This leads to

Lemma 4.9. The spectral radius of the Jacobi iteration matrix associated with the reduced matrix S is bounded from above and below as follows:¹

ρ(D₁⁻¹C₁⁽²⁾) ≤ ρ(D₁⁻¹C₁) ≤ ρ(D₁⁻¹C₁⁽²⁾) + ρ(D₁⁻¹C₁⁽¹⁾) .  (4.73)

¹ I would like to thank Howard Elman for pointing out and providing the proof of the right-hand inequality in (4.73).

Proof. Let C̃₁⁽¹⁾ and C̃₁⁽²⁾ denote the symmetrized versions of C₁⁽¹⁾ and C₁⁽²⁾, respectively.
Since D^&i for Block 2 respec- 1 is similar to the s y m m e t r i c m a t r i x D\ (D^C^D^ ^ /2 = 1 D^ CiD^ , 1/2 1/2 we have = {b?Cy) piD-'Cr) = pib-^c.b; ' ) <wb^c^b; ^ + 2 ) 1/2 /2 \\b; c[ b; \\ 1/2 1 = p i b ^ C ^ b ^ ||Â£>- cW li? = 1 2 P 2) 1/2 2 + p i b - ^ C ^ b - ^ ) + P{D?CP) = pipfdp) . (4.74) T h i s completes the proof for the right-hand inequality. O n the other hand, we have that b o t h C[ ^ a n d C\ are nonnegative matrices. 2 C^ Since D\ is an M - m a t r i x , a n d since C J ' < 2 + C p ' = C i , we c a n use [95, P r o p . 1.7], which states that i f A , B and C are nonnegative matrices w i t h A < B, then CA < CB (and AC < BC). Hence D^C < D ^ d . N o w , using {2) [95, T h m . 1.14], which states that for two matrices 0 < A < B we have p(A) < p(B) (this actually follows from the Perron-Frobenius theorem [113, p. 30]), the left-hand inequality in (4.73) follows. â€¢ Table 4.3 illustrates numerically the inequalities given i n L e m m a 4.9 for a p a r t i c u l a r example. In the table, spectral radii of the iteration matrices defined above are given, for several cross-sections of mesh Reynolds numbers. T h e m a t r i x for which the numbers are given is one arising from discretization on a 12 X 12 x 12 grid. A s is evident, the bounds are tight (by the analysis, the values i n the second c o l u m n of the table are bounded from above by the values in the t h i r d c o l u m n , and from below by the values i n the fourth c o l u m n ) . In particular, the right-hand inequality is a very tight upper b o u n d . A n o t h e r interesting point is t h a t the spectral radius of the iteration m a t r i x associated w i t h D\ â€” c j ' is fairly s m a l l . T h i s c a n be quantified analytically, as follows: 1 P r o p o s i t i o n 4.1. associated For h sufficiently with the splitting rfflfCh small, the spectral D\ â€” C J ' is bounded 1 <\ - +Â± f 91 from + ^ radius of the Jacobi iteration matrix above by +^ V + .<*') â€¢ (4.75) Chapter 4. 
Chapter 4. Convergence Analysis for Block Stationary Methods

beta = gamma = delta    rho(D1^-1 C1)    rho(1)+rho(2)    rho(2)     rho(1)
0.1                     0.8734           0.8762           0.7759     0.1002
0.3                     0.7487           0.7511           0.6648     0.0862
0.5                     0.5441           0.5458           0.4827     0.0631
0.7                     0.3151           0.3156           0.2788     0.0368
0.9                     0.0979           0.0981           0.0866     0.0115

Table 4.3: comparison of computed spectral radii of the 1D Jacobi iteration matrix and of the matrices associated with the splittings defined above. The expressions in the header stand for $\rho^{(1)} \equiv \rho(D_1^{-1}C_1^{(1)})$ and $\rho^{(2)} \equiv \rho(D_1^{-1}C_1^{(2)})$.

Proof. In Lemma 4.6, the spectral radius of a matrix which is exactly equal to $\hat D_1^{-1}\hat C_1^{(1)}$ is proved to be equal to $2\sqrt{befg}/\xi$, where $\xi$ is as in (4.52). Since $\rho(D_1^{-1}C_1^{(1)}) = \rho(\hat D_1^{-1}\hat C_1^{(1)})$, it follows that $\rho(D_1^{-1}C_1^{(1)}) \le 2\sqrt{befg}/\xi$, which is precisely the bound (4.75). □

We note that the matrix $C_1^{(1)}$ has terms which are associated only with two independent variables ($y$ and $z$ in this case), and thus the value in the Taylor expansion of (4.75) which multiplies $\tau^2$ and $\mu^2$ is not equal to the value multiplying $\sigma^2$.

The above analysis and experimental observations focus on the Jacobi iteration matrix. However, it is still not clear at this point whether the Gauss-Seidel (or SOR) eigenvalues behave in a way which resembles consistently ordered matrices. Here we can use Varga's extensions of the theory of p-cyclic matrices [112], [113, Sec. 4.4]. Below we briefly describe some of Varga's results, and use them for the reduced matrix. In [113, Defn. 4.2], a set $\mathcal V$ of matrices is defined as follows:² The square matrix $B \in \mathcal V$ if $B$ satisfies the following properties:

1. $B \ge 0$ with zero diagonal entries.

2. $B$ is irreducible and convergent.

3. $B$ is symmetric.

²In [113] the set is denoted by $S$. Here, since $S$ denotes the reduced matrix, the letter $\mathcal V$ is used.
Define $\tilde S = D_1^{-1/2} S D_1^{-1/2} = I - D_1^{-1/2} C_1 D_1^{-1/2}$. Applying block Jacobi to the original reduced system is analogous to applying point Jacobi to $\tilde S$, in the sense that the spectra of the iteration matrices associated with both systems are identical to each other. The iteration matrix associated with $\tilde S$ is $B = D_1^{-1/2} C_1 D_1^{-1/2}$. Showing that the matrix $B$ belongs to the set $\mathcal V$ defined above is easy and is omitted. Let $L$ be the strictly lower triangular part of $B$. Define
$$M_B(\theta) = \theta L + \frac{1}{\theta}\,L^T\,, \qquad \theta > 0\,, \tag{4.76}$$
and let $m_B(\theta) = \rho(M_B(\theta))$ and
$$h_B(\ln\theta) = \frac{m_B(\theta)}{\rho(B)}\,, \qquad \theta > 0\,.$$
Then we have by [113, Theorem 4.7] that $h_B \equiv 1$ if and only if $B$ is consistently ordered. In some sense $h_B(\ln\theta)$ measures the departure of the matrix $B$ from having block Property A. Fig. 4.4 demonstrates how close the function $h_B$ is to 1 for the reduced matrix when 1D partitioning is used, and is another way to illustrate the "near-Property A" of the matrix. In the figure, the function $h_B$ is computed for the same matrix for which Fig. 4.2(a) is given.

Figure 4.4: the function $h_B(\alpha)$ for the reduced matrix with 1D partitioning.

Let $\mathcal L_\omega$ denote the SOR iteration matrix. [113, Thm. 4.8] reads:

Theorem 4.10. If $B \in \mathcal V$ then
$$\rho(\mathcal L_1) \le \frac{\rho(B)}{2-\rho(B)}\,, \tag{4.77}$$
with equality possible only if $B$ is a consistently ordered cyclic matrix of index 2.

This is a sharpened form of the Stein-Rosenberg Theorem [113, p. 70]. Applying this theorem to our reduced matrix, we have:

Theorem 4.11. If the bound for the spectral radius of the block Jacobi iteration matrix coincides with the actual spectral radius as $h \to 0$, including terms of order $h^2$, then the square of the bound for the block Jacobi iteration matrix tends to the spectral radius of the block Gauss-Seidel iteration matrix, up to terms of order $h^2$.
Proof. Since the iteration matrix $B$ has the same spectral radius as $D_1^{-1}C_1$, we can use the bound of Thm. 4.7. For simplicity of notation, denote it by $\Phi$. Clearly, since $0 \le \rho(B) \le \Phi$,
$$\rho(\mathcal L_1) \le \frac{\rho(B)}{2-\rho(B)} \le \frac{\Phi}{2-\Phi}\,. \tag{4.78}$$
Since $\Phi$ has a Taylor expansion of the form $1 - ch^2 + o(h^2)$, by Eq. (4.68) it follows that $\frac{\Phi}{2-\Phi}$ and $\Phi^2$ have the same Taylor expansion up to $O(h^2)$ terms, of the form
$$1 - 2ch^2 + o(h^2)\,, \tag{4.79}$$
with $c$ expressed in terms of the PDE coefficients as in (4.68). □

For the block SOR scheme, the upper bound for the spectral radius is given in [113, Theorem 4.9] as $\sqrt{\omega^*-1}$ and is not tight. However, it is numerically evident that the bound for the Jacobi iteration matrix can be effectively used to estimate the optimal SOR parameter. In Fig. 4.5 we can observe that the behavior for the 1D splitting is qualitatively identical to the behavior of 2-cyclic consistently ordered matrices [the graph is given for the matrix that was used in Fig. 4.2(a)]. In the figure we also present the behavior of the SOR iteration matrix for the unreduced system.

Figure 4.5: spectral radius of the SOR iteration matrix vs. the relaxation parameter. The uppermost curve corresponds to the 1D splitting for the unreduced matrix, and then, in order, the 2D splitting for the unreduced matrix, the 1D splitting for the reduced matrix, and the 2D splitting for the reduced matrix.

4.4 Computational Work

Having performed some convergence analysis, in this section we examine the question of which of the solvers is more efficient overall. If $be, cd, fg > 0$, then by Eq. (4.68) and (4.69) (or by [113, Thm.
3.15]) it is evident that the spectral radius of the iteration matrix associated with the 2D splitting is smaller than that of the 1D iteration matrix. However, inverting $D_1$ involves less computational work than inverting $D_2$.

Consider the block Jacobi scheme. Asymptotically, in the constant coefficient case there is a fixed ratio of 1.8 between the rates of convergence of the two splittings; see Eq. (4.68) and (4.69). In rough terms, this number characterizes the ratio between the numbers of iterations until convergence for the two solvers. As far as the computational work per iteration is concerned, if $D_1 = L_1U_1$ and $D_2 = L_2U_2$ are the LU decompositions of the matrices of the systems that are to be solved in each iteration, we can assume that the number of operations per iteration is approximately the number of nonzeros in $L_i + U_i$ plus the number of nonzeros in $C_i$. In order to avoid costly fill-in using Gaussian elimination for $D_2$ (whose band is sparse), a technique of inner-outer iterations is used instead.

Let $k_1$ and $k_2$ denote the numbers of iterations for the schemes associated with the 1D splitting and the 2D splitting, respectively. Let us also define cost functions, as follows: $c_1(n)$ and $c_2(n)$ represent the overall number of floating point operations for each of the solvers, and $c_{in}(n)$ represents the cost of the inner solve. Then
$$c_1(n) \approx [\,\mathrm{nz}(L_1+U_1) + \mathrm{nz}(S-D_1)\,]\cdot k_1 = [\,10n^3 - 19n^2 + 4n\,]\cdot k_1\,; \tag{4.80a}$$
$$c_2(n) \approx [\,c_{in}(n) + \mathrm{nz}(S-D_2)\,]\cdot k_2 = [\,c_{in}(n) + 3n^3 - 8n^2 + 4n\,]\cdot k_2\,. \tag{4.80b}$$
In Eq. (4.80), the term $\mathrm{nz}(X)$ stands for the number of nonzeros of a matrix $X$, and $S$ stands for the reduced matrix.

Proposition 4.2.
For $n$ large enough, the scheme associated with the 2D splitting gives rise to a less costly solution process than the one associated with the 1D splitting only if $c_{in}(n) < 15n^3$.

Proof. If $n$ is large enough we can use the relation $k_1/k_2 = 1.8$ and refer only to the leading power of $n$ in the expressions for $c_1(n)$ and $c_2(n)$. So doing, it follows that
$$\frac{c_1(n)}{c_2(n)} \approx \frac{18n^3}{c_{in}(n) + 3n^3}\,, \tag{4.81}$$
and the result stated in the proposition readily follows. □

Next, the amount of work involved in solving the inner system of equations is determined. A natural choice of a splitting for this system is $D_2 = D_1 - (D_1 - D_2)$. It is straightforward to show, by Lemma 4.4 and Theorem 4.5, that:

Proposition 4.3. If block Jacobi based on the splitting $D_2 = D_1 - (D_1 - D_2)$ is used, then the spectral radius of the inner iteration matrix, namely $I - D_1^{-1}D_2$, is bounded by $\xi/\eta$, where $\eta$ and $\xi$ are as in Eq. (4.52) and (4.64).

For considering stationary methods that are faster than block Jacobi for the inner system, we have:

Proposition 4.4. The inner matrix is block consistently ordered relative to 1D partitioning.

Proof. The inner matrix is block tridiagonal relative to this partitioning. □

We now have:

Theorem 4.12. If $be, cd, fg > 0$, then for $n$ large enough, and for the stationary methods considered in this chapter, the 1D solver is faster than the 2D solver.

Proof. Below the result is shown for the block Jacobi outer solver; for Gauss-Seidel and SOR it is done in a similar way. The Taylor expansion of the bound in Prop. 4.3 is
$$\frac{\xi}{\eta} = \frac{4}{9} - \Big(\frac{43}{81}\pi^2 + \frac{65}{648}\sigma^2 + \frac{29}{648}\tau^2 + \frac{25}{324}\mu^2\Big)h^2 + o(h^2)\,. \tag{4.82}$$
For $h$ small enough, we can simply examine the leading term: the bound is approximately $\frac{4}{9}$ if block Jacobi is used; since by Prop. 4.4 the matrix is consistently ordered, by Young's analysis the spectral radius is approximately $\frac{16}{81}$ if block Gauss-Seidel is used, and approximately 0.055 if block SOR with the optimal relaxation parameter is used. For these schemes each iteration costs about $7n^3$ floating point operations. Since reducing the initial error by a factor of $10^m$ takes roughly $-m/\log_{10}\rho$ iterations, where $\rho$ is the spectral radius of the associated iteration matrix, it follows that even for the block SOR scheme with the optimal relaxation parameter, which is the fastest scheme considered here, after two iterations the error is reduced only by a factor of approximately $10^{2.5}$, which is far from enough. Thus the iteration count is larger than two, and the cost of the inner solve is larger than $15n^3$ floating point operations. □

We remark that an inexact inner solve could also be considered (see, for example, [43]), but we do not analyze this type of solver here.

4.5 Comparison with the Unreduced System

The unreduced matrix is an $n$th-order block tridiagonal matrix with respect to $n^2 \times n^2$ blocks:
$$A = \mathrm{tri}[A^{(-1)}, A^{(0)}, A^{(1)}]\,, \tag{4.83}$$
where $A^{(-1)} = f I_{n^2}$, $A^{(1)} = g I_{n^2}$, and $A^{(0)}$ is itself a block tridiagonal matrix,
$$A^{(0)} = \mathrm{tri}[B^{(-1)}, B^{(0)}, B^{(1)}]\,, \tag{4.84}$$
consisting of $n \times n$ matrices given by
$$B^{(-1)} = b I_n\,; \quad B^{(0)} = \mathrm{tri}[c, a, d]\,; \quad B^{(1)} = e I_n\,. \tag{4.85}$$
Consider the one-dimensional splitting $A = D - C$, where $D = \mathrm{diag}[B^{(0)}, \ldots, B^{(0)}]$.

Theorem 4.13. If $cd > 0$, $be > 0$ and $fg > 0$, then there exists a real nonsingular diagonal matrix $Q$ such that the matrix
$$\hat A = Q^{-1} A Q \tag{4.86}$$
is symmetric.
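Before turning to the proof, Theorem 4.13 is easy to confirm numerically. The sketch below builds the unreduced operator (4.83)-(4.85) via Kronecker products for demo coefficients with $cd, be, fg > 0$ (the numerical values are assumed for illustration only) and applies a diagonal scaling whose entries advance by the ratios $\sqrt{c/d}$, $\sqrt{b/e}$, $\sqrt{f/g}$ along the three directions:

```python
import numpy as np

n = 4
a, c, d, b, e, f, g = 7.0, 1.4, 0.7, 1.2, 0.8, 1.5, 0.6   # cd, be, fg > 0

I = np.eye(n)
Lo = np.diag(np.ones(n - 1), -1)   # lower shift
Up = Lo.T
# unreduced matrix, Eq. (4.83)-(4.85); x varies fastest, then y, then z
A = (a * np.eye(n**3)
     + np.kron(I, np.kron(I, c * Lo + d * Up))
     + np.kron(I, np.kron(b * Lo + e * Up, I))
     + np.kron(f * Lo + g * Up, np.kron(I, I)))

# diagonal symmetrizer Q: ratios sqrt(c/d), sqrt(b/e), sqrt(f/g) per direction
z, y, x = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing='ij')
q = (np.sqrt(c / d) ** x.ravel()
     * np.sqrt(b / e) ** y.ravel()
     * np.sqrt(f / g) ** z.ravel())
Ahat = np.diag(1.0 / q) @ A @ np.diag(q)   # Q^{-1} A Q

assert np.allclose(Ahat, Ahat.T)           # symmetric, as Theorem 4.13 claims
# the nearest-neighbor coupling in x becomes sqrt(cd), cf. Eq. (4.89)
assert np.isclose(Ahat[1, 0], np.sqrt(c * d))
```

The same scaling fails (becomes complex) as soon as one of the products $cd$, $be$, $fg$ is negative, which is the content of the hypothesis of the theorem.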
Proof. This can be shown by direct substitution, by requiring that the symmetrized matrix be equal to its transpose. Denote the $i$th entry of the main diagonal of $Q$ by $q_i$. The entry $q_1 \ne 0$ can be an arbitrary value, and the following algorithm generates the desired diagonal matrix, advancing by a factor $\sqrt{c/d}$ from gridpoint to gridpoint within a line, by $\sqrt{b/e}$ from line to line, and by $\sqrt{f/g}$ from plane to plane:

for $i = 2$ to $n$:
    $q_i = q_{i-1}\sqrt{c/d}$
for $j = 2$ to $n$, for $i = 1$ to $n$:
    $q_{(j-1)n+i} = q_{(j-2)n+i}\sqrt{b/e}$
for $k = 2$ to $n$, for $i = 1$ to $n^2$:
    $q_{(k-1)n^2+i} = q_{(k-2)n^2+i}\sqrt{f/g}$

From the above it is clear that the similarity transformation is real and nonsingular only if $cd$, $fg$ and $be$ are positive. □

The resulting symmetrized matrix is
$$\hat A = \mathrm{tri}[\hat A^{(-1)}, \hat A^{(0)}, \hat A^{(1)}]\,, \tag{4.87}$$
where
$$\hat A^{(-1)} = \hat A^{(1)} = \sqrt{fg}\, I_{n^2}\,; \quad \hat A^{(0)} = \mathrm{tri}[\hat B^{(-1)}, \hat B^{(0)}, \hat B^{(1)}] \tag{4.88}$$
with
$$\hat B^{(-1)} = \hat B^{(1)} = \sqrt{be}\, I_n\,; \quad \hat B^{(0)} = \mathrm{tri}[\sqrt{cd}, a, \sqrt{cd}\,]\,. \tag{4.89}$$
Theorem 4.13 leads to the following symmetrization result, obtained by expressing the conditions in terms of $\beta$, $\gamma$ and $\delta$:

Corollary 4.4. If $|\beta|, |\gamma|, |\delta| < 1$, the coefficient matrix for the centered difference scheme is symmetrizable by a real diagonal similarity transformation. For upwind (backward) difference schemes, the coefficient matrix is symmetrizable for all $\beta, \gamma, \delta > 0$.

Cor. 4.4 demonstrates the similarity of the symmetrization conditions between the 2D [40] and 3D problems: in both cases the unreduced matrix can be symmetrized by a diagonal matrix only in the diffusion dominated case if centered differences are used, and in both cases the reduced matrix can be symmetrized for a larger range of PDE coefficients, including the case where all mesh Reynolds numbers are larger than 1 in magnitude.

We now find the spectrum of the iteration matrix. Let $\hat D$ and $\hat C$ be the symmetrized versions of $D$ and $C$ respectively. Then:

Theorem 4.14. The eigenvalues of the iteration matrix $D^{-1}C$ are given by
$$\lambda_{j,k,\ell} = \frac{2\sqrt{be}\,\cos(k\pi h) + 2\sqrt{fg}\,\cos(\ell\pi h)}{a + 2\sqrt{cd}\,\cos(j\pi h)}\,, \qquad 1 \le j,k,\ell \le n\,. \tag{4.90}$$

Proof.
Inspired by the techniques used by Elman & Golub in [40, p. 677], suppose $V_n$ is an $n \times n$ matrix whose columns are the eigenvectors of the tridiagonal matrix $\mathrm{tri}[\sqrt{cd}, a, \sqrt{cd}\,]$, and let $V = \mathrm{diag}[V_n, \ldots, V_n]$ have $n^2$ copies of $V_n$. Then $V^{-1}\hat D V$ diagonalizes $\hat D$. Denote $\check D = V^{-1}\hat D V$ and $\check C = V^{-1}\hat C V$. Since $\check D^{-1}\check C = V^{-1}\hat D^{-1}\hat C V$, we can find the spectrum of $\hat D^{-1}\hat C$ by examining $\check D^{-1}\check C$. The latter is easier to form explicitly, as $\check D$ is diagonal (as opposed to $\hat D$, which is tridiagonal). $\check D^{-1}\check C$ has the same nonzero pattern as $\check C$.

Let $P_1$ denote the permutation matrix that transforms rowwise ordering into columnwise ordering, leaving the ordering of planes (the $z$ direction) unchanged. $F_1 = P_1^{-1}\check D^{-1}\check C P_1$ is block tridiagonal with respect to $n^2 \times n^2$ blocks, and its superdiagonal and subdiagonal blocks are diagonal matrices, all equal to each other. Each of these $n^2 \times n^2$ matrices looks as follows:
$$\mathrm{diag}\Big[\frac{\sqrt{fg}}{a+2\sqrt{cd}\cos(\pi h)}\,, \frac{\sqrt{fg}}{a+2\sqrt{cd}\cos(2\pi h)}\,, \ldots, \frac{\sqrt{fg}}{a+2\sqrt{cd}\cos(n\pi h)}\Big]\,, \tag{4.91}$$
each value repeated $n$ times. The diagonal blocks consist of $n$ identical $n^2 \times n^2$ matrices, each being an $n$th-order block diagonal matrix whose $j$th component is the tridiagonal matrix
$$G_j = \frac{1}{a+2\sqrt{cd}\cos(j\pi h)}\,\mathrm{tri}[\sqrt{be}, 0, \sqrt{be}\,]\,. \tag{4.92}$$
Let $V_2$ denote a matrix whose columns are the eigenvectors of $\mathrm{diag}[G_1, \ldots, G_n]$, and let $\hat V_2$ be the block diagonal matrix consisting of $n$ uncoupled copies of $V_2$; then $F_2 = \hat V_2^{-1} F_1 \hat V_2$ is still block tridiagonal with respect to $n^2 \times n^2$ blocks, but now the main diagonal block is a diagonal matrix, with entries
$$\frac{2\sqrt{be}\,\cos(k\pi h)}{a+2\sqrt{cd}\,\cos(j\pi h)}\,, \qquad 1 \le j,k \le n\,.$$
Finally, let $P_2$ be a permutation matrix which transforms rowwise ordering to planewise ordering, leaving the orientation of columns unchanged.
Then $F_3 = P_2^{-1} F_2 P_2$ is an $n^2$th-order block diagonal matrix whose components are $n \times n$ symmetric tridiagonal matrices. Their eigenvalues can be found using Lemma 1.1, and are given by (4.90). □

Theorem 4.14 leads to the following useful result:

Corollary 4.5. For $cd, be, fg > 0$, the spectral radius of the line Jacobi iteration matrix for the unreduced system, using the preconditioner defined in (4.86), is
$$\frac{2\sqrt{be}\,\cos(\pi h) + 2\sqrt{fg}\,\cos(\pi h)}{a + 2\sqrt{cd}\,\cos(n\pi h)}\,. \tag{4.93}$$
The Taylor expansion of this spectral radius about $h = 0$ is given by
$$1 - \Big(\frac{3}{4}\pi^2 + \frac{1}{16}\big(\sigma^2+\tau^2+\mu^2\big)\Big)h^2 + o(h^2)\,. \tag{4.94}$$
For the plane Jacobi scheme a similar procedure to the one used in the proof of Thm. 4.14 can be applied. (The algebraic details are omitted.)

Theorem 4.15. The spectral radius of the plane Jacobi iteration matrix for the unreduced system is given by
$$\frac{2\sqrt{fg}\,\cos(\pi h)}{a + 2\sqrt{cd}\,\cos(n\pi h) + 2\sqrt{be}\,\cos(n\pi h)} \tag{4.95}$$
and its Taylor expansion about $h = 0$ is
$$1 - \Big(\frac{3}{2}\pi^2 + \frac{1}{8}\big(\sigma^2+\tau^2+\mu^2\big)\Big)h^2 + o(h^2)\,. \tag{4.96}$$
The same type of analysis that was done in Section 4.4, comparing the 1D splitting to the 2D splitting for the reduced system, is possible for the unreduced system. Below we sketch the main details. Suppose inner-outer iterations are used in solving the scheme associated with the 2D splitting. Denote, again, this splitting for the inner system as $D_2 = D_1 - (D_1 - D_2)$ ($D_1$ and $D_2$ are now different than the ones defined in the previous section). Then we have:

Proposition 4.5. Consider the unreduced system. Suppose $be, cd, fg > 0$, $n$ is sufficiently large, and the 1D splitting is used in solving the inner system. Then, for the stationary methods considered in this chapter, the 1D solver is faster than the 2D solver.

Proof. The ratio between the asymptotic rates of convergence of the 1D solver and the 2D solver is 2.
The number of nonzeros of the whole matrix is approximately $7n^3$, the number of nonzeros of $D_1$ is approximately $3n^3$, and the number of nonzeros of $D_2$ is approximately $5n^3$. Since the spectral radii for the two splittings are available, we can find the spectral radius of the iteration matrix of the inner system. Its Taylor expansion is given by
$$\frac{1}{2} - \Big(\frac{3}{8}\pi^2 + \frac{1}{16}\tau^2 + \frac{1}{32}\sigma^2\Big)h^2 + o(h^2)\,.$$
Defining cost functions analogous to the ones defined in Section 4.4 for the reduced system, and using the same line of argument, we have
$$\frac{c_1(n)}{c_2(n)} \approx \frac{14n^3}{c_{in}(n) + 2n^3}\,, \tag{4.97}$$
and from this it follows that the 2D solver is more efficient only if $c_{in}(n) < 12n^3$. However, as in Thm. 4.12, this means that at most two iterations of the inner solve can be performed, which is not enough for the required accuracy. □

Since the 1D splitting for both the reduced and the unreduced systems gives rise to a more efficient solve, we compare these two systems, focusing on this splitting. The LU decomposition for the solution of the system in each iteration is done once and for all (see [54] for an operation count) and its cost is negligible in comparison with the amount of work done in the iterative process. Each iteration for the reduced system costs about $10n^3$ floating point operations, whereas each iteration for the unreduced system costs approximately $7n^3$ floating point operations. Hence, the amount of computational work per iteration is cheaper for the unreduced system by a factor of about $10/7$.
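The spectral radii entering these work estimates come from the eigenvalue formula of Thm. 4.14. As a sanity check, the following sketch builds a small symmetrized unreduced matrix with the structure (4.87)-(4.89) (demo values for $a$, $\sqrt{cd}$, $\sqrt{be}$, $\sqrt{fg}$, assumed for illustration) and compares the computed spectral radius of the line Jacobi iteration matrix with the closed-form eigenvalues:

```python
import numpy as np

n = 5
h = 1.0 / (n + 1)
a, scd, sbe, sfg = 8.0, 1.1, 0.9, 1.2   # a, sqrt(cd), sqrt(be), sqrt(fg)

I = np.eye(n)
E = np.diag(np.ones(n - 1), 1)
E = E + E.T                              # tri[1, 0, 1]
# symmetrized unreduced matrix A = D - C; x varies fastest, then y, then z
D = a * np.eye(n**3) + scd * np.kron(I, np.kron(I, E))   # tridiagonal lines in x
C = -(sbe * np.kron(I, np.kron(E, I)) + sfg * np.kron(E, np.kron(I, I)))

rho = np.abs(np.linalg.eigvals(np.linalg.solve(D, C))).max()

# eigenvalue formula: numerator couples y and z, denominator the x-lines
c = np.cos(np.pi * h * np.arange(1, n + 1))
lam = np.abs((2 * sbe * c[None, :, None] + 2 * sfg * c[None, None, :])
             / (a + 2 * scd * c[:, None, None]))
assert abs(rho - lam.max()) < 1e-7
```

The check works because $D$ and $C$ are polynomials in commuting Kronecker factors, so they share the eigenvector basis of sine products; only the magnitudes matter for the spectral radius.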
However, using the asymptotic formulas (4.68) and (4.94), it is evident that the number of iterations required for the unreduced system is larger than that required for the reduced system; in the worst case, the ratio between the work required for solving the reduced system vs. the unreduced system is roughly $(10/7)\cdot(27/40)$, which is $27/28$ and is still smaller than 1, so the reduced solver is more efficient. If the convective terms are nonzero, then this ratio becomes smaller, and in practice we have observed substantial savings, as is illustrated in the test problems in Sec. 4.7.

In Fig. 4.6 the superiority of the reduced system over the unreduced system for the Gauss-Seidel scheme is illustrated. The graphs were created for a 512-point grid. It is interesting to notice that the reduced Gauss-Seidel iteration matrices for both 1D and 2D splittings have spectral radii that are significantly smaller than 1 even in the convection-dominated case, when centered difference discretization is applied. The 2D splitting gives rise to a matrix with Property A, and in all the numerical experiments that have been performed for this splitting and for Gauss-Seidel and SOR, the rate of convergence was much faster than for the unreduced system; in several cases the solver for the unreduced system did not converge. These observations are similar to the results in [107] for lower dimensions.

The superiority of the reduced system is evident also for the SOR scheme; see Fig. 4.5. Notice that for the SOR scheme it is difficult to determine the optimal relaxation parameter when $be$, $cd$ and $fg$ are negative.

Figure 4.6: comparison of the spectral radii of the Gauss-Seidel iteration matrices of the reduced and unreduced systems; (a) $\rho_{gs}$, centered; (b) $\rho_{gs}$, upwind. Axes are: $x$ - mesh Reynolds numbers ($\beta$
$= \gamma = \delta$); $y$ - spectral radius. The uppermost curve corresponds to the 1D splitting for the unreduced system, and then, in order, the 2D splitting for the unreduced system, the 1D splitting for the reduced system, and the 2D splitting for the reduced system.

4.6 Fourier Analysis

We now perform Fourier analysis in the spirit of Chan & Elman [21] and Elman & Golub [40], for convection-dominated equations. We consider here centered difference discretization of convection-dominated equations where $be, cd, fg < 0$. The boundary conditions are assumed to be periodic and the equation has constant coefficients.

For the results presented below, we denote the periodic symmetrized reduced operator, scaled by $ah^2$, by $S = D - C$, where which matrices $D$ and $C$ are meant is to be clear from the context in which they appear.

Theorem 4.16. Given $be, cd, fg < 0$, suppose $8\sqrt{|fg|}\cdot\big(\sqrt{|cd|} + \sqrt{|be|}\big) < a^2$. If the two-line ordering with $x$-$y$ oriented 2D blocks of gridpoints is used, then the eigenvalues of $D^{-1}C$ associated with the symmetrized periodic operator using 2D preconditioning are all smaller than 1 in magnitude.

Proof. Suppose $v^{(\alpha,\beta,\gamma)}$ are vectors whose components are $v^{(\alpha,\beta,\gamma)}_{i,j,k} = e^{2\pi\imath\alpha ih}e^{2\pi\imath\beta jh}e^{2\pi\imath\gamma kh}$, where $\imath = \sqrt{-1}$ and $1 \le i,j,k,\alpha,\beta,\gamma \le n$. For a grid point $u_{i,j,k}$ we have
$$\begin{aligned} D u_{i,j,k} = {} & (a^2 - 2be - 2cd - 2fg)\,u_{i,j,k} - be\,u_{i,j-2,k} - be\,u_{i,j+2,k} - cd\,u_{i-2,j,k} - cd\,u_{i+2,j,k} \\ & - 2\sqrt{bcde}\,\big(u_{i-1,j-1,k} + u_{i-1,j+1,k} + u_{i+1,j-1,k} + u_{i+1,j+1,k}\big) \end{aligned} \tag{4.98}$$
and
$$\begin{aligned} C u_{i,j,k} = {} & -fg\,u_{i,j,k-2} - fg\,u_{i,j,k+2} - 2\sqrt{befg}\,\big(u_{i,j-1,k-1} + u_{i,j+1,k-1} + u_{i,j-1,k+1} + u_{i,j+1,k+1}\big) \\ & - 2\sqrt{cdfg}\,\big(u_{i-1,j,k-1} + u_{i+1,j,k-1} + u_{i-1,j,k+1} + u_{i+1,j,k+1}\big)\,. \end{aligned}$$
(4.99)

Substituting the eigenvectors presented above, by some algebra we get that the eigenvalues of $D$ are
$$\lambda_{\alpha,\beta,\gamma} = a^2 - 2be - 2cd - 2fg - 4\sqrt{bcde}\,[\cos(\theta+\phi) + \cos(\theta-\phi)] - 2be\cos 2\phi - 2cd\cos 2\theta\,, \tag{4.100}$$
where $\theta = 2\pi\alpha h$, $\phi = 2\pi\beta h$, and $\xi = 2\pi\gamma h$. The eigenvalues of $C$ are given by
$$\mu_{\alpha,\beta,\gamma} = 4\sqrt{befg}\,[\cos(\phi+\xi) + \cos(\phi-\xi)] + 4\sqrt{cdfg}\,[\cos(\theta+\xi) + \cos(\theta-\xi)] + 2fg\cos 2\xi\,. \tag{4.101}$$
The eigenvalues of $D^{-1}C$ are all the combinations $\mu/\lambda$. Denote $p = \sqrt{|cd|}$, $q = \sqrt{|be|}$, $r = \sqrt{|fg|}$. Note that $cd = -p^2$, $be = -q^2$, $fg = -r^2$. Then
$$\frac{\mu}{\lambda} = \frac{8pr\cos\theta\cos\xi + 8qr\cos\phi\cos\xi - 4r^2\cos^2\xi + 2r^2}{a^2 + 2r^2 - 8pq\cos\theta\cos\phi + 4p^2\cos^2\theta + 4q^2\cos^2\phi}\,. \tag{4.102}$$
For the denominator in (4.102) we have
$$\lambda = a^2 + 2r^2 + 4(p\cos\theta - q\cos\phi)^2 \ge a^2 + 2r^2\,. \tag{4.103}$$
$\cos\xi$ appears only in the numerator, which is a quadratic function in it:
$$\mu(\cos\xi) = 2r^2\big(1 + s\cos\xi - 2\cos^2\xi\big)\,, \tag{4.104}$$
where
$$s = \frac{4(p\cos\theta + q\cos\phi)}{r}\,. \tag{4.105}$$
Let $s_{\max} = \max|s| = 4(p+q)/r$. If $p + q < r$ then $\mu$ has a local maximum at $-1 < \cos\xi = s/4 < 1$, and then
$$\mu_{\max}/(2r^2) = 1 + s^2/8 \le 1 + s_{\max}^2/8\,. \tag{4.106}$$
The minimal value of $\mu$ occurs at either $\cos\xi = 1$ or $\cos\xi = -1$, where its value satisfies
$$\mu/(2r^2) = -1 \pm s\,; \tag{4.107}$$
thus $\mu_{\min}/(2r^2) = -1 - |s|$, and from this it follows that the maximal absolute value of $\mu/(2r^2)$ is bounded by $1 + |s|$. We conclude, then, that for this case $|\mu/(2r^2)| \le 1 + s_{\max}$. It now remains to check what happens if $p + q > r$. In this case there are no local extrema for $\mu$, and we have to look at the end points. Substituting $\cos\xi = \pm 1$, as above, we have $\mu/(2r^2) = -1 \pm s$, which again yields $|\mu/(2r^2)| \le 1 + s_{\max}$. We thus have
$$|\mu| \le 2r^2\,\big[1 + s_{\max}\big]\,.$$
(4.108)

From (4.103) and (4.108) it follows that $|\mu/\lambda| < 1$ if $8r(p+q) < a^2$, which in terms of the components of the computational molecule is exactly the condition stated in the theorem. □

For the two-plane ordering, since the matrix entries were given in the previous section for 2PNxz, we refer to this particular ordering strategy below, and we have the following:

Theorem 4.17. Given $be, cd, fg < 0$, suppose $4\sqrt{|be|}\cdot\big(2\sqrt{|cd|} + \sqrt{|be|} + 2\sqrt{|fg|}\big) < a^2$. Then, for the two-plane ordering, the eigenvalues of $D^{-1}C$ associated with the periodic operator using 2D preconditioning are all smaller than 1 in magnitude.

Proof. For a grid point $u_{i,j,k}$, the structure of $D$ and $C$ depends on the parity of the index $j$, associated with the independent variable $y$. If $j$ is even,
$$\begin{aligned} D u_{i,j,k} = {} & (a^2 - 2be - 2cd - 2fg)\,u_{i,j,k} - fg\,u_{i,j,k-2} - fg\,u_{i,j,k+2} - cd\,u_{i-2,j,k} - cd\,u_{i+2,j,k} \\ & - 2\sqrt{cdfg}\,\big(u_{i-1,j,k-1} + u_{i+1,j,k-1} + u_{i-1,j,k+1} + u_{i+1,j,k+1}\big) \\ & - 2\sqrt{befg}\,\big(u_{i,j-1,k-1} + u_{i,j-1,k+1}\big) - 2\sqrt{bcde}\,\big(u_{i-1,j-1,k} + u_{i+1,j-1,k}\big) \end{aligned} \tag{4.109}$$
and
$$C u_{i,j,k} = -be\,u_{i,j-2,k} - be\,u_{i,j+2,k} - 2\sqrt{befg}\,\big(u_{i,j+1,k-1} + u_{i,j+1,k+1}\big) - 2\sqrt{bcde}\,\big(u_{i-1,j+1,k} + u_{i+1,j+1,k}\big)\,. \tag{4.110}$$
If $j$ is odd, then the roles of $j+1$ and $j-1$ in $D$ and $C$ are interchanged. In both cases the eigenvalues are identical, and after some algebra we get:
$$\frac{\mu}{\lambda} = \frac{4qr\cos\phi\cos\xi + 4pq\cos\theta\cos\phi - 2q^2\cos 2\phi - 4q^2}{a^2 + 2q^2 - 8pr\cos\theta\cos\xi + 4p^2\cos^2\theta + 4r^2\cos^2\xi - 4qr\cos\phi\cos\xi - 4pq\cos\theta\cos\phi}\,. \tag{4.111}$$
We can write
$$\lambda = a^2 + 2q^2 + 4(p\cos\theta - r\cos\xi)^2 - 4q\cos\phi\,(p\cos\theta + r\cos\xi)\,,$$
and if $s = 2(p\cos\theta + r\cos\xi)/q$, with $s_{\max} = \max|s| = 2(p+r)/q$, we have
$$\lambda \ge a^2 + 2q^2 - 2s_{\max}q^2\,.$$
(4.112)

For the numerator, $\mu$, we use a technique similar to the one used in Theorem 4.16: $\mu$ is quadratic in $\cos\phi$, and we have
$$\mu = 2q^2\big({-1} + s\cos\phi - 2\cos^2\phi\big)\,. \tag{4.113}$$
If $|s| \le 4$ then $\mu/(2q^2)$ attains its maximum at $\cos\phi = s/4$, where $\mu/(2q^2) = -1 + s^2/8 < 1$. The minimum is obtained at $\cos\phi = \pm 1$ and we have
$$\mu_{\min}/(2q^2) = -3 - |s|\,. \tag{4.114}$$
So $|\mu/(2q^2)| \le 3 + s_{\max}$. Hence, it follows that
$$\Big|\frac{\mu}{\lambda}\Big| \le \frac{2(3 + s_{\max})\,q^2}{a^2 + 2q^2 - 2s_{\max}q^2}\,. \tag{4.115}$$
The latter is smaller than 1 if $4q(2p + q + 2r) < a^2$. □

4.7 Comparisons

In this section some indication as to the performance of block stationary methods for the reduced system is given, and the effectiveness of the cyclic reduction step is illustrated.

Comparison between the reduced and the unreduced systems

Consider Eq. (1.15), where the right hand side is such that the solution of the continuous problem is $u(x,y,z) = \sin(\pi x)\cdot\sin(\pi y)\cdot\sin(\pi z)$, and the domain is the unit cube. The zero vector was taken as an initial guess, and $\|r^{(i)}\|_2/\|r^{(0)}\|_2 < 10^{-10}$ was used as a stopping criterion ($r^{(i)}$ denotes, as usual, the residual at the $i$th iterate). The program stopped if the stopping criterion was not satisfied after 2,000 iterations. In the experiments that are presented, the 1D solver is used.

In Table 4.4, the grid is of size 32 x 32 x 32. The matrix of the underlying system of equations is of size 32,768 x 32,768. In the table, iteration counts for the Jacobi, Gauss-Seidel and SOR schemes are presented for four values of the PDE coefficients, and for two discretization schemes.
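The translation from spectral radii to iteration counts used throughout this chapter is $k \approx -m/\log_{10}\rho$ for an error reduction by a factor of $10^m$; a one-line sketch, checked against the "about two iterations for a $10^{2.5}$ reduction at $\rho \approx 0.055$" estimate from the proof of Theorem 4.12:

```python
import math

def iterations(rho, m):
    """Iterations needed to reduce the error by a factor of 10**m."""
    return math.ceil(-m / math.log10(rho))

# the SOR inner-solve estimate from the proof of Theorem 4.12:
assert iterations(0.055, 2.5) == 2
```

The experiments reported below use the same rule with $m = 10$, matching the $10^{-10}$ residual-reduction stopping criterion.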
It is straightforward to make the connection between iterations and computational work: since the problem has constant coefficients, the construction of both the reduced and the unreduced systems is straightforward and inexpensive, and is negligible in the overall cost of the computation (see Sec. 5.3 for a discussion of the cost of constructing the reduced matrix). The number of floating point operations per iteration is approximately twice the number of nonzeros in each matrix. The number of nonzero entries in the reduced matrix is higher by a factor of about 35% than the number of nonzeros of the unreduced matrix. For the block Jacobi method, in the LU decomposition of the preconditioner (which is done once and for all), there is no fill-in for the unreduced matrix, and there is fill-in of a modest amount for the reduced matrix. As a result, each iteration for the reduced system costs approximately $20n^3$ floating point operations, and each iteration for the unreduced matrix costs approximately $14n^3$ floating point operations. The iterations in the reduced solver are thus approximately 42% more expensive.

                                 reduced                      unreduced
PDE coeff. (sigma=tau=mu)   10     20    100   1000      10     20    100   1000
Jacobi    centered          393    173   53    N/C       1,030  444   N/C   N/C
GS        centered          188    77    14    322       492    198   N/C   N/C
SOR       centered          36     25    -     -         61     38    -     -
Jacobi    upwind            455    239   75    43        1,194  620   179   89
GS        upwind            219    111   27    10        574    287   63    16
SOR       upwind            39     27    18    9         66     45    24    11

Table 4.4: comparison between iteration counts for the reduced and unreduced systems, for different values of mesh Reynolds numbers. 'N/C' marks no convergence after 2,000 iterations. The experiments were performed on a 32 x 32 x 32 grid.
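The PDE-coefficient values in Table 4.4 translate to mesh Reynolds numbers via $\beta = \sigma h/2$ with $h = 1/33$ for the $32^3$ interior grid; a quick check:

```python
h = 1.0 / 33                     # 32 x 32 x 32 interior gridpoints
for sigma in (10, 20, 100, 1000):
    beta = sigma * h / 2         # mesh Reynolds number
    print(sigma, round(beta, 4))
```

The loop reproduces the four values quoted in the discussion that follows.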
For the values of the PDE coefficients in the table, namely 10, 20, 100 and 1000, the corresponding values of the mesh Reynolds numbers are 0.1515, 0.3030, 1.515 and 15.15. Notice that the last two are larger than 1, and so for these values we do not know the optimal relaxation parameter, and the SOR experiments for these values were not performed. The following observations can be made:

1. Overall the reduced solver is substantially faster than the unreduced solver. Even though each iteration of the reduced solver is more expensive (as explained above), the savings in iteration counts are much more significant, and in regions where both solvers converge, the reduced solver is more efficient by a factor of at least 50% (except in one case), and much more in many cases. Moreover, a significant fact here is that there are cases where the reduced solver converges whereas the unreduced solver does not.

2. For the reduced solver, the Gauss-Seidel scheme outperforms the Jacobi scheme by a factor of approximately 2 in diffusion-dominated regions, which illustrates the "near Property A" the matrix has. In convection-dominated regions with centered differencing the Gauss-Seidel scheme is significantly better than Jacobi. (In this case the matrix is not necessarily nearly consistently ordered.)

3. If centered differences are used, for $\sigma = 20$ convergence is faster than for $\sigma = 10$. This illustrates a phenomenon which is supported by the analysis and holds also for the two-dimensional case [41]: for sufficiently small mesh Reynolds numbers, the "more nonsymmetric" systems converge faster than the "close to symmetric" ones.
The upwind difference scheme converges more slowly than the centered difference scheme when the mesh Reynolds numbers are small in magnitude, but convergence is fast for large mesh Reynolds numbers. This applies to both the reduced and the unreduced systems, and follows from the fact that as the PDE coefficients grow larger, the matrix is more diagonally dominant when upwind schemes are used.

Next, we consider an example of a nonseparable problem, for which our convergence analysis does not apply. Consider the problem

    -0.1 Δu + yz u_x + xz u_y + xy u_z = w ,                    (4.116)

on the unit cube, with Neumann boundary conditions u_z = 0 at z = 0, 1, and zero Dirichlet conditions for x = 0, 1 and y = 0, 1. Note that there is a turning point, but it is on the boundary. Here w was constructed so that the exact solution is u(x, y, z) = sin(πx) sin(πy) cos(πz). The ordering strategy used is 2PNxz. In this case it is impossible to know the optimal SOR parameter; we examine the iteration counts for block Jacobi and block Gauss-Seidel. The results in Table 4.5 are for a 20 x 20 x 20 grid (8,000 gridpoints in the tensor-product grid), using centered difference discretization.

    method      reduced   unreduced
    BJ,  1D         655       1,787
    BJ,  2D         445         904
    BGS, 1D         323         847
    BGS, 2D         220         466

Table 4.5: iteration counts for different iterative schemes.

The following observations can be made:

1. Even though the problem is nonseparable, the solvers have a behavior which is similar to the convergence results for the constant coefficient case. Referring back to Eq. (4.68), (4.69), (4.94) and (4.96), the difference in iteration counts matches the ratios predicted by the analysis: the improvement after the cyclic reduction step is performed is by a factor of 100% to 200% in iteration counts.
As predicted by the analysis, the improvement for the 1D solver is more dramatic than the improvement for the 2D solver.

2. For the reduced system the difference in iteration counts between the 1D solvers and the 2D solvers is approximately a factor of 1.5; in the constant coefficient case the convergence analysis provides a factor of 1.8. Also, we notice an improvement by a factor of approximately 2 between the 1D and the 2D solver (in iterations) for the unreduced system; the predicted factor of improvement for the constant coefficient case is 2.

3. The improvement in rate of convergence between Jacobi and Gauss-Seidel is approximately 2 for 1D partitioning, relative to which the matrix does not have Property A.

Comparison of variants of the orderings

Tables 4.6 and 4.7 present selected results of applying block Gauss-Seidel based on various splittings. These results were obtained as part of an extensive set of numerical experiments which were performed for all possible combinations of signs of the PDE coefficients, for various magnitudes of the convective terms and various test problems. The results presented in the tables are for the model problem (1.15), where the right-hand-side vector is such that the analytical solution is u = 100xyz(1 - x)(1 - y)(1 - z) exp(x + y + z). Centered difference discretization was done on a 12 x 12 x 12 grid. The results give a good indication of the situation for finer grids. The stopping criterion was ||r^(k)||_2 / ||r^(0)||_2 < 10^(-13). The two-plane matrix was used, with x-z oriented sets of gridpoints. The following partitionings were considered:

1. The natural two-plane ordering with 1D preconditioning (termed '1D' in the tables).

2. The four-color block ordering with 1D preconditioning ('4C').

3.
The natural two-plane ordering with 2D preconditioning ('2D').

4. The 2D red/black block ordering with 2D preconditioning ('RB').

     σ     τ     μ    1D   4C   2D   RB       σ     τ     μ    1D    4C    2D    RB
   100     0     0    17   15   12   11     500     0     0    14    14    10    10
  -100     0     0    17   15   12   11    -500     0     0    14    14    10    10
     0   100     0    24   26   28   28       0   500     0   197   201   192   194
     0  -100     0    28   25   30   28       0  -500     0   203   202   194   195
     0     0   100    25   24   12   11       0     0   500   197   201    10    10
     0     0  -100    28   24   12   11       0     0  -500   202   201    10    10

Table 4.6: iteration counts for the block Gauss-Seidel scheme, for one nonzero convective term.

     σ     τ     μ    1D   4C   2D   RB       σ     τ     μ    1D    4C    2D    RB
   100   100     0    28   26   26   26     500   500     0   260   273   254   275
  -100   100     0    28   26   25   26    -500   500     0   253   278   256   279
  -100  -100     0    31   26   28   26    -500  -500     0   263   278   263   280
   100  -100     0    31   26   27   26     500  -500     0   264   274   258   275

Table 4.7: iteration counts for the block Gauss-Seidel scheme, for two nonzero convective terms.

The results can be interpreted in two respects: the performance of natural block orderings vs. multicolor block orderings, and the relation to the direction of the velocity vectors. In [39], Elman & Chernesky studied the one-dimensional convection-diffusion equation in conjunction with the Gauss-Seidel solver, and concluded that the solver is most effective when the sweeps follow the flow; when the sweeps are opposite to the flow, red/black orderings perform better than natural orderings, but not as well as natural orderings in the case where the sweeps follow the flow. In [42], Elman & Golub conducted numerical experiments for the two-dimensional convection-diffusion problem, using one-line and two-line orderings in conjunction with block stationary methods as well as ILU-preconditioned GMRES, and for stationary methods concluded that red/black orderings have essentially the same iteration counts on average as natural orderings, but are less sensitive to the direction of the flow.
For the three-dimensional problem with the orderings examined here, we observe that in general multicolor and natural orderings have essentially the same iteration counts for moderate convection (PDE coefficients of magnitude 100), but on the other hand, the performance of natural orderings is better when convection dominates. (The differences are bigger when all convective terms are nonzero.) Multicolor orderings are less sensitive to a change of sign of the PDE coefficients. Loosely speaking, if good performance is related to following the direction of the flow, then since the 1D blocks are x-z oriented and the ordering of the blocks in the y direction is done from bottom to top, it is expected that when the signs of τ or μ are positive, the scheme would converge faster (the sign of σ plays a smaller role, as it is associated with the variable in the direction of which the preconditioning is done). As can be observed in Table 4.6, the natural orderings are slightly more sensitive to the change of sign in τ (orientation of the 2D sets of gridpoints), and least sensitive to changes in σ (the direction associated with the preconditioning). Since there is alternating direction in the ordering of the gridpoints in each of the 1D sets of gridpoints, the sensitivity to sign is in general not particularly high. The fact that the Gauss-Seidel solver with multicolor orderings converges at a reasonable rate is meaningful: these ordering strategies are parallelizable (see Sec. 5.5).

Chapter 5. Solvers, Preconditioners, Implementation and Performance

In this chapter solving techniques, implementation and performance are addressed. For solving the reduced system, block stationary methods were considered in Chap. 4; in this chapter we consider preconditioned Krylov subspace solvers. This is followed by a discussion of implementation.
We then present numerical results which demonstrate the effectiveness of solvers of the cyclically reduced system.

5.1 Krylov Subspace Solvers and Preconditioners - Overview

Krylov subspace solvers are a family of projection methods, which in the context of linear systems can be described as methods that compute an approximation to the solution x = A^(-1) b in a particular subspace of R^n. Requiring that the approximate solution belong to a specific subspace introduces a series of constraints. These could be, for example, orthogonality conditions on the residual. A general introduction to projection methods can be found in [95, Chap. 5].

Krylov subspace methods have not been used as long as stationary methods. Golub and van der Vorst [58] remark that the first Krylov subspace methods were developed roughly at the same time the SOR method was developed, but the latter was more popular, due to its modest storage requirements. The available convergence analysis of these solvers is less comprehensive than the convergence analysis of stationary methods, particularly for nonsymmetric systems like ours. Nevertheless, their performance is generally very good, particularly when the linear system is effectively preconditioned first. Much research work has been done in this area in the last ten to fifteen years; many new variants and convergence results have been derived, and numerical difficulties (in particular, situations of breakdown) have been addressed. There are several excellent references for these methods. Among them we mention the books of Saad [95] and Greenbaum [61], and the survey papers of Freund, Golub & Nachtigal [49] and Golub & van der Vorst [58]. As before, consider the linear system Ax = b.
Let r^(k) = b - Ax^(k) denote the residual at the kth iterate. The k-dimensional Krylov subspace associated with A and r^(0) is defined as:

    K_k(A, r^(0)) = span{ r^(0), A r^(0), ..., A^(k-1) r^(0) } .      (5.1)

The basic principle is: at the kth iterate, seek an approximate solution x^(k) in the shifted Krylov subspace x^(0) + K_k(A, r^(0)). The criteria for computing x^(k) fall into three main categories [58]:

1. The Ritz-Galerkin approach: compute x^(k) such that r^(k) is orthogonal to K_k(A, r^(0)). For general nonsymmetric systems this is known as FOM (full orthogonalization method) [96]. When the matrix is symmetric positive definite we get the well known conjugate gradient method [69].

2. The minimum residual approach: compute x^(k) such that ||r^(k)||_2 is minimal over K_k(A, r^(0)). In this category MINRES [85] and GMRES [96] are well known methods.

3. The Petrov-Galerkin approach: compute x^(k) so that r^(k) is orthogonal to some other k-dimensional subspace. BiCG [90] and QMR [50] are methods based on this approach.

Highly effective methods that are hybrids of the above three approaches are CGS [100], Bi-CGSTAB [110], TFQMR [48], FGMRES [93], and others; see [58] for descriptions.

The first concern when computing optimal solutions in the Krylov subspace is the basis that should be used. Using the obvious basis {A^j r^(0)} would cause numerical difficulties in finite precision arithmetic, as the vectors A^j r^(0), even for relatively small j, might be dominated (as far as their direction goes) by the eigenvector corresponding to the largest eigenvalue; instead, an orthonormal basis is formed. A widely used technique for generating this basis is Arnoldi's
method [5], which was originally introduced as a technique for reducing a dense matrix into upper Hessenberg form (useful for computing the eigenvalues). At each step, the jth vector v_j of the basis for K_k is multiplied by the matrix A, and orthogonalized against all previous vectors in the basis, namely v_1, ..., v_(j-1). The starting vector is

    v_1 = r^(0) / ||r^(0)||_2 .                                  (5.2)

The upper Hessenberg matrix is generated throughout the process. Originally, the technique for orthogonalizing the vectors in [5] was the standard Gram-Schmidt algorithm. Later versions of Arnoldi's method use modified Gram-Schmidt, which is mathematically equivalent to Gram-Schmidt but is more stable numerically, and Householder transformations (see [55] for details on the computational work and accuracy of these methods). When the matrix A is nonsymmetric, the process of orthogonalization becomes increasingly expensive as the dimension of the subspace increases. Denoting the (k+1) x k upper Hessenberg matrix generated in Arnoldi's algorithm by H_{k+1,k}, we have [95]:

    V_k^T A V_k = H_{k,k} ,                                      (5.3)

where V_k is the n x k matrix whose columns are the vectors of the orthonormal basis, and H_{k,k} is the matrix obtained by taking the first k rows of H_{k+1,k}. From here, it depends which approach is taken. The Ritz-Galerkin approach is equivalent to requiring V_k^T r^(k) = 0, and it can be shown [95] that x^(k) = x^(0) + V_k y^(k), where y^(k) = [H_{k,k}]^(-1) ||r^(0)||_2 e_1, and e_1 denotes the first vector in the standard basis for R^k. If A is symmetric positive definite, the Hessenberg matrix is reduced to tridiagonal form, and the above-described procedure is (essentially) the conjugate gradient method [69] (see also [56]). If the above approach is modified so that instead of orthogonality one requires that ||r^(k)||_2 is minimal over the shifted Krylov space, then the result is the GMRES method of Saad & Schultz [96].
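The relations (5.2)-(5.3) and the FOM update can be checked on a small random system. This is a sketch (the function name `arnoldi` and the test matrix are ours), using modified Gram-Schmidt:

```python
import numpy as np

def arnoldi(A, r0, k):
    """Arnoldi's method with modified Gram-Schmidt (a sketch)."""
    n = len(r0)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)            # Eq. (5.2)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):                   # orthogonalize against v_1 .. v_j
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]            # breakdown (w = 0) not handled
    return V, H

rng = np.random.default_rng(0)
n, k = 9, 4
A = rng.standard_normal((n, n)) + 5 * np.eye(n)
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

V, H = arnoldi(A, r0, k)
Vk, Hk = V[:, :k], H[:k, :]
assert np.allclose(Vk.T @ A @ Vk, Hk, atol=1e-10)          # relation (5.3)

# FOM iterate: x_k = x0 + V_k y with y = [H_{k,k}]^{-1} ||r0||_2 e_1;
# its residual is orthogonal to the subspace (the Ritz-Galerkin condition).
y = np.linalg.solve(Hk, np.linalg.norm(r0) * np.eye(k)[:, 0])
xk = x0 + Vk @ y
assert np.allclose(Vk.T @ (b - A @ xk), 0, atol=1e-10)
```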
This was a most important contribution, which actually marked the beginning of a wave of enormous popularity of Krylov subspace methods. Practical implementation of the algorithm is discussed in detail by Saad in [95, Sec. 6.5.3]. For a theoretical discussion of the differences and the similarities between FOM and GMRES, see [95, pp. 165-168].

The Petrov-Galerkin approach is based on the attempt to obtain the attractive property of three-term recurrence relations among the basis vectors (which symmetric matrices have) "by brute force". In general, an orthogonal basis based on three-term recurrence relations cannot be constructed for nonsymmetric matrices. Nevertheless, it can be obtained if the requirement of orthogonality is dropped. One can then add constraints of orthogonality with respect to some other basis. This means that we are looking for a matrix W_i, such that each new basis vector v_i of the Krylov subspace is orthogonal to the first i-1 column vectors of W_i (w_1, ..., w_(i-1)), and we also require v_i^T w_i ≠ 0. The construction is thus done by generating bi-orthogonal basis sets. The BiCG [90] and the QMR [50] methods are based on these ideas (with features that are designed to avoid certain situations of breakdown). For example, in the BiCG method there are two sequences of residuals, one for each of the (bi-orthogonal) sets, denoted by r^(j) and r̃^(j), which are polynomials in A and A^T, respectively. As mentioned earlier, hybrids between the various approaches have also been developed.
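A minimal sketch of the bi-orthogonal construction described above (the unsymmetric Lanczos process underlying BiCG and QMR): breakdown checks are omitted, and the scaling convention below is one of several in common use.

```python
import numpy as np

def lanczos_biorth(A, v1, w1, m):
    """Two-sided Lanczos: builds V, W with W^T V = I (sketch, no breakdown handling)."""
    n = len(v1)
    V = np.zeros((n, m))
    W = np.zeros((n, m))
    v1 = v1 / np.linalg.norm(v1)
    w1 = w1 / (w1 @ v1)                      # enforce (v_1, w_1) = 1
    V[:, 0], W[:, 0] = v1, w1
    beta = delta = 0.0
    v_prev = w_prev = np.zeros(n)
    for j in range(m - 1):
        alpha = (A @ V[:, j]) @ W[:, j]
        v_hat = A @ V[:, j] - alpha * V[:, j] - beta * v_prev
        w_hat = A.T @ W[:, j] - alpha * W[:, j] - delta * w_prev
        s = v_hat @ w_hat
        delta = np.sqrt(abs(s))              # serious breakdown if s == 0 (not handled)
        beta = s / delta
        v_prev, w_prev = V[:, j], W[:, j]
        V[:, j + 1] = v_hat / delta
        W[:, j + 1] = w_hat / beta
    return V, W

rng = np.random.default_rng(1)
n, m = 12, 5
A = rng.standard_normal((n, n))
V, W = lanczos_biorth(A, rng.standard_normal(n), rng.standard_normal(n), m)
# The two bases are bi-orthogonal: W^T V = I (up to roundoff).
assert np.allclose(W.T @ V, np.eye(m), atol=1e-8)
```

Note that, unlike Arnoldi's method, each step only touches the two previous basis vectors, which is precisely the three-term-recurrence property the Petrov-Galerkin approach recovers.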
One of the most popular methods here is Bi-CGSTAB, of van der Vorst [110]: it is a variant of the BiCG method, derived by a different choice of the polynomial associated with r̃^(j); the choice of the polynomial coefficients is done so that the residual vector r^(j) is minimized. In this sense Bi-CGSTAB is a hybrid of BiCG and GMRES.

Krylov subspace methods perform significantly better when the matrix is well conditioned. An illustration of that can be given by referring to the following convergence result for the conjugate gradient method [61]:

    ||x - x^(k)||_A <= 2 ( (sqrt(κ) - 1) / (sqrt(κ) + 1) )^k ||x - x^(0)||_A ,      (5.4)

where κ = κ(A) is the condition number of A, and ||.||_A is the energy norm, defined by ||y||_A = sqrt(y^T A y). As is evident, convergence is fast when A is well conditioned; when it is not, it is worthwhile to find a preconditioning matrix, say M, which has the property that M^(-1) A is better conditioned than A, and then replace the original system Ax = b by

    M^(-1) A x = M^(-1) b .                                      (5.5)

Effective preconditioning is a crucial issue. Saad [95, p. 245] states that "In general, the reliability of iterative techniques, when dealing with various applications, depends much more on the quality of the preconditioner than on the particular Krylov subspace accelerators used". In fact, we have already discussed preconditioned systems in Chap. 4; indeed, for fixed point schemes based on the splitting A = M - N we have x^(k+1) = g(x^(k)), where g : R^n -> R^n, and

    g(x) = M^(-1) N x + M^(-1) b .                               (5.6)

Convergence is to the fixed point of g:

    x = g(x) ,                                                   (5.7)

and thus by substituting (5.7) and N = M - A in (5.6) we see that we are actually solving the linear system (5.5). Two obvious ways to perform preconditioning are [95]:

1. Left preconditioning, which means solving (5.5).

2. Right preconditioning, which means solving A M^(-1) y = b, y = Mx.
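The equivalence between the splitting iteration (5.6)-(5.7) and the preconditioned system (5.5) is easy to illustrate with the Jacobi splitting, where M is the diagonal of A (a sketch on a small, strongly diagonally dominant test matrix of our own construction):

```python
import numpy as np

# Jacobi splitting A = M - N with M = diag(A); the fixed point of
# g(x) = M^{-1} N x + M^{-1} b solves M^{-1} A x = M^{-1} b, i.e. A x = b.
rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)) + 20 * np.eye(n)   # strongly diagonally dominant
b = rng.standard_normal(n)

M = np.diag(np.diag(A))
N = M - A
Minv = np.linalg.inv(M)

x = np.zeros(n)
for _ in range(200):
    x = Minv @ N @ x + Minv @ b                    # Eq. (5.6)

# The fixed point of g solves the original (equivalently, the
# left-preconditioned) linear system:
assert np.allclose(A @ x, b, atol=1e-10)
```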
The difference between these two approaches is discussed in [95, Sec. 9.3.4]. A combination of left and right preconditioning is also widely used (split preconditioning). Among popular preconditioning techniques we mention preconditionings based on incomplete factorizations, preconditionings based on approximate inverses, and block preconditioners. See [95] for an overview of several of these techniques.

One of the most popular techniques for preconditioning a linear system is the class of incomplete factorizations. These methods are based on computing a factorization LU which is a fairly good approximation to the matrix, while at the same time the factors are sparse, so that the preconditioner solve is inexpensive. Here the simplest forms are (point or block) Jacobi and SSOR. These factorizations are based on the coefficient matrix itself: Jacobi corresponds to diagonal preconditioning, and SSOR corresponds to taking L = I + ωED^(-1) and U = D + ωF, where D, E and F are the diagonal, lower triangular and upper triangular parts of the matrix, respectively. If we set ω = 1 we get the SGS scheme. These preconditioning techniques are set up at minimal cost (SSOR) or no cost at all (Jacobi). No additional storage is required for either technique. On the other hand, these techniques (and in particular the Jacobi preconditioner) are not always effective (see numerical examples in [87]). Incomplete LU (ILU) factorizations, which are based on computing the LU decomposition, but without allowing much fill-in to occur, form a popular class of methods.
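For the SSOR factors quoted above, a quick check that ω = 1 reproduces the symmetric Gauss-Seidel preconditioner M = (D + E) D^(-1) (D + F). This is a sketch using dense matrices for clarity; sign conventions for E and F vary between texts, and the identity below holds with E and F taken as the strict triangular parts as stored.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n)) + 6 * np.eye(n)

D = np.diag(np.diag(A))     # diagonal part
E = np.tril(A, -1)          # strictly lower part
F = np.triu(A, 1)           # strictly upper part

omega = 1.0
L = np.eye(n) + omega * E @ np.linalg.inv(D)
U = D + omega * F

# At omega = 1, L U = (D + E) D^{-1} (D + F): the SGS preconditioner.
M = (D + E) @ np.linalg.inv(D) @ (D + F)
assert np.allclose(L @ U, M)

# The preconditioner solve M^{-1} b uses the two triangular factors:
b = rng.standard_normal(n)
z = np.linalg.solve(U, np.linalg.solve(L, b))
assert np.allclose(M @ z, b)
```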
In 1977, Meijerink and van der Vorst [78] showed that when the matrix for which the factorization is performed is a symmetric M-matrix, there exists an ILU factorization, and it is highly effective when applied as a preconditioner for the conjugate gradient method. This paper was very important in understanding the merits of incomplete LU factorizations and in directing the interest of researchers to these techniques. A well known member of the class of incomplete LU factorizations is ILU(0): here L and U are constructed so that the matrix Ã = LU satisfies ã_ij = a_ij whenever a_ij ≠ 0. ILU(0) is easy to implement. Generally, the cost of setting it up is modest; [87] gives an estimate in terms of m, the number of nonzeros in the matrix, and n, the number of unknowns.

ILU(0) can behave poorly, for several reasons [14]. One possible reason: the pivots can be very small. In nonsymmetric matrices arising from discretization of PDEs, diagonal dominance is not at all guaranteed. In fact, even when the coefficient matrix is very well behaved, for example symmetric positive definite, the pivots of the incomplete factorization are not guaranteed to be positive [11], or far from zero in magnitude. Another source of trouble could be that the triangular solves themselves are severely ill-conditioned. Elman [38] studied this aspect in detail. Ways of detecting the ill-conditioning are suggested in [24]. The conditioning of the factors for certain two-dimensional convection-diffusion equations has been studied experimentally in [14]. One way to improve the stability is by allowing more nonzeros in the factors L and U. For example, denote the factors of the ILU(0) factorization by L_0 and U_0.
Then, the ILU(1) factorization is obtained by allowing L and U to have the nonzero structure of L_0 U_0. The ILU(1) factorization is typically more accurate than ILU(0). On the other hand, the number of nonzeros can be substantially higher, so that the process is more costly (a system associated with the preconditioning matrix has to be solved once or twice at every iteration). The same principle leads to ILU(2), ILU(3), and so on. The number of nonzeros in the factors is important, because it can serve as a good indication of:

1. The storage required.

2. The set-up time for the preconditioner.

3. The time a preconditioner solve requires.

Obviously, the number of nonzeros in the factors of the ILU(0) factorization is equal to the number of nonzeros in the sum of the triangular parts of the original matrix. In order to find the nonzeros of the matrices associated with the more accurate ILU(1) factorization, there is no need to actually perform any matrix product [95]. Instead, one can determine the fill-in by examining the computational molecules associated with the triangular parts L and U. ILU(p) factorizations exist if the matrix is an M-matrix [78], and in this case their associated splitting is a regular splitting [95, Thm. 10.2].

ILU(p) factorizations do not take into account the magnitude of the matrix elements. One might instead consider using a threshold: look at the magnitude of each element during the factorization, and drop it if it is below a certain (prespecified) threshold. Such a technique is ILUT [94], which first applies a rule based on a threshold, and after the factorization is completed keeps only a certain number of elements (the largest) in each row. When the drop tolerance
is sufficiently small these factorizations are very effective (see, for example, Pommerell [87]); however, the number of nonzeros in the factors could be substantially higher than that of the coefficient matrix.

All the above-mentioned incomplete factorizations have block versions. The idea of these factorizations is to treat the matrix as consisting of blocks (submatrices) rather than elements. The existence of incomplete block factorizations is guaranteed if the matrix is an M-matrix (Axelsson [8]). An important paper which contains comprehensive testing of various block preconditioners for the symmetric positive definite case is by Concus, Golub & Meurant [26]. When using block factorizations, inverting the pivotal blocks is costly and might cause fill-in, as the inverse of a sparse matrix is not necessarily sparse (see Meurant [79] for a review of the topic of inverses of symmetric tridiagonal and block tridiagonal matrices). There is fill-in also in the off-diagonal blocks, due to repetitive computation of Schur complements of the blocks. In order to overcome the problem of extra work and storage, approximate inverses which preserve sparsity are considered. Possible strategies are suggested, for example, in [9], [34], [114]. If the matrix is treated as block tridiagonal then only the diagonal block elements are modified and fill-in can be avoided. Modifying only the diagonal elements can be done also for general matrices (not necessarily tridiagonal), and is known as the D-ILU factorization (Pommerell [87]). In general, block incomplete LU factorizations are less effective for three-dimensional problems than for two-dimensional problems [11].
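The remark above that the inverse of a sparse matrix is not necessarily sparse is easy to see for a tridiagonal example (a sketch; the closed form below is the standard Green's-function formula for the 1D Laplacian):

```python
import numpy as np

# The 1D Laplacian is tridiagonal, but its inverse is completely dense:
# (A^{-1})_{ij} = min(i,j) * (n+1-max(i,j)) / (n+1) > 0 for all i, j.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Ainv = np.linalg.inv(A)

assert np.count_nonzero(A) == 3 * n - 2            # tridiagonal: 3n - 2 nonzeros
assert np.all(np.abs(Ainv) > 1e-12)                # every entry of the inverse is nonzero

# Check against the closed form (1-based indices):
i, j = np.indices((n, n)) + 1
exact = np.minimum(i, j) * (n + 1 - np.maximum(i, j)) / (n + 1)
assert np.allclose(Ainv, exact)
```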
Magolu & Polman [76] discuss the effectiveness of block ILU factorizations for three-dimensional problems, and show that line partitionings can in some cases be superior to point incomplete factorizations, but not always; plane partitionings are less effective.

5.2 Incomplete Factorizations for the Reduced System

For our reduced system, recall that the computational molecule consists of 19 points. The ith row of LU is obtained by multiplying each of the ten components of the computational molecule which are associated with the matrix L by the corresponding rows in U. Suppose the row of the matrix for which fill-in is examined is associated with a gridpoint whose coordinates are (i, j, k), and numbering of the unknowns is done in x-y plane oriented natural lexicographic fashion. In Fig. 5.1 we present a two-dimensional slice of the stencil associated with the factor U; we pick the plane in which the associated gridpoint for which discretization was done lies. The stencil in the figure is also the fill-in pattern corresponding to the planes which contain the gridpoints (i, j, k±2).

Figure 5.1: a two-dimensional slice of the stencil associated with U (the value on the diagonal is circled)

In Fig. 5.2 we demonstrate the fill-in (a 2D slice) that occurs by combining the stencils in Fig. 5.1 associated with all the components in the computational molecule. In Fig. 5.3 the stencils associated with the neighboring planes are presented.

Figure 5.2: fill-in in the construction of ILU(1) in the plane that contains the gridpoint for which the discretization was done (dashed circle). Circled are the points that belong to the original computational molecule.

Having determined the nonzero patterns in all planes which belong to the lower triangular
matrix L, where fill-in is experienced, the above can be summarized as follows:

Figure 5.3: fill-in in the construction of ILU(1) in the plane adjacent to the gridpoint for which the discretization was done (circled are the points that belong to the original computational molecule)

Proposition 5.1. Denote by L_1 and U_1 the matrices associated with the ILU(1) factorization of the reduced matrix with lexicographic ordering. Then the number of nonzeros of L_1 + U_1 satisfies

    nz(L_1 + U_1) ≈ (45/19) nz(S) ,                              (5.9)

where nz(S) stands for the number of nonzeros in the reduced matrix S.

Proof. Figures 5.1-5.3 clarify how many nonzeros are generated in each 2D slice when going by planes in the z-direction. (There are other 2D slices that need to be considered, but these do not generate any nonzeros that have not already been presented in the three figures; the details are omitted.) Thus in each row associated with an interior gridpoint there are 13 + 11·2 + 5·2 = 45 nonzeros. The two factors of 2 here come from the fact that there are two neighboring planes, and two separate planes contain the points (i, j, k±2). For points next to the boundary the fill-in level is smaller, and thus the number 45/19 is only an estimate.

In order to check the estimate in Prop. 5.1, the ILU(1) factorizations of reduced matrices corresponding to 8x8x8, 16x16x16 and 24x24x24 grids have been generated; the actual numbers of nonzeros and the estimates are presented in Table 5.1.
    grid             nz(S)    nz(L_1 + U_1)   estimate
    8 x 8 x 8        3,760         7,522         8,905
    16 x 16 x 16    34,400        75,218        81,474
    24 x 24 x 24   121,104       272,194       286,825

Table 5.1: number of nonzero elements in the ILU(1) factorization: actual numbers and estimates.

The same can be done for any other ordering, and for higher allowed levels of fill. In Fig. 5.4 the sparsity patterns of the factors corresponding to the ILU(1) and the ILU(2) factorizations for the two-plane matrix are presented.

Figure 5.4: sparsity patterns of the factors of ILU(1) and ILU(2), associated with the two-plane matrix. (a) ILU(1), nz = 7,510; (b) ILU(2), nz = 11,380.

Multicolor block versions of the orderings, discussed in Chap. 3, give rise to a larger amount of fill-in in the incomplete factorization. For the four-color two-plane ordering the matrix has 8,700 nonzeros, compared to 7,510 for the natural two-plane ordering. As a result, more work per iteration is required. The following result is due to Beauwens [13]:

Suppose A is a nonsingular M-matrix, A = D_1 - C_1 = D_2 - C_2 are splittings of A, and let D_1 = L_1 U_1 and D_2 = L_2 U_2 be incomplete factorizations of A such that the set of nonzeros of L_1 + U_1 is contained in the set of nonzeros of L_2 + U_2. Then

    ρ(D_2^(-1) C_2) <= ρ(D_1^(-1) C_1) .                         (5.10)

This result was used by Elman & Golub in [42, Thm. 2], and can also be used directly for our reduced system:

Proposition 5.2. If the reduced matrix is an M-matrix, the spectral radius of the iteration matrix associated with ILU(p) factorizations is smaller than or equal to the spectral radius of the iteration matrix associated with block Jacobi with the 2D splitting.

Prop. 5.2 applies, for example, to the constant coefficient case where bc, de, fg > 0.
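The "estimate" column of Table 5.1 is just the nz(S) column scaled by the 45/19 factor of Prop. 5.1, which is quick to confirm:

```python
# Reproduce the "estimate" column of Table 5.1 from Prop. 5.1:
# interior rows of L1 + U1 carry 13 + 11*2 + 5*2 = 45 nonzeros, versus
# 19 per interior row of the reduced matrix S.
assert 13 + 11 * 2 + 5 * 2 == 45

table_5_1 = [(3760, 8905), (34400, 81474), (121104, 286825)]  # (nz(S), estimate)
for nz_S, estimate in table_5_1:
    assert round(nz_S * 45 / 19) == estimate
```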
For the model problem we can also obtain an existence result for the ILUT factorization. To show this, we need the following definition [95, p. 290]:

Definition 5.1. An N x N matrix H is an M̄-matrix if its last entry on the diagonal is nonnegative, the rest of the diagonal values are all positive, the off-diagonal values are nonpositive, and

    sum_{j=i+1}^{N} h_ij < 0 ,   1 <= i < N .                    (5.11)

Proposition 5.3. Consider the model problem with bc, de, fg > 0, and suppose that either upwind or centered differences are used. Then the ILUT factorization exists for the reduced matrix (for all the orderings that have been considered in Chap. 3).

Proof. The diagonal values of the matrix are a^2 - 2bc - 2de - 2fg in rows associated with interior points, and larger in rows associated with non-interior points. Since the matrix is an M-matrix in the cases specified in the proposition, all its off-diagonal elements are nonpositive. The matrix has at least one negative value to the right of the main diagonal in each row except the last. Thus Eq. (5.11) holds and the matrix is an M̄-matrix. In addition, since the matrix is diagonally dominant, and moreover, since the diagonal entries are all positive, it is guaranteed that all its row sums are nonnegative. This special property is termed in [95] a diagonally dominant M̄-matrix. The matrix thus satisfies all the conditions required for existence of the ILUT factorization, by [95, Thm. 10.13].

In Fig. 5.5 the nonzero pattern of the factors corresponding to an ILU factorization with drop tolerance 10^(-2) is presented.
Here the factorization was obtained by using Matlab's command 'luinc', where the dropping rule is: for each entry, the threshold is the drop tolerance specified by the user, multiplied by the norm of the column containing the entry (with the exception that diagonal values of the matrix U are never dropped). The matrix which is shown in the figure comes from centered difference discretization of the equation

    -Δu + 10 (x, y, z)^T ∇u = f .

Note that for ILU factorizations based on thresholds, there is flexibility (which does not exist in ILU(p) factorizations) which allows more fill-in for more ill-conditioned problems.

Figure 5.5: sparsity patterns of the factors for ILU with drop tolerance 10^(-2) (nz = 9,146).

Block factorizations based on 1D partitioning require the following amount of computational work. Each block element is a 2n x 2n matrix; since the reduced matrix is of order n^3/2, the number of block rows is n^2/4, and the semibandwidth of the matrix is n + 1. The number of nonzero blocks in the lower part, as well as in the upper part, of this matrix is n^2 + O(n). Typically there are 9 nonzero block elements in each block row. In total there are approximately 5n^2 operations involving products of 2n x 2n matrices, together with an inversion or approximate inversion of each of the n^2/4 diagonal 2n x 2n blocks. Taking the exact inverses of the diagonal blocks, and considering the fill-in in off-diagonal block elements, the total cost is O(n^5) operations, which is not satisfactory. Variants which restrict the fill-in in the off-diagonal blocks are possible but have not been tested. Some tests with incomplete LU factorizations based on 2D partitioning with approximate inverses of the pivotal blocks have been performed and have not been found particularly effective.
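Threshold-based ILU of the kind used in the 'luinc' experiment above is also available in SciPy as `spilu`. The sketch below mirrors the experiment in spirit only: SciPy's dropping rule differs in detail from Matlab's, and the matrix here is a 2D five-point model problem rather than the reduced matrix.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2D 5-point Laplacian on an m x m grid (a stand-in model problem):
m = 10
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsc()

# Larger drop tolerance -> sparser (cheaper, less accurate) factors:
loose = spla.spilu(A, drop_tol=1e-1, fill_factor=20)
tight = spla.spilu(A, drop_tol=1e-4, fill_factor=20)
assert loose.nnz <= tight.nnz

# The factorization object applies the preconditioner solve z = M^{-1} b:
b = np.ones(A.shape[0])
x_approx = tight.solve(b)
assert np.linalg.norm(A @ x_approx - b) < 0.1 * np.linalg.norm(b)
```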
5.3 The Overall Cost of Construction of the Reduced System

One important issue in the implementation of one step of cyclic reduction is the cost of performing it. It will now be shown that the construction of the reduced system is an inexpensive component compared to the work involved in computing the solution; nevertheless, it is not negligible.

Consider first the constant coefficient case. Here the reduced matrix can be constructed in a straightforward manner, and there is no need to actually construct the unreduced matrix first and perform the Gaussian elimination. By the difference equation (2.7) it is evident that, since the computational molecule is gridpoint-independent, there are merely 2-3 floating point operations for each of the 19 components of the computational molecule (and fewer floating point operations in total for points next to the boundary). The overall computational work is thus negligible; in fact most of the construction time is spent on assigning the values of the computational molecule to the matrix. Since the reduced matrix has a block structure similar to the block structure of the unreduced matrix, the overall construction work is almost equal to the work involved in constructing the latter.

The variable coefficient case requires more careful attention. Here each entry in the matrix is computed separately, as the values of the computational molecule are gridpoint-dependent. For both the reduced and the unreduced systems, the components of the computational molecule need to be set up throughout the grid. First, the PDE coefficients for every gridpoint need to be computed. There are six trivariate functions, each of which needs to be evaluated at $n^3$ gridpoints (or approximately this number).
The cost of this step depends on the cost of evaluating the PDE coefficients and could be high. In addition, there is a need to construct the components of the computational molecule. By Eqs. (2.13) and (2.14), this takes approximately $125n^3$ operations for centered differences, and $105n^3$ operations for upwind differences. As for the step of cyclic reduction, counting operations in Eq. (2.15), there is a total of about 125 floating point operations for computing the entries of the matrix in a row that corresponds to an interior gridpoint. However, certain values in the difference equation appear several times; each of the values associated with the six neighbors $(i \pm 1, j, k)$, $(i, j \pm 1, k)$, $(i, j, k \pm 1)$ and with $(i,j,k)$ itself appears six times in the difference equation, and careful implementation reduces the number of floating point operations by 20%-25%. Even if the construction is done without such optimization (which is the case if block Gaussian elimination is applied directly), since the matrix is of size $n^3/2$, the overall amount of floating point operations for constructing the reduced matrix is approximately $60n^3$. The construction time of the right-hand-side vector is approximately $3n^3$. In Table 5.2 we present the actual amount of work that was required for the overhead of constructing the reduced matrix for a few grids, using Gaussian elimination.

n     n^3      Mflops   flops/n^3
8     512      0.025    48.8
12    1,728    0.090    52.1
16    4,096    0.222    54.2
20    8,000    0.442    55.3
24    13,824   0.775    56.1
28    21,952   1.243    56.6
32    32,768   1.869    57.0

Table 5.2: computational work involved in the construction of the reduced matrix.
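As a concrete illustration of gridpoint-independent assembly, the following SciPy sketch builds the unreduced 7-point constant-coefficient operator via Kronecker products; the 19-point reduced molecule of Eq. (2.7) would be assigned in the same fashion. The function name and layout are hypothetical (the thesis code is Matlab), and the molecule shown is the standard one, not the reduced one.

```python
import scipy.sparse as sp

def unreduced_operator(n, p1=0.0, p2=0.0, p3=0.0):
    """7-point centered-difference matrix (scaled by h^2) for
    -Lap(u) + p . grad(u) with constant coefficients on an n^3 interior
    grid of the unit cube; hypothetical helper, illustrative only."""
    h = 1.0 / (n + 1)
    def t(p):
        # 1D molecule: -(1 + p*h/2) u_{i-1} + 2 u_i - (1 - p*h/2) u_{i+1}
        return sp.diags([-(1 + p * h / 2), 2.0, -(1 - p * h / 2)],
                        [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(sp.kron(I, I), t(p1))
            + sp.kron(sp.kron(I, t(p2)), I)
            + sp.kron(sp.kron(t(p3), I), I)).tocsr()

A = unreduced_operator(8, 50.0, 20.0, 10.0)
```

Each of the three Kronecker terms contributes one grid direction, so the diagonal is exactly 6 and every interior row has 7 nonzeros.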
The reduced system is constructed once and for all, and the construction is clearly an inexpensive component compared to setting up a preconditioner (approximately $200n^3$ floating point operations for setting up the ILU(0) factorization) or solving the system (at least $80n^3$ operations per iteration for Krylov subspace solvers like Bi-CGSTAB or CGS). Finally, it is noted that the construction can be almost fully parallelized or vectorized, as illustrated in Sec. 5.5.

5.4 Numerical Experiments

This section provides details on numerical experiments. Results of some numerical experiments for stationary methods have been given in Chap. 4. The computations have been performed on an SGI Origin 2000 machine, with four 195 MHz processors, 512 MB RAM and 4 MB cache. There is no parallel component in the implementation; the actual computations are performed by a single processor on the machine. The computer programs are written in Matlab 5.

Test Problem 1

Consider the separable problem

    $-\Delta u + p_1 x u_x + p_2 y u_y + p_3 z u_z = w(x,y,z)$        (5.12)

on $\Omega = (0,1) \times (0,1) \times (0,1)$, with zero Dirichlet boundary conditions, where $w(x,y,z)$ is constructed so that the solution is

    $u(x, y, z) = x y z (1 - x)(1 - y)(1 - z) \exp(x + y + z) .$        (5.13)

First some convergence analysis for block stationary methods is provided. For notational convenience, denote $\gamma = \frac{p_1 h}{2}$, $\delta = \frac{p_2 h}{2}$ and $\mu = \frac{p_3 h}{2}$. Suppose $h$ is sufficiently small and centered difference discretization is performed. Below we refer to the components of the computational molecule, as explained in Chap. 4. Since the problem is separable, each of the components of the computational molecule depends only on one variable, and thus can be written with a single subscript.
The components in the $x$-direction satisfy

    $c_{i+1} d_i = (1 + \gamma x_{i+1})(1 - \gamma x_i) = 1 + \gamma h - \gamma^2 x_i (x_i + h) ,$

which leads to

    $1 + \gamma h - \gamma^2 (1 - h) \le c_{i+1} d_i \le 1 + \gamma h - 2\gamma^2 h^2 .$

If $-1 < \gamma < \frac{1}{1-h}$ then $c_{i+1} d_i > 0$. For $b_{j+1} e_j$ and $f_{k+1} g_k$ the bounds are obtained in an identical manner. The center of the computational molecule is exactly $a = 6$. In terms of the PDE coefficients, the condition on $\gamma$ means that the convergence analysis performed in Chap. 4 for centered difference discretization of the convective terms is applicable if the PDE coefficients are $O(n)$. In this case the matrix is symmetrizable by a real diagonal matrix. Using the notation of Chap. 4, let $S$ be the symmetrized matrix, let $\beta_x = 1 + \gamma h - 2\gamma^2 h^2$, $\beta_y = 1 + \delta h - 2\delta^2 h^2$, $\beta_z = 1 + \mu h - 2\mu^2 h^2$, and let $S^*$ be a modified version of $S$, such that each occurrence of $c_{i+1} d_i$, $b_{j+1} e_j$ or $f_{k+1} g_k$ in $S$ is replaced by the bounds, namely $\beta_x$, $\beta_y$ and $\beta_z$ respectively. Since $S^* \ge S$, $S^*$ is a symmetrized version of a matrix corresponding to the constant coefficient case, and by Cor. 2.5 is a diagonally dominant M-matrix. Using Theorem 4.8, bounds on the convergence rates of block stationary methods are available.

                 1D splitting               2D splitting
n     n^3       rho    bound   ratio      rho    bound   ratio
8     512       0.793  0.894   1.13       0.682  0.826   1.21
12    1,728     0.895  0.946   1.06       0.825  0.908   1.10
16    4,096     0.937  0.968   1.03       0.892  0.944   1.06
20    8,000     0.958  0.979   1.02       0.927  0.962   1.04
24    13,824    0.970  0.985   1.02       0.948  0.973   1.03

Table 5.3: comparison between the computed spectral radii of the block Jacobi iteration matrices and the bounds, using centered differences, for the two splittings, with $p_1 = p_2 = p_3 = 1$.
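The flavor of these spectral-radius bounds can be checked numerically in a 1D analogue: for a constant-coefficient tridiagonal matrix with off-diagonal product $cd > 0$, the point Jacobi iteration matrix has spectral radius $(2\sqrt{cd}/a)\cos(\pi h)$, a classical result. A small NumPy sketch (hypothetical example, not the block splittings of Chap. 4):

```python
import numpy as np

n = 16
h = 1.0 / (n + 1)
gamma = 1.0 * h / 2                     # gamma = p1*h/2 with p1 = 1
c, a, d = 1 - gamma, 2.0, 1 + gamma     # constant-coefficient 1D molecule
A = (a * np.eye(n)
     - c * np.eye(n, k=1)
     - d * np.eye(n, k=-1))
J = np.eye(n) - A / a                   # point Jacobi iteration matrix
rho = max(abs(np.linalg.eigvals(J)))
bound = (2 * np.sqrt(c * d) / a) * np.cos(np.pi * h)
```

For this Toeplitz tridiagonal case the bound is attained exactly, and the radius is below 1 since $cd = 1 - \gamma^2 < 1$.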
As opposed to the constant coefficient case, here the bounds are sensitive to sign, and in general we find that they are tighter if the convective coefficients are negative. An explanation for this is that the values $\beta_x$, $\beta_y$ and $\beta_z$ are larger than 1 if $p_1$, $p_2$ and $p_3$ are positive. On the other hand, if the latter are negative, then we obtain values of the type $1 - c_1 h^2 - c_2 h^4$ with $c_1, c_2 > 0$. These values are closer to the values that would be obtained if the analogous constant coefficient problem were considered, for which the bounds have been shown in Chap. 4 to be very tight. In Table 5.3 the tightness of the bounds is demonstrated; in this case their quality is as good as that of the bounds for the constant coefficient case. In fairness we remark that for larger values of the PDE coefficients the tightness of the bounds deteriorates. An explanation for this is that the values $\beta_x$, $\beta_y$ and $\beta_z$, which are attained for $x = h$, $y = h$ and $z = h$, are very loose bounds for gridpoints whose coordinate values are close to 1.

In Table 5.4 the performance of solvers for the reduced and the unreduced systems for $p_1 = 50$, $p_2 = 20$, $p_3 = 10$ is compared. $\|r^{(i)}\|_2 / \|r^{(0)}\|_2 < 10^{-10}$ was used as a stopping criterion, where $r^{(i)}$ is the residual at the $i$th iterate. The method that is used is Bi-CGSTAB, preconditioned by ILU(0). In this example Bi-CGSTAB was implemented using Netlib's Matlab routine. The rate of increase in iteration count as the grid is refined is in agreement with theory, at least if one makes the assumption that for this well conditioned and mildly nonsymmetric system the convergence behavior is qualitatively the same as that of the conjugate gradient method for symmetric problems of the same type.
The preconditioned matrix in this example has a condition number whose square root is of magnitude $O(h^{-1})$; thus it is expected that refining the mesh by a factor of 2 would result in doubling the iteration count. The factor of time, as well as of the number of floating point operations, should be at least 16 as the mesh is refined by a factor of 2, since the iteration count is doubled and the size of the matrix is larger by a factor of 8.

                 iterations            time (sec.)          Mflops
n     n^3        unred.   reduced      unred.   reduced     unred.   reduced
8     512        11       6            0.26     0.13        0.59     0.34
16    4,096      22       11           4.49     2.07        12.5     7.2
32    32,768     45       23           97.3     47.1        300.2    177.8
64    262,144    99       44           2,499    1,138       8,559    4,594

Table 5.4: comparison of solving work/time of the unreduced and the reduced solvers for various mesh sizes, using preconditioned Bi-CGSTAB, for $p_1 = 50$, $p_2 = 20$, $p_3 = 10$.

The solver of close-to-symmetric systems converges more slowly than the solver of more strongly nonsymmetric systems: a solver applied to $p_1 = p_2 = p_3 = 1$ on a $64 \times 64 \times 64$ grid (262,144 unknowns) converges in 173 iterations, and the CPU time is 4,390 seconds. The convergence is thus slower than the convergence for larger magnitudes of the PDE coefficients and the same grid size - compare to the last row in Table 5.4. Fig. 5.6 presents the norm of the relative residual throughout the iteration for the case which is closer to symmetric.

Figure 5.6: $l_2$-norm of the relative residual for preconditioned Bi-CGSTAB applied to a 262,144 × 262,144 system corresponding to discretization of the test problem with $p_1 = p_2 = p_3 = 1$, on a $64 \times 64 \times 64$ grid.

In Table 5.5 estimates of the condition numbers of the reduced and unreduced matrices for $p_1 = 500$, $p_2 = 200$ and $p_3 = 100$ with upwind difference discretization are presented.
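The $O(h^{-2})$ growth of the condition number underlying this estimate is easy to verify in one dimension: halving the mesh width roughly quadruples $\kappa$, so its square root (and with it the expected iteration count) roughly doubles. A sketch using a 1D Laplacian stand-in, not the preconditioned 3D matrix:

```python
import numpy as np

def cond_1d_laplacian(n):
    # kappa_2 of the n x n tridiagonal Laplacian; the exact value is
    # cot^2(pi*h/2) with h = 1/(n+1), computed numerically here.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.linalg.cond(A)

k_coarse = cond_1d_laplacian(16)   # h = 1/17
k_fine = cond_1d_laplacian(33)     # h = 1/34, half the mesh width
ratio = k_fine / k_coarse          # expect roughly 4
```
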
The estimates were obtained using Matlab's command 'condest'.

Next, we discuss the question of which method should be used. As far as Krylov subspace solvers are concerned, it has been shown in [82] for three methods (CGN, CGS, GMRES) that for each of them there can be found an example where it fails and an example where it is superior to the other methods. For our reduced system we have no knowledge of how to give analytical justification for preferring one method over another, and we have to settle for numerical experiments for particular cases and/or follow "recipe" recommendations (e.g. the "flowchart" given in [11] for picking a solver).

n     kappa_2(U)   kappa_2(R)
8     420.85       195.04
12    1,049.0      489.83
16    1,859.4      866.53
20    2,693.7      1,258.6
24    3,578.4      1,657.1

Table 5.5: comparison between estimates of the condition numbers of the unreduced matrix (denoted by 'U') vs. the reduced matrix ('R'), for $p_1 = 500$, $p_2 = 200$, $p_3 = 100$.

Consider the case $p_1 = p_2 = p_3 = 10$. That is,

    $-\epsilon \Delta u + (x, y, z) \cdot \nabla u = f ,$        (5.14)

with $\epsilon = \frac{1}{10}$. Zero is used as an initial guess, and centered difference discretization is performed on a $16 \times 16 \times 16$ grid. In Table 5.6 we examine the performance of block stationary methods. The convergence criterion was $\|r^{(i)}\|_2 / \|r^{(0)}\|_2 < 10^{-10}$. The values for 'Mflops' have been obtained by using Matlab's command 'flops'. In all cases Matlab's built-in functions have been used.

method             time (sec.)   iterations
Jacobi             16.02         305
Gauss-Seidel       11.56         152
SOR (omega = 1.2)   7.98          97
SOR (omega = 1.4)   4.38          53
SOR (omega = 1.5)   2.81          34
SOR (omega = 1.6)   3.80          46
SOR (omega = 1.8)   8.76         106

Table 5.6: performance of block stationary methods for Test Problem 1.

From Table 5.6 we can conclude that, as expected, Jacobi and Gauss-Seidel are slow to converge.
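The sensitivity of SOR to the relaxation parameter seen in Table 5.6 can be reproduced on a toy problem. The sketch below runs point SOR on a small 1D Laplacian, for which the optimum $\omega^* = 2/(1 + \sin(\pi h))$ is known in closed form; this is an illustrative example only, whereas the thesis uses block versions of these schemes on the reduced 3D system.

```python
import numpy as np

def sor_iterations(A, b, omega, tol=1e-8, maxiter=5000):
    """Point SOR sweeps until the relative residual drops below tol;
    returns the iteration count (hypothetical helper)."""
    x = np.zeros(len(b))
    for it in range(1, maxiter + 1):
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]   # off-diagonal contribution
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return it
    return maxiter

n = 20
h = 1.0 / (n + 1)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
omega_opt = 2.0 / (1.0 + np.sin(np.pi * h))   # classical optimum for this model
it_gs = sor_iterations(A, b, 1.0)             # omega = 1 is Gauss-Seidel
it_opt = sor_iterations(A, b, omega_opt)
```

As in the table, the well-chosen relaxation parameter cuts the iteration count by a large factor relative to Gauss-Seidel.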
As for SOR, the optimal relaxation parameter is approximately $\omega^* = 1.5$, and for this value the performance of the scheme is very good. However, in this case our bound fails to provide an effective approximation to the optimal relaxation parameter. Here we used our bound for the block Jacobi scheme, and the estimate we obtained for $\omega^*$ was 1.851, which is obviously far from the actual optimal relaxation parameter.

Moving on to preconditioned Krylov subspace solvers, the preconditioners that were used were ILU(0), ILU with numerical dropping (ND), block (1D) Jacobi and SSOR.

preconditioner   time (sec.)   Mflops   nz(L+U)/nz(S)
ILU(0)           14.1          0.42     1
ND(0.1)          0.12          0.074    0.07
ND(0.01)         2.04          5.47     1.28
ND(0.001)        5.94          22.29    3.68

Table 5.7: construction time/work of the ILU preconditioners. 'ND' stands for numerical dropping, and in brackets the threshold value is given.

The work involved in setting up block Jacobi and SSOR is negligible. For the ILU factorizations, Table 5.7 presents the construction time and work (in megaflops), as well as the ratio between the number of nonzeros that were generated in the factors and the number of nonzeros of the coefficient matrix. As was mentioned earlier, the latter is useful for estimating the amount of work involved in the preconditioning solve. The results in Table 5.7 are difficult to interpret. In particular, the construction time of the ILU(0) factorization is extremely long, but the flop count is very small.
It seems that this contradiction has to do with Matlab's implementation of the 'luinc' command, and for the purpose of computing the overall work it is reasonable to take the flop count as the more reliable datum - the number of flops is in accordance with the estimate specified in Eq. (5.8).

In Tables 5.8-5.11 we present our results when testing the various preconditioners with various Krylov subspace solvers. In the Krylov subspace methods considered (BiCG, QMR, Bi-CGSTAB, CGS), two matrix-vector products and two preconditioner solves are required in each iteration. For Bi-CGSTAB, convergence can occur after "half an iteration" [11]. We also remark that restarted GMRES was tested as well and was generally slower to converge.

preconditioner   time (sec.)   Mflops   iterations
ILU(0)           3.77          9.8      22
ND(0.1)          6.05          15.28    49
ND(0.01)         3.28          8.77     18
ND(0.001)        3.22          9.23     11
BJ               7.08          18.22    49
SGS              4.55          11.99    27

Table 5.8: performance of QMR for Test Problem 1.

preconditioner   time (sec.)   Mflops   iterations
ILU(0)           3.51          8.9      22
ND(0.1)          5.82          13.99    51
ND(0.01)         3.06          7.99     18
ND(0.001)        3.03          8.55     11
BJ               6.56          16.35    49
SGS              4.26          10.91    27

Table 5.9: performance of BiCG for Test Problem 1.

In general, Bi-CGSTAB and CGS are more efficient than BiCG and QMR. In fact, the latter are slower by a significant factor. We have no simple explanation for this; note that BiCG and QMR involve computing the transpose of the matrix, which in our implementation and for this problem is available, but might not always be available.
CGS and Bi-CGSTAB are on the same level of efficiency for this problem, with marginal differences in performance (except for an unclear difference between their performance when block Jacobi preconditioning is used). Of the preconditioners, ILU(0) seems to be the most effective in this case. We remark that when testing problems in strongly convection-dominated regimes, we did find examples where ILU(0) preconditioned solvers performed poorly.

preconditioner   time (sec.)   Mflops   iterations
ILU(0)           1.8           6.01     12
ND(0.1)          3.53          11.07    30
ND(0.01)         1.51          5.15     9 1/2
ND(0.001)        1.26          4.83     6 1/2
BJ               3.61          11.79    27 1/2
SGS              2.17          7.25     14 1/2

Table 5.10: performance of Bi-CGSTAB for Test Problem 1.

preconditioner   time (sec.)   Mflops   iterations
ILU(0)           1.74          5.30     13
ND(0.1)          3.56          10.33    29
ND(0.01)         1.51          4.91     11
ND(0.001)        1.23          4.66     6
BJ               4.31          13.20    37
SGS              1.90          6.11     15

Table 5.11: performance of CGS for Test Problem 1.

SGS is also competitive, in particular because it involves a small set-up time. Finally, we mention that in cases where the optimal relaxation parameter (or a reasonable approximation to it) is available for the SOR scheme, it performs very well and is highly competitive with preconditioned Krylov subspace solvers. If the bounds presented in Chap. 4 are tight, a good approximation to the optimal relaxation parameter can be obtained. SOR requires only one matrix-vector product per iteration, and an iteration is thus significantly cheaper than an iteration of a preconditioned Krylov subspace solver. As an example, consider the same problem with PDE coefficients $p_1 = p_2 = p_3 = 1$, for which the optimal relaxation parameter is approximately $\omega^* = 1.5$. For this value the SOR solver converges within 37 iterations. However,
the convergence analysis provided an approximation $\omega = 1.599$; for this value the SOR scheme converges within 46 iterations, and for a fair comparison this is the datum that should be used. In comparison, ILU(0) preconditioned CGS converges in this case within 11 iterations and Bi-CGSTAB converges within 10 1/2 iterations. Thus SOR and these two Krylov subspace solvers converge within almost the same number of matrix-vector products. If the construction work of the preconditioner is added, the conclusion is that SOR is faster. When the grid is finer, the estimate of the optimal relaxation parameter is more accurate and SOR performs even better. In addition, SOR requires much less storage.

Test Problem 2

Consider the following singularly perturbed problem:

    $-\epsilon \Delta u + \nabla v \cdot \nabla u = f ,$        (5.15)

where

    $v = \frac{1}{2} (x^2 + y^2 + z^2) ,$        (5.16)

subject to nonzero Dirichlet type boundary conditions on a box domain $(a_x, b_x) \times (a_y, b_y) \times (a_z, b_z)$ which contains the origin. This class of problems is currently being studied by Sun & Ward - see [103] and references therein. Eq. (5.15) is similar to the equation considered in Test Problem 1, but now there are turning points inside the domain, which make the problem a much more difficult one. The problem arises in a large variety of applications, including semiconductors, population growth [75], the exit problem [77], and more. One interesting phenomenon here is that for any number of space dimensions the continuous problem has an isolated exponentially small eigenvalue. The eigenvalue problem for a class of one-dimensional problems which contains the one-dimensional version of Eq. (5.15) has been studied by De Groen [30]. Asymptotic analysis (for different aspects) has been done by Ludwig [75], Grasman & Matkowsky [60], Matkowsky & Schuss [77], and others (see [103] for a list of references).
The analysis in these papers shows that the analytical solution of (5.15) has boundary layers near the edges and is constant in the interior of the domain. In [60] a variational approach is used to determine the constant as a weighted average of the boundary data. Finite difference schemes have truncation errors which might be larger than $\epsilon$, and it is not clear how accurate the computed solution is. Adjerid, Aiffa & Flaherty [1] have experimented with a similar two-dimensional problem (with different boundary conditions) and used a finite element code. They have observed that iterative solvers converge quickly to a constant in the interior of the domain, but not necessarily to the exact constant predicted by Grasman & Matkowsky's asymptotic analysis.

Following Sun's suggestions [102], the Il'in scheme was used [74]. Essentially it is a centered difference scheme, modified by using fitting parameters associated with the functions $\coth x$, $\coth y$, $\coth z$ applied to the gridpoints. When $\epsilon$ is very small, the scheme behaves like the upwind scheme; when $\epsilon$ is large, the scheme is second order accurate [74] (see also [97] for useful analysis and results on the accuracy of finite difference schemes applied to singularly perturbed problems). Eq. (5.15) has been considered with boundary conditions $u = x + y + z$. The leading order of the asymptotic expansion requires in general computing a multidimensional integral (see [60, p. 594] for the closed form). For the particular PDE coefficients in (5.15) the constant can be obtained by a weighted sum over the contact points of the function $v$ with the boundary that are closest to the origin. There are six contact points in this case, all of which are within distance 1 from the origin; the constant in this case is expected to be zero.
The cyclically reduced operator inherits the properties of the continuous problem: for $\epsilon = 0.03$ and discretization on a $16 \times 16 \times 16$ grid, by Matlab's command 'eigs' the three smallest eigenvalues of the reduced matrix (multiplicity excluded) are $2.64 \cdot 10^{-6}$, 1.93 and 3.59. For this case the condition number estimated by Matlab's 'condest' command is approximately $10^9$.

Table 5.12 presents flop counts and the numerical solution for a problem discretized on a $16 \times 16 \times 16$ grid, using ILU(0) preconditioned Bi-CGSTAB. The numbers represent averages of 5 tests, started with a random initial guess. The average of the absolute values of the solution over the interior points of the grid, presented in the third column, can be considered an approximation to the constant in the interior. For $\epsilon = 0.03$ and $\epsilon = 0.02$ the accuracy is satisfactory. The average (third column) for $\epsilon = 0.03$ is larger because the boundary layer is less "sharp" in this case. For $\epsilon = 0.01$ there is an error in the solution. Here $\epsilon$ is already significantly smaller than $h$, and in the numerical tests there were fluctuations in the constants obtained, as well as in the computational work. Differences in accuracy between the reduced and the unreduced systems have been observed to be negligible. For this class of problems the reduced solver does not converge significantly faster than the unreduced solver, in particular when $\epsilon$ gets smaller.

epsilon   Mflops   (n-2)^{-3} sum|u|
0.03      10.0     7.54e-4
0.02      11.1     2.71e-5
0.01      14.6     0.53

Table 5.12: Test Problem 2: overall flop counts and average computed constants for three values of $\epsilon$, when ILU(0) preconditioned Bi-CGSTAB is used.

For severely ill-conditioned problems, ILU(0) preconditioned iterations do not converge at all.
Here the only preconditioners that have been found effective are incomplete LU factorizations with a very small drop threshold. Table 5.13 presents details on numerical experiments for $\epsilon = 0.005$. Notice the large amount of computational work that is required. The ratio between the number of nonzeros in the factors and the number of nonzeros of the matrix is given in the column titled 'nz(L+U)/nz(S)'. For comparison, the same ratio for the complete LU factorization is 27.93 in this case. Since $\epsilon$ is very small, in this discretization the constant in the interior cannot be expected to be computed accurately. We note, though, that the solution for numerical dropping with both $10^{-4}$ and $10^{-6}$ has been observed to be constant in the interior and thus behaves qualitatively like the leading order of the analytical solution. This example demonstrates the robustness of incomplete factorizations based on dropping. The computational work that is required is large, but these are the only preconditioners for which the solver converges. Notice also that once these preconditioners attain convergence, the tradeoff between the larger work per iteration and the small iteration count translates into almost the same amount of overall computational work (last two rows in the table).

method       nz(L+U)/nz(S)   Mflops
ILU(0)       1               N/C
ND(10^-2)    1.43            N/C
ND(10^-4)    3.25            36.2
ND(10^-6)    5.09            40.4

Table 5.13: comparison of the performance of various incomplete factorizations in conjunction with CGS, for Test Problem 2. 'N/C' stands for no convergence.
Test Problem 3

Consider the following problem:

    $-\Delta u + v \cdot \nabla u = w ,$        (5.17)

on the unit cube, where

    $v = (\sin(\pi y), \sin(\pi x), 1)^T ,$        (5.18)

subject to Neumann boundary conditions on $z = 0, 1$ and Dirichlet boundary conditions on the rest of the boundary. This is an example of a nonseparable problem, for which the convergence analysis of Chap. 4 does not apply. The Neumann conditions in the $z$-direction are discretized using centered difference schemes, by adding two artificial planes of variables (whose $z$ indices are $-1$ and $n+1$). In order to examine the accuracy of the scheme, $w$ has been constructed so that the analytical solution is $u(x, y, z) = \sin(\pi x) \cdot \sin(\pi y) \cdot \cos(\pi z)$. Table 5.14 presents the nice behavior of the norm of the error, and illustrates the second order accuracy of the scheme. Table 5.15 compares the performance of the reduced solver to that of the unreduced solver, using ILU(0) preconditioned Bi-CGSTAB. The criterion used here is the number of flops. All the components are included: construction of the system, construction of the preconditioners, solution, and back substitution for the gridpoints which were previously eliminated by the reduction.

Figure 5.7: a 2D slice of the numerical solution of Test Problem 3, for a fixed value of $z$.

n     ||e||
8     0.004346
16    0.001123
32    0.000285

Table 5.14: norm of the error $e$ for Test Problem 3, for various sizes of the grid.

The performance of the reduced solver is better, and by a larger margin as the grid gets finer. This reflects the fact that the overhead of constructing the cyclically reduced matrix becomes less significant compared to the solve time as the number of unknowns increases.
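The second-order accuracy visible in Table 5.14 (the error drops by roughly a factor of 4 each time the grid is refined) can be reproduced in a 1D sanity check. The model problem below is hypothetical and is not the thesis code:

```python
import numpy as np

def max_error(n):
    """Centered differences for -u'' = pi^2 sin(pi x), u(0) = u(1) = 0,
    exact solution u = sin(pi x); returns the max-norm error
    (hypothetical 1D model)."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    u = np.linalg.solve(A, np.pi**2 * np.sin(np.pi * x))
    return np.max(np.abs(u - np.sin(np.pi * x)))

e1, e2, e3 = max_error(15), max_error(31), max_error(63)  # h halves each time
```
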
The numbers in the last row (for a $32 \times 32 \times 32$ grid) are typical of the factor of improvement; here the solving time is by far longer than the construction time, and the ratio of improvement is expected to be similar for larger grids.

                 Mflops
n     n^3        unreduced   reduced
8     512        1.17        0.91
16    4,096      14.03       10.13
32    32,768     237.83      139.17

Table 5.15: comparison between the performance of the unreduced and the reduced solvers for various grids, using ILU(0) preconditioned Bi-CGSTAB, for Test Problem 3.

5.5 Vector and Parallel Implementation

Since the reduced matrix has a clear block structure, parallel implementation is an attractive option for both constructing the matrix and computing the solution. The need for parallelism is important especially because systems of equations associated with three-dimensional problems are typically very large. For example, a grid of size $64 \times 64 \times 64$ on the unit cube, which might be too coarse for difficult applications, requires solving a system of more than 250,000 unknowns. The problem here is storage as well as time. If the matrix is too big to fit in the physical memory of the computer, swapping (paging) takes place, which substantially slows the execution (the computation time could slow down by a factor of 50 [87]). This section addresses a few issues associated with parallel and vector implementation. It should be viewed only as a general overview; in practice our implementation is vectorized, written in Matlab, and includes no parallel components.

Overview

General overviews of parallel implementation of the solution of linear systems are given, for example, by Demmel, Heath & van der Vorst [32], and in an earlier review by Ortega & Voigt [84]. The earliest forms of parallelism (or vectorization, rather) were based on having a single processor with multiple functional units, such as multipliers and adders.
During compilation, dependence analysis was done, and so, if for example a certain computation involved four additions and two multiplications which were independent of one another, these operations were performed simultaneously using four adders and two multipliers. The next stage was pipelining. Golub and Van Loan [55] illustrate pipelining by making an analogy to an assembly line used in car manufacturing. Suppose a certain operation can be divided into $m$ stages, each of which takes $t$ seconds to complete. If we have $n$ such operations to perform, then the overall time it takes when performing all the operations sequentially is $m \cdot n \cdot t$. If, instead of waiting for the first operation to be fully completed before performing the second operation, we perform the first stage of operation number 2 while the second stage of operation number 1 is taking place, and so on, the overall amount of time is the necessary time $m \cdot t$, plus the overhead of the waiting time for operation number $n$ to start, which is $(n - 1) \cdot t$. This is a substantial improvement over the time it takes to perform the computation sequentially. An important term here is speed-up, which is the time it takes to perform an algorithm sequentially, divided by the time it takes to perform it in the parallel environment the user has. In this particular example we have a speed-up of $\frac{mnt}{(m+n-1)t} = \frac{mn}{m+n-1}$, which is close to $m$ for $n$ sufficiently large.

The first vector computers used pipelined functional units, and had the capability to work very efficiently on vectors and arrays by using vector instructions as part of their instruction sets. Such instructions included loading and storing vectors, adding or subtracting vectors, performing dot products and so on.
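The timing model above can be written down directly (the helper names are illustrative):

```python
def sequential_time(m, n, t):
    # m stages of t seconds each, n operations, no overlap
    return m * n * t

def pipelined_time(m, n, t):
    # first result after m*t seconds, then one result every t seconds
    return m * t + (n - 1) * t

def speedup(m, n):
    # sequential / pipelined; approaches m as n grows
    return (m * n) / (m + n - 1)
```

For instance, with $m = 4$ stages and $n = 1000$ operations, the pipeline finishes in $1003t$ seconds instead of $4000t$, a speed-up just under 4.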
Users were not required to specify their desire to perform these operations in a vectorized manner; rather, the object code for pipelining was typically generated at compilation time, and it was the role of the compiler to identify components that could be pipelined.

Today's parallel computations are based on multiprocessor systems, with several processors, each having a CPU, local memory, and the capability to pass messages to other computers in the network. Loosely speaking, there are two main philosophies as to the architecture of a parallel network [32]:

1. Shared memory architecture.

2. Distributed memory architecture.

Shared memory architectures are ones where each processor has access to a global, common memory area. The advantage of this architecture is that, as far as the user is concerned, any processor can access any data, and very little work has to be done in order to manage data access. This makes the programming considerably easier. On the other hand, locality of data (as is the case with discretized partial differential equations) cannot be exploited as efficiently as in other architectures [95].

In a distributed memory environment a large number of processors are connected in some topological form; each of them has its own local memory, and executes its own programs. Data is sent to other processors in the network in the form of messages. The network topology defines how the processors are interconnected. Examples of possible topologies are rings, meshes, hypercubes and trees. Descriptions of these architectures can be found, e.g., in [95],[55]. Meshes of processors are well suited to the solution of discretized partial differential equations, as groups of gridpoints can be straightforwardly mapped into each of the processors. A 2D mesh is illustrated in Fig. 5.8.
Figure 5.8: 2D mesh architecture (3 × 3).

When considering the solution of a linear system on a parallel machine using preconditioned Krylov subspace methods, different aspects need to be addressed: preconditioner set-up, matrix-vector multiplication, vector updates, dot products and the preconditioner solve. For a general overview, see [87]. The preconditioner set-up and the preconditioner solve are usually the bottleneck. Dot products are easy to handle, but require a fair amount of communication, as each processor performs the computation for its assigned variables, while the final result is required by all the processors.

Construction of the reduced system

For the reduced system, when considering a distributed architecture, for simplicity it is convenient to refer to a 3D mesh of p³ processors, assume that we have a uniform mesh with n³ unknowns, and that p divides n (if this is not so, then we can proceed by mapping an unequal number of gridpoints into each of the processors). One important advantage of the ordering strategies discussed in Chap. 3 is that all the block rows have the same size, and thus the issue of load balancing [32],[55],[87] in this particular aspect is resolved. Our working assumption is standard [95]: pairs of rows-unknowns are handled by the same processor. That is, row number i and unknown number i are both mapped into the same processor. Usually for PDEs, due to the locality of the operators, the geometry of the gridpoints is the source for mapping the unknowns into the processors in the network. We can thus assign subcubes of the original domain to each processor.
As a result, each processor constructs a part of the reduced matrix which is a rectangular matrix whose number of rows is the number of unknowns associated with it, and whose number of columns is the size of the reduced matrix. If only a 2D mesh of processors is available, then the mapping of the unknowns can be done by assigning "stripes", for example of size n × 2 × 2 (equivalent to the 1D sets of the two-plane ordering, described in Chap. 3), to each of the processors. However, the drawback here is that this allows less flexibility than a 3D mesh of processors in dynamically changing the orientation of the blocks relative to the independent variables x, y and z.

The construction of the reduced matrix entries is divided into two stages:

1. Compute the components of the seven-point operator.

2. Perform the algebraic reduction.

The first stage can be parallelized in a straightforward manner. For the reduction step, when constructing a row associated with a certain gridpoint, the values of the computational molecules of its neighbors are required. If (i, j, k) is the point being considered, then its furthest neighbors are (i ± 2, j, k), (i, j ± 2, k), (i, j, k ± 2); the rest of the neighbors are within a smaller distance of √2 (in units of the mesh width). Each processor must have data on an external interface layer of thickness 2 on each of the six sides of its subcube (see Fig. 5.9). There are two ways to obtain the values of the computational molecule associated with the gridpoints that belong to the external interface layer: either by communicating with neighboring processors, or by computing these values as part of the first stage, which causes overhead in arithmetic. Since typically the number of gridpoints in the mesh is very large, and since communication would require message exchange with as many as 26 processors, the second option (of overlap in arithmetic) seems the better alternative, as it eliminates communication.
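The arithmetic overhead of this second option can be estimated directly: with an interface layer of thickness 2 on each of the six sides, an interior processor evaluates the molecule on an extended subcube of side n/p + 4 instead of n/p. A small sketch (names are ours):

```python
# Relative arithmetic overhead of recomputing the external interface
# layer (thickness 2 per side) instead of communicating its values.

def overhead_fraction(n, p):
    interior = (n // p) ** 3        # gridpoints owned by the processor
    extended = (n // p + 4) ** 3    # owned gridpoints plus the interface layer
    return (extended - interior) / interior

# the overhead shrinks as the subcube side n/p grows
print(overhead_fraction(64, 4), overhead_fraction(512, 4))
```

For n/p = 16 the overlap nearly doubles the first-stage work, while for n/p = 128 it adds under 10%; for large grids the elimination of communication comes almost for free.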
Figure 5.9: a 2D slice of gridpoints associated with one processor. The local and external interface layers for the reduced system have a "thickness" of 2 unknowns.

The computation of the entries of the reduced matrix is thus almost fully parallelizable, with minimal communication; the only overhead is the additional computation of components of the computational molecule for external interface variables. Suppose computing a value of the computational molecule of the seven-point operator takes f₁ flops and computing each component of the reduced computational molecule takes f₂ flops. Then, for step 1, the amount of computation per processor (referring to processors associated with an interior subdomain) is (n/p + 4)³ · f₁ flops, and for step 2 the work involved is ½ · (n/p)³ · f₂ flops. Thus the speed-up in arithmetic is nearly p³ if n ≫ p.

In a vector environment the computation of the components of the matrix is vectorizable. Here both construction diagonal by diagonal and construction block by block can be efficiently vectorized. When constructing diagonal by diagonal, the idea is to reshape the three-dimensional arrays which contain the previously computed values of the computational molecule of the cyclically reduced operator (which can clearly be done in a vectorized manner) into a one-dimensional array, and to set to zero the values of the computational molecule which vanish due to the associated gridpoint being close to the boundary. In an environment which has a parallel as well as a vector component, the construction of all the diagonals of the matrix can be done in parallel, provided that all the processors have copies of the values of the computational molecule for the whole domain (here, a shared memory environment might be appropriate).
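The flop counts above give a quick model of the arithmetic speed-up. In the sketch below, f1 and f2 stand for the per-point costs of the two stages; the concrete values are arbitrary placeholders, so this is a counting exercise rather than a measurement:

```python
# Work per interior processor for the two construction stages, on a
# p^3 processor mesh with n^3 gridpoints (a rough counting model).

def step1_flops(n, p, f1):
    # seven-point molecule on the subcube plus the interface layer
    return (n // p + 4) ** 3 * f1

def step2_flops(n, p, f2):
    # the reduction is performed only at the points of the reduced grid
    return (n // p) ** 3 // 2 * f2

def arithmetic_speed_up(n, p, f1, f2):
    one = step1_flops(n, 1, f1) + step2_flops(n, 1, f2)
    many = step1_flops(n, p, f1) + step2_flops(n, p, f2)
    return one / many

# approaches p**3 = 64 as n grows relative to p
print(arithmetic_speed_up(128, 4, 10, 20), arithmetic_speed_up(1024, 4, 10, 20))
```

With these placeholder costs the ratio is about 55 for n = 128 and about 63 for n = 1024, approaching p³ = 64 as n ≫ p.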
As an example, below we present a piece of code written in Matlab syntax which assigns two (arbitrarily picked) diagonals of the reduced matrix corresponding to natural lexicographic ordering.¹

    dzmm = 1 - (b(i,j,k-1).*f(i,j,k)./a(i,j,k-1) ...
             +  b(i,j,k).*f(i,j-1,k)./a(i,j-1,k));
    dzmm(:,1,:) = 0;
    dzmm(:,:,1) = 0;
    diag_zmm = reshape(dzmm,sys_size,1);
    izmm = lny + lnx*lny;
    diag_zmm(1:sys_size-izmm) = diag_zmm(izmm+1:sys_size);

    dppz = -(d(i,j+1,k).*e(i,j,k)./a(i,j+1,k) ...
             + d(i,j,k).*e(i+1,j,k)./a(i+1,j,k));
    dppz(lnx,:,:) = 0;
    dppz(:,lny,:) = 0;
    diag_ppz = reshape(dppz,sys_size,1);
    ippz = lny + 1;
    diag_ppz(ippz+1:sys_size) = diag_ppz(1:sys_size-ippz);

In the above code, a, b, d, e, f are three-dimensional arrays which hold the previously computed components of the computational molecule, and lnx, lny, lnz are the numbers of components in each direction. diag_zmm is the diagonal that contains gridpoints whose coordinates are indexed (i, j − 1, k − 1), and diag_ppz corresponds to the gridpoints indexed (i + 1, j + 1, k). The integer numbers izmm and ippz are the offsets of the diagonals; finally, the values are shifted so as to fit into the matrix. The shift is different for superdiagonals and subdiagonals.

Reordering is done by using vectors which map coordinates to row numbers in the matrix and vice versa. For example, for the two-plane ordering, the connection between the gridpoint's location ℓ in the matrix and its coordinate values is given by

¹ In the actual Matlab program the implementation is different. The code presented here is only for the purpose of illustrating the general idea.
    i = fix{[(ℓ − 1) mod 2n]/2} + 1,                                  (5.19a)

    j = 2 · fix{[(ℓ − 1) mod n²]/(2n)} + 1,       ℓ mod 4 = 0 or 1
        2 · [fix{[(ℓ − 1) mod n²]/(2n)} + 1],     ℓ mod 4 = 2 or 3    (5.19b)

    k = 2 · fix{(ℓ − 1)/n²} + 1,                  ℓ odd
        2 · [fix{(ℓ − 1)/n²} + 1],                ℓ even              (5.19c)

For the two-line ordering, the connection is

    i = [(ℓ − 1) mod n] + 1,                                          (5.20a)

    k = fix{(ℓ − 1)/(n²/2)} + 1,                                      (5.20b)

    j = 2 · fix{[ℓ − (k − 1) · (n²/2) − 1]/n} + 1,   k odd
        2 · fix{[ℓ − (k − 1) · (n²/2) − 1]/n} + 2,   k even           (5.20c)

Matrix-vector products for the reduced matrix

One drawback of the reduced matrix, compared to the unreduced matrix, is that its associated computational molecule is non-compact. As a result, when performing matrix-vector products each of the processors in the network must communicate with all the processors surrounding it, which amounts to 26 for "interior" processors. This can be observed by examining the computational molecule; see Fig. 2.4. However, the messages that need to be exchanged are very small in size, and the communication can be done simultaneously with other computations, as will be clarified below.

For the purpose of the illustration, it is convenient to assign two colors to the processors in the network: suppose the processors are either "red" or "black", by the usual red/black ordering of a 3D mesh (see Fig. 2.1). Also, suppose the processor being considered is an interior processor in the mesh of processors, and let P stand for its "id number". Then the sizes of the messages exchanged by processor P and its neighboring processors in the process of computing the product of the reduced matrix with a vector are:

• (n/p)² numbers for the six processors of the opposite color. Here a layer of size (n/p) × (n/p) × 2 of external interface variables is required.
(It contains (n/p)² unknowns, not 2 · (n/p)², because only the black gridpoints appear in the reduced grid.)

• Either n/p or fewer numbers for the processors with the same color as processor P.

Thus six processors transfer most of the required data. This is different from the unreduced system, where these same six processors would transfer all the data required by processor P. The sizes of the messages are small; as an illustration, consider a network of 1,000 processors with 1,000,000 unknowns. Then six processors exchange 100 numbers with processor P, and the rest exchange 10 or fewer numbers each.

The local variables that cause the need to exchange messages are the ones close to the boundary: the local interface points [95]. For the reduced system, the thickness of this layer in terms of unknowns is two, just like the thickness of the external interface layer. This follows from the structure of the reduced computational molecule. For the unreduced system the thickness of these layers is one. The local points (see Fig. 5.9) are the points which form the dense "block diagonal" part of the rectangular matrix.

The actual matrix-vector product consists of three separate stages, the first two of which can be done in parallel [95]: multiplication of local variables, message exchange, and multiplication of external variables. Each subcube of the reduced grid has side n/p and contains ½ · (n/p)³ unknowns. In order to illustrate how this works, we refer below to a particular example: we take n = 16, p = 4 and refer to natural lexicographic ordering. The interior subcube is indexed (2, 2, 2) in the mesh of processors (the equality of the indices here has no special meaning whatsoever).
The subcube indices in this case, in terms of the original ordering of the reduced grid, are (547, 548, 555, 556, 563, 564, 571, 572) + i · 128, i = 0, 1, 2, 3; there are 32 = 4³/2 gridpoints in total. The sparsity pattern of the 32 × 2048 rectangular matrix which is the part of the reduced matrix handled by processor P is depicted in Fig. 5.10. This is before the unknowns have been locally ordered. In the top figure, the spy pattern of the original reduced (square) matrix illustrates the (two) 2D sets of gridpoints (see Chap. 3 for the definition) in which the subcube we are considering is contained; these are 32 of the rows between the two horizontal lines, namely between rows 512 + 1 and 1024 (indeed, 1024 − 512 = 2n²). The bottom figure is the sparsity pattern of the rectangular matrix of size 32 × 2048 which contains all the rows associated with the gridpoints of the subcube. As is evident, the matrix contains several nonzero blocks, more dense in some places (which correspond to the locations of the local gridpoints).

Figure 5.10: the part of the reduced matrix that contains the gridpoints associated with subcube (2, 2, 2), and the rectangular matrix before local ordering in processor P has been done.

After local ordering of the unknowns is performed (which is done by using the same strategy as used for the whole reduced grid), the result is a rectangular matrix whose "diagonal block" is fairly dense and corresponds to the local variables. Fig. 5.11 illustrates this for our particular example. Here neither the band nor the sparsity pattern of the matrix is important; what matters is the density.
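The arithmetic behind this index set is easy to check; the base indices below are taken from the text, and the script only verifies the counts:

```python
# Row indices of the reduced matrix handled by the interior subcube
# (2, 2, 2) for n = 16, p = 4, natural lexicographic ordering.
bases = (547, 548, 555, 556, 563, 564, 571, 572)
rows = sorted(b + i * 128 for i in range(4) for b in bases)

assert len(set(rows)) == 32 == 4 ** 3 // 2   # one row per subcube gridpoint
n = 16
assert all(512 + 1 <= r <= 512 + 2 * n ** 2 for r in rows)  # within rows 513..1024
```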
In this example the number of nonzeros in the main block is 344, compared to 608 in the whole rectangular matrix (see Fig. 5.10). In practice, for large n, the situation is much better than in this example (where n is very small), in the sense that most of the nonzeros of the rectangular matrix appear in the main diagonal block, and most of the rows have 19 nonzero elements (equal to the number of components in the reduced computational molecule). For these local points no communication is required in order to perform the matrix-vector product. In order to perform the multiplication of the local part more efficiently, a re-ordering of the "block diagonal" matrix could be done, so that all the local points are ordered first, and only then the local interface points.

Figure 5.11: the "local part" of the rectangular matrix depicted in Fig. 5.10, after re-ordering.

If n ≫ p, for the reduced system there are approximately ½ · (n/p)³ local points, and only approximately 6 · (n/p)² local interface points and about the same number of external points. That is, the external matrix (the rectangular matrix with the "block diagonal" excluded) has a number of nonzeros which is smaller by an order of magnitude than the number of nonzeros in the square matrix that corresponds to the "block diagonal". This is important, because it means that the communication required for exchanging values of the external interface variables does not form a bottleneck: it can be done while a massive O(n³) computation is being carried out.
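These counts can be put side by side; the sketch below (names ours) tabulates local points versus interface-layer points per processor, together with the message sizes quoted earlier:

```python
# Per-processor point counts for the reduced system on a p^3 mesh of
# processors with n^3 unknowns.

def local_points(n, p):
    # about half the subcube's gridpoints lie on the reduced grid
    return (n // p) ** 3 // 2

def interface_points(n, p):
    # a thin layer over the six faces (local interface; the external
    # interface layer is of about the same size)
    return 6 * (n // p) ** 2

# the 1,000-processor example: p**3 = 1000 processors, n**3 = 10**6 unknowns
p, n = 10, 100
assert (n // p) ** 2 == 100   # numbers exchanged with each opposite-color neighbor
assert n // p == 10           # at most, with each same-color neighbor

# for n >> p the interface is an order of magnitude smaller than the local part
print(local_points(512, 4), interface_points(512, 4))
```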
The communication with the neighboring processors will also include transmitting numbers corresponding to the local interface points; these are considered external interface points by the neighboring processors. Once the stage of simultaneous computation of the local matrix-vector product and exchange of local and external interface variables has been completed, what is left is to perform and add the inexpensive external component of the matrix-vector product. For the reduced matrix this operation takes approximately 12n² floating point operations.

Preconditioner set-up and solve

Figure 5.12: a 2D slice of processors in a 7 × 7 × 7 mesh that hold entries of 1D or 2D sets of unknowns associated with the unknowns mapped into the processor marked by 'X', or communicate with this processor when matrix-vector products are performed.

It has already been mentioned that the solve is the costly part of the computation. One of the most important aspects here is the ordering strategy. In fact, introducing the four-color block ordering (for 1D partitioning) and the red/black block ordering (for 2D partitioning) in Chap. 3, and testing their performance (Chap. 4), is mainly for the purpose of vectorization and parallelization. These ordering strategies are less efficient than natural ordering. Intuitively, this can be explained by observing that the number of nonzeros in the (complete) LU factorization is larger for these orderings, and thus an incomplete LU factorization is less accurate. On the other hand, multicolor orderings are easily parallelizable. An experimental investigation of the effects of coloring on the performance of parallel solvers is provided, for example, in [88]. For the reduced system, the four-color scheme (Fig.
3.6) gives rise to solution in parallel for four blocks of unknowns, according to their color. It should be noted, however, that the diagonal blocks are block diagonal matrices; Fig. 5.12 depicts the processors which hold unknowns associated with a 1D or 2D block, or communicate with the processor marked by 'X' when matrix-vector products are performed. In a network of p³ processors, approximately 3p processors will hold the entries of a 1D or a 2D set of unknowns of the reduced grid.

Table 5.16 presents iteration counts for ILU(0) preconditioned Bi-CGSTAB, applied to Test Problem 1 (Sec. 5.4) with p₁ = p₂ = p₃ = 2. The natural ordering is more efficient than the multicolor versions (the difference in percentage is not negligible) but would obtain much smaller speed-up rates. Parallel solution techniques for the reduced system should be the subject of further investigation and are left as a possible future project.

      ordering        iterations
      natural         16
      1D four-color   19½
      2D red/black    20½

Table 5.16: comparison of iteration counts of natural and multicolor orderings for Test Problem 1 with p₁ = p₂ = p₃ = 2.

Chapter 6

Summary, Conclusions and Future Research

In this dissertation a three-dimensional cyclically reduced linear system of equations arising from the steady-state convection-diffusion equation has been derived and analyzed. Analysis of cyclic reduction for non-self-adjoint problems was done for the first time only in the early 1990s, when Elman & Golub considered the two-dimensional problem. Derivation and analysis of the three-dimensional non-self-adjoint problem has been done in this thesis for the first time. A summary of the main results, conclusions and suggestions for future research in related topics is given below.
The main results of this thesis are also presented in [62],[63],[64].

6.1 Summary and Conclusions

The following issues have been addressed:

1. Derivation of the linear system for both the constant and the variable coefficient problems. The step of cyclic reduction for three-dimensional problems has been described in detail, and general numerical properties of the reduced matrix have been established. It has been shown that the reduced matrix is diagonally dominant, symmetrizable or an M-matrix whenever the original matrix is, and moreover, there are regions of PDE coefficients for which only the reduced matrix has such properties. The reduced matrix is generally better conditioned than the original matrix.

2. Orderings. The reduced grid is different from a standard tensor-product grid, and re-ordering of the unknowns is an important issue. A general block ordering strategy for three-dimensional grids has been presented. The approach is based on referring to one-dimensional or two-dimensional grids of sets of gridpoints and their associated computational molecules. A highly effective family of two-plane block orderings has been presented, and comparisons with other ordering strategies have been performed. Natural as well as multicolor block versions have been considered.

3. Symmetrization. In order to obtain bounds on convergence rates of stationary methods, Elman & Golub's strategy of finding a diagonal symmetrizer has been adopted. Here it turns out that the three-dimensional case is similar to the two-dimensional case: in both cases the reduced matrix can be symmetrized for a substantially larger range of PDE coefficients than the original matrix.
In particular, the reduced matrices for both the 2D and 3D problems can be symmetrized even when all the mesh Reynolds numbers are larger than 1 in magnitude.

4. Bounds on convergence rates. Comprehensive convergence analysis has been performed for block stationary methods. First, tight bounds on convergence rates of the block Jacobi scheme have been derived. The convergence analysis applies to upwind difference discretization with any values of the PDE coefficients, or to centered difference discretization with mesh Reynolds numbers smaller than 1. Two partitionings based on dimension (1D and 2D) have been considered. The reduced matrix is not consistently ordered relative to the 1D partitioning, and in order to use Young's analysis for Gauss-Seidel and Successive Over-Relaxation, it has been shown, by extensions of the theory for p-cyclic matrices due to Varga, that the two-plane matrix is nearly consistently ordered relative to this partitioning for the range of PDE coefficients for which the convergence analysis applies. The bounds on convergence rates for block Gauss-Seidel are tight, and for SOR the estimate of the optimal relaxation parameter, obtained by using Young's formula, is accurate. Fourier analysis extends the convergence results to the case of centered difference discretization of convection-dominated equations.

5. Computational work. Careful analysis of the work involved in each of the solvers has been performed. In particular, when the PDE coefficients are such that the numerical solution is stable and convergence is guaranteed, it has been shown that block solvers based on 1D partitioning are generally more efficient than solvers based on 2D partitioning.

6. Preconditioners have been considered.
For incomplete LU factorizations, the amount of fill-in in the reduced matrix and the computational work involved in constructing the preconditioner have been discussed. The construction of incomplete LU factorizations is more costly for the reduced matrix than for the unreduced matrix. Experimental results show that when the problem is not too ill-conditioned, ILU(0) is more effective than other preconditioners; incomplete factorizations based on a drop threshold require more computational work but perform well for ill-conditioned problems.

7. The cost of the construction of the reduced system has been addressed, and it has been shown that the construction is an inexpensive component of the overall computational work required to compute the solution.

8. Comparison with the unreduced system has been performed, and it has been shown that reduced solvers typically converge faster. This has been proven analytically for block stationary methods, and has been demonstrated empirically for a variety of Krylov subspace solvers. Also, it has been shown that there is a range of PDE coefficients for which the unreduced solver diverges whereas the reduced solver converges.

9. Experimental comparison between various solvers has been performed. For preconditioned Krylov subspace solvers, Bi-CGSTAB and CGS seem to be more efficient than the other Krylov subspace methods that have been considered. They seem more appropriate for other reasons as well: due to the high memory requirements for the class of problems considered in this thesis, GMRES might not be the first choice, and due to the need to apply the transpose matrix, QMR and BiCG are less preferable.
When the optimal relaxation parameter is known, which is the case for constant coefficient problems and certain variable coefficient problems, SOR is highly competitive with Krylov subspace solvers. This underlines the importance of the convergence analysis performed in Chapter 4.

10. Implementation has been discussed. Some preliminary observations on aspects of vectorization and parallelism have been provided. The construction of the reduced system is fully parallelizable and highly vectorizable; four-color 1D block orderings and red/black 2D block orderings are appropriate for parallel implementation of solvers.

11. Numerical results which validate the analysis and illustrate various aspects have been presented.

The issue of ordering is especially important: in particular, the derivation of an ordering strategy which fits the three-dimensional reduced grid. Such orderings are superior to natural lexicographic orderings or to general ordering strategies which do not take the special structure of the reduced grid and the reduced computational molecule into account.

The most important advantage of the cyclically reduced system is that there is typically an improvement in the condition number by a factor of 2 to 3. This directly affects the rate of convergence. As has been shown, for block stationary methods the bounds on convergence rates can be taken as a reliable indication of the gains; the ratio between the bounds on convergence rates of the reduced system and the unreduced system is largest for the 1D partitioning, and slightly smaller for the 2D partitioning.
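The reduction step that produces these gains is, at bottom, one step of block Gaussian elimination: the red unknowns are eliminated and a Schur complement system for the black unknowns remains. A one-dimensional toy sketch (plain Python; all names and coefficient values are ours, chosen only for illustration):

```python
# One step of cyclic reduction as block Gaussian elimination on a 1D
# tridiagonal system: eliminate the "red" (odd-numbered) unknowns and
# form the Schur complement for the "black" ones.

def solve(A, b):
    # dense Gaussian elimination with partial pivoting (toy helper)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

n = 7
# tridiagonal convection-diffusion-like stencil (illustrative coefficients)
A = [[0.0] * n for _ in range(n)]
b = [1.0] * n
for i in range(n):
    A[i][i] = 2.5
    if i > 0: A[i][i - 1] = -1.0
    if i < n - 1: A[i][i + 1] = -1.2

red = list(range(0, n, 2))    # odd gridpoints (0-based even indices)
black = list(range(1, n, 2))
# A_rr is diagonal, so S = A_bb - A_br * A_rr^{-1} * A_rb is cheap to form
S = [[A[i][j] - sum(A[i][r] * A[r][j] / A[r][r] for r in red)
      for j in black] for i in black]
g = [b[i] - sum(A[i][r] * b[r] / A[r][r] for r in red) for i in black]

xb = solve(S, g)
x_full = solve(A, b)
# the reduced solve reproduces the black components of the full solution
assert all(abs(xb[t] - x_full[black[t]]) < 1e-10 for t in range(len(black)))
```

The key structural fact, used throughout the thesis, is that for matrices with Property A the block to be inverted (A_rr here) is diagonal, so forming the Schur complement costs only a few operations per entry.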
On the other hand, the 3D cyclically reduced system has some disadvantages, the most serious of which is a more significant loss of sparsity compared to lower dimensions, which causes an increase in the computational work involved in a single matrix-vector product and in the construction of preconditioners. In fact, the three-dimensional problem differs from the 1D and 2D problems in that it is the only case where the number of nonzeros of the reduced matrix is higher than the number of nonzeros of the unreduced matrix: about 35% more nonzeros in the 3D case, as opposed to nearly 50% and 10% fewer nonzeros in the 1D and 2D cases, respectively. Another disadvantage is that, in general, setting up the reduced system is more difficult than setting up other finite difference schemes, due to the structure of the computational molecule and the structure of the grid.

Despite these disadvantages, one step of cyclic reduction is highly effective: the significant gain in iteration counts compensates for the more expensive matrix-vector products, and in practice the solution time has been observed to be typically faster by a factor of two or more. The difficulty in implementation is minor; the implementation can be done by performing the step of block Gaussian elimination directly, which is easy. Alternatively, the difference operator, which is given explicitly in Chap. 2, can be used.

The above discussion leads to the conclusion that one step of cyclic reduction for three-dimensional problems is an effective preconditioning technique, which leads to faster solution of linear systems associated with three-dimensional elliptic problems.

6.2 Future Research

A few possible future research directions are:

1. Application of the cyclic reduction step to other discretization schemes. The cyclic reduction step is effective for matrices which satisfy Property A.
When the PDE has rough or discontinuous coefficients there are more appropriate schemes, in which there is strong coupling between the red and the black points. When considering cyclic reduction for schemes whose associated matrix does not satisfy Property A, one potential difficulty is that the matrix to be inverted is not diagonal. As a result:

• The step of cyclic reduction requires more construction time and more computational work per iteration.

• It is difficult to derive a difference operator and perform convergence analysis.

On the other hand, the significant gains in computational work that have been presented in this thesis lead one to believe that cyclic reduction might still be highly effective. Some numerical experiments performed as part of this research on Roe & Sidilkover's optimum positive residual scheme [92], for three-dimensional problems with constant coefficients, demonstrate significant gains in iterations for the reduced system but considerably larger computational work per iteration. Careful investigation and implementation are required in order to reach definite conclusions as to whether cyclic reduction is effective here. As part of the application of the cyclic reduction step to other discretization schemes, higher order schemes and linear systems associated with finite elements are of much interest.

2. Extending the class of problems. The Navier-Stokes equations, which are difficult to solve numerically, translate into a 2 × 2 block indefinite linear system of equations which contains a "convection-diffusion matrix". Here cyclic reduction can be used as an inner iteration.

3. Ordering strategies. The framework of block orderings described for the reduced grid in Chap. 3 is an effective tool, and research in this area should be pursued.
Questions of interest are whether a "nearly-optimal" ordering strategy can be obtained, based on point and block computational molecules and grids, along with other considerations. An interesting idea of Golub [53] is a dynamic change of the direction of the ordering throughout the iteration. Such an adaptive ordering strategy does not seem to have been applied, and might be a promising technique, in particular when the direction of the flow is not clear.

4. Parallelism. As already mentioned, parallelism is important in particular for three-dimensional problems, due to the size of the discretized systems. The cyclically reduced operator has some features that are different from those of standard finite difference operators. In particular, the structure of the computational molecule is more complicated. A possible idea is to map unknowns onto processors not necessarily in the standard way; rather, assign subcubes whose sides form a 45° angle, in all of the x, y and z directions, relative to the axes. This fits the structure of the reduced computational molecule, and might reduce the communication time required between processors in the network. On the other hand, it requires nontrivial treatment of the grid, for example close to the boundaries. Implementation in a parallel environment requires further investigation.

5. Analysis of the cyclically reduced operator as a smoother for multigrid algorithms. Preliminary smoothing analysis suggests that the cyclically reduced operator is a good smoother for multigrid, and generates smaller smoothing factors than the analogous seven-point operator.
Since multigrid involves several grids, and since the mesh Reynolds numbers grow larger on coarser grids, the fact that the cyclically reduced difference equation corresponds to adding artificial viscosity to the original equation suggests that in convection-dominated PDEs the gain from applying cyclic reduction might be significant. An indication of this is the stability of the block Gauss-Seidel scheme when applied to the cyclically reduced linear system in convection-dominated regions. Research on this topic is currently underway.

Bibliography

[1] S. Adjerid, M. Aiffa, and J. E. Flaherty. High-order finite element methods for singularly perturbed parabolic problems. Technical report, Scientific Computation Research Center, Rensselaer Polytechnic Institute, New York, 1993.
[2] P. Amodio and N. Mastronardi. A parallel version of the cyclic reduction algorithm on a hypercube. Parallel Comput., 19(11):1273-1281, 1993.
[3] P. Amodio and M. Paprzycki. A cyclic reduction approach to the numerical solution of boundary value ODEs. SIAM J. Sci. Comput., 18(1):56-68, 1997.
[4] R. J. Arms, L. D. Gates, and B. Zondek. A method of block iteration. J. SIAM, 4:220-229, 1956.
[5] W. E. Arnoldi. The principle of minimized iteration in the solution of the matrix eigenvalue problem. Quart. Appl. Math., 9:17-29, 1951.
[6] U. M. Ascher. Introduction to multigrid. Lecture notes, University of British Columbia, 1997.
[7] U. M. Ascher and P. Lin. Sequential regularization methods for simulating mechanical systems with many closed loops. SIAM J. Sci. Comput., to appear, 1998.
[8] O. Axelsson. A general incomplete block-matrix factorization method. Linear Algebra Appl., 74:179-190, 1986.
[9] O. Axelsson. Iterative Solution Methods. Cambridge University Press, 1994.
[10] S. Barnett. Matrices - Methods and Applications. Clarendon Press, Oxford, 1990.
[11] Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, 1994.
[12] L. Bauer and E. L. Reiss. Block five diagonal matrices and the fast numerical solution of the biharmonic equation. Math. Comp., 26:311-326, 1972.
[13] R. Beauwens. Factorization iterative methods, M-operators and H-operators. Numer. Math., 31:335-357, 1979.
[14] M. Benzi, D. B. Szyld, and A. van Duin. Orderings for incomplete factorization preconditioning of nonsymmetric problems. Technical Report 97-91, Temple University, Philadelphia, 1997.
[15] S. Bondeli and W. Gander. Cyclic reduction for special tridiagonal systems. SIAM J. Matrix Anal. Appl., 15:321-330, 1994.
[16] A. Brandt. Multi-level adaptive solutions to boundary-value problems. Math. Comp., 31:333-390, 1977.
[17] F. Brezzi and M. Fortin. Mixed and Hybrid Finite Element Methods. Springer-Verlag, New York, 1991.
[18] O. Buneman. A compact non-iterative Poisson solver. Report 294, Stanford University Institute for Plasma Research, Stanford, California, 1969.
[19] B. L. Buzbee, F. W. Dorr, J. A. George, and G. H. Golub. The direct solution of the discrete Poisson equation on irregular regions. SIAM J. Num. Anal., 8:722-736, 1971.
[20] B. L. Buzbee, G. H. Golub, and C. W. Nielson. On direct methods for solving Poisson's equations. SIAM J. Num. Anal., 7:627-656, 1970.
[21] T. F. Chan and H. C. Elman. Fourier analysis of iterative methods for elliptic problems. SIAM Rev., 31:20-49, 1989.
[22] T. F. Chan and T. P. Mathew. Domain decomposition algorithms. In Acta Numerica 1994, Cambridge University Press, Cambridge, 1994.
[23] M. P. Chernesky. On preconditioned Krylov subspace methods for discrete convection-diffusion problems. Numerical Methods for Partial Differential Equations, 13:321-330, 1997.
[24] E. Chow and Y. Saad. Experimental study of ILU preconditioners for indefinite matrices. Technical Report TR 97/95, Minnesota Supercomputer Institute, University of Minnesota, 1997.
[25] P. Concus and G. H. Golub. Use of fast direct methods for the efficient numerical solution of nonseparable elliptic equations. SIAM J. Num. Anal., 10:1103-1120, 1973.
[26] P. Concus, G. H. Golub, and G. Meurant. Block preconditioning for the conjugate gradient method. SIAM J. Sci. Statis. Comput., 6:543-561, 1985.
[27] R. W. Cottle. Manifestations of the Schur complement. Linear Algebra Appl., 8:189-211, 1974.
[28] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. Brandon Press, New Jersey, 1969.
[29] E. F. d'Azevedo, P. A. Forsyth, and Wei-Pai Tang. Ordering methods for preconditioned conjugate gradient methods applied to unstructured grid problems. SIAM J. Matrix Anal., 13(3):944-961, 1992.
[30] P. P. N. de Groen. The nature of resonance in a singular perturbation problem of turning point type. SIAM J. Math. Anal., 11(1):1-22, 1980.
[31] M. A. DeLong and J. M. Ortega. SOR as a preconditioner. Applied Numerical Mathematics, 18:431-440, 1995.
[32] J. W. Demmel, M. T. Heath, and H. A. van der Vorst. Parallel numerical linear algebra. In Acta Numerica 1993, Cambridge University Press, Cambridge, 1993.
[33] E. Detyna. Point cyclic reductions for elliptic boundary-value problems I. The constant coefficient case. J. Comp. Phys., 33:204-216, 1979.
[34] J. C. Diaz and C. G. Macedo. Fully vectorizable block preconditionings with approximate inverses for non-symmetric systems of equations. Int. J. Num. Meth. Eng., 27:501-522, 1989.
[35] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Clarendon Press, Oxford, 1986.
[36] I. S. Duff and G. A. Meurant. The effects of ordering on preconditioned conjugate gradients. BIT, 29:635-657, 1989.
[37] L. C. Dutto. The effect of ordering on preconditioned GMRES algorithm for solving the compressible Navier-Stokes equations. Int. J. Num. Meth. Eng., 36:457-497, 1993.
[38] H. C. Elman. Stability analysis of incomplete LU factorizations. Math. Comp., 47:191-217, 1986.
[39] H. C. Elman and M. P. Chernesky. Ordering effects on relaxation methods applied to the discrete one-dimensional convection-diffusion equation. SIAM J. Numer. Anal., 30:1268-1290, 1993.
[40] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint linear systems. Math. Comp., 54:671-700, 1990.
[41] H. C. Elman and G. H. Golub. Iterative methods for cyclically reduced non-self-adjoint linear systems II. Math. Comp., 56:215-242, 1991.
[42] H. C. Elman and G. H. Golub. Line iterative methods for cyclically reduced discrete convection-diffusion problems. SIAM J. Sci. Stat. Comput., 13:339-363, 1992.
[43] H. C. Elman and G. H. Golub. Inexact and preconditioned Uzawa algorithms for saddle point problems. SIAM J. Numer. Anal., 31(6):1645-1661, 1994.
[44] H. C. Elman and D. Silvester. Fast nonsymmetric iterations and preconditioning for Navier-Stokes equations. SIAM J. Sci. Comput., 17:33-46, 1996.
[45] K. Fan. Note on M-matrices. Quart. J. Math. Oxford Ser. (2), 11:43-49, 1960.
[46] C. A. J. Fletcher. Computational Techniques for Fluid Dynamics 1. Springer Series in Computational Physics. Springer-Verlag, 2nd edition, 1991.
[47] G. E. Forsythe and W. R. Wasow. Finite-Difference Methods for Partial Differential Equations. John Wiley and Sons, New York - London, 1960.
[48] R. W. Freund. A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM J. Sci. Comput., 14:470-482, 1993.
[49] R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative solution of linear systems. In Acta Numerica 1992, Cambridge University Press, Cambridge, 1992.
[50] R. W. Freund and N. M. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., 60:315-339, 1991.
[51] E. Gallopoulos and Y. Saad. A parallel block cyclic reduction algorithm for the fast solution of elliptic equations. Parallel Comput., 10(2):143-159, 1989.
[52] J. A. George and J. W. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Englewood Cliffs, New Jersey, 1981.
[53] G. H. Golub. Personal communication.
[54] G. H. Golub and C. F. Van Loan. Matrix Computations. Second Edition, Johns Hopkins University Press, Baltimore, MD, 1989.
[55] G. H. Golub and C. F. Van Loan. Matrix Computations. Third Edition, Johns Hopkins University Press, Baltimore, MD, 1996.
[56] G. H. Golub and D. P. O'Leary. Some history of the conjugate gradient and Lanczos algorithms. SIAM Review, 31:50-102, 1989.
[57] G. H. Golub and R. S. Tuminaro. Cyclic reduction/multigrid. Technical report, Stanford University, CA, 1993.
[58] G. H. Golub and H. A. van der Vorst. Closer to the solution: iterative linear solvers. Technical report, Stanford University, 1996.
[59] G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation methods, and second-order Richardson iterative methods. Numer. Math., 3:147-168, 1961.
[60] J. Grasman and B. J. Matkowsky. A variational approach to singularly perturbed boundary value problems for ordinary and partial differential equations with turning points. SIAM J. Appl. Math., 32(3):588-597, 1977.
[61] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM Frontiers in Applied Math, 1997.
[62] C. Greif. Cyclic reduction for three-dimensional elliptic equations with variable coefficients. SIAM J. Matrix Anal. Appl., accepted for publication, 1998.
[63] C. Greif and J. M. Varah. Block stationary methods for nonsymmetric cyclically reduced systems arising from three-dimensional elliptic equations. SIAM J. Matrix Anal., accepted for publication, 1998.
[64] C. Greif and J. M. Varah. Iterative solution of cyclically reduced systems arising from discretization of the three-dimensional convection-diffusion equation. SIAM J. Sci. Comp., to appear, 1998.
[65] P. M. Gresho and R. L. Lee. Don't suppress the wiggles - they're telling you something. Comput. Fluids, 9:223-253, 1981.
[66] M. Gunzburger. Finite Element Methods for Viscous Incompressible Flows. Academic Press, San Diego, 1989.
[67] L. A. Hageman and D. M. Young. Applied Iterative Methods. Academic Press, New York, 1981.
[68] D. Heller. Some aspects of the cyclic reduction algorithm for block tridiagonal linear systems. SIAM J. Numer. Anal., 13(4):484-496, 1976.
[69] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, 49:409-436, 1952.
[70] C. Hirsch. Numerical Computation of Internal and External Flows. John Wiley and Sons, New York, 1988.
[71] R. W. Hockney. A fast direct solution of Poisson's equation using Fourier analysis. J. Assoc. Comput. Mach., pages 95-113, 1965.
[72] C. Johnson. Numerical Solution of Partial Differential Equations by the Finite Element Method. Cambridge University Press, New York, 1987.
[73] O. A. Karakashian. On a Galerkin-Lagrange multiplier method for the stationary Navier-Stokes equations. SIAM J. Numer. Anal., 19:909-923, 1982.
[74] R. B. Kellogg and A. Tsan. Analysis of some difference approximations for a singular perturbation problem without turning points. Math. Comp., 32(144):1025-1039, 1978.
[75] D. Ludwig. Persistence of dynamical systems under random perturbations. SIAM Review, 17:605-640, 1975.
[76] M. M. Magolu and B. Polman. Theoretical comparison of pointwise, linewise and planewise preconditioners for solving three-dimensional problems. Technical Report No. 9609, University of Nijmegen, The Netherlands, 1993.
[77] B. J. Matkowsky and Z. Schuss. The exit problem for randomly perturbed dynamical systems. SIAM J. Appl. Math., 33(2):365-382, 1977.
[78] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148-162, 1977.
[79] G. Meurant. A review on the inverse of symmetric tridiagonal and block tridiagonal matrices. SIAM J. Matrix Anal. Applic., 13:707-728, 1992.
[80] A. R. Mitchell and D. F. Griffiths. The Finite Difference Method in Partial Differential Equations. John Wiley and Sons, Chichester - New York - Brisbane - Toronto, 1967.
[81] K. W. Morton. Numerical Solution of Convection-Diffusion Problems. Chapman and Hall, 1st edition, 1996.
[82] N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix iterations? SIAM J. Matrix Anal. Appl., 13(3):778-795, 1992.
[83] J. O'Neil and D. Szyld. A block ordering method for sparse matrices. SIAM J. Sci. Stat. Comp., 11(5):811-823, 1990.
[84] J. M. Ortega and R. G. Voigt. Solution of partial differential equations on vector and parallel computers. SIAM Review, 27:149-240, 1985.
[85] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617-629, 1975.
[86] M. Papadrakakis and G. Pantazopoulos. A survey of quasi-Newton methods with reduced storage. Int. J. Num. Methods in Eng., 36:1573-1596, 1993.
[87] C. Pommerell. Solution of Large Unsymmetric Systems of Linear Equations, volume 17. Hartung-Gorre Verlag, Konstanz, 1992.
[88] C. Pommerell, M. Annaratone, and W. Fichtner. A new set of mapping and coloring heuristics for distributed-memory parallel processors. SIAM J. Sci. Comput., 13(1):194-226, 1992.
[89] E. L. Poole and J. M. Ortega. Multicolor ICCG methods for vector computers. SIAM J. Numer. Anal., 24:1394-1418, 1987.
[90] R. Fletcher. Conjugate gradient methods for indefinite systems. In Lecture Notes in Math., volume 506. Springer-Verlag, Berlin - Heidelberg - New York, 1976.
[91] R. D. Richtmyer and K. W. Morton. Difference Methods for Initial-Value Problems. Interscience Publishers (John Wiley and Sons), 2nd edition, New York - London - Sydney, 1967.
[92] P. L. Roe and D. Sidilkover. Optimum positive linear schemes for advection in two and three dimensions. SIAM J. Numer. Anal., 29(6):1542-1568, 1992.
[93] Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Comp., 14:461-469, 1993.
[94] Y. Saad. ILUT: a dual threshold incomplete LU factorization. Numerical Linear Algebra with Applications, 1:387-402, 1994.
[95] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.
[96] Y. Saad and M. H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comp., 7:856-869, 1986.
[97] A. Segal. Aspects of numerical methods for elliptic singular perturbation problems. SIAM J. Sci. Statis. Comput., 3:327-349, 1982.
[98] D. Sidilkover. A genuinely multidimensional upwind scheme for the compressible Euler equations. World Sci. Publishing, Stony Brook, New York, 1994.
[99] D. Sidilkover and U. M. Ascher. A multigrid solver for the steady state Navier-Stokes equations using the pressure-Poisson formulation. Comp. Appl. Math., 14(1):21-35, 1995.
[100] P. Sonneveld. CGS: a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 10:35-52, 1989.
[101] J. C. Strikwerda. Finite Difference Schemes and Partial Differential Equations. Wadsworth and Brooks/Cole, 1989.
[102] X. Sun. Personal communication.
[103] X. Sun. Ph.D. thesis proposal. University of British Columbia, 1997.
[104] P. N. Swarztrauber and R. A. Sweet. Vector and parallel methods for the direct solution of Poisson's equation. J. Comp. Appl. Math., 27:241-263, 1989.
[105] R. A. Sweet. A generalized cyclic reduction algorithm. SIAM J. Num. Anal., 11:506-520, 1974.
[106] R. A. Sweet. A cyclic reduction algorithm for solving block tridiagonal systems of arbitrary dimension. SIAM J. Num. Anal., 14:706-720, 1977.
[107] M. C. Thompson, J. H. Ferziger, and G. H. Golub. Block SOR applied to the cyclically-reduced equations as an efficient solution technique for convection-diffusion equations. In Computational Techniques & Applications, edited by J. Noye and C. Fletcher, pages 637-646, 1988.
[108] W. F. Tinney and J. W. Walker. Direct solutions of sparse network equations by optimally ordered triangular factorization. Proc. IEEE, 55:1801-1809, 1967.
[109] D. J. Tritton. Physical Fluid Dynamics. Second edition, Clarendon Press, Oxford, 1988.
[110] H. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comp., 13:631-644, 1992.
[111] J. M. Varah. On the solution of block-tridiagonal systems arising from certain finite-difference equations. Math. Comp., 26:859-868, 1972.
[112] R. S. Varga. p-cyclic matrices: a generalization of the Young-Frankel successive overrelaxation scheme. Pacific J. Math., 9:617-628, 1959.
[113] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, 1962.
[114] P. S. Vassilevski. Algorithms for construction of preconditioners based on incomplete block-factorizations of the matrix. Int. J. Num. Meth. Eng., 27:609-622, 1989.
[115] P. Wesseling. An Introduction to Multigrid Methods. John Wiley and Sons, New York, 1992.
[116] D. M. Young. Iterative methods for solving partial difference equations of elliptic type. Trans. Amer. Math. Soc., 76:92-111, 1954.
[117] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.
Item Metadata
Title | Analysis of cyclic reduction for the numerical solution of three-dimensional convection-diffusion equations |
Creator | Greif, Chen |
Date Issued | 1998 |
Extent | 7773753 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-05-29 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0080032 |
URI | http://hdl.handle.net/2429/8505 |
Degree | Doctor of Philosophy - PhD |
Program | Mathematics |
Affiliation | Science, Faculty of; Mathematics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1998-05 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |