Parallel Direction Splitting for 3Dincompressible Navier-StokesEquationsbyArun RajendranB.Tech Honors, Indian Institute of Technology, Madras, 2017A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2019c© Arun Rajendran 2019The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis/dissertation entitled:Parallel Direction Splitting for 3D incompressible Navier-Stokes Equationssubmitted by Arun Rajendran in partial fulfillment of the requirements forthe degree of Master of Sciencein MathematicsExamining Committee:Dr. Anthony WachsSupervisor Dr. Ian FrigaardSupervisory Committee Member iiAbstractIn this work, an efficient direction splitting algorithm for solving incom-pressible Navier-Stokes equations is implemented. The main feature of thismethod involves using the operator A = (1−xx)(1−yy)(1−zz) for approxi-mately solving the pressure correction step instead of the Poisson operatorused in the standard projection methods. The complexity of the algorithm ismuch lower as compared to the standard projection methods and it is shownto have similar convergence properties as Poisson based pressure-correctiontechniques. The algorithm is validated on multiple test cases, both in twoand three dimensions, respectively. The method is suitable for parallel im-plementation and tests show excellent performance on a distributed memorycluster of up to 1,000 processors. Scalability results are reported on a three-dimensional lid-driven cavity flow for a range of processors on a large numberof grid points.iiiPrefaceThis thesis presents the implementation, validation, scalability and per-formance comparison results of the direction splitting solver proposed byGuermond and Minev [13, 18] carried out by the author, Arun Rajendran,under the supervision of Dr. Anthony Wachs. In this research work, all thecomputations were performed and validated by the author according to thestandard norm followed in the literature.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . 11.1.1 Centered Field . . . . . . . . . . . . . . . . . . . . . . 21.1.2 Staggered Field . . . . . . . . . . . . . . . . . . . . . 21.1.3 Boundary Conditions . . . . . . . . . . . . . . . . . . 31.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.1 Pressure correction schemes . . . . . . . . . . . . . . 41.2.2 Velocity correction schemes . . . . . . . . . . . . . . . 51.2.3 Consistent Splitting methods . . . . . . . . . . . . . . 61.2.4 Inexact factorization schemes . . . . . . . . . . . . . . 71.2.5 Other schemes . . . . . . . . . . . . . . . . . . . . . . 81.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Direction Splitting. . . . . . . . . . . . . . . . . . . . . . . . . . 102.1 Types of Splitting . . . . . . . . . . . . . . . . . . . . . . . . 102.1.1 Explicit method . . . . . . . . . . . . . . . . . . . . . 102.1.2 Implicit method . . . . . . . . . . . . . . . . . . . . . 112.1.3 Crank-Nicolson Method . . . . . . . . . . . . . . . . . 122.2 Our Direction Splitting Navier-Stokes Solver . . . . . . . . . 132.2.1 Pressure predictor . . . . . . . . . . . . . . . . . . . . 14vTable of Contents2.2.2 Velocity update . . . . . . . . . . . . . . . . . . . . . 142.2.3 Advection Term . . . . . . . . . . . . . . . . . . . . . 162.3 Penalty step . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.3.1 Pressure update . . . . . . . . . . . . . . . . . . . . . 193 Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . 213.1 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.1 Cell centered Field . . . . . . . . . . . . . . . . . . . 223.1.2 Vertex centered Field . . . . . . . . . . . . . . . . . . 223.2 Saddle point problem . . . . . . . . . . . . . . . . . . . . . . 223.2.1 Uzawa Algorithm . . . . . . . . . . . . . . . . . . . . 233.2.2 Schur Complement method . . . . . . . . . . . . . . . 243.2.3 Comparison . . . . . . . . . . . . . . . . . . . . . . . 243.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . 253.3.1 With Lagrange Multiplier . . . . . . . . . . . . . . . . 263.3.2 Without Lagrange Multiplier . . . . . . . . . . . . . . 263.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 273.5 Linear System Solvers . . . . . . . . . . . . . . . . . . . . . . 313.5.1 LU decomposition . . . . . . . . . . . . . . . . . . . . 323.5.2 Thomas algorithm . . . . . . . . . . . . . . . . . . . . 324 Numerical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.1 Space Convergence . . . . . . . . . . . . . . . . . . . . . . . . 354.1.1 2D Heat equation (bodyterm & Dirichlet BC) . . . . 354.1.2 3D Heat equation (bodyterm & Dirichlet BC) . . . . 374.1.3 2D Periodic Poiseuille Flow . . . . . . . . . . . . . . . 384.2 Time Convergence . . . . . . . . . . . . . . . . . . . . . . . . 404.2.1 2D Heat equation (Dirichlet & bodyterm) . . . . . . 404.2.2 3D Lid Driven Cavity Problem . . . . . . . . . . . . . 425 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465.1 Weak Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 465.1.1 2D Lid Cavity Problem . . . . . . . . . . . . . . . . . 465.1.2 3D Lid Cavity Problem . . . . . . . . . . . . . . . . . 485.2 Strong scalability . . . . . . . . . . . . . . . . . . . . . . . . 505.2.1 Small Load . . . . . . . . . . . . . . . . . . . . . . . . 505.2.2 Large Load . . . . . . . . . . . . . . . . . . . . . . . . 525.3 Performance Comparison . . . . . . . . . . . . . . . . . . . . 52viTable of Contents6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556.1 Summary of results . . . . . . . . . . . . . . . . . . . . . . . 566.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 57Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59viiList of Tables3.1 Execution time comparison between Uzawa algorithm andSchur compelement Technique . . . . . . . . . . . . . . . . . . 255.1 Weak Scalability on 2D Lid Cavity Problem: CPU time (insecond) per time step . . . . . . . . . . . . . . . . . . . . . . 475.2 Weak Scalability on 3D Lid Cavity Problem: CPU time (insecond) per time step . . . . . . . . . . . . . . . . . . . . . . 48viiiList of Figures1.1 Pressure unknowns in x direction after direction splitting . . 21.2 Velocity(x component) in x direction after direction splitting 22.1 2D Domain with pressure and velocity field . . . . . . . . . . 133.1 Non overlapping splitting for cell centered unknown . . . . . 223.2 Non overlapping splitting for vertex centered unknown . . . . 223.3 Domain decomposition for centered field . . . . . . . . . . . . 283.4 Domain decomposition for staggered field . . . . . . . . . . . 283.5 Communications and processors in charge of Schur complement 293.6 MPI communications for assembling Schur complement ma-trix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.7 MPI communications for solving linear system in 3.18 . . . . 304.1 Solution to 2D Heat equation problem with bodyterm andhomogeneous Dirichlet boundary conditions . . . . . . . . . . 364.2 Log-log plot of error vs mesh size(h) for 2D Heat equationproblem with bodyterm and homogeneous Dirichlet boundaryconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.3 Log-log plot of error vs mesh size(h) for 3D Heat equationproblem with bodyterm and homogeneous Dirichlet boundaryconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374.4 2D Domain for Periodic Poiseuille flow . . . . . . . . . . . . . 384.5 Pressure solution to 2D Periodic poiseuille problem . . . . . . 394.6 Velocity solution to 2D Periodic poiseuille problem . . . . . . 394.7 Log-log plot of error vs mesh size(h) for 2D Periodic Poiseuilleproblem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.8 Log-log plot of error vs time step(dt) for 2D Heat equationproblem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.9 Pressure solution to 3D Lid Driven Cavity problem . . . . . . 424.10 Velocity solution to 3D Lid Driven Cavity problem . . . . . . 43ixList of Figures4.11 Log-log plot of Div u vs time step(dt) for 3D Lid Cavityproblem with Reynolds numbers 0.1 and 100 . . . . . . . . . 434.12 Log-log plot of error vs time step(dt) for 3D Lid Cavity prob-lem with Reynolds number 0.1 . . . . . . . . . . . . . . . . . 444.13 Log-log plot of error vs time step(dt) for 3D Lid Cavity prob-lem with Reynolds number 100 . . . . . . . . . . . . . . . . . 455.1 Weak Scalability: Performance on Multiple Nodes for 2D LidCavity Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 475.2 Weak Scalability: Performance on Multiple Nodes for 3D LidCavity Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 495.3 Speedup in Pressure update step for Small Load per proc ona 3D Lid Cavity Problem. Ideal linear speedup shown as asolid green line. . . . . . . . . . . . . . . . . . . . . . . . . . 515.4 Speedup in Velocity update step for Small Load per proc ona 3D Lid Cavity Problem. Ideal linear speedup shown as asolid green line. . . . . . . . . . . . . . . . . . . . . . . . . . 515.5 Speedup in Pressure update step for Large Load per proc ona 3D Lid Cavity Problem. Ideal linear speedup shown as asolid green line. . . . . . . . . . . . . . . . . . . . . . . . . . 535.6 Speedup in Velocity update step for Large Load per proc ona 3D Lid Cavity Problem. Ideal linear speedup shown as asolid green line. . . . . . . . . . . . . . . . . . . . . . . . . . 535.7 Performance comparison with Standard Projection solver forLarge Load per proc on a 3D Lid Cavity Problem. Ideal lineshown as a solid green line. . . . . . . . . . . . . . . . . . . . 54xAcknowledgementsI would like to first thank my thesis advisor Dr. Anthony Wachs for hisguidance, support, and the engagement through the learning process of thismaster’s thesis. I also sincerely thank my colleagues Damien, Can, Arthur,Mohammad, Arman and Michael for sharing their wisdom and knowledgeduring the process.Finally, I express my profound gratitude to my parents, my sister andmy friends Sapna, Madhu, Ramprasanth for supporting me and encouragingme throughout the process of research and writing this thesis. Thank you.xiChapter 1Introduction1.1 Problem DescriptionThe objective of this work is to implement and validate a highly paral-lelizable direction splitting algorithm for solving the time-dependent in-compressible Navier-Stokes equations shown below for a dimensional boxΩ = [0, Lx]× [0, Ly]× [0, Lz]× ⊂ R3ρ∂tu+ ρu.(∇u)− µ4u+∇p = f, in Ω× [0, T ] (1.1)∇.u = 0, in Ω× [0, T ] (1.2)u|∂Ω = a, in ∂Ω× [0, T ] (1.3)u|t=0 = u0, in Ω (1.4)The quantity f is the source term, a is the boundary data and u0 is theinitial condition. The important idea behind the algorithm implemented inthis work is the replacement of the standard Poisson problem for pressurecorrection by a series of one dimensional pressure boundary value problemsusing direction splitting. This algorithm is already implemented by Guer-mond and Minev [13, 18] but the performance comparison with standardprojection solvers has not been documented yet. The objective of this workis to understand the capabilities of this solver and to report the performancegain with respect to the standard projection solver obtained by this algo-rithm. The convergence and stability properties of the direction splittingsolver are also validated using different test cases. In the below subsections,different types of fields and boundary conditions are explained followed bythe motivation for this work.11.1. Problem Description1.1.1 Centered FieldIn a centered field, the unknown variables are located at the centre of thefinite volume cell. In the Navier-Stokes equations, pressure is a centeredfield. To understand the centered field degree of freedom layout on a gridbetter, an example is shown in Figure 1.1 for pressure unknown in x directionafter direction splitting. In this figure, it can be clearly seen that the pressureunknowns are located at the centre of the cells (at xi− 12,xi+ 12, etc.) and thecontrol volume at xi+ 12goes from xi to xi+1 such that the pressure unknownis at the center of the control volume. Since pressure unknowns are locatedat the centre of the cells in a multidimensional grid, after direction splittingthey are located at the center of the control volume in all the directions asshown in Figure 1.1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1xi−1 xi xi+1xi− 12xi+ 12Figure 1.1: Pressure unknowns in x direction after direction splitting1.1.2 Staggered FieldIn a staggered field, the degrees of freedom are located at the faces of thefinite volume cell. In the Navier-Stokes equations, velocity is a staggeredfield. To understand the staggered field degree of freedom layout on a gridbetter, an example is shown in Figure 1.2.In this figure, it can be clearly seen that the velocity(x component)unknowns are located at the vertex of the cell (at xi−1,xi,xi+1 etc.) and thecontrol volume at xi goes from xi− 12to xi+ 12such that the velocity unknownis at the center of the control volume.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1xi−1 xi xi+1xi− 12xi+ 12Figure 1.2: Velocity(x component) in x direction after direction splitting21.1. Problem DescriptionSince velocity is a staggered field which is located at the centre of theface of each cell, the x component of velocity will be similar to a centeredfield such as pressure in the y and z directions and to a vertex field in thex direction. The same applies to the two other velocity components byrotation of indices.1.1.3 Boundary ConditionsTo solve the system of equations represented in 1.1 to 1.4, we need a setof boundary conditions for velocity and pressure. It is one of the requiredcomponents to make the problem well posed. In this section, the threedifferent types of boundary conditions that are generally encountered indifferent problems involving the Navier-Stokes equations are explained.Dirichlet conditionsIn this type of boundary condition, the value of the unknown variable (h = a)is imposed at the boundary of the problem domain where h is the unknownand a is the specified value. For example, in Figure 1.1 value of pressure isspecified at the points x = 0.0 and x = 1.0 and these are used to solve thesystem of equations for the remaining degrees of freedom, i.e., the unknowns.Neumann conditionsIn case of a Neumann boundary condition, the gradient of the variable (∂h∂n =a) is specified at the boundary of the problem domain where n is the normaldirection to the domain boundary (here x, y, or z as the domain is a cuboid),h is the unknown and a is the specified value. One of the most commonNeumann boundary conditions is the no flux condition, i.e. homogeneousNeumann condition which specifies that ∂h∂n = 0 at the domain boundary.Periodic conditionsIn a periodic boundary condition, the unknown at two opposite boundariesof the domain are set equal to each other. This type of boundary is usuallyspecified when the solution is of periodically repeating nature. For example,in Figure 1.2 value of velocity at x = 0.0 is set equal to velocity at x = 1.0for a periodic boundary condition. One of these 2 degrees of freedom isremoved from the list of unknowns, and this is used to solve the system ofequations for the remaining unknowns.31.2. Related Work1.2 Related WorkThe major problem encountered while developing an algorithm for solvingthe incompressible Navier-Stokes equations is due to the coupling of thevelocity field and the pressure field through the incompressibility constraint.Projection methods have been used since the 1960’s to solve this issuein time dependent viscous incompressible flows after being introduced bythe revolutionary work of Chorin and Temam [5, 41]. They used finitedifference method to solve the incompressible Navier-Stokes equations bydecomposing the velocity field into a divergence free part and a gradientwhich resulted in a pressure Poisson equation. At each time step, onlya sequence of decoupled elliptic equations need to be solved for velocityand pressure in this method which makes it highly efficient for large-scalesimulations.These methods can be viewed as fractional time step methods but sincepressure is not a dynamic variable, the usual techniques developed by Ya-nenko [48] or Glowinski [9] cannot be applied here directly. Hence, it isnot straightforward to develop higher-order projection methods, but a lotof developments have been made in the past few years that have identi-fied different theoretical and implementation issues ([15], [23], [14]). Manyresearchers (Glowinski [9] and Prohl [32]) have been working towards pro-ducing optimal projection schemes.Several works that followed were based on this decomposition of the ve-locity field and on solving the pressure Poisson equation. Most of thesealgorithms can be classified into three categories, pressure correction, veloc-ity correction and consistent splitting methods [19].1.2.1 Pressure correction schemesPressure correction schemes are methods where pressure is treated explicitlyor ignored in the first step and corrected in the second step by projectingvelocity on a divergence free space. The simplest pressure correction schemewas introduced by Chorin and Temam [5, 41] where they ignored thepressure in the first step and used an Euler time stepping method for velocity.Goda [11] developed a multi-step technique that used the explicit pres-sure gradient term in the first step with an Euler time stepping scheme.They used a triple sweep method for pressure resolution that converts thethree dimensional problem into three one dimensional problems and reducesthe computational time to great extent.Van Kan [44] improved this method by proposing an ADI (Alternating41.2. Related WorkDirection Implicit) scheme with pressure correction using Crank-Nicolsonmethod that is second order consistent in time and space. The author alsoobserved that his scheme is not unconditionally stable but indicated thatit might be due to stability of ADI scheme or non-linearity of convectiveterms.Timmermanns, Minev and Van De Dosse [42] proposed a ro-tational incremental pressure correction scheme that solves the problemscaused by artificial pressure Neumann boundary conditions. They use a highorder spectral element method for space discretization that allows them todevelop an efficient algorithm using diagonal mass matrix.Kim and Moin [26] also proposed an algorithm which is similar to therotational form of the pressure correction scheme. They solved the pres-sure Poisson equation using transform method and derived the appropriateboundary conditions for intermediate velocity field that leads to consistentsolution.Shen [37, 38] showed a rigorous error analysis of first order and secondorder projection methods by interpreting them as respective time discretiza-tions of the perturbed Navier-Stokes equations.Strikwerda and Lee [39] showed that the accuracy of the fractionalstep method introduced by Kim and Moin [26] is limited by the boundarycondition in the projection step. They also identified that there is a numer-ical boundary layer for the auxiliary pressure variable but not for velocityand pressure.Achdou and Guermond [1] treated the convection term using a Lagrange-Galerkin technique and used a projection method for advection and diffu-sion. Guermond and Minev [20] proposed a modified version of thismethod where the projected velocity is used as the advecting field in the ex-plicit advection. They showed that this method has similar computationalcost and produces more accurate results on a given grid.1.2.2 Velocity correction schemesVelocity correction schemes were first introduced in a different form byOrszag et al. [28] and Karniadakis et al. [25]. The important fea-ture of this class of schemes is that the roles of velocity and pressure ascompared to pressure correction schemes are interchanged here i.e. viscousterm is treated explicitly or ignored in the first step and velocity is correctedin the next step.Orszag et al. [28] identified that fractional step methods are susceptibleto create numerical boundary layers. They tried high-order extrapolation51.2. Related Workmethods to reduce the time differencing errors caused by these boundarylayers.Karniadakis et al. [25] further minimized the effects of erroneous nu-merical boundary layers by improving pressure boundary conditions. Theyemployed a class of stiffly stable schemes in mixed implicit/explicit timeintegration schemes and showed these schemes have better stability regionscompared to Adams-family schemes.The rotational form of the velocity correction scheme is introduced byGuermond and Shen [22, 23] which yields a better approximation ofpressure than the standard form.Guermond and Shen [22] studied convergence and stability of varia-tions of pressure correction and velocity correction methods. They gave thefirst rigorous proof of convergence and stability of these methods.Guermond and Shen [23] also examined velocity correction methodsin standard and rotational form. They showed that rotational form specif-ically improves error estimates for H1-norm for velocity and L2-norm forpressure. They also mentioned that their methods can be easily imple-mented using other variational approximation techniques such as the FiniteElement method.Rannacher [35] interpreted classical projection methods [5] as a variantof pressure stabilization method of TJR Hughes et al. [24] and showedsome approximation properties especially for pressure.1.2.3 Consistent Splitting methodsThis class of methods was first introduced by Guermond and Shen [14]and it can be reduced to the very famous gauge methods proposed byWeinan et al. [47] by change of variables in continuous space. This methodwas also studied by Wang and Liu [46] and Brown [4].Weinan et al. [47] explored a new formulation of the Navier-Stokesequations in terms of a gauge variable and an auxiliary field. They showedthat the gauge freedom allows one to work with simple boundary condi-tions eliminating the pressure boundary condition problem faced in standardmethods. They also showed that this method does not suffer from numericalboundary layers using normal mode analysis.Brown [4] presented a projection method that improves the first orderaccuracy of pressure in L∞-norm to fully second order and demonstrated thisusing normal mode analysis. The author also discussed a gauge formulationof the Navier-Stokes equations and explained the connection between thetwo methods.61.2. Related WorkWang and Liu [46] provided the theoretical analysis and computa-tional advantages of the gauge method introduced by Weinan et al [47].Specifically, they discussed an implicit gauge method that uses BackwardEuler/Crank-Nicolson for time discretization and showed that the compu-tational cost is reduced to the cost of solving standard heat and Poissonequations.1.2.4 Inexact factorization schemesThis class of methods has been studied by many people as they don’t involveany explicit artificial pressure boundary condition and are believed to givebetter convergence than pressure correction schemes.Gresho and Chan [12] explored the theory behind the semi implicitprojection methods for viscous incompressible flows and implemented it us-ing a Finite Element method. They also showed that this implementationleads to a nearly consistent mass matrix.Perot [30] presented a generalized block LU decomposition for solvingthe incompressible Navier-Stokes equations and showed that this methodachieves high temporal order of accuracy and resolves the pressure boundarycondition issue.Quarteroni and Salari [34] examined the approximation done by split-ting techniques using factorization. They introduced Yosida method thatuses splitting techniques and belongs to the class of quasi-compressibilityschemes. Quarteroni and Salari [33] further presented an analysis of theYosida method by considering a first-order time advancing unsplit method.They showed that the error caused by the Yosida scheme does not affect theoverall accuracy of the solution.Lee et al. [27] examined different types of second-order splitting meth-ods and identified the methods recommended for obtaining zero divergenceand those for obtaining highly accurate pressure values. They derived sec-ond order methods that allow zero pressure gradient boundary conditionusing a transformation that involves ”delta-form” pressure.Guermond [16] reviewed three different approximations of projectionmethods. In the first method, projection step is solved as a div-grad problemwith velocity satisfying paradoxical Dirichlet conditions while in the secondmethod, only the normal component satisfies a boundary condition. In thethird method, the projection step is solved as a Poisson equation with Neu-mann boundary conditions. It is shown that the first method is economicalfor Finite Element approximations, the second method is useful for spectralapproximations and the third method is the easiest to implement as it avoids71.2. Related Workthe problems of the mass matrix.Guermond and Quartepelle [21] analyzed fractional step projectionmethods by means of Finite Element approximations. They identified thatvelocity fields calculated in the viscous and incompressible half steps of theprojection method need to be represented in two spaces and proposed aFinite Element projection method with Poisson equation for incrementalpressure unknown.Guermond [17] studied a variant of the Chorin-Temam projection methodthat uses a three-level backward difference technique for time discretizationand Galerkin technique for space approximation. He showed that the split-ting error on pressure correction is second order in time and is independentof the approximation of the velocity time derivative.Auteri et al. [3] considered first order and second order non-incrementalprojection methods based on spectral Galerkin-Legendre spatial discretiza-tion and showed that convergence of these methods depends on the ability ofspectral framework to approximate the Stokes problem correctly. They alsoinvestigated the importance of the Ladyshenskaya-Babuka-Brezzi conditionin spectral projection methods.Guermond et al. [19] clearly showed that this class of methods enforcesa weak pressure boundary condition and does not provide better accuracythan PDE based methods (pressure correction schemes).1.2.5 Other schemesGirault and Raviart [8] presented a comprehensive overview of differ-ent Finite Element methods used for solving incompressible stationary flowproblems.Glowinski [10] reported different applications of Finite Element meth-ods for the Navier-Stokes equations to solve incompressible and compressibleviscous flow problems.Taylor and Hood [40] proposed a Finite Element discretization tech-nique to solve the Navier-Stokes equations and compared the different formu-lations using velocity-pressure against stream function-vorticity for differentflow problems.Shang et al. [36] developed a parallel two level linearization methodfor stationary incompressible Navier-Stokes equations that is based on FiniteElement discretization and domain decomposition. Their approach involvedsolving a non-linear problem by Newton iterations on a coarse grid andcorrecting it by solving a linearized Oseen problem in parallel on the finegrid.81.3. Motivation1.3 MotivationImmersed boundary type of micro-scale simulations of particle-laden flowspose the Navier-Stokes problem in a simple cuboid domain. In fact, com-plex geometries are considered as immersed boundaries. These simulationsfeature up to hundreds of millions of grid cells and require long computingtimes. It generally turns out that the solution to the incompressible Navier-Stokes equations represents a significant part of the total computing time(often beyond 80% in dilute systems) and can be seen as the bottleneck interms of computational speed. This demands a highly efficient Navier-Stokessolver for faster and effective computations.This work involves solving the system of three dimensional equations byconverting them into three 1D problems and solving them with Schur com-plement technique without any iterative procedure. This method is depictedto be highly scalable and expected to have similar convergence properties asstandard projection algorithms. Hence, this work can be directly used in themicro-scale simulations of particle-laden flows to reduce the total comput-ing time. Additionally, the solver is added to the list of solvers implementedin the platform maintained by the group such that it can be plugged withother techniques for the solution of several classes of problems. The solveris developed and validated against many test cases both in 2D and 3D. Theperformance of the solver is compared with the standard projection solver ona range of cores from one to 1,000 and the scalability of both solvers are com-pared. At a personal level, developing this new solver involved developingnew skills including understanding the different types of direction splittingtechniques and domain decomposition techniques, C++ programming, MPIcommunications and grouping, and conducting weak and strong scalabilitytests to estimate the performance of a solver. This part of my work is notstraightforward to report or straightforwardly realized by the reader of thisreport. As a simple illustration, my solver corresponds to about 5,000 linesof code and hundreds of hours of programming, checking, testing and fixingbugs.The thesis is organized as follows: Chapter 2 describes the directionsplitting algorithm for the Navier-Stokes solver. Chapter 3 explains thedomain decomposition methods used for parallelizing the solver on multipleprocessors. Chapter 4 reports the space convergence and time convergenceproperties of the solver for different test cases. Chapter 5 illustrates thescalability of the overall solver and its different components and comparesit with the standard projection solver. Some conclusions and directions forfuture work are discussed in Chapter 6.9Chapter 2Direction SplittingAlternating Direction Implicit (ADI) strategy was first proposed by Peace-man and Rachford [29] for 2D parabolic problems. It is a two-stage timeintegration scheme similar to the Crank-Nicolson method where at each timesub-step, the second order derivative with respect to one space variable issolved implicitly while the other is made explicit.This strategy is highly effective for parabolic problems resulting in schemesthat are unconditionally stable and have similar computational complexityas explicit schemes. Later, Douglas [6] proposed a Direction Splitting schemesimilar to ADI that can be applied to parabolic problems in any space di-mension and has comparable stability and convergence properties as ADI.2.1 Types of SplittingCombining various time integration schemes with Alternate Direction Im-plicit (ADI) schemes results in different types of Direction Splitting. Inthis section, explicit method, implicit method and Crank-Nicolson methodof Direction splitting is illustrated for a 3D Heat equation problem definedbelow:∂u∂t− ν4u = f (2.1)2.1.1 Explicit methodExplicit method involves combining explicit scheme for time stepping withADI scheme. This is first order accurate in time. In this, unknowns at timet = n + 1 are directly calculated using known quantities at time t = n. Itrequires less computations but is only stable for certain values of time stepsizes depending on the problem. The explicit time integration of the 3DHeat equation equation results in the following equation:un+1 − un4t − ν(∂xxun + ∂yyun + ∂zzun) = f(tn+12 ) (2.2)102.1. Types of SplittingIncorporating an ADI scheme, we solve for two intermediate unknownsηn+1, ζn+1 and use it to solve for un+1 as shown below.ηn+1 − un4t − ν∂xx(un) = f(tn+12 ) (2.3)ζn+1 − ηn+14t − ν∂yy(ηn) = 0 (2.4)un+1 − ζn+14t − ν∂zz(ζn) = 0 (2.5)2.1.2 Implicit methodImplicit method involves combining implicit scheme for time stepping withADI scheme. This is first order accurate in time. In this, unknowns attime t = n + 1 are calculated by solving system of coupled equations. Itrequires more computations than the explicit method but is unconditionallystable, meaning that it works for all values of time step. The implicit timeintegration of the 3D Heat equation equation is as follows:un+1 − un4t − ν(∂xxun+1 + ∂yyun+1 + ∂zzun+1) = f(tn+12 ) (2.6)Incorporating an ADI scheme, we introduce intermediate unknowns ηn+1,ζn+1and solve the system of equations in separate directions x,y and z for a giventime t = n+ 1 as follows:ηn+1 − un4t − ν∂xx(ηn+1) = f(tn+12 ) (2.7)ζn+1 − ηn+14t − ν∂yy(ζn+1) = 0 (2.8)un+1 − ζn+14t − ν∂zz(un+1) = 0 (2.9)112.1. Types of Splitting2.1.3 Crank-Nicolson MethodThis method involves combining the Crank-Nicolson scheme for time step-ping with an ADI scheme. It is second order accurate in time as it involvessolving for unknowns at fractional time steps. Like the implicit method, thismethod also requires more computations and is unconditionally stable. TheCrank-Nicolson time integration of 3D Heat equation is as follows:un+1 − un4t −ν24(un + un+1) = f(tn+ 12 ) (2.10)Incorporating an ADI scheme, we obtain the following set of equations:ξn+1 − un4t − ν4un = f(tn+12 ) (2.11)ηn+1 − ξn+14t −ν2∂xx(ηn+1 − un) = 0 (2.12)ζn+1 − ηn+14t −ν2∂yy(ζn+1 − un) = 0 (2.13)un+1 − ζn+14t −ν2∂zz(un+1 − un) = 0 (2.14)As shown in the previous section, the 3D Navier-Stokes equations aresolved using Standard Projection method as follows:∂tu− ν4ue +∇pe = f (2.15)4tApe +∇.ue = 0 (2.16)Using the Crank-Nicolson method of direction splitting described above,we convert the velocity update step such that it is solved as three 1D prob-lems and similarly, we approximate the operator A as A = (1 − ∂xx)(1 −∂yy)(1− ∂zz) to perform pressure splitting resulting in three 1D equationsfor pressure. In a standard projection algorithm, the operator A is the clas-sical multidimensional Laplacian operator −∆. In the following sections, weshow the different steps of our numerical scheme to approximate the solutionto the 3D Navier-Stokes equations.Figure 2.1 shows the 2D domain and how the different components ofvelocity and pressure fields are located in the domain in a MAC fashion.This figure also shows that after direction splitting, pressure field is always122.2. Our Direction Splitting Navier-Stokes Solverpi+ 12 ,j,kpi− 12 ,j,k pi+ 32 ,j,kui−1,j,k ui+1,j,kui,j,kui,j+1,kui,j−1,kvi,j−1,kvi,j+1,kvi,j,k vi+1,j,kvi−1,j,kFigure 2.1: 2D Domain with pressure and velocity fieldcell centered whereas the x component of velocity (u) is vertex centered inx direction and cell centered in other directions. The same applies to the ycomponent of velocity (v) by rotation of indices. Similarly, if the domain isvisualized in 3D it can be understood that z component of velocity (w) isvertex centered in z direction and cell centered in other directions.2.2 Our Direction Splitting Navier-Stokes SolverFinite Volume method is used to discretise and solve the above equations.Similar to Finite Difference method and Finite Element method, this methodconverts the partial differential equations to algebraic equations by calcu-lating values at discrete points in a given mesh. A Small control volumearound each node point in the mesh is denoted as ”Finite Volume” in thismethod.Initially, the equations are integrated over a given finite volume and vol-ume integral of divergence terms are converted to surface integrals usingGauss’s divergence theorem. These surface integrals are then evaluated us-ing fluxes at the surfaces of each finite volume. Since the fluxes entering andleaving a given finite volume cancel out, this method is conservative.132.2. Our Direction Splitting Navier-Stokes Solver2.2.1 Pressure predictorThe first step is the pressure predictor where we predict an intermediatepressure that is used for updating velocity. This is similar to any fractionaltime stepping technique used to solve the Navier-Stokes equations. We ini-tialize p−12 = 0 and φ−12 = 0 on the entire domain at time step t = 0. Theintermediate pressure p∗,n+12 at time t = n is calculated using the followingequation:p∗,n+12 = pn−12 + φn−12 (2.17)Integrating the equation over a finite volume centered at (i,j,k) we get,∫Vp∗,n+12dV =∫Vpn−12dV +∫Vφn−12dV (2.18)Expanding the integrals and solving, we getp∗,n+ 12i+ 12,j,k= pn− 12i+ 12,j,k+ φn− 12i+ 12,j,k(2.19)2.2.2 Velocity updateThe second step involves updating the velocity using the intermediate pres-sure. We do this using the second order direction splitting technique dis-cussed earlier. Velocity is initialized as u0 = 0 at time t = 0 and at anytime t = n + 1 is calculated, by solving for intermediate velocities in eachdirection as follows:ξn+1 − un4t − ν4un +∇p∗,n+ 12 +NL(un,un−1) = f(tn+ 12 ) (2.20)ηn+1 − ξn+14t −ν2∂xx(ηn+1 − un) = 0 (2.21)ζn+1 − ηn+14t −ν2∂yy(ζn+1 − un) = 0 (2.22)un+1 − ζn+14t −ν2∂zz(un+1 − un) = 0 (2.23)Integrating the above equations over a finite volume centered at (i,j,k) ,we get:142.2. Our Direction Splitting Navier-Stokes Solver∫Vξn+1 − un4t dV −∫Vν4undV +∫V∇p∗,n+ 12dV+∫VNL(un,un−1)dV =∫Vf(tn+12 )dV (2.24)∫Vηn+1 − ξn+14t dV −∫Vν2∂xx(ηn+1 − un)dV = 0 (2.25)∫Vζn+1 − ηn+14t dV −∫Vν2∂yy(ζn+1 − un)dV = 0 (2.26)∫Vun+1 − ζn+14t dV −∫Vν2∂zz(un+1 − un)dV = 0 (2.27)Converting the integrals in equation 2.24 using Gauss divergence theo-rem, we get∫Vξn+1 − un4t dV −∫Sν(∇un.nˆ)dS +∫S(p∗,n+12 .nˆ)dS+∫VNL(un,un−1)dV =∫Vf(tn+12 )dV (2.28)Solving the integrals in equation 2.28 we getξn+1i,j,k − uni,j,k4t dxdydz − ν((uni+1,j,k − uni,j,kxi+1 − xi −uni,j,k − uni−1,j,kxi − xi−1 )dydz+ (uni,j+1,k − uni,j,kyi+1 − yi −uni,j,k − uni,j−1,kyi − yi−1 )dxdz+ (uni,j,k+1 − uni,j,kzi+1 − zi −uni,j,k − uni,j,k−1zi − zi−1 )dxdy+ (p∗,n+ 12i+ 12,j,k− p∗,n+12i− 12,j,k)dydz iˆ+ (p∗,n+ 12i,j+ 12,k− p∗,n+12i,j− 12,k)dxdz jˆ+ (p∗,n+ 12i,j,k+ 12− p∗,n+12i,j,k− 12)dxdy kˆ +NL(un,un−1)dxdydz = f(tn+12 )dxdydz(2.29)Note that the scalars (pressure differences) indicated with iˆ , jˆ and kˆcontribute to the x-component, y-component and z-component of velocityrespectively. Integrating the equations 2.25, 2.26 and 2.27 in 1D, we get:152.2. Our Direction Splitting Navier-Stokes Solverηn+1i,j,k − ξn+1i,j,k4t dx−ν2((ηni+1,j,k − ηni,j,kxi+1 − xi −ηni,j,k − ηni−1,j,kxi − xi−1 )−(uni+1,j,k − uni,j,kxi+1 − xi −uni,j,k − uni−1,j,kxi − xi−1 )) = 0 (2.30)ζn+1i,j,k − ηn+1i,j,k4t dy −ν2((ζni,j+1,k − ζni,j,kyj+1 − yj −ζni,j,k − ζni,j−1,kyj − yj−1 )−(uni,j+1,k − uni,j,kyj+1 − yj −uni,j,k − uni,j−1,kyj − yj−1 )) = 0 (2.31)un+1i,j,k − ζn+1i,j,k4t dz −ν2((un+1i,j,k+1 − un+1i,j,kzk+1 − zk −un+1i,j,k − un+1i,j,k−1zk − zk−1 )−(uni,j,k+1 − uni,j,kzk+1 − zk −uni,j,k − uni,j,k−1zk − zk−1 )) = 0 (2.32)2.2.3 Advection TermIn 2.29, different methods can be used to evaluate explicitly the advectionterm NL(un,un−1) with respect to discretization in space and time. Thesemethods are discussed below.Space discretizationTwo methods are considered for discretizing the advection term in space,namely a first order in space Upwind scheme and a second order in spaceTotal Variation Diminishing scheme. These methods are briefly explainedbelow.Upwind Scheme: Upwind scheme is a type of space discretizationmethod that numerically incorporates the direction of information flow withina fluid. It discretizes the equations using differencing like other numericalmethods but this is biased in the flow direction identified by the sign of thecharacteristic speed.Total Variation Diminishing(TVD): The first order Upwind schemeproduces high level of false diffusion close to high gradients due to its loworder of accuracy. Conversely, high order methods result in oscillations in162.2. Our Direction Splitting Navier-Stokes Solversolutions near high gradient points. TVD schemes are a class of methodsdeveloped to obtain highly accurate and oscillation free solutions. Theseschemes ensure that the total variation of the discrete solution diminisheswith time. This leads to preserving monotonic behavior ensuring no oscil-lations or wiggles in the solution. Hence, total variation is checked at eachstep and the solution is total variation diminishing if the following equationholds true at all timesTV D(φn+1) ≤ TV D(φn) (2.33)The absence of oscillations in the solution is achieved through the use of aflux limiter. There exists many different types of flux limiter. In our code,we use a Superbee flux limiter.Time discretizationWe also consider two explicit discretization schemes in time with first andsecond order accuracy, respectively. These time discretization schemes arediscussed below.First Order scheme : First order scheme is one of the most commonmethods that only uses the values and derivatives from the previous discretetime to obtain the numerical solution at the current discrete time. Thistechnique is used to approximate the advection term at time t = n as follows:NL(un) = un.∇unIt is first order accurate, unconditionally bounded and numerically stableif the CFL condition is satisfied which is as follows:∣∣∣∣v4t4x∣∣∣∣ < 1 (2.34)The time step 4t is chosen such that this CFL condition is always satisfiedensuring the stability of the numerical solution obtained.Adams-Bashforth scheme : Adams-Bashforth scheme is one of thelinear multistep methods that use the linear combination of values andderivatives from previous discrete time to obtain the numerical solutionat the current discrete time. This technique is used to approximate theadvection term at time t = n using velocities un and un−1 as follows:NL(un,un−1) =32un.∇un − 12un−1.∇un−1172.3. Penalty stepThe Adams-Bashforth scheme also requires to satisfy a CFL conditionas it is an explicit scheme. However, it seems that the stability conditionof the Adams-Bashforth scheme is slightly more stringent than for the firstorder scheme. In fact, practice has shown that the proper (but potentiallyconservative) CFL condition is ∣∣∣∣v4t4x∣∣∣∣ < 0.35 (2.35)2.3 Penalty stepThe next step is solving the pressure Poisson equation for updating φ. Thisis usually solved as a 3D multidimensional Poisson problem in classical pro-jection solvers. Here, we use the direction splitting technique proposed byGuermond and Minev [13, 18] to convert this 3D multidimensional Poissonproblem into three 1D problems. This is achieved by replacing the standardLaplacian operator A = −4 by the operator A = (1−∂xx)(1−∂yy)(1−∂zz).ψn+12 − ∂xxψn+ 12 = − 14t∇.un+1 (2.36)ϕn+12 − ∂yyϕn+ 12 = ψn+ 12 (2.37)φn+12 − ∂zzφn+ 12 = ϕn+ 12 (2.38)Integrating the equations over a finite volume centered at (i,j,k) we get,∫Vψn+12dV −∫V∂xxψn+ 12dV =∫V− 14t∇.un+1dV (2.39)∫Vϕn+12dV −∫V∂yyϕn+ 12dV =∫Vψn+12dV (2.40)∫Vφn+12dV −∫V∂zzφn+ 12dV =∫Vϕn+12dV (2.41)Converting the integral in equation 2.39 using Gauss divergence theorem,we get∫Vψn+12dV −∫S(4ψn+ 12 .nˆ)dS =∫S− 14t(un+1.nˆ)dS (2.42)182.3. Penalty stepIntegrating and solving the equations 2.40,2.41 and 2.42 we get,ψn+ 12i,j,k dx− ((ψn+ 12i+1,j,k − ψn+ 12i,j,kxi+1 − xi )− (ψn+ 12i,j,k − ψn+ 12i−1,j,kxi − xi−1 )) =− dx4t((un+1i+1,j,k − un+1i,j,kxi+1 − xi ) + (un+1i,j+1,k − un+1i,j,kyj+1 − yj ) + (un+1i,j,k+1 − un+1i,j,kzk+1 − zk )) (2.43)ϕn+ 12i,j,k dy − ((ϕn+ 12i,j+1,k − ϕn+ 12i,j,kyj+1 − yj )− (ϕn+ 12i,j,k − ϕn+ 12i,j−1,kyj − yj−1 )) = ψn+ 12i,j,k dy (2.44)φn+ 12i,j,k dz − ((φn+ 12i,j,k+1 − φn+ 12i,j,kzk+1 − zk )− (φn+ 12i,j,k − φn+ 12i,j,k−1zk − zk−1 )) = ϕn+ 12i,j,k dz (2.45)2.3.1 Pressure updateThe final step is updating the pressure using the velocity and updated φfield. It is done as follows:pn+12 = pn−12 + φn+12 − χν∇.(12(un+1 + un)) (2.46)Integrating the equation over a finite volume centered at (i,j,k) we get,∫Vpn+12dV =∫Vpn−12dV+∫Vφn+12dV−∫Vχν∇.(12(un+1+un))dV (2.47)Reducing the equation using Gauss’s Divergence Theorem we get,∫Vpn+12dV =∫Vpn−12dV +∫Vφn+12dV − χν2∫S((un+1 +un)).nˆdS (2.48)Expanding the integrals and solving, we get192.3. Penalty steppn+ 12i+ 12,j,k= pn− 12i+ 12,j,k+ φn+ 12i+ 12,j,k− χν2((un+1i+1,j,k − un+1i,j,kxi+1 − xi )− (uni+1,j,k − uni,j,kxi+1 − xi )(un+1i,j+1,k − un+1i,j,kyj+1 − yj )− (uni,j+1,k − uni,j,kyj+1 − yj )(un+1i,j,k+1 − un+1i,j,kzk+1 − zk )− (uni,j,k+1 − uni,j,kzk+1 − zk )) (2.49)The variable χ controls the order of accuracy for pressure. Value ofχ = 1 involves the contribution from velocity and leads to better order ofaccuracy for pressure.20Chapter 3Domain DecompositionDomain decomposition methods solve a boundary value problem by divid-ing the domain into multiple subdomains. This results in solving smallerboundary value problems on each subdomain and repeatedly synchronizingthe solution between adjacent subdomains. The smaller problems on eachsubdomain are independent and this makes these methods appropriate forparallel computing. These methods are usually used as preconditioners foriterative methods such as the conjugate gradient method. Based on thenumber of points shared across subdomains, there are two different typesof domain decomposition methods, namely overlapping methods and non-overlapping methods.In overlapping methods, subdomains share more than an interface. Thesemethods include Schwarz alternating method and additive Schwarz method.Many other methods can be seen as special case of additive Schwarz method.In non-overlapping methods, subdomains share only the interface. So-lution continuity across subdomain interface is enforced in primal methodslike balancing domain decomposition by denoting the value of solution assame unknown on all adjacent subdomains whereas it is done by Lagrangemultipliers in dual methods such as FETI.Using direction splitting, the computationally expensive update steps forvelocity and pressure in our 3D Navier-Stokes problem are converted intothree one dimensional problems respectively. Furthermore, the domain ineach direction is split into multiple subdomains according to domain decom-position methods.3.1 TypesIn this work, only a non-overlapping method (which shares only an interface)is implemented for domain decomposition. It is explained in this sectionfor two different types of field that are present in Navier-Stokes equations,namely the cell centered field (pressure) and the vertex centered field (ve-locity component in their respective direction) with domain diagrams forbetter understanding.213.2. Saddle point problem3.1.1 Cell centered FieldFor cell centered fields, non-overlapping domain decomposition results in aninterface that does not contain any unknown. It is shown as below.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Figure 3.1: Non overlapping splitting for cell centered unknownAs depicted in the Fig 3.1, the unknowns are at the center of the cells forthis method. Assuming the number of subdomains is two in x, the splittinghappens at 0.5 for this method and there is no unknown at the interface.The two subdomains do not share any interface unknown. Centered fieldshence require a special treatment as described in section 3.4.3.1.2 Vertex centered FieldVertex centered fields are obtained for velocity components in their respec-tive direction after direction splitting. For a vertex centered field, non-overlapping domain decomposition results in an interface that contains anunknown. It is shown as below.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Figure 3.2: Non overlapping splitting for vertex centered unknownIn this method, as shown in the Fig 3.2, the unknowns are at the verticesof the cell. Assuming the number of subdomains is two in x, the splittinghappens at 0.5 for this method and there is an unknown at 0.5. The twosubdomains share an interface unknown.3.2 Saddle point problemSaddle point problems are very common in many fields such as incompress-ible flow problems, image processing, structural analysis, resistive networksand PDE constrained optimization.Let us consider a saddle point problem as follows:Ax+ Cλ = f (3.1)223.2. Saddle point problemCTx = g (3.2)These problems are usually solved mainly using two classes of solvers,namely direct methods that are based on factorization of matrix A andKrylov Subspace methods that are iterative in nature. In the followingsubsections, two methods are described and compared. The former methodis an iterative method known as Uzawa algorithm and the latter method isa direct method based on Schur complement.3.2.1 Uzawa AlgorithmUzawa algorithm is usually used to solve such saddle point problems in aniterative manner and it is named after Hirofumi Uzawa who introduced itfor concave programming [43].The first step of the algorithm involves computing the residual (in λcalculation) for the conjugate gradient algorithm,r2 = CTA−1f − g − Sλ = CTA−1(f − Cλ)− g = CTx− g, (3.3)where we use the x = A−1(f−Cλ) obtained from substituting the initialguess of λ in 3.12 equation.We assume the first search direction asp2 = r2 (3.4)and this completes the initialization.Each step of the process involves computing the following,a2 = Sp2 = CTA−1Cp2 = CT p1 (3.5)wherep1 = A−1Cp2 (3.6)Scaling factor is computed asα =pT2 r2pT2 a2(3.7)and this is used to update r2 and λ as follows:λ = λ+ αp2, r2 = r2 − αa2 (3.8)233.2. Saddle point problemThe variable x is updated using p1 as follows:x = x− αp1 (3.9)And the new search direction is defined as:β =rT2 a2pT2 a2, p2 = r2 − βp2 (3.10)Uzawa solution is assumed to have converged to the true solution if thenorm of residual r2 is smaller than a tolerance value (10−6 was used in thesettings).3.2.2 Schur Complement methodIn order to solve this equation, we multiply 3.1 by CTA−1 and subtract itfrom 3.2 equation which results in the following equation,Sλ = CTA−1f − g (3.11)where S = CTA−1C is the Schur complement. Solving 3.11 and substi-tuting λ in 3.1 we get,Ax = f − Cλ (3.12)This is the general solution for a saddle point problem as depicted in 3.1.However, computing the Schur complement S is very expensive in most ofthe cases and even sometimes practically impossible.3.2.3 ComparisonDirect methods are very popular in some areas but they tend to be com-putationally expensive as it is difficult to parallelize them. They are alsonot feasible for 3D problems. Iterative methods are appropriate for largeproblems but they tend to converge slowly and the number of iterationsfor convergence increases with problem size. Good preconditioners are alsoneeded for iterative methods.But for lots of different rhs values, in Schur complement method, weonly compute the Schur complement once at the start and then solve thetwo system of equations x and λ by Gaussian elimination whereas for theUzawa algorithm, the iterative procedure has to be performed for each rhsvalue until convergence.243.3. Problem FormulationTo understand the difference between their efficiency, consider a variantof 1D Poisson equation as follows:(−4)u = f (3.13)This equation is solved numerically using direction splitting and domaindecomposition on smooth analytical solution u = inx2 which gives f =( − 1) inx2 for i = 1, ...n. Both Uzawa algorithm and Schur complementmethod is used to solve the system. The chosen analytical solution is usedto test both the methods for several different rhs values and the observedexecution runtimes is tabulated in Table 3.1.Numberof rhsUzawaAlgorithmSchurComplement10 0.0076 0.0029100 0.0158 0.00761000 0.0470 0.029410000 0.3866 0.2619100000 3.1436 2.1676100000 41.3113 28.2538Table 3.1: Execution time comparison between Uzawa algorithm and Schurcompelement TechniqueFrom the table, it can be clearly seen that the Schur complement methodis faster than the Uzawa algorithm for any number of rhs values. It is alsonaturally more accurate than the Uzawa algorithm as it is a direct method.This makes the Schur complement method more suitable for these problems.3.3 Problem FormulationDomain decomposition problems can be formulated in two different ways,with and without using Lagrange multiplier. In order to understand thesetwo methods, let us consider a 1D Heat equation problem of the form givenbelow:∂u∂t− ν4u = f (3.14)on a given domain. This problem is converted to a N dimensional linearsystem Ax = b after discretization. For domain decomposition, the domain253.3. Problem FormulationD is split into multiple subdomains, say two subdomains D1 and D2 thatshare an interface. The flux contribution from interface is handled differentlyin the two methods and in the following sections, these are explained indetail.3.3.1 With Lagrange MultiplierIf the vectors of unknowns are x1 and x2 in D1 and D2 respectively, thenthe linear system corresponding to the discretized version of equation (3.14)can be written as follows:A11 0 A1T0 A22 A2TAT1 AT2 0x1x2λ =F1F2G (3.15)where F1,F2 are corresponding rhs values for unknowns x1,x2 and G isthe rhs value corresponding to the overlapping interface (and is generally 0).In the above equation, the Lagrange multiplier λ enforces that x1 and x2match at the interface and provides the flux contribution for the variablesat the interface on each sub-domain. 3.15 represents a linear system ofequations and by assuming all subdomain unknowns as x we can rewrite3.15 as:[Aii AiTAT i 0] [xλ]=[FG](3.16)3.16 represents a saddle point problem and it can be solved using differentmethods as discussed in the previous section. Both the Schur complementtechnique and the Uzawa algorithm were tested and as mentioned earlier,the Schur complement technique surpasses the Uzawa algorithm for multiplerhs values.3.3.2 Without Lagrange MultiplierIn this method, the interface variables are explicitly written as interfaceunknowns xe along with the unknowns in the subdomains D1 and D2. Thediscretized version of equation 3.14 can then be written as follows:A11 0 A1e0 A22 A2eAe1 Ae2 Aeex1x2xe =f1f2fe (3.17)263.4. Implementationwhere f1 and f2 are rhs values corresponding to the unknowns x1, x2 inthe subdomains D1 and D2 and fe corresponds to the interface unknownsxe. This saddle point problem can be rewritten by grouping all subdomainunknowns together into xi as follows:[Aii AieAei Aee] [xixe]=[fife](3.18)3.18 can be solved using different techniques as described in the previoussection. Please also note that in the form (3.18) the problem can be simplysolved by a linear system solver as it corresponds to a re-arrangement (re-numbering) of unknowns. However, this should be understood in the sense ofparallel computing where each sub-matrix Aii belongs to a separate processand the matrices Aei, Aie and Aee are needed to ”glue” the pieces of thesolution on the various processes (in practice processors) together. Sincedirection splitting allows to solve a sequence of 1D problems, the matricesAii in the linear systems of the form (3.18) are simple tridiagonal matricesand the matrix Aee is a simple diagonal matrix. Hence the Schur complementis not over expensive to construct. Actually, in section 3.4, we present thegeneral algorithm with the construction of the Schur complement for 1Dproblems.Also, the rhs values change multiple times while all the matrices remainconstant. For example if the system is solved in x, then the matrices Aii,Aie, Aei, and Aee remain constant but different rhs values are obtainedfor all combinations of j and k values in y and z directions on the grid,respectively. This indicates that the Schur complement method might beprefarable compared to the Uzawa algorithm as it performs much better inthis case as shown earlier.3.4 ImplementationFinally, the problem is formulated without Lagrange multiplier and is solvedusing a Schur complement method due to the advantages indicated in theprevious section. The domain decomposition for different fields is done asexplained below.For centered fields, the last unknown in each subdomain is assumed asthe interface unknown and the last subdomain doesn’t have any interfaceunknown. For example, in Fig 3.3, we show a 1D domain split into threesubdomains. The subdomain unknowns are labeled in blue and the interface273.4. Implementationunknowns are labeled in red. As explained above, the interface unknownsare the last unknowns in first two subdomains whereas last subdomain hasno interface unknown. This is done to address the problem shown in Figure3.1 where the subdomains don’t share an interface unknown.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Figure 3.3: Domain decomposition for centered fieldThe only exception to this is when the domain has a periodic boundarycondition. In this case, the unknown at−0.05 shown in brown color in Figure3.3 should be equal to the unknown at 0.95 and hence this is assumed as aninterface unknown to impose this condition. In this case, all the subdomainshave an interface unknown.For staggered fields, the velocity component unknown might be on thevertex in one direction and on the center of the cell in the other two di-rections. For example, the x component of velocity is on the vertex in xdirection and on the center of the cell in y and z directions. The unknownsat the center of the cell are treated as shown above for centered fields. Theunknowns at the vertex of the cell are treated similarly with the overlappinginterface. For example, in Fig 3.4, we show a 1D domain split into threesubdomains for x component of velocity in x direction. The subdomainunknowns are labeled in blue and the interface unknowns are labeled in red.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Figure 3.4: Domain decomposition for staggered fieldAs explained above for centered fields, when the boundary condition isperiodic, the unknown at 0 shown in Figure 3.4 in brown color should beequal to the unknown at 1.0 and this condition is imposed by assuming theunknown at 1.0 as an interface unknown and this ensures that for periodicboundary condition, all the subdomains have an interface unknown.This algorithm was implemented for both Heat equation problems aswell as the Navier-Stokes problem on a 3D Cartesian structured grid usingMPI (message passing interface) techniques.Direction splitting followed by discretization results in 1D linear tridi-agonal systems that are used as explained above by formulating the prob-283.4. Implementationlem without Lagrange multiplier and solving it using a Schur complementmethod. This procedure is done as follows:In each direction, domain decomposition results in subdomain unknownsxi and interface unknowns xe. This results in a large number of partitionedlinear systems in each subdomain of the following form that has to be solvedat each time step in x,y and z respectively,[Aii AieAei Aee] [xixe]=[fife](3.19)where fi and fe are appropriate rhs values for subdomain unknowns andinterface unknowns.In each direction, the first processor in that direction is in charge ofassembling and storing the Schur complement. This processor is referred toas master processor in that direction. Master processors in each directionare grouped using MPI grouping which is used to assign set of processorsinto a group and ensure that communication occurs only within the group.The master processors in each direction and the communication involvedbetween them is shown in Figure 3.5.Figure 3.5: Communications and processors in charge of Schur complementIn order to solve equation 3.19 using a Schur complement method, weneed to compute the Schur complement matrix S = Aee −AeiA−1ii Aie. Thismatrix is computed once at the pre-processing stage (at the initial timet = 0) and stored in the master processor in that direction.Denoting the number of interfaces in the x direction as ne and the num-ber of unknowns in ith subdomain is ni. The matrix Aii is a ni×ni matrix,Aie is a ni × ne matrix and Aei is a ne × ni matrix. The Schur complementmatrix S is a ne × ne matrix.Since we don’t want to compute A−1ii that is computationally expensive,we solve the system of equations AiiX = Aie for computing the term A−1ii Aie293.4. Implementationin all the processors and this is done by taking each column of Aie as a rhsvector b and solving the system Aiiz = b to obtain the columns of the matrixX.It has to be noted here that this is very efficient as the maximum numberof non-zero columns in Aie is only two as the maximum number of interfacesassociated with a subdomain is two. Once this matrix X is computed, thematrix AeiX is communicated to the master processor along with the fluxterm corresponding to the interface unknown ae.AiiX = AieOther processorsS = Aee − AeiA−1ii AieMaster processorAeiXaeFigure 3.6: MPI communications for assembling Schur complement matrixThe term AeiX from all the processors is used to construct AeiA−1ii Aieand the flux values ae are assembled to obtain the diagonal matrix Aee.These matrices are then subtracted to obtain the complete Schur comple-ment matrix S in the master processor.The total amount of communication per processor in each direction con-sists of sending ne × ne values along with the flux value in the subdomainto the master processor in that direction. For notation purposes, the lastunknown in each subdomain is considered as the interface unknown. Thisinformation flow is also shown in the Figure 3.6.Now, we solve the equation Sxe = fe − AeiA−1ii fi and use the values ofinterface unknowns xe to solve Aiixi = fi −Aiexe for subdomain unknownsxi. More specifically, this is done by first solving the local tridiagonal sys-tems Aiiyi = fi and then communicating the vector Aeiyi along with the rhsvalue pe of the interface equations to the master processor where the Schurcomplement is stored.Aiiyi = fiAiixi = fi − AiexeOther processorsSxe = fe − AeiA−1ii fiAiixi = fi − AiexeMaster processorAeiyi + pexeFigure 3.7: MPI communications for solving linear system in 3.18303.5. Linear System SolversIn the master processor, the rhs terms pe from all the processors areused to construct the fe vector and Aeiyi values are used to assemble thevector AeiA−1ii fi. These two vectors are subtracted to obtain the rhs vectorfe − AeiA−1ii fi and the interface unknowns are obtained by solving Sxe =fe−AeiA−1ii fi. After this, the vector Aiexe is communicated back to all theother processors where their respective unknowns are obtained by solvingAiixi = fi −Aiexe.Note that solving the linear system Aiiyi = fi does not involve any com-munication. Thus, the total amount of communication per processor in eachdirection consists of sending a vector of size equal to the number of interfaceunknowns (each internal domain has two interfaces in the given direction)along with the rhs value for the interface unknown in that subdomain andreceiving the vector of size equal to the number of interface unknowns. Thisoverall communication process is visually shown in Figure 3.7In order to minimize the total number of MPI communications in eachdirection, all the data that need to be sent in a particular direction arepacked and this ensured that there are only two MPI communications for agiven direction at any time step to solve the linear system.3.5 Linear System SolversThere are several algorithms for solving system of linear equations likeCramer’s rule, row reduction, elimination of variables and so on. Whilethese algorithms are generally used for solving small systems, large systemsare usually formulated in matrix form such as Ax = b and standard algo-rithms for solving these sets of linear equations can be categorized into twotypes, namely direct and iterative methods.Direct methods use Gaussian elimination or its variant to solve the sys-tem Ax = b by converting the matrix A into a lower triangular matrix andperforming back substitution. For special cases where matrices have uniquestructure, algorithms make use of this to achieve faster or accurate solutions.For example, if A is symmetric positive definite, the system can be solvedtwice as fast with Cholesky decomposition. Special methods exist for sparsematrices which have many zeros as entries.Iterative methods are a class of methods that start with an initial guesssolution (not necessary to be accurate) and repeatedly update this solutionsuch that it gets closer to the true solution. Examples of iterative methodsare Jacobi method, Gauss Seidel method, Successive over relaxation methodand so on. In this section, we compare the performance of two direct meth-313.5. Linear System Solversods, namely LU decomposition and Thomas algorithm.3.5.1 LU decompositionLU decomposition method consists of usually decomposing the left hand sidematrix A into a lower triangular matrix L and an upper triangular matrixU . Then the linear system Ax = b is solved as follows:LUx = b (3.20)Ly = b (3.21)Ux = y (3.22)Thus, once the decomposition into matrices L and U is done for a givenmatrix at time t=0, this method is very fast for different vectors b as there isno need to recompute the matrices L and U for each vector b at all time stepsand only a sequence of backward/forward substitutions is required. Thismethod can be seen as a matrix form of the traditional Gaussian elimination.3.5.2 Thomas algorithmThomas algorithm is a special case of Gaussian elimination for tridiagonalmatrices. It is one of those algorithms that make use of the special structureof matrix A (tridiagonal). The left hand side matrix is converted into anupper triangular matrix and similar operations are performed on the vectorb. This system is then solved using backward substitution.For this work, a vectorized version of this method is implemented whereonly the main diagonal, super diagonal, sub diagonal and modified diagonalvectors are calculated at time t=0 for each linear system. Then for differ-ent vectors b at all time steps, only the vector manipulation and backwardsubstitution are carried out to obtain the solution. This method can beapplied only to tridiagonal matrices whereas LU decomposition works forall matrices.To understand the different steps of the Thomas algorithm, let us con-sider the following tridiagonal systemb1 c1 0a2 b2 c2a3 b3. . .. . .. . . cn−10 an bnx1x2x3...xn =d1d2d3...dn (3.23)323.5. Linear System SolversFirst, the forward elimination process involves converting the matrix Ainto an upper triangular matrix, i.e. bi = 1 and modify the upper diagonalci. This is done through the Gaussian elimination as shown below.c˜1 =c1b1(3.24)c˜i =cibi − ai ˜ci−1 ; i = 2, 3, ..., n− 1 (3.25)Similarly, rhs is modified through the same steps resulting in the follow-ing equations.d˜1 =d1b1(3.26)d˜i =di − ai ˜di−1bi − ai ˜ci−1 ; i = 2, 3, ..., n− 1 (3.27)Now, the system of equations is solved using backward substitution asthe matrix A is converted to an upper triangular matrix. This results insolving equations 3.28-3.29 to obtain the solution values xi for the systemshown in 3.23.xn = d˜n (3.28)xi = d˜i − c˜ixi+1; i = n− 1, n− 2...., 1 (3.29)VariantFor periodic boundary conditions, the left hand side matrix of equation 3.23is slightly different and this equation is perturbed to the following equationb1 c1 a1a2 b2 c2a3 b3. . .. . .. . . cn−1cn an bnx1x2x3...xn =d1d2d3...dn (3.30)This cannot be solved using the traditional Thomas algorithm describedin the above section. This problem occurs only when the linear system is333.5. Linear System Solverssolved in one processor as in multiple processors the terms a1 and cn aretaken care by Aie of the respective subdomains.In order to solve the equation 3.30, we use Sherman-Morrison formulato avoid additional Gaussian elimination steps and still use the traditionalThomas algorithm. We solve a problem of form(A′+ uvT )x = d (3.31)where vectors u and v are as followsuT =[−b1 0 0 . . . 0 cn] (3.32)vT =[1 0 0 . . . 0 −a1b1](3.33)A′is a slightly different tridiagonal system than above, and the solutionis obtained by solvingA′q = u (3.34)A′y = d (3.35)using the standard Thomas algorithm and computing the solution x asx = y − vT y1 + (vT q)q (3.36)For different time steps, the equation 3.34 is solved at time t = 0 andthe equation 3.35 is solved for each rhs value and equation 3.36 is used toobtain the final solution1.1The Thomas algorithm depicted in this section is referenced from https://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_-_TDMA_(Thomas_algorithm)34Chapter 4Numerical testsIn this section we validate the method by obtaining space convergence andtime convergence using different test cases for two and three space dimen-sions. We show the results for both Heat equation problems and Navier-Stokes problems. All the results obtained in this section are for χ = 1.4.1 Space Convergence4.1.1 2D Heat equation (bodyterm & Dirichlet BC)The 2D Heat equation problem is solved using second order direction split-ting with Crank-Nicolson method for time stepping. The algorithm is testedwith the following steady state solutionu = sin(pix)sin(piy) (4.1)At steady state, ∂u∂t is assumed to be zero and this results in the Poissonequation (with f referred to as bodyterm here) as followsf = −4u (4.2)⇒ f = −∂xx(sin(pix)sin(piy))− ∂yy(sin(pix)sin(piy)) (4.3)⇒ f = pi2(sin(pix)sin(piy)) + pi2(sin(pix)sin(piy)) (4.4)⇒ f = 2pi2sin(pix)sin(piy) (4.5)The problem is solved in Ω = (0, 1) × (0, 1), for 0 ≤ t ≤ T := 1 with atime step 4t = 0.01 and Dirichlet boundary conditions on all the walls. Theboundary value was given by point-wise solution at that point which for theabove solution, results in homogeneous Dirichlet boundary conditions. Theinitial condition at time t = 0 is u = 0. The accuracy of the algorithm ismeasured by computing the errors in the numerical solution for six uniform354.1. Space ConvergenceFigure 4.1: Solution to 2D Heat equation problem with bodyterm and ho-mogeneous Dirichlet boundary conditionsmeshes (16 × 16, 32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512). Allthe computations were run on 4 processors, 2 processors in each direction.The numerical solution on one of the meshes (256× 256) is shown in Figure4.1. From the figure, it can be clearly seen that the solution has maximumvalue at the center, zero values at the walls and is symmetric in nature.This shows qualitatively that our numerical solution is approximating theanalytical solution in Equation 4.1.Figure 4.2: Log-log plot of error vs mesh size(h) for 2D Heat equationproblem with bodyterm and homogeneous Dirichlet boundary conditions364.1. Space ConvergenceIn order to compute the convergence in space, we plot in log-log scalein Figure 4.2 the L2 norm of the difference between the computed solutionand the analytical solution. The slope of this plot is used to understand thespatial order of accuracy. As it can be seen from the Figure 4.2, the slopeof this plot is 2.00167 and this indicates that convergence rate in space issecond order.4.1.2 3D Heat equation (bodyterm & Dirichlet BC)The 3D Heat equation problem described in 2.1 is solved using second or-der direction splitting with Crank-Nicolson method for time stepping. Thealgorithm is tested with the following steady state solutionu = sin(pix)sin(piy)sin(piz) (4.6)which results in the right hand side function f (referred to as bodytermhere) as followsf = −4u (4.7)f = 2pi2sin(pix)sin(piy)sin(piz) (4.8)Figure 4.3: Log-log plot of error vs mesh size(h) for 3D Heat equationproblem with bodyterm and homogeneous Dirichlet boundary conditionsThe problem is solved in Ω = (0, 1)×(0, 1)×(0, 1), for 0 ≤ t ≤ T := 1 witha time step 4t = 0.01 and Dirichlet boundary conditions on all the walls.374.1. Space ConvergenceThe boundary value was given by pointwise solution at that point which forthe above solution, results in homogenous Dirichlet boundary conditions.The initial condition at time t = 0 is u = 0. The accuracy of the algorithmis measured by computing the errors in the numerical solution for six uniformmeshes (8× 8× 8,16×16× 16, 32× 32× 32, 64× 64× 64, 128× 128× 128).All the computations were run on 1 processor. Again we plot in log-log scalein Figure 4.3 the L2 norm of the difference between the computed solutionand the analytical solution. The slope of this plot is used to understandthe spatial order of accuracy. As can be seen from the Figure 4.3, the slopeof this plot is 2.00359 and this indicates that convergence rate in space issecond order.4.1.3 2D Periodic Poiseuille FlowThe 2D periodic poiseuille flow problem is solved using the Navier-Stokessolver with direction splitting and domain decomposition. This problem ischosen as its analytical solution is known and velocity profile is given asfollowsvx(y) =4P2µL(y2 − h2) (4.9)where 4P is the pressure drop in x, L is the length of the plate and 2his the vertical distance between the two plates. The details of the domainused for obtaining numerical solution is described below.LhhFigure 4.4: 2D Domain for Periodic Poiseuille flowThe problem is solved in Ω = (0, 4) × (0, 1), for 0 ≤ t ≤ T := 100 witha time step 4t = 0.05 for Reynolds number equal to 100. The domain hashomogeneous Dirichlet boundary conditions on top wall and bottom wallfor velocity and a fixed pressure drop imposed in x direction. The initialcondition on the velocity at time t = 0 is u = v = 0. The domain is asshown in the Figure 4.4.384.1. Space ConvergenceFigure 4.5: Pressure solution to 2D Periodic poiseuille problemThe numerical solution obtained for pressure is shown in Figure 4.5 andfor velocity is shown in Figure 4.6. From Figure 4.5, it can be clearly seenthat the numerical solution is linear in nature and perfectly matches theanalytical solution. Similarly, the numerical solution for velocity is parabolicin nature with maximum at the centre and zero at the boundary as can beseen in Figure 4.6.Figure 4.6: Velocity solution to 2D Periodic poiseuille problemThe accuracy of the algorithm is measured by computing the errors inthe numerical solution for six uniform meshes (8 × 8 , 16 × 16 , 32 × 32 ,64× 64 , 128× 128 , 256× 256 , 512× 512 ). All the computations were runon 4 processors, 2 processors in each direction. We plot in log-log scale in394.2. Time ConvergenceFigure 4.7 the L2 norm of the difference between the computed solution andthe analytical solution. It can be clearly seen from the plot that the slopeis 2.00006 and this shows that the numerical solution converges as secondorder in space.Figure 4.7: Log-log plot of error vs mesh size(h) for 2D Periodic Poiseuilleproblem4.2 Time Convergence4.2.1 2D Heat equation (Dirichlet & bodyterm)The 2D Heat equation problem is solved using second order direction split-ting with Crank-Nicolson method for time stepping. The algorithm is testedwith the body term defined in subsection 4.1.1.The problem is solved in Ω = (0, 1)×(0, 1) on a mesh size of 100×100 withhomogeneous Dirichlet boundary conditions on all the walls. The problemis solved for 0 ≤ t ≤ T where final time T is chosen as Tsteady2 as the errorin time goes to zero for T = Tsteady. The value of T = Tsteady is estimatedas 0.4 as the error in solution is of the order 10−5 at this time. So, the finaltime is taken as 0.2 to obtain the solution at a transient state.404.2. Time ConvergenceFor testing time convergence, we obtain a reference solution using a verysmall time step 4t = 10−6. This solution is used instead of the analyticalsolution to obtain the errors on the numerical solution. Numerical solution iscalculated for range of time steps (10−2 to 10−5 ) and the error is calculatedwith respect to the reference solution.Figure 4.8: Log-log plot of error vs time step(dt) for 2D Heat equationproblemLog-log plot of L2 norm of error as a function of time step is shown inFigure 4.8. The slope of this plot is used to understand the time order ofaccuracy. As can be seen from the Figure 4.8, the slope of this plot is closeto 1 and this indicates that convergence rate in time is first order.Note that the algorithm implemented is expected to be second order intime. Since the results obtained are first order in time, more analysis needsto be done in order to understand the possible reason for this convergence.There is a possibility that this might be a bug in the code and if this turnsout to be the case, it will be fixed in the next 2 weeks. The work is reportedwith the available results at the time being.414.2. Time Convergence4.2.2 3D Lid Driven Cavity ProblemThe 3D lid driven cavity problem is solved using second order directionsplitting with Crank-Nicolson method for time stepping. The problem issolved in Ω = (0, 1) × (0, 1) × (0, 1) on a mesh size of 20 × 20 × 20 withDirichlet boundary conditions on all the walls. The boundary value is set to1.0 on the top wall and 0.0 on all the other walls. The problem is solved for0 ≤ t ≤ T where final time T is chosen as Tsteady2 as the error in time goes tozero for T = Tsteady. The value of T = Tsteady is estimated as 2×10−3 as thedivergence of velocity is of the order 10−10 at this time. So, the final timeis taken as 10−3 to obtain the solution at a transient state. The problemis solved for two different values of Reynolds numbers, 0.1 and 100 and theconvergence results are presented for both the values.Figure 4.9: Pressure solution to 3D Lid Driven Cavity problemThe numerical solution obtained for pressure is shown in Figure 4.9 andfor velocity is shown in Figure 4.10. From Figure 4.9, it can be clearly seenthat the numerical solution achieves maximum and minimum pressure atthe corners of the top wall. This shows that it qualitatively matches theanalytical solution. Similarly, the numerical solution for y component ofvelocity attains maximum value and minimum value at the corners of thetop wall and the x component of velocity reaches maximum value at thecentre and minimum value at the boundaries as it can be seen in Figure4.10.In order to understand if the numerical solution satisfies the incompress-ibility constraint, we plot divergence of velocity as a function of time step(dt). This is done for the transient solution calculated untilTsteady2 time and424.2. Time ConvergenceFigure 4.10: Velocity solution to 3D Lid Driven Cavity problemFigure 4.11: Log-log plot of Div u vs time step(dt) for 3D Lid Cavity prob-lem with Reynolds numbers 0.1 and 100this is done for a range of time steps (10−4, 5×10−5, 2.5×10−5, 1.25×10−5,10−5, 5 × 10−6, 2.5 × 10−6, 1.25 × 10−6, 10−6). Figure 4.11 shows the L2norm of Divergence of velocity as a function of time step and it can beclearly seen that the slope of the best fit line is between 1 and 1.5 for boththe values of Reynolds numbers. It can also be seen that the slope of the line434.2. Time Convergencecorresponding to a Reynolds number of 0.1 (exact value is 1.47) is almostparallel to the line with slope of 1.5. This indicates that the divergence ofvelocity is converging to zero with an order of 1.5 in time for low Reynoldsnumbers and an order between 1 and 1.5 for high Reynolds numbers.For testing time convergence, we obtain a reference solution using a verysmall time step 4t = 10−7. This solution is used instead of the analyticalsolution to obtain the errors on the numerical solution. Numerical solutionis calculated for a range of time steps as mentioned above and the error iscalculated as follows:ei = ‖uref‖2 − ‖ui‖2 (4.10)where ‖uref‖2 is the L2 norm of reference solution, ‖ui‖2 is the L2 normof solution obtained using time step i and ei is the error in the numericalsolution obtained using time step i.Figure 4.12: Log-log plot of error vs time step(dt) for 3D Lid Cavity problemwith Reynolds number 0.1Figures 4.12 and 4.13 show the Log-log plot of error as a function of444.2. Time ConvergenceFigure 4.13: Log-log plot of error vs time step(dt) for 3D Lid Cavity problemwith Reynolds number 100time step for the x component of velocity, the y component of velocity andthe pressure for Reynolds numbers 0.1 and 100 respectively. The slope ofthis plot is used to understand the time order of accuracy. As it can be seenfrom the Figures 4.12 and 4.13, the slope for the x component of velocity andthe y component of velocity is close to two for both high and low Reynoldsnumber. Note that the z component of velocity is not plotted as the valuesof z component of velocity are very low for this problem. This indicatesthat the convergence in time for L2 norm of velocity is second order. It canalso be observed that the slope for pressure is between one and two for lowReynolds number and it converges to two for high Reynolds number. Thus,the convergence in time on L2 norm of pressure is between 1 and 2 for lowReynolds numbers and close to 2 for high Reynolds number.It can be seen that the points are scattered in the Figures 4.12 and4.13 and the scattering is more significant for high Reynolds number. Thisneeds to be further investigated in order to understand the reason behindthe scattering and it will be done in the upcoming weeks.45Chapter 5ScalabilityThe solver implemented using direction splitting and domain decompositionis very efficient and scalable. In this section, the scalability of the solver istested by running computations on multiple cores on the Cedar cluster.5.1 Weak ScalabilityWeak scalability tests are performed by keeping the number of grid pointsper processor fixed and recording the run-times (CPU time) as the numberof processors increases. All the tests are performed on the 2D and 3D lid-driven cavity problems. Two different settings is tested, one with small loadper processor (small number of grid points per processor) and another onewith large load per processor (large number of grid points per processor).5.1.1 2D Lid Cavity ProblemFor the 2D Lid Cavity problem, 10,000 grid points per processor is used forsmall load and 1,000,000 grid points per processor is used for large load.Both the simulations are run for a range of processors from 1-96 (1, 2, 4, 8,16, 32, 48, 64, 96). The time step used for small load is 10−4 and for largeload is 10−5 chosen such that the CFL condition mentioned in equation 2.35is satisfied.All the computations have been done over 1000 time steps. Table 5.1shows the weak scalability of the Direction Splitting solver for both smallload (104 grid points per proc) and large load (106 grid points per proc) ona 2D Lid Cavity problem. The CPU times reported in each row of the tablecorrespond to a fixed number of processors (1 processor in row 1, 2 in row2 and so on.). All the times reported in the table are per time step. Forinstance, with 106 grid points per processor, the CPU time per time stepgoes from 0.0243s on 1 processor to 0.0425s on 96 processors. This showsthat the weak scalability of the algorithm is reasonable.Also, Figure 5.1 shows the scalability on multiple nodes compared toone node (assumed to be composed of 16 processors). The scalability of p465.1. Weak Scalability# procs 104 gps/proc 106 gps/proc1 0.0247 2.432 0.0324 3.234 0.039 3.828 0.0398 3.9216 0.0408 432 0.044 4.0148 0.0472 4.1764 0.0449 4.0596 0.0484 4.25Table 5.1: Weak Scalability on 2D Lid Cavity Problem: CPU time (insecond) per time stepprocessors is defined as the CPU time taken to solve the problem on onenode divided by the time taken to solve the problem on p processors thatcorrespond to p/16 nodes. In order to obtain this, only rows with processorsgreater than or equal to 16 are considered in Table 5.1 and the scalabilityis obtained for each of them. The number of nodes corresponding to pprocessors is p/16.Figure 5.1: Weak Scalability: Performance on Multiple Nodes for 2D LidCavity Problem475.1. Weak ScalabilityNote that in Cedar cluster, usually one node consists of 48 processorsand here we use 16 processors as the basis for measuring this performanceas the problem was run on only small numbers of processors/nodes.Figure 5.1 shows that the algorithm has very good weak scalability as allthe points are almost close to 1 indicating that the problem takes approxi-mately the same time given the fixed number of grid points per processor.It can also be seen that the weak scalability is lower for small load per pro-cessor as compared to the large load per processor. This is expected in thesmall load case, since the MPI communications take more time as comparedto the solution of the linear system leading to a lower scalability value. Con-trary to this, in the large load case, the weak scalability is much better asMPI communications play a very small role as compared to the other partsof the algorithm.5.1.2 3D Lid Cavity ProblemFor the 3D Lid Cavity problem, 8,000 grid points per processor is used forsmall load and 1,000,000 grid points per processor is used for large load.Both the simulations are run for a range of processors from 1-1000 (1, 2,4, 8, 16, 32, 48, 64, 96, 192, 512, 1000). The time step used for smallload is 10−3 and for large load is 10−4 chosen such that the CFL conditionmentioned in equation 2.35 is satisfied.# procs 8× 103 gps/proc 106 gps/proc1 0.0395 5.132 0.0491 6.474 0.0583 7.488 0.0688 8.8516 0.0709 8.632 0.0776 9.2848 0.0797 9.3364 0.0758 9.1796 0.0819 9.61192 0.0894 9.76512 0.115 101000 0.126 9.76Table 5.2: Weak Scalability on 3D Lid Cavity Problem: CPU time (insecond) per time step485.1. Weak ScalabilityAll the computations have been done over 1000 time steps. Table 5.2shows the weak scalability of the Direction Splitting solver for both smallload (8 × 103 grid points per proc) and large load (106 grid points perproc) on a 3D Lid Cavity problem. The times reported in the two columnscorresponds to the two different loads (8×103 in column 1 and 106 in column2). The CPU times reported in each row of the table correspond to a fixednumber of processors (1 processor in row 1, 2 in row 2 and so on.). Allthe times reported in the table are per time step. For instance, with 106grid points per processor, the CPU time per time step goes from 5.13s on 1processor to 9.76s on 1000 processors. This shows that the weak scalabilityof the algorithm is reasonable.Also, Figure 5.2 shows the scalability on multiple nodes compared toone node (assumed to be composed of 48 processors). The scalability ofp processors is defined as CPU time taken to solve the problem on onenode divided by the time taken to solve the problem on p processors thatcorrespond to p/48 nodes. In order to obtain this, only rows with processorsgreater than or equal to 48 are considered in Table 5.2 and the scalabilityis obtained for each of them. The number of nodes corresponding to pprocessors is p/48.Figure 5.2: Weak Scalability: Performance on Multiple Nodes for 3D LidCavity ProblemFigure 5.2 shows that the algorithm has very good weak scalability for495.2. Strong scalabilitylarge load as all the points are almost close to 1 indicating that the problemtakes approximately the same time given the fixed number of grid pointsper processor. The weak scalability is again slightly lower for small load perprocessor and this is due to the MPI communications that take more time ascompared to the solution of the linear system. These MPI communicationsact as the bottleneck and lead to a lower scalability value for small loadcase. Contrary to this, in the large load case, the weak scalability is muchbetter as MPI communications play a very small role as compared to theother parts of the algorithm since the linear systems involved are larger.5.2 Strong scalabilityStrong scalability tests are conducted to analyze the performance of thesolver on the same problem while increasing the number of processors. Thesetests are usually conducted by looking at the speedup on p processors. Thisspeedup is calculated by taking the ratio of time taken to solve the problemon one processor divided by the CPU time taken to solve the problem onp processors. All the tests are performed on the 3D Lid Cavity problem.Performance is analyzed for two different grids, a small grid composed of8,000,000 grid points and a large grid composed of 64,000,000 grid points.Standard projection solver is also run on the same grids with same param-eters and the results are compared with the performance of the directionsplitting solver.5.2.1 Small LoadThe computations are done for 20 timesteps on 200 × 200 × 200 grid fora range of processors from 1-512 (1,2,4,8,16,32,48,64,96,192,512). The timestep used is 10−3 and is chosen such that the CFL condition mentioned inequation 2.35 is satisfied. Figure 5.3 shows the speedup on upto 512 pro-cessors for the pressure update step of a 3D Lid Cavity problem. Similarly,5.4 shows the speedup on upto 512 processors for the velocity update step.From these figures, it can be easily inferred that the direction splitting solverhas good scalability that is comparable to the standard Projection solver.In fact, for the velocity update step direction splitting solver scales slightlybetter compared to the standard projection solver.Note that the overall time taken by both solver is not used to obtain thestrong scalability plots. This is due to the fact that the standard projec-tion solver requires the computation of the multigrid preconditioner for theconjugate gradient algorithm and this takes a long time to compute. But505.2. Strong scalabilityFigure 5.3: Speedup in Pressure update step for Small Load per proc on a3D Lid Cavity Problem. Ideal linear speedup shown as a solid green line.Figure 5.4: Speedup in Velocity update step for Small Load per proc on a3D Lid Cavity Problem. Ideal linear speedup shown as a solid green line.515.3. Performance Comparisonit is only done at time t = 0 and the subsequent computations are muchfaster. Since our computations are only for 20 timesteps, comparing theoverall time taken by standard projection solver and the direction splittingmight be biased against the standard projection solver. Hence, the overalltime taken by the solvers are not used for comparing the scalability.5.2.2 Large LoadThe computations are done for 20 timesteps on 400 × 400 × 400 grid for arange of processors from 1-1000 (1,2,4,8,16,32,48,64,96,192,512,1000). Thetime step used is 10−4 and is chosen such that the CFL condition mentionedin equation 2.35 is satisfied. Figure 5.5 shows the speedup on upto 1000 pro-cessors for the pressure update step of a 3D Lid Cavity problem. Similarly,5.6 shows the speedup on upto 1000 processors for the velocity update step.From these figures, it can be easily inferred that the direction splitting solverhas good scalability that is comparable to the standard projection solver.In fact, for the velocity update step direction splitting solver scales slightlybetter compared to the standard projection solver while for the pressure up-date it is the reverse but overall the strong scalability is of the two solvers isvery similar. For the same reason described in the previous section, overalltime taken by the solvers are not used for comparing the scalability. Theyare compared only for the pressure update step and the velocity update step.5.3 Performance ComparisonIn this section, the performance of the Direction Splitting solver is comparedto the standard projection solver. This is done by looking at the accelerationfactor on a range of processors. The acceleration factor is defined as theratio of time taken by the standard projection solver divided by the timetaken by the Direction Splitting solver. All the tests are performed on the3D Lid Cavity problem. Performance is analyzed on a large, fixed gridcomposed of 64,000,000 grid points. The computations are done for 20timesteps on 400 × 400 × 400 grid for a range of processors from 1-1000(1,2,4,8,16,32,48,64,96,192,512,1000). The time step used is 10−4 and ischosen such that the CFL condition mentioned in equation 2.35 is satisfied.Figure 5.7 shows the acceleration factor obtained using the DirectionSplitting solver. From the Figure, it can be easily inferred that the directionsplitting solver is much faster as compared to the standard projection solveras the acceleration factor is much greater than 1 for the different processors.525.3. Performance ComparisonFigure 5.5: Speedup in Pressure update step for Large Load per proc on a3D Lid Cavity Problem. Ideal linear speedup shown as a solid green line.Figure 5.6: Speedup in Velocity update step for Large Load per proc on a3D Lid Cavity Problem. Ideal linear speedup shown as a solid green line.535.3. Performance ComparisonFor instance on 1000 processors, the direction splitting solver is roughly 10times faster as compared to the standard projection solver.Figure 5.7: Performance comparison with Standard Projection solver forLarge Load per proc on a 3D Lid Cavity Problem. Ideal line shown as asolid green line.Note that the acceleration factor shown in Figure 5.7 is calculated us-ing the overall time taken by both solvers for computing 20 time steps.As mentioned in the section 5.2, the standard projection solver requires tocompute the multigrid preconditioner for the conjugate gradient algorithmat time t = 0 and this takes a long time to compute. But the subsequentcomputations are much faster. Therefore, the acceleration factor value ofapproximately 10 on 1000 processors might be slightly overoptimistic andshould be taken with reservation.54Chapter 6ConclusionWe have implemented a 3D Navier-Stokes solver building on the class of pres-sure correction methods. The main feature is the use of another symmetricpositive definite operator instead of the Laplace operator in the pressurecorrection step. This enables us to use a direction splitting technique forpressure update. The most important feature of this algorithm is that ithas similar convergence and stability properties as other pressure correc-tion methods. Using second order central difference schemes for the implicitdiffusion terms, the discretized system of equations are linear tridiagonalsystems.At first, finite difference method was investigated for space discretizationand it worked generally well for uniform domains. For non-uniform domains,the Uzawa algorithm didn’t converge as the matrix was not symmetric. Inorder to overcome this limitation of finite difference method, finite volumemethod was tried as it results in symmetric matrices even for non-uniformdomains. Finite volume method worked well for uniform and non-uniformdomains and was used for all the other experiments.Different types of splitting methods were experimented. Both first or-der and second order splitting methods were explored for Heat equationproblems and as expected, second order splitting methods provided moreaccurate solution with second order accuracy.Similarly, different types of methods were experimented for time dis-cretization. We tested first order explicit and second order Adams-Bashforthexplicit methods for the time discretization of the advection term and firstorder implicit and second order Crank-Nicolson implicit methods for thetime discretization of the diffusion term. In order to guarantee second orderaccuracy in time, we selected Adams-Bashforth and Crank-Nicolson.Both overlapping and non-overlapping domain decomposition methodswere explored and non-overlapping methods worked well with the problemformulation. We combined a non-overlapping method with a Schur comple-ment technique to solve the discretized system of equations.The resulting saddle point problem was described and was solved usingtwo methods. Uzawa algorithm and Schur complement technique were com-556.1. Summary of resultspared and the Schur complement technique was preferred as it was muchfaster. Problem formulation was also explored with and without Lagrangemultiplier. Both the formulations are effective and didn’t affect the perfor-mance. Finally, we implemented the formulation without Lagrange multi-plier as the construction of the Schur complement is quite straightforward.Domain decomposition was performed using MPI communications andinvolved very few communications (2 communications to solve in one direc-tion). The Schur complement was calculated at time t = 0 and was usedto solve subdomain unknowns at each time step. The Schur complementwas very small in size. For instance, 1,000,000 unknowns in 3D on 1000processors results in a Schur complement of size 9× 9.LU decomposition and Thomas algorithm were considered for solvinglinear tridiagonal systems and their performance was compared. For largenumber of rhs values, the Thomas algorithm was much more efficient com-pared to LU decomposition and it is used to solve all the linear tridiagonalsystems.6.1 Summary of resultsConvergence tests were run for different test cases to estimate the spatialorder of accuracy. At a fixed time step, the error between analytical solutionand numerical solution was calculated for a range of meshes and Log-logplot was obtained between the L2 norm of error and the mesh size. 2DHeat equation problem was computed with a smooth analytical solutionand Dirichlet boundary conditions. Similarly, a 3D Heat equation problemwas also computed. The Navier-Stokes solver was tested in a 2D periodicpoiseuille flow problem. For all the test cases, second order convergence inspace was observed as expected. For time convergence, the lack of analyticalsolution was addressed by obtaining a reference solution using a very smalltime step, say 10−6 on a mesh size of 100 × 100. The Numerical solutionwas obtained on the same mesh for a range of time steps and the error withthe reference solution was obtained. Log-log plots of the L2 norm of erroras a function of time step size was generated to understand the time orderconvergence. A 2D Heat equation problem was computed with a bodytermand Dirichlet boundary conditions. Similarly, the 3D Navier-Stokes solverwas tested on a lid driven cavity problem. First order convergence in timewas observed for both the Heat equation solver and second order convergencein time was observed for the Navier-Stokes solver.The main feature of this solver was that it was expected to be highly scal-566.2. Future Workable as compared to other standard methods used for Navier-Stokes solveras it involves direction splitting and domain decomposition in both velocityand pressure update steps. The solver was tested for both weak scalabilityand strong scalability.The weak scalability of the solver was tested by running it on a rangeof processors with fixed load per processor and obtaining the ratio of timetaken with time taken on 1 processor. The scalability on multiple nodeswas tested by plotting the ratio of time taken on p processors divided bytime taken on one node with the number of nodes. For 2D, one node wasassumed to be a group of 16 processors whereas for 3D, one node was takenas the standard set of 48 processors (according to the node specification inCedar cluster). In general, very satisfactory weak scalability was observed.The strong scalability of the solvers was compared for the pressure up-date and the velocity update steps. We showed that the direction splittingsolver has a similar scalability as that of the standard projection solver. Per-formance of both the direction splitting solver and the standard projectionsolver were compared with computations performed for 20 time steps. Asexpected, the results from the direction splitting solver were much fastercompared to the standard projection solver.6.2 Future WorkThis algorithm can be only applied for problems posed in domains composedof parallelepipeds with faces parallel to coordinate axis or their union. Thistype of arrangement is still relevant to a wide range of problems encounteredin science and engineering. It can be used to solve problems posed in simplegeometries such as simulation of turbulent flows in the atmosphere and inthe ocean, stratified flows, variable density flows, combustion etc.It can also be applied to problems with complex geometries by combiningit with a fictitious domain formulation as discussed in [45] or an immersedboundary formulation shown in [31]. We can also use penalty methods asexplained in [2] or boundary fitting techniques using directional grid adjust-ment as illustrated in [7].Higher order space discretization schemes can be used instead of secondorder central differencing scheme used in this solver. High order/Multi-step time discretization schemes can be used instead of the Crank-Nicolsonscheme. Similarly, higher order schemes can be used for calculating ad-vection term both in space and time as compared to TVD and Adams-Bashforth, respectively. The time convergence obtained for 2D Heat equa-576.2. Future Worktion problem is first order but it is expected to be second order. This canbe possibly due to a bug in the code which should be fixed in the upcomingweeks. The Log-log plots of the L2 norm of error as a function of time stepsize for 3D lid cavity problem had significant scattering for high Reynoldsnumber and this needs to be analyzed to identify the underlying reason.58Bibliography[1] Achdou, Y., and Guermond, J.-L. Convergence analysis of a fi-nite element projection/lagrange–galerkin method for the incompress-ible navier–stokes equations. SIAM Journal on numerical analysis 37,3 (2000), 799–826.[2] Angot, P. Analysis of singular perturbations on the brinkman prob-lem for fictitious domain models of viscous flows. Mathematical methodsin the applied sciences 22, 16 (1999), 1395–1412.[3] Auteri, F., Guermond, J.-L., and Parolini, N. Role of the lbbcondition in weak spectral projection methods. Journal of Computa-tional Physics 174, 1 (2001), 405–420.[4] Brown, D. L., Cortez, R., and Minion, M. L. Accurate projectionmethods for the incompressible navier–stokes equations. Journal ofcomputational physics 168, 2 (2001), 464–499.[5] Chorin, A. J. Numerical solution of the navier-stokes equations.Mathematics of computation 22, 104 (1968), 745–762.[6] Douglas, J. Alternating direction methods for three space variables.Numerische Mathematik 4, 1 (1962), 41–63.[7] Galo, J., Albarreal, I., Calzada, M., Cruz, J., Ferna´ndez-Cara, E., and Mar´ın, M. Convergence and optimization of theparallel method of simultaneous directions for the solution of ellipticproblems. Journal of Computational and Applied Mathematics 222, 2(2008), 458–476.[8] Girault, V., and Raviart, P.-A. Finite element methods forNavier-Stokes equations: theory and algorithms, vol. 5. Springer Science& Business Media, 2012.[9] Glowinski, R. Finite element methods for incompressible viscous flow,vol. 9. 12 2003, pp. 3–1176.59Bibliography[10] Glowinski, R. Finite element methods for incompressible viscous flow.Handbook of numerical analysis 9 (2003), 3–1176.[11] Goda, K. A multistep technique with implicit difference schemes forcalculating two-or three-dimensional cavity flows. Journal of computa-tional physics 30, 1 (1979), 76–95.[12] Gresho, P. M. On the theory of semi-implicit projection methods forviscous incompressible flow and its implementation via a finite elementmethod that also introduces a nearly consistent mass matrix. part 1:Theory. International Journal for Numerical Methods in Fluids 11, 5(1990), 587–620.[13] Guermond, J., and Minev, P. Start-up flow in a three-dimensionallid-driven cavity by means of a massively parallel direction splittingalgorithm. International Journal for Numerical Methods in Fluids 68,7 (2012), 856–871.[14] Guermond, J., and Shen, J. A new class of truly consistent splittingschemes for incompressible flows. Journal of computational physics 192,1 (2003), 262–276.[15] Guermond, J., and Shen, J. On the error estimates for the rotationalpressure-correction projection methods. Mathematics of Computation73, 248 (2004), 1719–1737.[16] Guermond, J.-L. Some implementations of projection methods fornavier-stokes equations. ESAIM: Mathematical Modelling and Numer-ical Analysis 30, 5 (1996), 637–667.[17] Guermond, J.-L. Un re´sultat de convergence d’ordre deux en tempspour l’approximation des e´quations de navier–stokes par une techniquede projection incre´mentale. ESAIM: Mathematical Modelling and Nu-merical Analysis 33, 1 (1999), 169–189.[18] Guermond, J.-L., and Minev, P. A new class of massively par-allel direction splitting for the incompressible navier–stokes equations.Computer Methods in Applied Mechanics and Engineering 200, 23-24(2011), 2083–2093.[19] Guermond, J.-L., Minev, P., and Shen, J. An overview of pro-jection methods for incompressible flows. Computer methods in appliedmechanics and engineering 195, 44-47 (2006), 6011–6045.60Bibliography[20] Guermond, J.-L., and Minev, P. D. Analysis of a projec-tion/characteristic scheme for incompressible flow. Communicationsin numerical methods in engineering 19, 7 (2003), 535–550.[21] Guermond, J.-L., and Quartapelle, L. On the approximationof the unsteady navier–stokes equations by finite element projectionmethods. Numerische mathematik 80, 2 (1998), 207–238.[22] Guermond, J.-L., and Shen, J. Quelques re´sultats nouveaux sur lesme´thodes de projection. Comptes Rendus de l’Acade´mie des Sciences-Series I-Mathematics 333, 12 (2001), 1111–1116.[23] Guermond, J.-L., and Shen, J. Velocity-correction projection meth-ods for incompressible flows. SIAM Journal on Numerical Analysis 41,1 (2003), 112–134.[24] Hughes, T. J., Franca, L. P., and Balestra, M. A new finite el-ement formulation for computational fluid dynamics: V. circumventingthe babusˇka-brezzi condition: A stable petrov-galerkin formulation ofthe stokes problem accommodating equal-order interpolations. Com-puter Methods in Applied Mechanics and Engineering 59, 1 (1986), 85–99.[25] Karniadakis, G. E., Israeli, M., and Orszag, S. A. High-ordersplitting methods for the incompressible navier-stokes equations. Jour-nal of computational physics 97, 2 (1991), 414–443.[26] Kim, J., and Moin, P. Application of a fractional-step method to in-compressible navier-stokes equations. Journal of computational physics59, 2 (1985), 308–323.[27] Lee, M. J., Do Oh, B., and Kim, Y. B. Canonical fractional-step methods and consistent boundary conditions for the incompress-ible navier–stokes equations. Journal of computational physics 168, 1(2001), 73–100.[28] Orszag, S. A., Israeli, M., and Deville, M. O. Boundary con-ditions for incompressible flows. Journal of Scientific Computing 1, 1(1986), 75–111.[29] Peaceman, D. W., and Rachford, Jr, H. H. The numerical so-lution of parabolic and elliptic differential equations. Journal of theSociety for industrial and Applied Mathematics 3, 1 (1955), 28–41.61Bibliography[30] Perot, J. B. An analysis of the fractional step method. Journal ofComputational Physics 108, 1 (1993), 51–58.[31] Peskin, C. S. Numerical analysis of blood flow in the heart. Journalof computational physics 25, 3 (1977), 220–252.[32] Prohl, A. Projection and quasi-compressibility methods for solvingthe incompressible Navier-Stokes equations. Springer, 1997.[33] Quarteroni, A., Saleri, F., and Veneziani, A. Analysis of theyosida method for the incompressible navier–stokes equations. Journalde mathe´matiques pures et applique´es 78, 5 (1999), 473–503.[34] Quarteroni, A., Saleri, F., and Veneziani, A. Factorizationmethods for the numerical approximation of navier–stokes equations.Computer methods in applied mechanics and engineering 188, 1-3(2000), 505–526.[35] Rannacher, R. On chorin’s projection method for the incompressiblenavier-stokes equations. In The Navier-Stokes equations IItheory andnumerical methods. Springer, 1992, pp. 167–183.[36] Shang, Y. A parallel two-level linearization method for incompressibleflow problems. Applied Mathematics Letters 24, 3 (2011), 364–369.[37] Shen, J. On error estimates of projection methods for navier–stokesequations: first-order schemes. SIAM Journal on Numerical Analysis29, 1 (1992), 57–77.[38] Shen, J. On error estimates of the projection methods for the navier-stokes equations: second-order schemes. Mathematics of Computationof the American Mathematical Society 65, 215 (1996), 1039–1065.[39] Strikwerda, J. C., and Lee, Y. S. The accuracy of the fractionalstep method. SIAM Journal on numerical analysis 37, 1 (1999), 37–47.[40] Taylor, C., and Hood, P. A numerical solution of the navier-stokesequations using the finite element technique. Computers & Fluids 1, 1(1973), 73–100.[41] Temam, R. Navier-Stokes equations: theory and numerical analysis,vol. 343. American Mathematical Soc., 2001.62Bibliography[42] Timmermans, L., Minev, P., and Van De Vosse, F. An approxi-mate projection scheme for incompressible flow using spectral elements.International journal for numerical methods in fluids 22, 7 (1996), 673–688.[43] Uzawa, H. Iterative methods for concave programming. Studies inlinear and nonlinear programming 6 (1958), 154–165.[44] Van Kan, J. A second-order accurate pressure-correction scheme forviscous incompressible flow. SIAM Journal on Scientific and StatisticalComputing 7, 3 (1986), 870–891.[45] Veeramani, C., Minev, P. D., and Nandakumar, K. A fictitiousdomain method for particle sedimentation. In International Conferenceon Large-Scale Scientific Computing (2005), Springer, pp. 544–551.[46] Wang, C., and Liu, J.-G. Convergence of gauge method for incom-pressible flow. Mathematics of computation 69, 232 (2000), 1385–1407.[47] Weinan, E., Liu, J.-G., et al. Gauge method for viscous incom-pressible flows. Communications in Mathematical Sciences 1, 2 (2003),317–332.[48] Yanenko, N. N. The method of fractional steps for solving multidi-mensional problems of mathematical physics. Novosibirsk: Science 196(1967).63
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Parallel direction splitting for 3D incompressible...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Parallel direction splitting for 3D incompressible Navier-Stokes equations Rajendran, Arun 2019
pdf
Page Metadata
Item Metadata
Title | Parallel direction splitting for 3D incompressible Navier-Stokes equations |
Creator |
Rajendran, Arun |
Publisher | University of British Columbia |
Date Issued | 2019 |
Description | In this work, an efficient direction splitting algorithm for solving incompressible Navier-Stokes equations is implemented. The main feature of this method involves using the operator $A=(1-\parital_{xx})(1-\parital_{yy})(1-\parital_{zz})$ for approximately solving the pressure correction step instead of the Poisson operator used in the standard projection methods. The complexity of the algorithm is much lower as compared to the standard projection methods and it is shown to have similar convergence properties as Poisson based pressure-correction techniques. The algorithm is validated on multiple test cases, both in two and three dimensions, respectively. The method is suitable for parallel implementation and tests show excellent performance on a distributed memory cluster of up to 1,000 processors. Scalability results are reported on a three-dimensional lid-driven cavity flow for a range of processors on a large number of grid points. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2019-08-26 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0380620 |
URI | http://hdl.handle.net/2429/71440 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2019-09 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2019_september_rajendran_arun.pdf [ 1.14MB ]
- Metadata
- JSON: 24-1.0380620.json
- JSON-LD: 24-1.0380620-ld.json
- RDF/XML (Pretty): 24-1.0380620-rdf.xml
- RDF/JSON: 24-1.0380620-rdf.json
- Turtle: 24-1.0380620-turtle.txt
- N-Triples: 24-1.0380620-rdf-ntriples.txt
- Original Record: 24-1.0380620-source.json
- Full Text
- 24-1.0380620-fulltext.txt
- Citation
- 24-1.0380620.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0380620/manifest