A Higher-Order Accurate Unstructured Finite Volume Newton-Krylov Algorithm for Inviscid Compressible Flows by AMIR NEJAT B.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1996 M.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1998 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F D O C T O R O F P H I L O S O P H Y in T H E F A C U L T Y O F G R A D U A T E S T U D I E S (Department of Mechanical Engineering) T H E U N I V E R S I T Y O F B R I T I S H C O L U M B I A April 2007 © Amir Nejat, 2007 Abstract A fast implicit (Newton-Krylov) finite volume algorithm is developed for higher-order un-structured (cell-centered) steady-state computation of inviscid compressible flows (Euler equations). The matrix-free General Minimal Residual ( G M R E S ) algorithm is used for solv-ing the linear system arising from implicit discretization of the governing equations, avoiding expensive and complicated explicit computation of the higher-order Jacobian matrix. A n Incomplete Lower-Upper factorization technique is employed as the preconditioning strat-egy and a first-order Jacobian as a preconditioning matrix. The solution process is divided into two phases: start-up and Newton iterations. In the start-up phase an approximate solution of the fluid flow is computed which includes most of the physical characteristics of the steady-state flow. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. At the end of the start-up phase (when the linearization of the flow field is accurate enough for steady-state solution) the solution is switched to the Newton phase, taking an infinite time step and recovering a semi-quadratic convergence rate (for most of the cases). A proper limiter implementation for higher-order discretization is discussed and a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced. The issue of mesh refinement in accuracy mea-surement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and direct solution error calculation. The accuracy, fast convergence • and robustness of the proposed higher-order unstructured Newton-Krylov solver for differ-ent speed regimes are demonstrated via several test cases for the 2nd, 3rd and 4th-order discretization. Solutions of different orders of accuracy are compared in detail through sev-eral investigations. The possibility of reducing the computational cost required for a given level of accuracy using high-order discretization is demonstrated. Contents 1 Introduction 1 1.1 Motivation 1 1.2 Background 7 1.2.1 Mesh Generation and Spatial Discretization 7 1.2.2 Higher-Order Discretization 10 1.2.3 Implicit Method and Convergence Acceleration 12 1.3 Objective 15 1.4 Contributions . . 15 1.5 Outline , 16 2 Flow Solver 18 2.1 Governing Equations 18 2.2 Implicit Time-Advance : 20 2.3 Upwind Flux Formulation 23 2.3.1 The Godunov Approach . . ." 25 2.3.2 Roe's Flux Difference Splitting Scheme 26 2.4 Boundary Flux Treatment 28 iii CONTENTS iv 2.4.1 Wall Boundary Condition 28 2.4.2 Inlet/Outlet Boundary Conditions 29 2.5 Flux Integration 31 3 Reconstruction and Monotonicity 33 3.1 Introduction 33 3.2 K-Exact Least-Square Reconstruction 34 3.2.1 Conservation of the Mean 34 3.2.2 K-Exact Reconstruction 37 3.2.3 Compact Support 38 3.2.4 Boundary Constraints 39 3.2.4.1 Dirichlet Boundary Constraint 39 3.2.4.2 Neumann Boundary Constraint 40 3.2.5 Constrained Least-Square System : . 40 3.3 Accuracy Assessment for a Smooth Function 41 3.4 Monotonicity Enforcement 43 3.4.1 Flux Limiting 43 3.4.2 Slope Limiting 44 4 Flux Jacobian 52 4.1 What is the Jacobian ? 52 4.2 Flux Jacobian Formulation 55 4.2.1 Roe's Flux Jacobian . . . •. • 57 CONTENTS ' v 4.2.2 Boundary Flux Jacobians 60 4.2.2.1 Wall Boundary Flux Jacobian 61 4.2.2.2 Subsonic Inlet Flux Jacobian ; ,61 4.2.2.3 Subsonic Outlet flux Jacobian 64 4.2.2.4 Supersonic Inlet/Outlet Flux Jacobians 65 4.3 Finite Difference Differentiation 65 4.4 Numerical Flux Jacobian 68 5 Linear Solver and Solution Strategy 71 5.1 Introduction 71 5.2 G M R E S Linear Solver 73 5.2.1 The Basic G M R E S Algorithm 74 5.2.2 Matrix-Vector Products Computation in G M R E S 77 5.2.3 Preconditioning 80 5.2.4 G M R E S with Right Preconditioning f 83 5.3 Solution Strategy ' 84 5.3.1 Start-up Phase 86 5.3.2 Newton Phase 88 t t 6 Results(I): Verification Cases 90 6.1 Reconstruction Test Cases 90 6.1.1 Square Test Case . , : 91 6.1.2 Annulus Test case 94 CONTENTS vi 6.2 Subsonic Flow Past a Semi-Circular Cylinder 97 6.3 Supersonic Vortex 112 6.3.1 Numerical Solution 115 6.3.2 Solution accuracy measurement . 117 7 Results(II): Simulation Cases 124 7.1 Subsonic Airfoil, N A C A 0012, M = 0.63, a = 2 .0° 124 7.1.1 Solution Process • • • • 1 2 7 7.2 Transonic Airfoil, N A C A 0012, M = 0.8, a = 1 .25° 153 7.3 Supersonic flow, Diamond airfoil , M = 2.0, a = 0.0 161 8 Concluding Remarks 169 8.1 Summary and Contributions 169 8.2 Conclusions 171 8.3 Recommended Future Work 173 8.3.1 Start-up 173 8.3.2 . Preconditioning . 173 8.3.3 Reconstruction 173 8.3.4 Extension to 3D 174 8.3.5 Extension to Viscous Flows 174 Bibliography 175 List o f Tables 1.1 Qualitative illustration of research on solver development ' 4 5.1 Ratio of non-zero elements in factorized matrix . . 82 6.1 2nd-order error norms for the square case 91 6.2 3rd-order error norms for the square case 92 6.3 4th-order error norms for the square case 93 6.4 2nd-order error norms for the annulus case 94 6.5 3rd-order error norms for the annulus case 94 6.6 4th-order error norms for the annulus case . 95 6.7 Sizes and ratios of the control volumes for circular cylinder meshes 98 6.8 Error norms for total pressure, 2nd-order solution 106 6.9 Error norms for total pressure, 3rd-order solution . . . 106 6.10 Error norms for total pressure, 4th-order solution 109 6.11 Solution error norms, 2nd-order discretization : . . 121 6.12 Solution error norms, 3rd-order discretization 121 6.13 Solution error norms, 4th-order discretization 121 vii LIST OF TABLES viii 7.1 Mesh detail for N A C A 0012 airfoil 124 7.2 Convergence summary for N A C A 0012 airfoil, M = 0.63, a = 2° 131 7.3 Lift and drag coefficients for all meshes and discretization orders, N A C A 0012, M = 0.63, a = 2°, far field size of 25 chords 142 7.4 Effect of the far field distance oh lift and drag coefficients, N A C A 0012, M = 0.6.3, a = 2 ° . . . ' 143 7.5 Convergence summary for N A C A 0012 airfoil, M = 0.8, a = 1.25° 155 ' 7.6 • Lift and drag coefficients, N A C A 0012, M = 0.8, a = 1.25° 156 7.7 Convergence summary for diamond airfoil, M = 2.0, a = 0° 164 7.8 Drag coefficient, diamond airfoil, M = 2.0, a = 0° 164 List of Figures 1.1 Main approaches in fluid dynamics 3 1.2 C F D overall algorithm 5 1.3 Example of a structured and an unstructured mesh over a 2D airfoil . . . . 8 2.1 Propagation of a linear wave in positive direction 24 2.2 Shock-Tube problem 26 2.3 Rounding the characteristic slope near zero 28 2.4 Schematic illustration of Gauss quadrature points 31 3.1 A typical cell center control volume and its reconstruction stencil, including three layers of neighbors 36 3.2 Imposing boundary constraint at the Gauss boundary points '39 3.3 Typical unlimited/limited linear reconstruction 46 3.4 Using first neighbors for monotonicity enforcement 47 3.5 Typical unlimited/limited quadratic reconstruction 50 3.6 Defining a as a function of cf> 51 4.1 Schematic of Direct Neighbors : . 55 ix / L I S T OF FIGURES x 4.2 Typical cell-centered mesh numbering 57 4.3 Total numerical error versus perturbation magnitude 67 5.1 Linearization of a sample function 72 6.1 Unstructured meshes for a square domain 92 6.2 Error-Mesh plot for the square case 93 6.3 Unstructured meshes for a curved domain (annulus) 95 6.4 Error-Mesh plot for the annulus case 96 6.5 Circular domain over half a cylinder, Mesh 1 (1376 C V s ) 97 6.6 Circular cylinder, Mesh 1 (1376CVs) 98 6.7 Circular cylinder, Mesh 2 (5539 CVs) 99 6.8 Circular cylinder, Mesh 3 (22844 CVs) 99 6.9 Convergence history for the coarse mesh (Mesh 1) 101 6.10 Convergence history for the fine mesh (Mesh 3) 101 6.11 2nd-order pressure coefficient contours,. Mesh 1 102 6.12 3rd-order pressure coefficient contours, Mesh 1 102 6.13 4th-order pressure coefficient contours, Mesh 1 . . 103 6.14 4th-order pressure coefficient contours, Mesh 3 104 6.15 Pressure coefficient along the axis 104 6.16 Close up of the pressure coefficient along the axis (suction region) . . . . . 105 6.17 Pressure coefficient along the axis, Mesh 3' 105 6.18. Error in total pressure ratio, 2nd-order discretization, Mesh 1 107 LIST OF FIGURES " xi 6.19 Error in total pressure ratio, 3rd-order discretization, Mesh 1 107 6.20 Error in total pressure ratio, 4th-order discretization, Mesh 1 108 6.21 Error in total pressure ratio, 4th-order discretization, Mesh 3 108 6.22 Error-Mesh plot for the total pressure 109 6.23 Drag coefficient versus mesh size 110 6.24 Drag coefficient versus C P U time -. . I l l 6.25 Annulus, Mesh 1 (108 C V s ) 112 6.26 Annulus, Mesh 2 (427 CVs) . . 113 6.27 Annulus, Mesh 3 (1703 CVs) 113 6.28 Annulus, Mesh 4 (6811 C V s ) 114 6.29 Annulus, Mesh 5 (27389 CVs) . 114 6.30 Convergence history for the coarse mesh (Mesh 1) • 116 6.31 Convergence history for the fine mesh (Mesh 5) 116 6.32 2nd-order Mach contours for the coarse mesh (Mesh 1) 117 6.33 3rd-order Mach contours for the coarse mesh (Mesh 1) 118 6.34 4th-order Mach contours for the coarse mesh.(Mesh 1) 118 6.35 2nd-order density error for the coarse mesh (Mesh 1) 119 6.36 3rd-order density error for the coarse mesh (Mesh 1) • 119 6.37 4th-order density error for the coarse mesh (Mesh 1) . . . : . . 120 6.38 Density, 4th-order solution over the fine mesh (Mesh 5) 120 6.39 Error-Mesh plot for the solution (Density) . 122 6.40 Error versus C P U Time 123 LIST OF FIGURES xii 7.1 N A C A 0012, Mesh 1, 1245 C V s 125 7.2 N A C A 0012, Mesh 2, 2501 C V s ' .125 7.3 N A C A 0012, Mesh 3, 4958 C V s 126 7.4 N A C A 0012, Mesh 4, 9931 C V s 126 7.5 N A C A 0012, Mesh 5, 19957 C V s . ^ . 127 7.6 C p over the upper surface after start-up, Meshl (1245 C V s ) 128 7.7 C p over the upper surface after start-up, Mesh 5 (19957 C V s ) 129 7.8 C P U time versus the grid size, N A C A 0012, M = 0.63, a = 2° : 132 7.9 Total work unit versus the grid size, N A C A 0012, M = 0.63, a = 2 ° . . . . 133 7.10 Newton phase work unit versus the grid size, N A C A 0012, M = 0.63, a = 2 ° 133 7.11 Convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° 134 7.12 Convergence history, Mesh 5, M = 0.63, a = 2° 135 7.13 Non-linear residual versus linear system residual, Mesh 3, M = 0.63, a — 2 ° 135 7.14 Linear system residual dropping order, Mesh 3, M — 0.63, a — 2 ° 136 7.15 Eigenvalue pattern for. the preconditioned system, Mesh 3, M = 0.63, a = 2 ° 137 7.16 Condition No. of the preconditioned system, Mesh 3, M = 0.63, a = 2 ° . . 138 7.17 Condition No. of the preconditioned system, Mesh 5, M = 0.63, a = 2 ° . . 138 7.18 Lift coefficient convergence history, N A C A 0012, Mesh 1, M =.0.63, a = 2 ° 139 7.19 Drag coefficient convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° 140 7.20 Lift coefficient convergence history N A C A 0012, Mesh 5, M = 0.63, a = 2 ° 141 7.21 Drag coefficient convergence history, N A C A 0012, Mesh 5, M = 0.63, a = 2 ° 141 7.22 N A C A 0012, Mesh 3 (4958 CVs) '. 144 LIST OF FIGURES xiii 7.23 2nd-order Mach contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 145 7.24 3rd-order Mach contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 145 7.25 4th-order Mach contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 146 7.26 Mach profile, upper side, N A C A 0012 airfoil, Mesh 1, M — 0.63, a = 2 ° . . 146 7.27 Mach profile close up, upper side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° 147 7.28 Mesh 1, close-up at the leading edge region 147 7.29 Mach profile, lower side, N A C A 0012 airfoil, Mesh 1, M — 0.63, a = 2 ° . . . 148 7.30 Mach profile, upper side, N A C A 0012 airfoil, Mesh 5, M — 0.63, a = 2 ° . , 148 7.31 Mach profile close up, upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 149 7.32 Mach profile, lower side, N A C A 0012 airfoil, Mesh 5, M — 0.63, a = 2 ° . . . 149 7.33 1 - TA-, upper side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° 150 i too 7.34 lower side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2° 150 •* too 7.35 1 - upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° . . . . . 151 7.36 1 - lower side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 151 7.37 Mach profile at the end of start-up process, M — 0.8, a — 1 .25° 154 7.38 Convergence history for N A C A 0012, M = 0.8, a = 1 .25° 155 7.39 2nd-order Mach contours, N A C A 0012, M = 0.8, a = 1 .25° 156 7.40 3rd-order Mach contours, N A C A 0012, M = 0.8, a = 1 .25° 157 7.41 4th-order Mach contours, N A C A 0012, M = 0.8, a = 1 .25° 157 7.42 limiter <j) (3rd-order), N A C A 0012, M = 0.8, a = 1 .25° 158 7.43 limiter a (3rd-order), N A C A 0012, M = 0.8, a = 1 .25° 159 7.44 Mach profile, N A C A 0012, M = 0.8, a = 1 .25° 159 LIST OF FIGURES • xiv 7.45 Mach profile in shock regions, N A C A 0012, M = 0.8, a = 1 .25° 160 7.46 Mesh 7771 C V s , diamond airfoil, M = 2.0, a = 0.0 162 7.47 C p at the end of start-up process, diamond airfoil, M = 2.0, a — 0.0 . . . . 163 7.48 Convergence history for diamond airfoil, M = 2.0, a = 0 ° 164 7.49 2nd-order Mach contours, diamond airfoil, M = 2.0, a = 0.0 . 165 7.50 3rd-order Mach contours, diamond airfoil, M = 2.0, a — 0.0 165 7.51 4th-order Mach contours, diamond airfoil, M = 2.0, a = 0.0 166 7.52 2nd-order C p , diamond airfoil, M = 2.0, a = 0.0 166 7.53 3rd-order C p , diamond airfoil, M = 2.0, a = 0.0 167 7.54 4th-order C p , diamond airfoil, M = 2.0, a = 0.0 167 List of Symbols Roman Symbols a speed of sound A area, Jacobian matrix b righr hand side (Ax = b) B fixed iteration matrix C center CL lift coefficient CD drag coefficient Cp specific heat at constant pressure, pressure coefficient Cv specific heat at constant volume D • reconstruction solution vector (derivatives) e specific internal energy E total energy, error F flux vector G Gauss (integration) point h specific enthalpy, mesh length scale H Hessenberg matrix I identity matrix X V LIST OF SYMBOLS xvi J Jacobian matrix k number of subspace size K order of accuracy, polynomial order, constant in Venkatakrishnan limiter I length of the control volume face, norm L norm, left eigenvector m number of restarts M Mach number, moments, preconditioner matrix, coefficient matrix in reconstruc-tion n normal vector N total number of ... p polynomial order P static pressure, polynomial r residual, distance R gas constant, residual, radius, right eigenvector S slope in higher-order limiter t time, exponent of the weighting in reconstruction T static temperature U solution vector, conservative variables u, v velocity components V • primitive variables, velocity, subspace x, y Cartesian coordinates, unknown vectors z preconditioning vector LIST OF SYMBOLS Greek Symbols a angle of attack 7 specific heat ratio e perturbation parameter A eigenvalue, wave speed p Density a higher-order limiter (f> slope limiter UJ relaxation factor Superscripts i iteration index k, K order, iteration number, subspace number m, n polynomial exponent Subscripts b boundary c center CD central difference CV control volume DB Dirichlet boundary FD forward difference G Gauss point i inner, control volume index LIST OF SYMBOLS in inlet L left side m magnitude min, max minimum and maximum n normal, normalized N neighbor NB Neumann boundary o outer out outlet ref reference R right side RB reconstructed at the boundary Subln subsonic inlet SubOut subsonic outlet t total W wall x, y Cartesian directions Acknowledgments I would like to express my deep appreciation to my research supervisor, Dr. Car l Ollivier-Gooch for all of his guidance, support and patience. His remarkable feedbacks throughout the course of this research were very helpful. I also would like to thank Dr. Chen Greif from the Computer Science Department. Attend-ing his valuable lectures on sparse matrix solvers and having the opportunity to engage in discussions with him, have greatly benefited my research. I am grateful to my all colleagues in the A N S L a b research group in the Mechanical E n -gineering Department, especially Chris Michalak, Serge Gosselin, and Harsha Perera, for their computer assistance on many occasions. Finally, my most sincere and deepest appreciation is for my great family, especially my mother "Homa" and my father "Mehdi", whose constant support, encouragement, patience and love. I have enjoyed all through my life. xix C h a p t e r 1 Introduction Prediction of fluid flow quantities, such as pressure, velocity, and temperature, and the study of flow behavior are the main goals in the field of fluid dynamics. Like other science and engineering disciplines, fluid dynamics has greatly benefited from the development of computing technology (numerical algorithms and computational tools) over the last four decades, resulting in the creation of a new born approach in the field known as Computa-tional Fluid Dynamics ( C F D ) , F ig (1.1). C F D has shown remarkable capability for fluid flow analysis both in academia and industry. C F D has not only made possible the sim-ulation of flows (such as reentry of space vehicles) for which complete analysis (either by theory or experiment) was impossible before, but also has provided a valuable feedback and information source for improving theoretical and experimental fluid dynamics. Increasing computing power over the last two decades has resulted in the development of new computa-tional techniques and algorithms, enhancing the versatility of C F D application. Nowadays, C F D is not just a research tool, and it is used extensively and successfully in industry throughout the design process, from preliminary design to shape optimization. 1.1 Motivation In the field of computational aerodynamics, the final goal is the accurate simulation of the flow field around (and/or inside) complex 3D geometries to compute aerodynamic force coefficients. In the mid 1980's, Jameson [36] was the first person to compute the three-dimensional flow over realistic aerodynamic configurations using the finite volume technique (a robust conservative numerical approach for discretization of the fluid flow equations over 1 CHAPTER 1. INTRODUCTION 2 general meshes [38]); since then tremendous progress has been made in this area and appli-cation of C F D for aerodynamic computations has revolutionized the process of aerodynamic design [27]. To simulate the flow field around a 3D complex geometry accurately, a C F D package should include three essential parts: 1. State of the art mesh generation capability. Mesh generation or domain dis-cretization, is one of the most important parts (if not the most!) in C F D simulations. Since the discretization of the fluid flow equations is carried out on the mesh, without a good domain discretization the C F D solution can be. very inaccurate. Furthermore generating an appropriate mesh, especially for complex geometries in practice is the most time consuming part for a C F D user and certainly is not a trivial task. Accord-ing to a real aerodynamic case study in aerospace engineering, the mesh preparation time can be up to 45 times larger than the required computation time for the fluid flow simulation [47]. Therefore, it is both desirable and necessary to reduce the mesh prepa-ration time and there is a large potential to gain by automating this process. Ideally, meshing software should be able to generate a geometrically and physically suitable mesh around or inside a complex 3D geometry with a reasonable user workload. Also the user should be able to refine the mesh according to geometric parameters and to adapt the mesh based on flow features without excessive effort. The unstructured mesh technique, among other types of mesh generation methods, is a very good candi-date to address mesh generation issues due to its automation capability in generating meshes for complex geometries and its flexibility in refinement and adaptation. 2. Accurate physical modeling of high Reynolds number turbulent flow. Most practical engineering applications such as aircraft aerodynamics involve turbulent flows. The Direct Numerical Simulation (DNS) of turbulent flows for practical pur-poses is not feasible at least for the next couple of decades due to computing technology limitations (memory and speed). Therefore modeling the turbulent flow is the only viable approach in high Reynolds number C F D simulations and will remain a major active research area for the foreseeable future [7, 75]. Discussing the physical and numerical criteria for choosing an appropriate turbulence model and related issues are beyond the scope of this thesis, but accurate physical modeling is a key part of valid C F D simulation for practical engineering problems..It should be noted that the mod-eling of the physical phenomena in C F D simulations is not just limited to turbulence, but is also essential for combustion, multiphase flows, hydromagnetic flows, and other types of fluid flows. CHAPTER 1. INTRODUCTION 3 Figure 1.1: Main approaches in fluid dynamics 3. A robust, efficient, accurate flow solver for the generic mesh. Numerical solution of the fluid flow equations is what C F D is all about and this solution must be stable and converge to the correct answer at a reasonable cost. A solver algo-rithm includes three separate components: discretization of the fluid flow equations, numerical flux computation, and updating the solution mostly-through time advance methods. F ig (1.2) shows overall schematic of a C F D algorithm. Current C F D flow solver algorithms still have some limitations both in terms of efficiency and accuracy • especially for simulation of physically complicated flows. Specifically, the memory and speed of current computers (even using parallel techniques) do not yet allow us to simulate physically complex flows in realistic geometries, with sufficient accuracy, in a short time and at a reasonable cost. As a result, we have to simplify the physics of the fluid flow via approximate modeling, and neglect some of the numerical/physical issues caused by insufficient mesh density (especially for complex geometries), limiting the validity of the simulation and adversely affecting its application. Before proceeding forward with further details, it should be mentioned that this thesis is aimed only at the third component mentioned above improving the efficiency and accuracy of C F D algorithms. The numerical error in a simulation can be written in the form of h p , where h is the mesh length scale and p is the discretization order of accuracy. Clearly, then, improving the accuracy of a numerical simulation (modeling issues aside) is possible by means of increasing CHAPTER 1: INTRODUCTION 4 Order Structured Explicit Structured Implicit Unstructured Explicit Unstructured Implicit Second-Order ******* ** Higher-Order ** * * ? Table 1.1: Qualitative illustration of research on solver development the mesh density (using smaller grid or decreasing h), and/or increasing the discretization order. Reducing mesh length scale can be achieved by global or local mesh refinement or adaptation. Nearly all modern C F D codes use second-order methods, which produce a diffusive error proportional to h? due to diffusive derivatives beside possible added artificial viscosity for stability purposes (in central difference schemes). For instance in a second-order 2D finite volume formulation in the Cartesian coordinates this numerical error for each control volume can be written in the following form: Numerical diffusive error = d2U Ax2 d2U dx2 2 dxdy Ax Ay + 32U Ay2 8y2 2 (1.1) This leading-order error term causes two significant numerical problems. First, it smears sharp gradients in convection dominated parts of the flow and spoils the conservation of , total pressure in isentropic regions of the flow field as it acts like a (numerical) viscosity. Second, this numerical diffusivity produces parasitic error in viscous regions by adding extra diffusion, which is very grid dependent, to the solution. Therefore using a high resolution numerical scheme for discretization is quite desirable. Application of discretization orders larger than second-order both for structured and unstructured grids has been an area of ongoing research for the last two decades [35, 10], and will be the focus of this thesis. However, convergence of high resolution schemes is not as efficient as second-order schemes especially for unstructured grids [88, 46] due to the increased complexity of the discretiza-tion, decreased damping (lack of diffusive damping) and adding more error modes (which must be damped in the solution process). Consequently implementing a higher-order un-structured discretization within an implicit framework to achieve the efficient convergence is extremely helpful if not necessary! As shown in Fig(1.2), up to the flux integral compu-tation, the overall C F D algorithm is the same, but the choice of the time advance technique in updating the solution changes the level of complexity of the solution process completely. Integration of the discretized equations in time can be done either explicitly or implicitly. In the explicit time integration the space discretization is performed at the previous time CHAPTER 1. INTRODUCTION 5 Geometry & Solution Domain Mesh Generation Package Physics & Fluid Flow Equations Boundary & Initial Conditions Discretized Domain Discretization of the Fluid Flow Equations over the Discretized Domain Flux Integral Explicit Time Advance Implicit Time Advance Multistage Techniques Flux Integral Linearization Solution Update Preconditioning Sparse Matrix Solver Solution Update Figure 1.2: C F D overall algorithm CHAPTER 1. INTRODUCTION' 6 level using the known flow quantities found at the previous time iteration. In the implicit time integration both the space and the time discretizations are performed at the current time level where the flow quantities are needed as unknowns. Equation (1.2) shows a typical unsteady fluid flow P D E where the right hand side represents the spatial discretization and the left hand side shows the time derivative. For example, employing the first-order forward differencing time advance technique in the explicit and implicit forms leads to E q . (1.3) and E q . (1.4). IT = - R { U ) ( 1 ' 2 ) u n + 1 - un Explicit time advance: ' ^ — l - = -R(U?) (1.3) T j n + l _ j j n Implicit time advance: 1 ^ — l - = -R(U?+1) (1.4) While an explicit update just needs multistage integration of the flux integral using a Runge-Kutta type scheme, an implicit update requires linearization of the flux integral (Chapter 2) and constructing a large linear system which requires an efficient sparse matrix solver (discussed in Chapter 4 and 5). Efficiently solving a large linear system, especially with an ill conditioned matrix resulting from a higher-order discretization, demands effective preconditioning which adds to the complexity of the process. But the error reduction of each solution update in implicit integration is far larger than the explicit one, since implicit methods do not suffer from the stability issues of explicit methods and large time steps can be taken. O n balance, therefore, it is preferable to bear the complexity of the algorithm and accelerate the solution toward the steady-state in a relatively small number of implicit iterations. Table 1.1 provides a qualitative summary on finite volume solver development research, where the number of symbols represents the approximate volume of the research on the solver development since earjy 80's (based on the author's survey). Clearly the trend in solver development is moving toward: 1. Unstructured meshes to address the mesh generation issues for complex geometries 2. Higher-order discretizations to increase the global solution accuracy 3. Implicit techniques to improve the efficiency of the solution process This research is mainly intended to contribute to the development of efficient and accurate flow solvers filling the gap in high-order implicit methods, on unstructured meshes. CHAPTER 1. INTRODUCTION 7 1.2 Background This section provides an overview of relevant aspects of current C F D solvers including discretization type and order as well as implicit algorithms. This overview is complemented by detailed discussion of previous work related to each part of the solver in the relevant chapters of the thesis. 1.2.1 M e s h G e n e r a t i o n a n d S p a t i a l D i s c r e t i z a t i o n Numerical simulation of fluid flow consists of two main parts: discretization of the flow field around or inside the geometry by a finite number of cells (grid generation) and solution of the fluid flow equations over the discretized domain (flow solver). Structured and unstruc-tured meshes, F ig (1.3), are the most common types of the grids used in C F D applications. In a structured mesh, all cells and vertices have the same topology but in an unstruc-tured mesh, elements can have irregular and variable topologies. The task of generating structured grids around complex configurations has proved to be a considerable challenge. Sophisticated structured approaches such as multi-block grid generation has resolved this issue by dividing the domain between the body and far field into simple geometrical blocks; structured grids are generated inside each block. However, automation of the blocking procedure is still a relatively difficult job [76]. Another structured approach is overlap or chimera grids. In this technique, the computational domain is divided into multiple zones and a suitable grid is generated in each zone. The chimera approach allows zones to over-lap, and interpolation routines are used to transmit data between the overset grids in the flow solver. However, generalizing the grid generation and adaptation in this approach is still not an easy task and demands a high level of expertise as well as considerable effort. Furthermore, interpolation between the blocks and overlapped meshes has its own issues and can introduce additional error. The most powerful approach for complex geometries is unstructured grids (typically triangular in 2D and tetrahedral in 3D). Unstructured grids have a higher flexibility in refinement based on the geometry and adaptation based on the solution features and gradients. Therefore, unstructured meshes are one of the most suitable choices for complex geometries; associated solvers are becoming more common in modern C F D applications and promise to be more capable and successful for complex aerodynamic problems [88]. The fluid flow equations (PDEs) generally are discretized in one of the following forms: finite difference, finite element or finite volume. Finite difference is the point wise repre-sentation of the flow field where the flow equations are solved only for variables defined at CHAPTER 1. INTRODUCTION 8 (a) Structured (b) Unstructured Figure 1.3: Example of a structured and an unstructured mesh over a 2D airfoil grid points. The finite difference scheme was the original approach to the C F D problems and it is well suited for structured grids. Therefore, its higher-order implementation can be easily achieved by employing higher-order differencing formula. However, the finite dif-ference discretization does not conserve mass, momentum .and energy of the flow which is an important issue for most of the practical applications such as shock capturing. Further-more and more importantly the finite difference discretization can not be implemented on unstructured grids. The finite element method, another discretization technique, is one of the most complete and well established mathematical approach for numerical solution of PDEs. In this method, the flow equations are multiplied by a test function and then integrated over the discretized domain. The solution is represented by a local basis function (interpolation function) for each element. Finite element method is very flexible both in terms of theory and appli-cation and it can be easily used for unstructured meshes. Its high-order extension is also fairly common by employing higher-order basis and test functions. The challenging part in application of the finite element for C F D computation is again the conservation of the flow equations especially for non-smooth flows. Although conserving the mass, momentum and energy is possible in the finite element formulation but it is not an easy task, and finite element codes require considerable fine tuning in shock capturing. The finite volume approach is designed based on the conservation of mass, momentum CHAPTER 1. INTRODUCTION 9 and energy. The solution is represented, by control volume averages and equations are discretized over the volume integrals. Like the finite element method, it is very flexible for complex geometries and unstructured meshes. At the same time its robustness for nearly all C F D applications especially shock capturing problems is well established. Higher-order application of finite volume methods is possible by using a higher-order polynomial inside each control volume which the polynomial average integral over the control volume represents the control volume average. Jameson and Mavriplis [37] reported some of the earliest unstructured finite volume C F D results, solving two-dimensional inviscid flow on regular triangular grids obtained by subdi-viding quadrilateral grids; central differencing was used. Their approach was second-order (linear distribution). The artificial viscosity and the second-order truncation error made the mentioned approach relatively high diffusive for general irregular unstructured meshes. Sec-ond order upwind schemes (discretization of the flow equations based on the physical waves propagation directions) have also been used on unstructured grids either through Green-Gauss gradient technique or least-squares linear reconstruction method. Applying upwind schemes for unstructured grids is more complicated than for structured grids especially for higher-order approximation. For the unstructured case, over each finite volume (triangle in 2D) a polynomial approximation to the solution is reconstructed with the help of the neighboring control volumes, and then the Riemann (shock tube) problem is solved approx-imately at the control volume interfaces. One of the most successful approaches in applying the upwind scheme was undertaken by Barth[10]. In this approach, Barth defined a general upwind formulation (in multi-dimensions), introducing the minimum energy least-square reconstruction procedure for flux calculation up to the desired order of accuracy. However, any upwind scheme higher than first-order (which is monotone by its nature), often causes oscillations in the vicinity of sharp gradients and discontinuities which can produce , instability problems. For example, Agarwal and Halt [6] proposed a compact higher-order scheme for solution of Euler equations over unstructured grids using explicit time integra-tion. Considerable over and undershoots in their transonic airfoil case (3rd-Order) were evident. A common solution to that is using limiters, which enforce monotonicity at the expense of adversely affecting both accuracy and convergence. Barth and Jespersen [13] introduced a multi-dimensional limiter to achieve the monotonic solution. Although that approach was quite successful in suppressing oscillations, reaching full convergence was not possible even after freezing the limiter because the limiter was not differentiable. Since then several attempts have been made to design a differentiable limiter for unstructured grid solvers, and Venkatakrishnan's limiter [87] seems to be one of the most robust. That limiter does not strictly enforce monotonicity but allows only small overshoots in the con-CHAPTER 1. INTRODUCTION i 10 verged solution. However, it preserves accuracy especially for smooth regions where there are local extrema. Although this limiter shows better convergence behavior, it still has some convergence issues with implicit solvers. Designing an appropriate limiter for higher-order unstructured grid solvers is a fairly unexplored topic so far and needs to be addressed for practical higher-order unstructured application. Another more sophisticated approach to cure oscillations in compressible flow computation is the essentially non-oscillatory ( E N O ) family of schemes. These schemes are uniformly accurate and prevent oscillations in the non-smooth regions by detecting discontinuity and modifying the reconstruction stencil from cell to cell and time level to time level [30]. E N O schemes are computationally expensive and sacrifice fast convergence because of their dy-namic stencils [34]. Weighted E N O ( W E N O ) schemes were developed to address the prob-lems caused by dynamic stencils. Near discontinuities, weighted E N O schemes ([3, 57, 28]) remove the effect of non-smooth data in the reconstruction stencil by giving it an asymptot-ically small weight. However no comprehensive convergence analysis and/or computational cost studies are presented. Furthermore, performance of E N O / W E N O schemes in the con-text of implicit time advance, which is one of the most efficient solution strategies, has not yet been studied. 1.2.2 H i g h e r - O r d e r D i s c r e t i z a t i o n For structured meshes, application of higher-order algorithms has progressed considerably and it has been shown that, for practical levels of accuracy, using a higher-order accurate method can be more efficient both in terms of solution time and memory usage. Wi th higher-order accurate methods, the cost of flux computation, integration, and other associ-ated numerical calculations increase per control volume. However, as we can use a coarser mesh, computation time and memory are saved overall and accuracy can be increased as well. De Rango and Zingg [66] applied a globally third-order accurate algorithm for steady turbulent flow over a 2-D airfoil using a structured grid. They showed this approach can lead to a dramatic reduction in numerical error in drag using relatively coarse grids, and the results provide a convincing demonstration of the benefits of higher-order methods for practical flows. Zingg et al. [96] compared different flux discretization techniques with higher-order accuracy for laminar and turbulent flows (including transition) both in sub-sonic and transonic speed regimes. Extending the conclusion of the previous research, it was shown that the higher-order discretization produces solutions of a given accuracy much more efficiently than the second-order methods. More aspects of implementation of higher-order methods have been discussed by De Rango and Zingg [67], and convergence behavior CHAPTER 1. INTRODUCTION 11 of this approach has been studied in detail. Again a higher-order algorithm has been ap-plied for calculation of the flow around the multi-element airfoil using the multi-block grid technique by the same researchers [68]. A grid convergence study in this research showed that the higher-order discretization produces a substantial reduction in the numerical errors in the flow field in comparison with the second-order algorithm. This smaller error has been achieved on a grid several times coarser than the grid which had been used for the second-order algorithm. In summary, these studies show that achieving the desired accuracy in practical aerodynamic flows using higher-order algorithms not only is possible but also has some advantages. Research in high-order unstructured solvers is motivated by the desire to combine the ac-curacy and efficiency benefits seen in the application of high-order methods on structured meshes with the geometric and adaptive flexibility of unstructured meshes. The application of methods having higher than second-order accuracy for solving the compressible Euler and Navier-Stokes equations on unstructured meshes has not been thoroughly investigated yet and remains an active research topic. Several researchers have achieved higher-order accuracy by the use of the finite element method. Bey and Oden [17] have used a discontinuous Galerkin method to reach fourth-order accuracy for smooth flows. Bassi and Rebay [15] have used a new discontinuous element for discretization of Euler equations and have computed the compressible flow over a simple unstructured grid. The finite volume approach has received more attention, Barth and Fredrickson [12] derived a general condition for a scheme to be higher-order accurate, including a reconstruction pro-cedure satisfying the properties of conservation of mean, K-exactness and compact support (these criteria are discussed in detail in Chapter 3). They also proposed a minimum energy (least-square) reconstruction to calculate the required polynomial coefficients. Delanaye and Essers [23] proposed a quadratic reconstruction finite volume scheme for compressible flows on unstructured adaptive grids. The overall accuracy of the scheme was second-order. The inviscid flux was computed directly from their quadratic polynomials; however, diffusive derivatives were obtained through a linear interpolation. For monotonicity enforcement, a discontinuity detector was introduced and higher-order terms in reconstructed polynomials were dropped in the vicinity of discontinuities. Ollivier-Gooch and Van Altena [60] have analyzed a new approach for higher-order accurate finite-volume discretization for diffusive fluxes that is based on the gradients computed during solution reconstruction; fourth-order accurate solution for the advection-diffusion problem has been computed. Recently Nejat and Ollivier-Gooch [48] developed an implicit higher-order unstructured solver for Poisson's CHAPTER 1. INTRODUCTION 12 equation. They clearly showed the possibility of reducing computational cost required for a given level of solution accuracy using higher-order discretization over an unstructured stencil for certain fluid problems. 1.2.3 Implicit Method and Convergence Acceleration Flow features, especially in physically complicated flows, vary greatly in size. In particular, time and length scales associated with physical phenomena like turbulence and combustion can be very disparate. The use of millions of cells, which is common today in practical simulations, results in global length scales spanning the computational domain which are several orders of magnitude larger than the smallest scales resolved by neighboring grid points. W i t h such disparate length scales, and such large numerically stiff problemes, effi-cient time integration and solution convergence are real challenges for the solution of the resulting discrete systems. Generally explicit integration methods (multi-stage Runge-Kutta family schemes), even with the help of acceleration techniques, such as local time stepping and residual smoothing, still show slow convergence behavior for steady-state solution of large and/or stiff C F D problems. Implicit methods are a fairly common and efficient approach for the steady-state solution of the fluid flow equations. Regardless of the space discretization technique, finding solutions to fluid flow problems requires solving a large linear system resulting from the linearization of fluid flow equations in time (temporal discretization). In the limit of very large time steps, implicit time advance schemes approach Newton's method. Newton methods, which will be discussed in Chapter 5, have been used in C F D since the late 80's and are considered an attractive approach for solution convergence of steady flows due to their property of quadratic convergence (when starting from a good initial guess). In early attempts direct methods were employed for solving the linear system arising at each Newton iteration [85, 89, 8]. While direct solvers have been developed for stiff linear systems, the size of the systems of equations arising in C F D makes applying direct methods impossible in practice. Therefore, using iterative linear solvers with proper preconditioning, which is a crucial factor for complex problems, is the only reasonable choice for solving the linear system in each Newton iteration. At the same time, the cost of each iteration in terms of C P U time and memory usage for a pure Newton method is relatively high. Quasi-Newton methods can have satisfactory convergence behavior, lower memory usage and less cost per iteration at the expense of increasing total number of Newton iterations and losing quadratic convergence rate [62]. Quasi-Newton methods are generally categorized as Approximate Newton and Inexact Newton methods. CHAPTER 1. INTRODUCTION 13 In Approximate Newton methods the flux Jacobian on the left hand side (arising from linearization) either is computed through some simplifications or is evaluated based on lower-order discretization, while the flux integral on the right hand side is evaluated up to the desired order of accuracy. In either case the linearization is done approximately and although the Jacobian matrix has simpler structure and is better conditioned (i.e. easier to invert), the overall convergence rate of the non-linear problem will be degraded. This approach is also known as the defect correction technique, and it is useful when there are memory limitations and storing the full Jacobian is impractical, specifically in 3D. If the true Jacobian is very stiff and solving the linear system of the true linearization is challenging, Approximate Newton may work better than the original Newton method. This is often true in the early stage of Newton iterations when a good starting solution is not yet available. Therefore, Approximate Newton is a very good candidate for the start-up process, especially if the solution process is started from a poor initial guess. In the second category, Inexact Newton methods [25], the complete linearization based on the flux integral on the right hand side is employed and the true Jacobian is calculated. However, the resultant linear system at each Newton iteration is solved approximately by an iterative linear solver. For highly non-linear problems such as compressible flows, the linearized system, especially at the initial iterations is not an accurate representation of the non-linear problem. As a result, completely solving the linear system does not improve the overall convergence rate. Instead, the linear system is solved up to some tolerance criteria which normally is chosen as a fraction (typically between 1 0 _ 1 and 1 0 - 2 ) of the flux integral on the right hand side. Among iterative linear solvers, Krylov subspace methods are the most common, and amongst these, the Generalized Minimal Residual ( G M R E S ) [72] algorithm (visited in section 5.2 in detail) has been developed mainly for non-symmetric systems such as those resulting from unstructured meshes. In the matrix-free G M R E S , the matrix vector products required by the G M R E S algorithm are computed without forming the matrix explicitly. Matrix-free G M R E S [14] is a very attractive technique for dealing with complicated Jacobian matri-ces, because it reduces memory usage considerably and eliminates the problem of explicitly forming the Jacobian matrix. This is especially helpful for higher-order unstructured mesh solvers where full (analytic) Jacobian calculation is extremely costly and difficult, if not impossible. G M R E S efficiency depends strongly on the conditioning of the linear system. This is especially important for higher-order discretization, which makes the Jacobian ma-trix more off-diagonally dominant and quite ill-conditioned, and for the Euler equations (compressible flow) with the non-linear flux function and possible discontinuities in the so-lution. Applying a good preconditioner for G M R E S under these circumstances becomes a CHAPTER 1. INTRODUCTION 14 necessity [49]. As a result, there are a wide variety of Quasi-Newton methods, where the forming of the Jacobian, solving the linear system, choosing the preconditioner and starting up the solution process are the key factors in overall performance and robustness of the solver. For structured meshes, Pueyo and Zingg [63, 64, 65] presented an efficient matrix-free Newton-GMRES solver for steady-state aerodynamic flow computations. They investigated the efficiency of different Quasi-Newton methods, Incomplete Lower-Upper (ILU) precondi-tioning strategies (see section 5.2.3), and reordering techniques for a variety of compressible inviscid, laminar and turbulent flows using a G M R E S iterative solver. They showed that the Approximate Newton method using matrix-free G M R E S - I L U ( 2 ) with Reverse Cuthill-McKee reordering, when the lower order Jacobian is employed as the preconditioner matrix, provides the best efficiency in terms of C P U time for most cases. Later on Nichols and Zingg [54] developed a 3D multi-block Newton-Krylov solver for the Euler equations using the same approach. Through parametric study, they showed that I L U ( l ) provides an adequate balance between good preconditioning and low computational time. For unstructured meshes, Venkatakrishnan and Mavriplis [90] developed an approximate Newton-GMRES implicit solver for computing compressible inviscid and turbulent flows around a multi-element airfoil. They compared different preconditioning strategies and found out that G M R E S with I L U preconditioning had the best performance. In their case the graph of the linearized Jacobian and the unstructured mesh were the same, as the Jacobian was approximated based on the direct neighbors, and ILU(O) demonstrates satis-factory result. Barth and Linton [14] successfully applied both full matrix and matrix-free Newton-GMRES for computing turbulent compressible flow on unstructured meshes. They also presented a technique for constructing matrix-vector products which is an exact calcu-lation of the directional derivatives. Delanaye et al. [24] presented an I L U preconditioned matrix-free N e w t o n - G M R E S solver for Euler and Navier-Stokes Equations on unstructured adaptive grids using quadratic reconstruction. This study shows ILU(O) in preconditioning of stiff problems when the Jacobian on the left hand side is higher-order can be insufficient for reaching full convergence. By permitting one more fill-level in the I L U decomposition (ILU(l)) , full convergence was achieved. A totally matrix-free implicit Newton-GMRES method was introduced by Luo et al. [44] for 3D compressible flows using L U - S G S (Lower-Upper Symmetric Gauss-Seidel) as the preconditioning strategy. They completely elimi-nated the storage of the preconditioning Jacobian matrix by approximating the Jacobian with numerical fluxes. However, most probably because of the stability .consideration for their preconditioning strategy (LUSGS) , the full Newton iterations were not performed and CHAPTER 1. INTRODUCTION 15 the convergence rate remained nearly linear. Manzano et al. [45] presented an efficient I L U preconditioned matrix-free Newton-Krylov(GMRES) algorithm for 3D unstructured meshes. They used different levels of fill (ILU(l-3)) depending on the case and the flux residual to achieve optimum performance. Nejat and Ollivier-Gooch [50] developed a L U - S G S precon-ditioned matrix-free N e w t o n - G M R E S algorithm for higher-order inviscid compressible flow computations. Their results show that L U S G S - G M R E S works almost as efficiently for the third-order discretization as for the second-order one. They also presented a supersonic airfoil case in which the third-order discretization converged faster than the second-order discretization. That again raised the possibility that using higher-order discretization could in fact increase both accuracy and efficiency. 1.3 Objective To develop an efficient and accurate algorithm for a compressible viscous flow (i.e. Navier-Stokes equations), the first step is to develop the base flow solver for a compressible inviscid flow (i.e. Euler equations). It is worth mentioning that the extension of an inviscid solver to the viscous solver will not affect the overall algorithm in principle, although application of anisotropic/hybrid meshes as well as computing the viscous flux function (instead of inviscid one) are required. The goal of this research is to develop an efficient and accurate solution algorithm for inviscid compressible fluid flow simulations using unstructured meshes. To achieve this objective an existing higher-order unstructured reconstruction procedure [60] is combined with an efficient implicit time advance algorithm. 1.4 Contributions A fast and efficient implicit (matrix-free) algorithm is designed and successfully implemented for the 2nd, 3rd and 4th order accurate steady-state computation of inviscid compressible flows. The robustness and accuracy of the developed solver have been verified through several test cases. It should be noted that the 4th-order unstructured finite volume solution for inviscid compressible flows has not been available prior to this research, and the currently available 3rd-order results in the literature lack curved boundary implementation. CHAPTER 1. INTRODUCTION 16 The solution process has been divided into two separate phases, a start-up phase and a Newton phase. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. This research provides a deep insight into preconditioning, a comprehensive convergence comparison, and a meaningful cost breakdown study for the implicit algorithm for the 2nd, 3rd and 4th-order unstructured discretization methods. i A differentiable switch is designed for the limiting-of the higher-order terms in the recon-struction polynomial in the vicinity of discontinuities. Accuracy assessment is performed for a series of independently generated irregular unstruc-tured meshes, which based on the author's knowledge to this extent is unprecedented and proves the overall accuracy of the developed solver algorithm. Unstructured finite volume solution of the 2nd, 3rd and 4th-order methods have been com-pared in detail for subsonic, transonic and supersonic cases. The possibility of reducing computational cost required for a given level of accuracy using high-order unstructured discretization is demonstrated. 1.5 Outline In Chapter 2, the main flow solver is introduced, including governing equations, implicit time advance algorithm (linearization of the flux in time), upwind flux formulation, boundary flux treatment, and flux integration. In Chapter 3, the employed reconstruction procedure, monotonicity enforcement and higher-order limiter implementation are discussed; a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced including a smooth switch. In Chapter 4, the Jacobian matrix calculation used in preconditioning is described. A first-order approximate analytical Jacobian is derived for the interior and boundary fluxes. Also, a similar low-order Jacobian computation using the finite difference technique is introduced in addition to further discussion on the accuracy of this finite difference Jacobian. Chapter 5 lays out the solution strategy and introduces the matrix-free' G M R E S linear solver (with special higher-order considerations) and the applied preconditioning in detail. CHAPTER 1. INTRODUCTION 17 Chapter 6 and 7 are devoted to the numerical results which include the verification cases (Part I) and simulation cases (Part II) respectively. First the accuracy and performance of the developed solver are verified for basic (not necessarily easy) test cases. Then fast con-vergence and robustness of the proposed higher-order unstructured Newton-Krylov solver have been investigated for subsonic, transonic, and supersonic flows. Furthermore solutions of different orders of accuracy for all test cases have been compared in detail. Finally, in Chapter 7, the thesis is brought to closure by summarizing the research, describ-ing the contributions, providing conclusions based on the results, and recommending some future work. Chapter 2 Flow Solver 2.1 Governing Equations Conservation of mass (continuity), momentum (Newton's second law) and energy (first Thermodynamics's Law) are the principal equations which govern the dynamics of all fluid flows. To simulate a fluid flow field, we need to solve these equations with proper implemen-tation of physical boundary conditions. If dissipation, viscous transport phenomena, and thermal conductivity are neglected in a fluid flow, the flow is considered to be non-viscous or inviscid. Leonhard Euler, the Swiss scientist for the first time derived (1775) inviscid fluid flow equations, known today as Euler equations. Euler equations are the limit form of the Navier-Stokes equations, where Reynolds number goes to infinity. For many practical aerodynamic applications, Euler flow is a relatively accurate representation of the flow field and includes both rotational and discontinuous (shock) phenomena in the flow providing an excellent approximation for lift, induced drag and wave drag. Also, a robust Euler solver is an essential part of any Navier-Stokes solver. The finite volume formulation of the unsteady 2D Euler equations for an arbitrary control volume can be written in the following form of a volume and a surface integral (2.1), where U is the solution vector in conservative variables, and F is the flux vector 1 . (2.1) 1 More formally, F should be referred as a dyad. 18 CHAPTER 2. FLOW SOLVER 19 where p u = . pu , F = pv E puun + Pnx pvun + Pny (E + P)un (2.2) In (2.2), u n = unx + vhy and [p pu pv E]1 are the densities of mass, x-momentum, y-momentum and energy, respectively. The energy is related to the pressure by the perfect gas equation of state, (i.e. P = pRT). The following equations (or definitions) correlate thermodynamic properties such as energy and enthalpy with density, temperature, pressure and velocity in the Euler equations: E = P 7 - 1 + P1 (u2 + v2) (2.3) 7 = Cp r - ^R Cy R 7 - 1 ' (2.4) 1 P e = CvT, et = e+ ~(u2 + v2), E = pet, h = e + - , (2.5) A p In (2.3), E is the total energy and 7 is the ratio of specific heats. In (2.4) R is the gas constant, Cp and Cv are the specific heats. Finally in (2.5) e is the internal energy per unit mass (specific energy), et is the total energy per unit mass, and h is the specific enthalpy. For a compressible flow, most of the flow properties are described as a function of Mach number, M, which is a non-dimensional form of the flow speed, and it is defined as the ratio of velocity magnitude to the local sound speed, a. = V(7 - l)h = ^RT (2.6) Normally for aerodynamic applications, density is in the order of 1, velocity is in the order of 102, and energy is in the order of 10 5. As a result, the solution vector, U, and the flux vector, F, (2.2) in their dimensional form have quantities with very different scales. This CHAPTER 2. FLOW SOLVER 20 introduces tremendous potential for numerical errors, as well as unnecessary stiffness of the resultant linearized system of equations (next section). To avoid that issue we need to normalize our system of equations, in such a way that all quantities become order of one. This can be done by normalizing the quantities respect to some properly chosen reference values, like free stream values. While normalization does not change^the form of the original equations, it transforms them to the proper non-dimensional form. However, some of the correlations between the properties may look different. For example, in the next couple of lines we show part of this process for Euler equations: Reference quantities: Pief - Poo, Tre{ = T o o , P r e f = 7 P 0 0 , F-ref = Roo = Const. (2.7) Non-Dimensional quantities: p„ = A" Tn = ^ , Pn = -£-, Rn = ~ = l (2.8) Pref ref ref -^ref It is not difficult to show that the speed of sound in its. non-dimensional form is equal to the square root of normalized temperature, i.e. an = y/Tn . In general, the mathematical characteristics of the fluid flow equations depend on the flow speed regime and they change with Mach number. For instance the system of steady Euler equations are elliptic in space for subsonic flow and they are hyperbolic in space for supersonic flow. If the fluid flow equations are written in their unsteady form, they would be hyperbolic in time independent of the speed regime. As a result, it is relatively easy to integrate fluid flow equations in time instead of space. The set of unsteady Euler equations are a first-order nonlinear P D E , which make a hyperbolic system in time, and therefore properties propagate along the characteristic lines. To obtain a steady-state solution time marching process would be continued up to the point that all time derivatives reach some tolerance criteria. 2.2 Implicit Time-Advance Assuming the discretized physical domain does not change in time, U can be brought out from the integral in E q . (2.1), as the average solution vector of the control volume: CHAPTER 2. FLOW SOLVER 21 ^ = - - L - / FdA (2.9) dt AcViJcSi The left hand side of (2.9) is the first time derivative of the average solution vector in each control volume and the right hand side represents the spatial discretization for the same control volume. Right hand side of (2.9) is called flux integral or residual of the control volume, which is a nonlinear function of solution vector, and can be re-written in the form of (2.10) for control volume i : d § = -R(u>) (2.10) To form an implicit expression for (2.10), we need to evaluate both sides of the equation at the time level "n+1". This can be done by backward time differencing (backward Euler) of the left hand side and residual linearization in time for the right hand side. n+l UJ1 At -R(U?+1) = -BR R(ur) + —(u?+\ ur) + o((u?+i-u?)2) (2.11) or (ii+ w)SUi = ~R(un> u?+l = u?+6U> (2-12) where | ^ is the Jacobian matrix resulting from residual linearization. Equation (2.12) is a large linear system of equations which should be solved at each time step to obtain an' update for the vector of unknowns. Wi th this approach the accuracy in time is first order; however, as we are only interested in steady-state solution, time accuracy is not our concern, and advancing the solution in time continues till the residual of the non-linear problem practically converges to zero. In E q . (2.12), the Jacobian matrix, is an essential element in implicit formulation, and setting Jacobian equal to zero is equivalent to explicit time advance. Note that in the case of Euler flux, Jacobian calculation is not a trivial task and it takes a considerable amount of effort. The degree of difficulty of Jacobian evaluation depends on the type of. the employed flux formula and its complexity. The details of Jacobian evaluation will be discussed in CHAPTER 2. FLOW SOLVER 22 Chapter 4. Having the Jacobian computed, solving the resultant large linear system in each time iteration which is often very ill conditioned and off-diagonal is another part of implicit solvers. Due to size and sparsity structure (graph) of the matrix, subspace iterative methods are the most appropriate choice for solving such a system (Chapter 5). Coding an implicit formulation is much more complicated than similar explicit formulation and efficient parallelization of implicit solvers is a relatively difficult process. However, implicit methods have a critical advantage over explicit methods which is stability and fast convergence. Implicit methods do not suffer from restrictive stability condition of explicit methods. In theory they are unconditionally stable and by taking large time steps steady-state solution is reached rapidly. If we take an infinite time step E q . (2.12) becomes Newton iteration (2.13). Newton's method generally converges quadratically in the vicinity of solution. However, near singularities the convergence rate may degrade from quadratic rate to linear rate [14]. In practice, taking a time step too large is not beneficial when the solution process is started from a poor initial guess. Because Euler flow is highly non-linear, the linearization of the flow equations is not an accurate representative of the original non-linear problem at an early stage of the iterations. Therefore any update based on a large time step (such as Newton update) would be inaccurate, often causing instability or stall in convergence. Normally, a start-up process is needed to advance the solution from an initial guess to a good solution state for taking large time step and eventually switching to Newton iteration. One strategy for this start-up process is defect correction, in which the flux integral on the right hand side is computed to the desired order of accuracy while on the left hand side the linearization is done based on a lower-order discretization (normally first-order): Wi th this approach, the higher-order Jacobian computation which is very expensive and inaccurate in early stage of iterations is avoided. As the linear system based on lower-order discretization has simpler structure in terms of the matrix bandwidth and is better condi-tioned, solving the resultant.system is easier and more stable especially for stiff Jacobians. The defect correction linear system is relatively inexpensive to form and it can be precon-ditioned effectively by the same matrix. In addition, early high-frequency oscillations are (Wf)6Ui = -R(U?), U^+1 = Ur + SUi (2.13) (2.14) CHAPTER 2. FLOW SOLVER 23 •damped efficiently when a moderate time step is used which is necessary anyhow for the start up process.. Another start-up technique is using E q . (2.12) based on higher-order Jacobian but taking small time steps at early stage of solution process. W i t h this approach At is small, the linear system would be diagonally dominant or at least better conditioned, and consequently easier to solve. The drawback is in the case of complicated flows like transonic flow, higher-order Jacobian would be very ill-conditioned, and to be able to solve the linear system we need to choose very small At, which may not accelerate the solution convergence efficiently. In practice one may combine the above strategies with mesh sequencing, starting the solution process on coarser meshes and then transferring it to a final fine mesh. First-order solution also could be used as an initial guess either independently or in combination with other strategies. In general, it is hard to say which strategy or combination will work best for the start-up process as it depends on the problem and type of linear solver (Chapter 5). 2.3 Upwind Flux Formulation To compute the control volume residual or flux integral, we need to evaluate the numerical flux at Gauss points located on control volume boundaries and integrate along the boundary edges. In other words, residual or flux integral computation includes three steps: 1. Reconstructing solution vector V (in primitive variables) over each control volume up to the desired accuracy (next chapter). 2. Computing the numerical flux vector at the control volume interfaces based on the reconstructed V . . 3. Integrating numerical flux along the control volume interfaces using Gauss quadrature rule. To describe an upwind scheme, it is helpful to start from a first-order linear wave equation, Eq.(2.15), where A is the wave speed. For a positive value of A, this represents the propa-gation of a wave from left to the right (positive direction) along the x axis, as it is shown in Fig(2.1). (2.15) CHAPTER 2. FLOW SOLVER 24 i-1 - • — Propagating Wave i KM I X u2 X Figure 2.1: Propagation of a linear wave in positive direction Based on the physics and the model equation, it is evident that the information in the field is propagating with the wave speed and in the wave direction. Therefore, the properties at point i are influenced by the propagated properties originating from point i—1 and properties at point i + 1 will not physically affect the properties at point i. As a result it makes sense that in discretization of wave equation, | ^ is replaced by U i ~ £ x ~ l or by any other type of backward differencing. It is well understood that using central or forward differencing would result in non-physical and/or oscillatory behavior and typically instability in solution. In other words, any numerical scheme used in discretization of the fluid flow equations should respect the physical characteristics of the fluid flow, i.e. the velocity and direction of the propagated information throughout the flow field. The unsteady Euler equation, which is hyperbolic in time, behaves somewhat similarly to the wave equation, although it is much more complex, has non-linearity, and is in vector form. In Euler (inviscid compressible) flow, information travels along characteristic lines, and the slope of these lines are the eigenvalues of the Jacobian matrix, which is the derivative of the Euler flux vector with respect to the solution vector, E q (2.16 ). CHAPTER 2. FLOW SOLVER 25 0 nx ny 0 9F_ _ (7 - l)efc"x - uun un - (7 - 2 )un x u n y - (7 - l)vnx (7 - l ) n x dU ( 7 _ l ^ n , , - vun vnx - (7 - l ) u n y u „ - (7 - 2)vny (7 - l)ny ((7 - l)efc - ht)un htnx - (7 - l ) u u n htny - (7 - l ) u u n 71% (2.16) et — e + ek, ek = -(u2 + v2), ht—.h + ek a u„ (2.17) These eigenvalues, E q (2.17), describes the direction and speed of disturbance (information) propagation in the flow field. Obviously, the upwinding direction is determined based on the direction of these characteristic lines. Several categories of upwind methods such as Flux Vector Splitting (FVS) , Flux Difference Splitting (FDS) and Advection Upstream Splitting Method ( A U S M ) have been developed during the last three decades for flux evaluation and have been successfully employed in compressible flow computation. Hirsch [35] and Laney [41] reviewed upwind schemes in detail. 2.3.1 T h e G o d u n o v A p p r o a c h S.K. Godunov in 1959 introduced an elegant idea in numerical flux calculation [35]. He suggested the use of the locally exact Euler solution for the boundary interfaces of domain cells. The solution in each cell is considered to be piecewise constant, and the cell interface separates two different solution states, i.e. UL and UR (Fig ( 2.2)). Knowing the initial solution, evolution of the flow to the next time step can be exactly calculated through the wave interactions originating at the boundaries between two adjacent cells. This leads to solving the shock tube or Riemann problem. The Riemann problem has an exact solution composed of a shock wave, a contact discontinuity and an expansion fan separating regions with constant solution states. Propagating the aforementioned waves over a time interval A t changes those regions and their states from one solution at t = t\ to t = t\ + At. A l l waves propagate consistently with the upwind directions, making the resultant solution depend only on the physical zone of dependence. As shock tube problem requires solving a non-linear algebraic equation, which is quite time consuming, implementing an exact CHAPTER 2. FLOW SOLVER 26 t=tj Discontinuity Interface u R P L > P R t=ti+At Expansion Waves 1:L — Contact'Surface LU ' LH I Shock Wave — 4:R Figure 2.2: Shock-Tube problem Riemann solver for upwind flux evaluation would not be efficient2. Therefore, different approximate Riemann solvers have been developed to reduce the computation cost through some averaging procedure. Roe's approximate Riemann solver, introduced in the early 1980's [69] is one of the most successful and commonly employed approximate Riemann solver in modern C F D is used in this research. 2.3.2 R o e ' s F l u x Di f fe rence S p l i t t i n g Scheme Roe's approximate Riemann solver [70] is based on a characteristic decomposition of the flux differences and is a clever extension of the linear wave decomposition to the non-linear shock tube problem keeping the conservation properties of the scheme. The detail of Roe's scheme is quite complicated [41] and here for brevity just the flux formulation is presented. The Roe's flux formula at the interface of a control volume is equal to the average fluxes of left and right states minus a differencing term which splits the difference of the fluxes on both sides of the control volume: F(UL, UR) = \ \F{UL) + F(UR)} - \ \A\ (UR - UL) (2.18) 2 There is some evidence suggesting that an exact Riemann solver for polytropic gases can be competitive with most approximate techniques [31]. CHAPTER 2. FLOW SOLVER 27 A is the Jacobian matrix evaluated based on the Roe's average properties, and in diagonalized form in practice as: A A x- A X A = Diag(Xi) is written (2.19) where X are the right eigenvectors and A are the eigenvalues of the Jacobian matrix [71]. Roe's average properties are given by [69]: P = sJpLPR JpLUL + Jp~RUR h, = VP~L + \ /PR \/p~LVL + \ /~PRVR VPL + \ /PR y/p~LhtL + j~PR~htR VPL + s/PR (2.20) Although Roe's scheme is one of the most accurate available flux functions, it is quite an expensive method for flux computation and due to its matrix differencing term it requires 0 ( n 2 ) operations per control volume in each iteration (n is number of equations). If a real gas is used, then differencing term should be derived specifically based on the new equation of state, which could introduce some difficulty in derivation of differencing matrix. At the same time applying the Roe's flux formula in implicit context, requires differentiation of differencing matrix which has proven to be very tricky and expensive even for perfect gas (Chapter 4). A n d finally some empirical cure needs to be implemented in Roe's formulation in the case that Mach number goes towards one (sonic speed) or when Mach number goes to zero (stagnation point). In either case, we are dealing with eigenvalues equal to zero and the method can not distinguish the correct upwind direction resulting either non-physical solution (entropy reduction) or lack of convergence due to its non-differentiable flux. Adding a small value of e to eigenvalues, or rounding the slope of characteristic lines near zero, which is the taken approach in this thesis E q . (2.21), could totally resolve that issue (Fig(2.3)). if |Aj| < C c u t 0 f f A c i = (A 2 + C 2 ) Oi (2.21) CHAPTER 2. FLOW SOLVER 28 Eigenvalue correction 0.2 -\ 0.15 \ " •Cut-Off=0.l|/ \ 0,1 0.05-X Figure 2.3: Rounding the characteristic slope near zero 0.5 C \ = T T - — , C2 = C c 2 u t o f f , C c u t o f f « 0A(typically) ^cut off 2.4 B o u n d a r y F l u x T rea tment Implementing correct boundary conditions is a necessary condition for numerical fluid flow computation. Generally speaking, any kind of mistake in boundary condition treatment would result in either convergence problem and/or inaccurate solution. This is especially true if the imposed boundary condition does not match the flow physics. For an implicit solver it is also essential to formulate the boundary treatment implicitly both for the main solver and for the preconditioner (Chapter 4), or performance of the implicit solver will be degraded severely in addition to introducing a restrict C F L limitation. Normally boundary conditions are categorized as wall boundary and inlet/outlet boundary conditions, which are discussed in the following subsections. 2.4.1 Wall Boundary Condition In steady-state inviscid flow, there is a slip velocity condition: velocity normal to the wall surface is zero (i.e. velocity at the boundary is parallel to the surface). As a result the CHAPTER 2. FLOW SOLVER 29 convective flux normal to the surface is zero (flow does not cross the wall) and the pressure flux in the momentum equations is the only remaining flux. Both the continuity and energy fluxes are zero at steady-state. T h e easiest way to implement wall boundary flux in Euler flow is to set Un — 0 in the flux formula. There are other alternative wall treatments like reflective boundary conditions using ghost cells, or reconstruction of flow properties (Chapter 3) with a constraint of Un = 0. A l l those treatments more or less produce the same result as in all cases the convective flux is forced to be zero. In the current research, normal velocity is set to be zero using a proper constraint in reconstruction, and pressure is reconstructed up to the desired order of accuracy at the wall. 2.4.2 I n l e t / O u t l e t B o u n d a r y C o n d i t i o n s As mentioned in section 2.3, the numerical modeling should be consistent with the physical characteristics of the flow field, and with the information propagating along characteristic lines (according to eigenvalues of the Jacobian matrix). The inflow and outflow boundary condition implementation are completely different for subsonic and supersonic flows, there-fore the first thing is to determine the type of boundary condition. That can be easily done by computing the flow quantities (primitive variables) at the boundary using reconstructed values and working out the Mach number and flow direction with respect to the bound-ary surface. Knowing the type of the boundary, a proper boundary treatment needs to be carried out. For a subsonic inlet three eigenvalues are positive, meaning that three characteristic lines are transferring inlet information from inflow toward the solution domain. The remaining eigenvalue is negative at subsonic inlet i.e., one set of information travels from the solution domain towards the inlet. In this situation three parameters, in our case, total pressure, total temperature, and angle of attack are set at the subsonic inlet and static pressure is taken from the interior. T h e following formulas show the detail of computing the velocity components and density at subsonic inlet. T, = l + l ^ A 4 , 1^=1.0 (2.22) P« = i ( l + l ^ M i ) * , P „ = H (2.23) CHAPTER 2. FLOW SOLVER 30 TB = Tt(^^j 1 (2.24) P^PRB : Reconstructed Pressure at the Boundary V-=V4T(I- 11 (225) Ub = Vmcos(a), Vb = Vmsin(a) • (2.26) a : Angle of Attack Pb = l^ (2-27) At a supersonic inlet all eigenvalues are positive which means all boundary fluxes are com-puted based on the inlet properties ( a o o = 1)-Pb = Pin — Poo , Pb = Pin = Poo (2.28) ub = uin = Moacos(a), Vb = vin = Moosm(a) (2.29) At a subsonic outlet, three eigenvalues are negative and one is positive (normal is toward the solution domain) and therefore three pieces of information are taken from the solution domain (through reconstruction) and only the static pressure is fixed at the outlet as the back pressure. Pout = PRB , U O U T = URB , Vout = VRB , Pout — PBP (2.30) PBP — Poo for external flow CHAPTER 2. FLOW SOLVER 31 2nd-Order / 1 Gauss point per face 3rd and 4th-0rder / 2 Gauss points per face Figure 2.4: Schematic illustration of Gauss quadrature points In the case of supersonic outflow, since all eigenvalues are negative, all the information is coming from the solution domain towards the boundary, and no condition is set at the supersonic outlet. , Pout = PRB , U O U T = URB , Vout = VRB , Pout — PRB r (2-31) In all cases that implementing boundary condition requires properties to be taken from the solution domain, the higher-order reconstruction of those properties have been used to insure the correctness of accuracy of boundary condition treatment. 2.5 Flux Integration To compute the residual for each control volume, numerical fluxes should be integrated over control volume edges. T h e flux residual is the summation of these flux integrals. The accu-racy of flux integration should be equal or higher than the accuracy of flow reconstruction in order to evaluate the residuals with higher-order accuracy. This is achieved by Gauss quadrature integration. For the 2nd-order or linear reconstruction case one quadrature / CHAPTER 2. FLOW SOLVER 32 point per face is used. For the 3rd-order (quadratic reconstruction) and 4th-order (cubic reconstruction) cases, we use two quadrature points per face with the proper weightings. In the case of a curved boundary, the quadrature points are located on the curved arc segment instead of the straight edge for higher-order flux integration. This is due to the fact that mistaking a curved segment with a straight edge will introduce a 2nd-order error (i.e. 0(5S2), S is the arc length) in both computing and integrating the fluxes'at curved . boundaries. F ig (2.4) shows the quadrature points for the flux integration schematically. More information regarding the locations and weights of the Gauss quadrature integration points is provided in Ref. [80]. Chapter 3 Reconstruction Procedure and Monotonicity Enforcement 3.1 Introduction To compute a higher-order solution, numerical fluxes should be evaluated with higher-order accuracy containing two separate steps: first, calculating all flow variables anywhere inside a control volume up to the desired order of accuracy; second, higher-order integration of the evaluated fluxes over boundary edges using the proper number of Gauss points. The latter is relatively easy, but higher-order approximation of flow variables inside a control volume is neither unique nor trivial. This task could be done by implementing higher-order upwind differencing, and/or higher-order representation of the solution inside a control vol-ume (i.e. solution reconstruction) using polynomials with degree one or higher [82]. In structured discretization, this leads to the development of M U S C L (Monotone Upstream-centered Schemes for Conservation Laws) [35] . For structured meshes, higher-order re-construction of a solution is much easier due to mesh regularity. However, in the case of unstructured meshes a more complicated procedure needs to be adopted to represent the solution with higher-order accuracy within a control volume. Barth [13] laid out an straight forward procedure for gradient estimation for unstructured tessellation using Green-Gauss gradient technique, Because of its simplicity and robustness, the linear Green-Gauss reconstruction has been employed extensively in finite volume solvers since then. Barth also developed a minimum energy (i.e. Least-Squares) reconstruction procedure capable to computing the flow variables up to (K-fl)th-order of accuracy using 33 CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 34 a Kth-order polynomial known as K-exact reconstruction [10, 12, 11]. The least-square reconstruction is relatively more difficult to implement but in general the linear least-square reconstruction is more accurate especially in the case of distorted meshes and/or in the presence of limiter [4]. That is expected since the least-square technique is not as sensitive as the Green-Gauss method to noise in the solution. Delanaye [22] has compared the accuracy of the linear and quadratic reconstruction techniques for a smooth flow (Ringleb's flow) on unstructured meshes. The linear least-squares reconstruction performed better than the linear Green-Gauss reconstruction both in terms of accuracy and preserving the total pressure (i.e. less entropy production). The quadratic Green-Gauss and least-square reconstructions behaved very similarly, although this time the total pressure loss was a bit higher for least-square method. In this research a K-exact least-squares reconstruction [60, 56] is employed which is de-scribed in the next section. In section 3.3 the accuracy assessment methodology is studied. Finally, section 3.4 discusses the monotonicity enforcement and introduces the taken limit-ing approach to address that issue. 3.2 K-Exact Least-Square Reconstruction The goal here is to reconstruct a solution polynomial based on the control volume data such that the truncation error of the solution in each control volume would remain in the order of (Ax)K where (Ax) is the local mesh length scale. Obviously, certain physical and mathematical criteria need to be met to reconstruct such a polynomial for each control volume. These criteria are: 1- Conservation of the Mean, 2- K-Exact Reconstruction, 3-Compact Support which are introduced briefly in next three subsections. In the last two subsections the implementation of boundary condition at Gauss points using boundary constraints, and solving the least-square system will be discussed. 3.2.1 C o n s e r v a t i o n of the M e a n The finite volume approach provides us piecewise constant data in each control volume which is equal to the integral control volume average. This integral average is updated from one iteration to the next and is the most meaningful measure of the physical quantities in each control volume. A Kth-order piecewise polynomial UR (x,y) can be reconstructed in such a way that the integral of URK\x,y) over the control volume is equal to the control CHAPTER 3: RECONSTRUCTION AND MONOTONICITY 35 volume average E q . (3.1). This is a very important property called Conservation of the Mean. f URK)(x,y) = UCv (3.1) Jcv (x - xc,y - yc) (3.2) m+n</c P(m,n)(x-xc,y-yc) = (x- xc)m(y - yc)n (3.3) (xc,yc) are the coordinates of the reference point for the control volume, which in our case (cell center formulation) is the cell centroid, Fig (3.1). In general, oc^mn^ are the derivatives of a continuous function represented by the recon-structed solution polynomial (URK\x,y)) with respect to the independent geometric vari-ables (i.e. x and y) : U{£\x,y) = URK\x,y) = U(xc,yc) + dU dx d2u dx2 d3u A x 2 + A x + — c dv d2U &y + c c dx3 Ax3 dxdy d3u AxAy + d2U dy2 A y 2 c c dx2dy Ax2 Ay d3U 2 dxdy2 c + AxAy2 d3U 2 + ~dy~3 Ay3 c (3.4) where, Ax = x-xc, Ay = y-yc (3.5) Imposing the conservation of the mean condition, or mean constraint is accomplished by integrating of URK\x,y) over the control volume i and making that equal to the control volume average i.e. E q . (3.1). For simplicity all the derivative terms in the following equations are replaced by simpler expressions like: Uxy = . Figure 3.1: A typical cell center control volume and its reconstruction stencil, including three layers of neighbors CHAPTER 3. RECONSTRUCTION AND MONOTONICITY ; 37 Equation (3.6) can be rearranged by introducing control volume moments definitions: Ui = Uc + -]-(Ux\cMx + Uy\cMy).+ UXX\C + UXy\C MXy + l Uyy\C My2 ) + UXXX\CMX3 + - UXXy\CMx1y + - UXyy\CMxy2 + ~ Uyyy^MyZ ) + where, (3.6) M x m y n \ = f (x- xc)m(y - yc)ndA (3.7) JCVi These moments can be computed up to the desired order of accuracy by Gauss's theorem. Notice that replacing m and n in E q . (3.7) by zero computes the area of the control volume. The introduced mean constraint is the first row (equation) in the least-square system. 3.2.2 K - E x a c t R e c o n s t r u c t i o n Consider a problem which is smooth and has an exact solution in the form of C/^(a;,j/). If (uRK\x,y)\ is a Kth-order polynomial reconstructing an approximate solution for the problem over control volume i, then the error in solution reconstruction for the control volume i would be in the order of 0(Axi)k+1 E q . (3.8). URK)(x,y)) =UE(x,y)+0(L\x, \k+l (3.8) In other words (UrK^ (x, y)^ reconstructs the solution exactly if the exact solution is a K t h -order polynomial, i.e. UE{X, y) = Pg (x, y). This property is called K-Exact Reconstruction. This property also can be expressed in the integral form: Error\cv. = UE(x,y)dA-CVi JCVi URK)(x,y)dA 0{Axt) k+l (3.9) CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 38 3.2.3 Compact Support Notice that the reconstructed polynomial, (URK)(x, y)^ , is piecewise and changes from one control volume to another. Therefore integrating the reconstructed function of a control volume over its neighbors does not satisfy the average value of the neighboring control volumes exactly. For the Kth-order solution reconstruction, we are reconstructing a (K- l ) th degree polynomial which satisfies the mean property for the control volume i and minimizes the error in computing control volume averages of the neighboring control volumes. The neighboring control volumes used in reconstruction constitute the reconstruction stencil, Fig (3.1). The reconstruction stencil should include a proper number of neighboring control volumes for computing the mentioned derivatives. The number of unknowns for the linear, quadratic, and cubic reconstructions are 3, 6 and 10 leading to 2nd, 3rd and 4th-order accuracy. Generally speaking, for a K - t h order accurate reconstruction, at least K[K + l ) /2 neighboring control volumes must be included in the reconstruction stencil. However, for practical reasons one may use a few more control volumes in the reconstruction stencil. This leads to a larger least-squares system but provides additional information for reconstruction usually resulting in more robust solution reconstruction in the presence of non-smooth and/or vigorously oscillatory data. In this research, the reconstruction stencils for 2nd, 3rd, and 4th-order methods include* 4, 9 and 16 control volumes respectively. Not surprisingly, the polynomial (uRK\x, y)^ for control volume i should be reconstructed based on control volumes that are topologically and physically close to the control volume i. This property is called Compact Support. Consequently, the 4th-order reconstruction stencil covers three layers of the neighboring control volumes. This could adversely affect the compactness of the reconstruction for very coarse meshes. Since for very coarse meshes the distance spanned by the reconstruction stencil is uncomfortably close to the characteristic size of the domain, full convergence might not be achieved as reconstruction stencils of the boundary control volumes overlap considerably with the reconstruction stencils of interior control volumes. Because the size of the stencil and the associated cost of the reconstruction increase quadratically with the order of accuracy, there is a practical limit to the benefits of the classic finite-volume higher-order reconstruction. In reconstructing a non-smooth data in the vicinity of a discontinuity such as shock wave even when we use a compact stencil geometrically, the stencil is not compact physically as the data employed in reconstruction are irrelevant for control volumes across the shock wave. There are two different approaches to address this issue. One is choosing the best stencil containing the smoothest possible data and/or down playing the role of non-smooth data in reconstruction by assigning small weights to them leading to E N O (Essentially Non Oscillatory) and W E N O (Weighted Essentially Non Oscillatory) schemes [33, 43, 3, 57]. CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 39 Figure 3.2: Imposing boundary constraint at the Gauss boundary points The other passive approach is to employ a limiter and enforce monotonicity by correcting (reducing) the computed gradients, suppressing over and under-shoots resulting from recon-struction over a non-physically compact stencil [13, 87, 24]. The latter approach has been taken in this research. ' 3.2.4 B o u n d a r y C o n s t r a i n t s In addition to the Mean Constraint, it is also possible to include additional boundary constraints to the reconstructed polynomial satisfying physical boundary condition at the solution domain boundaries [80, 60]. The straightforward way of doing this is imposing the boundary constraint at Gauss points where fluxes are actually calculated. Therefore, the coefficients of the reconstructed polynomial are computed such that the physical boundary condition is imposed automatically. As a result, number of boundary constraints per control volume is equal to the number of control volume Gauss points on boundary edges, Fig(3.2). 3.2.4.1 Dir ichle t Boundary Constraint For a Dirichlet boundary condition, the value of the flow variable is known at the boundary. Consequently, the reconstructed polynomial should give us the same boundary value at the Gauss points (3.10): CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 40 (xGi,yG1) = UDB(xGl,yGi), URn)(xG2,yG2) = UDB(xG2,yG2) (3.10) T(K) 3.2.4.2 Neumann Boundary Constraint In Neumann boundary condition the normal derivative of the flow variable is known at the boundary edge (3.11). Therefore, we need to take the normal derivative from the reconstruction polynomial at the boundary Gauss point and equate that to the known normal derivative provided by Neumann boundary condition (3.12 and 3.13): dU dn U, NB d U R K \ x G l , y G l ) ( M f f W y c , ) , . , „ , , — ' -("x)G! + ^ (%JGi = UMB^XG^yGi) dx dy (3.11) (3.12) dURK\xG2,yg2 dx C)G2 + dU[R \ x a 2 , y G 2 dy -{hy)G2 = U N B { X G 2 , V G 2 ) (3.13) 3.2.5 C o n s t r a i n e d Leas t -Square S y s t e m Knowing the constraints and the neighbors involved in the polynomial reconstruction for control volume i , we can set up the least-square system in the form of Eq. (3.14): MiCltl MiClt2 MtC1>3 M ; C M . • MiChq BiC2,\ BiC2:2 BiC2fl BiC2t4 BiC2tq ^,1^1,1 1»itlNit2 Wi,lNit3 Wi,\NiA . • • Wi.iNi^ m,2N2,i Wi,2N2<2 ^1,2^2,3 Wi,2N2,4 • • Wi)2N2tq ^,3-^3,1 Wi,sN3t2 m,3N3)3 Wi,3N3A . • wi<3N3>q Wi,nNn,i Wi,nNn,2 Wi,nNn>3 Wi,nNnt4 . 1Ui,nNn,q XQi - XCJ I Ua \ c dU_ dx dU_ dy d2U OX? d2U dx dKU Bd Wi,iUNl Wi,2&N2 m,3UN3 \ m,nUNn J (3.14) (3.15) CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 41 In E q . (3.14) the first two rows constitute the mean and boundary constraints which have to be enforced exactly. The rest of rows are additional equations arising from integration of the i control volume's polynomial over its neighbors {N\, A ^ , J V 3 , N n } which need to be satisfied in the least-squares sense, minimizing the L2 norm of the system residual. Of course, for interior control volumes there is no boundary constraint and the second row does not appear in the least-square system. A weighting strategy is also considered in forming the least-square system. Weights can be a function of geometric parameters and/or the solution data [10, 57]. The main purpose of these weights is to control the influence of the neighboring solution data on reconstruction. In this research only geometric weighting in the form of E q . (3.15) with t — 1 is used, weakening the effect of the solution data far from the reconstructed control volume by the geometric distance ratio. To satisfy the constraint exactly, Gaussian elimination with column pivoting is performed on (3.14) for as many rows as the number of constraints in the system. The remaining least-square system is reduced to a upper-triangular system using Householder transformation [92] with the proper scaling, and finally the system is solved through backward substitution. 3.3 Accuracy Assessment for a Smooth Function Error norms are the most common measure for accuracy assessment of a numerical solution. They provide useful information about over all accuracy of the numerical solution as well as local source of the (maximum) error. For a 2D finite volume formulation, the general form of the error norm can be computed by Eq . (3.16), where p is the norm index, N^ is the total number of control volumes in the domain, Ai is the area of the control volume i, and Ei is the average error of the control volume i. L\ and L2 are the global norms and are good indicators for over all accuracy. However, LQO would be the largest magnitude of the error in the solution domain and it is a local error indicator. (3.16) (3.17) max Ei CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 42 Having a proper tool for error measurement, now we proceed with the accuracy assessment discussion. As previously mentioned (3.2.2), a K-exact reconstruction reconstructs the solution to the (K-(-l)th order of accuracy. To verify the accuracy of our reconstruction technique, we need to examine it for a smooth function. Here a simple but very common procedure is introduced to assess the accuracy of the numerical solution over discretized domains. Assume a 1-D finite volume problem over a uniformly discretized domain of M l with the length scale of Ax. The average numerical solution for an arbitrary control volume i of M l can be written in the following form: °i]m = 4 z [ UE(x)dx + 0(Ax)K (3.18) where, Ui is the control volume average, Ug(x) is the exact solution of the problem and K is the discretization order (i.e. the polynomial of (K — 1) degree is used in the solution reconstruction). Then the average solution error for each control volume can be expressed as: Therefore a similar relation is held in the case of taking any norm of the error ( | | J E | | P = 1 2 over the same domain: | | £ | M 1 | | = 0 ( A * ) * (3.19) Now, if M2 represents the discretization of the same problem over the similar domain where the discretization length scale is divided by half ((^p)), the error norm on M2 is: | £ ? | M 2 | | = 0 ( ^ (3.20) Y and the ratio of these two error norms would be only a function of the discretization order: E \ m \ \ _ 0 ( A x ) « = 2 K ( 3 2 1 ) \ \ E \ M 2 \ \ - Oft)* To examine the order of discretization accuracy, employing (3.21), some error norms must be plotted in logarithmic scale versus of mesh size, where the mesh length scale is uniformly CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 43 reduced each time by half. The asymptotic slope of the Error-Mesh plot will show the numerical order of accuracy. This analogy can be extended to multiple dimensions as long as the mesh length scale is decreased each time uniformly in each dimension all through the domain. For unstructured meshes and in general case, it is nearly impossible to decrease the mesh length scale uniformly. Even if we try to refine the whole mesh globally in a self similar way (all angles and ratios remain the same in the refinement process) it would not be feasible for all triangles (except for structured triangulation of regular geometries). Therefore in general, the best that we can do is using a series of semi-uniform unstructured meshes where the density of mesh each time is increased by factor of four. Wi th this approach we hope that the over all mesh length scale is reduced by factor of two. Based on the E q . (3.21) the ratios of 4, 8, and 16 for error norms are expected for 2nd, 3rd, and 4th-orders of accuracy should we globally reduce the mesh length scale by half. But of course in practice, we are neither dealing with uniform unstructured meshes nor can we refine them uniformly everywhere, and as a consequence getting the nominal order of accuracy for all norms is not always exactly possible. 3.4 Monotonicity Enforcement First-order upwind methods often exhibit poor resolution, and do not resolve physical phe-nomena in the fluid flow accurately since they introduce considerable amount of numerical diffusion in the solution process. This is due to the nature of these techniques in which the fluxes are computed based on cell average data. Despite the mentioned weakness, they have two great advantages, i.e. first they are monotone and produce non-oscillatory results for whole domain even in the presence of discontinuities and second they are stable and converge very well. High resolution upwind methods (higher than first-order), cure the accuracy problem but they require precautionary measures such as limiting to overcome the oscillatory behavior and stability issue especially for non-smooth flows. 3.4.1 Flux Limiting In central difference schemes, additional terms in the form of artificial viscosity are added to the flux computation such that the accuracy of the solution in the smooth region is not affected, but these terms provide enough diffusivity in the shock region to make it oscillation CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 44 free and stabilize the convergence. One of the most successful effort on this topic was made by Antony Jameson [38] who introduced a second-order differencing term to address the stability problem and a fourth-order differencing term to enhance the convergence of the numerical solution. W i t h this approach the solution in the vicinity of a discontinuity is first order (locally) while the global accuracy is preserved for smooth regions. In principle, high resolution upwind techniques use a similar approach, adopting a low-order resolution near the shock region to take advantage of the monotonicity of the first-order upwind scheme. This idea can be better explained by E q . (3.22), where $( [ / : j) is the flux limiter function and j is the control volume stencil from which U is calculated. In smooth regions, <f>((7 : j) should be very close to one and therefore the flux is computed with higher-order resolution. In contrast, in discontinuous regions $(U : j) must be nearly zero to eliminate the higher-order flux, recovering the monotonic low-order scheme. FH(U : j) = FL(U : j) + ${U : j) [FH(U : j) - FL(U : j)] (3.22) In other words, the goal is to provide just enough diffusion to prevent overshoots. One family of schemes that apply this procedure is the Total Variation Diminishing ( T V D ) schemes [35, 42]. In a T V D scheme, fluxes (F(U)) are limited such that the T V , total variation in the solution over the domain decreases with time in the solution process compared to the total variation of the initial data: TV = I Ju TV(f7) = ^ | , 7 l + 1 - C / t | (3.24) i TV(UN+L) < TV{UN) < TV(U°) (3.25) A l l T V D schemes result h i monotonic (non-oscillatory) solutions and stable convergence, 3.4.2 Slope L i m i t i n g In a structured discretization where there is a clear and straightforward connection be-tween the mesh data structure and discretization scheme, employing flux limiters to achieve non-oscillatory solution is quite common and successful. However, in general, this clear con-nection between mesh and discretization method does not exist anymore for unstructured dU dx dx (3.23) CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 45 meshes and implementing flux limiters are not that straightforward. A t the same time, if the flux function is computed based on Godunov's approach (even for structured discretiza-tion), again determining a proper flux limiter is not an easy job. There is another approach for imposing the T V D condition in the solution process, known as slope or gradient limiting, which is more appropriate for generic mesh discretization schemes and Godunov's approach. In this approach, the computed solution gradient from reconstruction is corrected (reduced) to meet the monotonicity condition defined by E q . (3.28): i Uma.x = ma,x(Ui,UFNj) (3.26) = mm(Ui,UFNj) (3.27) E m i n < Ui{xG,yG) < C/max (3.28). where Ui is the control volume average and U a r e the control volume averages of the first neighbors. This limiting procedure can be better explained by visualizing a linear reconstruction for a simple 1-D case. Fig (3.3-a) illustrates a typical unlimited linear re-construction solution for 1-D control volume averages. Using a slope limiter will correct the overshoots and undershoots in the solution reconstruction by reducing the slope of the reconstruction as shown by Fig (3.3-b). In theory, E q . (3.28) should be valid for all points inside the control volume i, but in practice this condition will be checked and satisfied (if necessary) only for Gauss points where the actual fluxes are computed. Assuming a linear reconstruction (2nd-order method), for the control volume i (Fig(3.4)), the unlimited reconstructed value at the Gauss point G is written in the form of E q . (3.29): UG = U(xc,yc) + VU\c r G (3.29) U(xc, yc) is the value of the flow variable at the cell center, which in this case (2nd-order) is also the average value of the control volume. The V £ / | c computed from the reconstruction procedure needs to be adjusted according to the monotonicity condition (3.28) by a scalar value 4> called as the slope limiter: UG = U{xc, yc) + & VU\C f G (3.30) The goal is to determine the largest acceptable value for <fo in such a way that the computed CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 46 (a) A n unlimited linear, reconstruction \ i-2 i-1 i i+l i+2 (b) A limited linear reconstruction Figure 3 .3 : Typical unlimited/limited linear reconstruction CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 47 Figure 3.4: Using first neighbors for monotonicity enforcement reconstructed value for U at all Gauss points are bounded by the maximum and minimum of the neighboring control volume averages (including the control volume i average): mm 4>G3 = min I 1, 1 , if (UGj -Ui)<0 if (UG. -Ui) = 0 (3.31) <j>i = min(</)G1, <f>G2, •••) (3-32) This limiting procedure was introduced by Barth [13] and such an implementation guaran-tees the monotonicity principle for all Gauss points. The Barth limiter produces a strictly monotonic solution and removes all oscillations; however it has some convergence and accu-racy issues. First, Barth's limiter formulation is clearly not differentiable and this severely hampers the convergence process as limiter values oscillate across the shock. Second, in smooth regions (including the stagnation region), we expect to have some local extrema which will cause the limiter to fire. This reduces the accuracy of the solution in those circumstances. In general, an ideal limiter is differentiable, and acts firmly in the shock region not allowing oscillatory behavior around discontinuities. Such a limiter also should be inactive in smooth regions despite existence of non-monotone solutions resulting from smooth local extrema. CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 48 Venkatakrishnan limiter [86, 87, 4], which is semi-differentiable, nicely addresses most of the aforementioned issues, and it has been employed in this research: A l ) m a x — Umax — Ui , A i ) m j n — Umln — Ui , A2 = UQ ~ Ui (3.33) 0G 1 1 ( A 2 i m a x + £ 2 ) A 2 + 2 A 2 A 1 , n l,max (A? • 2 A | - f A i .max A 2 + e 2 ¥ £ 2 ) A 2 + 2 A 2 A i , r a i n " A i m i n + 2 A 2 + A l , m i n 2 + e 2 if A2 > 0 if A 2 < 0 (3.34) s ign(A 2 ) A 2 = s i g n ( A 2 ) ( | A 2 | + w) (3.35) e2 = ( t f A x ) 3 (3.36) To avoid division by zero or a very small value in E q . (3.34) as a practical measure A 2 is replaced by E q . (3.35) where u> is chosen to be 10~ 1 2 for 64-bit arithmetic computations (according to Ref. [87]). Ax in E q . (3.36) is the mesh length scale and.it can be picked as an average mesh length scale or a local mesh length scale. A local mesh length scale has been used for this research and it is defined as the diameter of the largest circle that may be inscribed into a local control volume. This length scale is proportional to the square root of the control volume area, and as a simplification, we assume the control volumes are equilateral triangles: A - 2 / i f <3-37-> In smooth regions, A i > m a x , A i i m j n , and A 2 all are on the order of (Ax)2 either in constant regions or at extrema. Since e2 is made proportional to ( A x ) 3 , e2 dominates A i m a x , A i i m j „ , and A 2 terms and we recover the scheme without limiting [86, 87]. For instance, near an extremum, (say a max) A i ] m a x = 0, A 2 < 0 therefore 4> —> = 1- For flat solutions again the limiter will not be fired. However in the general smooth cases the limiter value will remain close to 1.0 but not exactly 1.0. In the shock region A i i m a x , A i | i n j n , and A 2 are 0(U) « 1 and they dominate e2 term and the limiter fires as it was originally supposed. The key point here is the value of constant K which is the coefficient of Ax in E q . (3.36). K determines the extent of the CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 49 monotonicity enforcement by setting a threshold below which oscillations in the solution are not limited. A very large value of K essentially means no limiting at all, and it could make the solution process unstable. Normally increasing K up to some value enhances convergence characteristics as long as divergence does not occur. In contrast, a small value for that constant slows or stops the convergence although it produces more monotonic solution. Typical values of K used in literature are 0.3, 1, and 10 and in general the optimal value both for convergence and accuracy purposes is determined based on experience. Having computed the limiter value, we can directly multiply all the derivatives in the higher-order reconstructed polynomial (linear and non-linear) by the limiter value as was described previously for linear reconstruction: UGK)(xG,yG) = U(xc,yc) + > dU dx a2u dx2 d3U Ax2 Ax+ — c dV d2u Ay dx3 Ax3 + dxdy d3U c AxAy + + d2u dy2 Ay2 c c dx2dy Ax2Ay d3U + dxdy2 AxAy2 + d3U c dy3 Ay3 c (3.38) However, our experience as well as other research [22, 24] shows implementing the limiter to all gradients yields a more diffusive solution. The approach of [24, 23] is followed, where the limiter is applied only to the linear part of the reconstruction, and the non-linear part of the reconstruction is dropped. This also helps the limiting process around the shock as higher derivatives tend to show oscillatory behavior in the vicinity of discontinuities. The limited reconstruction polynomial is expressed as: UGK\xG,yG) = Ui+ (I - a)<j>i+ a {Linear Part} + a {Higher-Order Part} (3.39) where a is a limiter for higher-order terms. In smooth regions, the full higher-order recon-struction is applied by choosing o = 1. Near discontinuities we switch from the higher-order to the limited linear polynomial to prevent possible oscillatory behavior of second and third derivatives. This is done by a discontinuity detector employing the value of the limiter; if the limiter fires aggressively we assign zero to a. This idea is also shown for a simplified 1-D case in Fig(3.5) . Using a switch to set a to either zero or one stalls the convergence. To overcome this CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 50 \ \ \ \ i - 2 i - 1 i i+1 i+2 (a) A r unlimited quadratic reconstruction \ \ \ \ i - 2 i - 1 i i + l i+2 (b) A limited quadratic reconstruction Figure 3.5: Typical unlimited/limited quadratic reconstruction CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 51 0.8 0.6 0.4 h 0.2 h 0.2 0.4 4>. 0.6 0.8 Figure 3.6: Denning a as a function of < problem, a is defined as a differentiable function of 0 , such that a is nearly one for the regions that cfr > (fro and it quickly goes toward zero for other values of cfr; making a differentiable greatly enhances convergence. As a differentiable switch, a smooth semi-step function is employed: 1 - tanh(5(^ 0 - <fr)) a = (3.40) S in (3.40) determines the sharpness of the step function, and adjusts how fast the switching transition between zero and one is; (fro defines the limiter value that activates the switch. It appears that (fro = 0.8 and S = 20 provide a reasonable switch function whose good behavior is relatively case independent in this research, Fig(3.6). Chapter 4 Flux Jacobian 4.1 What is the Jacobian ? A n y implicit formulation for solving a P D E includes some sort of Jacobian calculation. We seek a solution vector X = (Xi,X2, X%,Xn) for a coupled non-linear system of algebraic equations defined by: F(X) = 0 (4.1) where, F = (F1,F2,F3,...,Fn) ( 4 . 2 ) The solution can be found via an iterative process (Newton's method): SXi+1 = -F{Xl), Xi+1 =Xl + 5Xi+1 dF_ dX ( 4 . 3 ) Since F is a vector operator, the derivatives of F respect to the vector X are expressed as a matrix called Jacobian matrix (f£), such that the(ff) entry is the derivative of the 5 2 CHAPTER 4. FLUX JACOBIAN j-th component of F respect to the k-th component of X: 53 J2, ,71 OF dX dXk (4.4) J7 n.n Generally speaking, in an implicit formulation such as (4.3) the Jacobian matrix constitutes the coefficient matrix and represents the linearization of a non-linear problem which must be solved through an iterative process. In C F D the non-linear function F, determined by the physics of the problem, is the residual of fluxes for the discretized domain. In other words, we look for a solution vector U (conserved flow variables) which satisfies the flux balance for all control volumes in a meshed domain: Equation (4.3) is similar to the implicit time advance formula, E q . (2.12), except for the diagonal term in (2.12) normalized by a time step (mainly added for stabilization purpose). Equation (2.12) can be regarded as Newton's formula with pseudo time stepping. In either case, the Jacobian matrix is needed not only for forming the linear system but also for building the preconditioner matrix, and it is the most expensive part of the implicit solver. The convergence rate of the implicit solver depends on the accuracy and correctness of the Jacobian matrix, and to achieve the quadratic convergence, we need to employ the true flux Jacobian on the left hand side of E q . (4.3). Since the Jacobian matrix consists of the derivatives of the discretized governing equations with respect to the flow variables used in the local discretization, the Jacobian entries in each row are zero except for those few control volumes that contribute to that flux integral (i.e. union of the reconstruction stencils of the neighbors). The level of difficulty in flux Jacobian computation depends on the flux function (physics), the numerical flux formulation (i.e. how the flux is being computed, such as Roe's flux formula), type of the mesh, number of dimensions, and order of accuracy. For instance, taking the derivative of the first order flux function for the 1-D linear wave equation, E q . (2.15), in finite-volume formulation is fairly easy (|^- = A and gf;Fi = - A ) , since (Flux)i = A(UJ - . But in general, for non-linear flux functions and/or formulations, especially in multi dimensions and with unstructured meshes, the flux Jacobian computation involves a large number of control volumes and much more work per control volume. Consequently, increasing the order of accuracy not only adds to the Res(U) = 0 (4.5) CHAPTER 4. FLUX JACOBIAN 54 complexity of flux Jacobian computation but also makes it considerably costly both in terms of computation time and memory usage. The Jacobian matrix can be built by either analytical or numerical (finite difference) dif-ferentiation; both approaches have been employed quite successfully [84]. Analytical differ-entiation can be performed either manually, symbolically [94] or automatically [78]. The analytical approach computes the exact Jacobian (ignoring round off error in numerical evaluation) but it is relatively difficult to apply this approach for complicated flux functions and/or higher-order discretizations by manual or even symbolic differentiation. Automatic differentiation (AD) , a very powerful tool for computing complex Jacobians, augments the computer program to compute the derivatives by applying the chain rule repeatedly to the elementary arithmetic operations performed by the computer program. The derivatives are computed relatively cheaply by this method and are accurate to machine accuracy without any truncation error. However, A D requires very careful programming and a considerable amount of memory [20], and has not been considered in this research. Numerical differ-entiation is fairly straightforward even for higher-order and complicated flux functions. It also works reasonably well where the flux function is not differentiable, but like any other numerical approach, the numerical differentiation technique suffers from truncation error as well as round off error. For some of the Newton-Krylov solvers like G M R E S , only products of the Jacobian matrix by some vectors are needed, and explicit computation of the Jacobian matrix can be avoided. These products are computed using directional finite differencing. Therefore the higher-order Jacobian of complex C F D problems can be employed on the left hand side of the implicit formula to match the true order of accuracy of the residual on the right hand side, E q . (2.12), without excessive computational effort and memory usage. However, in most cases, an approximate Jacobian matrix is required explicitly for preconditioning of the linear system solver. It is possible to use a simplified or first-order Jacobian (based on direct neighbors and mostly first-order) for preconditioning of the linear system resulting from higher-order discretization [51, 50]. In this chapter, first, the lower-order Euler Jacobian computation procedure is presented for an unstructured mesh stencil based on Roe's flux formula. This flux Jacobian is used for preconditioning of the linear system solver. The analytical and numerical approaches for flux differentiation are discussed step by step. CHAPTER 4. FLUX JACOBIAN 55 Figure 4.1: Schematic of Direct Neighbors 4.2 Flux Jacobian Formulation In the case of the 2D Euler equations discretized over a cell-centered unstructured mesh, each control volume has 3 direct (first) neighbors, and the first-order flux integral of the control volume i o h l y depends on these three neighbors and the control volume itself (Fig(4.1)): Ri= J2 (Fnds)™ = HU,UNl)nlll + F(Ui,UN2)n2l2 + F(Ui,UN (4.6) m=l,2,3 where, nm and lm are the outward unit normal and the length of the face m of the control volume i respectively. To compute the flux Jacobian of the control volume i, we need to take the derivatives of the flux integral or residual function, E q . (4.6), with respect to all control volume averages involved in the residual function evaluation as shown in the following formulas: (4.7) j(i A M _ dRi dF(UuUN2)^ (4.8) / CHAPTER 4. FLUX JACOBIAN 56 J(i,i) = OR, J{i,N3) dF(UuUNl) OR, dF(U{,UN3 dUN3 0UNs -n3<3 ^ dF(Uj,UN2)~ , ^ dF(Uj,UN3)^ . (4.9) (4.10) au du ' du ' • du{ Since both the flux F and solution U are 4-component vectors, each entry in the Jacobian matrix J is a 4 x 4 matrix and the. total-size of the block matrix is n x n-where n is the total number of control volumes as shown in E q . (4.11). However, most of these blocks are just zeros as in general there are no more than three neighboring control volumes in the first-order cell-center formulation and the Jacobian matrix is very sparse, with at most four non-zero blocks per row. J V \x X X X X X X X X X X X X X X X X J n,l V X X X X X X X X X X X X X X X I n.i ( X X X X / X X X X / X X X X X X X X X X X X X X X X X X X X X X X X X X X X \ X X X X ) 1,1 V X X X X ) l,fe \ X X X X I I X X X X \ / X X X X / X X X X X X X X X X X X X X X X X X X X X X X X X X X X \ X X X X / 2,1 I X X X X ) 2,fe I X X X X 1 X X X X X X X X X X X X X X X X I n^n J • (4.11) Figure (4.2) displays typical numbering and the resulting connectivity between triangles in a sample unstructured mesh. For this sample mesh, the only non-zero blocks in the Jacobian matrix corresponding to control volume 42 (row of 42) are (42,38), (42,42), (42,49), and (42,57); for this row, two non-zero blocks are above the diagonal block. For the row 54, the non-zero blocks are (54,43), (54,45), (54,54), and (54,65); this time two non-zero blocks are below the diagonal block. In the next two subsections, the analytic derivation of the first-order flux Jacobian entries CHAPTER 4. 57 Figure 4.2: Typical cell-centered mesh numbering for both interior and boundary fluxes are described in detail. 4 .2 .1 R o e ' s F l u x J a c o b i a n After introducing the structure of the flux Jacobian, we need to compute the derivatives of the fluxes, i.e. Roe's flux function. The flux at the face between cell i and N\ is: F(Ui,UNl) — - (F(Ui) + F(c7 A r 1 )) - A.(UNl-Ui (4.12) The Euler flux formula, F(U), and its derivative with respect to the conserved variables, | ^ were described by E q . (2.2) and E q . (2.16). The A , the Jacobian of the Euler flux evaluated at the Roe's averaged properties, was also introduced by E q . (2.19). The Roe's average formula (Tilde form) for density, velocity components, and total enthalpy are the same as those introduced in section (2.3.2), but this time they have been rearranged to reduce the evaluation cost. R (4.13) CHAPTER 4. FLUX JACOBIAN 58 L = 1 u v Q o —n. n x ut \un\ 0 0 0 y 0 0 0 0 1 _ U I Tlx C 2 C C 2 _ 5 _ c c 2 0 0 c 2 C 2 c 2 c 0 0 0 n x C c _ + -2-c 2 J C (4.14) (4.15) R = l-B-3-i ( - C u n + pQ) 8 c 2 \(nxC-Bv) 8^-v C2 ±(nyC-Pv) \{Cun + PQ) -\(nxC + (lu) -\(nyC + Bv) \B c 2 0 ¥ i (4.16) 0 = 7 - 1 , 7 = g * (4.17) (4.18) PL.+ P (4.19) •PL.^L + PVR PL + P PLhtL + phtR PL+P 1 C = JB(ht - Q) (4.20) (4.21) (4.22) (4.23) -nyii + nxv (4.24) un = nxu + nyv (4.25) CHAPTER 4. FLUX JACOBIAN 59 The flux differencing term in E q . (4.12), A (UNl-Ui) , can be recast in the form of A AU and the full derivative of this term with respect to the solution vector (in general form) is: d ( ^ 1 A U ) d 2 AU + dU dU Now, we take a look at the first term in E q . (4.26) A d{AU) dU (4.26) A 0 i dU AU (L A R^j dU AU = dL dU d A R + L dU R + L dR dU AU (4.27) Differentiation of -gjf- produces some third-rank tensors which not only is difficult to derive but also is quite expensive to compute. Barth [9] has derived the full derivative of 9 ^ with some clever modifications to eliminate the tensor computations, reducing the com-plexity of the Jacobian computation to some degree. Through spectral radius analysis for 1-D flow he showed that for a smooth flow the approximate Jacobian can be accurate up to CFL=1000 or even above. However, for the shock tube problem the difference between the true Jacobian and the approximate Jacobian grows after C F L = 1 0 , and becomes notice-able after C F L = 1 0 0 showing that the approximate Jacobian will not be accurate enough for larger C F L numbers. Looking back at the E q . (4.26), similar physical insight can be d\A~\ concluded. For a smooth flow, the magnitude of -^f-AU with respect to the magnitude of d(AU) -oy— is very small, and the resulting approximate Jacobian will be acceptable. In the case of a flow with discontinuity, this approximation is not accurate anymore because, d\A\ -frf - as well as AU entries would be large enough (at least in some regions of the flow field) and except for very small AU (i.e. smooth regions of the flow field) that approximation will lose its accuracy to some extent. Therefore applying a large C F L number for the so-lution acceleration is not necessarily useful when the overall flow linearization is relatively inaccurate. Ignoring changes in A (treating it as a constant) greatly simplifies the evaluation of the Roe's flux, as well as overall Jacobian computation cost; such a Roe's flux differentiation for the E q . (4.12) with respect to cell i and N\ is called the approximate Roe's flux Jacobian and can be written in the following form: dF(Ui,u dU, Ni ~dF(UNl] dU, Ni (4.28)' CHAPTER 4. FLUX JACOBIAN 60 dF(Uj,UNl) dUi dF(Uj dU + (4.29) The other Jacobian terms in E q . (4.8) through E q . (4.10) can be derived easily by repeating similar differentiation. Because of the difficulties in Jacobian calculation, especially for complicated flux functions such as Roe, several researchers [83, 93]'have successfully employed simpler flux functions like Steger-Warming [77] for linearization purposes-building the Jacobian on the left hand side-where the right hand side-the residual-still is evaluated based on the Roe's flux. A similar approach can be taken for preconditioning of the left hand side where the explicit Jacobian matrix is not required (matrix-free methods), but a preconditioner matrix still is needed to enhance the performance of the linear solver [22, 61]. In either case, due to inconsistency between the left and right hand sides (i.e. linearization and flux evaluation), the resultant convergence rate per iteration will not be as fast as if a consistent Jacobian was employed. 4.2.2 Boundary Flux Jacobians To have a true implicit time-advance and/or Newton formulation, the linearization must be extended to the boundary fluxes. Both the Jacobian matrix and the preconditioner matrix (in our case the first-order Jacobian) need to include derivatives of the boundary fluxes as well. Otherwise, the implicit formulation does not converge as fast as it should and/or some stability issues may well occur if large C F L numbers are attempted [79, 32]. The goal here is to compute the Jacobian matrix of the boundary Euler flux with respect to conservative variables. Since on some occasions boundary conditions (section 2.4) are implemented based on the primitive variables (V) rather than conservative variables (U), it is more convenient to compute the Jacobian of the Euler analytic flux with respect to primitive variables and then convert that Jacobian to the conservative format using the chain rule: 8F_ dV UUn VUn OF dU dFdV WW htun P(7-1) p(un + unx) : PV1lx p{uun + htnx pny puny p(un + vriy) p(vun + htny) 7-1 0 n x ny (4.30) (4.31). CHAPTER 4. FLUX JACOBIAN 61 5V 0 0 0 I i ( U 2 + / ) ( 7 - l ) - u ( 7 - l ) - v ( 7 - l ) ( 7 - 1 ) /9ww n + Pnx pvun + P n y phtun p P U = pu , v = u , F = pv V E P (4.32) (4.33) P I P et = e+ek, e = C „ T = — — , ek = -Cu2-fV), E 1 = pet, h = e+— , ht = h+ek (4.34) P\1 ~ 1) 2 P 4.2.2.1 Wall Boundary Flux Jacobian The wall boundary condition for the first-order Euler flux can be satisfied by setting all normal fluxes equal to zero (un — 0), section 2.4.1, and the wall flux Jacobian is derived by taking the derivative of this simplified wall flux: Fw 0 Pnx Pny 0 (4.35) dF_ W w 0 0 0 0 0 0 0 nx 0 0 0 ny 0 0 0 0 dF_ du w dF_ dV w dv dU (4.36) 4.2.2.2 Subsonic Inlet Flux Jacobian As it was explained in section 2.4.2, total pressure, total temperature, and angle of attack are set at a subsonic inlet as constant inlet conditions and the static pressure is taken from the interior as a parameter. This time it is easier to rewrite the Euler flux in terms of total pressure, total temperature, static pressure, and angle of attack and then proceed CHAPTER 4. FLUX JACOBIAN 62 with taking derivatives. The magnitude velocity, Vm, can be defined as a function of total temperature, and the normal velocity component is described in terms of the magnitude velocity and angle of attack: * u = Vm cos a — v = Kn.sina = 2 Tt - 1 7 - i ; \T 2 \ (Tt 7 - 1 / \T 2 \ (Tt - 1 cos a sin a u„ L V 7 - 1 / \T (cos anx + sin any) 5-i (4.37) (4.38) (4.39) , 7 - 1 / V T Now, we write density in terms of total temperature, total pressure and static pressure: (4.40) ^P (P 9 = 7 P * ( * ) 7 - 1 A n d finally ht is rewritten in terms of total pressure and static pressure: (4.41) (4.42) 7 P ( 7 - 1 ) -i + 2 W 7 - 1 7 - 1 PA — - 1 The Euler flux vector is recast in its new format after substitution of p, u, v, un, ht with their new definitions: F = F i F2 F3 F4 (4.43) 7 ^ 2 TtPt~( \2 J ^pCip(2-3Cx) _ p(2- •2d) (4.44) CHAPTER 4. FLUX JACOBIAN 63 Fo = 7 C 2 cos a TtPt~Ci 2 (pCi p ( l - 2 d ) _ p(l-Cx) B \ 4 + Pnx (4.45) 7 C 2 sin a TtPt~Cl 2 | p C i p ( l - 2 d ) _ p ( l - C i ) ^ Pra,, (4.46) 7 ^ 2 TtP; - C i TtPt~ClP + I ( " p C l p ( l - 2 C i ) _ p ( l - d ) (4.47) /9 = 7 - l 7 C o cos a n x + sin anv (4.48) Total pressure, total temperature and angle of attack for a steady inlet are constant, and where 77 is the inlet variable vector is greatly simplified, E q . (4.49). Applying the chain rule (Eq. (4.50)) and since the only apparent primitive variable in the subsonic inlet flux vector is the static pressure, P , all three first columns of the subsonic inlet flux Jacobian will be zero and only the 4th column needs to be evaluated. W ' 0 0 0 0 " ' Pt' 0 0 0 0 Tt 0 0 0 0 a 0 0 0 1 P (4.49) dF_ dV OF dn (4.50) dF dV Subln 0 0 0 § o o o f 0 0 0 I 0 0 0 I (4.51) dFx ~dP lC2 2 I T t P - C l 2 r 0 p - C i p ( 2 - 3 C i ) _ p ( 2 - 2 C i ) (2 - 3C1) i p d - 3 C i ) _ 2(1 _ CI)PQ-™ (4.52) CHAPTER 4. FLUX JACOBIAN 64 dF2 7C2 cos a ~dP = TtPt-c' P ((1 - 2 C i ) P t C l p - 2 C l - (1 - C i ) P - C l ) (4.53) QF3 ^ ryC2 sin a ~dP = TtPrCl I ((1 - 2C1)P^P-^ - (1 - d ) P - C l ) (4.54) dF4 dP v ^ ^ y p ^ p - ^ - i r PVPTtPt-c^ V2C2CaPtc'P-ci 2[3^TtPt-Cl JPtCl P-Cx - l T f P - C l + (1 - 2 C x ) P t C l p - 2 C l - (1 - d ) P - C l j^p -C i _|_ pC\ p-2C\ _ p-Ci (4.55) Having f ^ | 5 u W n evaluated, the subsonic inlet flux Jacobian with respect to conservative variables can be easily computed: dF dU Subln dE dV Subln dV du (4.56) 4.2.2.3 Subsonic Outlet flux Jacobian For subsonic outlet, section 2.4.2, three flow variables (p,u,v) are taken from interior, and only the static pressure is fixed at the outlet: Fs ubOut — pUUn + Poutnx pvun +' Poutny • pht-outV-n (4.57) , I Pout . 1 / 2 , 2\ (4.58) P ( 7 - 1 ) 2' Since the static pressure at the outlet is treated as a constant, the last column of the outlet 'flux Jacobian is filled with zeros: CHAPTER 4. FLUX JACOBIAN 65 dF dV SubOut U . U U . V U . n n pnx p[un + unx pvnx pny puriy p(un + vny) 0 0 0 (4.59) h t - o u t U n - 1pP°Jli) p(uun + ht_outnx) p(vun + ht_outny) 0 and the rest is trivial: . dF_ dU SubOut 5 F dV SubOut dV dU (4.60) 4.2.2.4 Supersonic Inlet/Outlet Flux Jacobians As it was discussed in-section 2.4.2, we impose all flow variables directly at the supersonic inlet, and their values do not depend on the variables of the inside solution domain. There-. fore the supersonic inlet flux Jacobian is just a block of zero. O n the contrary at the outlet all variables are taken from inside domain and the outlet flux Jacobian respect to primitive variables is the same as E q . (4.31). 4.3 Finite Difference Differentiation The flux Jacobian can be computed via finite differencing removing the issue of analytic dif-ferentiation of the flux function. Finite difference differentiation is a well known technique to compute derivatives of a complex function using perturbations of the independent variables. Equations (4.61) and (4.62) show typical forward and central differencing formulas: F(U + e) — F(U) d2F /£s ' - ) Forward Differencing (4-61) dF_ dU dF dU C D ~ 2e dU3 \ 6 F D ~ e dU2 \2 nU + e)-F{U-e) d*F(e^ ^ The central differencing has smaller truncation error but it comes with the cost penalty of one more function evaluation, making it twice as expensive, compared to the forward differencing. If the function evaluation is expensive, which indeed is the case for residual evaluation in C F D problem, central differencing may not the best choice. CHAPTER 4. FLUX JACOBIAN 66 The accuracy of the differentiation clearly is a function of £ and it appears that smaller e results in smaller error, which is true for truncation error, but choosing a very small value for £ will introduce more noise in numerical differentiation and amplifies the existing error in numerical function evaluation (caused by round off error) since that error is divided by very small number. This can be shown by a simple analysis, where the actual function evaluated by the computer F is replaced by the exact value of F and the associated round off error Er: F(U) = F(U) +-Er(U), F(U ± e) = F(U ±e) + Er(U ± e) (4.63) dF_ dU FD dU + Er(U + e) -Er(U) FD (4.64) 0F_ 3U CD 8F_ du + CD Er(U +£) - Er(U••- e) 2£ (4.65) if (Er)max is an error bound for round off error, the total error in numerical differentiation for forward differencing (FD) and central differencing (CD) can be written as a combination of truncation error and the error caused by round of error known also as cancellation or condition error [29]: Total-FD — d2F dU2 ' £ \ , 2 ( f l r ) : , 2 / £ (4.66) ETI otal-CD — d3F ou3 £2 (ET (4.67) By. examining the E q . (4.66) and E q . (4.67) it is clear that if the magnitude of pertur-bation, £, is a large number the dominant error in numerical differentiation is truncation error. Truncation error decreases linearly for forward differencing and quadratically for central differencing by reducing the perturbation magnitude. However, the condition error increases linearly by decreasing the e . Therefore there is an optimum value for perturbation magnitude which minimizes the total error in numerical differentiation. Fig (4.3) displays the total-relative numerical error in finite difference differentiation versus perturbation magnitude for a sample cubic function, F = X3 + X2 + X +0.1 , at XQ = 0.05. The relative error is computed by: CHAPTER 4. FLUX JACOBIAN 67 10' 1 7 10"'5 10 ' " 10'" 10'", IO'7 10's IO"1 IO'1 e . Figure 4.3: Total numerical error versus perturbation magnitude Error. total-relative F'(X0) - °W. Finite Diff. F'(Xo) (4.68) Clearly, there is an optimal value for e such that the numerical differentiation error is min-imum, in this case e'ss I O - 9 for forward differencing and e « 1 0 - 6 for central differencing. Very small value of perturbation (i.e. e < I O - 9 ) has caused domination of the condition error in such a way that both forward and central differencing essentially produce the same total error. Since the condition error is caused by round off error, the optimum £ should be determined based on the machine precision, EM- The error bound for round off error, (Er)max can be estimated by EM \\F\\- To find the optimum e, derivatives of Eq. (4.66) and Eq. (4.67) are taken respect to e and they are equated with zero: OEr, otal-FD dE d2F du2 2Em \\F\\ (4.69) dE-Total-CD DE d*F du3 £M\\F\\ = 0 (4.70) CHAPTER 4. FLUX JACOBIAN 68 and by solving the above equations for e the optimum perturbation for forward and central differencing are estimated: E o p t p D \ a2F W1 £OptcD ~ 3 £ M H^ l a 3F WP Therefore, in addition to machine accuracy, the optimum perturbation value, depends on the norm (magnitude) of the function F and its higher derivatives, and the type of the numerical differentiation. There is another consideration for choosing the size of the perturbation for numerical differentiation of a higher-order function, which is the grid size or discretization length scale; this will be discussed in chapter 5. Not surprisingly, there are other practical factors in scientific computing for e selection which is beyond the scope of this research. Several researchers [40, 26, 29, 61] have studied the effect of e in numerical differentiation and they provide some practical formulation and guidelines for choosing the best value for 4.4 Numerical Flux Jacobian In numerical Jacobian calculation using the finite difference method, the principle has not changed; we still need to take the derivatives of the residual function for a control volume, E q . (4.6), and compute the flux Jacobian terms through E q . (4.7) to E q . (4.10). However, this time instead of analytic differentiation of the flux function at each face, E q . (4.12), the finite difference technique is applied. The Euler flux function includes 4 equations and 4 primitive variables: At the same time, the flux derivatives with respect to conservative variables are needed, these variables should be perturbed originally. The following procedure demonstrates such a flux Jacobian com-putation for interior faces using forward differencing: CHAPTER 4. FLUX JACOBIAN 69 F7u:r=FluxFunct i on (V) for j=l to 4 { for i=l to 4 { « / [ i ] = e E L j ] [i] f/p[i]=C/[i]+5C/[i] } Vp=ConservativeToPrimitive(f7p) FZm -p=FluxFunction(Vp) for i=l to 4 { DFDU [i] [j] = (Fluxp [i] -Flux [i]) /e y } where Flux is the flux vector, V is the primitive variable vector, U is the consevative variable, subscript of P presents the perturbed vectors, and E(j, i) is the identity matrix: 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 (4.71) For boundary faces, exactly the same procedure is implemented but the boundary condition implementation is also considered both for the perturbed variables and the flux functions: FluxBC=BoundaryFluxFunct ion (Vg) for j=l to 4 { for i=l to 4 { <5t/[i]=eE[j] Ei] £ /p[ i ]= [ / [ i ]+<yt f [i] } Vp=ConservativeToPrimitive(t/p) VpB=BoundaryConditionProcedure(Vp) CHAPTER 4. FLUX JACOBIAN 70 i?/wa;BCp=BoundaryFluxFuiaction(VpB) for i=l to 4 . { . (DFDU)BC [i] [j] = (FluxBCp[i]-FluxBC[i])le } C h a p t e r 5 Linear Solver and Solution Strategy 5.1 Introduction U p to this point, two major elements used in the implicit solver (i.e. residual evaluation and Jacobian computation) have been explained. This chapter describes the iterative process to reach the steady-state solution, including, the linear system solver, preconditioning, start-up process, and over all solution strategy. Linearization of the fluid flow equations over a meshed domain leads to a large sparse linear system, which needs to be solved in multiple iterations. In principle, there are two separate issues here, forming the linear system and solving the linear system. As it was described in section 2.2, it may be useful if the solution process starts from a low C F L number (generally resulting in small solution update) and even low-order linearization, E q . (2.14), especially if the initial condition that the solution starts from is far away from the steady-state solution of the problem. This is particularly true for non-linear problems such as compressible flow, where the large solution update based on high C F L number at the early stage in the solution process does not necessarily help the convergence, and it may slow the convergence or even cause divergence. This fact can be better explained by F ig (5.1) where a sample scalar function is linearized. The linearization of a function F(U) at point A, where we are far away from the solution point C, does not provide us a good slope to estimate the solution point. Also it does not make a meaningful difference if the linearization at this point is exact or approximate, as both of them result in inaccurate solution estimation. Now if the slope 71 CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 72 Figure 5.1: Linearization of a sample function of function F(U) at point A is used for the solution update, only a small fraction of AU can be applied reasonably. At the same time for small AU, an approximate linearization, in our case low-order linearization, still produces a reasonable solution update. O n the other hand, when we are close enough to the solution point C, the linearization is a fair representative of the non-linear problem and the more accurately this linearization is performed better solution update is expected. Direct methods (Gaussian Elimination and L U factorization) solve the linear system exactly (to within round off error), and they have been applied to C F D problems in early C F D im-plicit solvers [8, 89]. However, the computing cost ( C P U time) and the storage requirement (memory usage) of direct methods for large linear system arising from C F D problems limit their application seriously. Furthermore, quite often the linearization is not exact and we are solving a non-linear problem through multiple linear iterations, so it is not beneficial from the performance point of view to solve the linear system in each iteration exactly. Iterative methods, on the contrary, require far less memory, and can provide a good approx-imation for the solution at a reasonable cost. It should be mentioned that the effectiveness of iterative methods as compared with direct methods for solving linear systems is related to the sparsity of the coefficient matrix. In general, iterative methods are more efficient for large sparse systems. Development of iterative methods for large sparse linear sys-tems has been a very active research area in the field of numerical linear algebra, and the Krylov-subspace family methods have emerged as modern iterative techniques [74]. In these techniques, a very large sparse linear system is reduced to a much smaller system through CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 73 creation of some subspace (known as Krylov subspace), and then solution of the original large system is approximated using the constructed subspace by satisfying some optimality criterion for the original system. A wide variety of Krylov iterative solvers [81], are in use for various applications; the Generalized Minimal Residual ( G M R E S ) algorithm [72], which is developed mainly for non-symmetric matrices, is employed in this research for the following reasons: 1. The linear system arising from unstructured discretization is asymmetric both in terms of the matrix structure and the values of entries. 2. T h e G M R E S algorithm only needs matrix-vector multiplication removing the issue of high-order Jacobian matrix computation. 3. G M R E S minimizes the residual of the linearized system implying that if we are close enough to the solution and linearization of the non-linear system is accurate then G M R E S computes the best update for solution at each iteration, making it a very suitable choice for Newton iteration. It should be noted that the G M R E S algorithm has been applied quite successfully and extensively in implicit C F D solvers since the early 90's, and it is a well known linear solver for C F D applications providing us a great deal of experience both in solver and preconditioning implementation. Preconditioning is another important issue in linear system solving since no iterative method for general large linear systems demonstrates suitable performance without proper precon-ditioning. In principle, preconditioning transforms the original linear system to a modified linear system making it easier to solve by an iterative technique. 5.2 G M R E S Linear Solver Like other modern iterative techniques G M R E S uses a projection process' to extract an approximation to the linear system solution from a subspace. Assume A is a real nxn matrix and x is a solution vector in 5ftn for a linear system of: Ax= b (5.1) CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 74 A n approximate solution of the above system is found within a subspace IC of 3?™. The size of the subspace i s m « n, and generally m constraints in the form of orthogonal conditions (minimization problem) have to be imposed to make the best approximate solution. A logical candidate for imposing orthogonality condition is the residual vector: r = b - Ax (5.2) The residual vector is considered to be orthogonal to another subspace £ with the dimension m. In other words, we are looking for an approximate solution x £ IC with the constraint of r J_ C. It can be shown that if the subspace C is chosen in the form of C = AK, (oblique projection), imposing such an orthogonality condition minimizes .the l2 norm of the residual vector over the affine space of xo + IC, where XQ is the initial guess and the starting vector [74]-5.2.1 The Basic G M R E S Algorithm For a linear system of Ax = b, G M R E S [72] seeks an approximate solution xk in the form of xk = XQ + Zk , where XQ is the initial solution vector, and zk is the member of the Krylov subspace (search directions), ICk = span {ro, ATQ, A2ro, ..., AK~1TQ}. Here, r$ — b — Axo is the initial residual which minimizes the l2 norm of the residual vector. A subspace Vk = {v\, v2, ..., Vk} is constructed using Arnoldi's algorithm (and applying Gram-Schmidt method) for computing the l2 — orthonormal basis of the Krylov subspace, where v\ = ro/ ||ro||- T h e residual vector at the kth iteration, rk , should be l2 — orthogonal to Kk'-GMRES: 1. Use i n i t i a l guess x$ and compute ro = b — AXQ; normalize the i n i t i a l residual vi = ro/ \\ro\\ • 2. Build the orthonormal basis (Arnoldi's algorithm): f or j=l ,2, . . . ,k u n t i l satisfied (\\b - Axk\\ <tolerance) do: hitj = (Avj,Vi), 1=1,2,'. .. ,j , vj+x = AVJ - J23i=i havi > hj+ij = \\VJ+X\\, i f hj+ij = 0 stop Vj+i = Vj+i/hj+u, 3.Form the approximate solution: xk — XQ + Vkyk such that xk minimizes ||6 — j4xfc| | . CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 75 After A; iteration in Arnoldi's process, the l2 — orthonormal subspace Vk+i and a (k + 1) x k matrix Hk are formed. Hk is the upper Hessenberg matrix Hk whose only nonzero entries are the h-ij described above, except for an additional row with the nonzero element of h k + i t k . The following important relation will hold: AVk = V k + 1 H k ( 5 . 3 ) For minimization, the update xk — xn + z should satisfy the least-squares problem: min ||6 - A(x0 + z)\\ = min ||r0 - Az\\, z G K-k , ( 5 . 4 ) z z By setting z = Vky, rn = Bv\, and 8 = ||r'o||, we have: | | r 0 - Az\\ = \\Bvx - AVky\\ = \\Vk+l(Bei -Hky)\\ (5-5) where, e\ is the first column of the (k + 1) x (k + 1) identity matrix. Since Vk+\ is an orthonormal set, the least-squares problem is reduced to: • min ||'/3ei - Hky\\ , y e K f c (5.6) Considering that this is a minimization problem, E q . (5.4), the residual at each iteration should be smaller or at least equal to the residual at the previous iteration, i.e. |J7"fc|| < | |rfc + i | | . Therefore the residual decreases in each iteration or at worst stalls. The algorithm terminates after n iteration steps in the absence of the round of error, meaning that the exact solution is computed if the size of the subspace is chosen equal to the size of the system. The iterations or steps taken in building the subspace or search directions in part 2 of the G M R E S algorithm are called inner iterations. Since by increasing the search directions or subspace size the cost of the algorithm rises linearly for memory usage and quadratically for computing time, the Krylov subspace size for large matrices is limited to m <C n as was mentioned earlier. In practice, we hope that by picking a small subspace size, a sufficiently accurate solution is computed if that does not happen we can restart the algorithm after m steps to limit the cost of the algorithm. However, there could be some occasions when the convergence criteria for the linear system is not satisfied without excessive number of restarts. In that case, the user normally limits the number of inner iterations, and/or restart number, and moves on-to the next non-linear iteration with whatever solution update achieved in the linear solver. The restarted version of the algorithm, G M R E S ( m ) , is explained in the following lines: CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 76 GMRES(m): 1. Use i n i t i a l guess XQ and compute ro = b— AXQ; normalize the i n i t i a l residual v\ = ro/ ||ro||. 2 . Bui ld the orthonormal basis (Arnoldi's algorithm): for j = l , 2 , . . . , m do: hij - (Avj,Vi), i= l ,2 j , Vj+i = AVJ - YZ=i hijVi, .hj+u = \\vj+i\\, If hj+ij = 0 stop v i + \ = vj+i/hj+ij, 3.Form the approximate solution: xm = x0 + Vmym where ym minimizes f(y) = ||/3ex - Hky\\ , y G 3?fc . 4-Restart: Compute rm = b — Axm; i f sat i s f ied (||rm | |<tolerance) then stop else compute XQ :=xm, v\ '•= rm/\\rm\\ eind go to 2. The only possibility for breaking down of the G M R E S algorithm is in the Arnoldi's part where the orthogonalization process is performed. If hj+\j becomes zero the algorithm stops and the solution at the last iteration is returned. This happens only when the residual is zero, which means' reaching the exact solution. The least-squares solution in step 3 can be carried out via classical Q R factorization of H k using plane rotations. As it was suggested in the original article [72], it is preferable to update the factorization of Hk progressively as each column appears at every step of the orthogonalization process. This enables us to compute the residual norm of the approximate solution without computing xk and to terminate the algorithm whenever the convergence criterion for the linear system is satisfied without further operations. This also means the residual of r m = b — Axm does not need to be computed explicitly saving us the cost of one matrix-vector multiplication since the magnitude of the ||/?ei — ii/fc2/||and \\rm\\ are the same. Depending on the convergence criteria, the G M R E S algorithm may be completed with the inner iteration number smaller than the maximum subspace size (m), equal to the number of subspace size, or larger than the number of subspace size. For instance if one restart is allowed in the algorithm and the maximum subspace size of 30 is used, assuming that the convergence criterion in part 4 is not met within inner iterations, then 60 inner iterations are performed before completion of the algorithm. CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY. 77 5.2.2 Matrix-Vector Products Computation in G M R E S Considering the complexity of the higher-order discretization method used in this research, Chapter 3, computing the higher-order Jacobian matrix is a highly expensive task in terms of memory usage and computation cost.' Even if memory were not an issue, computing r a higher-order Jacobian would be quite expensive in terms of C P U time no matter what approach in Jacobian calculation is taken, Chapter 4. Therefore, a linear solver in which the explicit Jacobian computation is not required, such as G M R E S , is an attractive candidate for a higher-order implicit solver. Looking back at section 5.2.1, it is clear that in Arnoldi's algorithm, only matrix-vector products are needed, and these products can be easily approximated through the (forward) finite difference tech-nique: F(U + 8U)=F(U) + ^5U + 0(5U)2 (5.7) By choosing 8U = ev (e is a small number), E q . (5.7) can be rewritten: A F(U + ev) = F{U) + — ev + 0{ev)2 (5.8) and since e is a scalar ( v is a vector), the matrix vector product, Av , is evaluated by: dF .F(U + ev)~F(U) ft. A v = W V ~ - ^ — - e — . ( 5 - 9 ) E q . (5.9) is a first-order approximation, and a more accurate approximation can be achieved employing central differencing approach with the extra cost of one more flux (residual) evaluation per matrix-vector product (section 4.3). If forward differencing is employed for evaluation of matrix-vector product, for any inner iteration in the G M R E S algorithm one flux evaluation is needed. y So far no consideration has been made regarding the discretization order and e, but perhaps the choice of e can dramatically affect the correctness of the higher-order Jacobian matrix-vector multiplications, even beyond the consideration of section 4.3. The issue here is how much the higher-order terms in the reconstruction polynomial are affected by a small perturbation in the flow field. In other words, do higher-order terms remain arithmetically significant or measurable after a perturbation of size 6U on a discretized domain with the length scale of Ax ? To address this question we need to go back to Chapter 3. The reconstruction procedure led to.solving a least-squares problem, E q . (3.14), that can CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 78 be simplified in the form of E q . (5.10). • MD = U (5.10) where M is the coefficient matrix containing the moment information of the reconstruction stencil, D is the solution vector including the derivatives of the reconstruction polynomial, and U is the control volume average vector for the reconstruction. In.Eq. (5.10), matrix M is purely geometric and it is not solution dependent, therefore by changing U to U + 8U, M will not be affected. Now we would like to know how much the derivatives are perturbed by a change in control volume averages: M(D+ 6D) = U+ 5U = MD+ 8U (5.11) and MSD = 5U or 5D — M~l5U ' (5.12) By taking the norms of E q . (5.10) and E q . (5.12), these two norm inequalities are found: \\U\\ < \\M\\ \\D\\ , I I ^ H l l M - 1 ! ! ^ ! ! (5.13) and finally after multiplication of these two norm inequalities, the relative change in the derivatives can be expressed in terms of the relative perturbation in U as: i i » i i < i i M - . n , M „ m ( , 1 4 ) K(M) In E q . (5.14), K(M) is called the condition number of the matrix M and it is evident that the change in the derivatives is a direct function of condition number of the coefficient matrix. The matrix M is relatively small and its entries are only functions of mesh geometry. In our case, the quality of the employed isotropic mesh is guaranteed by the mesh software (GRUMMP[59] , [19]), so the condition number of the matrix M is expected to be fairly small (O(10) for typical employed meshes). Therefore changes in the derivatives are of the same order (or at most one order larger) as the perturbations in the solution. Consider a simplified 1-D cubic reconstruction polynomial which leads to the 4th-order discretization for a mesh with the uniform length scale of A x , E q . (5.15). U is perturbed by 5U and the perturbed reconstruction polynomial can be expressed by E q . (5.16) where CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY the superscript p shows the perturbed coefficients: f' F(U) = a0 + ai(Ax) + a2(Ax)2 + a3{Ax)3 79 (5.15) F{U + SU) = ap0 + ap(Ax)+ap(Ax)2 + 4(Ax)3 . (5.16) Now we study the difference between the original reconstruction and the perturbed one: F(U + SU) - F(U) = (ap0 - a 0 ) + (ap - ax)(Ax) + (ap - a2)(Ax)2 + (ap3 - a 3 ) ( A x ) 3 (5.17) In order to have a correct higher-order Jacobian in the matrix-vector product computed by E q . (5.9), all terms in E q . (5.17) are required to be arithmetically significant; otherwise, the overall accuracy of the linearization would be reduced and the convergence rate is adversely affected. Of course the highest order term, in our case, (ap — a3)(Ax)3 is the main issue and consequently: ll°3 ~~ °3ll ( A a ; ) 3 ^ £M , £M '• machine precision , (5.18) Using E q . (5.14), and assuming SU = ev , the bound .'of the highest derivative for the perturbation is estimated, Eq.(5.19). Combining this bound with E q . (5.18) determines the minimum magnitude of e for keeping the higher-order order terms significant in the finite difference matrix-vector multiplication, E q . (5.20). < K ( M ) T F 7 T T (5-19) M - v J\\u\\ £-\\a3\\(Ax)3K(M)\\v\\ ( 5 - 2 0 ) Further simplification can be made if || a 3y^(M) in E q . (5.20) is assumed to be 0(1). e> / A £ * f „ „ (5.21) ( A x ) 3 H Although this result has been obtained with several simplifying assumptions, it gives us a very useful insight about the reasonable magnitude of e for proper finite difference matrix-vector product computation. In the case of the standard G M R E S algorithm, ||t>|| = 1 due to the normalization process, and e can be found based on the geometric length scale. For CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 80 a non uniform mesh, Ax obviously is chosen based on the smallest length scale, and as an example if Aa; = O(10~ 4 ) , and EM = O ( 1 0 - 1 6 ) then choosing e « O ( 1 0 - 4 ) or larger, would keep higher-order terms significant. It should be noted that in the presence of the limiter using a large e or perturbation often causes limiter firing in residual evaluation leading to convergence problems. Therefore using a large perturbation for flows with an active limiter is not suggested. 5.2.3 Preconditioning Convergence of iterative techniques, including Krylov subspace methods, is highly depen-dent on the conditioning of the linear system, i.e. the Jacobian matrix. Using a higher-order discretization introduces more off-diagonal entries and increases the bandwidth of the Ja-cobian matrix considerably. In addition, in the case of Euler equations (compressible flow), with a non-linear flux function and possible discontinuities in the solution, the Jacobian matrix is off-diagonally dominant. A l l these factors lead to poor convergence of the linear solver, and consequently slowing or stalling the solution process of the non-linear problem. To remove this obstacle and enhance the performance of the linear solver, the linear system needs to be modified through a process called preconditioning, such that the preconditioned system has better spectral properties and clustered eigenvalues. Although in the case of the G M R E S method, eigenvalues may not describe the convergence of a nonsymmetric ma-trix iterations as much as they do for symmetric iterative solvers such as the Conjugate Gradient (CG) method, a clustered spectrum (not too close to zero) normally improves.the convergence characteristics of the linear solver [16]. Consider a preconditioner M as a nonsingular matrix which approximates the Jacobian matrix A. Equation (5.1) then can be modified by multiplying by M~l on both sides: M~1Ax = M~lb (5.22) If M _ 1 is a good approximation to A~l , M~lA becomes close to the identity matrix, increasing the performance of the linear solver through eigenvalue clustering around unity. Equation (5.22) is called left preconditioning, which affects not only the matrix operator but also vector b on the right hand side. It is also possible to introduce the preconditioner operator in the right side of the matrix A, and leave the right hand intact: 2 AM~l(Mx) = b, x = M~lz (5.23) CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 81 This is called right preconditioning. If the G M R E S algorithm is used with left precondi-tioning, M~l(b — Axk) is minimized instead of 6 — Axk , resulting in a different stopping criterion (residual norm) for the algorithm. However, right preconditioning, still minimizes the residual norm of the original system [73, 81]. Of course the best apparent choice for M is A, as AM~l = / , but if we could solve the system with the A matrix easily we never needed preconditioning in the first place. Finding the optimal preconditioner matrix is not unique, since it is highly problem dependent, and also depends on how the preconditioner is applied. Also there are circumstances where the matrix M need not be computed explicitly and a preconditioner operator replaces M. , like polynomial preconditioners [81]. But in general three factors are considered in choosing a preconditioner: 1. M is a reasonably good approximation to the coefficient matrix A. 2. Af is better conditioned, more narrow banded, and less expensive to build compared to matrix A. 3. The system Mx = z should be much easier to solve than Ax = b. The bottom line is the cost of the construction and applying the preconditioner should be relatively cheap with respect to the cost of the original linear system solving. For effective preconditioning, in addition to applying a good preconditioner matrix, we need to employ a good preconditioning technique. A preconditioning technique is a sparse linear solver itself, and it can be a direct or iterative method. Iterative stationary methods such as successive over-relaxation (SOR) rely on the fixed iteration E q . (5.24), where B is the fixed iteration matrix and C is a fixed vector: xi+\ = Bxi + c (5-24) Stationary methods are simple and easy to implement, often have parallelization advan-tages, are low cost (memory and C P U time), and finally they are effective in damping high frequency errors. However, they often have a restrictive stability condition, reduc-ing the benefits of Newton method. This is especially true if the preconditioner matrix is off-diagonal which is the case for compressible flow. A t the same time, for relatively large systems slow damping of the low frequency errors becomes a noticeable issue for these techniques in the absence of a multigrid augmentation strategy. CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 82 Preconditioning Ratio of non-zero elements Factored Mat . /Orig inal Mat. ILU-0 1 ILU-1 1.25 ILU-2 1.5 ILU-3 1.8 ILU-4 2.1 , L U 7.2 Table 5.1: Ratio of non-zero elements in factorized matrix Another preconditioning technique is the Incomplete Lower-Upper Factorization (ILU) method. Factoring the matrix M into two triangular matrices, M = LU, where L is a lower triangular matrix, and U is an upper triangular matrix normally results in factored matrices that are far less sparse than the original matrix M. As a consequence, a con-siderable amount of memory needs to be assigned for factorization, and the factorization process is quite expensive. One solution to this problem is incomplete factorization in which additional nondiagonal fill-in entries are allowed only for a predefined set of locations in the L U factorizations. In other words, we choose a non-zero pattern in advance for the elements of the factored matrices [74]. This is called I L U - P where P is the fill-level in the factorized matrix. P equal to zero means no fill is permitted during I L U decomposition. In ILU-0 the factorized matrix and the original (non-factored) matrix have the same graph or non-zero element locations. Choosing P larger than zero allows some additional fill-in in the factored matrix improving the accuracy of factorization and the preconditioning quality. However, increasing the fill-level comes at the expense of memory usage and extra computing cost, imposing a restriction in increasing fill-level in practice. Table 5.1 shows the number of non-zero elements in the factored matrix where the original preconditioning matrix is a first-order Jacobian [52], For instance ILU-2 needs 50% and ILU-4 requires a little bit more than 100% extra memory for storing the decomposed matrix in comparison with the orig-inal matrix graph. Also L U will be infeasible in practical applications because of its huge memory requirements. Both the preconditioner matrix and preconditioning technique may be developed based on the specific problem approach, but first one should have a detailed knowledge about the numerical aspects of the problem which is not always possible; second, generally speaking, such a preconditioning procedure is sensitive to the type of the.problem. Application of general preconditioning techniques such as I L U is more common for compressible C F D solvers. Another modification in the I L U family is ILU(P ,r ) , where in addition to the static non-zero pattern, a tolerance r is added as a dropping criterion for entries of the CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 83 factored matrices [16]. In other words, all fill-in entries within the level P are changed to zero if they do not satisfy some set of tolerance condition. Since the proper fill-level and tolerance criterion in ILU(P,r ) for efficient preconditioning of the compressible flows are highly dependent on the test case, and they are not determined uniquely, ILU(P ,r ) is not considered in this research. Several researchers have implemented I L U methods for preconditioning of the G M R E S linear solver for compressible fluid flows [91, 14, 24, 18, 45]. Their results show that I L U with some additional fill-in (i.e. ILU-1&2) is a reliable and robust preconditioning strategy for a variety of test cases, while ILU-0 fails to provide fast convergence in some cases. Reordering is another important factor in I L U factorization. Reordering is designed to reduce the bandwidth of a matrix. Smaller bandwidth leads to a sparser matrix, and consequently during factorization (Gaussian elimination process) of the matrix fewer fill-in entries will appear. Knowing that, increasing the fill-level in I L U preconditioning has its own disadvantages, reordering the original preconditioner matrix becomes essential to keep the accuracy of preconditioning for a low fill-level factorization. Experience shows that quite often a non-reordered preconditioner matrix with low fill-level works poorly, while the same fill-level factorization performs quite well, when the original matrix is reordered. The most common available reordering technique is Reverse Cuthi l l -McKee ( R C M ) [74] which has been successfully used for I L U - P preconditioning. In this thesis research, the G M R E S linear solver including the I L U - P factorization is the employed Petsc library developed by Argonne National Laboratory [1]. 5.2.4 G M R E S w i t h R i g h t P r e c o n d i t i o n i n g In right preconditioning AM'~lz = b is solved where, z — Mx . Like the standard G M R E S , in the right preconditioned G M R E S , the objective is minimizing the residual r = b — Ax. Although the initial residual is ro = b - AM~1ZQ , ZQ need not to be computed explicitly and all the elements of the Krylov subspace are created without referring to z. Right P r e c o n d i t i o n e d GMRES(m): 1. Use i n i t i a l guess XQ and compute ro = b— Axo) normal ize the i n i t i a l r e s i d u a l v\ — ro / ||ro||. 2. B u i l d the orthonormal b a s i s ( A r n o l d i ' s a l g o r i t h m ) : f o r j = l , 2 , . . , , m do: CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY & compute Wj = AM_1Vj hiJ = iWjivi), i = l , 2 , . . . , j . . hj+\j = If ^ i + i j = 0 stop V j + i = w j + 1 / h j + l i j , 3. Form the approximate solution: x m = x0 + M~1Vmym where ym minimizes f(y) = \\6e1 - Hky\\ ,y e 5RFC . 4 . Restart: Compute r m = b — Axm; i f sat is f ied (||rm | |<tolerance) then stop else compute xn :—xm, V\ := r m / \\rm\\ and go to 2. Applying the preconditioner, the Krylov subspace is right preconditioned orthogonal basis: /C = {r0, AM~lrQ, (AM~l)m~lr0} (5.25) This time the matrix-vector, operator is AM~lv = Az , where z — M~lv is the precondi-tioned vector. Nielsen et. al. [55] has suggested to scale e in E q . (5.9) by R M S of vector' z to improve the convergence, since using a constant e could result in convergence stall after couple of orders of the (non-linear) residual reduction: _F{U + ez)-F[U) . e0 A Z ~ e ' S-RMS(z) ( 5 - 2 b j where £o is a small scalar usually in the order of v / £ J W (£M- machine accuracy). However, as it was described in section 5.2.2, the small scalar value may be taken some orders of magnitude larger than ^JEM due to mesh length scale issues. 5.3 Solution Strategy The goal of all C F D solver developers is reducing the cost of computation as well as im-proving'the accuracy of the solution. T h e latter could be achieved by applying higher-order discretization methods. However, in a practical sense that is possible if and only if the cost of the computation is comparable to the cost of the common second-order methods. That is why implicit techniques (Newton family of methods) play a crucial role in achieving that goal since the best explicit techniques are far less efficient with respect to computation cost making higher-order computation quite uncompetitive. In addition to the computation cost, CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 85 i.e. C P U time, memory usage is also an issue for solver developers. But in most cases, C P U time is the main concern since if storing a set of data (i.e. Jacobian matrix) is expensive, then computing that data would be expensive too except in the case that the expensive computation is performed once and the resultant work is used in multiple iterations. Knowing the over all- approach, which is the implicit formulation, and the main objective which is the steady-state solution, we would like to lay out an efficient strategy to reach the steady state solution as fast as possible. The overall computing cost for a steady-state C F D problem by an implicit approach, can be analyzed by dividing the solution process into three major parts: 1. Forming the linear system including linearization and the non-linear residual compu-tation. 2. Solving the linear system including preconditioning. 3. Number of implicit iterations needed for steady-state convergence. It is obvious that we would like to minimize the cost of the first and second part of the solution process and reach the steady-state solution with minimum number of iterations. The third part has a cumulative effect on the overall solution cost since in each (non-linear) iteration the first two parts are covered once already. Most C F D problems, including compressible flows, are highly non-linear, and that makes them not too easy to solve via linearization. In addition to non-linearity in the mathematical sense, the behavior of a compressible flow (the physics of the flow) can dramatically change in some flow conditions for a fixed geometry. Therefore if the solution process is started from an arbitrary initial condition which is not reasonably close to the steady-state solution, large updates to the solution based on linearization are not likely to be helpful. Also, Newton's method is well known to converge to the solution quadratically if started from the vicinity of the solution; otherwise it will diverge or stall quickly. Since finding a good approximate solution in general (that is except the cases for which some close solution is available through analytical or experimental data) is impossible without actually starting to solve the problem, the solution process should consist of two phases:' 1 . Start-up phase 2. Newton phase CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 86 In the start-up phase, multiple linearizations with small or moderate solution updates are required to advance the solution to the point that the linearization is accurate enough for finding the steady-state solution, i.e. a good approximate solution as initial starting point for Newton phase. Having a good initial guess, the solution process is switched to the Newton iteration where superlinear or quadratic convergence is possible by taking a very large (or infinite) time step. 5.3.1 Start-up Phase Finding a good initial guess or reasonable approximate solution is about knowing the physics of the problem and finding a proper way to get to that good initial solution faster by employing various techniques such as mesh sequencing, multigrid, mixed explicit/implicit iterations, exact solution of a simplified problem, potential flow solution, and so on. Hence, like preconditioning, start-up is problem dependent and is more an art rather than an exact science, implying that the proper start-up process is not unique either. In this research a proper start-up process for compressible flows is suggested which is based on the defect correction procedure [46]. In the start-up process, considering the fact that multiple iterations are required for advanc-ing the solution toward a good initial, guess, an implicit iterative process is used where the linearization is performed based on the inexpensive first-order discretization, section(4.2), and the flux calculation remains higher-order: A relaxation factor, u>, is applied for the solution update; to = 1 results in standard defect correction, while using different values for u> (typically ui = 0.9-1.3) can over-relax or under-relax the update [52].. Over-relaxation often is helpful in subsonic flows where it accelerates the solution while under-relaxation can prevent divergence resulting from an inaccurate solution update, for instance in transonic cases. The Jacobian matrix on the left hand side is the first-order Jacobian and its computation cost is approximately the same as the cost of the second-order residual calculation. The cost of one Jacobian evaluation is 0.6-0.7 of the cost of a second-order residual evaluation (limiting cost excluded) for the approximate analytical Jacobian and it is 1.3-1.5 of the computation cost of the same residual evaluation if the finite-difference Jacobian is employed (for moderate mesh sizes). (5.27) CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 87 To reduce the cost of forming the linear system, the Jacobian matrix may be computed and stored for several iterations [39]. Wi th this approach the Jacobian is updated every j iteration (s) (typically j = 1-4) [52]. The time step At , which is actually scaled by the volume (area) of each control volume, is based on the C F L number times maximum allowable explicit time step: ' A t l = C F L A ^-max ( 5 2 g ) Ai A ' A ^ _ m a x = -j T ' ^max : Maximum wave speed (5.29) fey, A m a x « s As was noted before and also has been graphically illustrated for a sample non-linear func-tion, F ig (5.1), using the slope of the residual function for estimating the behavior of the residual function over a large span of the independent variables is not accurate in the early stage of the solution process. Therefore it makes sense to start from a relatively small A i or C F L , and to gradually increase C F L to some modest number as we are getting closer to the solution of the residual function, i.e. F(U) = 0. This justifies applying the first-order linearization in the start-up process instead of the correct order Jacobian calculation; the linearization does not allow large steps towards the steady-state solution, so why should we spend too much work to compute it, especially if implementation of Jacobian in the linear solver requires several matrix-vector multiplications? A t the same time, solving a linear system based on the first-order Jacobian is much easier than doing so for higher-order Jaco-bian (even for the second-order one), because of the structure of the higher-order Jacobian matrix. Therefore we are better off to form and solve a less expensive system, since we must inevitably perform several iterations at the start-up phase. The next step is solving the linear system. Since we are using an approximate linearization, it does not make sense to solve the approximate linear system exactly, therefore the linear system in each defect correction iteration, referred to as pre-iteration from now on, is solved approximately. T h e goal of each pre-iteration is to reduce the non-linear higher-order resid-ual by cheap linearization. Consequently it is logical to solve the linear system up to some fraction of the non-linear residual, Le. tolerance=C x Res(U), C < 1 . Employing this approach saves us from spending too much effort on finding a precise solution to a linear system that we know will not give us the correct solution to the non-linear problem. The right-preconditioned G M R E S ( m ) algorithm is used with a limited subspace size (K=15-20) and relatively a loose tolerance (0.1-0.05 Res(U)) is set for solving the linear system. No restart is allowed, and if the tolerance is not reached within the maximum search directions (subspace size) G M R E S is terminated anyway and the computed solution update up to that CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 8 8 point is taken to move to another non-linear outer iteration. The preconditioner matrix is the same as the linear system matrix which is the perfect choice for preconditioning because the first-order Jacobian has been built already for forming the linear system, and it is the most accurate preconditioner matrix for that case. To reduce the preconditioning cost, a low fill-level incomplete factorization, I L U ( l ) , is applied, which is accurate and efficient enough for this first-order linear system. Now the question is how close we should be to the solution before switching to Newton iterations. Normally the decrease of some norm of the non-linear residual at the current iteration compared to the initial residual, > * s chosen for that purpose. That is, if the solution reaches a specific relative norm of the non-linear residual, then the linearization is accurate enough to take a very large time step at each iteration. It should be noted that the value for this criterion varies from problem to problem. However, it is possible to find reasonable values for different categories of the compressible flows (i.e. subsonic, supersonic and transonic). More detail is provided in that regard in the results chapter. 5.3.2 Newton Phase B y this stage most of the transient behaviors of the non-linear C F D problem have been removed from the flow field solution, and major steady features of the fluid flow have appeared in the solution domain. Consequently the linearization of the non-linear residual is fairly accurate for seeking the steady-state solution, and taking infinite or very large time steps accelerates the solution toward quadratic or superlinear convergence, i.e. switching to the Newton iteration formula: OR W AUn+i = _RHigh(ir), Un+1 = Un + AUn+x (5.30) High To have an accurate linearization, the higher-order Jacobian must be applied during the. Newton iterations, E q . (5.30). This has been implemented through matrix-free G M R E S , using finite difference directional matrix-vector multiplication technique. The linear system is right preconditioned by the first-order Jacobian, which indeed is essential due to bad conditioning of the left hand side. ILU(P=2-4) is used for preconditioning and normally for higher-order computation (especially the fourth-order transonic flow), increasing the fill-level is considerably beneficial [52]. A fixed number of search directions is employed (K=30) and it is important to keep them limited since the matrix-free G M R E S performs multiple perturbations of the non-linear residual which is very expensive for higher-order residual computation. The linear system is again solved approximately but this time with a tighter CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 89 tolerance (10 - 2Res(U)) as an accurate update is required. No restart is allowed, and if the tolerance is not reached, the next outer iteration starts with the best update from the last inner G M R E S iteration. Approximately solving the linear system in this way is called the Inexact Newton method, and although it can increase the non-linear outer iteration number, it reduces the over all C P U time as considerable computation time is saved by not computing the exact solution of the linear system [64, 18, 45, 51]. Notice that the left hand side is not quite exact because of the truncation error in the matrix-vector multiplications and possibly is polluted by round off error, especially for higher-order terms. Therefore it is preferable to give up the quadratic convergence instead of increasing the solution C P U time. Chapter 6 Results (I): Verif ication Cases The first step in accuracy verification of an unstructured C F D solver is to verify the accu-racy of the reconstruction procedure. For this purpose two complete sets of the reconstruc-tion tests including straight and curved boundaries are presented here. Then to study the correctness, basic performance and solution accuracy of the proposed higher-order unstruc-tured Newton-Krylov solver, smooth subsonic and supersonic cases with known solutions have been investigated. This chapter is devoted to numerical study of these test cases. 6.1 Reconstruction Test Cases The objective here is to reconstruct a test function over a domain with different mesh resolutions, and to measure the error between the reconstruction of the test function and the exact value of the test function for discretized meshes. Two different domains have been chosen for examining the over all accuracy of the reconstruction procedure. The first test case is a unit square (a case with straight edges) and the second test case is an annulus which has two curved edges. The test function is a smooth analytical function described by Eq. (6.1). f(x, y) = 1.0 + 0.1 sin(7nc) sin(7ry) (6.1) The first step in reconstruction is computing the mean value for a control volume, implying that we need to be able to integrate a given function over any control volume (including boundary triangles with curved edges) at least as accurately as the nominal discretization 90 CHAPTER 6. RESULTS(I): VERIFICATION CASES 91 2nd-order Lx Ratio L 2 Ratio Loo Ratio Mesh 1/460 CVs 0.0001667 — 0.0002948 — 0.0017002 — Mesh 2/1948 CVs 4.551149e-5 3.66 8.73185e-5 3.37 0.0004369 3.89 Mesh 3/7856 CVs 1.15364e-5 3.94 2.27180e-5 3.84 0.0001099 3.97 Mesh 4/31668 CVs 2.92207e-6 3.95 5.81718e-6 3.9 2.75408e-5 3.99 Table 6.1: 2nd-order error norms for the square case order. The integration of a given function over the area of a control volume is carried out using Gauss quadrature rule, Eq. (6.2), where Wj is the Gauss weight. While this procedure looks straightforward, determining Gauss point locations and their associated weights robustly for higher-order integration over an area with curved edges are proved to be a nontrivial task [95]. n f(x,y)dA = Y,Wjf(xj,yj) (6.2) In this research, for the square case a 6th-order integration routine using 7 Gauss points and for the annulus case a 4th-order integration routine using 10 Gauss points (developed by Dr. Carl Ollivier-Gooch in ANSLib frame work [58]) have been employed. All the grids are generated using GRUMMP Version 0.3.2 [59] which generates high quality triangular meshes for domains with curved boundaries [19]. 6.1.1 Square Test Case Four unstructured meshes shown in Fig (6.1) were generated for a unit square domain. The density of control volumes is increased 4 times at each mesh with respect to the previous mesh level. The mesh length scale is not uniform, with control volumes at the corners clearly having larger length scales than the control volumes in the interior of the square domain. The error norms and the ratio of error norms are tabulated in Table 6.1 through Table 6.3. Despite non-uniform mesh distribution, all the error norm ratios for the Mesh 4 confirm the nominal accuracy of the reconstruction. Fig (6.2) displays the Error-Mesh plot for the square case. Notice that the error norm is reduced considerably by increasing the order of accuracy, and the slope of the graph proves the reconstruction accuracy. Asymptotic convergence of the error norms are clearly observed by examining the tabulated data and Fig (6.2) for this series of meshes. CHAPTER 6. RESULTS(I): VERIFICATION CASES irArArArArArA ••y^^^^WJBFjaagwm ^i^^i? rA xrj&wA rA rA&W WA tt^tttttt &&f*$Kd&K*&'A£>'j&WA-"i^m^WWU^W^ta 0 0.2 0.4 0.6 0.8 1 X (a) 460 Control Volumes (b) 1948 Control Volumes (c) 7876 Control Volumes (d) 31668 Control Volumes Figure 6.1: Unstructured meshes for a square domain 3rd-order Li R a t i o L2 Ratio Loo Ratio Mesh 1/460 C V s 3.54269e-5 — 4.82035e-5 — 0.0002061 — Mesh 2/1948 C V s 5.18616e-6 6.83 7.59731e-6 6.34 3.47214e-5 5.9 Mesh 3/7856 C V s 6.92254e-7 7.50 1.01755e-6 7.46 4.71759e-6 7.36 Mesh 4/31668 C V s 8.96936e-8 7.70 1.31888e-7 7.70 6.09472e-7 7.74 Table 6.2: 3rd-order error norms for the square case CHAPTER 6. RESULTS(I): VERIFICATION CASES 4th-order Li Ratio L2 Ratio Loo Ratio Mesh 1/460 C V s 3.98124e-6 — 9.87666e-6 — 0.0001084 — Mesh 2/1948 C V s 3.30234e-7 12.06 8.20473e-7 12.03 8.39794e-6 12.90 Mesh 3/7856 C V s 2.11865e-8 15.60 5.32239e-8 15.40 5.49060e-7 15.30 Mesh 4/31668 C V s 1.35160e-9 15.70 3.40196e-9 15.65 3.4699e-8 15.82 Table 6.3: 4th-order error norms for the square case T o 3 W~ 10 5 Number of Control Volumes Figure 6.2: Error-Mesh plot for the square case CHAPTER 6. RESULTS(I): VERIFICATION CASES 94 2nd-order Li Ratio L 2 Ratio Loo Ratio Mesh 1/427 CVs 0.00087885 — 0.00143183 — 0.0091068 — Mesh 2/1703 CVs 0.0001974 4.45 0.0003387 4.23 0.0035145 2.59 Mesh 3/6811 CVs 4;61658e-5 4.28 8.01761e-5 4.22 0.0008615 4.08 Mesh 4/27389 CVs 1.09032e-5 4.23 1.85523e-5 4.32 0.0002311 3.72 Table 6.4: 2nd-order error norms for the annulus case 3rd-order Li Ratio L 2 Ratio L>oo Ratio Mesh 1/427 CVs 0.00037211 — 0.00051522 — 0.0020959 — Mesh 2/1703 CVs 4.86257e-5 7.65 7.05354e-5 7.3 0.0003919 5.34 Mesh 3/6811 CVs 6.15972e-6 7.89 8.83369e-6 7.98 6.93723e-5 5.65 Mesh 4/27389 CVs 7.53171e-7 8.17 1.05894e-6 8.34 9.78171e-6 7.09 Table 6.5: 3rd-order error norms for the annulus case 6.1.2 Annulus Test case This is the case that includes curved boundaries. The domain inside an annulus is dis-cretized with four mesh densities. The Geometry and meshes are shown in Fig (6.3). Again refinement is performed using global parameters and except for the center portion of the annulus, refinement has been carried out relatively uniformly. The boundary region con-trol volumes which normally introduce a major part of the reconstruction error still remain larger than the interior control volumes. The error norms and their ratio for all orders of reconstruction are presented in Table 6.4 to Table 6.6. l\ and l2 norms have converged after 3 level refinement. Zoo norm has not converged as fast as the other two norms. Since Zoo is the local error indicator and presents the maximum error in the reconstruction, the slower convergence rate for Zoo often is expected. The nominal order of accuracy for Zi and l2 are achieved through the employed 4 level meshes. The ratios for Zoo are also fairly close to the nominal order of accuracy. Fig(6.4) demonstrates the reduction in Zi norm of the error versus control volume density. The accuracy of the reconstruction also can be verified by measuring the slope of this Error-Mesh plot. In addition, such a plot can give us a good estimation of the average solution error for a certain mesh density. In other words we can determine how many control volumes do we need to reach a certain level of error based on the chosen discretization order. It is interesting to note that the l\ norm of the reconstruction error on the finest mesh (for this test function) is 15 times smaller for the 3rd-order discretization and more than 500 times smaller for the 4th-order discretization than the error for the 2nd-order discretization. CHAPTER 6. RESULTS(I): VERIFICATION CASES 95 (c) 6811 Control Volumes (d) 27389 Control Volumes F i g u r e 6 . 3 : U n s t r u c t u r e d m e s h e s f o r a c u r v e d d o m a i n ( a n n u l u s ) 4 t h - o r d e r Lx R a t i o L2 R a t i o Loo R a t i o M e s h 1 / 4 2 7 C V s 8 . 0 3 4 3 2 e - 5 — 0 . 0 0 0 1 3 5 9 8 — 0 . 0 0 1 6 8 6 0 _ M e s h 2 / 1 7 0 3 C V s 5 . 8 7 6 6 9 e - 6 1 3 . 6 7 1 . 1 1 7 7 6 e - 5 1 2 . 1 6 0 . 0 0 0 1 6 7 8 1 0 . 0 4 M e s h 3 / 6 8 1 1 C V s 3 . 6 4 1 5 6 e - 7 1 6 . 1 4 7 . 1 4 8 4 9 e - 7 1 5 . 6 3 1 . 4 2 0 4 3 e - 5 1 1 . 8 2 M e s h 4 / 2 7 3 8 9 C V s 2 . 0 9 9 4 9 e - 8 1 7 . 3 3 . 8 8 5 3 7 e - 8 1 8 . 3 9 1 . 1 5 3 6 9 e - 6 1 2 . 3 1 T a b l e 6 . 6 : 4 t h - o r d e r e r r o r n o r m s f o r t h e a n n u l u s c a s e CHAPTER 6. RESULTS(I): VERIFICATION CASES Number of Control Volumes Figure 6.4: Error-Mesh plot for the annulus case CHAPTER 6. RESULTS(I): VERIFICATION CASES 97 5 0 0 h 4 0 0 h 300 h 200 h 100 k 300 (a) Figure 6.5: Circular domain over half a cylinder, Mesh 1 (1376 CVs) 6.2 Subsonic Flow Past a Semi-Circular Cylinder The smooth inviscid flow over a semi-circular (half) cylinder with R = 1, at MQO = 0.3 is computed. The flow direction is from left to the right and the angle of attack is zero. The far-field boundary is located at R = 300 to make sure that all perturbations are damped effectively (Fig (6.5)). Three different meshes, Mesh 1 (1376 CVs), Mesh 2 (5539 CVs), and Mesh 3 (22844 CVs) are generated for this purpose; close-ups are shown in Fig (6.6) to Fig (6.8). A special effort has been made in the grid refinement process in order to achieve a relatively self-similar refinement through the whole solution domain. Each mesh is nearly four times denser than its immediate coarser level mesh implying that the mesh length scale in all parts of the domain has been reduced approximately by a factor of two (section (3.3)). CHAPTER 6. RESULTS(I): VERIFICATION CASES 98 Figure 6.6: Circular cylinder, Mesh 1 (1376CVs) Mesh CVs Max AcVi Min AcVi f Amax, V Amin / {Amax^coarse V {A-max) fine / (Amin)coarse V {Amin)fine 1 1376 1475.98 0.00360815 639.6 — 2 5539 516.15 0.000853626 777.6 1.69 2.06 3 22844 125.98 0.000240918 723.2 2.02 1.88 Table 6.7: Sizes and ratios of the control volumes for circular cylinder meshes Table 6.7 summarizes a brief information about sizes of the control volumes in each mesh level. The length scale of each control volume can be assumed to be proportional to the square root of the control volume area. By looking at the tabulated data, it is clear that in each mesh level there is a wide variety of length scales which largely changes across the solution domain. The table also shows the approximate refinement factor from one coarse level to its immediate finer level. Assuming that the refinement is done uniformly (which is not the case in reality), the length scale of the mesh is not exactly reduced by a factor of two in each refinement. Since, we do not have a uniform mesh to start with, we are not able to uniformly refine the mesh, and reduce the length scale of the control volumes in an orderly fashion through whole solution domain (especially at boundary regions) it is not too surprising that the measured order of the accuracy will be somewhat away from the nominal expected order, section (3.3). CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.8: Circular cylinder, Mesh 3 (22844 C V s ) CHAPTER 6. RESULTS(I): VERIFICATION CASES 100 The potential (incompressible) flow solution is used as an initial solution for all the meshes and orders of accuracy. Since the solution is started from a good initial guess, no start-up procedure is needed. Newton iterations are performed using matrix-free GMRES with the maximum Krylov subspace size of 30. However for the sake of the robustness, we keep the -£t term and the time advance formula with large CFL number is applied, Eq. (2.12). To reduce the cost of the inner iterations, where multiple residual perturbations are required, a subspace of 20 is employed for early outer iterations. When the l2 norm of the non-linear residual is dropped below 5 x 10 - 9 the subspace of 30 is used. This is especially helpful for the 4th-order case where the residual computation is very expensive. The tolerance of the linear system solver is chosen to be 1 x 10 - 2 of the l2 norm of the non-linear residual which often is not reached within the allowable inner iterations. The convergence criterion for the steady-state solution is 1 x 10 - 1 2 for the l2 norm of the non-linear residual. In the case of Mesh 1 and 2, CFL=5000 for the 2nd and 3rd-order discretizations and CFL=1000 for the 4th-order discretization are used for starting the solution from the initial guess and it is increased to CFL=10,000 when the non-linear residual is reduced by one order. For Mesh 3 the starting CFL is 100 for all orders of accuracy which is increased to CFL=5000 gradually, and after one order reduction in residual, it is increased to CFL=10,000. The preconditioner matrix for all cases is the approximate analytic Jacobian and the preconditioning strategy is ILU(4). The solution convergence history for the coarse mesh (Mesh 1) and the fine mesh (Mesh 3) are shown in the Fig (6.9) and Fig (6.10). As it is evident full convergence is achieved, but the CPU time for the 4th-order case is much larger than the other two orders of accuracy. This was partially expected since the 4th-order discretization requires more operations, and the cost of the reconstruction rises quadratically by increasing the discretization order. However, a considerable part of the computing cost is due to noticeable increase in outer iteration numbers. Furthermore, the linear system arising from the 4th-order discretization is ill-conditioned and difficult to solve which results in relatively ineffective linearization of the non-linear problem (considering the fixed subspace size and Inexact Newton approach) demanding more outer iterations. On the contrary, both the 2nd and 3rd-order cases quickly converge displaying the effectiveness of the linearization. Pressure coefficient contours (banded) for all orders of accuracy (Mesh 1) are shown in the Fig (6.11) to Fig (6.13). The quality of the contours, which reflect the smoothness of the reconstructed solution, are visibly improved by increasing the order of accuracy of the solution1. The difference between the contours' quality is clearer in regions where control 1 Visualization is done by Tecplot 360 using 3 nodal triangle finite element format; consequently although solution is higher order its visual representation is only second order. 6. RESULTS(I): VERIFICATION CASES 1 o' 1 3 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 O 50 100 150 CPU-Time (Sec) Figure 6.9: Convergence history for the coarse mesh (Mesh 1) R • i a ' i I i i i i I — i — i — i — i — I — L 0 500 1000 1500 CPU-Time (Sec) Figure 6.10: Convergence history for the fine mesh (Mesh 3) CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.12: 3rd-order pressure coefficient contours, Mesh 1 CHAPTER 6. RESULTS(I): VERIFICATION CASES 103 Figure 6.13: 4th-order pressure coefficient contours, Mesh 1 volumes are quite coarse. T h e 4th-order pressure coefficient contours over Mesh 3 are also shown in F i g (6.14) as a reference for the quality comparison with previously mentioned cases. Since flow is inviscid and the geometry is symmetric, the pressure contours are symmetric respect to the normal axis passing through the center of the cylinder. Front and rear stagnation regions where the maximum pressure occurs (C p=1.0 ) and the suction peak on the top of the cylinder are displayed in the contours. T h e pressure coefficient distribution along the surface of the cylinder (Gauss point data over the circle) is shown in the F i g (6.15) to F i g (6.17). In terms of the solution, all discritization orders predict nearly the same pressure coefficient over the cylinder. In general and for a smooth flow that is what is expected since results should differ within the orders of the mesh size. In the 2nd-order case, when the mesh is coarse, the suction peak is not recovered smoothly nor is it located quite at the top of the cylinder; however, these issues are considerably improved for the 3rd and 4th-order solution over the same mesh. T h e 4th-order case over the coarse mesh shows a slight over-prediction both at the front stagnation point and at the suction peak compared to the same solution over the fine mesh. Finally, solution over the fine mesh for all orders of accuracy is displayed in F i g (6.17) showing that all of them essentially are converged to the same pressure distribution such that the difference between them is not visible to plotting accuracy. CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.14: 4th-order pressure coefficient contours, Mesh 3 2nd-Order/Meshl 3rd-Order/Meshl l_l i i i i I i i i i L_ i L_J i I i i i i—L -1 -0.5 0 0.5 1 X Figure 6.15: Pressure coefficient along the axis CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.16: Close up of the pressure coefficient along the axis (suction region) 2nd-Order/Mesh3 3rd-Order/Mesh3 -1 -0.5 0 0.5 1 Figure 6.17: Pressure coefficient along the axis, Mesh 3 CHAPTER 6. RESULTS(I): VERIFICATION CASES 106 2nd-order Li Ratio Loo Ratio Mesh 1/1376 CVs 0.000553 — 0.013409 — Mesh 2/5539 CVs 9.4548e-5 5.85 0.003713 3.61 Mesh 3/22844 CVs 2.0492e-5 4.61 0.00129 2.88 Table 6.8: Error norms for total pressure, 2nd-order solution 3rd-order Li Ratio Loo Ratio Mesh 1/1376 CVs 0.000426 — 0.010250 — Mesh 2/5539 CVs 3.5353e-5 12.05 0.002030 5.05 Mesh 3/22844 CVs 6.0965e-6 5.8 0.000543 3.74 Table 6.9: Error norms for total pressure, 3rd-order solution As flow is inviscid and subsonic (shock free), the total pressure of the flow should be pre-served everywhere, and it is equal to the total pressure of the free-stream at the far-field. Therefore deviation of the total pressure ratio from one can be interpreted as an error indi-cator for these type of flows (E = — 1). The discrete norms of the errors in total pressure ratio of control volume averages are computed for all orders of accuracy and meshes and are tabulated in the Table 6.8 to Table 6.10. The l\ norm, a global measurement for error, is roughly half an order less than the nominal order of the discretization for the 3rd-order, and it is about right for the 4th-order discretization. The loo norm, a local measurement for maximum error, follows the same trend and it is about one order less than the nominal norm of the discretization for the 3rd-order solution. The total pressure error also is shown in Fig (6.18) to Fig (6.21) over the coarse mesh for all orders of accuracy and over the fine mesh for the 4th-order discretization (as a reference for comparison). These pictures demonstrate that the surface boundary as the main source of the error and show how different discretization orders under-predict and/or over-predict the total pressure in stagnation regions and at the maximum acceleration area (top of the cylinder). For instance the 2nd-order and 3rd-order discretizations slightly under-predict the total pressure over nearly all the boundary surface, and this under-prediction reaches its peak at the rear stagnation region. The 4th-order method slightly over-predicts the total pressure nearly everywhere along the boundary surface which was consistent with the 4th-order pressure coefficient along the boundary. By refining the mesh and/or increasing the accuracy order the error in total pressure improves dramatically (Fig (6.21)). The Error-Mesh plot for Zi, Fig (6.22), demonstrates the total pressure error reduction versus mesh size. It should be noted that despite the deviation of the measured accuracy order from its nominal value, the higher-order discretization still produces noticeably smaller error. For instance in the case of the 3rd-order solution over the finest mesh, where the CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.18: Error in total pressure ratio, 2nd-order discretization, Mesh 1 Figure 6.19: Error in total pressure ratio, 3rd-order discretization, Mesh 1 CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.21: Error in total pressure ratio, 4th-order discretization, Mesh 3 6. RESULTS(I): VERIFICATION CASES 4th-order u Ratio Foo Ratio Mesh 1/1376 C V s 0.000535 — 0.0148117 — Mesh 2/5539 C V s 2.7473e-5 19.47 0.001097 13.5 Mesh 3/22844 C V s 2.0954e-6 13.11 8.1810e-5 13.4 Table 6.10: Error norms for total pressure, 4th-order solution - • 2nd-Order •A 3rd-Oider 4th-Order j l 1 i l i i i i i i i i i i i i 10000 20000 30000 CVs Figure 6.22: Error-Mesh plot for the total pressure CHAPTER 6. RESULTS(I): VERIFICATION CASES 110 Figure 6.23: Drag coefficient versus mesh size largest deviation from the nominal accuracy occurs, the l\ norm is 3.4 times smaller than the l\ norm of the 2nd-order solution on the same mesh. The Drag coefficient can also be used for error measurement since the exact drag coefficient for the inviscid subsonic flow over circular cylinder is zero. Figure (6.23) displays the drag reduction versus mesh size. This time the measured accuracy order is somewhat better than the nominal order showing some error cancellation due to symmetry of the solution. As expected higher-order discretization results in much smaller drag, which is closer to the exact solution. To show the benefits of higher-order discretization, it is useful to assess the performance of a higher-order solution versus its accuracy. This can be perfectly done for the circular cylinder case. The C P U time of the solution versus the drag coefficient is shown in Fig (6.24). Assume that the picture shown is the solver performance characteristic graph for all meshes, and by changing the mesh size (within the range of the interest) the behavior of the performance graph does not change. For a desired accuracy, for instance CD = 1 0 - 4 the 3rd-order solution has the minimum C P U time. For a fixed solution time of 100 seconds again the 3rd-order case results in the minimum drag. If a large solution time is bearable, the 4th-order case provides us the best solution accuracy, since its C r j / C P U time slope is CHAPTER 6. RESULTS(I): VERIFICATION CASES 111 CPU-Time (Sec) Figure 6.24: Drag coefficient versus C P U time sharper than the other two orders of accuracy. T h i s implies that if a very accurate solution is needed it is more efficient to increase the discretization order rather than using a finer mesh. T h i s comparison is performed for a simple problem, as opposed to a practical engineering case, and at same time engineering accuracy requirements are typically limited. However, this case study still provides some interesting insight into the prospects for higher-order methods. CHAPTER 6. RESULTS(I): VERIFICATION CASES 112 Figure 6.25: Annulus, Mesh 1 (108 CVs) 6.3 Supersonic Vortex Supersonic flow inside an annulus geometry is studied. This flow is isentropic (shock free) and it is a standard test case for accuracy measurement. The inner and outer radii are 2.0 and 3.0 respectively, and the inlet Mach number at the inner radius is equal to 2.0. Five different meshes (Mesh 1 to Mesh 5) are employed in this test case (Fig (6.25) to Fig (6.29)). Each mesh level has 4 times more control volumes than the immediate coarser level and uniform refinement has been applied in the grid generation. Notice that meshes are still irregular unstructured meshes and any finer level is generated completely independent from the previous coarser level. The exact solution for supersonic vortex can be found in Aftosmis and Berger [5]. Having the exact solution provides us a direct option for accuracy measurement of the numerical solution. The exact solution for the supersonic vortex is described in Eq. (6.3). CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.27: Annulus, Mesh 3 (1703 CVs) Figure 6.29: Annulus, Mesh 5 (27389 CVs) CHAPTER 6. RESULTS(I): VERIFICATION CASES 115 Ri = 2.0, Mi = 2.0, pi = 1.0 R = y/x2 + y2 > = 4+1i1(1-f)M< URi = RiMi{Pi) 2 (6.3) 6.3.1 Numerical Solution The exact solution is used as an initial solution, which does not exactly satisfy the discretized equations due to truncation error. Starting a numerical solution from its exact solution is unrealistic in general, as if the exact solution was known, there would be no need for numerical solution. However this is a special case for efficiency study of the developed Newton-Krylov solver, which excludes the start-up issue completely. This way we can show how the efficiency of our matrix-free approach is affected purely by discretization order when the best starting solution is used. Newton iteration (infinite time step) is performed for all cases. An approximate analytic Jacobian with ILU(4) is used for preconditioning. The convergence criterion, as in section (6.2), is L2(Res(U)) = 1 x 10 - 1 2 and the same subspace size has been used. Fig (6.30) and Fig (6.31) show the convergence history for the Mesh 1 and Mesh 5 in terms of CPU time. Since the solution is started from a good initial guess superlinear convergence is achieved from the very first iteration both for the 2nd and 3rd-order discretizations. The 4th-order case (as expected) is much slower than the other two and it requires more outer iterations (2 more outer iterations for the fine mesh) to reach full convergence. This shows that even when the solution is started from the best possible initial guess (i.e. exact solution), the 4th-order discretization results in a complicated linear system which requires considerable effort to solve. The solution Mach contours over the coarse mesh for all orders of accuracy are displayed in Fig (6.32) to Fig (6.34). The flow Mach number is supposed to be constant along any given radius and decreases with increasing radius. In other words, flow at smaller radius has larger acceleration with the maximum occurs along the inner wall boundary. By inspecting the contour lines, it is clearly visible that the quality of the solution using the higher-order discretization has been improved noticeably. This improvement is most visible close to the inner wall portion. This shows that, especially in regions where the solution has CHAPTER 6. RESULTS(I): VERIFICATION CASES 116 ' 1 1 I I 1 1 1 1 1 1 1 —I I I L 0 1 2 3 CPU-Time (Sec) Figure 6.30: Convergence history for the coarse mesh (Mesh 1) 0 50 100 150 200 250 CPU-Time (Sec) Figure 6.31: Convergence history for the fine mesh (Mesh 5) CHAPTER 6. RESULTS(I): VERIFICATION CASES 117 Figure 6.32: 2nd-order M a c h contours for the coarse mesh (Mesh 1) large inadequately resolved derivatives (but is continuous), linear reconstruction of the flow quantities is not the optimal choice. F i g (6.35) to F i g (6.37) display the density error in the solution over the coarse grid. T h e scale of the error map is the same for all discretization orders providing a reasonable sense of how the solution error decreases when a higher-order discretization is employed. A s expected the boundaries are the original source of the error and the maximum error occurs at the inner wall. T h e 4th-order density solution over the fine grid is shown in F i g (6.38) as a reference. T h e behavior of the density contours is opposite to the behavior of the M a c h contours, and density reaches its maximum value at the outer wall. 6.3.2 Solution accuracy measurement T h e density of the solution is used for the accuracy measurement. E r r o r norms are tabulated in Table 6.11 to Table 6.13 for all set of grids and discretization orders. T h e maximum error norm (loo) is about one order less than the nominal order of accuracy due to the boundary error. T h e global norm, fg, however, converges to the nominal order of accuracy for all cases. Hence the over all accuracy of the computed solution is consistent with the discretization CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.34: 4th-order M a c h contours for the coarse mesh (Mesh 1) CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.35: 2nd-order density error for the coarse mesh (Mesh 1) Figure 6.36: 3rd-order density error for the coarse mesh (Mesh 1) CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6 . 3 8 : D e n s i t y , 4 t h - o r d e r s o l u t i o n o v e r t h e f i n e m e s h ( M e s h 5 ) CHAPTER 6. RESULTS(I): VERIFICATION CASES 121 Mesh CVs Lx Ratio Loo Ratio 1 108 0.014872 — 0.097239 — 2 427 0.003847 3.86 0.036135 2.69 3 1703 0.001073 3.59 0.012825 2.82 4 6811 0.000258 4.15 0.005168 2.48 5 27389 6.6334e-5 3.89 0.002019 2.56 Table 6.11: Solution error norms, 2nd-order discretization Mesh CVs Li Ratio Loo Ratio 1 108 0.044813 — 0.028904 — 2 427 0.000976 4.59 0.011439 2.53 3 1703 0.000139 7.0 0.002225 5.14 4 6811 2.4415e-5 5.69 0.000634 3.51 5 27389 3.3202e-6 7.35 0.000136 4.66 Table 6.12: Solution error norms, 3rd-order discretization order. The error convergence plot for the l\ norm is shown in Fig (6.39) for graphical solution accuracy verification. Again here, some accuracy-efficiency analysis can be carried out. The solution C P U time versus the Error norm is plotted in Fig (6.40). For a fixed C P U time, for example ( C P U time = 10 Sec), the 3rd-order performance characteristic line generates the minimum error. For a fixed error level, l\ = 1 x 1 0 - 5 , the 4th-order performance characteristic line has the minimum computation time, although on that error level the 3rd and 4th-order lines are very close to each other. In all cases the 2nd-order characteristic line is totally out performed by higher-order counterparts. It should be mentioned that a similar performance characteristic graph with the same trend, but different C P U time, can be achieved when the solution starts from a constant Mach number as an initial guess with a start-up process. The supersonic vortex is an internal flow; therefore, the accuracy of the solution for whole domain was measured and studied. If one modifies the error quantification, or limits the Mesh CVs Lx Ratio Loo Ratio 1 108 0.002199 — 0.01632 — 2 427 0.000410 5.36 0.005102 3.20 3 1703 2.3761e-5 17.25 0.000546 9.34 4 6811 2.0200e-6 11.76 8.3718e-5 6.52 5 27389 1.2514e-7 16.14 1.0024e-5 8.35 Table 6.13: Solution error norms, 4th-order discretization CHAPTER 6. RESULTS(I): VERIFICATION CASES 122 Figure 6.39: Error-Mesh plot for the solution (Density) importance of the solution to a specific part of the domain, the performance characteristic graph may change considerably. Therefore it is crucial to know what is important in a flow field, and how the error is defined and quantified. A l l of these are application oriented and vary from case to case. However, the studied test case clearly shows the potential performance advantages of higher-order discretizations. CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.40: Error versus C P U T i m e Chapter 7 Results (II): Simulat ion Cases In this chapter, the start-up procedure, fast convergence, robustness and overall performance of the proposed higher-order unstructured Newton-Krylov solver have been studied in detail for subsonic, transonic, and supersonic flows. 7.1 Subsonic Air fo i l , N A C A 0012, M = 0.63, a = 2.0' To study the convergence and robustness of the proposed Newton-Krylov solver with a higher-order unstructured discretization, here a subsonic case is presented which includes most of the features of the solver's performance. The convergence characteristics are inves-tigated for a series of meshes. Five different meshes with an O-Domain from a coarse to a relatively fine mesh have been used (Table 7.1 and Fig (7.1) to Fig(7.5) ). All meshes have proper refinement at the leading and trailing edges. The far field is located at 25 chords and characteristic boundary conditions are implemented implicitly. Mesh No. of CVs along the chord (upper side/lower side) Total No. of CVs 1 61/58 1245 2 101/99 2501 3 127/127 4958 4 198/192 9931 5 260/253 19957 Table 7.1: Mesh detail for NACA 0012 airfoil 124 CHAPTER 7. RESULTS(II): SIMULATION CASES X Figure 7.1: N A C A 0012, Mesh 1, 1245 CVs 0.4 0.2 >< 0 -0.2 -0.4 Figure 7.2: N A C A 0012, Mesh 2, 2501 C V s CHAPTER 7. RESULTS(II): SIMULATION CASES 0.4 0.2 > 0 -0.2 -0.4 0.4 0.2 > 0 -0.2 -0.4 Figure 7.4: N A C A 0012, Mesh 4, 9931 CVs CHAPTER 7. RESULTS(II): SIMULATION CASES 127 Figure 7.5: NACA 0012, Mesh 5, 19957 CVs 7.1.1 Solution Process The tolerance in solving the linear system for the start-up phase is 5 x 10 ~ 2 and for the Newton part is 1 x 10 - 2 . For all test cases a subspace of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed by employing the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration the first-order finite difference Jacobian matrix with ILU-4 factorization is used. The Newton iteration is matrix-free and e — | * j r with £rj = 1 x 10 - 6 is used for directional differencing. The initial condition is the free stream flow. No attempt for optimizing the start-up process is made. The solution starts with 30 pre-iterations in the start-up process to reach a good initial solution before switching to Newton iterations. The CFL starts at 2.0 and is increased gradually to CFL=100 for the first 15 pre-iterations which are performed with first-order accurate flux evaluation. The remaining 15 pre-iterations are performed in the form of the defect correction with constant CFL of 100, where the first-order Jacobian is used both for constructing the left hand side and for preconditioning the linear system. The right hand side (Eq. (5.27)) is evaluated to the correct order of accuracy. The cost of each pre-iteration includes one first order Jacobian CHAPTER 7. RESULTS(II): SIMULATION CASES 128 1 b ' I I I I i I I I I I I l l—I—L 0 0.2 0.4 0.6 0.8 1 X/C Figure 7.6: C p over the upper surface after start-up, M e s h l (1245 C V s ) evaluation and its incomplete factorization, one flux evaluation, and one system solve using G M R E S , which is not matrix-free since the Jacobian matrix is available explicitly. Experiments for subsonic flow have shown that a reasonable starting point for Newton iteration can be easily achieved by a relatively small number of pre-iterations, and there is no need to decrease the residual by a significant factor. Only a rough physical solution over the airfoil is good enough for starting Newton iterations. It is also interesting to look at the solution after finishing the start-up and right before switching to Newton iteration. F i g (7.6) displays the pressure coefficient over the upper surface of the airfoil at the end of the start-up process for Mesh 1, which is the coarsest mesh. T h e approximate physical solution has been reached for all discretization orders and the suction peak is nearly resolved. Al though the suction peak is recovered better in the start-up process for higher-order discretizations, the difference is small and for the start-up where only an approximate solution is needed, the 2nd-order defect correction generates a reasonably good solution to start the Newton iterations for all discretization orders. Considering the difference in cost of the 2nd-order residual evaluation compared to the 3rd and 4th-order residual evaluations, it makes sense to use the 2nd-order defect correction scheme for start-up. A similar graph is shown for the finest mesh (Mesh 5) after start-up process in F i g (7.7). CHAPTER 7. RESULTS(II): SIMULATION CASES 129 Figure 7.7: C p over the upper surface after start-up, Mesh 5 (19957 C V s ) While the general trend of the pressure coefficient over the airfoil is recovered, the suction peak has not been properly resolved yet. T h a t is not surprising since due to the mesh volume, the diversity of mesh length scales, and the local time stepping approach, solution time evolution is far from completion and more pre-iterations are needed to resolve the suction peak. T h e noticeable point is the violent oscillations of the 4th-order discretization around the suction region in the course of the solution evolution. T h i s oscillation around the extrema is normal, and it is the case both for the 2nd and 3rd-order discretizations. B u t for the 4th-order case due to low dissipation in the fourth-order scheme, oscillations are quite vigorous. These oscillations could be a source of instability in the start-up process and indeed the 4th-order case is sensitive in the early period of solution process. Since the cost of the higher-order start-up is more than the 2nd-order one, and it could generate noisy solution state (before switching to Newton iteration), it is more robust to perform all the defect correction part with the 2nd-order residual evaluation. A similar approach is taken for all meshes of the current test case, where the defect correction iterations are performed with 2nd-order accuracy. After start-up, the solution process is switched to Newton iteration and an infinite C F L is employed. To be able to compare the computing cost for different meshes, a work unit CHAPTER 7. RESULTS(II): SIMULATION CASES 130 has been used, which is simply equivalent to the cost of one residual evaluation for the corresponding order of accuracy. It is obvious then that two different meshes have two different work units in terms of CPU time. Convergence is reached (solution process is stopped) when the L2 norm of the non-linear density residual falls below 1 x 10 - 1 2 . Table 7.2 shows the convergence summary for the 2nd, 3rd and 4th-order discretizations in terms of total number of residual evaluations, total CPU time, total work units, number of Newton iterations, and the cost of Newton phase in terms of work units. For all meshes, after 30 pre-iterations, the solution has converged after a few Newton iterations. The total number of work units increases as finer meshes are used. The linear system arising from a denser grid is more difficult to solve than a similar system arising from a coarser grid. Using a constant subspace size without restart becomes less and less effective in solving the linear system as the size and complexity of the system increases resulting in reducing the linearization effectiveness. Consequently more outer (Newton) iterations are needed to reduce the non-linear residual where the mesh size or discretization order increases. Also the solution state after the start-up for the fine mesh is not as close as it was to the steady-state solution for the case of coarse mesh and this is another contributing factor for slowing the Newton convergence on the fine mesh. Notice that the total work unit has increased roughly by a factor of 2.5 while the mesh size has increased by a factor of 16. For a similar subsonic case, the total computation cost in terms of work units for one of the fastest and most efficient available Newton-Krylov algorithm (2nd-order/artificial dissipa-tion) for unstructured meshes [45] is just under 300. This confirms that the performance of the current developed Newton Krylov is quite comparable with the world class standard algorithms. However as it will be noted later a combination of mesh sequencing and multi-grid techniques are needed to ensure such excellent performance as the size of the mesh increases. Figure (7.8) displays the the solution CPU time versus grid size in logarithmic scale. The computing cost in terms of CPU time follows a power law relation, t = m(Ncv)k , with the mesh size and all discretization orders show more or less the same trend (k2nd = 1.131, k^rd — 1.021 and k^h = 1-163), showing the scalability of the developed solver CPU time with the mesh size. This graph emphasizes the need for parallel computing and multi-grid method in the case of large meshes. Figure (7.9) and Fig (7.10) show the total work units and the Newton phase cost in terms of the work unit. The 2nd and the 3rd-order solution have nearly the same Newton phase cost in terms of the work unit. The difference between these two is mainly due to the start-up phase cost which for small meshes could be considerable part of the total solution cost. For the 4th-order case, the cost of the start CHAPTER 7. RESULTS(II): SIMULATION CASES 131 Test Case No. of Res. Time (Sec) Work Uni t No. of Newton Itr. Newton Phase Work Uni t Mesh 1 2nd 100 5.76 240.0 3 96.7 3rd 132 8.87 197.1 4 122.4 4th 244 27.09 258.0 7 226.8 Mesh 2 2nd 121 12.88 280.0 3 118.3 3rd 136 17.71 213.4 4 128.2 4th 283 58.15 312.6 8 274.8 Mesh 3 2nd 126 26.88 349.1 3 136.1 3rd 147 36.03 248.5 4 141.2 4th 247 90.54 289.3 7 239.2 Mesh 4 2nd 158 60.39 399.9 4 182.8 3rd 158 73.98 276.0 4 159.0 4th 318 217.60 371.4 9 317.8 Mesh 5 2nd 254 164.10 562.0 7 325.3 3rd 286 225.87 456.3 8 321.8 4th 542 682.0 639.8 16 577.1 Table 7.2: Convergence summary for N A C A 0012 airfoil, M = 0.63, a = 2° CHAPTER 7. RESULTS(II): SIMULATION CASES 132 Figure 7 .8: C P U time versus the grid size, N A C A 0012, M = 0.63, a = 2° up is a minor part of the total solution cost since the 4th-order gets cheaper scaled start-up than 3rd (than 2nd) due to normalization of the solution cost. It should be mentioned that the start-up technique is neither unique nor optimized for these series of test cases and its cost should be studied separately. If a reasonably good pre-computed initial solution is available (e.g. full or incompressible potential solution, exact solution of some approximate problem or an empirical solution) then the cost of the start-up phase can be ignored and in practice the total solution cost would be equal to the Newton phase cost. The start-up methodology in this research is reasonably efficient for small and moderate sized meshes; however its cost would grow if more pre-iterations are needed for large meshes. In general for large meshes, it is not efficient to perform the start-up procedure from scratch, and it would be much more efficient if the computation procedure is started from a coarse mesh. Therefore for very large meshes augmenting the pre-iterations start-up with the mesh se-quencing technique could be quite helpful. At the same time the rise in total solution cost due to increasing the mesh size both in the start up and Newton phases can be dramatically reduced if a multi-grid procedure is employed. The convergence history for the coarse grid (Mesh 1) and the fine grid (Mesh 5) are shown in Fig (7.11) and Fig (7.12). Although the start up process is the same for all orders of CHAPTER 7. RESULTS(II): SIMULATION CASES 133 Figure 7.9: Total work unit versus the grid size, N A C A 0012, M = 0.63, a = 2° 2nd-Order r ' 1 1 1 1 I I I I I I I I I I I — I — I — I — I — I — I — L 10000 20000 No. of C V s Figure 7.10: Newton phase work unit versus the grid size, N A C A 0012, M — 0.63, a = 2 CHAPTER 7. RESULTS(II): SIMULATION CASES 134 CPU time Figure 7.11: Convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° discretizations, the starting residual is different for two reasons. First the initial residual (ro) which initializes the implicit start-up is based on the correct corresponding order of residual evaluation. Second the mesh moments are pre-computed and employed based on the different orders of accuracy. However as start-up reaches to its end the solution state is almost the same for all discretization orders. After start-up full convergence is achieved within a few Newton iterations in 2nd and 3rd order cases. For the 4th-order case convergence is about two times slower (in terms of outer iterations). T h e superlinear convergence for the 2nd and 3rd-order cases is evident. F i g (7.13) examines the correlation between the non-linear residual and the residual of the linear system solution. T h i s graph shows the non-linear residual after applying the update produced by solving the linear system versus the residual of that linear system. T h e correlation between these two is nearly linear. F i g (7.14) shows the quality of the linear system solution for each Newton iteration. T h e linear residual reduction within the G M R E S linear solver is plotted versus the final residual of that system for all outer iterations. In other words this shows how much the linear system residual is reduced through inner iterations. For the 2nd-order case the linear system is solved quite effectively and more than 8 orders of residual reduction are achieved given the subspace size of 30 with no restart. For the 3rd-CHAPTER 7. RESULTS(II): SIMULATION CASES 135 100 200 300 400 500 600 CPU time Figure 7.12: Convergence history, Mesh 5, M = 0.63, a = 2 ° L 2 Linear System Residual Figure 7.13: Non-linear residual versus linear system residual, Mesh 3, M — 0.63, a — 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 1 3 6 •H 2nd-Ordei7Newton Itr./Mesh3 •A 3rd-Order/Newton Itr./Mesh3 •V 4th-Order/Newton Itr./Mesh3 i t r Tcr T<r io"15 W* L2-linear S y s t e m Res idua l F i g u r e 7 . 1 4 : L i n e a r s y s t e m r e s i d u a l d r o p p i n g o r d e r , M e s h 3 , M — 0 . 6 3 , a = 2 ° o r d e r a n d 4 t h - o r d e r d i s c r e t i z a t i o n t h i s r e s i d u a l r e d u c t i o n i s a b o u t 5 a n d 2 o r d e r s r e s p e c t i v e l y f o r t h e s a m e s u b s p a c e s i z e . T h i s r e d u c t i o n i n q u a l i t y i n s o l v i n g t h e l i n e a r s y s t e m e v e n t u a l l y i n c r e a s e s t h e n u m b e r o f N e w t o n i t e r a t i o n s f o r h i g h e r - o r d e r d i s c r e t i z a t i o n s . H o w e v e r , i t h a s b e e n s h o w n t h a t b y u s i n g m u l t i p l e r e s t a r t s a n d s o l v i n g t h e l i n e a r s y s t e m u p t o m a c h i n e a c c u r a c y f o r e a c h N e w t o n i t e r a t i o n s e m i q u a d r a t i c c o n v e r g e n c e i s a t t a i n a b l e e v e n f o r t h e 4 t h - o r d e r d i s c r e t i z a t i o n w i t h d r a m a t i c p e n a l t y i n c o m p u t a t i o n c o s t [ 5 3 ] . T h e p e r f e c t p r e c o n d i t i o n i n g w o u l d c l u s t e r a l l e i g e n v a l u e s a t o n e . F o r t h e p r o p o s e d p r e c o n -d i t i o n i n g s t r a t e g y , i t i s i m p o s s i b l e t o e x a c t l y c l u s t e r a l l e i g e n v a l u e s a t o n e f o r t w o r e a s o n s . F i r s t , t h e f i r s t - o r d e r l i n e a r i z a t i o n m a t r i x ( J a c o b i a n m a t r i x ) i s e m p l o y e d w h i c h i s a n a p p r o x -i m a t i o n f o r h i g h e r - o r d e r l i n e a r i z a t i o n a n d n o t e q u a l t o t h e o r i g i n a l l i n e a r i z a t i o n . S e c o n d l y , a n i n c o m p l e t e f a c t o r i z a t i o n i s u s e d a s t h e p r e c o n d i t i o n i n g t e c h n i q u e i n s t e a d o f f u l l f a c t o r i z a -t i o n . T h e r e f o r e , t h e b e s t t h a t c a n b e e x p e c t e d i s t h a t a l l e i g e n v a l u e s o f t h e p r e c o n d i t i o n e d o p e r a t o r w o u l d b e s c a t t e r e d a r o u n d u n i t y w h i l e s o m e o f t h e m c o u l d b e v e r y c l o s e t o o n e . A s a r e s u l t , t h e d i s t a n c e o f t h e e i g e n v a l u e s f r o m u n i t y c a n b e u s e d a s o n e o f t h e p r e c o n d i -t i o n i n g q u a l i t y i n d i c a t o r s . A l s o , i t i s d e s i r e d t o h a v e e i g e n v a l u e s l o c a t e d f a r f r o m t h e o r i g i n ( z e r o ) t o a v o i d i l l - c o n d i t i o n i n g a n d s i n g u l a r i t y i s s u e s [ 5 3 ] . T o e v a l u a t e a n d c o m p a r e t h e q u a l i t y o f t h e p r e c o n d i t i o n i n g f o r d i f f e r e n t d i s c r e t i z a t i o n o r d e r s t h e a p p r o x i m a t e e i g e n v a l u e CHAPTER 7 RRSTTT.TSm)- RTMTJTATTON CASES 137 • 2nd-Order/Mesh3 A 3rd-Order/Mesh3 £ V 4th-Ordei7Mesh3 A V . * V A D 13 • V A A Re A D v -0 .5 | - v A ^ • ^ • v A * A A A " * AFigure 7.15: Eigenvalue pattern for the preconditioned system, Mesh 3, M = 0.63, a = 2 spectrum of the preconditioned linear system (estimated via G M R E S algorithm) is plotted at the last iteration (i.e. converged solution) for the Mesh 3 test case, F i g (7.15). T h e eigenvalues associated with higher-order discretizations are scattered with larger distances from one compared to the 2nd-order eigenvalues. T h i s some how indicates a reduction in the quality of system solving with increasing discretization order, which of course was not hard to predict in the first place. A t the same time, higher the discretization order closer to the origin eigenvalues are located shifting the matrix toward singularity. T h i s is especially the case for the 4th-order discretization where one of the eigenvalues is very close to origin. T h e estimated condition number of the linear system for Newton iterations are shown in F i g (7.16) and F i g (7.17) for Mesh 3 and Mesh 5 cases. T h e condition number is shown as a function of drop in residual. T h e ResO or reference residual is the initial non-linear residual of the corresponding mesh computed based on the far field flow condition. Therefore the ratio of the non-linear residual at the end of each Newton iteration to ResO, reflects the relative convergence after each Newton iteration. T h e first iteration condition number shows the conditioning of the linear system formed based on the solution linearization at the end of the start-up phase. T h e rest of the reported condition numbers are associated to the linear system formed based on the solution update at the end of successive Newton iterations. While the 2nd-order discretization graph initially has larger condition number, CHAPTER 7. RESULTS(II): SIMULATION CASES 138 Figure 7.16: Condit ion No. of the preconditioned system, Mesh 3, M = 0.63, a = 2 Figure 7.17: Condit ion No. of the preconditioned system, Mesh 5, M = 0.63, a = 2 CHAPTER 7. RESULTS(II): SIMULATION CASES 139 0.323 -0.322 -0.321 -U 0 . 3 2 -0.319 -0.318 -. A A A A -0.317 L i i—I—i—i—L_I i i ' 1 1 i i I i i i I i i i I . 0 2 4 6 8 10 L o g (L2(ResO)/L2(Res)) 2nd-Order/Newton Itr./Meshl 3rd-Order/Newton itrVMeshl 4th-Order/Newton Itr./Meshl Figure 7.18: Lift coefficient convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° the condition number decreases gradually with convergence. In higher-order cases, initially, the condition number is smaller compare to the 2nd-order case, but it starts to grow with Newton iterations. In conclusion, the over all performance of the flow solver depends not only on the conditioning of the linear system but also it relies on how well the non-linear problem is represented by the linearization at each iteration. For higher-order discretizations the quality of the linearization is not as good as it is for the 2nd-order discretization and more outer or Newton iterations are needed for full convergence. It should be mentioned that the reported condition numbers are relatively small (since the size of the meshes are not very large). Generally speaking, the resultant preconditioned linear systems are well conditioned and the condition numbers are reported just for monitoring the solution process. Lift and drag coefficient convergence histories as a function of drop in residual are shown in F i g (7.18) to F i g (7.21) for coarse and fine meshes. In the case of coarse mesh, almost 6 orders drop in residual seems to be necessary for lift and drag convergence. T h e lift and drag convergence are nearly achieved after 4-5 orders drop in residual for higher-order discretizations. In the case of the fine mesh all the coefficients are completely converged after 4 orders drop in residual. A s a practical matter, this is an important point since the computation cost can be noticeably reduced by making CHAPTER 7. RESULTS(II): SIMULATION CASES 140 0.0028 0.0026 0.0024 0.0022 0.002 0.0018 0.0016 F-Q 0.0014 u 0.0012 h 0.001 0.0008 h 2nd-Order/Newton Itr./Meshl 3rd-Order/Newton itr./Meshl 4th-Order/Newton Itr./Meshl I i i ' I i i i I i i i I i i i I i i 0 2 4 6 Log (L2(Res0)/L2(Res)) 10 Figure 7.19: Drag coefficient convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2° convergence criteria less strict, and terminating the solution process after fewer Newton iterations for the fine mesh where the computing cost increases considerably. Following this procedure, the higher-order (especially the 4th-order) discretization would be greatly benefited, as the unnecessary computing cost to reduce the residual after a certain value is avoided. Lift and drag coefficients for all meshes and discretization orders are tabulated in Table 7.3. In the finest mesh case, lift coefficients up to third decimal place and the drag coefficients up to the 4th decimal place have converged to the same value. However the drag coefficient still is far from zero. Thi s is mainly due to lack of pressure recovery at the trailing edge which is a singular point in potential flow. T h e 3rd-order drag coefficient is consistently larger than the 2nd-order drag coefficient, which is surprising. However the 4th-order drag is consistently the smallest, as was expected. Without drawing a general conclusion about the 3rd-order drag, it is fair to say that behavior of the discretization order at the trailing edge of the airfoil is a determining factor, and apparently for such a region discretization methods with even orders of accuracy performed better than a discretization method with an odd-order of accuracy possibly due to cancellation effect. A l l of the test cases so far are performed with a far field size of 25 chords. T o study the CHAPTER 7. RESULTS(II): SIMULATION CASES 141 -1 0 1 2 3 4 5 6 Log (L2(Res0)/L2(Res)) Figure 7.20: Lift coefficient convergence history N A C A 0012, Mesh 5, M = 0.63, a = 2 ° Figure 7.21: Drag coefficient convergence history, N A C A 0012, Mesh 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 142 Test Case Lift coefficient Drag coefficient Meshl 2nd 0.322318 0.00097664 3rd 0.317393 0.00184889 4th 0.322223 0.00072576 Mesh2 2nd 0.322302 0.00057225 3rd 0.321846 0.00103438 4th 0.322448 0.00040369 Mesh3 2nd 0.324905 0.00040197 3rd 0.325214 0.00049820 4th 0.325588 0.00034757 Mesh4 2nd 0.325159 0.000336973 3rd 0.324740 0.00037595 4th 0.325323 0.00032525 Mesh5 2nd 0.324801 0.00032136 3rd 0.324568 0.00032711 4th 0.324474 0.00030828 Table 7.3: Lift and drag coefficients for all meshes and discretization orders, N A C A 0012, M = 0.63, a = 2°, far field size of 25 chords CHAPTER 7. RESULTS(II): SIMULATION CASES 143 Test Case/Mesh Size /Far field distance Lift coefficient Drag coefficient Mesh 4-1/10604 CVs/128 Chords 2nd 0.332506 0.00010622 3rd 0.332882 0.00012667 4th 0.332699 7.09651e-5 Mesh 5-1/22421 CVs/200 Chords 2nd 0.333575 4.88588e-5 3rd 0.333530 5.98949e-5 4th 0.334306 4.04234e-5 Mesh 5-2/24514 CVs/1600 Chords 2nd 0.332908 1.82357e-5 3rd 0.333715 2.35558e-5 4th 0.333374 5.66076e-6 Table 7.4: Effect of the far field distance on lift and drag coefficients, NACA 0012, M = 0.63, a = 2° effect of the far field size on lift and drag, and to show that computing smaller drag is possible with the same mesh resolution, the far field size is increased considerably while the mesh resolution, the number of points on the airfoil and refining factors remained the same. In other words, the new meshes are equivalent to the meshes of the previous test cases up to 25 chords. The results are tabulated in Table 7.4. In all cases the drag coefficient has been reduced dramatically by increasing the far field distance. For instance, the 4th-order computed drag over Mesh 5-2 has been reduced by more than 50 times comapred to the 4th-order computed drag over Mesh 5. The lift coefficient is also affected by the far field distance (i.e. in the most cases the lift coefficient was slightly increased by extending the far field size). It should be mentioned that in this research no far field correction is used, and higher-order far field correction was beyond of the scope of the thesis research. Reference [21] reports CL of 0.3289 and CD of 0.0004 for a 2nd-order computation of a similar flow case over an adaptively refined Cartesian mesh with the total of 10694 CVs (494 body nodes) and 128 chords far field distance, which in terms of the mesh and far field size is closely comparable to the Mesh 4-1 test case. Mach contours for the Mesh 3 (4958 CVs) are displayed in Fig (7.23) to Fig (7.25), where the corresponding mesh is shown in Fig (7.22). Flow field features such as the stagnation region just below the leading edge, flow acceleration near the leading edge (where the maximum Mach number occurs), and flow deceleration near the trailing edge are visualized. Once CHAPTER 7. RESULTS(II): SIMULATION CASES 144 Figure 7.22: NACA 0012, Mesh 3 (4958 CVs) again where the mesh is coarse, higher-order contours are visibly smoother than the 2nd-order contours showing the improved quality of the flow computation over these regions. The Mach profile (computed at Gauss points) along upper side of the airfoil for Mesh 1 (coarsest mesh, Fig (7.28)) are shown in Fig (7.26) and Fig (7.27). Since there is a sudden change in the area of the control volumes over the surface of the airfoil in acceleration region, which is often the case for coarse meshes, a jump is observed in Mach profile at those locations. However, higher-order solution has reduced the amplitude of this jump so that in the 4th-order case, the jump has disappeared. Also, the Mach profile over the lower surface of the airfoil is shown in Fig (7.29), where the 2nd-order Mach profile is slightly lower than its higher-order counterparts. Fig (7.30) to Fig (7.32) display the Mach profile of upper/lower sides of all discretization orders over the finest mesh, Mesh 5. As expected, all orders of accuracy result in essentially the same Mach distribution. The close up over the acceleration region demonstrates that in locations where the flow experiences large changes, linear reconstruction or 2nd-order discretization does not result in as smooth a solution as high-order discretizations. The error distribution in total pressure, 1 — P t / P t o o , along the chord for Mesh 1 and Mesh 5 CHAPTER 7. RESULTS(II): SIMULATION CASES Figure 7.23: 2nd-order M a c h contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, X Figure 7.24: 3rd-order M a c h contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, CHAPTER 7. RESULTS(II): SIMULATION CASES 146 Figure 7.25: 4th-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, a = 2° Figure 7.26: Mach profile, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, a = 2° CHAPTER 7. RESULTS(II): SIMULATION CASES 147 X / C Figure 7.27: M a c h profile close up, upper side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° Figure 7.28: Mesh 1, close-up at the leading edge region CHAPTER 7. RESULTS(II): SIMULATION CASES 148 Figure 7.29: M a c h profile, lower side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° Figure 7.30: M a c h profile, upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 149 0.99r-0.98 h 0.97 h 2nd-Order/Mesh5 3rd-Order/Mesh5 4th-Order/Mesh5 JS 0.96h 0.95 h 0.94 h 0.04 0.06 0.08 X / C Figure 7.31: M a c h profile close up, upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 2nd-Order/Mesh5 3rd-order/Mesh5 4th-Order/Mesh5 0.4 0.6 X/C Figure 7.32: M a c h profile, lower side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 150 0.04 0.03 0.02 2nd-Order/Meshl 3rd-Order/Meshl 4th-Order/Meshl Figure 7.33: 1 - upper side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° 0.015 0.01 J 0.005 -0.005 h -0.01 h 2nd-Order/Meshl 3rd-Order/Meshl 4th-Order/Meshl Figure 7.34: 1 - lower side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 151 0.004 0.002 -0.002 -0.004 h 2nd-Order/Mesh5 3rd-Order/Mesh5 4th-Order/Mesh5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.2 0.4 0.6 x/c 0.8 Figure 7.35: 1 - upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 0.004 p-0.002 h 2nd-Order/Mesh5 3rd-order/Mesh5 4lh-Order/Mesh5 -0.002 r--0.004 r-Figure 7.36: 1 - lower side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 'too CHAPTER 7. RESULTS(II): SIMULATION CASES 152 are shown in Fig (7.33) to Fig (7.36). In all cases there are sudden spikes at the trailing edge, and although the error amplitude over the fine mesh is noticeably decreased, the trailing edge remains one of the main source of total pressure error. Another source of error is the leading edge area which covers both the stagnation and acceleration regions. The 4th-order solution for all cases has produced smaller entropy all over the airfoil compared to the 2nd and 3rd-order discretizations, as its stagnation pressure error is nearly zero, (except the above mentioned spikes at the trailing edge) including the leading edge region. CHAPTER 7. RESULTS(II): SIMULATION CASES 153 7.2 Transonic Air fo i l , N A C A 0012, M = 0.8, a = 1.25° For a transonic flow, in general, it is more difficult to get fast convergence. This is because of the mixed subsonic/supersonic nature of the flow and the existence of discontinuities (shocks) in the solution. The methodology for handling of discontinuity can increase the complexity of the problem. This is true especially for implicit schemes, where there could be a large change in the solution update in each iteration and limiter values could have large oscillations. In the case of the matrix-free approach in which matrix-vector multiplication is computed through flux perturbation, any oscillatory behavior in limiter could severely degrade the solution convergence. All these facts amount to increasing the complexity and difficulty in solving transonic flows. The transonic flow around NACA 0012 at M = 0.8, a = 1.25° is studied. The Mesh 3 with 4958 CVs, Fig (7.3), of section 7.1 is employed for this test case. Flow is solved for all orders of accuracy using Venkatakrishnan limiter with proper higher-order modification, section 3.4.2. For the 2nd and 3rd-order cases K = 10 is used in the limiter, and for the 4th-order discretization K = 1 is employed. The limiter values are allowed to change through all iterations and no freezing is considered. The tolerance of solving the linear system, like previous test cases, for the start-up phase is 5 x 10 - 2 and for the Newton phase is 1 x 10 - 2 . For all test cases a subspace of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed using the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration finite difference Jacobian matrix with ILU-4 factorization is applied. The Newton iteration is matrix-free and e = with eo = 1 x 10 - 7 is used for directional differencing. The initial condition is free stream flow. Convergence is reached when the L2 norm of the non-linear density residual falls below 1 x 10 - 1 2 . For transonic flow before switching to Newton iteration, the shock locations in the flow field and their strengths need to be captured relatively accurately; otherwise Newton iterations will not decrease the residual of the non-linear problem effectively. This normally is achieved by reducing the non-linear residual by some orders, typically 1.5-2, respect to the initial residual in the course of the start-up process. Multiple implicit pre-iterations are performed in the form of defect correction, before switch-ing to Newton iteration. For the 2nd and 3rd-order start-up phases, pre-iterations in the form of defect correction continue until the residual of the non-linear problem drops 1.5 order below the residual of the initial condition. In defect correction phase the starting CFL number is 2 and it is increased gradually to 200 after 50 iterations. The CFL is not CHAPTER 7. RESULTS(II): SIMULATION CASES 154 0 0.2 0.4 0.6 0.8 1 XJC Figure 7.37: M a c h profile at the end of start-up process, M = 0.8, a = 1 . 2 5 ° increased above the value of 200 as increasing C F L would not help that much when the linearization is inaccurate, as in the start-up phase. 69 and 81 pre-iterations were required in the start-up phase to reduce the residual by 1.5 order for the 2nd and 3rd-order dis-cretizations respectively. In the Newton phase, the 2nd and 3rd-order cases are followed with infinite time step. T h e start-up phase for the 4th-order discretization, includes 200 pre-iterations with similar C F L trend. Although the target residual reduction of 1.5 or-ders was not achieved through pre-iterations, the solution after 200 iterations was good enough for starting the Newton phase. F i g (7.37) displays the M a c h profile at the end of the start-up phase for all orders of accuracy. There are some oscillations especially along the upper surface M a c h profile close to the shock location however, the shock locations and their strengths are captured reasonably accurately before switching to Newton iteration. For the 4th-order transonic case, using an infinite time step does not help the convergence acceleration and causes inaccurate linearization and limiter oscillation. Since this leads to slow convergence, CFL=10 ,000 has been set for Newton phase. T h e convergence summary is shown in the Table 7.5. Similar to the subsonic case, the number of Newton iterations for the 4th-order discretizations is twice as large as for the 2nd and 3rd-order discretizations. T h e major cost of the solution in all cases is the start-up CHAPTER 7. RESULTS(II): SIMULATION CASES 155 Test Case No. of Res. Time (Sec) Work Unit No. of Newton Itr. Newton Phase Work Unit 2nd 197 65.6 279 4 91 3rd 241 106.7 281 5 119 4th 450 311.4 590 10 221 Table 7.5: Convergence summary for N A C A 0012 airfoil, M = 0.8, a = 1 . 2 5 ° • 2nd-Order/Newton Itr. H A 3rd-Order/Newton Itr. 4th-Order/Newton Itr. CPU time (Sec) Figure 7.38: Convergence history for N A C A 0012, M = 0.8, a = 1 . 2 5 ° cost. Therefore there is a reasonable potential to reduce the total solution work dramatically by employing an optimized start-up technique which is able to capture shocks and establish the major flow features efficiently. Figure (7.38) demonstrates the convergence history graph. T h i s time, reduction in convergence slope for the 4th-order case is not only due to poor convergence of the solution of the linear system in each Newton iteration, but also to the limiter firing which changes the linearization in each iteration. Aga in for a similar transonic case, the total computation cost in terms of work units for a very fast Newton-Krylov algorithm (2nd-order/artificial dissipation) for unstructured meshes [45] is just under 400 showing that the the performance of the developed Newton K r y l o v in this research is very competitive. Table 7.6 summarizes the lift and drag coefficients for all discretization orders which are CHAPTER 7. RESULTS(II): SIMULATION CASES Test case C L C D 2nd 0.337593 0.0220572 3rd 0.339392 0.0222634 4th 0.345111 0.0224720 A G A R D - 2 1 1 [2]/Structured (192x39) 0.3474 0.0221 Table 7.6: Lift and drag coefficients, N A C A 0012, M = 0.8, a = 1 . 2 5 ° -0.2 0 0.2 0.4 0.6 0.8 1 1.2 X Figure 7.39: 2nd-order M a c h contours, N A C A 0012, M = 0.8, a = 1 . 2 5 ° Figure 7.41: 4th-order Mach contours, NACA 0012, M = 0.8, a = 1.25° CHAPTER 7. RESULTS(II): SIMULATION CASES 158 Figure 7.42: limiter 0 (3rd-order), N A C A 0012, M = 0.8, a = 1 . 2 5 ° in good agreement with the reference data [2]. Both C L and C D in transonic flow are essentially influenced by the shock location and its strength, and to have a good prediction for theses coefficients, it is crucial to capture the shocks accurately. Figure (7.39) to Fig(7.41) display banded M a c h contours for all orders of discretizations. T h e weak shock at lower surface and the strong shock at upper surface of the airfoil are quite visible. T h e presence of the discontinuity and firing of the limiter in shock regions results in non-smooth contour lines in the shock vicinity (mostly for coarser control volumes). T h i s non-smoothness widens with the discretization order due to increase in the reconstruction stencil size. In addition to hampering convergence, using the limiter has the disadvantage of reducing the accuracy of the solution due to altering the gradients and higher-order terms. T h e behavior of the limiter values </> and a is shown in F i g (7.42) and F i g (7.43). Some isolated limiter firing is observed in the leading and trailing edge regions, but those are limited to a small number of control volumes, and generally the value is close to one. T h e limiter is mainly active in the strong upper shock region, while it remains inactive at the weak lower shock. Note that wherever the gradient limiter, is fired aggressively the higher-order terms limiter, a, is nearly zero. CHAPTER 7. RESULTS(II): SIMULATION CASES 159 Figure 7.44: Mach profile, N A C A 0012, M = 0.8, a = 1.25° CHAPTER 7. RESULTS(II): SIMULATION CASES 160 1.4 2nd 3rd 4th AGARD 1.2 0.2 0.4 0.6 0.8 XJC Figure 7.45: M a c h profile in shock regions, N A C A 0012, M = 0.8, a = 1 . 2 5 ° T h e M a c h profile along the surface is shown in F i g (7.44) and F i g (7.45). Both the location and strength of the shocks are in good agreement with the A G A R D data [2]. T h e 2nd-order discretization has some overshoot right before the upper shock, which is also visible in the related M a c h contours. It appears that the 3rd-order discretization produces less noise in this shock capturing case, and this is probably because of the quadratic reconstruction characteristic. However, the cubic reconstruction associated to the 4th-order discretization shows some oscillations inside the shock region; this behavior is typical in approximating a (sharp) discontinuity by a higher-order polynomial. CHAPTER 7. RESULTS(II): SIMULATION CASES 161 7.3 Supersonic flow, Diamond airfoil , M = 2.0, a = 0.0 Supersonic flow field axound the diamond airfoil has been computed using all discretization orders at far field Mach number of 2.0 and zero angle of attack. The diamond airfoil has a thickness of 15% and is located in a 7x14 rectangular domain. The total number of control volumes in the mesh is 7771 with 80 control volumes along the chord. Appropriate mesh refinement at the leading and trailing edges (shock locations) and also the apexes of the airfoil (expansion locations) has been carried out to enhance the quality of the solution capturing in these high gradient regions, Fig (7.46). The interesting parts of the flow field are the discontinuities (shocks) and very sharp gradi-ents (expansion fans). Because these features would result in extensive limiter firing which would adversely affect the global accuracy, no limiter has been employed in the test case. Therefore, the resultant solution will not be monotone and there will be considerable oscil-lations around shocks and expansion fans. The tolerance in solving the linear system for the start-up phase is 5 x 10 - 2 and for the Newton phase is 1 x 10~2. For all test cases a subspace of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed using the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration finite difference Jacobian matrix with ILU-4 fac-torization is applied. The Newton iteration is matrix-free and e = with eo — 1 x 10 - 7 is used for directional differencing. The initial condition is free stream flow. Convergence is reached when the L2 norm of the non-linear density residual falls below 1 x 10 - 1 2 . In the current test case major physical flow characteristics, including both shocks and expansion fans need to be captured approximately before switching to Newton iterations. Otherwise due to highly non-linear and discontinuous regions in the supersonic flow field convergence stalls shortly after the starting point. Therefore 30 pre-iterations in the form of defect correction have been performed before switching to the Newton phase. The starting CFL is 5 and it is increased to 50 after 10 iterations. For the start-up of the 4th-order discretization case, the 2nd-order defect correction is used to reduce the start-up cost. That improves the robustness of the start-up too since the 4th-order defect correction start-up for such a flow (without limiter) could be unstable. The pressure coefficient along the chord at the end of the start-up phase is shown in Fig (7.47) demonstrating the effectiveness of the defect correction start-up for supersonic flow and the fact that the shocks and expansion fans are fairly accurately established at the end of the start-up process. After starting the Newton phase full convergence is achieved rapidly for all orders of dis-cretizations. Figure (7.48) shows the convergence history for the diamond airfoil displaying CHAPTER 7. RESULTS(II): SIMULATION CASES (a) Diamond airfoil (b) Nose section (c) M i d section Figure 7.46: Mesh 7771 CVs , diamond airfoil, M = 2.0, a = 0.0 CHAPTER 7. RESULTS(II): SIMULATION CASES 163 2nd-0rder Start up (used for 2nd/4th-Order) 3rd-0rder Start up • Exact Solution y±+—•—•—• 0.2 0.4 0.6 0.8 XJC Figure 7.47: C p at the end of start-up process, diamond airfoil, M = 2.0, a = 0.0 the fast convergence. Notice that the residual still is large right before the first Newton iteration, but by that point the main structure of the flow field has been formed and the residual has been dropped quickly after Newton iterations. In fact the high M a c h num-ber helps to damp all errors including low frequency errors quite effectively as soon as the shocks and expansion fans are established over the airfoil since first, the wave propagation speed is quite fast and second, all waves travel toward outlet without reflecting from bound-aries. Table (7.7) provides the convergence summary for all discretization orders. For the 2nd-order case the major part of the total work is spent in the start-up phase while for the 4h-order case as expected, from previous test cases, most of the time is spent on the Newton iterations. Thi s is especially the case here since the 2nd-order start-up is employed and the 4th-order start-up is avoided (like the subsonic airfoil test case) reducing the relative cost of the start-up phase for the 4th-order solution. For the 3rd-order case, the costs of the start-up and Newton phases are split evenly. T h e M a c h contours for all discretization orders are shown in F i g (7.49) to F i g (7.51). A l l the steady-state flow features, including symmetric shocks right at the sharp leading edge, expansion fans at the lower and upper apexes, and symmetric fish tail shocks at the sharp trailing edge of the diamond airfoil are clearly captured. CHAPTER 7. RESULTS(II): SIMULATION CASES 164 50 100 CPU time (Sec) Figure 7.48: Convergence history for diamond airfoil, M = 2.0, a = 0 Test Case No. of Res. Time (Sec) Work Unit No. of Newton Itr. Newton Phase Work Uni t 2nd 61 27.85 236.0 2 50.0 3rd 126 48.62 243.1 3 120.8 4th 254 133.57 301.5 7 250.0 Table 7.7: Convergence summary for diamond airfoil, M = 2.0, a = 0 Test case C D 2nd 0.0524258 3rd 0.0524661 4th 0.0524671 Exact 0.0524663 Table 7.8: Drag coefficient, diamond airfoil, M = 2.0, a = 0 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 165 CHAPTER 7. RESULTS(II): SIMULATION CASES CHAPTER 7. RESULTS(II): SIMULATION CASES 167 3rd-0rder Figure 7.53: 3rd-order Cp, diamond airfoil, M — 2.0, a = 0.0 0.3 h 4th-Order Figure 7.54: 4th-order Cp, diamond airfoil, M = 2.0, a = 0.0 CHAPTER 7. RESULTS(II): SIMULATION CASES 168 Figure (7.52) to Fig (7.54) display the pressure coefficient along the chord of the diamond airfoil for all orders of discretization. In all cases there are over and/or undershoots at the leading and trailing edges of the airfoil (where the shocks are located) and at the mid-dle section of the diamond (where the expansion fans are formed). Naturally, for higher discretization order, the over and undershoots are worse. The 4th-order Cp shows some oscillatory behavior in the front region, located between two large gradient points of the diamond airfoil. This is due to the continuing effect of the over and undershoots which are not fully damped at that region due to very large reconstruction stencil and the character-istic of the cubic polynomial. It is worth mentioning that the higher-order reconstruction procedure used in this research has some compactness issues for such a supersonic flow since the reconstruction routine uses data from regions in the flow field which are physically independent. That is, downstream data is included in the reconstruction. Despite all this, there is an overall good agreement between the exact solution and the numerical solution over the supersonic diamond airfoil. The computed drag coefficient for all orders of discretization is tabulated in Table 7.8. Again there is a very good match between the numerical and exact solution here. For the inviscid supersonic diamond airfoil case, due to pre-determined shock and expansion fan locations, the drag coefficient is solely a function of the shocks strength and acceleration of the flow at the expansion apexes. This is why the computed drag coefficient is still fairly accurate despite all these over and undershoots, as their mean values remain very close to zero. C h a p t e r 8 Concluding Remarks 8.1 Summary and Contributions A fast implicit finite volume flow solver for higher-order unstructured (cell-centered) steady-state computation of inviscid compressible flows (the Euler equations) has been developed. The matrix-free G M R E S algorithm is used as a solver for the linear system arising from the discretized equations, avoiding explicit computation of the higher-order Jacobian matrix. The solution process has been divided into two separate phases, i.e. a start-up phase and a Newton phase. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. The approximate first-order analytical Jacobian is used both for constructing and preconditioning the linear system on the left hand side, reducing the complexity and cost of the pre-iterations. A low fill-level incomplete factorization, I L U ( l ) , is used for the right preconditioning of the low order linear system associated with the defect correction iterations. The G M R E S algorithm in its standard form is used for solving the resultant linear system. Knowing that the linearization is not very accurate at this stage, the C F L number is kept low (on the order of 102) and a moderate tolerance was set for solving the linear system. The defect correction procedure was very effective in reaching an approximate solution which includes most of the physical characteristics of the steady-state flow. Having an approximate solution at the end of the start-up phase where the linearization of the flow field is accurate enough for steady-state solution, the solution process is switched to the Newton phase taking an infinite time step. In this phase, a first-order finite difference 169 CHAPTER 8. CONCLUDING REMARKS 170 Jacobian is used for preconditioning of the higher-order linear system in the matrix-free context. A high fill-level incomplete factorization, ILU(4), is applied for the right precondi-tioning of the N e w t o n - G M R E S iterations to take advantage of the best possible precondi-tioning of the employed approximate Jacobian 1 . Semi-quadratic or superlinear convergence has been achieved within a few Newton iterations for most of the cases. A modified version of Venkatakrishnan limiter [87] is'used to enforce loose monotonicity for transonic flows. T h e higher-order terms in the reconstruction polynomial were dropped wherever the limiter fired firmly in the flow field. This has been performed through a dif-ferentiable switch, cr, which is a function of the limiter value <f> itself. Also, a large constant value K in Venkatakrishnan limiter (typically K=10) is chosen for the limiting which some-what sacrifices the monotonicity in the favor of rapid convergence. The over all proposed limiting procedure insures fast convergence while reducing the accuracy disadvantages of limiter application. The effect of choosing e in the accuracy of the computed matrix-vector products using directional derivative technique for the matrix-free G M R E S was analyzed throughly. It was shown that for a fine mesh it is necessary to choose a relatively large e to properly account for the higher-order terms. The issue of mesh refinement in accuracy measurement for unstructured meshes is revis-ited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and solution error calculation. The effect of the discretization order, mesh size and far field distance were studied com-pletely and a careful measurement of the solver performance was provided. The accuracy, fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes were shown via several test cases. Solutions of different orders of accuracy are compared in detail through several investiga-tions. The possibility of reducing computational cost required for a given level of accuracy using high-order discretization is demonstrated. 'For such a Newton-GMRES solver, it is shown that ILU(4) has an equal performance with full factoriza-tion, L U , in terms of outer iterations where the ILU(4) factorization cost both in terms of time and memory usage remains fairly small compared to LU.factorization [52]. CHAPTER 8. CONCLUDING REMARKS 171 8.2 Conclusions While, in general, computing cost remains one of the main concerns for the higher-order com-putation of fluid flow problems, the current research proves that reaching fast convergence— and a reasonable computing cost—for higher-order unstructured solution is indeed possible. Results for the implicit flow solution algorithm described in this thesis show that the 2nd and 3rd-order solutions both display semi-quadratic or superlinear convergence for all test cases if started from a good initial guess or approximate solution. The 4th-order solution still converges quickly although it is slower than the other discretization orders requiring nearly two times the number of. outer iterations as the 2nd-order solution started from a similar approximate flow solution. The employed start-up procedure is very effective in providing a good approximate solution as an initial condition for Newton phase with a reasonable cost. For subsonic or supersonic cases, a limited number of pre-iterations is sufficient to reach a good initial solution. For the transonic case, however, the number of pre-iterations increases since some residual reduction target must be met before switching to Newton iterations. The proposed start-up technique takes a considerable part of the solution time for the 2nd and 3rd-order solutions. As a result, reduction of the start-up cost would significantly improve the over all performance of the flow solver. Finding a reasonably good initial solution quickly is something of an art rather than an exact science. If a pre-computed solution is available then the cost of the start-up phase can be ignored and in practice the total solution cost would be equal to the Newton phase^ost. The first-order solution, full potential solution, exact solution of the approximated problem or an empirical solution can be picked as an initial solution for the Newton phase depending on the availability and the problem case. ' Preconditioning and the accuracy of the linear system solution is a vital factor in the Newton phase of the N e w t o n - G M R E S solver. Evidence suggests that the number of outer Newton iterations in the 4th-order case can be reduced to the level of the 2nd and 3rd-order methods provided that the linear system resulting from the 4th-order discretization is solved completely [53]. Given the current preconditioning strategy, this requires a considerable increase (by a factor of 4-5) in the 4th-order residual evaluations in the matrix-free G M R E S algorithm. Therefore it is preferable to keep the subspace size limited to a moderate number (i.e. 30) and bear the cost of increasing the outer iteration numbers by a factor of two. The eigenvalue spectrum of the preconditioned system shows that the higher-order system is shifted toward singularity, increasing the condition number of the linear system. Therefore CHAPTER 8. CONCLUDING REMARKS 172 more effort is needed to solve the linear system resulting from the higher-order discretization, especially in the vicinity of the steady-state solution. For transonic flow, like smooth flows and despite using a limiter and the existence of normal shocks, superlinear convergence for the 2nd and 3rd-order discretizations and fast conver-gence for the 4th-order discretization have been obtained in the Newton phase. The applied limiter, which was active mainly in the strong shock, did not have a strong effect on deteri-oration of the convergence rate, and the employed continuous switch performed as designed to remove higher-order terms near shocks. Using an efficient start-up technique, a good preconditioner matrix, and an effective pre-conditioning strategy are the key issues for the robustness and fast convergence of the N e w t o n - G M R E S solver. T h e performance of the current flow solver in terms of C P U time scales linearly with the mesh size for all orders of discretization. As an overall performance assessment (including the start-up phase), the 3rd-order solution is about 1.3 to 1.5 times, and the 4th-order solution is about 3.5-5 times, more expensive than the 2nd-order solution with the developed solver technology. It should be mentioned that the current 2nd-order solver algorithm, even without multigrid augmentation, is very competitive with the most efficient reported results for unstructured meshes in terms of the normalized total solu-tion time (work units). The 3rd and 4th-order schemes are comparable in efficiency to the 2nd-order scheme as measured by residual evaluations. Mesh refinement is the critical issue in accuracy verification, and reducing the length scale uniformly for irregular unstructured grids is not an easy task. Consequently in the accu-racy assessment'process the generated mesh levels must be examined closely especially at boundaries. Boundaries in general and the solid wall boundary in particular are the main source of the solution error. Qualitative/quantitative comparisons of the fluid flow solutions for different discretization orders over several meshes show that the quality of the solution for smooth flows is dramat-ically improved using high-order discretization technique, especially in the case of coarse grids. This suggests that depending on the error measurement criteria, significant effi-ciency/accuracy gains for high-order discretization are attainable. Since the developed implicit algorithm is matrix-free, memory usage will not be an issue. In the matrix-free case, memory usage is affected by the type of the preconditioning technique which in this thesis for I L U (4) is a bit more than two times of the required memory for storing the first-order Jacobian, Table 5.1. CHAPTER 8. CONCLUDING REMARKS 1 7 3 8.3 Recommended Future Work The proposed solver algorithm works quite efficiently and accurately for a variety of inviscid compressible flows. However, there are a number of areas where this research could be either improved or extended. 8.3.1 Start-up In the case of very large meshes, it is not efficient to perform the start-up procedure from scratch; it is more efficient if the computation procedure is started from a coarser level mesh. Therefore for very large meshes augmenting the pre-iterations with a mesh sequencing technique could be quite helpful. At the same time total solution cost due to increasing mesh size in the start up can be dramatically reduced if a multi-grid procedure is employed. 8.3.2 Preconditioning Applying a more accurate preconditioning matrix (that is, a better approximation to the higher-order Jacobian) instead of the first-order Jacobian would increase the system solving quality reducing the outer iteration numbers especially in the case of the 4th-order method. Having said that the preconditioning cost should be kept reasonably low since the factor-ization cost of the preconditioner matrix should be much smaller than solving the original linear system. Adding a multi-grid scheme to the preconditioning strategy could also be helpful in the case of large meshes. 8.3.3 Reconstruction In general, limiters adversely affect the solution accuracy via manipulation of the recon-struction polynomial. This issue becomes more important for higher-order reconstruction if the limiter fires unnecessarily and extensively in the flow field. As a result, W E N O schemes, in which limiters are avoided through a careful weighting non-oscillatory recon-struction procedure, may be an appropriate solution for higher-order reconstruction of non-smooth flows. Furthermore, the benefit of applying higher-order reconstruction for high Mach number/non-smooth flows, considering the natural oscillatory behavior of the recon-struction polynomial in such flows, still is an open area of research. CHAPTER 8. CONCLUDING REMARKS 174 8.3.4 Extension to 3 D Extension of the current higher-order 2D solver algorithm to a 3D version, considering the availability of the 3D reconstruction procedure, is reasonably straight forward if an appro-priate higher-order representation of a 3D mesh is available. Since the 2D algorithm is developed for unstructured meshes, there is no limitation in such an extension, and the performance of the flow solver in terms of the convergence should be improved due to intro-ducing the third dimension (3D relieving effect). However, the matrix storage requirement in 3D needs to be carefully studied since storing even the first-order Jacobian (due to the large number of meshes in 3D) for preconditioning could be challenging. The incomplete factorization fill-level in 3D is another issue, and it may not be possible to use high fill-level ILU(P) for 3D cases. 8.3.5 Extension to Viscous Flows The introduced implicit procedure could be implemented for viscous flow computation provided that the viscous residual function evaluation and a proper anisotropic unstruc-tured/hybrid mesh are available. However there would be some issues regarding the condi-tioning of the linear system due to wide variety of geometrical and physical length scales, which need to be addressed by proper preconditioning. In the case of turbulence modeling the degree of coupling of the turbulence model in the implicit procedure would be another issue. Bibliography [1] P E T S c , The Portable Extensible Toolkit for Scientific Computation. Argonne National Lab, http://www-unix.mcs.anl.gov/petsc/petsc-as/index.html. [2] Test Cases for Inviscid Flow Fields. Technical Report AR-211, Advisory Group for Aerospace Research and Development ( A G A R D ) , N A T O . , 1985. [3] R. Abgrall . O n Essentially Non-oscillatory Schemes on Unstructured Meshes: Analysis and Implementation. Journal of Computational Physics, 114:45-58, 1994. [4] M . Aftosmis, D . Gaitonde, and S. T . Tavares. Behavior of Linear Reconstruction Techniques on Unstructured Meshes. AIAA Journal, 33(ll):2038-2049, 1995. [5] M . J . Aftosmis and M . J . Berger. Multilevel Error Estimation and Adaptive h-Refinement for Cartesian Meshes With Embedded Boundaries. A I A A Paper 2002-0863, 40th A I A A Aerospace Sciences Meeting and Exhibit, 2002. [6] R. K . Agarwal. A Compact High-Order Unstructured Grids Method for the Solution of Euler Equations. International Journal of Numerical Methods in Fluids, 31:121-147, 1999. [7] F . R . Bailey. High-End Computing Challenges in Aerospace Design and Engineering. In Third International Conference on Computational Fluid Dynamics (Invited Lecture), 2004. [8] H . E . Bailey and R. M . Beam. Newton's Method Applied to Finite Difference Approx-imations for Steady State Compressible Navier-Stokes Equations. Journal of Compu-tational Physics, 93:108-127, 1991. [9] T . J . Barth. Analysis of Implicit Local Linearization Techniques for Upwind and T V D Algorithms. A I A A Paper 87-0595, 25th Aerospace Sciences Meeting and Exhibit, 1987. 1 7 5 BIBLIOGRAPHY 176 [10] T . J . Barth. Aspects of Unstructured Grids and Finite-Volume Solvers for the Euler and Navier-Stokes Equations, in Unstructured Grid Methods for Advection-Dominated Flows. Technical R e p o r t ' A G A R D - R 7 8 7 , A G A R D , 1992. [11] T . J Barth. Recent Development in High Order K-Exact Reconstruction on Unstruc-tured Meshes. A I A A Paper 93-0668, 31th Aerospace Sciences Meeting and Exhibit, 1993. [12] T . J . Barth and P. O . Frederickson. Higher-Order Solution of the Euler Equations on Unstructured Grids Using Quadratic Reconstruction. A I A A Paper 90-0013, 28th Aerospace Sciences Meeting and Exhibit, 1990. [13] T . J . Barth and D . C . Jespersen. The Design and Application of Upwind Schemes on Unstructured Meshes. A I A A Paper 89-0366, 27th Aerospace Sciences Meeting and Exhibit, 1989. ' [14] T . J . Barth and S. Linton. A n Unstructured Mesh Newton Solver for Compressible Fluid Flow and its Parallel Implementation. A I A A Paper 95-0221, 33rd Aerospace Sciences Meeting and Exhibit , 1995. [15] F . Bassi and S. Rebay. High-Order Accurate Discontinuous Finite Element Solution of the 2D Euler Equations. Journal of Computational Physics, 138:251-285, 1997. [16] M . Benzi. Preconditioning Techniques for Large Linear Systems: A Survey. Journal of Computational Physics, 182:418-477, 2002. [17] K . S. Bey and J . T . Oden. A Runge-Kutta Local Projection Pi-Discontinuous Galerkin Finite Element Method for High Speed Flows. A I A A Paper 91-1575, 1991. [18] M . Blanco and D . Zingg. A Fast Solver for the Euler Equations on Unstructured Grids Using a N e w t o n - G M R E S Method. Number A I A A Paper 97-0331, 1997. [19] C . Boivin and C . Ollivier-Gooch. Guaranteed-Quality Triangular Mesh Generation for Domains with Curved Boundaries. International Journal for Numerical Methods in Engineering, 55(10):1185-1213, 2002. [20] G . Corliss, C h . Faure, A . Griewank, L . Hascoet, and U . Naumann. Automatic Differ-entiation of Algorithms From Simulation to Optimization. Springer, 2002. [21] Darren L . De Zeeuw. A Quadtree-Based Adaptively-Refined Cartesian-Grid Algorithm for Solution of the Euler Equations. P h D thesis, Aerospace Engineering and Scientific Computing in the University of Michigan, 1993. BIBLIOGRAPHY 177 [22] M . Delanaye. Polynomial Reconstruction Finite Volume Schemes for the Compressible Euler and Navier-Stokes Equations on Unstructured Adaptative Grids. P h D thesis, Universite de Liege, Faculte des Sciences Appliquees, 1998. [23] M . Delanaye and J . A . Essers. Quadratic-Reconstruction Finite-Volume Scheme for Compressible Flows on Unstructured Adaptive Grids. AIAA Journal, 35(4):631-639, 1997. • [24] M . Delanaye, P h . Geuzaine, and J . A . Essers. Compressible Flows on Unstructured Adaptive Grids. A I A A Paper 97-2120, 13th A I A A Computational Fluid Dynamics Conference, 1997. [25] R. S. Dembo, S. C . Eisenstat, and T . Steihaug. Inexact Newton Methods. SIAM Journal on Numerical Analysis, 19(2):400-408, 1982. [26] J . E . Dennis and R. B. Schnable. Numerical Methods for Unconstrained Optimizations and Non Linear Equations. Prentice-Hall, 1983. [27] J . Forrester, E . Tinoco, and Y u . Jong. Thirty years of development and application of C F D at Boeing Commercial Airplanes. A I A A Paper 2003-3439, 2003. [28] O. Friedrich. Weighted Essentially Non-Oscillatory Schemes for the Interpolation of Mean Values on Unstructured Grids. Journal of Computational Physics, 144:194-212, 1998. [29] P. E . Gil l , W . Murray, and M . H . Wright. Practical Optimization. Elsevier Academic Press, 1986. [30] A . G . Godfrey, C . R. Mitchell, and R. W . Walters. Practical Aspects of Spatially High Accurate Methods. A I A A Paper 92-0054, 30th Aerospace Sciences Meeting and Exhibit, 1992. [31] J . J . Gottlieb and C . P. T . Groth. Assessment of Riemann Solvers for Unsteady One-Dimensional Inviscid Flows of Perfect Gases. Journal of Computational Physics, 78(2):437-458, 1988. [32] W . D . Gropp, D . E . Keyes, L . C . Mcinnes, and M . D . Tidr ir i . Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit C F D . Int. J. High Per-formance Computing Applications, 14:102-136, 2000. [33] A . Harten and S. Osher. Uniformly high-order accurate nonoscillatory schemes. SIAM Journal of Numerical Analysis, 24, 1987. BIBLIOGRAPHY 178 [34] A . Haselbacher. A W E N O Reconstruction Algorithm for Unstructured Grids Based on Explicit Stencil Construction. Number A I A A Paper -2005-879 in 43rd A I A A Aerospace Sciences Meeting and Exhibit, 2005. [35] C . Hirsch. Numerical Computation of Internal and External Flows, Computational Methods for Inviscid and Viscous Flows, volume 2. Jhon Wiley & Sons, 1990. [36] A . Jameson. Success and Challenges in Computational Aerodynamics. A I A A Paper 87-1184, 8th Computational Fluid Dynamics Conference, 1987. [37] A . Jameson and D . Mavriplis. Finite-Volume Solution of the Two-Dimensional Euler Equations on a Regular Triangular Mesh. AIAA Journal, 24:611-618. [38] A . J . Jameson, W . Schmidt, and E . Turkel. Numerical Solutions of the Euler Equations by a Finite-Volume Method Using Runge-Kutta Time-Stepping Schemes. A I A A Paper 81-1259, 1981. ; [39] D. B . K i m and P. D . Orkwis. Jacobian Update Strategies for Quadratic and Near-Quadratic Convergence of Newton and Newton-Like Implicit Schemes. A I A A Paper 93-0878, 1993. [40] K . A . Knol l and D . E . Keyes. Jacobian Free Newton Krylov Methods: A Survey of Approaches and Applications. Journal of Computational Physics, 193:357-397, 2004. [41] C . B. Laney. Computational Gasdynamics. Cambridge University Press, 1998. [42] R. J . Le Veque. Numerical Methods for Conservation Laws. Birkhauser Verlag, 1990. [43] X - D . L iu , S. Osher, and T . Chan. Weighted Essentially Non-Oscillatory Schemes. Journal of Computational Physics, 115, 1994. [44] H . Luo, J . Baum, and R. Lohner. A Fast, Matrix-free Implicit Method for Compressible Flows on Unstructured Girds. Journal of Computational Physics, 146:664-690, 1998. [45] L . M . Manzano, J . V . Lassaline, P. Wong, and D. W . Zingg. A Newton-Krylov A l -gorithm for the Euler Equations Using Unstructured Grids. A I A A Paper 2003-0274, 41th A I A A Aerospace Sciences Meeting and Exhibit, 2003. [46] D . J . Mavriplis. O n Convergence Acceleration Techniques for Unstructured Meshes. Technical Report I C A S E No. 98-44, Institute for Computer Applications in Science and Engineering ( I C A S E ) , N A S A Langley Research Center, N A S A Langley Research Center, Hampton V A 23681-2199, 1998. BIBLIOGRAPHY 179 [47] F . Mavriplis. C F D in aerospace in the new millenium. Canadian Aeronautics and Space Journal, 46(4):167-176, 2000. [48] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Solver for Poisson's Equation. 11th Annual Conference of the Computational Fluid Dynamics Society of Canada, pages 344-349, 2003. [49] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Solver for the Compressible Euler Equations. T h i r d International Conference on Computa-tional Fluid Dynamics, 2004. [50] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Algo-rithm for Inviscid Compressible Flows. A I A A Paper 2005-5341, 17th A I A A Computa-tional Fluid Dynamics Conference, 2005. [51] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured Newton-Krylov Solver for Inviscid Compressible Flows. A I A A Paper 2006-3711, 36th A I A A Fluid Dynamics Conference, 2006. [52] A . Nejat and C . Ollivier-Gooch. O n Preconditioning of N e w t o n - G M R E S algorithm for a Higher-Order Accurate Unstructured Solver. 14th Annual Conference of the C F D Society of Canada, 2006. [53] A . Nejat and C . Ollivier-Gooch. Effect of Discretization Order on Preconditioning and Convergence of a Higher-Order Unstructured Newton-Krylov Solver for Inviscid Compressible Flows. A I A A Paper 2007-719, 45th A I A A Aerospace Sciences Meeting and Exhibit, 2007. [54] J . Nichols and D . W . Zingg. A Three-Dimensional Multi-Block Newton-Krylov Flow Solver for the Euler Equations. A I A A Paper 2005-5230, 17th A I A A Computational Fluid Dynamics Conference, 2005. [55] E . J . Nielsen, W . K . Anderson, R. W . Walters, and D . E . Keyes. Application of Newton-Krylov Methodology to a Three-Dimensional Unstructured Euler Code. A I A A Paper 95-1733, 12th A I A A Computational Fluid Dynamics Conference, 1995. [56] C . Ollivier-Gooch. High-Order E N O Schemes for Unstructured Meshes Based on Least-Square Reconstruction. A I A A Paper 97-0540, 35th A I A A Aerospace Sciences Meeting and Exhibit, 1997. [57] C . Ollivier-Gooch. Q u a s i - E N O Schemes for Unstructured Meshes Based on Unlimited Data-Dependent Least-Squares Reconstruction. Journal of Computational Physics, 133:6-17, 1997. BIBLIOGRAPHY 180 [58] C . Ollivier-Gooch. Programmers Reference Guide for the A N S L i b Generic Finite-Volume Solver Software Library. Advanced Numerical Simulation Laboratory (ANSLab) , Mechanical Engineering Department, The University of British Columbia, 2002. [59] C . Ollivier-Gooch. G R U M M P (Generation and Refinement of Unstructured, Mixed-Element Meshes in Parallel) Version 0.3.2 User's Guide. Advanced Numerical Simu-lation Laboratory (ANSLab) , Mechanical Engineering Department, The University of ' British Columbia, 2005. [60] C . Ollivier-Gooch and M . Van Altena. A Higher-Order Accurate Unstructured Mesh Finite-Volume Scheme for the Advection-Diffusion Equation. Journal of Computational Physics, 181:729-752, 2002. [61] O. Onur and S. Ey i . Effects of the Jacobian Evaluation on Newton's Solution of the Euler Equations. International Journal for Numerical Methods In Fluids, 49:211-231, 2005. [62] P. D . Orkwis. Comparison of Newton's and Quasi-Newton's Method Solvers for Navier-Stokes Equations. AIAA Journal, 31:832-836, 1993. [63] A . Pueyo and D . W . Zingg. A n Efficient N e w t o n - G M R E S Solver for Aerodynamic Computations. A I A A Paper 97-1955, 1997. [64] A . Pueyo and D . W . Zingg. Progress in NewtonKrylov Methods for Aerodynamic Calculations. A I A A Paper 97-0877, 35th Aerospace Sciences Meeting and Exhibit, 1997. [65] A . Pueyo and D . W . Zingg. Improvement to a Newton-Krylov Solver for Aerodynamic Flows. A I A A Paper 98-0619, 36th Aerospace Sciences Meeting and Exhibit, 1998. [66] S. De Rango and D . W . Zingg. Aerodynamic Computations Using a Higher-Order Algorithm. A I A A Paper 99-0167, 37th A I A A Aerospace Sciences Meeting and Exhibit, 1999. [67] S. De Rango and D . W . Zingg. Further Investigation of Higher-Order Algorithm for Aerodynamic Computations. A I A A Paper 20.00-0823, 38th Aerospace Sciences Meeting and Exhibit, 2000. [68] S. De Rango and D . W . Zingg. Higher-Order Aerodynamic Computations on Multi-Block Grids: A I A A Paper 2001-0263, 39th A I A A Aerospace Sciences Meeting and Exhibit, 2001. BIBLIOGRAPHY 181 [69] P. Roe. Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes. Journal of Computational Physics, 43:357-372, 1981. [70] P. Roe. Characteristic-Based Schemes for the Euler Equations. Annual Review of Fluid Mechanics, 18:337-365, 1986. [71] A . Rohde. Eigenvalues and Eigenvectors of the Euler Equations in General Geometries. A I A A Paper 2001-2609, 39th Aerospace Sciences Meeting and Exhibit, 2001. [72] Y . Saad, , and M . H . Schultz. A Generalized Minimal Residual Algorithm for Solving Non-Symmetric Linear Systems. SIAM J. Sci., Stat. Comp., 7:856-869, 1986. [73] Y . Saad. A Flexible Inner-Outer Preconditioned G M R E S Algorithm. SIAM Journal of Scientific Computing, 14(2):461-469, 1993. [74] Y . Saad. Iterative Methods for Sparse Linear Systems. Siam, second edition, 2003. [75] P.R. Spalart. Topics in Detached-Eddy Simulation. In Third International Conference on Computational Fluid Dynamics (Invited Lecture), 2004. [76] Spitaleri R. M . and Regolo V . Multiblock Multigrid Grid Generation Algorithms : Overcoming Multigrid Anisotropy. Applied mathematics and computation (Appl. math, comput.), 84:247-267, 1997. [77] J . L . Steger and R. F . Warming. Flux Vector Splitting of the Inviscid Gasdynamics Equations with Application to Finite Difference Methods. Journal of Computational Physics, 40(263-293), 1981. [78] M . Tadjouddine, S. A . Forth, and N . Qin. Elimination A D Applied to Jacobian As-sembly for an Implicit Compressible C F D Solver. International Journal for Numerical Methods in Fluids, 47:1315-1321, 2005. [79] M . D . Tidr ir i . Preconditioning Techniques for the Newton-Krylov Solution of Com-pressible Flows. Journal of Computational Physics, 132:51-61, 1997. r [80] M . Van Altena. High-Order Finite-Volume Discretisations for Solving a Modified Advection-Diffusion Problem on Unstructured Triangular Meshes. Master's thesis, The University of British Columbia, Mechanical Engineering Department, 1999. [81] H . A . Van Der Vorst. Iterative Krylov Methods for Large Linear Systems. Cambridge • University Press, 2003. BIBLIOGRAPHY 182 [82] B. Van Leer. Towards the Ultimate Conservative Difference Scheme. V . A Second-Order Sequel to Godunov's Method. Journal of Computational Physics, 32:101-136, 1979. [83] B. Van Leer, J . L . Thomas, P. L . Roe, and R. W . Newsome. A Comparision of Numer-ical Flux Formulas for the Euler and Navier-Stokes Equations. A I A A Paper 87-1104, 1987. [84] K . J . Vanden and P. D . Orkwis. Comparison of Numerical and Analytical Jacobians. AIAA Journal, 34(.6):1125-1129, 1996. [85] V . Venkatakrishnan. Newton Solution of Inviscid and Viscous Problems. A I A A Paper 88-0143, 26th Aerospace Sciences Meeting and Exhibit, 1988. [86] V . Venkatakrishnan. O n the Accuracy of Limiters and Convergence to Steady State Solutions. A I A A Paper 93-0880, 31st Aerospace Sciences Meeting and Exhibit, 1993. [87] V . Venkatakrishnan. Convergence to Steady state Solutions of the Euler Equations on Unstructured Grids W i t h Limiters. Journal of Computational Physics, 118:120-130, 1995. [88] V . Venkatakrishnan. A Perspective on Unstructured Grid Flow Solvers. Technical Report I C A S E 95-3, N A S A , 1995. [89] V . Venkatakrishnan and T . J . Barth. Application of Direct Solvers to Unstructured Meshes for the Euler and Navier-Stokes Equations Using Upwind Schemes. A I A A Paper 89-0364, 27th Aerospace Sciences Meeting and Exhibit, 1989. [90] V . Venkatakrishnan and D . Mavriplis. Implicit Solvers for Unstructured Meshes. Jour-nal of Computational Physics, 105(83-91), 1993. [91] V . Venkatakrishnan and D . J . Mavriplis. Implicit Solvers for Unstructured Meshes. A I A A Paper 91-1537, 1991. [92] D . S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, 1991. [93] D. L . Whitaker. Three Dimensional Unstructured Grid Euler Computations Using a Fully-Implicit Upwind Method. A I A A Paper 93-3337, 1993. [94] L . Wigton. Application of M A C S Y M A and Sparse Matrix Technology to Multielement Airfoil Calculations. A I A A Paper 87-1142, 1987. BIBLIOGRAPHY 183 [95] D . J . Willis, J . Peraire, and J . K . White. A Quadratic Basis Function, Quadratic Geometry, High Order Panel Method. A I A A Paper 2005-0854, 43rd A I A A Aerospace sciences Meeting and Exhibit, 2005. [96] D. W . Zingg, S. De Rango, M . Nemec, and T . H . Pulliam. Comparison of Several Spatial Discretizations for the Navier-Stokes Equations. Journal of Computational Physics, 160:683-704, 2000.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A higher-order accurate unstructured finite volume...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A higher-order accurate unstructured finite volume Newton-Krylov algorithm for inviscid compressible… Nejat, Amir 2007
pdf
Page Metadata
Item Metadata
Title | A higher-order accurate unstructured finite volume Newton-Krylov algorithm for inviscid compressible flows |
Creator |
Nejat, Amir |
Publisher | University of British Columbia |
Date Issued | 2007 |
Description | A fast implicit (Newton-Krylov) finite volume algorithm is developed for higher-order unstructured (cell-centered) steady-state computation of inviscid compressible flows (Euler equations). The matrix-free General Minimal Residual (GMRES) algorithm is used for solving the linear system arising from implicit discretization of the governing equations, avoiding expensive and complicated explicit computation of the higher-order Jacobian matrix. An Incomplete Lower-Upper factorization technique is employed as the preconditioning strategy and a first-order Jacobian as a preconditioning matrix. The solution process is divided into two phases: start-up and Newton iterations. In the start-up phase an approximate solution of the fluid flow is computed which includes most of the physical characteristics of the steady-state flow. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. At the end of the start-up phase (when the linearization of the flow field is accurate enough for steady-state solution) the solution is switched to the Newton phase, taking an infinite time step and recovering a semi-quadratic convergence rate (for most of the cases). A proper limiter implementation for higher-order discretization is discussed and a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced. The issue of mesh refinement in accuracy measurement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and direct solution error calculation. The accuracy, fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes are demonstrated via several test cases for the 2nd, 3rd and 4th-order discretization. Solutions of different orders of accuracy are compared in detail through several investigations. The possibility of reducing the computational cost required for a given level of accuracy using high-order discretization is demonstrated. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-01-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0080717 |
URI | http://hdl.handle.net/2429/30969 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mechanical Engineering |
Affiliation |
Applied Science, Faculty of Mechanical Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2007-267684.pdf [ 19.42MB ]
- Metadata
- JSON: 831-1.0080717.json
- JSON-LD: 831-1.0080717-ld.json
- RDF/XML (Pretty): 831-1.0080717-rdf.xml
- RDF/JSON: 831-1.0080717-rdf.json
- Turtle: 831-1.0080717-turtle.txt
- N-Triples: 831-1.0080717-rdf-ntriples.txt
- Original Record: 831-1.0080717-source.json
- Full Text
- 831-1.0080717-fulltext.txt
- Citation
- 831-1.0080717.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0080717/manifest