A Higher-Order Accurate Unstructured Finite Volume Newton-Krylov Algorithm for Inviscid Compressible Flows by AMIR NEJAT B.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1996 M.Sc. (Aerospace Engineering), AmirKabir University of Technology, 1998 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR T H E DEGREE OF DOCTOR OF PHILOSOPHY in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Mechanical Engineering) T H E UNIVERSITY O F BRITISH C O L U M B I A April 2007 © Amir Nejat, 2007 Abstract A fast implicit (Newton-Krylov) finite volume algorithm is developed for higher-order unstructured (cell-centered) steady-state computation of inviscid compressible flows (Euler equations). T h e matrix-free General M i n i m a l Residual ( G M R E S ) algorithm is used for solving the linear system arising from implicit discretization of the governing equations, avoiding expensive and complicated explicit computation of the higher-order Jacobian matrix. A n Incomplete Lower-Upper factorization technique is employed as the preconditioning strategy and a first-order Jacobian as a preconditioning matrix. T h e solution process is divided into two phases: start-up and Newton iterations. In the start-up phase an approximate solution of the fluid flow is computed which includes most of the physical characteristics of the steady-state flow. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. A t the end of the start-up phase (when the linearization of the flow field is accurate enough for steady-state solution) the solution is switched to the Newton phase, taking an infinite time step and recovering a semi-quadratic convergence rate (for most of the cases). A proper limiter implementation for higher-order discretization is discussed and a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced. T h e issue of mesh refinement in accuracy measurement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and direct solution error calculation. T h e accuracy, fast convergence • and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes are demonstrated via several test cases for the 2nd, 3rd and 4th-order discretization. Solutions of different orders of accuracy are compared in detail through several investigations. T h e possibility of reducing the computational cost required for a given level of accuracy using high-order discretization is demonstrated. Contents 1 Introduction 1 1.1 Motivation 1 1.2 Background 7 1.2.1 Mesh Generation and Spatial Discretization 7 1.2.2 Higher-Order Discretization 10 1.2.3 Implicit M e t h o d and Convergence Acceleration 12 1.3 Objective 15 1.4 Contributions . . 15 1.5 Outline , 2 Flow Solver 16 18 2.1 Governing Equations 2.2 Implicit T i m e - A d v a n c e 2.3 U p w i n d F l u x Formulation 23 2.3.1 T h e Godunov A p p r o a c h . . ." 25 2.3.2 Roe's F l u x Difference Splitting Scheme 26 2.4 18 : Boundary F l u x Treatment 20 28 iii CONTENTS 2.5 iv 2.4.1 W a l l B o u n d a r y Condition 28 2.4.2 Inlet/Outlet Boundary Conditions 29 F l u x Integration 31 3 Reconstruction and Monotonicity 33 3.1 Introduction 33 3.2 K - E x a c t Least-Square Reconstruction 34 3.2.1 Conservation of the M e a n 34 3.2.2 K - E x a c t Reconstruction 37 3.2.3 Compact Support 38 3.2.4 Boundary Constraints 39 3.2.4.1 Dirichlet Boundary Constraint 39 3.2.4.2 Neumann Boundary Constraint 40 3.2.5 3.3 3.4 Constrained Least-Square System : . 40 Accuracy Assessment for a Smooth Function 41 Monotonicity Enforcement 43 3.4.1 Flux Limiting 43 3.4.2 Slope L i m i t i n g 44 4 Flux Jacobian 52 4.1 W h a t is the Jacobian ? 52 4.2 F l u x Jacobian Formulation 55 4.2.1 Roe's F l u x Jacobian . . . •. • 57 CONTENTS 4.2.2 ' v B o u n d a r y F l u x Jacobians 60 4.2.2.1 W a l l Boundary F l u x Jacobian 61 4.2.2.2 Subsonic Inlet F l u x Jacobian 4.2.2.3 Subsonic Outlet flux Jacobian 64 4.2.2.4 Supersonic Inlet/Outlet F l u x Jacobians 65 ; ,61 4.3 Finite Difference Differentiation 65 4.4 Numerical F l u x Jacobian 68 5 Linear Solver and Solution Strategy 71 5.1 Introduction 71 5.2 G M R E S Linear Solver 73 5.2.1 T h e Basic G M R E S Algorithm 74 5.2.2 Matrix-Vector Products Computation in G M R E S 77 5.2.3 Preconditioning 80 5.2.4 G M R E S with Right Preconditioning 5.3 Solution Strategy f ' 84 5.3.1 Start-up Phase 86 5.3.2 Newton Phase 88 t t 6 Results(I): Verification Cases 6.1 83 90 Reconstruction Test Cases 90 6.1.1 Square Test Case . , 91 6.1.2 Annulus Test case : 94 CONTENTS vi 6.2 Subsonic Flow Past a Semi-Circular Cylinder 97 6.3 Supersonic Vortex 112 6.3.1 Numerical Solution 115 6.3.2 Solution accuracy measurement . 7 Results(II): Simulation Cases 7.1 124 Subsonic Airfoil, N A C A 0012, M = 0.63, a = 2 . 0 ° 7.1.1 8 117 Solution Process 124 • • • • 1 2 7 7.2 Transonic Airfoil, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 153 7.3 Supersonic flow, D i a m o n d airfoil , M = 2.0, a = 0.0 161 Concluding Remarks 169 8.1 Summary and Contributions 169 8.2 Conclusions 171 8.3 Recommended Future Work 173 8.3.1 173 Start-up 8.3.2 . Preconditioning 8.3.3 Reconstruction 173 8.3.4 Extension to 3 D 174 8.3.5 Extension to Viscous Flows 174 Bibliography . 173 175 List o f Tables 1.1 Qualitative illustration of research on solver development ' 4 5.1 Ratio of non-zero elements in factorized matrix . . 82 6.1 2nd-order error norms for the square case 91 6.2 3rd-order error norms for the square case 92 6.3 4th-order error norms for the square case 93 6.4 2nd-order error norms for the annulus case 94 6.5 3rd-order error norms for the annulus case 94 6.6 4th-order error norms for the annulus case 6.7 Sizes and ratios of the control volumes for circular cylinder meshes 6.8 Error norms for total pressure, 2nd-order solution 106 6.9 Error norms for total pressure, 3rd-order solution . . . 106 . 95 98 6.10 Error norms for total pressure, 4th-order solution 109 6.11 Solution error norms, 2nd-order discretization 6.12 Solution error norms, 3rd-order discretization 121 6.13 Solution error norms, 4th-order discretization 121 vii : . . 121 LIST OF 7.1 viii TABLES Mesh detail for N A C A 0012 airfoil 124 7.2 Convergence summary for N A C A 0012 airfoil, M = 0.63, a = 2° 7.3 Lift and drag coefficients for all meshes and discretization orders, N A C A 0012, M = 0.63, a = 2°, far field size of 25 chords 7.4 ' 142 Effect of the far field distance oh lift and drag coefficients, N A C A 0012, M 7.5 131 = 0.6.3, a = 2° . . .' Convergence summary for N A C A 0012 airfoil, M = 0.8, a = 1.25° 143 155 7.6 • Lift and drag coefficients, N A C A 0012, M = 0.8, a = 1.25° 156 7.7 Convergence summary for diamond airfoil, M = 2.0, a = 0° 164 7.8 Drag coefficient, diamond airfoil, M = 2.0, a = 0° 164 List of Figures 1.1 M a i n approaches in fluid dynamics 3 1.2 C F D overall algorithm 5 1.3 Example of a structured and an unstructured mesh over a 2 D airfoil 2.1 Propagation of a linear wave in positive direction 24 2.2 Shock-Tube problem 26 2.3 R o u n d i n g the characteristic slope near zero 28 2.4 Schematic illustration of Gauss quadrature points 31 3.1 A typical cell center control volume and its reconstruction stencil, including . . . . three layers of neighbors 8 36 3.2 Imposing boundary constraint at the Gauss boundary points '39 3.3 T y p i c a l unlimited/limited linear reconstruction 46 3.4 Using first neighbors for monotonicity enforcement 47 3.5 T y p i c a l unlimited/limited quadratic reconstruction 50 3.6 Defining a as a function of cf> 51 4.1 Schematic of Direct Neighbors : . ix 55 / L I S T OF FIGURES x 4.2 T y p i c a l cell-centered mesh numbering 57 4.3 Total numerical error versus perturbation magnitude 67 5.1 Linearization of a sample function 72 6.1 Unstructured meshes for a square domain 92 6.2 E r r o r - M e s h plot for the square case 93 6.3 Unstructured meshes for a curved domain (annulus) 95 6.4 E r r o r - M e s h plot for the annulus case 96 6.5 Circular domain over half a cylinder, Mesh 1 (1376 C V s ) 97 6.6 Circular cylinder, Mesh 1 (1376CVs) 98 6.7 Circular cylinder, M e s h 2 (5539 C V s ) 99 6.8 Circular cylinder, M e s h 3 (22844 C V s ) 99 6.9 Convergence history for the coarse mesh (Mesh 1) 101 6.10 Convergence history for the fine mesh (Mesh 3) 101 6.11 2nd-order pressure coefficient contours,. Mesh 1 102 6.12 3rd-order pressure coefficient contours, Mesh 1 102 6.13 4th-order pressure coefficient contours, Mesh 1 . . 103 6.14 4th-order pressure coefficient contours, Mesh 3 104 6.15 Pressure coefficient along the axis 104 6.16 Close up of the pressure coefficient along the axis (suction region) . . . . . 105 6.17 Pressure coefficient along the axis, Mesh 3' 105 6.18. Error in total pressure ratio, 2nd-order discretization, M e s h 1 107 LIST OF FIGURES " xi 6.19 Error in total pressure ratio, 3rd-order discretization, M e s h 1 107 6.20 Error in total pressure ratio, 4th-order discretization, M e s h 1 108 6.21 Error in total pressure ratio, 4th-order discretization, M e s h 3 108 6.22 109 E r r o r - M e s h plot for the total pressure 6.23 Drag coefficient versus mesh size 6.24 Drag coefficient versus C P U time 110 -. . Ill 6.25 Annulus, M e s h 1 (108 C V s ) 112 6.26 Annulus, M e s h 2 (427 C V s ) . . 113 6.27 Annulus, M e s h 3 (1703 C V s ) 113 6.28 Annulus, M e s h 4 (6811 C V s ) 114 6.29 Annulus, M e s h 5 (27389 C V s ) . 114 6.30 Convergence history for the coarse mesh (Mesh 1) • 116 6.31 116 Convergence history for the fine mesh (Mesh 5) 6.32 2nd-order M a c h contours for the coarse mesh (Mesh 1) 117 6.33 3rd-order M a c h contours for the coarse mesh (Mesh 1) 118 6.34 4th-order M a c h contours for the coarse mesh.(Mesh 1) 118 6.35 2nd-order density error for the coarse mesh (Mesh 1) 119 6.36 3rd-order density error for the coarse mesh (Mesh 1) 6.37 4th-order density error for the coarse mesh (Mesh 1) • . . . 119 : . . 6.38 Density, 4th-order solution over the fine mesh (Mesh 5) 6.39 E r r o r - M e s h plot for the solution (Density) 6.40 Error versus C P U T i m e 120 120 . 122 123 LIST OF FIGURES xii 7.1 N A C A 0012, M e s h 1, 1245 C V s 125 7.2 N A C A 0012, M e s h 2, 2501 C V s 7.3 N A C A 0012, M e s h 3, 4958 C V s 126 7.4 N A C A 0012, M e s h 4, 9931 C V s 126 7.5 N A C A 0012, M e s h 5, 19957 C V s 7.6 C p over the upper surface after start-up, M e s h l (1245 C V s ) 128 7.7 C p over the upper surface after start-up, Mesh 5 (19957 C V s ) 129 7.8 C P U time versus the grid size, N A C A 0012, M = 0.63, a = 2 ° 7.9 Total work unit versus the grid size, N A C A 0012, M = 0.63, a = 2 ° ' .125 . ^ . : . . . . 127 132 133 7.10 Newton phase work unit versus the grid size, N A C A 0012, M = 0.63, a = 2 ° 133 7.11 Convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° 134 7.12 Convergence history, Mesh 5, M = 0.63, a = 2 ° 135 7.13 Non-linear residual versus linear system residual, Mesh 3, M = 0.63, a — 2 ° 135 7.14 Linear system residual dropping order, M e s h 3, M — 0.63, a — 2 ° 136 7.15 Eigenvalue pattern for. the preconditioned system, Mesh 3, M = 0.63, a = 2 ° 137 7.16 Condition No. of the preconditioned system, Mesh 3, M = 0.63, a = 2 ° .. 138 7.17 Condition No. of the preconditioned system, Mesh 5, M = 0.63, a = 2 ° . . 138 7.18 Lift coefficient convergence history, N A C A 0012, M e s h 1, M =.0.63, a = 2 ° 139 7.19 Drag coefficient convergence history, N A C A 0012, Mesh 1, M = 0.63, a = 2 ° 140 7.20 Lift coefficient convergence history N A C A 0012, Mesh 5, M = 0.63, a = 2 ° 141 7.21 D r a g coefficient convergence history, N A C A 0012, Mesh 5, M = 0.63, a = 2 ° 141 7.22 N A C A 0012, M e s h 3 (4958 C V s ) 144 '. LIST OF FIGURES xiii 7.23 2nd-order M a c h contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 145 7.24 3rd-order M a c h contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 145 7.25 4th-order M a c h contours for N A C A 0012 airfoil, Mesh 3, M = 0.63, a = 2 ° 146 7.26 M a c h profile, upper side, N A C A 0012 airfoil, Mesh 1, M — 0.63, a = 2 ° 146 7.27 M a c h profile close up, upper side, N A C A 0012 airfoil, M e s h 1, M = 0.63, a = 2 ° 147 7.28 Mesh 1, close-up at the leading edge region 147 7.29 M a c h profile, lower side, N A C A 0012 airfoil, Mesh 1, M — 0.63, a = 2 ° . . . 148 7.30 M a c h profile, upper side, N A C A 0012 airfoil, Mesh 5, M — 0.63, a = 2 ° 148 7.31 M a c h profile close up, upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 149 7.32 M a c h profile, lower side, N A C A 0012 airfoil, Mesh 5, M — 0.63, a = 2 ° . . . 149 7.33 1 - TA-, upper side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° 150 lower side, N A C A 0012 airfoil, Mesh 1, M = 0.63, a = 2 ° 150 . . . , i too 7.34 •* too 7.35 1- upper side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° . . . . . 151 7.36 1- lower side, N A C A 0012 airfoil, Mesh 5, M = 0.63, a = 2 ° 7.37 M a c h profile at the end of start-up process, M — 0.8, a — 1 . 2 5 ° 154 7.38 Convergence history for N A C A 0012, M = 0.8, a = 1 . 2 5 ° 155 7.39 2nd-order M a c h contours, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 156 7.40 3rd-order M a c h contours, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 157 7.41 4th-order M a c h contours, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 157 7.42 limiter <j) (3rd-order), N A C A 0012, M = 0.8, a = 1 . 2 5 ° 158 7.43 limiter a (3rd-order), N A C A 0012, M = 0.8, a = 1 . 2 5 ° 159 7.44 M a c h profile, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 159 151 LIST OF FIGURES • xiv 7.45 M a c h profile in shock regions, N A C A 0012, M = 0.8, a = 1 . 2 5 ° 160 7.46 Mesh 7771 C V s , diamond airfoil, M = 2.0, a = 0.0 162 7.47 C p at the end of start-up process, diamond airfoil, M = 2.0, a — 0.0 7.48 Convergence history for diamond airfoil, M = 2.0, a = 0 ° . . . . 163 164 7.49 2nd-order M a c h contours, diamond airfoil, M = 2.0, a = 0.0 . 165 7.50 3rd-order M a c h contours, diamond airfoil, M = 2.0, a — 0.0 165 7.51 4th-order M a c h contours, diamond airfoil, M = 2.0, a = 0.0 166 7.52 2nd-order C p , diamond airfoil, M = 2.0, a = 0.0 166 7.53 3rd-order C p , diamond airfoil, M = 2.0, a = 0.0 167 7.54 4th-order C p , diamond airfoil, M = 2.0, a = 0.0 167 List of Symbols Roman Symbols a speed of sound A area, Jacobian matrix b righr hand side (Ax = b) B fixed iteration matrix C center CL lift coefficient CD drag coefficient Cp specific heat at constant pressure, pressure coefficient C specific heat at constant volume D • reconstruction solution vector (derivatives) e specific internal energy E total energy, error F flux vector G Gauss (integration) point h specific enthalpy, mesh length scale H Hessenberg matrix I identity matrix v XV LIST OF SYMBOLS xvi J Jacobian matrix k number of subspace size K order of accuracy, polynomial order, constant in Venkatakrishnan limiter I length of the control volume face, norm L norm, left eigenvector m number of restarts M Mach number, moments, preconditioner matrix, coefficient matrix in reconstruction n normal vector N total number of ... p polynomial order P static pressure, polynomial r residual, distance R gas constant, residual, radius, right eigenvector S slope in higher-order limiter t time, exponent of the weighting in reconstruction T static temperature U solution vector, conservative variables u, v velocity components V • primitive variables, velocity, subspace x, y Cartesian coordinates, unknown vectors z preconditioning vector LIST OF SYMBOLS Greek Symbols a angle of attack 7 specific heat ratio e perturbation parameter A eigenvalue, wave speed p Density a higher-order limiter (f> slope limiter UJ relaxation factor Superscripts i iteration index k, K order, iteration number, subspace number m, n polynomial exponent Subscripts b boundary c center CD central difference CV control volume DB Dirichlet boundary FD forward difference G Gauss point i inner, control volume index LIST OF SYMBOLS in inlet L left side m magnitude min, max minimum and maximum n normal, normalized N neighbor NB Neumann boundary o outer out outlet ref reference R right side RB reconstructed at the boundary Subln subsonic inlet SubOut subsonic outlet t total W wall x, y Cartesian directions Acknowledgments I would like to express my deep appreciation to my research supervisor, D r . C a r l OllivierGooch for all of his guidance, support and patience. His remarkable feedbacks throughout the course of this research were very helpful. I also would like to thank D r . C h e n Greif from the Computer Science Department. Attending his valuable lectures on sparse matrix solvers and having the opportunity to engage in discussions with him, have greatly benefited my research. I am grateful to my all colleagues in the A N S L a b research group in the Mechanical E n gineering Department, especially Chris Michalak, Serge Gosselin, and Harsha Perera, for their computer assistance on many occasions. Finally, my most sincere and deepest appreciation is for my great family, especially my mother "Homa" and my father "Mehdi", whose constant support, encouragement, and love. I have enjoyed all through my life. xix patience Chapter 1 Introduction Prediction of fluid flow quantities, such as pressure, velocity, and temperature, and the study of flow behavior are the main goals in the field of fluid dynamics. Like other science and engineering disciplines, fluid dynamics has greatly benefited from the development of computing technology (numerical algorithms and computational tools) over the last four decades, resulting in the creation of a new born approach in the field known as C o m p u t a tional F l u i d Dynamics ( C F D ) , F i g (1.1). C F D has shown remarkable capability for fluid flow analysis both in academia and industry. C F D has not only made possible the sim- ulation of flows (such as reentry of space vehicles) for which complete analysis (either by theory or experiment) was impossible before, but also has provided a valuable feedback and information source for improving theoretical and experimental fluid dynamics. Increasing computing power over the last two decades has resulted in the development of new computational techniques and algorithms, enhancing the versatility of C F D application. Nowadays, C F D is not just a research tool, and it is used extensively and successfully in industry throughout the design process, from preliminary design to shape optimization. 1.1 Motivation In the field of computational aerodynamics, the final goal is the accurate simulation of the flow field around (and/or inside) complex 3D geometries to compute aerodynamic force coefficients. In the m i d 1980's, Jameson [36] was the first person to compute the three- dimensional flow over realistic aerodynamic configurations using the finite volume technique (a robust conservative numerical approach for discretization of the fluid flow equations over 1 CHAPTER 1. INTRODUCTION 2 general meshes [38]); since then tremendous progress has been made in this area and application of C F D for aerodynamic computations has revolutionized the process of aerodynamic design [27]. T o simulate the flow field around a 3D complex geometry accurately, a C F D package should include three essential parts: 1. State of the art mesh generation capability. M e s h generation or domain dis- cretization, is one of the most important parts (if not the most!) in C F D simulations. Since the discretization of the fluid flow equations is carried out on the mesh, without a good domain discretization the C F D solution can be. very inaccurate. Furthermore generating an appropriate mesh, especially for complex geometries in practice is the most time consuming part for a C F D user and certainly is not a trivial task. According to a real aerodynamic case study in aerospace engineering, the mesh preparation time can be up to 45 times larger than the required computation time for the fluid flow simulation [47]. Therefore, it is both desirable and necessary to reduce the mesh preparation time and there is a large potential to gain by automating this process. Ideally, meshing software should be able to generate a geometrically and physically suitable mesh around or inside a complex 3D geometry with a reasonable user workload. Also the user should be able to refine the mesh according to geometric parameters and to adapt the mesh based on flow features without excessive effort. T h e unstructured mesh technique, among other types of mesh generation methods, is a very good candidate to address mesh generation issues due to its automation capability in generating meshes for complex geometries and its flexibility in refinement and adaptation. 2. Accurate physical modeling of high Reynolds number turbulent flow. Most practical engineering applications such as aircraft aerodynamics involve turbulent flows. T h e Direct Numerical Simulation (DNS) of turbulent flows for practical pur- poses is not feasible at least for the next couple of decades due to computing technology limitations (memory and speed). Therefore modeling the turbulent flow is the only viable approach in high Reynolds number C F D simulations and will remain a major active research area for the foreseeable future [7, 75]. Discussing the physical and numerical criteria for choosing an appropriate turbulence model and related issues are beyond the scope of this thesis, but accurate physical modeling is a key part of valid C F D simulation for practical engineering problems..It should be noted that the modeling of the physical phenomena in C F D simulations is not just limited to turbulence, but is also essential for combustion, multiphase flows, hydromagnetic flows, and other types of fluid flows. CHAPTER 1. INTRODUCTION 3 Figure 1.1: M a i n approaches in fluid dynamics 3. A robust, efficient, accurate flow solver for the generic mesh. Numerical solution of the fluid flow equations is what C F D is all about and this solution must be stable and converge to the correct answer at a reasonable cost. A solver algo- rithm includes three separate components: discretization of the fluid flow equations, numerical flux computation, and updating the solution mostly-through time advance methods. F i g (1.2) shows overall schematic of a C F D algorithm. Current C F D flow solver algorithms still have some limitations both in terms of efficiency and accuracy • especially for simulation of physically complicated flows. Specifically, the memory and speed of current computers (even using parallel techniques) do not yet allow us to simulate physically complex flows in realistic geometries, with sufficient accuracy, in a short time and at a reasonable cost. A s a result, we have to simplify the physics of the fluid flow v i a approximate modeling, and neglect some of the numerical/physical issues caused by insufficient mesh density (especially for complex geometries), limiting the validity of the simulation and adversely affecting its application. Before proceeding forward with further details, it should be mentioned that this thesis is aimed only at the third component mentioned above improving the efficiency and accuracy of C F D algorithms. T h e numerical error in a simulation can be written in the form of h , p where h is the mesh length scale and p is the discretization order of accuracy. Clearly, then, improving the accuracy of a numerical simulation (modeling issues aside) is possible by means of increasing CHAPTER 1: INTRODUCTION Order Second-Order Higher-Order 4 Structured Structured Unstructured Unstructured Explicit Implicit Explicit Implicit * * ? ******* ** ** Table 1.1: Qualitative illustration of research on solver development the mesh density (using smaller grid or decreasing h), a n d / o r increasing the discretization order. Reducing mesh length scale can be achieved by global or local mesh refinement or adaptation. Nearly all modern C F D codes use second-order methods, which produce a diffusive error proportional to h? due to diffusive derivatives beside possible added artificial viscosity for stability purposes (in central difference schemes). For instance in a second-order 2D finite volume formulation in the Cartesian coordinates this numerical error for each control volume can be written in the following form: d U Ax 2 Numerical diffusive error = 2 dx 2 2 dU 2 dxdy 3U Ay 8y 2 2 Ax Ay + 2 2 (1.1) T h i s leading-order error term causes two significant numerical problems. First, it smears sharp gradients in convection dominated parts of the flow and spoils the conservation of , total pressure in isentropic regions of the flow field as it acts like a (numerical) viscosity. Second, this numerical diffusivity produces parasitic error in viscous regions by adding extra diffusion, which is very grid dependent, to the solution. Therefore using a high resolution numerical scheme for discretization is quite desirable. Application of discretization orders larger than second-order both for structured and unstructured grids has been an area of ongoing research for the last two decades [35, 10], and will be the focus of this thesis. However, convergence of high resolution schemes is not as efficient as second-order schemes especially for unstructured grids [88, 46] due to the increased complexity of the discretization, decreased damping (lack of diffusive damping) and adding more error modes (which must be damped in the solution process). Consequently implementing a higher-order un- structured discretization within an implicit framework to achieve the efficient convergence is extremely helpful if not necessary! A s shown in Fig(1.2), up to the flux integral computation, the overall C F D algorithm is the same, but the choice of the time advance technique in updating the solution changes the level of complexity of the solution process completely. Integration of the discretized equations in time can be done either explicitly or implicitly. In the explicit time integration the space discretization is performed at the previous time CHAPTER 1. INTRODUCTION Geometry & Solution Domain 5 Mesh Generation Package Physics & Fluid Flow Equations Discretized Domain Boundary & Initial Conditions Discretization of the Fluid Flow Equations over the Discretized Domain Flux Integral Explicit Time Advance Implicit Time Advance Multistage Techniques Solution Update Flux Integral Linearization Preconditioning Sparse Matrix Solver Solution Update Figure 1.2: C F D overall algorithm CHAPTER 1. INTRODUCTION' 6 level using the known flow quantities found at the previous time iteration. In the implicit time integration both the space and the time discretizations are performed at the current time level where the flow quantities are needed as unknowns. E q u a t i o n (1.2) shows a typical unsteady fluid flow P D E where the right hand side represents the spatial discretization and the left hand side shows the time derivative. For example, employing the first-order forward differencing time advance technique in the explicit and implicit forms leads to E q . (1.3) and E q . (1.4). IT = -R{U) u Explicit time advance: 1 n ^ — T j n + l Implicit time advance: '2) - u n + 1 ' ( 1 _ ^ — l - = -R(U?) (1.3) jjn l - = -R(U? ) (1.4) +1 W h i l e an explicit update just needs multistage integration of the flux integral using a RungeK u t t a type scheme, an implicit update requires linearization of the flux integral (Chapter 2) and constructing a large linear system which requires an efficient sparse matrix solver (discussed in Chapter 4 and 5). Efficiently solving a large linear system, especially with an ill conditioned matrix resulting from a higher-order discretization, demands effective preconditioning which adds to the complexity of the process. B u t the error reduction of each solution update in implicit integration is far larger than the explicit one, since implicit methods do not suffer from the stability issues of explicit methods and large time steps can be taken. O n balance, therefore, it is preferable to bear the complexity of the algorithm and accelerate the solution toward the steady-state in a relatively small number of implicit iterations. Table 1.1 provides a qualitative summary on finite volume solver development research, where the number of symbols represents the approximate volume of the research on the solver development since earjy 80's (based on the author's survey). Clearly the trend in solver development is moving toward: 1. Unstructured meshes to address the mesh generation issues for complex geometries 2. Higher-order discretizations to increase the global solution accuracy 3. Implicit techniques to improve the efficiency of the solution process T h i s research is mainly intended to contribute to the development of efficient and accurate flow solvers filling the gap in high-order implicit methods, on unstructured meshes. CHAPTER 1. INTRODUCTION 7 1.2 Background T h i s section provides an overview of relevant aspects of current C F D solvers including discretization type and order as well as implicit algorithms. T h i s overview is complemented by detailed discussion of previous work related to each part of the solver in the relevant chapters of the thesis. 1.2.1 M e s h G e n e r a t i o n a n d S p a t i a l D i s c r e t i z a t i o n Numerical simulation of fluid flow consists of two main parts: discretization of the flow field around or inside the geometry by a finite number of cells (grid generation) and solution of the fluid flow equations over the discretized domain (flow solver). Structured and unstructured meshes, F i g (1.3), are the most common types of the grids used in C F D applications. In a structured mesh, all cells and vertices have the same topology but in an unstructured mesh, elements can have irregular and variable topologies. T h e task of generating structured grids around complex configurations has proved to be a considerable challenge. Sophisticated structured approaches such as multi-block grid generation has resolved this issue by dividing the domain between the body and far field into simple geometrical blocks; structured grids are generated inside each block. procedure is still a relatively difficult job [76]. However, automation of the blocking Another structured approach is overlap or chimera grids. In this technique, the computational domain is divided into multiple zones and a suitable grid is generated in each zone. T h e chimera approach allows zones to overlap, and interpolation routines are used to transmit data between the overset grids in the flow solver. However, generalizing the grid generation and adaptation in this approach is still not an easy task and demands a high level of expertise as well as considerable effort. Furthermore, interpolation between the blocks and overlapped meshes has its own issues and can introduce additional error. T h e most powerful approach for complex geometries is unstructured grids (typically triangular in 2D and tetrahedral in 3D). Unstructured grids have a higher flexibility in refinement based on the geometry and adaptation based on the solution features and gradients. Therefore, unstructured meshes are one of the most suitable choices for complex geometries; associated solvers are becoming more common in modern C F D applications and promise to be more capable and successful for complex aerodynamic problems [88]. T h e fluid flow equations ( P D E s ) generally are discretized in one of the following forms: finite difference, finite element or finite volume. Finite difference is the point wise repre- sentation of the flow field where the flow equations are solved only for variables defined at CHAPTER 1. INTRODUCTION (a) Structured 8 (b) Unstructured Figure 1.3: Example of a structured and an unstructured mesh over a 2D airfoil grid points. The finite difference scheme was the original approach to the C F D problems and it is well suited for structured grids. Therefore, its higher-order implementation can be easily achieved by employing higher-order differencing formula. However, the finite difference discretization does not conserve mass, momentum .and energy of the flow which is an important issue for most of the practical applications such as shock capturing. Furthermore and more importantly the finite difference discretization can not be implemented on unstructured grids. The finite element method, another discretization technique, is one of the most complete and well established mathematical approach for numerical solution of PDEs. In this method, the flow equations are multiplied by a test function and then integrated over the discretized domain. The solution is represented by a local basis function (interpolation function) for each element. Finite element method is very flexible both in terms of theory and application and it can be easily used for unstructured meshes. Its high-order extension is also fairly common by employing higher-order basis and test functions. The challenging part in application of the finite element for C F D computation is again the conservation of the flow equations especially for non-smooth flows. Although conserving the mass, momentum and energy is possible in the finite element formulation but it is not an easy task, and finite element codes require considerable fine tuning in shock capturing. The finite volume approach is designed based on the conservation of mass, momentum CHAPTER 1. and energy. INTRODUCTION 9 T h e solution is represented, by control volume averages and equations are discretized over the volume integrals. Like the finite element method, it is very flexible for complex geometries and unstructured meshes. A t the same time its robustness for nearly all C F D applications especially shock capturing problems is well established. Higherorder application of finite volume methods is possible by using a higher-order polynomial inside each control volume which the polynomial average integral over the control volume represents the control volume average. Jameson and Mavriplis [37] reported some of the earliest unstructured finite volume C F D results, solving two-dimensional inviscid flow on regular triangular grids obtained by subdividing quadrilateral grids; central differencing was used. T h e i r approach was second-order (linear distribution). T h e artificial viscosity and the second-order truncation error made the mentioned approach relatively high diffusive for general irregular unstructured meshes. Second order upwind schemes (discretization of the flow equations based on the physical waves propagation directions) have also been used on unstructured grids either through GreenGauss gradient technique or least-squares linear reconstruction method. A p p l y i n g upwind schemes for unstructured grids is more complicated than for structured grids especially for higher-order approximation. For the unstructured case, over each finite volume (triangle in 2D) a polynomial approximation to the solution is reconstructed with the help of the neighboring control volumes, and then the Riemann (shock tube) problem is solved approximately at the control volume interfaces. One of the most successful approaches in applying the upwind scheme was undertaken by Barth[10]. In this approach, B a r t h defined a general upwind formulation (in multi-dimensions), introducing the m i n i m u m energy least-square reconstruction procedure for flux calculation up to the desired order of accuracy. However, any upwind scheme higher than first-order (which is monotone by its nature), often causes oscillations in the vicinity of sharp gradients and discontinuities which can produce , instability problems. For example, Agarwal and Halt [6] proposed a compact higher-order scheme for solution of Euler equations over unstructured grids using explicit time integration. Considerable over and undershoots in their transonic airfoil case (3rd-Order) were evident. A common solution to that is using limiters, which enforce monotonicity at the expense of adversely affecting both accuracy and convergence. B a r t h and Jespersen introduced a multi-dimensional limiter to achieve the monotonic solution. [13] Although that approach was quite successful in suppressing oscillations, reaching full convergence was not possible even after freezing the limiter because the limiter was not differentiable. Since then several attempts have been made to design a differentiable limiter for unstructured grid solvers, and Venkatakrishnan's limiter [87] seems to be one of the most robust. T h a t limiter does not strictly enforce monotonicity but allows only small overshoots in the con- CHAPTER 1. INTRODUCTION 10 i verged solution. However, it preserves accuracy especially for smooth regions where there are local extrema. Although this limiter shows better convergence behavior, it still has some convergence issues with implicit solvers. Designing an appropriate limiter for higher-order unstructured grid solvers is a fairly unexplored topic so far and needs to be addressed for practical higher-order unstructured application. Another more sophisticated approach to cure oscillations in compressible flow computation is the essentially non-oscillatory ( E N O ) family of schemes. These schemes are uniformly accurate and prevent oscillations in the non-smooth regions by detecting discontinuity and modifying the reconstruction stencil from cell to cell and time level to time level [30]. E N O schemes are computationally expensive and sacrifice fast convergence because of their dynamic stencils [34]. Weighted E N O ( W E N O ) schemes were developed to address the problems caused by dynamic stencils. Near discontinuities, weighted E N O schemes ([3, 57, 28]) remove the effect of non-smooth data in the reconstruction stencil by giving it an asymptotically small weight. However no comprehensive convergence analysis a n d / o r computational cost studies are presented. Furthermore, performance of E N O / W E N O schemes in the context of implicit time advance, which is one of the most efficient solution strategies, has not yet been studied. 1.2.2 Higher-Order Discretization For structured meshes, application of higher-order algorithms has progressed considerably and it has been shown that, for practical levels of accuracy, using a higher-order accurate method can be more efficient both in terms of solution time and memory usage. With higher-order accurate methods, the cost of flux computation, integration, and other associated numerical calculations increase per control volume. However, as we can use a coarser mesh, computation time and memory are saved overall and accuracy can be increased as well. De Rango and Zingg [66] applied a globally third-order accurate algorithm for steady turbulent flow over a 2-D airfoil using a structured grid. T h e y showed this approach can lead to a dramatic reduction in numerical error in drag using relatively coarse grids, and the results provide a convincing demonstration of the benefits of higher-order methods for practical flows. Zingg et al. [96] compared different flux discretization techniques with higher-order accuracy for laminar and turbulent flows (including transition) both in subsonic and transonic speed regimes. Extending the conclusion of the previous research, it was shown that the higher-order discretization produces solutions of a given accuracy much more efficiently than the second-order methods. More aspects of implementation of higherorder methods have been discussed by De Rango and Zingg [67], and convergence behavior CHAPTER 1. INTRODUCTION 11 of this approach has been studied in detail. Again a higher-order algorithm has been applied for calculation of the flow around the multi-element airfoil using the multi-block grid technique by the same researchers [68]. A grid convergence study in this research showed that the higher-order discretization produces a substantial reduction in the numerical errors in the flow field in comparison with the second-order algorithm. T h i s smaller error has been achieved on a grid several times coarser than the grid which had been used for the secondorder algorithm. In summary, these studies show that achieving the desired accuracy in practical aerodynamic flows using higher-order algorithms not only is possible but also has some advantages. Research in high-order unstructured solvers is motivated by the desire to combine the accuracy and efficiency benefits seen in the application of high-order methods on structured meshes with the geometric and adaptive flexibility of unstructured meshes. T h e application of methods having higher than second-order accuracy for solving the compressible Euler and Navier-Stokes equations on unstructured meshes has not been thoroughly investigated yet and remains an active research topic. Several researchers have achieved higher-order accuracy by the use of the finite element method. Bey and O d e n [17] have used a discontinuous Galerkin method to reach fourthorder accuracy for smooth flows. Bassi and Rebay [15] have used a new discontinuous element for discretization of Euler equations and have computed the compressible flow over a simple unstructured grid. T h e finite volume approach has received more attention, B a r t h and Fredrickson [12] derived a general condition for a scheme to be higher-order accurate, including a reconstruction procedure satisfying the properties of conservation of mean, K-exactness and compact support (these criteria are discussed in detail in Chapter 3). T h e y also proposed a m i n i m u m energy (least-square) reconstruction to calculate the required polynomial coefficients. Delanaye and Essers [23] proposed a quadratic reconstruction finite volume scheme for compressible flows on unstructured adaptive grids. T h e overall accuracy of the scheme was second-order. T h e inviscid flux was computed directly from their quadratic polynomials; however, diffusive derivatives were obtained through a linear interpolation. For monotonicity enforcement, a discontinuity detector was introduced and higher-order terms in reconstructed polynomials were dropped in the vicinity of discontinuities. Ollivier-Gooch and V a n A l t e n a [60] have analyzed a new approach for higher-order accurate finite-volume discretization for diffusive fluxes that is based on the gradients computed during solution reconstruction; fourth-order accurate solution for the advection-diffusion problem has been computed. Recently Nejat and Ollivier-Gooch [48] developed an implicit higher-order unstructured solver for Poisson's CHAPTER 1. INTRODUCTION 12 equation. T h e y clearly showed the possibility of reducing computational cost required for a given level of solution accuracy using higher-order discretization over an unstructured stencil for certain fluid problems. 1.2.3 Implicit Method and Convergence Acceleration Flow features, especially in physically complicated flows, vary greatly in size. In particular, time and length scales associated with physical phenomena like turbulence and combustion can be very disparate. T h e use of millions of cells, which is common today in practical simulations, results in global length scales spanning the computational domain which are several orders of magnitude larger than the smallest scales resolved by neighboring grid points. W i t h such disparate length scales, and such large numerically stiff problemes, efficient time integration and solution convergence are real challenges for the solution of the resulting discrete systems. Generally explicit integration methods (multi-stage R u n g e - K u t t a family schemes), even with the help of acceleration techniques, such as local time stepping and residual smoothing, still show slow convergence behavior for steady-state solution of large a n d / o r stiff C F D problems. Implicit methods are a fairly common and efficient approach for the steady-state solution of the fluid flow equations. Regardless of the space discretization technique, finding solutions to fluid flow problems requires solving a large linear system resulting from the linearization of fluid flow equations in time (temporal discretization). In the limit of very large time steps, implicit time advance schemes approach Newton's method. Newton methods, which will be discussed in Chapter 5, have been used in C F D since the late 80's and are considered an attractive approach for solution convergence of steady flows due to their property of quadratic convergence (when starting from a good initial guess). In early attempts direct methods were employed for solving the linear system arising at each Newton iteration [85, 89, 8]. While direct solvers have been developed for stiff linear systems, the size of the systems of equations arising in C F D makes applying direct methods impossible in practice. Therefore, using iterative linear solvers with proper preconditioning, which is a crucial factor for complex problems, is the only reasonable choice for solving the linear system in each Newton iteration. A t the same time, the cost of each iteration in terms of C P U time and memory usage for a pure Newton method is relatively high. Quasi-Newton methods can have satisfactory convergence behavior, lower memory usage and less cost per iteration at the expense of increasing total number of Newton iterations and losing quadratic convergence rate [62]. Quasi-Newton methods are generally categorized as Approximate Newton and Inexact Newton methods. CHAPTER 1. INTRODUCTION 13 In Approximate Newton methods the flux Jacobian on the left hand side (arising from linearization) either is computed through some simplifications or is evaluated based on lower-order discretization, while the flux integral on the right hand side is evaluated up to the desired order of accuracy. In either case the linearization is done approximately and although the Jacobian matrix has simpler structure and is better conditioned (i.e. to invert), the overall convergence rate of the non-linear problem will be degraded. easier This approach is also known as the defect correction technique, and it is useful when there are memory limitations and storing the full Jacobian is impractical, specifically in 3D. If the true Jacobian is very stiff and solving the linear system of the true linearization is challenging, Approximate Newton may work better than the original Newton method. T h i s is often true in the early stage of Newton iterations when a good starting solution is not yet available. Therefore, Approximate Newton is a very good candidate for the start-up process, especially if the solution process is started from a poor initial guess. In the second category, Inexact Newton methods [25], the complete linearization based on the flux integral on the right hand side is employed and the true Jacobian is calculated. However, the resultant linear system at each Newton iteration is solved approximately by an iterative linear solver. For highly non-linear problems such as compressible flows, the linearized system, especially at the initial iterations is not an accurate representation of the non-linear problem. A s a result, completely solving the linear system does not improve the overall convergence rate. Instead, the linear system is solved up to some tolerance criteria which normally is chosen as a fraction (typically between 1 0 _ 1 and 1 0 ) of the flux integral - 2 on the right hand side. A m o n g iterative linear solvers, K r y l o v subspace methods are the most common, and amongst these, the Generalized M i n i m a l Residual ( G M R E S ) [72] algorithm (visited in section 5.2 in detail) has been developed mainly for non-symmetric systems such as those resulting from unstructured meshes. In the matrix-free G M R E S , the matrix vector products required by the G M R E S algorithm are computed without forming the matrix explicitly. Matrix-free G M R E S [14] is a very attractive technique for dealing with complicated Jacobian matrices, because it reduces memory usage considerably and eliminates the problem of explicitly forming the Jacobian matrix. T h i s is especially helpful for higher-order unstructured mesh solvers where full (analytic) Jacobian calculation is extremely costly and difficult, if not impossible. G M R E S efficiency depends strongly on the conditioning of the linear system. T h i s is especially important for higher-order discretization, which makes the Jacobian matrix more off-diagonally dominant and quite ill-conditioned, and for the Euler equations (compressible flow) with the non-linear flux function and possible discontinuities in the solution. A p p l y i n g a good preconditioner for G M R E S under these circumstances becomes a CHAPTER 1. INTRODUCTION 14 necessity [49]. A s a result, there are a wide variety of Quasi-Newton methods, where the forming of the Jacobian, solving the linear system, choosing the preconditioner and starting up the solution process are the key factors in overall performance and robustness of the solver. For structured meshes, Pueyo and Zingg [63, 64, 65] presented an efficient matrix-free N e w t o n - G M R E S solver for steady-state aerodynamic flow computations. T h e y investigated the efficiency of different Quasi-Newton methods, Incomplete Lower-Upper (ILU) preconditioning strategies (see section 5.2.3), and reordering techniques for a variety of compressible inviscid, laminar and turbulent flows using a G M R E S iterative solver. T h e y showed that the Approximate Newton method using matrix-free G M R E S - I L U ( 2 ) with Reverse CuthillM c K e e reordering, when the lower order Jacobian is employed as the preconditioner matrix, provides the best efficiency in terms of C P U time for most cases. Later on Nichols and Zingg [54] developed a 3D multi-block Newton-Krylov solver for the Euler equations using the same approach. T h r o u g h parametric study, they showed that I L U ( l ) provides an adequate balance between good preconditioning and low computational time. For unstructured meshes, Venkatakrishnan and Mavriplis [90] developed an approximate N e w t o n - G M R E S implicit solver for computing compressible inviscid and turbulent flows around a multi-element airfoil. T h e y compared different preconditioning strategies and found out that G M R E S with I L U preconditioning had the best performance. In their case the graph of the linearized Jacobian and the unstructured mesh were the same, as the Jacobian was approximated based on the direct neighbors, and ILU(O) demonstrates satisfactory result. B a r t h and Linton [14] successfully applied both full matrix and matrix-free N e w t o n - G M R E S for computing turbulent compressible flow on unstructured meshes. T h e y also presented a technique for constructing matrix-vector products which is an exact calculation of the directional derivatives. Delanaye et al. [24] presented an I L U preconditioned matrix-free N e w t o n - G M R E S solver for Euler and Navier-Stokes Equations on unstructured adaptive grids using quadratic reconstruction. T h i s study shows ILU(O) in preconditioning of stiff problems when the Jacobian on the left hand side is higher-order can be insufficient for reaching full convergence. B y permitting one more fill-level in the I L U decomposition ( I L U ( l ) ) , full convergence was achieved. A totally matrix-free implicit N e w t o n - G M R E S method was introduced by L u o et al. [44] for 3D compressible flows using L U - S G S (LowerUpper Symmetric Gauss-Seidel) as the preconditioning strategy. T h e y completely elimi- nated the storage of the preconditioning Jacobian matrix by approximating the Jacobian with numerical fluxes. However, most probably because of the stability .consideration for their preconditioning strategy ( L U S G S ) , the full Newton iterations were not performed and CHAPTER 1. INTRODUCTION 15 the convergence rate remained nearly linear. Manzano et al. [45] presented an efficient I L U preconditioned matrix-free N e w t o n - K r y l o v ( G M R E S ) algorithm for 3D unstructured meshes. T h e y used different levels of fill (ILU(l-3)) depending on the case and the flux residual to achieve optimum performance. Nejat and Ollivier-Gooch [50] developed a L U - S G S preconditioned matrix-free N e w t o n - G M R E S algorithm for higher-order inviscid compressible flow computations. T h e i r results show that L U S G S - G M R E S works almost as efficiently for the third-order discretization as for the second-order one. T h e y also presented a supersonic airfoil case in which the third-order discretization converged faster than the second-order discretization. T h a t again raised the possibility that using higher-order discretization could in fact increase both accuracy and efficiency. 1.3 Objective T o develop an efficient and accurate algorithm for a compressible viscous flow (i.e. NavierStokes equations), the first step is to develop the base flow solver for a compressible inviscid flow (i.e. Euler equations). It is worth mentioning that the extension of an inviscid solver to the viscous solver will not affect the overall algorithm in principle, although application of anisotropic/hybrid meshes as well as computing the viscous flux function (instead of inviscid one) are required. T h e goal of this research is to develop an efficient and accurate solution algorithm for inviscid compressible fluid flow simulations using unstructured meshes. T o achieve this objective an existing higher-order unstructured reconstruction procedure [60] is combined with an efficient implicit time advance algorithm. 1.4 Contributions A fast and efficient implicit (matrix-free) algorithm is designed and successfully implemented for the 2nd, 3rd and 4th order accurate steady-state computation of inviscid compressible flows. T h e robustness and accuracy of the developed solver have been verified through several test cases. It should be noted that the 4th-order unstructured finite volume solution for inviscid compressible flows has not been available prior to this research, and the currently available 3rd-order results in the literature lack curved boundary implementation. CHAPTER 1. INTRODUCTION 16 T h e solution process has been divided into two separate phases, a start-up phase and a Newton phase. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. T h i s research provides a deep insight into preconditioning, a comprehensive convergence comparison, and a meaningful cost breakdown study for the implicit algorithm for the 2nd, 3rd and 4th-order unstructured discretization methods. i A differentiable switch is designed for the limiting-of the higher-order terms in the reconstruction polynomial in the vicinity of discontinuities. Accuracy assessment is performed for a series of independently generated irregular unstructured meshes, which based on the author's knowledge to this extent is unprecedented and proves the overall accuracy of the developed solver algorithm. Unstructured finite volume solution of the 2nd, 3rd and 4th-order methods have been compared in detail for subsonic, transonic and supersonic cases. T h e possibility of reducing computational cost required for a given level of accuracy using high-order unstructured discretization is demonstrated. 1.5 Outline In Chapter 2, the main flow solver is introduced, including governing equations, implicit time advance algorithm (linearization of the flux in time), upwind flux formulation, boundary flux treatment, and flux integration. In Chapter 3, the employed reconstruction procedure, monotonicity enforcement and higherorder limiter implementation are discussed; a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced including a smooth switch. In Chapter 4, the Jacobian matrix calculation used in preconditioning is described. A firstorder approximate analytical Jacobian is derived for the interior and boundary fluxes. Also, a similar low-order Jacobian computation using the finite difference technique is introduced in addition to further discussion on the accuracy of this finite difference Jacobian. Chapter 5 lays out the solution strategy and introduces the matrix-free' G M R E S linear solver (with special higher-order considerations) and the applied preconditioning in detail. CHAPTER 1. INTRODUCTION 17 Chapter 6 and 7 are devoted to the numerical results which include the verification cases (Part I) and simulation cases (Part II) respectively. First the accuracy and performance of the developed solver are verified for basic (not necessarily easy) test cases. T h e n fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver have been investigated for subsonic, transonic, and supersonic flows. Furthermore solutions of different orders of accuracy for all test cases have been compared in detail. Finally, in Chapter 7, the thesis is brought to closure by summarizing the research, describing the contributions, providing conclusions based on the results, and recommending some future work. Chapter 2 Flow Solver 2.1 Governing Equations Conservation of mass (continuity), momentum (Newton's second law) and energy (first Thermodynamics's Law) are the principal equations which govern the dynamics of all fluid flows. T o simulate a fluid flow field, we need to solve these equations with proper implementation of physical boundary conditions. If dissipation, viscous transport phenomena, and thermal conductivity are neglected in a fluid flow, the flow is considered to be non-viscous or inviscid. Leonhard Euler, the Swiss scientist for the first time derived (1775) inviscid fluid flow equations, known today as Euler equations. Euler equations are the limit form of the Navier-Stokes equations, where Reynolds number goes to infinity. For many practical aerodynamic applications, Euler flow is a relatively accurate representation of the flow field and includes both rotational and discontinuous (shock) phenomena in the flow providing an excellent approximation for lift, induced drag and wave drag. Also, a robust Euler solver is an essential part of any Navier-Stokes solver. T h e finite volume formulation of the unsteady 2D Euler equations for an arbitrary control volume can be written in the following form of a volume and a surface integral (2.1), where U is the solution vector in conservative variables, and F is the flux vector . 1 (2.1) 1 M o r e formally, F should be referred as a dyad. 18 CHAPTER 2. FLOW 19 SOLVER where p u =. pu , F= pv puu n + Pn pvu n + Pn E In (2.2), u n = un x + vh (E + and [p pu pv E] 1 y momentum and energy, respectively. x (2.2) y P)u n are the densities of mass, x-momentum, y- T h e energy is related to the pressure by the perfect gas equation of state, (i.e. P = pRT). T h e following equations (or definitions) correlate thermodynamic properties such as energy and enthalpy with density, temperature, pressure and velocity in the Euler equations: E = C p 7 = e = C T, v r - P (u + P 2 + v) 2 (2.3) 1 7-1 ^ R R Cy 1 e = e+ ~(u A 2 t (2.4) 7-1' + v ), E = pe , 2 t In (2.3), E is the total energy and 7 is the ratio of specific heats. constant, C p and C v P h=e+- , p (2.5) In (2.4) R is the gas are the specific heats. Finally in (2.5) e is the internal energy per unit mass (specific energy), et is the total energy per unit mass, and h is the specific enthalpy. For a compressible flow, most of the flow properties are described as a function of M a c h number, M, which is a non-dimensional form of the flow speed, and it is defined as the ratio of velocity magnitude to the local sound speed, a. = V( 7 l)h = ^RT (2.6) Normally for aerodynamic applications, density is in the order of 1, velocity is in the order of 10 , and energy is in the order of 10 . A s a result, the solution vector, U, and the flux 2 5 vector, F, (2.2) in their dimensional form have quantities with very different scales. T h i s CHAPTER 2. FLOW SOLVER 20 introduces tremendous potential for numerical errors, as well as unnecessary stiffness of the resultant linearized system of equations (next section). T o avoid that issue we need to normalize our system of equations, in such a way that all quantities become order of one. T h i s can be done by normalizing the quantities respect to some properly chosen reference values, like free stream values. While normalization does not change^the form of the original equations, it transforms them to the proper non-dimensional form. However, some of the correlations between the properties may look different. For example, in the next couple of lines we show part of this process for Euler equations: Reference quantities: Pief - Poo, T re{ = Too, P r e f = 7P00, F-ref = Roo = Const. (2.7) Non-Dimensional quantities: p„ = A" Pref T = ^ , n P = -£-, n ref R = ~ n ref = l (2.8) -^ref It is not difficult to show that the speed of sound in its. non-dimensional form is equal to the square root of normalized temperature, i.e. a n = y/Tn . In general, the mathematical characteristics of the fluid flow equations depend on the flow speed regime and they change with M a c h number. For instance the system of steady Euler equations are elliptic in space for subsonic flow and they are hyperbolic in space for supersonic flow. If the fluid flow equations are written in their unsteady form, they would be hyperbolic in time independent of the speed regime. A s a result, it is relatively easy to integrate fluid flow equations in time instead of space. T h e set of unsteady Euler equations are a first-order nonlinear P D E , which make a hyperbolic system in time, and therefore properties propagate along the characteristic lines. T o obtain a steady-state solution time marching process would be continued up to the point that all time derivatives reach some tolerance criteria. 2.2 Implicit Time-Advance Assuming the discretized physical domain does not change in time, U can be brought out from the integral in E q . (2.1), as the average solution vector of the control volume: CHAPTER 2. FLOW SOLVER 21 ^ =--L-/ dt AcViJcSi FdA (2.9) T h e left hand side of (2.9) is the first time derivative of the average solution vector in each control volume and the right hand side represents the spatial discretization for the same control volume. Right hand side of (2.9) is called flux integral or residual of the control volume, which is a nonlinear function of solution vector, and can be re-written in the form of (2.10) for control volume i : d § = -R(u>) (2.10) T o form an implicit expression for (2.10), we need to evaluate both sides of the equation at the time level "n+1". T h i s can be done by backward time differencing (backward Euler) of the left hand side and residual linearization in time for the right hand side. n+l At UJ BR 1 -R(U? ) = - R(ur) + —(u?+\ +1 ur) + o((u? -u?) ) +i 2 (2.11) or (ii w) + SUi = ~ > ? = ?+ > R(un u +l u 6U where | ^ is the Jacobian matrix resulting from residual linearization. (- ) 2 12 Equation (2.12) is a large linear system of equations which should be solved at each time step to obtain an' update for the vector of unknowns. W i t h this approach the accuracy in time is first order; however, as we are only interested in steady-state solution, time accuracy is not our concern, and advancing the solution in time continues till the residual of the non-linear problem practically converges to zero. In E q . (2.12), the Jacobian matrix, is an essential element in implicit formulation, and setting Jacobian equal to zero is equivalent to explicit time advance. Note that in the case of Euler flux, Jacobian calculation is not a trivial task and it takes a considerable amount of effort. T h e degree of difficulty of Jacobian evaluation depends on the type of. the employed flux formula and its complexity. T h e details of Jacobian evaluation will be discussed in 22 CHAPTER 2. Chapter 4. Having the Jacobian computed, solving the resultant large linear system in FLOW SOLVER each time iteration which is often very ill conditioned and off-diagonal is another part of implicit solvers. Due to size and sparsity structure (graph) of the matrix, subspace iterative methods are the most appropriate choice for solving such a system (Chapter 5). Coding an implicit formulation is much more complicated than similar explicit formulation and efficient parallelization of implicit solvers is a relatively difficult process. However, implicit methods have a critical advantage over explicit methods which is stability and fast convergence. Implicit methods do not suffer from restrictive stability condition of explicit methods. In theory they are unconditionally stable and by taking large time steps steadystate solution is reached rapidly. If we take an infinite time step E q . (2.12) becomes Newton iteration (2.13). Newton's method generally converges quadratically in the vicinity of solution. However, near singularities the convergence rate may degrade from quadratic rate to linear rate [14]. ( )6U Wf i = -R(U?), U^ +1 = Ur + SUi (2.13) In practice, taking a time step too large is not beneficial when the solution process is started from a poor initial guess. Because Euler flow is highly non-linear, the linearization of the flow equations is not an accurate representative of the original non-linear problem at an early stage of the iterations. Therefore any update based on a large time step (such as Newton update) would be inaccurate, often causing instability or stall in convergence. Normally, a start-up process is needed to advance the solution from an initial guess to a good solution state for taking large time step and eventually switching to Newton iteration. One strategy for this start-up process is defect correction, in which the flux integral on the right hand side is computed to the desired order of accuracy while on the left hand side the linearization is done based on a lower-order discretization (normally first-order): (2.14) W i t h this approach, the higher-order Jacobian computation which is very expensive and inaccurate in early stage of iterations is avoided. A s the linear system based on lower-order discretization has simpler structure in terms of the matrix bandwidth and is better conditioned, solving the resultant.system is easier and more stable especially for stiff Jacobians. T h e defect correction linear system is relatively inexpensive to form and it can be preconditioned effectively by the same matrix. In addition, early high-frequency oscillations are CHAPTER 2. FLOW SOLVER 23 •damped efficiently when a moderate time step is used which is necessary anyhow for the start up process.. Another start-up technique is using E q . (2.12) based on higher-order Jacobian but taking small time steps at early stage of solution process. W i t h this approach At is small, the linear system would be diagonally dominant or at least better conditioned, and consequently easier to solve. T h e drawback is in the case of complicated flows like transonic flow, higher-order Jacobian would be very ill-conditioned, and to be able to solve the linear system we need to choose very small At, which may not accelerate the solution convergence efficiently. In practice one may combine the above strategies with mesh sequencing, starting the solution process on coarser meshes and then transferring it to a final fine mesh. First-order solution also could be used as an initial guess either independently or in combination with other strategies. In general, it is hard to say which strategy or combination will work best for the start-up process as it depends on the problem and type of linear solver (Chapter 5). 2.3 Upwind Flux Formulation T o compute the control volume residual or flux integral, we need to evaluate the numerical flux at Gauss points located on control volume boundaries and integrate along the boundary edges. In other words, residual or flux integral computation includes three steps: 1. Reconstructing solution vector V (in primitive variables) over each control volume up to the desired accuracy (next chapter). 2. C o m p u t i n g the numerical flux vector at the control volume interfaces based on the reconstructed V . . 3. Integrating numerical flux along the control volume interfaces using Gauss quadrature rule. T o describe an upwind scheme, it is helpful to start from a first-order linear wave equation, Eq.(2.15), where A is the wave speed. For a positive value of A, this represents the propagation of a wave from left to the right (positive direction) along the x axis, as it is shown in Fig(2.1). (2.15) CHAPTER 2. FLOW SOLVER 24 Propagating Wave i-1 -•— KM i I X u 2 X Figure 2.1: Propagation of a linear wave in positive direction Based on the physics and the model equation, it is evident that the information in the field is propagating with the wave speed and in the wave direction. Therefore, the properties at point i are influenced by the propagated properties originating from point i—1 and properties at point i + 1 will not physically affect the properties at point i. A s a result it makes sense that in discretization of wave equation, | ^ is replaced by U i ~£ ~ x l or by any other type of backward differencing. It is well understood that using central or forward differencing would result in non-physical a n d / o r oscillatory behavior and typically instability in solution. In other words, any numerical scheme used in discretization of the fluid flow equations should respect the physical characteristics of the fluid flow, i.e. the velocity and direction of the propagated information throughout the flow field. The unsteady Euler equation, which is hyperbolic in time, behaves somewhat similarly to the wave equation, although it is much more complex, has non-linearity, and is in vector form. In Euler (inviscid compressible) flow, information travels along characteristic lines, and the slope of these lines are the eigenvalues of the Jacobian matrix, which is the derivative of the Euler flux vector with respect to the solution vector, E q (2.16 ). CHAPTER 2. FLOW SOLVER 25 n 0 9F_ _ ( 7 - l)efc"x - uun dU ( u vn ( ( 7 - l)efc - h )u hn n t n t u n - ( 7 - l)vn x - (7 - l ) u n x x (7- l ) n x y (7 - y y u „ - ( 7 - 2)vn y - (7- l ) u u x 0 y - (7 - 2)un n _ l ^ n , , - vu 7 n x n hn t - (7 - l ) u u y n l)n (2.16) 71% e — e + e , e = -(u + v ), h —.h + e 2 t k 2 k t k (2.17) a u„ These eigenvalues, E q (2.17), describes the direction and speed of disturbance (information) propagation in the flow field. Obviously, the upwinding direction is determined based on the direction of these characteristic lines. Several categories of upwind methods such as F l u x Vector Splitting ( F V S ) , F l u x Difference Splitting ( F D S ) and Advection Upstream Splitting Method ( A U S M ) have been developed during the last three decades for flux evaluation and have been successfully employed in compressible flow computation. Hirsch [35] and Laney [41] reviewed upwind schemes in detail. 2.3.1 The Godunov Approach S . K . Godunov in 1959 introduced an elegant idea in numerical flux calculation [35]. H e suggested the use of the locally exact Euler solution for the boundary interfaces of domain cells. T h e solution in each cell is considered to be piecewise constant, and the cell interface separates two different solution states, i.e. UL and UR (Fig ( 2.2)). K n o w i n g the initial solution, evolution of the flow to the next time step can be exactly calculated through the wave interactions originating at the boundaries between two adjacent cells. T h i s leads to solving the shock tube or R i e m a n n problem. T h e Riemann problem has an exact solution composed of a shock wave, a contact discontinuity and an expansion fan separating regions with constant solution states. Propagating the aforementioned waves over a time interval A t changes those regions a n d their states from one solution at t = t\ to t = t\ + At. A l l waves propagate consistently with the upwind directions, making the resultant solution depend only on the physical zone of dependence. A s shock tube problem requires solving a non-linear algebraic equation, which is quite time consuming, implementing an exact CHAPTER 2. FLOW SOLVER 26 t=tj Discontinuity Interface u P R L> R P t=ti+At Contact'Surface Expansion Waves 1:L LU — ' Shock Wave LH — 4:R I Figure 2.2: Shock-Tube problem Riemann solver for upwind flux evaluation would not be efficient . Therefore, different 2 approximate Riemann solvers have been developed to reduce the computation cost through some averaging procedure. Roe's approximate Riemann solver, introduced in the early 1980's [69] is one of the most successful and commonly employed approximate Riemann solver in modern C F D is used in this research. 2.3.2 R o e ' s F l u x Difference S p l i t t i n g Scheme Roe's approximate R i e m a n n solver [70] is based on a characteristic decomposition of the flux differences and is a clever extension of the linear wave decomposition to the non-linear shock tube problem keeping the conservation properties of the scheme. T h e detail of Roe's scheme is quite complicated [41] and here for brevity just the flux formulation is presented. T h e Roe's flux formula at the interface of a control volume is equal to the average fluxes of left and right states minus a differencing term which splits the difference of the fluxes on both sides of the control volume: F(U , U ) = \ \F{U ) + F(U )} - \ \A\ (U - U ) L 2 R L R R L (2.18) T h e r e is some evidence suggesting that an exact Riemann solver for polytropic gases can be competitive with most approximate techniques [31]. CHAPTER 2. FLOW SOLVER 27 A is the Jacobian matrix evaluated based on the Roe's average properties, and A is written in diagonalized form in practice as: A x- X A A = Diag(Xi) (2.19) where X are the right eigenvectors and A are the eigenvalues of the Jacobian matrix [71]. Roe's average properties are given by [69]: P = sJpLPR JpLUL + Jp~RU R VP~L + \/PR \/p~LVL + \/~PR R V VPL + \/PR y/p~L tL + j~PR~h h h, = VPL tR (2.20) + s/PR Although Roe's scheme is one of the most accurate available flux functions, it is quite an expensive method for flux computation and due to its matrix differencing term it requires 0 ( n ) operations per control volume in each iteration (n is number of equations). If a real 2 gas is used, then differencing term should be derived specifically based on the new equation of state, which could introduce some difficulty in derivation of differencing matrix. A t the same time applying the Roe's flux formula in implicit context, requires differentiation of differencing matrix which has proven to be very tricky and expensive even for perfect gas (Chapter 4). A n d finally some empirical cure needs to be implemented in Roe's formulation in the case that M a c h number goes towards one (sonic speed) or when M a c h number goes to zero (stagnation point). In either case, we are dealing with eigenvalues equal to zero and the method can not distinguish the correct upwind direction resulting either non-physical solution (entropy reduction) or lack of convergence due to its non-differentiable flux. A d d i n g a small value of e to eigenvalues, or rounding the slope of characteristic lines near zero, which is the taken approach in this thesis E q . (2.21), could totally resolve that issue (Fig(2.3)). if |Aj| < C c u t 0 ff A = (A + C ) Oi 2 c i 2 (2.21) CHAPTER 2. FLOW SOLVER 28 Eigenvalue correction 0.2 - \ 0.15 \ •Cut-Off=0.l|/ " \ 0,1 0.05- X Figure 2.3: Rounding the characteristic slope near zero 0.5 C \ =TT-— , C =C 2 c 2 ut o f f , C c u t off « 0A(typically) ^cut off 2.4 Boundary F l u x Treatment Implementing correct boundary conditions is a necessary condition for numerical fluid flow computation. Generally speaking, any kind of mistake in boundary condition treatment would result in either convergence problem a n d / o r inaccurate solution. T h i s is especially true if the imposed boundary condition does not match the flow physics. For an implicit solver it is also essential to formulate the boundary treatment implicitly both for the main solver and for the preconditioner (Chapter 4), or performance of the implicit solver will be degraded severely in addition to introducing a restrict C F L limitation. Normally boundary conditions are categorized as wall boundary and inlet/outlet boundary conditions, which are discussed in the following subsections. 2.4.1 Wall Boundary Condition In steady-state inviscid flow, there is a slip velocity condition: velocity normal to the wall surface is zero (i.e. velocity at the boundary is parallel to the surface). A s a result the CHAPTER 2. FLOW SOLVER 29 convective flux normal to the surface is zero (flow does not cross the wall) and the pressure flux in the momentum equations is the only remaining flux. B o t h the continuity and energy fluxes are zero at steady-state. T h e easiest way to implement wall boundary flux in Euler flow is to set U n — 0 in the flux formula. There are other alternative wall treatments like reflective boundary conditions using ghost cells, or reconstruction of flow properties (Chapter 3) with a constraint of U n = 0. A l l those treatments more or less produce the same result as in all cases the convective flux is forced to be zero. In the current research, normal velocity is set to be zero using a proper constraint i n reconstruction, and pressure is reconstructed up to the desired order of accuracy at the wall. 2.4.2 Inlet/Outlet Boundary Conditions A s mentioned in section 2.3, the numerical modeling should be consistent with the physical characteristics of the flow field, and with the information propagating along characteristic lines (according to eigenvalues of the Jacobian matrix). T h e inflow and outflow boundary condition implementation are completely different for subsonic and supersonic flows, therefore the first thing is to determine the type of boundary condition. T h a t can be easily done by computing the flow quantities (primitive variables) at the boundary using reconstructed values and working out the M a c h number and flow direction with respect to the boundary surface. K n o w i n g the type of the boundary, a proper boundary treatment needs to be carried out. For a subsonic inlet three eigenvalues are positive, meaning that three characteristic lines are transferring inlet information from inflow toward the solution domain. T h e remaining eigenvalue is negative at subsonic inlet i.e., one set of information travels from the solution domain towards the inlet. In this situation three parameters, in our case, total pressure, total temperature, and angle of attack are set at the subsonic inlet and static pressure is taken from the interior. T h e following formulas show the detail of computing the velocity components and density at subsonic inlet. T, = l + l ^ A 4 , P« = i ( l + l ^ M i ) * , 1^=1.0 P„ = H (2.22) (2.23) CHAPTER 2. FLOW SOLVER 30 T B P^PRB = T (^^j 1 (2.24) t : Reconstructed Pressure at the B o u n d a r y V -=V4T(I- 11 Ub = V cos(a), (225) Vb = V sin(a) m m • (2.26) a : Angle of Attack Pb = l^ (2-27) A t a supersonic inlet all eigenvalues are positive which means all boundary fluxes are computed based on the inlet properties ( a o o = 1)- Pb = Pin — Poo , Pb = Pin = Poo u = u b in = M cos(a), Vb = v oa in = Moosm(a) (2.28) (2.29) A t a subsonic outlet, three eigenvalues are negative and one is positive (normal is toward the solution domain) and therefore three pieces of information are taken from the solution domain (through reconstruction) and only the static pressure is fixed at the outlet as the back pressure. Pout = PRB , U PBP O U T = URB — Poo , Vout = VRB , for external flow Pout — PBP (2.30) CHAPTER 2. FLOW SOLVER 31 2nd-Order / 1 Gauss point per face 3rd and 4th-0rder / 2 Gauss points per face Figure 2.4: Schematic illustration of Gauss quadrature points In the case of supersonic outflow, since all eigenvalues are negative, all the information is coming from the solution domain towards the boundary, and no condition is set at the supersonic outlet. , Pout = PRB , U O U T = URB , Vout = VRB , Pout — PRB r (2-31) In all cases that implementing boundary condition requires properties to be taken from the solution domain, the higher-order reconstruction of those properties have been used to insure the correctness of accuracy of boundary condition treatment. 2.5 Flux Integration T o compute the residual for each control volume, numerical fluxes should be integrated over control volume edges. T h e flux residual is the summation of these flux integrals. T h e accuracy of flux integration should be equal or higher than the accuracy of flow reconstruction in order to evaluate the residuals with higher-order accuracy. quadrature integration. T h i s is achieved by Gauss For the 2nd-order or linear reconstruction case one quadrature / CHAPTER 2. FLOW SOLVER point per face is used. 32 For the 3rd-order (quadratic reconstruction) and 4th-order (cubic reconstruction) cases, we use two quadrature points per face with the proper weightings. In the case of a curved boundary, the quadrature points are located on the curved arc segment instead of the straight edge for higher-order flux integration. T h i s is due to the fact that mistaking a curved segment with a straight edge will introduce a 2nd-order error (i.e. 0(5S ), S is the arc length) in both computing and integrating the fluxes'at curved 2 . boundaries. F i g (2.4) shows the quadrature points for the flux integration schematically. More information regarding the locations and weights of the Gauss quadrature integration points is provided in Ref. [80]. Chapter 3 Reconstruction Procedure and Monotonicity Enforcement 3.1 Introduction T o compute a higher-order solution, numerical fluxes should be evaluated with higher-order accuracy containing two separate steps: first, calculating all flow variables anywhere inside a control volume up to the desired order of accuracy; second, higher-order integration of the evaluated fluxes over boundary edges using the proper number of Gauss points. The latter is relatively easy, but higher-order approximation of flow variables inside a control volume is neither unique nor trivial. T h i s task could be done by implementing higher-order upwind differencing, a n d / o r higher-order representation of the solution inside a control volume (i.e. solution reconstruction) using polynomials with degree one or higher [82]. In structured discretization, this leads to the development of M U S C L (Monotone Upstreamcentered Schemes for Conservation Laws) [35] . For structured meshes, higher-order re- construction of a solution is much easier due to mesh regularity. However, in the case of unstructured meshes a more complicated procedure needs to be adopted to represent the solution with higher-order accuracy within a control volume. Barth [13] laid out an straight forward procedure for gradient estimation for unstructured tessellation using Green-Gauss gradient technique, Because of its simplicity and robustness, the linear Green-Gauss reconstruction has been employed extensively in finite volume solvers since then. B a r t h also developed a m i n i m u m energy (i.e. Least-Squares) reconstruction procedure capable to computing the flow variables up to ( K - f l ) t h - o r d e r of accuracy using 33 CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 34 a Kth-order polynomial known as K-exact reconstruction [10, 12, 11]. T h e least-square reconstruction is relatively more difficult to implement but in general the linear least-square reconstruction is more accurate especially in the case of distorted meshes a n d / o r in the presence of limiter [4]. T h a t is expected since the least-square technique is not as sensitive as the Green-Gauss method to noise in the solution. Delanaye [22] has compared the accuracy of the linear and quadratic reconstruction techniques for a smooth flow (Ringleb's flow) on unstructured meshes. T h e linear least-squares reconstruction performed better than the linear Green-Gauss reconstruction both in terms of accuracy and preserving the total pressure (i.e. less entropy production). T h e quadratic Green-Gauss and least-square reconstructions behaved very similarly, although this time the total pressure loss was a bit higher for least-square method. In this research a K-exact least-squares reconstruction [60, 56] is employed which is described in the next section. In section 3.3 the accuracy assessment methodology is studied. Finally, section 3.4 discusses the monotonicity enforcement and introduces the taken limiting approach to address that issue. 3.2 K-Exact Least-Square Reconstruction T h e goal here is to reconstruct a solution polynomial based on the control volume data such that the truncation error of the solution in each control volume would remain in the order of (Ax) K where (Ax) is the local mesh length scale. Obviously, certain physical and mathematical criteria need to be met to reconstruct such a polynomial for each control volume. These criteria are: 1- Conservation of the Mean, 2- K-Exact Reconstruction, 3- Compact Support which are introduced briefly in next three subsections. In the last two subsections the implementation of boundary condition at Gauss points using boundary constraints, and solving the least-square system will be discussed. 3.2.1 Conservation of the M e a n T h e finite volume approach provides us piecewise constant data in each control volume which is equal to the integral control volume average. T h i s integral average is updated from one iteration to the next and is the most meaningful measure of the physical quantities in each control volume. A Kth-order piecewise polynomial U R such a way that the integral of U \x,y) K R (x,y) can be reconstructed in over the control volume is equal to the control CHAPTER 3: RECONSTRUCTION AND MONOTONICITY volume average E q . (3.1). 35 T h i s is a very important property called Conservation of the Mean. f U (x,y) = U v Jcv (3.1) K) R U £\x,y) C = { (x - x ,y - y) c (3.2) c m+n</c P( , )(x-x ,y-y ) m n c = (x- x ) (y - y ) m c (3.3) n c c (x ,y ) are the coordinates of the reference point for the control volume, which in our case c c (cell center formulation) is the cell centroid, F i g (3.1). In general, oc^ ^ are the derivatives of a continuous function represented by the recon- mn (U \x,y)) K structed solution polynomial with respect to the independent geometric vari- R ables (i.e. x and y) : U \x,y) K R = U(x ,y ) + c c dU Ax+ — &y + dx c v c dU dU + dxdy AxAy + dy d du 2 dx 2 du Ax c 3 dx 2 c 2 Ay 2 + 2 Ax 3 3 2 du c dU dxdy Ax Ay 2 3 2 dx dy 2 3 AxAy 2 2 2 c dU Ay ~dy~ c 3 + 3 3 (3.4) where, Ax = x-x , c Ay = y-y (3.5) c Imposing the conservation of the mean condition, or mean constraint is accomplished by integrating of U \x,y) K R over the control volume i and making that equal to the control volume average i.e. E q . (3.1). For simplicity all the derivative terms in the following equations are replaced by simpler expressions like: U xy = . Figure 3.1: A typical cell center control volume and its reconstruction stencil, including three layers of neighbors CHAPTER 3. RECONSTRUCTION AND MONOTONICITY ; 37 Equation (3.6) can be rearranged by introducing control volume moments definitions: Ui = U + -]-(U \ M c x c + x C X U \M3 XXX C y c y + U y\ M y + l Uyy\ M 2 ) + U \ XX U \ M ).+ C X + - U y\ M 1 X XX C y C x + - U yy\ M 2 y X C xy + ~ Uyyy^MyZ ) + (3.6) where, M x m y n \= f (x- x ) (y - y ) dA m n c (3.7) c JCVi These moments can be computed up to the desired order of accuracy by Gauss's theorem. Notice that replacing m and n in E q . (3.7) by zero computes the area of the control volume. T h e introduced mean constraint is the first row (equation) in the least-square system. 3.2.2 K-Exact Reconstruction Consider a problem which is smooth and has an exact solution in the form of C/^(a;,j/). If (u \x,y)\ K R is a Kth-order polynomial reconstructing an approximate solution for the problem over control volume i, then the error in solution reconstruction for the control volume i would be in the order of 0(Axi) k+1 U (x,y)) K) R E q . (3.8). \k+l (3.8) =U (x,y)+0(L\x, E In other words (U ^ (x, y)^ reconstructs the solution exactly if the exact solution is a K t h K r order polynomial, i.e. UE{X, y) = Pg (x, y). T h i s property is called K-Exact Reconstruction. T h i s property also can be expressed in the integral form: Error\ . cv = U (x,y)dA- U (x,y)dA K) E CVi R JCVi 0{Ax ) k+l t (3.9) CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 3.2.3 38 Compact Support Notice that the reconstructed polynomial, control volume to another. (U (x, y)^ K) R , is piecewise and changes from one Therefore integrating the reconstructed function of a control volume over its neighbors does not satisfy the average value of the neighboring control volumes exactly. For the Kth-order solution reconstruction, we are reconstructing a ( K - l ) t h degree polynomial which satisfies the mean property for the control volume i and minimizes the error in computing control volume averages of the neighboring control volumes. The neighboring control volumes used in reconstruction constitute the reconstruction stencil, F i g (3.1). T h e reconstruction stencil should include a proper number of neighboring control volumes for computing the mentioned derivatives. T h e number of unknowns for the linear, quadratic, and cubic reconstructions are 3, 6 and 10 leading to 2nd, 3rd and 4th-order accuracy. Generally speaking, for a K - t h order accurate reconstruction, at least K[K neighboring control volumes must be included in the reconstruction stencil. + l)/2 However, for practical reasons one may use a few more control volumes in the reconstruction stencil. T h i s leads to a larger least-squares system but provides additional information for reconstruction usually resulting in more robust solution reconstruction in the presence of non-smooth and/or vigorously oscillatory data. In this research, the reconstruction stencils for 2nd, 3rd, and 4th-order methods include* 4, 9 and 16 control volumes respectively. the polynomial (u \x, K R y)^ for control volume i Not surprisingly, should be reconstructed based on control volumes that are topologically and physically close to the control volume i. T h i s property is called Compact Support. Consequently, the 4th-order reconstruction stencil covers three layers of the neighboring control volumes. T h i s could adversely affect the of the reconstruction for very coarse meshes. compactness Since for very coarse meshes the distance spanned by the reconstruction stencil is uncomfortably close to the characteristic size of the domain, full convergence might not be achieved as reconstruction stencils of the boundary control volumes overlap considerably with the reconstruction stencils of interior control volumes. Because the size of the stencil and the associated cost of the reconstruction increase quadratically with the order of accuracy, there is a practical limit to the benefits of the classic finite-volume higher-order reconstruction. In reconstructing a non-smooth data in the vicinity of a discontinuity such as shock wave even when we use a compact stencil geometrically, the stencil is not compact physically as the data employed in reconstruction are irrelevant for control volumes across the shock wave. There are two different approaches to address this issue. One is choosing the best stencil containing the smoothest possible data a n d / o r down playing the role of non-smooth data in reconstruction by assigning small weights to them leading to E N O (Essentially N o n Oscillatory) and W E N O (Weighted Essentially N o n Oscillatory) schemes [33, 43, 3, 57]. CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 39 Figure 3.2: Imposing boundary constraint at the Gauss boundary points T h e other passive approach is to employ a limiter and enforce monotonicity by correcting (reducing) the computed gradients, suppressing over and under-shoots resulting from reconstruction over a non-physically compact stencil [13, 87, 24]. T h e latter approach has been taken in this research. 3.2.4 ' Boundary Constraints In addition to the Mean Constraint, it is also possible to include additional boundary constraints to the reconstructed polynomial satisfying physical boundary condition at the solution domain boundaries [80, 60]. T h e straightforward way of doing this is imposing the boundary constraint at Gauss points where fluxes are actually calculated. Therefore, the coefficients of the reconstructed polynomial are computed such that the physical boundary condition is imposed automatically. A s a result, number of boundary constraints per control volume is equal to the number of control volume Gauss points on boundary edges, Fig(3.2). 3.2.4.1 D i r i c h l e t B o u n d a r y Constraint For a Dirichlet boundary condition, the value of the flow variable is known at the boundary. Consequently, the reconstructed polynomial should give us the same boundary value at the Gauss points (3.10): CHAPTER 3. RECONSTRUCTION (xGi,yG ) 3.2.4.2 MONOTONICITY T(K) U (x ,yGi), = 1 AND DB =U U (xG ,yG ) n) Gl R 2 40 2 (3.10) (xG ,yG ) DB 2 2 Neumann Boundary Constraint In Neumann boundary condition the normal derivative of the flow variable is known at the boundary edge (3.11). Therefore, we need to take the normal derivative from the reconstruction polynomial at the boundary Gauss point and equate that to the known normal derivative provided by Neumann boundary condition (3.12 and 3.13): dU U, NB dn dU ' \x K R — G l ,y ) G l ( -("x)G! + dx [ K R 2 2 C)G dx 3.2.5 M f f W y c , ) , . , „ ^ (%JGi = dy dU dU \xG ,yg 2 (3.11) R \xa ,y 2 + G 2 -{h ) y dy G2 = U , , UMB^XG^yGi) N B { X G 2 , V G 2 (3.12) (3.13) ) Constrained Least-Square System Knowing the constraints and the neighbors involved in the polynomial reconstruction for control volume i , we can set up the least-square system in the form of Eq. (3.14): MiC MiC MC M;C BiC ,\ BiC 2 BiC2fl BiC2 4 ^,1^1,1 1»i lNi Wi,lNi Wi,\Ni ltl lt2 2 t 2: t t2 1>3 M . • hq BiC q 2t t t3 I MiC A . • • Wi.iNi^ Ua \ dU_ dx dU_ dy dU OX? c Bd Wi,iU Nl 2 m,2N ,i Wi,2N 2 ^1,2^2,3 Wi,2N ,4 • ^,3-^3,1 Wi,sN m,3N Wi, N . Wi,nN ,i Wi,nN ,2 2 n 2< 3t2 n 3)3 Wi,nN n>3 2 3 3A Wi,nN 4 nt . Wi N2 q • t )2 w N • i<3 Wi,2&N 2 dU dx 2 3>q 1Ui,nNn,q dU K m,3UN 3 \ m, UN n n (3.14) (3.15) XQi - XCJ J CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 41 In E q . (3.14) the first two rows constitute the mean and boundary constraints which have to be enforced exactly. T h e rest of rows are additional equations arising from integration of the i control volume's polynomial over its neighbors {N\, A^, J V 3 , N } n which need to be satisfied in the least-squares sense, minimizing the L2 norm of the system residual. O f course, for interior control volumes there is no boundary constraint and the second row does not appear in the least-square system. the least-square system. A weighting strategy is also considered in forming Weights can be a function of geometric parameters a n d / o r the solution data [10, 57]. T h e main purpose of these weights is to control the influence of the neighboring solution data on reconstruction. In this research only geometric weighting in the form of E q . (3.15) with t — 1 is used, weakening the effect of the solution data far from the reconstructed control volume by the geometric distance ratio. T o satisfy the constraint exactly, Gaussian elimination with column pivoting is performed on (3.14) for as many rows as the number of constraints in the system. T h e remaining leastsquare system is reduced to a upper-triangular system using Householder transformation [92] with the proper scaling, and finally the system is solved through backward substitution. 3.3 Accuracy Assessment for a Smooth Function Error norms are the most common measure for accuracy assessment of a numerical solution. T h e y provide useful information about over all accuracy of the numerical solution as well as local source of the (maximum) error. For a 2D finite volume formulation, the general form of the error norm can be computed by E q . (3.16), where p is the norm index, N^ is the total number of control volumes in the domain, Ai is the area of the control volume i, and Ei is the average error of the control volume i. (3.16) (3.17) L\ and L2 are the global norms and are good indicators for over all accuracy. However, LQO would be the largest magnitude of the error in the solution domain and it is a local error indicator. max Ei CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 42 Having a proper tool for error measurement, now we proceed with the accuracy assessment discussion. A s previously mentioned (3.2.2), a K-exact reconstruction reconstructs solution to the (K-(-l)th order of accuracy. the T o verify the accuracy of our reconstruction technique, we need to examine it for a smooth function. Here a simple but very common procedure is introduced to assess the accuracy of the numerical solution over discretized domains. Assume a 1-D finite volume problem over a uniformly discretized domain of M l with the length scale of Ax. T h e average numerical solution for an arbitrary control volume i of M l can be written in the following form: °i]m = 4z[ U (x)dx + 0(Ax) where, Ui is the control volume average, Ug(x) (3.18) K E is the exact solution of the problem and K is the discretization order (i.e. the polynomial of (K — 1) degree is used in the solution reconstruction). T h e n the average solution error for each control volume can be expressed as: Therefore a similar relation is held in the case of taking any norm of the error ( | | J E | | P=1 2 over the same domain: ||£| M 1 ||=0(A*)* (3.19) Now, if M 2 represents the discretization of the same problem over the similar domain where the discretization length scale is divided by half ((^p)), the error norm on M 2 is: | £ ? | M 2 | | = 0 ( ^ (3.20) Y and the ratio of these two error norms would be only a function of the discretization order: E \ m \\E\ \ \ _ 0 ( A x ) « M 2 \\- = 2 K ( 3 2 1 ) Oft)* To examine the order of discretization accuracy, employing (3.21), some error norms must be plotted in logarithmic scale versus of mesh size, where the mesh length scale is uniformly CHAPTER 3. RECONSTRUCTION AND MONOTONICITY reduced each time by half. T h e asymptotic slope of the Error-Mesh 43 plot will show the numerical order of accuracy. T h i s analogy can be extended to multiple dimensions as long as the mesh length scale is decreased each time uniformly in each dimension all through the domain. For unstructured meshes and in general case, it is nearly impossible to decrease the mesh length scale uniformly. E v e n if we try to refine the whole mesh globally in a self similar way (all angles and ratios remain the same in the refinement process) it would not be feasible for all triangles (except for structured triangulation of regular geometries). Therefore in general, the best that we can do is using a series of semi-uniform unstructured meshes where the density of mesh each time is increased by factor of four. W i t h this approach we hope that the over all mesh length scale is reduced by factor of two. Based on the E q . (3.21) the ratios of 4, 8, and 16 for error norms are expected for 2nd, 3rd, and 4th-orders of accuracy should we globally reduce the mesh length scale by half. But of course in practice, we are neither dealing with uniform unstructured meshes nor can we refine them uniformly everywhere, and as a consequence getting the nominal order of accuracy for all norms is not always exactly possible. 3.4 Monotonicity Enforcement First-order upwind methods often exhibit poor resolution, and do not resolve physical phenomena in the fluid flow accurately since they introduce considerable amount of numerical diffusion in the solution process. T h i s is due to the nature of these techniques in which the fluxes are computed based on cell average data. Despite the mentioned weakness, they have two great advantages, i.e. first they are monotone and produce non-oscillatory results for whole domain even in the presence of discontinuities and second they are stable and converge very well. High resolution upwind methods (higher than first-order), cure the accuracy problem but they require precautionary measures such as limiting to overcome the oscillatory behavior and stability issue especially for non-smooth flows. 3.4.1 Flux Limiting In central difference schemes, additional terms in the form of artificial viscosity are added to the flux computation such that the accuracy of the solution in the smooth region is not affected, but these terms provide enough diffusivity in the shock region to make it oscillation CHAPTER 3. RECONSTRUCTION AND 44 MONOTONICITY free and stabilize the convergence. One of the most successful effort on this topic was made by Antony Jameson [38] who introduced a second-order differencing term to address the stability problem and a fourth-order differencing term to enhance the convergence of the numerical solution. W i t h this approach the solution in the vicinity of a discontinuity is first order (locally) while the global accuracy is preserved for smooth regions. In principle, high resolution upwind techniques use a similar approach, adopting a low-order resolution near the shock region to take advantage of the monotonicity of the first-order upwind scheme. T h i s idea can be better explained by E q . (3.22), where $ ( [ / flux limiter function and j smooth regions, <f>((7 : j) : j) is the is the control volume stencil from which U is calculated. In should be very close to one and therefore the flux is computed with higher-order resolution. In contrast, in discontinuous regions $(U : j) must be nearly zero to eliminate the higher-order flux, recovering the monotonic low-order scheme. FH(U : j) = F (U L : j) + ${U : j) [F (U : j) - F (U H L : j)] (3.22) In other words, the goal is to provide just enough diffusion to prevent overshoots. One family of schemes that apply this procedure is the Total Variation Diminishing ( T V D ) schemes [35, 42]. In a T V D scheme, fluxes (F(U)) are limited such that the T V , total variation in the solution over the domain decreases with time in the solution process compared to the total variation of the initial data: TV = dU I TV(f7) = ^ | , 7 l + (3.23) dx dx Ju 1 -C/ | (3.24) < TV(U°) (3.25) t i TV(U ) N+L < TV{U ) N A l l T V D schemes result h i monotonic (non-oscillatory) solutions and stable convergence, 3.4.2 Slope L i m i t i n g In a structured discretization where there is a clear and straightforward connection between the mesh data structure and discretization scheme, employing flux limiters to achieve non-oscillatory solution is quite common and successful. However, in general, this clear connection between mesh and discretization method does not exist anymore for unstructured CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 45 meshes and implementing flux limiters are not that straightforward. A t the same time, if the flux function is computed based on Godunov's approach (even for structured discretization), again determining a proper flux limiter is not an easy job. T h e r e is another approach for imposing the T V D condition in the solution process, known as slope or gradient limiting, which is more appropriate for generic mesh discretization schemes and Godunov's approach. In this approach, the computed solution gradient from reconstruction is corrected (reduced) i to meet the monotonicity condition defined by E q . (3.28): U .x = ma,x(Ui,U ) FNj (3.26) = mm(Ui,U ) (3.27) ma FNj E m i n < Ui{x ,y ) G G < C/max where Ui is the control volume average and U a r e first neighbors. (3.28). the control volume averages of the T h i s limiting procedure can be better explained by visualizing a linear reconstruction for a simple 1-D case. F i g (3.3-a) illustrates a typical unlimited linear re- construction solution for 1-D control volume averages. Using a slope limiter will correct the overshoots and undershoots in the solution reconstruction by reducing the slope of the reconstruction as shown by F i g (3.3-b). In theory, E q . (3.28) should be valid for all points inside the control volume i, but in practice this condition will be checked and satisfied (if necessary) only for Gauss points where the actual fluxes are computed. Assuming a linear reconstruction (2nd-order method), for the control volume i (Fig(3.4)), the unlimited reconstructed value at the Gauss point G is written in the form of E q . (3.29): U = U(x ,y ) G U(x , c y) c c c + VU\ r c (3.29) G is the value of the flow variable at the cell center, which in this case (2nd-order) is also the average value of the control volume. T h e V £ / | computed from the reconstruction c procedure needs to be adjusted according to the monotonicity condition (3.28) by a scalar value 4> called as the slope limiter: U G = U{x , c y ) + & VU\ c C f G (3.30) T h e goal is to determine the largest acceptable value for <fo in such a way that the computed CHAPTER 3. RECONSTRUCTION AND MONOTONICITY (a) A n unlimited linear, reconstruction \ i-2 i-1 i i+l i+2 (b) A limited linear reconstruction Figure 3.3: Typical unlimited/limited linear reconstruction 46 CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 47 Figure 3.4: Using first neighbors for monotonicity enforcement reconstructed value for U at all Gauss points are bounded by the m a x i m u m and minimum of the neighboring control volume averages (including the control volume i average): mm 4>G = 3 min I 1, 1, if (U Gj if <j>i = min(</)G , 1 (U . G -Ui)<0 -Ui) = (3.31) 0 <f>G , •••) 2 (3-32) T h i s limiting procedure was introduced by B a r t h [13] and such an implementation guarantees the monotonicity principle for all Gauss points. T h e B a r t h limiter produces a strictly monotonic solution and removes all oscillations; however it has some convergence and accuracy issues. First, Barth's limiter formulation is clearly not differentiable and this severely hampers the convergence process as limiter values oscillate across the shock. Second, in smooth regions (including the stagnation region), we expect to have some local extrema which will cause the limiter to fire. T h i s reduces the accuracy of the solution in those circumstances. In general, an ideal limiter is differentiable, and acts firmly in the shock region not allowing oscillatory behavior around discontinuities. Such a limiter also should be inactive in smooth regions despite existence of non-monotone solutions resulting from smooth local extrema. CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 48 Venkatakrishnan limiter [86, 87, 4], which is semi-differentiable, nicely addresses most of the aforementioned issues, and it has been employed in this research: Al — Ui , A i j — U ax ) m a x m ) m (A 1 + 2 i m a x (A? 0G )A 2 £ — Ui , A2 = mln 2 2A2A , + 1 n 2 .max A • 2 +e 2 A 2 2 A sign(A ) A 2 e 2 2 (3.33) >0 if A < 0 (3.34) 2 2 2 ~ Ui 2 A i ,+ e " i m i n + ¥ £2 )+A l ,+m2iAn A A UQ if A 2A| -f A i l,max 1 —U n r a i n = s i g n ( A ) ( | A | + w) 2 2 = (tfAx) (3.35) 2 (3.36) 3 To avoid division by zero or a very small value in E q . (3.34) as a practical measure A replaced by E q . (3.35) where u> is chosen to be 1 0 ~ 2 is for 64-bit arithmetic computations 12 (according to Ref. [87]). Ax in E q . (3.36) is the mesh length scale and.it can be picked as an average mesh length scale or a local mesh length scale. A local mesh length scale has been used for this research and it is defined as the diameter of the largest circle that may be inscribed into a local control volume. T h i s length scale is proportional to the square root of the control volume area, and as a simplification, we assume the control volumes are equilateral triangles: - A In smooth regions, A i > m a x , Ai i m regions or at extrema. Since e A 2 and A n i f < - -> 3 all are on the order of (Ax) 2 2 3 37 either in constant is made proportional to ( A x ) , e dominates A i 2 and j , / 2 2 m a x , Ai j„, i m terms and we recover the scheme without limiting [86, 87]. For instance, near an extremum, (say a max) A i ] m a x the limiter will not be fired. = 0, A 2 < 0 therefore 4> —> = 1- For flat solutions again However in the general smooth cases the limiter value will remain close to 1.0 but not exactly 1.0. In the shock region A i and i m a x , Ai | i n j , n and A 2 are 0(U) the limiter fires as it was originally supposed. « 1 and they dominate e 2 term T h e key point here is the value of constant K which is the coefficient of Ax in E q . (3.36). K determines the extent of the CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 49 monotonicity enforcement by setting a threshold below which oscillations in the solution are not limited. A very large value of K essentially means no limiting at all, and it could make the solution process unstable. Normally increasing K up to some value enhances convergence characteristics as long as divergence does not occur. In contrast, a small value for that constant slows or stops the convergence although it produces more monotonic solution. T y p i c a l values of K used in literature are 0.3, 1, and 10 and in general the optimal value both for convergence and accuracy purposes is determined based on experience. Having computed the limiter value, we can directly multiply all the derivatives in the higherorder reconstructed polynomial (linear and non-linear) by the limiter value as was described previously for linear reconstruction: U (x ,yG) = K) G G U(x ,y ) c dU + > c au 2 dx Ax 2 dx dU Ax 3 dx du 3 c — Ay V d c 2 dxdy 2 3 Ax+ c dU 3 + dx dy AxAy + du 2 + dy 2 Ax Ay 2 2 Ay 2 c dU AxAy 3 + dxdy 2 2 dU Ay 3 + dy 3 c 3 c (3.38) However, our experience as well as other research [22, 24] shows implementing the limiter to all gradients yields a more diffusive solution. T h e approach of [24, 23] is followed, where the limiter is applied only to the linear part of the reconstruction, and the non-linear part of the reconstruction is dropped. T h i s also helps the limiting process around the shock as higher derivatives tend to show oscillatory behavior in the vicinity of discontinuities. The limited reconstruction polynomial is expressed as: U \x ,yG) K G G = Ui+ (I - a)<j>i+ a {Linear Part} + a {Higher-Order Part} (3.39) where a is a limiter for higher-order terms. In smooth regions, the full higher-order reconstruction is applied by choosing o = 1. Near discontinuities we switch from the higher-order to the limited linear polynomial to prevent possible oscillatory behavior of second and third derivatives. T h i s is done by a discontinuity detector employing the value of the limiter; if the limiter fires aggressively we assign zero to a. T h i s idea is also shown for a simplified 1-D case in Fig(3.5) . Using a switch to set a to either zero or one stalls the convergence. T o overcome this CHAPTER 3. RECONSTRUCTION AND MONOTONICITY \ \\ \ i-2 i-1 i i+1 i+2 (a) A r unlimited quadratic reconstruction \ \ \ \ i-2 i-1 i i+l i+2 (b) A limited quadratic reconstruction Figure 3.5: T y p i c a l unlimited/limited quadratic reconstruction 50 CHAPTER 3. RECONSTRUCTION AND MONOTONICITY 51 0.8 0.6 0.4 h 0.2 h 0.2 0.4 4>. 0.6 0.8 Figure 3.6: Denning a as a function of < problem, a is defined as a differentiable function of 0, such that a is nearly one for the regions that cfr > (fro and it quickly goes toward zero for other values of cfr; making a differentiable greatly enhances convergence. As a differentiable switch, a smooth semi-step function is employed: a= 1 - tanh(5(^ - <fr)) 0 (3.40) S in (3.40) determines the sharpness of the step function, and adjusts how fast the switching transition between zero and one is; (fro defines the limiter value that activates the switch. It appears that (fro = 0.8 and S = 20 provide a reasonable switch function whose good behavior is relatively case independent in this research, Fig(3.6). Chapter 4 F l u x Jacobian 4.1 What is the Jacobian ? A n y implicit formulation for solving a P D E includes some sort of Jacobian calculation. We seek a solution vector X = (Xi,X , X%,X ) 2 for a coupled non-linear system of algebraic n equations defined by: F(X) = 0 (4.1) where, F = (F ,F ,F ,...,F ) 1 2 3 (4.2) n T h e solution can be found v i a an iterative process (Newton's method): dF_ SX dX i+1 = -F{X ), X l i+1 =X l + 5X i+1 (4.3) Since F is a vector operator, the derivatives of F respect to the vector X are expressed as a matrix called Jacobian matrix (f£), such that 5 2 the(ff) entry is the derivative of the CHAPTER 4. FLUX JACOBIAN 53 j-th component of F respect to the k-th component of X: J2,,71 OF dX (4.4) dX k J7n.n Generally speaking, in an implicit formulation such as (4.3) the Jacobian matrix constitutes the coefficient matrix and represents the linearization of a non-linear problem which must be solved through an iterative process. In C F D the non-linear function F, determined by the physics of the problem, is the residual of fluxes for the discretized domain. In other words, we look for a solution vector U (conserved flow variables) which satisfies the flux balance for all control volumes in a meshed domain: Res(U) = 0 (4.5) Equation (4.3) is similar to the implicit time advance formula, E q . (2.12), except for the diagonal term in (2.12) normalized by a time step (mainly added for stabilization purpose). Equation (2.12) can be regarded as Newton's formula with pseudo time stepping. In either case, the Jacobian matrix is needed not only for forming the linear system but also for building the preconditioner matrix, and it is the most expensive part of the implicit solver. T h e convergence rate of the implicit solver depends on the accuracy and correctness of the Jacobian matrix, and to achieve the quadratic convergence, we need to employ the true flux Jacobian on the left hand side of E q . (4.3). Since the Jacobian matrix consists of the derivatives of the discretized governing equations with respect to the flow variables used in the local discretization, the Jacobian entries in each row are zero except for those few control volumes that contribute to that flux integral (i.e. stencils of the neighbors). union of the reconstruction T h e level of difficulty in flux Jacobian computation depends on the flux function (physics), the numerical flux formulation (i.e. how the flux is being computed, such as Roe's flux formula), type of the mesh, number of dimensions, and order of accuracy. For instance, taking the derivative of the first order flux function for the 1-D linear wave equation, E q . (2.15), in finite-volume formulation is fairly easy (|^f; i F g = - A ) , since (Flux)i = A(UJ - = A and . B u t in general, for non-linear flux functions and/or formulations, especially in multi dimensions and with unstructured meshes, the flux Jacobian computation involves a large number of control volumes and much more work per control volume. Consequently, increasing the order of accuracy not only adds to the CHAPTER 4. FLUX JACOBIAN 54 complexity of flux Jacobian computation but also makes it considerably costly both in terms of computation time and memory usage. T h e Jacobian matrix can be built by either analytical or numerical (finite difference) differentiation; both approaches have been employed quite successfully [84]. Analytical differentiation can be performed either manually, symbolically [94] or automatically [78]. The analytical approach computes the exact Jacobian (ignoring round off error in numerical evaluation) but it is relatively difficult to apply this approach for complicated flux functions a n d / o r higher-order discretizations by manual or even symbolic differentiation. Automatic differentiation ( A D ) , a very powerful tool for computing complex Jacobians, augments the computer program to compute the derivatives by applying the chain rule repeatedly to the elementary arithmetic operations performed by the computer program. T h e derivatives are computed relatively cheaply by this method and are accurate to machine accuracy without any truncation error. However, A D requires very careful programming and a considerable amount of memory [20], and has not been considered in this research. Numerical differ- entiation is fairly straightforward even for higher-order and complicated flux functions. It also works reasonably well where the flux function is not differentiable, but like any other numerical approach, the numerical differentiation technique suffers from truncation error as well as round off error. For some of the Newton-Krylov solvers like G M R E S , only products of the Jacobian matrix by some vectors are needed, and explicit computation of the Jacobian matrix can be avoided. These products are computed using directional finite differencing. Therefore the higher- order Jacobian of complex C F D problems can be employed on the left hand side of the implicit formula to match the true order of accuracy of the residual on the right hand side, E q . (2.12), without excessive computational effort and memory usage. However, in most cases, an approximate Jacobian matrix is required explicitly for preconditioning of the linear system solver. It is possible to use a simplified or first-order Jacobian (based on direct neighbors and mostly first-order) for preconditioning of the linear system resulting from higher-order discretization [51, 50]. In this chapter, first, the lower-order Euler Jacobian computation procedure is presented for an unstructured mesh stencil based on Roe's flux formula. T h i s flux Jacobian is used for preconditioning of the linear system solver. T h e analytical and numerical approaches for flux differentiation are discussed step by step. CHAPTER 4. FLUX JACOBIAN 55 Figure 4.1: Schematic of Direct Neighbors 4.2 Flux Jacobian Formulation In the case of the 2D Euler equations discretized over a cell-centered unstructured mesh, each control volume has 3 direct (first) neighbors, and the first-order flux integral of the control volume i o h l y depends on these three neighbors and the control volume itself (Fig(4.1)): Ri= J2 ( n )™ F ds = HU,U )n l Nl l l + F(U ,U )n l i N2 2 2 + F(U ,U i N (4.6) m=l,2,3 where, n m and l m are the outward unit normal and the length of the face m of the control volume i respectively. T o compute the flux Jacobian of the control volume i, we need to take the derivatives of the flux integral or residual function, E q . (4.6), with respect to all control volume averages involved in the residual function evaluation as shown in the following formulas: (4.7) j(i A M _ / i dR dF(U U )^ u N2 (4.8) CHAPTER 4. JACOBIAN FLUX 56 OR, dU J{i,N ) 3 dF(U ,U -n < 0U { 3 N3 J(i,i) = OR, dF(U U ) u au Nl ^ N3 dF(Uj,UN )~ , ^ dF(Uj,U )^ 2 ' du (4.9) 3 Ns . N3 (4.10) ' • du du { Since both the flux F and solution U are 4-component vectors, each entry in the Jacobian matrix J is a 4 x 4 matrix and the. total-size of the block matrix is n x n-where n is the total number of control volumes as shown in E q . (4.11). However, most of these blocks are just zeros as in general there are no more than three neighboring control volumes in the first-order cell-center formulation and the Jacobian matrix is very sparse, with at most four non-zero blocks per row. ( X X X X X X X X X X X X ) I X X X X \ X X X X X X X X X X X X X X X X VX / / X X X X X X X X X X X X X X X X X X \ X X X X X X X X / X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X \x X X X X VX X X X X X X X \X V / X \ J X / J 1,1 2,1 n,l I ) l,fe ) I 2,fe n.i I • I 1 I n^n J (4.11) Figure (4.2) displays typical numbering and the resulting connectivity between triangles in a sample unstructured mesh. For this sample mesh, the only non-zero blocks in the Jacobian matrix corresponding to control volume 42 (row of 42) are (42,38), (42,42), (42,49), and (42,57); for this row, two non-zero blocks are above the diagonal block. For the row 54, the non-zero blocks are (54,43), (54,45), (54,54), and (54,65); this time two non-zero blocks are below the diagonal block. In the next two subsections, the analytic derivation of the first-order flux Jacobian entries CHAPTER 4. 57 Figure 4.2: T y p i c a l cell-centered mesh numbering for both interior and boundary fluxes are described in detail. 4.2.1 R o e ' s F l u x J a c o b i a n After introducing the structure of the flux Jacobian, we need to compute the derivatives of the fluxes, i.e. Roe's flux function. T h e flux at the face between cell i and N\ is: F(Ui,U ) Nl T h e Euler flux formula, F(U), —- (F(Ui) + F(c7 r )) A 1 A.(U -Ui Nl (4.12) and its derivative with respect to the conserved variables, | ^ were described by E q . (2.2) and E q . (2.16). T h e A , the Jacobian of the Euler flux evaluated at the Roe's averaged properties, was also introduced by E q . (2.19). T h e Roe's average formula (Tilde form) for density, velocity components, and total enthalpy are the same as those introduced in section (2.3.2), but this time they have been rearranged to reduce the evaluation cost. R (4.13) CHAPTER 4. FLUX JACOBIAN L = 58 1 1 o u —n.y v n _U C I x 5 _ C _ u 0 t c c 0 \u \ n 0 0 0 0 0 l-B-3- 2 n C 2 2 Q c Tlx C 2 c 2 2 x C (4.14) c _ c 0 0 0 0 + -2c 2 J (4.15) 0 C 8c 8^v 2 2 C R = i(-Cu \{Cu n n + pQ) \(n C-Bv) ±(n C-Pv) x + PQ) -\(n C y + (lu) x 0 = 7 - 1 , -\(nyC 7=g* + Bv) c 2 0 ¥i (4.16) \B (4.17) (4.18) (4.19) PL.+ P •PL.^L + PVR PL + P PLhtL + ph R t PL+P 1 - Q) -n ii + y n = nu x (4.21) (4.22) C = JB(ht u (4.20) nv x + nv y (4.23) (4.24) (4.25) CHAPTER 4. FLUX JACOBIAN 59 A (U -Ui) T h e flux differencing term in E q . (4.12), A AU , can be recast in the form of Nl and the full derivative of this term with respect to the solution vector (in general form) is: d ( ^1 A U dU 2 ) d dU AU + A d{AU) dU (4.26) Now, we take a look at the first term in E q . (4.26) A dU AU 0 i (L A R^j AU = dU dL dU R+L dA dU R+L dR dU AU (4.27) Differentiation of -gjf- produces some third-rank tensors which not only is difficult to derive but also is quite expensive to compute. B a r t h [9] has derived the full derivative of 9 ^ with some clever modifications to eliminate the tensor computations, reducing the complexity of the Jacobian computation to some degree. T h r o u g h spectral radius analysis for 1-D flow he showed that for a smooth flow the approximate Jacobian can be accurate up to C F L = 1 0 0 0 or even above. However, for the shock tube problem the difference between the true Jacobian and the approximate Jacobian grows after C F L = 1 0 , and becomes noticeable after C F L = 1 0 0 showing that the approximate Jacobian will not be accurate enough for larger C F L numbers. Looking back at the E q . (4.26), similar physical insight can be d\A~\ concluded. For a smooth flow, the magnitude of -^f-AU with respect to the magnitude d(AU) of -oy— is very small, and the resulting approximate Jacobian will be acceptable. In the case of a flow with discontinuity, this approximation is not accurate anymore because, d\A\ -frf - as well as AU entries would be large enough (at least in some regions of the flow field) and except for very small AU (i.e. smooth regions of the flow field) that approximation will lose its accuracy to some extent. Therefore applying a large C F L number for the solution acceleration is not necessarily useful when the overall flow linearization is relatively inaccurate. Ignoring changes in A (treating it as a constant) greatly simplifies the evaluation of the Roe's flux, as well as overall Jacobian computation cost; such a Roe's flux differentiation for the E q . (4.12) with respect to cell i and N\ is called the approximate Roe's flux Jacobian and can be written in the following form: dF(Ui,u dU,Ni ~dF(U ] dU,Ni Nl (4.28)' CHAPTER 4. FLUX JACOBIAN 60 dF(Uj,U ) dUi dF(Uj dU + Nl (4.29) T h e other Jacobian terms i n E q . (4.8) through E q . (4.10) can be derived easily by repeating similar differentiation. Because of the difficulties in Jacobian calculation, especially for complicated flux functions such as Roe, several researchers [83, 93]'have successfully employed simpler flux functions like Steger-Warming [77] for linearization purposes-building the Jacobian on the left hand side-where the right hand side-the residual-still is evaluated based on the Roe's flux. A similar approach can be taken for preconditioning of the left hand side where the explicit Jacobian matrix is not required (matrix-free methods), but a preconditioner matrix still is needed to enhance the performance of the linear solver [22, 61]. In either case, due to inconsistency between the left and right hand sides (i.e. linearization and flux evaluation), the resultant convergence rate per iteration will not be as fast as if a consistent Jacobian was employed. 4.2.2 Boundary Flux Jacobians To have a true implicit time-advance a n d / o r Newton formulation, the linearization must be extended to the boundary fluxes. B o t h the Jacobian matrix and the preconditioner matrix (in our case the first-order Jacobian) need to include derivatives of the boundary fluxes as well. Otherwise, the implicit formulation does not converge as fast as it should a n d / o r some stability issues may well occur if large C F L numbers are attempted [79, 32]. T h e goal here is to compute the Jacobian matrix of the boundary Euler flux with respect to conservative variables. Since on some occasions boundary conditions (section 2.4) are implemented based on the primitive variables ( V ) rather than conservative variables (U), it is more convenient to compute the Jacobian of the Euler analytic flux with respect to primitive variables and then convert that Jacobian to the conservative format using the chain rule: OF dU dFdV WW (4.30) pn 0 y 8F_ dV p(u UU n n VU t n un ) p{uu n + hn t x n y p(u x P(7-1) pun x : PV1l n hu + + vriy) n + p(vu n hn ) t y x n y 7-1 (4.31). CHAPTER 4. FLUX JACOBIAN 61 0 0 5V I i( 2 U +/)( -l) p pu U= pv E I e = e+e , e = C „ T = — —, k P\1 ~ 4.2.2.1 7 P u ,v= P t -u( -l) 7 -v(7-l) (7-1) /9ww , F= V P n x (4.33) y ph u t = -Cu-fV), E = pe , t 1) Pn n 1 k + pvu + P n 2 e (4.32) 0 n P h = e+— , h = h+e (4.34) t k P 2 Wall Boundary Flux Jacobian T h e wall boundary condition for the first-order Euler flux can be satisfied by setting all normal fluxes equal to zero (u n — 0), section 2.4.1, and the wall flux Jacobian is derived by taking the derivative of this simplified wall flux: 0 Pn x Fw (4.35) Pn y 0 4.2.2.2 0 0 0 0 dF_ 0 0 0 n W 0 0 0 n 0 0 0 0 w x y dF_ du w dF_ dv dV w dU (4.36) Subsonic Inlet Flux Jacobian A s it was explained in section 2.4.2, total pressure, total temperature, and angle of attack are set at a subsonic inlet as constant inlet conditions and the static pressure is taken from the interior as a parameter. T h i s time it is easier to rewrite the Euler flux in terms of total pressure, total temperature, static pressure, and angle of attack and then proceed CHAPTER 4. FLUX JACOBIAN 62 with taking derivatives. T h e magnitude velocity, V , can be defined as a function of total m temperature, and the normal velocity component is described in terms of the magnitude velocity and angle of attack: * T t 7 - i ; 2 u = V cos a — \ m v = Kn.sina = 2 u„ t \ (T L V 7 - 1 / \T 2 5-i - 1 t (cos an x VT ,7-1/ (T \T 7 - 1 / (4.37) - 1 \T cos a (4.38) sin a (4.39) + sin an ) (4.40) y Now, we write density in terms of total temperature, total pressure and static pressure: ^P (P 7 9 = (4.41) P (4.42) 7-1 * ( * ) A n d finally h is rewritten in terms of total pressure and static pressure: t 7-1 i P 7 + (7-1)- PA — 2 W 7 - 1 - 1 T h e Euler flux vector is recast in its new format after substitution of p, u, v, u , n h with t their new definitions: Fi F= F 2 (4.43) F 3 F 4 7^2 TP~ \2 ( t t J ^pCip(2-3C ) _ p(2- •2d) x (4.44) CHAPTER 4. FLUX JACOBIAN 7 C 2 cos a Fo = (pCi p ( l - 2 d ) _ B \ p(l-C ) 2 TP~ x 4 Ci t 63 t 7 C 2 sin a |pCip(l-2d) 2 + _ p(l-Ci)^ TP~ Cl t Pra,, (4.46) t TP~ P 7^2 -Ci TtP; (4.45) x Cl t Pn + I ("pClp(l-2Ci) _ p ( l - d ) t /9 = 7 - l (4.47) cos a n + sin an Co x 7 v (4.48) Total pressure, total temperature and angle of attack for a steady inlet are constant, and where 77 is the inlet variable vector is greatly simplified, E q . (4.49). A p p l y i n g the chain rule ( E q . (4.50)) and since the only apparent primitive variable in the subsonic inlet flux vector is the static pressure, P , all three first columns of the subsonic inlet flux Jacobian will be zero and only the 4th column needs to be evaluated. W ' 0 0 0 0 " 0 0 0 0 T 0 0 0 0 a 0 0 0 1 P dF_ OF ' Pt' t (4.49) dn (4.50) dV 00 o o o f dF dV Subln 0 0 0 00 lC dF x ~dP I T t P 2 r(2 0 - 0 § I I (4.51) p-Cip(2-3Ci) _ p(2-2Ci) 2 2 0 C l - 3C ) 1 i p d - 3 C i ) _ 2(1 _ CI)PQ-™ (4.52) CHAPTER 4. FLUX JACOBIAN dF 7C2 cos a 2 ~dP TP- ' = t t I QF ^ ryC sin a 3 ~dP dF dP 2 T Pr = ((1 - 2 C i ) P P c Cl ((1 - PVPT P - ^ c t 2[3^T P - Cl t 5 u W n 2 C l 1 - (1 - Ci)P- - (1 - d)P- C l ) (4.53) C ) (4.54) l r TfP- t C l + (1 - 2 C ) P x C l t p- 2 C l - (1 - d ) P - C l t c 2 f^| p- 2C )P^P-^ V2C CaP 'P- i Having C l t t v ^ ^ y p ^ p - ^ - i 4 64 c j ^ p - C i _|_ pC\ p-2C\ t JP _ p-Ci (4.55) P-Cx - l Cl t evaluated, the subsonic inlet flux Jacobian with respect to conservative variables can be easily computed: dF dU 4.2.2.3 dE dV Subln Subln dV du (4.56) Subsonic Outlet flux Jacobian For subsonic outlet, section 2.4.2, three flow variables (p,u,v) are taken from interior, and only the static pressure is fixed at the outlet: pUU n F s ubOut — pvu n • , I Pout P(7-1) + P tn +' P n ou out x (4.57) y pht-outV-n . 1 / 2 , 2\ (4.58) 2' Since the static pressure at the outlet is treated as a constant, the last column of the outlet 'flux Jacobian is filled with zeros: CHAPTER 4. FLUX JACOBIAN dF dV SubOut U .n pn UU. p[u + un V U .n pvn p(uu + h _ n ) h t - o u t U n and the rest is trivial: 65 - °Jli) p n t 0 puriy x x out x p(u + vn ) p(vu + h _ ny) n n 0 (4.59) y t out 0 . dF_ dU SubOut 4.2.2.4 y n 1 P 0 pn x 5F dV SubOut dV dU (4.60) Supersonic Inlet/Outlet Flux Jacobians A s it was discussed in-section 2.4.2, we impose all flow variables directly at the supersonic inlet, and their values do not depend on the variables of the inside solution domain. T h e r e - . fore the supersonic inlet flux Jacobian is just a block of zero. O n the contrary at the outlet all variables are taken from inside domain and the outlet flux Jacobian respect to primitive variables is the same as E q . (4.31). 4.3 Finite Difference Differentiation T h e flux Jacobian can be computed via finite differencing removing the issue of analytic differentiation of the flux function. Finite difference differentiation is a well known technique to compute derivatives of a complex function using perturbations of the independent variables. Equations (4.61) and (4.62) show typical forward and central differencing formulas: dF dU dF_ dU C F(U F D ~ nU D ~ + e) — e + e)-F U-e) { 2e F(U) dF dU 2 2 /£ s '-) \2 d*F(e^ dU \ 6 Forward Differencing (4-61) ^ 3 T h e central differencing has smaller truncation error but it comes with the cost penalty of one more function evaluation, making it twice as expensive, compared to the forward differencing. If the function evaluation is expensive, which indeed is the case for residual evaluation in C F D problem, central differencing may not the best choice. 4. CHAPTER FLUX JACOBIAN 66 T h e accuracy of the differentiation clearly is a function of £ and it appears that smaller e results in smaller error, which is true for truncation error, but choosing a very small value for £ will introduce more noise in numerical differentiation and amplifies the existing error in numerical function evaluation (caused by round off error) since that error is divided by very small number. T h i s can be shown by a simple analysis, where the actual function evaluated by the computer F is replaced by the exact value of F and the associated round off error E : r F(U) = F(U) +-E (U), F(U ± e) = F(U ±e) + E (U ± e) r dF_ dU 0F_ 3U if (E ) r max r dU FD FD 8F_ CD du CD E (U + e) -E (U) r + r (4.64) E (U +£) - E (U••- e) r + (4.63) r (4.65) 2£ is an error bound for round off error, the total error in numerical differentiation for forward differencing ( F D ) and central differencing ( C D ) can be written as a combination of truncation error and the error caused by round of error known also as cancellation or condition error [29]: dF dU 2 Total-FD — '£\ 2 dF 3 ETI otal-CD — , 2(flr) ,2/ £ 2 (4.66) : £ (E T (4.67) ou 3 By. examining the E q . (4.66) and E q . (4.67) it is clear that if the magnitude of perturbation, £, is a large number the dominant error in numerical differentiation is truncation error. Truncation error decreases linearly for forward differencing and quadratically for central differencing by reducing the perturbation magnitude. However, the condition error increases linearly by decreasing the e . Therefore there is an optimum value for perturbation magnitude which minimizes the total error in numerical differentiation. F i g (4.3) displays the total-relative numerical error in finite difference differentiation versus perturbation magnitude for a sample cubic function, F = X 3 T h e relative error is computed by: +X 2 + X +0.1 , at XQ = 0.05. CHAPTER 4. FLUX JACOBIAN 10' 10"' 17 5 10'" 67 10'" 10'", e IO' 10' 7 s IO" IO' 1 1 . Figure 4.3: Total numerical error versus perturbation magnitude Error.total-relative F'(X ) 0 °W. Finite Diff. (4.68) F'(Xo) Clearly, there is an optimal value for e such that the numerical differentiation error is minimum, in this case e'ss I O - 9 for forward differencing and e « 1 0 - 6 for central differencing. Very small value of perturbation (i.e. e < I O ) has caused domination of the condition -9 error in such a way that both forward and central differencing essentially produce the same total error. Since the condition error is caused by round off error, the optimum £ should be determined based on the machine precision, EM- The error bound for round off error, (E ) r max can be estimated by EM \\F\\- To find the optimum e, derivatives of Eq. (4.66) and Eq. (4.67) are taken respect to e and they are equated with zero: OEr, otal-FD dE dE-Total-CD DE dF du 2 2E \\F\\ m 2 d*F du 3 £M\\F\\ = 0 (4.69) (4.70) CHAPTER 4. FLUX JACOBIAN 68 and by solving the above equations for e the optimum perturbation for forward and central differencing are estimated: EoptpD OptcD £ \ ~ a F1 2 W 3£M H^ll a F 3 WP Therefore, in addition to machine accuracy, the optimum perturbation value, depends on the norm (magnitude) of the function F and its higher derivatives, and the type of the numerical differentiation. There is another consideration for choosing the size of the perturbation for numerical differentiation of a higher-order function, which is the grid size or discretization length scale; this will be discussed in chapter 5. Not surprisingly, there are other practical factors in scientific computing for e selection which is beyond the scope of this research. Several researchers [40, 26, 29, 61] have studied the effect of e in numerical differentiation and they provide some practical formulation and guidelines for choosing the best value for 4.4 Numerical Flux Jacobian In numerical Jacobian calculation using the finite difference method, the principle has not changed; we still need to take the derivatives of the residual function for a control volume, E q . (4.6), and compute the flux Jacobian terms through E q . (4.7) to E q . (4.10). However, this time instead of analytic differentiation of the flux function at each face, E q . (4.12), the finite difference technique is applied. T h e Euler flux function includes 4 equations and 4 primitive variables: A t the same time, the flux derivatives with respect to conservative variables are needed, these variables should be perturbed originally. T h e following procedure demonstrates such a flux Jacobian computation for interior faces using forward differencing: CHAPTER 4. FLUX JACOBIAN 69 F7u:r=FluxFunct i on (V) for j=l to 4 { for i=l to 4 { « / [ i ] = e E L j ] [i] f/p[i]=C/[i]+5C/[i] } Vp=ConservativeToPrimitive(f7p) FZm p=FluxFunction(Vp) - for i=l to 4 { DFDU [i] [j] = (Fluxp [i] -Flux [i]) /e y } where Flux is the flux vector, V is the primitive variable vector, U is the consevative variable, subscript of P presents the perturbed vectors, and E(j, i) is the identity matrix: 1 0 0 0 1 0 0 0 0 0 0 0 (4.71) 1 0 0 1 For boundary faces, exactly the same procedure is implemented but the boundary condition implementation is also considered both for the perturbed variables and the flux functions: FluxBC=BoundaryFluxFunct for j=l to ion (Vg) 4 { for i=l to 4 { <5t/[i]=eE[j] Ei] £ / p [ i ] = [ / [ i ] + < y t f [i] } Vp=ConservativeToPrimitive(t/p) VpB=BoundaryConditionProcedure(Vp) CHAPTER 4. FLUX JACOBIAN 70 i /wa;BCp=BoundaryFluxFuiaction(VpB) for i=l to 4 ? . { . (DFDU)BC } [i] [j] = (FluxBCp[i]-FluxBC[i])le Chapter 5 Linear Solver and Solution Strategy 5.1 Introduction U p to this point, two major elements used in the implicit solver (i.e. residual evaluation and Jacobian computation) have been explained. T h i s chapter describes the iterative process to reach the steady-state solution, including, the linear system solver, preconditioning, start-up process, and over all solution strategy. Linearization of the fluid flow equations over a meshed domain leads to a large sparse linear system, which needs to be solved in multiple iterations. In principle, there are two separate issues here, forming the linear system and solving the linear system. A s it was described in section 2.2, it may be useful if the solution process starts from a low C F L number (generally resulting in small solution update) and even low-order linearization, E q . (2.14), especially if the initial condition that the solution starts from is far away from the steady-state solution of the problem. T h i s is particularly true for non-linear problems such as compressible flow, where the large solution update based on high C F L number at the early stage in the solution process does not necessarily help the convergence, and it may slow the convergence or even cause divergence. T h i s fact can be better explained by F i g (5.1) where a sample scalar function is linearized. T h e linearization of a function F(U) at point A, where we are far away from the solution point C, does not provide us a good slope to estimate the solution point. Also it does not make a meaningful difference if the linearization at this point is exact or approximate, as both of them result in inaccurate solution estimation. Now if the slope 71 CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 72 Figure 5.1: Linearization of a sample function of function F(U) at point A is used for the solution update, only a small fraction of AU can be applied reasonably. A t the same time for small AU, an approximate linearization, in our case low-order linearization, still produces a reasonable solution update. O n the other hand, when we are close enough to the solution point C, the linearization is a fair representative of the non-linear problem and the more accurately this linearization is performed better solution update is expected. Direct methods (Gaussian Elimination and L U factorization) solve the linear system exactly (to within round off error), and they have been applied to C F D problems in early C F D implicit solvers [8, 89]. However, the computing cost ( C P U time) and the storage requirement (memory usage) of direct methods for large linear system arising from C F D problems limit their application seriously. Furthermore, quite often the linearization is not exact and we are solving a non-linear problem through multiple linear iterations, so it is not beneficial from the performance point of view to solve the linear system in each iteration exactly. Iterative methods, on the contrary, require far less memory, and can provide a good approximation for the solution at a reasonable cost. It should be mentioned that the effectiveness of iterative methods as compared with direct methods for solving linear systems is related to the sparsity of the coefficient matrix. In general, iterative methods are more efficient for large sparse systems. Development of iterative methods for large sparse linear sys- tems has been a very active research area in the field of numerical linear algebra, and the Krylov-subspace family methods have emerged as modern iterative techniques [74]. In these techniques, a very large sparse linear system is reduced to a much smaller system through CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 73 creation of some subspace (known as K r y l o v subspace), and then solution of the original large system is approximated using the constructed subspace by satisfying some optimality criterion for the original system. A wide variety of K r y l o v iterative solvers [81], are in use for various applications; the Generalized M i n i m a l Residual ( G M R E S ) algorithm [72], which is developed mainly for non-symmetric matrices, is employed in this research for the following reasons: 1. T h e linear system arising from unstructured discretization is asymmetric both in terms of the matrix structure and the values of entries. 2. T h e G M R E S algorithm only needs matrix-vector multiplication removing the issue of high-order Jacobian matrix computation. 3. G M R E S minimizes the residual of the linearized system implying that if we are close enough to the solution and linearization of the non-linear system is accurate then G M R E S computes the best update for solution at each iteration, making it a very suitable choice for Newton iteration. It should be noted that the G M R E S algorithm has been applied quite successfully and extensively in implicit C F D solvers since the early 90's, and it is a well known linear solver for C F D applications providing us a great deal of experience both in solver and preconditioning implementation. Preconditioning is another important issue in linear system solving since no iterative method for general large linear systems demonstrates suitable performance without proper preconditioning. In principle, preconditioning transforms the original linear system to a modified linear system making it easier to solve by an iterative technique. 5.2 G M R E S Linear Solver Like other modern iterative techniques G M R E S uses a projection process' to extract an approximation to the linear system solution from a subspace. Assume A is a real nxn matrix and x is a solution vector in 5ft for a linear system of: n Ax= b (5.1) CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 74 A n approximate solution of the above system is found within a subspace IC of 3?™. T h e size of the subspace i s m « n , and generally m constraints in the form of orthogonal conditions (minimization problem) have to be imposed to make the best approximate solution. A logical candidate for imposing orthogonality condition is the residual vector: r = b - Ax (5.2) The residual vector is considered to be orthogonal to another subspace £ with the dimension m. In other words, we are looking for an approximate solution x £ IC with the constraint of r J_ C. It can be shown that if the subspace C is chosen in the form of C = AK, (oblique projection), imposing such an orthogonality condition minimizes .the l norm of the residual 2 vector over the affine space of xo + IC, where XQ is the initial guess and the starting vector [74]- 5.2.1 The Basic G M R E S Algorithm For a linear system of Ax = b, G M R E S [72] seeks an approximate solution x k of x = XQ + k Zk , where XQ is the initial solution vector, and subspace (search directions), IC k = span {ro, ATQ, is the initial residual which minimizes the l Krylov r$ — b — Axo is the member of the k K 1 Here, norm of the residual vector. 2 A subspace V z A ro, ..., A ~ TQ}. 2 in the form = {v\, v , ..., Vk} is constructed using Arnoldi's algorithm (and applying k 2 Gram-Schmidt method) for computing the l — orthonormal basis of the K r y l o v subspace, 2 where v\ = ro/ ||ro||- T h e residual vector at the kth iteration, r k to , should be l — orthogonal 2 K 'k GMRES: 1. Use i n i t i a l guess x$ and compute ro = b — AXQ; normalize the i n i t i a l r e s i d u a l vi = ro/ \\ro\\ • 2. B u i l d the orthonormal basis (Arnoldi's algorithm): f or j=l , 2 , . . . ,k u n t i l s a t i s f i e d hi tj v = (Avj,Vi), k 1=1,2,'. .. ,j , = AVJ - J2 i a i > h 3 j+x (\\b - Ax \\ <tolerance) do: v i= hj+ij = \\VJ X\\, + Vj+i = = 0 stop i f hj ij + Vj+i/hj+u, 3.Form the approximate s o l u t i o n : x k — XQ + V yk k such that x k minimizes ||6 — j4xfc||. CHAPTER 5. LINEAR SOLVER AND SOLUTION 75 STRATEGY After A; iteration in Arnoldi's process, the l — orthonormal subspace Vk+i and a (k + 1) x k 2 matrix H k are formed. H k is the upper Hessenberg matrix H k whose only nonzero entries are the h-ij described above, except for an additional row with the nonzero element of h k + i t k . T h e following important relation will hold: AV k For minimization, the update x k = V 0 + z)\\ = min z k H (5.3) k — xn + z should satisfy the least-squares problem: min ||6 - A(x B y setting z = V y, k + 1 ||r 0 Az\\, z G K- , (5.4) k z rn = Bv\, and 8 = ||r'o||, we have: ||r 0 - Az\\ = \\Bv x - AV y\\ = \\V (Bei k -H y)\\ k+l (5-5) k where, e\ is the first column of the (k + 1) x (k + 1) identity matrix. Since V \ k+ is an orthonormal set, the least-squares problem is reduced to: • min ||'/3ei - H y\\ , y e K f c k (5.6) Considering that this is a minimization problem, E q . (5.4), the residual at each iteration should be smaller or at least equal to the residual at the previous iteration, i.e. |J7"fc|| < ||rfc i||. Therefore the residual decreases in each iteration or at worst stalls. T h e algorithm + terminates after n iteration steps in the absence of the round of error, meaning that the exact solution is computed if the size of the subspace is chosen equal to the size of the system. T h e iterations or steps taken in building the subspace or search directions in part 2 of the G M R E S algorithm are called inner iterations. Since by increasing the search directions or subspace size the cost of the algorithm rises linearly for memory usage and quadratically for computing time, the K r y l o v subspace size for large matrices is limited to m <C n as was mentioned earlier. In practice, we hope that by picking a small subspace size, a sufficiently accurate solution is computed if that does not happen we can restart the algorithm after m steps to limit the cost of the algorithm. However, there could be some occasions when the convergence criteria for the linear system is not satisfied without excessive number of restarts. In that case, the user normally limits the number of inner iterations, and/or restart number, and moves on-to the next non-linear iteration with whatever solution update achieved in the linear solver. T h e restarted version of the algorithm, G M R E S ( m ) , is explained in the following lines: CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 76 GMRES(m): i n i t i a l guess XQ and compute ro = b— AXQ; 1. Use normalize the i n i t i a l r e s i d u a l v\ = ro/ ||ro||. 2 . B u i l d the orthonormal b a s i s ( A r n o l d i ' s algorithm): j = l , 2 , . . . , m do: for hij - (Avj,Vi), Vj+i = AVJ - i=l,2 YZ=i hijVi, .hj+u = \\vj+i\\, v i+\ = j, If hj+ij = 0 stop j+i/hj+ij, v 3.Form the approximate s o l u t i o n : x m =x 0 +V y m m where y minimizes f(y) = ||/3e - H y\\ , y G 3? . fc m x k 4-Restart: Compute r m = b — Ax ; m else compute XQ :=x , m i f satisfied v\ '•= r /\\r \\ m m (||r ||<tolerance) then stop m eind go to 2. T h e only possibility for breaking down of the G M R E S algorithm is in the Arnoldi's part where the orthogonalization process is performed. If hj \j + becomes zero the algorithm stops and the solution at the last iteration is returned. This happens only when the residual is zero, which means' reaching the exact solution. T h e least-squares solution in step 3 can be carried out via classical Q R factorization of H using plane rotations. k As it was suggested in the original article [72], it is preferable to update the factorization of H k progressively as each column appears at every step of the orthogonalization process. T h i s enables us to compute the residual norm of the approximate solution without computing x k and to terminate the algorithm whenever the convergence criterion for the linear system is satisfied without further operations. T h i s also means the residual of r m = b — Ax m does not need to be computed explicitly saving us the cost of one matrix-vector multiplication since the magnitude of the ||/?ei — ii/fc2/||and \\rm\\ are the same. Depending on the convergence criteria, the G M R E S algorithm may be completed with the inner iteration number smaller than the maximum subspace size (m), equal to the number of subspace size, or larger than the number of subspace size. For instance if one restart is allowed in the algorithm and the maximum subspace size of 30 is used, assuming that the convergence criterion in part 4 is not met within inner iterations, then 60 inner iterations are performed before completion of the algorithm. CHAPTER 5.2.2 5. LINEAR SOLVER AND SOLUTION STRATEGY. 77 Matrix-Vector Products Computation in G M R E S Considering the complexity of the higher-order discretization method used in this research, Chapter 3, computing the higher-order Jacobian matrix is a highly expensive task in terms of memory usage and computation cost.' E v e n if memory were not an issue, computing r a higher-order Jacobian would be quite expensive in terms of C P U time no matter what approach in Jacobian calculation is taken, Chapter 4. Therefore, a linear solver in which the explicit Jacobian computation is not required, such as G M R E S , is an attractive candidate for a higher-order implicit solver. Looking back at section 5.2.1, it is clear that in Arnoldi's algorithm, only matrix-vector products are needed, and these products can be easily approximated through the (forward) finite difference technique: F(U + 8U)=F(U) + ^5U + 0(5U) (5.7) 2 B y choosing 8U = ev (e is a small number), E q . (5.7) can be rewritten: A F(U + ev) = F{U) + — ev + 0{ev) (5.8) 2 and since e is a scalar ( v is a vector), the matrix vector product, Av , is evaluated by: A v dF .F(U + ev)~F(U) = W ~ - ^ — - e — V . ft . ( 5 - 9 ) E q . (5.9) is a first-order approximation, and a more accurate approximation can be achieved employing central differencing approach with the extra cost of one more flux evaluation per matrix-vector product (section 4.3). (residual) If forward differencing is employed for evaluation of matrix-vector product, for any inner iteration in the G M R E S algorithm one flux evaluation is needed. y So far no consideration has been made regarding the discretization order and e, but perhaps the choice of e can dramatically affect the correctness of the higher-order Jacobian matrixvector multiplications, even beyond the consideration of section 4.3. T h e issue here is how much the higher-order terms in the reconstruction polynomial are affected by a small perturbation in the flow field. In other words, do higher-order terms remain arithmetically significant or measurable after a perturbation of size 6U on a discretized domain with the length scale of Ax ? T o address this question we need to go back to Chapter 3. T h e reconstruction procedure led to.solving a least-squares problem, E q . (3.14), that can CHAPTER 5. LINEAR SOLVER AND SOLUTION be simplified in the form of E q . (5.10). STRATEGY 78 • MD = U (5.10) where M is the coefficient matrix containing the moment information of the reconstruction stencil, D is the solution vector including the derivatives of the reconstruction polynomial, and U is the control volume average vector for the reconstruction. I n . E q . (5.10), matrix M is purely geometric and it is not solution dependent, therefore by changing U to U + 8U, M will not be affected. Now we would like to know how much the derivatives are perturbed by a change in control volume averages: M(D+ 6D) = U+ 5U = MD+ 8U (5.11) and MSD = 5U or 5D — M~ 5U l ' (5.12) B y taking the norms of E q . (5.10) and E q . (5.12), these two norm inequalities are found: \\U\\ < \\M\\ \\D\\ , I I ^ H l l M - 1 ! ! ^ ! ! (5.13) and finally after multiplication of these two norm inequalities, the relative change in the derivatives can be expressed in terms of the relative perturbation in U as: ii»ii<ii -.n, „m M M ( , ) 1 4 K(M) In E q . (5.14), K(M) is called the condition number of the matrix M and it is evident that the change in the derivatives is a direct function of condition number of the coefficient matrix. T h e matrix M is relatively small and its entries are only functions of mesh geometry. In our case, the quality of the employed isotropic mesh is guaranteed by the mesh software ( G R U M M P [ 5 9 ] , [19]), so the condition number of the matrix M is expected to be fairly small (O(10) for typical employed meshes). Therefore changes in the derivatives are of the same order (or at most one order larger) as the perturbations in the solution. Consider a simplified 1-D cubic reconstruction polynomial which leads to the 4th-order discretization for a mesh with the uniform length scale of A x , E q . (5.15). U is perturbed by 5U and the perturbed reconstruction polynomial can be expressed by E q . (5.16) where CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 79 the superscript p shows the perturbed coefficients: ' F(U) f F{U = a + (Ax) 0 + a (Ax) + SU) = a + 2 ai 2 a {Ax) 3 + a (Ax)+a (Ax) p p 0 p (5.15) 3 + 4(Ax) 2 . (5.16) 3 Now we study the difference between the original reconstruction and the perturbed one: F(U + SU) - F(U) = (a p - a ) + (a - a )(Ax) p 0 0 + (a p x - a )(Ax) + (a 2 p 2 3 - a )(Ax) 3 3 (5.17) In order to have a correct higher-order Jacobian in the matrix-vector product computed by E q . (5.9), all terms in E q . (5.17) are required to be arithmetically significant; otherwise, the overall accuracy of the linearization would be reduced and the convergence rate is adversely affected. and O f course the highest order term, in our case, (a p — a )(Ax) 3 3 is the main issue consequently: ll°3 ~~ °3ll ( Using E q . A a ; ) ^ M , 3 £M '• machine precision £ , (5.18) (5.14), and assuming SU = ev , the bound .'of the highest derivative for the perturbation is estimated, Eq.(5.19). Combining this bound with E q . (5.18) determines the minimum magnitude of e for keeping the higher-order order terms significant in the finite difference matrix-vector multiplication, E q . (5.20). < M K - (v ) JT F 7 T T M \\u\\ -\\a \\(Ax) K(M)\\v\\ £ 3 3 (5-19) ( 5 - 2 0 ) Further simplification can be made if || y^( ) in E q . (5.20) is assumed to be 0(1). a3 e> M *f„ „ ( A x ) H £ / A 3 (5.21) Although this result has been obtained with several simplifying assumptions, it gives us a very useful insight about the reasonable magnitude of e for proper finite difference matrixvector product computation. In the case of the standard G M R E S algorithm, ||t>|| = 1 due to the normalization process, and e can be found based on the geometric length scale. For CHAPTER 5. LINEAR SOLVER AND SOLUTION a non uniform mesh, Ax STRATEGY 80 obviously is chosen based on the smallest length scale, and as an example if Aa; = O ( 1 0 ~ ) , and EM = O ( 1 0 4 keep higher-order terms significant. - 1 6 ) then choosing e « O ( 1 0 ) or larger, would - 4 It should be noted that in the presence of the limiter using a large e or perturbation often causes limiter firing in residual evaluation leading to convergence problems. Therefore using a large perturbation for flows with an active limiter is not suggested. 5.2.3 Preconditioning Convergence of iterative techniques, including Krylov subspace methods, is highly dependent on the conditioning of the linear system, i.e. the Jacobian matrix. Using a higher-order discretization introduces more off-diagonal entries and increases the bandwidth of the Jacobian matrix considerably. In addition, in the case of Euler equations (compressible flow), with a non-linear flux function and possible discontinuities in the solution, the Jacobian matrix is off-diagonally dominant. A l l these factors lead to poor convergence of the linear solver, and consequently slowing or stalling the solution process of the non-linear problem. To remove this obstacle and enhance the performance of the linear solver, the linear system needs to be modified through a process called preconditioning, such that the preconditioned system has better spectral properties and clustered eigenvalues. A l t h o u g h in the case of the G M R E S method, eigenvalues may not describe the convergence of a nonsymmetric matrix iterations as much as they do for symmetric iterative solvers such as the Conjugate Gradient ( C G ) method, a clustered spectrum (not too close to zero) normally improves.the convergence characteristics of the linear solver [16]. Consider a preconditioner M as a nonsingular matrix which approximates the Jacobian matrix A. Equation (5.1) then can be modified by multiplying by M~ l M~ Ax 1 If M _ 1 is a good approximation to A~ l on both sides: = M~ b (5.22) l , M~ A l becomes close to the identity matrix, increasing the performance of the linear solver through eigenvalue clustering around unity. Equation (5.22) is called left preconditioning, which affects not only the matrix operator but also vector b on the right hand side. It is also possible to introduce the preconditioner operator in the right side of the matrix A, and leave the right hand intact: 2 AM~ (Mx) l = b, x = M~ z l (5.23) CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 81 T h i s is called right preconditioning. If the G M R E S algorithm is used with left preconditioning, M~ (b l — Axk) is minimized instead of 6 — Ax , resulting in a different stopping k criterion (residual norm) for the algorithm. However, right preconditioning, still minimizes the residual n o r m of the original system [73, 81]. O f course the best apparent choice for M is A, as AM~ l = / , but if we could solve the system with the A matrix easily we never needed preconditioning in the first place. Finding the optimal preconditioner matrix is not unique, since it is highly problem dependent, and also depends on how the preconditioner is applied. Also there are circumstances where the matrix M need not be computed explicitly and a preconditioner operator replaces M. , like polynomial preconditioners [81]. B u t in general three factors are considered in choosing a preconditioner: 1. M is a reasonably good approximation to the coefficient matrix A. 2. A f is better conditioned, more narrow banded, and less expensive to build compared to matrix A. 3. T h e system Mx = z should be much easier to solve than Ax = b. T h e bottom line is the cost of the construction and applying the preconditioner should be relatively cheap with respect to the cost of the original linear system solving. For effective preconditioning, in addition to applying a good preconditioner matrix, we need to employ a good preconditioning technique. A preconditioning technique is a sparse linear solver itself, and it can be a direct or iterative method. Iterative stationary methods such as successive over-relaxation ( S O R ) rely on the fixed iteration E q . (5.24), where B is the fixed iteration matrix and C is a fixed vector: x\ i+ = Bxi + c (5-24) Stationary methods are simple and easy to implement, often have parallelization advantages, are low cost (memory and C P U time), and finally they are effective in damping high frequency errors. However, they often have a restrictive stability condition, reduc- ing the benefits of Newton method. T h i s is especially true if the preconditioner matrix is off-diagonal which is the case for compressible flow. A t the same time, for relatively large systems slow damping of the low frequency errors becomes a noticeable issue for these techniques in the absence of a multigrid augmentation strategy. CHAPTER 5. LINEAR SOLVER AND SOLUTION Preconditioning STRATEGY 82 Ratio of non-zero elements Factored M a t . / O r i g i n a l M a t . , ILU-0 1 ILU-1 1.25 ILU-2 1.5 ILU-3 1.8 ILU-4 2.1 LU 7.2 Table 5.1: Ratio of non-zero elements in factorized matrix Another preconditioning technique method. Factoring the matrix M is the Incomplete Lower-Upper Factorization (ILU) into two triangular matrices, M = LU, where L is a lower triangular matrix, and U is an upper triangular matrix normally results in factored matrices that are far less sparse than the original matrix M. A s a consequence, a con- siderable amount of memory needs to be assigned for factorization, and the factorization process is quite expensive. O n e solution to this problem is incomplete factorization in which additional nondiagonal fill-in entries are allowed only for a predefined set of locations in the L U factorizations. In other words, we choose a non-zero pattern in advance for the elements of the factored matrices [74]. T h i s is called I L U - P where P is the fill-level in the factorized matrix. P equal to zero means no fill is permitted during I L U decomposition. In I L U - 0 the factorized matrix and the original (non-factored) matrix have the same graph or non-zero element locations. Choosing P larger than zero allows some additional fill-in in the factored matrix improving the accuracy of factorization and the preconditioning quality. However, increasing the fill-level comes at the expense of memory usage a n d extra computing cost, imposing a restriction in increasing fill-level in practice. Table 5.1 shows the number of non-zero elements in the factored matrix where the original preconditioning matrix is a first-order Jacobian [52], For instance I L U - 2 needs 50% and I L U - 4 requires a little bit more than 100% extra memory for storing the decomposed matrix in comparison with the original matrix graph. Also L U will be infeasible in practical applications because of its huge memory requirements. Both the preconditioner matrix and preconditioning technique may be developed based on the specific problem approach, but first one should have a detailed knowledge about the numerical aspects of the problem which is not always possible; second, generally speaking, such a preconditioning procedure is sensitive to the type of the.problem. Application of general preconditioning techniques such as I L U is more common for compressible C F D solvers. Another modification in the I L U family is I L U ( P , r ) , where in addition to the static non-zero pattern, a tolerance r is added as a dropping criterion for entries of the CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 83 factored matrices [16]. In other words, all fill-in entries within the level P are changed to zero if they do not satisfy some set of tolerance condition. Since the proper fill-level and tolerance criterion in I L U ( P , r ) for efficient preconditioning of the compressible flows are highly dependent on the test case, and they are not determined uniquely, I L U ( P , r ) is not considered in this research. Several researchers have implemented I L U methods for preconditioning of the linear solver for compressible fluid flows [91, 14, 24, 18, 45]. GMRES Their results show that I L U with some additional fill-in (i.e. ILU-1&2) is a reliable and robust preconditioning strategy for a variety of test cases, while ILU-0 fails to provide fast convergence in some cases. Reordering is another important factor in I L U factorization. reduce the bandwidth of a matrix. Reordering is designed to Smaller bandwidth leads to a sparser matrix, and consequently during factorization (Gaussian elimination process) of the matrix fewer fill-in entries will appear. K n o w i n g that, increasing the fill-level in I L U preconditioning has its own disadvantages, reordering the original preconditioner matrix becomes essential to keep the accuracy of preconditioning for a low fill-level factorization. Experience shows that quite often a non-reordered preconditioner matrix with low fill-level works poorly, while the same fill-level factorization performs quite well, when the original matrix is reordered. T h e most common available reordering technique is Reverse C u t h i l l - M c K e e ( R C M ) [74] which has been successfully used for I L U - P preconditioning. In this thesis research, the G M R E S linear solver including the I L U - P factorization is the employed Petsc library developed by Argonne National Laboratory [1]. 5.2.4 G M R E S with Right Preconditioning In right preconditioning AM'~ z l = b is solved where, z — Mx . Like the standard G M R E S , in the right preconditioned G M R E S , the objective is minimizing the residual r = b — Ax. Although the initial residual is ro = b - AM~ ZQ 1 , ZQ need not to be computed explicitly and all the elements of the K r y l o v subspace are created without referring to z. R i g h t P r e c o n d i t i o n e d GMRES(m): 1. Use i n i t i a l guess XQ and compute ro = b— normalize the 2. B u i l d t h e Axo) i n i t i a l r e s i d u a l v\ — r o / ||ro||. orthonormal b a s i s for j = l , 2 , . . , , m do: (Arnoldi's algorithm): CHAPTER 5. LINEAR SOLVER compute Wj = W + V j + i = w If ^ i + i j + / h 1 j + l i j = x + M~ V y 0 m stop 0 solution: where y 1 m = , 3. Form the approximate x & i=l,2, . . . ,j . . v j STRATEGY _1 = hj \j SOLUTION AM Vj = i ji i), iJ h AND m m minimizes f(y) = \\6e 1 - H y\\ k ,y e 5R . FC 4 . Restart: Compute r m = b — Ax ; i f s a t i s f i e d (||r ||<tolerance) then stop m m else compute xn :—x , V\ := r / m m \\r \\ and go to m 2. A p p l y i n g the preconditioner, the K r y l o v subspace is right preconditioned orthogonal basis: /C = {r , 0 AM~ r , (AM~ ) ~ r } l l Q m (5.25) l 0 This time the matrix-vector, operator is AM~ v l = Az , where z — M~ v l is the precondi- tioned vector. Nielsen et. al. [55] has suggested to scale e in E q . (5.9) by R M S of vector' z to improve the convergence, since using a constant e could result in convergence stall after couple of orders of the (non-linear) residual reduction: _F{U + ez)-F[U) A Z ~ . ' e where £ o is a small scalar usually in the order of / v e 0 -RMS(z) S £JW ( 5 - 2 b j (£M- machine accuracy). However, as it was described in section 5.2.2, the small scalar value may be taken some orders of magnitude larger than ^JEM due to mesh length scale issues. 5.3 Solution Strategy T h e goal of all C F D solver developers is reducing the cost of computation as well as improving'the accuracy of the solution. T h e latter could be achieved by applying higher-order discretization methods. However, in a practical sense that is possible if and only if the cost of the computation is comparable to the cost of the common second-order methods. That is why implicit techniques (Newton family of methods) play a crucial role in achieving that goal since the best explicit techniques are far less efficient with respect to computation cost making higher-order computation quite uncompetitive. In addition to the computation cost, CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 85 i.e. C P U time, memory usage is also an issue for solver developers. B u t in most cases, C P U time is the main concern since if storing a set of data (i.e. Jacobian matrix) is expensive, then computing that data would be expensive too except in the case that the expensive computation is performed once and the resultant work is used in multiple iterations. Knowing the over all- approach, which is the implicit formulation, and the main objective which is the steady-state solution, we would like to lay out an efficient strategy to reach the steady state solution as fast as possible. T h e overall computing cost for a steady-state C F D problem by an implicit approach, can be analyzed by dividing the solution process into three major parts: 1. Forming the linear system including linearization and the non-linear residual computation. 2. Solving the linear system including preconditioning. 3. Number of implicit iterations needed for steady-state convergence. It is obvious that we would like to minimize the cost of the first and second part of the solution process and reach the steady-state solution with m i n i m u m number of iterations. T h e third part has a cumulative effect on the overall solution cost since in each (non-linear) iteration the first two parts are covered once already. Most C F D problems, including compressible flows, are highly non-linear, and that makes them not too easy to solve via linearization. In addition to non-linearity in the mathematical sense, the behavior of a compressible flow (the physics of the flow) can dramatically change in some flow conditions for a fixed geometry. Therefore if the solution process is started from an arbitrary initial condition which is not reasonably close to the steady-state solution, large updates to the solution based on linearization are not likely to be helpful. Also, Newton's method is well known to converge to the solution quadratically if started from the vicinity of the solution; otherwise it will diverge or stall quickly. Since finding a good approximate solution in general (that is except the cases for which some close solution is available through analytical or experimental data) is impossible without actually starting to solve the problem, the solution process should consist of two phases:' 1 . Start-up phase 2. Newton phase CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 86 In the start-up phase, multiple linearizations with small or moderate solution updates are required to advance the solution to the point that the linearization is accurate enough for finding the steady-state solution, i.e. a good approximate solution as initial starting point for Newton phase. Having a good initial guess, the solution process is switched to the Newton iteration where superlinear or quadratic convergence is possible by taking a very large (or infinite) time step. 5.3.1 Start-up Phase F i n d i n g a good initial guess or reasonable approximate solution is about knowing the physics of the problem and finding a proper way to get to that good initial solution faster by employing various techniques such as mesh sequencing, multigrid, mixed explicit/implicit iterations, exact solution of a simplified problem, potential flow solution, and so on. Hence, like preconditioning, start-up is problem dependent and is more an art rather than an exact science, implying that the proper start-up process is not unique either. In this research a proper start-up process for compressible flows is suggested which is based on the defect correction procedure [46]. In the start-up process, considering the fact that multiple iterations are required for advancing the solution toward a good initial, guess, an implicit iterative process is used where the linearization is performed based on the inexpensive first-order discretization, section(4.2), and the flux calculation remains higher-order: (5.27) A relaxation factor, u>, is applied for the solution update; to = 1 results in standard defect correction, while using different values for u> (typically ui = 0.9-1.3) can over-relax or underrelax the update [52].. Over-relaxation often is helpful in subsonic flows where it accelerates the solution while under-relaxation can prevent divergence resulting from an inaccurate solution update, for instance in transonic cases. T h e Jacobian matrix on the left hand side is the first-order Jacobian and its computation cost is approximately the same as the cost of the second-order residual calculation. The cost of one Jacobian evaluation is 0.6-0.7 of the cost of a second-order residual evaluation (limiting cost excluded) for the approximate analytical Jacobian and it is 1.3-1.5 of the computation cost of the same residual evaluation if the finite-difference Jacobian is employed (for moderate mesh sizes). CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 87 T o reduce the cost of forming the linear system, the Jacobian matrix may be computed and stored for several iterations [39]. W i t h this approach the Jacobian is updated every j iteration (s) (typically j = 1-4) [52]. T h e time step At , which is actually scaled by the volume (area) of each control volume, is based on the C F L number times m a x i m u m allowable explicit time step: ' A t l = CFLA A A ^ _ a x = -j fey, m A ^-max ( 5 2 g ) Ai ' T ' max«s ^max : M a x i m u m wave speed (5.29) As was noted before and also has been graphically illustrated for a sample non-linear function, F i g (5.1), using the slope of the residual function for estimating the behavior of the residual function over a large span of the independent variables is not accurate in the early stage of the solution process. Therefore it makes sense to start from a relatively small A i or C F L , and to gradually increase C F L to some modest number as we are getting closer to the solution of the residual function, i.e. F(U) = 0. T h i s justifies applying the first-order linearization in the start-up process instead of the correct order Jacobian calculation; the linearization does not allow large steps towards the steady-state solution, so why should we spend too much work to compute it, especially if implementation of Jacobian in the linear solver requires several matrix-vector multiplications? A t the same time, solving a linear system based on the first-order Jacobian is much easier than doing so for higher-order Jacobian (even for the second-order one), because of the structure of the higher-order Jacobian matrix. Therefore we are better off to form and solve a less expensive system, since we must inevitably perform several iterations at the start-up phase. T h e next step is solving the linear system. Since we are using an approximate linearization, it does not make sense to solve the approximate linear system exactly, therefore the linear system in each defect correction iteration, referred to as pre-iteration from now on, is solved approximately. T h e goal of each pre-iteration is to reduce the non-linear higher-order residual by cheap linearization. Consequently it is logical to solve the linear system up to some fraction of the non-linear residual, Le. tolerance=C x Res(U), C < 1 . E m p l o y i n g this approach saves us from spending too much effort on finding a precise solution to a linear system that we know will not give us the correct solution to the non-linear problem. T h e right-preconditioned G M R E S ( m ) algorithm is used with a limited subspace size (K=15-20) and relatively a loose tolerance (0.1-0.05 Res(U)) is set for solving the linear system. No restart is allowed, and if the tolerance is not reached within the m a x i m u m search directions (subspace size) G M R E S is terminated anyway and the computed solution update up to that CHAPTER 5. LINEAR SOLVER AND SOLUTION STRATEGY 88 point is taken to move to another non-linear outer iteration. T h e preconditioner matrix is the same as the linear system matrix which is the perfect choice for preconditioning because the first-order Jacobian has been built already for forming the linear system, and it is the most accurate preconditioner matrix for that case. T o reduce the preconditioning cost, a low fill-level incomplete factorization, I L U ( l ) , is applied, which is accurate and efficient enough for this first-order linear system. Now the question is how close we should be to the solution before switching to Newton iterations. Normally the decrease of some norm of the non-linear residual at the current iteration compared to the initial residual, > * chosen for that purpose. T h a t is, if s the solution reaches a specific relative norm of the non-linear residual, then the linearization is accurate enough to take a very large time step at each iteration. It should be noted that the value for this criterion varies from problem to problem. However, it is possible to find reasonable values for different categories of the compressible flows (i.e. subsonic, supersonic and transonic). More detail is provided in that regard in the results chapter. 5.3.2 Newton Phase B y this stage most of the transient behaviors of the non-linear C F D problem have been removed from the flow field solution, and major steady features of the fluid flow have appeared in the solution domain. Consequently the linearization of the non-linear residual is fairly accurate for seeking the steady-state solution, and taking infinite or very large time steps accelerates the solution toward quadratic or superlinear convergence, i.e. switching to the Newton iteration formula: OR W n+i AU High = _ (ir), RHigh U n+1 = U + AU n n+x (5.30) T o have an accurate linearization, the higher-order Jacobian must be applied during the. Newton iterations, E q . (5.30). T h i s has been implemented through matrix-free G M R E S , using finite difference directional matrix-vector multiplication technique. T h e linear system is right preconditioned by the first-order Jacobian, which indeed is essential due to bad conditioning of the left hand side. ILU(P=2-4) is used for preconditioning and normally for higher-order computation (especially the fourth-order transonic flow), increasing the filllevel is considerably beneficial [52]. A fixed number of search directions is employed (K=30) and it is important to keep them limited since the matrix-free G M R E S performs multiple perturbations of the non-linear residual which is very expensive for higher-order residual computation. T h e linear system is again solved approximately but this time with a tighter CHAPTER 5. LINEAR tolerance (10 Res(U)) -2 SOLVER AND SOLUTION STRATEGY 89 as an accurate update is required. N o restart is allowed, and if the tolerance is not reached, the next outer iteration starts with the best update from the last inner G M R E S iteration. Approximately solving the linear system in this way is called the Inexact Newton method, and although it can increase the non-linear outer iteration number, it reduces the over all C P U time as considerable computation time is saved by not computing the exact solution of the linear system [64, 18, 45, 51]. Notice that the left hand side is not quite exact because of the truncation error in the matrix-vector multiplications and possibly is polluted by round off error, especially for higher-order terms. Therefore it is preferable to give up the quadratic convergence instead of increasing the solution C P U time. Chapter 6 R e s u l t s (I): Verification Cases The first step in accuracy verification of an unstructured C F D solver is to verify the accuracy of the reconstruction procedure. For this purpose two complete sets of the reconstruction tests including straight and curved boundaries are presented here. Then to study the correctness, basic performance and solution accuracy of the proposed higher-order unstructured Newton-Krylov solver, smooth subsonic and supersonic cases with known solutions have been investigated. This chapter is devoted to numerical study of these test cases. 6.1 Reconstruction Test Cases The objective here is to reconstruct a test function over a domain with different mesh resolutions, and to measure the error between the reconstruction of the test function and the exact value of the test function for discretized meshes. Two different domains have been chosen for examining the over all accuracy of the reconstruction procedure. The first test case is a unit square (a case with straight edges) and the second test case is an annulus which has two curved edges. The test function is a smooth analytical function described by E q . (6.1). f(x, y) = 1.0 + 0.1 sin(7nc) sin(7ry) (6.1) The first step in reconstruction is computing the mean value for a control volume, implying that we need to be able to integrate a given function over any control volume (including boundary triangles with curved edges) at least as accurately as the nominal discretization 90 CHAPTER 6. RESULTS(I): 2nd-order VERIFICATION Ratio Lx 0.0001667 Mesh 1/460 CVs Mesh 2/1948 CVs 4.551149e-5 Mesh 3/7856 CVs 1.15364e-5 Mesh 4/31668 CVs 2.92207e-6 — 3.66 3.94 3.95 91 CASES L Ratio 2 0.0002948 8.73185e-5 2.27180e-5 5.81718e-6 — 3.37 3.84 3.9 Loo 0.0017002 0.0004369 0.0001099 2.75408e-5 Ratio — 3.89 3.97 3.99 Table 6.1: 2nd-order error norms for the square case order. The integration of a given function over the area of a control volume is carried out using Gauss quadrature rule, Eq. (6.2), where Wj is the Gauss weight. While this procedure looks straightforward, determining Gauss point locations and their associated weights robustly for higher-order integration over an area with curved edges are proved to be a nontrivial task [95]. n f(x,y)dA = Y,W f(x ,y ) j j j (6.2) In this research, for the square case a 6th-order integration routine using 7 Gauss points and for the annulus case a 4th-order integration routine using 10 Gauss points (developed by Dr. Carl Ollivier-Gooch in ANSLib frame work [58]) have been employed. All the grids are generated using GRUMMP Version 0.3.2 [59] which generates high quality triangular meshes for domains with curved boundaries [19]. 6.1.1 Square Test Case Four unstructured meshes shown in Fig (6.1) were generated for a unit square domain. The density of control volumes is increased 4 times at each mesh with respect to the previous mesh level. The mesh length scale is not uniform, with control volumes at the corners clearly having larger length scales than the control volumes in the interior of the square domain. The error norms and the ratio of error norms are tabulated in Table 6.1 through Table 6.3. Despite non-uniform mesh distribution, all the error norm ratios for the Mesh 4 confirm the nominal accuracy of the reconstruction. Fig (6.2) displays the Error-Mesh plot for the square case. Notice that the error norm is reduced considerably by increasing the order of accuracy, and the slope of the graph proves the reconstruction accuracy. Asymptotic convergence of the error norms are clearly observed by examining the tabulated data and Fig (6.2) for this series of meshes. CHAPTER 6. RESULTS(I): VERIFICATION CASES ir r r r r r A ^i^^i? r A A A A xrj&w A A ••y^^^^WJBFjaagwm r &W r A A A WA tt^tttttt && *$Kd&K*&' £>'j&W -"i^m^WWU^W^ta f A 0 A 0.4 0.2 0.6 X 0.8 (a) 460 Control Volumes (b) 1948 Control Volumes (c) 7876 Control Volumes (d) 31668 Control Volumes 1 F i g u r e 6.1: U n s t r u c t u r e d meshes for a square d o m a i n Li Ratio M e s h 1/460 C V s 3.54269e-5 — M e s h 2/1948 C V s 5.18616e-6 M e s h 3/7856 C V s 6.92254e-7 M e s h 4/31668 C V s 8.96936e-8 3rd-order L Ratio Loo Ratio 4.82035e-5 — 0.0002061 — 6.83 7.59731e-6 6.34 3.47214e-5 5.9 7.50 1.01755e-6 7.46 4.71759e-6 7.36 7.70 1.31888e-7 7.70 6.09472e-7 7.74 2 T a b l e 6.2: 3rd-order error norms for the square case CHAPTER 6. RESULTS(I): VERIFICATION 4th-order Li M e s h 1/460 C V s 3.98124e-6 M e s h 2/1948 C V s 3.30234e-7 Ratio — 12.06 CASES L Ratio 2 9.87666e-6 8.20473e-7 — Loo Ratio 0.0001084 — 12.03 8.39794e-6 12.90 15.30 15.82 M e s h 3/7856 C V s 2.11865e-8 15.60 5.32239e-8 15.40 5.49060e-7 M e s h 4/31668 C V s 1.35160e-9 15.70 3.40196e-9 15.65 3.4699e-8 Table 6.3: 4th-order error norms for the square case To 3 W~ Number of Control Volumes Figure 6.2: E r r o r - M e s h plot for the square case 10 5 CHAPTER 6. RESULTS(I): VERIFICATION 2nd-order Li 0.00087885 Mesh 1/427 CVs Mesh 2/1703 CVs 0.0001974 Mesh 3/6811 CVs 461658e-5 Mesh 4/27389 CVs 1.09032e-5 ; Ratio — 4.45 4.28 4.23 94 CASES L 2 0.00143183 0.0003387 8.01761e-5 1.85523e-5 Ratio — 4.23 4.22 4.32 Loo 0.0091068 0.0035145 0.0008615 0.0002311 Ratio — 2.59 4.08 3.72 Table 6.4: 2nd-order error norms for the annulus case 3rd-order Li 0.00037211 Mesh 1/427 CVs Mesh 2/1703 CVs 4.86257e-5 Mesh 3/6811 CVs 6.15972e-6 Mesh 4/27389 CVs 7.53171e-7 Ratio — 7.65 7.89 8.17 L 2 0.00051522 7.05354e-5 8.83369e-6 1.05894e-6 Ratio — 7.3 7.98 8.34 L>oo 0.0020959 0.0003919 6.93723e-5 9.78171e-6 Ratio — 5.34 5.65 7.09 Table 6.5: 3rd-order error norms for the annulus case 6.1.2 Annulus Test case This is the case that includes curved boundaries. The domain inside an annulus is discretized with four mesh densities. The Geometry and meshes are shown in Fig (6.3). Again refinement is performed using global parameters and except for the center portion of the annulus, refinement has been carried out relatively uniformly. The boundary region control volumes which normally introduce a major part of the reconstruction error still remain larger than the interior control volumes. The error norms and their ratio for all orders of reconstruction are presented in Table 6.4 to Table 6.6. l\ and l norms have converged 2 after 3 level refinement. Zoo norm has not converged as fast as the other two norms. Since Zoo is the local error indicator and presents the maximum error in the reconstruction, the slower convergence rate for Zoo often is expected. The nominal order of accuracy for Zi and l are achieved through the employed 4 level meshes. The ratios for Zoo are also fairly close 2 to the nominal order of accuracy. Fig(6.4) demonstrates the reduction in Zi norm of the error versus control volume density. The accuracy of the reconstruction also can be verified by measuring the slope of this Error-Mesh plot. In addition, such a plot can give us a good estimation of the average solution error for a certain mesh density. In other words we can determine how many control volumes do we need to reach a certain level of error based on the chosen discretization order. It is interesting to note that the l\ norm of the reconstruction error on the finest mesh (for this test function) is 15 times smaller for the 3rd-order discretization and more than 500 times smaller for the 4th-order discretization than the error for the 2nd-order discretization. CHAPTER 6. RESULTS(I): VERIFICATION CASES (c) 6811 Control Volumes Figure 6.3: Mesh Mesh Mesh Mesh (d) 27389 Control Volumes U n s t r u c t u r e d m e s h e s for a c u r v e d d o m a i n Lx 4th-order 95 Ratio L Ratio 2 (annulus) Loo Ratio 1/427 C V s 8.03432e-5 0.0016860 _ 2/1703 C V s 5.87669e-6 13.67 1.11776e-5 12.16 0.0001678 10.04 3/6811 C V s 3.64156e-7 16.14 7.14849e-7 15.63 1.42043e-5 11.82 4/27389 C V s 2.09949e-8 17.3 3.88537e-8 18.39 1.15369e-6 12.31 Table 6.6: 0.00013598 — 4th-order error norms — for t h e a n n u l u s case CHAPTER 6. RESULTS(I): VERIFICATION CASES Number o f Control Volumes Figure 6.4: Error-Mesh plot for the annulus case CHAPTER 6. RESULTS(I): VERIFICATION CASES 97 500h 400h 300 h 200 h 100 k 300 (a) Figure 6.5: Circular domain over half a cylinder, Mesh 1 (1376 CVs) 6.2 Subsonic Flow Past a Semi-Circular Cylinder The smooth inviscid flow over a semi-circular (half) cylinder with R = 1, at MQO = 0.3 is computed. The flow direction is from left to the right and the angle of attack is zero. The far-field boundary is located at R = 300 to make sure that all perturbations are damped effectively (Fig (6.5)). Mesh CVs) are generated for this purpose; close-ups are shown in Fig (6.8). 3 (22844 Three different meshes, Mesh 1 (1376 CVs), Mesh 2 (5539 CVs), and (6.6) to Fig A special effort has been made in the grid refinement process in order to achieve a relatively self-similar refinement through the whole solution domain. Each mesh is nearly four times denser than its immediate coarser level mesh implying that the mesh length scale in all parts of the domain has been reduced approximately by a factor of two (section (3.3)). CHAPTER 6. RESULTS(I): VERIFICATION 98 CASES Figure 6.6: Circular cylinder, Mesh 1 (1376CVs) Mesh CVs Max AcVi Min AcVi 1 2 3 1376 5539 22844 1475.98 516.15 125.98 0.00360815 0.000853626 0.000240918 V f Amax, Amin 639.6 777.6 723.2 V / {Amax^coarse {A-max) fine V / (Amin)coarse {Amin)fine — 1.69 2.02 2.06 1.88 Table 6.7: Sizes and ratios of the control volumes for circular cylinder meshes Table 6.7 summarizes a brief information about sizes of the control volumes in each mesh level. The length scale of each control volume can be assumed to be proportional to the square root of the control volume area. By looking at the tabulated data, it is clear that in each mesh level there is a wide variety of length scales which largely changes across the solution domain. The table also shows the approximate refinement factor from one coarse level to its immediate finer level. Assuming that the refinement is done uniformly (which is not the case in reality), the length scale of the mesh is not exactly reduced by a factor of two in each refinement. Since, we do not have a uniform mesh to start with, we are not able to uniformly refine the mesh, and reduce the length scale of the control volumes in an orderly fashion through whole solution domain (especially at boundary regions) it is not too surprising that the measured order of the accuracy will be somewhat away from the nominal expected order, section (3.3). CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.8: C i r c u l a r cylinder, M e s h 3 (22844 C V s ) CHAPTER 6. RESULTS(I): VERIFICATION 100 CASES The potential (incompressible) flow solution is used as an initial solution for all the meshes and orders of accuracy. Since the solution is started from a good initial guess, no start-up procedure is needed. Newton iterations are performed using matrix-free GMRES with the maximum Krylov subspace size of 30. However for the sake of the robustness, we keep the -£t term and the time advance formula with large CFL number is applied, Eq. (2.12). To reduce the cost of the inner iterations, where multiple residual perturbations are required, a subspace of 20 is employed for early outer iterations. When the l norm of the non-linear 2 residual is dropped below 5 x 1 0 -9 the subspace of 30 is used. This is especially helpful for the 4th-order case where the residual computation is very expensive. The tolerance of the linear system solver is chosen to be 1 x 1 0 -2 of the l norm of the non-linear residual which 2 often is not reached within the allowable inner iterations. The convergence criterion for the steady-state solution is 1 x 1 0 -12 for the l norm of the non-linear residual. In the case of 2 Mesh 1 and 2, CFL=5000 for the 2nd and 3rd-order discretizations and CFL=1000 for the 4th-order discretization are used for starting the solution from the initial guess and it is increased to CFL=10,000 when the non-linear residual is reduced by one order. For Mesh 3 the starting C F L is 100 for all orders of accuracy which is increased to CFL=5000 gradually, and after one order reduction in residual, it is increased to CFL=10,000. The preconditioner matrix for all cases is the approximate analytic Jacobian and the preconditioning strategy is ILU(4). The solution convergence history for the coarse mesh (Mesh 1) and the fine mesh (Mesh 3) are shown in the Fig (6.9) and Fig (6.10). As it is evident full convergence is achieved, but the CPU time for the 4th-order case is much larger than the other two orders of accuracy. This was partially expected since the 4th-order discretization requires more operations, and the cost of the reconstruction rises quadratically by increasing the discretization order. However, a considerable part of the computing cost is due to noticeable increase in outer iteration numbers. Furthermore, the linear system arising from the 4th-order discretization is ill-conditioned and difficult to solve which results in relatively ineffective linearization of the non-linear problem (considering the fixed subspace size and Inexact Newton approach) demanding more outer iterations. On the contrary, both the 2nd and 3rd-order cases quickly converge displaying the effectiveness of the linearization. Pressure coefficient contours (banded) for all orders of accuracy (Mesh 1) are shown in the Fig (6.11) to Fig (6.13). The quality of the contours, which reflect the smoothness of the reconstructed solution, are visibly improved by increasing the order of accuracy of the solution . The difference between the contours' quality is clearer in regions where control 1 Visualization is done by Tecplot 360 using 3 nodal triangle finite element format; consequently although solution is higher order its visual representation is only second order. 1 6. RESULTS(I): 1 o' ' 13 1 VERIFICATION 1 1 1 1 1 O 1 CASES 1 1 1 1 1 50 100 CPU-Time (Sec) 1 1 1 1 150 F i g u r e 6.9: Convergence history for the coarse mesh ( M e s h 1) R • 0 F i g u r e 6.10: i a ' i I 500 i i i i I—i—i—i—i—I—L 1000 CPU-Time (Sec) 1500 Convergence history for the fine mesh ( M e s h 3) CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.12: 3rd-order pressure coefficient contours, M e s h 1 CHAPTER 6. RESULTS(I): VERIFICATION CASES 103 F i g u r e 6.13: 4th-order pressure coefficient contours, M e s h 1 volumes are quite coarse. T h e 4th-order pressure coefficient contours over M e s h 3 are also shown i n F i g (6.14) as a reference for the quality comparison w i t h previously mentioned cases. Since flow is inviscid a n d the geometry is symmetric, the pressure contours are s y m m e t r i c respect to the normal axis passing through the center of the cylinder. Front a n d rear stagnation regions where the m a x i m u m pressure occurs (C =1.0 ) a n d the suction peak p on the top of the cylinder are displayed i n the contours. T h e pressure coefficient distribution along the surface of the cylinder (Gauss point d a t a over the circle) is shown i n the F i g (6.15) to F i g (6.17). In terms of the solution, all discritization orders predict nearly the same pressure coefficient over the cylinder. In general a n d for a s m o o t h flow that is what is expected since results should differ w i t h i n the orders of the mesh size. In the 2nd-order case, when the mesh is coarse, the suction peak is not recovered smoothly nor is it located quite at the top of the cylinder; however, these issues are considerably i m p r o v e d for the 3 r d a n d 4th-order solution over the same mesh. T h e 4thorder case over the coarse mesh shows a slight over-prediction b o t h at the front stagnation point a n d at the suction peak compared to the same solution over the fine mesh. Finally, solution over the fine mesh for all orders of accuracy is displayed i n F i g (6.17) showing that all of t h e m essentially are converged to the same pressure d i s t r i b u t i o n such that the difference between t h e m is not visible to plotting accuracy. CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.14: 4th-order pressure coefficient contours, M e s h 3 2nd-Order/Meshl 3rd-Order/Meshl l_l -1 i i i i I -0.5 i i i i L_i 0 X L_J i I 0.5 i i i i—L 1 F i g u r e 6.15: Pressure coefficient along the axis CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.16: Close u p of the pressure coefficient along the axis (suction region) 2nd-Order/Mesh3 3rd-Order/Mesh3 -1 -0.5 0 0.5 1 F i g u r e 6.17: Pressure coefficient along the axis, M e s h 3 CHAPTER 6. RESULTS(I): VERIFICATION 2nd-order Li Mesh 1/1376 CVs 0.000553 Mesh 2/5539 CVs 9.4548e-5 Mesh 3/22844 CVs 2.0492e-5 106 CASES Ratio — 5.85 4.61 Loo 0.013409 0.003713 0.00129 Ratio — 3.61 2.88 Table 6.8: Error norms for total pressure, 2nd-order solution 3rd-order Li 0.000426 Mesh 1/1376 CVs Mesh 2/5539 CVs 3.5353e-5 Mesh 3/22844 CVs 6.0965e-6 Ratio — 12.05 5.8 Loo 0.010250 0.002030 0.000543 Ratio — 5.05 3.74 Table 6.9: Error norms for total pressure, 3rd-order solution As flow is inviscid and subsonic (shock free), the total pressure of the flow should be preserved everywhere, and it is equal to the total pressure of the free-stream at the far-field. Therefore deviation of the total pressure ratio from one can be interpreted as an error indicator for these type of flows (E = — 1). The discrete norms of the errors in total pressure ratio of control volume averages are computed for all orders of accuracy and meshes and are tabulated in the Table 6.8 to Table 6.10. The l\ norm, a global measurement for error, is roughly half an order less than the nominal order of the discretization for the 3rd-order, and it is about right for the 4th-order discretization. The loo norm, a local measurement for maximum error, follows the same trend and it is about one order less than the nominal norm of the discretization for the 3rd-order solution. The total pressure error also is shown in Fig (6.18) to Fig (6.21) over the coarse mesh for all orders of accuracy and over the fine mesh for the 4th-order discretization (as a reference for comparison). These pictures demonstrate that the surface boundary as the main source of the error and show how different discretization orders under-predict and/or over-predict the total pressure in stagnation regions and at the maximum acceleration area (top of the cylinder). For instance the 2nd-order and 3rd-order discretizations slightly under-predict the total pressure over nearly all the boundary surface, and this under-prediction reaches its peak at the rear stagnation region. The 4th-order method slightly over-predicts the total pressure nearly everywhere along the boundary surface which was consistent with the 4th-order pressure coefficient along the boundary. By refining the mesh and/or increasing the accuracy order the error in total pressure improves dramatically (Fig (6.21)). The Error-Mesh plot for Zi, Fig (6.22), demonstrates the total pressure error reduction versus mesh size. It should be noted that despite the deviation of the measured accuracy order from its nominal value, the higher-order discretization still produces noticeably smaller error. For instance in the case of the 3rd-order solution over the finest mesh, where the CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.18: E r r o r in total pressure ratio, 2nd-order discretization, M e s h 1 F i g u r e 6.19: E r r o r i n total pressure ratio, 3rd-order discretization, M e s h 1 CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.21: Error in total pressure ratio, 4th-order discretization, Mesh 3 6. RESULTS(I): VERIFICATION CASES 4th-order u M e s h 1/1376 C V s 0.000535 M e s h 2/5539 C V s 2.7473e-5 19.47 0.001097 13.5 M e s h 3/22844 C V s 2.0954e-6 13.11 8.1810e-5 13.4 Foo Ratio Ratio 0.0148117 — — T a b l e 6.10: E r r o r norms for total pressure, 4th-order solution -• •A j l 1 CVs i l i 10000 i i 2nd-Order 3rd-Oider 4th-Order i i i i iii i i 20000 30000 Figure 6.22: E r r o r - M e s h plot for the total pressure CHAPTER 6. RESULTS(I): VERIFICATION CASES 110 Figure 6.23: Drag coefficient versus mesh size largest deviation from the nominal accuracy occurs, the l\ norm is 3.4 times smaller than the l\ norm of the 2nd-order solution on the same mesh. The Drag coefficient can also be used for error measurement since the exact drag coefficient for the inviscid subsonic flow over circular cylinder is zero. Figure (6.23) displays the drag reduction versus mesh size. This time the measured accuracy order is somewhat better than the nominal order showing some error cancellation due to symmetry of the solution. As expected higher-order discretization results in much smaller drag, which is closer to the exact solution. To show the benefits of higher-order discretization, it is useful to assess the performance of a higher-order solution versus its accuracy. This can be perfectly done for the circular cylinder case. The C P U time of the solution versus the drag coefficient is shown in F i g (6.24). Assume that the picture shown is the solver performance characteristic graph for all meshes, and by changing the mesh size (within the range of the interest) the behavior of the performance graph does not change. For a desired accuracy, for instance CD = 1 0 - 4 the 3rd-order solution has the minimum C P U time. For a fixed solution time of 100 seconds again the 3rd-order case results in the minimum drag. If a large solution time is bearable, the 4th-order case provides us the best solution accuracy, since its C r j / C P U time slope is CHAPTER 6. RESULTS(I): VERIFICATION CASES 111 CPU-Time (Sec) Figure 6.24: D r a g coefficient versus C P U time sharper t h a n the other two orders of accuracy. T h i s implies that if a very accurate solution is needed it is more efficient to increase the discretization order rather t h a n using a finer mesh. T h i s comparison is performed for a simple problem, as opposed to a practical engineering case, a n d at same time engineering accuracy requirements are typically limited. However, this case study still provides some interesting insight into the prospects for higher-order methods. CHAPTER 6. RESULTS(I): VERIFICATION CASES 112 Figure 6.25: Annulus, Mesh 1 (108 CVs) 6.3 Supersonic Vortex Supersonic flow inside an annulus geometry is studied. This flow is isentropic (shock free) and it is a standard test case for accuracy measurement. The inner and outer radii are 2.0 and 3.0 respectively, and the inlet Mach number at the inner radius is equal to 2.0. Five different meshes (Mesh 1 to Mesh 5) are employed in this test case (Fig (6.25) to Fig (6.29)). Each mesh level has 4 times more control volumes than the immediate coarser level and uniform refinement has been applied in the grid generation. Notice that meshes are still irregular unstructured meshes and any finer level is generated completely independent from the previous coarser level. The exact solution for supersonic vortex can be found in Aftosmis and Berger [5]. Having the exact solution provides us a direct option for accuracy measurement of the numerical solution. The exact solution for the supersonic vortex is described in Eq. (6.3). CHAPTER 6. RESULTS(I): VERIFICATION CASES Figure 6.27: Annulus, Mesh 3 (1703 CVs) Figure 6.29: Annulus, Mesh 5 (27389 CVs) CHAPTER 6. RESULTS(I): Ri R VERIFICATION 115 CASES = 2.0, Mi = 2.0, pi = 1.0 = y/x + y 2 2 > = 4 i ( -f) < +1 U Ri = RiMi{ ) Pi 1 1 M 2 (6.3) 6.3.1 Numerical Solution The exact solution is used as an initial solution, which does not exactly satisfy the discretized equations due to truncation error. Starting a numerical solution from its exact solution is unrealistic in general, as if the exact solution was known, there would be no need for numerical solution. However this is a special case for efficiency study of the developed Newton-Krylov solver, which excludes the start-up issue completely. This way we can show how the efficiency of our matrix-free approach is affected purely by discretization order when the best starting solution is used. Newton iteration (infinite time step) is performed for all cases. An approximate analytic Jacobian with ILU(4) is used for preconditioning. The convergence criterion, as in section (6.2), is L2(Res(U)) = 1 x 1 0 -12 and the same subspace size has been used. Fig (6.30) and Fig (6.31) show the convergence history for the Mesh 1 and Mesh 5 in terms of CPU time. Since the solution is started from a good initial guess superlinear convergence is achieved from the very first iteration both for the 2nd and 3rd-order discretizations. The 4th-order case (as expected) is much slower than the other two and it requires more outer iterations (2 more outer iterations for the fine mesh) to reach full convergence. This shows that even when the solution is started from the best possible initial guess (i.e. exact solution), the 4th-order discretization results in a complicated linear system which requires considerable effort to solve. The solution Mach contours over the coarse mesh for all orders of accuracy are displayed in Fig (6.32) to Fig (6.34). The flow Mach number is supposed to be constant along any given radius and decreases with increasing radius. In other words, flow at smaller radius has larger acceleration with the maximum occurs along the inner wall boundary. By inspecting the contour lines, it is clearly visible that the quality of the solution using the higherorder discretization has been improved noticeably. This improvement is most visible close to the inner wall portion. This shows that, especially in regions where the solution has CHAPTER 6. RESULTS(I): ' 0 1 VERIFICATION I I 1 1 1 1 1 CASES 1 1 1 1 116 —I 2 I I L 3 CPU-Time (Sec) F i g u r e 6.30: Convergence history for the coarse mesh (Mesh 1) 0 50 100 150 CPU-Time (Sec) 200 250 F i g u r e 6.31: Convergence history for the fine mesh ( M e s h 5) CHAPTER 6. RESULTS(I): VERIFICATION CASES 117 F i g u r e 6.32: 2nd-order M a c h contours for the coarse mesh ( M e s h 1) large inadequately resolved derivatives (but is continuous), linear reconstruction of the flow quantities is not the o p t i m a l choice. F i g (6.35) to F i g (6.37) display the density error i n the solution over the coarse grid. T h e scale of the error m a p is the same for all discretization orders p r o v i d i n g a reasonable sense of how the solution error decreases when a higher-order discretization is employed. As expected the boundaries are the original source of the error a n d the m a x i m u m error occurs at the inner wall. T h e 4th-order density solution over the fine grid is shown in F i g (6.38) as a reference. T h e behavior of the density contours is opposite to the behavior of the M a c h contours, a n d density reaches its m a x i m u m value at the outer wall. 6.3.2 Solution accuracy measurement T h e density of the solution is used for the accuracy measurement. E r r o r norms are tabulated in T a b l e 6.11 to T a b l e 6.13 for all set of grids a n d discretization orders. T h e m a x i m u m error n o r m (loo) is about one order less t h a n the n o m i n a l order of accuracy due to the b o u n d a r y error. T h e global n o r m , fg, however, converges to the n o m i n a l order of accuracy for all cases. Hence the over all accuracy of the computed solution is consistent w i t h the discretization CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.34: 4th-order M a c h contours for the coarse mesh (Mesh 1) CHAPTER 6. RESULTS(I): VERIFICATION CASES F i g u r e 6.35: 2nd-order density error for the coarse mesh ( M e s h 1) F i g u r e 6.36: 3rd-order density error for the coarse mesh ( M e s h 1) CHAPTER 6. RESULTS(I): Figure 6.38: VERIFICATION CASES Density, 4th-order solution over the fine m e s h (Mesh 5) CHAPTER 6. RESULTS(I): VERIFICATION Mesh CVs 1 2 3 4 5 108 427 1703 6811 27389 Lx 0.014872 0.003847 0.001073 0.000258 6.6334e-5 121 CASES Ratio — 3.86 3.59 4.15 3.89 Loo 0.097239 0.036135 0.012825 0.005168 0.002019 Ratio — 2.69 2.82 2.48 2.56 Table 6.11: Solution error norms, 2nd-order discretization Mesh CVs 1 2 3 4 5 108 427 1703 6811 27389 Li 0.044813 0.000976 0.000139 2.4415e-5 3.3202e-6 Ratio — 4.59 7.0 5.69 7.35 Loo 0.028904 0.011439 0.002225 0.000634 0.000136 Ratio — 2.53 5.14 3.51 4.66 Table 6.12: Solution error norms, 3rd-order discretization order. The error convergence plot for the l\ norm is shown in F i g (6.39) for graphical solution accuracy verification. Again here, some accuracy-efficiency analysis can be carried out. The solution C P U time versus the Error norm is plotted in F i g (6.40). For a fixed C P U time, for example ( C P U time = 10 Sec), the 3rd-order performance characteristic line generates the minimum error. For a fixed error level, l\ = 1 x 1 0 , the 4th-order performance characteristic line has - 5 the minimum computation time, although on that error level the 3rd and 4th-order lines are very close to each other. In all cases the 2nd-order characteristic line is totally out performed by higher-order counterparts. It should be mentioned that a similar performance characteristic graph with the same trend, but different C P U time, can be achieved when the solution starts from a constant Mach number as an initial guess with a start-up process. The supersonic vortex is an internal flow; therefore, the accuracy of the solution for whole domain was measured and studied. If one modifies the error quantification, or limits the Mesh CVs Lx 1 2 3 4 5 108 427 1703 6811 27389 0.002199 0.000410 2.3761e-5 2.0200e-6 1.2514e-7 Ratio — 5.36 17.25 11.76 16.14 Loo 0.01632 0.005102 0.000546 8.3718e-5 1.0024e-5 Ratio — 3.20 9.34 6.52 8.35 Table 6.13: Solution error norms, 4th-order discretization CHAPTER 6. RESULTS(I): VERIFICATION CASES 122 Figure 6.39: Error-Mesh plot for the solution (Density) importance of the solution to a specific part of the domain, the performance characteristic graph may change considerably. Therefore it is crucial to know what is important in a flow field, and how the error is defined and quantified. A l l of these are application oriented and vary from case to case. However, the studied test case clearly shows the potential performance advantages of higher-order discretizations. CHAPTER 6. RESULTS(I): VERIFICATION F i g u r e 6.40: CASES E r r o r versus C P U T i m e Chapter 7 Results (II): S i m u l a t i o n Cases In this chapter, the start-up procedure, fast convergence, robustness and overall performance of the proposed higher-order unstructured Newton-Krylov solver have been studied in detail for subsonic, transonic, and supersonic flows. 7.1 Subsonic Airfoil, N A C A 0012, M = 0.63, a = 2.0' To study the convergence and robustness of the proposed Newton-Krylov solver with a higher-order unstructured discretization, here a subsonic case is presented which includes most of the features of the solver's performance. The convergence characteristics are investigated for a series of meshes. Five different meshes with an O-Domain from a coarse to a relatively fine mesh have been used (Table 7.1 and Fig (7.1) to Fig(7.5) ). All meshes have proper refinement at the leading and trailing edges. The far field is located at 25 chords and characteristic boundary conditions are implemented implicitly. Mesh No. of CVs along the chord (upper side/lower side) Total No. of CVs 1 2 3 4 5 61/58 101/99 127/127 198/192 260/253 1245 2501 4958 9931 19957 Table 7.1: Mesh detail for NACA 0012 airfoil 124 CHAPTER 7. RESULTS(II): SIMULATION CASES X Figure 7.1: N A C A 0012, Mesh 1, 1245 C V s 0.4 0.2 >< 0 -0.2 -0.4 Figure 7.2: N A C A 0012, Mesh 2, 2501 C V s CHAPTER 7. RESULTS(II): SIMULATION CASES 0.4 0.2 > 0 -0.2 -0.4 0.4 0.2 > 0 -0.2 -0.4 Figure 7.4: N A C A 0012, Mesh 4, 9931 C V s CHAPTER 127 7. RESULTS(II): SIMULATION CASES Figure 7.5: NACA 0012, Mesh 5, 19957 CVs 7.1.1 Solution Process The tolerance in solving the linear system for the start-up phase is 5 x 10 ~ and for the 2 Newton part is 1 x 10 . For all test cases a subspace of 30 has been set and no restart -2 is allowed. The preconditioning for the start-up pre-iterations is performed by employing the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration the first-order finite difference Jacobian matrix with ILU-4 factorization is used. The Newton iteration is matrix-free and e — |*jr with £rj = 1 x 10 -6 is used for directional differencing. The initial condition is the free stream flow. No attempt for optimizing the start-up process is made. The solution starts with 30 preiterations in the start-up process to reach a good initial solution before switching to Newton iterations. The C F L starts at 2.0 and is increased gradually to CFL=100 for the first 15 pre-iterations which are performed withfirst-orderaccurate flux evaluation. The remaining 15 pre-iterations are performed in the form of the defect correction with constant C F L of 100, where thefirst-orderJacobian is used both for constructing the left hand side and for preconditioning the linear system. The right hand side (Eq. (5.27)) is evaluated to the correct order of accuracy. The cost of each pre-iteration includes one first order Jacobian CHAPTER 7. RESULTS(II): 1 b ' I 0 SIMULATION 0.2 I I 0.4 CASES I i X/C I I 0.6 I 128 I I I 0.8 l l—I—L 1 F i g u r e 7.6: C p over the upper surface after start-up, M e s h l (1245 C V s ) evaluation a n d its incomplete factorization, one flux evaluation, a n d one system solve using G M R E S , which is not matrix-free since the Jacobian m a t r i x is available explicitly. E x p e r i m e n t s for subsonic flow have shown that a reasonable starting point for N e w t o n iteration c a n be easily achieved b y a relatively small number of pre-iterations, a n d there is no need to decrease the residual by a significant factor. O n l y a rough physical solution over the airfoil is good enough for starting Newton iterations. It is also interesting to look at the solution after finishing the start-up and right before switching to Newton iteration. F i g (7.6) displays the pressure coefficient over the upper surface of the airfoil at the e n d of the startup process for M e s h 1, which is the coarsest mesh. T h e approximate physical solution has been reached for all discretization orders a n d the suction peak is nearly resolved. A l t h o u g h the suction peak is recovered better in the start-up process for higher-order discretizations, the difference is small a n d for the start-up where only an approximate solution is needed, the 2nd-order defect correction generates a reasonably good solution to start the N e w t o n iterations for all discretization orders. Considering the difference i n cost of the 2nd-order residual evaluation compared to the 3rd and 4th-order residual evaluations, it makes sense to use the 2nd-order defect correction scheme for start-up. A similar graph is shown for the finest mesh (Mesh 5) after start-up process in F i g (7.7). CHAPTER 7. RESULTS(II): SIMULATION CASES 129 F i g u r e 7.7: C p over the upper surface after start-up, M e s h 5 (19957 C V s ) W h i l e the general trend of the pressure coefficient over the airfoil is recovered, the suction peak has not been properly resolved yet. T h a t is not surprising since due to the mesh volume, the diversity of mesh length scales, a n d the local time stepping approach, solution time evolution is far from completion a n d more pre-iterations are needed to resolve the suction peak. T h e noticeable point is the violent oscillations of the 4th-order discretization a r o u n d the suction region in the course of the solution evolution. T h i s oscillation a r o u n d the extrema is normal, a n d it is the case b o t h for the 2nd a n d 3rd-order discretizations. B u t for the 4th-order case due to low dissipation in the fourth-order scheme, oscillations are quite vigorous. These oscillations could be a source of instability i n the start-up process and indeed the 4th-order case is sensitive in the early period of solution process. Since the cost of the higher-order start-up is more than the 2nd-order one, a n d it could generate noisy solution state (before switching to Newton iteration), it is more robust to p e r f o r m all the defect correction part w i t h the 2nd-order residual evaluation. A similar approach is taken for all meshes of the current test case, where the defect correction iterations are performed w i t h 2nd-order accuracy. After start-up, the solution process is switched to Newton iteration a n d a n infinite C F L is employed. T o be able to compare the computing cost for different meshes, a work unit CHAPTER 7. RESULTS(II): SIMULATION 130 CASES has been used, which is simply equivalent to the cost of one residual evaluation for the corresponding order of accuracy. It is obvious then that two different meshes have two different work units in terms of CPU time. Convergence is reached (solution process is stopped) when the L norm of the non-linear density residual falls below 1 x 1 0 -12 2 . Table 7.2 shows the convergence summary for the 2nd, 3rd and 4th-order discretizations in terms of total number of residual evaluations, total CPU time, total work units, number of Newton iterations, and the cost of Newton phase in terms of work units. For all meshes, after 30 pre-iterations, the solution has converged after a few Newton iterations. The total number of work units increases as finer meshes are used. The linear system arising from a denser grid is more difficult to solve than a similar system arising from a coarser grid. Using a constant subspace size without restart becomes less and less effective in solving the linear system as the size and complexity of the system increases resulting in reducing the linearization effectiveness. Consequently more outer (Newton) iterations are needed to reduce the non-linear residual where the mesh size or discretization order increases. Also the solution state after the start-up for the fine mesh is not as close as it was to the steadystate solution for the case of coarse mesh and this is another contributing factor for slowing the Newton convergence on the fine mesh. Notice that the total work unit has increased roughly by a factor of 2.5 while the mesh size has increased by a factor of 16. For a similar subsonic case, the total computation cost in terms of work units for one of the fastest and most efficient available Newton-Krylov algorithm (2nd-order/artificial dissipation) for unstructured meshes [45] is just under 300. This confirms that the performance of the current developed Newton Krylov is quite comparable with the world class standard algorithms. However as it will be noted later a combination of mesh sequencing and multigrid techniques are needed to ensure such excellent performance as the size of the mesh increases. Figure (7.8) displays the the solution CPU time versus grid size in logarithmic scale. The computing cost in terms of CPU time follows a power law relation, t = m(Ncv) k , with the mesh size and all discretization orders show more or less the same trend (k d = 2n 1.131, k^rd — 1.021 and k^h = 1-163), showing the scalability of the developed solver C P U time with the mesh size. This graph emphasizes the need for parallel computing and multigrid method in the case of large meshes. Figure (7.9) and Fig (7.10) show the total work units and the Newton phase cost in terms of the work unit. The 2nd and the 3rd-order solution have nearly the same Newton phase cost in terms of the work unit. The difference between these two is mainly due to the start-up phase cost which for small meshes could be considerable part of the total solution cost. For the 4th-order case, the cost of the start CHAPTER 7. RESULTS(II): SIMULATION 131 CASES Test Case No. of Res. T i m e (Sec) Work U n i t No. of Newton Itr. Newton Phase Work U n i t Mesh 1 2nd 3rd 4th 100 132 244 5.76 8.87 27.09 240.0 197.1 258.0 3 4 7 96.7 122.4 226.8 Mesh 2 2nd 3rd 4th 121 136 283 12.88 17.71 58.15 280.0 213.4 312.6 3 4 8 118.3 128.2 274.8 Mesh 3 2nd 3rd 4th 126 147 247 26.88 36.03 90.54 349.1 248.5 289.3 3 4 7 136.1 141.2 239.2 Mesh 4 2nd 3rd 4th 158 158 318 60.39 73.98 217.60 399.9 276.0 371.4 4 4 9 182.8 159.0 317.8 Mesh 5 2nd 3rd 4th 254 286 542 164.10 225.87 682.0 562.0 456.3 639.8 7 8 16 325.3 321.8 577.1 Table 7.2: Convergence summary for N A C A 0012 airfoil, M = 0.63, a = 2° CHAPTER 7. RESULTS(II): SIMULATION CASES 132 Figure 7.8: C P U time versus the grid size, N A C A 0012, M = 0.63, a = 2° up is a minor part of the total solution cost since the 4th-order gets cheaper scaled start-up than 3rd (than 2nd) due to normalization of the solution cost. It should be mentioned that the start-up technique is neither unique nor optimized for these series of test cases and its cost should be studied separately. If a reasonably good pre-computed initial solution is available (e.g. full or incompressible potential solution, exact solution of some approximate problem or an empirical solution) then the cost of the start-up phase can be ignored and in practice the total solution cost would be equal to the Newton phase cost. The start-up methodology in this research is reasonably efficient for small and moderate sized meshes; however its cost would grow if more pre-iterations are needed for large meshes. In general for large meshes, it is not efficient to perform the start-up procedure from scratch, and it would be much more efficient if the computation procedure is started from a coarse mesh. Therefore for very large meshes augmenting the pre-iterations start-up with the mesh sequencing technique could be quite helpful. A t the same time the rise in total solution cost due to increasing the mesh size both in the start up and Newton phases can be dramatically reduced if a multi-grid procedure is employed. The convergence history for the coarse grid (Mesh 1) and the fine grid (Mesh 5) are shown in F i g (7.11) and Fig (7.12). Although the start up process is the same for all orders of CHAPTER 7. RESULTS(II): SIMULATION CASES 133 Figure 7.9: Total work unit versus the grid size, N A C A 0012, M = 0.63, a = 2° 2nd-Order r ' 1 1 1 1 I I I I I I I I I 10000 I I — I — I — I — I — I — I — L 20000 No. of C V s Figure 7.10: Newton phase work unit versus the grid size, N A C A 0012, M — 0.63, a = 2 CHAPTER 7. RESULTS(II): SIMULATION CASES 134 CPU time Figure 7.11: Convergence history, N A C A 0012, M e s h 1, M = 0.63, a = 2 ° discretizations, the starting residual is different for two reasons. F i r s t the initial residual (ro) which initializes the implicit start-up is based on the correct corresponding order of residual evaluation. Second the mesh moments are pre-computed a n d employed on the different orders of accuracy. However as start-up reaches to its end the state is almost the same for all discretization orders. based solution After start-up full convergence achieved within a few Newton iterations in 2nd and 3rd order cases. is F o r the 4th-order case convergence is about two times slower (in terms of outer iterations). T h e superlinear convergence for the 2nd a n d 3rd-order cases is evident. Fig (7.13) examines the correlation between the non-linear residual a n d the residual of the linear system solution. T h i s g r a p h shows the non-linear residual after a p p l y i n g the update p r o d u c e d b y solving the linear system versus the residual of that linear system. correlation between these two is nearly linear. The F i g (7.14) shows the quality of the linear system solution for each N e w t o n iteration. T h e linear residual reduction w i t h i n the G M R E S linear solver is plotted versus the final residual of that system for all outer iterations. In other words this shows how m u c h the linear system residual is reduced through inner iterations. For the 2nd-order case the linear system is solved quite effectively a n d more t h a n 8 orders of residual reduction are achieved given the subspace size of 30 with no restart. For the 3 r d - CHAPTER 7. RESULTS(II): 100 SIMULATION 200 CASES 300 400 CPU time 135 500 600 F i g u r e 7.12: Convergence history, M e s h 5, M = 0.63, a = 2 ° L 2 Linear System Residual F i g u r e 7.13: Non-linear residual versus linear system residual, M e s h 3, M — 0.63, a — 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 136 •H •A •V Tcr itr T<r 2nd-Ordei7Newton Itr./Mesh3 3rd-Order/Newton Itr./Mesh3 4th-Order/Newton Itr./Mesh3 io" W* 15 L2-linear S y s t e m R e s i d u a l F i g u r e o r d e r for a n d t h e t h a t for 4 t h - o r d e r T h e n u m b e r s h o w n b y e a c h F i r s t , first-order a n for A s a w o u l d result, t i o n i n g (zero) q u a l i t y t h e q u a l i t y to o f N e w t o n iterations for m u l t i p l e restarts i t e r a t i o n s e m i w i t h is a v o i d t h e t h e b e d r a m a t i c w o u l d i m p o s s i b l e l i n e a r i z a t i o n to best t h a t s c a t t e r e d d i s t a n c e i n d i c a t o r s . o f c a n A l s o , i l l - c o n d i t i o n i n g p r e c o n d i t i o n i n g n o t t h e b e e x p e c t e d u n i t y is a n d for to t h e w h i l e s o m e o f f r o m u n i t y c a n to h a v e linear is one. is d i s c r e t i z a t i o n o r d e r s b e at T o o r d e r s e v e n t u a l l y H o w e v e r , u p to it h a s m a c h i n e e v e n for p r o p o s e d o n e for w h i c h i n s t e a d as o f b e e v a l u a t e o f t h e t h e is reasons. a n a p p r o x S e c o n d l y , f a c t o r i z a - p r e c o n d i t i o n e d o f far p r e c o n - t w o full v e r y o n e l o c a t e d the r e s p e c t i v e l y l i n e a r i z a t i o n . c o u l d u s e d 2 ° s y s t e m t h e e m p l o y e d t h e m = [53]. F o r o r i g i n a l a a t t a i n a b l e e i g e n v a l u e s [53]. 2 s y s t e m cost e i g e n v a l u e s issues 0.63, linear t e c h n i q u e all s i n g u l a r i t y different at t h a t d e s i r e d a n d eigenvalues p r e c o n d i t i o n i n g e i g e n v a l u e s it all — discretizations. t h e m a t r i x ) e q u a l is 5 c o n v e r g e n c e cluster M the c o m p u t a t i o n ( J a c o b i a n a n d i n s o l v i n g s o l v i n g in 3, is a b o u t all e i g e n v a l u e s as a r o u n d the a n d e x a c t l y M e s h h i g h e r - o r d e r p e n a l t y m a t r i x is u s e d q u a l i t y q u a d r a t i c cluster l i n e a r i z a t i o n f a c t o r i z a t i o n T h e r e f o r e , o p e r a t o r it this r e s i d u a l r e d u c t i o n i n u s i n g h i g h e r - o r d e r i n c o m p l e t e tion. o f d r o p p i n g o r d e r , r e d u c t i o n p r e c o n d i t i o n i n g strategy, i m a t i o n r e s i d u a l T h i s N e w t o n d i t i o n i n g t h e size. d i s c r e t i z a t i o n perfect s y s t e m d i s c r e t i z a t i o n s u b s p a c e t h e a c c u r a c y L i n e a r 4 t h - o r d e r s a m e increases b e e n 7.14: close t h e f r o m a n d to one. p r e c o n d i t h e o r i g i n c o m p a r e a p p r o x i m a t e t h e e i g e n v a l u e CHAPTER 7 RRSTTT.TSm)- RTMTJTATTON CASES 137 • A V £ A VA V . • 13 D * V A A Re A v D -0.5|- v A ^ • A 2nd-Order/Mesh3 3rd-Order/Mesh3 4th-Ordei7Mesh3 ^ • A v A * AA" * F i g u r e 7.15: Eigenvalue pattern for the preconditioned system, M e s h 3, M = 0.63, a = 2 spectrum of the preconditioned linear system (estimated v i a G M R E S algorithm) is plotted at the last iteration (i.e. converged solution) for the M e s h 3 test case, F i g (7.15). The eigenvalues associated w i t h higher-order discretizations are scattered w i t h larger distances from one compared to the 2nd-order eigenvalues. T h i s some how indicates a reduction i n the quality of system solving with increasing discretization order, which of course was not h a r d to predict i n the first place. A t the same time, higher the discretization order closer to the origin eigenvalues are located shifting the matrix toward singularity. T h i s is especially the case for the 4th-order discretization where one of the eigenvalues is very close to origin. T h e estimated condition number of the linear system for Newton iterations are shown i n F i g (7.16) a n d F i g (7.17) for M e s h 3 a n d M e s h 5 cases. as a function of drop in residual. T h e ResO or T h e condition n u m b e r is shown reference residual is the initial non-linear residual of the corresponding mesh computed based on the far field flow condition. Therefore the ratio of the non-linear residual at the end of each N e w t o n iteration to ResO, reflects the relative convergence after each Newton iteration. T h e first iteration condition n u m b e r shows the conditioning of the linear system formed based o n the solution linearization at the end of the start-up phase. T h e rest of the reported condition numbers are associated to the linear system formed based on the solution update at the end of successive N e w t o n iterations. W h i l e the 2nd-order discretization graph initially has larger condition number, CHAPTER 7. RESULTS(II): SIMULATION CASES F i g u r e 7.16: C o n d i t i o n N o . of the preconditioned system, M e s h 3, M = 0.63, a = 2 F i g u r e 7.17: C o n d i t i o n No. of the preconditioned system, M e s h 5, M = 0.63, a = 2 138 CHAPTER 7. RESULTS(II): SIMULATION CASES 139 2nd-Order/Newton Itr./Meshl 3rd-Order/Newton itrVMeshl 0.323 - 4th-Order/Newton Itr./Meshl 0.322 - 0.321 - U 0.32 - 0.319 - 0.318 . A A A A0.317 L i i—I—i—i—L_I 0 i i ' 1 1 i i I i i i I 2 4 6 L o g (L2(ResO)/L2(Res)) i i i 8 I . 10 Figure 7.18: Lift coefficient convergence history, N A C A 0012, M e s h 1, M = 0.63, a = 2 ° the condition number decreases gradually with convergence. In higher-order cases, initially, the condition number is smaller compare to the 2nd-order case, b u t it starts to grow w i t h N e w t o n iterations. In conclusion, the over all performance of the flow solver depends not only o n the conditioning of the linear system b u t also it relies o n how well the non-linear problem is represented by the linearization at each iteration. F o r higher-order discretizations the quality of the linearization is not as good as it is for the 2nd-order discretization a n d more outer or N e w t o n iterations are needed for full convergence. It should be mentioned that the reported condition numbers are relatively small (since the size of the meshes are not very large). Generally speaking, the resultant preconditioned linear systems are well conditioned a n d the condition numbers are reported just for monitoring the solution process. Lift a n d d r a g coefficient convergence histories as a function of drop i n residual are shown in F i g (7.18) to F i g (7.21) for coarse a n d fine meshes. In the case of coarse mesh, almost 6 orders drop i n residual seems to be necessary for lift a n d d r a g convergence. T h e lift a n d drag convergence are nearly achieved after 4-5 orders d r o p i n residual for higher-order discretizations. In the case of the fine mesh all the coefficients are completely converged after 4 orders drop i n residual. A s a practical matter, this is a n important point since the computation cost can be noticeably reduced b y m a k i n g CHAPTER 7. RESULTS(II): SIMULATION CASES 140 2nd-Order/Newton Itr./Meshl 3rd-Order/Newton itr./Meshl 4th-Order/Newton Itr./Meshl 0.0028 0.0026 0.0024 0.0022 0.002 0.0018 0 . 0 0 1 6 FQ 0.0014 u 0.0012 h 0.001 0.0008 h 0 I i i ' I i i i I i i i I i i i I i i 2 4 6 10 Log (L2(Res0)/L2(Res)) F i g u r e 7.19: D r a g coefficient convergence history, N A C A 0012, M e s h 1, M = 0.63, a = 2° convergence criteria less strict, a n d terminating the solution process after fewer iterations for the fine mesh where the c o m p u t i n g cost increases considerably. Newton Following this procedure, the higher-order (especially the 4th-order) discretization would be greatly benefited, as the unnecessary c o m p u t i n g cost to reduce the residual after a certain value is avoided. Lift a n d d r a g coefficients for all meshes a n d discretization orders are tabulated in T a b l e 7.3. In the finest mesh case, lift coefficients up to t h i r d decimal place a n d the d r a g coefficients up to the 4 t h decimal place have converged to the same value. However the d r a g coefficient still is far from zero. T h i s is m a i n l y due to lack of pressure recovery at the trailing edge which is a singular point in potential flow. T h e 3rd-order drag coefficient is consistently larger t h a n the 2nd-order drag coefficient, which is surprising. However the 4th-order d r a g is consistently the smallest, as was expected. W i t h o u t drawing a general conclusion about the 3rd-order drag, it is fair to say that behavior of the discretization order at the trailing edge of the airfoil is a determining factor, a n d apparently for such a region discretization methods w i t h even orders of accuracy performed better t h a n a discretization m e t h o d w i t h an odd-order of accuracy possibly due to cancellation effect. A l l of the test cases so far are performed with a far field size of 25 chords. T o study the CHAPTER 7. RESULTS(II): -1 0 SIMULATION CASES 2 3 4 Log (L2(Res0)/L2(Res)) 1 141 5 6 F i g u r e 7.20: Lift coefficient convergence history N A C A 0012, M e s h 5, M = 0.63, a = 2 ° F i g u r e 7.21: D r a g coefficient convergence history, N A C A 0012, M e s h 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION 142 CASES Test Case Lift coefficient Drag coefficient Meshl 2nd 3rd 4th 0.322318 0.317393 0.322223 0.00097664 0.00184889 0.00072576 Mesh2 2nd 3rd 4th 0.322302 0.321846 0.322448 0.00057225 0.00103438 0.00040369 Mesh3 2nd 3rd 4th 0.324905 0.325214 0.325588 0.00040197 0.00049820 0.00034757 Mesh4 2nd 3rd 4th 0.325159 0.324740 0.325323 0.000336973 0.00037595 0.00032525 Mesh5 2nd 3rd 4th 0.324801 0.324568 0.324474 0.00032136 0.00032711 0.00030828 Table 7.3: Lift and drag coefficients for all meshes and discretization orders, N A C A 0012, M = 0.63, a = 2°, far field size of 25 chords CHAPTER 7. RESULTS(II): SIMULATION 143 CASES Test Case/Mesh Size /Far field distance Lift coefficient Drag coefficient Mesh 4-1/10604 CVs/128 Chords 2nd 3rd 4th 0.332506 0.332882 0.332699 0.00010622 0.00012667 7.09651e-5 Mesh 5-1/22421 CVs/200 Chords 2nd 3rd 4th 0.333575 0.333530 0.334306 4.88588e-5 5.98949e-5 4.04234e-5 Mesh 5-2/24514 CVs/1600 Chords 2nd 3rd 4th 0.332908 0.333715 0.333374 1.82357e-5 2.35558e-5 5.66076e-6 Table 7.4: Effect of the far field distance on lift and drag coefficients, NACA 0012, M = 0.63, a = 2° effect of the far field size on lift and drag, and to show that computing smaller drag is possible with the same mesh resolution, the far field size is increased considerably while the mesh resolution, the number of points on the airfoil and refining factors remained the same. In other words, the new meshes are equivalent to the meshes of the previous test cases up to 25 chords. The results are tabulated in Table 7.4. In all cases the drag coefficient has been reduced dramatically by increasing the far field distance. For instance, the 4th-order computed drag over Mesh 5-2 has been reduced by more than 50 times comapred to the 4th-order computed drag over Mesh 5. The lift coefficient is also affected by the far field distance (i.e. in the most cases the lift coefficient was slightly increased by extending the far field size). It should be mentioned that in this research no far field correction is used, and higher-order far field correction was beyond of the scope of the thesis research. Reference [21] reports C L of 0.3289 and CD of 0.0004 for a 2nd-order computation of a similar flow case over an adaptively refined Cartesian mesh with the total of 10694 CVs (494 body nodes) and 128 chords far field distance, which in terms of the mesh and far field size is closely comparable to the Mesh 4-1 test case. Mach contours for the Mesh 3 (4958 CVs) are displayed in Fig (7.23) to Fig (7.25), where the corresponding mesh is shown in Fig (7.22). Flowfieldfeatures such as the stagnation region just below the leading edge, flow acceleration near the leading edge (where the maximum Mach number occurs), and flow deceleration near the trailing edge are visualized. Once CHAPTER 7. RESULTS(II): SIMULATION CASES 144 Figure 7.22: NACA 0012, Mesh 3 (4958 CVs) again where the mesh is coarse, higher-order contours are visibly smoother than the 2ndorder contours showing the improved quality of the flow computation over these regions. The Mach profile (computed at Gauss points) along upper side of the airfoil for Mesh 1 (coarsest mesh, Fig (7.28)) are shown in Fig (7.26) and Fig (7.27). Since there is a sudden change in the area of the control volumes over the surface of the airfoil in acceleration region, which is often the case for coarse meshes, a jump is observed in Mach profile at those locations. However, higher-order solution has reduced the amplitude of this jump so that in the 4th-order case, the jump has disappeared. Also, the Mach profile over the lower surface of the airfoil is shown in Fig (7.29), where the 2nd-order Mach profile is slightly lower than its higher-order counterparts. Fig (7.30) to Fig (7.32) display the Mach profile of upper/lower sides of all discretization orders over the finest mesh, Mesh 5. As expected, all orders of accuracy result in essentially the same Mach distribution. The close up over the acceleration region demonstrates that in locations where the flow experiences large changes, linear reconstruction or 2nd-order discretization does not result in as smooth a solution as high-order discretizations. The error distribution in total pressure, 1 — P / P t t o o , along the chord for Mesh 1 and Mesh 5 CHAPTER 7. RESULTS(II): F i g u r e 7.23: SIMULATION CASES 2nd-order M a c h contours for N A C A 0012 airfoil, M e s h 3, M = 0.63, X F i g u r e 7.24: 3rd-order M a c h contours for N A C A 0012 airfoil, M e s h 3, M = 0.63, CHAPTER 7. RESULTS(II): SIMULATION CASES 146 Figure 7.25: 4th-order Mach contours for NACA 0012 airfoil, Mesh 3, M = 0.63, a = 2° Figure 7.26: Mach profile, upper side, NACA 0012 airfoil, Mesh 1, M = 0.63, a = 2° CHAPTER 7. RESULTS(II): SIMULATION CASES 147 X/C Figure 7.27: M a c h profile close u p , upper side, N A C A 0012 airfoil, M e s h 1, M = 0.63, a = 2 ° F i g u r e 7.28: M e s h 1, close-up at the leading edge region CHAPTER 7. RESULTS(II): SIMULATION CASES 148 Figure 7.29: M a c h profile, lower side, N A C A 0012 airfoil, M e s h 1, M = 0.63, a = 2 ° F i g u r e 7.30: M a c h profile, upper side, N A C A 0012 airfoil, M e s h 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 149 2nd-Order/Mesh5 3rd-Order/Mesh5 4th-Order/Mesh5 0.99r- 0.98 h 0.97 h JS 0.96h 0.95 h 0.94 h 0.04 0.06 0.08 X/C F i g u r e 7.31: M a c h profile close u p , upper side, N A C A 0012 airfoil, M e s h 5, M = 0.63, a = 2 ° 2nd-Order/Mesh5 3rd-order/Mesh5 4th-Order/Mesh5 0.4 X/C 0.6 Figure 7.32: M a c h profile, lower side, N A C A 0012 airfoil, M e s h 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 150 0.04 0.03 2nd-Order/Meshl 3rd-Order/Meshl 4th-Order/Meshl 0.02 F i g u r e 7.33: 1 - 0.015 upper side, N A C A 0012 airfoil, M e s h 1, M = 0.63, a = 2 ° 2nd-Order/Meshl 3rd-Order/Meshl 4th-Order/Meshl 0.01 J 0.005 -0.005 h -0.01 h F i g u r e 7.34: 1 - lower side, N A C A 0012 airfoil, M e s h 1, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 151 2nd-Order/Mesh5 3rd-Order/Mesh5 4th-Order/Mesh5 0.004 0.002 -0.002 -0.004 h 1 1 0.2 1 1 1 1 1 1 1 0.4 1 1 0.6 1 1 1 1 1 0.8 x/c F i g u r e 7.35: 1 - u p p e r side, N A C A 0012 airfoil, M e s h 5, M = 0.63, a = 2 ° 2nd-Order/Mesh5 3rd-order/Mesh5 4lh-Order/Mesh5 0 . 0 0 4 p- 0.002 h -0.002 r- -0.004 r- F i g u r e 7.36: 1 - lower side, N A C A 'too 0012 airfoil, M e s h 5, M = 0.63, a = 2 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 152 are shown in Fig (7.33) to Fig (7.36). In all cases there are sudden spikes at the trailing edge, and although the error amplitude over the fine mesh is noticeably decreased, the trailing edge remains one of the main source of total pressure error. Another source of error is the leading edge area which covers both the stagnation and acceleration regions. The 4th-order solution for all cases has produced smaller entropy all over the airfoil compared to the 2nd and 3rd-order discretizations, as its stagnation pressure error is nearly zero, (except the above mentioned spikes at the trailing edge) including the leading edge region. CHAPTER 7.2 7. RESULTS(II): SIMULATION 153 CASES Transonic Airfoil, N A C A 0012, M = 0.8, a = 1.25° For a transonic flow, in general, it is more difficult to get fast convergence. This is because of the mixed subsonic/supersonic nature of the flow and the existence of discontinuities (shocks) in the solution. The methodology for handling of discontinuity can increase the complexity of the problem. This is true especially for implicit schemes, where there could be a large change in the solution update in each iteration and limiter values could have large oscillations. In the case of the matrix-free approach in which matrix-vector multiplication is computed through flux perturbation, any oscillatory behavior in limiter could severely degrade the solution convergence. All these facts amount to increasing the complexity and difficulty in solving transonic flows. The transonic flow around NACA 0012 at M = 0.8, a = 1.25° is studied. The Mesh 3 with 4958 CVs, Fig (7.3), of section 7.1 is employed for this test case. Flow is solved for all orders of accuracy using Venkatakrishnan limiter with proper higher-order modification, section 3.4.2. For the 2nd and 3rd-order cases K = 10 is used in the limiter, and for the 4th-order discretization K = 1 is employed. The limiter values are allowed to change through all iterations and no freezing is considered. The tolerance of solving the linear system, like previous test cases, for the start-up phase is 5 x 1 0 is 1 x 10 . -2 -2 and for the Newton phase For all test cases a subspace of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed using the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration finite difference Jacobian matrix with ILU-4 factorization is applied. The Newton iteration is matrix-free and e = with eo = 1 10 x -7 is used for directional differencing. The initial condition is free stream flow. Convergence is reached when the L 2 density residual falls below 1 x 1 0 -12 norm of the non-linear . For transonic flow before switching to Newton iteration, the shock locations in theflowfield and their strengths need to be captured relatively accurately; otherwise Newton iterations will not decrease the residual of the non-linear problem effectively. This normally is achieved by reducing the non-linear residual by some orders, typically 1.5-2, respect to the initial residual in the course of the start-up process. Multiple implicit pre-iterations are performed in the form of defect correction, before switching to Newton iteration. For the 2nd and 3rd-order start-up phases, pre-iterations in the form of defect correction continue until the residual of the non-linear problem drops 1.5 order below the residual of the initial condition. In defect correction phase the starting CFL number is 2 and it is increased gradually to 200 after 50 iterations. The C F L is not CHAPTER 7. RESULTS(II): 0 SIMULATION 0.2 0.4 154 CASES XJC 0.6 0.8 1 F i g u r e 7.37: M a c h profile at the end of start-up process, M = 0.8, a = 1 . 2 5 ° increased above the value of 200 as increasing C F L would not help that m u c h when the linearization is inaccurate, as in the start-up phase. 69 a n d 81 pre-iterations were required in the start-up phase to reduce the residual b y 1.5 order for the 2 n d a n d 3rd-order discretizations respectively. w i t h infinite time step. In the Newton phase, the 2nd a n d 3rd-order cases are followed T h e start-up phase for the 4th-order discretization, includes 200 pre-iterations w i t h similar C F L trend. A l t h o u g h the target residual reduction of 1.5 or- ders was not achieved t h r o u g h pre-iterations, the solution after 200 iterations was good enough for starting the N e w t o n phase. F i g (7.37) displays the M a c h profile at the end of the start-up phase for all orders of accuracy. T h e r e are some oscillations especially along the upper surface M a c h profile close to the shock location however, the shock locations a n d their strengths are captured reasonably accurately before switching to N e w t o n iteration. For the 4th-order transonic case, using an infinite time step does not help the convergence acceleration a n d causes inaccurate linearization a n d limiter oscillation. Since this leads to slow convergence, C F L = 1 0 , 0 0 0 has been set for Newton phase. The convergence s u m m a r y is shown i n the Table 7.5. Similar to the subsonic case, the number of N e w t o n iterations for the 4th-order discretizations is twice as large as for the 2nd a n d 3rd-order discretizations. T h e major cost of the solution in all cases is the start-up CHAPTER Test Case 7. RESULTS(II): No. of Res. SIMULATION CASES 155 T i m e (Sec) Work Unit No. of Newton Itr. Newton Phase W o r k U n i t 2nd 197 65.6 279 4 91 3rd 241 106.7 281 5 119 4th 450 311.4 590 10 221 T a b l e 7.5: Convergence s u m m a r y for N A C A 0012 airfoil, M = 0.8, a = 1 . 2 5 ° • HA 2nd-Order/Newton Itr. 3rd-Order/Newton Itr. 4th-Order/Newton Itr. CPU time (Sec) F i g u r e 7.38: Convergence history for N A C A 0012, M = 0.8, a = 1 . 2 5 ° cost. Therefore there is a reasonable potential to reduce the total solution work d r a m a t i c a l l y b y employing an optimized start-up technique which is able to capture shocks a n d establish the major flow features efficiently. F i g u r e (7.38) demonstrates the convergence history graph. T h i s time, reduction in convergence slope for the 4th-order case is not only due to poor convergence of the solution of the linear system i n each Newton iteration, but also to the limiter firing which changes the linearization in each iteration. A g a i n for a similar transonic case, the total computation cost i n terms of work units for a very fast N e w t o n - K r y l o v algorithm (2nd-order/artificial dissipation) for unstructured meshes [45] is just under 400 showing that the the performance of the developed Newton K r y l o v in this research is very competitive. T a b l e 7.6 summarizes the lift a n d drag coefficients for all discretization orders which are CHAPTER 7. RESULTS(II): SIMULATION CASES Test case CL CD 2nd 0.337593 0.0220572 3rd 0.339392 0.0222634 4th 0.345111 0.0224720 A G A R D - 2 1 1 [2]/Structured (192x39) 0.3474 0.0221 T a b l e 7.6: Lift a n d drag coefficients, N A C A 0012, M = 0.8, a = 1 . 2 5 ° -0.2 0 0.2 0.4 X 0.6 0.8 1 1.2 F i g u r e 7.39: 2nd-order M a c h contours, N A C A 0012, M = 0.8, a = 1 . 2 5 ° Figure 7.41: 4th-order Mach contours, NACA 0012, M = 0.8, a = 1.25° CHAPTER 7. RESULTS(II): SIMULATION CASES 158 F i g u r e 7.42: limiter 0 (3rd-order), N A C A 0012, M = 0.8, a = 1 . 2 5 ° in good agreement with the reference d a t a [2]. B o t h C L a n d C D i n transonic flow are essentially influenced b y the shock location a n d its strength, a n d to have a good prediction for theses coefficients, it is crucial to capture the shocks accurately. F i g u r e (7.39) to Fig(7.41) display b a n d e d M a c h contours for all orders of discretizations. T h e weak shock at lower surface a n d the strong shock at upper surface of the airfoil are quite visible. T h e presence of the discontinuity a n d firing of the limiter i n shock regions results in non-smooth contour lines in the shock vicinity (mostly for coarser control volumes). This non-smoothness widens with the discretization order due to increase in the reconstruction stencil size. In a d d i t i o n to h a m p e r i n g convergence, using the limiter has the disadvantage of reducing the accuracy of the solution due to altering the gradients a n d higher-order terms. The behavior of the limiter values <> / a n d a is shown in F i g (7.42) a n d F i g (7.43). Some isolated limiter firing is observed in the leading a n d trailing edge regions, but those are limited to a small n u m b e r of control volumes, a n d generally the value is close to one. T h e limiter is m a i n l y active i n the strong upper shock region, while it remains inactive at the weak lower shock. Note that wherever the gradient limiter, terms limiter, a, is nearly zero. is fired aggressively the higher-order CHAPTER 7. RESULTS(II): SIMULATION CASES Figure 7.44: Mach profile, N A C A 0012, M = 0.8, a = 1.25° 159 CHAPTER 7. RESULTS(II): SIMULATION 160 CASES 1.4 2nd 3rd 4th AGARD 1.2 0.2 0.6 0.4 0.8 XJC F i g u r e 7.45: M a c h profile in shock regions, N A C A 0012, M = 0.8, a = 1 . 2 5 ° T h e M a c h profile along the surface is shown in F i g (7.44) a n d F i g (7.45). B o t h the location and strength of the shocks are i n good agreement with the A G A R D d a t a [2]. T h e 2nd-order discretization has some overshoot right before the upper shock, which is also visible in the related M a c h contours. It appears that the 3rd-order discretization produces less noise in this shock c a p t u r i n g case, a n d this is probably because of the quadratic reconstruction characteristic. However, the cubic reconstruction associated to the 4th-order discretization shows some oscillations inside the shock region; this behavior is typical in a p p r o x i m a t i n g a (sharp) discontinuity by a higher-order polynomial. CHAPTER 7.3 7. RESULTS(II): SIMULATION 161 CASES Supersonicflow,Diamond airfoil , M = 2.0, a = 0.0 Supersonic flow field axound the diamond airfoil has been computed using all discretization orders at far field Mach number of 2.0 and zero angle of attack. The diamond airfoil has a thickness of 15% and is located in a 7x14 rectangular domain. The total number of control volumes in the mesh is 7771 with 80 control volumes along the chord. Appropriate mesh refinement at the leading and trailing edges (shock locations) and also the apexes of the airfoil (expansion locations) has been carried out to enhance the quality of the solution capturing in these high gradient regions, Fig (7.46). The interesting parts of the flow field are the discontinuities (shocks) and very sharp gradients (expansion fans). Because these features would result in extensive limiter firing which would adversely affect the global accuracy, no limiter has been employed in the test case. Therefore, the resultant solution will not be monotone and there will be considerable oscillations around shocks and expansion fans. The tolerance in solving the linear system for the start-up phase is 5 x 1 0 -2 and for the Newton phase is 1 x 10~ . For all test cases a 2 subspace of 30 has been set and no restart is allowed. The preconditioning for the start-up pre-iterations is performed using the approximate analytical Jacobian matrix with ILU-1 factorization and for the Newton iteration finite difference Jacobian matrix with ILU-4 factorization is applied. The Newton iteration is matrix-free and e = with eo — 1 x 1 0 -7 is used for directional differencing. The initial condition is free stream flow. Convergence is reached when the L norm of the non-linear density residual falls below 1 x 1 0 2 - 1 2 . In the current test case major physical flow characteristics, including both shocks and expansion fans need to be captured approximately before switching to Newton iterations. Otherwise due to highly non-linear and discontinuous regions in the supersonic flow field convergence stalls shortly after the starting point. Therefore 30 pre-iterations in the form of defect correction have been performed before switching to the Newton phase. The starting C F L is 5 and it is increased to 50 after 10 iterations. For the start-up of the 4th-order discretization case, the 2nd-order defect correction is used to reduce the start-up cost. That improves the robustness of the start-up too since the 4th-order defect correction start-up for such a flow (without limiter) could be unstable. The pressure coefficient along the chord at the end of the start-up phase is shown in Fig (7.47) demonstrating the effectiveness of the defect correction start-up for supersonic flow and the fact that the shocks and expansion fans are fairly accurately established at the end of the start-up process. After starting the Newton phase full convergence is achieved rapidly for all orders of discretizations. Figure (7.48) shows the convergence history for the diamond airfoil displaying CHAPTER 7. RESULTS(II): SIMULATION CASES (a) Diamond airfoil (b) Nose section (c) M i d section Figure 7.46: Mesh 7771 C V s , diamond airfoil, M = 2.0, a = 0.0 CHAPTER 7. RESULTS(II): SIMULATION CASES 163 2nd-0rder Start up (used for 2nd/4th-Order) 3rd-0rder Start up Exact Solution • y±+—•—•—• 0.2 0.4 XJC 0.6 0.8 F i g u r e 7.47: C p at the end of start-up process, d i a m o n d airfoil, M = 2.0, a = 0.0 the fast convergence. Notice that the residual still is large right before the first N e w t o n iteration, but by that point the m a i n structure of the flow field has been formed a n d the residual has been d r o p p e d quickly after Newton iterations. In fact the h i g h M a c h n u m - ber helps to d a m p all errors including low frequency errors quite effectively as soon as the shocks a n d expansion fans are established over the airfoil since first, the wave propagation speed is quite fast a n d second, all waves travel toward outlet without reflecting from b o u n d aries. T a b l e (7.7) provides the convergence s u m m a r y for all discretization orders. For the 2nd-order case the major part of the total work is spent in the start-up phase while for the 4h-order case as expected, from previous test cases, most of the time is spent on the N e w t o n iterations. T h i s is especially the case here since the 2nd-order start-up is employed a n d the 4th-order start-up is avoided (like the subsonic airfoil test case) reducing the relative cost of the start-up phase for the 4th-order solution. For the 3rd-order case, the costs of the start-up a n d N e w t o n phases are split evenly. T h e M a c h contours for all discretization orders are shown in F i g (7.49) to F i g (7.51). A l l the steady-state flow features, i n c l u d i n g symmetric shocks right at the sharp leading edge, expansion fans at the lower and upper apexes, and symmetric fish tail shocks at the sharp trailing edge of the d i a m o n d airfoil are clearly captured. CHAPTER 7. RESULTS(II): SIMULATION 164 CASES 50 CPU time (Sec) 100 F i g u r e 7.48: Convergence history for d i a m o n d airfoil, M = 2.0, a = 0 Test Case No. of Res. Time (Sec) Work Unit No. of Newton Itr. Newton Phase W o r k U n i t 2nd 61 27.85 236.0 2 50.0 3rd 126 48.62 243.1 3 120.8 7 250.0 4th 254 133.57 301.5 T a b l e 7.7: Convergence s u m m a r y for d i a m o n d airfoil, M = 2.0, a = 0 Test case CD 2nd 0.0524258 3rd 0.0524661 4th 0.0524671 Exact 0.0524663 T a b l e 7.8: D r a g coefficient, d i a m o n d airfoil, M = 2.0, a = 0 ° CHAPTER 7. RESULTS(II): SIMULATION CASES 165 CHAPTER 7. RESULTS(II): SIMULATION CASES CHAPTER 7. RESULTS(II): SIMULATION CASES 167 3rd-0rder Figure 7.53: 3rd-order Cp, diamond airfoil, M — 2.0, a = 0.0 0.3 h 4th-Order Figure 7.54: 4th-order Cp, diamond airfoil, M = 2.0, a = 0.0 CHAPTER 7. RESULTS(II): SIMULATION CASES 168 Figure (7.52) to Fig (7.54) display the pressure coefficient along the chord of the diamond airfoil for all orders of discretization. In all cases there are over and/or undershoots at the leading and trailing edges of the airfoil (where the shocks are located) and at the middle section of the diamond (where the expansion fans are formed). Naturally, for higher discretization order, the over and undershoots are worse. The 4th-order Cp shows some oscillatory behavior in the front region, located between two large gradient points of the diamond airfoil. This is due to the continuing effect of the over and undershoots which are not fully damped at that region due to very large reconstruction stencil and the characteristic of the cubic polynomial. It is worth mentioning that the higher-order reconstruction procedure used in this research has some compactness issues for such a supersonic flow since the reconstruction routine uses data from regions in the flow field which are physically independent. That is, downstream data is included in the reconstruction. Despite all this, there is an overall good agreement between the exact solution and the numerical solution over the supersonic diamond airfoil. The computed drag coefficient for all orders of discretization is tabulated in Table 7.8. Again there is a very good match between the numerical and exact solution here. For the inviscid supersonic diamond airfoil case, due to pre-determined shock and expansion fan locations, the drag coefficient is solely a function of the shocks strength and acceleration of the flow at the expansion apexes. This is why the computed drag coefficient is still fairly accurate despite all these over and undershoots, as their mean values remain very close to zero. Chapter 8 Concluding Remarks 8.1 Summary and Contributions A fast implicit finite volume flow solver for higher-order unstructured (cell-centered) steadystate computation of inviscid compressible flows (the Euler equations) has been developed. T h e matrix-free G M R E S algorithm is used as a solver for the linear system arising from the discretized equations, avoiding explicit computation of the higher-order Jacobian matrix. T h e solution process has been divided into two separate phases, i.e. a start-up phase and a Newton phase. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. T h e approximate first-order analytical Jacobian is used both for constructing and preconditioning the linear system on the left hand side, reducing the complexity and cost of the pre-iterations. A low fill-level incomplete factorization, I L U ( l ) , is used for the right preconditioning of the low order linear system associated with the defect correction iterations. T h e G M R E S algorithm in its standard form is used for solving the resultant linear system. K n o w i n g that the linearization is not very accurate at this stage, the C F L number is kept low (on the order of 10 ) and a moderate tolerance was set for 2 solving the linear system. T h e defect correction procedure was very effective in reaching an approximate solution which includes most of the physical characteristics of the steady-state flow. Having an approximate solution at the end of the start-up phase where the linearization of the flow field is accurate enough for steady-state solution, the solution process is switched to the Newton phase taking an infinite time step. In this phase, a first-order finite difference 169 CHAPTER 8. CONCLUDING REMARKS 170 Jacobian is used for preconditioning of the higher-order linear system in the matrix-free context. A high fill-level incomplete factorization, ILU(4), is applied for the right preconditioning of the N e w t o n - G M R E S iterations to take advantage of the best possible preconditioning of the employed approximate J a c o b i a n . Semi-quadratic or superlinear convergence 1 has been achieved within a few Newton iterations for most of the cases. A modified version of Venkatakrishnan limiter [87] is'used to enforce loose monotonicity for transonic flows. T h e higher-order terms in the reconstruction polynomial were dropped wherever the limiter fired firmly in the flow field. T h i s has been performed through a differentiable switch, cr, which is a function of the limiter value <f> itself. Also, a large constant value K in Venkatakrishnan limiter (typically K = 1 0 ) is chosen for the limiting which somewhat sacrifices the monotonicity in the favor of rapid convergence. T h e over all proposed limiting procedure insures fast convergence while reducing the accuracy disadvantages of limiter application. T h e effect of choosing e in the accuracy of the computed matrix-vector products using directional derivative technique for the matrix-free G M R E S was analyzed throughly. It was shown that for a fine mesh it is necessary to choose a relatively large e to properly account for the higher-order terms. T h e issue of mesh refinement in accuracy measurement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and solution error calculation. T h e effect of the discretization order, mesh size and far field distance were studied completely and a careful measurement of the solver performance was provided. T h e accuracy, fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes were shown via several test cases. Solutions of different orders of accuracy are compared in detail through several investigations. T h e possibility of reducing computational cost required for a given level of accuracy using high-order discretization is demonstrated. 'For such a N e w t o n - G M R E S solver, it is shown that ILU(4) has an equal performance with full factorization, L U , in terms of outer iterations where the ILU(4) factorization cost both in terms of time and memory usage remains fairly small compared to LU.factorization [52]. CHAPTER 8.2 8. CONCLUDING REMARKS 171 Conclusions While, in general, computing cost remains one of the main concerns for the higher-order computation of fluid flow problems, the current research proves that reaching fast convergence— and a reasonable computing cost—for higher-order unstructured solution is indeed possible. Results for the implicit flow solution algorithm described in this thesis show that the 2nd and 3rd-order solutions both display semi-quadratic or superlinear convergence for all test cases if started from a good initial guess or approximate solution. T h e 4th-order solution still converges quickly although it is slower than the other discretization orders requiring nearly two times the number of. outer iterations as the 2nd-order solution started from a similar approximate flow solution. T h e employed start-up procedure is very effective in providing a good approximate solution as an initial condition for Newton phase with a reasonable cost. For subsonic or supersonic cases, a limited number of pre-iterations is sufficient to reach a good initial solution. For the transonic case, however, the number of pre-iterations increases since some residual reduction target must be met before switching to Newton iterations. T h e proposed start-up technique takes a considerable part of the solution time for the 2nd and 3rd-order solutions. A s a result, reduction of the start-up cost would significantly improve the over all performance of the flow solver. F i n d i n g a reasonably good initial solution quickly is something of an art rather than an exact science. If a pre-computed solution is available then the cost of the start-up phase can be ignored and in practice the total solution cost would be equal to the Newton phase^ost. T h e first-order solution, full potential solution, exact solution of the approximated problem or an empirical solution can be picked as an initial solution for the Newton phase depending on the availability and the problem case. ' Preconditioning and the accuracy of the linear system solution is a vital factor in the Newton phase of the N e w t o n - G M R E S solver. Evidence suggests that the number of outer Newton iterations in the 4th-order case can be reduced to the level of the 2nd and 3rd-order methods provided that the linear system resulting from the 4th-order discretization is solved completely [53]. Given the current preconditioning strategy, this requires a considerable increase (by a factor of 4-5) in the 4th-order residual evaluations in the matrix-free G M R E S algorithm. Therefore it is preferable to keep the subspace size limited to a moderate number (i.e. 30) and bear the cost of increasing the outer iteration numbers by a factor of two. T h e eigenvalue spectrum of the preconditioned system shows that the higher-order system is shifted toward singularity, increasing the condition number of the linear system. Therefore CHAPTER 8. CONCLUDING REMARKS 172 more effort is needed to solve the linear system resulting from the higher-order discretization, especially in the vicinity of the steady-state solution. For transonic flow, like smooth flows and despite using a limiter and the existence of normal shocks, superlinear convergence for the 2nd and 3rd-order discretizations and fast convergence for the 4th-order discretization have been obtained in the Newton phase. T h e applied limiter, which was active mainly in the strong shock, did not have a strong effect on deterioration of the convergence rate, and the employed continuous switch performed as designed to remove higher-order terms near shocks. Using an efficient start-up technique, a good preconditioner matrix, and an effective preconditioning strategy are the key issues for the robustness and fast convergence of the N e w t o n - G M R E S solver. T h e performance of the current flow solver in terms of C P U time scales linearly with the mesh size for all orders of discretization. A s an overall performance assessment (including the start-up phase), the 3rd-order solution is about 1.3 to 1.5 times, and the 4th-order solution is about 3.5-5 times, more expensive than the 2nd-order solution with the developed solver technology. It should be mentioned that the current 2nd-order solver algorithm, even without multigrid augmentation, is very competitive with the most efficient reported results for unstructured meshes in terms of the normalized total solution time (work units). T h e 3rd and 4th-order schemes are comparable in efficiency to the 2nd-order scheme as measured by residual evaluations. Mesh refinement is the critical issue in accuracy verification, and reducing the length scale uniformly for irregular unstructured grids is not an easy task. Consequently in the accu- racy assessment'process the generated mesh levels must be examined closely especially at boundaries. Boundaries in general and the solid wall boundary in particular are the main source of the solution error. Qualitative/quantitative comparisons of the fluid flow solutions for different discretization orders over several meshes show that the quality of the solution for smooth flows is dramatically improved using high-order discretization technique, especially in the case of coarse grids. T h i s suggests that depending on the error measurement criteria, significant effi- ciency/accuracy gains for high-order discretization are attainable. Since the developed implicit algorithm is matrix-free, memory usage will not be an issue. In the matrix-free case, memory usage is affected by the type of the preconditioning technique which in this thesis for I L U (4) is a bit more than two times of the required memory for storing the first-order Jacobian, Table 5.1. CHAPTER 8.3 8. CONCLUDING REMARKS 173 Recommended Future Work T h e proposed solver algorithm works quite efficiently and accurately for a variety of inviscid compressible flows. However, there are a number of areas where this research could be either improved or extended. 8.3.1 Start-up In the case of very large meshes, it is not efficient to perform the start-up procedure from scratch; it is more efficient if the computation procedure is started from a coarser level mesh. Therefore for very large meshes augmenting the pre-iterations with a mesh sequencing technique could be quite helpful. A t the same time total solution cost due to increasing mesh size in the start up can be dramatically reduced if a multi-grid procedure is employed. 8.3.2 Preconditioning A p p l y i n g a more accurate preconditioning matrix (that is, a better approximation to the higher-order Jacobian) instead of the first-order Jacobian would increase the system solving quality reducing the outer iteration numbers especially in the case of the 4th-order method. Having said that the preconditioning cost should be kept reasonably low since the factorization cost of the preconditioner matrix should be much smaller than solving the original linear system. A d d i n g a multi-grid scheme to the preconditioning strategy could also be helpful in the case of large meshes. 8.3.3 Reconstruction In general, limiters adversely affect the solution accuracy via manipulation of the reconstruction polynomial. T h i s issue becomes more important for higher-order reconstruction if the limiter fires unnecessarily and extensively in the flow field. A s a result, WENO schemes, in which limiters are avoided through a careful weighting non-oscillatory reconstruction procedure, may be an appropriate solution for higher-order reconstruction of nonsmooth flows. Furthermore, the benefit of applying higher-order reconstruction for high M a c h number/non-smooth flows, considering the natural oscillatory behavior of the reconstruction polynomial in such flows, still is an open area of research. CHAPTER 8.3.4 8. CONCLUDING REMARKS 174 Extension to 3 D Extension of the current higher-order 2D solver algorithm to a 3 D version, considering the availability of the 3D reconstruction procedure, is reasonably straight forward if an appropriate higher-order representation of a 3D mesh is available. Since the 2D algorithm is developed for unstructured meshes, there is no limitation in such an extension, and the performance of the flow solver in terms of the convergence should be improved due to introducing the third dimension (3D relieving effect). However, the matrix storage requirement in 3D needs to be carefully studied since storing even the first-order Jacobian (due to the large number of meshes in 3D) for preconditioning could be challenging. T h e incomplete factorization fill-level in 3D is another issue, and it may not be possible to use high fill-level I L U ( P ) for 3 D cases. 8.3.5 The Extension to Viscous Flows introduced implicit procedure could be implemented for viscous flow computation provided that the viscous residual function evaluation and a proper anisotropic unstruct u r e d / h y b r i d mesh are available. However there would be some issues regarding the conditioning of the linear system due to wide variety of geometrical and physical length scales, which need to be addressed by proper preconditioning. In the case of turbulence modeling the degree of coupling of the turbulence model in the implicit procedure would be another issue. Bibliography [1] P E T S c , T h e Portable Extensible Toolkit for Scientific Computation. Argonne National L a b , http://www-unix.mcs.anl.gov/petsc/petsc-as/index.html. [2] Test Cases for Inviscid Flow Fields. Technical Report A R - 2 1 1 , Advisory G r o u p for Aerospace Research and Development ( A G A R D ) , N A T O . , 1985. [3] R. Abgrall. O n Essentially Non-oscillatory Schemes on Unstructured Meshes: Analysis and Implementation. [4] M . Aftosmis, Journal of Computational Physics, D . Gaitonde, and S. T . Tavares. Techniques on Unstructured Meshes. [5] M . J . Aftosmis and M . J . Berger. 114:45-58, 1994. Behavior of Linear Reconstruction AIAA Journal, 33(ll):2038-2049, 1995. Multilevel Error Estimation and Adaptive h- Refinement for Cartesian Meshes W i t h Embedded Boundaries. A I A A Paper 2002-0863, 40th A I A A Aerospace Sciences Meeting and Exhibit, 2002. [6] R . K . Agarwal. A Compact High-Order Unstructured Grids M e t h o d for the Solution of Euler Equations. International Journal of Numerical Methods in Fluids, 31:121-147, 1999. [7] F . R . Bailey. H i g h - E n d C o m p u t i n g Challenges in Aerospace Design and Engineering. In Third International Conference on Computational Fluid Dynamics (Invited Lecture), 2004. [8] H . E . Bailey and R . M . Beam. Newton's Method A p p l i e d to Finite Difference Approximations for Steady State Compressible Navier-Stokes Equations. tational Physics, Journal of Compu- 93:108-127, 1991. [9] T . J . B a r t h . Analysis of Implicit Local Linearization Techniques for U p w i n d and T V D Algorithms. A I A A Paper 87-0595, 25th Aerospace Sciences Meeting and Exhibit, 1987. 1 7 5 BIBLIOGRAPHY 176 [10] T . J . B a r t h . Aspects of Unstructured Grids and Finite-Volume Solvers for the Euler and Navier-Stokes Equations, in Unstructured G r i d Methods for Advection-Dominated Flows. Technical R e p o r t ' A G A R D - R 7 8 7 , A G A R D , 1992. [11] T . J B a r t h . Recent Development in High Order K - E x a c t Reconstruction on Unstructured Meshes. A I A A Paper 93-0668, 31th Aerospace Sciences Meeting and Exhibit, 1993. [12] T . J . B a r t h and P. O . Frederickson. Higher-Order Solution of the Euler Equations on Unstructured Grids Using Quadratic Reconstruction. A I A A Paper 90-0013, 28th Aerospace Sciences Meeting and E x h i b i t , 1990. [13] T . J . B a r t h and D . C . Jespersen. T h e Design and Application of U p w i n d Schemes on Unstructured Meshes. A I A A Paper 89-0366, 27th Aerospace Sciences Meeting and Exhibit, 1989. ' [14] T . J . B a r t h and S. Linton. A n Unstructured Mesh Newton Solver for Compressible F l u i d Flow and its Parallel Implementation. A I A A Paper 95-0221, 33rd Aerospace Sciences Meeting and E x h i b i t , 1995. [15] F . Bassi and S. Rebay. High-Order Accurate Discontinuous Finite Element Solution of the 2D Euler Equations. Journal of Computational Physics, 138:251-285, 1997. [16] M . Benzi. Preconditioning Techniques for Large Linear Systems: A Survey. Journal of Computational Physics, 182:418-477, 2002. [17] K . S. Bey and J . T . Oden. A R u n g e - K u t t a Local Projection Pi-Discontinuous Galerkin Finite Element M e t h o d for H i g h Speed Flows. A I A A Paper 91-1575, 1991. [18] M . Blanco and D . Zingg. A Fast Solver for the Euler Equations on Unstructured Grids Using a N e w t o n - G M R E S Method. Number A I A A Paper 97-0331, 1997. [19] C . Boivin and C . Ollivier-Gooch. Guaranteed-Quality Triangular Mesh Generation for Domains with C u r v e d Boundaries. Engineering, International Journal for Numerical Methods in 55(10):1185-1213, 2002. [20] G . Corliss, C h . Faure, A . Griewank, L . Hascoet, and U . N a u m a n n . Automatic Differ- entiation of Algorithms From Simulation to Optimization. [21] Darren L . D e Zeeuw. Springer, 2002. A Quadtree-Based Adaptively-Refined Cartesian-Grid Algorithm for Solution of the Euler Equations. P h D thesis, Aerospace Engineering and Scientific C o m p u t i n g in the University of Michigan, 1993. BIBLIOGRAPHY 177 Polynomial Reconstruction Finite Volume Schemes for the Compressible Euler and Navier-Stokes Equations on Unstructured Adaptative Grids. P h D thesis, [22] M . Delanaye. Universite de Liege, Faculte des Sciences Appliquees, [23] M . Delanaye and J . A . Essers. 1998. Quadratic-Reconstruction Finite-Volume Scheme for AIAA Journal, Compressible Flows on Unstructured Adaptive Grids. 1997. 35(4):631-639, • [24] M . Delanaye, P h . Geuzaine, and J . A . Essers. Adaptive Grids. Conference, Compressible Flows on Unstructured A I A A Paper 97-2120, 13th A I A A Computational F l u i d Dynamics 1997. [25] R. S. Dembo, S. C . Eisenstat, and T . Steihaug. Journal on Numerical Analysis, 19(2):400-408, [26] J . E . Dennis and R. B . Schnable. and Non Linear Equations. Inexact Newton Methods. SIAM 1982. Numerical Methods for Unconstrained Optimizations Prentice-Hall, 1983. [27] J . Forrester, E . Tinoco, and Y u . Jong. T h i r t y years of development and application of C F D at Boeing Commercial Airplanes. A I A A Paper 2003-3439, 2003. [28] O . Friedrich. Weighted Essentially Non-Oscillatory Schemes for the Interpolation of Mean Values on Unstructured Grids. Journal of Computational Physics, 144:194-212, 1998. [29] P. E . G i l l , W . Murray, and M . H . Wright. Press, Practical Optimization. Elsevier Academic 1986. [30] A . G . Godfrey, C . R . Mitchell, and R. W . Walters. Practical Aspects of Spatially High Accurate Methods. A I A A Paper 92-0054, 30th Aerospace Sciences Meeting and Exhibit, 1992. [31] J . J . Gottlieb and C . P. T . G r o t h . Assessment of Riemann Solvers for One-Dimensional Inviscid Flows of Perfect Gases. Unsteady Journal of Computational Physics, 78(2):437-458, 1988. [32] W . D . G r o p p , D . E . Keyes, L . C . Mcinnes, and M . D . T i d r i r i . Globalized Newton- Krylov-Schwarz Algorithms and Software for Parallel Implicit C F D . formance Computing Applications, Int. J. High Per- 14:102-136, 2000. [33] A . Harten and S. Osher. Uniformly high-order accurate nonoscillatory schemes. Journal of Numerical Analysis, 24, 1987. SIAM BIBLIOGRAPHY 178 [34] A . Haselbacher. A W E N O Reconstruction Algorithm for Unstructured Grids Based on Explicit Stencil Construction. Number A I A A Paper -2005-879 in 43rd A I A A Aerospace Sciences Meeting and Exhibit, 2005. [35] C . Hirsch. Numerical Computation of Internal and External Flows, Computational Methods for Inviscid and Viscous Flows, [36] A . Jameson. volume 2. Jhon Wiley & Sons, 1990. Success and Challenges in Computational Aerodynamics. A I A A Paper 87-1184, 8th Computational F l u i d Dynamics Conference, 1987. [37] A . Jameson and D . Mavriplis. Finite-Volume Solution of the Two-Dimensional Euler Equations on a Regular Triangular Mesh. AIAA Journal, 24:611-618. [38] A . J . Jameson, W . Schmidt, and E . Turkel. Numerical Solutions of the Euler Equations by a Finite-Volume M e t h o d Using R u n g e - K u t t a Time-Stepping Schemes. A I A A Paper 81-1259, 1981. ; [39] D . B . K i m and P. D . Orkwis. Jacobian Update Strategies for Quadratic and NearQuadratic Convergence of Newton and Newton-Like Implicit Schemes. A I A A Paper 93-0878, 1993. [40] K . A . K n o l l and D . E . Keyes. Approaches and Applications. [41] C . B . Laney. Jacobian Free Newton K r y l o v Methods: A Survey of Journal of Computational Physics, Computational Gasdynamics. [42] R . J . L e Veque. Cambridge University Press, 1998. Numerical Methods for Conservation Laws. [43] X - D . L i u , S. Osher, and T . C h a n . 193:357-397, 2004. Birkhauser Verlag, 1990. Weighted Essentially Non-Oscillatory Schemes. Journal of Computational Physics, 115, 1994. [44] H . L u o , J . B a u m , and R . Lohner. A Fast, Matrix-free Implicit M e t h o d for Compressible Flows on Unstructured Girds. Journal of Computational Physics, 146:664-690, 1998. [45] L . M . Manzano, J . V . Lassaline, P. Wong, and D . W . Zingg. A Newton-Krylov A l gorithm for the Euler Equations Using Unstructured Grids. A I A A Paper 2003-0274, 41th A I A A Aerospace Sciences Meeting and Exhibit, 2003. [46] D . J . Mavriplis. O n Convergence Acceleration Techniques for Unstructured Meshes. Technical Report I C A S E No. 98-44, Institute for Computer Applications in Science and Engineering ( I C A S E ) , N A S A Langley Research Center, N A S A Langley Research Center, H a m p t o n V A 23681-2199, 1998. BIBLIOGRAPHY 179 [47] F . Mavriplis. C F D in aerospace in the new millenium. Space Journal, Canadian Aeronautics and 46(4):167-176, 2000. [48] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Solver for Poisson's Equation. 11th A n n u a l Conference of the Computational F l u i d Dynamics Society of C a n a d a , pages 344-349, 2003. [49] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Solver for the Compressible Euler Equations. T h i r d International Conference on C o m p u t a tional F l u i d Dynamics, 2004. [50] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured G M R E S Algorithm for Inviscid Compressible Flows. A I A A Paper 2005-5341, 17th A I A A Computational F l u i d Dynamics Conference, 2005. [51] A . Nejat and C . Ollivier-Gooch. A High-Order Accurate Unstructured Newton-Krylov Solver for Inviscid Compressible Flows. A I A A Paper 2006-3711, 36th A I A A Fluid Dynamics Conference, 2006. [52] A . Nejat and C . Ollivier-Gooch. O n Preconditioning of N e w t o n - G M R E S algorithm for a Higher-Order Accurate Unstructured Solver. 14th A n n u a l Conference of the C F D Society of C a n a d a , 2006. [53] A . Nejat and C . Ollivier-Gooch. Effect of Discretization Order on Preconditioning and Convergence of a Higher-Order Unstructured Newton-Krylov Solver for Inviscid Compressible Flows. A I A A Paper 2007-719, 45th A I A A Aerospace Sciences Meeting and Exhibit, 2007. [54] J . Nichols and D . W . Zingg. A Three-Dimensional M u l t i - B l o c k Newton-Krylov Flow Solver for the Euler Equations. A I A A Paper 2005-5230, 17th A I A A Computational F l u i d Dynamics Conference, 2005. [55] E . J . Nielsen, W . K . Anderson, R . W . Walters, and D . E . Keyes. Application of NewtonK r y l o v Methodology to a Three-Dimensional Unstructured Euler Code. A I A A Paper 95-1733, 12th A I A A Computational F l u i d Dynamics Conference, 1995. [56] C . Ollivier-Gooch. High-Order E N O Schemes for Unstructured Meshes Based on LeastSquare Reconstruction. A I A A Paper 97-0540, 35th A I A A Aerospace Sciences Meeting and Exhibit, 1997. [57] C . Ollivier-Gooch. Q u a s i - E N O Schemes for Unstructured Meshes Based on Unlimited Data-Dependent 133:6-17, 1997. Least-Squares Reconstruction. Journal of Computational Physics, BIBLIOGRAPHY 180 [58] C . Ollivier-Gooch. Volume Solver Programmers Reference Guide for the A N S L i b Software Library. Advanced Numerical Generic Finite- Simulation Laboratory ( A N S L a b ) , Mechanical Engineering Department, T h e University of British Columbia, 2002. [59] C . Ollivier-Gooch. G R U M M P (Generation and Refinement of Unstructured, M i x e d Element Meshes in Parallel) Version 0.3.2 User's Guide. Advanced Numerical Simulation Laboratory ( A N S L a b ) , Mechanical Engineering Department, T h e University of ' British Columbia, 2005. [60] C . Ollivier-Gooch and M . V a n Altena. A Higher-Order Accurate Unstructured Mesh Finite-Volume Scheme for the Advection-Diffusion Equation. Journal Physics, of Computational 181:729-752, 2002. [61] O . Onur and S. E y i . Effects of the Jacobian Evaluation on Newton's Solution of the Euler Equations. International Journal for Numerical Methods In Fluids, 49:211-231, 2005. [62] P. D . Orkwis. Comparison of Newton's and Quasi-Newton's M e t h o d Solvers for NavierStokes Equations. AIAA Journal, [63] A . Pueyo and D . W . Zingg. 31:832-836, 1993. A n Efficient N e w t o n - G M R E S Solver for Aerodynamic Computations. A I A A Paper 97-1955, 1997. [64] A . Pueyo and D . W . Zingg. Calculations. Progress in NewtonKrylov Methods for Aerodynamic A I A A Paper 97-0877, 35th Aerospace Sciences Meeting and Exhibit, 1997. [65] A . Pueyo and D . W . Zingg. Improvement to a Newton-Krylov Solver for Aerodynamic Flows. A I A A Paper 98-0619, 36th Aerospace Sciences Meeting and Exhibit, 1998. [66] S. De Rango and D . W . Zingg. Aerodynamic Computations Using a Higher-Order Algorithm. A I A A Paper 99-0167, 37th A I A A Aerospace Sciences Meeting and Exhibit, 1999. [67] S. De Rango and D . W . Zingg. Further Investigation of Higher-Order Algorithm for Aerodynamic Computations. A I A A Paper 20.00-0823, 38th Aerospace Sciences Meeting and Exhibit, 2000. [68] S. De Rango and D . W . Zingg. Higher-Order Aerodynamic Computations on M u l t i Block Grids: Exhibit, 2001. A I A A Paper 2001-0263, 39th A I A A Aerospace Sciences Meeting and 181 BIBLIOGRAPHY [69] P. Roe. Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes. Journal of Computational Physics, 43:357-372, 1981. [70] P. Roe. Characteristic-Based Schemes for the Euler Equations. Mechanics, Annual Review of Fluid 18:337-365, 1986. [71] A . Rohde. Eigenvalues and Eigenvectors of the Euler Equations in General Geometries. A I A A Paper 2001-2609, 39th Aerospace Sciences Meeting and E x h i b i t , 2001. [72] Y . Saad, , and M . H . Schultz. A Generalized M i n i m a l Residual Algorithm for Solving Non-Symmetric Linear Systems. SIAM J. Sci., Stat. Comp., 7:856-869, 1986. [73] Y . Saad. A Flexible Inner-Outer Preconditioned G M R E S Algorithm. of Scientific Computing, [74] Y . Saad. SIAM Journal 14(2):461-469, 1993. Iterative Methods for Sparse Linear Systems. [75] P . R . Spalart. Topics in Detached-Eddy Simulation. In Siam, second edition, 2003. Third International Conference on Computational Fluid Dynamics (Invited Lecture), 2004. [76] Spitaleri R . M . and Regolo V . Multiblock Multigrid G r i d Generation Algorithms : Overcoming Multigrid Anisotropy. Applied mathematics and computation (Appl. math, comput.), 84:247-267, 1997. [77] J . L . Steger and R . F . Warming. F l u x Vector Splitting of the Inviscid Gasdynamics Equations with Application to Finite Difference Methods. Physics, Journal of Computational 40(263-293), 1981. [78] M . Tadjouddine, S. A . Forth, and N . Q i n . Elimination A D A p p l i e d to Jacobian A s sembly for an Implicit Compressible C F D Solver. International Journal for Numerical Methods in Fluids, 47:1315-1321, 2005. [79] M . D . T i d r i r i . Preconditioning Techniques for the Newton-Krylov Solution of C o m - pressible Flows. [80] M . V a n Altena. Journal of Computational Physics, 132:51-61, 1997. r High-Order Finite-Volume Discretisations for Solving a Modified Advection-Diffusion Problem on Unstructured Triangular Meshes. Master's thesis, T h e University of British Columbia, Mechanical Engineering Department, 1999. [81] H . A . V a n Der Vorst. Iterative Krylov Methods for Large Linear Systems. • University Press, 2003. Cambridge BIBLIOGRAPHY [82] B . V a n Leer. 182 Towards the Ultimate Conservative Difference Scheme. V . A Second- Order Sequel to Godunov's M e t h o d . Journal of Computational Physics, 32:101-136, 1979. [83] B . V a n Leer, J . L . Thomas, P. L . Roe, and R. W . Newsome. A Comparision of Numerical F l u x Formulas for the Euler and Navier-Stokes Equations. A I A A Paper 87-1104, 1987. [84] K . J . Vanden and P. D . Orkwis. Comparison of Numerical and Analytical Jacobians. AIAA Journal, 34(.6):1125-1129, 1996. [85] V . Venkatakrishnan. Newton Solution of Inviscid and Viscous Problems. A I A A Paper 88-0143, 26th Aerospace Sciences Meeting and Exhibit, 1988. [86] V . Venkatakrishnan. O n the Accuracy of Limiters and Convergence to Steady State Solutions. A I A A Paper 93-0880, 31st Aerospace Sciences Meeting and Exhibit, 1993. [87] V . Venkatakrishnan. Convergence to Steady state Solutions of the Euler Equations on Unstructured Grids W i t h Limiters. Journal of Computational Physics, 118:120-130, 1995. [88] V . Venkatakrishnan. A Perspective on Unstructured G r i d Flow Solvers. Report I C A S E 95-3, N A S A , Technical 1995. [89] V . Venkatakrishnan and T . J . B a r t h . Application of Direct Solvers to Unstructured Meshes for the Euler and Navier-Stokes Equations Using U p w i n d Schemes. AIAA Paper 89-0364, 27th Aerospace Sciences Meeting and Exhibit, 1989. [90] V . Venkatakrishnan and D . Mavriplis. Implicit Solvers for Unstructured Meshes. nal of Computational Physics, Jour- 105(83-91), 1993. [91] V . Venkatakrishnan and D . J . Mavriplis. Implicit Solvers for Unstructured Meshes. A I A A Paper 91-1537, 1991. [92] D . S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, 1991. [93] D . L . Whitaker. Three Dimensional Unstructured G r i d Euler Computations Using a Fully-Implicit U p w i n d M e t h o d . A I A A Paper 93-3337, 1993. [94] L . Wigton. Application of M A C S Y M A and Sparse M a t r i x Technology to Multielement Airfoil Calculations. A I A A Paper 87-1142, 1987. BIBLIOGRAPHY 183 [95] D . J . Willis, J . Peraire, and J . K . White. A Quadratic Basis Function, Quadratic Geometry, H i g h Order Panel Method. A I A A Paper 2005-0854, 43rd A I A A Aerospace sciences Meeting and Exhibit, 2005. [96] D . W . Zingg, S. De Rango, M . Nemec, and T . H . Pulliam. Spatial Discretizations for the Navier-Stokes Equations. Physics, 160:683-704, 2000. Comparison of Several Journal of Computational
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A higher-order accurate unstructured finite volume...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A higher-order accurate unstructured finite volume Newton-Krylov algorithm for inviscid compressible… Nejat, Amir 2007
pdf
Page Metadata
Item Metadata
Title | A higher-order accurate unstructured finite volume Newton-Krylov algorithm for inviscid compressible flows |
Creator |
Nejat, Amir |
Publisher | University of British Columbia |
Date Issued | 2007 |
Description | A fast implicit (Newton-Krylov) finite volume algorithm is developed for higher-order unstructured (cell-centered) steady-state computation of inviscid compressible flows (Euler equations). The matrix-free General Minimal Residual (GMRES) algorithm is used for solving the linear system arising from implicit discretization of the governing equations, avoiding expensive and complicated explicit computation of the higher-order Jacobian matrix. An Incomplete Lower-Upper factorization technique is employed as the preconditioning strategy and a first-order Jacobian as a preconditioning matrix. The solution process is divided into two phases: start-up and Newton iterations. In the start-up phase an approximate solution of the fluid flow is computed which includes most of the physical characteristics of the steady-state flow. A defect correction procedure is proposed for the start-up phase consisting of multiple implicit pre-iterations. At the end of the start-up phase (when the linearization of the flow field is accurate enough for steady-state solution) the solution is switched to the Newton phase, taking an infinite time step and recovering a semi-quadratic convergence rate (for most of the cases). A proper limiter implementation for higher-order discretization is discussed and a new formula for limiting the higher-order terms of the reconstruction polynomial is introduced. The issue of mesh refinement in accuracy measurement for unstructured meshes is revisited. A straightforward methodology is applied for accuracy assessment of the higher-order unstructured approach based on total pressure loss, drag measurement, and direct solution error calculation. The accuracy, fast convergence and robustness of the proposed higher-order unstructured Newton-Krylov solver for different speed regimes are demonstrated via several test cases for the 2nd, 3rd and 4th-order discretization. Solutions of different orders of accuracy are compared in detail through several investigations. The possibility of reducing the computational cost required for a given level of accuracy using high-order discretization is demonstrated. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-01-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0080717 |
URI | http://hdl.handle.net/2429/30969 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mechanical Engineering |
Affiliation |
Applied Science, Faculty of Mechanical Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2007-267684.pdf [ 19.42MB ]
- Metadata
- JSON: 831-1.0080717.json
- JSON-LD: 831-1.0080717-ld.json
- RDF/XML (Pretty): 831-1.0080717-rdf.xml
- RDF/JSON: 831-1.0080717-rdf.json
- Turtle: 831-1.0080717-turtle.txt
- N-Triples: 831-1.0080717-rdf-ntriples.txt
- Original Record: 831-1.0080717-source.json
- Full Text
- 831-1.0080717-fulltext.txt
- Citation
- 831-1.0080717.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0080717/manifest