Simulating Water for Computer Graphics
Particle-in-Cell, Explicit Surfaces, and Discontinuous Galerkin

by

Essex Edwards

B.CIS, The University of the Fraser Valley, 2006
M.Sc., The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

December 2015

© Essex Edwards, 2015

Abstract

We propose several advances in the simulation of fluids for computer graphics. We concentrate on particle-in-cell methods and related sub-problems. We develop high-order accurate extensions to particle-in-cell methods, demonstrated on a variety of equations, including constrained dynamics with implicit-explicit time integration. We track the liquid-air interface with an explicit mesh, which we show how to do in a provably exact fashion. To address the mismatched simulation and surface resolution, we solve the partial differential equations in each time step with a p-adaptive discontinuous Galerkin discretization. This allows us to use a coarse regular grid for the entire simulation. For solving the resulting linear system, we propose a novel mostly-algebraic domain decomposition preconditioner that automatically creates a coarse discontinuous Galerkin approximation of the problem.

Preface

The work presented in this thesis was supervised by Dr. Robert Bridson, as part of Imager Lab at the University of British Columbia. Generally, the projects were suggested by Dr. Bridson, while the research, design, implementation, and writing were largely performed by myself or other collaborators.

Chapter 3 was published as "E. Edwards and R. Bridson. Detailed water with coarse grids: combining surface meshes and adaptive discontinuous Galerkin. ACM Transactions on Graphics, 33(4), 2014" [68] and presented at SIGGRAPH 2014.

Portions of Chapter 4 are modified from the published paper "E. Edwards and R. Bridson.
A high-order accurate particle-in-cell method. International Journal for Numerical Methods in Engineering, 90(9): 1073-1088, 2012" [67] and from my master's thesis [66] with the same title. The research contributions of the chapter are new and unpublished.

Chapter 5 and Appendix A discuss work that was published as "T. Brochu, E. Edwards, and R. Bridson. Efficient Geometrically Exact Continuous Collision Detection. ACM Transactions on Graphics, 31(4), 2012" [36]. Appendix A is a modified version of that published paper. My contribution to this project was the proof of correctness and the Root Parity Lemma, originally included as supplementary material to the published paper.

The work in Chapter 6 is available as a preprint on arXiv [69].

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
1 Introduction
  1.1 Concepts
  1.2 Projects
2 Related Work
  2.1 Particle-in-Cell Methods
  2.2 Discontinuous Galerkin
  2.3 Matching Surface and Simulation Resolution
3 DG Water
  3.1 Introduction
  3.2 Related Work
  3.3 The Method
    3.3.1 Discretizing Projection
    3.3.2 The Volume Mesh
    3.3.3 The Approximation Space
    3.3.4 The LDG Equations
    3.3.5 Assembly and Linear Algebra
    3.3.6 Interpolating to the Particles
    3.3.7 Interpolating from the Particles
    3.3.8 Particles and Advection
    3.3.9 Surface Tracking
    3.3.10 Viscosity
  3.4 Results
  3.5 Discussion
  3.6 Conclusion
4 High-Order Accurate Particle-in-Cell Methods
  4.1 Introduction
  4.2 High-Order PIC
    4.2.1 Deriving the Time Discretization
    4.2.2 Moving the Particles
    4.2.3 Stokes Solve
    4.2.4 Interpolating from Grid to Particles
    4.2.5 Reseeding
    4.2.6 Startup
  4.3 A Projection Method
  4.4 Results
    4.4.1 Taylor Green Flow
    4.4.2 A Complex Rayleigh Taylor Instability
  4.5 Discussion
  4.6 Conclusion
5 Free-Surface Geometry
  5.1 The Root Parity Lemma and Proof of Correctness
    5.1.1 Outline
    5.1.2 Piecewise Linear Functions
    5.1.3 The Root Parity Lemma
    5.1.4 Topological Degree
    5.1.5 Proof of the Collision Algorithm
  5.2 Discussion of the Odd-Parity Counting Argument
    5.2.1 No Self-Intersections are Introduced
    5.2.2 Disjoint Volumes Remain Disjoint
6 Discretely Discontinuous Galerkin Coarse Grid
  6.1 Introduction
  6.2 Related Work
  6.3 Preliminaries
  6.4 The Coarse Grid
  6.5 Coarse Grid Analysis
    6.5.1 Error Bound Using Geometric Properties
    6.5.2 Error Bound Using Algebraic Properties
    6.5.3 Practical Considerations
  6.6 Domain Decomposition
    6.6.1 Condition Number
  6.7 Numerical Experiments
  6.8 Weighting for a Better Coarse Grid
  6.9 Discussion
7 Conclusion
Bibliography
A Efficient Geometrically Exact Continuous Collision Detection
  A.1 Introduction
  A.2 Related Work
    A.2.1 The Cubic Solver Approach
    A.2.2 Other Work in Continuous Collision Detection
    A.2.3 Exact Geometric Computation
  A.3 Continuous Collision Detection in 3D
    A.3.1 Determining Root Parity
    A.3.2 Ray-Bilinear-Patch Crossing Parity Testing
    A.3.3 Putting it Together
  A.4 Implementation
    A.4.1 Resolving Collisions
  A.5 Examples
  A.6 Conclusions
  A.7 Implicit Function for a Bilinear Patch
  A.8 Acknowledgments

List of Tables

Table 3.1 Equations for affine advection schemes
Table 3.2 Detailed timings of the algorithm's components
Table 6.1 Common symbols for DD
Table 6.2 Iterations of CG with DDG
Table 6.3 Runtimes for DDG
Table 6.4 Normalized runtimes for DDG

List of Figures

Figure 3.1 A simulation generates thin splashes
Figure 3.2 Schematic diagram of detailed cut cells
Figure 3.3 A simulation entirely within one grid cell
Figure 3.4 An illustrative p-adaptive simulation
Figure 3.5 Comparison of advection methods
Figure 3.6 High-quality triangles from the explicit surface tracker
Figure 3.7 Sub-grid features properly captured by p-adaptivity
Figure 3.8 Examples with thin sheets
Figure 3.9 Comparison of our method to a commercial FLIP code
Figure 4.1 Procedure for a single time step of the implicit-explicit PIC
Figure 4.2 Error convergence of PIC on the Taylor Green vortex
Figure 4.3 Fluid density in a complex scenario
Figure 6.1 Error after smoothing with DD
Figure 6.2 Code to build DD restriction matrix
Figure 6.3 Regions Ω_cut and Ω_sub used for analysis
Figure 6.4 Meshes used in DDG examples
Figure 6.5 p-dependent convergence on a biharmonic problem
Figure 6.6 Optimal convergence when subdomain size is not fixed
Figure A.1 Cloth folding over a spinning ball
Figure A.2 Root parity test with no roots
Figure A.3 Root parity test with roots
Figure A.4 The ray-vs-bilinear-patch parity test
Figure A.5 Degenerate ray intersections
Figure A.6 CCD stress test
Figure A.7 Cloth with an adaptive simulation mesh
Glossary

BDD   balancing domain decomposition
BDF   backward differentiation formula
CCD   continuous collision detection
CG    conjugate gradient method
DAE   differential algebraic equation
DD    domain decomposition
DDG   discretely-discontinuous Galerkin
DG    discontinuous Galerkin
FD    finite difference
FEM   finite element method
FETI  finite element tearing and interconnect, domain decomposition
FLIP  fluid implicit particle
FVM   finite volume method
GIMP  generalized interpolation material-point method
LDG   local discontinuous Galerkin
MAC   the staggered marker and cell grid
MG    multigrid
MLS   moving least squares
MPM   the material-point method
PCG   preconditioned conjugate gradient
PDE   partial differential equation
PIC   particle-in-cell method
SPD   symmetric positive definite matrix
SPH   smoothed particle hydrodynamics

Chapter 1

Introduction

Numerical simulation of fluids has emerged as an important problem in computer graphics, and of course has long been a challenge in science and engineering. In computer graphics, these simulations typically treat water (or other liquids) as incompressible, uniform density, often inviscid, often with no surface tension, and with zero pressure at the free surface. This is described mathematically by the Navier-Stokes equations, with these simplifications applied.

Any algorithm for solving these equations has to tackle many problems, each of which has been the topic of many published works. The velocity field must be tracked somehow as it is advected by the fluid motion. Likewise, the liquid surface moves and must also be tracked, while allowing for topology changes. The incompressibility condition introduces a global coupling that almost invariably requires solving a large sparse linear system to find the fluid pressure.

Computer graphics, like any field, has its own set of requirements and error metrics that make some techniques more appropriate than others.
Computer graphics emphasizes perceptual accuracy, human spatial and temporal scales (around one meter and one second), and the ability to plausibly and robustly handle very complex situations with relatively coarse grids and large time steps.

The bulk of this thesis is organized into chapters based on separate projects completed during my time at the University of British Columbia (UBC). The research chapters begin with Chapter 3, which describes a complete liquid simulation using modern and novel techniques. The remaining chapters investigate some of those techniques in more detail, and develop and apply them in other ways. Historically speaking, most of the later chapters were done first, before being synthesized in Chapter 3, but we present Chapter 3 first to help establish a context for the other projects. Before the research chapters, Chapter 2 reviews the published literature in the field, further placing the thesis in context, and introducing the tools used in the research. The remainder of the introduction outlines the key concepts, the projects, and their connections.

1.1 Concepts

Throughout the thesis, several concepts and tools are used repeatedly. The core ones are briefly introduced below.

Particle-in-Cell. PIC methods simulate continuum mechanics problems in flows with high distortion. They work by representing the material state with Lagrangian particles, but solve all of the spatial equations by transferring information from the particles to a structured grid, solving the equations on the grid, and transferring results back to the particles. PIC methods are used in Chapter 3 and Chapter 4.

High-Order Accuracy. High-order accurate methods and high-order convergence are used in every part of this thesis. For example, a function can be approximated as a piecewise polynomial with polynomials of degree p inside of mesh elements of diameter h. If that function is sufficiently smooth, then the approximation error (L∞ norm) will be bounded by O(h^(p+1)).
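As a quick numerical check of this bound (a minimal sketch, not part of the thesis), one can interpolate a smooth function with piecewise-linear polynomials (p = 1) and halve h; the maximum error should then drop by roughly a factor of 2^(p+1) = 4:

```python
import numpy as np

def pw_linear_max_error(f, a, b, n):
    """Max error of piecewise-linear interpolation of f on n equal elements."""
    x = np.linspace(a, b, n + 1)
    err = 0.0
    for i in range(n):
        s = np.linspace(x[i], x[i + 1], 50)      # dense samples in the element
        line = f(x[i]) + (f(x[i + 1]) - f(x[i])) * (s - x[i]) / (x[i + 1] - x[i])
        err = max(err, np.max(np.abs(f(s) - line)))
    return err

e1 = pw_linear_max_error(np.sin, 0.0, np.pi, 16)
e2 = pw_linear_max_error(np.sin, 0.0, np.pi, 32)
ratio = e1 / e2   # close to 2^(p+1) = 4 for p = 1
```

Asymptotically the ratio approaches 4 exactly; already at this resolution it is within a few percent.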
Other high-order schemes all have similar error bounds, although h may refer to the time step for time-dependent problems. When using high-order schemes, solution accuracy can be increased by reducing h (h-refinement) or increasing p (p-refinement). Which approach is most efficient is case-dependent. Refining both (hp-refinement) in an adaptive fashion is also possible.

Discontinuous Galerkin. DG is a flexible technique for solving partial differential equations (PDEs). Using DG, the problem domain is partitioned into elements and the solution within each element is restricted to be a polynomial. Often DG methods are high-order accurate. The distinguishing feature of discontinuous Galerkin (DG), relative to the finite element method (FEM), is that the functions are allowed to be discontinuous across element boundaries. DG concepts are used in Chapter 3 and Chapter 6.

1.2 Projects

DG Water. In Chapter 3, many ideas come together to make a state-of-the-art liquid simulation. This chapter follows a trend in recent research of using explicit triangle meshes to track the liquid surface during simulation, while the physical forces are computed on an underlying grid. Using an explicit surface allows for very detailed and accurate surface tracking, but special care needs to be taken to couple the grid and the surface, or artifacts will develop.

The core problem is that the surface mesh resolution may be higher than the resolution of the grid. Consequently, there are deformations of the surface that cannot be resolved by the physics computed on the grid. Chapter 3 describes an adaptive method that still uses a simple coarse Cartesian grid. To capture high-resolution surface details we use exact detailed cut cells at boundaries, and adaptively use richer discrete models within grid cells near the surface, more specifically a p-adaptive DG method.
This DG discretization of the spatial components is used in a particle-in-cell (PIC) discretization of the flow problem, necessitating some changes to the typical operations that couple the PIC particles to the grid.

High-Order Accurate Particle-in-Cell Methods. Particle-in-cell (PIC) methods, in particular fluid implicit particle (FLIP) and the material-point method (MPM), are very popular in computer graphics. However, they both suffer from problems of limited accuracy, and have been almost exclusively limited to first-order accuracy since the introduction of PIC in the 1950s. Chapter 4 introduces a high-order accurate PIC method.

This chapter concentrates on two key components: interpolation between the particles and the grid, and time integration. This chapter is follow-up work to my master's thesis, which introduced a high-order accurate PIC method, but with limitations that made it impossible to apply to incompressible Navier-Stokes in the usual velocity-pressure form. In this thesis, we develop a semi-implicit time-integration scheme capable of handling the constrained dynamics necessary for Navier-Stokes in this form.

The PIC operations in Chapter 4 and Chapter 3 are quite similar. In both cases, particles are interpolated in a different way than in most PIC methods, allowing for higher-order accuracy. Chapter 3 does not achieve global high-order convergence, primarily due to the use of low-order time integration, but Chapter 4 demonstrates how this can be done.

Free-Surface Geometry. Chapter 5 describes the first provably exact algorithm for detecting collisions between moving triangle mesh elements. This primitive operation is a fundamental building block for robust adaptive and dynamic meshes, which made possible the reliable surface tracking used in Chapter 3.

Algebraic DG Coarse Grid for Domain Decomposition.
Many discretizations of elliptic PDEs lead to sparse symmetric positive definite (SPD) linear systems. For large 3D problems, iterative solvers such as preconditioned conjugate gradient (PCG) are usually necessary, and the quality of the preconditioner becomes the crucial factor for efficiency and robustness. With an optimal preconditioner, the linear system can be solved in a time which scales linearly with the problem size. Domain decomposition (DD) is one common framework for optimal preconditioning [136, 153].

A popular framework for optimal DD solvers is the Schwarz method with coarse grid correction. First, the problem domain is partitioned into multiple subdomains, in which the local problem is solved by whatever technique is convenient. These solutions are missing the global modes, so the second component in DD is a coarse global discretization of the PDE. Combined, the coarse global solution and the local solutions are potentially an optimal preconditioner.

In Chapter 6, we present a coarse discretization that can be constructed algebraically from the input matrix and the positions of the mesh nodes. This discretization is inspired by DG, as used in Chapter 3. The method is easy to use, applies to a broad variety of PDEs, and leverages the power of high-order convergence to outperform other simple algebraic schemes and converge optimally even on higher-order PDEs.

Chapter 2

Related Work

Each research chapter includes a focused review of the most relevant literature for that chapter. The following sections provide a broader view on the main tools and problems in the thesis.

2.1 Particle-in-Cell Methods

The PIC method is an approach to simulating continuum mechanics problems in flows with high distortion. PIC is a fundamentally Lagrangian scheme, in which all material properties are carried on Lagrangian particles that move with the flow. On each time step, some PDE must be solved to compute the forces (and other updates) applied to the particles.
To compute these efficiently, all material properties are interpolated from the particles to a convenient computation grid. The PDE is then solved on the grid and updates are interpolated back to the particles.

The hybrid particle/mesh approach of PIC attempts to get the best of both Lagrangian and Eulerian approaches. The use of Lagrangian particles for tracking material properties can give very low numerical diffusion compared to Eulerian representations on relatively coarse grids. The use of a structured grid for the solution of the PDE enables the use of efficient and accurate finite difference (FD) methods, finite volume methods (FVM), or finite element methods (FEM) rather than the typically more complicated schemes, or less accurate approximations, used in fully mesh-free methods.

In the most general form of a modern PIC method for fluid and continuum mechanics problems, particles with positions r and other attributes q (momentum or velocity included in q, or derivable from it) are seeded throughout the domain as initial conditions. The attributes are transferred from the particles to a structured grid, so that Q_i are the values on the grid. Some PDE discretization is solved on the grid to compute values at the end of the time step, Q_new. The changes over the course of the time step, i.e., ΔQ = Q_new − Q, are interpolated from the grid back to the particles and used to update their values to the end of the time step. Similarly, the particles are moved according to the velocity field interpolated from the grid.

The particle-in-cell method first appeared as a method for solving compressible flow problems in a series of Los Alamos National Lab technical reports starting in the 1950s [77, 90, 91], and in the academic literature shortly after [88, 92]. A good reference for this early work is the 1969 annotated bibliography by Harlow [89]. At this stage, PIC was interpreted as an essentially Eulerian methodology.
The particles were only used when they moved from one grid cell to another: when a particle crossed cell boundaries, it would pick up some of the momentum, mass, and energy of its old cell, and deposit it into its new cell. This approach suffered from numerous problems, including excessive diffusion, noise, and instabilities. The 1970s saw essentially no advances in PIC methods for fluid or continuum mechanics problems, because of its poor performance. In fact, Brackbill et al. wrote, "were it not for its usefulness in simulating plasmas, PIC would be mainly of historical interest" [25]. Fortunately, advances were being made with PIC in plasma simulations, and in the early 1980s, these advances were adopted from plasma dynamics back into fluid mechanics in the fluid implicit particle (FLIP) method [24, 25, 39].

In FLIP, the particles are the primary representation of the material and carry the entire material state, making it a fundamentally Lagrangian method. This change alone is mostly conceptual, but it paves the way for future improvements. When the time step is computed on the grid, only updates are interpolated back to the particles, rather than new values. For example, the particle velocities are updated with the acceleration computed on the grid, rather than directly interpolating the new particle velocities from the new grid velocities. These changes effectively removed all numerical diffusion from FLIP.
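The distinction can be sketched with a hypothetical 1D toy (all names below are illustrative, and the grid "solve" is deliberately a no-op so the transfer operators alone are compared): PIC overwrites particle velocities with interpolated grid values, while FLIP adds only the interpolated grid change, so a grid step that changes nothing leaves the FLIP particles exactly untouched:

```python
import numpy as np

def hat_weights(xp, dx, n):
    """Linear 'tent' weights of a particle at xp on an n-node grid (clamped)."""
    i = int(xp / dx)
    t = xp / dx - i
    return [(i, 1.0 - t), (min(i + 1, n - 1), t)]

def particles_to_grid(xs, vs, dx, n):
    """Weighted average of particle velocities at each grid node."""
    num, den = np.zeros(n), np.zeros(n)
    for xp, vp in zip(xs, vs):
        for i, w in hat_weights(xp, dx, n):
            num[i] += w * vp
            den[i] += w
    return num / np.maximum(den, 1e-12)

def grid_to_particles(xs, V, dx, n):
    """Interpolate a grid field V back to the particle positions."""
    return np.array([sum(w * V[i] for i, w in hat_weights(xp, dx, n))
                     for xp in xs])

rng = np.random.default_rng(0)
n, dx = 8, 1.0
xs = rng.uniform(0.0, n * dx - 1e-9, 20)   # particle positions
vs = rng.normal(size=20)                   # velocities with sub-grid detail

V = particles_to_grid(xs, vs, dx, n)       # transfer to grid
V_new = V.copy()                           # a grid "solve" that changes nothing

v_pic = grid_to_particles(xs, V_new, dx, n)            # PIC: overwrite values
v_flip = vs + grid_to_particles(xs, V_new - V, dx, n)  # FLIP: add the change
```

Here v_flip equals vs exactly, while v_pic has been filtered through the 8-node grid: the averaging in particles_to_grid is precisely the numerical diffusion that the FLIP increment update avoids.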
Through the 1990s, FLIP was extended to apply to history-dependent elasto-plastic materials, in the material-point method (MPM) [145, 146]. For MPM, the equations were derived from a weak formulation, and strong theoretical connections were made between FEM and PIC, with the particles interpreted as quadrature points. After its introduction, many papers were published on improvements, extensions, or applications of MPM. Recent applications of MPM include: brittle fracture [107], adhesive contact [106], explosive welding [162], non-local plasticity [40], and nearly-incompressible materials [112].

The generalized interpolation material-point method (GIMP) [13] presents a unifying framework in which to analyze FLIP, MPM, and their relatives. In GIMP, both the particles and the grid nodes have shape functions, which may be unrelated to each other. The spatial PDEs are discretized in terms of the grid's degrees of freedom by a Petrov-Galerkin procedure that projects the particle fields onto the grid fields. The entries in the mass matrix, for example, are sums of integrals of the products of the grid shape functions and particle shape functions. The original MPM paper corresponds to the choice of linear "tent" functions for the grid shape functions, and Dirac deltas for the particle shape functions.

With the framework established by MPM and GIMP, many papers investigated sources of error and potential improvements. Every method published to this point is inherently first-order accurate in space and usually first- or second-order accurate in time. Steffen et al. [140] analyze the many spatial and temporal error sources within MPM, and describe a method to balance the error terms by choosing appropriate spatial and temporal resolutions.

Several papers explored implicit time integration schemes for MPM [59, 86, 87, 144]. These typically focused on non-linear plasticity problems, and used iterative Newton-Krylov methods. The degrees of freedom being solved for are only the grid degrees of freedom, but each iteration needs to involve the particles because of their appearance in the mass matrix and other terms. Wallstedt and Guilkey [159] evaluated many explicit time integration schemes. Ultimately, they write that "while the choice of time integration scheme has a large impact on the overall accuracy of a simulation, the ultimate conclusion [...]
is that, when the best of these choices is made, spatial error remains dominant."

Spatial errors have received a lot of attention too. Steffen et al. [139] write, "much of the anomalous behavior exhibited by MPM can be attributed to the quadrature approximation properties of the method. In fact, many of the proposed improvements to MPM either explicitly or tacitly attempt to control and improve MPM's quadrature behavior." In particular, they show that the grid-cell-crossing instability that plagues all PIC methods up to and including MPM is due to quadrature errors that result from using the particles as quadrature points on non-smooth integrands. Improvements to the spatial terms have been made by adopting a Hermite-type scheme for interpolating the particles to the grid [158], by using smoother shape functions that reduce quadrature errors [139, 172], or by using higher-order elements for the grid shapes [5]. Moving away from the framework of GIMP, spatial errors are drastically reduced by applying least-squares approximation to interpolate from the particles to the grid [67, 160].

MPM, FLIP, and GIMP are almost exclusively applied to compressible material problems. In fact, MPM fails on incompressible problems due to locking [112]. However, in plasma dynamics, while the flows are not divergence-free, there are constraints on the divergence of the electric and magnetic fields. For an introduction to PIC's use in plasma physics, we refer the reader to the review paper by Tskhakaya et al. [154]. Jacobs and Hesthaven [95, 96] describe a plasma PIC method that is high-order accurate in both space and time. They satisfy the divergence constraint either by projection or by weakening the constraint.
Their approach is not immediately applicable to incompressible fluid dynamics, because of the different way that the divergence constraint acts in the plasma equations, as well as significant differences between PIC's application in plasma dynamics and fluid dynamics.

In computer graphics, PIC was introduced to fluid simulation [175] as a modified form of FLIP, adapted to incompressible flow. Rather than storing extensive properties (e.g., mass and momentum) and being concerned with conservation laws, the particles are interpreted as storing point samples of the unknown continuous fields (e.g., density and velocity). The particle-to-grid interpolation problem is then a scattered-data interpolation/approximation problem. The grid discretization does not depend on any information about the particles, except through the right-hand-side "load" vector. The complete decoupling of the particles from the grid's discretization is very flexible, but makes it unclear how to enforce conservation laws, if it is even possible. It is this flexible and decoupled form that we use in this thesis.

Another advantage of using the particles as massless point samples is that adding and removing particles is comparatively trivial; several interesting techniques have been proposed [6, 23]. In comparison, removing particles in MPM removes mass and momentum from the system, so more delicate techniques are required [105, 147].

Recently, several papers [141, 142] have been published at SIGGRAPH using MPM for elasto-plastic solids and melting. The high-quality results seem to have excited an interest in MPM for the graphics community, in addition to the ongoing success of FLIP.

Chapter 3 and Chapter 4 both present PIC methods with novel features.

2.2 Discontinuous Galerkin

Discontinuous Galerkin (DG) is a flexible technique for solving PDEs. Using DG, the problem domain is partitioned into elements (triangles, quadrilaterals, curvilinear simplices, arbitrary polyhedra, etc.)
and the solution within each element is restricted to be in some space of polynomials (or another approximation space). The distinguishing feature of DG, relative to conforming FEM, is that the functions are allowed to be discontinuous across element boundaries.

The standard Galerkin finite element method cannot be applied on such a function space, as it is not a subspace of the appropriate Sobolev space for typical PDEs. Naïvely applying FEM assembly routines to DG approximation spaces results in a singular system, with no coupling terms between adjacent cells. DG addresses this problem by applying the standard Galerkin approach within each element, and then inventing an inter-element flux term to handle the discontinuities. This flux term is not uniquely defined, and varies from one DG formulation to the next. Ideally, the flux should be designed so that the entire method is consistent, adjoint-consistent, conservative, stable (i.e., coercive), symmetric (for symmetric problems), and computationally efficient.

For the Poisson problem (∇·∇u = f), which is all one needs to solve for simple fluids, Arnold et al. [9] presented a unified analysis of many existing DG formulations. The methods they identify as both stable and consistent are the interior-penalty (IP) method [62], Brezzi et al. [29], non-symmetric interior penalty DG (NIPG) [126], Bassi et al. [15], and local DG (LDG) [47]. Of these, only Brezzi et al. [29] and LDG converge to optimal order independent of non-physical parameters. For certain simplifying parameter choices, LDG reduces to Brezzi's method, so we concentrate on LDG.

LDG has seen a lot of development since its introduction for the convection-diffusion problem [51]. It has since been applied to: the Stokes problem [52], the Oseen equations [53], quasi-Newtonian Stokes flow [42], incompressible Navier-Stokes [54], elastic solids [56], Maxwell's equations [163], magnetohydrodynamics [83], Burgers' equation [132], and Hamilton-Jacobi equations [169].
For a more thorough overview of LDG, we refer the reader to the review papers by Cockburn and Castillo [46, 55].

Early on, LDG was analyzed for elliptic problems, and Castillo et al. [47] note that it "might be advantageous because of the ease with which the method handles hanging nodes, elements of general shapes, and local spaces of different types". These are exactly the problems we face in Chapter 3. Yuan and Shu [171] took advantage of custom local spaces to show vastly improved performance when a well-adapted non-polynomial basis is used that matches the expected form of the solution. Bustinza [41] presents a unified analysis of LDG for linear and non-linear PDEs.

Kanschat [100] has noted that LDG, appropriately applied, recovers the well-known staggered marker and cell (MAC) grid. It has also been shown that, subject to certain conditions, LDG recovers exactly divergence-free results even when the divergence-free condition is only applied weakly and the basis is not divergence-conforming [57, 83]. Also, special multilevel preconditioners have been proposed for LDG [97–99].

There are more recent DG methods than LDG. One method of note is hybridizable DG (HDG) [44, 50, 58], for which the global system to solve is only in terms of a function defined on the boundary of the elements. The compact DG (CDG) method [121] also reduces the computational expense, relative to LDG, by making the matrix sparser. Another interesting trend is the development of fluxes based on fitting, rather than differential operators on the boundary, akin to how high-order fluxes are derived in finite volume methods [21, 110].

Chapter 3 uses DG primarily as a tool, and includes novel results for DG with complex time-dependent cut cells.
Chapter 6 applies concepts from DG in a purely discrete setting to develop a novel preconditioner.

2.3 Matching Surface and Simulation Resolution

When an explicit surface mesh is used as the free surface within a fluid simulation, something must be done to address the different resolutions of the surface mesh and the volume mesh on which the spatial PDE is solved.

A typical fluid simulation in computer graphics uses a regular staggered grid for the discretization. There may be multiple surface vertices within a single grid cell, forming a bump or other feature that cannot be resolved or reacted to by the simulation. In the best case, these high-frequency surface features simply persist in a non-physical fashion, but they may very well be unstable and grow in time. One must either regularize the surface, smoothing away any features below the grid resolution and lowering the effective surface resolution to that of the grid, or locally increase the simulation resolution near the surface.

Within the field of computer graphics, there have been several approaches to handling the problem of mismatched surface and grid resolution.

The simplest approach is to use a low-resolution mesh, or smooth/simplify the surface mesh until it has no features that are not captured by the simulation grid. As demonstrated by Müller [117], this has some advantages over implicit surfaces. However, to a large degree, this approach loses sight of the fact that we care much more about surface detail than bulk flow detail. We do not consider this approach any further, and always assume that the surface resolution is higher than the volume-mesh resolution.

One approach is to regularize the surface so that its topology is equivalent to the topology seen by an underlying grid-based physics simulation. This gets the bulk physics correct, but without further regularization, fine-scale details that are not representable on the simulation grid persist.
For simulations of elastic or very viscous materials, this may be perfectly acceptable [165, 166]. For liquids, where fine surface details should not persist, Wojtan et al. [167] and Thürey et al. [152] apply surface tension or mean-curvature flow to smooth them in a physically-motivated way. The disadvantage of this approach is that any dynamic features of the surface below the grid resolution are essentially non-physical (unless the physical surface tension is sufficiently high). For situations with less surface tension, good results were achieved with a surface model resembling vortex sheet equations [20].

Another approach comes from Brochu et al. [35]. In this case, the surface is not modified to agree with the grid in any way [33]. They used an h-adaptive volume mesh that starts with a regular lattice and then inserts special mesh elements around the surface vertices. In this way, the surface is well represented on the grid. This approach does not suffer from the non-physical behaviour of the approach above, but using an unstructured h-adaptive volume mesh is more computationally expensive. Presumably, similar results could be achieved with other h-adaptive discretizations.

Another approach that uses an explicit triangle mesh for the surface works by maintaining a boundary-conforming Lagrangian tetrahedralization of the entire spatial domain [113–115]. The surface triangles are the set of facets separating the water and air tetrahedra (or other materials). By using the tetrahedral mesh as the simulation mesh, this approach has a perfect match between simulation and surface degrees of freedom. Again, the use of an unstructured h-adaptive mesh is expensive.

The method in Chapter 3 approaches this problem in a new way, using p-adaptivity to enrich the approximation space within grid cells instead of adapting the simulation grid structure itself.
This gives the grid enough degrees of freedom to resolve the motion of the surface mesh contained within, but retains much of the simplicity of regular Cartesian grids.

Chapter 3

DG Water

3.1 Introduction

Simulating liquids for visual effects demands a high-resolution surface with detailed motion, but typically not the same high resolution in the entire volume: it is only the surface that we observe, and experiment and theory often indicate that most of the interesting dynamics (e.g., strong vorticity) are generated at the surface and remain strongest near the surface. We cannot avoid all volumetric computation, but much is gained by concentrating the bulk of the computation on the surface, as surface area scales quadratically with size while volume scales cubically. We are interested in techniques that take advantage of this opportunity. For liquid simulation, this encompasses methods for the surface tracker, the volumetric velocity/pressure solver, and their interactions.

Surface tracking may use an implicit method (e.g., level set or volume-of-fluid), an explicit method (e.g., marker particles or a mesh), or a hybrid (e.g., the particle level set). These approaches are already capable of spending computation and memory only at the surface. In this chapter we use an existing method to track the surface with an explicit triangle mesh, and concentrate on handling the dynamics. While the surface tracker may appear to be conceptually independent of the dynamics' discretization, artifacts can easily arise if the surface tracker and dynamical model are poorly coupled.

Figure 3.1: A simulation in a 25 × 25 × 25 grid generates thin splashes and sheets down to 1/1200 of the domain width.

Broadly speaking, there have been two approaches to simulating a high-resolution surface without a correspondingly high-resolution mesh for the entire volume. The first category uses a simple coarse volumetric fluid model everywhere, and adds a secondary model for the missing high-resolution surface features.
These methods are generally quite fast and attractively simple, but typically make simplifying assumptions that can lead to physically incorrect behavior. The second category of methods uses an adaptive volumetric mesh that matches the high resolution at the surface but is low resolution in the interior of the liquid. The unified handling of all the dynamics is physically consistent and correct, but much more computationally expensive, in large part due to the complexity of using unstructured or semi-structured meshes.

We present an adaptive method that is physically consistent and correct, but still uses a simple coarse Cartesian grid. To capture high-resolution surface details we use detailed cut cells at boundaries, and adaptively use richer discrete models within grid cells near the surface, more specifically a p-adaptive DG method.

In summary, our core technical contributions are:

• the first application of DG with exact cut cells to moving free-surface problems, and
• a novel particle advection scheme that requires fewer evaluations of the velocity field.

We also highlight some ideas that are new to fluid simulation in graphics:

• p-adaptive techniques, and
• embracing discontinuous approximations at all levels.

3.2 Related Work

Fluid simulation has a long history in computer graphics, and an even longer history in scientific computing, engineering, and physics. For an overview of the major techniques in graphics, we refer the reader to Bridson's book [30].

This chapter uses the hybrid Lagrangian-Eulerian FLIP method [175]. FLIP is interpretable as an extension of the Eulerian velocity-pressure formulation with staggered time stepping [81]. Since its introduction to graphics, FLIP has seen several developments related to this work. Adaptive FLIP particle distributions have been introduced for use in an h-adaptive simulation [7] and for detailed tracking of thin sheets [6].
Other particle methods (e.g., smoothed particle hydrodynamics (SPH)) can also have adaptive particle distributions, including generally h-adaptive approaches [3] and special two-scale approaches [137]. Interpolation between FLIP particles and high-order grids has been seen outside of graphics with both smooth [67] and non-smooth [116] interpolants.

For tracking the liquid surface, marker particles [81] and level-set methods [119], or their combination in the particle level set [73], have been most popular. Recently, explicit surface tracking with triangle meshes has gained interest; see, e.g., a recent SIGGRAPH course [168]. In this work we use the El Topo explicit surface tracking library of Brochu et al. [33].

As mentioned above, some methods embed a high-resolution surface tracker in a low-resolution volumetric simulation and apply a second model to the surface to control sub-grid motion. Regularizing the surface so that its topology is representable in the grid-based physics simulation has been recognized as an important feature in these approaches: without regularization, fine-scale details that are not captured in the simulation grid behave non-physically. For simulations of elastic or very viscous materials, simple persistence of fine details may be perfectly acceptable [165–167]. For liquids, fine surface details should not persist. Müller [117] handles this by remeshing the surface every time step, and restricting the geometry and topology within each cell to be very simple. Thürey et al. [152] applied surface tension and mean-curvature flow to smooth small details in a physically-motivated way. For situations with less surface tension, good results were achieved with a surface model resembling vortex sheet equations [20].

A second class of approaches uses h-adaptive volumetric simulations to put smaller-diameter (h) volumetric elements near the surface, but large elements in the bulk flow, capturing surface dynamics in the same way as the volume dynamics.
Classic examples of this are octrees [109] and unstructured tetrahedral meshes, including Eulerian approaches [7, 17, 49] and Lagrangian approaches [113–115]. Recognizing the simplicity and efficiency of regular grids, chimera grids combine multiple regular grids of different resolutions [72], and tall-cell grids go adaptive only in the vertical direction [48, 94]. Special h-adaptive schemes designed for explicit surface trackers are also possible [35].

The techniques in this chapter are not h-adaptive but rather p-adaptive, increasing the resolution in an area by increasing the approximation degree, p, in volumetric elements rather than by geometric refinement. To our knowledge, these techniques have never been applied in graphics, but they have a long history in finite element methods (FEMs). When the approximation degree is allowed to be very large, one arrives at spectral [22] and spectral element methods [101]. While we use pure p-adaptivity, both h- and p-adaptivity can be combined into hp-adaptive methods [11, 129].

Our discretization of the Poisson-projection problem uses a discontinuous Galerkin (DG) approach, a close relative of the famous FEM. For a review of many DG methods for elliptic problems, we recommend the unified analysis by Arnold et al. [9]. Of these, we make use of the local discontinuous Galerkin (LDG) method, which has been applied to many problems from Poisson to non-linear Navier-Stokes [46, 55]. DG has been used previously in computer graphics for elastic deformations [102], where its flexibility enabled the embedding of high-resolution solids inside low-resolution volumetric meshes, similar to our use.

3.3 The Method

We work with the velocity-pressure (u-p) form of the Navier-Stokes equations, simplified by assuming an incompressible, inviscid, and uniform-density fluid with free-slip conditions at solid-fluid boundaries and free-surface conditions with no surface tension at the air-fluid boundaries.
This common model is appropriate for simulating medium to large bodies of liquid.

We apply staggered time stepping, with separate advection and projection steps. The fluid velocity field is stored with FLIP particles, and the boundary location is stored with an explicit triangle mesh (Figure 3.6). An outline of one time step is given in Algorithm 1, including references to the section where each step is described.

3.3.1 Discretizing Projection

The projection step of our solver takes an intermediate velocity field ũ from advection and gravity, and applies a pressure gradient to make it divergence-free while respecting solid and air boundaries. The projected velocity field u is found by

u + ∇p = ũ,  (3.1)
∇·u = 0,  (3.2)

in the liquid domain Ω, and subject to the boundary conditions.

Discretizing with DG follows the same outline as FEM. First the domain is partitioned into cells. Within each cell, some approximation space is specified for u and p. Finally, a weak form of the PDE is satisfied within this space.
Unlike typical FEM, DG allows each cell to have different approximation spaces, regardless of discontinuities across cell boundaries or agreement with boundary conditions.

Algorithm 1 Outline of a time step
Input: particle positions, velocities, and surface mesh at time t0
Output: particle positions, velocities, and surface mesh at time t1
1: Projection Step: §3.3.1
2:   Build cut-cell volume mesh §3.3.2
3:   Prepare basis functions §3.3.3
4:   Assemble Poisson matrix §3.3.4
5:   Interpolate velocity from particles to grid §3.3.7
6:   Solve linear system §3.3.4
7:   If viscosity ≠ 0, apply viscous update §3.3.10
8:   Interpolate update from grid to particles §3.3.6
9: Advection Step:
10:  Add particles where necessary §3.3.8
11:  Advect FLIP particles, including surface §3.3.8
12:  Smooth surface §3.3.9
13:  Update surface tracker: §3.3.9
14:    collision detection/response
15:    topology changes
16:    remeshing
17:  Remove particles where necessary §3.3.8
18:  Add body forces to FLIP velocities

Furthermore, the cells need not be simple shapes.

3.3.2 The Volume Mesh

To construct our volume mesh, we begin with a regular Cartesian voxel grid with spacing h. In voxels that contain the boundary of the liquid, we intersect the liquid's volume and the voxel to get a surface mesh of the liquid in just that cell. This is the detailed cut cell that we use in our discretization, replacing the cubes of the Cartesian grid but keeping the same regular structure (Figure 3.2). This volume mesh conforms to the liquid boundary, exactly capturing all sub-grid features. The only simplification comes later, by projecting the PDE into the approximation space.

Each triangle of the explicit surface is assigned either the free-surface or solid boundary condition.
Triangles that are close to the solid mesh and have normals oriented opposite the solid's normal are assigned solid-boundary conditions.

We need to avoid cut cells with small, skinny, or otherwise poor geometry that can cause poor conditioning of the discretization.

Figure 3.2: A schematic diagram of the detailed cut cells in the volumetric mesh (thick lines) constructed by intersecting the polyhedral liquid domain (blue fill) with the cubes of a regular grid (thin lines). Small cells are merged with neighbors (dashed lines). Note that (A) a grid cell may contain multiple connected components, and (B) a cut cell may have complex topology.

This issue has come up with elasticity [102] and 2D fluid problems [80, 124]. We follow a similar heuristic strategy of merging 'poor' cells with adjacent cells to create larger cells with better conditioning. It is always possible to avoid arbitrarily small volumes because our surface tracker does not produce arbitrarily thin features; see §3.3.9.

Deciding when and how to merge cells is a matter of heuristics, and our results were not sensitive to its details. When a cell's volume is small (V < h³/100), we merge it with the adjacent cell with which it has the largest shared face. For cells with intermediate volume (V < h³/4), we merge them with a neighbour only if the area A of their shared face is sufficiently large (A^(3/2) > V). The condition on shared area prevents undesired merging of thin fluid sheets into a single large thin cell.

Figure 3.3: A simulation entirely within one grid cell. Note the detailed pressure variation within the thin sheet and detailed sub-grid velocities. This example uses degree-6 polynomials for pressure: 28 variables.

The location and size of the grid each frame is arbitrary, as it stores no data between time steps. In our 3D implementation, we perturb its location each time step to avoid degenerate cut-cell geometry.

3.3.3 The Approximation Space

DG requires an approximation space for u and p within each cell.
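As a brief aside, the cell-merging thresholds of §3.3.2 above are easy to capture in a small predicate. This is an illustrative sketch only, not the thesis implementation; the function name and interface are invented for exposition.

```python
def should_merge(volume, shared_face_area, h):
    """Cut-cell merge test using the thresholds quoted in Section 3.3.2:
    always merge tiny cells (V < h^3/100); merge intermediate cells
    (V < h^3/4) only when the shared face is large enough (A^(3/2) > V)."""
    if volume < h**3 / 100.0:
        return True  # tiny cell: merge (with the neighbor sharing the largest face)
    if volume < h**3 / 4.0:
        return shared_face_area**1.5 > volume  # intermediate cell: conditional merge
    return False  # large enough to stand alone
```

In the tiny-cell case the text merges with the neighbour sharing the largest face; that neighbour selection is omitted from this sketch.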
For pressure, we use Pk, the polynomials of degree no more than k (i.e., the span of x^a y^b z^c where 0 ≤ a, b, c and a + b + c ≤ k). We could use the same space Pk for each component of velocity, but we use Pk+1 instead because it provides a more accurate velocity for the same size of pressure solve. Larger disparities, such as Pk+2 for velocity, may be unstable. Figure 3.3 illustrates how these high-order approximation spaces can provide lots of sub-grid detail. If a cell has multiple connected components, each component uses a separate DG approximation space: we simply enrich the basis accordingly.

When applying high-order FEM on Cartesian grids, tensor-product polynomials Qk (like Pk but including powers up to max(a, b, c) ≤ k) are normally used. These make continuity simple while achieving kth-order accuracy. However, Qk has dimension k³ + O(k²) whereas Pk has dimension only k³/6 + O(k²) but achieves the same order of accuracy. Since DG allows discontinuities, we can use the Pk spaces, so the pressure solve has only one sixth the number of variables of an equivalent continuous discretization.

The solution is independent of the choice of basis, but the basis has large effects on the conditioning of the linear system. We use a nodal basis for Pk defined by points in a simplex, as commonly used for FEM. To adapt the basis to the irregularly-shaped cut cells, we approximately fit this simplex to each cell by fitting an oriented bounding box aligned with the cell's inertia tensor and another axis-aligned bounding box, taking the smaller of these boxes, and using the largest tetrahedron that fits in the box. This provides anisotropically stretched basis functions that match the anisotropy of thin sheets or tendrils of liquid.

The cornerstone of p-adaptivity is using different approximation spaces in different cells. For cells touching the free surface we use high-degree polynomials (typically cubic or quartic pressure fields).
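As an aside, the Pk and Qk dimension counts above are simple binomial quantities and are easy to check directly. This is a small sketch; the helper names `dim_P` and `dim_Q` are our own.

```python
from math import comb

def dim_P(k, d=3):
    """Dimension of P_k: polynomials of total degree <= k in d variables,
    which equals the binomial coefficient C(k + d, d)."""
    return comb(k + d, d)

def dim_Q(k, d=3):
    """Dimension of Q_k: tensor-product polynomials with each exponent <= k."""
    return (k + 1) ** d
```

For quartic pressure in 3D this gives dim_P(4) = 35 versus dim_Q(4) = 125, illustrating the roughly one-sixth variable count claimed above; in 2D, dim_P(6, 2) = 28 matches the count quoted in the caption of Figure 3.3.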
All other cells use one degree less than their neighbor of highest degree, but no less than linear pressure, as shown in Figure 3.4. Using linear pressure as the minimum ensures that the hydrostatic case is solved exactly. This p-adaptive approach allows us to use a coarse regular grid for the whole domain, while still achieving fine-scale details where desired.

For smooth functions, p-refinement is more efficient than h-refinement. Error estimates generally bound the error by O(h^p), while the number of degrees of freedom is O((p/h)³). Dividing those expressions, the error per degree of freedom decreases exponentially with p but only geometrically with h. For non-smooth functions this is false, but fortunately liquid simulations typically have smooth velocity and pressure, even when the surface geometry is non-smooth.

3.3.4 The LDG Equations

We use the LDG method, which is well studied, flexible, and has simple parameters [55]. When the solution is exactly representable in the approximation space, LDG finds that solution exactly, independent of geometry. Consequently, quiescent free-fall and hydrostatic liquids (linear p, constant u) are exactly reconstructed. In general, the boundary conditions, continuity, and the PDE are all satisfied in a weak sense.

Figure 3.4: An illustrative p-adaptive simulation. This simulation uses quartic polynomials for pressure at the surface (red cells), and linear polynomials for pressure in the interior (light cells), with smooth grading of the polynomial degree in intermediate cells.

LDG for the Poisson problem was introduced by Castillo et al. [47]. This section briefly follows their construction of the equations. We refer the reader to their paper for full details and alternative forms of the equations.

We look first at the divergence-free condition. Multiplying by an arbitrary smooth test function q and integrating over an arbitrary grid cell K ⊆ Ω reveals ∫_K q ∇·u = 0.
Integrating by parts,

∫_K ∇q·u dV = ∫_∂K q u·n_K dA  (3.3)

where n_K is the unit outward normal on K.

For continuous functions and conforming FEM, this is sufficient. However, u changes discontinuously across cell boundaries, so it is not clear what value to use in the boundary integral. These equations must be modified to apply to discontinuous functions. DG methods introduce a numerical flux û at discontinuities and modify Equation 3.3. The discrete solution finds a velocity field in the approximation space that satisfies

∫_K ∇q·u dV = ∫_∂K q û·n_K dA  (3.4)

for all cells K in the mesh and all test functions q in the pressure approximation space.

Different DG methods define different numerical fluxes. Let K⁺ and K⁻ be two cells with a shared face, with normals n± and u = u± in K±. Then, letting θ = 1/2 − C12·n⁺, LDG defines

û = θu⁺ + (1−θ)u⁻ − C11(n⁺p⁺ + n⁻p⁻),  (3.5)

where C12 and C11 are parameters.

On the domain boundary, the flux is defined using the boundary conditions. At solid boundaries, û is taken to be the velocity of the solid. At the free surface, û = u⁺ − C11 n⁺ p⁺. Both of these are equivalent to a particular definition of u⁻, p⁻, C11, and C12 on the boundary.

We choose the parameters to be as simple as possible while achieving optimal convergence. The vector field C12 may be an arbitrary O(1) function. We bypass defining it, and set θ = 1/2 directly. The parameter C11 acts as a penalty parameter on discontinuities in the pressure. By using C11 = O(1), LDG requires no mesh-dependent parameters, but using C11 = O(h⁻¹) achieves a better order of accuracy. We use C11 = h⁻¹.

Writing u and p in terms of coefficient vectors u and p for the velocity and pressure basis functions, and substituting into the integral equations (3.4), we arrive at an equivalent linear system Gᵀu − Sp = 0. The weak form of the momentum equation is discretized similarly and combined into the symmetric indefinite linear system

[ M   G ] [u]   [Mũ]
[ Gᵀ  −S ] [p] = [ 0 ]   (3.6)

where (1/2)uᵀMu exactly equals the kinetic energy of the fluid.
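With the parameter choices above (θ = 1/2, C11 = 1/h), evaluating the interior-face flux of Equation 3.5 at a quadrature point takes only a line or two. The following is a minimal sketch with an invented interface; on a shared face the outward normals satisfy n⁻ = −n⁺.

```python
import numpy as np

def ldg_velocity_flux(u_plus, u_minus, p_plus, p_minus, n_plus, h):
    """LDG numerical flux u-hat of Equation 3.5 at one point on an interior
    face, with theta = 1/2 and C11 = 1/h.  u_plus/u_minus are the velocity
    traces from the two cells, p_plus/p_minus the pressure traces, and
    n_plus the outward unit normal of cell K+."""
    theta = 0.5
    c11 = 1.0 / h
    n_plus = np.asarray(n_plus, dtype=float)
    n_minus = -n_plus  # outward normal of K- on the shared face
    return (theta * np.asarray(u_plus) + (1.0 - theta) * np.asarray(u_minus)
            - c11 * (n_plus * p_plus + n_minus * p_minus))
```

Note the consistency property: when the traces agree (u⁺ = u⁻, p⁺ = p⁻), the pressure penalty cancels and û reduces to the common velocity.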
The right-hand side ũ is the intermediate velocity field after advection. It is found by interpolation from the FLIP particles (see §3.3.7). Any non-stationary solids would add another term on the right-hand side.

3.3.5 Assembly and Linear Algebra

To assemble the system, we evaluate the integrals with exact high-order quadrature. In the interior of the domain, efficient Gaussian quadrature is available, and the grid structure allows a table of local integrals to be precomputed once, then added into the global matrix as needed. However, integration over cut cells must be performed every time step as their shapes change.

To integrate over a cut cell's volume, we clip the water mesh to the Cartesian grid cell, triangulate its boundary, then inexpensively tetrahedralize the cell by connecting each face to an arbitrary central point. By taking sign into account for the final exact integral, we allow overlapping and inverted tetrahedra. The final volume integral is a sum of exact quadratures applied to each tetrahedron, after which the tetrahedra are discarded and used for nothing else. The boundary integrals are similarly computed by a summation of quadratures over faces.

The volume integrals in G (S has no volume integral) can be easily computed from M. Each grid cell contributes three identical diagonal blocks to M, one for each velocity component. Let M_K be one of these blocks. Given two functions a and b (discretely represented as a and b) from Pk+1 (i.e., one component of the velocity space), M_K satisfies

aᵀ M_K b = ∫_K a b dV.  (3.7)

In G (see Equation 3.4), the volume-integral component G_K from a single cell satisfies

uᵀ G_K q = ∫_K ∇q·u dV  (3.8)

with q from the pressure space and u from the velocity space. Taking the partial derivative of q with respect to x is a linear operator whose action may be written as D_x, so that D_x q represents ∂q/∂x in the same Pk basis as q.
This matrix depends on the choice of basis (representative tetrahedron), so it may not be precomputed. Instead, on a reference tetrahedron, we precompute and store these derivative operators D_x^ref (similarly for y and z). From those matrices, the operators for element K can be computed from the reference operators:

D_x = J_xx D_x^ref + J_xy D_y^ref + J_xz D_z^ref.  (3.9)

The J terms come from the Jacobian of the transformation from the reference tetrahedron to cell K's representative tetrahedron. Thus, D_x is simple to compute. Another matrix, B, can be precomputed, independent of cell shape, that converts from the Pk basis to the Pk+1 basis. With these matrices, B D_x q represents ∂q/∂x in the same basis as velocity. Introducing one final matrix R_x that extracts just the x-component of a velocity field, we arrive at

uᵀ G_K q = ∫_K ∇q·u dV  (3.10)
         = uᵀ R_xᵀ M_K B D_x q + uᵀ R_yᵀ M_K B D_y q + uᵀ R_zᵀ M_K B D_z q  (3.11)

where the y and z subscripts are the natural generalizations. From this, we conclude

G_K = R_xᵀ M_K B D_x + R_yᵀ M_K B D_y + R_zᵀ M_K B D_z.

The implementation of this is simpler than the mathematics might make it first appear. For each polynomial degree, we store B and the derivative matrices D^ref on the reference tetrahedron. These are all small dense matrices. For each cell we compute M_K, another small dense matrix, by quadrature. Then we apply the Jacobian term to get the derivative operators acting on the current cell's representative tetrahedron. The action of R amounts to storing the resulting matrix products M_K B D into the correct spot in G; R is never explicitly created at all.

The boundary integrals can be treated similarly. Again, a mass-like matrix must be computed by quadrature (or other means) for each face. From there, the necessary differential operators and dot products can be written in terms of the same precomputed matrices.

In the interior of the domain, the situation is even simpler.
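As a concrete illustration of Equation 3.9, forming a cell's derivative operators from the precomputed reference operators is just a Jacobian-weighted sum of small dense matrices. This is a sketch with invented names; `J` stands for the 3×3 Jacobian of the reference-to-representative-tetrahedron map.

```python
import numpy as np

def cell_derivative_ops(J, D_ref):
    """Equation 3.9 for all three coordinates: D_i = sum_j J[i, j] * D_ref[j],
    where D_ref = [D_x_ref, D_y_ref, D_z_ref] are the reference-tetrahedron
    derivative operators (small dense matrices)."""
    return [sum(J[i, j] * D_ref[j] for j in range(3)) for i in range(3)]
```

When the representative tetrahedron coincides with the reference one (J = I), the operators come back unchanged, as expected.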
Because the cell shape is constant (a cube), all of the local matrices can be precomputed.

To solve system (3.6), we eliminate u to get a symmetric positive definite system for the pressure only:

Lp = (S + Gᵀ M⁻¹ G) p = Gᵀ ũ.  (3.12)

Because the basis functions in separate cells do not overlap, M is block diagonal and easy to invert directly. We construct L with several dense matrix operations per cell and solve this linear system with conjugate gradient, preconditioned with block-wise zero-fill incomplete Cholesky. Substitution after solving system (3.12) gives u = ũ − M⁻¹ G p.

3.3.6 Interpolating to the Particles

Once u and ũ are known, we interpolate an update from the grid to the FLIP particles. The DG basis functions naturally define the interpolants u(x) and ũ(x), and a linear operator P that evaluates these at all the particle locations. To update the particle velocities v_old, we use Zhu and Bridson's PIC/FLIP mixing strategy [175]:

v_PIC = Pu,  (3.13)
v_FLIP = v_old + Pu − Pũ,  (3.14)
v_new = θ v_FLIP + (1−θ) v_PIC.  (3.15)

We interpret this mixing of v_PIC and v_FLIP as the solution to a modified ODE that includes a decay of the (potentially) noisy particle velocities towards the noise-free grid velocities. For a decay with half-life λ, setting θ = 2^(−Δt/λ) achieves this decay in a stable fashion. Unlike directly setting θ, this approach gives a time-step-independent amount of diffusion. We choose the half-life in terms of some characteristic time for the simulation, generally λ = 0.5 s (so, e.g., θ ≈ 0.95 when Δt = 1/30 s).

This introduces a spatially-varying amount of diffusion, in agreement with the spatially-varying effective resolution of the grid. This is appropriate when interpreted as a regularization, but it should not be used as an approximate viscosity.

3.3.7 Interpolating from the Particles

In previous FLIP methods for computer graphics, the interpolation/approximation of the intermediate particle velocity ṽ to the grid is done by evaluating a weighted average at the nodal points of the grid.
This is inappropriate in this discretization because the nodal points have no particular meaning: they may be oddly distributed in space, outside the fluid, or, with another basis construction, nonexistent.

Instead, we find the coefficients ũ that minimize the error when interpolating back to the particles, min ‖Pũ − ṽ‖. In cut cells that contain few particles, many solutions have zero residual. To make the problem well posed, we use a regularization term to select smooth velocity fields:

ũ = argmin_u ( ‖Pu − ṽ‖₂² + η Σ_K ∫_K ‖∇u(x)‖₂² ),  (3.16)

where η = 0.1 m⁻¹ in all our simulations. The regularization also adds robustness against small amounts of noise in the particle data.

This global optimization problem separates into a small independent least-squares problem in each cell, since we use discontinuous approximation spaces which are independent in each cell, and the smoothness term is designed for separability by ignoring discontinuities across cell faces.

3.3.8 Particles and Advection

The fluid state consists of the FLIP particles (positions r and velocities v), as well as the surface triangle mesh (positions, velocities, and connectivity). All the surface vertices are also FLIP particles, which is vital for capturing thin sheets and threads where the fluid volume is so small that there may be no particles in the interior. Every time step, we add and remove particles as necessary so that each cell has approximately twice as many particles as it has basis functions for each velocity component, scaled down by volume fraction in cut cells, ensuring that particles and grid represent a similar level of detail.

When the liquid surface is advected, vertices that were touching the solid boundary before being advected are projected back onto the solid surface at the end of the time step, consistent with the free-slip solid boundary conditions.
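Returning to the PIC/FLIP update of §3.3.6, Equations 3.13–3.15 together with the half-life parameterization θ = 2^(−Δt/λ) amount to only a few lines. This is an illustrative sketch with invented names; `Pu` and `Pu_tilde` stand for the projected and pre-projection grid velocities evaluated at the particles.

```python
import numpy as np

def flip_pic_update(v_old, Pu, Pu_tilde, dt, half_life=0.5):
    """Blend of Equations 3.13-3.15 with theta = 2**(-dt/half_life),
    giving a time-step-independent decay of particle-velocity noise
    towards the noise-free grid velocities."""
    theta = 2.0 ** (-dt / half_life)
    v_pic = Pu                       # Equation 3.13
    v_flip = v_old + Pu - Pu_tilde   # Equation 3.14
    return theta * v_flip + (1.0 - theta) * v_pic  # Equation 3.15
```

With Δt = 1/30 s and λ = 0.5 s this yields θ = 2^(−1/15) ≈ 0.955, consistent with the θ ≈ 0.95 quoted in the text.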
To reduce collisions between the solid and liquid meshes, the liquid mesh is kept away from the solid by moving a small amount in the normal direction after this projection.

As in other FLIP solvers, the particles are advected using an ordinary differential equation (ODE) integrator through the static velocity field u computed on the grid. The simplest classical integrator that has stable behavior around vortices is RK3. Unfortunately, this requires three evaluations of the velocity field per step, which are expensive high-order interpolations from the grid. Furthermore, these evaluation points may be outside the fluid volume, requiring some sort of extrapolation.

We present a new technique that uses only a single evaluation of the velocity field. With a single lookup into the grid, we build the linear Taylor approximation to u at a particle's location r, using u(r) and ∇u(r). The velocity field in each cell is at least a quadratic polynomial, so ∇u is always available. (In conventional solvers, a local finite difference could be used to estimate the gradient instead, typically using values that are already in cache.) This affine velocity field is enough to describe vortices and shear flow, so we use RK3 to advect through this approximation, and achieve the same linear stability properties as RK3 applied directly to u.

Table 3.1: Equations for affine advection schemes that calculate particle displacement d given scaled velocity b = Δt u(r) and velocity gradient S = Δt ∇u(r) at the particle's initial location r.

Method | Equation
Forward Euler | d = b
RK2 | d = (I + (1/2)S) b
RK3 | d = (I + (1/2)S + (1/6)S²) b
RKN | d = (I + (1/2)S + ··· + (1/N!)S^(N−1)) b
Exact (for affine u) | d = [I 0] exp([S b; 0 0]) [0 0 0 1]ᵀ
Trapezoidal/Midpoint | d = (I − (1/2)S)⁻¹ b

To demonstrate the advection scheme, we uniformly distribute particles in a 2D box and advect them through the divergence-free velocity field with stream function sin(x) sin(y).
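The RK3 row of Table 3.1 costs only a few matrix-vector operations per particle once b and S are in hand. A minimal sketch (the function name is ours):

```python
import numpy as np

def affine_rk3_displacement(u_r, grad_u_r, dt):
    """Particle displacement through the local affine (Taylor) approximation
    of the velocity field, per the RK3 row of Table 3.1:
    d = (I + S/2 + S^2/6) b  with  b = dt*u(r)  and  S = dt*grad_u(r)."""
    b = dt * np.asarray(u_r, dtype=float)
    S = dt * np.asarray(grad_u_r, dtype=float)
    I = np.eye(len(b))
    return (I + S / 2.0 + S @ S / 6.0) @ b
```

In a locally constant field (∇u = 0) this reduces to forward Euler, d = Δt u(r), as it should.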
This velocity field is a single vortex with pure rotation at its center, and pure shear in the corners of the [0, π]² domain. Figure 3.5 shows the result of 100 time steps (∆t = 1.5) with RK2, RK3, our approximate RK3, and the correct solution. Each integrator uses a different number of substeps, such that all approaches require the same number of evaluations of u. We experimented with other advection schemes using integrators of various accuracy applied to velocity approximations of various order, and found this combination to be an appropriate trade-off between speed and accuracy. Table 3.1 lists algebraically simplified expressions for several integrators applied to the local affine approximation.

Tracing through the velocity field is usually also a significant expense for existing FLIP and semi-Lagrangian simulators, since each velocity evaluation requires an unstructured memory look-up and mixing of integer and floating-point pipelines: we believe our new technique should be of interest in accelerating other solvers too.

Figure 3.5: Comparison of advection methods. Uniformly-distributed tracer particles are advected in a stationary vortex. Each integrator takes substeps such that all approaches evaluate the velocity field 6 times per time step. After 100 time steps, RK2 and RK3 both show significant clumping and divergence of particles.

Figure 3.6: A still from the simulation in Figure 3.1, showing the complex splashing geometry, and high-quality triangles from the explicit surface tracker.

3.3.9 Surface Tracking

Surface tracking is handled by the El Topo library [33], which produces quality triangulations as shown in Figure 3.6. We made some small modifications to adapt El Topo to our use case.

Our discretization cannot handle arbitrarily thin sheets of liquid, so we prevented the surface tracker from creating them. We first tried deleting sheets of liquid that became too thin (e.g. < 0.02h), but this results in flicker at the tips of thin sheets as they are deleted.
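The structure of this test field follows directly from the stream function. A short sketch (helper names are ours): the velocity gradient is a pure rotation at the vortex center (π/2, π/2) and a trace-free pure strain (shear) at the domain corners, and its trace, the divergence, vanishes identically.

```python
import numpy as np

def velocity(x, y):
    """Velocity from the stream function psi = sin(x) sin(y):
    u = d(psi)/dy, v = -d(psi)/dx, hence exactly divergence-free."""
    return np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)

def velocity_gradient(x, y):
    """Analytic gradient [[du/dx, du/dy], [dv/dx, dv/dy]].
    Its trace (the divergence) is identically zero."""
    return np.array([[np.cos(x) * np.cos(y), -np.sin(x) * np.sin(y)],
                     [np.sin(x) * np.sin(y), -np.cos(x) * np.cos(y)]])
```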
This is visible in our 2D videos, which use this strategy. We found better results in 3D by adding a repulsion force that tries to thicken sheets that are thinner than 0.05h. Thickening a thin sheet can cause an increase in fluid volume and momentum, but our results indicate it is an acceptable trade-off. Either choice helps ensure good conditioning and behavior of the DG problem.

Some high-frequency surface features are handled poorly by our discretization and the surface tracker, so we use a small amount of Laplacian smoothing on the surface every step. As with PIC/FLIP smoothing, we can interpret this as a decay of each vertex towards the average position of its neighbours, with half-life between 0.1 s and 0.5 s for different examples. Smoothing is not applied at sharp convex features, determined by dihedral angles between triangles.

3.3.10 Viscosity

While our focus is inviscid flow, we sometimes found adding a small amount of viscosity improved the stability and visual appeal of the results. Viscosity is applied as a separate time-splitting step, after projection, to the grid's velocity field. We use an approximate implicit formulation of viscosity in which we apply the Galerkin discretization of the implicit backward-Euler viscous step in each cell, but solve each cell independently. Viscous fluxes between cells are ignored, so convergence to the viscous Navier-Stokes solution is not expected, but it serves as an effective regularization. This solve can increase the size of the discontinuities at cell boundaries, but not visibly so for the levels of viscosity in our examples. Figures 3.1 and 3.8 (bottom) used kinematic viscosities of 2·10⁻² and 2·10⁻⁴ respectively, in non-dimensional units relative to h (length) and 1 second (time). Other examples use no viscosity.

Figure 3.7: Sub-grid features persist as non-physical bumps when using linear pressure, but are properly captured by the physics when using quartic pressure, leaving only ripples in the surface. This view is approximately five grid cells wide.

3.4 Results

The discretization uses an extended model for pressure and velocity in a single cell. To experiment with the capabilities of this model, we ran a simulation in a single grid cell. This required no modifications to the method, just shorter edge lengths and more particles per cell than we typically use. Figure 3.3 shows a 1×1 simulation using sextic polynomials for pressure. Within the single cell, the method captures thin sheets, sloshing, and interactions with non-trivial solid boundaries. This example is, in essence, a very coarse spectral method, and emphasizes that our model is solving for full sub-grid physics, not applying a simple smoothing or more approximate fluid model to the surface.

The polynomial degree used in cut cells must be chosen in concert with mesh resolution. We examined the behavior as the polynomial degree changes by reproducing an experiment from [35]. We simulated a still pool, with a subgrid disturbance h/2 wide and h/10 high, with surface mesh edge lengths of ≈ h/4. Figure 3.7 shows that with low-order polynomials there are persistent artifacts, but for quartic pressure and higher, the bump is captured by the physics and ripples across the domain. Furthermore, simulation quality increases gradually as the polynomial degree is increased through intermediate degrees. This example uses no unphysical Laplacian smoothing in the normal direction.

Figure 3.8: Examples with thin sheets. Top: sheets move nearly ballistically after a splash. Bottom: thin sheets flow over and around a solid obstacle, causing a complex wake. Grid resolution is shown on the solid floor. The obstacle is less than 3h tall.

Two controlled scenarios demonstrate the handling of thin sheets (Figure 3.8). Duplicating the experiment by Thürey et al. [152], we drop a ball of liquid onto a flat-topped pillar, whence it expands into a thin sheet.
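The half-life interpretation of the Laplacian smoothing above can be written as a per-step blend: each vertex keeps a fraction θ = 2^(−∆t/τ) of its deviation from its neighbours' average, so the decay rate is independent of the time-step size. A minimal sketch (our own function and argument names; the thesis additionally skips sharp convex features, which is not shown):

```python
import numpy as np

def smooth_vertices(x, neighbors, dt, half_life):
    """One step of Laplacian smoothing as a half-life decay of each vertex
    toward the average position of its neighbours.
    x: (n, d) vertex positions; neighbors: list of neighbour-index lists."""
    theta = 2.0 ** (-dt / half_life)  # fraction of the deviation kept
    avg = np.array([x[nb].mean(axis=0) for nb in neighbors])
    return theta * x + (1.0 - theta) * avg
```

For example, with dt equal to the half-life, each vertex moves exactly halfway toward its neighbour average in one step.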
As the sheet moves nearly ballistically through the air, small perturbations in its initial velocity amplify into a wavy shape. Second, we run a dam-break with a smooth obstacle less than 3h tall. The liquid forms a thin sheet as it spreads over and around the obstacle, following its curves and generating a detailed wake. In both scenarios, the thin sheets are still represented exactly in the discretization, due to the use of detailed cut cells.

To gain a sense of how practical the new approach is, we compared our code to a commercial FLIP solver, running the complex scenario from Figures 3.1 and 3.6. Figure 3.9 shows the results. Running on a 4-core 2.3 GHz Intel Core i7 laptop with 4 GB of memory, the new DG solver took an average of 72 seconds per time step, with a 25³ grid and sheets as thin as 1/48th of a grid cell. Timings for different steps are given in Table 3.2. We ran the commercial FLIP solver (using a standard discretization and sparse, tiled voxels for efficiency) at two different resolutions, effectively 49³ and 121³. Both methods took up to three time steps per frame.

Figure 3.9: Comparison of our method on a 25³ grid (top), a commercial FLIP code on a 121³ grid (middle), and the same commercial code on a 49³ grid (bottom).

At 49³ the commercial FLIP solver used approximately the same number of pressure variables as the DG code and was 14× faster at 5.25 s per time step, but of course gave far less detailed results: the solver could not resolve sheets thinner than a grid cell. At 121³ the commercial FLIP solver took a similar compute time, at 43 s per time step, and had a qualitatively similar perceived level of detail. However, the character of the detail is quite different. The DG simulation produces smooth, structured, and very thin features, with sheets approximately 1/500th of the domain width in thickness, and even thinner in places. In contrast, the commercial FLIP code cannot reliably represent anything below a grid cell, 1/121th of the domain width.
The commercial FLIP simulation produces rough and splashy results in which the sheets break up into droplets, and cannot produce the smooth and extremely thin sheets of the DG results without drastically higher grid resolution.

In contrast to our research prototype, the commercial code is thoroughly optimized and multithreaded. Given the scope for continued performance and parallelism improvements in the new code, the ease of p-adaptivity, and the fully dynamic thin features that other solvers cannot capture nearly as efficiently, we believe this represents a very practical way forward for liquid animation.

3.5 Discussion

Interpreting the Method  One simple interpretation of this method looks at it as another spatially-adaptive alternative to octrees or unstructured meshes. Relative to those methods, this approach has several advantages. First, it keeps the structure of the regular grid, significantly simplifying many parts of the algorithm. This approach also naturally produces anisotropic elements around thin fluid ligaments and sheets, providing significant reductions in problem size relative to the isotropic h-adaptation that is nearly universal. Furthermore, increasing p is more efficient for problems with smooth solutions. However, it is difficult to achieve large resolution differences between two areas of the simulation with just p-adaptivity, precisely because of the imposed grid structure.

Table 3.2: Detailed timings of the algorithm's components during the simulation from Figure 3.1, run on a laptop with an Intel i7-3610QM and 8 GB of memory.

Median Component Runtimes (seconds/time step)
Projection:   build volume mesh and u-p basis      2
              quadrature + assembly               15
              linear solver                        6
Advection:    RK3, smoothing, collisions           2
Surface:      collision handling                  38
              topology changes and remeshing       7
              rebuilding data structures           5
Entire Step:                                      82

Median Variable Counts (number/time step)
Grid:   cut cells              1600
        regular cells          2800
        pressure variables    50800
Mesh:   surface vertices     205000
Our p-adaptive approach would also work on general h-adapted meshes, complementing the h-adaptive techniques, and indeed would simplify octree-like methods with T-junctions, since continuity between elements is not required. We are interested in seeing their combination into an hp-adaptive method.

Another over-simplified way to look at this method is as an approach for regularizing sub-grid features that arise when embedding a high-resolution surface into a low-resolution grid. Unlike the two-model approaches in §3.2, this approach resolves the velocity-pressure field to sub-grid features in a unified way and a single solve. Nevertheless, it cannot handle arbitrary amounts of sub-grid detail without extremely expensive, high-order bases. Combining these techniques may improve the result quality. The small amount of Laplacian smoothing we apply to the mesh can be interpreted in this way.

Both of the above interpretations try to cast this method in terms of the familiar, and consequently miss its novel behaviors and particular strengths. This approach is adaptive in a new way. In a sense, the velocity field has no limit to how small a feature can be, just as the distance between two extrema of a polynomial can be arbitrarily small. There is no free lunch, of course. The velocity field in a cell is limited to some finite-dimensional space, and there is a limit on the total number and type of 'features' per grid cell. Still, the velocity field in a cell has the full approximating power of degree-k polynomials, regardless of how small the liquid volume is within that cell. As a result, sub-grid thin sheets (and other features) can stretch, curl, and generally behave physically.

Thin Sheets  This algorithm handles thin sheets remarkably well. We used sheets down to 0.02h thick, giving extremely anisotropic elements. We contrast our behavior here with the behavior of typical grid solvers.
To represent the sheet using a level set on a standard MAC grid would require at least fifty times the grid resolution, i.e. greater than 2500 active degrees of freedom for a sheet that uses only 35 pressure variables in our approach. Methods using explicit surface tracking within a simple regular MAC grid could track the same geometry as our approach, but will still have difficulty producing an accurate velocity field in areas where sheets collide and produce complex topology within a single grid cell.

We chose a common physical model without surface tension, and without any effects from the 'air' outside the liquid domain. Physical experiments and theory identify that it is the velocity difference at the liquid-air interface that drives the instability of sheets, and surface tension that drives retraction at their rims [71]. Without either of these effects, sheets and jets spread to be very thin without breaking up. This is in stark contrast to typical fluid simulations, in which discretization artifacts break up or delete fluid sheets, despite the fact that they have no physical model for that behavior. It is also in stark contrast to our everyday experience with splashes at small scale, which atomize almost instantly. Capturing instability-driven break-up in a physical way remains an outstanding problem.

Discontinuities  This algorithm embraces discontinuities at every opportunity, granting more flexibility and efficiency to the algorithms. Advection, interpolation between the grid and the particles, and the volume integrals all involve purely local data. The relevant data can be stored contiguously in memory, allowing efficient computation on modern architectures. Similarly, large parts of the computation can be expressed as dense matrix operations on data from a single cell (or pair of cells), and computed with highly-tuned dense BLAS and LAPACK routines.

Limitations  Our method has some weaknesses when the velocity field is not very smooth.
For example, a small collapsing air bubble requires a velocity field that quickly switches direction from one side of the bubble to the other, and these sometimes appear to have a 'numerical pressure' slowing their collapse. Given that collapsing bubbles are a non-physical artifact of the free-surface model, slowing their collapse is not necessarily a bad thing. Another important case is when topological merges occur, introducing a nearly-discontinuous intermediate velocity and causing large discontinuities in the discrete solution. The only time we saw visible discontinuities was in the first few frames after a droplet impacts a pool. They are imperceptible at regular playback speed. Despite these limitations, we still find the p-adaptive approach works well. These issues could, potentially, be resolved by using a different approximation space that includes velocity fields like these.

3.6 Conclusion

The combination of explicit surface tracking, detailed cut cells, and p-adaptive DG is an effective method for liquid simulation. Together they provide a qualitatively different set of performance characteristics than more common discretizations. In particular, they scale much better for smooth fields, and accurately capture thin sheets and features well below the grid scale. These techniques, applied together or independently, have lots of room for improvement and application in this and other areas.

Chapter 4

High-Order Accurate Particle-in-Cell Methods

4.1 Introduction

The particle-in-cell (PIC) method is an approach to simulating continuum mechanics problems featuring flows with high distortion. PIC is a fundamentally Lagrangian scheme, in which all material properties are carried on Lagrangian particles that move with the flow. On each time step, a PDE must be solved to compute the forces (and other changes) applied to the particles. To compute these efficiently, all material properties are transferred from the particles to a convenient computation grid where the PDE is solved.
Updates are interpolated from the grid back to the particles.

As it is usually described, PIC is inherently low-order accurate in space and usually low-order accurate in time. We present a PIC method that is high-order accurate in both space and time. For concreteness, we concentrate on fourth-order accuracy, but in a way that is extensible to higher or lower orders. The proposed method maintains high-order accuracy even in the face of stiff or constrained dynamics (such as an incompressibility constraint), through an implicit-explicit time-integration strategy.

For a review of the PIC literature, see Section 2.1.

4.2 High-Order PIC

Our new approach is a direct successor to the high-order PIC method by Edwards and Bridson [67], with the intent to improve on its limitations. Their approach is unable to simulate incompressible Navier-Stokes in the natural velocity-pressure formulation

Du/Dt = ν∇·∇u − ∇p + f    (4.1)
∇·u = 0    (4.2)

(where D/Dt = ∂/∂t + u·∇), because it could only handle unconstrained dynamics. Furthermore, it was limited to explicit time steps, imposing severe time-step restrictions for stiff problems.

The scheme in this chapter is semi-implicit in time, allowing for large time steps of stiff effects, such as viscosity. The implicit approach also naturally handles (infinitely stiff) constraints, such as the divergence-free constraint (equation 4.2), allowing the simulation to work directly in the velocity-pressure form.

The remainder of this section derives and describes the discretization. The full algorithm is summarized in Figure 4.1.

4.2.1 Deriving the Time Discretization

We begin with the incompressible Navier-Stokes equations (4.1 & 4.2) as well as the equation for a particle's position r in the fluid:

dr/dt = u(r)    (4.3)

Discretizing with a finite number of particles, we arrive at finite-dimensional vectors r, u, p for the particle positions, velocities, and pressures. Each particle stores the value of u at its location, e.g. u_i = u(r_i).
This is unlike the original MPM and FLIP papers, or SPH, in which the particles are 'blob'-like and store extensive quantities such as mass and momentum. We will see later that pressure need not be stored.

Along with the particles, PIC requires a background grid (mesh) and discretization.

1. Input particle data:
   (a) r_{n-1} positions
   (b) u_{n-1} ... u_{n-k} velocity history
   (c) p_{n-1} ... p_{n-k} pressure history (projection method only)
2. Reseed particles as necessary
3. Compute safe time step
4. Compute weights for the multistep methods
5. Move particles
   (a) Compute position update ∆r on particles
   (b) Interpolate update to the grid and back to the particles
   (c) Move the particles r_n = r_{n-1} + ĪI∆r
6. Compute known BDF derivative terms on particles
7. Compute pressure extrapolation on particles (projection method only)
8. Interpolate to grid
9. On the grid: solve Stokes problem, or projection problem
10. Interpolate update from grid to particles

Figure 4.1: Procedure for taking a single time step of the implicit-explicit PIC scheme.

To distinguish between the grid and particle representations of the same field, we use capital and lower-case letters. Arbitrary data q on the particles is transferred to the grid Q by Q ≈ I(r)q. The linear operator I(r) depends non-linearly on the particle positions. Similarly, the grid's representation can be interpolated to the particles using another linear operator, q ≈ Ī(r)Q. In general, I and Ī need not satisfy any particular relationship. For accuracy, we assume that if q represents a smooth field, then I(r)q is a high-order accurate representation of that smooth field. Likewise, we require the same of Ī(r), the grid-to-particle interpolation. We slightly abuse notation here; rather than apply the scalar-field transfer operators componentwise for vector fields, we simply write the same symbol applied to vector fields.

We do not interpret the particles as defining any continuous field, except through an auxiliary grid.
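The thesis leaves I and Ī abstract (it later uses MLS and bi-cubic transfers). The sketch below builds both operators for a hypothetical 1D grid, using linear interpolation for Ī and its pseudo-inverse for I. These are stand-in choices of ours, not the thesis's, chosen only to show ĪI acting as (approximately) the identity on smooth data.

```python
import numpy as np

def grid_to_particles_matrix(r, xg):
    """Linear-interpolation operator, q = Ibar(r) Q, from a uniform 1D grid
    xg to particle positions r in [xg[0], xg[-1]]. (A minimal stand-in for
    the high-order grid-to-particle interpolation Ibar.)"""
    h = xg[1] - xg[0]
    Ibar = np.zeros((len(r), len(xg)))
    for a, ra in enumerate(r):
        i = min(int((ra - xg[0]) // h), len(xg) - 2)
        w = (ra - xg[i]) / h
        Ibar[a, i] = 1.0 - w
        Ibar[a, i + 1] = w
    return Ibar

def particles_to_grid_matrix(Ibar):
    """Least-squares transfer, Q = I(r) q, chosen here as the pseudo-inverse
    of Ibar, so Ibar I is close to the identity on smooth particle data."""
    return np.linalg.pinv(Ibar)
```

With many particles per cell, Ī is tall and full rank, so a linear particle field transfers to the grid and back without loss, mirroring the smooth-field accuracy assumption in the text.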
Spatial derivatives are defined only on the grid, and for simplicity of notation we write ∇, ∇·, ∇² even for the discrete operators. How they are discretized is not important at this stage, as long as they are sufficiently accurate.

Substituting all of these relations back into the Navier-Stokes equations, we arrive at a PIC-like spatial discretization:

dr/dt = u    (4.4)
Du/Dt = ν Ī(r)∇²I(r)u − Ī(r)∇I(r)p + f    (4.5)
Ī(r)∇·I(r)u = 0    (4.6)

To discretize in time, we apply several multistep methods, making use of a history of velocity data u_{n-k} ... u_n from the last k time steps.

4.2.2 Moving the Particles

Equations 4.5 and 4.6 both have non-linear dependencies on r, through the interpolation operators. In order to avoid solving a non-linear implicit problem, we tackle the advection equation (4.4) explicitly. Particle advection is not a stiff problem, so an explicit scheme causes no difficulty here.

To update the particle positions, we use the Adams-Bashforth method, which gives a formula for the change in position ∆r from time t_{n-1} to time t_n, using past velocities:

∆r = Σ_{i=1}^{k} α_i u_{n-i}    (4.7)

The weights α_i are such that if r(t) is a degree-k polynomial in time, then ∆r is exact.

This update could be applied directly to the particles (i.e. r_n = r_{n-1} + ∆r). However, this would allow multi-streaming (intersecting particle trajectories) and makes the simulations less stable (see Section 4.4.2).
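The weights α_i in equation 4.7 come from integrating the Lagrange interpolant of the past velocities over the step. A minimal sketch for uniform steps (our own function name; normalized to ∆t = 1):

```python
import numpy as np

def adams_bashforth_weights(k):
    """Weights alpha_1..alpha_k of eq. 4.7 for uniform steps (dt = 1):
    dr = sum_i alpha_i u(t_{n-i}) integrates the degree-(k-1) interpolant
    of the past velocities over [t_{n-1}, t_n] = [0, 1]."""
    nodes = np.array([1.0 - i for i in range(1, k + 1)])  # t_{n-1}, ..., t_{n-k}
    w = np.empty(k)
    for i in range(k):
        others = np.delete(nodes, i)
        denom = np.prod(nodes[i] - others)          # empty product = 1 when k = 1
        basis = np.poly1d(np.poly(others) / denom)  # Lagrange basis L_i(t)
        antider = np.polyint(basis)
        w[i] = antider(1.0) - antider(0.0)
    return w
```

For k = 1 this recovers forward Euler, and for k = 4 the classical AB4 weights 55/24, −59/24, 37/24, −9/24.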
To avoid this problem, ∆r is interpolated to the grid and then back before being applied to the particles:

r_n = r_{n-1} + Ī(r_{n-1})I(r_{n-1})∆r    (4.8)

This is consistent with the general approach in PIC, wherein all updates applied to the particles come from the grid, and the data on the particles alone is considered to be 'noisy' and unreliable until interpolated to the grid.

4.2.3 Stokes Solve

With r handled explicitly, the remainder of the PDE can take I and Ī as being time-dependent rather than r-dependent:

du/dt − ν Ī(t)∇²I(t)u + Ī(t)∇I(t)p = f    (4.9)
Ī(t)∇·I(t)u = 0    (4.10)

Multiplying adjacent operators together, we arrive at the remaining equations to be discretized in time:

(d/dt − A(t))u + B(t)p = f    (4.11)
C(t)u = 0    (4.12)

This is a linear differential algebraic equation (DAE), with time-varying linear operators. The particular form, with differential (u) and algebraic (p) variables separated in this way, puts this in Hessenberg size-two form. In Navier-Stokes, CB is approximately the Laplace operator and should be invertible, which makes this an index-2 DAE. These DAEs are solvable with the backward differentiation formula (BDF) method [10], which is stiffly stable up to 6th-order accuracy. Using BDF, the time derivative at time t_n is approximated by

du/dt(t_n) ≈ Σ_{i=0}^{k} β_i u_{n-i}    (4.13)

with weights β that make the approximation exact for low-order polynomials. Substituted into the first equation,

β₀u_n − ν Ī∇²Iu_n + Ī∇Ip_n = f̃_n    (4.14)

where all matrices have implicit n subscripts, and f̃ contains all the forcing terms and BDF history terms.

Solving this equation (along with the constraint equation) would be a complete BDF time step. However, this equation involves the particle-grid transfer operators, which is undesirable, as they are generally quite complex and make it impossible to directly apply efficient linear solvers developed for discretizations on the grid. So, we make some approximations to eliminate the transfer operators and solve a system entirely on the grid.
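The BDF weights β_i in equation 4.13 are obtained analogously to the Adams-Bashforth weights, by differentiating the Lagrange interpolant at t_n instead of integrating it. A minimal sketch for uniform steps (our own function name; normalized to ∆t = 1):

```python
import numpy as np

def bdf_weights(k):
    """Weights beta_0..beta_k of eq. 4.13 for uniform steps (dt = 1):
    du/dt(t_n) ~ sum_i beta_i u_{n-i}, exact for polynomials of degree k."""
    nodes = np.array([-float(i) for i in range(k + 1)])  # t_n, ..., t_{n-k}
    w = np.empty(k + 1)
    for i in range(k + 1):
        others = np.delete(nodes, i)
        denom = np.prod(nodes[i] - others)
        basis = np.poly1d(np.poly(others) / denom)  # Lagrange basis L_i(t)
        w[i] = np.polyder(basis)(0.0)               # d(L_i)/dt at t_n = 0
    return w
```

For k = 1 this gives backward Euler (1, −1), and for k = 4 the standard BDF4 weights 25/12, −4, 3, −4/3, 1/4.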
Recalling that ĪI is approximately the identity operator for smooth functions, we interpolate all terms to the grid and back to the particles uniformly:

β₀ĪIu_n − ν Ī∇²Iu_n + Ī∇Ip_n = ĪIf̃_n    (4.15)

Typically, Ī will be tall and full rank, implying that

β₀Iu_n − ν∇²Iu_n + ∇Ip_n = If̃_n    (4.16)

which, after re-labeling the Iq-like terms,

β₀U_n − ν∇²U_n + ∇P_n = F̃_n    (4.17)

is a set of equations entirely on the grid. Applying the same approach to the constraint equation, and combining them together, one arrives at an indefinite linear system that must be solved on each time step:

[ β₀ − ν∇²   ∇ ] [ U_n ]   [ F̃_n ]
[    ∇·      0 ] [ P_n ] = [  0  ]    (4.18)

This is a well-studied problem that comes from solving time-dependent Stokes flow. One can discretize it in whatever way is convenient and accurate (e.g., FEM).

4.2.4 Interpolating from Grid to Particles

Following the derivation thus far, to find the new velocities on the particles from the values on the grid, one must solve the underdetermined linear system

U_n = I_n u_n.    (4.19)

A good approximate solution is found by interpolating the new velocity from the grid to the particles, u_n^PIC = Ī_n U_n. However, even if Ī is a pseudo-inverse of I, the data in the nullspace must be included in the dynamics in some way, or it will accumulate noise. Consistent with other methods, we smooth this noise by regularization. Both 'FLIP-style' u^FLIP and 'PIC-style' u^PIC solutions are computed and combined:

u^FLIP = u_{n-1} + Ī_n(U_n − U_{n-1})    (4.20)
u^PIC = Ī_n U_n    (4.21)
u_n = θ u^FLIP + (1 − θ) u^PIC    (4.22)
θ = 2^{−(t_n − t_{n-1})/τ}    (4.23)

The equation for θ produces a stable and time-step-independent decay of unresolved features and noise, with half-life τ [67]. We set τ to a short but natural timescale for the problem (e.g. 0.5 seconds for a simulation lasting several seconds).

Note that the pressure on the particles, p_n, was not used, so it need not ever be stored.

4.2.5 Reseeding

Over time, initially uniform particle distributions become less uniform and require some reseeding. For efficiency, if there are many particles in a cell (e.g.
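Equations 4.20-4.23 can be written as a single particle-velocity update. The sketch below (our own function and argument names) makes the half-life property concrete: with a resolved, static grid field, the unresolved particle deviation is multiplied by θ each step, so after a total elapsed time of τ it has halved regardless of the step size.

```python
import numpy as np

def flip_pic_update(u_prev, Iu_new, Iu_prev, dt, tau):
    """Blend the FLIP and PIC particle-velocity updates (eqs. 4.20-4.23).
    Iu_new / Iu_prev: new and old grid velocities, already interpolated
    to the particles; tau: half-life of unresolved features."""
    theta = 2.0 ** (-dt / tau)              # per-step decay factor
    u_flip = u_prev + (Iu_new - Iu_prev)    # FLIP: increment the particle value
    u_pic = Iu_new                          # PIC: replace with the grid value
    return theta * u_flip + (1.0 - theta) * u_pic
```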
> 15), we remove some particles from the simulation. For accuracy, when the number of particles in a grid cell becomes small (e.g. < 5), we add additional particles.

Newly created particles require a synthetic history of velocities (and possibly pressures, §4.3) for use in the multistep method. We create this history by projecting the history of all the existing particles to the grid and interpolating those history values at the new particles' locations. This results in kth-order accurate approximate history values on new particles.

If an O(1) fraction of the particles is added and/or removed each time step, then this error could add up to a (k−1)th-order error term over the course of a simulation. In that case, (k+1)th-order accurate history would need to be constructed during reseeding. However, we do not observe this loss of accuracy in our experiments. Reseeding typically improves the result.

This reseeding operation is quite expensive. The entire particle state needs to be projected to the grid, and then interpolated onto new particles. Alternative approaches would be an avenue of future research.

4.2.6 Startup

Multistep methods all have a problem with startup, because the high-order methods need a history over multiple time steps in order to work. We tackle this by using adaptive time-step sizes and adaptive order. For a simulation with desired time steps of size ∆t, we begin with a step of size ∆t² using BDF1. Each subsequent time step uses a higher-order BDF method until there is enough history for BDF4, then the step sizes are increased (by a factor of no more than 1.2×) until the desired step size is reached.

4.3 A Projection Method

Solving SPD systems, in terms of just u or p, is often preferable to solving the indefinite Stokes system (4.18).
This section derives a projection method with one SPD problem for implicit viscosity and another SPD problem for a pressure correction.

Using a history of pressures from past time steps, stored on the particles, polynomial extrapolation gives a predicted pressure

p̃_n = Σ_{i=1}^{k} ε_i p_{n-i}    (4.24)

that we project to the grid:

P̃_n = I_n p̃_n.    (4.25)

By substituting this for the pressure in the viscous part of the Stokes equations (4.18), one arrives at a simpler SPD problem

(β₀ − ν∇²)Ũ_n = −∇P̃_n + F̃_n    (4.26)

for an approximate velocity field Ũ_n. This solution can then be projected to satisfy the divergence-free constraint by finding φ_n such that

U_n + ∇φ_n = Ũ_n    (4.27)
∇·U_n = 0    (4.28)

which is an SPD Poisson problem, ∇²φ_n = ∇·Ũ_n.

It remains to find the corrected pressure P_n from P̃_n and φ_n. Substituting this form of U_n into equation 4.17 and subtracting equation 4.26, one arrives at an equation for the updated pressure:

∇P_n = ∇P̃_n + (β₀ − ν∇²)∇φ_n    (4.29)

In the continuous domain, ∇²∇ = ∇∇², so we commute these operators, and formally integrate once, to compute the corrected pressure:

P_n = P̃_n + (β₀ − ν∇²)φ_n    (4.30)

Once discretized, ∇² and ∇ may not exactly commute, in which case this pressure does not exactly satisfy the Stokes equations. However, it does satisfy the Stokes equations with a modified right-hand side. Using discretized linear operators G ≈ ∇, A_u ≈ β₀ − ν∇² (for vector fields), and A_p ≈ β₀ − ν∇² (for scalar fields), the modified right-hand-side term is

F̃_n + (A_u G − G A_p)φ_n.    (4.31)

With kth-order accurate discretizations for A_∗ and G, the extra term (A_u G − G A_p) is a kth-order accurate approximation to 0. If it were identically zero, the projection method would exactly solve the Stokes system, independent of the extrapolated pressure. However, in general, an accurate pressure prediction is necessary.

Note that P̃_n is a kth-order accurate extrapolation, but local (k+1)th-order accuracy is required to achieve global kth-order convergence.
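The extrapolation weights ε_i in equation 4.24 are the Lagrange basis polynomials of the past time levels evaluated at t_n. A minimal sketch for uniform steps (our own function name; normalized to ∆t = 1):

```python
import numpy as np

def extrapolation_weights(k):
    """Weights eps_1..eps_k of eq. 4.24 for uniform steps (dt = 1):
    p_n ~ sum_i eps_i p_{n-i}, exact for polynomials of degree k-1."""
    nodes = np.array([-float(i) for i in range(1, k + 1)])  # t_{n-1}, ..., t_{n-k}
    w = np.empty(k)
    for i in range(k):
        others = np.delete(nodes, i)
        denom = np.prod(nodes[i] - others)
        basis = np.poly1d(np.poly(others) / denom)  # Lagrange basis L_i(t)
        w[i] = basis(0.0)                           # evaluate at t_n = 0
    return w
```

For k = 2 this gives the familiar linear extrapolation p̃_n = 2p_{n-1} − p_{n-2}.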
We expect that the updated P_n after projection is (k+1)th-order accurate, as has been shown in similar contexts in other works [38, 84, 174].

If there are boundary conditions in terms of pressure (e.g. the free-surface p = 0 condition), then this projection method is cumbersome. These boundary conditions need to be enforced via φ, but the conditions become more complicated because it is (β₀ − ν∇²)φ that gets added to the pressure, not φ directly. We do not address this issue, and simply prefer the coupled algorithm for this case.

The projection scheme is not exactly implementing BDF, so the stability requirements are different and unclear. The viscous and projection steps are stable operations, so we do not expect any significant additional instabilities from them.

This approach solves two simple linear systems instead of one more complex Stokes solve, but it is less accurate and requires a history to be stored for pressure. Whether or not this is the correct engineering trade-off will depend on the application. We prefer not to use this approach. While Stokes solvers are more complicated, they have seen a lot of research and effective solution strategies exist [135].

4.4 Results

In our 2D implementation, the PDEs are solved with a spectral discretization on a regular grid, with periodic boundary conditions. Interpolating from the particles to the grid is done with moving least squares (MLS), while interpolating back is done with bi-cubic interpolation. These are similar to the choices in the explicit method [67]. In the spectral discretization, the projection and Stokes algorithms produce identical results, because the discrete gradient and Laplacian commute. Therefore we only show results using the Stokes approach.

Figure 4.2: Error convergence of PIC on the Taylor-Green vortex. Plots of the L² and L∞ velocity error, for the inviscid (left) and viscous (right) cases. Simulations use an N×N grid and 2N time steps (after startup). Results are shown with and without particle reseeding.

4.4.1 Taylor-Green Flow

The 2D Taylor-Green vortex is a closed-form solution to incompressible Navier-Stokes,

u(x,y) = F(t) sin(x) cos(y)    (4.32)
v(x,y) = −F(t) cos(x) sin(y)    (4.33)
F(t) = exp(−2νt)    (4.34)

in the domain Ω = [0, 2π]². This looks simple when written in Eulerian coordinates, but is actually non-trivial behaviour on the particles as they move through the fluid. We simulate this problem to measure the convergence of our method.

Error in velocity, on the grid, is measured at time t = 3.0. Figure 4.2 shows the convergence of the method, both with and without reseeding, and with (ν = 0.5) and without (ν = 0) viscosity. In every case, we observe approximately 4th-order convergence. In the inviscid case, reseeding causes a minor decrease in L∞ convergence at high resolutions.

4.4.2 A Complex Rayleigh-Taylor Instability

To generate some more interesting dynamics, we add a Boussinesq buoyancy term to the dynamics. The PDE is the same as before, with forcing term f = −ρ∇Π, using fluid 'density' ρ and gravitational potential energy Π. We evolve ρ passively through the fluid, Dρ/Dt = 0, except for PIC/FLIP regularization as applied to the velocity. This regularization is necessary, since Rayleigh-Taylor instabilities tend to generate strong discontinuities in the density field, and our interpolation operators overshoot significantly around discontinuities.

Our setup uses periodic boundary conditions, with gravitational potential Π = cos(x) + cos(y) in the domain [0, 2π]².
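The Taylor-Green solution (eqs. 4.32-4.34) makes a convenient reference for the error measurements in Figure 4.2. A minimal sketch (our own helper names, not the thesis code) evaluating the exact field and the L² / L∞ error of a simulated grid velocity against it:

```python
import numpy as np

def taylor_green(x, y, t, nu):
    """Closed-form Taylor-Green vortex velocity (eqs. 4.32-4.34)."""
    F = np.exp(-2.0 * nu * t)
    return F * np.sin(x) * np.cos(y), -F * np.cos(x) * np.sin(y)

def velocity_error(U, V, X, Y, t, nu):
    """L2 and Linf error of a simulated grid velocity (U, V) at nodes
    (X, Y) against the exact Taylor-Green solution."""
    ue, ve = taylor_green(X, Y, t, nu)
    e = np.sqrt((U - ue) ** 2 + (V - ve) ** 2)
    return np.sqrt(np.mean(e ** 2)), e.max()
```

The field is divergence-free (the central-difference approximations of u_x and v_y cancel exactly on a periodic grid), and the velocity decays by the uniform factor F(t) = exp(−2νt), since ∇²u = −2u for this field.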
The initial fluid density is

ρ = −2 tanh(3cos(x) + 3cos(y) + 0.6cos(4x+1)cos(4x+0.5) + 1.5),

which is designed to generate strong Rayleigh-Taylor instabilities with complex behaviours and no symmetries. The simulation uses a 255×255 grid, ∆t = 1/40, a regularization half-life of τ = 0.2 (for velocity and density), and was simulated from t = 0 to t = 15. Computation on a 4-core Intel i7 with 8GB of RAM took approximately 0.7 seconds per time step, of which 75% is spent interpolating from the particles to the grid. Figure 4.3 shows several frames of the resulting simulation.

Multistreaming. In this experiment, we also tried allowing multistreaming by applying the ∆r update to particle positions directly, without interpolating to the grid and back. In this case, the multistreaming simulation develops noise at some points in the flow that subsequently causes extreme oscillations and velocities that pollute the entire simulation. We do not recommend allowing multistreaming.

Figure 4.3: Fluid density in a complex scenario, at times t = 0, 1.5, 4.5, 15. A spatially varying gravitational potential causes Rayleigh-Taylor instabilities and pulls the buoyant (white) fluid to the center.

4.5 Discussion

This project is a second exploration into globally high-order accurate PIC, following up on Edwards and Bridson [67]. As intended, this approach addresses several shortcomings of the previous work: it is able to handle the constrained dynamics of the velocity-pressure formulation of Navier-Stokes, and its semi-implicit nature allows it to take large time steps despite the presence of a stiff viscosity term.

There are numerous outstanding questions left by this work. First, we have neglected boundary conditions. We expect that adding boundary conditions to the grid solve will be sufficient, with little or no modification to the parts of the algorithm working with the particles. There are also still outstanding questions about the effect of reseeding.
We would like a better theoretical or experimental understanding of whether or not reseeding needs to be done at a higher order of accuracy than the remainder of the algorithm, or ways to work around that requirement if necessary.

4.6 Conclusion

High-order accuracy is possible for PIC simulations of Navier-Stokes. The key is to treat the problem as a DAE and apply techniques for such problems. By using different multistep methods, the linear and nonlinear portions of the problem are separated.

Some of these ideas (using spatially high-order operators) have already been fruitfully applied in a graphics setting, as presented in Chapter 3.

Chapter 5

Free-Surface Geometry

This chapter proves the correctness of the continuous collision detection (CCD) algorithm by Brochu et al. [36]. The full paper, which presents the CCD algorithm and demonstrates its utility on several applications, is in Appendix A. The reader may prefer to read this appendix before the remainder of this chapter. It includes a gentler introduction to the concepts, reasoning, and value of the approach.

In a simulation of water for computer animation, the free surface is the primary visual element. It is often the only place where the fluid motion is directly visible: an accurate representation of the surface is important. Using a Lagrangian triangle mesh that tracks the surface, as in Chapter 3, is one natural way to do this. A triangle mesh has good temporal coherence, has no problem representing and tracking thin and sharp features, and is a memory-efficient representation of the surface.

The surface of a fluid is never self-intersecting. This is a desirable invariant to maintain in the numerical model as well. In order to maintain this invariant for a surface represented by a moving triangle mesh, one ultimately needs methods for detecting collisions between moving mesh elements (point-vs-triangle and edge-vs-edge collisions).
This is the continuous collision detection (CCD) problem.

The predicate form of CCD, which determines (true or false) whether a pair of elements collide, may have false-negatives (missed collisions) or false-positives (extraneous collisions) due to some form of approximation or simply floating-point rounding errors. Depending on the application, these errors may or may not be acceptable, but they are certainly never desirable. We present the first exact CCD algorithm for moving triangles.

The currently most popular CCD method [123] solves a cubic equation for the time at which the elements are coplanar, then interpolates the geometry to that time, and checks whether the elements are intersecting (a simpler predicate). This approach writes the solution in terms of a series of intermediate constructions that have irrational values which cannot be exactly expressed in floating point. This is essentially impossible to implement without introducing rounding errors. Typically, tolerances are added to every test to reduce false-negatives, but this introduces false-positives and requires parameter tuning. All methods prior to ours suffer from similar inexactness problems.

Our new approach is exact. It has no false-positives or false-negatives. Given floating-point inputs, it always computes the correct result, as if computed with exact arithmetic. Since the publication of this approach, several papers have published other methods with similar exactness guarantees [150, 161].

Our approach transforms the CCD question into a root-counting problem. For example, consider edge-edge collisions. Let the function f(θ1, θ2, t) → R3 take the barycentric coordinate θ1 of a point along the first edge and θ2 of a point along the second edge, and return the vector from the first point to the second point at time t.
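To make this concrete, here is a hedged NumPy sketch of such a function f for edge-edge CCD with vertices on linear trajectories; the names and example geometry are illustrative, not the published implementation:

```python
import numpy as np

def edge_edge_f(x0, x1, x2, x3, v0, v1, v2, v3, theta1, theta2, t):
    """Vector from a point on edge (x0,x1) to a point on edge (x2,x3),
    where each vertex moves linearly: xi(t) = xi + t*vi.
    A root f = 0 means the two points coincide, i.e. a collision."""
    p = (1 - theta1) * (x0 + t * v0) + theta1 * (x1 + t * v1)
    q = (1 - theta2) * (x2 + t * v2) + theta2 * (x3 + t * v3)
    return q - p

# A unit edge on the x-axis, and a crossing edge dropping through it:
# they meet at (0.5, 0, 0) when theta1 = theta2 = t = 0.5.
x0, x1 = np.array([0., 0., 0.]), np.array([1., 0., 0.])
x2, x3 = np.array([0.5, -1., 1.]), np.array([0.5, 1., 1.])
zero = np.zeros(3)
drop = np.array([0., 0., -2.])
f = edge_edge_f(x0, x1, x2, x3, zero, zero, drop, drop, 0.5, 0.5, 0.5)
```

Note that f is affine in each of θ1, θ2, and t separately, which is why the image of the domain boundary under f consists of bilinear patches.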
If this function returns 0, then those points are coincident, indicating that a collision is occurring.

There is a relationship between the number of roots of f in a domain Ω, and whether or not the image of the domain boundary f(∂Ω) 'contains' the origin point. More precisely, we characterize this with a lemma:

Root Parity Lemma. Suppose Ω ⊂ Rn is an n-polytope. Suppose F : Ω → Rn is C2, has p < ∞ roots in Ω, has no roots on Γ = ∂Ω, and has non-singular Jacobian at each root. Suppose R is a ray from 0 to infinity. Call any point x ∈ Γ such that F(x) ∈ R a crossing point; then the crossing number q is the number of crossing points. Suppose there are no degenerate crossing points (defined later). Then, p ≡ q (mod 2).

For many applications, element pairs with an even number of collisions can be safely ignored. As a consequence, we need not count the number of roots directly. Instead, we measure the crossing number.

For the particular f that appears in the CCD problem, the surface (Γ) is composed of bilinear patches, for which we describe exact ray-patch intersection tests. From these tools, we assemble an exact CCD algorithm.

This work was published in SIGGRAPH 2012 [36]. My contribution to this work is the proof of correctness. It proves the root parity lemma, which is fundamental to the technique. The root parity lemma does not apply in all cases, so it also proves that the algorithm produces the correct output even in degenerate cases when that lemma does not apply. The proofs are in the next section of this chapter. Given my limited contribution to the remainder of the paper, it is included only as Appendix A.

The exact method described here was essential to achieve reliable results with the simulations in Chapter 3.

5.1 The Root Parity Lemma and Proof of Correctness

5.1.1 Outline

We present the root parity lemma to relate the roots of a function to the action of that function on the domain boundary.
First, we prove the analog of the lemma for piecewise linear functions, then for more general C2 functions.

For a linear function on a simplicial domain, the root-parity lemma is easy to show: either the simplex has one root and its image contains the origin, it has zero roots and its image does not contain the origin, or the function/geometry is degenerate.

For a piecewise linear function, defined on a simplicial mesh, each simplex can be analyzed as above. We then show that the crossing points contributed by each simplex in the mesh add up appropriately.

Finally, we use the good behavior on piecewise linear meshes to prove the result for C2 functions. We approximate the function with its piecewise linear interpolant on a mesh. When this mesh is sufficiently fine, there is a one-to-one mapping between the interpolant's roots and the roots of the original function. We show this by putting a small simplex around each root, and sufficiently small simplices elsewhere. Furthermore, by putting each crossing point directly into the mesh, we guarantee a one-to-one correspondence between the crossing points of the original function and its interpolant.

The intersection function used in the CCD algorithm is a well-behaved C2 function, so we argue the correctness of our algorithm on top of the lemma.

5.1.2 Piecewise Linear Functions

Before proving the root parity lemma, we prove a related result for piecewise linear functions.

Root Parity for Piecewise Linear Functions. Suppose Ω ⊂ Rn has a simplicial decomposition by mesh M. Suppose F : Ω → Rn is continuous, linear within each simplex of M, has p < ∞ roots in Ω, has no roots on Γ = ∂Ω, and is one-to-one in a small ball around each root. Suppose R is a ray from 0 to infinity. Call any point x ∈ Γ such that F(x) ∈ R a crossing point; then the crossing number q is the number of crossing points. Suppose that the ray is not tangent to F(Γ) at the image of any crossing points, and that q < ∞. Then, p ≡ q (mod 2).

Suppose the hypotheses are true.
Then the image under F of any simplex in this mesh is also a simplex. Therefore the image of the entire mesh is also a simplicial mesh, M′, though it may be self-intersecting. The function F is uniquely defined by the positions of the vertices of M′, and roots of F are given by simplices in M′ that contain the origin.

We may make various simplifying assumptions about M′. Such an assumption will be without loss of generality if, for any mesh M′, a perturbation exists that modifies it to satisfy these conditions without changing the parity of the number of roots or the parity of the crossing number for the ray.

Assume the origin does not lie on any facets of M′. This is possible because F is invertible around each root. Therefore a small perturbation exists which will push the origin from being on a facet to being in only one of the adjacent simplices, thereby not changing the number of roots.

Assume also that M′ has no degenerate simplices. That is, each simplex has volume. Simulation of Simplicity [65] addresses almost exactly this problem. A similar argument applies here. The only additional concern is that we do not modify the number of roots. Let ε > 0 be the distance from the origin to the nearest facet. Since the perturbation can be arbitrarily small, no vertex needs to be moved more than ε, so the number of roots does not change.

Now, consider a ray R from 0 to infinity satisfying the conditions of the root parity lemma. Such a ray exists because F(Γ) is smooth almost everywhere and does not include 0. Define a hit to be an intersection of the ray with the boundary of a simplex. If the ray intersects a facet shared by two simplices, then this is two hits. Each simplex can contribute hits.

First, consider a simplex from M with no roots in it. The image of this simplex does not contain the origin. The ray crosses its boundary either zero or two times. This simplex contributes an even number of hits.

Second, consider a simplex from M containing a root.
The image of this simplex contains the origin in its interior. Consequently, the ray intersects its boundary once, contributing an odd number of hits.

Summing up all the hits, only simplices with roots contribute odd parity to the sum, so the parity of the number of hits equals the parity of the number of roots. Likewise, the parity of the number of hits equals the parity of the crossing number. This is because any intersection of the ray with an interior facet contributes two hits. Only the boundary facets, coincident with Γ, contribute odd parity. So the parity of the number of roots equals the parity of the number of crossings, as was to be shown.

5.1.3 The Root Parity Lemma

Root Parity Lemma. Suppose Ω ⊂ Rn is an n-polytope. Suppose F : Ω → Rn is C2, has p < ∞ roots in Ω, has no roots on Γ = ∂Ω, and has non-singular Jacobian at each root. Suppose R is a ray from 0 to infinity. Call any point x ∈ Γ such that F(x) ∈ R a crossing point; then the crossing number q is the number of crossing points. Suppose that F(Γ) is smooth at the image of any crossing points, that the ray is not tangent to F(Γ) at any of these points, and that q < ∞. Then, p ≡ q (mod 2).

Suppose the hypotheses of the root parity lemma are true. Then, let the entire domain Ω be tessellated with a simplicial mesh M, as is possible for a polytope. Let each simplex have circumradius less than δout, and let each root of F be at the centroid of a regular simplex. We take the existence of such a mesh to be trivial.

Let F̄ be the piecewise linear interpolant of F on the mesh M. We argue below that there exists such a mesh for which F̄ and F have the same number of roots and the same crossing number, and F̄ satisfies the hypotheses of the previous section. From this it will follow that the root parity lemma is true.

Roots

In this section, we show that if the mesh is sufficiently fine, then in any simplex, either F and F̄ both have roots, or neither F nor F̄ has roots.

Let x⋆ be an arbitrary root of F, and let Σ be the simplex containing it.
Since this simplex is regular, its inradius δin satisfies δout = κδin for a constant κ. In addition to the functions F and F̄, we introduce F̂ = J(x − x⋆), which is the linear approximation to F about the root (i.e. J = ∇F(x⋆)).

Clearly F̂(x⋆) = 0, so F̂ has a root in Σ. Now, consider a point q on the surface of Σ:

‖F̂(q)‖₂ = ‖J(q − x⋆)‖₂ ≥ ‖J⁻¹‖₂⁻¹ ‖q − x⋆‖₂ ≥ δin ‖J⁻¹‖₂⁻¹

Therefore the image of Σ under F̂, which is also a simplex, has its surface at least δin‖J⁻¹‖₂⁻¹ away from the origin.

By Taylor's Theorem, we have

‖F(q) − F̂(q)‖ ≤ c1‖q − x⋆‖² ≤ c1δout²

where c1 < ∞ is a constant related to ct. Similarly, by the approximation quality of linear interpolants (for proof, see e.g. [134]), we have

‖F(q) − F̄(q)‖ ≤ c2δout²

where c2 < ∞ is another constant related to ct. These can be combined to get the relationship

‖F̂(q) − F̄(q)‖ ≤ c3δout².

The origin is at least δin‖J⁻¹‖₂⁻¹ away from the surface of F̂(Σ), and F̂(Σ) is no more than c3δout² away from F̄(Σ). Since F̄(Σ) is also a simplex, it follows that if

δin‖J⁻¹‖₂⁻¹ > c3δout²  ⇔  1/(κc3‖J⁻¹‖₂) > δout,

then F̄(Σ) will also contain the origin, and F̄ will also have a root in Σ. Since κc3‖J⁻¹‖₂ is some constant bounded away from zero, we can choose such a δout. Consequently, both F and F̄ have a single root in Σ, and F̂ is locally invertible there. Since there are finitely many roots, this constraint on δout can be met at all roots by some constant δout > 0.

Now we show that in the other simplices of the mesh, F̄ has no roots. Continuing with the point q on the surface of the simplex around a root, we find that

‖F(q)‖ ≥ ‖F̂(q)‖ − c1‖q − x⋆‖²
≥ ‖J(q − x⋆)‖ − c1δout²
≥ ‖J⁻¹‖⁻¹‖q − x⋆‖ − c1δout²
≥ ‖J⁻¹‖⁻¹δin − c1δout²
≥ ‖J⁻¹‖⁻¹δin − c1κ²δin²
≥ ‖J⁻¹‖⁻¹κ⁻¹δout − c1δout²
≥ αδout − c1δout²

where α > 0 is the minimum value over all roots of ‖J⁻¹‖⁻¹κ⁻¹.

Let S be the union of the simplices which contain roots of F, as have already been addressed. Then Ω0 = Ω \ S is the remainder of the domain, including the boundaries. In Ω0, ‖F‖ > Fmin > 0.
For sufficiently small simplices, the minimum value of ‖F‖ will occur on the surface of one of the simplices surrounding a root. For simplicity, assume that this is the case. So, Fmin > αδout − c1δout² everywhere outside the simplices surrounding the roots. Then at all points x outside the root-simplices, ‖F̄(x) − F(x)‖ ≤ c2δout², and so

‖F̄(x)‖ ≥ ‖F(x)‖ − c2δout²
≥ Fmin − c2δout²
> αδout − c1δout² − c2δout²
= αδout − (c1 + c2)δout²

For sufficiently small δout, we have ‖F̄(x)‖ > 0 because α > 0. So, outside of the simplices surrounding the roots, F̄ has no roots.

It follows that F and F̄ have the same number of roots.

Crossing Points

To show that F̄(Γ) also has the same crossing number as F(Γ), we construct two new functions H : Γ → R and G : Γ → Rn−1 by decomposing F into its first component and the remainder: F = [H, G]. So H(x) = F1(x) is the first component of F(x), and G(x) is components 2 through n of F(x).

Without loss of generality, by a simple rotation, let the ray be the positive x1 axis. In that case, H measures the distance along the ray of the closest point to F(x), whereas G measures a vector-distance from F(x) to the closest point on the ray. Consequently, a point x is a crossing point of R and F(Γ) iff H(x) > 0 and G(x) = 0.

Now, consider the functions Ḡ and H̄ defined as above, but using F̄ instead of F. As before, x is a crossing point of R and F̄(Γ) iff H̄(x) > 0 and Ḡ(x) = 0. Notice also that H̄ and Ḡ are the linear interpolants of H and G on the boundary elements of the mesh, which form an (n−1)-dimensional simplex mesh in Γ. We can use similar arguments as above to establish an exact correspondence between the crossing points of R and F(Γ) and the crossing points of R and F̄(Γ) by matching the roots of G and Ḡ and the signs of H and H̄ at those roots.

G may not be locally invertible at all of its roots, so we take a different approach than above for F. Add all of the crossing points of R and F(Γ) as vertices to the mesh; by hypothesis there are a finite number of them.
This does not contradict the earlier construction of a single simplex around each root of F, because F has no roots on Γ. With these vertices in the mesh, H(x) > 0 and G(x) = 0 implies Ḡ(x) = 0. Let any simplex containing a root of G be small enough that it contains only one. By construction, it will be at a vertex, the root-vertex. All the other vertices of this simplex form an (n−2)-face, the far-face. By using anisotropic simplices around the roots, the far-face can always be distance δ > 0 from the root, but fit in an arbitrarily small ball of radius r. Consequently, ‖G‖ > α > 0 on the far-face. As described earlier, with a sufficiently fine mesh, Ḡ will have no roots on the far-face, and consequently no roots anywhere in the simplex, except at the root-vertex.

Let ε be half the minimum magnitude of H at any root of G. When −ε < H < ε, then ‖G‖ > δ > 0. So, in a sufficiently fine mesh, Ḡ ≠ 0. This follows from the same bound used above to analyze F̄. When H < −ε, in a sufficiently fine mesh we get H̄ < 0. Finally, when H > ε, a sufficiently fine mesh will have H̄ > 0 and, outside of the simplices constructed above, Ḡ ≠ 0. Combined with the results above, we conclude that F and F̄ have the same crossing points.

Combined with the earlier result, we have that F and F̄ have the same number of roots, and exactly the same crossing points. So we have shown that the root parity lemma is true.

5.1.4 Topological Degree

The root parity lemma is closely related to proofs and concepts from topological degree theory. In some sense, it is a specialization, and our proof is subsequently tailored for our application.

Topological degree defines a multidimensional generalization of the winding number. One definition of topological degree is the sum of the signs of the Jacobian determinants at all the roots of F. For non-singular roots, the parity of the topological degree is the same as the parity of the number of roots, which is what we want to measure.
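In one dimension the lemma is easy to check by hand: Ω is an interval, Γ is its two endpoints, the ray is the positive axis, and a crossing point is an endpoint whose image lands on that ray. A small illustrative sketch of our own (not part of the published algorithm):

```python
def crossing_number_1d(F, a, b):
    """Crossing number q: endpoints of [a, b] whose image lies on the ray (0, inf)."""
    return sum(1 for x in (a, b) if F(x) > 0)

# F(x) = (x - 1)(x - 2) has p = 2 simple roots in [0, 3];
# both endpoints map onto the positive axis, so q = 2 and p ≡ q (mod 2).
F = lambda x: (x - 1) * (x - 2)
assert crossing_number_1d(F, 0, 3) % 2 == 0

# G(x) = x^3 - x has p = 3 simple roots in [-2, 2];
# only the right endpoint maps onto the ray, so q = 1 and again p ≡ q (mod 2).
G = lambda x: x**3 - x
assert crossing_number_1d(G, -2, 2) % 2 == 1
```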
We refer the interested reader to the text by O'Regan et al. [118] for more details about topological degree.

While the definition above is in terms of the roots in the domain's interior, an equivalent expression depends only on the boundary. In fact, it can be calculated by summing ±1 (depending on some property of F and its derivatives) at each intersection between a ray and the image of the boundary. Such an approach is described, for example, by Aberth in his text on numerical methods [2]. Because we only care about parity, our algorithm can simply count the number of intersections.

5.1.5 Proof of the Collision Algorithm

The function that the algorithm uses is C2 with bounded curvature, defined in a polytope domain. When the input geometry to the collision detection algorithm produces a function F satisfying the hypotheses of the root parity lemma, then the correctness of the algorithm follows trivially.

However, F doesn't always satisfy the hypotheses. It is always C2 and it always has bounded second derivative. However, it may have roots on Γ, it may have infinitely many roots, and it may have a singular Jacobian at any root. We argue here that the algorithm still does the right thing for collision detection in these cases.

If F(x) = 0 on Γ, then F(Γ) contains the origin. So, the ray's vertex lies on the surface. The algorithm immediately identifies this as a collision without bothering with ray crossings.

If F has an infinite number of roots, then because of the multi-affine form of F, it must have a root on Γ, and this case is detected as above.

Otherwise, if ∇F is not invertible at a root, then there exists a small perturbation to F, strictly in the interior of the domain, such that it is not singular at any roots and is unchanged on Γ. This perturbation does not affect the initial and final configuration of the mesh or the linear trajectories of the extremes of the mesh elements: it only causes an arbitrarily small adjustment to the interpolated trajectories.
Thus, it will not change whether or not a significant collision has occurred, and the algorithm returns the correct result.

The root parity lemma also has conditions on the ray that the algorithm must meet. The only non-smooth regions of Γ are the edges. The algorithm traces a new ray when an intersection with an edge is detected. If the crossing number is infinite, then the ray must hit the edges, which is detected as above. The requirement that the ray not be tangent is handled by the ray-patch parity algorithm, which returns the correct (even) parity in that case.

We note that while this proof contains several cases involving perturbing F or the ray, the algorithm does not need to do this. This is only a conceptual perturbation for use in the proof.

5.2 Discussion of the Odd-Parity Counting Argument

The claim made in the paper is that if a pair of mesh elements (a triangle and point, or two edges) whose vertices are moving on constant-speed, linear trajectories collide an even number of times over the course of a time step, we can safely ignore the collisions (i.e. no collision resolution is required).

Although we do not have a rigorous proof for the correctness of this approach, we argue here that, at the very least, a few desirable properties can be maintained. This model will clearly result in different behaviour than detecting and resolving every collision; however, most collision processing systems already make simplifying assumptions, such as not resolving collisions in order of simulation time, resulting in an approximate yet plausible solution.

5.2.1 No Self-Intersections are Introduced

Suppose we are given a mesh that is intersection-free at t = 0, and has at least one edge-triangle intersection at t = 1. If collisions occur at the time boundaries (t = 0 or t = 1) or element boundaries (e.g., at the endpoint of an edge, or the edge of a triangle), our algorithm always identifies this regardless of parity, and reports a collision.
Otherwise, every collision changes the edge-triangle pair from an intersecting state to a non-intersecting state, or vice versa. To switch from non-intersecting at t = 0 to intersecting at t = 1, there must have been an odd number of total collisions, which our algorithm detects via the root-parity argument, and reports a collision.

5.2.2 Disjoint Volumes Remain Disjoint

Suppose we have two closed meshes which define disjoint volumes at t = 0 (i.e. one mesh does not intersect or lie inside the other). Then at t = 1, if one mesh lies inside the other, our CCD algorithm would report a collision. A vertex outside a closed surface must collide with the surface an odd number of times to end up inside the surface. If there is an odd number of collisions between the vertex and the surface as a whole, then there must be at least one triangle with an odd number of collisions against the vertex.

Chapter 6

Discretely Discontinuous Galerkin Coarse Grid for Domain Decomposition

6.1 Introduction

Many discretizations of elliptic partial differential equations (PDEs) lead to sparse symmetric positive definite (SPD) linear systems of the form Au = f. For large 3D problems, iterative solvers such as preconditioned conjugate gradient (PCG) are usually necessary. For a mesh with elements of diameter h, the condition number often grows quickly as h → 0: the quality of the preconditioner becomes the crucial factor for efficiency and robustness.

With an optimal preconditioner, the linear system can be solved to the desired precision in a time which scales linearly with the problem size. Two popular and related frameworks for (potentially) optimal preconditioning are multigrid (MG) [32] and domain decomposition (DD) [136, 153]; we focus on the latter in this chapter.
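For reference, PCG applies the preconditioner once per iteration; here is a minimal NumPy sketch of our own (illustrative only; the two-level DD preconditioner developed in this chapter would be supplied as `M_inv`, and a simple Jacobi preconditioner stands in for it below):

```python
import numpy as np

def pcg(A, f, M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for SPD A; M_inv applies the preconditioner."""
    u = np.zeros_like(f)
    r = f - A @ u
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        u = u + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u

# 1D Poisson test: A = tridiag(-1, 2, -1), Jacobi (diagonal) preconditioner.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = pcg(A, f, M_inv=lambda r: r / np.diag(A))
```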
The key component we present is a coarse discretization of the PDE, using larger elements of size H, providing the coarse grid correction to accelerate global convergence.

A critical factor in selecting a solver is the question of how much domain knowledge the preconditioner requires. Geometric approaches require the practitioner to re-discretize the PDE at multiple scales, which for irregular domains and/or coefficients may be challenging. In contrast, algebraic approaches work almost entirely with the matrix A. While algebraic methods may be more difficult to develop, they can provide benefits in both ease of use and in handling irregular problems.

The method we propose here is essentially algebraic, but uses additional discrete information: we ask for a small set of generating vectors that span the space of degree p polynomials.¹ We construct a coarse basis by algebraically partitioning the domain into subdomains and using the restriction of each generating vector to each subdomain as its own basis function. The resulting coarse space functions are piecewise-smooth, with jumps at subdomain boundaries. From this basis, we construct a coarse problem by Galerkin projection.

We derive an error bound on the solutions to the coarse problem, and show that it is a high-order accurate, convergent coarse grid approximation for a variety of PDEs and discretizations. Convergence requires a limited coarsening factor [H/h] and sufficiently large p. Combined with DD in a Krylov method, we observe that the number of required iterations decreases rapidly with p, and has reduced dependence on [H/h], e.g., maintaining optimal scaling in the case h = H².

For any finite resolution of the fine problem, our coarse bases may or may not be interpreted as discontinuous. However, in the limit as h → 0 with H fixed, they are equivalent to the bases used in discontinuous Galerkin (DG) methods [9].
We call our coarse basis functions discretely-discontinuous, giving rise to the name "Discretely-Discontinuous Galerkin" (DDG) for the approach.

We provide both theoretical and numerical evidence that DDG provides a convenient tool for easily constructing highly effective coarse grid corrections for a wide range of problems, varying over the type of discretization (e.g., classic finite elements or finite differences), the domain (from Cartesian grids to adaptive unstructured meshes), and the underlying PDE (e.g., vector-valued elasticity and fourth-order biharmonic problems).

¹ While using mesh coordinates in the algorithm, instead of just the matrix A, may arguably go beyond the strictest definition of an 'algebraic' method, we emphasize that no geometric operations are required, no mesh connectivity is involved, and no knowledge of the underlying continuum equations is needed. Other algebraic schemes which similarly use slightly more information than the matrix A include Vaněk et al.'s smoothed aggregation method [155] and Brezina et al.'s AMGe [28].

6.2 Related Work

Our approach is closely related to the aggregation-based algebraic methods for constructing a coarse basis. For a more thorough review of aggregation techniques in the MG context, we refer the reader to the review paper by Stüben [143]. We review the most closely related ideas in the DD setting.

The performance of non-smoothed aggregation, like ours, depends critically on [H/h]. The simplest aggregation algorithm produces a piecewise constant coarse space. If [H/h] = O(1), then this preconditioner applied to the Poisson problem has condition number bounded independent of h [127, 128].
However, the condition number grows with [H/h].

For elasticity and higher-order PDEs, a piecewise constant basis is insufficient. Better aggregation techniques have been derived by requiring additional user input: the vectors that span the (near-)nullspace of the PDE [155], e.g., the rigid modes for elasticity and the linear polynomials for biharmonic problems.

These techniques are already optimal, in the sense that the number of iterations is bounded independent of problem size. Improvements to the iteration count can come in the form of constant-factor reductions and reduced dependence on [H/h] or on geometric dependencies such as the PDE, domain, coefficients, etc. Many works present modifications to aggregation-based techniques that improve their performance in these ways.

Despite the optimal scaling of non-smoothed aggregation, when [H/h] is large, the aggregation-based coarse solution is a poor approximation to the actual solution, and convergence is slow. Galerkin projection finds a solution which is optimal in the energy norm, but the near-discontinuities at subdomain boundaries dominate the energy. One way to reduce this dependence on [H/h] is to keep the aggregation basis but apply a non-Galerkin projection, as in over-correction methods that apply a scaling to the Galerkin solution [19, 26]. In practice, this significantly improves the results.

Alternatively, one can work to change the basis. Sala et al. [128] show that the subdomains used for aggregation can be smaller than those used for the DD smoothing step. Following this idea, the associated term in the bound on the condition number reduces geometrically with H. This requires some additional work to come up with the extra partitions, and enlarges the size of the coarse problem.

Another alternative is smoothed aggregation, which smooths the basis functions, thus reducing the steep jumps at subdomain boundaries.
For the Poisson problem, this transforms an [H/h] term in the condition number bound into an [H/d] term, where d is the smoothing diameter [127]. This keeps the size of the coarse problem the same as basic aggregation, but requires additional work to smooth the basis, and can increase the number of nonzeros in the coarse matrix.

Our method reduces to non-smoothed aggregation if the only generating vector is the constant vector, and our method inherits the upper bounds proven for non-smoothed aggregation. We increase the performance beyond non-smoothed aggregation by using a higher-order basis, which creates a high-order accurate re-discretization of the input PDE. By using a pth order coarse basis we reduce the energy at subdomain boundaries from [H/h] to [H^(p+1)/h]. We find the added power of higher-order bases greatly reduces the required number of iterations.

Beyond non-smoothed aggregation, discontinuous functions have appeared within DD algorithms before. For example, the restricted additive Schwarz method generates discontinuities at subdomain boundaries and has improved performance relative to the corresponding smooth method [43]. The improved performance is explained by Efstathiou and Gander: the restricted algorithm converges in the continuous limit while the additive overlapping algorithm does not [70]. When using DG discretizations, discontinuities are already present in the fine problem. Previous works have developed DD solvers specifically for DG [61, 63], or agglomerated fine DG problems to construct coarse ones [16].
While our approach uses a basis like that of DG methods, it does not require that the fine problem be discretized with DG, but can be interpreted as a rediscretization of the problem using a DG basis on elements of size H.

6.3 Preliminaries

The DDG algorithm requires no geometric interpretation or information for implementation (the generating vectors, which typically would contain a basis for polynomials in the nodal coordinates, are treated as a black box). However, our understanding and analysis is intimately tied to a geometric interpretation, so we frequently refer to geometric properties for simplicity. For reference, Table 6.1 lists the major symbols used throughout this chapter. They are also each defined at first use.

Table 6.1: Common symbols for DD used throughout this chapter.

    input, fine grid system                          A u = f
    input, generating basis (tall matrix)            F
    restriction to coarse space                      R_0
    restriction to overlapping subdomain i > 0       R̃_i
    coarse grid matrix                               A_0 = R_0 A R_0^T
    coarse grid system                               A_0 u_0 = R_0 f
    element diameter in fine grid                    h
    element diameter in coarse grid                  H
    coarsening factor                                H/h
    polynomial degree used in coarse basis           p
    subdomain overlap, in geometric distance         δ
    subdomain overlap, in algebraic graph distance   ∆
    spatial dimension                                d

6.4 The Coarse Grid

An effective coarse grid needs to be able to approximate the error left by the one-level DD method. After a single pass of one-level DD, the error is extremely smooth in each subdomain, but not across subdomain boundaries (Figure 6.1). From this structure, we are motivated to use piecewise higher-order polynomials, which have a similar piecewise-smooth structure.

Our coarse approximation space consists of piecewise polynomials which are smooth within each subdomain, but have arbitrary jumps at subdomain boundaries. The construction uses several user-supplied vectors, arranged in the columns of F, that span a degree p polynomial space.
For many discretizations, F can be easily built using the nodal coordinates of the mesh and the constant vector.

Figure 6.1: Error after smoothing with DD for a simple Poisson problem with random right-hand side. One-level DD produces a piecewise-smooth error after one iteration. Top row: partition, error after DD smoothing. Bottom row: first and second x-derivatives of error.

To construct the coarse restriction and coarse system matrix, we follow the same Galerkin projection as used in previous aggregation methods (e.g. [155]); the difference is in our choice of F. The subdomains are built by partitioning the discrete domain into non-overlapping subdomains Ω_i (i.e. subsets of indices) containing approximately (H/h)^d nodes for problems in dimension d. Unless otherwise indicated, all of our examples were partitioned using a graph-based discrete algorithm from the SCOTCH library [120].

The coarse basis for subdomain i is spanned by the columns of φ_i = R_i^T R_i F, where R_i is the restriction to the ith partition domain (with no overlap). The final coarse restriction is made by concatenating these basis vectors together as rows in R_0. To help with conditioning, we orthogonalize the rows of R_0, which may be done independently for each subdomain as there is no overlap. Orthogonalization is optional, but the remainder of the chapter assumes R_0 is orthogonal to simplify the analysis. Figure 6.2 shows simple (albeit inefficient) MATLAB code for this construction.

    % F(i,j) = input basis j evaluated at node i.
    % Node i is in subdomain partition(i) (zero-based index).
    function R0 = basis(F, partition)
        k = size(F, 2);
        [i, j, s] = find(F);
        Rt = sparse(i, partition(i)*k + j, s);
        [Rt, ~] = qr(Rt, 0);  % optional orthogonalization.
        R0 = Rt';

Figure 6.2: MATLAB code to build the restriction matrix R_0 from input vectors F and a partition.
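For readers working outside MATLAB, the same construction is a few lines of NumPy. The sketch below is our reading of Figure 6.2 (the function name and rank tolerance are ours, not from the thesis): for each subdomain, restrict the generating vectors F (e.g. monomials of the nodal coordinates) to that subdomain's nodes, orthogonalize the resulting columns by QR, discard rank-deficient columns, and scatter the orthonormal columns back as rows of R_0.

```python
import numpy as np

def build_R0(F, partition, tol=1e-12):
    """Build the coarse restriction R0 from generating vectors F.

    F[i, j]      = generating basis vector j evaluated at node i.
    partition[i] = index of the subdomain containing node i.
    Rows of R0 are orthonormalized per subdomain; columns of each
    phi_i that are (numerically) rank-deficient are discarded.
    """
    n, k = F.shape
    rows = []
    for s in np.unique(partition):
        idx = np.flatnonzero(partition == s)
        phi = F[idx, :]                     # phi_i restricted to subdomain i
        Q, R = np.linalg.qr(phi)            # orthogonalize the columns
        rank = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
        for c in range(rank):
            row = np.zeros(n)
            row[idx] = Q[:, c]              # scatter back to full length
            rows.append(row)
    return np.array(rows)                   # R0, with orthonormal rows
```

Because subdomains do not overlap, rows from different subdomains have disjoint support, so the per-subdomain QR already makes R_0 globally orthonormal.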
A robust implementation must also detect when φ_i is not full rank, and discard columns as necessary.

The coarse matrix A_0 is constructed by Galerkin projection, A_0 = R_0 A R_0^T, and the coarse approximation to u (of A u = f) is given by R_0^T u_0, where A_0 u_0 = R_0 f.

The size of A_0 increases with p, both in rank Θ(p^d) and in the number of non-zeros Θ(p^{2d}). However, the block sparsity pattern of A_0 is independent of p. It has the same sparsity pattern as the subdomain adjacency matrix, but with each non-zero replaced by a small dense block whose size depends on p. This structure allows for efficient numerical linear algebra using dense storage and operations. The optimal choice of p, in terms of total work to solve the problem, depends on both the problem at hand and details of the implementation, but we generally found p = 3, cubic polynomials, to be a good default.

6.5 Coarse Grid Analysis

We show that, under moderate assumptions, the error of the coarse solution is bounded by

    ‖R_0^T u_0 − u‖_A ≤ c H^{1+p−q} (1 + (H/h)^{q−1/2}) |ū|_{W̃^{1+p}_∞(Ω)}    (6.1)

for PDEs of degree 2q, where c is independent of h and H, and ū is the piecewise-smooth interpretation of u defined in the next section. When H/h = O(1) and 1 + p > q, this error converges with high order in H. A convergent coarse grid approximation naturally allows the coarse grid correction to capture all components of the error not handled by fine grid smoothing, increasing the efficiency of the preconditioner.

We present two arguments for convergence of the coarse grid. First, we present an argument for FEM discretizations leveraging the extensive theory surrounding FEM and Sobolev norms. Second, we give an alternative argument that depends only on some discrete algebraic properties, which must be shown for each particular discretization.

6.5.1 Error Bound Using Geometric Properties

Here we restrict our attention to the finite element method. Let the domain Ω be partitioned into subdomains Ω_i. Let ū be in the Sobolev space W^q_∞(Ω) (i.e.
it should have bounded qth-order weak derivatives) and suppose ū is C^∞ in each subdomain. Let u_h ∈ W^q_∞ be a FEM interpolant of this function on some mesh. We assume that both the subdomains and the mesh elements satisfy all the usual regularity assumptions for meshes with elements of diameter H and h, respectively. We represent u_h with a discrete vector u, assuming a nodal basis, so u_i = ū(x_i). Furthermore, we assume that the FEM interpolant satisfies ‖ū‖_{W^q_2} ≤ c‖u_h‖_{W^q_2}, which is true when ‖ū‖_{W^q_2} = ‖u_h‖_{W^q_2} + O(h) and h is smaller than some h_0.

Let the PDE be given as a symmetric elliptic bilinear form, a(u,v) = ∫_Ω k[u]·k[v] dΩ, with some linear functional k[·] involving up to qth-order derivatives. For example, k = ∇ for the Poisson problem or k = ∇² for the biharmonic problem. We assume continuity a(u,u) ≤ c‖u‖²_{W^q_2}, where c denotes an arbitrary constant independent of h and H. Discretized with the FEM, u^T A u = a(u_h, u_h).

Figure 6.3: Regions Ω_cut and Ω_sub used for analysis. A simple finite element mesh is divided into three DD subdomains. The elements separating the subdomain interiors from each other and from the boundary conditions are Ω_cut. In Ω_sub, the error is small because the approximation is good. The error is larger in Ω_cut, but the total area is small enough that it does not hinder convergence.

Each FEM nodal point lies within exactly one subdomain, and has an associated basis function. In some areas, basis functions from multiple subdomains overlap. Let the union of all mesh elements containing these overlapping areas, plus any elements touching the boundary of Ω, be Ω_cut, and let Ω_sub = Ω \ Ω_cut (Figure 6.3). For the purposes of the proof, we introduce additional bilinear forms a_cut(u,v) = ∫_{Ω_cut} k[u]·k[v] dΩ and a_sub defined analogously, along with their discretizations A_cut and A_sub.
Note that a = a_cut + a_sub and A = A_cut + A_sub.

If the mesh and subdomains are sufficiently regular and the FEM basis functions have the usual compact support, then the (d-dimensional) volume µ(Ω_cut) ≤ c h/H. It depends linearly on h because Ω_cut lies in a band of thickness ch around the subdomain boundaries, and inversely on H because that is the rate at which the total subdomain surface area grows.

As in any Galerkin scheme, the coarse solution R_0^T u_0 is the minimum-error solution in the energy norm over the entire coarse space. Therefore the error is bounded by that of any particular coarse vector, including v = R_0^T R_0 u. Using this and the splitting,

    ‖R_0^T u_0 − u‖_A ≤ ‖v − u‖_A                                      (6.2)
                      = (|v − u|²_{A_sub} + |v − u|²_{A_cut})^{1/2}    (6.3)
                      ≤ |v − u|_{A_sub} + |v − u|_{A_cut}              (6.4)

Before tackling either of these terms, consider what v = R_0^T R_0 u is. Because R_0 is orthogonal, R_0^T R_0 u is the l² projection of u onto the coarse space. By construction of the coarse space, v has a continuous interpretation v̄ that is a degree p polynomial in each subdomain Ω_i. We can find v̄ directly from u by a per-subdomain least-squares approximation of ū by a degree p polynomial, minimizing the sum of the squared errors at the FEM nodal points. Barring pathological distributions of mesh nodes, v̄ will be a high-order approximation to ū, satisfying the same error bounds commonly derived for FEM interpolants on a mesh with elements of size H.

Now, we can bound the discrete error in terms of the geometric functions. We cite the appropriate theorems from Brenner et al. [27] for Sobolev and FEM-interpolant inequalities.
Looking first in Ω_sub, we find that the coarse polynomials are a high-order approximation:

    |v − u|_{A_sub}                         error interior to subdomains                (6.5)
    = |v_h − u_h|_{a_sub}                   discrete and FEM energy are equal           (6.6)
    ≤ c ‖v_h − u_h‖_{W^q_2(Ω_sub)}          continuity assumption                       (6.7)
    ≤ c ‖v̄ − ū‖_{W^q_2(Ω_sub)}              convergent FEM for sufficiently small h     (6.8)
    = c ‖v̄ − ū‖_{W̃^q_2(Ω_sub)}              broken semi-norm defined below              (6.9)
    ≤ c ‖v̄ − ū‖_{W̃^q_2(Ω)}                  increasing domain only increases the norm   (6.10)
    ≤ c H^{1+p−q} |ū|_{W̃^{1+p}_2(Ω)}        Theorem 4.4.20 [27]                         (6.11)
    ≤ c H^{1+p−q} |ū|_{W̃^{1+p}_∞(Ω)}        2-norm vs. ∞-norm                           (6.12)

Here the broken semi-norm W̃^b_a(Q) on domain Q is defined as a sum over subdomains:

    |u|_{W̃^b_a(Q)} = ( Σ_i |u|^a_{W^b_a(Ω_i ∩ Q)} )^{1/a}    (6.13)

This is naturally extended to a maximum over subdomains for a = ∞.

Turning to Ω_cut, the coarse polynomials are not a high-order approximation in the energy norm, because ū is not smooth and v̄ is not even continuous in this region. However, Ω_cut is small enough that L_∞ bounds are sufficient.

    |v − u|_{A_cut}                                          error at subdomain boundaries            (6.14)
    = |v_h − u_h|_{a_cut}                                    discrete and FEM energy are equal        (6.15)
    ≤ c ‖v_h − u_h‖_{W^q_2(Ω_cut)}                           continuity assumption                    (6.16)
    ≤ c h^{−q} ‖v_h − u_h‖_{L_2(Ω_cut)}                      Theorem 4.5.12 [27]                      (6.17)
    ≤ c h^{−q} µ(Ω_cut)^{1/2} ‖v_h − u_h‖_{L_∞(Ω_cut)}       2-norm vs. ∞-norm                        (6.18)
    ≤ c h^{−q} (h/H)^{1/2} ‖v_h − u_h‖_{L_∞(Ω_cut)}          mesh and subdomain regularity            (6.19)
    ≤ c h^{−q} (h/H)^{1/2} ‖v̄ − ū‖_{L_∞(Ω_cut)}              stability of interpolation, 4.4.1 [27]   (6.20)
    ≤ c h^{−q} (h/H)^{1/2} H^{1+p} |ū|_{W̃^{1+p}_∞(Ω_cut)}    Theorem 4.4.20 [27]                      (6.21)
    ≤ c h^{−q} (h/H)^{1/2} H^{1+p} |ū|_{W̃^{1+p}_∞(Ω)}        increasing domain                        (6.22)
    = c H^{1+p−q} (H/h)^{q−1/2} |ū|_{W̃^{1+p}_∞(Ω)}           factored to match eq. (6.12)             (6.23)

Combining the two bounds (6.12) and (6.23) completes the error bound

    ‖R_0^T u_0 − u‖_A ≤ c H^{1+p−q} (1 + (H/h)^{q−1/2}) |ū|_{W̃^{1+p}_∞(Ω)}    (6.24)

Numerical experiments suggest this bound has the correct asymptotic form, excluding the term |ū|_{W̃^{1+p}_∞(Ω)}, which was not computed experimentally.

6.5.2 Error Bound Using Algebraic Properties

We can derive a similar bound based purely on algebraic components.
As before, let v = R_0^T R_0 u and A = A_cut + A_sub be a splitting into symmetric positive semi-definite components. This splitting need not correspond to the FEM definition given earlier. However, we require that

    A_cut = [ B_cut  0 ]
            [   0    0 ]

with B_cut ∈ R^{m×m} and m ≤ c (h/H) h^{−d}. This is usually true and plays the role of µ(Ω_cut) from the geometric proof.

We assume that for all u ∈ V,

    ‖A_cut‖_2 ≤ c h^{−2q},                   (6.25)
    ‖u‖_∞ ≤ c h^{d/2} ‖u‖_2,                 (6.26)
    |u − v|_{A_sub} ≤ c H^{1+p−q} ‖u‖_2,     (6.27)
    and ‖u − v‖_∞ ≤ c H^{1+p} ‖u‖_∞.         (6.28)

The constants c include the roughness term |ū|_{W̃^{1+p}_∞(Ω)} from the geometric proof. Geometrically speaking, the subspace V must be restricted to functions with bounded roughness. Showing that these assumptions hold for a particular discretization could exploit geometric properties as in the previous section.

From these assumptions, the convergence argument follows the exact same structure as in the geometric case, so we do not repeat it. The final error bound is similar to the above:

    ‖R_0^T u_0 − u‖_A ≤ c H^{1+p−q} (1 + (H/h)^{q−1/2}) ‖u‖_2.    (6.29)

6.5.3 Practical Considerations

The proof and our use of the coarse grid in practice are not entirely consistent with each other. The coarse grid is used to approximate the error after applying one-level DD. For PDEs with smooth coefficients, as h → 0 with H fixed, the error after one-level DD is piecewise C^∞ as in the proof. For any finite h, it is only an approximation as accurate as the discretization.

In problems with discontinuous coefficients, even as h → 0, the error after one-level DD is not C^∞ in each subdomain. It has kinks where the coefficients have discontinuities. We tried matching the partition boundaries to the discontinuities, or using a piecewise generating basis F that matches the discontinuities.
We observed optimal scaling even without these strategies, but either strategy significantly accelerated convergence with high-order polynomials.

When we use non-trivial overlap between subdomains, the boundaries of the smooth regions do not line up with the discontinuities in the coarse space. This is easy to resolve by adjusting the coarse subdomains, but doing so introduces more variation in the coarse subdomains' size and shape. In numerical experiments, better results were obtained by ignoring this inconsistency with the proof and keeping the original subdomain shapes.

The roughness term in the error bound can increase with p, suggesting that the error can actually increase with p. However, because increasing p always grows the coarse vector space, and the Galerkin solve is optimal in that space, increasing p never increases the error.

6.6 Domain Decomposition

We incorporate the coarse grid within a standard multiplicative algebraic DD framework.

The DD subdomains begin with the same partition computed for the construction of the coarse basis. From the partition, overlapping subdomains are algebraically constructed by expanding the partition to include nodes within graph distance ∆ in the graph defined by A. The expanded subdomains Ω̃_i overlap in geometric bands of size δ. For simple meshes and discretizations, δ = (1 + 2∆)h. Unless otherwise indicated, we use minimal overlap ∆ = 0.

Let R̃_i be the restriction matrix such that u_i = R̃_i u is the vector of the elements of u from subdomain Ω̃_i, i.e. R̃_i is a subset of the rows of the identity matrix. For each subdomain, the local problem uses the matrix A_i = R̃_i A R̃_i^T, and we solve these subdomain problems exactly. Given a current approximation u^k, processing subdomain i updates the approximation to

    u^{k+1} = u^k + R̃_i^T A_i^{−1} R̃_i (f − A u^k)
    (6.30)

Iterating over all subdomains and updating the approximation to u after each, we arrive at the algorithm for one-level multiplicative overlapping Schwarz.

To build a two-level method, we multiplicatively combine one pass of one-level Schwarz as a pre-smoother, the coarse problem solution, and another pass through the subdomains as a post-smoother. The post-smoother is done in reverse order, making the entire operation symmetric and usable with PCG.

We give some experimental results with a three-level method operating in a V-cycle. We construct the three-level problem by taking the two-level algorithm and applying it again to the coarse matrix A_0 to make an even coarser matrix A_1. To do this, we need a coarsened version of the generating vectors, which is simply F_0 = R_0 F. The coarsened coarse problem is equivalent to directly coarsening the original problem with larger subdomains. We keep the ratio between physical element sizes in adjacent levels the same (i.e. h/H_0 = H_0/H_1). The algebraic overlap ∆ used for smoothing is also the same at all levels.

6.6.1 Condition Number

For several special cases, our approach reduces to previously published aggregation approaches. For the Poisson problem using p = 0, Sala [127] showed that the additive variant of the preconditioner has condition number bounded by O(1 + H/h). For elasticity and biharmonic problems with p = 1, our approach is essentially a non-smoothed two-level variant of the multigrid method described by Vaněk et al. [155]. Their coarse grid uses the zero-energy modes, which are p = 1 for biharmonic and a subset of p = 1 for elasticity. They later prove optimal convergence, but only for the Poisson problem [156].

We do not have a condition number bound showing the dependence on p and q.
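The subdomain update (6.30) and the symmetric two-level combination (pre-smooth, coarse Galerkin correction, reversed post-smooth) can be sketched with dense NumPy matrices as follows. The helper names are ours, and a practical implementation would factor each A_i once and use sparse storage; this is only a sketch of the structure.

```python
import numpy as np

def schwarz_sweep(A, f, u, subdomains, reverse=False):
    """One pass of multiplicative overlapping Schwarz: for each subdomain,
    apply the update u <- u + R_i^T A_i^{-1} R_i (f - A u)  (eq. 6.30)."""
    order = reversed(subdomains) if reverse else subdomains
    for idx in order:
        Ai = A[np.ix_(idx, idx)]          # local matrix A_i = R_i A R_i^T
        r = f - A @ u                     # current residual
        u[idx] += np.linalg.solve(Ai, r[idx])
    return u

def two_level_apply(A, R0, f, subdomains):
    """One application of the two-level preconditioner to f."""
    u = np.zeros_like(f)
    u = schwarz_sweep(A, f, u, subdomains)          # pre-smoother
    A0 = R0 @ A @ R0.T                              # Galerkin coarse matrix
    u += R0.T @ np.linalg.solve(A0, R0 @ (f - A @ u))  # coarse correction
    u = schwarz_sweep(A, f, u, subdomains, reverse=True)  # post-smoother
    return u
```

Because each subdomain solve and the coarse correction are A-orthogonal projections of the error, one application is guaranteed not to increase the energy norm of the error, and the reversed post-sweep makes the whole operator symmetric, as required for PCG.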
However, increasing p beyond the low-order choices in the literature increases the dimension of the coarse space, which does not have a negative effect on convergence – it can only increase the rate of convergence.

6.7 Numerical Experiments

We demonstrate the performance of our coarse grid and DD as a preconditioner for PCG on a variety of PDEs and discretizations. Unless otherwise noted, all problems are solved to a 10^{−9} reduction in residual after the first application of the preconditioner. The right-hand-side vector f is a random Gaussian-distributed vector, and the initial guess for u is 0. For the sake of easier reporting, we consider all problems with uniform meshes to be scaled such that rank(A) = n = h^{−d}, where d is the spatial dimension. Consequently, H^{−d} is the number of subdomains.

Figure 6.4: Meshes used in DDG examples. Left: sample annulus mesh used for Poisson problems B, C, D, with shading showing S + 10J. Right: sample adaptive mesh used for elasticity problem E.

The graph of the logarithm of the residual vs. the iteration count is typically very straight. Therefore we measure not just the integer iteration on which the residual first falls below the tolerance, but also the fractional iteration count at which the linear interpolation of this graph meets the tolerance. We found this reveals a lot of otherwise hidden detail; these fractional iterations are shown in Figure 6.6. When PCG takes more than 1000 iterations, we stop the solve and report iteration bounds based on condition number estimates derived from the Lanczos coefficients computed during PCG. For converged problems, these bounds agreed very well with actual iteration counts.

We use the following problems:

(A) Poisson 3D. ∇·∇u = f discretized on an m×m×m regular grid with a 7-point finite difference stencil. One face of the cube has a Dirichlet boundary condition and the remainder are Neumann.
For this problem, we partition using recursive inertial partitioning [151] so that the matrix need never be explicitly constructed. The first partition uses a randomly-oriented plane to ensure irregularly shaped subdomains.

(B) Smooth Poisson. ∇·S∇u = f discretized with piecewise linear finite elements on a 2D unstructured triangle mesh of a circular annulus with outer radius 5 times the inner radius. Both inner and outer boundaries use Dirichlet conditions. The scalar function S = exp(1 + sin(π(x+y))) is smooth. See Figure 6.4.

(C) Non-Smooth Poisson. ∇·D∇u = f discretized as above, but with discontinuous D = S + 100J, where J = ℋ(0.25 + cos(πx)cos(2πy)) with Heaviside step function ℋ. J is an indicator function for two 'materials' in the problem. We used algebraic partitions that do not conform to the material boundaries, but we use generating vectors F that are piecewise polynomial with respect to the material domains. This doubles the number of columns in F, but only subdomains that include the material boundary end up with additional coarse basis functions, so A_0 is only marginally larger. In practice, we observe optimal scaling even without this extra work, and it makes no difference with the piecewise constant basis. However, with the piecewise cubic basis, this material-aware F reduces the iteration count by nearly one half.

(D) High-Order Poisson. ∇·∇u = f discretized on the same mesh as problems (B) and (C), but with continuous piecewise cubic finite elements.

(E) Elasticity. (∇·∇ + ∇∇·)u = f with Dirichlet boundary conditions, with vector u. This is discretized on a spatially-adaptive unstructured 2D triangle mesh (Figure 6.4) with piecewise linear FEM. For the generating vectors F, we take the degree p polynomials in each component of u.

(F) Biharmonic. ∇⁴u = f on a regular m×m 2D grid discretized with a 13-point finite difference stencil. All boundaries have homogeneous Dirichlet and Neumann conditions.
This problem uses an algebraic overlap ∆ = 1, since performance is quite poor with ∆ = 0. Also, this problem is solved only to a 10^{−6} reduction in residual, as the fine discretizations are very poorly conditioned, causing PCG to break down before reaching the 10^{−9} used above.

Table 6.2 summarizes the results for solving these problems at different h, H, and polynomial coarse spaces from piecewise constant P0 to cubic P3. With fixed H/h, we observe near-constant iteration counts, independent of the problem size, when using the two-level algorithm. Part of the increase in iteration count can be attributed to degrading partition quality with the algebraic partitioner. Experiments (not shown) with more structured partitioning of the structured meshes showed less variation in iteration counts. The three-level V-cycle does not perform nearly as well, but still appears to be sub-logarithmic in n.

Table 6.3 shows the wall-clock time spent on setup (excluding partitioning) and solution of some problems from Table 6.2. The 2D problems were solved with a MATLAB implementation that solved each subdomain problem with the "backslash" operator on each iteration, but stored a factorization of the coarse grid. The 3D problems were solved with a parallel C++ implementation that solved the subdomains with successive over-relaxation and solved the coarse grid with conjugate gradient, preconditioned with incomplete Cholesky. Both implementations ran on a 32-core Intel Xeon E5-2690 with 256GB of RAM. In all cases, the bulk of the runtime is spent on the subdomain solves, despite the use of poorly-scaling solvers for the coarse problem. Furthermore, the runtime scales approximately linearly with problem size, as desired. This is visible in Table 6.4, which displays runtime normalized by problem size. Perfect linear scaling would be shown as constant normalized runtime. The major exception to the approximately linear scaling is the 3D solver, which scales faster than linearly.
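The fractional iteration count reported in these experiments can be recovered from a solver's residual history by interpolating the (nearly straight) log-residual graph; a minimal sketch, with the function name ours and `residuals[k]` denoting the residual norm after iteration k (assuming the initial residual is above the tolerance):

```python
import math

def fractional_iterations(residuals, tol):
    """Fractional iteration at which the piecewise-linear interpolation of
    log(residual) vs. iteration count first crosses log(tol)."""
    logs = [math.log(r) for r in residuals]
    target = math.log(tol)
    for k in range(1, len(logs)):
        if logs[k] <= target:
            # interpolate between iterations k-1 and k
            return (k - 1) + (logs[k - 1] - target) / (logs[k - 1] - logs[k])
    return float('inf')  # never converged
```

For example, a residual history of [1.0, 0.1, 0.01] crosses a tolerance of 0.05 about 30% of the way between iterations 1 and 2, revealing reduction-rate detail that the integer count 2 hides.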
Given that the number of iterations is constant, but the normalized runtime improves on larger problems, this seems likely to be due to overhead that is over-emphasized on small problems.

To directly explore the value in using higher-order coarse bases, we solve a 1000×1000 biharmonic problem with varying p, a large coarsening factor H/h = 125, and ∆ = 2. The large coarsening factor is desirable for efficient parallel implementations, but significantly reduces the accuracy of the low-order coarse grids. As shown in Figure 6.5, we observe the number of iterations decreases rapidly with increasing p.

Note that increasing p increases the size of A_0, so there are diminishing returns with large p. Nonetheless, significant reductions in problem size are achieved for all p. The least reduction, with p = 10, is rank(A_0) = (1/284)·rank(A). A similar effect occurs in aggregation techniques when the aggregation subdomains are smaller than the subdomains used in smoothing. We compare to this approach by using p = 1 but aggregating on smaller subdomains. The high-p basis significantly outperforms this approach (Figure 6.5).

Figure 6.5: p-dependent convergence on a biharmonic problem. The number of required iterations decreases rapidly with the polynomial degree used in the coarse basis (parameters: h^{−1} = 1000, H/h = 125, ∆ = 2). For comparison, when using p = 1 but aggregating using smaller subdomains, the iteration count is much higher for the same number of coarse variables. (Log-log plot of iterations to converge vs. rank(A)/rank(A_0); one curve for variable p with fixed H, one for p = 1 with variable H.)

Our error bound does not strictly require that H/h = O(1) in order to produce a convergent coarse grid and accompanying optimal preconditioner.
For Poisson (q = 1) using a piecewise linear coarse basis (p = 1) and substituting the relationship h = H², we arrive at the convergent bound

    ‖R_0^T u_0 − u‖_A ≤ c H^{1/2} |ū|_{W̃²_∞(Ω)}.    (6.31)

Figure 6.6: Optimal convergence when subdomain size is not fixed. Our grid is convergent and has approximately h-independent convergence, even when H/h is not fixed. Parameters: h = H², 2δ ≈ H, p = 1. (Plot of iterations to converge vs. 1/h.)

Figure 6.6 shows the Poisson problem B with these parameters. Each subdomain has a number of nodes equal to the number of subdomains, so the coarse matrix and the subdomain matrices are of similar size, which is an interesting point in the design space. As in all overlapping DD methods, the smoother is very sensitive to H/δ. To keep it approximately constant, we set ∆ = ⌊H/(4h)⌋. The minor saw-tooth pattern in the graph comes directly from the remaining variation in H/δ. With higher polynomial degrees, h = H^s for higher powers of s should be possible.

6.8 Weighting for a Better Coarse Grid

The error bound presented previously has a dependency on H/h that a geometric coarse grid would not have. This section presents some incomplete work towards an h-independent coarse grid.

The h-dependence is not due to the choice of basis, but is an artifact of the Galerkin projection. Using the same basis, but with the L²-optimal approximation to the solution, the coarse approximation is given by

    u_{L²} = R^T R A^{−1} f.

This coarse approximation has an L² error bound that depends only on H^{p+1}, as it is simple piecewise polynomial approximation on subdomains of size H.

This works well, but is obviously computationally unreasonable: it requires computing A^{−1} f, which is already the solution to the high-resolution problem. However, we begin here to analyze some non-Galerkin projection schemes.

Let S be the restriction onto the orthogonal complement of the coarse space (the rowspace of R),
so that the stacked matrix [R; S] ∈ R^{n×n} is orthogonal. We can write:

    u_{L²} = R^T R A^{−1} f
           = R^T R I A^{−1} I f
           = R^T R [R; S]^T [R; S] A^{−1} [R; S]^T [R; S] f
           = R^T R [R; S]^T ( [R; S] A [R; S]^T )^{−1} [R; S] f
           = R^T [I 0] ( [R; S] A [R; S]^T )^{−1} [R; S] f
           = R^T [I 0] [ R A R^T   R A S^T ]^{−1} [R; S] f
                       [ S A R^T   S A S^T ]
           = R^T [I 0] [ A_1^{−1}   B       ] [R; S] f
                       [ B^T        A_2^{−1}]
           = R^T [A_1^{−1}  B] [R; S] f
           = R^T A_1^{−1} R f + R^T B S f
           = u_R + u_S.

In many cases, the matrix A can be factorized as A = G^T G in a meaningful way. For example, if A is the Laplace operator ∇·∇, then G can be a discretization of the gradient ∇. More generally, when evaluating the finite element method by quadrature, each row of G can correspond to a quadrature point, with a weighted discrete evaluation of k[u]. Taking this view, we expect G to be taller than A, at least as sparse, and easy to obtain. Using this factorization,

    A_1 = R G^T W G R^T

and

    W = I − G S^T (S A S^T)^{−1} S G^T = I − P_{GS^T} = P⊥_{GS^T},

which is the projection onto the orthogonal complement of the column space of G S^T (which is different from the column space of G R^T). There is a similar closed form for B, but we will not use it.

The first term (u_R = R^T A_1^{−1} R f) has the form that we would like: it has the same dimensions as the Galerkin coarse matrix, and the entire operation is symmetric. The second term (u_S) is asymmetric and undesirably involves the large unknown matrix S directly. Can u_S be safely dropped?

The u_S term represents the coupling between high-frequency terms in the right-hand side and low-frequency terms in the solution. If f were smooth, then ‖Sf‖ = O(H^{p+1}), due to the approximation power of the coarse basis, and we could hope to ignore u_S on that argument. Unfortunately, there is no reason to expect f to be smooth: one-level DD leaves a smooth error, but not a smooth residual. Independent of f, numerical experiments suggest the second term is relatively small: ‖R^T A_1^{−1} R‖_2 ≈ 8 ‖R^T B S‖_2 for a Poisson problem with h^{−1} = 40, H^{−1} = 4, p = 1. More experiments show this scales as O(H²), and I conjecture that the H² decay comes from the approximation power of R, so this term could be even smaller for higher-order bases.
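The block identity u_{L²} = u_R + u_S above follows purely from the orthogonality of [R; S], so it is easy to sanity-check numerically on a small random SPD system; a sketch (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 12, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # random SPD matrix

# Random orthogonal [R; S]: R spans the "coarse" space, S its complement.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
R, S = Q[:, :k].T, Q[:, k:].T

# Block inverse of [[RAR^T, RAS^T], [SAR^T, SAS^T]]; top row is [A1^{-1}, B].
T = np.vstack([R, S])
inv = np.linalg.inv(T @ A @ T.T)
A1inv, B = inv[:k, :k], inv[:k, k:]

f = rng.standard_normal(n)
uL2 = R.T @ (R @ np.linalg.solve(A, f))  # L2-optimal coarse approximation
uR = R.T @ (A1inv @ (R @ f))
uS = R.T @ (B @ (S @ f))
assert np.allclose(uL2, uR + uS)         # u_L2 = u_R + u_S
```

The key step, [R; S] A^{−1} [R; S]^T = ([R; S] A [R; S]^T)^{−1}, holds exactly because [R; S] is orthogonal, which is why the coarse block of the inverse reproduces the L²-optimal approximation.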
Because of these (incomplete) arguments, dropping u_S seems reasonable.

The remaining difficulty is that W is too computationally difficult to construct as given. Fortunately, there are many other matrices that have exactly the same effect as W. Because W only ever acts on the low-rank matrix G R^T, its action outside that subspace has no effect. To make this explicit, we can characterize the freedom we have with W. Take the singular value decomposition of G R^T:

    G R^T = U Σ V^T = [U_1 U_2] [Σ_1; 0] V^T = U_1 Σ_1 V^T

Then, write W in that basis,

    W = U X U^T = [U_1 U_2] [ X_11  X_12 ] [U_1 U_2]^T
                            [ X_21  X_22 ]

It follows that R G^T W G R^T = V Σ_1 X_11 Σ_1 V^T, and we see that only X_11 matters. The blocks X_12, X_21, and X_22 are completely free.

This leads to the question: does there exist a matrix equivalent to W that is easy to construct or approximate? Ideally, a diagonal approximation to W would be possible, as this would give A_1 the same sparsity pattern as the Galerkin approach.

The connection between diagonal dominance of W and diagonal dominance of X is not obvious. As a start, we make X as diagonally dominant as possible by choosing X_12 = X_21^T = 0 and X_22 = I, and call this choice W_1. This is also equivalent to

    W_1 = I − P_{GR^T} P_{GS^T} P_{GR^T}

In experiments, this matrix has nearly unit diagonal. It is diagonally dominant on all rows which do not correspond to crossings of subdomain boundaries. On rows that do cross subdomain boundaries, the sums of the magnitudes of the off-diagonals appear to be approximately proportional to the norms of the corresponding rows of G R^T.

The u_R term is related to a non-Galerkin projection using a weighted norm:

    u_LS = R^T A_1^{−1} R G^T W b = argmin_{u = R^T y} ‖G u − b‖²_W.

So, we call W the weight matrix. The difference between u_LS and u_R is in the right-hand side. For u_R (and the Galerkin approximation) the right-hand side is f = R G^T b, but for u_LS it is f_LS = R G^T W b. When using approximations to W, numerical experiments suggest that u_LS is generally a better approximation to u than u_R.
Unfortunately, b is not available when applying iterative Krylov methods, such as PCG.

A Diagonal Weight Matrix

Consider the simple first-order finite-difference discretization of the 2D Poisson problem on a regular grid, with G measuring finite-difference derivatives between pairs of points. Then, let W_D have ones on the diagonal when the corresponding row of G does not touch multiple subdomains, and c h/H on the diagonal if the row crosses subdomain boundaries, where c = 8 seems to work in practice.

Using W_D in place of W, the coarse matrix is easily shown to be a first-order approximation (O(h) when H is fixed) to the early DG method of Babuška and Zlámal [12]. Proofs of convergence of that approach require an H^{−2} term on the diagonal in order to dominate consistency errors [9], but we found performance to be better when using H^{−1}.

We experimented with algebraic methods of deriving diagonal weights with similar properties, without prior knowledge of h or H. Multiple schemes based on the row norms of G R^T seem to have the same asymptotic form. There is a large design space, with no obvious advantages to draw us to any corner.

This approach appears to generalize to other second-order elliptic PDEs, but it does not continue to work for higher-order PDEs. On the biharmonic problem, this weighting generates the necessary penalty term on discontinuities at subdomain boundaries, but is missing the required penalty term for discontinuities in the gradient across boundaries.

In the special case of piecewise constant bases, this approach is equivalent to over-correction methods [19, 26]. However, for higher-order bases (p > 0), over-correction incorrectly re-weights the interior of the subdomains, where the basis functions are smooth. Re-weighting only disrupts the correctly-functioning Galerkin projection there. The approach using W_D fixes this defect.

My analysis based on W stops here. We found no further way to make use of these observations in a numerical method.
Research along these lines is left for the future.

6.9 Discussion

We have presented an algebraic coarse grid construction that produces a convergent rediscretization for a wide variety of PDEs. It works for both scalar- and vector-valued problems, both second- and fourth-order PDEs, and both smooth and discontinuous coefficients. The high-order DG-like coarse basis is easy to construct algebraically, and Galerkin projection generates a high-order convergent coarse rediscretization of the input problem. Combined with DD and PCG, we observe convergence in a number of iterations nearly independent of problem size, as expected from the existing theory for aggregation methods. Furthermore, increasing the polynomial degree rapidly reduces the number of required iterations: the fastest solves used high-degree polynomials.

The relationship between this approach and simple aggregation is similar to the relationship between high-order FEM and low-order FEM. High-order methods are appropriate when the solution to the problem is very smooth – which is exactly the case after subdomain smoothing. As in FEM, the high-order approach often drastically outperforms the low-order schemes.

There are a number of outstanding questions raised by this work. We have shown a bound on the error of the coarse grid, dependent on H, h, p, and the smoothness of the solution. Ideally, we would have a thorough understanding of the relationship between all of the parameters (H, p, h, and δ) and the condition number or the number of iterations to converge. We leave closing this gap in the analysis for future work.

On the more practical side, the generating vectors F are not difficult to supply, but it would be more convenient to construct similar high-order coarse grids directly from A. This becomes even more important on more complex problems with spatially-varying coefficients, for which a well-adapted basis can significantly improve performance.
Also, our approach still has an undesirable dependency on [H/h] that is not present in geometric methods. Following the connection between the coarse basis and DG, we have done some preliminary work on algebraically constructing a DG-like discretization that is independent of [H/h], but with mixed success.

Table 6.2: Iterations of CG with DDG to solve various problems in n variables. P0, i.e. non-smoothed aggregation, is not expected to achieve optimal scaling for ∇4; the P0 column is included for comparison only. Problems marked "×" are too small to use the three-level solver - the coarsest partition would be a single subdomain.

                     Two-Level                            Three-Level
                  H/h=10              H/h=20              H/h=10
 n^(1/d)     P0   P1   P2   P3    P0    P1   P2   P3    P1    P2   P3
∇·∇ 3D
      40     36   20   15   12      35   23   18   15     ×     ×    ×
      80     41   20   16   13      51   28   21   18     ×     ×    ×
     160     44   21   16   13      61   30   23   19    39    29   23
     320     44   21   16   14      63   30   23   19    43    32   26
     640     46   22   16   14      65   31   23   19    48    35   34
∇·S∇
     200     41   19   14   11      51   25   20   15    26    23   19
     400     44   19   14   11      60   26   19   16    40    29   24
     800     48   20   15   12      64   27   20   17    43    31   23
    1600     52   20   15   12      68   28   22   17    45    32   25
    3200     52   21   16   13      71   29   22   17    45    32   25
∇·D∇
     200     43   19   15   12      56   27   22   16    32    27   25
     400     46   20   15   13      64   28   22   19    42    28   24
     800     48   21   16   13      67   31   23   18    43    31   25
    1600     52   21   17   14      69   33   23   20    47    34   26
    3200     58   24   17   13      69   31   24   19    49    35   27
∇2 P3-FEM
     200     47   22   17   14      58   29   21   17    33    24   21
     400     53   22   17   14      68   29   22   19    44    34   25
     800     53   24   17   14      73   31   23   19    47    33   26
    1600     56   23   19   15      75   32   25   19    49    35   28
    3200     67   24   18   15      83   32   24   20    50    36   28
Elasticity
     200     72   29   21   19      83   40   32   25     ×     ×    ×
     400     81   29   24   18      96   41   30   25    47    37   30
     800     82   33   23   20     104   42   31   26    67    48   39
    1600     76   32   24   20     107   44   33   27    69    51   36
    3200     76   33   27   23     108   47   36   29    71    50   40
∇4
     200    698   62   20   12     600  154   44   24   119    40   22
     400   2900   68   21   12    3500  188   53   27   376   112   37
     800   5700   77   25   15    6900  184   55   32   961   156   51
    1600   9500   99   31   15   11000  246   70   30  2100   169   58

Table 6.3: Runtimes for the DDG results in the middle column of Table 6.2 (two-level scheme with H/h=20).
Each entry shows the wall-clock time in seconds and the percent of total runtime spent on coarse grid setup and solution. Problems marked "×" did not converge within 1000 iterations.

 n^(1/d)        P0            P1            P2            P3
∇·∇ 3D
      40      79 ( 0%)      55 ( 0%)      42 ( 0%)      32 ( 1%)
      80     152 ( 1%)      87 ( 1%)      64 ( 2%)      53 ( 3%)
     160     481 ( 2%)     238 ( 4%)     186 ( 6%)     160 (11%)
     320    1207 ( 3%)     614 ( 8%)     525 (15%)     497 (27%)
     640    7053 ( 5%)    3785 (16%)    3140 (25%)    3563 (43%)
∇·S∇
     200      12 ( 2%)       5 ( 2%)       4 ( 3%)       3 ( 7%)
     400      48 ( 1%)      22 ( 2%)      17 ( 4%)      15 ( 8%)
     800     246 ( 1%)     108 ( 2%)      87 ( 4%)      72 ( 9%)
    1600    1283 ( 1%)     572 ( 2%)     423 ( 4%)     358 ( 9%)
    3200    6491 ( 0%)    2767 ( 2%)    2107 ( 4%)    1711 ( 9%)
∇·D∇
     200      11 ( 1%)       5 ( 2%)       4 ( 5%)       4 (11%)
     400      51 ( 1%)      22 ( 2%)      18 ( 5%)      16 (11%)
     800     227 ( 1%)     100 ( 2%)      84 ( 6%)      67 (13%)
    1600    1051 ( 1%)     496 ( 2%)     362 ( 6%)     320 (12%)
    3200    7170 ( 0%)    3006 ( 2%)    2254 ( 4%)    1896 (10%)
∇2 P3-FEM
     200      18 ( 1%)       9 ( 1%)       6 ( 4%)       6 ( 5%)
     400      92 ( 0%)      43 ( 1%)      32 ( 3%)      27 ( 6%)
     800     440 ( 0%)     190 ( 1%)     149 ( 3%)     121 ( 7%)
    1600    1902 ( 0%)     853 ( 1%)     635 ( 4%)     498 ( 8%)
    3200   10223 ( 0%)    4123 ( 1%)    3183 ( 3%)    2706 ( 7%)
Elasticity
     200      30 ( 0%)      15 ( 1%)      11 ( 3%)       9 ( 6%)
     400     147 ( 0%)      60 ( 1%)      46 ( 3%)      42 ( 7%)
     800     644 ( 0%)     263 ( 1%)     207 ( 3%)     172 ( 7%)
    1600    2778 ( 0%)    1160 ( 1%)     883 ( 4%)     727 ( 8%)
    3200   14958 ( 0%)    6202 ( 1%)    5017 ( 3%)    4625 ( 6%)
∇4
     200     198 ( 0%)      57 ( 1%)      17 ( 1%)      11 ( 3%)
     400         ×         321 ( 0%)      86 ( 1%)      45 ( 4%)
     800         ×        1565 ( 0%)     484 ( 2%)     225 ( 5%)
    1600         ×        7900 ( 1%)    2171 ( 2%)    1012 ( 5%)

Table 6.4: Normalized runtimes for the DDG results in the middle column of Table 6.2 (two-level scheme with H/h=20). Each entry shows the wall-clock time (as in Table 6.3), normalized by problem size. The exact number shown is 1000000 · runtime/n. The percent of total runtime spent on coarse grid setup and solution is shown in parentheses.
Problems marked "×" did not converge within 1000 iterations.

 n^(1/d)        P0            P1            P2            P3
∇·∇ 3D
      40    1234 ( 0%)     860 ( 0%)     656 ( 0%)     500 ( 1%)
      80     297 ( 1%)     170 ( 1%)     125 ( 2%)     104 ( 3%)
     160     117 ( 2%)      60 ( 4%)      45 ( 6%)      40 (11%)
     320      37 ( 3%)      19 ( 8%)      16 (15%)      15 (27%)
     640      27 ( 5%)      15 (16%)      12 (25%)      14 (43%)
∇·S∇
     200     292 ( 2%)     131 ( 2%)     102 ( 3%)      84 ( 7%)
     400     297 ( 1%)     138 ( 2%)     106 ( 4%)      93 ( 8%)
     800     384 ( 1%)     169 ( 2%)     136 ( 4%)     113 ( 9%)
    1600     501 ( 1%)     223 ( 2%)     165 ( 4%)     140 ( 9%)
    3200     634 ( 0%)     270 ( 2%)     206 ( 4%)     167 ( 9%)
∇·D∇
     200     272 ( 1%)     132 ( 2%)     108 ( 5%)      91 (11%)
     400     320 ( 1%)     138 ( 2%)     112 ( 5%)      97 (11%)
     800     354 ( 1%)     157 ( 2%)     131 ( 6%)     105 (13%)
    1600     411 ( 1%)     194 ( 2%)     142 ( 6%)     125 (12%)
    3200     700 ( 0%)     294 ( 2%)     220 ( 4%)     185 (10%)
∇2 P3-FEM
     200     426 ( 1%)     213 ( 1%)     156 ( 4%)     143 ( 5%)
     400     570 ( 0%)     270 ( 1%)     197 ( 3%)     165 ( 6%)
     800     682 ( 0%)     294 ( 1%)     231 ( 3%)     188 ( 7%)
    1600     742 ( 0%)     333 ( 1%)     248 ( 4%)     194 ( 8%)
    3200     996 ( 0%)     402 ( 1%)     310 ( 3%)     264 ( 7%)
Elasticity
     200     750 ( 0%)     374 ( 1%)     272 ( 3%)     239 ( 6%)
     400     916 ( 0%)     373 ( 1%)     286 ( 3%)     262 ( 7%)
     800    1008 ( 0%)     411 ( 1%)     324 ( 3%)     269 ( 7%)
    1600    1110 ( 0%)     464 ( 1%)     353 ( 4%)     290 ( 8%)
    3200    1466 ( 0%)     608 ( 1%)     492 ( 3%)     453 ( 6%)
∇4
     200    4941 ( 0%)    1437 ( 1%)     427 ( 1%)     273 ( 3%)
     400        ×         2008 ( 0%)     535 ( 1%)     283 ( 4%)
     800        ×         2445 ( 0%)     757 ( 2%)     351 ( 5%)
    1600        ×         3086 ( 1%)     848 ( 2%)     395 ( 5%)

Chapter 7

Conclusion

Fluid simulation in computer graphics depends on many different mathematical tools. In this thesis, I developed new techniques for several important components of a modern liquid solver. The research chapters each include their own conclusion, discussing limitations and potential future work. Here, I extend that discussion to include the impact of the projects and other new research, and discuss the future of the general approaches used in the thesis projects.

Detailed Water with DG. This project was presented at SIGGRAPH 2014. People seemed generally excited about the results, but confused by the unfamiliar and intricate use of DG.
As this was just presented last year, it is unclear what sort of impact it will have.

With this project, we set out to solve the resolution-mismatch problem by embedding high-resolution meshes in low-resolution grids. We did not completely eliminate these problems, and still had to include some non-physical surface regularization. In hindsight, using an approximation space that is not adapted to the surface shape may never be strong enough to completely solve the problem in an efficient manner. Some new work by Ando et al. [8] has started to investigate surface-adapted bases. While this project was in development, the need to completely solve this embedding problem in a physical way was slightly reduced by the publication of a more physically-realistic regularization approach based on vortex sheets [20]. These approaches are completely complementary.

The power and flexibility of p-adaptivity went beyond my expectations and are applicable to far more than just surface resolution matching. There is a lot of room for future work based on this project, most obviously extending this to hp-adaptivity, which promises even more performance and flexibility.

High-Order PIC. This project achieved the goal of addressing some specific shortcomings of the high-order accurate approach by Edwards and Bridson [67]. However, it is difficult to assess its overall value, and it seems unlikely that the resulting scheme will ever be used as-is.

I hope this work will inspire and inform future projects using more accurate formulations of PIC, FLIP, or MPM. In the engineering field, where MPM is most used, that seems unlikely to happen in the near future, because the scheme is so heavily modified relative to traditional MPM. The loss of discrete conservation properties could be the strongest impediment to adoption.

In the field of computer graphics, our approach is not so foreign. It is similar to the version of FLIP already used for computer graphics.
Furthermore, conservation laws are not given as much importance by computer graphics practitioners. The MPM methods published in graphics venues so far [141, 142] have used the engineering/conservation form, but I wouldn't be surprised to see the adoption of some ideas from Chapter 4. As a whole, PIC schemes are treated very flexibly in computer graphics, with lots of extensions and modifications from the standard engineering approach.

Continuous Collision Detection. The CCD paper (Chapter 5) seems to have brought the problem of inexact CCD algorithms to people's attention. Since its publication, Wang [161] has shown how to prevent false negatives while minimizing false positives in the standard (constructive) CCD approach, by carefully tracking floating-point errors and introducing several other predicates to use when the error bound shows the point-triangle test to be inaccurate. Tang et al. [150] derive an alternative exact algorithm, similar to the traditional approach, but with an exact reformulation. Their approach runs faster than that proposed in Chapter 5, and is conceptually more familiar to users of the standard approach. So far, there have been no follow-up approaches using the root-parity reformulation, but given its flexibility it would not surprise me if it sees future use as well.

Discretely Discontinuous Galerkin Coarse Grid. This project presents an algebraic coarse grid that works for a wide variety of problems, including high-order PDEs. The approach is closely related to existing aggregation approaches, but has significantly more flexibility due to its use of high-order convergence to combat other sources of error.

While it has strictly better asymptotic error bounds than basic aggregation approaches, this project did not entirely accomplish the goals I wanted to achieve. The convergence rate of the algorithm in Chapter 6 has a dependency on [H/h], while geometric DD has a convergence rate independent of the fine resolution h and coarse resolution H.
The dependency on [H/h] is common to many algebraic DD schemes, including both aggregation and non-aggregation schemes (e.g. BDD [111] and FETI [78]). Nonetheless, I hoped to avoid this undesirable property, and I'm optimistic that similar strategies can achieve this goal.

This approach occupies an interesting niche in the design space between geometric methods and 'fully' algebraic methods, which work only with the input linear system. Whether or not it will be used as-is, I hope it can serve as a stepping-stone towards even better techniques.

High-Order Methods. A common trend through many of the projects in this thesis is the use of high-order accurate techniques.

High-order methods have a number of advantages over low-order techniques. The most obvious is the high-order convergence rate from which they get their name. This often leads to increased accuracy for a fixed discretization resolution, which hopefully translates into reduced time and/or memory to achieve the same quality of result. Furthermore, high-order methods often do more computation per variable than low-order schemes, which works especially well on modern computer architectures whose memory access times are far slower than floating-point operations [4, 45].

The character of the errors made by high-order methods can also be different from that of low-order methods. For example, in fluid problems, numerical diffusion/viscosity is a common numerical error in low-order schemes. A high-order scheme may completely eliminate the usual diffusion term, resulting only in a 'hyper-diffusion' related to higher-order derivatives. In fact, many high-order methods achieve exact results in the case that the solution is a low-order polynomial. This may not sound interesting at first glance, but it includes many common scenarios, for example: parabolic ballistic trajectories, linear pressure in hydrostatic fluids, linear velocity in rigidly-moving bodies.
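The ballistic case is easy to check numerically. The sketch below (illustrative, not code from the thesis) compares forward Euler with the second-order explicit midpoint method under constant gravity; since the trajectory is a degree-2 polynomial in time, the midpoint method lands exactly on the parabola while Euler accumulates an O(dt) position error:

```python
g = -9.8  # constant gravitational acceleration

def euler_step(x, v, dt):
    # first-order: position update uses the start-of-step velocity
    return x + dt * v, v + dt * g

def midpoint_step(x, v, dt):
    # second-order: position update uses the mid-step velocity
    return x + dt * (v + 0.5 * dt * g), v + dt * g

def integrate(step, x, v, dt, steps):
    for _ in range(steps):
        x, v = step(x, v, dt)
    return x

x0, v0, dt, steps = 0.0, 10.0, 0.1, 10
t = dt * steps
exact = x0 + v0 * t + 0.5 * g * t * t   # closed-form parabolic trajectory

x_mid = integrate(midpoint_step, x0, v0, dt, steps)
x_eul = integrate(euler_step, x0, v0, dt, steps)
# x_mid matches `exact` to round-off; x_eul differs by 0.5*|g|*dt*t
```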
Solving these base-cases incorrectly can sometimes result in obvious visual artifacts.

In exchange for these benefits, high-order techniques have several disadvantages. They are often not as stable, or the stability analysis is much more difficult. The algorithms are a lot more complicated and difficult to implement. At very low resolution, high-order methods are a lot more computationally expensive and may produce low-accuracy results. Finally, high-order schemes are not as flexible. It is difficult to write a modular system in which components can be added and removed without disrupting the high-order convergence: the weakest link in the chain ruins the global convergence rate.

High-order methods are far from ubiquitous in computer graphics, but they are making in-roads in some sub-problems. Second-order accurate boundary conditions in fluid problems have become common (e.g. [7, 18, 74]), because they show drastically improved results (reduced artifacts) relative to first-order techniques. In solving the advection problem, higher-order schemes (e.g. BFECC [64] and MacCormack [130]) have also gained a lot of traction, as have higher-order interpolation schemes (e.g. cubic [79] and CIP [103]) during advection. These techniques significantly reduce the numerical diffusion during advection, relative to the simplest first-order semi-Lagrangian technique. The use of high-order integration techniques for ODEs (e.g. Runge-Kutta methods) and spatial integration (e.g. Gaussian quadrature) is common as well.

The future of high-order methods in computer graphics remains to be seen. Given their potential advantages, and the trends in hardware architectures, it seems plausible that they will see increasing adoption.

Bibliography

[1] CGAL, Computational Geometry Algorithms Library. http://www.cgal.org. → pages 121

[2] O. Aberth. Precise Numerical Methods Using C++. Academic Press, 1998. ISBN 0120417502. → pages 64

[3] B. Adams, M. Pauly, R. Keiser, and L. J. Guibas.
Adaptively sampled particle fluids. ACM Transactions on Graphics, 26(3), July 2007. ISSN 0730-0301. doi:10.1145/1276377.1276437. → pages 16

[4] F. Alted. Why modern CPUs are starving and what can be done about it. Computing in Science & Engineering, 12(2):68–71, 2010. → pages 96

[5] S. Andersen and L. Andersen. Analysis of spatial interpolation in the material-point method. Computers & Structures, 88(7):506–518, 2010. doi:10.1016/j.compstruc.2010.01.004. → pages 9

[6] R. Ando, N. Thürey, and R. Tsuruno. Preserving fluid sheets with adaptively sampled anisotropic particles. Visualization and Computer Graphics, IEEE Transactions on, 18(8):1202–1214, 2012. ISSN 1077-2626. doi:10.1109/TVCG.2012.87. → pages 10, 16

[7] R. Ando, N. Thürey, and C. Wojtan. Highly adaptive liquid simulations on tetrahedral meshes. ACM Transactions on Graphics, July 2013. → pages 16, 17, 97

[8] R. Ando, N. Thürey, and C. Wojtan. A dimension-reduced pressure solver for liquid simulations. EUROGRAPHICS 2015, 2015. → pages 94

[9] D. Arnold, F. Brezzi, B. Cockburn, and L. Marini. Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM Journal on Numerical Analysis, 39(5):1749–1779, 2002. doi:10.1137/S0036142901384162. → pages 10, 18, 68, 89

[10] U. M. Ascher and L. R. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, volume 61. SIAM, 1998. → pages 46

[11] I. Babuška and B. Guo. The h, p and hp version of the finite element method: basic theory and applications. Technical report, 1992. → pages 17

[12] I. Babuška and M. Zlámal. Nonconforming elements in the finite element method with penalty. SIAM Journal on Numerical Analysis, 10:863–875, 1973. doi:10.1137/0710071. → pages 89

[13] S. Bardenhagen and E. Kober. The generalized interpolation material point method. Computer Modeling in Engineering and Sciences, 5(6):477–496, 2004. doi:10.3970/cmes.2004.005.477. → pages 8

[14] A. W. Bargteil, C. Wojtan, J. K. Hodgins, and G. Turk.
A finite element method for animating large viscoplastic flow. ACM Transactions on Graphics, 26(3), 2007. → pages 131

[15] F. Bassi, S. Rebay, G. Mariotti, S. Pedinotti, and M. Savini. A high-order accurate discontinuous finite element method for inviscid and viscous turbomachinery flows. In Proceedings of 2nd European Conference on Turbomachinery, Fluid Dynamics and Thermodynamics, pages 99–108, 1997. → pages 11

[16] F. Bassi, L. Botti, A. Colombo, D. A. D. Pietro, and P. Tesini. On the flexibility of agglomeration based physical space discontinuous Galerkin discretizations. Journal of Computational Physics, 231(1):45–65, 2012. ISSN 0021-9991. doi:10.1016/j.jcp.2011.08.018. → pages 70

[17] C. Batty, S. Xenos, and B. Houston. Tetrahedral embedded boundary methods for accurate and flexible adaptive fluids. In Proceedings of Eurographics, 2010. → pages 17

[18] C. Batty, S. Xenos, and B. Houston. Tetrahedral embedded boundary methods for accurate and flexible adaptive fluids. In Computer Graphics Forum, volume 29, pages 695–704. Wiley Online Library, 2010. doi:10.1111/j.1467-8659.2009.01639.x. → pages 97

[19] R. Blaheta. A multilevel method with overcorrection by aggregation for solving discrete elliptic problems. Journal of Computational and Applied Mathematics, 24(1-2):227–239, Nov. 1988. ISSN 0377-0427. doi:10.1016/0377-0427(88)90355-X. → pages 69, 89

[20] M. Bojsen-Hansen and C. Wojtan. Liquid surface tracking with error compensation. ACM Transactions on Graphics, 32(4):79:1–79:10, 2013. → pages 13, 17, 94

[21] M. Borrel and J. Ryan. The elastoplast discontinuous Galerkin (EDG) method for the Navier–Stokes equations. Journal of Computational Physics, 231(1):1–22, 2012. doi:10.1016/j.jcp.2011.08.005. → pages 11

[22] J. P. Boyd. Chebyshev and Fourier Spectral Methods. Courier Dover Publications, 2001. → pages 17

[23] L. Boyd and R. Bridson. MultiFLIP for energetic two-phase fluid simulation. ACM Transactions on Graphics, 31(2):16, 2012. doi:10.1145/2159516.2159522. → pages 10

[24] J.
Brackbill and H. Ruppel. FLIP: A method for adaptively zoned, particle-in-cell calculations of fluid flows in two dimensions. Journal of Computational Physics, 65(2):314–343, 1986. doi:10.1016/0021-9991(86)90211-1. → pages 7

[25] J. Brackbill, D. Kothe, and H. Ruppel. FLIP: a low-dissipation, particle-in-cell method for fluid flow. Computer Physics Communications, 48(1):25–38, 1988. doi:10.1016/0010-4655(88)90020-3. → pages 7

[26] D. Braess. Towards algebraic multigrid for elliptic problems of second order. Computing, 55(4):379–393, Oct. 1995. ISSN 0010-485X. doi:10.1007/BF02238488. → pages 69, 89

[27] S. C. Brenner and R. Scott. The Mathematical Theory of Finite Element Methods, third edition, volume 15. Springer, 2008. → pages 76, 77

[28] M. Brezina, A. J. Cleary, R. D. Falgout, V. E. Henson, J. E. Jones, T. A. Manteuffel, S. F. McCormick, and J. W. Ruge. Algebraic multigrid based on element interpolation (AMGe). SIAM Journal on Scientific Computing, 22:1570–1592, 2001. doi:10.1137/S1064827598344303. → pages 68

[29] F. Brezzi, G. Manzini, D. Marini, P. Pietra, and A. Russo. Discontinuous finite elements for diffusion problems. Atti Convegno in onore di F. Brioschi (Milano 1997), Istituto Lombardo, Accademia di Scienze e Lettere, pages 197–217, 1999. → pages 11

[30] R. Bridson. Fluid Simulation for Computer Graphics. A K Peters/CRC Press, Sept. 2008. ISBN 1568813260. → pages 16

[31] R. Bridson, R. Fedkiw, and J. Anderson. Robust treatment of collisions, contact and friction for cloth animation. ACM Transactions on Graphics, 21(3):594–603, 2002. → pages 118, 129, 130

[32] W. L. Briggs, S. F. McCormick, et al. A Multigrid Tutorial, volume 72. SIAM, 2000. → pages 67

[33] T. Brochu and R. Bridson. Robust topological operations for dynamic explicit surfaces. SIAM Journal on Scientific Computing, 31(4):2472–2493, 2009. doi:10.1137/080737617. → pages 13, 16, 33, 119, 132

[34] T. Brochu and R. Bridson. Numerically robust continuous collision detection for dynamic explicit surfaces.
Technical Report TR-2009-03, University of British Columbia, 2009. → pages 119, 131

[35] T. Brochu, C. Batty, and R. Bridson. Matching fluid simulation elements to surface geometry and topology. ACM Transactions on Graphics, 29(4):1–9, 2010. ISSN 0730-0301. doi:10.1145/1778765.1778784. → pages 13, 17, 34

[36] T. Brochu, E. Edwards, and R. Bridson. Efficient geometrically exact continuous collision detection. ACM Transactions on Graphics, 31(4):96:1–96:7, July 2012. ISSN 0730-0301. doi:10.1145/2185520.2185592. → pages iii, 55, 57, 116

[37] H. Brönnimann, C. Burnikel, and S. Pion. Interval arithmetic yields efficient dynamic filters for computational geometry. Discrete Applied Mathematics, 109:25–47, 2001. → pages 128

[38] D. L. Brown, R. Cortez, and M. L. Minion. Accurate projection methods for the incompressible Navier–Stokes equations. Journal of Computational Physics, 168(2):464–499, 2001. ISSN 0021-9991. doi:10.1006/jcph.2001.6715. → pages 50

[39] D. Burgess, D. Sulsky, and J. Brackbill. Mass matrix formulation of the FLIP particle-in-cell method. Journal of Computational Physics, 103(1):1–15, 1992. doi:10.1016/0021-9991(92)90323-Q. → pages 7

[40] J. Burghardt, R. Brannon, and J. Guilkey. A nonlocal plasticity formulation for the material point method. Computer Methods in Applied Mechanics and Engineering, 225–228:55–64, 2012. doi:10.1016/j.cma.2012.03.007. → pages 8

[41] R. Bustinza. A unified analysis of the local discontinuous Galerkin method for a class of nonlinear problems. Applied Numerical Mathematics, 56(10):1293–1306, 2006. doi:10.1016/j.apnum.2006.03.015. → pages 11

[42] R. Bustinza and G. Gatica. A mixed local discontinuous Galerkin method for a class of nonlinear problems in fluid mechanics. Journal of Computational Physics, 207(2):427–456, 2005. doi:10.1016/j.jcp.2005.01.017. → pages 11

[43] X.-C. Cai and M. Sarkis. A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM Journal on Scientific Computing, 21:792–797, 1999.
doi:10.1137/S106482759732678X. → pages 70

[44] J. Carrero, B. Cockburn, and D. Schötzau. Hybridized globally divergence-free LDG methods. Part I: The Stokes problem. Mathematics of Computation, 75(254):533–564, 2006. → pages 11

[45] C. Carvalho. The gap between processor and memory speeds. In Proc. of IEEE International Conference on Control and Automation, 2002. → pages 96

[46] P. Castillo. A review of the local discontinuous Galerkin (LDG) method applied to elliptic problems. Applied Numerical Mathematics, 56(10):1307–1313, 2006. doi:10.1016/j.apnum.2006.03.016. → pages 11, 18

[47] P. Castillo, B. Cockburn, I. Perugia, and D. Schötzau. An a priori error analysis of the local discontinuous Galerkin method for elliptic problems. SIAM Journal on Numerical Analysis, 38(5):1676–1706, 2000. doi:10.1016/j.jcp.2006.07.029. → pages 11, 24

[48] N. Chentanez and M. Müller. Real-time Eulerian water simulation using a restricted tall cell grid. ACM Transactions on Graphics, 30(4):82, 2011. doi:10.1145/1964921.1964977. → pages 17

[49] N. Chentanez, B. E. Feldman, F. Labelle, J. F. O'Brien, and J. R. Shewchuk. Liquid simulation on lattice-based tetrahedral meshes. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '07, pages 219–228, Aire-la-Ville, Switzerland, 2007. Eurographics Association. ISBN 978-1-59593-624-0. → pages 17

[50] B. Cockburn and J. Gopalakrishnan. The derivation of hybridizable discontinuous Galerkin methods for Stokes flow. SIAM Journal on Numerical Analysis, 47(2):1092–1125, 2009. doi:10.1137/080726653. → pages 11

[51] B. Cockburn and C. Shu. The local discontinuous Galerkin method for time-dependent convection-diffusion systems. SIAM Journal on Numerical Analysis, 35(6):2440–2463, 1998. → pages 11

[52] B. Cockburn, G. Kanschat, D. Schötzau, and C. Schwab. Local discontinuous Galerkin methods for the Stokes system. SIAM Journal on Numerical Analysis, 40(1):319–343, 2002. doi:10.1137/S0036142900380121. → pages 11

[53] B.
Cockburn, G. Kanschat, and D. Schötzau. The local discontinuous Galerkin method for the Oseen equations. Mathematics of Computation, 73(246):569–594, 2004. → pages 11

[54] B. Cockburn, G. Kanschat, and D. Schötzau. A locally conservative LDG method for the incompressible Navier–Stokes equations. Mathematics of Computation, 74(251):1067–1096, 2005. → pages 11

[55] B. Cockburn, G. Kanschat, and D. Schötzau. The local discontinuous Galerkin method for linearized incompressible fluid flow: a review. Computers & Fluids, 34(4):491–506, 2005. doi:10.1016/j.compfluid.2003.08.005. → pages 11, 18, 22

[56] B. Cockburn, D. Schötzau, and J. Wang. Discontinuous Galerkin methods for incompressible elastic materials. Computer Methods in Applied Mechanics and Engineering, 195(25):3184–3204, 2006. doi:10.1016/j.cma.2005.07.003. → pages 11

[57] B. Cockburn, G. Kanschat, and D. Schötzau. A note on discontinuous Galerkin divergence-free solutions of the Navier–Stokes equations. Journal of Scientific Computing, 31(1):61–73, 2007. doi:10.1007/s10915-006-9107-7. → pages 11

[58] B. Cockburn, J. Gopalakrishnan, N. Cuong Nguyen, J. Peraire, and F. Sayas. Analysis of HDG methods for Stokes flow. Mathematics of Computation, 80(274):723, 2010. → pages 11

[59] S. Cummins and J. Brackbill. An implicit particle-in-cell method for granular materials. Journal of Computational Physics, 180(2):506–548, 2002. doi:10.1006/jcph.2002.7101. → pages 8

[60] T. J. Dekker. A floating-point technique for extending the available precision. Numerische Mathematik, 18:224–242, 1971. doi:10.1007/BF01397083. → pages 121

[61] E. G. D. do Carmo and A. V. C. Duarte. A discontinuous finite-element based domain decomposition method. Computer Methods in Applied Mechanics and Engineering, 190(8):825–843, November 2000. doi:10.1016/S0045-7825(00)00216-4. → pages 70

[62] J. Douglas and T. Dupont. Interior penalty procedures for elliptic and parabolic Galerkin methods. In R. Glowinski and J.
Lions, editors, Computing Methods in Applied Sciences, volume 58 of Lecture Notes in Physics, pages 207–216. Springer Berlin Heidelberg, 1976. doi:10.1007/BFb0120591. → pages 11

[63] M. Dryja, J. Galvis, and M. Sarkis. BDDC methods for discontinuous Galerkin discretization of elliptic problems. Journal of Complexity, 23:715–739, 2007. doi:10.1016/j.jco.2007.02.003. → pages 70

[64] T. F. Dupont and Y. Liu. Back and forth error compensation and correction methods for removing errors induced by uneven gradients of the level set function. Journal of Computational Physics, 190(1):311–324, 2003. doi:10.1016/S0021-9991(03)00276-6. → pages 97

[65] H. Edelsbrunner and E. P. Mücke. Simulation of simplicity: A technique to cope with degenerate cases in geometric algorithms. ACM Transactions on Graphics, 9:66–104, 1990. → pages 59

[66] E. Edwards. A high-order accurate particle-in-cell method. Master's thesis, University of British Columbia, 2010. → pages iii

[67] E. Edwards and R. Bridson. A high-order accurate particle-in-cell method. International Journal for Numerical Methods in Engineering, 90(9):1073–1088, 2012. doi:10.1002/nme.3356. → pages iii, 9, 16, 42, 47, 51, 54, 95

[68] E. Edwards and R. Bridson. Detailed water with coarse grids: Combining surface meshes and adaptive discontinuous Galerkin. ACM Transactions on Graphics, 33(4):136:1–136:9, July 2014. doi:10.1145/2601097.2601167. → pages iii

[69] E. Edwards and R. Bridson. The discretely-discontinuous Galerkin coarse grid for domain decomposition. arXiv preprint arXiv:1504.00907, 2015. → pages iii

[70] E. Efstathiou and M. Gander. Why restricted additive Schwarz converges faster than additive Schwarz. BIT Numerical Mathematics, 43:945–959, 2003. doi:10.1023/B:BITN.0000014563.33622.1d. → pages 70

[71] J. Eggers. The subtle dynamics of liquid sheets. Journal of Fluid Mechanics, 672:1, 2011. → pages 39

[72] R. E. English, L. Qiu, Y. Yu, and R. Fedkiw. Chimera grids for water simulation.
In Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '13, pages 85–94, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2132-7. doi:10.1145/2485895.2485897. → pages 17

[73] D. Enright, R. Fedkiw, J. Ferziger, and I. Mitchell. A hybrid particle level set method for improved interface capturing. Journal of Computational Physics, 183(1):83–116, 2002. → pages 16

[74] D. Enright, D. Nguyen, F. Gibou, and R. Fedkiw. Using the particle level set method and a second order accurate pressure boundary condition for free surface flows. In ASME/JSME 2003 4th Joint Fluids Summer Engineering Conference, pages 337–342. American Society of Mechanical Engineers, 2003. doi:10.1115/FEDSM2003-45144. → pages 97

[75] C. Ericson. Real-Time Collision Detection (The Morgan Kaufmann Series in Interactive 3-D Technology). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004. ISBN 1558607323. → pages 120

[76] O. Etzmuss, M. Keckeisen, and W. Strasser. A fast finite element solution for cloth modelling. In Proc. 11th Pacific Conference on Computer Graphics and Applications, pages 244–251, Oct. 2003. doi:10.1109/PCCGA.2003.1238266. → pages 131

[77] M. Evans, F. Harlow, and E. Bromberg. The particle-in-cell method for hydrodynamic calculations. Technical report, DTIC Document, 1957. → pages 7

[78] C. Farhat and F.-X. Roux. A method of finite element tearing and interconnecting and its parallel solution algorithm. International Journal for Numerical Methods in Engineering, 32(6):1205–1227, 1991. ISSN 1097-0207. doi:10.1002/nme.1620320604. → pages 96

[79] R. Fedkiw, J. Stam, and H. W. Jensen. Visual simulation of smoke. In E. Fiume, editor, Proceedings of SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pages 15–22. ACM, ACM Press / ACM SIGGRAPH, 2001. doi:10.1145/383259.383260. → pages 97

[80] K. Fidkowski and D. Darmofal.
A triangular cut-cell adaptive method for high-order discretizations of the compressible Navier–Stokes equations. Journal of Computational Physics, 225(2):1653–1672, 2007. doi:10.1016/j.jcp.2007.02.007. → pages 20

[81] N. Foster and D. Metaxas. Realistic animation of liquids. Graphical Models and Image Processing, 58(5):471–483, 1996. → pages 16

[82] F. Goualard. Fast and correct SIMD algorithms for interval arithmetic. In Proceedings of PARA 2008, Lecture Notes in Computer Science, Trondheim, 2010. Springer. 10 pages. → pages 128

[83] C. Greif, D. Li, D. Schötzau, and X. Wei. A mixed finite element method with exactly divergence-free velocities for incompressible magnetohydrodynamics. Computer Methods in Applied Mechanics and Engineering, 199(45):2840–2855, 2010. doi:10.1016/j.cma.2010.05.007. → pages 11

[84] J. Guermond, P. Minev, and J. Shen. An overview of projection methods for incompressible flows. Computer Methods in Applied Mechanics and Engineering, 195(44):6011–6045, 2006. ISSN 0045-7825. doi:10.1016/j.cma.2005.10.010. → pages 50

[85] P. Guigue and O. Devillers. Fast ray-triangle intersection test using orientation predicates. Journal of Graphics Tools, 8(1):addendum, 2003. → pages 125

[86] J. Guilkey and J. Weiss. An implicit time integration strategy for use with the material point method. In Proceedings from the First MIT Conference on Computational Fluid and Solid Mechanics, 2001. → pages 8

[87] J. Guilkey and J. Weiss. Implicit time integration for the material point method: Quantitative and algorithmic comparisons with the finite element method. International Journal for Numerical Methods in Engineering, 57(9):1323–1338, 2003. doi:10.1002/nme.729. → pages 8

[88] F. Harlow. The particle-in-cell computing method for fluid dynamics. Methods in Computational Physics, 3:319–343, 1964. → pages 7

[89] F. Harlow. Numerical methods for fluid dynamics, an annotated bibliography. Technical report, Los Alamos Scientific Lab., N. Mex., 1969. → pages 7

[90] F. Harlow, M. Evans, and D.
Harris Jr. The particle-in-cell method for two-dimensional hydrodynamic problems. Technical report, Los Alamos Scientific Lab., N. Mex., 1956. → pages 7

[91] F. Harlow, D. Dickman, D. Harris Jr, and R. Martin. Two-dimensional hydrodynamic calculations. Technical report, DTIC Document, 1959. → pages 7

[92] F. H. Harlow. Hydrodynamic problems involving large fluid distortions. Journal of the ACM, 4(2):137–142, Apr. 1957. ISSN 0004-5411. doi:10.1145/320868.320871. → pages 7

[93] D. Harmon, E. Vouga, R. Tamstorf, and E. Grinspun. Robust treatment of simultaneous collisions. ACM Transactions on Graphics, 27(3):1–4, 2008. ISSN 0730-0301. → pages 129

[94] G. Irving, E. Guendelman, F. Losasso, and R. Fedkiw. Efficient simulation of large bodies of water by coupling two and three dimensional techniques. ACM Transactions on Graphics, 25(3):805–811, 2006. → pages 17

[95] G. Jacobs and J. Hesthaven. High-order nodal discontinuous Galerkin particle-in-cell method on unstructured grids. Journal of Computational Physics, 214(1):96–121, 2006. doi:10.1016/j.jcp.2005.09.008. → pages 9

[96] G. Jacobs and J. Hesthaven. Implicit–explicit time integration of a high-order particle-in-cell method with hyperbolic divergence cleaning. Computer Physics Communications, 180(10):1760–1767, 2009. doi:10.1016/j.cpc.2009.05.020. → pages 9

[97] G. Kanschat. Preconditioning methods for local discontinuous Galerkin discretizations. SIAM Journal on Scientific Computing, 25(3):815–831, 2003. doi:10.1137/S1064827502410657. → pages 11

[98] G. Kanschat. Multilevel methods for discontinuous Galerkin FEM on locally refined meshes. Computers & Structures, 82(28):2437–2445, 2004. doi:10.1016/j.compstruc.2004.04.015. → pages

[99] G. Kanschat. Block preconditioners for LDG discretizations of linear incompressible flow problems. Journal of Scientific Computing, 22(1):371–384, 2005. doi:10.1007/s10915-004-4144-6. → pages 11

[100] G. Kanschat. Divergence-free discontinuous Galerkin schemes for the Stokes equations and the MAC scheme.
International Journal for Numerical Methods in Fluids, 56(7):941–950, 2008. doi:10.1002/fld.1566. → pages 11
[101] G. E. Karniadakis and S. J. Sherwin. Spectral/hp element methods for CFD. Oxford University Press, 1999. → pages 17
[102] P. Kaufmann, S. Martin, M. Botsch, and M. Gross. Flexible simulation of deformable models using discontinuous Galerkin FEM. Graphical Models, 71(4):153–167, 2009. ISSN 1524-0703. doi:10.1016/j.gmod.2009.02.002. Special Issue of ACM SIGGRAPH / Eurographics Symp. Comp. Anim. 2008. → pages 18, 20
[103] D. Kim, O.-y. Song, and H.-S. Ko. A semi-Lagrangian CIP fluid solver without dimensional splitting. In Computer Graphics Forum, volume 27, pages 467–475. Wiley Online Library, 2008. doi:10.1111/j.1467-8659.2008.01144.x. → pages 97
[104] B. Lambov. Interval arithmetic using SSE-2. In P. Hertling, C. M. Hoffmann, W. Luther, and N. Revol, editors, Reliable Implementation of Real Number Algorithms: Theory and Practice, volume 5045 of Lecture Notes in Computer Science, pages 102–113, 2006. → pages 128
[105] G. Lapenta and J. Brackbill. Control of the number of particles in fluid and MHD particle in cell methods. Computer Physics Communications, 87(1):139–154, 1995. doi:10.1016/0010-4655(94)00180-A. → pages 10
[106] F. Li, J. Pan, and C. Sinka. Modelling adhesive contact between fine particles using material point method. Mechanics of Materials, 43(3):157–167, 2011. doi:10.1016/j.mechmat.2011.02.004. → pages 8
[107] F. Li, J. Pan, and C. Sinka. Modelling brittle impact failure of disc particles using material point method. International Journal of Impact Engineering, 38(7):653–660, 2011. doi:10.1016/j.ijimpeng.2011.02.004. → pages 8
[108] L. Li and V. Volkov. Cloth animation with adaptively refined meshes. In Proc. 28th Australasian Conference on Computer Science - Volume 38, ACSC '05, pages 107–113, Darlinghurst, Australia, 2005. Australian Computer Society, Inc. ISBN 1-920-68220-1. URL http://portal.acm.org/citation.cfm?id=1082161.1082173.
→ pages 131
[109] F. Losasso, F. Gibou, and R. Fedkiw. Simulating water and smoke with an octree data structure. ACM Transactions on Graphics, 23:457–462, 2004. → pages 17
[110] H. Luo, L. Luo, R. Nourgaliev, V. Mousseau, and N. Dinh. A reconstructed discontinuous Galerkin method for the compressible Navier–Stokes equations on arbitrary grids. Journal of Computational Physics, 229(19):6961–6978, 2010. doi:10.1016/j.jcp.2010.05.033. → pages 11
[111] J. Mandel. Balancing domain decomposition. Communications in Numerical Methods in Engineering, 9(3):233–241, 1993. doi:10.1002/cnm.1640090307. → pages 96
[112] C. Mast, P. Mackenzie-Helnwein, P. Arduino, G. Miller, and W. Shin. Mitigating kinematic locking in the material point method. Journal of Computational Physics, 231:5351–5373, 2012. doi:10.1016/j.jcp.2012.04.032. → pages 8, 9
[113] M. Misztal and J. Bærentzen. Topology adaptive interface tracking using the deformable simplicial complex. ACM Transactions on Graphics, 31(3), 2012. doi:10.1145/2167076.2167082. → pages 13, 17
[114] M. Misztal, R. Bridson, K. Erleben, J. Bærentzen, and F. Anton. Optimization-based fluid simulation on unstructured meshes. In Proceedings of the 7th Workshop on Virtual Reality Interaction and Physical Simulation (VRIPHYS 2010), 2010. doi:10.2312/PE/vriphys/vriphys10/011-020. → pages
[115] M. Misztal, K. Erleben, A. Bargteil, J. Fursund, B. Christensen, J. Bærentzen, and R. Bridson. Multiphase flow of immiscible fluids on unstructured moving meshes. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation, pages 97–106. The Eurographics Association, 2012. doi:10.2312/SCA/SCA12/097-106. → pages 13, 17
[116] L. Moresi, F. Dufour, and H.-B. Mühlhaus. A Lagrangian integration point finite element method for large deformation modeling of viscoelastic geomaterials. Journal of Computational Physics, 184(2):476–497, 2003. ISSN 0021-9991. doi:10.1016/S0021-9991(02)00031-1. → pages 16
[117] M. Müller. Fast and robust tracking of fluid surfaces.
In Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '09, pages 237–245, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-610-6. doi:10.1145/1599470.1599501. → pages 12, 17
[118] D. O'Regan, Y. Je Cho, and Y.-Q. Chen. Topological Degree Theory and Applications. Chapman and Hall/CRC, 2006. ISBN 158488648X. → pages 64, 123
[119] S. Osher and R. Fedkiw. Level set methods and dynamic implicit surfaces, volume 153. Springer, 2003. → pages 16
[120] F. Pellegrini and J. Roman. SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In High-Performance Computing and Networking, volume 1067, pages 493–498. Springer, 1996. doi:10.1007/3-540-61142-8_588. → pages 72
[121] J. Peraire and P.-O. Persson. The compact discontinuous Galerkin (CDG) method for elliptic problems. SIAM Journal on Scientific Computing, 30(4):1806–1824, 2008. doi:10.1137/070685518. → pages 11
[122] D. M. Priest. Algorithms for arbitrary precision floating point arithmetic. In Proceedings of the 10th Symposium on Computer Arithmetic, pages 132–145. IEEE Computer Society Press, 1991. → pages 121
[123] X. Provot. Collision and self-collision handling in cloth model dedicated to design garments. In Graphics Interface, volume 97, pages 177–189. Eurographics Association, 1997. → pages 56, 118, 129
[124] R. Qin and L. Krivodonova. A discontinuous Galerkin method for solutions of the Euler equations on Cartesian grids with embedded geometries. Journal of Computational Science, 2012. doi:10.1016/j.jocs.2012.03.008. → pages 20
[125] S. D. Ramsey, K. Potter, and C. Hansen. Ray bilinear patch intersections. Journal of Graphics Tools, 9(3):41–47, 2004. → pages 120
[126] B. Riviere, M. Wheeler, and V. Girault. Improved energy estimates for interior penalty, constrained and discontinuous Galerkin methods for elliptic problems. Part I. Computational Geosciences, 3(3):337–360, 1999. doi:10.1023/A:1011591328604. → pages 11
[127] M. Sala.
Analysis of two-level domain decomposition preconditioners based on aggregation. ESAIM: Mathematical Modelling and Numerical Analysis, 38:765–780, 9 2004. ISSN 1290-3841. doi:10.1051/m2an:2004038. → pages 69, 70, 80
[128] M. Sala, J. N. Shadid, and R. S. Tuminaro. An improved convergence bound for aggregation-based domain decomposition preconditioners. SIAM Journal on Matrix Analysis and Applications, 27(3):744–756, July 2005. ISSN 0895-4798. doi:10.1137/040617455. → pages 69
[129] C. Schwab. p- and hp-finite element methods: Theory and applications in solid and fluid mechanics. Clarendon Press, Oxford, 1998. → pages 17
[130] A. Selle, R. Fedkiw, B. Kim, Y. Liu, and J. Rossignac. An unconditionally stable MacCormack method. Journal of Scientific Computing, 35(2-3):350–371, 2008. doi:10.1007/s10915-007-9166-4. → pages 97
[131] A. Selle, M. Lentine, and R. Fedkiw. A mass spring model for hair simulation. ACM Transactions on Graphics, 27:64:1–64:11, 2008. ISSN 0730-0301. doi:10.1145/1360612.1360663. → pages 118
[132] L. Shao, X. Feng, and Y. He. The local discontinuous Galerkin finite element method for Burger's equation. Mathematical and Computer Modelling, 54(11):2943–2954, 2011. doi:10.1016/j.mcm.2011.07.016. → pages 11
[133] J. R. Shewchuk. Robust adaptive floating-point geometric predicates. In Proceedings of the Twelfth Annual Symposium on Computational Geometry, pages 141–150. Association for Computing Machinery, May 1996. → pages 121
[134] J. R. Shewchuk. What is a good linear element? Interpolation, conditioning, and quality measures. In 11th International Meshing Roundtable, pages 115–126, 2002. → pages 61
[135] D. Silvester and A. Wathen. Fast iterative solution of stabilised Stokes systems part II: using general block preconditioners. SIAM Journal on Numerical Analysis, 31(5):1352–1367, 1994. → pages 50
[136] B. Smith, P. Bjørstad, and W. Gropp. Domain Decomposition - Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.
→ pages 4, 67
[137] B. Solenthaler and M. Gross. Two-scale particle simulation. ACM Transactions on Graphics, 30(4):81:1–81:8, July 2011. ISSN 0730-0301. doi:10.1145/2010324.1964976. → pages 16
[138] J. Stam. Nucleus: Towards a unified dynamics solver for computer graphics. In IEEE CADCG, pages 1–11, 2009. → pages 119
[139] M. Steffen, R. Kirby, and M. Berzins. Analysis and reduction of quadrature errors in the material point method (MPM). International Journal for Numerical Methods in Engineering, 76(6):922–948, 2008. doi:10.1002/nme.2360. → pages 8, 9
[140] M. Steffen, R. Kirby, and M. Berzins. Decoupling and balancing of space and time errors in the material point method (MPM). International Journal for Numerical Methods in Engineering, 82(10):1207–1243, 2010. doi:10.1002/nme.2787. → pages 8
[141] A. Stomakhin, C. Schroeder, L. Chai, J. Teran, and A. Selle. A material point method for snow simulation. ACM Transactions on Graphics, 32(4):102, 2013. doi:10.1145/2461912.2461948. → pages 10, 95
[142] A. Stomakhin, C. Schroeder, C. Jiang, L. Chai, J. Teran, and A. Selle. Augmented MPM for phase-change and varied materials. ACM Transactions on Graphics, 33(4):138, 2014. doi:10.1145/2601097.2601176. → pages 10, 95
[143] K. Stüben. A review of algebraic multigrid. Journal of Computational and Applied Mathematics, 128(12):281–309, 2001. doi:10.1016/S0377-0427(00)00516-1. Numerical Analysis 2000. Vol. VII: Partial Differential Equations. → pages 69
[144] D. Sulsky and A. Kaul. Implicit dynamics in the material-point method. Computer Methods in Applied Mechanics and Engineering, 193(12):1137–1170, 2004. doi:10.1016/j.cma.2003.12.011. → pages 8
[145] D. Sulsky, Z. Chen, and H. Schreyer. A particle method for history-dependent materials. Computer Methods in Applied Mechanics and Engineering, 118(1):179–196, 1994. doi:10.1016/0045-7825(94)90112-0. → pages 8
[146] D. Sulsky, S. Zhou, and H. Schreyer. Application of a particle-in-cell method to solid mechanics.
Computer Physics Communications, 87(1):236–252, 1995. doi:10.1016/0010-4655(94)00170-7. → pages 8
[147] H. Tan and J. Nairn. Hierarchical, adaptive, material point method for dynamic energy release rate calculations. Computer Methods in Applied Mechanics and Engineering, 191(19):2123–2137, 2002. doi:10.1016/S0045-7825(01)00377-2. → pages 10
[148] M. Tang, Y. J. Kim, and D. Manocha. Continuous collision detection for non-rigid contact computations using local advancement. Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 4016–4021, 2010. → pages 119
[149] M. Tang, D. Manocha, and R. Tong. Fast continuous collision detection using deforming non-penetration filters. In Proc. ACM SIGGRAPH Symp. Interactive 3D Graphics and Games, pages 7–13, 2010. ISBN 978-1-60558-939-8. doi:10.1145/1730804.1730806. → pages 120
[150] M. Tang, R. Tong, Z. Wang, and D. Manocha. Fast and exact continuous collision detection with Bernstein sign classification. ACM Transactions on Graphics, 33(6):186:1–186:8, Nov. 2014. ISSN 0730-0301. doi:10.1145/2661229.2661237. → pages 56, 95
[151] V. Taylor and B. Nour-omid. A study of the factorization fill-in for a parallel implementation of the finite element method. International Journal for Numerical Methods in Engineering, 37:3809–3823, 1994. → pages 81
[152] N. Thürey, C. Wojtan, M. Gross, and G. Turk. A multiscale approach to mesh-based surface tension flows. ACM Transactions on Graphics, 29(4):48:1–48:10, 2010. doi:10.1145/1833349.1778785. → pages 12, 17, 35
[153] A. Toselli and O. Widlund. Domain Decomposition Methods - Algorithms and Theory, volume 34 of Springer Series in Computational Mathematics. Springer, 2004. → pages 4, 67
[154] D. Tskhakaya, K. Matyash, R. Schneider, and F. Taccogna. The particle-in-cell method. Contributions to Plasma Physics, 47(8-9):563–594, 2007. doi:10.1002/ctpp.200710072. → pages 9
[155] P. Vaněk, J. Mandel, and M. Brezina.
Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing, 56(3):179–196, 1996. → pages 68, 69, 71, 80
[156] P. Vaněk, M. Brezina, J. Mandel, et al. Convergence of algebraic multigrid based on smoothed aggregation. Numerische Mathematik, 88(3):559–579, 2001. → pages 80
[157] J. Villard and H. Borouchaki. Adaptive meshing for cloth animation. Engineering with Computers, 20:333–341, August 2005. ISSN 0177-0667. doi:10.1007/s00366-005-0302-1. → pages 131
[158] P. Wallstedt and J. Guilkey. Improved velocity projection for the material point method. Computer Modeling in Engineering and Sciences, 19(3):223, 2007. → pages 9
[159] P. Wallstedt and J. Guilkey. An evaluation of explicit time integration schemes for use with the generalized interpolation material point method. Journal of Computational Physics, 227(22):9628–9642, 2008. doi:10.1016/j.jcp.2008.07.019. → pages 8
[160] P. Wallstedt and J. Guilkey. A weighted least squares particle-in-cell method for solid mechanics. International Journal for Numerical Methods in Engineering, 85(13):1687–1704, 2011. doi:10.1002/nme.3041. → pages 9
[161] H. Wang. Defending continuous collision detection against errors. ACM Transactions on Graphics, 33(4):122:1–122:10, July 2014. ISSN 0730-0301. doi:10.1145/2601097.2601114. → pages 56, 95
[162] Y. Wang, H. Beom, M. Sun, and S. Lin. Numerical simulation of explosive welding using the material point method. International Journal of Impact Engineering, 38(1):51–60, 2011. doi:10.1016/j.ijimpeng.2010.08.003. → pages 8
[163] T. Warburton and M. Embree. The role of the penalty in the local discontinuous Galerkin method for Maxwell's eigenvalue problem. Computer Methods in Applied Mechanics and Engineering, 195(25):3205–3223, 2006. doi:10.1016/j.cma.2005.06.011. → pages 11
[164] M. Wicke, D. Ritchie, B. Klingner, S. Burke, J. Shewchuk, and J. O'Brien. Dynamic local remeshing for elastoplastic simulation. ACM Transactions on Graphics, 29(3), 2010. → pages 131
[165] C.
Wojtan and G. Turk. Fast viscoelastic behavior with thin features. ACM Transactions on Graphics, 27(3):1–8, 2008. ISSN 0730-0301. doi:10.1145/1360612.1360646. → pages 12, 17, 131
[166] C. Wojtan, N. Thürey, M. Gross, and G. Turk. Deforming meshes that split and merge. ACM Transactions on Graphics, 28(3):76:1–76:10, 2009. doi:10.1145/1576246.1531382. → pages 12
[167] C. Wojtan, N. Thürey, M. Gross, and G. Turk. Physics-inspired topology changes for thin fluid features. ACM Transactions on Graphics, 29(4):1–8, 2010. ISSN 0730-0301. doi:10.1145/1778765.1778787. → pages 12, 17
[168] C. Wojtan, M. Müller-Fischer, and T. Brochu. Liquid simulation with mesh-based surface tracking. In ACM SIGGRAPH 2011 Courses, SIGGRAPH '11, pages 8:1–8:84, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0967-7. doi:10.1145/2037636.2037644. → pages 16
[169] J. Yan and S. Osher. A local discontinuous Galerkin method for directly solving Hamilton–Jacobi equations. Journal of Computational Physics, 230(1):232–244, 2011. doi:10.1016/j.jcp.2010.09.022. → pages 11
[170] C. Yap. Robust geometric computation. In J. E. Goodman and J. O'Rourke, editors, CRC Handbook of Computational and Discrete Geometry, chapter 41. CRC Press LLC, 2 edition, 2004. → pages 116, 121
[171] L. Yuan and C. Shu. Discontinuous Galerkin method based on non-polynomial approximation spaces. Journal of Computational Physics, 218(1):295–323, 2006. doi:10.1016/j.jcp.2006.02.013. → pages 11
[172] D. Zhang, X. Ma, and P. Giguere. Material point method enhanced by modified gradient of shape function. Journal of Computational Physics, 230(16):6379–6398, 2011. doi:10.1016/j.jcp.2011.04.032. → pages 9
[173] X. Zhang, S. Redon, M. Lee, and Y. J. Kim. Continuous collision detection for articulated models using Taylor models and temporal culling. ACM Transactions on Graphics, 26(3):15, 2007. → pages 119
[174] Z. Zheng and L. Petzold. Runge–Kutta–Chebyshev projection method. Journal of Computational Physics, 219(2):976–991, 2006. → pages 50
[175] Y.
Zhu and R. Bridson. Animating sand as a fluid. ACM Transactions on Graphics, 24(3):965–972, 2005. doi:10.1145/1186822.1073298. → pages 9, 16, 28

Appendix A

Efficient Geometrically Exact Continuous Collision Detection

This appendix is a copy of Brochu et al. [36], with minor reformatting for inclusion here. This is the full paper for the approach outlined in Chapter 5. The content of Chapter 5 is the proof of correctness referred to in this paper.

A.1 Introduction

We consider continuous collision detection (CCD) as the process of detecting if a mesh moving between initial and final configurations comes into contact with itself at any point in time. CCD is a critical element of many algorithms for physical simulation, when a non-intersecting mesh invariant must be maintained, and for path planning, when the feasibility of a path must be guaranteed. Beyond efficiency, a good CCD algorithm should therefore be safe: no false negatives, i.e. missed collisions, can be tolerated. It should also be accurate in the sense of minimizing the number of false positives, i.e. non-collisions being flagged as collisions, for the effectiveness and efficiency of algorithms using CCD. This paper provides (1) the first CCD algorithm to guarantee safety and accuracy despite using rounded floating-point arithmetic, under the paradigm of Exact Geometric Computation described by Yap [170]: we compute the correct Boolean answer (collision or not) as if exact arithmetic were used. As part of our CCD algorithm we also present (2) a new, efficient, geometrically exact ray vs. bilinear patch intersection parity test, which can be used to precisely determine if a point is inside a quad-mesh-bounded volume or not.

Figure A.1: Four layers of cloth folding over a spinning ball, with on-the-fly adaptive remeshing driven by curvature, provide an example where geometrically exact continuous collision detection has advantages over previous techniques.
We also introduce (3) a new adaptive cloth simulation method which maintains intersection-free meshes despite remeshing, as an example of the practical advantage of geometrically exact CCD, and show there is no performance penalty for using geometrically exact CCD.

We restrict our attention to triangle meshes in 3D, with an intersection-free initial configuration, so CCD can be reduced to two primitive tests: does a moving point hit a moving triangle, or does a moving edge hit another moving edge? Note that we ignore the connectedness of the mesh: multiple meshes are treated as lumped together, so there is no difference between inter-object collision and self-collision. We further assume that vertices move with constant velocity during the time step and that triangles are linearly interpolated between their vertices at intermediate times. If a collision does happen, we do not require its precise time and location: simple approximations detailed below are quite adequate for collision resolution.

A.2 Related Work

A.2.1 The Cubic Solver Approach

The most popular current method for continuous collision detection for triangle meshes was introduced by Provot [123]. First a cubic equation is solved to determine coplanarity times, then the interpolated geometry is checked for overlap at these times to determine if a collision actually occurs. Bridson et al. [31] significantly reduced the number of false negatives due to floating-point error by introducing error tolerances in the root-finding algorithm used to solve the cubic equation and using a static distance query at the coplanarity times: collisions are reported if the mesh elements are within some small distance of each other. However, the minimum error tolerances required for safe CCD are difficult to predict in advance.
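The coplanarity condition at the heart of this approach, det[x1(t)−x0(t), x2(t)−x0(t), x3(t)−x0(t)] = 0, expands to a cubic in t under the constant-velocity model. A small pure-Python sketch of assembling its coefficients (illustrative only, with manufactured vertex data; this is not the tolerance-hardened implementation discussed here):

```python
# Sketch of the coplanarity cubic behind the cubic-solver approach
# (illustrative, not the implementation discussed in this appendix).
# Vertices move with constant velocity: xi(t) = (1-t)*xi + t*xi_hat.

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(a[k]*b[k] for k in range(3))

def coplanarity_cubic(x, x_hat):
    """Coefficients (k0, k1, k2, k3) of k0 + k1*t + k2*t^2 + k3*t^3,
    the determinant det[x1(t)-x0(t), x2(t)-x0(t), x3(t)-x0(t)].
    x, x_hat: four vertex positions at t=0 and t=1 (vertex 0 first)."""
    # Each edge vector is affine in t: e_i(t) = p_i + t*v_i.
    p = [sub(x[i], x[0]) for i in (1, 2, 3)]
    v = [sub(sub(x_hat[i], x_hat[0]), p[i - 1]) for i in (1, 2, 3)]
    # (p2 + t*v2) x (p3 + t*v3): a quadratic vector polynomial in t.
    c0 = cross(p[1], p[2])
    c1 = [a + b for a, b in zip(cross(p[1], v[2]), cross(v[1], p[2]))]
    c2 = cross(v[1], v[2])
    # Dot with (p1 + t*v1) and collect powers of t.
    return (dot(p[0], c0),
            dot(p[0], c1) + dot(v[0], c0),
            dot(p[0], c2) + dot(v[0], c1),
            dot(v[0], c2))
```

For instance, a point moving from (0.2, 0.2, −1) to (0.2, 0.2, 1) through a static unit triangle in the z = 0 plane yields the cubic 1 − 2t, with its coplanarity time at t = 0.5. The cancellation hazard discussed next lives in exactly these coefficient computations.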
Especially in cases where the primitives remain nearly coplanar for the entire step, such as hair segments [131] sliding against each other on skin, cancellation error in simply computing the coefficients of the cubic can eliminate almost all precision in the rest of the calculation. (Of course, in constantly coplanar cases, the method breaks down entirely.) Even if the cubic is represented exactly, its roots are in general irrational and must be rounded to floating-point numbers. Completing the error analysis with further bounds on the construction of the intermediate geometry at the rounded coplanarity time, bounds on the calculated barycentric coordinates of closest points, and then bounds on the distance appears intractable. In practice, a usable tolerance can typically be found by trial and error for a large class of similar simulations (e.g. cloth animations), but different applications such as adaptive cloth, hair, or liquid surface tracking can require enervating per-simulation adjustment, which makes writing a general purpose library especially tricky.

The cubic approach naturally gives false positives if the tolerance is high enough to work. If the tolerance is too high (a definite possibility if restricted to single precision arithmetic, for example) this can seriously slow down or even completely stymie collision resolution: tuning the tolerance for a new simulation isn't always easy.

A fully symbolic implementation could in principle resolve the above problems, apart from the degenerate constantly coplanar case, but the computational overhead would be drastic. In this paper we show a different approach to CCD can be fully safe and accurate without need for tuning, yet run just as fast.

A.2.2 Other Work in Continuous Collision Detection

Stam [138] extended the cubic solver approach to explicitly test if two mesh elements approach closer than a given distance during the time step, resulting in a sixth degree polynomial to solve for potential collision times.
This helps to resolve the coplanar-motion degeneracy mentioned above, but poses an even less tractable rounding error analysis problem for safe CCD, suffers from the same false-positive issues, and is a heavier burden computationally.

Alternative methods for computing the time of possible collisions, such as conservative local advancement [148], offer potential speed-ups over the cubic solver approach, but don't robustly deal with rounding error, relying on user-set tolerances to account for slight non-planarities in intersection/proximity testing.

The constant vertex velocity model underlying this paper and the cubic solver approach is perhaps the most natural for general deformable motions. However, for rigid bodies, constant linear and angular velocity of the entire model makes more sense — though the helical trajectories of vertices are somewhat more difficult to handle. Zhang et al. [173] demonstrate significant acceleration of conservative advancement using the Taylor model generalization of interval arithmetic, but again rely on user-set tolerances to cope with the inexact solve and rounding error.

Brochu & Bridson [33, 34] suggest using a simplicial space-time mesh to model the motion of the mesh, reducing CCD to simplex intersection tests in four dimensions. While these tests could be computed exactly with known determinant-based predicates, this approximation leads to an unintuitive model for the mesh geometry at intermediate times: mesh edges develop kinks and triangles develop folds; normals do not vary continuously over time. This precludes the use of CCD culling techniques which assume the geometry is linearly interpolated at intermediate times [149]. More importantly, the unusual model of motion causes unintuitive and undesired collisions.
For example, two close but parallel triangles can move together with no collisions in the standard model, but their non-standard model can crease the triangles in an inconsistent way, causing a hard-to-resolve collision.

Raytracing can be seen as a special case of CCD, generally easier since the geometry is static relative to the "motion" of the light ray. We highlight Ramsey et al.'s ray-bilinear patch test [125] as particularly relevant, as our 3D CCD test in fact relies on ray-bilinear patch intersection parity tests. However, our new approach is geometrically exact and fully robust, unlike Ramsey et al.'s constructive approach which is vulnerable to rounding error; on the other hand our test only provides the parity of the number of intersections, not their location.

The related problem of culling collision tests is very well studied in computer graphics. Several approaches using bounding volume hierarchies have been proposed, as well as culling using bounding boxes with regular grids, sweep-and-prune testing, and sweeping-plane testing: see Ericson's book for example [75]. We observe that except for axis-aligned methods which only use comparisons (no arithmetic), the culling literature generally does not worry about verifiably handling rounding error. We briefly address this issue later, but emphasize our focus is the correctness of the core element vs. element test, not the efficiency of broader culling methods.

A.2.3 Exact Geometric Computation

The cubic solver approach is an example of a constructive geometric algorithm, in that intermediate geometric quantities are computed (such as planarity times and interpolated positions) and used in a sequence of calculations.
In this, as in many other geometric tests, the necessary rounding analysis to get a provably correct algorithm (accounting for the errors in all intermediate quantities) is intractable, while the symbolic or exact arithmetic version would be too slow (necessitating radicals in this case).

An alternative approach is to decompose a geometric test into a set of simpler predicates, providing discrete answers such as "does a point lie to the left, to the right, or on a line?" rather than continuous values. Approximate continuous values may be computed alongside, of course, but the discrete correctness of the algorithm as a whole relies only on the correctness of the discrete answers from the predicates. Several approaches to defining and implementing correct predicates exist; the most successful is the paradigm of Exact Geometric Computation (EGC). We recommend Yap's article as an excellent review of the topic [170], and the CGAL project for examples of applications and ongoing research [1]. In brief, a geometrically exact predicate must return the same discrete answer as if computed with exact arithmetic (from the floating point input) even if under the hood it takes a faster approach. Our method is the first geometrically exact CCD test for general CCD, but exact predicates for other problems have long been used in graphics and elsewhere.

Building on previous work by Dekker [60] and Priest [122], Shewchuk presented practical systems for exactly evaluating a number of geometric predicates needed for Delaunay mesh generation [133]. These sign-of-determinant predicates are equally useful for detecting self-intersections for triangle meshes.
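To illustrate what a geometrically exact predicate means in practice, consider the basic orientation test, the sign of a 3×3 determinant. The sketch below obtains the exact sign by promoting the float inputs to rationals (every finite float is a binary rational, so the promotion is lossless); this is far slower than Shewchuk-style adaptive expansions, but it makes the "same answer as exact arithmetic" contract concrete:

```python
from fractions import Fraction

def orient3d(a, b, c, d):
    """Sign of det[b-a; c-a; d-a]: +1 if d lies on the positive side of
    the plane through (a, b, c), -1 on the negative side, 0 if exactly
    coplanar. Floats are promoted to exact rationals, so the returned
    sign is exactly the sign the infinite-precision computation gives."""
    ax, ay, az = map(Fraction, a)
    bx, by, bz = map(Fraction, b)
    cx, cy, cz = map(Fraction, c)
    dx, dy, dz = map(Fraction, d)
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    wx, wy, wz = dx - ax, dy - ay, dz - az
    det = (ux * (vy * wz - vz * wy)
           - uy * (vx * wz - vz * wx)
           + uz * (vx * wy - vy * wx))
    return (det > 0) - (det < 0)
```

Note that the coplanar case returns exactly 0, with no tolerance involved, which is precisely what a tuned-epsilon floating-point version cannot guarantee.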
When higher precision than provided by floating point hardware is required, the system uses floating-point expansions (the sum of a sequence of floats), leveraging fast floating-point hardware even for exact arithmetic.

In this paper we decompose CCD into a set of simplex intersection tests, based on the same standard sign-of-determinant tests, together with the evaluation of the sign of a simple polynomial function. No radicals or even divisions are required, making it straightforward to implement exactly using expansion arithmetic like Priest and Shewchuk. Furthermore, through the use of fast interval arithmetic filters, we can rapidly find the provably correct signs without need for high precision expansions in all but the most extreme cases, leading to highly efficient execution on average.

A.3 Continuous Collision Detection in 3D

In 3D CCD, there are two fundamental collision tests: point-triangle and segment-segment. The input to each is the location of the vertices at the beginning and end of the time step. We denote the location in space of vertex i at the beginning of the time step as xi, and its location at the end of the time step as x̂i. Then for each test, we are given 8 points: x0, x1, x2, x3, x̂0, x̂1, x̂2, and x̂3. For convenience, we will normalize the time step so that t ∈ [0,1]. The constant velocity model of motion gives the location of a vertex at an intermediate time as xi(t) = (1−t)xi + tx̂i.

First consider the point-triangle test. In the following we will index the moving vertex with 0, and the triangle vertices as 1, 2, and 3. Any point on the triangle can be written as x(u,v) = (1−u−v)x1 + ux2 + vx3, where x1, x2, and x3 are the triangle corners, and u,v ∈ [0,1] with u+v ≤ 1.
The vector between a point on the moving triangle defined by the coordinates (u,v) and the moving vertex, at time t, can then be written as:

F(t,u,v) = x0(t) − [(1−u−v)x1(t) + ux2(t) + vx3(t)]
         = (1−t)x0 + tx̂0 − (1−u−v)((1−t)x1 + tx̂1) − u((1−t)x2 + tx̂2) − v((1−t)x3 + tx̂3).

This is a tri-affine function which is zero precisely when the vertex lies on the triangle. The domain for the point-triangle test is therefore Ω = [0,1] × {u,v ≥ 0 | u+v ≤ 1}, a triangular prism.

A similar tri-affine function can be defined for the segment-segment collision test: the vector between the point at fraction u ∈ [0,1] along one segment and the point at fraction v ∈ [0,1] along the other, at time t ∈ [0,1]. The domain is then [0,1]^3, the unit cube.

CCD then amounts to discovering if such a function has a root in the domain, a point in Ω which F maps to 0. We make an important simplification: we report a collision if there is any zero on the boundary of the domain (i.e. at the initial or final time, or at any edge or endpoint of the geometry) or if there is an odd number of roots in the interior. We justify ignoring the case of a nonzero even number of interior roots by noting that an edge-triangle intersection cannot be introduced in the mesh if the total number of collisions between the edge and triangle has even parity.
Likewise a vertex cannot enter or exit a closed mesh without either colliding with an edge or colliding with the triangles an odd number of times, and therefore with at least one triangle an odd number of times, so our method cannot miss essential collisions in this category either; see the supplemental material for more discussion.

A.3.1 Determining Root Parity

Write the image of Ω under F as F(Ω) = {y | y = F(x), x ∈ Ω}, and similarly F(Γ) = {y | y = F(x), x ∈ Γ} for the image of the domain boundary Γ = ∂Ω.

If F were smooth and one-to-one, determining if 0 ∈ F(Ω) could be done by counting the number of crossings of a ray to infinity from 0 through the boundary image F(Γ): an odd number of crossings indicates a root by the usual Jordan-Brouwer Theorem argument. However, our F may not be one-to-one: the image of Ω can "fold over itself". The more general Brouwer topological degree theory [118] can be applied in this case. It is the parity of the ray crossings with F(Γ) that gives us the parity of the number of roots: the sum of an odd number of intersections for each separate root leads to an odd total if and only if there is an odd number of roots — with the proviso that if 0 ∈ F(Γ), i.e. we have a root on the domain boundary, we always report a collision. More formally:

Root Parity Lemma. Suppose Ω ⊂ R^n is an n-polytope. Suppose F: Ω → R^n is C^2, has p < ∞ roots in Ω, has no roots on Γ = ∂Ω, and has non-singular Jacobian at each root. Suppose R is a ray from 0 to infinity. Call any point x ∈ Γ such that F(x) ∈ R a crossing point; then the crossing number q is the number of crossing points. Suppose that F(Γ) is smooth at the image of any crossing point, that the ray is not tangent to F(Γ) at any of these points, and that q < ∞. Then, p ≡ q mod 2.

We offer a sketch of a proof of the lemma, and that it applies to the particular functions we need for CCD, in the supplemental material. This lemma also describes our algorithm.
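To make F itself concrete, here is a direct (plain floating-point, purely illustrative) evaluation of the point-triangle function under the constant-velocity model; the vertex data in the usage below is manufactured so that the collision happens exactly at (t,u,v) = (0.5, 0.2, 0.2):

```python
def F_point_triangle(t, u, v, x, x_hat):
    """Tri-affine F(t,u,v) = x0(t) - [(1-u-v) x1(t) + u x2(t) + v x3(t)],
    with xi(t) = (1-t) xi + t x̂i (constant-velocity motion).
    x, x_hat: four vertex positions at t=0 and t=1; vertex 0 is the
    moving point, vertices 1-3 are the triangle corners."""
    def at_t(i):
        # Interpolated position of vertex i at time t.
        return [(1 - t) * x[i][k] + t * x_hat[i][k] for k in range(3)]
    p0, p1, p2, p3 = (at_t(i) for i in range(4))
    return [p0[k] - ((1 - u - v) * p1[k] + u * p2[k] + v * p3[k])
            for k in range(3)]
```

With the point moving from (0.2, 0.2, −1) to (0.2, 0.2, 1) and a static unit triangle in the z = 0 plane, F(0.5, 0.2, 0.2) evaluates to the zero vector: a root of F in the interior of Ω, i.e. a collision.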
We report a collision if the image of the boundary F(Γ) passes through 0, and otherwise cast a ray from 0 in an arbitrary direction, and then count the number of crossings of the ray through F(Γ), choosing a different direction and trying again if any crossings are tangent or lie on corners.

Figure A.2: Root parity test. In this case there are no roots in the domain, so the origin is outside of F(Ω).

Figure A.3: Root parity test with one and two roots in the domain. A ray cast from the origin will have odd and even parity, respectively.

Figures A.2 and A.3 illustrate this approach for the 2D case, showing cases where we have zero, one, and two roots in the domain.

We transform the boundary of these domains (cube or triangular prism) by the corresponding function F to get a generalized hexahedron or prism, and test for ray crossings on each of their faces. The hexahedron has potentially non-planar bilinear patches for faces (the restriction of the tri-affine function to a face of the domain is bi-affine), and the prism is composed of three bilinear patches and two triangles. Computing ray-triangle crossings can be done with exact arithmetic — however, we know of no prior practical method for quickly and exactly computing the crossings of a ray through a bilinear patch, or even the parity of the number of crossings, which is all we need. We thus introduce an efficient method for exactly computing this parity.

A.3.2 Ray-Bilinear-Patch Crossing Parity Testing

We first define a continuous scalar function φ(x) which is positive if x is on one side of the patch and negative on the other side, and to permit exact evaluation with floating-point expansions define it using only multiplication, addition, and subtraction — see Section A.7 for the derivation.

Next consider the tetrahedron spanned by the four corners of the bilinear patch. It is composed of two pairs of triangles, one pair corresponding to each side of the bilinear patch.
For the “positive” triangle pair, any point x on either triangle has the property that φ(x) ≥ 0, and vice versa for the “negative” pair of triangles. For the test, we consider two cases depending on whether the ray origin 0 lies inside the tetrahedron or not — which can be determined directly from standard sign-of-determinant “orientation” predicates with the tetrahedron's triangular faces.

If the ray origin 0 lies inside the tetrahedron, we can determine the sign of φ(0) and replace the ray-patch test with two ray-triangle tests, using the triangles corresponding to the opposite sign of φ(0). Ray-triangle intersection can also be broken down into determinant predicates [85]. If there is an intersection between the ray and either triangle, then the ray must also pass once through the bilinear patch.

If instead the ray origin lies outside of the tetrahedron, we can use either set of triangles as an equivalent proxy for the bilinear patch. The parity of the number of intersections between the ray and the triangle pair matches the parity of the number of intersections between the ray and the bilinear patch.

Pseudocode for the test is given in Algorithm 2; Figure A.4 illustrates the 2D analog. Since it relies only on determinants and the evaluation of φ, i.e. just multiplication and addition/subtraction, the test can be evaluated using floating-point expansions to give the geometrically exact result.

Algorithm 2 Ray-bilinear-patch crossing parity
  Given: ray origin 0, direction R, and a bilinear patch.
  Form the tetrahedron from the bilinear patch corner vertices.
  Let F₁⁺, F₂⁺ be the tetrahedron faces where φ ≥ 0.
  Let F₁⁻, F₂⁻ be the tetrahedron faces where φ ≤ 0.
  if 0 is inside the tetrahedron then
    if φ(0) > 0 then
      return intersect(0, R, F₁⁻) ∨ intersect(0, R, F₂⁻)
    else
      return intersect(0, R, F₁⁺) ∨ intersect(0, R, F₂⁺)
  else
    ▷ Use either pair of triangles
    return intersect(0, R, F₁⁺) XOR intersect(0, R, F₂⁺)

Figure A.4: A 2D analog of the ray-vs-bilinear-patch parity test.
Rays A and B have origins on the “negative” side of the patch, and so we test against the proxy geometry on the “positive” side. Ray A intersects both the patch and the proxy geometry, while B intersects neither. Rays C and D have origins outside the bounding simplex, and so can be tested with either proxy geometry.

A.3.3 Putting it Together

We now have the tools we need for determining the intersection parity of a ray versus a set of bilinear patches and triangles. For segment-segment CCD, our algorithm runs 6 ray-vs-patch tests for the faces of the hexahedron, and for point-triangle CCD, we run 3 ray-vs-patch and 2 ray-vs-triangle tests for the faces of the triangular prism, and determine the parity of the total number of intersections. If we have an odd parity, we know there is an odd number of roots in the domain, and so we must flag this as a collision. Algorithm 3 shows the point-vs-triangle test (the segment-vs-segment test is analogous).

There are a few special cases we must watch for. If the origin lies exactly on a patch or a triangle (i.e. there is a root on the boundary of the domain), then we report the collision and skip the ray testing. This includes, for example, the fully degenerate case of exactly planar motion (such as two edges sliding into each other along a flat surface) that entirely defeats the cubic solver. In our simulations this type of collision is vanishingly rare unless artificially induced.

We must also take care if the ray hits an edge shared between two patches, between two triangles (acting as proxy geometry in the ray-vs-patch test), or between a patch and a triangle. If this occurs, we will see two positive ray intersection tests. In the context of inside-outside testing, this may or may not be correct (see Figure A.5). Fortunately, since we are using exact arithmetic, we can precisely detect these degenerate cases (one barycentric coordinate will be exactly zero).
In such a case, we simply choose a new ray direction at random and run the test again. Again, in our testing this happens only extraordinarily rarely.

Algorithm 3 Point-triangle collision test
  Given: corner vertices of the domain, X
  Create ray (0, R) with arbitrary direction R
  S ← 0
  for i = 1 → 3 do
    Form bilinear patch i with appropriate F(X)
    pᵢ ← intersection parity of ray (0, R) vs bilinear patch i
    S ← S + pᵢ
  for j = 1 → 2 do
    Form triangle j with appropriate F(X)
    if ray (0, R) intersects triangle j then
      S ← S + 1
  return S ≡ 1 (mod 2)

Figure A.5: Degenerate ray intersections. Two rays, each hitting two segments at their common endpoint. If we are testing each segment individually, then in both cases the parity of ray intersections is even. Here the parity of the intersection count cannot determine whether points A and B are inside the quadrilateral. Perturbing the rays slightly would produce the correct results in both cases.

A.4 Implementation

A fast implementation of exact intersection testing is crucial for making our approach practical. Computing intersections with expansion arithmetic is expensive, so we use a filter: we evaluate the determinants and φ first with interval arithmetic, only switching to exact expansions when the sign is indeterminate (the final interval contains zero). See Brönnimann et al. for the case of determinants [37].

To avoid repeatedly switching the rounding mode during interval arithmetic, we use the standard “opposite trick”, storing the negative of the lower bound and defining new operations that rely on one rounding direction only. We have also experimented with using SIMD vectors to store the intervals and using vector intrinsics for arithmetic operations [82, 104], but found that our implementation of this strategy was not significantly faster in practice than simply operating on two doubles.

Collision test culling is also critical for efficiency.
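Before detailing the culling, the opposite-trick interval type just described can be sketched as follows. This is a simplified model with assumed names: a real implementation sets the FPU rounding mode to round-toward-+∞ once, whereas this sketch approximates directed rounding with `math.nextafter` (Python 3.9+), which is conservative but one ulp looser.

```python
# Sketch of the "opposite trick" interval type (assumed names, simplified).
# Storing (-lo, hi) lets every bound be computed with round-up only.
import math

def up(x):
    """Pessimistic round-toward-+infinity (nextafter stands in for a
    rounding-mode switch)."""
    return math.nextafter(x, math.inf)

class Interval:
    """Represents [lo, hi] as (neg_lo, hi) with neg_lo = -lo."""
    def __init__(self, neg_lo, hi):
        self.neg_lo, self.hi = neg_lo, hi

    @staticmethod
    def from_double(x):
        return Interval(-x, x)

    def __add__(self, o):
        # lo = round_down(lo_a + lo_b) = -round_up((-lo_a) + (-lo_b))
        return Interval(up(self.neg_lo + o.neg_lo), up(self.hi + o.hi))

    def __sub__(self, o):
        return Interval(up(self.neg_lo + o.hi), up(self.hi + o.neg_lo))

    def __mul__(self, o):
        a_lo, a_hi = -self.neg_lo, self.hi
        b_lo, b_hi = -o.neg_lo, o.hi
        prods = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
        return Interval(up(-min(prods)), up(max(prods)))

    def sign(self):
        """+1, -1, or None if indeterminate (interval contains zero)."""
        if -self.neg_lo > 0:
            return +1
        if self.hi < 0:
            return -1
        return None  # filter fails: fall back to exact expansions

x = Interval.from_double(0.1)
y = Interval.from_double(0.2)
d = (x + y) - Interval.from_double(0.3)
print(d.sign())  # None: too close to zero to decide in floating point
```

When `sign()` returns None, and only then, the predicate is re-evaluated with floating-point expansions, which is the filtering strategy the implementation uses.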
We compute the axis-aligned bounding box (AABB) of each moving edge, triangle, and vertex and only run collision detection when AABBs overlap, accelerating this test with a regular background grid. We further cull tests by checking if, for any of several non-axis normal directions, there is a plane separating the origin from the transformed hexahedron or prism. Our implementation of this plane test uses interval arithmetic for robustness: only when all vertices are definitely on the negative or positive side of a plane (no interval contains zero) do we consider the plane to be a separating plane. This relatively inexpensive plane-based testing eliminates 99% of the tests, considerably improving performance.

A.4.1 Resolving Collisions

We implemented our new algorithm using interval-filtered floating-point expansion arithmetic, providing practical, provably robust CCD code for deforming meshes without any user-tuned tolerances. However, at this point we can only guarantee collision detection: this says nothing about resolving these collisions, i.e. finding a physically consistent adjustment to the final configuration of the mesh that eliminates all collisions. Indeed, taking into account that the final positions are quantized to a finite number of bits of precision, provably robust but ideally physical collision resolution may involve the solution of a rather daunting large-scale integer programming problem. As an example of the complications involved, even just transformation of an intersection-free mesh with a rotation matrix can potentially create self-intersection once the results are rounded again.

We use the velocity filtering approach initiated by Provot [123] and extended by Bridson et al. [31] and Harmon et al. [93]. First, repulsion forces are applied to proximal mesh elements, followed by several sweeps of CCD, applying individual impulses when collisions are detected.
If there remain unresolved collisions after these sweeps, we gather overlapping collisions into “impact zones” and solve for the set of impulses which will resolve all collisions in the zone simultaneously. If this system is degenerate, we compute a single rigid motion for the vertices of the offending impact zone, ensuring no collision (modulo the quantization issue mentioned above). This was referred to as “rigid impact zones” in Bridson's original paper [31].

While those previous works used the normal at the time of collision (typically from the triangle or from the cross-product of edges), we have found this is not a crucial choice. Interpolating the geometry at t = 0.5 and computing the normal from the vector between the closest points on the two mesh elements has worked equally well in our experiments, and is more computationally efficient.

A.5 Examples

We tested our new CCD routines in a standard mass-spring cloth simulator with an initially curved sheet of cloth with a resolution of 40×400 vertices, dropped on a solid ground plane. As shown in Figure A.6, this results in a large number of collisions as the cloth stacks up on itself.

Figure A.6: CCD stress test.

Although mass-spring systems are popular due to their simplicity and ease of implementation, implementers of cloth animation systems have been turning to increasingly sophisticated models in recent years. For example, the Finite Element Method (FEM) can achieve accurate results and is less dependent on the mesh structure than using edge-based springs [76].

In the related sub-field of simulating volumetric elastic solids for graphics, it is becoming increasingly popular to perform on-the-fly optimization of the volumetric simulation mesh [14, 164, 165]. The benefits of remeshing include reducing error for highly-sheared elements, and concentrating computational effort where it is needed to resolve small-scale details.
For cloth, an additional important benefit of remeshing is to increase vertex density in regions of high curvature, so that curved regions can be accurately represented without having to globally refine the surface mesh.

A few authors have suggested refining the simulation elements for cloth [108, 157]; however, the idea has not been as popular for cloth as it has for solid elasticity. One reason for the lack of uptake is the difficulty in dealing with collisions. For example, adding and removing vertices without introducing self-intersections is a major concern if continuous collision detection assumes that the mesh is intersection-free at the beginning of each time step. Adding and removing vertices and altering the triangulation at discrete times compounds the difficulty in choosing suitable collision and intersection error tolerances.

Armed with our parameterless collision detection system, we demonstrate a complete FEM simulator with on-the-fly, adaptive remeshing. We use linear elasticity with rotated finite elements, as described by Etzmuß et al. [76], and simple edge crossover springs for bending forces. We choose simple edge splitting, flipping, and collapsing as our mesh optimization operations, and make them all collision-safe, using CCD on “pseudo-trajectories”, as described by Brochu & Bridson [34]. To increase the vertex density in high-curvature areas, we scale the measured edge lengths by local curvature estimates when deciding to collapse or split edges. (We note that these operations are perhaps not as suitable for cloth simulation as a regular subdivision scheme, but it is a reasonable proxy for examining the challenges faced when maintaining intersection-free adaptive surfaces.) Figure A.7 shows a frame from a simulation with a single piece of cloth, and the underlying rest-state mesh.
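The curvature-scaled split/collapse criterion can be sketched as follows. The scaling form, thresholds, and function names here are illustrative assumptions, not values from the text; the point is only that a larger curvature estimate makes the same edge look longer, triggering refinement.

```python
# Hypothetical sketch of a curvature-scaled split/collapse criterion.
# The thresholds and the 1 + scale*curvature form are assumptions for
# illustration; the text only states that measured edge lengths are
# scaled by local curvature estimates.

def effective_length(edge_length, curvature, scale=1.0):
    # The same edge reads as "longer" where the surface bends more.
    return edge_length * (1.0 + scale * curvature)

def remesh_decision(edge_length, curvature, max_len=1.0, min_len=0.25):
    L = effective_length(edge_length, curvature)
    if L > max_len:
        return "split"
    if L < min_len:
        return "collapse"
    return "keep"

print(remesh_decision(0.6, 1.5))  # curved region -> 'split'
print(remesh_decision(0.6, 0.0))  # same edge, flat region -> 'keep'
print(remesh_decision(0.1, 0.0))  # short edge, flat region -> 'collapse'
```

Each resulting split or collapse would then be validated with CCD on pseudo-trajectories before being committed, so the mesh stays intersection-free.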
We also show a more challenging CCD scenario, with several layers of cloth draped over a solid sphere (Figure A.1).

All of our tests were performed on a single core of a 2.7 GHz Intel i5 processor with 4 GB of RAM. We integrated our new CCD algorithm into the open-source El Topo surface tracking library [33], as it provides an intersection-free remeshing framework, suitable for adaptive cloth simulation, and provides an implementation of the cubic-solver-based CCD approach for comparison. To test our new CCD, we simply substituted El Topo's inexact CCD and intersection testing functions with our new implementation.

Figure A.7: Cloth with an adaptive simulation mesh.

The average time spent per call to CCD, not including culling based on AABB comparisons, but including plane-based culling, was approximately 614 nanoseconds for segment-segment testing, and 439 ns for point-triangle testing. By comparison, the average time in El Topo's cubic solver CCD implementation (again not including AABB culling) was 649 ns and 659 ns.

Counting only tests which were positive (exercising the entire path of our code), the average time per call was 19 microseconds for segment-segment, and 15 µs for point-triangle, compared to 1.2 µs for both tests with the cubic solver CCD. This indicates that without culling, our new algorithm is more expensive than the cubic-solver version, as expected, but also that our culling is very effective.

A.6 Conclusions

We have presented a novel approach to continuous collision detection, constructed from a set of predicates which can be evaluated exactly. We have shown that the collision detection problem can be rephrased as determining whether a function has an odd number of roots in a given domain. We then showed how this problem can be reduced to testing a ray from the origin against the image of the domain boundary and counting the parity of the crossings.
This in turn reduces to a set of ray-vs-triangle and ray-vs-bilinear-patch tests, built from determinant and φ sign evaluations.

Our implementation uses a floating-point filter approach for efficiency — first determining if the correct Boolean result can be determined using interval arithmetic, and only using floating-point expansions for exact evaluation if required. We demonstrated the utility of our approach with a challenging test case: simulation of cloth undergoing on-the-fly remeshing with a large amount of contact. To our knowledge, this is the first time an adaptive cloth simulation scheme has been presented which explicitly deals with the challenges of continuous collision detection.

There are several avenues of future work. While irrelevant for the simulations we presented, being able to distinguish zero from a positive even number of roots could be critical for other applications. Our approach should extend naturally from multi-affine functions to testing intersections with higher-degree polynomial patches. Rigid body motion is more naturally expressed in terms of screw motions; though the intermediate positions involve trigonometric functions of time, a reparametrization similar to the NURBS approach to conic sections would lead to a multivariate polynomial problem amenable to this attack.

A.7 Implicit Function for a Bilinear Patch

We define the following multivariate polynomial φ(x), where the indices 0 to 3 refer to the patch corner vertices:

  φ(x) = h12(x) − h03(x).

The two h-functions are designed to be zero on the straight edges of the patch via products of plane g-functions for the various subsets of three vertices:

  h12(x) = g012(x) g132(x)
  h03(x) = g013(x) g032(x)
  gpqr(x) = (x − xp) · (xq − xp) × (xr − xp).

We claim that the zero level set of φ(x) contains the bilinear patch. To confirm this is indeed the function we seek, take an arbitrary point x with barycentric coordinates α, β, γ, and δ w.r.t.
the corners of the patch:

  x = αx0 + βx1 + γx2 + δx3.

Recall that the barycentric coordinates of a point with respect to a tetrahedron are proportional to the signed volumes of the tetrahedra formed by the point and each of the triangular faces. Letting V be six times the signed volume of the tetrahedron spanning the corners of the patch, observe that:

  g132(x) = αV    g032(x) = βV
  g013(x) = γV    g012(x) = δV

Therefore our function evaluates to:

  φ(x) = δαV² − γβV².

Assuming V ≠ 0, this is zero if and only if αδ = βγ, which occurs precisely for the parameterized bilinear surface:

  α = (1−s)(1−t)    β = (1−s)t
  γ = s(1−t)        δ = st.

Moreover, it is clear that φ(x) changes sign across the zero level set, dividing space into positive and negative regions separated by the conic containing the bilinear patch.

This construction breaks down if V = 0. However, this is the case when the patch is perfectly flat, where we can simply replace the entire ray-patch intersection test with two ray-triangle tests.

A.8 Acknowledgments

This work was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada. We would also like to thank Eitan Grinspun and our anonymous reviewers for helpful discussion and comments.
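The derivation above can be checked numerically. The following sketch evaluates φ with exact rational arithmetic (Python's `Fraction`, standing in for the floating-point expansions of the actual implementation) at a point on a non-planar bilinear patch, where it vanishes exactly, and at a point off the patch, where it does not. The corner coordinates are arbitrary choices for the check.

```python
# Verify phi(x) = g012*g132 - g013*g032 vanishes exactly on the bilinear
# patch. Exact rational arithmetic stands in for floating-point expansions.
from fractions import Fraction as Fr

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def g(x, xp, xq, xr):
    # g_pqr(x) = (x - x_p) . (x_q - x_p) x (x_r - x_p)
    return dot(sub(x, xp), cross(sub(xq, xp), sub(xr, xp)))

def phi(x, x0, x1, x2, x3):
    h12 = g(x, x0, x1, x2) * g(x, x1, x3, x2)   # g012 * g132
    h03 = g(x, x0, x1, x3) * g(x, x0, x3, x2)   # g013 * g032
    return h12 - h03

# A non-planar patch (corners chosen arbitrarily for the check; V != 0).
corners = [tuple(map(Fr, c))
           for c in ((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1))]

def patch_point(s, t):
    # x = alpha*x0 + beta*x1 + gamma*x2 + delta*x3
    a, b, c, d = (1-s)*(1-t), (1-s)*t, s*(1-t), s*t
    return tuple(a*p + b*q + c*r + d*w
                 for p, q, r, w in zip(*corners))

on = patch_point(Fr(1, 3), Fr(2, 5))
off = tuple((p + q) / 2 for p, q in zip(corners[0], corners[3]))

print(phi(on, *corners))   # exactly 0: the point lies on the patch
print(phi(off, *corners))  # nonzero: midpoint of the x0-x3 diagonal
```

Because every operation is multiplication, addition, or subtraction, the same evaluation carries over directly to floating-point expansions, which is what makes the exact sign test practical.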