SIMULATING WAVE SCATTERING PROBLEMS BY PSEUDOSPECTRAL TIME-MARCHING ON SUPERCOMPUTERS

by

YONG LUO

B. Sc. (Electronic Engineering), Fudan University, 1987
M. A. Sc. (Electronic Engineering), The University of Electronic Sci. & Tech. of China, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, THE DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
Sept. 1994
© Yong Luo, 1994

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

In this thesis, a new time-marching scheme for time-dependent PDEs with periodic and non-periodic boundary conditions is introduced and implemented. The new time-marching scheme is based on the polynomial interpolation of the symbolic solution of the original PDEs. The approximation of the space derivatives is performed by pseudospectral methods. The marching of the solution in the time domain is done by the polynomial expansion or the Newton-form polynomial interpolation, depending on the properties of the space derivatives. The boundary conditions are properly represented by a suitable pseudospectral approximation and some technical manipulations of the collocated operators. This technique can efficiently provide balanced spectral accuracy in both the space and time dimensions.
The numerical stability and resolution are also improved by the new polynomial time-marching scheme. In the periodic boundary case, the spatial approximation generally can be done by Fourier collocation and the time-marching sometimes can be easily implemented by Chebyshev polynomial series. In the non-periodic case, the spatial operator should be approximated by a Chebyshev collocation, which can include different non-periodic boundary conditions through careful manipulations of the boundary conditions. In this case and the complicated periodic case, the time-marching has to rely on the more general Newton-form interpolation based on Fejér points.

Based on the new polynomial time-marching scheme, a two-dimensional SH-seismic reflection model is simulated by full implementation of the new time-marching scheme with the approximated absorbing boundary conditions. Some scattering phenomena such as diffraction are illustrated through the visualization.

The simulation of the physical model is accomplished on two supercomputers: the TMC Connection Machine CM-5 and the Fujitsu VPX240/10.
The parallel programming (CM-Fortran on the CM-5 and Fortran 77/VP on the VPX240/10) and optimization issues on the two supercomputers are also discussed in this thesis.

TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
  Section 1 Numerical Methods For Time-dependent PDEs
  Section 2 Description of the Polynomial Time-marching Method
  Section 3 Thesis Outline
Chapter 2 Pseudospectral Methods for PDEs
  Section 1 Introduction of Pseudospectral Methods
  Section 2 Pseudospectral Fourier Method
  Section 3 Pseudospectral Chebyshev Method
  Section 4 Pseudospectral Domain Decomposition Techniques
Chapter 3 Pseudospectral Polynomial Time-Marching Techniques
  Section 1 Evolution of the Time Dependence Using Chebyshev Polynomials
    Topic 1 Algorithm
    Topic 2 Accuracy, Resolution And Convergence
    Topic 3 Second-order Case
  Section 2 Approximating The Time Dependence By Complex Polynomials
    Topic 1 Polynomial Interpolation Algorithm
    Topic 2 Rate of Convergence
    Topic 3 Computational Considerations
  Section 3 Numerical Results for Time-Marching in the Periodic Domain
    Topic 1 Variable Wave Velocity and Damping in the Pseudospectral Time-Marching Scheme
    Topic 2 Chebyshev polynomial expansion results
Chapter 4 Pseudospectral Polynomial Time-marching for Non-periodic Boundary Value Problems
  Section 1 Homogeneous Dirichlet Boundary Conditions
  Section 2 Homogeneous Neumann Boundary Conditions
  Section 3 Numerical Examples
  Section 4 General Homogeneous Boundary Conditions
  Section 5 The Two-Dimensional Case
Chapter 5 Pseudospectral Polynomial Time-marching for Non-reflecting Boundary Problems
  Section 1 Non-reflecting Boundary Condition Approximation
  Section 2 Polynomial Time-marching With Non-reflecting Boundary Conditions in One-dimensional Case
  Section 3 Spectral Accuracy Discussions of the Polynomial Time-marching Scheme in One Dimension
  Section 4 Polynomial Time-marching With Non-reflecting Boundary Conditions in Two Dimensions
Chapter 6 Simulation and Visualization of Model Wave Scattering
  Section 1 2-D Seismic Reflection Modeling
  Section 2 Numerical Results and Discussion
  Section 3 Half-space Wave Scattering
Chapter 7 Parallel Programming On the Connection Machine CM-5 and the Fujitsu VPX240/10
  Section 1 Fortran Programming on Fujitsu VPX240/10
  Section 2 CM-5 and CM-Fortran
  Section 3 Parallel Implementation of the Polynomial Time-Marching by CM-Fortran
Chapter 8 Conclusion
  Section 1 Summary
  Section 2 Contributions
  Section 3 Future Work
Appendix A Some Snapshots of Fourier Pseudospectral Time-marching Results
Appendix B Previous Study Results on the Performance of the Connection Machine CM-2
Appendix C Derivation of (6.1.5)
References

List of Tables

Table 2.1 The Choices of Basis Functions
Table 4.1 Numerical Stability Limitation for the Polynomial Time-marching Step Size
Table 5.1 Time-marching Errors for the One-dimensional Wave Equation with Homogeneous Neumann Boundary Conditions (homogeneous medium)
Table 5.2 Time-marching Errors for the One-dimensional Wave Equation with Homogeneous Dirichlet Boundary Conditions (homogeneous medium)
Table 5.3 Time-marching Errors for the One-dimensional Wave Equation with Absorbing Boundary Conditions (homogeneous medium)
Table 5.4 Time-marching Errors for the One-dimensional Wave Equation with Absorbing Boundary Conditions (varying medium)
Table 7.1 VPX240/10 Hardware Summary
Table 7.2 2K*2K REAL*8 Matrix Multiply
Table 7.3 Memory Access (2K*2K REAL*8 Array): two-dimensional assignment (3 arrays)
Table 7.4 2K*2K Double Precision Matrix Multiply C=A*B by CMSSL Routine gen_matrix_mult_noadd
Table 7.5 Per Time Step Kernel Operation Execution Time (model discussed in Section 6.1)
Table B.1 The 128*128 one time step performance comparison: attached to 4K processors except otherwise described. The CM-2 used here has 256 Kbits/proc. distributed memory and one 64-bit floating-point processing unit per 32 processors. The clock frequency of this CM-2 is 7 MHz

List of Figures

Figure 3.1 Conformal Mappings for the Newton-form Polynomial Interpolation
Figure 3.2 Computation Error for the One-dimensional Homogeneous Wave Equation
Figure 3.3 Magnitude of Errors At Different Time Steps
Figure 3.4 Magnitude of Errors For Different Time Step Sizes (at 1st time step)
Figure 3.5 Initial Wave Distribution Simulating an Impulse
Figure 3.6 Wave-Velocity Distribution Simulating a Low Velocity Cylinder (The interface is created by a Rapid Velocity Transition)
Figure 4.1 Homogeneous Dirichlet Initial-Boundary Value Problem N=64 (time step 0.005)
Figure 4.2 Homogeneous Dirichlet Initial-Boundary Value Problem N=31 (time step 0.005)
Figure 5.1 One-dimension Absorbing Boundary Condition Numerical Results (N=32)
Figure 5.2 One-dimension Absorbing Boundary Condition Numerical Results (N=64)
Figure 5.3 Two-dimension Absorbing Approximation Numerical Results (Grid points 32 in both X and Y, dt=0.01)
Figure 6.1 Geometry of Salt Dome for Polynomial Time-marching Modeling
Figure 6.2 Impulsive Source Distribution Time Function
Figure 6.3 Scatter Movie Snapshots When the Source is Located at xs=3500m, zs=4000
Figure 6.4 Scatter Movie Snapshots When the Source is Located at xs=3500m, zs=3200
Figure 6.5 Scatter Movie Snapshots When the Source is Located at xs=3500m, zs=2400
Figure 6.6 Half-space Scattering
Figure 6.7 Wave Incidence and Reflection for a Half-Space
Figure 6.8 Source-Receiver Geometry
Figure 6.9 Trace Comparison Between Analytical (--) and Numerical (++) Solutions at (0, -0.094)
Figure 6.10 Trace Comparisons Between Analytical (--) and Numerical (++) Solutions at (0.187, -0.094) and (0, -0.187)
Figure 7.1 The Architectural Diagram of the Fujitsu VPX240/10
Figure 7.2 Vectorizing a DO Loop
Figure 7.3 Components of a Typical CM-5 System
Figure 7.4 Components of a Processing Node with Vector Units
Figure 7.5 Distribution of Code and Data in a CM Program
Figure A.1 Snapshot of wave scattering described in pages 37-39

Acknowledgments

Dedicated to my parents

I would like to take this opportunity to express a much deserved "thank you" to my knowledgeable and supportive thesis supervisor, Dr. Matthew J. Yedlin, for his two-fisted assistance in helping me turn a vague idea into a finished product. His good connections made some state-of-the-art computing facilities (CM-2 and CM-5) available to this thesis work.

My sincerest thanks to Dr. Ian Cumming, Dr. Brian Seymour and Dr. E. V.
Jull, in recognition of their helpfulness and partial financial support, which made the accomplishment of this thesis work possible.

I gratefully appreciate the insightful and valuable discussions with Dr. Shenkai Zhao in Geophysics and Mr. Pin Lin in Mathematics, from both of whom I have benefited so much on some complicated mathematical topics. Special thanks also to Dr. Peter Cahoon in Computer Science for his assistance in helping me make the data visualization tape which is part of this thesis.

Finally, a heartfelt note of thanks to my wonderful parents, who have supported me in every manner possible. Without their endless love, support and care, this would never have been possible. I am indebted to my dearest parents. At this moment, please allow me to say:

To the Love and Devotion of My Parents!

Chapter 1 Introduction

Accurately and efficiently solving many time-dependent Partial Differential Equations (PDEs) by numerical methods enables researchers to study some complex physical phenomena in detail. With the help of the quickly growing computational ability of supercomputers, accurate simulation and visualization can be performed for many wave-field problems. On the latest supercomputers, researchers now can perform accurate time-dependent simulations in three dimensions and study phenomena they only dreamed of studying a decade ago. Increasing demands in the numerical simulation field and the availability of rapidly increasing computing power have greatly pushed and also challenged some traditional numerical algorithms.

1.1 Numerical Methods For Time-dependent PDEs

Major numerical algorithm candidates in the numerical simulation field are the Finite Difference (FD), Finite Element (FE) and spectral methods (including pseudospectral or collocation).
Particularly, spectral algorithms are becoming more popular with more accurate simulation demands and the greater computing power of the supercomputers.

The solution of partial differential equations (PDEs) by pseudospectral methods has been extensively studied and comprehensively applied in various numerical analysis application areas such as Computational Fluid Dynamics (CFD), geophysics, oceanography, mechanics and electromagnetics. Pseudospectral methods (or the spectral collocation methods) are characterized by the expansion of the solutions in terms of global basis functions, where the expansion coefficients are computed so that the differential equation is satisfied exactly at a set of so-called collocation (or interpolation) points. The popularity of these methods arises from several advantages in computation efficiency and much higher accuracy over common finite difference methods [24, 5]. Conventional pseudospectral methods only deal with the approximation of the derivatives in space. For time evolution of the solutions, so far the various finite difference time-marching techniques still play a dominant role in actual applications.

1.2 Description of the Polynomial Time-marching Method

A unique technique for time-marching has been proposed by Tal-Ezer since 1986 [39, 40, 41]. This technique is based on the polynomial approximation theory of complex matrix polynomials and has arbitrary-order accuracy for the time-marching approximation. Furthermore, this technique actually evaluates time evolution by polynomials of the derivatives in space. Tal-Ezer's original algorithm is based on Fourier pseudospectral space approximation. Therefore, it can only be applied to those problems with periodic boundaries in space. However, his Newton-form polynomial interpolation time-marching scheme can be incorporated into many other space discretization methods such as Chebyshev pseudospectral or finite-difference.
This thesis extends the Newton-form interpolation time-marching method to Chebyshev pseudospectral space discretization incorporated with non-periodic and non-reflecting boundary conditions. With this incorporation, some experimental calculations have been performed on the one-dimensional wave equation in a constant medium with homogeneous Dirichlet/Neumann/absorbing boundary conditions. The results show that the overall error converges to zero at a rate faster than some fixed orders of finite-difference error convergence, but not at an exponential rate.

1.3 Thesis Outline

In Chapter 2 of this thesis, mathematical principles, accuracy, convergence, stability, boundary-condition considerations and computational requirements of conventional pseudospectral methods for PDEs are briefly described. Domain decomposition techniques related to the use of pseudospectral methods for elliptic equations are also introduced. Tal-Ezer's time-marching technique is introduced in Chapter 3. Chebyshev polynomial expansion, Newton-form polynomial interpolation and Faber polynomial expansion are then described. Their advantages and weaknesses are also investigated. Chapter 4 introduces a new polynomial time-marching technique for non-periodic boundary value problems. Stability improvement and computing efficiency are emphasized. Chapter 5 extends the technique in Chapter 4 to time-dependent boundary cases: non-reflecting boundary approximations. Chapter 6 gives a general analysis of applying the foregoing techniques to simulate an SH-wave seismic reflection model. Chapter 7 focuses on a brief introduction of the supercomputing implementation, including the Connection Machine CM-5 architecture, CM-Fortran parallel programming issues and the Fujitsu VPX240/10 issues.
The last chapter summarizes the motivation, research methods and achievements of this thesis.

The major contribution of this thesis work is incorporating the non-periodic and even non-reflecting boundary conditions into the original polynomial time-marching scheme. Thus, the applicability of the polynomial time-marching has been greatly expanded.

Chapter 2 Pseudospectral Methods for PDEs

The success of spectral methods in practical computation has led to an increasing interest in their theoretical aspects, especially since the mid-1970s. Spectral methods generally can be viewed as an extreme development of the class of discretization schemes for differential equations known generically as the method of weighted residuals (MWR). The key elements of the MWR are trial functions (also called the expansion or approximating functions) and test functions (also known as weight functions). Choice of the trial functions is one of the features which distinguish spectral methods from finite-element and finite-difference methods. Choice of the test functions distinguishes between the three most commonly used spectral schemes: the Galerkin, pseudospectral or collocation, and tau methods. In this chapter, the basic theoretical principles will be introduced, focusing mainly on the pseudospectral ones, through comparison with the FD, FE and the other spectral schemes.

2.1 Introduction of Pseudospectral Methods

The general idea of spectral methods is based on approximating $U(x)$, the solution of a PDE, by a sum of N+1 "basis functions" $\phi_n(x)$ [24]:

$$U(x) \approx U_N(x) = \sum_{n=0}^{N} a_n \phi_n(x) \qquad (2.1.1)$$

The second step is to obtain equations for the discrete values $U_N(x_i)$ or the coefficients $a_n$ from the original equation. In the case of a differential equation, this second step involves finding an approximation for a differential operator in terms of the grid point values of $U_N(x)$ or, equivalently, the expansion coefficients. The goal is to choose the
The goal is to choose theseries coefficients {a } in such a way that the so-called residual function is made assmall as possible. For the equationLU =f(x) (2.1.2)4where £ is the operator of a differential equation, the residual function is defined byR(x;ao,al,”,aN)LUN—f (2.1.3)The different spectral and pseudospectral methods differ mainly in their ways of minimizing R(x; ao, al,• .. , ar). In this sense, the general spectral methods fall into two broadcategories: the interpolating or pseudospectral, and the non-interpolating (or sometimesdirectly called spectral method). The pseudospectral methods associate a grid of pointswith each basis set. The coefficients of a known functionf(x) are found by requiring thatthe truncated series agrees with f(x) at each point of the grid. The solution of a differential equation is found by demanding that the coefficients a be such that the residualfunction has the propertyR(x1;ao,ai,. •,aN) = 0, i = 0,1,... ,N (2.1.4)Presumably, as R(x;a) is forced to vanish at an increasingly large number of discretepoints known as “collocation” or “interpolation” points, it will be smaller and smallerin the gaps between the collocation points so that UN(X) will converge to U(x) as Nincreases. Methods in this category are also called “orthogonal collocation” or “methodof selected points”.The “non-interpolating” methods include Galerkin’s method and the Lanczos taumethod which were developed long before the pseudospectral methods. These algorithmsare usually labelled as spectral methods. There is no grid of interpolating points associatedwith these procedures. Instead, the coefficients of a known function f(x) are computedby multiplying f(x) by a given basis function and integrating (projection method). 
Thepseudospectral methods are equivalent to the spectral methods if one evaluates theintegrals of the latter algorithm by means of numerical quadrature with (N÷l) points[24, 8].In [24], [8], it has been shown that the accuracy of the pseudospectral methods isonly a little bit poorer than that of the non-interpolating ones too little to outweigh the5much greater simplicity and computational efficiency of the pseudospectral algorithms.This approach is especially attractive because of the ease with which it can be applied tovariable-coefficient and non-linear problems. Consequently, this thesis will focus on thepseudospectral techniques for applications involving variable coefficients.The so-called “finite-element” methods are similar in philosophy to the spectralalgorithms. The major difference is that the finite-element (FE) techniques chop theinterval in x into a number of sub-intervals, and choose the qSn(x) to be local functionswhich are polynomials of low, fixed degree and non-zero only over neighboring sub-intervals. In contrast, the spectral methods use global basis functions in which each basisfunction is a polynomial of degree and non-zero, except for isolated points, overthe entire computational domain. To increase the accuracy, the finite element methodschop the interval into smaller pieces without changing the degree of the polynomials inthe basis. In contrast, the spectral methods raise the highest degree of the polynomialswithout subdividing the interval any more. The FE methods convert differential equationsinto matrix equations that are sparse. The spectral methods generate algebraic equationswith full matrices. As compensation, the high order of the basis functions gives highaccuracy for a given N. 
For model problems, when fast iterative matrix-solvers are used, the spectral methods can be much more efficient than FE methods.

For comparison between the pseudospectral methods and the finite difference (FD) methods, Fornberg has given more details in [14, 15]. He showed that the pseudospectral methods can be viewed as a limit of finite differences of increasing orders. In each space dimension, the pseudospectral methods require as little as a quarter the number of grid points compared to a fourth-order finite difference scheme for the same accuracy requirements. Here, some simple and heuristic explanations for the accuracy of the pseudospectral methods are addressed. More detailed discussions can be found in [26, 24, 5, 20, 8, 14].

A standard fourth-order finite difference approximation to the first-order spatial derivative can be written as:

$$[-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)]/(12h) + O(h^4) \qquad (2.1.5)$$

where h = 1/N is the grid interval. Since each spectral coefficient is determined by all the grid point values of f(x) for the N-point pseudospectral approximation, it follows that the pseudospectral rule is an N-point differencing formula with an error of $O(h^N)$. As N is increased, the pseudospectral method benefits in two ways: the grid interval h becomes smaller, and the order of the approximation increases. The error therefore decreases faster than any finite power of 1/N, because the power in the error formula is increasing too. This is the so-called "infinite order" or "exponential" convergence, as J. P. Boyd noted in [5]. Specifically, this "exponential" convergence property is useful in solving linear elliptic problems with smoothly variable coefficients. For hyperbolic equations, the "exponential" convergence property only holds for the sample model problems discussed in Section 5.3.

Choosing a proper basis function set is the first step of the pseudospectral approximation. This is usually decided by the boundary conditions of the problem. The boundary
The boundaryconditions may be divided into two broad categories: behavioral and numerical. It isalmost always necessary to impose explicit numerical boundary conditions upon the approximate solution. In contrast, behavioral conditions may be satisfied implicitly bychoosing basis functions that have the required property or behavior. Based on theseconsiderations and the properties of different basis function sets, the proper basis set canbe chosen as follows:7Function Behavior Over The Interval Basis Setf(x) is periodic & symmetric or antisymmetric about x=O Fourier Seriesx E [a, b] &f(x) is non-periodic Chebyshev Polys;Legendre Polys.x E [0, oo] &f(x) decays exponentially as x - Co TL(x) (rationalChebyshev);Legendre Polys.x E [0, oo] f(x) has asymptotic series in inverse powers of x TL onlyx e [—co, J &f(x) decays exponentially as x — 00 sinc functions orHermite functionsx [—oo, ooj &f(x) has asymptotic series in inverse powers of TB onlyxfQ,O) is a function of latitude and longtitude on a sphere SphericalharmonicsTable 2.1 The Choices of Basis FunctionsIn most actual applications, the Fourier series and the Chebyshev polynomials are thebest choices, not only because Fast Fourier Transform (FFf) can be used to compute thesetwo bases, but also because most physical problems can be modelled as either periodicboundary value problems or finite interval boundary value problems. Consequently,all discussions and analysis will be concerned with the Fourier and the Chebyshevpseudospectral methods.2.2 Pseudospectral Fourier MethodIn the pseudospectral Fourier approximation, the problem involves periodic boundaryconditions. Thus, all functions appearing in the problem are periodic. Suppose f(x) to8be a smoothly differentiable function with period 27r. It can be approximated by thetrigonometric polynomialP1vf(x) that interpolates it at a set of collocation points. 
Thefollowing is one of the possible choices of the points [20]:x2=- (j=O,.”,2N—1) (2.2.1)Then the approximation FNf(x) has the form2N—IPNf(x) = f(x,)g,(x) (2.2.2)j=owhere g3 (xk) 8jk, andN1x—g,— 2N Lcj=1 (IlI#N), CN=C_N=2Defining2N—1al2Nc1f(xj)e_2 , (2.2.3)then (2.2.2) becomesPNf(x)=a1e (2.2.4)The next step in the pseudospectral method is to seek equations for an approximatesolution UN to a differential equation whose exact solution is U. Indeed, (2.2.4) givesakU(x) =(i )k 2 (2.2.5)InINwhere a is given by (2.2.3). If N is a highly composite integer, the Discrete FourierTransforms (DFIs) (2.2.3), (2.2.5) can be efficiently evaluated by the FFT algorithm inO(N log2 N) operations. Thus, evaluation of derivatives requires just two FFTs togetherwith the complex multiplications by (j)lC in (2.2.5).9Expressing the same procedure in matrix form, one obtainsaku(x)(DkU) (2.2.6)where Dk is a 2N x 2N matrix, and/ UN(XO)U is a column vector U=\UN(x2N_1)Explicitly,(D)!(_i)J+fl cot (x) (2.2.7)10 j—n(Di)cObviously, D1 is a real, antisymmetric matrix. Consequently, D2k is a real, symmetricmatrix while D2k+1 is a real, antisymmetric matrix.To show the accuracy of the Fourier pseudospectral method, an approximation theoryresult has been given in [9] : Let p>l!2. Then for Oqp, there exists a constant Cindependent of N and such thatlU — PNUIIq CN_PUiHere the norm is defined as:qlIUH ul , where = > k2ai3=0 k=—ccIt measures the magnitude of U and its first q derivatives. Obviously, the rate errordecrease with increasing N only depends on the smoothness of the function beingapproximated. However, if some smoothing procedure is used, the pseudospectral methodis still applicable to pointwise smooth problems[20].It is necessary to review some results regarding the stability and convergence of theFourier pseudospectral method. A scheme is said to be stable ifIIUN(t)ll <CIIUN(0)ll , (0 t T) (2.2.8)10for some proper norm. Here C may depend on T, but not on N. 
By the energy estimate method, Gottlieb et al. [20] and Hussaini et al. [24] have demonstrated the stability of the pseudospectral method for three major types of partial differential equations. The exception is the hyperbolic equation with a zero-order derivative (damping) term:

$$U_t + a(x) U_x + bU = 0 \qquad (2.2.9)$$

The straightforward discretization for this equation is often unstable. The Fourier method is stable if a(x) is of fixed sign. Otherwise, only some typical cases have been proven stable. Usually, writing the spatial derivative in skew-symmetric form, or filtering the solution, results in a practical and stable approximation.

As for the accuracy and convergence of the Fourier pseudospectral method for elastic wave equations, Fornberg [14, 15] has indicated analytically that the Fourier pseudospectral method can be interpreted as a limit of finite differences of increasing orders. Based on spectral analysis of a model equation, he showed that the pseudospectral method (integration in time) may require, in each space dimension, as little as a quarter the number of grid points compared to a fourth-order finite-difference scheme and one-sixteenth the number of points compared to a second-order finite-difference scheme. For the total number of points in two dimensions, these factors become 1/16 and 1/256, respectively. In a series of his test calculations on the two-dimensional elastic wave equation, only minor degradations are found in cases with variable coefficients and discontinuous interfaces. He also noted in [14] that in the acoustic case, the governing equations take a special form which can be exploited by low-order finite-difference methods.
The advantages of the pseudospectral method are then less than in the more general elastic case.

2.3 Pseudospectral Chebyshev Method

When a function f(x) is not periodic, or the computational domain has some numerical boundary conditions, a trigonometric interpolation polynomial does not provide a good enough approximation to yield accurate approximations to the derivatives of f(x). It is better to approximate f(x) by polynomials in x. However, it is well known that the Lagrange interpolation polynomial based on equally spaced points does not give a satisfactory approximation to a general smooth function f. This poor behavior of polynomial interpolation can be avoided for smoothly differentiable functions by removing the restriction to equally spaced collocation points. In the most common pseudospectral Chebyshev methods, the interpolation points in the interval [-1, 1] are chosen as one of the Gauss-type point sets associated with Chebyshev polynomials (the Gauss-Lobatto points are the extrema of $T_N(x)$). These are

$$\text{Gauss-Lobatto:} \quad x_j = \cos\frac{\pi j}{N}, \quad (j = 0, \ldots, N) \qquad (2.3.1)$$

$$\text{Gauss-Radau:} \quad x_j = \cos\frac{2\pi j}{2N+1}, \quad (j = 0, \ldots, N) \qquad (2.3.2)$$

$$\text{Gauss:} \quad x_j = \cos\frac{(2j+1)\pi}{2N+2}, \quad (j = 0, \ldots, N) \qquad (2.3.3)$$

Following the definition of the Chebyshev polynomial

$$T_n(x) = \cos\left(n \cos^{-1} x\right) \qquad (2.3.4)$$

in the Gauss-Lobatto grid point case, one obtains

$$T_n(x_j) = \cos\frac{n\pi j}{N} \qquad (2.3.5)$$

which indicates a close relation between the pseudospectral Chebyshev and the pseudospectral Fourier method.
The Nth degree interpolation polynomial $P_N f(x)$ to f(x) is given by

$$P_N f(x) = \sum_{n=0}^{N} a_n T_n(x) \qquad (2.3.6)$$

where

$$a_n = \frac{2}{N \bar{c}_n} \sum_{j=0}^{N} \frac{1}{\bar{c}_j} f(x_j) T_n(x_j), \qquad \bar{c}_0 = \bar{c}_N = 2, \quad \bar{c}_j = 1 \ (1 \leq j \leq N-1)$$

An expression for the derivative of $P_N f(x)$ can be obtained by differentiating the above equation:

$$\frac{\partial}{\partial x} P_N f = \sum_{n=0}^{N} a_n T_n'(x) = \sum_{n=0}^{N} b_n T_n(x) \qquad (2.3.7)$$

where $b_N = 0$, $b_{N-1} = 2N a_N$ and

$$\bar{c}_n b_n = b_{n+2} + 2(n+1)\, a_{n+1} \quad (0 \leq n \leq N-2)$$

More generally,

$$\frac{\partial^p}{\partial x^p} P_N f(x_k) = \sum_{j=0}^{N} f(x_j)\, (D_p)_{kj} \qquad (2.3.8)$$

where

$$(D_1)_{kj} = \frac{\bar{c}_k}{\bar{c}_j} \frac{(-1)^{j+k}}{x_k - x_j} \quad (k \neq j), \qquad (D_1)_{00} = \frac{2N^2 + 1}{6} = -(D_1)_{NN}, \qquad (D_1)_{jj} = \frac{-x_j}{2(1 - x_j^2)} \quad (1 \leq j \leq N-1)$$

and $D_p = (D_1)^p$.

It should be noted that the matrix $D_1$ is not antisymmetric; also, $D_2$ is not symmetric. To consider the stability and convergence of the pseudospectral Chebyshev method, the following norm definitions are introduced:

$$\|f\|_q^2 = \|f\|^2 + \left\|\frac{\partial f}{\partial x}\right\|^2 + \cdots + \left\|\frac{\partial^q f}{\partial x^q}\right\|^2 \qquad (2.3.9)$$

where

$$\|f\|^2 = \int_{-1}^{1} \frac{f^2(x)}{\sqrt{1 - x^2}}\, dx$$

By these norm definitions, Gottlieb et al. have summarized the main approximation results of the pseudospectral Chebyshev method in [20]. Let f(x) be a function with S continuous derivatives and let $P_N f$ be defined as in (2.3.6); then there is a constant C independent of f(x) and N such that

$$\|f - P_N f\|_q \leq C N^{2q - S} \|f\|_S \quad (0 \leq q \leq S) \qquad (2.3.10)$$

The accuracy of this approximation still depends on the smoothness of f(x). Similarly, one can obtain a good approximation for the derivative "far away" from a discontinuity by somehow smoothing the function. In [24, 20], it is demonstrated by the energy estimate method that the pseudospectral Chebyshev method has stability properties for the three major PDEs similar to those of the Fourier method. The hyperbolic equation is still a problem when the wave velocity changes sign. Fortunately, in physical applications, wave velocities are always strictly positive.

2.4 Pseudospectral Domain Decomposition Techniques

In order to avoid the need for global mappings required by spectral methods in problems with complicated geometries, domain decomposition techniques have been developed for elliptic equations. They are also effective methods for overcoming some non-smoothness or jumping conditions in global domains.
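Before continuing, the Chebyshev machinery of Section 2.3 can be checked numerically. The sketch below is my own illustration, not code from the thesis; it builds the Gauss-Lobatto differentiation matrix $D_1$ of (2.3.8) using the standard closed form (with the common negative-row-sum trick for the diagonal, which is algebraically equivalent and numerically more robust) and applies it to a smooth test function:

```python
import numpy as np

def cheb_diff_matrix(N):
    """Chebyshev collocation derivative D_1 of (2.3.8) on the
    Gauss-Lobatto points x_j = cos(pi*j/N)."""
    x = np.cos(np.pi*np.arange(N + 1)/N)
    # c_j = cbar_j * (-1)^j, with cbar_0 = cbar_N = 2 and cbar_j = 1 otherwise
    c = np.hstack([2.0, np.ones(N - 1), 2.0])*(-1.0)**np.arange(N + 1)
    dx = x[:, None] - x[None, :]
    D = np.outer(c, 1.0/c)/(dx + np.eye(N + 1))   # off-diagonal entries
    D -= np.diag(D.sum(axis=1))                   # diagonal: rows sum to zero
    return x, D

x, D = cheb_diff_matrix(24)
f = np.exp(x)*np.sin(5*x)
exact = np.exp(x)*(np.sin(5*x) + 5*np.cos(5*x))
print(np.max(np.abs(D @ f - exact)))   # spectrally small for this analytic f
```

The zero-row-sum construction enforces exactly the property that the derivative of a constant vanishes, which improves the conditioning of the explicit diagonal formulas quoted in (2.3.8).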
A complicated domain can be subdivided into several subdomains and individual spectral methods can be applied to each subdomain. These techniques seem naturally suited to parallel computer architectures, except at the subdomain interfaces. Gottlieb et al. [19] investigated in depth the influence of interface boundary conditions on the possibility of parallelizing pseudospectral multidomain algorithms.

The subdomains are connected by "patching" if the solutions in different elements are matched along their common boundary by requiring that U(x) and a finite number of its derivatives are equal along the interface [5]. Since explicit numerical boundary conditions must be used, the Chebyshev method is the most common choice.

The number of matching conditions is the number that is necessary to give a unique solution in each subdomain. Thus, one must match both $U(x)$ and $\partial U/\partial x$ at the interface for a second order differential equation. For elliptic equations or other steady state equations, patching is usually straightforward. Boyd [5] has given a very efficient way to generate the weak coupling of element solutions in the following example:

$$\text{Equation:} \quad \mathcal{L}u = f(x), \qquad u(a) = \alpha, \quad u(b) = \beta \qquad (2.4.1)$$

$$\mathcal{L} = q_2(x)\frac{\partial^2}{\partial x^2} + q_1(x)\frac{\partial}{\partial x} + q_0(x)$$

Instead of just two subintervals, divide the interval $x \in [a, b]$ into M subdomains. Let $d_j$ denote the boundary between element j and element (j+1), so that element j occupies $[d_{j-1}, d_j]$, and define:

$$U_j \equiv u(d_j)$$

The solution on the jth element, $u_j(x)$, can always be written as the sum of a particular integral $P_j(x)$ plus two homogeneous solutions, defined by:

$$\mathcal{L}P_j(x) = f \quad \text{and} \quad P_j(d_{j-1}) = P_j(d_j) = 0 \qquad (2.4.2)$$

$$\mathcal{L}h_{L,j} = 0 \quad \text{and} \quad h_{L,j}(d_{j-1}) = 1, \quad h_{L,j}(d_j) = 0 \qquad (2.4.3)$$
$$\mathcal{L}h_{R,j} = 0 \quad \text{and} \quad h_{R,j}(d_{j-1}) = 0, \quad h_{R,j}(d_j) = 1$$

It follows that the general solution to the differential equation can be written as

$$u_j(x) = P_j(x) + U_{j-1}\, h_{L,j}(x) + U_j\, h_{R,j}(x) \qquad (2.4.4)$$

Thus, all particular integrals and homogeneous solutions can be computed independently of the solutions on all the other elements.
The boundary value problems defined by (2.4.2), (2.4.3) are completely uncoupled. The final element solutions u_j(x) are then determined by (2.4.2), (2.4.3) plus the following continuity condition on the first derivative at each of the subdomain interfaces ((2.4.2), (2.4.3) already guarantee continuity of u(x) itself):

h′_{L,j}(d_j) U_{j−1} + [h′_{R,j}(d_j) − h′_{L,j+1}(d_j)] U_j − h′_{R,j+1}(d_j) U_{j+1} = P′_{j+1}(d_j) − P′_j(d_j)   (2.4.5)
j = 1, …, M−1;   U_0 = α, U_M = β

The matrix expressed above is tridiagonal, because only four different homogeneous solutions, two for element (j−1) and two for element j, enter the continuity condition in each row of (2.4.5). Therefore, the cost of solving this continuity condition is small.

Chapter 3 Pseudospectral Polynomial Time-Marching Techniques

The pseudospectral methods provide a very useful tool for the solution of time-dependent differential equations. A standard scheme uses spectral methods to approximate the space derivatives and a finite difference approach to march the solution in time. This tactic results in an unbalanced scheme. Some alternative time-marching techniques have been introduced which approximate the time dependence of PDE solutions by using orthogonal polynomials or Faber polynomials in powers of the spatial differential operator. These time-marching techniques can be used to evaluate the numerical solutions to spectral accuracy in both the space and time dimensions.

3.1 Evolution of the Time Dependence Using Chebyshev Polynomials

3.1.1 Algorithm

Consider the equation

U_t − GU = 0,   0 ≤ x ≤ 2π   (3.1.1)
U(x, 0) = U_0(x)

where G is a linear spatial differential operator (for example, G = a ∂/∂x). Here one can assume periodic boundary conditions for the problem, so that (3.1.1) can be discretized in space by using the pseudospectral Fourier method outlined below [39]:

∂U_N/∂t − P_N G P_N U_N = 0   (3.1.2)
U_N(x, 0) = P_N U_0(x)

where, for any function f(x), P_N f(x) is its trigonometric interpolant at the collocation points

x_j = πj/N,   j = 0, 1, …, 2N−1   (3.1.3)

More precisely (the same as (2.2.4)),

P_N f(x) = Σ_{n=−N}^{N−1} a_n e^{inx}   (3.1.4)

where the coefficients a_n are defined in (2.2.3). 
The solution of (3.1.2) is given by

U_N(x, t) = exp(t P_N G P_N) U_N(x, 0)   (3.1.5)

Except for a very simple operator G, it is impossible to construct the exponential matrix exp(t P_N G P_N) explicitly. Usually an approximation to the exponential is used.

According to the discussion in Chapter 2, the matrix G_N = P_N G P_N is antisymmetric and therefore has a complete set of 2N eigenvectors. Let H_m(G_N t) be a polynomial approximation of exp(G_N t) of the form

H_m(G_N t) = Σ_{k=0}^{m} b_k (G_N t)^k   (3.1.6)

Tal-Ezer [39] has shown that

||exp(G_N t) − H_m(G_N t)|| ≤ max_{1≤k≤2N} |e^{λ_k t} − H_m(λ_k t)| ≤ max_z |e^{zt} − H_m(zt)|   (3.1.7)

where λ_k, k = 1, 2, …, 2N, are the eigenvalues of G_N and

z ∈ [−ia(N−1), ia(N−1)]

We therefore seek a polynomial approximation with real coefficients to the function e^{zt} that will minimize the right-hand side of (3.1.7). Define

R = |a|(N−1)   (3.1.8)

Then one can find the Chebyshev polynomial approximation

H_m(zt) = Σ_{k=0}^{m} C_k J_k(Rt) Q_k(z/R)   (3.1.9)

where C_0 = 1, C_k = 2 for k ≥ 1, J_k(Rt) is the Bessel function of order k, and the Q_k are the modified Chebyshev polynomials in z/R defined by

Q_k(w) = (i)^k T_k(−iw),   w ∈ [−i, i]   (3.1.10)

Thus, using the recurrence relation satisfied by the Chebyshev polynomials,

T_{k+1}(x) = 2x T_k(x) − T_{k−1}(x),   T_0(x) = 1, T_1(x) = x   (3.1.11)

it is easily verified that Q_k(w) satisfies the following recurrence relation:

Q_{k+1}(w) = 2w Q_k(w) + Q_{k−1}(w),   Q_0(w) = 1, Q_1(w) = w   (3.1.12)

3.1.2 Accuracy, Resolution and Convergence

The accuracy of a polynomial approximation is characterized by its asymptotic rate of convergence as m (the degree of the polynomial) tends to infinity. Let the interval of asymptotic behavior of the polynomial be denoted by [m_0, ∞). Then the degree of the polynomial has to be greater than or equal to m_0 in order to have proper time resolution. In [39], Tal-Ezer defined the condition of having meaningful resolution as one in which m ≥ m_0 and the relative error norm is less than 10%. More precisely, assuming that the minimal m ∈ [m_0, ∞) which achieves this accuracy is m̃, Tal-Ezer defined the necessary and sufficient condition for resolution as m ≥ m̃. 
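The expansion (3.1.9) with the recurrence (3.1.12) can be checked on a single scalar eigenvalue. The Python sketch below takes a purely imaginary z = iλ with |λ| ≤ R, sums C_k J_k(Rt) Q_k(z/R), and compares against exp(zt); the values of λ, R, t and the truncation m are illustrative assumptions, not parameters from the thesis.

```python
# Scalar check of the Chebyshev-Bessel expansion of exp(z t), z imaginary.
import numpy as np
from scipy.special import jv   # Bessel function J_k of the first kind

def cheb_exp(z, R, t, m):
    w = z / R
    q_prev, q = 1.0 + 0j, w                    # Q_0(w) = 1, Q_1(w) = w
    total = jv(0, R * t) * q_prev + 2 * jv(1, R * t) * q
    for k in range(2, m + 1):                  # Q_{k+1} = 2 w Q_k + Q_{k-1}
        q_prev, q = q, 2 * w * q + q_prev
        total += 2 * jv(k, R * t) * q
    return total

lam, R, t = 2.0, 4.0, 1.0
approx = cheb_exp(1j * lam, R, t, m=30)
print(abs(approx - np.exp(1j * lam * t)))      # error near machine precision
```

Once m exceeds Rt the Bessel coefficients J_k(Rt) decay exponentially, which is exactly the resolution behavior discussed next.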
Applying this condition to the present approximation problem means that one has to apply the spatial operator tG_N at least m_0 times in order to resolve N modes of the exact solution of (3.1.1) at time level t:

U_N(x, t) = exp(tG_N) U_N(x, 0) ≈ Σ_{k=0}^{m_0} C_k J_k(tR) Q_k(G_N/R) U_N(x, 0)   (3.1.13)

It is known that the Bessel function J_k(tR) converges to zero exponentially as k increases beyond tR. This implies that the interval of asymptotic behavior is [m_0, ∞) with m_0 = [tR]. Hence, to resolve N modes for the problem (3.1.2), one has to apply the spatial operator at least |at(N−1)| times. In contrast, in order to get a resolution of N modes by the leap-frog central-differencing time-marching scheme,

U^{n+1} = U^{n−1} + 2Δt G_N U^n   (3.1.14)

one has to operate with G_N at least (5/2)^{1/2} |taN|^{3/2} times [39]. A detailed analysis of the time resolution in standard finite difference time-marching schemes can be found in [26].

It is clear from the resolution discussion that resolution implies stability. In fact, since H_m(tG_N) approximates e^{tG_N}, whose norm is bounded by a constant, H_m(tG_N) must be bounded independently of m and N.

Obviously, due to the stability limitation, the leap-frog scheme has to be iterated many times with a small time step Δt to reach the time level t. In contrast, the Chebyshev polynomial approximation has no such limitation: all the |ta(N−1)| operator applications can be evaluated in one time step Δt = t. According to the property of the Bessel function

J_m(mχ) ≤ α^m   (|χ| < 1),   α = χ exp(√(1−χ²)) / (1 + √(1−χ²))

Tal-Ezer has proven in [39] that the error of the Chebyshev approximation scheme in time converges to zero as

α^m → 0 as m → ∞,   0 ≤ α < 1   (3.1.15)

Hence, this scheme has spectral accuracy both in time and space. In practice, if m is too large, the coefficients of Q_k in the polynomial may exceed machine precision, thus imposing a practical limitation.

In the scalar variable-coefficient (still one-dimensional) case, the operator G is

G = a(x) ∂/∂x   (3.1.16)

Now the new discretized operator matrix is

G_N = A_N · D_N   (3.1.17)

where A_N is a diagonal matrix,

(A_N)_{jj} = a(x_j)

and D_N approximates the derivative spectrally. 
It is clear that G_N is no longer a normal matrix. By a technique different from that of the constant-coefficient case, it has been proven in [39] that the main results of the Chebyshev scheme are still valid as long as the wave velocity a(x) does not change its sign in the interval. Now max|a(x)|, x ∈ [0, 2π], should replace the constant a of the previous case.

3.1.3 Second-order Case

Etgen [12] has extended Tal-Ezer's results to the second-order wave equation:

∂²U/∂t² = −L²U   (3.1.18)

where L² is a second-order spatial differential operator. The standard second-order finite difference approximation is expressed as the sum of two Taylor series in [12]:

∂²U/∂t² ≈ (1/Δt²)[U(t + Δt) − 2U(t) + U(t − Δt)]   (3.1.19)

with

U(t + Δt) − 2U(t) + U(t − Δt) = 2[ (Δt²/2!) ∂²U/∂t² + (Δt⁴/4!) ∂⁴U/∂t⁴ + (Δt⁶/6!) ∂⁶U/∂t⁶ + … ] = 2[cos(ΔtL) − I]U(t)

By expressing (3.1.7) in terms of the real and imaginary parts, one has

|e^{zt} − H_m(zt)|² = |cos(−izt) − H_m^R(−izt)|² + |sin(−izt) − H_m^I(−izt)|²   (3.1.20)

where H_m(zt) = H_m^R(−izt) + i H_m^I(−izt). According to (3.8) in [39]:

H_m^R(tL) = J_0(Rt) + 2 Σ_{k=1}^{m/2} (−1)^k J_{2k}(Rt) T_{2k}(L/R)   (3.1.21)
H_m^I(tL) = 2 Σ_{k=0}^{m/2} (−1)^k J_{2k+1}(Rt) T_{2k+1}(L/R)

The cosine of a differential operator acting on a function can thus be represented by an orthogonal polynomial series expansion:

cos(tL) = Σ_{k=0}^{m/2} C_{2k} J_{2k}(tR) Q_{2k}(L/R)   (3.1.22)

where C_{2k}, J_{2k} and Q_{2k} are defined as in (3.1.9). In particular, based on (3.1.12), the Q_{2k} satisfy the following recurrence relation:

Q_0(L/R) = I,   Q_2(L/R) = 2L²/R² + I   (3.1.23)
Q_{2k+2}(L/R) = (4L²/R² + 2I) Q_{2k}(L/R) − Q_{2k−2}(L/R)

where I is the identity operator. In [12], Etgen also gives the formal solution to the wave equation (3.1.18) for an initial value problem as:

U(t) = cos(tL) U(0) + L^{−1} sin(tL) ∂U(0)/∂t   (3.1.24)

This equation can easily be used to advance a solution of the wave equation in time. The time-updating operator for any time level can be found by shifting the initial conditions to other values of t:

U(t + Δt) = −U(t − Δt) + 2 cos(ΔtL) U(t)   (3.1.25)

Theoretically, there is no limitation on the time step Δt in this scheme. However, due to the resolution requirement, one has to apply the spatial operator at least RΔt times. 
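The cosine expansion (3.1.22)–(3.1.23) can also be checked on a single mode. On one eigenmode the collocated operator L² acts as multiplication by a negative number −μ², so the series should reproduce cos(tμ). The Python sketch below applies the Q_{2k} recurrence directly to that scalar; μ, R, t and the truncation are illustrative assumptions, not values from the thesis.

```python
# Scalar check of the Bessel-Chebyshev expansion of cos(t L), eq. (3.1.22).
import numpy as np
from scipy.special import jv

def cheb_cos(l2, R, t, kmax):
    """Approximate cos(t*mu) given an eigenvalue l2 = -mu**2 of L^2."""
    w2 = l2 / R**2
    q_prev = 1.0                    # Q_0 = I
    q = 2.0 * w2 + 1.0              # Q_2(L/R) = 2 L^2/R^2 + I
    total = jv(0, t * R) * q_prev + 2 * jv(2, t * R) * q
    for k in range(2, kmax + 1):    # Q_{2k+2} = (4 L^2/R^2 + 2 I) Q_{2k} - Q_{2k-2}
        q_prev, q = q, (4.0 * w2 + 2.0) * q - q_prev
        total += 2 * jv(2 * k, t * R) * q
    return total

mu, R, t = 3.0, 5.0, 1.0
approx = cheb_cos(-mu**2, R, t, kmax=25)
print(abs(approx - np.cos(mu * t)))   # converges spectrally once 2k > t*R
```

In the actual time-marching, the same recurrence is applied with the collocated operator matrix L² in place of the scalar −μ², so one time step of (3.1.25) costs roughly RΔt applications of the spatial operator.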
In other words, the highest degree of the evaluated polynomial should be at least of order RΔt. Therefore, the largest practical Δt is limited by the highest-order polynomial that can be evaluated within a given machine precision.

3.2 Approximating the Time Dependence by Complex Polynomials

The main limitation of the orthogonal polynomial time-marching scheme is that it requires a normal or skew-symmetric derivative matrix whose eigenvalues are purely real or purely imaginary. Therefore, a more general polynomial time-marching algorithm needs to be introduced.

3.2.1 Polynomial Interpolation Algorithm

Basically, the method described in the last section is an approximation of the time dependence of the solution, which can be represented as f(A) where A is a finite operator. Approximating f(A) can be reduced to the problem of approximating f(z) where z belongs to a domain D in the complex plane. This domain includes all the eigenvalues of A [40, 41]. A possible approach is to expand the function as a sum of orthogonal polynomials. In the last section, it has been shown that this approach can be used efficiently for the function f(z) = exp(tz) when the domain is part of the imaginary (or the real) axis. For more complicated domains, this method fails to demonstrate its effectiveness and efficiency. Thus, Tal-Ezer [40, 41] suggests an alternative approach based on complex polynomial interpolation for topologically complex eigenvalue domains.

Consider a domain D in the complex plane C. D is a bounded continuum whose complement is simply connected in the extended plane and contains the point at ∞. As concluded in [40, 41, 10], a general approximation to a function f(z) on an arbitrary such domain D is given by a Faber polynomial expansion. Let φ(z) be a conformal mapping from z to w that maps the complement of D to the complement of a disk of radius ρ such that

lim_{z→∞} φ(z)/z = 1   (3.2.1)

Here ρ is called the logarithmic capacity, or the transfinite diameter, of D. 
The conformal mapping φ(z) and its inverse mapping ψ(w) are illustrated in Figure 3.1.

Figure 3.1 Conformal Mappings for the Newton-form Polynomial Interpolation

If the Laurent expansion at ∞ of [φ(z)]^m is

[φ(z)]^m = z^m + c_{m−1} z^{m−1} + … + c_1 z + c_0 + c_{−1} z^{−1} + …   (3.2.2)

then the Faber polynomial of degree m, F_m(z), which corresponds to the domain D is the polynomial part of (3.2.2):

F_m(z) = z^m + c_{m−1} z^{m−1} + … + c_0   (3.2.3)

with

c_j = (1/2πi) ∮_{|z|=R} [φ(z)]^m / z^{j+1} dz   (3.2.4)

where R is chosen sufficiently large that D is contained in the interior of the region bounded by the circle |z| = R. If f is a function that is analytic at every point of D, then f can be approximated as:

f(z) ≈ Σ_k a_k F_k(z)   (3.2.5)

The coefficients a_k are

a_k = (1/2πi) ∮_{|w|=r} f(ψ(w)) / w^{k+1} dw   (3.2.6)

where ψ(w) is the inverse of φ(z) and r > ρ is sufficiently small that f can be extended analytically to J_r (J_r is the closed Jordan region with boundary Γ_r, where Γ_r is the image under ψ of the circle |w| = r). Based on (3.2.5), the matrix function f(A) can be approximated by the Faber polynomial expansion:

f(A) ≈ Σ_k a_k F_k(A)   (3.2.7)

When D is an ellipse or, as a special case, a segment of the real or imaginary axis, F_k is the translated and scaled Chebyshev polynomial (Markushevich in [32]).

When D is a more complicated domain, the generalized Faber polynomial satisfies a k-term recurrence relation instead of the 3-term recurrence of the Chebyshev polynomial. Thus it is more difficult to compute the F_k. To overcome this major drawback, Tal-Ezer has proposed the Newton-form interpolation in [40, 41], which simplifies the problem as outlined below.

First, proper interpolation points for a general domain D can be chosen in two feasible ways:

a. the m zeros of F_m(z);
b. z_j = ψ(w_j), j = 1, …, m+1, where the w_j are the m+1 roots of the equation w^{m+1} = ρ^{m+1}.

These points are called Fejér points.

Since interpolating at z_j = ψ(w_j), j = 0, …
, m, does not involve computing Faber polynomials, it is a simpler and more efficient approach to the approximation problem. The interpolating polynomial in Newton form is

P_m(z) = Σ_{k=0}^{m} a_k R_k(z)   (3.2.8)

where a_k is the divided difference of order k [1]:

a_k = f[z_0, …, z_k] = Σ_{j=0}^{k} f_j / Π_{l=0, l≠j}^{k} (z_j − z_l),   f_j = f(z_j),   k = 0, …, m   (3.2.9)

and

R_0(z) = 1,   R_k(z) = Π_{j=0}^{k−1} (z − z_j),   k = 1, …, m   (3.2.10)

In order to prevent overflow or underflow while computing the divided differences, the domain should be scaled such that its logarithmic capacity ρ becomes equal to 1:

z̃ = z/ρ   (3.2.11)

Hence

f̃(z̃) = f(z) = f(ρ z̃)   and   P̃_m(z̃) ≈ f̃(z̃)   (3.2.12)

where

P̃_m(z̃) = Σ_{k=0}^{m} b_k R̃_k(z̃),   R̃_k(z̃) = Π_{j=0}^{k−1} (z̃ − z̃_j)   (3.2.13)

Then one can approximate the operator function f(A) by P̃_m(Ã), where

P̃_m(Ã) = b_0 I + b_1 (Ã − z̃_0 I) + b_2 (Ã − z̃_0 I)(Ã − z̃_1 I) + … + b_m (Ã − z̃_0 I) ⋯ (Ã − z̃_{m−1} I)   (3.2.14)

with Ã = A/ρ, and ρ the logarithmic capacity of the domain D.

3.2.2 Rate of Convergence

Before considering the convergence of the Newton-form interpolation approximation, one needs the following definitions [40, 41]:

"Definition 1: Let Γ_R be the image under ψ of the circle |w| = R (R > ρ), and J_R the closed Jordan region whose boundary is Γ_R. If f(z) is single-valued and analytic on J_R, then the sequence of polynomials P_m(z) is said to converge to f(z) on D maximally if

|f(z) − P_m(z)| ≤ C (ρ/R)^m,   z ∈ D   (3.2.15)

where C depends on ρ and R but not on m or z."

"Definition 2: A set of points {z_j^{(m)}} is said to be uniformly distributed on ∂D (the boundary of D) if

lim_{m→∞} |R_m(z)|^{1/m} = ρ   (3.2.16)

where R_m(z) = Π_{j=0}^{m−1} (z − z_j)."

It has been shown in [32] that either the m zeros of F_m(z) or the m Fejér points are uniformly distributed on ∂D. According to the theorem in [58], if one chooses these uniformly distributed points {z_j^{(m)}} as the interpolation points, then P_m(z) converges maximally to f(z) on D. 
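The Newton form (3.2.8)–(3.2.14) can be sketched in a few lines. In the Python example below the interpolation points are Chebyshev points on [−1, 1] (a stand-in for the Fejér points of a general domain D), f(z) = exp(z), and A is a small diagonal matrix so the exact answer is known; all of these choices are illustrative assumptions, not the thesis configuration.

```python
# Newton-form approximation of f(A) v via divided differences.
import numpy as np

def divided_differences(z, fz):
    """Newton divided-difference coefficients a_k = f[z_0, ..., z_k]."""
    a = np.array(fz, dtype=complex)
    n = len(z)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (z[i] - z[i - j])
    return a

def newton_matrix_apply(A, v, z, a):
    """Evaluate P_m(A) v = sum_k a_k (A - z_0 I) ... (A - z_{k-1} I) v."""
    u = a[0] * v
    r = v.astype(complex)
    for k in range(1, len(z)):
        r = A @ r - z[k - 1] * r      # r <- (A - z_{k-1} I) r
        u = u + a[k] * r
    return u

m = 24
z = np.cos(np.pi * np.arange(m + 1) / m)     # interpolation points on [-1, 1]
a = divided_differences(z, np.exp(z))
A = np.diag(np.array([0.7, -0.4, 0.1]))      # spectrum inside [-1, 1]
v = np.ones(3)
approx = newton_matrix_apply(A, v, z, a)
print(np.abs(approx - np.exp(np.diag(A)) * v).max())   # tiny interpolation error
```

Only matrix-vector products with A appear, which is what makes the scheme attractive for large collocated operators; the scaling (3.2.11) and a careful point ordering (Section 3.2.3) keep the divided differences from over- or underflowing.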
Thus, one has

ρ/R = the asymptotic rate of convergence   (3.2.17)

Based on complex variable theory, Tal-Ezer has also shown that if f(z) is an entire function, (3.2.15) is satisfied for arbitrary R. Consequently, the degree m of the polynomial P_m(z) can be determined in the following ways:

1. Obtaining the parameters R (the disc radius) and ρ (the logarithmic capacity), analytically or numerically, and choosing m such that

(ρ/R)^m ≤ ε   (3.2.18)

where ε is the desired accuracy.

2. Computing the error numerically on a set of check points on the boundary for different m's, and choosing the m that satisfies the desired accuracy.

This method can provide an accurate value of m only when A is a normal matrix [40]. When A is "far" from normality, m should be increased by an amount that is not known beforehand. Therefore, it is preferable in this case to use the infinite set of points generated by the second algorithm, which will be described later.

3.2.3 Computational Considerations

In the Newton form of polynomial interpolation, computing the divided differences is very vulnerable to round-off errors and overflow. Besides scaling the logarithmic capacity to 1, the interpolation points should be arranged in a proper order: if successive points are very close to each other, severe round-off errors or overflow occur in computing the divided differences.

Tal-Ezer gives two possible algorithms in [40, 41] to generate a set of interpolation points which do not cause severe round-off errors or overflow.

Fejér point generating Algorithm 1: Set θ_1 = 0, j = l = 1, δθ = 2π/m; find the largest k such that k is a power of two and kδθ < π ≤ 2kδθ. Then the algorithm proceeds as follows:

(1) For i = 1 until l do
      θ̃ = θ_i + kδθ
      if (θ̃ ≥ π) goto (2)
      j = j + 1
      θ_j = θ̃
(2) end do
    l = j
    k = k/2
    if (k < 1) stop
    goto (1)

This part of the algorithm produces arguments of points on the upper part of the unit circle. 
Thus

z_1^{(m)} = ψ(e^{iθ_1}),   z_{2j}^{(m)} = ψ(e^{iθ_{j+1}}),   z_{2j+1}^{(m)} = conj(z_{2j}^{(m)}),   1 ≤ j ≤ (m−1)/2   (3.2.19)

Here ψ(w) is the conformal mapping from the complement of the unit disc to the complement of the domain D. The set of points {z_j^{(m)}} described above is equally distributed on the boundary of the domain D for any degree m. It has the disadvantage that m must be decided before starting the iterations. The next algorithm generates an infinite set of points free of this disadvantage.

Fejér point generating Algorithm 2:

δθ = π
k = 1
z_0 = ψ(1)
θ_1 = 0
(1) For i = 1 until k do
      θ_{k+i} = θ_i + δθ
      z_{k+i−1} = ψ(e^{iθ_{k+i}})
    end do
    δθ = δθ/2
    k = 2k
    goto (1)

The points generated by Algorithm 2 are asymptotically equally distributed; an even distribution is achieved only when m is a power of two. The uneven distribution of points in Algorithm 2 is the price one pays for the ability to add additional points. In practice, one stops the iterations of Algorithm 2 at some degree m which satisfies the desired accuracy; otherwise, one can simply add more points to the existing set by this algorithm, without recomputing all the points, until the accuracy is satisfied. In general, one can expect the interpolation based on Algorithm 1 to be more accurate.

When the operator matrix A is real, f(z) takes real values for real z and satisfies f(z̄) = conj(f(z)); Tal-Ezer [40] has proven that the approximation algorithm can then be modified to eliminate complex arithmetic.

Approximation algorithm with real arithmetic, w_m ≈ P_m(A)v:   (3.2.21)

u = [a_0 I + a_1 (A − x_0 I)] v
r = (A − x_0 I)(A − x_1 I) v
For i = 1, 2, … do
    s = (A − x_{2i} I) r
    u = u + a_{2i} r + a_{2i+1} s
    if ||r|| / ||u|| ≤ ε goto (1)
    r = (A − x_{2i} I) s + y_{2i}² r
end do
(1) w_m = u

where x_k and y_k are respectively the real and imaginary parts of the interpolation point z_k (the points enter in conjugate pairs, so that (A − z_k I)(A − z̄_k I) = (A − x_k I)² + y_k² I involves only real arithmetic).

The only difficulty in finding the interpolation points is obtaining the conformal mapping ψ(w) and its inverse φ(z). For a few simple domains, it is possible to find this function analytically. 
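The point ordering produced by Algorithm 2 can be sketched for the simplest case ψ(w) = w (the domain D being the unit disk itself). The Python example below generates the angle sequence; the halving of δθ means earlier points are never recomputed and successive points stay well separated, which is the property needed for stable divided differences.

```python
# Angle sequence of Fejer point generating Algorithm 2 for psi(w) = w.
import numpy as np

def fejer_angles(n_points):
    """Return the first n_points angles of the Algorithm-2 sequence."""
    theta = [0.0]
    dtheta, k = np.pi, 1
    while len(theta) < n_points:
        theta += [theta[i] + dtheta for i in range(k)]
        dtheta /= 2
        k *= 2
    return np.array(theta[:n_points])

angles = fejer_angles(8)
print(angles / np.pi)   # [0, 1, 1/2, 3/2, 1/4, 5/4, 3/4, 7/4]
```

Whenever the count reaches a power of two the accumulated points are exactly the corresponding roots of unity, i.e. a full Fejér set, as stated above.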
For more complicated domains, one has to resort to a numerical approach. When the domain D is a polygon, the mapping function is a Schwarz-Christoffel transformation. The numerical routines described by Trefethen in [51] map the interior of the unit disk onto the interior of the polygon. In order to map the exterior of the unit disk to the exterior of the polygon when the polygon is symmetric about the real axis, Tal-Ezer [40, 41] has modified Trefethen's routines as outlined below.

One can treat the upper half of the exterior of the polygon as the interior of a polygon with a vertex at infinity. Then the desired mapping function is an interior Schwarz-Christoffel transformation composed with f(u), which maps the upper half of the exterior of the unit disk onto the interior of the unit disk. The f(u) can be written as f(u) = f_2(f_1(u)), where

f_1(u) = (1/2)(u + 1/u)   (3.2.22)
f_2(u) = (u − u_a)/(u − u_b)

Here u_a is a point on the unit circle and u_b is a point in the upper half plane; these points determine the orientation of the mapping.

One thus constructs the composed conformal mapping from the complement of the unit disk to the complement of the domain D. In general, this idea can be used for any domain that can be divided into two symmetric parts, simply by rotating it so that the line of symmetry matches the real axis.

Since the Faber series expansion deals with general complex-domain approximations of the time dependence operator f(A) of the solution, one can expect this method to handle problems with a more complicated operator matrix A. For example, as discussed in the previous chapter, in the non-periodic case Chebyshev collocation rather than the Fourier pseudospectral method must be used. According to (2.3.8), the Chebyshev derivative operator may be neither normal nor skew-symmetric. Therefore, the approximation of the time-dependent symbolic solution has to be based on the Newton-form interpolation rather than the Chebyshev expansion. 
The Newton-form interpolation is just one computational implementation of the Faber series expansion theory. With these complex-domain approximation methods, the time dependence of the solutions to the PDEs can be approximated efficiently as long as the eigenvalue distributions of the spatial differential operators are available. Thus, the new time-marching technique based on the complex-domain polynomial expansion can be applied to many time-evolution problems.

3.3 Numerical Results for Time-Marching in the Periodic Domain

Some simulations have been done to demonstrate the accuracy of the new polynomial time-marching techniques in the periodic domain. One-dimensional wave propagation in a slowly-varying medium with damping has been simulated by the Fourier pseudospectral time-marching, and the results are compared with the corresponding analytical solution. The well-known d'Alembert solution is used here to check the one-dimensional homogeneous wave propagation simulation. A simple-shape two-dimensional scattering simulation is also performed via the Fourier pseudospectral time-marching algorithm.

3.3.1 Variable Wave Velocity and Damping in the Pseudospectral Time-Marching Scheme

In equation (3.1.18), the operator L actually includes not only a spatial differential operator but also the wave-velocity distribution of the medium. It has been proven in [39] that when the wave-velocity distribution does not change sign in the computational domain, the eigenvalues of L are still purely imaginary. Therefore, all the foregoing conclusions about the polynomial expansion remain valid.

For media with constant damping, consider the following first-order hyperbolic equation model:

∂U/∂t + LU = −αU   (3.3.1.1)

Here α is the damping factor. The symbolic solution to (3.3.1.1) is

U(t) = e^{−(αI + L)t} U(0)   (3.3.1.2)

where I is the identity operator. 
Because α is constant, αI commutes with L; following [33], one has:

e^{(αI + L)t} = Σ_{n=0}^{∞} (t^n/n!) (αI + L)^n
            = Σ_{n=0}^{∞} (t^n/n!) Σ_{k=0}^{n} [n! / (k!(n−k)!)] α^k L^{n−k}
            = ( Σ_{j=0}^{∞} ((αt)^j/j!) I ) ( Σ_{k=0}^{∞} (tL)^k/k! )
            = e^{αt} e^{Lt}   (3.3.1.3)

Therefore, (3.3.1.2) can be written as

U(t) = e^{−αt} e^{−Lt} U(0)   (3.3.1.4)

So one can separate the damping-factor term from the differential operator when α is a constant. For non-constant damping and complicated wave-velocity distributions, since the eigenvalues of αI + L are no longer purely imaginary or purely real, the corresponding polynomial expansion has to be done in the complex plane by the Faber series.

3.3.2 Chebyshev Polynomial Expansion Results

Some typical one-dimensional wave propagation and two-dimensional simple-shape scattering problems have been simulated.

The first problem is a simple one-dimensional wave equation in a homogeneous medium (with periodic boundary conditions):

∂²U/∂t² = C² ∂²U/∂x²,   U(0, x) = exp(−(x − x_0)²/(2σ²)),   U_t(0, x) = 0   (3.3.1)

where C is a constant and the initial distribution is a Gaussian pulse. The corresponding analytical solution is

U(t, x) = (1/2)[ exp(−(x + Ct − x_0)²/(2σ²)) + exp(−(x − Ct − x_0)²/(2σ²)) ]   (3.3.2)

The following computation is based on the parameters: wave velocity C = 1, time step Δt = 0.1, number of grid points 256 or 128, and axis length 6.

Figure 3.2 shows that the computational error increases as time increases, due to the recursive use of the spectral time-marching. Theoretically, it is possible to compute the solution in one time step of any size; but since the highest Chebyshev polynomial degree to be evaluated is governed by ΔtR, a time step which is too large requires very high-order Chebyshev polynomials whose coefficients need more digits than the machine precision provides.

In the above results, the errors appear to be reduced by reducing the number of grid points (from 256 to 128). Actually, it is the highest order of the truncated polynomial which directly affects the computational errors. When the grid number N is 256, ΔtR = 14 and the computation uses up to the 20th-order Chebyshev polynomial, i.e. 43% beyond ΔtR. 
As previously discussed, when k is beyond ΔtR, the Bessel function J_k(ΔtR) converges to zero exponentially. When the grid number N is 128, ΔtR = 7 and the computation uses up to the 14th-order polynomial, i.e. 100% beyond ΔtR. The results clearly show a higher accuracy for the latter, even though it has a lower spatial sampling frequency (i.e. fewer frequency modes).

The second problem is a one-dimensional equation with a linear wave-velocity distribution and constant damping:

∂U/∂t + C(x) ∂U/∂x = −αU   (3.3.3)
Thus, dueto the aliasing problem, the same spatial sampling rate (grid number) results in differentnumerical accuracies. The third model simulated by the new time-marching algorithmis a simple two-dimensional scattering problem without damping. The following Fig.3.5 and Fig. 3.6 are the two-dimensional initial value distribution and the wave velocitydistribution. The medium wave velocity is variable in the domain with wave velocitylower inside a disc region simulating a cylinder. Another appended figure (in Appendix37First Time Step (dtO.l).10th Time Step (t=10*dt=1)X1O____________________________6c(x)=0.01*x+0.1 A C(X)=0.01*X+0.1I 44 damping factor Oi damping factor 0.12 grid number 256 : d number 256XAxis XAxisFigure 3.3 Magnitude of Errors At Different Time Stepsx107 Time Step dt=0.l xlO-5 Time Step dt=O.5c(x)=0.1 *x+01damping factor 0.105 gridnumber256 J_______________1* 6ojj 6XAxis XAxisFigure 3.4 Magnitude of Errors For Different ‘lime Step Sizes (at 1st time step)A) illustrates one snapshot of the wave propagation and scattering procedure, in whichthe diffraction and reflection are obvious for this model. The refraction is relativelytrivial because the wave velocity inside the low-velocity region (0.2 inside the disc, 1.0outside).38Initial Pulsea,t•0IzIbOlY AxisFigure 3.5 Initial Wave Distribution Simulating an ImpulseWave Velocity DistributionFigure 3.6 Wave-Velocity Distribution Simulating a Low VelocityCylinder ( The interface is created by a Rapid Velicity Transition)Y AxIs39Chapter 4 Pseudospectral Polynomial Time-marchingfor Non-periodic Boundary Value ProblemsFor periodic problems or the other “behavior boundary” value problems, due to thesame boundary behavior of the corresponding spectral basis functions, the spectral operator automatically includes the boundary behavior. The Tal-Ezer time-marching technique,which is based on the approximation of the spectral operator function, can be directly applied. 
However, for non-periodic "numerical boundary" value problems, explicit boundary-condition expressions are decidedly needed. In the conventional finite-difference time-marching scheme, a so-called "boundary-bordering" technique (Boyd, 1986) [5] can be used to impose the boundary conditions explicitly on the spectral operator matrix. In this thesis, a technique is developed to incorporate homogeneous Dirichlet or Neumann boundary conditions implicitly into the Chebyshev collocated spatial operator. The eigenvalue distribution of the operator, which determines the time-marching polynomial, then automatically includes the effect of the boundary conditions. Imposing homogeneous boundary conditions in this way makes it possible to apply Tal-Ezer's advanced time-marching technique directly to homogeneous Dirichlet or Neumann boundary value problems [31].

4.1 Homogeneous Dirichlet Boundary Conditions

First, let us consider the Chebyshev collocated second-order derivative matrix with homogeneous Dirichlet boundary conditions. The second-order pseudospectral differentiation process can be described as follows:

(1) Interpolate the data at the interior Gauss-Lobatto grid points

x_j = cos(πj/N),   j = 1, …, N−1   (4.1.1)

by a Chebyshev polynomial p(x), which is constrained to satisfy homogeneous Dirichlet boundary conditions at the end points x_0 = +1 and x_N = −1:

p(x_0) = p(x_N) = 0   (4.1.2)

(2) Differentiate the interpolant twice to obtain estimates p″(x_j) of the second derivative of the data at each grid point.

Since the differentiation process is linear, it can be described by an (N−1)×(N−1) matrix (Weideman et al. 1988) [59]. This matrix is typically neither sparse nor symmetric, in contrast to the situation with finite differences. Once the boundary conditions (4.1.2) and the collocation points x_j are fixed, the differentiation matrix is implicitly determined. 
According to Gottlieb et al. [20], the entries of the Chebyshev collocated second-order differentiation matrix (without boundary conditions) can be calculated from

∂^p(P_N f)(x_k)/∂x^p = Σ_{j=0}^{N} f(x_j) (D_p)_{kj}   (4.1.3)

where

(D_1)_{kj} = (c̄_k / c̄_j) (−1)^{j+k} / (x_k − x_j)   (k ≠ j),   c̄_0 = c̄_N = 2, c̄_j = 1 (1 ≤ j ≤ N−1)
(D_1)_{jj} = −x_j / (2(1 − x_j²)),   (D_1)_{00} = (2N² + 1)/6 = −(D_1)_{NN}

and D_2 = (D_1)².

Under the homogeneous Dirichlet boundary constraints, one has

[u″_0, u″_1, …, u″_{N−1}, u″_N]^T = [D_2]_{(N+1)×(N+1)} [0, u_1, …, u_{N−1}, 0]^T   (4.1.4)

Since u_0 and u_N are always fixed at zero, there is no need to evaluate their derivatives. Following Trefethen et al. [54], and in analogy with the first-order differentiation matrix, one easily obtains

[u″_1, …, u″_{N−1}]^T = [D̃_2]_{(N−1)×(N−1)} [u_1, …, u_{N−1}]^T   (4.1.5)

where D̃_2 is the same matrix as D_2 but with the first and last rows and columns deleted (masked by the zero boundary conditions). Thus it is two ranks lower than the original D_2, in contrast to the two-ranks-higher situation of the "boundary bordering" scheme.

Even though it has been proven by Gottlieb et al. [21] that all eigenvalues of D̃_2 for collocation at Gauss-Lobatto points are real, distinct, and negative, Weideman et al. [59] discovered that only a proportion, 2/π, of the numerically computed eigenvalues are accurate approximations to those of the continuous problem. Fortunately, all one needs in the new time-marching scheme are just the eigenvalue bounds in the complex plane, rather than accurately computed eigenvalues. As described before, the domain D which determines the time-marching polynomials should include all the eigenvalues of the spatial differentiation matrix. In many applications, the time dependence of the solution (or the symbolic solution for time-marching) is an entire function, such as an exponential, sine or cosine. Thus, as long as all eigenvalues are located within D, the time-marching can be done properly no matter what the actual shape of the domain D. In practice, in order to compute the conformal mappings φ(z) and ψ(w) efficiently, the original eigenvalue domain sometimes needs to be re-scaled so that the conformal mapping is easier to construct; all eigenvalues nevertheless remain inside it. 
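The masking of Section 4.1 can be sketched numerically. The Python example below builds the Chebyshev first-derivative matrix D_1 on Gauss-Lobatto points (using the common "negative-sum trick" for the diagonal, equivalent to the explicit formulas in (4.1.3) up to rounding), squares it, and deletes the first and last rows and columns. For the continuous problem u″ = λu, u(±1) = 0, the eigenvalue closest to the origin is −π²/4; N and the checks are illustrative choices.

```python
# Dirichlet-constrained Chebyshev second-derivative matrix and its spectrum.
import numpy as np

def cheb_D1(N):
    """(N+1)x(N+1) Chebyshev collocation first-derivative matrix."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.ones(N + 1); c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))   # off-diagonal entries
    D -= np.diag(D.sum(axis=1))                            # diagonal: negative row sums
    return D, x

N = 24
D1, x = cheb_D1(N)
D2 = D1 @ D1
D2_tilde = D2[1:-1, 1:-1]        # delete rows/cols masked by u(+-1) = 0

eigs = np.linalg.eigvals(D2_tilde)
print(eigs.real.max() < 0)                                  # all negative
print(abs(eigs[np.argmin(np.abs(eigs))] + np.pi**2 / 4))    # lowest mode, ~1e-10
```

Consistent with [59], only the low-order eigenvalues match the continuous problem this well; the large spurious ones are harmless here because the time-marching only needs bounds on the spectrum.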
In practice, in order to efficiently compute the conformal mapping qS(z)and b(w), sometimes the original eigenvalue domain needs to be re-scaled so that it iseasier to construct the conformal mapping. However, all eigenvalues are still includedinside.4.2 Homogeneous Neumann Boundary ConditionsFor homogeneous Neumann boundary conditions, since there are no fixed functionvalues, but the fixed first order derivatives at the end points, imposing the boundaryconditions into the second-order differentiation matrix is not as straightforward as forDirichlet boundary conditions.42According to equation (4.1.3), one hasug=D1D (4.2.1)Uj UNfor the no-boundary constraint case. The homogeneous Neumann boundary conditionsat Gauss-Lobatto grid points areU’(XO) = u’(XN) = 0 (4.2.2)Then equation (4.2.1) can be re-written aso 0II 1 1 1 1U0 U1 aoo alN U1= . , (4.2.3a)4 4_ dNo •.. dNN 4_o 00dIN U0and = = •.. (4.2.3b)UN_i UN dNo dNN UN0where d13 = (D1) (4.2.3c)Obviously, when one calculates the first-order derivative, one only needs to evaluate[Uç,.. , U’N_iI. The first and the last rows of the differentiation matrix D1 in (4.2.3a)can be deleted. Thus, the first-order derivative evaluation becomesU’1 d1,0 U0= . (4.2.4)UN_i (N—1)xl dN_i,o ... dN_i,N (N—i)x(N--i) UN (N+1)xiFurthermore, the fixed zero first-order derivatives at end points x=+1 and xj=—l willmask out the first and the last columns of the D1 matrix in (4.2.3a). Therefore thesecond-order differentiation process becomesU1 d0,1 ... do,N_1= (4.2.5)I, IUN (N-i-i)xl UN,1 .•• UN,N_1 (N+i)x(N—1) UN_i (N—i)xi43From (4.2.1) — (4.2.5) one can note that the Chebyshev collocated second-orderderivative matrix with homogeneous Neumann boundary conditions at the end grid pointscan be expressed as:d0,1 • do,N_1[Dp](N+l)2= [ •.. 
This (N+1)×(N+1) matrix implicitly reflects both the Chebyshev collocated space differentiation at the interior Gauss-Lobatto points and the homogeneous Neumann boundary conditions at the end points.

By the foregoing manipulations, the homogeneous Dirichlet or Neumann boundary conditions are incorporated into the spectral operator. Therefore, the polynomial time-marching based on the eigenvalue characteristics of the spectral operator will automatically generate the boundary-constrained solution. For the Dirichlet boundary case, since the time-marching is evaluated only at interior points, the end-point values should be updated separately by the boundary conditions at every time step. For the Neumann boundary conditions, however, the function-value time-marching is evaluated at all grid points (D₂ᴺ is an (N+1)×(N+1) matrix), but with the first-derivative constraints incorporated; therefore, all grid-point values will be updated.

4.3 Numerical Examples

The first numerical example used to demonstrate the accuracy and efficiency of the polynomial time-marching for boundary value problems is a homogeneous wave equation with Dirichlet boundary conditions on the interval [-1, 1]:

    ∂²u/∂t² = ∂²u/∂x²,  -1 ≤ x ≤ 1,  t > 0,
    u(x, 0) = u₀(x),  u_t(x, 0) = 0,    (4.3.1)
    u(-1, t) = u(+1, t) = 0,  t ≥ 0.

In order to obtain a symbolic solution that is an entire function (so that the eigenvalue domain can be regularized), one can make a variable substitution. Setting u_t = v, the above equation becomes

    d/dt [u; v] = [0, I; ∂²/∂x², 0] [u; v],  [u; v]|_{t=0} = [u₀; 0],    (4.3.2)

with v = u_t and u(±1, t) = 0. Chebyshev collocation at the Gauss-Lobatto points (4.1.1) is represented as

    d/dt [u_j; v_j] = G_N [u_j; v_j],  j = 1, ..., N-1,  u₀ = u_N = 0,    (4.3.3)

where

    G_N = [0, I_{N-1}; D̃₂, 0],

I_{N-1} is an identity matrix of rank N-1, and D̃₂, defined in Section 4.1, is the Chebyshev collocation differentiation operator with homogeneous Dirichlet boundary conditions. The symbolic solution of equation (4.3.3) is

    [ū; v̄] = exp(G_N t) [ū₀; 0],    (4.3.4)

where ū = [u₁, ..., u_{N-1}]ᵀ and v̄ = [v₁, ..., v_{N-1}]ᵀ are the (N-1)×1 vectors of interior grid-point values.
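As an illustration only (this is not the thesis's Newton-form algorithm; a dense eigen-decomposition of G_N is used in its place, and NumPy, N=24, t=0.3 and u₀ = sin(πx) are assumptions of this sketch), the symbolic solution (4.3.4) can be evaluated directly and checked against the analytical solution u = cos(πt) sin(πx); the same run confirms, anticipating (4.3.5), that the eigenvalues of G_N are square roots of those of D̃₂:

```python
import numpy as np

def cheb(N):
    # Chebyshev collocation first-derivative matrix, equivalent to (4.1.3)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))
    return D, x

N = 24
D1, x = cheb(N)
D2t = (D1 @ D1)[1:N, 1:N]                 # Dirichlet-masked interior operator
Z, I = np.zeros((N - 1, N - 1)), np.eye(N - 1)
G = np.block([[Z, I], [D2t, Z]])          # G_N of (4.3.3)
# march u(x,0) = sin(pi x), u_t(x,0) = 0 to t = 0.3 via eigen-decomposition
u0 = np.sin(np.pi * x[1:N])
lam, V = np.linalg.eig(G)
w0 = np.concatenate([u0, np.zeros(N - 1)])
t = 0.3
w = (V @ (np.exp(lam * t) * np.linalg.solve(V, w0))).real
march_err = np.max(np.abs(w[:N - 1] - np.cos(np.pi * t) * u0))
# every eigenvalue g of G_N satisfies g^2 = some eigenvalue of D2t
mu = np.linalg.eigvals(D2t).astype(complex)
sqrel = max(np.min(np.abs(mu - gk**2)) for gk in lam) / np.max(np.abs(mu))
```

In the thesis's scheme, exp(G_N t) is instead approximated by a Newton-form polynomial in G_N, so that only matrix-vector products with the sparse G_N are needed.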
At this stage, the solution of equation (4.3.1) can be computed by the Newton-form interpolation time-marching technique. Even though the matrix G_N is twice as large as D̃₂, the actual computation is still concentrated on evaluating D̃₂, owing to the sparse structure of G_N. This operator matrix is far from normal; therefore, it is preferable to use the time-marching scheme based on an infinite set of interpolation points in the domain D. Because of the special structure of the operator G_N, its eigenvalues can easily be obtained from those of D̃₂. Suppose g is an eigenvalue of G_N; then

    G_N [X; Y] = [0, I_{N-1}; D̃₂, 0] [X; Y] = g [X; Y]    (4.3.5)

gives Y = gX and D̃₂X = gY, and finally D̃₂X = g²X. Thus, it can be concluded that the eigenvalues of the operator G_N are the square roots of those of D̃₂. The eigenvalue distribution bounds of G_N can therefore be obtained directly from the eigenvalue distribution of D̃₂, and an eigenvalue distribution domain D can be configured accordingly from these bounds.

According to the definition of D̃₂ in the previous section, the elements of the matrix operator G_N are purely real, and the exponential function satisfies the condition f(z̄) = conj(f(z)). Due to the proof in [40] (Tal-Ezer 1989), one can use the algorithm designed to carry out only real arithmetic operations; thus, the time-marching algorithm in (3.2.21) should be preferred.

Figs. 4.1 and 4.2 show the numerical results of the time-marching of problem (4.3.1); the initial pulse is a smooth polynomial pulse. The time-marching error is less than 10⁻⁵. The corresponding analytical solution is based on the d'Alembert solution combined with the method of images. The numerical results in the following figures show that under this time-marching scheme and error control factor, the space discretization error due to spectral sampling dominates in the solution.
This can be seen by comparing the maximum errors at each time step for the case N=64 in Figure 4.1 and N=31 in Figure 4.2. In both figures, the wave distribution at t=1 should be a flat line of zero; therefore the wave amplitude at t=1 is identical to the numerical computational error at t=1.

[Figure 4.1 Homogeneous Dirichlet Initial-Boundary Value Problem, N=64 (time step 0.005): wave amplitude and error at t=0.4, t=1, and t=1.6.]

[Figure 4.2 Homogeneous Dirichlet Initial-Boundary Value Problem, N=31 (time step 0.005): wave amplitude and error at t=0.4, t=1, and t=1.6.]

The numerical evidence listed in Table 4.1 shows that the stability limitation for the time step in the polynomial time-marching scheme is approximately O(1/N²). The experimental data in Table 4.1 are based on polynomial or Gaussian initial pulse distributions and 20 time-step calculations; the time-marching error control factor is less than 10⁻⁵.

Table 4.1 Numerical Stability Limitation for the Polynomial Time-marching Step Size

                      N=32             N=64              N=128
Dirichlet boundary    Δt_max = 0.25    Δt_max = 0.0625   Δt_max = 0.0156
Neumann boundary      Δt_max = 0.25    Δt_max = 0.0625   Δt_max = 0.0156

Since no theoretical stability analysis is available for the new polynomial time-marching algorithm, only a rough explanation and qualitative analysis can be presented here. Numerically, the major cause of the instability is the round-off error of the Newton-form interpolation points z_k. Since the maximum eigenvalue of the operator G_N Δt is affected by the time step Δt, the interpolation points z_k, which are distributed along the eigenvalue domain boundary, are also affected by the time step size.
If the time step is too large, the interpolation points, and thereby the divided-difference coefficients a_k, can become large enough to cause significant round-off errors. A similar situation occurs in Chebyshev polynomial time-marching (Luo and Yedlin 1993) [30].

The second numerical example is a homogeneous initial-boundary value problem with homogeneous Neumann boundary conditions:

    ∂²u/∂t² - ∂²u/∂x² = 0,  -1 ≤ x ≤ 1,  t > 0,
    u(x, 0) = cos(πx),  u_t(x, 0) = 0,    (4.3.6)
    u_x(-1, t) = u_x(+1, t) = 0,  t ≥ 0.

The analytical solution to the above equation is

    u(x, t) = cos(πt) · cos(πx).    (4.3.7)

The solution procedure is the same as for the Dirichlet boundary problem described before, except that the operator G_N is now defined as

    G_N = [0, I_{N+1}; D₂ᴺ, 0]_{(2N+2)²},    (4.3.8)

where D₂ᴺ is the Neumann-incorporated second-order operator of (4.2.6). When this problem is solved under the same experimental conditions (N=31, Δt=0.01, error control factor less than 10⁻⁵), the numerical results show the same order of accuracy as for the Dirichlet example.

4.4 General Homogeneous Boundary Conditions

The boundary-condition manipulation technique can be extended to the representation of more general cases. Suppose the wave equation is given with the general homogeneous boundary conditions shown below:

    ∂²u/∂t² = ∂²u/∂x²,  -1 ≤ x ≤ 1,  t > 0,
    u(x, 0) = u₀(x),  u_t(x, 0) = 0,    (4.4.1)
    αu ± β ∂u/∂x = 0 at x = ±1,  t ≥ 0,

where α and β are constants. Obviously, when α ≫ β, the boundary conditions approximate the homogeneous Dirichlet case; when β ≫ α, they approximate the homogeneous Neumann case. The boundary conditions are imposed at the Gauss-Lobatto points (4.1.1) in the same manner as before: the first and last rows of the first-order differentiation matrix are replaced so that the end-point derivative values carry the boundary coefficients, as given in (4.4.2)-(4.4.4). Here the matrix D₁ is defined as in (4.1.3). Thus, it can be concluded that the 2nd-order Chebyshev collocation differential operator, including the general homogeneous boundary conditions in (4.4.1), is given by

    [D₂ᴳ]_{(N+1)²} = [D₁]_{(N+1)²} · [B_αβ]_{(N+1)²},    (4.4.5)

where B_αβ agrees with D₁ in rows 1, ..., N-1, while its first row is (-β, 0, ..., 0) and its last row is (0, ..., 0, α), as dictated by (4.4.1). This general mixed boundary condition is extremely useful in
simulating wave-scattering problems in a domain with leaking walls, e.g. wave propagation in a waveguide.

4.5 The Two-Dimensional Case

In the two-dimensional case, the Chebyshev collocation is done along the x and y directions, and the eigenvalue analysis for the collocated derivative operator is more complicated. Suppose one represents the two-dimensional Laplace operator as

    𝓛₂U = ∂²U/∂x² + ∂²U/∂y² = D₂ˣ U + U D̂₂ʸ.    (4.5.1)

Here D₂ˣ and D̂₂ʸ include homogeneous boundary conditions and can be obtained through the procedure described in the previous sections. Instead of a one-dimensional vector, U is here a two-dimensional matrix with x-row, y-column orientation:

    U_{(N+1)×(M+1)} = [u₀₀ ... u₀M; ... ; u_N0 ... u_NM].    (4.5.2)

For this orientation, one has the following relationships for the operator matrices:

    D₂ˣ = D₂⁽ᴺ⁾,  D̂₂ʸ = (D₂⁽ᴹ⁾)ᵀ,    (4.5.3)

where D₂⁽ᴺ⁾ and D₂⁽ᴹ⁾ are the corresponding one-dimensional Chebyshev collocated 2nd-order derivative operators (including homogeneous boundary conditions) for N+1 and M+1 grid points, respectively.

To get the eigenvalue bounds of the operator 𝓛₂, one uses the Kronecker product and stacks the matrix U_{(N+1)×(M+1)} into a vector V:

    V = [u₀₀, ..., u₀M, u₁₀, ..., u_NM]ᵀ_{(N+1)(M+1)×1}.    (4.5.4)

The Kronecker product (sometimes called the direct product) is defined as [4]

    A_{m×n} ⊗ B_{p×q} = [a₁₁B ... a₁ₙB; ... ; a_m1B ... a_mnB].    (4.5.5)

Each submatrix in the above (mp)×(nq) matrix has dimensions p×q. The Kronecker product is distributive and associative [4]. Based on the description of Barnett [4], the matrix equation

    A_{n×n} X_{n×m} + X_{n×m} B_{m×m} = C_{n×m}    (4.5.6)

is equivalent to

    ([A ⊗ I_m] + [I_n ⊗ Bᵀ])_{nm×nm} {vec(X)}_{nm×1} = {vec(C)}_{nm×1}.    (4.5.7)

Here vec(·) represents the re-stacking procedure along the rows, as described in (4.5.4). Hence, one can rewrite (4.5.1) as

    vec(𝓛₂U) = (D₂ˣ ⊗ I_{M+1} + I_{N+1} ⊗ (D̂₂ʸ)ᵀ) [V]_{(N+1)(M+1)×1}.    (4.5.8)
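Both the vec identity (4.5.7) and the eigenvalue-sum property it implies for the operator in (4.5.8) can be verified on small random matrices (a sketch; NumPy, the matrix sizes, and the random seed are arbitrary assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # acts along rows (x), as A @ U
B = rng.standard_normal((3, 3))      # acts along columns (y), as U @ B
U = rng.standard_normal((4, 3))
# row-wise stacking, as in (4.5.4): vec(U) = U.reshape(-1)
K = np.kron(A, np.eye(3)) + np.kron(np.eye(4), B.T)
vec_err = np.max(np.abs(K @ U.reshape(-1) - (A @ U + U @ B).reshape(-1)))
# each pairwise sum lam_i(A) + mu_j(B) appears among the eigenvalues of K
lamK = np.linalg.eigvals(K)
sum_err = max(np.min(np.abs(lamK - (a + b)))
              for a in np.linalg.eigvals(A) for b in np.linalg.eigvals(B))
```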
According to the properties of the Kronecker product [4], if D₂ˣ has eigenvalues λᵢˣ, i = 0, 1, ..., N, and D̂₂ʸ has eigenvalues λⱼʸ, j = 0, 1, ..., M, then the Kronecker-product operator in (4.5.8) has the (M+1)(N+1) eigenvalues λ_k = λᵢˣ + λⱼʸ, k = 1, ..., (M+1)(N+1). Thus, the bounds of the two-dimensional Chebyshev collocation operator can be obtained directly from the bounds of the corresponding one-dimensional operators along X and Y. The eigenvalue bounds of the operator determine the size and the shape of the domain D, and thereby the logarithmic capacity and the distribution of the Fejér points used in (3.2.14).

It has been shown that the Chebyshev collocation approximation for spatial derivatives can be combined with the polynomial time-marching technique, provided that the corresponding boundary conditions are properly incorporated into the spectral operator. This extends the applicability of the new spectrally accurate polynomial time-marching technique to many non-periodic problems. Homogeneous boundary conditions have been successfully included within the corresponding Chebyshev collocation operator. Under this time-marching scheme, besides the significant improvement of the accuracy in one dimension, the largest time step size allowed by stability is O(1/N²). In the two-dimensional case, the eigenvalue bounds of the derivative operator incorporating boundary conditions can be obtained through the Kronecker-product properties. Although some implicit finite-difference time-marching schemes may allow larger time steps, this algorithm avoids the matrix inversion that such implicit schemes require.

Chapter 5 Pseudospectral Polynomial Time-marching for Non-reflecting Boundary Problems

When simulating wave propagation, it is usually essential to introduce artificial boundaries to limit the area of computation, owing to the finite memory of the computer.
The boundary conditions at these artificial boundaries are used to guarantee a unique and well-posed solution to the PDE problem within the computational domain. These artificial boundary conditions are expected to affect the solution in such a manner that it closely approximates the free-space solution which would exist in the absence of these boundaries; thus, the amplitudes of waves reflected from these artificial boundaries should be minimized.

In order to avoid (or reduce) the edge-reflection contamination from the computational domain boundaries, one may make the model sufficiently large that the arrival times of the edge reflections fall outside the time window of interest. Although this option is completely free of any edge contamination, it is very costly in terms of CPU time and memory. Another, more practical, type of choice is to implement non-reflecting and/or absorbing boundary conditions at the computational domain edges. "Non-reflecting" here means that a set of equations is imposed only at the grid points on the edges of the computational domain, equations that mathematically absorb almost all the outgoing energy [11, 38]. Unfortunately, according to Engquist et al. in [11], perfectly absorbing conditions necessarily have to be nonlocal in both space and time and thus are not useful for practical calculations. Hence, in practice, approximations have to be derived to approach the theoretical nonlocal boundary condition, at the cost of some energy being reflected back into the computational domain. Often an absorbing region is used to dampen the outgoing energy: the main domain is surrounded by a narrow damping strip which drastically dampens the traversing waves, with a damping factor that goes from zero in the interior of the strip to a maximum value at the edge of the whole model. This technique is more efficient than the large "free-space" one, but still involves a non-trivial amount of extra wave propagation.
A third alternative for absorbing the edge reflection is to set boundary conditions which cancel the edge reflection thoroughly [18]. As described in [18], Dirichlet and Neumann boundary conditions can be used to simulate rigid and free-surface conditions, respectively, and their combination can be used to cancel the first-order edge reflections effectively and completely. Unlike other schemes, it is independent of the incident angle. However, in order to obtain the combined solutions of both boundary conditions, 2^l wave propagation problems (l is the number of boundaries involved), one for each combination of Dirichlet and Neumann conditions applied, have to be solved and linearly superimposed. Therefore, this perfectly-absorbing scheme, although simpler, is also very costly in computing time.

In this chapter, the general approximation methods for non-reflecting boundary conditions are briefly described, and the method of incorporating the 2nd-order absorbing approximation equations at the edges into the new polynomial time-marching scheme is introduced.

5.1 Non-reflecting Boundary Condition Approximation

Consider the scalar wave equation

    w_tt = w_xx + w_yy,  t, x ≥ 0.    (5.1.1)

According to Engquist et al. [11], the perfectly absorbing boundary condition for the above equation at x=0 can be written in the form

    B_p w|_{x=0} = 0,    (5.1.2)

where B_p is a pseudodifferential operator whose symbol is built from a function p(ξ, ω); here p(ξ, ω) is a smooth function, homogeneous of degree zero for |ξ| + |ω| large, with support in ω² > ξ² for (ξ, ω) large, and identically one on a neighborhood of the support of ŵ(0, ξ, ω) (the Fourier transform of w(x, y, t) at x = 0).

Unfortunately, the perfectly absorbing boundary condition (5.1.2) is necessarily nonlocal in both space and time. It is impractical from a computational point of view, since advancing one time level at a single point requires information from all previous times over the entire boundary. Thus, many highly absorbing local approximations to (5.1.2) have been developed.
Necessarily, the boundary-condition approximations must satisfy the following two criteria [11]:

1. The boundary conditions are local. This is essential for reasonable control of the operation count.
2. The boundary conditions lead to a well-posed mixed boundary value problem for the wave equation. This condition guarantees the stability of the solution.

The first- and second-order approximations based on Taylor or Padé expansion are [11]:

1st-order approximation:

    (∂/∂x - ∂/∂t) w|_{x=0} = 0.    (5.1.3)

2nd-order approximation:

    (∂²/∂x∂t - ∂²/∂t² + (1/2) ∂²/∂y²) w|_{x=0} = 0.    (5.1.4)

Both of the above approximations have been verified to be well-posed in [11, 53]. The first-order one approximates absorption at normal incidence, and it is perfect in the one-dimensional case; in fact, (5.1.3) is the standard left-traveling one-way wave equation, which represents left-traveling-only propagation at the boundary x=0.

In [38] (Renaut 1992), Renaut reviewed some of the higher-order methods currently used for solving the absorbing boundary conditions for the two-dimensional scalar wave equation, such as Lindman [27] and Engquist and Majda [11]. She showed that the extension to higher orders is neither immediately obvious nor unique, as there are many different ways one can discretize the derived absorbing boundary condition. She also proposed a new high-order absorbing boundary condition approximation, which retains the 2nd-order spatial derivative as the highest-order derivative in the formula. For the two-dimensional scalar wave equation

    u_tt = c²(u_xx + u_yy),    (5.1.5)

the absorbing boundary condition at x=0 can be approximated by a set of equations (as opposed to one), of the schematic form

    u_t - c u_x = Σᵢ ψᵢ,    (5.1.6)

in which the auxiliary quantities ψᵢ are defined through the approximation coefficients p₀, αᵢ, βᵢ and are solved at the boundary. The derivation of p₀, αᵢ, βᵢ is described by Renaut in [38]. In this formulation, to increase the degree of approximation one only needs to add new coefficients and increase the loop index (i.e. increase the degree of interpolation, but not the highest-order derivative). This property is quite useful in finite-difference implementations.

In the polynomial time-marching scheme, due to the complication of the matrix manipulations, only the standard 2nd-order absorbing approximation (5.1.4) will be considered as an example of the absorbing boundary condition implementation in the new time-marching scheme.

5.2 Polynomial Time-marching With Non-reflecting Boundary Conditions in the One-dimensional Case

In the polynomial time-marching, as discussed in Chapter 4, one needs to incorporate the time-dependent absorbing boundary conditions into the spatial operator. Here, one can utilize the 1st-order time-derivative term in the vector form (4.3.2)-(4.3.4).
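Before building that operator, it is worth quantifying how absorbing the local conditions (5.1.3) and (5.1.4) actually are. Substituting an incident plane wave plus a reflected wave, w = exp(i(ωt + ξx + ηy)) + R exp(i(ωt - ξx + ηy)) with c=1, ξ = ω cos θ, η = ω sin θ, into the boundary operators at x=0 and solving for R gives the reflection coefficients below (my derivation for illustration, not taken from the thesis): R vanishes at normal incidence, and the 2nd-order condition squares the 1st-order coefficient.

```python
import numpy as np

def R1(theta):
    # 1st-order ABC (5.1.3): symbol i*xi - i*w on the incident wave,
    # -i*xi - i*w on the reflected wave; solve (i*xi - i*w) + R(-i*xi - i*w) = 0
    xi, w = np.cos(theta), 1.0
    return (xi - w) / (xi + w)

def R2(theta):
    # 2nd-order ABC (5.1.4): apply the symbol of dxdt - dtt + 0.5*dyy
    xi, eta, w = np.cos(theta), np.sin(theta), 1.0
    Bi = -xi * w + w**2 - 0.5 * eta**2      # operator applied to incident wave
    Br = +xi * w + w**2 - 0.5 * eta**2      # operator applied to reflected wave
    return -Bi / Br

th = np.linspace(0.0, 1.2, 7)               # incidence angles in radians
r1, r2 = np.abs(R1(th)), np.abs(R2(th))
```

Both coefficients grow with the incidence angle, which is consistent with the remark that (5.1.3) is perfect only at normal incidence.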
increase the degree of interpolation, but not the highest orderderivative). This property is quite useful in the finite-difference implementations.In the polynomial time-marching scheme, due to the complication of matrix manipulations, only the standard 2nd-order absorbing approximation (5.1.4) will be consideredas an example for the absorbing boundary condition implementation in the new time-marching scheme.5.2 Polynomial Time-marching With Non-reflectingBoundary Conditions in One-dimensional CaseIn the polynomial time-marching, as discussed in Chapter 4, one needs to incorporatethe time-dependent absorbing boundary conditions into the spatial operator. Here, onecan utilize the 1st order time derivative term in vector form (4.3.2)-(4.3.4). Let’s consider578c0N’tNpN’tp000000000’tNp0’tp00rn1r><(I+N)rxel==[jN4_=‘(Nn)._=X(Nfl)0a.=(0)=X(On)sutodpuqiwpusidaioixquosuornpuo&rupunoquu,j.((iv)UIUOIW]ndltnflnpsgurus)suiodorn?q07-ssnvDrnipUtfl?OOJJO3‘na1+55i—‘0=(o‘x)n‘(x)0=(o=ffl‘1t+._=i‘ieuicIi——ze 0<‘T+>X>T—:sput0000t’tNpt’tpTgaiq=(i+N)[Tj]=[]CffNa_ttNa0ta0Oa0(Evc)Tx(I+N)Nn000tNnN’tNpt’tNp0’t—NpInN’tpT’Ip0’tpOn000(zvc)spiusaidaiqunoSAt1AUP‘aiojartuj1+=Nr‘-=0xsuiodputjiiirnppnoojiooartNa‘Oa‘Nn‘On:1+=Nx‘-=OxOMIqinsuipunoquiqJosq1ojdqiiiuomnbAMnpostmoisuwip-uoThe matrix D1 and the elements dj are defined in (4.1.3). The vectors ü, are thesame as in Section 4.3. Following the same procedure as described in (4.3.2)-(4.3.4),one can obtain the general symbolic operator solution including the absorbing boundarycondition as below:ul u(O)j = exp (GNt) (5.2.4)0 ‘N-i-i 1Here , [GN](2N+)2= c2D1 -i c2D1 B] (2N+2)The matrices D1 and B are defined in (5.2.3). By a method similar to that describedin (4.3.5), one can also get the eigenvalue bounds of [GN](2N+)2 directly from thoseof [D1 B] and [D1 D1]. 
Let l be an eigenvalue of the operator [G_N]_{(2N+2)²}; then

    [G_N] [X; Y] = [0, I_{N+1}; c²D₁·D̃₁, c²D₁·B] [X; Y] = l [X; Y]    (5.2.5)

gives Y = lX and c²[D₁·D̃₁]X + c² l [D₁·B]X = l²X. Here d and b are, respectively, the eigenvalues of [D₁·D̃₁] and [D₁·B]; the eigenvalue bound |l| of [G_N]_{(2N+2)²} can then be estimated from (5.2.5). Thus the maximum eigenvalue (spectral radius) of the operator [G_N]_{(2N+2)²} can be calculated from the eigenvalues of [D₁·D̃₁] and [D₁·B], and direct eigenvalue computation for the bigger matrix [G_N]_{(2N+2)²} is unnecessary.

Figures 5.1 and 5.2 show numerical results for the one-dimensional absorbing boundary condition approximation for N=32 and N=64. The initial pulses are all Gaussian, the time step is 0.2, and the wave velocity is c=1. The time-marching relative error control factor is less than 10⁻⁵. The analytical reference can be derived from the physical interpretation of one-dimensional scalar wave propagation.

[Figure 5.1 One-dimensional Absorbing Boundary Condition Numerical Results (N=32, dt=0.2): wave amplitude and error at t=0.4, t=1, and t=1.6.]

The analytical solution of the wave at t=1.6 in both figures should be a flat zero line, because the pulse has propagated out of the computational domain. Therefore, what is shown in the left-column plots (t=1.6) of Figure 5.1 and Figure 5.2 is just the computational error, which is equivalent to the corresponding right-column plot.

5.3 Spectral Accuracy Discussions of the Polynomial Time-marching Scheme in One Dimension

As Fornberg stated in [14], the pseudospectral method performs in many situations far better than present theory would suggest. As for the spectral accuracy for general initial-boundary value problems, no general theory is available that is as readily applicable as those for finite differences.
In Chapter 2, I have shown some previously existing spectral accuracy results for the Fourier pseudospectral method in elastic wave equations, presented by Fornberg in [14, 15]. To investigate the spectral accuracy of the Chebyshev pseudospectral method in the polynomial time-marching scheme, some one-dimensional experimental calculations have been performed to obtain the data in the following tables.

[Figure 5.2 One-dimensional Absorbing Boundary Condition Numerical Results (N=64, dt=0.2): wave amplitude and error at t=0.4, t=1, and t=1.6.]

Table 5.1 Time-marching Errors for the One-dimensional Wave Equation with Homogeneous Neumann Boundary Conditions (homogeneous medium)

N     Mean Error    Max. Error
4     6.041E-1      7.010E-1
8     2.788E-3      3.048E-3
12    1.552E-6      1.593E-6
16    2.270E-10     2.297E-10
20    1.099E-13     1.634E-13

Table 5.2 Time-marching Errors for the One-dimensional Wave Equation with Homogeneous Dirichlet Boundary Conditions (homogeneous medium)

N     Mean Error    Max. Error
4     1.641E-2      4.102E-2
8     5.568E-5      1.050E-4
12    1.841E-8      6.150E-8
16    1.423E-12     4.535E-12
20    2.806E-14     5.684E-14

Here, the computation is based on Equation (4.3.1) with u₀(x) = sin(πx) for the Dirichlet case and on Equation (4.3.6) for the Neumann case. A time step of Δt=0.01 is used. The errors are computed from the wave distribution at the 100th time step, compared with the corresponding analytical solution. From the error data listed in the tables, it is clear that the error converges to zero rather fast. In the Neumann case (Table 5.1), the mean error ratio of N=8 to N=4 is about 0.0046, which roughly corresponds to an error of order O((1/N)⁸), i.e. 8th-order finite-difference accuracy. The mean error ratio of N=16 to N=8 is about 8.14e-8, which roughly corresponds to an error of order O((1/N)²⁴), i.e. 25th-order finite-difference accuracy.
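These quoted orders follow directly from the tabulated mean errors; a short sketch (using the Table 5.1 values and estimating the observed order as p = log(e₁/e₂)/log(N₂/N₁), under the assumption e ~ (1/N)^p) reproduces them:

```python
import numpy as np

# mean errors from Table 5.1 (Neumann case) at N = 4, 8, 16
N = np.array([4.0, 8.0, 16.0])
e = np.array([6.041e-1, 2.788e-3, 2.270e-10])
# observed order p between successive grids, assuming e ~ (1/N)^p
p = np.log(e[:-1] / e[1:]) / np.log(N[1:] / N[:-1])
# p[0] ~ 7.8 (the quoted "8th-order" figure); p[1] ~ 23.6 (the "24th-order" figure)
```

The growth of p with N is the point of the discussion: the scheme is not of fixed order, and its apparent order increases with resolution.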
In the Dirichlet case (Table 5.2), the mean error ratio of N=8 to N=4 is approximately 0.0034, which again shows an 8th-order rate of error convergence. The mean error ratio of N=16 to N=8 is 2.56e-8, roughly equivalent to an error O((1/N)²⁵), i.e. 25th-order finite-difference accuracy. Both the Dirichlet and Neumann results show that as the number of grid points N increases, the order of error convergence also increases. Although it is claimed by J. P. Boyd in [5] that in the elliptic-equation case the spectral method has an error of O((1/N)ᴺ), in this experiment, probably due to the boundary-condition interaction and to the relatively low accuracy of computing the Schwarz-Christoffel mapping at small N, the results do not exhibit this kind of "exponential order" convergence. In fact, due to the limitation of the computer double-precision format, it is in practice impossible to attain this order of convergence for N>10. Nevertheless, the experimental results do show that the degree of error convergence increases with N.

Table 5.3 Time-marching Errors for the One-dimensional Wave Equation with Absorbing Boundary Conditions (homogeneous medium)

N      Mean Error    Max. Error
64     5.629E-5      2.038E-4
128    5.374E-8      1.401E-7

In Table 5.3, the mean error ratio from N=128 to N=64 is 9.55e-4, which shows an error convergence order of O((1/N)¹⁰), approximately the same as a 10th-order finite-difference scheme. Probably due to the operator complexity in this case, the minimum error introduced by the Schwarz-Christoffel conformal mapping sets a floor that certainly limits the overall accuracy. A time step of Δt=0.01 is used in this experiment; the errors are computed from the wave at the 240th time step, and a cosine initial distribution is used.

Table 5.4 Time-marching Errors for the One-dimensional Wave Equation with Absorbing Boundary Conditions (varying medium)

N      Mean Error    Max. Error
32     1.514E-3      5.215E-3
64     6.053E-4      2.186E-3
128    2.936E-4      1.123E-3

Finally, a step discontinuity with the absorbing boundary condition is analyzed to see the effect of medium variation. The experimental data listed in Table 5.4 show that in a varying medium (wave velocity jumping from 0.5 to 1.0 with a one-point transition in the middle) the performance is quite poor, O(1/N), because of the varying medium coefficient. Severe accuracy degradation is caused by the step discontinuity: in the polynomial time-marching scheme, since the differential operator is applied repeatedly over the medium space, high-order derivatives of the medium variation contaminate the overall solution. In the computation of the Table 5.4 data, the time step remains Δt=0.01, the errors are computed from the wave at the 380th time step, and a cosine initial pulse is again used. For two-dimensional equations with absorbing boundary conditions, due to the limitation of the 2nd-order absorbing approximation, no result with accuracy higher than 2nd order can be expected.

5.4 Polynomial Time-marching With Non-reflecting Boundary Conditions in Two Dimensions

In the two-dimensional case, in order to get a good absorbing approximation, one needs to use at least the 2nd-order absorbing boundary condition approximation (5.1.4), rather than just the 1st-order one for normal incidence. Consider a two-dimensional scalar wave equation within the rectangular domain x, y ∈ [-1, +1]. The boundary condition at the edge x=-1 is a 2nd-order absorbing approximation; the other three edges (x=+1, y=±1) are all homogeneous Dirichlet:

    u_tt = c²(u_xx + u_yy),  x, y ∈ (-1, +1),  t > 0,
    u(x, y, 0) = u₀(x, y),  u_t(x, y, 0) = 0,  x, y ∈ [-1, +1],    (5.4.1)
    u(x=1, y, t) = 0,  u(x, y=±1, t) = 0,  t ≥ 0,
    u_tt = c u_xt + (c²/2) u_yy,  x = -1,  y ∈ (-1, 1).

To examine and compare the results, another two simulation results need to be linearly superimposed to get a perfect absorber at the edge x=-1.
One is with all four edges homogeneous Dirichlet, and the second is with homogeneous Neumann at the edge x=-1 but homogeneous Dirichlet on the rest. According to [18], their linear combination generates the completely absorbing boundary at the edge x=-1. Discretizing N, M points respectively along x, y at the Gauss-Lobatto points, I have

    x_i = cos(iπ/N),  i = 0, ..., N,
    y_j = cos(jπ/M),  j = 0, ..., M.    (5.4.2)

Similarly, setting v = u_t and using the two-dimensional results in (4.5.1), one has the following matrix expression:

    d/dt [U; V] = [0, I; c²G, 0] [U; V],    (5.4.3)

where the operator G acts on the reduced solution array as G U = D̂₂ˣ U + B U D̂₂ʸ. Here D̂₂ˣ is the x-direction second-order Chebyshev operator whose first row is removed by the Dirichlet edge x=+1 and whose x=-1 row is built from the first-order entries d_{i,j} of (4.1.3) to implement the absorbing condition; D̂₂ʸ = [d⁽²⁾_{i,j}]_{(M-1)²} is the Dirichlet-masked y-direction second-order operator; and B is the selection matrix whose only nonzero diagonal entry corresponds to the x=-1 row, applying the (c²/2)u_yy term of (5.4.1) on that row only. The first row and the first and last columns of the solution matrix (u_{0,j}, u_{i,0}, u_{i,M}) are all zero, due to the three-edge homogeneous Dirichlet conditions.

In order to obtain the absorbing effect within a few time steps, and to avoid reflection interference from the other, fully reflecting, edges, the initial Gaussian pulse center is located very close to the x=-1 boundary, and the pulse is tapered to zero at the near-edge elements. The initial pulse is generated from the expression

    u(x, y, 0) = 0.1 · exp( -((x - x₀)² + y²)/δ ).    (5.4.4)

In this numerical experiment the wave velocity c is 1.0 and the time step Δt is 0.01, which corresponds to a time step of 3.33×10⁻¹¹ second if the wave propagation speed is the light speed c = 3×10⁸ m/sec; this is comparable to the 2.5×10⁻¹¹ second time step used in [50]. In this experiment, the number of collocation grid points in both X and Y is 32, and x₀ = -0.5290; that is, the peak of the initial pulse passes the x=-1 boundary at the 47th time step. The global error and normalized reflection are also defined in the same way as in
[50]. At each time step the difference is defined to be

    D(i,j) = u_appr(i,j) - 0.5 (u_dirich(i,j) + u_neum(i,j)).    (5.4.5)

The global error is defined as

    E = Σ_{i,j} D²(i,j).    (5.4.6)

The normalized reflection is the reflection occurring at one cell above the x=-1 boundary when the pulse peak passes the boundary; it is normalized by the maximum value of the pulse along the boundary at that moment (the 47th time step).

[Figure 5.3 Two-dimensional Absorbing Approximation Numerical Results (32 grid points in both X and Y, dt=0.01): global error of the 2nd-order absorbing approximation for delta=0.0157 and delta=0.022 over 90 time steps, and normalized reflection along X at the 47th time step (nx=31, delta=0.02).]

The global errors of both cases in this experiment are comparable to those reported in reference [50] (less than 0.0003 within 80 time steps). When δ=0.015 the initial pulse is steeper and more aliasing may occur; thus the results are worse. Since the same polynomial time-marching and Chebyshev collocation scheme are used here for both the approximated and the perfect absorber, the global error only demonstrates the accuracy of the absorbing approximation, which is limited by the order of the absorber accuracy (in this case, second order) and is determined principally by the magnitude of the reflected waves propagating into the computational domain.

Chapter 6 Simulation and Visualization of Model Wave Scattering

Numerically simulating wave-field propagation in heterogeneous media has become one of the major tools for studying seismograms. Many different numerical algorithms have been applied in this area.
Particularly, finite-difference time-domain algorithms have been intensively implemented on supercomputers for seismic forward modeling [25, 57, 55], which starts with an assumed earth model and generates the wave field by solving the elastic wave equations.

6.1 2-D Seismic Reflection Modeling

Two-dimensional SH-wave (horizontal shear wave) propagation in a heterogeneous medium can be described by the following elastic equation:

    ρ(x,z) ∂²v/∂t² = ∂/∂x( μ(x,z) ∂v/∂x ) + ∂/∂z( μ(x,z) ∂v/∂z ),    (6.1.1)

where ρ(x,z) is the density, μ(x,z) is the shear modulus at the point (x,z) of the medium, and v(x,z) is the horizontal displacement along the y axis. The medium is supposed to be in equilibrium at time t=0; hence, all the initial conditions are zero everywhere in the medium. The wave propagation is caused by a point-source excitation (a short pulse under the earth surface) in the medium. The distribution geometry of the medium is a rather simplified salt-dome model used by J. Virieux in [57], illustrated in Figure 6.1. The positions S1, S2, S3 are three possible source locations (one source at a time).

[Figure 6.1 Geometry of the Salt Dome for Polynomial Time-marching Modeling: the model is 4 km deep and 8 km wide, with near-surface source positions S1, S3, S2.]

The top edge is the free earth surface, so the boundary condition there is stress-free, i.e. a Neumann condition for the displacement v. The other three edges are all artificial domain boundaries for computational purposes; they are all set as non-reflecting, and the 2nd-order absorbing approximation (5.1.4) is applied there as described in Chapter 5 and [28]. For ease of implementation, when the absorbing approximation is applied to the left and the right sides of the domain illustrated in Figure 6.1, a fairly homogeneous medium is always assumed along the boundaries so that the simple form of (5.1.4) can be used.
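The variable-coefficient right-hand side of (6.1.1) is applied as nested first-order collocation derivatives. A one-dimensional sketch (my illustration only: NumPy is assumed, with a hypothetical smooth modulus μ(x) = 2 + x and displacement v = sin(πx) on the standard interval) shows the pattern and its accuracy on smooth media:

```python
import numpy as np

def cheb(N):
    # Chebyshev collocation first-derivative matrix, equivalent to (4.1.3)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))
    return D, x

N = 32
D, x = cheb(N)
mu = 2.0 + x                      # hypothetical smooth shear-modulus profile
v = np.sin(np.pi * x)
num = D @ (mu * (D @ v))          # collocated d/dx( mu dv/dx ), as in op1 below
ana = np.pi * np.cos(np.pi * x) - mu * np.pi**2 * np.sin(np.pi * x)
het_err = np.max(np.abs(num - ana))
```

On smooth media this nesting keeps spectral accuracy; the degradation reported in Table 5.4 arises precisely because a step-like coefficient is differentiated repeatedly.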
The boundary conditions can be expressed in the following form:

v_tt = c v_xt + (c²/2) v_zz ,   x = −1 (bottom)
∂v/∂x = 0 ,                     x = +1 (top)
v_tt = −c v_zt + (c²/2) v_xx ,  z = +1 (right)     (6.1.2)
v_tt = c v_zt + (c²/2) v_xx ,   z = −1 (left)

where c is the wave velocity √(μ/ρ).

Similar to the manipulations in the previous chapters, by introducing the 1st-order time derivative of v, the equation (6.1.1) with the above boundary conditions incorporated can be configured in the following matrix operator form:

( v  )      (  0    I  ) ( v  )
( vt )_t =  ( op1  op2 ) ( vt )     (6.1.3)

where I is an identity operator. Under the discretization of the scaled Gauss-Lobatto grid points [61], the node positions are given by

x_i = (x_max + x_min)/2 + ((x_max − x_min)/2) cos(iπ/N), i = 0, …, N
z_j = (z_max + z_min)/2 + ((z_max − z_min)/2) cos(jπ/M), j = 0, …, M     (6.1.4)

where z_min = x_min = 0 m, z_max = 8000 m, x_max = 4000 m. The operator 1 (op1) and the operator 2 (op2) can respectively be represented as

op1{v} = (1/ρ){ r_x² [D0](μ)[D1][v][A2] + r_z² [A1][v][D3](μ)[D2] }
op2{vt} = (c){ r_x [D4][vt][B1] + r_z [vt][D5] }     (6.1.5)

The detailed derivation procedure is described in Appendix C. Here, the ( )'s are all two-dimensional arrays and the [ ]'s are matrices. The operator 1 reflects the 2nd-order equation (6.1.1), the Neumann boundary condition at x = +1 and the 2nd-order derivatives in (6.1.2). The operator 2 reflects the 1st-order derivatives in (6.1.2). The corresponding matrices are defined in (6.1.6): [D0] and [D1] are built from the Chebyshev derivative elements d^x_{ij} along X; [D3] is the full 1st-order derivative matrix along Z, acting on the solution array from the right, and [D2] is its boundary-masked counterpart; [A1] = diag(0, 1, …, 1, 0.5) and [A2] = diag(0.5, 1, …, 1, 0.5) carry the absorbing-boundary coefficient 0.5; [D4] and [D5] hold the 1st-order derivative entries used by the absorbing approximation at x = −1 and z = ±1 respectively; and [B1] = diag(0, 1, …, 1, 0) masks the two lower corners. Here d^x_{ij}, d^z_{ij} are the elements of the derivative matrices along X and Z, defined in the formula (2.3.8) or (4.1.3). The [D1] is simply the full 1st-order derivative matrix in Chebyshev collocation. The matrices [D0] and [D1] are configured in the same way as described in (4.2.1) – (4.2.6).
In addition, the last row of [D0] is also masked by zeros so that the last row (corresponding to x = −1 in the solution array) is left for the absorbing approximation. The matrix [D2] simply is the 1st-order Chebyshev collocation derivative matrix with its 1st and last columns masked by zeros, due to the absorbing approximation at these two columns (corresponding to z = +1 and z = −1 respectively). The matrices [A1] and [A2] are respectively used to add the absorbing boundary coefficient 0.5 to the related row (the last row) and columns (the 1st and the last columns). The matrix [A1] also masks the 1st row of the solution array to zeros to satisfy the stress-free (μ ∂v/∂x = 0) boundary condition. The matrices [D4] and [D5] are used to represent the first-order derivatives in the absorbing approximation. The matrix [B1] is used to avoid the repeat of the two lower corners, which have been counted twice at both the horizontal and the vertical edges. The factors r_x, r_z and their squares come from the scaled coordinate. According to Zhao in [61], the Chebyshev collocated derivative under the scaled coordinate (6.1.4) has the following relationship with the standard [−1, +1] interval expression:

∂/∂x = r_x ∂/∂ξ ,  x ∈ [x_min, x_max] ,  ξ ∈ [−1, +1] ,  r_x = 2/(x_max − x_min)     (6.1.7)

Consequently, the symbolic solution of the 1st-order operator equation system (6.1.3) can be written as

( v  )                ( v  )
( vt )(t) = exp(tA)   ( vt )(0)     (6.1.8)

Here, the general operator A is defined as

A = (  0    I  )
    ( op1  op2 )     (6.1.9)

Then, the polynomial time-marching discussed in Chapter 3 can be applied.

In order to obtain the initial distribution, a method similar to that used by Alford et al. in [3] can be applied here. Assuming the point source is located in a small homogeneous region, and the source excitation is a short pulse, which is the usual case in most seismic applications, before the source term diminishes the wave propagation resulting from the source excitation is always limited within the homogeneous region. Therefore, the wave distribution within the homogeneous region can be computed by an infinite-space Green's function method.
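The time-marching (6.1.8) amounts to applying a polynomial approximation of exp(tA) to the state vector. The thesis uses Chebyshev series or Newton-form interpolation for this (Chapter 3); the toy sketch below uses a plain truncated Taylor series only to illustrate the matrix-polynomial idea, with a hypothetical 2 x 2 operator whose exponential is known:

```python
import numpy as np

def poly_march(A, u0, t, terms=25):
    """Advance u(t) = exp(tA) u0 with a truncated polynomial in A.
    (A stand-in for the Chebyshev / Newton-form expansions of Chapter 3;
    only matrix-vector products with A are ever needed.)"""
    u = u0.astype(float).copy()
    term = u0.astype(float).copy()
    for k in range(1, terms):
        term = (t / k) * (A @ term)   # next series term: (tA)^k u0 / k!
        u = u + term
    return u

# Hypothetical rotation generator: exp(tA) rotates the state by angle t,
# so marching [1, 0] to t = pi/2 should give a state close to [0, -1].
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
u = poly_march(A, np.array([1.0, 0.0]), t=np.pi / 2)
```

The key design point carries over to the real scheme: the whole advance over a (possibly large) step t is a polynomial in A applied to the state, so only repeated collocation-operator evaluations are required.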
After the source pulse disappears, the wave distribution obtained by the Green's function method can be used as the initial condition for the time marching in (6.1.8).

The medium is composed of two layers. A dome rising from the lower medium (salt) intrudes into the upper overburden medium. The medium parameters used here are:

Upper layer: velocity 1500 m/sec, density 2300 kg/m³
Lower layer: velocity 2250 m/sec, density 2100 kg/m³

The source is an impulsive point source located 500 m below the free surface and right at the horizontal middle line of the domain (xs = 3500 m, zs = 4000 m). The source is defined as

f(t) = −(t − t0) exp(−α(t − t0)²)     (6.1.10)

with α = 1000 and t0 = 0.15. The main advantage of this pulse source is that its bandwidth is limited while its time-period is also short. The negative part caused by this source is especially suited for identifying wave fronts in block diagrams, according to Virieux (1984) [57] and Alford (1974) [3]. The constant factor α governs the time interval w from the negative to the positive peak of the function, as shown in the following figure: setting f'(t) = 0 places the peaks at t0 ± 1/√(2α), so w = √(2/α) ≈ 0.045 sec for α = 1000.

Figure 6.2 Impulsive Source Distribution Time Function (t0 = 0.15, α = 1000). [The pulse swings from its negative to its positive peak around t0, the two peaks separated by the interval w.]

This choice of the source function was made as a good compromise between a short time duration (regional homogeneous propagation) and a narrow spectrum (reasonable sampling in time). The Fourier transform of (6.1.10) is

F(ω) = (iω√π / (2α^(3/2))) exp(−ω²/(4α)) exp(−iωt0)     (6.1.11)

This spectrum provides sufficient damping to control the bandwidth of the source and also limits the bandwidth of the convolution with the infinite-space Green's function due to the following formula

v_s(x, z, t) = ∫ { −(i/4) H0^(2)(ωr/c) F(ω) } e^{+iωt} dω     (6.1.12)

where r is the distance between the observation point and the source point, c is the wave velocity, and H0^(2) is the second Hankel function of 0th order. In practice, the
In practice, the73inverse Fourier transform integral is performed numerically to get the time-domain wavedistribution v(x, z, t). Its time derivative can obtained by the 4th order finite-differenceapproximation. Then they are used as the initial conditions for the time marching. Thet0=0.5 sec and the homogeneous propagation time period [0, 0.31 sec are carefully chosenso that the source values at both t=0 sec and t 0.3 sec are small enough (10_9*peak). Therefore, the zero initial condition for the domain can be satisfied and the sourceexcitation virtually disappears when the time-marching begins at t=0.3 sec. With the wavevelocity of 1500 rn/sec in the upper layer medium, the wavefront caused by the pointsource excitation (at sx=3500m , sz=4000m) is still within the upper layer homogeneousregion ( 0.3 sec * 1500 rn/sec = 450 m ).The highest frequency component can also be obtained from the source constant aas the maximum time-domain sampling interval should be no greater than the largesttransition interval w (see Figure 6.2). In this case, 65 temporal sampling points arechosen for the 0.3 sec initial propagation period (sample interval 0.0046 sec, far less thanw=0.045 sec for a=1000 case) for the Green’s function computation.Consequently, the shortest wavelength corresponding to the highest frequency component is given bymin =2WCmin 2Cmin 134 m (6.1.13)In order to resolve all the sampled frequency components (up to fmax = 0.5f8 =in the time-domain, the spatial discretization should have enough grid points so that themaximum spatial interval is less than half of the shortest wavelength corresponding tothe highest sampled frequency component, i.e. 67 meters. Since the Gauss-Lobatto gridused here is unevenly distributed, its maximum interval should be counted here. From(6.1.4), it’s easy to get the maximum interval for the scaled Gauss-Lobatto grid given bymax — mm . ft\(LX)max= 2sinN)(6.1.14)Zmax — Zmiri . 
(ΔZ)_max = (Z_max − Z_min) sin(π/(2M))

Hence, N = 100 and M = 200 were chosen so as to satisfy the condition (ΔX)_max ≈ (ΔZ)_max ≈ 62.8 m < λ_min/2 ≈ 67 m.

The time-marching step size could also be limited to be no greater than w, so that the minimum time sampling interval can always be guaranteed. On the other hand, in order to utilize the advantage of the looser stability limitation of the polynomial time-marching, which allows a much larger time step size than that in the conventional finite-difference scheme, a time step of 0.001 sec was chosen for the time marching.

As a necessary practical requirement for numerical computation, the discontinuity of the medium parameters at the interface has to be somehow smoothed. Here, a hyperbolic tangent function is used for this purpose, so as to effectively reduce the aliasing caused by the discontinuity at the interface.

6.2 Numerical Results and Discussion

In the following pages, some simulation snapshots of the previously defined seismic model are illustrated. They demonstrate the validity of the upper free-surface boundary condition and of the other 2nd-order absorbing boundary conditions. The corner and the interface diffraction wavefronts can be clearly observed from these snapshots. Figure 6.6 shows a simpler half-space wave scattering model, which demonstrates clearer multiple reflections, including the so-called "conical" wave occurring at the critical incident angle.
More animation movies of these wave scattering phenomena are presented on a video tape as an appendix to this thesis.

Figure 6.3 Scatter Movie Snapshots When the Source is Located at xs = 3500 m, zs = 4000 m (wave at t = 2.0 sec, from t0 = 0.3 sec)

Figure 6.4 Scatter Movie Snapshots When the Source is Located at xs = 3500 m, zs = 3200 m (wave at t = 2.0 sec, from t0 = 0.3 sec)

Figure 6.5 Scatter Movie Snapshots When the Source is Located at xs = 3500 m, zs = 2400 m (wave at t = 2.0 sec, from t0 = 0.3 sec)

Figure 6.6 Half-space Scattering (wave at t = 1.5 sec, from t0 = 0.3 sec)

6.3 Half-space Wave Scattering

In order to verify the two-dimensional wave scattering results, a half-space scattering example is considered. The analytical time-domain solution for half-space wave scattering was derived by Cagniard in [7] in 1962, and Aki also gives more detailed descriptions in [2]. In practice, due to the numerical implementation difficulties of a step function, it is preferable to use the frequency-domain solution as shown in (6.1.12). The reflection can also be computed in the frequency domain by multiplying the incident wave by the asymptotic reflection coefficient V(θ) described by Brekhovskikh and Lysanov in [6], given by

V(θ) = (m cos θ − √(n² − sin²θ)) / (m cos θ + √(n² − sin²θ)) ,  sin θ < n     (6.3.1)
V(θ) = (m cos θ − i√(sin²θ − n²)) / (m cos θ + i√(sin²θ − n²)) ,  sin θ > n

Here, θ is the incidence angle from the normal at the interface (y = 0), m = ρ2/ρ1 and n = c1/c2. ρ1, ρ2, c1, c2 are the medium densities and shear velocities for the upper and lower medium layers respectively. In this analysis, we use ρ1 = ρ2 and c1 < c2. Therefore, we have a real n and n < 1.

The same source function (6.1.10) is applied here. Thus, the analytical solution for the half space y < 0 needs to be convolved with the source function to get the source-excited wave propagation.
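The reflection coefficient (6.3.1) is easy to evaluate numerically; a small sketch (Python for illustration) using the parameters of this comparison, equal densities (m = 1) and n = c1/c2 = 1/3:

```python
import cmath
import math

def refl_coeff(theta, m, n):
    """Asymptotic reflection coefficient V(theta) of (6.3.1).  For
    sin(theta) > n the square root turns imaginary (principal branch
    of cmath.sqrt), giving the complex, unit-magnitude branch of V."""
    s = cmath.sqrt(n ** 2 - math.sin(theta) ** 2)
    return (m * math.cos(theta) - s) / (m * math.cos(theta) + s)

# Normal incidence with m = 1, n = 1/3: V = (1 - 1/3)/(1 + 1/3) = 0.5
v0 = refl_coeff(0.0, m=1.0, n=1.0 / 3.0)
```

Beyond the critical angle (sin θ > n) the numerator and denominator are complex conjugates, so |V| = 1: the interface is totally reflecting there, which is the regime where the "conical" wave, not captured by this asymptotic coefficient, appears.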
The analytical solution can be written as

v_s(x, z, t) = ∫ { −(i/4) [ H0^(2)(ωR/c1) + V(θ) H0^(2)(ωR0/c1) ] F(ω) } e^{+iωt} dω     (6.3.2)

where R is the distance from the receiver to the source (x0, y0), and R0 is the distance from the receiver to the image source at (x0, −y0) (illustrated in Figure 6.7).

Figure 6.7 Wave Incidence and Reflection for a Half-Space. [The source at (x0, y0), its image at (x0, −y0) across the interface, and a receiver recording the direct wave over distance R and the reflected wave over the image distance R0.]

On the other hand, the band-limited source also band-limits the analytical solution in numerical form. The "conical" wave is not included in the asymptotic reflection coefficients. The purpose of computing the analytical solution here is to verify the accuracy of the numerical results, in particular, the boundary-smoothing effect. Fortunately, from the numerical solution snapshots, one can note that there are plenty of receiver positions without the "conical" wave involved.

The comparison results are illustrated in Figure 6.9 and Figure 6.10. In the comparison, the medium densities are all set to 1 and the shear velocities are c1 = 1, c2 = 3 (m/sec). The receivers are located at the three positions (x = 0, y = −0.094), (x = 0, y = −0.187) and (x = 0.187, y = −0.094), as shown in Figure 6.8.

Figure 6.8 Source-Receiver Geometry. [The source at (x0, y0) in the upper medium (ρ1, c1), its image at (x0, −y0) in the lower medium (ρ2, c2), and the receivers R1, R2, R3 between the source and the interface.]

Figure 6.9 Trace Comparison Between Analytical (—) and Numerical (++) Solutions at (0, −0.094). [Left: smoother medium interface ramp C = tanh(30y) + 2; right: sharper ramp C = tanh(90y) + 2.]
Figure 6.10 Trace Comparisons Between Analytical (—) and Numerical (++) Solutions at (0.187, −0.094) and (0, −0.187). [Left column: smoother ramp C = tanh(30y) + 2; right column: sharper ramp C = tanh(90y) + 2.]

In the comparison graphs, the left-column graphs are for numerical solutions with a smoother interface ramp (5 transition points from velocity 1 to velocity 3). The right-column graphs are for numerical solutions with only one transition point for the medium interface. From the trace comparison, it is clear that the direct wave is fairly accurately represented in the numerical solution. The main difference between the analytical solution and the numerical one occurs in the reflection part, due to the smoothed numerical interface ramp. Because of the existence of those intermediate transition points, the numerically calculated reflection always occurs earlier than the analytical one. Also, the numerical reflection wave is always weaker, since the adjacent medium contrast at the smoothed interface is lower. When the number of ramp transition points is reduced to one (the minimum transition for numerical simulation), the reflected wave is closer to the analytical one in both time and magnitude. On the other hand, the reflected field was obtained by an asymptotic expansion which is more accurate in the far field. This can be seen from the fact that the magnitude of the reflection wave in the trace at (0, −0.187) is a little closer to the analytical one than at (0, −0.094). In the traces at the point (0, −0.187), one can also note that there are some ripples occurring in the numerical solution, right at the tail of the direct wave.
This is mainly due to the band limitation of the Green's function.

Although the time-marching is spectrally accurate due to the high accuracy of the polynomial time-marching scheme, all successive steps are based on the previously generated numerical results (with accumulated errors from the very beginning). Generally, this error accumulation effect exists for any time-marching scheme. Compared to the conventional finite-difference time-marching techniques, since a larger time-step size is allowed in this polynomial time-marching scheme, fewer time steps are required to accomplish the time-marching over a given time period. Therefore, fewer errors are accumulated in this time-marching scheme. With the polynomial time-marching technique, one can even accomplish the time-marching for a certain time period in one big time step, provided the machine precision is sufficient. Then the error accumulation can be avoided. However, in the illustrated trace results, after 750 time steps of time-marching, and after the interface ramp distortion, the tail of the reflected wave in the numerical solutions is still fairly close to the analytical one. This phenomenon shows the high accuracy and stability of the polynomial time-marching scheme.

In conclusion, the trace comparison and the foregoing analysis show that the major error sources are the numerical modelling of the medium interface, the approximate absorbing boundary conditions, and the numerical implementation of the Green's function (truncation and the asymptotic solution).

Chapter 7 Parallel Programming on the Connection Machine CM-5 and the Fujitsu VPX240/10

As mentioned in the Introduction, the pseudospectral methods have much potential parallelism and have been implemented on parallel computers [36, 37, 34]. On the other hand, the iterative multi-dimensional collocation evaluation in space is an extremely time-consuming procedure and it has to rely on the computational capabilities of supercomputers.
However, the parallel implementation of the new time-marching technique is still a challenge, even though the kernel of the new time-marching method is based on the pseudospectral methods. Fortunately, two state-of-the-art supercomputers were available for this algorithm development and simulation: a 32-node (with vector units) Connection Machine CM-5 at Stanford University (courtesy of the Stanford Exploration Project), and a 2.5 GFLOPS peak performance single-CPU vector machine, the Fujitsu VPX240/10, at the Calgary High-Performance Computing Centre (through an HPC/Fujitsu scholarship). The general descriptions of the architecture of the Fujitsu VPX240/10 vector machine and some important programming issues in vectorized Fortran are given here. Also, the Connection Machine CM-5 and some CM Fortran programming features are introduced as a comparison. The implementation considerations of the polynomial time-marching scheme are addressed.

7.1 Fortran Programming on the Fujitsu VPX240/10

The Fujitsu VPX240/10 at the Calgary HPCC is a single-processor, pipelined SIMD parallel computer with a peak performance rate currently at 2.5 GFLOPS. The general hardware configuration is summarized in Table 7.1.

Table 7.1 VPX240/10 Hardware Summary [23]

No. of Vector Processing Units: 1
Vector Clock Period (Frequency): 3.2 nsec (312 MHz)
Peak Computation Rate: 8 flops/cycle (2.5 GFLOPS)
Vector Register Size: 64 KB, reconfigurable
Main Storage Unit (MSU): 512 MB (64 Mwords)
Memory Architecture: 128-way interleaved
Load/Store Pipes: 2, bidirectional
Memory Bandwidth: 4 GB/sec
System Storage Unit (SSU): 1024 MB (128 Mwords)
MSU to SSU Bandwidth: 1 GB/s read + 1 GB/s write
No. of Scalar Processing Units: 1
Secondary Storage (Disk): 60 GB

The VPX240 vector processing unit is a SIMD (Single Instruction Multiple Data) pipelined parallel processor. The term "vector" processing originated with the 1977-vintage CRAY-1.
The power of the original CRAY-1 was derived from the extensive use of pipelining, supported by segmented arithmetic units and an interleaved memory. However, the maximum speedup available from pipelining is limited by the number of operand pairs that can be in the pipe simultaneously (about 10); all further performance must come from parallelism. The VPX240 has added significantly to the peak performance of a single vector processing unit by introducing, in the VPX240/10 model, four independent arithmetic pipes for a peak computational rate of 8 floating-point operations per 3.2 nsec clock period, which translates to 2.5 GFLOPS.

The optimal computation on the VPX240 is the matrix multiply computation defined by the following kernel [23, 16]:

      do 4 j=1, n
*VOCL LOOP,UNROLL(4)
      do 4 k=1, n
      do 4 i=1, n
    4 c(i,j)=c(i,j)+a(i,k)*b(k,j)

Here, the use of the compiler directive (*VOCL LOOP,UNROLL(4)) to unroll the k loop to a depth of 4 increases the number of parallel floating-point operations in the inner loop to 8, to match the 4-pipe architecture of the VPX240, without introducing any changes to the original source code. A 1024 by 1024 matrix multiply in standard FORTRAN code can execute at a speed very close to the 2.5 GFLOPS peak performance, according to HPCC in [23].

The VPX240/10 used for this simulation has a 512 MB Main Memory Unit (MMU) divided into 128 memory banks. Although two 4 GB/sec bidirectional load/store pipes are provided for vector-unit memory access, the memory contention problem still exists. This problem can severely degrade the system performance if contiguous memory addressing is not properly used. Generally, for a contiguous real*8 array access, unit-stride or odd (2n+1) stride addressing through memory causes no memory bank conflict problems, but 4n+2 and 4n stride addressing may cause performance degradations of factor 2, and of factor 4 to 8, respectively.
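The stride effect can be illustrated with a toy model of the 128-way interleaved memory. The modulo mapping below is a deliberate simplification assumed for illustration, not the documented VPX240 address mapping:

```python
def banks_touched(stride, n_banks=128):
    """Count the distinct banks hit by n_banks successive accesses at a
    fixed element stride, assuming consecutive words map to consecutive
    banks modulo n_banks.  Fewer distinct banks means more conflicts."""
    return len({(i * stride) % n_banks for i in range(n_banks)})

print(banks_touched(1))     # unit stride: all 128 banks, no conflicts
print(banks_touched(33))    # odd stride: still all 128 banks
print(banks_touched(2))     # stride 2: only 64 banks, ~factor-2 slowdown
print(banks_touched(128))   # stride = bank count: 1 bank, worst contention
```

In this model any odd stride is coprime to the bank count and cycles through every bank, while even strides concentrate traffic on a fraction of the banks, which is exactly why the odd-leading-dimension rule below works.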
Therefore, the easiest way to avoid memory bank contention with real*8 strides is to declare the leading dimension of arrays as an odd value (odd stride), and to put the leading-dimension index in the innermost loop wherever possible (unit stride).

The VPX240/10 also has 1 GB of secondary memory, a semiconductor-memory-based System Storage Unit (SSU), which can be configured as part of the file system. The transfer rate between the SSU and the MMU is 1 GB/sec for either reads or writes, for a total bandwidth of 2 GB/sec. Users can perform I/O to the SSU using standard Fortran reads or writes at a bandwidth approaching 1 GB/sec [23]. This feature provides a very good way to temporarily swap out intermediate computational results.

The VPX240 Fortran compiler is a fully conforming ANSI-standard Fortran 77 compiler. It is an optimizing compiler with powerful auto-vectorizing capabilities. The compiler is able to vectorize a wide range of loops using a variety of advanced techniques, including inner and outer loop unrolling, nested DO-loop reordering, subroutine inlining, vectorization of conditional blocks, first-order recursion and so on. The vectorization of a DO loop can be simply illustrated in Figure 7.2 [17].

When a DO loop is vectorized, the execution order of the statements within the scope of the DO loop changes. Each statement appears as if it is executed within its own DO loop. Vectorized statements are executed so that the definition and reference order of data appearing within the statements do not change. If nothing restricts the definition and reference order of the data in a statement, the execution order of the statement becomes undefined. The statement is not necessarily executed according to the order in which it appears in the source program. Optimization is performed so that the program executes at high speed, utilizing the parallel processing capabilities of the vector processor hardware.
Figure 7.2 Vectorizing a DO Loop

      Do I=1, 100
        A(I)=B(I)+C(I)    (1)
        E(I)=A(I)*D(I)    (2)
        F(I)=A(I)-D(I)    (3)
      Enddo

Execution image:

      Do I=1, 100
        A(I)=B(I)+C(I)    (1)
      Enddo
      Do I=1, 100
        E(I)=A(I)*D(I)    (2)
      Enddo
      Do I=1, 100
        F(I)=A(I)-D(I)    (3)
      Enddo

Here the loop (1) has to be executed before the loop (2) and the loop (3), due to the reference order in the original source. But the (2) and (3) loop execution order is undefined. Each of the loops will be organized into the following vector form (taking the loop (1) as an example):

[A(1), A(2), …, A(100)]ᵀ = [B(1), B(2), …, B(100)]ᵀ + [C(1), C(2), …, C(100)]ᵀ

Thus, the DO loop operation becomes a vector operation that processes the vector data [A], [B] and [C] in one operation.

Consequently, the following factors have been carefully considered during the Fortran implementation of the simulation on the VPX240/10:

• Vectorizing the code as much as possible, so that the real computational power of the vector unit can be properly utilized;
• Carefully dealing with some useful compiler directives such as UNROLL: wherever possible, an appropriate unrolling factor (or NOUNROLL, because the default unrolling depth is 2) is used, so that the number of inner-loop floating-point operations is as close as possible to 8 and/or the number of the operation statements is 2.
Then the 4 arithmetic pipes and/or the 2 load/store pipes can be fully utilized;
• Declaring the leading dimension of every array as an odd value and indexing the leading dimension in the innermost loop where possible, to minimize the memory contention;
• To avoid some unnecessary overhead, writing repetitive matrix multiplication/add operations in an optimal and customized way, instead of using the general-purpose matrix routines from NAG or SSL II/VP (Scientific Subroutine Libraries);
• For very large-scale simulations (for example, multi-thousand time steps), using the SSU to temporarily store partial results so as to reduce the main memory consumption.

The following tables show some performance data of the VPX240/10, and demonstrate both the powerful effects of the compiler directives and the memory contention problem.

Table 7.2 2K*2K REAL*8 Matrix Multiply

                   NAG F01CRF   Compiler directive:   Compiler directive:
                   routine      LOOP,UNROLL(4)        LOOP,UNROLL(2)
CPU Time (sec)     10.287       9.764                 14.950
VU Time (sec)      10.206       9.740                 14.928
Vectorization      99.2%        99.8%                 99.9%

Table 7.3 Memory Access (2K*2K REAL*8 Array): two-dimensional assignment (3 arrays)

                      Column loop inside,   Column loop outside,
                      row loop outside      row loop inside
CPU Time (microsec)   10772                 10701
VU Time (microsec)    10607                 10598
Vectorization         98.5%                 99.0%

From the above tables, it is quite clear that the optimal performance of the matrix multiply kernel, which fully utilizes the arithmetic pipes, is better than that of the corresponding NAG routine, which may have some overhead. The second table also shows that the memory access is always faster for unit-stride contiguous access.

Due to the non-IEEE internal binary format and a different intrinsic function implementation, the Schwarz-Christoffel conformal mapping package, used for generating the interpolation points and coefficients in (3.2.14), cannot guarantee a robust output on the VPX240/10.
Since this package is obtained from NETLIB and includes many complicated numerical routines, which are hard to modify and optimize for the VPX240/10, all the necessary data from a good implementation of this SC package are precomputed on a SUN SPARCstation. The VPX240/10 can then concentrate on the kernel operator evaluation. A preliminary Fortran implementation based on the above-listed principles can achieve about 96.8% vectorization for the whole program execution. For the kernel part of the time-marching, over 98% vectorization can be achieved, and each time step (14 iterations per step) takes about 0.78 sec of CPU time, including 0.765 sec of vector unit time (domain size 101*201). This execution time corresponds to a sustained processing rate of about 650 MFLOPS, 25% of the theoretical peak performance of 2.5 GFLOPS, or 37% of the achievable peak performance of 1.76 GFLOPS (based on the 2K*2K REAL*8 matrix multiply execution time of 9.76 sec).

7.2 CM-5 and CM Fortran

The previous implementation of the pseudospectral time-marching technique was based on the Thinking Machines Corp. CM-2 SIMD supercomputer. Specifically, the Fourier pseudospectral time-marching with periodic boundary conditions was mainly implemented on the hypercube-networked CM-2. Since December 1992, the parallel computing platform has been upgraded to the latest TMC SIMD & MIMD supercomputer, the CM-5. Since then, all later implementations, such as the Chebyshev collocation, the absorbing boundary approximation and the final simulation, have been based on the more powerful CM-5.

The CM-5 is the latest supercomputer developed by Thinking Machines Corp. (Cambridge, MA) in the 1990s. It continues and extends support of the parallel programming model that proved successful in the CM-2 and CM-200. The major components of a typical CM-5 system are illustrated in Figure 7.3.

Figure 7.3 Components of a Typical CM-5 System [49]

The two networks and the processing nodes are the key components of the CM-5.
The control network is used for operations that require all the nodes to work together, such as the partitioning of the processing nodes, general broadcast of messages, and global synchronization. The data network is used for the point-to-point exchange of messages between individual processing nodes. The topology of the CM-5 data network is a fat tree (quad tree), and currently it transfers data at upwards of 5 Mbytes/second per node. Each CM-5 processing node contains a RISC microprocessor (currently a SPARC chip) and four optional vector units, as shown in Figure 7.4.

Figure 7.4 Components of a Processing Node with Vector Units. [Each node's RISC microprocessor and its four vector units (VU 0 to VU 3), each with 8 or 32 MB of parallel memory, attach to the control and data networks; I/O processors connect to I/O devices such as the Scalable Disk Array, tape systems and HIPPI.]

From Figure 7.4, it is clear that the VUs are implicitly grouped in pairs. This affects the speed of data transfer between the memory regions of the VUs. The fastest exchanges take place between the two VUs on the same chip; the next fastest between VUs on different chips on the same node. Transfers between different processing nodes require network communication, which takes longer.

Here CM Fortran is used as the programming language, because it is an implementation of Fortran 77 supplemented with array-processing extensions from ANSI Fortran 90, and it is also suitable for scientific computing. In Fortran 90, an array object can be referenced by name in an expression or passed as an argument to an intrinsic function, and the operation is performed on every element of the array. This feature matches the parallel architecture of the CM system, which can process all the thousands of elements in unison.
If the number of array elements is greater than the number of physical processors, the CM system supports Virtual Processing, which lets each physical processor behave as many Virtual Processors (VPs), so that Connection Machine programs are completely scalable among different physically-sized CM systems. The CM system architecture and interconnection network provide three different VP set layouts in CM Fortran: NEWS, SEND and SERIAL, respectively corresponding to the NEWS grid interconnection, the data network router connection, and all elements along the axis being local (within one processing node).

CM Fortran's execution model refers to the way a program makes use of the hardware. CM Fortran on CM-5 systems supports three kinds of execution models [43]:

• A global model, where a single program operates on arrays of data spread across all the parallel processors
• A nodal model, where multiple copies of a program operate independently on subsections of the data and communicate in message-passing style
• A global/local model, a combination of data parallel and message passing, for maximum flexibility in programming

The Fortran 90 features of CM Fortran refer to the global model, which is data parallel (corresponding to SIMD).

In the polynomial time-marching applications, a large number of operations are matrix/array operations in a standard SIMD data-parallel form. Therefore, the global model of CM Fortran is chosen for the simulations.

The global CM Fortran programs execute in a master/slave style between a partition manager (PM) and the processing nodes in its partition. When the PM comes to a part of the program that requires a parallel operation (that is, requires the processing nodes, PNs, to execute some operation on their data), it broadcasts a call to a parallel routine, and the PNs immediately execute that routine.
The PM handles all scalar instructions and the CM Run-Time System (CMRTS), thereby controlling all the communication involved, as shown in Figure 7.5.

Figure 7.5 Distribution of Code and Data in a CM Program [49]. [The partition manager holds the scalar data and arrays, the CMRTS data structures and the compiled scalar code, and dispatches parallel routine calls to the processing nodes; the nodes hold the parallel array data, the compiled parallel code, the CMRT routines and the PN kernel, and return scalar values to the PM.]

It is clear that all use (explicit or implicit) of parallel communication involves not only the data movement (which is the bottleneck in any parallel computing) but also the PM. Therefore, a significant amount of overhead occurs with the parallel communication. As a general rule for multiprocessor parallel programming, the parallel communication should be minimized so as to increase the purely parallel computational portion, thereby increasing the general efficiency. CM Fortran provides some compiler directives, command-line switches and some utility libraries to control the detailed parallel array layout. Through these tools, a programmer can flexibly control the distribution of the parallel array elements so that the inter-array operations involve a minimal amount of parallel communication. On the CM-5 systems, basically the parallel distributed :NEWS layout and the locally :SERIAL layout are in effect.

As discussed in the previous chapters, in contrast to the FFT kernel operation of the Fourier pseudospectral time-marching, the kernel operation of the Chebyshev collocation polynomial time-marching is the matrix multiply.
Therefore, among the comprehensive scientific utilities provided by the Connection Machine Scientific Software Library (CMSSL), the matrix multiply utility is addressed, because it is extensively used as the kernel operation in the final simulations described in Section 6.1.

There are two routines available on the CM-5 for matrix multiply: the CM Fortran intrinsic function MATMUL and the CMSSL routine gen_matrix_mult_noadd. When the CM Fortran program is linked with the CMSSL Version 3.1 library, the MATMUL performance is specially optimized and improved. The performance data in the following table show how the parallel array layout affects the matrix multiply performance.

Table 7.4 2K*2K Double Precision Matrix Multiply C=A*B by CMSSL Routine gen_matrix_mult_noadd

Layout                                        CM Elapsed Time (sec)  CM Busy Time (sec)
All arrays laid out as (:news, :news)         9.636                  9.622
A, B: (:news, :serial); C: (:news, :news)     9.759                  9.578
All arrays laid out as (:news, :serial)       16.555                 16.338

The CM elapsed time here means the elapsed execution time while the program is not swapped out by the OS on the partition manager. The CM busy time means the time spent actually executing parallel computation on the processing nodes. Due to the default axis reorder action taken by the CM Fortran compiler, the :NEWS axis always varies faster than the :SERIAL one. Therefore, a layout of (:NEWS, :SERIAL) is virtually equivalent to a (:SERIAL, :NEWS) layout here for these 2K*2K arrays. From the performance data listed in the table, making one axis of A and B local (:SERIAL) causes at best a marginal performance improvement, and making one axis of all arrays local even degrades the performance. On the other hand, since a series of matrix/array operations will be involved, the incompatible layout in the 2nd case in Table 7.4 may cause further performance degradation due to the different layouts between arrays.
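The cost of incompatible layouts can be illustrated serially. In the sketch below (plain Python, invented names), the same matrix is stored row-major and column-major; a kernel written for one ordering must first copy mismatched data into its own ordering, a serial analogue of the interprocessor communication that incompatible array layouts force between arrays.

```python
# Layout sketch (plain Python, not CM Fortran): one 2x3 matrix stored
# row-major and column-major.  A kernel expecting one layout must copy
# ("re-lay-out") data stored in the other before it can run.

def idx_row_major(i, j, ncols):
    return i * ncols + j

def idx_col_major(i, j, nrows):
    return j * nrows + i

nrows, ncols = 2, 3
a = [[1, 2, 3], [4, 5, 6]]
flat_row = [a[i][j] for i in range(nrows) for j in range(ncols)]  # [1,2,3,4,5,6]
flat_col = [a[i][j] for j in range(ncols) for i in range(nrows)]  # [1,4,2,5,3,6]

def relayout_col_to_row(flat, nrows, ncols):
    """The extra copy a mismatched layout forces before the kernel can run."""
    return [flat[idx_col_major(i, j, nrows)]
            for i in range(nrows) for j in range(ncols)]
```

On a distributed-memory machine the analogous copy moves elements between processors, which is why keeping every array in one compatible layout avoids the degradation seen in Table 7.4.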
Therefore, a uniform :NEWS layout is used throughout the entire time-marching implementation.

7.3 Parallel Implementation of the Polynomial Time-Marching by CM Fortran

In this section, the preliminary optimized implementation issues on the CM-5 of the polynomial time-marching simulation will be briefly discussed. For the model discussed in Section 6.1, the following implementation issues should be carefully handled:

1. Based on the layout discussion in the previous section, the :NEWS layout is used throughout the entire program;
2. In order to minimize the interprocessor communication, the row and column masks for incorporating boundary conditions (e.g. matrices [A1], [A2], [B1] in (6.1.5)) should be handled via the CM Fortran WHERE mask block, rather than FORALL, which may cause some unnecessary communication;
3. Due to the complicated non-linear numerical approximation routines involved, currently it's better to run the Schwarz-Christoffel conformal mapping package in Fortran 77 serially on the front-end. However, this may cause a significant amount of PN idle waiting while the SC package is running on the front-end. Thus, a more efficient alternative is to pre-compute the necessary SC parameters on a workstation (e.g. in MATLAB) and save these data for the final CM parallel processing;
4. To guarantee the minimum amount of communication involvement, the -list compiler option can be used to generate an additional listing file. It contains a complete line-numbered source-code listing of the program, as well as detailed information about the array homes and a list of all CM interprocessor communication operators used by the program, with line numbers showing where they are used. This information is particularly useful in helping to identify all the communication, so as to eliminate redundancies.

By applying the above points to the CM Fortran program, a 2-D seismic simulation program kernel with virtually no communication can be obtained. This is confirmed by the generated listing files.
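The WHERE/FORALL distinction in point 2 can be mimicked serially. In the sketch below (plain Python; the grid, mask and values are invented), a WHERE-style update touches each element independently under an elementwise mask, so no element ever needs another element's data; a FORALL-style indexed assignment may gather from other elements and hence communicate.

```python
# Serial analogue of a CM Fortran WHERE block: an elementwise masked
# assignment.  Each element is updated strictly from its own data.

N = 5
grid = [[1.0] * N for _ in range(N)]
# Mask selecting the top-edge row where a boundary condition is imposed.
mask = [[i == 0 for _ in range(N)] for i in range(N)]

# WHERE-style update: purely elementwise, hence communication-free.
for i in range(N):
    for j in range(N):
        if mask[i][j]:
            grid[i][j] = 0.0

# By contrast, a FORALL-style statement such as grid[i][j] = grid[i][p[j]]
# reads *other* elements, which on a distributed machine means moving data
# between processors.
```

This is why the boundary masks are best expressed as elementwise masked assignments whenever the update needs no neighboring values.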
The following table briefly illustrates the performance improvement.

Table 7.5 Per Time Step Kernel Operation Execution Time (model discussed in Section 6.1)

Boundary manipulation       CM Elapsed Time (sec)   CM Busy Time (sec)
                            per time step           per time step
FORALL used                 9.262                   8.656
WHERE mask used             8.954                   8.341

Since the communication caused by the FORALL operation is eliminated in the kernel operation (6.1.5), the execution time is reduced. Furthermore, the difference between CM elapsed time and CM busy time is also reduced, since the PM involvement in the parallel communication is reduced.

The polynomial time-marching algorithms evaluate the time-dependent solution of the wave equation by extensively and recursively using the pseudospectral methods. The computational requirement therefore is much higher than for the conventional time-marching techniques. Thus, powerful parallel computers such as the Connection Machine are decidedly the suitable tools, and the parallel implementation of the algorithms is fairly important to their application. From the above discussion, it is also clear that a good or an optimal implementation on a massively parallel computer critically depends on the machine architecture and the parallel language features.

Chapter 8 Conclusion

8.1 Summary

The purpose of this thesis research is to develop a novel numerical algorithm, based on Tal-Ezer's time-marching method, to simulate wave scattering phenomena in the time domain. The simulation usually is implemented on supercomputers, which require very high vectorization or parallelism in the algorithm.
The pseudospectral methods are chosen for the spatial derivative approximation, and the polynomial expansion or interpolation methods are developed for the time-marching.

In Chapter 2, the theoretical principles of pseudospectral methods are introduced. The pseudospectral (or collocation) methods always associate a set of grid points with some specific basis function set. The choice of the basis function set is determined by the boundary conditions of the domain. Usually, the Fourier series is chosen for periodic domains and Chebyshev polynomials are chosen for non-periodic domains. Compared to the finite-difference methods, the pseudospectral methods have "infinite-order" or "exponential" convergence. This "exponential" convergence, however, degrades in the presence of internal boundaries and also of some boundary conditions imposed on the external boundaries. The discussion is focused on the derivative representations of the Fourier and the Chebyshev pseudospectral methods. The latter is extensively addressed in subsequent chapters.

First, in Chapter 3, the method of expanding the solution time dependence in Chebyshev polynomials is presented. Its accuracy, resolution, convergence and stability issues are also discussed. The time-marching scheme is later interpreted as a special case of the general Faber polynomial approximation. It is limited to the case of a skew-symmetric spatial operator with purely real or imaginary eigenvalues. Hence, it is only applicable to the Fourier pseudospectral spatial approximation, which implies periodic boundary situations. Then, a more general Faber polynomial time-marching scheme and its practical implementation, the Newton-form interpolation, are introduced to deal with the more complicated spatial operators occurring in the wave equation.

By manipulating the Chebyshev collocated derivative operators in space, non-periodic boundary conditions are incorporated into the polynomial time-marching scheme, as presented in Chapter 4.
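The Chapter 3 idea, expanding the evolution operator in Chebyshev polynomials of the (scaled) spatial operator with Bessel-function coefficients, can be illustrated by a minimal stand-alone Python sketch. The thesis uses J_k coefficients for a skew-symmetric operator; this sketch uses the analogous modified-Bessel expansion e^{tx} = I_0(t) + 2 Σ I_k(t) T_k(x) for a symmetric 2×2 toy operator, so it illustrates the mechanics rather than the thesis algorithm verbatim.

```python
import math

def bessel_i(k, t, terms=30):
    """Modified Bessel function I_k(t) by its power series (stdlib only)."""
    return sum((t / 2.0) ** (2 * m + k) / (math.factorial(m) * math.factorial(m + k))
               for m in range(terms))

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def cheb_expm_apply(a, v, t, kmax=20):
    """Approximate exp(t*A) v by the Chebyshev expansion
    e^{t x} = I_0(t) + 2 * sum_{k>=1} I_k(t) T_k(x),
    valid when the symmetric operator A has its spectrum scaled into [-1, 1].
    T_k(A) v is built by the Chebyshev recurrence, so the kernel operation
    is again a matrix multiply."""
    p0, p1 = v[:], matvec(a, v)                  # T_0(A)v, T_1(A)v
    u = [bessel_i(0, t) * x + 2.0 * bessel_i(1, t) * y for x, y in zip(p0, p1)]
    for k in range(2, kmax + 1):
        # Chebyshev recurrence: T_k(A)v = 2*A*T_{k-1}(A)v - T_{k-2}(A)v
        p2 = [2.0 * x - y for x, y in zip(matvec(a, p1), p0)]
        u = [ui + 2.0 * bessel_i(k, t) * x for ui, x in zip(u, p2)]
        p0, p1 = p1, p2
    return u

# A = [[0,1],[1,0]] has eigenvalues +/-1; exp(t*A) [1,0]^T = [cosh t, sinh t].
a = [[0.0, 1.0], [1.0, 0.0]]
u = cheb_expm_apply(a, [1.0, 0.0], t=1.0)
```

Because the Bessel coefficients decay super-exponentially once k exceeds the scaled time step, a modest number of matrix multiplies delivers the spectral accuracy in time that the thesis exploits.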
Homogeneous Dirichlet, Neumann and mixed boundary conditions respectively are included in the polynomial time-marching scheme without changing the polynomial marching procedure. The key is to incorporate the homogeneous boundary conditions into the spatial operator so that the eigenvalue distribution of the combined operator reflects the boundary constraints. Consequently, the Newton-form interpolation, which is determined by the eigenvalue distribution domain and the functional form of the symbolic solution, is also affected by the boundary constraints. After incorporating the boundary conditions, the new scheme still maintains the same high resolution and accuracy as discussed in Chapter 3, for the model problems presented. These numerical examples show that the stability limitation on the time step in this new scheme is O(r).

Furthermore, in order to simulate a long period of wave scattering in a restricted domain without involving severe edge reflection, approximate non-reflecting boundary conditions are introduced in Chapter 5. The approximate absorbing boundary conditions are also incorporated into the polynomial time-marching in a similar way. It is shown that the 1st and 2nd order approximate absorbers can be conveniently included in the polynomial time-marching without any significant algorithm changes or additional computational cost. One complete numerical simulation example, a two-dimensional seismic reflection model, is presented in Chapter 6. The snapshots of the visualization demonstrate the scattering phenomena of diffraction, reflection and transmission.

Chapter 7 addresses the parallel programming issues of the numerical simulation on the supercomputers Fujitsu VPX240/10 and TMC CM-5. The architectures of the two supercomputers are introduced and compared. The detailed programming and optimization considerations in Fortran (vectorized Fortran on the VPX240/10 and CM Fortran on the CM-5) are also discussed, and some preliminary optimization results are presented there.
The performance data show that the polynomial time-marching scheme can be highly vectorized or parallelized. Therefore, it's very suitable for implementation on supercomputers for simulating some large-scale complex physical phenomena.

8.2 Contributions

The following is a list of the major contributions of this thesis:

• Implementation of the Fourier pseudospectral time-marching technique on the CM-2 supercomputer to simulate some simple-shape scalar wave scatterings.
• Extension of the polynomial time-marching scheme to non-periodic boundary value problems, including incorporating homogeneous boundary conditions (Dirichlet, Neumann or mixed) into the Chebyshev collocation operators, thus significantly extending the application range of the polynomial time-marching technique.
• Derivation of the eigenvalue distribution of the combined two-dimensional Chebyshev collocation operator via the Kronecker product.
• Incorporation of the 2nd order approximate absorbing boundary conditions into the polynomial time-marching scheme, thus making it possible to apply the polynomial time-marching scheme to simulate wave propagation in open domains.
• Simulation and visualization of a complex seismic wave scattering model by the polynomial time-marching scheme on both the CM-5 and the Fujitsu VPX240/10 supercomputers, including optimization of the parallel implementation in CM Fortran (Fortran 90) on the CM-5 and in vectorized Fortran on the VPX240/10.
• Examination, via numerical examples for the one-dimensional 2nd order wave equation, of the accuracy for different N. The exponential convergence degrades in the presence of internal boundaries or externally applied boundary conditions.

8.3 Future Work

Due to the severe accuracy degradation in the presence of medium discontinuities or 2nd order absorbing boundary conditions, further work is needed to study the internal boundary manipulation, and a higher order absorbing approximation should also be incorporated.
A full comparison with an analytical solution in the two-dimensional variable medium should be studied in the future as well.

In this thesis, the polynomial time-marching scheme for the two-dimensional Chebyshev collocation case has been studied. For the three-dimensional case or the Chebyshev/Fourier mixed collocation case (for example, in cylindrical or spherical coordinates), the whole derivation procedure remains the same except for the configuration of the collocation operators. Therefore, in order to efficiently obtain the collocation matrix operator and its eigenvalue distribution in the three-dimensional case, an appropriate form of the three-dimensional Chebyshev collocation operator incorporating the boundary conditions needs to be built up.

Generally, this polynomial time-marching scheme deals with boundary-initial-value problems without any source excitation. However, some short pulse source problems can always be approximated by the same scheme as used in Chapter 6. Further theoretical work is needed to work out the cases with a long-lasting source.

Also, three-dimensional seismic forward modelling based on the complete elastic wave equation will certainly be considered as one of the goals for this algorithm. The major difficulty would be in constructing the three-dimensional operator and deriving its spectral radius.

Appendix A Some Snapshots of Fourier Pseudospectral Time-marching Results

Figure A.1 Snapshot of wave scattering described on pages 37-39

Appendix B Previous Study Results on the Performance of the Connection Machine CM-2

The Connection Machine CM-2/200 is a massively parallel, single instruction multiple data (SIMD) machine. A fully configured CM-2 has 64K one-bit processors. All these processing elements are organized into processing nodes, each of which includes 32 processors, distributed memory, a 64-bit floating-point accelerator and communication interfaces for interprocessor communication.
The interprocessor connection network of the CM-2 provides three forms of interprocessor communication: general communication, which is based on the hypercube message-passing router and physically connects the addresses differing by one bit; NEWS grid communication, which provides the hardware connections from each processor to its four nearest neighbors in a two-dimensional Cartesian grid; and global communication, which is useful for global cumulative and reduction computations along one axis. These architectures provide more flexible parallel programming techniques in high-level languages.

Some early-stage work is based on the Fourier pseudospectral method and implemented on the Connection Machine CM-2 with 8K processors. The following is the outline of this work.

Performance Results for the Parallel Implementation of Fourier Pseudospectral Time-Marching on CM-2

The following table shows some performance data for computing a 128*128 two-dimensional problem in one time step (the Chebyshev polynomials were evaluated up to 28th order):

Table B.1 The 128*128 one time step performance comparison: attached to 4K processors except where otherwise described. The CM-2 used here has 256 Kbits/proc. distributed memory and one 64-bit floating-point processing unit per 32 processors.

                                     FFT complex          FFT complex       Layout same as the
                                     working array        working array     first one, but
                                     layout               layout            attached to 8K
                                     (10000:send,:send)   (:serial,:send)   processors
Total one step      CM time (sec)    0.8115               1.6923            0.4591
time-marching       CM Elapsed (sec) 0.8326               1.7169            0.4877
Pseudospectral      CM time (sec)    0.7702               1.6511            0.4376
evaluation          CM Elapsed (sec) 0.7913               1.6757            0.4662
(computing all
the required K')
Polynomial          CM time (sec)    0.0398               0.0397            0.0206
Expansion           CM Elapsed (sec) 0.0398               0.0397            0.0206
The clock frequency of this CM-2 is 7 MHz. From the performance data in the above table, it is clear that the pseudospectral evaluation takes the dominant part of the execution time (about 95% of the total time), and the serial operation for the Chebyshev polynomial expansion is also quite efficient, as its CM time is equal to its CM Elapsed time. (In the second layout case, the wave speed array and the wavenumber array are also set in the same way as the FFT working array.) Furthermore, setting the first space axis of a two-dimensional problem as SERIAL greatly decreases the parallel execution efficiency. Increasing the number of parallel processors to 8K increases the performance by a factor of about 1.7 (less than linear).

As previously mentioned, the kernel parallel operation of this algorithm is the evaluation of various orders of the spatial derivative operator by the pseudospectral methods, either Fourier (for a periodic interval) or Chebyshev (for boundary value problems) collocation. Hence, some conventional pseudospectral parallel implementation techniques, as analyzed by R.B. Pelz in [36], will also be applied in this implementation:

1. To minimize the communication cost in the FFT, it is desirable that the x direction (the first space axis of the array) be mapped as locally as possible. The x axis weight is set to be greater than that of the other axes;
2. To match the FFT butterfly communication pattern, which always operates between data elements at a distance of 2^i, the FFT complex working array axes should be laid out in SEND hypercube ordering;
3. To avoid wasteful communication cost, set the frequency domain axis ordering in bit-reversed order;
4. To make the wavenumber multiplication more efficient, precompute all the one-dimensional wavenumber vectors and combine them as a three-dimensional bit-reversed ordered wavenumber array with the same layout as the working array.
Thus, the interprocessor communication during the multiplication can be minimized.

In addition to the above, because the new algorithm requires the polynomials of the spatial derivative operator for time-marching, it is also necessary to extend the previous parallel implementation techniques for the conventional pseudospectral methods:

1. To make better use of the Chebyshev and Bessel recurrence relations, the Chebyshev polynomial coefficients and the Bessel function values are calculated on the front-end beforehand. The computation of the Bessel function values is based on the NETLIB special function routines (written by W. J. Cody of Argonne National Laboratory);
2. To evaluate the Chebyshev polynomial of the spatial derivatives, the Chebyshev polynomial coefficients and the Bessel function values will be addressed individually for multiplication with the various order derivative arrays. Therefore, storing these coefficients and function values as front-end arrays instead of parallel CM arrays can dramatically reduce the single CM array element movement, which is one of the most inefficient operations on SIMD machines;
3. Because of the above consideration, the order index dimensions of the derivative arrays are also desirably set as SERIAL, so that some serial DO-LOOPs can be used for the polynomial evaluation (the DO-LOOP is usually more efficient in the SERIAL dimension than FORALL [43]);
4. As the new time-marching technique is not as simple as the finite-difference one, which may use the nearest-neighbor NEWS grid communication, it is also expected to set the time dimension as SERIAL so as to use DO-LOOPs for the more complicated time-marching operations;
5. If the first space axis of the FFT working array (complex) is set SERIAL, all the other arrays should also be laid out in the same way so as to minimize the communication cost of the operations between them and the working array.
Because the time axis of the solution array has been set SERIAL (mentioned above), the two-dimensional problems will only have one parallel dimension in the solution array. This layout usually is also inefficient, because the spread of the only parallel dimension may be less than the CM parallel processing requirement [43] (for an 8K CM-2, a 1K parallel element dimension is the minimum requirement for the Slicewise mode; otherwise some processors have to be idled). Therefore, all the space axes have to be laid out in SEND ordering, but the first one should be given the highest weight.

Appendix C Derivation of (6.1.5)

Step 1: represent the 2nd order equation (6.1.1) in Chebyshev collocation matrix form (no boundary conditions):

op{v} = (1/ρ){r[Dz](μ)[Dz][v] + r[v][Dx](μ)[Dx]}   (C.1)

Here, ρ and μ are defined in (6.1.7); the [·]'s are matrices and the (·)'s are arrays. [Dx] and [Dz] are just the normal Chebyshev collocated 1st order derivative matrices, defined in (4.1.3).

Step 2: incorporate the top edge free surface (Neumann) boundary condition and the 2nd order terms of all the absorbing boundary conditions into (C.1):

op{v} = (1/ρ){r[D0](μ)[D1][v] + r[v][D2](μ)[D2]},   (C.2)

where, in (C.3), [D0] and [D1] are the modified z-derivative matrices, of sizes (N+1)×N and N×(N+1) respectively, and [D2] is the modified (M+1)×(M+1) x-derivative matrix. The matrix [D0] includes the 2nd order term of the absorbing boundary condition for the bottom edge.
The matrix [D1] reflects the top edge homogeneous Neumann condition. The matrix [D2] contains the 2nd order terms of the absorbing boundary conditions for the left and right edges.

Step 3: build the 1st order terms for the absorbing boundary conditions:

op2{vt} = (c){r[D4][vt] + r[vt][D5]},   (C.4)

where, in (C.5), [D4] and [D5] are respectively the (N+1)×(N+1) and (M+1)×(M+1) matrices containing the 1st order terms of the absorbing boundary conditions, [D4] for the bottom edge and [D5] for the left and right edges.

Step 4: extract those corner points which have been counted twice by their adjacent edges, and include the 0.5 factor for the 1st order term in the absorbing approximation:

op1{v} = (1/ρ){r[D0](μ)[D1][v][A2] + r[A1][v][D2](μ)[D2]},   (C.6)
op2{vt} = (c){r[D4][vt][B1] + r[vt][D5]}

Here, in (C.7), [A1] is the (N+1)×(N+1) diagonal matrix diag(0, 1, ..., 1, 0.5), and [A2] and [B1] are the (M+1)×(M+1) diagonal matrices with unit entries in the interior, zero entries where an adjacent edge operator applies, and 0.5 entries at the corner points counted twice.

References

[1] Milton Abramowitz and Irene A. Stegun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, pages 877-878. U.S. National Bureau of Standards, 1972.
[2] Keiiti Aki and Paul G. Richards. Quantitative Seismology: theory and methods. W. H. Freeman and Company, 1980.
[3] R.M. Alford, K.R. Kelly, and D. M. Boore. Accuracy of finite-difference modeling of the acoustic wave equation. Geophysics, 39(6), 1974.
[4] Stephen Barnett. Matrices, Methods and Applications. Clarendon Press, Oxford, 1990.
[5] John P. Boyd. Chebyshev & Fourier Spectral Methods. Springer-Verlag, 1989.
[6] L.M. Brekhovskikh and Yu.P. Lysanov. Fundamentals of Ocean Acoustics. Springer-Verlag, 1982.
[7] L. Cagniard. Reflection and Refraction of Progressive Seismic Waves. McGraw-Hill, 1962.
[8] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods in Fluid Dynamics. Springer-Verlag, 1988.
[9] C. Canuto and A. Quarteroni. Error estimation for spectral and pseudospectral approximations of hyperbolic equations. SIAM J. Numer. Anal., 19:629-642, 1982.
[10] S.W. Ellacott.
Computation of Faber series with application to numerical polynomial approximation in the complex plane. Mathematics of Computation, 40(162):575-587, 1983.
[11] B. Engquist and A. Majda. Absorbing boundary conditions for the numerical simulation of waves. Math. Comput., 31:629-651, 1977.
[12] John T. Etgen. Accurate wave equation modeling. Private Communication, 1991.
[13] B. Fischer and L. Reichel. Newton interpolation in Fejér and Chebyshev points. Math. Comp., 53:265-278, 1989.
[14] Bengt Fornberg. The pseudospectral method: comparisons with finite differences for the elastic wave equation. Geophysics, 52(4):483-501, 1987.
[15] Bengt Fornberg. The pseudospectral method: accurate representation of interfaces in elastic wave calculations. Geophysics, 53(5):625-637, 1988.
[16] FUJITSU LIMITED. FUJITSU FORTRAN VP Programming Guide, 1991.
[17] FUJITSU LIMITED. FUJITSU UXP/M FORTRAN77 EX/VP User's Guide, 1992.
[18] G. E. Q. Goode. Numerical Simulation of Viscoelastic Waves. PhD thesis, University of Calgary, AB., 1993.
[19] David Gottlieb and Richard S. Hirsh. Parallel pseudospectral domain decomposition techniques. Journal of Scientific Computing, 4(4):309-325, 1989.
[20] David Gottlieb, M. Yousuff Hussaini, and Steven A. Orszag. Introduction: Theory and Applications of Spectral Methods, pages 1-54. SIAM, Philadelphia, 1984.
[21] David Gottlieb and L. Lustman. The spectrum of the Chebyshev collocation operator for the heat equation. SIAM J. Numer. Anal., 20:909-921, 1983.
[22] R.L. Higdon. Absorbing boundary conditions for difference approximations to the multi-dimensional wave equation. Math. Comput., 47:437-459, 1986.
[23] HPC Center, Calgary, Canada. HPCC User's Guide, 1993.
[24] M.Y. Hussaini, D.A. Kopriva, and A.T. Patera. Spectral collocation methods. Applied Numerical Mathematics, 5:177-208, 1989.
[25] F. Kalantzis, N. Dai, E. R. Kanasewich, S. Phadke, and A. Vafidis.
2-D and 3-D seismic reflection modeling and imaging using vector and parallel supercomputers. In Proceedings of Supercomputing Symposium '93, June 1993.
[26] H. Kreiss and J. Oliger. Methods for the approximate solution of time dependent problems. Global Atmospheric Research Programme (GARP) Publications Series No. 10, January 1973.
[27] E.L. Lindman. Free space boundary conditions for the time dependent wave equation. J. Comput. Phys., 18:66-78, 1975.
[28] Yong Luo and Matthew J. Yedlin. Polynomial time-marching for non-reflection boundary problems. Journal of Scientific Computing. (prepared in Sept. 1994).
[29] Yong Luo and Matthew J. Yedlin. Simulating some complex wave scattering problems on CM-2 Connection Machine by pseudospectral time-marching. In Proceedings of The 3rd International Conference on Applications of Supercomputers in Engineering, Southampton, U.K., September 1993. Computational Mechanics Publications.
[30] Yong Luo and Matthew J. Yedlin. Solving wave equations by pseudospectral time-marching on CM-2 Connection Machine. In Proceedings of The 7th Annual High Performance Computing Conference of Canada, Calgary, AB, June 1993.
[31] Yong Luo and Matthew J. Yedlin. Polynomial time-marching for non-periodic boundary value problems. Journal of Scientific Computing, 1994. (in press).
[32] A. I. Markushevich. Theory of Functions of a Complex Variable. Chelsea, New York, 1977.
[33] V.P. Maslov. Operational Methods. MIR Publishers, Moscow, 1976.
[34] Oliver A. McBryan. The Connection Machine: PDE solution on 65,536 processors. Parallel Computing, 9(1):1-24, December 1988.
[35] Alireza H. Mohammadian, Vijaya Shankar, and William F. Hall. Computation of electromagnetic scattering and radiation using a time-domain finite-volume discretization procedure. Computer Physics Communications, 68:175-196, 1991.
[36] Richard B. Pelz. Pseudospectral methods on massively parallel computers. Computer Methods in Applied Mechanics & Engineering, 80:493-503, 1990.
[37] Richard B. Pelz.
Fourier spectral method on ensemble architectures. Computer Methods in Applied Mechanics & Engineering, 89:529-542, 1991.
[38] R.A. Renaut. Absorbing boundary conditions, difference operators, and stability. Journal of Computational Physics, 102:236-251, 1992.
[39] Hillel Tal-Ezer. Spectral methods in time for hyperbolic equations. SIAM J. Numer. Anal., 23(1), February 1986.
[40] Hillel Tal-Ezer. Polynomial approximation of functions of matrices and applications. Journal of Scientific Computing, 4(1):25-60, 1989.
[41] Hillel Tal-Ezer. High degree polynomial interpolation in Newton form. SIAM J. Sci. Stat. Comput., 12(3):648-667, 1991.
[42] Thinking Machines Corp., Cambridge, MA. Connection Machine CM-200 Series Technical Summary, 1991.
[43] Thinking Machines Corp., Cambridge, MA. Connection Machine Fortran Programming Guide (v. 1.13), (v. 2.1, Feb. 1994), July 1991.
[44] Thinking Machines Corp., Cambridge, MA. Getting Started in CM Fortran, 1991.
[45] Thinking Machines Corp., Cambridge, MA. CM Fortran User's Guide for the CM-5 (v. 1.1.3), (v. 2.1, Feb. 1994), January 1992.
[46] Thinking Machines Corp., Cambridge, MA. CM-5 Technical Summary, November 1992.
[47] Thinking Machines Corp., Cambridge, MA. CMSSL for CM Fortran: CM-5 Edition (v. 3.1), June 1993.
[48] Thinking Machines Corp., Cambridge, MA. Using the CMAX Converter (v. 1.0), July 1993.
[49] Thinking Machines Corp., Cambridge, MA. CM-5 CM Fortran Performance Guide (v. 2.1), 1994.
[50] P. A. Tirkas, C. A. Balanis, and R.A. Renaut. Higher order absorbing boundary conditions for the finite-difference time-domain method. IEEE Trans. on Ant. & Prop., 40(10):1215-1222, October 1992.
[51] Lloyd N. Trefethen. Numerical computation of the Schwarz-Christoffel transformation. SIAM J. Sci. Stat. Comput., 1(1):82-102, 1980.
[52] Lloyd N. Trefethen. SCPACK User's Guide (NETLIB Document), 1983.
[53] L.N. Trefethen and L. Halpern. Well-posedness of one-way wave equations and absorbing boundary conditions. Math. Comput., 47:421-435, 1986.
[54] L.N.
Trefethen and M.R. Trummer. An instability phenomenon in spectral methods. SIAM J. Numer. Anal., 24(5):1008-1023, 1987.
[55] A. Vafidis, F. Abramovici, and E.R. Kanasewich. Elastic wave propagation using fully vectorized high order finite-difference algorithms. Geophysics, 57(2), 1992.
[56] R. Vichnevetsky and J.B. Bowles. Fourier Analysis of Numerical Approximation of Hyperbolic Equations. SIAM, 1982.
[57] Jean Virieux. SH-wave propagation in heterogeneous media: velocity-stress finite-difference method. Geophysics, 49(11), 1984.
[58] J.L. Walsh. Interpolation and Approximation by Rational Functions in the Complex Domain. Providence, Rhode Island, 1956.
[59] J.A.C. Weideman and L.N. Trefethen. The eigenvalues of second-order spectral differentiation matrices. SIAM J. Numer. Anal., 25:1279-1298, 1988.
[60] Qin Zhang. Acoustic Pulse Diffraction by Curved and Planar Structures With Edges. PhD thesis, The University of British Columbia, Dept. of Electrical Engineering, 1990.
[61] Shengkai Zhao. Chebyshev Spectral Methods for Potential Field Computation. PhD thesis, The University of British Columbia, August 1993.
Item Metadata
Title | Simulating wave scattering problems by pseudospectral time-marching on supercomputers |
Creator | Luo, Yong |
Date Issued | 1994 |
Extent | 2940269 bytes |
Genre | Thesis/Dissertation |
Type | Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-06-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0065227 |
URI | http://hdl.handle.net/2429/8870 |
Degree | Doctor of Philosophy - PhD |
Program | Electrical and Computer Engineering |
Affiliation | Applied Science, Faculty of; Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1995-05 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |