Open Collections

UBC Theses and Dissertations

The discrete adjoint method for high-order time-stepping methods Rothauge, Kai 2016

Full Text

The Discrete Adjoint Method for High-Order Time-Stepping Methods

by Kai Rothauge

MMath, The University of Bath, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Mathematics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2016

© Kai Rothauge 2016

Abstract

This thesis examines the derivation and implementation of the discrete adjoint method for several time-stepping methods. Our results are important for gradient-based numerical optimization in the context of large-scale model calibration problems that are constrained by nonlinear time-dependent PDEs. To this end, we discuss finding the gradient and the action of the Hessian of the data misfit function with respect to three sets of parameters: model parameters, source parameters and the initial condition. We also discuss the closely related topic of computing the action of the sensitivity matrix on a vector, which is required when performing a sensitivity analysis. The gradient and Hessian of the data misfit function with respect to these parameters require the derivatives of the misfit with respect to the simulated data, and we give the procedures for computing these derivatives for several data misfit functions that are of use in seismic imaging and elsewhere.

The methods we consider can be divided into two categories, linear multistep (LM) methods and Runge-Kutta (RK) methods, and several variants of these are discussed. Regular LM and RK methods can be used for ODE systems arising from the semi-discretization of general nonlinear time-dependent PDEs, whereas implicit-explicit and staggered variants can be applied when the PDE has a more specialized form. Exponential time-differencing RK methods are also discussed. The implementation of the associated adjoint time-stepping methods is discussed in detail. Our motivation is the application of the discrete adjoint method to high-order time-stepping methods, but the approach taken here does not exclude lower-order methods.

All of the algorithms have been implemented in MATLAB using an object-oriented design and are written with extensibility in mind. For exponential RK methods it is illustrated numerically that the adjoint methods have the same order of accuracy as their corresponding forward methods, and for linear PDEs we give a simple proof that this must always be the case. The applicability of some of the methods developed here to pattern formation problems is demonstrated using the Swift-Hohenberg model.

Preface

The work presented in this thesis is original research conducted while studying at the University of British Columbia under the supervision of Dr. Eldad Haber and Dr. Uri Ascher. The inspiration came from a discussion with Dr. Haber about high-order Runge-Kutta methods in the context of full waveform inversion.

I am responsible for deriving the expressions for all time-stepping methods considered in this study, aided by discussions with my supervisors regarding the discrete adjoint method and related topics. The code was implemented from scratch by myself, and I performed all the numerical tests and conducted the numerical experiments. I am responsible for the thesis manuscript preparation, with revision support from Dr. Ascher and Dr. Haber.

Some results of this work have been submitted for publication.
The application ofthe discrete adjoint method to general linear multistep and Runge-Kutta methods forlinear PDEs is presented in [94], and the discrete adjoint method for exponential time-differencing Runge-Kutta methods, including the application to a pattern formationproblem, is discussed in [93]. Dr. Ascher and Dr. Haber provided fruitful discussionsand helped with editing of the manuscripts. Additional manuscripts based on thiswork are in preparation.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiNotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Time-Stepping Schemes . . . . . . . . . . . . . . . . . . . . . . . . 31.2 PDE-Constrained Optimization . . . . . . . . . . . . . . . . . . . . 41.3 The Adjoint Method . . . . . . . . . . . . . . . . . . . . . . . . . . 61.4 Thesis Contributions and Related Work . . . . . . . . . . . . . . . 81.5 Thesis Overview and Outline . . . . . . . . . . . . . . . . . . . . . 122 Review of Time-Stepping Methods . . . . . . . . . . . . . . . . . . 14vTable of Contents2.1 Regular Time-Stepping Methods . . . . . . . . . . . . . . . . . . . 152.1.1 Linear Multistep Methods . . . . . . . . . . . . . . . . . . . 162.1.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . 202.2 Implicit-Explicit Time-Stepping Methods . . . . . . . . . . . . . . 242.2.1 IMEX Linear Multistep Methods . . . . . . . . . . . . . . . 252.2.2 IMEX Runge-Kutta Methods . . . . . . . . . . . . . . . . . 272.3 Staggered Time-Stepping Methods . . . . . . . . . . . . . . . . . . 302.3.1 Staggered Linear Multistep Methods . . . . . . . . . . . . . 322.3.2 Staggered Runge-Kutta Methods . . . . . . . . . . . . . . . 332.4 Exponential Time-Differencing Methods . . . . . . . . . . . . . . . 382.4.1 Exponential Runge-Kutta Methods . . . . . . . . . . . . . . 392.4.2 The Action of ϕℓ on Arbitrary Vectors . . . . . . . . . . . . 433 Derivatives of the Misfit Function . . . . . . . . . . . . . . . . . . . 493.1 The Sensitivity Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 503.2 The Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543.3 The Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554 Derivatives of t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584.1 Regular Time-Stepping Methods . . . . . . . . . . . . . . . . . . . 594.1.1 Linear Multistep Methods . . . . . . . . . . . . . . . . . . . 594.1.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . 604.2 Implicit-Explicit Time-Stepping Methods . . . . . . . . . . . . . . 624.2.1 IMEX Linear Multistep Methods . . . . . . . . . . . . . . . 624.2.2 IMEX Runge-Kutta Methods . . . . . . . . . . . . . . . . . 634.3 Staggered Time-Stepping Methods . . . . . . . . . . . . . . . . . . 654.3.1 Staggered Linear Multistep Methods . . . 
. . . . . . . . . . 654.3.2 Staggered Runge-Kutta Methods . . . . . . . . . . . . . . . 664.4 Exponential Time-Stepping Methods . . . . . . . . . . . . . . . . . 68viTable of Contents4.4.1 Exponential Runge-Kutta Methods . . . . . . . . . . . . . . 684.4.2 The Derivatives of ϕℓ . . . . . . . . . . . . . . . . . . . . . 685 The Linearized Forward Problem . . . . . . . . . . . . . . . . . . . 735.1 Regular Time-Stepping Methods . . . . . . . . . . . . . . . . . . . 745.1.1 Linear Multistep Methods . . . . . . . . . . . . . . . . . . . 745.1.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . 765.2 Implicit-Explicit Time-Stepping Methods . . . . . . . . . . . . . . 795.2.1 IMEX Linear Multistep Methods . . . . . . . . . . . . . . . 795.2.2 IMEX Runge-Kutta Methods . . . . . . . . . . . . . . . . . 815.3 Staggered Time-Stepping Methods . . . . . . . . . . . . . . . . . . 845.3.1 Staggered Multistep Methods . . . . . . . . . . . . . . . . . 855.3.2 Staggered Runge-Kutta Methods . . . . . . . . . . . . . . . 875.4 Exponential Time-Differencing Methods . . . . . . . . . . . . . . . 925.4.1 Exponential Runge-Kutta Method . . . . . . . . . . . . . . 925.4.2 Application to Krogstad’s Scheme . . . . . . . . . . . . . . 946 The Adjoint Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 966.1 Regular Time-Stepping Methods . . . . . . . . . . . . . . . . . . . 976.1.1 Linear Multistep Methods . . . . . . . . . . . . . . . . . . . 986.1.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . 1006.2 Implicit-Explicit Time-Stepping Methods . . . . . . . . . . . . . . 1036.2.1 IMEX Linear Multistep Methods . . . . . . . . . . . . . . . 1036.2.2 IMEX Runge-Kutta Methods . . . . . . . . . . . . . . . . . 1056.3 Staggered Time-Stepping Methods . . . . . . . . . . . . . . . . . . 1096.3.1 Staggered Linear Multistep Methods . . . . . . . . . . . . . 1096.3.2 Staggered Runge-Kutta Methods . . . . . . . . . . . . . . . 1116.4 Exponential Time-Differencing Methods . . . . . . . . . . . . . . . 1166.4.1 Exponential Runge-Kutta Methods . . . . . . . . . . . . . . 117viiTable of Contents6.5 Stability, Convergence, and Order of Accuracy for Linear Problems 1207 Data Misfit Function Examples . . . . . . . . . . . . . . . . . . . . 1247.1 Least-Squares Amplitudes Misfit . . . . . . . . . . . . . . . . . . . 1267.2 Cross-Correlation Time Shift Misfit . . . . . . . . . . . . . . . . . . 1287.3 Interferometric Misfit . . . . . . . . . . . . . . . . . . . . . . . . . 1348 Numerical Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 1469 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160AppendixA Derivations of the Derivatives of t . . . . . . . . . . . . . . . . . . 175A.1 Regular Time-Stepping Methods . . . . . . . . . . . . . . . . . . . 176A.1.1 Linear Multistep Methods . . . . . . . . . . . . . . . . . . . 176A.1.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . 185A.2 Implicit-Explicit Time-Stepping Methods . . . . . . . . . . . . . . 197A.2.1 IMEX Linear Multistep Methods . . . . . . . . . . . . . . . 197A.2.2 IMEX Runge-Kutta Methods . . . . . . . . . . . . . . . . . 208A.3 Staggered Time-Stepping Methods . . . . . . . . . . . . . . . . . . 223A.3.1 Staggered Linear Multistep Methods . . . . . . . . . . . . . 223A.3.2 Staggered Runge-Kutta Methods . . . . . . . . . . . . . . . 231A.4 Exponential Time Differencing . . . . . . . . . . . . . . . . . . . 
. 263A.4.1 Exponential Runge-Kutta Methods . . . . . . . . . . . . . . 263B Derivation of the Hessian . . . . . . . . . . . . . . . . . . . . . . . . 273viiiList of Tables4.1 References to the first derivatives of t for linear multistep methods. 594.2 References to the second derivatives of t for linear multistep methods. 604.3 References to the first derivatives of t for Runge-Kutta methods. . . 614.4 References to the second derivatives of t for Runge-Kutta methods. 614.5 References to the first derivatives of t for IMEX linear multistep meth-ods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624.6 References to the second derivatives of t for IMEX linear multistepmethods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.7 References to the first derivatives of t for IMEX Runge-Kutta methods. 644.8 References to the second derivatives of t for IMEX Runge-Kutta meth-ods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644.9 References to the first derivatives of t for staggered linear multistepmethods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654.10 References to the second derivatives of t for staggered linear multistepmethods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664.11 References to the first derivatives of t for staggered Runge-Kutta meth-ods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.12 References to the second derivatives of t for staggered Runge-Kuttamethods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.13 References to the first derivatives of t for exponential time-differencingRunge-Kutta methods. . . . . . . . . . . . . . . . . . . . . . . . . . 68ixList of Tables8.1 Approximate order of accuracy py for the forward solution, computedusing py (2iτ, 2i−1τ) = log2 (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖) for i = 1, 2, 3, 4,with ǫy(2iτ) = y(2iτ)− yexact. . . . . . . . . . . . . . . . . . . . . . 1508.2 Approximate order of accuracy pλ for the adjoint solution, computedusing pλ (2iτ, 2i−1τ) = log2 (‖ǫλ(2iτ)‖/‖ǫλ(2i−1τ)‖) for i = 1, 2, 3, 4,with ǫλ(2iτ) = λ(2iτ)− λexact. . . . . . . . . . . . . . . . . . . . . . 1518.3 Approximate order of accuracy p∇ for the gradient, computed usingp∇ (2iτ, 2i−1τ) = log2 (‖ǫ∇(2iτ)‖/‖ǫ∇(2i−1τ)‖) for i = 1, 2, 3, 4, withǫ∇(2iτ) = ∇(2iτ)−∇exact. . . . . . . . . . . . . . . . . . . . . . . . 1518.4 Approximate order of accuracy py for the forward solution using aRosenbrock approach, computed using py (2iτ, 2i−1τ) = log2 (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖)for i = 1, 2, 3, 4, with ǫy(2iτ) = y(2iτ)− yexact. . . . . . . . . . . . . 1528.5 Approximate order of accuracy pλ for the adjoint solution using aRosenbrock approach, computed using pλ (2iτ, 2i−1τ) = log2 (‖ǫλ(2iτ)‖/‖ǫλ(2i−1τ)‖)for i = 1, 2, 3, 4, with ǫλ(2iτ) = λ(2iτ)− λexact. . . . . . . . . . . . . 1528.6 Approximate order of accuracy p∇ for the gradient using a Rosenbrockapproach, computed using p∇ (2iτ, 2i−1τ) = log2 (‖ǫ∇(2iτ)‖/‖ǫ∇(2i−1τ)‖)for i = 1, 2, 3, 4, with ǫ∇(2iτ) = ∇(2iτ)−∇exact. . . . . . . . . . . . 153xList of Figures8.1 Actual parameter values . . . . . . . . . . . . . . . . . . . . . . . . 1488.2 Ground truth solution at t = 50s. . . . . . . . . . . . . . . . . . . . 1498.3 Recovered parameter values (bottom) and initial guesses (top) . . . 
154xiList of AcronymsAbbreviation DescriptionODE(s) Ordinary differential equations(s)PDE(s) Partial differential equations(s)MOL Method of linesLM Linear multistepBDF(s) Backward Differentiation Formula(s)RK Runge-KuttaERK Explicit Runge-KuttaDIRK Diagonally-Implicit Runge-KuttaIMEX Implicit-explicitIMEX LM/RK Implicit-explicit linear multistep/Runge-KuttaStagLM/RK Staggered linear multistep/Runge-KuttaETD Exponential Time-DifferencingETDRK Exponential Time-Differencing Runge-KuttaDO Discretize-then-OptimizeOD Optimize-then-DiscretizeAD Automatic DifferentiationGN Gauss-NewtonBFGS Broyden-Fletcher-Goldfarb-ShannoContinued on next page. . .xiiList of AcronymsAbbreviation DescriptionL-BFGS Limited-memory BFGSFWI Full waveform inversionxiiiNotation• GeneralSymbol Descriptiona Lowercase boldface letters denote vectorsA Uppercase boldface letters denote matricest Time variableT Final integration timetk kth time levelτk Variable time step, τk = tk − tk−1τ Constant time stepK Number of time-stepsw Variable that is continuous is space and timew(t) Semi-discretized variable, still continuous in timewk Estimated value of y at tk, wk = w(tk)w0 Initial conditionwFully discretized variable, exact structure depends on time-stepping methodNa Number of entries in vector aNNumber of entries in yk (usually this is the number of spatialunknowns)y Forward solutionContinued on next page. . .xivNotationSymbol Descriptionξ Solution of the linearized forward problemq Source term for the forward problemλ Adjoint solutionθ Source term for the adjoint problemtTime-stepping vector, abstract representation of time-steppingmethod∂ t∂yDerivative of the time-stepping method with respect to the so-lution, abstract representation of the linearized time-steppingmethod∂ t∂y⊤Derivative of the time-stepping method with respect to thesolution, abstract representation of the adjoint time-steppingmethodm Model parameterss Source parametersp Parameters set, p =[m⊤ y⊤0 s⊤]⊤pref Reference parameter set (incorporates prior information)d Simulated data/measurements/observationsdobs Actual/observed data/measurements/observationsM Misfit function, M = M(d,dobs)R Regularization function, R = R(p,pref)β Regularization parameterΩ Objective function, Ω = Ω(p) = Ω(d,dobs;p,pref) = M+ βR∂x∂yJacobian/sensitivity matrix of x with respect to yJ Sensitivity matrix∂d∂pContinued on next page. . 
.xvNotationSymbol Description∇p/dM gradient of M with respect to p/dHM Hessian of M with respect to p∂2M∂p∂pHessian of M with respect to p∂2M∂d∂dHessian of M with respect to d∂2x∂y∂zw∂2x∂y∂zw =∂∂y(∂x∂zw∣∣∣∣w), where x, y, z and w are arbitraryvectors of appropriate lengths; w is taken to be fixed withrespect to yv/wUsually arbitrary vectors of appropriate size that are multi-plied with some matrixva/waUsually arbitrary vectors with the same length and structureas adiag (a) Diagonal matrix with the entries of a on its diagonalblkdiag Block diagonal matrix, with matrices A1, · · · ,AK on its(A1, · · · ,AK) diagonal⊗ Kronecker product⊙ Kronecker dot productFDiscrete Fourier transform matrix (1D or 2D, depends on con-text)â Discrete Fourier transform of vector a, â = fft(a) = Fa·⊤ Transpose operator·⋆ Hermitian operator· Conjugation operator• Runge-Kutta MethodsxviNotationSymbol Descriptions Number of internal stagesσ Index used to denote the current internal stageYk,σσth internal stage at the kth time-step if the forward solutionis given by y; internal stages are generally represented by theuppercase of the letter representing the forward solutionYkVector containing all the internal stages at the kth time-step,e.g. Yk =[Y⊤k,1 · · · Yk,s]⊤ŷkVector containing all the internal stages and the computedsolution at the kth time-step, i.e. ŷk =[Y⊤k y⊤k]⊤aσi ith Runge-Kutta coefficient for the σth stagebσ σth weightcσ σth nodeΨRunge-Kutta transition matrix, maps one time level to thenext• Linear Multistep MethodsSymbol Descriptions Number of internal stepsσ Index used to denote the current stepαj, βj, γj Weights used by various linear multistep methodsS Source weighting matrix• Regular Time-Stepping MethodsxviiNotationSymbol Descriptionf Nonlinear operator, f = f(y, t;m)• Implicit-Explicit Time-Stepping MethodsSymbol DescriptionfE Nonlinear operator to be handled explicitly, fE = fE(y, t;m)fI Nonlinear operator to be handled implicitly, fI = fI(y, t;m)E "Explicit", used only as superscriptI "Implicit", used only as superscript• Staggered Time-Stepping MethodsSymbol Descriptiontk As above, but referred to as integer time level in this contexttk+ 12Half-integer time level, halfway between tk and tk+1u Component of forward solution at integer time levelsv Component of forward solution at half-integer time levelsfuNonlinear operator mapping from the half-integer time levelto the integer time level, fu = fu(v, t;m)fvNonlinear operator mapping from the integer time level to thehalf-integer time level, fv = fv(u, t;m)e "even", used as superscript or under summation symbolso "odd", used as superscript or under summation symbols• Exponential Time-Differencing MethodsxviiiNotationSymbol DescriptionL Linear operatorLkLinear operator as a function of the solution at the kth timelevel, Lk = L (yk)n nonlinear operator n = n(y, t;m)ϕℓ(z) ϕℓ(z) =ϕℓ−1(z)− ϕℓ−1(0)z, with ℓ > 0 and ϕ0(z) = ezϕℓ,σ ϕℓ,σ = ϕℓ(cστkLk−1)• Misfit FunctionsSymbol DescriptionNR Number of receivers/tracesW Weighting matrix for least-squares misfitAi"Amplitude" of trace di, Ai = ‖di‖; used by the least-squaresamplitude misfitTiOptimal shift of trace di; used by the cross-correlation timeshift misfitS(t′)Discrete shift operator, shifts trace it acts on to the right byt′ grid points⋆ Cross-correlation operatorNω Number of frequencies used by Fourier transformE Matrix of 0s and 1sxixAcknowledgementsI am indebted to my supervisors Eldad Haber and Uri Ascher for their patience andsupport while completing this thesis. 
Their guidance and insight have been crucial during my time at UBC, and their significant expertise in a wide range of topics has made me a much better applied mathematician.

I am grateful to Chen Greif for the support and interesting discussions, and to Dan Coombs and Lee Yupitun for their help. Thank you to Michael Ward for the engaging courses.

My time at SFU as an exchange student was a pivotal moment in my life. I am grateful to Ralf Wittenberg for showing me that research can be fun and for inspiring me to go to grad school.

A huge thank you to my best friend and amazing partner Rebecca for giving me so much encouragement, love and support during this stressful time.

I am very grateful to Iain for his friendship; finding a friend like him is a rare thing. A big thank you to Erin and Kyle for their generosity; they have made my life a lot easier.

Also thank you to Fred, whose company made the final few years of grad school far more bearable, and to Mike for the many fun conversations. Thank you to Pouya and Helen for being such good friends, and to everyone else who has made my time here more enjoyable over the years.

I am especially grateful for the support of my family.

Dedication

To my family

Chapter 1: Introduction

Large-scale model calibration is an important type of inverse problem dealing with the recovery or approximation of the model parameters appearing in partial differential equations (PDEs). The PDE models some real-life process, and observations of this process, which usually contain noise, are compared to numerically computed solutions of the PDE. Model calibration problems are very common in science and engineering, for instance in geophysics [36, 51, 87], fluid dynamics [71], computer vision [116] and bioscience [21], to name just a few. For a more general treatment of such problems we mention [9–11, 49, 66, 118].

We consider a generic time-dependent nonlinear PDE. Using the method of lines (MOL) approach, it is discretized in space by applying a finite difference, finite volume, finite element or pseudospectral method, where the known boundary conditions are assumed to have already been incorporated; we do not assume that the initial condition is known, although it can be. This results in a large system of ordinary differential equations (ODEs) that can be written in generic first-order form as

$$\frac{\partial y(t;p)}{\partial t} = f(y(t;p), t, m) + q(t, s), \qquad y(0) = y_0, \tag{1.1}$$

on an underlying discrete spatial grid, with 0 ≤ t ≤ T. The vector p = [m⊤ y0⊤ s⊤]⊤ contains the parameters that we are trying to recover, where m ∈ R^{Nm} is the set of discretized model parameters that could, for instance, represent some physical properties of the underlying material that we want to estimate. The source parameters s ∈ R^{Ns} determine the behaviour of the source term, and we also allow for the initial condition y0 to be unknown.¹

We refer to y = y(t; p) in this context as the forward solution. It is a time-dependent vector of length N and depends on p indirectly through the discretized operator f and the discretized source term q. Since y arises from a spatially discretized PDE, N can be very large in practice. The length of the parameter vector, Np, may be similarly large, even if y0 is known.

Given a current estimate of p, the solution to (1.1) is approximated using some time-stepping method, leading to a fully discretized version of (1.1). The results of this simulation are then compared to the observations of the actual process to further improve on the estimate of p. (A toy example of the semi-discrete form (1.1) is sketched below.)
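To make the generic form (1.1) concrete, the following sketch builds f(y, t; m) and q(t; s) for a simple 1D heat equation u_t = (m(x) u_x)_x + q(x, t) with homogeneous Dirichlet boundaries, discretized by second-order centered differences. It is only an illustrative example in Python/NumPy (the thesis' own implementation is in MATLAB), and the grid size, the Gaussian source and all variable names are assumptions made here, not quantities taken from the thesis.

```python
import numpy as np

# Hypothetical MOL semi-discretization of u_t = d/dx( m(x) du/dx ) + q(x, t)
# on (0, 1) with homogeneous Dirichlet boundaries (illustrative only).
N = 100                          # number of interior grid points (assumed)
h = 1.0 / (N + 1)                # grid spacing
x = np.linspace(h, 1.0 - h, N)   # interior nodes

def f(y, t, m):
    """Discretized operator f(y, t; m); m holds the diffusivity at the N+1 cell interfaces."""
    u = np.concatenate(([0.0], y, [0.0]))   # attach the known boundary values
    flux = m * np.diff(u) / h               # m_{i+1/2} * (u_{i+1} - u_i) / h
    return np.diff(flux) / h                # discrete divergence of the flux

def q(t, s):
    """Discretized source q(t; s); s = (amplitude, centre, width) of a Gaussian pulse."""
    amp, x0, width = s
    return amp * np.exp(-((x - x0) ** 2) / (2.0 * width ** 2)) * np.exp(-t)

# A current guess of the parameter set p = [m, y0, s]:
m = np.ones(N + 1)                # model parameters (interface diffusivities)
s = np.array([1.0, 0.5, 0.05])    # source parameters
y0 = np.zeros(N)                  # initial condition

dydt = f(y0, 0.0, m) + q(0.0, s)  # right-hand side of (1.1) at t = 0
```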
This is often done by employing a large optimization problem constrained, among other requirements, by the discretized PDE. Gradient-based optimization procedures are commonly employed, but computing the required gradients is generally a non-trivial matter. In this context, for large-scale problems, the adjoint method is a well-known and efficient approach to computing the required gradients.

The goal of this study is to systematically apply the discrete adjoint method to the fully discretized version of (1.1). The discrete adjoint method for time-dependent problems depends strongly on the time-stepping method being used, and our motivation is its application to several high-order time-stepping methods. This is expected to be a significant practical contribution to model calibration problems where it is desirable to solve the forward model to high accuracy.

Our results are also relevant to the problem of sensitivity analysis, where the efficient computation of the action of the sensitivity matrix on a vector is required. Sensitivity analysis is an essential tool for parameter estimation, uncertainty quantification, optimization, optimal control, and the construction of reduced order models, among other applications.

¹ We note that model calibration is often referred to as parameter estimation if one is trying to recover just m or (sometimes) s. In some sources the two terms are used interchangeably, but we will try to use the term model calibration when recovering p, and parameter estimation when recovering only m.

Below we give an outline of the time-stepping methods we discuss in this thesis. This is followed by a brief overview of PDE-constrained optimization, after which we introduce the adjoint method. We conclude the introduction with the thesis contributions and related work, and then an outline of the rest of the thesis.

1.1 Time-Stepping Schemes

The main objective of this thesis is to implement the discrete adjoint method for the following classes of time-stepping methods, all of which lend themselves to high-order time integration:

• Standard linear multistep (LM) and Runge-Kutta (RK) methods are commonly used to solve a wide range of ODE systems. RK methods are multistage methods that use collocation points between two time levels via internal stages, whereas LM methods save a finite time history of the solution in order to extrapolate to the next step.

• Implicit-explicit (IMEX) methods can be used in cases where f can be split into a component that should be evaluated explicitly and another component that would be better served by an implicit treatment. IMEX methods can be divided into linear multistep (IMEX LM) and Runge-Kutta (IMEX RK) approaches.

• Staggered methods are applicable when y can be split into two sets, say u and v, where the evolution of u depends solely on the current value of v, and vice versa. There are again two approaches: linear multistep (StagLM) and Runge-Kutta (StagRK).

• Lastly, exponential time-differencing methods are applicable to problems that can be written in semilinear form. Linear multistep and Runge-Kutta approaches are again possible, but in this case we focus solely on the Runge-Kutta (ETDRK) approach.

Applying a time-stepping method to (1.1) fully discretizes the system by breaking the time interval [0, T] up into a finite set of time-steps; a minimal time-stepping loop of this kind is sketched below.
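As a concrete instance of such a fully discretized system, the sketch below advances the semi-discrete system (1.1) with the classical explicit fourth-order Runge-Kutta method over K uniform steps and stores the approximate solution at every time level. This is an illustrative Python/NumPy sketch (not the thesis' MATLAB code), and it reuses the hypothetical f, q, m, s and y0 from the previous example.

```python
import numpy as np

def rk4_step(y, t, tau, f, q, m, s):
    """One step of the classical explicit RK4 method for y' = f(y, t; m) + q(t; s)."""
    rhs = lambda yy, tt: f(yy, tt, m) + q(tt, s)
    k1 = rhs(y, t)
    k2 = rhs(y + 0.5 * tau * k1, t + 0.5 * tau)
    k3 = rhs(y + 0.5 * tau * k2, t + 0.5 * tau)
    k4 = rhs(y + tau * k3, t + tau)
    return y + tau / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def time_stepping(f, q, m, s, y0, T, K):
    """Break [0, T] into K uniform steps; return a (K+1) x N array whose
    k-th row approximates y(t_k).  This array is the fully discretized
    forward solution that the discrete adjoint method operates on."""
    tau = T / K
    Y = np.zeros((K + 1, y0.size))
    Y[0] = y0
    for k in range(K):
        Y[k + 1] = rk4_step(Y[k], k * tau, tau, f, q, m, s)
    return Y

# Usage with the hypothetical heat-equation setup from the previous sketch:
# Y = time_stepping(f, q, m, s, y0, T=1.0, K=200)
```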
To utilize the discrete adjoint method, this fully discretized system is abstractly represented in the form of the time-stepping equation

$$t(y(p); m, y_0) = S\, q(s), \tag{1.2}$$

where the time-stepping vector t and the source weighting operator S depend on the time-stepping method being used. Here y represents the fully discretized forward solution, i.e. a vector containing the numerically approximated solutions at all of the time levels, and q represents the fully discretized source term.

1.2 PDE-Constrained Optimization

The model calibration problem is to recover a feasible set of parameters p that approximately solves a PDE-constrained optimization problem of the form

$$p^\star = \operatorname*{arg\,min}_{p} \; \Omega(p) \quad \text{s.t. (1.2) holds}. \tag{1.3}$$

The objective function

$$\Omega(p) = \Omega(d, d_{\mathrm{obs}}; p, p_{\mathrm{ref}}) = M(d, d_{\mathrm{obs}}) + \beta R(p, p_{\mathrm{ref}})$$

has two components. The regularization function R(p, p_ref) penalizes straying too far away from the prior knowledge we have of the true parameter values (which is incorporated in p_ref). Common examples are Tikhonov-type regularization [35, 123], including least squares and total variation (TV) [127]. The data misfit function M(d, d_obs) in some way quantifies the difference between d and d_obs, with d_obs a given set of observations of the true solution and d = d(p) the observation of the current simulated solution, depending on p implicitly through the forward solution.

The most popular choice for M, corresponding to the assumption that the noise in the data is simple and white, is the least-squares function

$$M = \tfrac{1}{2}\, \| d - d_{\mathrm{obs}} \|_2^2.$$

However, we will not restrict our discussion to any particular misfit function. The relative importance of the data misfit and the prior is adjusted using the regularization parameter β.

There are several classes of optimization procedures that can be used to solve (1.3); see for instance [35, 123, 127]. Often it is possible to reformulate (1.3) as an unconstrained optimization problem (this is the reduced space approach for PDE-constrained optimization), which is then minimized using the gradient of Ω with respect to p [31, 86],

$$\nabla_p \Omega = \nabla_p M + \beta \nabla_p R.$$

Popular gradient-based minimization procedures include steepest descent, nonlinear conjugate gradients and BFGS. Newton's method is a gradient-based procedure that requires the availability of the Hessian, and first-order approximations of the Hessian lead to Gauss-Newton and Levenberg-Marquardt methods.

The derivatives of the regularization function are known and are independent of the time-stepping scheme, so in this thesis we focus on M. The adjoint method is used to find expressions for both the gradient and the action of the Hessian of M.

1.3 The Adjoint Method

The sensitivity matrix J = ∂d/∂p stores the first-order derivatives of the predicted data with respect to the model parameters and can be calculated explicitly in small-scale applications. However, in cases such as those considered here it is far more feasible to compute the action of the sensitivity matrix on a vector using the adjoint method. The adjoint method was originally developed in the optimal control community; Chavent [19] introduced it to the theory of inverse problems to efficiently compute the gradient of a function. (A naive, finite-difference counterpart of this sensitivity computation is sketched below for contrast.)
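The following sketch shows the reduced-space objective of (1.3) for the least-squares misfit, evaluated by running a forward solver, together with a finite-difference approximation of the action of J on a vector. It is illustrative Python only, with an assumed observation operator and Tikhonov regularizer; the finite-difference version costs one extra forward solve per direction, which is exactly the expense the adjoint method is designed to avoid.

```python
import numpy as np

def simulate(p, forward, observe):
    """Forward map p -> d: run the time-stepping method, then sample the data."""
    return observe(forward(p))

def objective(p, d_obs, p_ref, beta, forward, observe):
    """Reduced-space objective: least-squares misfit plus Tikhonov regularization."""
    d = simulate(p, forward, observe)
    misfit = 0.5 * np.sum((d - d_obs) ** 2)   # M(d, d_obs)
    reg = 0.5 * np.sum((p - p_ref) ** 2)      # R(p, p_ref)
    return misfit + beta * reg

def sensitivity_action_fd(p, v, forward, observe, eps=1e-6):
    """Naive J v ~ (d(p + eps*v) - d(p)) / eps: one extra forward solve per direction v."""
    d0 = simulate(p, forward, observe)
    d1 = simulate(p + eps * v, forward, observe)
    return (d1 - d0) / eps
```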
See [90] for a review of the method applied to geophysical problems. As we will discuss in Chapter 3, the adjoint method requires the solution of an adjoint problem, which has to be integrated backward in time and requires the availability of the forward numerical solution.

There are two frameworks one can adopt when computing derivatives of the misfit function: discretize-then-optimize (DO) or optimize-then-discretize (OD). In OD one forms the adjoint differential problem, for either the given PDE or its semi-discretized form (1.1), which is subsequently discretized, thus decoupling the discretization process of the problem from that of its adjoint. A disadvantage of this approach is that the gradients of the discretized problem are not the (exact) gradients of any function, because a discretization error gets introduced when moving from the continuous setting [49]. However, this approach offers flexibility in its implementation, since the numerical method and the sequence of step sizes used to solve the adjoint problem may differ from those used to solve the forward problem, and may be tuned separately to satisfy the accuracy needs of the reverse integration. This comes at the cost of having to interpolate the forward solution to the time levels required in the integration of the adjoint problem.

We therefore prefer the DO approach, where one applies the adjoint method to the fully discretized form (1.2). This provides the exact gradient (within roundoff error) of the numerical data misfit function, which can be important in numerical optimization problems. On the other hand, there is no flexibility to separately tune the adjoint integration, since the sequence of step sizes is determined by the forward numerical method. The advantage of this, of course, is that all the variables needed to form the discrete adjoint solution procedure are computed during the forward solution, and no additional interpolations are necessary. Needless to say, the time-stepping method is of central importance, which is the motivation for the work done in this thesis.

An alternative approach for computing the gradient that falls inside the DO framework is automatic differentiation (AD) [48, 85], where the exact derivatives of M with respect to p, up to rounding error, are computed automatically by following the sequence of arithmetic commands used in the computation of M and successively applying basic differentiation rules, particularly the chain rule, to determine the dependence of M on p.

AD is a great tool for many purposes, in particular because it can be easy to use if one is already familiar with an AD implementation, and it allows one to focus on the results rather than the differentiation procedure. However, it can also be difficult to use if the practitioner is not already familiar with it, especially if sophisticated features are required, and the generated output may be hard to follow, leading to a loss of understanding of the mathematical structure.

For large-scale problems, where efficiency and memory allocation are essential, the adjoint method approach should be preferred, since hard-coded and optimizable computations can lead to a significant performance increase. (A toy discretize-then-optimize gradient computation is sketched below.)
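To illustrate the discretize-then-optimize idea on the simplest possible time-stepping method, the sketch below applies the discrete adjoint method to forward Euler for y' = f(y; m) with a least-squares misfit on the final state: the adjoint recursion runs backward in time, reuses the stored forward solution, and returns the gradient of the discrete misfit exactly (up to roundoff). The model problem y' = -m*y and all names are assumptions made for this illustration in Python; the thesis itself develops the analogous expressions for high-order LM, RK, IMEX, staggered and ETD methods.

```python
import numpy as np

def forward_euler(y0, m, tau, K, f):
    """Forward solve of y' = f(y; m); all K+1 time levels are stored because
    the backward (adjoint) sweep needs them."""
    Y = np.zeros((K + 1, y0.size))
    Y[0] = y0
    for k in range(K):
        Y[k + 1] = Y[k] + tau * f(Y[k], m)
    return Y

def discrete_adjoint_gradient(Y, m, tau, K, dfdy, dfdm, d_obs):
    """Discrete (discretize-then-optimize) adjoint gradient of
    M = 0.5 * ||y_K - d_obs||^2 with respect to m, for the forward Euler scheme."""
    lam = -(Y[K] - d_obs)                 # adjoint condition at the final time level
    grad = np.zeros(m.size)
    for k in range(K, 0, -1):             # the adjoint problem is integrated backward
        grad += -tau * dfdm(Y[k - 1], m).T @ lam
        lam = lam + tau * dfdy(Y[k - 1], m).T @ lam
    return grad

# Toy check on y' = -m*y (scalar state and parameter; all values assumed):
f    = lambda y, m: -m[0] * y
dfdy = lambda y, m: np.array([[-m[0]]])   # df/dy
dfdm = lambda y, m: np.array([[-y[0]]])   # df/dm
y0, m, tau, K, d_obs = np.array([1.0]), np.array([0.7]), 0.01, 100, np.array([0.4])
Y = forward_euler(y0, m, tau, K, f)
g = discrete_adjoint_gradient(Y, m, tau, K, dfdy, dfdm, d_obs)
# g agrees with a finite-difference derivative of the discrete misfit.
```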
For instance, having an explicit algorithm for the adjoint time-stepping method, as we develop here, allows one to exploit opportunities to increase the computational efficiency, such as parallelization and the precomputation of repeated quantities.

Furthermore, it would require a very sophisticated AD code to accomplish some of the tasks considered here, since most AD codes have difficulties in handling black-box routines, linear solvers, etc. without supervision by the user. In particular, the derivatives of the ϕ-functions that arise in ETD schemes, further elaborated upon in Sections 2.4.1 and 2.4.2, would present a significant challenge. Finally, when using AD the derivatives need to be computed for each PDE and each time-stepping method used to solve it, possibly for different platforms and time-stepping libraries. The discrete adjoint method is much more flexible in this regard, since the adjoint time-stepping methods and related expressions need to be derived just once for each family of time-stepping methods.

For these reasons we do not consider AD any further, but see [47] for a discussion of AD versus the continuous adjoint method, and see [95] for a joint adjoint-AD implementation to compute Hessian matrices. [99] discusses AD for the sensitivity analysis of ODE-constrained problems.

1.4 Thesis Contributions and Related Work

The main objective of this thesis is the systematic implementation of the discrete adjoint method for a variety of popular time-stepping methods used for high-order time integration. There are numerous publications that discuss the methodology behind adjoint-based gradient computation, but these are usually for time-independent problems and the details of the implementation are frequently omitted. In contrast, our work is done with the numerical implementation of the resulting expressions in mind. Therefore we present several algorithms throughout the text to assist the interested reader in the implementation of our results.

For each time-stepping method, the adjoint method requires the derivatives of (1.2) with respect to p and y, as well as the transposes of these derivative matrices. One of the main contributions of this thesis is to find the expressions for these terms. This is presented in a way that keeps the various time-stepping methods distinct from each other, so that researchers interested in one particular scheme can easily find the relevant expressions without having to go through extra layers of discussion that are not relevant to the desired scheme.

Model calibration involving the simultaneous recovery of three distinct sets of parameters (model parameters m, source parameters s and the initial condition y0) appears to be a little-explored field and will be useful in applications where some or all of these parameters are poorly known. Our main contribution here is finding an expression for the action of the Hessian of M on some arbitrary vector, again using the adjoint method, where these distinct sets of parameters lead to cross-terms that complicate the derivation.

For most of the thesis we will allow M to be some generic data misfit function. As we will see, the derivatives of M with respect to p require the derivatives of M with respect to the simulated data d (i.e. observations of the forward solution); the least-squares case is recorded in the short sketch below.
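For the least-squares misfit these data-side derivatives are immediate. The optional weighting matrix W and all names in the following sketch are illustrative assumptions (Python, not the thesis' MATLAB code); the more specialized misfits discussed in Chapter 7 require genuinely more work.

```python
import numpy as np

def ls_misfit(d, d_obs, W=None):
    """Least-squares misfit M = 0.5 * ||W (d - d_obs)||^2 together with its first
    and second derivatives with respect to the simulated data d
    (identity weighting when W is None)."""
    r = d - d_obs if W is None else W @ (d - d_obs)
    M = 0.5 * float(r @ r)
    grad_d = r if W is None else W.T @ r                                       # dM/dd
    hess_d_action = (lambda v: v) if W is None else (lambda v: W.T @ (W @ v))  # (d2M/dd2) v
    return M, grad_d, hess_d_action
```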
In most applications the ℓ2 data misfit discussed above is sufficient, and in this case finding the derivatives of M with respect to d is trivial, but there are some applications, in particular full waveform inversion (FWI) [36] or seismic tomography, where other, more specialized, misfit functions might be more sensible. While the derivatives of some of these misfit functions in the continuous setting are known, finding both the first and second derivatives in a discrete setting is novel as far as we know. In particular, finding the derivatives of the interferometric misfit is new to the best of our knowledge.

We apply some of our results to the problem of parameter estimation in pattern formation. The model parameters are allowed to vary in space, leading to different patterns forming in different regions. The number of model parameters in this case is large, and the application of adjoint-based optimization procedures to this particular problem appears to be novel.

Applying the discrete adjoint method to certain time-stepping methods has of course been done before, particularly in the area of optimal control. See for instance [4, 58, 83] for applications to the Crank-Nicolson method. Adjoint methods for RK methods have already been tackled in several publications. In [32] a second-order RK method for optimal control problems with ODE constraints is developed, and Hager investigated the order conditions for adjoint RK methods up to order four in [52], but the discussion is somewhat inaccessible to practitioners working outside of the field of optimal control. [100] tackles the consistency of the methods found in [52], and see [3] for a discussion of the adjoint RK method in combination with adaptive spatial grids. The adjoint method is applied to Rosenbrock time-stepping methods (which are closely related to implicit RK methods) in [28]. The discrete adjoint method was used by Zhang [131] for a wide range of RK methods to perform sensitivity analysis with respect to model parameters and the initial condition.

Properties of adjoint RK methods have also been investigated in the context of optimal control in, for example, [12, 60, 69]. Sanz-Serna has recently [107] discussed the adjoint method in conjunction with symplectic RK methods in this context. The standard approach to developing adjoint RK schemes in the optimal control community appears to be using AD, as described in [128].

The discrete adjoint method for LM methods has been addressed as well; see [101] for the derivation and some analysis of adjoint LM methods. In this case the adjoint methods are obtained using a variational calculus approach, which differs from the approach taken in this thesis, although it leads to the same expressions in the case of constant step sizes. Note that variable step sizes are allowed in [101] (in contrast to the work in this thesis, where only constant step sizes are considered for LM methods) and it is shown that having variable step sizes may lead to adjoint LM methods that are inconsistent. The consistency and stability properties of both adjoint LM and RK methods are discussed by Sandu in [102].
See also [2] for the adjoint method in conjunction with variable time-step integrators.

As will be shown in Chapter 3, the adjoint method can also be used to obtain expressions for second-order derivatives (often called second-order adjoints in the literature), and some work on this has already been done for LM and RK methods; see in particular [22] and [106].

Aside from optimal control problems, the work listed above has been applied to data assimilation problems [23–25], PDE-constrained optimization problems [22] and a posteriori error estimates [92]. See also the references in these papers, and the examples in the papers listed in the paragraphs above, for other applications.

In this thesis we re-derive the adjoint LM and RK methods. While the expressions we obtain do not differ significantly from those in the references above, our approach is geared towards the practical implementation of these methods and we give a significant number of additional details that will be of relevance when computing the gradient and Hessian of M. Re-deriving the adjoint methods of regular LM and RK methods also helps the reader in understanding the derivation of the more complicated IMEX, staggered and exponential time-stepping methods that are also tackled.

The work presented for these latter methods appears to be novel, although we should mention that the leapfrog method is a lower-order staggered time-stepping method that has been used in numerous implementations for parameter estimation in time-domain seismic imaging, electromagnetic inversion, and elsewhere (see also [115] for a discussion of the adjoint leapfrog method in meteorology). Our discussion of staggered time-stepping methods is, however, more general and geared towards higher-order methods.

There are only a few software packages for the solution of ODEs that have the capability to compute sensitivities for large-scale problems. These include ODESSA [75] and CVODES within SUNDIALS [110] from Lawrence Livermore National Laboratory, which are geared towards sensitivity analysis for problems solved using backward differentiation formulas (BDFs). There are also several packages available for various RK methods from the group of Adrian Sandu, such as KPP [30, 103, 105], MatlODE [104], DENSERKS [1], and, more recently, FATODE [132–134], which can handle many different types of regular RK methods, including implicit and Rosenbrock methods. These packages have been used in 3D PDE-constrained optimization solvers in atmospheric data assimilation, in particular NASA's GEOS-Chem [113, 114] and the Environmental Protection Agency's CMAQ [45, 57].

None of the packages above can handle general IMEX, staggered or exponential time integration methods. The major contribution of this thesis is that it provides the background needed for the implementation of the sensitivity computations, as well as gradient and Hessian computations, not only for the time-stepping methods implemented by the packages listed above, but for many more.

The strength of the approach taken in this study is that it is not limited to a specific time-stepping method, since we find general formulations for each class of time-stepping method. This is in contrast to AD, where the adjoint expressions would have to be found for each specific time-stepping method.
While our motivation is large-scale model calibration, our results can be applied to any application where the goal is to solve a problem in the form of (1.3) using gradient-based minimization procedures.

To avoid potential confusion, we mention at this point that, in the computational fluid dynamics community, especially in regard to shape optimization, the term discrete adjoint method appears to be used when working with steady-state (i.e. time-independent) problems or, more relevant to our discussion, semidiscrete problems of the form (1.1). A time-stepping method is then applied to this semidiscrete adjoint formulation. In our discussion we consider the adjoint method to be discrete only if it is applied to the fully discretized problem (1.2).

1.5 Thesis Overview and Outline

This thesis has nine chapters. After this introductory chapter, we review the time-stepping methods of interest in Chapter 2 and show how each of these can be written in the form of (1.2).

Expressions for the sensitivity matrix, as well as the gradient and the action of the Hessian of M, are derived in Chapter 3 using the discrete adjoint method. The derivations of the Hessian are included in Appendix B.

The adjoint method requires the derivatives of t with respect to p and y. Finding expressions for these derivatives is an important contribution of this thesis, but due to the very technical nature of these derivations we have omitted them from the main text and instead included them in Appendix A. For ease of reference, Chapter 4 lists the page and equation numbers of the final expressions of the derivatives.

The computation of the action of the sensitivity matrix requires the solution of the linearized forward problem (∂t/∂y) ξ = q, which is the focus of Chapter 5. The adjoint method requires the adjoint solution λ, which is the solution to the adjoint problem (∂t/∂y)⊤ λ = θ, and this is discussed in detail in Chapter 6.

Chapter 7 presents the derivatives of several misfit functions that might be of use especially in the context of seismic tomography. Some of our results concerning exponential time-differencing methods are used in Chapter 8 to tackle a parameter estimation problem involving the Swift-Hohenberg model, a PDE problem with solutions exhibiting the interesting phenomenon of pattern formation. Some final thoughts and avenues for future work are provided in Chapter 9.

Chapter 2: Review of Time-Stepping Methods

In this chapter we review the time-stepping methods considered in this thesis, where each class of time-stepping methods can be approached using either an LM-type or an RK-type approach.

The time interval 0 ≤ t ≤ T is discretized by 0 = t_0 < t_1 < · · · < t_K = T, with time-step sizes τ_k = t_k − t_{k−1}. When using an LM-type scheme we restrict ourselves to uniform time-steps τ = τ_k, whereas for RK-type schemes we allow variable time-steps (except for staggered schemes). We let y_k ≈ y(t_k) be the numerical approximation of the true solution at the kth time-step, and y represents the fully-discretized solution containing the solutions at all of the time levels.
In the case of RK-type methodswe will additionally include the internal stages computed at each time level in y forreasons that will become apparent in later chapters.As discussed in the introduction, it is also shown how to abstractly represent eachtime-stepping method by the time-stepping equation (1.2)t(y;m,y0) = Sq(s).Representing a time-stepping scheme in this manner is a crucial tool when used inconjunction with the discrete adjoint method, as we will see in Chapter 3, but weemphasize that t is purely conceptual and is never formed in practice.For notational simplicity we suppress the dependence of all quantities on the modeland source parameters m and s in this chapter. It is important for later chapters tobe aware of the fact that y depends on m indirectly through t and on s indirectlythrough q.142.1. Regular Time-Stepping MethodsTo aid the readability in what follows, we introduce the following KN×KN blocktemplate matrix that will be required by LM-type methods in the sections below:× =×(s) ⊗ IN , (2.1a)where ×(s) is the placeholder of a K ×K matrix of the form×(s) =×(1:s)×s−1 · · · ×1 ×0×s ×s−1 · · · ×1 ×0. . . . . . . . . . . .×s ×s−1 · · · ×1 ×0. . . . . . . . . . . .×s ×s−1 · · · ×1 ×0. (2.1b)The initialization block ×(1:s) is needed to handle the first few steps of the LMmethod. We review several options for this in Section 2.1.1 and leave the structureof×(1:s) undefined for now.2.1 Regular Time-Stepping MethodsBy "regular" time-stepping methods we mean the usual LM and RK methods that areused to compute the solution of a large number of ODEs in practice. These methodsare described in numerous works on the numerical solution of ODEs, standard refer-ences include Gear [39]; Lambert [73, 74]; Butcher [15]; Hairer, Nørsett and Wanner[54]; Hairer and Wanner [56]; and Ascher and Petzold [6]. See also [5, 16, 77] for morerecent treatments.152.1. Regular Time-Stepping Methods2.1.1 Linear Multistep MethodsGiven a linear ODE system in the form (1.1), the general s-step method, with s ≤k ≤ K, is written ass∑j=0α(s)j yk−j = τs∑j=0β(s)j fk−j + τs∑j=0β(s)j q(tk−j), (2.2)with fk−j = f (yk−j, tk−j). The weights α(s)j and β(s)j determine the s-step method,with α(s)0 = 1 (always). There are two main classes of multistep methods in activeuse:• Backward differentiation formulas (BDFs), for which β(s)1 = . . . = β(s)s = 0, and• Adams-type methods, for which α1 = −1 and α2 = . . . = αs = 0. Further, Adams-Bashforth methods are explicit (β(s)0 = 0) and Adams-Moulton methods are implicit(β(s)0 6= 0).By Dahlquist’s first barrier theorem [29, 54], an explicit s-step multistep methodcannot attain an order of accuracy greater than s, and an implicit method cannotattain an order greater than s+ 1 if s is odd and greater than s+ 2 if s is even.An s-step method of course needs to have the solution at s previous time stepsavailable, which is not the case at the start of the integration when k < s. There arevarious options to handle this:• Specify s−1 additional initial conditions, for instance in addition to y0 also specifyy−1 if s = 2, y−1 and y−2 if s = 3, etc. This is the most sensible option if y0 = 0.• If the same order of accuracy needs to be maintained, use a Runge-Kutta methodof the same order for first s−1 iterations, keeping the same time step τ if possible.• Use lower-order LM methods to gradually build up a solution for the first s − 1time steps. One can use methods with increasingly higher orders of accuracy, for162.1. 
Regular Time-Stepping Methodsinstance employ a first-order method to integrate up to τ , then a second-ordermethod to integrate up to 2τ (using the previously computed solution at τ), etc.If necessary, the lower-order schemes can have time steps that are smaller than τ ,thereby ensuring that the solution at kτ is sufficiently accurate.Using the block template matrix (2.1), A(s) and A are simply ×(s), ×(1:s) and× when we set ×j = α(s)j , and likewise setting ×j = β(s)j gives B(s) and B. Theinitialization blocks A(1:s) andB(1:s) inAs andBs allow for one of the three approachesmentioned above to handle the initialization. The first approach just requires thatthe blocks have the same diagonal and subdiagonals as ×(s) and the Runge-Kuttaapproach can be implemented using the discussion in the next section. We focuson the third approach here, but for simplicity assume that each of the lower-ordermethods uses the same time step τ as the higher-order method. In this case theinitialization blocks are×(1:s) =×(1)0×(2)1 ×(2)0.... . . . . .×(s−1)s−2 · · · ×(s−1)1 ×(s−1)0 , (2.3)with the superscripts (σ) indicating the highest order of accuracy that can be attainedat the time step k for 0 < k < s, with α(σ)j and β(σ)j being the corresponding weightsof the σ-step method.We also define the KN ×N block matricesα⊤ =[α(1)1 α(2)2 · · · α(s)s 01×(K−s)]⊗ INβ⊤ =[β(1)1 β(2)2 · · · β(s)s 01×(K−s)]⊗ IN .(2.4)The blocks A(1:s) and B(1:s), block matrices α and β, solution y and source termq can be modified in an obvious way if lower-order methods at the start of the172.1. Regular Time-Stepping Methodsintegration use time steps that are smaller than τ . We will not consider this moregeneral formulation here due to space constraints.The time-stepping system (2.2) can then be compactly represented byAy = τBf (y) + τ β f0(y0)−αy0 + τ[β B]q, (2.5)withy =[y⊤1 · · · y⊤K]⊤, q =[q⊤0 q⊤1 · · · q⊤K]⊤,f (y) =[f⊤1 · · · f⊤K]⊤.(2.6)(2.5) is then obtained by lettingt(y,y0) = Ay− τBf (y) +αy0 − τ β f0(y0) (2.7)and S = τ[β B]. Notice t(y,y0) is a vector of length KN and the kth time-stepis given by the kth subvector of length N :tk(y,y0) =s∑j=0α(s)j yk−j − τs∑j=0β(s)j fk−j. (2.8)ExamplesThe s-step Adams-Bashforth methods with s = 1, 2, 3 ares = 1 : yk = yk−1 + τ fk−1 + qks = 2 : yk = yk−1 + τ(32fk−1 − 12fk−2)+ qks = 3 : yk = yk−1 + τ(2312fk−1 − 43fk−2 +512fk−3)+ qk,182.1. Regular Time-Stepping Methodswhereqk = τ q(tk−1)qk =τ2(3q(tk−1)− q(tk−2))qk =τ12(23q(tk−1)− 16q(tk−2) + 5q(tk−3)) ,so if using a third order Adams-Bashforth method we haveA =1−1 1−1 1−1 1. . .−1 1⊗ IN , α =−1000...0⊗ IN ,B =0320−4323120512−4323120. . .512−4323120⊗ IN , β =1−125120...0⊗ IN ,with A and B being matrices of size KN ×KN .The s-step BDFs of orders s = 1, 2, 3 ares = 1 : yk − yk−1 = τ fk + τ q(tk)s = 2 : yk − 43yk−1 +13yk−2 = τ23fk + τ23q(tk)s = 3 : yk − 1811yk−1 +911yk−2 − 211yk−3 = τ611fk + τ611q(tk),192.1. Regular Time-Stepping MethodsTherefore a third-order BDF can be represented in matrix form usingA =1−431911−18111− 211911−18111. . .− 211911−18111⊗ IN , α =−113− 2110...0⊗ INB =123611611. . 
.611⊗ IN β = 0K×1 ⊗ IN .Again, A and B are matrices of size KN ×KN .2.1.2 Runge-Kutta MethodsThe family of s-stage Runge-Kutta methods is given byyk = yk−1 + τks∑σ=1bσYk,σ, (2.9a)where the internal stages are, for σ = 1, · · · , s,Yk,σ = f(yk−1 + τks∑i=1aσiYk,i, tk−1 + cστk)+ q(tk−1 + cστk). (2.9b)The procedure starts from a known initial value y0. Here a particular method isdetermined by setting the number of stages and the corresponding coefficients aσi,202.1. Regular Time-Stepping Methodsweights bσ and nodes cσ, which are commonly summarized in a Butcher tableau:c1 a11 a12 · · · a1sc2 a21 a22 · · · a2s.......... . ....cs as1 as2 · · · assb1 b2 · · · bs=cs Asb⊤s.The method is said to be explicit if aσi = 0 for σ ≤ i and diagonally-implicit if aσi = 0for σ < i. It is implicit otherwise.We mention that for stiff problems, RK methods are susceptible to the prob-lem of order reduction, where the order of accuracy of the method can be reducedsignificantly.For ease of notation, define yk,σ:= yk−1 + τk∑si=1 aσiYk,i. The method given by(2.9a) and (2.9b) can be written in matrix form asINIsN−IN B⊤ INyk−1Ykyk−0N×1Fk0N×1 =yk−1Qk0N×1 , (2.10)with B = −τbs ⊗ IN ,Yk =Yk,1...Yk,s , Fk =Fk,1...Fk,s and Qk =Qk,1...Qk,s , (2.11)whereFk,σ = Fk,σ (y,y0) = f(yk,σ, tk−1 + cστk)Qk,σ = q(yk,σ, tk−1 + cστk).212.1. Regular Time-Stepping MethodsThe system (2.10) constitutes a single time step in the solution procedure, so theRunge-Kutta procedure as a whole can be represented by (1.2) by letting S = I(s+1)KNandt(y,y0) = Ty − f , (2.12)withy =[Y⊤1 y⊤1 · · · Y⊤K y⊤K]⊤(2.13)f =[F⊤1 y0 F⊤2 01×N · · · F⊤K 01×N]⊤(2.14)q =[Q⊤1 01×N · · · Q⊤K 01×N]⊤(2.15)T =IsNB⊤ INIsN−IN B⊤ IN. . . . . . . . .IsN−IN B⊤ IN. (2.16)Here y,q, f ∈ R(s+1)KN and T ∈ R(s+1)KN×(s+1)KN .Notice that we have included the internal stages (2.9b) in y. We do this becausethey are required when computing derivatives, specifically when we compute thederivatives of t, which leads to a significant increase in storage overhead for high-order Runge-Kutta schemes. This can be mitigated by the use of checkpointing,where the solution is stored only for some time steps and the solution at other timesteps is then recomputed using these stored solutions as needed, although note thatthis effectively means that the forward solution has to be calculated twice for everyadjoint computation.222.1. Regular Time-Stepping MethodsExamplesThe most popular fourth-order Runge-Kutta (RK4) method is given byAs =01201201 0, bs =16131316and cs =012121,so that at some time step k we haveB = −τkb4 ⊗ IN = −τk16131316⊗ IN , Qk =q(tk−1)q(tk−1 + 12τk)q(tk−1 + 12τk)q(tk) ,Fk =Fk,1Fk,2Fk,3Fk,4 =f (yk−1, tk−1)f(yk−1 +τk2Yk,1, tk−1 + 12τk)f(yk−1 +τk2Yk,2, tk−1 + 12τk)f (yk−1 + τkYk,3, tk) .A fourth order 2-stage implicit RK method is given byAs = 14 124(6− 4√3)124(6 + 4√3) 14 , bs = 1212 and cs =16(3−√3)16(3 +√3) ,and hence at some time step k we haveB = −τkb4 ⊗ IN = −τk 1212⊗ IN , Qk = 16q(tk−1 + (3−√3)τk)q(tk−1 + (3 +√3)τk) ,232.2. 
Implicit-Explicit Time-Stepping MethodsFk =Fk,1Fk,2 =f (yk−1 + τk4 Yk,1 + 124(6− 4√3)Yk,2, tk−1 + (3−√3)τk)f(yk−1 + 124(6 + 4√3)Yk,1 +τk4Yk,2, tk−1 + (3 +√3)τk) .2.2 Implicit-Explicit Time-Stepping MethodsFor many PDEs there are often natural splittings of the right hand sides of thedifferential systems into two parts, one of which is a non-stiff (or mildly stiff) termthat can be handled by explicit time integration, and the other part is stiff and requiresimplicit time integration. This leads to an ODE system of the form (cf. (1.1))∂y∂ t= f(y, t) + q(t) = fE(y, t) + fI(y, t) + q(t)y(0) = y0,(2.17)where fE represents the non-stiff process, for example convection, and fI representsthe stiff process, for instance chemical reaction or diffusion. Implicit-explicit (IMEX)time-stepping methods are a popular class of time-stepping methods that can be usedto solve ODE systems of this type. There are main variants of IMEX methods, linearmultistep (IMEX LM) and Runge-Kutta (IMEX RK) methods, which we describebelow.We treat the source term q separately because it depends on the model parameterss and not the model parameters m, as fE and fI do. For both IMEX LM andIMEX RK methods we have taken it to be part of the explicit term fE and handle itaccordingly.IMEX methods were proposed in the late 1970’s [27, 124] and have enjoyed muchattention since then, especially for solving convection-diffusion-reaction problems [70].Applications include mathematical biology [89, 96], weather forecasting [130], optimalcontrol [59] and options pricing [98].There is a large amount of work that has been published on IMEX methods overthe last few decades, here we follow the approach taken in the pioneering work in242.2. Implicit-Explicit Time-Stepping Methods[7, 8].2.2.1 IMEX Linear Multistep MethodsThe s-step IMEX linear multistep (IMEX LM) scheme for (2.17), with s > 1, is givenbys∑j=0α(s)j yk−j = τs∑j=1β(s)j fEk−j + τs∑j=0γ(s)j fIk−j + τs∑j=1β(s)j qk−j, (2.18)with α(s)0 = 1.The order of accuracy of IMEX LM methods is equal to the number of stepss. IMEX LM methods are designed by either choosing an implicit method (usuallyBDF) and combining it with an explicit scheme of the same order, or by choosingan explicit scheme such as an Adams-Bashforth or total-variation bounded (TVB)method, then picking an implicit scheme with the same order of accuracy and goodstability or damping properties.We now let A and B be defined using (2.1) and (2.3), as in Section 2.1.1, withβ(s)0 = 0. We obtain the matrices Γ(s), Γ(1:s) and Γ by letting ×j = γ(s)j in ×(s),×(1:s) and×. We also define the KN ×N block matricesα =[α(1)1 α(2)2 · · · α(s)s 01×(K−s)]⊤⊗ INβ⊤ =[β(1)1 β(2)2 · · · β(s)s 01×(K−s)]⊤⊗ INγ⊤ =[γ(1)1 γ(2)2 . . . γ(s)s 01×(K−s)]⊤⊗ IN .(2.19)Then (2.18) can be written asAy = τBfE(y) + τ ΓfI(y) + τ β fE0 (y0) + τ γ fI0 (y0)−αy0 + τ[β B]q, (2.20)252.2. Implicit-Explicit Time-Stepping Methodswithy =[y⊤1 · · · y⊤K]⊤, fE(y) =[fE1⊤ · · · fEK⊤]⊤q =[q⊤0 q⊤1 · · · q⊤K]⊤, fI(y) =[fI1⊤ · · · fIK⊤]⊤.(2.21)The method can then be represented by the time-stepping equation t(y;y0) = Sq,with S = τ[β B]and the time-stepping vectort(y,y0) = Ay− τBfE (y)− τ ΓfI (y)− τ β fE0 − τ γ fI0 +αy0 (2.22)t is a vector of length KN and the kth time level is given by the kth subvector oflength N :tk(y,y0) =s∑i=0α(s)i yk−i − τs∑i=1β(s)i fEk−i − τs∑i=0γ(s)i fIk−i. 
(2.23)ExamplesThe coefficients for the second-order IMEX BDF2 method areα(2)0:2 =[1 −4313]⊤, β(2)1:2 =[43−23]⊤and γ(2)0:2 =[230 0]⊤.The third-order IMEX BDF3 method is defined byα(3)0:3 =[1 −1811911− 211]⊤β(3)1:3 =[1811−1811611]⊤γ(3)0:3 =[6110 0 0]⊤and the third-order IMEX TVB3 method has the coefficientsα(3)0:3 =[1 −3909204813671024− 8732048]⊤β(3)1:3 =[1846312288−1271768823312288]⊤262.2. Implicit-Explicit Time-Stepping Methodsγ(3)0:3 =[10892048− 113912288− 3676144169912288]⊤.2.2.2 IMEX Runge-Kutta MethodsThe family of s-stage IMEX Runge-Kutta (IMEX RK) methods [7] to solve (2.17) isyk = yk−1 + τks∑i=1bEi YEk,i + τks−1∑i=1bIi YIk,i. (2.24a)Let tk−1,σ = tk−1 + cστk. The internal stages areYEk,1 = fE (yk−1, tk−1) + q(tk−1) (2.24b)and for σ = 1, · · · , s− 1, solveYIk,σ = fI(yk,σ, tk−1,σ), (2.24c)withyk,σ= yk−1 + τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i, (2.24d)and then evaluateYEk,σ+1 = fE(yk,σ, tk−1,σ)+ q (tk−1,σ) . (2.24e)The implicit schemes are generally chosen to be diagonally-implicit, so we restrict ourdiscussion to this case. IMEX RK methods can be represented by a pair of Butcher272.2. Implicit-Explicit Time-Stepping Methodstableaux, one for the explicit method and one for the implicit method:0 0 0c2 0 aI1,1 aE2,1 0c3 0 aI2,1 aI2,2 aE3,1 aE3,2 0.......... . ..... . .cs 0 aIs−1,1 aIs−1,2 · · · aIs−1,s−1 aEs,1 aEs,2 · · · aEs,s−1 00 bI1 bI2 · · · bIs−1 bE1 bE2 · · · bEs−1 bEs .(2.25)As for RK methods, for stiff problems IMEX RK methods are susceptible to theproblem of order reduction, where the order of accuracy of the method can be reducedsignificantly.LettingB = −τk[bE1 bI1 bE2 · · · bEs−1 bIs−1 bEs]⊤⊗ IN , (2.26)the single step in (2.24) can be written in matrix form asINI(2s+1)N−IN B⊤ INykYkyk−0N×1Fk (y)0N×1 =yk−1Qk0N×1 , (2.27)withYk =Yk,1...Yk,s , Fk (y) =Fk,1 (y)...Fk,s (y) and Qk =Qk,1...Qk,s , (2.28)where we have letYk,i =YEk,⌈ i2⌉ if i oddYIk, i2if i even(2.29)282.2. Implicit-Explicit Time-Stepping MethodsandFk,i =fE(yk,⌈ i2⌉−1, tk−1,⌈ i2 ⌉)if i oddfI(yk, i2, tk−1, i2+1)if i even,(2.30)with yk,σ= yk−1 + τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i.In order to write the IMEX RK scheme in the form of the time-stepping equation(1.2), we let the time-stepping vector bet(y) = Ty− f , (2.31)wherey =[Y⊤1 y⊤1 Y⊤2 y⊤2 · · · Y⊤K y⊤K]⊤(2.32)f =[F⊤1 y0 F⊤2 01×N · · · F⊤K 01×N]⊤(2.33)T =I(2s+1)NB⊤ INI(2s+1)N−IN B⊤ IN. . .. . .. . .I(2s+1)N−IN B⊤ IN. (2.34)T is a block lower-triangular matrix. We have explicitly included the internal stagesin y because they will be needed later on.The source term is given byq =[Q⊤1 01×N Q⊤2 01×N · · · Q⊤K 01×N]⊤(2.35)and S = I2K(s+1)N×2K(s+1)N .292.3. Staggered Time-Stepping MethodsExamplesThe ARS3 scheme found in [7] is a 3-stage, order 3 method that has the followingButcher tableaus:0 0 0γ 0 γ γ 01− γ 0 1− 2γ γ γ − 1 2(1− γ) 00 1/2 1/2 0 1/2 1/2,where γ = 3+√36. As mentioned above, reduction of order can be expected when theproblem is stiff.There are several other IMEX RK schemes of higher order, including an 6-stagescheme with order 4 derived in [70].2.3 Staggered Time-Stepping MethodsGhrist et al. 
[42, 43] introduced high-order staggered time-stepping methods for hy-perbolic systems in first-order form∂y∂ t= f (y(t), t,m) + q(t) ⇒∂u∂ t= fu (v(t), t) + qu(t)∂v∂ t= fv (u(t), t) + qv(t).(2.36)For ODE systems in this form it is known that staggered grids in space can in-crease the accuracy of finite difference and pseudospectral difference methods [37].A similar staggering in time of the two variables can be performed, leading to stag-gered linear multistep (StagLM) and staggered Runge-Kutta (StagRK) methods. Theanalysis given in [42, 43] shows that the staggered time-stepping methods can havea significantly reduced error constant and an increased imaginary stability boundarycompared to their non-staggered counterparts.302.3. Staggered Time-Stepping MethodsThe efficiency of these methods was investigated in [125, 126] for linear wave equa-tions, and the stability and convergence properties of StagRK schemes for semilinearwave equations was looked at in [84]. The stability of StagLM methods was revisitedin [41].To describe the staggered methods and how they can be represented by (1.2), wediscretize u(t) in time on the integer time levels, so that uk ≈ u(tk), with lengthNu, and v(t) is discretized on the half-integer time levels, i.e. vk+ 12≈ v(tk+ 12), withlength Nv. Clearly N = Nu + Nv. We take it as a given that these two variableswould also be staggered in space, but this is not important for the discussion thatfollows.We define the following quantities:yk = ukvk+ 12fk =fuk− 12fvk =fu (vk− 12 , tk− 12)fv (uk, tk)qk =quk− 12qvk =qu (tk− 12)qv (tk)(2.37)for k ≥ 0, with initial condition y0 =[u⊤0 v⊤12]⊤. We let f0 =[01×Nu fv0⊤]⊤andq0 =[01×Nu qv0⊤]⊤.312.3. Staggered Time-Stepping Methods2.3.1 Staggered Linear Multistep MethodsA staggered s-step multistep methods may be applied to (2.36), giving us, for k ≥ s,s∑j=0α(s)j uk−j = τs∑j=0β(s)j fuk− 12−j + τs∑j=0β(s)j quk− 12−js∑j=0α(s)j vk+1/2−j = τs∑j=0β(s)j fvk−j + τs∑j=0β(s)j qvk−j(2.38)Recall that we are using fixed time-steps τ for LM-type methods. StagLM methodsare classified into staggered backward differentiation formulas and staggered Adams-type methods, along the lines of the corresponding non-staggered methods. Due totime-staggering, even methods with β(σ)0 6= 0 will lead to explicit methods.As in Section 2.1.1, let the KN ×KN matrices A and B be defined using (2.1)and (2.3), and also define α and β as in that section.The system (2.38) can then be written asAy = τBf + τ β f0 + τ[β B]q−αy0, (2.39)withy =[y⊤1 · · · y⊤K]⊤, f =[f⊤1 · · · f⊤K]⊤q =[q⊤0 q⊤1 · · · q⊤K]⊤.(2.40)Then, setting S = τ[β B], we arrive at (1.2) by lettingt (y;y0) = Ay +αy0 − τBf − τ β f0. (2.41)ExamplesThe leapfrog method is a 1-step StagLM method of order 2, with α =[1 −1]⊤and β = 1. This can be regarded as either a staggered BDF or a staggered Adams-Bashforth scheme.322.3. Staggered Time-Stepping MethodsThe next staggered BDF schemes of interest are StagBDF3, withα =[1 −2123− 323123]⊤and β0 =2423,and StagBDF4, withα =[1 −1722− 922522− 122]⊤and β0 =1211.It was found in [41] that there are no staggered BDF schemes of higher order.Higher-order staggered Adams-Bashforth schemes include the third-order StagAB3scheme, withα =[1 −1]⊤and β =[2524− 112124]⊤,and the fourth-order StagAB4 scheme, withα =[1 −1]⊤and β =[1312− 52416− 124]⊤.7th-order and 8th-order schemes are available as well.2.3.2 Staggered Runge-Kutta MethodsStagRK schemes were derived in [42], where only explicit schemes with fixed time-steps were considered. 
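Before turning to the Runge-Kutta variants, the staggered update is easy to see in code. The following is a minimal sketch (Python for illustration only, not the thesis' MATLAB implementation) of the order-2 staggered leapfrog step, i.e. (2.38) with s = 1 and the coefficients given above; the callables fu, fv, qu, qv and the name staggered_leapfrog are assumptions made purely for this example.

```python
import numpy as np

def staggered_leapfrog(u0, v_half, fu, fv, qu, qv, tau, K):
    """Order-2 staggered leapfrog update ((2.38) with s = 1, alpha = [1, -1], beta = 1).

    u lives on integer time levels t_k, v on half-integer levels t_{k+1/2};
    fu(v, t), fv(u, t), qu(t), qv(t) are user-supplied callables -- these names,
    like staggered_leapfrog itself, are assumptions made for this sketch only.
    """
    u, v = [np.asarray(u0, float)], [np.asarray(v_half, float)]   # u_0 and v_{1/2}
    for k in range(1, K + 1):
        tk, tk_half = k * tau, (k - 0.5) * tau
        # u_k = u_{k-1} + tau * ( f^u(v_{k-1/2}, t_{k-1/2}) + q^u(t_{k-1/2}) )
        u.append(u[-1] + tau * (fu(v[-1], tk_half) + qu(tk_half)))
        # v_{k+1/2} = v_{k-1/2} + tau * ( f^v(u_k, t_k) + q^v(t_k) )
        v.append(v[-1] + tau * (fv(u[-1], tk) + qv(tk)))
    return np.array(u), np.array(v)

# Usage on the toy pair du/dt = v, dv/dt = -u (no sources):
u, v = staggered_leapfrog(1.0, 0.0, lambda v, t: v, lambda u, t: -u,
                          lambda t: 0.0, lambda t: 0.0, tau=0.01, K=100)
```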
Due to the time-staggering it is in fact unclear how variabletime-steps would be handled, so we will also only consider fixed time-steps here. It isnecessary to carefully distinguish between the number of stages s being odd or even,since the parity of s will lead to slightly different formulations.To propagate from yk−1 to yk, we compute• Number of stages s is odd:332.3. Staggered Time-Stepping Methodsuk = uk−1 + τs∑σ=1σ oddbσUok,σvk+ 12= vk− 12+ τs∑σ=1σ oddbσVok+ 12,σ,(2.42a)where the internal stages are, for σ = 1, · · · , s,Uok,σ =fu(voU,k− 12,σ, tk− 12,σ)+ qu(tk− 12,σ) if σ oddfv(uoU,k−1,σ, tk−1,σ)+ qv(tk−1,σ) if σ evenVok+ 12,σ=fv(uoV,k,σ, tk,σ)+ qv(tk,σ) if σ oddfu(voV,k− 12,σ, tk− 12,σ)+ qu(tk− 12,σ) if σ even.(2.42b)• Number of stages s is even:uk = uk−1 + τs∑σ=2σ evenbσUek,σvk+ 12= vk− 12+ τs∑σ=2σ evenbσVek+ 12,σ,(2.43a)where the internal stages are, for σ = 1, · · · , s,Uek,σ =fv(ueU,k−1,σ, tk−1,σ)+ qv(tk−1,σ) if σ oddfu(veU,k− 12,σ, tk− 12,σ)+ qu(tk− 12,σ) if σ evenVek+ 12,σ=fu(veV,k− 12,σ, tk− 12,σ)+ qu(tk− 12,σ) if σ oddfv(ueV,k,σ, tk,σ)+ qv(tk,σ) if σ even.(2.43b)342.3. Staggered Time-Stepping MethodsFor brevity of notation we have let tk−1,σ = tk−1 + cστ , tk− 12,σ = tk− 12+ cστ , andvo/eU,k− 12,σ= vk− 12+ τσ−1∑i=1i e/oaσiUo/ek,i , vo/eV,k− 12,σ= vk− 12+ τσ−1∑i=1i o/eaσiVo/ek+ 12,iuo/eU,k−1,σ = uk−1 + τσ−1∑i=1i o/eaσiUo/ek,i , uo/eV,k,σ = uk + τσ−1∑i=1i e/oaσiVo/ek+ 12,i.(2.44)With Yo/ek,σ =[Uo/ek,σ⊤Vo/ek+ 12,σ⊤]⊤, (2.42a) and (2.43a) areyk = yk−1 + τs∑σ=1σ oddbσYok,σ and yk = yk−1 + τs∑σ=1σ evenbσYek,σ, (2.45)respectively. In order to represent the procedure in the form of the time-steppingequation, we defineBo = −τ[b1 0 b3 0 · · · bs−2 0 bs]⊤⊗ IN ,Be = −τ[0 b2 0 b4 · · · bs−2 0 bs]⊤⊗ IN ,Yo/ek =Yo/ek,1...Yo/ek,s , Qo/ek =Qo/ek,1...Qo/ek,s and Fo/ek =Fo/ek,1...Fo/ek,s(2.46)withQok,σ Qek,σσ oddqu(tk− 12 ,σ)qv(tk,σ) qv(tk−1,σ)qu(tk− 12,σ)σ evenqv(tk−1,σ)qu(tk− 12,σ) qu(tk− 12 ,σ)qv(tk,σ)(2.47)352.3. Staggered Time-Stepping MethodsandFok,σ Fek,σσ oddfu (voU,k− 12 ,σ, tk− 12 ,σ)fv(uoV,k,σ, tk,σ)  fv (ueU,k−1,σ, tk−1,σ)fu(veV,k− 12,σ, tk− 12,σ)σ even fv (uoU,k−1,σ, tk−1,σ)fu(voV,k− 12,σ, tk− 12,σ) fu (veU,k− 12 ,σ, tk− 12 ,σ)fv(ueV,k,σ, tk,σ)(2.48)The propagation for yk, for both an odd and an even number of stages, can thenabstractly be represented byINIsN−IN(Bo/e)⊤INyk−1Yo/ekyk−0N×1Fo/ek0N×1 =yk−1Qo/ek0N×1 , (2.49)(2.49) constitutes a single time step in the solution procedure, so the StagRK proce-dure as a whole can be written in the form (1.2) with S = I(s+1)KN×(s+1)KN ,q = qo/e =[(qo/e1)⊤01×N(qo/e2)⊤01×N · · ·(qo/eK)⊤01×N]⊤(2.50a)t (y;y0) = to/e (y) = Ty − f (y,y0) , (2.50b)wherey = yo/e =[(Yo/e1)⊤y⊤1(Yo/e2)⊤y⊤2 · · ·(Yo/eK)⊤y⊤K]⊤f = fo/e=[(Fo/e1)⊤y⊤0(Fo/e2)⊤01×N · · ·(Fo/eK)⊤01×N]⊤362.3. Staggered Time-Stepping MethodsandT = To/e =IsN(Bo/e)⊤INIsN−IN(Bo/e)⊤IN. . . . . . . . .IsN−IN(Bo/e)⊤IN, (2.51)which is a block lower-triangular matrix. We have explicitly included the internalstages in y because they will be needed later on.ExampleThe leapfrog method is a 1-stage StagRK method of order 2. 
The next StagRK ofinterest is the 5-stage method of order 4 (StagRK4) that has the Butcher tableaucs Asb⊤s=0 014(2− γ) 14(2− γ) 0−12γ −12γ 014(2 + γ) 14(2 + γ) 012γ 12γ 01− 2b5 0 b5 0 b5.The method is parametrized by γ = (6b5)−1/2 and the most appealing member of thisfamily was found [42] to have the value b5 = 1/24 (and therefore γ = 2). Note thatUok,1 = Vok+ 12,2and Vok+ 12,1= Uok+1,2, so only 4 evaluations of each of fv and fu arerequired at each time level.372.4. Exponential Time-Differencing Methods2.4 Exponential Time-Differencing MethodsExponential time-differencing (ETD) methods, also referred to as exponential inte-grators, were originally developed in the 1960s [18, 91] and have attracted muchrecent attention [63, 78–80, 120, 121]. They have become an important approachto numerically solving PDEs, particularly when the PDE system is time-dependent,semi-linear, and exhibits stiffness, taking on the form∂y(t)∂ t= f(y(t), t) = Ly(t) + n(y(t), t) + q(t)y(0) = y0.(2.52)The operator L contains the leading derivatives and is linear, while n is generallynonlinear in y. Many interesting PDEs have this form.Before we review exponential integration, we first give a quick overview of Rosenbrock-type methods. Here, an ODE system resulting from a spatial semi-discretization of afully nonlinear time-dependent PDE ∂∂ ty(t) = f(y(t), t) is written in semi-linear form,after which an exponential integrator can be applied. This can be done by writingf(y(t), t) = Lk−1y(t) + nk−1(y(t), t) + q(t), (2.53)where nk−1 (y(t), t) = f(y(t), t) − Lk−1y(t). ETD schemes are applied to this lin-earization, involving exponentiation of the matrix Lk−1 and leading to exponentialRosenbrock-type methods [64, 65]. The matrix Lk−1 is meant to approximate theJacobian fy(yk−1, tk−1), or some part of it, at the kth time-step. Occasionally it issufficient to select Lk−1 = L independently of time, but when the dynamical systemtrajectory varies significantly and rapidly we may have to set Lk−1 = Lk−1 (yk−1) =∂ f∂y(yk−1, tk−1). The case where the linear operator Lk−1 depends on yk−1 adds asignificant amount of complexity to the adjoint computations considered later.In this section we will work with the linearized system (2.53), where Lk−1 = L and382.4. Exponential Time-Differencing Methodsnk−1 (y(t), t) = n (y(t), t) if the system is semilinear to begin with an L is independentof the model parameters. As with the other methods considered so far, there are twoapproaches to solving (2.53), linear multistep and Runge-Kutta. Exponential Runge-Kutta (ETDRK) methods appear be far more popular, so we limit our discussion tothese methods.2.4.1 Exponential Runge-Kutta MethodsIntegrating (2.53) exactly from time level tk−1 to tk = tk−1 + τk givesyk = eτkLk−1yk−1 +∫ tktk−1e(tk−t)Lk−1nk−1 (y(tk−1 + t), tk−1 + t) dt++∫ tktk−1e(tk−t)Lk−1q(tk−1 + t)dt(2.54)The exponential Euler method is obtained by interpolating the integrand at the knownvalue nk−1 (yk−1, tk−1) only,yk = eτkLk−1yk−1 + τkϕ1(τkLk−1) (nk−1 (yk−1, tk−1) + q(tk−1)) (2.55)where ϕ1(z) = ez−1z. This is the simplest numerical method that can be obtained forsolving (2.54).The integral in (2.54) can be approximated using some quadrature rule, leadingto a class of s-stage explicit exponential time-differencing Runge-Kutta (ETDRK)methods with matrix coefficients aσj(τkLk−1), weights bσ(τkLk−1) and nodes cσ, so for1 ≤ σ, j ≤ s we obtainyk = eτkLk−1yk−1 + τks∑σ=1bσ(τkLk−1)Yk,σ, (2.56a)392.4. 
Exponential Time-Differencing Methodswith the internal stagesYk,σ = nk−1(ecστkLk−1yk−1 + τkσ−1∑i=1aσi(τkLk−1)Yk,i, tk−1,σ)+ q(tk−1,σ) (2.56b)for 1 ≤ σ ≤ s, where tk−1,σ = tk−1 + cστk.The procedure starts from a known initial condition y0. There are several alternateways of writing (2.56a), but for our purposes the representation given here is mostuseful.The Butcher tableau for these methods isc1c2 a21(τLk−1)....... . .cs as1(τkLk−1) · · · as,s−1(τkLk−1)b1(τkLk−1) · · · bs−1(τkLk−1) bs(τkLk−1).The coefficients aσi and bσ are linear combinations of the entire functionsϕ0(z) = ez ϕℓ =∫ 10e(1−θ)zθℓ−1(ℓ− 1)! dθ, ℓ ≥ 1.It is not hard to see that the ϕ-functions satisfy the recurrence relationϕℓ(z) =ϕℓ−1(z)− ϕℓ−1(0)z, ℓ > 0, (2.57)and that ϕℓ(z) =∞∑i=0zi(i+ ℓ)!. Notice that the expansion of ϕℓ(z) is that of theexponential function with the coefficients shifted forward.As is evident by the structure of ϕℓ for ℓ > 0, for small z the evaluation of ϕℓ(z)will be subject to cancellation error, and this could become a problem when evaluatingϕℓ(τkLk−1) if the matrix τkLk−1 has small eigenvalues. From now on, for brevity of402.4. Exponential Time-Differencing Methodsnotation we use ϕℓ = ϕℓ(τkLk−1) and ϕℓ,σ = ϕℓ(cστkLk−1).The method given by (2.56) can be abstractly represented byINIsN−eτkLk−1 −B⊤k INyk−1Ykyk−0N×1Nk0N×1 =yk−1Qk0N×1 , (2.58)withBk = τk[b1(τkLk−1) · · · bs(τkLk−1)]⊤, (2.59a)andYk =Yk,1...Yk,s , Nk =Nk,1,...Nk,s , Qk =Qk,1...Qk,s (2.59b)whereNk,σ = nk−1(ecστkLk−1yk−1 + τkσ−1∑j=1aσj(τkLk−1)Yk,j, tk−1,σ)Qk,σ = q (tk−1,σ) .In (2.58) we have a single time step in the solution procedure, so the ETDRK pro-cedure as a whole can be represented in the form of (1.2) with S = I(s+1)KN×(s+1)KN ,q =[Q⊤1 01×N Q⊤2 01×N · · · Q⊤K 01×N]⊤, (2.60a)t (y;y0) = Ty− n (y,y0) (2.60b)wherey =[Y⊤1 y⊤1 Y⊤2 y⊤2 · · · Y⊤K y⊤K]⊤,412.4. Exponential Time-Differencing Methodsn =[N⊤1 (eτ1L0y0)⊤ N⊤2 01×N · · · N⊤K 01×N]⊤,andT =IsN−B⊤1 INIsN−eτ2L1 −B⊤2 IN. . . . . . . . .IsN−eτKLK−1 −B⊤K IN, (2.61)which is a block lower-triangular matrix. We have explicitly included the internalstages in y because they will be needed later on.ExamplesThe four-stage ETD4RK method of Cox and Matthews [26] has the following Butchertableau:01212ϕ1,2120N×N 12ϕ1,31 ϕ1,4 − ϕ1,3 0N×N ϕ1,3ϕ1 − 3ϕ2 + 4ϕ3 2ϕ2 − 4ϕ3 2ϕ2 − 4ϕ3 4ϕ3 − ϕ2. (2.62)This method can be fourth-order accurate when certain conditions are satisfied, butin the worst case is only second-order.422.4. Exponential Time-Differencing MethodsKrogstad [72] derived the method given by01212ϕ1,21212ϕ1,3 − ϕ2,3 ϕ2,31 ϕ1,4 − 2ϕ2,4 0N×N 2ϕ2,4ϕ1 − 3ϕ2 + 4ϕ3 2ϕ2 − 4ϕ3 2ϕ2 − 4ϕ3 4ϕ3 − ϕ2. (2.63)It is usually also fourth-order accurate and has order three in the worst case.The following five-stage method is due to Hochbruck and Ostermann [62]:01212ϕ1,21212ϕ1,3 − ϕ2,3 ϕ2,31 ϕ1,4 − 2ϕ2,4 ϕ2,4 ϕ2,41212ϕ1,5 − 14ϕ2,5 − a5,2 a5,2 a5,2 14ϕ2,5 − a5,2ϕ1 − 3ϕ2 + 4ϕ3 0N×N 0N×N −ϕ2 + 4ϕ3 4ϕ2 − 8ϕ3, (2.64)with a5,2 = 12ϕ2,5−ϕ3,4+ 14ϕ2,4− 12ϕ3,5. It has order four under certain mild assump-tions.2.4.2 The Action of ϕℓ on Arbitrary VectorsTo simplify the notation in this section, let L = τkLk−1 and ϕ = ϕℓ for some k ≥1, ℓ ≥ 0. 
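Before reviewing how the action of ϕℓ on a vector is approximated for large N, the scalar behaviour of the recurrence (2.57) already shows why a naive evaluation is avoided. The small sketch below (illustrative Python, not part of the thesis code; the function names are made up for this example) compares the recurrence with the series expansion given above and exhibits the cancellation for small arguments.

```python
import numpy as np
from math import factorial

def phi_recurrence(ell, z):
    """phi_ell(z) via (2.57): phi_ell(z) = (phi_{ell-1}(z) - phi_{ell-1}(0)) / z,
    starting from phi_0(z) = exp(z) and using phi_j(0) = 1/j!.
    Subject to cancellation when |z| is small."""
    val = np.exp(z)
    for j in range(1, ell + 1):
        val = (val - 1.0 / factorial(j - 1)) / z
    return val

def phi_series(ell, z, nterms=30):
    """phi_ell(z) = sum_{i >= 0} z^i / (i + ell)!  (stable for small |z|)."""
    return sum(z**i / factorial(i + ell) for i in range(nterms))

# For z = 1e-8 the recurrence has lost essentially all accuracy by phi_3,
# while the series agrees with the limit phi_3(0) = 1/3! to machine precision.
z = 1e-8
print(phi_recurrence(3, z), phi_series(3, z))
```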
We very briefly review some methods used in practice to evaluate theproduct of ϕ(L) ∈ RN×N with some arbitrary vector w ∈ RN , where N is too largefor ϕ(L) to be computed explicitly and stored in full, or where it is impractical tofirst diagonalize L.Much has been written about the approximation of these products for large N ;432.4. Exponential Time-Differencing Methodssee [63] and the references therein. Here we mention four of the most relevant ap-proaches to performing these approximations, all of which also help to address thenumerical cancellation error that would occur when computing ϕ directly using therecurrence relation (2.57).If using a Rosenbrock-type scheme with L depending on yk−1, then we are in-terested in finding methods that lend themselves to calculating the derivatives ofϕ(L(yk−1,m))w with respect to yk−1 or m. This will be explored in more detail inSection 4.4.2.Krylov Subspace MethodsThe Mth Krylov subspace with respect to a matrix L and a vector w is denoted byKM(L,w) = span{w,Lw, . . . ,LM−1w}. Normalizing ‖w‖ = 1, the Arnoldi processcan be used to construct an orthonormal basis VM ∈ CN×M of KM(L,w) and anunreduced upper Hessenberg matrix HM ∈ CM×M satisfying the standard Krylovrecurrence formulaLVM = VMHM + hM+1,MvM+1eTM , V∗MVM = IM ,with eM the Mth unit vector in CN . Using the orthogonality of VM , it can then beshown thatϕ(L)w ≈ VMϕ(HM)e1 (2.65)(see for instance [97]). It is assumed that M ≪ N , so that ϕ(HM) can be computedusing standard methods such as diagonalization or Padé approximations.There has been a lot of work on Krylov subspace methods for evaluating matrixfunctions, see for instance [34, 50] and the references therein. See in particular [61]for a discussion on Krylov subspace methods for matrix exponentials.442.4. Exponential Time-Differencing MethodsPolynomial ApproximationsPolynomial methods approximate ϕ(L) using some truncated polynomial series, forinstance Taylor series (which is rarely used in this context), Chebyshev series forHermitian or skew-Hermitian L, Faber series for general L, or Leja interpolants. See[63] and the references therein for a review of Chebyshev approximations and Lejainterpolants in the context of exponential time differencing.A polynomial approximation can generally be written in the formϕ(L)w ≈M∑j=0cjLjw, (2.66)although sometimes other forms are more suitable. For instance, in the case of Cheby-shev polynomials it makes more sense to writeϕ(L)w ≈M∑j=0cj Tj(L)w (2.67)if L is Hermitian or skew-Hermitian and the eigenvalues of L all lie inside [−1, 1].The Tj(L) satisfy the recurrence relationTj+1(L) = 2LTj(L)− Tj−1(L), j = 1, 2, . . .initialized by T0(L) = I and T1(L) = L.Rational ApproximationsThe function ϕ(z) can be estimated to arbitrary order using rational approximationsϕ(z) ≈ ϕ[m,n](z) =m∑i=0aizi/n∑k=0bk−1zk = pm(z)/qn(z).452.4. Exponential Time-Differencing MethodsThe polynomials pm(z) and qn(z) can be found using either Padé approximations orby using the Carathéodory-Fejér (CF) method on the negative real line, which is anefficient method for constructing near-best rational approximations. It has been ap-plied to the problem of approximating ϕ-functions in [109]. The Padé approximationworks for general matrices L, but the CF method used in [109] works only if z isnegative and real, so that L must be symmetric negative definite.For large N it will generally be too expensive to evaluate ϕ[m,n](L) in this form.But suppose we have that m ≤ n, qn has n distinct roots denoted by s1, . . . 
, sn, andpm and qn have no roots in common. Then we can find a partial fraction expansionϕ[m,n](z) = c0 +n∑i=1ck−1si − z ,where c0 is some constant and ck−1 = Res[ϕ[m,n](z), ρk−1]. In practice one would findthese coefficients simply by clearing the denominators.The product of ϕ(L) with some vector w therefore isϕ(L)w ≈ ϕ[m,n](L)w = c0w +n∑i=1ck−1 (siI− L)−1w. (2.68)See [109] for a discussion on how a common set of poles can be used for the eval-uation of different ϕℓ. While this approach requires a higher degree n in the rationalapproximation to achieve a given accuracy, in the use of exponential integration thiswould still lead to a more efficient method overall since the same computations canbe used to evaluate different ϕℓ.Contour IntegrationThe last approach we consider is based on the Cauchy integral formulaφ(z) =12πi∫Γφ(s)s− z ds462.4. Exponential Time-Differencing Methodsfor a fixed value of z, where φ(z) is some arbitrary function and Γ is a contour in thecomplex plane that encloses z and is well-separated from 0. This formula still holdswhen replacing z by some general matrix L, so thatφ(L) =12πi∫Γφ(s) (sI− L)−1 ds, (2.69)where Γ can be any contour that encloses all the eigenvalues of L. The integral isthen approximated using some quadrature rule.There is some freedom in choosing the contour integral and the quadrature rule.Kassam and Trefethen [68] proposed the contour integral approach to circumvent thecancellation error in (2.57). For convenience, they let Γ simply be a circle in thecomplex plane that is large enough to enclose all the eigenvalues of L, and then usedthe trapezoidal rule for the approximation. If L is real, then one can additionallysimplify the calculations by considering only points on the upper half of a circle withits center on the real axis, and taking the real part of the result. Discretizing thecontour using M points si and using the trapezoidal rule to evaluate (2.69), we haveφ(L) ≈ 1Mℜ(M∑i=1siφ(si) (siI− L)−1), (2.70)where M must be chosen large enough to give a good approximation. Then let φ = ϕand multiply by w to get an approximation of the product ϕ(L)w.Since the same contour integral will be used throughout the procedure, the quadra-ture points si remain the same for all ϕ-functions and one can therefore use the samesolutions vi = (siI− L)−1w of the resolvent systems when computing the product ofdifferent ϕ-functions with some vector w.A contour integral that specifically applies to ϕ-functions isϕℓ(z) =12πi∫Γessℓ1s− z ds.472.4. Exponential Time-Differencing MethodsAgain, different contour integrals and quadrature rules can be used. If z is on thenegative real line or close to it, then we can use a Hankel contour (see also [122]).Letting z = L and using the trapezoidal rule, we getϕℓ(L) ≈ 1Mℜ(M∑i=1esisℓi(siI− L)−1), (2.71)for quadrature points si on Γ. This integral representation has the advantage thatthe integrand is exponentially decaying and therefore fewer quadrature points needto be used [63, 109].Incidentally, (2.70), (2.71) and (2.68) all require an efficient procedure for solvinglinear systems of the form(siI− L)vi = w.This can be achieved, for instance, using sparse direct solvers or preconditioned Krylovsubspace methods. Solving M different linear systems might seem prohibitive, butwhen using Krylov methods one has the advantage that KM(L,w) = KM(sI− L,w)for all s ∈ CN , so that the same Krylov subspace can theoretically be used for allsi. 
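A dense toy version of the trapezoidal-rule contour evaluation (2.70) is sketched below, with the quadrature points placed on the upper half of a circle enclosing the spectrum of a real matrix and the real part of the symmetrised sum taken, as described above. The circle is centred at the origin (an assumption made for this sketch), ϕℓ is evaluated at the contour points by its series, and the dense numpy solve stands in for the sparse direct or preconditioned Krylov resolvent solves one would use in practice; the function names are illustrative only.

```python
import numpy as np
from math import factorial

def phi_scalar(ell, z, nterms=40):
    """phi_ell at a (complex) scalar argument via its Taylor series."""
    return sum(z**i / factorial(i + ell) for i in range(nterms))

def phi_action(ell, L, w, radius, M=32):
    """phi_ell(L) @ w by the trapezoidal rule on a circular contour as in (2.70):
    M points on the upper half of a circle of given radius enclosing the spectrum
    of the real matrix L, returning the real part of the symmetrised sum."""
    N = L.shape[0]
    I = np.eye(N)
    theta = np.pi * (np.arange(M) + 0.5) / M       # upper half circle, off the real axis
    acc = np.zeros(N, dtype=complex)
    for sj in radius * np.exp(1j * theta):
        acc += sj * phi_scalar(ell, sj) * np.linalg.solve(sj * I - L, w.astype(complex))
    return acc.real / M

# Toy check against phi_1(L) w summed directly from the series applied to L.
L = np.diag([-1.0, -2.0, -3.0])
w = np.ones(3)
ref = sum(np.linalg.matrix_power(L, i) @ w / factorial(i + 1) for i in range(30))
print(np.allclose(phi_action(1, L, w, radius=5.0), ref))    # True
```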
The computation also allows for parallelization since each system can be solved independently.

Chapter 3: Derivatives of the Misfit Function

The time-stepping equation
t(y(p), m, y0) = S q(s),   (3.1)
is now used in conjunction with the adjoint method to systematically find the procedures for computing the action of the sensitivity matrix J = ∂d/∂p and its transpose, as well as the action of the Hessian of the misfit function M = M(d(p)) with respect to the parameters p = [m⊤, y0⊤, s⊤]⊤. From now on we explicitly include the dependence of y on the component(s) of p we are currently interested in.

We assume that the data d depends on the forward solution y in some known and (twice-)differentiable way, so that the computations of ∂d/∂y (and ∂²d/∂y∂y) do not present a significant challenge. In most applications the data at a given time level k, dk, depends only on yk, i.e. the data at a given point in time depends on the solution only at that time, and we will assume that this is the case here too. The modifications are straightforward in case the data depends on the solution from previous time-steps, for instance if the data consists of some time-averaged measurements of the solution.

In this chapter we let yk include, for ease of notation, the internal stages at the kth time level when the time-stepping method is an RK-type method. In this case we abuse notation slightly and let N denote the length of yk when in fact the actual length is (s+1)N, with s denoting the number of stages of the RK method. This should not cause any serious confusion and we will not further elaborate on resulting implementation details.

3.1 The Sensitivity Matrix

The sensitivity matrix (or Jacobian) is an Nd × Np matrix defined by
J := ∂d/∂p = [∂d/∂m  ∂d/∂y0  ∂d/∂s] = (∂d/∂y) [∂y/∂m  ∂y/∂y0  ∂y/∂s],   (3.2)
where we have used the chain rule. The components of each column are the partial derivatives of every data point with respect to one of the unknown parameters; similarly, each row is the partial derivative of a given data point with respect to all of the unknown parameters. Sensitivity analysis involves the investigation of the structure of J to determine the relative influence of various parameters on the measured data.

As mentioned in the introduction of this chapter, we consider the computation of ∂d/∂y to be known. We therefore focus on the computation of the derivative of the forward solution with respect to the parameters and use the adjoint method to find computationally tractable expressions for these derivatives:

• Derivative of y with respect to m:
Taking the derivative with respect to the model parameters on both sides of the discrete time-stepping system (1.2), we get
0_{N×Nm} = ∂/∂m [t(y(m), m)] = ∂t(m)/∂m + (∂t(y)/∂y)(∂y(m)/∂m)   (3.3)
and therefore
∂y/∂m = −(∂t/∂y)⁻¹ ∂t/∂m.   (3.4)

• Derivative of y with respect to y0:
Now taking the derivative with respect to the initial condition y0 on both sides of (1.2) leads to
0_{N×Ny0} = ∂/∂y0 [t(y(y0), y0)] = ∂t(y0)/∂y0 + (∂t(y)/∂y)(∂y(y0)/∂y0),   (3.5)
so that
∂y/∂y0 = −(∂t/∂y)⁻¹ ∂t/∂y0.   (3.6)

• Derivative of y with respect to s:
Finally, we differentiate (1.2) with respect to the source parameters s on both sides:
S ∂q/∂s = ∂/∂s [t(y(s))] = (∂t(y)/∂y)(∂y(s)/∂s)   (3.7)
and hence
∂y/∂s = (∂t/∂y)⁻¹ (S ∂q/∂s).   (3.8)

It follows that
J = (∂d/∂y)(∂t/∂y)⁻¹ [−∂t/∂m  −∂t/∂y0  S ∂q/∂s].   (3.9)
The derivatives of t are derived in Appendix A and references to the relevant expressions are given in Chapter 4. The solution of the linearized forward problem
ξ = (∂t/∂y)⁻¹ q   (3.10)
is given in Chapter 5 for each time-stepping method.
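As a concreteness check, relation (3.4) can be verified on a tiny dense example. The sketch below (Python; the scalar decay problem and all names are hypothetical and chosen only to illustrate the identity, not the thesis implementation) discretizes dy/dt = −m y with backward Euler and compares the adjoint-style expression with finite differences.

```python
import numpy as np

def forward_solve(m, y0, tau, K):
    """Backward Euler for the hypothetical scalar toy problem dy/dt = -m*y:
    each step solves t_k = (1 + tau*m) y_k - y_{k-1} = 0."""
    y, prev = np.empty(K), y0
    for k in range(K):
        y[k] = prev / (1.0 + tau * m)
        prev = y[k]
    return y

m, y0, tau, K = 2.0, 1.0, 0.1, 5
y = forward_solve(m, y0, tau, K)

# dt/dy is lower bidiagonal: dt_k/dy_k = 1 + tau*m, dt_k/dy_{k-1} = -1
T_y = np.diag(np.full(K, 1.0 + tau * m)) + np.diag(np.full(K - 1, -1.0), k=-1)
# dt/dm with y held fixed: dt_k/dm = tau * y_k
t_m = tau * y

dydm = -np.linalg.solve(T_y, t_m)                                     # eq. (3.4)
fd = (forward_solve(m + 1e-6, y0, tau, K) - forward_solve(m - 1e-6, y0, tau, K)) / 2e-6
print(np.allclose(dydm, fd))                                          # True
```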
It is neither desirable nornecessary to compute the sensitivity matrix J explicitly, what one really needs is tobe able to quickly compute the product of J with some arbitrary vector wp of lengthNp, and we outline this procedure in Algorithm 3.1. To simplify the notation wehave let R = S∂ q∂ s, which has dimensions NK ×Ns. Rk,: represents the kth N ×Nsblock-row of R.Algorithm 3.1. Computing u = JwpLet wp =[w⊤m w⊤y0w⊤s]⊤be an arbitrary vector of length Nm +Ns+Ny0. Theproduct u = Jwp is computed as follows:1. If the forward solution is not available already, solve for and store y.513.1. The Sensitivity Matrix2. For k = 1, · · · , K:(a) Compute the source term qk = Rk,:ws − ∂ tk∂mwm − ∂ tk∂ y0wy0. Chapter 4outlines how the last two terms on the right-hand side can be computed.(b) Solve for ξk using one step of the linearized forward time-stepping method∂ t∂yas shown in Chapter 5, with qk acting as the source term and ξ0 =0N×1.(c) Compute uk =∂dk∂ykξk.3. Set u =[u⊤1 · · · u⊤K]⊤.In the following two sections we will also need to compute the action of the trans-pose of the sensitivity matrixJ⊤ =− ∂ t∂m⊤− ∂ t∂y0⊤∂q∂ s⊤S⊤(∂ t∂y)−⊤∂d∂y⊤(3.11)on some vector wd of length Nd. This will involve the solution of the adjoint problemλ =∂ t∂y−⊤θ (3.12)where λ is the adjoint solution and θ is the adjoint source. Solving the adjoint problemis discussed in detail in Chapter 6 for each time-stepping method. The transposes ofthe derivatives of t with respect to m and y0 are given in Chapter 4 and derived inAppendix A. Algorithm 3.2 presents the procedure for computing the product of J⊤with wd.523.1. The Sensitivity MatrixAlgorithm 3.2. Computing u = J⊤wdLet wd =[w⊤1 · · · w⊤K]⊤be an arbitrary vector of length Nd =∑Kk=1Ndk . Theproduct u = J⊤wd is computed as follows:1. Set um = 0Nm×1, us = 0Ns×1 and uy0 = 0Ny0×1. These will store u.2. If the forward solution is not available already, solve for and store y.3. For k = K, · · · , 1:(a) Compute θk =∂ dk∂yk⊤wk.(b) Solve for λk using one step of the adjoint time-stepping method, with θkacting as the adjoint source term and λK+1 = 0N×1. See Chapter 6.(c) Update us = us + R⊤k,:λk, and (see Chapter 4) um = um −∂ tk∂m⊤λk anduy0 = uy0 −∂ tk∂ y0⊤λk.4. Set u =[u⊤m u⊤y0u⊤s]⊤.Note the following:• The forward solution y is required to compute both the action of J and its transpose.• The product u = Jwp requires the solution of the linearized forward problem,and the product u = J⊤wd requires the adjoint solution. This is in addition to thecomputation of y (if y is not already available). If y is already available, computingthe action of the sensitivity matrix, or its transpose, is about as expensive ascomputing y, not counting any extra computational effort that might be requiredto prepare the source terms.• As mentioned, in the case of RK methods the forward solution yk at time-stepk includes the internal stages. There is further abuse in notation, as θk, vk and533.2. The Gradientλk are assumed to also include their respective internal stages for that time step,therefore these vectors are also of length (s + 1)N . 
This does not apply to theinitial condition v0 and the final condition λK+1, which are both only of length N .• We do not address here the derivatives of the source q with respect to the sourceparameters s since the source terms are application-dependent and are expected tobe independent of the time-stepping method.3.2 The GradientThe gradient of M with respect to p, ∇pM, is used by all iterative gradient-basedoptimization methods, such as steepest descent, nonlinear conjugate gradient meth-ods, quasi-Newton methods such as BFGS, etc. It gives the direction in which thedirectional derivative ofM has the largest value, i.e. the direction in which the largestincrease in the value of M is to be expected at a given point in parameter space. Tocompute the gradient of the misfit function, we use the chain rule to immediately get∇pM = ∂ d∂ p⊤∇dM = J⊤∇dM.The gradient ∇pM is therefore easily obtained using Algorithm 3.2 with w = ∇dM.The gradient of M with respect to d depends on the misfit function that is being usedand is independent of the time-stepping method. We discuss some misfit functionsthat might be of interest, and their derivatives with respect to d, in Chapter 7.543.3. The Hessian3.3 The HessianThe Newton method is a gradient-based optimization procedure that, in addition tothe gradient, requires the action of the full Hessian of M with respect to pHMw := ∂2M∂p∂pw = ∇p(∇pM⊤w) , (3.13)on an arbitrary vector w =[w⊤m w⊤y0w⊤s]⊤of length Np = Nm +Ny0 +Ns. Theadjoint method can again be used to compute the product HMw.The Hessian has the following structureHM = ∂2M∂p∂p=∂2M∂m∂m∂2M∂m∂y0∂2M∂m∂ s∂2M∂y0∂m∂2M∂y0∂y0∂2M∂y0∂ s∂2M∂ s∂m∂2M∂ s∂y0∂2M∂ s∂ s(3.14)and the presence of the cross-terms in (3.14) makes the application of the adjointmethod quite cumbersome, so we have left the calculations out of the main text andput it in Appendix B.We use the following notation: if x, y, z and w are some vector quantities ofappropriate size, we let∂2x∂y∂zw =∂∂y(∂x∂zw∣∣∣∣w),where the ·|w on the right-hand side means that w is taken to be fixed when per-forming the differentiation with respect to y. Note that∂2x∂y∂zis actually a three-dimensional tensor and its product with a vector is ambiguously defined, so it isimportant to keep our convention in mind.553.3. The HessianThe expressions that we found are (see (B.15))HMwp = ∂2M∂p∂pwp=− ∂ t∂m⊤− ∂ t∂y0⊤∂q∂ s⊤S⊤µ−∂ 2t∂m∂mwm⊤+∂ 2t∂m∂y0wy0⊤+∂ 2t∂m∂yx⊤∂ 2t∂y0∂mwm⊤+∂ 2t∂y0∂y0wy0⊤+∂ 2t∂y0∂yx⊤− ∂2q∂ s∂ sws⊤S⊤λwithµ =∂ t∂y−⊤(∂d∂y⊤ ∂2M∂d∂d∂d∂yx +(∂2d∂y∂yx)⊤∇dM+ (3.15a)−(∂ 2t∂y∂mwm⊤+∂ 2t∂y∂y0wy0⊤+∂ 2t∂y∂yx⊤)λ),andx =∂y∂pwp =∂ t∂y−1(S∂q∂ sws − ∂ t∂mwm − ∂ t∂y0wy0). (3.15b)As far as we know the derivation of the Hessian with respect to three distinct setsof parameters (m, s and y0) using the discrete adjoint method is new.We do not give an algorithm detailing the computation of the Hessian, but we havederived the expressions of the second derivatives of t for each of the time-steppingmethods, except for ETDRK. The derivations are presented in Appendix A, and therelevant equations and pages are pointed to in Chapter 4 for ease of reference.The computation of the action of the Hessian requires the availability of the for-ward solution y, the computation of two adjoint solutions λ and µ, and the com-putation of one linearized forward solution x. 
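For readers implementing (3.15), the orchestration of these solves can be summarized schematically. The sketch below is only a transcription of the formulas above into code, under the assumption that all first- and second-derivative actions and the two triangular solves are available as callables; none of the names exist in the thesis code, and in practice each entry would be realised by the time-stepping machinery of Chapters 4–6.

```python
def hessian_vector_product(op, wm, wy0, ws, lam, grad_d):
    """Schematic transcription of (3.15); every key of `op` is a user-supplied
    callable standing in for an operator action from the text (all names here
    are assumptions made for this sketch):

      op['Ty_solve'], op['TyT_solve']              solves with dt/dy and its transpose
      op['tm'], op['tmT'], op['ty0'], op['ty0T']   first-derivative actions of t
      op['Sqs'], op['SqsT']                        actions of S dq/ds and its transpose
      op['dy'], op['dyT'], op['Mdd']               data sensitivity and misfit Hessian w.r.t. d
      op['dyy'](x, v)                              (d^2 d / dy dy  x)^T v
      op['t2_y'](wm, wy0, x, v)                    (t_ym wm + t_yy0 wy0 + t_yy x)^T v
      op['t2_m'](wm, wy0, x, v)                    (t_mm wm + t_my0 wy0 + t_my x)^T v
      op['t2_y0'](wm, wy0, x, v)                   (t_y0m wm + t_y0y0 wy0 + t_y0y x)^T v
      op['qss'](ws, v)                             (d^2 q / ds ds  ws)^T S^T v

    lam is the adjoint solution already available from the gradient computation,
    and grad_d is the gradient of the misfit with respect to the data.
    """
    # x = (dy/dp) wp, eq. (3.15b): one linearized forward solve
    x = op['Ty_solve'](op['Sqs'](ws) - op['tm'](wm) - op['ty0'](wy0))

    # mu, eq. (3.15a): one additional adjoint solve with a combined source
    mu = op['TyT_solve'](op['dyT'](op['Mdd'](op['dy'](x)))
                         + op['dyy'](x, grad_d)
                         - op['t2_y'](wm, wy0, x, lam))

    # assemble the three blocks of H_M wp, eq. (3.15)
    Hm  = -op['tmT'](mu)  - op['t2_m'](wm, wy0, x, lam)
    Hy0 = -op['ty0T'](mu) - op['t2_y0'](wm, wy0, x, lam)
    Hs  =  op['SqsT'](mu) + op['qss'](ws, lam)
    return Hm, Hy0, Hs
```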
The implementation of (3.15) can be done in such a way that only y and x need to be stored in full.

The full Hessian is actually rarely used in practice though, in large part because of the added expense of computing the second derivatives and the difficulty in coding them. It will also generally not lead to improved convergence rates in numerical minimization procedures unless one is fairly close to the minimum already, so using the full Hessian might not always be beneficial even if an efficient implementation is at hand.

As a result, the action of the Hessian is often approximated; for instance, in the case of the (generalized) Gauss-Newton method we instead use the symmetric positive semi-definite approximation
HM wp ≈ H^GN_M wp = J⊤ (∂²M/∂d∂d) J wp,   (3.16)
which is obtained by omitting all the second derivatives of t and d. A related method is the Levenberg-Marquardt method, where we add a damping term δ > 0 to get a symmetric positive definite approximation
HM wp ≈ H^LM_M wp = (J⊤ (∂²M/∂d∂d) J + δ I_{Np}) wp.   (3.17)
Both of these approximations are computed using Algorithm 3.1 followed by Algorithm 3.2, so that in addition to y we will have to compute one linearized forward solution and one adjoint solution.

Chapter 4: Derivatives of t

As we saw in the previous chapter, the discrete adjoint method requires the derivatives of the time-stepping vector t(y, m, y0) with respect to y, m and y0. Only the first derivatives of t are required if computing the gradient or the action of the sensitivity matrix, but when computing the action of the Hessian the second derivatives are required as well.

The expressions for all of these derivatives are of course highly dependent on the time-stepping method being used. We are in fact not so much interested in the expressions of the derivative matrices themselves as in the action that they, and their transposes, have on arbitrary vectors. The derivations can be quite technical, so to aid the readability of this thesis we have opted to include them in Appendix A. In this chapter, for ease of reference, we simply present the page and equation numbers where expressions that assist in computing these derivatives and products can be found for each of the time-stepping methods. We emphasize that even though the derivations of the derivative terms are in the appendix, they form an important part of the contribution of this thesis.

Due to space constraints we do not give the products of the second derivatives with arbitrary vectors, but the expressions of the derivatives themselves that we have found will help the interested reader in the implementation.

Recall that for RK-type methods we defined the solution vector to be (with a small change of notation)
y = [ŷ1⊤ · · · ŷK⊤]⊤, where ŷk = [Yk⊤ yk⊤]⊤.
Yk are the internal stages of the kth time-step. In Chapter 3 we abused notation slightly by letting yk also include the internal stages of the solution at the kth time-step, so that what we were actually referring to was ŷk.
Wedid this for notational convenience, but from now on we will keep ŷk and yk distinct.4.1 Regular Time-Stepping Methods4.1.1 Linear Multistep MethodsReferences to the first derivatives of t are given in Table 4.1, and references to thesecond derivatives can be found in Table 4.2.We let w =[w⊤1 · · · w⊤K]⊤be an arbitrary vector of length KN , wm an arbi-trary vector of length Nm, and w0 an arbitrary vector of length N .Derivative Product Page Equation∂ t∂y∂ tk∂yw p.177 (A.4)∂ tk∂yj⊤wk p.177 (A.5)∂ t∂m∂ tk∂mwm p.178 (A.10)∂ tk∂m⊤wk p.178 (A.11)∂ t∂y0∂ tk∂y0w0 p.178 (A.15)∂ tk∂y0⊤wk p.179 (A.16)Table 4.1: References to the first derivatives of t for linear multistep methods.594.1. Regular Time-Stepping Methods∂ 2t∂m∂ywp. 179, (A.18)∂ 2t∂m∂mwp. 181, (A.26)∂ 2t∂m∂y0w0p. 184, (A.36)∂ 2t∂y0∂ywp. 180, (A.21)∂ 2t∂y0∂mwp. 182, (A.29)∂ 2t∂y0∂y0w0p. 184, (A.39)∂ 2t∂y∂ywp. 180, (A.22)∂ 2t∂y∂mwp. 182, (A.32)∂ 2t∂y∂y0w0p. 185, (A.42)Table 4.2: References to the second derivatives of t for linear multistep methods.4.1.2 Runge-Kutta MethodsReferences to the first derivatives of t are given in Table 4.3, and references to thesecond derivatives can be found in Table 4.4.Let wm be an arbitrary vector of length Nm and w0 an arbitrary vector of lengthN . Also let w =[ŵ⊤1 ŵ⊤2 · · · ŵ⊤K]⊤be an arbitrary vector of length (s + 1)KNdefined analogously to y.Derivative Product Page Equation∂ t∂y∂ tk∂yw p.188 (A.49)∂ tk∂yj⊤wk p.188 (A.52)∂ t∂m∂ tk∂mwm p.189 (A.57)∂ tk∂m⊤wk p.189 (A.58)Continued on next page. . .604.1. Regular Time-Stepping MethodsDerivative Product Page Equation∂ t∂y0∂ tk∂y0w0 p.190 (A.61)∂ tk∂y0⊤wk p.190 (A.62)Table 4.3: References to the first derivatives of t for Runge-Kutta methods.∂ 2t∂m∂ywp. 193, (A.68)∂ 2t∂m∂mwmp. 195, (A.72)∂ 2t∂m∂y0w0p. 197, (A.78)∂ 2t∂y0∂ywp. 194, (A.71)∂ 2t∂y0∂mwmp. 195, (A.73)∂ 2t∂y0∂y0w0p. 196, (A.77)∂ 2t∂y∂ywp. 191, (A.64)∂ 2t∂y∂mwmp. 195, (A.75)∂ 2t∂y∂y0w0p. 197, (A.79)Table 4.4: References to the second derivatives of t for Runge-Kutta methods.614.2. Implicit-Explicit Time-Stepping Methods4.2 Implicit-Explicit Time-Stepping Methods4.2.1 IMEX Linear Multistep MethodsReferences to the first derivatives of t are given in Table 4.5, and references to thesecond derivatives can be found in Table 4.6.We let w =[w⊤1 · · · w⊤K]⊤be an arbitrary vector of length KN , wm an arbi-trary vector of length Nm, and w0 an arbitrary vector of length N .Derivative Product Page Equation∂ t∂y∂ t∂yw p.199 (A.84)∂ t∂y⊤w p.199 (A.85)∂ t∂m∂ t∂mwm p.200 (A.90)∂ t∂m⊤w p.200 (A.91)∂ t∂y0∂ t∂y0w0 p.201 (A.95)∂ t∂y0⊤w p.201 (A.96)Table 4.5: References to the first derivatives of t for IMEX linear multistep methods.∂ 2t∂m∂ywp. 202, (A.98)∂ 2t∂m∂mwmp. 204, (A.106)∂ 2t∂m∂y0w0p. 207, (A.116)Continued on next page. . .624.2. Implicit-Explicit Time-Stepping Methods∂ 2t∂y0∂ywp. 202, (A.101)∂ 2t∂y0∂mwmp. 205, (A.109)∂ 2t∂y0∂y0w0p. 208, (A.119)∂ 2t∂y∂ywp. 203, (A.102)∂ 2t∂y∂mwmp. 206, (A.112)∂ 2t∂y∂y0w0p. 208, (A.122)Table 4.6: References to the second derivatives of t for IMEX linear multistep meth-ods.4.2.2 IMEX Runge-Kutta MethodsReferences to the first derivatives of t are given in Table 4.7, and references to thesecond derivatives can be found in Table 4.8.Let wm be an arbitrary vector of length Nm and w0 an arbitrary vector of lengthN . Also let w =[ŵ⊤1 ŵ⊤2 · · · ŵ⊤K]⊤be an arbitrary vector of length (s + 1)KNdefined analogously to y.634.2. 
Implicit-Explicit Time-Stepping MethodsDerivative Product Page Equation∂ t∂y∂ tk∂yw p.212 (A.129)∂ t∂y⊤w p.214 (A.133)∂ t∂m∂ t∂mw p.215 (A.137)∂ t∂m⊤w p.215 (A.138)∂ t∂y0∂ t∂y0w0 p.216 (A.141)∂ t∂y0⊤w p.216 (A.142)Table 4.7: References to the first derivatives of t for IMEX Runge-Kutta methods.∂ 2t∂m∂ywp. 217, (A.143)∂ 2t∂m∂mwmp. 219, (A.148)∂ 2t∂m∂y0w0p. 222, (A.152)∂ 2t∂y0∂ywp. 217, (A.144)∂ 2t∂y0∂mwmp. 219, (A.149)∂ 2t∂y0∂y0w0p. 222, (A.153)∂ 2t∂y∂ywp. 218, (A.147)∂ 2t∂y∂mwmp. 220, (A.150)∂ 2t∂y∂y0w0p. 222, (A.154)Table 4.8: References to the second derivatives of t for IMEX Runge-Kutta methods.644.3. Staggered Time-Stepping Methods4.3 Staggered Time-Stepping Methods4.3.1 Staggered Linear Multistep MethodsReferences to the first derivatives of t are given in Table 4.9, and references to thesecond derivatives can be found in Table 4.10.We let w =[w⊤1 · · · w⊤K]⊤be an arbitrary vector of length KN , wm an arbi-trary vector of length Nm, and w0 an arbitrary vector of length N .Derivative Product Page Equation∂ t∂y∂ t∂yw p.225 (A.159)∂ t∂y⊤w p.225 (A.160)∂ t∂m∂ t∂mwm p.226 (A.165)∂ t∂m⊤w p.226 (A.166)∂ t∂y0∂ t∂y0w0 p.227 (A.170)∂ t∂y0⊤w p.227 (A.171)Table 4.9: References to the first derivatives of t for staggered linear multistep meth-ods.∂ 2t∂m∂ywp. 228, (A.173)∂ 2t∂m∂mwmp. 229, (A.176)∂ 2t∂m∂y0w0p. 230, (A.180)Continued on next page. . .654.3. Staggered Time-Stepping Methods∂ 2t∂y0∂ywp. 228, (A.174)∂ 2t∂y0∂mwmp. 229, (A.177)∂ 2t∂y0∂y0w0p. 231, (A.181)∂ 2t∂y∂ywp. 228, (A.175)∂ 2t∂y∂mwmp. 229, (A.178)∂ 2t∂y∂y0w0p. 231, (A.182)Table 4.10: References to the second derivatives of t for staggered linear multistepmethods.4.3.2 Staggered Runge-Kutta MethodsReferences to the first derivatives of t are given in Table 4.11, and references to thesecond derivatives can be found in Table 4.12.Let wm be an arbitrary vector of length Nm and w0 an arbitrary vector of lengthN . Also let w =[ŵ⊤1 ŵ⊤2 · · · ŵ⊤K]⊤be an arbitrary vector of length (s + 1)KNdefined analogously to y.664.3. Staggered Time-Stepping MethodsDerivative Product Page Equation∂ t∂y∂ tk∂yw p.238 (A.197)∂ t∂y⊤w p.240 (A.201)∂ t∂m∂ t∂mw p.242 (A.205)∂ t∂m⊤w p.243 (A.206)∂ t∂y0∂ t∂y0w0 p.245 (A.210)∂ t∂y0⊤w p.246 (A.211)Table 4.11: References to the first derivatives of t for staggered Runge-Kutta methods.∂ 2t∂m∂ywp. 247, (A.212)∂ 2t∂m∂mwmp. 253, (A.217)∂ 2t∂m∂y0w0p. 260, (A.222)∂ 2t∂y0∂ywp. 248, (A.213)∂ 2t∂y0∂mwmp. 254, (A.218)∂ 2t∂y0∂y0w0p. 261, (A.223)∂ 2t∂y∂ywp. 252, (A.216)∂ 2t∂y∂mwmp. 259, (A.221)∂ 2t∂y∂y0w0p. 263, (A.224)Table 4.12: References to the second derivatives of t for staggered Runge-Kuttamethods.674.4. Exponential Time-Stepping Methods4.4 Exponential Time-Stepping Methods4.4.1 Exponential Runge-Kutta MethodsReferences to the first derivatives of t are given in Table 4.13. We do not considersecond derivatives of ETDRK methods in this thesis.Let w be an arbitrary vector of length Nm and w0 an arbitrary vector of lengthN . 
Also let w =[ŵ⊤1 · · · ŵ⊤K]⊤be an arbitrary vector of length (s+1)KN definedanalogously to y.Derivative Product Page Equation∂ t∂y∂ tk∂yw p.267 (A.232)∂ t∂y⊤w p.268 (A.235)∂ t∂m∂ t∂mw p.270 (A.239)∂ t∂m⊤w p.270 (A.240)∂ t∂y0∂ t∂y0w0 p.271 (A.241)∂ t∂y0⊤w p.272 (A.243)Table 4.13: References to the first derivatives of t for exponential time-differencingRunge-Kutta methods.4.4.2 The Derivatives of ϕℓIf the linear operator Lk depends on either yk or m we have to be able to take thederivatives of the products aσj(τkLk(yk,m))w, bσ(τkLk(yk,m))w and ecστkLk(yk,m)w,where w is an arbitrary vector of length N , with respect to yk or m.684.4. Exponential Time-Stepping MethodsSince the aσj and bσ are linear combinations of the ϕℓ,σ-and ϕℓ-functions respec-tively, we will simply consider the derivatives of ϕ(L)w for some arbitrary ϕ-functionand matrix L = τkLk(yk,m). Without loss of generality we assume in this subsec-tion that the differentiation is with respect to yk, but all the results also apply whentaking the derivative with respect to m.The calculation of the derivatives of the products of the ϕℓ-functions will dependon the way these terms are evaluated numerically. Recall the approaches for evaluat-ing ϕℓ reviewed in Section 2.4.2.Krylov Subspace MethodsUsing Krylov subspace methods is unfortunately unsuitable for our purposes since if Ldepends on yk or m, it is extremely difficult to find the dependence of the right-handside of (2.65) on these variables. For this reason we will not consider this approachfurther here, although it can of course be used for parameter estimation problemswhere L is independent of yk and m.Polynomial ApproachIf we represent ϕ by a truncated polynomial series as in (2.66) and multiply by w,we haveϕ(L(yk))w ≈M∑j=0ωj Lj(yk)w.Let L depend on a single parameter z first. The derivative of Lj(z)w then is∂ Lj(z)∂ zw =j∑i=1Li−1∂ L(z)∂ zLj−iw =j∑i=1Li−1∂ L(z)∂ zvj−iwith vj−i = Lj−iw. In the case of L = L(yk) we therefore have∂ Lj(yk)w∂ yk=j∑i=1Li−1∂ L(yk)vj−i∂ yk.694.4. Exponential Time-Stepping MethodsThis implies that we need to have j − 1 derivatives ∂ L(yk)vj−i∂ ykavailable for each j,but of course a lot of these derivatives can also be reused for different values of j.In the special case where the matrices L and∂ L∂ ycommute for each element y ofyk, we can simplify the above by using the matrix analogue of the usual power rule:∂ Lj(yk)∂ ykw = Lj−1∂ L(yk)w∂ yk.This is much less cumbersome to implement, but it is hard to think of realistic sce-narios where this commutativity property would hold.The drawback of considering each monomial term on its own is that this does notnecessarily reflect how the polynomial approximation of ϕ is actually computed. Forinstance, the Chebyshev approximation (2.67) isϕ(L)w ≈M∑j=0ωj Tj(L)wassuming L(m) is Hermitian or skew-Hermitian and the eigenvalues of L(yk) all lieinside [−1, 1]. Each Tj(yk)w is computed using the recurrence relationTj+1(L(yk))w = 2LTj(L(yk))w− Tj−1(L(yk))w, j = 1, 2, . . .initialized by T0(L(yk))w = w and T1(L(yk))w = L(yk)w. Taking the derivative of(2.67) with respect to yk,∂ ϕ(L)w∂ yk≈M∑j=0ωj∂ Tj(L)w∂ yk,with the recurrence relation∂ Tj+1(L)w∂ yk= 2∂ (Lv)∂ yk+ 2L∂ Tj(L)w∂ yk− ∂ Tj−1(L)w∂ yk, j = 1, 2, . . .704.4. 
Exponential Time-Stepping Methodswhere v = Tjw,∂ T0(L)w∂ yk= 0N×N and∂ T1(L)w∂ yk=∂ (Lw)∂ yk.We also need to be able to compute the products of the transposes of∂ ϕ(L)w∂ ykwith some vector; this is straightforward to derive from the equations above.Rational Approximations and Contour IntegrationWe saw in Section 2.4.2 that the action of a ϕ-function can be computed by bothrational approximations (under certain conditions) and contour integrals in the formϕ(L)w ≈M∑i=1ωi (siI− L)−1w,for some complex scalars ωi and si, where the si do not coincide with the eigenvaluesof L.To find the derivative of ϕ(L)w in this case, let vi = (siI− L)−1w, so that∂ ϕ(L)w∂ yk≈M∑i=1ωi∂ vi∂ yk. (4.1)To find an expression for∂ vi∂ yk, consider (siI− L)vi = w and take the derivative withrespect to yk on both sides,∂∂ yk(siI− L)vi = 0N×N ⇒ (siI− L) ∂ vi∂ yk− ∂ (Lvi)∂ yk= 0N×N⇒ ∂ vi∂ yk= (siI− L)−1 ∂ (Lvi)∂ yk,where the vi on the right-hand side is taken to be fixed.For each term in the sum in (4.1) we thus require two matrix solves, one to find viand an additional one to then find (siI− L)−1 ∂ (Lvi)∂ yk . As mentioned previously, thevi can be computed using the same Krylov subspace. However, with z an arbitrary714.4. Exponential Time-Stepping Methodsvector of length N , the linear systems(siI− L)ui = ∂ (Lvi)∂ ykzeach have a different right-hand side, and therefore we are no longer working in thesame Krylov subspace. This means that we need to solve M different linear systemsjust for a single evaluation of the derivative of a given ϕ-function, which is not ideal.The process is fortunately highly parallelizable and given the ease of access to a largenumber of processors these days, we do not consider this to be the bottleneck it mighthave been just a few years ago. Nonetheless it is a significant inconvenience for manya mathematician, and the polynomial approach does not suffer from this limitation.We also should be able to compute the transpose of (4.1). In this case we have∂ ϕ(L)w∂ yk⊤≈M∑i=1ωi∂ vi∂ yk⊤,with∂ vi∂ yk⊤=∂ (Lvi)∂ yk⊤(siI− L)−⊤ = ∂ (Lvi)∂ yk⊤ (siI− L⊤)−1, so when multiplyingby some vector z we can work within the same Krylov subspace KM(L⊤, z) for anysi ∈ CN .72Chapter 5The Linearized Forward ProblemIn Chapter 3 we showed that computing the action of the sensitivity matrix J onsome vector wp of length Np requires the solution of the linearized forward problemξ =(∂ t∂y)−1q, (5.1)where we will refer to ξ as the linearized forward solution and q in this chapter istaken to be some arbitrary source term; it is not the same source used by the forwardproblem. In the context of sensitivity analysis we will haveq =[− ∂ t∂m− ∂ t∂y0S∂q∂ s]wp,see (3.9). The linearization of the time-stepping equation,∂ t∂y, is derived in AppendixA for each of the time-stepping methods considered in this thesis.Solving the linearized forward problem amounts to simply solving the standardforward problem with the nonlinear terms having been linearized about the valueof the forward solution y at the current time level. Therefore the methods in thischapter are not particularly interesting and we present them here simply for the sakeof having a complete discussion; we also give examples of the linearizations of theparticular time-stepping schemes given as examples in Chapter 2.The exception to the above are exponential Rosenbrock-type methods, where asignificant number of additional terms appear from the linearization. 
In this case thelinearized forward problem is not at all straightforward, so the example given for thiscase is especially important to illustrate the implementation.735.1. Regular Time-Stepping MethodsWe make the following remarks before we start:• The linearization of the nonlinear operators about the current values of the forwardsolution means that we must have the forward solution y available, either in storageor computed on the fly using checkpointing.• The structure of (5.1) dictates that the initial value of the linearized forward solu-tion must be ξ0 = 0N×1.• We do not have a source weighting operator in (5.1), so for LM-type methods wedo not need to combine source values from different time levels.• We do not assume any structure on q, so it is possible that RK-type methods willhave a source term appearing in the update step.• In the case of the operator f being linear, the methods below are identical to thestandard forward solution procedure for linear problems, apart from how the sourceterms are treated.5.1 Regular Time-Stepping MethodsHere we present the linearized LM and RK methods, where, for LM methods, thenonlinear operator f is linearized about the solution at the kth time level, yk, givingus the linear operator∂ fk∂y. For RK methods the operator is linearized about the σthinternal stage at the kth time-step,∂Fk,σ∂y.5.1.1 Linear Multistep MethodsLet ξ =[ξ⊤1 · · · ξ⊤K]⊤represent the solution to the linearized forward problem,and let q =[q⊤1 · · · q⊤K]⊤be the source.745.1. Regular Time-Stepping MethodsThe K × K block matrix ∂ t∂yrepresents the linearized time-stepping procedureand was found in Section A.1.1 to have the form of (2.1) and (2.3), with the (k, j)thN ×N block being (see (A.3))∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fj∂yif k⋆ ≤ j ≤ k0 otherwise,(5.2)where σ = min (k, s) and k⋆ = max (1, k − s).To solve (5.1) we therefore must solve, for 1 ≤ k ≤ K,∂ tk∂ykξk = qk −k−1∑j=k⋆∂ tk∂yjξj .The solution procedure is summarized in Algorithm 5.1.Algorithm 5.1. The Linearized Linear Multistep MethodLet σ = min (k, s) and k⋆ = max (1, k − s).For k = 1, · · · , K, solve for ξk:(IN − τ β(σ)0∂ fk∂y)ξk = qk −k−1∑j=k⋆(α(σ)k−jIN − τ β(σ)k−j∂ fj∂y)ξj ,with∂ fk∂y=∂ f (yk)∂y=∂ f (yk, tk)∂y.ExamplesWe find the linearized versions of the IMEX linear multistep methods mentioned in2.2.1.755.1. Regular Time-Stepping MethodsThe linearized s-step Adams-Bashforth methods with s = 1, 2, 3 are, for k = 1, · · · , K,s = 1 : ξk = qk +(IN + τ∂ f (yk−1)∂y)ξk−1s = 2 : ξk = qk +(IN + τ32∂ f (yk−1)∂y)ξk−1 −τ2∂ f (yk−2)∂yξk−2s = 3 : ξk = qk +(IN + τ2312∂ f (yk−1)∂y)ξk−1 − τ43∂ f (yk−2)∂yξk−2++ τ512∂ f (yk−3)∂yξk−3.The linearized s-step BDFs of orders s = 1, 2, 3 are, for k = 1, · · · , K,s = 1 :(IN − τ ∂ f (yk)∂y)ξk = qk + ξk−1s = 2 :(IN − τ 23∂ f (yk)∂y)ξk = qk +43ξk−1 −13ξk−2s = 3 :(IN − τ 611∂ f (yk)∂y)ξk = qk +1811ξk−1 −911ξk−2 +211ξk−3.5.1.2 Runge-Kutta MethodsFor RK methods we have ξ =[ξ̂⊤1 · · · ξ̂⊤K]⊤, where ξ̂k =[Ξ⊤k ξ⊤k]⊤. Ξk consistsof the internal stages at the kth time-step. The source term q =[q̂⊤1 · · · q̂⊤K]⊤isdefined similarly.The (s+ 1)KN × (s+ 1)KN matrix ∂ t∂ywas found in A.1.2 to be (see (A.48))∂ t∂y=A1B⊤ INC2 A2−IN B⊤ IN. . . . . . . . .CK AK−IN B⊤ IN,765.1. Regular Time-Stepping Methodswhere (see (A.47))Ak = IsN − τk ∂Fk∂y(As ⊗ IN)Ck = −∂Fk∂y(1s ⊗ IN)and B = −τkbs ⊗ IN . 
∂Fk∂yis given by∂Fk∂y= blkdiag(∂Fk,1∂y, · · · , ∂Fk,s∂y)with Fk,σ = f(yk,σ, tk−1 + cστk)and yk,σ= yk−1 + τks∑i=1aσiYk,i.The system (5.1) is then obtained by solving, for 1 ≤ k ≤ K, the internal stagesAkΞk = Qk −Ck ξk−1⇒(IsN − τk ∂Fk∂y(As ⊗ IN))Ξk = Qk +∂Fk∂y(1s ⊗ IN) ξk−1and then computing the next step of the linearized solution,ξk = qk + ξk−1 −B⊤Ξk.The detailed solution procedure is presented in Algorithm 5.2.Algorithm 5.2. The Linearized Runge-Kutta MethodSet ξ0 = 0N×1 and recall that we definedyk,σ= yk−1 + τkσ∑i=1aσiYk,i.For k = 1, · · · , K:775.1. Regular Time-Stepping Methods• For σ = 1, · · · , s, evaluate the internal stages:Ξk,σ = Qk,σ +∂ f(yk,σ)∂y(ξk−1 + τks∑i=1aσiΞk,i).• Update ξ:ξk = qk + ξk−1 + τks∑σ=1bσΞk,σ.ExamplesThe linearized RK4 method is, for k = 1, · · · , K:• Compute the internal stages for ξk:Ξk,1 = Qk,1 +∂ f (yk−1)∂yξk−1Ξk,2 = Qk,2 +∂ f(yk−1 +τk2Yk,1)∂y(ξk−1 +τk2Ξk,1)Ξk,3 = Qk,3 +∂ f(yk−1 +τk2Yk,2)∂y(ξk−1 +τk2Ξk,2)Ξk,4 = Qk,4 +∂ f (yk−1 + τkYk,3)∂y(ξk−1 + τkΞk,3).• Update ξ:ξk = qk + ξk−1 +τk6(Ξk,1 + 2Ξk,2 + 2Ξk,3 +Ξk,4) .The linearized version of the fourth-order 2-stage implicit RK method mentioned inSection 2.1.2 is• Let G1 =∂ f(yk,1)∂yand G2 =∂ f(yk,2)∂y, withyk,1= yk−1 +τk4Yk,1 +τk24(6− 4√3)Yk,2yk,2= yk−1 +τk24(6 + 4√3)Yk,1 +τk4Yk,2.785.2. Implicit-Explicit Time-Stepping MethodsSolve for the internal stages of ξk: IN − τk4 G1 − τk24(6− 4√3)G1− τk24(6 + 4√3)G2 IN − τk4G2Ξk,1Ξk,2 =Qk,1Qk,2+G1G2 ξk−1• Update ξ:ξk = qk + ξk−1 +τk2(Ξk,1 +Ξk,2) .5.2 Implicit-Explicit Time-Stepping MethodsWe now apply the same procedure to find the linearized IMEX LM and RK methods.5.2.1 IMEX Linear Multistep MethodsLet ξ =[ξ⊤1 · · · ξ⊤K]⊤represent the solution to the linearized forward problem,and let q =[q⊤1 · · · q⊤K]⊤be the source.The K × K block matrix ∂ t∂yrepresents the linearized time-stepping procedureand was found in Section A.2.1 to have the form of (2.1) and (2.3), with the (k, j)thN ×N block being (see (A.83))∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fEj∂y− τ γ(σ)k−j∂ fIj∂yif k⋆ ≤ j ≤ k0N×N otherwise,(5.3)where β(σ)0 = 0. We have let σ = min (k, s) and k⋆ = max (1, k − s).To solve (5.1) we compute, for 1 ≤ k ≤ K,∂ tk∂ykξk = qk −k−1∑j=k⋆∂ tk∂yjξj .The solution procedure is summarized in Algorithm 5.3.795.2. Implicit-Explicit Time-Stepping MethodsAlgorithm 5.3. The Linearized Implicit-Explicit Linear Multistep MethodLet σ = min (k, s) and k⋆ = max (1, k − s).For k = 1, · · · , K, solve for ξk:(IN − τ γ(σ)0∂ fIk∂y)ξk = qk −k−1∑j=k⋆(α(σ)k−jIN − τ β(σ)k−j∂ fEj∂y− τ γ(σ)k−j∂ fIj∂y)ξj,with∂ fI/Ek∂y=∂ fI/E (yk)∂y=∂ fI/E (yk, tk)∂y.ExamplesWe find the linearized versions of the IMEX linear multistep methods mentioned in2.2.1.For the linearized IMEX BDF2 method one needs to solve, for k = 2, · · · , K,(IN − τ 23∂ fI (yk)∂y)ξk = qk +43(IN + τ∂ fE (yk−1)∂y)ξk−1+− 13(IN + 2τ∂ fE (yk−2)∂y)ξk−2.For the linearized IMEX BDF3 method one needs to solve, for k = 3, · · · , K,(IN − τ 611∂ fI (yk)∂y)ξk = qk +1811(IN + τ∂ fE (yk−1)∂y)ξk−1+− 911(IN + 2τ∂ fE (yk−2)∂y)ξk−2 +211(IN + 3τ∂ fE (yk−3)∂y)ξk−3.Finally, for the linearized IMEX TVB3 method one needs to solve, for k = 3, · · · , K,(IN − τ 10892048∂ fI (yk)∂y)ξk = qk++(39092048IN + τ1846312288∂ fE (yk−1)∂y− τ 113912288∂ fI (yk−1)∂y)ξk−1+805.2. 
Implicit-Explicit Time-Stepping Methods+(−13671024IN − τ 1271768∂ fE (yk−2)∂y− τ 3676144∂ fI (yk−2)∂y)ξk−2++(8732048IN + τ823312288∂ fE (yk−3)∂y+ τ169912288∂ fI (yk−3)∂y)ξk−3.5.2.2 IMEX Runge-Kutta MethodsWe let ξ =[ξ̂⊤1 · · · ξ̂⊤K]⊤represent the linearized forward solution, with ξ̂k =[Ξ⊤k ξ⊤k]⊤, whereΞk =[ΞEk,1⊤ΞIk,1⊤ΞEk,2⊤ · · · ΞIk,s−1⊤ΞEk,s⊤]⊤represents the 2s−1 internal stages of the linearized forward solution at the kth timestep. The source q is defined similarly, with internal stages Qk,i.The (s+ 1)KN × (s+ 1)KN matrix ∂ t∂ywas found in A.1.2 to be (see (A.48))∂ t∂y=A1B⊤ INC2 A2−IN B⊤ IN. . . . . . . . .CK AK−IN B⊤ IN,where (see (A.127))Ak = I(2s−1)N − τk ∂ fk∂y(As ⊗ IN) and Ck = −∂ fk∂y(12s−1 ⊗ IN)andB = −τk[bE1 bI1 bE2 · · · bEs−1 bIs−1 bEs]⊤⊗ IN .815.2. Implicit-Explicit Time-Stepping MethodsSee (A.125) for the definition of As.∂ fk∂ywas defined to be∂ fk∂y= blkdiag(∂ fEk,1∂y,∂ fIk,2∂y,∂ fEk,2∂y,∂ fIk,3∂y, · · · , ∂ fEk,s∂y),with fEk,σ = fE(yk,σ−1, tk−1 + cστk), fIk,σ = fI(yk,σ−1, tk−1 + cστk)and yk,σ= yk−1 +τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i.The system (5.1) is then obtained by solving, for 1 ≤ k ≤ K, the internal stagesAkΞk = Qk −Ck ξk−1⇒(I(2s−1)N − τk ∂ fk∂y(As ⊗ IN))Ξk = Qk +∂ fk∂y(12s−1 ⊗ IN) ξk−1and then computing the next step of the linearized solution,ξk = qk + ξk−1 −B⊤Ξk.The detailed solution procedure is presented in Algorithm 5.4.Algorithm 5.4. The Linearized Implicit-Explicit Runge-Kutta MethodSet ξ0 = 0N×1 and recall that we definedyk,σ= yk−1 + τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i.For k = 1, · · · , K:• For i = 1, · · · , 2s− 1, evaluate the internal stages:825.2. Implicit-Explicit Time-Stepping Methods◦ if i odd, set σ = ⌈ i2⌉ and compute:ΞEk,σ = Qk,σ +∂ fE(yk,σ−1)∂y(ξk−1 + τkσ−1∑j=1aEσjΞEk,j + τkσ−1∑j=1aIσ−1,jΞIk,j).◦ if i even, set σ = i2and solve:IN − τkaIσσ ∂ fI(yk,σ)∂yΞIk,σ == Qk,σ +∂ fI(yk,σ)∂y(ξk−1 + τkσ∑j=1aEσ+1,jΞEkj + τkσ−1∑j=1aIσjΞIkj).• Computeξk = qk + ξk−1 + τks∑σ=1bEσΞEk,σ + τks−1∑σ=1bIσΞIk,σ.ExampleThe linearized ARS3 scheme is, with γ = 3+√36,yk,1= yk−1 + τk γYEk,1 + τk γYIk,1yk,2= yk−1 + τk((γ − 1)YEk,1 + 2(1− γ)YEk,2 + (1− 2γ)YIk,1 + γYIk,2),and for k = 1, · · · , K:• Compute the internal stages for ξ:◦ Compute ΞEk,1:ΞEk,1 = Qk,1 +∂ fE (yk−1)∂yξk−1;835.3. Staggered Time-Stepping Methods◦ Solve for ΞIk,1:IN − τkγ ∂ fI(yk,1)∂yΞIk,1 = Qk,1 + ∂ fI(yk,1)∂y(ξk−1 + τk γΞEk,1);◦ Compute ΞEk,2:ΞEk,2 = Qk,2 +∂ fE(yk,1)∂y(ξk−1 + τk γΞEk,1 + τk γΞIk,1);◦ Solve for ΞIk,2:IN×1 − τkγ ∂ fI(yk,2)∂yΞIk,2 = Qk,2 + ∂ fI(yk,2)∂y(ξk−1++τk((γ − 1)ΞEk,1 + 2(1− γ)ΞEk,2 + (1− 2γ)ΞIk,1));◦ Compute ΞEk,2:ΞEk,3 = Qk,3 +∂ fE(yk,2)∂y(ξk−1++τk((γ − 1)ΞEk,1 + 2(1− γ)ΞEk,2 + (1− 2γ)ΞIk,1 + γΞIk,2)).• Update ξ:ξk = qk + ξk−1 +τk2(ΞEk,2 +ΞEk,3 +ΞIk,1 +ΞIk,2).5.3 Staggered Time-Stepping MethodsWe use the same approach used in the previous subsections to find the linearizedStagLM and StagRK methods.845.3. Staggered Time-Stepping Methods5.3.1 Staggered Multistep MethodsLet ξ =[ξ⊤1 · · · ξ⊤K]⊤represent the linearized forward solution and q =[q⊤1 · · · q⊤K]⊤the source. We have ξk =[ξuk⊤ξvk+ 12⊤]⊤and likewise for θk.The K ×K block matrix ∂ t∂yis a lower block triangular matrix that has the formof the template matrices (2.1) and (2.3). 
Its (k, j)th N ×N block is (see (A.158))∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fj∂yj− τ β(σ)k−j−1∂ fj+1∂yjif k⋆ ≤ j ≤ k0N×N otherwise,(5.4)with β(σ)ℓ = 0 for ℓ < 0, σ = min (k, s), k⋆ = max (1, k − s), and ∂ fk∂ykand∂ fk+1∂ykdefined as in (A.157).To solve (5.1) we compute, for 1 ≤ k ≤ K,∂ tk∂ykξk = qk −k−1∑j=k⋆∂ tk∂yjξj .The solution procedure is summarized in Algorithm 5.5.Algorithm 5.5. The Linearized Staggered Linear Multistep MethodLet σ = min (k, s), k⋆ = max (1, k − s).For k = 1, · · · , K:• Update ξu:ξuk = quk −k−1∑j=k⋆α(σ)k−jξuj + τk−1∑j=k⋆−1β(σ)k−j−1∂ fu(vj+ 12, tj+ 12)∂vξvj+ 12855.3. Staggered Time-Stepping Methods• Update ξv:ξvk+ 12= qvk+ 12−k−1∑j=k⋆α(σ)k−jξvj+ 12+ τk∑j=k⋆β(σ)k−j∂ fv (uj , tj)∂uξuj .ExamplesWe find the linearized versions of the staggered linear multistep methods mentionedin 2.3.1. The linearized leapfrog method for k = 1, · · · , K isξuk = quk + ξuk−1 + τ∂ fu(vk− 12, tk− 12)∂vξvk− 12ξvk+ 12= qvk+ 12+ ξvk− 12+ τ∂ fv (uk, tk)∂uξuk .The linearized StagBDF3 method for k = 3, · · · , K isξuk = quk +2123ξuk−1 +323ξuk−2 −123ξuk−3 + τ2423∂ fu(vk− 12, tk− 12)∂vξvk− 12ξvk+ 12= qvk+ 12+2123ξvk− 12+323ξvk− 32− 123ξvk− 52+ τ2423∂ fv (uk, tk)∂uξuk .The linearized StagBDF4 method for k = 4, · · · , K isξuk = quk +1722ξuk−1 +922ξuk−2 −522ξuk−3 +122ξuk−4 + τ1211∂ fu(vk− 12, tk− 12)∂vξvk− 12ξvk+ 12= qvk+ 12+1722ξvk− 12+922ξvk− 32− 522ξvk− 52+122ξvk− 72+ τ1211∂ fv (uk, tk)∂uξuk .The linearized StagAB3 scheme for k = 3, · · · , K isξuk = quk + ξuk−1 + τ2524∂ fu(vk− 12, tk− 12)∂vξvk− 12− τ 112∂ fu(vk− 32, tk− 32)∂vξvk− 32++ τ124∂ fu(vk− 52, tk− 52)∂vξvk− 52865.3. Staggered Time-Stepping Methodsξvk+ 12= qvk+ 12+ ξvk− 12+ τ2524∂ fv (uk, tk)∂uξuk − τ112∂ fv (uk−1, tk−1)∂uξuk−1++ τ124∂ fv (uk−2, tk−2)∂uξuk−2.Finally, the linearized StagAB4 scheme for k = 4, · · · , K isξuk = quk + ξuk−1 + τ1312∂ fu(vk− 12, tk− 12)∂vξvk− 12− τ 524∂ fu(vk− 32, tk− 32)∂vξvk− 32++ τ16∂ fu(vk− 52, tk− 52)∂vξvk− 52− τ 124∂ fu(vk− 72, tk− 72)∂vξvk− 72ξvk+ 12= qvk+ 12+ ξvk− 12+ τ1312∂ fv (uk, tk)∂uξuk − τ524∂ fv (uk−1, tk−1)∂uξuk−1++ τ16∂ fv (uk−2, tk−2)∂uξuk−2 − τ124∂ fv (uk−3, tk−3)∂uξuk−3.5.3.2 Staggered Runge-Kutta MethodsWe let ξ =[ξ̂⊤1 · · · ξ̂⊤K]⊤represent the adjoint solution, with ξ̂k =[Ξ⊤k ξ⊤k]⊤,whereξk =[ξuk⊤ξvk+ 12⊤]⊤and Ξk,σ =[Ξuk,σ⊤Ξvk+ 12,σ⊤]⊤,the latter representing the s internal stages of the linearized forward solution at thekth time-step. The source q is defined similarly, with internal stages Qk,σ.The linearization of the time-stepping equation,∂ t∂y, was derived in Section A.3.2,see (A.196):∂ t∂y⊤=A1 C1,1B⊤ INC2,1 A2 C2,2−IN B⊤ IN. . . . . . . . .CK,K−1 AK CK,K−IN B⊤ IN.875.3. Staggered Time-Stepping MethodsAk and Ck,j were defined in (A.195) to beAk = IsN − τGo/ek (As ⊗ IN)Ck,j = −∂Fo/ek∂yjB = Bo/e;see Section A.3.2 for the definitions of Go/ek and∂Fo/ek∂yj, which contain the derivativesof fu and fv with respect to u and v. Important for this is the shorthandvo/eU,k− 12,σ= vk− 12+ τσ−1∑i=1i evenaσiUo/ek,i , vo/eV,k− 12,σ= vk− 12+ τσ−1∑i=1i oddaσiVo/ek+ 12,iuo/eU,k−1,σ = uk−1 + τσ−1∑i=1i oddaσiUo/ek,i , uo/eV,k,σ = uk + τσ−1∑i=1i evenaσiVo/ek+ 12,i,where o/e denotes whether the number of stages s is odd or even.While not apparent from the form above,∂ t∂yis essentially lower (block) triangularbecause we can swap rows and columns to make it lower block triangular, which isin fact what we do to get to the algorithm below. Conceptually, we solve (5.1) asfollows:AkΞk = Qk +Ck,kξk +Ck,k−1ξk−1ξk = qk + ξk−1 +B⊤Ξk(5.5)for i, · · · , K. 
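Before turning to the detailed stage-by-stage procedure below, it may help to see the shape of this forward sweep in code for the simplest staggered scheme given earlier in this section, the linearized leapfrog method. The following is a minimal MATLAB sketch, not the thesis implementation: dfu_dv and dfv_du are assumed helpers returning the Jacobians ∂fu/∂v and ∂fv/∂u evaluated at the stored forward states, the linearized solution is started from zero as in the regular case, and u(:,k), v(:,k) are taken to hold u_k and v_{k-1/2} respectively.

```matlab
% Minimal sketch of the forward sweep for the linearized leapfrog method.
% qu(:,k) and qv(:,k) are the source terms of the linearized problem.
xi_u = zeros(N,1);                                   % xi^u_0
xi_v = zeros(N,1);                                   % xi^v_{1/2}
Xi_u = zeros(N,K);  Xi_v = zeros(N,K);
for k = 1:K
    xi_u = qu(:,k) + xi_u + tau*( dfu_dv(v(:,k), t(k) - tau/2) * xi_v );  % xi^u_k
    xi_v = qv(:,k) + xi_v + tau*( dfv_du(u(:,k), t(k))         * xi_u );  % xi^v_{k+1/2}
    Xi_u(:,k) = xi_u;   Xi_v(:,k) = xi_v;
end
```

The staggered RK sweep described next has the same outer structure, with the two Jacobian-vector products replaced by the interleaved internal stages.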
The detailed solution procedure for s odd is summarized in Algorithm6.6.Algorithm 5.6. The Linearized Staggered Runge-Kutta Method, with s odd885.3. Staggered Time-Stepping MethodsSet ξ0 = 0N×1. Recall that (see (2.44))vU,k− 12,σ = vk− 12+ τσ−1∑i=1i evenaσiUk,i, vV,k− 12,σ = vk− 12+ τσ−1∑i=1i oddaσiVk+ 12,iuU,k−1,σ = uk−1 + τσ−1∑i=1i oddaσiUk,i, uV,k,σ = uk + τσ−1∑i=1i evenaσiVk+ 12,i.For k = 1, · · · , K:• For σ = 1, · · · , s, evaluate the internal stages for ξu:Ξuk,σ = Quk,σ +∂ fu(vU,k−1/2,σ)∂vξvk− 12+ τσ−1∑i=1i evenaσiΞuk,i if σ odd∂ fv(uU,k−1,σ)∂uξuk−1 + τ σ−1∑i=1i oddaσiΞuk,i if σ even• Computeξuk = quk + ξuk−1 + τs∑σ=1σ oddbσΞuk,σ.• For σ = 1, · · · , s, evaluate the internal stages for ξv:Ξvk+ 12,σ = Qvk+ 12,σ+∂ fv(uV,k,σ)∂uξuk + τ σ−1∑i=1i evenaσiΞvk+ 12,i if σ odd∂ fu(vV,k− 12,σ)∂vξvk− 12+ τσ−1∑i=1i oddaσiΞvk+ 12,i if σ even• Computeξvk+ 12= qvk+ 12+ ξvk− 12+ τs∑σ=1σ oddbσΞvk+ 12,σ.The solution procedure for even s is similar, see Section 2.3.2 and Appendix A.3.2 for895.3. Staggered Time-Stepping Methodsguidance.ExampleThe linearized forward time-stepping method corresponding to the 5-stage StagRK4method with parameter b5 and γ = (6b5)−1/2 (see Section 2.3.2) is:For k = K, · · · , 1:• For σ = 1, · · · , s, evaluate the internal stages for ξu:Ξuk,1 = Quk,1 +∂ fu(vk− 12)∂vξvk− 12Ξuk,2 = Quk,2 +∂ fv(uk−1 +τ4(2− γ)Uk,1)∂u(ξuk−1 +τ4(2− γ)Ξuk,1)Ξuk,3 = Quk,3 +∂ fu(vk− 12− τ2γUk,2)∂v(ξvk− 12− τ2γΞuk,2)Ξuk,4 = Quk,4 +∂ fv(uk−1 +τ4(2 + γ)Uk,1)∂u(ξuk−1 +τ4(2 + γ)Ξuk,1)Ξuk,5 = Quk,5 +∂ fu(vk− 12+τ2γUk,4)∂v(ξvk− 12+τ2γΞuk,4)• Update ξu:ξuk = quk + ξuk−1 + τ (1− 2b5)Ξuk,1 + τ b5Ξuk,3 + τ b5Ξuk,5.• For σ = 1, · · · , s, evaluate the internal stages for ξv:Ξvk+ 12,1 = Qvk+ 12,1+∂ fv (uk)∂uξukΞvk+ 12,2 = Qvk+ 12,2+∂ fu(vk− 12+τ4(2− γ)Vk+ 12,1)∂v(ξvk− 12+τ4(2− γ)Ξvk+ 12,1)Ξvk+ 12,3= Qvk+ 12,3+∂ fv(uk − τ2γVk+ 12,2)∂u(ξuk −τ2γΞvk+ 12,2)905.3. Staggered Time-Stepping MethodsΞvk+ 12,4 = Qvk+ 12,4+∂ fu(vk− 12+τ4(2 + γ)Vk+ 12,1)∂v(ξvk− 12+τ4(2 + γ)Ξvk+ 12,1)Ξvk+ 12,5 = Qvk+ 12,5+∂ fv(uk +τ2γVk+ 12,4)∂u(ξuk +τ2γΞvk+ 12,4)• Update ξv:ξvk+ 12= qvk+ 12+ ξvk− 12+ τ(1− 2b5)Ξvk+ 12,1 + τ b5Ξvk+ 12,3 + τ b5Ξvk+ 12,5.Letting b5 = 1/24 (and therefore γ = 2), the internal stages for ξu areΞuk,1 = Quk,1 +∂ fu(vk− 12)∂vξvk− 12Ξuk,2 = Quk,2 +∂ fv (uk−1)∂uξuk−1Ξuk,3 = Quk,3 +∂ fu(vk− 12− τUk,2)∂v(ξvk− 12− τΞuk,2)Ξuk,4 = Quk,4 +∂ fv (uk−1 + τUk,1)∂u(ξuk−1 + τΞuk,1)Ξuk,5 = Quk,5 +∂ fu(vk− 12+ τUk,4)∂v(ξvk− 12+ τΞuk,4)and the internal stages for ξv areΞvk+ 12,1 = Qvk+ 12,1+∂ fv (uk)∂uξukΞvk+ 12,2 = Qvk+ 12,2+∂ fu(vk− 12)∂vξvk− 12Ξvk+ 12,3 = Qvk+ 12,3+∂ fv(uk − τVk+ 12,2)∂u(ξuk − τΞvk+ 12,2)Ξvk+ 12,4 = Qvk+ 12,4+∂ fu(vk− 12+ τVk+ 12,1)∂v(ξvk− 12+ τΞvk+ 12,1)Ξvk+ 12,5= Qvk+ 12,5+∂ fv(uk + τVk+ 12,4)∂u(ξuk + τΞvk+ 12,4).915.4. Exponential Time-Differencing MethodsThe update steps are obvious.5.4 Exponential Time-Differencing MethodsThe linearized ETDRK method is somewhat more difficult to obtain if the linearoperator Lk happens to depend on yk, as is the case with exponential Rosenbrockmethods. Therefore we illustrate the linearized ETDRK method using an example.5.4.1 Exponential Runge-Kutta MethodThe linear system to be solved isA1−B⊤1 INC2 A2D2 −B⊤2 IN. . . . . . . . .Ck−1 Ak−1Dk−1 −B⊤k−1 INΞ1ξ1Ξ2...ξK−1Ξk−1ξk−1=Q1q1Q2...qK−1Qk−1qk−1, (5.6)with Ak, Ck and Dk defined in (A.229) and B⊤k = τk[b1(τkLk) · · · bs(τkLk)]. 
Sincethe system is block-lower triangular we use forward substitution to getξk = qk +B⊤kΞk −Dkξkfor k = 1, . . . , K and internal stagesAkΞk = Qk −Ckξk,with ξ0 = 0N×1.925.4. Exponential Time-Differencing MethodsSubstituting (A.229) then givesξk = qk +B⊤kΞk +∂(eτkLkyk)∂ykξk +∂(B⊤kYk)∂ykξkfor k = 1, · · · , K, with the internal stagesΞk = Qk +∂Nk∂ykξk + τk∂Nk∂yAskΞk.The detailed solution procedure is summarized in Algorithm 5.7.Algorithm 5.7. The Linearized Exponential Runge-Kutta MethodIgnore the terms in braces if Lk does not depend on yk.Set ξ0 = 0N×1. For k = 1, · · · , K:• For σ = 1, · · · , s, compute the internal stagesΞk,σ = Qk,i +∂nk,i∂y(ecστkLk−1ξk−1 + τki−1∑j=1aσi(τkLk−1)Ξk,j)++{∂nk,i∂yk−1ξk−1 +∂nk,i∂y(∂ (ecστkLk−1(yk−1)yfixedk−1 )∂yk−1ξk−1 ++ τki−1∑j=1∂(aσi(τkLk−1(yk−1))Yk,j)∂yk−1ξk−1)}. (5.7a)935.4. Exponential Time-Differencing Methods• Then compute the update:ξk = qk + eτkLk−1ξk−1 + τks∑σ=1bσ(τkLk−1)Ξk,i++{∂ (eτkLk−1(yk−1)yfixedk−1 )∂yk−1ξk−1++τks∑σ=1∂(bσ(τkLk−1(yk−1))Yk,i)∂yk−1ξk−1}.(5.7b)The derivatives in the braces in (5.7) can be evaluated by writing aσi and bσin terms of ϕ-functions and then differentiating the ϕ-functions as in Section 4.4.2.Without the terms in braces we simply have the standard ETDRK method where thenonlinear term has been linearized.5.4.2 Application to Krogstad’s SchemeWe show how the above algorithm would look when applied to the scheme proposedin [72]. Recall that the matrix coefficients aσi and bσ are given in (2.63) in terms ofthe ϕ-functions. The linearized Krogstad scheme is presented in Algorithm 5.8.Algorithm 5.8. The Linearized Krogstad SchemeSet ξ0 = 0N×1. Let ϕℓ = ϕℓ(τkLk−1(yk−1)) and ϕℓ,i = ϕℓ,i(τkLk−1(yk−1)), andignore the terms in braces if Lk−1 does not depend on yk. For k = 1, · · · , K:945.4. Exponential Time-Differencing Methods• Compute the internal stagesΞk,1 = Qk,1 +∂nk,1∂yξk−1 +{∂nk,1∂yk−1ξk−1}Ξk,2 = Qk,2 +∂nk,2∂y(ϕ0,2ξk−1 +τk2ϕ1,2Ξk,1)++{∂nk,2∂y(∂ (ϕ0,2yfixedk−1 )∂yk−1+τk2∂(ϕ1,2Yk,1)∂yk−1)ξk−1 +∂nk,2∂yk−1ξk−1}Ξk,3 = Qk,3 +∂nk,3∂y(ϕ0,3ξk−1 +τk2ϕ1,3Ξk,1 + τkϕ2,3 (Ξk,2 −Ξk,1))++{∂nk,3∂yk−1ξk−1 +∂nk,3∂y(∂(ϕ0,3yfixedk−1 )∂yk−1+τk2∂(ϕ1,3Yk,1)∂yk−1)ξk−1}+{τk∂nk,3∂y(∂(ϕ2,3(Yk,2 −Yk,1))∂yk−1)ξk−1}Ξk,4 = Qk,4 +∂nk,4∂y(ϕ0ξk−1 + τkϕ1,4Ξk,1 + 2τkϕ2,4 (Ξk,3 −Ξk,1))++{∂nk,4∂yk−1ξk−1 +∂nk,4∂y(∂ (ϕ0yfixedk−1 )∂yk−1+ τk∂(ϕ1,4Yk,1)∂yk−1)ξk−1}++{2τk∂nk,4∂y(∂(ϕ2,4(Yk,3 −Yk,1))∂yk−1)ξk−1}• Compute the update:ξk = qk + ϕ0ξk−1 + τkϕ2(−3Ξk,1 + 2Ξk,2 + 2Ξk,3 −Ξk,4)++ τkϕ1Ξk,1 + 4τkϕ3(Ξk,1 − Ξk,2 − Ξk,3 +Ξk,4)++{∂ (ϕ0yfixedk−1 )∂yk−1ξk−1 + τk∂(ϕ1Yk,1)∂yk−1ξk−1++ τk∂(ϕ2(−3Yk,1 + 2Yk,2 + 2Yk,3 −Yk,4))∂yk−1ξk−1++ 4τk∂(ϕ3(Yk,1 −Yk,2 −Yk,3 +Yk,4))∂yk−1ξk−1}.The way the scheme is written here may of course not represent the most compu-tationally efficient implementation. For instance, there are some opportunities forparallelization and one should also exploit the fact that c2 = c3 = 1/2.95Chapter 6The Adjoint ProblemThe major ingredient needed when calculating the action of the transpose of thesensitivity matrix is the solution of the adjoint problem,λ =(∂ t∂y)−⊤θ, (6.1)where λ is the adjoint solution and θ is the adjoint source. In this chapter the adjointsource is taken to be some arbitrary adjoint source, but for the gradient computationwe will have θ =∂d∂y⊤∇dM. 
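Structurally, solving (6.1) is a sweep backward in time that mirrors the forward sweep of the previous chapter. As a minimal illustration, the MATLAB sketch below shows this sweep for the explicit (forward) Euler method, i.e. the s = 1 case of the adjoint Adams-Bashforth family derived in Section 6.1.1; JfT is an assumed helper applying (∂f(y_k)/∂y)^⊤ to a vector, and y(:,k) holds the stored forward solution. This is only a structural sketch; the per-step formulas for each family of methods are derived in the remainder of this chapter.

```matlab
% Sketch of the adjoint sweep for the forward Euler method (s = 1 in Section 6.1.1),
% run backward in time from a zero final condition.
lambda_next = zeros(N,1);                 % lambda_{K+1} = 0
lambda      = zeros(N,K);
for k = K:-1:1
    lambda(:,k) = theta(:,k) + lambda_next + tau*JfT(y(:,k), lambda_next);
    lambda_next = lambda(:,k);
end
```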
The adjoint solution is a crucial component in the computation of the gradient and the action of the Hessian.

The following remarks are in order:

• The structure of the linearization of the time-stepping vector is lower block triangular (or essentially lower block triangular in the case of staggered methods), so the adjoint problem will require the solution of an upper block triangular system. This is done using backsubstitution, which corresponds to solving the adjoint problem backward in time, starting from a final condition.

• The structure of (6.1) dictates that the final value of the adjoint solution must be λ_{K+1} = 0_{N×1}. The value of λ_K is that of the adjoint source at the final time-step, so in the literature this is generally considered to be the final condition.

• As was the case when computing the linearized forward solution, the linearization of the nonlinear operators about the current values of the forward solution means that we must have the forward solution available, either in storage or computed on the fly using checkpointing.

• We do not have a source weighting operator in (6.1), so for LM-type methods we do not need to combine source values from different time levels.

• We do not assume any structure on q, so it is possible that RK-type methods will have a source term appearing in the update step as well as in the internal stages.

We will now show the details of the adjoint time-stepping procedure to solve (6.1) for each of the time-stepping methods under consideration. For each of the time-stepping methods we will also give examples of particular schemes, based on the examples given in Chapter 2.

6.1 Regular Time-Stepping Methods

In this section we present the adjoint LM and RK methods, and give examples of particular adjoint schemes.

As discussed in the introduction to the thesis, the results presented in this section are not new. Adjoint LM methods were discussed in [101, 102], and in fact variable time-steps were considered in those papers; we only consider constant time-steps for reasons related to the simplicity and consistency of the resulting adjoint methods. Adjoint RK methods are discussed in several references, for instance [3, 52, 100, 131] and several others.

Nonetheless, the adjoint LM and RK methods are included here for several reasons:

• The approach used to derive the adjoint methods is different from the one used in the references above. The use of an abstract representation of a time-stepping method using a time-stepping vector t allows for a general approach to finding adjoint time-stepping methods; a large number of different time-stepping methods can be tackled in this way.

• For the sake of completeness and readability it is useful to have the adjoint LM and RK methods in the same chapter as the other adjoint time-stepping methods.

• Connected to the above point, having the adjoint LM and RK methods available will help the reader better understand the derivation of the more complicated adjoint IMEX, staggered and exponential time-stepping methods.

6.1.1 Linear Multistep Methods

Let λ = [λ_1^⊤ · · · λ_K^⊤]^⊤ represent the adjoint solution and θ = [θ_1^⊤ · · · θ_K^⊤]^⊤ the adjoint source.

The transpose of the K × K block matrix ∂t/∂y is an upper block triangular matrix that has the form of the transposes of the template matrices (2.1) and (2.3). 
Its(k, j)th N ×N block is (see (A.3))∂ tj∂yk⊤=α(σ)j−kIN − τ β(σ)j−k∂ fk∂y⊤if k ≤ j ≤ k⋆0N×N otherwise,(6.2)where σ = min (j, s) and k⋆ = min (K, k + s).To solve (6.1) we must solve, for 1 ≤ k ≤ K,∂ tk∂yk⊤λk = θk −k⋆∑j=k+1∂ tj∂yk⊤λj .The solution procedure is summarized in Algorithm 6.1.Algorithm 6.1. The Adjoint Linear Multistep MethodLet σ = min (k, s) and k⋆ = max (1, k − s).For k = K, · · · , 1, solve for λk:(IN − τ β(σ)0∂ fk∂y⊤)λk = θk −k⋆∑j=k+1α(σ)j−kλj + τ∂ fk∂y⊤(k⋆∑j=k+1β(σ)j−kλj)986.1. Regular Time-Stepping Methodswith∂ fk∂y=∂ f (yk)∂y=∂ f (yk, tk)∂y.For explicit methods the left-hand side is simply λk, whereas for implicit methods oneneeds to solve a N×N linear system at each time-step. Note that the right-hand siderequires just one matrix-vector product, in contrast to the linearized forward linearmultistep method, where s matrix-vector products are needed on the right-hand sideat each time level.ExamplesWe find the adjoint time-stepping methods corresponding to the linear multistepmethods mentioned in Section 2.1.1. In all cases we have λk = 0N×1 for k > K.The adjoint s-step Adams-Bashforth methods, with s = 1, 2, 3 and for k =K, · · · ,max 1, s− 1, ares = 1 : λk = θk + λk+1 + τ∂ f (yk)∂y⊤λk+1s = 2 : λk = θk + λk+1 +τ2∂ f (yk)∂y⊤(3λk+1 − λk+2)s = 3 : λk = θk + λk+1 +τ12∂ f (yk)∂y⊤(23λk+1 − 16λk+2 + 5λk+3) .For the adjoint s-step BDFs, with s = 1, 2, 3, we solve, for k = K, · · · ,max1, s− 1,s = 1 :(IN − τ ∂ f (yk)∂y)λk = θk + λk+1s = 2 :(IN − τ 23∂ f (yk)∂y)λk = θk +43λk+1 − 13λk+2s = 3 :(IN − τ 611∂ f (yk)∂y)λk = θk +1811λk+1 − 911λk+2 +211λk+3.Note that the BDFs are essentially the same as the linearized forward BDFs exceptthat each ξk−i is replaced by λk+i, i.e. we compute the linearized forward solution,just backward in time.996.1. Regular Time-Stepping Methods6.1.2 Runge-Kutta MethodsWe let λ =[λ̂⊤1 · · · λ̂⊤K]⊤represent the adjoint solution, with λ̂k =[Λ⊤k λ⊤k]⊤,where Λ⊤k =[Λ⊤k,1 · · · Λ⊤k,s]⊤represents the internal stages of the adjoint solutionat the kth time step. The adjoint source θ is defined similarly, with internal stagesΘk,σ.The linearization of the time-stepping equation, ∂ t∂ y, was derived in Section A.1.2.Its transpose is (see (A.51))∂ t∂y⊤=A⊤1 BIN C⊤2 −INA⊤2 B. . . . . . . . .IN C⊤K −INA⊤K BIN.where (see (A.47))Ak = IsN − τk ∂Fk∂y(As ⊗ IN) and Ck = −∂Fk∂y(1s ⊗ IN)and B = −τkbs ⊗ IN . ∂Fk∂ywas defined to be∂Fk∂y= blkdiag(∂Fk,1∂y, · · · , ∂Fk,s∂y)with Fk,σ = f(yk,σ, tk−1 + cστk)and yk,σ= yk−1 + τk∑si=1 aσ,iYk,i.Since∂ t∂y⊤is upper (block) triangular, we solve the adjoint system (6.1) by back-1006.1. Regular Time-Stepping Methodssubstitution:λk = θk + λk+1 −C⊤k+1Λk+1= θk + λk+1 +(1⊤s ⊗ IN) ∂Fk+1∂y⊤Λk+1Λk = A−⊤k (Θk −Bλk)=(IsN − τk ∂Fk∂y(As ⊗ IN))−⊤(Θk −Bλk) .(6.3)The detailed solution procedure is summarized in Algorithm 6.2.Algorithm 6.2. The Adjoint Runge-Kutta MethodLet λK+1 = 0N×1 and ΛK+1 = 0sN×1, and recall that we definedyk,σ= yk−1 + τkσ∑i=1aσiYk,i.Denote Λ̂k,σ =∂ f(yk,σ)∂y⊤Λk,σ. For k = K, · · · , 1:• Update λ:λk = θk + λk+1 +s∑σ=1Λ̂k+1,σ• Compute, or solve for, the (modified) internal stages for λk−1. For σ = s, · · · , 1:Λk,σ = Θk,σ + τkbσλk + τks∑i=1aiσΛ̂k,iΛ̂k,σ =∂ f(yk,σ)∂y⊤Λk,σ.The adjoint method is explicit, diagonally implicit or fully implicit in accordance withthe corresponding forward method. For explicit methods the summation ranges fromσ + 1 to s.1016.1. 
Regular Time-Stepping MethodsExampleThe adjoint RK4 method is, for k = K, · · · , 1 and with λK+1 = 0N×1 and ΛK+1 =0sN×1:• Update λ:λk = θk + λk+1 + Λ̂k+1,1 + Λ̂k+1,2 + Λ̂k+1,3 + Λ̂k+1,4;• Compute the (modified) internal stages for λk−1. For σ = s, · · · , 1:Λ̂k,4 =∂ f (yk−1 + τkYk,3)∂y⊤ (Θk,4 +τk6λk)Λ̂k,3 =∂ f(yk−1 +τk2Yk,2)∂y⊤ (Θk,3 +τk3λk + τk Λ̂k,4)Λ̂k,2 =∂ f(yk−1 +τk2Yk,1)∂y⊤ (Θk,2 +τk3λk +τk2Λ̂k,3)Λ̂k,1 =∂ f (yk−1)∂y⊤ (Θk,1 +τk6λk +τk2Λ̂k,2)Note that we do not actually require the 4th internal stage Yk,4 of yk in the compu-tation due to the fact that the RK4 method is explicit.The adjoint time-stepping method corresponding to the fourth-order 2-stage implicitRK method mentioned in Section 2.1.2 is, for k = K, · · · , 1 and with λK+1 = 0N×1and ΛK+1 = 0sN×1:• Update λ:λk = θk + λk+1 + Λ̂k+1,1 + Λ̂k+1,2;• Let G1 =∂ f(yk,1)∂yand G2 =∂ f(yk,2)∂y, withyk,1= yk−1 +τk4Yk,1 +τk24(6− 4√3)Yk,21026.2. Implicit-Explicit Time-Stepping Methodsyk,2= yk−1 +τk24(6 + 4√3)Yk,1 +τk4Yk,2.Solve for the internal stages of λk−1: IN − τk4 G⊤1 − τk24(6 + 4√3)G⊤2− τk24(6− 4√3)G⊤1 IN −τk4G⊤2Λk,1Λk,2 =Θk,1Θk,2+ τk2λkλk ,then let Λ̂k,1 = G⊤1 Λk,1 and Λ̂k,2 = G⊤2 Λk,2.6.2 Implicit-Explicit Time-Stepping MethodsThis section presents the adjoint IMEX LM and IMEX RK time-stepping methods.6.2.1 IMEX Linear Multistep MethodsLet λ =[λ⊤1 · · · λ⊤K]⊤represent the adjoint solution and θ =[θ⊤1 · · · θ⊤K]⊤theadjoint source.The transpose of the K×K block matrix ∂ t∂yis an upper block triangular matrixthat has the form of the transposes of the template matrices (2.1) and (2.3). Its(k, j)th N ×N block is (see (A.83))∂ tj∂yk=α(σ)j−kIN − τ β(σ)j−k∂ fEk∂y− τ γ(σ)j−k∂ fIk∂yif k ≤ j ≤ k⋆0N×N otherwise,(6.4)with β(σ)0 = 0, σ = min (j, s) and k⋆ = min (K, k + s).To solve (6.1) we must solve, for 1 ≤ k ≤ K,∂ tk∂yk⊤λk = θk −k⋆∑j=k+1∂ tj∂yk⊤λj .1036.2. Implicit-Explicit Time-Stepping MethodsThe solution procedure is summarized in Algorithm 6.3.Algorithm 6.3. The Adjoint Implicit-Explicit Linear Multistep MethodLet σ = min (k, s) and k⋆ = max (1, k − s).For k = K, · · · , 1, solve for λk:(IN − τ γ(σ)0∂ fIk∂y⊤)λk = θk −k⋆∑j=k+1α(σ)j−kλj + τ∂ fEk∂y⊤( k⋆∑j=k+1β(σ)j−kλj)++ τ∂ fIk∂y⊤( k⋆∑j=k+1γ(σ)j−kλj)with∂ fI/Ek∂y=∂ fI/E (yk)∂y=∂ fI/E (yk, tk)∂y.The right-hand side requires just two matrix-vector products, in contrast to the lin-earized forward IMEX linear multistep method, where 2s matrix-vector products areneeded on the right-hand side at each time level.ExamplesWe find the adjoint time-stepping methods corresponding to the IMEX linear multi-step methods mentioned in Section 2.2.1. In all cases we have λk = 0N×1 for k > K.For the adjoint IMEX BDF2 method one needs to solve, for k = 2, · · · , K,(IN − τ 23∂ fI (yk)∂y)λk = θk +(43λk+1 − 13λk+2)++ τ∂ fE (yk)∂y⊤(43λk+1 − 23λk+2).For the adjoint IMEX BDF3 method one needs to solve, for k = 3, · · · , K,(IN − τ 23∂ fI (yk)∂y)λk = θk +(1811λk+1 − 911λk+2 +211λk+3)+1046.2. Implicit-Explicit Time-Stepping Methods+ τ∂ fE (yk)∂y⊤(1811λk+1 − 1811λk+2 +611λk+3).Finally, for the adjoint IMEX TVB3 method one needs to solve, for k = 3, · · · , K,(IN − τ 10892048∂ fI (yk)∂y)λk = θk +(39092048λk+1 − 13671024λk+2 +8732048λk+3)++∂ fE (yk)∂y⊤(1846312288λk+1 − 1271768λk+2 +823312288λk+3)++∂ fI (yk)∂y⊤(− 113912288λk+1 − 3676144λk+2 +169912288λk+3).6.2.2 IMEX Runge-Kutta MethodsWe let λ =[λ̂⊤1 · · · λ̂⊤K]⊤represent the adjoint solution, with λ̂k =[Λ⊤k λ⊤k]⊤,whereΛk =[ΛEk,1⊤ΛIk,1⊤ΛEk,2⊤ · · · ΛIk,s−1⊤ΛEk,s⊤]⊤represents the 2s− 1 internal stages of the adjoint solution at the kth time step. 
Theadjoint source θ is defined similarly, with internal stages Θk,i.The linearization of the time-stepping equation,∂ t∂y, was derived in Section A.2.2.Its transpose is (see (A.131))∂ t∂y⊤=A⊤1 BIN C⊤2 −INA⊤2 B. . . . . . . . .IN C⊤K −INA⊤K BIN.1056.2. Implicit-Explicit Time-Stepping Methodswhere (see (A.127))Ak = I(2s−1)N − τk ∂ fk∂y(As ⊗ IN) and Ck = −∂ fk∂y(12s−1 ⊗ IN)andB = −τk[bE1 bI1 bE2 · · · bEs−1 bIs−1 bEs]⊤⊗ IN .See (A.125) for the definition of As.∂ fk∂ywas defined to be∂ fk∂y= blkdiag(∂ fEk,1∂y,∂ fIk,2∂y,∂ fEk,2∂y,∂ fIk,3∂y, · · · , ∂ fEk,s∂y),with fEk,σ = fE(yk,σ−1, tk−1 + cστk), fIk,σ = fI(yk,σ−1, tk−1 + cστk)and yk,σ= yk−1 +τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i.Since∂ t∂y⊤is upper (block) triangular, we solve the adjoint system (6.1) by back-substitution:λk = θk + λk+1 −C⊤k+1Λk+1= θk + λk+1 +(1⊤2s−1 ⊗ IN) ∂Fk+1∂y⊤Λk+1Λk = A−⊤k (Θk −Bλk)=(I(2s−1)N − τk ∂ fk∂y(As ⊗ IN))−⊤(Θk −Bλk) .(6.5)The detailed solution procedure is summarized in Algorithm 6.4.Algorithm 6.4. The Adjoint Implicit-Explicit Runge-Kutta MethodLet λK+1 = 0N×1 and ΛK+1 = 0(2s−1)N×1, and recall that we definedyk,σ= yk−1 + τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i.1066.2. Implicit-Explicit Time-Stepping MethodsDenote Λ̂Ek,σ =∂ fE(yk,σ−1)∂y⊤ΛEk,σ and Λ̂Ik,σ =∂ fI(yk,σ)∂y⊤ΛIk,σ.For k = K, · · · , 1:• Update λ:λk = θk + λk+1 +s∑σ=1Λ̂Ek+1,σ +s−1∑σ=1Λ̂Ik+1,σ• Compute the internal stages for λk. For i = 2s− 1, · · · , 1,◦ if i odd, let σ = ⌈ i2⌉ and computeΛEk,σ = ΘEk,σ + τk bEσλk + τks∑j=σ+1aEjσΛ̂Ik,j−1 + τks∑j=σ+1aEjσΛ̂Ek,jΛ̂Ek,σ =∂ fE(yk,σ−1)∂y⊤ΛEk,σ;◦ if i even, let σ = i2and solveIN − τk aIσσ ∂ fI(yk,σ)∂y⊤ΛIk,σ = ΘIk,σ + τk bIσλk++ τks−1∑j=σ+1aIjσΛ̂Ik,j + τks−1∑j=σaIjσΛ̂Ek,j+1Λ̂Ik,σ =∂ fI(yk,σ)∂y⊤ΛIk,σ.ExampleThe adjoint ARS3 method is, with γ = 3+√36,yk,1= yk−1 + τk γYEk,1 + τk γYIk,1yk,2= yk−1 + τk((γ − 1)YEk,1 + 2(1− γ)YEk,2 + (1− 2γ)YIk,1 + γYIk,2),1076.2. Implicit-Explicit Time-Stepping Methodsand for k = K, · · · , 1:• Update λ:λk = θk + λk+1 + Λ̂Ek+1,1 + Λ̂Ek+1,2 + Λ̂Ek+1,3 + Λ̂Ik+1,1 + Λ̂Ik+1,2.• Compute the (modified) internal stages for λ:◦ Compute Λ̂Ek,3:Λ̂Ek,3 =∂ fE(yk,2)∂y⊤ (ΘEk,3 +τk2λk);◦ Solve for Λ̂Ik,2:IN − τk γ ∂ fI(yk,2)∂y⊤ΛIk,2 = ΘIk,2 + τk2 λk + τk γΛ̂Ek,3Λ̂Ik,2 =∂ fI(yk,2)∂y⊤ΛIk,2;◦ Compute Λ̂Ek,2:Λ̂Ek,2 =∂ fE(yk,1)∂y⊤ (ΘEk,2 +τk2λk + 2τk (1− γ)(Λ̂Ik,2 + Λ̂Ek,3));◦ Solve for Λ̂Ik,1:IN − τk γ ∂ fI(yk,1)∂y⊤ΛIk,1 = ΘIk,1 + τk2 λk + τkγΛ̂Ek,2++ τk(1− 2γ)(Λ̂Ik,2 + Λ̂Ek,3)Λ̂Ik,1 =∂ fI(yk,1)∂y⊤ΛIk,1;1086.3. Staggered Time-Stepping Methods◦ Compute Λ̂Ek,1:Λ̂Ek,1 =∂ fE (yk−1)∂y⊤ (ΘEk,1 + τk γ(Λ̂Ik,1 + Λ̂Ek,2)+ τk(γ − 1)(Λ̂Ik,2 + Λ̂Ek,3)).6.3 Staggered Time-Stepping MethodsIn this section we discuss the adjoint StagLM and StagRK time-stepping methods.6.3.1 Staggered Linear Multistep MethodsLet λ =[λ⊤1 · · · λ⊤K]⊤represent the adjoint solution and θ =[θ⊤1 · · · θ⊤K]⊤theadjoint source. We have λk =[λuk⊤λvk+ 12⊤]⊤and likewise for θk.The transpose of the K×K block matrix ∂ t∂yis an upper block triangular matrixthat has the form of the transposes of the template matrices (2.1) and (2.3). Its(k, j)th N ×N block is (see (A.158))∂ tj∂yk=α(σ)j−kIN − τ β(σ)j−k∂ fk∂yk⊤− τ β(σ)j−k−1∂ fk+1∂yk⊤if k ≤ j ≤ k⋆0N×N otherwise,(6.6)with β(σ)ℓ = 0 for ℓ < 0, σ = min (j, s), k⋆ = min (K, k + s), and∂ fk∂ykand∂ fk+1∂ykdefined as in (A.157).To solve (6.1) we must solve, for 1 ≤ k ≤ K,∂ tk∂yk⊤λk = θk −k⋆∑j=k+1∂ tj∂yk⊤λj .The solution procedure is summarized in Algorithm 6.5.1096.3. Staggered Time-Stepping MethodsAlgorithm 6.5. 
The Adjoint Staggered Linear Multistep MethodLet σ = min (j, s) and k⋆ = min (K, k + s).For k = K, · · · , 1:• Update λv:λvk+ 12= θvk+ 12−k⋆∑j=k+1α(σ)j−kλvj+ 12+ τ∂ fu(vk+ 12, tk+ 12)∂v⊤(k⋆∑j=k+1β(σ)j−k−1λuj).• Update λu:λuk = θuk −k⋆∑j=k+1α(σ)j−kλuj + τ∂ fv (uk, tk)∂u⊤( k⋆∑j=kβ(σ)j−kλvj+ 12)The right-hand side requires just two matrix-vector products, in contrast to the lin-earized forward staggered linear multistep method, where 2s matrix-vector productsare needed on the right-hand side at each time level.ExamplesWe find the adjoint time-stepping methods of the staggered linear multistep methodsmentioned in Section 2.3.1. In all cases we have λvk+ 12= 0N×1 and λuk = 0 for k > K.The adjoint leapfrog method for k = K, · · · , 1 isλvk+ 12= θvk+ 12+ λvk+ 32+ τ∂ fu(vk+ 12, tk+ 12)∂v⊤λuk+1λuk = θuk + λuk+1 + τ∂ fv (uk, tk)∂u⊤λvk+ 12.1106.3. Staggered Time-Stepping MethodsThe adjoint StagBDF3 method for k = K, · · · , 3 isλvk+ 12= θvk+ 12+2123λvk+ 32+323λvk+ 52− 123λvk+ 72+ τ2423∂ fu(vk+ 12, tk+ 12)∂v⊤λuk+1λuk = θuk +2123λuk+1 +323λuk+2 −123λuk+3 + τ2423∂ fv (uk, tk)∂u⊤λvk+ 12.The adjoint StagBDF4 method for k = K, · · · , 4 isλvk+ 12= θvk+ 12+1722λvk+ 32+922λvk+ 52− 522λvk+ 72+122λvk+ 92++ τ1211∂ fu(vk+ 12, tk+ 12)∂v⊤λuk+1λuk = θuk +1722λuk+1 +922λuk+2 −522λuk+3 +122λuk+4 + τ1211∂ fv (uk, tk)∂u⊤λvk+ 12.The adjoint StagAB3 scheme for k = K, · · · , 3 isλvk+ 12= θvk+ 12+ λvk+ 32+ τ∂ fu(vk+ 12, tk+ 12)∂v⊤(2524λuk+1 −112λuk+2 +124λuk+3)λuk = θuk + λuk+1 + τ∂ fv (uk, tk)∂u⊤(2524λvk+ 12− 112λvk+ 32+124λvk+ 52).Finally, the adjoint StagAB4 scheme for k = K, · · · , 4 isλvk+ 12= θvk+ 12+ λvk+ 32+τ24∂ fu(vk+ 12, tk+ 12)∂v⊤ (26λuk+1 − 5λuk+2 + 4λuk+3 − λuk+4)λuk = θuk + λuk+1 +τ24∂ fv (uk, tk)∂u⊤ (26λvk+ 12− 5λvk+ 32+ 4λvk+ 52− λvk+ 72).6.3.2 Staggered Runge-Kutta MethodsWe let λ =[λ̂⊤1 · · · λ̂⊤K]⊤represent the adjoint solution, with λ̂k =[Λ⊤k λ⊤k]⊤,whereλk =[λuk⊤λvk+ 12⊤]⊤and Λk,σ =[Λuk,σ⊤Λvk+ 12,σ⊤]⊤,1116.3. Staggered Time-Stepping Methodsthe latter representing the s internal stages of the adjoint solution at the kth timestep. The adjoint source θ is defined similarly, with internal stages Θk,σ.The linearization of the time-stepping equation,∂ t∂y, was derived in Section A.3.2.Its transpose is (see (A.200))∂ t∂y⊤=A⊤1 BC⊤1,1 IN C⊤2,1 −INA⊤2 B. . . . . . . . . . . .C⊤K−1,K−1 IN C⊤K,K−1 −INA⊤K BC⊤K,K IN.Ak and Ck,j were defined in (A.195) to beAk = IsN − τGo/ek (As ⊗ IN)Ck,j = −∂Fo/ek∂yjB = Bo/e.See Section A.3.2 for the definitions of Go/ek and∂Fo/ek∂yj, which contain the derivativesof fu and fv with respect to u and v. Important for this is the shorthandvo/eU,k− 12,σ= vk− 12+ τσ−1∑i=1i evenaσiUo/ek,i , vo/eV,k− 12,σ= vk− 12+ τσ−1∑i=1i oddaσiVo/ek+ 12,iuo/eU,k−1,σ = uk−1 + τσ−1∑i=1i oddaσiUo/ek,i , uo/eV,k,σ = uk + τσ−1∑i=1i evenaσiVo/ek+ 12,i,where o/e denotes whether the number of stages s is odd or even.1126.3. Staggered Time-Stepping MethodsWhile not apparent from the form above,∂ t∂y⊤is essentially upper (block) trian-gular (we can swap rows and columns to make it upper block diagonal), and thereforewe solve the adjoint system (6.1) by backsubstitution:λk = θk + λk+1 −C⊤k,kΛk −C⊤k+1,kΛk+1= θk + λk+1 +∂Fo/ek∂yk⊤Λk +∂Fo/ek+1∂yk⊤Λk+1Λk = A−⊤k (Θk −Bλk)=(IsN − τGo/ek (As ⊗ IN))−⊤(Θk −Bλk) .(6.7)The detailed solution procedure for s odd is summarized in Algorithm 6.6.Algorithm 6.6. The Adjoint Staggered Runge-Kutta Method, for s oddLet λK+1 = 0N×1 and ΛK+1 = 0sN×1. 
Recall that (see (2.44))voU,k− 12,σ= vk− 12+ τσ−1∑i=1i evenaσiUok,i, voV,k− 12,σ= vk− 12+ τσ−1∑i=1i oddaσiVok+ 12,iuoU,k−1,σ = uk−1 + τσ−1∑i=1i oddaσiUok,i, uoV,k,σ = uk + τσ−1∑i=1i evenaσiVok+ 12,i.For k = K, · · · , 1:• Computeλvk+ 12= qvk+ 12+ λvk+ 32+s∑σ=1σ oddΛ̂uk+1,σ +s∑σ=1σ evenΛ̂vk+ 32,σ1136.3. Staggered Time-Stepping Methods• Compute the (modified) internal stages for λv, for σ = s, · · · , 1:Λ̂vk+ 12,σ = Θvk+ 12,σ+ τ∂ fv(uoV,k,σ)∂u⊤ s∑i=σ+1i evenaiσ Λ̂vk+ 12,i + bσλvk+ 12 if σ odd∂ fu(voV,k− 12,σ)∂v⊤ s∑i=σ+1i oddaiσ Λ̂vk+ 12,i if σ even• Computeλuk = quk + λuk+1 +s∑σ=1σ oddΛ̂vk+ 12,σ +s∑σ=1σ evenΛ̂uk+1,σ• Compute the internal stages for λuk , for σ = s, · · · , 1:Λ̂uk,σ = Θuk,σ + τ∂ fu(voU,k− 12,σ)∂v⊤ s∑i=σ+1i evenaiσ Λ̂uk,i + bσλuk if σ odd∂ fv(uoU,k−1,σ)∂u⊤ s∑i=σ+1i oddaiσ Λ̂uk,i if σ evenThe solution procedure for even s is similar, see Section 2.3.2 and Appendix A.3.2 forguidance.ExampleThe adjoint time-stepping method corresponding to the 5-stage StagRK4 methodwith parameter b5 and γ = (6b5)−1/2 (see Section 2.3.2) is:For k = K, · · · , 1:• Compute the update for λv:λvk+ 12= qvk+ 12+ λvk+ 32+ Λ̂uk+1,1 + Λ̂vk+ 32,2 + Λ̂uk+1,3 + Λ̂vk+ 32,4 + Λ̂uk+1,51146.3. Staggered Time-Stepping Methods• Compute the (modified) internal stages for λv:Λ̂vk+ 12,5 =∂ fv(uk +τ2γVok+ 12,4)∂u⊤ (Θvk+ 12,5 + τ b5λvk+ 12)Λ̂vk+ 12,4 =∂ fu(vk− 12+τ4(2 + γ)Vok+ 12,1)∂v⊤ (Θvk+ 12,4 +τ2γ Λ̂vk+ 12,5)Λ̂vk+ 12,3 =∂ fv(uk − τ2γVok+ 12,2)∂u⊤ (Θvk+ 12,3 + τ b5λvk+ 12)Λ̂vk+ 12,2 =∂ fu(vk− 12+τ4(2− γ)Vok+ 12,1)∂v⊤ (Θvk+ 12,2 −τ2γ Λ̂vk+ 12,3)Λ̂vk+ 12,1 =∂ fv (uk)∂u⊤ (Θvk+ 12,1 +τ4(2− γ)Λ̂vk+ 12,2 +τ4(2 + γ)Λ̂vk+ 12,4++τ (1− 2b5)λvk+ 12).• Compute the update for λu:λuk = quk + λuk+1 + Λ̂vk+ 12,1 + Λ̂uk+1,2 + Λ̂vk+ 12,3 + Λ̂uk+1,4 + Λ̂vk+ 12,5.• Compute the (modified) internal stages for λu:Λ̂uk,5 =∂ fu(vk− 12+τ2γUok,4)∂v⊤ (Θuk,5 + τ b5λuk)Λ̂uk,4 =∂ fv(uk−1 +τ4(2 + γ)Uok,1)∂u⊤ (Θuk,4 +τ2γ Λ̂uk,5)Λ̂uk,3 =∂ fu(vk− 12− τ2γUok,2)∂v⊤ (Θuk,3 + τ b5λuk)Λ̂uk,2 =∂ fv(uk−1 +τ4(2− γ)Uok,1)∂u⊤ (Θuk,2 −τ2γ Λ̂uk,3)Λ̂uk,1 =∂ fu(vk− 12)∂v⊤ (Θuk,1 +τ4(2− γ)Λ̂uk,2 +τ4(2 + γ)Λ̂uk,4 + τ (1− 2b5)λuk).1156.4. Exponential Time-Differencing MethodsLetting b5 = 1/24 (and therefore γ = 2), the modified internal stages for λv areΛ̂vk+ 12,5 =∂ fv(uk + τVok+ 12,4)∂u⊤ (Θvk+ 12,5 +τ24λvk+ 12)Λ̂vk+ 12,4 =∂ fu(vk− 12+ τVok+ 12,1)∂v⊤ (Θvk+ 12,4 + τ Λ̂vk+ 12,5)Λ̂vk+ 12,3 =∂ fv(uk − τVok+ 12,2)∂u⊤ (Θvk+ 12,3+τ24λvk+ 12)Λ̂vk+ 12,2 =∂ fu(vk− 12)∂v⊤ (Θvk+ 12,2 − τ Λ̂vk+ 12,3)Λ̂vk+ 12,1 =∂ fv (uk)∂u⊤(Θvk+ 12,1+ τ Λ̂vk+ 12,4 +11τ12λvk+ 12).and the modified internal stages for λu areΛ̂uk,5 =∂ fu(vk− 12+ τUok,4)∂v⊤ (Θuk,5 +τ24λuk)Λ̂uk,4 =∂ fv(uk−1 + τUok,1)∂u⊤ (Θuk,4 + τ Λ̂uk,5)Λ̂uk,3 =∂ fu(vk− 12− τUok,2)∂v⊤ (Θuk,3 +τ24λuk)Λ̂uk,2 =∂ fv (uk−1)∂u⊤ (Θuk,2 − τ Λ̂uk,3)Λ̂uk,1 =∂ fu(vk− 12)∂v⊤(Θuk,1 + τ Λ̂uk,4 +11τ12λuk).6.4 Exponential Time-Differencing MethodsLastly, we discuss the adjoint ETDRK method, and apply it to Krogstad’s scheme asan illustration.1166.4. Exponential Time-Differencing Methods6.4.1 Exponential Runge-Kutta MethodsWe let λ =[λ̂⊤1 · · · λ̂⊤K]⊤represent the adjoint solution, with λ̂k =[Λ⊤k λ⊤k]⊤.The adjoint source θ is defined similarly, with internal stages Θk,σ.The linearization of the time-stepping equation,∂ t∂y, was derived in Section A.4.1.Its transpose is (see (A.234))∂ t∂y⊤=A⊤1 −B1IN C⊤2 D⊤2A⊤2 −B2. . . . . . . . 
.IN C⊤K D⊤KA⊤K −BKIN.Ak, Ck and Dk were defined in (A.229):Ak = IsN − τk ∂Nk∂yAsk−1 Ck = −∂Nk∂yk−1,Dk = −∂(eτkLk−1yk−1)∂yk−1− ∂(B⊤k yk)∂yk−1.One step of the solution procedure isλk = θk −C⊤k+1Λk+1 −D⊤k+1λk+1 (6.8a)followed by the internal stagesA⊤kΛk = Θk +Bkλk, (6.8b)where λK+1 = 0N×1 and ΛK+1 = 0sN×1.1176.4. Exponential Time-Differencing MethodsUsing the general formulas for Ak, Ck and Dk givesλk = θk +(∂Nk+1∂y)⊤Λk+1 +(∂ (eτkLkyk)∂yk⊤+∂ (B⊤k+1Yk+1)∂yk⊤)λk+1 (6.9a)with internal stagesΛk = Θk + τk−1(Ask−1)⊤(∂Nk∂y)⊤Λk +Bkλk. (6.9b)The detailed solution procedure is summarized in Algorithm 6.7.Algorithm 6.7. The Adjoint Exponential Runge-Kutta MethodLet λK+1 = 0N×1 and ΛK+1 = 0sN×1. For k = K, · · · , 1:• Computeλk = θk + eτk+1L⊤k λk+1 +s∑σ=1ecσ τk+1L⊤k(∂nk,σ∂y)⊤Λk+1,σ++{s∑i=1[∂nk,i∂yk⊤+(∂ eciτk+1Lk(yk)yfixedk∂yk⊤++ τk+1i−1∑j=1∂(aij(τk+1Lk(yk))Yk+1,j)∂yk⊤) ∂nk,i∂y⊤]Λk+1,i++(∂ (eτk+1Lk(yk)yfixedk )∂yk⊤+ τk+1s∑i=1∂(bi(τk+1Lk(yk))Yk+1,i)∂yk⊤)λk+1},where the terms in braces are ignored if Lk does not depend on yk.• For σ = s, · · · , 1,Λk,σ = Θk,σ + τk bσ (τkLk−1)⊤λk + τks∑j=σ+1ajσ(τkL⊤k−1)(∂nk−1,j∂y)⊤Λk,j.The derivatives in the braces in (6.10a) are computed as discussed in Section 4.4.2.1186.4. Exponential Time-Differencing MethodsExampleWe continue the example from Section 5.4 and apply the above algorithm to Krogstad’sscheme. Recall the matrix coefficients aσj and bσ given in (2.63). The adjoint schemeis given in Algorithm 6.8.Algorithm 6.8. The Adjoint Krogstad SchemeLet Λ̂k,σ =(∂nk−1,σ∂y)⊤Λk,σ. For k = K, . . . , 1:• In this step, let ϕℓ = ϕℓ(τk+1Lk(yk)) and ϕℓ,i = ϕℓ(cστk+1Lk(yk)). Computeλk = θk + eτk+1L⊤k(λk+1 + Λ̂k+1,4)+ e12τk+1L⊤k(Λ̂k+1,2 + Λ̂k+1,3)++ Λ̂k+1,1 +{4∑σ=1∂nk,i∂yk⊤Λk+1,i +∂ eτk+1Lk(yk)yfixedk∂yk⊤Λ̂k+1,4+ ++∂ e12τk+1Lk(yk)yfixedk∂yk⊤ (Λ̂k+1,2 + Λ̂k+1,3)++∂ (eτk+1Lk(yk)yfixedk )∂yk⊤+τk+12∂(ϕ1,2Yk+1,1)∂yk⊤Λ̂k+1,2++ τk((12∂(ϕ1,3Yk+1,1)∂yk⊤+∂(ϕ2,3(Yk+1,2 −Yk+1,1))∂yk⊤)Λ̂k+1,3++(∂(ϕ1,4Yk+1,1)∂yk⊤+ 2∂(ϕ2,4(Yk+1,3 −Yk+1,1))∂yk⊤)Λ̂k+1,4++(∂(ϕ1Yk+1,1)∂yk⊤− ∂(ϕ2 (3Yk+1,1 − 2Yk+1,2 − 2Yk+1,3 +Yk+1,4))∂yk⊤++ 4∂(ϕ3(Yk+1,1 −Yk+1,2 −Yk+1,3 +Yk+1,4))∂yk⊤)λk+1)}with λK+1 = Λ̂K+1,i = 0N×1. Ignore the terms in braces if Lk is independent ofyk.• Now let ϕℓ = ϕℓ(τkLk−1(yk−1)) and ϕℓ,i = ϕℓ(cστkLk−1(yk−1)). The internal1196.5. Stability, Convergence, and Order of Accuracy for Linear Problemsstages are computed as follows:Λk,4 = τk−1 b⊤4 λk = τk(4ϕ⊤3 − ϕ⊤2)λkΛk,3 = τk b⊤3 λk + τk a⊤4,3Λ̂k,4 = τk(2ϕ⊤2 − 4ϕ⊤3)λk + τk 2ϕ⊤2,4Λ̂k,4Λk,2 = τk b⊤2 λk + τk a⊤4,2Λ̂k,4 + τk a⊤3,2Λ̂k,3= τk(2ϕ⊤2 − 4ϕ⊤3)λk + τkϕ⊤2,3Λ̂k,3Λk,1 = τk b⊤1 λk + τk a⊤4,1Λ̂k,4 + τk a⊤3,1Λ̂k,3 + τk a⊤2,1Λ̂k,2= τk(ϕ⊤1 − 3ϕ⊤2 + 4ϕ⊤3)λk + τk(ϕ⊤1,4 − 2ϕ⊤2,4)Λ̂k,4++τk2(ϕ⊤1,3 − 2ϕ⊤2,3)Λ̂k,3 +τk2ϕ⊤1,2Λ̂k,2.(6.11)Other schemes can be handled similarly. We note that the procedure gives am-ple opportunity for parallelization and precomputing quantities, for instance whencomputing the products of λk with ϕ⊤ℓ or the products of Λ̂k,i with ϕ⊤ℓ,i.6.5 Stability, Convergence, and Order of Accuracyfor Linear ProblemsLet us briefly discuss some properties of interest of the adjoint RK-type methods ifthe PDE happens to be linear. The consistency and order of accuracy of adjointtime-stepping methods corresponding to regular RK methods for nonlinear problemshave been investigated by Sandu in [100], and an analysis of the order of accuracy ofthese adjoint RK methods (up to order 4) in the context of optimal control was givenby Hager in [52]. 
The approach taken in these papers is using Taylor expansions.However, we find that for linear problems the simple argument given in the the-orem below, which is relevant to not only regular RK methods, but also IMEX,staggered RK and ETDRK methods, should suffice.Theorem 6.9. For linear PDEs, the adjoint RK-type method inherits the convergenceand stability properties from the corresponding forward method. It also has the same1206.5. Stability, Convergence, and Order of Accuracy for Linear Problemsorder of accuracy.Proof. A single step of any RK method applied to a homogeneous problem can bewritten asyk = Ψ (τkL)yk−1, (6.12)where Ψ is the transfer operator that represents the propagation from one time levelto the next. For instance, after some manipulation we obtain for explicit, regular RKmethodsΨ (τkL) = I+s∑σ=1bσCk,σ (τkL)withCk,σ (τkL) = τkL(IN + τkσ−1∑σ1=1aσσ1L+ τ2kσ−1∑σ1=2σ1−1∑σ2=1aσσ1aσ1σ2L2++τ 3kσ−1∑σ1=3σ1−1∑σ2=2σ2−1∑σ3=1aσσ1aσ1σ2aσ2σ3L3 + · · ·+ τσ−1kσ∏i=2(ai,i−1L)).Slightly different representations of Ψ (τkL) for regular RK methods can be foundin the literature. It is possible to find expressions similar to this for IMEX RK andStagRK methods, although we do not give them here.Letting Ψk = Ψ (τkL), the homogeneous forward problem can be written asIN−Ψ2 IN−Ψ3 IN. . . . . .−ΨK INy1y2y3...yK=Ψ1y000...0,1216.5. Stability, Convergence, and Order of Accuracy for Linear Problemsand therefore the homogeneous adjoint system isIN −Ψ⊤2IN −Ψ⊤3. . . . . .IN −Ψ⊤KINλ1λ2...λK−1λK=00...0ΨK+1λK+1.Hence λk−1 = Ψ⊤k (L)λk. It is fairly straightforward, albeit tedious, to show thatthis condition holds for the adjoint RK method we found in earlier in this chapter.Note that Ψ⊤k (L) = Ψk(L⊤).By the Lax equivalence theorem, a consistent linear method in the form (6.12) isconvergent if and only if it is Lax-Richtmyer stable. Letting τk = τ for simplicity,Lax-Richtmyer stability means that for each time T , there is a constant CT > 0 suchthat ∥∥∥Ψ (τL)K∥∥∥ ≤ CTfor all τ > 0 and integers K for which τK < T . We know that this must hold forΨ (τL) (because the forward method is stable), and hence the adjoint time-steppingmethod is Lax-Richtmyer stable as well since matrix transposition is invariant under,e.g., the 2-norm.Now consider a continuous forward solution y(t) and a continuous adjoint solutionλ(t). An arbitrary RK method applied to the scalar test problem y˙(t) = µy(t), whereµ is some scalar (it represents one of the eigenvalues of L). Then we have that overa single time step yk = Ψ(z)yk−1, where z = τkµ, and if the method is pth orderaccurate we must have thatΨ(z)− ez = O(zp+1).Since the same Ψ applies to the corresponding scalar adjoint RK method, it follows1226.5. Stability, Convergence, and Order of Accuracy for Linear Problemsthat for a single time step the same condition must hold for the solution of thecontinuous adjoint scalar test problem, and therefore the adjoint RK method musthave the same order of accuracy as the forward method.This result also applies to IMEX RK and StagRK methods, if the PDE is lin-ear. 
Nonlinear PDEs require a much more cumbersome approach involving Taylorexpansions of Ψ and lie outside the scope of this thesis, but, as mentioned above, see[52, 100].We will not discuss here the order reduction phenomenon [13, 88, 108] that canhappen close to the boundary when RK methods are applied to nonhomogeneousproblems, but it is possible that the absence of source terms in the internal stages ofthe adjoint RK methods would lessen this effect. At the least it seems reasonable toassume that the reduction of order suffered by the adjoint RK method would not beworse than that of the RK method, but a more careful analysis needs to be done tosupport this statement.123Chapter 7Data Misfit Function ExamplesAn important application of large-scale distributed parameter estimation is that ofseismic imaging. Seismic waves, from either natural or man-made sources, are mea-sured at seismic receivers, with each receiver generating a trace that displays thewaveform passing through that point in space. The observed data is generated usinga number of receivers, and from these measurements one wants to infer characteristicsabout the structure of the subsurface. The problem is commonly referred to as fullwaveform inversion (FWI) if it is treated as a parameter estimation problem andnumerical opimization procedures are used; see [36] for a good introduction to thesubject.There are two main approaches to obtaining seismic data, either reflection seis-mology, where seismic waves are generated using sources that are usually close tothe surface, and their reflections from the interfaces between different strata in thesubsurface are measured using receivers that are also usually located close to the sur-face. Alternatively, seismic tomography measures seismic waves that have propagatedthrough the subsurface without having been reflected off of interfaces. Usually boththe seismic sources and receivers are located underground, with the sources located inone borehole and the receivers in a different one, and the properties of the subsurfacebetween these two boreholes is investigated in this way.Over the years different misfit functions have been developed to help improve thequality of the recovered model parameters2. In this chapter we present the derivationsof the derivatives of some of these misfits, namely the least-squares amplitude misfit,2Recall that the misfit function gives a scalar value that measures the discrepancy between thesimulated data d = d(p) and the actual observed data dobs.124Chapter 7. Data Misfit Function Examplesthe cross-correlation time shift misfit and the more recent interferometric misfit, thelatter being the main motivation for the work in this chapter. It is not yet well-known,but could potentially become more important in the future. As before, we will useM = M(d,dobs) to denote the misfit function. While all of these misfits can be usedfor reflection seismology, they are more useful, and used more often, in the contextof seismic tomography.The gradient ∇pM requires ∇dM and the Hessian ∂2M∂p∂p, as well as its Gauss-Newton (3.16) and Levenberg-Marquardt (3.17) approximations, additionally require∂2M∂d ∂d. We will compute both of these derivatives for each of the misfit functionsunder consideration here.We let di be the simulated measurements taken at the ith of NR receivers, so thatthe simulated data is d =[d⊤1 · · · d⊤NR]⊤. Similarly, the actual observations at theith receiver are given by dobsi . 
These measurements are naturally discrete in practice,but in what follows it will be beneficial to at first treat them as continuous in orderto mathematically justify the misfit function under consideration.Not discussed at length is the least-squares (or ℓ2) waveform difference misfit, orsimply least-squares misfit, which is by far the most commonly used misfit becauseof its simplicity, although often it may not necessarily be the best choice, especiallyin seismic imaging. The misfit is given byM =12∥∥W (d− dobs)∥∥2 = 12(d− dobs)⊤W⊤W (d− dobs) ,where W is a weighting matrix, and its derivatives are∇dM =W⊤W(d− dobs)and∂2M∂d ∂d=W⊤W.1257.1. Least-Squares Amplitudes MisfitFor the least-squares amplitude misfit and cross-correlation time shift misfit, see[36] and the references therein for both background information and their gradientsin a continuous setting. Our contribution here is to derive the gradient of M withrespect to d in a discrete setting (as much as possible), and to derive the expressionfor the product of the Hessian∂2M∂d∂dwith some arbitrary vector v of length Nd,the latter not having been done before as far as we know. The derivatives of theinterferomentric are also novel to the best of our knowledge. For all three misfits wegive algorithms for computing the gradient and the action of the Hessian.A misfit that is not discussed in this thesis is the time-frequency misfit, whichis more versatile than the least-squares amplitude and cross-correlation time shiftmisfits, but also signifcantly more complicated. See [36] for a discussion.7.1 Least-Squares Amplitudes MisfitThe least squares amplitude measures the relative difference in total energy in thesimulated waveform compared to the observed waveform3. It has been used in large-scale seismic tomography applications in for instance, [111, 112, 119]. Our discussionis based on [36], where the gradient of the M was computed in a continuous setting.Letting Ai = ‖d‖ =√d⊤i di and Aobsi =∥∥dobsi ∥∥ = √dobs⊤i dobsi , then the misfitfunction that we want to minimize isM =12∑i(Ai −Aobsi )2(Aobsi )2 = 12∑i(√d⊤i di −√dobs⊤i dobsi)2dobs⊤i dobsi. (7.1)Using the chain rule gives∇diM =(√d⊤i di −√dobs⊤i dobsi)(dobs⊤i dobsi)√d⊤i did3The "amplitude" refers to the fact that if just a single wave is observed then the misfit willmeasure the relative difference in the simulated and actual wave amplitudes.1267.1. Least-Squares Amplitudes Misfit=1dobs⊤i dobsidi − 1√dobs⊤i dobsi√d⊤i didi=Ai −Aobsi(Aobsi )2Ai di, (7.2)so that∇dM =A1 −Aobs1(Aobs1 )2A1 d1...ANR −AobsNR(AobsNR)2ANR dNR .The numerical implementation of this gradient is straightforward and is given inAlgorithm 7.1.Algorithm 7.1. Gradient of the Least-Squares Amplitude MisfitFor i = 1, · · · , NR,• If not already available, compute the norm of each observed and simulated trace.• Setθi =Ai −Aobsi(Aobsi )2Ai di.Finally, set∇dM =[θ⊤1 · · · θ⊤NR]⊤.The Hessian is also easy to find, we have∂2M∂d ∂d=∂2M∂d1∂d1. . .∂2M∂dNR ∂dNR1277.2. Cross-Correlation Time Shift Misfitwith∂2M∂di∂di=1dobs⊤i dobsiI− 1√dobs⊤i dobsi√d⊤i diI+1√dobs⊤i dobsi√d⊤i di(d⊤i di)did⊤i=Ai −Aobsi(Aobsi )2Ai I+ 1Aobsi A3i did⊤i . (7.3)The procedure for computing the product of the Hessian of M with some arbitraryvector v of length Nd, defined analogously to d, is given in Algorithm 7.2.Algorithm 7.2. 
Product of the Hessian of the Least-Squares Amplitude MisfitProcedure for computing the product of the Hessian∂2M∂d ∂dwith an arbitrary vectorv =[v⊤1 · · · v⊤NR]⊤, where each vi has length Ndi.For i = 1, · · · , NR,• If not already available, compute the norm of each observed and simulated trace.• Setθi =Ai −Aobsi(Aobsi )2Ai vi + d⊤i viAobsi A3idi.Finally, set∂2M∂d ∂dv =[θ⊤1 · · · θ⊤NR]⊤.7.2 Cross-Correlation Time Shift MisfitExplicitly including phase information of the observed and simulated waveformsmakes it possible to extract more information from the data, and this is the goalof the cross-correlation time shift misfit. The misfit was introduced in [81] and issimilar in spirit to ideas discussed previously in [33], [17] and [76]. Further develop-ment was done in [40] and the misfit was applied to actual data in [135] and [20]. We1287.2. Cross-Correlation Time Shift Misfitagain follow the discussion in [36], where the gradient of the misfit was derived in acontinuous setting.The cross-correlation time shift misfit assumes that there is only a phase differencebetween the observed waveform and the simulated one, and measures the magnitudeof this shift by cross-correlating the observed and simulated data.To start, let us assume that the observed and simulated data at the ith receiverare both continuous time series dobsi (t) and di(t). We do this in order to define theoptimal or cross-correlation time shift for the ith receiver byTi := argmaxt′C (t′; di,dobsi ) = argmaxt′∫ T0dobsi (t)di(t + t′) dt (7.4)and the corresponding optimality condition below. For simplicity we assume that Tiis a multiple of the discrete time step τ . Note that the optimal time shift Ti dependson di. The data misfit is given byM =12∑iT 2i , (7.5)and its derivatives in terms of Ti are∇dM =T1∇d1 T1...TNR∇dNR TNRand∂2M∂d∂d=∂2M∂d1∂d1. . .∂2M∂dNR ∂dNRwith∂2M∂di∂di= T1 ∂2Ti∂di∂di+∇diTi (∇diTi)⊤.1297.2. Cross-Correlation Time Shift MisfitWe therefore need to find expressions for the terms ∇diTi and∂ 2Ti∂di∂di. Usingintegration by parts we get the following condition for Ti:0 =ddt′C (t′;di,dobsi )∣∣∣∣t′=Ti=∫ T0dobsi (t).di(t+ Ti) dt= −∫ T0.dobsi (t− Ti)di(t)dt,where we have assumed di(0) = dobsi (0) = 0 and di(T ) = dobsi (T ) = 0.Now consider the corresponding discrete setting, where this optimality conditionbecomes0 = −(S (Ti).dobsi)⊤di, (7.6)with S (t′) being a discrete shift operator that has the effect of moving whatever timeseries it acts on by an amount t′ to the right and.dobsi being the discretized derivativeof the continuous observed data.Differentiating (7.6) with respect to d then gives0 = −S (Ti).dobsi − d⊤i( .S (Ti).dobsi)∇diTi (7.7)⇒ ∇diTi = −S (Ti).dobsid⊤i( .S (Ti).dobsi) .At this point note that.S (Ti) ≈ S (Ti + τ)− S (Ti)τ= S (Ti) S (τ)− 1τand thatS (τ)− 1τdi ≈.disince S (τ) has the effect of shifting di to the right by one time step. Therefored⊤i( .S (Ti).dobsi)≈ .d⊤i(S (Ti).dobsi).1307.2. Cross-Correlation Time Shift MisfitThis relationship can be shown to be exact in the continuous setting using integrationby parts.Next, since Ti is the optimal time shift and we assume that the observed andsimulated data vary only in the arrival time and not in the shape or amplitude of thewaveform, we have that di = S (Ti)dobsi and hence.d⊤i(S (Ti).dobsi)=.d⊤i.di.As a result of all of this we have∇diTi = −S (Ti).dobsid⊤i( .S (Ti).dobsi) = − 1∥∥∥ .di∥∥∥2.di. (7.8)It follows that∇diM = Ti∇diTi = −Ti∥∥∥ .di∥∥∥2.di. 
(7.9)The algorithm for computing the gradient of M with respect to the simulated data isgiven below.Algorithm 7.3. Gradient of the Cross-Correlation Time-Shift MisfitFor i = 1, · · · , NR,• If not already available, compute the optimal time shift Ti by finding the shiftthat maximizes the value of the cross-correlation in (7.4).• Use a finite difference approximation to find the approximate derivative .di ofthe simulated data di.• Setθi = − Ti∥∥∥.di∥∥∥2.di.1317.2. Cross-Correlation Time Shift MisfitFinally, set∇dM =[θ⊤1 · · · θ⊤NR]⊤.Let us now compute the Hessian of Ti with respect to its corresponding trace di.For this we go back to (7.7). Taking the derivative with respect to di on both sidesgives0 =∂∂ di(S (Ti).dobsi + d⊤i( .S (Ti).dobsi)∇diTi)=(2.S (Ti).dobsi + d⊤i(..S (Ti).dobsi)∇diT)∇dT ⊤i + d⊤( .S (Ti).dobsi) ∂2Ti∂di∂diso that∂2Ti∂di∂di= − 1d⊤i( .S (Ti).dobsi) (2 .S (Ti) .dobsi + d⊤i (..S (Ti) .dobsi )∇diTi)∇diT ⊤i= − 1.d⊤i(S (Ti).dobsi) (2S (Ti) ..dobsi + .d⊤i (S (Ti) ..dobsi )∇diTi)∇diT ⊤i= − 1∥∥∥ .di∥∥∥2(2..di +.d⊤i..di∇diTi)∇diT ⊤i= − 1∥∥∥ .di∥∥∥4−2..di .d⊤i + .d⊤i ..di .di .d⊤i∥∥∥ .di∥∥∥2 (7.10)where we have again used the property that the derivatives of S(Ti) can be moved tothe vectors and also the assumption that the optimally shifted simulated data is thesame as the observed data, which must also hold for their derivatives. Therefore∂ 2M∂di∂di= Ti ∂2Ti∂di∂di+∇diT (∇diTi)⊤=Ti∥∥∥ .di∥∥∥42..di − .d⊤i ..di .di∥∥∥ .di∥∥∥2 .d⊤i + .di .d⊤i∥∥∥ .di∥∥∥4 . (7.11)1327.2. Cross-Correlation Time Shift MisfitThe procedure for computing the product of the Hessian of M with some arbitraryvector v of length Nd, defined analogously to d, is given in Algorithm 7.4.Algorithm 7.4. Product of the Hessian of the Cross-Correlation Time Shift Mis-fitProcedure for computing the product of the Hessian∂2M∂d ∂dwith an arbitrary vec-tor v =[v⊤1 · · · v⊤NR]⊤, where each vi has length Ndi.For i = 1, · · · , NR,• If not already available, compute the optimal time shift Ti by finding the shiftthat maximizes the value of the cross-correlation in (7.4).• Use finite difference approximations to find the approximate derivatives .di and..di of the trace di.• Setθi = Ti.d⊤i vi∥∥∥.di∥∥∥42..di − .d⊤i ..di .di∥∥∥.di∥∥∥2+ .d⊤i vi∥∥∥.di∥∥∥4.di.Finally, set∂ 2M∂d∂dv =[θ⊤1 · · · θ⊤NR]⊤.Note: Testing the correctness of the numerical computation of ∇dM and ∂2M∂d∂dvmay prove tricky since a perturbation of the simulated data d will usually not leadto a change in any of the Ti; the best approach for testing the implementation isto test the gradient ∇pM (which of course depends on ∇dM) and using a verysmall measurement interval τ ; similarly for∂ 2M∂d∂dv.1337.3. Interferometric Misfit7.3 Interferometric MisfitThe interferometric misfit has shown some potential [67, 129] in application to seis-mic imaging problems. It does not appear to be too well-known, but providing thederivatives of this misfit with respect to the data will hopefully facilitate the use ofgradient-based optimization procedures with this misfit.The idea is to measure the phase difference, using cross-correlation, between thesimulated data measured by a given pair of receivers, then comparing it to the cross-correlation of the actual data from those receivers. 
This serves to give a relativemeasure of discrepancy between the simulated and observed data [67], rather than anabsolute one, as would be the case for the least-squares misfit.The form of the interferometric misfit that we are interested in measures thedifferences between the cross-correlations of pairs of simulated and observed data asfollows:12∑i,jEij∥∥di(t) ⋆ dj(t)− dobsi (t) ⋆ dobsj (t)∥∥2 .The matrix E is a NR×NR matrix of zeros and ones that determines whether a givenpair is included in the summation or not. This is a mechanism that is introduced sothat only particular combinations of traces are included in the misfit computation,allowing us to easily exclude receiver pairs that are, for instance, too far apart toprovide useful information. The cross-correlation is defined to bedi(t) ⋆ dj(t) =∫Rdi(t)dj(t + t′) dt′,and similarly for dobsi (t).For discrete measurements the cross-correlation can be efficiently computed usingthe Fast Fourier Transform (fft) and its inverse (ifft). The MATLAB pseudocode1347.3. Interferometric Misfitlooks as follows:di ⋆ dj = fftshift(ifft(fft (di, Nω) . ∗ fft (dj , Nω))). (7.12)Each trace di is assumed to have a length of Ndi . For simplicity, we assume thatthere is a fixed number of frequencies Nω used by the Fourier transform, with Nω ≥Ndi +Ndj − 1 for all i, j.The fft above can be represented using the Nω × Nω discrete Fourier transformmatrix F. The ifft is represented by F−1 = 1NωF. The discrete cross-correlationcomputation (7.12) can therefore also be written asdi ⋆ dj = JF−1 (diag (FUijdi) FUjidj) = JF−1 (diag (FUijdi) FUjidj) (7.13)which is the form we will work with here. We have introducedJ =0⌊Nω/2⌋×⌈Nω/2⌉ I⌊Nω/2⌋I⌈Nω/2⌉ 0⌈Nω/2⌉×⌊Nω/2⌋ and Ui = INdi0(Nω−Ndi)×Ndiwhere J acts as the matrix representation of fftshift and Ui appends Nω − Ndizeros to a vector of length Ndi . J⊤ acts as the matrix representation of ifftshift.The discrete cross-correlation misfit is thenM =12∑i,jEij∥∥di ⋆ dj − dobsi ⋆ dobsj ∥∥2 (7.14)with ⋆ as in (7.13). However, we need to express M as a function of d in order totake derivatives with respect to d. To this end, letG1 := 1NR ⊗ ((INR ⊗ F)U) ,G2 :=(E˜⊗ INω) ((INR ⊗ F)U)G3 := IN2R⊗ (JF−1)(7.15)1357.3. Interferometric MisfitwithU =U1. . .UNR , E˜ =E1,:. . .ENR,:and 1NR a vector of ones of length NR. It is easy to see that G1 and G2 are matricesof size N2RNω ×Nd and G3 is a matrix of size N2RNω ×N2RNω. Note thatG⋆3G3 =(IN2R⊗ ((F−1)⋆ J⊤))(IN2R⊗ (JF−1)) = 1NωIN2RNωsince J⊤J = INω and (F−1)⋆F−1 = 1NωFF−1 = 1NωINω , where we have used theproperty of the discrete Fourier transform matrix that F⋆ = F = Nω F−1.(7.14) can therefore be rewritten asM =12∥∥G3 (diag (G2d)G1d− diag (G2dobs)G1dobs)∥∥2 . (7.16)Then the gradient with respect to the data d is∇dM = (diag (G2d)G1 + diag (G1d)G2)⋆ ··G⋆3G3(diag (G2d)G1d− diag(G2dobs)G1dobs)=1Nω(G⋆1 diag(G2d)+G⋆2 diag(G1d)) ·· (diag (G2d)G1d− diag (G2dobs)G1dobs) (7.17)It is beneficial to bring this expression into a form that is easier to compute byrewriting them in terms of Fourier transforms and cross-correlations. We start bylooking at the products G1x, G2x and G⊤1 y, G⊤2 y, where x =[x⊤1 · · · x⊤NR]⊤isa vector of size Nd =∑NRi=1Ndi and y =[y⊤1 · · · y⊤NR]⊤is a vector of size N2RNω,with yi =[y⊤i,1 · · · y⊤i,NR]⊤and each yi,j having length Nω:1367.3. 
Interferometric Misfit• G1x:G1x = 1NR ⊗ ((INR ⊗ F)U)x = 1NR ⊗FU1x1...FUNRxNR =: 1NR ⊗ z (x) (7.18)• G2x:G2x =(E˜⊗ INω)FU1x1...FUNRxNR =E1,: ⊗ FU1x1...ENR,: ⊗ FUNRxNR =:w1 (x1)...wNR (xNR)(7.19)with wi,j (xi) = Ei,j FUixi.• G⋆1y:G⋆1y = 1⊤NR⊗ (U⊤ (INR ⊗ F⋆))y=U⊤1. . .U⊤NRFNR∑i=1yi =U⊤1 F∑NRi=1 yi,1...U⊤NRF∑NRi=1 yi,NR (7.20)• G⋆2y:G⋆2y = U⊤(INR ⊗(F)⋆)(E˜⊤ ⊗ INω)y= U⊤ (INR ⊗ F)∑NRi=1 E1,i y1,i...∑NRi=1 ENR,i yNR,i =U⊤1 F∑NRi=1 E1,i y1,i...U⊤NRF∑NRi=1 ENR,i yNR,i . (7.21)The conjugates of the above products are formed by conjugating F and its inverse.We also need the following terms:1377.3. Interferometric Misfit• G1d⊙G1d:G1d⊙G1d = 1NR ⊗ (z (d)⊙ z (d))= 1NR ⊗(FU1d1)⊙ (FU1d1)...(FUNRdNR)⊙ (FUNRdNR) (7.22)• G2d⊙G2d:G2d⊙G2d =w1 (d1)⊙w1 (d1)...wNR (dNR)⊙wNR (dNR) (7.23)with wi,j (di)⊙wi,j (di) = Ei,j (FUidi)⊙(FUidi).• G2d⊙G1d:G2d⊙G1d =w1 (d1)⊙ z (d)...wNR (dNR)⊙ z (d) (7.24)with wi,j (di)⊙zj (d) = Ei,j(FUidi)⊙(FUjdj), and similarly forG2dobs⊙G1dobs.• G2d⊙G1d:G2d⊙G1d =w1 (d1)⊙ z (d)...wNR (dNR)⊙ z (d) (7.25)with wi,j (di)⊙ zj (d) = Ei,j (FUidi)⊙ (FUjdj).• G2d⊙G1d:G2d⊙G1d =w1 (d1)⊙ z (d)...wNR (dNR)⊙ z (d) (7.26)with wi,j (di)⊙ zj (d) = Ei,j(FUidi)⊙ (FUjdj).1387.3. Interferometric MisfitUsing the above equations, we will now simplify the expression for the gradient in(7.17). Letr = G2d⊙G1d−G2dobs ⊙G1dobs. (7.27)Then, using (7.19) and (7.20),G⋆1 diag(G2d)r =U⊤1 F∑NRi=1 (FUidi)⊙ ri1...U⊤NRF∑NRi=1 (FUidi)⊙ riNR ; (7.28)we left out the Eij since Eij = E2ij and there already is a Eij appearing in rij .Similarly, using (7.18) and (7.21)G⋆2 diag(G1d)r =U⊤1 F∑NRi=1 E1i y1i...U⊤NRF∑NRi=1 ENRi yNRi ,with y = G1d⊙ r = (1NR ⊗ z (d))⊙ r, whereyi = zi (d)⊙ ri =FU1d1 ⊙ ri1...FUNRdNR ⊙ riNR .ThereforeG⋆2 diag(G1d)r =U⊤1 F∑NRi=1(FUidi)⊙ r1i...U⊤NRF∑NRi=1(FUidi)⊙ rNRi . (7.29)The algorithm for computing the gradient is given below.1397.3. Interferometric MisfitAlgorithm 7.5. Gradient of the Interferometric Misfit• If not already available, compute the discrete Fourier transform of each tracedi:d̂i = fft (pad (di))for i = 1, · · · , NR, where pad pads di with Nω −Ndi zeros. Likewise,d̂obsi = fft(pad(dobsi)).• Set θi = 0Ndi×1 for i = 1, · · · , NR.• For j = 1, · · · , NR compute:◦ Set φ1 = 0Nω×1 and φ2 = 0Nω×1 (these are temporary storage arrays).◦ For j = 1, · · · , NR:φ1 = φ1 + Eij(d̂i ⊙(d̂j ⊙ d̂i − d̂obsj ⊙ d̂obsi))andφ2 = φ2 + Eji(d̂j ⊙(d̂i ⊙ d̂j − d̂obsi ⊙ d̂obsj)).• Computeθi = ℜ(restrict(ifft (φ1) +1Nωfft (φ2))),where restrict restricts the vector to the first Ndi entries.• Finally, set∇dM =[θ⊤1 · · · θ⊤NR]⊤.1407.3. Interferometric MisfitDifferentiating (7.17) with respect to d gives the Hessian∂2M∂d ∂d=1Nω[G⋆1 diag(G2d⊙G2d)G1 +G⋆2 diag(G1d⊙G1d)G2 ++G⋆1 diag(G2d⊙G1d)G2 +G⋆2 diag(G1d⊙G2d)G1++G⋆1 diag(G2d⊙G1d−G2dobs ⊙G1dobs)G2++G⋆2 diag(G2d⊙G1d−G2dobs ⊙G1dobs)G1]. (7.30)Our goal now is to find the expression for the product of the Hessian with an arbitraryvector v of length Nd. Letting r be defined as in (7.27), we look at each of the termsin (7.30) separately:• G⋆1 diag(G2d⊙G2d)G1v:Use (7.20) and let y =(G2d⊙G2d)⊙G1v, withyi,j = wi,j (di)⊙wi,j (di)⊙ zj (v)= Eij (FUidi)⊙(FUidi)⊙ (FUjvj) ,where we have used (7.23) and (7.18). 
ThenG⋆1 diag(G2d⊙G2d)G1v ==U⊤1 F∑NRi=1Ei1 (FUidi)⊙(FUidi)⊙ (FU1v1)...U⊤NRF∑NRi=1EiNR (FUidi)⊙(FUidi)⊙ (FUNRvNR) (7.31)• G⋆2 diag(G1d⊙G1d)G2v:Use (7.21) and let y =(G1d⊙G1d)⊙G2v, withyi,j = zj (d)⊙ zj (d)⊙wi,j (vi)= Eij(FUjdj)⊙ (FUjdj)⊙ (FUivi) ,1417.3. Interferometric Misfitwhere we have used (7.22) and (7.19). ThenG⋆2 diag(G1d⊙G1d)G2v ==U⊤1 F∑NRi=1 E1i(FUidi)⊙ (FUidi)⊙ (FU1v1)...U⊤NRF∑NRi=1 ENRi(FUidi)⊙ (FUidi)⊙ (FUNRvNR) (7.32)• G⋆1 diag(G2d⊙G1d)G2v:Use (7.20) and let y =(G2d⊙G1d)⊙G2v, withyi,j = wi,j (di)⊙ zj (d)⊙wi,j (vi)= Eij (FUidi)⊙ (FUjdj)⊙(FUivi),where we have used (7.25) and (7.19). ThenG⋆1 diag(G2d⊙G1d)G2v ==U⊤1 F∑NRi=1Ei1 (FUidi)⊙ (FU1d1)⊙(FUivi)...U⊤NRF∑NRi=1EiNR (FUidi)⊙ (FUNRdNR)⊙(FUivi) (7.33)• G⋆2 diag(G1d⊙G2d)G1v:Use (7.21) and let y =(G1d⊙G2d)⊙G1v, withyi,j = zj (d)⊙wi,j (di)⊙ zj (v)= Eij(FUjdj)⊙ (FUidi)⊙ (FUjvj) ,where we have used (7.26) and (7.18). ThenG⋆2 diag(G1d⊙G2d)G1v =1427.3. Interferometric Misfit=U⊤1 F∑NRi=1 E1i(FUidi)⊙ (FU1d1)⊙ (FUivi)...U⊤NRF∑NRi=1 ENRi(FUidi)⊙ (FUNRdNR)⊙ (FUivi) (7.34)• G⋆1 diag(G2d⊙G1d−G2dobs ⊙G1dobs)G2v:Using (7.20), (7.19), (7.24) and (7.27):G⋆1 diag(G2d⊙G1d−G2dobs ⊙G1dobs)G2v == G⋆1 diag (r)G2v= G⋆1 diag(G2v)r=U⊤1 F∑NRi=1 (FUivi)⊙ ri1...U⊤NRF∑NRi=1 (FUivi)⊙ riNR . (7.35)Also look at (7.28).• G⋆2 diag(G2d⊙G1d−G2dobs ⊙G1dobs)G1v:Using (7.21), (7.18), (7.24) and (7.27):G⋆2 diag(G1d⊙G1d−G1dobs ⊙G1dobs)G2v == G⋆2 diag (r)G1v= G⋆2 diag(G1v)r=U⊤1 F∑NRi=1(FUivi)⊙ r1i...U⊤NRF∑NRi=1(FUivi)⊙ rNRi . (7.36)Also look at (7.29).We are now ready to give the procedure for computing the product of the Hessian ofM with some arbitrary vector v of length Nd, defined analogously to d.1437.3. Interferometric MisfitAlgorithm 7.6. Product of the Hessian of the Interferometric MisfitProcedure for computing the product of the Hessian∂2M∂d ∂dwith an arbitrary vectorv =[v⊤1 · · · v⊤NR]⊤, where each vi has length Ndi.For i = 1, · · · , NR,• If not already available, compute the discrete Fourier transform of each tracedi:d̂i = fft (pad (di))for i = 1, · · · , NR, where pad pads di with Nω −Ndi zeros. Likewise,d̂obsi = fft(pad(dobsi)).• Similarly, compute the discrete Fourier transform of each trace vi:v̂i = fft (pad (vi))for i = 1, · · · , NR.• Set θi = 0Ndi×1 for i = 1, · · · , NR.• For i = 1, · · · , NR:◦ Set φ1 = 0Nω×1 and φ2 = 0Nω×1 (these are temporary storage arrays).◦ For j = 1, · · · , NR compute:φ1 = φ1 + Eij(d̂j ⊙(d̂j ⊙ v̂i + d̂i ⊙ v̂j)+ v̂j ⊙(d̂j ⊙ d̂i − d̂obsj ⊙ d̂obsi))1447.3. Interferometric Misfitandφ2 = φ2 + Eji(d̂j ⊙(d̂i ⊙ v̂j + d̂j ⊙ v̂i)+ v̂j ⊙(d̂i ⊙ d̂j − d̂obsi ⊙ d̂obsj)).• Computeθi = ℜ(restrict(ifft (φ1) +1Nωfft (φ2))),where restrict restricts the vector to the first Ndi entries.• Finally, set∂2M∂d ∂dv =[θ⊤1 · · · θ⊤NR]⊤.145Chapter 8Numerical ExperimentAs an interesting and simple application of the methods derived in this thesis forthe ETDRK method, we consider the parameter estimation problem applied to thefollowing version of the fourth-order Swift-Hohenberg model on the torus Ω = T2 :[0, Lx)× [0, Ly):∂y∂ t= ry − (1 +∇2)2 y + gy2 − y3, (8.1)where r > 0 and g are parameter functions that determine the behaviour of thesolution. After spatially discretizing in some appropriate way we have∂y∂ t= diag (r)y− (I+∇2h)2 y + diag (g)y2 − y3, (8.2)with y = y(t), r and g being the spatial discretizations of r and g respectively,and ∇2h the spatially discretized Laplace operator. 
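As an illustration, the semi-discrete right-hand side (8.2) can be evaluated on an Nx × Ny periodic grid in a few lines of MATLAB, with (I + ∇²ₕ)² applied spectrally. This is a minimal sketch rather than the implementation used in this thesis; the function name shmodel_rhs and the unshifted wave-number ordering expected by fft2 are illustrative choices.

% Sketch: evaluate the right-hand side of (8.2) for real 2-D fields y, r, g.
function dydt = shmodel_rhs(y, r, g, Lx, Ly)
    [Nx, Ny] = size(y);
    kx = (2*pi/Lx) * [0:Nx/2-1, -Nx/2:-1];         % wave numbers in fft2 ordering
    ky = (2*pi/Ly) * [0:Ny/2-1, -Ny/2:-1];
    [KX, KY] = ndgrid(kx, ky);
    K2 = KX.^2 + KY.^2;                             % symbol of the negative Laplacian
    linpart = real(ifft2((1 - K2).^2 .* fft2(y)));  % (I + Laplacian)^2 y, applied spectrally
    dydt = r.*y - linpart + g.*y.^2 - y.^3;         % right-hand side of (8.2)
end

The same spectral multiplier, with a minus sign, gives the diagonal entries of the diagonalized linear operator used in the experiment setup below.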
The Swift-Hohenberg model isan example of a PDE whose solutions exhibit pattern formation, a phenomenon thatoccurs in many different branches of science, for instance biology (morphogenesis,vegetation patterns, animal markings, growth of bacterial colonies, etc.), physics (liq-uid crystals, nonlinear waves, Bénard cells, etc.) and chemical kinetics (e.g. theBelousov-Zhabotinsky, CIMA and PA-MBO relations). The field is enormous; see[44, 82] for some interesting applications, but there are many others. The Swift-Hohenberg model itself was derived from the equations of thermal convection [117],and it is possible to use other nonlinear terms than the one used here.The fourth-order derivative term implies that this equation is very stiff, makingit a good candidate for use with an exponential integrator. The obtained patterns146Chapter 8. Numerical Experimentdepend on the parameters m =[r⊤ g⊤]⊤. It is possible to obtain different patternsin different regions of the domain if these parameters exhibit spatial variability, whichis the case that we consider here. This is therefore a distributed parameter estimationproblem.We take the linear part to be L = − (I+∇2h)2, hence n(y,m, t) = diag (r)y(t) +diag (g)y(t)2 − y(t)3. Notice that we have included the linear term diag (r)y in thenonlinear part of the equation: the dominant differential term is in L anyway.Experiment setupUsing a finite difference spatial discretization with Nx ×Ny grid points, the periodicboundary conditions allow us to diagonalize the linear term using the pseudospectralmethod, where we compute the product with the linear part in the frequency domainand then switch back to the real domain to compute the nonlinear part. Hence∂ ŷ(t)∂ t= L̂ ŷ(t) + F(diag (r)F−1ŷ(t) + diag (g)(F−1ŷ(t))2 − (F−1ŷ(t))3) , (8.3)where F represents the 2D Fourier transform in space, F−1 is its inverse, ŷ(t) = Fy(t)and L̂ is the diagonalized differential operator. The wave numbers on this grid arekx =2πLx(−Nx2: Nx2− 1) and ky = 2πLy (−Ny2: Ny2− 1), so ∇2h is then diagonalized witheach diagonal entry the negative of the sum of the square of an element of kx and thesquare of an element of ky. Now ŷ(t) is taken to be the forward solution instead ofy(t).In our experiment we let Lx = Ly = 40π, Nx = Ny = 27, and the initial conditiony0 is given by a field of Gaussian noise. We employ contour integration to evaluatethe ϕ-functions, using a parabolic contour with 32 quadrature points. The actualparameter fields r and g are taken to be piecewise constant, as shown in Figure 8.1.The value of r is 2 in the outer strips and 0.04 in the inner strip. The value of g is−1 in the outer strips and 1 in the inner strip. The solution y(t) at time t = 50s147Chapter 8. Numerical Experiment0 20 40 60 80 100 120x020406080100120y0.511.52(a) r0 20 40 60 80 100 120x020406080100120y-1-0.500.51(b) gFigure 8.1: Actual parameter valuesis shown in Figure 8.2. The outer vertical strips in the solution evolve fairly quicklyrelative to the central vertical strip, which evolves on a much slower time scale dueto the small value of r there.Derivatives of nThe adjoint solution procedure and sensitivity computations require the derivativesofn(ŷ,m, t) = F(diag (r)F−1ŷ(t) + diag (g)(F−1ŷ(t))2 − (F−1ŷ(t))3)with respect to ŷ(t) and m, as well as their transposes. We have∂n(t)∂ ŷ= F(diag (r) + 2 diag (g)(F−1ŷ(t))− 3 (F−1ŷ(t))2)F−1∂n(t)∂m=[diag (ŷ(t)) F diag((F−1ŷ(t))2)]∂n(t)∂ ŷ⊤= F−⊤(diag (r) + 2 diag (g)(F−1ŷ(t))− 3 (F−1ŷ(t))2)F⊤∂n(t)∂m⊤= diag (ŷ(t))diag((F−1ŷ(t))2)F⊤ .148Chapter 8. 
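In practice these derivative products are applied matrix-free. A minimal MATLAB sketch of the product of ∂n/∂ŷ with a vector of Fourier coefficients is given below; the helper name jac_n_vec and the array shapes are illustrative assumptions, not the interface of the thesis code.

% Sketch: apply dn/dyhat to vhat, where yhat and vhat are 2-D arrays of Fourier
% coefficients and r, g are real-valued parameter fields on the grid.
function w = jac_n_vec(yhat, vhat, r, g)
    u  = real(ifft2(yhat));                  % forward solution in real space
    du = ifft2(vhat);                        % F^{-1} vhat (not necessarily real)
    w  = fft2((r + 2*g.*u - 3*u.^2) .* du);  % F diag(r + 2 g u - 3 u^2) F^{-1} vhat
end

Since ∂n(t)/∂ŷ⊤ = ∂n(t)/∂ŷ for this problem, the same routine also provides the transposed product required by the adjoint computations.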
Numerical Experiment0 20 40 60 80 100 120x020406080100120y-2-1.5-1-0.500.511.52Figure 8.2: Ground truth solution at t = 50s.The transposes of F and F−1 are F⊤ = NF−1 and F−⊤ = 1NF, with N = NxNy, sothat∂n(t)∂ ŷ⊤= F(diag (r) + 2 diag (g)(F−1ŷ(t))− 3 (F−1ŷ(t))2)F−1∂n(t)∂m⊤= diag (ŷ(t))N diag((F−1ŷ(t))2)F−1 .Note that∂n(t)∂ ŷ⊤=∂n(t)∂ ŷ.Order of AccuracyWe numerically illustrate that the adjoint solution and the gradient have the sameorder of accuracy as the forward solution, meaning that pth-order forward schemes areexpected to lead to pth-order adjoint schemes. In this subsection we let y(nτ), λ(nτ)and∇(nτ) denote the quantity of interest one gets when performing the computationswith a time-step nτ , where we let τ = 180s. We let yexact, λexact and ∇exact denote149Chapter 8. Numerical ExperimentEulerCox-MatthewsKrogstadHochbruck-Ostermannpy (2τ, τ) 0.9914 4.0644 4.0699 4.0275py (4τ, 2τ) 0.9908 3.9726 3.9732 3.9719py (8τ, 4τ) − 3.9375 3.9378 3.9387py (16τ, 8τ) − 3.8849 3.8835 3.8770Table 8.1: Approximate order of accuracy py for the forward solution, computedusing py (2iτ, 2i−1τ) = log2 (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖) for i = 1, 2, 3, 4, with ǫy(2iτ) =y(2iτ)− yexact.the "exact" values attained by performing the computations with a fine time-step1160s using the Krogstad scheme. The error between the computed and exact forwardsolution is ǫy(nτ) = y(nτ)− yexact, and we must have ‖ǫy(nτ)‖ ≈ O(nτ)p for a pth-order method, from which it follows that (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖) ≈ 2p. Computingpy (2iτ, 2i−1τ) := log2 (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖) for different values of i = 1, 2, . . . thengives an approximation of the order of accuracy p, with the estimate being moreaccurate for smaller values of τ and i. The results for the four ETD schemes discussedin this thesis are given in Table 8.1. We note the following:• The simulations were run from 0s to 20s for the forward solution, and from 20s to0s for the adjoint solution.• The initial condition is Gaussian noise and the approximate orders of accuracyactually differ from simulation to simulation because of this. These differences areonly slight for the higher-order methods, but can be quite pronounced for the lower-order Euler method. Therefore the results shown are the averages of 10 simulationsusing different initial conditions.• The larger time steps are too large for the Euler method, and even the smallertime-steps can lead to inaccurate results on occasion. In computing the averageorder of accuracy we have therefore only included the computed py’s that are inthe interval (p− 0.5, p+ 0.5).• Incidentally, for the Swift-Hohenberg equation with periodic boundary conditions,150Chapter 8. 
Numerical ExperimentEulerCox-MatthewsKrogstadHochbruck-Ostermannpλ (2τ, τ) 0.9976 4.0383 4.0588 3.9902pλ (4τ, 2τ) 0.9434 3.9516 3.9568 3.9679pλ (8τ, 4τ) − 3.8969 3.9041 3.9343pλ (16τ, 8τ) − 3.8027 3.8100 3.8616Table 8.2: Approximate order of accuracy pλ for the adjoint solution, computedusing pλ (2iτ, 2i−1τ) = log2 (‖ǫλ(2iτ)‖/‖ǫλ(2i−1τ)‖) for i = 1, 2, 3, 4, with ǫλ(2iτ) =λ(2iτ)− λexact.EulerCox-MatthewsKrogstadHochbruck-Ostermannp∇ (2τ, τ) 1.0260 4.0430 4.0627 3.9947p∇ (4τ, 2τ) 0.9270 3.9546 3.9575 3.9684p∇ (8τ, 4τ) − 3.9022 3.9056 3.9347p∇ (16τ, 8τ) − 3.8103 3.8136 3.8622Table 8.3: Approximate order of accuracy p∇ for the gradient, computed usingp∇ (2iτ, 2i−1τ) = log2 (‖ǫ∇(2iτ)‖/‖ǫ∇(2i−1τ)‖) for i = 1, 2, 3, 4, with ǫ∇(2iτ) =∇(2iτ)−∇exact.we see that Cox-Matthews and Krogstad schemes do indeed attain fourth orderaccuracy.The quantities pλ (2iτ, 2i−1τ) and p∇ (2iτ, 2i−1τ) are defined analogously to py and aregiven in Tables 8.2 and 8.3, respectively. We have again averaged the approximateorders of accuracy from 10 simulations.The crucial observation here is that the orders of accuracy of the adjoint ETDschemes and the resulting gradient are the same as that of the corresponding forwardscheme.Testing the Rosenbrock ApproachTo test the order of accuracy of adjoint exponential Rosenbrock methods, and sim-ulataneously check that our derivations and implementations for these methods arecorrect, the Swift-Hohenberg equation was reformulated by finding the Jacobian of151Chapter 8. Numerical ExperimentEulerCox-MatthewsKrogstadHochbruck-Ostermannpy (4τ, 2τ) 1.9505 4.1970 4.1967 4.1213py (8τ, 4τ) 1.9713 4.0774 4.0811 4.0561py (16τ, 8τ) 1.9846 3.7946 3.7991 3.7884Table 8.4: Approximate order of accuracy py for the forward solution using a Rosen-brock approach, computed using py (2iτ, 2i−1τ) = log2 (‖ǫy(2iτ)‖/‖ǫy(2i−1τ)‖) fori = 1, 2, 3, 4, with ǫy(2iτ) = y(2iτ)− yexact.EulerCox-MatthewsKrogstadHochbruck-Ostermannpλ (4τ, 2τ) 1.8885 4.2572 4.2482 4.2244pλ (8τ, 4τ) 1.9414 4.0531 4.0498 4.0411pλ (16τ, 8τ) 1.9702 3.1043 3.1029 3.0998Table 8.5: Approximate order of accuracy pλ for the adjoint solution using a Rosen-brock approach, computed using pλ (2iτ, 2i−1τ) = log2 (‖ǫλ(2iτ)‖/‖ǫλ(2i−1τ)‖) fori = 1, 2, 3, 4, with ǫλ(2iτ) = λ(2iτ)− λexact.the right-hand side of (8.3),∂ f∂ ŷ= L̂+ F(diag (r) + 2 diag (g) diag(F−1ŷ(t))− 3 diag (F−1ŷ(t)))F−1,and then setting Lk =∂ f(ŷk)∂ ŷat the kth time-step. Consequently, with yk = F−1ŷk,Lk = L̂ + F diag(r+ 2g⊙ yk − 3y2k)F−1,and hencenk = f − Lk ŷ(t) = F diag(g ⊙ (F−1ŷ(t)− 2yk)− (F−1ŷ(t))2 − 3y2k)F−1ŷ(t).The experiment from the previous subsection is repeated, but due to the significantincrease in computational effort required by the additional terms in the derivativeswe have run the simulations for only 2 different random initial conditions, and for asmallest time-step of 2τ , with the "exact" equations computed using a time-step of τ .152Chapter 8. Numerical ExperimentEulerCox-MatthewsKrogstadHochbruck-Ostermannpλ (4τ, 2τ) 1.8730 4.2892 4.2779 4.2580pλ (8τ, 4τ) 1.9345 4.0911 4.0839 4.0764pλ (16τ, 8τ) 1.9669 3.4815 3.4758 3.4754Table 8.6: Approximate order of accuracy p∇ for the gradient using a Rosen-brock approach, computed using p∇ (2iτ, 2i−1τ) = log2 (‖ǫ∇(2iτ)‖/‖ǫ∇(2i−1τ)‖) fori = 1, 2, 3, 4, with ǫ∇(2iτ) = ∇(2iτ)−∇exact.We have also run the simulations for just 10s instead of 20s. The results in tables 8.4-8.6 suggest that the adjoint exponential Rosenbrock method does indeed also attainthe same (numerical) order of accuracy as the corresponding forward method. 
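For concreteness, the order estimates reported in Tables 8.1 to 8.6 amount to two lines of MATLAB; the variable names y_steps (solutions computed with steps τ, 2τ, 4τ, . . . , stored in a cell array) and y_exact (the fine-step reference) are illustrative.

% Sketch: observed orders p(2^i tau, 2^(i-1) tau) from errors against a reference.
err = cellfun(@(y) norm(y(:) - y_exact(:)), y_steps);  % errors for tau, 2tau, 4tau, ...
p   = log2(err(2:end) ./ err(1:end-1));                % log2 of successive error ratios

The same computation is used for the adjoint solution and the gradient, with y replaced by λ and ∇ respectively.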
Oddlythe Euler method seems to have an order of accuracy of around 2 instead of theexpected value of 1, but this result should be viewed with reservation. The resultsfor the largest time-step suggest that this time-step was too large, with a noticeabledecrease in the order of accuracy especially for the adjoint solution.Parameter EstimationAlthough outside the stated scope of this study, we show a possible parameter estimateattained using a gradient-based optimization method, where the gradient is computedusing the results from this thesis. Recovering estimates that are closer to the groundtruth parameters takes a more sophisticated approach than the one used here. Inparticular, the regularization term R(m,mref) in (1.3), which is not the focus of thisexample, may have to be altered, or a level set method may be introduced; this is thesubject of a future investigation. The initial guesses for the parameters are shown inFigures 8.3a and 8.3b, and the recovered parameters are shown in Figures 8.3c and8.3d. Here are details of the setup:• The simulations were run using the Krogstad scheme with a time-step of 0.25s.• Observations were taken every 0.5s up to 25s, and 5% noise was added to eachobservation.153Chapter 8. Numerical Experiment0 20 40 60 80 100 120x020406080100120y0.511.52(a) Initial guess for r0 20 40 60 80 100 120x020406080100120y-1-0.500.51(b) Initial guess for g0 50 100x020406080100120y00.511.52(c) Recovered r0 50 100x020406080100120y-1-0.500.51(d) Recovered gFigure 8.3: Recovered parameter values (bottom) and initial guesses (top)154Chapter 8. Numerical Experiment• The standard least-squares misfit function M = ‖d(m)− dobs‖ was employed.• TV regularization using the smoothed Huber norm was used with β = 10.• The parameters were recovered using 400 iterations of L-BFGS (keeping 20 previousupdates in storage) with cubic line search. The results after 200 iterations werealready very similar to those in Figure 8.3.• We used a projected gradient method, where the value of r at each point wasconstrained to always lie in the interval [0.01, 2.3] and the value of g at each pointwas in [−1.2, 1.2].We make the following remarks on the results:• The recovered values of r are quite acceptable, as are the values of g on the twooutside vertical strips, whereas the estimate of the central vertical strip of g is poor.• This is the result of the observations having only been taken for up to 25s, a timeafter the patterns on the outside vertical strips have formed but before the centralpattern, which evolves on a much slower time scale, has formed.• Interestingly, including observations at later times - after the spotty pattern hasformed - actually leads to parameter estimates that are qualitatively worse overall.This is connected to the fact that the central spotty pattern evolves on a muchslower time-scale. Presumably it is because the later observation times contain thespotty pattern in the central region instead of the stripe-like pattern occurring inthis region in the initial simulated data, which is based on the initial guess for theparameters and contains the stripe-like pattern in the central region at all times.When only a relatively small number of observations are included that have a spottypattern there is a bias towards a non-spotty pattern in the central region, leadingto the recovery above.Once more late observations are included, which all contain the spotty pattern,there will be a bias towards finding parameters that recreate this pattern and for155Chapter 8. 
Numerical Experimentthat the parameters used as an initial guess are simply too far from the true values.As a result, the minimization procedure ends up at a local minimum that is too farfrom the actual minimum, leading to a poor recovery.156Chapter 9ConclusionWe have systematically derived the expressions required by the discrete adjoint methodfor several high-order time-stepping methods. Our results for IMEX, staggered andexponential time-stepping methods are new and can potentially play an importantrole in the field of large-scale model calibration problems involving nonlinear time-dependent PDEs that are solved using any of these methods. While the adjoint meth-ods for regular LM and RK methods have already been found previously, we haveprovided details on their implementation and that of several associated expressions.The discrete adjoint method was used to derive algorithms for computing thesensitivity matrix with respect to three distinct sets of parameters: the model pa-rameters, source parameters, and the initial condition. The expression for the Hessianof the data misfit with respect to these parameters was also found. These formulasrequire the solution of the adjoint problem, which is solved using an adjoint time-stepping scheme corresponding to the forward scheme. The methods also require thefirst and, when computing the Hessian, second derivatives of the time-stepping vector,and these have been addressed in detail.Our formulations assumed some generic data misfit M, but we also derived thederivatives of three discrete data misfit functions that are applicable to seismic imag-ing and related fields. In particular, the interferometric misfit is an interesting ideathat might be well-suited to certain types of seismic imaging problems. We hope thathaving the gradient of with respect to the data available will aid in the further studyof this misfit.Not discussed in this thesis but an important contribution nonetheless is that157Chapter 9. Conclusionwe have implemented the results from this study in MATLAB. We use an object-oriented approach, and the modular design allows for easy addition of new time-stepping methods and PDEs. This code will be used to apply our results to modelcalibration problems in a variety of applications.One such application of interest to us is pattern formation when the model param-eters exhibit spatial variability, and we have applied the techniques from this studyto a simple problem involving the Swift-Hohenberg model. A more in-depth exami-nation into parameter estimation in pattern formation problems will be the subjectof future work.A simple experiment reveals that the adjoint exponential integrator and the com-puted gradient of the misfit function have the same order of accuracy as the cor-responding exponential integrator. This also holds for other time-stepping methodswhen applied to other simple problems, although we have not shown these results dueto space constraints. This of course does not constitute a general proof that the orderof accuracy of the adjoint time-stepping method matches that of the correspondingforward method. 
However, for the case when the PDE is linear we have given a simpleargument for Runge-Kutta methods to show that this is indeed the case in general.For exponential time-stepping methods we have found that if the linear opera-tor depends on the solution yk at the current time level (as is often the case withRosenbrock-type methods) or the model parameters, the derivatives of the ϕ-functionswith respect to yk and m introduce significant computational overhead. The extraexpense introduced in this case could be prohibitive in some applications, and it mightthen be more reasonable to instead apply an IMEX method to the semi-linearizedPDE. The use of IMEX methods in the context of parameter estimation will be thesubject of a future investigation.The time-stepping methods we considered can broadly be classified as either linearmultistep type methods or Runge-Kutta type methods. The internal stages of theRK method are needed when computing the derivatives of M, so these will have to158Chapter 9. Conclusionbe stored, leading to an increase in storage overhead. For explicit RK methods thelast internal stage does not need to be stored since it will not be used. LM methodsdo not have internal stages that need to be stored, however explicit higher-order RKmethods allow for larger time steps than Adams methods so they may still lead toless required storage space overall.Regular, IMEX, staggered and exponential time integration methods are four ofthe main time-stepping methods that lend themselves to high-order time-stepping,but our treatment is not exhaustive and there are of course others. General linearmethods (GLMs) [14, 38, 46, 55] combine the multistage approach of RK methodswith the multistep approach of LM methods. In predictor-corrector methods [16] aninitial prediction step estimates the solution at a given time-step and is followed bya correction step that then refines the approximated solution. Geometric integrators[53] are an important method for solving Hamiltonian ODE systems since they pre-serve the structure of the phase space of the ODE. These methods can be tackledwith the same techniques we have used here.As mentioned, our results can of course also be applied to lower-order methodsin the time-stepping families we have discussed. There are several other lower-ordermethods, usually specialized to certain types of applications, that we do not considerhere, for instance Verlet integration and the Newmark-beta method. We also onlyconsider time-stepping methods for ODE systems in first-order form, so this excludesmethods designed for use with second-order ODE systems such as Beeman’s algorithmand Runge-Kutta-Nyström methods.159Bibliography[1] M. Alexe and A. Sandu. Forward and Adjoint Sensitivity Analysis with Contin-uous Explicit Runge-Kutta Schemes. Applied Mathematics and Computation,208(2):328–334, 2009.[2] M. Alexe and A. Sandu. On the Discrete Adjoints of Variable Step TimeIntegrators. Journal of Computational and Applied Mathematics, 233:1005–1020, 2009.[3] M. Alexe and A. Sandu. Space-time Adaptive Solution of Inverse Problemswith the Discrete Adjoint Method. Journal of Computational Physics, 270:21–39, 2014.[4] T. Apel and T. G. Flaig. Crank-Nicolson schemes for optimal control problemswith evolution equations. SIAM Journal on Numerical Analysis, 50:1484–1512,2012.[5] U. M. Ascher. Numerical Methods for Evolutionary Differential Equations.SIAM, 2008.[6] U. M. Ascher and L. Petzold. Computer Methods for Ordinary DifferentialEquations and Differential-Algebraic Equations. 
SIAM, 1998.[7] U.M. Ascher, S.J. Ruuth, and R.J. Spiteri. Implicit-Explicit Runge-Kutta meth-ods for time-dependent partial differential equations. Applied Numerical Math-ematics, 25:151–167, 1997.160Bibliography[8] U.M. Ascher, S.J. Ruuth, and B.T.R. Wetton. Implicit-Explicit methods fortime-dependent PDE’s. SIAM J. Numer. Anal., 32:797–823, 1995.[9] R. C. Aster, B. Borchers, and C. H. Thurber. Parameter Estimation and InverseProblems. Academic Press, 2 edition, 2012.[10] L. Biegler, G. Biros, O. Ghattas, and M. Heinkenschloss et al., editors. LargeScale Inverse Problems and Quantification of Uncertainty. John Wiley & Sons,2010.[11] H. G. Bock, T. Carraro, W. Jäger, S. Körkel, R. Rannacher, and J. Schlöder, ed-itors. Model Based Parameter Estimation: Theory and Applications. Springer,2013.[12] J. F. Bonnans and J. Laurent-Varin. Computation of order conditions for sym-plectic partitioned Runge-Kutta schemes with application to optimal control.Numerische Mathematik, 103:1–10, 2006.[13] P. Brenner, M. Crouzeix, and V. Thomée. Single step methods for inhomoge-neous linear differential equations in Banach space. RAIRO Analyse Numerique,16:5–26, 1982.[14] J. Butcher. A Modified Multistep Method for the Numerical Integration ofOrdinary Differential Equations. Journal of the ACM (JACM), (12):124–135,1965.[15] J. C. Butcher. The Numerical Analysis of Ordinary Differential Equations:Runge-Kutta and General Linear Methods. John Wiley, Chichester, UK, 1987.[16] J. C. Butcher. Numerical Methods for Ordinary Differential Equations. JohnWiley, Chichester, UK, 2003.[17] M. Cara and J. J. Leveque. Waveform inversion using secondary observables.Geophysical Research Letters, 14(10):1046–1049, 1987.161Bibliography[18] J. Certaine. The solution of ordinary differential equations with large timeconstants. In A. Ralston and H.S. Wilf, editors, Mathematical Methods forDigital Computers. Wiley, New York, 1960.[19] G. Chavent. Identification of function parameters in partial differential equa-tions. In R.E. Goodson and American Automatic Control Council, editors,Identification of Parameters in Distributed Systems: Symposium, Joint Auto-matic Control Conference, Austin, June 1974, Papers. American Society ofMechanical Engineers, New York, 1974.[20] P. Chen, L. Zhao, and T. H. Jordan. Full 3D tomography for the crustalstructure of the Los Angeles region. Bulletin of the Seismological Society ofAmerica, 97:1094–1120, 2007.[21] I-C. Chou and E. O. Voit. Recent developments in parameter estimation andstructure identification of biochemical and genomic systems. Mathematical Bio-sciences, 219:57–83, 2009.[22] A. Cioaca, M. Alexe, and A. Sandu. Second Order Adjoints for SolvingPDE-Constrained Optimization Problems. Optimization Methods and Software,27:625–653, 2012.[23] A. Cioaca and A. Sandu. An Optimization Framework to Improve 4DVar DataAssimilation System Performance. Journal of Computational Physics, 275:377–389, 2014.[24] A. Cioaca, A. Sandu, and E. de Sturler. Efficient Methods for ComputingObservation Impact in 4D-Var Data Assimilation. Computational Geosciences,17:975–990, 2013.[25] A. Cioaca, A. Sandu, E. de Sturler, and E. Constantinescu. Efficient compu-tation of observation impact in 4D-Var data assimilation. In A.M. Dienstfrey162Bibliographyand R.E. Boisvert, editors, Uncertainty Quantification in Scientific Computing,IFIP Advances in Information and Communication Technology, pages 250–263.Springer, 2012.[26] S. M. Cox and P. C. Matthews. 
Exponential Time Differencing for stiff systems.Journal of Computational Physics, 176:430–455, 2002.[27] M. Crouzeix. Une méthode multipas implicite-explicite pour l’approximationdes équations d’évolution paraboliques. Numerische Mathematik, 35:257–276,1980.[28] S. Daescu, G.R. Carmichael, and A. Sandu. Adjoint Implementation of Rosen-brock Methods Applied to Variational Data Assimilation. In S.E. Gryningand F.A. Schiermeier, editors, Air Pollution Modeling and its Application XIV,pages 361–369. Springer, 2000.[29] G. Dahlquist. Convergence and stability in the numerical integration of ordinarydifferential equations. Mathematica Scandinavica, (3):33–53, 1956.[30] V. Damian, A. Sandu, M. Damian, F. Potra, and G.R. Carmichael. The KineticPreProcessor KPP: A Software Environment for Solving Chemical Kinetics.Computers and Chemical Engineering, 26:1567–1579, 2002.[31] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Opti-mization and Nonlinear Equations. SIAM, 1996.[32] A. L. Dontchev, W. W. Hager, and V. M. Veliov. Second-order Runge- Kuttaapproximations in control constrained optimal control. SIAM Journal on Nu-merical Analysis, 38:202–226, 2000.[33] A. M. Dziewonski, J. Mills, and S. Bloch. Residual dispersion measurement - anew method of surface wave analysis. Bulletin of the Seismological Society ofAmerica, 62:129–139, 1972.163Bibliography[34] M. Eiermann and O. G. Ernst. A restarted Krylov subspace method for theevaluation of matrix functions. SIAM Journal of Numerical Analysis, 44:2481–2504, 2006.[35] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems.Kluwer, 1996.[36] A. Fichtner. Full Seismic Waveform Modeling and Inversion. Springer, 2011.[37] B. Fornberg. High-order finite differences and the pseudospectral method onstaggered grids. SIAM Journal of Numerical Analysis, 27:904–918, 1990.[38] C. W. Gear. Hybrid Methods for Initial Value Problems in Ordinary DifferentialEquations. Society for Industrial and Applied Mathematics, (2):69–86, 1965.[39] C. W. Gear. Numerical Initial Value Problems in Ordinary Differential Equa-tions. Prentice-Hall, 1971.[40] L. S. Gee and T. H. Jordan. Generalized seismological data functionals. Geo-physical Journal International, 111:363–390, 1992.[41] M. Ghrist and B. Fornberg. Two Results Concerning the Stability of StaggeredMultistep Methods. SIAM Journal of Numerical Analysis, 50(4):1849–1860,2012.[42] M. Ghrist, B. Fornberg, and T. A. Driscoll. Staggered Time Integrators forWave Equations. SIAM Journal of Numerical Analysis, 38(3):718–741, 2000.[43] M. L. Ghrist. High-Order Finite Difference Methods for Wave Equations. PhDthesis, University of Colorado, Boulder, Colorado, 2000.[44] J. P. Gollub and J. S. Langer. Pattern formation in nonequilibrium physics.Revisions of Modern Physics, 71:S396–S403, 1999.164Bibliography[45] T. Gou, K. Singh, A. Sandu, A. Hakami, P. Percell, T. Chai, D. Byun, andJ. Seinfeld. CMAQ_ADJ v4.5.4: An adjoint model for EPA’s Community Mul-tiscale Air Quality (CMAQ). http://people.cs.vt.edu/~asandu/Software/CMAQ_ADJ, 2008–2010.[46] W. B. Gragg and H. J. Stetter. Generalized Multistep Predictor-CorrectorMethods. Journal of the ACM (JACM), (11):188–209, 1964.[47] R. Griesse and A. Walther. Evaluating Gradients in Optimal Control: Continu-ous Adjoints versus Automatic Differentiation. Journal of Optimization Theoryand Applications, 122:63–86, 2004.[48] A. Griewank and A. Walther. 
Evaluating Derivatives: Principles and Tech-niques of Algorithmic Differentiation, volume 105 of Other Titles in AppliedMathematics. SIAM, 2nd edition, 2008.[49] M. D. Gunzburger. Perspectives in Flow Control and Optimization. SIAM,2002.[50] S. Güttel. Rational krylov approximation of matrix functions: Numerical meth-ods and optimal pole selection. GAMM Mitteilungen, 36:8–31, 2013.[51] E. Haber. Computational Methods in Geophysical Electromagnetics. SIAM,2014.[52] W. W. Hager. Runge-Kutta methods in optimal control and the transformedadjoint system. Numerische Mathematik, 87:247–282, 2000.[53] E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration. Num-ber 31 in Springer Series in Computational Mathematics. Springer, 2002.[54] E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equa-tions I: Nonstiff Problems. Springer, 2 edition, 1993.165Bibliography[55] E. Hairer and G. Wanner. Multistep-multistage-multiderivative methods forordinary differential equations. Computing, (11):287–303, 1973.[56] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff andDifferential-Algebraic Problems. Springer, 2 edition, 1996.[57] A. Hakami, D.K. Henze, J.H. Seinfeld, K. Singh, A. Sandu, S. Kim, D. Byun,and Q. Li. The Adjoint of CMAQ. Environmental Science and Technology,41:7807–7817, 2007.[58] M. Heinkenschloss. Numerical solution of implicitly constrained optimizationproblems. Technical Report TR08-05, Rice University, Department of Compu-tational and Applied Mathermatics, 2013.[59] M. Herty, L. Pareschi, and S. Steffensen. Implicit-Explicit Runge-KuttaSchemes for Numerical Discretization of Optimal Control Problems. SIAMJournal on Numerical Analysis, 51:1875–1899, 2013.[60] M. Herty and V. Schleper. Time discretizations for numerical optimization ofhyperbolic problems. App. Math. Comp., 218:183–194, 2011.[61] M. Hochbruck and C. Lubich. On Krylov subspace approximations to the ma-trix exponential operator. SIAM Journal of Numerical Analysis, 34:1911–1925,1997.[62] M. Hochbruck and A. Ostermann. Explicit exponential RungeâĂŞ Kutta meth-ods for semilinear parabolic problems. SIAM Journal of Numerical Analysis,43:1069–1090, 2005.[63] M. Hochbruck and A. Ostermann. Exponential Integrators. Acta Numerica,19:209–286, 2010.[64] M. Hochbruck, A. Ostermann, and J. Schweitzer. Exponential Rosenbrocktypemethods. SIAM Journal of Numerical Analysis, 47:786–803, 2009.166Bibliography[65] M. Hochbruck and J. van den Eshof. Explicit integrators of Rosenbrock-type.Oberwolfach Reports, 3:1107–1110, 2006.[66] V. Isakov. Inverse Problems for Partial Differential Equations. Springer, 2ndedition, 2006.[67] V. Jugnon and L. Demanet. Interferometric inversion: A robust approach tolinear inverse problems. In SEG Technical Program Expanded Abstracts 2013,pages 5180–5184. SEG, 2013.[68] A.-K. Kassam and L. N. Trefethen. Fourth-order time-stepping for stiff PDEs.SIAM Journal of Scientific Computing, 26:1214–1233, 2005.[69] C. Y. Kaya. Inexact Restoration for Runge-Kutta Discretization of OptimalControl Problems. SIAM Journal of Numerical Analysis, 48:1492–1517, 2010.[70] C. A. Kennedy and M. H. Carpenter. Additive Runge-Kutta schemesfor convection-diffusion-reaction equations. Applied Numerical Mathematics,44:139–181, 2003.[71] J. B. Kool, J. C. Parker, and M. T. van Genuchten. Parameter estimationfor unsaturated flow and transport models: A review. Journal of Hydrology,91:255–293, 1986.[72] S. Krogstad. Generalized integrating factor methods for stiff PDEs. 
Journal ofComputational Physics, 203:72–88, 2005.[73] J. D. Lambert. Computational Methods in Ordinary Differential Equations.John Wiley, 1973.[74] J. D. Lambert. Numerical Methods for Ordinary Differential Equations. JohnWiley, 1991.167Bibliography[75] J.R. Leis and M.A. Kramer. ODESSA - an ordinary differential equation solverwith explicit simultaneous sensitivity analysis. ACM Transactions on Mathe-matical Software, 14(1):61–75, 1986.[76] A. L. Lerner-Lam and T. H. Jordan. Earth structure from fundamental andhigher-mode waveform analysis. Geophysical Journal of the Royal AstronomicalSociety, 75:759–797, 1983.[77] R. J. LeVeque. Finite Difference Methods for Ordinary and Partial DifferentialEquations: Steady State and Time Dependent Problems. SIAM, Philedelphia,2007.[78] V. T. Luan and A. Ostermann. Exponential B-series: The stiff case. SIAMJournal of Numerical Analysis, 51:3431–3445, 2013.[79] V. T. Luan and A. Ostermann. Explicit exponential Runge-Kutta methodsof high order for parabolic problems. Journal of Computational and AppliedMathematics, 256:168–179, 2014.[80] V. T. Luan and A. Ostermann. Stiff order conditions for exponential Runge-Kutta methods of order five. In H. G. Bock et al., editor, Modeling, Simulationand Optimization of Complex Processes - HPSC 2012. Springer, 2014.[81] Y. Luo and G. T. Schuster. Wave-equation traveltime inversion. Geophysics,56:645–653, 1991.[82] P. K. Maini, K. J. Painter, and H. N. P. Chau. Spatial pattern formation inchemical and biological systems. Faraday Transactions, 93:3601–3610, 1997.[83] D. Meidner and B. Vexler. A priori error analysis of the Petrov-Galerkin Crank-Nicolson scheme for Parabolic Optimal Control Problems. SIAM Journal onControl and Optimization, 49:2183–2211, 2011.168Bibliography[84] D. Murai and T. Koto. On Time Staggering for Wave Equations. Journal ofComputational and Applied Mathematics, 235(14):4251–4264, 2011.[85] R. Neidinger. Introduction to Automatic Differentiation and MATLAB Object-Oriented programming. SIAM Review, 52:545–563, 2010.[86] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2nd edition,2006.[87] D. Oldenburg, E. Haber, and R. Shekhtman. Three dimensional inversion ofmulti-source time domain electromagnetic data. J. Geophysics, 78:E47–E57,2013.[88] A. Ostermann and M. Roche. Runge-Kutta methods for partial differentialequations and fractional orders of convergence. Mathematics of Computing,59:403–420, 1992.[89] K. M. Owolabi and K. C. Patidar. Higher-order time-stepping methods for time-dependent reactionâĂŞdiffusion equations arising in biology. Applied Mathemat-ics and Computation, 240:30–50, 2014.[90] F.-E. Plessix. A review of the adjoint-state method for computing the gradientof a functional with geophysical applications. Geophys. J. Int., 167:495–503,2006.[91] D. A. Pope. An exponential method of numerical integration of ordinary differ-ential equations. Communications of the Association for Computing Machinery,6:491–493, 1963.[92] V. Rao and A. Sandu. A-posteriori Error Estimates for Inverse Problems.SIAM/ASA Journal on Uncertainty Quantification, 3:737–761, 2015.[93] K. Rothauge, E. Haber, and U.M. Ascher. The Discrete Adjoint Method forExponential Integration. 2016. Submitted.169Bibliography[94] K. Rothauge, E. Haber, and U.M. Ascher. The Discrete Adjoint Method forLarge-Scale Optimization Problems with Linear Time-Dependent PDE Con-straints. 2016. Submitted.[95] M. P. Rumpfkeil and D. J. Mavriplis. 
Efficient Hessian Calculations usingAutomatic Differentiation and the Adjoint Method with applications. AIAAJournal, 48:2406–2417, 2010.[96] S. J. Ruuth. Implicit-explicit methods for reaction-diffusion problems inpattern-formation. J. Math. Biol., 34:148–176, 1995.[97] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2nd edition, 2003.[98] S. Salmia and J. Toivanena. IMEX schemes for pricing options under jump-diffusion models. Applied Numerical Mathematics, 84:33–45, 2014.[99] A. Sandu. Sensitivity Analysis of ODE via Automatic Differentiation. Master’sthesis, University of Iowa, Iowa City, Iowa, 1997.[100] A. Sandu. On the Properties of Runge-Kutta Discrete Adjoints. In InternationalConference for Computational Science (ICCS-2006), volume 3994 of LectureNotes in Computer Science, pages 550–557, Berlin, Germany, 2006. Springer.[101] A. Sandu. Reverse Automatic Differentiation of Linear Multistep Methods. InC.H. Bischof, H.M. Bucker, P. Hovland, U. Naumann, and J. Utke, editors,Advances in Automatic Differentiation, volume 10 of Lecture Notes in Compu-tational Science and Engineering, pages 1–11. Springer, 2008.[102] A. Sandu. Solution of Inverse Problems using Discrete ODE Adjoints. InL. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick,L. Tenorio, B. van Bloemen Waanders, and K. Willcox, editors, Large ScaleInverse Problems and Quantification of Uncertainty, Advances in AutomaticDifferentiation, chapter 16, pages 345–364. John Wiley & Sons, 2010.170Bibliography[103] A. Sandu, D. Daescu, and G.R. Carmichael. Direct and Adjoint SensitivityAnalysis of Chemical Kinetic Systems with KPP: I - Theory and Software Tools.Atmospheric Environment, 37:5083–5096, 2003. URL: http://dx.doi.org/10.1016/j.atmosenv.2003.08.019.[104] A. Sandu, J. Linford, and V. Rao. MatlODE: Matlab Integration Package forStiff ODEs. http://people.cs.vt.edu/~asandu/Software/MatlODE/matlode.html, 2011–2015.[105] A. Sandu and P. Miehe. Forward, Tangent Linear, and Adjoint Runge-KuttaMethods in KPP-2.2 for Efficient Chemical Kinetic Simulations. InternationalJournal of Computer Mathematics, 87:2458–2479, 2010.[106] A. Sandu and L. Zhang. Discrete Second Order Adjoints in Atmospheric Chem-ical Transport Modeling. Journal of Computational Physics, 227:5949–5983,2008.[107] J. M. Sanz-Serna. Symplectic Runge-Kutta Schemes for Adjoint Equations,Automatic Differentiation, Optimal Control and More. SIAM Review, 58:3–33,2016.[108] J. M. Sanz-Serna, J. G. Verwer, and W. H. Hundsdorfer. Convergence and orderreduction of Runge-Kutta schemes applied to evolutionary problems in partialdifferential equations. Numerische Mathematik, 50:405–418, 1986.[109] T. Schmelzer and L. N. Trefethen. Evaluating matrix functions for exponentialintegrators via Carathéodory-Fejér approximation and contour integral. Elec-tronic Transactions on Numerical Analysis, 29:1–18, 2007.[110] R. Serban and A.C. Hindmarsh. CVODES, the sensitivity-enabled ODE solverin SUNDIALS. Technical Report UCRL-PROC-210300, Lawrence LivermoreNational Laboratory, 2003.171Bibliography[111] K. Sigloch, N. McQuarrie, and G. Nolet. Two-stage subduction history undernorth america inferred from multiple-frequency tomography. Nature Geoscience,1:458–462, 2008.[112] K. Sigloch and G. Nolet. Measuring finite-frequency body-wave amplitudes andtraveltimes. Geophysical Journal International, 167:271–287, 2006.[113] K. Singh, P. Eller, A. Sandu, D. Henze, K. Bowman, M. Kopacz, and M. Lee. To-wards the Construction of a Standard Adjoint GEOS-Chem Model. 
In Proceed-ings of the 2009 Spring Simulation Multiconference, High Performance Comput-ing Symposium, San Diego, CA, USA, 2009. Society for Computer SimulationInternational.[114] K. Singh, P. Eller, A. Sandu, D. Henze, M. Kopacz, and K. Bowman. GEOS-Chem ADJOINT v7: An adjoint model for Harvard’s GEOS-Chem. http://people.cs.vt.edu/~asandu/Software/GC_AD, 2008–2010.[115] Z. Sirkes and E. Tziperman. Finite difference of adjoint or adjoint of finitedifference? Monthly Weather Review, 125:3373–3378, 1997.[116] C. V. Stewart. Robust parameter estimation in computer vision. SIAM Review,41(3):513–537, 1999.[117] J. Swift and P.C. Hohenberg. Hydrodynamic fluctuations at the convectiveinstability. Phys. Rev. A., 15:319–328, 1977.[118] A. Tarantola. Inverse Problem Theory and Methods for Model Estimation.SIAM, 2004.[119] I. M. Tibuleac, G. Nolet, C. Michaelson, and I. Koulakov. P wave amplitudesin a 3-D Earth. Geophysical Journal International, 155:1–10, 2003.172Bibliography[120] M. Tokman. Efficient integration of large stiff systems of ODEs with exponen-tial propagation iterative (EPI) methods. Journal of Computational Physics,213(2):748–776, 2005.[121] M. Tokman. A new class of exponential propagation iterative methods of Runge-Kutta type (EPIRK). Journal of Computational Physics, 230(24):8762–8778,2011.[122] L. N. Trefethen, J. A. C. Weideman, and T. Schmelzer. Talbot Quadraturesand Rational Approximations. BIT, 46:653–670, 2006.[123] K. van den Doel, U. Ascher, and E. Haber. The lost honour of ℓ2-based regu-larization. Radon Series in Computational and Applied Math, 2013. M. Cullen,M. Freitag, S. Kindermann and R. Scheinchl (Eds).[124] J. M. Varah. Stability restrictions on second order, three-level finite-differenceschemes for parabolic equations. SIAM J. Numer. Anal., 17:300–309, 1980.[125] J. G. Verwer. On Time Staggering for Wave Equations. Journal of ScientificComputing, 33(2):139–154, 2007.[126] J. G. Verwer. Time Staggering for Wave Equations Revisited. CWI-reportMAS-E0710, 2007.[127] C. R. Vogel. Computational Methods for Inverse Problems. SIAM, Philadelphia,2002.[128] A. Walther. Automatic differentiation of explicit Runge-Kutta methods for op-timal control. Computational Optimization and Applications, 36:83–108, 2007.[129] R. Wang, O. Yilmaz, and F. J. Herrmann. Full waveform inversionwith interferometric measurements. Technical Report TR-EOAS-2014-5, 04 2014. URL: https://www.slim.eos.ubc.ca/Publications/Public/TechReport/2014/wang2014SEGfwi/wang2014SEGfwi.html.173[130] H. Weller, S.-J. Lock, and N. Wood. Runge-Kutta IMEX schemes for theHorizontally Explicit/Vertically Implicit (HEVI) solution of wave equations.Journal of Computational Physics, 252:365–381, 2013.[131] H. Zhang. Efficient Time Stepping Methods and Sensitivity Analysis for LargeScale Systems of Differential Equations. PhD thesis, Virginia Polytechnic In-stitute and State University, Blacksburg, Virginia, 8 2014.[132] H. Zhang and A. Sandu. FATODE: A Library for Forward, Adjoint, and Tan-gent Linear Integration of ODEs. Technical Report TR-11-25, Department ofComputer Science, Virginia Polytechnic Institute and State University, 2011.[133] H. Zhang and A. Sandu. FATODE: A Library for Forward, Adjoint, andTangent Linear Integration of ODEs. http://people.cs.vt.edu/~asandu/Software/FATODE/index.html, 2011–2015.[134] H. Zhang and A. Sandu. FATODE: A Library for Forward, Adjoint, andTangent Linear Integration of ODEs. SIAM Journal on Scientific Computing,36:504–523, 2014. URL: http://dx.doi.org/10.1137/130912335.[135] C. 
Zhou, W. Cai, Y. Luo, G. Schuster, and S. Hassanzadeh. Acoustic wave-equation traveltime and waveform inversion of crosshole seismic data. Geo-physics, 60:765–773, 1995.174Appendix ADerivations of the Derivatives of tIn this appendix we give the derivations of the derivatives of the time-stepping vectort(y,m,y0) (A.1)with respect to the forward solution y, the initial condition y0 and the model param-eters m. We also find expressions for the products of the first derivatives, and theirtransposes, with appropriately-sized arbitrary vectors. These expressions constitutean important part in the calculations of the action of the sensitivity matrix or thecomputation of the gradient or the action of the Hessian. In this chapter y is takento be fixed with respect to y0 and m.It aid readability, if x, y, z and w are some vector quantities of appropriate size,we let∂2x∂y∂zw =∂∂y(∂x∂zw∣∣∣∣w),At times we will use the conventionxz =∂x∂z.175A.1. Regular Time-Stepping MethodsA.1 Regular Time-Stepping MethodsA.1.1 Linear Multistep MethodsThe time-stepping vector for a generic s-step LM method was found in (2.7) in Section2.1.1:t (y,m,y0) = Ay− τBf (y,m,y0) +αy0 − τ β f0 (m,y0) .We have now included the dependence of t on m; y is independent of m and y0.Recall the definitions of the matrices A and B, defined in (2.1) and (2.3), and αand β, defined in (2.4). Also recall that y =[y⊤1 · · · y⊤K]⊤. Let σ = min (k, s)and k⋆ = max (1, k − s). In this subsection we also let w =[w⊤1 · · · w⊤K]⊤andv =[v⊤1 · · · v⊤K]⊤be arbitrary vectors of length KN , wm and vm arbitrary vectorsof length Nm, and w0 and v0 arbitrary vectors of length N .Let us now take the derivative of t = t (y,m,y0) with respect to y, m and y0 inturn:• ∂ t∂yWe have∂ t∂y=[∂ t∂y1∂ t∂y2· · · ∂ t∂yK]= A− τB∂ f1∂y. . .∂ fK∂y .(A.2)It is easy to see that∂ t∂yhas the form of (2.1) and (2.3), with the (k, j) N × Nblock being∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fj∂yif k⋆ ≤ j ≤ k0N×N otherwise.(A.3)176A.1. Regular Time-Stepping Methods◦ The product of ∂ tk∂ywith w is∂ tk∂yw =k∑j=k⋆α(σ)k−jwj − τk∑j=k⋆β(σ)k−j∂ fj∂ywj . (A.4)◦ The product of ∂ tk∂yj⊤with wk is∂ tk∂yj⊤wk = α(σ)k−jwk − τ β(σ)k−j∂ fj∂y⊤wk (A.5)and therefore the product of∂ t∂yj⊤with w is∂ t∂yj⊤w =min(K,j+s)∑k=jα(σ)k−jwk − τmin(K,j+s)∑k=jβ(σ)k−j∂ fj∂y⊤wk. (A.6)• ∂ t∂mDifferentiating t with respect to m gives∂ t∂m=∂ t∂m= −τ(B∂ f∂m+ β∂ f0∂m)(A.7)with∂ f∂m=∂ f1 (m)∂m...∂ fK (m)∂m . (A.8)Therefore the kth N ×Nm block of (A.7) is∂ tk∂m= −τk∑j=k⋆β(σ)k−j∂ fj∂m− τ β(k)k∂ f0∂m(A.9)177A.1. Regular Time-Stepping Methodswith β(k)k = 0 if k > s.◦ The product of (A.9) with wm is∂ tk∂mwm = −τk∑j=k⋆β(σ)k−j∂ fj∂mwm − τ β(k)k∂ f0∂mwm. (A.10)◦ The product of the transpose with the k the block of w is∂ tk∂m⊤wk = −τk∑j=k⋆β(σ)k−j∂ fj∂m⊤wk − τ β(k)k∂ f0∂m⊤wk. (A.11)The product of the transpose of∂ tk∂mwith w hence is∂ t∂m⊤w = −τK∑k=1k∑j=k⋆β(σ)k−j∂ fj∂m⊤wk − τ ∂ f0∂m⊤ s∑k=1β(k)k wk. (A.12)• ∂ t∂y0This derivative t with respect to y0 is∂ t∂y0=∂ t∂y0= α− τ β ∂ f0∂y(A.13)and the kth N ×N block is∂ tk∂y0= α(k)k IN − τ β(k)k∂ f0∂y, (A.14)with α(k)k = β(k)k = 0 if k > s.◦ The product of (A.14) with w0 is∂ tk∂y0w0 = α(k)k w0 − τ β(k)k∂ f0∂yw0. (A.15)178A.1. Regular Time-Stepping Methods◦ The product of the transpose with the kth block of w is∂ tk∂y0⊤wk = α(k)k wk − τ β(k)k∂ f0∂y⊤wk. (A.16)The product of the transpose of∂ t∂y0with w therefore is∂ t∂y0⊤w =s∑k=1α(k)k wk − τ∂ f0∂y⊤ s∑k=1β(k)k wk. 
(A.17)Derivatives of∂ t∂yWe will now find expressions for the derivatives of (A.4)∂ tk∂yw =k⋆∑j=kα(σ)k−jwj − τk⋆∑j=kβ(σ)k−j∂ fj∂ywjwith respect to m, y0 and y.• ∂2t∂m∂ywDifferentiating (A.4) with respect to m gives the kth N ×Nm block of ∂2t∂m∂yw:∂ 2tk∂m∂yw = −τk∑j=k⋆β(σ)k−j∂ 2fj∂m∂ywj. (A.18)◦ The product of (A.18) with vm is(∂ 2tk∂m∂yw)vm = −τk∑j=k⋆β(σ)k−j(∂ 2fj∂m∂ywj)vm. (A.19)179A.1. Regular Time-Stepping Methods◦ The product of its transpose with the kth block of v is(∂ 2tk∂m∂yw)⊤vk = −τk∑j=k⋆β(σ)k−j(∂ 2fj∂m∂ywj)⊤vk. (A.20)• ∂2t∂y0∂ywThe derivative of (A.4) with respect to y0 is∂ 2tk∂y0∂yw = 0N×N . (A.21)• ∂2t∂y∂ywConsider the jth N ×N block of∂ 2tk∂y∂yw =[∂ 2tk∂y1∂yw∂ 2tk∂y2∂yw · · · ∂2tk∂yK ∂yw].Taking the derivative of (A.4) with respect to yj gives the (k, j)th N ×N block of∂ 2t∂y∂yw,∂ 2tk∂yj ∂yw =−τβ(σ)k−j∂ 2fj∂y∂ywj if k⋆ ≤ j ≤ k0N×N otherwise.(A.22)◦ The product of ∂2tk∂yj ∂yw with the jth block of v trivially is(∂ 2tk∂yj ∂yw)vj =−τβ(σ)k−j(∂ 2fj∂y∂ywj)vj if k⋆ ≤ j ≤ k0N×1 otherwise,(A.23)180A.1. Regular Time-Stepping Methodsso that (∂ 2tk∂y∂yw)v = −τk∑j=k⋆β(σ)k−j(∂ 2fj∂y∂ywj)vj . (A.24)◦ The product of the transpose of(∂ 2tk∂yj ∂yw)with the kth block of v is(∂ 2tk∂yj ∂yw)⊤vk =−τβ(σ)k−j(∂ 2fj∂y∂ywj)⊤vk if j ≤ k ≤ j⋆0N×1 otherwise.(A.25)with j⋆ = min(K, j + s).Derivatives of∂ t∂mThe derivatives of (A.10),∂ tk∂mwm = −τk∑j=k⋆β(σ)k−j∂ fj∂mwm − τ β(k)k∂ f0∂mwm= −τk∑j=min(0,k−s)β(σ)k−j∂ fj∂mwm,with respect to m, y0 and y are:• ∂2t∂m∂mwmTaking the derivative of (A.10) with respect to m gives the kth N × Nm block of∂ 2t∂m∂mwm∂ 2tk∂m∂mwm = −τk∑j=min(0,k−s)β(σ)k−j∂ 2fj∂m∂mwm. (A.26)◦ The product with vm is(∂ 2tk∂m∂mwm)vm = −τk∑j=min(0,k−s)β(σ)k−j(∂ 2fj∂m∂mwm)vm. (A.27)181A.1. Regular Time-Stepping Methods◦ The product of its transpose with the kth block of v is(∂ 2tk∂m∂mwm)⊤vk = −τk∑j=min(0,k−s)β(σ)k−j(∂ 2fj∂m∂mwm)⊤vk. (A.28)• ∂2t∂y0∂mwmDifferentiating (A.10) with respect to y0 immediately gives∂ 2tk∂y0∂mwm = −τ β(k)k∂ 2f0∂y∂mwm (A.29)with β(k)k = 0 if k > s.◦ The product with v0 is(∂ 2tk∂y0∂mwm)v0 = −τ β(k)k(∂ 2f0∂y∂mwm)v0. (A.30)◦ The product of its transpose with the kth block of vy is(∂ 2tk∂y0∂mwm)⊤vk = −τ β(k)k(∂ 2f0∂y∂mwm)⊤vk (A.31)with β(k)k = 0 if k > s.• ∂2t∂y∂mwmConsider the jth N ×N block of∂ 2tk∂y∂mwm =[∂ 2tk∂y1∂mwm∂ 2tk∂y2∂mwm · · · ∂2tk∂yK ∂mwm]. (A.32)182A.1. Regular Time-Stepping MethodsThe (k, j)th N ×N block of ∂2t∂y∂mwm is∂ 2tk∂yj ∂mwm =−τβ(σ)k−j∂ 2fj∂y∂mm if k⋆ ≤ j ≤ k0N×N otherwise.(A.33)◦ The product of ∂2tk∂yj ∂mwm with the jth block of v is(∂ 2tk∂yj ∂mwm)vj =−τβ(σ)k−j(∂ 2fj∂y∂mm)vj if k⋆ ≤ j ≤ k0N×1 otherwise.(A.34)◦ The product of the transpose of ∂2tk∂yj ∂mwm with the kth block of v is(∂ 2tk∂yj ∂mwm)⊤vk =−τβ(σ)k−j(∂ 2fj∂y∂mm)⊤vk if j ≤ k ≤ j⋆0N×1 otherwise.(A.35)with j⋆ = min(K, j + s).Derivatives of∂ t∂y0w0Finally, we find expressions for the derivatives of (A.15),∂ tk∂y0w0 = α(k)k w0 − τ β(k)k∂ f0∂yw0,where α(k)k = β(k)k = 0 if k > s, with respect to m, y0 and y:• ∂2t∂m∂y0w0183A.1. Regular Time-Stepping MethodsDifferentiating (A.15) with respect tom gives the kth N×Nm block of ∂2t∂m∂y0w0:∂ 2tk∂m∂y0w0 = −τ β(k)k∂ 2f0∂m∂yw0 (A.36)with β(k)k = 0 if k > s.◦ The product with vm is(∂ 2tk∂m∂y0w0)vm = −τ β(k)k(∂ 2f0∂m∂yw)vm. (A.37)◦ The product of the transpose with the kth block of v is(∂ 2tk∂m∂y0w0)⊤vk = −τ β(k)k(∂ 2f0∂m∂yw)⊤vk. 
(A.38)• ∂2t∂y0∂y0w0The kth N ×N block of ∂2t∂y0∂y0w0 is obtained by taking the derivative of (A.15)with respect to y0:∂ 2tk∂y0∂y0w0 = −τ β(k)k∂ 2f0∂y∂yw0 (A.39)with β(k)k = 0 if k > s.◦ The product with vm is(∂ 2tk∂y0∂y0w0)v0 = −τ β(k)k(∂ 2f0∂y∂yw)v0. (A.40)◦ The product of the transpose with the kth block of v is(∂ 2tk∂y0∂y0w0)⊤vk = −τ β(k)k(∂ 2f0∂y∂yw)⊤vk. (A.41)184A.1. Regular Time-Stepping Methods• ∂2t∂y∂y0w0Differentiating (A.15) with respect to y immediately gives∂ 2tk∂y∂y0w0 = 0N×KN . (A.42)A.1.2 Runge-Kutta MethodsIn Section 2.1.2 we found the following time-stepping equation for RK methods:t (y,y0) =IsNB⊤ INIsN−IN B⊤ IN. . . . . . . . .IsN−IN B⊤ IN︸ ︷︷ ︸TY1y1Y2y2...YKyK︸ ︷︷ ︸y−F1y0F20N×1...FK0N×1︸ ︷︷ ︸f (y,y0),where B = −τkbs ⊗ IN ∈ RsN×N ,Yk =Yk,1...Yk,s and Fk =Fk,1...Fk,s ,with Fk,σ = Fk,σ (y,y0) = f(yk,σ, tk−1 + cστk). We have let yk,σ:= yk−1+τk∑si=1 aσ,iYk,i.Note that T ∈ R(s+1)KN×(s+1)KN and t, f ,y ∈ R(s+1)KN×1. The kth time-step is rep-resented bytk = Yk − Fk−yk−1 +B⊤Yk + yk .185A.1. Regular Time-Stepping MethodsWe let ŷk =[Y⊤k y⊤k]⊤. We also let w =[ŵ⊤1 · · · ŵ⊤K]⊤and v =[v̂⊤1 · · · v̂⊤K]⊤be arbitrary vectors of length (s+ 1)KN , where ŵ⊤k and v̂⊤k are defined analogouslyto ŷ⊤k . Also, wm and vm are arbitrary vectors of length Nm, and w0 and v0 arearbitrary vectors of length N .We will now take the derivative of t with respect to each of y, m and y0 in turn.For this we need the derivatives of yk,σwith respect to yk−1 and Yk,i, which are∂yk,σ∂yk−1= IN and∂yk,σ∂Yk,i= τkaσiIN .• ∂ t∂yWe break the derivative of t with respect to y into parts:∂ t∂y=[∂ t∂ ŷ1∂ t∂ ŷ2· · · ∂ t∂ ŷK]= T−[∂ f∂ ŷ1∂ f∂ ŷ2· · · ∂ f∂ ŷK]. (A.43)If∂Fk∂ ŷj=[∂Fk∂Yj,1· · · ∂Fk∂Yj,s∂Fk∂yj]denotes the derivative of Fk with respect to ŷj, with 1 ≤ k, j ≤ K, and ∂Fk,σ∂ ŷj,idenotes the (σ, i)th N ×N block of ∂Fk∂ ŷj, with 1 ≤ σ ≤ s and 1 ≤ i ≤ s+ 1, then,using the chain rule,∂Fk,σ∂ ŷj=τk∂Fk,σ∂y([aσ,1 · · · aσ,s 0])⊗ IN if j = k∂Fk,σ∂y([0 · · · 0 1])⊗ IN if j = k − 10N×(s+1)N otherwise.(A.44)186A.1. Regular Time-Stepping MethodsLetting∂Fk∂y=∂Fk,1∂y. . .∂Fk,s∂y , (A.45)the (k, j)th (s+ 1)N × (s+ 1)N block of ∂ t∂ytherefore is∂ tk∂ ŷj=IsN − τk ∂Fk∂y (As ⊗ IN) 0sN×NB⊤ IN if k = j0sN×sN −∂Fk∂y (1s ⊗ IN)0N×sN −IN if j = k − 10(s+1)N×(s+1)N otherwise.(A.46)DefiningAk := IsN − τk ∂Fk∂y(As ⊗ IN)Ck := −∂Fk∂y(1s ⊗ IN) ,(A.47)we can then write∂ t∂y=A1B⊤ INC2 A2−IN B⊤ IN. . . . . . . . .CK AK−IN B⊤ IN, (A.48)which is a lower block-triangular matrix.187A.1. Regular Time-Stepping Methods◦ The kth (s+ 1)N × 1 block of the product ∂ t∂yw is∂ tk∂yw = AkWk +Ckwk−1B⊤Wk +wk −wk−1 (A.49)with w0 = 0N×1. If∂ tk,σ∂yw is the σth N ×1 block of ∂ tk∂yw, with 1 ≤ σ ≤ s+1,we have∂ tk,σ∂yw =Wk,σ − ∂Fk,σ∂ywk,σ if σ ≤ s−τks∑i=1biWk,i +wk −wk−1 if σ = s+ 1,(A.50)where wk,σ = wk−1 + τks∑i=1aσiWk,i.◦ The transpose of ∂ t∂yis upper block-triangular:∂ t∂y⊤=A⊤1 BIN C⊤2 −INA⊤2 B. . . . . . . . .IN C⊤K −INA⊤K BIN. (A.51)The kth (s+1)N×1 block of the product of ∂ t∂y⊤with some arbitraryw thereforeis∂ t∂ ŷk⊤w = A⊤kWk +BwkC⊤k+1Wk+1 +wk −wk+1 , (A.52)with WK+1,σ = wK+1 = 0N×1. The σth N × 1 block ∂ t∂ ŷkσ⊤w of∂ t∂ ŷk⊤w, with188A.1. 
Regular Time-Stepping Methods1 ≤ σ ≤ s+ 1, is∂ t∂ ŷkσ⊤w =Wkσ − τks∑i=1aiσ∂Fki∂y⊤Wki − τkbσwk if σ ≤ s−s∑i=1∂Fki∂y⊤Wk+1,i +wk −wk+1 if σ = s+ 1.(A.53)• ∂ t∂mTaking the derivative of t with respect to m,∂ t∂m=∂∂m(Ty − f) = − ∂ f∂m(A.54)with∂ f∂m=[∂F1∂m⊤0Nm×N · · ·∂Fk∂m⊤0Nm×N]⊤, (A.55)where∂Fk∂m=[∂Fk,1∂m⊤ ∂Fk,2∂m⊤· · · ∂Fk,s∂m⊤]⊤. (A.56)◦ The product ∂ tk∂mwm is simply∂ tk∂mwm = −∂Fk∂mwm0N×1 . (A.57)◦ The product ∂ tk∂m⊤ŵk is∂ tk∂m⊤ŵk = −s∑σ=1∂Fkσ∂m⊤Wkσ. (A.58)• ∂ t∂y0189A.1. Regular Time-Stepping MethodsTaking the derivative of t with respect to y0,∂ t∂y0=∂∂y0(Ty − f) = − ∂ f∂y0(A.59)with∂ f∂y0=∂F1∂y∂y1∂y0IN0(s+1)(K−1)N×N , where ∂y1∂y0 =∂y1,1∂y0...∂y1,s∂y0 = 1s ⊗ IN . (A.60)◦ The product ∂ tk∂y0w0 is∂ tk∂y0w0 =−∂F1∂y (1s ⊗w0)w0 if k = 10N×1 otherwise,. (A.61)◦ The product ∂ tk∂y0⊤ŵk is∂ tk∂y0⊤ŵk =−s∑σ=1∂F1,σ∂y⊤W1,σ −w1 if k = 10N×1 otherwise.(A.62)190A.1. Regular Time-Stepping MethodsDerivatives of∂ t∂yWe will now find expressions for the derivatives of the vector∂ tk,σ∂yw (A.50),∂ tk,σ∂yw =Wk,σ − ∂Fk,σ∂ywk,σ if σ ≤ s−τks∑i=1biWk,i +wk −wk−1 if σ = s+ 1,with respect to m, y0 and y.∂ tk,σ∂yw is the σth N × 1 block of ∂ tk∂yw, with 1 ≤ σ ≤s+ 1, which in turn is the kth (s+ 1)N × 1 block of ∂ t∂yw.• ∂2 t∂y∂ywConsider the jth term of∂ 2 tk∂y∂yw =[∂ 2 tk∂ ŷ1∂yw∂ 2 tk∂ ŷ2∂yw · · · ∂2 tk∂ ŷK ∂yw]. (A.63)Taking the derivative of (A.50) with respect to ŷj,∂ 2tk,σ∂ ŷj ∂yw =−∂2Fk,σ∂ ŷj ∂ywk,σ if 1 ≤ σ ≤ s0N×(s+1)N if σ = s + 1,(A.64)where wk,σ = wk−1 + τks∑i=1aσiWk,i, w0 = 0N×1 and the term∂ 2Fk,σ∂ ŷj ∂ywk,σ is∂ 2Fk,σ∂ ŷj ∂ywk,σ =τk∂ 2Fk,σ∂y∂ywk,σ([aσ,1:s 0]⊗ IN)if j = k∂ 2Fk,σ∂y∂ywk,σ([01×s 1]⊗ IN)if j = k − 10N×(s+1)N otherwise.(A.65)191A.1. Regular Time-Stepping Methods◦ The product of ∂2 tk∂ ŷj ∂yw with v̂j is(∂ 2 tk∂ ŷj ∂yw)v̂j =(∂ 2 tk,1∂ ŷj ∂yw)v̂j...(∂ 2 tk,s+1∂ ŷj ∂yw)v̂j , (A.66)with(∂ 2tk,σ∂ ŷk ∂yw)v̂k =−∂2Fk,σ∂y∂ywk,σ(τks∑i=1aσ,iVk,i)if 1 ≤ σ ≤ s0N×1 if σ = s+ 1,(∂ 2tk,σ∂ ŷk−1∂yw)v̂k−1 =−∂2Fk,σ∂y∂ywk,σvk−1 if 1 ≤ σ ≤ s0N×1 if σ = s + 1,(∂ 2tk,σ∂ ŷj ∂yw)⊤v̂j = 0N×1 for all other values of j,where v0 = 0N×1.◦ The product of the transpose of ∂2 tk∂ ŷj ∂yw with v̂k is(∂ 2 tk∂ ŷj ∂yw)⊤v̂k =(∂ 2 tk∂ ŷj,1∂yw)⊤v̂k...(∂ 2 tk∂ ŷj,s+1∂yw)⊤v̂k , (A.67)with(∂ 2tj∂ ŷj,σ ∂yw)⊤v̂j =−τjs∑i=1aiσ(∂ 2Fj,σ∂ ŷj ∂ywj,σ)⊤Vj,i if 1 ≤ σ ≤ s0N×1 if σ = s+ 1,192A.1. Regular Time-Stepping Methods(∂ 2tj+1∂ ŷj,σ ∂yw)⊤v̂j+1 =0N×1 if 1 ≤ σ ≤ s−s∑i=1(∂ 2Fj+1,i∂ ŷj,σ ∂ywj+1,i)⊤Vj+1,i if σ = s+ 1,(∂ 2tk∂ ŷj,σ ∂yw)⊤v̂k = 0N×1 for all other values of k,where VK+1,i = 0N×1.• ∂2t∂m∂ywTaking the derivative of (A.50) with respect to m,∂ 2 tk,σ∂m∂yw =−∂2Fk,σ∂m∂ywk,σ if σ ≤ s0N×Nm if σ = s+ 1,(A.68)where wk,σ = wk−1 + τks∑i=1aσiWk,i and w0 = 0N×1.◦ The product of ∂2 tk∂m∂yw with vm is(∂ 2 tk∂m∂yw)vm =(∂ 2 tk,1∂m∂yw)vm...(∂ 2 tk,s+1∂m∂yw)vm , (A.69)with (∂ 2 tk,σ∂m∂yw)vm =−(∂ 2Fk,σ∂m∂ywk,σ)vm if σ ≤ s0N×1 if σ = s+ 1.193A.1. Regular Time-Stepping Methods◦ The product of the transpose of ∂2 tk∂m∂yw with v̂k is(∂ 2 tk∂m∂yw)⊤v̂k = −s∑σ=1(∂ 2Fk,σ∂m∂ywk,σ)⊤Vk,σ. (A.70)• ∂2 t∂y0∂yDifferentiating (A.50) with respect to y0,∂ 2 t1,σ∂y0∂yw =−∂2F1,σ∂y∂yw1,σ if 1 ≤ σ ≤ s and k = 10N×N otherwise(A.71)where w1,σ = τks∑i=1aσiW1,i. 
We used the fact that∂ 2F1,σ∂y0∂yw1,σ =∂ 2F1,σ∂y∂yw1,σ∂y1∂y0=∂ 2F1,σ∂y∂yw1,σ.Derivatives of∂ t∂mExpressions for the derivatives of∂ tk,σ∂mwm =−∂Fk,σ∂mwm if 1 ≤ σ ≤ s0N×1 otherwise,(see (A.57)) with respect to m, y0 and y are derived below.∂ tk,σ∂mwm is the σthN × 1 block of ∂ tk∂mwm, with 1 ≤ σ ≤ s + 1, which in turn is the kth (s + 1)N × 1block of∂ t∂mwm.• ∂2 t∂m∂mwm194A.1. Regular Time-Stepping MethodsTaking the derivative of (A.57) with respect to m gives∂ 2 tk,σ∂m∂mwm =− ∂2Fk,σ∂m∂mwm if 1 ≤ σ ≤ s0N×1 otherwise.(A.72)• ∂2 t∂y0∂mwmSimilarly, differentiating (A.57) with respect to y0 gives∂ 2 tk,σ∂y0∂mwm =− ∂2Fk,σ∂y∂mwm if 1 ≤ σ ≤ s and k = 10N×1 otherwise,(A.73)since∂F1,σ∂y0=∂F1,σ∂y∂y1,σ∂y0=∂F1,σ∂y.• ∂2 t∂y∂mwmDifferentiating (A.57) with respect to y gives∂ 2 tk,σ∂y∂mwm =[∂ 2 tk∂ ŷ1∂mwm · · · ∂2 tk∂ ŷK ∂mwm]. (A.74)The jth (s+ 1)N ×N block of ∂2tk,σ∂y∂mwm is∂ 2tk,σ∂ ŷj ∂mwm =− ∂2Fk,σ∂ ŷj ∂mwm if 1 ≤ σ ≤ s0N×1 otherwise,(A.75)195A.1. Regular Time-Stepping MethodsAs before, using the chain rule,∂ 2Fk,σ∂ ŷj ∂mwm =τk∂ 2Fk,σ∂y∂mwm([aσ,1:s 0]⊗ IN)if j = k∂ 2Fk,σ∂y∂mwm([01×s 1]⊗ IN)if j = k − 10N×(s+1)N otherwise.(A.76)Derivatives of∂ t∂y0Lastly, we find expressions for the derivatives of the product (A.61) for k = 1:∂ t1,σ∂y0w0 =−∂F1,σ∂yw0 if 1 ≤ σ ≤ sw0 otherwise.For k > 1,∂ t1,σ∂y0w0 = 0N×1. With respect to m, y0 and y are derived below.∂ tk,σ∂y0w0 is the σth N × 1 block of ∂ tk∂y0w0, with 1 ≤ σ ≤ s+ 1, which in turn is thekth (s+ 1)N × 1 block of ∂ t∂y0w0.• ∂2 t∂y0∂y0w0Differentiating (A.61) with respect to y0 gives∂ 2 t1,σ∂y0∂y0w0 =− ∂F1,σ∂y∂yw0 if 1 ≤ σ ≤ s0N×N otherwise.(A.77)and∂ 2 tk,σ∂y0∂y0w0 = 0N×N for k > 1.• ∂2 t∂m∂y0w0196A.2. Implicit-Explicit Time-Stepping MethodsSimilarly, taking the derivative of (A.61) with respect to m gives∂ 2 t1,σ∂m∂y0w0 =− ∂F1,σ∂m∂yw0 if 1 ≤ σ ≤ s0N×Nm otherwise.(A.78)and∂ 2 tk,σ∂y0∂y0w0 = 0N×Nm for k > 1.• ∂2 t∂y∂y0w0Differentiating (A.61) with respect to y gives∂ 2 tk∂y∂y0w0 =[∂ 2 tk∂ ŷ1∂y0w0 · · · ∂2 tk∂ ŷK ∂y0w0]. (A.79)Clearly∂ 2 tk∂ ŷj ∂y0w0 = 0N×N for j > 1. For j = 1 we have∂ 2t1,σ∂ ŷ1∂y0w0 ={∂ 2F1,σ∂ ŷ1∂y0w0 if 1 ≤ σ ≤ s. (A.80)Using the chain rule we have∂ 2F1,σ∂ ŷ1∂y0w0 =∂ 2F1,σ∂y∂y0w0([aσ,1:s 0]⊗ IN). (A.81)A.2 Implicit-Explicit Time-Stepping MethodsA.2.1 IMEX Linear Multistep MethodsThe time-stepping vector for a generic s-step IMEX LM method was found in (2.22)in Section 2.2.1:t(y,m,y0) = Ay− τBfE (y,m)− τ ΓfI (y,m)− τ β fE0 − τ γ fI0 +αy0197A.2. Implicit-Explicit Time-Stepping Methodst is a vector of length KN and the kth time vector is given by the kth subvector oflength N :tk(y,m,y0) =s∑i=0α(s)i yk−i − τs∑i=1β(s)i fEk−i − τs∑i=0γ(s)i fIk−i.We have now included the dependence of t on m; y is independent of m and y0.Recall the definitions of the matrices A, B and Γ, defined in (2.1) and (2.3), and α,β and γ, defined in (2.19). Also recall that y =[y⊤1 · · · y⊤K]⊤. Let σ = min (k, s)and k⋆ = max (1, k − s). In this subsection we also let w =[w⊤1 · · · w⊤K]⊤andv =[v⊤1 · · · v⊤K]⊤be arbitrary vectors of length KN , wm and vm arbitrary vectorsof length Nm, and w0 and v0 arbitrary vectors of length N .Let us now take the derivative of t = t (y,m,y0) with respect to y, m and y0 inturn:• ∂ t∂yWe have∂ t∂y=[∂ t∂y1∂ t∂y2· · · ∂ t∂yK]= A− τB∂ fE1∂y. . .∂ fEK∂y− τ Γ∂ fI1∂y. . 
.∂ fIK∂y .(A.82)It is easy to see that∂ t∂yhas the form of (2.1) and (2.3), with the (k, j) N × Nblock being∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fEj∂y− τ γ(σ)k−j∂ fIj∂yif k⋆ ≤ j ≤ k0N×N otherwise,(A.83)198A.2. Implicit-Explicit Time-Stepping Methodswith β(σ)0 = 0.◦ The product of ∂ tk∂ywith w is∂ tk∂yw =k∑j=k⋆α(σ)k−jwj − τk−1∑j=k⋆β(σ)k−j∂ fEj∂ywj − τk∑j=k⋆γ(σ)k−j∂ fIj∂ywj . (A.84)◦ The product of ∂ tk∂yj⊤with wk is∂ tk∂yj⊤wk = α(σ)k−jwk − τ β(σ)k−j∂ fEj∂y⊤wk − τ γ(σ)k−j∂ fIj∂y⊤wk (A.85)with β(σ)0 = 0, and therefore the product of∂ t∂yj⊤with w is∂ t∂yj⊤w =min(K,j+s)∑k=jα(σ)k−jwk − τmin(K,j+s)∑k=j+1β(σ)k−j∂ fEj∂y⊤wk+− τmin(K,j+s)∑k=jγ(σ)k−j∂ fIj∂y⊤wk.(A.86)• ∂ t∂mDifferentiating t with respect to m gives∂ t∂m= −τ(B∂ fE∂m+ Γ∂ fI∂m+ β∂ fE0∂m+ γ∂ fI0∂m)(A.87)with∂ fE∂m=∂ fE1 (m)∂m...∂ fEK (m)∂m and ∂ fI∂m=∂ fI1 (m)∂m...∂ fIK (m)∂m . (A.88)199A.2. Implicit-Explicit Time-Stepping MethodsTherefore the kth N ×Nm block of (A.87) is∂ tk∂m= −τk−1∑j=k⋆β(σ)k−j∂ fEj∂m− τk∑j=k⋆γ(σ)k−j∂ fIj∂m− τ β(k)k∂ fE0∂m− τ γ(k)k∂ fI0∂m(A.89)with β(k)k = γ(k)k = 0 if k > s.◦ The product of (A.89) with wm is∂ tk∂mwm = −τk−1∑j=k⋆β(σ)k−j∂ fEj∂mwm − τk∑j=k⋆γ(σ)k−j∂ fIj∂mwm+− τ β(k)k∂ fE0∂mwm − τ γ(k)k∂ fI0∂mwm.(A.90)◦ The product of the transpose with the k the block of w is∂ tk∂m⊤wk = −τk−1∑j=k⋆β(σ)k−j∂ fEj∂m⊤wk − τk∑j=k⋆γ(σ)k−j∂ fIj∂m⊤wk+− τ β(k)k∂ fE0∂m⊤wk − τ γ(k)k∂ fI0∂m⊤wk.(A.91)The product of the transpose of∂ tk∂mwith w hence is∂ t∂m⊤w = −τK∑k=1k−1∑j=k⋆β(σ)k−j∂ fEj∂m⊤wk − τ ∂ fE0∂m⊤ s∑k=1β(k)k wk+− τK∑k=1k∑j=k⋆γ(σ)k−j∂ fIj∂m⊤wk − τ ∂ fI0∂m⊤ s∑k=1γ(k)k wk.(A.92)• ∂ t∂y0This derivative t with respect to y0 is∂ t∂y0= α− τ β ∂ fE0∂y− τ γ ∂ fI0∂y(A.93)200A.2. Implicit-Explicit Time-Stepping Methodsand the kth N ×N block is∂ tk∂y0= α(k)k IN − τ β(k)k∂ fE0∂y− τ γ(k)k∂ fI0∂y, (A.94)with α(k)k = β(k)k = γ(k)k = 0 if k > s.◦ The product of (A.94) with w0 is∂ tk∂y0w0 = α(k)k w0 − τ β(k)k∂ fE0∂yw0 − τ γ(k)k∂ fI0∂yw0. (A.95)◦ The product of the transpose with the kth block of w is∂ tk∂y0⊤wk = α(k)k wk − τ β(k)k∂ fE0∂y⊤wk − τ γ(k)k∂ fI0∂y⊤wk. (A.96)The product of the transpose of∂ t∂y0with the w therefore is∂ t∂y0⊤w =s∑k=1α(k)k wk − τ∂ fE0∂y⊤ s∑k=1β(k)k wk − τ∂ fI0∂y⊤ s∑k=1γ(k)k wk. (A.97)Derivatives of∂ t∂yWe will now find expressions for the derivatives of (A.84)∂ tk∂yw =k∑j=k⋆α(σ)k−jwj − τk−1∑j=k⋆β(σ)k−j∂ fEj∂ywj − τk∑j=k⋆γ(σ)k−j∂ fIj∂ywjwith respect to m, y0 and y.• ∂2t∂m∂yw201A.2. Implicit-Explicit Time-Stepping MethodsDifferentiating (A.84) with respect to m gives the kth N ×Nm block of ∂2t∂m∂yw:∂ 2tk∂m∂yw = −τk−1∑j=k⋆β(σ)k−j∂ 2fEj∂m∂ywj − τk∑j=k⋆γ(σ)k−j∂ 2fIj∂m∂ywj. (A.98)◦ The product of (A.98) with vm is(∂ 2tk∂m∂yw)vm = −τk−1∑j=k⋆β(σ)k−j(∂ 2fEj∂m∂ywj)vm+− τk∑j=k⋆γ(σ)k−j(∂ 2fIj∂m∂ywj)vm.(A.99)◦ The product of the transpose of (A.98) with the kth block of v is(∂ 2tk∂m∂yw)⊤vk = −τk−1∑j=k⋆β(σ)k−j(∂ 2fEj∂m∂ywj)⊤vk+− τk∑j=k⋆γ(σ)k−j(∂ 2fIj∂m∂ywj)⊤vk.(A.100)• ∂ tk∂y0∂ywThe derivative of (A.84) with respect to y0 is simply∂ 2tk∂y0∂yw = 0N×N . (A.101)• ∂2t∂y∂ywConsider the jth N ×N block of∂ 2tk∂y∂yw =[∂ 2tk∂y1∂yw∂ 2tk∂y2∂yw · · · ∂2tk∂yK ∂yw].202A.2. 
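For reference while reading the second-derivative blocks that follow, recall that (A.84) — the first-derivative product being differentiated — is itself evaluated with a short loop over the step history. The MATLAB sketch below is purely illustrative; the handles JfE and JfI and the coefficient arrays are assumed names, with the coefficient vectors understood to hold the α(σ), β(σ) and γ(σ) family for the current σ = min(k, s).

% Minimal sketch (illustration only) of the product (dt_k/dy)*w of (A.84) for
% an s-step IMEX linear multistep method. Assumed (hypothetical) inputs:
%   JfE(j,v), JfI(j,v) - return (df^E_j/dy)*v and (df^I_j/dy)*v at the stored y_j
%   alpha, beta, gamma - coefficient vectors; entry i+1 holds the index-i
%                        coefficient of the current sigma = min(k,s) family
%   w                  - N x K matrix whose jth column is w_j
function rk = imexlm_dtkdy_times(k, w, JfE, JfI, alpha, beta, gamma, tau, s)
  kstar = max(1, k - s);
  rk = zeros(size(w, 1), 1);
  for j = kstar:k
    i  = k - j;                                       % coefficient index
    rk = rk + alpha(i+1)*w(:, j) - tau*gamma(i+1)*JfI(j, w(:, j));
    if j < k                                          % beta_0 = 0, so the
      rk = rk - tau*beta(i+1)*JfE(j, w(:, j));        % explicit sum stops at k-1
    end
  end
end

With this building block in view, we now return to its second derivatives.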
Implicit-Explicit Time-Stepping MethodsTaking the derivative of (A.84) with respect to yj gives the (k, j)th N × N blockof∂ 2t∂y∂yw,∂ 2tk∂yj ∂yw =−τβ(σ)k−j∂ 2fEj∂y∂ywj − τγ(σ)k−j∂ 2fIj∂y∂ywj if k⋆ ≤ j ≤ k0N×N otherwise.(A.102)◦ The product of ∂2tk∂yj ∂yw with the jth block of v trivially is(∂ 2tk∂yj ∂yw)vj =−τ(β(σ)k−j∂ 2fEj∂y∂ywj + γ(σ)k−j∂ 2fIj∂y∂ywj)vjif k⋆ ≤ j ≤ k0N×1 otherwise,(A.103)so that(∂ 2tk∂y∂yw)v = −τk∑j=k⋆(β(σ)k−j∂ 2fEj∂y∂ywj + γ(σ)k−j∂ 2fIj∂y∂ywj)vj . (A.104)◦ The product of the transpose of(∂ 2tk∂yj ∂yw)with the kth block of v is(∂ 2tk∂yj ∂yw)⊤vk =−τ(β(σ)k−j∂ 2fEj∂y∂ywj⊤+ γ(σ)k−j∂ 2fIj∂y∂ywj⊤)vkif j ≤ k ≤ j⋆0N×1 otherwise,(A.105)with j⋆ = min(K, j + s).203A.2. Implicit-Explicit Time-Stepping MethodsDerivatives of∂ t∂mThe derivatives of (A.90),∂ tk∂mwm = −τk−1∑j=k⋆β(σ)k−j∂ fEj∂mwm − τk∑j=k⋆γ(σ)k−j∂ fIj∂mwm+− τ β(k)k∂ fE0∂mwm − τ γ(k)k∂ fI0∂mwm= −τk−1∑j=k†β(σ)k−j∂ fEj∂mwm − τk∑j=k†γ(σ)k−j∂ fIj∂mwm,where k† = min(0, k − s), with respect to m, y0 and y are:• ∂2t∂m∂mwmTaking the derivative of (A.90) with respect to m gives the kth N × Nm block of∂ 2t∂m∂mwm∂ 2tk∂m∂mwm = −τk−1∑j=k†β(σ)k−j∂ 2fEj∂m∂mwm − τk∑j=k†γ(σ)k−j∂ 2fIj∂m∂mwm. (A.106)◦ The product with vm is(∂ 2tk∂m∂mwm)vm = −τk−1∑j=k†β(σ)k−j(∂ 2fEj∂m∂mwm)vm+− τk∑j=k†γ(σ)k−j(∂ 2fIj∂m∂mwm)vm.(A.107)204A.2. Implicit-Explicit Time-Stepping Methods◦ The product of its transpose with the kth block of v is(∂ 2tk∂m∂mwm)⊤vk = −τk−1∑j=k†β(σ)k−j(∂ 2fEj∂m∂mwm)⊤vk+− τk∑j=k†γ(σ)k−j(∂ 2fIj∂m∂mwm)⊤vk.(A.108)• ∂2t∂y0∂mwmDifferentiating (A.90) with respect to y0 immediately gives∂ 2tk∂y0∂mwm = −τ β(k)k∂ 2fE0∂y∂mwm − τ γ(k)k∂ 2fI0∂y∂mwm (A.109)with β(k)k = γ(k)k = 0 if k > s.◦ The product with v0 is(∂ 2tk∂y0∂mwm)v0 = −τ β(k)k(∂ 2fE0∂y∂mwm)v0 − τ γ(k)k(∂ 2fI0∂y∂mwm)v0(A.110)with β(k)k = γ(k)k = 0 if k > s.◦ The product of its transpose with the kth block of vy is(∂ 2tk∂y0∂mwm)⊤vk = −τ β(k)k(∂ 2fE0∂y∂mwm)⊤vk+− τ γ(k)k(∂ 2fI0∂y∂mwm)⊤vk(A.111)with β(k)k = γ(k)k = 0 if k > s.• ∂2t∂y∂mwm205A.2. Implicit-Explicit Time-Stepping MethodsConsider the jth N ×N block of∂ 2tk∂y∂mwm =[∂ 2tk∂y1∂mwm∂ 2tk∂y2∂mwm · · · ∂2tk∂yK ∂mwm]. (A.112)The (k, j)th N ×N block of ∂2t∂y∂mwm is∂ 2tk∂yj ∂mwm =−τβ(σ)k−j∂ 2fEj∂y∂mm− τγ(σ)k−j∂ 2fIj∂y∂mm if k⋆ ≤ j ≤ k0N×N otherwise.(A.113)◦ The product of ∂2tk∂yj ∂mwm with the jth block of v is(∂ 2tk∂yj ∂mwm)vj =− τβ(σ)k−j(∂ 2fEj∂y∂mm)vj+− τγ(σ)k−j(∂ 2fIj∂y∂mm)vjif k⋆ ≤ j ≤ k0N×1 otherwise,(A.114)with β(σ)0 = 0.◦ The product of the transpose of ∂2tk∂yj ∂mwm with the kth block of v is(∂ 2tk∂yj ∂mwm)⊤vk =− τβ(σ)k−j(∂ 2fEj∂y∂mm)⊤vk+− τγ(σ)k−j(∂ 2fIj∂y∂mm)⊤vkif j ≤ k ≤ j⋆0N×1 otherwise.(A.115)with j⋆ = min(K, j + s) and β(σ)0 = 0.206A.2. Implicit-Explicit Time-Stepping MethodsDerivatives of∂ t∂y0Finally, we find expressions for the derivatives of (A.95),∂ tk∂y0w0 = α(k)k w0 − τ β(k)k∂ fE0∂yw0 − τ γ(k)k∂ fI0∂yw0,where α(k)k = β(k)k = γ(k)k = 0 if k > s, with respect to m, y0 and y:• ∂2t∂m∂y0w0Differentiating (A.95) with respect tom gives the kth N×Nm block of ∂2t∂m∂y0w0:∂ 2tk∂m∂y0w0 = −τ β(k)k∂ 2fE0∂m∂yw0 − τ γ(k)k∂ 2fI0∂m∂yw0 (A.116)with β(k)k = γ(k)k = 0 if k > s.◦ The product with vm is(∂ 2tk∂m∂y0w0)vm = −τβ(k)k(∂ 2fE0∂m∂yw)vm−τγ(k)k(∂ 2fI0∂m∂yw0)vm. (A.117)◦ The product of the transpose with the kth block of v is(∂ 2tk∂m∂y0w0)⊤vk = −τ β(k)k(∂ 2fE0∂m∂yw)⊤vk+− τ γ(k)k(∂ 2fI0∂m∂yw)⊤vk.(A.118)• ∂2t∂y0∂y0w0The kth N ×N block of ∂2t∂y0∂y0w0 is obtained by taking the derivative of (A.95)207A.2. 
Implicit-Explicit Time-Stepping Methodswith respect to y0:∂ 2tk∂y0∂y0w0 = −τ β(k)k∂ 2fE0∂y∂yw0 − τ γ(k)k∂ 2fI0∂y∂yw0 (A.119)with β(k)k = γ(k)k = 0 if k > s.◦ The product with vm is(∂ 2tk∂y0∂y0w0)v0 = −τ β(k)k(∂ 2fE0∂y∂yw)v0 − τ γ(k)k(∂ 2fI0∂y∂yw)v0. (A.120)◦ The product of the transpose with the kth block of v is(∂ 2tk∂y0∂y0w0)⊤vk = −τβ(k)k(∂ 2fE0∂y∂yw)⊤vk−τγ(k)k(∂ 2fI0∂y∂yw)⊤vk. (A.121)• ∂2t∂y∂y0w0Differentiating (A.95) with respect to y immediately gives∂ 2tk∂y∂y0w0 = 0N×KN . (A.122)A.2.2 IMEX Runge-Kutta MethodsThe time-stepping vector t = t(y;m,y0) for IMEX RK methods was found in Section2.2.2 to bet (y;m,y0) = Ty − f (y,m,y0) ,wherey =[Y⊤1 y⊤1 Y⊤2 y⊤2 · · · Y⊤K y⊤K]⊤f =[F⊤1 y0 F⊤2 01×N · · · F⊤K 01×N]⊤208A.2. Implicit-Explicit Time-Stepping MethodsT =I(2s+1)NB⊤ INI(2s+1)N−IN B⊤ IN. . .. . .. . .I(2s+1)N−IN B⊤ IN,see (2.31). We have letB = −τk[bE1 bI1 bE2 · · · bEs−1 bIs−1 bEs]⊤⊗ IN .andYk =Yk,1...Yk,2s−1 , Fk =Fk,1...Fk,2s−1 ,withYk,i =YEk,⌈ i2⌉ if i oddYIk, i2if i evenandFk,i =fE(yk,⌈ i2⌉−1, tk−1,⌈ i2 ⌉)if i oddfI(yk, i2, tk−1, i2+1)if i even.yk,σwas defined to be yk,σ= yk−1 + τkσ∑i=1aEσ+1,iYEk,i + τkσ∑i=1aIσ,iYIk,i, and we haveyk,0= yk−1. Let fEk,σ = fE(yk,σ−1, tk−1,σ)and fIk,σ = fI(yk,σ−1, tk−1,σ).From now on we omit the time dependence of fE and fI . We let ŷk =[Y⊤k y⊤k]⊤.We also let w =[ŵ⊤1 · · · ŵ⊤K]⊤and v =[v̂⊤1 · · · v̂⊤K]⊤be arbitrary vectors oflength 2sKN , where ŵ⊤k and v̂⊤k are defined analogously to ŷ⊤k . Also, wm and vmare arbitrary vectors of length Nm, and w0 and v0 are arbitrary vectors of length N .209A.2. Implicit-Explicit Time-Stepping MethodsWe will now take the derivative of t with respect to each of y, m and y0 in turn.For this we need the derivatives of yk,σwith respect to yk−1, YEk,i and YIk,i, which are∂yk,σ∂yk−1= IN ,∂yk,σ∂YEk,i= τkaEσ+1,iIN and∂yIk,σ∂YIk,i= τkaIσiIN .• ∂ t∂yWe break the derivative of t with respect to y into parts:∂ t∂y=[∂ t∂ ŷ1∂ t∂ ŷ2· · · ∂ t∂ ŷK]= T−[∂ f∂ ŷ1∂ f∂ ŷ2· · · ∂ f∂ ŷK]. (A.123)The derivative of f at the kth time-step is∂ fk∂ ŷj= ∂Fk∂ ŷj0N×2sNwith∂Fk∂ ŷj=[∂Fk∂YEj,1∂Fk∂YIj,1∂Fk∂YEj,2· · · ∂Fk∂YIj,s−1∂Fk∂YEj,s∂Fk∂yj]where 1 ≤ j, k ≤ K. ∂Fk,i∂ ŷj,ℓdenotes the (i, ℓ)th N ×N block of ∂Fk∂ ŷj, with 1 ≤ i ≤2s− 1 and 1 ≤ ℓ ≤ 2s.For j = k − 1,∂Fk,i∂ ŷk−1=[0N×(2s−1)N∂ fEk,⌈ i2⌉∂y]if i odd[0N×(2s−1)N∂ fIk, i2+1∂y]if i even(A.124a)210A.2. Implicit-Explicit Time-Stepping Methodsand for j = k we have∂Fk,i∂ ŷk=[τka⌈ i2⌉∂ fEk,⌈ i2⌉∂y0N×N]if i odd[τka i2+1∂ fIk, i2+1∂y0N×N]if i even,(A.124b)whereaσ =[aEσ,1 aIσ−1,1 · · · aEσ,σ−1 aIσ−1,σ−1 01×2(s−σ)+1]⊗ IN .For all other values of j we have∂Fk,i∂ ŷj= 0N×2sN . Now lettingAs =[a⊤1 a⊤2 a⊤2 a⊤3 · · · a⊤s]⊤∂ fk∂y= blkdiag(∂ fEk,1∂y,∂ fIk,2∂y,∂ fEk,2∂y,∂ fIk,3∂y, · · · , ∂ fEk,s∂y),(A.125)and using the definition of T, the (k, j)th 2sN × 2sN block of ∂ t∂yis∂ tk∂ ŷj=I(2s−1)N − τk ∂ fk∂y (As ⊗ IN) 0(2s−1)N×NB⊤ IN if j = k0(2s−1)N×(2s−1)N −∂ fk∂y (12s−1 ⊗ IN)0N×(2s−1)N −IN if j = k − 102(s−1)N×2(s−1)N otherwise.(A.126)Finally, letAk = I(2s−1)N − τk ∂ fk∂y(As ⊗ IN)Ck = −∂ fk∂y(12s−1 ⊗ IN)(A.127)211A.2. Implicit-Explicit Time-Stepping Methodsand write∂ t∂y=A1B⊤ INC2 A2−IN B⊤ IN. . . . . . . . .CK AK−IN B⊤ IN. (A.128)which is a lower block-triangular matrix.◦ The kth 2sN × 1 block of the product ∂ t∂yw is∂ tk∂yw = AkWk +Ckwk−1B⊤Wk +wk −wk−1 (A.129)with w0 = 0N×1. 
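Equation (A.129) makes explicit that, as in the regular RK case, dt/dy is lower block-triangular with one block row per time-step, so applying its inverse — as required when computing sensitivities — reduces to a forward substitution over the time-steps. The following MATLAB sketch shows that substitution in its simplest form; it is an illustration only, with the per-step blocks Acell{k}, Ccell{k} and B written as explicit matrices (and a constant step size assumed so that B does not depend on k), whereas in practice these operators are applied matrix-free.

% Minimal sketch (illustration only): forward substitution through the block
% lower-triangular matrix (A.128), i.e. solving (dt/dy)*x = r one step at a
% time. Assumed (hypothetical) inputs: Acell{k} and Ccell{k} stand for A_k and
% C_k (Ccell{1} may be all zeros since x_0 = 0), B for the last-row block,
% Rstage{k} and rlast(:,k) for the stage and step blocks of the right-hand side.
function [Xs, x] = block_forward_solve(Acell, Ccell, B, Rstage, rlast)
  K = numel(Acell);  N = size(B, 2);
  Xs = cell(K, 1);  x = zeros(N, K);  xprev = zeros(N, 1);
  for k = 1:K
    % stage rows:  C_k x_{k-1} + A_k X_k        = (stage block of r)
    Xs{k}  = Acell{k} \ (Rstage{k} - Ccell{k}*xprev);
    % step row:   -x_{k-1} + B'*X_k + x_k       = (step block of r)
    x(:,k) = rlast(:, k) + xprev - B.'*Xs{k};
    xprev  = x(:, k);
  end
end

The componentwise form of the product (A.129) is written out next.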
If∂ tk,i∂yw is the ith N × 1 block of ∂ tk∂yw, with 1 ≤ i ≤ 2s, wehave∂ tk,i∂yw =WEk,⌈ i2⌉ −∂ fEk,⌈ i2⌉∂ywk,⌊ i2⌋if i ≤ 2s− 1and i oddWIk, i2−∂ fIk, i2+1∂ywk, i2if i ≤ 2(s− 1)and i even−τk(s∑σ=1bEσWEk,σ +s−1∑σ=1bIσWIk,σ)++wk −wk−1if i = 2s,(A.130)where wk,σ = wk−1 + τkσ∑i=1aEσ+1,iWEk,i + τkσ∑i=1aIσiWIk,i.212A.2. Implicit-Explicit Time-Stepping Methods◦ The transpose of ∂ t∂yis upper block-triangular:∂ t∂y⊤=A⊤1 BIN C⊤2 −INA⊤2 B. . . . . . . . .IN C⊤K −INA⊤K BIN. (A.131)The jth 2sN × 1 block of the product of ∂ t∂y⊤with some arbitrary w thereforeis∂ t∂ ŷj⊤w = A⊤j Wj +BwjC⊤j+1Wj+1 +wj −wj+1 , (A.132)with WK+1,σ = wK+1 = 0N×1. The ith N × 1 block ∂ t∂ ŷj,i⊤w of∂ t∂ ŷj⊤w, with213A.2. Implicit-Explicit Time-Stepping Methods1 ≤ σ ≤ 2s, is∂ t∂ ŷj,i⊤w =WEj,⌈ i2⌉ − τjs∑ℓ=⌈ i2⌉+1aEℓ,⌈ i2⌉∂ fIj,ℓ∂y⊤WIj,ℓ−1+− τjs∑ℓ=⌈ i2⌉+1aEℓ,⌈ i2⌉∂ fEj,ℓ∂y⊤WEj,ℓ − τj bE⌈ i2⌉wjif i ≤ 2s− 1and i oddWIj, i2− τjs−1∑ℓ= i2aIℓ, i2∂ fIj,ℓ+1∂y⊤WIj,ℓ+− τjs−1∑ℓ= i2aIℓ, i2∂ fEj,ℓ+1∂y⊤WEj,ℓ+1 − τj bIi2wjif i ≤ 2(s− 1)and i evenwj −wj+1 −s∑ℓ=1∂ fEj+1,ℓ∂y⊤WEj+1,ℓ+−s−1∑ℓ=1∂ fIj+1,ℓ+1∂y⊤WIj+1,ℓif i = 2s,(A.133)• ∂ t∂mTaking the derivative of t with respect to m,∂ t∂m=∂∂m(Ty − f) = − ∂ f∂m(A.134)with∂ f∂m=∂F1∂m0N×Nm...∂FK∂m0N×Nm, where∂Fk∂m=∂Fk,1∂m...∂Fk,2s−1∂m (A.135)214A.2. Implicit-Explicit Time-Stepping Methodsand∂Fk,i∂m=∂ fEk,⌈ i2⌉∂mif i odd∂ fIk, i2+1∂mif i even.(A.136)◦ The product ∂ tk∂mwm is simply∂ tk∂mwm = −∂Fk∂m wm0N×1 (A.137)with∂Fk∂mwm computed in the obvious way using (A.136).◦ The product ∂ tk∂m⊤ŵk is∂ tk∂m⊤ŵk = −s∑σ=1∂ fEk,σ∂m⊤WEk,σ −s−1∑σ=1∂ fIk,σ∂m⊤WIk,σ. (A.138)• ∂ t∂y0Taking the derivative of T with respect to y0,∂ t∂y0=∂∂y0(Ty − f) = − ∂ f∂y0(A.139)with∂ f∂y0=∂F1∂y∂y1∂y0IN02s(K−1)N×N , where ∂y1∂y0 =∂y1,1∂y0...∂y1,s∂y0 = 1s ⊗ IN . (A.140)215A.2. Implicit-Explicit Time-Stepping Methods◦ The product ∂ tk∂y0w0 is∂ tk∂y0w0 =−∂F1∂y (1s ⊗w0)w0 if k = 10N×1 otherwise,(A.141)with∂F1∂y= diag(∂ fE1,1∂y,∂ fI1,2∂y,∂ fE1,2∂y,∂ fI1,3∂y, · · · , ∂ fE1,s∂y).◦ The product ∂ tk∂y0⊤ŵk is∂ tk∂y0⊤ŵk =−s∑σ=1∂ fE1,σ∂m⊤WE1,σ −s−1∑σ=1∂ fI1,σ∂m⊤WI1,σ −w1 if k = 10N×1 otherwise.(A.142)Derivatives of∂ t∂yWe now find the derivatives of the ith N × 1 block ∂ tk,i∂yw of the vector∂ tk∂yw, with1 ≤ 1 ≤ 2s, which in turn is the kth block of ∂ t∂yw =∂ t∂yw.The expression for the product∂ tk,i∂yw was given in (A.130). We now find expres-sions for the derivatives of (A.130) with respect to y, m and y0.•(∂ t∂yw)m216A.2. Implicit-Explicit Time-Stepping MethodsWe differentiate (A.130) with respect to m:∂ 2tk,i∂m∂yw =−∂ 2fEk,⌈ i2⌉∂m∂ywk,⌈ i2⌉ if i ≤ 2s− 1 and i odd−∂ 2fIk, i2+1∂m∂ywk, i2if i ≤ 2(s− 1) and i even0N×Nm if i = 2s,(A.143)where wk,σ = wk−1 + τkσ∑i=1aEσ+1,iWEk,i + τkσ∑i=1aIσiWIk,i.• ∂2t∂y0∂ywTaking the derivative of (A.130) with respect to y0 gives∂ 2t1,i∂y0∂yw =−∂ 2fE1,⌈ i2⌉∂y∂yw1,⌈ i2⌉ if i ≤ 2s− 1 and i odd−∂ 2fI1, i2+1∂y∂yw1, i2if i ≤ 2(s− 1) and i even0N×Nm if i = 2s,(A.144)where w1,σ = w0 + τ1σ∑i=1aEσ+1,iWE1,i + τ1σ∑i=1aIσiWI1,i, and∂ 2tk,i∂y0∂yw = 0N×N fork > 1.• ∂2t∂y∂ywConsider the ℓth term of∂ 2tk,i∂y∂yw =[∂ 2tk,i∂ ŷ1∂yw∂ 2tk,i∂ ŷ2∂yw · · · ∂2tk,i∂ ŷK ∂yw], (A.145)with∂ 2tk,i∂ ŷℓ∂yw =[∂ 2tk,i∂YEℓ,1∂yw∂ 2tk,i∂YIℓ,1∂yw · · · ∂2tk,i∂YEℓ,s∂yw∂ 2tk,i∂yℓ∂yw]. (A.146)217A.2. 
Implicit-Explicit Time-Stepping MethodsThen, for ℓ = k − 1,∂ 2tk,i∂ ŷk−1∂yw =[0N×(2s−1)N −∂ 2fEk,⌈ i2⌉∂y∂ywk,⌈ i2⌉]if i ≤ 2s− 1and i odd[0N×(2s−1)N −∂ 2fIk, i2+1∂y∂ywk, i2]if i ≤ 2(s− 1)and i even0N×2sN if i = 2s,(A.147a)for ℓ = k,∂ 2tk,i∂ ŷk ∂yw =[−τk a⌈ i2⌉∂ 2fEk,⌈ i2⌉∂y∂ywk,⌈ i2⌉ 0N×N]if i ≤ 2s− 1and i odd[−τk a i2∂ 2fIk, i2+1∂y∂ywk, i20N×N]if i ≤ 2(s− 1)and i even0N×2sN if i = 2s,(A.147b)and∂ 2tk,i∂ ŷℓ∂yw = 02sN×2sN (A.147c)otherwise.Derivatives of∂ t∂mWe consider the derivatives of the ith N × 1 block ∂ tk,i∂mwm of the vector∂ tk∂mwm,which in turn is the kth 2sN × 1 block of ∂ t∂mwm.The expression for the product∂ tk∂mwm was given in (A.137). We now find ex-pressions for the derivatives of (A.137) with respect to y, m and y0.• ∂2t∂m∂mwm218A.2. Implicit-Explicit Time-Stepping MethodsWe differentiate (A.137) with respect to m:∂ 2tk∂m∂mwm = − ∂2Fk∂m∂mwm0N×Nm (A.148)with∂ 2Fk∂m∂mwm =∂ 2Fk,1∂m∂mwm...∂ 2Fk,2s−1∂m∂mwm ,where∂ 2Fk,i∂m∂mwm =∂ 2fEk,⌈ i2⌉∂m∂mwm if i odd∂ 2fIk, i2+1∂m∂mwm if i even.• ∂2t∂y0∂mwmTaking the derivative of (A.141) with respect to y0 gives∂ 2tk∂y0∂mwm = − ∂2Fk∂y0∂mwm0N×N (A.149)with∂ 2Fk∂y0∂m= 0(2s−1)N×N for k > 1 and∂ 2F1∂y0∂mwm =∂ 2F1,1∂y0∂mwm...∂ 2F1,2s−1∂y0∂mwm ,219A.2. Implicit-Explicit Time-Stepping Methodswhere∂ 2F1,i∂y0∂mwm =∂ 2fE1,⌈ i2⌉∂y∂mwm if i odd∂ 2fI1, i2+1∂y∂mwm if i even.• ∂2t∂y∂mwmConsider the ℓth term of∂ 2tk,i∂y∂mwm =[∂ 2tk,i∂ ŷ1∂mwm∂ 2tk,i∂ ŷ2∂mwm · · · ∂2tk,i∂ ŷK ∂mwm], (A.150)with∂ 2tk,i∂ ŷℓ∂mwm =[∂ 2tk,i∂YEℓ,1∂mwm∂ 2tk,i∂YIℓ,1∂mwm · · ·· · · ∂2tk,i∂YIℓ,s−1∂mwm∂ 2tk,i∂YEℓ,s∂mwm∂ 2tk,i∂yℓ∂mwm].Then, for ℓ = k − 1,∂ 2tk,i∂ ŷk−1∂mwm =[0N×(2s−1)N −∂ 2fEk,⌈ i2⌉∂y∂mwm]if i ≤ 2s− 1and i odd[0N×(2s−1)N −∂ 2fIk, i2+1∂y∂mwm]if i ≤ 2(s− 1)and i even0N×2sN if i = 2s,(A.151a)220A.2. Implicit-Explicit Time-Stepping Methodsfor ℓ = k,∂ 2tk,i∂ ŷk ∂mwm =[−τk a⌈ i2⌉∂ 2fEk,⌈ i2⌉∂y∂mwm 0N×N]if i ≤ 2s− 1and i odd[−τk a i2∂ 2fIk, i2+1∂y∂mwm 0N×N]if i ≤ 2(s− 1)and i even0N×2sN if i = 2s,(A.151b)and∂ 2tk,i∂ ŷℓ∂mwm = 02sN×2sN (A.151c)otherwise.Derivatives of∂ t∂y0We consider the derivatives of the ith N × 1 block ∂ tk,i∂y0w0 of the vector∂ tk∂y0w0,which in turn is the kth 2sN × 1 block of ty0w0 =∂ t∂y0w0.The expression for the product∂ tk∂y0w0 was given in (A.141). We now find expres-sions for the derivatives of (A.141) with respect to y, m and y0.• ∂2t∂m∂y0w0We differentiate (A.141) with respect to m. Clearly∂ 2tk∂m∂y0w0 = 02sN×Nm (A.152a)221A.2. Implicit-Explicit Time-Stepping Methodsfor k > 1. For k = 1 we have∂ 2t1,i∂m∂y0w0 =−∂ 2fE1,⌈ i2⌉∂m∂yw0 if 1 ≤ i ≤ 2s− 1 and i odd−∂ 2fI1, i2+1∂m∂yw0 if 1 ≤ i ≤ 2(s− 1) and i even0N×Nm if i = 2s(A.152b)• ∂2t∂y0∂y0w0Taking the derivative of (A.141) with respect to y0 gives∂ 2tk∂y0∂y0w0 = 02sN×N (A.153a)for k > 1. For k = 1 we have∂ 2t1,i∂m∂y0w0 =−∂ 2fE1,⌈ i2⌉∂m∂yw0 if 1 ≤ i ≤ 2s− 1 and i odd−∂ 2fI1, i2+1∂m∂yw0 if 1 ≤ i ≤ 2(s− 1) and i even0N×N if i = 2s(A.153b)• ∂2t∂y∂y0w0We have∂ 2tk,i∂y∂yw0 = 0N×2sN for k > 1. For k = 1, consider the ℓth term of∂ 2t1,i∂y∂yw0 =[∂ 2t1,i∂ ŷ1∂yw0∂ 2tk,i∂ ŷ2∂yw0 · · · ∂2t1,i∂ ŷK ∂yw0], (A.154)222A.3. 
Staggered Time-Stepping Methodsis 0N×N except for ℓ = 1, in which case we have∂ 2t1,i∂ ŷ1∂yw0 =[∂ 2t1,i∂YE1,1∂yw0∂ 2t1,i∂YI1,1∂yw0∂ 2t1,i∂YE1,s∂yw0∂ 2t1,i∂y1∂yw0],so that∂ 2t1,i∂ ŷk ∂yw0 =[−τka⌈ i2⌉∂ 2fE1,⌈ i2⌉∂y∂yw0 0N×N]if i ≤ 2s− 1and i odd[−τka i2∂ 2fI1, i2+1∂y∂yw0 0N×N]if i ≤ 2(s− 1)and i even0N×2sN if i = 2s.(A.155)A.3 Staggered Time-Stepping MethodsA.3.1 Staggered Linear Multistep MethodsThe time-stepping vector for a generic s-step StagLM method was found in (2.41) inSection 2.3.1:t (y,m,y0) = Ay +αy0 − τBf − τ β f0,where the time-stepping vector t has length KN . Let σ = min (k, s) and k⋆ =max (1, k − s). The kth time vector is given by the kth subvector of length N :tk (y,m,y0) =k∑j=k⋆α(σ)k−juk − τk∑j=k⋆β(σ)k−jfuk− 12k∑j=k⋆α(σ)k−jvk+1/2 − τk∑j=k⋆β(σ)k−jfvk .We have now included the dependence of t on m; y is independent of m and y0.Recall the definitions of the matrices A and B, defined in (2.1) and (2.3), and α andβ, defined as in (2.4). Also recall that y =[y⊤1 · · · y⊤K]⊤and f =[f⊤1 · · · f⊤K]⊤,223A.3. Staggered Time-Stepping Methodswithyk = ukvk+ 12 and fk =fuk− 12fvk .In this subsection we also let w =[w⊤1 · · · w⊤K]⊤be an arbitrary vector of lengthKN , with wk =[wuk⊤ wvk+1/2⊤]⊤having length N = Nu +Nv. wm is an arbitraryvector of length Nm, and w0 =[wu0⊤ wv1/2⊤]⊤is an arbitrary vector of lengthN = Nu +Nv.Let us now take the derivative of t = t (y,m,y0) with respect to y, m and y0 inturn:• ∂ t∂yWe have∂ t∂y=[∂ t∂y1∂ t∂y2· · · ∂ t∂yK]= A− τB ∂ f∂y, (A.156)where∂ fk∂yj=0Nu×Nu 0Nu×Nv∂ fvk∂u0Nv×Nv if j = k0Nu×Nu∂ fuk− 12∂v0Nv×Nu 0Nv×Nv if j = k − 1.(A.157)It is easy to see that∂ t∂yhas the form of (2.1) and (2.3), with the (k, j)th N ×Nblock being∂ tk∂yj=α(σ)k−jIN − τ β(σ)k−j∂ fj∂yj− τ β(σ)k−j−1∂ fj+1∂yjif k⋆ ≤ j ≤ k0N×N otherwise,(A.158)with β(σ)ℓ = 0 for ℓ < 0.224A.3. Staggered Time-Stepping Methods◦ The product of ∂ tk∂ywith w is∂ tk∂yw =k−1∑j=k⋆−1α(σ)k−jwuj − τk∑j=k⋆β(σ)k−j−1∂ fuj+1/2∂vwvj+1/2k∑j=k⋆α(σ)k−jwvj+1/2 − τk∑j=k⋆β(σ)k−j∂ fvj∂uwuj , (A.159)with β(σ)ℓ = 0 for ℓ < 0.◦ The product of ∂ tk∂yj⊤with wk is∂ tk∂yj⊤wk = α(σ)k−jwuk − τ β(σ)k−j∂ fvj∂u⊤wvk+1/2α(σ)k−jwvk+1/2 − τ β(σ)k−j−1∂ fuj+1/2∂v⊤wuk (A.160)and therefore the product of∂ t∂yj⊤with w is∂ t∂yj⊤w =min(K,j+s)∑k=jα(σ)k−jwuk − τmin(K,j+s)∑k=jβ(σ)k−j∂ fvj∂u⊤wvk+1/2min(K,j+s)∑k=jα(σ)k−jwvk+1/2 − τmin(K,j+s)∑k=j+1β(σ)k−j−1∂ fuj+1/2∂v⊤wuk . (A.161)• ∂ t∂mDifferentiating t with respect to m gives∂ t∂m= −τ(B∂ f∂m+ β∂ f0∂m)(A.162)with∂ f∂m=[∂ f1 (m)∂m⊤· · · ∂ fK (m)∂m⊤]⊤. (A.163)225A.3. Staggered Time-Stepping MethodsTherefore the kth N ×Nm block of (A.162) is∂ tk∂m=−τk∑j=k⋆β(σ)k−j∂ fuj− 12∂m−τk∑j=k†β(σ)k−j∂ fvj∂m (A.164)with β(k)k = 0 if k > s and k† = min(0, k − s).◦ The product of (A.164) with wm is∂ tk∂mwm =−τk∑j=k⋆β(σ)k−j∂ fuj− 12∂mwm−τk∑j=k†β(σ)k−j∂ fvj∂mwm (A.165)with β(k)k = 0 if k > s and k† = min(0, k − s).◦ The product of the transpose with the k the block of w is∂ tk∂m⊤wk = −τk∑j=k⋆β(σ)k−j∂ fuj− 12∂mwuk − τk∑j=k†β(σ)k−j∂ fvj∂mwvk+1/2. (A.166)The product of the transpose of∂ tk∂mwith w hence is∂ t∂m⊤w = −τK∑k=1k∑j=k⋆β(σ)k−j∂ fuj− 12∂mwuk − τK∑k=1k∑j=k†β(σ)k−j∂ fvj∂mwvk+1/2 (A.167)with β(k)k = 0 if k > s and k† = min(0, k − s).• ∂ t∂y0This derivative t with respect to y0 is∂ t∂y0=∂ t∂y0= α− τB ∂ f∂y0− τ β∂ f0∂y(A.168)226A.3. 
Staggered Time-Stepping Methodsand the kth N ×N block is∂ tk∂y0= α(k)k INu −τ β(k)k−1 ∂ fu1/2∂v−τ β(k)k∂ fv0∂uα(k)k INv (A.169)with α(k)k = β(k)k = 0 if k > s.◦ The product of (A.169) with w0 is∂ tk∂y0w0 =α(k)k wu0 − τ β(k)k−1 ∂ fu1/2∂vwv1/2α(k)k wv1/2 − τ β(k)k∂ fv0∂uwu0 . (A.170)◦ The product of the transpose with the kth block of w is∂ tk∂y0⊤wk =α(k)k wuk − τ β(k)k−1∂ fv0∂u⊤wvk+1/2α(k)k wvk+1/2 − τ β(k)k∂ fu1/2∂v⊤wuk . (A.171)The product of the transpose of∂ t∂y0with w therefore is∂ t∂y0⊤w =K∑k=1α(k)k wuk − τK∑k=0β(k)k∂ fv0∂u⊤wvk+1/2K∑k=1α(k)k wvk+1/2 − τK∑k=1β(k)k∂ fu1/2∂v⊤wuk . (A.172)Derivatives of∂ t∂yWe will now find expressions for the derivatives of (A.159),∂ tk∂yw =k∑j=k⋆α(σ)k−jwuj − τk−1∑j=k⋆β(σ)k−j−1∂ fuj+1/2∂vwvj+1/2k∑j=k⋆α(σ)k−jwvj+1/2 − τk∑j=k⋆β(σ)k−j∂ fvj∂uwuj ,227A.3. Staggered Time-Stepping Methodswith respect to m, y0 and y.• ∂2t∂m∂ywDifferentiating (A.159) with respect to m gives the kth N×Nm block of ∂2t∂m∂yw:∂ 2tk∂m∂yw =−τk−1∑j=k⋆β(σ)k−j−1∂ 2fuj+1/2∂m∂vwvj+1/2−τk∑j=k⋆β(σ)k−j∂ 2fvj∂m∂uwuj . (A.173)• ∂2t∂y0∂ywThe derivative of (A.159) with respect to y0 is∂ 2tk∂y0∂yw = 0N×N . (A.174)• ∂2t∂y∂ywConsider the jth N ×N block of∂ 2tk∂y∂yw =[∂ 2tk∂y1∂yw∂ 2tk∂y2∂yw · · · ∂2tk∂yK ∂yw].Taking the derivative of (A.159) with respect to yj gives the (k, j)th N ×N blockof∂ 2t∂y∂yw,∂ 2tk∂yj ∂yw =−τ 0Nu×Nu β(σ)k−j−1∂ 2fuj+ 12∂v∂vwvj+ 12β(σ)k−j∂ 2fvj∂u∂uwuj 0Nv×Nv if k⋆ ≤ j ≤ k0N×N otherwise.(A.175)228A.3. Staggered Time-Stepping MethodsDerivatives of∂ t∂mThe derivatives of (A.165),∂ tk∂mwm =−τk∑j=k⋆β(σ)k−j∂ fuj− 12∂mwm−τk∑j=k†β(σ)k−j∂ fvj∂mwmwhere k† = min(0, k − s), with respect to m, y0 and y are:• ∂2t∂m∂mwmTaking the derivative of (A.165) with respect to m gives the kth N ×Nm block of∂ 2t∂m∂mwm∂ 2tk∂m∂mwm =−τk∑j=k⋆β(σ)k−j∂ 2fuj− 12∂m∂mwm−τk∑j=k†β(σ)k−j∂ 2fvj∂m∂mwm . (A.176)• ∂2t∂y0∂mwmDifferentiating (A.165) with respect to y0 immediately gives∂ 2tk∂y0∂mwm =0Nu×Nu −τ β(σ)k−1 ∂2fu12∂v∂mwm0Nv×Nu 0Nv×Nv (A.177)with β(k)k = 0 if k > s.• ∂2t∂y∂mwmConsider the jth N ×N block of∂ 2tk∂y∂mwm =[∂ 2tk∂y1∂mwm∂ 2tk∂y2∂mwm · · · ∂2tk∂yK ∂mwm]. (A.178)229A.3. Staggered Time-Stepping MethodsThe (k, j)th N ×N block of ∂2t∂y∂mwm is∂ 2tk∂yj ∂mwm =−τ 0Nu×Nu β(σ)k−j−1∂ 2fuj+1/2∂v∂mwmβ(σ)k−j∂ 2fvj∂u∂mwm 0Nv×Nvif k⋆ ≤ j ≤ k0N×N otherwise.(A.179)Derivatives of∂ t∂y0w0Finally, we find expressions for the derivatives of (A.170),∂ tk∂y0w0 =α(k)k wu0 − τ β(k)k−1∂ fu1/2∂vwv1/2α(k)k wv1/2 − τ β(k)k∂ fv0∂uwu0 ,where α(k)k = β(k)k = 0 if k > s, with respect to m, y0 and y:• ∂2t∂m∂y0w0Differentiating (A.170) with respect tom gives the kth N×Nm block of ∂2t∂m∂y0w0:∂ 2tk∂m∂y0w0 = −τβ(k)k−1∂ 2fu1/2∂m∂vwv1/2β(k)k∂ 2fv0∂m∂uwu0 (A.180)with β(k)k = 0 if k > s.• ∂2t∂y0∂y0w0The kth N×N block of ∂2t∂y0∂y0w0 is obtained by taking the derivative of (A.170)230A.3. Staggered Time-Stepping Methodswith respect to y0:∂ 2tk∂y0∂y0w0 =0Nu×Nu −τ β(k)k−1∂ 2fu1/2∂v∂vwv1/2−τ β(k)k∂ 2fv0∂u∂uwu0 , (A.181)with β(k)k = 0 if k > s.• ∂2t∂y∂y0w0Differentiating (A.170) with respect to y immediately gives∂ 2tk∂y∂y0w0 = 0N×KN . (A.182)A.3.2 Staggered Runge-Kutta MethodsThe time-stepping vector t = t(y;m,y0) for StagRK methods was found in Section2.3.2 to bet (y;m,y0) = to/e (y;m,y0) = Ty − f (y,m,y0) ,wherey = yo/e =[(Yo/e1)⊤y⊤1(Yo/e2)⊤y⊤2 · · ·(Yo/eK)⊤y⊤K]⊤f = fo/e=[(Fo/e1)⊤y⊤0(Fo/e2)⊤01×N · · ·(Fo/eK)⊤01×N]⊤231A.3. 
Staggered Time-Stepping MethodsandT = To/e =IsN(Bo/e)⊤INIsN−IN(Bo/e)⊤IN. . . . . . . . .IsN−IN(Bo/e)⊤IN, (A.183)see (2.50b). Here we haveBo = −τ[b1 0 b3 0 · · · bs−2 0 bs]⊤⊗ IN ,Be = −τ[0 b2 0 b4 · · · bs−2 0 bs]⊤⊗ IN ,Yo/ek =Yo/ek,1...Yo/ek,s , and Fo/ek =Fo/ek,1...Fo/ek,s(A.184)withFok,σ Fek,σσ oddfu (voU,k− 12 ,σ, tk− 12 ,σ)fv(uoV,k,σ, tk,σ)  fv (ueU,k−1,σ, tk−1,σ)fu(veV,k− 12,σ, tk− 12,σ)σ even fv (uoU,k−1,σ, tk−1,σ)fu(voV,k− 12,σ, tk− 12,σ) fu (veU,k− 12 ,σ, tk− 12 ,σ)fv(ueV,k,σ, tk,σ)232A.3. Staggered Time-Stepping MethodsFor brevity of notation we have let tk−1,σ = tk−1 + cστ , tk− 12,σ = tk− 12+ cστ , and (see(2.44)),vo/eU,k− 12,σ= vk− 12+ τσ−1∑i=1i e/oaσiUo/ek,i , vo/eV,k− 12,σ= vk− 12+ τσ−1∑i=1i o/eaσiVo/ek+ 12,iuo/eU,k−1,σ = uk−1 + τσ−1∑i=1i o/eaσiUo/ek,i , uo/eV,k,σ = uk + τσ−1∑i=1i e/oaσiVo/ek+ 12,i.(A.185)Withyk = ukvk+ 12 and Yo/ek,σ = Uo/ek,σVo/ek+ 12,σ ,From now on we omit the time dependence of fu and fv. We will now take thederivative of t with respect to each of y, m and y0 in turn.• ∂ t∂yWe break the derivative of t with respect to y into parts:∂ t∂y=[∂ t∂ ŷ1∂ t∂ ŷ2· · · ∂ t∂ ŷK]= T−[∂ f∂ ŷ1∂ f∂ ŷ2· · · ∂ f∂ ŷK]. (A.186)If∂Fo/ek∂ ŷj=[∂Fo/ek∂Yj,1· · · ∂Fo/ek∂Yj,s∂Fo/ek∂yj]denotes the derivative of Fo/ek with respect to ŷj , with 1 ≤ k, j ≤ K, and∂Fo/ek,σ∂ ŷj,idenotes the (σ, i)th N ×N block of ∂Fo/ek∂ ŷj, with 1 ≤ σ ≤ s and 1 ≤ i ≤ s+1, then,using the chain rule,◦ For number of stages s odd :We distinguish between the stage σ being odd or even.233A.3. Staggered Time-Stepping MethodsFor σ odd we have∂Fok,σ∂Uok,i=τασi∂ fu(voU,k− 12,σ)∂voU,k− 12,σ0Nv×Nu , ∂Fok,σ∂Vok+ 12,i= 0Nu×Nvτασi∂ fv(uoV,k,σ)∂uoV,k,σ (A.187)and hence, for j = k, 1 ≤ σ ≤ s and 1 ≤ i ≤ σ − 1, i even:∂Fok,σ∂Yok,i=∂ fu(voU,k− 12,σ)∂Uok,i∂ fu(voU,k− 12,σ)∂Vok+ 12,i∂ fv(uoV,k,σ)∂Uok,i∂ fv(uoV,k,σ)∂Vok+ 12,i=∂ fu(voU,k− 12,σ)∂Uok,i0Nu×Nv0Nv×Nu∂ fv(uoV,k,σ)∂Vok+ 12,i= τασifuv (voU,k− 12 ,σ) 0Nv×Nu0Nu×Nv fvu(uoV,k,σ)︸ ︷︷ ︸Gok,σand∂Fok,σ∂Yoj,i= 0N×N otherwise.Also,∂Fok,σ∂yj=∂ fu(voU,k− 12,σ)∂uj∂ fu(voU,k− 12,σ)∂vj+ 12∂ fv(uoV,k,σ)∂uj∂ fv(uoV,k,σ)∂vj+ 12 , (A.188)234A.3. Staggered Time-Stepping Methodsso∂Fok,σ∂yj= 0Nu×Nu 0Nu×Nvfvu(uoV,k,σ)0Nv×Nv if j = k,0Nu×Nu fuv(voU,k− 12,σ)0Nv×Nu 0Nv×Nv if j = k − 10N×N otherwise.(A.189)Similarly, for σ even:∂Fok,σ∂Yok,i=τασifvu (uoU,k−1,σ) 0Nv×Nv0Nu×Nu fuv(voV,k− 12,σ)︸ ︷︷ ︸Gok,σif 1 ≤ i ≤ σ − 1,i odd0N×N otherwise,(A.190a)and∂Fok,σ∂yj=fvu (uoU,k−1,σ) 0Nv×Nv0Nu×Nu fuv(voV,k− 12,σ) if j = k − 10N×N otherwise.(A.190b)◦ For number of stages s even:We again differentiate between the stage σ being odd or even. Following thesame procedure as for s odd, we get, for σ odd,∂Fek,σ∂Yek,i=τασifvu (ueU,k−1,σ) 0Nv×Nv0Nu×Nu fuv(veV,k− 12,σ)︸ ︷︷ ︸Gek,σif 1 ≤ i ≤ σ − 1,i even0N×N otherwise,(A.191a)235A.3. Staggered Time-Stepping Methodsand∂Fek,σ∂yj=fvu (ueU,k−1,σ) 0Nv×Nv0Nu×Nu fuv(veV,k− 12,σ) if j = k − 10N×N otherwise,(A.191b)and for σ even,∂Fek,σ∂Yek,i=τασifuv(veU,k− 12,σ)0Nv×Nu0Nu×Nv fvu(ueV,k,σ)︸ ︷︷ ︸Gek,σif 1 ≤ i ≤ σ − 1,i odd0N×N otherwise,(A.192a)and∂Fek,σ∂yj= 0Nu×Nu 0Nu×Nvfvu(ueV,k,σ)0Nv×Nv if j = k,0Nu×Nu fuv(veU,k− 12,σ)0Nv×Nu 0Nv×Nv if j = k − 10N×N otherwise.(A.192b)For s odd and even, the stages are combined by lettingGo/ek =Go/ek,1Go/ek,2. . 
.Go/ek,s ,∂Fo/ek∂yj=∂Fo/ek,1∂yj...∂Fo/ek,s∂yj. (A.193)236A.3. Staggered Time-Stepping MethodsThe (k, j)th (s+ 1)N × (s+ 1)N block of ∂ t∂yis∂ tk∂ ŷj=IsN − τGo/ek (As ⊗ IN) −∂Fo/ek∂ykB⊤ IN if k = j0sN×sN −∂Fo/ek∂yk−10N×sN −IN if j = k − 10(s+1)N×(s+1)N otherwise.(A.194)WithAk = IsN − τGo/ek (As ⊗ IN)Ck,j = −∂Fo/ek∂yjB = Bo/e,(A.195)and we can then write∂ t∂y=A1 C1,1B⊤ INC2,1 A2 C2,2−IN B⊤ IN. . . . . . . . .CK,K−1 AK CK,K−IN B⊤ IN. (A.196)which is essentially lower block-triangular matrix since the variables can be rear-ranged to give a lower block-triangular matrix.237A.3. Staggered Time-Stepping Methods◦ The kth (s+ 1)N × 1 block of the product tyw is∂ tk∂yw =Ck,kwk +AkWk +Ck,k−1wk−1B⊤Wk +wk −wk−1 (A.197)with w0 = 0N×1. If∂ tk,σ∂yw is the σth N ×1 block of ∂ tk∂yw, with 1 ≤ σ ≤ s+1,we have For number of stages s odd1 ≤ σ ≤ s and σ odd:∂ tk,σ∂yw =Wuk,σ − fuv(voU,k− 12,σ)wvk− 12+ τσ−1∑i=1i evenaσiWuk,iWvk+ 12,σ− fvu(uoV,k,σ)wuk + τ σ−1∑i=1i evenaσiWvk+ 12,i . (A.198a)1 ≤ σ ≤ s and σ even:∂ tk,σ∂yw =Wuk,σ − fvu(uoU,k−1,σ)wuk−1 + τ σ−1∑i=1i oddaσiWuk,iWvk+ 12,σ− fuv(voV,k− 12,σ)wvk− 12+ τσ−1∑i=1i oddaσiWvk+ 12,i .(A.198b)σ = s+ 1:∂ tk,s+1∂yw =−τs∑σ=1σ oddbσWuk,σ +wuk −wuk−1−τs∑σ=1σ oddbσWvk+ 12,σ+wvk+ 12−wvk− 12 . (A.198c) For number of stages s even238A.3. Staggered Time-Stepping Methods1 ≤ σ ≤ s and σ odd:∂ tk,σ∂yw =Wuk,σ − fvu(ueU,k−1,σ)wuk−1 + τ σ−1∑i=1i evenaσiWuk,iWvk+ 12,σ− fuv(veV,k− 12,σ)wvk− 12+ τσ−1∑i=1i evenaσiWvk+ 12,i .(A.198d)1 ≤ σ ≤ s and σ even:∂ tk,σ∂yw =Wuk,σ − fuv(veU,k− 12,σ)wvk− 12+ τσ−1∑i=1i oddaσiWuk,iWvk+ 12,σ− fvu(ueV,k,σ)wuk + τ σ−1∑i=1i oddaσiWvk+ 12,i . (A.198e)σ = s+ 1:∂ tk,s+1∂yw =−τs∑σ=1σ evenbσWuk,σ +wuk −wuk−1−τs∑σ=1σ evenbσWvk+ 12,σ+wvk+ 12−wvk− 12 . (A.198f)From now on we letwv,o/eWu,k− 12,σ= wvk− 12+ τσ−1∑i=1i e/oaσiWuk,i wu,o/eWv,k,σ = wuk + τσ−1∑i=1i e/oaσiWvk+ 12,iwu,o/eWu,k−1,σ = wuk−1 + τσ−1∑i=1i o/eaσiWuk,i wv,o/eWv,k− 12,σ= wvk− 12+ τσ−1∑i=1i o/eaσiWvk+ 12,i.(A.199)239A.3. Staggered Time-Stepping Methods◦ The transpose of ∂ t∂yis upper block-triangular:∂ t∂y⊤=A⊤1 BC⊤1,1 IN C⊤2,1 −INA⊤2 B. . . . . . . . . . . .C⊤K−1,K−1 IN C⊤K,K−1 −INA⊤K BC⊤K,K IN. (A.200)The jth (s+1)N×1 block of the product of ∂ t∂y⊤with some arbitrary w thereforeis∂ t∂ ŷj⊤w = A⊤j Wj +BwjC⊤j,jWj +C⊤j+1,jWj+1 +wj −wj+1 , (A.201)with WK+1,σ = wK+1 = 0N×1. The σth N × 1 block ∂ t∂ ŷj,σ⊤w of∂ t∂ ŷj⊤w, with1 ≤ σ ≤ s+ 1, we have For number of stages s odd1 ≤ σ ≤ s and σ odd:∂ t∂ ŷj,σ⊤w =Wuj,σ − τs∑i=σ+1i evenaiσ fvu(uoU,j−1,i)⊤Wuj,i − τ bσwujWvj+ 12,σ− τs∑i=σ+1i evenaiσ fuv(voV,j− 12,σ)⊤Wvj+ 12,i− τ bσwvj+ 12 .(A.202a)1 ≤ σ ≤ s and σ even:∂ t∂ ŷj,σ⊤w =Wuj,σ − τs∑i=σ+1i oddaiσ fuv(voU,j− 12,σ)⊤Wuj,iWvj+ 12,σ− τs∑i=σ+1i oddaiσ fvu(uoV,j,σ)⊤Wvj+ 12,i . (A.202b)240A.3. Staggered Time-Stepping Methodsσ = s+ 1:∂ t∂ ŷj,s+1⊤w =wuj −wuj+1 −s∑σ=1σ oddfvu(uoV,j,σ)⊤Wvj+ 12,σ+−s∑σ=1σ evenfvu(uoU,j,σ)⊤Wuj+1,σwvj+ 12−wvj+ 32−s∑σ=1σ oddfuv(voU,j+ 12,σ)⊤Wuj+1,σ+−s∑σ=1σ evenfuv(voV,j+ 12,σ)⊤Wvj+ 32,σ. (A.202c) For number of stages s even1 ≤ σ ≤ s and σ odd:∂ t∂ ŷj,σ⊤w =Wuj,σ − τs∑i=σ+1i evenaiσ fuv(veU,j− 12,σ)⊤Wuj,iWvj+ 12,σ− τs∑i=σ+1i evenaiσ fvu(ueV,j,σ)⊤Wvj+ 12,i . 
(A.202d)1 ≤ σ ≤ s and σ even:∂ t∂ ŷj,σ⊤w =Wuj,σ − τs∑i=σ+1i oddaiσ fvu(ueU,j−1,i)⊤Wuj,i − τ bσwujWvj+ 12,σ− τs∑i=σ+1i oddaiσ fuv(veV,j− 12,σ)⊤Wvj+ 12,i− τ bσwvj+ 12 .(A.202e)σ = s+ 1:∂ t∂ ŷj,s+1⊤w =wuj −wuj+1 −s∑σ=1σ oddfvu(ueU,j,σ)⊤Wuj+1,σ+−s∑σ=1σ evenfvu(ueV,j,σ)⊤Wvj+ 12,σwvj+ 12−wvj+ 32−s∑σ=1σ oddfuv(veV,j+ 12,σ)⊤Wvj+ 32,σ+−s∑σ=1σ evenfuv(veU,j+ 12,σ)⊤Wuj+1,σ. (A.202f)241A.3. Staggered Time-Stepping Methods• ∂ t∂mTaking the derivative of T with respect to m,∂ t∂m=∂ t∂m=∂∂m(Ty − f) = − ∂ f∂m(A.203)with∂Fo/e∂m=∂Fo/e1∂m0N×Nm...∂Fo/eK∂m0N×Nm, where∂Fo/ek∂m=∂Fo/ek,1∂m...∂Fo/ek,s∂m , (A.204)and∂Fok,σ∂m∂Fek,σ∂mσ oddfum (voU,k− 12 ,σ)fvm(uoV,k,σ) fvm (ueU,k−1,σ)fum(veV,k− 12,σ)σ evenfvm (uoU,k−1,σ)fum(voV,k− 12,σ) fum (veU,k− 12 ,σ)fvm(ueV,k,σ)◦ The product ∂ tk∂mwm is simply∂ tk∂mwm = −∂Fo/ek∂mwm0N×1 =∂Fo/ek,1∂mwm...∂Fo/ek,s∂mwm0N×1, (A.205)242A.3. Staggered Time-Stepping Methodswith∂Fok,σ∂mwm∂Fek,σ∂mwmσ oddfum (voU,k− 12 ,σ)wmfvm(uoV,k,σ)wm fvm (ueU,k−1,σ)wmfum(veV,k− 12,σ)wmσ evenfvm (uoU,k−1,σ)wmfum(voV,k− 12,σ)wm fum (veU,k− 12 ,σ)wmfvm(ueV,k,σ)wm◦ The product ∂ tk∂m⊤ŵk is∂ tk∂m⊤ŵk = −s∑σ=1∂Fk,σ∂m⊤Wk,σ=−s∑σ=1σ odd(fum(voU,k− 12,σ)⊤Wuk,σ + fvm(uoV,k,σ)⊤Wvk+ 12,σ)+−s∑σ=1σ even(fvm(uoU,k−1,σ)⊤Wuk,σ + fum(voV,k− 12,σ)⊤Wvk+ 12,σ)if s odd−s∑σ=1σ odd(fvm(ueU,k−1,σ)⊤Wuk,σ + fum(veV,k− 12,σ)⊤Wvk+ 12,σ)+−s∑σ=1σ even(fum(veU,k− 12,σ)⊤Wuk,σ + fvm(ueV,k,σ)⊤Wvk+ 12,σ)if s even(A.206)• ∂ t∂y0Taking the derivative of t with respect to y0,∂ t∂y0=∂ t∂y0=∂∂y0(Ty− f) = − ∂ f∂y0(A.207)243A.3. Staggered Time-Stepping Methodswith∂ f∂y0=[∂Fo/e1∂y0⊤IN∂Fo/e2∂y0⊤0N×N · · ·∂Fo/eK∂y0⊤0N×N]⊤,where∂Fo/ek∂y0=[∂Fo/ek,1∂y0⊤· · · ∂Fo/ek,s∂y0⊤]⊤.Clearly∂Fo/ek∂y0= 0N×N for k > 1. For k = 1 we have◦ For number of stages s odd :∂Fo1,σ∂y0=0Nu×Nu fuv(voU,1/2,σ)0Nv×Nu 0Nv×Nv if σ oddfvu (uoU,0,σ) 0Nv×Nv0Nu×Nu fuv(voV,1/2,σ) if σ even.(A.208)◦ For number of stages s even:∂Fe1,σ∂y0=fvu (ueU,0,σ) 0Nv×Nv0Nu×Nu fuv(veV,1/2,σ) if σ odd0Nu×Nu fuv(veU,1/2,σ)0Nv×Nu 0Nv×Nv if σ even.(A.209)Then the products of∂ tk∂y0are◦ The product ∂ tk∂y0w0 is244A.3. Staggered Time-Stepping Methods∂ tk∂y0w0 =−∂Fo/e1,1∂y0w0...∂Fo/e1,s∂y0w0wu0wv0if k = 10(s+1)N×1 if k > 1(A.210)with∂Fo1,σ∂y0w0∂Fe1,σ∂y0w0σ oddfuv (voU,1/2,σ)wv00Nv×1  fvu (ueU,0,σ)wu0fuv(veV,1/2,σ)wv0σ even fvu (uoU,0,σ)wu0fuv(voV,1/2,σ)wv0 fuv (veU,1/2,σ)wv00Nv×1◦ The product ∂ tk∂y0⊤ŵk is 0N×1 for k > 1 and for k = 1 For number of stages s odd :∂ t1∂y0⊤ŵ1 =−wu1 −s∑σ=1σ evenfvu(uoU,0,σ)⊤Wu1,σ−wv32−s∑σ=1σ oddfuv(voU, 12,σ)⊤Wu1,σ+−s∑σ=1σ evenfuv(voV, 12,σ)⊤Wv32,σ(A.211a) For number of stages s even:245A.3. Staggered Time-Stepping Methods∂ t1∂y0⊤ŵ1 =−wu1 −s∑σ=1σ oddfvu(ueU,0,σ)⊤Wu1,σ−wv32−s∑σ=1σ oddfuv(veV, 12,σ)⊤Wv32,σ+−s∑σ=1σ evenfuv(veU, 12,σ)⊤Wu1,σ. (A.211b)Derivatives of∂ t∂yWe now find the derivatives of the σth N × 1 block ∂ tk,σ∂yw of the vector∂ tk∂yw,which in turn is the kth block of tyw =∂ t∂yw.The expression for the product∂ tk,σ∂yw was given in (A.198). We now find ex-pressions for the derivatives of (A.198) with respect to y, m and y0.• ∂2t∂m∂ywWe differentiate (A.198) with respect to m:◦ For number of stages s odd1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂m∂yw =−(fuv (voU,k− 12 ,σ)wv,oWu,k− 12 ,σ)m− (fvu (uoV,k,σ)wu,oWv,k,σ)m . 
(A.212a)1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂m∂yw = −(fvu (uoU,k−1,σ)wu,o/eWu,k−1,σ)m−(fuv(voV,k− 12,σ)wv,oWv,k− 12,σ)m . (A.212b)σ = s+ 1:∂ 2tk,s+1∂m∂yw = 0N×Nm . (A.212c)246A.3. Staggered Time-Stepping Methods◦ For number of stages s even1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂m∂yw = − (fvu (ueU,k−1,σ)wu,eWu,k−1,σ)m−(fuv(veV,k− 12,σ)wv,eWv,k− 12,σ)m . (A.212d)1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂m∂yw =−(fuv (veU,k− 12 ,σ)wv,eWu,k− 12 ,σ)m− (fvu (ueV,k,σ)wu,eWv,k,σ)m . (A.212e)σ = s+ 1:∂ 2tk,s+1∂m∂yw = 0N×Nm . (A.212f)• ∂2t∂y0∂ywTaking the derivative of (A.198) with respect to y0 gives◦ For number of stages s odd1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂yw =0Nu×Nu −(fuv (voU, 12 ,σ)wv,oWu, 12 ,σ)v0Nv×Nu 0Nv×Nv . (A.213a)1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂yw =− (fvu (uoU,0,σ)wu,oWu,0,σ)u 0Nv×Nv0Nu×Nu −(fuv(voV, 12,σ)wv,oWv, 12,σ)v .(A.213b)σ = s+ 1:∂ 2t1,s+1∂y0∂yw = 0N×N . (A.213c)247A.3. Staggered Time-Stepping Methods◦ For number of stages s even1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂yw =− (fvu (ueU,0,σ)wu,eWu,0,σ)u 0Nv×Nv0Nu×Nu −(fuv(veV, 12,σ)wv,eWv, 12,σ)v .(A.213d)1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂yw =0Nu×Nu −(fuv (veU, 12 ,σ)wv,eWu, 12 ,σ)v0Nv×Nu 0Nv×Nv . (A.213e)σ = s+ 1:∂ 2t1,s+1∂y0∂yw = 0N×N . (A.213f)The derivatives∂ 2tk,σ∂y0∂yw are 0N×N for all k > 1, 1 ≤ σ ≤ s + 1.• ∂2t∂y∂ywConsider the ℓth term of∂ 2tk,σ∂y∂yw =[∂ 2tk,σ∂ ŷ1∂yw∂ 2tk,σ∂ ŷ2∂yw · · · ∂2tk,σ∂ ŷK ∂yw], (A.214)with∂ 2tk,σ∂ ŷℓ∂yw =[∂ 2tk,σ∂Yℓ,1∂yw · · · ∂2tk,σ∂Yℓ,s∂yw∂ 2tk,σ∂yℓ∂yw], (A.215)where∂ 2tk,σ∂Yℓ,i∂yw =[∂ 2tk,σ∂Uℓ,i∂yw∂ 2tk,σ∂Vℓ,i∂yw]for 1 ≤ i ≤ s∂ 2tk,σ∂yℓ∂yw =[∂ 2tk,σ∂uℓ∂yw∂ 2tk,σ∂vℓ∂yw].Then248A.3. Staggered Time-Stepping Methods◦ For number of stages s oddFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂Uk,i∂yw =τασi−(fuv(voU,k− 12,σ)wv,oWu,k− 12,σ)v0Nv×Nv if 1 ≤ i ≤ σ,i even0N×Nv otherwise,(A.216a)∂ 2tk,σ∂Vk,i∂yw =τασi 0Nv×Nu−(fvu(uoV,k,σ)wu,oWv,k,σ)u if 1 ≤ i ≤ σ,i even0N×Nu otherwise,(A.216b)and∂ 2tk,σ∂Yℓ,i∂yw = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂yw = 0Nu×Nu− (fvu (uoV,k,σ)wu,oWv,k,σ)u if ℓ = k0N×Nu otherwise(A.216c)and∂ 2tk,σ∂vℓ∂yw =−(fuv(voU,k− 12,σ)wv,oWu,k− 12,σ)v0Nv×Nv if ℓ = k − 10N×Nv otherwise.(A.216d)249A.3. Staggered Time-Stepping MethodsFor 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂Uk,i∂yw =τασi−(fvu(uoU,k−1,σ)wu,o/eWu,k−1,σ)u0Nu×Nu if 1 ≤ i ≤ σ,i odd0N×Nv otherwise,(A.216e)∂ 2tk,σ∂Vk,i∂yw =τασi 0Nu×Nv−(fuv (voV,k− 12,σ)wv,oWv,k− 12,σ)v if 1 ≤ i ≤ σ,i odd0N×Nu otherwise,(A.216f)and∂ 2tk,σ∂Yℓ,i∂yw = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂yw =−(fvu(uoU,k−1,σ)wu,o/eWu,k−1,σ)u0Nu×Nu if ℓ = k − 10N×Nu otherwise. (A.216g)and∂ 2tk,σ∂vℓ∂yw = 0Nv×Nv−(fuv(voU,k− 12,σ)wv,oWu,k− 12,σ)v if ℓ = k − 10N×Nv otherwise.(A.216h)For σ = s+ 1:∂ 2tk,s+1∂ ŷℓ∂yw = 0N×(s+1)N . (A.216i)◦ For number of stages s even250A.3. Staggered Time-Stepping MethodsFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂Uk,i∂yw =τασi−(fvu(ueU,k−1,σ)wu,eWu,k−1,σ)u0Nv×Nu if 1 ≤ i ≤ σ,i even0N×Nv otherwise,(A.216j)∂ 2tk,σ∂Vk,i∂yw =τασi 0Nv×Nv−(fuv (veV,k− 12,σ)wv,eWv,k− 12,σ)v if 1 ≤ i ≤ σ,i even0N×Nu otherwise,(A.216k)and∂ 2tk,σ∂Yℓ,i∂yw = 0N×N for all other values of ℓ. 
Also,∂ 2tk,σ∂uℓ∂yw =− (fvu (ueU,k−1,σ)wu,eWu,k−1,σ)u0Nu×Nu if ℓ = k − 10N×Nu otherwise(A.216l)and∂ 2tk,σ∂vℓ∂yw = 0Nv×Nv−(fuv(veV,k− 12,σ)wv,eWv,k− 12,σ)v if ℓ = k − 10N×Nv otherwise.(A.216m)For 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂Uk,i∂yw =τασi−(fuv(veU,k− 12,σ)wv,eWu,k− 12,σ)v0Nv×Nv if 1 ≤ i ≤ σ,i odd0N×Nv otherwise,(A.216n)251A.3. Staggered Time-Stepping Methods∂ 2tk,σ∂Vk,i∂yw =τασi 0Nu×Nv−(fvu(ueV,k,σ)wu,eWv,k,σ)u if 1 ≤ i ≤ σ,i odd0N×Nu otherwise,(A.216o)and∂ 2tk,σ∂Yℓ,i∂yw = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂yw = 0Nu×Nu− (fvu (ueV,k,σ)wu,eWv,k,σ)u if ℓ = k0N×Nu otherwise(A.216p)and∂ 2tk,σ∂vℓ∂yw =−(fuv(veU,k− 12,σ)wv,eWu,k− 12,σ)v0Nv×Nv if ℓ = k − 10N×Nv otherwise.(A.216q)For σ = s+ 1:∂ 2tk,s+1∂ ŷℓ∂yw = 0N×(s+1)N . (A.216r)Derivatives of∂ t∂mWe consider the derivatives of the σth N × 1 block ∂ tk,σ∂mwm of the vector∂ tk∂mwm,which in turn is the kth block of tyw =∂ t∂mwm.The expression for the product∂ tk∂mwm was given in (A.205). We now find ex-pressions for the derivatives of (A.205) with respect to y, m and y0.• ∂2t∂m∂mwmWe differentiate (A.205) with respect to m:252A.3. Staggered Time-Stepping Methods◦ For number of stages s oddFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂m∂mwm =−(fum (voU,k− 12 ,σ)wm)m− (fvm (uoV,k,σ)wm)m . (A.217a)For 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂m∂mwm = − (fvm (uoU,k−1,σ)wm)m−(fum(voV,k− 12,σ)wm)m . (A.217b)For σ = s+ 1:∂ 2tk,s+1∂m∂mwm = 0N×Nm. (A.217c)◦ For number of stages s evenFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂m∂mwm = − (fvm (ueU,k−1,σ)wm)m−(fum(veV,k− 12,σ)wm)m . (A.217d)For 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂m∂mwm =−(fum (veU,k− 12 ,σ)wm)m− (fvm (ueV,k,σ)wm)m . (A.217e)For σ = s+ 1:∂ 2tk,s+1∂m∂mwm = 0N×Nm. (A.217f)• ∂2t∂y0∂mwmTaking the derivative of (A.205) with respect to y0 gives◦ For number of stages s odd253A.3. Staggered Time-Stepping MethodsFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂mwm =0Nu×Nu −(fum (voU, 12 ,σ)wm)v0Nv×Nu 0Nv×Nv . (A.218a)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂mwm =− (fvm (uoU,0,σ)wm)u 0Nv×Nv0Nu×Nu −(fum(voV, 12,σ)wm)v . (A.218b)For σ = s+ 1:∂ 2t1,s+1∂y0∂mwm = 0N×N . (A.218c)◦ For number of stages s evenFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂mwm =− (fvm (ueU,0,σ)wm)u 0Nv×Nv0Nu×Nu −(fum(veV, 12,σ)wm)v . (A.218d)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂mwm =0Nu×Nu −(fum (veU, 12 ,σ)wm)v0Nv×Nu 0Nv×Nv . (A.218e)For σ = s+ 1:∂ 2t1,s+1∂y0∂mwm = 0N×N . (A.218f)The derivatives∂ 2tk,σ∂y0∂mwm are 0N×N for all k > 1, 1 ≤ σ ≤ s+ 1.• ∂2t∂y∂mwm254A.3. Staggered Time-Stepping MethodsConsider the ℓth term of∂ 2tk,σ∂y∂mwm =[∂ 2tk,σ∂ ŷ1∂mwm∂ 2tk,σ∂ ŷ2∂mwm · · · ∂2tk,σ∂ ŷK ∂mwm], (A.219)with∂ 2tk,σ∂ ŷℓ∂mwm =[∂ 2tk,σ∂Yℓ,1∂mwm · · · ∂2tk,σ∂Yℓ,s∂mwm∂ 2tk,σ∂yℓ∂mwm], (A.220)where∂ 2tk,σ∂Yℓ,i∂mwm =[∂ 2tk,σ∂Uℓ,i∂mwm∂ 2tk,σ∂Vℓ,i∂mwm]for 1 ≤ i ≤ s∂ 2tk,σ∂yℓ∂mwm =[∂ 2tk,σ∂uℓ∂mwm∂ 2tk,σ∂vℓ∂mwm].Then◦ For number of stages s oddFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂Uk,i∂mwm =τασi−(fum(voU,k− 12,σ)wm)v0Nv×Nv if 1 ≤ i ≤ σ,i even0N×Nv otherwise,(A.221a)∂ 2tk,σ∂Vk,i∂mwm =τασi 0Nv×Nu−(fvm(uoV,k,σ)wm)u if 1 ≤ i ≤ σ,i even0N×Nu otherwise,(A.221b)255A.3. Staggered Time-Stepping Methodsand∂ 2tk,σ∂Yℓ,i∂mwm = 0N×N for all other values of ℓ. 
Also,∂ 2tk,σ∂uℓ∂mwm = 0Nu×Nu− (fvm (uoV,k,σ)wm)u if ℓ = k0N×Nu otherwise(A.221c)and∂ 2tk,σ∂vℓ∂mwm =−(fum(voU,k− 12,σ)wm)v0Nv×Nv if ℓ = k − 10N×Nv otherwise.(A.221d)For 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂Uk,i∂mwm =τασi−(fvm(uoU,k−1,σ)wm)u0Nu×Nu if 1 ≤ i ≤ σ,i odd0N×Nu otherwise,(A.221e)∂ 2tk,σ∂Vk,i∂mwm =τασi 0Nu×Nv−(fum(voV,k− 12,σ)wm)v if 1 ≤ i ≤ σ,i odd0N×Nv otherwise,(A.221f)and∂ 2tk,σ∂Yℓ,i∂mwm = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂mwm =− (fvm (uoU,k−1,σ)wm)u0Nu×Nu if ℓ = k − 10N×Nu otherwise. (A.221g)256A.3. Staggered Time-Stepping Methodsand∂ 2tk,σ∂vℓ∂mwm = 0Nv×Nv−(fum(voU,k− 12,σ)wm)v if ℓ = k − 10N×Nv otherwise.(A.221h)For σ = s+ 1:∂ 2tk,s+1∂ ŷℓ∂mwm = 0N×(s+1)N . (A.221i)◦ For number of stages s evenFor 1 ≤ σ ≤ s and σ odd:∂ 2tk,σ∂Uk,i∂mwm =τασi−(fvm(ueU,k−1,σ)wm)u0Nv×Nu if 1 ≤ i ≤ σ,i even0N×Nu otherwise,(A.221j)∂ 2tk,σ∂Vk,i∂mwm =τασi 0Nv×Nv−(fum(veV,k− 12,σ)wm)v if 1 ≤ i ≤ σ,i even0N×Nv otherwise,(A.221k)and∂ 2tk,σ∂Yℓ,i∂mwm = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂mwm =− (fvm (ueU,k−1,σ)wm)u0Nu×Nu if ℓ = k − 10N×Nu otherwise(A.221l)257A.3. Staggered Time-Stepping Methodsand∂ 2tk,σ∂vℓ∂mwm = 0Nv×Nv−(fum(veV,k− 12,σ)wm)v if ℓ = k − 10N×Nv otherwise.(A.221m)For 1 ≤ σ ≤ s and σ even:∂ 2tk,σ∂Uk,i∂mwm =τασi−(fum(veU,k− 12,σ)wm)v0Nv×Nv if 1 ≤ i ≤ σ,i odd0N×Nv otherwise,(A.221n)∂ 2tk,σ∂Vk,i∂mwm =τασi 0Nu×Nv−(fvm(ueV,k,σ)wm)u if 1 ≤ i ≤ σ,i odd0N×Nu otherwise,(A.221o)and∂ 2tk,σ∂Yℓ,i∂mwm = 0N×N for all other values of ℓ. Also,∂ 2tk,σ∂uℓ∂mwm = 0Nu×Nu− (fvm (ueV,k,σ)wm)u if ℓ = k0N×Nu otherwise(A.221p)and∂ 2tk,σ∂vℓ∂mwm =−(fum(veU,k− 12,σ)wm)v0Nv×Nv if ℓ = k − 10N×Nv otherwise.(A.221q)258A.3. Staggered Time-Stepping MethodsFor σ = s+ 1:∂ 2tk,s+1∂ ŷℓ∂mwm = 0N×(s+1)N . (A.221r)Derivatives of∂ t∂y0We consider the derivatives of the σth block∂ tk,σ∂y0w0 of the vector∂ tk∂y0w0, which inturn is the kth block of∂ t∂y0w0 =∂ t∂y0w0.The expression for the product∂ tk∂y0w0 was given in (A.210). We now find expres-sions for the derivatives of (A.210) with respect to y, m and y0.• ∂2t∂m∂y0w0We differentiate (A.210) with respect to m:◦ For number of stages s oddFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂m∂y0w0 =−(fuv (voU, 12 ,σ)wv0)m0Nv×Nm . (A.222a)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂m∂y0w0 = − (fvu (uoU,0,σ)wu0 )m−(fuv(voV,k− 12,σ)wv0)m . (A.222b)For σ = s+ 1:∂ 2t1,s+1∂m∂y0w0 = 0N×Nm. (A.222c)◦ For number of stages s even259A.3. Staggered Time-Stepping MethodsFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂m∂y0w0 = − (fvu (ueU,0,σ)wu0 )m−(fuv(veV, 12,σ)wv0)m . (A.222d)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂m∂y0w0 =−(fuv (veU, 12 ,σ)wv0)m0Nv×Nm . (A.222e)For σ = s+ 1:∂ 2t1,s+1∂m∂y0w0 = 0N×Nm. (A.222f)We have∂ 2t1,σ∂m∂y0w0 = 0N×Nm for k > 1.• ∂2t∂y0∂y0w0Taking the derivative of (A.198) with respect to y0 gives◦ For number of stages s oddFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂y0w0 =0Nu×Nu −(fuv (voU, 12 ,σ)wv0)v0Nv×Nu 0Nv×Nv . (A.223a)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂y0w0 =− (fvu (uoU,0,σ)wu0 )u 0Nv×Nv0Nu×Nu −(fuv(voV, 12,σ)wv0)v . (A.223b)For σ = s+ 1:∂ 2t1,s+1∂y0∂y0w0 = 0N×N . (A.223c)260A.3. Staggered Time-Stepping Methods◦ For number of stages s evenFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂y0∂yw =− (fvu (ueU,0,σ)wu0 )u 0Nv×Nv0Nu×Nu −(fuv(veV, 12,σ)wv0)v . (A.223d)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂y0∂yw =0Nu×Nu −(fuv (veU, 12 ,σ)wv0)v0Nv×Nu 0Nv×Nv . 
(A.223e)For σ = s+ 1:∂ 2t1,s+1∂y0∂y0w0 = 0N×N . (A.223f)The derivatives∂ 2t1,σ∂y0∂y0w0 are 0N×N for all k > 1, 1 ≤ σ ≤ s+ 1.• ∂2t∂y∂y0w0It is immediately clear that∂ 2tk,σ∂y∂y0w0 = 0(s+1)KN×Nfor all k > 1 and that∂ 2t1,σ∂ ŷℓ∂y0w0 = 0(s+1)N×Nfor ℓ > 1. Since∂ 2t1,σ∂Y1,i∂y0w0 =[∂ 2t1,σ∂U1,i∂y0w0∂ 2t1,σ∂V1,i∂y0w0]for 1 ≤ i ≤ s,we have◦ For number of stages s odd261A.3. Staggered Time-Stepping MethodsFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂Y1,i∂y0w0 =τασi−(fuv(voU, 12,σ)wv0)v0Nu×Nu0Nv×Nv 0Nv×Nu if 1 ≤ i ≤ σ,i even0N×N otherwise.(A.224a)For 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂U1,i∂y0w0 =τασi−(fvu(uoU,0,σ)wu0)u0Nu×Nu if 1 ≤ i ≤ σ,i odd0N×Nu otherwise,(A.224b)∂ 2t1,σ∂V1,i∂y0w0 =τασi 0Nu×Nv−(fuv(voV, 12,σ)wv0)v if 1 ≤ i ≤ σ,i odd0N×Nv otherwise.(A.224c)◦ For number of stages s evenFor 1 ≤ σ ≤ s and σ odd:∂ 2t1,σ∂U1,i∂y0w0 =τασi−(fvu(ueU,0,σ)wu0)u0Nv×Nu if 1 ≤ i ≤ σ,i even0N×Nv otherwise,(A.224d)∂ 2t1,σ∂V1,i∂y0w0 =τασi 0Nv×Nv−(fuv(veV, 12,σ)wv0)v if 1 ≤ i ≤ σ,i even0N×Nu otherwise.(A.224e)262A.4. Exponential Time DifferencingFor 1 ≤ σ ≤ s and σ even:∂ 2t1,σ∂Y1,i∂y0w0 =τασi−(fuv(veU, 12,σ)wv0)v0Nu×Nu0Nv×Nv 0Nv×Nu if 1 ≤ i ≤ σ,i odd0N×Nv otherwise.(A.224f)A.4 Exponential Time DifferencingA.4.1 Exponential Runge-Kutta MethodsThe time-stepping vector t = t(y;m,y0) for ETDRK methods was found in Section2.4.1 to bet (y;m,y0) = Ty− n (y,y0) ,wherey =[Y⊤1 y⊤1 Y⊤2 y⊤2 · · · Y⊤K y⊤K]⊤,n =[N⊤1 (eτ1L0y0)⊤ N⊤2 01×N · · · N⊤K 01×N]⊤,T =IsN−B⊤1 INIsN−eτ2L1 −B⊤2 IN. . . . . . . . .IsN−eτKLK−1 −B⊤K IN,263A.4. Exponential Time Differencingsee (2.60b). We have let Bk = τk[b1(τkLk−1) · · · bs(τkLk−1)]⊤andYk =Yk,1...Yk,s , Nk =Nk,1,...Nk,s ,withNk,σ = nk−1(ecστkLk−1yk−1 + τkσ−1∑j=1aσj(τkLk−1)Yk,j, tk−1 + cστk).Let Ask−1 be the s × s block matrix where the (i, j)th entry is aij(τkLk−1). To easethe notation further, lettk,σ = tk−1 + cστk, (A.225a)yk,σ= ecστkLk−1(yk−1,m)yk−1 + τkσ−1∑j=1aij(τkLk−1(yk−1,m))yk,j, (A.225b)and∂Nk∂y= diag(∂Nk,1∂y, · · · , ∂Nk,s∂y). (A.225c)The solution procedure at the kth time-step is represented bytk = Yk −Nk−eτkLk−1yk−1 −B⊤kYk + yk . (A.226)We will now take the derivative of t with respect to each of y, m and y0 in turn.• ∂ t∂yLetting ŷk =[Y⊤k y⊤k]⊤, we break the derivative of t with respect to y into parts:∂ t∂y=[∂ t∂ ŷ1∂ t∂ ŷ2· · · ∂ t∂ ŷK]= T−[∂n∂ ŷ1∂n∂ ŷ2· · · ∂n∂ ŷK]. (A.227)264A.4. Exponential Time DifferencingThe derivative of the kth time-step is∂ tk∂y=[∂ tk∂ ŷ1∂ tk∂ ŷ2· · · ∂ tk∂ ŷK],with∂ tk∂ ŷj=∂∂ ŷj(Yk −Nk)∂∂ ŷj(−eτkLk−1yk−1 −B⊤kYk + yk) . (A.228)Looking at the terms in (A.228) individually and using the chain rule, we have forj = k∂Yk∂ ŷk=[IsN×sN 0sN×N] ∂Nk∂ ŷk= τk∂Nk∂y[Ask−1 0sN×N]∂yk∂ ŷk=[0N×sN IN×N] ∂(B⊤kYk)∂ ŷk=[B⊤k 0N×N]and for j = k − 1∂Nk∂ ŷk−1=[0sN×sN∂Nk∂yk−1]∂(eτkLk−1yk−1)∂ ŷk−1=[0N×sN∂(eτkLk−1yk−1)∂yk−1]∂(B⊤kYk)∂ ŷk−1=[0N×sN∂(B⊤kYk)∂yk−1].The terms∂(B⊤kYk)∂yk−1= τks∑σ=1∂(bσ(τkLk−1(yk−1))Yk,σ)∂yk−1∂nk−1,σ(yk,i(yk−1),yk−1)∂yk−1=∂nk−1,σ∂y∂yk,σ(yk−1)∂yk−1+∂nk−1,σ (yk−1)∂yk−1265A.4. 
Exponential Time Differencingwith∂yk,σ(yk−1)∂yk−1=∂ ecστkLk−1(yk−1)yk−1∂yk−1+ τkσ−1∑j=1∂(aσj(τkLk−1(yk−1))Yk,j)∂yk−1,where∂ (ecστkLk−1(yk−1)yk−1)∂yk−1= ecστkLk−1(yk−1) +∂ (ecστkLk−1(yk−1)yfixedk−1 )∂yk−1,are discussed in Section 4.4.2.Now letAk = IsN − τk ∂Nk∂yAsk−1 Ck = −∂Nk∂yk−1,Dk = −∂(eτkLk−1yk−1)∂yk−1− ∂(B⊤kYk)∂yk−1(A.229)so that the (k, j)th (s+ 1)N × (s+ 1)N block of ∂ t∂yis∂ tk∂ ŷj= Ak 0sN×N−B⊤k IN if j = k0sN×sN Ck0N×sN Dk if j = k − 10(s+1)N×(s+1)N otherwise.(A.230)266A.4. Exponential Time Differencingand we can therefore write∂ t∂y=A1−B⊤1 INC2 A2D2 −B⊤2 IN. . . . . . . . .CK AKDK −B⊤K IN, (A.231)which is a block lower-triangular matrix representing the linearized forward time-stepping scheme.◦ The kth (s+1)N × 1 block of the product ∂ t∂yw, where w is an arbitrary vectorof the form w =[W⊤1 w⊤1 · · · W⊤K w⊤K]⊤defined analogously to y, is∂ tk∂yw = AkWk +Ckwk−1−B⊤kWk +wk +Dkwk−1 (A.232)with w0 = 0N×1. If∂ tk,σ∂yw is the σth N ×1 block of ∂ tk∂yw, with 1 ≤ σ ≤ s+1,267A.4. Exponential Time Differencingwe have∂ tk,σ∂yw =Wk,σ − ∂nk−1,σ∂ywk,σ −{∂nk−1,σ∂yk−1wk−1+∂nk−1,σ∂y(∂ (ecστkLk−1(yk−1)yfixedk−1 )∂yk−1wk−1 ++ τkσ−1∑i=1∂(aσi(τkLk−1(yk−1))Yk,i)∂yk−1wk−1)} if σ ≤ swk − eτkLk−1wk−1 − τks∑σ=1bσ(τkLk−1)Wk,σ+−{∂ (eτkLk−1(yk−1)yfixedk−1 )∂yk−1wk−1+τks∑i=1∂(bi(τkLk−1(yk−1))Wk,i)∂yk−1wk−1} if σ = s+ 1,(A.233)where wk,σ = ecστkLk−1wk−1 + τkσ−1∑i=1aσi(τkLk−1)Wk,i.◦ The transpose of ∂ t∂yis upper block-triangular:∂ t∂y⊤=A⊤1 −B1IN C⊤2 D⊤2A⊤2 −B2. . . . . . . . .IN C⊤K D⊤KA⊤K −BKIN. (A.234)The jth (s+1)N×1 block of the product of ∂ t∂y⊤with some arbitrary w thereforeis∂ t∂ ŷj⊤w = A⊤j Wj −BjwjC⊤j+1Wj+1 +wj +D⊤j+1wj+1 , (A.235)268A.4. Exponential Time Differencingwith WK+1,σ = wK+1 = 0N×1. The σth N × 1 block ∂ t∂ ŷj,σ⊤w of∂ t∂ ŷj⊤w, with1 ≤ σ ≤ s+ 1, is∂ t∂ ŷj,σ⊤w ==Wk,σ − τk bσ (τkLk−1)⊤wk+− τks∑j=σ+1ajσ(τkL⊤k−1)(∂nk−1,j∂y)⊤Wk,jif σ ≤ swk − eτk+1L⊤k wk+1 −s∑σ=1ecσ τk+1L⊤k(∂nk,σ∂y)⊤Wk+1,σ+−{s∑i=1[∂nk,i∂yk⊤+(∂ eciτk+1Lk(yk)yfixedk∂yk⊤++ τk+1i−1∑j=1∂(aij(τk+1Lk(yk))Yk+1,j)∂yk⊤ ∂nk,i∂y⊤Wk+1,i++(∂ (eτk+1Lk(yk)yfixedk )∂yk⊤++τk+1s∑i=1∂(bi(τk+1Lk(yk))Yk+1,i)∂yk⊤)wk+1},if σ = s+ 1.• ∂ t∂mWe now turn our attention to computing the derivative of t with respect to m.Consider the derivative of the kth time-step:∂ tk∂m= −∂Nk(y,m)∂m− ∂∂m(eτkLk−1yk−1 −B⊤kYk) . (A.236)The individual terms are∂(B⊤kYk)∂m= τks∑σ=1∂(bi(τkLk−1(m))Yk,σ)∂m, (A.237a)269A.4. Exponential Time Differencingand∂nk−1,σ(yk−1,σ,m)∂m=∂nk,i(m)∂y∂yk,σ∂m+∂nk,σ(m)∂m(A.237b)with∂yk,σ∂m=∂(eciτkLk−1(m)yk−1)∂m+ τkσ−1∑j=1∂(aσj(τkLk−1(m))Yk,j)∂m.If the linear terms τkLk−1 are independent of m then the only term that dependson m is Nk, so trivially we have∂ tk∂m=−∂Nk(m)∂m0N×Nm . (A.238)See Section 4.4.2 if τkLk−1 depends on m.◦ The product of ∂ tk∂mwith some vector wm of length Nm is∂ tk∂mwm = −∂Nk(y,m)∂m wm−∂(eτkLk−1yk−1 −B⊤kYk)∂mwm , (A.239)with the bottom block ignored if Lk does not depend on m.◦ The product ∂ tk∂m⊤ŵk, with ŵk =[W⊤k w⊤k]⊤an arbitrary vector of length(s+ 1)N defined analogously to ŷk (replacing Yk by Wk), is∂ tk∂m⊤ŵk = −s∑σ=1∂nk−1,σ∂m⊤Wk,σ −{∂ (eτkLk−1(m)yk−1)∂m⊤wk++s∑σ=1(∂yk,σ∂m⊤vk,σ + τk∂(bσ(τkLk−1(m))yk,σ)∂m⊤wk)},270A.4. 
Exponential Time Differencingwhere vk,σ =∂nk−1,σ∂y⊤Wk,σ and∂yk,σ∂m⊤vk,σ =∂ (eciτkLk−1(m)yk−1)∂m⊤vk,σ + τki−1∑j=1∂(aσj(τkLk−1(m))Yk,σ)∂m⊤vk,σ.The terms in braces in (A.240) are ignored if Lk−1 is independent of m.• ∂ t∂y0It is clear that the derivative of tk with respect to y0 is 0sN×N for all k > 1 sincethere is no explicit dependence on y0 in later time-steps.For k = 1 we have∂ t1∂y0= −∂N1(y)∂y0− ∂∂y0(eτ1L0y0 −B⊤1Y1) . (A.240)The individual terms are∂(B⊤1Y1)∂y0= τ1s∑σ=1∂(bσ(τ1L0(y0))Y1,σ)∂y0∂n0,σ(y1,i(y0),y0)∂y0=∂n0,σ∂y∂y1,σ(y0)∂y0+∂n0,σ (y0)∂y0with∂y1,σ(y0)∂y0=∂ ecστ1L0(y0)y0∂y0+ τ1σ−1∑i=1∂(aσi(τ1L0(y0))Y1,i)∂y0,where∂ (ecστ1L0(y0)y0)∂y0= ecστkLk−1(y0) +∂ (ecστ1L0(y0)yfixed0 )∂y0.◦ The product ∂ tk∂y0w0, where w0 is an arbitrary vector of length N , is 0N×1 ifk > 1. If k = 1,∂ tk∂y0w0 =C1w0D1w0 . (A.241)271A.4. Exponential Time DifferencingIf∂ t1,σ∂y0w0 is the σth N × 1 block of ∂ t1∂y0w0, with 1 ≤ σ ≤ s+ 1, we have∂ tk,σ∂y0w0 =− ∂n0,σ∂yecστkLk−1w0 −{∂n0,σ∂y0w0+∂n0,σ∂y(∂ (ecστ1L0(y0)yfixed0 )∂y0w0 ++ τ1σ−1∑i=1∂(aσi(τ1L0(y0))Y1,i)∂y0w0)} if σ ≤ s− eτ1L0w0 −{∂ (eτ1L0(y0)yfixed0 )∂y0w0+τ1s∑i=1∂(bi(τ1L0(y0))W1,i)∂y0w0} if σ = s+ 1.(A.242)◦ The transpose of ∂ t1∂y0with some arbitrary vector ŵ1 of length (s+ 1)N is∂ tk∂y0⊤ŵ1 =[C⊤1W1 +D⊤1w1]. (A.243)which gives∂ t∂y0⊤w0 = −eτ1L⊤0 w1 −{s∑i=1[∂n0,i∂y0⊤+(∂ eciτ1L0(y0)yfixed0∂y0⊤++ τ1i−1∑j=1∂(aij(τ1L0(y0))Y1,j)∂y0⊤) ∂n0,i∂y⊤]W1,i++(∂ (eτ1L0(y0)yfixed0 )∂y0⊤+ τ1s∑i=1∂(bi(τ1L0(y0))Y1,i)∂y0⊤)w1}.272Appendix BDerivation of the HessianIn this appendix we use the discrete adjoint method to find the expression (3.15) forthe product of the Hessian of the data misfit function M multiplied by some arbitraryvector wp =[w⊤m w⊤y0w⊤s]⊤of length Nm +Ny0 +Ns.We use the following notation: if x, y, z and w are some vector quantities ofappropriate lengths, we let∂2x∂y∂zw =∂∂y(∂x∂zw∣∣∣∣w),where the ·|w on the right-hand side means that w is taken to be fixed when per-forming the differentiation with respect to y. Note that∂2x∂y∂zis actually a three-dimensional tensor and its product with a vector is ambiguously defined, so it isimportant to keep our convention in mind.From Section 3.2 we know that∇pM = ∂d∂p⊤∇dM = J⊤∇dM,so by the product rule∂2M∂p∂pwp = ∇p(∇pM⊤wp) = ∇p (∇dM⊤ Jwp)=1©∂2M∂p∂d⊤Jwp +2©(∂∂pJwp)⊤∇dM .(B.1)273Appendix B. Derivation of the HessianBy the chain rule, 1© is just∂2M∂m∂d⊤Jwp =∂d∂m⊤ ∂2M∂d∂dJwp = J⊤ ∂2M∂d∂dJwp. (B.2)Lettingx = x(p) =∂y∂pwp =∂y∂mwm +∂y∂ sws +∂y∂y0wy0 (B.3)for brevity, we have Jwp =∂d∂pwp =∂d∂yx. 2© becomes(∂∂pJwp)⊤∇dM =(∂∂p(∂d∂yx))⊤∇dM=3©∂y∂p⊤( ∂2d∂y∂yx)⊤∇dM +4©∂x∂p⊤∂d∂y⊤∇dM . (B.4)To compute 4©, we need to consider each term of∂2y∂p∂pwp =[∂2y∂m∂pwp∂2y∂y0∂pwp∂2y∂ s∂pwp]. (B.5)We apply the following labels:∂2y∂m∂pwp = A©∂2y∂m∂mwm +B©∂2y∂m∂y0wy0 +C©∂2y∂m∂ sws , (B.6a)∂2y∂y0∂pwp = D©∂2y∂y0∂mwm +E©∂2y∂y0∂y0wy0 +F©∂2y∂y0∂ sws , (B.6b)∂2y∂ s∂pwp = G©∂2y∂ s∂mwm +H©∂2y∂ s∂y0wy0 +I©∂2y∂ s∂ sws . (B.6c)Each of these terms will now be considered in turn.274Appendix B. 
Appendix B

Derivation of the Hessian

In this appendix we use the discrete adjoint method to find the expression (3.15) for the product of the Hessian of the data misfit function M with some arbitrary vector w_p = [w_m^⊤ w_{y_0}^⊤ w_s^⊤]^⊤ of length N_m + N_{y_0} + N_s.

We use the following notation: if x, y, z and w are some vector quantities of appropriate lengths, we let
\[
\frac{\partial^2 x}{\partial y\,\partial z}w
= \frac{\partial}{\partial y}\left(\left.\frac{\partial x}{\partial z}w\right|_{w}\right),
\]
where the ·|_w on the right-hand side means that w is taken to be fixed when performing the differentiation with respect to y. Note that ∂²x/∂y∂z is actually a three-dimensional tensor and its product with a vector is ambiguously defined, so it is important to keep our convention in mind.

From Section 3.2 we know that
\[
\nabla_p M = \frac{\partial d}{\partial p}^{\!\top}\nabla_d M = J^\top\nabla_d M,
\]
so by the product rule
\[
\frac{\partial^2 M}{\partial p\,\partial p}w_p
= \nabla_p\big(\nabla_p M^\top w_p\big)
= \nabla_p\big(\nabla_d M^\top J w_p\big)
= \underbrace{\frac{\partial^2 M}{\partial p\,\partial d}^{\!\top}J w_p}_{\text{①}}
+ \underbrace{\left(\frac{\partial}{\partial p}J w_p\right)^{\!\top}\nabla_d M}_{\text{②}}.
\tag{B.1}
\]
By the chain rule, ① is just
\[
\frac{\partial^2 M}{\partial p\,\partial d}^{\!\top}J w_p
= \frac{\partial d}{\partial p}^{\!\top}\frac{\partial^2 M}{\partial d\,\partial d}J w_p
= J^\top\frac{\partial^2 M}{\partial d\,\partial d}J w_p.
\tag{B.2}
\]
Letting
\[
x = x(p) = \frac{\partial y}{\partial p}w_p
= \frac{\partial y}{\partial m}w_m + \frac{\partial y}{\partial s}w_s + \frac{\partial y}{\partial y_0}w_{y_0}
\tag{B.3}
\]
for brevity, we have J w_p = (∂d/∂p) w_p = (∂d/∂y) x. ② becomes
\[
\left(\frac{\partial}{\partial p}J w_p\right)^{\!\top}\nabla_d M
= \left(\frac{\partial}{\partial p}\left(\frac{\partial d}{\partial y}x\right)\right)^{\!\top}\nabla_d M
= \underbrace{\frac{\partial y}{\partial p}^{\!\top}\left(\frac{\partial^2 d}{\partial y\,\partial y}x\right)^{\!\top}\nabla_d M}_{\text{③}}
+ \underbrace{\frac{\partial x}{\partial p}^{\!\top}\frac{\partial d}{\partial y}^{\!\top}\nabla_d M}_{\text{④}}.
\tag{B.4}
\]
To compute ④, we need to consider each term of
\[
\frac{\partial^2 y}{\partial p\,\partial p}w_p
= \begin{bmatrix}
\dfrac{\partial^2 y}{\partial m\,\partial p}w_p &
\dfrac{\partial^2 y}{\partial y_0\,\partial p}w_p &
\dfrac{\partial^2 y}{\partial s\,\partial p}w_p
\end{bmatrix}.
\tag{B.5}
\]
We apply the following labels:
\[
\frac{\partial^2 y}{\partial m\,\partial p}w_p
= \underbrace{\frac{\partial^2 y}{\partial m\,\partial m}w_m}_{\text{Ⓐ}}
+ \underbrace{\frac{\partial^2 y}{\partial m\,\partial y_0}w_{y_0}}_{\text{Ⓑ}}
+ \underbrace{\frac{\partial^2 y}{\partial m\,\partial s}w_s}_{\text{Ⓒ}},
\tag{B.6a}
\]
\[
\frac{\partial^2 y}{\partial y_0\,\partial p}w_p
= \underbrace{\frac{\partial^2 y}{\partial y_0\,\partial m}w_m}_{\text{Ⓓ}}
+ \underbrace{\frac{\partial^2 y}{\partial y_0\,\partial y_0}w_{y_0}}_{\text{Ⓔ}}
+ \underbrace{\frac{\partial^2 y}{\partial y_0\,\partial s}w_s}_{\text{Ⓕ}},
\tag{B.6b}
\]
\[
\frac{\partial^2 y}{\partial s\,\partial p}w_p
= \underbrace{\frac{\partial^2 y}{\partial s\,\partial m}w_m}_{\text{Ⓖ}}
+ \underbrace{\frac{\partial^2 y}{\partial s\,\partial y_0}w_{y_0}}_{\text{Ⓗ}}
+ \underbrace{\frac{\partial^2 y}{\partial s\,\partial s}w_s}_{\text{Ⓘ}}.
\tag{B.6c}
\]
Each of these terms will now be considered in turn.

• Ⓐ, Ⓓ, Ⓖ: Multiply (3.3) by w_m,
\[
\frac{\partial t}{\partial m}w_m + \frac{\partial t}{\partial y}\frac{\partial y}{\partial m}w_m = 0_{N\times 1},
\tag{B.7}
\]
with t = t(y(p), m, y_0), and let x_m = (∂y/∂m) w_m, z_m = (∂t/∂m) w_m and v_m = (∂t/∂y) x_m.

◦ Ⓐ: Differentiate (B.7) with respect to m:
\[
\frac{\partial z_m}{\partial m} + \frac{\partial z_m}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial v_m}{\partial m} + \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial t}{\partial y}\frac{\partial x_m}{\partial m} = 0_{N\times N_m}.
\]
Therefore
\[
\text{Ⓐ} = \frac{\partial x_m}{\partial m}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_m}{\partial m} + \frac{\partial z_m}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial v_m}{\partial m} + \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial m}\right).
\]

◦ Ⓓ: Differentiate (B.7) with respect to y_0:
\[
\frac{\partial z_m}{\partial y_0} + \frac{\partial z_m}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial v_m}{\partial y_0} + \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial t}{\partial y}\frac{\partial x_m}{\partial y_0} = 0_{N\times N}.
\]
Therefore
\[
\text{Ⓓ} = \frac{\partial x_m}{\partial y_0}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_m}{\partial y_0} + \frac{\partial z_m}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial v_m}{\partial y_0} + \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial y_0}\right).
\]

◦ Ⓖ: Differentiate (B.7) with respect to s:
\[
\frac{\partial z_m}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial t}{\partial y}\frac{\partial x_m}{\partial s} = 0_{N\times N_s}.
\]
Therefore
\[
\text{Ⓖ} = \frac{\partial x_m}{\partial s}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_m}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial v_m}{\partial y}\frac{\partial y}{\partial s}\right).
\]

• Ⓑ, Ⓔ, Ⓗ: Multiply (3.5) by w_{y_0},
\[
\frac{\partial t}{\partial y_0}w_{y_0} + \frac{\partial t}{\partial y}\frac{\partial y}{\partial y_0}w_{y_0} = 0_{N\times 1},
\tag{B.8}
\]
with t = t(y(p), m, y_0), and let x_{y_0} = (∂y/∂y_0) w_{y_0}, z_{y_0} = (∂t/∂y_0) w_{y_0} and v_{y_0} = (∂t/∂y) x_{y_0}.

◦ Ⓑ: Differentiate (B.8) with respect to m:
\[
\frac{\partial z_{y_0}}{\partial m} + \frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial v_{y_0}}{\partial m} + \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial t}{\partial y}\frac{\partial x_{y_0}}{\partial m} = 0.
\]
Therefore
\[
\text{Ⓑ} = \frac{\partial x_{y_0}}{\partial m}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_{y_0}}{\partial m} + \frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial v_{y_0}}{\partial m} + \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial m}\right).
\]

◦ Ⓔ: Differentiate (B.8) with respect to y_0:
\[
\frac{\partial z_{y_0}}{\partial y_0} + \frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial v_{y_0}}{\partial y_0} + \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial t}{\partial y}\frac{\partial x_{y_0}}{\partial y_0} = 0.
\]
Therefore
\[
\text{Ⓔ} = \frac{\partial x_{y_0}}{\partial y_0}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_{y_0}}{\partial y_0} + \frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial v_{y_0}}{\partial y_0} + \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial y_0}\right).
\]

◦ Ⓗ: Differentiate (B.8) with respect to s:
\[
\frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial t}{\partial y}\frac{\partial x_{y_0}}{\partial s} = 0.
\]
Therefore
\[
\text{Ⓗ} = \frac{\partial x_{y_0}}{\partial s}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial z_{y_0}}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial v_{y_0}}{\partial y}\frac{\partial y}{\partial s}\right).
\]

• Ⓒ, Ⓕ, Ⓘ: Multiply (3.7) by w_s,
\[
\frac{\partial t}{\partial y}\frac{\partial y}{\partial s}w_s = S\frac{\partial q}{\partial s}w_s,
\tag{B.9}
\]
with t = t(y(p), m, y_0), and let x_s = (∂y/∂s) w_s and v_s = (∂t/∂y) x_s.

◦ Ⓒ: Differentiate (B.9) with respect to m:
\[
\frac{\partial v_s}{\partial m} + \frac{\partial v_s}{\partial y}\frac{\partial y}{\partial m}
+ \frac{\partial t}{\partial y}\frac{\partial x_s}{\partial m} = 0_{N\times N_m}.
\]
Therefore
\[
\text{Ⓒ} = \frac{\partial x_s}{\partial m}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial v_s}{\partial m} + \frac{\partial v_s}{\partial y}\frac{\partial y}{\partial m}\right).
\tag{B.10}
\]

◦ Ⓕ: Differentiate (B.9) with respect to y_0:
\[
\frac{\partial v_s}{\partial y_0} + \frac{\partial v_s}{\partial y}\frac{\partial y}{\partial y_0}
+ \frac{\partial t}{\partial y}\frac{\partial x_s}{\partial y_0} = 0.
\]
Therefore
\[
\text{Ⓕ} = \frac{\partial x_s}{\partial y_0}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial v_s}{\partial y_0} + \frac{\partial v_s}{\partial y}\frac{\partial y}{\partial y_0}\right).
\]

◦ Ⓘ: Differentiate (B.9) with respect to s:
\[
\frac{\partial v_s}{\partial y}\frac{\partial y}{\partial s}
+ \frac{\partial t}{\partial y}\frac{\partial x_s}{\partial s}
= S\frac{\partial^2 q}{\partial s\,\partial s}w_s.
\]
Therefore
\[
\text{Ⓘ} = \frac{\partial x_s}{\partial s}
= \left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
S\frac{\partial^2 q}{\partial s\,\partial s}w_s - \frac{\partial v_s}{\partial y}\frac{\partial y}{\partial s}\right).
\tag{B.11}
\]

Hence, after some simplification,
\[
\frac{\partial^2 y}{\partial m\,\partial p}w_p = \text{Ⓐ} + \text{Ⓑ} + \text{Ⓒ}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial^2 t}{\partial m\,\partial m}w_m + \frac{\partial^2 t}{\partial m\,\partial y_0}w_{y_0}
+ \frac{\partial^2 t}{\partial m\,\partial y}\frac{\partial y}{\partial p}w_p
+ \left(\frac{\partial^2 t}{\partial y\,\partial m}w_m + \frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}
+ \frac{\partial^2 t}{\partial y\,\partial y}\frac{\partial y}{\partial p}w_p\right)\frac{\partial y}{\partial m}\right),
\tag{B.12}
\]
\[
\frac{\partial^2 y}{\partial y_0\,\partial p}w_p = \text{Ⓓ} + \text{Ⓔ} + \text{Ⓕ}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
\frac{\partial^2 t}{\partial y_0\,\partial m}w_m + \frac{\partial^2 t}{\partial y_0\,\partial y_0}w_{y_0}
+ \frac{\partial^2 t}{\partial y_0\,\partial y}\frac{\partial y}{\partial p}w_p
+ \left(\frac{\partial^2 t}{\partial y\,\partial m}w_m + \frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}
+ \frac{\partial^2 t}{\partial y\,\partial y}\frac{\partial y}{\partial p}w_p\right)\frac{\partial y}{\partial y_0}\right),
\tag{B.13}
\]
\[
\frac{\partial^2 y}{\partial s\,\partial p}w_p = \text{Ⓖ} + \text{Ⓗ} + \text{Ⓘ}
= -\left(\frac{\partial t}{\partial y}\right)^{\!-1}\left(
-S\frac{\partial^2 q}{\partial s\,\partial s}w_s
+ \left(\frac{\partial^2 t}{\partial y\,\partial m}w_m + \frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}
+ \frac{\partial^2 t}{\partial y\,\partial y}\frac{\partial y}{\partial p}w_p\right)\frac{\partial y}{\partial s}\right).
\tag{B.14}
\]
The transpose of ∂²y/∂p∂p w_p then is
\[
\left(\frac{\partial^2 y}{\partial p\,\partial p}w_p\right)^{\!\top}
= \begin{bmatrix}
\left(\dfrac{\partial^2 y}{\partial m\,\partial p}w_p\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 y}{\partial y_0\,\partial p}w_p\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 y}{\partial s\,\partial p}w_p\right)^{\!\top}
\end{bmatrix}
= -\begin{bmatrix}
\left(\dfrac{\partial^2 t}{\partial m\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y}\dfrac{\partial y}{\partial p}w_p\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 t}{\partial y_0\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y}\dfrac{\partial y}{\partial p}w_p\right)^{\!\top}\\[1.5ex]
-\left(\dfrac{\partial^2 q}{\partial s\,\partial s}w_s\right)^{\!\top}S^\top
\end{bmatrix}\frac{\partial t}{\partial y}^{\!-\top}
- \frac{\partial y}{\partial p}^{\!\top}\left(
\left(\frac{\partial^2 t}{\partial y\,\partial m}w_m\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y}\frac{\partial y}{\partial p}w_p\right)^{\!\top}\right)\frac{\partial t}{\partial y}^{\!-\top}.
\]
Plugging all of this into ④ and using the definition of the adjoint solution λ = (∂t/∂y)^{-⊤} (∂d/∂y)^⊤ ∇_d M, we therefore have
\[
\text{④}
= -\begin{bmatrix}
\left(\dfrac{\partial^2 t}{\partial m\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y}x\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 t}{\partial y_0\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y}x\right)^{\!\top}\\[1.5ex]
-\left(\dfrac{\partial^2 q}{\partial s\,\partial s}w_s\right)^{\!\top}S^\top
\end{bmatrix}\lambda
- \frac{\partial y}{\partial p}^{\!\top}\left(
\left(\frac{\partial^2 t}{\partial y\,\partial m}w_m\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y}x\right)^{\!\top}\right)\lambda,
\]
with x = (∂y/∂p) w_p. Therefore the action of the Hessian is given by
\[
\begin{aligned}
H_M w_p = \frac{\partial^2 M}{\partial p\,\partial p}w_p
&= \text{①} + \text{②} = \text{①} + \text{③} + \text{④}\\
&= J^\top\frac{\partial^2 M}{\partial d\,\partial d}J w_p
+ \frac{\partial y}{\partial p}^{\!\top}\left(\frac{\partial^2 d}{\partial y\,\partial y}x\right)^{\!\top}\nabla_d M
- \begin{bmatrix}
\left(\dfrac{\partial^2 t}{\partial m\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y}x\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 t}{\partial y_0\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y}x\right)^{\!\top}\\[1.5ex]
-\left(\dfrac{\partial^2 q}{\partial s\,\partial s}w_s\right)^{\!\top}S^\top
\end{bmatrix}\lambda
- \frac{\partial y}{\partial p}^{\!\top}\left(
\left(\frac{\partial^2 t}{\partial y\,\partial m}w_m\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y}x\right)^{\!\top}\right)\lambda\\
&= \begin{bmatrix}
-\dfrac{\partial t}{\partial m}^{\!\top}\\[1.5ex]
-\dfrac{\partial t}{\partial y_0}^{\!\top}\\[1.5ex]
\dfrac{\partial q}{\partial s}^{\!\top}S^\top
\end{bmatrix}\mu
- \begin{bmatrix}
\left(\dfrac{\partial^2 t}{\partial m\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial m\,\partial y}x\right)^{\!\top}\\[1.5ex]
\left(\dfrac{\partial^2 t}{\partial y_0\,\partial m}w_m\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\dfrac{\partial^2 t}{\partial y_0\,\partial y}x\right)^{\!\top}\\[1.5ex]
-\left(\dfrac{\partial^2 q}{\partial s\,\partial s}w_s\right)^{\!\top}S^\top
\end{bmatrix}\lambda
\end{aligned}
\tag{B.15a}
\]
with
\[
\mu = \frac{\partial t}{\partial y}^{\!-\top}\left(
\frac{\partial d}{\partial y}^{\!\top}\frac{\partial^2 M}{\partial d\,\partial d}J w_p
+ \left(\frac{\partial^2 d}{\partial y\,\partial y}x\right)^{\!\top}\nabla_d M
- \left(\left(\frac{\partial^2 t}{\partial y\,\partial m}w_m\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y_0}w_{y_0}\right)^{\!\top}
+ \left(\frac{\partial^2 t}{\partial y\,\partial y}x\right)^{\!\top}\right)\lambda\right).
\tag{B.15b}
\]
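In practice, an implementation of the Hessian-vector product (B.15) is most easily validated against the misfit and gradient with a Taylor-expansion (derivative) test: for a random direction w_p the remainder M(p + h w_p) − M(p) − h ∇_p M^⊤ w_p should decay like O(h²), and subtracting the additional term (h²/2) w_p^⊤ H_M w_p should improve the decay to O(h³). The following minimal MATLAB sketch is not part of the thesis code; misfit, gradient and hessvec are hypothetical handles evaluating M(p), ∇_p M and H_M w_p.

% Minimal Taylor test for a Hessian-vector product (a sketch with an assumed
% interface).  misfit(p) returns the scalar M(p), gradient(p) the gradient
% vector, and hessvec(p, wp) the product of the Hessian with wp.
function taylor_test_hessian(misfit, gradient, hessvec, p)
    wp = randn(size(p));                     % random perturbation direction
    f0 = misfit(p);
    g0 = gradient(p);
    Hw = hessvec(p, wp);
    h  = 10.^(-(1:8));                       % decreasing step sizes
    e1 = zeros(size(h));                     % first-order remainder
    e2 = zeros(size(h));                     % second-order remainder
    for i = 1:numel(h)
        fh    = misfit(p + h(i)*wp);
        e1(i) = abs(fh - f0 - h(i)*(g0'*wp));
        e2(i) = abs(fh - f0 - h(i)*(g0'*wp) - 0.5*h(i)^2*(wp'*Hw));
    end
    % e1 should decrease like O(h^2) and e2 like O(h^3) if the gradient and
    % Hessian-vector product are consistent with the misfit.
    disp([h(:) e1(:) e2(:)]);
end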
