Parametrization and multiple time scale problems with non-Gaussian statistics related to climate dynamics

by

William F. Thompson

B. Math., The University of Waterloo, 2008
M. Sc., The University of British Columbia, 2010

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate and Postdoctoral Studies (Mathematics)

The University of British Columbia (Vancouver)

August 2015

© William F. Thompson, 2015

Abstract

Many problems in climate modelling are characterized by their chaotic nature and multiple time scales. Stochastic parametrization methods can often simplify the analysis of such problems by using appropriate stochastic processes to account for degrees of freedom that are impractical to model explicitly, such that the statistical features of the reduced stochastic model are consistent with more complicated models and/or observational data. However, applying appropriate stochastic parametrizations is generally a non-trivial task. This is especially true when the statistics of the approximated processes exhibit non-Gaussian features, like a non-zero skewness or infinite variance. Such features are common in problems with nonlinear dynamics, anomalous diffusion processes, and multiple time scales. Two common topics in stochastic parametrization are model parameter estimation and the derivation of reduced stochastic models.

In this dissertation, we study both of these topics in the context of stochastic differential equation models, which are the preferred formalism for continuous-time modelling problems. The motivation for this analysis is given by problems in atmospheric or climate modelling. In Chapter 2, we estimate parameters of a dynamical model of sea surface vector winds using a method based on the properties of differential operators associated with the probabilistic evolution of the wind model.
The parameter fields we obtain allow us to reproduce statistics of the vector wind data and inform us regarding the limitations of both the estimation method and the model itself. In Chapter 3, we derive reduced stochastic models for a class of dynamical models with multiple time scales that are driven by α-stable stochastic forcing. The results of Chapter 3 are applied in Chapter 4, where we derive a similar approximation for processes that are driven by a fast linear process experiencing additive and multiplicative Gaussian white noise forcing. The results of these chapters complement previous results for systems driven by Gaussian white noise and suggest a possible dynamical mechanism for the appearance of α-stable stochastic forcing in some climatic time series.

Preface

The research contained in this dissertation was primarily directed by Dr. Rachel A. Kuske and Dr. Adam H. Monahan and undertaken by William F. Thompson. This thesis was written by William F. Thompson with editorial and content suggestions provided throughout by Dr. Rachel A. Kuske and Dr. Adam H. Monahan. Dr. D. Crommelin was also a research collaborator for Chapter 2.

The work in Chapter 2 has been published at the following reference:

W. F. Thompson, A. H. Monahan, D. Crommelin. Parametric estimation of the stochastic dynamics of sea surface winds. Jour. Atmos. Sci., 71:3465–3483, 2014.

The work in Chapter 3 has been submitted and is under review. The research has been published online at arXiv.org and can be found at the following reference, as of the submission date of this dissertation:

W. F. Thompson, R. A. Kuske, A. H. Monahan. Stochastic averaging of dynamical systems with multiple time scales forced with α-stable noise. arXiv.

Table of Contents

Abstract . . . ii
Preface . . . iii
Table of Contents . . . iv
List of Tables . . . vii
List of Figures . . . viii
Acknowledgements . . . xiii
1 Introduction . . . 1
  1.1 Background . . . 1
    1.1.1 Review of key concepts for stochastic differential equations . . . 4
  1.2 Literature review . . . 15
  1.3 Overview of this thesis . . . 16
2 Parametric estimation of the stochastic dynamics of sea surface winds . . . 18
  2.1 Introduction . . . 18
  2.2 Method . . . 19
    2.2.1 The Ornstein-Uhlenbeck process . . . 21
    2.2.2 Weighting of eigenvalues . . . 23
    2.2.3 The effect of correlated noise . . . 24
  2.3 The sea surface wind model . . . 27
    2.3.1 Properties of the wind model . . . 28
    2.3.2 Estimating model parameters from simulated data . . . 30
    2.3.3 Resimulation of winds using the reconstructed model . . . 32
  2.4 Estimation of model parameters from reanalysis wind data . . . 34
    2.4.1 Limitations of the model . . . 35
    2.4.2 Parameter estimates at representative points . . . 35
    2.4.3 Global parameter estimates . . . 37
    2.4.4 Improved estimates of h . . . 41
  2.5 Summary and conclusions . . . 42
3 Stochastic averaging of systems with multiple time scales forced with α-stable noise . . . 43
  3.1 Introduction . . . 43
  3.2 Stochastic averaging and α-stable noise . . . 46
    3.2.1 (A) approximation . . . 47
    3.2.2 (L) approximation . . . 47
    3.2.3 (N+) approximation . . . 51
  3.3 Computational results . . . 55
    3.3.1 Linear system . . . 55
    3.3.2 Nonlinear system 1: bilinear slow dynamics . . . 57
    3.3.3 Nonlinear system 2: state-dependent reversion . . . 60
    3.3.4 Nonlinear system 3: strong slow nonlinearity . . . 61
  3.4 Conclusions . . . 61
4 The appearance of α-stable dynamics via fast processes forced with correlated additive and multiplicative noise . . . 65
  4.1 Introduction . . . 65
  4.2 Properties of the linear CAM noise process and their relationship to α-stable distributions . . . 66
    4.2.1 Properties of the linear CAM noise process, y_t . . . 67
    4.2.2 Infinite variance distributions and their relationship to α-stable distributions . . . 68
  4.3 Reduction of the multiple time scale system forced by a linear CAM process . . . 70
    4.3.1 The distribution for the integral of an OULP . . . 71
    4.3.2 Convergence of CAM noise to an α-stable noise process . . . 72
    4.3.3 The stochastic averaging approximation . . . 76
  4.4 Sample systems . . . 78
    4.4.1 Linear system . . . 78
    4.4.2 Nonlinear system 1: nonlinear potential . . . 79
    4.4.3 Nonlinear system 2: bilinear interaction . . . 80
    4.4.4 Discussion of numerical results . . . 81
  4.5 Conclusions . . . 82
5 Conclusions . . . 84
  5.1 Summary . . . 84
  5.2 Future work . . . 86
Bibliography . . . 88
A Appendix to Chapter 3 . . . 94
  A.1 Evaluation of the characteristic function ψ . . . 94
  A.2 Numerical methods . . . 95
    A.2.1 Simulating the SDEs . . . 95
    A.2.2 Estimating the PDFs . . . 99
    A.2.3 Estimating the autocodifference function . . . 100
B Appendix to Chapter 4 . . . 102
  B.1 Intermediate calculations . . . 102
    B.1.1 Asymptotic characteristic function for Y_j . . . 102
  B.2 Simulating the CAM noise process . . . 103
    B.2.1 Consistency of simulations . . . 103
    B.2.2 Summary of implications for numerical simulations . . . 104
  B.3 Estimating θ . . . 105
    B.3.1 Estimating the scale parameter . . . 105

List of Tables

Table 2.1 The base-case parameters used to generate realizations from the model (2.22), (2.23). . . . 30

Table 2.2 Top: Observed statistics of the ERA40 data at indicated locations. Bottom: Computed statistics from the wind model (2.22, 2.23) with estimated parameters. Estimation of the parameters is carried out using the constraints described in Section 2.3.3, weighting w = 1000 and a subsampling of degree 4. . . . 36

Table 2.3 Top table: Estimates of the parameters in the wind model (2.22, 2.23) with weighting w = 1000 and subsampling of degree 4. Bottom table: parameter estimates following the rescaling of parameters to improve estimates of the autocorrelation structure as described in Section 2.4.2. . . . 37
Table A.1 Simulation parameters for the examples considered in Section 3.3. The indicated variable determines the smallest characteristic time scale of the system, which is also indicated. All systems were sampled on the time step Δt = 10^-2. Single asterisks indicate the system was simulated using a higher-order method as described by (A.15). Double/triple asterisks indicate numerical Marcus integration without/with numerical evaluation of the Marcus map, θ (A.28). . . . 96

List of Figures

Figure 2.1 Estimates of the eigenvalues and the coefficients of the generator of the OUP with µ = −1, σ = 1. The time series used to estimate the eigenvalues contains 30000 points with a resolution time step δt = 0.1. Six eigenvalues are considered to obtain parameter estimates and no weighting is applied when minimizing the objective function. Sampling variability was assessed by estimating these parameters from 50 independent realizations of the random process. The bottom and top of the boxes span the lower to upper quartiles with the median drawn in between. The whiskers extend to a maximum of 1.5 times the interquartile range and the black dots indicate values lying outside of this range. . . . 22

Figure 2.2 Estimates for the coefficients of the overspecified model given by (2.15) for data generated from the OUP with µ = −1, σ = 1, illustrated as in Figure 2.1. The time series used to estimate the eigenvalues are 30000 points long with resolution time step δt = 0.1. Note that we have estimated 6 eigenvalues, and the parameter estimates are slightly biased. . . . 23

Figure 2.3 Estimates of µ and σ² when weighting is applied to the eigenvalues in the objective function. The value of w is indicated on the horizontal axis. Due to degeneracies in the generator for the OUP, the estimates of µ, σ² are biased to values closer to 0 when w is large. Since the PDF of the OUP depends only on the quantity µ/σ², this quantity is well estimated when the lowest eigenvalue is heavily weighted, although µ and σ² are often not well estimated themselves when higher order eigenvalues are suppressed. The box plots are drawn as described in Figure 2.1. . . . 24

Figure 2.4 Estimates for the coefficients µ and σ² of the generator when the data is generated by the system (2.17). In both figures, τx = 1. In the top, middle, and bottom rows the ACT scales for y are respectively τy = 0.02, 0.1, 0.25. Ensembles were computed from 50 time series, each of length 50000 points with resolution timestep δt = 0.1. The horizontal axis indicates the degree of subsampling used in the estimations (1 = no subsampling), and the red dashed line indicates the values of the coefficients for the equivalent OUP as determined by (2.20). Weighting is applied to offset biased estimates of the higher order eigenvalues (w = 2). . . . 26

Figure 2.5 Plot of the autocorrelation function, Cxx(T)/Cxx(0), as a function of lag time T for x_t as governed by the system (2.17). The values of τy are as indicated in the legend and τx = σ = 1. . . . 27

Figure 2.6 A visual schematic of the sea surface wind model (2.22), (2.23). Note that the mean wind forcing term is directed parallel to the direction of the vector winds, while the mixing and drag forces are antiparallel. The stochastic forcing terms perturb the mean forcing term and no visual representation is given for them. . . . 28

Figure 2.7 The influence of the weight parameter w on estimates of various parameters from the wind model using simulated data from (2.22, 2.23). The values of w are shown on the horizontal axis for each parameter boxplot. The true value of each parameter is indicated by the dashed red line. No subsampling of the data was carried out. . . . 31

Figure 2.8 Top-left: Boxplots of estimates for the first three non-zero eigenvalues for the wind model with varying ACTs in the forcing. The black boxplots indicate the eigenvalue estimates for simulated time series with white noise forcing, while the blue and red boxplots indicate the estimates when red noise having short (τi = 0.1 h) and long (τi = 6 h) ACTs is used. The pink boxplots indicate the parameter estimates from the time series with τi = 6 h with subsampling of degree 4 applied. The data are resolved at the standard time resolution δt = 6 h. The other panels display the estimates for the parameters of the SDEs. The true values for the white noise case are indicated with a black dashed line. In each case, an ensemble of parameter estimates from 50 independent realizations was computed. . . . 32

Figure 2.9 Relative error of simulated statistics relative to original statistics (mean, standard deviation, and skewness) for simulations from models fit to time series produced by the wind model with white noise forcing. For each of these, the relative error of a quantity z is defined as (z_original − z_reconstructed)/z_original. Lower right panel: The computed ACF of u from the original time series (black, circles) and from the resimulated time series (red, crosses). The estimates of the parameters were obtained without subsampling and with weight w = 1000. δt = 6 hours and 30000 data points are used in each time series. . . . 33

Figure 2.10 As in Figure 2.9, for the wind model (2.22, 2.23) fit to time series generated with red noise forcing with ACT scales similar to the resolution of the time series (τi = δt = 6 h). Red symbols denote results obtained using parameter estimates without data subsampling, while the blue curve denotes the results following subsampling of degree 4. These calculations were carried out with an ensemble of 50 independent realisations. . . . 34

Figure 2.11 The autocovariance functions for the zonal and meridional wind directions (blue and red, resp.). Crosses: observed estimates. Dashed lines: simulations based on parameter estimates without a rescaling of h. Solid lines: simulations using parameter estimates that include a rescaling of h. The rescaling is defined to match the absolute geometric mean autocovariance at a lag of 18 hours. . . . 36

Figure 2.12 Left: Statistics of the original data. Middle: Statistics of resimulated data with parameters estimated using the C-VE method on the original data. Right: Statistics of the resimulated data with parameter estimates that include a rescaling of the parameters so that the ACT scale is more accurately captured. As expected, the middle and right columns are identical by construction. The parameters were estimated using the modified weighting scheme (2.16) with w = 1000 and subsampling factor of 4. . . . 38

Figure 2.13 Left: Estimates of the parameter fields. Right: Parameter field estimates after the rescaling of the parameters so that the overall ACT scale is more accurately captured. (The parameters were estimated using the weighting scheme (2.16) with w = 1000 and subsampling factor of 4.) In the initial estimates for h, we have enforced bounds on h ∈ [10^-3, 10^2] km. . . . 39

Figure 2.14 Top: Scatterplot of the skewness of the wind speed along the mean wind direction against the logarithm of the estimated value of h. Recall that in the original parameter estimates we apply constraints that include an upper bound on h. Bottom-left: Skewness of the wind speeds along the mean wind direction. The white (black) contours indicate the level curves where the skewness is equal to 0 (equal to −0.5). Bottom-right: The field of rescaled h estimates with the level curves superimposed. . . . 40

Figure 2.15 Skewness fields for the measured and simulated data when K is set to zero. . . . 41

Figure 2.16 Estimated log10(h) fields when K is set to zero. . . . 41

Figure 3.1 A comparison of the time series and simulated marginal distributions of the fast-slow system (3.56, 3.57) and the stochastically averaged system (3.59). Top-left: A sample of the time series for the slow variable of both systems: x_t (solid blue) and x_t + ξ_t (dashed green). Top-right: Normalized histogram-based estimates for the density for the slow variable. Blue line: full system. Green crosses: (L) approximation. Black squares: Numerically evaluated probability density function for ξ from the (L) approximation (3.59) using the MATLAB package STBL as described in Section 3.3. Parameters for this simulation are (a = 0.2, b = 0.7, c = 1, ε = 0.01, α = 2.0, x0 = 0). The bottom panels are as for the top, but with α-stable noise where α = 1.9, β = 0. . . . 57

Figure 3.2 Normalized histogram-based estimates of the PDF with symmetric forcing (β = 0) of x_t from (3.56, 3.57) (blue line) and the (L) approximation ξ_t of (3.59) (green crosses). Left/Middle: As in Fig. 3.1 with α = 1.9/1.4. Right: As in Fig. 3.1, but with a = 0.7, b = 2, α = 1.7 and ε = 0.1. The numerically evaluated density using the MATLAB package STBL for ξ in (3.59) is shown by black squares. . . . 58

Figure 3.3 Normalized histogram-based estimates of the PDF with β ≠ 0 of x_t from (3.56, 3.57) (blue line). Also plotted is the histogram for ξ_t of (3.59) (green crosses). For all figures: a = 0.2, b = 0.7, c = 1, α = 1.7. Left/Middle/Right: ε = 0.01/0.01/0.1, β = −0.5/1/0.75. The numerically evaluated density using the MATLAB package STBL for ξ in (3.59) is shown in black squares. . . . 58

Figure 3.4 Logarithmic plot of the numerically estimated ACD for x_t of (3.56, 3.57) (blue line) and ξ_t of (3.59) (green crosses) where β = 0. The analytically derived autocodifference Aξ(τ) in (3.61) is also displayed (black squares). The parameters for the left, middle and right panels correspond to those from Fig. 3.2. . . . 59

Figure 3.5 Normalized histogram-based estimates of the PDF of x_t for the full nonlinear system (3.62, 3.63) (blue line), the (L) approximation x_t + ξ_t (3.65) (green crosses) and the (N+) approximation X_t (3.64) (red circles). For all figures: a = 1, b = 0.1, c = 1, ε = 0.01 and x0 = c. Left/Middle/Right: α = 1.7/1.9/2. . . . 59

Figure 3.6 Logarithmic plot of the estimated ACD for x_t of the full nonlinear system (3.62, 3.63) (blue line). Also plotted are the real parts of the autocodifferences of the (L) approximation, x_t + ξ_t, (3.65) (green crosses) and the (N+) approximation, X_t, (3.64) (red circles). The parameters for the 3 panels correspond to those from Fig. 3.5. . . . 60

Figure 3.7 Normalized histogram-based estimates of the PDF of x_t for the full system (3.66, 3.67) (blue line), the (L) approximation x_t + ξ_t (3.69) (green crosses), and the (N+) approximation, X_t, (3.68) (red circles). For all plots, b = 2 and x0 = 0. Left: ε = 0.1, α = 1.6. Middle: ε = 0.01, α = 1.9. Right: ε = 0.01, α = 2. (Insets are on a linear scale.) . . . 61

Figure 3.8 Normalized histogram-based estimates of the PDF of the x_t-variable for (3.70, 3.71) (blue line), x_t + ξ_t for the system (3.73) (green crosses), and X_t for (3.72) (red circles). For all figures: σ = 0.3, a = 1, ε = 0.01, x0 = 0. Left/Middle/Right: α = 1.7/1.9/2.0. . . . 62

Figure 3.9 Left & Middle: Normalized histogram-based estimates of the PDF with symmetric α-stable forcing for α < 1, which is not explicitly covered in our analysis. (Left: As in Fig. 3.1 with (a, b, c, ε, α) = (0.2, 0.7, 1, 10^-2, 0.7). Middle: As in Fig. 3.7 with (ε, α) = (10^-2, 0.8).) Right: Estimates of the PDF for x_t where dx_t = (c − x_t + ε^-γ x_t y_t [1 + ζy_t]) dt and y_t satisfies (3.63), (a, b, c, ε, α) = (1, 0.1, 1, 10^-2, 1.7). The values of ζ are indicated in the legend. . . . 64

Figure 4.1 Sample time series of y_t. Left: (L, E, g, b) = (−1, 1.0541, 0, 1) corresponding to (α*, β*) = (1.8, 0). Right: (L, E, g, b) = (−1, 1.118, 0.6, 0.4) corresponding to (α*, β*) = (1.6, 0.89). . . . 67

Figure 4.2 Logarithmic plots of the numerically estimated ACD function of y_t. Left: (L, E, g, b) = (−1, 1.0541, 0, 1), ν = 0.4. Right: (L, E, g, b) = (−1, 1.118, 0.6, 0.4), ν = 0.3. . . . 69

Figure 4.3 Plots of the codifference CD and cosum CS of Y_j and Y_j+1 as a function of the length of time integral Δ/ε as given in (4.21). For both plots, α* = 1.7, and L is as indicated in the legend. 10^5 pairs of (Y_j, Y_j+1) are sampled for each value of L considered in each figure. Left: g = 0, b = √(−L). Right: g = 0.4√(−L), b = 0.6√(−L). . . . 74

Figure 4.4 Plots of the estimated value of the scale parameter σ̃ as a function of ε and Δ for different values of α* as indicated in the legend. NY = 5000 realizations of Y_j were generated to compute the numerical estimates of σ̃. The fixed parameters are (L, g, b) = (−1, 0, 1). Left: ε = 10^-3 and the dashed lines are proportional to Δ^(1/α*). Right: Δ = 1 and the dashed lines are proportional to ε^(γ*). . . . 76

Figure 4.5 Top-left: The numerically estimated stationary PDF of x_t (4.41) and the predicted PDF of X_t (4.42) on a logarithmic scale (linear scale, inset) with parameters (L, α*, g, b) = (−1, 1.5, 0, 1), (µ, ζ) = (1, 1). The value of ε is indicated in the legend. Top-right: The corresponding ACDs. Bottom: Same as top, but with α* = 1.8. . . . 79

Figure 4.6 As in Figure 4.5, but with parameters (L, α*, g, b) = (−1, 1.7, 0.6, 0.4), (µ, ζ) = (1, 1). . . . 80

Figure 4.7 The numerically estimated stationary PDFs, u, for x_t and X_t on a logarithmic scale and linear scale (inset) for nonlinear system 1 with dynamics (4.43) and (4.44). (L, g, b) = (−1, 0, 1), (µ, ζ) = (1, 0.2) and ε as indicated in the legend. Left/Right: α* = 1.5/1.8. . . . 80

Figure 4.8 Same as Figure 4.7, but with µ = −1 in both panels. . . . 81

Figure 4.9 The numerically estimated stationary PDFs, u, for x_t and X_t on both a linear and logarithmic scale for nonlinear system 2 with dynamics (4.45) and (4.46). (L, g, b) = (−1, 0, 1), (c, ζ) = (1, 0.2). Left/Right: α* = 1.5/1.8. . . . 81

Figure B.1 The relative difference between the histogram-based empirical PDF of y_t simulated according to (B.8) and the true CAM noise PDF for step sizes δt as indicated, ε = 10^-3. The time difference between successive sampled points is equal to 10^-1 in all cases. The number of sampled points for each time series is N = 10^5. Left/Right: (L, E, g, b) = (−1, 1.1547, 0, 1)/(−1, 1.0541, 0.6, 0.4). . . . 104

Figure B.2 Histogram-based empirical PDFs of ∫_0^0.1 y_(t/ε) dt simulated according to (B.8) using simulation step size δt as indicated. The sample size is N = 10^4. Left/Right: (L, E, g, b) = (−1, 1.1547, 0, 1)/(−1, 1.0541, 0.6, 0.4). . . . 105

Acknowledgements

I would like to express my gratitude to my research supervisors, Dr. Rachel A. Kuske and Dr. Adam H. Monahan, for their time, expertise, and patience.

Additionally, I would like to thank Dr. D. Crommelin (CWI Amsterdam) for his advice and editorial contributions and Dr. Eric Vanden-Eijnden (NYU) for helpful conversations related to Chapter 2. I would like to thank Dr. Ilya Pavlyukevich (Uni. Jena) for bringing the Marcus integral to our attention and for helpful advice. I also thank Dr.
John Walsh (UBC) for helpful discussions related to Chapter 3. I also thank the anonymous reviewers of the publications based upon Chapters 2 and 3, respectively, for their constructive comments on my research.

I wish to acknowledge the generous financial support of the Natural Sciences and Engineering Research Council of Canada (http://www.nserc-crsng.gc.ca).

Lastly, I would like to acknowledge my peers, professors and the support staff in the Math Department and the Institute of Applied Mathematics at the University of British Columbia for their wisdom and support.

Chapter 1

Introduction

1.1 Background

The weather and climate systems are among the most complex systems studied today. Due to this overwhelming complexity, mathematical modelling techniques are increasingly important to researchers seeking to make contributions to this field. Modern climate research requires a broad mathematical skill set that includes techniques ranging from the analysis and understanding of results based on relatively simple phenomenological models for basic climate processes to large-scale, high resolution, numerical models of regional and global climate that incorporate the effects of many different interacting processes [77]. The latter usually requires a good understanding of the former, as the simple models are often used to interpret the output of more comprehensive models of climate. Many problems in weather and climate science are characterized by chaotic behaviour, multiple time scales, and unresolved or poorly understood processes. For these reasons, stochastic modelling techniques are frequently required when researching any problem with one or more of these qualities, as they provide a means to account for dynamical processes having effects that are impractical to resolve, but are nonetheless non-negligible when considering the phenomenon under study [66].
The inclusion of stochastic effects yields results that are often unexpected and, in some cases, counter-intuitive, making these considerations important to obtain a complete understanding of a particular problem.

The approximation of fast chaotic processes by stochastic ones is formally justified. Climate processes are studied from the perspective of continuous-time dynamical systems, as researchers are often looking to extrapolate from current conditions to make a statement about the future. It has been shown via entropy-based arguments and the calculation of dynamical entropies given in [29, 30] that dynamical systems displaying fast chaotic behaviour are qualitatively indistinguishable from systems that are stochastic when observed on sufficiently coarse scales [41]. Numerical simulations comparing the chaotic and corresponding stochastic systems provide further justification for this conclusion. The dynamical behaviour of the atmosphere displays sensitivity to initial conditions, which makes it chaotic, and so forecasts of the state of the atmosphere are essentially impossible beyond a few weeks' time. These characteristics of the atmosphere and weather systems make long-term predictions for the exact state of the climate (i.e. atmospheric carbon levels, the cryosphere, ocean state, etc. [77]) effectively impossible, since the state of the climate depends on short-term atmospheric behaviour. To deal with these challenges, several
Hence, it often makes more sense to talkabout climate forecasts in a probabilistic or distributional context, rather than a deterministic context.Due to the chaotic nature of atmospheric processes and the probabilistic paradigm, stochastic modellingapproaches to climate problems are desirable, However, to properly undertake such an approach howeverrequires consistent methods of parametrizing those fast and/or small scale processes to be modelled asstochastic perturbations to the resolved variables.Including stochasticity in continuous-time dynamical systems models requires understanding of thetheory of stochastic differential equations (SDEs), which are differential equations where the time-evolutionof the state is governed by both deterministic and random components. The popularity of dynamical sys-tems models in many fields and the growing recognition of the significance of stochasticity are largelyresponsible for the importance of SDEs. Virtually every research discipline that requires mathematicalmodelling also requires parameter estimation methods to determine unknown variables within a model.Many scientific fields are data driven (rather than process model driven) and modelling becomes possibleonly when the parameters of “ad hoc models” (i.e. models without rigourous justification) are fit to data.Even in fields where process modelling is common place, there are quantities that cannot be computedfrom theory. Often when parameters for continuous-time models are estimated, the model itself is assumedto be deterministic and any randomness in the data is assumed to result from measurement error. Whenstochasticity is assumed to be a dynamical component of the model, then a SDE formalism is typicallyadopted and the problem of parameter estimation is more complicated. 
When the assumed model for the variables of interest is such that the associated distributions are Gaussian, parameter estimation is generally more straightforward, as Gaussian SDE models can be entirely characterized by a mean and an autocovariance function, each of which can be easily estimated, via maximum likelihood for example. An example where such a characterization is used is [71], where a linear Gaussian model for the evolution of principal components is used to model the El Niño Southern Oscillation. Such simple characterizations are generally not possible for non-Gaussian time-dependent processes, nor is parameter estimation for those processes so straightforward. In such parametrization projects, the stochastic model is assumed to be perturbed by Gaussian white noise, which is generally a safe assumption provided the effects being approximated by Gaussian noise are sufficiently fast and do not display large deviations. The terminology "white noise" is based on an analogy with the visible portion of the electromagnetic spectrum, as white light has equal power at all frequencies. Similarly, "red noise" stochastic forcing has more power at lower frequencies, as does the red portion of the visible spectrum. The fast processes perturbing the slow variables can sometimes be given a more complicated and explicit dynamical form that includes stochastic forcing. In these cases, it is possible to study the behaviour of the slow variables in the weak or distributional sense, independently of the fast variables, by approximating them with a different stochastic dynamical system of reduced dimension evolving on the slow time scale only. The techniques used to derive these approximations are referred to as stochastic averaging or stochastic homogenization methods and have been the subject of study for many years [33, 43, 44, 69].
They have also been of particular interest to researchers working in climate modelling for several decades, with some of the more recent work given in [2, 52–54, 63]. More comprehensive discussions of stochastic averaging methods can be found in [25, 70], and analogous studies of the case where the fast process is chaotic can be found in [27, 41, 57].

In climate science, it is standard to model the evolution of weather and climate variables in terms of a projection of atmospheric dynamics onto a set of discrete modes (e.g. a principal component basis or other Galerkin-like basis [47, 71]) and to assume Gaussian white noise as a dynamical component that drives the time-evolution of the modes. However, several studies have shown that some atmospheric processes have heavy-tailed statistics, meaning that the probability distributions associated with the variables of interest are not exponentially bounded but decay according to a power law at the extremes of the distribution, which makes the probability of observing "extreme events" non-negligible. An example of this is particle dispersal in turbulent flows. If the mean squared displacement of a particle in such a flow grows linearly with time, the particle's trajectory is said to be diffusive and its dynamics can be modelled within an SDE framework driven by Gaussian white noise processes [26]. If the mean squared displacement of a particle is not a linear function of time, but grows as $t^p$ with $1 < p < 2$, then the particle is said to be undergoing anomalous diffusion. Anomalous diffusion processes have been observed in an experimental study involving vortex chains generated by shear flow [83]. A later analysis of the experimental results [17] showed that, assuming an atmospheric fluid dynamical model that includes shear flow and a vortex chain, various features of the experiment could be reproduced, confirming that anomalous diffusion processes can occur in atmospheric flows.
Later studies of atmospheric data confirmed the existence of superdiffusive behaviour in the troposphere over time scales of several days by measuring the temporal dispersion of tracers in reconstructed wind fields in extratropical regions [37]. Similar behaviour has been observed for stratospheric transport [82]. Superdiffusive particles are said to undergo Lévy flights: periods of behaviour that appear diffusive in a path-wise sense, interrupted sporadically by large jumps. In this case the dynamics of the particle are modelled using SDEs in which the stochastic component is not Gaussian white noise but α-stable white noise (also called α-stable Lévy white noise or stable white noise), giving the displacement infinite variance. There have also been examples of infinite-variance statistics appearing in studies other than tracer dispersion, such as the appearance of α-stable noise forcing in calcium concentration measurements from Greenland ice cores [19, 34]. This further motivates stochastic averaging studies for cases where the fast processes have infinite variance, here for the purpose of understanding the behaviour of the fast process itself, rather than only approximating its effect on a slower one.

The fact that stochastic forcing from the α-stable class of processes is used to model the dynamics of superdiffusive particles is not surprising. In the same way that a sum of independent, identically distributed random variables with finite variance converges in distribution to a Gaussian random variable by the Central Limit Theorem, a sum of independent, identically distributed random variables with infinite variance converges in distribution to an α-stable random variable by the Generalized Central Limit Theorem [24].
In fact, the Gaussian distribution is one particular case of an α-stable distribution (as we discuss in Chapter 3), and the fact that α-stable distributions are "ubiquitous" in various scientific fields can be further justified via entropic arguments [89]. This calls us to revisit the topic of stochastic averaging. In the case of superdiffusive dynamics, the well-established stochastic averaging methods discussed in [2, 63] are no longer applicable, as they require that the fast process have finite variance. Superdiffusive processes do not have finite variance, so the approximation of a superdiffusive fast process by Gaussian white noise forcing is not appropriate. In these situations, α-stable white noise forcing is the appropriate replacement, but comparatively little is known about stochastic averaging approximations for α-stable-noise-driven systems beyond the linear case [85].

1.1.1 Review of key concepts for stochastic differential equations

Here, we give some background information on several concepts that appear frequently throughout this dissertation. This section is intended as a compact introduction and by no means constitutes a thorough or exhaustive treatment of these concepts; please consult the given references for more complete discussions. All of the concepts introduced here are given for the one-dimensional case, but higher-dimensional analogues exist and are standard in most general references on the subject.

Stochastic Differential Equations (SDEs)

A stochastic differential equation (SDE) is a differential equation describing the time-evolution of a state variable that is influenced by a stochastic (i.e. random) process. All of the SDEs considered in this dissertation are autonomous, meaning that the evolution of the state variables has no explicit dependence on time, only on the state variable.
Unless specified otherwise, the SDEs considered in this thesis are Itô SDEs, denoting the choice of stochastic calculus used in their interpretation; we discuss other types of stochastic calculi later in this section. An autonomous Itô SDE is usually written in differential form as follows:

$$dx_t = f(x_t)\,dt + \sigma(x_t)\,dW_t, \qquad t \geq 0, \tag{1.1}$$

where $dx_t$ denotes the infinitesimal change of the state variable $x_t$ over the infinitesimal time change $dt$, $f(x_t)\,dt$ is the deterministic drift term, and $\sigma(x_t)\,dW_t$ is the diffusion term, where $\sigma(x_t)$ is a function and

$$dW_t \sim_D \mathcal{N}(0, dt), \tag{1.2}$$

where $\mathcal{N}(\mu, \Sigma)$ denotes a normal random variable with mean $\mu$ and variance $\Sigma$. The notation $A \sim_D B$ should be interpreted as indicating distributional equality between $A$ and $B$ (i.e. "$A$ is distributed as $B$"). If the coefficient of the noise term is constant, the diffusion term is referred to as additive noise; otherwise, it is referred to as multiplicative noise. The choice to express SDEs in differential form is due to the fact that stochastic processes are generally not differentiable in the classical sense. The initial condition for (1.1), $x_0$, can be specified as a specific point in state space or, less commonly, as a probability distribution over the state space. The term $dW_t$ is commonly referred to as a Gaussian white noise process, which plays a central role throughout this dissertation. The process $W_t$, which satisfies $W_0 = 0$ and

$$W_t = \lim_{\delta\to 0}\sum_{j=1}^{N}\Delta W_j, \qquad \Delta W_j \sim_D \mathcal{N}(0,\delta), \qquad t = N\delta \tag{1.3}$$

$$\phantom{W_t} = \int_0^t dW_s \sim_D \mathcal{N}(0,t), \tag{1.4}$$

is called a Wiener process (mathematical literature) or a standard Brownian motion (physics literature). We also consider a less common stochastic forcing process, $dL_t^{(\alpha,\beta)}$, which is called an α-stable noise process. This stochastic forcing is distinct from $dW_t$ in that its increments do not, in general, have a finite second moment, and they exhibit large jumps.
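The differential form (1.1) translates directly into a numerical scheme: stepping forward with increments $\Delta W_j \sim \mathcal{N}(0,\delta)$ gives the standard Euler–Maruyama method. The sketch below is an illustration only; the particular drift and diffusion functions are arbitrary choices, not taken from the text.

```python
import numpy as np

# Minimal Euler-Maruyama sketch for the Ito SDE dx_t = f(x_t) dt + sigma(x_t) dW_t.
# The drift f and diffusion sigma passed in below are illustrative choices.
def euler_maruyama(f, sigma, x0, T, N, rng):
    dt = T / N
    x = np.empty(N + 1)
    x[0] = x0
    for j in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))  # increment dW_t ~ N(0, dt)
        x[j + 1] = x[j] + f(x[j]) * dt + sigma(x[j]) * dW
    return x

rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, 1.0, 5.0, 5000, rng)
```

With multiplicative noise (non-constant $\sigma$), this left-endpoint scheme converges to the Itô interpretation of the SDE, a point taken up in the discussion of stochastic calculi below.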
It is prominently featured in Chapters 3 and 4 of this dissertation, and more specific details related to it can be found in the next section.

α-stable random variables and α-stable noise processes

An α-stable random variable is distributed according to an α-stable distribution (sometimes called an α-stable Lévy distribution or a stable distribution), which we denote by $S_\alpha(\beta, \sigma)$. It has three associated parameters: the stability index $\alpha \in (0, 2]$, the skewness parameter $\beta \in [-1, 1]$, and the scale parameter $\sigma \in (0, \infty)$. Sometimes an additional parameter, called the location parameter, which defines the centre of the distribution, is specified. If the location parameter is not specified, as is the case in this thesis, it is assumed to be zero.

The nomenclature "stable" comes from the important property that the sum of two α-stable random variables with the same stability index is also an α-stable random variable:

$$X_1 \sim_D S_\alpha(\beta_1, \sigma_1),\ X_2 \sim_D S_\alpha(\beta_2, \sigma_2) \ \Rightarrow\ X_1 + X_2 \sim_D S_\alpha(\beta, \sigma), \qquad
\begin{aligned}
\sigma^\alpha &= \sigma_1^\alpha + \sigma_2^\alpha \\
\sigma^\alpha \beta &= \sigma_1^\alpha \beta_1 + \sigma_2^\alpha \beta_2.
\end{aligned} \tag{1.5}$$

There is no closed-form expression for the PDF of an α-stable random variable, $p(x)$, except in a few special cases:

• Cauchy random variable ($\alpha = 1$, $\beta = 0$):
$$p(x) = \frac{\sigma}{\pi(\sigma^2 + x^2)}$$

• Lévy random variable ($\alpha = 1/2$, $\beta = 1$):
$$p(x) = \sqrt{\frac{\sigma}{2\pi}}\,\frac{\exp(-\sigma/x)}{x^{3/2}}$$

• Gaussian random variable ($\alpha = 2$, $\beta = 0$):
$$p(x) = \frac{1}{\sqrt{4\pi\sigma^2}}\exp\left(-\frac{x^2}{4\sigma^2}\right)$$

Note that in the case where the α-stable random variable is Gaussian, the variance is $2\sigma^2$ (rather than $\sigma^2$). Because the PDF cannot be given for general α-stable distributions, they are typically characterized by their characteristic functions, $\psi(k)$, which do have a closed form:

$$\psi(k) = \mathcal{F}[p(x)](k) = \exp\left(-\sigma^\alpha |k|^\alpha\,\Xi(k;\alpha,\beta)\right), \qquad \Xi(k;\alpha,\beta) = 1 - i\beta\,\mathrm{sgn}(k)\,\varphi(k), \tag{1.6}$$

where $\mathcal{F}$ denotes the Fourier transform $\mathcal{F}[p(x)](k) = \int_{\mathbb{R}} \exp(ikx)\,p(x)\,dx$ and

$$\varphi(k) = \begin{cases} \tan(\pi\alpha/2) & \text{if } \alpha \neq 1 \\ -\frac{2}{\pi}\log(|k|) & \text{if } \alpha = 1. \end{cases} \tag{1.7}$$

In this thesis we are only concerned with the analysis of the case $\alpha \in (1, 2)$, so $\varphi(k) = \tan(\pi\alpha/2)$ from here on.
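Because the PDF has no closed form in general, sampling is usually done directly from the parameters of the characteristic function; the Chambers–Mallows–Stuck algorithm is a standard way to do this for $\alpha \neq 1$. The sketch below is an illustration, not an algorithm from the text, and parameterization conventions for stable distributions vary; this one matches the scale convention above, where $\alpha = 2$ gives a Gaussian with variance $2\sigma^2$.

```python
import numpy as np

# Chambers-Mallows-Stuck sampler for S_alpha(beta, 1), alpha != 1 (the thesis
# only needs alpha in (1, 2]). Multiply the output by sigma for S_alpha(beta, sigma).
def stable_sample(alpha, beta, size, rng):
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                  # unit exponential
    t = beta * np.tan(np.pi * alpha / 2)
    B = np.arctan(t) / alpha
    S = (1 + t**2) ** (1 / (2 * alpha))
    return (S * np.sin(alpha * (V + B)) / np.cos(V) ** (1 / alpha)
            * (np.cos(V - alpha * (V + B)) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(1)
g = stable_sample(2.0, 0.0, 200_000, rng)   # alpha = 2 recovers a Gaussian
# with sigma = 1, the alpha = 2 case has variance 2*sigma^2 = 2
```

For $\alpha < 2$ the sample variance of the output does not converge as the sample size grows, consistent with the infinite second moment discussed next.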
Notice that when $\alpha = 2$ the value of $\beta$ is redundant, as $\Xi(k; 2, \beta) = 1$ for any $\beta$, and so $\psi$ in (1.6) is the characteristic function of the Gaussian distribution: $\psi(k) = \exp(-\sigma^2 k^2)$.

An important property of α-stable random variables is that their distributions have infinite second moments, and hence infinite variance, except in the Gaussian case $\alpha = 2$. This can be seen by attempting to evaluate the second moment using the characteristic function $\psi(k)$ (1.6): if $X \sim S_\alpha(\beta, \sigma)$ with $1 < \alpha < 2$, then

$$\mathbb{E}\left[X^2\right] = \lim_{k\to 0}\left[(-i)^2\frac{\partial^2\psi}{\partial k^2}\right] = \alpha(\alpha-1)\sigma^\alpha\left[\lim_{k\to 0}|k|^{\alpha-2}\,\Xi(k;\alpha,\beta)\,\psi(k)\right], \tag{1.8}$$

which does not exist; hence, the second moment does not exist. Another important fact is that appropriately scaled and shifted sums of identically distributed random variables with infinite variance converge to an α-stable random variable, as per the Generalized Central Limit Theorem [24].

If a continuous-time stochastic process has a stationary α-stable distribution, then we refer to this process as an α-stable noise process and denote it by $dL_t^{(\alpha,\beta)}$. The values of $dL_t^{(\alpha,\beta)}$ are distributed according to

$$dL_t^{(\alpha,\beta)} \sim_D S_\alpha(\beta, dt^{1/\alpha}). \tag{1.9}$$

When $\alpha = 2$, the increments of the α-stable white noise process satisfy $dL_t^{(2,\beta)} \sim_D \sqrt{2}\,dW_t$. The α-stable white noise process plays an important role in Chapters 3 and 4 and is discussed in more detail therein.

Forward and Backward Kolmogorov Equations (FKEs and BKEs)

When discussing the solutions to an SDE, one must distinguish between strong solutions, which refer to realized trajectories of $x_t$ given particular realizations of $W_t$, and the weak solution, which is the probability density function (PDF) of $x_t$. The weak solutions of SDEs are the primary focus of this thesis, and they are commonly studied via the Forward Kolmogorov equation (FKE), also known as the Fokker-Planck equation [26, 75]. The FKE is a time-dependent partial differential equation whose solution, $u = u(x, t)$, gives the PDF for the SDE of interest, and many properties of the SDE are encoded in the corresponding FKE.
The FKE for the SDE (1.1) is given by

$$\frac{\partial u}{\partial t} = \mathcal{A}u, \qquad \mathcal{A}u = -\frac{\partial}{\partial x}\left(f(x)u\right) + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left(\sigma^2(x)u\right), \tag{1.10}$$

where $u(x, 0)$ is the PDF of $x_0$. If the initial condition $x_0$ is given by a specific point, $x^*$, in the state space of $x$ (i.e. $\lim_{t\to 0^+} x_t = x^*$ with probability 1), then $u(x, 0) = \delta(x - x^*)$, where $\delta$ is the Dirac delta function. We refer to the operator $\mathcal{A}$ as the Forward Kolmogorov operator. The FKE can be derived from the first principles given in [26] by studying the Chapman-Kolmogorov equations, which give the evolution of the probability density for discrete-space and discrete-time stochastic processes that are not modelled by an SDE, but for which some probability transition law exists. Similar derivations allow the concept of the Forward Kolmogorov equation to be extended to other stochastic forcing processes via the differential Chapman-Kolmogorov equation.

Let $[u, v] = \int_{\mathbb{R}} uv\,dx$ be an inner product and let the operator $\mathcal{G}$ satisfy

$$[\mathcal{A}u, v] = [u, \mathcal{G}v] \tag{1.11}$$

(i.e. $\mathcal{G}$ is the adjoint operator of $\mathcal{A}$). The equation $\frac{\partial v}{\partial t} = \mathcal{G}v$ is called the Backward Kolmogorov equation, and the operator $\mathcal{G}$ is known as the Backward Kolmogorov operator or the infinitesimal generator (the nomenclature we use in this thesis). For the process (1.1), the infinitesimal generator is given by

$$\mathcal{G}v = f(x)\frac{\partial v}{\partial x} + \frac{\sigma^2(x)}{2}\frac{\partial^2 v}{\partial x^2}. \tag{1.12}$$

While the FKE describes the time-evolution of the probability density function $u$, the BKE is used generically to study the expected value of a function of $x$ and is useful for answering the question of whether a subset of the state space is reached at a particular time. The eigenvalues of the infinitesimal generator are the same as those of the Forward Kolmogorov operator, and their eigenfunctions form a biorthogonal set with respect to the inner product $[\cdot,\cdot]$.

When the stochastic forcing is α-stable, a different operator, called the Riesz-Feller derivative, is required to model the effect of the stochastic forcing in the FKE.
This operator, $D^{(\alpha,\beta)}$, is given in Section 3.2.2 and is defined in terms of a Fourier transform. By taking the Fourier transform of the FKE, we obtain a partial differential equation for the characteristic function of $x_t$, $\psi_x(k, t) = \mathcal{F}[u(x, t)](k)$, in which $D^{(\alpha,\beta)}$ takes a much simpler form, making it much easier to analyze the distributional properties of $x_t$ by studying its characteristic function.

A more detailed discussion of the infinitesimal generator can be found in [65], and a thorough treatise on the Forward Kolmogorov equation (the Fokker-Planck equation) is given in [75].

Stochastic calculus: Itô vs. Stratonovich

When evaluating stochastic differential equations like (1.1) or stochastic integrals like (1.4), it is important to specify the stochastic calculus that applies. It is commonly known that Riemann sum approximations to integrals converge to a unique value in the limit of infinitesimally small discretization for deterministic integrators and integrands. In the case where the integrator is a stochastic process, this is no longer true.

The integral interpretation of the SDE (1.1) is given by

$$x_t = x_0 + \int_0^t f(x_s)\,ds + \int_0^t \sigma(x_s)\,dW_s. \tag{1.13}$$

At this point, the last term on the right-hand side, $\int_0^t \sigma(x_s)\,dW_s$, has not been defined. It can be defined in terms of Riemann sums, as we did for the integral of a Gaussian white noise process (1.4):

$$\int_0^t \sigma(x_s)\,dW_s = \lim_{\delta\to 0}\sum_{j=0}^{N-1}\sigma(x_j^*)\,\Delta W_j, \qquad t = N\delta, \tag{1.14}$$

where $x_j^* = x_{t_j}$, $t_j \in [j\delta, (j+1)\delta]$. If the value $t_j = j\delta$ is chosen to evaluate (1.14) (i.e. the left endpoint of the time interval), then the Itô integral is computed. If $t_j = (j + 1/2)\delta$ is chosen (i.e. the midpoint), then the Stratonovich integral is computed.
As we show here, they are, in general, not the same.

If we consider the difference between the Stratonovich and Itô interpretations of (1.14), we obtain the following:

$$\lim_{\delta\to 0}\sum_{j=0}^{N-1}\left[\sigma(x_{(j+1/2)\delta}) - \sigma(x_{j\delta})\right]\Delta W_j, \qquad \Delta W_j = W_{(j+1)\delta} - W_{j\delta}. \tag{1.15}$$

Applying a linear approximation to $\sigma(x_{(j+1/2)\delta})$ for small changes in $x$, we obtain

$$\sigma(x_{(j+1/2)\delta}) \approx \sigma(x_{j\delta}) + \sigma'(x_{j\delta})\left[x_{(j+1/2)\delta} - x_{j\delta}\right]. \tag{1.16}$$

By the dynamics of $x_t$ given in (1.1) and the approximation (1.16), we can write (1.15) as

$$\lim_{\delta\to 0}\sum_{j=0}^{N-1}\left[\sigma'(x_{j\delta})\left(f(x_{j\delta})\frac{\delta}{2} + \sigma(x_{j\delta})(W_{(j+1/2)\delta} - W_{j\delta})\right)\right]\Delta W_j \tag{1.17}$$

$$\begin{aligned}
= {} & \lim_{\delta\to 0}\sum_{j=0}^{N-1}\sigma'(x_{j\delta})f(x_{j\delta})\frac{\delta}{2}\Delta W_j
+ \lim_{\delta\to 0}\sum_{j=0}^{N-1}\sigma'(x_{j\delta})\sigma(x_{j\delta})(W_{(j+1/2)\delta} - W_{j\delta})^2 \\
& + \lim_{\delta\to 0}\sum_{j=0}^{N-1}\sigma'(x_{j\delta})\sigma(x_{j\delta})(W_{(j+1/2)\delta} - W_{j\delta})(W_{(j+1)\delta} - W_{(j+1/2)\delta}).
\end{aligned} \tag{1.18}$$

At this point, we can effectively eliminate terms in the sum with summands that are smaller than $O(\delta)$. The only term in (1.18) that is $O(\delta)$ is the second term; all others are smaller in the limit of small $\delta$. Using the fact that $(W_{(j+1/2)\delta} - W_{j\delta})^2$ converges in probability to $\delta/2$ [65], we can write the difference between the Itô and Stratonovich integrals as

$$\lim_{\delta\to 0}\sum_{j=0}^{N-1}\left[\sigma(x_{(j+1/2)\delta}) - \sigma(x_{j\delta})\right]\Delta W_j = \lim_{\delta\to 0}\sum_{j=0}^{N-1}\frac{\delta}{2}\sigma'(x_{j\delta})\sigma(x_{j\delta}) + o(\delta). \tag{1.19}$$

By writing the Riemann sum (1.19) as an integral, we see that the difference between the Stratonovich and Itô integrals is

$$\int_0^t \frac{1}{2}\sigma'(x_s)\sigma(x_s)\,ds. \tag{1.20}$$

This term is known as the Stratonovich correction and is sometimes referred to as noise-induced drift, since it adds a deterministic perturbation to the dynamics of $x_t$ that is a direct result of the noisy forcing.

As a result of this difference in stochastic calculi, one needs to be careful to specify the stochastic calculus used to interpret the stochastic differential terms of the SDE. The notation given in (1.1) is the conventional notation for the Itô interpretation, while the Stratonovich interpretation is typically (although not always) denoted by writing the SDE (1.1) as

$$dx_t = f(x_t)\,dt + \sigma(x_t)\circ dW_t, \tag{1.21}$$

where '$\circ$' denotes the Stratonovich interpretation of the stochastic integral given in (1.13).
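The correction (1.20) can be seen directly in simulation. Taking $\sigma(x) = x$ with $x_t = W_t$, the correction over $[0, 1]$ is $\int_0^1 \tfrac{1}{2}\,ds = 1/2$, so left-endpoint and midpoint Riemann sums of $\int_0^1 W_s\,dW_s$ should differ by approximately that amount. A minimal sketch (an illustration, not from the text):

```python
import numpy as np

# Evaluate int_0^1 W_s dW_s with left-endpoint (Ito) and midpoint (Stratonovich)
# Riemann sums on a common Wiener path sampled at half-steps.
rng = np.random.default_rng(3)
N, T = 20_000, 1.0
half = T / (2 * N)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(half), 2 * N))])

dW = W[2::2] - W[:-2:2]              # increments over each full step
ito = np.sum(W[:-2:2] * dW)          # left endpoint of each interval
strat = np.sum(W[1:-1:2] * dW)       # midpoint of each interval
gap = strat - ito                    # should be close to the correction 1/2
```

Here the Itô sum approximates $(W_1^2 - 1)/2$ and the Stratonovich sum approximates $W_1^2/2$, the value classical calculus would give.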
One can easily write (1.21) as an Itô SDE using the Stratonovich correction (1.20):

$$\sigma(x_t)\circ dW_t = \sigma(x_t)\,dW_t + \frac{1}{2}\sigma'(x_t)\sigma(x_t)\,dt. \tag{1.22}$$

Notice that if $\sigma$ is a constant function, then the Itô and Stratonovich interpretations are equivalent, since $\sigma'(x) = 0$. For simulation purposes, it is common to transform SDEs whose integrals are interpreted in the sense of Stratonovich into Itô SDEs and use an explicit time-stepping algorithm.

These two calculi tend to have different applications. Itô calculus is typically chosen for financial mathematics problems because the choice of the left endpoint of the time interval as the argument of the integrand does not "look into the unknown future", unlike the Stratonovich integral. The Stratonovich calculus is favoured for physics-based modelling problems, since it has been shown (in the well-known Wong-Zakai theorem) to be the appropriate calculus when white noise is used as an approximation for unpredictable, rapidly fluctuating continuous processes [91, 92]. Moreover, it satisfies the rules of classical calculus, which often makes it easier to use for analysis.

The generalization of the Stratonovich calculus to processes with jumps is called the Marcus calculus [7, 55], and it is the natural interpretation for stochastic integrals when α-stable white noise is used as an approximation to smoother processes. Like the Stratonovich calculus, it satisfies the classical rules of calculus. The Marcus calculus appears prominently in Chapters 3 and 4, and significant discussion is devoted to it in Section A.2.1, including the associated numerical methods. There is no need to generalize the Itô calculus to jump processes, as the interpretation is the same as in the Gaussian white noise case.

Unless stated otherwise, all SDEs in this dissertation are interpreted in the sense of Itô.

Autocovariance

Consider a continuous-time stochastic process with finite variance, $x_t$.
To characterize the dependence of $x_{t+T}$ on the value of $x_t$ in the probabilistic sense, one typically uses the autocovariance function,

$$C_{xx}(T) = \mathbb{E}\left[(x_{t+T} - \mathbb{E}[x_{t+T}])(x_t - \mathbb{E}[x_t])\right]. \tag{1.23}$$

The maximum value of the autocovariance function is obtained at lag time $T = 0$, and its values relative to the maximum characterize the degree of linear dependence between time-displaced measurements of the time series. Typically, though not always, the autocovariance function decays toward zero as $|T|$ becomes sufficiently large (not necessarily monotonically). From the definition (1.23), the autocovariance at lag time $T = 0$ is equal to the variance of $x_t$: $C_{xx}(0) = \mathbb{E}[(x_t - \mathbb{E}[x_t])^2] = \mathrm{Var}[x_t]$. It is also common to normalize $C_{xx}$ by the variance of $x_t$, which gives the autocorrelation function $C_{xx}(T)/C_{xx}(0)$, with bounded range $[-1, 1]$. The autocovariance function is introduced in Section 2.2.3.

The autocovariance function is a key component of classical stochastic averaging approximations, as we discuss below. However, there are some cases where it cannot be defined. The fact that the variance of α-stable noise processes is infinite often means that the variance of processes driven by α-stable noise is also infinite. As a result, the autocovariance functions for these processes cannot be defined, since the second moments do not exist. The analogous quantity for infinite-variance processes, called the autocodifference function, is introduced in Section 3.3.1. As discussed there, it is defined in terms of the characteristic function of the α-stable-noise-driven process and reduces to the autocovariance function in limits where the α-stable noise is Gaussian. A discussion of how it can be numerically estimated is given in Appendix A.2.3.

White vs. red noise

The Gaussian white noise process is so named because its increments have a Gaussian distribution and because its increments are uncorrelated with each other.
The usage of the term white to describe uncorrelated noise processes is due to the fact that the power spectrum of a white noise process has equal power at all frequencies, similar to white light, which has significant power across the visible portion of the electromagnetic spectrum. If a noisy process has correlated increments, then that process is referred to as red noise, since the power spectrum has significantly more power in one part of the frequency domain, typically the lower frequencies (as does red light relative to the rest of the visible spectrum). Other "coloured" noises exist, but they are not considered in this thesis.

The spectrum of a stochastic process can be computed using the Wiener-Khinchin theorem [26], which states that the Fourier transform of the autocovariance function of a stationary stochastic process gives the power spectral density of that process. For example, the autocovariance of a Gaussian white noise process is a Dirac δ-function multiplied by the infinitesimal change in time over which it is resolved, $C_{dWdW}(T) = dt\,\delta(T)$, so its power spectral density is

$$\mathrm{PSD}(\omega) = \int_{\mathbb{R}}\exp(i\omega s)\,C_{dWdW}(s)\,ds = dt, \tag{1.24}$$

which is constant for all angular frequencies $\omega$. The Wiener-Khinchin theorem can also be used to derive the autocovariance functions of more complicated processes, such as a linear process driven by the increments of an Ornstein-Uhlenbeck process, as in Section 2.2.3.

An example: The Ornstein-Uhlenbeck process

The Ornstein-Uhlenbeck process (OUP), $x_t$, is a continuous-time stochastic process that appears frequently in this thesis. Its dynamics are given by the following SDE:

$$dx_t = -\mu x_t\,dt + \sigma\,dW_t, \qquad t \geq 0, \tag{1.25}$$

where $\mu > 0$ and $\sigma$ are constants. This process has broad application: it has been studied extensively, many analytical results are available for it, and it can be considered the linear approximation of more complicated dynamical systems near stable equilibria.
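Before analyzing (1.25), the white/red distinction above can be checked numerically: the averaged periodogram of Gaussian white noise increments is flat, while that of a simulated OUP concentrates power at low frequencies. The parameters in this sketch are illustrative choices, not from the text.

```python
import numpy as np

# Averaged periodograms of white noise vs. a simulated OUP (red noise).
rng = np.random.default_rng(5)
n, reps, dt, mu = 4096, 50, 0.01, 1.0

def mean_periodogram(make_series):
    P = np.zeros(n // 2)
    for _ in range(reps):
        x = make_series()
        X = np.fft.rfft(x - x.mean())[1:n // 2 + 1]   # drop the zero frequency
        P += np.abs(X) ** 2 / n
    return P / reps

white = mean_periodogram(lambda: rng.normal(size=n))

def oup():
    x = np.empty(n)
    x[0] = 0.0
    for t in range(1, n):  # Euler step of dx = -mu*x dt + dW
        x[t] = x[t - 1] * (1 - mu * dt) + np.sqrt(dt) * rng.normal()
    return x

red = mean_periodogram(oup)
flatness = white[:100].mean() / white[-100:].mean()   # near 1 for white noise
redness = red[:100].mean() / red[-100:].mean()        # much greater than 1
```

The low-to-high frequency power ratio is close to one for the white noise series and large for the OUP, consistent with the Lorentzian spectrum derived for the OUP below.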
While (1.25) is interpreted in the sense of Itô, we note that since the diffusion coefficient is a constant $\sigma$, the Itô and Stratonovich interpretations of (1.25) are equivalent. The strong solutions can be derived using the differential analogue of the well-known integrating-factor method for linear differential equations, as follows:

$$dx_t = -\mu x_t\,dt + \sigma\,dW_t \tag{1.26}$$
$$\exp(\mu t)\,dx_t = -\mu\exp(\mu t)\,x_t\,dt + \sigma\exp(\mu t)\,dW_t \tag{1.27}$$
$$d\left[\exp(\mu t)\,x_t\right] = \sigma\exp(\mu t)\,dW_t \tag{1.28}$$
$$\exp(\mu t)\,x_t - x_0 = \int_0^t \sigma\exp(\mu s)\,dW_s. \tag{1.29}$$

Isolating $x_t$ allows us to express the strong solutions of (1.25) as

$$x_t = x_0\exp(-\mu t) + \int_0^t \sigma\exp(\mu(s-t))\,dW_s, \tag{1.30}$$

or alternatively

$$x_t = x_0\exp(-\mu t) + W_{\tau(t)}, \qquad \tau(t) = \int_0^t\left(\sigma\exp(\mu(s-t))\right)^2 ds = \frac{\sigma^2}{2\mu}\left(1 - \exp(-2\mu t)\right), \tag{1.31}$$

which implies that

$$x_t \sim_D \mathcal{N}\!\left(x_0\exp(-\mu t),\ \frac{\sigma^2}{2\mu}\left(1 - \exp(-2\mu t)\right)\right), \tag{1.32}$$

using the properties of the Wiener process and the normal distribution. As $t \to \infty$, we see that $x_t$ converges in distribution to a normal distribution with mean 0 and variance $\sigma^2/2\mu$. The weak solution is also of interest, as it informs us about further statistical properties of $x_t$. The FKE for the OUP (1.25) is given by

$$\frac{\partial u}{\partial t} = \mu\frac{\partial}{\partial x}(xu) + \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = \delta(x - x_0), \tag{1.33}$$

where we have assumed $x_0$ is known exactly. Solving (1.33), assuming the boundary conditions $\lim_{|x|\to\infty} u(x, t) = 0$ for all $t \geq 0$, gives

$$u(x, t) = u^{(0)}(x)\left[1 + \sum_{j=1}^{\infty}\exp(-j\mu t)\,p_j(x)\right], \qquad u^{(0)}(x) = \sqrt{\frac{\mu}{\pi\sigma^2}}\,\exp\left(-\frac{\mu x^2}{\sigma^2}\right), \tag{1.34}$$

where $u^{(0)}$ is the stationary PDF for $x_t$ and $p_j$ is a polynomial function of order $j$.
Taking $t \to \infty$ in (1.34) confirms that the stationary distribution of $x_t$ is a Gaussian distribution with mean 0 and variance $\sigma^2/2\mu$.

Using the result (1.30) and assuming $T \geq 0$, we compute the autocovariance function of $x_t$, $C_{xx}(T)$, which is

$$\begin{aligned}
C_{xx}(T) &= \mathbb{E}\left[(x_{t+T} - \mathbb{E}[x_{t+T}])(x_t - \mathbb{E}[x_t])\right] && (1.35)\\
&= \mathbb{E}\left[\sigma\left(\int_0^{t+T} e^{\mu(s-t-T)}\,dW_s\right)\sigma\left(\int_0^{t} e^{\mu(s-t)}\,dW_s\right)\right] && (1.36)\\
&= \sigma^2 e^{-2\mu t - \mu T}\,\mathbb{E}\left[\left(\int_0^{t+T} e^{\mu s}\,dW_s\right)\left(\int_0^{t} e^{\mu s}\,dW_s\right)\right] && (1.37)\\
&= \sigma^2 e^{-2\mu t - \mu T}\left\{\mathbb{E}\left[\left(\int_t^{t+T} e^{\mu s}\,dW_s\right)\left(\int_0^{t} e^{\mu s}\,dW_s\right)\right] + \mathbb{E}\left[\left(\int_0^{t} e^{\mu s}\,dW_s\right)^2\right]\right\}. && (1.38)
\end{aligned}$$

Using the properties of the Wiener process and the Itô isometry [65], respectively, the first expectation above evaluates to zero and the second gives

$$C_{xx}(T) = \sigma^2 e^{-2\mu t - \mu T}\left(\int_0^t e^{2\mu s}\,ds\right) \tag{1.39}$$
$$= \frac{\sigma^2}{2\mu}\left(1 - e^{-2\mu t}\right)e^{-\mu T}. \tag{1.40}$$

The case $T < 0$ can be evaluated in a similar way and gives the following result for all $T$:

$$C_{xx}(T) = \frac{\sigma^2}{2\mu}\left(1 - e^{-2\mu t}\right)e^{-\mu|T|} \sim \frac{\sigma^2}{2\mu}e^{-\mu|T|} \quad \text{for } t \to \infty. \tag{1.41}$$

As $|T| \to \infty$, we see that $C_{xx}(T) \to 0$. We also see that when $\mu$ is large, the decay to zero is significantly quicker, indicating that an OUP with a stronger attraction to the origin decorrelates more quickly. For the OUP $x_t$, the autocorrelation function is equal to $\exp(-\mu|T|)$.

We can compute the power spectral density of the OUP by taking the Fourier transform of $C_{xx}(T)$ (1.41):

$$\mathrm{PSD}(\omega) = \mathcal{F}[C_{xx}(T)](\omega) = \int_{\mathbb{R}} e^{i\omega s}\,C_{xx}(s)\,ds \tag{1.42}$$
$$= \frac{\sigma^2}{2\mu}\left(1 - e^{-2\mu t}\right)\int_{\mathbb{R}} e^{i\omega s}\,e^{-\mu|s|}\,ds \tag{1.43}$$
$$= \frac{\sigma^2\left(1 - e^{-2\mu t}\right)}{\mu^2 + \omega^2}\quad\left(\sim \frac{\sigma^2}{\mu^2 + \omega^2}\ \text{for } t \to \infty\right). \tag{1.44}$$

From this, we see that the OUP is a red noise process, as its power spectral density decays with increasing values of $\omega$.

If the Gaussian white noise forcing $dW_t$ in (1.25) is replaced with an α-stable noise process, $dL_t^{(\alpha,\beta)}$, the resulting process, $\tilde{x}_t$, is referred to as an Ornstein-Uhlenbeck-Lévy process (OULP). The OULP is first introduced in Section 3.2.2 and is used as a fast stochastic forcing process several times in Section 3.3. It also appears in Chapter 4.
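Before turning to the OULP, the OUP results above lend themselves to a quick numerical check: the transition law (1.32) can be sampled exactly (no discretization error), and the stationary variance $\sigma^2/2\mu$ and the autocorrelation $\exp(-\mu|T|)$ from (1.41) can then be estimated from the resulting series. A sketch with illustrative parameter values:

```python
import numpy as np

# Exact sampling of the OUP using (1.32):
#   x_{t+h} | x_t ~ N(x_t e^{-mu h}, sigma^2/(2 mu) (1 - e^{-2 mu h})).
mu, sigma, h = 2.0, 1.0, 0.01
a = np.exp(-mu * h)
s = np.sqrt(sigma**2 / (2 * mu) * (1 - a**2))
rng = np.random.default_rng(6)
z = rng.normal(size=500_000)
x = np.empty(z.size)
x[0] = 0.0
for t in range(1, x.size):
    x[t] = a * x[t - 1] + s * z[t]

xs = x[10_000:]                      # discard the transient
var_stat = np.var(xs)                # compare with sigma^2/(2 mu) = 0.25
lag = 100                            # lag time T = lag * h = 1.0
corr = np.corrcoef(xs[:-lag], xs[lag:])[0, 1]   # compare with exp(-mu * T)
```

The estimated stationary variance and lag-1.0 autocorrelation should be close to $0.25$ and $e^{-2}$, respectively, up to sampling error.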
The dynamics of an OULP $\tilde{x}_t$ are given by

$$d\tilde{x}_t = -\mu\tilde{x}_t\,dt + \sigma\,dL_t^{(\alpha,\beta)}, \qquad t \geq 0. \tag{1.45}$$

The OUP and OULP share several similarities. In particular, their characteristic functions are functionally very similar:

$$\psi_x(k, t) = \exp\left(ikx_0 e^{-\mu t} - \frac{\sigma^2 k^2}{4\mu}\left(1 - e^{-2\mu t}\right)\right) \tag{1.46}$$
$$\psi_{\tilde{x}}(k, t) = \exp\left(ik\tilde{x}_0 e^{-\mu t} - \frac{\sigma^\alpha|k|^\alpha\,\Xi(k;\alpha,\beta)}{\alpha\mu}\left(1 - e^{-\alpha\mu t}\right)\right). \tag{1.47}$$

However, many of the quantities that characterize the OUP, such as the autocovariance and the power spectral density, cannot be defined for the OULP.

Stochastic averaging

The goal of stochastic averaging is to approximate the slow dynamics of a fast-slow stochastic dynamical system, in the weak sense, with a lower-dimensional system evolving on the slow time scale only. A weak-sense approximation refers, broadly speaking, to approximating statistical properties such as the distribution and autocovariance (if appropriate), as opposed to a strong-sense approximation, which aims to capture path-wise properties. The stochastic averaging approximations given in [63], called the (L) approximation and the (N+) approximation, are of principal interest in this thesis.

We discuss their derivation here via an example of stochastic averaging. We consider a slow-fast system inspired by [63],

$$dx_t = -x_t\,dt + \frac{h(x_t)y_t}{\sqrt{\epsilon}}\,dt, \qquad dy_t = -\frac{y_t}{\tau\epsilon}\,dt + \frac{1}{\sqrt{\epsilon}}\,dW_t, \tag{1.48}$$

where $0 < \epsilon \ll 1$, $\tau > 0$, and $h$ is a function. The fast process, $y_t$, which is an OUP, drives the slow process $x_t$. In the case where $h(x) = 0$ for all $x$, $x_t$ decays to zero and has a stationary probability distribution given by a δ-function centred at $x = 0$. If this is not the case, then the driving from $y_t$ will cause the distribution of $x_t$ to "spread out", ceasing to be a δ-function. In general, we cannot determine the exact form of the marginal PDF of $x_t$.

The (L) approximation is given by the sum of a mean drift and an OUP perturbation: $\bar{x}_t + \xi_t$.
The term $\bar{x}_t$ is an approximation to the dynamics of $x_t$ in the mean sense only and is called the (A) approximation. It is derived by computing the mean drift of $x_t$, conditioned on the behaviour of $y_t$:

$$d\bar{x}_t = \bar{f}(\bar{x}_t)\,dt, \qquad \bar{x}_0 = x_0, \qquad \text{where}\quad \bar{f}(x) = \mathbb{E}_{(y|x)}\left[-x + \frac{h(x)y}{\sqrt{\epsilon}}\right] = -x, \tag{1.49}$$

and $\mathbb{E}_{(y|x)}[f]$ denotes the expectation of $f$ with respect to the stationary distribution of $y|x$ (i.e. the distribution of $y$ assuming a fixed, known value of $x$). Computing the (A) approximation is now a simple matter of solving the initial value problem (1.49) for $\bar{x}_t$, giving the solution

$$\bar{x}_t = x_0\exp(-t). \tag{1.50}$$

The (L) approximation is computed by adding a linear approximation of the remainder $\xi_t = x_t - \bar{x}_t$ to the (A) approximation $\bar{x}_t$. The dynamics of $\xi_t$ are given by

$$d\xi_t = dx_t - d\bar{x}_t = -x_t\,dt + \frac{h(x_t)y_t}{\sqrt{\epsilon}}\,dt + \bar{x}_t\,dt \tag{1.51}$$
$$= -\xi_t\,dt + \frac{h(\bar{x}_t + \xi_t)y_t}{\sqrt{\epsilon}}\,dt, \tag{1.52}$$

where $\xi_0 = 0$. Assuming that $\xi_t$ is small, a leading-order approximation $x_t \approx \bar{x}_t$ is applied to the second term on the right-hand side, which gives

$$d\xi_t = -\xi_t\,dt + \hat{f}(\bar{x}_t, y_t)\,dt, \qquad \hat{f}(x, y) = \frac{h(x)y}{\sqrt{\epsilon}}. \tag{1.53}$$

At this point, we wish to replace the term $\hat{f}\,dt$ with an appropriately scaled Gaussian white noise, $\varsigma(\bar{x}_t)\,dW_t$, where $\varsigma(\bar{x}_t)$ is determined by studying the statistical character of $\hat{f}$; in particular, its stationary autocovariance, conditioned on a fixed value of $x$:

$$C_{\hat{f}\hat{f}}(T; x) = \frac{h(x)^2}{\epsilon}\,C_{yy}(T) = h(x)^2\tau^2\left(\frac{\exp(-|T|/\tau\epsilon)}{2\tau\epsilon}\right). \tag{1.54}$$

We notice something significant here. As $\epsilon \to 0$, meaning the separation between the fast and slow time scales becomes infinitely large, the autocovariance function of $\hat{f}$ becomes narrower and taller, approaching a scaled δ-function, since $\lim_{\epsilon\to 0}(2\tau\epsilon)^{-1}\exp(-|T|/\tau\epsilon) = \delta(T)$. This justifies our assumption that a white noise process is an appropriate weak approximation for $\hat{f}\,dt$ and also gives us the appropriate variance for the forcing: $\varsigma(x)^2 = h(x)^2\tau^2$. The variance of the approximating stochastic forcing can also be determined by computing the global integral of the autocovariance function of $\hat{f}$:

$$\varsigma(x)^2 = \int_{-\infty}^{\infty} C_{\hat{f}\hat{f}}(T; x)\,dT = h(x)^2\tau^2. \tag{1.55}$$
Hence, we can weakly approximate the $\hat{f}\,dt$ term with $\tau h(\bar{x}_t)\,dW_t$, and the dynamics of $\xi_t$ are approximated by

$$d\xi_t = -\xi_t\,dt + \tau h(\bar{x}_t)\,dW_t, \qquad \xi_0 = 0. \tag{1.56}$$

Thus the (L) approximation is given by $\bar{x}_t + \xi_t$, where the dynamics of the two processes are given by (1.50) and (1.56). Notice that the coefficient of the noise term in the approximation depends on the parameter $\tau$, which characterizes the rate at which $y_t$ decorrelates.

The (N+) approximation is computed in a similar fashion, but it does not treat the dynamics of $x_t$ as the sum of a mean drift and a perturbation as the (L) approximation does. Rather, it relies on separating the dynamics of $x_t$ into a drift term $\bar{f}(x_t)$ and a fast perturbation term $\hat{f}(x_t, y_t)$. The example (1.48) is naturally given in this form, with $\bar{f}$ as defined in (1.49) and $\hat{f}$ as defined in (1.53). Assuming a fixed value of $x_t$, justified by the difference in time scales, we can replace the fast forcing term $\hat{f}(x_t, y_t)$ with a scaled Gaussian white noise forcing, as we did for the (L) approximation. However, due to the results of Wong and Zakai [91, 92], the approximation of a rapidly fluctuating continuous process by a white noise forcing should be interpreted according to the Stratonovich stochastic calculus: $\hat{f}(x_t, y_t)\,dt \to \varsigma(X_t)\circ dW_t$, where $\varsigma$ satisfies (1.55). Hence the (N+) approximation of $x_t$ in (1.48) is $X_t$, which satisfies the SDE

$$dX_t = -X_t\,dt + \tau h(X_t)\circ dW_t, \qquad X_0 = x_0. \tag{1.57}$$

There are advantages to analyzing or utilizing the approximations (1.56) and (1.57) instead of (1.48) itself when studying the statistical properties of $x_t$. Because (1.48) has multiple time scales and hence is a stiff problem, very fine time discretizations are required to ensure stable and reliable simulations, which may make simulation of (1.48) or similar multiple-time-scale systems prohibitive. The (L) and (N+) approximations generally require no such considerations.
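The prediction can be checked against a direct (and, as noted, stiff) simulation of the full fast-slow system (1.48). Taking $h(x) = 1$ for simplicity, so that the noise is additive and the (L) and (N+) approximations coincide, the averaged model predicts a stationary variance of $\tau^2/2$ for the slow variable as $\epsilon \to 0$. A sketch with illustrative parameters:

```python
import numpy as np

# Euler simulation of the full system (1.48) with h(x) = 1; the (L)
# approximation predicts a stationary variance of tau**2/2 = 0.125 for x.
eps, tau = 0.01, 0.5
dt, nsteps = 5e-4, 1_000_000        # fine dt because the problem is stiff
se = np.sqrt(eps)
rng = np.random.default_rng(7)
dW = np.sqrt(dt) * rng.normal(size=nsteps)
xs = np.empty(nsteps)
x = y = 0.0
for j in range(nsteps):
    x += (-x + y / se) * dt                    # slow variable
    y += -(y / (tau * eps)) * dt + dW[j] / se  # fast OUP
    xs[j] = x

var_slow = np.var(xs[100_000:])    # compare with tau**2 / 2 = 0.125
```

The agreement improves as $\epsilon \to 0$ and as the Euler step is refined; the residual mismatch here reflects both the finite $\epsilon$ and the discretization of the fast process.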
Also, the analysis of the statistical properties of \bar{x}_t + ξ_t (1.56) and X_t (1.57) is generally more straightforward than for x_t in (1.48) due to their simpler dynamical form, so results related to x_t in the small-ε limit are easier to obtain from the approximate dynamical forms (1.56) and (1.57). The (N+) approximation is more accurate in the weak sense than the (L) approximation, matching the distribution of x_t more closely, particularly for nonlinear fast-slow systems, but it is also more challenging to implement in general due to the Stratonovich interpretation.

More information related to these stochastic averaging approximations can be found in Chapter 3 and in [2, 63], which also include examples of their usage in climate modelling.

1.2 Literature review

Parameter estimation problems in which discrete-time data are fit to stochastic differential equation models of the form (1.1), or higher-dimensional variations, typically involve assuming functional forms for f and σ that depend on unknown parameters, and estimating those parameters so that the mismatch between the given data and the corresponding model predictions is as small as possible. A broad survey of currently established methods for parameter estimation of SDE models can be found in [84]. The methods discussed include estimating-function methods, which attempt to maximize the statistical likelihood with respect to the unknown parameters by determining solutions to the score function, and Bayesian methods, in which Markov chain Monte Carlo methods are applied to determine a distribution for the true value of the parameter, with the final estimate typically taken to be the mean of the resulting distribution. Unfortunately, these methods are computationally demanding, especially for higher-dimensional problems. A more recently developed method of parameter estimation researched by D. Crommelin and E.
Vanden-Eijnden exploits the relationship between the eigenstructures of the infinitesimal generator [65] and the conditional expectation operator; the latter can be estimated numerically from a given set of discretely-measured time series data [11, 12, 15]. While implementation of this method can be moderately challenging in terms of constructing the appropriate linear operators, it is computationally fast, as it only requires the numerical solution of a least squares problem of dimension equal to the number of parameters being estimated, possibly subject to some constraints. The work in [15] provides a thorough outline of the theory behind the estimation procedure in the context of stochastic differential equations and includes a discussion of the applicability of this method to data that cannot strictly be considered Markovian. This discussion is important since the assumption of a Markovian SDE model requires white noise forcing, which is an idealization of quickly decorrelating coloured noise processes, and all physical processes relevant to climate have non-zero memory.

There is a large body of work related to stochastic averaging methods, much of it motivated by climate modelling problems. Some of the pioneering works in the field, written by Khas'minskii [43, 44], Papanicolaou [69], and Hasselmann [33], have been summarized in the works of Arnold et al. [2] and Culina and Monahan [63], where a hierarchy of approximations is discussed, denoted (A), (L), and (N+). Under the (A) approximation, the slow variables of a fast-slow dynamical system are approximated in the mean sense only, while the (L) approximation can be considered an improvement upon the (A) approximation in which an additive linear correction about the mean is included. The (N+) approximation allows for interactions between the fast and slow processes through a generally state-dependent noise term.
These stochastic averaging methods have been well studied, and there are several convergence-related results for each approximation [2, 45]. They are appropriate for fast-slow stochastic dynamical systems with a clear separation of time scales, and are not necessarily appropriate for systems with a continuum of time scales or with time scales that are not well separated. In that case, approaches based on Mori-Zwanzig projection operators and Ruelle response theory can be used to derive approximate dynamics in the presence of weak coupling between two subsystems [93, 94]. In the case of fast-slow systems, significantly less attention has been paid to stochastic averaging methods when the stochastic forcing is not Gaussian but α-stable. To our knowledge, the only study that investigates this topic is [85], where a stochastic averaging approximation is derived for a basic linear system forced with α-stable noise. A broader study of the topic is lacking.

Given the evidence of the presence of anomalous diffusion processes in atmospheric flows [17, 37, 82, 83], it is natural that α-stable processes would appear in related modelling problems. As has been demonstrated in [27, 41, 57], the slow variables of a coupled fast-slow dynamical system in which the fast process is a sufficiently fast deterministic, but chaotic, process can appear to be driven by a Gaussian white noise process. This is also true if the fast process is stochastic with finite variance but a non-Gaussian stationary distribution [63]. The work in [79, 87] indicates the existence of multiplicative noise dynamics in atmospheric modelling problems and shows that the associated distributions have infinite variance. Because of these results, we expect that slower dynamical variables would appear to be driven by α-stable noise processes when they are driven by such multiplicative noise processes and the time scale separation is sufficiently large.
To our knowledge, this has not been explicitly demonstrated.

1.3 Overview of this thesis

To our knowledge, the parameter estimation method of Crommelin and Vanden-Eijnden [12, 15] has not previously been applied to an observationally-based data set. To test the applicability of this method, we apply it to zonal and meridional wind velocity data from the ERA-40 re-analysis project [22] to estimate the parameters of a nonlinear surface ocean wind velocity model derived in [59, 60]. The statistics of sea surface vector wind components are known to be strongly skewed in many places [58]. In Chapter 2, we demonstrate the successful application of the generator estimation method to these data, resulting in smooth parameter field estimates that allow accurate reproductions of the statistics of observed surface winds. The estimated parameter fields also display consistent tendencies across geographic regions with similar weather patterns, and the analysis allows us to identify possible deficiencies of the model.

Dynamical nonlinearities are not the only way that non-Gaussian statistics appear in time series data. In Chapters 3 and 4, we investigate a distinctly different mechanism through which non-Gaussian behaviour appears. Chapter 3 outlines our derivation of stochastic averaging approximations for fast-slow systems in which the fast process is driven by α-stable noise forcing and the dynamics are linear in the fast process. The approximations we derive are analogous to the (L) and (N+) approximations discussed in [2, 63], but the fact that the systems are driven by α-stable noise requires a significantly different approach in the derivation. This approach makes use of characteristic functions, similar to that undertaken in [85] for the case of purely linear systems.
Additionally, the classical (N+) approximation is interpreted in the sense of Stratonovich when the approximation has multiplicative noise. The (N+) approximation that we derive is instead interpreted according to the less well-known Marcus stochastic calculus [55] in cases where it includes a multiplicative noise term.

Having derived stochastic averaging methods that can be used to approximate the slow dynamics of fast-slow systems driven by α-stable noise in Chapter 3, we subsequently address the averaging results for more general driving processes with power-law tails and infinite variance. We consider a class of SDEs inspired by low-dimensional projections of quadratically nonlinear dynamics, such as atmospheric fluid dynamical models [52]. These SDEs are linear and are driven by correlated additive and multiplicative Gaussian white noise, which we abbreviate as CAM noise [79]. It has been shown that within a domain of the parameter space of the CAM noise process, its stationary PDF has power-law decay with exponent −(1 + α), where 0 < α < 2 [73], and hence has infinite variance. The GCLT then suggests that if the CAM noise process were a fast perturbation to a slow process, the slow process would appear to be forced with an α-stable noise process on the slow time scale. In Chapter 4, we extend the results derived in Chapter 3 to show that the slow component of a fast-slow system whose fast process is a linear CAM noise process can be weakly approximated by a lower-dimensional, single time scale system forced with α-stable noise.

Chapter 2
Parametric estimation of the stochastic dynamics of sea surface winds

2.1 Introduction

In [59], an idealized model was developed for the stochastic dynamics of sea surface winds, based on a highly simplified representation of the momentum budget of a surface atmospheric layer of fixed depth. This model results in an analytic expression for the probability distribution of surface winds in terms of physically meaningful parameters.
The focus of this earlier study was on this probability distribution, which is of interest in the context of air-sea interactions [e.g. 20, 23, 38], wind power [5, 51], and wind extremes [78]. No attention was paid to the temporal structure of the simulated winds, and no effort was made to estimate model parameters from observations. In fact, this cannot be done using the probability distribution alone, as it does not uniquely determine the model parameter set. In the present study, the model parameters are estimated from long time series of sea surface wind data.

The model from [59] consists of two coupled stochastic differential equations (SDEs). Parameter estimation for SDEs is a challenging task in general, and the fact that the SDE considered here is multivariate (2-dimensional) and contains nonlinear terms adds to the difficulty. Furthermore, the observational data may not exactly satisfy an SDE; for example, the data may not be Markov. Recently, Crommelin and Vanden-Eijnden have introduced a variational approach which avoids the restriction that the available data be exactly described by an SDE [11, 13, 15]. In this approach, the data are fit with the "closest" SDE, in spectral terms. The method is computationally cheap, and it can handle 2-dimensional SDEs and nonlinear terms.

Alternative estimation methods for this purpose, such as Markov chain Monte Carlo methods (see [84] for a survey), are often computationally very demanding, making them less suitable for processing time series from many different spatial locations, as will be done here.
Moreover, they usually rely on model properties (for example, a diagonal diffusion matrix) that can prove too restrictive for the model under consideration here.

The analysis in this study will demonstrate that the method of Crommelin and Vanden-Eijnden is applicable in situations where the data are not exactly described by the model to which they are fit. In particular, we will consider the effects of fitting a model driven by Gaussian white noise to data for which the driving variability has an autocorrelation time (ACT) that is not short relative to the timescale of the resolved dynamics. In earlier studies, the model was used as a tool to investigate the influence of large-scale and boundary-layer processes on the probability distribution of surface winds. Obtaining estimates of model parameters is the first step in an assessment of the quantitative utility of the model. Furthermore, as we will show, this analysis can be used to identify model features which are irreconcilable with the data.

In Section 2.2, we offer a brief description of the parameter estimation method, demonstrate the results of its application to simulated data from a simple stochastic differential equation (SDE) model, and discuss strategies used to improve parameter estimates. In Section 2.3 we analyze the stochastic model for sea surface wind dynamics and consider the application of the estimation method to data generated by this model. Lastly, we estimate model parameters from long time series of sea surface winds in Section 2.4. A discussion and conclusions are presented in Section 3.4.

2.2 Method

A basic outline of the method of [13, 15] for the parametric estimation of diffusions follows. Consider a stochastic process X_t ∈ R^d that is described by the SDE
\[ dX_t = \gamma(X_t)\,dt + \Sigma(X_t)\,dW_t, \tag{2.1} \]
where γ is a vector function of length d and Σ is a d × d matrix function. This equation is interpreted in the sense of an Itō SDE [26, 65].
Governed by these dynamics, X_t has a probability density function (PDF) whose evolution can be expressed as
\[ p(x, t+\delta t) = \mathcal{P}^{\dagger}_{\delta t}\,p(x, t) \tag{2.2} \]
(where Q^† denotes the formal adjoint of the operator Q). The operator \mathcal{P}_{\delta t} is given by
\[ \mathcal{P}_{\delta t} f(x) = \mathbb{E}\left[ f(X_{t+\delta t}) \mid X_t = x \right] \tag{2.3} \]
and is known as the conditional expectation operator [28]. The infinitesimal generator, G, for the diffusion process is given by
\[ G = \sum_{i=1}^{d} \gamma_i \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d} (\Sigma\Sigma^T)_{ij} \frac{\partial^2}{\partial x_i \partial x_j}, \tag{2.4} \]
where Σ^T indicates the transpose of the matrix Σ. It is a standard result that the infinitesimal generator (henceforth referred to as "the generator") and the conditional expectation operator satisfy the operator equation
\[ \mathcal{P}_t = \exp(tG). \tag{2.5} \]
Formally, (2.2) is equivalent to the Fokker-Planck equation. When the dynamics admit a stationary process, the leading eigenvalue of the operator G is zero, while all others are strictly negative.

A particular set of observations which one desires to model as an SDE may not be generated by dynamics of the form (2.1). For example, the data may be non-Markovian because only a projection of the full state space is sampled, analogous to how the dynamics of stochastic partial differential equation models can be reduced via stochastic parameterizing manifolds to approximate the behaviour of a few observed modes using SDEs that include memory terms, making them non-Markovian [8, 9]. Rather than requiring that the data come exactly from an SDE, the approach of Crommelin and Vanden-Eijnden finds the closest Markovian SDE to the data in terms of the eigenstructure of the generators of the model and the data. In particular, this approach minimizes the residual of the eigenproblem ‖G\hat{φ} − \hat{λ}\hat{φ}‖_{2,w} (the subscript "2, w" denotes a weighted Euclidean norm, described below), where G is the generator of the model and \hat{φ} and \hat{λ} represent the eigenfunctions and eigenvalues of the generator, estimated from the data (throughout the text, estimated quantities will be denoted by a caret).
In this sense, the approach finds the closest SDE model to the data.

The method of Crommelin and Vanden-Eijnden requires the estimation of the eigenstructure of the generator from data. From (2.5), there is a direct relationship between the eigenstructures of \mathcal{P}_t and G: they have the same eigenfunctions, and the eigenvalues can be related via a simple transformation:
\[ \mathcal{P}_t \hat{\phi}_k = \hat{\Lambda}_k \hat{\phi}_k \;\longleftrightarrow\; G\hat{\phi}_k = \hat{\lambda}_k \hat{\phi}_k, \quad \text{where } \hat{\lambda}_k = \frac{1}{t}\log(\hat{\Lambda}_k). \tag{2.6} \]
We can exploit this relationship between eigenstructures to obtain the eigenstructure of G from that of \mathcal{P}_t, as the latter is relatively easy to estimate from time series data. We can also define the adjoint eigenproblem, \mathcal{P}^{\dagger}_t(\hat{\xi}_l \rho) = \hat{\Lambda}_l \hat{\xi}_l \rho, where ρ denotes the stationary distribution of (2.1). The eigenstructure of the conditional expectation operator is estimated using a truncated Galerkin approximation which solves a modified version of the eigenproblem (2.6). By approximating the eigenfunction \hat{\phi}_k(x) by an expansion over n basis functions {β_i} (i.e. \hat{\phi}_k(x) ≈ \sum_{i=1}^{n} v_{ki}\beta_i(x)), chosen such that they are square integrable with respect to the invariant measure (the stationary probability density function for X_t), we obtain the eigenproblem
\[ \sum_{i=1}^{n} v_{ki}\,\langle \mathcal{P}_{\delta t}\beta_i(X_t), \beta_j(X_t)\rangle = \Lambda_k \left( \sum_{i=1}^{n} v_{ki}\,\langle \beta_i(X_t), \beta_j(X_t)\rangle \right), \quad 1 \le j, k \le n, \tag{2.7} \]
where v_{ki} is the i-th entry of the k-th eigenvector and Λ_k is the corresponding eigenvalue. For the adjoint eigenproblem, we can define a similar weak form by expanding \hat{\xi}_l onto the same basis (i.e. \hat{\xi}_l(x) ≈ \sum_{j=1}^{n} w_{lj}\beta_j(x)). By the properties of the conditional expectation operator and our definition of the inner product,
\[ \langle \mathcal{P}_{\delta t}\beta_i(X_t), \beta_j(X_t)\rangle = \mathbb{E}\left[\beta_i(X_{t+\delta t})\beta_j(X_t)\right], \tag{2.8} \]
\[ \langle \beta_i(X_t), \beta_j(X_t)\rangle = \mathbb{E}\left[\beta_i(X_t)\beta_j(X_t)\right]. \tag{2.9} \]
It is not possible in general to perform a complete decomposition of φ_k into a finite number of basis functions, but this is not required for this procedure.
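The operator relation (2.5) and the eigenvalue transformation (2.6) can be illustrated with a toy example in which the generator is an ordinary matrix (here, a hypothetical 3-state birth-death chain standing in for the differential operator): the eigenvalues of P_t = exp(tG) are exactly exp(tλ_k), so λ_k = t^{-1} log Λ_k, with the leading eigenvalue equal to zero.

```python
import numpy as np
from scipy.linalg import expm

# Toy illustration of (2.5)-(2.6): a hypothetical 3-state birth-death
# generator matrix G stands in for the differential operator.  The
# conditional expectation operator P_t = exp(tG) has eigenvalues
# Lambda_k = exp(t*lambda_k), so lambda_k = log(Lambda_k)/t, and the
# leading eigenvalue is zero for a stationary process.
G = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.0,  0.5, -0.5]])   # rows sum to zero
t = 0.5
P = expm(t * G)

lam_G = np.sort(np.linalg.eigvals(G).real)
lam_P = np.sort(np.log(np.linalg.eigvals(P).real) / t)
print(lam_G, lam_P)   # identical up to round-off; largest entry is 0
```

The tridiagonal (birth-death) structure is chosen only to guarantee a real spectrum for the illustration.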
Using sample means as approximations to (2.8, 2.9), we assemble the linear system for the eigenvalue problem (2.7) and solve it numerically to obtain estimates {\hat{\Lambda}_k} of the eigenvalues of the conditional expectation operator. Furthermore, we obtain estimates of the projections of the eigenfunctions \hat{\phi}_k onto the basis {β_i}.

Analysis of the convergence of the estimates {\hat{\Lambda}_k} with respect to the number of data points N, for large N, indicates that the absolute difference between the true value and the estimate of an eigenvalue, |Λ_k − \hat{\Lambda}_k|, is proportional to N^{−1/2} [14, 28]. The estimates of the eigenfunctions \hat{\phi}_k are determined by their projections onto the basis {β_i} and are characterized by the eigenvectors \hat{v}_k = [\hat{v}_{k1}, \hat{v}_{k2}, \ldots, \hat{v}_{kn}]^T, which have a similar dependence on N: ‖v_k − \hat{v}_k‖_2 ∼ c_k N^{−1/2}, with c_k constant. In theory, the discretization error decreases with an increasing number of Galerkin basis functions, n, when the data set is infinitely large. Numerical experiments suggest that, for a finite dataset, a finite value of n is optimal in terms of reducing the bias and variability of the estimates [15].

With a set of estimated eigenvalues and projected eigenfunctions, we use (2.6) to obtain estimates {\hat{\lambda}_k} of the eigenvalues of the generator. Assuming a particular form for the model SDE, its generator can be constructed, and the objective function is given by
\[ \|G\hat{\phi} - \hat{\lambda}\hat{\phi}\|_{2,w} = \sum_{k,l=1}^{n} m_{kl} \left| \langle G(\{b_i\},\{a_j\})\hat{\phi}_k, \hat{\xi}_l\rangle - \hat{\lambda}_k \langle \hat{\phi}_k, \hat{\xi}_l\rangle \right|^2, \tag{2.10} \]
where, for the models under consideration, the drift and diffusion of the generator can be expressed in the form
\[ \gamma(x) = \sum_{i=1}^{N_b} b_i g_i(x), \qquad A(x) = \Sigma\Sigma^T(x) = \sum_{i=1}^{N_a} a_i H_i(x), \tag{2.11} \]
and the m_{kl} are weight coefficients (to be discussed later). The objective function (2.10), proposed in [15], is defined in that work as E_g, and other choices of objective function are offered there. The minimization of (2.10) is performed with respect to the parameters {b_i}, {a_i} and can be done using least squares or quadratic programming [12].
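A minimal sketch of the estimation pipeline (2.6)-(2.9), not the full procedure of this chapter, can be written in a few lines for an OUP with assumed values µ = −1, σ = 1, δt = 0.1 and the monomial basis {1, x, x², x³}: the inner products are replaced by sample means and the resulting generalized eigenproblem (2.7) is solved numerically.

```python
import numpy as np

# Hedged sketch of the Galerkin estimation (2.6)-(2.9) for an OUP
# (assumed values mu = -1, sigma = 1, dt = 0.1; monomial basis
# {1, x, x^2, x^3}); inner products are replaced by sample means.
mu, sigma, dt, N = -1.0, 1.0, 0.1, 200_000
rng = np.random.default_rng(1)

# Exact AR(1) discretization of the OUP, started from stationarity.
rho = np.exp(mu * dt)
sd = np.sqrt(sigma**2 * (1 - rho**2) / (-2 * mu))
x = np.empty(N)
x[0] = rng.normal(0.0, np.sqrt(sigma**2 / (-2 * mu)))
for n in range(N - 1):
    x[n + 1] = rho * x[n] + sd * rng.standard_normal()

B = np.vstack([x**k for k in range(4)])       # basis functions on the data
A = B[:, 1:] @ B[:, :-1].T / (N - 1)          # <P_dt beta_i, beta_j>, cf. (2.8)
M = B[:, :-1] @ B[:, :-1].T / (N - 1)         # <beta_i, beta_j>, cf. (2.9)

Lam = np.linalg.eigvals(np.linalg.solve(M, A.T)).real   # pencil from (2.7)
lam = np.sort(np.log(Lam) / dt)[::-1]                   # transformation (2.6)
print(lam)   # approximately [0, -1, -2, -3]
```

Because the conditional expectation operator of the OUP maps polynomials to polynomials of the same degree, this small basis already recovers the first four generator eigenvalues up to sampling error.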
If γ(x) and A(x) cannot be expressed in the form given by (2.11), the estimation procedure can still be applied, but a different minimization technique must be used. We will now demonstrate the application of the estimation procedure to a simple stochastic process for which analytical results are available.

2.2.1 The Ornstein-Uhlenbeck process

As an illustration of the application of the parameter estimation method, we consider the univariate Ornstein-Uhlenbeck process (OUP),
\[ dx_t = \mu x_t\,dt + \sigma\,dW_t, \qquad t \ge 0, \tag{2.12} \]
where W = W_t is a standard Brownian motion. The constant µ is chosen to be less than zero to ensure that the process possesses stationary solutions. The univariate OUP is one of the simplest stationary continuous-time stochastic processes and is easily studied analytically. The generator for the OUP is
\[ G = \mu x \frac{\partial}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2}{\partial x^2}. \tag{2.13} \]
It can easily be shown that the eigenvalues of the generator for this process are the non-negative integer multiples of µ. Simulations of (2.12) with µ = −1, σ = 1 were generated, from which the eigenvalues of the conditional expectation operator \mathcal{P}_{\delta t} were estimated (where δt is the time step between data points).

Figure 2.1: Estimates of the eigenvalues and the coefficients of the generator of the OUP with µ = −1, σ = 1. The time series used to estimate the eigenvalues contains 30000 points with a resolution time step δt = 0.1. Six eigenvalues are considered to obtain parameter estimates, and no weighting is applied when minimizing the objective function. Sampling variability was assessed by estimating these parameters from 50 independent realizations of the random process. The bottom and top of the boxes span the lower to upper quartiles, with the median drawn in between. The whiskers extend to a maximum of 1.5 times the interquartile range, and the black dots indicate values lying outside of this range.

To assess sampling variability, 100 independent realizations of x(t) were generated.
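The claim that the eigenvalues of (2.13) are the non-negative integer multiples of µ can be verified directly on the first few (Hermite-type) polynomial eigenfunctions; the quadratic eigenfunction x² + σ²/(2µ) used below is a standard result, reproduced here only as a check.

```python
import numpy as np

# Direct check (not from the text) that the eigenvalues of (2.13) are
# non-negative integer multiples of mu, using the first Hermite-type
# polynomial eigenfunctions: 1, x, and x^2 + sigma^2/(2*mu).
mu, sigma = -1.0, 1.0
x = np.linspace(-3.0, 3.0, 601)

def G(df, d2f):
    """Apply the OUP generator (2.13), given f' and f'' on the grid."""
    return mu * x * df + 0.5 * sigma**2 * d2f

Gphi0 = G(np.zeros_like(x), np.zeros_like(x))          # f = 1, eigenvalue 0
phi1, Gphi1 = x, G(np.ones_like(x), np.zeros_like(x))  # f = x, eigenvalue mu
phi2 = x**2 + sigma**2 / (2 * mu)                      # eigenvalue 2*mu
Gphi2 = G(2 * x, 2 * np.ones_like(x))

print(np.allclose(Gphi0, 0), np.allclose(Gphi1, mu * phi1),
      np.allclose(Gphi2, 2 * mu * phi2))   # True True True
```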
From these realizations, the corresponding estimates of the eigenvalues of the generator were obtained using (2.6) (Figure 2.1). Not surprisingly, there is some error in the estimates of the eigenvalues, resulting from truncation of the spectrum (due to the projection of the eigenfunctions onto a finite basis) and sampling variability. Consistent with the results of previous studies [15], eigenvalues of higher index tend to have greater sampling variability and bias, while the first few eigenvalues (associated with the stationary distribution and the slowest decaying modes) are the most robustly estimated.

Having estimated the eigenvalues of the generator, we now focus on parametric estimates of the generator itself. We assume that the simulated data satisfy an OUP as given by (2.12) and minimize the norm ‖G({a_i}, {b_j})\hat{φ} − \hat{λ}\hat{φ}‖_{2,w} with respect to {a_i}, {b_i}, such that the drift and diffusion coefficients of the generator of x(t) are given by
\[ \gamma(x) = \mu x, \qquad A(x) = \sigma^2. \tag{2.14} \]
The estimates \hat{µ}, \hat{σ}² are shown in Figure 2.1. Although the estimated values are on average close to the true values, the estimates are clearly biased. Strategies to reduce this bias will be presented in Section 2.2.2.

When estimating the parameters of G from a specified SDE model, it is possible that the generator may be overspecified, in that there are terms within the expressions for the drift and diffusion with coefficients that are identically zero in the dynamics that generated the data. To assess the robustness of the approach to model overspecification for the case of data drawn from the OUP, we fit the data to the generator for which
\[ \gamma(x) = \sum_{i=0}^{3} b_i x^i, \qquad A(x) = \sum_{i=0}^{3} a_i x^i. \tag{2.15} \]
Ideally, the method should return values of {a_i}, {b_i} that are zero except for a_0 = σ² and b_1 = µ.
As expected, these estimates are found to be distributed around the predicted values, although there are some biases (Figure 2.2).

Figure 2.2: Estimates of the coefficients of the overspecified model given by (2.15) for data generated from the OUP with µ = −1, σ = 1, illustrated as in Figure 2.1. The time series used to estimate the eigenvalues are 30000 points long with resolution time step δt = 0.1. Note that we have estimated 6 eigenvalues, and the parameter estimates are slightly biased.

2.2.2 Weighting of eigenvalues

As was illustrated in Figure 2.1, the sampling variability of eigenvalue estimates increases with eigenvalue order. Depending on the problem under consideration, it is important that a sufficient number of eigenvalues be estimated so as to avoid possible degeneracies in the generator (i.e. having an identical PDF for different parameter sets) and to capture the temporal structure of the stochastic process. For example, to estimate the parameters of an OUP, we require at least two eigenvalues, as the PDF is determined only by the ratio µ/σ². This requirement must be balanced against the tendency of estimates of higher-order eigenvalues to be biased.

To retain higher-order eigenvalues in the estimation of the coefficients {a_i}, {b_i} while reducing their contribution to the objective function in order to account for their greater uncertainty, we use a weighted least squares method, with weights
\[ m_{ij} = w^{-\lambda_i/\lambda_2}, \qquad w > 1, \quad i = 1, 2, \ldots \quad (\lambda_1 = 0). \tag{2.16} \]
This weighting scheme assigns the first eigenvalue a weight of 1 and subsequent weights 1 ≥ w_2 = w^{−1} ≥ w_3 ≥ w_4 ≥ … (since λ_i ≤ λ_2 < 0 for all i ≥ 2, by stationarity). We choose this scheme so that eigenvalues of similar magnitude are penalized by a similar amount and eigenvalues of greater magnitude are penalized more than smaller ones.
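For a concrete picture of (2.16), the weights for a hypothetical OUP-like spectrum λ_k = −(k − 1) with w = 2 work out as follows (a sketch with illustrative eigenvalues, not values from the study):

```python
import numpy as np

# Weights (2.16) for a hypothetical OUP-like spectrum lambda_k = -(k-1)
# with w = 2: the first eigenvalue keeps weight 1 and deeper eigenvalues
# are damped geometrically.
lam = np.array([0.0, -1.0, -2.0, -3.0, -4.0])
w = 2.0
m = w ** (-lam / lam[1])
print(m)   # [1.  0.5  0.25  0.125  0.0625]
```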
There is no particular motivation for choosing a weighting scheme of this form beyond placing more emphasis on the more well-estimated eigenvalues. In principle, one could assign a weight of zero to eigenvalues beyond a certain index, but in cases in which groups of eigenvalues are close or occur as complex conjugate pairs, it is undesirable to give a weight of zero to one and a non-zero weight to another.

Applying this weighting scheme to the variational estimates of the OUP parameters, we find an improvement in the parameter estimates, such that both bias and sampling variance are reduced (Figure 2.3). Note that in this approach w should not be allowed to become too large, as that effectively puts all the weight on the first eigenmode, which can lead to degenerate parameter estimates if it depends on a combination of parameters (as is the case for the OUP).

Figure 2.3: Estimates of µ and σ² when weighting is applied to the eigenvalues in the objective function. The value of w is indicated on the horizontal axis. Due to degeneracies in the generator for the OUP, the estimates of µ, σ² are biased towards 0 when w is large. Since the PDF of the OUP depends only on the quantity µ/σ², this quantity is well estimated when the lowest eigenvalue is heavily weighted, although µ and σ² are often not well estimated themselves when higher-order eigenvalues are suppressed. The box plots are drawn as described in Figure 2.1.

2.2.3 The effect of correlated noise

A potential source of mismatch between the data and the model to which they are fit is the assumption that the data are Markov and described by a diffusion process. The Crommelin-Vanden-Eijnden method assumes that the data are Markov and that the model to which they are fit is an SDE driven by Gaussian white noise. Real-world processes are often better modelled by forcing which is correlated in time (e.g.
red noise), and so there is a potential discrepancy between the data and the white-noise-driven model to which they are fit. If the driving red noise process is an Ornstein-Uhlenbeck process and is directly observed, then the dynamics can be expressed as an SDE in an extended state space with white noise forcing. However, if the red noise process cannot be directly observed and its dynamics are not accounted for in the generator estimation method, it is still possible to estimate the stochastic dynamics of the data (albeit with biased parameter estimates), provided that the data are subsampled with a coarse enough sampling interval that the red noise can effectively be "whitened". This issue is considered in [15] for the asymptotic limit in which the ACT of the red noise forcing (modelled as an OUP) approaches zero. Here, we consider ACT scales that are not small relative to the characteristic timescales of the resolved dynamics.

We consider a linearly damped process driven by an OUP:
\[ dx_t = -\frac{1}{\tau_x} x_t\,dt + \frac{1}{\sqrt{\tau_y}}\,y_t\,dt, \qquad dy_t = -\frac{1}{\tau_y} y_t\,dt + \frac{\sigma}{\sqrt{\tau_y}}\,dW_t. \tag{2.17} \]
It can be shown that in the limit τ_y → 0 the dynamics of x in (2.17) approach those of the univariate OUP given by (2.12). However, when τ_y and τ_x are of approximately the same order, the red noise forcing causes noticeable differences between the dynamics of x_t and those of the univariate OUP. Specifically, it can be shown via the Wiener-Khinchin theorem [26] that the autocovariance function of the variable x in (2.17) is given by
\[ C_{xx}(T) = \frac{\tau_x^2 \sigma^2}{2(\tau_x^2 - \tau_y^2)} \left[ \tau_x \exp\!\left(-\frac{|T|}{\tau_x}\right) - \tau_y \exp\!\left(-\frac{|T|}{\tau_y}\right) \right]. \tag{2.18} \]
Having generated a realization of x(t) from (2.17), we will fit it to a univariate (white-noise-driven) OUP and interpret the results in terms of the univariate OUP that is statistically similar to x(t) in the sense that the stationary variance and ACT scales are equal.
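A quick numerical sanity check of (2.18): integrating the normalized autocovariance gives the autocorrelation e-folding time, and evaluating at T = 0 gives the stationary variance (a trapezoid-rule sketch with illustrative parameter values):

```python
import numpy as np

# Trapezoid-rule check on the closed-form autocovariance (2.18), with
# illustrative values tau_x = 1, tau_y = 0.25, sigma = 1: the normalized
# integral gives tau_x + tau_y, and C_xx(0) gives
# sigma^2 * tau_x^2 / (2*(tau_x + tau_y)).
tau_x, tau_y, sigma = 1.0, 0.25, 1.0
T = np.linspace(0.0, 50.0, 100_001)
C = (tau_x**2 * sigma**2 / (2 * (tau_x**2 - tau_y**2))) * (
    tau_x * np.exp(-T / tau_x) - tau_y * np.exp(-T / tau_y))

dT = T[1] - T[0]
Cn = C / C[0]
act = dT * (Cn.sum() - 0.5 * (Cn[0] + Cn[-1]))   # trapezoid rule on [0, 50]
var = C[0]
print(act, var)   # ~1.25 and 0.4
```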
Using the autocovariance function above, we can determine the autocorrelation e-folding time (ACT) and variance of x:
\[ \mathrm{ACT}[x] = \int_0^{\infty} \frac{C_{xx}(T)}{C_{xx}(0)}\,dT = \tau_x + \tau_y, \qquad \mathrm{Var}[x] = C_{xx}(0) = \frac{\sigma^2\tau_x^2}{2(\tau_x + \tau_y)}. \tag{2.19} \]
Hence, the equivalent univariate OUP, \tilde{x}, satisfies the SDE
\[ d\tilde{x}_t = -\frac{1}{\tau_x + \tau_y}\,\tilde{x}_t\,dt + \frac{\sigma\tau_x}{\tau_x + \tau_y}\,d\tilde{W}_t. \tag{2.20} \]
We note that in the τ_y → 0 limit, (2.20) is identical to the result obtained from stochastic homogenization theory [70]. For the fit of x(t) to a univariate model to be meaningful, the estimation should yield parameters consistent with (2.20). In fact, applying the estimation procedure to x does not generally yield results consistent with (2.20) (Figure 2.4). When the resolution time step, δt, is of the same order as τ_y, the noise is not sufficiently decorrelated between time steps to be approximated as "white". Despite the fact that the separation of time scales between x(t) and y(t) may be large, the correlated nature of the noise is still apparent, as the autocorrelation function (ACF) of x(t) displays the concave-downward structure at the origin that is characteristic of a non-Markovian forcing [18] (Figure 2.5). Thus, x(t) cannot be well approximated as Markov on the timescale of the resolution of the time series. The Markov assumption in the method results in fitting the data to match the short-time correlation behaviour, causing significant overestimation of the ACT scale. To address this issue, we simply increase the resolution timescale.

This increase in resolution time scale is achieved through data subsampling. Before the eigenstructure of the conditional expectation operator is estimated, we reorder the data such that the time interval between successive points is increased. For example, subsampling with stepsize 2δt yields the reordering
\[ x_{\delta t}, x_{2\delta t}, x_{3\delta t}, x_{4\delta t}, \ldots, x_{N\delta t} \;\longrightarrow\; x_{\delta t}, x_{3\delta t}, x_{5\delta t}, \ldots, x_{N\delta t}, x_{2\delta t}, x_{4\delta t}, \ldots, x_{(N-1)\delta t} \quad (\text{if } N \text{ is odd}). \tag{2.21} \]
By concatenating the subsets of subsampled data, no data are thrown away.
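The reordering (2.21) is straightforward to implement; the helper below (a sketch, not code from the study) concatenates the strided subsequences so that no data are discarded:

```python
import numpy as np

def subsample_reorder(x, s):
    """Concatenate the s strided subsequences of x, as in (2.21);
    no data are discarded."""
    return np.concatenate([x[i::s] for i in range(s)])

# With N = 7 points and stepsize 2*dt this reproduces the reordering (2.21):
x = np.arange(1, 8)               # stand-ins for x_dt, x_2dt, ..., x_7dt
print(subsample_reorder(x, 2))    # [1 3 5 7 2 4 6]
```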
Subsampling increases the separation between the red noise ACT scale, τ_y, and the resolution timescale, δt, of the data from which the conditional expectation operator is constructed, effectively "whitening" the correlated forcing. This does not eliminate the non-Markovian properties of the time series data, but merely mitigates them by resolving the data on a time scale over which the unresolved stochastic driving is weakly correlated. It should be noted that minor errors are introduced through the process of concatenating the data subsets (e.g. where x_{Nδt} is followed by x_{2δt}). These errors could be corrected by ignoring the data around the transition points, but given that the number of such points in the reordered time series is much smaller than the number of points in the series itself, the error introduced is negligible.

Figure 2.4: Estimates of the coefficients µ and σ² of the generator when the data are generated by the system (2.17). In both figures, τ_x = 1. In the top, middle, and bottom rows the ACT scales for y are respectively τ_y = 0.02, 0.1, 0.25. Ensembles were computed from 50 time series, each of length 50000 points with resolution timestep δt = 0.1. The horizontal axis indicates the degree of subsampling used in the estimations (1 = no subsampling), and the red dashed line indicates the values of the coefficients for the equivalent OUP as determined by (2.20). Weighting is applied to offset biased estimates of the higher-order eigenvalues (w = 2).

Applying various degrees of subsampling to an OUP where the forcing is a red noise process, the estimates of the parameters converge to the expected values of the equivalent white noise process (Figure 2.4).
Note that while subsampling results in mean parameter estimates that are closer to those expected for an equivalent OUP (2.20), the sampling variance of the estimated parameters increases with the degree of subsampling, although the effect is marginal for low degrees of subsampling. Although not shown in Figure 2.4, the variance in the parameter estimates increases dramatically if the subsampling degree is increased beyond 10. Finally, when coarsening the resolution of the time series, it is important that the degree of subsampling is not so large that all information about serial dependence of x(t) itself is lost. Loss of this information will prevent accurate estimation of eigenmodes beyond the first, and therefore corrupt all parameter estimates.

Figure 2.5: Plot of the autocorrelation function, C_{xx}(T)/C_{xx}(0), as a function of lag time T for x_t as governed by the system (2.17). The values of τy are as indicated in the legend (τy = 10⁻¹, 10⁻², 0) and τx = σ = 1.

2.3 The sea surface wind model

The sea surface wind model which we will consider is a slab model of the lower-atmospheric momentum budget introduced by [58, 59]. It is assumed in this model that vector wind tendencies result from imbalances between surface drag, downwards mixing of momentum from above the slab layer, and "large-scale ageostrophic forcing" (the sum of pressure gradient and Coriolis forces), which is decomposed into a mean and fluctuations that are modelled as white noise. With u_t and v_t respectively the zonal and meridional components of the wind vector, the model is given by the nonlinear SDE

\[ du_t = \left( \langle \Pi_u \rangle - \frac{K}{h^2}\,u_t - \frac{c_d}{h}\,u_t \sqrt{u_t^2 + v_t^2} \right) dt + \sigma_{11}\,dW_{1,t} + \sigma_{12}\,dW_{2,t}, \tag{2.22} \]

\[ dv_t = \left( \langle \Pi_v \rangle - \frac{K}{h^2}\,v_t - \frac{c_d}{h}\,v_t \sqrt{u_t^2 + v_t^2} \right) dt + \sigma_{21}\,dW_{1,t} + \sigma_{22}\,dW_{2,t}, \tag{2.23} \]

where W_{1,t}, W_{2,t} represent (mutually uncorrelated) standard Brownian motions.
The parameters ⟨Πu⟩, ⟨Πv⟩ represent the mean large-scale driving for their respective components, while the Brownian motion terms represent stochastic fluctuations of this driving. The turbulent exchange of momentum between the slab layer of depth h and the atmosphere above is represented as a coarsely finite-differenced diffusion with eddy viscosity K. The surface turbulent momentum flux is parameterized with a standard bulk drag law with drag coefficient c_d. The coefficients of the noise terms can be expressed as components of a matrix Σ:

\[ \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}. \tag{2.24} \]

The model (2.22, 2.23) is of the form given by (2.1) and has a generator given by

\[ \mathcal{G}f = \left[ \langle \Pi_u \rangle - \frac{K}{h^2} u - \frac{c_d}{h} u \sqrt{u^2 + v^2} \right] \frac{\partial f}{\partial u} + \left[ \langle \Pi_v \rangle - \frac{K}{h^2} v - \frac{c_d}{h} v \sqrt{u^2 + v^2} \right] \frac{\partial f}{\partial v} + \frac{1}{2}\left[ (\sigma_{11}^2 + \sigma_{12}^2) \frac{\partial^2}{\partial u^2} + 2(\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}) \frac{\partial^2}{\partial u \partial v} + (\sigma_{21}^2 + \sigma_{22}^2) \frac{\partial^2}{\partial v^2} \right] f, \tag{2.25} \]

so the model parameters ⟨Πu⟩, ⟨Πv⟩, K, h, σij can be estimated from surface wind data using the method under consideration. We first cast the SDE in the form (2.11) by defining

\[ \gamma(u, v) = b_0 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b_1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + b_2 \begin{pmatrix} u \\ v \end{pmatrix} + b_3 \begin{pmatrix} u\sqrt{u^2 + v^2} \\ v\sqrt{u^2 + v^2} \end{pmatrix}, \tag{2.26} \]

\[ A(u, v) = a_0 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_1 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + a_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad (A = \Sigma\Sigma^T). \tag{2.27} \]

The estimation routine will be used to determine the parameters {b_i}, i = 0, ..., 3, and {a_j}, j = 0, 1, 2. Throughout these calculations, we assume that the parameter c_d is fixed at c_d = 1.3 × 10⁻³. In fact, the drag coefficient is a function of wind speed and surface stratification [38]; this dependence is neglected in order to simplify the calculations. Note that due to the form that we have assumed for the wind model, we cannot directly estimate the values of the coefficients σij, owing to the non-uniqueness of the square root of a matrix, but estimating {a_j} gives us estimates for the entries of ΣΣᵀ (a0 = σ11² + σ12², a1 = σ11σ21 + σ12σ22, a2 = σ21² + σ22²). We use polynomial basis functions in u and v up to and including degree 2 in the estimation procedure: {1, u, v, u², uv, v²}.
This choice of basis is motivated by the relative simplicity of the functions and the infinite domain: (u, v) ∈ R².

Figure 2.6: A visual schematic of the sea surface wind model (2.22), (2.23), showing the slab layer of depth h above the ocean surface, the downward mixing of momentum from the top layer (K u / h²), the bulk surface drag (c_d u|u| / h), and the constant mean wind forcing ⟨Π⟩. Note that the mean wind forcing term is directed parallel to the direction of the vector winds, while the mixing and drag forces are antiparallel. The stochastic forcing terms perturb the mean forcing term and no visual representation is given for them.

2.3.1 Properties of the wind model

Before considering parameter estimates for this model, we first review some of its features. We will then investigate the ability of the estimation procedure to recover model parameters from time series generated by the model itself.

Reversibility

A stochastic process is defined to be reversible if the equations governing its behaviour satisfy detailed balance conditions [75]. Detailed balance essentially means that when the system has achieved its stationary state, the mass of probability flowing from one vector wind state u1 to another, u2, is precisely balanced by the probability mass flowing from u2 to u1. Sufficient conditions for reversibility are that the diffusion matrix is a constant proportional to the identity matrix, and that the drift can be expressed as the gradient of a potential. Reversibility of the process is of practical utility in the present context as it regularizes calculations in the parameter estimation [15]. In particular, it allows the inner product (2.8) to be estimated using a symmetric estimator:

\[ \langle P_{\delta t}\beta_i(X_t), \beta_j(X_t) \rangle \approx \frac{1}{2n} \sum_{k=1}^{n-1} \beta_j(x_{k+1})\beta_i(x_k) + \beta_j(x_k)\beta_i(x_{k+1}). \tag{2.28} \]

A similar estimator can be used for the inner product (2.9), ⟨βi(Xt), βj(Xt)⟩ ≈ (1/2n) Σ_{k=1}^{n−1} βj(x_k)βi(x_k) + βj(x_{k+1})βi(x_{k+1}), although reversibility is not required for this estimator to be consistent.
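The symmetrized estimator (2.28) can be written in a few lines. This is an illustrative sketch with our own function name (`symmetric_inner`), not the implementation used for the results below; for the bivariate wind model the β's would be the polynomial basis functions listed above, evaluated on the (u, v) states:

```python
def symmetric_inner(beta_i, beta_j, x):
    """Time-symmetrized estimator (2.28) of <P_dt beta_i, beta_j> from a
    sampled trajectory x = (x_1, ..., x_n).  Symmetrizing over the
    direction of time is justified when the process is (approximately)
    reversible."""
    n = len(x)
    total = sum(beta_j(x[k + 1]) * beta_i(x[k]) + beta_j(x[k]) * beta_i(x[k + 1])
                for k in range(n - 1))
    return total / (2 * n)
```

By construction the estimate is unchanged when β_i and β_j are exchanged, which is the discrete analogue of the detailed balance symmetry exploited here.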
Results from previous studies [59, 61] suggest that Σ is diagonal to a good approximation. This indicates that the process (2.22, 2.23) is close to reversible, although possibly not exactly so. In estimating parameters, we will make the approximation that the system is reversible.

Stationary probability density function

When the noise intensity matrix Σ is diagonal (σij = ζδij), we can obtain an explicit expression for the stationary PDF for u_t, v_t, denoted p_uv:

\[ p_{uv}(u, v) = N \exp\left\{ \frac{2}{\zeta^2} \left[ \langle \Pi_u \rangle u + \langle \Pi_v \rangle v - \frac{K}{2h^2}(u^2 + v^2) - \frac{c_d}{3h}(u^2 + v^2)^{3/2} \right] \right\}, \tag{2.29} \]

where N is the normalization constant [59]. We immediately see that for the probability density function to be bounded, we must have h > 0. This constraint is of course physically necessary given the interpretation of h as an atmospheric layer thickness. From inspection of this stationary PDF, we see that there is a degeneracy with respect to the parameters, such that different sets of the parameters (⟨Πu⟩, ⟨Πv⟩, K, c_d, h, σ) will yield the same probability density function. In particular, the PDF is determined by the following four quantities:

\[ \left\{ \frac{\langle \Pi_u \rangle}{\zeta^2},\; \frac{\langle \Pi_v \rangle}{\zeta^2},\; \frac{c_d}{h\zeta^2},\; \frac{K}{h^2\zeta^2} \right\}. \tag{2.30} \]

Under parameter changes in which these quantities are conserved, the probability density function will remain unchanged. While we do not have an expression for the PDF for the anisotropic or correlated noise model (σ11 ≠ σ22 or σ21, σ12 ≠ 0), inspection of the Fokker-Planck equation shows that the PDF is invariant so long as the following quantities are conserved:

\[ \left\{ \frac{\langle \Pi_u \rangle}{\sigma_{11}^2 + \sigma_{12}^2},\; \frac{\langle \Pi_v \rangle}{\sigma_{11}^2 + \sigma_{12}^2},\; \frac{c_d}{h(\sigma_{11}^2 + \sigma_{12}^2)},\; \frac{K}{h^2(\sigma_{11}^2 + \sigma_{12}^2)},\; \frac{\sigma_{11}\sigma_{21} + \sigma_{12}\sigma_{22}}{\sigma_{11}^2 + \sigma_{12}^2},\; \frac{\sigma_{21}^2 + \sigma_{22}^2}{\sigma_{11}^2 + \sigma_{12}^2} \right\}. \tag{2.31} \]

Table 2.1: The base-case parameters used to generate realizations from the model (2.22), (2.23).

    Parameter    Value
    ⟨Πu⟩         1.6 km h−2
    ⟨Πv⟩         0.8 km h−2
    c_d          1.3 × 10−3
    K            4.5 × 10−2 km2 h−1
    h            1 km
    σ11, σ22     12.2 km h−3/2
    σ12, σ21     0

By rescaling these parameters in a particular way, we can modify the time scale of the process without modifying the PDF.
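The degeneracy described by (2.30) can be exercised directly: multiplying h and K by a factor α while dividing ⟨Πu⟩, ⟨Πv⟩ by α and ζ by √α leaves all four quantities, and hence the stationary PDF, unchanged while rescaling the time scale of the process. A numerical sketch for the diagonal-noise case (the function names and parameter dictionary are ours; the values are the Table 2.1 base case):

```python
import math

def rescale(alpha, p):
    """Time rescaling by a factor alpha: h and K scale by alpha, the mean
    forcings by 1/alpha, and the noise intensity zeta by 1/sqrt(alpha)."""
    return {"h": alpha * p["h"], "K": alpha * p["K"],
            "Pi_u": p["Pi_u"] / alpha, "Pi_v": p["Pi_v"] / alpha,
            "zeta": p["zeta"] / math.sqrt(alpha)}

def pdf_invariants(p, cd=1.3e-3):
    """The four quantities (2.30) that determine the stationary PDF (2.29)."""
    z2 = p["zeta"] ** 2
    return (p["Pi_u"] / z2, p["Pi_v"] / z2,
            cd / (p["h"] * z2), p["K"] / (p["h"] ** 2 * z2))

# Base-case parameters of Table 2.1 (units of km and h, diagonal noise).
base = {"h": 1.0, "K": 4.5e-2, "Pi_u": 1.6, "Pi_v": 0.8, "zeta": 12.2}
```

Any α > 0 leaves `pdf_invariants` unchanged up to floating-point rounding, which is the degeneracy exploited later when parameter estimates are rescaled to match observed ACT scales.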
As we show in Section 2.4, this is a useful tool for dealing with model mismatch in this particular problem.

2.3.2 Estimating model parameters from simulated data

To evaluate the estimation of model parameters in a "perfect model" framework, we will now consider applying the estimation method to time series simulated from the stochastic wind model. For these simulations, we take parameter values that result in wind statistics within the range of the observed statistics and simulate time series with the 6 hour resolution of the surface wind data we will consider in Section 2.4. The parameters that we use for our simulations are given in Table 2.1. A forward Euler scheme [46] is used to integrate the model with simulation time step dt = 0.001 h for a duration of 45 years. The model is resolved every 6 hours (yielding a time series with a length of 65700 data points). While the forward Euler method has low accuracy in both the weak and strong sense compared to other numerical integration schemes, we do not expect a higher order scheme to offer substantial improvements: the forward Euler method has a low computational load, and the large ratio of resolution to simulation time steps, together with the stochastic driving, means that numerical inaccuracies between resolved points will be essentially negligible.

Weighting of the eigenvalues in the objective function

To assess the effect that weighting the eigenvalues has on the quality of the estimation, we apply weights (as described in Section 2.2) to the estimation procedure. The results of these calculations are shown in Figure 2.7. For all values of the weight parameter, we see that model parameters are estimated well by the method. We note that an increasing weight tends to improve the estimates of all parameters, in that the median of the estimated values is closer to the true value. For some parameters, the sample variance decreases with increased weighting, while in other cases it increases.
These results reinforce the conclusion that weighting generally improves the parameter estimates, although in the present case there is only a modest dependence of the recovered parameters on the weight value.

Figure 2.7: The influence of the weight parameter w on estimates of various parameters from the wind model using simulated data from (2.22, 2.23). The values of w are shown on the horizontal axis for each parameter boxplot. The true value of each parameter is indicated by the dashed red line. No subsampling of the data was carried out.

The effect of red noise

The assumption of white noise forcing of the atmospheric layer momentum budget is a useful approximation, but it is physically unrealistic. In fact, we expect that fluctuations in the "large-scale forcing" should occur on similar timescales as fluctuations in the surface winds themselves. Replacing the white noise forcing terms dW_{1,t}, dW_{2,t} of (2.22, 2.23) (with σ12 = σ21 = 0) with red noise forcing terms η_{1,t} dt, η_{2,t} dt such that

\[ d\eta_{i,t} = -\frac{1}{\tau_i}\,\eta_{i,t}\,dt + \frac{\sigma}{\tau_i}\,dW_{i+2,t}, \qquad i = 1, 2, \quad t \geq 0, \tag{2.32} \]

results in changes to the dynamics that the generator estimation scheme cannot account for (when the procedure is applied to time series of u and v alone). The influence of red-noise forcing on model parameter estimates was tested using a range of forcing ACT scales τi (Figure 2.8). We see that when the red noise forcing is close to white (i.e. τi is close to zero), the eigenvalue and parameter estimates are close to the true values, as was the case with white noise forcing. In contrast, eigenvalue and parameter estimates where the ACT is on the order of the resolution of the data show significant deviations from the true values.

When the system is forced with red noise, the ACT scale of the measured trajectory is increased.
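The red-noise-forced model can be sketched with the same forward Euler scheme described in Section 2.3.2. This is illustrative only: the function name and interface are ours, parameter values follow the Table 2.1 base case (units of km and h), and we normalize the OU forcing amplitude in (2.32) to one so that η_i dt tends to dW_i in the white-noise limit τi → 0, with σ11 = σ22 retained as the forcing intensity of the wind equations; (2.32) as written carries its own σ, so this normalization is our assumption.

```python
import math
import random

def simulate_red_forced(n_steps, dt, tau, Pi_u=1.6, Pi_v=0.8, K=4.5e-2,
                        h=1.0, cd=1.3e-3, sigma=12.2, seed=0):
    """Forward Euler integration of (2.22, 2.23) with sigma12 = sigma21 = 0
    and the white noise increments replaced by red-noise (OU) forcing of
    the form (2.32), with unit amplitude so that eta*dt -> dW as tau -> 0.
    Returns the (u, v) path at every simulation step."""
    rng = random.Random(seed)
    u = v = eta1 = eta2 = 0.0
    sqdt = math.sqrt(dt)
    path = []
    for _ in range(n_steps):
        s = math.hypot(u, v)  # instantaneous wind speed sqrt(u^2 + v^2)
        u += (Pi_u - K / h**2 * u - cd / h * u * s) * dt + sigma * eta1 * dt
        v += (Pi_v - K / h**2 * v - cd / h * v * s) * dt + sigma * eta2 * dt
        eta1 += -eta1 / tau * dt + (1.0 / tau) * sqdt * rng.gauss(0.0, 1.0)
        eta2 += -eta2 / tau * dt + (1.0 / tau) * sqdt * rng.gauss(0.0, 1.0)
        path.append((u, v))
    return path
```

Resolving such a simulation every 6 h of model time, as in Section 2.3.2, corresponds to keeping every 6000th point of a dt = 0.001 h integration.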
If time t is rescaled to t = αt*, then the initial parameter estimates can be rescaled as

\[ h = \alpha h^*, \qquad \langle \Pi_i \rangle = \frac{1}{\alpha} \langle \Pi_i \rangle^*, \qquad K = \alpha K^*, \qquad \sigma_{ij} = \frac{1}{\sqrt{\alpha}} \sigma_{ij}^*. \tag{2.33} \]

As the ACT scale of the white noise forced model is directly scaled by the value of h, the estimate of h increases to match the ACT scale of the data. To conserve the PDF, the entries of the squared diffusion matrix (a0, a1, a2) must decrease as h increases. The values of ⟨Πu⟩, ⟨Πv⟩ decrease by a factor close to that of the decrease in a0, to maintain the correct values of mean(u) and mean(v) with reduced σi, in accordance with the conserved quantities given in (2.31). Thus, while the presence of red-noise forcing results in biased parameter estimates, these biases can be understood in terms of the model dynamics.

Figure 2.8: Top-left: Boxplots of estimates for the first three non-zero eigenvalues for the wind model with varying ACTs in the forcing. The black boxplots indicate the eigenvalue estimates for simulated time series with white noise forcing, while the blue and red boxplots indicate the estimates when red noise having short (τi = 0.1 h) and long (τi = 6 h) ACTs is used. The pink boxplots indicate the parameter estimates from the time series with τi = 6 h with subsampling of degree 4 applied. The data are resolved at the standard time resolution δt = 6 h. The other panels display the estimates for the parameters of the SDEs. The true values for the white noise case are indicated with a black dashed line. In each case, an ensemble of parameter estimates from 50 independent realizations was computed.

2.3.3 Resimulation of winds using the reconstructed model

As a final analysis of the accuracy of the reconstructed models, we will compare the statistics of the simulations they generate with those from the data to which they were fit. In particular, we will investigate how well the means, standard deviations, and skewness of the resimulated data match those of the original time series. As a demonstration of the accuracy of the reconstructed model parameters, the results of this analysis for data from the model driven by white noise are displayed in Figure 2.9, along with the ACF of u for both the original data and the resimulated data. Noting the small relative error in the computed statistics, we see that the reconstructed model is able to accurately reconstruct the statistics of time series produced by these dynamics.

Figure 2.9: Relative error of simulated statistics relative to original statistics (mean, standard deviation, and skewness) for simulations from models fit to time series produced by the wind model with white noise forcing. For each of these, the relative error of a quantity z is defined as (z_original − z_reconstructed)/z_original. Lower right panel: The computed ACF of u from the original time series (black, circles) and from the resimulated time series (red, crosses). The estimates of the parameters were obtained without subsampling and with weight w = 1000; δt = 6 hours and 30000 data points are used in each time series.

We also considered the ability of the reconstructed model to capture the vector wind statistics when the time series are produced from the model with red-noise driving. In this case, while the parameter estimates are expected to be biased relative to their true values, the reconstructed model should be able to capture the moments of the time series (cf. Section 2.2.3). In fact, the first three moments of the PDF are recovered to a good accuracy (Figure 2.10). However, when the parameter estimation is carried out without subsampling of the data, the estimated parameters give resimulated data with an autocovariance function that matches only up to the resolution time step of the data. This bias is consistent with the fact that the estimation routine is predicated on the assumption that the data is Markovian [18].

As was discussed in Section 2.2, this bias can be addressed by subsampling the data to a sufficiently large degree that the memory of the driving process is suppressed. This can be accomplished in practice by first performing a preliminary analysis of the ACF of the data to estimate the ACT of the data, and then subsampling the data such that the time step is on the same scale as the ACT of the data. In the present case, subsampling the data by taking every 4th point results in an evident improvement in the simulation of the autocorrelation function (Figure 2.10), without significantly altering the estimates of the other statistics. As we have mentioned, this technique will only work when the ACT scale of the red noise driving is sufficiently short compared to that of the dynamics of the observed variable, such that the subsampling eliminates the effect of memory in the forcing without destroying the autocorrelation structure of the resolved dynamics. When fitting the model to observed winds, we will combine data subsampling with a parameter adjustment, such that the modelled time series autocorrelation approximates that of the observations as closely as possible.

Figure 2.10: As in Figure 2.9, for the wind model (2.22, 2.23) fit to time series generated with red noise forcing with ACT scales similar to the resolution of the time series (τi = δt = 6 h). Red symbols denote results obtained using parameter estimates without data subsampling, while the blue curve denotes the results following subsampling of degree 4. These calculations were carried out with an ensemble of 50 independent realisations.

2.4 Estimation of model parameters from reanalysis wind data

Having considered the application of the estimation method in a "perfect model" setting, we now consider the reconstruction of wind model parameters from a global sea surface wind dataset. For this analysis, we will use the six-hourly 10-m winds from the European Centre for Medium-Range Weather Forecasts ERA-40 reanalysis, available on a 2.5° × 2.5° grid from September 1, 1957 to August 31, 2002 [22]. Reanalysis products provide a three-dimensional representation of the atmosphere on a regular grid by assimilating observations into a fixed forecast model. As such, reanalysis winds are not direct observations but instead represent a balance between observations and the predictions of a global, comprehensive model of atmospheric physics. These data were used for the reconstruction rather than direct, remotely-sensed observations (such as from the SeaWinds scatterometer on the QuikSCAT satellite) because of their relatively fine resolution in time and long duration. In fact, there is little difference between the statistical features of remotely sensed surface winds and those from a range of different reanalysis products [60, 62].

We will first present the results of the application of the estimation procedure to data from three representative locations.
Following this, parameter estimates will be obtained across the global ocean between 60°S and 60°N (avoiding regions with sea ice, for which the surface wind model is not appropriate). In order to ensure that the estimated parameters are physically meaningful (despite potential mismatches between observations and the wind model), we impose constraints on the optimization. First, we require that the layer thickness, h, be bounded between 1 m and 100 km. These bounds extend well beyond the physically meaningful range; the most important requirement here is that h be positive. Also on physical grounds, the eddy viscosity, K, is constrained to be non-negative. Without these constraints, the estimation method sometimes returns unphysical, negative values of K or h. Negative values of h are particularly problematic, as these are inconsistent with stationary solutions of the model. That negative values of h can potentially occur without this constraint can be explained by the fact that the algorithm estimates the parameter c_d/h, which is often near zero, rather than h itself.

2.4.1 Limitations of the model

Natural processes are always more complicated than any model chosen to study them. As such, we expect that there are aspects of observed wind variability that will not be captured by the model and which may influence the parameter estimates. As discussed above, an important difference between the wind data and the model is that the real data is almost certainly non-Markovian in nature, while the model solutions are, by construction, Markov processes. While it would be more accurate to fit the data to a model in which the variations in the "large-scale" forcing are modelled as red noise processes, we do not have these forcing time series from observations and as such cannot include them in the parameter estimation process.
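The ACT-scale diagnostics invoked throughout this chapter can be computed directly from a time series via a sample ACF and the integral definition in (2.19). A short sketch (function names are ours; the usage example fits an AR(1) surrogate rather than wind data):

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation C(T)/C(0) for lags 0, 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    c0 = sum(di * di for di in d) / n
    return [sum(d[k] * d[k + lag] for k in range(n - lag)) / (n * c0)
            for lag in range(max_lag + 1)]

def act_estimate(rho, dt):
    """ACT as the trapezoidal integral of the normalized ACF, cf. (2.19)."""
    return dt * (0.5 * (rho[0] + rho[-1]) + sum(rho[1:-1]))

# Usage on an AR(1) surrogate with lag-1 correlation phi = 0.8:
rng = random.Random(7)
x = [0.0]
for _ in range(20000):
    x.append(0.8 * x[-1] + rng.gauss(0.0, 1.0))
rho = sample_acf(x, 50)
```

Applied separately to the zonal and meridional components, such estimates make the ACT anisotropy discussed below directly measurable.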
In addition to the challenges posed by the "red-noise" nature of the data, there is a potential problem posed by a difference of ACT scales between the zonal and meridional components of the wind data [62]. In many locations the meridional component decorrelates much more quickly than the zonal component. In the model, the single parameter h sets the ACT scale of both components. As the white noise processes driving u_t and v_t have the same (infinitely short) memory, the model cannot account for this anisotropy in autocorrelation structure. The process of estimating model parameters from observations will have to accommodate this fact.

One of the predictions of the wind model is that the mean and skewness of the vector winds are spatially anticorrelated. In particular, the component of the wind in the direction of the time-mean wind is predicted to be negatively skewed [58]. While this is broadly consistent with observations, in some locations the observed skewness of the along-mean wind component is weakly positive; in such locations, there will be a mismatch between the modelled and observed PDFs. Furthermore, while the relationship between the mean and skewness of the vector winds is captured qualitatively by the model, it underestimates the magnitude of the skewness [59]. Thus, it is not to be expected that the statistics of the reconstructed model will exactly match those of the observed winds.

Finally, for the sake of simplicity, and to be able to make use of the largest amount of data in our reconstructions, in the present analysis we have neglected nonstationarities associated with the seasonal and diurnal cycles in the winds.
What effect these nonstationarities may have on the reconstructed parameters is unclear, although seasonal and diurnal variability in the winds is generally considerably smaller than the internally generated "weather" variability over the open ocean [16, 60].

2.4.2 Parameter estimates at representative points

We will now consider the estimation of parameters at three different locations selected to be representative of the statistics of large regions over the ocean. These three points are in the Pacific sector of the Southern Ocean (53°S, 135°W), in the midlatitude North Pacific near Japan (35°N, 180°E), and in the Equatorial Pacific (3°S, 235°E). These points are respectively representative of three broad oceanic provinces. The southernmost point is characterised by relatively large vector mean wind and skewness (Table 2.2), with an autocorrelation function which decays on a timescale on the order of a day (Figure 2.11). The northernmost point has relatively small mean vector wind and skewness, with a strongly anisotropic autocorrelation function that also decays on a timescale on the order of a day. The Equatorial point is characterized by large mean vector winds and skewness, but a much more slowly decaying autocorrelation function.

Table 2.2: Top: Observed statistics of the ERA40 data at indicated locations. Bottom: Computed statistics from the wind model (2.22, 2.23) with estimated parameters. Estimation of the parameters is carried out using the constraints described in Section 2.3.3, weighting w = 1000, and subsampling of degree 4.

    location            53°S, 135°W   35°N, 180°E   3°S, 235°E
    mean(u) (m s−1)     5.76          2.30          −5.93
    mean(v) (m s−1)     −0.61         0.85          1.64
    st.d.(u) (m s−1)    5.92          6.09          1.66
    st.d.(v) (m s−1)    6.75          5.76          1.78
    skew(u)             −0.68         0.14          1.78
    skew(v)             −1.4 × 10−2   6.0 × 10−2    −0.45

    location            53°S, 135°W   35°N, 180°E   3°S, 235°E
    mean(u) (m s−1)     5.30          2.04          −5.94
    mean(v) (m s−1)     −0.77         0.45          1.69
    st.d.(u) (m s−1)    5.97          6.05          1.52
    st.d.(v) (m s−1)    6.69          5.72          1.70
    skew(u)             −0.31         1.9 × 10−3    0.31
    skew(v)             1.5 × 10−2    −4.3 × 10−2   −0.27

Figure 2.11: The autocovariance functions for the zonal and meridional wind directions (blue and red, respectively) at the three representative points. Crosses: observed estimates. Dashed lines: simulations based on parameter estimates without a rescaling of h. Solid lines: simulations using parameter estimates that include a rescaling of h. The rescaling is defined to match the absolute geometric mean autocovariance at a lag of 18 hours.

The parameter estimates obtained from direct application of the estimation procedure to the reanalysis winds with modified weighting w = 1000 and a degree of subsampling of 4 are presented in Table 2.3. Observed and simulated statistics are given in Table 2.2. As discussed in Section 2.4.1, the model is unable to account for the autocorrelation anisotropy that is evident at the locations considered. This model bias results in a range of consequences; in some cases, the model ACF using the raw estimates is substantially different from either of the vector wind ACFs (Figure 2.11).
In order to offset this effect, we rescale the estimated parameters (as described in Section 2.3.2) such that the PDF remains unchanged and the ACT scale is changed to match the geometric mean of the ACT scales of u_t and v_t.

Table 2.3: Top table: Estimates of the parameters in the wind model (2.22, 2.23) with weighting w = 1000 and subsampling of degree 4. Bottom table: Parameter estimates following the rescaling of parameters to improve estimates of the autocorrelation structure, as described in Section 2.4.2.

    location          53°S, 135°W    35°N, 180°E    3°S, 235°E
    ⟨Πu⟩ (m s−2)      3.04 × 10−5    8.56 × 10−6    −4.35 × 10−5
    ⟨Πv⟩ (m s−2)      −3.46 × 10−6   3.14 × 10−6    1.28 × 10−5
    K (m2 s−1)        16.7           3.56 × 104     9.51 × 10−6
    h (m)             3.77 × 103     1 × 105        1.21 × 103
    a0 (m2 s−3)       3.98 × 10−4    2.75 × 10−4    5.83 × 10−5
    a1 (m2 s−3)       2.96 × 10−5    7.63 × 10−6    −2.18 × 10−5
    a2 (m2 s−3)       4.97 × 10−4    2.46 × 10−4    4.74 × 10−5

    location          53°S, 135°W    35°N, 180°E    3°S, 235°E
    ⟨Πu⟩ (m s−2)      5.13 × 10−5    2.47 × 10−5    −1.76 × 10−5
    ⟨Πv⟩ (m s−2)      −5.84 × 10−6   9.07 × 10−6    5.19 × 10−6
    K (m2 s−1)        9.92           1.23 × 104     2.35 × 10−5
    h (m)             2.23 × 103     3.46 × 104     2.98 × 103
    a0 (m2 s−3)       6.73 × 10−4    7.94 × 10−4    2.36 × 10−5
    a1 (m2 s−3)       5.00 × 10−5    2.20 × 10−5    −8.82 × 10−6
    a2 (m2 s−3)       8.40 × 10−4    7.10 × 10−4    1.92 × 10−5

In the present case, we have used a lag of 18 hours for the autocorrelation matching. Rescaled parameter estimates are presented in the bottom table of Table 2.3. This rescaling results in modelled ACFs that are closer approximations to those of observations, although significant differences persist (Figure 2.11).

A comparison of the observed and modelled statistics at the three points under consideration demonstrates that certain moments of the data are captured better than others (Table 2.2). The mean wind speeds in the zonal and meridional directions are well captured, as are the standard deviations of those quantities. In contrast, while the sign and relative magnitude of the skewness values are captured by the model, the absolute magnitude is not.
As we will see in Section 2.4.3, these results are consistent across the global ocean.

Even with rescaling, the parameter h takes values significantly greater than 1 km, which is physically unreasonable. As discussed above, this bias in h is consistent with the large-scale ageostrophic forcing having an ACT scale comparable to that of the resolved dynamics. The other model parameters are expected to be correspondingly biased (relative to their true values) so that the model results in reasonable simulations of the vector wind component PDFs.

2.4.3 Global parameter estimates

We now reconstruct global fields of the model parameters from the reanalysis surface wind data. The statistics of the simulated winds with estimated parameters are displayed in Figure 2.12; the parameter fields are shown in Figure 2.13. In both of these plots, results are shown with and without rescaling of h (to bring the observed and modelled autocorrelation structures into closer accord). Note that the restriction on h to keep the estimates bounded within 10⁻³ and 10² km is only applied in the initial parameter estimates and is not applied in the parameter rescaling.

Figure 2.12: Left: Statistics of the original data. Middle: Statistics of resimulated data with parameters estimated using the C-VE method on the original data. Right: Statistics of the resimulated data with parameter estimates that include a rescaling of the parameters so that the ACT scale is more accurately captured. As expected, the middle and right columns are identical by construction. The parameters were estimated using the modified weighting scheme (2.16) with w = 1000 and a subsampling factor of 4.

In general, the mean and standard deviation fields of the zonal and meridional winds are reproduced very well.
The sign and relative magnitude of the skewness fields are also well reproduced; as noted above, the model is unable to accurately simulate the absolute magnitude of the vector wind skewnesses.

Considering the estimated parameter fields, we see that they are generally less noisy (and more easily interpreted meteorologically) after the rescaling of parameters. The reconstructed ⟨Πu⟩ field is strong in the region of the midlatitude westerlies and the trade winds, while the reconstructed ⟨Πv⟩ field is only strong on the eastward flanks of the subtropical highs. As these parameters determine the mean vector wind, the results are consistent with the mean vector wind climatology. The values of a0 and a2 are strongest in the storm tracks of the NW Pacific and Atlantic, and in the Atlantic-Indian Ocean sector of the Southern Ocean, which again is consistent with the interpretation of the stochastic forcing as representing variability in the large-scale driving processes. The a0 and a2 maps are also similar, which is consistent with the observation that the vector wind standard deviations are generally close to isotropic [59]. That the cross terms, a1, are generally weak is consistent with the observation that the vector winds are, to a first approximation, uncorrelated.

Figure 2.13: Left: Estimates of the parameter fields (⟨Πu⟩ and ⟨Πv⟩ in km h⁻², log10 K in km² h⁻¹, log10 h in km, and a0, a1, a2 in km² h⁻³). Right: Parameter field estimates after the rescaling of the parameters so that the overall ACT scale is more accurately captured. (The parameters were estimated using the weighting scheme (2.16) with w = 1000 and a subsampling factor of 4.) In the initial estimates for h, we have enforced bounds of h ∈ [10⁻³, 10²] km.
These results also provide an a posteriori justification of theassumption that the vector wind dynamics are reversible (Section 2.3).In contrast, the estimates of h and K are more problematic -particularly where the vector windskewness is small (Figure 2.14). In such regions, it would appear that the estimation routine is unable todistinguish between the linear and nonlinear drag terms in the equations of motion. Skewness in the vectorwinds results from the nonlinearity of the surface drag in this model. When the vector wind fluctuationsare approximately symmetric around the mean, there is a degeneracy between the linear and nonlineardrag terms. Numerical simulations of (2.22, 2.23) demonstrate that the modelled wind component ACTis set by both h and K, such that the ACT is unchanged if h and K are increased together in theappropriate way (not shown). When the vector winds are unskewed, h can take arbitrarily large valueswithout substantially changing the shape of puv(u, v). In such a case, K is determined by the ACT: if his unreasonably large, so too is K. To improve estimates of h, we will now consider a reinterpretation ofthe model in which K is set to zero.−2 −1 0 1−0.500.511.52log10(h)Original estimates−2 −1 0 1−10123Rescaled estimatesSkewness along mean vector wind directionSkewness along mean vector wind direction −1.5 −1 −0.5 0 0.5log10(h) estimates 0 0.5 1 1.5 2 2.5Figure 2.14: Top: Scatterplot of the skewness of the wind speed along the mean wind directionagainst the logarithm of the estimated value of h. Recall that in the original parameterestimates we apply constraints that include an upper bound on h. Bottom-Left: Skewnessof the wind speeds along the mean wind direction. The white (black) contours indicate thelevel curves where the skewness is equal to 0 (equal to -0.5). 
Bottom-Right: The field of rescaled h estimates with the level curves superimposed.

2.4.4 Improved estimates of h

We consider an alternative interpretation of the wind model in which h is interpreted not as the depth of an arbitrary slab, but as the height at which turbulent transport of momentum vanishes. In this interpretation, there is no downwards mixing of momentum from above the layer, so the parameter K is set to 0 and the only two deterministic forces that act on the wind speeds are the mean "ageostrophic force" and the surface drag. Re-estimating parameters when we constrain K to be zero, the statistical fields for the mean and standard deviation of the vector wind components do not change (not shown), while the skewness fields are slightly affected (Figure 2.15). The linear drag term does influence the shape of the vector wind PDF; in its absence, the flexibility of the model in this context is reduced.

Figure 2.15: Skewness fields skew(u) and skew(v) for the measured data, the resimulated data, and the resimulated data with rescaled h, when K is set to zero.

Figure 2.16: Estimated log10(h) fields (km), original and rescaled, when K is set to zero.

The corresponding fields for the model parameters are not substantially different from the previous results, with the obvious exception of h (Figure 2.16). The field of h is markedly smoother than those displayed in Figure 2.13. While the estimated values of h are still biased high in order to account for the finite ACT scale of the large-scale driving, the values (ranging from a few hundred metres to a few kilometres) have the correct order of magnitude. The greatest values of h occur in the Arabian Sea, where the winds are observed to have the longest lag ACTs [62].
This particularly long ACT likely reflects the monsoonal seasonal reversals of the wind in this region, which the model under consideration cannot account for as constructed.

2.5 Summary and conclusions

The stochastic model of the near-surface atmospheric momentum budget presented in [59] was developed as a tool for the qualitative investigation of physical controls on the variability of sea surface winds. An assessment of its utility as a quantitative tool requires observationally-based estimates of model parameters. In this study, we have applied the procedure of Crommelin and Vanden-Eijnden [11, 13, 15] to estimate the parameters of a stochastic differential equation describing sea surface wind variability using long time series of 10 m sea surface vector wind components. Although the data include aspects that cannot be accounted for by the model under consideration (diurnal and annual nonstationarities, an anisotropic vector wind autocorrelation function, and positive skewness in the along-mean-wind direction), meaningful estimates of model parameters were obtained. In particular, we have demonstrated that although parameter estimates from data obtained from a system with autocorrelated forcing lead to biased autocorrelation functions, these biases can be understood in terms of the dynamics of the system. An important result of the process of estimating parameters of the stochastic boundary layer momentum budget from sea surface wind observations is a better understanding of the limitations of this model. In particular, it is unable to account for the observed anisotropy in the vector wind autocorrelation structure, and it results in simulations with realistic ACTs only if unrealistic values of the layer thickness are used. These model limitations can be addressed to some extent by considering a more realistic representation of the large-scale driving processes – particularly coherent structures like extratropical cyclones and equatorial waves [62].
Such an extension of the model will be considered in future studies. In this analysis, we have addressed the issue of non-Markov structure in the time series by applying the estimator to a new time series made up of subsamples of the original process concatenated together. An alternative approach is to apply the estimator to each subsample and then average the resulting estimates. A preliminary investigation of this approach indicates that, for the time series and model under consideration, it results in estimates of the leading eigenmodes (which are used in the variational analysis) that are essentially the same as those from the first approach (with a relative difference of less than 10⁻³). The benefit of the second approach is that it is more naturally applied to analyses in which the time series are broken down by season or time of day, to account for annual or diurnal cycles. A more general comparison of these two approaches to handling non-Markov structure in the time series is an interesting direction for future study. This analysis demonstrates that the Crommelin–Vanden-Eijnden estimation procedure is a powerful tool for the estimation of model parameters, particularly when the estimation process can be informed by an understanding of the model dynamics. An important outstanding challenge remains the problem of obtaining unbiased parameter estimates when the data are driven by noise which is autocorrelated in time. Moreover, there are certainly additional insights that could be gained by adapting the method to account for diurnal or annual cycles. However, this would require adapting or extending the parameter estimation procedure for non-autonomous stochastic models, which would certainly be non-trivial.
Consideration of these problems offers important directions for future study.

Chapter 3

Stochastic averaging of systems with multiple time scales forced with α-stable noise

3.1 Introduction

Stochastic averaging (also referred to as "stochastic reduction" or, in some cases, "stochastic homogenization") refers to techniques whereby a stochastic dynamical system with multiple time scales is approximated by a reduced dynamical system on a subset of slow variables sharing essential statistical properties of the slow variables of the original system. The presence of stochastic forcing in dynamical models can be motivated by the existence of coarsely-resolved, fast chaotic processes that drive a slower process, and justified via calculation of dynamical entropies [40, 41]. In much the same way, stochastic averaging methods allow the modelling of the behaviour in the weak sense (i.e. distributions, moments) of processes influenced by faster processes without having to explicitly model the dynamical details of the faster variables. This averaging can reduce computational load, and the resulting approximation is typically easier to analyze and interpret. Stochastic averaging methods have been discussed widely, and we mention a few relevant references here. Freidlin and Wentzell [25] provide a mathematically rigorous overview of fundamental stochastic averaging procedures, such as those developed by Khas'minskii [43] and extended by Papanicolaou [69] and Borodin [4]. Majda, Timofeyev and Vanden-Eijnden developed a generalized stochastic averaging procedure for quadratically nonlinear dynamical systems with slow resolved components and fast unresolved components (MTV theory) [52–54]. Work by Givon et al.
[27] and Pavliotis [70] covers stochastic averaging and homogenization results obtained by perturbation analysis of the backward Kolmogorov equation in the context of Gaussian noise. MTV theory was developed primarily in the context of weather–climate interaction, which is a key application for stochastic averaging theory. Variability in the atmosphere–ocean–ice–land surface system occurs over a range of time scales spanning many orders of magnitude [68]. Generally speaking, the slowly-evolving climate variables describe the state of the ocean, the land surface, ice, and atmospheric composition, while the fast weather variables describe the circulation and thermodynamic state of the atmosphere [77]. In order to model longer-term properties of the climate system, evolving on time scales of years to centuries, it is undesirable to explicitly simulate meteorological processes (i.e. weather) which evolve on much faster time scales ranging from hours to days to weeks. However, the variabilities of the climate and weather systems are mutually dependent. There are three established stochastic averaging approximations that are generalized in this paper. We refer to them using the terminology introduced in [2, 63] so as to be consistent with previous literature. Given a stochastic dynamical system with Gaussian white noise forcing and variables that evolve on both fast and slow time scales, the (A) approximation for the mean of the slow dynamics of the system is determined by replacing any functions of the fast variables with their means conditioned on the value(s) of the slow variable(s). This approximation accounts for the influence of functions of the fast variables only through their means, and otherwise neglects their variability. The (L) approximation improves upon the (A) approximation by adding a correction in the form of an Ornstein-Uhlenbeck-like process with dynamics determined via an analysis of the full system about the mean of the slow variables.
As (L) describes local variability around (A), it is unable to account for transitions between metastable states. Both the (A) and (L) approximations are derived in the work of Khas'minskii [43, 44]. Lastly, we consider the (N+) approximation, which represents the slow dynamics as a more general stochastic differential equation (SDE) with a potentially nonlinear drift and/or state-dependent noise. The (N+) approximation was inspired by the work of Hasselmann [33], with more rigorous analysis of the approximation in [45]. This approximation was derived formally in [33] by considering an arbitrary point in the state space of the slow variables and performing a local analysis of the dispersion at that point to derive drift and diffusion coefficients. Other studies offer derivations of the (N+) approximation via projection methods and perturbation analysis of the forward Kolmogorov equation [39, 40]. Previous derivations of the (L) and (N+) approximations rely on the evaluation of the integrated autocovariance function of the fast process conditioned on the state of the slow variable in the presence of Gaussian noise forcing. When the autocovariance function is not defined (if, for example, the second moment of the stochastic forcing does not exist), the (L) and (N+) approximations cannot be defined in the usual way as given in [2, 63]. This paper focuses on averaging results for systems driven by processes with increments that are distributed according to an α-stable law, for which the autocovariance is not defined [24] (except for the case α = 2). A stochastic averaging procedure for systems with multiple time scales and α-stable noise forcing is relevant for modelling problems with heavy-tailed statistics, as a shifted and normalized sum of heavy-tailed random variables converges in distribution to an α-stable random variable [88]. Hence, α-stable laws form a family of attracting distributions for independent, identically-distributed (i.i.d.)
sums of heavy-tailed random variables, analogous to the Gaussian distribution as the limiting distribution for i.i.d. sums of random variables with finite variance. An integrated α-stable noise process (sometimes referred to as a "Lévy flight" in the physics literature) is characterized by its "superdiffusive" behaviour, whereby the observed variance in the trajectories grows superlinearly in time. The dynamics of atmospheric tracers have been shown to be superdiffusive in several contexts [37, 82]. Similar dynamics have also been predicted to be generated by chaotic dispersion and shear flows [83]. Processes such as these would be considered relatively fast atmospheric processes. In addition to these results, an analysis by Ditlevsen has suggested that α-stable forcing on longer time scales is present in time series of calcium concentration from Greenland ice cores [19, 34] recording high-latitude climate variability during the most recent glacial period. An α-stable noise process is one for which the increments are α-stable random variables, whose distribution we denote by Sα(β, σ). The distribution depends on three parameters: the stability index α ∈ (0, 2], which determines the rate of decay of the tails of the distribution; the skewness parameter β ∈ [−1, 1], which describes the asymmetry of the distribution; and the scale parameter σ > 0, which governs the "width" of the density function [1, 88]. The density function of an α-stable random variable cannot be expressed in closed form, except for a small number of special cases. Instead, an α-stable random variable L ∼ Sα(β, σ) is typically characterized through its characteristic function (CF), Φ_L(k),

Φ_L(k) = exp( −σ^α |k|^α Ξ(k; α, β) ),   (3.1)

where

Ξ(s; α, β) = 1 − iβ sgn(s) tan(πα/2).   (3.2)

It is easy to see that α = 2 in (3.1) corresponds to the special case of a Gaussian random variable with variance 2σ² and mean 0. The parameter β is inconsequential in this case, since Ξ(k; 2, β) = 1 for all k, β.
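Although the density has no closed form, Sα(β, σ) samples are straightforward to generate with the Chambers–Mallows–Stuck algorithm. The sketch below (function names and parameter values are illustrative, not from the text) uses the parametrization matching the CF (3.1)–(3.2), valid for α ∈ (1, 2], and checks the Gaussian endpoint α = 2, where the variance should be 2σ²:

```python
import numpy as np

def stable_rvs(alpha, beta, sigma, size, rng):
    """Draw S_alpha(beta, sigma) samples via the Chambers-Mallows-Stuck method.

    Parametrized so that the characteristic function is
    exp(-sigma^alpha |k|^alpha (1 - i beta sgn(k) tan(pi alpha / 2))),
    matching (3.1)-(3.2); valid for alpha in (1, 2].
    """
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform phase
    W = rng.exponential(1.0, size)                 # unit exponential
    B = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
    S = (1 + beta**2 * np.tan(np.pi * alpha / 2) ** 2) ** (1 / (2 * alpha))
    X = (S * np.sin(alpha * (U + B)) / np.cos(U) ** (1 / alpha)
         * (np.cos(U - alpha * (U + B)) / W) ** ((1 - alpha) / alpha))
    return sigma * X

rng = np.random.default_rng(0)
# alpha = 2 reduces to a Gaussian with mean 0 and variance 2*sigma^2
z = stable_rvs(2.0, 0.0, 1.0, 200_000, rng)
print(z.mean(), z.var())   # mean near 0, variance near 2
```

For α < 2 the same routine produces heavy-tailed draws whose sample variance does not converge, which is exactly the feature that obstructs autocovariance-based averaging below.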
If α < 2, the second moment of the distribution does not exist. When α < 1, the first moment does not exist. For α-stable increments in continuous time, denoted by dL_t^{(α,β)}, the scale parameter of the increment is nonlinearly related to the infinitesimal time step dt. Formally, this relationship implies that σ^α ∝ dt, and so the noisy increments dL_t^{(α,β)} ∼ Sα(β, (dt)^{1/α}). If β = 0, the noise is symmetric; otherwise, it is asymmetric. The mathematical problem of stochastic averaging of heavy-tailed processes for linear systems has been investigated in [85] in the context of the forward Kolmogorov equation (FKE). We use the more general FKE terminology throughout the paper, noting that if the stochastic forcing is Gaussian, the equation is usually called the Fokker-Planck equation. More recent work by Chechkin and Pavlyukevich [7] and Li et al. [50] gives Wong-Zakai type results for systems driven by a fast compound Poisson jump process. Results like those of Xu et al. [96] consider strong-sense (i.e. pathwise) stochastic averaging of α-stable forced systems, determining reduced SDE models that approximate the full systems in a pathwise sense. They consider a class of two-time-scale systems in which the fast dynamics correspond to an explicit time dependence (on fast time scales) of the slow variable's drift and diffusion. We consider a much broader class of systems, but only in the weaker sense of convergence in distribution. We derive stochastic averaging approximations in the weak (distributional) sense for fast-slow stochastic dynamical systems with additive α-stable noise forcing, obtaining reduced dynamical approximations for the slow component of the full fast-slow system. As in [85], we analyze the FKE of the fast-slow systems, but in contrast we consider both linear and nonlinear systems under appropriate transformations to allow an analysis of the FKE in Fourier space.
This approach allows us to consider a broader class of systems than has been investigated previously in the context of stochastic averaging of α-stable noise processes. These systems are linear in the fast variable, while allowing for some nonlinear interactions between slow and fast variables and fully nonlinear behaviour in the dynamics of the slow variables. The averaging approximations are conceptually similar to existing stochastic averaging methods used for systems with Gaussian white noise [2, 63], but the approach used in the α-stable case to determine the appropriate stochastic forcing for the reduced dynamics is significantly different from that used in the Gaussian case. The analysis here is based on the joint CF of the fast component and a fluctuation about the mean of the slow variable. For the (N+) approximations, Marcus' stochastic calculus [7, 55] for jump processes is required if the approximation has a state-dependent "diffusion" term. Analogous to the Stratonovich calculus for Gaussian processes, the Marcus interpretation of stochastic integrals allows the rules of ordinary calculus, like the chain rule, to hold. In Section 3.2, we define the stochastic averaging problem and write the expression for the (A) approximation in the case where the stochastic forcing is α-stable noise rather than Gaussian. In Section 3.2.2 and Section 3.2.3, we derive the analogous (L) and (N+) approximations for systems satisfying the appropriate conditions, demonstrating that the noise terms in the (N+) approximation are interpreted in the sense of Marcus. Since the Marcus calculus is less well known, we include a brief discussion in Appendix A.2.1. In Section 3.3, we apply our approximations to one linear system and three distinct nonlinear systems, comparing the averaging approximations to the full systems.
These numerical results show agreement between the approximations and the full systems not only in probability distribution, but also in the autocodifference [88], which characterizes the memory of the process. Our conclusions are outlined in Section 3.4.

3.2 Stochastic averaging and α-stable noise

We study the canonical problem of stochastic averaging of a fast-slow system with the stochastic forcing given by the increments of an α-stable Lévy process L_t^{(α,β)},

dx_t = f(x_t, y_t) dt,   (3.3)
ε dy_t = g(x_t, y_t) dt + ε^γ b dL_t^{(α,β)},   (3.4)

where γ and b are constants and x_0, y_0 are known. The parameter ε is a small positive number (0 < ε ≪ 1) that characterizes the well-separated time scales between the slow variable x_t and the fast variable y_t. The variable y_t is a stationary and ergodic stochastic process forced with an additive α-stable Lévy noise dL_t^{(α,β)} with increments distributed as Sα(β, (dt)^{1/α}). In this study, we restrict our analysis to the case α ∈ (1, 2] with E[dL_t^{(α,β)}] = 0. The system (3.3, 3.4) includes the well-studied case of Gaussian noise forcing, corresponding to α = 2. When α = 2, we replace dL_t^{(2,β)} with √2 dW_t, where W_t is a standard Wiener process and the factor of √2 follows from the definition of the α-stable CF (3.1). Since α-stable noise processes are best analyzed through their CFs, we consider forms of f, g that are linear in y_t,

f(x, y) = f1(x) + ε^{−γ} f2(x) y,   g(x, y) = ε^γ g1(x) + g2(x) y,   (3.5)

to make analysis in Fourier space straightforward. Specifically, the linear form allows us to give results in terms of closed-form functions of the parameters. Some preliminary work for examples that include polynomials of y indicates the limited availability of explicit analytical expressions. Within the form (3.5), we also assume g2(x) < 0 for all x to ensure contracting dynamics, so that the distribution of y|x reaches conditional stationary behaviour on a fast time scale.
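The fast-slow system can be integrated directly by Euler–Maruyama, drawing each increment dL ∼ Sα(β, (dt)^{1/α}). The sketch below assumes the ε-scalings as written in (3.3)–(3.5); the sampler, function names, and parameter values are purely illustrative:

```python
import numpy as np

def stable_increments(alpha, beta, scale, n, rng):
    """Chambers-Mallows-Stuck sampler for S_alpha(beta, scale), alpha in (1,2]."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    B = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
    S = (1 + beta**2 * np.tan(np.pi * alpha / 2) ** 2) ** (1 / (2 * alpha))
    X = (S * np.sin(alpha * (U + B)) / np.cos(U) ** (1 / alpha)
         * (np.cos(U - alpha * (U + B)) / W) ** ((1 - alpha) / alpha))
    return scale * X

def simulate(f1, f2, g1, g2, alpha, beta, b, eps, gamma, x0, y0, dt, n, rng):
    """Euler-Maruyama for dx = (f1 + eps^-gamma f2 y) dt,
    eps dy = (eps^gamma g1 + g2 y) dt + eps^gamma b dL (illustrative scaling)."""
    x, y = np.empty(n + 1), np.empty(n + 1)
    x[0], y[0] = x0, y0
    dL = stable_increments(alpha, beta, dt ** (1 / alpha), n, rng)  # scale dt^(1/alpha)
    for k in range(n):
        x[k + 1] = x[k] + (f1(x[k]) + eps**-gamma * f2(x[k]) * y[k]) * dt
        y[k + 1] = y[k] + ((eps**gamma * g1(x[k]) + g2(x[k]) * y[k]) * dt
                           + eps**gamma * b * dL[k]) / eps
    return x, y

rng = np.random.default_rng(1)
# hypothetical linear test case: f1(x) = -x, f2 = 1, g1 = 0, g2 = -1
x, y = simulate(lambda x: -x, lambda x: 1.0, lambda x: 0.0, lambda x: -1.0,
                alpha=1.5, beta=0.0, b=1.0, eps=0.05, gamma=1.0,
                x0=1.0, y0=0.0, dt=1e-3, n=20_000, rng=rng)
print(x[-1], y[-1])
```

Note that the time step must resolve the fast scale (dt ≪ ε) for the explicit scheme to remain stable; here dt/ε = 0.02.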
For the (N+) approximation in Section 3.2.3, we also require that f2(x) has the same sign for all x and is never equal to zero, in order to define a particular monotonic transformation. The goal of stochastic averaging is to derive an approximation in the weak sense to the SDE of x_t based on the distributional properties of y_t. We obtain three different stochastic averaging approximations, resulting in averaged deterministic, linear stochastic, and nonlinear stochastic models, as described below.

3.2.1 (A) approximation

The (A) approximation is an approximation of x_t in the mean and is therefore deterministic. It is given by x̄_t, where

dx̄_t = f̄(x̄_t) dt,   x̄_0 = x_0,   (3.6)

and

f̄(x) = E_{(y|x)}[f(x, y)] = ∫_{−∞}^{∞} f(x, y) p(y|x) dy.   (3.7)

The notation E_{(y|x)}[f(x, y)] indicates the expected value of f(x, y) conditioned on a fixed x, and the term p(y|x) is the conditional density of y_t for a fixed x_t = x. For this approximation we have implicitly assumed that f̄ can be evaluated for x_t, y_t in (3.3, 3.4). Indeed, by our assumption that α ∈ (1, 2] and that the form of f, g given in (3.5) holds, f̄ is evaluated using (3.7), giving us E_{(y|x)}[y] = −ε^γ g1(x)/g2(x) and thus

f̄(z) = f1(z) − f2(z) g1(z)/g2(z).   (3.8)

For the Gaussian noise case, the (A) approximation has been studied in detail [2, 25, 43]. In the next section, we show that the (A) approximation holds in the case of α-stable noise forcing if α > 1 and γ > 1 − 1/α in the strict ε → 0 limit. For 0 < ε ≪ 1, a more accurate approximation, called the (L) approximation, emerges that includes corrections to the (A) approximation.

3.2.2 (L) approximation

The (L) approximation is an Ornstein-Uhlenbeck-Lévy process (OULP) obtained by linearizing about the mean x̄_t as defined in (3.6). We first decompose the dynamics into a mean component and a fluctuation as in [63].
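Before constructing the fluctuation, the conditional-mean formula (3.8) can be sanity-checked by Monte Carlo at the Gaussian endpoint α = 2, where the frozen-x fast process is an Ornstein–Uhlenbeck process. The coefficients and scalings below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, gamma, b = 0.05, 1.0, 1.0
f1, f2, g1, g2 = 2.0, 1.0, 3.0, -1.0      # frozen slow variable: constant coefficients
dt, n = 1e-3, 400_000

# Gaussian endpoint alpha = 2: replace dL with sqrt(2) dW
y = np.empty(n)
y[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), n - 1)
for k in range(n - 1):
    y[k + 1] = y[k] + ((eps**gamma * g1 + g2 * y[k]) * dt
                       + eps**gamma * b * np.sqrt(2) * dW[k]) / eps

burn = n // 10                                       # discard transient
fbar_mc = f1 + eps**-gamma * f2 * y[burn:].mean()    # Monte Carlo E[f(x, y) | x]
fbar_exact = f1 - f2 * g1 / g2                       # formula (3.8): here 5.0
print(fbar_mc, fbar_exact)
```

The fast conditional mean E[y|x] = −ε^γ g1/g2 combines with the ε^{−γ} prefactor in f so that f̄ is independent of ε, which is what makes the (A) approximation well defined in the ε → 0 limit.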
The properties of α-stable noise require analysis of CFs to determine the appropriate stochastic approximation for the fluctuation. We define the fluctuation about the mean as ξ_t = x_t − x̄_t and derive its dynamics by substituting in (3.3) and using (3.6),

dξ_t = [ f(x_t, y_t) − f̄(x̄_t) ] dt   (3.9)
     = [ f̄(x_t) − f̄(x̄_t) ] dt + f̂(x_t, y_t) dt,   (3.10)

where f̂(z, y) = f(z, y) − f̄(z). Substituting x_t = x̄_t + ξ_t and keeping leading-order contributions for |ξ_t| ≪ 1 in (3.10) yields the linear equation for ξ_t,

dξ_t ≃ f̄′(x̄_t) ξ_t dt + f̂(x̄_t, y_t) dt.   (3.11)

To complete the (L) approximation, we find a stochastic approximation for f̂ dt that has its distributional properties on the t = O(1) time scale. The variable v_t captures the stochastic forcing in the ξ equation. Its dynamics are

dv_t = f̂(x̄_t, y_t) dt,   v_0 = 0,   where   f̂(z, y) = f2(z)( ε^{−γ} y + g1(z)/g2(z) ),   (3.12)

and we have substituted (3.5) and (3.8) into (3.11) to obtain an explicit expression for f̂. Rewriting (3.11) using (3.12), we see the forcing effect of the increments dv_t on the dynamics of ξ,

dξ_t = f̄′(x̄_t) ξ_t dt + dv_t,   ξ(0) = 0.   (3.13)

In the case α = 2, the autocovariance C(s; x) can be evaluated, and the fluctuation dv_t in (3.13) is replaced with a Gaussian white noise forcing σ(x) dW_t [2, 63], where

σ²(x) = ∫_{−∞}^{∞} C(s; x) ds,   C(s; x) = lim_{t→∞} E_{(y|x)}[ f̂(x, y_{t+s}) f̂(x, y_t) ].   (3.14)

This result is justified for C(s; x) having a δ-function behaviour similar to that of Gaussian white noise in the limit ε → 0 [63]. However, the autocovariance function C(s; x) of f̂(x, y) is undefined when α < 2. While moments of order greater than or equal to α do not exist for α-stable distributions, the CF is defined for any α < 2.
Thus, we analyze the process v_t in terms of its CF in order to determine an appropriate stochastic forcing approximation for dv_t when α < 2. The joint probability density function (PDF) p(y, v, t) for the system (3.4), (3.12) satisfies the fractional FKE (also known as the Fokker-Planck equation if α = 2),

∂p/∂t = −∂/∂v ( f̂(x, y) p ) − (1/ε) ∂/∂y ( g(x, y) p ) + (b^α/ε^{(1−γ)α}) D_y^{(α,β)} p,
p(y, v, 0) = δ(v) δ(y − y_0),   (3.15)

where D_y^{(α,β)} is the Riesz–Feller derivative, a generalization of the Liouville–Riemann fractional derivative [6], acting with respect to y and associated with the α-stable noise process. This operator can be defined in terms of a Fourier transform, F[u] = ∫_R exp(ily) u(y) dy, and its inverse:

D_y^{(α,β)} u = F^{−1}[ (−1/cos(πα/2)) ( ((1 − β)/2)(il)^α + ((1 + β)/2)(−il)^α ) F[u] ].   (3.16)

For 0 < ε ≪ 1, we initially treat x as a constant relative to the fast variables v and y. Let ψ be the CF for the (y, v) system,

ψ(l, m, t) = F[p] = ∫∫_{R²} exp(ily + imv) p(y, v, t) dy dv.   (3.17)

Taking the Fourier transform (FT) of (3.15), we note that the noise terms enter the dynamics through y and not through v, so that the operator consists of the usual first-order partial derivatives in y and v, and the fractional derivatives are in terms of y only. Then, using (3.16), we obtain the corresponding partial differential equation for the CF,

∂ψ/∂t = im F[ f̂(x, y) p ] + (il/ε) F[ g(x, y) p ] − (b^α |l|^α/ε^{(1−γ)α}) Ξ(l; α, β) ψ,
ψ(l, m, 0) = exp(ily_0).   (3.18)

Substituting g and f̂ as in (3.5) and (3.12) into (3.18) yields

∂ψ/∂t = ( ε^{−γ} f2 m + ε^{−1} g2 l ) ∂ψ/∂l + [ i ε^{γ−1} g1 l + i (g1 f2/g2) m − (b^α |l|^α/ε^{(1−γ)α}) Ξ(l; α, β) ] ψ,
ψ(l, m, 0) = exp(ily_0),   (3.19)

where we have dropped the argument x from f1, f2, g1, g2. We derive the solution for ψ using the method of characteristics. The characteristic curves for the variables t, l and m are given in terms of the characteristic variables r, l_0, and m_0,

t(r) = r,   l(r) = l_0 exp(−g2 r/ε) − ε^{1−γ} (f2/g2) m_0 (1 − exp(−g2 r/ε)),   m(r) = m_0,   (3.20)

and ψ satisfies

dψ/dr = ( i (f2 g1/g2) m + i ε^{γ−1} g1 l(r) − (b^α/ε^{(1−γ)α}) |l(r)|^α Ξ(l(r); α, β) ) ψ,   ψ(0) = exp(il_0 y_0),   (3.21)

where Ξ is given in (3.2).
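As an aside, the Fourier definition (3.16) is easy to exercise numerically: at α = 2 the Riesz–Feller derivative reduces to ∂²/∂y², which an FFT implementation should reproduce on a Gaussian test function. The grid and discretization choices below are illustrative; at α = 2 the spectral multiplier is the real, even function −l², so the result is insensitive to FFT sign conventions:

```python
import numpy as np

N, L = 1024, 40.0
y = (np.arange(N) - N // 2) * (L / N)        # grid on [-20, 20)
u = np.exp(-y**2 / 2)                        # test function, ~0 at the edges

alpha, beta = 2.0, 0.0
l = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # Fourier variable
# spectral multiplier from (3.16); at alpha = 2 it equals -l^2
mult = (-1 / np.cos(np.pi * alpha / 2)) * (
    (1 - beta) / 2 * (1j * l) ** alpha + (1 + beta) / 2 * (-1j * l) ** alpha)
Du = np.fft.ifft(mult * np.fft.fft(u)).real

exact = (y**2 - 1) * np.exp(-y**2 / 2)       # u''(y) for the Gaussian test function
print(np.max(np.abs(Du - exact)))            # spectrally small
```

For α < 2 the same multiplier implements the genuinely nonlocal fractional operator, which is why the derivations above work in Fourier space rather than with pointwise derivatives.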
The solution of (3.21) is given by

ψ(l, m, t) = exp[ iΛ(t) y_0 − i ( ε^γ (g1/g2) l + (g1 f2/g2²) m )(1 − exp(g2 t/ε)) − (b^α/ε^{(1−γ)α}) ∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr ],   (3.22)

where

Λ(s) = l exp(g2 s/ε) − ε^{1−γ} m (f2/g2)(1 − exp(g2 s/ε)).   (3.23)

We evaluate the integral term of (3.22) asymptotically for small ε, as shown in Appendix A.1, giving the result

∫_0^t |Λ(r)|^α Ξ(Λ(r); α, β) dr   (3.24)
= ( ε|l|^α/(−α g2) ) Ξ(l; α, β)(1 − exp(α g2 t/ε)) + O(ε^{2−γ})   for t = O(ε),
= ε |f2 g2^{−1}|^α |m|^α Ξ(m; α, β*) t + ( ε|l|^α/(−α g2) ) Ξ(l; α, β) + O(ε^{2−γ})   for t = O(1),

where

β* = β sgn(f2).   (3.25)

Substituting (3.24) into (3.22) gives the following approximation for ψ:

ψ ∼ exp[ iΛ(t) y_0 − i ( ε^γ (g1/g2) l + (g1 f2/g2²) m )(1 − exp(g2 t/ε)) − ( b^α |l|^α/(ε^{(1−γ)α−1} (−α g2)) ) Ξ(l; α, β)(1 − exp(α g2 t/ε)) ]   for t = O(ε),
ψ ∼ exp[ −i ε^γ (g1/g2) l − i ( (g1 f2/g2²) + ε^{1−γ} (f2/g2) y_0 ) m − ( b^α |l|^α/(ε^{(1−γ)α−1} (−α g2)) ) Ξ(l; α, β) − ( b^α/ε^{(1−γ)α−1} ) |f2/g2|^α |m|^α Ξ(m; α, β*) t ]   for t = O(1).   (3.26)

Since ψ has no terms that involve products of l and m, we can factor the approximate CF into components ψ ≈ ψ_y ψ_v that correspond to CFs of y_t and v_t. The CF for y_t given in (3.4), with g given in (3.5) and treating x_t as a constant, is

ψ_y(l, t) = exp[ i l y_0 exp(g2 t/ε) − i l ε^γ (g1/g2)(1 − exp(g2 t/ε)) − ( b^α |l|^α/(ε^{(1−γ)α−1} (−α g2)) ) Ξ(l; α, β)(1 − exp(α g2 t/ε)) ].   (3.27)

The asymptotic approximation for t = O(1) is

ψ_y(l, t) ∼ exp[ −i l ε^γ g1/g2 − ( b^α |l|^α/(ε^{(1−γ)α−1} (−α g2)) ) Ξ(l; α, β) ].   (3.28)

We then factor ψ_y, as defined by (3.27) for t = O(ε) and (3.28) for t = O(1), out of the approximate expression for ψ given in (3.26) to get the asymptotic behaviour of ψ_v,

ψ_v(m, t) ∼ exp[ −i m ( ε^{1−γ} (f2/g2) y_0 + g1 f2/g2² )(1 − exp(g2 t/ε)) ]   for t = O(ε),   (3.29)
ψ_v(m, t) ∼ exp[ −i m ( ε^{1−γ} (f2/g2) y_0 + g1 f2/g2² ) − ( b^α/ε^{(1−γ)α−1} ) |f2 g2^{−1}|^α |m|^α Ξ(m; α, β*) t ]   for t = O(1).   (3.30)

This factorization of the CF allows us to analyze the asymptotic approximation of the CF ψ_v for the process v_t. We compare (3.30) to the CF of an α-stable process z_t, t ≥ 0, with the SDE and associated CF given by

dz_t = Σ dL_t^{(α,β)},   ψ_z(k) = exp[ ikz_0 − tΣ^α |k|^α Ξ(k; α, β) ],   (3.31)

where Σ is a constant and k is the Fourier variable.
The functional structure with respect to the Fourier variables m in (3.30) and k in (3.31) is identical, with the terms (α, β, Σ) in (3.31) corresponding to (α, β*, ε^{γ−1+1/α} b|f2/g2|) in (3.30). The coefficient of im in (3.29), (3.30) defines the mean of v_t, which is non-zero in general if y_0 ≠ E_{(y|x)}[y], as given in (3.8). The mean of v_t is constant on the t = O(1) time scale, as is evident by inspection of (3.30), and is thus inconsequential to our analysis, since we are interested in the increments of v_t on this time scale rather than v_t itself. Since convergence in CF implies convergence in distribution [24], the comparison of (3.30) and (3.31) indicates that on the O(1) time scale, the increments dv_t are approximately distributed according to an α-stable law, dv_t ∼ ε^ρ b|f2/g2| dL_t^{(α,β*)}, where ρ = γ − 1 + 1/α. We conclude that the (L) approximation for the system (3.3 – 3.5) is given by x̄_t + ξ_t, where x̄_t is given by (3.6) and

dξ_t = f̄′(x̄_t) ξ_t dt + ε^ρ b |f2(x̄_t)/g2(x̄_t)| dL_t^{(α,β*)},   ξ_0 = 0,   (3.32)

where f̄ is given in (3.8). We note that if ρ < 0 (i.e. γ < 1 − 1/α), then the scale parameter of the noise diverges and the approximation (3.32) is undefined in the ε → 0 limit. If ρ > 0, then the scaling of the noise approaches zero in this limit and ξ_t → 0 as t → ∞, provided f̄′(x̄_t) < 0 for all x̄_t. In this case, the (L) approximation is asymptotically equivalent to the (A) approximation (3.6) as ε → 0. For non-zero but sufficiently small ε, the (L) approximation is an asymptotic approximation in the weak sense for the dynamics of x_t for ρ > 0. This is also true for ρ < 0 with ε not too small. However, in this case the dynamics of x_t become dominated by the noise, and it becomes questionable that x_t can be considered a slow process relative to y_t.
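With frozen coefficients, (3.32) is an OULP whose stationary CF is available in closed form, giving a direct check on a simulated path. In the sketch below, λ and c are illustrative stand-ins for −f̄′(x̄) and ε^ρ b|f2(x̄)/g2(x̄)|; for symmetric noise (β = 0) the stationary CF is exp(−c^α|k|^α/(αλ)):

```python
import numpy as np

def stable_increments(alpha, scale, n, rng):
    """Symmetric (beta = 0) alpha-stable increments via Chambers-Mallows-Stuck."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return scale * (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
                    * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(3)
alpha, lam, c = 1.5, 1.0, 0.3        # d(xi) = -lam*xi dt + c dL_t
dt, n = 1e-2, 200_000
dL = stable_increments(alpha, dt ** (1 / alpha), n, rng)
xi = np.empty(n)
xi[0] = 0.0
for k in range(n - 1):
    xi[k + 1] = xi[k] * (1 - lam * dt) + c * dL[k]

# stationary CF of the OULP: exp(-c^alpha |k|^alpha / (alpha * lam))
k = 1.0
cf_emp = np.cos(k * xi[n // 10:]).mean()        # beta = 0, so the CF is real
cf_exact = np.exp(-c**alpha * abs(k)**alpha / (alpha * lam))
print(cf_emp, cf_exact)
```

Comparing empirical and exact CFs (rather than variances) is the natural diagnostic here, since for α < 2 the sample variance of ξ does not converge.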
For ρ = 0, the intensity of the noise in (3.32) does not depend on ε, and the stochastic corrections persist as the time scale separation is made arbitrarily small. The (L) approximation is useful not only when the system being approximated is itself linear, but also when the system is near a deterministic attractor to which x_t has converged. In this case, the (L) approximation is reasonable on finite time intervals that are shorter than the escape time from the local attractor. We compare simulations of the full system and the (L) approximation for both linear and nonlinear systems in Section 3.3.

3.2.3 (N+) approximation

The (L) approximation is a good approximation for linear systems or for certain local approximations. However, for systems with nonlinear dynamics, the (L) approximation may be inadequate due to its inability to model systems with multimodal stationary distributions or behaviour different from an OULP. In these cases, the (N+) approximation, which can capture such nonlinear behaviour, is more appropriate. Beginning with the original system (3.3, 3.4) with f, g given by (3.5), we express the dynamics of x_t in terms of f̄ and f̂ given in (3.7) and (3.12),

dx_t = f̄(x_t) dt + f̂(x_t, y_t) dt.   (3.33)

As in the previous section, to complete the (N+) approximation, we find a stochastic approximation for f̂(x_t, y_t) dt that has its distributional properties on the t = O(1) time scale. We make a change of coordinates μ_t = T(x_t), where T is a differentiable function satisfying

T′(x) f2(x) = g2(x).   (3.34)

The transformation μ_t = T(x_t) defined in (3.34) is invertible, since g2(x) < 0 and f2(x) ≠ 0 has the same sign for all x. As we show below, this coordinate transformation x → μ allows us to approximate the increments f̂(x_t, y_t) dt in the transformed coordinates as an additive α-stable noise process when we consider the CF in Fourier space.
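For concreteness, a hypothetical choice of coefficients (not from the text) makes the construction (3.34) explicit:

```python
import numpy as np

# Hypothetical coefficients: f2(x) = 1 + x^2 (one sign, never zero), g2(x) = -1 < 0.
# Then T'(x) = g2(x)/f2(x) = -1/(1 + x^2), so T(x) = -arctan(x): a smooth,
# strictly decreasing bijection of R onto (-pi/2, pi/2), hence invertible.
T = lambda x: -np.arctan(x)
T_inv = lambda mu: -np.tan(mu)
T_prime = lambda x: -1.0 / (1.0 + x**2)

x = np.linspace(-5.0, 5.0, 101)
f2 = 1.0 + x**2
g2 = -np.ones_like(x)
print(np.max(np.abs(T_prime(x) * f2 - g2)))   # condition (3.34) holds: ~0
print(np.max(np.abs(T_inv(T(x)) - x)))        # T_inv inverts T: ~1e-14
```

The sign conditions on f2 and g2 are exactly what guarantees that T′ never vanishes or changes sign, so T is monotone and globally invertible.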
Then there is no ambiguity in the interpretation of these noise terms, particularly when we invert the transformation T to obtain the approximation in the original variables. Our use of such a coordinate transformation (3.34) is similar to that in [7], where a fast-slow system is transformed so that the influence of the fast Ornstein-Uhlenbeck process drives the slow variable additively. In [7], this formulation allows for the unambiguous derivation of a reduced stochastic model, with a Stratonovich interpretation. The dynamics of μ_t can be determined using the change of variables formula [10],

dμ_t = T′(x_{t−}) dx_t + (1/2) T″(x_{t−}) d[x, x]^c_t + T(x_t) − T(x_{t−}) − T′(x_{t−}) Δx_t,   (3.35)

where x_{t−} = lim_{s→t−} x_s and Δx_t denotes the jump component of x_t. Since the continuous part of the quadratic variation of x satisfies d[x, x]^c_t ∝ (dt)² → 0, and there are no jump components because x_t is continuous for all t, we get

dμ_t = T′(x_t) dx_t   (3.36)
     = T′(T^{−1}(μ_t)) [ f̄(T^{−1}(μ_t)) + f̂(T^{−1}(μ_t), y_t) ] dt   (3.37)
     = T′(T^{−1}(μ_t)) f̄(T^{−1}(μ_t)) dt + [ g1(T^{−1}(μ_t)) + ε^{−γ} g2(T^{−1}(μ_t)) y_t ] dt.   (3.38)

We define dV_t as the additional fluctuations about the first forcing term in (3.38), analogous to dv_t in the (L) approximation (3.12),

dV_t = [ g1(T^{−1}(μ_t)) + ε^{−γ} g2(T^{−1}(μ_t)) y_t ] dt,   V_0 = 0.   (3.39)

Note that the dynamics in (3.39) are the same (up to a multiplicative constant) as the drift terms in the equation for y. This correspondence between the V and y equations leads to a straightforward factorization of the joint CF in Fourier space, as we see below. Invoking the separation of time scales for small ε, we use a multiple scale approximation for the FKE for the (y, V) system, treating μ_t as effectively constant relative to the fast time scales of y_t and V_t,

∂q/∂t = −( ∂/∂V + ε^{γ−1} ∂/∂y ) ( [ g1 + ε^{−γ} g2 y ] q ) + (b^α/ε^{(1−γ)α}) D_y^{(α,β)} q,
q(y, V, 0) = δ(y − y_0) δ(V),   (3.40)

where q = q(y, V, t) is the joint PDF for y, V, and g1 and g2 are shorthand for g1(T^{−1}(μ_t)) and g2(T^{−1}(μ_t)), treated as constants.
We take the FT of (3.40) and define the CF of (y, V) to be φ = φ(l, m, t), where

φ(l, m, t) = ∫∫_{R²} exp(ily + imV) q(y, V, t) dV dy.   (3.41)

The resulting linear partial differential equation for φ is similar to (3.19),

∂φ/∂t = ( ε^{−γ} g2 m + ε^{−1} g2 l ) ∂φ/∂l + [ i ( g1 m + ε^{γ−1} g1 l ) − (b^α |l|^α/ε^{(1−γ)α}) Ξ(l; α, β) ] φ,
φ(l, m, 0) = exp(ily_0),   (3.42)

with Ξ(l; α, β) given in (3.2). Analogous to (3.19), (3.42) can be solved using the method of characteristics, giving us

φ(l, m, t) = exp[ iΓ(t) y_0 − i (g1/g2)( ε^γ l + ε m )(1 − exp(g2 t/ε)) − (b^α/ε^{(1−γ)α}) ∫_0^t |Γ(r)|^α Ξ(Γ(r); α, β) dr ],   (3.43)

where

Γ(s) = l exp(g2 s/ε) − ε^{1−γ} m (1 − exp(g2 s/ε)).   (3.44)

Using the same asymptotic results derived in Appendix A.1, we can approximate φ on the t = O(ε) and O(1) time scales. Thus, we obtain an approximate form for the CF of the (y, V) system for 0 < ε ≪ 1,

φ ∼ exp[ iΓ(t) y_0 − i ( ε^γ l + ε m )(g1/g2)(1 − exp(g2 t/ε)) − ( b^α |l|^α Ξ(l; α, β)/(ε^{(1−γ)α−1} (−α g2)) )(1 − exp(α g2 t/ε)) ]   for t = O(ε),
φ ∼ exp[ −i ε^γ (g1/g2) l − i ( ε^{1−γ} y_0 + ε g1/g2 ) m − b^α |l|^α Ξ(l; α, β)/(ε^{(1−γ)α−1} (−α g2)) − ( b^α/ε^{(1−γ)α−1} ) |m|^α Ξ(m; α, −β) t ]   for t = O(1).   (3.45)

Since φ has no products of l and m, we decompose the t = O(1) approximation in (3.45) into two factors, φ ∼ φ_y φ_V, where

φ_V(m, t) = exp( −i ( ε^{1−γ} y_0 + ε g1/g2 ) m − ε^{αρ} b^α |m|^α Ξ(m; α, −β) t )   (3.46)

and φ_y = ψ_y is the CF of y given in (3.28). As in the case of the (L) approximation, φ_V is identical in form to the CF of an α-stable process. By the same reasoning leading from (3.30) to (3.32) in the previous section, we conclude that the dynamics of the perturbation V_t can be approximated on the t = O(1) time scale by an α-stable noise process,

dV_t = g1(T^{−1}(μ_t)) dt + ε^{−γ} g2(T^{−1}(μ_t)) y_t dt ≈ ε^ρ b dL_t^{(α,−β)}.   (3.47)

Note that in our transformed system μ_t = T(x_t), the coefficient of the Lévy increment is state-independent, and so the limiting noise is additive. Using (3.39) and (3.47) in (3.38), we obtain an approximate equation for μ_t,

dμ_t ≈ T′(T^{−1}(μ_t)) f̄(T^{−1}(μ_t)) dt − ε^ρ b dL_t^{(α,β)},   μ_0 = T(x_0).   (3.48)

Here we have chosen to express the stochastic increments ε^ρ b dL_t^{(α,−β)} as −ε^ρ b dL_t^{(α,β)}, which is equivalently distributed.
To complete the approximation for x_t on the slow time scale, we invert our initial transformation x_t → µ_t = T(x_t) by taking µ_t → X_t = T⁻¹(µ_t), similarly to [7] in the example of Gaussian noise. The fact that the stochastic forcing on µ_t is additive allows for an unambiguous interpretation of the noise terms prior to performing the inverse transformation, and inverting the transformation then provides the appropriate interpretation of the noise terms for the approximation of x_t. Since T⁻¹ is a nonlinear transformation, the change of variables formula for jump processes [10] yields non-trivial contributions when applied to X_t as follows,

dX_t = (T⁻¹)′(µ_{t−}) dµ_t + T⁻¹(µ_t) − T⁻¹(µ_{t−}) − (T⁻¹)′(µ_{t−}) Δµ_t,   (3.49)

where X0 = x0, µ_{t−} = lim_{s→t−} µ_s, and Δµ_t = µ_t − µ_{t−} = −ε^ρ b dL_t^{(α,β)} is the jump component of µ_t at time t. Using the expression for dµ_t (3.38), (3.49) can be written as

dX_t = [1/T′(T⁻¹(µ_{t−}))] T′(T⁻¹(µ_t)) f(T⁻¹(µ_t)) dt + {T⁻¹(µ_t) − T⁻¹(µ_{t−})}   (3.50)
     = (T′(X_t)/T′(X_{t−})) f(X_t) dt + {T⁻¹(µ_t) − T⁻¹(µ_{t−})},   (3.51)

where X_{t−} = lim_{s→t−} X_s. With probability one, the ratio T′(X_t)/T′(X_{t−}) is equal to one, as X_t is a càdlàg process and hence the set of times where a jump occurs is a null-set with respect to the Lebesgue measure [10]. Hence, we replace T′(X_t)/T′(X_{t−}) with 1 in (3.51). The difference T⁻¹(µ_t) − T⁻¹(µ_{t−}) cannot be neglected, since the time integral of these terms results in a sum that does not scale with the size of the infinitesimal time step dt. To see this, we show that this difference is equal to the Marcus increment −ε^ρ b [T′(X_t)]⁻¹ dL_t^{(α,β)}, by writing T⁻¹(µ_t) − T⁻¹(µ_{t−}) in terms of the processes Θ(r; ΔL_t, µ_{t−}) = µ_{t−} − ε^ρ b ΔL_t r and θ(r; ΔL_t, T⁻¹(µ_{t−})) = T⁻¹(Θ(r; ΔL_t, µ_{t−})), where

dθ/dr = (dθ/dΘ)(dΘ/dr) = −ε^ρ b ΔL_t / T′(θ(r)),   θ(0; ΔL_t, T⁻¹(µ_{t−})) = T⁻¹(µ_{t−}) = X_{t−},   (3.52)

and ΔL_t = dL_t^{(α,β)} is the size of the jump in the α-stable noise process L_t^{(α,β)} at time t.
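Generating the α-stable increments dL_t^{(α,β)} that appear throughout this derivation is itself nontrivial. A standard route (our own illustration, not part of the thesis) is the Chambers–Mallows–Stuck construction, which can be sanity-checked against the known characteristic function exp(−|l|^α) in the symmetric case:

```python
import numpy as np

def alpha_stable(alpha, beta, size, rng):
    """Chambers-Mallows-Stuck draw of standard alpha-stable variates
    S(alpha, beta; 1), valid for alpha != 1."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit-mean exponential
    zeta = beta * np.tan(np.pi * alpha / 2)
    B = np.arctan(zeta) / alpha
    S = (1 + zeta ** 2) ** (1 / (2 * alpha))
    return (S * np.sin(alpha * (U + B)) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * (U + B)) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(0)
x = alpha_stable(1.7, 0.0, 400_000, rng)
# Empirical characteristic function at l = 1; for beta = 0 the exact value
# is exp(-|l|^alpha) = exp(-1).
ecf = np.exp(1j * x).mean()
print(abs(ecf - np.exp(-1.0)))
```

An increment of the driving noise over a time step Δt is then obtained by scaling such a draw by Δt^{1/α}, using the self-similarity of the α-stable process.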
The variable θ(r; ΔL_t, µ_{t−}) is referred to as the Marcus map, and it gives the integrated effect of the jump ΔL_t over the infinitesimal time on which the jump occurs, which begins at r = 0 and ends at r = 1 (additional details in Appendix A.2.1). The difference T⁻¹(µ_t) − T⁻¹(µ_{t−}) can then be written in terms of θ(r; ΔL_t, µ_{t−}) as follows,

µ_t = µ_{t−} − ε^ρ b ΔL_t = Θ(1; ΔL_t, µ_{t−}) = T(θ(1; ΔL_t, T⁻¹(µ_{t−})))   (3.53)
⇒ T⁻¹(µ_t) − T⁻¹(µ_{t−}) = T⁻¹(Θ(1; ΔL_t, µ_{t−})) − X_{t−} = θ(1; ΔL_t, X_{t−}) − X_{t−}.   (3.54)

The difference θ(1; ΔL_t, X_{t−}) − X_{t−}, where θ satisfies (3.52), is the definition of the Marcus stochastic increment −ε^ρ b [T′(X_t)]⁻¹ dL_t^{(α,β)}. Then using (3.54) and T′ from (3.34) in (3.51) yields the weak approximation X_t for the dynamics of x_t,

dX_t = f(X_t) dt + ε^ρ b (f2(X_t)/(−g2(X_t))) dL_t^{(α,β)},   X0 = x0.   (3.55)

We note that (3.55), which we refer to as the (N+) approximation, is equivalent to the (L) approximation (3.32) when (3.3, 3.4) are a linear system of equations (i.e. f1, f2, g1, g2 are constant). Also note that the stochastic increment −ε^ρ b [f2(X_t)/g2(X_t)] dL_t^{(α,β)} can be written equivalently as ε^ρ b |f2(X_t)/g2(X_t)| dL_t^{(α,β*)}, where β* is given in (3.25) with f2 = f2(X_t), since any negative sign in the scale parameter can be absorbed into the skewness parameter. As in Section 3.2.2, if ρ > 0, then the (N+) approximation reduces to the (A) approximation (3.6) in the limit ε → 0, while the scale parameter diverges if ρ < 0 in the same limit.
In the case of small, non-zero ε, (3.55) provides a good approximation to x_t when ρ > 0, but if ρ < 0 then the approximation may not be appropriate, for the same reasons given in Section 3.2.2. If ρ = 0, the stochastic dynamics persist in the strict limit ε → 0.

When α = 2, the Marcus interpretation of (3.55) reduces to the Stratonovich interpretation (see Appendix A.2.1), and thus our approximation is consistent with the previously derived (N+) approximation in the case of Gaussian white noise stochastic forcing [2].

In the next section we show that the (N+) approximation is generally superior to the (L) approximation for nonlinear systems, as would be expected. However, we note that the (N+) approximation requires an additional constraint (i.e. sgn(f2(x)) is constant and not equal to zero for all x), and its numerical implementation is more complicated, often requiring simulation of the Marcus differential terms (see Appendix A.2.1), for which we do not have an efficient numerical method when the solution for θ from (3.52) cannot be expressed in closed form and β ≠ 0. Depending on the problem under consideration, for practical reasons the (N+) approximation may not be a preferable option over the (L) approximation.

Finally, we note the interesting fact that the drift and diffusion coefficients of the dynamics in the approximations are identical to those in the Gaussian case discussed in [63], up to a power of ε, regardless of the value of α.

3.3 Computational results

We apply the (L) and (N+) stochastic averaging approximations to four fast-slow dynamical systems with α-stable noise forcing and numerically simulate the systems to compare the stationary probability density functions and autocodifference functions of the approximations to those of the corresponding full systems. One of the systems we simulate is linear, while the other three include nonlinear functions f1, f2, g1 and g2. For this section, we set ρ = 0 (i.e. γ = 1 − 1/α).
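When the Marcus map cannot be solved in closed form, each jump requires integrating the inner ODE (3.52) over the auxiliary time r ∈ [0, 1]. The following is our own minimal sketch of that inner integration (the thesis's production scheme is in Appendix A.2.1), using classical RK4 and validated against the closed-form map for a linear coefficient g(θ) = cθ, for which θ(1) = θ(0) exp(c ΔL):

```python
import math

def marcus_jump(theta0, g, dL, n=100):
    """Integrate the Marcus map ODE d theta/dr = g(theta) * dL over r in [0, 1]
    with classical RK4; theta(1) - theta0 is the Marcus increment for jump dL."""
    h = 1.0 / n
    theta = theta0
    for _ in range(n):
        k1 = g(theta) * dL
        k2 = g(theta + 0.5 * h * k1) * dL
        k3 = g(theta + 0.5 * h * k2) * dL
        k4 = g(theta + h * k3) * dL
        theta += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return theta

# For a linear coefficient g(theta) = c * theta the Marcus map is exactly
# theta0 * exp(c * dL), which the numerical integration should reproduce.
theta_num = marcus_jump(2.0, lambda t: 0.5 * t, dL=0.8)
theta_exact = 2.0 * math.exp(0.5 * 0.8)
print(theta_num, theta_exact)
```

In a full simulation this inner solve would be applied at each (significant) jump of the driving α-stable process, which is precisely why the (N+) approximation is more expensive than the (L) approximation when no closed form is available.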
Our numerical scheme is outlined in Appendix A.2.1, and we have run additional simulations with even smaller simulation time steps to confirm that the simulations are well-resolved. All density plots in this paper were derived from simulations having the same sampling time step Δt = 10⁻² and number of data points, N = 10⁸. The rationale for our choice of time series length is given in Appendix A.2.2.

3.3.1 Linear system

We apply the (L) approximation from Section 3.2.2 to a linear system inspired by an example in [63],

dx_t = (−x_t + ε^{−γ} a y_t) dt,   x0 ∈ R,   (3.56)
dy_t = ε^{−1}(ε^γ c x_t − y_t) dt + ε^{γ−1} b dL_t^{(α,β)},   y0 = 0,   (3.57)

where 0 < ε ≪ 1 and (1 − ac) > 0. Using (3.6), (3.32), we write the (L) approximation for x_t in (3.56, 3.57). The (A) approximation, x̄, as in (3.6) is

dx̄_t = −(1 − ac) x̄_t dt,   x̄0 = x0,   ⇒   x̄_t = x0 exp[−(1 − ac)t].   (3.58)

Using (3.32) and (3.58), the (L) approximation for x_t is x̄_t + ξ_t, where x̄_t satisfies (3.58) and ξ_t satisfies

dξ_t = −(1 − ac) ξ_t dt + |a| b dL_t^{(α,βa)},   ξ0 = 0,   (3.59)

where βa = sgn(a) β. In the long-time limit t → ∞ (i.e. at stationarity), or in the case that x0 = 0, the (N+) approximation of x_t is identical to the (L) approximation, since (3.56, 3.57) is linear. We set x0 = 0 for all the numerical simulations of (3.56, 3.57) and (3.59), and hence x̄_t = 0.

In Fig. 3.1, we show example time series and the corresponding numerically simulated PDF of x_t and ξ_t with dynamics given by (3.56, 3.57) and (3.59), where α = 2 and α < 2. In Fig. 3.2, we display additional numerically simulated PDFs of x_t and ξ_t with different values of α < 2. We also consider cases with asymmetric noise (i.e. β ≠ 0) in Fig. 3.3. Comparing the numerically generated densities of ξ_t (3.59) with those of x_t from (3.56, 3.57), we see that the density obtained by the (L) approximation is almost indistinguishable from the density of the full system. In these figures, we also plot the numerically computed distribution as determined using the STBLPDF command in the STBL package for MATLAB [90].
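A minimal Euler discretization of the pair (3.56, 3.57) and of the reduced model (3.59) illustrates the comparison. This is our own sketch, not the resolved scheme of Appendix A.2.1: it assumes the fast equation takes the OU–Lévy form dy_t = ε^{−1}(ε^γ c x_t − y_t) dt + ε^{γ−1} b dL_t (our reading of (3.57)), uses the Fig. 3.1 parameters, and step sizes, run lengths and seeds are chosen only for speed:

```python
import numpy as np

def stable_sym(alpha, size, rng):
    # Chambers-Mallows-Stuck draw of standard symmetric alpha-stable variates
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))

alpha, a, b, c, eps = 1.9, 0.2, 0.7, 1.0, 0.01   # Fig. 3.1 parameters, beta = 0
gamma = 1 - 1 / alpha                            # rho = 0 case of Section 3.3
rng = np.random.default_rng(1)

# Full fast-slow system (3.56, 3.57): explicit Euler with dt << eps.
dt, n = 5e-4, 200_000
dL = eps ** (gamma - 1) * b * dt ** (1 / alpha) * stable_sym(alpha, n, rng)
x = np.empty(n)
xv = yv = 0.0
for i in range(n):
    xv += (-xv + eps ** (-gamma) * a * yv) * dt
    yv += (eps ** (gamma - 1) * c * xv - yv / eps) * dt + dL[i]
    x[i] = xv

# (L) approximation (3.59): d xi = -(1 - a c) xi dt + |a| b dL.
dt2, n2 = 1e-2, 50_000
dxi = abs(a) * b * dt2 ** (1 / alpha) * stable_sym(alpha, n2, rng)
xi = np.empty(n2)
v = 0.0
for i in range(n2):
    v += -(1 - a * c) * v * dt2 + dxi[i]
    xi[i] = v

# Compare robust spread measures of the two stationary samples.
iqr_full = np.subtract(*np.percentile(x[n // 10:], [75, 25]))
iqr_red = np.subtract(*np.percentile(xi[n2 // 10:], [75, 25]))
print(iqr_full, iqr_red)
```

With heavy-tailed forcing, robust statistics (median, interquartile range) are the appropriate summary; sample variances need not converge for α < 2.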
This code computes the PDF by numerically evaluating an integral representation of the α-stable distribution, as given in Theorem 1 of [64]. The densities estimated from numerically simulated time series are in excellent agreement with the numerically computed densities as determined from the integral representation.

We also compare the autocodifference functions (ACDs) [88] of the simulated trajectories for the full and averaged systems. This quantity is the generalization of the autocovariance function to variables that may not have a finite variance. In the case where α = 2, the autocovariance function is a standard measure of the degree of linear dependence between time-lagged states of a time series. However, when α < 2, the autocovariance function is not defined, and so instead we must use the analogous ACD to get a sense of this linear dependence. The ACD for a continuous-time stochastic process z_t is defined in terms of the CF of z_t,

A_z(τ) = log[E[exp(i(z_{t+τ} − z_t))]] − log[E[exp(iz_{t+τ})]] − log[E[exp(−iz_t)]],   (3.60)

where τ is the lag time. In the case of ξ_t given by (3.59), the ACD can be explicitly evaluated,

A_ξ(τ) = ((ab)^α / (α(1 − ac))) [1 + exp(−α(1 − ac)τ) − |1 − exp(−(1 − ac)τ)|^α]
 − iβ tan(πα/2) ((ab)^α / (α(1 − ac))) [(1 − exp(−α(1 − ac)τ)) − |1 − exp(−(1 − ac)τ)|^α].   (3.61)

We note that when β = 0, the imaginary part of the ACD is equal to zero and (3.61) reduces to the autocodifference function given in [88]. If α = 2, then A_ξ is identical to the autocovariance function of ξ. The existence of a nonzero imaginary part of the ACD for β ≠ 0 is a consequence of definition (3.60). It has no obvious interpretation in terms of the standard autocovariance function or the dynamics of the corresponding process.

We estimate the ACD for the sample trajectories used to generate the density estimates given in Fig. 3.2 and offer a brief summary of the estimation method in Appendix A.2.3. The results are displayed in Fig. 3.4.
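The definition (3.60) translates directly into an estimator built from empirical characteristic functions. Below is our own minimal version (the thesis's estimator is summarized in Appendix A.2.3), checked on a Gaussian OU process, for which the ACD reduces to the autocovariance (σ²/(2θ)) e^{−θτ}:

```python
import numpy as np

def acd(z, lag):
    """Empirical autocodifference (3.60) at the given integer lag."""
    z0, z1 = z[:-lag], z[lag:]
    return (np.log(np.mean(np.exp(1j * (z1 - z0))))
            - np.log(np.mean(np.exp(1j * z1)))
            - np.log(np.mean(np.exp(-1j * z0))))

# Gaussian OU test case: dz = -theta z dt + sigma dW, for which the ACD
# equals the autocovariance (sigma^2 / (2 theta)) exp(-theta tau).
theta, sigma, dt, n = 1.0, 1.0, 0.02, 200_000
rng = np.random.default_rng(2)
dW = np.sqrt(dt) * rng.standard_normal(n)
z = np.empty(n)
zv = 0.0
for i in range(n):
    zv += -theta * zv * dt + sigma * dW[i]
    z[i] = zv
est = acd(z, lag=int(1.0 / dt)).real   # lag time tau = 1
exact = 0.5 * np.exp(-1.0)
print(est, exact)
```

The same estimator applies unchanged to the α-stable simulations, since it never forms second moments; only bounded functions exp(i·z) of the data are averaged.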
We see that for ε ≪ 1, there is virtually no difference between the estimated ACDs of the full and averaged systems, and both are comparable to the analytically derived result (3.61). For ε = 0.1, small differences between the ACDs of the full and averaged systems are apparent, but not unexpected given the relatively large value of ε, which is on the edge of the asymptotic regime.

We also consider the case of skewed noise (not shown) for ε ≪ 1 and see that the moduli of the estimated ACDs of the full and reduced systems are indistinguishable for various values of β and ε = 0.01, with some slight differences between the full and averaged systems for ε = 0.1. Similar agreement is observed for the real and imaginary components of the ACD compared separately. From these results, we see that (3.59) is statistically similar to x_t of (3.56, 3.57) not only in stationary distribution, but also in autocodifference.

Figure 3.1: A comparison of the time series and simulated marginal distributions of the fast-slow system (3.56, 3.57) and the stochastically averaged system (3.59). Top-Left: A sample of the time series for the slow variable of both systems: x_t (solid blue) and x̄_t + ξ_t (dashed green). Top-Right: Normalized histogram-based estimates for the density for the slow variable. Blue line: full system. Green crosses: (L) approximation. Black squares: numerically evaluated probability density function for ξ from the (L) approximation (3.59) using the MATLAB package STBL, as described in Section 3.3. Parameters for this simulation are (a = 0.2, b = 0.7, c = 1, ε = 0.01, α = 2.0, x0 = 0). The bottom panels are as for the top, but with α-stable noise where α = 1.9, β = 0.

3.3.2 Nonlinear system 1: bilinear slow dynamics

The first nonlinear system we study is a fast-slow system with a bilinear term in the slow equation:

dx_t = (c − x_t + ε^{−γ} x_t y_t) dt,   x0 > 0,   (3.62)
dy_t = −(y_t/(εa)) dt + (b/ε^{1/α}) dL_t^{(α,0)},   y0 = 0.
(3.63)

where c, a > 0. The positive initial condition x0 > 0 and the dynamical form of the slow equation restrict the dynamics of x_t to positive values, since the increments dx_t/dt → c > 0 as x_t → 0⁺. We compute the (N+) approximation, X_t, using (3.55),

dX_t = (c − X_t) dt + ab X_t dL_t^{(α,0)},   X0 = x0.   (3.64)

For the sake of comparison, we also give the (L) approximation using (3.32),

x_t ≈ x̄_t + ξ_t,   where   dξ_t = −ξ_t dt + ab |x̄_t| dL_t^{(α,0)},   ξ0 = x0 − c,   x̄_t = c + (x0 − c) exp(−t).   (3.65)

We compare the densities of the full system (3.62, 3.63) and the approximations (3.64) and (3.65) via numerical simulation in Fig. 3.5, demonstrating the expected superiority of the (N+) approximation over the (L) approximation. The simulation methods are described in Appendix A.2.1 (Itô for (3.62, 3.63) and (3.65), and Marcus for (3.64)).

Figure 3.2: Normalized histogram-based estimates of the PDF with symmetric forcing (β = 0) of x_t from (3.56, 3.57) (blue line) and the (L) approximation ξ_t of (3.59) (green crosses). Left/Middle: As in Fig. 3.1 with α = 1.9/1.4. Right: As in Fig. 3.1, but with a = 0.7, b = 2, α = 1.7 and ε = 0.1. The numerically evaluated density using the MATLAB package STBL for ξ in (3.59) is shown by black squares.

Figure 3.3: Normalized histogram-based estimates of the PDF with β ≠ 0 of x_t from (3.56, 3.57) (blue line). Also plotted is the histogram for ξ_t of (3.59) (green crosses). For all figures: a = 0.2, b = 0.7, c = 1, α = 1.7. Left/Middle/Right: ε = 0.01/0.01/0.1, β = −0.5/1/0.75. The numerically evaluated density using the MATLAB package STBL for ξ in (3.59) is shown in black squares.
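For (3.64) the Marcus simulation is simple in the symmetric case because the noise coefficient is linear in X: the Marcus map for a jump ΔL sends X to X exp(ab ΔL), so each step can combine an Euler drift step with the exact jump map. The sketch below is our own minimal scheme (step size, run length and seed are our choices; the thesis's production scheme is in Appendix A.2.1), and it makes the positivity preservation explicit:

```python
import numpy as np

def stable_sym(alpha, size, rng):
    # Chambers-Mallows-Stuck draw of standard symmetric alpha-stable variates
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))

# Marcus scheme for (3.64): an Euler drift step followed by the exact jump
# map X -> X * exp(a b dL), the closed-form Marcus map for the linear noise
# coefficient a b X.  Positivity is preserved by construction.
alpha, a, b, c = 1.7, 1.0, 0.1, 1.0      # Fig. 3.5 parameters
dt, n = 1e-3, 200_000
rng = np.random.default_rng(3)
dL = dt ** (1 / alpha) * stable_sym(alpha, n, rng)
X = np.empty(n)
Xv = c
for i in range(n):
    Xv = (Xv + (c - Xv) * dt) * np.exp(a * b * dL[i])
    X[i] = Xv
print(X.min(), np.median(X))
```

Because every update multiplies a positive number by exp(·) > 0, the simulated (N+) trajectory can never cross X = 0, mirroring the full system.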
Particularly noteworthy is that the (N+) approximation, like the full system, does not cross the x = 0 boundary. Given the multiplicative nature of the noise of the (N+) approximation, it is clear that the positivity of the slow process is preserved in the approximation, as the dynamics of X_t (3.9) satisfy dX_t/dt → c > 0 for X_t → 0. The dynamics of x_t in the full system (3.62, 3.63) also satisfy the same positivity constraint, as shown above. In contrast, since the (L) approximation is a linear drift with additive noise, it is possible for the (L) approximation to cross zero. However, even though the (L) approximation does cross x = 0, it is a reasonable approximation of the full system near x̄_t (the mean of x_t), and thus may be suitable for local approximations. Although we have only presented results for β = 0, the approximations for β ≠ 0 have similar quality (see Appendix A.2.1 for a discussion of the numerical method).

Figure 3.4: Logarithmic plot of the numerically estimated ACD for x_t of (3.56, 3.57) (blue line) and ξ_t of (3.59) (green crosses), where β = 0. The analytically derived autocodifference A_ξ(τ) in (3.61) is also displayed (black squares). The parameters for the left, middle and right panels correspond to those from Fig. 3.2.

Figure 3.5: Normalized histogram-based estimates of the PDF of x_t for the full nonlinear system (3.62, 3.63) (blue line), the (L) approximation x̄_t + ξ_t (3.65) (green crosses) and the (N+) approximation X_t (3.64) (red circles). For all figures: a = 1, b = 0.1, c = 1, ε = 0.01 and x0 = c. Left/Middle/Right: α = 1.7/1.9/2.

Figure 3.6 shows that the ACD of the (L) approximation, which exhibits exponential decay, differs from that of the (N+) approximation and the full system.
The slope of the ACD for the (L) approximation for small time lags is shallower than that of (N+), indicating less rapid "memory loss" over short time intervals. However, for longer times, the (L) approximation appears to have more rapid memory loss than both x_t and X_t of the full and (N+) approximation systems. The autocodifference of the (N+) approximation is very similar to that of x_t of the full system, indicating that X_t is an excellent approximation to x_t in both stationary PDF and ACD.

For brevity, we do not give figures of the ACDs for the remaining examples, as they give essentially the same result as those in Fig. 3.6: the ACDs of the (N+) approximation and the full system correspond very well for sufficiently small ε, while the ACD of the (L) approximation differs.

Figure 3.6: Logarithmic plot of the estimated ACD for x_t of the full nonlinear system (3.62, 3.63) (blue line). Also plotted are the real parts of the autocodifferences of the (L) approximation, x̄_t + ξ_t, (3.65) (green crosses) and the (N+) approximation, X_t, (3.64) (red circles). The parameters for the 3 panels correspond to those from Fig. 3.5.

3.3.3 Nonlinear system 2: state-dependent reversion

For the second nonlinear system we consider, there is a nonlinear term in the equation for the fast variable y_t,

dx_t = (−x_t + ε^{−γ} y_t) dt,   x0 ∈ R,   (3.66)
dy_t = −((1 + |x_t|)/ε) y_t dt + (b/ε^{1/α}) dL_t^{(α,0)},   y0 = 0.   (3.67)

For larger values of |x|, there is stronger reversion to zero of the fast variable y_t. Computing the (N+) approximation using (3.55) and comparing to the (L) approximation (3.32) gives

(N+): x_t ≈ X_t, where dX_t = −X_t dt + (b/(1 + |X_t|)) dL_t^{(α,0)},   X0 = x0,   (3.68)

(L): x_t ≈ x̄_t + ξ_t, where x̄_t = x0 exp(−t),   dξ_t = −ξ_t dt + (b/(1 + |x̄_t|)) dL_t^{(α,0)},   ξ0 = 0.   (3.69)

Figure 3.7 shows the estimates of the stationary PDFs for (3.66, 3.67) and the approximations (3.68) and (3.69) obtained by simulation.
Once again, we see that the (N+) approximation gives an excellent approximation to the density of the full system. By comparison, the (L) approximation is poor, as it does not capture the bimodal nature of the full system in the given parameter regimes and overestimates the probability mass in the tails of the distribution. In failing to capture the bimodal nature of (3.66, 3.67), the (L) approximation does not perform well even as a local approximation of the full system near the mean, x = 0. The (N+) approximation accurately captures both the tails and the bimodal nature of the full system.

Figure 3.7: Normalized histogram-based estimates of the PDF of x_t for the full system (3.66, 3.67) (blue line), the (L) approximation x̄_t + ξ_t (3.69) (green crosses), and the (N+) approximation, X_t, (3.68) (red circles). For all plots, b = 2 and x0 = 0. Left: ε = 0.1, α = 1.6. Middle: ε = 0.01, α = 1.9. Right: ε = 0.01, α = 2. (Insets are on a linear scale.)

3.3.4 Nonlinear system 3: strong slow nonlinearity

In the third nonlinear model, the drift term of the slow variable contains a cubic term in x_t,

dx_t = (−x_t − x_t³ + ε^{−γ} y_t) dt,   x0 = 0,   (3.70)
dy_t = −(y_t/(εa)) dt + (b/ε^{1/α}) dL_t^{(α,0)},   y0 = 0,   (3.71)

where a > 0. The fast variable, which is an OULP, appears linearly in the slow equation. For x_t near 0, the system can be approximated as a multivariate OULP, as the cubic term in (3.70) is small. As before, we compute the (N+) approximation (3.55) and compare it to the (L) approximation (3.32),

(N+): x_t ≈ X_t, where dX_t = (−X_t − X_t³) dt + ab dL_t^{(α,0)},   X0 = x0,   (3.72)

(L): x_t ≈ x̄_t + ξ_t, where x̄_t = x0/√(x0²(e^{2t} − 1) + e^{2t}),   dξ_t = −ξ_t dt + ab dL_t^{(α,0)},   ξ0 = 0.   (3.73)

From the results in Fig. 3.8, we see that again the (N+) approximation captures the full nonlinearity of the slow dynamics (3.70), while the (L) approximation effectively gives a linearization of the drift dynamics near the deterministic equilibrium of (3.70) when y_t is at its mean value of 0.
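Since the (N+) approximation (3.72) has additive noise, no Marcus machinery is needed and a plain explicit scheme applies. In our own sketch below the drift step is "tamed" (a standard safeguard for explicit schemes with superlinear drift; this is our choice, not a method stated in the thesis) so that rare large jumps of dL cannot destabilize the cubic term. The noise amplitude ab = 0.3 follows Fig. 3.8; step size and seed are ours:

```python
import numpy as np

def stable_sym(alpha, size, rng):
    # Chambers-Mallows-Stuck draw of standard symmetric alpha-stable variates
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))

# (N+) approximation (3.72): additive alpha-stable noise, cubic drift.
# The drift step is tamed (divided by 1 + dt|drift|) so that rare large
# jumps cannot destabilize the explicit scheme.
alpha, a, b = 1.7, 1.0, 0.3
dt, n = 1e-3, 200_000
rng = np.random.default_rng(4)
dL = dt ** (1 / alpha) * stable_sym(alpha, n, rng)
X = np.empty(n)
Xv = 0.0
for i in range(n):
    drift = -Xv - Xv ** 3
    Xv += drift * dt / (1 + dt * abs(drift)) + a * b * dL[i]
    X[i] = Xv
print(np.median(X), np.percentile(np.abs(X), 99))
```

The cubic drift confines the heavy-tailed excursions: a large jump is pulled back rapidly, so even the extreme quantiles of the simulated trajectory remain moderate, unlike for a purely linear drift.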
Both approximations do well near the peak of the distribution at x = 0, but the (L) approximation deviates from the full system for |x| large, where the OULP approximation is not appropriate. The (N+) approximation, by comparison, is once again a good approximation over the whole domain.

Figure 3.8: Normalized histogram-based estimates of the PDF of the x_t-variable for (3.70, 3.71) (blue line), x̄_t + ξ_t for the system (3.73) (green crosses), and X_t for (3.72) (red circles). For all figures: σ = 0.3, a = 1, ε = 0.01, x0 = 0. Left/Middle/Right: α = 1.7/1.9/2.0.

3.4 Conclusions

In this paper, we determine the (L) and (N+) approximations for fast-slow systems forced with α-stable noise. These are reduced systems providing weak approximations for the slow variables, analogous to the established (L) and (N+) approximations for Gaussian noise forcing. Previous results for Gaussian noise have been based on the analysis of autocovariance [2, 63], while other analyses have used the characteristic functions obtained from the Fourier transform of the FKE for linear fast-slow systems [85].

Here we consider nonlinear systems, and since the autocovariance function is not available for systems with α-stable noise, we use the characteristic functions for appropriately transformed variables as the basis for the approximations. We restrict our attention to nonlinear models which are linear in the fast variable, as the Fourier analysis necessary for determining the CF is straightforward in this case.

For the (L) and (N+) approximations that we derive, we find that the drift and diffusion coefficients of the dynamics have the same form, up to a power of ε, as the coefficients obtained in the (L) and (N+) approximations in the Gaussian case. The difference is in the treatment of the noise; obviously, for systems with α-stable noise, the (L) and (N+) approximations also involve α-stable noise.
In addition, the general result for the (N+) approximation requires a Marcus interpretation for the α-stable case, while in the Gaussian case the interpretation of the noise terms is Stratonovich. For α = 2, corresponding to Gaussian noise with no jumps, the Marcus interpretation reduces to the Stratonovich, demonstrating consistency.

The derivation of the stochastic averaging methods presented here depends on the scaling relationships with regard to the timescale separation parameter ε. In particular, the value of the exponent γ in the dynamical equations of the unreduced system,

dx_t = f1(x_t) dt + ε^{−γ} f2(x_t) y_t dt,   dy_t = ε^{γ−1} [g1(x_t) + ε^{−γ} g2(x_t) y_t] dt + ε^{γ−1} b dL_t^{(α,β)},

is critically important. If γ < 1 − 1/α, the stochastic terms of the (L) and (N+) approximations, which have coefficients proportional to ε^{γ−(1−1/α)}, diverge in the formal ε → 0 limit. If γ > 1 − 1/α, the (L) and (N+) approximations are asymptotically equivalent to the (A) approximation as ε → 0, which does not include any stochastic effects. For non-zero ε = o(1) and γ > 1 − 1/α, the (L) and (N+) approximations are asymptotic approximations in the weak sense for the dynamics of x_t. While we also have this approximation for γ < 1 − 1/α, the dynamics of x_t become dominated by the noise in this case, and it becomes questionable whether x_t can be considered a slow process relative to y_t. For the critical value γ = 1 − 1/α, the intensity of the noise in the (L) and (N+) approximations does not depend on ε, and the stochastic corrections persist as the timescale separation parameter is made arbitrarily small.

We assumed that g2(x) < 0 is O(1) to ensure contracting dynamics, so that the distribution of y conditioned on x reaches stationary behaviour for y on a fast time scale.
We expect that other systems with g2(x) > 0 for some x also have this property, but we have not considered them here in order to concentrate on developing the approach for deriving the reduced system, rather than on confirming that conditionally stationary behaviour of y is realized on a fast time scale. In addition, we expect that caution is required in the cases where the distribution of y conditioned on x is multimodal (e.g. multistability for y in the absence of noise).

We compare numerical results for several fast-slow systems (both linear and nonlinear) and their corresponding (L) and (N+) approximations for the slow variables, computing both the long-time probability density functions and the autocodifference functions (analogous to the autocovariance functions for α-stable processes). We observe good agreement between the full systems and the approximations, with some expected limitations. The (L) approximation gives a good approximation for linear fast-slow systems, as well as a local approximation near the mean when a linear approximation is valid there. The (N+) approximation captures the nonlinear behaviour of both the long-time probability density functions and the autocodifference functions, showing good agreement with the full system over all values of the slow variable. However, the (N+) approximation is more difficult to simulate, primarily because of the Marcus stochastic terms. Considerations of numerical stability of nonlinear systems may also be an issue, requiring higher-order methods for accuracy, as discussed in Appendix A.2.1. The choice to use the (L) or (N+) approximation depends on the balance of accuracy, the range of values where the approximation is needed, and the amount of analytical and computational work required.

Our results suggest a number of valuable areas for future research. In our analysis, we required that α > 1 so that the mean of the fast process is defined. We show numerical simulations in Fig.
3.9 that indicate cases where the averaging approximations are reasonable for α < 1. This suggests that the (L) and (N+) approximations in the case of α < 1 would be of interest for future study. The analysis of that case would likely require the use of a location parameter instead of the mean. The development of a stochastic reduction process from alternative perspectives (e.g. operator theory) would require substantial work beyond the scope of this paper, given the lack of higher-order moments for the processes considered here. A comparison with such approaches would be valuable, and we leave it for future work. Another area of extension would be the consideration of systems with more general nonlinearities, including nonlinearities in the fast variables. In that case an explicit analytical result may not be available via Fourier analysis, but we expect that the densities for some systems that are nonlinear in y are similar in character to those from systems that are linear in y. In Fig. 3.9, right panel, we show an example that is quadratic in y, demonstrating that the numerical approximation for the density is similar in character to that of our nonlinear system 1 with bilinear form. It may then be possible to provide a (semi-)analytical expression for the reduced system, but the nonlinearities would require methods beyond those used in the present study. Furthermore, in order to consider (N+) approximations for systems with asymmetric α-stable noise where the Marcus map cannot be expressed in closed form, a numerical method needs to be developed. A computationally expensive approach would be simply to treat every stochastic increment as a jump and use the numerical method outlined in Appendix A.2.1, but ideally one would prefer to have a more efficient method. Extending the analysis to systems where the fast and slow subsystems are multi-dimensional, rather than scalar, is another natural direction for our research, although this may not be a trivial endeavour.
A detailed comparison with strong pathwise approximations, such as [96], could provide an interesting study in which to explore weak convergence rates and the connection between the two distinct types of averaging.

Figure 3.9: Left & Middle: Normalized histogram-based estimates of the PDF with symmetric α-stable forcing for α < 1, which is not explicitly covered in our analysis. (Left: As in Fig. 3.1 with (a, b, c, ε, α) = (0.2, 0.7, 1, 10⁻², 0.7). Middle: As in Fig. 3.7 with (ε, α) = (10⁻², 0.8).) Right: Estimates of the PDF for x_t where dx_t = (c − x_t + ε^{−γ} x_t y_t [1 + ζ y_t]) dt and y_t satisfies (3.63), (a, b, c, ε, α) = (1, 0.1, 1, 10⁻², 1.7). The values of ζ are indicated in the legend.

Chapter 4

The appearance of α-stable dynamics via fast processes forced with correlated additive and multiplicative noise

4.1 Introduction

Stochastic differential equations with multiplicative noise have many applications to modelling situations where random fluctuations of the dynamical system depend on the current state of the system. A prime motivating example is given by a discussion of how the dynamics of various projections of fluid mechanical models (e.g. principal components, Fourier modes) can be described by systems that are quadratically nonlinear [52]. Further work shows that these systems can be further approximated by stochastic processes that can be written as a linear stochastic differential equation where the stochastic forcing is given by correlated additive and multiplicative (CAM) Gaussian white noise [79].
Other problems, such as autocatalytic chemical reaction modelling as described in [81] and noise-induced phase transitions [36], also contain processes that can be modelled using linear systems with multiplicative noise driving.

The process y_t having linear drift and forced with correlated additive and multiplicative Gaussian white noise is defined in [73, 79] and is referred to as a linear CAM noise process:

dy_t = (L + E²/2) y_t dt + (E y_t + g) dW_{1,t} + b dW_{2,t},   t ≥ 0,   (4.1)

where E, g, b, and L < 0 are real constants and dW_{1,t}, dW_{2,t} are independent Gaussian white noise processes (E[dW_{p,t}] = 0, E[dW_{p,s} dW_{q,t}] = 1_{p=q} δ(t − s) dt for p, q = 1, 2). The initial condition y0 is known. Although there is no correlation between the additive noise terms and the multiplicative noise terms when g = 0, we still refer to y_t as a linear CAM noise process in this case. In certain parameter regimes, the stationary distribution for y_t has infinite variance (i.e. infinite second moment).

The fact that infinite variance processes can arise from simple multiplicative noise dynamics like (4.1) suggests that multiplicative dynamics could be a potential cause for the appearance of α-stable noise forcing in the long-term climate record [19]. The work in [73] demonstrates that linear CAM noise processes are not equivalent to α-stable noise processes, as the statistics of the two are distinctly different. However, while linear CAM noise processes are not strictly equivalent to α-stable noise processes, they may behave very similarly in certain limits. We are guided to this conclusion by the Generalized Central Limit Theorem (GCLT), which states that an appropriately scaled and shifted sum of random variables with infinite variance converges weakly to an α-stable random variable [24]. In this paper, we explore and quantify those limits and offer simplified dynamics for systems experiencing forcing by a fast linear CAM noise process.
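For intuition about the dynamics (4.1), the CAM process can be integrated directly with an Euler–Maruyama step. The following is our own minimal sketch (parameters from the left panel of Figure 4.1; step size, horizon and seed are our choices):

```python
import numpy as np

# Euler-Maruyama for the linear CAM noise process (4.1):
# dy = (L + E^2/2) y dt + (E y + g) dW1 + b dW2.
L, E, g, b = -1.0, 1.0541, 0.0, 1.0      # left panel of Figure 4.1
nu = -(L / E ** 2 + 0.5)                 # = 0.4, heavy-tailed regime
dt, n = 2e-3, 400_000
rng = np.random.default_rng(5)
dW1 = np.sqrt(dt) * rng.standard_normal(n)
dW2 = np.sqrt(dt) * rng.standard_normal(n)
y = np.empty(n)
yv = 0.0
for i in range(n):
    yv += (L + E ** 2 / 2) * yv * dt + (E * yv + g) * dW1[i] + b * dW2[i]
    y[i] = yv
print(nu, np.median(y))
```

With g = 0 the stationary density is symmetric, so robust statistics such as the sample median remain near zero even though the sample variance does not converge in this heavy-tailed regime.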
In particular, we study the stochastic dynamical system x_t driven by a fast linear CAM noise process, y_{t/ε}, where x_t satisfies

dx_t = f1(x_t) dt + ε^{−ρ} f2(x_t) y_{t/ε} dt,   t ≥ 0,   (4.2)

where 0 < ε ≪ 1, f1, f2 are functions with f2(x̃) ≠ 0 for any x̃ in the domain of x_t, ρ is a constant, and x0 is the known initial condition.

Our goal is to derive a one-dimensional stochastic dynamical system for a variable X_t that has approximately the same weak properties as x_t (i.e. the same distribution and temporal characteristics). In Section 4.2, we give some properties of the linear CAM process y_t and discuss its relationship with α-stable distributions through the GCLT. In Section 4.3, we derive the stochastic averaging approximation for (4.1), (4.2) by approximating y_{t/ε} with an Ornstein-Uhlenbeck-Lévy process, for which we have previously established (L) and (N+) stochastic averaging approximations in Chapter 3. We focus on the (N+) approximation (3.55), giving us a one-dimensional process driven by α-stable white noise forcing. As a minimum requirement, the characteristic time scale of y_{t/ε} must be at least an order of magnitude faster than x_t for the approximation by X_t to be valid. However, the exact time scale separation needed depends on the parameters of the CAM noise process, as the rate of convergence to the limiting α-stable distribution depends on the tail behaviour of the random variables being summed. In Section 4.4, we apply our approximation to one linear system and two nonlinear systems of the form (4.1), (4.2), illustrating the validity of the approximation. We outline our conclusions in Section 4.5.

4.2 Properties of the linear CAM noise process and their relationship to α-stable distributions

The univariate linear CAM noise process y_t having dynamics given by (4.1) is the principal subject of study in [73, 79]. Despite its relatively simple appearance, the process y_t can display a range of different behaviours.
While it has the behaviour of an Ornstein-Uhlenbeck process (OUP) as E → 0 and of a geometric Brownian motion when E/g, E/b → ∞, we are concerned with the process in parameter regimes in which the dynamics differ considerably from these limiting cases. In particular, we are interested in the case in which processes described by (4.1) have stationary statistics but do not possess a finite variance. Some example time series for y_t are plotted in Figure 4.1 to illustrate the behaviours that this process can exhibit in this parameter regime. In this section, we state some basic properties of the CAM noise process (many of which are derived in [73, 79]), state our parameter domain of interest, and discuss the relationship between the CAM noise distribution and the α-stable distribution within this parameter domain.

Figure 4.1: Sample time series of y_t. Left: (L, E, g, b) = (−1, 1.0541, 0, 1), corresponding to (α*, β*) = (1.8, 0). Right: (L, E, g, b) = (−1, 1.118, 0.6, 0.4), corresponding to (α*, β*) = (1.6, 0.89).

4.2.1 Properties of the linear CAM noise process, y_t

The time-dependent probability density function (PDF) of y_t is denoted in this work by u(y, t). The Forward Kolmogorov equation (FKE) for the process y_t associated with the SDE (4.1) is given by

∂u/∂t = Au,   Au = −(L + E²/2) ∂/∂y (yu) + (1/2) ∂²/∂y² ([(Ey + g)² + b²] u).   (4.3)

Many of the basic properties of y_t can be determined by studying the FKE (4.3) directly.

Stationary PDF

By solving the time-independent FKE, Au = 0, one can determine the stationary PDF (equivalently, the eigenfunction for the operator A with corresponding eigenvalue 0), which we denote as u^(0). This function is given in [73]:

u^(0)(y) = (1/N) [(Ey + g)² + b²]^{−(ν+1)} exp((2gν/b) arctan((Ey + g)/b)),   ν = −(L/E² + 1/2).
(4.4)

In (4.4), N is the normalization constant such that ∫_ℝ u^(0)(y) dy = 1, shown in [73] to satisfy

N = (1/(b^{2ν+1} E)) ∫_{−π/2}^{π/2} exp(2gνξ/b) (1 + tan²ξ)^{−ν} dξ = 2π (2b)^{−(2ν+1)} Γ(2ν + 1) / [ E Γ(ν + 1 − i2gν/b) Γ(ν + 1 + i2gν/b) ], (4.5)

where Γ is the complex Gamma function. Additionally, one can determine the mean of the distribution u^(0) by evaluating ∫ y u^(0)(y) dy. Multiplying the equation Au = 0 by y and integrating over all values of y ∈ ℝ, we obtain

(L + E²/2) ∫_ℝ y u^(0)(y) dy = 0. (4.6)

The stationary mean of y_t is therefore equal to zero. We note that the integral in (4.6) can only be evaluated if ν > 0. If ν ≤ 0, the mean of y_t does not exist. For the remainder of this chapter, we assume that ν > 0.

The distribution of y_t can have heavy tails

It is straightforward to show that the stationary PDF of y_t has tails which decay according to a power law, making them heavy-tailed [73]. This result follows from taking the limits y → ∞ and y → −∞ of u^(0) (4.4), giving us

u^(0)(y) ∼ h(sgn(y)) |y|^{−2(ν+1)}, where h(s) = exp(πgνs/b) / (N E^{2(ν+1)}), as |y| → ∞. (4.7)

While the fact that u^(0) is heavy-tailed does not necessarily imply that it has infinite variance, it does imply that moments of u^(0) beyond a certain order do not exist. Under the finite-mean restriction ν > 0, the tails decay as |y|^p with p < −2. The stationary distribution does not have a finite variance when 2(ν + 1) ≤ 3. The infinite variance case is the one which relates to non-Gaussian α-stable processes, and so we restrict ourselves to the parameter space of the linear CAM noise process where 0 < ν < 1/2. We ignore the singular case ν = 1/2, as the convergence results quoted in Section 4.2.2 and in the remainder of this chapter do not apply there.

Serial dependence of y_t

We refer to the dependence of y_t on y_s, s ≤ t, as the serial dependence of y_t (or, less formally, the memory of y_t). The serial dependence of y_t is difficult to study analytically due to the multiplicative noise term within its dynamics.
As a result of the multiplicative noise term in the dynamics of y_t, the spectrum of the FKE operator A has both discrete and continuous components [81], making analysis of the temporal character of y_t significantly more challenging than if A had only a discrete spectrum. The memory of continuous-time stochastic processes with finite variance can be characterized by the autocovariance function, which at lag time 0 is equal to the variance of the process. However, since the distribution of y_t has infinite variance, it is not possible to define an autocovariance (or autocorrelation) function for it. Thus, we consider an alternative characterization of serial dependence known as the autocodifference function (ACD function), which is defined in terms of characteristic functions of the process and time-shifted versions of it [88, 95]. Like the autocovariance function for the OUP, the ACD function for the Ornstein–Uhlenbeck–Lévy process (OULP) is an exponentially decaying function of lag time [88]. Numerical estimates of the ACD function for the linear CAM process y_t suggest a more complicated serial dependence relationship between subsequent measurements of y_t. As we show below in Figure 4.2 via numerical calculation (using the method outlined in Appendix A.2.3), the multiplicative noise term in the dynamics of y_t complicates its autocodifference structure, giving deviations from exponentially decaying behaviour at short lag times.

4.2.2 Infinite variance distributions and their relationship to α-stable distributions

The GCLT states that a scaled and shifted sum of independent, identically distributed (iid) random variables R_j with density u_R(r) ∝ |r|^{−(α+1)}, α ∈ (0, 2), as |r| → ∞ (i.e.
second moment does not exist, mean does not exist when α ≤ 1) will converge in distribution to an α-stable random variable (rather than a Gaussian random variable) with stability index α and skewness and scale parameters β and σ [24].

[Figure 4.2: Logarithmic plots of the numerically estimated ACD function of y_t. Left: (L, E, g, b) = (−1, 1.0541, 0, 1), ν = 0.4. Right: (L, E, g, b) = (−1, 1.118, 0.6, 0.4), ν = 0.3.]

Specifically, the following holds:

(1/n^{1/α}) Σ_{j=1}^n (R_j − R̄_n) →_D S_α(β, σ), (4.8)

where the notation "→_D" denotes convergence in distribution and S_α(β, σ) denotes the α-stable random variable with stability index α, skewness parameter β, and scale parameter σ. If α > 1, the value of R̄_n is the first moment of R_j (the mean); otherwise it is a more complicated function of the distribution of R_j [42]. The value of the skewness parameter β is determined by the relative scaling of the two tails of the distribution u_R, and the scale parameter σ is determined by the magnitude of u_R in the tails.

As 0 < ν < 1/2, u^(0) has infinite variance, as discussed in Section 4.2.1. It follows that a sum of independent random variables {R_j}_{j=1}^n drawn from the distribution u^(0) given by (4.4) converges in distribution to an α-stable random variable for sufficiently large n:

(1/n^{1/α*}) Σ_{j=1}^n R_j →_D S_{α*}(β*, σ*) as n → ∞. (4.9)

The specific values of the parameters α*, β*, and σ* of the attracting distribution can be determined from analysis of the PDF u^(0) (as discussed in [42] and Appendix B.1.1 of this chapter). The stability parameter of the attracting distribution is

α* = 2ν + 1 = −2L/E². (4.10)

The skewness parameter is determined by the normalized difference between the coefficients of the tails, β* = (h_+ − h_−)/(h_+ + h_−), where h_± = h(±1) with h(s) as given in (4.7). Simplifying this expression for β* gives

β* = tanh(πgν/b).
(4.11)

The scale parameter is given by

σ* = [ (h_+ + h_−) Γ(1 − α*) cos(πα*/2) / α* ]^{1/α*} = [ 2 cosh(πgν/b) Γ(1 − α*) cos(πα*/2) / (E^{α*+1} α* N) ]^{1/α*} (4.12)

   = [ Γ(1 − α*/2) / Γ(1 + α*/2) ]^{1/α*} (b/(2E))  if g = 0. (4.13)

The decay rate of the tails of the CAM noise distribution depends only on the parameters L and E, and so the requirement that 1 < α* < 2 implies that E²/2 < (−L) < E². The difference between the PDF for the sum of {R_j}_{j=1}^n and the attracting α-stable PDF is O(n^{1−2/α}) for α ∈ (1, 2). For values of the parameters such that α* ≲ 2 (i.e. (−L) ≲ E²), the rate of convergence to an α-stable distribution for the sum of the R_j is quite slow, as the exponent 1 − 2/α is close to zero, requiring a very large value of n to achieve convergence. For values of α* away from 2 (i.e. E²/2 < (−L) ≪ E²), convergence is quicker, since the negative exponent 1 − 2/α is farther from zero [42].

4.3 Reduction of the multiple time scale system forced by a linear CAM process

As discussed in the introduction, the goal of the present analysis is to derive the stochastic dynamics of a random variable X_t whose weak properties converge to those of the slow variable x_t for the system (4.1), (4.2) as ε → 0. In this section, we outline the derivation and conditions for the validity of this approximation and give the specific form of our approximation below.

In Section 4.2.2, we discussed how a sum of random variables drawn from the stationary distribution (4.4) of a linear CAM noise process converges in distribution to an α-stable random variable, and presented the values of the parameters of the attracting stable distribution, α*, β*, and σ*. Given this relationship between the stationary distribution of sums of realizations of y_t and α-stable distributions, we expect that the integral of y_{t/ε} converges in the weak sense to the integral of an α-stable process with stability index α* and skewness parameter β*, but with an unknown scale parameter yet to be determined.
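Before proceeding with the reduction, the parameter maps (4.10)–(4.13) of Section 4.2.2 are easy to check numerically. The sketch below computes (α*, β*) from (L, E, g, b); for g = 0 it also evaluates N by reducing (4.5) to real Gamma functions via the Legendre duplication formula, and verifies that the general scale formula (4.12) agrees with the simplification (4.13). The parameter values used are those of Figure 4.1.

```python
import numpy as np
from math import gamma, tanh, cos, pi, sqrt

def stable_params(L, E, g, b):
    """(alpha*, beta*) of the attracting stable law, from (4.10)-(4.11)."""
    nu = -(L / E**2 + 0.5)
    return 2 * nu + 1, tanh(pi * g * nu / b)

def norm_const_g0(L, E, b):
    """Normalization constant N of (4.5) for g = 0, reduced to real
    Gamma functions via the Legendre duplication formula."""
    nu = -(L / E**2 + 0.5)
    return sqrt(pi) * gamma(nu + 0.5) / (E * b**(2 * nu + 1) * gamma(nu + 1))

def sigma_star_g0_general(L, E, b):
    """Scale parameter via the general formula (4.12) with g = 0,
    using h+ + h- = 2 / (N E^(alpha*+1))."""
    alpha, _ = stable_params(L, E, 0.0, b)
    N = norm_const_g0(L, E, b)
    hsum = 2.0 / (N * E**(alpha + 1))
    return (hsum * gamma(1 - alpha) * cos(pi * alpha / 2) / alpha)**(1 / alpha)

def sigma_star_g0_simple(L, E, b):
    """Scale parameter via the simplified expression (4.13)."""
    alpha, _ = stable_params(L, E, 0.0, b)
    return (gamma(1 - alpha / 2) / gamma(1 + alpha / 2))**(1 / alpha) * b / (2 * E)
```

For the two parameter sets of Figure 4.1 this yields (α*, β*) ≈ (1.8, 0) and (1.6, 0.89), and the two expressions for σ* agree to machine precision.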
As a result of this, we also expect that the system (4.1), (4.2) can be reduced by replacing the fast driving linear CAM process y_{t/ε} with this α-stable noise process such that the weak properties of x_t are maintained (at least approximately), and that in the limit of infinitely large time scale separation (i.e. ε → 0) our stochastic averaging result from Chapter 3 can be applied.

Previous stochastic averaging results for fast-slow systems with finite variance, such as [63], as well as the results of Chapter 3, indicate that, in addition to the properties of the stationary distribution of the fast process, the details of its serial dependence affect the specific form of the stochastic averaging approximation. Specifically, when the fast process is Gaussian, the standard deviation of the approximating Gaussian white noise term is proportional to the global integral of the autocovariance function of the fast process [63]. Similarly, if the fast process is α-stable, the scale parameter of the approximating α-stable white noise process is proportional to the characteristic decay time scale of the fast process, as shown in Chapter 3: the function g_2 appears in the scaling factor for both averaging approximations, (3.32) and (3.55). Thus, we expect that any stochastic forcing replacing the fast CAM noise process will depend on the memory of y_t.

Our goal in this section is to derive a one-dimensional stochastic dynamical system, forced with an appropriate α-stable white noise process, such that this reduced system has approximately the same weak properties as x_t. Since y_{t/ε} is a fast, time-correlated, infinite-variance stochastic process having power law tails, we consider a statistically similar fast OULP, z_{t/ε} (which is also time-correlated and has infinite-variance, power law tails), for which we have an effective stochastic averaging approximation derived in Chapter 3.
We consider the conditions under which y_{t/ε} can be approximated by an OULP in terms of its integrated influence on x_t, and so the basis of our reduction is not the properties of y_{t/ε} and z_{t/ε} themselves, but their time integrals ∫_0^t y_{s/ε} ds and ∫_0^t z_{s/ε} ds. To derive our stochastic averaging approximation, we first reiterate a key calculation from Section 3.2.2 to derive the statistical properties of ∫_0^t z_{s/ε} ds, showing that it is distributed according to an α-stable law. Next, we determine the requirements for the approximation of the integral of the fast linear CAM process, ∫_0^t y_{s/ε} ds, by an α-stable random variable to be valid, and then derive the appropriate values of the parameters for the integral of the fast OULP, ∫_0^t z_{s/ε} ds, to be an effective approximation for ∫_0^t y_{s/ε} ds. We conclude this section by using the results of Chapter 3 to complete our stochastic averaging approximation for x_t.

4.3.1 The distribution for the integral of an OULP

Consider an OULP, z_t, with dynamics

dz_t = −θ z_t dt + σ* dL_t^{(α*,β*)}, θ > 0. (4.14)

Using the approach developed in detail in Chapter 3, we derive the characteristic function for the integral of the OULP. We let dv_t = z_{t/ε} dt (so v_t = ∫_0^t z_{s/ε} ds) and let w(v, z, t) be the time-dependent PDF of the joint (v_t, z_{t/ε}) system, which satisfies the FKE

∂w/∂t = (ε^{−1}θ ∂/∂z − ∂/∂v)(zw) + ε^{−1}(σ*)^{α*} D_z^{(α*,β*)} w,  w(v, z, 0) = δ(v) δ(z − z_0), t > 0. (4.15)

The operator D_z^{(α,β)} is the fractional differentiation operator that is defined in terms of a Fourier transform with respect to z, corresponding to α-stable noise driving with stability index α and skewness parameter β, as given in (3.16) [6].
Taking the Fourier transform, F, of (4.15) gives a quasi-linear partial differential equation whose solution is the joint characteristic function of the (v_t, z_{t/ε}) system, which we denote by ψ_{v,z}(m, k, t) = F[w](m, k, t) = ∫∫_{ℝ²} exp(ikz + imv) w(v, z, t) dv dz:

∂ψ_{v,z}/∂t + (ε^{−1}θk − m) ∂ψ_{v,z}/∂k = −ε^{−1}(σ*)^{α*} |k|^{α*} Ξ(k; α*, β*) ψ_{v,z},  ψ_{v,z}(m, k, 0) = exp(ikz_0), (4.16)

where Ξ(Λ; α, β) = 1 − iβ sgn(Λ) tan(πα/2), as defined in (3.2). Solving (4.16) using the method of characteristics gives the solution for ψ_{v,z} at time t:

ψ_{v,z}(m, k, t) = exp( ikz_0 e^{−θt/ε} + imz_0 (ε/θ)(1 − e^{−θt/ε}) − ε^{−1}(σ*)^{α*} ∫_0^t |Λ(r)|^{α*} Ξ(Λ(r); α*, β*) dr ), (4.17)

where Λ(r) = εm/θ + (k − εm/θ) e^{−θr/ε}. Following the approach used to derive (3.24) in Chapter 3, we derive the following asymptotic result for the integral term in (4.17) for t = O(1):

ε^{−1}(σ*)^{α*} ∫_0^t |Λ(r)|^{α*} Ξ(Λ(r); α*, β*) dr = ((σ*)^{α*}/(α*θ)) |k|^{α*} Ξ(k; α*, β*) + ε^{α*−1}(σ*)^{α*} t θ^{−α*} |m|^{α*} Ξ(m; α*, β*) + O(ε). (4.18)

Note that the leading order k-dependent term on the right-hand side of (4.18) gives the component of the characteristic function corresponding to the stationary distribution for z_t. More corrections can easily be added to the right-hand side of (4.18), but they are unnecessary for our purposes, as they vanish faster in the limit ε → 0 than the terms given, and we are only concerned with the asymptotic behaviour of ψ_{v,z}(m, k, t). This asymptotic result allows us to factor the joint characteristic function for z_t and v_t into a function of k and a function of m in the limit of sufficiently large θt/ε, similar to equations (3.27) and (3.30) in Chapter 3: ψ_{v,z}(m, k, t) = ψ_z(k) ψ_v(m, t), where ψ_z(k) is the approximate characteristic function for z_t and

ψ_v(m, t) = exp[ imz_0 (ε/θ)(1 − e^{−θt/ε}) − ε^{α*−1}(σ*)^{α*} t θ^{−α*} |m|^{α*} Ξ(m; α*, β*) ] (4.19)

is the approximate characteristic function for v_t = ∫_0^t z_{s/ε} ds. The functional form of ψ_v in (4.19) is identical to that of the characteristic function for an α-stable random variable with mean z_0 (ε/θ)(1 − e^{−θt/ε}) and scale parameter ε^{γ*}(σ*/θ) t^{1/α*}, where γ* = 1 − 1/α*.
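The α-stable limit for v_t can also be explored by direct simulation. The sketch below is a minimal implementation, assuming the standard Chambers–Mallows–Stuck sampler for symmetric (β = 0) stable variates and a simple Euler scheme for the OULP (4.14), in which the increment of L over a step dt has scale dt^{1/α}.

```python
import numpy as np

def stable_rvs(alpha, size, rng):
    """Standard symmetric alpha-stable variates (beta = 0, unit scale)
    via the Chambers-Mallows-Stuck method."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U)**(1 / alpha)
            * (np.cos((1 - alpha) * U) / W)**((1 - alpha) / alpha))

def simulate_oulp(alpha, theta, sigma, dt, n_steps, z0=0.0, seed=0):
    """Euler scheme for the OULP (4.14), dz = -theta z dt + sigma dL;
    the stable increment over a step dt has scale sigma * dt**(1/alpha)."""
    rng = np.random.default_rng(seed)
    dL = dt**(1 / alpha) * stable_rvs(alpha, n_steps, rng)
    z = np.empty(n_steps + 1)
    z[0] = z0
    for n in range(n_steps):
        z[n + 1] = z[n] - theta * z[n] * dt + sigma * dL[n]
    return z
```

Integrating a path of z_{t/ε} in time then yields samples of v_t whose empirical quantiles can be compared with those of S_{α*}(0, ε^{γ*} t^{1/α*} σ*/θ) from (4.19).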
As convergence in characteristic function implies convergence in distribution, the asymptotic distribution for v_t is α-stable with the same mean and scale parameter. The mean of z_0, assuming it is drawn from the stationary distribution of z_t, is equal to zero, and hence the unconditional mean of v_t (i.e. not conditioned on z_0) is equal to zero, so that

v_t = ∫_0^t z_{s/ε} ds →_D S_{α*}(β*, ε^{γ*} t^{1/α*} θ^{−1} σ*) (4.20)

for small ε and time t ≥ 0. In matching the statistics of ∫_0^t z_{s/ε} ds and ∫_0^t y_{s/ε} ds, the parameter θ can be estimated as a measure of the serial dependence of the CAM process. In Section 4.3.3, we compare the statistics of v_t = ∫_0^t z_{s/ε} ds with those of ∫_0^t y_{s/ε} ds to obtain the appropriate value for θ.

4.3.2 Convergence of CAM noise to an α-stable noise process

To make the comparison between the behaviour of y_{t/ε} and z_{t/ε}, we consider the conditions under which the integral of y_{t/ε} has the same statistical features as that of z_{t/ε} (4.20). We partition the integral of y_{t/ε} into N_Y increments, each of length Δ, denoting the increments by {Y_j}_{j=1}^{N_Y}:

∫_0^T y_{s/ε} ds = Σ_{j=1}^{N_Y} Y_j,  Y_j = ∫_{(j−1)Δ}^{jΔ} y_{s/ε} ds = ε ∫_{(j−1)Δ/ε}^{jΔ/ε} y_ŝ dŝ, (4.21)

where T = N_Y Δ. Based on the discussion in Section 4.2.2 of how sums of random variables with distribution u^(0) converge to α-stable random variables, we make the following ansatz: for any fixed non-zero value Δ and sufficiently small ε,

Y_j ∼_D Y(α*), (4.22)

where the PDF of the distribution Y is given by u_Y, with

u_Y(r) ∼ q_− |r|^{−(α*+1)} for r < −a and u_Y(r) ∼ q_+ r^{−(α*+1)} for r > a, for some a > 0 and q_+, q_− satisfying (q_+ − q_−)/(q_+ + q_−) = (h_+ − h_−)/(h_+ + h_−) = β* and q_+, q_− ∝ ε^{α*−1} Δ, (4.23)

where h_+, h_− are given in (4.7) (i.e. u_Y has infinite variance and power law tail decay with the same relative tail weights as those of the stationary distribution u^(0) of y_t). We assert that the memory of y_t does not affect the tail behaviour or the degree of asymmetry of Y_j.
We emphasize here that our ansatz does not include any assumptions about the exact values of q_+ and q_−, only their relative values and their dependence on ε and Δ, which is justified at the end of this section by an analysis of the integral (4.21) and a corresponding numerical calculation.

By this ansatz, a large enough sum of independent instances of Y_j distributed according to (4.22) converges in distribution to an α-stable random variable with stability index α* and skewness parameter β*, by the GCLT. However, in order for us to make this claim under the GCLT, the following two conditions must be satisfied.

a) The value of Δ must be large enough such that Y_j and Y_{j+1} are (effectively) independent for all j ∈ {1, 2, . . . , N_Y − 1}.

We need effective independence of the summands Y_j for the GCLT to apply when computing the sum of the Y_j, and hence we need to ensure that Y_j and Y_{j+1} are (asymptotically) independent for every j in the sum. Intuitively, we expect this to be the case provided that Δ/ε ≫ τ_y, where τ_y is a characteristic memory time scale of y_t, such as the e-folding time of the ACD function: ACD_y(τ_y) = e^{−1} ACD_y(0). To establish independence between two random variables P, Q with infinite variance, it is often sufficient to consider only the codifference, CD, of the random variables: CD(P, Q) = log(E[exp(iP − iQ)]) − log(E[exp(iP)]) − log(E[exp(−iQ)]). In the case 1 < α < 2, having codifference equal to zero is not sufficient for the independence of two symmetric α-stable random variables [32, 95], but having both the codifference and the cosum equal to zero is sufficient. The cosum of two random variables, CS(P, Q), can be defined in terms of the codifference: CS(P, Q) = CD(P, −Q). Hence, to determine how large Δ/ε must be such that subsequent values of Y_j can be considered effectively independent, we estimate both the codifference and the cosum of the normalized random variables Ȳ_j and Ȳ_{j+1}, which we denote by CD(Ȳ_j, Ȳ_{j+1}) and CS(Ȳ_j, Ȳ_{j+1}) respectively.
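The codifference and cosum can be estimated directly from paired samples by replacing the expectations in the definitions above with sample means, as in the following sketch; for independent P and Q both estimates are approximately zero.

```python
import numpy as np

def codifference(p, q):
    """Sample estimate of CD(P, Q) = log E[e^{i(P-Q)}] - log E[e^{iP}]
    - log E[e^{-iQ}], from paired sample arrays p, q."""
    return (np.log(np.mean(np.exp(1j * (p - q))))
            - np.log(np.mean(np.exp(1j * p)))
            - np.log(np.mean(np.exp(-1j * q))))

def cosum(p, q):
    """CS(P, Q) = CD(P, -Q)."""
    return codifference(p, -q)
```

A convenient sanity check: for standard Gaussian samples, CD(P, P) = 0 − (−1/2) − (−1/2) = 1 exactly, while CD of two independent samples is 0.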
The choice to use the normalized values of Y_j and Y_{j+1} is made for presentation purposes and does not affect the degree of measured interdependence between the two random variables, as it only scales the absolute values of the codifference estimates, not their relative values. We numerically estimate the values of the codifference and cosum as a function of the length of the time interval over which the integral of y_t is computed according to (4.21). The results shown in Figure 4.3 indicate decreasing dependence between subsequent values of Y_j with increasing Δ/ε, as we expect. Increasing the magnitude of the decay term L < 0, while rescaling all parameters to leave the PDF of y_t invariant, reduces the time required for the codifference of Y_j and Y_{j+1} to drop below a fixed threshold. It is also clear that for |L| = O(1), we can expect effective independence of Y_j and Y_{j+1} provided that Δ/ε is at least O(10), using the e-folding time for the characteristic time scale of y_t.

[Figure 4.3: Plots of the codifference CD and cosum CS of Ȳ_j and Ȳ_{j+1} as a function of the length of the time integral Δ/ε, as given in (4.21). For both plots, α* = 1.7, and L is as indicated in the legend. 10^5 pairs of (Y_j, Y_{j+1}) are sampled for each value of L considered in each figure. Left: g = 0, b = √(−L). Right: g = 0.4√(−L), b = 0.6√(−L).]

b) Assuming that Δ is large enough such that condition a) is satisfied, the parameter N_Y = T/Δ must be large enough such that the sum of {Y_j}_{j=1}^{N_Y} converges to an α-stable random variable.

The rate of distributional convergence of a sum of heavy-tailed random variables to the distribution of an α-stable random variable depends on the details of their distribution, primarily the tail behaviour. Consider the expression for Y_j given by (4.21) and assume that Δ/ε is large enough to satisfy condition a). Additionally, suppose that Δ/ε is large enough such that (4.22) and (4.23) hold, so that we can write the PDF of Y_j as

u_Y(y) ∼ (1/2π) ∫_ℝ exp(−iky) ψ_Y(k) dk, (4.24)

where ψ_Y is the Fourier transform of u_Y (4.23), determined in Appendix B.1.1:

ψ_Y(k) = exp[ Γ(−α*)(q_+ + q_−) cos(πα*/2) |k|^{α*} (1 − iβ* sgn(k) tan(πα*/2)) − Qk² + O(k³) ]. (4.25)

Since Δ/ε is large enough for effective independence of subsequent values of Y_j, we can write the PDF of S_{N_Y} = N_Y^{−1/α*} Σ_{j=1}^{N_Y} Y_j, which we denote by u_S^{(N_Y)}(y), as

u_S^{(N_Y)}(y) = (1/2π) ∫_ℝ exp(−iky) ψ_S(k) dk, where ψ_S(k) ∼ ψ_Y(k N_Y^{−1/α*})^{N_Y}. (4.26)

Thus, the characteristic function for S_{N_Y} can be determined using (4.25), giving us

ψ_S(k) = exp[ −((q_+ + q_−) Γ(1 − α*) cos(πα*/2) / α*) |k|^{α*} Ξ(k; α*, β*) − Q N_Y^{1−2/α*} k² + O(k³ N_Y^{1−3/α*}) ], (4.27)

where Q is given in (B.7). We can see that as N_Y increases, the distribution of S_{N_Y} converges to an α-stable distribution, implied by the point-wise convergence of the characteristic functions (as per Lévy's continuity theorem [24]) as the O(k²) terms decrease to zero:

ψ_S(k) → exp[ −((q_+ + q_−) Γ(1 − α*) cos(πα*/2) / α*) |k|^{α*} Ξ(k; α*, β*) ] as N_Y → ∞. (4.28)

As mentioned in Section 4.2.2, the rate of convergence of S_{N_Y} to an α-stable distribution is quite slow for α → 2⁻.
To decrease the magnitude of the coefficient of the k² term by a factor of 10 for α* = 1.9 requires that N_Y increase by a factor of 10^19.

Thus, if conditions a) and b) can be simultaneously satisfied for T = N_Y Δ, the approximation of Σ_{j=1}^{N_Y} Y_j by an α-stable random variable is appropriate, and the result (4.28) implies that

S_{N_Y} →_D S_{α*}(β*, σ̃), where σ̃ = ((q_+ + q_−) Γ(1 − α*) cos(πα*/2) / α*)^{1/α*}, (4.29)

for sufficiently large N_Y and Δ.

We finish this section by confirming the scaling of q_+ and q_− assumed in our ansatz, which allows us to determine the scaling of ∫_0^T y_{s/ε} ds with respect to ε and T. The full integral (4.21) can be expressed by multiplying the sum S_{N_Y} by N_Y^{1/α*}, which gives an α-stable distribution for ∫_0^T y_{s/ε} ds:

∫_0^T y_{s/ε} ds = N_Y^{1/α*} S_{N_Y} →_D S_{α*}(β*, N_Y^{1/α*} σ̃) = S_{α*}(β*, T^{1/α*} Δ^{−1/α*} σ̃). (4.30)

We apply the change of variable in (4.21) to (4.30), which allows us to write the integral as

ε ∫_0^{T/ε} y_ŝ dŝ →_D S_{α*}(β*, T^{1/α*} Δ^{−1/α*} σ̃). (4.31)

Eliminating the factors of ε in the integral, we obtain the following result:

∫_0^T y_ŝ dŝ →_D S_{α*}(β*, ε^{−1} T^{1/α*} (Δ/ε)^{−1/α*} σ̃), (4.32)

for sufficiently large T. Since the true value of the integral ∫_0^T y_ŝ dŝ as defined in (4.32) has no dependence on ε or Δ, this justifies our ansatz: q_− and q_+ must be proportional to ε^{α*−1}Δ so that σ̃ is proportional to ε^{γ*}Δ^{1/α*} (γ* = 1 − 1/α*), which renders the scale parameter in (4.32) independent of ε and Δ. This part of our ansatz is further justified by the results in Figure 4.4, where we explicitly demonstrate the scaling of σ̃ with ε and Δ. Thus, any dependence on ε or Δ in (4.32) is eliminated, and if we apply this observation to (4.30) we see that

∫_0^T y_{s/ε} ds →_D S_{α*}(β*, ε^{γ*} T^{1/α*} Σ), Σ = σ̃ ε^{−γ*} Δ^{−1/α*}. (4.33)

This shows that the scale parameter of the α-stable approximation for the integral of y_{t/ε} scales as ε^{γ*}, since Σ is independent of ε. Furthermore, we notice that this eliminates any dependence on the variables Δ and N_Y, which characterize the arbitrary partition of the time interval [0, T] and should not factor into the approximation of ∫_0^T y_{s/ε} ds.
While this result has allowed us to determine the form of the approximation for ∫_0^T y_{s/ε} ds, it does not tell us the specific value of the scale parameter Σ for a given set of parameters of y_{t/ε}.

[Figure 4.4: Plots of the estimated value of the scale parameter σ̃ as a function of ε and Δ for different values of α*, as indicated in the legend. N_Y = 5000 realizations of Y_j were generated to compute the numerical estimates of σ̃. The fixed parameters are (L, g, b) = (−1, 0, 1). Left: ε = 10⁻³ and the dashed lines are proportional to Δ^{1/α*}. Right: Δ = 1 and the dashed lines are proportional to ε^{γ*}.]

4.3.3 The stochastic averaging approximation

To determine the effect of y_{t/ε} on x_t, we first make a change of variable so that y_{t/ε} appears as an additive term, similar to the transformation μ = T(x) defined by (3.34) in the derivation of the (N+) approximation (3.55) in Chapter 3. We define the stochastic process η_t = U(x_t), where U′(x) = 1/f_2(x), which has dynamics given by

dη_t = U′(x_t) dx_t = f̃(η_t) dt + ε^{−ρ} y_{t/ε} dt, where f̃(η) = f_1(U^{−1}(η)) / f_2(U^{−1}(η)), (4.34)

as in the first term on the right-hand side of (3.38) from Chapter 3. Notice that the transformation U is invertible, since f_2(x) ≠ 0 for any x in the domain. We write the dynamics (4.34) in integral form,

η_t = η_0 + ∫_0^t f̃(η_s) ds + ε^{−ρ} ∫_0^t y_{s/ε} ds, t > 0, (4.35)

and consider the integral of y_{t/ε} over a time interval of finite length T, ∫_0^T y_{s/ε} ds. Since our end goal is to approximate the dynamics of x_t using an α-stable white noise approximation for y_{t/ε}, we need the approximation to be appropriate on the characteristic timescale of x_t, τ_x. Thus, while T must be
Thus, while T must besufficiently large such that∫ T0 ys/ ds can be approximated by an α-stable random variable according toour conditions in Section 4.3.2, those conditions must also hold for T ≤ τx for the approximation to bevalid on the time scale of xt.As we showed in Section 4.3.2,∫ T0 ys/ ds can be weakly approximated by an α-stable random variableas per (4.33), over a sufficiently long time interval of length T . Similarly, we also showed that thedistribution of∫ T0 zs/ ds for a sufficiently large T is also approximately distributed by an α-stable randomvariable, (4.20). In both cases, the scale parameter was proportional to T 1/α∗ . Thus, if we defineθ = σ∗/Σ (4.36)where Σ is given in (4.33) then∫ T0 zs/ ds converges to the same distribution as∫ T0 ys/ ds in the limit ofsmall : ∫ T0ys/ ds,∫ T0zs/ ds →DSα∗(β∗, γ∗T 1/α∗Σ). (4.37)Thus, we make the following stochastic averaging approximation: we replace the∫ t0 ys/ ds term in (4.35)with∫ t0 zs/ ds where zt satisfies (4.14) with θ given by (4.36) such that the distributions of the integralsof yt/ and zt/ are asymptotically equivalent for sufficiently small . Hence, the dynamical equation of ηt(4.34) is weakly approximated bydηt ≈(f˜(ηt) + −ρzt/)dt. (4.38)Applying the (N+) stochastic averaging approximation (3.55) from Chapter 3, the dynamics of ηt asdefined in (4.38) are well-approximated in the weak sense by η˜t wheredη˜t = f˜(η˜t) dt+ γ∗−ρ(σ∗θ)dL(α∗,β∗)t . (4.39)Note that the approximation of ηt (4.38) by η˜ (4.39) can also be justified using similar reasoning to thatleading to the approximation given in (3.47). 
Taking the inverse of the transformation U, as in (3.49), we find that the weak properties of x_t are well approximated by the process X_t = U^{−1}(η̃_t), where

dX_t = f_1(X_t) dt + ε^{γ*−ρ} (σ* f_2(X_t)/θ) ⋄ dL_t^{(α*,β*)}, (4.40)

and '⋄' indicates that the stochastic increment is interpreted according to Marcus' stochastic integral [55]. The stochastic averaging approximation given by (4.40) is analogous to the (N+) approximation (3.55). If f_2(x) is a constant function, the Marcus integral interpretation is equivalent to the Itô interpretation. Notice that for ρ < γ*, the scale parameter of the stochastic forcing in (4.40) goes to zero as ε → 0, while for ρ > γ* it diverges and the system becomes dominated by noise.

4.4 Sample systems

In this section, we simulate one linear and two nonlinear systems of the form (4.1), (4.2) and compare the stationary PDFs and ACDs of the numerically simulated trajectories of x_t to those of the corresponding stochastic averaging approximation, X_t (4.40). We take ρ = γ*, which gives a non-vanishing scale parameter for the noisy forcing in (4.40) as ε → 0. We finish this section with some general observations about the correspondence between x_t and X_t.

To estimate the value of θ, we use the method described in Appendix B.3. We note here that when estimating the value of θ used in our simulations of X_t, we use ε = 10⁻⁵ to compute our estimates (as per Appendix B.3), which is smaller than the ε considered in any of the full systems by at least one order of magnitude.
Such a small value of ε is chosen to ensure that, asymptotically, the estimates of θ have no dependence (or at least a lessened dependence) on ε, so as to emphasize the differences between the full and reduced systems due to the large, but finite, time scale separation.

4.4.1 Linear system

Let the dynamics of x_t, t ≥ 0, be given by

dx_t = ( −μx_t + ε^{−γ*} ζ y_{t/ε} ) dt, (4.41)

where μ > 0, ζ is a constant, x_0 is known, and y_{t/ε} is a fast linear CAM noise process with dynamics (4.1). Our stochastic averaging result (4.40) gives the reduced system

dX_t = −μX_t dt + (ζσ*/θ) dL_t^{(α*,β*)}, (4.42)

where σ* is given in (4.12). As we see, in the case of linear dynamics of the slow variable, X_t is an OULP, with a stationary distribution which is α-stable with stability index α*, skewness parameter β*, and scale parameter (α*μ)^{−1/α*} θ^{−1} ζσ*.

We simulate x_t numerically and compare its estimated PDF and ACD functions with the known PDF and ACD of the corresponding process X_t. The results are shown in Figures 4.5 and 4.6 for the case of unskewed noise, where g = 0 (for which β* = 0), and for the skewed case, where g ≠ 0 (β* ≠ 0), respectively. Notice that in Figure 4.5, the ACD functions of x_t and X_t do not match exactly for the α* = 1.8 case, but they appear to be the same up to a multiplicative constant. This difference can be explained by the slow rate of convergence to a stable distribution for α* near 2, together with the fact that our estimator for θ assumes that the samples Y_j have converged in distribution to an α-stable distribution, which may not be the case. Nevertheless, the differences between the slow variable of the full system and the reduced system clearly decrease as ε is reduced.
The roughness in the ACD function at larger lag times in Figure 4.5 for α* = 1.8 and ε = 10⁻⁴ is due to sampling variability.

[Figure 4.5: Top-Left: The numerically estimated stationary PDF of x_t (4.41) and the predicted PDF of X_t (4.42) on a logarithmic scale (linear scale, inset), with parameters (L, α*, g, b) = (−1, 1.5, 0, 1), (μ, ζ) = (1, 1). The value of ε is indicated in the legend. Top-Right: The corresponding ACDs. Bottom: Same as Top, but with α* = 1.8.]

4.4.2 Nonlinear system 1: nonlinear potential

We first test our reduction method in the presence of nonlinearities by considering a system with a basic cubic nonlinearity in the slow equation and additive linear CAM noise driving:

dx_t = −(μx_t + x_t³) dt + ε^{−γ*} ζ y_{t/ε} dt, (4.43)

where y_t is given by (4.1). This system is close to linear for small x_t, but experiences a stronger nonlinear drift for large x_t. For μ > 0, the system is globally attracted to x = 0 in the noiseless case; for μ < 0, the origin is an unstable (repelling) equilibrium and the system is locally attracted to one of two stable equilibria at x = ±√(−μ). According to our reduction results, we expect that the dynamics of x_t can be weakly approximated by

dX_t = −(μX_t + X_t³) dt + (ζσ*/θ) dL_t^{(α*,0)}, (4.44)

where θ is numerically estimated in the same fashion as described in Section 4.3.2.
Simulating x_t and X_t (4.44) requires particular care due to the numerical instabilities caused by the cubic nonlinearity when the variable is large, and so we use a predictor-corrector method described in Appendix A.2.1, as well as a sufficiently small simulation time step, to integrate the slow variable. The CAM noise process is integrated as described in Appendix B.2. We simulate both x_t and X_t, and show their estimated PDFs for a case where μ > 0 in Figure 4.7 and a case where μ < 0 in Figure 4.8. We see that in both cases the stochastic averaging approximation works well. The roughness visible in the tails of the PDFs of x_t and X_t in Figures 4.7 and 4.8 is due to sampling variability.

[Figure 4.6: As in Figure 4.5, but with parameters (L, α*, g, b) = (−1, 1.7, 0.6, 0.4), (μ, ζ) = (1, 1).]

[Figure 4.7: The numerically estimated stationary PDFs, u, for x_t and X_t on a logarithmic scale and linear scale (inset) for nonlinear system 1 with dynamics (4.43) and (4.44). (L, g, b) = (−1, 0, 1), (μ, ζ) = (1, 0.2), and ε as indicated in the legend. Left/Right: α* = 1.5/1.8.]

4.4.3 Nonlinear system 2: bilinear interaction

The second nonlinear system that we consider has a bilinear term in which the CAM noise process interacts multiplicatively with the slow dynamics:

dx_t = ( c − x_t + ε^{−γ*} ζ x_t y_{t/ε} ) dt, c, x_0 > 0. (4.45)

It is clear by looking at the limit as x → 0 that the dynamics of x_t are restricted to the positive real line, due to the inclusion of the state-independent drift term c > 0. According to our stochastic averaging result, we can weakly approximate the dynamics of (4.45) by X_t, where

dX_t = (c − X_t) dt + (ζσ*/θ) X_t ⋄ dL_t^{(α*,0)}, X_0 = x_0,
(4.46)

where '⋄' denotes the Marcus interpretation of the stochastic driving term. In this case, the Marcus stochastic term, (ζσ*/θ) X_t ⋄ dL_t^{(α*,0)}, can be written as λ(1; ΔL, X_{t−}) − X_{t−}, where λ(s) = λ(s; ΔL, X_{t−}) satisfies

dλ(s)/ds = (ζσ*(ΔL)/θ) λ(s), λ(0) = X_{t−}, (4.47)

with ΔL = dL_t^{(α*,0)} and X_{t−} = lim_{s→t−} X_s, as per the discussion in Section A.2.1. Solving (4.47) for λ(1) gives λ(1; ΔL, X_{t−}) = exp( (ζσ*/θ) ΔL ) X_{t−}. This analytic expression for the Marcus stochastic increment allows us to simulate (4.46) with relative ease and accuracy. The estimates of the stationary PDF of x_t and the PDF of the comparable reduced process X_t are shown in Figure 4.9.

[Figure 4.8: Same as Figure 4.7, but with μ = −1 in both panels.]

[Figure 4.9: The numerically estimated stationary PDFs, u, for x_t and X_t on both a linear and logarithmic scale for nonlinear system 2 with dynamics (4.45) and (4.46). (L, g, b) = (−1, 0, 1), (c, ζ) = (1, 0.2). Left/Right: α* = 1.5/1.8.]

4.4.4 Discussion of numerical results

From the results of Figures 4.5, 4.6, 4.7, 4.8 and 4.9, we see that the stationary behaviour of x_t in every system in this section can be well approximated by a system driven by α-stable noise for sufficiently small values of ε. The PDFs of the full and reduced systems match well, as do the temporal properties (i.e. the ACD functions), indicating that the memory of the full system is captured by the less complicated α-stable-noise-forced system.

As ε decreases, the approximation of the full system by the reduced system improves. However, the degree of correspondence depends not only on ε, but on α* as well.
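The exact Marcus update λ(1) = exp((ζσ*/θ)ΔL) X_{t−} from (4.47) makes a splitting scheme for (4.46) particularly simple: advance the drift, then apply the jump multiplicatively. The following is a minimal sketch (reusing the Chambers–Mallows–Stuck sampler for symmetric stable increments; the lumped coefficient k = ζσ*/θ and all numerical values here are illustrative, not the estimates actually used in the experiments above).

```python
import numpy as np

def stable_rvs(alpha, size, rng):
    # Chambers-Mallows-Stuck sampler, symmetric case (beta = 0, unit scale)
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U)**(1 / alpha)
            * (np.cos((1 - alpha) * U) / W)**((1 - alpha) / alpha))

def simulate_bilinear_marcus(alpha, c=1.0, k=0.2, dt=1e-3, n_steps=50_000,
                             x0=1.0, seed=0):
    """Splitting scheme for (4.46): an Euler step for the drift (c - X) dt,
    followed by the exact Marcus jump X -> exp(k dL) X from (4.47),
    where k stands for zeta sigma*/theta."""
    rng = np.random.default_rng(seed)
    dL = dt**(1 / alpha) * stable_rvs(alpha, n_steps, rng)
    X = np.empty(n_steps + 1)
    X[0] = x0
    for n in range(n_steps):
        Xd = X[n] + (c - X[n]) * dt        # drift step (keeps X > 0 for dt < 1)
        X[n + 1] = np.exp(k * dL[n]) * Xd  # exact multiplicative Marcus jump
    return X
```

As in the text, positivity is automatic: the drift step preserves X > 0 for dt < 1, and the Marcus factor exp(kΔL) is strictly positive.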
The weak properties of x_t and X_t display increasing similarity for smaller values of ε, but for α* close to 2 the value of ε needs to be significantly smaller to achieve the same degree of weak convergence than for α* far from 2. Given our discussion of the requirements for the convergence of y_{t/ε} in Section 4.3.2, we can give some rough estimates for the values of N_Y and Δ over the characteristic time scale of x_t. In all cases, the characteristic time scale of x_t is τ_x = O(1). If we set the length of the time integral of y_{t/ε} in (4.21) to the value T = 1, which is on the same order as the characteristic time scale of x_t as per our discussion in Section 4.3.3, and apply the assumption that Δ/ε ≈ 4 is sufficient for effective independence of subsequent Y_j (as per Figure 4.3), then we have an effective value of N_Y = T/Δ = (4ε)^(−1). In the simulations considered here, we have only used ε = 10^(−2) and 10^(−4), giving respective values of N_Y = 25 and 2500. The value N_Y = 25 is almost certainly not large enough to assume sufficient convergence would occur for any α*, while for N_Y = 2500 the degree of convergence is evidently greater for α* farther from 2. The numerical results shown above certainly suggest this, as the statistics for the full systems where α* = 1.5 and ε = 10^(−4) are all approximated well by the corresponding stochastic averaging approximations, while the other cases are not as well approximated.

4.5 Conclusions

This study demonstrates that α-stable forcing can appear in the asymptotic slow dynamics of fast-slow systems where the fast process is linear and forced with a combination of additive and multiplicative noise such that its distribution has infinite variance.
We studied the linear CAM process in this analysis due to its use in applied research problems and the fact that many analytical results are available for it [73, 79]. A particularly valuable aspect of the linear CAM process is that it provides a simple dynamical form in which the interaction of multiplicative terms and Gaussian white noise forcing results in a process possessing a distribution with power-law tails and infinite variance. Due to the infinite-variance stationary behaviour of the linear CAM noise process and the explicit predictions of its power-law tail behaviour, we hypothesized that the linear CAM noise process would act effectively as an α-stable forcing term when used to drive a slower process. The derivation of the approximation required us to focus on the properties of the integral of the linear CAM noise process and to determine an OULP whose integral has similar statistics. Determining the stability index and skewness parameter of the corresponding OULP is very straightforward, but the corresponding value of the scale parameter needs to be estimated numerically, as it depends on the serial dependence properties of the linear CAM noise process, which are difficult to study analytically. When we apply our derived stochastic averaging approximation to linear systems as well as systems with nonlinearities in the slow variable, we observe good agreement between simulations of the full and reduced dynamics, which improves as ε decreases. For values of the parameters such that the tails of the stationary distribution of the linear CAM process have decay exponent close to 3, the ratio of time scales between the fast and slow processes needs to be extremely large to observe the distributional convergence of the slow variable to the predicted distribution.
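The power-law tail behaviour invoked above can be checked directly by simulation. The sketch below is illustrative only: it uses the simplest linear process with uncorrelated additive and multiplicative Gaussian forcing, dy = −λy dt + g y dW₁ + b dW₂ (Itô), whose stationary density is known to have power-law tails with tail index α = 1 + 2λ/g²; the hypothetical choice λ = 1, g = 2 gives α = 1.5, i.e. infinite variance. (The CAM process proper has correlated noises, which this sketch omits.)

```python
import numpy as np

rng = np.random.default_rng(1)

# dy = -lam*y dt + g*y dW1 + b dW2 (Ito), hypothetical parameters:
# predicted stationary tail index alpha = 1 + 2*lam/g**2 = 1.5 (infinite variance)
lam, g, b = 1.0, 2.0, 1.0
dt, n = 2e-3, 500_000
y = np.empty(n)
y[0] = 0.0
xi = np.sqrt(dt) * rng.standard_normal((2, n - 1))
for i in range(n - 1):
    # Euler-Maruyama step with multiplicative (g*y) and additive (b) forcing
    y[i + 1] = y[i] - lam * y[i] * dt + g * y[i] * xi[0, i] + b * xi[1, i]

# Hill estimator of the tail index from the largest order statistics,
# applied to thinned samples to reduce serial dependence
z = np.sort(np.abs(y[::200]))
k = 150                                  # number of upper order statistics
alpha_hill = 1.0 / np.mean(np.log(z[-k:] / z[-k - 1]))
# alpha_hill is a rough estimate of the tail index (predicted value 1.5)
```

The Hill estimate is crude (the samples remain serially dependent), but it suffices to show that a purely Gaussian-forced linear system acquires a heavy, non-Gaussian tail through the multiplicative term.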
Both the PDFs and the ACDs of the slow variable of the full system are well approximated by the stochastic averaging approximation.

Besides presenting a method of approximating fast-slow systems that are forced with correlated additive and multiplicative noise processes, this analysis also suggests a possible mechanism through which α-stable forcing can emerge in the modelling of physical problems. For example, as mentioned above, CAM noise processes emerge from considerations of the dynamics of quadratically nonlinear systems like those describing atmospheric motion [52, 79] and could result in longer time scale processes experiencing forcing terms distributed according to an approximately α-stable law. This may offer a possible explanation for the observation of α-stable noise in the calcium signal found in the Greenland ice cores [19].

There are various extensions to this research that are worth exploring. All of the analysis given is for the case where both the fast and slow subsystems are univariate. Extensions of the results of this paper to higher-dimensional fast-slow systems may yield unexpected challenges, but are likely required before this stochastic averaging method can be applied to many research problems. To have results showing that a climate model can exhibit α-stable statistics would certainly be of interest, and one could inspect [17, 19, 37, 82] for inspiration. Also, while there is broad utility in studying models where the fast linear CAM noise process perturbs the slow variable, it would be worth exploring the implications of nonlinear fast perturbations to the slow process, or of cases where the fast, infinite-variance process is something other than the linear CAM noise process. Another interesting situation to study would be one where the fast linear CAM process is conditionally dependent on the slow process; in other words, where the slow variable influences the parameters of the linear CAM noise process.
This would be a realistic scenario to consider in the context of climate modelling due to the coupling between fast and slow time scale processes that characterizes several associated problems.

Chapter 5

Conclusions

5.1 Summary

In this thesis we achieved several results that improve our understanding of non-Gaussian processes related to climate modelling. In Chapter 2, we successfully applied the method of Crommelin and Vanden-Eijnden to produce parameter estimates for the wind model given in [59] such that the statistics of the wind speed model match those of the data very well and all the estimated parameter fields, save two, are very smooth. When the model is modified from its original form to eliminate a degeneracy, the remaining non-smooth field is significantly improved. Additionally, the estimated parameters reproduce the stationary statistics of the data accurately, except for the skewness, which is well captured in terms of its relative magnitude but not its absolute magnitude. The results of this study draw attention to particular limitations of the assumed wind model and suggest possible improvements. This study also highlights the difficulty the estimation procedure has in discerning between the effects of the mixing of downward momentum (i.e. the linear terms in the wind model) and the sea surface drag terms (i.e. the nonlinear terms) when the winds are not strongly skewed. The results also emphasize the memory of the stochastic driving process as a factor that needs proper consideration, since the estimation procedure assumes uncorrelated stochastic forcing, which is, in this case, a significant idealization of the physical situation. A sufficient degree of subsampling of the data can mitigate the effects of correlated noise, but still results in biased parameter estimates.
These observations comprise a case study of the application of the Crommelin-Vanden-Eijnden method to real data and suggest ways that future studies having similar features in the data may be treated.

In Chapter 3, we derived stochastic averaging approximations for stochastic dynamical systems having multiple time scales and α-stable noise forcing where the fast process is linear. In the classical case, the coefficient of the diffusion term of the stochastic averaging approximation is proportional to the integral of the autocovariance function, again demonstrating the importance of giving proper consideration to memory in stochastic modelling. In the case of α-stable noise forcing, the integral of the autocovariance function does not exist, so an alternative approach based on analysis of the joint characteristic function of the fast process and its time integral is undertaken, since the characteristic function is defined for any stochastic process. This alternative approach allows us to derive new approximations that are consistent with the corresponding approximations for the case of Gaussian white noise forcing in the appropriate limits. Our results are supported by numerical simulations, which include estimates of the PDFs for the slow variable of the original system and the stochastic averaging approximations as well as numerical estimates of the ACDs. The (L) approximation gives a good weak approximation to the slow variable close to the mean of the distribution in cases where the dynamics of the slow variable are close to linear and the coupling strength with the fast variable does not depend on the state of the slow variable.
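The classical statement above can be made concrete with a small numerical check: for a fast OU process with stationary variance σ² and correlation time τ, the autocovariance is C(t) = σ² e^(−t/τ), and the diffusion coefficient of the averaged slow equation is proportional to ∫₀^∞ C(t) dt = σ²τ. A minimal sketch with hypothetical values:

```python
import numpy as np

# Autocovariance of a fast OU process with stationary variance s2 and
# correlation time tau; its time integral sets the effective diffusion
# coefficient in the classical (Gaussian) stochastic averaging limit.
s2, tau = 2.0, 0.5                      # hypothetical values
t = np.linspace(0.0, 50.0 * tau, 200_001)
C = s2 * np.exp(-t / tau)
# trapezoidal quadrature of the autocovariance function
integral = np.sum(0.5 * (C[1:] + C[:-1]) * np.diff(t))
exact = s2 * tau                        # analytic value of the integral
```

In the α-stable case this integral diverges, which is precisely why the joint characteristic function is used instead.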
The (N+) approximation can capture nonlinear dynamics and multiplicative noise and is valid over the state space of the slow variable, but is more difficult to simulate, due to the Marcus interpretation required in the multiplicative noise case, the numerical complications associated with evaluating the Marcus integral (if required), and the numerical stability issues associated with the large jumps of the noise process. Thus the (N+) approximation may be more accurate in general, but the (L) approximation is easier to evaluate numerically and may be sufficient for a given project. Naturally, the choice of method is at the researcher's discretion.

The results of Chapter 3 were applied in Chapter 4, in which similar stochastic averaging approximations were derived for fast-slow systems that are driven by a fast linear CAM noise process, another type of infinite-variance forcing derived from idealized models of climate variability [72, 79, 87], rather than α-stable noise forcing. We found that the slow variable of the full system could be weakly approximated by a process forced with α-stable noise with a high degree of accuracy. The stochastic averaging approximations that we derived allowed us to accurately reproduce not only the PDFs of the slow variable of the CAM-noise-forced systems, but the autocodifference functions as well. The specific parameters of the reduced system could not be obtained by analytical means in this particular case, but we were able to estimate them numerically to obtain a satisfactory approximation.
The fact that the fast linear CAM process does not itself have α-stable statistics makes the stochastic averaging procedure more challenging to derive than in the case where the fast process is α-stable, as in Chapter 3. The analysis of the time-dependent properties of the linear CAM noise process is intractable and required us to invoke an ansatz related to the distribution of the time integral of the fast linear CAM noise process. This ansatz is justified via numerical calculation and allows us to show that the integrated linear CAM noise process converges to the integral of an OULP. This convergence allows us to invoke the stochastic averaging result of Chapter 3 to determine a reduced system that weakly approximates the slow variable of the joint fast-slow system. While the ansatz proved to produce reduced models closely matching the original system, it does not determine the scale parameter of the corresponding OULP, which needs to be estimated numerically. As in the case of the fast process driven with α-stable noise, the stochastic averaging approximation is accurate provided that the separation of time scales is sufficiently large. If the parameters of the linear CAM forcing are such that the tails of its stationary distribution decay with exponent close to 3, the separation of time scales must be extremely large to observe the distributional convergence of the slow variable to the predicted distribution. This is due to the slow rate of convergence of random variables with infinite variance, whose PDF tails decay as |y|^(−(α+1)) with α ≲ 2, to α-stable random variables with stability index α, as discussed in [42]. The results of this project also suggest that processes with dynamics that can be modelled using linear CAM noise processes (or other infinite-variance processes evolving in continuous time) are a possible explanation for the appearance of α-stable noise in climate time series (e.g.
[19]).

5.2 Future work

The results of this thesis address non-Gaussian statistics appearing in stochastic modelling problems through nonlinear dynamics, multiple time scales, and stochastic driving. Climate modelling problems were the inspiration for this research, and the results contained in this dissertation certainly have implications for future research in climate and atmospheric modelling, particularly for problems that involve parametrizing sub-grid scale and chaotic processes. For applied mathematicians and mathematically-inclined atmospheric scientists looking to make a contribution to climate modelling, the possible studies described here offer several interesting and challenging directions for future research. There are also several interesting mathematical problems mentioned in this thesis that are worth studying for their own sake. The answers to any of them will certainly be worth noting and will provide researchers with more tools and techniques that can be used to understand the increasingly important and complex field of climate science.

Despite the success of the wind speed parameter estimation project, there is progress to be made that could substantially improve the results, both from the perspective of the wind speed model and of the estimation procedure. The wind speed data display several features that the assumed model cannot capture, such as the observed anisotropy of autocorrelation times and the weakly positive along-mean skewness of the winds at some spatial locations. In terms of the physical modelling, there may be improvements that can be made such that these effects are built into the model. The observation of anisotropic autocorrelation times highlights the more general issue of parameter estimation when unobservable correlated forcing is assumed.
As mentioned in the previous section, we managed to circumvent the issue of correlated forcing by artificially altering the resolution time scale through subsampling of data points, thereby decreasing the relative autocorrelation time scale of the stochastic forcing (i.e. "whitening" the stochastic forcing). However, the existence of two autocorrelation time scales required us to estimate the autocorrelation time scales for both the zonal and meridional wind velocities and use their geometric mean to get a single natural time scale for the evolution of the vector winds. A more elegant solution, achieved through a modified version of the wind model or a modified estimation procedure, may offer ways to account for these features. In particular, it would be of significant interest to determine how to modify the Crommelin-Vanden-Eijnden estimation routine to account for the presence of autocorrelated, but unobserved, noise forcing. Beyond this, there is the tacit assumption in the estimation procedure of spatial independence of the winds at neighbouring grid points (i.e. the wind flow is not a continuum model). While this assumption has not prevented us from obtaining smooth parameter estimate fields, particularly in the case where the wind model is re-interpreted such that the downward mixing of momentum is zero, it would be of interest to see what result a "continuum-based" estimation procedure would generate. Implementing a parameter estimation procedure that includes consideration for a spatially coupled model would almost certainly be more computationally expensive, and a more thorough study of how to adapt the Crommelin-Vanden-Eijnden method to this type of situation would be required to make a concrete statement about the increased estimation difficulties. Other features of the data that have been neglected, such as seasonality and the diurnal cycle, should also be possible to account for in the estimation procedure.
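The whitening effect of subsampling described above can be seen in a toy example: if the unresolved forcing behaves like a discretized red-noise (AR(1)) process with lag-one coefficient φ, then keeping every k-th point yields an AR(1) series with coefficient φ^k, so the retained forcing is much closer to white. A minimal sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(2)

def lag1_corr(x):
    # sample lag-one autocorrelation
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

phi, n = 0.9, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]   # AR(1) "red" forcing

r_full = lag1_corr(x)        # close to phi = 0.9
r_sub = lag1_corr(x[::10])   # close to phi**10, roughly 0.35
```

The subsampled series is far less correlated, which is why subsampling pushes the data toward the white-noise assumption of the estimation procedure, at the cost of discarding information (and, as noted above, it does not remove the bias entirely).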
It would be interesting to apply a similar estimation procedure assuming that the estimated parameters are not constant, but rather periodic with significant power at frequencies corresponding to diurnal and annual cycles, although it is not immediately clear how this could be done.

As mentioned at the conclusion of Chapter 3, extensions of the stochastic averaging results given here are a very natural avenue of research to pursue. The assumption of linearity in the fast variable is a particularly significant restriction. It should be possible to extend the results to systems where the dynamics of the fast process have "sufficiently nice" nonlinearities (e.g. quadratic), but to do so would require analysis beyond the scope considered in this thesis. Determining stochastic averaging results for higher-dimensional systems forced with α-stable noise would also be a good continuation of this research. The stochastic averaging results derived in both Chapters 3 and 4 were derived under the assumption that the stochastic forcing of the full systems had stability index α > 1, so that the corresponding (A) approximation could be defined in the classical way (i.e. by taking the mean of the slow drift term over all values of the fast process). A numerical calculation in Section 3.4 showed that it is likely that the stochastic averaging approximations we derived are valid in the case where α ≤ 1, but we have not shown this explicitly. Although it is very likely that a derivation similar to that of the (L) and (N+) approximations given in Chapter 3 can be used for the corresponding approximations when α ≤ 1, this has not been explicitly demonstrated and we leave it for future work.
If this can be done, then it may be possible to extend the results of Chapter 4 to systems like the CAM noise process where the tails of the distribution of the fast process are such that the mean does not exist.

The fact that the dynamics of a system forced with fast CAM noise can appear α-stable is a significant result, as it provides a possible justification for the appearance of α-stable dynamics in some climatic time series (e.g. [19]). While we have provided the connection between CAM noise and α-stable noise alluded to in the conclusions of [73], it remains to be seen whether there are processes that can be modelled with linear CAM noise behaviour that have the appropriate power-law decay giving infinite variance of the stationary distribution. This will involve a more comprehensive analysis of the reduction of the fluid dynamical equations (or a projection of them) to the quadratically nonlinear ODE form, and further to the linear CAM process (as described in [79]), to see if the heavy-tailed behaviour of the CAM noise is possible for realistic parameter values.

One possible future project neatly joins the two focal points of this thesis into one: determining whether a parameter estimation method similar to the Crommelin-Vanden-Eijnden method, which minimizes the differences between the eigenstructure of the infinitesimal generator of an SDE model and that of the conditional expectation operator, can be derived in the case where the SDE model is assumed to have α-stable properties. In principle, this should be possible provided the eigenstructure of the infinitesimal generator can be determined, but a preliminary investigation of the problem suggests that writing the discretized (i.e. numerical) equivalents of many of the associated operators is not straightforward because of their non-local nature.
However, it would be interesting to see if the parameter estimates obtained by such a method are consistent with those obtained via our stochastic averaging approach described in Chapters 3 and 4.

Bibliography

[1] D. Applebaum. Lévy Processes and Stochastic Calculus. Cambridge University Press, 2004. → pages 45, 98, 99

[2] L. Arnold, P. Imkeller, and Y. Wu. Reduction of deterministic coupled atmosphere-ocean models to stochastic ocean models: a numerical case study of the Lorenz-Maas system. Dyn. Syst., 18(4):295–350, 2003. → pages 3, 15, 16, 17, 44, 45, 47, 48, 55, 61

[3] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis. Springer, 2007. → pages 97, 98

[4] A. N. Borodin. A limit theorem for solution of differential equations with random right-hand side. Theor. Probab. Appl., 22(3):482–497, 1977. → pages 43

[5] S. B. Capps and C. S. Zender. Global ocean wind power sensitivity to surface layer stability. Geophys. Res. Lett., 36(9), 2009. → pages 18

[6] A. S. Chaves. A fractional diffusion equation to describe Lévy flights. Phys. Lett. A, 239:13–16, 1998. → pages 48, 71

[7] A. V. Chechkin and I. Pavlyukevich. Marcus versus Stratonovich for systems with jump noise. J. Phys. A, 47(34):342001, 2014. → pages 9, 45, 46, 51, 52, 53, 97, 98

[8] M. D. Chekroun, H. Liu, and S. Wang. Approximation of Stochastic Invariant Manifolds: Stochastic Manifolds for Nonlinear SPDEs I. Springer International Publishing, 2015. → pages 20

[9] M. D. Chekroun, H. Liu, and S. Wang. Stochastic Parameterizing Manifolds and Non-Markovian Reduced Equations: Stochastic Manifolds for Nonlinear SPDEs II. Springer International Publishing, 2015. → pages 20

[10] R. Cont and P. Tankov. Financial Modelling with Jump Processes. Taylor & Francis, 2004. → pages 52, 53, 54, 99

[11] D. Crommelin. Estimation of space-dependent diffusions and potential landscapes from non-equilibrium data. Jour. Stat. Phys., 149(2):220–233, 2012. → pages 15, 18, 42

[12] D. Crommelin and E. Vanden-Eijnden.
Fitting timeseries by continuous-time Markov chains: A quadratic programming approach. Jour. Comp. Phys., 217:782–805, 2006. → pages 15, 16, 21

[13] D. Crommelin and E. Vanden-Eijnden. Reconstruction of diffusions using spectral data from timeseries. Comm. Math. Sci., 4(3):651–668, 2006. → pages 18, 19, 42

[14] D. Crommelin and E. Vanden-Eijnden. Data-based inference of generators for Markov jump processes using convex optimization. SIAM Multi. Model. Simul., 7(4):1751–1778, 2009. → pages 21

[15] D. Crommelin and E. Vanden-Eijnden. Diffusion estimation from multiscale data by operator eigenpairs. SIAM Multiscale Model. Simul., 9(4):1588–1623, 2011. → pages 15, 16, 18, 19, 21, 22, 24, 29, 42

[16] A. Dai and C. Deser. Diurnal and semidiurnal variations in global surface wind and divergence fields. J. Geophys. Res., 104:31109–31125, 1999. → pages 35

[17] D. Del-Castillo-Negrete. Asymmetric transport and non-Gaussian statistics of passive scalars in vortices in shear. Physics of Fluids, 10(3):576, 1998. → pages 3, 16, 83

[18] T. DelSole. A fundamental limitation of Markov models. J. Atmos. Sci., 57:2158–2168, 2000. → pages 25, 33

[19] P. Ditlevsen. Observation of α-stable noise induced millennial climate changes from an ice-core record. Geophys. Res. Lett., 26(10):1441–1444, 1999. → pages 3, 44, 65, 83, 85, 87

[20] M. Donelan, W. Drennan, E. Saltzman, and R. Wanninkhof, editors. Gas Transfer at Water Surfaces. American Geophysical Union, Washington, DC, 2002. → pages 18

[21] B. Dybiec, E. Gudowska-Nowak, and P. Hänggi. Lévy-Brownian motion on finite intervals: Mean first passage time analysis. Phys. Rev. E, 73:046104, 2006. → pages 97

[22] European Centre for Medium-Range Weather Forecasts. Downloadable datasets. URL http://apps.ecmwf.int/datasets/. Downloaded before Jan. 2011. → pages 16, 34

[23] C. W. Fairall, E. F. Bradley, J. E. Hare, A. A. Grachev, and J. B. Edson. Bulk parameterization of air-sea fluxes: Updates and verification for the COARE algorithm. J.
Climate, 16:571–591, 2003. → pages 18

[24] W. Feller. An Introduction to Probability Theory and Its Applications: Vol. II, 2nd Ed. John Wiley & Sons, 1966. → pages 3, 6, 44, 50, 66, 68, 75

[25] M. Freidlin and A. Wentzell. Random Perturbations of Dynamical Systems. Springer-Verlag, 1984. → pages 3, 43, 47

[26] C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences: 2nd Ed. Springer-Verlag, 1985. → pages 3, 6, 10, 19, 24

[27] D. Givon, R. Kupferman, and A. Stuart. Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity, 17:55–127, 2004. → pages 3, 16, 43

[28] E. Gobet, M. Hoffmann, and M. Reiß. Nonparametric estimation of scalar diffusions based on low frequency data. The Annals of Statistics, 32(5):2223–2253, 2004. → pages 19, 21

[29] P. Grassberger and I. Procaccia. Characterization of strange attractors. Phys. Rev. Lett., 50(5):346–349, 1983. → pages 1

[30] P. Grassberger and I. Procaccia. Estimation of the Kolmogorov entropy from a chaotic signal. Phys. Rev. A, 28:2591–2593, 1983. → pages 1

[31] M. Grigoriu. Numerical solution of stochastic differential equations with Poisson and Lévy white noise. Phys. Rev. E, 80(2):026704, 2009. → pages 97, 98, 99

[32] A. Gross. Some mixing conditions for stationary symmetric stable stochastic processes. Stoch. Proc. Appl., 51(2):277–295, 1994. → pages 73

[33] K. Hasselmann. Stochastic climate models: Part I. Theory. Tellus, 28(6):473–485, 1976. → pages 2, 16, 44

[34] C. Hein, P. Imkeller, and I. Pavlyukevich. Limit theorems for p-variations of solutions of SDEs driven by additive non-Gaussian stable Lévy noise and model selection for paleo-climatic data, chapter 10, pages 161–175. World Scientific, 2010. → pages 3, 44

[35] E. J. Hinch. Perturbation Methods. Cambridge University Press, 1991. → pages 95

[36] W. Horsthemke and R. Lefever. Noise-Induced Transitions. Springer Berlin Heidelberg, 1984. → pages 65

[37] M. Huber, J. C. McWilliams, and M. Ghil.
A climatology of turbulent dispersion in the troposphere. J. Atmos. Sci., 58:2377–2394, 2001. → pages 3, 16, 44, 83

[38] I. Jones and Y. Toba. Wind Stress over the Ocean. Cambridge University Press, 2001. → pages 18, 28

[39] W. Just, H. Kantz, C. Rodenbeck, and M. Helm. Stochastic modelling: Replacing fast degrees of freedom by noise. J. Phys. A, 34(15), 2001. → pages 44

[40] W. Just, K. Gelfert, N. Baba, A. Riegert, and H. Kantz. Elimination of fast chaotic degrees of freedom: on the accuracy of the Born approximation. Jour. Stat. Phys., 112(1/2):277–292, 2003. → pages 43, 44

[41] H. Kantz, W. Just, N. Baba, K. Gelfert, and A. Riegert. Fast chaos versus white noise: entropy analysis and a Fokker-Planck model for the slow dynamics. Phys. D, 187(1-4):200–213, 2004. → pages 1, 3, 16, 43

[42] J. Keller and R. Kuske. Rate of convergence to a stable law. SIAM Journal on Applied Mathematics, 61(4):1308–1323, 2001. → pages 69, 70, 85

[43] R. Z. Khas'minskii. On stochastic processes defined by differential equations with a small parameter. Theory Probab. Appl., 11(2):211–228, 1966. → pages 2, 16, 43, 44, 47

[44] R. Z. Khas'minskii. A limit theorem for the solutions of differential equations with random right-hand sides. Theory Probab. Appl., 11(3):390–406, 1966. → pages 2, 16, 44

[45] Y. Kifer. L² diffusion approximation for slow motion in averaging. Stoch. Dyn., 3(2):213–246, 2003. → pages 16, 44

[46] P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, 1992. → pages 30, 97, 103

[47] F. Kwasniok. Reduced atmospheric models using dynamically motivated basis functions. J. Atmos. Sci., 64(10):3452–3474, 2007. ISSN 0022-4928. → pages 3

[48] T. Li, B. Min, and Z. Wang. Marcus canonical integral for non-Gaussian processes and its computation: Pathwise simulation and tau-leaping algorithm. J. Chem. Phys., 138(10):104118, 2013. → pages 97, 99

[49] T. Li, B. Min, and Z. Wang.
Erratum: "Marcus canonical integral for non-Gaussian processes and its computation: Pathwise simulation and tau-leaping algorithm" [J. Chem. Phys., 138, 104118 (2013)]. J. Chem. Phys., 140(9), 2014. → pages 97, 99

[50] T. Li, B. Min, and Z. Wang. Adiabatic elimination for systems with inertia driven by compound Poisson colored noise. Phys. Rev. E, 89:022144, Feb 2014. → pages 45

[51] W. T. Liu, W. Tang, and X. Xie. Wind power distribution over the ocean. J. Geophys. Res., 35:L13808, 2008. → pages 18

[52] A. Majda, I. Timofeyev, and E. Vanden-Eijnden. A mathematical framework for stochastic climate models. Comm. Pure and Appl. Math., 54:891–974, 2001. → pages 3, 17, 43, 65, 83

[53] A. Majda, I. Timofeyev, and E. Vanden-Eijnden. A priori tests of a stochastic mode reduction strategy. Phys. D, 170:206–252, 2002. → pages

[54] A. Majda, I. Timofeyev, and E. Vanden-Eijnden. Systematic strategies for stochastic mode reduction in climate. J. Atmos. Sci., 60:1705–1722, 2003. → pages 3, 43

[55] S. Marcus. Modeling and analysis of stochastic differential equations driven by point processes. IEEE Trans. Inform. Theory, IT-24(2):164–172, 1978. → pages 9, 17, 46, 77

[56] R. G. Miller. Simultaneous Statistical Inference. Springer-Verlag, 1981. → pages 100

[57] L. Mitchell and G. A. Gottwald. Data assimilation in slow-fast systems using homogenized climate models. Journal of the Atmospheric Sciences, 69(4):1359–1377, 2012. → pages 3, 16

[58] A. H. Monahan. A simple model for the skewness of global sea surface winds. J. Atmos. Sci., 61:2037–2049, 2004. → pages 16, 27, 35

[59] A. H. Monahan. The probability distribution of sea surface wind speeds. Part I: Theory and SeaWinds observations. J. Clim., 19:497–520, 2006. → pages 16, 18, 27, 29, 35, 40, 42, 84

[60] A. H. Monahan. The probability distribution of sea surface wind speeds. Part II: Dataset intercomparison and seasonal variability. J. Clim., 19:521–534, 2006. → pages 16, 34, 35

[61] A. H. Monahan.
Empirical models of the probability distribution of sea surface wind speeds. J. Clim., 20:5798–5814, 2007. → pages 29

[62] A. H. Monahan. The temporal autocorrelation structure of sea surface winds. J. Clim., 25:6684–6700, 2012. → pages 34, 35, 41, 42

[63] A. H. Monahan and J. Culina. Stochastic averaging of idealized climate models. J. Clim., 28:3068–3088, 2011. → pages 3, 13, 15, 16, 17, 44, 45, 47, 48, 55, 61, 70

[64] J. P. Nolan. Numerical calculation of stable densities and distribution functions. Comm. Statist. Stoch. Models., 13(4):759–774, 1997. → pages 56

[65] B. Øksendal. Stochastic Differential Equations. Springer-Verlag, 2003. → pages 7, 8, 12, 15, 19, 99

[66] T. Palmer and P. Williams. Stochastic Physics and Climate Modelling. Cambridge University Press, 2010. → pages 1

[67] T. N. Palmer, G. J. Shutts, R. Hagedorn, F. J. Doblas-Reyes, T. Jung, and M. Leutbecher. Representing model uncertainty in weather and climate prediction. Annual Review of Earth and Planetary Sciences, 33(1):163–193, 2005. → pages 2

[68] T. N. Palmer, F. J. Doblas-Reyes, A. Weisheimer, and M. J. Rodwell. Toward seamless prediction: Calibration of climate change projections using seasonal forecasts. Bull. Amer. Meteor. Soc., 89(4):459–470, 2008. → pages 43

[69] G. C. Papanicolaou and W. Kohler. Asymptotic theory of mixing stochastic ordinary differential equations. Comm. Pure and Appl. Math., 27:641–668, 1974. → pages 2, 16, 43

[70] G. A. Pavliotis and A. M. Stuart. Multiscale Methods: Averaging and Homogenization. Springer, 2007. → pages 3, 25, 43

[71] C. Penland. A stochastic model of IndoPacific sea surface temperature anomalies. Phys. D, 98:534–558, 1996. Nonlinear Phenomena in Ocean Dynamics. → pages 2, 3

[72] C. Penland and B. D. Ewald. On modelling physical systems with stochastic models: diffusion versus Lévy processes. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 366(1875):2455–2474, 2008. → pages 85

[73] C. Penland and P. D.
Sardeshmukh. Alternative interpretations of power-law distributions found in nature. Chaos, 22(2):023119, 2012. → pages 17, 65, 66, 67, 68, 82, 87

[74] P. Protter. Stochastic Integration and Differential Equations. Springer-Verlag, 1990. → pages 97

[75] H. Risken. The Fokker-Planck Equation: Methods of Solution and Applications. Springer-Verlag, 1989. → pages 6, 7, 29

[76] D. Rosadi and M. Deistler. Estimating the codifference function of linear time series models with infinite variance. Metrika, 73(3):395–429, 2009. → pages 100

[77] B. Saltzman. Dynamical Paleoclimatology. Academic Press, 2002. → pages 1, 44

[78] T. Sampe and S.-P. Xie. Mapping high sea winds from space: A global climatology. Bull. Amer. Meteor. Soc., 88:1965–1978, 2007. → pages 18

[79] P. D. Sardeshmukh and P. Sura. Reconciling non-Gaussian climate statistics with linear dynamics. Journal of Climate, 22(5):1193–1207, 2009. → pages 16, 17, 65, 66, 82, 83, 85, 87

[80] K.-I. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999. → pages 102

[81] A. Schenzle and H. Brand. Multiplicative stochastic processes in statistical physics. Phys. Rev. A, 20:1628–1647, Oct 1979. → pages 65, 68

[82] K.-H. Seo and K. P. Bowman. Lévy flights and anomalous diffusion in the stratosphere. J. Geophys. Res., 105(D10):295–302, 2000. → pages 3, 16, 44, 83

[83] T. H. Solomon, E. R. Weeks, and H. L. Swinney. Chaotic advection in a two-dimensional flow: Lévy flights and anomalous diffusion. Phys. D, 76:70–84, 1994. → pages 3, 16, 44

[84] H. Sørensen. Parametric inference for diffusion processes observed at discrete points in time: a survey. International Stat. Rev., 72(3):337–354, 2004. → pages 15, 18

[85] T. Srokowski. Correlated Lévy noise in linear dynamical systems. Acta Phys. Polon. B, 42(1):3–19, 2011. → pages 4, 16, 17, 45, 61

[86] X. Sun, J. Duan, and X. Li. An alternative expression for stochastic dynamical systems with parametric Poisson white noise. Probab. Eng. Mech., 32(0):1–4, 2013.
→ pages 97, 99[87] P. Sura, M. Newman, C. Penland, and P. Sardeshmukh. Multiplicative noise and non-gaussianity:A paradigm for atmospheric regimes? Jour. Atmos. Sci., 62(5):1391–1409, 2005. → pages 16, 8592[88] M. S. Taqqu and G. Samarodnitsky. Stable Non-Gaussian Random Processes: Stochastic Modelswith Infinite Variance. CRC Press, 1994. → pages 44, 45, 46, 56, 68, 100[89] C. Tsallis, S. Levy, A. Souza, and R. Maynard. Statistical-Mechanical Foundation of the Ubiquityof levy distributions in nature. Phys. Rev. Lett., 75(20):3589–3593, 1995. → pages 3[90] M. Veillette. Stbl: Alpha stable distributions for matlab, 2012. URLhttp://www.mathworks.com/matlabcentral/fileexchange/37514-stbl--alpha-stable-distributions-for-matlab.Downloaded on Feb. 12, 2014. → pages 56, 106[91] E. Wong and M. Zakai. On the convergence of ordinary integrals to stochastic integrals. TheAnnals of Mathematical Statistics, 36(5):1560–1564, 1965. → pages 9, 15[92] E. Wong and M. Zakai. Riemann-stieltjes approximations of stochastic integrals.Wahrscheinlicheitstheoreie, 12(2):87–97, 1969. → pages 9, 15[93] J. Wouters and V. Lucarini. Disentangling multi-level systems: averaging, correlations andmemory. J. Stat. Mech. Theor. Exper., 2012(03):P03003, 2012. URLhttp://stacks.iop.org/1742-5468/2012/i=03/a=P03003. → pages 16[94] J. Wouters and V. Lucarini. Multi-level dynamical systems: Connecting the ruelle response theoryand the mori-zwanzig approach. J. Stat. Phys., 151(5):850–860, 2013. → pages 16[95] A. Wy loman´ska, A. Chechkin, J. Gajda, and I. M. Sokolov. Codifference as a practical tool tomeasure interdependence. Phys. A, 421(0):412 – 429, 2015. → pages 68, 73[96] Y. Xu, J. Duan, and W. Xu. An averaging principle for stochastic dynamical systems with Le´vynoise. Phys. D, 240(17):1395–1401, 2011. ISSN 01672789. → pages 45, 63[97] J. Yu. Empirical Characteristic Function Estimation and Its Applications. Econometric Reviews,23(2):93–123, 2004. 
→ pages 105

Appendix A

Appendix to Chapter 3

A.1 Evaluation of the characteristic function ψ

In this appendix we show the explicit calculations involved in the asymptotic evaluation of the integral (3.24) that contributes to (3.22),

    ∫₀ᵗ |Λ(r)|^α Ξ(Λ(r); α, β) dr,    (A.1)

where Ξ and Λ are defined in (3.2) and (3.23), respectively. We focus on two different temporal regimes, t = O(ε) and t = O(1).

In the case where t = O(ε), we make the change of variable r = εy in (A.1),

    ε ∫₀^{t̃} |Λ(εy)|^α Ξ(Λ(εy); α, β) dy,    t̃ = t/ε = O(1).    (A.2)

We use a Taylor series approximation for |Λ(εy)|^α to expand the integrand in powers of ε for 0 < ε ≪ 1,

    |Λ(εy)|^α = |l exp(g₂y) − m ε^{1−γ} (f₂/g₂)(1 − exp(g₂y))|^α    (A.3)
              ∼ |l|^α exp(αg₂y) − ε^{1−γ} α |l|^{α−1} sgn(l) m (f₂/g₂) [exp((α−1)g₂y) − exp(αg₂y)].    (A.4)

The function Ξ(s; α, β) = 1 − iβ sgn(s) tan(πα/2) depends on s only through sgn(s) and is therefore constant except when s changes sign. In the t = O(ε) limit, Λ(εy) = l exp(g₂y) + O(ε^{1−γ}), and so the value of Ξ is determined by the sign of l for y ≤ t̃ and ε sufficiently small:

    Ξ(Λ(εy); α, β) = Ξ(l; α, β).    (A.5)

Substituting (A.4) and (A.5) into (A.2), we obtain an approximation to (A.1) for t = O(ε),

    ∫₀ᵗ |Λ(r)|^α Ξ(Λ(r); α, β) dr ≈ −(ε/(αg₂)) |l|^α Ξ(l; α, β) (1 − exp(αg₂t/ε))    (A.6)
        + ε^{2−γ} |l|^{α−1} Ξ(l; α, β) sgn(l) m (f₂/g₂²) ( (α/(α−1)) [1 − exp((α−1)g₂t/ε)] − [1 − exp(αg₂t/ε)] )
        + O(ε^{3−2γ}).    (A.7)

Keeping only the leading-order term in l, we get the approximation to (A.1) valid for t = O(ε) that is written in (3.24).

For t = O(1), (A.1) depends both on global contributions from terms that are significant over the entire domain of integration, and on local contributions that are concentrated on an interval of length O(ε) near t = 0. Methods for handling such integrals can be found in [35]. For the global contributions, we neglect terms in Λ(r) that are exponentially small for 0 < ε ≪ 1 and r = O(1), obtaining the leading-order behaviours of |Λ(r)|^α and Ξ(Λ(r); α, β),

    |Λ(r)|^α ∼ |−ε^{1−γ} m f₂/g₂|^α = ε^{α(1−γ)} |f₂/g₂|^α |m|^α,
    Ξ(Λ(r); α, β) = Ξ(f₂m; α, β) = Ξ(m; α, β*),    (A.8)

where β* is defined in (3.25).
Then we evaluate (A.1) to get the global contribution plus a remainder R,

    ∫₀ᵗ |Λ(r)|^α Ξ(Λ(r); α, β) dr = ε^{α(1−γ)} |f₂/g₂|^α |m|^α Ξ(m; α, β*) t + R.    (A.9)

Here R is primarily the local contribution near r = 0 and can be written as

    R = ∫₀ᵗ |Λ(r)|^α Ξ(Λ(r); α, β) dr − ε^{α(1−γ)} |f₂/g₂|^α |m|^α Ξ(m; α, β*) t = ∫₀ᵗ exp(αg₂r/ε) Υ(r) dr,    (A.10)

    Υ(r) = |l − ε^{1−γ} m (f₂/g₂)(exp(−g₂r/ε) − 1)|^α Ξ(Λ(r); α, β)
           − exp(−αg₂r/ε) ε^{α(1−γ)} |f₂/g₂|^α |m|^α Ξ(m; α, β*).    (A.11)

The remainder is approximated by evaluating the integral asymptotically using Watson's lemma [35] and keeping the first three terms,

    R ∼ −(ε/(αg₂)) |l|^α Ξ(l; α, β)
        + ε^{2−γ} (f₂/g₂²) |l|^{α−1} Ξ(l; α, β) sgn(l) m (1/α + 1/α²)
        + ε^{3−2γ} (f₂²/g₂³) |l|^{α−2} m² Ξ(l; α, β) (1/α² − 1/α)
        − (ε³/(2αg₂)) |f₂/g₂|^α |m|^α Ξ(m; α, β*).    (A.12)

Substituting (A.12) into (A.9), we obtain the approximation to (A.1) valid for t = O(1), given in (3.24).

A.2 Numerical methods

A.2.1 Simulating the SDEs

In this appendix, we describe the numerical methods used to simulate trajectories of the systems given in Section 3.3. Depending on whether the stochastic dynamics of a system are interpreted in the sense of Itô or Marcus, the stochastic forcing term is treated differently; details are given in the two subsections below.

For the systems studied in Section 3.3, we simulate the full systems of the form (3.3), (3.4) using a time step of size δt (referred to as the simulation time step). Each simulated time series was sampled

Table A.1: Simulation parameters for the examples considered in Section 3.3. The indicated variable determines the smallest characteristic time scale of the system, which is also indicated. All systems were sampled on the time step Dt = 10⁻². Single asterisks indicate the system was simulated using a higher-order method as described by (A.15).

    System        | Variable | Characteristic time scale | δt
    (3.56, 3.57)  | y_t      | ε (≥ Dt)                  | 10⁻³
    (3.59)        | ξ_t      | (1 − a_c)⁻¹ ≥ 5/4         | Dt
    (3.62, 3.63)* | y_t      | εa (≥ Dt)                 | 2 × 10⁻⁴
    (3.64)**      | X_t      | 1                         | Dt
    (3.65)        | ξ_t      | 1                         | Dt
    (3.66, 3.67)  | y_t      | ε(1 + |x_t|)⁻¹ (≥ Dt)     | 5 × 10⁻⁴
    (3.68)***     | X_t      | 1                         | Dt
    (3.69)        | ξ_t      | 1                         | Dt
    (3.70, 3.71)  | y_t      | εa (≥ Dt)                 | 10⁻⁴
    (3.72)*       | X_t      | 1                         | Dt
    (3.73)        | ξ_t      | 1                         | Dt
Double and triple asterisks indicate numerical Marcus integration without and with numerical evaluation of the Marcus map θ (A.28), respectively.

on a coarser time grid with sampling time step Dt = 10⁻². The simulation time step satisfies δt ≤ Dt/10, with Dt no longer than the characteristic time scale of the fast process. These choices of time steps ensure that the simulation time scale and the characteristic time scale of the fast process are well separated, and that we obtain a sufficiently long time series while keeping the number of data points computationally tractable. Different simulation time steps were used for different systems as a compromise between numerical stability and computation time, but all systems, including the approximations, share the same sampling time step Dt.

For the (L) and (N+) approximations in Section 3.3, we used δt = Dt. This choice samples the dynamics sufficiently well, since the characteristic time scale of these processes is O(1). It also ensures that the full and reduced systems are simulated over the same time intervals with the same sample sizes. Table A.1 gives the parameters used in our numerical simulations.

Numerical simulations for Section 3.3: Itô interpretation

For all of the (L) approximations simulated in Section 3.3, ξ_t satisfies an SDE of the form

    dξ_t = H(ξ_t) dt + κ(x_t) dL_t^{(α,β)},    ξ₀ = 0,    (A.13)

where x_t is treated as constant with respect to ξ_t, so that the noise term has the Itô interpretation. As we are interested in the stationary distribution of x_t + ξ_t, we take the initial value of x_t to be its stationary value in all numerical simulations of the (L) approximation; hence there is no need to simulate x_t numerically, as it is constant. We denote the constant value κ(x_t) by κ. To simulate (A.13) numerically, in most cases we use an Euler-type approximation,

    ξ̂_{n+1} = ξ̂_n + H(ξ̂_n) δt + κ δL_n^{(α,β)},    ξ̂₀ = 0,    n = 0, 1, 2, …,    (A.14)
where ξ̂_n denotes the numerical approximation to ξ_t at time t = nδt, and the increments δL_n^{(α,β)} are distributed as S_α(β, (δt)^{1/α}). Numerical schemes analogous to (A.14) are used for all systems in this paper, unless otherwise indicated in Table A.1. For two systems, the weak order 1.0 predictor-corrector method [46] was required for additional accuracy: (i) nonlinear system 1 (3.62, 3.63), to ensure that the x = 0 boundary was not crossed; and (ii) the (N+) approximation of system 3 (3.72), as the cubic term caused numerical trajectories to diverge when the standard forward Euler method was used. The predictor-corrector scheme is

    ξ̂_{n+1} = ξ̂_n + ½ (H(ξ̂†) + H(ξ̂_n)) δt + κ δL_n^{(α,β)},
    ξ̂†      = ξ̂_n + H(ξ̂_n) δt + κ δL_n^{(α,β)},    ξ̂₀ = 0.    (A.15)

In all cases the increments of α-stable noise are given by δL_n^{(α,β)}, with stability parameter α ∈ (0,1) ∪ (1,2), skewness parameter β, and scale parameter σ = (δt)^{1/α}. We use the algorithm in [21] to generate these quantities as follows,

    δL_n^{(α,β)} = B_{α,β} · (sin(α(ζ₁ + C_{α,β})) / cos(ζ₁)^{1/α}) · (cos(ζ₁ − α(ζ₁ + C_{α,β})) / ζ₂)^{(1−α)/α},    (A.16)

where

    B_{α,β} = (δt)^{1/α} (cos(arctan(β tan(πα/2))))^{−1/α},    C_{α,β} = (1/α) arctan(β tan(πα/2)),    (A.17)

and ζ₁, ζ₂ are, respectively, a uniform random variable on the open interval (−π/2, π/2) and an exponential random variable with parameter 1. In the case α = 2, the increments δL_n^{(2,β)} = δL_n^{(2,0)} are Gaussian random variables with variance 2δt, and the simulation scheme (A.14) is then an Euler–Maruyama method [46].

Numerical methods for evaluating Marcus integrals

We summarize results of [3, 7, 31, 48, 49, 74, 86] relevant for processes z_t satisfying SDEs with multiplicative noise terms in the sense of Marcus, indicated by the "⋄" symbol, such as (3.64) and (3.68). These systems are of the form

    dz_t = H(z_t) dt + κ(z_t) ⋄ dL_t^{(α,β)},    t ≥ 0,    (A.18)

and can be written in integral form as

    z_t = z₀ + ∫₀ᵗ H(z_s) ds + ∫₀ᵗ κ(z_s) ⋄ dL_s^{(α,β)}.    (A.19)
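As a concrete reference for the Itô-interpretation schemes above, the increment formula (A.16)–(A.17) and the Euler update (A.14) can be implemented in a few lines. A minimal Python sketch (function names and parameter choices are ours, not from the thesis; α ≠ 1 is assumed, and the example path uses symmetric noise, β = 0):

```python
import numpy as np

def stable_increment(alpha, beta, dt, rng):
    # Chambers-Mallows-Stuck formula (A.16)-(A.17): one increment
    # dL ~ S_alpha(beta, dt**(1/alpha)); assumes alpha != 1.
    zeta1 = rng.uniform(-np.pi / 2, np.pi / 2)   # uniform angle on (-pi/2, pi/2)
    zeta2 = rng.exponential(1.0)                 # unit-rate exponential
    t = beta * np.tan(np.pi * alpha / 2)
    C = np.arctan(t) / alpha                                      # C_{alpha,beta}
    B = dt ** (1 / alpha) * np.cos(np.arctan(t)) ** (-1 / alpha)  # B_{alpha,beta}
    return (B * np.sin(alpha * (zeta1 + C)) / np.cos(zeta1) ** (1 / alpha)
            * (np.cos(zeta1 - alpha * (zeta1 + C)) / zeta2) ** ((1 - alpha) / alpha))

def euler_stable(H, kappa, alpha, beta, dt, n_steps, seed=0):
    # Euler-type scheme (A.14) for d(xi) = H(xi) dt + kappa dL^(alpha,beta).
    rng = np.random.default_rng(seed)
    xi = np.zeros(n_steps + 1)
    for n in range(n_steps):
        xi[n + 1] = xi[n] + H(xi[n]) * dt + kappa * stable_increment(alpha, beta, dt, rng)
    return xi

# example: linear drift H(x) = -x with symmetric alpha-stable forcing
path = euler_stable(lambda x: -x, kappa=1.0, alpha=1.5, beta=0.0, dt=1e-3, n_steps=1000)
```

In the α = 2, β = 0 limit the formula reduces to a Box–Muller-type Gaussian generator with variance 2δt, consistent with the remark above.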
The first integral on the right-hand side of (A.19) corresponds to the drift terms and is numerically approximated using standard methods, such as Euler or explicit trapezoidal schemes. Here L_t^{(α,β)} is a pure jump process, and so the Marcus increments are expressed in terms of the Marcus map θ(r; ΔL_s, z_{s−}) as follows,

    ∫₀ᵗ κ(z_s) ⋄ dL_s^{(α,β)} = Σ_{s≤t} [θ(1; ΔL_s, z_{s−}) − z_{s−}],    (A.20)

where θ(r; ΔL_s, z_{s−}) satisfies

    dθ(r; ΔL_s, z_{s−})/dr = ΔL_s κ(θ(r; ΔL_s, z_{s−})),    θ(0; ΔL_s, z_{s−}) = z_{s−}    (A.21)

[1, 7]. In the Marcus map θ(r; ΔL_s, z_{s−}), r is a time-like variable in which the process z_t travels infinitely fast along a curve connecting the initial and end "times" of the jump ΔL_s, given by r = 0 and r = 1, respectively. The jump occurring at time s, denoted ΔL_s = dL_s^{(α,β)}, is a random variable distributed according to an α-stable law, S_α(β, dt^{1/α}). The resulting effect of the jump on the process z_t is determined by the size of the jump as well as the integrated effect of the noise coefficient over the course of the jump [1]. When an expression for θ can be derived in closed form, z_t can be numerically estimated as

    ẑ_{n+1} = ẑ_n + H(ẑ_n) δt + [θ(1; δL_n^{(α,β)}, ẑ_n) − ẑ_n],    (A.22)

where δL_n^{(α,β)} is as described above. For example, in the (N+) approximation of nonlinear system 1 (3.64), θ(1; δL_n^{(α,β)}, ẑ_n) = ẑ_n exp(bτ δL_n^{(α,β)}).

In cases where θ must be evaluated numerically, as for the (N+) approximation of nonlinear system 2 (3.68), we follow [3, 31] and approximate the α-stable noise increments dL_t^{(α,β)} as the sum of a (path-continuous) Gaussian white noise process and a (pure-jump) compound Poisson process. This effectively decomposes dL_s into small jumps and large jumps, approximated respectively by the Gaussian white noise process and the Poisson process. We consider the case where the noise is symmetric (i.e.,
β = 0), as the case of asymmetric noise has not, to our knowledge, been studied explicitly. It is necessary to define a large-jump threshold R > 0 to approximate dL_t^{(α,0)} as

    dL_t^{(α,0)} ≈ η dW_t + dN_t,    (A.23)

where η is a constant satisfying

    η² = (α/(2−α)) C_α R^{2−α},    C_α = { (1−α)/(Γ(2−α) cos(πα/2))  if α ≠ 1;  2/π  if α = 1 },    (A.24)

W_t is a standard Wiener process, and N_t = N_t(ν, λ) is a compound Poisson process with rate λ and measure ν given by

    λ = C_α/R^α,    ν(x) = { (αR^α/2) |x|^{−(α+1)}  if x < −R or x ≥ R;  0  otherwise }.    (A.25)

We choose R = 1 as our large-jump threshold, as is done in [31]. Since we are approximating α-stable noise by the sum of a continuous-path process and a pure-jump process, we must use the general representation for stochastic integrals of Lévy processes that includes contributions from both the continuous and jump components [1, 10], yielding

    ∫₀ᵗ κ(z_s) ⋄ dL_s^{(α,0)} ≈ η ∫₀ᵗ κ(z_s) dW_s + (η²/2) ∫₀ᵗ κ(z_s) κ′(z_s) ds + ∫₀ᵗ κ(z_s) ⋄ dN_s    (A.26)
        = η ∫₀ᵗ κ(z_s) dW_s + (η²/2) ∫₀ᵗ κ(z_s) κ′(z_s) ds + Σ_{s≤t} [θ(1; ΔN_s, z_{s−}) − z_{s−}],    (A.27)

where the first integral on the right-hand side is interpreted in the sense of Itô. The term ΔN_s represents the jump in the process N_t occurring at time s (if any), accounting for large jumps in the process L_t^{(α,0)}. ΔN_s is distributed as a compound Poisson process evaluated over the infinitesimal time step dt with rate and measure as in (A.25). We see that when α = 2, the jump rate λ = 0 (i.e., ΔN_t = 0 for all t and the sum in (A.27) vanishes). Thus (A.27) reduces to the sum of the Itô integral and the Wong–Zakai correction, with η² = 2, which is equivalent to the Stratonovich integral [65]. When α = 2 and
When α = 2 andκ is constant, then (A.27) reduces to the Ito¯ integral, as expected.To simulate (A.18) with β = 0, and time step δt, we use the numerical schemezˆn+1 = zˆn +(H(zˆn) +η22 κ(zˆn)κ′(zˆn))δt+ ηκ(zˆn) δWn + [θˆ(1; δNn, zˆn)− zˆn] (A.28)where δWn ∼DN (0, δt) and δNn are discrete increments of a compound Poisson process over the time stepδt with rate and measure given by (A.25) which can be simulated as in [10]. The term θˆ(1; δNn, zˆn) = θˆMis the numerical approximation to θ(1; δNn, zˆn), whereθˆj+1 = θˆj + δNnκ(θˆj) δu, θˆ0 = zˆn,(j = 0, 1, . . . ,M − 1, δu = 1M). (A.29)This numerical method for simulating zt when θˆ needs to be evaluated is similar to the tau leapingmethods that are given in [48] for evaluating Marcus integrals, and a more explicit description can befound therein (also see [49] and [86]). Our numerical method converges weakly and more technical detailsrelated to the convergence of our method are discussed in [10, 31].Here we have written the numerical solutions (A.22), (A.28), and (A.29) using the Euler method forthe drift terms, but it would be possible to use higher-order methods, such as the explicit trapezoidalmethod to numerically integrate the drift terms.A.2.2 Estimating the PDFsTo ensure accurate estimation of the tails of the sample PDFs, we estimate a value N for the size of thesample of the time varying processes that provides reasonable confidence in the estimates for the PDFs.This value of N is based on a calculation for estimated confidence intervals for the tails of the PDFs. HereN is equal to the length of the time interval of the simulation divided by Dt, so that in our calculationswe have N discrete observations of process zt sampled at intervals of Dt = 10−2. However, in orderto obtain an approximation for the confidence intervals of the tails of the PDF, we use approximationsbased on independent observations, so we require a subsample Zj , j = 1, . . . , NZ of zt which has NZalmost-independent observations. 
Given the sampling time step Dt = 10−2 and that the characteristic99time scale of the slow variable is O(1) (as can be seen by inspection of the ACDs above in Fig. 3.4, 3.6),we need to subsample zt once every 1/Dt = 102 points to ensure that observations of Z are sufficientlyindependent. Then N is determined as NZ × 102, with NZ determined from confidence intervals for thePDF of Z.To obtain confidence intervals for the tails of the PDFs, we discretize the state space of Z into ndisjoint bins where the true probability of observing Zj within bin i is pii. Under these assumptions, theestimated number of observations of Zj in the n discrete bins is given by a multinomial distribution withNZ observations and probabilities {pii}ni=1 (∑ni pii = 1). Obviously, the pii for the bins that correspondto the tail of the density are very small, yet we want to ensure that NZ (and thus N) is sufficiently largeto obtain a good approximation for the tails of the density in our simulations. Therefore, we derive anexpression for the confidence intervals of pii based on NZ and n. Specifying that small pii (pii = 10−5) fallwithin a 95% confidence interval of width O(10−5) about pii, we obtain a minimum value of NZ . First,we approximate the multinomial distribution for the number of observations using a multivariate normalapproximation which is justified on the basis of large NZ . We then apply the confidence interval suggestedin [56] by Quesenberry and Hurst with Goodman’s recommendation and apply the limitNZ →∞ to obtainthe following approximate two-sided 95% confidence interval for pii,P [pˆii − Ωi < pii < pˆii + Ωi] ≈ 0.95 (A.30)pˆii =1N {# observations of Z ∈ bin i} , (A.31)Ωi =√MZ pˆii(1− pˆii)NZ, MZ = χ21,1−0.05/n (A.32)where χ2m,1−c denotes the (1 − c)% quantile of the χ2 distribution with m degrees of freedom. To get aconservative estimate of the value of Ωi we might expect for our simulations, we choose n = 100, and soMZ = χ21,1−0.05/100 ≈ 12.12 = O(101). 
We wish to have narrow confidence intervals in which π̂_i is within O(10⁻⁵) of π_i = 10⁻⁵, which implies that Ω_i = O(10⁻⁵) and N_Z satisfies

    N_Z = M_Z π_i / Ω_i² = O(10⁶).    (A.33)

Since N = N_Z/Dt, we conclude that 10⁸ points with time step Dt are needed in the full simulation to resolve the characteristic time scale, the sampling of z_t, and the subsampling of Z.

A.2.3 Estimating the autocodifference function

Given a time-dependent stochastic process z_t, the autocodifference of z_t, A_z(τ), is defined by (3.60) [76, 88]. To numerically estimate A_z(τ) we need to estimate both the CF of z_t, Φ_z(k) = E[exp(ikz_t)], and the joint CF of z_t and a time-shifted version z_{t+τ}, Ψ_z(k, l, τ) = E[exp(ikz_{t+τ} + ilz_t)]. Assume we have N observations of z_t, which we denote by ẑ_n = z_{n·Dt}, n = 0, 1, …, N−1. The estimates for Φ_z(k), Ψ_z(k, l, n·Dt) are given by

    Φ̂_z^{[J,K]}(k) = (1/(K−J+1)) Σ_{j=J}^{K} exp(ik ẑ_j),
    Ψ̂_z^{[J,K]}(k, l, n) = (1/(K−J+1)) Σ_{j=J}^{K} exp(ik ẑ_{j+n} + il ẑ_j),    (A.34)

respectively, where J, K are integers satisfying 0 ≤ J ≤ K ≤ N−1−n. The estimate for A_z(n·Dt), which we denote by Â_z(n·Dt), can be computed at discrete time displacements n·Dt using (A.34) and is given by

    Â_z(n·Dt) = log(Ψ̂_z^{[0,N−1−n]}(1, −1, n)) − log(Φ̂_z^{[n,N−1]}(1)) − log(Φ̂_z^{[0,N−1−n]}(−1)).    (A.35)

This estimator is reasonably accurate for short times, where n ≪ N, as there are N−n pairs of data points with relative time displacement n·Dt, and by the law of large numbers Ψ̂, Φ̂ given in (A.34) are reasonable estimators. For our study we consider n ≤ 400 ≪ N, and so A_z(n·Dt) is well estimated.

Appendix B

Appendix to Chapter 4

B.1 Intermediate calculations

B.1.1 Asymptotic characteristic function for Y_j

Let α ∈ (1, 2) and b > 0. It can be shown, as in [80], that

    ∫_b^∞ e^{iky} y^{−(1+α)} dy = |k|^α Γ(−α) e^{−i sgn(k) πα/2} + ∫_b^∞ (1 + iky) y^{−(1+α)} dy − Σ_{n=2}^∞ (ik)^n b^{n−α} / ((n−α) Γ(n+1)).    (B.1)

We could simplify this expression further, but we leave it in its present form to make later simplifications easier to justify.
Using this result, we can show that the Fourier transform of the PDF given by (4.23), ψ_Y, satisfies

    ψ_Y(k) = ∫_ℝ exp(iky) u_Y(y) dy
           = q₋ ∫_{−∞}^{−a} e^{iky} |y|^{−(1+α)} dy + ∫_{−a}^{a} e^{iky} u_Y(y) dy + q₊ ∫_a^{∞} e^{iky} y^{−(1+α)} dy    (B.2)
           = (Γ(−α)(q₊ + q₋) cos(πα/2)) |k|^α (1 − iβ̃ sgn(k) tan(πα/2))    (B.3)
             + q₋ ( ∫_a^{∞} (1 − iky) y^{−(1+α)} dy − Σ_{n=2}^{∞} (−ik)^n a^{n−α} / ((n−α) Γ(n+1)) )
             + q₊ ( ∫_a^{∞} (1 + iky) y^{−(1+α)} dy − Σ_{n=2}^{∞} (ik)^n a^{n−α} / ((n−α) Γ(n+1)) )
             + ∫_{−a}^{a} e^{iky} u_Y(y) dy,

where β̃ = (q₊ − q₋)/(q₊ + q₋). Since the integral of u_Y must equal 1, the sum of the very last term in (B.3) and the two integral terms with integrand y^{−(1+α)} equals 1. If we additionally assume that the mean of Y_j is equal to zero, then the expression for ψ_Y simplifies to

    ψ_Y(k) = 1 + (Γ(−α)(q₊ + q₋) cos(πα/2)) |k|^α (1 − iβ̃ sgn(k) tan(πα/2))
             − q₋ Σ_{n=2}^{∞} (−ik)^n a^{n−α} / ((n−α) Γ(n+1)) − q₊ Σ_{n=2}^{∞} (ik)^n a^{n−α} / ((n−α) Γ(n+1))
             + ∫_{−a}^{a} (e^{iky} − 1 − iky) u_Y(y) dy    (B.4)
           = 1 + (Γ(−α)(q₊ + q₋) cos(πα/2)) |k|^α (1 − iβ̃ sgn(k) tan(πα/2))
             + ( (q₊ + q₋) a^{2−α} / (2(2−α)) − ∫_{−a}^{a} y² u_Y(y) dy ) k² + O(k³).    (B.5)

For small arguments, log(ψ_Y(k)) is approximated by

    log(ψ_Y(k)) ∼ (Γ(−α)(q₊ + q₋) cos(πα/2)) |k|^α (1 − iβ̃ sgn(k) tan(πα/2)) − Q k² + O(k³),    (B.6)

where

    Q = ∫_{−a}^{a} y² u_Y(y) dy − (q₊ + q₋) a^{2−α} / (2(2−α)).    (B.7)

Thus the characteristic function ψ_Y(k) is given by (4.25) for small arguments k.

B.2 Simulating the CAM noise process

Since the CAM noise process must evolve on a time scale much faster than that of the slow process, we use the weak order 2.0 explicit method [46] to simulate the CAM noise process rather than the Euler–Maruyama method, which has weak order 1.0. Applying the weak order 2.0 method to the dynamics (4.1) gives the following weak numerical approximation ŷ_n ≈ y_{nδt}:

    ŷ_{n+1} = ŷ_n + (L̃/2)(ŷ_n + Υ) δt + ¼ (EΥ₊ + 2Eŷ_n + EΥ₋ + 4g) δW_{1,n}
              + ¼ (EΥ₊ − EΥ₋)(δW_{1,n}² − δt)/√δt + b δW_{2,n},    (B.8)

where

    Υ  = ŷ_n + L̃ŷ_n δt + (Eŷ_n + g) δW_{1,n} + b δW_{2,n},
    Υ± = ŷ_n + L̃ŷ_n δt ± (Eŷ_n + g) √δt + b δW_{2,n},    (B.9)

L̃ = L + E²/2, and δt is the size of the discrete time step. The terms δW_{1,n}, δW_{2,n}, n = 0, 1, 2, …,
are independent Gaussian random variables with mean 0 and variance δt.

B.2.1 Consistency of simulations

Using the weak order 2.0 simulation scheme (B.8) to simulate y_{t/ε} as in (4.1), we compare both the convergence of the numerical scheme to the target CAM noise distribution u^{(0)} and the convergence of the numerically integrated CAM noise process to the attracting α-stable distribution.

The CAM noise distribution

The stationary PDF for y_{t/ε} is u^{(0)}, as given by (4.4). We consider two sets of parameters and vary the simulation time step size to find the threshold where we expect convergence of the simulations to the stationary linear CAM distribution u^{(0)}.

[Figure B.1: The relative difference |û − u^{(0)}|/u^{(0)} between the histogram-based empirical PDF of y_t simulated according to (B.8) and the true CAM noise PDF, for step sizes δt = ε, ε/10, ε/100, with ε = 10⁻³. The time difference between successive sampled points is 10⁻¹ in all cases. The number of sampled points in each time series is N = 10⁵. Left/right panels: (L, E, g, b) = (−1, 1.1547, 0, 1) and (−1, 1.0541, 0.6, 0.4).]

From Figure B.1 we see, for both sets of parameters, that for δt = ε the numerical simulations have not converged to the CAM noise distribution, which is not surprising since the simulation time step is then on the order of the characteristic time of the process being simulated. For the values δt = ε/10 and ε/100, convergence appears reasonable, and the difference between the PDFs simulated using δt = ε/10 and ε/100 is negligible.
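For reference, a single step of the weak order 2.0 update (B.8)–(B.9) can be written out directly. A Python sketch, with names and parameter values chosen for illustration (the second parameter set of Figure B.1 is used); in the deterministic limit the step reduces to a second-order Taylor update, which gives a quick sanity check:

```python
import numpy as np

def cam_weak2_step(y, dt, L, E, g, b, dW1, dW2):
    # One step of the weak order 2.0 explicit scheme (B.8)-(B.9) for the
    # CAM noise process (4.1); L-tilde = L + E^2/2 as in the text.
    Lt = L + E ** 2 / 2
    base = y + Lt * y * dt
    ups = base + (E * y + g) * dW1 + b * dW2              # Upsilon
    ups_p = base + (E * y + g) * np.sqrt(dt) + b * dW2    # Upsilon_+
    ups_m = base - (E * y + g) * np.sqrt(dt) + b * dW2    # Upsilon_-
    return (y
            + 0.5 * Lt * (y + ups) * dt
            + 0.25 * (E * ups_p + 2 * E * y + E * ups_m + 4 * g) * dW1
            + 0.25 * (E * ups_p - E * ups_m) * (dW1 ** 2 - dt) / np.sqrt(dt)
            + b * dW2)

# drive the fast process with a step one order smaller than eps, as recommended
eps, dt = 1e-3, 1e-4
rng = np.random.default_rng(0)
y = 0.0
for _ in range(10_000):
    dW1, dW2 = rng.normal(0.0, np.sqrt(dt), size=2)
    y = cam_weak2_step(y, dt, L=-1.0, E=1.0541, g=0.6, b=0.4, dW1=dW1, dW2=dW2)
```

With E = g = b = 0 the step collapses to y(1 + L δt + (L δt)²/2), the second-order deterministic Taylor step, as expected of a weak order 2.0 scheme.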
This suggests that we should take δt at least one order of magnitude smaller than ε when simulating the fast CAM noise process y_{t/ε}, to ensure that our numerical results are consistent with theory.

The integral of the CAM noise process

To test whether the simulation step size has an effect on the estimates of Y_j = ∫_{(j−1)Δ}^{jΔ} y_{s/ε} ds, we compute a sample of Y_j for increasingly small values of ε to observe the convergence of the distribution of Y_j to a stable distribution. Unfortunately, we do not know a priori the scale parameter for the distribution of Y_j for Δ large. However, from our numerical results in Figure B.2 we see, as in the case of the CAM noise distribution, that there is no significant improvement when taking time steps smaller than δt ≈ ε/10 in our estimates of Y_j.

B.2.2 Summary of implications for numerical simulations

The above results suggest that simulating the fast CAM noise process with time discretization δt ≤ ε/10 is sufficient to have confidence in our numerical computations. All calculations in this paper are made with δt satisfying this requirement.

[Figure B.2: Histogram-based empirical PDFs of ∫₀^{0.1} y_{t/ε} dt simulated according to (B.8), using simulation step sizes δt = ε, ε/10, ε/100. The sample size is N = 10⁴. Left/right panels: (L, E, g, b) = (−1, 1.1547, 0, 1) and (−1, 1.0541, 0.6, 0.4).]

B.3 Estimating θ

To estimate the value of θ in the dynamics of z_t (4.14) such that ∫₀^Δ z_{s/ε} ds converges to the same distribution as Y_j, we estimate the distribution of Y_j by numerically computing the integral of the process using a trapezoidal integration method, a time resolution with step sizes δt ≤ ε/10, and a value of Δ = O(1) sufficiently large that the distribution of Y_j = ∫_{(j−1)Δ}^{jΔ} y_{s/ε} ds is approximately α-stable. We estimate the scale parameter for the distribution of Y_j, giving us a Δ-dependent estimate of σ̃, which we denote $\hat{\tilde\sigma}(\Delta)$.
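The block-integral construction of the samples Y_j described above can be sketched as follows; a hand-rolled trapezoidal rule keeps the example self-contained, and all names are ours. Integrating a constant path recovers Δ exactly, which serves as a check:

```python
import numpy as np

def block_integrals(y, dt, Delta):
    # Trapezoidal estimates of Y_j = int_{(j-1)Delta}^{j Delta} y dt from a
    # path sampled on a grid with spacing dt; Delta/dt is assumed integer.
    m = int(round(Delta / dt))                     # grid intervals per block
    n_blocks = (len(y) - 1) // m
    out = np.empty(n_blocks)
    for j in range(n_blocks):
        seg = y[j * m: j * m + m + 1]              # m+1 nodes, shared endpoints
        out[j] = dt * (0.5 * seg[0] + seg[1:-1].sum() + 0.5 * seg[-1])
    return out

# integrating the constant path y = 1 gives Y_j = Delta for every block
Y = block_integrals(np.ones(101), dt=0.01, Delta=0.1)
```

The trapezoidal rule is exact for piecewise-linear paths, so linear test paths also reproduce the analytic block integrals to rounding error.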
To estimate the value of the scale parameter σ̃ we use a method outlined in [97] to estimate the characteristic function of Y_j, described briefly in Appendix B.3.1 below. For a given value of θ, the distribution of ∫₀^Δ z_{s/ε} ds (4.20) will be α-stable with stability index α* and scale parameter γσ*Δ^{1/α}/θ. Setting $\hat{\tilde\sigma}(\Delta)$ = γσ*Δ^{1/α}/θ, we determine our estimate for θ:

    θ = γσ*Δ^{1/α} / $\hat{\tilde\sigma}(\Delta)$.    (B.10)

It should be noted that $\hat{\tilde\sigma}(\Delta)$ is proportional to γΔ^{1/α} in the limit Δ/ε → ∞, and hence the estimate for θ, in theory, does not depend on the values of Δ and ε.

B.3.1 Estimating the scale parameter

To estimate σ̃, we numerically evaluate the characteristic function for a number of realizations of Y_j, j = 1, 2, …, N_Y. An estimate of the characteristic function of Y_j, ψ_{Y_j}(l) = E[exp(ilY_j)], is given by

    ψ̂_{Y_j}(l) = (1/N_Y) Σ_{n=1}^{N_Y} exp(il Y_n).    (B.11)

By the discussion in Section 4.3.2, the characteristic function ψ_{Y_j} will converge weakly to that of a stable random variable with known stability index α*, known skewness parameter β*, and unknown scale parameter σ̃, which has the characteristic function

    ψ_{Y_j}(l; σ) → exp(−σ^{α*} |l|^{α*} Ξ(l, α*, β*))    (B.12)

in the limit ε → 0 (having the same form as (4.28)).

To estimate σ̃, we use a nonlinear least squares method to minimize an objective function J over a set of wavenumbers {l_j}_{j=1}^{n_l}, obtaining $\hat{\tilde\sigma}(\Delta)$:

    $\hat{\tilde\sigma}(\Delta)$ = argmin_σ J(σ),    J(σ) = Σ_{j=1}^{n_l} |ψ_{Y_j}(l_j; σ) − ψ̂_{Y_j}(l_j)|².    (B.13)

We note that the objective function J is proportional to an integral estimate of the squared difference between ψ_{Y_j} and ψ̂_{Y_j} on the interval [min({l_j}), max({l_j})].

Using this estimator was necessary for our study, despite the fact that estimators exist that can simultaneously estimate all parameters of a stable distribution. For example, the STBL package for MATLAB [90] is intended to do this. However, for values of α near 2, that estimator is not reliable in determining an estimate of α (which we know exactly), and consequently the corresponding values of the scale parameter are not reliable.
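The empirical-CF fit (B.11)–(B.13) is straightforward to sketch. In the Python sketch below, a brute-force grid search stands in for the nonlinear least squares solver actually used, and the Gaussian limit (an N(0, s²) variable is S₂(0, s/√2)) provides a check with a known answer; all names are ours:

```python
import numpy as np

def ecf(samples, ls):
    # Empirical characteristic function (B.11) at wavenumbers ls.
    return np.exp(1j * np.outer(ls, samples)).mean(axis=1)

def fit_scale(samples, alpha, beta, ls, sigma_grid):
    # Least-squares fit (B.13) of the stable scale parameter: match the model
    # CF exp(-sigma^alpha |l|^alpha Xi(l; alpha, beta)) to the empirical CF.
    target = ecf(samples, ls)
    def model_cf(s):
        xi = 1 - 1j * beta * np.sign(ls) * np.tan(np.pi * alpha / 2)
        return np.exp(-(s ** alpha) * np.abs(ls) ** alpha * xi)
    costs = [np.sum(np.abs(model_cf(s) - target) ** 2) for s in sigma_grid]
    return sigma_grid[int(np.argmin(costs))]

# sanity check in the Gaussian limit: N(0, 1) samples are S_2(0, 1/sqrt(2))
rng = np.random.default_rng(3)
samples = rng.normal(0.0, 1.0, 200_000)
sigma_hat = fit_scale(samples, alpha=2.0, beta=0.0,
                      ls=np.linspace(0.1, 1.0, 10),
                      sigma_grid=np.linspace(0.5, 1.0, 501))
```

The grid search avoids any dependence on an optimization library; in practice a standard nonlinear least squares routine minimizing the same objective would be used.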
Item Metadata
Title | Parametrization and multiple time scale problems with non-Gaussian statistics related to climate dynamics |
Creator | Thompson, William F. |
Publisher | University of British Columbia |
Date Issued | 2015 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2015-08-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166606 |
URI | http://hdl.handle.net/2429/54557 |
Degree | Doctor of Philosophy - PhD |
Program | Mathematics |
Affiliation | Science, Faculty of; Mathematics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2015-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |