Source estimation and uncertainty quantificationfor wave-equation based seismic imaging andinversionbyZhilong FangB.Math., Tsinghua University, 2010M.Math., Tsinghua University, 2012a thesis submitted in partial fulfillmentof the requirements for the degree ofDoctor of Philosophyinthe faculty of graduate and postdoctoralstudies(Geophysics)The University of British Columbia(Vancouver)May 2018c© Zhilong Fang, 2018AbstractIn modern seismic exploration, wave-equation-based inversion and imag-ing approaches are widely employed for their potential of creating high-resolution subsurface images from seismic data by using the wave equationto describe the underlying physical model of wave propagation. Despitetheir successful practical applications, some key issues remain unsolved, in-cluding local minima, unknown sources, and the largely missing uncertaintyanalyses for the inversion. This thesis aims to address the following twoaspects: to perform the inversion without prior knowledge of sources, andto quantify uncertainties in the inversion.The unknown source can hinder the success of wave-equation-based ap-proaches. A simple time shift in the source can lead to misplaced reflectorsin linearized inversions or large disturbances in nonlinear problems. Un-fortunately, accurate sources are typically unknown in real problems. Thefirst major contribution of this thesis is, given the fact that the wave equa-tion linearly depends on the sources, I have proposed on-the-fly source es-timation techniques for the following wave-equation-based approaches: (1)time-domain sparsity-promoting least-squares reverse-time migration; and(2) wavefield-reconstruction inversion. Considering the linear dependenceof the wave equation on the sources, I project out the sources by solving alinear least-squares problem, which enables us to conduct successful wave-equation-based inversions without prior knowledge of the sources.Wave-equation-based approaches also produce uncertainties in the re-sulting velocity model due to the noisy data, which would influence thesubsequent exploration and financial decisions. The difficulties related topractical uncertainty quantification lie in: (1) expensive computation relatedto wave-equation solves, and (2) the nonlinear parameter-to-data map. Thesecond major contribution of this thesis is the proposal of a computationallyfeasible Bayesian framework to analyze uncertainties in the resulting veloc-ity models. Through relaxing the wave-equation constraints, I obtain a lessnonlinear parameter-to-data map and a posterior distribution that can beiiadequately approximated by a Gaussian distribution. I derive an implicitformulation to construct the covariance matrix of the Gaussian distribution,which allows us to sample the Gaussian distribution in a computationallyefficient manner. I demonstrate that the proposed Bayesian framework canprovide adequately accurate uncertainty analyses for intermediate to large-scale problems with an acceptable computational cost.iiiLay summaryGeophysical prospecting uses sound waves to reveal subsurface structuresby detecting differences in the wave-speed of various earth media. I use themathematical tool called wave equation to simulate the physical process ofwave propagation in the Earth, and make the following two contributions inthis thesis by addressing some of key issues in using this tool. First, the exactwaveform of the sound wave is usually not known, and therefore affects theaccuracy of wave simulation. I have proposed an approach to estimate thiswaveform using observation of sound waves. Second, noise in the observedsound-wave data can lead to uncertainties in the subsurface structures, andto further risks in business decisions regarding oil-well deployment. To ad-dress this issue, I propose a method to quantify the uncertainties resultedfrom the noise in the data.ivPrefaceAll of my thesis work herein presented is carried out under the supervision ofDr. Felix J. Herrmann, in the Seismic Laboratory for Imaging and Modelingat the University of British Columbia.I prepared Chapter 1 .A revised version of Chapter 2 has been published [Mengmeng Yang,Philipp A. Witte, Zhilong Fang, and Felix J. Herrmann, 2016, Time-domainsparsity-promoting least-squares migration with source estimation, in SEGTechnical Program Expanded Abstracts 2016, p. 4225-4229]. The manuscriptwas written jointly while Mengmeng Yang was the lead investigator and themain manuscript composer. I derived the theory and designed the algorithmfor the source estimation. Mengmeng Yang built up the main structure of thetime-domain sparsity-promoting least-squares migration. I also contributedto the design and the explaination of the numerical experiment. MengmengYang has granted permission for this chapter to appear in my dissertation.Chapter 3 contains the text and figures excerpted from a submitted paper[Zhilong Fang, Rongrong Wang, and Felix J. Herrmann, Source estimationfor wavefield-reconstruction inversion]. I was the lead investigator and themanuscript composer.A short version of Chapter 4 has been published [Zhilong Fang and Fe-lix J. Herrmann, Source estimation for Wavefield Reconstruction Inversion,77th EAGE Conference & Exhibition, 2015, Madrid, Spain], and a revisedfull version has been acceptted by Journal of Geophysics [Zhilong Fang,Rongrong Wang, and Felix J. Herrmann, Source estimation for wavefield-reconstruction inversion]. I was the lead investigator and the manuscriptcomposer.A revised version of Chapter 5 has been reported [Bas Peters, ZhilongFang, Brendan R. Smithyman, and Felix J. Herrmann, 2015, Regulariz-ing waveform inversion by projections onto convex sets - application to the2D Chevron 2014 synthetic blind-test dataset (Research Report number:TR-EOAS-2015-7)]. The manuscript was written jointly with Bas Peters,vBrendan R. Smithyman, and Felix J. Herrmann. I was the lead investigatorand contributed to the source estimation part in the algorithm, carried outall the experiments, and produced all the figures. I wrote the sections ofsource estimation and numerical experiments. Bas Peters and Brendan R.Smithyman designed and implemented the algorithm of optimization withconvex constraints. Bas Peters and Felix J. Herrmann wrote the section ofthe optimization with convex constraints. Given the equivalent contribu-tions made by Bas Peters and I, Bas Peters has granted permission for thischapter to appear in my dissertation.A short version of Chapter 6 has been published [Zhilong Fang, ChiaYing Lee, Curt Da Silva, Tristan van Leeuwen, and Felix J. Herrmann, Un-certainty quantification for Wavefield Reconstruction Inversion using a PDEfree semidefinite Hessian and randomize-then-optimize method, in SEGTechnical Program Expanded Abstracts 2016, p. 1390-1394], and a revisedfull version has been submitted [Zhilong Fang, Curt Da Silva, Rachel Kuske,and Felix J. Herrmann, Uncertainty quantification for inverse problems withweak PDE-constraints]. I was the lead investigator and the manuscript com-poser.viTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xviiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . .xviii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Least-squares reverse-time migration . . . . . . . . . . . . . . 71.2 Full-waveform inversion and wavefield-reconstruction inversion 111.3 Uncertainty quantification . . . . . . . . . . . . . . . . . . . . 141.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181.5 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 Source estimation for time-domain sparsity-promoting least-squares reverse-time migration . . . . . . . . . . . . . . . . . 212.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.2.1 Imaging with linearized Bregman . . . . . . . . . . . . 232.2.2 On-the-fly source estimation . . . . . . . . . . . . . . . 252.3 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . 272.3.1 RTM with true and wrong source functions . . . . . . 27vii2.3.2 LS-RTM with linearized Bregman . . . . . . . . . . . 292.3.3 Linearized Bregman with source estimation . . . . . . 312.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 Full-waveform inversion and wavefield-reconstruction in-version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.2 Full-waveform inversion . . . . . . . . . . . . . . . . . . . . . 383.3 Wavefield-reconstruction inversion . . . . . . . . . . . . . . . 393.4 Variable projection method . . . . . . . . . . . . . . . . . . . 403.5 FWI v.s. WRI . . . . . . . . . . . . . . . . . . . . . . . . . . 423.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434 Source estimation for wavefield-reconstruction inversion . 454.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.2 WRI with source estimation . . . . . . . . . . . . . . . . . . . 484.3 Optimization scheme . . . . . . . . . . . . . . . . . . . . . . . 524.4 Fast solver for WRI with source estimation . . . . . . . . . . 534.5 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . 574.5.1 BG model with frequency-domain data . . . . . . . . . 574.5.2 BG model with time-domain data . . . . . . . . . . . 644.5.3 BG model with noisy data . . . . . . . . . . . . . . . . 644.5.4 Comparison with FWI . . . . . . . . . . . . . . . . . . 674.6 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 Wavefield reconstruction inversion with source estimationand convex constraints — application to the 2D Chevron2014 synthetic blind-test dataset . . . . . . . . . . . . . . . . 805.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805.2 WRI-SE with regularization by projection onto the intersec-tion of convex sets . . . . . . . . . . . . . . . . . . . . . . . . 815.3 Practical considerations . . . . . . . . . . . . . . . . . . . . . 865.4 Application to the 2D Chevron 2014 blind-test dataset . . . . 875.5 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 926 Uncertainty quantification for inverse problems with weakPDE constraints and its application to seismic wave-equation-based inversions . . . . . . . . . . . . . . . . . . . . . . . . . . 93viii6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 936.2 Bayesian framework for inverse problems with a weak PDE-constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966.2.1 The posterior PDF . . . . . . . . . . . . . . . . . . . . 976.2.2 Selection of λ . . . . . . . . . . . . . . . . . . . . . . . 1026.2.3 Sampling method . . . . . . . . . . . . . . . . . . . . . 1066.3 Uncertainty quantification for seismic wave-equation constrainedinverse problems . . . . . . . . . . . . . . . . . . . . . . . . . 1086.3.1 Steps 1 and 2: designing the posterior and prior PDF 1096.3.2 Steps 3 and 4: Gaussian approximation . . . . . . . . 1106.3.3 Steps 5 and 6: sampling the Gaussian PDFs . . . . . . 1146.3.4 A benchmark method: the randomized maximum like-lihood method . . . . . . . . . . . . . . . . . . . . . . 1156.4 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . 1166.4.1 Influence of the penalty parameter λ . . . . . . . . . . 1166.4.2 A 2D layered model . . . . . . . . . . . . . . . . . . . 1186.4.3 BG Model . . . . . . . . . . . . . . . . . . . . . . . . . 1226.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1306.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1337 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1357.1 Source estimation for time-domain sparsity promoting least-squares reverse-time migration . . . . . . . . . . . . . . . . . 1357.2 Source estimation for wavefield-reconstruction inversion . . . 1367.3 Uncertainty quantification for weak wave-equation constrainedinverse problems . . . . . . . . . . . . . . . . . . . . . . . . . 1377.4 Current limitations . . . . . . . . . . . . . . . . . . . . . . . . 1387.5 Future research directions . . . . . . . . . . . . . . . . . . . . 138Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140ixList of TablesTable 4.1 Comparison of the computational costs for different algo-rithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56Table 4.2 Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors,source errors, and data residuals. . . . . . . . . . . . . . . 61Table 4.3 Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors,source errors, and data residuals. . . . . . . . . . . . . . . 65Table 4.4 Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors,source errors, and data residuals. . . . . . . . . . . . . . . 68Table 4.5 Comparisons between inversion results obtained by FWIand WRI in terms of the final model errors, source errors,and data residuals. . . . . . . . . . . . . . . . . . . . . . . 73Table 6.1 The selection of λ for each frequency . . . . . . . . . . . . 126xList of FiguresFigure 1.1 (a) Schematic of land seismic survey. Image courtesy ION(www.iongeo.com) (b) Schematic of different marine seis-mic surveys. (Source: Caldwell and Walker, 2011) . . . . 3Figure 1.2 (a) One seismic trace, (b) one shot record, and (c) onedata volume. . . . . . . . . . . . . . . . . . . . . . . . . . 4Figure 1.3 The objective of seismic exploration is to reconstruct thesubsurface structures from the observed data. . . . . . . . 5Figure 1.4 The subsurface structure can be divided into two parts:(1) the long-wavelength structure (background velocitymodel) and (2) the short-wavelength structure (velocityperturbation). . . . . . . . . . . . . . . . . . . . . . . . . . 6Figure 1.5 The objective of migration is to map reflection or diffrac-tion seismic data to their true subsurface positions usinga smooth background model that is kinematically correct. 8Figure 1.6 A 1D example to illustrate the impact of the source func-tion to the observed data. The background velocity is4 km/s. There is an reflector at the depth of 1 km. Thewavefield generated by the source function propagatesfrom the surface, is reflected at the depth of 1 km, re-turns back to the surface, and is recorded as the observeddata. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Figure 1.7 Reconstruct the 1D reflectivity model by using the correct(red) and wrong (blue) source functions. Clearly, the timeshift in the source function yields the location shift of thereflector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Figure 1.8 Schematic FWI workflow. . . . . . . . . . . . . . . . . . . 13xiFigure 1.9 (a) We conduct a simple example on a homogeneous modelwith a constant velocity of 2 km/s. We simulate one shot(F) and receive the data at one receiver (∆) using a cor-rect source (red) and a wrong source (blue). (b) There is atime shift of 0.1 s between the correct and wrong sources.(c) There is also a time shift of 0.1 s between the datagenerated by the correct and wrong sources. . . . . . . . . 15Figure 1.10 Schematic workflow of the Bayesian statistical inversion. . 17Figure 2.1 (a) The smooth background model and (b) the true modelperturbation modified from Sigsbee2A. . . . . . . . . . . . 28Figure 2.2 Comparison of the true (red) and wrong source (blue)functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Figure 2.3 RTM images with the correct (a) and wrong (b) sourcefunctions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Figure 2.4 LS-RTM images with the true (a) and wrong (b) sourcefunctions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32Figure 2.5 Data residuals and model errors along iterations . . . . . 33Figure 2.6 LS-RTM image with the on-the-fly source estimation. . . 34Figure 2.7 Comparison of the true (red), initial (blue), and recovered(green) source functions. . . . . . . . . . . . . . . . . . . . 34Figure 2.8 Data residuals and model errors along iterations . . . . . 35Figure 3.1 Comparison of objective functions of FWI and WRI as afunction of v0. For WRI, we select λ = 102, 103, 104, and105. Clearly, as λ increases the objective function of WRIconverges to that of the reduced formulation. When λ issmall, there are no local minima in the objective functionof WRI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Figure 4.1 (a) The true model. (b) Comparison of the correct (blue)and wrong (red) source functions. . . . . . . . . . . . . . 47Figure 4.2 (a) Inversion result with the true source function. (b)Inversion result with the wrong source function. . . . . . 48Figure 4.3 (a) True model, (b) initial model, and (c) the differencebetween them. . . . . . . . . . . . . . . . . . . . . . . . . 59Figure 4.4 (a) Inversion result with the true source function. (b) In-version result with WRI-SE-MS. (c) Inversion result withWRI-SE-WS. . . . . . . . . . . . . . . . . . . . . . . . . . 60xiiFigure 4.5 (a) Difference between the inversion results using the truesource function and the estimated source function by WRI-SE-MS. (b) Difference between the inversion results usingthe true source function and the estimated source functionby WRI-SE-WS. . . . . . . . . . . . . . . . . . . . . . . . 61Figure 4.6 Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS (◦). (a) Phase comparison. (b) Amplitude com-parison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Figure 4.7 Comparison of the relative model errors. . . . . . . . . . . 63Figure 4.8 Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS (dotted line). . . . . . . . . . . . . . . . . . . . . . 63Figure 4.9 Inversion results with (a) WRI-SE-MS and (b) WRI-SE-WS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Figure 4.10 Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS (◦). (a) Phase comparison. (b) Amplitude com-parison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Figure 4.11 Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS (dotted line). . . . . . . . . . . . . . . . . . . . . . 67Figure 4.12 Inversion results with (a) WRI-SE-MS and (b) WRI-SE-WS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68Figure 4.13 Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS (◦). (a) Phase comparison. (b) Amplitude com-parison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69Figure 4.14 Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS (dotted line). . . . . . . . . . . . . . . . . . . . . . 70Figure 4.15 (a)Initial model and (b) difference between the initialmodel and true models. . . . . . . . . . . . . . . . . . . . 71Figure 4.16 Comparisons between the observed (dashed line) and pre-dicted data (dotted line) computed with the initial modelfor a source located at the center of the model. (a) Realpart comparison. (b) Imaginary part comparison. . . . . . 72Figure 4.17 Inversion results with (a) WRI and (b) FWI. . . . . . . . 74xiiiFigure 4.18 Vertical profiles of the true model (solid line), initial model(dashdot line), and the inversion results with WRI (dashedline) and FWI (dotted line) at (a) x = 1000 m, (b)x = 2500 m, and (c) x = 4000 m. . . . . . . . . . . . . . 75Figure 4.19 Comparison of the true source function (+) and the esti-mated source function with FWI (×) and WRI (◦). (a)Phase comparison. (b) Amplitude comparison. . . . . . . 76Figure 4.20 Comparisons between the observed (dashed line) and pre-dicted data (dotted line) computed with the final resultsof WRI and FWI. (a) Real part and (b) imaginary partof WRI result. (c) Real part and (d) imaginary part ofFWI result. . . . . . . . . . . . . . . . . . . . . . . . . . . 77Figure 5.1 The trajectory of Dykstra’s algorithm for a toy examplewith constraints y ≤ 2 and x2 + y2 ≤ 3. Iterates 5, 6 and7 coincide; the algorithm converged to the point closestto point number 1 and satisfying both constraints. . . . . 84Figure 5.2 Initial model . . . . . . . . . . . . . . . . . . . . . . . . . 88Figure 5.3 Comparison of inversion result with and without smooth-ness constraint. (a) Inversion result without smoothnessconstraint. (b) Inversion result with smoothness constraint. 89Figure 5.4 Comparison of the well-log velocity (red), initial velocity(blue), and inverted velocities with (black) and without(green) the smoothness constraint. . . . . . . . . . . . . . 90Figure 5.5 Phase (a) and amplitude (b) comparisons between thefinal estimated source wavelet (◦) and the true sourcewavelet (∗) . . . . . . . . . . . . . . . . . . . . . . . . . . 91Figure 6.1 (a) Snapshot of the time-domain wavefield generated bya single source (∗); (b) recorded time-domain data at fourreceiver locations (∇). . . . . . . . . . . . . . . . . . . . . 103Figure 6.2 The four posterior PDFs corresponding to the penaltyformulations with λ = 10 (a), 50 (b), and 250 (c), andthe reduced formulation (d). . . . . . . . . . . . . . . . . 104Figure 6.3 The Gaussian approximations of the four posterior PDFsassociated with the model in equation 6.21. . . . . . . . . 107Figure 6.4 The comparison of the negative log-likelihood functions. . 117xivFigure 6.5 The comparison of the negative log-likelihood functions ofthe true distribution (ψ1), the approximate distribution(ψ2), and the Gaussian approximation distribution (ψ3)with the values of λ2 = 10−10µ1, 10−6µ1, 10−4µ1, 10−2µ1,100µ1, and 102µ1. . . . . . . . . . . . . . . . . . . . . . . 119Figure 6.6 The true model (a) and the prior mean model (b). . . . . 120Figure 6.7 The posterior mean models obtained by the GARTO method(a) and the RML method (b). . . . . . . . . . . . . . . . . 122Figure 6.8 The comparison between the observed data and the pre-dicted data with the posterior mean models obtained bythe GARTO method (a) and the RML method (b). . . . . 123Figure 6.9 The posterior standard deviations obtained by the GARTOmethod (a) and the RML method (b). . . . . . . . . . . . 124Figure 6.10 The mean (line) and 95% confidence interval (backgroundpatch) comparisons of the GARTO method (blue) and theRML method (red) at x = 500 m, 1500 m, and 2500 m.The similarity between these two results implies that theconfidence intervals obtained with the GARTO method isa good approximation of the ones obtain with the RMLmethod. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125Figure 6.11 The comparison of the prior marginal distribution (solidline) and the posterior marginal distributions obtained bythe GARTO method (dotted line) and the RML method(dashed line) at the locations of (x, z) = (1500m, 200m),(1500m, 700m), and (1500m, 1200m). . . . . . . . . . . . 126Figure 6.12 (a) The true model, (b) the prior mean model, and (c)the prior standard deviation. . . . . . . . . . . . . . . . . 127Figure 6.13 (a) The posterior MAP estimate of the model, (b) the pos-terior standard deviation, and (c) the relative differencebetween the posterior and the prior standard deviations,i.e.,STDpost(mk)−STDprior(mk)|STDprior(mk)| . . . . . . . . . . . . . . . . . . 129Figure 6.14 The mean and standard deviation comparisons of the pos-terior (blue) and the prior (red) distributions at x =1000 m, 2500 m, and 4000 m. . . . . . . . . . . . . . . . . 130Figure 6.15 The comparison of the prior (solid line) and the posterior(dotted line) marginal distributions at the locations of(x, z) = (2240m, 40m), (2240m, 640m), (2240m, 1240m),and (2240m, 1840m). . . . . . . . . . . . . . . . . . . . . . 131xvFigure 6.16 The 95% confidence intervals and the 10 realizations viathe RML method at x = 1000 m, 2500 m, and 4000 m. . . 132xviGlossaryADMM - Alternating Direction Method of MultipliersBPDN - Basis Pursuit DenoiseFWI - Full-Waveform InversionLASSO - Least Absolute Shrinkage and Selection OperatorLSM - Least Squares MigrationMAP - Maximum a PosteriorMcMC - Markov chain Monte CarloPNT - Projected Newton-typeRML - Randomized Maximum LikelihoodRTM - Reverse-Time MigrationRTO - Randomized-Then-OptimizeSE - Source EstimationSTD - Standard DeviationWRI - Wavefield-Reconstruction InversionxviiAcknowledgementsFirst and foremost, I would like to thank my research supervisor Dr. FelixJ. Herrmann. Thank him for providing me with the valuable opportunity tostudy and work in the world-class geophysical lab—SLIM—during the lastsix years. All the scientific knowledge and skills that I have learned in SLIMfrom Dr. Herrmann will benefit me in all aspects of my life. I am deeplygrateful for his supervision and help during the last six years.My gratitude also goes to Eldad Haber, Rachel Kuske, and MichaelFriedlander for sitting on my supervisory committee during the last six years.Their support and help are very important for my study and research.Thanks to all my colleagues at the SLIM lab. I would like to thank Hen-ryk Modzelewski, Miranda Joyce, Diana Ruiz, and Ian Hanlon for their pro-fessional assistance. I would also like to thank all students and researchersin SLIM—especially Rongrong Wang, Xiang Li, Ning Tu, Mengmeng Yang,Lina Miao, and Yiming Zhang—for their valuable scientific insights as re-search collaborators, and for the past six-years company and friendship.Thanks to Dr. James Gunning for the very insightful technical discus-sions. I also would like to thank the BG group for providing the synthetic3D compass velocity model that I used in Chapters 4 and 6. Many thanksto Chevron for providing the 2D synthetic dataset that I used in Chapter5. Thanks to the authors of the Sigsbee2A model that I used in Chapter2. Thanks to the authors of IWave, SPG`1, and Madagascar. I also wouldlike to acknowledge the collaboration of the SENAI CIMATEC Supercom-puting Center for Industrial Innovation, Bahia, Brazil, and the support ofBG Group and the International Inversion Initiative Project.Thanks to my wife for her company during my Ph.D. study. I am utterlyfortunate to have her unconditional support and love, which encourage meto overcome all the obstacles in both my research and life. I would alsolike to thank my parents for providing their support to my study in Canadaduring the last six years. Thank them for tolerating me to study and liveat such a distant place from their home in Hangzhou, China. I am gratefulxviiito my sons—they make my life filling with endless happiness.Thanks to all funding agencies that supported my research—NSERC,and all SINBAD consortium sponsors. I can not accomplish the research inthis thesis without their generous financial support.xixTo my dear wife Shan.You make me better everyday.To my dear Rixi and Yuji.You give me a new life.And endless happiness.To all Ph.D. candidates.Enjoy your life.You will make it.xxChapter 1IntroductionOil and gas industries rely heavily on images of the subsurface structure toevaluate the location, size, and profitability of a reservoir, as well as to quan-tify the exploration risk, drilling risk, and volumetric uncertainties. Thesesubsurface images are typically created from different types of geophysicaldata, ranging from seismic, electrical, electromagnetic, gravitational, andmagnetic, which result in different geophysical techniques. Among thesemethods, seismic exploration is the most important one in terms of capitalexpenditure because of its higher resolution, higher accuracy, and strongerpenetration (Sheriff and Geldart, 1995). Up to now, industries have widelyand successfully applied different seismic methods to oil and gas explorationactivities around the world (Etgen et al., 2009; Virieux and Operto, 2009).Seismic exploration technique involves collecting massive volumes of seis-mic data and processing these data to extract physical properties of subsur-face media. During the acquisition of seismic data, an active seismic source,such as a vibroseis truck used on land or an air-gun used offshore, generatesacoustic or elastic vibrations that travel from a certain surface location intothe Earth, pass through strata with different seismic responses and filteringeffects, and return to the surface. The returning waves are detected andrecorded as seismic data by receivers including geophones (measure groundmotion) or hydrophones (measure wave pressure) (Caldwell and Walker,2011). Figure 1.1 gives an illustration of standard land and marine seismicdata acquisition surveys. The recorded data at each receiver is a time seriescorresponding to the Earth’s response at the receiver location (Figure 1.2a),which is known as the seismic trace. The ensemble of all seismic tracesthat correspond to the same source experiment forms a shot record (Fig-ure 1.2b), and the ultimate data volume collects hundreds or thousands of1shot records (Figure 1.2c). These massive seismic data carry various physi-cal information about subsurface Earth media, and industries aim to extractthis information to exploit subsurface oil and gas resources.After the data acquisition, the next important procedure in seismic ex-ploration is to build up images of subsurface structures from observed seismicdata (Figure 1.3). The general workflow of imaging the subsurface structureis a two-step process. The first step is to reconstruct a background velocitymodel, which describes the long-wavelength characteristics of the subsur-face and correctly predicts the kinematics of wave propagation in the truesubsurface. The second step is to image the short-wavelength subsurfacefeatures—i.e., an image of the reflectivity or the perturbation with respectto the background velocity model—using the velocity model from the firststep (Cohen and Bleistein, 1979; Jaiswal et al., 2008). The ensemble of thelong and short-wavelength subsurface features (Figure 1.4) provides oil andgas industries with a detailed and informative subsurface image to locate oiland natural gas deposits.During the last two decades, oil and gas industries have faced the chal-lenge of exploiting reservoirs with complex geology, such as faults, salt bod-ies, folding, etc (Li, 2017). Therefore, there has been an increasing demandin oil and gas industries for high-resolution images of areas with compli-cated structures. This demand, along with recent advancements in high-performance computing, has produced a revolution in seismic exploration,as imaging techniques have upgraded from two to three dimensions, fromtime to depth, from post-stack to pre-stack, and from ray-based methodsto wave-equation-based methods. As a result, both academic and industrialefforts have focused on the research of wave-equation-based techniques in-cluding full-waveform inversion (Tarantola and Valette, 1982a; Lailly et al.,1983; Pratt, 1999; Virieux and Operto, 2009; Li et al., 2012; van Leeuwenand Herrmann, 2013a; Warner et al., 2013b), reverse-time migration (Baysalet al., 1983; McMechan, 1983; Whitmore, 1983; Etgen et al., 2009), and least-squares reverse-time migration (Aoki and Schuster, 2009; Herrmann and Li,2012; Tu and Herrmann, 2015b).In the oil and gas exploration and production (E&P) business, imagesof subsurface structures obtained by the aforementioned techniques serveras the inputs of the following procedure of structure interpretations, whoseresults become the inputs to the next step of building up reservoir models.The ultimate stage of this workflow is to analyze financial risks and to makeinvestment decisions (Osypov et al., 2013b). Due to the noise in the observeddata and modeling errors, uncertainties exist in images of subsurface struc-tures. Through the workflow, these image uncertainties will propagate to2(a) Land seismic survey(b) Marine seismic surveyFigure 1.1: (a) Schematic of land seismic survey. Image courtesy ION(www.iongeo.com) (b) Schematic of different marine seismic sur-veys. (Source: Caldwell and Walker, 2011)3-500 0 500Amplitude00.511.522.533.54Time [s](a) One seismic trace0 1000 2000 3000 400000.511.522.533.541000 2000 3000 4000.511.522.5.Receiver [m]Time [s]0 1000 2000 3000 400000.511.522.533.54(b) One shot record0 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 10 20 30 4000.511.522.533.54Receiver [m]Time [s]Source(c) One data volumeFigure 1.2: (a) One seismic trace, (b) one shot record, and (c) onedata volume.40 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 1000 2000 3000 400000.511.522.533.540 10 20 30 4000.511.522.533.54Receiver'[m]Time'[s]SourceObserved data0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/sSubsurface velocity structureFigure 1.3: The objective of seismic exploration is to reconstruct thesubsurface structures from the observed data.50 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/sTrue velocity structure0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/sLong-wavelength structure(background velocity)0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]-1-0.500.51km/sShort-wavelength structure(velocity perturbation)=+Figure 1.4: The subsurface structure can be divided into two parts:(1) the long-wavelength structure (background velocity model)and (2) the short-wavelength structure (velocity perturbation).6various E&P and financial uncertainties that impact final decision-making.Therefore, understanding and quantifying uncertainties in subsurface im-ages can assist oil and gas industries to have a better understanding of theultimate financial risks and to make more informed investment decisions(Osypov et al., 2013a).This thesis first aims to propose robust and computationally efficientwave-equation-based methods that can address practical problems, in whichthe source functions are typically unknown. Secondly, since it is desirableand important to have a quantitative evaluation of inversion results, this the-sis proposes a computationally feasible technique for assessing uncertaintiesin the velocity model. In order to accomplish these tasks, we will discuss thetopics of (1) least-squares reverse-time migration, (2) full-waveform inversionand wavefield-reconstruction inversion, and (3) uncertainty quantification.1.1 Least-squares reverse-time migrationIn the field of seismic exploration, one important and widely-used approachthat creates accurate and high-resolution subsurface images is the seismicmigration technique. The term migration refers to the process that movesreflection or diffraction seismic energy to their true subsurface positions(Yilmaz, 2001) (Figure 1.5). Conventional migration techniques, includ-ing Kirchhoff migration (French, 1975; Schneider, 1978; Biondi, 2001) andreverse-time migration (RTM) (Baysal et al., 1983; Etgen et al., 2009), onlyutilize the adjoint of the linearized forward modeling operator known as theBorn or demigration operator to map the observed data to the image do-main. Because the linearized forward operator is not an orthogonal operatorin general, these methods do not necessarily produce correct amplitude in-formation on the reflectivity or the perturbations with respect to the back-ground velocity. Moreover, these methods would also produce migrationartifacts caused by using the adjoint as an approximated inverse of the lin-earized operator. Least-squares RTM (LS-RTM) (Aoki and Schuster, 2009;Herrmann and Li, 2012; Tu and Herrmann, 2015b), in contrast, directlyinverts the linearized forward operator to reconstruct the true reflectivityor velocity perturbations. As a result, LS-RTM is capable of producingtrue-amplitude images and mitigating the migration artifacts.In general, LS-RTM inverts the linearized forward operator by minimiz-ing the `2-norm of the difference between the observed and predicted reflec-tion data iteratively. Because the predicted and observed data are functionsof velocity perturbations and source functions as shown in Figure 1.6, thesuccessful reconstruction of velocity perturbations strongly depends on the70 1 2 3 4Distance [km]00.511.522.5Depth1.61.822.22.40 1 2 3 4Receiver [km]00.511.522.533.54Time [s]Distance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-0.0500.05+Reflection data Smooth background modelImage of reflectivity or model perturbationFigure 1.5: The objective of migration is to map reflection or diffrac-tion seismic data to their true subsurface positions using asmooth background model that is kinematically correct.80 0.5 1 1.5 2Depth [km]-0.500.511.5Amplitude0 0.2 0.4 0.6 0.8 1Time [s]-0.500.51Amplitude0 0.2 0.4 0.6 0.8 1Time [s]-0.500.51AmplitudeSource func*on Reflec*vity modelObserved dataFigure 1.6: A 1D example to illustrate the impact of the source func-tion to the observed data. The background velocity is 4 km/s.There is an reflector at the depth of 1 km. The wavefield gen-erated by the source function propagates from the surface, isreflected at the depth of 1 km, returns back to the surface, andis recorded as the observed data.accuracy of source functions. For example, a simple time shift in the sourcefunction can lead to a wrongly located reflector as shown in Figure 1.7, andmore complicated disturbances in shapes and amplitudes of source func-tions can result in more unfavorable disturbances in reconstructed images.However, accurate source functions are typically unavailable in practicalproblems. Therefore, source functions should be estimated along with theinversion of velocity perturbations.We address the source estimation problem for LS-RTM by consideringthe fact that the linearized forward modeling operator is linear with respectto the source function. This fact implies that for any fixed velocity pertur-bation, there is a closed form of the optimal source function that minimizes90 0.2 0.4 0.6 0.8 1Time [s]-0.500.51AmplitudeSource func*on Observed dataInverted reflec*vity model0 0.2 0.4 0.6 0.8 1Time [s]-0.500.51AmplitudeCorrect sourceWrong source0 0.5 1 1.5 2Depth [km]-0.500.511.5AmplitudeResult w/ correct sourceResult w/ wrong sourceFigure 1.7: Reconstruct the 1D reflectivity model by using the correct(red) and wrong (blue) source functions. Clearly, the time shiftin the source function yields the location shift of the reflector.the difference between the predicted and observed data. Therefore, we cancompute the optimal source function after each update of the velocity per-turbation by solving a least-squares linear data fitting problem. However,different from the frequency domain where the source function is a singlecomplex number for each temporal frequency (Tu et al., 2016), the sourcefunction is a time series in the time domain. Because both the source func-tion and data are band limited, the solution of the least-squares problem isnot unique and unstable. To address this challenge, we regularize the sub-problem by simultaneously controlling the oscillations in the source functionand the energy of the source function. In this way, we significantly mitigatethe nonuniqueness and improve the stability of the estimated source func-tion.The first contribution of this thesis is that we propose an on-the-flysource estimation technique for the time-domain LS-RTM. During each it-10eration of LS-RTM, the proposed approach solves an additional regularizedleast-squares problem to obtain the optimal source function given the cur-rent update of the velocity perturbation. We embed the proposed sourceestimation approach in the recently developed sparsity-promoting LS-RTM(Herrmann et al., 2008; Herrmann and Li, 2012; Tu and Herrmann, 2015b) inthe time domain, which produces high-resolution images by combining bothstochastic optimization (Bertsekas and Tsitsiklis, 1995; Nemirovski et al.,2009; Shapiro et al., 2009; Haber et al., 2012) and sparsity-promoting opti-mization (Cande`s et al., 2006; Donoho, 2006; Cande`s and Wakin, 2008). Theresulting approach is able to create high-resolution images without knowingthe accurate source functions.1.2 Full-waveform inversion andwavefield-reconstruction inversionThe success of an LS-RTM requires an accurate background velocity modelthat can preserve the travel time kinematics of seismic data. Otherwise,the linearized forward modeling operator cannot generate data at the cor-rect time yielding wrongly positioned reflectors in the final images (Nemethet al., 1999; Herrmann et al., 2009; Herrmann and Li, 2012). During thepast twenty years, many researchers in both industries and academies havereported that the full-waveform inversion technique has the potential ca-pability to produce high-resolution and high-accuracy subsurface velocitymodels for migrations (Tarantola and Valette, 1982a; Lailly et al., 1983;Pratt, 1999; Virieux and Operto, 2009; Warner et al., 2013b).Full-waveform inversion is a wave-equation-based technique that aims toobtain the best subsurface velocity model consistent with the observed dataand any prior knowledge of the Earth. Through the usage of different typesof waves, including diving waves, supercritical reflections, and multiple-scattered waves, FWI possesses the potential capability of reconstructinga high-resolution and high-accuracy subsurface velocity model (Virieux andOperto, 2009). In recent years, many researchers have reported successfulapplications of FWI to industrial productions and demonstrated its capabil-ity in building up velocity models (Virieux and Operto, 2009; Sirgue et al.,2010; Warner et al., 2013b; Vigh et al., 2014).Most commonly, FWI is formulated as a nonlinear optimization problemthat inverts for the unknown velocity model by minimizing the dissimilaritiesbetween observed data and predicted data computed by solving wave equa-tions. This formulation is known as the adjoint-state method (Plessix, 2006;11Hinze et al., 2008) and its schematic workflow is shown in Figure 1.8. How-ever, it is well known that the adjoint-state method suffers from local min-ima in the objective function associated with the well-known cycle-skippingproblem (Bunks et al., 1995; Sirgue and Pratt, 2004). More specifically, ifthe starting model does not produce the predicted data within half a wave-length of the observed data, iterative optimization methods may stagnateat physically meaningless solutions (Warner et al., 2013a; van Leeuwen andHerrmann, 2013b; Huang et al., 2017). As a result, a successful inversionconducted by the adjoint-state method typically requires a good startingmodel and data containing enough low frequencies and long offsets to pre-vent cycle skipping from occurring (Bunks et al., 1995; Pratt, 1999; Sirgueand Pratt, 2004; Vigh et al., 2013).To help mitigate the problem of local minima, van Leeuwen and Her-rmann (2013b) and van Leeuwen and Herrmann (2015) proposed a penaltyformulation of FWI, which is known as wavefield-reconstruction inversion(WRI). Instead of solving the wave equation during each iteration as theadjoint-state method does, WRI considers the wave-equation as an `2-normpenalty term in the objective function, which is weighted by a penalty pa-rameter. As a result, compared to the adjoint-state method, WRI enlargesthe search space by considering wavefields as additional unknowns. Dur-ing each iteration, WRI aims to reconstruct wavefields that simultaneouslyfit both observed data and wave equations, which enables WRI to producenon-cycle-skipped data even for models further from the global optimum.This property provides WRI with the potential to start an inversion witha less accurate initial model and higher-frequency data in comparison withthe adjoint-state method (van Leeuwen and Herrmann, 2015; Peters et al.,2014). Indeed, in recent years, the strategy of avoiding cycle-skipped databy enlarging the search space has been also utilized by other extension-basedmodifications to the conventional FWI technique, including adaptive wave-form inversion (AWI) (Warner and Guasch, 2014) and FWI with source-receiver extension (Huang et al., 2017).Despite the encouraging initial results of WRI, successful applicationsof WRI also strongly depend on the accuracy of the source function. Anydisturbances presented in the source functions will propagate into the pre-dicted data as shown in Figure 1.9. Because the data nonlinearly dependson the velocity model, the additional data misfit introduced by the distur-bances in source functions may yield large discrepancies in the recoveredvelocity model (Pratt, 1999; Zhou and Greenhalgh, 2003; Sun et al., 2014).Moreover, due to the accumulation of errors, these discrepancies can furthergrow as the depth increases. Hence, accurate source functions play a cru-12 model solve forward wave-equation predicted data observed data compute data residual and misfit solve adjoint wave-equation compute gradient and update123456Figure 1.8: Schematic FWI workflow.13cial role in WRI for the reconstruction of the velocity model. Nevertheless,accurate source functions are generally unknown in practice (Pratt, 1999;Virieux and Operto, 2009; Li et al., 2013). As a result, an embedded proce-dure of source estimation becomes necessary for WRI to conduct a successfulpractical application.The second contribution of this thesis is that it equips WRI with anon-the-fly source estimation technique. As the source functions are alsounknown, we face minimizing an objective function with three unknowns,i.e., the velocity model, wavefields, and source functions. We observe thatfor any fixed velocity, the objective function is quadratic with respect toboth wavefields and source functions. Therefore, during each iteration, weapply the so-called variable projection method (Golub and Pereyra, 2003)to simultaneously project out the source functions and wavefields. After theprojection, we obtain a reduced objective function that only depends on thevelocity model, and we invert for the unknown velocity model by minimizingthis reduced objective. Finally, we illustrate that the proposed inversionscheme can recover the unknown velocity model without prior informationof the source functions, which enhances the feasibility of WRI to practicalproblems.1.3 Uncertainty quantificationThe seismic data used by FWI and WRI always contain noise that resultsfrom nearby human activities (such as traffic or heavy machinery), animals’movements, winds, and ocean waves, etc. During inversion, these uncertain-ties in the observed data propagate into the inverted velocity model, yieldinguncertainties there. Because the inverted velocity model is the input of thesubsequent processing and interpretations, tracking uncertainties in the ve-locity model can lead to significant improvements in the quantifications ofexploration, drilling, and financial risks (Osypov et al., 2013b).One systematic way to quantify these uncertainties is the Bayesian ap-proach (Kaipio and Somersalo, 2006), which has been applied by geophysicalresearchers for more than 30 years (Tarantola and Valette, 1982b; Duijn-dam, 1988; Scales and Tenorio, 2001; Sambridge et al., 2006; Osypov et al.,2008; Ely et al., 2017). The Bayesian approach formulates an inverse prob-lem in the framework of statistical inference, describing all unknowns asrandom variables and incorporating uncertainties in the observed data, theunderlying modeling map, as well as one’s prior knowledge on the unknownvariables. The randomness of the variables indicates the degree of the beliefof their values. The solution of such statistical inverse problem is expressed140 200 400 600 800 1000X [m]01002003004005006007008009001000Z [m](a) Source and receiver locations0 0.2 0.4 0.6 0.8 1Time [s]-4000-20000200040006000AmplitudeCorrect sourceWrong source(b) Correct source v.s. wrong source0 0.2 0.4 0.6 0.8 1Time [s]-200-1000100200300AmplitudeCorrect sourceWrong source(c) Data w/ correct source v.s. dataw/ wrong sourceFigure 1.9: (a) We conduct a simple example on a homogeneousmodel with a constant velocity of 2 km/s. We simulate oneshot (F) and receive the data at one receiver (∆) using a cor-rect source (red) and a wrong source (blue). (b) There is a timeshift of 0.1 s between the correct and wrong sources. (c) Thereis also a time shift of 0.1 s between the data generated by thecorrect and wrong sources.15in terms of the posterior probability distribution or posterior probabilitydensity function (PDF) that incorporates all available statistical informa-tion from both the observations in a likelihood distribution and the priorknowledge in a prior distribution. This posterior distribution allows us toextract statistics of interests about the unknown variables either by directlysampling the distribution using approaches like Markov chain Monte Carlomethods (Kaipio and Somersalo, 2006; Matheron, 2012) or by locally ap-proximating the distribution with an easy-to-study Gaussian distribution(Bui-Thanh et al., 2013; Osypov et al., 2013b; Zhu et al., 2016). Theseextracted statistics provide us with a complete description of the degree ofconfidence about the unknown variables. Specifically, in seismic exploration,these statistics allow us to assess the uncertainties of the inverted velocitymodel and identify the areas with high/low reliability in the model. Fig-ure 1.10 shows the standard workflow of the Bayesian statistical inversion.The key procedure in the Bayesian approach is to explore the poste-rior distribution. For small-scale problems with a fast simulation driver, wecan directly sample the posterior distribution by the best-practice McMCtype methods (Cordua et al., 2012; Martin et al., 2012; Ely et al., 2017).However, for wave-equation-based problems, the evaluation of the poste-rior distribution typically involves computationally expensive wave-equationsimulations, and the unknown variables live in a high-dimensional space thatarises from the discretization of the model. Moreover, the number of samplesrequired by McMC type methods increases rapidly with the dimensionality(Roberts et al., 2001). All of these facts hinder the application of McMCtype methods to large-scale wave-equation-based problems. An alternativeapproach with computationally low cost is to approximate the distributionwith an easy-to-study Gaussian distribution (Bui-Thanh et al., 2013; Osypovet al., 2013b; Zhu et al., 2016). Through exploring the Gaussian distribu-tion, this approach provides an approximated analysis of the uncertaintiesin the inverted velocity model. Without the requirement of iteratively eval-uating the expensive posterior distribution, this approaches can significantlyreduce the computational cost. Aside from the gain in the computationalcost, the accuracy of the resulting uncertainty analysis strongly relies on thefact that the posterior distribution can be adequately approximated by aGaussian distribution.The wave-equation simulation in conventional FWI is strongly nonlinearwith respect to the velocity model. As a result, the posterior distribution ofthe statistical FWI may be only valid in a small neighbor of the maximum aposteriori (MAP) estimator of the posterior distribution (Bui-Thanh et al.,2013; Zhu et al., 2016). Different from conventional FWI, we borrow the16Data Prior knowledgeLikelihood PDF Prior PDFPosterior PDFSamplesStatisticsFigure 1.10: Schematic workflow of the Bayesian statistical inversion.17insights from wavefield-reconstruction inversion and propose a new formu-lation for the posterior distribution to enlarge the region that the Gaussianapproximation can hold true. Instead of eliminating the wave-equation con-straint by solving the wave equation explicitly, we introduce the wavefields asauxiliary variables and allow for Gaussian deviations in the wave equation.The relaxation of the wave-equation constraint helps weaken the nonlinear-ity between the velocity model and simulated data. Therefore, the Gaussianapproximation of the resulting posterior distribution can be valid in a largerregion than that of conventional FWI, which improves the accuracy of thestatistics extracted from the Gaussian distribution.The third main contribution of this thesis, therefore, is that we proposea computationally feasible framework to assess uncertainty in velocity mod-els. Through relaxing the wave-equation constraint, we obtain a posteriordistribution that can be adequately approximated by a Gaussian distribu-tion. As a result, we can obtain approximated analyses of uncertainties invelocity models from sampling the Gaussian distribution.1.4 ObjectivesTo summarize, we aim to achieve the following objectives:1. To develop a robust time-domain sparsity promoting LS-RTM with anon-the-fly source estimation. We borrow insights from both stochasticoptimization and compressive sensing to reduce the large computa-tional cost of LS-RTM without compromising the image quality. More-over, we aim to develop an on-the-fly source estimation technique toenhance applications to realistic problems.2. To develop on-the-fly source estimation technique for WRI. When ap-plied to realistic problems, WRI faces the challenge that the sourcefunctions are unknown. We aim to develop an inversion scheme thatcan remove the demand of accurate source functions.3. To develop a computationally feasible approach to quantify uncertain-ties in the unknown velocity model by relaxing the wave-equation con-straints. Uncertainty quantification for wave-equation based inverseproblem face the challenges that arise from the high-dimensional spaceand computationally expensive parameter-to-data map with strongnonlinearity. We aim to weaken the nonlinear dependence of the dataon the velocity model and to propose a computationally tractable ap-proach to extracting statistical quantities of interests.181.5 Thesis outlineIn chapter 2, we propose an on-the-fly source estimation technique for time-domain LS-RTM. We first introduce the time-domain sparsity promoting LS-RTM as a Basis Pursuit Denoise (BPDN) optimization problem. Then we in-troduce an easy-to-implement algorithm—the linearized Bregman method—that solves the optimization problem. Following that, we derive the proposedon-the-fly source estimation technique and introduce how to embed it in thelinearized Bregman method. Finally, We investigate the effectiveness of theproposed approach using a realistic 2D synthetic example.In chapter 3, we first introduce the basic theory and implementation ofFWI and WRI. Then we describe the variable projection method that is usedto solve WRI. We conclude by an intuitive example comparing the objectivefunctions of FWI and WRI, which illustrates the potential advantages ofWRI over FWI.In chapter 4, we propose an on-the-fly source estimation technique forWRI. We first formulate the problem of WRI with source estimation as anoptimization problem with respect to the velocity model, wavefields, andsource functions. Then we derive the proposed source estimation technique,in which we use the variable projection method to eliminate the dependenceof the optimization problem on the wavefields and source functions. Finally,we verify the effectiveness of the proposed source estimation technique by aseries of numerical experiments.In chapter 5, we apply the proposed technique of WRI with the on-the-flysource estimation to the Chevron2014 blind test dataset. We first incorpo-rate the proposed optimization scheme with several regularizations on thevelocity model that penalizes unwanted and unphysical events. Then weuse the resulting optimization approach to the blind test dataset. We verifythe feasibility of the proposed source estimation technique in recovering thesource functions and velocity model when addressing real data.In chapter 6, we propose a computationally efficient approach to quan-tify uncertainties in the result of wave-equation based approaches. We firstexplain how to derive a posterior distribution with weak wave-equation con-straints. Then, we study how to select the variance for the wave-equationmisfit term so that it renders a posterior distribution that can be appropri-ately approximated by a Gaussian distribution. Following that, we explainhow to extract statistical quantities from this posterior distribution witha computationally acceptable cost. We finalize this chapter by presentingseveral numerical examples to investigate the performance of the proposeduncertainty quantification approach.19In chapter 7, we provide a summary of the thesis. We also discuss thelimitation of the approaches provided in the thesis and the possible worksin the future.20Chapter 2Source estimation fortime-domainsparsity-promotingleast-squares reverse-timemigration2.1 IntroductionIn seismic exploration, prestack depth migration techniques, including reverse-time migration (RTM) and least-squares reverse-time migration (LS-RTM),aim to transform the reflection data into a high-resolution image of theinterior of the Earth. To accomplish this task, traditional RTM approxi-mates the inverse of the linearized Born modeling operator by its adjoint,which is well known as the migration operator. As a consequence, RTMwithout extensive preconditioning—generally in the form of image domainscalings and source deconvolution prior to imaging—cannot produce high-resolution and true-amplitude images. By fitting the observed reflection datain the least-squares sense, least-squares migration, and in particular least-squares reverse-time migration (LS-RTM), overcomes these issues except fortwo main drawbacks withstanding their successful applications. Firstly, thecomputational cost of conducting demigrations and migrations iteratively isA version of this chapter has been published in the proceedings of SEG Annual Meet-ing, 2016, Dallas, USA21very large. Secondly, minimizing the `2-norm of the data residual can leadto model perturbations with artifacts caused by overfitting. These are dueto the fact that LS-RTM involves the inversion of a highly overdeterminedsystem of equations where it is easy to overfit noise in the data.One approach to avoid overfitting is to apply regularizations to the orig-inal formulation of LS-RTM and to search for the sparsest possible solutionby minimizing the `1-norm on some sparsifying representation of the image.Motivated by the theory of Compressive Sensing (Donoho, 2006), wheresparse signals are recovered from severe undersamplings, and consideringthe huge cost of LS-RTM, we reduce the size of the overdetermined imag-ing problem by working with small subsets of sources at each iteration ofLS-RTM, bringing down the computational costs.Herrmann et al. (2008) found that, as a directional frame expansion,curvelets (Ying et al., 2005; Candes et al., 2006) lead to the sparsity of seis-mic images in imaging problems. This property led to the initial formulationof curvelet-based “Compressive Imaging” (Herrmann and Li, 2012), whichformed the bases of later work by Tu et al. (2013) who included surface-related multiples and on-the-fly source estimation via variable projection.While this approach represents major progress, the proposed method, whichinvolves drawing new independent subsets of shots after solving each `1-normconstrained subproblem, fails to continue to bring down the residual. Thisresults in remaining subsampling related artifacts. In addition, the proposedmethod relies on a difficult-to-implement `1-norm solver. To overcome thesechallenges, Herrmann et al. (2015) motivated by Lorenz et al. (2014) re-laxed the `1-norm objective of Basis Pursuit Denoise (Chen et al., 2001) byan objective given by the sum of the λ-weighted `1- and `2-norms. Thisseemingly innocent change resulted in a greatly simplified implementationthat no longer relies on relaxing the `1-norm constraint and more impor-tantly continues to make progress towards the solution irrespective of thenumber of randomly selected sources participating in each iteration.Inspired by this work, we propose to extend our earlier work in thefrequency domain to the time domain, which is more appropriate for large-scale 3D problems and to include on-the-fly source estimation via variableprojection. Both generalizations are completely new and challenging becausethe estimation of the time-signature of the source function no longer involvesestimating a single complex number for each temporal frequency (Tu et al.,2013).This chapter is organized as follows. First, we formulate the sparsity-promoting LS-RTM as a Basis Pursuit Denoising problem formulated in thecurvelet domain. Next, we relax the `1-norm and describe the relative simple22iterative algorithm that solves this optimization with the mixed `1- `2-normobjective. We conclude this theory section by including estimation of thesource-time signature that calls for additional regularization to prevent over-fitting of the source function. We evaluate the performance of the proposedmethod on the synthetic Sigsbee2A model (Paffenholz et al., 2002) wherewe compare our inversion results to LS-RTM with the true source functionand standard RTM.2.2 Theory2.2.1 Imaging with linearized BregmanThe original sparsity-promoting LS-RTM has the same form, albeit it isoverdetermined rather than underdetermined, as a Basis Pursuit Denoise(BPDN, Chen et al. (2001)) problem, which is expressed as belowminimizex‖x‖1subject to∑i‖∇Fi(m0,qi)C>x− δdi‖2 ≤ σ, (2.1)where the vector x contains the sparse curvelet coefficients of the unknownmodel perturbation. The symbols ‖ · ‖1 and ‖ · ‖2 denote `1- and `2-norms,respectively. The matrix C> represents the transpose of the curvelet trans-form C. The vectors m0 stands for the background squared slowness modeland qi = qi(t) is the time-dependent source function for the ith shot. Thevector δdi represents the observed reflection data along the receivers for theith shot. The∑i runs over all shots. The parameter σ denotes the noiselevel of the data. The matrix ∇Fi represents the linearized Born modelingoperator, which is given by:∇Fi = −PA(m0)−1 (∇A(m0)ui) , (2.2)where the matrix A(m0) represents the discretized partial differential oper-ator of the time-domain acoustic wave-equation as follows:A(m0) = ∆−m0 ∂2∂t2. (2.3)The vector ui is the background forward wavefield given by ui = A(m0)−1qi.The matrix P projects the wavefields onto the receiver locations. The oper-23ator ∇A(m0)ui denotes the Jacobian matrix.Despite the success of BPDN in Compressive Sensing, where matrix-vector multiplies are generally cheap, solving equation 2.1 in the seismicsetting, where the evaluations of the ∇Fi’s involve wave-equation solves, isa major challenge because it is computationally unfeasible. To overcomethis computational challenge, Herrmann and Li (2012) proposed, motivatedby ideas from stochastic gradients (Nemirovski et al., 2009; Shapiro et al.,2009), to select subsets of shots by replacing the sum over i = 1 · · ·ns,with ns the number of shots, to a sum over randomly selected index setsI ⊂ [1 · · ·ns] of size n′s ns. By redrawing these subsets at certain pointsof the `1-norm minimization, Li et al. (2012) and Tu et al. (2013) were ableto greatly reduce the computational costs but at the expense of giving upconvergence when the residual becomes small.For underdetermined problems, Cai et al. (2009) presented a theoreticalanalysis that proves convergence of a slightly modified problem for iterationsthat involve arbitrary subsets of rows (= subsets I). Herrmann et al. (2015)adapted this idea to the seismic problem and early results suggested that thework of Cai et al. (2009) can be extended to the overdetermined (seismic)case. With this assumption, we propose to replace the optimization problemin equation 2.1 byminimizexλ1‖x‖1 + 12‖x‖22subject to∑i‖∇Fi(m0,qi)C>x− δdi‖2 ≤ σ.(2.4)The mixed objectives of the above type are known as elastic nets in ma-chine learning and for λ1 →∞—which in practice means λ1 large enough—solutions of these modified problems converge to the solution of equation 2.1.While adding the `2-term may seem an innocent change, it completelychanges the iterations of sparsity-promoting algorithms including the role ofthresholding (See Herrmann et al. (2015) for more discussion).Pseudo code illustrating the iterations to solve equation 2.4 are summa-rized in Algorithm 1. In this algorithm, tk = ‖Akxk − bk‖22/‖A>k (Akxk −bk)‖22 is the dynamic step size, Sλ1(z) is the soft thresholding operator(Lorenz et al., 2014), and Pσ projects the residual onto an `2-norm ballgiven by the size of the noise σ. To avoid too many iterations, the thresholdλ1, which has nothing to do with the noise level but with the relative im-portance of the `1 and `2-norm objectives, should be small enough. Usually,we set it proportional to the level of the maximum of zk to let entries of zk24enter into the solution.Algorithm 1 Linearized Bregman for LS-RTM1. Initialize x0 = 0, z0 = 0, q, λ1, batchsize n′s ns2. for k = 0, 1, · · ·3. Randomly choose shot subsets I ⊂ [1 · · ·ns], |I| = n′s4. Ak = {∇Fi(m0,qi)C>}i∈I5. bk = {δdi}i∈I6. zk+1 = zk − tkA>k Pσ(Akxk − bk)7. xk+1 = Sλ1(zk+1)8. endnote: Sλ1(zk+1) = sign(zk+1) max{0, |zk+1| − λ1}Pσ(Akxk − bk) = max{0, 1− σ‖Akxk−bk‖} · (Akxk − bk)2.2.2 On-the-fly source estimationAlgorithm 1 requires knowledge of the source-time signatures qi’s to be suc-cessfully employed. Unfortunately, this information is typically not avail-able. Following our earlier work on source estimation in time-harmonicimaging and full-waveform inversion (Tu and Herrmann, 2015a; van Leeuwenet al., 2011), we propose an approach where after each model update, weestimate the source-time function by solving a least-squares problem thatmatches predicted and observed data.For the source estimation, we assume that we have an initial guess q0 =q0(t) for the source, which convolved with a filter w = w(t) gives the truesource function q. By replacing q in equation 2.4 by the convolution of wand q0, we obtain the following joint optimization problem with respect toboth x and w:minimizex,wλ1‖x‖1 + 12‖x‖22subject to∑i‖∇Fi(m0,w ∗ q0)C>x− δdi‖2 ≤ σ,(2.5)where the symbol ∗ stands for the convolution along time. The filter wis unknown and needs to be estimated in every Bregman step by solvingan additional least-squares subproblem. Generally, estimating w requiresnumerous recomputations of the linearized data. This can be avoided byassuming that the linearized data for the current estimate of w is given bymodeling the data with the initial source q0 and convolving with the filter25w afterward:∇Fi(m0,w ∗ q0) ≈ w ∗ ∇Fi(m0,q0), (2.6)where the right hand side stands for the trace by trace convolution of wwith the shot gather traces.We expect the estimated source signature to decay smoothly to zerowith few oscillations within a short duration of time. Therefore in oursubproblem, we add a penalty term ‖diag(r)(w∗q0)‖2, where r is a weightingfunction that penalizes non-zeros entries at later times. In our example, wechoose r(t) to be a logistic loss function (Rosasco et al., 2004)r(t) = log(1 + eα(t−t0)). (2.7)Here, the scalar α defines the speed of the function r(t) decreases to 0 ast → 0, and we penalize all the events after t = t0. Furthermore, we adda second penalty term λ2‖w ∗ q0‖2 to control the overall energy of theestimated source function. Our subproblem for source estimation is thengiven byminimizew∑i‖w ∗ ∇Fi(m0,q0)− δdi‖22 + ‖diag(r)w ∗ q0‖22 + λ2‖w ∗ q0‖22.(2.8)The algorithm to solve the sparsity-promoting LS-RTM with source estima-tion using linearized Bregman is summarized in Algorithm 2.Algorithm 2 LB for LS-RTM with source estimation1. Initialize x0 = 0, z0 = 0, q0, λ1, λ2, batchsize n′s ns, weights r2. for k = 0, 1, · · ·3. Randomly choose shot subsets I ⊂ [1 · · ·ns], |I| = n′s4. Ak = {∇Fi(m0,q0)C>}i∈I5. bk = {δdi}i∈I6. d˜k = Akxk7. wk = arg minw ‖w ∗ d˜k − bk‖22 + ‖diag(r)(w ∗ q0)‖22 + λ2‖w ∗ q0‖228. zk+1 = zk − tkA>k(wk?Pσ(wk ∗ d˜k − bk))9. xk+1 = Sλ1(zk+1)10. endIn Algorithm 2, the symbol ? stands for the correlation, which is theadjoint operation of the convolution. Since the initial guess of x is zero, weomit the filter estimation in the first iteration and update x1 with the initialguess of the source. In the second iteration, after the filter is estimated for26the first time (line 7), z1 is reset to zero, and z2 and x2 are updated accordingto lines 8 − 9. This prevents that the imprint of the initial (wrong) sourcefunction pollutes the image updates of later iterations. The subproblem inline 7 can be solved by formulating the optimality condition and solving forwk directly.2.3 Numerical experimentsTo test our algorithm, we conduct two types of experiments. First, wetest the linearized Bregman method for sparsity-promoting LS-RTM for thecase with a known source function q and a wrong source function q0. Wethen compare these results with an example in which the source functionis unknown and estimated by the proposed approach. For the experiments,we use the top left part of the Sigsbee2A model, which has a size of 2918mby 4496m and is discretized with a grid size of 7.62 m by 7.62 m. The modelperturbation, which is the difference between the true model and a smoothedbackground model (Figure 2.1a), is shown in Figure 2.1b. We generatelinearized and single scattered data using the Born modeling operator. Theexperiment consists of 295 shot positions with 4 seconds recording time.Each shot is recorded by 295 evenly spaced receivers at a depth of 7.62 mand with a receiver spacing of 15.24 m, yielding a maximum offset of 4 km.We use a source function with a limited frequency spectrum (Figure 2.2)to simulate the data. The source function has a central frequency of 15 Hzwith a spectrum ranging from 5 Hz to 35 Hz.2.3.1 RTM with true and wrong source functionsFor later comparisons with the LS-RTM images, we first generate a tradi-tional RTM image (Figure 2.3a) using the true source function. Clearly,the traditional RTM approach creates an image with wrong amplitudes andblurred shapes at interfaces and diffraction points. Furthermore, we alsogenerate an RTM image (Figure 2.3b) using a wrong source function asshown in Figure 2.2. Compared to the true source function, the wrong onecontains a spectrum with a flat shape ranging from 12 to 28 Hz and an ex-ponential tapering between 0-12Hz and 28-40 Hz. Furthermore, the phaseand shape of the wrong source function also differ from the correct ones.Figure 2.3b illustrates that the usage of the wrong source function enhancesthe artifacts and destroys the energy’s focusing at reflectors and diffractionpoints.27Distance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.50.20.250.30.350.4s2/km 2(a) Smooth background modelDistance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-0.0500.05s2 /km 2(b) Model perturbationFigure 2.1: (a) The smooth background model and (b) the true modelperturbation modified from Sigsbee2A.280 0.2 0.4 0.6 0.8time (s)-0.100.10.20.3Ampsignal in time domainWrong sourceTrue source0 10 20 30 40 50 60frequency (HZ)1234Energyfrequency spectrumWrong sourceTrue sourceFigure 2.2: Comparison of the true (red) and wrong source (blue)functions.2.3.2 LS-RTM with linearized BregmanIn this example, we perform LS-RTM via the linearized Bregman methodusing both the correct source function q and the wrong source function q0shown in Figure 2.2. During the inversion, we perform 40 iterations and draw8 randomly selected shots in each iteration. Therefore, after 40 iterations,approximately all shot gathers have been used one time (˜one pass throughthe data). To accelerate the convergence, we apply a depth preconditionerto the model updates, which compensates for the spherical amplitude decay(Herrmann et al., 2008).The only parameter that needs to be provided by the user for the algo-rithm is the weighting parameter λ1 that controls the soft thresholding ofthe vector z. A large value of λ1 will threshold too many entries of z leadingto a slower convergence and more iterations, whereas a small λ1 allows noiseand subsampling artifacts to pass the thresholding. In our experiments, wedetermine λ1 in the first iteration by setting λ1 = 0.1∗max(|z|), which allowsroughly 90 percent of the entries in z to pass the soft thresholding in the29Distance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-6000-4000-200002000400060008000(a) RTM image with the correct source functionDistance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5 -1000-50005001000(b) RTM image with the wrong source functionFigure 2.3: RTM images with the correct (a) and wrong (b) sourcefunctions.30first iteration.The inversion result of the first experiment with the correct source func-tion is shown in Figure 2.4a. It clearly shows that the faults and reflectorsare all accurately imaged. The interfaces are sharp because of the influenceof directly inverting the linearized Born modeling operator. On the otherhand, when the source function is not correct, the LS-RTM image exhibitsstrong artifacts, blurred layers, and wrongly located diffractors, as shownin Figure 2.4b. The data residual in the inversion with the correct sourcefunction decays as a function of the iteration number (Figure 2.5a), in spiteof some jumps occur due to redrawing the shot records in each iteration.On the other hand, when using the wrong source function, the data resid-ual does not decay with the increase of iterations. We can obtain the sameobservation when comparing the model errors of the two inversion strate-gies. This example indicates the importance of the correct source functionin LS-RTM.2.3.3 Linearized Bregman with source estimationTo solve the problem with the unknown source function, we utilize the lin-earized Bregman method with source estimation as described in Algorithm 2.To start the inversion, we select the wrong source function in Figure 2.2 asthe initial guess for the source function q0. To successfully conduct thesource estimation, the user also needs to supply the parameter α in thedamping function (c.f. equation 2.7). From our experience, once α is bigenough, the recovered source is not sensitive to the change of α. In ourexample, we select α = 8 and λ2 = 1.With the same strategy for choosing λ1 as in the previous example, theinversion result shown in Figure 2.6 is visually as good as the result using thecorrect source function (Figure 2.4a). Meanwhile, as shown in Figure 2.7,the recovered source function (green dotted line) closely matches the correctsource function (red dash-dot line), while some small differences remain inthe frequency spectrum. The data residual as a function of the iteration(Figure 2.8a) is similar to the one with the correct source, aside from aslightly larger residual observed at the beginning of the inversion due tothe wrong initial source function. We have the same observation for thecomparison of the model error history in Figure 2.8b. After 40 iterations,both the final data residual and model error are in the same range as thosefrom the experiment with the correct source function.31Distance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-0.0500.05(a) LS-RTM image with the true source functionDistance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-0.0500.05(b) LS-RTM image with the wrong source estimationFigure 2.4: LS-RTM images with the true (a) and wrong (b) sourcefunctions.3210 20 30iterations50010001500200025003000residualw/ true sourcew/ wrong source(a) Data residuals along iterations5 10 15 20 25 30 35 40iterations0.70.750.80.850.90.9511.05reletive model errorw/ true sourcew/ wrong source(b) Model errors along iterationsFigure 2.5: Data residuals and model errors along iterations33Distance (km)0 0.5 1 1.5 2 2.5 3 3.5 4Depth (km)00.511.522.5-0.0500.05Figure 2.6: LS-RTM image with the on-the-fly source estimation.0 0.2 0.4 0.6 0.8time (s)-0.100.10.20.3Ampsignal in time domainInitial sourceTrue sourceEstimated source0 10 20 30 40 50 60frequency (HZ)1234Energyfrequency spectrumInitial sourceTrue sourceEstimated sourceFigure 2.7: Comparison of the true (red), initial (blue), and recovered(green) source functions.345 10 15 20 25 30 35 40iterations100200300400500600residualw/ true sourcew/ source estimation(a) Data residuals along iterations5 10 15 20 25 30 35 40iterations0.750.80.850.90.9511.051.1reletive model errorw/ true sourcew/ source estimation(b) Model errors along iterationsFigure 2.8: Data residuals and model errors along iterations352.4 ConclusionsIn this chapter, we performed sparsity-promoting LS-RTM in the time do-main using the linearized Bregman method. This algorithm is easy to imple-ment and allows us to work with random subsets of data, which significantlyreduces the cost of LS-RTM. Furthermore, this algorithm also allows us toincorporate source estimation into the inversion by solving a small addi-tional least-squares subproblem during each Bregman iteration. Ultimately,we proposed a modified version of the LB algorithm including source es-timation for sparsity-promoting LS-RTM in the time domain. Numericalexperiments illustrated that the proposed algorithm can produce sharp sub-surface images with true amplitudes without the requirement of having thecorrect source functions, while it only needs a computational cost that is inthe range of one conventional RTM approach.36Chapter 3Full-waveform inversion andwavefield-reconstructioninversion3.1 IntroductionThe successful application of LS-RTM introduced in the previous chapterheavily rely on the background velocity model that can preserve the kine-matics of the wave propagation. During the past two-decades, full-waveforminversion has been reported by many researchers as a potentially promisingapproach to build up high-accuracy background velocity models for migra-tion techniques including LS-RTM (Tarantola and Valette, 1982a; Laillyet al., 1983; Pratt, 1999; Virieux and Operto, 2009; Sirgue et al., 2010;Warner et al., 2013b; Vigh et al., 2014).The objective of full-waveform inversion (FWI) is to compute the bestpossible Earth model that is consistent with observed data (Tarantola andValette, 1982a; Lailly et al., 1983; Pratt, 1999; Virieux and Operto, 2009).Mathematically, it can be formulated as an optimization problem that is con-strained by wave equations. The conventional FWI approach, also knownas the adjoint-state method, eliminates the wave-equation constraints bydirectly solving the wave equations. However, due to the lack of long-offset data with low-frequency components, the adjoint-state method suf-fers from the local minima related to the so-called cycle skipping problem(Bunks et al., 1995; Sirgue and Pratt, 2004). To help mitigate the prob-lem of local minima, van Leeuwen and Herrmann (2013b) and van Leeuwen37and Herrmann (2015) proposed a penalty formulation, known as wavefield-reconstruction inversion (WRI), where they relaxed the wave-equation con-straints by introducing the wavefields as auxiliary variables. Instead of ex-actly solving the wave equations as the adjoint-state method does, WRIreplaces the wave-equation constraint by an `2-norm penalty term in theobjective function. van Leeuwen and Herrmann (2013b) and van Leeuwenand Herrmann (2015) illustrated that through expanding the search space,WRI is less “nonlinear” and may contain less local minima compared to theconventional adjoint-state method.In this chapter, we first introduce the theoretical formulations of bothFWI and WRI in the frequency domain. Then we describe the variable pro-jection method, which is the underlying tool that solves the WRI problem.Finally, we present a simple comparison between the objective functions ofFWI and WRI to illustrate the potential benefit of WRI.3.2 Full-waveform inversionDuring FWI, we solve the following least-squares problem for the discretizedngrid-dimensional unknown medium parameters, squared slowness m ∈ Rngrid ,which appear as coefficients in the acoustic time-harmonic wave equation,i.e.,minimizeu,m12nsrc∑i=1nfreq∑j=1‖Piui,j − di,j‖22subject to Aj(m)ui,j = qi,j .(3.1)Here, the i, j’s represent the source and frequency indices, and the vec-tor d ∈ Cnfreq×nsrc×nrcv represents monochromatic Fourier transformed datacollected at nrcv receiver locations from nsrc seismic sources and sampled atnfreq frequencies. The matrix Aj(m) = ω2jdiag(m) + ∆ is the discretizedHelmholtz matrix at angular frequency ωj . The symbol ∆ represents thediscretized Laplacian operator and the matrix Pi corresponds to a restric-tion operator for the receiver locations. Finally, the vectors qi,j and ui,jdenote the monochromatic source term at the ith source location and jthfrequency and the corresponding wavefield, respectively. For simplicity, weomit the dependence of Aj(m) on the discretized squared slowness vectorm from now onwards.Since the Aj ’s are square matrices, we can eliminate the variable u bysolving the discretized partial-differential equation (PDE) explicitly, leading38to the so-called adjoint-state method, which has the following reduced form(Virieux and Operto, 2009):minimizemfred(m) =12nsrc∑i=1nfreq∑j=1‖PiA−1j qi,j − di,j‖22, (3.2)with the corresponding gradient gred(m) given bygred(m) =nsrc∑i=1nfreq∑j=1ω2i diag(conj(ui,j))vi,j , (3.3)where the forward wavefield ui,j and adjoint wavefield vi,j have the followingexpressions:ui,j = A−1j qi,j ,vi,j = −A−>j P>i (Piui,j − di,j).(3.4)Here, the symbol > denotes the (complex-conjugate) transpose, and theterm diag(conj(ui,j))represents a diagonal matrix with the elements of thecomplex conjugate of the vector ui,j on the diagonal. In the equation 3.2,the dependence of the objective function fred(m) on m runs through thenonlinear operator A−1j qi,j , instead of via the linear operator Ajui,j . Asa result, the price we pay is that the objective function fred(m) becomeshighly nonlinear in m. The gains, of course, are that we no longer haveto optimize over the wavefields and that we can compute the forward andadjoint wavefields independently in parallel. However, these gains are per-haps a bit short-lived, because the reduced objective function fred(m) maycontain local minima (Warner et al., 2013a), which introduces more difficul-ties in the search for the best medium parameters using the local derivativeinformation only.3.3 Wavefield-reconstruction inversionTo free FWI from these parasitic minima, van Leeuwen and Herrmann(2013b) and van Leeuwen and Herrmann (2015) proposed WRI—a penaltyformulation of FWI that readsminimizem,ufpen(m,u) =12nsrc∑i=1nfreq∑j=1(‖Piui,j − di,j‖22 + λ2‖Ajui,j − qi,j‖22) .(3.5)39In this optimization problem, we keep the wavefields u as unknown variablesinstead of eliminating them as in FWI, i.e., we replace the PDE constraintby an `2-norm penalty term. The scalar parameter λ controls the balancebetween the data and the PDE misfits. As λ increases, the wavefield ismore tightly constrained by the wave equation, and the objective functionfpen(m,u) in equation 3.5 converges to the objective function of the reducedproblem in equation 3.2 (van Leeuwen and Herrmann, 2015). Different fromthe nonlinear objective function of the reduced problem in equation 3.2, theWRI objective fpen(m,u) is a bi-quadratic function with respect to m andu, i.e., for fixed m, the objective fpen(m,u) is quadratic with respect to uand vice versa. In addition, van Leeuwen and Herrmann (2013b) and vanLeeuwen and Herrmann (2015) indicated that WRI explores a larger searchspace by not satisfying the PDE-constraints exactly, which results in thatWRI is less “nonlinear” and may contain less local minima compared to theconventional adjoint-state method.3.4 Variable projection methodIn the optimization problem of equation 3.5, both the wavefields u andmedium parameters m are unknown. Searching for both u and m by meth-ods like gradient-descent and limited-memory Broyden–Fletcher–Goldfarb–Shanno (l-BFGS) method (Nocedal and Wright, 2006) requires storing thetwo unknown variables. However, the memory cost of storing u can be ex-tremely large, because u ∈ Cngrid×nfreq×nsrc . To mitigate the storage cost,van Leeuwen and Herrmann (2013b) and van Leeuwen and Herrmann (2015)applied the variable projection method (Golub and Pereyra, 2003) to solveequation 3.5 using the fact that fpen(m,u) is bi-quadratic with respect tom and u. The variable projection method is designed to solve a general cat-egory of problems named separable nonlinear least-squares (SNLLS), whichpermits the following standard expression:minimizeθ,xf(θ,x) =12‖Φ(θ)x− y‖2, (3.6)where the matrix Φ(θ) varies (nonlinearly) with respect to the variable vectorθ. The vectors x and y represent a linear variable and the observations,respectively. More specifically, we can observe that the WRI problem inequation 3.5 is a special case of SNLLS problems by settingθ = m, xi,j = ui,j , yi,j =[λqi,jdi,j], and Φi,j(θ) =[λAjPi]. (3.7)40The vectors x and y are block vectors containing all {xi,j}1≤i≤nsrc,1≤j≤nfreqand {yi,j}1≤i≤nsrc,1≤j≤nfreq . Likewise, Φ(θ) is a block-diagonal matrix.When introducing the variable projection method, it is often useful torecognize that for fixed θ, the objective function f(θ,x) becomes quadraticwith respect to x. As a consequence, the variable x has a closed-form optimalleast-squares solution x(θ) that minimizes the objective f(θ,x):x(θ) =(Φ(θ)>Φ(θ))−1Φ(θ)>y, (3.8)For each pair of (i, j), we now havexi,j(θ) =(Φi,j(θ)>Φi,j(θ))−1Φi,j(θ)>yi,j . (3.9)By substituting equation 3.8 into equation 3.6, we project out the variable xby the optimal solution x(θ) and arrive at a reduced optimization problemover θ alone, i.e., we haveminimizeθf(θ) = f(θ,x(θ))= ‖(I−Φ(θ)(Φ(θ)>Φ(θ))−1Φ(θ)>)y‖2. (3.10)The gradient of the reduced objective function f(θ) can be computed viathe chain rule, yieldingg(θ) = ∇θf(θ) = ∇θf(θ,x(θ))= ∇θf(θ,x)|x=x(θ) +∇xf(θ,x)|x=x(θ)∇θx.(3.11)Note that from the definition of x(θ), we have∇xf(θ,x)|x=x(θ) = 0. (3.12)With this equality, the gradient in equation 3.11 can be written as:g(θ) = ∇θf(θ,x)|x=x(θ). (3.13)Equation 3.13 implies that although the optimal solution x(θ) is a functionof θ, the construction of the gradient g(θ) does not need to include thecross-derivative term ∇xf(θ,x)|x=x(θ)∇θx.As shown by van Leeuwen and Herrmann (2013b) and van Leeuwenand Herrmann (2015), we can through substituting equation 3.7 into equa-tions 3.8, 3.10, and 3.13 successfully project out the wavefields u by com-41puting the optimal wavefield ui,j(m) for each source and frequency:ui,j(m) =(λ2A>j Aj + P>i Pi)−1 (λ2A>j qi,j + P>i di,j), (3.14)and arrive at minimizing the following reduced objective:minimizemfpen(m)=fpen(m,u(m))=12nsrc∑i=1nfreq∑j=1(‖Piui,j(m)− di,j‖22 + λ2‖Ajui,j(m)− qi,j‖22) ,(3.15)with a gradient given bygpen(m) =nsrc∑i=1nfreq∑j=1λ2ω2i diag(conj(ui,j(m)))(Ajui,j(m)− qi,j). (3.16)With the projection of the wavefields u in equation 3.14, the new reducedobjective fpen(m) varies only with respect to the variable m. As a result,we do not need to store the u’s reducing the storage costs from O(ngrid ×(nfreq × nsrc + 1)) to O(ngrid).3.5 FWI v.s. WRITo illustrate the potential advantage of WRI over FWI, we conduct a simplenumerical experiment, adopted from van Leeuwen and Herrmann (2013b),to compare the objective functions of FWI and WRI. We work with a 2Dvelocity model that is parameterized by a single varying parameter v0 asfollows:v(z) = v0 + 0.75zm/s, (3.17)where the parameter z increases with the vertical depth. We simulate datawith a single source for v0 = 2500 m/s using a grid spacing of 50 m and a sin-gle frequency of 5 Hz. The data do not contain free-surface related multiples.We place the source at (z, x) coordinates (50 m, 50 m) and record the data at200 receivers located at the depth of 50 m with a sampling interval of 50 m.In Figure 3.1, we compare the objective functions corresponding to FWIand WRI as a function of v0 for various values of λ, i.e., λ = 102, 103, 104,and 105. Clearly, the global minima of WRI for different values of λ coin-cide with the one of FWI. Meanwhile, the objective function of WRI for a422300 2400 2500 2600 2700v0 [m/s]00.10.20.30.40.50.60.70.80.91misfitλ = 10 2λ = 10 3λ = 10 4λ = 10 5reducedFigure 3.1: Comparison of objective functions of FWI and WRI as afunction of v0. For WRI, we select λ = 102, 103, 104, and 105.Clearly, as λ increases the objective function of WRI convergesto that of the reduced formulation. When λ is small, there areno local minima in the objective function of WRI.small λ exhibits no local minima, while local minima exist in the objectivefunctions corresponding to FWI and WRI for a large λ. Furthermore, thebehavior of the objective function for WRI converges to the one of FWI asλ increases. This example implies that through expanding the search spaceand carefully selecting the penalty parameter, WRI is potentially less proneto the local minima compared to the conventional FWI.3.6 ConclusionsThis chapter has discussed the theoretical formulations of full-waveform in-version and wavefield-reconstruction inversion in the frequency domain. Wederived the optimization problem associated with FWI and the correspond-ing gradient. We also derived the optimization problem of WRI and in-troduced how to use the variable projection method to solve WRI. Afterintroducing these two approaches, we illustrated the potential advantage of43WRI over FWI on mitigating local minima by a simple 2D example. We ob-served that the objective function of WRI converges to that of the reducedformulation as λ increases, while the objective functions of WRI exhibit nolocal minima for small selections of λ. In the next three chapters, we willextensively use the application of wavefield-reconstruction inversion basedon the mathematical formulation described in this chapter.44Chapter 4Source estimation forwavefield-reconstructioninversion4.1 IntroductionSource information is essential to all wave-equation-based seismic inversions,including full-waveform inversion (FWI) and wavefield-reconstruction inver-sion (WRI). When the source function is inaccurate, errors will propagateinto the predicted data and introduce additional data misfit. As a conse-quence, inversion results that minimize this data misfit may become erro-neous. To mitigate the errors introduced by the incorrect and pre-estimatedsources, an embedded procedure that updates sources along with mediumparameters is necessary for the inversion.Many source estimation approaches have been developed for the conven-tional FWI, also known as the adjoint-state method (Plessix, 2006; Virieuxand Operto, 2009). Liu et al. (2008) proposed to treat source functions asunknown variables and update them with medium parameters simultane-ously. However, the amplitudes of source functions and medium parametersare different in scale, and so are the amplitudes of their corresponding gradi-ents. As a result, the gradient-based optimization algorithms would tend toprimarily update the one at the larger scale. Pratt (1999) proposed anotherapproach to estimate source functions during each iteration of the adjoint-state method. In this approach, the author estimates the source functionsA version of this chapter has been submitted.45after each update of the medium parameters by solving a least-squares prob-lem for each frequency. The solution of the least-squares problem is a singlecomplex scalar that minimizes the difference between the observed and pre-dicted data computed with the updated medium parameters. Because theupdates of the source functions and medium parameters are in two separatesteps, this method is not impacted by the different amplitude scales of thegradients. Indeed, Aravkin and van Leeuwen (2012) and van Leeuwen et al.(2014) pointed out that the problem of FWI with source estimation fallsinto a general category called separable nonlinear least-squares (SNLLS)problems (Golub and Pereyra, 2003), which can be solved by the so-calledvariable projection method. The method proposed by Pratt (1999) fallsinto that category. By virtue of these variable projections, the gradientwith respect to the source functions becomes zero, because the estimatedsource is the optimal solution that minimizes the misfit between the observedand predicted data given the current medium parameters (Aravkin and vanLeeuwen, 2012). For this reason, the gradient with respect to the mediumparameters no longer contains the non-zero cross derivative with respect tothe source functions, and the inverse problem becomes source-independent.In fact, because the problem of FWI with source estimation is a special caseof the SNLLS problem, the minimization of this source-independent objec-tive converges in fewer iterations than that of the original source-dependentobjective theoretically (Golub and Pereyra, 2003). Furthermore, Li et al.(2013) presented empirical examples to illustrate that the variable projec-tion method is more robust to noise compared to the method that performsalternating gradient descent steps on two variables separately (Zhou andGreenhalgh, 2003), as well as to the one that minimizes the two variablessimultaneously (Liu et al., 2008).Similar to the conventional FWI, WRI does, as illustrated in Figures 4.1and 4.2, also require accurate information on the source functions. Forinstance, a small difference in the origin time of the source can lead toa significant deterioration of the inversion result (cf. Figures 4.1 and 4.2).However, up to now, there is still missing an on-the-fly source estimationprocedure in the context of WRI .Recognizing this sensitivity to source functions, in this chapter, we pro-pose a modified WRI framework with source estimation integrated, makingthe method applicable to field data. In this new formulation, we continue tosolve an optimization problem but now one that has the source functions,medium parameters, and wavefields all as unknowns. For fixed medium pa-rameters, we observe that the objective function of this optimization prob-lem is quadratic with respect to both wavefields and source functions. As460 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]11.522.53km/s(a) True model0 0.5 1 1.5 2Time [s]-0.500.51AmplitudeCorrect SourceWrong Source(b) Correct source v.s. wrong sourceFigure 4.1: (a) The true model. (b) Comparison of the correct (blue)and wrong (red) source functions.a result, if we collect the wavefield and source function for each source ex-periment into one single unknown vector variable, the optimization problembecomes an SNLLS problem with respect to the medium parameters andthis new unknown variable. As before, we propose the variable projec-tion method to solve the optimization problem by projecting out this newunknown variable during each iteration. After these least-squares prob-lems, we compute single model updates either with the gradient descentor limited-memory Broyden–Fletcher–Goldfarb–Shanno (l-BFGS) method(Nocedal and Wright, 2006).This chapter is organized as follows. First, we present the optimizationproblem of the source estimation for WRI. This optimization problem canbe formulated as an SNLLS problem by collecting the wavefield and source470 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]11.522.53km/s(a) Result with the true source0 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]11.522.53km/s(b) Result with the wrong sourceFigure 4.2: (a) Inversion result with the true source function. (b)Inversion result with the wrong source function.function for each source experiment into one single unknown vector variable.Then, we apply the variable projection method to solve this optimizationproblem through simultaneously projecting out the wavefield and sourcefunction during each iteration. We conclude by presenting a computationallyefficient implementation to compute the projection, which we evaluate onseveral synthetic examples to highlight the benefits of our approach.4.2 WRI with source estimationWhile spatial locations of the sources are often known quite accurately, thesource function is generally unknown. If we ignore spatial directivity of thesources, we can handle this situation in the context of WRI by incorporating48the source function through setting qi,j = αi,jei,j in the frequency-domain,where αi,j is a complex number representing the frequency-dependent sourceweight for the ith source experiment at the jth frequency. In this expression,the symbol ei,j represents a unit vector that equals one at the source locationand zero elsewhere. When we substitute qi,j = αi,jei,j into the objectivefunction in equation 3.5, we arrive at an optimization problem with threeunknown variables, i.e., we haveminimizem,u, αfpen(m, u, α)=12nsrc∑i=1nfreq∑j=1(‖Piui,j − di,j‖22 + λ2‖Ajui,j − αi,jei,j‖22) , (4.1)where the vector α = {αi,j}1≤i≤nsrc,1≤j≤nfreq contains all the source weights.Because this new objective fpen(m, u, α) contains more than two un-knowns, it no longer corresponds to a standard SNLLS problem, so we cannot directly apply the variable projection method to solve it. One simpleapproach that comes to mind to convert the problem in equation 4.1 to astandard SNLLS problem would be to reduce the number of unknown vari-ables by collecting two of the three variables into a new variable. Sincefor fixed m and α, fpen(m, u, α) is still quadratic with respect to u, themost straightforward approach would be to lump the medium parameters mand the source weights α together. Following the notation of the standardSNLLS problems, for each pair of (i, j), we now haveθ =[mα], xi,j =[ui,j1], yi,j =[0di,j], and Φi,j(θ) =[λAj −λαi,jei,jPi 0].(4.2)Through the substitution of equation 4.2 into equations 3.9 and 3.10, we canagain apply the variable projection method to project out the wavefields uviaui,j(m, αi,j) =(λ2A>j Aj + P>i Pi)−1 (λ2αi,jA>j ei,j + P>i di,j). (4.3)In this way, we arrive at the following reduced problem:minimizem, αf˜(m, α) = fpen(m, u(m, α), α). (4.4)Similarly, the computation of the gradient g˜(m, α) of the new objectivef˜(m, α) is straightforward if we substitute equation 4.2 into equation 3.13,49which yieldsg˜(m, α) =[∇mf˜(m, α)∇αf˜(m, α)]=[∇mfpen(m,u, α)|u=u(m,α)∇αfpen(m,u, α)|u=u(m,α)], (4.5)where∇mfpen(m,u, α)|u=u(m,α)=nsrc∑i=1nfreq∑j=1λ2ω2i diag(conj(ui,j(m, αi,j)))(Ajui,j(m, αi,j)− αi,jei,j) (4.6)is the gradient with respect to m and∇αfpen(m,u, α)|u=u(m,α)={λ2e>i,j(αi,jei,j −Ajui,j (m, αi,j))}1≤i≤nsrc,1≤j≤nfreq (4.7)is the gradient with respect to the source weights.While the gradients in equations 4.6 and 4.7 seemingly provide all in-formation that we would need to drive the inversion, their amplitudes maysignificantly differ in scale, which is a common problem in optimizationproblems with multiple unknowns (Sambridge, 1990; Kennett et al., 1988;Rawlinson et al., 2006). An undesired consequence of this mismatch be-tween the gradients in equations 4.6 and 4.7 is that the optimization maytend to primarily update the variable with the larger gradient contributionresulting in small updates for the other variable. This behavior may slowdown the convergence, and the small updates may lead to errors that endup in the other variable resulting in a poor solution.To mitigate the challenge arising from the different amplitude scales of mand α and their gradients, we propose another strategy to reduce the originalproblem in equation 4.1. Instead of collecting m and α together, we lump uand α into a single vector x. To understand the potential advantage of thisalternative strategy, let us rewrite the optimization problem in equation 4.150asminimizem,u, αfpen(m,u, α) =12nsrc∑i=1nfreq∑j=1‖[λAj −λei,jPi 0] [ui,jαi,j]−[0di,j]‖2=12nsrc∑i=1nfreq∑j=1‖[λAj −λei,jPi 0]xi,j −[0di,j]‖2: = fˆpen(m,x).(4.8)For fixed m, the objective function fˆpen(m,x) continues to be quadratic inthe new variable x, where u and α are grouped together. As a result, we canrewrite the optimization problem in equation 4.8 into the standard SNLLSform (see equation 3.6) when we use the following notations:θ = m, xi,j =[ui,jαi,j], yi,j =[0di,j], and Φi,j(θ) =[λAj −λei,jPi 0]. (4.9)Now by substituting the notations in equation 4.9 into equation 3.9, wecan apply the variable projection method again to project out x from theobjective function fˆpen(m,x) for fixed m by computing the optimal solutionxi,j(m) =[ui,j(m)αi,j(m)]=[λ2A>j Aj + P>i Pi −λ2A>j ei,j−λ2e>i,jAj λ2e>i,jei,j]−1 [P>i di,j0].(4.10)The linear system in equation 4.10 is solvable if and only if the rank of thematrix Φi,j(θ) in equation 4.9 is ngrid + 1, which requiresPiA−1j ei,j 6= 0. (4.11)In fact, this means that the wavefields corresponding to a point source donot vanish at all the receiver locations simultaneously, which holds in almostall seismic situations. Indeed, even in the extremely rare situation wherea source is made nearly “silent” or non-radiating due to its surroundinggeological structures, we can simply discard this source in the inversion toavoid inverting a rank deficient matrix. After the projection in equation 4.10,we obtain an objective function that only depends on m:fˆ(m) = fpen(m,u(m), α(m)), (4.12)51whose gradient gˆ(m) can be derived asgˆ(m) = ∇mfˆ(m) = ∇mfpen(m,u, α)|u=u(m),α=α(m)=nsrc∑i=1nfreq∑j=1λ2ω2i diag(conj(ui,j(m)))(Ajui,j(m)− αi,j(m)ei,j).(4.13)Compared to the objective function f˜(m, α), which we obtained by pro-jecting out u alone, the objective function fˆ(m) only depends on m. As aconsequence, we avoid having to optimize over two different variables thatdiffer in scale. Moreover, because the objective fˆ(m) is source-independent,the optimization with it does not require any initial guesses of the sourceweights, while the optimization with the objective f˜(m, α) does need a suf-ficiently accurate initial guess to ensure reaching the correct solution.4.3 Optimization schemeWith the gradient gˆ(m) we derived in the previous section we can com-pute updates for the medium parameters via the steepest-descent method(Nocedal and Wright, 2006):mk+1 = mk − βkgˆ(mk). (4.14)Here, the value βk is an appropriately chosen step length at the kth iter-ation, which requires a line search to determine. Even with an optimizedstep length, as a first-order method, the steepest-descent method can beslow. To speed up the convergence, second-order methods including New-ton method, Gauss-Newton method and quasi-Newton method may be moredesirable (Pratt, 1999; Akcelik, 2002; Askan et al., 2007). During each it-eration, Newton method and Gauss-Newton method use the full or Gauss-Newton Hessian to compute the update direction. However, the construc-tion and inversion of the full or Gauss-Newton Hessian for both FWI (Pratt,1999) and WRI (van Leeuwen and Herrmann, 2015) involve a large amountof additional PDE solves, which makes these two methods less attractivein this context. On the other hand, the quasi-Newton method, especiallythe l-BFGS method (Nocedal and Wright, 2006; Brossier et al., 2009; vanLeeuwen and Herrmann, 2013a), utilizes the gradient at the current iterationk and a few previous gradients (typically, 3-20 iterations) to construct anapproximation of the inverse of the Hessian. Hence, except for the increasedmemory, the l-BFGS method constructs the approximation of the inverse52of the Hessian for free. For this reason, we use the l-BFGS method as ouroptimization scheme and denote the l-BFGS update asmk+1 = mk − βkH(mk)−1gˆ(mk), (4.15)where the matrix H(mk) is the l-BFGS Hessian and the step length βk isdetermined by a weak Wolfe line search. We give a detailed description ofthe l-BFGS method for the proposed WRI framework with source estimationin Algorithm 3.Algorithm 3 l-BFGS algorithm for WRI with source estimation1. Initialization with an initial model m02. for k = 1→ niter3. for j = 1→ nfreq4. for i = 1→ nsrc5. Compute ui,j(mk) and αi,j(mk) by equation 4.106. end7. end8. Compute fˆk and gˆk by equations 4.12 and 4.139. Apply l-BFGS Hessian with history size nhis10. pk = lbfgs(−gˆk, {tl}kl=k−nhis , {sl}kl=k−nhis)11. {mk+1, fˆk+1, gˆk+1} = line search(fˆk, gˆk,pk)12. tk+1 = mk+1 −mk13. sk+1 = gˆk+1 − gˆk14. end4.4 Fast solver for WRI with source estimationMost of the computational burden for the objective function and its gradientin equations 4.12 and 4.13 lies within inverting the matrixCi,j(m) = Φi,j(m)>Φi,j(m) =[λ2A>j Aj + P>i Pi −λ2A>j ei,j−λ2e>i,jAj λ2e>i,jei,j]. (4.16)For 2D problems, we can invert the matrix Ci,j(m) by direct solvers suchas the Cholesky method, while for 3D problems, we may need an itera-tive solver equipped with a proper preconditioner. In either case, followingcomputational challenges arise from the fact that the matrix Ci,j(m) dif-fers from source to source and frequency to frequency. First, to apply theCholesky method, we need to calculate the Cholesky factorization for each53i and j, i.e., for each source and frequency. As a result, the computationalcost arrives at O(nsrc × nfreq × (n3grid + n2grid)), which prohibits the appli-cation to problems with a large number of sources. Secondly, for iterativesolvers, because the matrix Ci,j(m) varies with respect to i and j, the cor-responding preconditioner may as well as depend on i and j. Therefore,we may have to design nsrc × nfreq different preconditioners, which can becomputationally difficult and intractable. Moreover, the additional terms−λ2A>j ei,j , −λ2e>i,jAj , and λ2e>i,jei,j may lead to the condition number ofthe matrix Ci,j worse than that of the matrix λ2A>j Aj +P>i Pi. As a result,without a good preconditioner, the projection procedure in the framework ofWRI with source estimation may be much slower than that without sourceestimation.To lighten the computational costs of inverting the matrix Ci,j(m), wedescribe a new inversion scheme to implement the algorithm when the pro-jection operator P remains the same for all sources. This situation is com-mon in ocean bottom node acquisition and the land acquisition where allsources “see” the same receivers. When we replace Pi by a single P inequation 4.16, the matrix Ci,j(m) can be simplified asCi,j(m) =[λ2A>j Aj + P>P −λ2A>j ei,j−λ2e>i,jAj λ2e>i,jei,j]. (4.17)Unfortunately, this matrix Ci,j(m) still depends on the source and frequencyindices, and a straightforward inversion still faces the aforementioned com-putational challenges. However, this form, equation 4.17, allows us to use analternative block matrix formula to invert Ci,j(m). To arrive at this result,let us first write Ci,j(m) in a simpler formCi,j(m) =[M1 M2M3 M4], (4.18)where the matrix M1 = λ2A>j Aj+P>P, the vector M2 = M>3 = −λ2A>j ei,j ,and the scalar M4 = λ2e>i,jei,j . Now if we apply the block matrix inversionformula (Bernstein, 2005) to compute C−1i,j (m), we arrive at the following54closed form analytical expressionC−1i,j (m) =[M˜1 M˜2M˜3 M˜4], withM˜1 = (I + M−11 M2(M4 −M3M−11 M2)−1M3)M−11 ,M˜2 = −M−11 M2(M4 −M3M−11 M2)−1,M˜3 = −(M4 −M3M−11 M2)−1M3M−11 , andM˜4 = (M4 −M3M−11 M2)−1.(4.19)While complicated the right-hand side of this equation only involves the in-version of the source-independent matrix M1, all other terms are all scalarinversions that can be evaluated at negligible costs. Substituting equa-tion 4.19 into equation 4.10, we obtain an analytical solution for the optimalvariablexi,j(m) = C−1i,j (m)[P>di,j0]=[(I + M−11 M2(M4 −M3M−11 M2)−1M3)M−11 P>di,j−(M4 −M3M−11 M2)−1M3M−11 P>di,j].(4.20)In equation 4.20, we have to invert two terms, i.e., M1 and M4−M3M−11 M2.Because M4 is a scalar, the term (M4−M3M−11 M2)−1 is a scalar inversion,whose computational cost is negligible. Therefore, to construct xi,j(m), weonly need to compute the following two vectors:w1 = M−11 M2= −λ2(λ2A>j Aj + P>P)−1A>j ei,j ,(4.21)andw2 = M−11 P>di,j= (λ2A>j Aj + P>P)−1P>di,j .(4.22)From these two expressions, we can observe that the computation of the vec-tors w1 and w2 for each source is independent from that of other sources,therefore, we can sequentially compute them from source to source, yield-ing a negligible requirement of storing only 2 additional wavefields dur-ing the inversion. For each frequency, because the matrix M1 is source-independent, we only need one Cholesky factorization, whose computationalcost is O(n3grid). With the pre-computed Cholesky factors, for each source,55nr. of Choleskyfactorizations perfrequencynr. of inversionsw/ Choleskyfactors perfrequencyw/o SE 1 2 per sourcew/ SE – old form nsrc 2 per sourcew/ SE – new form 1 4 per sourceTable 4.1: Comparison of the computational costs for different algo-rithms.solving w1 and w2 by equations 4.21 and 4.22 requires inverting the matrixM1 twice with a computational cost of O(n2grid). Consequently, we reducethe total cost from O(nsrc × nfreq × (n3grid + n2grid)) to O(nfreq × (n3grid + 2×nsrc×n2grid)). As a reference, the computational complexity of WRI withoutsource estimation is of a close order O(nfreq × (n3grid + nsrc × n2grid)) (vanLeeuwen and Herrmann, 2015). A comparison of the computational com-plexity of WRI with and without source estimation (SE) using the Choleskymethod is shown in Table 4.1. In this table, we refer to the scheme that di-rectly inverts the matrix Ci,j(m) as the old form, and refer to the proposedscheme in equation 4.20 as the new form. Compared to the old form, in-stead of requiring nsrc Cholesky factorizations, the proposed new form onlyrequires 1 Cholesky factorization for each frequency, which significantly re-duces the computational cost. Furthermore, besides reducing the cost for di-rect solvers, the proposed inversion scheme can also benefit iterative solvers.For each frequency, since all sources only need to invert the same matrixM1, the proposed new form avoids inverting the potentially ill-conditionedmatrix Ci,j directly and only requires one preconditioner instead of nsrc,which significantly simplifies the projection procedure. In the following sec-tion of numerical experiments, we will only show the computational gain fordirect solvers.564.5 Numerical experiments4.5.1 BG model with frequency-domain dataTo compare the performance of the two source estimation methods for WRIdescribed in the previous section, we conduct numerical experiments on apart of the BG compass model mt shown in Figure 4.3a, which is a geologi-cally realistic model created by BG Group and has been widely used to eval-uate performances of different FWI methods (Li et al., 2012; van Leeuwenand Herrmann, 2013a). We will refer to the method that combines m and αas WRI-SE-MS and the one that combines u and α as WRI-SE-WS. For ourdiscretization, we use an optimal 9-point finite-difference frequency-domainforward modeling code (Chen et al., 2013). Sources and receivers are posi-tioned at the depth of z = 20 m with a source distance of 50 m and a receiverdistance of 10 m, resulting a maximum offset of 4.5 km. Data is generatedwith a source function given by a Ricker wavelet centered at 20 Hz with atime shift of 0.5 s. We do not model the free-surface, so the data containno free-surface related multiples. As is commonly practiced (see e.g. Pratt(1999)), we perform frequency continuation using frequency bands rangingfrom {2, 2.5, 3, 3.5, 4, 4.5}Hz to {27, 27.5, 28, 28.5, 29, 29.5}Hz with anoverlap of one frequency between every two consecutive bands. During theinversion, the result of the current frequency band serves as a warm starterfor the inversion of the next frequency band. During each iteration, we alsoapply a point-wise bound constraint (Peters and Herrmann, 2016) to theupdate to ensure that each gridpoint of the model update remains within ageologically reasonable interval. In this experiment, because the lowest andhighest velocities of the true model are 1480 m/s and 4500 m/s, we set theinterval of the bound constraint to be [1300, 5000] m/s. The initial modelm0 (Figure 4.3b) is generated by smoothing the original model, followedby an average along the horizontal direction. The difference between theinitial and true models is shown in Figure 4.3c. We use this initial modeland apply the source estimation method proposed by Pratt (1999) to obtainan initial guess of the source weights for WRI-SE-MS. Because this guessminimizes the difference between the observed and predicted data with theinitial model, it can be considered as the best choice, when there is no moreadditional information. For WRI-SE-WS, as described in the previous sec-tions, it does not need an initial guess of the source weights. To control thecomputational load, we fix the maximum number of l-BFGS iterations tobe 20 for each frequency band. We select the penalty parameter λ = 1e457according to the selection criteria in van Leeuwen and Herrmann (2015) thatoptimizes the performance of WRI.As a benchmark, we depict the inversion result obtained with the truesource weights in Figure 4.4a. Figures 4.4b and 4.4c show the inversion re-sults obtained by WRI-SE-MS and WRI-SE-WS, respectively. Figure 4.5ashows the difference between the results in Figures 4.4a and 4.4b, and Fig-ure 4.5b displays the difference between the results in Figures 4.4a and 4.4c.We observe that while both the results obtained by WRI-SE-MS and WRI-SE-WS are close to the result obtained with the true source weights, thedifference shown in Figure 4.5a is almost 10 times larger than that shownin Figure 4.5b. In addition, we compare the true source weights and theestimated source weights obtained with WRI-SE-MS and WRI-SE-WS inFigures 4.6a (phase) and 4.6b (amplitude). From these two figures, we ob-serve that both methods can provide reasonable estimates of the sourceweights while the estimates with WRI-SE-WS contain smaller errors. Theseerrors found in the source weights may result in the large model differ-ences shown in Figure 4.5a. In Figure 4.7, we display the relative modelerrors ‖mt−m‖‖mt−m0‖ versus the number of iterations during the three inversions.The dashed, solid, and dotted curves correspond to inversions with the truesource weights and those estimates using WRI-SE-MS and WRI-SE-WS.The relative model errors of WRI-SE-WS are almost the same as thoseusing the true source weights, while the errors of WRI-SE-MS are clearlylarger than the other ones. Figure 4.8 depicts a comparison of the dataresiduals corresponding to the inversion results of WRI-SE-MS and WRI-SE-WS. Clearly, the data residual of WRI-SE-WS is much smaller than thatof WRI-SE-MS. In Table 4.2, we quantitatively compare the inversion re-sults obtained by WRI-SE-MS and WRI-SE-WS in terms of the final modelerrors, source errors, and data residuals, which also illustrates the outper-formances of WRI-SE-WS over WRI-SE-MS. These observations imply thatcompared to iteratively updating α, projecting out α and u together duringeach iteration can provide more accurate estimates of source weights, whichfurther benefits the inversions of medium parameters.To evaluate the computational performance of the fast inversion schemeof WRI-SE-WS, we compare the time spent in evaluating one objective forthree cases, namely (i) WRI without source estimation; (ii) WRI-SE-WSwith the old form, equation 4.10; and (iii) WRI-SE-WS with the new form,equation 4.20. The computational time for each of the three cases is 94 s,2962 s, and 190 s, respectively. As expected, case (iii) spends twice theamount of time as case (i), because of the additional PDE solves for eachfrequency and each source. As only one Cholesky factorization is required580 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) True model0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(b) Initial model0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]-1-0.500.51km/s(c) Difference between the initial and true modelsFigure 4.3: (a) True model, (b) initial model, and (c) the differencebetween them.590 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) Inversion result with the true source weights0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(b) Inversion result with WRI-SE-MS0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(c) Inversion result with WRI-SE-WSFigure 4.4: (a) Inversion result with the true source function. (b) In-version result with WRI-SE-MS. (c) Inversion result with WRI-SE-WS.600 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]0.020.040.060.080.10.120.14km/s(a) Model difference for WRI-SE-MS0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]0.020.040.060.080.10.120.14km/s(b) Model difference for WRI-SE-WSFigure 4.5: (a) Difference between the inversion results using the truesource function and the estimated source function by WRI-SE-MS. (b) Difference between the inversion results using the truesource function and the estimated source function by WRI-SE-WS.WRI-SE-MS WRI-SE-WS‖mt −mf‖2 48 45‖αt − αf‖2 9.3e2 8e1‖dobs − dpred‖2 6.6e3 1.5e3Table 4.2: Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors, sourceerrors, and data residuals.610 10 20 30Frequency [Hz]-4-2024PhaseTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(a) Phase comparison0 10 20 30Frequency [Hz]050100150AmplitudeTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(b) Amplitude comparisonFigure 4.6: Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS(◦). (a) Phase comparison. (b) Amplitude comparison.620 50 100 150 200Iteration0.50.550.60.650.70.750.80.850.90.951Relative Model ErrorTure Source WaveletWRI-SE-MSWRI-SE-WSFigure 4.7: Comparison of the relative model errors.0 500 1000 1500 2000 2500 3000 3500 4000 4500Receiver [m]0102030405060708090|dobs-d pred|WRI-SE-MSWRI-SE-WSFigure 4.8: Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS(dotted line).63for case (iii), the computational time is only 1/15 of that for case (ii). Thisresult illustrates that the proposed approach can estimate source weights inthe context of WRI with a small computational overhead.4.5.2 BG model with time-domain dataTo test the robustness of the two source estimation techniques in less idealsituations, we perform the inversion tests on non-inverse-crime data. For thispurpose, we generate time-domain data with a recording time of 4 secondsusing iWAVE (Symes et al., 2011) and transform the data from the timedomain to the frequency domain for the inversions. The data are generatedon uniform grids with a grid size of 6 m, while the inversions are carried outon uniform grids with a coarser grid of 10 m. As a result, modeling errorsarise from the differences between the modeling kernels and the grid sizes.All other experimental settings in this example are the same as example1. Figures 4.9a and 4.9b show the inversion results with WRI-SE-MS andWRI-SE-WS, respectively. From Figure 4.9a, we can observe that methodWRI-SE-MS fails to converge to a reasonable solution, as there is a largelow-velocity block at the depth of z = 1500 m, which does not exist in thetrue model. On the other hand, the result of WRI-SE-WS (Figure 4.9b) ismore consistent with the true model. The average model errors presentedin the inversion results of WRI-SE-MS and WRI-SE-WS are 0.18 km/s and0.11 km/s, respectively. Moreover, compared to the true source weights, theamplitudes of the estimated weights obtained with WRI-SE-MS contain verylarge errors, while the results obtained with WRI-SE-WS are almost iden-tical to the true source weights (see Figure 4.10). Additionally, the residualbetween the observed and predicted data computed by the inversion resultsof WRI-SE-WS is much smaller than that of WRI-SE-MS (see Figure 4.11).We can also obtain similar observations from the quantitative comparisonpresented in Table 4.3. These results imply that compared to WRI-SE-MS,WRI-SE-WS is more robust with respect to the modeling errors.4.5.3 BG model with noisy dataTo test the stability of the proposed two techniques with respect to measure-ment noise in data, we add 40% Gaussian noise into the data used in example2. Again, other experimental settings remain the same as in example 2. Asexpected, due to the noise in the data, both inverted models (Figures 4.12aand 4.12b) contain more noise than the inverted models in example 2. Sim-ilar to example 2, the result of WRI-SE-MS still contains a large incorrect640 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) Inversion result with WRI-SE-MS0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(b) Inversion result with WRI-SE-WSFigure 4.9: Inversion results with (a) WRI-SE-MS and (b) WRI-SE-WS.WRI-SE-MS WRI-SE-WS‖mt −mf‖2 80 53‖αt − αf‖2 1.1e3 2.1e2‖dobs − dpred‖2 6.8e3 2.4e3Table 4.3: Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors, sourceerrors, and data residuals.650 10 20 30Frequency [Hz]-4-2024PhaseTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(a) Phase comparison0 10 20 30Frequency [Hz]0200400600800AmplitudeTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(b) Amplitude comparisonFigure 4.10: Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS(◦). (a) Phase comparison. (b) Amplitude comparison.660 500 1000 1500 2000 2500 3000 3500 4000 4500Receiver [m]0100200300400500600700800|dobs-d pred|WRI-SE-MSWRI-SE-WSFigure 4.11: Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS (dotted line).low-velocity block at the depth of z = 1500 m, which we do not find in theresult of WRI-SE-WS. The final average model errors of WRI-SE-MS andWRI-SE-WS are 0.22 km/s and 0.16, km/s, respectively. A comparison ofthe true source weights (+), estimated source weights obtained with WRI-SE-MS (×) and WRI-SE-WS (◦) is depicted in Figure 4.13. The estimatedsource weights with WRI-SE-WS agree with the true source weights muchbetter than those obtained with WRI-SE-MS. We also compare the dataresiduals corresponding to the inversion results of WRI-SE-WS and WRI-SE-MS in Figure 4.14. Clearly, WRI-SE-MS produces larger data residualsthan WRI-SE-WS. Moreover, the quantitative comparison in Table 4.4 il-lustrates that the inversion results of WRI-SE-WS exhibit smaller modelerrors, source errors, and data residuals when compared to that of WRI-SE-MS. These observations imply that compared to WRI-SE-MS, WRI-SE-WSis more robust and stable with respect to measurement noise.4.5.4 Comparison with FWIFinally, we intend to compare the performances of FWI with source esti-mation and WRI-SE-WS under bad initial models. We use the same ex-perimental settings as example 1 except with frequency bands ranging from{7, 7.5, 8, 8.5, 9, 9.5}Hz to {27, 27.5, 28, 28.5, 29, 29.5}Hz and the selec-tion of the penalty parameter λ = 1e0. During the inversions, we use theinitial model displayed in Figure 4.15a. This model is difficult for both FWI670 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) Inversion result with WRI-SE-MS0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(b) Inversion result with WRI-SE-WSFigure 4.12: Inversion results with (a) WRI-SE-MS and (b) WRI-SE-WS.WRI-SE-MS WRI-SE-WS‖mt −mf‖2 96 76‖αt − αf‖2 1.5e3 2.3e2‖dobs − dpred‖2 2.1e4 9.8e3Table 4.4: Comparisons between inversion results obtained by WRI-SE-MS and WRI-SE-WS in terms of the final model errors, sourceerrors, and data residuals.680 10 20 30Frequency [Hz]-4-2024PhaseTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(a) Phase comparison0 10 20 30Frequency [Hz]0200400600800AmplitudeTrue sourceEstimated source with WRI-SE-MSEstimated source with WRI-SE-WS(b) Amplitude comparisonFigure 4.13: Comparison of the true source function (+) and the esti-mated source function with WRI-SE-MS (×) and WRI-SE-WS(◦). (a) Phase comparison. (b) Amplitude comparison.690 500 1000 1500 2000 2500 3000 3500 4000 4500Receiver [m]05001000150020002500|dobs-d pred|WRI-SE-MSWRI-SE-WSFigure 4.14: Comparison of the data residuals corresponding to theinversion results of WRI-SE-MS (dashed line) and WRI-SE-WS (dotted line).and WRI with the given frequency range, because in the shallow part ofthe model, i.e., from the depth of 0 m to 120 m, the velocity of the initialmodel is 0.2 km/s higher than the true one (shown in Figure 4.15b), whichcan produce cycle-skipped predicted data shown in Figure 4.16. Moreover,as the maximum offset is 4.5 km, the transmitted waves that the conven-tional FWI uses to build up long-wavelength structures can only reach thedepth of 1.5 km (Virieux and Operto, 2009). When the transmitted dataare cycle-skipped, the resulting long-wavelength velocity structures wouldbe erroneous, which would further adversely affect the reconstruction of theshort-wavelength velocity structures, especially those below 1.5 km.Figures 4.17a and 4.17b show the inversion results obtained by WRIand FWI, respectively. As expected, due to the cycle-skipped data and ab-sence of low-frequency data, the conventional FWI fails to correctly invertthe velocity in the shallow area where the transmitted waves arrive, i.e.,z ≤ 1.5 km, which subsequently yields larger errors within the inverted ve-locity in the deep part of the model, i.e., z > 1.5 km. On the other hand,WRI mitigates the negative effects of the cycle-skipped data to the inversionand correctly reconstructs the velocity in both areas that can and cannotbe reached by the transmitted turning waves. The final average model errorof WRI is 0.1 km/s, which is much smaller than that of FWI—0.22 km/s.Moreover, as shown in Table 4.5, the `2-norm model error of FWI is twiceas large as that of WRI. This implies that besides the transmitted waves700 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) Initial model0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]-1-0.500.51km/s(b) Difference between the initial and true modelsFigure 4.15: (a)Initial model and (b) difference between the initialmodel and true models.WRI also uses the reflection waves to invert the velocity model. Clearly,the main features and interfaces in the model are reconstructed correctly byWRI. Figure 4.18 displays three vertical cross-sections through the model.Compared to FWI, WRI provides a result that matches the true modelmuch better. Figure 4.19 shows the comparison of the estimated sourceweights obtained by FWI and WRI. From the comparisons in Figures 4.19aand 4.19b, we observe that compared to FWI, WRI provides estimates ofsource weights with mildly better phases and significantly better amplitudes.The `2-norm source error of FWI is much larger than that of WRI as shownin Table 4.5. This is not surprising as the quality of the source estimationand the inverted velocity model are closely tied to each other. Figure 4.20shows the comparisons between the observed data and predicted data com-710 1000 2000 3000 4000Receiver [m]-1000-50005001000150020002500Real(d)Observed dataPredicted data(a) Real part0 1000 2000 3000 4000Receiver [m]-1200-1000-800-600-400-2000200400600Imag(d)Observed dataPredicted data(b) Imaginary partFigure 4.16: Comparisons between the observed (dashed line) andpredicted data (dotted line) computed with the initial modelfor a source located at the center of the model. (a) Real partcomparison. (b) Imaginary part comparison.72FWI WRI‖mt −mf‖2 100 50‖αt − αf‖2 1.3e3 6e1‖dobs − dpred‖2 3.8e4 1.7e3Table 4.5: Comparisons between inversion results obtained by FWIand WRI in terms of the final model errors, source errors, anddata residuals.puted with the final results of WRI and FWI. The predicted data computedfrom the WRI inversion (Figures 4.20a and 4.20b) almost matches the ob-served data, while the mismatch of that from the FWI inversion is muchlarger (Figures 4.20c and 4.20d), which can also be found in the quantita-tive comparison in Table 4.5. All these results imply that besides providinga better velocity reconstruction with less model errors compared to the truemodel, WRI also produces a better estimate of the source weights with muchless errors compared to the true source weights.4.6 DiscussionsSource functions are essential for wave-equation-based inversion methods toconduct a successful reconstruction of the subsurface structures. Because thecorrect source functions are typically unavailable in practical applications,an on-the-fly source estimation procedure is necessary for the inversion. Inthis study, we proposed an on-the-fly source estimation technique for the re-cently developed WRI by adapting its variable projection procedure for thewavefields to include the source functions. Through simultaneously project-ing out the wavefields and source functions, we obtained a reduced objectivefunction that only depends on the medium parameters. As a result, we areable to accomplish the inversion without prior information of the sourcefunctions.The main computational cost of the proposed source estimation methodlies within the procedure of projecting out the wavefields and source weights,in which we have to invert a potentially ill-conditioned source-dependent ma-trix. For ocean bottom node and land acquisitions where all sources “see”the same receivers, we applied the block matrix inversion formula and arrivedat a new inversion scheme that only involves inverting a source-independentmatrix. This new scheme brings benefits to both direct and iterative solvers730 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(a) Inversion result with WRI0 1000 2000 3000 4000Lateral [m]0500100015002000Depth [m]1.522.533.544.55km/s(b) Inversion result with FWIFigure 4.17: Inversion results with (a) WRI and (b) FWI.when compared to the direct inversion of the original source-dependent ma-trix. For 2D problems, we illustrated that without losing accuracy, theinversion of the proposed scheme only needs one Cholesky factorization foreach frequency, while the direct inversion requires nsrc. Indeed, this benefitdoes not only apply to the Cholesky method, but also to other faster di-rect solvers including the multifrontal massively parallel sparse direct solver(MUMPS, Amestoy et al. (2016)), which can further speed up the pro-jection procedure. When solving 3D problems with iterative solvers, theproposed inversion scheme reduces the number of preconditioners for eachfrequency from nsrc to one; therefore, it significantly reduces the compu-tational complexity. Furthermore, the source-independent matrix requiredto invert is the same one as in the conventional WRI without source esti-mation. Thus, we avoid iteratively inverting the potentially ill-conditioned740200400600800100012001400160018002000Velocity [km/s]1 2 3 4 5Depth [m]True modelInitial modelWRI resultFWI result(a) x = 1000 m0200400600800100012001400160018002000Velocity [km/s]1 2 3 4 5Depth [m]True modelInitial modelWRI resultFWI result(b) x = 2500 m0200400600800100012001400160018002000Velocity [km/s]1 2 3 4 5Depth [m]True modelInitial modelWRI resultFWI result(c) x = 4000 mFigure 4.18: Vertical profiles of the true model (solid line), initialmodel (dashdot line), and the inversion results with WRI(dashed line) and FWI (dotted line) at (a) x = 1000 m, (b)x = 2500 m, and (c) x = 4000 m.source-dependent matrix and can straightforwardly apply any efficient andscalable methods designed for the conventional WRI (Peters et al., 2015)to the proposed approach, which allows us to extend the application of thisapproach to 3D frequency-domain waveform inversions.Aside from benefiting inversions with ocean bottom node and land ac-quisitions, the proposed scheme may also benefit inversions with conven-tional 3D marine towed-streamer acquisitions where all sources “see” differ-ent receivers. In this case, we may lose the benefit of reducing the numberof preconditioners due to the acquisition geometry, however, the benefitthat avoids the direct inversion of the potentially ill-conditioned source-dependent matrix remains. As a result, we still can straightforwardly applyany preconditioners designed for the conventional WRI without source esti-mation to solve the proposed scheme. With carefully designed precondition-ers, the computational cost of the proposed WRI with source estimation canbe roughly equal to that of standard 3D frequency-domain FWI using itera-tive solvers, because the total numbers of the iterative PDE solves requiredby the two approaches are approximately the same. As a result, it should7510 15 20 25 30Frequency [Hz]-4-2024PhaseTrue sourceEstimated source with FWIEstimated source with WRI(a) Phase comparison10 15 20 25 30Frequency [Hz]406080100120140160AmplitudeTrue sourceEstimated source of FWIEstimated source of WRI(b) Amplitude comparisonFigure 4.19: Comparison of the true source function (+) and the esti-mated source function with FWI (×) and WRI (◦). (a) Phasecomparison. (b) Amplitude comparison.760 1000 2000 3000 4000Receiver [m]-1000-50005001000150020002500Real(d)Observed dataPredicted data(a) Real part - WRI0 1000 2000 3000 4000Receiver [m]-1200-1000-800-600-400-2000200400600Imag(d)Observed dataPredicted data(b) Imaginary part - WRI0 1000 2000 3000 4000Receiver [m]-1000-50005001000150020002500Real(d)Observed dataPredicted data(c) Real part - FWI0 1000 2000 3000 4000Receiver [m]-1500-1000-50005001000Imag(d)Observed dataPredicted data(d) Imaginary part - FWIFigure 4.20: Comparisons between the observed (dashed line) andpredicted data (dotted line) computed with the final resultsof WRI and FWI. (a) Real part and (b) imaginary part ofWRI result. (c) Real part and (d) imaginary part of FWIresult.77be viable to apply the proposed WRI with source estimation to inversionswith the conventional 3D marine towed-streamer acquisitions.While the presented examples illustrated the feasibility of the proposedapproach—WRI with source estimation—for frequency-domain inversions,its extension to the time-domain inversions is not trivial. The main com-putational challenge lies in the fact that in the time domain, the projectionprocedure requires us to solve a very large linear system, whose size is ntime(number of time slices) times larger than that in the frequency domain. As aresult, the computational and storage cost in the time domain is larger thanthat in the frequency domain. Moreover, the time-domain source functionis a time series rather than a complex number in the frequency domain. Inorder to estimate the time series successfully, we may need additional con-straints, such as the duration of the source function, to regularize the un-known source functions. Due to the two facts, the extension of the proposedapproach to the time-domain inversions requires further investigations.In the examples of this chapter, we use a single λ for all the frequencies.However, as the amplitudes of data and the Helmholtz matrices differ fromfrequency to frequency, we may use different λ’s for different frequencies toconduct better inversions. Moreover, as introduced by Esser et al. (2018),it would be a good strategy to conduct the inversion with multiple cyclesof warm-started WRI during which the penalty parameter λ increases. Wewill discuss the strategy of selecting λ in the following chapters.4.7 ConclusionsIn this chapter, we showed that the recently proposed wavefield-reconstructioninversion method can be modified to handle unknown source situations. Inthe proposed modification of wavefield-reconstruction inversion, we consid-ered the source functions as unknown variables and updated them jointlywith the medium and wavefields parameters. To update the three un-knowns, we proposed an optimization strategy based on the variable projec-tion method. During each iteration of the inversion, for fixed medium pa-rameters, the objective function is quadratic with respect to both wavefieldsand source functions. This fact motivates us to use the variable projectionmethod to first project out the wavefields and source functions simultane-ously, and then to update the medium parameters. As a result, we obtainedan objective that does not depend on the wavefields and the source functions.Numerical experiments illustrated that equipped with the proposed sourceestimation method wavefield-reconstruction inversion can produce accurateinversion results without prior knowledge of the true source weights. More-78over, this method can also produce reliable results when the observationscontain noise.We also compared the proposed on-the-fly source estimation techniquefor wavefield-reconstruction inversion to the conventional source estimationtechnique for full-waveform inversion. The numerical result illustrated thatby expanding the search space and the inexact PDE-fitting, the proposedsource estimation technique for wavefield-reconstruction inversion is less sen-sitive to inaccurate initial models and can start the inversion with datawithout low-frequency components.79Chapter 5Wavefield reconstructioninversion with sourceestimation and convexconstraints — application tothe 2D Chevron 2014synthetic blind-test dataset5.1 IntroductionIn the previous chapter, we have proposed a modified wavefield-reconstructioninversion approach that is equipped with an on-the-fly source estimationtechnique (WRI-SE). In this chapter, we aim to investigate its effectivenessfor realistic problems. For this purpose, we apply the WRI-SE approach toa highly realistic 2D marine elastic dataset, which is for full-waveform inver-sion blind tests and designed by Chevron in 2014 (Qin et al., 2015). Wheninverting this dataset, several challenges remain: (1) frequency-band-limiteddata without available low frequencies below 4–5 Hz; (2) unclear relationsbetween P-velocity and S-velocity; and (3) bad starting model that is faraway from the true model. To address these challenges, we extend theWRI-SE approach by involving multiple pieces of prior information aboutthe Earth, such as limits on the velocity or minimum smoothness conditionson the model, into the optimization framework.80Wave-equation-based inversions including full-waveform inversion (FWI)and wavefield-reconstruction inversion (WRI) may require some forms of reg-ularization to yield solutions, which conform with our knowledge about theEarth at the survey location. We propose a constrained optimization frame-work that incorporates the WRI-SE approach with multiple pieces of priorinformation about the geology. To achieve this task, we first mathematicallyrepresent different prior information of the model parameters by differentconvex sets. Then we constrain the model parameters in the optimiza-tion problem of WRI-SE by projection onto intersections of these convexsets, which is similar to Baumstein (2013). This results in a constrainedoptimization problem that does not involve quadratic penalties in the ob-jective function. We illustrate that this constrained optimization frameworkprovides us with the flexibility to straightforwardly incorporate several pre-existing regularization ideas, such as Tikhonov regularization and gradientfiltering (Brenders and Pratt, 2007), into one framework without significantadditional computation.This chapter is organized as follows. First, we present the constrainedoptimization problem that is incorporated with the objective function ofWRI-SE. Then we introduce the optimization algorithm that solves the con-strained optimization problem. Finally, we show the results on the Chevron2014 data set to demonstrate the effectiveness of the proposed method. Tohighlight the effect of the proposed approach, we apply no data preprocess-ing.5.2 WRI-SE with regularization by projectiononto the intersection of convex setsIn search of a flexible and easy to use framework, which can incorporatemost existing ideas in the geophysical community to regularize the waveforminversion problem, we propose to solve constrained optimization problemsof the formminimizemf(m) s.t. m ∈ C1⋂C2, (5.1)where the objective function f(m) corresponds to the aforementioned re-duced objective function for WRI with source estimation in equation 4.12:f(m) =12nsrc∑i=1nfreq∑j=1(‖Piui,j(m)− di,j‖22+λ2‖Aj(m)ui,j(m)− αi,j(m)ei,j‖22),(5.2)81with[ui,j(m)αi,j(m)]=[λ2Aj(m)>Aj(m) + P>i Pi −λ2Aj(m)>ei,j−λ2e>i,jAj(m) λ2e>i,jei,j]−1 [Pidi,j0].(5.3)C1 and C2 are convex sets (more than 2 sets can be used). Intuitively, theline connecting any two points in a convex set is also in the set—i.e., for allx,y ∈ C the following holds:cx + (1− c)y ∈ C, 0 ≤ c ≤ 1. (5.4)Each set represents a known piece of information about the Earth, yieldinga constraint that the estimated model has to satisfy. The model is requiredto be in the intersection of the two sets, denoted by C1⋂ C2, which is alsoconvex. To summarize, solving Problem 5.1 means that we try to minimizethe objective function while satisfying the constraints, which represent theknown information about the geology. Below are two useful convex sets thatare also used in the examples. Bound-constraints can be defined element-wise asC1 ≡ {m | bl ≤m ≤ bu}. (5.5)If a model is in this set, then the value of every model parameter is be-tween a lower bound bl and upper bound bu. These bounds can vary permodel parameter and can, therefore, incorporate spatially varying informa-tion about the bounds. Bound constraints can also incorporate a referencemodel by limiting the maximum deviation from this reference model as:bl = mref− δm. The projection onto C1, denoted as PC1 , is the elementwisemedianPC1(m) = median{bl,m,bu}. (5.6)The second convex set used here is a subset of the wavenumber domainrepresentation of the model (spatial Fourier transformed). This convex setcan be defined asC2 ≡ {m |E∗F∗(I− S)FEm = 0}, (5.7)which is defined by a series of matrix multiplications. First, we apply themirror-extension operator E to the model m to prevent wrap-around effectsdue to applying linear operations in the wavenumber domain. Next, themirror extended model is spatially Fourier transformed using F. In thisdomain, we require the wavenumber content outside a certain range to bezero. This is enforced by the diagonal (windowing) matrix S. This represents82information that the model should have a certain minimum smoothness. Incase we expect an approximately horizontally layered medium without sharpcontrasts, we can include this information by requiring the wavenumbercontent of the model to vanish outside of an elliptical zone around the zerowavenumber point (see also Brenders and Pratt (2007)). In other words,more smoothness in one direction is required than in the other direction.The adjoint of F, (F∗) transforms back from the wavenumber to the physicaldomain, and E∗ goes from mirror-extended domain to the original modeldomain. The projection onto C2, denoted as PC2 , is given byPC2(m) = E∗F∗SFEm. (5.8)A basic way to solve Problem 5.1 is to use the projected-gradient algo-rithm, which has the main stepmk+1 = PC(mk − γ∇mf(m)), (5.9)at the kth iteration with a step length γ. PC = PC1⋂ C2 is the projectiononto the intersection of all convex sets that the model is required to be in.“The projection onto” means that we need to find the point, which is inboth sets and closest to the currently proposed model estimate generatedby (mk − γ∇mf(m)). To find such a point, defined mathematically asPC(m) = arg minx‖x−m‖2 s.t. x ∈ C1⋂C2, (5.10)we can use a splitting method such as the Dykstra’s projection algorithm(Bauschke and Borwein, 1994). This algorithm finds the projection ontothe intersection of the two sets by projecting onto each set separately. Thisis a cheap and simple algorithm and enables projections onto complicatedintersections of sets. The algorithm is given by Algorithm 4.Algorithm 4 Dykstra’s projection algorithm.1. x0 = m, p0 = 0, q0 = 02. For l = 0, 1, . . .3. yl = PC1(xl + pl)4. pl+1 = xl + pl − yl5. xl+1 = PC2(yl + ql)6. ql+1 = yl + ql − xl+17. EndWhile gradient-projections is a solid approach, it can also be relatively831.8 1.9 2 2.1 2.2 2.3 2.41.81.922.12.22.32.42.52.6xy1234567Figure 5.1: The trajectory of Dykstra’s algorithm for a toy examplewith constraints y ≤ 2 and x2 + y2 ≤ 3. Iterates 5, 6 and 7coincide; the algorithm converged to the point closest to pointnumber 1 and satisfying both constraints.slowly converging. A potentially much faster algorithm is the class of pro-jected Newton-type algorithms (PNT) (see for example Schmidt et al., 2012).Standard Newton-type methods iterate,mk+1 = mk − γB−1∇mf(m), (5.11)with B an approximation of the Hessian. At every iteration, the PNTmethod finds the new model by solving the constrained quadratic problemmk+1 = arg minm∈C1⋂ C2Q(m), (5.12)84where the local quadratic model is given byQ(m) = f(mk) + (m−mk)∗∇mf(mk) + (m−mk)∗Bk(m−mk). (5.13)The solution of Problem 5.12 does not involve additional PDE solves andis different from a simple projection of the Newton-type update. The PNTalgorithm needs the objective function value, its gradient, approximationto the Hessian, and an algorithm to solve the constrained quadratic sub-problem (Problem 5.12), which in turn needs an algorithm (Dykstra) tosolve the projection problem (Problem 5.10). The PNT sub-problem canbe solved in many ways. We select the alternating direction method ofmultipliers (ADMM) algorithm (Eckstein and Yao, 2012), which is given byAlgorithm 5.Algorithm 5 ADMM.1. x0 = mk, z0 = x0, y0 = 02. For l = 0, 1, . . .3. xl+1 = arg minx(Q(x) + (κ/2)‖x− zl + yl‖22)4. zl+1 = PC1⋂ C2(xl+1 + yl) by Dykstrayl+1 = yl + xl+1 − zl+15. EndIn Algorithm 5, the scalar κ is a penalty parameter. ADMM can solveProblem 5.12 very efficiently once provided an approximate Hessian B thatis easy to invert. In this algorithm, we use the diagonal approximate ofthe Gauss-Newton Hessian given by van Leeuwen and Herrmann (2013b) toconstruct the matrix B, which is much easier to invert than a discretizedHelmholtz system. The Dykstra-splitting algorithm solves a sub-probleminside the ADMM algorithm, which is also a splitting algorithm.PNT has a similar computational cost as the standard projected-gradientalgorithm—the same number of Helmholtz systems need to be solved. Theadditional cost is low because the projections are cheap to compute and theinversion of the diagonal approximate Gauss-Hessian matrix is also cheap.The final optimization algorithm is given by Algorithm 6 and can be bestdescribed as Dykstra’s algorithm within ADMM within a projected-Newtontype method. During each iteration, the constraints are satisfied exactly.85Algorithm 6 Projected-Newton type based waveform inversion with pro-jections onto convex sets.1. Define convex sets C1, C2, . . .2. Set up Dykstra’s algorithm to solve Problem 5.103. Set up ADMM to solve Problem 5.124. minimizem f(m) s.t. m ∈ C1⋂ C2 using PNT5.3 Practical considerationsThe above projected Newton-type approach with convex constraints pro-vides the formal optimization framework with which we will carry out ourexperiments. Before demonstrating the performance of our approach, wediscuss a number of practical considerations to make the inversion success-ful.Frequency continuation. We divide the frequency spectrum into over-lapping frequency bands of increasing frequency. This multiscale continua-tion makes the approach more robust against cycle skips.Selection of the smoothing parameter. Although there is no penaltyparameter to select in our method, we do need to define the convex setsonto which we project. The minimum-smoothness constraints require twoparameters—one that indicates how much smoothness is required and onethat indicates the difference between horizontal and vertical smoothness. Atthe start of the first frequency batch, these parameters are chosen such thatthe minimum smoothness constraints correspond to the smoothness of theinitial guess and we add some “room” to update the model to less smoothmodel. This is the only user intervention of our regularization framework.Subsequent frequency batches multiply the selected parameters by the em-pirical formula dfmaxvmin , where d is the grid-point spacing, fmax the maximumfrequency of the current batch, and vmin the minimum velocity in the cur-rent model. This automatically adjusts the selected parameters for changinggrids, frequency, and velocity.Data fit for Wavefield Reconstruction Inversion. Our inversionsare carried out with WRI-SE, which solves for wavefields that fit both thedata and the wave equation for the current model iterate in a least-squaressense. Since the data are noisy, we chose a relatively large λ so that we donot fit noise that lives at high source-coordinate wave numbers. In this way,the equation at the current model acts as a regularization term that doesprevent fitting the noise at the low frequencies.865.4 Application to the 2D Chevron 2014blind-test datasetWhile wave-equation-based inversion techniques have received a lot of atten-tion lately, their performance on realistic blind synthetics exposes a numberof challenges. The best results so far have been obtained by experiencedpractitioners from industry and academia who developed labor intensiveworkflows to obtain satisfactory results from noisy synthetic data sets thatare representative of complex geological areas. Our aim is to create versatileand easy to use workflows that reduce user input by using projections onconvex sets.Specifically, we will demonstrate the benefits from a combination of theconvex bound and minimal smoothness constraints that require minimalparameter selection. Moreover, we will investigate the effectiveness of ouron-the-fly source estimation technique for WRI when applying to more re-alistic data. We invert the 2D Chevron 2014 marine isotropic elastic syn-thetic blind-test dataset with two different settings: with and without theminimum smoothness constraints. The dataset contains 1600 shots locatedfrom 1 km to 40 km. The maximum offset is 8 km. Both source and receiverdepths are 15 m. We use 16 frequency bands ranging from 3 Hz to 19 Hz withoverlap. Each frequency band contains three different frequencies. For eachfrequency band, we perform 5 Projected Newton-type iterations. For eachiteration, we randomly jittered selected 360 sources from the 1600 sourcesto calculate the misfit, gradient, and Hessian approximation. The lower andupper bound of velocity are 1510 m/s and 4500 m/s, respectively.Figures 5.2, 5.3a, and 5.3b correspond to the initial model, inversionresult without the minimum smoothness constraint, and inversion resultwith the minimum smoothness constraint, respectively. At the depth of1.5 km, both inversion results have the gas clouds at x = 10 km, 20 km,and 24 km. Both results also obtain the low-velocity layer from the depthof 2 km to 3 km. This low velocity layer is difficult for conventional FWI toreconstruct from the initial model, because the turning waves that conven-tional FWI uses to reconstruct the long wavelength structures rarely reachthe depth below 2 km for a maximal offset of 8 km (Virieux and Operto,2009). Comparing the two results, the inversion without the smoothnessconstraint has many vertical artifacts due to the data noise and subsampledsources, while with the smoothness constraint, there are fewer vertical arti-facts in Figure 5.3b. Figure 5.4 depicts a comparison of the well-log velocity(red), initial velocity (blue), and inverted velocities with (black) and with-out (green) the smoothness constraints. Clearly, the velocity inverted with870 5 10 15 20 25 30 35 40 45Lateral [km]012345Depth [km]20002500300035004000m/sFigure 5.2: Initial modelthe smoothness constraint matches with the well-log velocity better thanthat inverted without the constraint. Both results demonstrate that thesmoothness constraint is able to remove artifacts and improve the quality ofthe inversion result.Figure 5.5 shows the comparison between the estimated source waveletand the true source wavelet provided by Chevron. Aside from an obvi-ous amplitude scaling, both the frequency-dependent phase and amplitudecharacteristics are well recovered from this complex elastic data set. This isremarkable because our modeling engine is the scalar constant-density waveequation and this result is a clear illustration of the benefits of having aformulation where the physics and data are fitted jointly during the inver-sion. Considering both the difference between the inversion results and thetrue model as well as errors from the discretization, these slight differencesbetween the true and estimated wavelet are acceptable.5.5 DiscussionsConceptually, the approach presented is related to the work of Baumstein(2013). There too, projections onto convex sets are used to regularize thewaveform inversion problem. The optimization, computation of projections,as well as the sets are different. However, in that approach convergence maynot be guaranteed because the projections are not guaranteed to be ontothe intersections and because they only project at the end and do not solveconstrained sub-problems. This is a problem when incorporating second-order information.880 5 10 15 20 25 30 35 40 45Lateral [km]012345Depth [km]20002500300035004000m/s(a) Inversion result without smoothness constraint0 5 10 15 20 25 30 35 40 45Lateral [km]012345Depth [km]20002500300035004000m/s(b) Inversion result with smoothness constraintFigure 5.3: Comparison of inversion result with and without smooth-ness constraint. (a) Inversion result without smoothness con-straint. (b) Inversion result with smoothness constraint.Tikhonov regularization is a well-known regularization technique andadds quadratic penalties to objectives such asminimizemf(m) +β12‖R1m‖22 +β22‖R2m‖22, (5.14)where two regularizers are used. The constants β1 and β2 are the scalarweights, determining the importance of each regularizer. The matrices R1and R2 represent properties of the model, which we would like to penalize.This method should in principle be able to achieve similar results as theprojection approach, but it has a few disadvantages. The first one is that891000 1500 2000 2500Depth [m]2200240026002800300032003400360038004000Velocity [m/s]Well logInitialw/ SCw/o SCFigure 5.4: Comparison of the well-log velocity (red), initial velocity(blue), and inverted velocities with (black) and without (green)the smoothness constraint.it may be difficult or costly to determine the weights β1 and β2. Multipletechniques for finding a “good” set of weights are available, including theL-curve, cross-validation, and the discrepancy principle, see Sen and Roy(2003) for an overview in a geophysical setting. A second issue with thepenalty approach is the effect of the penalty terms on the condition numberof the Hessian of the optimization problem, see Nocedal and Wright (2000).This means that certain choices of β1, β2, R1, and R2 may lead to anoptimization problem that is unfeasibly expensive to solve. A third issue isrelated to the two-norm squared, ‖ · ‖22 not being a so-called “exact” penaltyfunction Nocedal and Wright (2000). This means that constraints can beapproximated by the penalty functions, but generally not satisfied exactly.Note that the projection and quadratic penalty strategies for regularizationcan be easily combined asminimizemf(m) +β12‖R1m‖22 +β22‖R2m‖22, s.t. m ∈ C1⋂C2. (5.15)Multiple researchers, for example Brenders and Pratt (2007), use the conceptof filtering the gradient. A gradient filter can be represented as applying alinear operation to the gradient step asmk+1 = mk − γF˜∇mf(m), (5.16)where the operator F˜ filters the gradient. Brenders and Pratt (2007) applya low-pass filter to the gradient to prevent high-wavenumber updates to the905 10 15 20Frequency [Hz]- pi 0 pi PhaseTrue WaveletEstimated Wavelet(a) Phase5 10 15 20Frequency [Hz]4006008001000120014001600AmplitudeTrue WaveletEstimated Wavelet(b) AmplitudeFigure 5.5: Phase (a) and amplitude (b) comparisons between thefinal estimated source wavelet (◦) and the true source wavelet(∗)91model while inverting low-frequency seismic data. A similar effect is achievedby the proposed framework and using the set defined in equation 5.7. Theprojection-based approach has the advantage that it generalizes to multipleconstraints on the model in multiple domains, whereas gradient filteringdoes not. Restricting the model to be in a certain space is quite similar toreparametrization of the model.5.6 ConclusionsTo arrive at a flexible and versatile constrained algorithm for WRI-SE, wecombine Dykstra’s algorithm with a projected-Newton type algorithm ca-pable of merging several regularization strategies including Tikhonov regu-larization, gradient filtering, and reparameterization. While our approachcould in certain cases yield equivalent results yielded by standard typesof regularization, there are no guarantees for the latter. In addition, theparameter selections become easier because the parameters are directly re-lated to the properties of the model itself rather than to balancing data fitsand regularization. Application of WRI-SE to the blind-test Chevron 2014dataset show excellent and competitive results that benefit from imposingthe smoothing constraint. While there is still a lot of room for improvement,these results are encouraging and were obtained without data preprocessing,excessive handholding, and cosmetic “tricks”. The fact that we recover thewavelet accurately using an acoustic constant density only modeling kernelalso shows the relative robustness of the presented method.92Chapter 6Uncertainty quantificationfor inverse problems withweak PDE constraints andits application to seismicwave-equation-basedinversions6.1 IntroductionInverse problems with partial-differential-equation (PDE) constraints arisein many applications of geophysics (Tarantola and Valette, 1982a; Pratt,1999; Haber et al., 2000; Epanomeritakis et al., 2008; Virieux and Operto,2009). The goal of these problems is to infer the values of unknown spatialdistributions of physical parameters (e.g., sound speed, density, or electricalconductivity) from indirectly measured data, where the underlying physicalmodel is described by a PDE (e.g., the Helmholtz equation or Maxwell’sequations). The most challenging aspects of these problems arise from thefact that they are typically multimodal, with many spurious local minima(Biegler et al., 2012), which can inhibit gradient-based optimization algo-rithms from reaching the global optimal solution successfully.A version of this chapter has been submitted.93This multimodality stems in part from the fact that the observed dataare measured on a small subset of the entire boundary of the domain (Bui-Thanh et al., 2013) and the nonlinear parameter-to-data forward map (vanLeeuwen and Herrmann, 2013b, 2015). One approach to dealing with themultimodality is to formulate the inverse problem as a deterministic opti-mization problem that aims at minimizing the misfit between the predictedand observed data in an appropriate norm, while also adding a regularizationterm that may eliminate the nonconvexity in certain situations (Virieux andOperto, 2009; Martin et al., 2012). The result of this deterministic approachis an estimate of the model parameters that is consistent with the observeddata and contains few unwanted features. Since observed data typicallycontain measurement noise and modeling errors, we are not only interestedin an estimate that best fits the data, but also in a complete statistical de-scription of the unknown model parameters (Tarantola and Valette, 1982b;Osypov et al., 2013b). To that end, statistical approaches, and in particularthe Bayesian inference method, are desirable and necessary. Unlike in thedeterministic case, the solution produced by Bayesian inference is a poste-rior probability density function (PDF), which incorporates uncertainties inthe observed data, the forward modeling map, and one’s prior knowledge ofthe parameters. Once we can tractably compute the posterior distribution,we can extract various statistical properties of the unknown parameters.Bayesian inference methods have been applied to a number of PDE-constrained geophysical statistical inverse problems (Tarantola, 1987, 2005;Aster et al., 2011; Martin et al., 2012; Bui-Thanh et al., 2013; Zhu et al.,2016; Ely et al., 2017). In these reported works, the PDE is typically treatedas a strict constraint when formulating the posterior PDF, i.e., the fieldvariables should always exactly satisfy the PDE. This leads to the so-calledreduced or adjoint-state method (Plessix, 2006; Hinze et al., 2008) thateliminates the field variables by solving the PDE, resulting in a posteriorPDF with multiple modes. To study the posterior PDF, Markov chainMonte Carlo (McMC) type methods, including the Metropolis-Hasting basedmethods (Haario et al., 2006; Stuart et al., 2016; Ely et al., 2017), thestochastic Newton-type method (Martin et al., 2012), and the randomize-then-optimize (RTO) method (Bardsley et al., 2015) sample the posteriorPDF by drawing samples from a proposal distribution followed by an acceptor reject step. To compute the accept/reject ratio, these methods haveto evaluate the posterior PDF for each sample, which leads to solving alarge number of computationally expensive PDEs. Moreover, according tothe scaling analysis by Roberts et al. (2001), McMC type methods requirea significantly larger number of samples to reach a status of convergence94for large-scale problems in comparison with small-scale problems, which iswell known as the curse of dimensionality. These difficulties preclude thestraightforward applications of these methods to large-scale problems withmore than 106 unknown parameters.In this chapter, we present a new formulation of the posterior distri-bution that subsumes the conventional reduced formulation as a specialcase. Instead of treating the PDE as a strict constraint and eliminatingthe field variables by solving the PDE, we relax the PDE-constraint by in-troducing the field variables as auxiliary variables. This idea is similar tothe method of van Leeuwen and Herrmann (2015) applied to deterministicPDE-constrained optimization problems, in which the PDE misfit is treatedas a penalty term in the misfit function and weighted by a penalty param-eter. Moreover, the idea of relaxing the PDE-constraint is also widely-usedin weather forecasting applications (Fisher et al., 2005) for the sake of im-proving the stability of results. In the field of seismic exploration, Fanget al. (2015) and Fang et al. (2016) first introduced a method to construct aposterior PDF with weak acoustic wave-equation constraints. In the follow-ing study by Lim (2017), the authors showed that for small-scale problems,this posterior PDF can be sampled by the randomized maximum likelihood-McMC (RML-McMC) method. We demonstrate that the conventional re-duced posterior PDF is a special case of our new formulation. By exploitingthe structure of the new posterior PDF, we show that, with an appropriatepenalty parameter, the new posterior PDF can be approximated by a Gaus-sian PDF, which is centered at the maximum a posteriori (MAP) estimatethat maximizes the posterior PDF. To construct this Gaussian approxima-tion, we exploit the local derivative information of the posterior PDF andformulate the covariance matrix as a PDE-free operator, which allows us tocompute the matrix-vector product without the requirement of computinga large number of additional PDEs. By avoiding an explicit formulation ofthe covariance matrix, which would be impractical to compute and store, wecan apply a recently proposed bootstrap type method (Efron, 1981, 1992)—the so-called randomize-then-optimize method (Bardsley et al., 2014)—toaffordably draw samples from this surrogate distribution.We apply our new computational framework to several seismic wave-equation-based inverse problems ranging in size and complexity of the un-derlying parameters. Our first example compares our sampling method witha benchmark method—the randomize maximum likelihood (RML) method(Chen and Oliver, 2012)—to validate our Gaussian approximation on a sim-ple model with 1800 unknown parameters. Next, we apply our computa-tional framework to a more complex model with 92,455 unknown parameters95to test the feasibility of the approach to more realistically sized problems.This chapter is organized into three major sections. The first intro-duces the derivation of the posterior PDF and the corresponding samplingmethod in a general setting. The second section introduces each componentin the general framework when applied to the full-waveform inversion typeproblems. The final section presents the results of the application of ourframework to several numerical inverse problems for velocity models withdifferent size and complexity.6.2 Bayesian framework for inverse problemswith a weak PDE-constraintIn a PDE-constrained inverse problem, the goal is to infer the unknown dis-cretized ngrid-dimensional physical model parameters m ∈ Rngrid from ndata-dimensional noisy observed data d ∈ Cndata . As the noisy data are stochasticin nature, so are the inversion results obtained from them. Bayesian infer-ence is a widely-used approach that seeks to estimate the posterior PDFof the unknown parameters m by incorporating the statistics of the mea-surement and modeling error and one’s prior knowledge of the underlyingmodel. Mathematically, Bayesian inference applies Bayes’ law to formulatethe posterior PDF ρpost(m|d) of the model parameters m given the observeddata d by combining a likelihood PDF and a prior PDF asρpost(m|d) ∝ ρlike(d|m)ρprior(m), (6.1)where the likelihood PDF ρlike(d|m) describes the probability of observingthe data d given the model parameters m, and the prior PDF ρprior(m)describes one’s prior beliefs in the unknown model parameters. The propor-tionality constant depends on the observed data d, which are fixed. Oncewe have a computationally tractable approach to estimating the posteriorPDF, we can apply certain sampling methods to draw samples from theposterior PDF, which can then be used to compute statistical properties ofinterest such as the MAP estimatem∗ = arg maxmρpost(m|d), (6.2)the mean valueE[m|d] =∫mρpost(m|d)dm, (6.3)96the model covariance matrixΓ(mk,ml|d) =∫mkmlρpost(m|d)dm− E[mk|d]E[ml|d], (6.4)the model standard deviation (STD)STD(mk) =√Γ(mk,mk|d), (6.5)and the marginal distributions of mρM(mk|d) =∫· · ·∫ρpost(m|d)ngrid∏l=1,l 6=kdml, (6.6)where the values mk and ml denote the kth and lth elements of the vectorm (Kaipio and Somersalo, 2006; Matheron, 2012). If samples from theposterior PDF are available, then the quantities listed above can be easilycomputed through ensemble averages. The primary issue for statisticiansis to construct the posterior PDF and design methods that can efficientlydraw samples from it, which are used in the approximation of the integralsabove.6.2.1 The posterior PDFTo motivate the derivation of the new posterior PDF, it is helpful to startwith the conventional formulation of the posterior PDF for PDE-constrainedinverse problems. The so-called reduced approach eliminates the PDE-constraint by solving the PDE, which leads to the following nonlinear for-ward modeling map F (m):F (m) = PA(m)−1q. (6.7)Here the vector q ∈ Cngrid represents the discretized (known) source term.The matrix A(m) ∈ Cngrid×ngrid denotes the discretized PDE operator andthe operator P ∈ Rndata×ngrid samples the data d from the vector of fieldvariables u, which is the solution of the PDE u = A(m)−1q. In real-worldseismic applications, the observed data always contain correlated measure-ment noise arising from environmental disturbances and modeling errors,which are difficult to precisely quantify and model. One popular approachto simplifying the problem is to assume that the combined measurementand modeling noise ∈ Cndata is additive and is drawn from a Gaussian97distribution with zero mean and covariance matrix Γnoise (Bui-Thanh et al.,2013; Osypov et al., 2013b; Bardsley et al., 2015), i.e., ∼ N (0,Γnoise), inde-pendent of m. The assumption of Gaussianity results in distributions thatare relatively easy to model and sample, thereby providing a rich source oftractable examples (Kaipio and Somersalo, 2006). With this assumption inmind and the additional assumption that the prior distribution of the modelm is also Gaussian with the mean model parameters m˜ and the covariancematrix Γprior, we arrive at the following posterior distribution:ρpost(m|d) ∝ exp(−12‖F (m)− d‖2Γ−1noise− 12‖m− m˜‖2Γ−1prior), (6.8)where the symbol ‖ ‖Γ−1noise denotes the weighted `2-norm with the weightingmatrix Γ−1noise. There are several challenges in computing various quantitiesassociated with the posterior distribution in equation 6.8. In order to ob-tain the MAP estimate m∗, we need to solve the following deterministicoptimization problem:m∗ = arg maxmρpost(m|d) = arg minm− log ρpost(m|d)= arg minm12‖F (m)− d‖2Γ−1noise+12‖m− m˜‖2Γ−1prior,(6.9)which can be solved by the so-called adjoint-state method. As noted invan Leeuwen and Herrmann (2013b), the nonlinear forward modeling mapF (m) results in the objective function − log ρpost(m|d) being highly oscil-latory with respect to the model parameters m, which yields many localminima. To find the globally optimal solution, a sufficiently close initialmodel is necessary, which may be difficult to obtain in real-world scenar-ios. As mentioned previously, the nonlinear parameter-to-data map alsoresults in computational difficulties when sampling the posterior distribu-tion in equation 6.8. Specifically, the tradeoff between designing a proposaldistribution that is well tuned to the true posterior distribution and onethat is computationally cheap to sample is not a straightforward choice tomake. As one models a proposal that is easier to sample, typically the priceto pay is having to draw more samples until convergence is reached.These challenges result from the nonlinear forward modeling map F (m)induced by the strict PDE-constraint in the optimization problem in equa-tion 6.9. To overcome these difficulties, van Leeuwen and Herrmann (2013b)and van Leeuwen and Herrmann (2015) proposed a penalty formulation tosolve deterministic PDE-constrained optimization problems, wherein they98relax the strict PDE-constraint by penalizing the data misfit function by aweighted PDE misfit with a penalty parameter λ. This results in the follow-ing joint optimization problem with respect to both the model parametersm and the field variables collected in the vector u:arg minm,ufpen(m,u) =12‖Pu−d‖2Γ−1noise+λ22‖A(m)u−q‖2 + 12‖m−m˜‖2Γ−1prior.(6.10)The authors note that the problem in equation 6.10 is a separable nonlinearleast-squares problem, in which the optimization with respect to u is alinear data-fitting problem when m is fixed. In van Leeuwen and Herrmann(2015), the authors eliminate the field variables u by the variable projectionmethod (Golub and Pereyra, 2003) in order to avoid the high memory costsinvolved in storing a unique field variable for each individual source. Thevariable projection method also eliminates the dependence of the objectivefunction in equation 6.10 on u. As for each input parameter m, there isa unique u(m) satisfying ∇ufpen(m,u)|u=u(m) = 0, which has the closedform solutionu(m) =(λ2A(m)>A(m)+P>Γ−1noiseP)−1(λ2A(m)>q+P>Γ−1noised), (6.11)where the symbol > denotes the (complex-conjugate) transpose. As notedby van Leeuwen and Herrmann (2013b) and van Leeuwen and Herrmann(2015), minimizing the objective function (equation 6.10) with a carefullyselected λ is less prone to being trapped in suboptimal local minima, becausethe inversion is carried out over a larger search space (implicitly throughu(m)) and it is therefore easier to fit the observed data for poor startingmodels compared to the conventional reduced formulation in equation 6.9.Motivated by the penalty approach to solving the deterministic inverseproblems, we propose a more generic posterior PDF for statistical PDE-constrained inverse problems. As before, we relax the PDE-constraint byintroducing the field variables u as auxiliary variables, i.e., we haveρpost(u,m|d) ∝ ρ(d|u,m)ρ(u,m), (6.12)where the conditional PDF ρ(d|u,m) now describes the probability of ob-serving the data d given the field variables u and model parameters m. Toformulate the joint PDF ρ(u,m), we apply the standard conditional decom-position (Sambridge et al., 2006)ρ(u,m) = ρ(u|m)ρprior(m). (6.13)99Hence,ρpost(u,m|d) ∝ ρ(d|u,m)ρ(u|m)ρprior(m). (6.14)The implication of the reduced formulation is that the field variables u sat-isfy the PDE strictly—i.e., u = A(m)−1q. This adherence to the PDEcorresponds to solutions u that satisfy the PDE A(m)u = q with the prob-ability density ρ(u|m) = δ(A(m)u− q), where δ() denotes the Dirac deltafunction (Dummit and Foote, 2004). Conversely, replacing the constraintby a quadratic penalty term allows for a Gaussian error with zero mean andcovariance matrix λ−2I in the PDE misfit A(m)u− q. This yieldsρ(u|m) = (2pi)−ngrid2 det(λ2A(m)>A(m)) 12 exp(−λ22‖A(m)u− q‖2).(6.15)Indeed, for a given m, this distribution ρ(u|m) with respect to u is aGaussian distribution with a mean of A(m)−1q and a covariance matrixof λ−2A(m)−1A(m)−>. The conditional probability ρpost(u,m|d) is thejoint PDF with respect to both the model parameters m and field variablesu. Because the control or model variables m are of primary interest, weeliminate the dependence of the joint PDF on the auxiliary variables u bymarginalizing over u:ρpost(m|d) =∫ρpost(u,m|d)du∝∫ρ(d|u,m)ρ(u|m)ρprior(m)du=∫ρ(d|u,m)ρ(u|m)du× ρprior(m)∝ det(H(m))− 12 det (λ2A(m)>A(m)) 12× exp(− 12(λ2‖A(m)u(m)− q‖2 + ‖Pu(m)− d‖2Γ−1noise+ ‖m− m˜‖2Γ−1prior)),(6.16)whereH(m) = −∇2u log ρpost(u,m|d)∣∣u=u(m)= λ2A(m)>A(m) + P>Γ−1noiseP,(6.17)and u(m) is given by equation 6.11. As in the deterministic case, the poste-rior PDF corresponding to the conventional reduced formulation (cf. equa-tion 6.8) can also be derived from the marginal PDF ρpost(m|d) in equa-100tion 6.16. Indeed, as λ→∞, we havelimλ→∞det(H(m))− 12 det(λ2A(m)>A(m)) 12= limλ→∞det(I +1λ2A(m)−>P>Γ−1noisePA(m)−1)− 12= limλ→∞det(I +1λ2Γ− 12noisePA(m)−1A(m)−>P>Γ−>2noise)− 12= 1,(6.18)andlimλ→∞u(m) = A(m)−1q. (6.19)This yieldslimλ→∞ρpost(m|d) ∝ exp(−12‖PA(m)−1q− d‖2Γ−1noise− 12‖m− m˜‖2Γ−1prior),(6.20)which, as expected, is the posterior PDF corresponding to the conventionalreduced formulation.To illustrate the advantages of our penalty formulation for the posteriorPDF (cf. equation 6.16) over the conventional reduced formulation (cf. equa-tion 6.8), we conduct a simple experiment adopted from Esser et al. (2018).We invert for the sound speed given partial measurements of the pressurefield and generate data with a known band-limited source. The data arerecorded at four receivers (see Figure 6.1a) and is contaminated with Gaus-sian noise (see Figure 6.1b). As in Esser et al. (2018), we neglect the ampli-tude effects related to geometrical spreading and define the forward operatorF (m) asF (m) = PA−1(m)q =F−1eiωm‖x1−xs‖2FqF−1eiωm‖x2−xs‖2FqF−1eiωm‖x3−xs‖2FqF−1eiωm‖x4−xs‖2Fq , (6.21)where the operator F denotes the temporal Fourier transform and ω de-notes the angular frequency. The vectors xi, i = 1, ..., 4 and xs denote thereceiver and source locations, respectively, and the scalar m represents theslowness of the medium, i.e., m =1vwhere v is the velocity. The sourceand receiver locations are in Figure 6.1a denoted by the symbols (∗) and101(∇). With the forward model defined in equation 6.21, we formulate theposterior distribution for the reduced formulation and the penalty formula-tion by choosing a Gaussian distribution with a mean of 5 · 10−4 s/m anda standard deviation of 1.2 · 10−4 s/m for the prior distribution ρprior(m).Finally, we add a Gaussian noise with a variance of 4×10−4 to the observeddata, resulting in a noise-to-signal ratio of ‖noise‖2/‖signal‖2 = 0.24.Figure 6.2 depicts the posterior PDFs of the reduced and penalty for-mulations for λ = 10, 50, and 250. As expected, local maxima are presentin the PDF of the reduced formulation and in the PDFs of the penalty for-mulation when the λ values become too large. As λ increases from λ = 10,where the posterior is unimodal, local maxima appear for larger λ = 250,only one of which has strong statistical significance. For λ = 250, the re-sulting PDF is close to the one yielded by the reduced formulation, whichcorresponds to λ → ∞. From this stylized example, the strongly relaxedformulation appears unimodal and with low bias. As we will demonstrate inlater sections, being less prone to local maxima reduces the computationalcost associated with sampling these distributions.6.2.2 Selection of λBefore discussing computationally efficient sampling schemes, we first pro-pose a method to select values for λ, which will balance the tradeoff betweenthe unimodality of the distribution and its deviation from the reduced-formulation PDF.To arrive at a scheme to select λ, we focus on two terms of the negativelogarithm function of the posterior PDF in equation 6.16:φ(m) =− log ρpost(m|d)=12log det(I +1λ2Γ− 12noisePA(m)−1A(m)−>P>Γ−>2noise)+12(λ2‖A(m)u(m)− q‖2 + ‖Pu(m)− d‖2Γ−1noise+ ‖m− m˜‖2Γ−1prior),(6.22)namely the determinantφ1(m) =12log det(I +1λ2Γ− 12noisePA(m)−1A(m)−>P>Γ−>2noise), (6.23)102(a) Snap shot0 0.5 1 1.5 2t [s]d(t)(b) DataFigure 6.1: (a) Snapshot of the time-domain wavefield generated by asingle source (∗); (b) recorded time-domain data at four receiverlocations (∇).1032 4 6 8m [s/m] ×10 -40500100015002000250030003500400045005000ρ(c)(a) λ = 102 4 6 8m [s/m] ×10 -40100020003000400050006000ρ(c)(b) λ = 502 4 6 8m [s/m] ×10 -4010002000300040005000600070008000ρ(c)(c) λ = 2502 4 6 8m [s/m] ×10 -40100020003000400050006000700080009000ρ(c)(d) ReducedFigure 6.2: The four posterior PDFs corresponding to the penaltyformulations with λ = 10 (a), 50 (b), and 250 (c), and thereduced formulation (d).and the misfitφ2(m) =12(λ2‖A(m)u(m)− q‖2 + ‖Pu(m)− d‖2Γ−1noise+ ‖m− m˜‖2Γ−1prior).(6.24)These equations imply that we should avoid situations in which λ→ 0 andλ→∞. When λ→ 0, the optimal variable u(m) tends to fit the data. As aresult, φ2(m)→ 12‖m−m˜‖2Γ−1prior , which means that the observed data are notinformative to the unknown parameters and have few contributions to theposterior distribution. Furthermore, as λ → 0, the nonlinear determinant104function φ1(m) → ∞ and will dominate the overall function φ(m), whichresults in a highly nonlinear mapping m → φ(m). On the other hand,when λ → ∞, we find that φ1(m) → 0 and φ2(m) → 12‖PA(m)−1q −d‖2Γ−1noise+ 12‖m − m˜‖2Γ−1prior in which case the misfit φ(m) converges to thenonlinear reduced formulation. Considering both facts, we want to find anappropriate λ, so that φ1(m) is relatively small compared to φ2(m), thusensuring enough information from the observed data, while φ2(m) is stillless likely to contain local minima.Based on spectral arguments, van Leeuwen and Herrmann (2015) pro-posed a scaling for the penalty parameter according to the largest eigenvalueµ1 of the matrix A(m)−>P>Γ−1noisePA(m)−1. Relative to µ1, a penalty pa-rameter λ2 µ1 can be considered large while λ2 µ1 is considered small.As a result, λ chosen much greater than this reference scale—i.e., whenλ2 µ1, the minimizers for field variables u(m) will converge to the solu-tion of the wave equation A(m)−1q with a convergence rate of O(λ−2), andtherefore our penalty misfit approaches the reduced misfit. A similar consid-eration applies when λ2 µ1. After extensive parameter testing, we foundthat choosing λ2 = 0.01µ1 strikes the right balance so that the posteriorPDF is less affected by local maxima compared to the reduced formulationwhile the data is still informative to the model parameters and the determi-nant term φ1(m) remains negligible compared to φ2(m). With this choice ofλ, we can therefore neglect the φ1(m) term, as it is small relative to φ2(m),and consider an approximate posterior PDF ρpost(m|d) consisting only ofthe φ2(m) term, i.e., we now have the approximate equalityρpost(m|d) ≈ ρpost(m|d)∝ exp(− λ22‖A(m)u(m)− q‖2 − 12‖Pu(m)− d‖2Γ−1noise− 12‖m− m˜‖2Γ−1prior).(6.25)This approximation results in a posterior PDF that is much easier to evalu-ate. From here on out, we consider ρpost(m|d) as our PDF of interest in thesubsequent sampling considerations. In practice, we may face situations thatthe inversion involves data corresponding to different frequencies. In suchsituations, both the noise levels and the PDEs can differ from frequency tofrequency. As a result, the proposed selection criterion implies us to selectdifferent λ’s for different frequencies.1056.2.3 Sampling methodGiven this choice of λ, yielding the PDF in equation 6.25, the computa-tional considerations of drawing samples from this approximate distribu-tion are paramount in designing a tractable method. McMC-type methodsare unfortunately computationally unfeasible for high-dimensional problems,owing to the relatively large number of the expensive evaluations of poste-rior distributions needed to converge adequately. For this reason, we followan alternative approach—widely used in Bayesian inverse problems withPDE-constraints (Bui-Thanh et al., 2013; Zhu et al., 2016)—where we ap-proximate the target posterior PDF by a Gaussian PDF. To construct thisGaussian PDF, we first find the MAP estimate m∗ by solvingm∗ = arg minm− log (ρpost (m|d) ). (6.26)Next, we use the local second-order derivative information of the posteriorPDF at the MAP point—i.e., the Hessian Hpost = −∇2m log ρpost(m|d)—to construct the covariance matrix of the Gaussian PDF, which yields theGaussian distribution N (m∗,H−1post). Afterwards, we draw samples from theGaussian distribution from which we compute statistical quantities. We in-corporate this sampling strategy with the proposed posterior PDF with weakPDE-constraints and obtain the following Bayesian framework as shown inAlgorithm 7:Algorithm 7 Bayesian framework for inverse problems with weak PDE-constraints1. Set Γnoise, Γprior, prior mean model m˜, and a value for the penaltyparameter λ;2. Formulate the posterior PDF ρpost(m|d) with equation 6.25;3. Find the MAP estimate m∗ by minimizing − log ρpost(m|d);4. Compute the Hessian matrix Hpost = −∇2mlog(ρpost(m|d))at the MAPestimate m∗ and define the Gaussian PDF N (m∗,H−1post);5. Draw nsmp samples from the Gaussian PDF N (m∗,H−1post);6. Compute statistical properties of interest from the nsmp samples.Compared to McMC type methods, the additional evaluations of theposterior PDF are not needed once we calculate the MAP estimate m∗,which significantly reduces the computational cost. However, the accuracyof samples drawn from this surrogate PDF strongly depends on the accuracyof the Gaussian approximation in a neighborhood of m∗, which is related1062 4 6 8m [s/m] ×10 -401000200030004000500060007000ρ(c)True PDFGaussian approximation(a) λ = 102 4 6 8m [s/m] ×10 -4020004000600080001000012000140001600018000ρ(c)True PDFGaussian approximation(b) λ = 502 4 6 8m [s/m] ×10 -400.511.522.53ρ(c)×10 4True PDFGaussian approximation(c) λ = 2502 4 6 8m [s/m] ×10 -400.511.522.533.5ρ(c)×10 4True PDFGaussian approximation(d) ReducedFigure 6.3: The Gaussian approximations of the four posterior PDFsassociated with the model in equation 6.21.to our choice of λ. To illustrate this dependence, we continue the exampleshown in Figure 6.2 and compare the Gaussian approximation of the reducedformulation (i.e., λ = ∞) to the penalty formulation for different values ofλ, plotted in Figure 6.3. In this case the largest eigenvalue is µ1 = 104 andthe corresponding is λ = 10. Clearly, when selecting λ = 10, the Gaussianapproximation is relatively close to the true PDF, whereas increasing λdecreases the accuracy of the Gaussian approximation.Armed with an accurate Gaussian approximation to the unimodal PDF(for an appropriate choice of λ), we are now in a position to draw sam-ples from the Gaussian distribution N (m∗,H−1post). For small-scale prob-107lems, an explicit expression of the Hessian Hpost is available. Hence, wecan draw samples ms from the distribution N (m∗,H−1post) by utilizing theCholesky factorization of the Hessian Hpost—i.e., Hpost = R>postRpost—asfollows (Rue, 2001):ms = m∗ + R−1postr, (6.27)where the matrix Rpost is an upper triangular matrix and the vector r is arandom vector drawn from the ngrid-dimensional standard Gaussian distri-bution N (0, Ingrid×ngrid). Nevertheless, for large-scale problems, construct-ing and storing an explicit expression of the Hessian Hpost is infeasible.Typically, to avoid the construction of the explicit Hessian matrix, the Hes-sian Hpost is constructed as a matrix-free implicit operator, and we only haveaccess to computing the matrix-vector product with an arbitrary vector. Asa result, we need a matrix-free sampling method to draw samples. In thefollowing section, we will develop the implementation details for a suitablesampling method for seismic wave-equation-constrained inverse problems.6.3 Uncertainty quantification for seismicwave-equation constrained inverse problemsWave-equation-based inversions, where the coefficients of the wave equationare the unknowns, are amongst the most computationally challenging inverseproblems as they typically require wave-equation solves for multiple sourceexperiments on a large domain where the wave travels many wavelengths.Additionally, these inverse problems, also known as full-waveform inversion(FWI) in the seismic community (Pratt, 1999), involve fitting oscillatorypredicted and observed data, which can result in parasitic local minima.Motivated by the penalty formulation with its weak PDE-constraintsand the results presented above presumably, we will derive a time-harmonicBayesian inversion framework that is capable of handling relatively large-scale problems where the wave propagates about 30 wavelengths, a moderatenumber for realistic problems. Given acoustic seismic data, collected at thesurface from sources that fire along the same surface, our aim is to con-struct the statistical properties of the spatial distribution of the acousticwave velocity. With these properties, we are able to conduct uncertaintyquantification for the recovered velocity model. The subsections are orga-nized roughly in according to the main steps outlined in Algorithm 7.1086.3.1 Steps 1 and 2: designing the posterior and prior PDFTo arrive at a Bayesian formulation for wave-equation-based inverse prob-lems with weak constraints, we consider monochromatic seismic data col-lected at nrcv receivers locations from nsrc seismic source locations andsampled at nfreq frequencies. Hence, the observed data d ∈ Cndata withndata = nrcv × nsrc × nfreq. As before, the synthetic data are obtained byapplying, for each source, the projection operator P ∈ Cnrcv×ngrid . For theith source and jth frequency, the time-harmonic wave equation correspondsto the following discretized Helmholtz system:Aj(m)ui,j = qi,j with Aj(m) = ∆ + ω2j diag(m−2). (6.28)In this expression, the qi,j ’s are the monochromatic sources, the symbol∆ refers to the discretized Laplacian, ω represents the angular frequency,and m ∈ Rngrid denotes the vector with the discretized velocities. With aslight abuse of notation, this vector appears as the elementwise reciprocalsquare on the diagonal. To discretize this problem, we use the Helmholtzdiscretization from Chen et al. (2013).If we consider the data from all sources and frequencies simultaneously,the posterior PDF for the weak-constrained penalty formulation becomesρpost(m|d)∝ expnsrc∑i=1nfreq∑j=1−12‖Pui,j(m)− di,j‖2Γ−1noise −λ22‖Aj(m)ui,j(m)− qi,j‖2× exp(−12‖m− m˜‖2Γ−1prior).(6.29)Aside from choosing a proper value for the penalty parameter λ, anothercrucial component of the posterior PDF in equation 6.29 is the choice of theprior PDF. From a computational perspective, a suitable prior should havea bounded variance and one should be able to draw samples with moder-ate cost (Bui-Thanh et al., 2013). More specifically, this results in havingcomputationally feasible access to (matrix-free) actions of the square-rootof the prior covariance operator or its inverse on random vectors. To meetthis requirement, we utilize Gaussian smoothness priors (Matheron, 2012),which provide a flexible way to describe random fields and are commonlyemployed in Bayesian inference (Lieberman et al., 2010; Martin et al., 2012;Bardsley et al., 2015). Following Lieberman et al. (2010), we construct a109Gaussian smoothness prior ρprior(m) ∝ N (m˜,Γprior) with a reference meanmodel m˜ and a covariance matrix Γprior given byΓprior(k, l) = a exp(−‖sk − sl‖22b2)+ cδk,l. (6.30)In this expression for the covariance, the vectors sk = (zk, xk) and sl =(zl, xl) denote the kth and lth spatial coordinates corresponding to the kthand lth elements in the vector m, respectively. The parameters a, b, and ccontrol the correlation strength, variance, and spatial correlation distance.The variance of the kth element mk is var(mk) = Γprior(k, k) = a+c. The pa-rameter c also ensures that the prior covariance matrix remains numericallywell-conditioned (Martin et al., 2012). Clearly, when the distance betweensk and sl is large—i.e.,‖sk−sl‖22b2 1, the cross-covariance Γprior(k, l) vanishesquickly.The covariance matrix Γprior is dense. Therefore, for large-scale prob-lems, it is intractable to construct and store it. However, since Γprior(k, l)tends to zero quickly when |k − l| is sufficiently large, Γprior has a nearlysparse approximation. To construct this sparse approximation, we follow(Bardsley et al., 2015) and define the correlation length γ as the distancewhere the cross-covariance Γprior(k, l) drops to 1% of the var(mk). Theparameter b can be expressed in terms of the correlation length γ by theexpressionb =γ√2 log(100)− 2 log(1 + c/a) . (6.31)With the correlation length γ, we forceΓprior(k, l) = a exp(−‖sk − sl‖22b2)+ cδk,l, if ‖sk − sl‖ ≤ γΓprior(k, l) = 0, if ‖sk − sl‖ > γ,(6.32)and obtain a sparse Γprior. Compared to (6.30), the construction of Γpriorby (6.32) is much cheaper and requires less storage. Consequently, our prioris computationally feasible for large-scale problems.6.3.2 Steps 3 and 4: Gaussian approximationAfter setting up the posterior PDF, we need to construct its Gaussian ap-proximation N (m∗,H−1post), corresponding to steps 3 and 4 in Algorithm 7.In order to achieve this objective, first we compute the MAP estimate of the110posterior PDF m∗, which is equivalent to minimizing the negative logarithm− log ρpost(m|d):m∗ = arg minm− log ρpost(m|d)= arg minmnsrc∑i=1nfreq∑j=1(12‖Pui,j(m)− di,j‖2Γ−1noise +λ22‖Aj(m)ui,j(m)− qi,j‖2)+12‖m− m˜‖2Γ−1prior.(6.33)Note that the objective function − log ρpost(m|d) is analogous to the costfunction of the deterministic optimization problem (van Leeuwen and Her-rmann, 2015). Using similar techniques as in the aforementioned work, wecan express the gradient g for this objective asg =nsrc∑i=1nfreq∑j=1λ2G>i,j(Aj(m)ui,j(m)− qi,j)+ Γ−1prior(m− m˜), (6.34)where the sparse Jacobian matrix Gi,j =(∇mAj(m))ui,j(m). Followingvan Leeuwen and Herrmann (2013a), we use the limited-memory-Broyden–Fletcher–Goldfarb–Shanno method (l-BFGS, Nocedal and Wright, 2006) tosolve the optimization problem in equation 6.33 to find the MAP estimatem∗.Once we have computed m∗, we focus on approximating the posteriorPDF ρpost(m|d) by a Gaussian distribution centered at m∗. For simplicity,we omit the dependence of Aj(m) and ui,j(m) on m. A necessary compo-nent in this process is computing the Hessian Hpost = −∇2m log ρpost(m|d),which is given byHpost = Hlike + Γ−1prior=nsrc∑infreq∑jλ2G>i,jGi,j − S>i,j(P>Γ−1noiseP + λ2A>j Aj)−1Si,j + Γ−1prior,(6.35)whereSi,j = λ2(∇mA>j ) (Ajui,j − qi,j) + λ2A>j Gi,j , (6.36)and the optimal wavefield ui,j is computed by equation 6.11.The full Hessian Hpost is a dense ngrid×ngrid matrix, which is prohibitiveto construct explicitly when ngrid is large, even in the two-dimensional set-111ting. On the other hand, it is also prohibitive if we formulate the HessianHpost as a traditional implicit operator, which requires 2×nsrc×nfreq PDEsolves to compute each matrix-vector product Hpostm with any vector m,according to the expression in equation 6.35. Since the posterior covariancematrix is the inverse of the Hessian, we need to invert the square root of theHessian operator in order to generate random samples. With the implicitHessian operator, we would need to employ an iterative solver such as LSQRor CG (Golub and Van Loan, 2012), and the total number of PDE solvesrequired is therefore proportional to 2×nsmp×niter×nsrc×nfreq. As a result,this type of approach requires an extremely large computational cost whendrawing sufficiently many samples. As a remedy, we exploit the structure ofthe Hessian matrix Hpost to find an approximation that can be constructedand applied in a computationally efficient manner.To exploit the structure of the Hessian matrix Hpost, we will focus onthe Hessian matrix Hlike of the likelihood term, as we already have dis-cussed the matrix Γprior in the previous section. Based on equations 6.35and 6.36, Hlike consists of three components—the matrices P>Γ−1noiseP +λ2A>j Aj , (∇mA>j )(Ajui,j − qi,j), and Gi,j . We first consider the Jacobian(∇mA>j )(Ajui,j − qi,j) and specifically the PDE misfit Ajui,j−qi,j . Whenthe PDE misfit is approximately zero, this overall term is also expected to besmall. As the MAP estimate m∗ simultaneously minimizes the data misfit,PDE misfit, and model penalty term, the PDE misfit is expected to be smallwhen model and observational errors are not too large. Thus, we obtain agood approximation H˜like of the full Hessian Hlike, which corresponds to theGauss-Newton Hessian derived by van Leeuwen and Herrmann (2015):H˜like =nsrc∑infreq∑jλ2G>i,jGi,j − λ4G>i,jAj(P>Γ−1noiseP + λ2A>j Aj)−1A>j Gi,j .(6.37)Consequently, we can use the Gauss-Newton Hessian H˜like to construct theGaussian distribution that approximates the posterior PDF ρpost(m|d):ρpost(m|d) ≈ ρGauss(m) = N (m∗, H˜−1post) = N (m∗, (H˜like + Γ−1prior)−1).(6.38)The Gauss-Newton Hessian H˜like has a compact expression derived from theSherman–Morrison–Woodbury formula (Golub and Van Loan, 2012):H˜like =nsrc∑infreq∑jG>i,jA−>j P>(Γnoise+1λ2PA−1j A−>j P>)−1PA−1j Gi,j . (6.39)112We shall see that this expression provides a factored formulation to implic-itly construct the Gauss-Newton Hessian, which does not require any addi-tional PDE solves to compute matrix-vector products. In order to constructthe implicit Gauss-Newton Hessian H˜like, three matrices are necessary—A−>j P> ∈ Cngrid×nrcv , Gi,j ∈ Cngrid×ngrid , and Γnoise + 1λ2PA−1j A−>j P> ∈Cnrcv×nrcv . For each frequency, constructing the matrix A−>j P> requiresnrcv PDE solves. As described in the previous section, the matrix Gi,jis sparse and driven by the corresponding wavefields ui,j , whose compu-tational cost approximately equals to one PDE solve for each source andeach frequency. The computational complexity of inverting the matrixΓnoise +1λ2PA−1j A−>j P> is O(n3rcv). Since nrcv ngrid, inverting this ma-trix is much cheaper than solving a PDE. Thus, to construct the Gauss-Newton Hessian H˜like, we only need to solve nfreq × (nsrc + nrcv) PDEs.With the computed matrix A−>j P> and wavefield ui,j , the action of theGauss-Newton Hessian H˜like on any vector m can be performed with sev-eral efficient matrix-vector multiplications related to the matrices A−>j P>,Gi,j , and Γnoise+1λ2PA−1j A−>j P>. Compared to the conventional approach,this new factored formulation of the implicit operator does not require ad-ditional PDE solves to compute a matrix-vector multiplication once it isconstructed. In general, we have that nrcv nsmp×niter and using this im-plicit Gauss-Newton Hessian operator to draw nsmp samples is significantlycheaper than the conventional approach. Another advantage of using thisoperator arises from the fact that the computations of the necessary matri-ces corresponding to different frequencies are independent from each other.As a result, we can compute and store these matrices in parallel for differentfrequencies, allowing us to speed up our computations in a distributed com-puting environment. Furthermore, the expression in equation 6.39 providesa natural decomposition of the Gauss-Newton Hessian H˜like as follows:H˜like = R>likeRlike,Rlike =(Γnoise +1λ2PA−11 A−>1 P>)−12PA−11 G1,1· · ·(Γnoise +1λ2PA−1j A−>j P>)−12PA−1j Gi,j· · ·(Γnoise +1λ2PA−1nfreqA−>nfreqP>)−12PA−1nfreqGnsrc,nfreq.(6.40)Similarly to the factored formulation of the implicit Gauss-Newton Hes-sian, we can construct the factor Rlike as an implicit operator once we have113computed the matrix A−>j P> and wavefield ui,j . We will use this implicitoperator Rlike for the sampling method introduced in the next subsection.6.3.3 Steps 5 and 6: sampling the Gaussian PDFsThe covariance matrix H˜−1post is a dense ngrid × ngrid matrix, and the con-struction of its Cholesky factorization involves O(n3grid) operations. Both ofthese facts prohibit us from applying the Cholesky factorization method tosample the Gaussian distribution ρGauss(m) for large-scale problems. Herewe propose to use the so-called optimization-driven Gaussian simulators(Papandreou and Yuille, 2010; Orieux et al., 2012) or the randomize-then-optimize (RTO) method (Solonen et al., 2014) to sample the Gaussian distri-bution ρGauss(m). This method belongs to the classical bootstrap method(Efron, 1981, 1992; Kitanidis, 1995), and it does not require the explicitformulation of the Hessian matrix as well as the expensive Cholesky factor-ization. To outline this method, we first use equation 6.38 to divide H˜postinto H˜like and Γ−1prior. The Gauss-Newton Hessian H˜like has the factorizationin equation 6.40, and we can also compute the Cholesky factorization ofthe prior covariance matrix Γprior = R>priorRprior with an upper-triangularmatrix Rprior. Substituting these two factorizations into equation 6.38, wecan rewrite the Gaussian distribution ρGauss(m) as follows:ρGauss(m) ∝ exp(−12‖Rlikem−Rlikem∗‖2 − 12‖R−>priorm−R−>priorm∗‖2).(6.41)Papandreou and Yuille (2010) and Solonen et al. (2014) noted that indepen-dent realizations from the distribution in equation 6.41 can be computed byrepeatedly solving the following linear data-fitting optimization problem:ms = arg minm‖Rlikem−Rlikem∗ − rlike‖2 +∥∥∥R−>priorm−R−>priorm∗ − rprior∥∥∥2 ,rlike ∼ N (0, Indata×ndata),rprior ∼ N (0, Ingrid×ngrid).(6.42)This optimization problem can be solved by iterative solvers such as LSQRand PCG, which do not require the explicit expression for the matrix Rlikebut merely an operator that can compute the matrix-vector product. As aresult, we can use our implicit formulation of Rlike in equation 6.40 to solvethe optimization problem in equation 6.42 and draw samples from the Gaus-sian distribution ρGauss(m). The pseudo code of the RTO method to draw114samples from the Gaussian distribution ρGauss(m) is shown in Algorithm 8,which realizes step 5 in Algorithm 7. Because the overall sampling strategyconsists of the Gaussian approximation and the RTO method, we will referto the proposed method as GARTO in the rest of the chapter.Algorithm 8 Sample ρGauss(m) by the RTO method1. Start with the MAP estimate m∗, covariance matrices Γnoise and Γprior;2. Formulate the operator Rlike(m∗) by equation 6.40, and compute theCholesky factorization of Γprior = R>priorRprior;3. for s = 1:nsmp4. Generate rprior ∼ N (0, Ingrid×ngrid) and rlike ∼ N (0, Indata×ndata);5. Solve ms = arg minm ‖Rlikem−Rlikem∗ − rlike‖2+∥∥∥R−>priorm−R−>priorm∗ − rprior∥∥∥2;6. end6.3.4 A benchmark method: the randomized maximumlikelihood methodWe have proposed a computationally efficient algorithm—GARTO—thatcan approximately sample the target distribution in equation 6.29 withoutadditional PDE solves once the MAP estimate and the Gauss-Newton Hes-sian operator are computed. However, due to the loss of accuracy caused bythe Gaussian approximation, it is important to investigate the accuracy ofthe GARTO method by comparing it to a benchmark method that can sam-ple the target distribution in equation 6.29 regardless of the computationalcost. The randomized maximum likelihood (RML) (Chen and Oliver, 2012)method is a viable candidate as a benchmark because it has the capabil-ity to draw samples that are good approximations of those from the targetdistribution for weakly nonlinear problems with Gaussian prior distributionand additive Gaussian distributed errors (Chen and Oliver, 2012; Bardsleyet al., 2014, 2015). Indeed, as previously discussed, the target distributionwith weak PDE-constraints that the GARTO method aims to sample is lessprone to the nonlinearity with a carefully selected λ.To draw a sample, the RML method performs a bootstraping-like method(Efron, 1981, 1992) that first samples the data and prior model and thencomputes the resulting MAP. More precisely, in order to draw a samplefrom the target distribution in equation 6.29, the RML method solves the115following nonlinear optimization problem:ms = arg minmnsrc∑i=1nfreq∑j=1(12‖Γ−12noisePui,j(m)− Γ− 12noisedi,j − r(1)i,j ‖2+12‖λAj(m)ui,j(m)− λqi,j − r(2)i,j ‖2)+12‖Γ−12priorm− Γ− 12priorm˜− r(3)‖2,(6.43)where the vectors r(1)i,j , r(2)i,j , and r(3) are random realizations from the stan-dard norm distribution N (0, I). We refer interested readers to Chen andOliver (2012) for more details about the RML method. As a result of this ap-proach, the computational cost of drawing one sample by the RML method isapproximately equivalent to solving one FWI problem, which is significantlymore expensive than the GARTO method. Therefore, we are only able toconduct an comparison with the RML method on a small-scale problem inthe following section.6.4 Numerical experiments6.4.1 Influence of the penalty parameter λThe feasibility of the proposed Bayesian framework relies on the accuracyof the approximations in equations 6.25 and 6.38, both of which depend onthe selection of λ. To get a better understanding of the influence of thisparameter on these approximations, we will work with a relatively simple2D velocity model parameterized by a single varying parameter v0—i.e., thevelocity is given by v(z) = v0 + 0.75zm/s and z increases with verticaldepth. We simulate data with a single source and single frequency for v0 =2000 m/s using a grid spacing of 25 m. The frequency of the data is 5 Hz,which is suitable for avoiding numerical dispersion on this particular gridspacing. We place our source at (z, x) coordinates (50 m, 50 m) and recordthe data at 200 receivers located at the depth of 50 m with a samplinginterval of 25 m. We do not simulate the free surface in this example. Afterthe simulation, we add 10% Gaussian noise to the observed data. Becausethe prior distribution is independent of λ, we only investigate its influenceon the negative log-likelihood of the associated PDFs in this experiment.We abbreviate the negative log-likelihood with “NLL”.Figure 6.4 shows the NLL for the reduced approach (cf. equation 6.8) as1161700 1800 1900 2000 2100 2200 2300v0 [m/s]00.10.20.30.40.50.60.70.80.91 -logρλ2 = 10 -10 µ1λ2 = 10 -6 µ1λ2 = 10 -4 µ1λ2 = 10 -2 µ1λ2 = 10 0µ1λ2 = 10 2µ1ReducedFigure 6.4: The comparison of the negative log-likelihood functions.well as for the penalty approach (cf. equation 6.16) for various values of λ asa function of v0. As discussed previously, we select values of λ2 proportionalto the largest eigenvalue µ1 of the matrix A(m)−>P>Γ−1noisePA(m)−1, i.e.,λ2 = 10−10µ1, 10−6µ1, 10−4µ1, 10−2µ1, 100µ1, and 102µ1. From this fig-ure, we observe that, when λ is large, i.e., λ2 = 102µ1 and 100µ1, the NLLexhibits several local minima, and, as λ increases, it converges to the re-duced approach formulation, as expected. We also note that for small λ,i.e., λ2 = 10−10µ1, the resulting NLL is monotonically decreasing and doesnot have a global minimum, which is due to the fact that the determinantterm in equation 6.22 dominates the NLL. Additionally, in between thesetwo extreme values for λ (i.e., when λ2 = 10−6µ1, 10−4µ1, and 10−2µ1), theresulting NLLs are unimodal with a global minimum, which slightly differsfrom the one of the reduced formulation due to the fact that the wavefieldscomputed by the penalty formulation tend to simultaneously match the dataand PDEs instead of the PDEs alone. This observation implies that with acarefully selected λ, the posterior distribution with weak PDE-constraintspotentially contains less local maxima.To investigate the influence of the parameter λ on the accuracy of theapproximations in equations 6.25 and 6.38, we plot in Figure 6.5 the NLLcorresponding to the true (cf. equation 6.16) ρpost(m|d), its approximationneglecting the determinant term ρpost(m|d) (cf. equation 6.25), and the117Gaussian approximation ρGauss(m) (cf. equation 6.38) for various values ofλ. For simplicity, we refer to these three different functions as ψ1, ψ2, andψ3, respectively. From Figure 6.5a, we observe that when λ2 = 10−10µ1,ψ2 fails to adequately approximate ψ1—i.e., the approximation in equa-tion 6.25 fails—because the determinant term in equation 6.16 dominatesthe negative logarithm function. As λ increases, the determinant term be-comes negligible, and ψ2 becomes a reasonable approximation of ψ1, asshown in Figures 6.5b to 6.5f. However, among the five various selectionsof λ, only when λ2 = 10−2µ1 does ψ3 adequately approximate ψ2. Thisoccurs because when λ is relatively large, ψ2 contains a number of nonop-timal local minima, resulting in ψ2 being poorly modeled by its Gaussianapproximation. Additionally, when λ2 < 10−2µ1, the term Ajui,j − qi,j inequation 6.36 is not negligible, resulting in the Gauss-Newton Hessian be-ing a poor approximation of the full Hessian. These results imply that theproposed criterion—i.e., λ2 = 10−2µ1—provides a reasonable choice for theparameter λ, which can simultaneously satisfy both the approximations inequations 6.25 and 6.38. As a result, the corresponding posterior distribu-tion is less prone to the local maxima and can be appropriately approximatedby the Gaussian distribution in equation 6.38, which ensures the feasibilityof the proposed framework.6.4.2 A 2D layered modelIn this section, we develop an example to compare the accuracy of the statis-tical quantities produced by the GARTO method relative to those producedby the RML method. Considering the large computational cost required bythe RML method, we use a small three-layer model as our baseline model,as shown in Figure 6.6a, whose size is 1500 m × 3000 m. We discretize thevelocity model with a grid spacing of 50 m, yielding 1800 unknown parame-ters. At the surface, we place sixty sources and receivers with a horizontalsampling interval of 50 m. To control the computational cost and ensurethe feasibility of the RML method, we use a Ricker wavelet centered at 6 Hzto simulate the observed data with frequencies of 5, 6, and 7 Hz. We usean absorbing boundary condition at the surface, so that no surface-relatedmultiples are included. After the simulation, we add 15% Gaussian noiseto the observed data resulting in a covariance matrix of Γnoise = 1752I. Toset up the prior distribution, we first construct the monotonically increasingmodel shown in Figure 6.6b as the prior mean model, which correspondsto the well-known observation that the velocity of the Earth, in general,increases with depth. Following the strategy used by Bardsley et al. (2015)1181800 2000 2200v0 [m/s]02468-logρψ1ψ2ψ3(a) λ2 = 10−10µ11800 2000 2200v0 [m/s]02000400060008000-logρψ1ψ2ψ3(b) λ2 = 10−6µ11800 2000 2200v0 [m/s]3.544.555.56-logρ×10 4ψ1ψ2ψ3(c) λ2 = 10−4µ11800 2000 2200v0 [m/s]11.522.5-logρ×10 5ψ1ψ2ψ3(d) λ2 = 10−2µ11800 2000 2200v0 [m/s]0123-logρ×10 6ψ1ψ2ψ3(e) λ2 = 100µ11800 2000 2200v0 [m/s]012345-logρ×10 6ψ1ψ2ψ3(f) λ2 = 102µ1Figure 6.5: The comparison of the negative log-likelihood functions ofthe true distribution (ψ1), the approximate distribution (ψ2),and the Gaussian approximation distribution (ψ3) with the val-ues of λ2 = 10−10µ1, 10−6µ1, 10−4µ1, 10−2µ1, 100µ1, and 102µ1.1190 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]1.522.5km/s(a) True model0 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]1.522.5km/s(b) Prior mean modelFigure 6.6: The true model (a) and the prior mean model (b).that ensures the prior covariance matrix is well-conditioned, we constructthe prior covariance matrix by selecting a = 0.1 km2/s2, b = 0.65 , andc = 0.01 km2/s2, resulting in a prior standard deviation of 0.33 km/s thatmeets the maximal difference between the true and prior mean models. Weselect the penalty parameter λ for each frequency according to the proposedselection criterion, resulting in λ = 13, 12, and 11, respectively. To computereliable statistical quantities and while also ensuring that the RML methodterminates in a reasonable amount of time, we use both the GARTO and theRML methods to generate 10, 000 samples from the posterior distribution.Based on our experience, generating 10, 000 samples is sufficient for bothmethods to produce stable results in this case.Given that the computational overhead introduced by these methods isnegligible compared to the number of PDEs that need to be solved, we usethe number of PDEs as our performance metric. To generate 10, 000 sam-120ples, the RML method needs to solve 10, 000 nonlinear optimization prob-lems. To solve each nonlinear optimization problem, following van Leeuwenet al. (2014), we stop the optimization when the relative misfit differencebetween two iterations drops below 10−3 resulting in 100 l-BFGS iterations.During each l-BFGS iteration, we have to solve 2 × 3 × 60 PDEs to com-pute the objective and gradient. As a result, the RML method requires360 × 100 × 10000 = 360 million PDE solves to draw the 10, 000 sam-ples. Contrary to the RML method, the GARTO method requires signifi-cantly fewer PDE solves. The total number of PDE solves required by theGARTO method is 36, 360, which includes 36, 000 PDE solves to find theMAP estimate and another 360 PDE solves to construct the Gauss-Newtonoperator. With the MAP estimate and the Gauss-Newton operator in hand,the GARTO method uses the RTO approach to sample the 10, 000 sampleswithout involving any additional PDE solves as we explained above. There-fore, neglecting the costs associated with solving the least-squares systems,compared to the RML method, the GARTO method requires only 110000ththe computational budget to generate the same number of samples.In addition to the significant computational speedups introduced by theGARTO method, the GARTO method also generates samples that yieldsimilar statistical quantities as those produced by the RML method. Forinstance, the posterior mean models obtained by the two methods are shownin Figure 6.7. From Figures 6.7a and 6.7b, we observe that aside from someslight differences in the second and third layers, the two results are roughlyidentical, with an average pointwise relative difference of 1.5%. Both resultsprovide acceptable reconstructions of the original velocity model, despitethe fact that the data are noisy. We also use the two posterior mean modelsto compute the predicted data and compare them with the observed data inFigures 6.8a and 6.8b, which faithfully match the observed data, aside fromthe noise. The pointwise standard deviations computed from both meth-ods, shown in Figure 6.9, result in estimates that are visually quite similarthroughout the entire model. The average relative difference between thestandard deviations produced by both methods is 6%, an acceptable errorlevel, and results from the Gaussian approximation in GARTO. Figure 6.10depicts the 95% confidence intervals obtained with the two methods at threevertical cross-sections through the model. The confidence intervals obtainedby the two methods are virtually identical, as are the pointwise marginal dis-tributions shown in Figure 6.11. All these results illustrate that, comparedto the RML method, the proposed GARTO method can produce accurateestimates of statistical quantities, while requiring a fraction of the compu-tational costs.1210 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]1.522.5km/s(a) GARTO0 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]1.522.5km/s(b) RMLFigure 6.7: The posterior mean models obtained by the GARTOmethod (a) and the RML method (b).6.4.3 BG ModelTo demonstrate the effectiveness of our method when applied to a more re-alistic problem, we conduct an experiment on a 2D slice from the 3D BGCompass model shown in Figure 6.12a. This is a synthetic velocity modelcreated by BG Group, which contains a large degree of variability and hasbeen widely used to evaluate performances of different FWI methods (Liet al., 2012; van Leeuwen and Herrmann, 2013a). The model is 2000 m deepand 4500 m wide, with a grid size of 10 m resulting in 92,455 unknown pa-rameters. Following van Leeuwen and Herrmann (2013a), we use a Rickerwavelet with a central frequency of 15 Hz to simulate 91 sources at thedepth of 50 m with a horizontal sampling interval of 50 m. As before, wedo not model the free-surface, so that the data do not contain free-surfacerelated multiples. We place 451 equally spaced receivers at the same depth1220 500 1000 1500 2000 2500 3000Receiver [m]-6000-4000-200002000400060008000R(d)Observed datapredicted data - GARTO(a) GARTO0 500 1000 1500 2000 2500 3000Receiver [m]-6000-4000-200002000400060008000R(d)Observed datapredicted data - RML(b) RMLFigure 6.8: The comparison between the observed data and the pre-dicted data with the posterior mean models obtained by theGARTO method (a) and the RML method (b).1230 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]0.050.10.150.20.250.3km/s(a) GARTO0 500 1000 1500 2000 2500 3000Lateral [m]050010001500Depth [m]0.050.10.150.20.250.3km/s(b) RMLFigure 6.9: The posterior standard deviations obtained by theGARTO method (a) and the RML method (b).as the sources to record the data, which contain 30 equally spaced frequencycomponents ranging from 2 Hz to 31 Hz. This results in 1,231,230 observeddata values. To mimic a real-world noise level, we corrupt the syntheticobservations with 15% random Gaussian noise, leading to Γnoise = 462I. Toconstruct the prior distribution, we first set its mean model (Figure 6.12b)by smoothing the true model followed by horizontal averaging. Second, weconstruct the covariance matrix of the prior distribution utilizing the factthat we have the true 3D model, which contains 1800 2D slices. We regardthese 2D slices as 1800 2D samples, from which we compute the pointwisestandard deviation. After horizontal averaging, we obtain the prior stan-dard deviation shown in Figure 6.12c. With the prior standard deviation,we select a = 0.02 km2/s2 and b = 19.5 to construct a well-conditionedcovariance matrix with a correlation length of 60 m (Bardsley et al., 2015).1241 2 3Velocity [km/s]050010001500Depth [m]GARTORML(a) x = 500 m1 2 3Velocity [km/s]050010001500Depth [m]GARTORML(b) x = 1500 m1 2 3Velocity [km/s]050010001500Depth [m]GARTORML(c) x = 2500 mFigure 6.10: The mean (line) and 95% confidence interval (back-ground patch) comparisons of the GARTO method (blue) andthe RML method (red) at x = 500 m, 1500 m, and 2500 m.The similarity between these two results implies that the con-fidence intervals obtained with the GARTO method is a goodapproximation of the ones obtain with the RML method.The parameter c in equation 6.30 is calculated according to the standarddeviation and the parameters a and b. Finally, we compute the penaltyparameter λ for each frequency (listed in Table 6.1) according to the crite-rion introduced earlier in order to obtain a posterior distribution that is lessprone to local maxima. Considering both the computational resources andthe accuracy of the inverted statistical quantities, we will use the GARTOmethod to draw 2000 samples according to Bardsley et al. (2015). Comparedto the previous example, which had a much simpler model, this model con-tains a significantly larger number of unknown parameters and data pointsand is a better proxy for real-world problems.During the inversion, we use 200 l-BFGS iterations to compute the MAPestimate plotted in Figure 6.13a with the same stopping criterion as in theprevious example. Compared to the true model, we observe that most of theobservable velocity structures are reconstructed in the MAP estimate, asidefrom some small measurement noise related imprints near the boundary ofthe model. We also observe that the shallow parts (z ≤ 1400 m) of theBG model, where the turning waves exist (for a maximal offset of 4500 m1250 1 2 3 4m [km/s]012345ρPosterior-GARTOPosterior-RMLPrior(a) x = 1500 m, z = 200 m0 1 2 3 4m [km/s]00.511.52ρPosterior-GARTOPosterior-RMLPrior(b) x = 1500 m, z = 700 m0 1 2 3 4m [km/s]00.511.5ρPosterior-GARTOPosterior-RMLPrior(c) x = 1500 m, z = 1200 mFigure 6.11: The comparison of the prior marginal distribution (solidline) and the posterior marginal distributions obtained bythe GARTO method (dotted line) and the RML method(dashed line) at the locations of (x, z) = (1500m, 200m),(1500m, 700m), and (1500m, 1200m).Frequency 2 3 4 5 6 7 8 9 10 11λ 37.8 29.3 24.4 20.5 17.6 15.2 13.5 12.1 10.9 10.1Frequency 12 13 14 15 16 17 18 19 20 21λ 9.3 8.8 8.3 7.9 7.5 7.1 6.9 6.5 6.4 6.1Frequency 22 23 24 25 26 27 28 29 30 31λ 5.9 5.7 5.5 5.4 5.2 5.1 4.9 4.8 4.7 4.6Table 6.1: The selection of λ for each frequency126Lateral [m]0 1000 2000 3000 4000Depth [m]0500100015002000 1.522.533.544.5km/s(a) True modelLateral [m]0 1000 2000 3000 4000Depth [m]0500100015002000 1.522.533.544.5km/s(b) Prior mean modelLateral [m]0 1000 2000 3000 4000Depth [m]05001000150020000.20.40.60.81km/s(c) Prior standard deviationFigure 6.12: (a) The true model, (b) the prior mean model, and (c)the prior standard deviation.127(Virieux and Operto, 2009)) are better recovered relative to the deep parts(z ≥ 1400 m), where the only received energy arises from reflected waves.This implies that the portion of the data corresponding to the turning wavesis more informative to the velocity reconstruction than that of the reflectedwaves, which is a well-known observation in seismic inversions.After obtaining the MAP estimate, we construct the Gauss-Newton Hes-sian operator and apply the RTO method to generate 2000 samples. Thisallows us to compute the posterior standard deviation (Figure 6.13b) andcompare it with the prior standard deviation (Figure 6.12c). To have abetter understanding of the information that the data introduce, we alsocompute the relative differenceSTDpost(mk)−STDprior(mk)|STDprior(mk)| between the poste-rior and prior standard deviations (Figure 6.13c). In the shallow parts ofthe model (z ≤ 1400 m), the posterior standard deviation decreases be-tween 10% − 50% compared to the prior standard deviation, while in thedeep parts (z ≥ 1400 m), the reduction in standard deviation is smallerthan 3%. This observation is consistent with the notion that, owing to theamplitude decay of propagating waves, the data place more constraints onthe velocity variations in the shallow parts of the model compared to thedeep parts. Additionally, considering the areas where the turning waves andthe reflected waves exist, this observation also implies that the portion ofthe data corresponding to the turning waves can reduce more uncertaintiesin the recovered velocity compared to the reflected waves. To further eval-uate this inversion result, we compare the prior model, the MAP estimateof the posterior, and the true velocity at three different cross sections inFigure 6.14 (i.e., x = 1000 m, 2500 m, and 4000 m), in which we also plotthe prior standard deviation (red patch) and posterior standard deviation(blue patch). In the shallow region of the model, the MAP estimates closelymatch the true model, while they diverge in the deeper region in a more pro-nounced manner. This is again consistent with the notion that the data aremore informative in the shallow area of the model compared to the deeperareas.We also consider the pointwise posterior marginal distribution generatedby the posterior and prior distributions to further understand the results ofthe GARTO method. Figure 6.15 compares these distributions at four differ-ent locations, (x, z) = (2240 m, 40 m), (2240 m, 640 m), (2240 m, 1240 m),and (2240 m, 1840 m). The posterior distribution is more concentrated thanthe prior distribution in the shallow regions of the model, while in the deepparts, the two distributions are almost identical. Therefore, the recoveredvelocity in the shallow parts is more reliable than in the deep parts.128Lateral [m]0 1000 2000 3000 4000Depth [m]0500100015002000 1.522.533.544.5km/s(a) Posterior MAPLateral [m]0 1000 2000 3000 4000Depth [m]05001000150020000.20.40.60.81km/s(b) Posterior standard deviationLateral [m]0 1000 2000 3000 4000Depth [m]0500100015002000-50-40-30-20-10%(c) Relative difference between the posterior and prior standard deviationsFigure 6.13: (a) The posterior MAP estimate of the model, (b) theposterior standard deviation, and (c) the relative differencebetween the posterior and the prior standard deviations, i.e.,STDpost(mk)−STDprior(mk)|STDprior(mk)| .129Velocity [km/s]0 2 4 6Depth [m]0200400600800100012001400160018002000Posterior modelPrior modelTrue model(a) x = 1000 mVelocity [km/s]0 2 4 6Depth [m]0200400600800100012001400160018002000Posterior modelPrior modelTrue model(b) x = 2500 mVelocity [km/s]0 2 4 6Depth [m]0200400600800100012001400160018002000Posterior modelPrior modelTrue model(c) x = 4000 mFigure 6.14: The mean and standard deviation comparisons of theposterior (blue) and the prior (red) distributions at x =1000 m, 2500 m, and 4000 m.To verify our statistical results, we also utilize the RML method as abaseline approach. For this example, drawing a single sample with theRML method using 200 l-BFGS iterations requires at least 1.09 million PDEsolves, which is computationally daunting. As a result, we only generate 10samples via the RML method and compare them to the 95% confidence in-tervals (i.e., the blue patch) obtained by the GARTO method in Figure 6.16.From these figures, it is clear that the majority of the 10 samples lie in theblue patch. Moreover, the variation of the ten samples also matches the 95%confidence interval. In this case, we conclude that the estimated confidenceintervals are likely accurate approximations of the true confidence intervals.6.5 DiscussionWhen the underlying model is given by PDE-constraints consisting of multi-ple experiments, one is forced to make various approximations and tradeoffsin order to develop a computationally tractable procedure. There are a largenumber of discretized parameters, corresponding to a discretized 2D or 3Dfunction, and one must necessarily avoid having to explicitly construct densecovariance matrices of the size of the model grid squared, whose construc-tion requires a large number of PDE solves. Moreover, each evaluation of130m [km/s]1 1.2 1.4 1.6 1.8 2;05101520PosteriorPrior(a) x = 2240 m, z = 40 mm [km/s]0 2 4 6;00.20.40.60.8PosteriorPrior(b) x = 2240 m, z = 640 mm [km/s]0 2 4 6;00.20.40.60.8PosteriorPrior(c) x = 2240 m, z = 1240 mm [km/s]0 2 4 6;00.20.40.60.8PosteriorPrior(d) x = 2240 m, z = 1840 mFigure 6.15: The comparison of the prior (solid line) and the pos-terior (dotted line) marginal distributions at the locationsof (x, z) = (2240m, 40m), (2240m, 640m), (2240m, 1240m),and (2240m, 1840m).the posterior distribution involves the solution of multiple PDEs, a com-putationally expensive affair. Methods that require tens or hundreds ofthousands of such posterior evaluations, such as McMC-type methods, donot scale to realistically sized problems. The original PDE-constrained for-mulation of Bayesian inference, while theoretically convenient, results in aposterior that cannot be appropriately approximated by a Gaussian distri-bution, whereas the relaxation of the exact PDE-constraints results in aposterior that is much more amenable to Gaussian approximation. Ideally,one would like to parameterize the distribution over the joint model/fieldvariables (m,u) and estimate the variances accordingly. Hidden in this no-tation, however, is the fact that in real-world problems, we have hundredsor potentially thousands of source experiments, each of which corresponds131Depth [m]0200400600800100012001400160018002000Velocity [km/s]0 2 4 60.95 Confidence interval(a) x = 1000 mDepth [m]0200400600800100012001400160018002000Velocity [km/s]0 2 4 60.95 Confidence interval(b) x = 2500 mDepth [m]0200400600800100012001400160018002000Velocity [km/s]0 2 4 60.95 Confidence interval(c) x = 4000 mFigure 6.16: The 95% confidence intervals and the 10 realizations viathe RML method at x = 1000 m, 2500 m, and 4000 m.to a different u. Storing all of these fields and updating them explicitlybecomes prohibitive memory-wise, even on a large cluster. As a result, ourapproach aims to keep the benefits of relaxing the PDE-constraints whilestill resulting in a computationally feasible algorithm.The initial results illustrate that with the specific selection of λ, i.e.,λ2 = 0.01µ1, the relaxed formulation of the posterior PDF is less proneto local maxima, which enables us to analyze it via an arguably accurateGaussian approximation. Once we can manipulate the covariance matrixof the Gaussian approximation through the implicit PDE-free formulation,we have access to a variety of statistical quantities, including the pointwisestandard deviation, the confidence intervals, and the pointwise marginal dis-tributions, in a tractable manner. We can use these quantities to assess theuncertainties of our inversion results and to identify the areas with high/lowreliability in the model. This information is important and useful for thesubsequent processing and interpretation. A straightforward example is thatit allows us to assess the reliability of the visible image features obtained bythe subsequent procedure of imaging, as in Ely et al. (2017).While the initial results are promising, some aspects of the proposedframework warrant further investigation. Although numerical examples il-lustrate the feasibility of the proposed method for the case with the selectionof λ2 = 0.01µ1, it does not guarantee the feasibility of the proposed approach132for PDFs arising from other choices of λ. For other selections of λ, the pos-terior PDFs can significantly differ from a Gaussian PDF, which makes ap-proximately sampling them challenging. Potentially other sampling schemescan be explored for these distributions.While we have shown the feasibility of the proposed sampling method for2D problems, the application of the proposed sampling method to 3D prob-lems is still challenging from the perspective of memory usage. To satisfythe O(ngrid × (nrcv + nsrc)× nfreq) storage requirement for formulating theimplicit Gauss-Newton Hessian operator, a large high-performance clusterwith enough computational workers and memory is needed to store all ofthe necessary matrices in parallel.6.6 ConclusionsWe have described a new Bayesian framework for partial-differential-equation-constrained inverse problems. In our new framework, we introduce the fieldvariables as auxiliary variables and relax the partial-differential-equation-constraint. By integrating out the field parameters, we avoid having tostore and update the high number of field parameters, and exploit the factthat the new distribution is Gaussian in the field variables for every fixedmodel. We propose a method for selecting an appropriate penalty parametersuch that the new posterior distribution can be approximated by a Gaussiandistribution, which is more accurate than the conventional formulation.We apply the new formulation to the seismic inverse problems and deriveeach component of the general framework. For this specific application, weuse a partial-differential-equation-free Gauss-Newton Hessian to formulatethe Gaussian approximation of the posterior distribution. We also illustratethat with this Gauss-Newton Hessian, the Gaussian approximation can besampled without the requirement of the explicit formulation of the Gauss-Newton Hessian and its Cholesky factorization.Our proposed method compares favorably to the existing randomizedmaximum likelihood method for generating different statistics on a simplelayered model and the more challenging BG Compass model. Compared tothe randomized maximum likelihood method, our method produces resultsthat are quite visually similar, while requiring significantly fewer partial-differential-equation solves, which are the computational bottleneck in theseproblems. We expect that these methods will scale reasonably well to 3Dmodels, where traditional methods have only begun to scratch the surfaceof the problem.While in this chapter we utilize the Gaussian smoothness prior distribu-133tion, indeed, the proposed sampling method can also handle other choices ofprior distributions, as long as they have sparse covariance matrices. In thefuture, we can investigate the incorporation of the proposed method withother kinds of prior distributions.134Chapter 7ConclusionSince oil and gas industries rely on images of the subsurface structure toevaluate the location, size, and profitability of oil and gas reservoirs, aswell as a variety of exploration and economic risks, research into producinghigh-resolution images of complexly structured areas is exceptionally signif-icant. In turn, this thesis has contributed source estimation techniques forseismic inversion and imaging problems, which can be applied to realisticcases, allowing for us to construct both background velocity models andhigh-resolution details without prior knowledge of the source functions. Inaddition, we have also contributed a computationally efficient framework toanalyze uncertainties in the recovered velocity model, which can be usedto assess risks in the subsequent procedures. In order to do this, we haveaddressed the topics of (1) source estimation for time-domain sparsity pro-moting least-squares reverse-time migration, (2) wavefield-reconstruction in-version with source estimation, and (3) uncertainty quantification for weakwave-equation constrained inverse problems.7.1 Source estimation for time-domain sparsitypromoting least-squares reverse-timemigrationIn the first part of the thesis (chapter 2), we proposed an on-the-fly sourceestimation technique for the time-domain least-squares reverse-time migra-tion. The linearized forward operator is linear with respect to the sourcefunction. Based on this fact, we proposed to project out the source func-tion by solving a regularized linear data-fitting optimization problem duringeach iteration of LS-RTM. The resulting source function can minimize the135difference between the predicted and observed data given the current modelupdate. Through iteratively updating the source function in this way alongwith the update of the model parameters, the time-domain least-squaresreverse-time migration is able to reach an image close to the one using theaccurate source function. As a result, we mitigate the dependence of thetime-domain least-squares reverse-time migration on the accurate sourcefunction.We embedded the proposed source estimation technique in the robusttime-domain sparsity promoting least-squares reverse-time migration. Thisapproach combines the stochastic optimization technique and a curvelet-based sparsity-promoting inversion algorithm. As a result, it can producehigh-resolution and artifact-free images at the cost of approximately onereverse time migration using all the sources. We illustrated that the sparsity-promoting inverse problem can be solved by an easy-to-implement approach—linearized Bregman method, which allows us to straightforwardly embed theproposed source estimation technique in each iteration.Finally, we provided numerical examples illustrating that the proposedalgorithm yields results that are comparable to those produced with the truesource functions. As a result, we have contributed an approach that provideshigh-resolution subsurface images but without the need for obtaining correctsource functions, and is, therefore, more applicable to real-world problems.7.2 Source estimation forwavefield-reconstruction inversionIn the second part of the thesis (chapters 3, 4, and 5), we proposed a compu-tationally efficient on-the-fly source estimation technique in the context ofwavefield-reconstruction inversion, which enhances the realistic applicationof wavefield-reconstruction inversion. To achieve this task, we derived an op-timization problem with respect to the velocity model, wavefields, and sourcefunctions. We illustrated that this optimization problem can be solved bythe variable projection method, which simultaneously projects out the wave-fields and source functions during each iteration. After the projection, weobtain a reduced objective function that only depends on the velocity. Weillustrated that minimizing this reduced objective function can produce aninverted velocity model that is close to the one using the correct sourcefunctions.During the procedure of source estimation, the projection of the wave-fields and source functions involves inverting a source-dependent linear sys-136tem, which requires an independent Cholesky factorization for each source.To reduce the computational cost, we applied the block matrix inversion for-mula and arrived at a new inversion scheme that only requires inverting asource-independent matrix. Therefore, all sources share the same Choleskyfactorization and the computational cost is significantly reduced.We demonstrated the effectiveness of the proposed source estimationtechnique by a series of numerical experiments conducted with the BG com-pass model dataset. From these experiments, we observed that the pro-posed approach is stable with respect to the modeling errors and measure-ment noise. Additionally, we demonstrated the feasibility of the proposedapproach to practical problems by applying it to the Chevron 2014 bench-mark blind test dataset. Although the data contain elastic events and ourmodeling kernel is based on the acoustic wave equation, the proposed ap-proach is still able to recover both the source functions and velocity model.In sum, the proposed source estimation technique can enable us to con-duct wavefield-reconstruction inversion without prior information about thesource functions.7.3 Uncertainty quantification for weakwave-equation constrained inverse problemsIn the last portion of the thesis (chapter 6), we presented an uncertaintyquantification framework based on weak wave-equation constraints. Theconventional approach strictly satisfies the wave-equation constraint by solv-ing the wave equation. Different from the conventional approach, we re-laxed the wave-equation constraints by allowing for Gaussian deviations inthe wave equations. We discovered that with a carefully selected variancechoice for the wave-equation misfit, the relaxing strategy results in a poste-rior distribution containing fewer modes in comparison with the conventionalstatistical FWI framework. This allows us to appropriately approximate thenew posterior distribution by a Gaussian distribution with a higher accuracythan the conventional one.We also proposed an implicit and wave-equation-free formulation to con-struct the covariance matrix of the Gaussian distribution. This implicit for-mulation enables us to compute the product of the covariance matrix andany vectors with several efficient matrix-vector multiplications instead ofsolving many expensive wave equations. Based on this fact, We embeddedthe implicit formulation in the randomize-then-optimize method. The re-sulting approach can sample the Gaussian distribution without the require-137ment of the explicit formulation of the covariance matrix and its Choleskyfactorization.We favorably compared the proposed approach to the existing random-ized maximum likelihood method for generating different statistical quanti-ties on a simple layered model and the more challenging BG Compass model.Compared to the randomized maximum likelihood method, the proposedapproach produces results that are quite visually similar, while requiringsignificantly fewer wave-equation solves, which are the main computationalbottleneck in these problems.7.4 Current limitationsSome limitations of the work presented in this thesis are as follows:1. The current implementation of the projection procedure in wavefield-reconstruction inversion is based on the Cholesky factorization method,which is feasible for small or intermediate-scale 2D problems but un-feasible for large-scale 3D problems.2. The Gaussian approximation for the new posterior distribution is onlyvalid with the specific selection of the variance in the wave-equationmisfit. For other selections of the variance, the posterior distributioncan significantly differ from a Gaussian distribution, which makes ap-proximately sampling them challenging.3. The source estimation procedure in the proposed time-domain sparsity-promoting LS-RTM requires solving a deconvolution problem. Due tothe large volume of data in 3D problems, this deconvolution problemcan be computationally expensive to solve for 3D problems.7.5 Future research directionsSome ideas for future work are as follows:1. Develop a computationally fast and memory efficient implementationfor 3D wavefield-reconstruction inversion with source estimation. In3D cases, it is computationally unfeasible to simultaneously projectout the wavefields and source functions by solving the data-augmentedsystem with direct solvers. Therefore, an efficient iterative solver withan appropriate preconditioner is necessary.1382. Develop a computationally efficient second-order method for wavefield-reconstruction inversion. We can also apply the implicit Gauss-NewtonHessian proposed in the chapter of uncertainty quantification to thedeterministic optimization problem of wavefield-reconstruction inver-sion. Because the Hessian-vector product only involves several simplematrix-vector multiplications, we can use iterative solves includingPCG and LSQR to invert the Gauss-Newton Hessian without solvingadditional wave equations. As a result, we can derive a computation-ally efficient second-order method by using the implicit Gauss-NewtonHessian.3. Incorporate non-Gaussian prior distribution with the proposed uncer-tainty quantification framework with weak wave-equation constraints.Gaussian prior models are used in this thesis since they provide math-ematically simple and computationally efficient formulations of im-portant inverse problems. Unfortunately, the Gaussian prior fails tocapture a range of important properties including sparsity and naturalconstraints such as positivity. This fact motivates us to incorporatenon-Gaussian priors with the current framework.4. The strategy of relaxing the wave-equation provides us with the ac-cess to separate the noise into the receiver part (measurement noise)and source part (background noise). Indeed, the conditional distri-bution ρ(u|m) can be understood as the distribution associated withthe background noise. We can easily replace the penalty parameter bya weighting matrix to describe the statistical properties of the back-ground noise. Therefore, we can use the proposed formulation to studythe influence of the measurement and background noise on the uncer-tainties of the inverted velocity model.5. Develop a 3D implementation for the sparsity-promoting least-squaresreverse-time migration with source estimation. Recent computationalimprovements allow us to simulate wavefields in a 3D manner andundertake the challenge of executing least-squares reverse-time migra-tion. Therefore, an extension of the current work to 3D problemswould be beneficial and feasible.139BibliographyAkcelik, V., 2002, Multiscale Newton-Krylov methods for inverse acousticwave propagation: PhD thesis, Carnegie Mellon University. → pages 52Amestoy, P., R. Brossier, A. Buttari, J.-Y. L’Excellent, T. Mary, L.Me´tivier, A. Miniussi, and S. Operto, 2016, Fast 3D frequency-domainfull-waveform inversion with a parallel block low-rank multifrontal directsolver: Application to OBC data from the North Sea: Geophysics,81(6), R363–R383. → pages 74Aoki, N., and G. T. Schuster, 2009, Fast least-squares migration with adeblurring filter: Geophysics, 74, WCA83–WCA93. → pages 2, 7Aravkin, A. Y., and T. van Leeuwen, 2012, Estimating nuisance parametersin inverse problems: Inverse Problems, 28, 115016. → pages 46Askan, A., V. Akcelik, J. Bielak, and O. Ghattas, 2007, Full waveforminversion for seismic velocity and anelastic losses in heterogeneousstructures: Bulletin of the Seismological Society of America, 97,1990–2008. → pages 52Aster, R. C., B. Borchers, and C. H. Thurber, 2011, Parameter estimationand inverse problems: Academic Press, 90. → pages 94Bardsley, J. M., A. Seppa¨nen, A. Solonen, H. Haario, and J. Kaipio, 2015,Randomize-then-optimize for sampling and uncertainty quantification inelectrical impedance tomography: SIAM/ASA Journal on UncertaintyQuantification, 3, 1136–1158. → pages 94, 98, 109, 110, 115, 118, 124,125Bardsley, J. M., A. Solonen, H. Haario, and M. Laine, 2014,Randomize-then-optimize: A method for sampling from posteriordistributions in nonlinear inverse problems: SIAM Journal on ScientificComputing, 36, A1895–A1910. → pages 95, 115Baumstein, A., 2013, POCS-based geophysical constraints inmulti-parameter full wavefield inversion: Presented at the 75th EAGEConference and Exhibition 2013, EAGE. → pages 81, 88Bauschke, H., and J. Borwein, 1994, Dykstras alternating projection140algorithm for two sets: Journal of Approximation Theory, 79, 418 – 443.→ pages 83Baysal, E., D. D. Kosloff, and J. W. Sherwood, 1983, Reverse timemigration: Geophysics, 48, 1514–1524. → pages 2, 7Bernstein, D. S., 2005, Matrix mathematics: Theory, facts, and formulaswith application to linear systems theory: Princeton University PressPrinceton. → pages 54Bertsekas, D. P., and J. N. Tsitsiklis, 1995, Neuro-dynamic programming:an overview: Decision and Control, 1995., Proceedings of the 34th IEEEConference on, IEEE, 560–564. → pages 11Biegler, L. T., T. F. Coleman, A. Conn, and F. N. Santosa, 2012,Large-scale optimization with applications: Part I: Optimization ininverse problems and design: Springer Science & Business Media, 92. →pages 93Biondi, B., 2001, Kirchhoff imaging beyond aliasing: Geophysics, 66,654–666. → pages 7Brenders, A. J., and R. G. Pratt, 2007, Full waveform tomography forlithospheric imaging: results from a blind test in a realistic crustalmodel: Geophysical Journal International, 168, 133–151. → pages 81,83, 90Brossier, R., S. Operto, and J. Virieux, 2009, Seismic imaging of complexonshore structures by 2D elastic frequency-domain full-waveforminversion: Geophysics, 74(6), WCC105–WCC118. → pages 52Bui-Thanh, T., O. Ghattas, J. Martin, and G. Stadler, 2013, Acomputational framework for infinite-dimensional Bayesian inverseproblems part I: The linearized case, with application to global seismicinversion: SIAM Journal on Scientific Computing, 35, A2494–A2523. →pages 16, 94, 98, 106, 109Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscaleseismic waveform inversion: Geophysics, 60, 1457–1473. → pages 12, 37Cai, J.-F., S. Osher, and Z. Shen, 2009, Convergence of the linearizedBregman iteration for L1-norm minimization: Mathematics ofComputation, 78, 2127–2136. → pages 24Caldwell, J., and C. Walker, 2011, An overview of marine seismicoperations: Technical report 448: Technical report. → pages xi, 1, 3Candes, E., L. Demanet, D. Donoho, and L. Ying, 2006, Fast discretecurvelet transforms: Multiscale Modeling & Simulation, 5, 861–899. →pages 22Cande`s, E. J., et al., 2006, Compressive sampling: Proceedings of theinternational congress of mathematicians, Madrid, Spain, 1433–1452. →141pages 11Cande`s, E. J., and M. B. Wakin, 2008, An introduction to compressivesampling: IEEE signal processing magazine, 25, 21–30. → pages 11Chen, S. S., D. L. Donoho, and M. A. Saunders, 2001, Atomicdecomposition by basis pursuit: SIAM review, 43, 129–159. → pages 22,23Chen, Y., and D. S. Oliver, 2012, Ensemble randomized maximumlikelihood method as an iterative ensemble smoother: MathematicalGeosciences, 44, 1–26. → pages 95, 115, 116Chen, Z., D. Cheng, W. Feng, and T. Wu, 2013, An optimal 9-point finitedifference scheme for the Helmholtz equation with PML: InternationalJournal of Numerical Analysis and Modeling, 10, 389–410. → pages 57,109Cohen, J. K., and N. Bleistein, 1979, Velocity inversion procedure foracoustic waves: Geophysics, 44, 1077–1087. → pages 2Cordua, K. S., T. M. Hansen, and K. Mosegaard, 2012, Monte Carlofull-waveform inversion of crosshole GPR data using multiple-pointgeostatistical a priori information: Geophysics, 77, H19–H31. → pages16Donoho, D. L., 2006, Compressed sensing: IEEE Transactions oninformation theory, 52, 1289–1306. → pages 11, 22Duijndam, A., 1988, Bayesian estimation in seismic inversion. part ii:Uncertainty analysis1: Geophysical Prospecting, 36, 899–918. → pages14Dummit, D. S., and R. M. Foote, 2004, Abstract algebra: Wiley Hoboken,3. → pages 100Eckstein, J., and W. Yao, 2012, Augmented Lagrangian and alternatingdirection methods for convex optimization: A tutorial and someillustrative computational results: RUTCOR Research Reports, 32. →pages 85Efron, B., 1981, Nonparametric estimates of standard error: the jackknife,the bootstrap and other methods: Biometrika, 68, 589–599. → pages 95,114, 115——–, 1992, Bootstrap methods: another look at the jackknife, inBreakthroughs in statistics: Springer, 569–593. → pages 95, 114, 115Ely, G., A. Malcolm, and O. V. Poliannikov, 2017, Assessing uncertaintiesin velocity models and images with a fast nonlinear uncertaintyquantification method: Geophysics, 83(2), R63–R75. → pages 14, 16,94, 132Epanomeritakis, I., V. Akc¸elik, O. Ghattas, and J. Bielak, 2008, A142Newton-CG method for large-scale three-dimensional elasticfull-waveform seismic inversion: Inverse Problems, 24, 034015. → pages93Esser, E., L. Guasch, T. van Leeuwen, A. Y. Aravkin, and F. J. Herrmann,2018, Total-variation regularization strategies in full-waveform inversion:SIAM Journal on Imaging Sciences, 11, 376–406. → pages 78, 101Etgen, J., S. H. Gray, and Y. Zhang, 2009, An overview of depth imagingin exploration geophysics: Geophysics, 74, WCA5–WCA17. → pages 1,2, 7Fang, Z., C. Lee, C. Da Silva, F. J. Herrmann, and R. Kuske, 2015,Uncertainty quantification for wavefield reconstruction inversion:Presented at the 77th EAGE Conference and Exhibition 2015, EAGE.→ pages 95Fang, Z., C. Y. Lee, C. Da Silva, F. J. Herrmann, and T. Van Leeuwen,2016, Uncertainty quantification for wavefield reconstruction inversionusing a PDE-free semidefinite Hessian and randomize-then-optimizemethod: 86th Annual International Meeting, SEG, Expanded Abstracts,1390–1394. → pages 95Fisher, M., M. Leutbecher, and G. Kelly, 2005, On the equivalencebetween Kalman smoothing and weak-constraint four-dimensionalvariational data assimilation: Quarterly Journal of the RoyalMeteorological Society, 131, 3235–3246. → pages 95French, W. S., 1975, Computer migration of oblique seismic reflectionprofiles: Geophysics, 40, 961–980. → pages 7Golub, G., and V. Pereyra, 2003, Separable nonlinear least squares: thevariable projection method and its applications: Inverse Problems, 19,R1. → pages 14, 40, 46, 99Golub, G. H., and C. F. Van Loan, 2012, Matrix computations: JohnsHopkins University Press, 3. → pages 112Haario, H., M. Laine, A. Mira, and E. Saksman, 2006, Dram: efficientadaptive MCMC: Statistics and Computing, 16, 339–354. → pages 94Haber, E., U. M. Ascher, and D. Oldenburg, 2000, On optimizationtechniques for solving nonlinear inverse problems: Inverse problems, 16,1263. → pages 93Haber, E., M. Chung, and F. Herrmann, 2012, An effective method forparameter estimation with PDE constraints with multiple right-handsides: SIAM Journal on Optimization, 22, 739–757. → pages 11Herrmann, F. J., C. R. Brown, Y. A. Erlangga, and P. P. Moghaddam,2009, Curvelet-based migration preconditioning and scaling: Geophysics,74, A41–A46. → pages 11143Herrmann, F. J., and X. Li, 2012, Efficient least-squares imaging withsparsity promotion and compressive sensing: Geophysical prospecting,60, 696–712. → pages 2, 7, 11, 22, 24Herrmann, F. J., P. P. Moghaddam, and C. Stolk, 2008, Sparsity- andcontinuity-promoting seismic image recovery with curvelet frames:Applied and Computational Harmonic Analysis, 24, 150–173. → pages11, 22, 29Herrmann, F. J., N. Tu, and E. Esser, 2015, Fast ”online” migration withCompressive Sensing: Presented at the 77th EAGE Conference andExhibition 2015, EAGE. → pages 22, 24Hinze, M., R. Pinnau, M. Ulbrich, and S. Ulbrich, 2008, Optimization withPDE constraints: Springer Science & Business Media, 23. → pages 12,94Huang, G., R. Nammour, and W. Symes, 2017, Full-waveform inversion viasource-receiver extension: Geophysics, 82(3), R153–R171. → pages 12Jaiswal, P., C. A. Zelt, A. W. Bally, and R. Dasgupta, 2008, 2-Dtraveltime and waveform inversion for improved seismic imaging: NagaThrust and Fold Belt, India: Geophysical Journal International, 173,642–658. → pages 2Kaipio, J., and E. Somersalo, 2006, Statistical and computational inverseproblems: Springer Science & Business Media. → pages 14, 16, 97, 98Kennett, B. L. N., M. S. Sambridge, and P. R. Williamson, 1988, Subspacemethods for large inverse problems with multiple parameter classes:Geophysical Journal International, 94, 237–247. → pages 50Kitanidis, P. K., 1995, Recent advances in geostatistical inference onhydrogeological variables: Reviews of Geophysics, 33, 1103–1109. →pages 114Lailly, P., et al., 1983, The seismic inverse problem as a sequence of beforestack migrations: Presented at the Conference on Inverse Scattering. →pages 2, 11, 37Li, M., J. Rickett, and A. Abubakar, 2013, Application of the variableprojection scheme for frequency-domain full-waveform inversion:Geophysics, 78(6), R249–R257. → pages 14, 46Li, Q.-Z., 2017, High-resolution seismic exploration: Society of ExplorationGeophysicists. → pages 2Li, X., A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, 2012, Fastrandomized full-waveform inversion with compressive sensing:Geophysics, 77(3), A13–A17. → pages 2, 24, 57, 122Lieberman, C., K. Willcox, and O. Ghattas, 2010, Parameter and statemodel reduction for large-scale statistical inverse problems: SIAM144Journal on Scientific Computing, 32, 2523–2542. → pages 109Lim, S., 2017, Bayesian inverse problems and seismic inversion: PhDthesis, University of Oxford. → pages 95Liu, J., A. Abubakar, T. Habashy, D. Alumbaugh, E. Nichols, and G. Gao,2008, Nonlinear inversion approaches for cross-well electromagnetic datacollected in cased-wells: 78th Annual International Meeting, SEG,Expanded Abstracts, 304–308. → pages 45, 46Lorenz, D. A., F. Schopfer, and S. Wenger, 2014, The linearized bregmanmethod via split feasibility problems: Analysis and generalizations:SIAM Journal on Imaging Sciences, 7, 1237–1262. → pages 22, 24Martin, J., L. C. Wilcox, C. Burstedde, and O. Ghattas, 2012, A stochasticNewton MCMC method for large-scale statistical inverse problems withapplication to seismic inversion: SIAM Journal on Scientific Computing,34, A1460–A1487. → pages 16, 94, 109, 110Matheron, G., 2012, Estimating and choosing: an essay on probability inpractice: Springer Science & Business Media. → pages 16, 97, 109McMechan, G. A., 1983, Migration by extrapolation of time-dependentboundary values: Geophysical Prospecting, 31, 413–420. → pages 2Nemeth, T., C. Wu, and G. T. Schuster, 1999, Least-squares migration ofincomplete reflection data: Geophysics, 64, 208–221. → pages 11Nemirovski, A., A. Juditsky, G. Lan, and A. Shapiro, 2009, Robuststochastic approximation approach to stochastic programming: SIAMJournal on optimization, 19, 1574–1609. → pages 11, 24Nocedal, J., and S. J. Wright, 2000, Numerical optimization: Springer. →pages 90——–, 2006, Numerical optimization: Springer-Verlag New York. → pages40, 47, 52, 111Orieux, F., O. Fe´ron, and J.-F. Giovannelli, 2012, Samplinghigh-dimensional Gaussian distributions for general linear inverseproblems: Signal Processing Letters, IEEE, 19, 251–254. → pages 114Osypov, K., N. Ivanova, Y. Yang, A. Fournier, D. Nichols, R. Bachrach,C. E. Yarman, and D. Nikolenko, 2013a, Uncertainty as a Rosetta StoneUniting Geoscience Knowledge and E&P Business Value: Presented atthe IPTC 2013: International Petroleum Technology Conference. →pages 7Osypov, K., D. Nichols, M. Woodward, O. Zdraveva, C. Yarman, et al.,2008, Uncertainty and resolution analysis for anisotropic tomographyusing iterative eigendecomposition: Presented at the 2008 SEG AnnualMeeting, Society of Exploration Geophysicists. → pages 14Osypov, K., Y. Yang, A. Fournier, N. Ivanova, R. Bachrach, C. E. Yarman,145Y. You, D. Nichols, and M. Woodward, 2013b, Model-uncertaintyquantification in seismic tomography: method and applications:Geophysical Prospecting, 61, 1114–1134. → pages 2, 14, 16, 94, 98Paffenholz, J., J. Stefani, B. McLain, and K. Bishop, 2002, SIGSBEE 2Asynthetic subsalt dataset-image quality as function of migrationalgorithm and velocity model error: Presented at the 64th EAGEConference and Exhibition 2002, EAGE. → pages 23Papandreou, G., and A. L. Yuille, 2010, Gaussian sampling by localperturbations: Advances in Neural Information Processing Systems,1858–1866. → pages 114Peters, B., C. Greif, and F. J. Herrmann, 2015, An algorithm for solvingleast-squares problems with a Helmholtz block and multipleright-hand-sides: Presented at the International Conference OnPreconditioning Techniques For Scientific And Industrial Applications.→ pages 75Peters, B., and F. J. Herrmann, 2016, Constraints versus penalties foredge-preserving full-waveform inversion: The Leading Edge, 36, 94–100.→ pages 57Peters, B., F. J. Herrmann, and T. van Leeuwen, 2014, Wave-equationbased inversion with the penalty method: adjoint-state versuswavefield-reconstruction inversion: Presented at the 76th EAGEConference and Exhibition 2014, EAGE. → pages 12Plessix, R.-E., 2006, A review of the adjoint-state method for computingthe gradient of a functional with geophysical applications: GeophysicalJournal International, 167, 495–503. → pages 11, 45, 94Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain,Part 1: Theory and verification in a physical scale model: Geophysics,64, 888–901. → pages 2, 11, 12, 14, 37, 45, 46, 52, 57, 93, 108Qin, B., T. Allemand, G. Lambare´, et al., 2015, Full waveform inversionusing preserved amplitude reverse time migration: Presented at the 2015SEG Annual Meeting, Society of Exploration Geophysicists. → pages 80Rawlinson, N., A. M. Reading, and B. L. Kennett, 2006, Lithosphericstructure of tasmania from a novel form of teleseismic tomography:Journal of Geophysical Research: Solid Earth, 111. → pages 50Roberts, G. O., J. S. Rosenthal, et al., 2001, Optimal scaling for variousMetropolis-Hastings algorithms: Statistical Science, 16, 351–367. →pages 16, 94Rosasco, L., E. De Vito, A. Caponnetto, M. Piana, and A. Verri, 2004, Areloss functions all the same?: Neural Computation, 16, 1063–1076. →pages 26146Rue, H., 2001, Fast sampling of Gaussian Markov random fields: Journalof the Royal Statistical Society: Series B (Statistical Methodology), 63,325–338. → pages 108Sambridge, M., 1990, Non-linear arrival time inversion: constrainingvelocity anomalies by seeking smooth models in 3-d: GeophysicalJournal International, 102, 653–677. → pages 50Sambridge, M., K. Gallagher, A. Jackson, and P. Rickwood, 2006,Trans-dimensional inverse problems, model comparison and the evidence:Geophysical Journal International, 167, 528–542. → pages 14, 99Scales, J. A., and L. Tenorio, 2001, Prior information and uncertainty ininverse problems: Geophysics, 66, 389–397. → pages 14Schmidt, M., D. Kim, and S. Sra, 2012, 11, in Projected Newton-typeMethods in Machine Learning: MIT Press, 35, 305–327. → pages 84Schneider, W. A., 1978, Integral formulation for migration in two andthree dimensions: Geophysics, 43, 49–76. → pages 7Sen, M., and I. Roy, 2003, Computation of differential seismograms anditeration adaptive regularization in prestack waveform inversion:Geophysics, 68, 2026–2039. → pages 90Shapiro, A., D. Dentcheva, and A. Ruszczyn´ski, 2009, Lectures onstochastic programming: modeling and theory: SIAM. → pages 11, 24Sheriff, R. E., and L. P. Geldart, 1995, Exploration seismology: Cambridgeuniversity press. → pages 1Sirgue, L., O. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J.Kommedal, 2010, Thematic set: Full waveform inversion: The next leapforward in imaging at valhall: First Break, 28, 65–70. → pages 11, 37Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion andimaging: A strategy for selecting temporal frequencies: Geophysics, 69,231–248. → pages 12, 37Solonen, A., A. Bibov, J. M. Bardsley, and H. Haario, 2014,Optimization-based sampling in ensemble Kalman filtering:International Journal for Uncertainty Quantification, 4. → pages 114Stuart, G., W. Yang, S. Minkoff, and F. Pereira, 2016, A two-stage Markovchain Monte Carlo method for velocity estimation and uncertaintyquantification: 86th Annual International Meeting, SEG, ExpandedAbstracts, 3682–3687. → pages 94Sun, D., K. Jiao, D. Vigh, and R. Coates, 2014, Source wavelet estimationin full waveform inversion: Presented at the 76th EAGE Conference andExhibition 2014, EAGE. → pages 12Symes, W. W., D. Sun, and M. Enriquez, 2011, From modelling toinversion: designing a well-adapted simulator: Geophysical Prospecting,14759, 814–833. → pages 64Tarantola, A., 1987, Inverse problem theory: Methods for data fitting andparameter estimation. → pages 94——–, 2005, Inverse problem theory and methods for model parameterestimation: siam, 89. → pages 94Tarantola, A., and B. Valette, 1982a, Generalized nonlinear inverseproblems solved using the least squares criterion: Reviews ofGeophysics, 20, 219–232. → pages 2, 11, 37, 93——–, 1982b, Inverse problems = quest for information: Journal ofGeophysics, 50, 159–170. → pages 14, 94Tu, N., A. Aravkin, T. van Leeuwen, T. Lin, and F. J. Herrmann, 2016,Source estimation with surface-related multiples—fastambiguity-resolved seismic imaging: Geophysical Journal International,205, 1492–1511. → pages 10Tu, N., A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, 2013, Fastleast-squares migration with multiples and source estimation: Presentedat the 75th EAGE Conference and Exhibition 2013, EAGE. → pages 22,24Tu, N., and F. J. Herrmann, 2015a, Fast imaging with surface-relatedmultiples by sparse inversion: Geophysical Journal International, 201,304–317. → pages 25——–, 2015b, Fast least-squares imaging with surface-related multiples:Application to a north sea data set: The Leading Edge, 34, 788–794. →pages 2, 7, 11van Leeuwen, T., A. Y. Aravkin, and F. J. Herrmann, 2011, Seismicwaveform inversion by stochastic optimization: International Journal ofGeophysics, 2011. → pages 25——–, 2014, Comment on: “Application of the variable projection schemefor frequency-domain full-waveform inversion” (M. Li, J. Rickett, and A.Abubakar, Geophysics, 78, no. 6, R249–R257): Geophysics, 79(3),X11–X17. → pages 46, 121van Leeuwen, T., and F. J. Herrmann, 2013a, Fast waveform inversionwithout source-encoding: Geophysical Prospecting, 61, 10–19. → pages2, 52, 57, 111, 122——–, 2013b, Mitigating local minima in full-waveform inversion byexpanding the search space: Geophysical Journal International, 195,661–667. → pages 12, 37, 38, 39, 40, 41, 42, 85, 94, 98, 99——–, 2015, A penalty method for PDE-constrained optimization ininverse problems: Inverse Problems, 32, 015007. → pages 12, 37, 38, 39,40, 41, 52, 56, 58, 94, 95, 98, 99, 105, 111, 112148Vigh, D., K. Jiao, W. Huang, N. Moldoveanu, and J. Kapoor, 2013,Long-offset-aided full-waveform inversion: Presented at the 75th EAGEConference and Exhibition 2013, EAGE. → pages 12Vigh, D., K. Jiao, D. Watts, and D. Sun, 2014, Elastic full-waveforminversion application using multicomponent measurements of seismicdata collection: Geophysics, 79, R63–R77. → pages 11, 37Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion inexploration geophysics: Geophysics, 74(6), WCC1–WCC26. → pages 1,2, 11, 14, 37, 39, 45, 70, 87, 93, 94, 128Warner, M., and L. Guasch, 2014, Adaptive Waveform Inversion-FWIWithout Cycle Skipping-Theory: Presented at the 76th EAGEConference and Exhibition 2014, EAGE. → pages 12Warner, M., T. Nangoo, N. Shah, A. Umpleby, J. Morgan, et al., 2013a,Full-waveform inversion of cycle-skipped seismic data by frequencydown-shifting: 83th Annual International Meeting, SEG, ExpandedAbstracts, 903–907. → pages 12, 39Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V.Vinje, I. Sˇtekl, L. Guasch, C. Win, et al., 2013b, Anisotropic 3Dfull-waveform inversion: Geophysics, 78, R59–R80. → pages 2, 11, 37Whitmore, N., 1983, Iterative depth migration by backward timepropagation, in SEG Technical Program Expanded Abstracts 1983:Society of Exploration Geophysicists, 382–385. → pages 2Yilmaz, O¨., 2001, Seismic data analysis: Society of ExplorationGeophysicists Tulsa, 1. → pages 7Ying, L., L. Demanet, and E. Candes, 2005, 3D discrete curvelettransform: Wavelets XI, International Society for Optics and Photonics,591413. → pages 22Zhou, B., and S. A. Greenhalgh, 2003, Crosshole seismic inversion withnormalized full-waveform amplitude data: Geophysics, 68, 1320–1330.→ pages 12, 46Zhu, H., S. Li, S. Fomel, G. Stadler, and O. Ghattas, 2016, A Bayesianapproach to estimate uncertainty for full-waveform inversion using apriori information from depth migration: Geophysics, 81(5),R307–R323. → pages 16, 94, 106149
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Source estimation and uncertainty quantification for...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Source estimation and uncertainty quantification for wave-equation based seismic imaging and inversion Fang, Zhilong 2018
pdf
Page Metadata
Item Metadata
Title | Source estimation and uncertainty quantification for wave-equation based seismic imaging and inversion |
Creator |
Fang, Zhilong |
Publisher | University of British Columbia |
Date Issued | 2018 |
Description | In modern seismic exploration, wave-equation-based inversion and imaging approaches are widely employed for their potential of creating high-resolution subsurface images from seismic data by using the wave equation to describe the underlying physical model of wave propagation. Despite their successful practical applications, some key issues remain unsolved, including local minima, unknown sources, and the largely missing uncertainty analyses for the inversion. This thesis aims to address the following two aspects: to perform the inversion without prior knowledge of sources, and to quantify uncertainties in the inversion. The unknown source can hinder the success of wave-equation-based approaches. A simple time shift in the source can lead to misplaced reflectors in linearized inversions or large disturbances in nonlinear problems. Unfortunately, accurate sources are typically unknown in real problems. The first major contribution of this thesis is, given the fact that the wave equation linearly depends on the sources, I have proposed on-the-fly source estimation techniques for the following wave-equation-based approaches: (1) time-domain sparsity-promoting least-squares reverse-time migration; and (2) wavefield-reconstruction inversion. Considering the linear dependence of the wave equation on the sources, I project out the sources by solving a linear least-squares problem, which enables us to conduct successful wave-equation-based inversions without prior knowledge of the sources. Wave-equation-based approaches also produce uncertainties in the resulting velocity model due to the noisy data, which would influence the subsequent exploration and financial decisions. The difficulties related to practical uncertainty quantification lie in: (1) expensive computation related to wave-equation solves, and (2) the nonlinear parameter-to-data map. The second major contribution of this thesis is the proposal of a computationally feasible Bayesian framework to analyze uncertainties in the resulting velocity models. Through relaxing the wave-equation constraints, I obtain a less nonlinear parameter-to-data map and a posterior distribution that can be adequately approximated by a Gaussian distribution. I derive an implicit formulation to construct the covariance matrix of the Gaussian distribution, which allows us to sample the Gaussian distribution in a computationally efficient manner. I demonstrate that the proposed Bayesian framework can provide adequately accurate uncertainty analyses for intermediate to large-scale problems with an acceptable computational cost. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2018-05-01 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0366129 |
URI | http://hdl.handle.net/2429/65731 |
Degree |
Doctor of Philosophy - PhD |
Program |
Geophysics |
Affiliation |
Science, Faculty of Earth, Ocean and Atmospheric Sciences, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2018-09 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2018_september_fang_zhilong.pdf [ 4.62MB ]
- Metadata
- JSON: 24-1.0366129.json
- JSON-LD: 24-1.0366129-ld.json
- RDF/XML (Pretty): 24-1.0366129-rdf.xml
- RDF/JSON: 24-1.0366129-rdf.json
- Turtle: 24-1.0366129-turtle.txt
- N-Triples: 24-1.0366129-rdf-ntriples.txt
- Original Record: 24-1.0366129-source.json
- Full Text
- 24-1.0366129-fulltext.txt
- Citation
- 24-1.0366129.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0366129/manifest