@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en, "Mathematics, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Liu, Michelle"@en ; dcterms:issued "2018-01-19T16:57:58Z"@en, "2018"@en ; vivo:relatedDegree "Master of Science - MSc"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """Optimization problems with PDE constraints arise in many applications of science and engineering. The PDEs often describe physical systems and are characterized by a set of parameters, such as the material properties of the system or the velocity of a wave traveling through the earth's subsurface. Usually the parameters cannot be measured directly and can only be inferred by solving inverse problems. Solving large-scale inverse problems usually requires a prohibitively large number of PDE solves. As a result, various dimensionality reduction methods have been proposed to reduce the number of PDE solves. This work builds on one such method, the Simultaneous Source (SS) method, which relies on random sampling and a stochastic trace estimator to reduce the number of PDE solves to a more manageable size. One limitation of the SS method is that it can only be applied to data sets with no missing entries. Unfortunately, data sets with missing entries are common in practice, such as in geophysical applications. In this thesis, we propose a new coupled optimization problem that extends the SS method to data with missing entries. The block coordinate descent method (BCDM) is used to break the coupled problem into two subproblems. The first subproblem optimizes over the data and fills the missing entries by rank minimization. The second subproblem minimizes over the model parameter and uses the SS method to reduce the number of PDE solves to a more manageable size.
We test our method in the context of a full-waveform inversion (FWI) problem and present numerical results in which we reduce the number of PDE solves by 98%, with less than a 2% difference between the model obtained using 1 PDE solve and the model obtained using the full set of PDE solves."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/64406?expand=metadata"@en ; skos:note "Dimensionality Reduction for Solving Large Scale Inverse Problems with PDE Constraints through Simultaneous Source Method, by Michelle Liu. B.Sc. Geophysics, The University of British Columbia, 2015. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in The Faculty of Graduate and Postdoctoral Studies (Mathematics), The University of British Columbia (Vancouver), January 2018. © Michelle Liu, 2018.

Abstract

Optimization problems with PDE constraints arise in many applications of science and engineering. The PDEs often describe physical systems and are characterized by a set of parameters, such as the material properties of the system or the velocity of a wave traveling through the earth's subsurface. Usually the parameters cannot be measured directly and can only be inferred by solving inverse problems. Solving large-scale inverse problems usually requires a prohibitively large number of PDE solves. As a result, various dimensionality reduction methods have been proposed to reduce the number of PDE solves. This work builds on one such method, the Simultaneous Source (SS) method, which relies on random sampling and a stochastic trace estimator to reduce the number of PDE solves to a more manageable size. One limitation of the SS method is that it can only be applied to data sets with no missing entries. Unfortunately, data sets with missing entries are common in practice, such as in geophysical applications. In this thesis, we propose a new coupled optimization problem that extends the SS method to data with missing entries.
The block coordinate descent method (BCDM) is used to break the coupled problem into two subproblems. The first subproblem optimizes over the data and fills the missing entries by rank minimization. The second subproblem minimizes over the model parameter and uses the SS method to reduce the number of PDE solves to a more manageable size. We test our method in the context of a full-waveform inversion (FWI) problem and present numerical results in which we reduce the number of PDE solves by 98%, with less than a 2% difference between the model obtained using 1 PDE solve and the model obtained using the full set of PDE solves.

Lay Summary

The main computational bottleneck in solving large-scale optimization problems with PDE constraints is solving the PDEs. The Simultaneous Source (SS) method has been used to reduce the number of PDE solves and hence the computational cost. However, one of the limitations of this method is that it can only be applied to data sets with no missing entries. In this thesis, we propose a new coupled optimization problem to overcome this limitation.

Preface

This thesis contains my original research conducted under the supervision of Dr. Eldad Haber during my M.Sc. studies at the University of British Columbia. The idea of the coupled inverse problem presented in chapter 4 came from conversations with Dr. Haber. The derivations, code implementation, and numerical experiments were carried out on my own with some support and suggestions from Dr. Haber.

Table of Contents

Abstract; Lay Summary; Preface; Table of Contents; List of Tables; List of Figures; Acknowledgments
1 Introduction
2 Inverse Problems
  2.1 Inverse Problem Definitions and Terminology
  2.2 Approach for Solving Large Scale Inverse Problems with PDE Constraints
    2.2.1 Iterative Methods
3 Dimensionality Reduction through Simultaneous Source Method
  3.1 Simultaneous Source with Random Sampling
    3.1.1 Some Important Properties of Monte Carlo Approximation and SAA Method
    3.1.2 Choosing an Estimator
  3.2 Simultaneous Source Method with Deterministic Sampling
4 The SS Method for Data with Missing Entries
  4.1 Data Completion through Rank Minimization Method
    4.1.1 Low-Rank Matrix Completion
    4.1.2 Low Rank Matrices
    4.1.3 Low rank minimization and convex relaxation
    4.1.4 Incoherent matrices
  4.2 Coupled Inverse Problem
    4.2.1 Importance of the Coupled term in (4.7)
    4.2.2 Block Coordinate Descent Method (BCDM)
  4.3 Error Estimation
    4.3.1 Confidence Interval
    4.3.2 Computing the Gap in (4.17)
5 Full Wave-Form Inversion
  5.1 Experimental Setting and Numerical Setup
    5.1.1 Discretization
  5.2 Tuning the parameters
  5.3 Low Rank Data
  5.4 Numerical Results
    5.4.1 Boxed Model Results
    5.4.2 Marmousi Model
  5.5 Future Work
6 Conclusion
Bibliography
A Stochastic Trace Estimator

List of Tables

Table 5.1 Summary of results for 25% missing entries
Table 5.2 Summary of results for 50% missing entries
Table 5.3 Summary of results for 75% missing entries
Table 5.4 Summary of results for 85% missing entries
Table 5.5 Summary of results for 90% missing entries
Table 5.6 Marmousi Model: 30% missing data

List of Figures

Figure 3.1 Fast decaying singular values
Figure 4.1 The plane cuts at $D_{\mathrm{Full}} = D^*_{\mathrm{Full}}$ along the $m$ axis. Image generated from [1]. We emphasize that the purpose of this figure is only to provide some intuition about what might be wrong with the First Approach; the graph is not the actual graph of $\psi(m, D_{\mathrm{Full}})$.
Figure 4.2 Recovered model from data with 50% missing entries. 1 PDE solve was used per misfit to recover the models, with $\mathrm{misfit} = \frac{1}{2\alpha^2}\|\mathrm{Mask}\, D_{\mathrm{Full}} - D_{\mathrm{obs}}\|_F + \frac{1}{2}\|P_0 H^{-1}(m) Q - D_{\mathrm{Full}}\|_F$.
Figure 4.3 $D^*_{\mathrm{Full}}$ represents the recovered data for 50% missing entries, and $m^*$ was computed using 1 PDE solve at every iteration (see section 5.4.1 for complete results). The confidence intervals have some associated probability of capturing the true mean $\psi(m^*, D^*_{\mathrm{Full}})$ (the blue star in the graph). The intervals in the plot all successfully captured the true mean, except for the 68% confidence interval at L = 5e5. This is no surprise if we recall that a 68% confidence interval means there is a 68% chance the interval contains the true mean and a 32% chance it does not. Notice that as the sample size increases, the confidence interval shrinks, as expected.
Figure 4.4 The histogram on the right-hand side shows a sampling distribution for $\hat\psi_L(m^*, D^*_{\mathrm{Full}})$. The histogram consists of 5000 sample points, with sample size L = 200. The QQ plot is approximately linear, which indicates the sampling distribution is approximately normal.
Figure 4.5 The histogram on the right-hand side shows a sampling distribution for $\hat\psi(m^*, D^*_{\mathrm{Full}})$. The distribution consists of 5000 sample points, with sample size L = 5. The QQ plot is not linear, which suggests the sampling distribution is not normal. This is confirmed by the histogram, which is slightly skewed; thus the chosen sample size is too small.
Figure 5.1 Marine acquisition [2]
Figure 5.2 The air gun (represented by the yellow star) is fired every 20 km as it moves toward the left of the page; the reflected wave is recorded by a stream of receivers (red dots)
Figure 5.3 The singular values for $D_{\mathrm{true}}$
Figure 5.4 True box model and data
Figure 5.5 Recovered model from data with 25% missing entries
Figure 5.6 Recovered data for 25% missing entries ($D_{\mathrm{Deterministic}}$: full 60 PDE solves per misfit; $D_{1\mathrm{PDE}}$: 1 PDE solve per misfit)
Figure 5.7 Misfit per outer iteration for 25% missing entries
Figure 5.8 Recovered model from data with 50% missing entries
Figure 5.9 Recovered data for 50% missing entries ($D_{60\mathrm{PDE}}$, $D_{10\mathrm{PDE}}$, $D_{1\mathrm{PDE}}$)
Figure 5.10 Misfit per outer iteration for 50% missing entries
Figure 5.11 Recovered model from data with 75% missing entries
Figure 5.12 Recovered data for 75% missing entries
Figure 5.13 Misfit per outer iteration for 75% missing entries
Figure 5.14 Recovered model from data with 85% missing entries
Figure 5.15 Recovered data for 85% missing entries
Figure 5.16 Misfit per outer iteration for 85% missing entries
Figure 5.17 Recovered model from data with 90% missing entries
Figure 5.18 Recovered data for 90% missing entries
Figure 5.19 Misfit per outer iteration for 90% missing entries
Figure 5.20 We define $\%\,\mathrm{Data\ difference} = \frac{\|D_{\mathrm{true}} - D_{\mathrm{Deterministic}}\|}{\|D_{\mathrm{true}}\|} \times 100\%$ and $\%\,\mathrm{Model\ difference} = \frac{\|m_{\mathrm{true}} - m_{\mathrm{Deterministic}}\|}{\|m_{\mathrm{Deterministic}}\|} \times 100\%$
Figure 5.21 Marmousi Model

Acknowledgments

First and foremost, I would like to thank my advisor Dr. Eldad Haber for providing me the opportunity to join his research group, and for giving me such a fun problem to work on. Given how much math terrified and hurt my brain in elementary school, I don't think anyone, including myself, would have believed that I would go on to university to pursue a higher degree in mathematics. This degree would not have been possible without the support of many great people throughout my academic career. Special thanks to my high school physics teacher Gordon Wiegele and math teacher Michael Partington. Thanks to my officemates Luz, Patrick, Daniel, Sanna, and Wenke for their daily warm morning welcome and for creating such a pleasant work environment. Many thanks to Rajiv, Eran, Gary, and Uri for helpful discussions on my work. Last but not least, I would like to thank my wonderful family and friends outside of UBC for their endless encouragement and support.

Chapter 1 Introduction

Physical systems are often represented by mathematical models. The mathematical models contain a set of model parameters (or model) that characterize the systems and their behavior. The model parameters usually cannot be measured directly, but they can be inferred by solving inverse problems. Large-scale inverse problems involving PDEs arise in many applications of science and engineering and require a large number of measurements in order to obtain a reasonable reconstruction of the model. For example, seismic data are often acquired by the oil and gas industry in an attempt to obtain subsurface information about the Earth by inferring the velocity model. In marine acquisition, seismic data are obtained by a ship towing a compressed air gun (i.e., a seismic source) and a stream of receivers.
The air gun is usually fired at regular intervals and produces an acoustic wave that propagates deep into the ocean floor. Seismic data are collected when part of the wave is reflected from beneath the sea floor back to the surface, where it is recorded by the receivers. Full-waveform inversion is performed on the data to obtain the velocity model (i.e., the model). The model can then be used to provide a subsurface image of the Earth. Seismic acquisitions usually involve many shots (or experiments) in order to get a credible reconstruction of the Earth's subsurface. Each shot corresponds to one PDE solve. Solving PDEs can be computationally intensive, and the computational cost and memory increase linearly with the number of experiments. Thus, the computational cost of solving large-scale inverse problems is high. Indeed, an efficient algorithm is one that reduces the number of PDEs solved while still giving a credible reconstruction of the model at a lower cost [19]. The Simultaneous Source (SS) method has demonstrated some success in reducing the number of PDE solves (see [19], [14], [13]). The method is a dimensionality reduction method that relies on random sampling and the stochastic trace estimator to reduce the number of PDE solves to a more manageable size. One of the limitations of this method is that it can only be applied to data sets with no missing entries. When a data set has missing entries, the simultaneous source term is lost in the mathematical formulation of the problem. Unfortunately, data with missing measurements are common in practice. In the example above, seismic data with missing entries can arise when experiments are not allowed in certain regions during acquisition for environmental reasons. As a result, the data will have missing measurements corresponding to the regions where data could not be collected. One way to overcome this limitation is to fill in those missing entries.
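The thesis fills missing entries via nuclear norm minimization (chapter 4). As a generic, hedged sketch of that family of methods, and not the thesis's exact algorithm, missing entries of an approximately low-rank matrix can be filled by alternating singular-value soft-thresholding with a data-consistency projection. All names and the threshold heuristic below are illustrative.

```python
import numpy as np

def svt_complete(D_obs, mask, tau=None, n_iter=200):
    """Fill missing entries of D_obs (zeros where mask is False) by
    repeatedly soft-thresholding the singular values, a standard
    surrogate for nuclear-norm minimization, while keeping the
    observed entries fixed."""
    D = D_obs.copy()
    if tau is None:
        tau = 0.1 * np.linalg.norm(D_obs, 2)  # heuristic threshold
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # shrink singular values
        D = U @ np.diag(s) @ Vt               # low-rank reconstruction
        D[mask] = D_obs[mask]                 # re-impose observed entries
    return D
```

For an exactly low-rank matrix with enough observed entries this simple scheme already recovers the missing values well; the thesis instead couples the completion step with the PDE misfit, which this sketch does not attempt.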
Various approaches have been proposed (see [13], [21], [20], [16]). The nuclear norm minimization technique is one of the most cited and well-studied data completion techniques to date, in part because it has demonstrated great success in recovering a full data set from a small subset of its entries. In this thesis, we extend the SS method to data with missing entries by filling those entries using the nuclear norm minimization technique. We propose a new coupled optimization problem that allows us to apply the SS method to data with missing entries. The coupled problem can be broken into two optimization subproblems. The first subproblem fills the data by minimizing over the data set. The second subproblem uses the simultaneous source method to reduce the number of PDE solves to a more manageable size. We test our method in the context of a full-waveform inversion problem, in which we are able to reduce 60 PDE solves to 1 PDE solve for data with up to 75% of entries missing. The outline of this thesis is as follows. In chapter 2, we review the mathematical framework for inverse problems with PDE constraints. In chapter 3, we introduce the SS method. In chapter 4, we propose a new algorithm that extends the SS method to data with missing entries. In chapter 5, we test our method in the context of full-waveform inversion and show our results.

Chapter 2 Inverse Problems

Inverse problems arise in many applications of science and engineering, and practitioners in these disciplines have likely encountered and solved inverse problems at least once. For example, curve fitting is a popular inverse problem: given a set of data points and a curve, the set of coefficients (i.e., the model) that gives the best fit is computed by solving an inverse problem. This chapter is organized as follows. In section 2.1, we review some important terminology in inverse theory.
In section 2.2, we introduce inverse problems with PDE constraints.

2.1 Inverse Problem Definitions and Terminology

Consider the following mathematical model:

$F(m) = d$ (2.1)

where $F$ represents the forward operator, which often takes the form of one or more equations relating the model to the data. For example, a forward operator can be a wave equation that relates the measured travel time (i.e., the data) to the wave velocity (i.e., the model). The problem in (2.1) is a forward problem if we are given a model $m$ and asked to predict the data $d$. Conversely, (2.1) is an inverse problem if we are given $d$ and asked to compute the model $m$. Unlike a forward problem, the solution of an inverse problem is not unique (i.e., more than one $m$ gives the same $d$).

It is worth pointing out that there is some inconsistency in terminology between mathematicians and scientists [3]. Mathematicians usually refer to $F(m) = d$ as the mathematical model (or just the model) and to $m$ as the parameter. Scientists, on the other hand, often refer to $F$ as the forward operator and to $m$ as the model. We adopt the scientists' terminology from this point forward.

A model $m \in \mathbb{R}^{l_m}$ consists of model parameters. These can be physical parameters (such as seismic velocity, density, or resistivity) that characterize the system, or they can be coefficients such as those in curve fitting. Since measured data $d$ always come in discrete form in practice, the numerically recovered model will also be discrete.

2.2 Approach for Solving Large Scale Inverse Problems with PDE Constraints

By a large-scale (or high-dimensional) problem, we mean a large data set, or a large number of experiments that resulted in a large data set. The goal of solving an inverse problem is to recover the model from the data. This is often accomplished by solving an optimization problem in which we find a model parameter $m$ that minimizes the least-squares misfit between the observed and predicted data within some tolerance, i.e.

$\arg\min_m \sum_{i=1}^{N} \| F_i(m) - d_i \|_2^2$ (2.2)

where $F_i$ represents a forward operator. The chosen norm in (2.2) is problem dependent. For most inverse problems the $\ell_2$-norm is a popular choice, due to maximum likelihood estimation (MLE): the model $m$ that maximizes the likelihood of obtaining the data set $\{d_i\}_{i=1}^N$ as an outcome is the one that solves the least-squares problem. Throughout this thesis, we restrict ourselves to the following PDE-constrained inverse problem:

$\arg\min_{u_1,\dots,u_N,\,m} \; \frac{1}{2N} \sum_{i=1}^{N} \| P_i u_i - d_i \|_2^2 \quad \text{subject to} \quad c_i(u_i, m) := H(m)\,u_i - q_i = 0, \quad i = 1,\dots,N,$ (2.3)

where $m \in \mathbb{R}^{l_m}$ represents the discretized model, $q_i \in \mathbb{R}^{l_q}$ represents the $i$th source that emitted the field $u_i \in \mathbb{R}^{l_u}$, $P_i$ is the matrix that maps the discretized field $u_i$ to the locations where the data $d_i \in \mathbb{R}^{l}$ were collected, and $H$ is a square matrix discretizing the PDE plus boundary conditions. We assume it is possible to compute the field $u_i$ given $m$. With this assumption, (2.3) can be written in the form of (2.2) as follows:

$\arg\min_m \sum_{i=1}^{N} \| P_i H^{-1}(m)\, q_i - d_i \|^2$ (2.4)

where $F_i(m) = P_i H^{-1}(m) q_i$ is called the forward problem. Since inverse problems are often ill-posed, we add a regularization function $R(m)$ to (2.4):

$\arg\min_m \sum_{i=1}^{N} \| d_i - P_i H^{-1}(m)\, q_i \|^2 + \alpha R(m),$ (2.5)

If $P = P_i \;\forall i$, then (2.5) can be expressed in the Frobenius norm:

$\arg\min_m \| P H^{-1}(m)\, Q - D \|_F^2 + \alpha R(m),$ (2.6)

where $D \in \mathbb{R}^{l \times N}$ and $Q \in \mathbb{R}^{l_q \times N}$. $\alpha$ is a regularization parameter and can be determined using either the cross-validation method or trial and error. Throughout this thesis, we use the words source and experiment interchangeably.

2.2.1 Iterative Methods

Suppose we want to find an $m$ that minimizes the misfit $\phi(m) = \| D - F(m) \|_F^2$ in (2.6), where $F(m) = P H^{-1}(m) Q$. Given the sensitivity matrix

$J(m) = \frac{dF(m)}{dm},$

we compute the gradient

$g = \nabla\phi(m) = J^{\mathsf T}\big(F(m) - D\big),$

and approximate the (Gauss-Newton) Hessian $\nabla^2\phi(m)$:

$\nabla^2\phi(m) = J^{\mathsf T} J + \big(F(m) - D\big)\frac{dJ^{\mathsf T}(m)}{dm} \;\approx\; S = J^{\mathsf T} J.$

Consider

$S_k\, \delta m_k = -g_k$ (2.7)

and

$m_{k+1} = m_k + \alpha_k\, \delta m_k,$ (2.8)

where $\delta m_k$ is a search direction and $g_k = \nabla\phi(m_k)$.
$\alpha_k$ in (2.8) represents a step size and can be computed using an Armijo backtracking line search. Since we want to minimize the misfit, we require $\delta m_k$ to be a descent direction (i.e., $\delta m_k^{\mathsf T} g_k < 0$). This happens when $S_k^{-1}$ is symmetric positive definite (SPD): when $S_k^{-1}$ is SPD, $g_k^{\mathsf T}\delta m_k = -g_k^{\mathsf T} S_k^{-1} g_k < 0$. Different choices of $S_k^{-1}$ correspond to different iterative solvers. The Gauss-Newton iterative method is our method of choice, because $S_k = J^{\mathsf T} J$ is positive semi-definite and so the search direction is guaranteed to be a descent direction at every iteration.

Computing the search direction $\delta m_k$ requires solving the Gauss-Newton linear system in (2.7). Forming $J$ explicitly to compute matrix-vector products such as $(J^{\mathsf T}J)\,g_k$ at every iteration is numerically expensive and not practical for large-scale problems. Instead, we perform the products implicitly in order to reduce the computational cost and storage. Since $S_k$ is SPD in (2.7), the method of choice is the preconditioned conjugate gradient (PCG) method. In this method we do not compute the sensitivity matrix $J$ explicitly; rather, we work only with matrix-vector products $z = Jv$ and the adjoint products $J^{\mathsf T} z$. Details and theoretical discussion of the conjugate gradient method can be found in [18][24]. We used the Laplacian as our preconditioner.

The Gauss-Newton method described above is summarized in the following algorithm [12]:

Algorithm 1 Gauss Newton
1. Initialize $m_0$
2. while not converged
3.   Compute $\phi(m)$ and $\nabla\phi(m)$ (this requires solving 2 forward problems)
4.   Solve the system in (2.7) using PCG (this costs $2 N_{\mathrm{PCG}}$ solutions of the forward problem because we have to apply $J$ and $J^{\mathsf T}$)
5.   Set $m \leftarrow m + \alpha_k\, \delta m_k$
6. end while

If $N_{\mathrm{CG}}$ denotes the number of outer Gauss-Newton iterations, $N_{\mathrm{PCG}}$ the average number of PCG iterations, and there are $N$ sources, then the work estimate for the algorithm above is:

work estimate $= (2 N_{\mathrm{PCG}} + 2) \times N_{\mathrm{CG}} \times N_{\mathrm{sources}}$

Indeed, the greater the number of sources, the greater the work estimate. This demonstrates the value of using the SS method.
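To make the source-reduction idea concrete, here is a hedged sketch (all names hypothetical; a small dense matrix A stands in for the expensive $PH^{-1}(m)$ solves) of how random simultaneous sources estimate the Frobenius misfit with K solves instead of N, in the style of the Hutchinson trace estimator the thesis builds on:

```python
import numpy as np

def misfit_full(A, Q, D):
    # Deterministic misfit ||A Q - D||_F^2: one "PDE solve" per column of Q.
    return np.linalg.norm(A @ Q - D, "fro") ** 2

def misfit_ss(A, Q, D, K, rng):
    # SS estimate: mix the N sources into K simultaneous sources with
    # i.i.d. Rademacher (+/-1) weights. Since E[W W^T / K] = I, this is
    # an unbiased estimate of the full misfit, at K solves instead of N.
    N = Q.shape[1]
    W = rng.choice([-1.0, 1.0], size=(N, K))
    R = (A @ (Q @ W) - D @ W) / np.sqrt(K)   # K "solves" instead of N
    return np.linalg.norm(R, "fro") ** 2
```

With K = 1 this corresponds to the most aggressive setting reported in the thesis, one PDE solve per misfit evaluation; larger K trades extra solves for lower estimator variance.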
If the SS method can be applied to reduce $N_{\mathrm{sources}}$ to $K$ sources (where $K \ll N_{\mathrm{sources}}$) [...] $s \gg 1$ sources. Assume all measurements are performed at the same location (i.e., $P_i = P \;\forall i$); then the forward problem that gives the predicted data for $s$ sources is:

$D = P H^{-1}(m)\, Q$ (5.3)

where $Q = [q_1, \dots, q_s] \in \mathbb{R}^{l_q \times s}$ and $D = [d_1, \dots, d_s] \in \mathbb{R}^{l \times s}$. We will look at two models: 1) the boxed model and 2) the Marmousi model. To get the data $D_{\mathrm{True}}$, we plug the model into (5.3). To generate synthetic observed data, we compute $D_{\mathrm{obs}} = M D_{\mathrm{True}}$, where $M$ is the masking matrix that randomly removes columns of the true data. The missing columns represent broken sources or receivers during acquisition. We recover the model from the data with missing columns by solving the problem in (4.7). Results can be found in section 5.4.

5.2 Tuning the parameters

The following is the general approach we used to tune our parameters.

Choosing the grid size h. For our numerical example, we let 1 wavelength span 20 grid points. This corresponds to $h \times h = 0.05\,\mathrm{km} \times 0.05\,\mathrm{km}$.

Choosing the wavefield velocity. For the boxed model, we chose the initial velocity to be the surface velocity, which is 1e-4 m/s.

Setting the angular frequency ω in the Helmholtz equation. Given that we know the velocity and wavelength of the wave field, we can compute its frequency, $f = v/\lambda = v/(10\,h)$, and subsequently its angular frequency, $\omega = 2\pi f$.

Attenuation function $\gamma = \gamma(A, \theta)$. The attenuation parameter in (5.1) is defined as $\gamma(A,\theta) = A\big(e^{-\theta X_1} + e^{-\theta X_2} + e^{-\theta X_3}\big)$, where $X_1$, $X_2$ and $X_3$ represent the left, right, and bottom boundaries. The amplitude $A$ represents the intensity of the absorption at the boundary, and $\theta$ is inversely proportional to the thickness of the absorbing boundary. In other words, the greater the $\theta$, the thinner the boundary; conversely, the smaller the $\theta$, the thicker the boundary. We want to choose $\theta$ such that the thickness of the absorbing boundary approximately equals $\lambda$, so that the wave gets absorbed. If $A = 0$, then $\gamma = 0$.
In this case, nothing gets absorbed and we get some reflections from the boundary.

5.3 Low Rank Data

Recall that in order to perform matrix completion on our data, the data has to be either low rank or approximately low rank. A low-rank matrix can be loosely described as one in which many of the rows are linearly dependent on each other. Let the rows of the observed matrix $D_{\mathrm{obs}}$ represent the sources and the columns represent the receivers. The entry $d_{ij}$ can then be viewed as the measurement made at the $j$th receiver due to the $i$th source. Two rows are approximately linearly dependent if the receivers make very similar measurements in two different experiments (i.e., shots). To illustrate how this can happen in practice, consider figure 5.2, where we assume all experiments are measured by the same set of receivers:

Figure 5.2: The air gun (represented by the yellow star) is fired every 20 km as it moves toward the left of the page. The reflected wave is recorded by a stream of receivers (red dots).

In figure 5.2, the ship fires the air gun (i.e., the source) every 20 km: $q_1$ represents the experiment performed at 20 km, $q_2$ the second experiment performed at 40 km, etc. Observe that at 60 km, 80 km and 100 km, the receivers should record very similar wave fields, because the geological features beneath the sea floor from which the wave is reflected are very similar. Thus, we would expect the data $d_5, d_4, d_3$ collected from experiments $q_5, q_4, q_3$ respectively to be very similar, and the data $d_1, d_2$ to be different because the geological features beneath are different. When we put these data into a matrix, i.e. $D_{\mathrm{obs}} = [d_1, d_2, d_3, d_4, d_5, d_6]$, the matrix should be approximately low rank because most of the rows are approximately linearly dependent.

To verify whether $D_{\mathrm{true}}$ has low-rank structure, we plot its singular values. Although the singular values are decaying, they do not resemble the L shape (see figure ??) that a truly low-rank matrix would exhibit.
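The singular-value check just described can be automated; the following is a hedged sketch (hypothetical helper, not the thesis code) that counts singular values above a fraction of the largest as a simple proxy for "approximately low rank":

```python
import numpy as np

def numerical_rank(D, tol=1e-2):
    """Count singular values above tol * (largest singular value),
    a simple proxy for approximate low-rankness."""
    s = np.linalg.svd(D, compute_uv=False)   # singular values, descending
    return int(np.sum(s > tol * s[0])), s

# A matrix whose rows repeat, mimicking similar measurements
# recorded from nearby shots over similar geology.
row = np.linspace(0.0, 1.0, 8)
D = np.vstack([row, row, 2 * row, row + 0.5])
rank, s = numerical_rank(D)
```

Plotting `s` on a log scale gives exactly the kind of singular-value decay plot used in Figure 5.3 to judge whether completion is worth attempting.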
Thus, we say our data is approximately low rank. Because our data is low rank, there isno theorem that would guarantee that we can get an exact recovery of the matrix from a few sample of itsentries. However, we may still get a satisfying recovery. One way to know for sure if matrix completionmethod will give us a reasonable recovery from subset the entries of DTrue is to simply try it out.30Figure 5.3: The singular values for Dtrue.5.4 Numerical ResultsWe now present our numerical results for a) the boxed model, and b) the marmousi model.5.4.1 Boxed Model ResultsThe true boxed model and data that we will recover are the following:31Figure 5.4: True box model and data.In what follows, we will recover the model and data by solving the coupled inverse problem that wepropose in previous chapter. We will examine data with 25%, 50% 75%, 85% and 90% missing data entriesby uniformly and randomly removing the columns of the data (which represents missing sources). Thepurpose of doing so is to: 1) determine at what % missing entries the method break down, and 2) get an ideaof how well the model obtained from SAA method approximates the model from deterministic method.32Recovered Model and Data from 25% missing entries DataFigure 5.5: Recovered model from data with 25% missing entries.Figure 5.6: DDeterministic in the above title represents the recovered data by solving the full 60 PDEsfor one misfit. Similarly, D1PDE represents 1 PDE solved for one misfit.33Figure 5.7: The misfit in the plot was computed in the main (i.e. outer) iteration. We defined themisfit as: misfit = 12a2 ||MaskDFullDobs ||F + 12 || P0H1(m)QDFull ||F . Please be awarethat the graphs are converging to zero as the number of iterations increase. 
This is not the same as saying the misfit has reached zero.

Table 5.1: Summary of results for data with 25% of entries missing

Number of PDE solves per misfit | # of iterations | % model diff = |(m_60PDE - m_approx)/m_60PDE| × 100% | % data diff = |(D_60PDE - D_approx)/D_60PDE| × 100%
Deterministic: full 60 PDE solves | 60 | -- | --
SAA: 1 PDE solve | 24 | 1.11 | 0.829

Recovered Model and Data from Data with 50% Missing Entries

[Author's note: the data for the 50% missing-entries case still needs to be updated; the 1-PDE-solve run is still running on the cluster.]

Figure 5.8: Recovered model from data with 50% missing entries.

Figure 5.9: D_60PDE in the above title represents the data recovered by solving the full 60 PDEs for one misfit. Similarly, D_1PDE represents the data recovered by solving 1 PDE for one misfit. The same definition applies to D_10PDE.

Figure 5.10: The misfit in the plot was computed in the main (outer) iteration. We defined the misfit as

\text{misfit} = \frac{\alpha_2}{2} \| \text{Mask} \odot D_{\text{Full}} - D_{\text{obs}} \|_F + \frac{1}{2} \| P^\top H(m)^{-1} Q - D_{\text{Full}} \|_F .

Please be aware that the graphs are converging to some constant value as the number of iterations increases. This is not the same as saying the graphs have reached a constant value.

Table 5.2: Summary of results for data with 50% of entries missing

Number of PDE solves per misfit | # of iterations | % model diff = |(m_60PDE - m_approx)/m_60PDE| × 100% | % data diff = |(D_60PDE - D_approx)/D_60PDE| × 100%
Deterministic: full 60 PDE solves | 22 | -- | --
SAA: 1 PDE solve | 38 | 1.36 | 1.47
SAA: 10 PDE solves | 22 | 1.11 | 1.16

Recovered Model and Data from Data with 75% Missing Entries

Figure 5.11: Recovered model from data with 75% missing entries.

Figure 5.12: Recovered data.

Figure 5.13: The misfit in the plot was computed in the main (i.e. outer) iteration.
We defined the misfit as

\text{misfit} = \frac{\alpha_2}{2} \| \text{Mask} \odot D_{\text{Full}} - D_{\text{obs}} \|_F + \frac{1}{2} \| P^\top H(m)^{-1} Q - D_{\text{Full}} \|_F .

Table 5.3: Summary of results for data with 75% of entries missing

Number of PDE solves per misfit | # of iterations | % model diff = |(m_60PDE - m_approx)/m_60PDE| × 100% | % data diff = |(D_60PDE - D_approx)/D_60PDE| × 100%
Deterministic: full 60 PDE solves | 60 | -- | --
SAA: 20 PDE solves | 130 | 1.6 | 1.71
SAA: 50 PDE solves | 84 | 0.709 | 0.945

Recovered Model and Data from Data with 85% Missing Entries

Figure 5.14: Recovered model from data with 85% missing entries.

Figure 5.15: Recovered data.

Figure 5.16: The misfit in the plot was computed in the main (i.e. outer) iteration. We defined the misfit as

\text{misfit} = \frac{\alpha_2}{2} \| \text{Mask} \odot D_{\text{Full}} - D_{\text{obs}} \|_F + \frac{1}{2} \| P^\top H(m)^{-1} Q - D_{\text{Full}} \|_F .

Table 5.4: Summary of results for data with 85% of entries missing

Number of PDE solves per misfit | # of iterations | % model diff = |(m_60PDE - m_approx)/m_60PDE| × 100% | % data diff = |(D_60PDE - D_approx)/D_60PDE| × 100%
Deterministic: full 60 PDE solves | 35 | -- | --
SAA: 10 PDE solves | 61 | 6.68 | 8.15

Recovered Model and Data from Data with 90% Missing Entries

Figure 5.17: Recovered model from data with 90% missing entries.

Figure 5.18: Recovered data.

Figure 5.19: The misfit in the plot was computed in the main (i.e. outer) iteration. We defined the misfit as

\text{misfit} = \frac{\alpha_2}{2} \| \text{Mask} \odot D_{\text{Full}} - D_{\text{obs}} \|_F + \frac{1}{2} \| P^\top H(m)^{-1} Q - D_{\text{Full}} \|_F .

Table 5.5: Summary of results for data with 90% of entries missing

Number of PDE solves per misfit | # of iterations | % model diff = |(m_60PDE - m_approx)/m_60PDE| × 100% | % data diff = |(D_60PDE - D_approx)/D_60PDE| × 100%
Deterministic: full 60 PDE solves | 67 | -- | --
SAA: 60 PDE solves | 91 | 1.1 | 6.26

The two plots below demonstrate how the quality of the recovered data affects the quality of the recovered model. They show that when the % data difference is high (i.e. the recovered data may not be a very good approximation of the true data), the % model difference is also high. For example, the recovery of the data is poor when we try to recover 90% of the missing data.
As a result, we also get a poor recovery of the actual model.

Figure 5.20: We define %Data difference = |(D_true - D_Deterministic)/D_true| × 100% and %Model difference = |(m_true - m_Deterministic)/m_Deterministic| × 100%.

Discussion

Recall that one of the goals of using the SS method is to recover a credible model while solving fewer PDEs. The tables above show that the models recovered with 1 PDE solve per misfit are very similar to those recovered with the full number of PDE solves; more specifically, the difference between the model from 1 PDE solve and the model from the full PDE solves is less than 2%. The deterministic and SAA methods give a reasonable recovery of the model for up to 75% missing entries. We begin to get a poor recovery of the data when we try to recover from 85% missing data; as a result, we also get a poor recovery of the model, and vice versa.

5.4.2 Marmousi Model

Figure 5.21: Marmousi model.

Table 5.6: Marmousi model: 30% missing data

Method | # of iterations | % difference = |(m_60PDE - m_approx)/m_60PDE| × 100%
Full PDE solves (60 PDEs) per misfit | 176 | 0
1 PDE solve per misfit | 250 | 2.11
25 PDE solves per misfit | 392 | 2.16
TSVD | 40 | 54.6

Discussion

Table 5.6 shows that the percent error using 1 PDE solve is similar to that using 25 PDE solves. This suggests that it suffices to approximate the true misfit (i.e. the misfit obtained from the full number of PDE solves) using 1 PDE solve; in other words, using more than 1 PDE solve does not improve the approximation of the deterministic model by much. This observation can also be seen in Figure 5.21, where the models recovered from 1 PDE solve and 25 PDE solves look nearly indistinguishable from the full PDE solve.

If we had more time, we would increase our grid size by a factor of 3 and use multiple frequencies in our numerical approximation to get a better approximation of the true model. Using multiple frequencies would allow us to capture some of the geological features in the model that are otherwise missed when using just one frequency.
Due to time constraints, this task is left as future work.

TSVD gave the worst approximation. We initially suspected this might have to do with how we chose our regularization parameters α_1, α_2 and α_3. After much fiddling with these parameters, we believe the TSVD method does not work for our particular data set because its singular values are not decaying fast enough (see Section 3.2).

5.5 Future Work

1. Computing the nuclear norm \| D_{\text{Full}} \|_* in (4.7) is expensive for large scale data because it requires the SVD of D_Full; the computational cost of the SVD of an N × N matrix is O(N^3). We can approximate the nuclear norm and avoid computing SVDs by instead solving the following nuclear norm factorization problem [16] [25]:

\| D_{\text{Full}} \|_* = \min_{L, R} \frac{1}{2} \left( \| L \|_F^2 + \| R \|_F^2 \right) \quad \text{subject to} \quad D_{\text{Full}} = L R^\top    (5.4)

Computing \| L \|_F^2 and \| R \|_F^2 should be significantly cheaper than computing the SVD of a large matrix. Thus, we can avoid computing the SVD for the nuclear norm in (4.7) by solving the following problem:

\min_{m, L, R, D_{\text{Full}}} \frac{\alpha_1}{2} \left( \| L \|_F^2 + \| R \|_F^2 \right) + \frac{\alpha_4}{2} \| D_{\text{Full}} - L R^\top \|_F^2 + \frac{\alpha_2}{2} \| M \odot D_{\text{Full}} - D_{\text{obs}} \|_F^2 + \frac{1}{2} \| ( P^\top H(m)^{-1} Q - D_{\text{Full}} ) W \|_F^2 + \alpha_3 R(m)

Future work will consist of comparing the computational run time and the quality of the recovered model using this approach against the coupled problem in (4.7).

2. Recall that matrix completion is possible under the condition that the data is low rank (or approximately low rank). The approach in (4.7) would therefore fail if the data were full rank. One possible solution is to find a transformation that maps the matrix into a different domain that promotes low rank structure, and to recover the missing entries in the new domain. For the FWI problem, Kumar et al. [16] used the midpoint-offset transformation to map the data from the source-receiver domain to a midpoint-offset domain.
This type of transformation is equivalent to a 45° rotation of the data, in which the diagonal entries in the source-receiver domain are mapped to a single column in the new domain, promoting faster decaying singular values. If our seismic data is high rank, we could try solving (4.7) in the midpoint-offset domain. This may or may not work; we leave it as future work.

Chapter 6

Conclusion

Solving large scale nonlinear inverse problems using equation (5) can be computationally expensive. By large scale, we mean a large data set. For many applications with PDE constraints, such as seismic and DC resistivity problems, large amounts of data are required to obtain a credible reconstruction of the model. Thus, there has been a driving need to solve these large scale problems efficiently and at reduced cost. Since the computational bottleneck in solving PDE-constrained inverse problems is solving the PDEs, an efficient algorithm is one that reduces the number of PDE solves while still giving a credible reconstruction. In Chapter 3, a special type of dimensionality reduction technique called the simultaneous source (SS) method was introduced. This method samples the high dimensional data to create a lower dimensional one; the consequence is fewer PDE solves. One limitation of the SS method is that it is only applicable to data with no missing entries. In Chapter 4, we overcame this limitation by introducing a new coupled optimization problem that allows us to apply the SS method to data with missing entries. We tested our method in the context of FWI in Chapter 5. The model recovered using the full 60 PDE solves differed from the model recovered using 1 PDE solve by 2.1%.

Bibliography

[1] GeoGebra - free online geometry tool. https://www.math10.com/en/geometry/geogebra/geogebra.html. Accessed: 2017-08-30. (cited on pages viii, 18)

[2] U.S. Environmental Protection Agency. Marine Seismic Methods, 2016. URL: https://archive.epa.gov/esd/archive-geophysics/web/html/marine seismic methods.html.
Last visited on 2017/08/01. (cited on pages viii, 28)

[3] Rick Aster, Brian Borchers, and Cliff Thurber. Parameter Estimation and Inverse Problems (second edition), 2005. (cited on page 3)

[4] Haim Avron and Sivan Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM (JACM), 58(2):8, 2011. (cited on page 10)

[5] B. Peters, B. Smithyman, T. van Leeuwen, X. Li, and Z. Fang. Full-Waveform Inversion, 2015 (accessed June 2017). (cited on page 27)

[6] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010. (cited on pages 14, 21)

[7] Emmanuel J. Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010. (cited on pages 13, 14, 15)

[8] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009. (cited on pages 14, 15)

[9] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Coherent matrix completion. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 674–682, Beijing, China, 22–24 Jun 2014. PMLR. (cited on pages 14, 15)

[10] Curt Da Silva. Large-scale optimization algorithms for missing data completion and inverse problems. PhD thesis, University of British Columbia, 2017. (cited on page 12)

[11] Eldad Haber. Mini Course on PDE-Constrained Optimization, 2011. URL: http://www.pdeopt2011.unibas.ch. Last visited on 2017/07/01. (cited on page 28)

[12] Eldad Haber. Computational Methods in Geophysical Electromagnetics. SIAM, 2014. (cited on pages 5, 9, 20)

[13] Eldad Haber and Mathias Chung. Simultaneous source for non-uniform data variance and missing data. arXiv preprint arXiv:1404.5254, 2014. (cited on pages 1, 2)

[14] Eldad Haber, Matthias Chung, and Felix Herrmann. An effective method for parameter estimation with PDE constraints with multiple right-hand sides.
SIAM Journal on Optimization, 22(3):739–757, 2012. (cited on pages 1, 9, 10, 21, 24)

[15] Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433–450, 1990. (cited on pages 10, 50)

[16] Rajiv Kumar, Curt Da Silva, Okan Akalin, Aleksandr Y. Aravkin, Hassan Mansour, Benjamin Recht, and Felix J. Herrmann. Efficient matrix completion for seismic data reconstruction. Geophysics, 80(5):V97–V114, 2015. (cited on pages 2, 45)

[17] David J. C. MacKay. Introduction to Monte Carlo methods. NATO ASI Series D: Behavioural and Social Sciences, 89:175–204, 1998. (cited on page 9)

[18] Jorge Nocedal and Stephen J. Wright. Conjugate gradient methods. In Numerical Optimization, pages 101–134, 2006. (cited on page 5)

[19] Farbod Roosta-Khorasani. Randomized algorithms for solving large scale nonlinear least squares problems. PhD thesis, University of British Columbia, 2015. (cited on pages 1, 7)

[20] Farbod Roosta-Khorasani, Kees van den Doel, and Uri Ascher. Data completion and stochastic algorithms for PDE inversion problems with many measurements. Electron. Trans. Numer. Anal., 42:177–196, 2014. (cited on page 2)

[21] Farbod Roosta-Khorasani, Kees van den Doel, and Uri Ascher. Stochastic algorithms for inverse problems involving PDEs and many measurements. SIAM Journal on Scientific Computing, 36(5):S3–S22, 2014. (cited on pages 2, 10)

[22] Alexander Shapiro. Monte Carlo sampling methods. Handbooks in Operations Research and Management Science, 10:353–425, 2003. (cited on page 9)

[23] Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009. (cited on pages 9, 21, 23)

[24] Jonathan Richard Shewchuk et al. An introduction to the conjugate gradient method without the agonizing pain, 1994. (cited on page 5)

[25] Nathan Srebro. Learning with matrix factorizations. 2004.
(cited on page 45)

Appendix A

Stochastic Trace Estimator

Result. Let A be any N × N symmetric matrix and let u = [u_1, ..., u_N]^\top be a vector of N independent and identically distributed random samples drawn from a distribution with zero mean and unit variance, so that the covariance is E(u u^\top) = I. Then

E(u^\top A u) = \text{Tr}(A),

i.e. u^\top A u is an unbiased estimator of Tr(A) [15].

Proof. Assume the random sample u is drawn from a distribution with zero mean and unit variance. In this case

E(u u^\top) = I.    (A.1)

So:

E(u^\top A u) = E(\text{Tr}(u^\top A u))    (u^\top A u is a scalar, so it equals its own trace)
             = E(\text{Tr}(A u u^\top))    (the trace of a product is invariant under cyclic permutation)
             = \text{Tr}(E(A u u^\top))    (trace and expectation are both linear operators and they commute)
             = \text{Tr}(A E(u u^\top))    (A is not a random variable)
             = \text{Tr}(A)               (by (A.1))
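The result above can be checked numerically. The following is a minimal sketch, not thesis code: it assumes Rademacher (±1) probe vectors as the zero-mean, unit-variance distribution, and the test matrix and probe count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Symmetric test matrix A.
n = 50
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

# Hutchinson's estimator: E[u^T A u] = Tr(A) for probes u with
# zero-mean, unit-variance i.i.d. entries (so E[u u^T] = I).
# Rademacher (+1/-1) probes are one common choice.
n_probes = 50_000
U = rng.choice([-1.0, 1.0], size=(n_probes, n))
samples = np.sum((U @ A) * U, axis=1)  # each entry is one realization of u^T A u
est = samples.mean()

print(np.trace(A), est)  # the two values agree to within sampling error
```

Averaging many probes drives the sampling error down like 1/sqrt(n_probes); in the SS method only a handful of probes (PDE solves) are used, trading accuracy of the misfit estimate for computational cost.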