Efficient Seismic Imaging withSpectral Projector and Joint SparsitybyLina MiaoB.Eng. Applied Geophysics, Ocean University of China, June 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Geophysics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)March 2014c? Lina Miao 2014AbstractIn this thesis, we investigate the potential of improving the efficiency of seis-mic imaging with two advanced techniques: the spectral projector and the?joint sparsity?. The spectral projector offers an eigenvalue decompositionfree computation routine that can filter out unstable evanescent wave com-ponents during wave equation based depth extrapolation. ?Joint sparsity?aims to improve on the pure sparsity promoting recovery by making useof additional structure information of the signal. Besides, a new sparsityoptimization algorithm ? PQN`1 ? is proposed to improve both theoreti-cal convergence rate and practical performance for extremely large seismicimaging problems.iiPrefaceAll the work presented in this thesis is part of the SINBAD II project withsupport from the following organizations: BG Group, BGP, BP, CGG,Chevron, ConocoPhillips, ION, Petrobras, PGS, Statoil, Total SA, West-ernGeco, and Woodside.A version of Chapter 2 has been presented in the SINBAD 2013 fallconsortium meeting (Lina Miao, Fast imaging via depth stepping with thetwo-way wave equation, SINBAD Fall consortium talks. 2013). A versionof Chapter 4 has been presented in the SINBAD 2012 fall consortium meet-ing (Lina Miao, Accelerating on sparse promoting recovery and its benefitsin seismic application, SINBAD Fall consortium talks. 2012). A versionof Chapter 4 has also been submitted to CSEG 2013 Geoconvention (LinaMiao and Felix J. Herrmann, Acceleration on Sparse Promoting Seismic Ap-plications, in CSEG, 2013). The intellectual concept was generated underthe direction of the supervisor Dr. Felix J. Herrmann. The author is re-sponsible for the implementation, the experiments and the numerical andseismic results.Open source codes used in this thesis work include: SPOT (http:// www.cs.ubc.ca/ labs/ scl/ spot/ ), SPG`1 (http:// www.cs.ubc.ca/ labs/ scl/ spgl1/ ),CurveLab (http:// curvelet.org), MSN toolbox (http:// scg.ece.ucsb.edu/ msn.html) and SeismicLab (http:// seismic-lab.physics.ualberta.ca).iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Seismic Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.1 Double Square-root Migration . . . . . . . . . . . . . . . . . 42.2 Two-way Wave Equation Depth Stepping Migration . . . . . 62.2.1 Problem formulation . . . . . . . . . . . . . . . . . . 72.2.2 Stable Depth Extrapolation with Spectral Projector . 82.3 Efficient Computation of the Spectral Projector . . . . . . . 112.3.1 Newton-Schulz Method for the Matrix Sign Function? The Polynomial Recursion . . . . . . . . . . . . . 122.3.2 Accelerated Newton-Schulz Iteration with Random-ized HSS Matrix Representations . . . . . . . . . . . 152.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 ?Joint Sparsity? Promoting Seismic Imaging . . . . . . . . 383.1 ?Joint Sparsity? Recovery Theory . . . . . . . . . . . . . . . 413.2 ?Joint Sparsity? Optimization with the Sum-of-norms For-mulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42ivTable of Contents3.3 Ray Parameter Image Volume . . . . . . . . . . . . . . . . . 433.4 Problem Formulation and Evaluation . . . . . . . . . . . . . 463.4.1 Problem Formulation . . . . . . . . . . . . . . . . . . 463.4.2 ?Joint Sparsity? Promoting Seismic Imaging for Mi-gration Images . . . . . . . . . . . . . . . . . . . . . . 473.4.3 ?Joint Sparsity? Promoting Seismic Imaging for RayParameter Image Volume . . . . . . . . . . . . . . . . 493.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524 Large Scale Optimization . . . . . . . . . . . . . . . . . . . . . 534.1 Projected Quasi Newton (PQN) Method . . . . . . . . . . . 544.2 From SPG`1 to PQN`1 . . . . . . . . . . . . . . . . . . . . . 574.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594.3.1 Stylized Numerical Example . . . . . . . . . . . . . . 594.3.2 Seismic Imaging Example . . . . . . . . . . . . . . . . 634.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Appendix A SPG`1 . . . . . . . . . . . . . . . . . . . . . . . . . . 71vList of Tables2.1 Computation the complexity analysis of major HSS opera-tions (adapted from Sheng et al. (2007)), in which n is thedimension of the HSS matrix and k is the maximum rank ofoff-diagonal blocks of the matrix. . . . . . . . . . . . . . . . . 172.2 Theoretical computation complexity analysis of the two-waywave equation migration algorithm, where n is the problemsize (lateral grid points), n? is the number of frequencies andnz is the number of depth levels. Since n is always much big-ger than the diagonal ranks or the Newton-Schulz iterations,here we only consider the leading order complexity in n. . . . 364.1 Memory and CPU tradeoff in PQN`1 sparse seismic imagingproblem, time advantage is very obvious compared to SPG`1 65viList of Figures1.1 Illustration of model perturbation from the adapted saltdomemodel (a) and BG compass model (b). . . . . . . . . . . . . . 32.1 Illustration of the sign function mapping: top plot, the eigen-value decomposition of a sparse matrix L (achieved fromEquation 2.5); the bottom plot, its the corresponding signfunction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 Eigenvalue spectrum of L, in which both negative and posi-tive eigenvalues are present. As stated in the previous section,the positive spectrum corresponds to the unstable evanescentwavefield components, while the non-positive spectrum corre-sponds to all propagating wavefiled components. The eigen-values are sorted in the increasing order. . . . . . . . . . . . . 132.3 Convergence trend for Algorithm 3. The subplots are neweigenvalue spectra corresponding to the increasing number ofiterations. As the number of iteration grows, the non-positivepart of the spectrum remains untouched, whereas the positivepart converges to all zeros. The eigenvalues are sorted in theincreasing order. . . . . . . . . . . . . . . . . . . . . . . . . . 142.4 Convergence behaviour of Algorithm 3. Lines with differentcolours are convergence curves for different matrix size. . . . 152.5 Matrices involved in the recursion algorithm fill up quickly inearly iterations, which brings O(n3) computational complex-ity in matrix matrix multiplication. . . . . . . . . . . . . . . . 162.6 Illustration of a two-level HSS partition scheme, with diago-nal sub-matrices Ds and tSVD of low rank off diagonal sub-matrices. (Xia (2012)) . . . . . . . . . . . . . . . . . . . . . . 192.7 Two-level HSS tree structure (adapted from Chandrasekaranet al. (2006)), with U, V,D stored at the leaves, translationsmatrices R,W stored in the intermediate levels. . . . . . . . . 20viiList of Figures2.8 Singular values for a sample L matrix of the size 512 ? 512.A slowly decay pattern indicates that the regular compressionvia tSVD would either lose significant amount of informationor can not achieve a satisfactory compression rate. . . . . . . 212.9 Singular value information for a sample L matrix in HSS par-tition. We can see that apart from the small blocks on themain diagonal, all the other blocks are highly compressible.We would expect high efficiency from using truncated SVDapproximation of the non main diagonal sub matrices, whichis the main idea of HSS matrix representation. . . . . . . . . 222.10 The HSS structure is maintained as the number of iterationsgrows in the previously mentioned polynomial recursion algo-rithm (Algorithm 3). . . . . . . . . . . . . . . . . . . . . . . . 232.11 Computational performance of HSS matrix-matrix multipli-cation (thss) vs standard multiplication (tdgemm) for Helmholtzmatrices for different problem sizes. The horizontal axis isthe matrix size, and the vertical axis is the ratio between thestandard matrix-matrix multiplication (tdgemm) and the HSSmatrix matrix multiplication (thss). The result is based onthe average of 20 experiments for each matrix size. . . . . . . 252.12 Computation performance of the randomized HSS construc-tion at the error level of 1e-6. The first plot shows an SVD andrSVD comparison on Helmholtz matrices of different sizes.The second plot shows a comparison on HSS constructionwith standard SVD and rSVD. A general conclusion can bedrawn from those two figures, randomized HSS constructionbehaves better compared to normal HSS construction in termsof computation time, and its advantage becomes more distinctas matrix size grows. . . . . . . . . . . . . . . . . . . . . . . 282.13 The computational performance of the spectral projector witheigenvalue decomposition, ordinary matrix matrix multiplica-tion and with randomized HSS acceleration in normal (a) andsemi-log (b) scale. As matrix size grows, the time advantagefor randomized HSS acceleration starts to be obvious, thegrowing trends from the semi-log plot also suggest the superi-ority of the proposed randomized HSS acceleration regardingto the time usage. . . . . . . . . . . . . . . . . . . . . . . . . 302.14 A sample migration example with the triangle model pertur-bation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32viiiList of Figures2.15 The imaging results with two-way wave equation depth step-ping migration (Algorithm 2) (middle plot), reverse time mi-gration (RTM) (top plot) and double square root migration(DSR) (Algorithm 1). . . . . . . . . . . . . . . . . . . . . . . 332.16 A sample migration example with the saltdome model per-turbation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342.17 The imaging results with two-way wave equation depth step-ping migration (Algorithm 2) (middle plot), reverse time mi-gration (RTM) (top plot) and double square root migration(DSR) (Algorithm 1). . . . . . . . . . . . . . . . . . . . . . . 352.18 Computational performance of the above fast two-way waveequation migration algorithm. The test cases for this analy-sis contain a single point model perturbation. For simplicityand without loss of generality, we use only one frequency. Thetheoretical performance of the migration algorithm is linearin the problem size, which is shown as a red line (linear trendscaled by the number of depth levels). Other reference curvesinclude y = nz ? x1.2 and y = nz ? x1.1. The true time perfor-mance of the above migration algorithm is shown as a blackline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373.1 An example of a signal that possesses ?joint sparsity? struc-ture. The plot at the top shows the signal displayed as a longvector. Besides the most significant structure ? sparsity,?joint sparsity? structure also exists among the short vectorsdivided by vertical red dash lines. As shown in the ?stemplot? at the bottom, the supports of individual short sparsevectors coincide with each other. Namely, the sparse matrixformed with those short vectors as columns has nonzero en-tries restricted to a subset of rows. . . . . . . . . . . . . . . 393.2 Top plot, slice view of the image volume via a DSR ray pa-rameter image gather migration, with axes midpoint position,ray parameter and depth level. We can see image slices withdifferent ray parameters are similar to each other, and theslice with same ray parameter and different midpoints hasflat lines. Bottom plot, support view of the image volume forselect ray parameters. We can see a joint sparsity structureamong images with different ray parameters. . . . . . . . . . 45ixList of Figures3.3 Top plot, migration image from pure sparsity recovery, using`1,1 on the ray parameter image volume. Middle plot, migra-tion image from joint sparsity recovery, using `1,2 on the rayparameter image volume. Bottom plot, migration image frompure sparsity promoting on migration image . . . . . . . . . . 483.4 (a) midpoint slice from sparsity recovery result from `1,1; (b)joint sparsity recovery result from `1,2 . . . . . . . . . . . . . 504.1 An example of an SPG`1 log file. There are two things thatneeded to be pointed out. First, the total time is a summationof ?project time? and ?mat-vec time?, where the mat-vec timeis taking 99% of the total. Second, the number of mat-vecsis proportional to the total number of iterations. . . . . . . . 544.2 SPG`1 and PQN`1 comparison in one stylized numerical ex-ample. Top plot: recovery with the same data residual, mid-dle plot: the convergence rate, black line is for SPG `1 andblue is for PQN`1, and bottom plot: memory and CPU tradeoff for the PQN`1 algorithm. . . . . . . . . . . . . . . . . . . 604.3 SPG`1 and PQN`1 comparison within in one ill-conditionednumerical example. Top plot, singular values of the sensingmatrix; middle plot, recovery with the same data residual;bottom plot, memory and CPU trade off of the PQN`1 algo-rithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624.4 SPG`1 and PQN`1 comparison within in one seismic imagingexample, (a) reverse time migration result as a reference (b)SPG`1 recovery, and (c) PQN`1 recover with the same dataresidual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64A.1 Adapted from Figure 2.1 in van den Berg and Friedlander(2008). Pareto curve (black line) : The tradeoff between dataresidual and model sparsity. On the vertical axis is two normof data residual, and on the horizontal axis is the one normof solution. The red dot line shows the solution path of oneparticular BPDN problem, each dot represents one iteration. 72xAcknowledgementsI wish to extend my most sincere gratitude and appreciation to my advisorFelix J. Herrmann for his supervision of my research, creative ideas, andcontinued support through-out my studies. Without him none of this wouldbe possible. In addition, I would like to thank my committee members DougOldenburg, Ozgur Yilmaz, Michael P. Friedlander and Felix J. Herrmann. Iwill always appreciate their input and feedback.I would also like to thank Charles Jones from BG for providing us withthe BG Compass dataset, and the authors of SPOT (SPG`1 (http:// www.cs.ubc.ca/ labs/ scl/ spgl1/ ), CurveLab (http:// curvelet.org), MSN toolbox(http:// scg.ece.ucsb.edu/ msn.html) and SeismicLab (http:// seismic-lab.physics.ualberta.ca). This work was in part financially supported by the Natu-ral Sciences and Engineering Research Council of Canada Discovery Grant(22R81254) and the Collaborative Research and Development Grant DNOISEII (375142-08). This research was carried out as part of the SINBAD IIproject with support from the following organizations: BG Group, BGP,BP, CGG, Chevron, ConocoPhillips, ION, Petrobras, PGS, Total SA, West-ernGeco, and Woodside.Last but not the least, my sincere thanks to Art Petrenko, Ian Hanlonand Polina Zheglova for their kind help and valuable inputs in the thesisediting procedure.xiDedicationTo my parents,xiiChapter 1IntroductionSeismic imaging, also known as migration, aims to image the subsurfaceearth structure from the seismic data observed at the earth surface. From1920?s to present, seismic imaging techniques have been successfully helpingengineers locate hydrocarbon traps and geohazards inside the earth. Overthe past 90 years, numerous seismic imaging approaches have been developedto counter the challenges brought up in different scenarios. However the ex-isting imaging approaches have been showning increasing difficulty to adaptto current demands towards higher-resolution images in more complicatedregions. Geophysicists have always been trying to improve the efficiency andaccuracy of existing seismic imaging algorithms. Moving from one-way waveequation based imaging to the full two-way wave equation based imaging onthe one hand is the necessary requirement for higher accuracy, however, onthe other hand it generates more challenges in algorithm efficiency to workwith any practical size imaging problem.Moreover, the so-called ?curse of dimensionality? leads to exponentialincrease of imaging costs as the size of the area of interest increases (Her-rmann et al. (2012)). Sparsity promoting seismic imaging belongs to thebroad category of sparsity promoting seismic techniques, offering an alter-native approach to confront the ?curse of dimensionality? in seismic imagingproblems (Herrmann et al. (2012)). The sparsity promoting seismic tech-niques originate from recent insights in compressive sensing (CS) (Donoho(2006)). CS enables us to work with data that are traditionally considered tobe under-sampled, which leads to an entirely new technology where acquisi-tion and imaging related costs are no longer determined by overly stringentsampling criteria (Herrmann et al. (2012)). Besides an essential efficient andprecise imaging algorithm, successful adaption of sparsity promoting tech-niques to seismic imaging also relies on three key components: a randomizedcompressed sensing sampling scheme, a sparse/compressive transform basis,and a powerful optimization solver for sparsity promoting problems.Having noticed the existing challenges and techniques in seismic imag-ing, in this thesis we want to further explore the potential to improve theefficiency in seismic imaging. We include three different pieces of work in1Chapter 1. Introductionthis thesis. First in Chapter 2 we introduce a fast two-way wave equationmigration algorithm following a basic introduction of a classic one-way waveequation Double Square Root (DSR) migration. Then in Chapter 3, wepropose a new sparsity promoting seismic imaging scheme ? the ?joint spar-sity? promoting seismic imaging, trying to improve the sparsity promotingimaging quality by exploiting other structure information in seismic imagesbesides sparsity. Last but not the least, we develop a new sparsity optimiza-tion solver ? PQN`1 in Chapter 4 to accelerate the existent efficiency ofa current state-of-the-art sparse algorithm ? SPG`1 ? for extremely largescale seismic imaging problems. The proposed algorithm introduces a sec-ond order direction into SPG`1, which significantly improves its convergencerate and computational performance.Experiments shown in Chapter 2 and Chapter 3 are based on an adapted?saltdome model?. The ?saltdome model? is originally used in Berkhoutand Verschuur (2006) for multiple reflections imaging. The adapted modelperturbation is sparse with Dirac basis. Though with sparsity transform(for example, Curvelet) we can always obtain a sparse representation ofmore complex models, for simplicity and without loss of generality, we workin Dirac basis. The results and conclusions obtained with Dirac basis can beeasily extended to models that are sparse in other sparse transform domains.The experiment data is the linearized data generated according to the Bornapproximation theory. The imaging experiment shown in Chapter 4 is basedon a part of synthetic BG compass dataset within a limited frequency range.BG compass dataset is provided by Charles Jones from BG group to SeismicLaboratory of Imaging and Modeling (SLIM). The BG compass model is richof complex geological structures, including a salt body, shallow channels andunconformities, making it ideal for testing seismic imaging algorithms. Notethat all the imaging results in this thesis are plotted against the coordinatesinstead of specific units.2Chapter 1. Introductionsources/receivers coordinatedepth coordinatemodel perturbation 50 100 150 200 250 300 350 40020406080100120(a)model perturbationsource/receiver coordinatedepth coordinate20 40 60 80 100 120 1401020304050(b)Figure 1.1: Illustration of model perturbation from the adapted saltdomemodel (a) and BG compass model (b).3Chapter 2Seismic ImagingThe industrial standard approaches for seismic imaging can be classified intotwo general disciplines: the ray theory based methods and the wave theorybased methods. The ray theory based methods generally enjoy the simplic-ity of adapting to the observation geometry but they also suffer from thedifficulty to image complex structures. The wave equation based methodspossess higher imaging precision at the expense of a higher computationalcost. The imaging approach we choose to implement belongs to the categoryof wave equation based imaging methods. In Section 2.1 we briefly covera most basic double square-root (DSR) wave equation migration method.In Section 2.2 we introduce a more advanced fast two-way wave equationbased depth stepping migration method featuring randomized hierarchicallysemi-separable matrix representation acceleration.2.1 Double Square-root MigrationDouble square-root (DSR) migration, also known as the shot-geophone mi-gration, originates from the survey sinking concept (Claerbout (1985)),which is a counterpart for exploding reflector concept in pre-stack migra-tion. According to exploding reflector theory, zero offset seismic wavefieldsfrom a series of different experiments are associated with one pseudo exper-iment ? the exploding reflector experiment. On the other hand, the surveysinking concept incorporates both zero and nonzero offset data as an entiresurvey. Via wave extrapolation (for example using DSR) of both the sourceand receiver wavefield, the survey ?sinks? into subsurface layers, as if shotsand geophones were placed at increasingly deeper subsurface layers. Theimage of a reflector at (x, z) is defined as ?the strength of the echo seenby the closest possible source-geophone pair, taking the mathematical limit,this closet pair is a source and geophone located together on the reflector?(Claerbout (1985)), which can be summarized byM(x, z) = U(xs = x, xg = x, z, t = 0) (2.1)42.1. Double Square-root Migrationwhere M stands for the migration image, U stands for the extrapolatedwavefield, xs stands for the shot coordinates and xg is the receiver coordi-nates.DSR migration (Claerbout (1985)) is based on the one-way wave equa-tion, which is one of the most widely used simplified approximation to thetrue two-way wave equation. The depth propagating scheme of DSR fea-tures two square-root terms, which is reflected in the name of Double SquareRoot (DSR) migration.DSR extrapolation scheme can be expressed as:?U?z =???1v2 ?( ?t?g)2+?1v2 ?( ?t?s)2???U?t (2.2)in the time domain, or?U??z = ?i????1v2 ?(kg?)2+?1v2 ?(ks?)2?? U? (2.3)in the Fourier domain, where the two square-root terms stand for receiverextrapolation (?1v2 ? (kg? )2) and source extrapolation (?1v2 ? (ks? )2) respec-tively for propagating wavefields, U? stands for the propagating wavefield inthe Fourier space, v stands for the medium velocity, t stands for the traveltime, ? stands for the frequency, ks and kg stand for the wavenumber co-ordinates for sources and receivers. For a detailed implementation, pleaserefer to Algorithm 1.52.2. Two-way Wave Equation Depth Stepping MigrationAlgorithm 1 Double Square-root MigrationInput: Wavefield U?(xs, xg, z0, ?) at the surface z = z0 for all xs, xg, ?Output: Migration image M(x, z)Initialize M(x, z) = 0for ? = ?1 : ?max doInitialize D(xs, xg) = U?(xs, xg, z0, ?)for z = z1 : zmax do1. ?z = z ? zprev2. D?(ks, kg)? FFT {D(xs, xg)}3. D?(ks, kg)? exp[?i?(?1v2 ?k2s?2 +?1v2 ?k2f?2)?z]D?(ks, kg)4. D(xs, xg)? FFT ?1{D?(ks, kg)}5. M(x, z) + = D(x, x)endendDSR migration is fundamental to many current high performance mi-gration techniques and it is also a good starting point for velocity analysisand many other more complicated inversion algorithms. Meanwhile, DSRmigration also suffers from the the inaccuracy brought by the one-way waveequation approximation (Mulder and Plessix (2004)), and more importantly,it can not handle background medium with laterally varying velocity. There-fore, in the following section, we seek a more accurate two-way wave equationbased imaging algorithm.2.2 Two-way Wave Equation Depth SteppingMigrationDespite its speed and simplicity, the one-way DSR extrapolator suffers fromthe inaccuracies caused by lateral variations within complex structures. Thisis because the DSR migration method assumes that the velocity is laterallyinvariant. Alternatives available are those that add extra modifications tothe approximated one-way wave equation (e.g. split step DSR extrapolator62.2. Two-way Wave Equation Depth Stepping Migration(Popovici (1996))), and those that employ two-way wave equations. Ideallywe want to use two-way wave equation extrapolation for migration becauseof its higher accuracy. However, it is generally expensive to perform two-waywave equation migration. It is also challenging to handle the stability of thetwo-way wave equation based depth stepping algorithm with the existenceof the evanescent wave components. There are some promising methodsand techniques that try to address these problems. For example, Sandbergand Beylkin (2009) propose to use a fast spectral projector to filter out theevanescent waves without losing of any propagating wave components. Inthis section, we first briefly go through this approach, and then propose arandomized HSS based acceleration to further improve its computationalperformance.2.2.1 Problem formulationAssuming we are given a survey recorded at a surface z0, the wave propaga-tion can be described by the acoustic wave equation (Sandberg and Beylkin(2009)):ptt = v(z, x)2(pxx + pzz),p(x, 0, t) = 0, t ? 0,{p(x, z, 0) = f(x),pt(x, z, 0) = g(x),(2.4)where v(x, z) is the velocity, p(z, x, t) is the acoustic pressure at the point(x, z), f(x) is the initial pressure at time 0, and the subscripts indicatepartial derivatives. If we apply a Fourier transform with respect to time toEquation 2.4 and consider the result as an equation for p?:p?zz =[?( ?v(x, z))2?Dxx]p? := Lp?. (2.5)Please note here we define the operator in the square bracket as a newoperator L. We consider Equation 2.5 with the initial conditions q and qzat the surface z = z0{p?(x, z0, ?) = q?(x, z0, ?)p?z(x, z0, ?) = q?z(x, z0, ?)(2.6)where in practice, q? and q?z stand for the wavefield specified at the surfacez = z0 and its first derivative along z direction respectively.72.2. Two-way Wave Equation Depth Stepping MigrationA first order depth stepping imaging scheme can be formulated based onthe initial-value problem formed by Equations 2.5 and 2.6 (Sandberg andBeylkin (2009)):ddz[p?p?z]=[0 1L 0] [p?p?z]. (2.7)However, this depth stepping scheme is unstable. The cause of this insta-bility is the indefinite nature of the self-adjoint operator L, such that thepositive eigenvalues will result in an exponential growth of the wavefieldduring depth extrapolation. Physically, the negative eigenvalues correspondto propagating wavefield modes, and the positive eigenvalues are associatedwith the non-propagating evanescent wavefield modes. For more discussionson the stability of the wave extrapolation operator, please see Wapenaar andGrimbergen (1998), Sandberg and Beylkin (2009), Kosloff and Baysal (1983)and references therein. Using direct eigenvalue decomposition to filter outpositive eigenvalue components would be a most straightforward approachto suppress the unwanted wavefield modes. But the O(n3) numerical com-plexity for the eigenvalue decomposition is prohibitively large, since seismicproblem sizes can easily scale up to millions of grid samples. Therefore, inthe following sections, we describe an alternative eigenvalue decompositionfree approach to suppress the unwanted wavefield modes.2.2.2 Stable Depth Extrapolation with Spectral ProjectorThe early idea of ?spectral projector? can be found in the survey by Ken-ney and Laub (1995), where a projector P projecting onto the non-positiveinvariant subspace of a self-adjoint matrix is defined as:P = (I ? sign(L))/2. (2.8)Here I stands for the identity matrix, and sign(L) stands for the matrix signfunction of the matrix L. Spectral projector is idempotent:P2 = P. (2.9)Applying operator P to matrix L, the product PLP will preserve only itsnonpositive eigenvalues. More precisely, if L has the following eigenvaluedecomposition:L = V ?V ?, (2.10)where V is a matrix that contains the eigenvectors of L as columns, ? isthe diagonal matrix with the eigenvalues of L on the diagonal, then the82.2. Two-way Wave Equation Depth Stepping Migrationeigenvalue decomposition of PLP would be:PLP = V ??V ?, (2.11)where the eigenvectors V are the same as those of L, whereas the new setof eigenvalues ?? only contains the non-positive components of ?.With the spectral projector P applied to the unstable extrapolator L,we obtain a new stabilized extrapolation operator PLP. In the new oper-ator, the positive part of the spectrum of L is replaced by zeros, so thatthe evanescent wavefield modes associated with the positive eigenvalues areremoved during the extrapolation. Therefore, we replace the ill-posed depthstepping scheme by the following new stable system (Sandberg and Beylkin(2009)):ddz[p?p?z]=[0 1PLP 0] [p?p?z]. (2.12)Compared to other approximate Fourier filtering techniques (e.g. Kosloffand Baysal (1983)), with the spectral projector we only remove the trueevanescent waves (associated with the positive eigenvalues) without relyingon the assumption that the medium is locally laterally invariant. Therefore,we introduce the two-way wave equation depth stepping migration algorithm(Algorithm 2). For simplicity and without loss of generality, we assume thatthe source and receiver grids coincide.92.2. Two-way Wave Equation Depth Stepping MigrationAlgorithm 2 Fast Two-way Wave Equation Migration (adapted from Sand-berg and Beylkin (2009) )Input: Wavefield U?(xs, xg, z, w) at the surface in the frequency do-mainOutput: Image M(x, z)for ? = ?1 : ?max doInitialize D(xs, xg) as a frequency slice of U?(xs, xg, z, w);for z = zmin : zmax do1. Compute P2. For sources xsmin : xsmax,solveddz[D?(xs = xsi, xg)D?z(xs = xsi, xg)]=[0 1PLP 0] [D(xs = xsi, xg)Dz(xs = xsi, xg)]3. For receivers xgmin : xgmax,solveddz[D(xs, xg = xgi)Dz(xs, xg = xgi)]=[0 1PLP 0] [D?(xs, xg = xgi)D?z(xs, xg = xgi)]4. M(x, z) + = D(x, x)endendThis algorithm is very similar to migration algorithms originating fromthe survey sinking concept. In steps 2 and 3, source and receiver wavefieldsare downward extrapolated respectively into the earth, and in step 4 thezero offset data of the extrapolated wavefield are taken as the image update.Both the propagation scheme (step 2 and step 3) and the imaging condition(step 4) are quite similar to the DSR algorithm (Algorithm 1) describedpreviously, except the extrapolation operator here is obtained from the two-way wave equation. In this algorithm, the procedure that dominates thecomputation complexity is to compute the spectral projector P in step 1.Computation of the spectral projector P relies on the computation of thematrix sign function sign(L). Recall in linear algebra, extension of the scalarfunction to matrix function is generally not the element-wise application ofthe function to the matrix. Instead, specific mapping of the matrix spectrumis required (Higham (2008)). For the matrix sign function of the self adjoint102.3. Efficient Computation of the Spectral Projectormatrix L, we can write the mapping as:sign(L) = V sign(?)V ?. (2.13)In Figure 2.1 we illustrate this mapping with a real matrix example. Inthe top plot we show the eigenvalue decomposition of a sparse matrix L(achieved from Equation 2.5), in the bottom plot we its the correspondingsign function.==V V ??L = ?sign(?)?V ?sign(L) V= ? ?Figure 2.1: Illustration of the sign function mapping: top plot, the eigen-value decomposition of a sparse matrix L (achieved from Equation 2.5); thebottom plot, its the corresponding sign function.Straightforward computation of the matrix sign function according Equa-tion 2.13 still requires an eigenvalue decomposition. Therefore, in section2.3, we will discuss an efficient eigenvalue decomposition free algorithm tocompute the matrix sign function with a randomized HSS acceleration. Withthe proposed acceleration, we claim that the theoretical computational com-plexity of the spectral projector is linear with respect to the problem size.2.3 Efficient Computation of the SpectralProjectorIn this section, we introduce an efficient eigenvalue decomposition free al-gorithm to compute the matrix sign function as well as the matrix spectral112.3. Efficient Computation of the Spectral Projectorprojector. This section is arranged as follows: in section 2.3.1 we first intro-duce an eigenvalue decomposition free polynomial recursion algorithm forcomputing the matrix sign function; in section 2.3.2 we discuss the accelera-tion of this polynomial recursion algorithm with the randomized HSS matrixrepresentation.2.3.1 Newton-Schulz Method for the Matrix Sign Function? The Polynomial RecursionAs mentioned in the previous section, computation of the matrix sign func-tion is crucial to computing the matrix spectral projector. Due to the largeproblem sizes in seismic imaging problems, eigenvalue decomposition is tooexpensive and we intend to seek eigenvalue decomposition free algorithms.Auslander and Tsao (1991) introduced the Newton-Schulz method as a rea-sonable choice to compute the matrix sign function without involving eigen-value decompositions. The Newton-Schulz method (Auslander and Tsao(1991)), which essentially is a polynomial recursion(Algorithm 3), is alsoused by Sandberg and Beylkin (2009):Algorithm 3 Newton-Schulz Iteration for the Matrix Sign Function ( Aus-lander and Tsao (1991))Input: Self adjoint matrix LOutput: S := sign(L)1. Initialize S0 = L/||L||2, where ||L||2 stands for the 2 norm of matrixL2. For k = 1......N , Sk+1 = 32Sk ? 12S3kTo illustrate the algorithm, we show an example of the algorithm appliedto an unstable depth extrapolator L as described in the previous sections.First, in Figure 2.2 we demonstrate the eigenvalue spectrum of L, in whichboth negative and positive eigenvalues are present. As stated in the previ-ous section, the positive spectrum corresponds to the unstable evanescentwavefield components, while the non-positive spectrum corresponds to thepropagating wavefiled components. In order to achieve a stabilized depthextrapolator, PLP computed via Algorithm 3 and Equation 2.8 needs topreserve all the original non-positive eigenvalues and filter out the positive122.3. Efficient Computation of the Spectral Projectoreigenvalues. Figure 2.3 demonstrates the algorithm?s convergence trend.The subplots are new eigenvalue spectra corresponding to the increasingnumber of iterations. As the number of iteration grows, the non-positivepart of the spectrum remains untouched, whereas the positive part convergesto all zeros. This algorithm does not contain eigenvalue decomposition andthe iteration converges very fast (Auslander and Tsao (1991)). Figure 2.4shows its convergence rate described as the relative error versus iterations.Lines with different colours are the convergence curves for different matrixsizes. From the above example, we can observe that Algorithm 3 is ableto compute the matrix sign function without the necessity to perform theeigenvalue decomposition, and the algorithm converges within a relativelyaffordable small number of iterations.Eigenvalue*index*of*the*increasing*amplitude*sorted*eigenvaluesEigenvaluesEigenvalue*index*of*the*increasing*amplitude*sorted*eigenvaluesEigenvaluesEigenvalue*indexEigenvalue*indexFigure 2.2: Eigenvalue spectrum of L, in which both negative and positiveeigenvalues are present. As stated in the previous section, the positive spec-trum corresponds to the unstable evanescent wavefield components, whilethe non-positive spectrum corresponds to all propagating wavefiled compo-nents. The eigenvalues are sorted in the increasing order.Although the Newton-Schulz iteration converges in a small number ofiterations, it involves computing matrix matrix multiplications iteratively,which are not necessarily friendly in seismic imaging problems. The cubiccomplexity for dense matrix matrix multiplication is unrealistic for large sizeimaging problems. Even the Newon-Schulz iteration is initialized with sparsematrix, the sparse pattern usually is maintained during iterative matrixmatrix multiplications. Therefore we move on to the next section, where wedescribe how this computation can be sped up.132.3. Efficient Computation of the Spectral Projector0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:1Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvalues0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:3Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvalues0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:5Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvalues0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:7Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvalues0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:9Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvalues0 500 1000 1500 2000?10?8?6?4?20x 10?3 iter:11Eigenvalue index of the increasing amplitude sorted eigenvaluesEigenvaluesEigenvalue*indexEigenvalue*indexEigenvalue*indexEigenvalue*indexEigenvalue*indexEigenvalue*indexiteration*1iteration*5iteration*9iteration*3iteration*7iteration*11Figure 2.3: Convergence trend for Algorithm 3. The subplots are new eigen-value spectra corresponding to the increasing number of iterations. As thenumber of iteration grows, the non-positive part of the spectrum remainsuntouched, whereas the positive part converges to all zeros. The eigenvaluesare sorted in the increasing order.142.3. Efficient Computation of the Spectral Projector5 10 15 20 25 3000.0050.010.0150.020.025iterationerrorConvergence rate of the polynomial recursion for different matrix size 641282565121024Figure 2.4: Convergence behaviour of Algorithm 3. Lines with differentcolours are convergence curves for different matrix size.2.3.2 Accelerated Newton-Schulz Iteration withRandomized HSS Matrix RepresentationsHaving validated the expected convergence property of the Newton-Schulzalgorithm (Algorithm 3), we need to address the computationally intensivepart in it: the matrix matrix multiplications. Even if we start with a sparsematrix, as shown in Figure 2.5, the matrices involved in the recursion fill upquickly in early iterations. The O(n3) computational complexity for densematrix matrix multiplications is not practical for any realistic seismic prob-lem, especially in 3D, where the matrix size can be of order 106 by 106 orlarger. Therefore we need a powerful engine for fast matrix matrix mul-tiplication in order to compute the spectral projector efficiently. For thispurpose, we propose an efficient acceleration with the randomized Hierar-chically Semi-Seperable (HSS) method.152.3. Efficient Computation of the Spectral Projectoriteration((1iteration((3iteration((5iteration((2iteration((4iteration((6Figure 2.5: Matrices involved in the recursion algorithm fill up quickly inearly iterations, which brings O(n3) computational complexity in matrixmatrix multiplication.162.3. Efficient Computation of the Spectral Projector2.3.2.1 Hierarchically Semi-Seperable (HSS) MatrixRepresentationHSS matrix representation is first introduced by Chandrasekaran et al.(2006) as an alternative way of representing a particular class of n?n matri-ces. Instead of using the n2 entires, HSS representation exploits the low rankstructure of its particular sub-matrices. The HSS matrix representation hasa nice property of linear complexity in matrix computations and storage.Table 2.1 (adapted from Sheng et al. (2007)) includes a summary of thecomputational complexity of major HSS matrix algorithms. For a detailedcomplexity analysis please refer to Xia (2012) and reference therein. Amongthe listed matrix operations, we are most interested in using the HSS matrixmatrix multiplication in the polynomial recursion algorithm (Algorithm 3)to compute the spectral projectors. Earlier usage of HSS matrix representa-tions in seismic applications can be found in Jumah and Herrmann (2012)for robust seismic multiple removal algorithm, and recent developments insuper fast direct solver for Helmholtz systems are also available in Wanget al. (2010).Operation Complexity with HSS Complexity without HSSMatrix-Vector Multiplication O(nk2) O(n2)Matrix-Matrix Multiplication O(nk3) O(n3)Matrix addition O(nk2) O(n2)Compression O(nk3) Not ApplicableLU Decomposition O(nk3) O(n3)Inverse O(nk3) O(n3)Transpose O(nk) O(n2)Construct HSS (Xia (2012)) O(n2r) Not ApplicableTable 2.1: Computation the complexity analysis of major HSS operations(adapted from Sheng et al. (2007)), in which n is the dimension of the HSSmatrix and k is the maximum rank of off-diagonal blocks of the matrix.HSS matrix representation is most beneficial for matrices that are notlow rank by themselves but contain low rank off-diagonal sub-matrices, forexample, diagonally dominant matrices. This is clearly revealed by theHSS construction procedure. In HSS representation construction, matri-ces are hierarchically partitioned into high-rank diagonal sub-matrices and172.3. Efficient Computation of the Spectral Projectorlow-rank off-diagonal sub-matrices. Then truncated singular value decom-positios (tSVD) is applied to approximate the low-rank off-diagonal sub-matrices. Recursively, the high-rank diagonal sub-matrices are partitionedagain into diagonal and off-diagonal sub-matrices.An initial partition of the input matrix A in the first recursion step isgiven byA =(A1;1,1 A1;1,2A1;2,1 A1;2,2), (2.14)the subscripts represent the partition level, the sub-matrix row number, andthe sub-matrix column number respectively. With the tSVD applied to theoff-diagonal blocks we arrive at:A =(D1;1,1 (USV ?)1;1,2(USV ?)1;2,1 D1;2,2),where D?s are the diagonal sub-matrices and USV ??s correspond to thetSVD approximations of the off diagonal low rank sub-matrices. In the nextrecursion of the HSS construction, the diagonal matrices are partitionedagain using the same HSS partition and approximation scheme:A =????(D2;1,1 (USV ?)2;1,2(USV ?)2;2,1 D2;2,2)(USV ?)1;1,2(USV ?)1;2,1(D2;1,1 (USV ?)2;1,2(USV ?)2;2,1 D2;2,2)???? .Please see Figure 2.6 for an illustration of HSS partition and approximationscheme.To optimize memory usage and computational complexity, the HSS rep-resentation is designed to be stored in a neat tree structure. One illustrationof a two-level HSS tree structure (Chandrasekaran et al. (2006)) is shownin Figure 2.7, in which the matrices U ?s, V ?s,D?s are only stored at the leaflevel. Instead small transition matrices R?s,W ?s are stored in the interme-diate levels to compute the corresponding products when performing matrixcomputations.In order to illustrate why HSS representation would be a good fit forthe type of matrices involved in the Newton-schulz algorithm (Algorithm3), we take a sample normalized L matrix (Equation 2.5) as an example,the dimension of which is 512 ? 512, generated with a 4th order finitedifference scheme and periodic boundary condition. Figure 2.8 shows thesingular values of the example matrix. A slow decay pattern indicates that182.3. Efficient Computation of the Spectral ProjectorFigure 2.6: Illustration of a two-level HSS partition scheme, with diagonalsub-matrices Ds and tSVD of low rank off diagonal sub-matrices. (Xia(2012))192.3. Efficient Computation of the Spectral Projector!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!R1;1,2W1;1,2 W1;2,1R1;2,1R2;2,1W2;1,2 W2;2,1B1;2,1B1;1,2B2;1,2B2;2,1 D2;1,1 D2;1,2U2;1,2 U2;2,1V2;2,1B2;1,2B2;2,1R2;1,2W2;1,2R2;1,2V2;1,2D2;1,1 D2;1,2U2;1,2 U2;2,1V2;2,1 V2;1,2R2;2,1W2;2,1Figure 2.7: Two-level HSS tree structure (adapted from Chandrasekaranet al. (2006)), with U, V,D stored at the leaves, translations matrices R,Wstored in the intermediate levels.202.3. Efficient Computation of the Spectral Projector0 100 200 300 400 500 60000.10.20.30.40.50.60.70.80.91Single value pattern of an Helmholtz matrixNumber'of'singular'valuesSingular'valuesSingular'value'pattern'of'an'Hel holtz'matrixSingular)value)pa tern)of)the)sa ple)L)matrixSingular)ValuesSingular)value)indexFigure 2.8: Singular values for a sample L matrix of the size 512 ? 512. Aslowly decay pattern indicates that the regular compression via tSVD wouldeither lose significant amount of information or can not achieve a satisfactorycompression rate.212.3. Efficient Computation of the Spectral Projectorthe regular compression via tSVD would either lose significant amount ofinformation or can not achieve a satisfactory compression rate.Figure 2.9 shows the singular values of the sub-matrices of the HSSpartition. The sub-matrices at the main diagonal exhibit high rank, whileall the off diagonal matrices have a compressible singular value spectrum.Figure 2.10 illustrates that this rank distribution structure is maintainedas the number of iteration grows in Algorithm 3, which indicates that HSSrepresentation is efficient throughout the polynomial recursions.Figure 2.9: Singular value information for a sample L matrix in HSS par-tition. We can see that apart from the small blocks on the main diagonal,all the other blocks are highly compressible. We would expect high effi-ciency from using truncated SVD approximation of the non main diagonalsub matrices, which is the main idea of HSS matrix representation.222.3. Efficient Computation of the Spectral Projectoriter:1 iter:2iter:3 iter:4iter:5 iter:6iteration(1 iteration(2iteration(3 iteration(4iteration(5 iteration(6Figure 2.10: The HSS structure is maintained as the number of iterationsgrows in the previously mentioned polynomial recursion algorithm (Algo-rithm 3).232.3. Efficient Computation of the Spectral Projector2.3.2.2 Fast Matrix Matrix Multiplication via HSS MatrixRepresentationsAs stated in Table 2.1, matrix operations generally exploit linear complex-ities under HSS matrix representations. Specifically, with regard to matrixmatrix multiplications, cubic complexity O(n3) for ordinary ordinary ma-trix matrix multiplication is reduced to linear (O(nk3)) via HSS matrixrepresentation, where n is the matrix size, and k is the maximum rank ofthe transition matrices in the HSS tree. Therefore, in this work we con-centrate on investigating the HSS matrix matrix multiplication algorithmdescribed in Lyons (2005) using MATLAB (http:// www.mathworks.com/products/ matlab/ ). In this work, we investigate this algorithm by its prac-tical numerical performance instead of going into the tedious implementationdetails. Interested readers can refer to Lyons (2005). Below we include anumerical example, showing the algorithm?s performance as compared tothe ordinary matrix matrix multiplication. In the experiment, we recordthe time for a standard matrix matrix multiplication (calculated with thesubroutine dgemm in the BLAS package) of a sample L matrix, and thetime for multiplying two matrices under HSS matrix representations. Thecurve shown in Figure 2.11 is the ratio between the two recorded times fordifferent matrix sizes. The result is based on the average of 20 experimentsfor each matrix size.We can observe and conclude from Figure 2.11 thatwhen the matrix size is large enough, matrix matrix multiplication underHSS representations is faster than standard matrix matrix multiplication.Moreover, as the matrix size grows, the time advantage of using HSS rep-resentations becomes more remarkable. Please note the implementation wehave right now is not the optimal one, there?s still room for improvement ofthe algorithm?s performance.To compute the matrix sign function, Algorithm 3 also involves ma-trix multiplication with a scalar and matrix matrix addition (subtraction).These trivial matrix operations are not so obvious when the matrix is storedin HSS tree structure. But keeping matrix operations under HSS matrix rep-resentation offers linear complexity for the whole algorithm. For details ofthese HSS matrix algorithms, interested readers please refer to Lyons (2005)and Xia (2012).With the HSS matrix representation accelerated matrix matrix multipli-cation algorithm, it seems that we are almost ready to compute the spectralprojector efficiently without eigenvalue decompositions. Before rushing intocomputing the spectral projector, we want to further improve the exist-ing efficiency by using a more advanced technique ? the randomized HSS242.3. Efficient Computation of the Spectral Projector500 1000 1500 2000 2500 3000 3500 40000246810121416performance of HSS accleration on matrix multiplicationmatrix sizet dgemm / t hssFigure 2.11: Computational performance of HSS matrix-matrix multiplica-tion (thss) vs standard multiplication (tdgemm) for Helmholtz matrices fordifferent problem sizes. The horizontal axis is the matrix size, and the ver-tical axis is the ratio between the standard matrix-matrix multiplication(tdgemm) and the HSS matrix matrix multiplication (thss). The result isbased on the average of 20 experiments for each matrix size.252.3. Efficient Computation of the Spectral Projectormethod. The randomized HSS method incorporates a recently developedsuper fast randomized singular value decomposition (rSVD) algorithm intothe HSS framework. Basics and evaluations of this method are introducedin the following section.2.3.2.3 Randomized HSSIn this section, we examine the possibility of further improving the efficiencyof the HSS algorithms by making use of the recently developed randomizedsingular value decomposition (rSVD) algorithm. In many HSS algorithms,especially during the process of constructing HSS representations, SVD?s (orother rank revealing algorithms) are recursively used in compressing the lowrank off-diagonal components. The rSVD algorithm (Tygert and Rokhlin(2007)) is one of those algorithms (Halko et al. (2011)) that compute fastmatrix factorizations by exploiting the random probing technique. In rSVD,random probing is used to identify the compressed subspace that capturesmost of the action of a matrix, where further matrix manipulations areapplied to compute the approximate SVD result.In this work, we implement the rSVD algorithm (Algorithm 4) proposedby Tygert and Rokhlin (2007), and incorporate the rSVD algorithm intothe standard HSS algorithms, generating more efficient randomized HSSalgorithms. Tygert and Rokhlin (2007) claim the rSVD algorithm to have anumerical complexity of O(k2(m+ n)) with a quite small constant numberfor a rank k m? n matrix, where k ? n ? m. The idea of this algorithm isvery simple. To perform a rSVD to a rank k m? n matrix A, first (step 1)an n? (k + 20) random matrix R is multiplied to matrix A to identify therange of A; then the orthonormal basis of the range of A is approximated bylargest k singular vectors (Q) of the resulting tall matrix P (step 2); nexta SVD algorithm is performed to a fairly small matrix S formed by ATQ(step 3, step 4), and the results of the cheap SVD constitute the rSVD forA (step 4, step 5) (Tygert and Rokhlin (2007)).262.3. Efficient Computation of the Spectral ProjectorAlgorithm 4 The basic rSVD algorithm (Tygert and Rokhlin (2007))Input: An m x n matrix A with rank k, k ? n ? moutput: The randomized SVD of A, U,?, V1. Form Pm?(k+20) = Am?nRn?(k+20); R is random2. Form Qm?k, the matrix whose columns consist of the left singularvectors corresponding to the k greatest singular values of Pm?(k+20)3. Form Sn?k = ATn?mQm?k4. Compute the SVD : Tk?k?k?kV Tk?n of STk?n5. Form Um?k = Qm?kTk?kBelow, we show the time improvement of randomized HSS method incomparison with the standard HSS method in HSS tree structure construc-tion (using MATLAB built-in SVD?s). Figure 2.12 shows computationalperformance of the randomized SVD algorithm and the randomized HSSconstruction with the error controlled to the level of 1e-6. The first plotshows an SVD and rSVD comparison on a series of depth extrapolationmatrices L of different sizes. The second plot shows a comparison of HSSconstruction using standard SVD and rSVD. Some general conclusion canbe drawn from those two plots: first, the rSVD algorithm significantly de-creases the computation time compared to standard SVD algorithm; second,randomized HSS construction also significantly improves computation timecompared to the standard SVD algorithm, and its advantage becomes moredistinct as matrix size grows. For other HSS operations, the improvementof randomized HSS would be determined according to how intensively SVDsare included in specific algorithms.To conclude this part, we demonstrate the performance of the new pro-posed randomized HSS accelerated spectral projector computation algo-rithm. In Figure 2.13(a) and Figure 2.13(b) we show the computationalperformance of the spectral projector with eigenvalue decomposition (blueline), with polynomial recursion using ordinary matrix matrix multiplica-tion (black line) and randomized HSS acceleration (red line) in normal andsemi-log scale. As the matrix size grows, the time advantage for randomizedHSS acceleration starts to be obvious, the growing trends showed in thesemi-log plot also suggests the superiority of the proposed randomized HSS272.3. Efficient Computation of the Spectral Projector2000 4000 6000 8000 10000 12000 14000 160000100200300400500matrix sizetime(s)SVD on Helmholtz matrices standard SVDrandomized SVD2000 4000 6000 8000 10000 12000 14000 1600005001000150020002500matrix sizetime(s)HSS construction of Helmholtz matrices standard SVDrandomized SVDFigure 2.12: Computation performance of the randomized HSS constructionat the error level of 1e-6. The first plot shows an SVD and rSVD comparisonon Helmholtz matrices of different sizes. The second plot shows a comparisonon HSS construction with standard SVD and rSVD. A general conclusioncan be drawn from those two figures, randomized HSS construction behavesbetter compared to normal HSS construction in terms of computation time,and its advantage becomes more distinct as matrix size grows.282.3. Efficient Computation of the Spectral Projectoracceleration algorithm regarding to time usage.Based on the previous discussions, we can perform the stabilized two-waywave equation depth extrapolation (Algorithm 2) efficiently with the ran-domized HSS accelerated computation of the spectral projector P. There-fore, in the next section we want to examine and evaluate the algorithmwith some synthetic seismic imaging examples.292.3. Efficient Computation of the Spectral Projector0 2 4 6 8 10 12x 104050010001500200025003000350040004500matrix sizetime (s)spectral projector with eigenvalue decomposition, plain matrix multiplication and hss accleration eigen decompositionplain matrix multplicationhss acclerationComputation*C st*of*Spectral*Projectorolyno i l)recursion)with)ordinary)matrix)multiplicationpolynomial)recursion)with)matrix)multiplication)in)HSS)accelerationigenvalu )decomposi i n(a)0 2 4 6 8 10 12x 10410?1100101102103104matrix sizelog(time (s))spectral projector with eigenvalue decomposition, plain matrix multiplication and hss accleration eigen decompositionplain matrix multplicationhss acclerationComput tion*Cost*of*Spectral*Projector*(log*scale)olyno i l)recursion)with)ordinary)matrix)multiplicationpolynomial)recursion)with)matrix)multiplication)in)HSS)accelerationigenvalu )decomposi i n(b)Figure 2.13: The computational performance of the spectral projector witheigenvalue decomposition, ordinary matrix matrix multiplication and withrandomized HSS acceleration in normal (a) and semi-log (b) scale. As ma-trix size grows, the time advantage for randomized HSS acceleration startsto be obvious, the growing trends from the semi-log plot also suggest thesuperiority of the proposed randomized HSS acceleration regarding to thetime usage.302.4. Evaluation2.4 EvaluationIn this section, we evaluate the imaging performance of the two-way waveequation depth extrapolation migration (Algorithm 2). Two imaging exam-ples are presented, one is a triangle model and the other is a selected partfrom the saltdome model. The later part also shows a theoretical complexityanalysis and a plot of the real time performance.Figure 2.14 shows one migration example with a simple triangle modelperturbation (the bottom plot) in the laterally varying background medium(the top plot). In this experiment, we put sources and receivers at the toplayer grid points and modelled a frequency range of 4 Hz to 25 Hz usingthe linearized constant-density acoustic frequency-domain modelling opera-tor (https:// www.slim.eos.ubc.ca/ SoftwareDemos/ applications/ Modeling/2DAcousticFreqModeling/ ). Figure 2.15 shows the imaging results with two-way wave equation depth stepping migration (Algorithm 2) (middle plot),reverse time migration (RTM) (top plot) and double square root migration(DSR) (Algorithm 1). The images are shown with the source/receiver anddepth coordinates, with the grid intervals for both are 10 meters. Imageresult with the two-way wave equation depth stepping algorithm indicatesits capability of imaging correct dips and positions in laterally varying back-ground medium without some of the unwanted artifacts of RTM imaging.Figure 2.16 shows a more realistic saltdome migration example, withthe laterally varying background model (the top plot), the model pertur-bation (the bottom plot). In this experiment, we also put the sources andreceivers on the top layer grid points and modelled a frequency range of4 Hz to 35 Hz using the linearized constant-density acoustic frequency-domain modelling operator (https:// www.slim.eos.ubc.ca/ SoftwareDemos/applications/ Modeling/ 2DAcousticFreqModeling/ ). Depth interval and hor-izontal grid interval are all 10 meters. Figure 2.17 shows the imaging resultwith two-way wave equation depth stepping migration (Algorithm 2) (mid-dle plot), reverse time migration (RTM) (top plot) and double square rootmigration (DSR) (Algorithm 1). Observation from the imaging results inthis example is consistent with that in the previous example.In Table 2.2, we include a rough theoretical numerical complexity analy-sis of the two-way wave equation migration algorithm. n is the problem size(lateral grid points), n? is the number of frequencies and nz is the numberof depth levels. Since n is always much bigger than the off-diagonal ranksor the Newton-Schulz iterations, here we only consider the leading ordercomplexity in n. The total time of the migration algorithm mainly consistsof the time for computing the spectral projector and the time for depth ex-312.4. Evaluationbackground modelsource/receiver coordinatedepth coordiante 20 40 60 80 100 1201020304050 200025003000model perturbationsource/receiver coordinatedepth coordiante 20 40 60 80 100 1201020304050 ?50050Figure 2.14: A sample migration example with the triangle model pertur-bation.322.4. Evaluationsource/receiver coordinatedepth coordinatereverse time migration result 20 40 60 80 1001020304050model perturbationsource/receiver coordinatedepth coordiante20 40 60 80 100 1201020304050source/receiver coordinatedepth coordinatedepth stepping migration result 20 40 60 80 1001020304050DSR$migration$resultFigure 2.15: The imaging results with two-way wave equation depth steppingmigration (Algorithm 2) (middle plot), reverse time migration (RTM) (topplot) and double square root migration (DSR) (Algorithm 1).332.4. Evaluationmodel perturbationsource/receiver coordinatedepth coordiante 20 40 60 80 100102030405060 ?150?100?50050100150background modelsource/receiver coordinatedepth coordiante 20 40 60 80 100102030405060 28003000320034003600380040004200Figure 2.16: A sample migration example with the saltdome model pertur-bation.342.4. Evaluationsource/receiver coordinatedepth coordinateDSR migration result20 40 60 80 100102030405060two?way depth stepping migration resultsource/receiver coordinatedepth coordiante20 40 60 80 100102030405060reverse time migration resultsource/receiver coordinatedepth coordiante20 40 60 80 100102030405060Figure 2.17: The imaging results with two-way wave equation depth steppingmigration (Algorithm 2) (middle plot), reverse time migration (RTM) (topplot) and double square root migration (DSR) (Algorithm 1).352.4. EvaluationOperation ComplexityrSVD O(n)HSS construction O(n)HSS matrix matrix addition O(n)HSS matrix matrix multiplication O(n)HSS matrix scalar multiplication O(n)Depth extrapolation (Sandberg and Beylkin (2009)) O(n)The two-way wave equation depth extrapolation migration O(n?nzn)Table 2.2: Theoretical computation complexity analysis of the two-way waveequation migration algorithm, where n is the problem size (lateral gridpoints), n? is the number of frequencies and nz is the number of depth levels.Since n is always much bigger than the diagonal ranks or the Newton-Schulziterations, here we only consider the leading order complexity in n.trapolation (row 1-5) with the stabilized depth extrapolation operator usingfinite difference scheme (row 6). Since both of the these two parts can becomputed linearly with respect to the problem size, we can conclude thatthe total numerical complexity for the two-way wave equation migration al-gorithm would be also linear in the problem size, scaled by the number offrequency samples and depth levels.In Figure 2.18, we include an example showing the actual computationalperformance of the above fast two-way wave equation migration algorithmaccelerated with the randommized HSS technique. For simplicity and with-out loss of generality, the test cases use homogeneous background modelswith a single point perturbation at a fixed depth level, and a single fre-quency. The theoretical performance of the migration algorithm is linear inthe problem size, which is shown as a red line (linear trend scaled by thenumber of depth levels). The other reference curves include y = nz?x1.2 andy = nz ? x1.1. The true time performance of the above migration algorithmis shown as a black line. From this picture, we can observe that the actualperformance (black line) is not exactly linear in the problem size. In factthe complexity is in between y = nz ? x1.2 and y = nz ? x1.1. This resultshould be within expectation, since the current implementation is not opti-mal in any sense. However, the current performance already suggests it is avery promising imaging method in comparison to the reverse time migration(square or even cubic complexity) in terms of computational costs.362.5. ConclusionFigure 2.18: Computational performance of the above fast two-way waveequation migration algorithm. The test cases for this analysis contain a sin-gle point model perturbation. For simplicity and without loss of generality,we use only one frequency. The theoretical performance of the migrationalgorithm is linear in the problem size, which is shown as a red line (lineartrend scaled by the number of depth levels). Other reference curves includey = nz ? x1.2 and y = nz ? x1.1. The true time performance of the abovemigration algorithm is shown as a black line.Evaluations of the randomized HSS accelerated two-way wave equationdepth stepping algorithm in this section suggests that it can produce rea-sonably good images in laterally varying media and the computational costsare more affordable in large size imaging problems.2.5 ConclusionIn this chapter, we implemented a two-way wave equation based depth step-ping algorithm, featuring an eigenvalue decomposition free spectral projectorto filter out the unstable evanescent wave components for the stable depthextrapolation with two-way wave equation. We proposed an accelerationwith the randomized HSS technique to compute the spectral projector. Themigrated images and time costs in the imaging experiments suggest thatthe accelerated algorithm can image correct dips and positions in laterallyvarying media within reasonable time.37Chapter 3?Joint Sparsity? PromotingSeismic ImagingCompressive sensing theory enables recovery of a signal from fewer samplesthan what would be required by the Shannon-Nyquist theorem by makinguse of the sparsity structure of the signal. Therefore, a reasonable hypothesiswould be that if other additional structures besides sparsity of the signal canbe also taken into account, the recovery can perform even better: either abetter recovery with the same number of samples or equivalent recovery witheven fewer samples. One example of the additional structure would be theso called ?joint sparsity?. The concept of ?joint sparsity? originally arisesin the multiple-measurement-vector (MMV) problems, where instead of asingle measurement b, we are given a set of r measurementsb(k) = Ax(k)0 , k = 1, ..., r. (3.1)MMV problem is an extension of the classical compressed sensing single-measurement-vector (SMV) problem (b = Ax) to multiple experiments. Akey feature of the MMV problem is that the vectors xk are not only sparse,they are also jointly sparse, i.e. they have nonzero entries at the samelocations. ?Joint sparsity? can also be extended to the single signal case, ifthe signal can be rearranged into a sparse matrix with the nonzero entriesrestricted to a subset of rows. Figure 3.1 shows an example of a signalthat possesses ?joint sparsity? structure. The plot at the top shows thesignal displayed as a long vector. Besides the most significant structure? sparsity, ?joint sparsity? structure also exists among the short vectorsdivided by vertical red dash lines. As shown in the ?stem plot? at thebottom, the supports of individual short sparse vectors coincide with eachother. Namely, the sparse matrix formed with those short vectors as columnshas nonzero entries restricted to a subset of rows.Unlike the classical single-measurement-vector (SMV) problem, thor-ough theoretical analysis and experimental results are lacking for the multiple-measurement-vector (MMV) problem at present (Chen and Huo (2006) and38Chapter 3. ?Joint Sparsity? Promoting Seismic Imaging50 100 150 200 250 300 350 384?3?2?10123jointly sparse signalsignal smaplesignal amplitude10 20 30 40 50 60 64?3?2?10123signal supportsignal smaplesignal amplitude 1st short vector2nd short vector4rd short vector4th short vector5th short vector6th short vectorFigure 3.1: An example of a signal that possesses ?joint sparsity? structure.The plot at the top shows the signal displayed as a long vector. Besides themost significant structure ? sparsity, ?joint sparsity? structure also existsamong the short vectors divided by vertical red dash lines. As shown in the?stem plot? at the bottom, the supports of individual short sparse vectorscoincide with each other. Namely, the sparse matrix formed with those shortvectors as columns has nonzero entries restricted to a subset of rows.39Chapter 3. ?Joint Sparsity? Promoting Seismic ImagingTropp (2006)). Existing recovery algorithms include both greedy algorithmsand convex-relaxation algorithms. For greedy algorithms, some well per-forming examples include Orthogonal Matching Pursuit for MMV (OMP-MMV) from Chen and Huo (2006), Simutaneous Orthogonal Matching Pur-suit from Tropp (2006) and p-thresholding and p-simultaneous matchingpursuit (p-SOMP) from Gribonval et al. (2008). The most widely studiedconvex relaxation approach is based on solving a mixed norm problem (Chenand Huo (2006), Berg and Friedlander (2009)),minx?Rn?r||X||p,q subject to AX = B, (3.2)where X stands for the unknown jointly sparse signal to be recovered, ar-ranged as a matrix, A stands for the sampling matrix, B stands for thesampling results arranged in the same way as the signal, and the mixed `p,qnorm of X is defined as||X||p,q =??n?j=1||Xj?||pq??1/p, (3.3)where Xj? is the (column) vector whose entries form the jth row of X.In terms of choices for the mixed norm parameters p and q, there is noclear study showing which particular set is optimal. Related studies includep = 2, q ? 1 from Cotter et al. (2005), p = 1, q =? from Tropp (2006) andp = 1, q ? 1 from Chen and Huo (2006), Berg and Friedlander (2009) andEldar and Rauhut (2010). In this thesis we?ll focus on p = 1, q ? 1, whichis also called the ?sum-of-norms? formulation, and particularly on the casewhen p = 1, q = 1 and p = 1, q = 2. Notice that when q is taken as 1,the mixed `p,q norm degenerates into the `1 norm of the vectorized signal,which is essentially pure sparsity instead of ?joint sparsity?. Therefore, aperformance comparison between p = 1, q = 1 and p = 1, q = 2 would beequivalent to that between pure sparsity and ?joint sparsity?.The remainder of this chapter is arranged as follows: section 3.1 brieflygoes through the recovery theory of ?joint sparsity?; section 3.2 covers ex-isting theoretical and experimental results regarding to p = 1, q = 1 andp = 1, q = 2 in different circumstances; section 3.3 illustrates the domainwhere ?joint sparsity? structure exists in seismic imaging; section 3.4 pro-pose the ?joint sparsity? promoting seismic imaging scheme and examine itsreal performance.403.1. ?Joint Sparsity? Recovery Theory3.1 ?Joint Sparsity? Recovery TheoryThe most detailed analysis of ?joint sparsity? recovery theory that the au-thor is aware of is from Chen and Huo (2006), in which sparsity rank R inthe MMV problem, as a counterpart of ||x||0 from SMV, is defined as (Chenand Huo (2006)):R(X) = ||(m(xi))||0, (3.4)where xi is the transpose of the ith row of the matrix X, m(?) is any vectornorm. To keep consistent with the previous notations, rewrite R(X) as:R(X) = #(||Xj?||q 6= 0, j = 1, ..., n), (3.5)where # stands for ?the number of?, and the above equation essentiallymeans counting the number of row vectors from X that have nonzero `qnorm. Accordingly, a natural extension from SMV ||x||0 recovery:min ||x||0 subject to Ax = b (3.6)to ?joint sparsity? recovery can be written as follows (Chen and Huo (2006)):min R(X) subject to AX = B. (3.7)Theorem 2.2 and Theorem 2.4 in Chen and Huo (2006) claim that theunique recovery condition for the above sparse rank minimization problemis:R(X) < Spark(A)2 , (3.8)orR(X) < Spark(A)? 1 + Rank(B)2 , (3.9)where ?Spark(A) (or ?) is the smallest possible integer such that there exists? columns of matrix A that are linearly dependent.? Chen and Huo (2006)also prove that in terms of mutual coherence, the unique recovery conditioncan be rewritten as:R(X) ? M?1 + Rank(B)2 (3.10)where the mutual coherence is defined as:M = M(A) = max1?i,j?n,i 6=j|G(i, j)| (3.11)and G(i, j) is the (i, j)th entry of the Gram matrix G: G = ATA.413.2. ?Joint Sparsity? Optimization with the Sum-of-norms FormulationIn order to construct a convex relaxation of the above ?joint sparsity?R(X) minimization problem, as a counterpart of the `1 convex relaxation inSMV recovery problemmin ||x||1 subject to Ax = b, (3.12)we arrive at the MMV relaxed problem formulation(Chen and Huo (2006)):Relax(X) = ||(m(xi))||1 (3.13)which essentially is equivalent to:Relax(X) =n?j=1||Xj?||q, (3.14)Therefore, the corresponding convex relaxation optimization is formulatedas (Chen and Huo (2006)):min Relax(X) subject to AX = B, (3.15)Theorem 3.3 in Chen and Huo (2006) shows that the recovery of therelaxed MMV `1 problem is unique under the following conditions: For adictionary A, if AX = B andR(X) < 1 +M?12 (3.16)then the matrix X is the unique solution to Equation 3.15, and this solutionis identical with the solution of Equation 3.7.One interesting point we need to mention here is the above conditions aretrue for any normm(?) (Chen and Huo (2006)). In the next section, we wouldlike to further investigate ?joint sparsity? recovery with the most widelystudied norm formulation, the sum-of-norms approach, when p = 1, q ? 1.3.2 ?Joint Sparsity? Optimization with theSum-of-norms FormulationAs stated in the previous section, all the unique recovery conditions areapplicable to any norm m(?). Chen and Huo (2006), Berg and Friedlander(2009) and Eldar and Rauhut (2010) discuss various options for choosingm(?). In Chen and Huo (2006) the authors demonstrate that there is nosignificant difference among different vector norms. Theorem 3.2 in Berg423.3. Ray Parameter Image Volumeand Friedlander (2009) implies that using the restricted isometry theoryto analyze the uniform recovery conditions of the sum-of-norms approach,necessarily leads to results that are no stronger than uniform `1 recovery.In particular the `1,2 norm is compared against the `1,1 norm in Berg andFriedlander (2009), and they show that there are problems that can be solvedwith one approach but not with the other.However, according to Eldar and Rauhut (2010), though the theory anal-ysis predicts no performance gain, in practice MMV sparse recovery perfor-mance is superior to SMV, beyond what the theoretical worst case analysiswould lead us to expect. The average case study (Eldar and Rauhut (2010))also shows that with a very mild condition on the sparsity and on the dictio-nary coherence, the probability of recovery failure would decay exponentiallywith the number of channels.In this work we focus on preliminary application of ?joint sparsity? to theseismic imaging scenario using particular sum-of-norms formulations. In thefollowing part of this chapter, the `1,2 norm recovery is selected to compareagainst `1,1 norm recovery, which is equivalent to pure sparsity recovery, toinvestigate if there is any potential gain by making use of ?joint sparsity? forseismic imaging. Before rushing into the ?joint sparsity? promoting seismicimaging, we?ll first investigate the domain where ?joint sparsity? exists inseismic imaging in the next section.3.3 Ray Parameter Image VolumeBefore going directly to ?joint sparsity? promoting seismic imaging, in thissection we?ll first clarify possible circumstances where joint sparsity struc-ture may exist in seismic imaging, and its construction. For simplicity,examples shown in this part are based on the one-way wave equation basedDSR migration (Section 2.1), however the extension to the two-way waveequation based migration (Section 2.2) is quite straightforward. The con-clusion should hold in both cases, regardless of which migration techniqueit is based on.One promising choice where ?joint sparsity? structure would exist inseismic imaging is the so-called ray parameter image volume. The ray pa-rameter image volume contains multiple similar 2D images (ray parameterimages) with respect to different ray parameters, where the ray parameterscan be associated with either different horizontal slowness or different anglesof incidence. Ray parameter image volume is a widely used tool in seismicimaging context, for example in migration velocity analysis (MVA) or angle433.3. Ray Parameter Image Volumeversus offset analysis (AVO), especially in areas where there are complexgeological structures. An accurate ray parameter image volume is almostessential if we want to get an accurate velocity model. According to the sem-blance principle (Kleyn (1983),Yilmaz and Doherty (1987)), if the velocitymodel is correct, the resulting ray parameter image volume should be flatalong the ray parameter dimension. In other words, all non-flat componentswould be related to noise or inexact velocity model. Therefore, assumingone image in the ray parameter image volume has a sparsity structure in acertain domain, then all the other images in the ray parameter image volumeshould also be sparse in the same domain. Moreover, the sparse coefficientsfor different ray parameter images would also be similar to each other ?sharing the same supports.In Figure 3.2, we show one example of the ?joint sparsity? structure inthe image volume with a saltdome model. The top plot in Figure 3.2 showsa ?slice view? of the ray parameter image volume that plotted against mid-point position, ray parameter and depth level. It shows image slices corre-sponding to different ray parameters are similar to each other, and midpointslices have flat events. The bottom plot in Figure 3.2 shows a ?support view?of the same ray parameter image volume, in which different images corre-sponding to three selected ray parameters are plotted as long vectors. Anobvious ?joint sparsity? pattern is observed among those different vectors.To generate the ray parameter image volume, in this work we follow thepost imaging common angle imaging (CAI) approach developed by Biondiand Symes (2004). According to Biondi and Symes (2004), computation ofthe image volume after imaging is more efficient and accurate compared tothat before imaging. Essentially, the construction procedure is a combina-tion of the DSR imaging, a midpoint-offset transform and a slant stack. Atthe first stage, an extended image volume is computed via the DSR imaging(Algorithm 1), except replacing the zero offset imaging condition (step 5)with taking the whole data matrix D(xs, xg) in the source receiver domainat each depth level. Then at the second stage, the extended image volumeis transformed into the midpoint-offset domain. Lastly, a ray parameter im-age volume can be formed by applying slant stacks (Schultz and Claerbout(1978)) to each midpoint slice. As a result, multiple ray parameter imagesare produced, each corresponding to a specific ray parameter.Having clarify the existence of the ?joint sparsity? structure in the rayparameter image volumes. In the next section we?ll formulate the ?jointsparsity? promoting seismic imaging scheme and evaluate its performance.443.3. Ray Parameter Image VolumeRay$parametersDepthMidpoints100 200 300 400 500 600 700 8009.769.7659.779.7759.789.7859.799.7959.89.805 x 1011midpoint ? depth levelimage amplitudesupport view of an image volumeFigure 3.2: Top plot, slice view of the image volume via a DSR ray parameterimage gather migration, with axes midpoint position, ray parameter anddepth level. We can see image slices with different ray parameters are similarto each other, and the slice with same ray parameter and different midpointshas flat lines. Bottom plot, support view of the image volume for selectray parameters. We can see a joint sparsity structure among images withdifferent ray parameters.453.4. Problem Formulation and Evaluation3.4 Problem Formulation and Evaluation3.4.1 Problem FormulationTo obtain an optimization solver for ?joint sparsity? problems, we chooseto implement a ?joint sparsity? adaptation to the SPG`1 framework. Thereasons for choosing SPG`1 includes its suitability for seismic problems aswell as its straightforward support for extensions in functionality. For exam-ple, to implement the `p=1,q sum-of-norms ?joint sparsity? recovery schemeunder the SPG`1 framework, only the primal norm, the dual norm, andthe norm projection function need to be redefined. The primal norm fol-lows Equation 3.3, the dual norm is defined according to Corollary 6.2 invan den Berg and Friedlander (2011), and the projection function is formu-lated by a modification of the original `1 norm projection according to thesum-of-norms definition.With the construction of both the operator that can generate the ?jointsparsity? ray parameter image volume, and the corresponding solver, it?ssuitable to introduce the jointly sparse promoting seismic imaging scheme:min ||X||p,q subject to ||RMA(S?X)? b||2 ? ?, (3.17)where X is an unknown 2D matrix whose columns are the sparse repre-sentations of images corresponding to specific ray parameters, RM is thesampling matrix (e.g. the random Gaussian matrix), A is the operator thatgenerates the wavefield based on ray parameter image volumes, S is thesparsity transform (e.g. the Curvelet transform or the Dirac basis), b isthe observed wavefield corresponding to RM , and ? is a parameter thatdescribes the noise level and controls the accuracy of the data fit. Recallthat the usual sparsity promoting (on the migration image) seismic imagingscheme (Herrmann et al. (2012)) can be written as:min ||x||1 subject to ||RMAS?x? b||2 ? ?. (3.18)where x is an unknown sparse representation of the migration migrationimage, and A is the operator that generates the wavefield based on a 2-Dmigration migration image. Note that in our case, we choose to implementA as the adjoint of the operator that computes the ray parameter imagevolume as discussed in the previous section, and A as the adjoint of theoperator that performs DSR migration.The remainder of this section includes the evaluation of the ?joint spar-sity? promoting scheme (p = 1, q = 2) and its comparison against the puresparsity promoting scheme (p = 1, q = 1) for migration images (section3.4.2) and ray parameter image volumes (section 3.4.3).463.4. Problem Formulation and Evaluation3.4.2 ?Joint Sparsity? Promoting Seismic Imaging forMigration ImagesIn this section we apply the proposed ?joint sparsity? promoting seismicimaging scheme (Equation 3.17) with p = 1, q = 2 (joint sparsity) andp = 1, q = 1 (pure sparsity) to create migration images from pre stack data.For comparison, we also include an image generated by the usual sparsitypromoting method(Equation 3.18). The migration images are generated bysimply applying the adjoint slant stacks operation and a zero offset imagingcondition to the resulted ray parameter image volumes. The experimentsare based on a part of the saltdome model. The horizontal and vertical in-tervals for this model are all 10 meters. The pre stack data is generated withthe linearized constant-density acoustic frequency-domain modelling opera-tor (https:// www.slim.eos.ubc.ca/ SoftwareDemos/ applications/ Modeling/2DAcousticFreqModeling/ ) using a ricker wavelet. A taper boundary is alsoused on the left and right boundary of the computational domain to suppressreflections from these boundaries during imaging. We use a flat Gaussianmatrix as the simultaneous sensing matrix, which essentially means all thesources are fired simultaneously with random weights for several times.TheDirac basis is used as the sparsity basis. The results should be extendableto other sensing matries and sparsity basis quite straightforwardly as longas the unique recovery condition is satisfied.Result images selected to compare are the corresponding migration im-ages obtained from the resulted ray parameter image volumes, which aregenerated by solving different norm realizations (`1,1 : p = 1, q = 1 or`1,2 : p = 1, q = 2) of the Equation 3.17. In Figure 3.3, the top plot showsthe resulted migration image from the `1,1 norm pure sparsity formulation,the middle plot shows the resulted migration image from the `1,2 norm ?jointsparsity? formulation. Result with `1 sparsity promoting on the migrationimage by solving the Equation 3.18 is also shown in the bottom plot of Fig-ure 3.3 as a reference. All the results are generated with a fixed number ofiterations using the original and adapted SPG`1 solver using a zero vectoras the initial guess. The comparison is based on the fact that in both ofthe original and adapted SPG`1 solver, the number of matrix vector mul-tiplications is proportional to the number of iterations. The matrix vectorproducts correspond to migration or demigration, which dominate the totaloptimization time. The iteration number is chosen such that it is sufficientfor generating a proper migration image (Figure 3.3 bottom plot) with thesparsity promoting on migration image scheme (Equation 3.18).Two observations can be made based on the results shown in Figure 3.3.473.4. Problem Formulation and Evaluationpost stack image with l1,1 sparsity promotingsource/receiver cooridinatedepth coordinate10 20 30 40 50 6020406080100120post stack image with l1,2 sparsity promotingsource/receiver cooridinatedepth coordinate10 20 30 40 50 6020406080100120post stack image with l1 sparsity promotingsource/receiver coordinatedepth coordinate10 20 30 40 50 6020406080100120Figure 3.3: Top plot, migration image from pure sparsity recovery, using `1,1on the ray parameter image volume. Middle plot, migration image from jointsparsity recovery, using `1,2 on the ray parameter image volume. Bottomplot, migration image from pure sparsity promoting on migration image 483.4. Problem Formulation and EvaluationOne most obvious observation is that within a fixed number of SPG`1 it-erations, the resulting migration images based on the ray parameter imagevolume recovery (Equation 3.17), can not match in quality with the oneproduced by the migration image recovery (Equation 3.18). This is true forboth `1,1 and `1,2 norm formulation. One possible reason for the above isthat the in ray parameter image volume domain, the number of unknownsincreases dramatically, impacting the convergence rate and resulting in aless satisfactory image within a fixed number of iterations. However, whencomparing the two images acquired via sparsity promotion in the ray param-eter image volume domain, an advantage is visible from the one recoveredwith `1,2 norm ?joint sparsity? recovery. It shows less crosstalk and broaderillumination of the deeper events.We can draw the conclusion based on the presented imaging results thatsparsity promotion (either pure sparsity or ?joint sparsity?) with ray pa-rameter image volumes is not a suitable choice if the aim is only to producemigration images. However, the `1,2 norm ?joint sparsity? generates an im-proved migration imaging recovery compared to the `1,1 norm, although notsignificantly.3.4.3 ?Joint Sparsity? Promoting Seismic Imaging for RayParameter Image VolumeAlthough joint sparsity recovery formulations may not be the ideal choice togenerate a migration image. The ray parameter image volumes have thereown importance in various seismic applications. Therefore, in this sectionwe further examine the resulting ray parameter volumes without giving up?joint sparsity? promoting seismic imaging too early. In Figure 3.4, we showan example of the midpoint slices (at the 64th midpoint position) from theresulting image volumes obtained with `1,1 (left plot) and `1,2 (right plot)norm recovery.From the midpoint slices shown in Figure 3.4, we can observe that boththese methods are able to produce reasonable image panels, however, theresult with the ?joint sparsity? via `1,2 norm optimization (Figure 3.4(b)) ismuch clearer compared to the one obtained with the `1,1 norm optimization(Figure 3.4(a)). Besides, a broader illumination of the deeper events isalso visible. Notice, that the above observations are consistent with theobservations regarding the migration images in the previous section.Based on the observations made in this section, we can conclude that the`1,2 norm ?joint sparsity? promoting imaging scheme (3.17) offers a betterimaging capability compared to the `1,1 norm sparsity promoting imaging493.4. Problem Formulation and Evaluationimage panel from l1,1 recoveryray parameters coordinatedepth coordinate20 40 6020406080100120(a)image panel from l1,2 recoveryray parameters coordinatedepth coordinate20 40 6020406080100120(b)Figure 3.4: (a) midpoint slice from sparsity recovery result from `1,1; (b)joint sparsity recovery result from `1,2503.4. Problem Formulation and Evaluationscheme for constructing the ray parameter image volumes.513.5. Conclusion3.5 ConclusionIn this chapter, we investigated the potential of using ?joint sparsity? toimprove the current imaging . We first introduced the ?joint sparsity?theory and the sum-of-norms recovery approach. Then we proposed andimplemented a ?joint sparsity? promoting seismic imaging scheme withinthe domain of ray parameter image volume. Evaluation in real imagingexample suggested a practical superior imaging quality by making use ofthe ?joint sparsity?. Even though the migration image generated with `1,2norm ?joint sparsity? promoting imaging scheme is not as perfect, it offers areasonable choice to produce ray parameter imaging volumes in applicationssuch as AVO or MVA. In other words, seismic imaging offers another exam-ple where practically superior performance of the sum-of-norms with the `1,2norm formulation is observed compared to the `1,1 norm formulation. Weadmit that the results shown in this work are quite preliminary, since thereare many aspects that we can improve upon in the current ?joint sparsity?promoting imaging scheme, such as exploring other domains where ?jointsparsity? can be better revealed and other norm formulations that are moresuitable to seismic images.52Chapter 4Large Scale OptimizationOne general concern with seismic imaging algorithms is their efficiency,since a seismic imaging problem size can easily scale to millions of sam-ple points and a single run of seismic migration (demigration) might takeup to weeks. The situation is even worse for iterative imaging algorithms(e.g., the sparsity promoting seismic imaging algorithm), since iterativelyevaluating the objective function generally involves computing seismic mi-grations and (or) demigrations repeatly. In order to improve the efficiencyof sparsity promoting seismic applications, we propose a new optimizationsolver ? PQN`1. PQN`1 is based on a current state-of-the-art optimizationsolver ? SPG`1(van den Berg and Friedlander (2008)). Analysis and resultsin the following sections show the new algorithm is more efficient comparedto SPG`1 in large scale sparsity promoting seismic applications. A key fea-ture of the new solver is the decrease in the number of iterative evaluationof the objective function.The new algorithm is motivated by a thorough study of the computa-tional time usage of SPG`1. In Figure 4.1, we show a screen shot of anSPG`1 log file for one sparsity promoting seismic imaging experiment. Con-tents highlighted on the right with a circle indicate an imbalance patternin time usage. The ?total time? stands for the total computational timefor solving the sparsity promoting imaging problem, the ?mat-vec time?stands for time used for computing matrix-vector products (or operator-vector products if the seismic modelling or migration kernels are given asimplicit operators instead of explicit matrices), and ?project time? standsfor the time used to compute feasible descent directions for different iter-ations (projecting spectral gradient directions onto the sparse constraint).The total time is dominated by ?project time? and ?mat-vec time?, wherethe latter contributes over 99% of the total. This imbalance is consistentwith the nature of our seismic imaging problem: the ?mat-vec? (A ? x orA? ? x), the migration or demigration kernel, is the most computationallyintensive part of the whole problem, while the projection onto explicit sparseconstraint can be computed in much less time. Contents highlighted on theleft with the red circle demonstrate that the number of products with A and534.1. Projected Quasi Newton (PQN) MethodA? is proportional to the number of total iterations. This means that onaverage one migration and one demigration procedure is conducted at eachiteration, in other words, the number of total iterations can effectively reflectthe total computational expense. Therefore, to enhance the computationalperformance of SPG`1, we need to focus on decreasing the total number ofiterations. Based on these observations, we propose to introduce a secondorder method (Projected Quasi Newton (PQN) (Schmidt et al. (2011))) toreplace the original SPG algorithm in SPG`1. By projecting onto a moreexpensive but more effective quasi Newton direction at each iteration, thenew algorithm features a decease of total computation time and a shift ofthe computational burden from ?mat-vec time? to ?project time?.The following sections are arranged as follows: section 4.1 introducesthe Projected Quasi Newton algorithm, section 4.2 presents the new PQN`1algorithm, and section 4.3 shows numerical and seismic experiments evalu-ating the performance of the proposed algorithm.Iterations stepG nnzgnnzXRelative Gap Relative Error gNormObjective valueFigure 4.1: An example of an SPG`1 log file. There are two things thatneeded to be pointed out. First, the total time is a summation of ?projecttime? and ?mat-vec time?, where the mat-vec time is taking 99% of thetotal. Second, the number of mat-vecs is proportional to the total numberof iterations.4.1 Projected Quasi Newton (PQN) MethodThe projected quasi Newton (PQN) method belongs to the category of pro-jected Newton methods, which are extensions of Newton?s methods fromunconstrained optimization problems to constrained optimization problems.The projected Newton method stems from Levitin and Polyak (1966) and544.1. Projected Quasi Newton (PQN) MethodBertsekas (1982), with variations developed to fit different problems such asthe two metric projection method (Gafni and Bertsekas (1984)) and the pro-jected quasi Newton method (Schmidt et al. (2009)). According to Schmidtet al. (2011), projected Newton methods shares many of the appealing prop-erties of Newton methods. For example, the objective function is guaranteedto improve for small enough step size, and rapid local convergence rates existaround local minima. The projected quasi Newton algorithm we use in thiswork is based on the algorithm proposed in Schmidt et al. (2011), as shownin Algorithm 5. This algorithm solves for a limited-memory quasi-Newtondirection at each iteration, and is claimed to be suitable and competitive forlarge-scale optimization problems.Algorithm 5 Projected Quasi Newton (PQN) (Schmidt et al. (2011))Given x0 ? C, B0 0for k = 0, ... , until some stopping criteria met doStep I: Build Qk(x, ?) = f(xk)+(x?xk)??f(xk)+ 12?(x?xk)?Bk(x?xk)repeatStep IIa: Solve over C for the minimizer of Qk(x, ?): x?kaStep IIb: Update xk+1 ? xk + ?(x?ka ? xk)Step III: Update ? and/or ?until descent condition (Armijo condition: f(x(k+1)) ? f(xk) +???f(xk), xk+1 ? xk?, ? ? (0, 1)) is satisfied ;endMathematically, the projected quasi Newton method can be used to solveany optimization problem that can be reformulated as follows (Schmidt et al.(2012)):minxf(x) subject to x ? C (4.1)where f is a twice continuously differentiable and convex function and Cis a convex constraint set. At iteration k, step I of the algorithm builds aquadratic approximation of the objective function around the current itera-tion xk:Qk(x, ?) = f(xk) + (x? xk)??f(xk) + 12?(x? xk)?Bk(x? xk), (4.2)where ? stands for a positive stepsize and Bk stands for a positive definiteapproximation of the Hessian matrix at xk (Hk : ?2f(xk)). Theoretically,554.1. Projected Quasi Newton (PQN) Methodthe Hessian approximation Bk can be formulated via any user defined ap-proach to be fit with different problems? properties. In PQN, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS or LM-BFGS), a lim-ited memory quasi Newton method, is used for the adaptability to largescale optimization problems.The L-BFGS algorithm uses a limited memory variation of the BFGS up-date to approximate the Hessian matrix. Unlike the original BFGS methodwhich stores a dense n?n approximation of the Hessian matrix, the L-BFGSmethod only stores a few (m ? 10) vectors that can represent the approx-imation implicitly. For theoretical background and implementation detailsof L-BFGS method, interested readers can refer to any standard optimiza-tion textbook. The choice of L-BFGS approach in the PQN algorithm has akey advantage of reducing memory usage in our seismic imaging problems.Essentially, instead of a full n ? n Hessian approximation, where n is thelength of the vectorized image update, we only need to store a m ? n Bkapproximation of the Hessian. The L-BFGS also offers us the freedom totrade off between CPU and Memory usage. Even though Bk is small, it?snot free in terms of memory. Generally, the closer the rank of Bk is tothe rank of the true Hessian, the faster the PQN?s convergence. So we canfreely determine how much memory we want to pay for the acceleration ofconvergence on a case by case basis (examples included in section 4.3).At step II of the PQN algorithm (Algorithm 5), xk+1 is updated via theminimizer of the quadratic approximation Qk:x?a := arg minx?CQk(x, ?), (4.3)xk+1 ? xk + ?(x?ka ? xk), (4.4)where ? ? (0, 1] is another step size. To ensure sufficient decrease in theobjective value, an Armijo line search is performed. Specifically, minimizingthe quadratic approximation Qk can be performed via any general iterativeoptimization algorithm, which offers another degree of freedom among dif-ferent problems. In PQN`1, the spectral projected gradient (SPG) method,which is also used in the solver SPG`1, is chosen for its favourable experimen-tal performance. Note, Schmidt et al. (2012) claim that the SPG subroutinedoes not need to solve Equation 4.3 exactly, iin fact one SPG iteration ini-tialized by xk is enough to guarantee a descent direction. Details of SPG andSPG`1 are included in Appendix A. Given the appealing convergence rateand suitability of the PQN algorithm, we evaluate the possibility of incor-porating it into the framework of SPG`1 algorithm in the following section.564.2. From SPG`1 to PQN`1The SPG`1 algorithm is extensively used in our current sparsity promotingseismic imaging algorithms.4.2 From SPG`1 to PQN`1As introduced in Appendix A, the key idea of SPG`1 is decomposing theBPDN problem (min ||x||1 subject to ||Ax? b||2 ? ?) into several relativelyeasy Lasso problems (min ||Ax ? b||2 subject to ||x||1 ? ?) with an explicitconstraint ? . In SPG`1, each individual Lasso problem is solved by means ofa first order spectral projected gradient (SPG) method. The SPG algorithmcan be reformulated as finding the minimum of a first order approximation tothe objective function within a feasible domain constraint at each iteration:xk+1 = arg min||x||1??f(xk) + (x? xk)? 5 f(xk) + 12? ||(x? xk)||22 (4.5)where ?f(xk) is the gradient at xk, ? is a spectral step size parameter.This sub optimization routine is very easy to solve, since Equation 4.5 canbe rewritten as:xk+1 = arg min||x||1??||x? (xk ? ?5 f(xk)||22. (4.6)which is essentially the Euclidean projection onto the one norm ball definedby the constraint ||x||1 ? ? . If we denote P as the projection functiondefined under the Euclidean norm, the closed form solution to Equation 4.5is:xk+1 = P||x||1???(xk ? ?5 f(xk)). (4.7)Even though each iteration of SPG is extremely easy, the general low conver-gence rate of the first order method is not ideal in our case, where iterativelyevaluating the objective function needs to be reduced. Therefore, we hopeto effectively control the convergence rate by incorporating the second or-der PQN method into the SPG`1 framework to replace the first order SPGmethod.In the PQN method, at each iteration we instead solve a second orderapproximation to the objective function written as :xk+1 = arg min||x||1??f(xk) + (x? xk)? 5 f(xk) + 12?(x? xk)?Bk(x? xk),(4.8)574.2. From SPG`1 to PQN`1where Bk is the limited memory quasi Newton Hessian matrix, and ? is astepsize parameter. Bk in the above formulation would bring in the originalsecond order curvature information to the optimization sub-routine, result-ing in a more effective step at each iteration. However, unlike the SPGmethod, Equation 4.8 does not have a closed form solution as the Euclideanprojection. Instead, the equivalent norm projection is associated with theHessian (quasi Newton) norm ball:xk+1 = arg min||x||1??||x? (xk ? ?dk)||2Bk , (4.9)where the quasi Newton norm ||y||Bk is defined as?y?Bky for some vectory, and dk is the quasi Newton direction. In PQN, SPG is used iterativelyto solve 4.8. It is worthwhile to mention at this point that minimizing thequadratic approximation in PQN algorithm using the SPG algorithm doesnot require additional evaluation of f(xk), which means this interior SPGminimization is a much more friendly optimization routine with no extramat-vecs.By using the local curvature information in Bk, each update on x in thePQN iterations will contribute more compared to gradient based method,resulting in a decrease in the total number of iterations needed to achievethe same data residual. Therefore, it would be very appealing to replacethe SPG algorithm in SPG`1 framework with PQN algorithm immediately.The resulting algorithm PQN`1 would be able to shift the time usage fromcomputing mat-vecs to computing the more sophisticated quasi Newton pro-jection, resulting in a decrease in the total time.Note, there are also some other improvements we implement in thePQN`1 algorithm, in addition to replacing SPG with PQN, which includebut are not limited to: (1), applying a projected curvy line search insteadof the basic Armijo line search in the PQN algorithm; (2), coming up witha support strategy in the PQN algorithm when the L-BFGS update cannot give a positive definite Hessian approximation; (3), building an adap-tive stopping criteria for the SPG subroutine, i.e., the accuracy of the SPGsub-routine is determined by the accuracy of PQN routine. All those dif-ferent components contribute together to the new high performance PQN`1algorithm. In the next section we will scompare its performance in variousproblem settings with the SPG`1 method.584.3. Evaluation4.3 EvaluationIn this section, some stylized compressed sensing numerical experiments andseismic imaging experiments are performed to examine the performance ofthe new PQN`1 algorithm.4.3.1 Stylized Numerical ExampleIdeal case In the first example, we show an experiment that involves anideal full rank matrix: we try to recover a vector of length 512 components,of which 20 are nonzero, from 120 noise free observations, with the i.i.dGaussian sensing matrix. Comparison between the two algorithms (SPG`1and PQN`1) is based on the results when they achieve the same data resid-ual. In the top plot of the Figure 4.3.1, we show the signals recovered bySPG`1 and PQN`1 within one experiment. The true signal is shown on thetop and the recovery with SPG`1 and PQN`1 with the same data residualare shown in the middle and at the bottom respectively. Note here in thePQN`1 algorithm, we allow a maximum of 30 for the rank of the L-BFGSapproximate Hessian matrix. From this plot we can see that both of thosealgorithms give reasonable and comparable recovery, however to arrive atthe same data residual, SPG`1 needs to evaluate 86 forward plus 67 adjointmatrix vector products, whereas PQN`1 only needs 38 forward plus 38 ad-joint matrix vector products. In the middle plot of the Figure 4.3.1, we showthe convergence rate of the two algorithms in this experiment, plotted asdata residual versus the number of iterations. Results show the data residualdecreases much faster for the PQN`1 compared to the SPG`1. This obser-vation is consistent with the previous figure where we observe different totalnumber of matrix vector products for the two algorithms. Having shown thesuperior performance of the PQN`1 algorithm in terms of overall less matrixvector products. In the bottom plot of Figure 4.3.1, we further illustrate thememory (storage) and CPU (time) trade off of the PQN`1 algorithm. In thisplot, for each fixed rank of the L-BFGS Hessian approximation, the num-ber of mat-vecs was averaged over twenty runs with different realizationsof the sensing matrix. This plot demonstrates a clear trend of decreasingof the required matrix vector multiplications as we increase the rank of theapproximate Hessian Bk, which essentially is increasing the memory usage.Ill-conditioned case In this example, we look specifically into the situ-ation when the sensing matrix is ill-conditioned. In the previous case, wehave a perfect ideal full rank sensing matrix, the Gaussian matrix. But a full594.3. Evaluation50 100 150 200 250 300 350 400 450 500?101 true signal50 100 150 200 250 300 350 400 450 500?1?0.500.5spg recovery, 86+97 mat?vecs50 100 150 200 250 300 350 400 450 500?101 pqn recovery, 38+38 mat?vecs10 20 30 40 50 60?8?7?6?5?4?3?2?10iterationslog(residual)convergence of SPGl1 and PQNl1 SPGl1PQNl15 10 15 20 25 30100105110115120rank of L?BFGS approximate Hessiannumber of mat?vecsmemory and CPU trade off of the PNQl1 algorithm mat?vec of SPGl1 is 219Figure 4.2: SPG`1 and PQN`1 comparison in one stylized numerical ex-ample. Top plot: recovery with the same data residual, middle plot: theconvergence rate, black line is for SPG `1 and blue is for PQN`1, and bottomplot: memory and CPU trade off for the PQN`1 algorithm.604.3. Evaluationrank sensing matrices rarely arise in real applications. The more commonill-conditioned sensing matrices often cause slow down of the convergence ofthe gradient based methods, such as SPG`1. As mentioned previously in theintroduction of the L-BFGS method, PQN`1 is supposed to outperform theSPG`1 method tremendously when the problem is ill-conditioned. In Figure4.3, we show the singular values of an ill-conditioned sensing matrix (topplot), recovery with SPG`1 and PQN`1 with the same data residual (mid-dle plot) and the memory and CPU trade off analysis (bottom plot) of thePQN`1 algorithm in this example. Unlike the previous full rank experiment,to generate the first two plots, we generate the ill-conditioned sensing matrixby multiplying the single values of a full rank random Gaussian matrix withan exponential decay damping factor. Similar to the full rank case, here thePQN`1 recovered signal is also generated by allowing a maximum of 30 forthe rank of the L-BFGS approximate Hessian matrix. The third plot is alsorecording the average performance over twenty random realizations for eachL-BFGS approximate Hessian rank. The observation we made for from theprevious full rank experiment also apply to this ill-conditioned experiment.However, in this case, with the sensing matrix being ill-conditioned, the ad-vantage of the PQN`1 algorithm is more obvious compared to the SPG`1algorithm.614.3. Evaluation0 50 100 150 20000.10.20.30.40.50.60.7 singular valuesNo. of singular valuessingular value amplitudes50 100 150 200 250 300 350 400 450 500?101 original signal50 100 150 200 250 300 350 400 450 500?0.500.5SPGl1 recovery, 679 + 442 mat?vecs50 100 150 200 250 300 350 400 450 500?0.500.5PQNl1 recovery, 60 + 60 mat?vecs5 10 15 20 25 30150200250300350400450500rank of LBFGS approximate Hessianthe number of mat?vecsmemory and CPU trade off of the PNQl1 algorithmmat?vec of SPGl1 is 1276Figure 4.3: SPG`1 and PQN`1 comparison within in one ill-conditionednumerical example. Top plot, singular values of the sensing matrix; middleplot, recovery with the same data residual; bottom plot, memory and CPUtrade off of the PQN`1 algorithm.624.3. Evaluation4.3.2 Seismic Imaging ExampleFinally, a comparison of the algorithm PQN`1 and SPG`1 is performed witha sparsity promoting seismic imaging example. The experiment is basedon a subset of BG compass model. In this experiment we put the sourcesand receivers on the top layers? grids, with the grid interval 10 meters forboth horizontal coordinates and vertical coordinates. We used the nar-row band (8Hz - 25Hz) linearized data generated with the constant-densityacoustic frequency-domain modelling operator (https:// www.slim.eos.ubc.ca/ SoftwareDemos/ applications/ Modeling/ 2DAcousticFreqModeling/ ). The2D Curvelet transform is used as the sparsity basis. Note that we usually donot want to perform sparsity promoting imaging to sequential shot experi-ments. Here it is only shown as an illustration of the difference in softwareperformance. For simplicity and without loss of generality we use a nar-row band of low frequencies to image, which is also not commonly used inseismic imaging. Figure 4.4(a) shows the reverse time migration result as areference, Figure 4.4(b) and Figure 4.4(c) show SPG`1 and PQN`1 recover-ies with the same data residual. Based on the above result, we can observethat: first, imaging quality from both algorithms are improved comparedto that of reverse time migration; second, recoveries from the two algo-rithms with same data residual produce similar recovery images. In Table4.1, we show the total number of iterations, the total number of mat-vecs(forward+adjoint) and total time required for SPG`1 and PQN`1 methods.We also show the memory versus CPU trade off for the PQN`1 algorithmwith different maximum approximate Hessian rank (shown in parentheses).Based on those results, we can safely draw the conclusion, that the proposedPQN`1 algorithm is able to accelerate the solution of the seismic imagingproblem compared to the SPG`1 algorithm. The increase in rank of theL-BFGS approximate Hessian matrix generally results in a decrease in thetotal computational time, and the decrease is most obvious before the rankreaches 10. This suggests that in this seismic imaging example, we mightconsider rank 10 as a optimal trade-off between the CPU and memory usage.The observations and conclusions from both the numerical and seismicimaging examples suggest that the proposed sparsity optimization solver ?PQN`1 is a favourable choice in large scale optimization problems, espe-cially when evaluation the objective function is costly. Compared to SPG`1algorithm, the PQN`1 algorithm shifts the computational time from the?mat-vec time? to the ?project time?, and results in a decrease in the totaltime.634.3. EvaluationSource/receiver*coordinateDepth*coordinateImaging*result*from*RTM(a)Source/receiver*coordinateDepth*coordinateImaging*result*from*SPGl1(b)Source/receiver*coordinateDepth*coordinateImaging*result*from*PQNl1(c)Figure 4.4: SPG`1 and PQN`1 comparison within in one seismic imagingexample, (a) reverse time migration result as a reference (b) SPG`1 recovery,and (c) PQN`1 recover with the same data residual644.4. ConclusionIteration Mat-vec Total Time(s)SPG`1 100 156+101 39375PQN`1(5) 36 45+45 9278.7PQN`1(10) 35 44+44 7832.4PQN`1(20) 35 44+44 6515.3PQN`1(30) 35 44+44 6335.9Table 4.1: Memory and CPU tradeoff in PQN`1 sparse seismic imagingproblem, time advantage is very obvious compared to SPG`14.4 ConclusionIn this chapter, we proposed a new sparsity optimization solver ? thePQN`1 ? to accelerate the optimization problems with large costly ob-jective functions. We introduced basic building blocks of the algorithm andthe state-of-the-art SPG`1 framework. Evaluation with both numerical andseismic imaging examples suggested that the PQN`1 algorithm a promisingchoice for large scale optimization problems.65Chapter 5ConclusionIn this thesis we proposed to improve the efficiency of some existing seismicimaging algorithms with the use of the spectral projector and the ?jointsparsity?. In Chapter 2, we introduced the theories and implementations ofthe spectral projector and the resulted two-way wave equation based depthstepping imaging algorithm. Theoretical analysis and practical evaluationsshowed that the algorithm has a promising time complexity for extremelylarge size seismic imaging problems. In Chapter 3, we introduced the ?jointsparsity? concept as a potential boost engine to improve on the currentsparsity promoting seismic imaging framework. We briefly covered the the-oretical background existing for ?joint sparsity? promoting problems. Wealso proposed a preliminary scenario of using the ?joint sparsity? promotingrecovery in seismic imaging context. Evaluation shows the superior prac-tical performance with ?joint sparsity? compared to only using sparsity inthe seek of the ray parameter image volumes. In Chapter 4, the aim forimproving imaging efficiency is addressed from a different point of view ?the optimization solver. Specifically, with the application to the sparsitypromoting seismic imaging. A fast solver that is more suitable for the ex-tremely large scale sparsity promoting seismic imaging problems is proposed.Evaluations showed overall higher efficiency compared to the current widelyused sate-of-the-art sparsity promoting solver SPG`1.66BibliographyL. Auslander and A. Tsao. On parallelizable eigensolvers. Citeseer, 1991.J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMAJournal of Numerical Analysis, 8(1):141?148, 1988.E. v. d. Berg and M. P. Friedlander. Joint-sparse recovery from multiplemeasurements. arXiv preprint arXiv:0904.2051, 2009.A. Berkhout and D. Verschuur. Imaging of multiple reflections. Geophysics,71(4):SI209?SI220, 2006.D. P. Bertsekas. Projected newton methods for optimization problems withsimple constraints. SIAM Journal on Control and Optimization, 20(2):221?246, 1982.B. Biondi and W. W. Symes. Angle-domain common-image gathers for mi-gration velocity analysis by wavefield-continuation imaging. Geophysics,69(5):1283?1298, 2004.S. Chandrasekaran, P. Dewilde, M. Gu, W. Lyons, and T. Pals. A fast solverfor hss representations via sparse matrices. SIAM Journal on MatrixAnalysis and Applications, 29(1):67?81, 2006.J. Chen and X. Huo. Theoretical results on sparse representations ofmultiple-measurement vectors. Signal Processing, IEEE Transactions on,54(12):4634?4643, 2006.J. F. Claerbout. Imaging the earth?s interior, volume 316. Citeseer, 1985.S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutionsto linear inverse problems with multiple measurement vectors. SignalProcessing, IEEE Transactions on, 53(7):2477?2488, 2005.D. L. Donoho. Compressed sensing. Information Theory, IEEE Transactionson, 52(4):1289?1306, 2006.67BibliographyY. C. Eldar and H. Rauhut. Average case analysis of multichannel sparserecovery using convex relaxation. Information Theory, IEEE Transactionson, 56(1):505?519, 2010.E. M. Gafni and D. P. Bertsekas. Two-metric projection methods for con-strained optimization. SIAM Journal on Control and Optimization, 22(6):936?964, 1984.R. Gribonval, H. Rauhut, K. Schnass, and P. Vandergheynst. Atoms of allchannels, unite! average case analysis of multi-channel sparse recoveryusing greedy algorithms. Journal of Fourier analysis and Applications, 14(5-6):655?687, 2008.D. Hale. Stable explicit depth extrapolation of seismic wavefields. Geo-physics, 56(11):1770?1777, 1991.N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with ran-domness: Probabilistic algorithms for constructing approximate matrixdecompositions. SIAM review, 53(2):217?288, 2011.G. Hennenfent, L. Fenelon, and F. J. Herrmann. Nonequispaced curvelettransform for seismic data reconstruction: A sparsity-promoting ap-proach. Geophysics, 75(6):WB203?WB210, 2010. URL http:// slim.eos.ubc.ca/ Publications/ Public/ Journals/ hennenfent2010nct.pdf .F. J. Herrmann, C. R. Brown, Y. A. Erlangga, and P. P. Moghad-dam. Curvelet-based migration preconditioning and scaling. Geo-physics, 74:A41, 2009. URL http:// slim.eos.ubc.ca/ Publications/ Public/TechReports/ herrmann08cmp-r.pdf .F. J. Herrmann, M. P. Friedlander, and O. Yilmaz. Fighting thecurse of dimensionality: Compressive sensing in exploration seismol-ogy. Signal Processing Magazine, IEEE, 29(3):88?100, 2012. ISSN1053-5888. URL https:// www.slim.eos.ubc.ca/ Publications/ Public/Journals/ IEEESignalProcessingMagazine/ 2012/ Herrmann11TRfcd/Herrmann11TRfcd.pdf .N. J. Higham. Functions of matrices: theory and computation. Siam, 2008.B. Jumah and F. J. Herrmann. Dimensionality-reduced esti-mation of primaries by sparse inversion. 2012. URL https:// www.slim.eos.ubc.ca/ Publications/ Private/ Submitted/ Journal/bander2012dre/ bander2012dre.pdf .68BibliographyC. S. Kenney and A. J. Laub. The matrix sign function. Automatic Control,IEEE Transactions on, 40(8):1330?1348, 1995.A. Kleyn. Seismic reflection interpretation. Applied Science Publishers,1983.D. D. Kosloff and E. Baysal. Migration with the full acoustic wave equation.Geophysics, 48(6):677?687, 1983.E. Levitin and B. Polyak. Constrained minimization methods.{USSR} Computational Mathematics and Mathematical Physics, 6(5):1 ? 50, 1966. ISSN 0041-5553. doi: http://dx.doi.org/10.1016/0041-5553(66)90114-5. URL http:// www.sciencedirect.com/ science/article/ pii/ 0041555366901145 .T. T. Lin and F. J. Herrmann. Compressed wavefield extrap-olation. Geophysics, 72(5):SM77?SM93, 2007. URL https:// www.slim.eos.ubc.ca/ Publications/ Public/ Journals/ Geophysics/2007/ lin07cwe/ lin07cwe.pdf .W. Lyons. Fast algorithms with applications to pdes. PhD thesis, UNIVER-SITY of CALIFORNIA, 2005.W. A. Mulder and R.-E. Plessix. A comparison between one-way and two-way wave-equation migration. Geophysics, 69(6):1491?1504, 2004.A. M. Popovici. Prestack migration by split-step dsr. Geophysics, 61(5):1412?1416, 1996.K. Sandberg and G. Beylkin. Full-wave-equation depth extrapolation formigration. Geophysics, 74(6):WCA121?WCA128, 2009.M. Schmidt, D. Kim, and S. Sra. 1 projected newton-type methods inmachine learning. Optimization for Machine Learning, page 305, 2011.M. Schmidt, D. Kim, and S. Sra. 1 projected newton-type methods inmachine learning. Optimization for Machine Learning, page 305, 2012.M. W. Schmidt, E. Berg, M. P. Friedlander, and K. P. Murphy. Optimiz-ing costly functions with simple constraints: A limited-memory projectedquasi-newton algorithm. In International Conference on Artificial Intel-ligence and Statistics, 2009.P. S. Schultz and J. F. Claerbout. Velocity estimation and downward con-tinuation by wavefront synthesis. Geophysics, 43(4):691?714, 1978.69BibliographyZ. Sheng, P. Dewilde, and S. Chandrasekaran. Algorithms to solve hierar-chically semi-separable systems. pages 255?294, 2007.J. A. Tropp. Algorithms for simultaneous sparse approximation. part ii:Convex relaxation. Signal Processing, 86(3):589?602, 2006.M. Tygert and Rokhlin. A randomized algorithm for approximating the svdof a matrix. 2007. URL https:// www.ipam.ucla.edu/ publications/ setut/setut?7373.pdf .E. van den Berg and M. P. Friedlander. Probing the pareto frontier for basispursuit solutions. SIAM Journal on Scientific Computing, 31(2):890?912,2008. doi: 10.1137/080714488. URL http:// link.aip.org/ link/ ?SCE/ 31/890 .E. van den Berg and M. P. Friedlander. Sparse optimization with least-squares constraints. SIAM J. Optimization, 21(4):1201?1229, 2011.S. Wang, J. Xia, and M. V. de Hoop. A 3d massively parallel structuredapproximate direct helmholtz solver: algorithms & methods. In 80th SEGmeeting, Denver Colorado, USA, Expanded Abstracts, pages 1039?1043,2010.K. Wapenaar and J. L. Grimbergen. A discussion on stability analysis ofwave field depth extrapolation. In 1998 SEG Annual Meeting, 1998.J. Xia. On the complexity of some hierarchical structured matrix algorithms.SIAM Journal on Matrix Analysis and Applications, 33(2):388?410, 2012.O?. Yilmaz and S. M. Doherty. Seismic data processing, volume 2. Societyof Exploration Geophysicists Tulsa, 1987.70Appendix ASPG`1The solver SPG`1 is a high performance large scale solver for sparsity pro-moting problems, originating from van den Berg and Friedlander (2008).The algorithm is designed to solve the following sparsity promoting BP orBPDN problem:(BP) minimizex||x||1 subject to Ax = b (A.1)(BPDN) minimizex||x||1 subject to ||Ax? b||2 ? ? (A.2)In this appendix we will briefly summarize the theoretical background thatsupports SPG`1. Interested readers are referred to van den Berg and Fried-lander (2008) for detailed illustration and proof of convergence.Pareto curve Let x? denote the optimal solution of a Lasso problemparameterized by ? :(LS? ) minimizex||Ax? b||2 subject to ||x||1 ? ?. (A.3)Accordingly, we define a single parameter function ? describing the optimalvalue of (LS? ) for each ? ? 0:?(?) = ||r? ||2, with r? := b?Ax? (A.4)The Pareto curve (?(?)) is shown as the black line in In Figure A.1. Onthe vertical axis is the two norm of the data residual, and on the horizontalaxis is the one norm of the solution. The region above the Pareto curve isthe feasibility region for the BPDN problem, whereas, the region below isnot feasible. The red dotted line shows a realization of one solve of a BPDNproblem, starting with an initial guess of zero, and ending when the reachingnoise level ? is reached.There are several nice properties of the Pareto curve as demonstrated inTheorem 2.1 from van den Berg and Friedlander (2008), for example:(a) The function ? is convex and non-increasing.(b) For all ? ? (0, ?BP ), ? is continuously differentiable, ??(?) = ?? , and the71Appendix A. SPG`1Figure A.1: Adapted from Figure 2.1 in van den Berg and Friedlander(2008). Pareto curve (black line) : The tradeoff between data residual andmodel sparsity. On the vertical axis is two norm of data residual, and onthe horizontal axis is the one norm of solution. The red dot line showsthe solution path of one particular BPDN problem, each dot represents oneiteration.optimal dual variable ?? = ||?T y? ||? , where y? = r?/||r? ||2.(c) For ? ? [0, ?BP ], ||x? ||1 = ? , and ? is strictly decreasing.Those properties permit us to use a Newton-based root-finding algorithm tofind roots of the nonlinear equation:?(?) = ? (A.5)Interestingly enough, the proof of those properties in van den Berg andFriedlander (2008) does not rely on any specific norm used in the objectiveand regularization functions, which offers a friendly extension of the frame-work to any norm formulation as long as its primal, dual and projection iswell defined.Inexact Newton root finding Solving for Equation A.5 the algorithmgenerates a sequence of regularization parameters ?k ? ?? based on theNewton iteration:?k+1 = ?k +4?k with 4?k := (? ? ??k)/??(?k) (A.6)Each iteration of the root-finding requires the evaluation of ? and ?? atsome ? , and hence the minimization of (LS? ). This is a potentially expensivesubproblem, and the effectiveness of SPG`1 hinges on the ability to solve thissubproblem only approximately. Precisely, the Newton iteration describedabove needs only an approximation of ?(?k) and ??(?k). As proved in section72Appendix A. SPG`13.1 from van den Berg and Friedlander (2008), this inexact Newton iterationpossesses a sublinear convergence rate, and the rate can be made arbitrarilyclose to super linear by increasing the accuracy with each subproblem.Spectral projected gradient Each iteration of the Newton root-findingmethod described previously requires the (approximate) evaluation of ?(?),and therefore a procedure for minimizing (LS? ). For solving each individualLasso problem, SPG`1 uses a well known high performance Spectral Pro-jected Gradient (SPG) method, which is shown in Algorithm 6 (van denBerg and Friedlander (2008)). SPG includes projecting iterates onto thefeasible set {x|||x||1 ? ?} via the projection operator:P? (c) := arg minx||c? x||2 subject to ||x||1 ? ? (A.7)which gives the projection of an n?vector c onto the one-norm ball withradius ? . Each iteration of the algorithm searches the projected gradientpath P? (x`??g`), where g` is the current gradient for the function ||Ax?b||22.SPG algorithm also uses a non monotone line search, which ensures that atleast every M iterations yield a sufficient decrease in the objective functionwith the step length ?. ? is initialized with the spectral step length thatwas introduced by Barzilai and Borwein (1988).73Appendix A. SPG`1Algorithm 6 Spectral projected gradient for (LS? ) (van den Berg andFriedlander (2008))Input: x, ?, ?SPGoutput: x? , r?Set minimum and maximum step lengths 0 < ?min < ?maxSet initial step length ?0 ? [?min, ?max] and sufficient descent parameter? ? (0, 1)Set an integer line search history length M ? 1Set initial iterates : x0 ? P? [x], r0 ? b?Ax0, g0 ? ?AT r0`? 0begin?SPGl ? ||rl||2 ? (bT r` ? ? ||g`||?)/||r`||2if ?SPGl < ?SPG then break?? ?`beginx?? P? [x` ? ?g`]r? ? b?Ax?if ||r?||22 ? maxj?[0,min{k,M?1}]||r`?j ||22 + ?(x?? x`)T g` thenbreakelse?? ?/2endendx`+1 ? x?, r`+1 ? r?, g`+1 ? ?AT r`+14x? x`+1 ? x`,4g ? g`+1 ? g`if 4xT 4 g ? 0 then?`+1 ? ?maxelse?`+1 ? min{?max,max[?min, (4xT 4 x)/(4xT 4 g)]}end`? `+ 1endreturn x? ? x`, r? ? r`74
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Efficient seismic imaging with spectral projector and...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Efficient seismic imaging with spectral projector and joint sparsity Miao, Lina 2014
pdf
Page Metadata
Item Metadata
Title | Efficient seismic imaging with spectral projector and joint sparsity |
Creator |
Miao, Lina |
Publisher | University of British Columbia |
Date Issued | 2014 |
Description | In this thesis, we investigate the potential of improving the efficiency of seismic imaging with two advanced techniques: the spectral projector and the “joint sparsity”. The spectral projector offers an eigenvalue decomposition free computation routine that can filter out unstable evanescent wave components during wave equation based depth extrapolation. “Joint sparsity” aims to improve on the pure sparsity promoting recovery by making use of additional structure information of the signal. Besides, a new sparsity optimization algorithm — PQNl1 — is proposed to improve both theoretical convergence rate and practical performance for extremely large seismic imaging problems. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2014-03-18 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166861 |
URI | http://hdl.handle.net/2429/46256 |
Degree |
Master of Science - MSc |
Program |
Geophysics |
Affiliation |
Science, Faculty of Earth, Ocean and Atmospheric Sciences, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2014-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2014_spring_miao_lina.pdf [ 3.2MB ]
- Metadata
- JSON: 24-1.0166861.json
- JSON-LD: 24-1.0166861-ld.json
- RDF/XML (Pretty): 24-1.0166861-rdf.xml
- RDF/JSON: 24-1.0166861-rdf.json
- Turtle: 24-1.0166861-turtle.txt
- N-Triples: 24-1.0166861-rdf-ntriples.txt
- Original Record: 24-1.0166861-source.json
- Full Text
- 24-1.0166861-fulltext.txt
- Citation
- 24-1.0166861.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0166861/manifest